Re: Grails Cassandra plugin
great, I'm happy you found Hector useful :) btw, in hector 0.5.0-8 I added some interesting performance JMX counters, so it may be worth updating yours from 0.5.0-6 to -8 when you have time.

On Fri, Mar 12, 2010 at 11:55 PM, Ned Wolpert ned.wolp...@imemories.com wrote:

Document updated

On Fri, Mar 12, 2010 at 2:50 PM, Jonathan Ellis jbel...@gmail.com wrote:

Great! You should also link it from http://wiki.apache.org/cassandra/ClientExamples (click Login at the top to create an account.)

On Fri, Mar 12, 2010 at 3:57 PM, Ned Wolpert ned.wolp...@imemories.com wrote:

Folks- I put together a quick-n'-dirty Grails plugin for Cassandra, wrapped with Hector. It's available at http://github.com/wolpert/grails-cassandra in its initial state. I wouldn't call it 'production-ready' yet. :-) We're using Cassandra at work and I wanted an easy way to access Cassandra from a Grails application, but couldn't find anything. I have some plans on where I want it to go, but I'm open to suggestions. I'll submit the code to the Grails plugins once I get a bit further along with it. It's pretty basic at this point.

--
Virtually, Ned Wolpert
"Settle thy studies, Faustus, and begin..." --Marlowe
Re: MapReduce in Cassandra 0.6
fwiw, I read the instructions at contrib/word_count/README and it has like 3 manual steps, so using an embedded cassandra instance may simplify this into one single step and let the program do all the setup and teardown it requires. http://prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/

On Sun, Feb 28, 2010 at 2:52 AM, Jonathan Ellis jbel...@gmail.com wrote:

There's an example in contrib/word_count, as mentioned in NEWS. Basic hadoop knowledge is assumed. :) Johan has been making fixes to the 0.6 branch that are not in beta2, so you will probably want to get that from svn. I've added to CHANGES in the 0.6 branch too, thanks for the heads up.
-Jonathan

On Sat, Feb 27, 2010 at 4:34 PM, Scott Delap sde...@riotgames.com wrote:

I've seen rumblings that 0.6 supports map/reduce. I don't see anything specific about this in the changelog for beta2. I looked at the dev and user mailing lists and found some older stuff about node-based data retrieval. I also didn't see anything in JIRA explicitly about it. Is there anywhere I can find further information?
Scott
Hector - a Java Cassandra client
I've written a java library for cassandra I've been using internally; would love to get your feedback, and hope you find it useful.

Blog post: http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/
Source: http://github.com/rantav/hector

High-level features:
o A high-level object-oriented interface to cassandra. In short: just some niceties around thrift.
o Failover support. If a client is connected to one host in the ring and this host goes down, the client will automatically and transparently search for other available hosts to perform the operation before giving up. You may choose FAIL_FAST (no retry, just fail if there are errors, nothing smart), ON_FAIL_TRY_ONE_NEXT_AVAILABLE (try one more host before giving up) or ON_FAIL_TRY_ALL_AVAILABLE (try all available hosts before giving up).
o Connection pooling. Needless to say, it's a must.
o JMX support. Hector exposes JMX for many runtime metrics, such as the number of available connections, idle connections, error statistics, etc.
o Support for the Command design pattern, to allow clients to concentrate on their business logic and let hector take care of the required plumbing.
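To make the failover modes and the Command pattern described above concrete, here is a minimal sketch of the idea. The class and method names below (FailoverSketch, Command, executeWithFailover) are illustrative only, not Hector's actual API:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the failover behavior described above; not Hector's real API. */
public class FailoverSketch {

    enum FailoverPolicy { FAIL_FAST, ON_FAIL_TRY_ONE_NEXT_AVAILABLE, ON_FAIL_TRY_ALL_AVAILABLE }

    /** Command pattern: callers supply the business logic, the client supplies the plumbing. */
    interface Command<T> {
        T execute(String host) throws Exception;
    }

    static <T> T executeWithFailover(List<String> hosts, FailoverPolicy policy, Command<T> cmd)
            throws Exception {
        int attempts;
        switch (policy) {
            case FAIL_FAST: attempts = 1; break;                 // no retry
            case ON_FAIL_TRY_ONE_NEXT_AVAILABLE: attempts = 2; break; // one more host
            default: attempts = hosts.size(); break;             // try all available hosts
        }
        Exception last = null;
        for (int i = 0; i < attempts && i < hosts.size(); i++) {
            try {
                return cmd.execute(hosts.get(i)); // next host in the ring on each retry
            } catch (Exception e) {
                last = e; // host appears down; fall through to the next one
            }
        }
        if (last == null) last = new Exception("no hosts available");
        throw last;
    }

    public static void main(String[] args) throws Exception {
        List<String> ring = Arrays.asList("cass1:9160", "cass2:9160", "cass3:9160");
        // First two hosts "fail"; ON_FAIL_TRY_ALL_AVAILABLE reaches the third.
        String result = executeWithFailover(ring, FailoverPolicy.ON_FAIL_TRY_ALL_AVAILABLE,
                host -> {
                    if (!host.startsWith("cass3")) throw new Exception("host down: " + host);
                    return "ok from " + host;
                });
        System.out.println(result); // prints: ok from cass3:9160
    }
}
```

The Command interface carries the caller's logic; the retry loop is the plumbing that walks the ring according to the chosen policy.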
Re: Hector - a Java Cassandra client
it supports supercolumns, yes, although I personally have only used regular columns so far (you can see the unit tests at http://github.com/rantav/hector/blob/master/src/test/java/me/prettyprint/cassandra/service/KeyspaceTest.java, search for "super")

On Tue, Feb 23, 2010 at 4:25 PM, Richard Grossman rich...@bee.tv wrote:

Hi Ran,
Does it support operations on super columns?
Thanks

On Tue, Feb 23, 2010 at 4:13 PM, Ran Tavory ran...@gmail.com wrote:

I've written a java library for cassandra I've been using internally, would love to get your feedback and hope you find it useful.
Blog post: http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/
Source: http://github.com/rantav/hector
[The quoted feature list is identical to the announcement above.]
Re: StackOverflowError on high load
This sort of explains it, yes, but what solution can I use? I do see the OPP writes go faster than the RP, so it makes sense that with the OPP there's a higher chance that a host will fall behind with compaction and eventually crash. It's not a nice feature, but hopefully there are mitigations. So my question is: what are the mitigations? What should I tell my admin to do in order to prevent this? Telling him to increase the directory size 2x isn't going to cut it, as the directory just keeps growing and is not bounded... I'm also not clear on whether CASSANDRA-804 is going to be a real fix. Thanks

On Sat, Feb 20, 2010 at 9:36 PM, Jonathan Ellis jbel...@gmail.com wrote:

if OPP is configured w/ imbalanced ranges (or less balanced than RP) then that would explain it. OPP is actually slightly faster in terms of raw speed.

On Sat, Feb 20, 2010 at 2:31 PM, Ran Tavory ran...@gmail.com wrote:

interestingly, I ran the same load but this time with a random partitioner and, although from time to time test2 was a little behind with its compaction task, it did not crash and was able to eventually close the gaps that were opened. Does this make sense? Is there a reason why the random partitioner is less likely to be faulty in this scenario? The scenario is about 1300 writes/sec of small amounts of data to a single CF on a cluster with two nodes and no replication. With the order-preserving partitioner, after a few hours of load the compaction pool is behind on one of the hosts and eventually that host crashes, but with the random partitioner it doesn't crash. thanks

On Sat, Feb 20, 2010 at 6:27 AM, Jonathan Ellis jbel...@gmail.com wrote:

looks like test1 started gc storming, so test2 treats it as dead and starts doing hinted handoff for it, which increases test2's load, even though test1 is not completely dead yet.

On Thu, Feb 18, 2010 at 1:16 AM, Ran Tavory ran...@gmail.com wrote:

I found another interesting graph, attached. I looked at the write-count and write-latency of the CF I'm writing to and I see a few interesting things:
1. The host test2 crashed at 18:00.
2. At 16:00, after a few hours of load, both hosts dropped their write-count. test1 (which did not crash) started slowing down first and then test2 slowed.
3. At 16:00 I start seeing high write-latency on test2 only. This goes on for about 2h until finally at 18:00 it crashes.
Does this help?

[The rest of the quoted thread - Ran's Feb 18 7:44 AM message with the df output, and Tatu Saloranta's reply - is reproduced in full in the later "Re: StackOverflowError on high load" entries below.]
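On the "2x of what?" question above: the 2x rule of thumb comes from compaction writing the merged SSTable in full before deleting its inputs, so disk usage temporarily peaks at roughly inputs plus output. A toy model of that peak (illustrative numbers, not measurements from the cluster discussed here):

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of why compaction needs ~2x headroom: the merged file is written
 *  in full before the input SSTables are deleted. Illustration only. */
public class CompactionHeadroom {

    static long peakDiskDuringCompaction(List<Long> sstableSizes) {
        long inputs = 0;
        for (long s : sstableSizes) inputs += s;
        long merged = inputs; // worst case: no overwrites or tombstones, output as big as inputs
        return inputs + merged; // old files are still on disk while the new one is written
    }

    public static void main(String[] args) {
        // ~11G of live data spread over 4 SSTables, as in the df output above
        List<Long> sstables = new ArrayList<>();
        for (int i = 0; i < 4; i++) sstables.add(11L * 1024 * 1024 * 1024 / 4);
        long peak = peakDiskDuringCompaction(sstables);
        System.out.printf("peak during compaction: %.1f GB%n", peak / (1024.0 * 1024 * 1024));
        // ~22 GB: twice the live data, matching the "2x" rule of thumb
    }
}
```

Note this bounds a single compaction; if compactions fall behind and obsolete SSTables pile up (the PendingTasks growth described above), the pre-compaction footprint itself keeps growing, which is why the directory appeared unbounded.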
Re: StackOverflowError on high load
interestingly, I ran the same load but this time with a random partitioner and, although from time to time test2 was a little behind with its compaction task, it did not crash and was able to eventually close the gaps that were opened. Does this make sense? Is there a reason why the random partitioner is less likely to be faulty in this scenario? The scenario is about 1300 writes/sec of small amounts of data to a single CF on a cluster with two nodes and no replication. With the order-preserving partitioner, after a few hours of load the compaction pool is behind on one of the hosts and eventually that host crashes, but with the random partitioner it doesn't crash. thanks

On Sat, Feb 20, 2010 at 6:27 AM, Jonathan Ellis jbel...@gmail.com wrote:

looks like test1 started gc storming, so test2 treats it as dead and starts doing hinted handoff for it, which increases test2's load, even though test1 is not completely dead yet.

On Thu, Feb 18, 2010 at 1:16 AM, Ran Tavory ran...@gmail.com wrote:

I found another interesting graph, attached. I looked at the write-count and write-latency of the CF I'm writing to and I see a few interesting things:
1. The host test2 crashed at 18:00.
2. At 16:00, after a few hours of load, both hosts dropped their write-count. test1 (which did not crash) started slowing down first and then test2 slowed.
3. At 16:00 I start seeing high write-latency on test2 only. This goes on for about 2h until finally at 18:00 it crashes.
Does this help?

[The rest of the quoted thread - Ran's Feb 18 7:44 AM message with the df output, and Tatu Saloranta's reply - is reproduced in full in the next "Re: StackOverflowError on high load" entry below.]
StackOverflowError on high load
I'm running some high-load writes on a pair of cassandra hosts using an OrderPreservingPartitioner and ran into the following error, after which one of the hosts killed itself. Has anyone seen it and can advise? (cassandra v0.5.0)

ERROR [HINTED-HANDOFF-POOL:1] 2010-02-17 04:50:09,602 CassandraDaemon.java (line 71) Fatal exception in thread Thread[HINTED-HANDOFF-POOL:1,5,main]
java.lang.StackOverflowError
        at sun.nio.cs.UTF_8$Encoder.encodeArrayLoop(UTF_8.java:341)
        at sun.nio.cs.UTF_8$Encoder.encodeLoop(UTF_8.java:447)
        at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:544)
        at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:240)
        at java.lang.StringCoding.encode(StringCoding.java:272)
        at java.lang.String.getBytes(String.java:947)
        at java.io.UnixFileSystem.getSpace(Native Method)
        at java.io.File.getUsableSpace(File.java:1660)
        at org.apache.cassandra.config.DatabaseDescriptor.getDataFileLocationForTable(DatabaseDescriptor.java:891)
        at org.apache.cassandra.db.ColumnFamilyStore.doFileCompaction(ColumnFamilyStore.java:876)
        at org.apache.cassandra.db.ColumnFamilyStore.doFileCompaction(ColumnFamilyStore.java:884)
        at org.apache.cassandra.db.ColumnFamilyStore.doFileCompaction(ColumnFamilyStore.java:884)
        at org.apache.cassandra.db.ColumnFamilyStore.doFileCompaction(ColumnFamilyStore.java:884)
        at org.apache.cassandra.db.ColumnFamilyStore.doFileCompaction(ColumnFamilyStore.java:884)
        ...
        at org.apache.cassandra.db.ColumnFamilyStore.doFileCompaction(ColumnFamilyStore.java:884)
        at org.apache.cassandra.db.ColumnFamilyStore.doFileCompaction(ColumnFamilyStore.java:884)
        at org.apache.cassandra.db.ColumnFamilyStore.doFileCompaction(ColumnFamilyStore.java:884)
INFO [ROW-MUTATION-STAGE:28] 2010-02-17 04:50:53,230 ColumnFamilyStore.java (line 393) DocumentMapping has reached its threshold; switching in a fresh Memtable
INFO [ROW-MUTATION-STAGE:28] 2010-02-17 04:50:53,230 ColumnFamilyStore.java (line 1035) Enqueuing flush of Memtable(DocumentMapping)@122980220
INFO [FLUSH-SORTER-POOL:1] 2010-02-17 04:50:53,230 Memtable.java (line 183) Sorting Memtable(DocumentMapping)@122980220
INFO [FLUSH-WRITER-POOL:1] 2010-02-17 04:50:53,386 Memtable.java (line 192) Writing Memtable(DocumentMapping)@122980220
ERROR [FLUSH-WRITER-POOL:1] 2010-02-17 04:50:54,010 DebuggableThreadPoolExecutor.java (line 162) Error in executor futuretask
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: No space left on device
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:154)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException: java.io.IOException: No space left on device
        at org.apache.cassandra.db.ColumnFamilyStore$3$1.run(ColumnFamilyStore.java:1060)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        ... 2 more
Caused by: java.io.IOException: No space left on device
        at java.io.FileOutputStream.write(Native Method)
        at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
        at org.apache.cassandra.utils.BloomFilterSerializer.serialize(BloomFilter.java:158)
        at org.apache.cassandra.utils.BloomFilterSerializer.serialize(BloomFilter.java:153)
        at org.apache.cassandra.io.SSTableWriter.closeAndOpenReader(SSTableWriter.java:123)
        at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:207)
        at org.apache.cassandra.db.ColumnFamilyStore$3$1.run(ColumnFamilyStore.java:1056)
        ... 6 more
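The repeated doFileCompaction frames in the trace show the failure shape: a retry implemented as recursion, so when the disk stays too full, every retry adds a stack frame until the thread dies with StackOverflowError. Below is an illustrative sketch of that anti-pattern next to a bounded alternative; this is not Cassandra's actual code (CASSANDRA-804 tracks the real fix):

```java
/** Sketch of the failure mode in the trace above: recursion as retry vs. a bounded loop.
 *  Illustration only, not Cassandra code. */
public class RecursiveRetry {

    interface Disk { long usableSpace(); }

    // Anti-pattern: recurse until space appears -- unbounded stack growth on a full disk.
    static boolean compactRecursive(Disk disk, long needed) {
        if (disk.usableSpace() >= needed) return true;
        return compactRecursive(disk, needed); // same condition, one more stack frame
    }

    // Safer shape: bounded iterative retry that fails cleanly instead of overflowing.
    static boolean compactWithRetries(Disk disk, long needed, int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            if (disk.usableSpace() >= needed) return true;
        }
        return false; // report "insufficient space" instead of killing the thread
    }

    public static void main(String[] args) {
        Disk fullDisk = () -> 0L; // disk that never frees up
        System.out.println(compactWithRetries(fullDisk, 1L, 100)); // prints: false
        try {
            compactRecursive(fullDisk, 1L);
        } catch (StackOverflowError e) {
            // caught here only for demonstration; in a real server this kills the thread
            System.out.println("StackOverflowError, like the trace above");
        }
    }
}
```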
Re: StackOverflowError on high load
Are we talking about the CommitLogDirectory that needs to be up to 2x? 2x of what? Did I miss this in the config file somewhere?

On Wed, Feb 17, 2010 at 3:52 PM, Jonathan Ellis jbel...@gmail.com wrote:

you temporarily need up to 2x your current space used to perform compactions. "disk too full" is almost certainly actually the problem. created https://issues.apache.org/jira/browse/CASSANDRA-804 to fix this.

On Wed, Feb 17, 2010 at 5:59 AM, Ran Tavory ran...@gmail.com wrote:

no, that's not it, the disk isn't full. After restarting the server I can write again. Still, however, this error is troubling...

On Wed, Feb 17, 2010 at 12:24 PM, ruslan usifov ruslan.usi...@gmail.com wrote:

I think that you don't have enough room for your data. Run df -h to see whether one of your discs is full.

2010/2/17 Ran Tavory ran...@gmail.com

I'm running some high-load writes on a pair of cassandra hosts using an OrderPreservingPartitioner and ran into the following error, after which one of the hosts killed itself. Has anyone seen it and can advise? (cassandra v0.5.0)

[The quoted stack trace is identical to the original "StackOverflowError on high load" post above.]
Re: StackOverflowError on high load
I ran the process again and after a few hours the same node crashed in the same way. Now I can tell for sure this is indeed what Jonathan proposed - the data directory needs to be 2x of what it is - but it looks like a design problem: how large do I need to tell my admin to set it, then?

Here's what I see when the server crashes:
$ df -h /outbrain/cassandra/data/
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/cassandra-data   97G   46G   47G  50% /outbrain/cassandra/data

The directory is 97G and when the host crashes it's at 50% use. I'm also monitoring various JMX counters and I see that COMPACTION-POOL PendingTasks grows for a while on this host (not on the other host, btw, which is fine, just this host) and then stays flat for 3 hours. After 3 hours of flat it crashes. I'm attaching the graph. When I restart cassandra on this host (no change to the file allocation size, just a restart) it does manage to compact the data files pretty fast, so after a minute I get to 12% use. So I wonder what made it crash before that doesn't now? (Could be the load, which isn't running now.)

$ df -h /outbrain/cassandra/data/
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/cassandra-data   97G   11G   82G  12% /outbrain/cassandra/data

The question is, what size does the data directory need to be? It's not 2x the size of the data I expect to have (I only have 11G of real data after compaction and the dir is 97G, so it should have been enough). If it's 2x of something dynamic that keeps growing and isn't bounded then it'll just grow infinitely, right? What's the bound? Alternatively, what JMX counter thresholds are the best indicators of the crash that's about to happen? Thanks

On Wed, Feb 17, 2010 at 9:00 PM, Tatu Saloranta tsalora...@gmail.com wrote:

On Wed, Feb 17, 2010 at 6:40 AM, Ran Tavory ran...@gmail.com wrote:

If it's the data directory, then I have a pretty big one. Maybe it's something else.
$ df -h /outbrain/cassandra/data/
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/cassandra-data   97G   11G   82G  12% /outbrain/cassandra/data

Perhaps a temporary file? JVM defaults to /tmp, which may be on a smaller (root) partition?
-+ Tatu +-

attachment: Zenoss_test2.nydc1.outbrain.com.png
Re: How to unit test my code calling Cassandra with Thift
I've committed to trunk all the required code and posted about it; hope you find it useful: http://prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/

On Sun, Jan 24, 2010 at 12:20 PM, Richard Grossman richie...@gmail.com wrote:

Great Ran, I think I've missed the .setDaemon to keep the server alive.
Thanks
Richard

On Sun, Jan 24, 2010 at 12:02 PM, Ran Tavory ran...@gmail.com wrote:

Here's the code I've just written over the weekend and started using in tests:

package com.outbrain.data.cassandra.service;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.cassandra.config.DatabaseDescriptor;
import org.apache.cassandra.service.CassandraDaemon;
import org.apache.cassandra.utils.FileUtils;
import org.apache.thrift.transport.TTransportException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * An in-memory cassandra storage service that listens to the thrift interface.
 * Useful for unit testing.
 *
 * @author Ran Tavory (r...@outbain.com)
 */
public class InProcessCassandraServer implements Runnable {

  private static final Logger log = LoggerFactory.getLogger(InProcessCassandraServer.class);

  CassandraDaemon cassandraDaemon;

  public void init() {
    try {
      prepare();
    } catch (IOException e) {
      log.error("Cannot prepare cassandra.", e);
    }
    try {
      cassandraDaemon = new CassandraDaemon();
      cassandraDaemon.init(null);
    } catch (TTransportException e) {
      log.error("TTransportException", e);
    } catch (IOException e) {
      log.error("IOException", e);
    }
  }

  @Override
  public void run() {
    cassandraDaemon.start();
  }

  public void stop() {
    cassandraDaemon.stop();
    rmdir("tmp");
  }

  /**
   * Creates all files and directories needed.
   * @throws IOException
   */
  private void prepare() throws IOException {
    // delete tmp dir first
    rmdir("tmp");
    // make a tmp dir and copy storage-conf.xml and log4j.properties to it
    copy("/cassandra/storage-conf.xml", "tmp");
    copy("/cassandra/log4j.properties", "tmp");
    System.setProperty("storage-config", "tmp");
    // make cassandra directories.
    for (String s : DatabaseDescriptor.getAllDataFileLocations()) {
      mkdir(s);
    }
    mkdir(DatabaseDescriptor.getBootstrapFileLocation());
    mkdir(DatabaseDescriptor.getLogFileLocation());
  }

  /**
   * Copies a resource from within the jar to a directory.
   *
   * @param resource
   * @param directory
   * @throws IOException
   */
  private void copy(String resource, String directory) throws IOException {
    mkdir(directory);
    InputStream is = getClass().getResourceAsStream(resource);
    String fileName = resource.substring(resource.lastIndexOf("/") + 1);
    File file = new File(directory + System.getProperty("file.separator") + fileName);
    OutputStream out = new FileOutputStream(file);
    byte[] buf = new byte[1024];
    int len;
    while ((len = is.read(buf)) > 0) {
      out.write(buf, 0, len);
    }
    out.close();
    is.close();
  }

  /**
   * Creates a directory.
   * @param dir
   * @throws IOException
   */
  private void mkdir(String dir) throws IOException {
    FileUtils.createDirectory(dir);
  }

  /**
   * Removes a directory from the file system.
   * @param dir
   */
  private void rmdir(String dir) {
    FileUtils.deleteDir(new File(dir));
  }
}

And in the test class:

public class XxxTest {

  private static InProcessCassandraServer cassandra;

  @BeforeClass
  public static void setup() throws TTransportException, IOException, InterruptedException {
    cassandra = new InProcessCassandraServer();
    cassandra.init();
    Thread t = new Thread(cassandra);
    t.setDaemon(true);
    t.start();
  }

  @AfterClass
  public static void shutdown() {
    cassandra.stop();
  }

  // ... tests
}

Now you can connect to localhost:9160. Assumptions: the code assumes you have two files in your classpath: /cassandra/storage-conf.xml and /cassandra/log4j.properties. This is convenient if you use maven - just put them in src/test/resources/cassandra/. If you don't work with maven, or would like to configure the configuration files differently, it should be fairly easy: just change the prepare() method.

On Sun, Jan 24, 2010 at 10:54 AM, Richard Grossman richie...@gmail.com wrote:

So is there anybody? Unit testing is important, people...
Thanks

On Thu, Jan 21, 2010 at 12:09 PM, Richard Grossman richie...@gmail.com wrote:

Here is the code I use:

class startServer implements Runnable {
  @Override
  public void run() {
    try {
      CassandraDaemon cassandraDaemon = new CassandraDaemon();
      cassandraDaemon.init(null);
      cassandraDaemon.start();
    } catch (TTransportException e
Re: How to unit test my code calling Cassandra with Thift
Here's the code I've just written over the weekend and started using in test: package com.outbrain.data.cassandra.service; import java.io.File; import java.io.FileOutputStream; import java.io.IOException; import java.io.InputStream; import java.io.OutputStream; import org.apache.cassandra.config.DatabaseDescriptor; import org.apache.cassandra.service.CassandraDaemon; import org.apache.cassandra.utils.FileUtils; import org.apache.thrift.transport.TTransportException; import org.slf4j.Logger; import org.slf4j.LoggerFactory; /** * An in-memory cassandra storage service that listens to the thrift interface. * Useful for unit testing, * * @author Ran Tavory (r...@outbain.com) * */ public class InProcessCassandraServer implements Runnable { private static final Logger log = LoggerFactory.getLogger(InProcessCassandraServer.class); CassandraDaemon cassandraDaemon; public void init() { try { prepare(); } catch (IOException e) { log.error(Cannot prepare cassandra., e); } try { cassandraDaemon = new CassandraDaemon(); cassandraDaemon.init(null); } catch (TTransportException e) { log.error(TTransportException, e); } catch (IOException e) { log.error(IOException, e); } } @Override public void run() { cassandraDaemon.start(); } public void stop() { cassandraDaemon.stop(); rmdir(tmp); } /** * Creates all files and directories needed * @throws IOException */ private void prepare() throws IOException { // delete tmp dir first rmdir(tmp); // make a tmp dir and copy storag-conf.xml and log4j.properties to it copy(/cassandra/storage-conf.xml, tmp); copy(/cassandra/log4j.properties, tmp); System.setProperty(storage-config, tmp); // make cassandra directories. for (String s: DatabaseDescriptor.getAllDataFileLocations()) { mkdir(s); } mkdir(DatabaseDescriptor.getBootstrapFileLocation()); mkdir(DatabaseDescriptor.getLogFileLocation()); } /** * Copies a resource from within the jar to a directory. 
* * @param resourceName * @param directory * @throws IOException */ private void copy(String resource, String directory) throws IOException { mkdir(directory); InputStream is = getClass().getResourceAsStream(resource); String fileName = resource.substring(resource.lastIndexOf(/) + 1); File file = new File(directory + System.getProperty(file.separator) + fileName); OutputStream out = new FileOutputStream(file); byte buf[] = new byte[1024]; int len; while ((len = is.read(buf)) 0) { out.write(buf, 0, len); } out.close(); is.close(); } /** * Creates a directory * @param dir * @throws IOException */ private void mkdir(String dir) throws IOException { FileUtils.createDirectory(dir); } /** * Removes a directory from file system * @param dir */ private void rmdir(String dir) { FileUtils.deleteDir(new File(dir)); } } And in the test class: public class XxxTest { private static InProcessCassandraServer cassandra; @BeforeClass public static void setup() throws TTransportException, IOException, InterruptedException { cassandra = new InProcessCassandraServer(); cassandra.init(); Thread t = new Thread(cassandra); t.setDaemon(true); t.start(); } @AfterClass public static void shutdown() { cassandra.stop(); } ... test } Now you can connect to localhost:9160. Assumptions: The code assumes you have two files in your classpath: /cassandra/stogage-config.xml and /cassandra/log4j.xml. This is convenient if you use maven, just throw them at /src/test/resources/cassandra/ If you don't work with maven or would like to configure the configuration files differently it should be fairly easy, just change the prepare() method. On Sun, Jan 24, 2010 at 10:54 AM, Richard Grossman richie...@gmail.comwrote: So Is there anybody ? Unit testing is important people ... 
Thanks On Thu, Jan 21, 2010 at 12:09 PM, Richard Grossman richie...@gmail.com wrote: Here is the code I use:

class startServer implements Runnable {
  @Override
  public void run() {
    try {
      CassandraDaemon cassandraDaemon = new CassandraDaemon();
      cassandraDaemon.init(null);
      cassandraDaemon.start();
    } catch (TTransportException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    } catch (IOException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }
  }
}

Thread thread = new Thread(new startServer());
thread.start();
// the code to test here

On Thu, Jan 21, 2010 at 12:08 PM, Richard Grossman richie...@gmail.com wrote: Yes I've seen this and also checked it, but if I start the server then it blocks the current thread and I can
Re: Cassandra guarantees reads and writes to be atomic within a single ColumnFamily.
Thanks, so maybe to rephrase: Cassandra guarantees reads and writes to be atomic within a single row. But this isn't saying much... so maybe just take it off... On Thu, Jan 14, 2010 at 12:40 AM, Jonathan Ellis jbel...@gmail.com wrote: It's correct, if understood correctly. We should probably just remove it since it's confusing as written. What it means is, if a write for a given row is acked, eventually, _all_ the data updated _in that row_ will be available for reads. So no, it's not atomic at the batch_mutate level but at the listColumnOrSuperColumn level. -Jonathan On Mon, Jan 11, 2010 at 3:01 PM, Ran Tavory ran...@gmail.com wrote: The front page http://incubator.apache.org/cassandra/ states that Cassandra guarantees reads and writes to be atomic within a single ColumnFamily. What exactly does that mean, and where can I learn more about this? It sounds like it means that batch_insert() and batch_mutate() for two different rows but in the same CF is atomic. Is this correct?
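Jonathan's point — atomicity at the row level, not the batch level — can be sketched as a toy model: a batch touching several rows is split into per-row groups, and each group succeeds or fails as a unit, with no guarantee across groups. The class and method names below are hypothetical illustrations, not Cassandra's actual internals or Thrift API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model of the atomicity boundary discussed above: a batch
// spanning several rows is applied as independent per-row units.
// All names here are hypothetical, not Cassandra code.
public class RowAtomicityModel {

    static class Mutation {
        final String rowKey, column, value;
        Mutation(String rowKey, String column, String value) {
            this.rowKey = rowKey; this.column = column; this.value = value;
        }
    }

    /** Group a batch by row key: each group is the unit of atomicity. */
    static Map<String, List<Mutation>> groupByRow(List<Mutation> batch) {
        Map<String, List<Mutation>> groups = new LinkedHashMap<>();
        for (Mutation m : batch) {
            groups.computeIfAbsent(m.rowKey, k -> new ArrayList<>()).add(m);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Mutation> batch = new ArrayList<>();
        batch.add(new Mutation("user1", "name", "alice"));
        batch.add(new Mutation("user2", "name", "bob"));
        batch.add(new Mutation("user1", "email", "a@example.com"));
        // Two independent units: both of user1's columns land together,
        // but there is no atomicity between user1 and user2.
        Map<String, List<Mutation>> units = groupByRow(batch);
        System.out.println(units.size());              // 2
        System.out.println(units.get("user1").size()); // 2
    }
}
```

So an acked write guarantees that all columns of each individual row eventually become readable together, but a multi-row batch can surface partially if a failure lands between rows.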
Cassandra guarantees reads and writes to be atomic within a single ColumnFamily.
The front page http://incubator.apache.org/cassandra/ states that Cassandra guarantees reads and writes to be atomic within a single ColumnFamily. What exactly does that mean, and where can I learn more about this? It sounds like it means that batch_insert() and batch_mutate() for two different rows but in the same CF is atomic. Is this correct?
Re: MultiThread Client problem with thrift
Would connection pooling work for you? This Java client http://code.google.com/p/cassandra-java-client/ has connection pooling. I haven't put the client under stress yet so I can't testify, but this may be a good solution for you. On Tue, Dec 22, 2009 at 2:22 PM, Richard Grossman richie...@gmail.com wrote: I agree it solves my problem but can cause a bigger one. The problem is I can't prevent opening a lot of connections. On Tue, Dec 22, 2009 at 1:51 PM, Jaakko rosvopaalli...@gmail.com wrote: Hi, I don't know the particulars of the java implementation, but if it works the same way as the Unix native socket API, then I would not recommend setting linger to zero. The SO_LINGER option with a zero value will cause the TCP connection to be aborted immediately as soon as the socket is closed. That is, (1) remaining data in the send buffer will be discarded, (2) there is no proper disconnect handshake and (3) the receiving end will get a TCP reset. Sure, this will avoid the TIME_WAIT state, but TIME_WAIT is our friend and is there to avoid packets from an old connection being delivered to a new incarnation of the connection. Instead of avoiding the state, the application should be changed so that TIME_WAIT will not be a problem. How many open files do you see when the exception happens? Might be that you're out of file descriptors. -Jaakko On Tue, Dec 22, 2009 at 8:17 PM, Richard Grossman richie...@gmail.com wrote: Hi To all who are interested: I've found a solution that seems not recommended but works. When opening a socket set this: tSocket.getSocket().setReuseAddress(true); tSocket.getSocket().setSoLinger(true, 0); It prevents having a lot of connections in TIME_WAIT state, but it's not recommended.
Re: MultiThread Client problem with thrift
I don't have a 0.5.0-beta2 version, no. It's not too difficult to add it, but I haven't done so myself, I'm using 0.4.2 On Tue, Dec 22, 2009 at 2:42 PM, Richard Grossman richie...@gmail.comwrote: Yes of course but do you have updated to cassandra 0.5.0-beta2 ? On Tue, Dec 22, 2009 at 2:30 PM, Ran Tavory ran...@gmail.com wrote: Would connection pooling work for you? This Java client http://code.google.com/p/cassandra-java-client/ has connection pooling. I haven't put the client under stress yet so I can't testify, but this may be a good solution for you On Tue, Dec 22, 2009 at 2:22 PM, Richard Grossman richie...@gmail.comwrote: I agree it's solve my problem but can give a bigger one. The problem is I can't succeed to prevent opening a lot of connection On Tue, Dec 22, 2009 at 1:51 PM, Jaakko rosvopaalli...@gmail.comwrote: Hi, I don't know the particulars of java implementation, but if it works the same way as Unix native socket API, then I would not recommend setting linger to zero. SO_LINGER option with zero value will cause TCP connection to be aborted immediately as soon as the socket is closed. That is, (1) remaining data in the send buffer will be discarded, (2) no proper disconnect handshake and (3) receiving end will get TCP reset. Sure this will avoid TIME_WAIT state, but TIME_WAIT is our friend and is there to avoid packets from old connection being delivered to new incarnation of the connection. Instead of avoiding the state, the application should be changed so that TIME_WAIT will not be a problem. How many open files you can see when the exception happens? Might be that you're out of file descriptors. -Jaakko On Tue, Dec 22, 2009 at 8:17 PM, Richard Grossman richie...@gmail.com wrote: Hi To all is interesting I've found a solution seems not recommended but working. 
When opening a socket set this: tSocket.getSocket().setReuseAddress(true); tSocket.getSocket().setSoLinger(true, 0); It prevents having a lot of connections in TIME_WAIT state, but it's not recommended.
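For reference, the plain java.net.Socket equivalent of the Thrift-socket tweak above looks like the sketch below. As Jaakko explains, SO_LINGER with a zero timeout makes close() abortive (an RST with no FIN handshake), which is exactly why it sidesteps TIME_WAIT and why it's discouraged:

```java
import java.io.IOException;
import java.net.Socket;

public class LingerDemo {

    // Configure a socket the way the workaround above does.
    // setSoLinger(true, 0) makes close() send a TCP reset instead of a
    // normal FIN handshake, so the port never enters TIME_WAIT -- at the
    // cost of discarding unsent data. Generally discouraged, as noted above.
    static Socket abortOnCloseSocket() throws IOException {
        Socket s = new Socket();      // unconnected; options set before connect()
        s.setReuseAddress(true);      // allow rebinding a port still in TIME_WAIT
        s.setSoLinger(true, 0);       // linger on, timeout 0 => abortive close
        return s;
    }

    public static void main(String[] args) throws IOException {
        Socket s = abortOnCloseSocket();
        System.out.println(s.getSoLinger());      // 0 = linger enabled, zero timeout
        System.out.println(s.getReuseAddress());  // true
        s.close();
    }
}
```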
Re: MultiThread Client problem with thrift
Not an expert in this field, but I think what you want is to use a connection pool and NOT close the connections - reuse them. Only idle connections are released after, say, 1 sec. Also, with a connection pool it's easy to throttle the application: you can tell the pool to block if all 50 connections (or however many you allow) are in use. On Tue, Dec 22, 2009 at 4:01 PM, Richard Grossman richie...@gmail.com wrote: So I can't use it. But I've made my own connection pool. This doesn't fix anything because the problem is lower than Java. In fact the socket is closed and Java considers it closed, but the system keeps the socket in the TIME_WAIT state, so the port is actually still in use. So my question is: has anyone managed to open multiple connections and get rid of the TIME_WAIT, no matter which language, PHP or Python etc...? Thanks On Tue, Dec 22, 2009 at 2:55 PM, Ran Tavory ran...@gmail.com wrote: I don't have a 0.5.0-beta2 version, no. It's not too difficult to add it, but I haven't done so myself, I'm using 0.4.2 On Tue, Dec 22, 2009 at 2:42 PM, Richard Grossman richie...@gmail.com wrote: Yes of course but have you updated to cassandra 0.5.0-beta2? On Tue, Dec 22, 2009 at 2:30 PM, Ran Tavory ran...@gmail.com wrote: Would connection pooling work for you? This Java client http://code.google.com/p/cassandra-java-client/ has connection pooling. I haven't put the client under stress yet so I can't testify, but this may be a good solution for you. On Tue, Dec 22, 2009 at 2:22 PM, Richard Grossman richie...@gmail.com wrote: I agree it solves my problem but can cause a bigger one. The problem is I can't prevent opening a lot of connections. On Tue, Dec 22, 2009 at 1:51 PM, Jaakko rosvopaalli...@gmail.com wrote: Hi, I don't know the particulars of the java implementation, but if it works the same way as the Unix native socket API, then I would not recommend setting linger to zero.
The SO_LINGER option with a zero value will cause the TCP connection to be aborted immediately as soon as the socket is closed. That is, (1) remaining data in the send buffer will be discarded, (2) there is no proper disconnect handshake and (3) the receiving end will get a TCP reset. Sure, this will avoid the TIME_WAIT state, but TIME_WAIT is our friend and is there to avoid packets from an old connection being delivered to a new incarnation of the connection. Instead of avoiding the state, the application should be changed so that TIME_WAIT will not be a problem. How many open files do you see when the exception happens? Might be that you're out of file descriptors. -Jaakko On Tue, Dec 22, 2009 at 8:17 PM, Richard Grossman richie...@gmail.com wrote: Hi To all who are interested: I've found a solution that seems not recommended but works. When opening a socket set this: tSocket.getSocket().setReuseAddress(true); tSocket.getSocket().setSoLinger(true, 0); It prevents having a lot of connections in TIME_WAIT state, but it's not recommended.
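The blocking-pool idea described above (bounded size, reuse instead of close, block when exhausted) can be sketched generically; this is an illustration of the pattern, not the cassandra-java-client's actual pool:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Generic bounded pool sketch: connections are created lazily up to
// maxSize, returned to the queue instead of closed, and acquire()
// blocks when all are checked out -- which also throttles the caller.
// Acquisition is serialized for simplicity; a production pool would not do this.
public class SimplePool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;  // creates a new "connection"
    private final int maxSize;
    private int created = 0;

    public SimplePool(int maxSize, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(maxSize);
        this.factory = factory;
        this.maxSize = maxSize;
    }

    public synchronized T acquire() throws InterruptedException {
        T conn = idle.poll();
        if (conn != null) return conn;   // reuse an idle connection
        if (created < maxSize) {         // grow lazily up to the cap
            created++;
            return factory.get();
        }
        // Pool exhausted: block until another thread calls release().
        return idle.take();
    }

    public void release(T conn) {
        idle.offer(conn);                // return for reuse, don't close
    }
}
```

Because connections are returned rather than closed, sockets stay established instead of piling up in TIME_WAIT, which addresses the root cause discussed in this thread.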
Images store in Cassandra
As we're designing our systems for a move from MySQL to Cassandra, we're considering moving our file storage to Cassandra as well. Is this wise? We're currently using mogilefs to store media items (images) with an average size of 30Mb (400k images, and growing). Cassandra looks like a performance improvement over mogilefs (saves a roundtrip, no SQL in the middle), but I was wondering whether the fact that Cassandra stores byte arrays should encourage us to store images in it. Is Cassandra a good fit? Has anyone had similar experience or can send guidelines? To phrase the question in more general terms: what's Cassandra's sweet spot in terms of value size per column or total row size? Thanks
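If single values that large turn out to be awkward, one common workaround is to chunk each image across several columns of one row and reassemble on read. A sketch of the chunking arithmetic; the 1 MB chunk size is an arbitrary choice for illustration, not a Cassandra limit:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: split a large blob into fixed-size chunks, store one chunk per
// column (e.g. chunk-0, chunk-1, ...), and concatenate them on read.
// CHUNK_SIZE is an assumption for illustration, not a Cassandra constant.
public class BlobChunker {
    static final int CHUNK_SIZE = 1024 * 1024; // 1 MB per column (assumption)

    static List<byte[]> split(byte[] blob) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < blob.length; off += CHUNK_SIZE) {
            int end = Math.min(off + CHUNK_SIZE, blob.length);
            chunks.add(Arrays.copyOfRange(blob, off, end));
        }
        return chunks;
    }

    static byte[] join(List<byte[]> chunks) {
        int total = 0;
        for (byte[] c : chunks) total += c.length;
        byte[] blob = new byte[total];
        int off = 0;
        for (byte[] c : chunks) {
            System.arraycopy(c, 0, blob, off, c.length);
            off += c.length;
        }
        return blob;
    }
}
```

Chunking keeps individual column values small and lets reads fetch only the slice of columns they need, instead of materializing a 30Mb value in one request.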
Re: vector clocks?
So, currently, is Cassandra using the client-provided timestamps for conflict resolution? (Or something else?) Do clients have insight into conflicts Cassandra cannot resolve (assuming it tries to resolve them)? What's the semantic of eventual consistency in Cassandra's case then? On Mon, Dec 7, 2009 at 12:34 AM, Kelvin Kakugawa kakug...@gmail.com wrote: Cassandra, right now, doesn't use vector clocks internally. However, an implementation is being worked on here: https://issues.apache.org/jira/browse/CASSANDRA-580 -Kelvin On Sun, Dec 6, 2009 at 10:41 AM, Ran Tavory ran...@gmail.com wrote: As a Cassandra newbie, after having read the Dynamo paper, I was wondering - how does Cassandra use timestamps? - Does it use them internally to resolve conflicts? - Does it expose vector clocks to clients when internal conflict resolution fails? Thanks
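For context on the timestamp question: without vector clocks, conflicting versions of a column are reconciled by last-write-wins on the client-supplied timestamp. The toy model below illustrates that rule; the deterministic tie-break by value comparison is an assumption for illustration, not the exact server logic:

```java
// Toy model of last-write-wins reconciliation on client-supplied
// timestamps, as discussed above. The tie-break on equal timestamps is
// an illustrative assumption, not Cassandra's exact implementation.
public class LwwColumn {
    final String value;
    final long timestamp;

    LwwColumn(String value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }

    /** Return the winner between two versions of the same column. */
    static LwwColumn reconcile(LwwColumn a, LwwColumn b) {
        if (a.timestamp != b.timestamp) {
            return a.timestamp > b.timestamp ? a : b;  // newest timestamp wins
        }
        // Equal timestamps: pick deterministically so all replicas agree.
        return a.value.compareTo(b.value) >= 0 ? a : b;
    }
}
```

The practical consequence, and why the question matters, is that clients with skewed clocks can silently lose writes: there is no conflict surfaced to the client, the "losing" version simply disappears.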
vector clocks?
As a Cassandra newbie, after having read the Dynamo paper, I was wondering - how does Cassandra use timestamps? - Does it use them internally to resolve conflicts? - Does it expose vector clocks to clients when internal conflict resolution fails? Thanks