Re: Overwhelming a cluster with writes?
I'm running the nodes with a JVM heap size of 6GB, and here are the related options from my storage-conf.xml. As mentioned in the first email, I left everything at the default value. I briefly googled around for Cassandra performance tuning etc. but haven't found a definitive guide ... any help with tuning these parameters is greatly appreciated!

<DiskAccessMode>auto</DiskAccessMode>
<RowWarningThresholdInMB>512</RowWarningThresholdInMB>
<SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
<FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
<FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
<ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
<MemtableThroughputInMB>64</MemtableThroughputInMB>
<BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
<MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>
<MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
<ConcurrentReads>8</ConcurrentReads>
<ConcurrentWrites>64</ConcurrentWrites>
<CommitLogSync>periodic</CommitLogSync>
<CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
<GCGraceSeconds>864000</GCGraceSeconds>

-- Ilya

On Mon, Apr 5, 2010 at 11:26 PM, Boris Shulman shulm...@gmail.com wrote: You are running out of memory on your nodes. Before the final crash your nodes are probably slow due to GC. What is your memtable size? What cache options did you configure?

On Tue, Apr 6, 2010 at 7:31 AM, Ilya Maykov ivmay...@gmail.com wrote: Hi all, I've just started experimenting with Cassandra to get a feel for the system. I've set up a test cluster and, to get a ballpark idea of its performance, I wrote a simple tool to load some toy data into the system. Surprisingly, I am able to overwhelm my 4-node cluster with writes from a single client. I'm trying to figure out if this is a problem with my setup, if I'm hitting bugs in the Cassandra codebase, or if this is intended behavior. Sorry this email is kind of long, here is the TLDR version: While writing to Cassandra from a single node, I am able to get the cluster into a bad state, where nodes are randomly disconnecting from each other, write performance plummets, and sometimes nodes even crash. Further, the nodes do not recover as long as the writes continue (even at a much lower rate), and sometimes do not recover at all unless I restart them. I can get this to happen simply by throwing data at the cluster fast enough, and I'm wondering if this is a known issue or if I need to tweak my setup.

Now, the details. First, a little bit about the setup: 4-node cluster of identical machines, running cassandra-0.6.0-rc1 with the fixes for CASSANDRA-933, CASSANDRA-934, and CASSANDRA-936 patched in. Node specs:
8-core Intel Xeon e5...@2.00ghz
8GB RAM
1Gbit ethernet
Red Hat Linux 2.6.18
JVM 1.6.0_19 64-bit
1TB spinning disk houses both commitlog and data directories (which I know is not ideal)

The client machine is on the same local network and has very similar specs. The cassandra nodes are started with the following JVM options: ./cassandra JVM_OPTS="-Xms6144m -Xmx6144m -XX:+UseConcMarkSweepGC -d64 -XX:NewSize=1024m -XX:MaxNewSize=1024m -XX:+DisableExplicitGC" I'm using default settings for all of the tunable stuff at the bottom of storage-conf.xml. I also selected my initial tokens to evenly partition the key space when the cluster was bootstrapped. I am using the RandomPartitioner. Now, about the test. Basically I am trying to get an idea of just how fast I can make this thing go.
I am writing ~250M data records into the cluster, replicated at 3x, using Ran Tavory's Hector client (Java), writing with ConsistencyLevel.ZERO and FailoverPolicy.FAIL_FAST. The client is using 32 threads, with 8 threads talking to each of the 4 nodes in the cluster. Records are identified by a numeric id, and I'm writing them in batches of up to 10k records per row, with each record in its own column. The row key identifies the bucket into which records fall. So, records with ids 0 - 9999 are written to row 0, 10000 - 19999 are written to row 10000, etc. Each record is a JSON object with ~10-20 fields. Records:

{ // Column Family
  0 : { // row key for the start of the bucket. Buckets span a range of up to 10000 records
    1 : { /* some JSON */ },       // Column for record with id=1
    3 : { /* some more JSON */ },  // Column for record with id=3
    ... : { /* ... */ }
  },
  10000 : { // row key for the start of the next bucket
    10001 : ...,
    10004 : ...
  }
}

I am reading the data out of a local, sorted file on the client, so I only write a row to Cassandra once all records for that row have been read, and each row is written to exactly once. I'm using a producer-consumer queue to pump data from the input reader thread to the output writer threads. I found that I have to throttle the reader thread heavily in order to get good behavior. So, if I make the reader sleep for 7 seconds every 1M records, everything is fine - the data loads in about an hour, half of which
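As an illustration of the throttled producer-consumer loader described above, here is a minimal plain-Java sketch. The queue capacity, the batch type, and the writeRow call are hypothetical stand-ins, not the actual test client; a real loader would hand each batch to Hector or another Thrift client inside writeRow.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Throttled producer-consumer loader: one reader thread fills a bounded queue
// with row batches, several writer threads drain it. The bounded queue gives
// backpressure for free; the sleep mimics the 7s-per-1M-records throttle above.
public class ThrottledLoader {
    private static final List<String> POISON = new ArrayList<>();  // identity-compared sentinel
    private final BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(1000);

    void reader(Iterable<List<String>> batches) throws InterruptedException {
        long records = 0;
        for (List<String> batch : batches) {
            queue.put(batch);            // blocks when writers fall behind
            records += batch.size();
            if (records >= 1_000_000) {  // coarse extra throttle
                Thread.sleep(7_000);
                records = 0;
            }
        }
        queue.put(POISON);               // one sentinel per writer thread in real code
    }

    void writer() throws InterruptedException {
        while (true) {
            List<String> batch = queue.take();
            if (batch == POISON) return;
            writeRow(batch);             // placeholder: hand the batch to the client here
        }
    }

    private void writeRow(List<String> batch) { /* client-specific, e.g. a Hector insert */ }
}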
Re: Overwhelming a cluster with writes?
Do you see one of the disks used by cassandra filled up when a node crashes?

On Tue, Apr 6, 2010 at 9:39 AM, Ilya Maykov ivmay...@gmail.com wrote: I'm running the nodes with a JVM heap size of 6GB, and here are the related options from my storage-conf.xml. ...
Re: Overwhelming a cluster with writes?
No, the disks on all nodes have about 750GB free space. Also as mentioned in my follow-up email, writing with ConsistencyLevel.ALL makes the slowdowns / crashes go away.

-- Ilya

On Mon, Apr 5, 2010 at 11:46 PM, Ran Tavory ran...@gmail.com wrote: Do you see one of the disks used by cassandra filled up when a node crashes? ...
Re: Overwhelming a cluster with writes?
You are blowing away the mostly saner JVM_OPTS running it that way. Edit cassandra.in.sh (or wherever the config is on your system) to increase mx to 4G (not 6G, for now), leave everything else untouched, and do not specify JVM_OPTS on the command line. See if you get the same behavior.

b

On Mon, Apr 5, 2010 at 11:48 PM, Ilya Maykov ivmay...@gmail.com wrote: No, the disks on all nodes have about 750GB free space. Also as mentioned in my follow-up email, writing with ConsistencyLevel.ALL makes the slowdowns / crashes go away. ...
Re: Memcached protocol?
On Mon, Apr 5, 2010 at 5:10 PM, Paul Prescod p...@ayogo.com wrote: On Mon, Apr 5, 2010 at 4:48 PM, Tatu Saloranta tsalora...@gmail.com wrote: ... I would think that there is also the possibility of losing some increments, or perhaps getting duplicate increments? I believe that with vector clocks in Cassandra 0.7 you won't lose anything. The conflict resolver will do the summation for you properly. If I'm wrong, I'd love to hear more, though. I think the key is that this is not automatic -- there is no general mechanism for aggregating distinct modifications. The point being that you could choose one among the right answers, but not decide what to do with concurrent modifications. So what is done instead is to have an application-specific resolution strategy which makes use of the semantics of the operations, to know how to combine such concurrent modifications into the correct answer. I don't know if this is trivial for the case of counter increments, especially since two concurrent increments give the same new value, yet the correct combined result would be one higher (both used the same base and added one). That is to say, my understanding was that vector clocks would be required but not sufficient for reconciliation of concurrent value updates. I may be off here; apologies if I have misunderstood some crucial piece. -+ Tatu +-
Re: Overwhelming a cluster with writes?
On 4/5/10 11:48 PM, Ilya Maykov wrote: No, the disks on all nodes have about 750GB free space. Also as mentioned in my follow-up email, writing with ConsistencyLevel.ALL makes the slowdowns / crashes go away. I am not sure if the above is consistent with the cause of #896, but the other symptoms ("I inserted a bunch of data really fast via Thrift and GC melted my machine!") sound like it... https://issues.apache.org/jira/browse/CASSANDRA-896 =Rob
Re: Flush Commit Log
Yes, no problem with my live Cassandra server. Thanks, Jonathan. On Mon, Apr 5, 2010 at 11:19 PM, Jonathan Ellis jbel...@gmail.com wrote: On Mon, Apr 5, 2010 at 9:11 PM, JKnight JKnight beukni...@gmail.com wrote: Thanks Jonathan, When I run nodeprobe flush with the -host parameter set to a Cassandra server running on my computer, my computer is hung up by Cassandra. (When I kill all the Java processes, the computer works well again.) Sounds like flush generates a lot of i/o. Not surprising. Yesterday, when I ran nodeprobe flush on my live server, I didn't flush all keyspaces, so the commit log files weren't deleted. Today, after flushing all keyspaces, the commit log files were deleted. So... no problem, right? -Jonathan -- Best regards, JKnight
How do vector clocks and conflicts work?
This may be the blind leading the blind... On Mon, Apr 5, 2010 at 11:54 PM, Tatu Saloranta tsalora...@gmail.com wrote: ... I think the key is that this is not automatic -- there is no general mechanism for aggregating distinct modifications. The point being that you could choose one among the right answers, but not decide what to do with concurrent modifications. So what is done instead is to have an application-specific resolution strategy which makes use of the semantics of the operations, to know how to combine such concurrent modifications into the correct answer. I agree with all of that. I don't know if this is trivial for the case of counter increments, especially since two concurrent increments give the same new value, yet the correct combined result would be one higher (both used the same base and added one). As long as the conflict resolver knows that two writers each tried to increment, then it can increment twice. The conflict resolver must know about the semantics of increment or decrement or string append or binary patch or whatever other merge strategy you choose. You'll register your strategy with Cassandra and it will apply it. Presumably it will also maintain enough context about what you were trying to accomplish to allow the merge strategy plugin to do it properly. That is to say, my understanding was that vector clocks would be required but not sufficient for reconciliation of concurrent value updates. I agree. They are necessary, but not sufficient. The other half is the merge strategy plugin thing, which is analogous to custom comparators in Cassandra today. In CASSANDRA-580, Pedro Gomes talks about the plugins like this: I suppose for the beginning of the discussion that some sort of interface will be implemented to allow pluggable logic to be added to the server; personalized scripts were an idea, I have heard. Kevin Kakugawa replies that they'll just use Java class libraries as a first pass. Paul Prescod
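To make the registered-merge-strategy idea above concrete, here is a minimal sketch in plain Java. It is entirely hypothetical: Cassandra does not expose such an interface, and all names here are made up.

// Hypothetical pluggable conflict resolver: given the concurrent versions of a
// value (siblings distinguished by vector clocks), produce the reconciled value.
public interface MergeStrategy<V> {
    V reconcile(java.util.List<V> concurrentVersions);
}

// Example strategy for counters: if each concurrent version carries the delta
// applied relative to the common ancestor, reconciliation is just summation,
// so two concurrent "+1" deltas correctly combine to +2.
class CounterMerge implements MergeStrategy<Long> {
    @Override
    public Long reconcile(java.util.List<Long> concurrentDeltas) {
        long sum = 0;
        for (long delta : concurrentDeltas) {
            sum += delta;
        }
        return sum;
    }
}

The design point, as Tatu notes, is that the vector clock only detects the conflict; the strategy supplies the operation semantics needed to resolve it.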
Re: Overwhelming a cluster with writes?
That does sound similar. It's possible that the difference I'm seeing between ConsistencyLevel.ZERO and ConsistencyLevel.ALL is simply due to the fact that using ALL slows down the writers enough that the GC can keep up. I could do a test with multiple clients writing at ALL in parallel tomorrow. If there are still no problems writing at ALL even with extra load from additional clients, that might point to problems in how async writes are handled vs. sync writes. I will also do some profiling of the server processes with both ZERO and ALL writer behaviors and report back. RE: JVM_OPTS, I will try running with the more sane options (but a larger heap) as well. -- Ilya On Mon, Apr 5, 2010 at 11:59 PM, Rob Coli rc...@digg.com wrote: ...
Re: Overwhelming a cluster with writes?
Right, I meant 4GB heap vs. the standard 1GB. And all other options in cassandra.in.sh at their defaults. Sorry, I am a bit new to JVM tuning, and very new to Cassandra :) -- Ilya On Tue, Apr 6, 2010 at 12:16 AM, Benjamin Black b...@b3k.us wrote: I am specifically suggesting you NOT use a heap that large with your 8GB machines. Please test with 4GB first. On Tue, Apr 6, 2010 at 12:13 AM, Ilya Maykov ivmay...@gmail.com wrote: ...
Re: Memcached protocol?
On Tue, Apr 6, 2010 at 2:10 AM, Paul Prescod p...@ayogo.com wrote: On Mon, Apr 5, 2010 at 4:48 PM, Tatu Saloranta tsalora...@gmail.com wrote: ... I would think that there is also the possibility of losing some increments, or perhaps getting duplicate increments? I believe that with vector clocks in Cassandra 0.7 you won't lose anything. The conflict resolver will do the summation for you properly. If I'm wrong, I'd love to hear more, though. I keep reading this in the list, but why would vector clocks allow consistent counters in a conflicting update? Say we have nodes A, B, C where A and B get concurrent updates. If we do read-and-set, this does not seem useful, as we'd end up with a vector A:x+1, B:x+1; but why would x+1 be the correct value compared to x+2? Or are we imagining spreading pairs <key,INCR>, <key,DECR>, in which we assume the writer client did not look at the existing value? -- blog en: http://www.riffraff.info blog it: http://riffraff.blogsome.com
Re: how to store list data in Apache Cassandra?
Another option is to use a SuperColumnFamily, but that extends the depth of all such values to be arrays. The name and age columns would therefore also need to be SuperColumns -- just with a single sub-column each. Like many things in Cassandra, the preferred storage method depends on your application's access patterns. It's quite unlike the normalization procedure for an RDBMS, which is possible without knowing future queries.

On 2010-04-06 09:12, Michael Pearson wrote: Column Families are keyed attribute/value pairs; your 'girls' column will need to be serialised on save and deserialised on load so that it can be treated as your intended array. Pickle will do this for you (http://docs.python.org/library/pickle.html), eg:

import pycassa
import pickle

client = pycassa.connect()
cf = pycassa.ColumnFamily(client, 'mygame', 'user')
key = '1234567890'
value = {
    'name': 'Lee Li',
    'age': '21',
    'girls': pickle.dumps(['java', 'actionscript', 'python'])
}
cf.insert(key, value)

hope that helps -michael

On Tue, Apr 6, 2010 at 6:49 PM, Shuge Lee shuge@gmail.com wrote: Dear friends: how to store list data in Apache Cassandra? For example:

user['lee'] = {
    'name': 'lee',
    'age': '21',
    'girls': ['java', 'actionscript', 'python'],
}

Notice the key `girls`. I'm using pycassa (a Python lib for cassandra):

import pycassa

client = pycassa.connect()
cf = pycassa.ColumnFamily(client, 'mygame', 'user')
key = '1234567890'
value = {
    'name': 'Lee Li',
    'age': '21',
    'girls': ['java', 'actionscript', 'python'],
}
cf.insert(key, value)

Oops, I get an error while saving a `value` like the above. So, how to store list data in Apache Cassandra? Thanks for reply. -- Shuge Lee | Lee Li

-- David Strauss | da...@fourkitchens.com Four Kitchens | http://fourkitchens.com | +1 512 454 6659 [office] | +1 512 870 8453 [direct]
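As a schematic alternative to serializing the whole list into one column, each element can be given its own column, which keeps elements individually readable and deletable. The sketch below is plain Java with maps standing in for a row; the "field:index" column-naming convention is a made-up illustration, not a pycassa or Cassandra API.

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Models a list value as one column per element ("girls:0", "girls:1", ...)
// instead of a single pickled blob, so elements stay individually addressable.
public class ListAsColumns {
    static Map<String, String> toColumns(String field, List<String> values) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            columns.put(field + ":" + i, values.get(i));
        }
        return columns;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("name", "Lee Li");
        row.put("age", "21");
        row.putAll(toColumns("girls", List.of("java", "actionscript", "python")));
        System.out.println(row);
        // {name=Lee Li, age=21, girls:0=java, girls:1=actionscript, girls:2=python}
    }
}

The trade-off mirrors David's point about access patterns: per-element columns support slicing and partial updates, while a pickled blob is opaque but simple.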
i have one mistake in Cassandra.java when i build it
hi: I want to run some experiments on Cassandra using Java, but when I write a client, an error "cannot convert int to ConsistencyLevel" appears. How can I solve this? Thanks very much.
Re: i have one mistake in Cassandra.java when i build it
This means you rebuilt the Thrift code with an old compiler. If you look in lib/, the thrift jar is tagged with the svn revision we built with. Thrift has frequent regressions, so using that same revision is the best way to avoid unpleasant surprises. On Tue, Apr 6, 2010 at 4:34 AM, 叶江 yejiang...@gmail.com wrote: hi: I want to run some experiments on Cassandra using Java, but when I write a client, an error "cannot convert int to ConsistencyLevel" appears. How can I solve this? Thanks very much.
Re: Overwhelming a cluster with writes?
On Tue, Apr 6, 2010 at 2:13 AM, Ilya Maykov ivmay...@gmail.com wrote: That does sound similar. It's possible that the difference I'm seeing between ConsistencyLevel.ZERO and ConsistencyLevel.ALL is simply due to the fact that using ALL slows down the writers enough that the GC can keep up. No, it's mostly due to ZERO meaning buffer this locally and write it when it's convenient, and buffering takes memory. If you check your tpstats you will see the pending ops through the roof on the node handling the thrift connections.
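A schematic illustration of the buffering Jonathan describes; this is hypothetical plain Java, not Cassandra's actual code. The point is that with ZERO the write is acknowledged as soon as it is queued, so nothing pushes back on a fast client and the backlog grows without bound:

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Queue;

// Fire-and-forget writes: callers get "success" immediately while background
// threads drain the queue. If callers outpace the disk-bound applier, the
// unbounded queue eats the heap; the pending ops visible in tpstats are the symptom.
public class FireAndForget {
    private final Queue<byte[]> pending = new ConcurrentLinkedQueue<>();  // unbounded!

    void writeAtZero(byte[] mutation) {
        pending.add(mutation);   // acknowledged before any durable work happens
    }

    void backgroundApplier() {
        byte[] m;
        while ((m = pending.poll()) != null) {
            apply(m);            // commit log + memtable work; often the slow side
        }
    }

    private void apply(byte[] mutation) { /* disk-bound */ }
}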
Re: Memcached protocol?
On Mon, Apr 5, 2010 at 6:48 PM, Tatu Saloranta tsalora...@gmail.com wrote: I would think that there is also the possibility of losing some increments, or perhaps getting duplicate increments? It is not just isolation but also correctness that is hard to maintain. This can be more easily worked around in cases where there is additional data that can be used to resolve potentially ambiguous changes (like inferring which shopping cart additions are real and which are duplicates). With more work I am sure it is possible to get things mostly working; it's just a question of cost/benefit for specific use cases. Let me inject a couple of useful references: http://pl.atyp.us/wordpress/?p=2601 http://blog.basho.com/2010/04/05/why-vector-clocks-are-hard/
if cassandra isn't ideal for keeping track of counts, how does digg count diggs?
From what I read in another thread, Cassandra isn't 'ideal' for keeping track of counts. For example, I would understand this to mean keeping track of which stories were dugg. If this is true, how would a site like digg keep track of the 'dugg' counter? Also, I am assuming that with eventual consistency the number *may* not be 100% accurate. If you wanted it to be accurate, would you just use the QUORUM flag? (I believe QUORUM is to ensure all writes are written to disk)
odd problem retrieving binary values using C++
Hi all... I am having a pretty tough time retrieving binary values out of my DB... I am using cassandra 0.5.1 on Centos 5.4 with java 1.6.0-19. Here is the simple test I am trying to run in C++:

/* snip initialization */
{
    transport->open();

    ColumnPath new_col;
    new_col.__isset.column = true; /* this is required! */
    new_col.column_family.assign("Standard2");
    new_col.super_column.assign("");
    new_col.column.assign("testing");

    char *data_cstr = "this\0 is\0 data!";
    std::string data;
    data.assign(data_cstr, 15);
    printf("Data '%s' has length %lu\n", data.c_str(), data.length());
    // This properly returns 15

    client.insert("Keyspace1", "newone", new_col, data, 55, ONE);

    ColumnOrSuperColumn ret_val;
    client.get(ret_val, "Keyspace1", "newone", new_col, ONE);
    printf("Column name retrieved is: %s\n", ret_val.column.name.c_str());
    printf("Value in column retrieved is: %s\n", ret_val.column.value.c_str());
    // This only ever returns 'this' (i.e., everything before the first \0)
    // I understand null termination in %s... see below
    printf("Value has length %lu\n", ret_val.column.value.length());
    // and this gives me 4

    transport->close();
}
/* snip the rest too! */

Am I missing something major in proceeding this way? I have tried GDB and eventually all I get back is a string containing 'this'. Here is the dumped content of Keyspace1/Standard2-1-Data.db:

od -c /u01/cassandra/data/Keyspace1/Standard2-1-Data.db
000 \0 - 1 1 5 5 7 1 6 5 7 6 3 3 4 2 020 7 0 7 9 0 1 4 5 2 8 3 5 8 0 2 3 040 7 5 1 9 9 5 2 8 : n e w o n e \0 060 \0 \0 264 \0 \0 \0 U \0 \0 \0 003 254 355 \0 005 s 100 r \0 020 j a v a . u t i l . B i t 120 S e t n 375 210 ~ 9 4 253 ! 003 \0 001 [ \0 140 004 b i t s t \0 002 [ J x p u r \0 002 160 [ J x 004 265 022 261 u 223 002 \0 \0 x p \0 200 \0 \0 001 \0 202 \b \0 \0 \0 \0 \0 x \0 \0 \0 220 \0 \a t e s t i n g \0 \a t e s t i 240 n g \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 260 \0 % 200 \0 \0 \0 200 \0 \0 \0 \0 \0 \0 \0 \0 \0 300 \0 001 \0 \a t e s t i n g \0 \0 \0 \0 \0 320 \0 \0 \0 7 \0 \0 \0 017 t h i s \0 i s 340 \0 d a t a ! 347

This shows that the data is stored properly to the db file.

# bin/cassandra-cli -host localhost
Connected to localhost/9160
Welcome to cassandra CLI.
cassandra> get Keyspace1.Standard2['newone']
=> (column=testing, value=this is data!, timestamp=55)
Returned 1 results.

Shows the same thing! It's there !!! I would lean towards a Thrift interface problem... In any case... I'd be thankful if someone had a pointer/workaround to this show-stopper of mine... Best Chris.
Re: if cassandra isn't ideal for keeping track of counts, how does digg count diggs?
Chris, when you say patch, does that mean for Cassandra or your own internal codebase? Sounds interesting, thanks! On Tue, Apr 6, 2010 at 12:54 PM, Chris Goffinet goffi...@digg.com wrote: That's not true. We have been using the Zookeeper work we posted on jira. That's what we are using internally and have been for months. We are now just wrapping up our vector clocks + distributed counter patch so we can begin transitioning away from the Zookeeper approach because there are problems with it long-term. -Chris On Apr 6, 2010, at 9:50 AM, Ryan King wrote: They don't use cassandra for it yet. -ryan On Tue, Apr 6, 2010 at 9:00 AM, S Ahmed sahmed1...@gmail.com wrote: ...
Re: if cassandra isn't ideal for keeping track of counts, how does digg count diggs?
http://issues.apache.org/jira/browse/CASSANDRA-704 http://issues.apache.org/jira/browse/CASSANDRA-721 We have our own internal codebase of Cassandra at Digg, but we are using those patches above until we have the vector clock work cleaned up; that patch will also go to jira. Most likely the vector clock work will go into 0.7, but since we run 0.6 and built it for that version, we will share that patch too. -Chris On Apr 6, 2010, at 10:17 AM, S Ahmed wrote: Chris, when you say patch, does that mean for Cassandra or your own internal codebase? Sounds interesting, thanks! On Tue, Apr 6, 2010 at 12:54 PM, Chris Goffinet goffi...@digg.com wrote: ...
Re: how to store list data in Apache Cassandra?
On Tue, Apr 6, 2010 at 8:06 AM, Shuge Lee shuge@gmail.com wrote: 'girls': pickle.dumps(['java', 'actionscript', 'python']) I think this is a really bad idea; I can't do any search if using Pickle. Just to be sure: are you thinking of traditional queries, lookups by values (find entries that have a certain element in a list value)? If so, you may be in trouble anyway: you can only do efficient queries by primary entry key, not by values (Cassandra at least can do range queries on keys, but still). -+ Tatu +-
Re: A question of 'referential integrity'...
On Tue, Apr 6, 2010 at 10:12 AM, Steve sjh_cassan...@shic.co.uk wrote: On 06/04/2010 15:26, Eric Evans wrote: ... I've read all about QUORUM, and it is generally useful, but as far as I can tell, it can't give me a transaction... Correct. Only individual operations are atomic, and ordering of insertions is not guaranteed. I think there were some logged Jira issues to allow grouping of operations into what seems to amount to transactions, which could help a lot here... but I can't find it now (or maybe it has only been discussed so far?). If I understand this correctly, it would just mean that you could send a sequence of operations, to be completed as a unit (first into journal, then into memtable etc). -+ Tatu +-
Re: Overwhelming a cluster with writes?
On Tue, Apr 6, 2010 at 8:17 AM, Jonathan Ellis jbel...@gmail.com wrote: No, it's mostly due to ZERO meaning buffer this locally and write it when it's convenient, and buffering takes memory. If you check your tpstats you will see the pending ops through the roof on the node handling the thrift connections. This sounds like a great FAQ entry? (apologies if it's already included) So that ideally users would only use this setting if they (think they) know what they are doing. :-) -+ Tatu +-
Re: How do vector clocks and conflicts work?
On Tue, Apr 6, 2010 at 8:45 AM, Mike Malone m...@simplegeo.com wrote: As long as the conflict resolver knows that two writers each tried to increment, then it can increment twice. The conflict resolver must know about the semantics of increment or decrement or string append or binary patch or whatever other merge strategy you choose. You'll register your strategy with Cassandra and it will apply it. Presumably it will also maintain enough context about what you were trying to accomplish to allow the merge strategy plugin to do it properly. That is to say, my understanding was that vector clocks would be required but not sufficient for reconciliation of concurrent value updates. The way I envisioned eventually consistent counters working would require something slightly more sophisticated... but not too bad. As incr/decr operations happen on distributed nodes, each node would keep a (vector clock, delta) tuple for that node's local changes. When a client fetched the value of the counter the vector clock deltas and the reconciled count would be combined into a single result. Similarly, when a replication / hinted-handoff / read-repair reconciliation occurred the counts would be merged into a single (vector clock, count) tuple. Maybe there's a more elegant solution, but that's how I had been thinking about this particular problem. I doubt there is any simple and elegant solution -- if there was, it would have been invented in the 50s. :-) Given this, yes, something along these lines sounds realistic. It also sounds like the implementation would greatly benefit from (if not require) foundational support in core, as opposed to being done outside of Cassandra (which I understand you are suggesting). I wasn't sure if the idea was to try to do this completely separately (aside from vector clock support). -+ Tatu +-
Net::Cassandra::Easy deletion failed
Seems to be internal to java/cassandra itself. I have some tests and I want to make sure that I have a clean slate each time I run the test. Clean, as far as my code cares, means that value is not defined. I'm running bin/cassandra -f with the default install/options. So at the beginning of my test I run:

$rc = $c->mutate([$key], family => 'Standard1', deletions => { byname => ['value']});

Alas, the cassandra terminal/cassandra itself barfs out:

ERROR 10:59:15,779 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.lang.UnsupportedOperationException: This operation is not supported for Super Columns.
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.UnsupportedOperationException: This operation is not supported for Super Columns.
        at org.apache.cassandra.db.SuperColumn.timestamp(SuperColumn.java:137)
        at org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:65)
        at org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:29)
        at org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(ColumnFamilySerializer.java:87)
        at org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:73)
        at org.apache.cassandra.db.RowMutationSerializer.freezeTheMaps(RowMutation.java:334)
        at org.apache.cassandra.db.RowMutationSerializer.serialize(RowMutation.java:346)
        at org.apache.cassandra.db.RowMutationSerializer.serialize(RowMutation.java:319)
        at org.apache.cassandra.db.RowMutation.getSerializedBuffer(RowMutation.java:275)
        at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:200)
        at org.apache.cassandra.service.StorageProxy$3.runMayThrow(StorageProxy.java:310)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 3 more
ERROR 10:59:15,786 Fatal exception in thread Thread[ROW-MUTATION-STAGE:21,5,main]
java.lang.RuntimeException: java.lang.UnsupportedOperationException: This operation is not supported for Super Columns.
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.UnsupportedOperationException: This operation is not supported for Super Columns.
        at org.apache.cassandra.db.SuperColumn.timestamp(SuperColumn.java:137)
        at org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:65)
        at org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:29)
        at org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(ColumnFamilySerializer.java:87)
        at org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:73)
        at org.apache.cassandra.db.RowMutationSerializer.freezeTheMaps(RowMutation.java:334)
        at org.apache.cassandra.db.RowMutationSerializer.serialize(RowMutation.java:346)
        at org.apache.cassandra.db.RowMutationSerializer.serialize(RowMutation.java:319)
        at org.apache.cassandra.db.RowMutation.getSerializedBuffer(RowMutation.java:275)
        at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:200)
        at org.apache.cassandra.service.StorageProxy$3.runMayThrow(StorageProxy.java:310)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 3 more

Anyone have any ideas what I'm doing wrong? The value field is just a JSON-encoded digit, something like (30), not a real supercolumn, but the Net::Cassandra::Easy docs didn't have any examples of removing non-supercolumn data. Really what I'd like to do is delete the whole row, but again I didn't find any examples of how to do this.
Re: A question of 'referential integrity'...
On 06/04/2010 18:50, Benjamin Black wrote: I'm finding this exchange very confusing. What exactly about Cassandra 'looks absolutely ideal' to you for your project? The write performance, the symmetric, peer to peer architecture, etc? Reasons I like Cassandra for this project:

* Columnar rather than tabular data structures with an extensible 'schemata' - permitting evolution of back-end data structures to support new features without down-time.
* Decentralised architecture with fault tolerance/redundancy permitting high availability on shoestring-budget hardware in an easily scalable pool - in spite of needing to track rapidly changing data that precludes meaningful backup.
* Easy to establish that data will be efficiently sharded - allowing many concurrent reads and writes - i.e. systemic IO bandwidth is scalable, both for reading and writing.
* Lightweight, free and open-source physical data model that minimises risk of vendor lock-in or insurmountable problems with glitches in commercial closed-source libraries.

A shorter answer might be that, in all ways other than depending upon 'referential integrity' between two 'maps' of hash-values, the data for the rest of my application looks remarkably like that of large sites that we know already use Cassandra. I'm trying to establish the most effective Cassandra approach to achieve the logical 'referential integrity' while minimising resource (memory/disk/CPU) use in order to minimise hardware costs for any given deployment scale - all the while retaining the above advantages.
Re: A question of 'referential integrity'...
On 06/04/2010 18:53, Tatu Saloranta wrote: I've read all about QUORUM, and it is generally useful, but as far as I can tell, it can't give me a transaction... Correct. Only individual operations are atomic, and ordering of insertions is not guaranteed. As I thought. I think there were some logged Jira issues to allow grouping of operations into what seems to amount to transactions, which could help a lot here... but I can't find it now (or maybe it has only been discussed so far?). If I understand this correctly, it would just mean that you could send a sequence of operations, to be completed as a unit (first into the journal, then into the memtable etc). I think we're on the same page. I need an atomic 'transaction' affecting multiple keys - so I write a tuple of all the updates (inserts/deletes) as a single value into a 'merge-pending' keyset... and (somehow - perhaps with the memtable) I modify data read from other keysets to be as if this 'merge-pending' data had already been applied to the independent keysets to which it relates. A process/thread on each node would continuously attempt to apply the multiple updates from the merge-pending data before deleting it and dropping the associated merge-data from the in-memory transformations. Latency should be very low (like with a log-based file-system) and throughput should be reasonably high, because there should be a lot of flexibility in batch-processing the 'merge-pending' data. This way, if there's a failure during merging, there's a sufficient durable record to complete the merge before serving any more remote requests. To the remote client, it appears indistinguishable from an atomic transaction that affected more than one key.
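A rough sketch of the scheme described above, in plain Java. Everything here (the Store interface, keyspace names, serialization) is a hypothetical placeholder for application code, not a Cassandra API; it just shows the write-intent-first ordering that makes recovery possible:

import java.util.List;
import java.util.UUID;

// Application-level 'merge-pending' transaction: persist the whole batch as one
// durable record first, then apply the individual updates, then delete the record.
// A recovering node re-applies any batch records it finds; replays are harmless
// if updates are idempotent (same key/column/timestamp rewritten).
public class MergePending {
    interface Store {
        void put(String keyspace, String key, byte[] value);
        void delete(String keyspace, String key);
    }

    record Update(String key, String column, byte[] value) {}

    private final Store store;
    MergePending(Store store) { this.store = store; }

    void commit(List<Update> updates) {
        String batchId = UUID.randomUUID().toString();
        store.put("MergePendingKS", batchId, serialize(updates)); // durable intent record
        apply(updates);                                           // may partially fail...
        store.delete("MergePendingKS", batchId);                  // ...recovery replays the record
    }

    private void apply(List<Update> updates) {
        for (Update u : updates) {
            store.put("DataKS", u.key() + ":" + u.column(), u.value());
        }
    }

    private byte[] serialize(List<Update> updates) { return new byte[0]; /* app-specific */ }
}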
Re: How do vector clocks and conflicts work?
On Tue, Apr 6, 2010 at 9:11 AM, Paul Prescod pres...@gmail.com wrote: ... As long as the conflict resolver knows that two writers each tried to increment, then it can increment twice. The conflict resolver must know about the semantics of increment or decrement or string append or binary patch or whatever other merge strategy you choose. You'll register your strategy with Cassandra and it will apply it. Presumably it will also maintain enough context about what you were trying to accomplish to allow the merge strategy plugin to do it properly. As long as operations are commutative, isn't the conflict resolution simply "apply all"? A large number of useful operations can be implemented this way (numeric incr/decr, set ops, etc.)
problem with Net::Cassandra::Easy deleting columns
Hello, I tried to post this earlier but something seems to have gone wrong with sending the message. I have a test perl script that I'm using to test the behaviour of some of my existing code. It is important that the values start in a clean state at the beginning of the tests, as I'm incrementing values, checking scores etc. during the test and need to test that the values I expect are actually what gets stored. To try to clear this I'm using the following Net::Cassandra::Easy call (it's a perl thrift wrapper):

$rc = $c->mutate([$key], family => 'Standard1', deletions => { byname => ['value']});

The perl module is pretty poorly documented, as are all the other ones I've looked at (if someone has a better one to use I'd be interested). In particular, the examples show that something like this is to be used to delete a supercolumn from a key, but there are no examples for regular columns or for how to delete a whole key row from the datastore. Really, all my code cares about is that the value comes back undefined until the test has actually added a value for the value column. When I run the code, java/cassandra barfs and the test dies (cassandra seems to keep running happily other than the exception dumps). Here is what cassandra dumps out to the terminal (running bin/cassandra -f for the test):

da...@vader$ bin/cassandra -f
ERROR 13:01:43,301 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.lang.UnsupportedOperationException: This operation is not supported for Super Columns.
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.UnsupportedOperationException: This operation is not supported for Super Columns.
        at org.apache.cassandra.db.SuperColumn.timestamp(SuperColumn.java:137)
        at org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:65)
        at org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:29)
        at org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(ColumnFamilySerializer.java:87)
        at org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:73)
        at org.apache.cassandra.db.RowMutationSerializer.freezeTheMaps(RowMutation.java:334)
        at org.apache.cassandra.db.RowMutationSerializer.serialize(RowMutation.java:346)
        at org.apache.cassandra.db.RowMutationSerializer.serialize(RowMutation.java:319)
        at org.apache.cassandra.db.RowMutation.getSerializedBuffer(RowMutation.java:275)
        at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:200)
        at org.apache.cassandra.service.StorageProxy$3.runMayThrow(StorageProxy.java:310)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 3 more
ERROR 13:01:43,308 Fatal exception in thread Thread[ROW-MUTATION-STAGE:3,5,main]
java.lang.RuntimeException: java.lang.UnsupportedOperationException: This operation is not supported for Super Columns.
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.UnsupportedOperationException: This operation is not supported for Super Columns.
        at org.apache.cassandra.db.SuperColumn.timestamp(SuperColumn.java:137)
        at org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:65)
        at org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:29)
        at org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(ColumnFamilySerializer.java:87)
        at org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:73)
        at org.apache.cassandra.db.RowMutationSerializer.freezeTheMaps(RowMutation.java:334)
        at org.apache.cassandra.db.RowMutationSerializer.serialize(RowMutation.java:346)
        at org.apache.cassandra.db.RowMutationSerializer.serialize(RowMutation.java:319)
        at org.apache.cassandra.db.RowMutation.getSerializedBuffer(RowMutation.java:275)
        at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:200)
        at org.apache.cassandra.service.StorageProxy$3.runMayThrow(StorageProxy.java:310)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 3 more

Anyone have an idea of what would cause this? Importantly: I don't necessarily know that the value exists in Cassandra before I want to delete it; I want more of a "delete if exists" kind of behaviour. The end result should be that if the key exists the column gets removed, or the row continues to not exist. Ideally I would delete the whole row (there is only one column at the moment, but in the future there will be more than one, so manually deleting the value for each of the columns is a bit of a pain).
Re: Net::Cassandra::Easy deletion failed
On Tue, 06 Apr 2010 11:07:03 -0700 Mike Gallamore mike.e.gallam...@googlemail.com wrote:

MG Seems to be internal to java/cassandra itself.
MG I have some tests and I want to make sure that I have a clean slate
MG each time I run the test. Clean as far as my code cares is that
MG value is not defined. I'm running bin/cassandra -f with the
MG default install/options. So at the beginning of my test I run:

Mike, you can submit bugs and questions directly to me, here, or through http://rt.cpan.org (the CPAN bug tracker). It's a good idea to test an operation from the CLI that comes with Cassandra to make sure the problem is not with the Net::Cassandra::Easy module. Also, if you set $Net::Cassandra::Easy::DEBUG to 1, you'll see the actual Thrift objects that get constructed. In this case (N::C::Easy 0.08) I was constructing a super_column parameter which was wrong.

MG $rc = $c->mutate([$key], family => 'Standard1', deletions => { byname => ['value']}); ...
MG Anyone have any ideas what I'm doing wrong? The value field is just a
MG json encoded digit so something like (30), not a real supercolumn, but
MG the Net::Cassandra::Easy docs didn't have any examples of removing
MG non-supercolumn data. Really what I'd like to do is delete the whole
MG row, but again I didn't find any examples of how to do this.

It's a bug in N::C::Easy. I fixed it in 0.09 so it will work properly with:

$rc = $c->mutate([$key], family => 'Standard1', deletions => { standard => 1, byname => ['column1', 'column2']});

AFAIK I can't specify delete all columns in a non-super CF using Deletions, so byname is required (I end up filling the column_names field in the predicate). OTOH I can just delete a SuperColumn, so the above is possible in a super CF. The docs and tests were updated as well. Let me know if you have problems; it worked for me. In the next release I'll update cassidy.pl to work with non-super CFs as well. Sorry for the inconvenience. Thanks Ted
Re: Net::Cassandra::Easy deletion failed
On Tue, 06 Apr 2010 13:24:45 -0700 Mike Gallamore mike.e.gallam...@googlemail.com wrote:

MG Thanks for the reply. The newest version of the module I see on CPAN
MG is 0.08b. I actually had 0.07 installed and am using 0.6beta3 for
MG cassandra. Is there somewhere else I should look for the 0.09 version
MG of the module? I'll also upgrade to the release candidate version of
MG Cassandra and see if that helps.

It takes a few hours for CPAN to update all its mirrors. I'm attaching 0.09 here since it's a tiny tarball. Ted
Re: How do vector clocks and conflicts work?
On Tue, Apr 6, 2010 at 11:03 AM, Tatu Saloranta tsalora...@gmail.com wrote: Given this, yes, something along these lines sounds realistic. It also sounds like the implementation would greatly benefit from (if not require) foundational support in core, as opposed to being done outside of Cassandra (which I understand you are suggesting). I wasn't sure if the idea was to try to do this completely separately (aside from vector clock support). I'd probably put it in core. Or at least put some more generic support for this sort of conflict resolution in core. I'm looking forward to seeing Digg's patch for this stuff. Mike
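A compact sketch of the per-node delta bookkeeping Mike describes earlier in this thread, in plain Java with made-up names and the vector-clock plumbing omitted. This is the increment-only case; supporting decrements would need a second map of negative deltas (the usual two-map construction):

import java.util.HashMap;
import java.util.Map;

// Eventually consistent counter kept as per-node deltas. Each node increments
// only its own slot; a read sums all slots. Merging during read-repair or
// hinted handoff takes the per-node maximum, which is safe because every
// node's own slot only ever grows.
public class PartitionedCounter {
    private final Map<String, Long> deltas = new HashMap<>();
    private final String localNode;

    PartitionedCounter(String localNode) { this.localNode = localNode; }

    void increment(long by) {
        deltas.merge(localNode, by, Long::sum);   // touch only our own slot
    }

    long value() {
        long sum = 0;
        for (long d : deltas.values()) sum += d;
        return sum;
    }

    void mergeFrom(PartitionedCounter other) {
        other.deltas.forEach((node, d) -> deltas.merge(node, d, Math::max));
    }
}

Two concurrent increments on different nodes land in different slots, so the merged value reflects both and no read-modify-write race arises.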
Re: Net::Cassandra::Easy deletion failed
On 04/06/2010 01:36 PM, Ted Zlatanov wrote: It takes a few hours for CPAN to update all its mirrors. I'm attaching 0.09 here since it's a tiny tarball. Great, it works! Or at least the Cassandra/Thrift part seems to work. My tests don't pass, but I think those are actual logic errors in the tests now; the column does appear to be getting cleared okay with the new version of the module. Thanks.