Re: Weird GC
Thanks for your help. I've added those flags, as well as some others I saw in another thread that redirect stdout to a file. What information is it that you need?

2014-01-29 Benedict Elliott Smith belliottsm...@datastax.com:

It's possible the time attributed to GC is actually spent somewhere else; a multitude of tasks may occur during the same safepoint as a GC. We've seen some batch revokes of biased locks take a long time, for instance; *if* this is happening in your case, and we can track down which objects, I would consider it a bug and we may be able to fix it.

-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1

On 29 January 2014 16:23, Joel Samuelsson samuelsson.j...@gmail.com wrote:

Hi, We've been trying to figure out why we have such long and frequent stop-the-world GC pauses even though we have basically no load. Today we got a log of a weird GC and I wonder if you have any theories about why it might have happened. A plot of our heap at the time, paired with the GC time from the Cassandra log: http://imgur.com/vw5rOzj

- The blue line is the ratio of Eden space used (i.e. 1.0 = full)
- The red line is the ratio of Survivor0 space used
- The green line is the ratio of Survivor1 space used
- The teal line is the ratio of Old Gen space used
- The pink line shows during which period of time a GC happened (from the Cassandra log)

Eden space is filling up and being cleared as expected in the first and last hills, but on the middle one it takes two seconds to clear Eden (note that Eden has ratio 1 for 2 seconds). Neither the survivor spaces nor the old generation increase significantly afterwards. Any ideas why this might be happening? We have swap disabled, JNA enabled, no CPU spikes at the time, no disk I/O spikes at the time. What else could be causing this?

/Joel Samuelsson
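For reference, a minimal sketch of one way to set those flags and capture the output, assuming an install where JVM options live in conf/cassandra-env.sh (the paths and file names below are assumptions, not something stated in the thread):

    # Sketch only: add the safepoint flags to the JVM options in cassandra-env.sh.
    JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
    JVM_OPTS="$JVM_OPTS -XX:PrintSafepointStatisticsCount=1"

    # Safepoint statistics are written to the JVM's stdout, so redirect it to a
    # file, e.g. when running Cassandra in the foreground (illustrative only):
    bin/cassandra -f > /var/log/cassandra/stdout.log 2>&1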
Reverting from VirtualNode
Once we set nodes to act as virtual nodes, is there a way to revert to manually assigned tokens? I have two nodes for testing, and on them I set 'num_tokens: 256' and left the initial_token line commented. Virtual nodes worked fine. But then I tried to switch back by commenting out the 'num_tokens' line and uncommenting 'initial_token'; however, after starting Cassandra and typing ./nodetool -h 'ip' ring, there are still the default 256 tokens per node. What am I missing? Att, Víctor Hugo Molinar
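For context, a sketch of the two cassandra.yaml states being switched between (the single token value shown is purely illustrative):

    # vnode configuration currently in effect on the test nodes:
    num_tokens: 256
    # initial_token:

    # attempted reverted configuration:
    # num_tokens: 256
    initial_token: 85070591730234615865843651857942052864   # illustrative token value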
Re: Weird GC
You should expect to see lines of output like:

vmop  [threads: total initially_running wait_to_block]  [time: spin block sync cleanup vmop]  page_trap_count
0.436: Deoptimize          [ 10 0 0] [ 0 0 0 0 0] 0
1.437: no vm operation     [ 18 0 1] [ 0 0 0 0 0] 0
1.762: Deoptimize          [ 21 0 0] [ 0 0 0 0 0] 0
2.764: no vm operation     [ 160 0 1] [ 0 0 0 0 0] 0
2.876: Deoptimize          [ 161 0 0] [ 0 0 0 0 0] 0
4.503: EnableBiasedLocking [ 164 0 0] [ 0 0 0 0 0] 0
6.916: RevokeBias          [ 164 0 0] [ 0 0 0 0 0] 0

You're looking for any of these lines printed at or around one of your unexpectedly long pauses.
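A related sketch for correlating those safepoint entries with total pause time, assuming a HotSpot 7 JVM with GC logging enabled via -Xloggc (the extra flags and the log path are assumptions, not something discussed above):

    # Record how long application threads were stopped at each safepoint.
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps"

    # Then pull out the longest stops from the GC log (path is illustrative) and
    # match their timestamps against the safepoint statistics:
    grep "Total time for which application threads were stopped" /var/log/cassandra/gc.log \
      | awk '{print $(NF-1), $0}' | sort -rn | head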
Ultra wide row anti pattern
Hello all

I've read some materials on the net about Cassandra anti-patterns, among which is mentioned the very-large-wide-row anti-pattern. The main rationales for avoiding very wide rows are:

1) fragmentation of data across multiple SSTables when the row is very wide, leading to very slow reads by slice query
2) inefficient repair. During repair C* exchanges hashes of row data. Even if only one column differs, C* still exchanges the whole row. Having very wide rows makes repair very expensive
3) bad scaling. Having wide rows localized on some nodes of your cluster will create hotspots
4) hard limit of 2*10⁹ columns per physical row

All those recommendations are quite sensible. Now my customer has a quite specific use case:

a. no repair nor durability. C* is used to dump massive data (heavy write + read) for temporary processing. The tables are truncated at the end of a long-running processing job. So point 2) does not apply here
b. the maximum number of items to be processed is 24*10⁶, far below the hard limit of 2*10⁹ columns, so point 4) does not apply either
c. small cluster of only 2 nodes, so load balancing is quite straightforward (roughly 50% on each node). Therefore point 3) does not apply either

The only drawback for an ultra wide row I can see is point 1). But if I use leveled compaction with a sufficiently large value for sstable_size_in_mb (let's say 200Mb), will my read performance be impacted as the row grows?

Of course, splitting the wide row into several rows using a bucketing technique is one solution, but it forces us to keep track of the bucket number and it's not convenient. We have one process (JVM) that inserts data and another process (JVM) that reads data. Using bucketing, we need to synchronize the bucket number between the 2 processes.

For information, below is the wide row table definition:

create table widerow(
    status text,
    insertiondate timeuuid,
    userid bigint,
    PRIMARY KEY (status, insertiondate));

The widerow table serves to track user insertion status (status: {TODO, IMPORTED, CHECKED}). The read pattern is always:

SELECT userid FROM widerow WHERE status = 'xxx' AND insertiondate > {last_processed_user_insertion_date}

I'll be interested in your insights and remarks about this data model.

Regards

Duy Hai DOAN
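To make the read pattern concrete, a minimal sketch of how the reading process might page through a status partition (the timeuuid bound and the LIMIT are illustrative, not part of the actual design):

    -- Sketch only: read the next slice of one status partition in insertion order.
    -- The timeuuid below stands in for the last processed insertion date and the
    -- LIMIT is an arbitrary page size; both are illustrative.
    SELECT insertiondate, userid
    FROM widerow
    WHERE status = 'TODO' AND insertiondate > 8b945050-8a92-11e3-a04f-5f27bf77899e
    LIMIT 10000;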
exception during add node due to test beforeAppend on SSTableWriter
4 nodes, byte-ordered partitioner, LCS, 3 compaction executors, replication factor 1. Code is the 2.0.4 version but with the patch for CASSANDRA-6638 (https://issues.apache.org/jira/browse/CASSANDRA-6638). However, no cleanup is run, so the patch should not play a role.

The 4-node cluster is started and inserts/queries are done up to only about 10 GB of data on each node. Then one node is decommissioned and its local files deleted. Then the node is added again. Exception: see below. Any idea?

Regards, Ignace Desimpel

2014-01-31 17:12:02.600 == Bootstrap is streaming data from other nodes... Please wait ...
2014-01-31 17:12:02.600 == Bootstrap stream state : rx= 29.00 tx= 100.00 Please wait ...
2014-01-31 17:12:18.908 Enqueuing flush of Memtable-compactions_in_progress@350895652(0/0 serialized/live bytes, 1 ops)
2014-01-31 17:12:18.908 Writing Memtable-compactions_in_progress@350895652(0/0 serialized/live bytes, 1 ops)
2014-01-31 17:12:19.009 Completed flushing ../../../../data/cdi.cassandra.cdi/dbdatafile/system/compactions_in_progress/system-compactions_in_progress-jb-74-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1391184546183, position=561494)
2014-01-31 17:12:19.018 Exception in thread Thread[CompactionExecutor:1,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(8afc9237010380178575, 8afc9237010380178575) >= current key DecoratedKey(6e0bb955010383dfdd1d, 6e0bb955010383dfdd1d) writing into /media/datadrive1/cdi.cassandra.cdi/dbdatafile/Ks100K/ForwardLongFunction/Ks100K-ForwardLongFunction-tmp-jb-159-Data.db
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:142) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:165) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_40]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_40]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_40]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_40]
    at java.lang.Thread.run(Thread.java:724) [na:1.7.0_40]
exception during add node due to test beforeAppend on SSTableWriter
The join with auto bootstrap itself had finished. So I restarted the added node. During restart I saw a message indicating that something is wrong with this row and sstable. Of course, in my case I did not drop sstables from another node. But I did decommission and add the node, so that is still a kind of 'data-from-another-node'.

At level 2, SSTableReader(path='../../../../data/cdi.cassandra.cdi/dbdatafile/Ks100K/ForwardStringFunction/Ks100K-ForwardStringFunction-jb-67-Data.db') [DecoratedKey(065864ce01024e4e505300, 065864ce01024e4e505300), DecoratedKey(14c9d35e0102646973706f736974696f6e7300, 14c9d35e0102646973706f736974696f6e7300)] overlaps SSTableReader(path='../../../../data/cdi.cassandra.cdi/dbdatafile/Ks100K/ForwardStringFunction/Ks100K-ForwardStringFunction-jb-64-Data.db') [DecoratedKey(068c2e4101024d6f64616c207665726200, 068c2e4101024d6f64616c207665726200), DecoratedKey(06c566b4010244657465726d696e657200, 06c566b4010244657465726d696e657200)]. This could be caused by a bug in Cassandra 1.1.0 .. 1.1.3 or due to the fact that you have dropped sstables from another node into the data directory. Sending back to L0. If you didn't drop in sstables, and have not yet run scrub, you should do so since you may also have rows out-of-order within an sstable
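As a concrete follow-up to that warning, a minimal sketch of running scrub on the affected table (the keyspace and table names are taken from the paths in the message above; the host placeholder is illustrative):

    # Online scrub of the table named in the overlap warning:
    nodetool -h <node_ip> scrub Ks100K ForwardStringFunction

    # Or, with the node stopped, the offline variant shipped with 2.0:
    sstablescrub Ks100K ForwardStringFunction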
Question: ConsistencyLevel.ONE with multiple datacenters
Hey,

When adding a new datacenter to our production C* cluster using the procedure described in [1], some of our application requests were returning null/empty values. Rebuild was not complete in the new datacenter, so my guess is that some requests were being directed to the brand new datacenter which still didn't have the data. Our Hector client was connected only to the original nodes, with autoDiscoverHosts=false, and we use ConsistencyLevel.ONE for reads. The keyspace schema was already configured to use both datacenters.

My question is: is it possible that the dynamic snitch is choosing the nodes in the new (empty) datacenter when CL=ONE? In this case, it's mandatory to use CL=LOCAL_ONE during bootstrap/rebuild of a new datacenter, otherwise empty data might be returned, correct?

Cheers,

[1] http://www.datastax.com/documentation/cassandra/1.2/webhelp/cassandra/operations/ops_add_dc_to_cluster_t.html

--
Paulo Motta
Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200 +55 83 9690-1314
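One way to see which replicas actually serve a CL=ONE read is request tracing; a minimal sketch in cqlsh (a generic check rather than something from this thread; keyspace, table and key are placeholders):

    -- Sketch only: trace a read at the consistency level in question.
    CONSISTENCY ONE;
    TRACING ON;
    SELECT * FROM my_keyspace.my_table WHERE key = 'some_key';
    -- The trace lists every node the coordinator contacted; if a replica in the
    -- new, still-empty datacenter shows up as the data endpoint, a CL=ONE read
    -- can return empty results until rebuild completes. Switching to
    -- CONSISTENCY LOCAL_ONE; keeps reads in the local DC (roughly C* 1.2.11+ / 2.0.2+).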
Re: Ultra wide row anti pattern
On Fri, Jan 31, 2014 at 6:52 AM, DuyHai Doan doanduy...@gmail.com wrote:

4) hard limit of 2*10⁹ columns per physical row
b. maximum number of items to be processed is 24*10⁶, far below the hard limit of 2*10⁹ columns so point 4) does not apply either

Before discarding this point, try writing an example row this large with your actual keys and column names and values, and then reading it and compacting it.

Given the lack of need for durability, I suggest turning durable_writes off.

=Rob
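A sketch of how durable_writes can be switched off at the keyspace level (the keyspace name and replication settings are placeholders):

    -- Disable the commit log for this keyspace; data not yet flushed to SSTables
    -- is lost on a crash, which is acceptable for this temporary-processing case.
    ALTER KEYSPACE my_keyspace WITH durable_writes = false;

    -- Or at creation time:
    CREATE KEYSPACE my_keyspace
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
      AND durable_writes = false;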
Re: Reverting from VirtualNode
On Fri, Jan 31, 2014 at 5:08 AM, Víctor Hugo Oliveira Molinar vhmoli...@gmail.com wrote:

Once we set nodes to act as virtual nodes, is there a way to revert to manually assigned tokens?

On a given node? My understanding is that there is no officially supported way. You now have 256 contiguous tokens.

You can decommission the node, have it stream all its data to other nodes, and then re-add it with a single token.

You could also removetoken (or better (?) unsafeassassinate) each of the tokens and then restart the node with auto_bootstrap:false to join the node at its old token, then repair it. That would probably work because your node still has the data for the ranges it had before you converted it, but it has a risk of stale reads at CL.ONE before repair completes. The benefit here is that you avoid bootstrapping data that you already have on your node.

https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/

=Rob
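A sketch of the first (decommission and re-add) path, assuming a packaged install and a manually chosen token (the token value and paths are illustrative):

    # 1. Stream this node's data to the rest of the cluster and leave the ring.
    nodetool -h <node_ip> decommission

    # 2. Stop Cassandra and clear the node's old state (paths are illustrative).
    sudo service cassandra stop
    sudo rm -rf /var/lib/cassandra/data /var/lib/cassandra/commitlog /var/lib/cassandra/saved_caches

    # 3. In cassandra.yaml, comment out num_tokens and set a single token:
    #      initial_token: 85070591730234615865843651857942052864   # illustrative
    # 4. Start the node again; it will bootstrap at that single token.
    sudo service cassandra start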
Re: Ultra wide row anti pattern
Durable writes have already been disabled for the entire keyspace. I'll run a bench on a 24*10⁶-column wide row and give feedback soon.
Fwd: {kundera-discuss} Kundera 2.10 released
fyi

-- Forwarded message --
From: Vivek Mishra vivek.mis...@impetus.co.in
Date: Sat, Feb 1, 2014 at 1:18 AM
Subject: {kundera-discuss} Kundera 2.10 released
To: kundera-disc...@googlegroups.com

Hi All,

We are happy to announce the Kundera 2.10 release. Kundera is a JPA 2.0 compliant, object-datastore mapping library for NoSQL datastores. The idea behind Kundera is to make working with NoSQL databases drop-dead simple and fun. It currently supports Cassandra, HBase, MongoDB, Redis, OracleNoSQL, Neo4j, ElasticSearch, CouchDB and relational databases.

Major Changes:
1) Support added for bean validation.

Github Bug Fixes:
https://github.com/impetus-opensource/Kundera/issues/208
https://github.com/impetus-opensource/Kundera/issues/380
https://github.com/impetus-opensource/Kundera/issues/408
https://github.com/impetus-opensource/Kundera/issues/453
https://github.com/impetus-opensource/Kundera/issues/454
https://github.com/impetus-opensource/Kundera/issues/456
https://github.com/impetus-opensource/Kundera/issues/460
https://github.com/impetus-opensource/Kundera/issues/465
https://github.com/impetus-opensource/Kundera/issues/476
https://github.com/impetus-opensource/Kundera/issues/478
https://github.com/impetus-opensource/Kundera/issues/479
https://github.com/impetus-opensource/Kundera/issues/484
https://github.com/impetus-opensource/Kundera/issues/494
https://github.com/impetus-opensource/Kundera/issues/509
https://github.com/impetus-opensource/Kundera/issues/514
https://github.com/impetus-opensource/Kundera/issues/516
https://github.com/impetus-opensource/Kundera/issues/517
https://github.com/impetus-opensource/Kundera/issues/518

How to Download:
To download, use or contribute to Kundera, visit: http://github.com/impetus-opensource/Kundera
The latest released tag version is 2.10.
Kundera maven libraries are now available at: https://oss.sonatype.org/content/repositories/releases/com/impetus
Sample code and examples for using Kundera can be found here: https://github.com/impetus-opensource/Kundera/tree/trunk/src/kundera-tests

Survey/Feedback: http://www.surveymonkey.com/s/BMB9PWG

Thank you all for your contributions and for using Kundera!

PS: The group artifactId has been changed from the 2.9.1 release onward. Please refer to https://github.com/impetus-opensource/Kundera/blob/trunk/src/README.md#note for the same.
Re: Ultra wide row anti pattern
The only drawback for ultra wide row I can see is point 1). But if I use leveled compaction with a sufficiently large value for sstable_size_in_mb (let's say 200Mb), will my read performance be impacted as the row grows?

For this use case, you would want to use SizeTieredCompaction and play around with the configuration a bit to keep a small number of large SSTables. Specifically: keep min|max_threshold really low, set bucket_low and bucket_high closer together (maybe even both to 1.0), and maybe use a larger min_sstable_size. YMMV though - per Rob's suggestion, take the time to run some tests tweaking these options.

Of course, splitting wide row into several rows using bucketing technique is one solution but it forces us to keep track of the bucket number and it's not convenient. We have one process (jvm) that insert data and another process (jvm) that read data. Using bucketing, we need to synchronize the bucket number between the 2 processes.

This could be as simple as adding year and month to the primary key (in the form 'mm'). Alternatively, you could add this to the partition key in the table definition. Either way, it then becomes pretty easy to re-generate these based on the query parameters.

--
Nate McCall
Austin, TX
@zznate

Co-Founder
Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
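A sketch of what those SizeTiered settings might look like in CQL against the table from the original post (the numeric values are illustrative starting points, not recommendations from this thread):

    -- Illustrative only: push STCS toward a small number of large SSTables.
    -- min_sstable_size is in bytes; 209715200 is roughly 200 MB.
    ALTER TABLE widerow WITH compaction = {
        'class': 'SizeTieredCompactionStrategy',
        'min_threshold': 2,
        'max_threshold': 4,
        'bucket_low': 1.0,
        'bucket_high': 1.0,
        'min_sstable_size': 209715200
    };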
Re: Ultra wide row anti pattern
Thanks Nate for your ideas.

This could be as simple as adding year and month to the primary key (in the form 'mm'). Alternatively, you could add this to the partition key in the table definition. Either way, it then becomes pretty easy to re-generate these based on the query parameters.

The thing is that it's not that simple. My customer has a very BAD idea: using Cassandra as a queue (the perfect anti-pattern). Before trying to tell them to redesign their entire architecture and put in some queueing system like ActiveMQ or something similar, I would like to see how I can use wide rows to meet the requirements.

The functional need is quite simple:

1) A process A loads users into Cassandra and sets the status on each user to 'TODO'. When using the bucketing technique, we can limit a row width to, let's say, 100 000 columns. So at the end of the current row, process A knows that it should move to the next bucket. The bucket is coded using a *composite partition key*; in our example it would be 'TODO:1', 'TODO:2', etc.

2) A process B reads the wide row for the 'TODO' status. It starts at bucket 1, so it reads the row with partition key 'TODO:1'. The users are processed and inserted in a new row, 'PROCESSED:1' for example, to keep track of the status. After retrieving 100 000 columns, it switches automatically to the next bucket. Simple. Fair enough.

3) Now what sucks is that sometimes process B does not have enough data to perform the functional logic on the users it fetched from the wide row, so it has to put some users back into the 'TODO' status rather than transitioning them to 'PROCESSED'. That's exactly queue behavior. A simplistic idea would be to re-insert those *m* users with 'TODO:*n*', with *n* higher than the current bucket number, so they can be processed later. *But then it breaks the whole counting system.* Process A, which inserts data, will not know that there are already *m* users in row *n*, so it will happily add 100 000 columns, making the row size grow to 100 000 + *m*. When process B reads this row back, it will stop at the first 100 000 columns and skip the trailing *m* elements.

That's the main reason I dropped the idea of bucketing (which is quite smart in the normal case) and traded it for an ultra wide row. Anyway, I'll follow your advice and play around with the SizeTiered parameters.

Regards

Duy Hai DOAN
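For what it's worth, a sketch of the bucketed layout described above as a CQL schema (table name, column names and values are illustrative, not the customer's actual model):

    -- Illustrative bucketed layout: the partition key combines status and bucket
    -- number (the 'TODO:1' idea above), keeping each physical row below a chosen
    -- width such as 100 000 columns.
    CREATE TABLE widerow_bucketed (
        status        text,
        bucket        int,
        insertiondate timeuuid,
        userid        bigint,
        PRIMARY KEY ((status, bucket), insertiondate)
    );

    -- Process B reads one bucket at a time, in insertion order
    -- (the timeuuid bound below is illustrative):
    SELECT userid FROM widerow_bucketed
    WHERE status = 'TODO' AND bucket = 1
      AND insertiondate > 8b945050-8a92-11e3-a04f-5f27bf77899e;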