Re: Bootstrap performance.
On Mon, Apr 20, 2015 at 8:09 PM, Dikang Gu dikan...@gmail.com wrote: Why do you say streaming is single threaded? I see a lot of background streaming threads running, for example: Imprecise: each stream is a single thread. As I said, the first place to look is the throttles... but I would not be surprised if the overall number of threads available to streaming is a meaningful bound. =Rob
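For reference, the throttles Rob mentions can be checked and adjusted at runtime with nodetool; the values below are only illustrative, and the permanent settings live in cassandra.yaml (stream_throughput_outbound_megabits_per_sec and compaction_throughput_mb_per_sec):

nodetool setstreamthroughput 400      # streaming cap in megabits/sec per node; 0 disables throttling
nodetool setcompactionthroughput 64   # compaction cap in MB/sec; streamed-in data still has to be compacted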
Re: Handle Write Heavy Loads in Cassandra 2.0.3
Thanks Brice!! We are using Red Hat Linux 6.4, 24 cores, 64 GB RAM, SSDs in RAID 5. CPUs are not overloaded even at peak load. I don't think IO is an issue as iostat shows await 17 at all times; the util attribute in iostat usually spikes from 0 to 100 and comes back immediately. I'm not an expert on analyzing IO but things look ok. We are using STCS and not using logged batches. We are making around 12k writes/sec across 5 CFs (one with 4 secondary indexes) and 2300 reads/sec on each node of a 3 node cluster. 2 CFs have wide rows with max data of around 100 MB per row. We have further reduced in_memory_compaction_limit_in_mb to 125, though we are still getting logs saying compacting large row. We are planning to upgrade to 2.0.14 as 2.1 is not yet production ready. I would appreciate it if you could answer the queries posted in the initial mail. Thanks Anuj Wadehra Sent from Yahoo Mail on Android From: Brice Dutheil brice.duth...@gmail.com Date: Tue, 21 Apr, 2015 at 10:22 pm Subject: Re: Handle Write Heavy Loads in Cassandra 2.0.3 This is an intricate matter; I cannot say for sure which parameters are good and which are wrong, too many things changed at once. However there are many things to consider: What is your OS? Do your nodes have SSDs or mechanical drives? How many cores do you have? Is it the CPUs or the IO that is overloaded? What is the write request/s per node and cluster wide? What is the compaction strategy of the tables you are writing into? Are you using LOGGED BATCH statements? With heavy writes, it is NOT recommended to use LOGGED BATCH statements. In our 2.0.14 cluster we have experienced node unavailability due to long Full GC pauses. We discovered bogus legacy data; a single outlier was so wrong that it updated the same CQL rows hundreds of thousands of times with duplicate data. Given the tables we were writing to were configured to use LCS, this resulted in keeping Memtables in memory long enough to promote them to the old generation (the MaxTenuringThreshold default is 1). Handling this data proved to be the thing to fix; with default GC settings the cluster (10 nodes) handles 39 write requests/s. Note Memtables are allocated on heap with 2.0.x. With 2.1.x they will be allocated off-heap. -- Brice On Tue, Apr 21, 2015 at 5:12 PM, Anuj Wadehra anujw_2...@yahoo.co.in wrote: Any suggestions or comments on this one?? Thanks Anuj Wadhera Sent from Yahoo Mail on Android From: Anuj Wadehra anujw_2...@yahoo.co.in Date: Mon, 20 Apr, 2015 at 11:51 pm Subject: Re: Handle Write Heavy Loads in Cassandra 2.0.3 Small correction: we are making writes in 5 CFs and reading from one at high speeds. Thanks Anuj Wadehra Sent from Yahoo Mail on Android From: Anuj Wadehra anujw_2...@yahoo.co.in Date: Mon, 20 Apr, 2015 at 7:53 pm Subject: Handle Write Heavy Loads in Cassandra 2.0.3 Hi, Recently, we discovered that millions of mutations were getting dropped on our cluster. Eventually, we solved this problem by increasing the value of memtable_flush_writers from 1 to 3. We usually write 3 CFs simultaneously and one of them has 4 Secondary Indexes. New changes also include: concurrent_compactors: 12 (earlier it was default) compaction_throughput_mb_per_sec: 32 (earlier it was default) in_memory_compaction_limit_in_mb: 400 (earlier it was default 64) memtable_flush_writers: 3 (earlier 1) After making the above changes, our write heavy workload scenarios started giving promotion failed exceptions in the GC logs.
We have done JVM tuning and Cassandra config changes to solve this:
MAX_HEAP_SIZE=12G (increased heap from 8G to 12G to reduce fragmentation)
HEAP_NEWSIZE=3G
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=2 (we observed that even at SurvivorRatio=4 our survivor space was getting 100% utilized under heavy write load, and we thought that minor collections were directly promoting objects to the Tenured generation)
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=20 (lots of objects were moving from Eden to Tenured on each minor collection, maybe related to medium-lived objects tied to Memtables and compactions, as suggested by a heap dump)
JVM_OPTS=$JVM_OPTS -XX:ConcGCThreads=20
JVM_OPTS=$JVM_OPTS -XX:+UnlockDiagnosticVMOptions
JVM_OPTS=$JVM_OPTS -XX:+UseGCTaskAffinity
JVM_OPTS=$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs
JVM_OPTS=$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768
JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark
JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=3
JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=2000 (though it's the default value)
JVM_OPTS=$JVM_OPTS -XX:+CMSEdenChunksRecordAlways
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled
JVM_OPTS=$JVM_OPTS -XX:-UseBiasedLocking
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70 (we reduced the value to avoid concurrent mode failures)
Cassandra config:
compaction_throughput_mb_per_sec: 24
memtable_total_space_in_mb: 1000 (to make memtable flushes frequent; the default is 1/4 of the heap, which creates more long-lived objects)
Questions: 1. Why did increasing memtable_flush_writers and in_memory_compaction_limit_in_mb cause promotion failures in the JVM? Does more memtable_flush_writers mean more memtables in memory? 2. Still, objects are getting promoted at high speed to the Tenured space. CMS is running on the old gen every 4-5 minutes under heavy write load. Around 750+ minor collections of up to 300ms happened in 45 mins. Do you see any problems with the new JVM tuning and Cassandra config? Does the justification given for those changes sound logical? Any suggestions? 3. What is the best practice for reducing heap fragmentation/promotion failure when allocation and promotion rates are high? Thanks Anuj
Re: Cassandra tombstones being created by updating rows with TTL's
Whats ur sstable count for the CF? I hope compactions are working fine. Also check the full stacktrace of FileNotFoundException ..if its related to compactionyou can try cleaning compactions_in_progress folder in system folder in data directory..there are JIRA issues relating to that. Thanks Anuj Wadehra Sent from Yahoo Mail on Android From:Laing, Michael michael.la...@nytimes.com Date:Tue, 21 Apr, 2015 at 10:21 pm Subject:Re: Cassandra tombstones being created by updating rows with TTL's Hmm - we read/write with Local Quorum always - I'd recommend that as that is your 'consistency' defense. We use python, so I am not familiar with the java driver - but 'file not found' indicates something is inconsistent. On Tue, Apr 21, 2015 at 12:22 PM, Walsh, Stephen stephen.wa...@aspect.com wrote: Thanks for all your help Michael, Our data will change through the day, so data with a TTL will eventually get dropped, and new data will appear. I’d imagine the entire table maybe expire and start over 7-10 times a day. But on the GC topic, now java Driver now gives this error on the query I also get “Request did not complete within rpc_timeout.” In cqlsh. # com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded) at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.Responses$Error.asException(Responses.java:100) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:140) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:249) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:433) ~[cassandra-driver-core-2.1.4.jar:na] Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:61) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:38) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:168) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[cassandra-driver-core-2.1.4.jar:na] # These queries where taking about 1 second to run when the gc was at 10 seconds (same duration as the TTL). 
Also seeing a lot of this this stuff in the log file # ERROR [ReadStage:71] 2015-04-21 17:11:07,597 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:71,5,main] java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db (No such file or directory) at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db Maybe this is a 1 step back 2 steps forward approach? Any ideas? From: Laing, Michael [mailto:michael.la...@nytimes.com] Sent: 21 April 2015 17:09 To: user@cassandra.apache.org Subject: Re: Cassandra tombstones being created by updating rows with TTL's Discussions previously on the list show why this is not a problem in much more detail. If something changes in your cluster: node down, new node, etc - you run repair for sure. We also run periodic repairs prophylactically. But if you never delete and always ttl by the same amount, you do not have to worry about zombie data being resurrected - the main reason for running repair within gc_grace_seconds. On Tue, Apr 21, 2015 at 11:49 AM, Walsh, Stephen stephen.wa...@aspect.com wrote: Maybe thanks Michael, I will give these setting a go, How do you do you periodic node-tool repairs in the situation, for what I read we need to start doing this also.
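A few quick checks related to the advice above; the table name, paths and service commands are placeholders, and the compactions_in_progress clean-up Anuj refers to (only with the node stopped) assumes the 2.0.x on-disk layout:

nodetool cfstats keyspace.table | grep -i 'sstable count'
nodetool compactionstats
sudo service cassandra stop
rm -rf /var/lib/cassandra/data/system/compactions_in_progress/*
sudo service cassandra start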
Re: LCS Strategy, compaction pending tasks keep increasing
sorry i take that back we will modify different keys across threads not the same key, our storm topology is going to use field grouping to get updates for same keys to same set of bolts. On Tue, Apr 21, 2015 at 6:17 PM, Anishek Agarwal anis...@gmail.com wrote: @Bruice : I dont think so as i am giving each thread a specific key range with no overlaps this does not seem to be the case now. However we will have to test where we have to modify the same key across threads -- do u think that will cause a problem ? As far as i have read LCS is recommended for such cases. should i just switch back to SizeTiredCompactionStrategy. On Tue, Apr 21, 2015 at 6:13 PM, Brice Dutheil brice.duth...@gmail.com wrote: Could it that the app is inserting _duplicate_ keys ? -- Brice On Tue, Apr 21, 2015 at 1:52 PM, Marcus Eriksson krum...@gmail.com wrote: nope, but you can correlate I guess, tools/bin/sstablemetadata gives you sstable level information and, it is also likely that since you get so many L0 sstables, you will be doing size tiered compaction in L0 for a while. On Tue, Apr 21, 2015 at 1:40 PM, Anishek Agarwal anis...@gmail.com wrote: @Marcus I did look and that is where i got the above but it doesnt show any detail about moving from L0 -L1 any specific arguments i should try with ? On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson krum...@gmail.com wrote: you need to look at nodetool compactionstats - there is probably a big L0 - L1 compaction going on that blocks other compactions from starting On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal anis...@gmail.com wrote: the some_bits column has about 14-15 bytes of data per key. On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com wrote: Hello, I am inserting about 100 million entries via datastax-java driver to a cassandra cluster of 3 nodes. Table structure is as create keyspace test with replication = {'class': 'NetworkTopologyStrategy', 'DC' : 3}; CREATE TABLE test_bits(id bigint primary key , some_bits text) with gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''}; have 75 threads that are inserting data into the above table with each thread having non over lapping keys. I see that the number of pending tasks via nodetool compactionstats keeps increasing and looks like from nodetool cfstats test.test_bits has SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0], Why is compaction not kicking in ? thanks anishek
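For anyone following along, the checks Marcus suggests look roughly like this; the data path and sstable file pattern are illustrative and should be adjusted to the actual data directory:

nodetool compactionstats                        # look for a long-running L0 -> L1 compaction
nodetool cfstats test.test_bits                 # shows the SSTable counts per level
tools/bin/sstablemetadata /var/lib/cassandra/data/test/test_bits/*-Data.db | grep -i level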
Re: LCS Strategy, compaction pending tasks keep increasing
@Bruice : I dont think so as i am giving each thread a specific key range with no overlaps this does not seem to be the case now. However we will have to test where we have to modify the same key across threads -- do u think that will cause a problem ? As far as i have read LCS is recommended for such cases. should i just switch back to SizeTiredCompactionStrategy. On Tue, Apr 21, 2015 at 6:13 PM, Brice Dutheil brice.duth...@gmail.com wrote: Could it that the app is inserting _duplicate_ keys ? -- Brice On Tue, Apr 21, 2015 at 1:52 PM, Marcus Eriksson krum...@gmail.com wrote: nope, but you can correlate I guess, tools/bin/sstablemetadata gives you sstable level information and, it is also likely that since you get so many L0 sstables, you will be doing size tiered compaction in L0 for a while. On Tue, Apr 21, 2015 at 1:40 PM, Anishek Agarwal anis...@gmail.com wrote: @Marcus I did look and that is where i got the above but it doesnt show any detail about moving from L0 -L1 any specific arguments i should try with ? On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson krum...@gmail.com wrote: you need to look at nodetool compactionstats - there is probably a big L0 - L1 compaction going on that blocks other compactions from starting On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal anis...@gmail.com wrote: the some_bits column has about 14-15 bytes of data per key. On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com wrote: Hello, I am inserting about 100 million entries via datastax-java driver to a cassandra cluster of 3 nodes. Table structure is as create keyspace test with replication = {'class': 'NetworkTopologyStrategy', 'DC' : 3}; CREATE TABLE test_bits(id bigint primary key , some_bits text) with gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''}; have 75 threads that are inserting data into the above table with each thread having non over lapping keys. I see that the number of pending tasks via nodetool compactionstats keeps increasing and looks like from nodetool cfstats test.test_bits has SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0], Why is compaction not kicking in ? thanks anishek
Re: LCS Strategy, compaction pending tasks keep increasing
I’m not sure I get everything about storm stuff, but my understanding of LCS is that compaction count may increase the more one update data (that’s why I was wondering about duplicate primary keys). Another option is that the code is sending too much write request/s to the cassandra cluster. I don’t know haw many nodes you have, but the less node there is the more compactions. Also I’d look at the CPU / load, maybe the config is too *restrictive*, look at the following properties in the cassandra.yaml - compaction_throughput_mb_per_sec, by default the value is 16, you may want to increase it but be careful on mechanical drives, if already in SSD IO is rarely the issue, we have 64 (with SSDs) - multithreaded_compaction by default it is false, we enabled it. Compaction thread are niced, so it shouldn’t be much an issue for serving production r/w requests. But you never know, always keep an eye on IO and CPU. — Brice On Tue, Apr 21, 2015 at 2:48 PM, Anishek Agarwal anis...@gmail.com wrote: sorry i take that back we will modify different keys across threads not the same key, our storm topology is going to use field grouping to get updates for same keys to same set of bolts. On Tue, Apr 21, 2015 at 6:17 PM, Anishek Agarwal anis...@gmail.com wrote: @Bruice : I dont think so as i am giving each thread a specific key range with no overlaps this does not seem to be the case now. However we will have to test where we have to modify the same key across threads -- do u think that will cause a problem ? As far as i have read LCS is recommended for such cases. should i just switch back to SizeTiredCompactionStrategy. On Tue, Apr 21, 2015 at 6:13 PM, Brice Dutheil brice.duth...@gmail.com wrote: Could it that the app is inserting _duplicate_ keys ? -- Brice On Tue, Apr 21, 2015 at 1:52 PM, Marcus Eriksson krum...@gmail.com wrote: nope, but you can correlate I guess, tools/bin/sstablemetadata gives you sstable level information and, it is also likely that since you get so many L0 sstables, you will be doing size tiered compaction in L0 for a while. On Tue, Apr 21, 2015 at 1:40 PM, Anishek Agarwal anis...@gmail.com wrote: @Marcus I did look and that is where i got the above but it doesnt show any detail about moving from L0 -L1 any specific arguments i should try with ? On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson krum...@gmail.com wrote: you need to look at nodetool compactionstats - there is probably a big L0 - L1 compaction going on that blocks other compactions from starting On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal anis...@gmail.com wrote: the some_bits column has about 14-15 bytes of data per key. On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com wrote: Hello, I am inserting about 100 million entries via datastax-java driver to a cassandra cluster of 3 nodes. Table structure is as create keyspace test with replication = {'class': 'NetworkTopologyStrategy', 'DC' : 3}; CREATE TABLE test_bits(id bigint primary key , some_bits text) with gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''}; have 75 threads that are inserting data into the above table with each thread having non over lapping keys. I see that the number of pending tasks via nodetool compactionstats keeps increasing and looks like from nodetool cfstats test.test_bits has SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0], Why is compaction not kicking in ? thanks anishek
Cassandra tombstones being created by updating rows with TTL's
We were chatting to Jon Haddad about a week ago about our tombstone issue using Cassandra 2.0.14. To summarize:
We have a 3 node cluster with replication-factor=3 and compaction = SizeTiered
We use 1 keyspace with 1 table
Each row has about 40 columns
Each row has a TTL of 10 seconds
We insert about 500 rows per second in a prepared batch** (about 3 MB in network overhead)
We query the entire table once per second
**This is to enable consistent data, e.g. the batch is transactional, so we get all queried data from one insert and not a mix of 2 or more.
It seems the rows we insert every second are never deleted by the TTL, or so we thought. After some time we got this message on the query side ### ERROR [ReadStage:91] 2015-04-21 12:27:03,902 SliceQueryFilter.java (line 206) Scanned over 10 tombstones in keyspace.table; query aborted (see tombstone_failure_threshold) ERROR [ReadStage:91] 2015-04-21 12:27:03,931 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:91,5,main] java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException ### So we know tombstones are in fact being created. The solution was to change the table schema and set gc_grace_seconds to 60 seconds. This worked for 20 seconds, then we saw this ### Read 500 live and 3 tombstoned cells in keyspace.table (see tombstone_warn_threshold). 1 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647} ### So every 20 seconds (500 inserts x 20 seconds = 10,000 tombstones). So now we have gc_grace_seconds set to 10 seconds. But it feels very wrong to have it at a low number, especially if we move to a larger cluster. This just won't fly. What are we doing wrong? We shouldn't increase the tombstone threshold as that is extremely dangerous. Best Regards Stephen Walsh This email (including any attachments) is proprietary to Aspect Software, Inc. and may contain information that is confidential. If you have received this message in error, please do not read, copy or forward this message. Please notify the sender immediately, delete it from your system and destroy any copies. You may not further disclose or distribute this email or its attachments.
Re: Handle Write Heavy Loads in Cassandra 2.0.3
Any suggestions or comments on this one?? Thanks Anuj Wadhera Sent from Yahoo Mail on Android From:Anuj Wadehra anujw_2...@yahoo.co.in Date:Mon, 20 Apr, 2015 at 11:51 pm Subject:Re: Handle Write Heavy Loads in Cassandra 2.0.3 Small correction: we are making writes in 5 cf an reading frm one at high speeds. Thanks Anuj Wadehra Sent from Yahoo Mail on Android From:Anuj Wadehra anujw_2...@yahoo.co.in Date:Mon, 20 Apr, 2015 at 7:53 pm Subject:Handle Write Heavy Loads in Cassandra 2.0.3 Hi, Recently, we discovered that millions of mutations were getting dropped on our cluster. Eventually, we solved this problem by increasing the value of memtable_flush_writers from 1 to 3. We usually write 3 CFs simultaneously an one of them has 4 Secondary Indexes. New changes also include: concurrent_compactors: 12 (earlier it was default) compaction_throughput_mb_per_sec: 32(earlier it was default) in_memory_compaction_limit_in_mb: 400 ((earlier it was default 64) memtable_flush_writers: 3 (earlier 1) After, making above changes, our write heavy workload scenarios started giving promotion failed exceptions in gc logs. We have done JVM tuning and Cassandra config changes to solve this: MAX_HEAP_SIZE=12G (Increased Heap to from 8G to reduce fragmentation) HEAP_NEWSIZE=3G JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=2 (We observed that even at SurvivorRatio=4, our survivor space was getting 100% utilized under heavy write load and we thought that minor collections were directly promoting objects to Tenured generation) JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=20 (Lots of objects were moving from Eden to Tenured on each minor collection..may be related to medium life objects related to Memtables and compactions as suggested by heapdump) JVM_OPTS=$JVM_OPTS -XX:ConcGCThreads=20 JVM_OPTS=$JVM_OPTS -XX:+UnlockDiagnosticVMOptions JVM_OPTS=$JVM_OPTS -XX:+UseGCTaskAffinity JVM_OPTS=$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs JVM_OPTS=$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768 JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=3 JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=2000 //though it's default value JVM_OPTS=$JVM_OPTS -XX:+CMSEdenChunksRecordAlways JVM_OPTS=$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled JVM_OPTS=$JVM_OPTS -XX:-UseBiasedLocking JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70 (to avoid concurrent failures we reduced value) Cassandra config: compaction_throughput_mb_per_sec: 24 memtable_total_space_in_mb: 1000 (to make memtable flush frequent.default is 1/4 heap which creates more long lived objects) Questions: 1. Why increasing memtable_flush_writers and in_memory_compaction_limit_in_mb caused promotion failures in JVM? Does more memtable_flush_writers mean more memtables in memory? 2. Still, objects are getting promoted at high speed to Tenured space. CMS is running on Old gen every 4-5 minutes under heavy write load. Around 750+ minor collections of upto 300ms happened in 45 mins. Do you see any problems with new JVM tuning and Cassandra config? Is the justification given against those changes sounds logical? Any suggestions? 3. What is the best practice for reducing heap fragmentation/promotion failure when allocation and promotion rates are high? Thanks Anuj
Re: Cassandra tombstones being created by updating rows with TTL's
If you never delete except by ttl, and always write with the same ttl (or monotonically increasing), you can set gc_grace_seconds to 0. That's what we do. There have been discussions on the list over the last few years re this topic. ml On Tue, Apr 21, 2015 at 11:14 AM, Walsh, Stephen stephen.wa...@aspect.com wrote: We were chatting to Jon Haddena about a week ago about our tombstone issue using Cassandra 2.0.14 To Summarize We have a 3 node cluster with replication-factor=3 and compaction = SizeTiered We use 1 keyspace with 1 table Each row have about 40 columns Each row has a TTL of 10 seconds We insert about 500 rows per second in a prepared batch** (about 3mb in network overhead) We query the entire table once per second **This is too enable consistent data, E.G batch in transactional, so we get all queried data from one insert and not a mix of 2 or more. Seems every second we insert, the rows are never deleted by the TTL, or so we thought. After some time we got this message on the query side ### ERROR [ReadStage:91] 2015-04-21 12:27:03,902 SliceQueryFilter.java (line 206) Scanned over 10 tombstones in keyspace.table; query aborted (see tombstone_failure_threshold) ERROR [ReadStage:91] 2015-04-21 12:27:03,931 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:91,5,main] java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException ### So we know tombstones are infact being created. Solution was to change the table schema and set gc_grace_seconds to run every 60 seconds. This worked for 20 seconds, then we saw this ### Read 500 live and 3 tombstoned cells in keyspace.table (see tombstone_warn_threshold). 1 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647} ### So every 20 seconds (500 inserts x 20 seconds = 10,000 tombstones) So now we have the gc_grace_seconds set to 10 seoncds. But its feels very wrong to have it at a low number, especially if we move to a larger cluster. This just wont fly. What are we doing wrong? We shouldn’t increase the tombstone threshold as that is extremely dangerous. Best Regards Stephen Walsh This email (including any attachments) is proprietary to Aspect Software, Inc. and may contain information that is confidential. If you have received this message in error, please do not read, copy or forward this message. Please notify the sender immediately, delete it from your system and destroy any copies. You may not further disclose or distribute this email or its attachments.
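A minimal sketch of the schema change Michael describes, using placeholder keyspace/table names and the 10-second TTL from the original post; this is only safe if you never delete and always write with the same (or increasing) TTL, and default_time_to_live is optional if the TTL is already set on every insert:

cqlsh -e "ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 0 AND default_time_to_live = 10;"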
Re: LCS Strategy, compaction pending tasks keep increasing
Are you on version 2.1.x? Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo http://linkedin.com/in/carlosjuzarterolo* Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649 www.pythian.com On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal anis...@gmail.com wrote: the some_bits column has about 14-15 bytes of data per key. On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com wrote: Hello, I am inserting about 100 million entries via datastax-java driver to a cassandra cluster of 3 nodes. Table structure is as create keyspace test with replication = {'class': 'NetworkTopologyStrategy', 'DC' : 3}; CREATE TABLE test_bits(id bigint primary key , some_bits text) with gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''}; have 75 threads that are inserting data into the above table with each thread having non over lapping keys. I see that the number of pending tasks via nodetool compactionstats keeps increasing and looks like from nodetool cfstats test.test_bits has SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0], Why is compaction not kicking in ? thanks anishek -- --
RE: Connecting to Cassandra cluster in AWS from local network
Thanks everyone for the suggestions! I have used the following code to create my cluster from my dev environment and it seems to be working perfectly:

cluster = Cluster.builder()
    .addContactPoints(nodes)
    .withAddressTranslater(new AddressTranslater() {
        public InetSocketAddress translate(InetSocketAddress address) {
            String newAddress = null;
            if (address != null && address.getAddress() != null) {
                if (address.getHostName().equals("172.x.x.237")) newAddress = "54.x.x.157";
                if (address.getHostName().equals("172.x.x.170")) newAddress = "54.x.x.208";
                if (address.getHostName().equals("172.x.x.150")) newAddress = "54.x.x.142";
            }
            return new InetSocketAddress(newAddress, address.getPort());
        }
    }).build();

Cheers, Matt From: Russell Bradberry [mailto:rbradbe...@gmail.com] Sent: 20 April 2015 19:06 To: user@cassandra.apache.org Subject: Re: Connecting to Cassandra cluster in AWS from local network I would like to note that this will require all clients to connect over the external IP address. If you have clients within Amazon that need to connect over the private IP address, this would not be possible. If you have a mix of clients that need to connect over the private IP address and the public one, then one of the solutions outlined in https://datastax-oss.atlassian.net/browse/JAVA-145 may be more appropriate. -Russ From: Alex Popescu Reply-To: user@cassandra.apache.org Date: Monday, April 20, 2015 at 2:00 PM To: user Subject: Re: Connecting to Cassandra cluster in AWS from local network You'll have to configure your nodes to: 1. use AWS internal IPs for inter-node connection (check listen_address) and 2. use the AWS public IP for client-to-node connections (check rpc_address) Depending on the setup, there might be other interesting conf options in cassandra.yaml (broadcast_address, listen_interface, rpc_interface). [1]: http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html On Mon, Apr 20, 2015 at 9:50 AM, Jonathan Haddad j...@jonhaddad.com wrote: Ideally you'll be on the same network, but if you can't be, you'll need to use the public ip in listen_address. On Mon, Apr 20, 2015 at 9:47 AM Matthew Johnson matt.john...@algomi.com wrote: Hi all, I have set up a Cassandra cluster with 2.1.4 on some existing AWS boxes, just as a POC. Cassandra servers connect to each other over their internal AWS IP addresses (172.x.x.x) aliased in /etc/hosts as sales1, sales2 and sales3. I connect to it from my local dev environment using the seed's external NAT address (54.x.x.x) aliased in my Windows hosts file as sales3 (my seed).
When I try to connect, it connects fine, and can retrieve some data (I have very limited amounts of data in there, but it seems to retrieve ok), but I also get lots of stacktraces in my log where my dev environment is trying to connect to Cassandra on the internal IP (presumably the Cassandra seed node tells my dev env where to look): *INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host sales3/54.x.x.142:9042 added* *INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.237:9042 added* *INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.170:9042 added* *Connected to cluster: Test Cluster* *Datatacenter: datacenter1; Host: /172.x.x.170; Rack: rack1* *Datatacenter: datacenter1; Host: sales3/54.x.x.142; Rack: rack1* *Datatacenter: datacenter1; Host: /172.x.x.237; Rack: rack1* *DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Connection - Connection[sales3/54.x.x.142:9042-2, inFlight=0, closed=false] Transport initialized and ready* *DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Session - Added connection pool for sales3/54.x.x.142:9042* *DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Connection[/172.x.x.237:9042-1, inFlight=0, closed=false] Error connecting to /172.x.x.237:9042 (connection timed out: /172.x.x.237:9042)* *DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Defuncting connection to /172.x.x.237:9042* *com.datastax.driver.core.TransportException**: [/172.x.x.237:9042] Cannot connect* Does anyone have any experience with connecting to AWS clusters from dev machines? How have you set up your aliases to get around this issue? Current setup in sales3 (seed node) cassandra.yaml: *- seeds: sales3* *listen_address: sales3* *rpc_address: sales3* Current setup in other nodes (eg sales2) cassandra.yaml: *- seeds: sales3* *listen_address: sales2*
Re: LCS Strategy, compaction pending tasks keep increasing
Could it that the app is inserting _duplicate_ keys ? -- Brice On Tue, Apr 21, 2015 at 1:52 PM, Marcus Eriksson krum...@gmail.com wrote: nope, but you can correlate I guess, tools/bin/sstablemetadata gives you sstable level information and, it is also likely that since you get so many L0 sstables, you will be doing size tiered compaction in L0 for a while. On Tue, Apr 21, 2015 at 1:40 PM, Anishek Agarwal anis...@gmail.com wrote: @Marcus I did look and that is where i got the above but it doesnt show any detail about moving from L0 -L1 any specific arguments i should try with ? On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson krum...@gmail.com wrote: you need to look at nodetool compactionstats - there is probably a big L0 - L1 compaction going on that blocks other compactions from starting On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal anis...@gmail.com wrote: the some_bits column has about 14-15 bytes of data per key. On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com wrote: Hello, I am inserting about 100 million entries via datastax-java driver to a cassandra cluster of 3 nodes. Table structure is as create keyspace test with replication = {'class': 'NetworkTopologyStrategy', 'DC' : 3}; CREATE TABLE test_bits(id bigint primary key , some_bits text) with gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''}; have 75 threads that are inserting data into the above table with each thread having non over lapping keys. I see that the number of pending tasks via nodetool compactionstats keeps increasing and looks like from nodetool cfstats test.test_bits has SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0], Why is compaction not kicking in ? thanks anishek
RE: Cassandra tombstones being created by updating rows with TTL's
Maybe thanks Michael, I will give these setting a go, How do you do you periodic node-tool repairs in the situation, for what I read we need to start doing this also. https://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair From: Laing, Michael [mailto:michael.la...@nytimes.com] Sent: 21 April 2015 16:26 To: user@cassandra.apache.org Subject: Re: Cassandra tombstones being created by updating rows with TTL's If you never delete except by ttl, and always write with the same ttl (or monotonically increasing), you can set gc_grace_seconds to 0. That's what we do. There have been discussions on the list over the last few years re this topic. ml On Tue, Apr 21, 2015 at 11:14 AM, Walsh, Stephen stephen.wa...@aspect.commailto:stephen.wa...@aspect.com wrote: We were chatting to Jon Haddena about a week ago about our tombstone issue using Cassandra 2.0.14 To Summarize We have a 3 node cluster with replication-factor=3 and compaction = SizeTiered We use 1 keyspace with 1 table Each row have about 40 columns Each row has a TTL of 10 seconds We insert about 500 rows per second in a prepared batch** (about 3mb in network overhead) We query the entire table once per second **This is too enable consistent data, E.G batch in transactional, so we get all queried data from one insert and not a mix of 2 or more. Seems every second we insert, the rows are never deleted by the TTL, or so we thought. After some time we got this message on the query side ### ERROR [ReadStage:91] 2015-04-21 12:27:03,902 SliceQueryFilter.java (line 206) Scanned over 10 tombstones in keyspace.table; query aborted (see tombstone_failure_threshold) ERROR [ReadStage:91] 2015-04-21 12:27:03,931 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:91,5,main] java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException ### So we know tombstones are infact being created. Solution was to change the table schema and set gc_grace_seconds to run every 60 seconds. This worked for 20 seconds, then we saw this ### Read 500 live and 3 tombstoned cells in keyspace.table (see tombstone_warn_threshold). 1 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647tel:2147483647} ### So every 20 seconds (500 inserts x 20 seconds = 10,000 tombstones) So now we have the gc_grace_seconds set to 10 seoncds. But its feels very wrong to have it at a low number, especially if we move to a larger cluster. This just wont fly. What are we doing wrong? We shouldn’t increase the tombstone threshold as that is extremely dangerous. Best Regards Stephen Walsh This email (including any attachments) is proprietary to Aspect Software, Inc. and may contain information that is confidential. If you have received this message in error, please do not read, copy or forward this message. Please notify the sender immediately, delete it from your system and destroy any copies. You may not further disclose or distribute this email or its attachments. This email (including any attachments) is proprietary to Aspect Software, Inc. and may contain information that is confidential. 
Re: Cassandra tombstones being created by updating rows with TTL's
Discussions previously on the list show why this is not a problem in much more detail. If something changes in your cluster: node down, new node, etc - you run repair for sure. We also run periodic repairs prophylactically. But if you never delete and always ttl by the same amount, you do not have to worry about zombie data being resurrected - the main reason for running repair within gc_grace_seconds. On Tue, Apr 21, 2015 at 11:49 AM, Walsh, Stephen stephen.wa...@aspect.com wrote: Maybe thanks Michael, I will give these setting a go, How do you do you periodic node-tool repairs in the situation, for what I read we need to start doing this also. https://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair *From:* Laing, Michael [mailto:michael.la...@nytimes.com] *Sent:* 21 April 2015 16:26 *To:* user@cassandra.apache.org *Subject:* Re: Cassandra tombstones being created by updating rows with TTL's If you never delete except by ttl, and always write with the same ttl (or monotonically increasing), you can set gc_grace_seconds to 0. That's what we do. There have been discussions on the list over the last few years re this topic. ml On Tue, Apr 21, 2015 at 11:14 AM, Walsh, Stephen stephen.wa...@aspect.com wrote: We were chatting to Jon Haddena about a week ago about our tombstone issue using Cassandra 2.0.14 To Summarize We have a 3 node cluster with replication-factor=3 and compaction = SizeTiered We use 1 keyspace with 1 table Each row have about 40 columns Each row has a TTL of 10 seconds We insert about 500 rows per second in a prepared batch** (about 3mb in network overhead) We query the entire table once per second **This is too enable consistent data, E.G batch in transactional, so we get all queried data from one insert and not a mix of 2 or more. Seems every second we insert, the rows are never deleted by the TTL, or so we thought. After some time we got this message on the query side ### ERROR [ReadStage:91] 2015-04-21 12:27:03,902 SliceQueryFilter.java (line 206) Scanned over 10 tombstones in keyspace.table; query aborted (see tombstone_failure_threshold) ERROR [ReadStage:91] 2015-04-21 12:27:03,931 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:91,5,main] java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException ### So we know tombstones are infact being created. Solution was to change the table schema and set gc_grace_seconds to run every 60 seconds. This worked for 20 seconds, then we saw this ### Read 500 live and 3 tombstoned cells in keyspace.table (see tombstone_warn_threshold). 1 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647} ### So every 20 seconds (500 inserts x 20 seconds = 10,000 tombstones) So now we have the gc_grace_seconds set to 10 seoncds. But its feels very wrong to have it at a low number, especially if we move to a larger cluster. This just wont fly. What are we doing wrong? We shouldn’t increase the tombstone threshold as that is extremely dangerous. Best Regards Stephen Walsh This email (including any attachments) is proprietary to Aspect Software, Inc. 
Is 2.1.5 ready for upgrade?
Hi guys, We have some issues with streaming in 2.1.2. We find that there are a lot of patches in 2.1.5. Is it ready for upgrade? Thanks. -- Dikang
Re: Is 2.1.5 ready for upgrade?
Robert, Can you elaborate more please? Cheers, Brian On Tuesday, April 21, 2015, Robert Coli rc...@eventbrite.com wrote: On Tue, Apr 21, 2015 at 2:25 PM, Dikang Gu dikan...@gmail.com wrote: We have some issues with streaming in 2.1.2. We find that there are a lot of patches in 2.1.5. Is it ready for upgrade? I personally would not run either version in production at this time, but if forced, would prefer 2.1.5 over 2.1.2. =Rob -- Cheers, Brian http://www.integrallis.com
Error while building from source code
Hi, I am trying to build the source bundle downloaded from http://apache.arvixe.com/cassandra/2.1.4/apache-cassandra-2.1.4-src.tar.gz but when I run ant build I get the following error. Any idea why the build is failing? It seems to be looking for the dependency org.apache.cassandra:cassandra-coverage-deps:jar:2.1.4-SNAPSHOT BUILD FAILED /Users/user1/apache-cassandra-2.1.4-src/build.xml:572: Unable to resolve artifact: Missing: -- 1) com.sun:tools:jar:0 Try downloading the file manually from the project website. Then, install it using the command: mvn install:install-file -DgroupId=com.sun -DartifactId=tools -Dversion=0 -Dpackaging=jar -Dfile=/path/to/file Alternatively, if you host your own repository you can deploy the file there: mvn deploy:deploy-file -DgroupId=com.sun -DartifactId=tools -Dversion=0 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id] Path to dependency: 1) org.apache.cassandra:cassandra-coverage-deps:jar:2.1.4-SNAPSHOT 2) net.sourceforge.cobertura:cobertura:jar:2.0.3 3) com.sun:tools:jar:0 -- 1 required artifact is missing. for artifact: org.apache.cassandra:cassandra-coverage-deps:jar:2.1.4-SNAPSHOT from the specified remote repositories: central (http://repo1.maven.org/maven2) Thanks, Jay
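For what it's worth, the missing artifact is the JDK's tools.jar, so the suggested command usually ends up looking like the line below; this assumes a full JDK is installed (tools.jar does not ship with a JRE) and the exact path varies by platform and JDK vendor:

mvn install:install-file -DgroupId=com.sun -DartifactId=tools -Dversion=0 -Dpackaging=jar -Dfile=$JAVA_HOME/lib/tools.jar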
Re: LCS Strategy, compaction pending tasks keep increasing
I want to draw a distinction between a) multithreaded compaction (the jira I just pointed to) and b) concurrent_compactors. I'm not clear on which one you are recommending at this stage. a) Multithreaded compaction is what I warned against in my last note. b) Concurrent compactors is the number of separate compaction tasks (on different tables) that can run simultaneously. You can crank this up without much risk though the old default of num cores was too aggressive (CASSANDRA-7139). 2 seems to be the sweet-spot. Cassandra is, more often than not, disk constrained though this can change for some workloads with SSD's. All the best, [image: datastax_logo.png] http://www.datastax.com/ Sebastián Estévez Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com [image: linkedin.png] https://www.linkedin.com/company/datastax [image: facebook.png] https://www.facebook.com/datastax [image: twitter.png] https://twitter.com/datastax [image: g+.png] https://plus.google.com/+Datastax/about http://feeds.feedburner.com/datastax http://cassandrasummit-datastax.com/ DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world’s most innovative enterprises. Datastax is built to be agile, always-on, and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the worlds most innovative companies such as Netflix, Adobe, Intuit, and eBay. On Tue, Apr 21, 2015 at 5:46 PM, Brice Dutheil brice.duth...@gmail.com wrote: Oh, thank you Sebastian for this input and the ticket reference ! We did notice an increase in CPU usage, but kept the concurrent compaction low enough for our usage, by default it takes the number of cores. We did use a number up to 30% of our available cores. But under heavy load clearly CPU is the bottleneck and we have 2 CPU with 8 hyper threaded cores per node. In a related topic : I’m a bit concerned by datastax communication, usually people talk about IO as being the weak spot, but in our case it’s more about CPU. Fortunately the Moore law doesn’t really apply anymore vertically, now we have have multi core processors *and* the trend is going that way. Yet Datastax terms feels a bit *antiquated* and maybe a bit too much Oracle-y : http://www.datastax.com/enterprise-terms Node licensing is more appropriate for this century. -- Brice On Tue, Apr 21, 2015 at 11:19 PM, Sebastian Estevez sebastian.este...@datastax.com wrote: Do not enable multithreaded compaction. Overhead usually outweighs any benefit. It's removed in 2.1 because it harms more than helps: https://issues.apache.org/jira/browse/CASSANDRA-6142 All the best, [image: datastax_logo.png] http://www.datastax.com/ Sebastián Estévez Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com [image: linkedin.png] https://www.linkedin.com/company/datastax [image: facebook.png] https://www.facebook.com/datastax [image: twitter.png] https://twitter.com/datastax [image: g+.png] https://plus.google.com/+Datastax/about http://feeds.feedburner.com/datastax http://cassandrasummit-datastax.com/ DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world’s most innovative enterprises. Datastax is built to be agile, always-on, and predictably scalable to any size. 
With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the worlds most innovative companies such as Netflix, Adobe, Intuit, and eBay. On Tue, Apr 21, 2015 at 9:06 AM, Brice Dutheil brice.duth...@gmail.com wrote: I’m not sure I get everything about storm stuff, but my understanding of LCS is that compaction count may increase the more one update data (that’s why I was wondering about duplicate primary keys). Another option is that the code is sending too much write request/s to the cassandra cluster. I don’t know haw many nodes you have, but the less node there is the more compactions. Also I’d look at the CPU / load, maybe the config is too *restrictive*, look at the following properties in the cassandra.yaml - compaction_throughput_mb_per_sec, by default the value is 16, you may want to increase it but be careful on mechanical drives, if already in SSD IO is rarely the issue, we have 64 (with SSDs) - multithreaded_compaction by default it is false, we enabled it. Compaction thread are niced, so it shouldn’t be much an issue for serving production r/w requests. But you never know, always keep an eye on IO and CPU. — Brice On Tue, Apr 21, 2015 at 2:48 PM, Anishek Agarwal anis...@gmail.com wrote: sorry i take that back we will modify different keys across threads not the same key, our storm topology is going to use field grouping to get updates for same keys to same set of bolts. On Tue, Apr 21, 2015 at 6:17 PM,
Re: LCS Strategy, compaction pending tasks keep increasing
Do not enable multithreaded compaction. Overhead usually outweighs any benefit. It's removed in 2.1 because it harms more than helps: https://issues.apache.org/jira/browse/CASSANDRA-6142 All the best, [image: datastax_logo.png] http://www.datastax.com/ Sebastián Estévez Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com [image: linkedin.png] https://www.linkedin.com/company/datastax [image: facebook.png] https://www.facebook.com/datastax [image: twitter.png] https://twitter.com/datastax [image: g+.png] https://plus.google.com/+Datastax/about http://feeds.feedburner.com/datastax http://cassandrasummit-datastax.com/ DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world’s most innovative enterprises. Datastax is built to be agile, always-on, and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the worlds most innovative companies such as Netflix, Adobe, Intuit, and eBay. On Tue, Apr 21, 2015 at 9:06 AM, Brice Dutheil brice.duth...@gmail.com wrote: I’m not sure I get everything about storm stuff, but my understanding of LCS is that compaction count may increase the more one update data (that’s why I was wondering about duplicate primary keys). Another option is that the code is sending too much write request/s to the cassandra cluster. I don’t know haw many nodes you have, but the less node there is the more compactions. Also I’d look at the CPU / load, maybe the config is too *restrictive*, look at the following properties in the cassandra.yaml - compaction_throughput_mb_per_sec, by default the value is 16, you may want to increase it but be careful on mechanical drives, if already in SSD IO is rarely the issue, we have 64 (with SSDs) - multithreaded_compaction by default it is false, we enabled it. Compaction thread are niced, so it shouldn’t be much an issue for serving production r/w requests. But you never know, always keep an eye on IO and CPU. — Brice On Tue, Apr 21, 2015 at 2:48 PM, Anishek Agarwal anis...@gmail.com wrote: sorry i take that back we will modify different keys across threads not the same key, our storm topology is going to use field grouping to get updates for same keys to same set of bolts. On Tue, Apr 21, 2015 at 6:17 PM, Anishek Agarwal anis...@gmail.com wrote: @Bruice : I dont think so as i am giving each thread a specific key range with no overlaps this does not seem to be the case now. However we will have to test where we have to modify the same key across threads -- do u think that will cause a problem ? As far as i have read LCS is recommended for such cases. should i just switch back to SizeTiredCompactionStrategy. On Tue, Apr 21, 2015 at 6:13 PM, Brice Dutheil brice.duth...@gmail.com wrote: Could it that the app is inserting _duplicate_ keys ? -- Brice On Tue, Apr 21, 2015 at 1:52 PM, Marcus Eriksson krum...@gmail.com wrote: nope, but you can correlate I guess, tools/bin/sstablemetadata gives you sstable level information and, it is also likely that since you get so many L0 sstables, you will be doing size tiered compaction in L0 for a while. On Tue, Apr 21, 2015 at 1:40 PM, Anishek Agarwal anis...@gmail.com wrote: @Marcus I did look and that is where i got the above but it doesnt show any detail about moving from L0 -L1 any specific arguments i should try with ? 
On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson krum...@gmail.com wrote: you need to look at nodetool compactionstats - there is probably a big L0 - L1 compaction going on that blocks other compactions from starting On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal anis...@gmail.com wrote: the some_bits column has about 14-15 bytes of data per key. On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com wrote: Hello, I am inserting about 100 million entries via datastax-java driver to a cassandra cluster of 3 nodes. Table structure is as create keyspace test with replication = {'class': 'NetworkTopologyStrategy', 'DC' : 3}; CREATE TABLE test_bits(id bigint primary key , some_bits text) with gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''}; have 75 threads that are inserting data into the above table with each thread having non over lapping keys. I see that the number of pending tasks via nodetool compactionstats keeps increasing and looks like from nodetool cfstats test.test_bits has SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0], Why is compaction not kicking in ? thanks anishek
Re: Handle Write Heavy Loads in Cassandra 2.0.3
Hi, I cannot really answer your question as some rock solid truth. When we had problems, we did mainly two things - Analyzed the GC logs (with censum from jClarity, this tool IS really awesome, it’s good investment even better if the production is running other java applications) - Heap dumped cassandra when there was a GC, this helped in narrowing down the actual issue I don’t know precisely how to answer, but : - concurrent_compactors could be lowered to 10, it seems from another thread here that it can be harmful, see https://issues.apache.org/jira/browse/CASSANDRA-6142 - memtable_flush_writers we set it to 2 - compaction_throughput_mb_per_sec could probably be increased, on SSDs that should help - trickle_fsync don’t forget this one too if you’re on SSDs Touching JVM heap parameters can be hazardous, increasing heap may seem like a nice thing, but it can increase GC time in the worst case scenario. Also increasing the MaxTenuringThreshold is probably wrong too, as you probably know it means objects will be copied from Eden to Survivor 0/1 and to the other Survivor on the next collection until that threshold is reached, then it will be copied in Old generation. That means that’s being applied to Memtables, so it *may* mean several copies to be done on each GCs, and memtables are not small objects that could take a little while for an *available* system. Another fact to take account for is that upon each collection the active survivor S0/S1 has to be big enough for the memtable to fit there, and there’s other objects too. So I would rather work on the real cause. rather than GC. One thing brought my attention Though still getting logs saying “compacting large row”. Could it be that the model is based on wide rows ? That could be a problem, for several reasons not limited to compactions. If that is so I’d advise to revise the datamodel -- Brice On Tue, Apr 21, 2015 at 7:53 PM, Anuj Wadehra anujw_2...@yahoo.co.in wrote: Thanks Brice!! We are using Red Hat Linux 6.4..24 cores...64Gb Ram..SSDs in RAID5..CPU are not overloaded even in peak load..I dont think IO is an issue as iostat shows await17 all times..util attrbute in iostat usually increases from 0 to 100..and comes back immediately..m not an expert on analyzing IO but things look ok..We are using STCS..and not using Logged batches..We are making around 12k writes/sec in 5 cf (one with 4 sec index) and 2300 reads/sec on each node of 3 node cluster. 2 CFs have wide rows with max data of around 100mb per row. We have further reduced in_memory_compaction_limit_in_mb to 125.Though still getting logs saying compacting large row. We are planning to upgrade to 2.0.14 as 2.1 is not yet production ready. I would appreciate if you could answer the queries posted in initial mail. Thanks Anuj Wadehra Sent from Yahoo Mail on Android https://overview.mail.yahoo.com/mobile/?.src=Android -- *From*:Brice Dutheil brice.duth...@gmail.com *Date*:Tue, 21 Apr, 2015 at 10:22 pm *Subject*:Re: Handle Write Heavy Loads in Cassandra 2.0.3 This is an intricate matter, I cannot say for sure what are good parameters from the wrong ones, too many things changed at once. However there’s many things to consider - What is your OS ? - Do your nodes have SSDs or mechanical drives ? How many cores do you have ? - Is it the CPUs or IOs that are overloaded ? - What is the write request/s per node and cluster wide ? - What is the compaction strategy of the tables you are writing into ? - Are you using LOGGED BATCH statement. 
Cluster imbalance caused due to #Num_Tokens
Hi, While setting up a cluster for our POC, we set num_tokens: 256 when we installed Cassandra on the 1st node, while on the next 2 nodes, which were added later, we left it blank in cassandra.yaml. This made our cluster unbalanced, with nodetool status showing 99% of the load on one server. Now, even after setting num_tokens to 256 on the other 2 nodes, it does not seem to take effect. The wiki article http://wiki.apache.org/cassandra/VirtualNodes/Balance doesn't seem to provide steps to correct this situation. I read that there was a nodetool balance kind of command in Cassandra 0.7, but not anymore.
UN Node3 23.72 MB 1 0.4% 41a71df-7e6c-40ab-902f-237697eaaf3e rack1
UN Node2 79.35 MB 1 0.5% 98c493b-f661-491e-9d1f-1803f859528b rack1
UN Node1 86.93 MB 256 99.1% a35ccca-556c-4f77-aa6d-7e3dad41ecf8 rack1
Is there something that we can do now to balance the cluster? Regards, Tarun
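One common way out of this situation is to take each single-token node out of the ring, wipe it, set num_tokens, and let it re-bootstrap. This is only a sketch: it assumes the two nodes hold no data you cannot afford to re-stream, and the paths and service commands depend on your install.

    # On Node2, then Node3, one node at a time:
    nodetool decommission        # the node streams its data back to the ring and leaves
    # stop Cassandra, then wipe its local state
    rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
    # in cassandra.yaml set:
    #   num_tokens: 256
    #   (and leave initial_token unset)
    # start Cassandra again; the node re-bootstraps with 256 vnodes

After both nodes have rejoined, nodetool status should show ownership spread roughly evenly across the three nodes.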
Re: LCS Strategy, compaction pending tasks keep increasing
Thanks Brice for the input. I am confused as to how to calculate the value of concurrent_reads; the following is what I found recommended on sites and in the configuration docs: concurrent_reads : some places say 16 x number of drives, others 4 x number of cores. Which of the above should I pick? I have a 40-core CPU with 3 disks (non-SSD), one used for the commitlog and the other two for data directories, and I have 3 nodes in my cluster. I think there are tools out there that measure the max write speed to disk; I am going to run them too to find out the write throughput I can get, to check that I am not trying to overachieve something. Currently we are stuck at 35MBps. @Sebastian the concurrent_compactors is at the default value of 32 for us and I think that should be fine. Since we had a lot of cores I thought it would be better to use multithreaded_compaction, but I think I will try one set with it turned off again. The question is still: how do I find what write load I should aim for per node such that it is able to compact data while inserting? Is it just trial and error, or is there a certain QPS I can target per node? Our business case is:
- a new client comes and we create a new keyspace for him; initially there will be lots of new keys (I think size tiered might work better here)
- as time progresses we are going to update the existing keys very frequently (I think LCS will work better here -- we are going with this strategy for long-term benefit)
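As a rough sizing sketch using the rules of thumb from the stock cassandra.yaml comments (reads scale with data drives, writes with cores), for a box with 2 data disks and 40 cores; treat these as starting points to benchmark, not hard answers:

    concurrent_reads: 32      # ~16 * number of data drives (commitlog disk excluded)
    concurrent_writes: 128    # guideline is ~8 * cores; in practice it is often capped well below 8 * 40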
Re: LCS Strategy, compaction pending tasks keep increasing
Yes I was referring to multithreaded_compaction, but just because we didn't get bitten by this setting doesn't mean it's right, and the jira is a clear indication of that ;) @Anishek that reminds me of these settings to look at as well:
- concurrent_writes and concurrent_reads both need to be adapted to your actual hardware though.
"Cassandra is, more often than not, disk constrained though this can change for some workloads with SSD's." Yes, that is typically the case: SSDs are more and more common, but so are multi-core CPUs, and the trend to multiple cores is not going to stop ; just look at the next Intel *flagship* : Knights Landing http://www.anandtech.com/show/8217/intels-knights-landing-coprocessor-detailed = *72 cores*. Nowadays it is not rare to have boxes with multi-core CPUs; either way, if they are not used because of some IO bottleneck there's no reason to be licensed for them, and if IO is not an issue the CPUs are most probably next in line. Node licensing, on the other hand, is much more about a combination of that plus much more added value like the linear scaling of Cassandra. And I'm not even listing the other nifty integrations that DSE ships with. But on this matter I believe we shouldn't hijack the original thread purpose. — Brice On Wed, Apr 22, 2015 at 12:13 AM, Sebastian Estevez sebastian.este...@datastax.com wrote: I want to draw a distinction between a) multithreaded compaction (the jira I just pointed to) and b) concurrent_compactors. I'm not clear on which one you are recommending at this stage. a) Multithreaded compaction is what I warned against in my last note. b) Concurrent compactors is the number of separate compaction tasks (on different tables) that can run simultaneously. You can crank this up without much risk, though the old default of num cores was too aggressive (CASSANDRA-7139). 2 seems to be the sweet-spot. Cassandra is, more often than not, disk constrained though this can change for some workloads with SSD's. All the best, Sebastián Estévez, Solutions Architect, DataStax
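To make the distinction above concrete, these are the two separate cassandra.yaml knobs being discussed; the values only illustrate the advice in this thread:

    multithreaded_compaction: false   # (a) leave disabled; removed in 2.1 (CASSANDRA-6142)
    concurrent_compactors: 2          # (b) simultaneous compaction tasks; the old default of num cores was too aggressive (CASSANDRA-7139)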
Re: LCS Strategy, compaction pending tasks keep increasing
@Marcus I did look and that is where i got the above but it doesnt show any detail about moving from L0 -L1 any specific arguments i should try with ? On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson krum...@gmail.com wrote: you need to look at nodetool compactionstats - there is probably a big L0 - L1 compaction going on that blocks other compactions from starting On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal anis...@gmail.com wrote: the some_bits column has about 14-15 bytes of data per key. On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com wrote: Hello, I am inserting about 100 million entries via datastax-java driver to a cassandra cluster of 3 nodes. Table structure is as create keyspace test with replication = {'class': 'NetworkTopologyStrategy', 'DC' : 3}; CREATE TABLE test_bits(id bigint primary key , some_bits text) with gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''}; have 75 threads that are inserting data into the above table with each thread having non over lapping keys. I see that the number of pending tasks via nodetool compactionstats keeps increasing and looks like from nodetool cfstats test.test_bits has SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0], Why is compaction not kicking in ? thanks anishek
Network transfer to one node twice as others
Hello, We are using Cassandra 2.0.14 and have a cluster of 3 nodes. I have a writer test (written in Java) that runs 50 threads to populate data into a single table in a single keyspace. When I look at iftop I see that the amount of network transfer on two of the nodes is the same, but on one of the nodes it is almost twice that of the other two. Any reason that would be the case? Thanks Anishek
Re: LCS Strategy, compaction pending tasks keep increasing
nope, but you can correlate I guess, tools/bin/sstablemetadata gives you sstable level information and, it is also likely that since you get so many L0 sstables, you will be doing size tiered compaction in L0 for a while. On Tue, Apr 21, 2015 at 1:40 PM, Anishek Agarwal anis...@gmail.com wrote: @Marcus I did look and that is where i got the above but it doesnt show any detail about moving from L0 -L1 any specific arguments i should try with ? On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson krum...@gmail.com wrote: you need to look at nodetool compactionstats - there is probably a big L0 - L1 compaction going on that blocks other compactions from starting On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal anis...@gmail.com wrote: the some_bits column has about 14-15 bytes of data per key. On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com wrote: Hello, I am inserting about 100 million entries via datastax-java driver to a cassandra cluster of 3 nodes. Table structure is as create keyspace test with replication = {'class': 'NetworkTopologyStrategy', 'DC' : 3}; CREATE TABLE test_bits(id bigint primary key , some_bits text) with gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''}; have 75 threads that are inserting data into the above table with each thread having non over lapping keys. I see that the number of pending tasks via nodetool compactionstats keeps increasing and looks like from nodetool cfstats test.test_bits has SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0], Why is compaction not kicking in ? thanks anishek
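A quick way to eyeball the per-sstable level Marcus refers to; this is only a sketch, the data directory layout, the tool path, and the exact label printed by sstablemetadata can vary by version:

    cd /var/lib/cassandra/data/test/test_bits
    for f in *-Data.db; do
      echo -n "$f  "
      /path/to/cassandra/tools/bin/sstablemetadata "$f" | grep -i "sstable level"
    done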
Re: LCS Strategy, compaction pending tasks keep increasing
the some_bits column has about 14-15 bytes of data per key. On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com wrote: Hello, I am inserting about 100 million entries via datastax-java driver to a cassandra cluster of 3 nodes. Table structure is as create keyspace test with replication = {'class': 'NetworkTopologyStrategy', 'DC' : 3}; CREATE TABLE test_bits(id bigint primary key , some_bits text) with gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''}; have 75 threads that are inserting data into the above table with each thread having non over lapping keys. I see that the number of pending tasks via nodetool compactionstats keeps increasing and looks like from nodetool cfstats test.test_bits has SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0], Why is compaction not kicking in ? thanks anishek
Re: LCS Strategy, compaction pending tasks keep increasing
you need to look at nodetool compactionstats - there is probably a big L0 - L1 compaction going on that blocks other compactions from starting On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal anis...@gmail.com wrote: the some_bits column has about 14-15 bytes of data per key. On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com wrote: Hello, I am inserting about 100 million entries via datastax-java driver to a cassandra cluster of 3 nodes. Table structure is as create keyspace test with replication = {'class': 'NetworkTopologyStrategy', 'DC' : 3}; CREATE TABLE test_bits(id bigint primary key , some_bits text) with gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''}; have 75 threads that are inserting data into the above table with each thread having non over lapping keys. I see that the number of pending tasks via nodetool compactionstats keeps increasing and looks like from nodetool cfstats test.test_bits has SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0], Why is compaction not kicking in ? thanks anishek
LCS Strategy, compaction pending tasks keep increasing
Hello, I am inserting about 100 million entries via the datastax-java driver into a Cassandra cluster of 3 nodes. The table structure is as follows: create keyspace test with replication = {'class': 'NetworkTopologyStrategy', 'DC' : 3}; CREATE TABLE test_bits(id bigint primary key, some_bits text) with gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''}; I have 75 threads inserting data into the above table, with each thread having non-overlapping keys. I see that the number of pending tasks via nodetool compactionstats keeps increasing, and nodetool cfstats shows test.test_bits has SSTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0]. Why is compaction not kicking in? thanks anishek
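For readability, the same schema reformatted as standalone CQL (a sketch; note that 'DC' has to match the datacenter name reported by your snitch, and an empty sstable_compression value disables compression):

    CREATE KEYSPACE test
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC': 3};

    CREATE TABLE test.test_bits (
        id        bigint PRIMARY KEY,
        some_bits text
    ) WITH gc_grace_seconds = 0
      AND compaction  = {'class': 'LeveledCompactionStrategy'}
      AND compression = {'sstable_compression': ''};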
Re: LCS Strategy, compaction pending tasks keep increasing
I am on version 2.0.14, will update once i get the stats up for the writes again On Tue, Apr 21, 2015 at 4:46 PM, Carlos Rolo r...@pythian.com wrote: Are you on version 2.1.x? Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo http://linkedin.com/in/carlosjuzarterolo* Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649 www.pythian.com On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal anis...@gmail.com wrote: the some_bits column has about 14-15 bytes of data per key. On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com wrote: Hello, I am inserting about 100 million entries via datastax-java driver to a cassandra cluster of 3 nodes. Table structure is as create keyspace test with replication = {'class': 'NetworkTopologyStrategy', 'DC' : 3}; CREATE TABLE test_bits(id bigint primary key , some_bits text) with gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''}; have 75 threads that are inserting data into the above table with each thread having non over lapping keys. I see that the number of pending tasks via nodetool compactionstats keeps increasing and looks like from nodetool cfstats test.test_bits has SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0], Why is compaction not kicking in ? thanks anishek --
Re: Cassandra tombstones being created by updating rows with TTL's
Hmm - we read/write with Local Quorum always - I'd recommend that as that is your 'consistency' defense. We use python, so I am not familiar with the java driver - but 'file not found' indicates something is inconsistent. On Tue, Apr 21, 2015 at 12:22 PM, Walsh, Stephen stephen.wa...@aspect.com wrote: Thanks for all your help Michael, Our data will change through the day, so data with a TTL will eventually get dropped, and new data will appear. I’d imagine the entire table maybe expire and start over 7-10 times a day. But on the GC topic, now java Driver now gives this error on the query I also get “Request did not complete within rpc_timeout.” In cqlsh. # com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded) at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.Responses$Error.asException(Responses.java:100) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:140) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:249) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:433) ~[cassandra-driver-core-2.1.4.jar:na] Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:61) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:38) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:168) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[cassandra-driver-core-2.1.4.jar:na] # These queries where taking about 1 second to run when the gc was at 10 seconds (same duration as the TTL). Also seeing a lot of this this stuff in the log file # ERROR [ReadStage:71] 2015-04-21 17:11:07,597 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:71,5,main] java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db (No such file or directory) at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db Maybe this is a 1 step back 2 steps forward approach? Any ideas? *From:* Laing, Michael [mailto:michael.la...@nytimes.com] *Sent:* 21 April 2015 17:09 *To:* user@cassandra.apache.org *Subject:* Re: Cassandra tombstones being created by updating rows with TTL's Discussions previously on the list show why this is not a problem in much more detail. 
If something changes in your cluster: node down, new node, etc - you run repair for sure. We also run periodic repairs prophylactically. But if you never delete and always ttl by the same amount, you do not have to worry about zombie data being resurrected - the main reason for running repair within gc_grace_seconds. On Tue, Apr 21, 2015 at 11:49 AM, Walsh, Stephen stephen.wa...@aspect.com wrote: Maybe thanks Michael, I will give these setting a go, How do you do you periodic node-tool repairs in the situation, for what I read we need to start doing this also. https://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair *From:* Laing, Michael [mailto:michael.la...@nytimes.com] *Sent:* 21 April 2015 16:26 *To:* user@cassandra.apache.org *Subject:* Re: Cassandra tombstones being created by updating rows with TTL's If you never delete except by ttl, and always write with the same ttl (or monotonically increasing), you can set gc_grace_seconds to 0. That's what we do. There have been discussions on the
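A minimal java-driver 2.1 sketch of Michael's earlier point in this thread about reading and writing at LOCAL_QUORUM; the contact point, keyspace and query are placeholders:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.Session;

    public class LocalQuorumExample {
        public static void main(String[] args) {
            // Default every statement to LOCAL_QUORUM instead of the driver default (ONE)
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")
                    .withQueryOptions(new QueryOptions()
                            .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
                    .build();
            Session session = cluster.connect("my_keyspace");
            session.execute("SELECT * FROM my_table LIMIT 1");
            cluster.close();
        }
    }

Individual statements can still override this with Statement.setConsistencyLevel(...) where a weaker or stronger level is wanted.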
Re: Handle Write Heavy Loads in Cassandra 2.0.3
This is an intricate matter, I cannot say for sure what are good parameters from the wrong ones, too many things changed at once. However there’s many things to consider - What is your OS ? - Do your nodes have SSDs or mechanical drives ? How many cores do you have ? - Is it the CPUs or IOs that are overloaded ? - What is the write request/s per node and cluster wide ? - What is the compaction strategy of the tables you are writing into ? - Are you using LOGGED BATCH statement. With heavy writes, it is *NOT* recommend to use LOGGED BATCH statements. In our 2.0.14 cluster we have experimented node unavailability due to long Full GC pauses. We discovered bogus legacy data, a single outlier was so wrong that it updated hundred thousand time the same CQL rows with duplicate data. Given the tables we were writing to were configured to use LCS, this resulted in keeping Memtables in memory long enough to promote them in the old generation (the MaxTenuringThreshold default is 1). Handling this data proved to be the thing to fix, with default GC settings the cluster (10 nodes) handle 39 write requests/s. Note Memtables are allocated on heap with 2.0.x. With 2.1.x they will be allocated off-heap. -- Brice On Tue, Apr 21, 2015 at 5:12 PM, Anuj Wadehra anujw_2...@yahoo.co.in wrote: Any suggestions or comments on this one?? Thanks Anuj Wadhera Sent from Yahoo Mail on Android https://overview.mail.yahoo.com/mobile/?.src=Android -- *From*:Anuj Wadehra anujw_2...@yahoo.co.in *Date*:Mon, 20 Apr, 2015 at 11:51 pm *Subject*:Re: Handle Write Heavy Loads in Cassandra 2.0.3 Small correction: we are making writes in 5 cf an reading frm one at high speeds. Thanks Anuj Wadehra Sent from Yahoo Mail on Android https://overview.mail.yahoo.com/mobile/?.src=Android -- *From*:Anuj Wadehra anujw_2...@yahoo.co.in *Date*:Mon, 20 Apr, 2015 at 7:53 pm *Subject*:Handle Write Heavy Loads in Cassandra 2.0.3 Hi, Recently, we discovered that millions of mutations were getting dropped on our cluster. Eventually, we solved this problem by increasing the value of memtable_flush_writers from 1 to 3. We usually write 3 CFs simultaneously an one of them has 4 Secondary Indexes. New changes also include: concurrent_compactors: 12 (earlier it was default) compaction_throughput_mb_per_sec: 32(earlier it was default) in_memory_compaction_limit_in_mb: 400 ((earlier it was default 64) memtable_flush_writers: 3 (earlier 1) After, making above changes, our write heavy workload scenarios started giving promotion failed exceptions in gc logs. 
We have done JVM tuning and Cassandra config changes to solve this:
MAX_HEAP_SIZE=12G (increased heap from 8G to reduce fragmentation)
HEAP_NEWSIZE=3G
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=2 (We observed that even at SurvivorRatio=4, our survivor space was getting 100% utilized under heavy write load and we thought that minor collections were directly promoting objects to the Tenured generation)
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=20 (Lots of objects were moving from Eden to Tenured on each minor collection..may be related to medium-life objects tied to Memtables and compactions, as suggested by the heap dump)
JVM_OPTS=$JVM_OPTS -XX:ConcGCThreads=20
JVM_OPTS=$JVM_OPTS -XX:+UnlockDiagnosticVMOptions
JVM_OPTS=$JVM_OPTS -XX:+UseGCTaskAffinity
JVM_OPTS=$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs
JVM_OPTS=$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768
JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark
JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=3
JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=2000 // though it's the default value
JVM_OPTS=$JVM_OPTS -XX:+CMSEdenChunksRecordAlways
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled
JVM_OPTS=$JVM_OPTS -XX:-UseBiasedLocking
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70 (to avoid concurrent mode failures we reduced the value)
Cassandra config:
compaction_throughput_mb_per_sec: 24
memtable_total_space_in_mb: 1000 (to make memtable flushes frequent; the default is 1/4 of the heap, which creates more long-lived objects)
Questions:
1. Why did increasing memtable_flush_writers and in_memory_compaction_limit_in_mb cause promotion failures in the JVM? Does more memtable_flush_writers mean more memtables in memory?
2. Still, objects are getting promoted at high speed to the Tenured space. CMS is running on the Old gen every 4-5 minutes under heavy write load. Around 750+ minor collections of up to 300ms happened in 45 mins. Do you see any problems with the new JVM tuning and Cassandra config? Does the justification given for those changes sound logical? Any suggestions?
3. What is the best practice for reducing heap fragmentation/promotion failure when allocation and promotion rates are high?
Thanks Anuj
Re: CQL 3.x Update ...USING TIMESTAMP...
On Mon, Apr 20, 2015 at 4:02 PM, Sachin Nikam skni...@gmail.com wrote: #1. We have 2 data centers located close by, with plans to expand to more data centers which are even further away geographically. #2. How will this impact lightweight transactions when there is a high level of network contention for cross-data-center traffic. If you are only expecting updates to a given document from one DC, then you could use LOCAL_SERIAL for the LWT operations. If you can't do that, then LWTs are probably not a great option for you. #3. Do you know of any real examples where companies have used lightweight transactions with multi-data center traffic. I don't know who's doing that off the top of my head, but I imagine they're using LOCAL_SERIAL. -- Tyler Hobbs DataStax http://datastax.com/
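For illustration, a lightweight transaction is just a conditional write; the table and columns below are made up:

    -- the IF clause turns this into a Paxos round (LWT)
    UPDATE documents
    SET body = 'v2', version = 2
    WHERE id = 42
    IF version = 1;

With the DataStax Java driver you would pin the Paxos phase to the local DC by calling setSerialConsistencyLevel(ConsistencyLevel.LOCAL_SERIAL) on the statement; the statement's regular consistency level still governs the commit of the accepted write.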
Re: Is 2.1.5 ready for upgrade?
On Tue, Apr 21, 2015 at 2:25 PM, Dikang Gu dikan...@gmail.com wrote: We have some issues with streaming in 2.1.2. We find that there are a lot of patches in 2.1.5. Is it ready for upgrade? I personally would not run either version in production at this time, but if forced, would prefer 2.1.5 over 2.1.2. =Rob
Re: LCS Strategy, compaction pending tasks keep increasing
Oh, thank you Sebastian for this input and the ticket reference ! We did notice an increase in CPU usage, but kept the concurrent compaction low enough for our usage; by default it takes the number of cores. We did use a number up to 30% of our available cores. But under heavy load clearly CPU is the bottleneck, and we have 2 CPUs with 8 hyper-threaded cores per node. On a related topic : I'm a bit concerned by DataStax communication; usually people talk about IO as being the weak spot, but in our case it's more about CPU. Fortunately the Moore law doesn't really apply anymore vertically; now we have multi-core processors *and* the trend is going that way. Yet the DataStax terms feel a bit *antiquated* and maybe a bit too much Oracle-y : http://www.datastax.com/enterprise-terms Node licensing is more appropriate for this century. -- Brice On Tue, Apr 21, 2015 at 11:19 PM, Sebastian Estevez sebastian.este...@datastax.com wrote: Do not enable multithreaded compaction. Overhead usually outweighs any benefit. It's removed in 2.1 because it harms more than helps: https://issues.apache.org/jira/browse/CASSANDRA-6142 All the best, Sebastián Estévez, Solutions Architect, DataStax On Tue, Apr 21, 2015 at 9:06 AM, Brice Dutheil brice.duth...@gmail.com wrote: I'm not sure I get everything about the storm stuff, but my understanding of LCS is that the compaction count may increase the more one updates data (that's why I was wondering about duplicate primary keys). Another option is that the code is sending too many write requests/s to the cassandra cluster. I don't know how many nodes you have, but the fewer nodes there are, the more compactions. Also I'd look at the CPU / load; maybe the config is too *restrictive*. Look at the following properties in cassandra.yaml:
- compaction_throughput_mb_per_sec, by default the value is 16; you may want to increase it but be careful on mechanical drives. If already on SSDs, IO is rarely the issue; we have 64 (with SSDs)
- multithreaded_compaction, by default it is false; we enabled it. Compaction threads are niced, so it shouldn't be much of an issue for serving production r/w requests. But you never know, always keep an eye on IO and CPU.
— Brice On Tue, Apr 21, 2015 at 2:48 PM, Anishek Agarwal anis...@gmail.com wrote: sorry I take that back, we will modify different keys across threads, not the same key; our storm topology is going to use field grouping to get updates for the same keys to the same set of bolts. On Tue, Apr 21, 2015 at 6:17 PM, Anishek Agarwal anis...@gmail.com wrote: @Brice : I don't think so, as I am giving each thread a specific key range with no overlaps; this does not seem to be the case now.
However we will have to test the case where we modify the same key across threads -- do you think that will cause a problem? As far as I have read, LCS is recommended for such cases. Should I just switch back to SizeTieredCompactionStrategy? On Tue, Apr 21, 2015 at 6:13 PM, Brice Dutheil brice.duth...@gmail.com wrote: Could it be that the app is inserting _duplicate_ keys ? -- Brice
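If you do fall back to size-tiered for the bulk-load phase, it is only a table property change (a sketch; switching strategies triggers its own re-compaction work, so it is worth testing first):

    ALTER TABLE test.test_bits
      WITH compaction = {'class': 'SizeTieredCompactionStrategy'};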
RE: Cassandra tombstones being created by updating rows with TTL's
Thanks for all your help Michael, Our data will change through the day, so data with a TTL will eventually get dropped, and new data will appear. I’d imagine the entire table maybe expire and start over 7-10 times a day. But on the GC topic, now java Driver now gives this error on the query I also get “Request did not complete within rpc_timeout.” In cqlsh. # com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded) at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.Responses$Error.asException(Responses.java:100) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:140) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:249) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:433) ~[cassandra-driver-core-2.1.4.jar:na] Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:61) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:38) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:168) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66) ~[cassandra-driver-core-2.1.4.jar:na] at com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[cassandra-driver-core-2.1.4.jar:na] # These queries where taking about 1 second to run when the gc was at 10 seconds (same duration as the TTL). Also seeing a lot of this this stuff in the log file # ERROR [ReadStage:71] 2015-04-21 17:11:07,597 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:71,5,main] java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db (No such file or directory) at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db Maybe this is a 1 step back 2 steps forward approach? Any ideas? From: Laing, Michael [mailto:michael.la...@nytimes.com] Sent: 21 April 2015 17:09 To: user@cassandra.apache.org Subject: Re: Cassandra tombstones being created by updating rows with TTL's Discussions previously on the list show why this is not a problem in much more detail. If something changes in your cluster: node down, new node, etc - you run repair for sure. We also run periodic repairs prophylactically. But if you never delete and always ttl by the same amount, you do not have to worry about zombie data being resurrected - the main reason for running repair within gc_grace_seconds. 
On Tue, Apr 21, 2015 at 11:49 AM, Walsh, Stephen stephen.wa...@aspect.com wrote: Many thanks Michael, I will give these settings a go. How do you do your periodic nodetool repairs in this situation? From what I read we need to start doing this also. https://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair From: Laing, Michael [mailto:michael.la...@nytimes.com] Sent: 21 April 2015 16:26 To: user@cassandra.apache.org Subject: Re: Cassandra tombstones being created by updating rows with TTL's If you never delete except by ttl, and always write with the same ttl (or monotonically increasing), you can set gc_grace_seconds to 0. That's what we do. There have been discussions on the list over the last few years re this topic. ml On Tue, Apr 21, 2015 at 11:14 AM, Walsh, Stephen stephen.wa...@aspect.com wrote: We were chatting to Jon Haddad about a week ago about our tombstone issue using Cassandra 2.0.14. To summarize: We have a 3 node cluster with replication-factor=3 and compaction = SizeTiered
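A minimal sketch of the pattern Michael describes (TTL-only writes, no explicit deletes, gc_grace_seconds at 0); the keyspace, table and TTL value are illustrative:

    CREATE TABLE state.current_values (
        id      text PRIMARY KEY,
        payload text
    ) WITH gc_grace_seconds = 0;

    -- every write carries the same TTL; rows expire instead of being deleted
    INSERT INTO state.current_values (id, payload)
    VALUES ('abc', '...') USING TTL 10;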