[jira] [Commented] (CASSANDRA-7957) improve active/pending compaction monitoring
[ https://issues.apache.org/jira/browse/CASSANDRA-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177915#comment-15177915 ] Nikolai Grigoriev commented on CASSANDRA-7957: -- OK, I see your point. Well, then what I was thinking about is simply not doable, I guess. > improve active/pending compaction monitoring > > > Key: CASSANDRA-7957 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7957 > Project: Cassandra > Issue Type: Improvement > Components: Tools >Reporter: Nikolai Grigoriev >Priority: Minor > > I think it might be useful to create a way to see what sstables are being > compacted into what new sstable. Something like an extension of "nodetool > compactionstats". I think it would be easier with this feature to > troubleshoot and understand how compactions are happening on your data. Not > sure how it is useful in everyday life but I could use such a feature when > dealing with CASSANDRA-7949. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7957) improve active/pending compaction monitoring
[ https://issues.apache.org/jira/browse/CASSANDRA-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175833#comment-15175833 ] Nikolai Grigoriev commented on CASSANDRA-7957: -- I was talking specifically about the *current* status. That's why I mentioned "nodetool compactionstats". I think that in case of Leveled Compaction we could expose a bit more data to make nodetool more informative. Here is the use case. I look at "nodetool compactionstats" and I see that my compaction process is doing almost nothing, like doing one compaction. At the same time, I see hundreds of pending compactions. What I was asking for is the additional information that shows what is in that list of pending compactions and what is blocked by what. I do not think that with the default log settings you can get this info today easily. If you look at CASSANDRA-7949 you will probably better understand why I was looking for this information. > improve active/pending compaction monitoring > > > Key: CASSANDRA-7957 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7957 > Project: Cassandra > Issue Type: Improvement > Components: Tools >Reporter: Nikolai Grigoriev >Priority: Minor > > I think it might be useful to create a way to see what sstables are being > compacted into what new sstable. Something like an extension of "nodetool > compactionstats". I think it would be easier with this feature to > troubleshoot and understand how compactions are happening on your data. Not > sure how it is useful in everyday life but I could use such a feature when > dealing with CASSANDRA-7949. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
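As a side note, the *active* half of this picture can already be pulled programmatically over JMX. Below is a minimal sketch, assuming the org.apache.cassandra.db:type=CompactionManager MBean and its Compactions attribute behave as in Cassandra 2.0.x; the host, port and the exact keys of each returned map are assumptions, not taken from this ticket:

{code}
import java.util.List;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListActiveCompactions
{
    public static void main(String[] args) throws Exception
    {
        // 7199 is Cassandra's default JMX port; adjust host/port for your node.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName name = new ObjectName("org.apache.cassandra.db:type=CompactionManager");
            // One map per currently running compaction task (keyspace, table, progress, ...).
            @SuppressWarnings("unchecked")
            List<Map<String, String>> compactions =
                    (List<Map<String, String>>) mbs.getAttribute(name, "Compactions");
            for (Map<String, String> task : compactions)
                System.out.println(task);
        }
        finally
        {
            connector.close();
        }
    }
}
{code}

What the comment asks for - a breakdown of the *pending* queue and what is blocked by what - is not exposed this way; only a pending task count is available, which is what makes the request non-trivial.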
[jira] [Created] (CASSANDRA-8740) java.lang.AssertionError when reading saved cache
Nikolai Grigoriev created CASSANDRA-8740: Summary: java.lang.AssertionError when reading saved cache Key: CASSANDRA-8740 URL: https://issues.apache.org/jira/browse/CASSANDRA-8740 Project: Cassandra Issue Type: Bug Components: Core Environment: OEL 6.5, DSE 4.6.0, Cassandra 2.0.11.83 Reporter: Nikolai Grigoriev I have started seeing it recently. Not sure from which version but now it happens relatively often on some of my nodes. {code} INFO [main] 2015-02-04 18:18:09,253 ColumnFamilyStore.java (line 249) Initializing duo_xxx INFO [main] 2015-02-04 18:18:09,254 AutoSavingCache.java (line 114) reading saved cache /var/lib/cassandra/saved_caches/duo_xxx-RowCache-b.db ERROR [main] 2015-02-04 18:18:09,256 CassandraDaemon.java (line 513) Exception encountered during startup java.lang.AssertionError at org.apache.cassandra.cache.SerializingCacheProvider$RowCacheSerializer.serialize(SerializingCacheProvider.java:41) at org.apache.cassandra.cache.SerializingCacheProvider$RowCacheSerializer.serialize(SerializingCacheProvider.java:37) at org.apache.cassandra.cache.SerializingCache.serialize(SerializingCache.java:118) at org.apache.cassandra.cache.SerializingCache.put(SerializingCache.java:177) at org.apache.cassandra.cache.InstrumentingCache.put(InstrumentingCache.java:44) at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:130) at org.apache.cassandra.db.ColumnFamilyStore.initRowCache(ColumnFamilyStore.java:592) at org.apache.cassandra.db.Keyspace.open(Keyspace.java:119) at org.apache.cassandra.db.Keyspace.open(Keyspace.java:92) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:305) at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:419) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496) at com.datastax.bdp.server.DseDaemon.main(DseDaemon.java:659) INFO [Thread-2] 2015-02-04 18:18:09,259 DseDaemon.java (line 505) DSE shutting down... ERROR [Thread-2] 2015-02-04 18:18:09,279 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-2,5,main] java.lang.AssertionError at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1274) at com.datastax.bdp.gms.DseState.setActiveStatus(DseState.java:171) at com.datastax.bdp.server.DseDaemon.stop(DseDaemon.java:506) at com.datastax.bdp.server.DseDaemon$1.run(DseDaemon.java:408) INFO [main] 2015-02-04 18:18:49,144 CassandraDaemon.java (line 135) Logging initialized INFO [main] 2015-02-04 18:18:49,169 DseDaemon.java (line 382) DSE version: 4.6.0 {code} Cassandra version: 2.0.11.83 (DSE 4.6.0) Looks like similar issues were reported and fixed in the past - like CASSANDRA-6325. Maybe I am missing something, but I think that Cassandra should not crash and stop at startup if it cannot read a saved cache. An unreadable cache does not make the node inoperable and does not necessarily indicate severe data corruption. I have applied a small change to my cluster config, restarted it and 30% of my nodes did not start because of that. Of course the solution is simple, but it requires going to every node that failed to start, wiping the cache and starting it again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
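What is being asked for boils down to a simple defensive pattern: treat an unreadable saved cache as a cold cache rather than as a fatal startup error. The sketch below only illustrates that pattern; the CacheReader hook is hypothetical and this is not Cassandra's actual AutoSavingCache code:

{code}
import java.io.File;
import java.io.IOException;
import java.util.logging.Logger;

public class SavedCacheLoader
{
    private static final Logger logger = Logger.getLogger(SavedCacheLoader.class.getName());

    /** Hypothetical hook standing in for the actual cache deserialization, which may fail on corrupt data. */
    public interface CacheReader
    {
        int load(File savedCache) throws IOException;
    }

    /**
     * Try to warm the cache from disk; on any failure log a warning, drop the
     * corrupt file and continue with a cold cache instead of failing startup.
     */
    public static int loadSavedCacheSafely(File savedCache, CacheReader reader)
    {
        if (!savedCache.exists())
            return 0;
        try
        {
            return reader.load(savedCache);
        }
        catch (IOException | RuntimeException | AssertionError e)
        {
            logger.warning("Could not read saved cache " + savedCache + " (" + e + "), starting with an empty cache");
            if (!savedCache.delete())
                logger.warning("Could not delete unreadable saved cache " + savedCache);
            return 0;
        }
    }
}
{code}

The trade-off is only a cold row cache on that node until it warms up again, which is exactly what the manual workaround (wipe the cache and restart) produces anyway.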
[jira] [Reopened] (CASSANDRA-6325) AssertionError on startup reading saved Serializing row cache
[ https://issues.apache.org/jira/browse/CASSANDRA-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev reopened CASSANDRA-6325: -- I have started seeing it recently. Not sure from which version but now it happens relatively often on some of my nodes. {code} INFO [main] 2015-02-04 18:18:09,253 ColumnFamilyStore.java (line 249) Initializing duo_xxx INFO [main] 2015-02-04 18:18:09,254 AutoSavingCache.java (line 114) reading saved cache /var/lib/cassandra/saved_caches/duo_xxx-RowCache-b.db ERROR [main] 2015-02-04 18:18:09,256 CassandraDaemon.java (line 513) Exception encountered during startup java.lang.AssertionError at org.apache.cassandra.cache.SerializingCacheProvider$RowCacheSerializer.serialize(SerializingCacheProvider.java:41) at org.apache.cassandra.cache.SerializingCacheProvider$RowCacheSerializer.serialize(SerializingCacheProvider.java:37) at org.apache.cassandra.cache.SerializingCache.serialize(SerializingCache.java:118) at org.apache.cassandra.cache.SerializingCache.put(SerializingCache.java:177) at org.apache.cassandra.cache.InstrumentingCache.put(InstrumentingCache.java:44) at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:130) at org.apache.cassandra.db.ColumnFamilyStore.initRowCache(ColumnFamilyStore.java:592) at org.apache.cassandra.db.Keyspace.open(Keyspace.java:119) at org.apache.cassandra.db.Keyspace.open(Keyspace.java:92) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:305) at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:419) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496) at com.datastax.bdp.server.DseDaemon.main(DseDaemon.java:659) INFO [Thread-2] 2015-02-04 18:18:09,259 DseDaemon.java (line 505) DSE shutting down... ERROR [Thread-2] 2015-02-04 18:18:09,279 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-2,5,main] java.lang.AssertionError at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1274) at com.datastax.bdp.gms.DseState.setActiveStatus(DseState.java:171) at com.datastax.bdp.server.DseDaemon.stop(DseDaemon.java:506) at com.datastax.bdp.server.DseDaemon$1.run(DseDaemon.java:408) INFO [main] 2015-02-04 18:18:49,144 CassandraDaemon.java (line 135) Logging initialized INFO [main] 2015-02-04 18:18:49,169 DseDaemon.java (line 382) DSE version: 4.6.0 {code} Cassandra version: 2.0.11.83 (DSE 4.6.0) AssertionError on startup reading saved Serializing row cache - Key: CASSANDRA-6325 URL: https://issues.apache.org/jira/browse/CASSANDRA-6325 Project: Cassandra Issue Type: Bug Components: Core Environment: upgrade from 1.2.9ish to 1.2.11ish Reporter: Chris Burroughs Assignee: Mikhail Stepura Priority: Minor Fix For: 1.2.12, 2.0.3 Attachments: 6325-v2.txt, CASSANDRA-1.2-6325.patch I don't see any reason this could have to do with the upgrade, but don't have a large enough non-prod cluster to just keep restarting on. Occurred on roughly 2 out of 100 restarted nodes. 
{noformat} ERROR [main] 2013-11-08 14:40:13,535 CassandraDaemon.java (line 482) Exception encountered during startup java.lang.AssertionError at org.apache.cassandra.cache.SerializingCacheProvider$RowCacheSerializer.serialize(SerializingCacheProvider.java:41) at org.apache.cassandra.cache.SerializingCacheProvider$RowCacheSerializer.serialize(SerializingCacheProvider.java:37) at org.apache.cassandra.cache.SerializingCache.serialize(SerializingCache.java:118) at org.apache.cassandra.cache.SerializingCache.put(SerializingCache.java:176) at org.apache.cassandra.cache.InstrumentingCache.put(InstrumentingCache.java:44) at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:156) at org.apache.cassandra.db.ColumnFamilyStore.initRowCache(ColumnFamilyStore.java:444) at org.apache.cassandra.db.Table.open(Table.java:114) at org.apache.cassandra.db.Table.open(Table.java:87) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:278) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:465) {noformat} I have the files if there is any useful analysis that can be run. Looked 'normal' to a cursory `less` inspection. Possibly related: CASSANDRA-4463 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8301) Create a tool that given a bunch of sstables creates a decent sstable leveling
[ https://issues.apache.org/jira/browse/CASSANDRA-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227931#comment-14227931 ] Nikolai Grigoriev commented on CASSANDRA-8301: -- Oh...I think I see what you mean. I only created the situation where nothing overlaps at each level - but I've done nothing to respect this rule about the target number of overlapping sstables between the levels. So, if I understand correctly, this will (or may - depending on how lucky I am) result in slower promotion of the sstables to the upper levels, right? Yes, I was checking the logs carefully to see the result of my manipulations - the only error I saw was about the keys out-of-order in a single sstable file - this could not be caused by my re-leveling. What I observe now is that the remaining ~280 pending compactions go very slowly, there is quite a bit of sstables at level 0. Under normal traffic this number seems to be floating around ~600 and probably even increasing. Each compaction grabs some but while it is working new ones get created :) I think new sstables get created a bit faster than they are compacted and promoted. Could it be due to bad leveling? Regarding initial set of L0 sstables...In my case I had 79 sstables with the token ranges like -9010847458915378120,9190536470441980462. I believe they are original L0 sstables from other machines. I think for those there is no choice but to put them in L0, otherwise they will overlap with all other sstables. I think I will try to implement that algorithm differently. So, just to confirm I get it right: - no overlap allowed at any level except L0 - for each sstable at level N there should be no more than 10 at level N+1 - anything that does not fit goes to L0 - sstables with large token ranges have to go to L0 anyway Interesting...these first two rules most likely create a number of different possible combinations. Create a tool that given a bunch of sstables creates a decent sstable leveling Key: CASSANDRA-8301 URL: https://issues.apache.org/jira/browse/CASSANDRA-8301 Project: Cassandra Issue Type: Improvement Reporter: Marcus Eriksson In old versions of cassandra (i.e. not trunk/3.0), when bootstrapping a new node, you will end up with a ton of files in L0 and it might be extremely painful to get LCS to compact into a new leveling We could probably exploit the fact that we have many non-overlapping sstables in L0, and offline-bump those sstables into higher levels. It does not need to be perfect, just get the majority of the data into L1+ without creating overlaps. So, suggestion is to create an offline tool that looks at the range each sstable covers and tries to bump it as high as possible in the leveling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
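To make the rules summarized in the comment above concrete, here is a rough validity check for a proposed offline leveling. It is only an illustration of those constraints; the SSTableRange type and the maxFanOut parameter are assumptions of this sketch, not an existing Cassandra API:

{code}
import java.util.List;

public class LevelingChecker
{
    /** Token bounds of one sstable plus the level it was assigned to (left <= right, bounds inclusive). */
    public static class SSTableRange
    {
        public final long left;
        public final long right;
        public final int level;

        public SSTableRange(long left, long right, int level)
        {
            this.left = left;
            this.right = right;
            this.level = level;
        }

        public boolean overlaps(SSTableRange other)
        {
            return left <= other.right && other.left <= right;
        }
    }

    /** Rule 1: above L0, sstables on the same level must not overlap. */
    public static boolean noOverlapWithinLevels(List<SSTableRange> tables)
    {
        for (SSTableRange a : tables)
            for (SSTableRange b : tables)
                if (a != b && a.level > 0 && a.level == b.level && a.overlaps(b))
                    return false;
        return true;
    }

    /** Rule 2: an sstable at level N should overlap at most maxFanOut (~10) sstables at level N + 1. */
    public static boolean fanOutRespected(List<SSTableRange> tables, int maxFanOut)
    {
        for (SSTableRange a : tables)
        {
            if (a.level == 0)
                continue;
            int overlapping = 0;
            for (SSTableRange b : tables)
                if (b.level == a.level + 1 && a.overlaps(b))
                    overlapping++;
            if (overlapping > maxFanOut)
                return false;
        }
        return true;
    }
}
{code}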
[jira] [Commented] (CASSANDRA-8301) Create a tool that given a bunch of sstables creates a decent sstable leveling
[ https://issues.apache.org/jira/browse/CASSANDRA-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226614#comment-14226614 ] Nikolai Grigoriev commented on CASSANDRA-8301: -- I have attempted to write a simple prototype (very ugly :) ) of such a tool. I am very interested in it because I do suffer from that problem. In fact, without such a tool I simply cannot bootstrap a node. I have tried and the node *never* recovers. So, anyway, I have tried my prototype on a freshly bootstrapped node and it seems to be working. Instead of initial 7,5K pending compactions I have got only about 600, few hours later it is down to ~450 and seems to be going down. cfstats also look quite good (to me ;) ): {code} SSTable count: 6311 SSTables in each level: [571/4, 10, 80, 1411/1000, 4239, 0, 0, 0, 0] {code} I do have some sstables at L0 because the node is taking normal (heavy) traffic at the same time. But this number is already down from ~700 original. I think I could give it a try to make the prototype tool less ugly and submit it here, if you do not mind. Create a tool that given a bunch of sstables creates a decent sstable leveling Key: CASSANDRA-8301 URL: https://issues.apache.org/jira/browse/CASSANDRA-8301 Project: Cassandra Issue Type: Improvement Reporter: Marcus Eriksson In old versions of cassandra (i.e. not trunk/3.0), when bootstrapping a new node, you will end up with a ton of files in L0 and it might be extremely painful to get LCS to compact into a new leveling We could probably exploit the fact that we have many non-overlapping sstables in L0, and offline-bump those sstables into higher levels. It does not need to be perfect, just get the majority of the data into L1+ without creating overlaps. So, suggestion is to create an offline tool that looks at the range each sstable covers and tries to bump it as high as possible in the leveling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8301) Create a tool that given a bunch of sstables creates a decent sstable leveling
[ https://issues.apache.org/jira/browse/CASSANDRA-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226744#comment-14226744 ] Nikolai Grigoriev commented on CASSANDRA-8301: -- The logic I have built is very simple. And probably has some fundamental flaws :) First I calculate the target size for each level (in bytes) to accommodate all my data - i.e. to distribute the total size of all my sstables. This also gives me the maximum level to target. Then I take all sstables for the given CF, sort them by the beginning (left) of their bounds. Then I start from the highest level (L4 in my example) and iterate over that list of sstables. I grab the first sstable, remember its bounds, put it to the current level. Then skip to the next one that does not intersect with these bounds, assign it to the current level and change the bounds. And so on until the end of the list or until I use all available size. Then I move to the lower level and repeat it on the remaining sstables. And so on. The remainder goes to L0 where overlaps are allowed (right?). I had to also come up with some logic to exclude the sstables that cover large range of tokens. Most likely these are the ones that were recently written at L0 on the source node - they cover whatever was recently written into them, right? I ignore those from my logic and leave them for L0. Or did I get it completely wrong? Create a tool that given a bunch of sstables creates a decent sstable leveling Key: CASSANDRA-8301 URL: https://issues.apache.org/jira/browse/CASSANDRA-8301 Project: Cassandra Issue Type: Improvement Reporter: Marcus Eriksson In old versions of cassandra (i.e. not trunk/3.0), when bootstrapping a new node, you will end up with a ton of files in L0 and it might be extremely painful to get LCS to compact into a new leveling We could probably exploit the fact that we have many non-overlapping sstables in L0, and offline-bump those sstables into higher levels. It does not need to be perfect, just get the majority of the data into L1+ without creating overlaps. So, suggestion is to create an offline tool that looks at the range each sstable covers and tries to bump it as high as possible in the leveling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
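A compact sketch of the greedy pass described in this comment: sort by left token, fill levels from the top down, and leave the remainder (plus anything spanning a very wide token range) in L0. This is a simplified illustration, not the actual prototype tool; the types, the 10x-per-level byte budget and the maxTokenSpan cut-off are assumptions:

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OfflineLeveler
{
    public static class SSTable
    {
        public final long left;        // first token covered
        public final long right;       // last token covered
        public final long sizeBytes;
        public int level = 0;          // everything starts in L0

        public SSTable(long left, long right, long sizeBytes)
        {
            this.left = left;
            this.right = right;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Greedy pass: sort by left token, then fill levels from the top down, taking the next
     * sstable that starts after the last one placed on the current level, until that level's
     * byte budget (targetSSTableBytes * 10^level) is exhausted. Whatever is left stays in L0.
     * SSTables spanning a very wide token range (likely fresh L0 flushes streamed from the
     * source node) are kept in L0 so they do not overlap everything else.
     */
    public static void assignLevels(List<SSTable> tables, int maxLevel, long targetSSTableBytes, double maxTokenSpan)
    {
        tables.sort(Comparator.comparingLong((SSTable t) -> t.left));
        List<SSTable> remaining = new ArrayList<>();
        for (SSTable t : tables)
            if ((double) t.right - (double) t.left < maxTokenSpan)
                remaining.add(t);

        for (int level = maxLevel; level >= 1; level--)
        {
            long budget = targetSSTableBytes * (long) Math.pow(10, level);
            long used = 0;
            long lastRight = 0;
            boolean placedAny = false;
            List<SSTable> leftovers = new ArrayList<>();
            for (SSTable t : remaining)
            {
                if ((!placedAny || t.left > lastRight) && used + t.sizeBytes <= budget)
                {
                    t.level = level;
                    lastRight = t.right;
                    used += t.sizeBytes;
                    placedAny = true;
                }
                else
                {
                    leftovers.add(t); // try again on a lower level, or leave in L0
                }
            }
            remaining = leftovers;
        }
    }
}
{code}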
[jira] [Comment Edited] (CASSANDRA-8301) Create a tool that given a bunch of sstables creates a decent sstable leveling
[ https://issues.apache.org/jira/browse/CASSANDRA-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226744#comment-14226744 ] Nikolai Grigoriev edited comment on CASSANDRA-8301 at 11/26/14 8:04 PM: The logic I have built is very simple. And probably has some fundamental flaws :) First I calculate the target size for each level (in bytes) to accommodate all my data - i.e. to distribute the total size of all my sstables. This also gives me the maximum level to target. Then I take all sstables for the given CF, sort them by the beginning (left) of their bounds. Then I start from the highest level (L4 in my example) and iterate over that list of sstables. I grab the first sstable, remember its bounds, put it to the current level. Then skip to the next one that does not intersect with these bounds, assign it to the current level and change the bounds. And so on until the end of the list or until I use all available size. Then I move to the lower level and repeat it on the remaining sstables. And so on. The remainder goes to L0 where overlaps are allowed (right?). I had to also come up with some logic to exclude the sstables that cover large range of tokens. Most likely these are the ones that were recently written at L0 on the original node - they cover whatever was recently written into them, right? I ignore those from my logic and leave them for L0. Or did I get it completely wrong? was (Author: ngrigor...@gmail.com): The logic I have built is very simple. And probably has some fundamental flaws :) First I calculate the target size for each level (in bytes) to accommodate all my data - i.e. to distribute the total size of all my sstables. This also gives me the maximum level to target. Then I take all sstables for the given CF, sort them by the beginning (left) of their bounds. Then I start from the highest level (L4 in my example) and iterate over that list of sstables. I grab the first sstable, remember its bounds, put it to the current level. Then skip to the next one that does not intersect with these bounds, assign it to the current level and change the bounds. And so on until the end of the list or until I use all available size. Then I move to the lower level and repeat it on the remaining sstables. And so on. The remainder goes to L0 where overlaps are allowed (right?). I had to also come up with some logic to exclude the sstables that cover large range of tokens. Most likely these are the ones that were recently written at L0 on the source node - they cover whatever was recently written into them, right? I ignore those from my logic and leave them for L0. Or did I get it completely wrong? Create a tool that given a bunch of sstables creates a decent sstable leveling Key: CASSANDRA-8301 URL: https://issues.apache.org/jira/browse/CASSANDRA-8301 Project: Cassandra Issue Type: Improvement Reporter: Marcus Eriksson In old versions of cassandra (i.e. not trunk/3.0), when bootstrapping a new node, you will end up with a ton of files in L0 and it might be extremely painful to get LCS to compact into a new leveling We could probably exploit the fact that we have many non-overlapping sstables in L0, and offline-bump those sstables into higher levels. It does not need to be perfect, just get the majority of the data into L1+ without creating overlaps. So, suggestion is to create an offline tool that looks at the range each sstable covers and tries to bump it as high as possible in the leveling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223036#comment-14223036 ] Nikolai Grigoriev commented on CASSANDRA-7949: -- I have recently realized that there may be a relatively cheap (operationally and development-wise) workaround for that limitation. It would also partially address the problem with bootstrapping a new node. The root cause of all this is a large amount of data in a single CF on a single node when using LCS for that CF. The performance of a single compaction task running on a single thread is limited anyway. One of the obvious ways to break this limitation is to shard the data across multiple clones of that CF at the application level. Something as dumb as row key hash mod X and add this suffix to the CF name (see the sketch after this message). In my case it looks like having X=4 would be more than enough to solve the problem. LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating a new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates a load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and the DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between a few dozen and a few thousand columns in each row. This data generation process was generating a massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of traffic that the system will ultimately have to deal with; it will be a mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the generation test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests had been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of a few dozen compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at a time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction type keyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... 
/dev/sdb 1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc 1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd 1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . -name *table_list1*Data* | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sizes - between a few dozen Kb and 160Mb I've been running the heavy load for about 1.5 days and it's been close to 3 days after that and the number of pending compactions does not go down. I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. But even there I see they are sitting idle most of the time, lazily compacting in one stream with CPU at ~140% and occasionally doing bursts of compaction work for a few minutes. I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge case scenario where I have loaded lots
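A minimal sketch of the application-level sharding workaround described in the comment above (hash the row key, append the shard number to the CF name). The naming scheme and the use of String.hashCode below are illustrative assumptions; in practice hashing the key's serialized form with a stronger hash may be preferable:

{code}
public final class ShardedTable
{
    private final String baseName;
    private final int shardCount;

    public ShardedTable(String baseName, int shardCount)
    {
        this.baseName = baseName;
        this.shardCount = shardCount;
    }

    /** Pick the physical table ("CF clone") that holds a given row key, e.g. mytable1_0 .. mytable1_3. */
    public String tableFor(String rowKey)
    {
        int shard = Math.floorMod(rowKey.hashCode(), shardCount);
        return baseName + "_" + shard;
    }

    public static void main(String[] args)
    {
        ShardedTable sharded = new ShardedTable("mytable1", 4); // X = 4 as suggested in the comment
        System.out.println(sharded.tableFor("some-row-key"));   // every read/write for this key targets the same clone
    }
}
{code}

Each clone then holds roughly 1/X of the data and gets its own independent LCS compaction stream, at the cost of the application applying the same mapping on both reads and writes.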
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208154#comment-14208154 ] Nikolai Grigoriev commented on CASSANDRA-7949: -- I had to rebuild one of the nodes in that test cluster. After bootstrapping it I have checked the results - I had over 6.5K pending compactions and many large sstables (between a few Gb and 40-60Gb). I knew that under traffic this will *never* return to a reasonable number of pending compactions. I have decided to give it another try, enable the option from CASSANDRA-6621 and re-bootstrap. This time I did not end up with huge sstables but, I think, it will also never recover. This is, essentially, what the node does most of the time: {code} pending tasks: 7217 compaction type keyspace table completed total unit progress Compaction myks mytable1 5434997373 10667184206 bytes 50.95% Compaction myks mytable2 1080506914 7466286503 bytes 14.47% Active compaction remaining time : 0h00m09s {code} while: {code} # nodetool cfstats myks.mytable1 Keyspace: myks Read Count: 49783 Read Latency: 38.612470602414476 ms. Write Count: 521971 Write Latency: 1.3617571608384373 ms. Pending Tasks: 0 Table: mytable1 SSTable count: 7893 SSTables in each level: [7828/4, 10, 56, 0, 0, 0, 0, 0, 0] Space used (live), bytes: 1181508730955 Space used (total), bytes: 1181509085659 SSTable Compression Ratio: 0.3068450302663634 Number of keys (estimate): 28180352 Memtable cell count: 153554 Memtable data size, bytes: 41190431 Memtable switch count: 178 Local read count: 49826 Local read latency: 38.886 ms Local write count: 522464 Local write latency: 1.392 ms Pending tasks: 0 Bloom filter false positives: 11802553 Bloom filter false ratio: 0.98767 Bloom filter space used, bytes: 17686928 Compacted partition minimum bytes: 104 Compacted partition maximum bytes: 3379391 Compacted partition mean bytes: 142171 Average live cells per slice (last five minutes): 537.5 Average tombstones per slice (last five minutes): 0.0 {code} By the way, this is the picture from another node that functions normally: {code} # nodetool cfstats myks.mytable1 Keyspace: myks Read Count: 4638154 Read Latency: 20.784106776316612 ms. Write Count: 15067667 Write Latency: 1.7291775639188205 ms. Pending Tasks: 0 Table: mytable1 SSTable count: 4561 SSTables in each level: [37/4, 15/10, 106/100, 1053/1000, 3350, 0, 0, 0, 0] Space used (live), bytes: 1129716897255 Space used (total), bytes: 1129752918759 SSTable Compression Ratio: 0.33488717551698993 Number of keys (estimate): 25036672 Memtable cell count: 334212 Memtable data size, bytes: 115610737 Memtable switch count: 4476 Local read count: 4638155 Local read latency: 20.784 ms Local write count: 15067679 Local write latency: 1.729 ms Pending tasks: 0 Bloom filter false positives: 104377 Bloom filter false ratio: 0.59542 Bloom filter space used, bytes: 20319608 Compacted partition minimum bytes: 104 Compacted partition maximum bytes: 3379391 Compacted partition mean bytes: 152368 Average live cells per slice (last five minutes): 529.5 Average tombstones per slice (last five minutes): 0.0 {code} So not only has the streaming created an excessive amount of sstables, the compactions are not advancing at all. In fact, the number of pending compactions grows slowly on that (first) node. New L0 sstables get added because the write activity is taking place. It's just simple math. 
If I take the compaction throughput of the node when it uses only one thread and compare it to my write rate, I think the latter is like 4x the former. Under these conditions this node will never recover - while having plenty of resources and very fast I/O. LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949
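To illustrate the arithmetic in the comment above with purely hypothetical numbers (the ticket does not give exact rates): if a single compaction thread sustains only a fraction of the node's sstable creation rate, the backlog can only grow.

{code}
public class CompactionBacklogEstimate
{
    public static void main(String[] args)
    {
        // Hypothetical rates, only to illustrate the ~4x imbalance described above.
        double compactionMBPerSec = 10.0; // sustained single-threaded compaction throughput
        double ingestMBPerSec = 40.0;     // rate at which new sstable data arrives in L0
        double backlogGrowthGBPerHour = (ingestMBPerSec - compactionMBPerSec) * 3600 / 1024;
        System.out.printf("Backlog grows by roughly %.0f GB per hour and never drains.%n", backlogGrowthGBPerHour);
    }
}
{code}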
[jira] [Commented] (CASSANDRA-8211) Overlapping sstables in L1+
[ https://issues.apache.org/jira/browse/CASSANDRA-8211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14204994#comment-14204994 ] Nikolai Grigoriev commented on CASSANDRA-8211: -- Could it happen at any level? I did use sstablesplit in my cluster and I have recently spotted a number of messages like: {code} system.log.1: WARN [main] 2014-11-09 03:32:17,434 LeveledManifest.java (line 164) At level 1, SSTableReader(path='/cassandra-data/disk2/myks/mytable1/myks-mytable1-jb-200217-Data.db') [DecoratedKey(-163275977074170, 001001164335100116433510400100), DecoratedKey(2116162112767472431, 001001432a4c1001432a4c10400100)] overlaps SSTableReader(path='/cassandra-data/disk3/myks/mytable1/myks-mytable1-jb-200215-Data.db') [DecoratedKey(665029536263181199, 0010052d6e8d10052d6e8d10400100), DecoratedKey(1008355148187355376, 001001135f971001135f9710400100)]. This could be caused by a bug in Cassandra 1.1.0 .. 1.1.3 or due to the fact that you have dropped sstables from another node into the data directory. Sending back to L0. If you didn’t drop in sstables, and have not yet run scrub, you should do so since you may also have rows out-of-order within an sstable {code} Overlapping sstables in L1+ --- Key: CASSANDRA-8211 URL: https://issues.apache.org/jira/browse/CASSANDRA-8211 Project: Cassandra Issue Type: Bug Reporter: Marcus Eriksson Assignee: Marcus Eriksson Fix For: 2.0.12 Attachments: 0001-Avoid-overlaps-in-L1-v2.patch, 0001-Avoid-overlaps-in-L1.patch Seems we have a bug that can create overlapping sstables in L1: {code} WARN [main] 2014-10-28 04:09:42,295 LeveledManifest.java (line 164) At level 2, SSTableReader(path='sstable') [DecoratedKey(2838397575996053472, 00 10066059b210066059b210400100), DecoratedKey(5516674013223138308, 001000ff2d161000ff2d160 00010400100)] overlaps SSTableReader(path='sstable') [DecoratedKey(2839992722300822584, 0010 00229ad21000229ad210400100), DecoratedKey(5532836928694021724, 0010034b05a610034b05a6100 000400100)]. This could be caused by a bug in Cassandra 1.1.0 .. 1.1.3 or due to the fact that you have dropped sstables from another node into the data directory. Sending back to L0. If you didn't drop in sstables, and have not yet run scrub, you should do so since you may also have rows out-of-order within an sstable {code} Which might manifest itself during compaction with this exception: {code} ERROR [CompactionExecutor:3152] 2014-10-28 00:24:06,134 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:3152,1,main] java.lang.RuntimeException: Last written key DecoratedKey(5516674013223138308, 001000ff2d161000ff2d1610400100) = current key DecoratedKey(2839992722300822584, 001000229ad21000229ad210400100) writing into sstable {code} since we use LeveledScanner when compacting (the backing sstable scanner might go beyond the start of the next sstable scanner) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203751#comment-14203751 ] Nikolai Grigoriev commented on CASSANDRA-7949: -- Here is another extreme (but, unfortunately, real) example of LCS going a bit crazy. {code} # nodetool cfstats myks.mytable Keyspace: myks Read Count: 3006212 Read Latency: 21.02595119106703 ms. Write Count: 11226340 Write Latency: 1.8405579886231844 ms. Pending Tasks: 0 Table: wm_contacts SSTable count: 6530 SSTables in each level: [2369/4, 10, 104/100, 1043/1000, 3004, 0, 0, 0, 0] Space used (live), bytes: 1113384288740 Space used (total), bytes: 1113406795020 SSTable Compression Ratio: 0.3307170610260717 Number of keys (estimate): 26294144 Memtable cell count: 782994 Memtable data size, bytes: 213472460 Memtable switch count: 3493 Local read count: 3006239 Local read latency: 21.026 ms Local write count: 11226517 Local write latency: 1.841 ms Pending tasks: 0 Bloom filter false positives: 41835779 Bloom filter false ratio: 0.97500 Bloom filter space used, bytes: 19666944 Compacted partition minimum bytes: 104 Compacted partition maximum bytes: 3379391 Compacted partition mean bytes: 139451 Average live cells per slice (last five minutes): 444.0 Average tombstones per slice (last five minutes): 0.0 {code} {code} # nodetool compactionstats pending tasks: 190 compaction type keyspace table completed total unit progress Compaction myks mytable2 7198353690 7446734394 bytes 96.66% Compaction myks mytable 4851429651 10717052513 bytes 45.27% Active compaction remaining time : 0h00m04s {code} Note the cfstats. The number of sstables at L0 is insane. Yet, C* is sitting quietly compacting the data using 2 cores out of 32. Once it gets into this state I immediately start seeing large sstables forming - instead of 256Mb, sstables of 1-2Gb and more start appearing. And it creates the snowball effect. LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating a new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates a load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and the DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between a few dozen and a few thousand columns in each row. This data generation process was generating a massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of traffic that the system will ultimately have to deal with; it will be a mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the generation test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. 
However, after the storm of write requests had been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of a few dozen compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many
[jira] [Commented] (CASSANDRA-7108) Enabling the Repair Service in OpsCenter generates imprecise repair errors
[ https://issues.apache.org/jira/browse/CASSANDRA-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191738#comment-14191738 ] Nikolai Grigoriev commented on CASSANDRA-7108: -- Just wanted to add my $.02 - I am experiencing identical issue. I suspect this issue (at least on my side) results in snapshot left-overs in the cluster, which leads to higher disk usage until I clean these manually. Enabling the Repair Service in OpsCenter generates imprecise repair errors Key: CASSANDRA-7108 URL: https://issues.apache.org/jira/browse/CASSANDRA-7108 Project: Cassandra Issue Type: Bug Environment: Ubuntu 12.04, 12.10, 14.04 DSE version: 4.0.0 Cassandra version: 2.0.5.x (x = multiple, e.g. 22, 24) Reporter: nayden kolev Enabling the Repair Service in OpsCenter seems to trigger an error on every node, logged every few minutes (sample below). This does not happen if a nodetool repair keyspace command is issued. I have been able to reproduce it on 4 separate clusters over the past month or so, all of them running the latest DSE and Cassandra (2.0.5+) Error logged INFO [RMI TCP Connection(1350)-127.0.0.1] 2014-04-29 18:22:17,705 StorageService.java (line 2539) Starting repair command #6311, repairing 1 ranges for keyspace OpsCenter ERROR [RMI TCP Connection(1350)-127.0.0.1] 2014-04-29 18:22:17,710 StorageService.java (line 2560) Repair session failed: java.lang.IllegalArgumentException: Requested range intersects a local range but is not fully contained in one; this would lead to imprecise repair at org.apache.cassandra.service.ActiveRepairService.getNeighbors(ActiveRepairService.java:164) at org.apache.cassandra.repair.RepairSession.init(RepairSession.java:128) at org.apache.cassandra.repair.RepairSession.init(RepairSession.java:117) at org.apache.cassandra.service.ActiveRepairService.submitRepairSession(ActiveRepairService.java:97) at org.apache.cassandra.service.StorageService.forceKeyspaceRepair(StorageService.java:2620) at org.apache.cassandra.service.StorageService$5.runMayThrow(StorageService.java:2556) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at org.apache.cassandra.service.StorageService.forceKeyspaceRepairRange(StorageService.java:2519) at org.apache.cassandra.service.StorageService.forceKeyspaceRepairRange(StorageService.java:2512) at sun.reflect.GeneratedMethodAccessor97.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at 
com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at
[jira] [Commented] (CASSANDRA-8190) Compactions stop completely because of RuntimeException in CompactionExecutor
[ https://issues.apache.org/jira/browse/CASSANDRA-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188364#comment-14188364 ] Nikolai Grigoriev commented on CASSANDRA-8190: -- [~krummas] Marcus, believe me I do not really enjoy hitting this weird stuff lately ;) Most of the background is in CASSANDRA-7949 (the one you have marked resolved, although I am not sure I fully agree with that resolution). The only detail I would add to CASSANDRA-7949 is that finally, after ~3 weeks the cluster has managed to finish all compactions. 3 weeks to compact the data created in ~4 days. In between I have lost the patience, stopped it and ran sstablesplit on all large sstables (anything larger than 1Gb) on each node. And then I started the nodes one by one once they were done with the split. Upon restart each node had between ~2K and 7K compactions to complete. I had to let them finish them. On the way I have seen these errors on different nodes at different time - so I reported them. Yesterday night the last node has finished the compactions. I've been scrubbing each node after the compactions were done to make sure the data integrity is not broken. Now I am about to restart the load that updates and fetches the data. We are doing some kind of modelling for our real data, a capacity exercise to determine the size of the production cluster. Compactions stop completely because of RuntimeException in CompactionExecutor - Key: CASSANDRA-8190 URL: https://issues.apache.org/jira/browse/CASSANDRA-8190 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.2 (Cassandra 2.0.10) Reporter: Nikolai Grigoriev Assignee: Marcus Eriksson Attachments: jstack.txt.gz, system.log.gz I have a cluster that is recovering from being overloaded with writes. I am using the workaround from CASSANDRA-6621 to prevent the STCS fallback (which is killing the cluster - see CASSANDRA-7949). 
I have observed that after one or more exceptions like this {code} ERROR [CompactionExecutor:4087] 2014-10-26 22:50:05,016 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:4087,1,main] java.lang.RuntimeException: Last written key DecoratedKey(425124616570337476, 0010033523da10033523da10 400100) = current key DecoratedKey(-8778432288598355336, 0010040c7a8f10040c7a8f10 400100) writing into /cassandra-data/disk2/myks/mytable/myks-mytable-tmp-jb-130379-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:142) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:165) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} the node completely stops the compactions and I end up in the state like this: {code} # nodetool compactionstats pending tasks: 1288 compaction typekeyspace table completed total unit progress Active compaction remaining time :n/a {code} The node recovers if restarted and starts compactions - until getting more exceptions like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8190) Compactions stop completely because of RuntimeException in CompactionExecutor
[ https://issues.apache.org/jira/browse/CASSANDRA-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188364#comment-14188364 ] Nikolai Grigoriev edited comment on CASSANDRA-8190 at 10/29/14 2:09 PM: [~krummas] Marcus, believe me I do not really enjoy hitting this weird stuff lately ;) Most of the background is in CASSANDRA-7949 (the one you have marked resolved, although I am not sure I fully agree with that resolution). The only detail I would add to CASSANDRA-7949 is that finally, after ~3 weeks the cluster has managed to finish all compactions. 3 weeks to compact the data created in ~4 days. In between I have lost the patience, stopped it and ran sstablesplit on all large sstables (anything larger than 1Gb) on each node. And then I started the nodes one by one once they were done with the split. Upon restart each node had between ~2K and 7K compactions to complete. I had to let them finish them. On the way I have seen these errors on different nodes at different time - so I reported them. My goal was to get the system to the state with no pending compactions and all sstables having the size close to the target one. This is why I used the flag from CASSANDRA-6621 (cassandra.disable_stcs_in_l0), otherwise the cluster would stay in unusable state forever. Yesterday night the last node has finished the compactions. I've been scrubbing each node after the compactions were done to make sure the data integrity is not broken. Now I am about to restart the load that updates and fetches the data. We are doing some kind of modelling for our real data, a capacity exercise to determine the size of the production cluster. was (Author: ngrigor...@gmail.com): [~krummas] Marcus, believe me I do not really enjoy hitting this weird stuff lately ;) Most of the background is in CASSANDRA-7949 (the one you have marked resolved, although I am not sure I fully agree with that resolution). The only detail I would add to CASSANDRA-7949 is that finally, after ~3 weeks the cluster has managed to finish all compactions. 3 weeks to compact the data created in ~4 days. In between I have lost the patience, stopped it and ran sstablesplit on all large sstables (anything larger than 1Gb) on each node. And then I started the nodes one by one once they were done with the split. Upon restart each node had between ~2K and 7K compactions to complete. I had to let them finish them. On the way I have seen these errors on different nodes at different time - so I reported them. Yesterday night the last node has finished the compactions. I've been scrubbing each node after the compactions were done to make sure the data integrity is not broken. Now I am about to restart the load that updates and fetches the data. We are doing some kind of modelling for our real data, a capacity exercise to determine the size of the production cluster. Compactions stop completely because of RuntimeException in CompactionExecutor - Key: CASSANDRA-8190 URL: https://issues.apache.org/jira/browse/CASSANDRA-8190 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.2 (Cassandra 2.0.10) Reporter: Nikolai Grigoriev Assignee: Marcus Eriksson Attachments: jstack.txt.gz, system.log.gz I have a cluster that is recovering from being overloaded with writes. I am using the workaround from CASSANDRA-6621 to prevent the STCS fallback (which is killing the cluster - see CASSANDRA-7949). 
I have observed that after one or more exceptions like this {code} ERROR [CompactionExecutor:4087] 2014-10-26 22:50:05,016 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:4087,1,main] java.lang.RuntimeException: Last written key DecoratedKey(425124616570337476, 0010033523da10033523da10 400100) = current key DecoratedKey(-8778432288598355336, 0010040c7a8f10040c7a8f10 400100) writing into /cassandra-data/disk2/myks/mytable/myks-mytable-tmp-jb-130379-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:142) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:165) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at
[jira] [Comment Edited] (CASSANDRA-8190) Compactions stop completely because of RuntimeException in CompactionExecutor
[ https://issues.apache.org/jira/browse/CASSANDRA-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188364#comment-14188364 ] Nikolai Grigoriev edited comment on CASSANDRA-8190 at 10/29/14 2:10 PM: [~krummas] Marcus, believe me I do not really enjoy hitting this weird stuff lately ;) Most of the background is in CASSANDRA-7949 (the one you have marked resolved, although I am not sure I fully agree with that resolution). The only detail I would add to CASSANDRA-7949 is that finally, after ~3 weeks the cluster has managed to finish all compactions. 3 weeks to compact the data created in ~4 days. In between I have lost the patience, stopped it and ran sstablesplit on all large sstables (anything larger than 1Gb) on each node. And then I started the nodes one by one once they were done with the split. Upon restart each node had between ~2K and 7K compactions to complete. I had to let them finish them. On the way I have seen these errors on different nodes at different time - so I reported them. My goal was to get the system to the state with no pending compactions and all sstables having the size close to the target one. This is why I used the flag from CASSANDRA-6621 (cassandra.disable_stcs_in_l0), otherwise the cluster would stay in unusable state forever. Yesterday night the last node has finished the compactions. I've been scrubbing each node after the compactions were done to make sure the data integrity is not broken. Now I am about to restart the load that updates and fetches the data. We are doing some kind of modelling for our real data, a capacity exercise to determine the size of the production cluster. Note that the configuration I am attaching was modified a bit to attempt to speed up compactions. There was not too much to tune but stillLike 0 compaction throughput limit etc. was (Author: ngrigor...@gmail.com): [~krummas] Marcus, believe me I do not really enjoy hitting this weird stuff lately ;) Most of the background is in CASSANDRA-7949 (the one you have marked resolved, although I am not sure I fully agree with that resolution). The only detail I would add to CASSANDRA-7949 is that finally, after ~3 weeks the cluster has managed to finish all compactions. 3 weeks to compact the data created in ~4 days. In between I have lost the patience, stopped it and ran sstablesplit on all large sstables (anything larger than 1Gb) on each node. And then I started the nodes one by one once they were done with the split. Upon restart each node had between ~2K and 7K compactions to complete. I had to let them finish them. On the way I have seen these errors on different nodes at different time - so I reported them. My goal was to get the system to the state with no pending compactions and all sstables having the size close to the target one. This is why I used the flag from CASSANDRA-6621 (cassandra.disable_stcs_in_l0), otherwise the cluster would stay in unusable state forever. Yesterday night the last node has finished the compactions. I've been scrubbing each node after the compactions were done to make sure the data integrity is not broken. Now I am about to restart the load that updates and fetches the data. We are doing some kind of modelling for our real data, a capacity exercise to determine the size of the production cluster. 
Compactions stop completely because of RuntimeException in CompactionExecutor - Key: CASSANDRA-8190 URL: https://issues.apache.org/jira/browse/CASSANDRA-8190 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.2 (Cassandra 2.0.10) Reporter: Nikolai Grigoriev Assignee: Marcus Eriksson Attachments: cassandra-env.sh, cassandra.yaml, jstack.txt.gz, system.log.gz I have a cluster that is recovering from being overloaded with writes. I am using the workaround from CASSANDRA-6621 to prevent the STCS fallback (which is killing the cluster - see CASSANDRA-7949). I have observed that after one or more exceptions like this {code} ERROR [CompactionExecutor:4087] 2014-10-26 22:50:05,016 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:4087,1,main] java.lang.RuntimeException: Last written key DecoratedKey(425124616570337476, 0010033523da10033523da10 400100) = current key DecoratedKey(-8778432288598355336, 0010040c7a8f10040c7a8f10 400100) writing into /cassandra-data/disk2/myks/mytable/myks-mytable-tmp-jb-130379-Data.db at
[jira] [Updated] (CASSANDRA-8190) Compactions stop completely because of RuntimeException in CompactionExecutor
[ https://issues.apache.org/jira/browse/CASSANDRA-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-8190: - Attachment: cassandra.yaml cassandra-env.sh config files Compactions stop completely because of RuntimeException in CompactionExecutor - Key: CASSANDRA-8190 URL: https://issues.apache.org/jira/browse/CASSANDRA-8190 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.2 (Cassandra 2.0.10) Reporter: Nikolai Grigoriev Assignee: Marcus Eriksson Attachments: cassandra-env.sh, cassandra.yaml, jstack.txt.gz, system.log.gz I have a cluster that is recovering from being overloaded with writes. I am using the workaround from CASSANDRA-6621 to prevent the STCS fallback (which is killing the cluster - see CASSANDRA-7949). I have observed that after one or more exceptions like this {code} ERROR [CompactionExecutor:4087] 2014-10-26 22:50:05,016 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:4087,1,main] java.lang.RuntimeException: Last written key DecoratedKey(425124616570337476, 0010033523da10033523da10 400100) = current key DecoratedKey(-8778432288598355336, 0010040c7a8f10040c7a8f10 400100) writing into /cassandra-data/disk2/myks/mytable/myks-mytable-tmp-jb-130379-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:142) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:165) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} the node completely stops the compactions and I end up in the state like this: {code} # nodetool compactionstats pending tasks: 1288 compaction typekeyspace table completed total unit progress Active compaction remaining time :n/a {code} The node recovers if restarted and starts compactions - until getting more exceptions like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8190) Compactions stop completely because of RuntimeException in CompactionExecutor
[ https://issues.apache.org/jira/browse/CASSANDRA-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-8190: - Attachment: system.log.gz a sample log Compactions stop completely because of RuntimeException in CompactionExecutor - Key: CASSANDRA-8190 URL: https://issues.apache.org/jira/browse/CASSANDRA-8190 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.2 (Cassandra 2.0.10) Reporter: Nikolai Grigoriev Assignee: Marcus Eriksson Attachments: cassandra-env.sh, cassandra.yaml, jstack.txt.gz, system.log.gz, system.log.gz I have a cluster that is recovering from being overloaded with writes. I am using the workaround from CASSANDRA-6621 to prevent the STCS fallback (which is killing the cluster - see CASSANDRA-7949). I have observed that after one or more exceptions like this {code} ERROR [CompactionExecutor:4087] 2014-10-26 22:50:05,016 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:4087,1,main] java.lang.RuntimeException: Last written key DecoratedKey(425124616570337476, 0010033523da10033523da10 400100) = current key DecoratedKey(-8778432288598355336, 0010040c7a8f10040c7a8f10 400100) writing into /cassandra-data/disk2/myks/mytable/myks-mytable-tmp-jb-130379-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:142) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:165) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} the node completely stops the compactions and I end up in the state like this: {code} # nodetool compactionstats pending tasks: 1288 compaction typekeyspace table completed total unit progress Active compaction remaining time :n/a {code} The node recovers if restarted and starts compactions - until getting more exceptions like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
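Some context on the exception reported in CASSANDRA-8190: an sstable data file must contain partitions in strictly increasing key (token) order, and the writer aborts the append as soon as it sees a key that does not sort after the last one written (in the Cassandra source the message reads "Last written key ... >= current key ..."). The sketch below is only a minimal, self-contained illustration of that invariant, with partitions reduced to plain long tokens and a made-up class name; it is not Cassandra's SSTableWriter.

{code}
// Illustrative sketch of the ordering check behind the
// "Last written key ... >= current key ..." RuntimeException.
// OrderedSSTableWriterSketch is a hypothetical stand-in, not Cassandra code.
class OrderedSSTableWriterSketch {
    private Long lastWrittenToken;   // token of the last appended partition, null before the first append

    void append(long token) {
        // Partitions must arrive in strictly increasing order; anything
        // smaller than or equal to the previous token means the input
        // stream handed to the writer is out of order.
        if (lastWrittenToken != null && lastWrittenToken >= token) {
            throw new RuntimeException(
                "Last written key " + lastWrittenToken + " >= current key " + token);
        }
        lastWrittenToken = token;
    }

    public static void main(String[] args) {
        OrderedSSTableWriterSketch writer = new OrderedSSTableWriterSketch();
        writer.append(425124616570337476L);        // fine: first key
        try {
            writer.append(-8778432288598355336L);  // sorts before the previous key
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}

The two tokens are the ones from the logged exception: the second sorts before the first, which is exactly the condition the check rejects. Per the report, once such an exception escapes the compaction task the node stops scheduling further compactions until it is restarted.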
[jira] [Updated] (CASSANDRA-5256) Memory was freed AssertionError During Major Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-5256: - Attachment: cassandra.yaml cassandra-env.sh Memory was freed AssertionError During Major Compaction - Key: CASSANDRA-5256 URL: https://issues.apache.org/jira/browse/CASSANDRA-5256 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.0 Environment: Linux ashbdrytest01p 3.2.0-37-generic #58-Ubuntu SMP Thu Jan 24 15:28:10 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux java version 1.6.0_30 Java(TM) SE Runtime Environment (build 1.6.0_30-b12) Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03, mixed mode) Ubuntu 12.04.2 LTS Reporter: C. Scott Andreas Assignee: Jonathan Ellis Priority: Critical Labels: compaction Fix For: 1.2.2 Attachments: 5256-v2.txt, 5256-v4.txt, 5256-v5.txt, 5256.txt, cassandra-env.sh, cassandra.yaml, occurence frequency.png When initiating a major compaction with `./nodetool -h localhost compact`, an AssertionError is thrown in the CompactionExecutor from o.a.c.io.util.Memory: ERROR [CompactionExecutor:41495] 2013-02-14 14:38:35,720 CassandraDaemon.java (line 133) Exception in thread Thread[CompactionExecutor:41495,1,RMI Runtime] java.lang.AssertionError: Memory was freed at org.apache.cassandra.io.util.Memory.checkPosition(Memory.java:146) at org.apache.cassandra.io.util.Memory.getLong(Memory.java:116) at org.apache.cassandra.io.compress.CompressionMetadata.chunkFor(CompressionMetadata.java:176) at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:88) at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:327) at java.io.RandomAccessFile.readInt(RandomAccessFile.java:755) at java.io.RandomAccessFile.readLong(RandomAccessFile.java:792) at org.apache.cassandra.utils.BytesReadTracker.readLong(BytesReadTracker.java:114) at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:101) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:92) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumnsFromSSTable(ColumnFamilySerializer.java:149) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:235) at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:109) at org.apache.cassandra.db.compaction.PrecompactedRow.init(PrecompactedRow.java:93) at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:162) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:76) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:57) at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:114) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:97) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:158) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:71) at 
org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:342) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) --- I've invoked the `nodetool compact` three times; this occurred after each. The node has been up for a couple days accepting writes and has not been restarted. Here's the server's log since it was started a few days ago:
[jira] [Updated] (CASSANDRA-5256) Memory was freed AssertionError During Major Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-5256: - Attachment: occurence frequency.png Memory was freed AssertionError During Major Compaction - Key: CASSANDRA-5256 URL: https://issues.apache.org/jira/browse/CASSANDRA-5256 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.0 Environment: Linux ashbdrytest01p 3.2.0-37-generic #58-Ubuntu SMP Thu Jan 24 15:28:10 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux java version 1.6.0_30 Java(TM) SE Runtime Environment (build 1.6.0_30-b12) Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03, mixed mode) Ubuntu 12.04.2 LTS Reporter: C. Scott Andreas Assignee: Jonathan Ellis Priority: Critical Labels: compaction Fix For: 1.2.2 Attachments: 5256-v2.txt, 5256-v4.txt, 5256-v5.txt, 5256.txt, cassandra-env.sh, cassandra.yaml, occurence frequency.png When initiating a major compaction with `./nodetool -h localhost compact`, an AssertionError is thrown in the CompactionExecutor from o.a.c.io.util.Memory: ERROR [CompactionExecutor:41495] 2013-02-14 14:38:35,720 CassandraDaemon.java (line 133) Exception in thread Thread[CompactionExecutor:41495,1,RMI Runtime] java.lang.AssertionError: Memory was freed at org.apache.cassandra.io.util.Memory.checkPosition(Memory.java:146) at org.apache.cassandra.io.util.Memory.getLong(Memory.java:116) at org.apache.cassandra.io.compress.CompressionMetadata.chunkFor(CompressionMetadata.java:176) at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:88) at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:327) at java.io.RandomAccessFile.readInt(RandomAccessFile.java:755) at java.io.RandomAccessFile.readLong(RandomAccessFile.java:792) at org.apache.cassandra.utils.BytesReadTracker.readLong(BytesReadTracker.java:114) at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:101) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:92) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumnsFromSSTable(ColumnFamilySerializer.java:149) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:235) at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:109) at org.apache.cassandra.db.compaction.PrecompactedRow.init(PrecompactedRow.java:93) at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:162) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:76) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:57) at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:114) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:97) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:158) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:71) at 
org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:342) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) --- I've invoked the `nodetool compact` three times; this occurred after each. The node has been up for a couple days accepting writes and has not been restarted. Here's the server's log since it was started a few days ago: https://gist.github.com/cscotta/4956472/raw/95e7cbc68de1aefaeca11812cbb98d5d46f534e8/cassandra.log Here's the code
[jira] [Created] (CASSANDRA-8210) java.lang.AssertionError: Memory was freed exception in CompactionExecutor
Nikolai Grigoriev created CASSANDRA-8210: Summary: java.lang.AssertionError: Memory was freed exception in CompactionExecutor Key: CASSANDRA-8210 URL: https://issues.apache.org/jira/browse/CASSANDRA-8210 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.2, Cassandra 2.0.10, OEL 6.5, kernel 3.8.13-44.el6uek.x86_64, 128Gb of RAM, swap disabled, JRE 1.7.0_67-b01 Reporter: Nikolai Grigoriev Priority: Minor I have just got this problem on multiple nodes. Cassandra 2.0.10 (DSE 4.5.2). After looking through the history I have found that it was actually happening on all nodes since the start of large compaction process (I've loaded tons of data in the system and then turned off all load to let it compact the data). {code} ERROR [CompactionExecutor:1196] 2014-10-28 17:14:50,124 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:1196,1,main] java.lang.AssertionError: Memory was freed at org.apache.cassandra.io.util.Memory.checkPosition(Memory.java:259) at org.apache.cassandra.io.util.Memory.getInt(Memory.java:211) at org.apache.cassandra.io.sstable.IndexSummary.getIndex(IndexSummary.java:79) at org.apache.cassandra.io.sstable.IndexSummary.getKey(IndexSummary.java:84) at org.apache.cassandra.io.sstable.IndexSummary.binarySearch(IndexSummary.java:58) at org.apache.cassandra.io.sstable.SSTableReader.getSampleIndexesForRanges(SSTableReader.java:692) at org.apache.cassandra.io.sstable.SSTableReader.estimatedKeysForRanges(SSTableReader.java:663) at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.worthDroppingTombstones(AbstractCompactionStrategy.java:328) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.findDroppableSSTable(LeveledCompactionStrategy.java:354) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:125) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:113) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:192) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
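For context on the "Memory was freed" assertions in CASSANDRA-5256 and CASSANDRA-8210: Cassandra keeps structures such as index summaries and compression metadata in off-heap memory wrapped by org.apache.cassandra.io.util.Memory, and every read asserts that the native allocation has not already been released. The traces above appear to show compaction code reading such a structure after the owning sstable reader was closed or replaced. The following is only a simplified, illustrative guard with the same failure mode, not Cassandra's Memory class; a plain direct ByteBuffer stands in for the native allocation, and the class name is made up.

{code}
// Illustrative sketch only (not org.apache.cassandra.io.util.Memory):
// a wrapper that fails fast when a freed allocation is read, mirroring the
// "Memory was freed" assertion seen in the stack traces above.
import java.nio.ByteBuffer;

class GuardedMemorySketch {
    private ByteBuffer buffer;   // stand-in for an off-heap (native) allocation

    GuardedMemorySketch(int size) {
        buffer = ByteBuffer.allocateDirect(size);
    }

    long getLong(int offset) {
        checkPosition();
        return buffer.getLong(offset);
    }

    private void checkPosition() {
        // Cassandra's Memory class does essentially this with an assert on its
        // native pointer; any read racing with free() trips it.
        if (buffer == null)
            throw new AssertionError("Memory was freed");
    }

    void free() {
        buffer = null;           // after this point every access must fail fast
    }

    public static void main(String[] args) {
        GuardedMemorySketch m = new GuardedMemorySketch(16);
        System.out.println(m.getLong(0));  // ok while the allocation is live
        m.free();
        try {
            m.getLong(0);
        } catch (AssertionError e) {
            System.out.println(e.getMessage());  // Memory was freed
        }
    }
}
{code}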
[jira] [Updated] (CASSANDRA-8210) java.lang.AssertionError: Memory was freed exception in CompactionExecutor
[ https://issues.apache.org/jira/browse/CASSANDRA-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-8210: - Attachment: cassandra.yaml cassandra-env.sh occurence frequency.png java.lang.AssertionError: Memory was freed exception in CompactionExecutor Key: CASSANDRA-8210 URL: https://issues.apache.org/jira/browse/CASSANDRA-8210 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.2, Cassandra 2.0.10, OEL 6.5, kernel 3.8.13-44.el6uek.x86_64, 128Gb of RAM, swap disabled, JRE 1.7.0_67-b01 Reporter: Nikolai Grigoriev Priority: Minor Attachments: cassandra-env.sh, cassandra.yaml, occurence frequency.png I have just got this problem on multiple nodes. Cassandra 2.0.10 (DSE 4.5.2). After looking through the history I have found that it was actually happening on all nodes since the start of large compaction process (I've loaded tons of data in the system and then turned off all load to let it compact the data). {code} ERROR [CompactionExecutor:1196] 2014-10-28 17:14:50,124 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:1196,1,main] java.lang.AssertionError: Memory was freed at org.apache.cassandra.io.util.Memory.checkPosition(Memory.java:259) at org.apache.cassandra.io.util.Memory.getInt(Memory.java:211) at org.apache.cassandra.io.sstable.IndexSummary.getIndex(IndexSummary.java:79) at org.apache.cassandra.io.sstable.IndexSummary.getKey(IndexSummary.java:84) at org.apache.cassandra.io.sstable.IndexSummary.binarySearch(IndexSummary.java:58) at org.apache.cassandra.io.sstable.SSTableReader.getSampleIndexesForRanges(SSTableReader.java:692) at org.apache.cassandra.io.sstable.SSTableReader.estimatedKeysForRanges(SSTableReader.java:663) at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.worthDroppingTombstones(AbstractCompactionStrategy.java:328) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.findDroppableSSTable(LeveledCompactionStrategy.java:354) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:125) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:113) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:192) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-5256) Memory was freed AssertionError During Major Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-5256: - Attachment: (was: occurence frequency.png) Memory was freed AssertionError During Major Compaction - Key: CASSANDRA-5256 URL: https://issues.apache.org/jira/browse/CASSANDRA-5256 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.0 Environment: Linux ashbdrytest01p 3.2.0-37-generic #58-Ubuntu SMP Thu Jan 24 15:28:10 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux java version 1.6.0_30 Java(TM) SE Runtime Environment (build 1.6.0_30-b12) Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03, mixed mode) Ubuntu 12.04.2 LTS Reporter: C. Scott Andreas Assignee: Jonathan Ellis Priority: Critical Labels: compaction Fix For: 1.2.2 Attachments: 5256-v2.txt, 5256-v4.txt, 5256-v5.txt, 5256.txt When initiating a major compaction with `./nodetool -h localhost compact`, an AssertionError is thrown in the CompactionExecutor from o.a.c.io.util.Memory: ERROR [CompactionExecutor:41495] 2013-02-14 14:38:35,720 CassandraDaemon.java (line 133) Exception in thread Thread[CompactionExecutor:41495,1,RMI Runtime] java.lang.AssertionError: Memory was freed at org.apache.cassandra.io.util.Memory.checkPosition(Memory.java:146) at org.apache.cassandra.io.util.Memory.getLong(Memory.java:116) at org.apache.cassandra.io.compress.CompressionMetadata.chunkFor(CompressionMetadata.java:176) at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:88) at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:327) at java.io.RandomAccessFile.readInt(RandomAccessFile.java:755) at java.io.RandomAccessFile.readLong(RandomAccessFile.java:792) at org.apache.cassandra.utils.BytesReadTracker.readLong(BytesReadTracker.java:114) at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:101) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:92) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumnsFromSSTable(ColumnFamilySerializer.java:149) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:235) at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:109) at org.apache.cassandra.db.compaction.PrecompactedRow.init(PrecompactedRow.java:93) at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:162) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:76) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:57) at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:114) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:97) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:158) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:71) at org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:342) at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) --- I've invoked the `nodetool compact` three times; this occurred after each. The node has been up for a couple days accepting writes and has not been restarted. Here's the server's log since it was started a few days ago: https://gist.github.com/cscotta/4956472/raw/95e7cbc68de1aefaeca11812cbb98d5d46f534e8/cassandra.log Here's the code being used to issue writes to the datastore:
[jira] [Updated] (CASSANDRA-5256) Memory was freed AssertionError During Major Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-5256: - Attachment: (was: cassandra.yaml) Memory was freed AssertionError During Major Compaction - Key: CASSANDRA-5256 URL: https://issues.apache.org/jira/browse/CASSANDRA-5256 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.0 Environment: Linux ashbdrytest01p 3.2.0-37-generic #58-Ubuntu SMP Thu Jan 24 15:28:10 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux java version 1.6.0_30 Java(TM) SE Runtime Environment (build 1.6.0_30-b12) Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03, mixed mode) Ubuntu 12.04.2 LTS Reporter: C. Scott Andreas Assignee: Jonathan Ellis Priority: Critical Labels: compaction Fix For: 1.2.2 Attachments: 5256-v2.txt, 5256-v4.txt, 5256-v5.txt, 5256.txt When initiating a major compaction with `./nodetool -h localhost compact`, an AssertionError is thrown in the CompactionExecutor from o.a.c.io.util.Memory: ERROR [CompactionExecutor:41495] 2013-02-14 14:38:35,720 CassandraDaemon.java (line 133) Exception in thread Thread[CompactionExecutor:41495,1,RMI Runtime] java.lang.AssertionError: Memory was freed at org.apache.cassandra.io.util.Memory.checkPosition(Memory.java:146) at org.apache.cassandra.io.util.Memory.getLong(Memory.java:116) at org.apache.cassandra.io.compress.CompressionMetadata.chunkFor(CompressionMetadata.java:176) at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:88) at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:327) at java.io.RandomAccessFile.readInt(RandomAccessFile.java:755) at java.io.RandomAccessFile.readLong(RandomAccessFile.java:792) at org.apache.cassandra.utils.BytesReadTracker.readLong(BytesReadTracker.java:114) at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:101) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:92) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumnsFromSSTable(ColumnFamilySerializer.java:149) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:235) at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:109) at org.apache.cassandra.db.compaction.PrecompactedRow.init(PrecompactedRow.java:93) at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:162) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:76) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:57) at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:114) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:97) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:158) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:71) at org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:342) at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) --- I've invoked the `nodetool compact` three times; this occurred after each. The node has been up for a couple days accepting writes and has not been restarted. Here's the server's log since it was started a few days ago: https://gist.github.com/cscotta/4956472/raw/95e7cbc68de1aefaeca11812cbb98d5d46f534e8/cassandra.log Here's the code being used to issue writes to the datastore:
[jira] [Updated] (CASSANDRA-5256) Memory was freed AssertionError During Major Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-5256: - Attachment: (was: cassandra-env.sh) Memory was freed AssertionError During Major Compaction - Key: CASSANDRA-5256 URL: https://issues.apache.org/jira/browse/CASSANDRA-5256 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.0 Environment: Linux ashbdrytest01p 3.2.0-37-generic #58-Ubuntu SMP Thu Jan 24 15:28:10 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux java version 1.6.0_30 Java(TM) SE Runtime Environment (build 1.6.0_30-b12) Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03, mixed mode) Ubuntu 12.04.2 LTS Reporter: C. Scott Andreas Assignee: Jonathan Ellis Priority: Critical Labels: compaction Fix For: 1.2.2 Attachments: 5256-v2.txt, 5256-v4.txt, 5256-v5.txt, 5256.txt When initiating a major compaction with `./nodetool -h localhost compact`, an AssertionError is thrown in the CompactionExecutor from o.a.c.io.util.Memory: ERROR [CompactionExecutor:41495] 2013-02-14 14:38:35,720 CassandraDaemon.java (line 133) Exception in thread Thread[CompactionExecutor:41495,1,RMI Runtime] java.lang.AssertionError: Memory was freed at org.apache.cassandra.io.util.Memory.checkPosition(Memory.java:146) at org.apache.cassandra.io.util.Memory.getLong(Memory.java:116) at org.apache.cassandra.io.compress.CompressionMetadata.chunkFor(CompressionMetadata.java:176) at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:88) at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:327) at java.io.RandomAccessFile.readInt(RandomAccessFile.java:755) at java.io.RandomAccessFile.readLong(RandomAccessFile.java:792) at org.apache.cassandra.utils.BytesReadTracker.readLong(BytesReadTracker.java:114) at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:101) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:92) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumnsFromSSTable(ColumnFamilySerializer.java:149) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:235) at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:109) at org.apache.cassandra.db.compaction.PrecompactedRow.init(PrecompactedRow.java:93) at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:162) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:76) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:57) at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:114) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:97) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:158) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:71) at org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:342) at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) --- I've invoked the `nodetool compact` three times; this occurred after each. The node has been up for a couple days accepting writes and has not been restarted. Here's the server's log since it was started a few days ago: https://gist.github.com/cscotta/4956472/raw/95e7cbc68de1aefaeca11812cbb98d5d46f534e8/cassandra.log Here's the code being used to issue writes to the datastore:
[jira] [Updated] (CASSANDRA-8210) java.lang.AssertionError: Memory was freed exception in CompactionExecutor
[ https://issues.apache.org/jira/browse/CASSANDRA-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-8210: - Attachment: system.log.gz java.lang.AssertionError: Memory was freed exception in CompactionExecutor Key: CASSANDRA-8210 URL: https://issues.apache.org/jira/browse/CASSANDRA-8210 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.2, Cassandra 2.0.10, OEL 6.5, kernel 3.8.13-44.el6uek.x86_64, 128Gb of RAM, swap disabled, JRE 1.7.0_67-b01 Reporter: Nikolai Grigoriev Priority: Minor Attachments: cassandra-env.sh, cassandra.yaml, occurence frequency.png, system.log.gz I have just got this problem on multiple nodes. Cassandra 2.0.10 (DSE 4.5.2). After looking through the history I have found that it was actually happening on all nodes since the start of large compaction process (I've loaded tons of data in the system and then turned off all load to let it compact the data). {code} ERROR [CompactionExecutor:1196] 2014-10-28 17:14:50,124 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:1196,1,main] java.lang.AssertionError: Memory was freed at org.apache.cassandra.io.util.Memory.checkPosition(Memory.java:259) at org.apache.cassandra.io.util.Memory.getInt(Memory.java:211) at org.apache.cassandra.io.sstable.IndexSummary.getIndex(IndexSummary.java:79) at org.apache.cassandra.io.sstable.IndexSummary.getKey(IndexSummary.java:84) at org.apache.cassandra.io.sstable.IndexSummary.binarySearch(IndexSummary.java:58) at org.apache.cassandra.io.sstable.SSTableReader.getSampleIndexesForRanges(SSTableReader.java:692) at org.apache.cassandra.io.sstable.SSTableReader.estimatedKeysForRanges(SSTableReader.java:663) at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.worthDroppingTombstones(AbstractCompactionStrategy.java:328) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.findDroppableSSTable(LeveledCompactionStrategy.java:354) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:125) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:113) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:192) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8210) java.lang.AssertionError: Memory was freed exception in CompactionExecutor
[ https://issues.apache.org/jira/browse/CASSANDRA-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188456#comment-14188456 ] Nikolai Grigoriev commented on CASSANDRA-8210: -- Opened new ticket as per [~jbellis]'s recommendation in response to my comment to CASSANDRA-5256 java.lang.AssertionError: Memory was freed exception in CompactionExecutor Key: CASSANDRA-8210 URL: https://issues.apache.org/jira/browse/CASSANDRA-8210 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.2, Cassandra 2.0.10, OEL 6.5, kernel 3.8.13-44.el6uek.x86_64, 128Gb of RAM, swap disabled, JRE 1.7.0_67-b01 Reporter: Nikolai Grigoriev Priority: Minor Attachments: cassandra-env.sh, cassandra.yaml, occurence frequency.png, system.log.gz I have just got this problem on multiple nodes. Cassandra 2.0.10 (DSE 4.5.2). After looking through the history I have found that it was actually happening on all nodes since the start of large compaction process (I've loaded tons of data in the system and then turned off all load to let it compact the data). {code} ERROR [CompactionExecutor:1196] 2014-10-28 17:14:50,124 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:1196,1,main] java.lang.AssertionError: Memory was freed at org.apache.cassandra.io.util.Memory.checkPosition(Memory.java:259) at org.apache.cassandra.io.util.Memory.getInt(Memory.java:211) at org.apache.cassandra.io.sstable.IndexSummary.getIndex(IndexSummary.java:79) at org.apache.cassandra.io.sstable.IndexSummary.getKey(IndexSummary.java:84) at org.apache.cassandra.io.sstable.IndexSummary.binarySearch(IndexSummary.java:58) at org.apache.cassandra.io.sstable.SSTableReader.getSampleIndexesForRanges(SSTableReader.java:692) at org.apache.cassandra.io.sstable.SSTableReader.estimatedKeysForRanges(SSTableReader.java:663) at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.worthDroppingTombstones(AbstractCompactionStrategy.java:328) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.findDroppableSSTable(LeveledCompactionStrategy.java:354) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:125) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:113) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:192) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-5256) Memory was freed AssertionError During Major Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187189#comment-14187189 ] Nikolai Grigoriev commented on CASSANDRA-5256: -- I have just got this problem on multiple nodes. Cassandra 2.0.10 (DSE 4.5.2). Should I reopen? {code} ERROR [CompactionExecutor:1196] 2014-10-28 17:14:50,124 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:1196,1,main] java.lang.AssertionError: Memory was freed at org.apache.cassandra.io.util.Memory.checkPosition(Memory.java:259) at org.apache.cassandra.io.util.Memory.getInt(Memory.java:211) at org.apache.cassandra.io.sstable.IndexSummary.getIndex(IndexSummary.java:79) at org.apache.cassandra.io.sstable.IndexSummary.getKey(IndexSummary.java:84) at org.apache.cassandra.io.sstable.IndexSummary.binarySearch(IndexSummary.java:58) at org.apache.cassandra.io.sstable.SSTableReader.getSampleIndexesForRanges(SSTableReader.java:692) at org.apache.cassandra.io.sstable.SSTableReader.estimatedKeysForRanges(SSTableReader.java:663) at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.worthDroppingTombstones(AbstractCompactionStrategy.java:328) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.findDroppableSSTable(LeveledCompactionStrategy.java:354) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:125) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:113) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:192) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} Memory was freed AssertionError During Major Compaction - Key: CASSANDRA-5256 URL: https://issues.apache.org/jira/browse/CASSANDRA-5256 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.0 Environment: Linux ashbdrytest01p 3.2.0-37-generic #58-Ubuntu SMP Thu Jan 24 15:28:10 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux java version 1.6.0_30 Java(TM) SE Runtime Environment (build 1.6.0_30-b12) Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03, mixed mode) Ubuntu 12.04.2 LTS Reporter: C. 
Scott Andreas Assignee: Jonathan Ellis Priority: Critical Labels: compaction Fix For: 1.2.2 Attachments: 5256-v2.txt, 5256-v4.txt, 5256-v5.txt, 5256.txt When initiating a major compaction with `./nodetool -h localhost compact`, an AssertionError is thrown in the CompactionExecutor from o.a.c.io.util.Memory: ERROR [CompactionExecutor:41495] 2013-02-14 14:38:35,720 CassandraDaemon.java (line 133) Exception in thread Thread[CompactionExecutor:41495,1,RMI Runtime] java.lang.AssertionError: Memory was freed at org.apache.cassandra.io.util.Memory.checkPosition(Memory.java:146) at org.apache.cassandra.io.util.Memory.getLong(Memory.java:116) at org.apache.cassandra.io.compress.CompressionMetadata.chunkFor(CompressionMetadata.java:176) at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:88) at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:327) at java.io.RandomAccessFile.readInt(RandomAccessFile.java:755) at java.io.RandomAccessFile.readLong(RandomAccessFile.java:792) at org.apache.cassandra.utils.BytesReadTracker.readLong(BytesReadTracker.java:114) at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:101) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:92) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumnsFromSSTable(ColumnFamilySerializer.java:149) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:235) at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:109) at org.apache.cassandra.db.compaction.PrecompactedRow.init(PrecompactedRow.java:93) at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:162) at
[jira] [Commented] (CASSANDRA-8167) sstablesplit tool can be made much faster with few JVM settings
[ https://issues.apache.org/jira/browse/CASSANDRA-8167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185573#comment-14185573 ] Nikolai Grigoriev commented on CASSANDRA-8167: -- No, unfortunately, I did not think about capturing it :( I only saved the stack trace. I can share my cassandra.yaml if needed. Plus - I was splitting the sstables for a table that has relatively wide rows. Not necessarily in terms of the number of columns, but size-wise: there are a few rows there that may be up to 3.5 MB (the average row size is about 140 KB for that table). sstablesplit tool can be made much faster with few JVM settings --- Key: CASSANDRA-8167 URL: https://issues.apache.org/jira/browse/CASSANDRA-8167 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Nikolai Grigoriev Priority: Trivial I had to use the sstablesplit tool intensively to split some really huge sstables. The tool is painfully slow as it does compaction in one single thread. I have just found that on one of my machines the tool crashed when I was almost done with a 152 GB sstable (!!!). {code} INFO 16:59:22,342 Writing Memtable-compactions_in_progress@1948660572(0/0 serialized/live bytes, 1 ops) INFO 16:59:22,352 Completed flushing /cassandra-data/disk1/system/compactions_in_progress/system-compactions_in_progress-jb-79242-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1413904450653, position=69178) Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107) at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:586) at org.apache.cassandra.utils.ByteBufferUtil.readBytesWithShortLength(ByteBufferUtil.java:596) at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:61) at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:36) at org.apache.cassandra.db.RangeTombstoneList$InOrderTester.isDeleted(RangeTombstoneList.java:751) at org.apache.cassandra.db.DeletionInfo$InOrderTester.isDeleted(DeletionInfo.java:422) at org.apache.cassandra.db.DeletionInfo$InOrderTester.isDeleted(DeletionInfo.java:403) at org.apache.cassandra.db.ColumnFamily.hasIrrelevantData(ColumnFamily.java:489) at org.apache.cassandra.db.compaction.PrecompactedRow.removeDeleted(PrecompactedRow.java:66) at org.apache.cassandra.db.compaction.PrecompactedRow.init(PrecompactedRow.java:85) at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:196) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:74) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:55) at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:204) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:154) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at
org.apache.cassandra.db.compaction.SSTableSplitter.split(SSTableSplitter.java:38) at org.apache.cassandra.tools.StandaloneSplitter.main(StandaloneSplitter.java:150) {code} This has triggered my desire to see what memory settings are used for the JVM running the tool... and I have found that it runs with default Java settings (no settings at all). I have tried to apply the settings from C* itself and this resulted in a speed increase of over 40%. It went from ~5 MB/s to ~7 MB/s, measured on the compressed output. I believe this is mostly due to concurrent GC. I see my CPU usage has increased to ~200%. But this is fine, this is an offline tool, the node is down anyway. I know that concurrent GC (at least something like -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled) normally improves the performance of even primitive single-threaded heap-intensive Java programs. I think it should be acceptable to apply the server JVM settings to this tool.
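Since the point of CASSANDRA-8167 is that the splitter inherits no JVM options at all, a quick way to see what a standalone tool's JVM actually ended up with is to print its runtime arguments, heap limit, and active garbage collectors through the standard management beans. This is only a diagnostic sketch (the class name JvmSettingsCheck is made up for illustration), to be run with the same flags one intends to pass to sstablesplit:

{code}
// Prints the effective JVM options, max heap, and GC collectors of the
// current JVM. Run it with the flags you plan to give the offline tool
// (e.g. -Xmx8G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC) to confirm
// they were actually picked up.
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class JvmSettingsCheck {
    public static void main(String[] args) {
        System.out.println("JVM input arguments: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
        System.out.println("Max heap (bytes): "
                + ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax());
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // With CMS enabled this typically prints "ParNew" and "ConcurrentMarkSweep";
            // with no options on JDK 7 it prints the default parallel collectors instead.
            System.out.println("GC collector: " + gc.getName());
        }
    }
}
{code}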
[jira] [Updated] (CASSANDRA-8190) Compactions stop completely because of RuntimeException in CompactionExecutor
[ https://issues.apache.org/jira/browse/CASSANDRA-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-8190: - Attachment: jstack.txt.gz Captured when the node was in this state: pending compactions and no compactions active. Compactions stop completely because of RuntimeException in CompactionExecutor - Key: CASSANDRA-8190 URL: https://issues.apache.org/jira/browse/CASSANDRA-8190 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.2 (Cassandra 2.0.10) Reporter: Nikolai Grigoriev Attachments: jstack.txt.gz I have a cluster that is recovering from being overloaded with writes. I am using the workaround from CASSANDRA-6621 to prevent the STCS fallback (which is killing the cluster - see CASSANDRA-7949). I have observed that after one or more exceptions like this {code} ERROR [CompactionExecutor:4087] 2014-10-26 22:50:05,016 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:4087,1,main] java.lang.RuntimeException: Last written key DecoratedKey(425124616570337476, 0010033523da10033523da10 400100) = current key DecoratedKey(-8778432288598355336, 0010040c7a8f10040c7a8f10 400100) writing into /cassandra-data/disk2/myks/mytable/myks-mytable-tmp-jb-130379-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:142) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:165) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} the node completely stops the compactions and I end up in the state like this: {code} # nodetool compactionstats pending tasks: 1288 compaction typekeyspace table completed total unit progress Active compaction remaining time :n/a {code} The node recovers if restarted and starts compactions - until getting more exceptions like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8190) Compactions stop completely because of RuntimeException in CompactionExecutor
[ https://issues.apache.org/jira/browse/CASSANDRA-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-8190: - Attachment: system.log.gz Compactions stop completely because of RuntimeException in CompactionExecutor - Key: CASSANDRA-8190 URL: https://issues.apache.org/jira/browse/CASSANDRA-8190 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.2 (Cassandra 2.0.10) Reporter: Nikolai Grigoriev Attachments: jstack.txt.gz, system.log.gz I have a cluster that is recovering from being overloaded with writes. I am using the workaround from CASSANDRA-6621 to prevent the STCS fallback (which is killing the cluster - see CASSANDRA-7949). I have observed that after one or more exceptions like this {code} ERROR [CompactionExecutor:4087] 2014-10-26 22:50:05,016 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:4087,1,main] java.lang.RuntimeException: Last written key DecoratedKey(425124616570337476, 0010033523da10033523da10 400100) = current key DecoratedKey(-8778432288598355336, 0010040c7a8f10040c7a8f10 400100) writing into /cassandra-data/disk2/myks/mytable/myks-mytable-tmp-jb-130379-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:142) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:165) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} the node completely stops the compactions and I end up in the state like this: {code} # nodetool compactionstats pending tasks: 1288 compaction typekeyspace table completed total unit progress Active compaction remaining time :n/a {code} The node recovers if restarted and starts compactions - until getting more exceptions like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8190) Compactions stop completely because of RuntimeException in CompactionExecutor
[ https://issues.apache.org/jira/browse/CASSANDRA-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186231#comment-14186231 ] Nikolai Grigoriev commented on CASSANDRA-8190: -- Happens quite often. I have captured the thread dump and server log (both attached) when I've got this issue again on one of the nodes. {code} pending tasks: 601 Active compaction remaining time :n/a {code} Compactions stop completely because of RuntimeException in CompactionExecutor - Key: CASSANDRA-8190 URL: https://issues.apache.org/jira/browse/CASSANDRA-8190 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.2 (Cassandra 2.0.10) Reporter: Nikolai Grigoriev Attachments: jstack.txt.gz, system.log.gz I have a cluster that is recovering from being overloaded with writes. I am using the workaround from CASSANDRA-6621 to prevent the STCS fallback (which is killing the cluster - see CASSANDRA-7949). I have observed that after one or more exceptions like this {code} ERROR [CompactionExecutor:4087] 2014-10-26 22:50:05,016 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:4087,1,main] java.lang.RuntimeException: Last written key DecoratedKey(425124616570337476, 0010033523da10033523da10 400100) = current key DecoratedKey(-8778432288598355336, 0010040c7a8f10040c7a8f10 400100) writing into /cassandra-data/disk2/myks/mytable/myks-mytable-tmp-jb-130379-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:142) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:165) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} the node completely stops the compactions and I end up in the state like this: {code} # nodetool compactionstats pending tasks: 1288 compaction typekeyspace table completed total unit progress Active compaction remaining time :n/a {code} The node recovers if restarted and starts compactions - until getting more exceptions like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
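The stuck state described in this comment (pending tasks climbing while nothing is active) can also be watched programmatically instead of repeatedly running nodetool, since compactionstats is backed by the CompactionManager MBean over JMX. The sketch below assumes the default JMX port 7199 and that this Cassandra version exposes PendingTasks and Compactions attributes on org.apache.cassandra.db:type=CompactionManager; treat those attribute names as assumptions to verify against the running version.

{code}
// Hedged sketch: poll compaction state over JMX. Assumes default JMX port 7199
// and that the CompactionManager MBean exposes "PendingTasks" and "Compactions"
// attributes (as in the 2.0.x line) - verify against your version before relying on it.
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import java.util.List;

public class CompactionProbe {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName cm = new ObjectName("org.apache.cassandra.db:type=CompactionManager");
            Object pending = mbs.getAttribute(cm, "PendingTasks");
            Object active = mbs.getAttribute(cm, "Compactions");
            System.out.println("pending tasks: " + pending);
            System.out.println("active compactions: " + ((List<?>) active).size());
        }
    }
}
{code}

A node in the state reported here would show a non-zero pending count alongside an empty active list, which is the same picture nodetool compactionstats gives above.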
[jira] [Created] (CASSANDRA-8190) Compactions stop completely because of RuntimeException in CompactionExecutor
Nikolai Grigoriev created CASSANDRA-8190: Summary: Compactions stop completely because of RuntimeException in CompactionExecutor Key: CASSANDRA-8190 URL: https://issues.apache.org/jira/browse/CASSANDRA-8190 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.2 (Cassandra 2.0.10) Reporter: Nikolai Grigoriev I have a cluster that is recovering from being overloaded with writes. I am using the workaround from CASSANDRA-6621 to prevent the STCS fallback (which is killing the cluster - see CASSANDRA-7949). I have observed that after one or more exceptions like this {code} ERROR [CompactionExecutor:4087] 2014-10-26 22:50:05,016 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:4087,1,main] java.lang.RuntimeException: Last written key DecoratedKey(425124616570337476, 0010033523da10033523da10 400100) = current key DecoratedKey(-8778432288598355336, 0010040c7a8f10040c7a8f10 400100) writing into /cassandra-data/disk2/myks/mytable/myks-mytable-tmp-jb-130379-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:142) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:165) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} the node completely stops the compactions and I end up in the state like this: {code} # nodetool compactionstats pending tasks: 1288 compaction typekeyspace table completed total unit progress Active compaction remaining time :n/a {code} The node recovers if restarted and starts compactions - until getting more exceptions like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8191) After sstablesplit all nodes log RuntimeException in CompactionExecutor (Last written key DecoratedKey... = current key DecoratedKey)
[ https://issues.apache.org/jira/browse/CASSANDRA-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-8191: - Summary: After sstablesplit all nodes log RuntimeException in CompactionExecutor (Last written key DecoratedKey... = current key DecoratedKey) (was: All nodes log RuntimeException in CompactionExecutor (Last written key DecoratedKey... = current key DecoratedKey)) After sstablesplit all nodes log RuntimeException in CompactionExecutor (Last written key DecoratedKey... = current key DecoratedKey) -- Key: CASSANDRA-8191 URL: https://issues.apache.org/jira/browse/CASSANDRA-8191 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.2 (Cassandra 2.0.10) Reporter: Nikolai Grigoriev While recovering the cluster from CASSANDRA-7949 (using the flag from CASSANDRA-6621) I had to use sstablesplit tool to split large sstables. Nodes were off while using this tool and only one sstablesplit instance was running, of course. After splitting was done I have restarted the nodes and they all started compacting the data. All the nodes are logging the exceptions like this: {code} ERROR [CompactionExecutor:4028] 2014-10-26 23:14:52,653 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:4028,1,main] java.lang.RuntimeException: Last written key DecoratedKey(425124616570337476, 0010033523da10033523da10 400100) = current key DecoratedKey(-8778432288598355336, 0010040c7a8f10040c7a8f10 400100) writing into /cassandra-data/disk2/myks/mytable/myks-mytable-tmp-jb-130525-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:142) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:165) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} It seems that scrubbing helps but scrubbing blocks the compactions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8191) All nodes log RuntimeException in CompactionExecutor (Last written key DecoratedKey... >= current key DecoratedKey)
Nikolai Grigoriev created CASSANDRA-8191: Summary: All nodes log RuntimeException in CompactionExecutor (Last written key DecoratedKey... = current key DecoratedKey) Key: CASSANDRA-8191 URL: https://issues.apache.org/jira/browse/CASSANDRA-8191 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.2 (Cassandra 2.0.10) Reporter: Nikolai Grigoriev While recovering the cluster from CASSANDRA-7949 (using the flag from CASSANDRA-6621) I had to use sstablesplit tool to split large sstables. Nodes were off while using this tool and only one sstablesplit instance was running, of course. After splitting was done I have restarted the nodes and they all started compacting the data. All the nodes are logging the exceptions like this: {code} ERROR [CompactionExecutor:4028] 2014-10-26 23:14:52,653 CassandraDaemon.java (line 199) Exception in thread Thread[CompactionExecutor:4028,1,main] java.lang.RuntimeException: Last written key DecoratedKey(425124616570337476, 0010033523da10033523da10 400100) = current key DecoratedKey(-8778432288598355336, 0010040c7a8f10040c7a8f10 400100) writing into /cassandra-data/disk2/myks/mytable/myks-mytable-tmp-jb-130525-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:142) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:165) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} It seems that scrubbing helps but scrubbing blocks the compactions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6285) 2.0 HSHA server introduces corrupt data
[ https://issues.apache.org/jira/browse/CASSANDRA-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181742#comment-14181742 ] Nikolai Grigoriev commented on CASSANDRA-6285: -- By the way, I am getting {code} ERROR [CompactionExecutor:2333] 2014-10-23 18:29:53,590 CassandraDaemon.java (line 199) Exception in thread Thread[Compactio nExecutor:2333,1,main] java.lang.RuntimeException: Last written key DecoratedKey(1156541975678546868, 001003bc510f1 003bc510f10400100) = current key DecoratedKey(36735936098318717, 001000 00015feb8a10015feb8a10400100) writing into / cassandra-data/disk2/myks/mytable/myks-mytable-tmp-jb-94445-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:142) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:165) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} with 2.0.10 release. I am using native protocol. I believe native protocol handler is based on HSHA, am I right? Anyway, I am getting those too. 2.0 HSHA server introduces corrupt data --- Key: CASSANDRA-6285 URL: https://issues.apache.org/jira/browse/CASSANDRA-6285 Project: Cassandra Issue Type: Bug Components: Core Environment: 4 nodes, shortly updated from 1.2.11 to 2.0.2 Reporter: David Sauer Assignee: Pavel Yaskevich Priority: Critical Fix For: 2.0.8 Attachments: 6285_testnotes1.txt, CASSANDRA-6285-disruptor-heap.patch, cassandra-attack-src.zip, compaction_test.py, disruptor-high-cpu.patch, disruptor-memory-corruption.patch, enable_reallocate_buffers.txt After altering everything to LCS the table OpsCenter.rollups60 amd one other none OpsCenter-Table got stuck with everything hanging around in L0. 
The compaction started and ran until the logs showed this: ERROR [CompactionExecutor:111] 2013-11-01 19:14:53,865 CassandraDaemon.java (line 187) Exception in thread Thread[CompactionExecutor:111,1,RMI Runtime] java.lang.RuntimeException: Last written key DecoratedKey(1326283851463420237, 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574426c6f6f6d46696c746572537061636555736564) = current key DecoratedKey(954210699457429663, 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574546f74616c4469736b5370616365557365640b0f) writing into /var/lib/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-tmp-jb-58656-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:141) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:164) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:296) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
[jira] [Created] (CASSANDRA-8167) sstablesplit tool can be made much faster with few JVM settings
Nikolai Grigoriev created CASSANDRA-8167: Summary: sstablesplit tool can be made much faster with few JVM settings Key: CASSANDRA-8167 URL: https://issues.apache.org/jira/browse/CASSANDRA-8167 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Nikolai Grigoriev Priority: Trivial I had to use sstablesplit tool intensively to split some really huge sstables. The tool is painfully slow as it does compaction in one single thread. I have just found that one one of my machines the tool has crashed when I was almost done with 152Gb sstable (!!!). {code} INFO 16:59:22,342 Writing Memtable-compactions_in_progress@1948660572(0/0 serialized/live bytes, 1 ops) INFO 16:59:22,352 Completed flushing /cassandra-data/disk1/system/compactions_in_progress/system-compactions_in_progress-jb-79242-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1413904450653, position=69178) Exception in thread main java.lang.OutOfMemoryError: GC overhead limit exceeded at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107) at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:586) at org.apache.cassandra.utils.ByteBufferUtil.readBytesWithShortLength(ByteBufferUtil.java:596) at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:61) at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:36) at org.apache.cassandra.db.RangeTombstoneList$InOrderTester.isDeleted(RangeTombstoneList.java:751) at org.apache.cassandra.db.DeletionInfo$InOrderTester.isDeleted(DeletionInfo.java:422) at org.apache.cassandra.db.DeletionInfo$InOrderTester.isDeleted(DeletionInfo.java:403) at org.apache.cassandra.db.ColumnFamily.hasIrrelevantData(ColumnFamily.java:489) at org.apache.cassandra.db.compaction.PrecompactedRow.removeDeleted(PrecompactedRow.java:66) at org.apache.cassandra.db.compaction.PrecompactedRow.init(PrecompactedRow.java:85) at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:196) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:74) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:55) at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:204) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:154) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.SSTableSplitter.split(SSTableSplitter.java:38) at org.apache.cassandra.tools.StandaloneSplitter.main(StandaloneSplitter.java:150) {code} This has triggered my desire to see what memory settings are used for JVM running the tool...and I have found that it runs with default Java settings (no settings at all). I have tried to apply the settings from C* itself and this resulted in over 40% speed increase. It went from ~5Mb/s to 7Mb/s - from the compressed output perspective. I believe this is mostly due to concurrent GC. 
I see my CPU usage has increased to ~200%. But this is fine, this is an offline tool, the node is down anyway. I know that concurrent GC (at least something like -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled) normally improves the performance of even primitive single-threaded heap-intensive Java programs. I think it should be acceptable to apply the server JVM settings to this tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
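One hedged way to double-check that the extra JVM options are really reaching the tool (the class name below is mine, not part of Cassandra) is to run a trivial program with the same flags and print the active collectors; with -XX:+UseParNewGC -XX:+UseConcMarkSweepGC you would expect names like "ParNew" and "ConcurrentMarkSweep" rather than the default throughput collectors.

{code}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints the garbage collectors the current JVM is configured with, useful to verify
// that GC flags added to sstablesplit's launch command actually took effect.
public class PrintActiveCollectors {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + " (collections so far: " + gc.getCollectionCount() + ")");
        }
    }
}
{code}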
[jira] [Commented] (CASSANDRA-6285) 2.0 HSHA server introduces corrupt data
[ https://issues.apache.org/jira/browse/CASSANDRA-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178828#comment-14178828 ] Nikolai Grigoriev commented on CASSANDRA-6285: -- [~sterligovak] I was always wondering why did I always see these problems appearing for OpsCenter keyspace. My keyspace had much more traffic but when I had this problem - it always manifested itself with OpsCenter keyspace. Even when I was also using Thrift (we use native protocol now). I even remember disabling OpsCenter to prove the point :) 2.0 HSHA server introduces corrupt data --- Key: CASSANDRA-6285 URL: https://issues.apache.org/jira/browse/CASSANDRA-6285 Project: Cassandra Issue Type: Bug Components: Core Environment: 4 nodes, shortly updated from 1.2.11 to 2.0.2 Reporter: David Sauer Assignee: Pavel Yaskevich Priority: Critical Fix For: 2.0.8 Attachments: 6285_testnotes1.txt, CASSANDRA-6285-disruptor-heap.patch, cassandra-attack-src.zip, compaction_test.py, disruptor-high-cpu.patch, disruptor-memory-corruption.patch, enable_reallocate_buffers.txt After altering everything to LCS the table OpsCenter.rollups60 amd one other none OpsCenter-Table got stuck with everything hanging around in L0. The compaction started and ran until the logs showed this: ERROR [CompactionExecutor:111] 2013-11-01 19:14:53,865 CassandraDaemon.java (line 187) Exception in thread Thread[CompactionExecutor:111,1,RMI Runtime] java.lang.RuntimeException: Last written key DecoratedKey(1326283851463420237, 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574426c6f6f6d46696c746572537061636555736564) = current key DecoratedKey(954210699457429663, 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574546f74616c4469736b5370616365557365640b0f) writing into /var/lib/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-tmp-jb-58656-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:141) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:164) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:296) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Moving back to STC worked to keep the compactions running. Especialy my own Table i would like to move to LCS. After a major compaction with STC the move to LCS fails with the same Exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6285) 2.0 HSHA server introduces corrupt data
[ https://issues.apache.org/jira/browse/CASSANDRA-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179431#comment-14179431 ] Nikolai Grigoriev commented on CASSANDRA-6285: -- I think this is the error that you cannot fix by scrubbing. Corrupted sstable. I was fixing those by deleting the sstables and doing repairs. Unfortunately, if that happens on many nodes there is a risk of data loss. As for the OpsCenter - do not get me wrong ;) I did not want to say that OpsCenter was directly responsible for these troubles. But I do believe that OpsCenter does something particular that reveals the bug in hsha server. At least this was my impression. After disabling OpsCenter and fixing the outstanding problems I do not recall seeing those errors anymore. And I was also using Thrift and I was writing and reading 100x more data than OpsCenter. 2.0 HSHA server introduces corrupt data --- Key: CASSANDRA-6285 URL: https://issues.apache.org/jira/browse/CASSANDRA-6285 Project: Cassandra Issue Type: Bug Components: Core Environment: 4 nodes, shortly updated from 1.2.11 to 2.0.2 Reporter: David Sauer Assignee: Pavel Yaskevich Priority: Critical Fix For: 2.0.8 Attachments: 6285_testnotes1.txt, CASSANDRA-6285-disruptor-heap.patch, cassandra-attack-src.zip, compaction_test.py, disruptor-high-cpu.patch, disruptor-memory-corruption.patch, enable_reallocate_buffers.txt After altering everything to LCS the table OpsCenter.rollups60 amd one other none OpsCenter-Table got stuck with everything hanging around in L0. The compaction started and ran until the logs showed this: ERROR [CompactionExecutor:111] 2013-11-01 19:14:53,865 CassandraDaemon.java (line 187) Exception in thread Thread[CompactionExecutor:111,1,RMI Runtime] java.lang.RuntimeException: Last written key DecoratedKey(1326283851463420237, 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574426c6f6f6d46696c746572537061636555736564) = current key DecoratedKey(954210699457429663, 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574546f74616c4469736b5370616365557365640b0f) writing into /var/lib/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-tmp-jb-58656-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:141) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:164) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:296) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Moving back to STC worked to keep the compactions running. 
Especially my own table I would like to move to LCS. After a major compaction with STCS the move to LCS fails with the same exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176884#comment-14176884 ] Nikolai Grigoriev commented on CASSANDRA-7949: -- Then I doubt I can really try it. We are quite close from production deployment and trying with something that far from what we will use in prod is pointless (for me, not for the fix ;) ). LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction typekeyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . -name **table_list1**Data** | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sized - between few dozens of Kb and 160Mb I've been running the heavy load for about 1,5days and it's been close to 3 days after that and the number of pending compactions does not go down. 
I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. But even there I see they are sitting idle most of the time lazily compacting in one stream with CPU at ~140% and occasionally doing the bursts of compaction work for few minutes. I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge case scenario where I have loaded lots of unique data quickly. By the way, I see this pattern only for one of two tables - the one that has about 4 times more data than another (space-wise, number of rows is the same). Looks like all these pending compactions are really only for that larger table. I'll be attaching the relevant logs shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176426#comment-14176426 ] Nikolai Grigoriev commented on CASSANDRA-7949: -- [~krummas] Marcus, Which patch are you talking about? I am running latest DSE with Cassandra 2.0.10. LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction typekeyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . -name **table_list1**Data** | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sized - between few dozens of Kb and 160Mb I've been running the heavy load for about 1,5days and it's been close to 3 days after that and the number of pending compactions does not go down. 
I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. But even there I see they are sitting idle most of the time lazily compacting in one stream with CPU at ~140% and occasionally doing the bursts of compaction work for few minutes. I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge case scenario where I have loaded lots of unique data quickly. By the way, I see this pattern only for one of two tables - the one that has about 4 times more data than another (space-wise, number of rows is the same). Looks like all these pending compactions are really only for that larger table. I'll be attaching the relevant logs shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174702#comment-14174702 ] Nikolai Grigoriev commented on CASSANDRA-7949: -- Update: Using the property from CASSANDRA-6621 does help to get out of this state. My cluster is slowly digesting the large sstables and creating bunch of nice small sstables from them. It is slower than using sstablesplit, I believe, because it actually does real compactions and, thus, processes and reprocesses different sets of sstables. My understanding is that every time I get new bunch of L0 sstables there is a phase for updating other levels and it repeats and repeats. With that property set I see that my total number of sstables grows, my number of huge sstables decreases and the average size of the sstable decreases as result. My conclusions so far: 1. STCS fallback in LCS is a double-edged sword. It is needed to prevent the flooding the node with tons of small sstables resulting from ongoing writes. These small ones are often much smaller than the configured target size and hey need to be merged. But also the use of STCS results in generation of the super-sized sstables. These become a large headache when the fallback stops and LCS is supposed to resume normal operations. It appears to me (my humble opinion) that fallback should be done to some kind of specialized rescue STCS flavor that merges the small sstables to approximately the LCS target sstable size BUT DOES NOT create sstables that are much larger than the target size. With this approach the LCS will resume normal operations much faster than the cause for the fallback (abnormally high write load) is gone. 2. LCS has major (performance?) issue when you have super-large sstables in the system. It often gets stuck with single long (many hours) compaction stream that, by itself, will increase the probability of another STCS fallback even with reasonable write load. As a possible workaround I was recommended to consider running multiple C* instances on our relatively powerful machines - to significantly reduce the amount of data per node and increase compaction throughput. 3. In the existing systems, depending on the severity of the STCS fallback work the fix from CASSANDRA-6621 may help to recover while keeping the nodes up. It will take a very long time to recover but the nodes will be online. 4. Recovery (see above) is very long. It is much much longer than the duration of the stress period that causes the condition. In my case I was writing like crazy for about 4 days and it's been over a week of compactions after that. I am still very far from 0 pending compactions. Considering this it makes sense to artificially throttle the write speed when generating the data (like in the use case I described in previous comments). Extra time spent on writing the data will be still significantly shorter than the amount of time required to recover from the consequences of abusing the available write bandwidth. 
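To make point 1 above concrete, here is a rough sketch of the proposed "rescue" behaviour (my own illustration under stated assumptions, not existing Cassandra code): group the small L0 sstables greedily so that each merge output stays near the LCS target size (160 MB in this cluster) instead of snowballing into the multi-hundred-gigabyte files the STCS fallback produces today. A real implementation would also need the usual candidate-selection rules; the sketch only shows the output-size cap.

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of a size-capped merge grouping: pack sstable sizes (bytes) into buckets whose
// combined size does not exceed the LCS target, so fallback merges never create giant sstables.
public class RescueBucketsSketch {
    static List<List<Long>> bucket(List<Long> sstableSizes, long targetBytes) {
        List<Long> sorted = new ArrayList<>(sstableSizes);
        Collections.sort(sorted);
        List<List<Long>> buckets = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long size : sorted) {
            if (!current.isEmpty() && currentSize + size > targetBytes) {
                buckets.add(current);          // close this bucket before it overshoots the target
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(size);
            currentSize += size;
        }
        if (!current.isEmpty()) buckets.add(current);
        return buckets;
    }

    public static void main(String[] args) {
        long target = 160L * 1024 * 1024; // LCS target sstable size used in this cluster
        List<Long> sizes = Arrays.asList(20L << 20, 40L << 20, 90L << 20, 150L << 20, 10L << 20);
        System.out.println(bucket(sizes, target).size() + " merge groups, none above ~160 MB");
    }
}
{code}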
LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been
[jira] [Comment Edited] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174702#comment-14174702 ] Nikolai Grigoriev edited comment on CASSANDRA-7949 at 10/17/14 3:57 AM: Update: Using the property from CASSANDRA-6621 does help to get out of this state. My cluster is slowly digesting the large sstables and creating bunch of nice small sstables from them. It is slower than using sstablesplit, I believe, because it actually does real compactions and, thus, processes and reprocesses different sets of sstables. My understanding is that every time I get new bunch of L0 sstables there is a phase for updating other levels and it repeats and repeats. With that property set I see that my total number of sstables grows, my number of huge sstables decreases and the average size of the sstable decreases as result. My conclusions so far: 1. STCS fallback in LCS is a double-edged sword. It is needed to prevent the flooding the node with tons of small sstables resulting from ongoing writes. These small ones are often much smaller than the configured target size and hey need to be merged. But also the use of STCS results in generation of the super-sized sstables. These become a large headache when the fallback stops and LCS is supposed to resume normal operations. It appears to me (my humble opinion) that fallback should be done to some kind of specialized rescue STCS flavor that merges the small sstables to approximately the LCS target sstable size BUT DOES NOT create sstables that are much larger than the target size. With this approach the LCS will resume normal operations much faster than the cause for the fallback (abnormally high write load) is gone. 2. LCS has major (performance?) issue when you have super-large sstables in the system. It often gets stuck with single long (many hours) compaction stream that, by itself, will increase the probability of another STCS fallback even with reasonable write load. As a possible workaround I was recommended to consider running multiple C* instances on our relatively powerful machines - to significantly reduce the amount of data per node and increase compaction throughput. 3. In the existing systems, depending on the severity of the STCS fallback work, the fix from CASSANDRA-6621 may help to recover while keeping the nodes up. It will take a very long time to recover but the nodes will be online. 4. Recovery (see above) is very long. It is much much longer than the duration of the stress period that causes the condition. In my case I was writing like crazy for about 4 days and it's been over a week of compactions after that. I am still very far from 0 pending compactions. Considering this it makes sense to artificially throttle the write speed when generating the data (like in the use case I described in previous comments). Extra time spent on writing the data will be still significantly shorter than the amount of time required to recover from the consequences of abusing the available write bandwidth. was (Author: ngrigor...@gmail.com): Update: Using the property from CASSANDRA-6621 does help to get out of this state. My cluster is slowly digesting the large sstables and creating bunch of nice small sstables from them. It is slower than using sstablesplit, I believe, because it actually does real compactions and, thus, processes and reprocesses different sets of sstables. 
My understanding is that every time I get new bunch of L0 sstables there is a phase for updating other levels and it repeats and repeats. With that property set I see that my total number of sstables grows, my number of huge sstables decreases and the average size of the sstable decreases as result. My conclusions so far: 1. STCS fallback in LCS is a double-edged sword. It is needed to prevent the flooding the node with tons of small sstables resulting from ongoing writes. These small ones are often much smaller than the configured target size and hey need to be merged. But also the use of STCS results in generation of the super-sized sstables. These become a large headache when the fallback stops and LCS is supposed to resume normal operations. It appears to me (my humble opinion) that fallback should be done to some kind of specialized rescue STCS flavor that merges the small sstables to approximately the LCS target sstable size BUT DOES NOT create sstables that are much larger than the target size. With this approach the LCS will resume normal operations much faster than the cause for the fallback (abnormally high write load) is gone. 2. LCS has major (performance?) issue when you have super-large sstables in the system. It often gets stuck with single long (many hours) compaction stream that, by itself, will increase the probability of another STCS fallback even with reasonable write
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168822#comment-14168822 ] Nikolai Grigoriev commented on CASSANDRA-7949: -- I did another round of testing and I can confirm my previous suspicion. If LCS goes into STCS fallback mode there seems to be some kind of point of no return. After loading fairly large amount of data I end up with a number of large (from few Gb to 200+Gb) sstables. After that the cluster simply goes downhill - it never recovers. Even if there is no traffic except the repair service (DSE OpsCenter) the number of pending compactions never declines. It actually grows. Sstables also grow and grow in size until the moment one of the compactions runs out of disk space and crashes the node. Also I believe once in this state there is no way out. sstablesplit tool, as far as I understand, cannot be used with the live node. And the tool splits the data in single thread. I have measured its performance on my system, it processes about 13Mb/s on average, thus, to split all these large sstables it would take many DAYS. LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at time. 
So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction typekeyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . -name **table_list1**Data** | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sized - between few dozens of Kb and 160Mb I've been running the heavy load for about 1,5days and it's been close to 3 days after that and the number of pending compactions does not go down. I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. But even there I see they are sitting idle most of the time lazily compacting in one stream with CPU at ~140%
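A quick back-of-the-envelope check of the "many DAYS" estimate above, assuming the measured ~13 MB/s single-threaded sstablesplit throughput and roughly 1.5 TB of data per node (taken from the df figures in the quoted description; only the oversized sstables actually need splitting, so treat it as an upper bound per node):

{code}
// Rough arithmetic only; the 1.5 TB per-node volume is an assumption from the df output above,
// and only oversized sstables need splitting, so the real per-node figure would be lower.
public class SplitTimeEstimate {
    public static void main(String[] args) {
        double throughputMBperSec = 13.0;          // measured sstablesplit rate
        double dataMB = 1.5 * 1024 * 1024;         // ~1.5 TB expressed in MB
        double hours = dataMB / throughputMBperSec / 3600.0;
        System.out.printf("~%.0f hours (~%.1f days) of node downtime%n", hours, hours / 24.0);
    }
}
{code}

Even at the optimistic end that is well over a day of downtime per node, and with 15 nodes handled one at a time it adds up to weeks, which is why an online way out (the property from CASSANDRA-6621) is so attractive here.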
[jira] [Comment Edited] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168822#comment-14168822 ] Nikolai Grigoriev edited comment on CASSANDRA-7949 at 10/12/14 11:59 PM: - I did another round of testing and I can confirm my previous suspicion. If LCS goes into STCS fallback mode there seems to be some kind of point of no return. After loading fairly large amount of data I end up with a number of large (from few Gb to 200+Gb) sstables. After that the cluster simply goes downhill - it never recovers. Even if there is no traffic except the repair service (DSE OpsCenter) the number of pending compactions never declines. It actually grows. Sstables also grow and grow in size until the moment one of the compactions runs out of disk space and crashes the node. Also I believe once in this state there is no way out. sstablesplit tool, as far as I understand, cannot be used with the live node. And the tool splits the data in single thread. I have measured its performance on my system, it processes about 13Mb/s on average, thus, to split all these large sstables it would take many DAYS. I have got an idea that might actually help. That JVM property from CASSANDRA-6621 - it seems to be what I need right now. I have tried it and it seems (so far) that when compacting my nodes produce only the sstables of the target size, i.e (I may be wrong but so far it seems so) it is splitting the large sstables into the small ones while the nodes are on. If it continues like this I may hope to eventually get rid of mega-huge-sstables and then LCS performance should be back to normal. Will provide an update later. was (Author: ngrigor...@gmail.com): I did another round of testing and I can confirm my previous suspicion. If LCS goes into STCS fallback mode there seems to be some kind of point of no return. After loading fairly large amount of data I end up with a number of large (from few Gb to 200+Gb) sstables. After that the cluster simply goes downhill - it never recovers. Even if there is no traffic except the repair service (DSE OpsCenter) the number of pending compactions never declines. It actually grows. Sstables also grow and grow in size until the moment one of the compactions runs out of disk space and crashes the node. Also I believe once in this state there is no way out. sstablesplit tool, as far as I understand, cannot be used with the live node. And the tool splits the data in single thread. I have measured its performance on my system, it processes about 13Mb/s on average, thus, to split all these large sstables it would take many DAYS. LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. 
This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162263#comment-14162263 ] Nikolai Grigoriev commented on CASSANDRA-7949: -- It seems that what I am suffering from in this specific test is similar to CASSANDRA-6621. When I write all unique data to create my initial snapshot I effectively do something similar to what happens when new node is bootstrapped, I think. LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction typekeyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . -name **table_list1**Data** | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sized - between few dozens of Kb and 160Mb I've been running the heavy load for about 1,5days and it's been close to 3 days after that and the number of pending compactions does not go down. 
I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. But even there I see they are sitting idle most of the time lazily compacting in one stream with CPU at ~140% and occasionally doing the bursts of compaction work for few minutes. I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge case scenario where I have loaded lots of unique data quickly. By the way, I see this pattern only for one of two tables - the one that has about 4 times more data than another (space-wise, number of rows is the same). Looks like all these pending compactions are really only for that larger table. I'll be attaching the relevant logs shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152569#comment-14152569 ] Nikolai Grigoriev commented on CASSANDRA-7949: -- Upgraded to Cassandra 2.0.10 (via DSE 4.5.2) today. Switched my tables that used STCS to LCS. Restarted. For last 8 hours I observe this on all nodes: {code} # nodetool compactionstats pending tasks: 13808 compaction typekeyspace table completed total unit progress Compaction mykeyspacetable_1528230773591 1616185183262 bytes32.68% Compaction mykeyspace table_2456361916088 4158821946280 bytes10.97% Active compaction remaining time : 3h57m56s {code} In the beginning of these 8 hours the remaining time was about 4h08m. CPU activity - almost nothing (between 2 and 3 cores), disk I/O - nearly zero. So clearly it compacts in one thread per keyspace and almost does not progress. LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction typekeyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . 
-name **table_list1**Data** | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sized - between few dozens of Kb and 160Mb I've been running the heavy load for about 1,5days and it's been close to 3 days after that and the number of pending compactions does not go down. I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. But even there I see they are sitting idle most of the time lazily compacting in one stream with CPU at ~140% and occasionally doing the bursts of compaction work for few minutes. I am wondering if this is really a bug
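One quick thing to rule out when a node reports thousands of pending tasks but runs only one or two compactions is the cap on simultaneous compaction threads. The sketch below only inspects the current state; the cassandra.yaml path is an assumption for a package-style install (DSE typically keeps it under /etc/dse/cassandra), so adjust for your layout.
{code}
# how many compactions are actually running right now vs. queued
nodetool compactionstats | head -5

# concurrent_compactors in cassandra.yaml caps parallel compaction tasks per node;
# the path is an assumption for a package install, adjust for DSE/tarball layouts
grep -nE '^#?concurrent_compactors' /etc/cassandra/cassandra.yaml
{code}
Whether LCS can actually schedule parallel tasks depends on the level layout, but confirming the configured cap is a cheap first check.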
[jira] [Issue Comment Deleted] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-7949: - Comment: was deleted (was: May be this is not related but I have another small cluster with similar data. I have just upgraded that one to 2.0.10 (not DSE, original open-source version). On all machines in this cluster I have many thousands of sstables, all 160Mb, few ones that are smaller. So they are all L0, no L1 or higher level sstables exist. LCS is used. Number of pending compactions: 0. There is even incoming traffic that writes into that keyspace. nodetool compact returns immediately. ) LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction typekeyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . 
-name **table_list1**Data** | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sized - between few dozens of Kb and 160Mb I've been running the heavy load for about 1,5days and it's been close to 3 days after that and the number of pending compactions does not go down. I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. But even there I see they are sitting idle most of the time lazily compacting in one stream with CPU at ~140% and occasionally doing the bursts of compaction work for few minutes. I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge case scenario where I have loaded lots of unique data quickly. By the way, I see this pattern only for one of two tables - the one that has about 4 times more data than another (space-wise, number of rows is the same). Looks like all these pending compactions are really only
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14146632#comment-14146632 ] Nikolai Grigoriev commented on CASSANDRA-7949: -- May be this is not related but I have another small cluster with similar data. I have just upgraded that one to 2.0.10 (not DSE, original open-source version). On all machines in this cluster I have many thousands of sstables, all 160Mb, few ones that are smaller. So they are all L0, no L1 or higher level sstables exist. LCS is used. Number of pending compactions: 0. There is even incoming traffic that writes into that keyspace. nodetool compact returns immediately. LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction typekeyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . 
-name **table_list1**Data** | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sized - between few dozens of Kb and 160Mb I've been running the heavy load for about 1,5days and it's been close to 3 days after that and the number of pending compactions does not go down. I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. But even there I see they are sitting idle most of the time lazily compacting in one stream with CPU at ~140% and occasionally doing the bursts of compaction work for few minutes. I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge case scenario where I have loaded lots of unique data quickly. By the way, I see this pattern only for one of two tables - the one that has about 4 times more data than another (space-wise, number of rows is the same). Looks like all these pending
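For the record, one way to double-check the claim that everything is sitting in L0 is to read the level out of each sstable's metadata. This is only a sketch: it assumes the build ships the sstablemetadata tool (under tools/bin on 2.0.x tarballs) and that it prints an "SSTable Level" line; the data paths and table name are illustrative, taken from earlier in this ticket.
{code}
# count sstables per level for one table; paths and table name are illustrative
for f in /cassandra-data/disk*/myks/table_list1/*-Data.db; do
    sstablemetadata "$f" | grep 'SSTable Level'
done | sort | uniq -c
{code}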
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143214#comment-14143214 ] Nikolai Grigoriev commented on CASSANDRA-7949: -- Update: I have completed my last data writing test, now I have enough data to start another phase. I did that last test with compaction strategy set to STCS but disabled for the duration of the test. Once all writers have finished I have re-enabled the compactions. In under one day STCS has completed the job on all nodes, I ended up with few dozens (~40 or so) large sstables, total amount of data about 23Tb on 15 nodes. I have switched back to LCS this morning and immediately observed the hockey stick on the pending compaction graph. Now each node reports about 8-10K of pending compactions, they are all compacting in one stream per CF and consume virtually no resources: {code} # nodetool compactionstats pending tasks: 9900 compaction typekeyspace table completed total unit progress Compaction testks test_list2 26630083587 812539331642 bytes 3.28% Compaction testks test_list1 24071738534 1994877844635 bytes 1.21% Active compaction remaining time : 2h16m55s # w 13:41:45 up 23 days, 18:13, 2 users, load average: 1.81, 2.13, 2.51 ... # iostat -mdx 5 Linux 3.8.13-44.el6uek.x86_64 (cassandra01.mydomain.com) 22/09/14 _x86_64_(32 CPU) Device: rrqm/s wrqm/s r/s w/srMB/swMB/s avgrq-sz avgqu-sz await svctm %util sdb 0.00 5.73 88.00 13.33 5.47 5.16 214.84 0.515.08 0.39 3.98 sda 0.00 8.160.13 65.80 0.00 3.28 101.80 0.060.87 0.11 0.71 sdc 0.00 4.93 75.05 13.34 4.67 5.42 233.62 0.495.55 0.39 3.42 sdd 0.00 5.82 86.40 14.10 5.37 5.52 221.83 0.565.59 0.38 3.81 Device: rrqm/s wrqm/s r/s w/srMB/swMB/s avgrq-sz avgqu-sz await svctm %util sdb 0.00 0.00 134.600.00 8.37 0.00 127.30 0.060.42 0.42 5.64 sda 0.0013.000.00 220.40 0.00 0.96 8.94 0.010.05 0.01 0.32 sdc 0.00 0.00 36.400.00 2.27 0.00 128.00 0.010.41 0.41 1.50 sdd 0.00 0.00 21.200.00 1.32 0.00 128.00 0.000.19 0.19 0.40 {code} LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. 
I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of
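For completeness, the strategy flip described above is just an ALTER TABLE. A hypothetical example using the keyspace/table names from the compactionstats output (adjust to the real schema), piping the statement into cqlsh since it reads from stdin:
{code}
# temporarily fall back to STCS to let the backlog drain
echo "ALTER TABLE testks.test_list1 WITH compaction = { 'class' : 'SizeTieredCompactionStrategy' };" | cqlsh

# switch back to LCS with the 160Mb sstable target once the backlog is gone
echo "ALTER TABLE testks.test_list1 WITH compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };" | cqlsh
{code}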
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14141087#comment-14141087 ] Nikolai Grigoriev commented on CASSANDRA-7949: -- Yes and no. Yes - the number of pending compactions started to go down and I ended up with fewer (and large) sstables. But I think the issue is more about LCS compaction performance. Is it normal that LCS cannot efficiently use the host resources while having tons of pending compactions? LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction typekeyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . 
-name **table_list1**Data** | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sized - between few dozens of Kb and 160Mb I've been running the heavy load for about 1,5days and it's been close to 3 days after that and the number of pending compactions does not go down. I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. But even there I see they are sitting idle most of the time lazily compacting in one stream with CPU at ~140% and occasionally doing the bursts of compaction work for few minutes. I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge case scenario where I have loaded lots of unique data quickly. By the way, I see this pattern only for one of two tables - the one that has about 4 times more data than another (space-wise, number of rows is the same). Looks like all these pending compactions are really only for that larger table. I'll be attaching the relevant logs shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
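One way to see the same thing from the thread-pool side is nodetool tpstats: the CompactionExecutor row shows how many compaction threads are actually active versus queued, which makes the "one slow stream plus occasional bursts" pattern easy to confirm.
{code}
# CompactionExecutor active/pending counts; a long run of 1 active with a huge
# pending count matches the behaviour described in this ticket
nodetool tpstats | egrep 'Pool Name|CompactionExecutor'
{code}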
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14141154#comment-14141154 ] Nikolai Grigoriev commented on CASSANDRA-7949: -- I understand that it was an estimate but my cluster was trying to process this estimate in almost 3 full days with little progress. About 1,5 days of data injection3 days of compaction with no progress - that does not sound right. And STCS was able to crunch most of the data in about one day after the switch. I strongly suspect that the fact that I was loading and not updating the data at high rate resulted in some sort of edge case scenario for LCS. But considering that the cluster could not recover in reasonable amount of time (exceeding the original load time by factor of 2+) I do believe that something may need to be improved in LCS logic OR some kind of diagnostic message needs to be generated to request a specific action to be taken by the cluster owner. In my case the problem was easy to spot as it was highly visible - but if this happens to one of 50 CFs it may take a while before someone spots endless compactions happening. LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at time. 
So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction typekeyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . -name **table_list1**Data** | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sized - between few dozens of Kb and 160Mb I've been running the heavy load for about 1,5days and it's been close to 3 days after that and the number of pending compactions does not go down. I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. But even there I see they are sitting idle most of the time lazily compacting
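On the last point above, that a backlog on one of 50 CFs could go unnoticed, the pending-task counter is easy to watch from cron until better diagnostics exist in Cassandra itself. A minimal sketch; the threshold is a made-up number.
{code}
#!/bin/sh
# warn when this node accumulates a large compaction backlog
PENDING=$(nodetool compactionstats | awk '/pending tasks:/ {print $3}')
if [ "${PENDING:-0}" -gt 1000 ]; then
    echo "WARN: $PENDING pending compactions on $(hostname)"
fi
{code}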
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14141174#comment-14141174 ] Nikolai Grigoriev commented on CASSANDRA-7949: -- Just a small clarification: just not very many is not exactly what I observed. Mainly there was one active compaction but once in a while there was a burst of compactions with high CPU usage, GOSSIP issues caused by nodes being less responsive etc. LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction typekeyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . -name **table_list1**Data** | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sized - between few dozens of Kb and 160Mb I've been running the heavy load for about 1,5days and it's been close to 3 days after that and the number of pending compactions does not go down. 
I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. But even there I see they are sitting idle most of the time lazily compacting in one stream with CPU at ~140% and occasionally doing the bursts of compaction work for few minutes. I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge case scenario where I have loaded lots of unique data quickly. By the way, I see this pattern only for one of two tables - the one that has about 4 times more data than another (space-wise, number of rows is the same). Looks like all these pending compactions are really only for that larger table. I'll be attaching the relevant logs shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
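The gossip side effects mentioned above (nodes being marked less responsive during the compaction bursts) are visible in system.log as DOWN/UP flaps. A rough check, with the log path assumed for a package/DSE install:
{code}
# DOWN/UP flaps logged by the gossiper while compaction bursts pin the CPU
grep -E 'is now DOWN|is now UP' /var/log/cassandra/system.log | tail -20
{code}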
[jira] [Commented] (CASSANDRA-7956) nodetool compactionhistory crashes because of low heap size (GC overhead limit exceeded)
[ https://issues.apache.org/jira/browse/CASSANDRA-7956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139342#comment-14139342 ] Nikolai Grigoriev commented on CASSANDRA-7956: -- I think that setting is not effective for nodetool status because of the GC settings. I have seen it before in other apps that the default GC settings may be very ineffective. Mostly it was due to the parallel GC not being enabled. Maybe trying to put -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled would be enough. Although, of course, in that case nodetool will use more CPU resources. nodetool compactionhistory crashes because of low heap size (GC overhead limit exceeded) -- Key: CASSANDRA-7956 URL: https://issues.apache.org/jira/browse/CASSANDRA-7956 Project: Cassandra Issue Type: Bug Environment: Cassandra 2.0.8 Reporter: Nikolai Grigoriev Priority: Trivial Fix For: 2.0.11 Attachments: 7956.txt, nodetool_compactionhistory_128m_heap_output.txt.gz {code} ]# nodetool compactionhistory Compaction History: Exception in thread main java.lang.OutOfMemoryError: GC overhead limit exceeded at java.io.ObjectStreamClass.newInstance(ObjectStreamClass.java:967) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1782) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at java.util.HashMap.readObject(HashMap.java:1180) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500) at javax.management.openmbean.TabularDataSupport.readObject(TabularDataSupport.java:912) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at sun.rmi.server.UnicastRef.unmarshalValue(UnicastRef.java:325) at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:174) at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source) at javax.management.remote.rmi.RMIConnectionImpl_Stub.getAttribute(Unknown Source) at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.getAttribute(RMIConnector.java:906) at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:267) at com.sun.proxy.$Proxy3.getCompactionHistory(Unknown Source) {code} nodetool starts with -Xmx32m. 
This does not seem to be enough, at least in my case, to show the history. I am not sure what the appropriate amount would be, but increasing it to 128m definitely solves the problem. Output from the modified nodetool is attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
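A concrete way to apply both suggestions (a bigger heap plus the ParNew/CMS flags) is to edit the java invocation in the nodetool wrapper script. The path below assumes a package install; tarball installs keep the script under the distribution's bin/ directory.
{code}
# replace the stock 32m heap with 128m and enable parallel/concurrent GC for nodetool
sed -i 's/-Xmx32m/-Xmx128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled/' /usr/bin/nodetool
{code}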
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137054#comment-14137054 ] Nikolai Grigoriev commented on CASSANDRA-7949: -- I see. I could try to switch to STCS now and see what happens. My concern is that the issue seems to be permanent. Even after last night none of the nodes (being vritually idle - the load was over) was able to eat through the pending compactions. And, to my surprise, half of the nodes in the cluster do not even compact fast enough - look at the graphs attached. LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction typekeyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . 
-name **table_list1**Data** | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sized - between few dozens of Kb and 160Mb I've been running the heavy load for about 1,5days and it's been close to 3 days after that and the number of pending compactions does not go down. I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. But even there I see they are sitting idle most of the time lazily compacting in one stream with CPU at ~140% and occasionally doing the bursts of compaction work for few minutes. I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge case scenario where I have loaded lots of unique data quickly. By the way, I see this pattern only for one of two tables - the one that has about 4 times more data than another (space-wise, number of rows is the same). Looks like all these pending compactions are really only for that larger table. I'll be attaching the relevant logs shortly. -- This message was
[jira] [Updated] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-7949: - Attachment: pending compactions 2day LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction typekeyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . -name **table_list1**Data** | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sized - between few dozens of Kb and 160Mb I've been running the heavy load for about 1,5days and it's been close to 3 days after that and the number of pending compactions does not go down. I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. 
But even there I see they are sitting idle most of the time lazily compacting in one stream with CPU at ~140% and occasionally doing the bursts of compaction work for few minutes. I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge case scenario where I have loaded lots of unique data quickly. By the way, I see this pattern only for one of two tables - the one that has about 4 times more data than another (space-wise, number of rows is the same). Looks like all these pending compactions are really only for that larger table. I'll be attaching the relevant logs shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-7949: - Attachment: (was: pending compactions 2day) LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction typekeyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . -name **table_list1**Data** | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sized - between few dozens of Kb and 160Mb I've been running the heavy load for about 1,5days and it's been close to 3 days after that and the number of pending compactions does not go down. I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. 
But even there I see they are sitting idle most of the time lazily compacting in one stream with CPU at ~140% and occasionally doing the bursts of compaction work for few minutes. I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge case scenario where I have loaded lots of unique data quickly. By the way, I see this pattern only for one of two tables - the one that has about 4 times more data than another (space-wise, number of rows is the same). Looks like all these pending compactions are really only for that larger table. I'll be attaching the relevant logs shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-7949: - Attachment: pending compactions 2day.png LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction typekeyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . -name **table_list1**Data** | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sized - between few dozens of Kb and 160Mb I've been running the heavy load for about 1,5days and it's been close to 3 days after that and the number of pending compactions does not go down. I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. 
But even there I see they are sitting idle most of the time lazily compacting in one stream with CPU at ~140% and occasionally doing the bursts of compaction work for few minutes. I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge case scenario where I have loaded lots of unique data quickly. By the way, I see this pattern only for one of two tables - the one that has about 4 times more data than another (space-wise, number of rows is the same). Looks like all these pending compactions are really only for that larger table. I'll be attaching the relevant logs shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-7956) nodetool compactionhistory crashes because of low heap size (GC overhead limit exceeded)
Nikolai Grigoriev created CASSANDRA-7956: Summary: nodetool compactionhistory crashes because of low heap size (GC overhead limit exceeded) Key: CASSANDRA-7956 URL: https://issues.apache.org/jira/browse/CASSANDRA-7956 Project: Cassandra Issue Type: Bug Environment: Cassandra 2.0.8 Reporter: Nikolai Grigoriev Priority: Trivial Attachments: nodetool_compactionhistory_128m_heap_output.txt.gz {code} ]# nodetool compactionhistory Compaction History: Exception in thread main java.lang.OutOfMemoryError: GC overhead limit exceeded at java.io.ObjectStreamClass.newInstance(ObjectStreamClass.java:967) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1782) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at java.util.HashMap.readObject(HashMap.java:1180) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500) at javax.management.openmbean.TabularDataSupport.readObject(TabularDataSupport.java:912) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at sun.rmi.server.UnicastRef.unmarshalValue(UnicastRef.java:325) at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:174) at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source) at javax.management.remote.rmi.RMIConnectionImpl_Stub.getAttribute(Unknown Source) at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.getAttribute(RMIConnector.java:906) at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:267) at com.sun.proxy.$Proxy3.getCompactionHistory(Unknown Source) {code} nodetool starts with -Xmx32m. This seems to be not enough at least in my case to show the history. I am not sure what would the appropriate amount be but increasing it to 128m definitely solves the problem. Output from modified nodetool attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-7957) improve active/pending compaction monitoring
Nikolai Grigoriev created CASSANDRA-7957: Summary: improve active/pending compaction monitoring Key: CASSANDRA-7957 URL: https://issues.apache.org/jira/browse/CASSANDRA-7957 Project: Cassandra Issue Type: Improvement Components: Core, Tools Reporter: Nikolai Grigoriev Priority: Minor I think it might be useful to create a way to see what sstables are being compacted into what new sstable. Something like an extension of nodetool compactionstats. I think it would be easier with this feature to troubleshoot and understand how compactions are happening on your data. Not sure how it is useful in everyday life but I could use such a feature when dealing with CASSANDRA-7949. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
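Until something like this lands in nodetool, part of the requested information is recoverable from system.log: the compaction task logs which sstables it is reading and what it wrote them into. A rough sketch, assuming those messages are emitted at the configured log level and the usual package log path:
{code}
# inputs:  "Compacting [SSTableReader(path='...'), ...]"
# outputs: "Compacted N sstables to [...]"
grep -E 'Compacting \[|Compacted .* to \[' /var/log/cassandra/system.log | tail -40
{code}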
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138137#comment-14138137 ] Nikolai Grigoriev commented on CASSANDRA-7949: -- Just an update: I have switched to STCS early this morning and by now half of the nodes are getting close to zero pending transactions. Half of remaining nodes seem to be behind but they are compacting at full speed (smoke coming from the lab ;) ) and I see the number of pending compactions going down on them as well. On the nodes where compactions are almost over the number of sstables is now very small, less than a hundred. LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction typekeyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . 
-name **table_list1**Data** | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sizes - between a few dozen Kb and 160Mb I've been running the heavy load for about 1.5 days and it's been close to 3 days after that and the number of pending compactions does not go down. I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. But even there I see they are sitting idle most of the time lazily compacting in one stream with CPU at ~140% and occasionally doing the bursts of compaction work for a few minutes. I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge case scenario where I have loaded lots of unique data quickly. By the way, I see this pattern only for one of two tables - the one that has about 4 times more data than another (space-wise, number of rows is the same). Looks like all these pending compactions are really only for that larger table.
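For reference, the LCS-to-STCS switch mentioned in the comment above is a single schema change; a sketch, assuming the keyspace and table names (myks.table_list1) shown in the compactionstats output quoted in this issue rather than the real schema:
{code}
-- Switch the hot table from LeveledCompactionStrategy back to size-tiered.
-- myks.table_list1 is taken from the compactionstats output above, not the real schema.
ALTER TABLE myks.table_list1 WITH compaction = { 'class' : 'SizeTieredCompactionStrategy' };
{code}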
[jira] [Created] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
Nikolai Grigoriev created CASSANDRA-7949: Summary: LCS compaction low performance, many pending compactions, nodes are almost idle Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev I've been evaluating a new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and the DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between a few dozen and a few thousand columns in each row. This data generation process was generating a massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of traffic that the system will ultimately have to deal with; it will be a mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the generation test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests had been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of a few dozen compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at a time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction type keyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time : n/a # df -h ... /dev/sdb 1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc 1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd 1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . -name *wm_contacts*Data* | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sizes - between a few dozen Kb and 160Mb I've been running the heavy load for about 1.5 days and it's been close to 3 days after that and the number of pending compactions does not go down. I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. But even there I see they are sitting idle most of the time lazily compacting in one stream with CPU at ~140% and occasionally doing the bursts of compaction work for a few minutes. 
I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge case scenario where I have loaded lots of unique data quickly. I'll be attaching the relevant logs shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
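A minimal sketch of the "compaction bandwidth cap" tuning mentioned in the description, assuming a stock nodetool on the affected node:
{code}
# 0 removes the compaction throughput cap entirely; re-check the backlog afterwards.
nodetool setcompactionthroughput 0
nodetool compactionstats
{code}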
[jira] [Updated] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-7949: - Description: I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction typekeyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . -name *wm_contacts*Data* | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sized - between few dozens of Kb and 160Mb I've been running the heavy load for about 1,5days and it's been close to 3 days after that and the number of pending compactions does not go down. I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. But even there I see they are sitting idle most of the time lazily compacting in one stream with CPU at ~140% and occasionally doing the bursts of compaction work for few minutes. I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge case scenario where I have loaded lots of unique data quickly. 
By the way, I see this pattern only for one of two tables - the one that has about 4 times more data than another (space-wise, number of rows is the same). Looks like all these pending compactions are really only for that larger table. I'll be attaching the relevant logs shortly.
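One quick way to verify the observation above (one of the two tables holding roughly 4x the data and most of the sstables) is nodetool cfstats; a sketch with the interesting lines filtered out - the exact label wording varies slightly between 2.0.x releases:
{code}
# Compare per-table sstable counts and live space for the two generated tables.
nodetool cfstats | egrep 'Keyspace:|Column Family:|SSTable count|Space used'
{code}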
[jira] [Updated] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-7949: - Attachment: system.log.gz LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: system.log.gz -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-7949: - Attachment: iostats.txt LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, system.log.gz -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-7949: - Attachment: vmstat.txt LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, system.log.gz, vmstat.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-7949: - Attachment: nodetool_compactionstats.txt LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, system.log.gz, vmstat.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-7949: - Attachment: nodetool_tpstats.txt LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, system.log.gz, vmstat.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136549#comment-14136549 ] Nikolai Grigoriev commented on CASSANDRA-7949: -- system log already includes the logs from log4j.logger.org.apache.cassandra.db.compaction (except log4j.logger.org.apache.cassandra.db.compaction.ParallelCompactionIterable) LCS compaction low performance, many pending compactions, nodes are almost idle --- Key: CASSANDRA-7949 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949 Project: Cassandra Issue Type: Bug Components: Core Environment: DSE 4.5.1-1, Cassandra 2.0.8 Reporter: Nikolai Grigoriev Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, system.log.gz, vmstat.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
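The compaction loggers named in the comment above live in the server's log4j configuration; a sketch of the corresponding properties entries, assuming the stock conf/log4j-server.properties layout of a Cassandra 2.0 / DSE 4.5 install:
{code}
# conf/log4j-server.properties (location assumed; adjust for your packaging)
# Raise compaction logging so individual compaction decisions reach system.log:
log4j.logger.org.apache.cassandra.db.compaction=DEBUG
# Keep the very verbose parallel-compaction iterator quieter, as noted above:
log4j.logger.org.apache.cassandra.db.compaction.ParallelCompactionIterable=INFO
{code}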
[jira] [Updated] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-7949: - Description: I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the load similar to the load in our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and DataStax Java driver. To avoid going deep into details, two tables have been generated. Each table currently has about 55M rows and between few dozens and few thousands of columns in each row. This data generation process was generating massive amount of non-overlapping data. Thus, the activity was write-only and highly parallel. This is not the type of the traffic that the system will have ultimately to deal with, it will be mix of reads and updates to the existing data in the future. This is just to explain the choice of LCS, not mentioning the expensive SSD disk space. At some point while generating the data I have noticed that the compactions started to pile up. I knew that I was overloading the cluster but I still wanted the genration test to complete. I was expecting to give the cluster enough time to finish the pending compactions and get ready for real traffic. However, after the storm of write requests have been stopped I have noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I have noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of few dozens of compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at time. So it looks like this: # nodetool compactionstats pending tasks: 3351 compaction typekeyspace table completed total unit progress Compaction myks table_list1 66499295588 1910515889913 bytes 3.48% Active compaction remaining time :n/a # df -h ... /dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1 /dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2 /dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3 # find . -name **table_list1**Data** | grep -v snapshot | wc -l 1310 Among these files I see: 1043 files of 161Mb (my sstable size is 160Mb) 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb 263 files of various sized - between few dozens of Kb and 160Mb I've been running the heavy load for about 1,5days and it's been close to 3 days after that and the number of pending compactions does not go down. I have applied one of the not-so-obvious recommendations to disable multithreaded compactions and that seems to be helping a bit - I see some nodes started to have fewer pending compactions. About half of the cluster, in fact. But even there I see they are sitting idle most of the time lazily compacting in one stream with CPU at ~140% and occasionally doing the bursts of compaction work for few minutes. I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge case scenario where I have loaded lots of unique data quickly. 
By the way, I see this pattern only for one of two tables - the one that has about 4 times more data than another (space-wise, number of rows is the same). Looks like all these pending compactions are really only for that larger table. I'll be attaching the relevant logs shortly.
[jira] [Commented] (CASSANDRA-6173) Unable to delete multiple entries using In clause on clustering part of compound key
[ https://issues.apache.org/jira/browse/CASSANDRA-6173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091595#comment-14091595 ] Nikolai Grigoriev commented on CASSANDRA-6173: -- In the absence of range-based deletes (because deleting a slice is not supported) this option is quite important for some data structures. I have just hit a case myself where I need to delete a slice of columns (a range of the last component of the clustering key, in CQL terms). So, first I have found that deleting a slice is not possible (CASSANDRA-494) - so I need to first read the list of values in order to delete them :) And then I have found that for each value I will be issuing a separate delete statement :( Unable to delete multiple entries using In clause on clustering part of compound key Key: CASSANDRA-6173 URL: https://issues.apache.org/jira/browse/CASSANDRA-6173 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Ashot Golovenko Priority: Minor I have the following table: CREATE TABLE user_relation ( u1 bigint, u2 bigint, mf int, i boolean, PRIMARY KEY (u1, u2)); And I'm trying to delete two entries using an IN clause on the clustering part of the compound key and I fail to do so: cqlsh:bm> DELETE from user_relation WHERE u1 = 755349113 and u2 in (13404014120, 12537242743); Bad Request: Invalid operator IN for PRIMARY KEY part u2 Although the select statement works just fine: cqlsh:bm> select * from user_relation WHERE u1 = 755349113 and u2 in (13404014120, 12537242743); u1 | u2 | i | mf -----------+-------------+------+---- 755349113 | 12537242743 | null | 27 755349113 | 13404014120 | null | 0 (2 rows) -- This message was sent by Atlassian JIRA (v6.2#6252)
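A sketch of the read-then-delete workaround described in the comment above, against the user_relation table from this issue; the range bounds are just the two example values quoted in the report:
{code}
-- Read the clustering-key values to remove, since a slice cannot be deleted
-- directly, then issue one DELETE per value (IN is rejected on u2 here).
SELECT u2 FROM user_relation WHERE u1 = 755349113 AND u2 >= 12537242743 AND u2 <= 13404014120;
DELETE FROM user_relation WHERE u1 = 755349113 AND u2 = 12537242743;
DELETE FROM user_relation WHERE u1 = 755349113 AND u2 = 13404014120;
{code}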
[jira] [Updated] (CASSANDRA-7415) COPY command does not quote map keys
[ https://issues.apache.org/jira/browse/CASSANDRA-7415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-7415: - Description: {code} create table test (pk text primary key, props mapascii, blob); cqlsh:myks insert into test (pk, props) values ('aaa', {'prop1': 0x1020, 'prop2': 0x4050}); cqlsh:myks copy test to 't.csv'; 1 rows exported in 0.056 seconds. cqlsh:myks copy test from 't.csv'; Bad Request: line 1:74 no viable alternative at input ':' Aborting import at record #0 (line 1). Previously-inserted values still present. 0 rows imported in 0.012 seconds. cqlsh:myks {code} t.csv: {code} # cat t.csv aaa,{prop1: 0x1020, prop2: 0x4050} {code} I believe the missing quotes in the CSV file cause INSERT to fail. was: {quote} create table test (pk text primary key, props mapascii, blob); cqlsh:myks insert into test (pk, props) values ('aaa', {'prop1': 0x1020, 'prop2': 0x4050}); cqlsh:myks copy test to 't.csv'; 1 rows exported in 0.056 seconds. cqlsh:myks copy test from 't.csv'; Bad Request: line 1:74 no viable alternative at input ':' Aborting import at record #0 (line 1). Previously-inserted values still present. 0 rows imported in 0.012 seconds. cqlsh:myks {quote} t.csv: {code} # cat t.csv aaa,{prop1: 0x1020, prop2: 0x4050} {code} I believe the missing quotes in the CSV file cause INSERT to fail. COPY command does not quote map keys Key: CASSANDRA-7415 URL: https://issues.apache.org/jira/browse/CASSANDRA-7415 Project: Cassandra Issue Type: Bug Components: Tools Environment: Cassandra 2.0.5, Linux Reporter: Nikolai Grigoriev Priority: Minor {code} create table test (pk text primary key, props mapascii, blob); cqlsh:myks insert into test (pk, props) values ('aaa', {'prop1': 0x1020, 'prop2': 0x4050}); cqlsh:myks copy test to 't.csv'; 1 rows exported in 0.056 seconds. cqlsh:myks copy test from 't.csv'; Bad Request: line 1:74 no viable alternative at input ':' Aborting import at record #0 (line 1). Previously-inserted values still present. 0 rows imported in 0.012 seconds. cqlsh:myks {code} t.csv: {code} # cat t.csv aaa,{prop1: 0x1020, prop2: 0x4050} {code} I believe the missing quotes in the CSV file cause INSERT to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7415) COPY command does not quote map keys
Nikolai Grigoriev created CASSANDRA-7415: Summary: COPY command does not quote map keys Key: CASSANDRA-7415 URL: https://issues.apache.org/jira/browse/CASSANDRA-7415 Project: Cassandra Issue Type: Bug Components: Tools Environment: Cassandra 2.0.5, Linux Reporter: Nikolai Grigoriev Priority: Minor {quote} create table test (pk text primary key, props map<ascii, blob>); cqlsh:myks> insert into test (pk, props) values ('aaa', {'prop1': 0x1020, 'prop2': 0x4050}); cqlsh:myks> copy test to 't.csv'; 1 rows exported in 0.056 seconds. cqlsh:myks> copy test from 't.csv'; Bad Request: line 1:74 no viable alternative at input ':' Aborting import at record #0 (line 1). Previously-inserted values still present. 0 rows imported in 0.012 seconds. cqlsh:myks> {quote} t.csv: {code} # cat t.csv aaa,{prop1: 0x1020, prop2: 0x4050} {code} I believe the missing quotes in the CSV file cause the INSERT to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7415) COPY command does not quote map keys
[ https://issues.apache.org/jira/browse/CASSANDRA-7415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-7415: - Description: {code} cqlsh:myks create table test (pk text primary key, props mapascii, blob); cqlsh:myks insert into test (pk, props) values ('aaa', {'prop1': 0x1020, 'prop2': 0x4050}); cqlsh:myks copy test to 't.csv'; 1 rows exported in 0.056 seconds. cqlsh:myks copy test from 't.csv'; Bad Request: line 1:74 no viable alternative at input ':' Aborting import at record #0 (line 1). Previously-inserted values still present. 0 rows imported in 0.012 seconds. cqlsh:myks {code} t.csv: {code} # cat t.csv aaa,{prop1: 0x1020, prop2: 0x4050} {code} I believe the missing quotes in the CSV file cause INSERT to fail. was: {code} create table test (pk text primary key, props mapascii, blob); cqlsh:myks insert into test (pk, props) values ('aaa', {'prop1': 0x1020, 'prop2': 0x4050}); cqlsh:myks copy test to 't.csv'; 1 rows exported in 0.056 seconds. cqlsh:myks copy test from 't.csv'; Bad Request: line 1:74 no viable alternative at input ':' Aborting import at record #0 (line 1). Previously-inserted values still present. 0 rows imported in 0.012 seconds. cqlsh:myks {code} t.csv: {code} # cat t.csv aaa,{prop1: 0x1020, prop2: 0x4050} {code} I believe the missing quotes in the CSV file cause INSERT to fail. COPY command does not quote map keys Key: CASSANDRA-7415 URL: https://issues.apache.org/jira/browse/CASSANDRA-7415 Project: Cassandra Issue Type: Bug Components: Tools Environment: Cassandra 2.0.5, Linux Reporter: Nikolai Grigoriev Priority: Minor {code} cqlsh:myks create table test (pk text primary key, props mapascii, blob); cqlsh:myks insert into test (pk, props) values ('aaa', {'prop1': 0x1020, 'prop2': 0x4050}); cqlsh:myks copy test to 't.csv'; 1 rows exported in 0.056 seconds. cqlsh:myks copy test from 't.csv'; Bad Request: line 1:74 no viable alternative at input ':' Aborting import at record #0 (line 1). Previously-inserted values still present. 0 rows imported in 0.012 seconds. cqlsh:myks {code} t.csv: {code} # cat t.csv aaa,{prop1: 0x1020, prop2: 0x4050} {code} I believe the missing quotes in the CSV file cause INSERT to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
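For illustration, this is what a correctly quoted export might look like if the reporter's hypothesis holds: the collection field is wrapped in double quotes for CSV and the map keys keep their CQL single quotes. Whether cqlsh's CSV parser accepts exactly this form has not been verified here:
{code}
# Hand-fixed t.csv with the map literal quoted (illustrative only):
# cat t.csv
aaa,"{'prop1': 0x1020, 'prop2': 0x4050}"
cqlsh:myks> copy test from 't.csv';
{code}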
[jira] [Commented] (CASSANDRA-6716) nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist)
[ https://issues.apache.org/jira/browse/CASSANDRA-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005971#comment-14005971 ] Nikolai Grigoriev commented on CASSANDRA-6716: -- I have made two more observations, one of them may be unrelated, but still: 1. I had tons of these exceptions when doing compaction or scrubbing on some of the nodes. Disabling Datastax agent on them and restarting the nodes eliminated the exceptions completely. All under heavy load. 2. Just started having these exceptions again on one of the nodes after a minor configuration change (compaction throughput) and restarting the node. Restarted again - same thing, several exceptions per second, all FileNotFoundException when compacting. Stopped the node. Removed the caches stored in /var/lib/cassandra/saved_caches. Started the node. Not a single exception in ~1,5 hours. Again, all this under heavy load. Now I am wondering - where else a reference to a non-existing sstable can be except the cache? If simple restart does not help and the filesystem really does not have the file the server tries to access - then it cannot be something about in-memory cache being out of sync, so it's got to be the persistent one. nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist) -- Key: CASSANDRA-6716 URL: https://issues.apache.org/jira/browse/CASSANDRA-6716 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.0.5 (built from source), Linux, 6 nodes, JDK 1.7 Reporter: Nikolai Grigoriev Attachments: system.log.gz It seems that since recently I have started getting a number of exceptions like File not found on all Cassandra nodes. Currently I am getting an exception like this every couple of seconds on each node, for different keyspaces and CFs. I have tried to restart the nodes, tried to scrub them. No luck so far. It seems that scrub cannot complete on any of these nodes, at some point it fails because of the file that it can't find. 
One one of the nodes currently the nodetool scrub command fails instantly and consistently with this exception: {code} # /opt/cassandra/bin/nodetool scrub Exception in thread main java.lang.RuntimeException: Tried to hard link to file that does not exist /mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28049-Data.db at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:75) at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1215) at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1826) at org.apache.cassandra.db.ColumnFamilyStore.scrub(ColumnFamilyStore.java:1122) at org.apache.cassandra.service.StorageService.scrub(StorageService.java:2159) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
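The recovery sequence described in the first observation above (stop the node, clear the saved caches, start it again) would look roughly like the following; the service name and cache directory are assumptions based on the DSE packaging mentioned in this report:
{code}
# Rough sketch of the saved-cache cleanup described above (service name and
# cache path are assumptions based on the DSE/Cassandra packaging in use).
sudo service dse stop
rm -f /var/lib/cassandra/saved_caches/*
sudo service dse start
{code}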
[jira] [Commented] (CASSANDRA-6285) LCS compaction failing with Exception
[ https://issues.apache.org/jira/browse/CASSANDRA-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918048#comment-13918048 ] Nikolai Grigoriev commented on CASSANDRA-6285: -- [~krummas] I think using HSHA makes it easier to reproduce but...I am running SYNC for over a week now and recently I have experienced the same issue again. We had another unclean shutdown (hrrr...some people are smarter than the UPSes ;) ) and after bringing the nodes back I have found that on one node my compactions constantly fail with FileNotFoundException. Even worse, I can't scrub the keyspace/CF in question because scrub fails instantly with RuntimeException: Tried to hard link to file that does not exist I have reported that one too. It is impossible to scrub. The only way to fix that issue I have found so far is to restart Cassandra on that node, stop compactions as soon as it starts (well, I could disable them differently, I assume) and then scrub. Sometimes I have to do it in several iterations to complete the process. Once I scrub all problematic KS/CFs I see no more exceptions. LCS compaction failing with Exception - Key: CASSANDRA-6285 URL: https://issues.apache.org/jira/browse/CASSANDRA-6285 Project: Cassandra Issue Type: Bug Components: Core Environment: 4 nodes, shortly updated from 1.2.11 to 2.0.2 Reporter: David Sauer Assignee: Marcus Eriksson Fix For: 2.0.6 Attachments: compaction_test.py After altering everything to LCS the table OpsCenter.rollups60 amd one other none OpsCenter-Table got stuck with everything hanging around in L0. The compaction started and ran until the logs showed this: ERROR [CompactionExecutor:111] 2013-11-01 19:14:53,865 CassandraDaemon.java (line 187) Exception in thread Thread[CompactionExecutor:111,1,RMI Runtime] java.lang.RuntimeException: Last written key DecoratedKey(1326283851463420237, 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574426c6f6f6d46696c746572537061636555736564) = current key DecoratedKey(954210699457429663, 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574546f74616c4469736b5370616365557365640b0f) writing into /var/lib/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-tmp-jb-58656-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:141) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:164) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:296) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Moving back to STC worked to keep the 
compactions running. Especially my own table, which I would like to move to LCS. After a major compaction with STC the move to LCS fails with the same Exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
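A rough sketch of the recovery sequence described in this comment, assuming the disableautocompaction/enableautocompaction and scrub subcommands of the 2.0-era nodetool; keyspace and table names are placeholders for the affected column family:
{code}
# Keep automatic compaction away from the affected table, scrub it, then
# re-enable compaction (myks/mytable are placeholders for the broken CF).
nodetool disableautocompaction myks mytable
nodetool scrub myks mytable
nodetool enableautocompaction myks mytable
{code}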
[jira] [Commented] (CASSANDRA-6285) 2.0 HSHA server introduces corrupt data
[ https://issues.apache.org/jira/browse/CASSANDRA-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918463#comment-13918463 ] Nikolai Grigoriev commented on CASSANDRA-6285: -- [~xedin] That seems to be a parameter of the Thrift server...How do I control this parameter? Or I should just disable JNA? 2.0 HSHA server introduces corrupt data --- Key: CASSANDRA-6285 URL: https://issues.apache.org/jira/browse/CASSANDRA-6285 Project: Cassandra Issue Type: Bug Components: Core Environment: 4 nodes, shortly updated from 1.2.11 to 2.0.2 Reporter: David Sauer Assignee: Pavel Yaskevich Priority: Critical Fix For: 2.0.6 Attachments: compaction_test.py After altering everything to LCS the table OpsCenter.rollups60 amd one other none OpsCenter-Table got stuck with everything hanging around in L0. The compaction started and ran until the logs showed this: ERROR [CompactionExecutor:111] 2013-11-01 19:14:53,865 CassandraDaemon.java (line 187) Exception in thread Thread[CompactionExecutor:111,1,RMI Runtime] java.lang.RuntimeException: Last written key DecoratedKey(1326283851463420237, 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574426c6f6f6d46696c746572537061636555736564) = current key DecoratedKey(954210699457429663, 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574546f74616c4469736b5370616365557365640b0f) writing into /var/lib/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-tmp-jb-58656-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:141) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:164) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:296) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Moving back to STC worked to keep the compactions running. Especialy my own Table i would like to move to LCS. After a major compaction with STC the move to LCS fails with the same Exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
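For the question above about controlling the Thrift server type: this is normally the rpc_server_type setting in cassandra.yaml (independent of JNA), followed by a node restart; a sketch of the relevant line:
{code}
# cassandra.yaml - switch the Thrift server implementation from hsha back to
# the synchronous server, then restart the node for it to take effect.
rpc_server_type: sync
{code}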
[jira] [Commented] (CASSANDRA-6285) LCS compaction failing with Exception
[ https://issues.apache.org/jira/browse/CASSANDRA-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907434#comment-13907434 ] Nikolai Grigoriev commented on CASSANDRA-6285: -- Can confirm on my side. I have switched to sync RPC server and after few scrubs/restarts I am running my load tests on a 6-node 2.0.5 cluster without a single exception in last ~8 hours. I tried to correlate the moment I started getting large number of FileNotFoundException's with other events in my clusterrealized that it was not exactly 2.0.5 upgrade. It seems to correlate mostly with a moment when my jmeter server went out of free space and a bunch of tests crashed. Obviously, these crashes have terminated a few hundreds of client connections to Cassandra. Not sure if it is related but it seems that from that moment it was some sort of snowball effect. LCS compaction failing with Exception - Key: CASSANDRA-6285 URL: https://issues.apache.org/jira/browse/CASSANDRA-6285 Project: Cassandra Issue Type: Bug Components: Core Environment: 4 nodes, shortly updated from 1.2.11 to 2.0.2 Reporter: David Sauer Assignee: Tyler Hobbs Fix For: 2.0.6 Attachments: compaction_test.py After altering everything to LCS the table OpsCenter.rollups60 amd one other none OpsCenter-Table got stuck with everything hanging around in L0. The compaction started and ran until the logs showed this: ERROR [CompactionExecutor:111] 2013-11-01 19:14:53,865 CassandraDaemon.java (line 187) Exception in thread Thread[CompactionExecutor:111,1,RMI Runtime] java.lang.RuntimeException: Last written key DecoratedKey(1326283851463420237, 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574426c6f6f6d46696c746572537061636555736564) = current key DecoratedKey(954210699457429663, 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574546f74616c4469736b5370616365557365640b0f) writing into /var/lib/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-tmp-jb-58656-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:141) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:164) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:296) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Moving back to STC worked to keep the compactions running. Especialy my own Table i would like to move to LCS. After a major compaction with STC the move to LCS fails with the same Exception. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
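A hedged sketch of the configuration change referred to above (switching the Thrift RPC server implementation from hsha back to sync). The cassandra.yaml path is an assumption, and the change only takes effect after a restart, done one node at a time:
{code}
# Assumed config location; adjust for your packaging
CONF=/etc/cassandra/cassandra.yaml

grep '^rpc_server_type' "$CONF"          # typically shows: rpc_server_type: hsha
sudo sed -i 's/^rpc_server_type: hsha/rpc_server_type: sync/' "$CONF"
sudo service cassandra restart            # rolling restart across the cluster
{code}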
[jira] [Commented] (CASSANDRA-6716) nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist)
[ https://issues.apache.org/jira/browse/CASSANDRA-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905486#comment-13905486 ] Nikolai Grigoriev commented on CASSANDRA-6716: -- OK, I am observing *massive* problems with the sstables as of moving from 2.0.4 to 2.0.5. I am rolling back now and scrubbing (I wish I had Mr. Net ;) ). Just when scrubbing OpsCenter keyspaces I see tons of messages like this: {quote} WARN [CompactionExecutor:110] 2014-02-19 14:25:13,811 OutputHandler.java (line 52) 1 out of order rows found while scrubbing SSTableReader(path='/hadoop/disk2/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-jb-1901-Data.db'); Those have been written (in order) to a new sstable (SSTableReader(path='/hadoop/disk5/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-jb-15423-Data.db')) {quote} I am not exaggerating - dozens of thousands. To be fair, I am not 100% if the problem was there with 2.0.4. But as of 2.0.5 I have noticed the frequent exceptions about the key ordering, that caught my attention. nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist) -- Key: CASSANDRA-6716 URL: https://issues.apache.org/jira/browse/CASSANDRA-6716 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.0.5 (built from source), Linux, 6 nodes, JDK 1.7 Reporter: Nikolai Grigoriev Attachments: system.log.gz It seems that since recently I have started getting a number of exceptions like File not found on all Cassandra nodes. Currently I am getting an exception like this every couple of seconds on each node, for different keyspaces and CFs. I have tried to restart the nodes, tried to scrub them. No luck so far. It seems that scrub cannot complete on any of these nodes, at some point it fails because of the file that it can't find. 
One one of the nodes currently the nodetool scrub command fails instantly and consistently with this exception: {code} # /opt/cassandra/bin/nodetool scrub Exception in thread main java.lang.RuntimeException: Tried to hard link to file that does not exist /mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28049-Data.db at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:75) at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1215) at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1826) at org.apache.cassandra.db.ColumnFamilyStore.scrub(ColumnFamilyStore.java:1122) at org.apache.cassandra.service.StorageService.scrub(StorageService.java:2159) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
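If only specific OpsCenter tables are affected, scrub can be scoped to them rather than run node-wide; a small sketch, with table names taken from the warnings above and a data directory path that is an assumption (adjust to your data_file_directories):
{code}
# Scrub only the rollup tables mentioned in the warnings, on each affected node
nodetool scrub OpsCenter rollups60 rollups300

# Scrub takes a "pre-scrub" snapshot of the sstables first, so keep an eye on disk usage
du -sh /var/lib/cassandra/data/OpsCenter/*/snapshots/
{code}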
[jira] [Commented] (CASSANDRA-6716) nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist)
[ https://issues.apache.org/jira/browse/CASSANDRA-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905506#comment-13905506 ] Nikolai Grigoriev commented on CASSANDRA-6716: -- And more scary one: {quote} WARN [CompactionExecutor:84] 2014-02-19 14:35:25,418 OutputHandler.java (line 52) Unable to r ecover 8 rows that were skipped. You can attempt manual recovery from the pre-scrub snapshot. You can also run nodetool repair to transfer the data from a healthy replica, if any {quote} nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist) -- Key: CASSANDRA-6716 URL: https://issues.apache.org/jira/browse/CASSANDRA-6716 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.0.5 (built from source), Linux, 6 nodes, JDK 1.7 Reporter: Nikolai Grigoriev Attachments: system.log.gz It seems that since recently I have started getting a number of exceptions like File not found on all Cassandra nodes. Currently I am getting an exception like this every couple of seconds on each node, for different keyspaces and CFs. I have tried to restart the nodes, tried to scrub them. No luck so far. It seems that scrub cannot complete on any of these nodes, at some point it fails because of the file that it can't find. One one of the nodes currently the nodetool scrub command fails instantly and consistently with this exception: {code} # /opt/cassandra/bin/nodetool scrub Exception in thread main java.lang.RuntimeException: Tried to hard link to file that does not exist /mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28049-Data.db at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:75) at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1215) at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1826) at org.apache.cassandra.db.ColumnFamilyStore.scrub(ColumnFamilyStore.java:1122) at org.apache.cassandra.service.StorageService.scrub(StorageService.java:2159) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553) at
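The warning quoted above offers two recovery paths; a hedged sketch of both, assuming the affected table is OpsCenter.rollups60, that the replication factor is greater than 1 so a healthy replica exists, and a placeholder data directory:
{code}
# Option 1: pull the skipped rows back from healthy replicas
nodetool repair OpsCenter rollups60

# Option 2: inspect the pre-scrub snapshot for manual recovery
# (data directory layout is an assumption; adjust to your data_file_directories)
ls /var/lib/cassandra/data/OpsCenter/rollups60/snapshots/
{code}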
[jira] [Commented] (CASSANDRA-6716) nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist)
[ https://issues.apache.org/jira/browse/CASSANDRA-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906444#comment-13906444 ] Nikolai Grigoriev commented on CASSANDRA-6716: -- I have switched from hsha RPC server to sync to test this theory from CASSANDRA-6285. It seems that the things are getting a bit better. I did some scrubbing. It seems that the most affected sstables were from OpsCenter. Even after scrubbing the entire OpsCenter keyspace on all nodes, shutting down OpsCenter and its agents and restarting Cassandra I am still getting this in the logs: {code} INFO [CompactionExecutor:239] 2014-02-20 01:22:38,931 OutputHandler.java (line 42) Scrubbing SSTableReader(path='/hado op/disk2/cassandra/data/OpsCenter/pdps/OpsCenter-pdps-jb-2152-Data.db') (249309 bytes) WARN [CompactionExecutor:239] 2014-02-20 01:22:38,958 OutputHandler.java (line 57) Error reading row (stacktrace follo ws): org.apache.cassandra.io.sstable.CorruptSSTableException: org.apache.cassandra.serializers.MarshalException: String didn 't validate. at org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:152) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:32) at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:203) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at com.google.common.collect.Iterators$7.computeNext(Iterators.java:645) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.ColumnIndex$Builder.buildForCompaction(ColumnIndex.java:156) at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:101) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:169) at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:199) at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:443) at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:432) at org.apache.cassandra.db.compaction.CompactionManager.access$300(CompactionManager.java:62) at org.apache.cassandra.db.compaction.CompactionManager$3.perform(CompactionManager.java:236) at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:222) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Caused by: org.apache.cassandra.serializers.MarshalException: String didn't validate. at org.apache.cassandra.serializers.UTF8Serializer.validate(UTF8Serializer.java:35) at org.apache.cassandra.db.marshal.AbstractType.validate(AbstractType.java:172) at org.apache.cassandra.db.Column.validateName(Column.java:295) at org.apache.cassandra.db.Column.validateFields(Column.java:300) at org.apache.cassandra.db.ExpiringColumn.validateFields(ExpiringColumn.java:181) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:147) ... 
21 more {code} nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist) -- Key: CASSANDRA-6716 URL: https://issues.apache.org/jira/browse/CASSANDRA-6716 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.0.5 (built from source), Linux, 6 nodes, JDK 1.7 Reporter: Nikolai Grigoriev Attachments: system.log.gz It seems that since recently I have started getting a number of exceptions like File not found on all Cassandra nodes. Currently I am getting an exception like this every couple of seconds on each node, for different keyspaces and CFs. I have tried to restart the nodes, tried to scrub them. No luck so far. It seems that scrub cannot complete on any of these nodes, at some point it fails because of the file that it can't find. One one of the nodes currently the nodetool scrub command fails instantly and consistently with this exception: {code} # /opt/cassandra/bin/nodetool scrub Exception
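When the online scrub keeps tripping over corrupt rows like the one above, an offline scrub of the same table is another option; a sketch, assuming the standalone sstablescrub tool shipped with this Cassandra build and that the node is stopped first:
{code}
# Stop the node before running the offline tool
sudo service cassandra stop

# Offline scrub of the affected table (ships with the Cassandra tools)
sstablescrub OpsCenter pdps

sudo service cassandra start
{code}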
[jira] [Created] (CASSANDRA-6720) Implement support for Log4j DOMConfigurator for Cassandra daemon
Nikolai Grigoriev created CASSANDRA-6720: Summary: Implement support for Log4j DOMConfigurator for Cassandra daemon Key: CASSANDRA-6720 URL: https://issues.apache.org/jira/browse/CASSANDRA-6720 Project: Cassandra Issue Type: Improvement Components: Config, Core Reporter: Nikolai Grigoriev Priority: Trivial Currently CassandraDaemon explicitly uses PropertyConfigurator to load log4j settings if log4j.defaultInitOverride is set to true, which is done by default. This does not allow using a log4j XML configuration file, because that requires invoking DOMConfigurator in a similar fashion. The only way to use an XML configuration today is to change the value of the log4j.defaultInitOverride property in the startup script. Here is the background on why I think it might be useful to support the XML configuration, even if you hate XML ;) I wanted to ship my Cassandra logs to Logstash and I have been using SocketAppender. But then I discovered that any issue with the Logstash log4j server results in significant performance degradation for Cassandra, because the logger blocks. I was able to easily reproduce the problem with a separate test. The obvious solution was to put an AsyncAppender in front of the SocketAppender, which eliminates the blocking. However, AsyncAppender can only be configured via DOMConfigurator, at least in Log4j 1.2. I think it does not hurt to make a little change to support both configuration types, in a way similar to Spring's Log4jConfigurer:
{code}
public static void initLogging(String location, long refreshInterval) throws FileNotFoundException
{
    String resolvedLocation = SystemPropertyUtils.resolvePlaceholders(location);
    File file = ResourceUtils.getFile(resolvedLocation);
    if (!file.exists())
    {
        throw new FileNotFoundException("Log4j config file [" + resolvedLocation + "] not found");
    }
    if (resolvedLocation.toLowerCase().endsWith(XML_FILE_EXTENSION))
    {
        DOMConfigurator.configureAndWatch(file.getAbsolutePath(), refreshInterval);
    }
    else
    {
        PropertyConfigurator.configureAndWatch(file.getAbsolutePath(), refreshInterval);
    }
}
{code}
I would be happy to submit the change unless there are any objections. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
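To illustrate why the XML configurator matters here, a hypothetical log4j.xml fragment (written out via a shell heredoc to keep the example self-contained) that wraps a SocketAppender in an AsyncAppender so a slow or dead Logstash endpoint cannot block Cassandra. The file path, host, port, and buffer size are assumptions:
{code}
cat > /etc/cassandra/log4j-server.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
  <!-- Blocking network appender: this is what stalls Cassandra when Logstash is down -->
  <appender name="LOGSTASH" class="org.apache.log4j.net.SocketAppender">
    <param name="RemoteHost" value="logstash.example.com"/>
    <param name="Port" value="4560"/>
  </appender>
  <!-- Async wrapper: buffers events and drops them instead of blocking the caller -->
  <appender name="ASYNC" class="org.apache.log4j.AsyncAppender">
    <param name="BufferSize" value="1024"/>
    <param name="Blocking" value="false"/>
    <appender-ref ref="LOGSTASH"/>
  </appender>
  <root>
    <priority value="INFO"/>
    <appender-ref ref="ASYNC"/>
  </root>
</log4j:configuration>
EOF
{code}
This kind of configuration cannot be expressed in log4j.properties, which is why PropertyConfigurator alone is not enough.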
[jira] [Commented] (CASSANDRA-6285) LCS compaction failing with Exception
[ https://issues.apache.org/jira/browse/CASSANDRA-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903449#comment-13903449 ] Nikolai Grigoriev commented on CASSANDRA-6285: -- I have started seeing these too. Surprisingly...after adding OpsCenter CE to my cluster. I do not see these associated with my own data. {code} java.lang.RuntimeException: Last written key DecoratedKey(3542937286762954312, 31302e332e34352e3135382d676574466c757368657350656e64696e67) = current key DecoratedKey(-2152912038130700738, 31302e332e34352e3135362d77696e7465726d7574655f6a6d657465722d776d5f6170706c69636174696f6e732d676574526563656e744 26c6f6f6d46) writing into /hadoop/disk1/cassandra/data/OpsCenter/rollups300/OpsCenter-rollups300-tmp-jb-5055-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:142) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:165) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {code} LCS compaction failing with Exception - Key: CASSANDRA-6285 URL: https://issues.apache.org/jira/browse/CASSANDRA-6285 Project: Cassandra Issue Type: Bug Components: Core Environment: 4 nodes, shortly updated from 1.2.11 to 2.0.2 Reporter: David Sauer Assignee: Tyler Hobbs Fix For: 2.0.6 Attachments: compaction_test.py After altering everything to LCS the table OpsCenter.rollups60 amd one other none OpsCenter-Table got stuck with everything hanging around in L0. 
The compaction started and ran until the logs showed this: ERROR [CompactionExecutor:111] 2013-11-01 19:14:53,865 CassandraDaemon.java (line 187) Exception in thread Thread[CompactionExecutor:111,1,RMI Runtime] java.lang.RuntimeException: Last written key DecoratedKey(1326283851463420237, 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574426c6f6f6d46696c746572537061636555736564) = current key DecoratedKey(954210699457429663, 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574546f74616c4469736b5370616365557365640b0f) writing into /var/lib/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-tmp-jb-58656-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:141) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:164) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:296) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Moving back to STC worked to keep the compactions running. Especialy my own Table i would like to move to LCS. After a major compaction with STC the move to LCS fails with the same Exception.
[jira] [Created] (CASSANDRA-6716) nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist)
Nikolai Grigoriev created CASSANDRA-6716: Summary: nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist) Key: CASSANDRA-6716 URL: https://issues.apache.org/jira/browse/CASSANDRA-6716 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.0.5 (built from source), Linux, 6 nodes, JDK 1.7 Reporter: Nikolai Grigoriev Attachments: system.log.gz It seems that since recently I have started getting a number of exceptions like File not found on all Cassandra nodes. Currently I am getting an exception like this every couple of seconds on each node, for different keyspaces and CFs. I have tried to restart the nodes, tried to scrub them. No luck so far. It seems that scrub cannot complete on any of these nodes, at some point it fails because of the file that it can't find. One one of the nodes currently the nodetool scrub command fails instantly and consistently with this exception: {code} # /opt/cassandra/bin/nodetool scrub Exception in thread main java.lang.RuntimeException: Tried to hard link to file that does not exist /mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28049-Data.db at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:75) at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1215) at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1826) at org.apache.cassandra.db.ColumnFamilyStore.scrub(ColumnFamilyStore.java:1122) at org.apache.cassandra.service.StorageService.scrub(StorageService.java:2159) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at 
sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {code} Also I have noticed that the files that are missing are often (or maybe always?) referred to in the log as follows: {quote} WARN 00:06:10,597 At level 3,
[jira] [Updated] (CASSANDRA-6716) nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist)
[ https://issues.apache.org/jira/browse/CASSANDRA-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-6716: - Attachment: system.log.gz log from one of the nodes nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist) -- Key: CASSANDRA-6716 URL: https://issues.apache.org/jira/browse/CASSANDRA-6716 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.0.5 (built from source), Linux, 6 nodes, JDK 1.7 Reporter: Nikolai Grigoriev Attachments: system.log.gz It seems that since recently I have started getting a number of exceptions like File not found on all Cassandra nodes. Currently I am getting an exception like this every couple of seconds on each node, for different keyspaces and CFs. I have tried to restart the nodes, tried to scrub them. No luck so far. It seems that scrub cannot complete on any of these nodes, at some point it fails because of the file that it can't find. One one of the nodes currently the nodetool scrub command fails instantly and consistently with this exception: {code} # /opt/cassandra/bin/nodetool scrub Exception in thread main java.lang.RuntimeException: Tried to hard link to file that does not exist /mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28049-Data.db at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:75) at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1215) at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1826) at org.apache.cassandra.db.ColumnFamilyStore.scrub(ColumnFamilyStore.java:1122) at org.apache.cassandra.service.StorageService.scrub(StorageService.java:2159) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at
[jira] [Commented] (CASSANDRA-6716) nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist)
[ https://issues.apache.org/jira/browse/CASSANDRA-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903662#comment-13903662 ] Nikolai Grigoriev commented on CASSANDRA-6716: -- Yes, but I was not sure if the problem with missing sstables is the consequence of that issue. nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist) -- Key: CASSANDRA-6716 URL: https://issues.apache.org/jira/browse/CASSANDRA-6716 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.0.5 (built from source), Linux, 6 nodes, JDK 1.7 Reporter: Nikolai Grigoriev Attachments: system.log.gz It seems that since recently I have started getting a number of exceptions like File not found on all Cassandra nodes. Currently I am getting an exception like this every couple of seconds on each node, for different keyspaces and CFs. I have tried to restart the nodes, tried to scrub them. No luck so far. It seems that scrub cannot complete on any of these nodes, at some point it fails because of the file that it can't find. One one of the nodes currently the nodetool scrub command fails instantly and consistently with this exception: {code} # /opt/cassandra/bin/nodetool scrub Exception in thread main java.lang.RuntimeException: Tried to hard link to file that does not exist /mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28049-Data.db at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:75) at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1215) at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1826) at org.apache.cassandra.db.ColumnFamilyStore.scrub(ColumnFamilyStore.java:1122) at org.apache.cassandra.service.StorageService.scrub(StorageService.java:2159) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at 
sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[jira] [Comment Edited] (CASSANDRA-6716) nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist)
[ https://issues.apache.org/jira/browse/CASSANDRA-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903662#comment-13903662 ] Nikolai Grigoriev edited comment on CASSANDRA-6716 at 2/18/14 12:54 AM: Yes, but I was not sure if the problem with missing sstables is the consequence of that issue. And, unlike with that issue I did not upgrade from 1.2. was (Author: ngrigoriev): Yes, but I was not sure if the problem with missing sstables is the consequence of that issue. nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist) -- Key: CASSANDRA-6716 URL: https://issues.apache.org/jira/browse/CASSANDRA-6716 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.0.5 (built from source), Linux, 6 nodes, JDK 1.7 Reporter: Nikolai Grigoriev Attachments: system.log.gz It seems that since recently I have started getting a number of exceptions like File not found on all Cassandra nodes. Currently I am getting an exception like this every couple of seconds on each node, for different keyspaces and CFs. I have tried to restart the nodes, tried to scrub them. No luck so far. It seems that scrub cannot complete on any of these nodes, at some point it fails because of the file that it can't find. One one of the nodes currently the nodetool scrub command fails instantly and consistently with this exception: {code} # /opt/cassandra/bin/nodetool scrub Exception in thread main java.lang.RuntimeException: Tried to hard link to file that does not exist /mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28049-Data.db at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:75) at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1215) at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1826) at org.apache.cassandra.db.ColumnFamilyStore.scrub(ColumnFamilyStore.java:1122) at org.apache.cassandra.service.StorageService.scrub(StorageService.java:2159) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553) at
[jira] [Commented] (CASSANDRA-6407) CQL/Thrift request hangs forever when querying more than certain amount of data
[ https://issues.apache.org/jira/browse/CASSANDRA-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867189#comment-13867189 ] Nikolai Grigoriev commented on CASSANDRA-6407: -- [~xedin] Source patch will be OK too, whichever is simpler for you. We are building our Cassandra from source with two patches that are scheduled for 2.0.5. I do not mind rebuilding another dependency :) Thanks! CQL/Thrift request hangs forever when querying more than certain amount of data --- Key: CASSANDRA-6407 URL: https://issues.apache.org/jira/browse/CASSANDRA-6407 Project: Cassandra Issue Type: Bug Components: Core Environment: Oracle Linux 6.4, JDK 1.7.0_25-b15, Cassandra 2.0.2 Reporter: Nikolai Grigoriev Attachments: cassandra.jstack.gz, cassandra.yaml, cassandra6407test.cql.gz, system.log.gz I have a table like this (slightly simplified for clarity): {code} CREATE TABLE my_test_table ( uid uuid, d_id uuid, a_id uuid, c_idtext, i_idblob, datatext, PRIMARY KEY ((uid, d_id, a_id), c_id, i_id) ); {code} I have created about over a hundred (117 to be specific) of sample entities with the same row key and different clustering keys. Each has a blob of approximately 4Kb. I have tried to fetch all of them with a query like this via CQLSH: {code} select * from my_test_table where uid=44338526-7aac-4640-bcde-0f4663c07572 and a_id=--4000--0002 and d_id=--1e64--0001 and c_id='list-2' {code} This query simply hangs in CQLSH, it does not return at all until I abort it. Then I started playing with LIMIT clause and found that this query returns instantly (with good data) when I use LIMIT 55 but hangs forever when I use LIMIT 56. Then I tried to just query all i_id values like this: {code} select i_id from my_test_table where uid=44338526-7aac-4640-bcde-0f4663c07572 and a_id=--4000--0002 and d_id=--1e64--0001 and c_id='list-2' {code} And this query returns instantly with the complete set of 117 values. So I started thinking that it must be something about the total size of the response, not the number of results or the number of columns to be fetches in slices. And I have tried another test: {code} select cdata from my_test_table where uid=44338526-7aac-4640-bcde-0f4663c07572 and a_id=--4000--0002 and d_id=--1e64--0001 and c_id='list-2' LIMIT 63 {code} This query returns instantly but if I change the limit to 64 it hangs forever. Since my blob is about 4Kb for each entity it *seems* like the query hangs when the total size of the response exceeds 252..256Kb. Looks quite suspicious especially because 256Kb is such a particular number. I am wondering if this has something to do with the result paging. I did not test if the issue is reproducible outside of CQLSH but I do recall that I observed somewhat similar behavior when fetching relatively large data sets. I can consistently reproduce this problem on my cluster. I am also attaching the jstack output that I have captured when CQLSH was hanging on one of these queries. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-6407) CQL/Thrift request hangs forever when querying more than certain amount of data
[ https://issues.apache.org/jira/browse/CASSANDRA-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867356#comment-13867356 ] Nikolai Grigoriev commented on CASSANDRA-6407: -- I have tested the updated Thrift server with a single-node cluster using my test case and in my larger cluster with my original test - it seems to be working correctly now with large responses! Thanks!!! CQL/Thrift request hangs forever when querying more than certain amount of data --- Key: CASSANDRA-6407 URL: https://issues.apache.org/jira/browse/CASSANDRA-6407 Project: Cassandra Issue Type: Bug Components: Core Environment: Oracle Linux 6.4, JDK 1.7.0_25-b15, Cassandra 2.0.2 Reporter: Nikolai Grigoriev Attachments: cassandra.jstack.gz, cassandra.yaml, cassandra6407test.cql.gz, disruptor-thrift-server-0.3.3-SNAPSHOT.jar, system.log.gz I have a table like this (slightly simplified for clarity): {code} CREATE TABLE my_test_table ( uid uuid, d_id uuid, a_id uuid, c_idtext, i_idblob, datatext, PRIMARY KEY ((uid, d_id, a_id), c_id, i_id) ); {code} I have created about over a hundred (117 to be specific) of sample entities with the same row key and different clustering keys. Each has a blob of approximately 4Kb. I have tried to fetch all of them with a query like this via CQLSH: {code} select * from my_test_table where uid=44338526-7aac-4640-bcde-0f4663c07572 and a_id=--4000--0002 and d_id=--1e64--0001 and c_id='list-2' {code} This query simply hangs in CQLSH, it does not return at all until I abort it. Then I started playing with LIMIT clause and found that this query returns instantly (with good data) when I use LIMIT 55 but hangs forever when I use LIMIT 56. Then I tried to just query all i_id values like this: {code} select i_id from my_test_table where uid=44338526-7aac-4640-bcde-0f4663c07572 and a_id=--4000--0002 and d_id=--1e64--0001 and c_id='list-2' {code} And this query returns instantly with the complete set of 117 values. So I started thinking that it must be something about the total size of the response, not the number of results or the number of columns to be fetches in slices. And I have tried another test: {code} select cdata from my_test_table where uid=44338526-7aac-4640-bcde-0f4663c07572 and a_id=--4000--0002 and d_id=--1e64--0001 and c_id='list-2' LIMIT 63 {code} This query returns instantly but if I change the limit to 64 it hangs forever. Since my blob is about 4Kb for each entity it *seems* like the query hangs when the total size of the response exceeds 252..256Kb. Looks quite suspicious especially because 256Kb is such a particular number. I am wondering if this has something to do with the result paging. I did not test if the issue is reproducible outside of CQLSH but I do recall that I observed somewhat similar behavior when fetching relatively large data sets. I can consistently reproduce this problem on my cluster. I am also attaching the jstack output that I have captured when CQLSH was hanging on one of these queries. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-6407) CQLSH hangs forever when querying more than certain amount of data
[ https://issues.apache.org/jira/browse/CASSANDRA-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866117#comment-13866117 ] Nikolai Grigoriev commented on CASSANDRA-6407: -- Some additional details. I can confirm that the problem is not limited to CQLSH, it can be reproduced via CQL/Thrift. Which does not surprise me, I was assuming that's what CQLSH is using today. One of my coworkers has pointed out that he did not observe this problem in his small single-node cluster, even with larger amounts of data in one response. I was curious enough to try it so I have configured a single-node Cassandra 2.0.4 cluster on a spare Linux machine, loaded my schema there and generated the problematic test data set. I could not reproduce the problem, i.e. I was getting back much larger result set than in my larger cluster. After that I took my production cassandra.yaml, changed the cluster name to a dummy one, reinitialized that single-node cluster with new config, reloaded the data and I could immediately reproduce the problem. To keep long story short, I was comparing the parameters I changed in my config with the defaults and finally found THE parameter that is clearly responsible for this issue: rpc_server_type. If set to sync, then I can query larger data set. If set to hsha - I can only query up to ~256Kb of data and then the connection gets stuck forever. Anything obvious that I am missing about the limitations of hsha? CQLSH hangs forever when querying more than certain amount of data -- Key: CASSANDRA-6407 URL: https://issues.apache.org/jira/browse/CASSANDRA-6407 Project: Cassandra Issue Type: Bug Components: Tools Environment: Oracle Linux 6.4, JDK 1.7.0_25-b15, Cassandra 2.0.2 Reporter: Nikolai Grigoriev Attachments: cassandra.jstack.gz I have a table like this (slightly simplified for clarity): {code} CREATE TABLE my_test_table ( uid uuid, d_id uuid, a_id uuid, c_idtext, i_idblob, datatext, PRIMARY KEY ((uid, d_id, a_id), c_id, i_id) ); {code} I have created about over a hundred (117 to be specific) of sample entities with the same row key and different clustering keys. Each has a blob of approximately 4Kb. I have tried to fetch all of them with a query like this via CQLSH: {code} select * from my_test_table where uid=44338526-7aac-4640-bcde-0f4663c07572 and a_id=--4000--0002 and d_id=--1e64--0001 and c_id='list-2' {code} This query simply hangs in CQLSH, it does not return at all until I abort it. Then I started playing with LIMIT clause and found that this query returns instantly (with good data) when I use LIMIT 55 but hangs forever when I use LIMIT 56. Then I tried to just query all i_id values like this: {code} select i_id from my_test_table where uid=44338526-7aac-4640-bcde-0f4663c07572 and a_id=--4000--0002 and d_id=--1e64--0001 and c_id='list-2' {code} And this query returns instantly with the complete set of 117 values. So I started thinking that it must be something about the total size of the response, not the number of results or the number of columns to be fetches in slices. And I have tried another test: {code} select cdata from my_test_table where uid=44338526-7aac-4640-bcde-0f4663c07572 and a_id=--4000--0002 and d_id=--1e64--0001 and c_id='list-2' LIMIT 63 {code} This query returns instantly but if I change the limit to 64 it hangs forever. Since my blob is about 4Kb for each entity it *seems* like the query hangs when the total size of the response exceeds 252..256Kb. 
Looks quite suspicious especially because 256Kb is such a particular number. I am wondering if this has something to do with the result paging. I did not test if the issue is reproducible outside of CQLSH but I do recall that I observed somewhat similar behavior when fetching relatively large data sets. I can consistently reproduce this problem on my cluster. I am also attaching the jstack output that I have captured when CQLSH was hanging on one of these queries. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
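The comparison against the defaults described above can be scripted; a rough sketch that diffs the effective (non-comment) settings of a production cassandra.yaml against a pristine copy from the same Cassandra release, so that a change such as rpc_server_type: hsha stands out. The file paths are assumptions:
{code}
# Compare effective settings of the running config against the stock one
diff <(grep -Ev '^\s*(#|$)' /etc/cassandra/cassandra.yaml | sort) \
     <(grep -Ev '^\s*(#|$)' /opt/cassandra-2.0.4/conf/cassandra.yaml | sort)

# In this case the interesting line is the RPC server implementation:
grep '^rpc_server_type' /etc/cassandra/cassandra.yaml   # hsha reproduces the hang, sync does not
{code}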
[jira] [Updated] (CASSANDRA-6407) CQL/Thrift request hangs forever when querying more than certain amount of data
[ https://issues.apache.org/jira/browse/CASSANDRA-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-6407: - Reproduced In: 2.0.4 CQL/Thrift request hangs forever when querying more than certain amount of data --- Key: CASSANDRA-6407 URL: https://issues.apache.org/jira/browse/CASSANDRA-6407 Project: Cassandra Issue Type: Bug Components: Core Environment: Oracle Linux 6.4, JDK 1.7.0_25-b15, Cassandra 2.0.2 Reporter: Nikolai Grigoriev Attachments: cassandra.jstack.gz I have a table like this (slightly simplified for clarity): {code} CREATE TABLE my_test_table ( uid uuid, d_id uuid, a_id uuid, c_idtext, i_idblob, datatext, PRIMARY KEY ((uid, d_id, a_id), c_id, i_id) ); {code} I have created about over a hundred (117 to be specific) of sample entities with the same row key and different clustering keys. Each has a blob of approximately 4Kb. I have tried to fetch all of them with a query like this via CQLSH: {code} select * from my_test_table where uid=44338526-7aac-4640-bcde-0f4663c07572 and a_id=--4000--0002 and d_id=--1e64--0001 and c_id='list-2' {code} This query simply hangs in CQLSH, it does not return at all until I abort it. Then I started playing with LIMIT clause and found that this query returns instantly (with good data) when I use LIMIT 55 but hangs forever when I use LIMIT 56. Then I tried to just query all i_id values like this: {code} select i_id from my_test_table where uid=44338526-7aac-4640-bcde-0f4663c07572 and a_id=--4000--0002 and d_id=--1e64--0001 and c_id='list-2' {code} And this query returns instantly with the complete set of 117 values. So I started thinking that it must be something about the total size of the response, not the number of results or the number of columns to be fetches in slices. And I have tried another test: {code} select cdata from my_test_table where uid=44338526-7aac-4640-bcde-0f4663c07572 and a_id=--4000--0002 and d_id=--1e64--0001 and c_id='list-2' LIMIT 63 {code} This query returns instantly but if I change the limit to 64 it hangs forever. Since my blob is about 4Kb for each entity it *seems* like the query hangs when the total size of the response exceeds 252..256Kb. Looks quite suspicious especially because 256Kb is such a particular number. I am wondering if this has something to do with the result paging. I did not test if the issue is reproducible outside of CQLSH but I do recall that I observed somewhat similar behavior when fetching relatively large data sets. I can consistently reproduce this problem on my cluster. I am also attaching the jstack output that I have captured when CQLSH was hanging on one of these queries. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (CASSANDRA-6407) CQL/Thrift request hangs forever when querying more than certain amount of data
[ https://issues.apache.org/jira/browse/CASSANDRA-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-6407: - Component/s: (was: Tools) Core CQL/Thrift request hangs forever when querying more than certain amount of data --- Key: CASSANDRA-6407 URL: https://issues.apache.org/jira/browse/CASSANDRA-6407 Project: Cassandra Issue Type: Bug Components: Core Environment: Oracle Linux 6.4, JDK 1.7.0_25-b15, Cassandra 2.0.2 Reporter: Nikolai Grigoriev Attachments: cassandra.jstack.gz I have a table like this (slightly simplified for clarity): {code} CREATE TABLE my_test_table ( uid uuid, d_id uuid, a_id uuid, c_idtext, i_idblob, datatext, PRIMARY KEY ((uid, d_id, a_id), c_id, i_id) ); {code} I have created about over a hundred (117 to be specific) of sample entities with the same row key and different clustering keys. Each has a blob of approximately 4Kb. I have tried to fetch all of them with a query like this via CQLSH: {code} select * from my_test_table where uid=44338526-7aac-4640-bcde-0f4663c07572 and a_id=--4000--0002 and d_id=--1e64--0001 and c_id='list-2' {code} This query simply hangs in CQLSH, it does not return at all until I abort it. Then I started playing with LIMIT clause and found that this query returns instantly (with good data) when I use LIMIT 55 but hangs forever when I use LIMIT 56. Then I tried to just query all i_id values like this: {code} select i_id from my_test_table where uid=44338526-7aac-4640-bcde-0f4663c07572 and a_id=--4000--0002 and d_id=--1e64--0001 and c_id='list-2' {code} And this query returns instantly with the complete set of 117 values. So I started thinking that it must be something about the total size of the response, not the number of results or the number of columns to be fetches in slices. And I have tried another test: {code} select cdata from my_test_table where uid=44338526-7aac-4640-bcde-0f4663c07572 and a_id=--4000--0002 and d_id=--1e64--0001 and c_id='list-2' LIMIT 63 {code} This query returns instantly but if I change the limit to 64 it hangs forever. Since my blob is about 4Kb for each entity it *seems* like the query hangs when the total size of the response exceeds 252..256Kb. Looks quite suspicious especially because 256Kb is such a particular number. I am wondering if this has something to do with the result paging. I did not test if the issue is reproducible outside of CQLSH but I do recall that I observed somewhat similar behavior when fetching relatively large data sets. I can consistently reproduce this problem on my cluster. I am also attaching the jstack output that I have captured when CQLSH was hanging on one of these queries. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (CASSANDRA-6407) CQL/Thrift request hangs forever when querying more than certain amount of data
[ https://issues.apache.org/jira/browse/CASSANDRA-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-6407:
-
Summary: CQL/Thrift request hangs forever when querying more than certain amount of data (was: CQLSH hangs forever when querying more than certain amount of data)

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-6407) CQL/Thrift request hangs forever when querying more than certain amount of data
[ https://issues.apache.org/jira/browse/CASSANDRA-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866133#comment-13866133 ] Nikolai Grigoriev commented on CASSANDRA-6407:
--
It sounds somewhat related to: CASSANDRA-4573, CASSANDRA-6373

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (CASSANDRA-6407) CQL/Thrift request hangs forever when querying more than certain amount of data
[ https://issues.apache.org/jira/browse/CASSANDRA-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-6407:
-
Attachment: cassandra.yaml

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (CASSANDRA-6407) CQL/Thrift request hangs forever when querying more than certain amount of data
[ https://issues.apache.org/jira/browse/CASSANDRA-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-6407:
-
Attachment: cassandra6407test.cql.gz

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-6407) CQL/Thrift request hangs forever when querying more than certain amount of data
[ https://issues.apache.org/jira/browse/CASSANDRA-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866245#comment-13866245 ] Nikolai Grigoriev commented on CASSANDRA-6407:
--
[~xedin] I have prepared a simple test that does demonstrate the problem even in a small single-node cluster. Interestingly enough, with this test and such a small cluster with no load at all it sometimes actually works. So, here is how I use it:
1. Set the RPC server type to hsha
2. Load the attached CQL file
3. Use CQLSH:
{code}
use cassandra6407test ;
select * from my_test_table ;
{code}
In most cases this SELECT gets stuck forever. Sometimes, if you interrupt it (after a while) and run it again, it actually returns all the data on the second attempt. Sometimes it does not. If you restart CQLSH and do it again, it will get stuck again. Specifying a LIMIT above 24-25 demonstrates similar behavior. If you switch the RPC server type to sync and restart, then "select * from my_test_table ;" works all the time.
It almost feels like some sort of race condition or a timing issue somewhere between the part that produces the query result and the part that streams it back to the client.

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
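A hedged way to automate the three manual steps above is to drive cqlsh itself (which at this version talks to the Thrift RPC server that the hsha/sync setting applies to) and treat a wall-clock timeout as "stuck". The sketch below assumes Python 3.7+, a local node, and the cassandra6407test keyspace already loaded; the 30-second timeout and the three attempts are arbitrary choices, not part of the original test.
{code}
#!/usr/bin/env python3
# Sketch: run the SELECT through cqlsh a few times and report whether it
# returned or had to be killed after a timeout (i.e. the hang described above).
import subprocess

QUERY = "USE cassandra6407test;\nSELECT * FROM my_test_table;\n"

def returns_within(timeout_s=30):
    try:
        # cqlsh reads statements from stdin; TimeoutExpired means it hung.
        subprocess.run(["cqlsh"], input=QUERY, capture_output=True,
                       text=True, timeout=timeout_s, check=True)
        return True
    except subprocess.TimeoutExpired:
        return False

if __name__ == "__main__":
    for attempt in range(1, 4):
        status = "returned" if returns_within() else "hung, killed after timeout"
        print("attempt %d: %s" % (attempt, status))
{code}
Running this once with rpc_server_type: hsha and once with rpc_server_type: sync in cassandra.yaml should show the difference the comment describes.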
[jira] [Updated] (CASSANDRA-6407) CQL/Thrift request hangs forever when querying more than certain amount of data
[ https://issues.apache.org/jira/browse/CASSANDRA-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev updated CASSANDRA-6407:
-
Attachment: system.log.gz

This is the DEBUG log - I have tried that "select *" request 3 times after restarting the server with the RPC server type set to hsha.

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (CASSANDRA-6407) CQL/Thrift request hangs forever when querying more than certain amount of data
[ https://issues.apache.org/jira/browse/CASSANDRA-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866245#comment-13866245 ] Nikolai Grigoriev edited comment on CASSANDRA-6407 at 1/9/14 3:21 AM:
--
[~xedin] I have prepared a simple test that does demonstrate the problem even in a small single-node cluster. Interestingly enough, with this test and such a small cluster with no load at all it sometimes actually works. So, here is how I use it:
1. Set the RPC server type to hsha
2. Load the attached CQL file
3. Use CQLSH:
{code}
use cassandra6407test ;
select * from my_test_table ;
{code}
In most cases this SELECT gets stuck forever. Sometimes, if you interrupt it (after a while) and run it again, it actually returns all the data on the second attempt. Sometimes it does not. If you restart CQLSH and do it again, it will get stuck again. Specifying a LIMIT above 24-25 demonstrates similar behavior. If you switch the RPC server type to sync and restart, then "select * from my_test_table ;" works all the time.
It almost feels like some sort of race condition or a timing issue somewhere between the part that produces the query result and the part that streams it back to the client.
The server config I have attached is simplified; I have disabled JNA, JEMalloc, etc. to have a configuration that is as close as possible to the default installation.

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (CASSANDRA-6528) TombstoneOverwhelmingException is thrown while populating data in recently truncated CF
[ https://issues.apache.org/jira/browse/CASSANDRA-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolai Grigoriev resolved CASSANDRA-6528.
--
Resolution: Cannot Reproduce

Closing since I cannot reproduce it anymore. Will reopen if I manage to reproduce it again and capture the debug information as per the instructions above.

TombstoneOverwhelmingException is thrown while populating data in recently truncated CF
---
Key: CASSANDRA-6528
URL: https://issues.apache.org/jira/browse/CASSANDRA-6528
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Cassandra 2.0.3, Linux, 6 nodes
Reporter: Nikolai Grigoriev
Priority: Minor

I am running some performance tests and recently I had to flush the data from one of the tables and repopulate it. I have about 30M rows with a few columns in each, about 5kb per row in total. In order to repopulate the data I truncate the table from CQLSH and then relaunch the test. The test simply inserts the data into the table and does not read anything. Shortly after restarting the data generator I see this on one of the nodes:
{code}
INFO [HintedHandoff:655] 2013-12-26 16:45:42,185 HintedHandOffManager.java (line 323) Started hinted handoff for host: 985c8a08-3d92-4fad-a1d1-7135b2b9774a with IP: /10.5.45.158
ERROR [HintedHandoff:655] 2013-12-26 16:45:42,680 SliceQueryFilter.java (line 200) Scanned over 10 tombstones; query aborted (see tombstone_fail_threshold)
ERROR [HintedHandoff:655] 2013-12-26 16:45:42,680 CassandraDaemon.java (line 187) Exception in thread Thread[HintedHandoff:655,1,main]
org.apache.cassandra.db.filter.TombstoneOverwhelmingException
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:201)
        at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:56)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1487)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1306)
        at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:351)
        at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:309)
        at org.apache.cassandra.db.HintedHandOffManager.access$4(HintedHandOffManager.java:281)
        at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:530)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
INFO [OptionalTasks:1] 2013-12-26 16:45:53,946 MeteredFlusher.java (line 63) flushing high-traffic column family CFS(Keyspace='test_jmeter', ColumnFamily='test_profiles') (estimated 192717267 bytes)
{code}
I am inserting the data with CL=1. It seems to be happening every time I do it. But I do not see any errors on the client side and the node seems to continue operating, which is why I think it is not a major issue. Maybe not an issue at all, but the message is logged as ERROR.

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
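For completeness, the write-only workload described in that report has roughly this shape: truncate the table, then repopulate it with ~5Kb rows at CL=ONE. The sketch below uses the DataStax Python driver (cassandra-driver) against the test_jmeter.test_profiles table named in the log; the (id, payload) column layout and the row count are hypothetical stand-ins, since the actual JMeter test plan is not attached.
{code}
#!/usr/bin/env python3
# Sketch of the repopulation workload: truncate the CF, then write ~5Kb rows
# at consistency level ONE. The schema used here is assumed, not the real one.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("test_jmeter")

session.execute("TRUNCATE test_profiles")

insert = session.prepare("INSERT INTO test_profiles (id, payload) VALUES (?, ?)")
insert.consistency_level = ConsistencyLevel.ONE

payload = "x" * 5120          # ~5Kb per row, as in the report
for i in range(1000):         # the real test wrote on the order of 30M rows
    session.execute(insert, (i, payload))

cluster.shutdown()
{code}
The point of the sketch is only that the workload never reads: the tombstone scan in the stack trace above happens on the hinted handoff path rather than in the client's own requests, which matches the reporter's observation that the client sees no errors while the node logs the exception.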