[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502388#comment-15502388 ] Stefania commented on CASSANDRA-9318: - Latest dtest [build|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-trunk-dtest/7/] completed without failures. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
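The watermark idea in the issue description above can be sketched roughly as follows. This is a minimal illustration only, not the code from the CASSANDRA-9318 patch: the class name `InFlightByteLimiter`, its methods, and the choice to track bytes only (not request counts) are assumptions made for the example. Reads are paused once outstanding bytes reach the high watermark and resumed only after they drop below the low watermark.

```java
/**
 * Hypothetical sketch of high/low watermark back-pressure on outstanding request bytes.
 * Class and method names are illustrative and are not taken from the Cassandra patch.
 */
final class InFlightByteLimiter {
    private final long lowWatermarkBytes;
    private final long highWatermarkBytes;
    private long outstandingBytes = 0;
    private boolean readsPaused = false;

    InFlightByteLimiter(long lowWatermarkBytes, long highWatermarkBytes) {
        if (lowWatermarkBytes < 0 || lowWatermarkBytes >= highWatermarkBytes)
            throw new IllegalArgumentException("expected 0 <= low < high");
        this.lowWatermarkBytes = lowWatermarkBytes;
        this.highWatermarkBytes = highWatermarkBytes;
    }

    /** Records an accepted request; returns true if client reads should be paused. */
    synchronized boolean onRequestStart(long requestBytes) {
        outstandingBytes += requestBytes;
        if (outstandingBytes >= highWatermarkBytes)
            readsPaused = true;
        return readsPaused;
    }

    /** Records a completed request; returns true if client reads should stay paused. */
    synchronized boolean onRequestComplete(long requestBytes) {
        outstandingBytes -= requestBytes;
        if (outstandingBytes < lowWatermarkBytes)
            readsPaused = false;
        return readsPaused;
    }

    synchronized long outstandingBytes() {
        return outstandingBytes;
    }

    public static void main(String[] args) {
        InFlightByteLimiter limiter = new InFlightByteLimiter(100, 200);
        System.out.println(limiter.onRequestStart(150));    // still below the high watermark
        System.out.println(limiter.onRequestStart(60));     // 210 bytes outstanding
        System.out.println(limiter.onRequestComplete(60));  // 150 bytes, between the watermarks
        System.out.println(limiter.onRequestComplete(150)); // 0 bytes, below the low watermark
    }
}
```

The gap between the two watermarks gives hysteresis, so the paused state does not flap when load hovers around a single threshold. How the paused flag maps onto disabling read on a client connection is left out here, since the description itself flags that as something to verify.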
[jira] [Comment Edited] (CASSANDRA-11218) Prioritize Secondary Index rebuild
[ https://issues.apache.org/jira/browse/CASSANDRA-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502288#comment-15502288 ] Jeff Jirsa edited comment on CASSANDRA-11218 at 9/19/16 5:04 AM: - I have a version of this patch I'll be submitting very soon, but while I wait for internal approvals, I'd like to describe the implementation so that those of you who care about this can provide feedback conceptually before I submit a patch for review. I'm implementing this as a priority queue that uses a custom comparator implemented with three tiers: * Operation type priority (to allow certain types - like index rebuild - to run at higher priorities, and others - scrub / cleanup / verify - to run at much lower priorities). This is defined as an int field in the enum in the OperationType, and can be overridden via system property. Lot of opportunity for bike shedding here in picking exact priorities - I've chosen (highest priority to lowest): ** Anticompaction / Index Summary Redistribution ** Index Build / View Build ** Key Cache Save / Row Cache Save / Counter Cache Save ** User Defined Compaction ** Compaction (including maximal/major compaction) ** Tombstone Compaction ** Scrub / Cleanup / Upgrade SSTables ** Verify * Sub type priority (to allow compaction tasks within a type to have preference - to enable behavior like CASSANDRA-6288 ). This is defined as a long, and set by the compaction strategies, and by default, I'm setting this as the bytes on disk of the source sstables - larger transactions (at the time the task was created) preferred over smaller transactions. * Timestamp priority, where tasks with the same type/subtype values are served FIFO. 
The implementation here was pretty straightforward - we create a new interface to expose the three priority values, and then extend AbstractCompactionTask and de-anonymize the handful of anonymous runnables/wrapped runnables/callables to implement that interface so they can be sorted in the PriorityBlockingQueue. There may be an opportunity to try to get clever to protect against starvation in under-resourced systems, such as increasing type priority over time as tasks age, but I'm leaving that as a potential optimization for the future - I'm not sure it's really needed, and it makes reasoning about compaction harder, but maybe there exists a use case where it's necessary. Expecting to submit the patch early this week - if either of you (Sankalp / Marcus) finds this approach conflicts with your expectations, or if you want to volunteer to review, let me know. was (Author: jjirsa): I have a version of this patch I'll be submitting very soon, but while I wait for internal approvals, I'd like to describe the implementation so that those of you who care about this can provide feedback conceptually before I submit a patch for review. I'm implementing this as a priority queue that uses a custom comparator implemented with three tiers: * Operation type priority (to allow certain types - like index rebuild - to run at higher priorities, and others - scrub / cleanup / verify - to run at much lower priorities). This is defined as an int field in the enum in the OperationType, and can be overridden via system property. 
Lot of opportunity for bike shedding here in picking exact priorities - I've chosen (highest priority to lowest): ** Anticompaction ** Index Build / View Build ** Key Cache Save / Row Cache Save / Counter Cache Save ** User Defined Compaction ** Compaction (including maximal/major compaction) ** Tombstone Compaction ** Scrub / Cleanup / Upgrade SSTables ** Index Summary Redistribution ** Verify * Sub type priority (to allow compaction tasks within a type to have preference - to enable behavior like CASSANDRA-6288 ). This is defined as a long, and set by the compaction strategies, and by default, I'm setting this as the bytes on disk of the source sstables - larger transactions (at the time the task was created) preferred over smaller transactions. * Timestamp priority, where tasks with the same type/subtype values are served FIFO. The implementation here was pretty straightforward - we create a new interface to expose the three priority values, and then extend AbstractCompactionTask and de-anonymize the handful of anonymous runnables/wrapped runnables/callables to implement that interface so they can be sorted in the PriorityBlockingQueue. There may be an opportunity to try to get clever to protect against starvation in under-resourced systems, such as increasing type priority over time as tasks age, but I'm leaving that as a potential optimization for the future - I'm not sure it's really needed, and it makes reasoning about compaction harder, but maybe there exists a use case where it's necessary. Expecting to submit the patch early this week - if either of you (Sankalp / Marcus) finds this approach
[jira] [Commented] (CASSANDRA-11218) Prioritize Secondary Index rebuild
[ https://issues.apache.org/jira/browse/CASSANDRA-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502288#comment-15502288 ] Jeff Jirsa commented on CASSANDRA-11218: I have a version of this patch I'll be submitting very soon, but while I wait for internal approvals, I'd like to describe the implementation so that those of you who care about this can provide feedback conceptually before I submit a patch for review. I'm implementing this as a priority queue that uses a custom comparator implemented with three tiers: * Operation type priority (to allow certain types - like index rebuild - to run at higher priorities, and others - scrub / cleanup / verify - to run at much lower priorities). This is defined as an int field in the enum in the OperationType, and can be overridden via system property. Lot of opportunity for bike shedding here in picking exact priorities - I've chosen (highest priority to lowest): ** Anticompaction ** Index Build / View Build ** Key Cache Save / Row Cache Save / Counter Cache Save ** User Defined Compaction ** Compaction (including maximal/major compaction) ** Tombstone Compaction ** Scrub / Cleanup / Upgrade SSTables ** Index Summary Redistribution ** Verify * Sub type priority (to allow compaction tasks within a type to have preference - to enable behavior like CASSANDRA-6288 ). This is defined as a long, and set by the compaction strategies, and by default, I'm setting this as the bytes on disk of the source sstables - larger transactions (at the time the task was created) preferred over smaller transactions. * Timestamp priority, where tasks with the same type/subtype values are served FIFO. The implementation here was pretty straightforward - we create a new interface to expose the three priority values, and then extend AbstractCompactionTask and de-anonymize the handful of anonymous runnables/wrapped runnables/callables to implement that interface so they can be sorted in the PriorityBlockingQueue. 
There may be an opportunity to try to get clever to protect against starvation in under-resourced systems, such as increasing type priority over time as tasks age, but I'm leaving that as a potential optimization for the future - I'm not sure it's really needed, and it makes reasoning about compaction harder, but maybe there exists a use case where it's necessary. Expecting to submit the patch early this week - if either of you (Sankalp / Marcus) finds this approach conflicts with your expectations, or if you want to volunteer to review, let me know. > Prioritize Secondary Index rebuild > -- > > Key: CASSANDRA-11218 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11218 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Jeff Jirsa >Priority: Minor > > We have seen secondary index rebuilds get stuck behind other compactions > during bootstrap and other operations. This causes things to not finish. We > should prioritize index rebuild via a separate thread pool or using a > priority queue.
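The three-tier ordering described in the comment above could look roughly like the sketch below. It is not the patch itself: the `Prioritized` interface, the `PrioritizedTask` class, the numeric priority values, and the assumption that a larger `typePriority` means "more urgent" are all hypothetical choices for illustration. Only the use of a `PriorityBlockingQueue` with a custom comparator, and the type / subtype / timestamp tiers, come from the comment.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

/** Hypothetical interface exposing the three priority values described above. */
interface Prioritized {
    int typePriority();      // assumption: a larger value means a more urgent operation type
    long subTypePriority();  // e.g. bytes on disk of the source sstables
    long timestamp();        // creation time, used for FIFO tie-breaking
}

/** Illustrative task carrying a name and the three priority values. */
final class PrioritizedTask implements Prioritized {
    final String name;
    private final int typePriority;
    private final long subTypePriority;
    private final long timestamp;

    PrioritizedTask(String name, int typePriority, long subTypePriority, long timestamp) {
        this.name = name;
        this.typePriority = typePriority;
        this.subTypePriority = subTypePriority;
        this.timestamp = timestamp;
    }

    public int typePriority() { return typePriority; }
    public long subTypePriority() { return subTypePriority; }
    public long timestamp() { return timestamp; }
}

final class PriorityQueueSketch {
    /** Orders by type priority (high first), then subtype (large first), then age (old first). */
    static final Comparator<Prioritized> BY_PRIORITY = (a, b) -> {
        int c = Integer.compare(b.typePriority(), a.typePriority());
        if (c != 0) return c;
        c = Long.compare(b.subTypePriority(), a.subTypePriority());
        if (c != 0) return c;
        return Long.compare(a.timestamp(), b.timestamp());
    };

    /** Queues the tasks and returns their names in the order the queue would serve them. */
    static List<String> drainOrder(List<PrioritizedTask> tasks) {
        PriorityBlockingQueue<PrioritizedTask> queue = new PriorityBlockingQueue<>(11, BY_PRIORITY);
        queue.addAll(tasks);
        List<String> order = new ArrayList<>();
        PrioritizedTask next;
        while ((next = queue.poll()) != null)
            order.add(next.name);
        return order;
    }

    public static void main(String[] args) {
        List<PrioritizedTask> tasks = new ArrayList<>();
        tasks.add(new PrioritizedTask("verify", 1, 10L, 1L));
        tasks.add(new PrioritizedTask("index-build", 8, 5L, 2L));
        tasks.add(new PrioritizedTask("compaction-small", 5, 100L, 3L));
        tasks.add(new PrioritizedTask("compaction-large", 5, 500L, 4L));
        System.out.println(drainOrder(tasks));
    }
}
```

Because the comparator is strict about type priority first, a steady stream of high-priority tasks can starve low-priority ones, which is the starvation concern the comment raises; ageing would have to be folded into `typePriority()` to address it.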
[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502215#comment-15502215 ] Stefania edited comment on CASSANDRA-9318 at 9/19/16 4:09 AM: -- Thanks [~sbtourist], the latest commits and the entire patch LGTM. The test failures are unrelated and we have tickets for all of them: CASSANDRA-12664, CASSANDRA-12656 and CASSANDRA-12140. I've launched one more dtest build to cover the final commit and to hopefully shake off the CASSANDRA-12656 failures since these tests shouldn't even be running now. I've squashed your entire patch into [one commit|https://github.com/stef1927/cassandra/commit/1632f2e9892624f611ac3629fb84a82594fec726] and fixed some formatting issues (mostly trailing spaces) [here|https://github.com/stef1927/cassandra/commit/e3346e5f5a49b2933e10a84405730] on this [branch|https://github.com/stef1927/cassandra/commits/9318]. If you could double check the formatting nits, I can squash them and commit once the final dtest build has also completed. was (Author: stefania): Thanks [~sbtourist], the latest commits and the entire patch LGTM. The test failures are all unrelated and we have tickets for all of them: CASSANDRA-12664, CASSANDRA-12656 and CASSANDRA-12140. I've launched one more dtest build to cover the final commit and to hopefully shake off the CASSANDRA-12656 failures since these tests shouldn't even be running now. I've squashed your entire patch into [one commit|https://github.com/stef1927/cassandra/commit/1632f2e9892624f611ac3629fb84a82594fec726] and fixed some formatting issues (mostly trailing spaces) [here|https://github.com/stef1927/cassandra/commit/e3346e5f5a49b2933e10a84405730] on this [branch|https://github.com/stef1927/cassandra/commits/9318]. If you could double check the formatting nits, I can squash them and commit once the final dtest build has also completed. 
> Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502215#comment-15502215 ] Stefania commented on CASSANDRA-9318: - Thanks [~sbtourist], the latest commits and the entire patch LGTM. The test failures are all unrelated and we have tickets for all of them: CASSANDRA-12664, CASSANDRA-12656 and CASSANDRA-12140. I've launched one more dtest build to cover the final commit and to hopefully shake off the CASSANDRA-12656 failures since these tests shouldn't even be running now. I've squashed your entire patch into [one commit|https://github.com/stef1927/cassandra/commit/1632f2e9892624f611ac3629fb84a82594fec726] and fixed some formatting issues (mostly trailing spaces) [here|https://github.com/stef1927/cassandra/commit/e3346e5f5a49b2933e10a84405730] on this [branch|https://github.com/stef1927/cassandra/commits/9318]. If you could double check the formatting nits, I can squash them and commit once the final dtest build has also completed. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12659) Query in reversed order brough back deleted data
[ https://issues.apache.org/jira/browse/CASSANDRA-12659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502203#comment-15502203 ] Wei Deng commented on CASSANDRA-12659: -- Did you preserve the SSTables (snapshot) when you were able to reproduce the problem? If yes, it will be useful to use "nodetool getendpoints" and "nodetool getsstables" to extract a number of SSTables that contain the partition in question and upload them here along with the schema (assuming it doesn't contain sensitive information). If you're no longer able to reproduce the problem, then there is no need to provide the SSTables. Without a repro case, it will be hard for people to look into it. However, filing this JIRA right now is still valuable because if other people run into the same problem on a different occasion, they will have something to compare notes against to determine if this is a real problem. If after a while nobody else runs into this issue and you are still not able to reproduce it, the JIRA could end up getting closed eventually. > Query in reversed order brough back deleted data > > > Key: CASSANDRA-12659 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12659 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.0.5, 6 nodes cluster >Reporter: Tai Khuu Tan > > We have an issue with our Cassandra 3.0.5. After we deleted a large amount > of data across multiple partition keys, querying those partition keys in > reversed order on a clustering key returns the deleted data. I have checked > and there are no tombstones left. All of them are deleted. So I don't know > where or how the deleted data can still exist. Is there any other place > that Cassandra reads data from when querying in reverse order compared to normal > order? 
> The schema is very simple > {noformat} > CREATE TABLE data ( uid varchar, version timestamp, data1 varchar, data2 > varchar, data3 varchar, data4 varchar, data5 varchar, PRIMARY KEY (uid, > version, data1 , data2 , data3 , data4 ) ) with compact storage; > {noformat} > Queries use reverse order on the timestamp column > Ex: > {noformat} > select * from data where uid='uid1' order by version DESC > {noformat}
[jira] [Updated] (CASSANDRA-12649) Add BATCH metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-12649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alwyn Davis updated CASSANDRA-12649: Attachment: trunk-12649.txt Added new metrics and basic test cases. > Add BATCH metrics > - > > Key: CASSANDRA-12649 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12649 > Project: Cassandra > Issue Type: Wish >Reporter: Alwyn Davis >Priority: Minor > Fix For: 3.x > > Attachments: trunk-12649.txt > > > To identify causes of load on a cluster, it would be useful to have some > additional metrics: > * *Mutation size distribution:* I believe this would be relevant when > tracking the performance of unlogged batches. > * *Logged / Unlogged Partitions per batch distribution:* This would also give > a count of batch types processed. Multiple distinct tables in batch would > just be considered as separate partitions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
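As a rough illustration of the "partitions per batch" metric proposed above, the sketch below counts distinct (table, partition key) pairs in a batch, so that the same key in two different tables counts as two partitions, as the description suggests. The `BatchPartitionCounter` class and the representation of a mutation as a two-element `String[]` of table name and partition key are assumptions for the example and do not reflect the attached patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: counts the distinct partitions touched by one batch. */
final class BatchPartitionCounter {
    /**
     * Each mutation is given as {tableName, partitionKey}. The same partition key
     * in two different tables is counted as two separate partitions.
     */
    static int partitionsInBatch(List<String[]> mutations) {
        Set<List<String>> distinct = new HashSet<>();
        for (String[] mutation : mutations)
            distinct.add(Arrays.asList(mutation[0], mutation[1]));
        return distinct.size();
    }

    public static void main(String[] args) {
        List<String[]> batch = Arrays.asList(
            new String[] {"users", "k1"},
            new String[] {"users", "k1"},   // same partition as the first mutation
            new String[] {"users", "k2"},
            new String[] {"events", "k1"}); // same key, different table
        System.out.println(partitionsInBatch(batch));
    }
}
```

A value like this would be fed into a histogram once per batch, kept separately for logged and unlogged batches, to obtain the distribution the ticket asks for.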
[jira] [Updated] (CASSANDRA-12649) Add BATCH metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-12649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alwyn Davis updated CASSANDRA-12649: Status: Patch Available (was: Open) > Add BATCH metrics > - > > Key: CASSANDRA-12649 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12649 > Project: Cassandra > Issue Type: Wish >Reporter: Alwyn Davis >Priority: Minor > Fix For: 3.x > > Attachments: trunk-12649.txt > > > To identify causes of load on a cluster, it would be useful to have some > additional metrics: > * *Mutation size distribution:* I believe this would be relevant when > tracking the performance of unlogged batches. > * *Logged / Unlogged Partitions per batch distribution:* This would also give > a count of batch types processed. Multiple distinct tables in batch would > just be considered as separate partitions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12664) GCCompactionTest is flaky
[ https://issues.apache.org/jira/browse/CASSANDRA-12664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502058#comment-15502058 ] Stefania commented on CASSANDRA-12664: -- cc [~blambov] and [~krummas]. > GCCompactionTest is flaky > - > > Key: CASSANDRA-12664 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12664 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Stefania >Priority: Minor > Fix For: 3.x > > > {{GCCompactionTest}} was introduced by CASSANDRA-7019 and appears to be > flaky, see for example > [here|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-trunk-testall/lastCompletedBuild/testReport/org.apache.cassandra.cql3/GcCompactionTest/testGcCompactionStatic/]. > > I think it's the same root cause as CASSANDRA-12282: the tables in the test > keyspace are dropped asynchronously after each test, and this might cause > additional flush operations for all dirty tables in the keyspace. See the > [callstack|https://issues.apache.org/jira/browse/CASSANDRA-12282?focusedCommentId=15399098=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15399098] > in 12282. > A possible solution is to use KEYSPACE_PER_TEST, which is instead dropped > synchronously. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-12664) GCCompactionTest is flaky
Stefania created CASSANDRA-12664: Summary: GCCompactionTest is flaky Key: CASSANDRA-12664 URL: https://issues.apache.org/jira/browse/CASSANDRA-12664 Project: Cassandra Issue Type: Bug Components: Local Write-Read Paths Reporter: Stefania Priority: Minor Fix For: 3.x {{GCCompactionTest}} was introduced by CASSANDRA-7019 and appears to be flaky, see for example [here|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-trunk-testall/lastCompletedBuild/testReport/org.apache.cassandra.cql3/GcCompactionTest/testGcCompactionStatic/]. I think it's the same root cause as CASSANDRA-12282: the tables in the test keyspace are dropped asynchronously after each test, and this might cause additional flush operations for all dirty tables in the keyspace. See the [callstack|https://issues.apache.org/jira/browse/CASSANDRA-12282?focusedCommentId=15399098=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15399098] in 12282. A possible solution is to use KEYSPACE_PER_TEST, which is instead dropped synchronously. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12659) Query in reversed order brough back deleted data
[ https://issues.apache.org/jira/browse/CASSANDRA-12659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501977#comment-15501977 ] Tai Khuu Tan commented on CASSANDRA-12659: -- The weird thing is that only the reversed-order query returns deleted data; even with the consistency level set to ALL, a normal query won't return deleted data, and there are no tombstones either, so I really don't know where the data comes from. I tried to reproduce it but I couldn't. I will keep trying to see if I can do it. > Query in reversed order brough back deleted data > > > Key: CASSANDRA-12659 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12659 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.0.5, 6 nodes cluster >Reporter: Tai Khuu Tan > > We have an issue with our Cassandra 3.0.5. After we deleted a large amount > of data across multiple partition keys, querying those partition keys in > reversed order on a clustering key returns the deleted data. I have checked > and there are no tombstones left. All of them are deleted. So I don't know > where or how the deleted data can still exist. Is there any other place > that Cassandra reads data from when querying in reverse order compared to normal > order? > The schema is very simple > {noformat} > CREATE TABLE data ( uid varchar, version timestamp, data1 varchar, data2 > varchar, data3 varchar, data4 varchar, data5 varchar, PRIMARY KEY (uid, > version, data1 , data2 , data3 , data4 ) ) with compact storage; > {noformat} > Queries use reverse order on the timestamp column > Ex: > {noformat} > select * from data where uid='uid1' order by version DESC > {noformat}
[jira] [Updated] (CASSANDRA-12471) Allow for some form of "unset" in CQL's COPY command.
[ https://issues.apache.org/jira/browse/CASSANDRA-12471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania updated CASSANDRA-12471: - Assignee: (was: Stefania) > Allow for some form of "unset" in CQL's COPY command. > - > > Key: CASSANDRA-12471 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12471 > Project: Cassandra > Issue Type: New Feature > Components: CQL >Reporter: Nate Sanders >Priority: Minor > Fix For: 2.2.0 > > > Currently, it looks like there's no way to get "unset" values via the COPY > command, say, for example with empty string fields. Instead, these create > tombstones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12590) Segfault reading secondary index
[ https://issues.apache.org/jira/browse/CASSANDRA-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501742#comment-15501742 ] Cameron Zemek commented on CASSANDRA-12590: --- [~beobal] I will see what I can do. [~ifesdjeen] Not sure exactly what you mean by sstable flush threshold. Here are the related settings from cassandra.yaml: {noformat} memtable_allocation_type: offheap_objects concurrent_writes: 2 key_cache_size_in_mb: 0 memtable_flush_writers: 1 concurrent_compactors: 1 concurrent_reads: 2 commitlog_total_space_in_mb: 1024 file_cache_size_in_mb: '1' {noformat} > Segfault reading secondary index > > > Key: CASSANDRA-12590 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12590 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths > Environment: Occurs on Cassandra 3.5 and 3.7 >Reporter: Cameron Zemek >Assignee: Sam Tunnicliffe > > Getting segfaults when reading secondary index as follows: > {code} > J 9272 C2 > org.apache.cassandra.dht.LocalPartitioner$LocalToken.compareTo(Lorg/apache/cassandra/dht/Token;)I > (53 bytes) @ 0x7fd7354749b7 [0x7fd735474840+0x177] > J 5661 C2 org.apache.cassandra.db.DecoratedKey.compareTo(Ljava/lang/Object;)I > (9 bytes) @ 0x7fd7351b35b8 [0x7fd7351b3440+0x178] > J 14205 C2 > java.util.concurrent.ConcurrentSkipListMap.doGet(Ljava/lang/Object;)Ljava/lang/Object; > (142 bytes) @ 0x7fd736404dd8 [0x7fd736404cc0+0x118] > J 17764 C2 > org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(Lorg/apache/cassandra/db/ColumnFamilyStore;)Lorg/apache/cassandra/db/rows/UnfilteredRowIterator; > (635 bytes) @ 0x7fd736e09638 [0x7fd736e08720+0xf18] > J 17808 C2 > org.apache.cassandra.index.internal.CassandraIndexSearcher.search(Lorg/apache/cassandra/db/ReadExecutionController;)Lorg/apache/cassandra/db/partitions/UnfilteredPartitionIterator; > (68 bytes) @ 0x7fd736e01a48 [0x7fd736e012a0+0x7a8] > J 14217 C2 > 
org.apache.cassandra.db.ReadCommand.executeLocally(Lorg/apache/cassandra/db/ReadExecutionController;)Lorg/apache/cassandra/db/partitions/UnfilteredPartitionIterator; > (219 bytes) @ 0x7fd736417c1c [0x7fd736416fa0+0xc7c] > J 14585 C2 > org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow()V > (337 bytes) @ 0x7fd736541e6c [0x7fd736541d60+0x10c] > J 14584 C2 org.apache.cassandra.service.StorageProxy$DroppableRunnable.run()V > (48 bytes) @ 0x7fd7357957b4 [0x7fd735795760+0x54] > J 9648% C2 org.apache.cassandra.concurrent.SEPWorker.run()V (253 bytes) @ > 0x7fd735938d8c [0x7fd7359356e0+0x36ac] > {code} > Which I have translated to the codepath: > org.apache.cassandra.dht.LocalPartitioner (Line 139) > org.apache.cassandra.db.DecoratedKey (Line 85) > java.util.concurrent.ConcurrentSkipListMap (Line 794) > org.apache.cassandra.db.SinglePartitionReadCommand (Line 498) > org.apache.cassandra.index.internal.CassandraIndexSearcher (Line 60) > org.apache.cassandra.db.ReadCommand (Line 367) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12663) Allow per DC segregation, grant user to create different indices per datacenter on tables
[ https://issues.apache.org/jira/browse/CASSANDRA-12663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhuvan Rawal updated CASSANDRA-12663: - Summary: Allow per DC segregation, grant user to create different indices per datacenter on tables (was: Allowing per DC segregation of schema, allowing user to create different indices per datacenter on tables) > Allow per DC segregation, grant user to create different indices per > datacenter on tables > - > > Key: CASSANDRA-12663 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12663 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Bhuvan Rawal > > For analytics & auditing purposes it becomes essential to serve access > patterns other than those modeled around partition key fetches. Although users > need only a limited number of such reads, an index enabled cluster-wide > requires an index write for every row written to that table, on every single > node in every DC, even one that may be serving read operations. A user > may not want to have indices built in the Transactional DC on every write; that > computation and disk utilization may not be useful there, as the Analytics may > be performed in another DC. > It would be a plus to have analytics / auditing workloads built inside > Cassandra itself using native secondary indices / SASI indices / Stratio, by > creating indices for a specific datacenter rather than having to ship data off to > other index stores like Elasticsearch through the application.
[jira] [Created] (CASSANDRA-12663) Allowing per DC segregation of schema, allowing user to create different indices per datacenter on tables
Bhuvan Rawal created CASSANDRA-12663: Summary: Allowing per DC segregation of schema, allowing user to create different indices per datacenter on tables Key: CASSANDRA-12663 URL: https://issues.apache.org/jira/browse/CASSANDRA-12663 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Bhuvan Rawal For analytics & auditing purposes it becomes essential to serve access patterns other than those modeled around partition key fetches. Although users need only a limited number of such reads, an index enabled cluster-wide requires an index write for every row written to that table, on every single node in every DC, even one that may be serving read operations. A user may not want to have indices built in the Transactional DC on every write; that computation and disk utilization may not be useful there, as the Analytics may be performed in another DC. It would be a plus to have analytics / auditing workloads built inside Cassandra itself using native secondary indices / SASI indices / Stratio, by creating indices for a specific datacenter rather than having to ship data off to other index stores like Elasticsearch through the application.
[jira] [Comment Edited] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.
[ https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501472#comment-15501472 ] DOAN DuyHai edited comment on CASSANDRA-12573 at 9/18/16 6:59 PM: -- Ok, it's my bad. The root of the operation tree for the QueryPlanner is an {{AND}} https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java#L54-L60 The {{'%RevisionDiff%ItemImpl%'}} is split into 2 distinct predicates: {{CONTAINS RevisionDiff}} & {{CONTAINS ItemImpl}} and the *AND* logic does apply. The comment in the source code is pretty misleading. Back to the original experiments, exp. 1 is consistent, exp. 2 and 4 results are also consistent. Only experiment 3 results are wrong: {code:sql} insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ; insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ; insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ; insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ; insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ; select c2 from kmv.kmv where c2 like '%w%a%'; {code} Expected result: qweasd, qwea1. Actual result: qwe, qweasd, qwea1, 1qwe, asdqwe. Let me reproduce it. was (Author: doanduyhai): Ok, it's my bad. The root of the operation tree for the QueryPlanner is an {{AND}} https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java#L54-L60 The {{'%RevisionDiff%ItemImpl%'}} is split into 2 distinct predicates: {{CONTAINS RevisionDiff}} & {{CONTAINS ItemImpl}} and the **AND** logic does apply. The comment in the source code is pretty misleading. Back to the original experiments, exp. 1 is consistent, exp. 
2 and 4 results are also consistent Only experiment 3 results are wrong: ```sql insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ; insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ; insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ; insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ; insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ; select c2 from kmv.kmv where c2 like '%w%a%'; ``` Expected result: qweasd, qwea1. Actual result: qwe, qweasd, qwea1, 1qwe, asdqwe. Let me reproduce it > SASI index. Incorrect results for '%foo%bar%'-like search pattern. > --- > > Key: CASSANDRA-12573 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12573 > Project: Cassandra > Issue Type: Bug >Reporter: Mikhail Krupitskiy >Priority: Critical > Labels: sasi > > We use Cassandra 3.7 and have faced a strange behaviour of SELECT requests > with "LIKE '%foo%bar%'" constraints on a column with SASI index. > Below are few experiments that show this behaviour. > Experiment 1: > {noformat} > drop keyspace if exists kmv; > create keyspace if not exists kmv WITH REPLICATION = { 'class' : > 'SimpleStrategy', 'replication_factor':'1'} ; > use kmv; > CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text); > CREATE CUSTOM INDEX ON kmv.kmv ( c2 ) USING > 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { > 'mode': 'CONTAINS' > }; > insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ; > insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ; > insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ; > insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ; > insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ; > select c2 from kmv.kmv where c2 like '%w%a%'; > {noformat} > Expected result: qweasd, qwea1. > Actual result: no rows. 
> Experiment 2 (NOTE: definition of index is changed): > {noformat} > drop keyspace if exists kmv; > create keyspace if not exists kmv WITH REPLICATION = { 'class' : > 'SimpleStrategy', 'replication_factor':'1'} ; > use kmv; > CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text); > CREATE CUSTOM INDEX ON kmv.kmv ( c2 ) USING > 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { > 'mode': 'CONTAINS', > 'analyzer_class': > 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', > 'analyzed': 'true' > }; > insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ; > insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ; > insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ; > insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ; > insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ; > select c2 from kmv.kmv where c2 like '%w%a%'; > {noformat} > Expected result: qweasd, qwea1. > Actual result: asdqwe, qweasd, qwea1. > Experiment 3 (NOTE: primary key is compound now and inserted data was > changed): > {noformat} > drop keyspace if exists kmv; > create keyspace if not exists kmv WITH REPLICATION = { 'class' : > 'SimpleStrategy',
[jira] [Commented] (CASSANDRA-12662) OOM when using SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-12662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501465#comment-15501465 ] Maxim Podkolzine commented on CASSANDRA-12662: -- Got it, thanks a lot! We don't use SSD right now, I need to check what the actual storage is. I'll try to get as much CPU and RAM as possible and get back with the results. > OOM when using SASI index > - > > Key: CASSANDRA-12662 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12662 > Project: Cassandra > Issue Type: Bug > Environment: Linux, 4 CPU cores, 16Gb RAM, Cassandra process utilizes > ~8Gb, of which ~4Gb is Java heap >Reporter: Maxim Podkolzine >Priority: Critical > Fix For: 3.6 > > Attachments: memory-dump.png > > > 2.8Gb of the heap is taken by the index data, pending for flush (see the > screenshot). As a result the node fails with OOM. > Questions: > - Why can't Cassandra keep up with the inserted data and flush it? > - What resources/configuration should be changed to improve the performance? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.
[ https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501454#comment-15501454 ]

DOAN DuyHai commented on CASSANDRA-12573:
-

Let me reproduce your results with a unit test.

> SASI index. Incorrect results for '%foo%bar%'-like search pattern.
> ---
>
> Key: CASSANDRA-12573
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12573
> Project: Cassandra
> Issue Type: Bug
> Reporter: Mikhail Krupitskiy
> Priority: Critical
> Labels: sasi
>
> We use Cassandra 3.7 and have faced a strange behaviour of SELECT requests
> with "LIKE '%foo%bar%'" constraints on a column with SASI index.
> Below are few experiments that show this behaviour.
> Experiment 1:
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' :
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv ( c2 ) USING
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
> 'mode': 'CONTAINS'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: no rows.
> Experiment 2 (NOTE: definition of index is changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' :
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv ( c2 ) USING
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
> 'mode': 'CONTAINS',
> 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
> 'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: asdqwe, qweasd, qwea1.
> Experiment 3 (NOTE: primary key is compound now and inserted data was
> changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' :
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id,
> c1));
> CREATE CUSTOM INDEX ON kmv.kmv ( c2 ) USING
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
> 'mode': 'CONTAINS',
> 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
> 'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: qwe, qweasd, qwea1, 1qwe, asdqwe.
> Experiment 4 (NOTE: search criteria is changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' :
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id,
> c1));
> CREATE CUSTOM INDEX ON kmv.kmv ( c2 ) USING
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
> 'mode': 'CONTAINS',
> 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
> 'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w22%a%';
> {noformat}
> Expected result: no rows.
> Actual result: qweasd, qwea1, asdqwe.
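
The gap between the result the reporter expects and an AND of two CONTAINS predicates can be sketched outside Cassandra. The snippet below is a plain-Python illustration, not SASI code: `sql_like` and `contains_all` are hypothetical helpers introduced here, only the '%' wildcard is modelled, and the analyzer's tokenization is ignored entirely.

```python
import re

# The five c2 values inserted in the experiments above.
VALUES = ["qwe", "qweasd", "qwea1", "1qwe", "asdqwe"]


def sql_like(value, pattern):
    """Conventional LIKE semantics: '%' matches any run of characters,
    and the literal pieces must appear in the order given."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def contains_all(value, pattern):
    """AND of CONTAINS predicates: every non-empty piece between '%'
    must occur somewhere in the value, in any order."""
    return all(part in value for part in pattern.split("%") if part)


ordered = [v for v in VALUES if sql_like(v, "%w%a%")]
unordered = [v for v in VALUES if contains_all(v, "%w%a%")]
print(ordered)
print(unordered)
```

Under these assumptions the first list is the reporter's expected result (qweasd, qwea1), while the second also includes asdqwe, which is the same set of rows reported as the actual result of experiment 2. The sketch says nothing about experiments 3 and 4.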
[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.
[ https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501443#comment-15501443 ]

Maxim Podkolzine commented on CASSANDRA-12573:
--

There are 7 rows that contain "RevisionDiff" and 2 rows that contain "ItemImpl". There are 9 rows that contain "RevisionDiff" OR "ItemImpl". Here they are (only the name):
- RevisionDiffType.java: it contains "RevisionDiff", hence it contains "RevisionDiff" OR "ItemImpl"
- RevisionDiffItem.java: it contains "RevisionDiff", hence it contains "RevisionDiff" OR "ItemImpl"
- RevisionDiffItemDTO.java: it contains "RevisionDiff", hence it contains "RevisionDiff" OR "ItemImpl"
- GetRevisionDiff.java: it contains "RevisionDiff", hence it contains "RevisionDiff" OR "ItemImpl"
- RevisionDiffItemDTO.java (twice): it contains "RevisionDiff", hence it contains "RevisionDiff" OR "ItemImpl"
- RevisionDiffItemImpl.java: it contains "RevisionDiff", hence it contains "RevisionDiff" OR "ItemImpl"
- FastTreeItemImpl.java: it contains "ItemImpl", hence it contains "RevisionDiff" OR "ItemImpl"
- RevisionDiffItemImpl.java: it contains "ItemImpl", hence it contains "RevisionDiff" OR "ItemImpl"

Of these 9 rows there is one row that contains both "RevisionDiff" AND "ItemImpl": "RevisionDiffItemImpl.java".
[jira] [Commented] (CASSANDRA-12662) OOM when using SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-12662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501440#comment-15501440 ]

DOAN DuyHai commented on CASSANDRA-12662:
-

Default hardcoded value for memIndexTable is 1Gb: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/PerSSTableIndexWriter.java#L369-L372

bq. 2.8Gb of the heap is taken by the index data, pending for flush (see the screenshot)

When you have more than 1Gb of index data, SASI flushes the index in chunks of 1Gb into temporary index files: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/PerSSTableIndexWriter.java#L247-L250

Then it needs a 2nd pass to merge them into memory to write the final index file, see here: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/PerSSTableIndexWriter.java#L311-L326

Thus, it may take a while to write the final SASI index file. The speed of all this depends on many factors, mainly CPU and disk I/O. What are your disk hardware specs? SSD? Spinning disk? Shared storage?

bq. Why can't Cassandra keep up with the inserted data and flush it?

Writes are CPU-intensive. Compactions are more disk I/O intensive.

bq. What resources/configuration should be changed to improve the performance?

Right now, a 4-core CPU is below the official recommendation for running Cassandra in production, which is 8 cores. Same for RAM: the recommendation is 32Gb, see here: http://cassandra.apache.org/doc/latest/operating/hardware.html
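
The flush-then-merge behaviour described in this comment follows the general shape of an external merge. The following is a minimal Python sketch of that shape only, not the SASI implementation: `flush_in_chunks` and `merge_segments` are hypothetical names, an item count stands in for the 1Gb threshold, and in-memory lists stand in for the temporary index files.

```python
import heapq


def flush_in_chunks(terms, max_chunk):
    """First pass: buffer incoming terms and emit a sorted segment each
    time the buffer reaches max_chunk items (a stand-in for the size limit)."""
    segments, buffer = [], []
    for term in terms:
        buffer.append(term)
        if len(buffer) >= max_chunk:
            segments.append(sorted(buffer))  # stands in for a temporary index file
            buffer = []
    if buffer:
        segments.append(sorted(buffer))  # flush whatever is left at the end
    return segments


def merge_segments(segments):
    """Second pass: k-way merge of the sorted segments into one sorted index."""
    return list(heapq.merge(*segments))


segments = flush_in_chunks(["d", "a", "c", "b", "e"], max_chunk=2)
final_index = merge_segments(segments)
print(len(segments), final_index)
```

The point of the sketch is that the second pass has to read every segment again, so the total work grows with the amount of index data produced, which is consistent with the remark that writing the final index file may take a while.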
[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.
[ https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501413#comment-15501413 ]

DOAN DuyHai commented on CASSANDRA-12573:
-

bq. That's good news. When do you plan to merge it?

See this JIRA: [CASSANDRA-10765] (second comment)

bq. As a customer I have a slightly different view on this. My expectations are based on prior experience and common sense.

What are you talking about? Customer of what? Apache Cassandra is open-source.

bq. My current impression is that this feature is half-baked and not well tested. But it's just my opinion.

Well, those are the risks of open source software: you don't have any strong guarantees, SLA or whatsoever. But you can contribute to improve SASI. Any pull request is welcome, of course. The community will be more than happy to have contributors.

bq. After that I run the queries with '%' inside. As you can see multi-patterns are handled by AND:

Absolutely not. Your examples just show how the index mode {{CONTAINS}} works.

First query {{name like '%RevisionDiff%';}} means give me all names containing the {{RevisionDiff}} substring.
2nd query {{name like '%ItemImpl%';}} means give me all names containing the {{ItemImpl}} substring.
3rd query {{name like '%RevisionDiff%ItemImpl%';}} means give me all names containing the {{RevisionDiff}} substring OR the 'ItemImpl' substring.

Nowhere do I see *AND* semantics in your example.
[jira] [Comment Edited] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.
[ https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501293#comment-15501293 ]

Maxim Podkolzine edited comment on CASSANDRA-12573 at 9/18/16 5:03 PM:
---

> SASI initially support multiple predicates, something like : WHERE ((col1=xxx
> OR col2=yyy) AND (col3 LIKE '%zzz')) but it is not merged yet into the 3.x
> trunk

That's good news. When do you plan to merge it?

> Wrong, a bug is something that does not work as expected e.g that does not
> work as documented.

As a customer I have a slightly different view on this. My expectations are based on prior experience and common sense. I understand when certain features that are usual in other products are not implemented by design. This is obviously not the case. My current impression is that this feature is half-baked and not well tested. But it's just my opinion.

I think I have a stronger argument that this is a bug. I have created a DB and filled it with some data from my disk:

{code}
CREATE KEYSPACE Excelsior WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
use excelsior;
create table demo (id text primary key, name text, content text);
CREATE CUSTOM INDEX name_index ON demo (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
  'mode': 'CONTAINS',
  'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
  'analyzed': 'true'
};
{code}

After that I ran the queries with '%' inside. As you can see, multi-patterns are handled by AND:

{code}
cqlsh:excelsior> select id, name from demo where name like '%RevisionDiff%';

 id                                   | name
--------------------------------------+---------------------------
 93dce11a-cfdd-4c16-b3b3-7537c7af03ec | RevisionDiffType.java
 6586058f-bd57-4fc7-ae12-e6d8ddcd2ceb | RevisionDiffItem.java
 d16dff53-002b-4fe6-9a10-bb32425360e0 | RevisionDiffItemDTO.java
 bb20981e-714f-4eac-802f-6191dba5a301 | GetRevisionDiff.java
 1c53574b-2eea-46f8-bcbc-5e295ef9c70a | RevisionDiffItemDTO.java
 7366f852-d63c-4d07-86b3-18a3bf47e79b | RevisionDiffItemDTO.java
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(7 rows)

cqlsh:excelsior> select id, name from demo where name like '%ItemImpl%';

 id                                   | name
--------------------------------------+---------------------------
 603c1d12-4871-4244-896a-54ddb76dbd3b | FastTreeItemImpl.java
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(2 rows)

cqlsh:excelsior> select id, name from demo where name like '%RevisionDiff%ItemImpl%';

 id                                   | name
--------------------------------------+---------------------------
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(1 rows)
{code}
[jira] [Created] (CASSANDRA-12662) OOM when using SASI index
Maxim Podkolzine created CASSANDRA-12662: Summary: OOM when using SASI index Key: CASSANDRA-12662 URL: https://issues.apache.org/jira/browse/CASSANDRA-12662 Project: Cassandra Issue Type: Bug Environment: Linux, 4 CPU cores, 16Gb RAM, Cassandra process utilizes ~8Gb, of which ~4Gb is Java heap Reporter: Maxim Podkolzine Priority: Critical Fix For: 3.6 Attachments: memory-dump.png 2.8Gb of the heap is taken by the index data, pending for flush (see the screenshot). As a result the node fails with OOM. Questions: - Why can't Cassandra keep up with the inserted data and flush it? - What resources/configuration should be changed to improve the performance?
[jira] [Commented] (CASSANDRA-12655) Incremental repair & compaction hang on random nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-12655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15500435#comment-15500435 ]

Navjyot Nishant commented on CASSANDRA-12655:
-
Hello Wei,

Thanks for responding. Actually, the issue is with compaction getting blocked; anticompaction goes through without any problem. Let me explain in detail:

1. We run incremental repair on one node at a time.
2. When the repair starts, it shows completion progress. For a large keyspace, after showing 100%, it takes a couple of minutes to move on to the next keyspace. When we checked, it was actually waiting for anticompaction to complete on all the relevant replicas. The moment anticompaction completes on all replicas, it moves on to the next keyspace.
3. Then compaction starts, followed by anticompaction, which sometimes hangs on random replicas. That replica becomes unresponsive, which affects the repair running on the next keyspace/node, so the repair also becomes unresponsive.

I can avoid this blocking behavior if I disable autocompaction before starting the repair. But after the repair, when I enable anticompaction, it gets blocked on a random node, and the only way to resolve it is to bounce the node, which doesn't seem practical.

For now I have been able to resolve this issue by not using -dcpar. I had been using -dcpar to speed up the repair, but since I removed it, it no longer complains and compaction also goes through. This buys us some time to plan the upgrade directly to 3.x early next year. -dcpar works fine in our other non-prod environments, but it seems to have a problem with one of the largest keyspaces, which has a table of size 3-4GB?

If you can relate the above issues and resolution, that would be great.

Thanks!
> Incremental repair & compaction hang on random nodes
> 
>
> Key: CASSANDRA-12655
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12655
> Project: Cassandra
> Issue Type: Bug
> Components: Compaction
> Environment: CentOS Linux release 7.1.1503 (Core)
> RAM - 64GB
> HEAP - 16GB
> Load on each node - ~5GB
> Cassandra Version - 2.2.5
> Reporter: Navjyot Nishant
> Priority: Blocker
>
> Hi, we are setting up incremental repair on our 18-node cluster. Avg load on each node is ~5GB. The repair runs fine on a couple of nodes and then suddenly gets stuck on random nodes. Upon checking the system.log of the impacted node we don't see much information.
> The following lines are what we see in system.log from the point where the repair stops making progress:
> {code}
> INFO [CompactionExecutor:3490] 2016-09-16 11:14:44,236 CompactionManager.java:1221 - Anticompacting [BigTableReader(path='/cassandra/data/gccatlgsvcks/message_backup-cab0485008ed11e5bfed452cdd54652d/la-30832-big-Data.db'), BigTableReader(path='/cassandra/data/gccatlgsvcks/message_backup-cab0485008ed11e5bfed452cdd54652d/la-30811-big-Data.db')]
> INFO [IndexSummaryManager:1] 2016-09-16 11:14:49,954 IndexSummaryRedistribution.java:74 - Redistributing index summaries
> INFO [IndexSummaryManager:1] 2016-09-16 12:14:49,961 IndexSummaryRedistribution.java:74 - Redistributing index summaries
> {code}
> When we try to see pending compactions by executing {code}nodetool compactionstats{code} it hangs as well and doesn't return anything. However, {code}nodetool tpstats{code} shows active and pending compactions, which never come down and keep increasing.
> {code}
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> MutationStage                     0         0         221208         0                 0
> ReadStage                         0         0        1288839         0                 0
> RequestResponseStage              0         0         104356         0                 0
> ReadRepairStage                   0         0             72         0                 0
> CounterMutationStage              0         0              0         0                 0
> HintedHandoff                     0         0             46         0                 0
> MiscStage                         0         0              0         0                 0
> CompactionExecutor                8        66          68124         0                 0
> MemtableReclaimMemory             0         0            166         0                 0
> PendingRangeCalculator            0         0             38         0                 0
> GossipStage                       0         0         242455         0                 0
> MigrationStage                    0         0              0         0                 0
> MemtablePostFlush                 0         0           3682         0
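The check described in the report (watching `nodetool tpstats` for pools whose pending count does not drain) can be scripted. The following is a minimal Python sketch, assuming whitespace-separated tpstats-style rows; the helper names `parse_tpstats` and `backlogged` are hypothetical. The sample rows reuse figures from the report, reading the CompactionExecutor row as 8 active and 66 pending, which is an assumption because the archived copy of the table lost its column spacing.

```python
# Sample rows taken from the report; the CompactionExecutor split
# (8 active, 66 pending) is an assumed reading of the archived text.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0         221208         0                 0
CompactionExecutor                8        66          68124         0                 0
GossipStage                       0         0         242455         0                 0
"""


def parse_tpstats(text):
    """Return {pool: (active, pending, completed)} from tpstats-style text."""
    pools = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row is a pool name followed by at least three integers;
        # the header row fails the digit check and is skipped.
        if len(fields) >= 4 and all(f.isdigit() for f in fields[1:4]):
            pools[fields[0]] = tuple(int(f) for f in fields[1:4])
    return pools


def backlogged(pools):
    """Names of pools that currently have pending tasks, sorted."""
    return sorted(name for name, (_, pending, _) in pools.items() if pending > 0)


if __name__ == "__main__":
    print(backlogged(parse_tpstats(SAMPLE)))
```

Running this against two snapshots taken a few minutes apart and comparing the pending counts would show whether the backlog is growing, which is the symptom the reporter describes.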