[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

2016-09-18 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502388#comment-15502388
 ] 

Stefania commented on CASSANDRA-9318:
-

Latest dtest 
[build|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-trunk-dtest/7/]
 completed without failures.

> Bound the number of in-flight requests at the coordinator
> -
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths, Streaming and Messaging
>Reporter: Ariel Weisberg
>Assignee: Sergio Bossa
> Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, 
> limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding 
> bytes and requests and if it reaches a high watermark disable read on client 
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't 
> introduce other issues.
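
As a rough illustration of the high/low watermark idea in the description above, here is a minimal sketch. It is not the patch attached to this ticket: the class name, method names and thresholds are all hypothetical, it tracks bytes only (not request counts), and the pause flag is updated without locking, so concurrent callers may briefly observe a stale value.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of high/low watermark tracking for in-flight request bytes.
 * All names and thresholds are illustrative; this is not Cassandra's implementation.
 */
final class InflightLimiter
{
    private final long lowWatermark;
    private final long highWatermark;
    private final AtomicLong inflightBytes = new AtomicLong();
    private volatile boolean readsPaused = false;

    InflightLimiter(long lowWatermark, long highWatermark)
    {
        if (lowWatermark > highWatermark)
            throw new IllegalArgumentException("low watermark must not exceed high watermark");
        this.lowWatermark = lowWatermark;
        this.highWatermark = highWatermark;
    }

    /** Called when a request is accepted; returns true if client reads should be paused. */
    boolean onRequestStart(long bytes)
    {
        if (inflightBytes.addAndGet(bytes) >= highWatermark)
            readsPaused = true;
        return readsPaused;
    }

    /** Called when a request completes; returns true if client reads are still paused. */
    boolean onRequestComplete(long bytes)
    {
        if (inflightBytes.addAndGet(-bytes) <= lowWatermark)
            readsPaused = false;
        return readsPaused;
    }

    long inflightBytes()
    {
        return inflightBytes.get();
    }
}
```

The gap between the two watermarks provides hysteresis: once reads are paused they stay paused until the outstanding bytes drain to the low watermark, which avoids toggling on every request near the limit.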



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11218) Prioritize Secondary Index rebuild

2016-09-18 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502288#comment-15502288
 ] 

Jeff Jirsa edited comment on CASSANDRA-11218 at 9/19/16 5:04 AM:
-

I have a version of this patch I'll be submitting very soon, but while I wait 
for internal approvals, I'd like to describe the implementation so that those 
of you who care about this can provide feedback conceptually before I submit a 
patch for review.

I'm implementing this as a priority queue that uses a custom comparator 
implemented with three tiers:

* Operation type priority (to allow certain types - like index rebuild - to run 
at higher priorities, and others - scrub / cleanup / verify - to run at much 
lower priorities). This is defined as an int field in the enum in the 
OperationType, and can be overridden via system property. Lot of opportunity 
for bike shedding here in picking exact priorities - I've chosen (highest 
priority to lowest):

** Anticompaction / Index Summary Redistribution
** Index Build / View Build
** Key Cache Save / Row Cache Save / Counter Cache Save
** User Defined Compaction
** Compaction (including maximal/major compaction)
** Tombstone Compaction
** Scrub / Cleanup / Upgrade SSTables
** Verify

* Sub type priority (to allow compaction tasks within a type to have preference 
- to enable behavior like CASSANDRA-6288 ). This is defined as a long, and set 
by the compaction strategies, and by default, I'm setting this as the bytes on 
disk of the source sstables - larger transactions (at the time the task was 
created) preferred over smaller transactions. 

* Timestamp priority, where tasks with the same type/subtype values are served 
FIFO.

The implementation here was pretty straightforward - we create a new interface 
to expose the three priority values, and then extend AbstractCompactionTask and 
de-anonymize the handful of anonymous runnables/wrapped runnables/callables to 
implement that interface so they can be sorted in the PriorityBlockingQueue. 
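
As a rough illustration of the three-tier ordering described above (not the actual patch), a comparator along these lines could drive a PriorityBlockingQueue. The interface name, the demo task class and the numeric priority values are made up for the example, and it assumes that a larger type or subtype value means higher priority.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

/** Hypothetical interface exposing the three priority tiers (names are illustrative). */
interface Prioritized
{
    int typePriority();     // assumed: larger value runs first
    long subTypePriority(); // assumed: larger value runs first (e.g. bytes on disk)
    long createdAt();       // smaller value (older task) runs first
}

/** Minimal stand-in for a compaction task, for demonstration only. */
final class DemoTask implements Prioritized
{
    final String name;
    private final int type;
    private final long subType;
    private final long created;

    DemoTask(String name, int type, long subType, long created)
    {
        this.name = name;
        this.type = type;
        this.subType = subType;
        this.created = created;
    }

    public int typePriority() { return type; }
    public long subTypePriority() { return subType; }
    public long createdAt() { return created; }
}

final class TieredPriority
{
    /** Orders by type (descending), then subtype (descending), then creation time (ascending). */
    static final Comparator<Prioritized> ORDER = (a, b) -> {
        int c = Integer.compare(b.typePriority(), a.typePriority());
        if (c != 0)
            return c;
        c = Long.compare(b.subTypePriority(), a.subTypePriority());
        if (c != 0)
            return c;
        return Long.compare(a.createdAt(), b.createdAt());
    };

    static PriorityBlockingQueue<DemoTask> newQueue()
    {
        return new PriorityBlockingQueue<>(11, ORDER);
    }
}
```

With this ordering, an index build queued after several compactions is still polled first, larger compactions are polled before smaller ones, and tasks that tie on both tiers come out in creation order.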

There may be an opportunity to try to get clever to protect against starvation in 
under-resourced systems, such as increasing type priority over time as tasks 
age, but I'm leaving that as a potential optimization for the future - I'm not 
sure it's really needed, it makes reasoning about compaction harder, but maybe 
there exists a use case where it's necessary. 

Expecting to submit the patch early this week - if either of you (Sankalp / 
Marcus) finds this approach conflicts with your expectations, or if you want to 
volunteer to review, let me know.



[jira] [Commented] (CASSANDRA-11218) Prioritize Secondary Index rebuild

2016-09-18 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502288#comment-15502288
 ] 

Jeff Jirsa commented on CASSANDRA-11218:


I have a version of this patch I'll be submitting very soon, but while I wait 
for internal approvals, I'd like to describe the implementation so that those 
of you who care about this can provide feedback conceptually before I submit a 
patch for review.

I'm implementing this as a priority queue that uses a custom comparator 
implemented with three tiers:

* Operation type priority (to allow certain types - like index rebuild - to run 
at higher priorities, and others - scrub / cleanup / verify - to run at much 
lower priorities). This is defined as an int field in the enum in the 
OperationType, and can be overridden via system property. Lot of opportunity 
for bike shedding here in picking exact priorities - I've chosen (highest 
priority to lowest):

** Anticompaction
** Index Build / View Build
** Key Cache Save / Row Cache Save / Counter Cache Save
** User Defined Compaction
** Compaction (including maximal/major compaction)
** Tombstone Compaction
** Scrub / Cleanup / Upgrade SSTables
** Index Summary Redistribution
** Verify

* Sub type priority (to allow compaction tasks within a type to have preference 
- to enable behavior like CASSANDRA-6288 ). This is defined as a long, and set 
by the compaction strategies, and by default, I'm setting this as the bytes on 
disk of the source sstables - larger transactions (at the time the task was 
created) preferred over smaller transactions. 

* Timestamp priority, where tasks with the same type/subtype values are served 
FIFO.

The implementation here was pretty straightforward - we create a new interface 
to expose the three priority values, and then extend AbstractCompactionTask and 
de-anonymize the handful of anonymous runnables/wrapped runnables/callables to 
implement that interface so they can be sorted in the PriorityBlockingQueue. 

There may be an opportunity to try to get clever to protect against starvation in 
under-resourced systems, such as increasing type priority over time as tasks 
age, but I'm leaving that as a potential optimization for the future - I'm not 
sure it's really needed, it makes reasoning about compaction harder, but maybe 
there exists a use case where it's necessary. 

Expecting to submit the patch early this week - if either of you (Sankalp / 
Marcus) finds this approach conflicts with your expectations, or if you want to 
volunteer to review, let me know.

> Prioritize Secondary Index rebuild
> --
>
> Key: CASSANDRA-11218
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11218
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Jeff Jirsa
>Priority: Minor
>
> We have seen that secondary index rebuild get stuck behind other compaction 
> during a bootstrap and other operations. This causes things to not finish. We 
> should prioritize index rebuild via a separate thread pool or using a 
> priority queue.





[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

2016-09-18 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502215#comment-15502215
 ] 

Stefania edited comment on CASSANDRA-9318 at 9/19/16 4:09 AM:
--

Thanks [~sbtourist], the latest commits and the entire patch LGTM.

The test failures are unrelated and we have tickets for all of them: 
CASSANDRA-12664, CASSANDRA-12656 and CASSANDRA-12140. I've launched one more 
dtest build to cover the final commit and to hopefully shake off the 
CASSANDRA-12656 failures since these tests shouldn't even be running now.

I've squashed your entire patch into [one 
commit|https://github.com/stef1927/cassandra/commit/1632f2e9892624f611ac3629fb84a82594fec726]
 and fixed some formatting issues (mostly trailing spaces) 
[here|https://github.com/stef1927/cassandra/commit/e3346e5f5a49b2933e10a84405730]
 on this [branch|https://github.com/stef1927/cassandra/commits/9318].

If you could double check the formatting nits, I can squash them and commit 
once the final dtest build has also completed. 


> Bound the number of in-flight requests at the coordinator
> -
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths, Streaming and Messaging
>Reporter: Ariel Weisberg
>Assignee: Sergio Bossa
> Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, 
> limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding 
> bytes and requests and if it reaches a high watermark disable read on client 
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't 
> introduce other issues.





[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

2016-09-18 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502215#comment-15502215
 ] 

Stefania commented on CASSANDRA-9318:
-

Thanks [~sbtourist], the latest commits and the entire patch LGTM.

The test failures are all unrelated and we have tickets for all of them: 
CASSANDRA-12664, CASSANDRA-12656 and CASSANDRA-12140. I've launched one more 
dtest build to cover the final commit and to hopefully shake off the 
CASSANDRA-12656 failures since these tests shouldn't even be running now.

I've squashed your entire patch into [one 
commit|https://github.com/stef1927/cassandra/commit/1632f2e9892624f611ac3629fb84a82594fec726]
 and fixed some formatting issues (mostly trailing spaces) 
[here|https://github.com/stef1927/cassandra/commit/e3346e5f5a49b2933e10a84405730]
 on this [branch|https://github.com/stef1927/cassandra/commits/9318].

If you could double check the formatting nits, I can squash them and commit 
once the final dtest build has also completed. 

> Bound the number of in-flight requests at the coordinator
> -
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths, Streaming and Messaging
>Reporter: Ariel Weisberg
>Assignee: Sergio Bossa
> Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, 
> limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding 
> bytes and requests and if it reaches a high watermark disable read on client 
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't 
> introduce other issues.





[jira] [Commented] (CASSANDRA-12659) Query in reversed order brough back deleted data

2016-09-18 Thread Wei Deng (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502203#comment-15502203
 ] 

Wei Deng commented on CASSANDRA-12659:
--

Did you preserve the SSTables (snapshot) when you were able to reproduce the 
problem? If yes, it would be useful to run "nodetool getendpoints" and "nodetool 
getsstables" to identify the SSTables that contain the partition in 
question and upload them here along with the schema (assuming they don't 
contain sensitive information). If you're no longer able to reproduce the 
problem, then there is no need to provide the SSTables.

Without a repro case, it will be hard for people to look into it. However, 
filing this JIRA now is still valuable: if other people run into the same 
problem on a different occasion, they will have something to compare notes 
against to determine whether this is a real problem. If after a while nobody 
else runs into this issue and you are still not able to reproduce it, the JIRA 
could end up getting closed eventually.

> Query in reversed order brough back deleted data
> 
>
> Key: CASSANDRA-12659
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12659
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Cassandra 3.0.5, 6 nodes cluster
>Reporter: Tai Khuu Tan
>
> We have an issue with our Cassandra 3.0.5 cluster. After we deleted a large 
> amount of data across multiple partition keys, querying those partition keys 
> in reversed order on a clustering key returns the deleted data. I have checked 
> and there are no tombstones left; all of them are gone. So I don't know 
> where or how the deleted data can still exist. Is there any other place 
> that Cassandra reads data from when querying in reverse order compared to 
> normal order?
> The schema is very simple:
> {noformat}
> CREATE TABLE table ( uid varchar, version timestamp, data1 varchar, data2 
> varchar, data3 varchar, data4 varchar, data5 varchar, PRIMARY KEY (uid, 
> version, data1 , data2 , data3 , data4 ) ) with compact storage;
> {noformat}
> Queries use reverse order on the timestamp column (version).
> For example:
> {noformat}
> select * from data where uid='uid1' order by version DESC
> {noformat}





[jira] [Updated] (CASSANDRA-12649) Add BATCH metrics

2016-09-18 Thread Alwyn Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alwyn Davis updated CASSANDRA-12649:

Attachment: trunk-12649.txt

Added new metrics and basic test cases.

> Add BATCH metrics
> -
>
> Key: CASSANDRA-12649
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12649
> Project: Cassandra
>  Issue Type: Wish
>Reporter: Alwyn Davis
>Priority: Minor
> Fix For: 3.x
>
> Attachments: trunk-12649.txt
>
>
> To identify causes of load on a cluster, it would be useful to have some 
> additional metrics:
> * *Mutation size distribution:* I believe this would be relevant when 
> tracking the performance of unlogged batches.
> * *Logged / Unlogged Partitions per batch distribution:* This would also give 
> a count of batch types processed. Multiple distinct tables in batch would 
> just be considered as separate partitions.
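
To make the second requested metric concrete, here is a minimal sketch of recording a partitions-per-batch distribution separately for logged and unlogged batches. It is not taken from the attached trunk-12649.txt: the class and method names are hypothetical, it uses plain JDK counters instead of a metrics library, and it leaves out the mutation size distribution.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch; names are illustrative and not part of any Cassandra API. */
final class BatchMetricsSketch
{
    // partitions-per-batch value -> number of batches observed with that value
    private final ConcurrentSkipListMap<Integer, LongAdder> logged = new ConcurrentSkipListMap<>();
    private final ConcurrentSkipListMap<Integer, LongAdder> unlogged = new ConcurrentSkipListMap<>();

    /** Records one batch that touched the given number of partitions. */
    void record(boolean isLogged, int partitions)
    {
        (isLogged ? logged : unlogged).computeIfAbsent(partitions, k -> new LongAdder()).increment();
    }

    /** Number of batches of the given type that touched exactly this many partitions. */
    long batchesWith(boolean isLogged, int partitions)
    {
        LongAdder adder = (isLogged ? logged : unlogged).get(partitions);
        return adder == null ? 0 : adder.sum();
    }

    /** Total number of batches of the given type, derived from the distribution. */
    long totalBatches(boolean isLogged)
    {
        long total = 0;
        for (LongAdder adder : (isLogged ? logged : unlogged).values())
            total += adder.sum();
        return total;
    }
}
```

Because the per-type totals fall out of the distribution, a single recording call per batch yields both the partition counts and the count of batch types processed.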





[jira] [Updated] (CASSANDRA-12649) Add BATCH metrics

2016-09-18 Thread Alwyn Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alwyn Davis updated CASSANDRA-12649:

Status: Patch Available  (was: Open)

> Add BATCH metrics
> -
>
> Key: CASSANDRA-12649
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12649
> Project: Cassandra
>  Issue Type: Wish
>Reporter: Alwyn Davis
>Priority: Minor
> Fix For: 3.x
>
> Attachments: trunk-12649.txt
>
>
> To identify causes of load on a cluster, it would be useful to have some 
> additional metrics:
> * *Mutation size distribution:* I believe this would be relevant when 
> tracking the performance of unlogged batches.
> * *Logged / Unlogged Partitions per batch distribution:* This would also give 
> a count of batch types processed. Multiple distinct tables in batch would 
> just be considered as separate partitions.





[jira] [Commented] (CASSANDRA-12664) GCCompactionTest is flaky

2016-09-18 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502058#comment-15502058
 ] 

Stefania commented on CASSANDRA-12664:
--

cc [~blambov] and [~krummas].

> GCCompactionTest is flaky
> -
>
> Key: CASSANDRA-12664
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12664
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Stefania
>Priority: Minor
> Fix For: 3.x
>
>
> {{GCCompactionTest}} was introduced by CASSANDRA-7019 and appears to be 
> flaky, see for example 
> [here|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-trunk-testall/lastCompletedBuild/testReport/org.apache.cassandra.cql3/GcCompactionTest/testGcCompactionStatic/].
>  
> I think it's the same root cause as CASSANDRA-12282: the tables in the test 
> keyspace are dropped asynchronously after each test, and this might cause 
> additional flush operations for all dirty tables in the keyspace. See the 
> [callstack|https://issues.apache.org/jira/browse/CASSANDRA-12282?focusedCommentId=15399098=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15399098]
>  in 12282. 
> A possible solution is to use KEYSPACE_PER_TEST, which is instead dropped 
> synchronously.





[jira] [Created] (CASSANDRA-12664) GCCompactionTest is flaky

2016-09-18 Thread Stefania (JIRA)
Stefania created CASSANDRA-12664:


 Summary: GCCompactionTest is flaky
 Key: CASSANDRA-12664
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12664
 Project: Cassandra
  Issue Type: Bug
  Components: Local Write-Read Paths
Reporter: Stefania
Priority: Minor
 Fix For: 3.x


{{GCCompactionTest}} was introduced by CASSANDRA-7019 and appears to be flaky, 
see for example 
[here|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-trunk-testall/lastCompletedBuild/testReport/org.apache.cassandra.cql3/GcCompactionTest/testGcCompactionStatic/].
 

I think it's the same root cause as CASSANDRA-12282: the tables in the test 
keyspace are dropped asynchronously after each test, and this might cause 
additional flush operations for all dirty tables in the keyspace. See the 
[callstack|https://issues.apache.org/jira/browse/CASSANDRA-12282?focusedCommentId=15399098=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15399098]
 in 12282. 

A possible solution is to use KEYSPACE_PER_TEST, which is instead dropped 
synchronously.





[jira] [Commented] (CASSANDRA-12659) Query in reversed order brough back deleted data

2016-09-18 Thread Tai Khuu Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501977#comment-15501977
 ] 

Tai Khuu Tan commented on CASSANDRA-12659:
--

The weird thing is that only the reversed-order query returns deleted data, 
even with the consistency level set to ALL. A normal query doesn't return 
deleted data, and there are no tombstones either, so I really don't know where 
the data comes from. I tried to reproduce it but I couldn't. I will keep trying.

> Query in reversed order brough back deleted data
> 
>
> Key: CASSANDRA-12659
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12659
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Cassandra 3.0.5, 6 nodes cluster
>Reporter: Tai Khuu Tan
>
> We have an issue with our Cassandra 3.0.5 cluster. After we deleted a large 
> amount of data across multiple partition keys, querying those partition keys 
> in reversed order on a clustering key returns the deleted data. I have checked 
> and there are no tombstones left; all of them are gone. So I don't know 
> where or how the deleted data can still exist. Is there any other place 
> that Cassandra reads data from when querying in reverse order compared to 
> normal order?
> The schema is very simple:
> {noformat}
> CREATE TABLE table ( uid varchar, version timestamp, data1 varchar, data2 
> varchar, data3 varchar, data4 varchar, data5 varchar, PRIMARY KEY (uid, 
> version, data1 , data2 , data3 , data4 ) ) with compact storage;
> {noformat}
> Queries use reverse order on the timestamp column (version).
> For example:
> {noformat}
> select * from data where uid='uid1' order by version DESC
> {noformat}





[jira] [Updated] (CASSANDRA-12471) Allow for some form of "unset" in CQL's COPY command.

2016-09-18 Thread Stefania (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefania updated CASSANDRA-12471:
-
Assignee: (was: Stefania)

> Allow for some form of "unset" in CQL's COPY command.
> -
>
> Key: CASSANDRA-12471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12471
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL
>Reporter: Nate Sanders
>Priority: Minor
> Fix For: 2.2.0
>
>
> Currently, it looks like there's no way to get "unset" values via the COPY 
> command, say, for example with empty string fields.  Instead, these create 
> tombstones.





[jira] [Commented] (CASSANDRA-12590) Segfault reading secondary index

2016-09-18 Thread Cameron Zemek (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501742#comment-15501742
 ] 

Cameron Zemek commented on CASSANDRA-12590:
---

[~beobal] I will see what I can do.

[~ifesdjeen] Not sure exactly what you mean by sstable flush threshold. Here 
are the related settings from cassandra.yaml:
{noformat}
memtable_allocation_type: offheap_objects
concurrent_writes: 2
key_cache_size_in_mb: 0
memtable_flush_writers: 1
concurrent_compactors: 1
concurrent_reads: 2
commitlog_total_space_in_mb: 1024
file_cache_size_in_mb: '1'
{noformat}


> Segfault reading secondary index
> 
>
> Key: CASSANDRA-12590
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12590
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
> Environment: Occurs on Cassandra 3.5 and 3.7
>Reporter: Cameron Zemek
>Assignee: Sam Tunnicliffe
>
> Getting segfaults when reading secondary index as follows:
> {code}
> J 9272 C2 
> org.apache.cassandra.dht.LocalPartitioner$LocalToken.compareTo(Lorg/apache/cassandra/dht/Token;)I
>  (53 bytes) @ 0x7fd7354749b7 [0x7fd735474840+0x177]
> J 5661 C2 org.apache.cassandra.db.DecoratedKey.compareTo(Ljava/lang/Object;)I 
> (9 bytes) @ 0x7fd7351b35b8 [0x7fd7351b3440+0x178]
> J 14205 C2 
> java.util.concurrent.ConcurrentSkipListMap.doGet(Ljava/lang/Object;)Ljava/lang/Object;
>  (142 bytes) @ 0x7fd736404dd8 [0x7fd736404cc0+0x118]
> J 17764 C2 
> org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(Lorg/apache/cassandra/db/ColumnFamilyStore;)Lorg/apache/cassandra/db/rows/UnfilteredRowIterator;
>  (635 bytes) @ 0x7fd736e09638 [0x7fd736e08720+0xf18]
> J 17808 C2 
> org.apache.cassandra.index.internal.CassandraIndexSearcher.search(Lorg/apache/cassandra/db/ReadExecutionController;)Lorg/apache/cassandra/db/partitions/UnfilteredPartitionIterator;
>  (68 bytes) @ 0x7fd736e01a48 [0x7fd736e012a0+0x7a8]
> J 14217 C2 
> org.apache.cassandra.db.ReadCommand.executeLocally(Lorg/apache/cassandra/db/ReadExecutionController;)Lorg/apache/cassandra/db/partitions/UnfilteredPartitionIterator;
>  (219 bytes) @ 0x7fd736417c1c [0x7fd736416fa0+0xc7c]
> J 14585 C2 
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow()V 
> (337 bytes) @ 0x7fd736541e6c [0x7fd736541d60+0x10c]
> J 14584 C2 org.apache.cassandra.service.StorageProxy$DroppableRunnable.run()V 
> (48 bytes) @ 0x7fd7357957b4 [0x7fd735795760+0x54]
> J 9648% C2 org.apache.cassandra.concurrent.SEPWorker.run()V (253 bytes) @ 
> 0x7fd735938d8c [0x7fd7359356e0+0x36ac]
> {code}
> Which I have translated to the codepath:
> org.apache.cassandra.dht.LocalPartitioner (Line 139)
> org.apache.cassandra.db.DecoratedKey (Line 85)
> java.util.concurrent.ConcurrentSkipListMap (Line 794)
> org.apache.cassandra.db.SinglePartitionReadCommand (Line 498)
> org.apache.cassandra.index.internal.CassandraIndexSearcher (Line 60)
> org.apache.cassandra.db.ReadCommand (Line 367)





[jira] [Updated] (CASSANDRA-12663) Allow per DC segregation, grant user to create different indices per datacenter on tables

2016-09-18 Thread Bhuvan Rawal (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Rawal updated CASSANDRA-12663:
-
Summary: Allow per DC segregation, grant user to create different indices 
per datacenter on tables  (was: Allowing per DC segregation of schema, allowing 
user to create different indices per datacenter on tables)

> Allow per DC segregation, grant user to create different indices per 
> datacenter on tables
> -
>
> Key: CASSANDRA-12663
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12663
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Bhuvan Rawal
>
> For analytics and auditing purposes it becomes essential to serve access 
> patterns other than those modeled around partition key fetches. Although 
> users need only a limited number of such reads, an index enabled cluster wide 
> requires an index write for every row written to that table, on every node 
> in every DC, even one that may be serving read operations. A user may not 
> want indices built in the transactional DC on every write: that computation 
> and disk utilization may not be useful, as the analytics may be performed in 
> another DC.
> It would be a plus to have analytics / auditing workloads built inside 
> Cassandra itself using native secondary indices / SASI indices / Stratio, by 
> creating indices for a specific datacenter and not having to ship data off to 
> other index stores like Elasticsearch through the application.





[jira] [Created] (CASSANDRA-12663) Allowing per DC segregation of schema, allowing user to create different indices per datacenter on tables

2016-09-18 Thread Bhuvan Rawal (JIRA)
Bhuvan Rawal created CASSANDRA-12663:


 Summary: Allowing per DC segregation of schema, allowing user to 
create different indices per datacenter on tables
 Key: CASSANDRA-12663
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12663
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Bhuvan Rawal


For analytics and auditing purposes it becomes essential to serve access 
patterns other than those modeled around partition key fetches. Although users 
need only a limited number of such reads, an index enabled cluster wide 
requires an index write for every row written to that table, on every node in 
every DC, even one that may be serving read operations. A user may not want 
indices built in the transactional DC on every write: that computation and 
disk utilization may not be useful, as the analytics may be performed in 
another DC.

It would be a plus to have analytics / auditing workloads built inside 
Cassandra itself using native secondary indices / SASI indices / Stratio, by 
creating indices for a specific datacenter and not having to ship data off to 
other index stores like Elasticsearch through the application.





[jira] [Comment Edited] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-18 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501472#comment-15501472
 ] 

DOAN DuyHai edited comment on CASSANDRA-12573 at 9/18/16 6:59 PM:
--

Ok it's my bad.  The root of the operation tree for the QueryPlanner is an 
{{AND}}

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java#L54-L60

The {{'%RevisionDiff%ItemImpl%'}} is split into 2 distinct predicates: 
{{CONTAINS RevisionDiff}} & {{CONTAINS ItemImpl}} and the *AND* logic does 
apply.

 The comment in the source code is pretty misleading.

Back to the original experiments, exp. 1 is consistent, exp. 2 and 4 results 
are also consistent

Only experiment 3 results are wrong:

{code:sql}
insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ;
insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ;
insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ;
insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ;

select c2 from kmv.kmv where c2 like '%w%a%';
{code}

Expected result: qweasd, qwea1.
Actual result: qwe, qweasd, qwea1, 1qwe, asdqwe.

 Let me reproduce it
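
To make the distinction above concrete, here is a minimal Python sketch (not Cassandra code) that contrasts ordered SQL-style LIKE matching with an unordered AND of CONTAINS terms over the five c2 values from the experiments. The helper names sql_like and and_of_contains are invented for this illustration, and the sketch models neither the StandardAnalyzer nor the effect of the compound primary key in experiment 3.

```python
import re

# Values of column c2 inserted in the experiments.
VALUES = ["qwe", "qweasd", "qwea1", "1qwe", "asdqwe"]


def like_to_regex(pattern):
    """Translate a LIKE pattern ('%' = any run of characters) into a regex."""
    parts = [re.escape(p) for p in pattern.split("%")]
    return re.compile("^" + ".*".join(parts) + "$", re.DOTALL)


def sql_like(values, pattern):
    """Ordered semantics: terms must appear in the order the pattern gives."""
    rx = like_to_regex(pattern)
    return [v for v in values if rx.match(v)]


def and_of_contains(values, pattern):
    """Unordered semantics: every non-empty term must occur somewhere."""
    terms = [t for t in pattern.split("%") if t]
    return [v for v in values if all(t in v for t in terms)]


print(sql_like(VALUES, "%w%a%"))         # ['qweasd', 'qwea1']
print(and_of_contains(VALUES, "%w%a%"))  # ['qweasd', 'qwea1', 'asdqwe']
```

Under these assumptions, the unordered AND returns the same set as the actual result reported for experiment 2, while the ordered match returns the expected result from the issue description.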


was (Author: doanduyhai):
Ok it's my bad.  The root of the operation tree for the QueryPlanner is an 
{{AND}}

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java#L54-L60

The {{'%RevisionDiff%ItemImpl%'}} is split into 2 distinct predicates: 
{{CONTAINS RevisionDiff}} & {{CONTAINS ItemImpl}}, and the **AND** logic does 
apply.

The comment in the source code is pretty misleading.

Back to the original experiments: exp. 1 is consistent, and the results of 
exp. 2 and 4 are also consistent.

Only experiment 3 results are wrong:

```sql
insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ;
insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ;
insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ;
insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ;

select c2 from kmv.kmv where c2 like '%w%a%';

```

Expected result: qweasd, qwea1.
Actual result: qwe, qweasd, qwea1, 1qwe, asdqwe.

 Let me reproduce it

> SASI index. Incorrect results for '%foo%bar%'-like search pattern. 
> ---
>
> Key: CASSANDRA-12573
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12573
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Mikhail Krupitskiy
>Priority: Critical
>  Labels: sasi
>
> We use Cassandra 3.7 and have faced a strange behaviour of SELECT requests 
> with "LIKE '%foo%bar%'" constraints on a column with SASI index.
> Below are few experiments that show this behaviour.
> Experiment 1:
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: no rows.
> Experiment 2 (NOTE: definition of index is changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: asdqwe, qweasd, qwea1.
> Experiment 3 (NOTE: primary key is compound now and inserted data was 
> changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 

[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-18 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501472#comment-15501472
 ] 

DOAN DuyHai commented on CASSANDRA-12573:
-

Ok it's my bad.  The root of the operation tree for the QueryPlanner is an 
{{AND}}

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java#L54-L60

The {{'%RevisionDiff%ItemImpl%'}} is split into 2 distinct predicates: 
{{CONTAINS RevisionDiff}} & {{CONTAINS ItemImpl}}, and the **AND** logic does 
apply.

The comment in the source code is pretty misleading.

Back to the original experiments: exp. 1 is consistent, and the results of 
exp. 2 and 4 are also consistent.

Only experiment 3 results are wrong:

```sql
insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ;
insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ;
insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ;
insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ;

select c2 from kmv.kmv where c2 like '%w%a%';

```

Expected result: qweasd, qwea1.
Actual result: qwe, qweasd, qwea1, 1qwe, asdqwe.

 Let me reproduce it

> SASI index. Incorrect results for '%foo%bar%'-like search pattern. 
> ---
>
> Key: CASSANDRA-12573
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12573
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Mikhail Krupitskiy
>Priority: Critical
>  Labels: sasi
>
> We use Cassandra 3.7 and have faced a strange behaviour of SELECT requests 
> with "LIKE '%foo%bar%'" constraints on a column with SASI index.
> Below are few experiments that show this behaviour.
> Experiment 1:
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: no rows.
> Experiment 2 (NOTE: definition of index is changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: asdqwe, qweasd, qwea1.
> Experiment 3 (NOTE: primary key is compound now and inserted data was 
> changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, 
> c1));
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: qwe, qweasd, qwea1, 1qwe, asdqwe.
> Experiment 4 (NOTE: search criteria is changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY 

[jira] [Commented] (CASSANDRA-12662) OOM when using SASI index

2016-09-18 Thread Maxim Podkolzine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501465#comment-15501465
 ] 

Maxim Podkolzine commented on CASSANDRA-12662:
--

Got it, thanks a lot!
We don't use SSD right now, I need to check what the actual storage is.
I'll try to get as much CPU and RAM as possible and get back with the results.

> OOM when using SASI index
> -
>
> Key: CASSANDRA-12662
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12662
> Project: Cassandra
>  Issue Type: Bug
> Environment: Linux, 4 CPU cores, 16Gb RAM, Cassandra process utilizes 
> ~8Gb, of which ~4Gb is Java heap
>Reporter: Maxim Podkolzine
>Priority: Critical
> Fix For: 3.6
>
> Attachments: memory-dump.png
>
>
> 2.8Gb of the heap is taken by the index data, pending for flush (see the 
> screenshot). As a result the node fails with OOM.
> Questions:
> - Why can't Cassandra keep up with the inserted data and flush it?
> - What resources/configuration should be changed to improve the performance?





[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-18 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501454#comment-15501454
 ] 

DOAN DuyHai commented on CASSANDRA-12573:
-

Let me reproduce your results with a unit test

> SASI index. Incorrect results for '%foo%bar%'-like search pattern. 
> ---
>
> Key: CASSANDRA-12573
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12573
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Mikhail Krupitskiy
>Priority: Critical
>  Labels: sasi
>
> We use Cassandra 3.7 and have faced a strange behaviour of SELECT requests 
> with "LIKE '%foo%bar%'" constraints on a column with SASI index.
> Below are few experiments that show this behaviour.
> Experiment 1:
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: no rows.
> Experiment 2 (NOTE: definition of index is changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: asdqwe, qweasd, qwea1.
> Experiment 3 (NOTE: primary key is compound now and inserted data was 
> changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, 
> c1));
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: qwe, qweasd, qwea1, 1qwe, asdqwe.
> Experiment 4 (NOTE: search criteria is changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, 
> c1));
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w22%a%';
> {noformat}
> Expected result: no rows.
> Actual result: qweasd, qwea1, asdqwe.





[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-18 Thread Maxim Podkolzine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501443#comment-15501443
 ] 

Maxim Podkolzine commented on CASSANDRA-12573:
--

There are 7 rows that contain "RevisionDiff" and 2 rows that contain 
"ItemImpl". There are 9 rows that contain "RevisionDiff" OR "ItemImpl".
Here they are (only the name):
- RevisionDiffType.java: it contains "RevisionDiff", hence it contains 
"RevisionDiff" OR "ItemImpl"
- RevisionDiffItem.java: it contains "RevisionDiff", hence it contains 
"RevisionDiff" OR "ItemImpl"
- RevisionDiffItemDTO.java: it contains "RevisionDiff", hence it contains 
"RevisionDiff" OR "ItemImpl"
- GetRevisionDiff.java: it contains "RevisionDiff", hence it contains 
"RevisionDiff" OR "ItemImpl"
- RevisionDiffItemDTO.java (twice): it contains "RevisionDiff", hence it 
contains "RevisionDiff" OR "ItemImpl"
- RevisionDiffItemImpl.java: it contains "RevisionDiff", hence it contains 
"RevisionDiff" OR "ItemImpl"
- FastTreeItemImpl.java: it contains "ItemImpl", hence it contains 
"RevisionDiff" OR "ItemImpl"
- RevisionDiffItemImpl.java: it contains "ItemImpl", hence it contains 
"RevisionDiff" OR "ItemImpl"

Of these 9 rows there is one row that contains both "RevisionDiff" AND 
"ItemImpl": "RevisionDiffItemImpl.java".
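
The counts above can be checked with a small set-based sketch in Python. The integer ids are made up for this illustration (the real rows have UUID ids); only the file names come from the comment.

```python
# (id, name) pairs; the short integer ids are hypothetical stand-ins.
ROWS = [
    (1, "RevisionDiffType.java"),
    (2, "RevisionDiffItem.java"),
    (3, "RevisionDiffItemDTO.java"),
    (4, "GetRevisionDiff.java"),
    (5, "RevisionDiffItemDTO.java"),
    (6, "RevisionDiffItemDTO.java"),
    (7, "RevisionDiffItemImpl.java"),
    (8, "FastTreeItemImpl.java"),
]


def ids_containing(term):
    """Return the ids of all rows whose name contains the given substring."""
    return {row_id for row_id, name in ROWS if term in name}


revision_diff = ids_containing("RevisionDiff")
item_impl = ids_containing("ItemImpl")

print(len(revision_diff))                 # 7
print(len(item_impl))                     # 2
print(len(revision_diff | item_impl))     # 8 distinct rows in the union
print(sorted(revision_diff & item_impl))  # [7], i.e. RevisionDiffItemImpl.java
```

Counted as distinct rows the union comes to 8, because RevisionDiffItemImpl.java matches both terms; the list above reaches 9 by listing that row once under each term. The intersection is the single row either way.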


[jira] [Commented] (CASSANDRA-12662) OOM when using SASI index

2016-09-18 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501440#comment-15501440
 ] 

DOAN DuyHai commented on CASSANDRA-12662:
-

Default hardcoded value for memIndexTable is 1Gb: 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/PerSSTableIndexWriter.java#L369-L372

bq. 2.8Gb of the heap is taken by the index data, pending for flush (see the 
screenshot)

When you have more than 1Gb of index data, SASI flushes the index in chunks of 
1Gb into temporary index files: 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/PerSSTableIndexWriter.java#L247-L250

Then it needs a 2nd pass to merge them into memory to write the final index 
file, see here:  
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/PerSSTableIndexWriter.java#L311-L326

Thus, it may take a while to write the final SASI index file. The speed of all 
this depends on many factors, mainly CPU and Disk I/O. 
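
The two-pass pattern described above (flush sorted chunks once a memory limit is hit, then merge them into the final file) can be sketched in Python as follows. This is an illustration of the general technique only, not SASI's actual code: build_index and max_in_memory are invented names, in-memory lists stand in for the temporary index files, and an item count replaces the 1Gb threshold.

```python
import heapq


def build_index(terms, max_in_memory):
    """Return (sorted terms, number of flushed segments)."""
    segments = []  # stand-ins for temporary on-disk index segments
    buffer = []
    for term in terms:
        buffer.append(term)
        # First pass: flush a sorted segment whenever the buffer is full.
        if len(buffer) >= max_in_memory:
            segments.append(sorted(buffer))
            buffer = []
    if buffer:
        segments.append(sorted(buffer))
    # Second pass: k-way merge of the sorted segments into the final index.
    return list(heapq.merge(*segments)), len(segments)


final, segment_count = build_index(["d", "a", "c", "b", "e"], max_in_memory=2)
print(final)          # ['a', 'b', 'c', 'd', 'e']
print(segment_count)  # 3
```

The more data arrives between flushes, the more segments the second pass has to read back and merge, which is one reason the final write can take a while on slow disks.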

What are your disk hardware specs? SSD? Spinning disk? Shared storage?

bq. Why can't Cassandra keep up with the inserted data and flush it?

Writes are CPU-intensive. Compactions are more disk I/O intensive.

bq. What resources/configuration should be changed to improve the performance?

Right now, a 4-core CPU is below the official recommendation for running 
Cassandra in production, which is an 8-core CPU. Same for RAM: the 
recommendation is 32Gb, see here: http://cassandra.apache.org/doc/latest/operating/hardware.html







[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-18 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501413#comment-15501413
 ] 

DOAN DuyHai commented on CASSANDRA-12573:
-

bq. That's good news. When do you plan to merge it?
See this JIRA:  [CASSANDRA-10765] (second comment)

bq. As a customer I have a slightly different view on this. My expectations are 
based on prior experience and common sense.

What are you talking about? Customer of what? Apache Cassandra is 
open-source.

bq. My current impression is that this feature is half-baked and not well 
tested. But it's just my opinion.

Well, those are the risks of open source software: you don't have any strong 
guarantees, SLA or anything of the sort. But you can contribute to improve 
SASI. Any pull request is welcome of course. The community will be more than 
happy to have contributors.

bq. After that I run the queries with '%' inside. As you can see multi-patterns 
are handled by AND:

Absolutely not. Your examples just show how the index mode {{CONTAINS}} works. 

The first query, {{name like '%RevisionDiff%';}}, means: give me all names 
containing the {{RevisionDiff}} substring.

The 2nd query, {{name like '%ItemImpl%';}}, means: give me all names 
containing the {{ItemImpl}} substring.

The 3rd query, {{name like '%RevisionDiff%ItemImpl%';}}, means: give me all 
names containing the {{RevisionDiff}} substring OR the {{ItemImpl}} substring.

Nowhere do I see *AND* semantics in your example.






[jira] [Comment Edited] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-18 Thread Maxim Podkolzine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501293#comment-15501293
 ] 

Maxim Podkolzine edited comment on CASSANDRA-12573 at 9/18/16 5:03 PM:
---

> SASI initially support multiple predicates, something like : WHERE ((col1=xxx 
> OR col2=yyy) AND (col3 LIKE '%zzz')) but it is not merged yet into the 3.x 
> trunk
That's good news. When do you plan to merge it?

> Wrong, a bug is something that does not work as expected e.g that does not 
> work as documented.
As a customer I have a slightly different view on this. My expectations are 
based on prior experience and common sense.
I understand when certain features that are usual in other products are not 
implemented by design. This is obviously not the case.
My current impression is that this feature is half-baked and not well tested. 
But it's just my opinion.

I think I have a stronger argument that this is a bug. I have created a DB and 
filled it with some data from my disk:
{code}
CREATE KEYSPACE Excelsior   WITH REPLICATION = { 'class' : 'SimpleStrategy', 
'replication_factor' : 3 };
use excelsior;
create table demo (id text primary key, name text, content text);
CREATE CUSTOM INDEX name_index ON demo (name) USING 
'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
 'mode': 'CONTAINS',
 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
 'analyzed': 'true'
};
{code}

After that I run the queries with '%' inside. As you can see multi-patterns are 
handled by AND:
{code}
cqlsh:excelsior> select id, name from demo where name like '%RevisionDiff%';

 id   | name
--+
 93dce11a-cfdd-4c16-b3b3-7537c7af03ec | RevisionDiffType.java
 6586058f-bd57-4fc7-ae12-e6d8ddcd2ceb | RevisionDiffItem.java
 d16dff53-002b-4fe6-9a10-bb32425360e0 | RevisionDiffItemDTO.java
 bb20981e-714f-4eac-802f-6191dba5a301 | GetRevisionDiff.java
 1c53574b-2eea-46f8-bcbc-5e295ef9c70a | RevisionDiffItemDTO.java
 7366f852-d63c-4d07-86b3-18a3bf47e79b | RevisionDiffItemDTO.java
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(7 rows)
cqlsh:excelsior> select id, name from demo where name like '%ItemImpl%';

 id   | name
--+---
 603c1d12-4871-4244-896a-54ddb76dbd3b | FastTreeItemImpl.java
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(2 rows)
cqlsh:excelsior> select id, name from demo where name like 
'%RevisionDiff%ItemImpl%';

 id   | name
--+--
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(1 rows)
{code}


was (Author: maximp):
> SASI initially support multiple predicates, something like : WHERE ((col1=xxx 
> OR col2=yyy) AND (col3 LIKE '%zzz')) but it is not merged yet into the 3.x 
> trunk
That's good news. When do you plan to merge it?

> Wrong, a bug is something that does not work as expected e.g that does not 
> work as documented.
As a customer I have a slightly different view on this. My expectations are 
based on prior experience and common sense.
I understand when certain features that are usual in other products are not 
implemented by design. This is obviously not the case.
My current impression is that this feature is half-baked and not well tested. 
But it's just my opinion.

I think I have a stronger argument that this is a bug. I have created a DB and 
filled it with some data from my disk:
```
CREATE KEYSPACE Excelsior   WITH REPLICATION = { 'class' : 'SimpleStrategy', 
'replication_factor' : 3 };
use excelsior;
create table demo (id text primary key, name text, content text);
CREATE CUSTOM INDEX name_index ON demo (name) USING 
'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
 'mode': 'CONTAINS',
 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
 'analyzed': 'true'
};
```

After that I run the queries with '%' inside. As you can see multi-patterns are 
handled by AND:
```
cqlsh:excelsior> select id, name from demo where name like '%RevisionDiff%';

 id   | name
--+
 93dce11a-cfdd-4c16-b3b3-7537c7af03ec | RevisionDiffType.java
 6586058f-bd57-4fc7-ae12-e6d8ddcd2ceb | RevisionDiffItem.java
 d16dff53-002b-4fe6-9a10-bb32425360e0 | RevisionDiffItemDTO.java
 bb20981e-714f-4eac-802f-6191dba5a301 | GetRevisionDiff.java
 1c53574b-2eea-46f8-bcbc-5e295ef9c70a | RevisionDiffItemDTO.java
 7366f852-d63c-4d07-86b3-18a3bf47e79b | RevisionDiffItemDTO.java
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(7 rows)
cqlsh:excelsior> select id, name from demo where name like '%ItemImpl%';

 id

[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-18 Thread Maxim Podkolzine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501293#comment-15501293
 ] 

Maxim Podkolzine commented on CASSANDRA-12573:
--

> SASI initially support multiple predicates, something like : WHERE ((col1=xxx 
> OR col2=yyy) AND (col3 LIKE '%zzz')) but it is not merged yet into the 3.x 
> trunk
That's good news. When do you plan to merge it?

> Wrong, a bug is something that does not work as expected e.g that does not 
> work as documented.
As a customer I have a slightly different view on this. My expectations are 
based on prior experience and common sense.
I understand when certain features that are usual in other products are not 
implemented by design. This is obviously not the case.
My current impression is that this feature is half-baked and not well tested. 
But it's just my opinion.

I think I have a stronger argument that this is a bug. I have created a DB and 
filled it with some data from my disk:
```
CREATE KEYSPACE Excelsior   WITH REPLICATION = { 'class' : 'SimpleStrategy', 
'replication_factor' : 3 };
use excelsior;
create table demo (id text primary key, name text, content text);
CREATE CUSTOM INDEX name_index ON demo (name) USING 
'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
 'mode': 'CONTAINS',
 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
 'analyzed': 'true'
};
```

After that I ran queries with '%' inside the pattern. As you can see, 
multi-part patterns are handled as an AND of the parts:
```
cqlsh:excelsior> select id, name from demo where name like '%RevisionDiff%';

 id                                   | name
--------------------------------------+---------------------------
 93dce11a-cfdd-4c16-b3b3-7537c7af03ec | RevisionDiffType.java
 6586058f-bd57-4fc7-ae12-e6d8ddcd2ceb | RevisionDiffItem.java
 d16dff53-002b-4fe6-9a10-bb32425360e0 | RevisionDiffItemDTO.java
 bb20981e-714f-4eac-802f-6191dba5a301 | GetRevisionDiff.java
 1c53574b-2eea-46f8-bcbc-5e295ef9c70a | RevisionDiffItemDTO.java
 7366f852-d63c-4d07-86b3-18a3bf47e79b | RevisionDiffItemDTO.java
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(7 rows)
cqlsh:excelsior> select id, name from demo where name like '%ItemImpl%';

 id                                   | name
--------------------------------------+---------------------------
 603c1d12-4871-4244-896a-54ddb76dbd3b | FastTreeItemImpl.java
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(2 rows)
cqlsh:excelsior> select id, name from demo where name like 
'%RevisionDiff%ItemImpl%';

 id                                   | name
--------------------------------------+---------------------------
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(1 rows)
```
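
To make the two candidate readings of a multi-part pattern concrete, here is a minimal Python sketch. It does not reproduce SASI internals; it only contrasts an ordered, SQL-style reading of `LIKE` with an "AND of the fragments" reading. The helper names `sql_like` and `and_of_contains` are hypothetical, the `_` wildcard is deliberately ignored, and the sample values are the five `c2` values from Experiment 1 in the issue description quoted below.

```python
import re

def sql_like(value, pattern):
    """Ordered reading: '%' matches any run of characters (assumption: '_' is not handled)."""
    parts = [re.escape(p) for p in pattern.split("%")]
    regex = "^" + ".*".join(parts) + "$"
    return re.match(regex, value, re.DOTALL) is not None

def and_of_contains(value, pattern):
    """Unordered reading: every non-empty fragment must occur somewhere in the value."""
    fragments = [p for p in pattern.split("%") if p]
    return all(f in value for f in fragments)

# c2 values inserted in Experiment 1 of the issue description
values = ["qwe", "qweasd", "qwea1", "1qwe", "asdqwe"]

ordered = [v for v in values if sql_like(v, "%w%a%")]
unordered = [v for v in values if and_of_contains(v, "%w%a%")]

print(ordered)    # ['qweasd', 'qwea1']
print(unordered)  # ['qweasd', 'qwea1', 'asdqwe']
```

The two readings differ only for values such as `asdqwe`, where both fragments occur but in the opposite order. For the `'%RevisionDiff%ItemImpl%'` query above, both readings select the same single row, so that output alone cannot tell them apart.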

> SASI index. Incorrect results for '%foo%bar%'-like search pattern. 
> ---
>
> Key: CASSANDRA-12573
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12573
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Mikhail Krupitskiy
>Priority: Critical
>  Labels: sasi
>
> We use Cassandra 3.7 and have faced a strange behaviour of SELECT requests 
> with "LIKE '%foo%bar%'" constraints on a column with SASI index.
> Below are few experiments that show this behaviour.
> Experiment 1:
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: no rows.
> Experiment 2 (NOTE: definition of index is changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') 

[jira] [Created] (CASSANDRA-12662) OOM when using SASI index

2016-09-18 Thread Maxim Podkolzine (JIRA)
Maxim Podkolzine created CASSANDRA-12662:


 Summary: OOM when using SASI index
 Key: CASSANDRA-12662
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12662
 Project: Cassandra
  Issue Type: Bug
 Environment: Linux, 4 CPU cores, 16Gb RAM, Cassandra process utilizes 
~8Gb, of which ~4Gb is Java heap
Reporter: Maxim Podkolzine
Priority: Critical
 Fix For: 3.6
 Attachments: memory-dump.png

2.8Gb of the heap is taken by index data that is pending flush (see the 
screenshot). As a result, the node fails with OOM.

Questions:
- Why can't Cassandra keep up with the inserted data and flush it?
- What resources/configuration should be changed to improve the performance?






[jira] [Commented] (CASSANDRA-12655) Incremental repair & compaction hang on random nodes

2016-09-18 Thread Navjyot Nishant (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15500435#comment-15500435
 ] 

Navjyot Nishant commented on CASSANDRA-12655:
-

Hello Wei, thanks for responding. Actually, the issue is with compaction 
getting blocked; anticompaction goes through without any problem. 
Let me explain in detail:

1. We run incremental repair on one node at a time. 
2. When repair starts it shows completion progress, and for a large keyspace, 
after showing 100%, it takes a couple of minutes to move on to the next 
keyspace. When we checked, it was actually waiting for anticompaction to 
complete on all the relevant replicas. The moment anticompaction completes on 
all replicas, it moves on to the next keyspace. 
3. Then compaction starts, following the anticompaction, and sometimes hangs on 
random replicas. That replica becomes unresponsive, which impacts the repair 
running on the next keyspace/node, so the repair also becomes unresponsive.

I am able to avoid this blocking behavior if I disable autocompaction before 
starting the repair. But after the repair, when I enable autocompaction again, 
it gets blocked on a random node, and the only way to resolve it is to bounce 
the node, which doesn't seem practical.

For now I am able to work around this issue by not using -dcpar. I had been 
using -dcpar to speed up the repair, but since I removed it the repair no 
longer complains and compaction also goes through. This buys us some time to 
plan an upgrade directly to 3.x early next year.

-dcpar works fine in our other non-prod environments, but it seems to have a 
problem with one of the largest keyspaces, which has a table of 3-4GB?

If you can relate the above issue and its resolution, that would be great.

Thanks!

> Incremental repair & compaction hang on random nodes
> 
>
> Key: CASSANDRA-12655
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12655
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: CentOS Linux release 7.1.1503 (Core)
> RAM - 64GB
> HEAP - 16GB
> Load on each node - ~5GB
> Cassandra Version - 2.2.5
>Reporter: Navjyot Nishant
>Priority: Blocker
>
> Hi, we are setting up incremental repair on our 18-node cluster. Avg load on 
> each node is ~5GB. The repair runs fine on a couple of nodes and then suddenly 
> gets stuck on random nodes. Upon checking the system.log of the impacted node 
> we don't see much information.
> The following lines are what we see in system.log, starting from the point 
> where the repair stops making progress:
> {code}
> INFO  [CompactionExecutor:3490] 2016-09-16 11:14:44,236 
> CompactionManager.java:1221 - Anticompacting 
> [BigTableReader(path='/cassandra/data/gccatlgsvcks/message_backup-cab0485008ed11e5bfed452cdd54652d/la-30832-big-Data.db'),
>  
> BigTableReader(path='/cassandra/data/gccatlgsvcks/message_backup-cab0485008ed11e5bfed452cdd54652d/la-30811-big-Data.db')]
> INFO  [IndexSummaryManager:1] 2016-09-16 11:14:49,954 
> IndexSummaryRedistribution.java:74 - Redistributing index summaries
> INFO  [IndexSummaryManager:1] 2016-09-16 12:14:49,961 
> IndexSummaryRedistribution.java:74 - Redistributing index summaries
> {code}
> When we try to see pending compactions by executing {code}nodetool 
> compactionstats{code} it hangs as well and doesn't return anything. However, 
> {code}nodetool tpstats{code} shows active and pending compactions, which never 
> come down and keep increasing. 
> {code}
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> MutationStage 0 0 221208 0
>  0
> ReadStage 0 01288839 0
>  0
> RequestResponseStage  0 0 104356 0
>  0
> ReadRepairStage   0 0 72 0
>  0
> CounterMutationStage  0 0  0 0
>  0
> HintedHandoff 0 0 46 0
>  0
> MiscStage 0 0  0 0
>  0
> CompactionExecutor866  68124 0
>  0
> MemtableReclaimMemory 0 0166 0
>  0
> PendingRangeCalculator0 0 38 0
>  0
> GossipStage   0 0 242455 0
>  0
> MigrationStage0 0  0 0
>  0
> MemtablePostFlush 0 0   3682 0
>