from:"Pavel Yaskevich \(JIRA\)"

[jira] [Commented] (CASSANDRA-14247) SASI tokenizer for simple delimiter based entries

2018-02-23 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374868#comment-16374868
 ] 

Pavel Yaskevich commented on CASSANDRA-14247:
-

LGTM, but I think [~mkjellman] should take a look as well.

> SASI tokenizer for simple delimiter based entries
> -
>
> Key: CASSANDRA-14247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14247
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: mck
>Assignee: mck
>Priority: Major
> Fix For: 4.0, 3.11.x
>
>
> Currently SASI offers only two tokenizer options:
>  - NonTokenizingAnalyzer
>  - StandardAnalyzer
> The latter is built upon Snowball, powerful for human languages but overkill 
> for simple tokenization.
> A simple tokenizer is proposed here. The need for this arose as a workaround 
> for CASSANDRA-11182, and to avoid the disk usage explosion when having to 
> resort to {{CONTAINS}}. See https://github.com/openzipkin/zipkin/issues/1861
> Example use of this would be:
> {code}
> CREATE CUSTOM INDEX span_annotation_query_idx 
> ON zipkin2.span (annotation_query) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' 
> WITH OPTIONS = {
> 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.DelimiterTokenizerAnalyzer', 
> 'delimiter': '░',
> 'case_sensitive': 'true', 
> 'mode': 'prefix', 
> 'analyzed': 'true'};
> {code}
> Original credit for this work goes to https://github.com/zuochangan
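
To make the proposal above concrete, here is a minimal sketch of delimiter-based tokenization. It is not the DelimiterTokenizerAnalyzer source: the class name, the method name, the sample value, and the choice to drop empty terms are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the analyzer shipped with Cassandra.
class DelimiterTokenizerSketch
{
    // Splits the input on a single delimiter character.
    // Assumption: empty terms (adjacent delimiters) are dropped.
    static List<String> tokenize(String input, char delimiter)
    {
        List<String> terms = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= input.length(); i++)
        {
            if (i == input.length() || input.charAt(i) == delimiter)
            {
                if (i > start)
                    terms.add(input.substring(start, i));
                start = i + 1;
            }
        }
        return terms;
    }

    public static void main(String[] args)
    {
        // '\u2591' is the delimiter character used in the index options above.
        // The value itself is an invented sample.
        String value = "error\u2591http.method=GET\u2591http.path=/api";
        System.out.println(tokenize(value, '\u2591'));
    }
}
```

Each resulting term would be indexed on its own, which is why a 'prefix' mode index can stand in for a much larger {{CONTAINS}} index in this use case.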



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-12674) [SASI] Confusing AND/OR semantics for StandardAnalyzer

2017-01-18 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829060#comment-15829060
 ] 

Pavel Yaskevich commented on CASSANDRA-12674:
-

[~ifesdjeen] Can you please take a look at this one?

> [SASI] Confusing AND/OR semantics for StandardAnalyzer 
> ---
>
> Key: CASSANDRA-12674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12674
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.7
>Reporter: DOAN DuyHai
>
> {code:sql}
> Connected to Test Cluster at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]
> Use HELP for help.
> cqlsh> use test;
> cqlsh:test> CREATE TABLE sasi_bug(id int, clustering int, val text, PRIMARY KEY((id), clustering));
> cqlsh:test> CREATE CUSTOM INDEX ON sasi_bug(val) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>     'mode': 'CONTAINS',
>     'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>     'analyzed': 'true'};
> //1st example SAME PARTITION KEY
> cqlsh:test> INSERT INTO sasi_bug(id, clustering, val) VALUES(1, 1, 'homeworker');
> cqlsh:test> INSERT INTO sasi_bug(id, clustering, val) VALUES(1, 2, 'hardworker');
> cqlsh:test> SELECT * FROM sasi_bug WHERE val LIKE '%work home%';
>  id | clustering | val
> ----+------------+------------
>   1 |          1 | homeworker
>   1 |          2 | hardworker
> (2 rows)
> //2nd example DIFFERENT PARTITION KEY
> cqlsh:test> INSERT INTO sasi_bug(id, clustering, val) VALUES(10, 1, 'speedrun');
> cqlsh:test> INSERT INTO sasi_bug(id, clustering, val) VALUES(11, 1, 'longrun');
> cqlsh:test> SELECT * FROM sasi_bug WHERE val LIKE '%long run%';
>  id | clustering | val
> ----+------------+---------
>  11 |          1 | longrun
> (1 rows)
> {code}
> In the 1st example, both rows belong to the same partition, so SASI returns 
> both values. Indeed, {{LIKE '%work home%'}} means {{contains 'work' OR 
> 'home'}}, so the result makes sense.
> In the 2nd example, only one row is returned, whereas we expect 2 rows: 
> {{LIKE '%long run%'}} means {{contains 'long' OR 'run'}}, so *speedrun* should 
> be returned too.
> So where is the problem? Explanation:
> When there is only 1 predicate, the root operation type is *AND*:
> {code:java|title=QueryPlan}
> private Operation analyze()
> {
>     try
>     {
>         Operation.Builder and = new Operation.Builder(OperationType.AND, controller);
>         controller.getExpressions().forEach(and::add);
>         return and.complete();
>     }
>     ...
> }
> {code}
> During the parsing of {{LIKE '%long run%'}}, SASI creates 2 expressions for 
> the searched term: {{long}} and {{run}}, which corresponds to an *OR* logic. 
> However, this piece of code just ruins the *OR* logic:
> {code:java|title=Operation}
> public Operation complete()
> {
>     if (!expressions.isEmpty())
>     {
>         ListMultimap analyzedExpressions = analyzeGroup(controller, op, expressions);
>         RangeIterator.Builder range = controller.getIndexes(op, analyzedExpressions.values());
>         ...
>     }
> {code}
> As you can see, we blindly take all the *values* of the MultiMap (which 
> contains a single entry for the {{val}} column with 2 expressions) and pass 
> them to {{controller.getIndexes(...)}}:
> {code:java|title=QueryController}
> public RangeIterator.Builder getIndexes(OperationType op, Collection expressions)
> {
>     if (resources.containsKey(expressions))
>         throw new IllegalArgumentException("Can't process the same expressions multiple times.");
>     RangeIterator.Builder builder = op == OperationType.OR
>         ? RangeUnionIterator.<Long, Token>builder()
>         : RangeIntersectionIterator.builder();
>     ...
> }
> {code}
> And because the root operation has *AND* type, the 
> {{RangeIntersectionIterator}} will be used on both expressions {{long}} and 
> {{run}}.
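
The effect of intersecting instead of unioning can be sketched with plain sets. This is a simplified model, not SASI code: partition ids stand in for tokens, and the class and method names are hypothetical. The sets are taken from the 2nd example: 'long' matches only partition 11, while 'run' matches partitions 10 and 11.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: each term maps to the set of partitions containing it.
class AndOrSketch
{
    // AND semantics: keep only partitions present in both sets.
    static Set<Integer> intersect(Set<Integer> a, Set<Integer> b)
    {
        Set<Integer> out = new TreeSet<>(a);
        out.retainAll(b);
        return out;
    }

    // OR semantics: keep partitions present in either set.
    static Set<Integer> union(Set<Integer> a, Set<Integer> b)
    {
        Set<Integer> out = new TreeSet<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args)
    {
        Set<Integer> longMatches = new TreeSet<>(Arrays.asList(11));    // longrun
        Set<Integer> runMatches = new TreeSet<>(Arrays.asList(10, 11)); // speedrun, longrun

        System.out.println(intersect(longMatches, runMatches)); // prints [11]
        System.out.println(union(longMatches, runMatches));     // prints [10, 11]
    }
}
```

The intersection reproduces the single row reported above, and the union gives the two rows the reporter expected.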
> So when data belong to different partitions, the *AND* logic applies and 
> eliminates _speedrun_.
> When data belong to the same partition but different rows, the 
> {{RangeIntersectionIterator}} returns a single partition, the rows are then 
> filtered further by {{operationTree.satisfiedBy}}, and the results are 
> correct:
> {code:java|title=QueryPlan}
> while (currentKeys.hasNext())
> {
>     DecoratedKey key = currentKeys.next();
>     if (!keyRange.right.isMinimum() && 
>

[jira] [Issue Comment Deleted] (CASSANDRA-11990) Address rows rather than partitions in SASI

2016-11-17 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-11990:

Comment: was deleted

(was: I think we should keep this issue open for updated implementation to 
come...)

> Address rows rather than partitions in SASI
> ---
>
> Key: CASSANDRA-11990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11990
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, sasi
>Reporter: Alex Petrov
>Assignee: Alex Petrov
> Fix For: 3.10
>
> Attachments: perf.pdf, size_comparison.png
>
>
> Currently, the lookup in SASI index would return the key position of the 
> partition. After the partition lookup, the rows are iterated and the 
> operators are applied in order to filter out ones that do not match.
> bq. TokenTree which accepts variable size keys (such would enable different 
> partitioners, collections support, primary key indexing etc.), 
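
The partition-level behaviour described above can be sketched as a two-step lookup: the index yields partitions, and every row in each partition is then checked again. This is a simplified model with hypothetical names and string rows, not the SASI read path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of "look up the partition, then filter its rows".
class PartitionThenRowFilterSketch
{
    static List<String> query(Map<String, List<String>> rowsByPartition,
                              Set<String> partitionsFromIndex,
                              String term)
    {
        List<String> matches = new ArrayList<>();
        for (String partitionKey : partitionsFromIndex)
        {
            // Every row of the partition is visited, even if only one matches.
            for (String row : rowsByPartition.getOrDefault(partitionKey, Collections.emptyList()))
            {
                if (row.contains(term))
                    matches.add(row);
            }
        }
        return matches;
    }
}
```

Addressing rows directly in the index, as the issue title proposes, would presumably avoid visiting the non-matching rows of a wide partition.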



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-11990) Address rows rather than partitions in SASI

2016-11-17 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675136#comment-15675136
 ] 

Pavel Yaskevich commented on CASSANDRA-11990:
-

I think we should keep this issue open for updated implementation to come...

> Address rows rather than partitions in SASI
> ---
>
> Key: CASSANDRA-11990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11990
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, sasi
>Reporter: Alex Petrov
>Assignee: Alex Petrov
> Fix For: 3.10
>
> Attachments: perf.pdf, size_comparison.png
>
>
> Currently, the lookup in SASI index would return the key position of the 
> partition. After the partition lookup, the rows are iterated and the 
> operators are applied in order to filter out ones that do not match.
> bq. TokenTree which accepts variable size keys (such would enable different 
> partitioners, collections support, primary key indexing etc.), 




[jira] [Commented] (CASSANDRA-11990) Address rows rather than partitions in SASI

2016-11-17 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675134#comment-15675134
 ] 

Pavel Yaskevich commented on CASSANDRA-11990:
-

I think we should keep this issue open for updated implementation to come...

> Address rows rather than partitions in SASI
> ---
>
> Key: CASSANDRA-11990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11990
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, sasi
>Reporter: Alex Petrov
>Assignee: Alex Petrov
> Fix For: 3.10
>
> Attachments: perf.pdf, size_comparison.png
>
>
> Currently, the lookup in SASI index would return the key position of the 
> partition. After the partition lookup, the rows are iterated and the 
> operators are applied in order to filter out ones that do not match.
> bq. TokenTree which accepts variable size keys (such would enable different 
> partitioners, collections support, primary key indexing etc.), 




[jira] [Commented] (CASSANDRA-11990) Address rows rather than partitions in SASI

2016-11-17 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675130#comment-15675130
 ] 

Pavel Yaskevich commented on CASSANDRA-11990:
-

OK, sorry, I almost missed that one, [~ifesdjeen]. Reverted 
7d857b46fb070548bf5e5f6ff81db588f08ec22a from both cassandra-3.X and trunk.

> Address rows rather than partitions in SASI
> ---
>
> Key: CASSANDRA-11990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11990
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, sasi
>Reporter: Alex Petrov
>Assignee: Alex Petrov
> Fix For: 3.10
>
> Attachments: perf.pdf, size_comparison.png
>
>
> Currently, the lookup in SASI index would return the key position of the 
> partition. After the partition lookup, the rows are iterated and the 
> operators are applied in order to filter out ones that do not match.
> bq. TokenTree which accepts variable size keys (such would enable different 
> partitioners, collections support, primary key indexing etc.), 




[jira] [Commented] (CASSANDRA-12832) SASI index corruption on too many overflow items

2016-11-08 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15648569#comment-15648569
 ] 

Pavel Yaskevich commented on CASSANDRA-12832:
-

[~ifesdjeen] I don't think it's a good idea to log instead of throwing an 
exception there. Throwing an exception gives a clear indication that the 
file is unusable, whereas logging would still make it look "ok" and loadable 
although data is going to be missing...
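
The trade-off can be sketched as two write policies. This is an illustration under assumptions, not the AbstractTokenTreeBuilder code: the class and method names are hypothetical, the limit of 8 is taken from the assertion message quoted below, and truncation is only one possible way a "log and continue" variant could lose data.

```java
import java.util.List;

// Hypothetical contrast between "log and continue" and "fail fast".
class OverflowPolicySketch
{
    static final int MAX_OVERFLOW = 8; // limit named in the AssertionError message

    // Lenient variant: logs and keeps what fits. The result looks valid,
    // but entries beyond the limit are silently gone.
    static List<Long> writeLenient(List<Long> overflow)
    {
        if (overflow.size() > MAX_OVERFLOW)
        {
            System.err.println("too many overflow entries: " + overflow.size());
            return overflow.subList(0, MAX_OVERFLOW);
        }
        return overflow;
    }

    // Strict variant: refuses to produce output that would be incomplete.
    static List<Long> writeStrict(List<Long> overflow)
    {
        if (overflow.size() > MAX_OVERFLOW)
            throw new IllegalStateException("cannot have more than " + MAX_OVERFLOW
                                            + " overflow collisions per leaf, but had: " + overflow.size());
        return overflow;
    }
}
```

With 15 entries, the lenient variant returns 8 of them and the caller cannot tell that 7 are missing, while the strict variant stops the write so the file is never treated as loadable.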

> SASI index corruption on too many overflow items
> 
>
> Key: CASSANDRA-12832
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12832
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>
> When SASI index has too many overflow items, it currently writes a corrupted 
> index file:
> {code}
> java.lang.AssertionError: cannot have more than 8 overflow collisions per leaf, but had: 15
>     at org.apache.cassandra.index.sasi.disk.AbstractTokenTreeBuilder$Leaf.createOverflowEntry(AbstractTokenTreeBuilder.java:357) ~[main/:na]
>     at org.apache.cassandra.index.sasi.disk.AbstractTokenTreeBuilder$Leaf.createEntry(AbstractTokenTreeBuilder.java:346) ~[main/:na]
>     at org.apache.cassandra.index.sasi.disk.DynamicTokenTreeBuilder$DynamicLeaf.serializeData(DynamicTokenTreeBuilder.java:180) ~[main/:na]
>     at org.apache.cassandra.index.sasi.disk.AbstractTokenTreeBuilder$Leaf.serialize(AbstractTokenTreeBuilder.java:306) ~[main/:na]
>     at org.apache.cassandra.index.sasi.disk.AbstractTokenTreeBuilder.write(AbstractTokenTreeBuilder.java:90) ~[main/:na]
>     at org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder$MutableDataBlock.flushAndClear(OnDiskIndexBuilder.java:629) ~[main/:na]
>     at org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder$MutableLevel.flush(OnDiskIndexBuilder.java:446) ~[main/:na]
>     at org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder$MutableLevel.finalFlush(OnDiskIndexBuilder.java:451) ~[main/:na]
>     at org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder.finish(OnDiskIndexBuilder.java:296) ~[main/:na]
>     at org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder.finish(OnDiskIndexBuilder.java:258) ~[main/:na]
>     at org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder.finish(OnDiskIndexBuilder.java:241) ~[main/:na]
>     at org.apache.cassandra.index.sasi.disk.PerSSTableIndexWriter$Index.lambda$scheduleSegmentFlush$0(PerSSTableIndexWriter.java:267) ~[main/:na]
>     at org.apache.cassandra.index.sasi.disk.PerSSTableIndexWriter$Index.lambda$complete$1(PerSSTableIndexWriter.java:296) ~[main/:na]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_91]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_91]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_91]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> ERROR [MemtableFlushWriter:4] 2016-10-23 23:17:19,920 DataTracker.java:168 - Can't open index file at , skipping.
> java.lang.IllegalArgumentException: position: -524200, limit: 12288
>     at org.apache.cassandra.index.sasi.utils.MappedBuffer.position(MappedBuffer.java:106) ~[main/:na]
>     at org.apache.cassandra.index.sasi.disk.OnDiskIndex.<init>(OnDiskIndex.java:155) ~[main/:na]
>     at org.apache.cassandra.index.sasi.SSTableIndex.<init>(SSTableIndex.java:62) ~[main/:na]
>     at org.apache.cassandra.index.sasi.conf.DataTracker.getIndexes(DataTracker.java:150) [main/:na]
>     at org.apache.cassandra.index.sasi.conf.DataTracker.update(DataTracker.java:69) [main/:na]
>     at org.apache.cassandra.index.sasi.conf.ColumnIndex.update(ColumnIndex.java:147) [main/:na]
>     at org.apache.cassandra.index.sasi.SASIIndex.handleNotification(SASIIndex.java:320) [main/:na]
>     at org.apache.cassandra.db.lifecycle.Tracker.notifyAdded(Tracker.java:421) [main/:na]
>     at org.apache.cassandra.db.lifecycle.Tracker.replaceFlushed(Tracker.java:356) [main/:na]
>     at org.apache.cassandra.db.compaction.CompactionStrategyManager.replaceFlushed(CompactionStrategyManager.java:317) [main/:na]
>     at org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1569) [main/:na]
>     at org.apache.cassandra.db.ColumnFamilyStore$Flush.flushMemtable(ColumnFamilyStore.java:1197) [main/:na]
> at 
>

[jira] [Commented] (CASSANDRA-12845) net.mintern.primitive library has GPL license

2016-10-27 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611056#comment-15611056
 ] 

Pavel Yaskevich commented on CASSANDRA-12845:
-

+1. Let me know if you want me to commit it; otherwise just go ahead and do it.

> net.mintern.primitive library has GPL license
> -
>
> Key: CASSANDRA-12845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12845
> Project: Cassandra
>  Issue Type: Bug
>Reporter: David Pennell
>Assignee: Robert Stupp
> Fix For: 3.x
>
>
> https://github.com/apache/cassandra/commit/72790dc8e34826b39ac696b03025ae6b7b6beb2b
>  adds the net.mintern.primitive library.
> The license at https://github.com/mintern-java/primitive/tree/1.0 indicates 
> that it is licensed under GPL v2 with the classpath exception.
> The license file for primitive-1.0 included in the above commit claims that 
> it is under the Apache license.




[jira] [Commented] (CASSANDRA-12845) net.mintern.primitive library has GPL license

2016-10-26 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15609719#comment-15609719
 ] 

Pavel Yaskevich commented on CASSANDRA-12845:
-

+1 on the code, but I can't see results of test all and dtest, both links 
return 404.

> net.mintern.primitive library has GPL license
> -
>
> Key: CASSANDRA-12845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12845
> Project: Cassandra
>  Issue Type: Bug
>Reporter: David Pennell
>Assignee: Robert Stupp
> Fix For: 3.x
>
>
> https://github.com/apache/cassandra/commit/72790dc8e34826b39ac696b03025ae6b7b6beb2b
>  adds the net.mintern.primitive library.
> The license at https://github.com/mintern-java/primitive/tree/1.0 indicates 
> that it is licensed under GPL v2 with the classpath exception.
> The license file for primitive-1.0 included in the above commit claims that 
> it is under the Apache license.




[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583373#comment-15583373
 ] 

Pavel Yaskevich commented on CASSANDRA-9754:


I'm planning to take a closer look at the code etc. soon, so if I see something 
or have any ideas I'll let you know!

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects 
> are IndexInfo and its ByteBuffers. This is especially bad in endpoints with 
> large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?
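
The figures in the description can be reproduced with a short calculation. The sketch below assumes one IndexInfo entry per 64 KB of partition data (the default column index block size, to the best of my knowledge) and two ByteBuffers per entry, which matches the 1:2 ratio quoted above. The class and method names are hypothetical.

```java
// Back-of-the-envelope estimate; names and the 64 KB block size are assumptions.
class IndexInfoEstimate
{
    // Number of index entries needed to cover a partition, rounded up.
    static long indexEntries(long partitionBytes, long indexBlockBytes)
    {
        return (partitionBytes + indexBlockBytes - 1) / indexBlockBytes;
    }

    public static void main(String[] args)
    {
        long partitionBytes = 64L * 1024 * 1024 * 1024 / 10; // 6.4 GiB
        long blockBytes = 64L * 1024;                        // assumed 64 KiB per entry

        long entries = indexEntries(partitionBytes, blockBytes);
        long buffers = 2 * entries; // two ByteBuffers per IndexInfo, per the ratio above

        System.out.println(entries + " IndexInfo objects, " + buffers + " ByteBuffers");
    }
}
```

Under these assumptions the estimate comes out at roughly 105K entries and 210K buffers, in line with the rounded 100K and 200K in the report.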




[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583328#comment-15583328
 ] 

Pavel Yaskevich commented on CASSANDRA-9754:


[~mkjellman] Maybe "largeuuid1"? Looks like rows there were about 300KB too, 
which is reasonable.

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects 
> are IndexInfo and its ByteBuffers. This is especially bad in endpoints with 
> large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?




[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583289#comment-15583289
 ] 

Pavel Yaskevich commented on CASSANDRA-9754:


bq. I'm actually only am using the key cache in the current implementation

I wanted to mention that purely from the perspective of looking up a key in the 
key cache, I've assumed that the index is only going to have key offsets in it, 
so we are on the same page. 

[~barnie] Is there any way you can run this through an automated perf stress test? 
Since the size of the tree attached to the key is bigger than it was 
originally, I'm curious what the performance difference is in conditions where rows 
are just barely big enough to be indexed and there are a lot of keys.

[~mkjellman] I understand that the test you are running is designed to check 
what the performance is like relative to the Birch tree itself, but is 
there any chance you can disable the key cache and generate some more keys (maybe 
~100k?) to see how changes to the column index are affecting the read path top-down?

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects 
> are IndexInfo and its ByteBuffers. This is especially bad in endpoints with 
> large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?




[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

2016-10-17 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583210#comment-15583210
 ] 

Pavel Yaskevich commented on CASSANDRA-9754:


[~mkjellman] This looks great! Can you please post information regarding 
SSTable sizes and their estimated key counts as well? AFAIR there exists 
another problem related to how indexes are currently stored: if a key is not in 
the key cache, there is no way to jump to it directly in the index file, so the 
index reader has to scan through the index segment to find the requested key. 
I'm wondering what happens when there are many keys which are 
small-to-medium sized, e.g. 64-128 MB, in each given SSTable (let's say SSTable 
size is set to 1G or 2G) and stress readers are trying to read random keys. 
What would be the difference between current index read performance and index + 
birch tree?

> Make index info heap friendly for large CQL partitions
> --
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Michael Kjellman
>Priority: Minor
> Fix For: 4.x
>
> Attachments: gc_collection_times_with_birch.png, 
> gc_collection_times_without_birch.png, gc_counts_with_birch.png, 
> gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects 
> are IndexInfo and its ByteBuffers. This is especially bad in endpoints with 
> large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?




[jira] [Updated] (CASSANDRA-11990) Address rows rather than partitions in SASI

2016-09-05 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-11990:

Fix Version/s: 3.10

> Address rows rather than partitions in SASI
> ---
>
> Key: CASSANDRA-11990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11990
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, sasi
>Reporter: Alex Petrov
>Assignee: Alex Petrov
> Fix For: 3.10
>
> Attachments: perf.pdf, size_comparison.png
>
>
> Currently, the lookup in SASI index would return the key position of the 
> partition. After the partition lookup, the rows are iterated and the 
> operators are applied in order to filter out ones that do not match.
> bq. TokenTree which accepts variable size keys (such would enable different 
> partitioners, collections support, primary key indexing etc.), 




[jira] [Updated] (CASSANDRA-11990) Address rows rather than partitions in SASI

2016-09-05 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-11990:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed. Thanks, [~ifesdjeen]!

> Address rows rather than partitions in SASI
> ---
>
> Key: CASSANDRA-11990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11990
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, sasi
>Reporter: Alex Petrov
>Assignee: Alex Petrov
> Attachments: perf.pdf, size_comparison.png
>
>
> Currently, the lookup in SASI index would return the key position of the 
> partition. After the partition lookup, the rows are iterated and the 
> operators are applied in order to filter out ones that do not match.
> bq. TokenTree which accepts variable size keys (such would enable different 
> partitioners, collections support, primary key indexing etc.), 




[jira] [Commented] (CASSANDRA-11990) Address rows rather than partitions in SASI

2016-09-01 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457072#comment-15457072
 ] 

Pavel Yaskevich commented on CASSANDRA-11990:
-

LGTM, [~ifesdjeen]! Can you please rebase it with latest trunk so I can commit 
it?

> Address rows rather than partitions in SASI
> ---
>
> Key: CASSANDRA-11990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11990
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, sasi
>Reporter: Alex Petrov
>Assignee: Alex Petrov
> Attachments: perf.pdf, size_comparison.png
>
>
> Currently, the lookup in SASI index would return the key position of the 
> partition. After the partition lookup, the rows are iterated and the 
> operators are applied in order to filter out ones that do not match.
> bq. TokenTree which accepts variable size keys (such would enable different 
> partitioners, collections support, primary key indexing etc.), 




[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax

2016-08-21 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429872#comment-15429872
 ] 

Pavel Yaskevich commented on CASSANDRA-11067:
-

Thanks!

> Improve SASI syntax
> ---
>
> Key: CASSANDRA-11067
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11067
> Project: Cassandra
>  Issue Type: Task
>  Components: CQL
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
>  Labels: client-impacting, sasi
> Fix For: 3.4
>
>
> I think everyone agrees that a LIKE operator would be ideal, but that's 
> probably not in scope for an initial 3.4 release.
> Still, I'm uncomfortable with the initial approach of overloading = to mean 
> "satisfies index expression."  The problem is that it will be very difficult 
> to back out of this behavior once people are using it.
> I propose adding a new operator in the interim instead.  Call it MATCHES, 
> maybe.  With the exact same behavior that SASI currently exposes, just with a 
> separate operator rather than being rolled into =.




[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax

2016-08-20 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429586#comment-15429586
 ] 

Pavel Yaskevich commented on CASSANDRA-11067:
-

Thanks for noticing that, [~dbrosius], and sorry for the late response, I just got 
to this! [~ifesdjeen] Can you please include this as part of CASSANDRA-11990?

> Improve SASI syntax
> ---
>
> Key: CASSANDRA-11067
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11067
> Project: Cassandra
>  Issue Type: Task
>  Components: CQL
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
>  Labels: client-impacting, sasi
> Fix For: 3.4
>
>
> I think everyone agrees that a LIKE operator would be ideal, but that's 
> probably not in scope for an initial 3.4 release.
> Still, I'm uncomfortable with the initial approach of overloading = to mean 
> "satisfies index expression."  The problem is that it will be very difficult 
> to back out of this behavior once people are using it.
> I propose adding a new operator in the interim instead.  Call it MATCHES, 
> maybe.  With the exact same behavior that SASI currently exposes, just with a 
> separate operator rather than being rolled into =.




[jira] [Commented] (CASSANDRA-12374) Can't rebuild SASI index

2016-08-19 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427802#comment-15427802
 ] 

Pavel Yaskevich commented on CASSANDRA-12374:
-

Sounds good! Thanks for helping out with SASI! :)

> Can't rebuild SASI index
> 
>
> Key: CASSANDRA-12374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12374
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
> Fix For: 3.10
>
>
> There's been no real requirement for that so far. 
> As [~beobal] has pointed out, it's not a big issue, since that only could be 
> needed when index files are lost, data corruption on disk (hardware issue) 
> has occurred or there was a bug that'd require an index rebuild.
> During {{rebuild_index}} task, indexes are only "marked" as removed with 
> {{SecondaryIndexManager::markIndexRemoved}} and then {{buildIndexesBlocking}} 
> is called. However, since SASI keeps track of SSTables for the index, it's 
> going to filter them out with {{.filter((sstable) -> 
> !sasi.index.hasSSTable(sstable))}} in {{SASIIndexBuildingSupport}}.
> If I understand the logic correctly, we have to "invalidate" (drop data) 
> right before we re-index them. This is also a blocker for [CASSANDRA-11990] 
> since without it we can't have an upgrade path.
> I have a patch ready in branch, but since it's a bug, it's better to have it 
> released earlier and for all branches affected.
> cc [~xedin]
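
The rebuild problem described above can be illustrated with a minimal sketch. It uses hypothetical stand-in types (the class `RebuildSketch` and plain strings in place of SSTables), not Cassandra's real classes; only the `hasSSTable` filter is taken from the ticket text. It assumes the index tracks SSTables in a simple set.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an index that remembers which SSTables it has already indexed.
final class RebuildSketch {
    private final Set<String> indexed = new HashSet<>();

    boolean hasSSTable(String sstable) {
        return indexed.contains(sstable);
    }

    // Mirrors the filter quoted in the ticket: only SSTables the index
    // does not already track are (re)built.
    List<String> build(List<String> sstables) {
        List<String> built = new ArrayList<>();
        for (String sstable : sstables) {
            if (!hasSSTable(sstable)) {
                indexed.add(sstable);
                built.add(sstable);
            }
        }
        return built;
    }

    // The "invalidate" step proposed in the ticket: forget tracked SSTables
    // so that the next build actually re-indexes them.
    void invalidate() {
        indexed.clear();
    }
}
```

In this sketch, calling `build` twice on the same SSTables does nothing the second time, which is the behaviour the ticket reports for `rebuild_index`; calling `invalidate` first makes the rebuild effective.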




[jira] [Updated] (CASSANDRA-12374) Can't rebuild SASI index

2016-08-18 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-12374:

   Resolution: Fixed
Fix Version/s: 3.10
   Status: Resolved  (was: Patch Available)

Committed.

> Can't rebuild SASI index
> 
>
> Key: CASSANDRA-12374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12374
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
> Fix For: 3.10
>
>
> There's been no real requirement for that so far. 
> As [~beobal] has pointed out, it's not a big issue, since that only could be 
> needed when index files are lost, data corruption on disk (hardware issue) 
> has occurred or there was a bug that'd require an index rebuild.
> During {{rebuild_index}} task, indexes are only "marked" as removed with 
> {{SecondaryIndexManager::markIndexRemoved}} and then {{buildIndexesBlocking}} 
> is called. However, since SASI keeps track of SSTables for the index, it's 
> going to filter them out with {{.filter((sstable) -> 
> !sasi.index.hasSSTable(sstable))}} in {{SASIIndexBuildingSupport}}.
> If I understand the logic correctly, we have to "invalidate" (drop data) 
> right before we re-index them. This is also a blocker for [CASSANDRA-11990] 
> since without it we can't have an upgrade path.
> I have a patch ready in branch, but since it's a bug, it's better to have it 
> released earlier and for all branches affected.
> cc [~xedin]




[jira] [Commented] (CASSANDRA-12374) Can't rebuild SASI index

2016-08-18 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427267#comment-15427267
 ] 

Pavel Yaskevich commented on CASSANDRA-12374:
-

Test looks good now, I'm going to wait for CI to finish and commit everything, 
thanks, [~ifesdjeen]!

> Can't rebuild SASI index
> 
>
> Key: CASSANDRA-12374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12374
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>
> There's been no real requirement for that so far. 
> As [~beobal] has pointed out, it's not a big issue, since that only could be 
> needed when index files are lost, data corruption on disk (hardware issue) 
> has occurred or there was a bug that'd require an index rebuild.
> During {{rebuild_index}} task, indexes are only "marked" as removed with 
> {{SecondaryIndexManager::markIndexRemoved}} and then {{buildIndexesBlocking}} 
> is called. However, since SASI keeps track of SSTables for the index, it's 
> going to filter them out with {{.filter((sstable) -> 
> !sasi.index.hasSSTable(sstable))}} in {{SASIIndexBuildingSupport}}.
> If I understand the logic correctly, we have to "invalidate" (drop data) 
> right before we re-index them. This is also a blocker for [CASSANDRA-11990] 
> since without it we can't have an upgrade path.
> I have a patch ready in branch, but since it's a bug, it's better to have it 
> released earlier and for all branches affected.
> cc [~xedin]




[jira] [Commented] (CASSANDRA-12374) Can't rebuild SASI index

2016-08-18 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427265#comment-15427265
 ] 

Pavel Yaskevich commented on CASSANDRA-12374:
-

bq.  currently unqueriable indexes / are skipped.

Yes, that was intentional, because there wasn't (and probably still isn't) a 
good way to propagate exceptions in a meaningful way, so we opted for writing 
to the log and returning 0 results instead.
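
A minimal sketch of the "log and return 0 results" approach, for illustration only. The `Index` interface and `searchOrEmpty` helper are hypothetical and use `java.util.logging` rather than Cassandra's own logging; the real SASI code path is not shown in this thread.

```java
import java.util.Collections;
import java.util.List;
import java.util.logging.Logger;

final class SkipUnqueryableSketch {
    private static final Logger logger =
            Logger.getLogger(SkipUnqueryableSketch.class.getName());

    // Hypothetical index abstraction; search may fail if the index is not queryable.
    interface Index {
        List<String> search(String term) throws Exception;
    }

    // Instead of propagating the failure to the caller, write it to the log
    // and answer with an empty result set.
    static List<String> searchOrEmpty(Index index, String term) {
        try {
            return index.search(term);
        } catch (Exception e) {
            logger.warning("Index is not queryable, returning 0 results: " + e.getMessage());
            return Collections.emptyList();
        }
    }
}
```

The trade-off is visible in the sketch: the caller cannot distinguish "no matches" from "index unavailable" without reading the log.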

> Can't rebuild SASI index
> 
>
> Key: CASSANDRA-12374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12374
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>
> There's been no real requirement for that so far. 
> As [~beobal] has pointed out, it's not a big issue, since that only could be 
> needed when index files are lost, data corruption on disk (hardware issue) 
> has occurred or there was a bug that'd require an index rebuild.
> During {{rebuild_index}} task, indexes are only "marked" as removed with 
> {{SecondaryIndexManager::markIndexRemoved}} and then {{buildIndexesBlocking}} 
> is called. However, since SASI keeps track of SSTables for the index, it's 
> going to filter them out with {{.filter((sstable) -> 
> !sasi.index.hasSSTable(sstable))}} in {{SASIIndexBuildingSupport}}.
> If I understand the logic correctly, we have to "invalidate" (drop data) 
> right before we re-index them. This is also a blocker for [CASSANDRA-11990] 
> since without it we can't have an upgrade path.
> I have a patch ready in branch, but since it's a bug, it's better to have it 
> released earlier and for all branches affected.
> cc [~xedin]




[jira] [Commented] (CASSANDRA-12374) Can't rebuild SASI index

2016-08-18 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427092#comment-15427092
 ] 

Pavel Yaskevich commented on CASSANDRA-12374:
-

Is there any way to get rid of the Thread.sleep there? New test consistently 
fails on my machine now...

> Can't rebuild SASI index
> 
>
> Key: CASSANDRA-12374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12374
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>
> There's been no real requirement for that so far. 
> As [~beobal] has pointed out, it's not a big issue, since that only could be 
> needed when index files are lost, data corruption on disk (hardware issue) 
> has occurred or there was a bug that'd require an index rebuild.
> During {{rebuild_index}} task, indexes are only "marked" as removed with 
> {{SecondaryIndexManager::markIndexRemoved}} and then {{buildIndexesBlocking}} 
> is called. However, since SASI keeps track of SSTables for the index, it's 
> going to filter them out with {{.filter((sstable) -> 
> !sasi.index.hasSSTable(sstable))}} in {{SASIIndexBuildingSupport}}.
> If I understand the logic correctly, we have to "invalidate" (drop data) 
> right before we re-index them. This is also a blocker for [CASSANDRA-11990] 
> since without it we can't have an upgrade path.
> I have a patch ready in branch, but since it's a bug, it's better to have it 
> released earlier and for all branches affected.
> cc [~xedin]




[jira] [Updated] (CASSANDRA-12378) Creating SASI index on clustering column in presence of static column breaks writes

2016-08-17 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-12378:

   Resolution: Fixed
Fix Version/s: 3.10
   Status: Resolved  (was: Patch Available)

Committed.

> Creating SASI index on clustering column in presence of static column breaks 
> writes
> ---
>
> Key: CASSANDRA-12378
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12378
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Critical
> Fix For: 3.10
>
>
> Steps to reproduce:
> {code}
> String simpleTable = "simple_table";
> QueryProcessor.executeOnceInternal(String.format("CREATE TABLE IF NOT EXISTS 
> %s.%s (pk int, ck1 int, ck2 int, s1 int static, reg1 int, PRIMARY KEY (pk, 
> ck1));", KS_NAME, simpleTable));
> QueryProcessor.executeOnceInternal(String.format("CREATE CUSTOM INDEX ON 
> %s.%s (ck1) USING 'org.apache.cassandra.index.sasi.SASIIndex';", KS_NAME, 
> simpleTable));
> QueryProcessor.executeOnceInternal(String.format("INSERT INTO %s.%s (pk, ck1, 
> ck2, s1, reg1) VALUES (1,1,1,1,1);", KS_NAME, simpleTable));
> {code}
> {code}
> ERROR [MutationStage-2] 2016-08-04 09:59:08,054 StorageProxy.java:1351 - 
> Failed to apply mutation locally : {}
> java.lang.RuntimeException: 0 for ks: test, table: sasi
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1371) 
> ~[main/:na]
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:555) 
> ~[main/:na]
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:425) 
> ~[main/:na]
> at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:215) 
> ~[main/:na]
> at org.apache.cassandra.db.Mutation.apply(Mutation.java:227) 
> ~[main/:na]
> at org.apache.cassandra.db.Mutation.apply(Mutation.java:241) 
> ~[main/:na]
> at 
> org.apache.cassandra.service.StorageProxy$8.runMayThrow(StorageProxy.java:1345)
>  ~[main/:na]
> at 
> org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2520)
>  [main/:na]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_91]
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
>  [main/:na]
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
>  [main/:na]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) 
> [main/:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.cassandra.db.AbstractBufferClusteringPrefix.get(AbstractBufferClusteringPrefix.java:55)
>  ~[main/:na]
> at 
> org.apache.cassandra.index.sasi.conf.ColumnIndex.getValueOf(ColumnIndex.java:235)
>  ~[main/:na]
> at 
> org.apache.cassandra.index.sasi.conf.ColumnIndex.index(ColumnIndex.java:104) 
> ~[main/:na]
> at 
> org.apache.cassandra.index.sasi.SASIIndex$1.insertRow(SASIIndex.java:254) 
> ~[main/:na]
> at 
> org.apache.cassandra.index.SecondaryIndexManager$WriteTimeTransaction.onInserted(SecondaryIndexManager.java:808)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.partitions.AtomicBTreePartition$RowUpdater.apply(AtomicBTreePartition.java:335)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.partitions.AtomicBTreePartition.addAllWithSizeDelta(AtomicBTreePartition.java:155)
>  ~[main/:na]
> at org.apache.cassandra.db.Memtable.put(Memtable.java:251) ~[main/:na]
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1358) 
> ~[main/:na]
> ... 12 common frames omitted
> {code}
> I would say this issue is critical, because if it occurs, the node will also 
> crash on commitlog replay (if it was restarted for an unrelated reason). 
> However, the fix is relatively simple: check for static clustering in 
> {{ColumnIndex}}. 
> cc [~xedin]
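
The proposed fix ("check for static clustering") can be sketched as follows. This is an illustration under assumptions: a clustering prefix is modelled as a plain `ByteBuffer[]`, a static row is modelled as an empty array, and `clusteringValue` is a hypothetical helper, not the actual `ColumnIndex` code.

```java
import java.nio.ByteBuffer;

final class StaticClusteringSketch {
    // Hypothetical stand-in: a clustering prefix is an array of values,
    // and a static row carries no clustering values at all.
    static ByteBuffer clusteringValue(ByteBuffer[] clustering, int position) {
        // Without this guard, clustering[0] on a static row throws
        // ArrayIndexOutOfBoundsException: 0, as in the stack trace above.
        if (clustering.length == 0) {
            return null; // assumption: null means "nothing to index for this row"
        }
        return clustering[position];
    }
}
```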




[jira] [Updated] (CASSANDRA-12223) SASI Indexes querying incorrectly return 0 rows

2016-08-17 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-12223:

   Resolution: Fixed
Fix Version/s: (was: 3.7)
   3.10
   Status: Resolved  (was: Patch Available)

Committed. Thanks, [~ifesdjeen]!

> SASI Indexes querying incorrectly return 0 rows
> ---
>
> Key: CASSANDRA-12223
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12223
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Windows, DataStax Distribution
>Reporter: Qiu Zhida
>Assignee: Alex Petrov
> Fix For: 3.10
>
>
> I just started working with the SASI index on Cassandra 3.7.0 and I 
> encountered a problem which I suspect is a bug. I had a hard time tracking 
> down the situation in which the bug shows up; here is what I found:
> When querying with a SASI index, *it may incorrectly return 0 rows*, and 
> after changing the conditions a little it works again, as in the following CQL code:
> {code:title=CQL|borderStyle=solid}
> CREATE TABLE IF NOT EXISTS roles (
> name text,
> a int,
> b int,
> PRIMARY KEY ((name, a), b)
> ) WITH CLUSTERING ORDER BY (b DESC);
> 
> insert into roles (name,a,b) values ('Joe',1,1);
> insert into roles (name,a,b) values ('Joe',2,2);
> insert into roles (name,a,b) values ('Joe',3,3);
> insert into roles (name,a,b) values ('Joe',4,4);
> CREATE TABLE IF NOT EXISTS roles2 (
> name text,
> a int,
> b int,
> PRIMARY KEY ((name, a), b)
> ) WITH CLUSTERING ORDER BY (b ASC);
> 
> insert into roles2 (name,a,b) values ('Joe',1,1);
> insert into roles2 (name,a,b) values ('Joe',2,2);
> insert into roles2 (name,a,b) values ('Joe',3,3);
> insert into roles2 (name,a,b) values ('Joe',4,4);
> CREATE CUSTOM INDEX ON roles (b) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' 
> WITH OPTIONS = { 'mode': 'SPARSE' };
> CREATE CUSTOM INDEX ON roles2 (b) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' 
> WITH OPTIONS = { 'mode': 'SPARSE' };
> {code}
> Note that table *roles2* differs from table *roles* only in changing 
> '*CLUSTERING ORDER BY (b DESC)*' into '*CLUSTERING ORDER BY (b ASC)*'.
> When querying with the statement +select * from roles2 where b<3+, the result 
> is two rows:
> {code:title=CQL|borderStyle=solid}
>  name | a | b
> --+---+---
>   Joe | 1 | 1
>   Joe | 2 | 2
> (2 rows)
> {code}
> However, when querying with +select * from roles where b<3+, it returns no 
> rows at all:
> {code:title=CQL|borderStyle=solid}
>  name | a | b
> --+---+---
> (0 rows)
> {code}
> This is not the only situation where the bug shows up. One time I 
> created a SASI index with a specific name like 'end_idx' on column 'end' and 
> the bug showed up; when I didn't specify the index name, it was gone.




[jira] [Commented] (CASSANDRA-12374) Can't rebuild SASI index

2016-08-17 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425255#comment-15425255
 ] 

Pavel Yaskevich commented on CASSANDRA-12374:
-

Looks like the tests for rebuild are still broken with the latest changes, but 
the changes look good. Let me know when you fix the test so I can commit this.

> Can't rebuild SASI index
> 
>
> Key: CASSANDRA-12374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12374
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>
> There's been no real requirement for that so far. 
> As [~beobal] has pointed out, it's not a big issue, since that only could be 
> needed when index files are lost, data corruption on disk (hardware issue) 
> has occurred or there was a bug that'd require an index rebuild.
> During {{rebuild_index}} task, indexes are only "marked" as removed with 
> {{SecondaryIndexManager::markIndexRemoved}} and then {{buildIndexesBlocking}} 
> is called. However, since SASI keeps track of SSTables for the index, it's 
> going to filter them out with {{.filter((sstable) -> 
> !sasi.index.hasSSTable(sstable))}} in {{SASIIndexBuildingSupport}}.
> If I understand the logic correctly, we have to "invalidate" (drop data) 
> right before we re-index them. This is also a blocker for [CASSANDRA-11990] 
> since without it we can't have an upgrade path.
> I have a patch ready in branch, but since it's a bug, it's better to have it 
> released earlier and for all branches affected.
> cc [~xedin]




[jira] [Commented] (CASSANDRA-11990) Address rows rather than partitions in SASI

2016-08-16 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423297#comment-15423297
 ] 

Pavel Yaskevich commented on CASSANDRA-11990:
-

[~ifesdjeen] Sure, I will try to get to that asap.

> Address rows rather than partitions in SASI
> ---
>
> Key: CASSANDRA-11990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11990
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, sasi
>Reporter: Alex Petrov
>Assignee: Alex Petrov
> Attachments: perf.pdf, size_comparison.png
>
>
> Currently, the lookup in SASI index would return the key position of the 
> partition. After the partition lookup, the rows are iterated and the 
> operators are applied in order to filter out ones that do not match.
> bq. TokenTree which accepts variable size keys (such would enable different 
> partitioners, collections support, primary key indexing etc.), 




[jira] [Commented] (CASSANDRA-11990) Address rows rather than partitions in SASI

2016-08-09 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413184#comment-15413184
 ] 

Pavel Yaskevich commented on CASSANDRA-11990:
-

Sounds good, thanks!

> Address rows rather than partitions in SASI
> ---
>
> Key: CASSANDRA-11990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11990
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, sasi
>Reporter: Alex Petrov
>Assignee: Alex Petrov
> Attachments: perf.pdf, size_comparison.png
>
>
> Currently, the lookup in SASI index would return the key position of the 
> partition. After the partition lookup, the rows are iterated and the 
> operators are applied in order to filter out ones that do not match.
> bq. TokenTree which accepts variable size keys (such would enable different 
> partitioners, collections support, primary key indexing etc.), 




[jira] [Commented] (CASSANDRA-12223) SASI Indexes querying incorrectly return 0 rows

2016-08-09 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413142#comment-15413142
 ] 

Pavel Yaskevich commented on CASSANDRA-12223:
-

Sure, I'm always happy to help.

> SASI Indexes querying incorrectly return 0 rows
> ---
>
> Key: CASSANDRA-12223
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12223
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Windows, DataStax Distribution
>Reporter: Qiu Zhida
>Assignee: Alex Petrov
> Fix For: 3.7
>
>
> I just started working with the SASI index on Cassandra 3.7.0 and I 
> encountered a problem which I suspect is a bug. I had a hard time tracking 
> down the situation in which the bug shows up; here is what I found:
> When querying with a SASI index, *it may incorrectly return 0 rows*, and 
> after changing the conditions a little it works again, as in the following CQL code:
> {code:title=CQL|borderStyle=solid}
> CREATE TABLE IF NOT EXISTS roles (
> name text,
> a int,
> b int,
> PRIMARY KEY ((name, a), b)
> ) WITH CLUSTERING ORDER BY (b DESC);
> 
> insert into roles (name,a,b) values ('Joe',1,1);
> insert into roles (name,a,b) values ('Joe',2,2);
> insert into roles (name,a,b) values ('Joe',3,3);
> insert into roles (name,a,b) values ('Joe',4,4);
> CREATE TABLE IF NOT EXISTS roles2 (
> name text,
> a int,
> b int,
> PRIMARY KEY ((name, a), b)
> ) WITH CLUSTERING ORDER BY (b ASC);
> 
> insert into roles2 (name,a,b) values ('Joe',1,1);
> insert into roles2 (name,a,b) values ('Joe',2,2);
> insert into roles2 (name,a,b) values ('Joe',3,3);
> insert into roles2 (name,a,b) values ('Joe',4,4);
> CREATE CUSTOM INDEX ON roles (b) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' 
> WITH OPTIONS = { 'mode': 'SPARSE' };
> CREATE CUSTOM INDEX ON roles2 (b) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' 
> WITH OPTIONS = { 'mode': 'SPARSE' };
> {code}
> Note that table *roles2* differs from table *roles* only in changing 
> '*CLUSTERING ORDER BY (b DESC)*' into '*CLUSTERING ORDER BY (b ASC)*'.
> When querying with the statement +select * from roles2 where b<3+, the result 
> is two rows:
> {code:title=CQL|borderStyle=solid}
>  name | a | b
> --+---+---
>   Joe | 1 | 1
>   Joe | 2 | 2
> (2 rows)
> {code}
> However, when querying with +select * from roles where b<3+, it returns no 
> rows at all:
> {code:title=CQL|borderStyle=solid}
>  name | a | b
> --+---+---
> (0 rows)
> {code}
> This is not the only situation where the bug shows up. One time I 
> created a SASI index with a specific name like 'end_idx' on column 'end' and 
> the bug showed up; when I didn't specify the index name, it was gone.




[jira] [Updated] (CASSANDRA-12223) SASI Indexes querying incorrectly return 0 rows

2016-08-09 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-12223:

Reviewer: Pavel Yaskevich

> SASI Indexes querying incorrectly return 0 rows
> ---
>
> Key: CASSANDRA-12223
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12223
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Windows, DataStax Distribution
>Reporter: Qiu Zhida
>Assignee: Alex Petrov
> Fix For: 3.7
>
>
> I just started working with the SASI index on Cassandra 3.7.0 and I 
> encountered a problem which I suspect is a bug. I had a hard time tracking 
> down the situation in which the bug shows up; here is what I found:
> When querying with a SASI index, *it may incorrectly return 0 rows*, and 
> after changing the conditions a little it works again, as in the following CQL code:
> {code:title=CQL|borderStyle=solid}
> CREATE TABLE IF NOT EXISTS roles (
> name text,
> a int,
> b int,
> PRIMARY KEY ((name, a), b)
> ) WITH CLUSTERING ORDER BY (b DESC);
> 
> insert into roles (name,a,b) values ('Joe',1,1);
> insert into roles (name,a,b) values ('Joe',2,2);
> insert into roles (name,a,b) values ('Joe',3,3);
> insert into roles (name,a,b) values ('Joe',4,4);
> CREATE TABLE IF NOT EXISTS roles2 (
> name text,
> a int,
> b int,
> PRIMARY KEY ((name, a), b)
> ) WITH CLUSTERING ORDER BY (b ASC);
> 
> insert into roles2 (name,a,b) values ('Joe',1,1);
> insert into roles2 (name,a,b) values ('Joe',2,2);
> insert into roles2 (name,a,b) values ('Joe',3,3);
> insert into roles2 (name,a,b) values ('Joe',4,4);
> CREATE CUSTOM INDEX ON roles (b) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' 
> WITH OPTIONS = { 'mode': 'SPARSE' };
> CREATE CUSTOM INDEX ON roles2 (b) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' 
> WITH OPTIONS = { 'mode': 'SPARSE' };
> {code}
> Note that table *roles2* differs from table *roles* only in changing 
> '*CLUSTERING ORDER BY (b DESC)*' into '*CLUSTERING ORDER BY (b ASC)*'.
> When querying with the statement +select * from roles2 where b<3+, the result 
> is two rows:
> {code:title=CQL|borderStyle=solid}
>  name | a | b
> --+---+---
>   Joe | 1 | 1
>   Joe | 2 | 2
> (2 rows)
> {code}
> However, when querying with +select * from roles where b<3+, it returns no 
> rows at all:
> {code:title=CQL|borderStyle=solid}
>  name | a | b
> --+---+---
> (0 rows)
> {code}
> This is not the only situation where the bug shows up. One time I 
> created a SASI index with a specific name like 'end_idx' on column 'end' and 
> the bug showed up; when I didn't specify the index name, it was gone.




[jira] [Commented] (CASSANDRA-11990) Address rows rather than partitions in SASI

2016-08-09 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413135#comment-15413135
 ] 

Pavel Yaskevich commented on CASSANDRA-11990:
-

bq. Sure, this was already done: CASSANDRA-12374 and CASSANDRA-12378.

The branch you have in the previous comment still has all of the changes for 
dropping/rebuilding of the indexes.

bq. As far as I understood, SASI relies heavily on the fact that the tokens are 
fixed size...

Yes, and that's exactly my point: let's keep it that way, relying on the index 
having *fixed-size long tokens*. 
_If_ we were to keep all of the changes done to the partitioner interface and 
TokenTreeSerializationHelper, it would mean maintenance cost for no long-term 
benefit, because once we go to full-featured variable-size tokens all of the 
changes done to the partitioner interface are no longer going to be required 
(they will have to be removed). At that point the index shouldn't care how 
exactly tokens are serialized; from its point of view a token is just going to 
be a comparable blob of bytes, and we can use existing token serialization and 
partitioner logic to deal with that. 
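
To make the contrast concrete, here is a rough sketch of the two comparison styles discussed above. The class and method names are hypothetical, and the unsigned lexicographic byte ordering is an assumption; a real variable-size token format would need an order-preserving encoding for this comparison to agree with the partitioner's ordering.

```java
final class TokenCompareSketch {
    // Fixed-size token: a signed 64-bit comparison, as with long tokens today.
    static int compareLong(long a, long b) {
        return Long.compare(a, b);
    }

    // Variable-size token seen as an opaque blob: unsigned lexicographic
    // comparison, with a shorter prefix sorting before a longer value.
    static int compareBytes(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

With the blob form the index only needs a comparator and does not need to know the token width, which is the property the comment argues for.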

bq. We can improve the situation with current abstraction by relying on the 
fact that Partitioner is a singleton in DatabaseDescriptor so higher level 
abstractions will never even see it.

We shouldn't make such assumptions, because everything is already polluted by 
that as is. A better option would be to get it from the keyspace which the 
indexes are attached to and keep that info somewhere on the top level, e.g. 
SASIIndex, plus push down a specialized comparator implementation like what we 
do with KeyFetcher right now...

bq. Rolling back everything will be a large chunk of work both now in order to 
undo it (in combination with all
test changes and original PR changes) and re-introducing it shortly when we do 
RP (or any other partitioner)
support. 

Now imagine how much more work it would be to undo that in the future. I'm 
not comfortable committing code just because it's hard to take out of the 
patch. RP is as good as dead currently and there are no other viable 
partitioners which are going to benefit from the changes, so it's much easier 
to stick with long tokens and get to variable-size in one hop instead of three.

> Address rows rather than partitions in SASI
> ---
>
> Key: CASSANDRA-11990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11990
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, sasi
>Reporter: Alex Petrov
>Assignee: Alex Petrov
> Attachments: perf.pdf, size_comparison.png
>
>
> Currently, the lookup in SASI index would return the key position of the 
> partition. After the partition lookup, the rows are iterated and the 
> operators are applied in order to filter out ones that do not match.
> bq. TokenTree which accepts variable size keys (such would enable different 
> partitioners, collections support, primary key indexing etc.), 




[jira] [Commented] (CASSANDRA-11990) Address rows rather than partitions in SASI

2016-08-08 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412775#comment-15412775
 ] 

Pavel Yaskevich commented on CASSANDRA-11990:
-

I looked at the code again and I think we might be better off keeping only 
Murmur3Partitioner (tokens as longs) for now instead of changing so much just 
to support RP...
That would also make the update path much easier for us: a. add clustering 
positions, b. extend token sizes to variable size, instead of having an 
intermediate step of multi-size fixed tokens...

*Code remarks:*

- 
https://github.com/apache/cassandra/compare/trunk...ifesdjeen:11990-trunk#diff-2143dfad950ab134c0a25bd903c2875aR143
   what is the point of logger.info if it's still wrapped in 
logger.isDebugEnabled()?
- Can you move drop-data and rebuild related changes to the separate branch so 
we can keep changes to token tree and bug fixes separate since they are 
separate tickets?
 
https://github.com/apache/cassandra/compare/trunk...ifesdjeen:11990-trunk#diff-afe204f22033543e3ae0185240523a9bR113
 
   I'm not sure what this `// ignore` is about?...
- SasiToken -> SASIToken since SASI is an abbreviation already.
- Back to the point about not supporting RP: the way 
TokenTreeSerializationHelper is currently done is not optimal since it has to 
be explicitly carried around,
   which means that it's most likely not a proper abstraction to have, e.g. 
higher levels like QueryPlan/Operation/Expression/ColumnIndex should never care 
about any details
   about TokenTree, especially about how its tokens are serialized; we should 
either use the IPartitioner interface or nothing at all.
- Upgrade test is missing for TokenTree.

> Address rows rather than partitions in SASI
> ---
>
> Key: CASSANDRA-11990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11990
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, sasi
>Reporter: Alex Petrov
>Assignee: Alex Petrov
> Attachments: perf.pdf, size_comparison.png
>
>
> Currently, the lookup in SASI index would return the key position of the 
> partition. After the partition lookup, the rows are iterated and the 
> operators are applied in order to filter out ones that do not match.
> bq. TokenTree which accepts variable size keys (such would enable different 
> partitioners, collections support, primary key indexing etc.), 




[jira] [Comment Edited] (CASSANDRA-11990) Address rows rather than partitions in SASI

2016-08-08 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412775#comment-15412775
 ] 

Pavel Yaskevich edited comment on CASSANDRA-11990 at 8/9/16 1:12 AM:
-

I looked at the code again and I think we might be better off keeping only 
Murmur3Partitioner (tokens as longs) for now instead of changing so much just 
to support RP...
That would also make the update path much easier for us: a. add clustering 
positions, b. extend token sizes to variable size, instead of having an 
intermediate step of multi-size fixed tokens...

*Code remarks:*

- 
https://github.com/apache/cassandra/compare/trunk...ifesdjeen:11990-trunk#diff-2143dfad950ab134c0a25bd903c2875aR143
   what is the point of logger.info if it's still wrapped in 
logger.isDebugEnabled()?
- Can you move drop-data and rebuild related changes to the separate branch so 
we can keep changes to token tree and bug fixes separate since they are 
separate tickets?
- 
https://github.com/apache/cassandra/compare/trunk...ifesdjeen:11990-trunk#diff-afe204f22033543e3ae0185240523a9bR113
 
   I'm not sure what this `// ignore` is about?...
- SasiToken -> SASIToken since SASI is an abbreviation already.
- Back to the point about not supporting RP: the way 
TokenTreeSerializationHelper is currently done is not optimal since it has to 
be explicitly carried around,
   which means that it's most likely not a proper abstraction to have, e.g. 
higher levels like QueryPlan/Operation/Expression/ColumnIndex should never care 
about any details
   about TokenTree, especially about how its tokens are serialized; we should 
either use the IPartitioner interface or nothing at all.
- Upgrade test is missing for TokenTree.


> Address rows rather than partitions in SASI
> ---
>
> Key: CASSANDRA-11990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11990
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, sasi
>Reporter: Alex Petrov
>Assignee: Alex Petrov
> Attachments: perf.pdf, size_comparison.png
>
>
> Currently, the lookup in SASI index would return the key position of the 
> partition. After the partition lookup, the rows are iterated and the 
> operators are applied in order to filter out ones that do not match.
> bq. TokenTree which accepts variable size keys (such would enable different 
> partitioners, collections support, primary key indexing etc.), 




[jira] [Commented] (CASSANDRA-12378) Creating SASI index on clustering column in presence of static column breaks writes

2016-08-04 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15407446#comment-15407446
 ] 

Pavel Yaskevich commented on CASSANDRA-12378:
-

Sounds great, [~ifesdjeen]!

> Creating SASI index on clustering column in presence of static column breaks 
> writes
> ---
>
> Key: CASSANDRA-12378
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12378
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Alex Petrov
>Priority: Critical
>
> Steps to reproduce:
> {code}
> String simpleTable = "simple_table";
> QueryProcessor.executeOnceInternal(String.format(
>     "CREATE TABLE IF NOT EXISTS %s.%s (pk int, ck1 int, ck2 int, s1 int static, reg1 int, PRIMARY KEY (pk, ck1));",
>     KS_NAME, simpleTable));
> QueryProcessor.executeOnceInternal(String.format(
>     "CREATE CUSTOM INDEX ON %s.%s (ck1) USING 'org.apache.cassandra.index.sasi.SASIIndex';",
>     KS_NAME, simpleTable));
> QueryProcessor.executeOnceInternal(String.format(
>     "INSERT INTO %s.%s (pk, ck1, ck2, s1, reg1) VALUES (1,1,1,1,1);",
>     KS_NAME, simpleTable));
> {code}
> {code}
> ERROR [MutationStage-2] 2016-08-04 09:59:08,054 StorageProxy.java:1351 - Failed to apply mutation locally : {}
> java.lang.RuntimeException: 0 for ks: test, table: sasi
>     at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1371) ~[main/:na]
>     at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:555) ~[main/:na]
>     at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:425) ~[main/:na]
>     at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:215) ~[main/:na]
>     at org.apache.cassandra.db.Mutation.apply(Mutation.java:227) ~[main/:na]
>     at org.apache.cassandra.db.Mutation.apply(Mutation.java:241) ~[main/:na]
>     at org.apache.cassandra.service.StorageProxy$8.runMayThrow(StorageProxy.java:1345) ~[main/:na]
>     at org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2520) [main/:na]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) [main/:na]
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) [main/:na]
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [main/:na]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
>     at org.apache.cassandra.db.AbstractBufferClusteringPrefix.get(AbstractBufferClusteringPrefix.java:55) ~[main/:na]
>     at org.apache.cassandra.index.sasi.conf.ColumnIndex.getValueOf(ColumnIndex.java:235) ~[main/:na]
>     at org.apache.cassandra.index.sasi.conf.ColumnIndex.index(ColumnIndex.java:104) ~[main/:na]
>     at org.apache.cassandra.index.sasi.SASIIndex$1.insertRow(SASIIndex.java:254) ~[main/:na]
>     at org.apache.cassandra.index.SecondaryIndexManager$WriteTimeTransaction.onInserted(SecondaryIndexManager.java:808) ~[main/:na]
>     at org.apache.cassandra.db.partitions.AtomicBTreePartition$RowUpdater.apply(AtomicBTreePartition.java:335) ~[main/:na]
>     at org.apache.cassandra.db.partitions.AtomicBTreePartition.addAllWithSizeDelta(AtomicBTreePartition.java:155) ~[main/:na]
>     at org.apache.cassandra.db.Memtable.put(Memtable.java:251) ~[main/:na]
>     at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1358) ~[main/:na]
>     ... 12 common frames omitted
> {code}
> I would say this issue is critical: if it occurs, the node will also crash on 
> commitlog replay (if it was restarted for an unrelated reason). 
> However, the fix is relatively simple: check for static clustering in 
> {{ColumnIndex}}. 
> cc [~xedin]
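
The guard suggested in the description (check for static clustering before reading a clustering value) could look roughly like the sketch below. It is only an illustration: {{Clustering}} and {{clusteringValue}} here are hypothetical stand-ins written for this example, not the classes in org.apache.cassandra.db and not the committed patch. The assumption is that a static row carries an empty clustering, so reading position 0 from it is what raises ArrayIndexOutOfBoundsException: 0.

```java
import java.nio.ByteBuffer;

final class StaticClusteringSketch
{
    // Hypothetical stand-in for a clustering prefix; a static row has no clustering values.
    static final class Clustering
    {
        final ByteBuffer[] values;

        Clustering(ByteBuffer... values)
        {
            this.values = values;
        }

        boolean isStatic()
        {
            return values.length == 0;
        }

        ByteBuffer get(int position)
        {
            return values[position]; // throws ArrayIndexOutOfBoundsException on a static row
        }
    }

    // Returns the clustering value to index, or null when the row is static
    // and therefore has nothing to index for a clustering column.
    static ByteBuffer clusteringValue(Clustering clustering, int position)
    {
        if (clustering.isStatic())
            return null;

        return clustering.get(position);
    }
}
```

With a guard of this kind, a write that touches only the static column would skip the clustering-column index instead of failing the whole mutation.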




[jira] [Commented] (CASSANDRA-12378) Creating SASI index on clustering column in presence of static column breaks writes

2016-08-04 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15407415#comment-15407415
 ] 

Pavel Yaskevich commented on CASSANDRA-12378:
-

I will be more than happy to review this for you or anybody who would like to 
tackle it. 





[jira] [Commented] (CASSANDRA-12374) Can't rebuild SASI index

2016-08-03 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406928#comment-15406928
 ] 

Pavel Yaskevich commented on CASSANDRA-12374:
-

This is definitely a porting bug; I'll be happy to review your changes, 
[~ifesdjeen].

> Can't rebuild SASI index
> 
>
> Key: CASSANDRA-12374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12374
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>
> There's been no real requirement for that so far. 
> As [~beobal] has pointed out, it's not a big issue, since that only could be 
> needed when index files are lost, data corruption on disk (hardware issue) 
> has occurred or there was a bug that'd require an index rebuild.
> During {{rebuild_index}} task, indexes are only "marked" as removed with 
> {{SecondaryIndexManager::markIndexRemoved}} and then {{buildIndexesBlocking}} 
> is called. However, since SASI keeps track of SSTables for the index, it's 
> going to filter them out with {{.filter((sstable) -> 
> !sasi.index.hasSSTable(sstable))}} in {{SASIIndexBuildingSupport}}.
> If I understand the logic correctly, we have to "invalidate" (drop data) 
> right before we re-index them. This is also a blocker for [CASSANDRA-11990] 
> since without it we can't have an upgrade path.
> I have a patch ready in branch, but since it's a bug, it's better to have it 
> released earlier and for all branches affected.
> cc [~xedin]
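
To make the described behaviour concrete, here is a small sketch of why the rebuild ends up doing nothing unless the index data is dropped first. All names ({{RebuildFilterSketch}}, {{markIndexed}}, {{dropData}}, {{sstablesToBuild}}) are hypothetical and only model the filter quoted above; this is not the SASIIndexBuildingSupport code.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class RebuildFilterSketch
{
    // SSTables the index believes it already covers (models sasi.index.hasSSTable).
    private final Set<String> indexed = new HashSet<>();

    void markIndexed(String sstable)
    {
        indexed.add(sstable);
    }

    boolean hasSSTable(String sstable)
    {
        return indexed.contains(sstable);
    }

    // Hypothetical "invalidate" step: forget the index data for the given SSTables.
    void dropData(Collection<String> sstables)
    {
        indexed.removeAll(sstables);
    }

    // Same shape as the filter in the description: skip everything already indexed.
    List<String> sstablesToBuild(Collection<String> sstables)
    {
        return sstables.stream()
                       .filter(sstable -> !hasSSTable(sstable))
                       .collect(Collectors.toList());
    }
}
```

If every live SSTable is already tracked, sstablesToBuild returns an empty list and the rebuild is a no-op; calling dropData on the same SSTables first makes them eligible again.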




[jira] [Commented] (CASSANDRA-9772) Bound the number of concurrent range requests

2016-07-22 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389870#comment-15389870
 ] 

Pavel Yaskevich commented on CASSANDRA-9772:


More concurrency is actually beneficial for SASI queries, and reducing it is 
also not a problem :)

> Bound the number of concurrent range requests
> -
>
> Key: CASSANDRA-9772
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9772
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tyler Hobbs
> Fix For: 3.x
>
>
> After CASSANDRA-1337, we will execute requests for many token ranges 
> concurrently based on our estimate of how many ranges will be required to 
> meet the requested LIMIT.  For queries with a lot of results this is 
> generally fine, because it will only take a few ranges to satisfy the limit.  
> However, for queries with very few results, this may result in the 
> coordinator concurrently requesting all token ranges.  On large vnode 
> clusters, this will be particularly problematic.
> Placing a simple bound on the number of concurrent requests is a good first 
> step.  Long-term, we should look into creating a new range command that 
> supports requesting multiple ranges.  This would eliminate the overhead of 
> serializing and handling hundreds of separate commands.
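
A simple bound of the kind proposed in the description could be sketched as follows. This is an assumption-laden illustration, not the StorageProxy logic: the estimate formula and the names {{concurrencyFactor}}, {{estimatedRowsPerRange}} and {{maxConcurrent}} are made up for this example.

```java
final class RangeConcurrencySketch
{
    // Estimates how many token ranges to query at once to satisfy the LIMIT,
    // then caps that estimate with a fixed bound (the hypothetical maxConcurrent).
    static int concurrencyFactor(int limit, double estimatedRowsPerRange, int totalRanges, int maxConcurrent)
    {
        int needed;
        if (estimatedRowsPerRange <= 0)
            needed = totalRanges; // no usable estimate: every range may be required
        else
            needed = (int) Math.ceil(limit / estimatedRowsPerRange);

        // never ask for more ranges than exist, and always for at least one
        needed = Math.max(1, Math.min(needed, totalRanges));

        // the bound itself: very sparse results can no longer fan out to all ranges
        return Math.min(needed, maxConcurrent);
    }
}
```

For a dense result set the estimate stays small and the cap never applies; for a query with very few results on a large vnode cluster the estimate would reach the total number of ranges, and the cap is what limits the fan-out.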




[jira] [Commented] (CASSANDRA-12149) NullPointerException on SELECT with SASI index

2016-07-08 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368137#comment-15368137
 ] 

Pavel Yaskevich commented on CASSANDRA-12149:
-

[~doanduyhai] are you still planning to look at this or do you want me to?

> NullPointerException on SELECT with SASI index
> --
>
> Key: CASSANDRA-12149
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12149
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Andrey Konstantinov
> Attachments: CASSANDRA-12149.txt
>
>
> If I execute the sequence of queries (see the attached file), Cassandra 
> aborts a connection reporting NPE on server side. SELECT query without token 
> range filter works, but does not work when token range filter is specified. 
> My intent was to issue multiple SELECT queries targeting the same single 
> partition, filtered by a column indexed by SASI, partitioning results by 
> different token ranges.
> Output from cqlsh on SELECT is the following:
> cqlsh> SELECT namespace, entity, timestamp, feature1, feature2 FROM 
> mykeyspace.myrecordtable WHERE namespace = 'ns2' AND entity = 'entity2' AND 
> feature1 > 11 AND feature1 < 31  AND token(namespace, entity) <= 
> 9223372036854775807;
> ServerError:  message="java.lang.NullPointerException">




[jira] [Updated] (CASSANDRA-12073) [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial results

2016-07-04 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-12073:

   Resolution: Fixed
Fix Version/s: (was: 3.x)
   3.9
   Status: Resolved  (was: Patch Available)

Committed. Thanks, [~doanduyhai]!

> [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial 
> results
> ---
>
> Key: CASSANDRA-12073
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12073
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Cassandra 3.7
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Fix For: 3.9
>
> Attachments: patch_PREFIX_search_with_CONTAINS_mode.txt, 
> patch_PREFIX_search_with_CONTAINS_mode_V2.txt
>
>
> {noformat}
> cqlsh:music> CREATE TABLE music.albums (
>     id uuid PRIMARY KEY,
>     artist text,
>     country text,
>     quality text,
>     status text,
>     title text,
>     year int
> );
> cqlsh:music> CREATE CUSTOM INDEX albums_artist_idx ON music.albums (artist) 
> USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 
> 'CONTAINS', 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
> 'case_sensitive': 'false'};
> cqlsh:music> SELECT * FROM albums WHERE artist like 'lady%'  LIMIT 100;
>  id                                   | artist    | country        | quality | status    | title                     | year
> --------------------------------------+-----------+----------------+---------+-----------+---------------------------+------
>  372bb0ab-3263-41bc-baad-bb520ddfa787 | Lady Gaga |            USA |  normal |  Official |           Red and Blue EP | 2006
>  1a4abbcd-b5de-4c69-a578-31231e01ff09 | Lady Gaga |        Unknown |  normal | Promotion |                Poker Face | 2008
>  31f4a0dc-9efc-48bf-9f5e-bfc09af42b82 | Lady Gaga |            USA |  normal |  Official |   The Cherrytree Sessions | 2009
>  8ebfaebd-28d0-477d-b735-469661ce6873 | Lady Gaga |        Unknown |  normal |  Official |                Poker Face | 2009
>  98107d82-e0dd-46bc-a273-1577578984c7 | Lady Gaga |            USA |  normal |  Official |   Just Dance: The Remixes | 2008
>  a76af0f2-f5c5-4306-974a-e3c17158e6c6 | Lady Gaga |          Italy |  normal |  Official |                  The Fame | 2008
>  849ee019-8b15-4767-8660-537ab9710459 | Lady Gaga |            USA |  normal |  Official |            Christmas Tree | 2008
>  4bad59ac-913f-43da-9d48-89adc65453d2 | Lady Gaga |      Australia |  normal |  Official |                     Eh Eh | 2009
>  80327731-c450-457f-bc12-0a8c21fd9c5d | Lady Gaga |            USA |  normal |  Official | Just Dance Remixes Part 2 | 2008
>  3ad33659-e932-4d31-a040-acab0e23c3d4 | Lady Gaga |        Unknown |  normal |      null |                Just Dance | 2008
>  9adce7f6-6a1d-49fd-b8bd-8f6fac73558b | Lady Gaga | United Kingdom |  normal |  Official |                Just Dance | 2009
> (11 rows)
> {noformat}
> *SASI* says that there are only 11 artists whose name starts with {{lady}}.
> However, in the data set, there are:
> * Lady Pank
> * Lady Saw
> * Lady Saw
> * Ladyhawke
> * Ladytron
> * Ladysmith Black Mambazo
> * Lady Gaga
> * Lady Sovereign
> etc ...
> By debugging the source code, the issue is in 
> {{OnDiskIndex.TermIterator::computeNext()}}
> {code:java}
> for (;;)
> {
>     if (currentBlock == null)
>         return endOfData();
>     if (offset >= 0 && offset < currentBlock.termCount())
>     {
>         DataTerm currentTerm = currentBlock.getTerm(nextOffset());
>         if (checkLower && !e.isLowerSatisfiedBy(currentTerm))
>             continue;
>         // flip the flag right on the first bounds match
>         // to avoid expensive comparisons
>         checkLower = false;
>         if (checkUpper && !e.isUpperSatisfiedBy(currentTerm))
>             return endOfData();
>         return currentTerm;
>     }
>     nextBlock();
> }
> {code}
>  So the {{endOfData()}} conditions are:
> * currentBlock == null
> * checkUpper && !e.isUpperSatisfiedBy(currentTerm)
> The problem is that {{e::isUpperSatisfiedBy}} checks not only whether the 
> term matches but also returns *false* when it's a *partial term*!
> {code:java}
> public boolean isUpperSatisfiedBy(OnDiskIndex.DataTerm term)
> {
>     if (!hasUpper())
>         return true;
>     if (nonMatchingPartial(term))
>         return false;
>     int cmp = term.compareTo(validator, upper.value, false);
>

[jira] [Commented] (CASSANDRA-12073) [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial results

2016-06-27 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352073#comment-15352073
 ] 

Pavel Yaskevich commented on CASSANDRA-12073:
-

[~doanduyhai] The changes look good to me: in the case of prefix queries we 
should "step over" partial terms (they are essentially invisible to the 
expression in PREFIX mode) until the upper bound tells us to stop, instead of 
trying to stop right when a partial term is encountered. 

A couple of comments regarding the code: since partialTermAndPrefixOperation is 
only used in one place, I would rather have it as an inline if with a comment 
on top, e.g. 
{noformat}
// we need to step over all of the partial terms, in PREFIX mode,
// encountered by the query until upper-bound tells us to stop
if (operation == Op.PREFIX && term.isPartial()) 
   continue;

// haven't reached the start of the query range yet, let's 
// keep skipping the current term until lower bound is satisfied
if (checkLower && !e.isLowerSatisfiedBy(currentTerm))
   continue;
{noformat}

Also, I think it is essential to have a test case in OnDiskIndexTest, since it 
gives you more control over index blocks and internals than SASIIndexTest.
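
As a rough illustration of the "step over" behaviour described in this comment, the toy scan below walks sorted terms and skips partial ones rather than stopping at them. It is a sketch under stated assumptions: {{Term}} is a hypothetical stand-in for OnDiskIndex.DataTerm, the bounds are simplified to plain string comparisons, and none of this is the actual TermIterator code.

```java
import java.util.ArrayList;
import java.util.List;

final class PrefixScanSketch
{
    // Hypothetical stand-in for an on-disk term: its value plus a flag saying
    // whether it is a partial (suffix) entry produced by CONTAINS mode.
    static final class Term
    {
        final String value;
        final boolean partial;

        Term(String value, boolean partial)
        {
            this.value = value;
            this.partial = partial;
        }
    }

    // Walks terms in sorted order and collects the full terms that start with the prefix.
    static List<String> prefixScan(List<Term> sortedTerms, String prefix)
    {
        List<String> matches = new ArrayList<>();
        boolean checkLower = true;

        for (Term term : sortedTerms)
        {
            // partial terms are invisible to a PREFIX query: step over them
            if (term.partial)
                continue;

            // haven't reached the start of the query range yet
            if (checkLower && term.value.compareTo(prefix) < 0)
                continue;
            checkLower = false;

            // first full term past the range: the upper bound tells us to stop
            if (!term.value.startsWith(prefix))
                break;

            matches.add(term.value);
        }
        return matches;
    }
}
```

Stopping at the first partial term instead of skipping it would cut such a scan short as soon as a partial entry sorts into the middle of the range, which matches the symptom in the ticket of only some of the expected artists being returned.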

> [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial 
> results
> ---
>
> Key: CASSANDRA-12073
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12073
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Cassandra 3.7
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Fix For: 3.x
>
> Attachments: patch_PREFIX_search_with_CONTAINS_mode.txt
>
>

[jira] [Updated] (CASSANDRA-12078) [SASI] Move skip_stop_words filter BEFORE stemming

2016-06-26 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-12078:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed, thanks [~doanduyhai]! I've removed the already committed part from 
the patch and included only the change for {{StemmingFilters}} and the tests.

> [SASI] Move skip_stop_words filter BEFORE stemming
> --
>
> Key: CASSANDRA-12078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12078
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.7, Cassandra 3.8
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Fix For: 3.8
>
> Attachments: patch.txt, patch_V2.txt
>
>
> Right now, if skip stop words and stemming are enabled, SASI will put 
> stemming in the filter pipeline BEFORE skip_stop_words:
> {code:java}
> private FilterPipelineTask getFilterPipeline()
> {
>     FilterPipelineBuilder builder = new FilterPipelineBuilder(new BasicResultFilters.NoOperation());
>     ...
>     if (options.shouldStemTerms())
>         builder = builder.add("term_stemming", new StemmingFilters.DefaultStemmingFilter(options.getLocale()));
>     if (options.shouldIgnoreStopTerms())
>         builder = builder.add("skip_stop_words", new StopWordFilters.DefaultStopWordFilter(options.getLocale()));
>     return builder.build();
> }
> {code}
> The problem is that stemming before removing stop words can yield wrong 
> results.
> I have an example:
> {code:sql}
> SELECT * FROM music.albums WHERE country='France' AND title LIKE 'danse' 
> ALLOW FILTERING;
> {code}
> Because of stemming *danse* ( *dance* in English) becomes *dans* (the final 
> vowel is removed). Then skip stop words is applied. Unfortunately *dans* 
> (*in* in English) is a stop word in French so it is removed completely.
> In the end the query is equivalent to {{SELECT * FROM music.albums WHERE 
> country='France'}} and of course the results are wrong.
> Attached is a trivial patch to move the skip_stop_words filter BEFORE 
> stemming filter
> /cc [~xedin] [~jrwest] [~beobal]
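
The ordering problem from the description can be reproduced with a toy pipeline. Everything in this sketch is an assumption made for illustration: the stemmer just drops a trailing "e" (a crude stand-in for the Snowball stemmer), the stop-word list is a three-word sample, and the class and method names are hypothetical rather than SASI's filter classes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class FilterOrderSketch
{
    // Tiny sample of French stop words; the real list is much larger.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("dans", "le", "la"));

    // Toy stemmer: drops one trailing 'e', so "danse" becomes "dans".
    static String stem(String term)
    {
        return term.endsWith("e") ? term.substring(0, term.length() - 1) : term;
    }

    // Returns null when the term is a stop word, i.e. the term is dropped.
    static String skipStopWords(String term)
    {
        return STOP_WORDS.contains(term) ? null : term;
    }

    // Order reported in the ticket: stem first, then drop stop words.
    static String stemThenSkip(String term)
    {
        return skipStopWords(stem(term));
    }

    // Order proposed by the patch: drop stop words first, then stem what is left.
    static String skipThenStem(String term)
    {
        String kept = skipStopWords(term);
        return kept == null ? null : stem(kept);
    }
}
```

Under these assumptions stemThenSkip("danse") drops the term entirely (the stem "dans" is a stop word), while skipThenStem("danse") keeps it as "dans", which is the behaviour the patch is after.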




[jira] [Commented] (CASSANDRA-12078) [SASI] Move skip_stop_words filter BEFORE stemming

2016-06-24 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348780#comment-15348780
 ] 

Pavel Yaskevich commented on CASSANDRA-12078:
-

I'm sorry guys, I had it [run through 
CI|http://cassci.datastax.com/view/Dev/view/xedin/job/xedin-CASSANDRA-12078-testall/lastCompletedBuild/testReport/]
 but somehow it didn't show me the problem with the standard analyzer. 

Actually, after thinking about this further, I think stop words should be 
specified as a list in the language used by the field, so maybe the problem 
here is not the ordering of the stop-word filter but rather which locale has 
been set? 

> [SASI] Move skip_stop_words filter BEFORE stemming
> --
>
> Key: CASSANDRA-12078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12078
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.7, Cassandra 3.8
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Fix For: 3.8
>
> Attachments: patch.txt
>
>




[jira] [Updated] (CASSANDRA-12078) [SASI] Move skip_stop_words filter BEFORE stemming

2016-06-23 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-12078:

   Resolution: Fixed
Fix Version/s: (was: 3.7)
   3.8
   Status: Resolved  (was: Patch Available)

Committed.

> [SASI] Move skip_stop_words filter BEFORE stemming
> --
>
> Key: CASSANDRA-12078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12078
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.7, Cassandra 3.8
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Fix For: 3.8
>
> Attachments: patch.txt
>
>




[jira] [Updated] (CASSANDRA-12078) [SASI] Move skip_stop_words filter BEFORE stemming

2016-06-23 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-12078:

Issue Type: Bug  (was: Improvement)

> [SASI] Move skip_stop_words filter BEFORE stemming
> --
>
> Key: CASSANDRA-12078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12078
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.7, Cassandra 3.8
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Fix For: 3.7
>
> Attachments: patch.txt
>
>




[jira] [Updated] (CASSANDRA-12078) [SASI] Move skip_stop_words filter BEFORE stemming

2016-06-23 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-12078:

Component/s: (was: CQL)
 sasi

> [SASI] Move skip_stop_words filter BEFORE stemming
> --
>
> Key: CASSANDRA-12078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12078
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
> Environment: Cassandra 3.7, Cassandra 3.8
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Fix For: 3.7
>
> Attachments: patch.txt
>
>
> Right now, if skip stop words and stemming are enabled, SASI will put 
> stemming in the filter pipeline BEFORE skip_stop_words:
> {code:java}
> private FilterPipelineTask getFilterPipeline()
> {
>     FilterPipelineBuilder builder = new FilterPipelineBuilder(new BasicResultFilters.NoOperation());
>     ...
>     if (options.shouldStemTerms())
>         builder = builder.add("term_stemming", new StemmingFilters.DefaultStemmingFilter(options.getLocale()));
>     if (options.shouldIgnoreStopTerms())
>         builder = builder.add("skip_stop_words", new StopWordFilters.DefaultStopWordFilter(options.getLocale()));
>     return builder.build();
> }
> {code}
> The problem is that stemming before removing stop words can yield wrong 
> results.
> I have an example:
> {code:sql}
> SELECT * FROM music.albums WHERE country='France' AND title LIKE 'danse' 
> ALLOW FILTERING;
> {code}
> Because of stemming *danse* ( *dance* in English) becomes *dans* (the final 
> vowel is removed). Then skip stop words is applied. Unfortunately *dans* 
> (*in* in English) is a stop word in French so it is removed completely.
> In the end the query is equivalent to {{SELECT * FROM music.albums WHERE 
> country='France'}} and of course the results are wrong.
> Attached is a trivial patch to move the skip_stop_words filter BEFORE 
> stemming filter
> /cc [~xedin] [~jrwest] [~beobal]




[jira] [Updated] (CASSANDRA-12078) [SASI] Move skip_stop_words filter BEFORE stemming

2016-06-23 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-12078:

 Reviewer: Pavel Yaskevich
Fix Version/s: 3.7

> [SASI] Move skip_stop_words filter BEFORE stemming
> --
>
> Key: CASSANDRA-12078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12078
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
> Environment: Cassandra 3.7, Cassandra 3.8
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Fix For: 3.7
>
> Attachments: patch.txt
>
>
> Right now, if skip stop words and stemming are enabled, SASI will put 
> stemming in the filter pipeline BEFORE skip_stop_words:
> {code:java}
> private FilterPipelineTask getFilterPipeline()
> {
>     FilterPipelineBuilder builder = new FilterPipelineBuilder(new BasicResultFilters.NoOperation());
>     ...
>     if (options.shouldStemTerms())
>         builder = builder.add("term_stemming", new StemmingFilters.DefaultStemmingFilter(options.getLocale()));
>     if (options.shouldIgnoreStopTerms())
>         builder = builder.add("skip_stop_words", new StopWordFilters.DefaultStopWordFilter(options.getLocale()));
>     return builder.build();
> }
> {code}
> The problem is that stemming before removing stop words can yield wrong 
> results.
> I have an example:
> {code:sql}
> SELECT * FROM music.albums WHERE country='France' AND title LIKE 'danse' 
> ALLOW FILTERING;
> {code}
> Because of stemming *danse* ( *dance* in English) becomes *dans* (the final 
> vowel is removed). Then skip stop words is applied. Unfortunately *dans* 
> (*in* in English) is a stop word in French so it is removed completely.
> In the end the query is equivalent to {{SELECT * FROM music.albums WHERE 
> country='France'}} and of course the results are wrong.
> Attached is a trivial patch to move the skip_stop_words filter BEFORE 
> stemming filter
> /cc [~xedin] [~jrwest] [~beobal]




[jira] [Updated] (CASSANDRA-12073) [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial results

2016-06-23 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-12073:

Assignee: DOAN DuyHai

> [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial 
> results
> ---
>
> Key: CASSANDRA-12073
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12073
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Cassandra 3.7
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Fix For: 3.7
>
>
> {noformat}
> cqlsh:music> CREATE TABLE music.albums (
> id uuid PRIMARY KEY,
> artist text,
> country text,
> quality text,
> status text,
> title text,
> year int
> );
> cqlsh:music> CREATE CUSTOM INDEX albums_artist_idx ON music.albums (artist) 
> USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 
> 'CONTAINS', 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
> 'case_sensitive': 'false'};
> cqlsh:music> SELECT * FROM albums WHERE artist like 'lady%'  LIMIT 100;
>  id                                   | artist    | country        | quality | status    | title                     | year
> --------------------------------------+-----------+----------------+---------+-----------+---------------------------+------
>  372bb0ab-3263-41bc-baad-bb520ddfa787 | Lady Gaga |            USA |  normal |  Official |           Red and Blue EP | 2006
>  1a4abbcd-b5de-4c69-a578-31231e01ff09 | Lady Gaga |        Unknown |  normal | Promotion |                Poker Face | 2008
>  31f4a0dc-9efc-48bf-9f5e-bfc09af42b82 | Lady Gaga |            USA |  normal |  Official |   The Cherrytree Sessions | 2009
>  8ebfaebd-28d0-477d-b735-469661ce6873 | Lady Gaga |        Unknown |  normal |  Official |                Poker Face | 2009
>  98107d82-e0dd-46bc-a273-1577578984c7 | Lady Gaga |            USA |  normal |  Official |   Just Dance: The Remixes | 2008
>  a76af0f2-f5c5-4306-974a-e3c17158e6c6 | Lady Gaga |          Italy |  normal |  Official |                  The Fame | 2008
>  849ee019-8b15-4767-8660-537ab9710459 | Lady Gaga |            USA |  normal |  Official |            Christmas Tree | 2008
>  4bad59ac-913f-43da-9d48-89adc65453d2 | Lady Gaga |      Australia |  normal |  Official |                     Eh Eh | 2009
>  80327731-c450-457f-bc12-0a8c21fd9c5d | Lady Gaga |            USA |  normal |  Official | Just Dance Remixes Part 2 | 2008
>  3ad33659-e932-4d31-a040-acab0e23c3d4 | Lady Gaga |        Unknown |  normal |      null |                Just Dance | 2008
>  9adce7f6-6a1d-49fd-b8bd-8f6fac73558b | Lady Gaga | United Kingdom |  normal |  Official |                Just Dance | 2009
> (11 rows)
> {noformat}
> *SASI* says that there are only 11 artists whose name starts with {{lady}}.
> However, in the data set, there are:
> * Lady Pank
> * Lady Saw
> * Lady Saw
> * Ladyhawke
> * Ladytron
> * Ladysmith Black Mambazo
> * Lady Gaga
> * Lady Sovereign
> etc ...
> By debugging the source code, the issue is in 
> {{OnDiskIndex.TermIterator::computeNext()}}
> {code:java}
> for (;;)
> {
>     if (currentBlock == null)
>         return endOfData();
>     if (offset >= 0 && offset < currentBlock.termCount())
>     {
>         DataTerm currentTerm = currentBlock.getTerm(nextOffset());
>         if (checkLower && !e.isLowerSatisfiedBy(currentTerm))
>             continue;
>         // flip the flag right on the first bounds match
>         // to avoid expensive comparisons
>         checkLower = false;
>         if (checkUpper && !e.isUpperSatisfiedBy(currentTerm))
>             return endOfData();
>         return currentTerm;
>     }
>     nextBlock();
> }
> {code}
>  So the {{endOfData()}} conditions are:
> * currentBlock == null
> * checkUpper && !e.isUpperSatisfiedBy(currentTerm)
> The problem is that {{e::isUpperSatisfiedBy}} not only checks whether the 
> term matches but also returns *false* when it's a *partial term*!
> {code:java}
> public boolean isUpperSatisfiedBy(OnDiskIndex.DataTerm term)
> {
>     if (!hasUpper())
>         return true;
>     if (nonMatchingPartial(term))
>         return false;
>     int cmp = term.compareTo(validator, upper.value, false);
>     return cmp < 0 || cmp == 0 && upper.inclusive;
> }
> {code}
> By debugging the OnDiskIndex data, I've found:
> {noformat}
> ...
> Data Term (partial ? false) : lady gaga. 0x0, TokenTree offset : 21120
> Data Term (partial ? true) : lady of bells. 0x0,
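
The early termination described above can be illustrated with a small
stand-alone model. This is a sketch under assumptions, not the SASI code and
not the patch attached to the ticket: the Term class is a simplified stand-in
for OnDiskIndex.DataTerm, the term list is invented sample data, and skipping
partial terms is only one possible way to keep the scan going.

```java
import java.util.ArrayList;
import java.util.List;

public class PartialTermScanDemo
{
    // Simplified stand-in (assumption) for OnDiskIndex.DataTerm: a value plus the partial flag.
    static final class Term
    {
        final String value;
        final boolean partial;

        Term(String value, boolean partial)
        {
            this.value = value;
            this.partial = partial;
        }
    }

    // Mimics the reported behaviour: a partial term fails the upper-bound
    // check, and a failed upper-bound check ends the whole scan.
    static List<String> scanEndingOnPartial(List<Term> sortedTerms, String prefix)
    {
        List<String> matches = new ArrayList<>();
        for (Term term : sortedTerms)
        {
            if (term.value.compareTo(prefix) < 0)
                continue; // still below the lower bound
            if (term.partial || !term.value.startsWith(prefix))
                break;    // corresponds to "return endOfData()"
            matches.add(term.value);
        }
        return matches;
    }

    // Alternative: skip partial terms and stop only once a term is past the prefix range.
    static List<String> scanSkippingPartial(List<Term> sortedTerms, String prefix)
    {
        List<String> matches = new ArrayList<>();
        for (Term term : sortedTerms)
        {
            if (term.value.compareTo(prefix) < 0)
                continue; // still below the lower bound
            if (!term.value.startsWith(prefix))
                break;    // sorted input: nothing after this can match
            if (term.partial)
                continue; // not a full term, but later full terms may still match
            matches.add(term.value);
        }
        return matches;
    }

    // Invented sample data, sorted the way String.compareTo orders it.
    static List<Term> sampleTerms()
    {
        List<Term> terms = new ArrayList<>();
        terms.add(new Term("abba", false));
        terms.add(new Term("lady gaga", false));
        terms.add(new Term("lady of bells", true));
        terms.add(new Term("lady pank", false));
        terms.add(new Term("lady saw", false));
        terms.add(new Term("ladyhawke", false));
        terms.add(new Term("ladytron", false));
        terms.add(new Term("madonna", false));
        return terms;
    }

    public static void main(String[] args)
    {
        System.out.println(scanEndingOnPartial(sampleTerms(), "lady")); // [lady gaga]
        System.out.println(scanSkippingPartial(sampleTerms(), "lady")); // [lady gaga, lady pank, lady saw, ladyhawke, ladytron]
    }
}
```

In this model the first scan stops at the partial term "lady of bells" and
never reaches the later full terms, which is consistent with the query
returning only the Lady Gaga rows.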

[jira] [Updated] (CASSANDRA-12073) [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial results

2016-06-23 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-12073:

 Reviewer: Pavel Yaskevich
Fix Version/s: 3.7

> [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial 
> results
> ---
>
> Key: CASSANDRA-12073
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12073
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Cassandra 3.7
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Fix For: 3.7
>
>
> {noformat}
> cqlsh:music> CREATE TABLE music.albums (
> id uuid PRIMARY KEY,
> artist text,
> country text,
> quality text,
> status text,
> title text,
> year int
> );
> cqlsh:music> CREATE CUSTOM INDEX albums_artist_idx ON music.albums (artist) 
> USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 
> 'CONTAINS', 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
> 'case_sensitive': 'false'};
> cqlsh:music> SELECT * FROM albums WHERE artist like 'lady%'  LIMIT 100;
>  id                                   | artist    | country        | quality | status    | title                     | year
> --------------------------------------+-----------+----------------+---------+-----------+---------------------------+------
>  372bb0ab-3263-41bc-baad-bb520ddfa787 | Lady Gaga |            USA |  normal |  Official |           Red and Blue EP | 2006
>  1a4abbcd-b5de-4c69-a578-31231e01ff09 | Lady Gaga |        Unknown |  normal | Promotion |                Poker Face | 2008
>  31f4a0dc-9efc-48bf-9f5e-bfc09af42b82 | Lady Gaga |            USA |  normal |  Official |   The Cherrytree Sessions | 2009
>  8ebfaebd-28d0-477d-b735-469661ce6873 | Lady Gaga |        Unknown |  normal |  Official |                Poker Face | 2009
>  98107d82-e0dd-46bc-a273-1577578984c7 | Lady Gaga |            USA |  normal |  Official |   Just Dance: The Remixes | 2008
>  a76af0f2-f5c5-4306-974a-e3c17158e6c6 | Lady Gaga |          Italy |  normal |  Official |                  The Fame | 2008
>  849ee019-8b15-4767-8660-537ab9710459 | Lady Gaga |            USA |  normal |  Official |            Christmas Tree | 2008
>  4bad59ac-913f-43da-9d48-89adc65453d2 | Lady Gaga |      Australia |  normal |  Official |                     Eh Eh | 2009
>  80327731-c450-457f-bc12-0a8c21fd9c5d | Lady Gaga |            USA |  normal |  Official | Just Dance Remixes Part 2 | 2008
>  3ad33659-e932-4d31-a040-acab0e23c3d4 | Lady Gaga |        Unknown |  normal |      null |                Just Dance | 2008
>  9adce7f6-6a1d-49fd-b8bd-8f6fac73558b | Lady Gaga | United Kingdom |  normal |  Official |                Just Dance | 2009
> (11 rows)
> {noformat}
> *SASI* says that there are only 11 artists whose name starts with {{lady}}.
> However, in the data set, there are:
> * Lady Pank
> * Lady Saw
> * Lady Saw
> * Ladyhawke
> * Ladytron
> * Ladysmith Black Mambazo
> * Lady Gaga
> * Lady Sovereign
> etc ...
> By debugging the source code, the issue is in 
> {{OnDiskIndex.TermIterator::computeNext()}}
> {code:java}
> for (;;)
> {
>     if (currentBlock == null)
>         return endOfData();
>     if (offset >= 0 && offset < currentBlock.termCount())
>     {
>         DataTerm currentTerm = currentBlock.getTerm(nextOffset());
>         if (checkLower && !e.isLowerSatisfiedBy(currentTerm))
>             continue;
>         // flip the flag right on the first bounds match
>         // to avoid expensive comparisons
>         checkLower = false;
>         if (checkUpper && !e.isUpperSatisfiedBy(currentTerm))
>             return endOfData();
>         return currentTerm;
>     }
>     nextBlock();
> }
> {code}
>  So the {{endOfData()}} conditions are:
> * currentBlock == null
> * checkUpper && !e.isUpperSatisfiedBy(currentTerm)
> The problem is that {{e::isUpperSatisfiedBy}} not only checks whether the 
> term matches but also returns *false* when it's a *partial term*!
> {code:java}
> public boolean isUpperSatisfiedBy(OnDiskIndex.DataTerm term)
> {
>     if (!hasUpper())
>         return true;
>     if (nonMatchingPartial(term))
>         return false;
>     int cmp = term.compareTo(validator, upper.value, false);
>     return cmp < 0 || cmp == 0 && upper.inclusive;
> }
> {code}
> By debugging the OnDiskIndex data, I've found:
> {noformat}
> ...
> Data Term (partial ? false) : lady gaga. 0x0, TokenTree offset : 21120
> Data Term (partial

[jira] [Commented] (CASSANDRA-12073) [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial results

2016-06-23 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347495#comment-15347495
 ] 

Pavel Yaskevich commented on CASSANDRA-12073:
-

[~doanduyhai] Can you please attach a proper patch so we can run it through CI 
and integrate?

> [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial 
> results
> ---
>
> Key: CASSANDRA-12073
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12073
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Cassandra 3.7
>Reporter: DOAN DuyHai
>
> {noformat}
> cqlsh:music> CREATE TABLE music.albums (
> id uuid PRIMARY KEY,
> artist text,
> country text,
> quality text,
> status text,
> title text,
> year int
> );
> cqlsh:music> CREATE CUSTOM INDEX albums_artist_idx ON music.albums (artist) 
> USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 
> 'CONTAINS', 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
> 'case_sensitive': 'false'};
> cqlsh:music> SELECT * FROM albums WHERE artist like 'lady%'  LIMIT 100;
>  id                                   | artist    | country        | quality | status    | title                     | year
> --------------------------------------+-----------+----------------+---------+-----------+---------------------------+------
>  372bb0ab-3263-41bc-baad-bb520ddfa787 | Lady Gaga |            USA |  normal |  Official |           Red and Blue EP | 2006
>  1a4abbcd-b5de-4c69-a578-31231e01ff09 | Lady Gaga |        Unknown |  normal | Promotion |                Poker Face | 2008
>  31f4a0dc-9efc-48bf-9f5e-bfc09af42b82 | Lady Gaga |            USA |  normal |  Official |   The Cherrytree Sessions | 2009
>  8ebfaebd-28d0-477d-b735-469661ce6873 | Lady Gaga |        Unknown |  normal |  Official |                Poker Face | 2009
>  98107d82-e0dd-46bc-a273-1577578984c7 | Lady Gaga |            USA |  normal |  Official |   Just Dance: The Remixes | 2008
>  a76af0f2-f5c5-4306-974a-e3c17158e6c6 | Lady Gaga |          Italy |  normal |  Official |                  The Fame | 2008
>  849ee019-8b15-4767-8660-537ab9710459 | Lady Gaga |            USA |  normal |  Official |            Christmas Tree | 2008
>  4bad59ac-913f-43da-9d48-89adc65453d2 | Lady Gaga |      Australia |  normal |  Official |                     Eh Eh | 2009
>  80327731-c450-457f-bc12-0a8c21fd9c5d | Lady Gaga |            USA |  normal |  Official | Just Dance Remixes Part 2 | 2008
>  3ad33659-e932-4d31-a040-acab0e23c3d4 | Lady Gaga |        Unknown |  normal |      null |                Just Dance | 2008
>  9adce7f6-6a1d-49fd-b8bd-8f6fac73558b | Lady Gaga | United Kingdom |  normal |  Official |                Just Dance | 2009
> (11 rows)
> {noformat}
> *SASI* says that there are only 11 artists whose name starts with {{lady}}.
> However, in the data set, there are:
> * Lady Pank
> * Lady Saw
> * Lady Saw
> * Ladyhawke
> * Ladytron
> * Ladysmith Black Mambazo
> * Lady Gaga
> * Lady Sovereign
> etc ...
> By debugging the source code, the issue is in 
> {{OnDiskIndex.TermIterator::computeNext()}}
> {code:java}
> for (;;)
> {
>     if (currentBlock == null)
>         return endOfData();
>     if (offset >= 0 && offset < currentBlock.termCount())
>     {
>         DataTerm currentTerm = currentBlock.getTerm(nextOffset());
>         if (checkLower && !e.isLowerSatisfiedBy(currentTerm))
>             continue;
>         // flip the flag right on the first bounds match
>         // to avoid expensive comparisons
>         checkLower = false;
>         if (checkUpper && !e.isUpperSatisfiedBy(currentTerm))
>             return endOfData();
>         return currentTerm;
>     }
>     nextBlock();
> }
> {code}
>  So the {{endOfData()}} conditions are:
> * currentBlock == null
> * checkUpper && !e.isUpperSatisfiedBy(currentTerm)
> The problem is that {{e::isUpperSatisfiedBy}} not only checks whether the 
> term matches but also returns *false* when it's a *partial term*!
> {code:java}
> public boolean isUpperSatisfiedBy(OnDiskIndex.DataTerm term)
> {
>     if (!hasUpper())
>         return true;
>     if (nonMatchingPartial(term))
>         return false;
>     int cmp = term.compareTo(validator, upper.value, false);
>     return cmp < 0 || cmp == 0 && upper.inclusive;
> }
> {code}
> By debugging the OnDiskIndex data, I've found:
> {noformat}
> ...
> Data Term (partial ? false) : lady gaga. 0x0, TokenTree offset : 21120
> Data

[jira] [Commented] (CASSANDRA-11182) Enable SASI index for collections

2016-06-09 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321981#comment-15321981
 ] 

Pavel Yaskevich commented on CASSANDRA-11182:
-

I think such a change deserves its own ticket :)

> Enable SASI index for collections
> -
>
> Key: CASSANDRA-11182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11182
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: DOAN DuyHai
>Assignee: Alex Petrov
>Priority: Minor
>
> This is a follow up ticket for post Cassandra 3.4 SASI integration.
> Right now it is possible with a standard Cassandra secondary index to:
> 1. index list and set elements ( {{WHERE list CONTAINS xxx}})
> 2. index map keys ( {{WHERE map CONTAINS KEY 'abc'}} )
> 3. index map entries ( {{WHERE map\['key'\]=value}})
>  It would be nice to enable these features in SASI too.
>  With regard to tokenizing, we might want to allow wildcards ({{%}}) with the 
> CONTAINS syntax as well as with index map entries. Ex:
> * {{WHERE list CONTAINS 'John%'}}
> * {{WHERE map CONTAINS KEY '%an%'}}
> * {{WHERE map\['key'\] LIKE '%val%'}}
> /cc [~xedin] [~rustyrazorblade] [~jkrupan]




[jira] [Commented] (CASSANDRA-11182) Enable SASI index for collections

2016-06-08 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321719#comment-15321719
 ] 

Pavel Yaskevich commented on CASSANDRA-11182:
-

I agree with [~beobal] on that. Effectively, the most important thing we need to 
enable indexing for collections and partition keys is a TokenTree which accepts 
variable-size keys (this would enable different partitioners, collections 
support, primary key indexing etc.). Once that's done, all of the changes are 
going to be pretty straightforward.

> Enable SASI index for collections
> -
>
> Key: CASSANDRA-11182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11182
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: DOAN DuyHai
>Assignee: Alex Petrov
>Priority: Minor
>
> This is a follow up ticket for post Cassandra 3.4 SASI integration.
> Right now it is possible with a standard Cassandra secondary index to:
> 1. index list and set elements ( {{WHERE list CONTAINS xxx}})
> 2. index map keys ( {{WHERE map CONTAINS KEY 'abc'}} )
> 3. index map entries ( {{WHERE map\['key'\]=value}})
>  It would be nice to enable these features in SASI too.
>  With regard to tokenizing, we might want to allow wildcards ({{%}}) with the 
> CONTAINS syntax as well as with index map entries. Ex:
> * {{WHERE list CONTAINS 'John%'}}
> * {{WHERE map CONTAINS KEY '%an%'}}
> * {{WHERE map\['key'\] LIKE '%val%'}}
> /cc [~xedin] [~rustyrazorblade] [~jkrupan]




[jira] [Commented] (CASSANDRA-11734) Enable partition component index for SASI

2016-05-09 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276894#comment-15276894
 ] 

Pavel Yaskevich commented on CASSANDRA-11734:
-

Thanks for taking a stab at this, [~doanduyhai]! By the nature of the changes, 
it looks like we will have to postpone this until I'm done with the QueryPlan 
porting (CASSANDRA-10765), which is going to make it more sane to have indexed 
restrictions on partitions with(-out) ranges. From the patch I see a couple of 
things right away. First, the CFMetaData.getLiveIndices() approach you 
mentioned goes against the fact that some of the queries don't even allow usage 
of the indexes, which there is (currently) no way to check from inside of the 
SingleColumnRestrictions. Second, checking 
{{QueryController#hasIndexFor(ColumnDefinition)}} on every run of the results 
checking logic is very inefficient. Third, I think that instead of using 
DecoratedKey separately we might be better off providing the 
{{Operation.satisfiedBy}} methods with an {{UnfilteredRowIterator}} and letting 
them iterate it if needed, instead of involving {{QueryPlan}}. So I would rather 
have this after CASSANDRA-10765; it looks like it would make everybody's life a 
bit easier :)

> Enable partition component index for SASI
> -
>
> Key: CASSANDRA-11734
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11734
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
>  Labels: doc-impacting, sasi, secondaryIndex
> Fix For: 3.8
>
> Attachments: patch.txt
>
>
> Enable partition component index for SASI
> For the given schema:
> {code:sql}
> CREATE TABLE test.comp (
> pk1 int,
> pk2 text,
> val text,
> PRIMARY KEY ((pk1, pk2))
> );
> CREATE CUSTOM INDEX comp_val_idx ON test.comp (val) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex';
> CREATE CUSTOM INDEX comp_pk2_idx ON test.comp (pk2) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX', 
> 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
> 'case_sensitive': 'false'};
> CREATE CUSTOM INDEX comp_pk1_idx ON test.comp (pk1) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex';
> {code}
> The following queries are possible:
> {code:sql}
> SELECT * FROM test.comp WHERE pk1=1;
> SELECT * FROM test.comp WHERE pk1>=1 AND pk1<=5;
> SELECT * FROM test.comp WHERE pk1=1 AND val='xxx' ALLOW FILTERING;
> SELECT * FROM test.comp WHERE pk1>=1 AND pk1<=5 AND val='xxx' ALLOW FILTERING;
> SELECT * FROM test.comp WHERE pk2='some text';
> SELECT * FROM test.comp WHERE pk2 LIKE 'prefix%';
> SELECT * FROM test.comp WHERE pk2='some text' AND val='xxx' ALLOW FILTERING;
> SELECT * FROM test.comp WHERE pk2 LIKE 'prefix%' AND val='xxx' ALLOW 
> FILTERING;
> //Without using SASI
> SELECT * FROM test.comp WHERE pk1 = 1 AND pk2='some text';
> SELECT * FROM test.comp WHERE pk1 IN(1,2,3) AND pk2='some text';
> SELECT * FROM test.comp WHERE pk1 = 1 AND pk2 IN ('text1','text2');
> SELECT * FROM test.comp WHERE pk1 IN(1,2,3) AND pk2 IN ('text1','text2');
> {code}
> However, the following queries *are not possible*
> {code:sql}
> SELECT * FROM test.comp WHERE pk1=1 AND pk2 LIKE 'prefix%';
> SELECT * FROM test.comp WHERE pk1>=1 AND pk1<=5 AND pk2 = 'some text';
> SELECT * FROM test.comp WHERE pk1>=1 AND pk1<=5 AND pk2 LIKE 'prefix%';
> {code}
> All of them are throwing the following exception
> {noformat}
> java.lang.UnsupportedOperationException: null
>   at 
> org.apache.cassandra.cql3.restrictions.SingleColumnRestriction$LikeRestriction.appendTo(SingleColumnRestriction.java:715)
>  ~[main/:na]
>   at 
> org.apache.cassandra.cql3.restrictions.PartitionKeySingleRestrictionSet.values(PartitionKeySingleRestrictionSet.java:86)
>  ~[main/:na]
>   at 
> org.apache.cassandra.cql3.restrictions.StatementRestrictions.getPartitionKeys(StatementRestrictions.java:585)
>  ~[main/:na]
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.getSliceCommands(SelectStatement.java:473)
>  ~[main/:na]
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.getQuery(SelectStatement.java:265)
>  ~[main/:na]
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:230)
>  ~[main/:na]
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:79)
>  ~[main/:na]
>   at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:208)
>  ~[main/:na]
>   at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:239) 
> ~[main/:na]
>   at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:224) 
> ~[main/:na]
>   at 
>

[jira] [Updated] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-28 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-5863:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Ok, makes sense: this way only the caching rebufferer can invalidate, and since 
everything is wrapping we'll have to double check anyway. Maybe a cleaner way 
would be to include invalidate in the factory or rebuffer interface, but I will 
leave it for the future to decide :)

+1 and committed. Thanks!

> In process (uncompressed) page cache
> 
>
> Key: CASSANDRA-5863
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: T Jake Luciani
>Assignee: Branimir Lambov
>  Labels: performance
> Fix For: 3.x
>
>
> Currently, for every read, the CRAR reads each compressed chunk into a 
> byte[], sends it to ICompressor, gets back another byte[] and verifies a 
> checksum.  
> This process is where the majority of time is spent in a read request.  
> Before compression, we would have zero-copy of data and could respond 
> directly from the page-cache.
> It would be useful to have some kind of Chunk cache that could speed up this 
> process for hot data, possibly off heap.




[jira] [Comment Edited] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-04-27 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261537#comment-15261537
 ] 

Pavel Yaskevich edited comment on CASSANDRA-10661 at 4/28/16 4:50 AM:
--

Hi [~giaosudau], the name of the index class is 
'org.apache.cassandra.index.sasi.SASIIndex'; you are most likely reading 
documentation specific to 2.0. Here is the updated doc: 
https://github.com/apache/cassandra/blob/trunk/doc/SASI.md (it resides in the 
doc/ folder of the Apache Cassandra distribution). 

Edit: also NonTokenizingAnalyzer is located in 
'org.apache.cassandra.index.sasi' as well.


was (Author: xedin):
Hi [~giaosudau], the name of the index class is 
'org.apache.cassandra.index.sasi.SASIIndex' you most likely reading 
documentation specific for 2.0, here is the updated doc 
https://github.com/apache/cassandra/blob/trunk/doc/SASI.md, it resides in doc/ 
folder of Apache Cassandra distribution. 

> Integrate SASI to Cassandra
> ---
>
> Key: CASSANDRA-10661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10661
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
>  Labels: sasi
> Fix For: 3.4
>
>
> We have recently released a new secondary index engine 
> (https://github.com/xedin/sasi) built using the SecondaryIndex API. There are 
> still a couple of things to work out regarding 3.x, since it currently 
> targets the 2.0 release. I want to make this an umbrella issue for all of the 
> things related to the integration of SASI, which are also tracked in 
> [sasi_issues|https://github.com/xedin/sasi/issues], into the mainline Cassandra 
> 3.x release.




[jira] [Comment Edited] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-04-27 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261537#comment-15261537
 ] 

Pavel Yaskevich edited comment on CASSANDRA-10661 at 4/28/16 4:50 AM:
--

Hi [~giaosudau], the name of the index class is 
'org.apache.cassandra.index.sasi.SASIIndex'; you are most likely reading 
documentation specific to 2.0. Here is the updated doc: 
https://github.com/apache/cassandra/blob/trunk/doc/SASI.md (it resides in the 
doc/ folder of the Apache Cassandra distribution). 

Edit: also NonTokenizingAnalyzer is located in 
'org.apache.cassandra.index.sasi.analyzer' as well.


was (Author: xedin):
Hi [~giaosudau], the name of the index class is 
'org.apache.cassandra.index.sasi.SASIIndex' you most likely reading 
documentation specific for 2.0, here is the updated doc 
https://github.com/apache/cassandra/blob/trunk/doc/SASI.md, it resides in doc/ 
folder of Apache Cassandra distribution. 

Edit: also NonTokenizingAnalyzer is located in 
'org.apache.cassandra.index.sasi' as well.

> Integrate SASI to Cassandra
> ---
>
> Key: CASSANDRA-10661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10661
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
>  Labels: sasi
> Fix For: 3.4
>
>
> We have recently released a new secondary index engine 
> (https://github.com/xedin/sasi) built using the SecondaryIndex API. There are 
> still a couple of things to work out regarding 3.x, since it currently 
> targets the 2.0 release. I want to make this an umbrella issue for all of the 
> things related to the integration of SASI, which are also tracked in 
> [sasi_issues|https://github.com/xedin/sasi/issues], into the mainline Cassandra 
> 3.x release.




[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-04-27 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261537#comment-15261537
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

Hi [~giaosudau], the name of the index class is 
'org.apache.cassandra.index.sasi.SASIIndex'; you are most likely reading 
documentation specific to 2.0. Here is the updated doc: 
https://github.com/apache/cassandra/blob/trunk/doc/SASI.md (it resides in the 
doc/ folder of the Apache Cassandra distribution). 

> Integrate SASI to Cassandra
> ---
>
> Key: CASSANDRA-10661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10661
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
>  Labels: sasi
> Fix For: 3.4
>
>
> We have recently released a new secondary index engine 
> (https://github.com/xedin/sasi) built using the SecondaryIndex API. There are 
> still a couple of things to work out regarding 3.x, since it currently 
> targets the 2.0 release. I want to make this an umbrella issue for all of the 
> things related to the integration of SASI, which are also tracked in 
> [sasi_issues|https://github.com/xedin/sasi/issues], into the mainline Cassandra 
> 3.x release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-27 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261121#comment-15261121
 ] 

Pavel Yaskevich commented on CASSANDRA-5863:


+1 on the changes, much more readable now. Maybe one more nit from my original 
comments - is there any way we can change ChunkCache#invalidatePosition so that, 
instead of doing instanceof checks and redirecting to CachedRebufferer, it simply 
does invalidate(new Key(...))? Since ChunkReader is effectively stateless, maybe 
we could drop RebuffererFactory and use ChunkReader as the source of all 
Rebufferers. This way IMHO it's clearer that ChunkReader is the source of the 
data and doesn't do any buffering; if buffering/caching is needed, it can 
produce a Rebufferer which manages the memory. WDYT?
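
A minimal sketch of the key-based invalidation suggested above, not the actual Cassandra code: the names ChunkCacheSketch, ChunkReader, Key and their methods are hypothetical stand-ins, and the chunk size is assumed to be a power of two.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; the real ChunkCache/ChunkReader differ.
final class ChunkCacheSketch {
    // Stateless source of data: it only knows how to read one aligned chunk.
    interface ChunkReader {
        String path();
        int chunkSize();                          // assumed to be a power of two
        ByteBuffer readChunk(long alignedPosition);
    }

    // Cache key: which file, and which aligned chunk inside it.
    static final class Key {
        final String path;
        final long position;

        Key(String path, long position) {
            this.path = path;
            this.position = position;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return position == k.position && path.equals(k.path);
        }

        @Override
        public int hashCode() {
            return Objects.hash(path, position);
        }
    }

    private final Map<Key, ByteBuffer> cache = new ConcurrentHashMap<>();

    // Round a file position down to the start of its chunk.
    static long align(long position, int chunkSize) {
        return position & ~((long) chunkSize - 1);
    }

    ByteBuffer get(ChunkReader reader, long position) {
        long aligned = align(position, reader.chunkSize());
        return cache.computeIfAbsent(new Key(reader.path(), aligned),
                                     k -> reader.readChunk(aligned));
    }

    // No instanceof checks: invalidation is just a removal by key.
    void invalidatePosition(ChunkReader reader, long position) {
        cache.remove(new Key(reader.path(), align(position, reader.chunkSize())));
    }

    int size() {
        return cache.size();
    }
}
```

In this shape the reader stays a pure data source and the cache owns all buffers, which is the split the comment argues for.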

Also how do you want to proceed with this? After all of the changes can you 
squash/rebase, so I can push?



> In process (uncompressed) page cache
> 
>
> Key: CASSANDRA-5863
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: T Jake Luciani
>Assignee: Branimir Lambov
>  Labels: performance
> Fix For: 3.x
>
>
> Currently, for every read, the CRAR reads each compressed chunk into a 
> byte[], sends it to ICompressor, gets back another byte[] and verifies a 
> checksum.  
> This process is where the majority of time is spent in a read request.  
> Before compression, we would have zero-copy of data and could respond 
> directly from the page-cache.
> It would be useful to have some kind of Chunk cache that could speed up this 
> process for hot data, possibly off heap.




[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-26 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258961#comment-15258961
 ] 

Pavel Yaskevich commented on CASSANDRA-5863:


[~blambov] What I meant is similar to a per-file map with a shared eviction 
strategy; if you already have that in mind - perfect :) What you are saying 
regarding rebufferers makes sense to me. What I was trying to advocate for is 
a better distinction between BufferlessRebufferer and Rebufferer, at least via 
naming, so that "rebuffer" or "buffer processor" is the thing which holds 
the actual processing logic and BufferlessRebufferer is essentially the "data 
source" or "data provider/producer" for it. I'm asking for this because for 
me, as an observer of the changes, the distinction wasn't clear at first 
glance; what made it especially confusing is the rebuffer method itself.


[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-26 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257687#comment-15257687
 ] 

Pavel Yaskevich commented on CASSANDRA-5863:


bq. This is not that trivial for rebuffering – the requested length would 
normally be bigger than the number of available bytes during the last 
{{reBuffer()}} call on the file, thus the changes required to implement this 
check are too substantial to be within the scope of this ticket.

Ok, fair enough.

bq. There are other things to take into account:

Let me address the things you mentioned. All of the data is written into the 
SSTable file in fixed-size chunks, and most of the rebuffers are done (or have 
been done) at a granularity of 64k or whatever the compression buffer size was. 
Since an SSTable has compression parameters, we might want to have the cache 
work at the level of sstables instead of individual files; that way we can get 
access to some essential metadata. So what I was saying is that the cache could 
hold 64k (or other power of 2 size) aligned buffers, either raw file data or 
already decompressed data, depending on the file. A backend implementation, 
plugged into the rebufferer, would mmap or use a regular channel read to read 
buffer size aligned chunks based on the position given to it; in compressed 
mode the cache would hold decompressed buffers so it doesn't have to share 
mmap'ed buffers. ReaderCache can rely on LIRS or LRU as a replacement mechanism 
for aligned buffers, so each buffer is going to be reclaimed when the sstable 
is removed (another reason to work closely with sstables), when the replacement 
mechanism indicates that the buffer is no longer viable, or when it is 
invalidated manually. Sorry I can't provide code, so if you think that 
rethinking this is not worth it, I'm fine with that.
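
For illustration only, the aligned-buffer cache with LRU replacement described above could look roughly like the following. This is a sketch under assumptions: AlignedBufferLru, its string keys and the loader callback are hypothetical, and a production cache would bound itself by bytes and avoid a single lock.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical sketch of an LRU cache of chunk-aligned buffers, keyed per sstable.
final class AlignedBufferLru {
    private final int chunkSize;
    private final LinkedHashMap<String, ByteBuffer> lru;

    AlignedBufferLru(int chunkSize, final int maxChunks) {
        if (chunkSize <= 0 || Integer.bitCount(chunkSize) != 1)
            throw new IllegalArgumentException("chunkSize must be a power of two");
        this.chunkSize = chunkSize;
        // accessOrder = true makes iteration order least-recently-used first.
        this.lru = new LinkedHashMap<String, ByteBuffer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, ByteBuffer> eldest) {
                return size() > maxChunks;
            }
        };
    }

    // Returns the buffer covering 'position', loading the aligned chunk on a miss.
    synchronized ByteBuffer get(String sstable, long position, LongFunction<ByteBuffer> loader) {
        long aligned = position & ~((long) chunkSize - 1);
        String key = sstable + ':' + aligned;
        ByteBuffer buffer = lru.get(key);
        if (buffer == null) {
            buffer = loader.apply(aligned);
            lru.put(key, buffer);
        }
        return buffer;
    }

    // Drops every buffer of a removed sstable (a linear scan in this sketch).
    synchronized void invalidate(String sstable) {
        String prefix = sstable + ':';
        lru.keySet().removeIf(k -> k.startsWith(prefix));
    }

    synchronized int size() {
        return lru.size();
    }
}
```

The loader stands in for the mmap or channel-read backend; in compressed mode it would return the decompressed chunk.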

bq. The logic is clearly defined for this round. This patch is targeted at 3.x 
where we can't change sstable format in any way, and support for this format 
will be required long in the future.

I wasn't aware of that. [~jbellis] We can't modify the SSTable format at all while 
in the 3.x phase, not even with backward compatible changes in the even feature releases?


[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-25 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257210#comment-15257210
 ] 

Pavel Yaskevich commented on CASSANDRA-5863:


bq. The key itself is a small and fixed part of the overhead (all objects it 
references are already found elsewhere); there are also on-heap support 
structures within the implementing cache which are bigger. Though that's not 
trivial, we could also account for those, but I don't know how that helps cache 
management and sizing for the user.

The problem I see with this is the same as for any other data structure on the JVM: 
if we don't account for the additional overhead, at some point it will blow up and 
it won't be pretty, especially if we don't account for the internal size of the 
data structure which holds the cache and other overhead like keys and their 
containers. Can we claim with certainty that at some capacity its actual size 
in memory is not going to be 2x or 3x? If yes, then let's leave it as it is 
today; otherwise we need to do something about it right away.
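
To make the sizing concern concrete, here is a small arithmetic sketch. ChunkWeight and its methods are hypothetical, and the per-entry overhead (key, map entry, bookkeeping) is a caller-supplied assumption rather than a measured value.

```java
// Hypothetical helper: weighs a cache entry as buffer bytes plus an assumed
// fixed per-entry overhead, instead of buffer bytes alone.
final class ChunkWeight {
    static long weigh(int bufferBytes, int perEntryOverheadBytes) {
        return (long) bufferBytes + perEntryOverheadBytes;
    }

    // How many entries fit into a cache of the given capacity.
    static long entriesThatFit(long capacityBytes, int bufferBytes, int perEntryOverheadBytes) {
        return capacityBytes / weigh(bufferBytes, perEntryOverheadBytes);
    }

    // Ratio of accounted memory to raw buffer memory (1.0 means no overhead).
    static double inflation(int bufferBytes, int perEntryOverheadBytes) {
        return (double) weigh(bufferBytes, perEntryOverheadBytes) / bufferBytes;
    }
}
```

With these definitions the inflation only reaches 2x when the overhead equals the buffer size, so whether it matters depends on how small the cached buffers are relative to the real per-entry overhead.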

bq. I'm sorry, I do not understand the problem – the code only relies on the 
position of the buffer and since buffer is cleared before the read, an end of 
stream (and only that) will result in an empty buffer; both read() and 
readByte() interpret this correctly.

Sorry, what I mean is that we might want to be more conservative and indicate early 
that the requested length is bigger than the number of available bytes. We already had 
a couple of bugs which were hard to debug because EOFException doesn't provide 
any useful information...
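
A sketch of the kind of early check meant here, assuming the caller knows the requested length and the current file position; LengthCheck and checkAvailable are hypothetical names, not existing Cassandra methods.

```java
import java.io.EOFException;
import java.nio.ByteBuffer;

// Hypothetical helper: fail early with a descriptive message instead of a bare EOFException.
final class LengthCheck {
    static void checkAvailable(ByteBuffer buffer, int requested, long filePosition) throws EOFException {
        int available = buffer.remaining();
        if (requested > available)
            throw new EOFException(String.format(
                "Requested %d bytes at position %d but only %d are available",
                requested, filePosition, available));
    }
}
```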

bq. I had added a return of the passed buffer for convenience but it also adds 
possibility for error – changed the return of the method to void. On the other 
point, it does not make sense for the callee to return an (aligned) offset as 
the caller may need to have a better control over positioning before allocating 
the buffer – caching rebufferers, specifically, do.

and 

bq. This wasn't the case even before this ticket. When RAR requests rebuffering 
at a certain position, it can either have its buffer filled (direct case), or 
receive a view of a shared buffer that holds the data (mem-mapped case). There 
was a lot of clumsiness in RAR to handle the question of which of these is the 
case, does it own its buffer, should it be allocated or freed. The patch 
addresses this clumsiness as well as allowing for another type of advantageous 
buffer management.

I understand. I actually started with a proposal to return "void", but I 
changed it later on because I saw a possibility to unify the bufferless and other 
implementations. Essentially the question is where the original data comes 
from: directly from the channel or from an already mmap'ed buffer. So maybe if we had 
a common interface for both cases and used it as a backend for the rebufferer, 
it would simplify things instead of putting that logic into the rebufferer itself? 
Just something to think about...

bq. Interesting. Another possibility mentioned before is to implement 
compression in such a way that the compressed size matches the chunk size. Both 
are orthogonal and outside the scope of this ticket – lets open a new issue for 
that?

I'm fine if we make it a separate ticket but I think we will have to tackle it 
first since it would directly affect rebufferer/cache logic.



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-24 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255913#comment-15255913
 ] 

Pavel Yaskevich commented on CASSANDRA-5863:


Indeed it comes at the expense of utilizing the inode cache more, but not having to 
worry about keeping CompressionInfo in memory, and all the other effects and 
complexities which come from the current state of compression, might be worth it. 
We'll definitely need some experimental results to back it up, but even if it 
doesn't work out the way I expect, it would still be an interesting experiment 
to make.


[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-24 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255531#comment-15255531
 ] 

Pavel Yaskevich commented on CASSANDRA-5863:


*General Notes:*
* I think we should try to come up with better names for the rebufferer classes, so 
their function is more obvious... Maybe something like {File, FS, Data}Reader 
with a load or read method instead of rebuffer.
* Maybe we should try to implement the idea I mentioned when I originally 
worked on the compression support about 5 years ago in CASSANDRA-47, which wasn't 
worth it at that time but might be more relevant now :) It might make caching 
of compressed files a lot simpler. Here it is: make compressors always return 
a buffer whose size is aligned on PAGE_SIZE (default 512 bytes) and leave "holes" 
in the file by seeking to the next alignment. Over the years I've double 
checked with multiple people familiar with this that most of the modern/popular 
filesystems (NTFS, ext*, xfs etc.) already have support for that: they are not 
going to allocate unused blocks, and they place all of the allocated ones 
close together. This is going to help here in the following ways:
-- caches don't have to worry about the size/alignment of the 
compressed/decompressed chunks;
-- the compressed reader is very simple since it just has to align requested 
offsets (which allows removing the CompressionInfo segment);
-- there is no need to keep uncompressed size information around since the data 
size is the same for the compressed/uncompressed cases (everything is already 
aligned);
-- CRAR/CSF and all of the supporting classes are no longer required;
-- and more, e.g. we could potentially just re-map compressed pages into 
decompressed ones on the fly and the cache wouldn't even have to know.
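
The alignment arithmetic behind this idea can be sketched as follows. PageAlign is a hypothetical helper; both the page size and the chunk length are assumed to be powers of two, and the 512-byte figure is simply the default quoted above.

```java
// Hypothetical helper for the "aligned chunks with holes" layout.
final class PageAlign {
    // Round a compressed length up to the next multiple of pageSize.
    static long alignUp(long length, int pageSize) {
        return (length + pageSize - 1) & ~((long) pageSize - 1);
    }

    // Round a requested offset down to the start of its chunk.
    static long alignDown(long position, int chunkLength) {
        return position & ~((long) chunkLength - 1);
    }

    // Bytes left as a "hole" after a compressed chunk padded to a page boundary.
    static long padding(long compressedLength, int pageSize) {
        return alignUp(compressedLength, pageSize) - compressedLength;
    }
}
```

For example, a chunk that compresses to 1300 bytes would occupy 1536 bytes on a 512-byte boundary, and a reader can find a chunk start from a position alone, without a separate offset table.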

*Code Notes:*
* why does ReaderCache only account for buffer size instead of key + buffer 
size in the weigher? This means the cache size is underestimated.
* a couple of instanceof checks kind of signal that we want to re-evaluate the 
rebufferer class hierarchy.
* ReaderCache#invalidateFile is not very efficient: it is O\(n\) in the size of the 
cache and is used by cleanup of the mmap'ed files, which might be a problem.
* (potential safety improvement) ChecksummedDataInput#readBuffer should do 
buffer vs. read length validation for the -1 situation, because otherwise this might 
cause corruption.
* HintsReader adds an unused import and a commented-out seek, which should be 
removed.
* since CRAR no longer extends RAR, the header comment about that should be removed 
as well. As a matter of fact, since CRAR is now just a container of rebufferer 
implementations, maybe it makes sense to remove it altogether and just use RAR 
from CompressedSegmentedFile with different "rebuffer" backends; in other words, 
put all of the rebufferers from CRAR into CSF?
* BufferlessRebufferer#rebuffer(long position, ByteBuffer buffer) at least 
requires better clarification of the parameters and return value, because in 
e.g. CRAR it's not exactly clear why the uncompressed buffer would be provided 
only to be returned as well. Why can't the argument just be filled and the return 
type changed to long, which would be the (aligned) offset in the file? That would 
allow removing the Rebufferer#rebuffer(long) method and always letting callers 
provide the buffer to fill, since I only see it used in RAR#reBufferAt and 
LimitingRebufferer, both of which could be made to hold the actual buffer. 
This would converge everything under BufferlessRebufferer, leave ReaderCache 
and RAR to handle buffers, and divide the responsibilities of buffer management 
and actual block storage handling between RAR and Rebufferer.
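
The last note can be illustrated with a small sketch in which the caller owns the buffer and the data source only fills it and reports the aligned offset. All names here (RebufferSketch, DataSource, ArraySource, Reader) are hypothetical and the "file" is just a byte array; this is not the signature that was eventually committed.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: buffer management lives in the reader, block access in the source.
final class RebufferSketch {
    interface DataSource {
        // Fills 'buffer' with the chunk containing 'position' and returns the
        // aligned file offset at which the buffer contents start.
        long rebuffer(long position, ByteBuffer buffer);
    }

    // Toy source backed by an in-memory array; chunkSize is assumed to be a power of two.
    static final class ArraySource implements DataSource {
        private final byte[] file;
        private final int chunkSize;

        ArraySource(byte[] file, int chunkSize) {
            this.file = file;
            this.chunkSize = chunkSize;
        }

        @Override
        public long rebuffer(long position, ByteBuffer buffer) {
            long start = position & ~((long) chunkSize - 1);
            int length = (int) Math.max(0L, Math.min((long) chunkSize, file.length - start));
            buffer.clear();
            if (length > 0)
                buffer.put(file, (int) start, length);
            buffer.flip();
            return start;
        }
    }

    // The reader holds the one buffer and decides when to refill it.
    static final class Reader {
        private final DataSource source;
        private final ByteBuffer buffer;
        private long bufferOffset = -1;

        Reader(DataSource source, int chunkSize) {
            this.source = source;
            this.buffer = ByteBuffer.allocate(chunkSize);
            this.buffer.limit(0);
        }

        // Returns the unsigned byte at 'position', or -1 past the end of the data.
        int readByteAt(long position) {
            if (bufferOffset < 0 || position < bufferOffset || position >= bufferOffset + buffer.limit())
                bufferOffset = source.rebuffer(position, buffer);
            int index = (int) (position - bufferOffset);
            return index < buffer.limit() ? buffer.get(index) & 0xFF : -1;
        }
    }
}
```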



[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2016-04-23 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255189#comment-15255189
 ] 

Pavel Yaskevich commented on CASSANDRA-5863:


[~blambov] Sorry for the delay, I'm planning to look at the code shortly. While 
I'm at it, do you think it would be possible (if it hasn't been done already) 
to simulate the situation where a single key read touches multiple SSTables (aka 
the multi-collation case)? That, I think, might be one of the interesting cases for 
cache performance even without writes present, since it closely reflects some 
of the most common real-world situations, which require multiple index/data 
reads per request and generate different eviction patterns. 


[jira] [Commented] (CASSANDRA-10765) add RangeIterator interface and QueryPlan for SI

2016-04-12 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238357#comment-15238357
 ] 

Pavel Yaskevich commented on CASSANDRA-10765:
-

[~doanduyhai] Yes, most likely. One of the goals here is to be able to use 
different index implementations in a single query efficiently, e.g. use the built-in 
composite index as an iterator and check for intersections inside the SASI index 
for precision before reading the row from primary storage.
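
As a rough illustration of intersecting the results of two indexes by token, the sketch below merges two sorted token sets and uses min/max metadata to skip the work when the ranges cannot overlap. TokenRange is a hypothetical stand-in, not SASI's RangeIterator, and tokens are simplified to sorted, distinct long values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an index result: sorted, distinct tokens plus simple metadata.
final class TokenRange {
    private final long[] tokens;

    TokenRange(long... sortedTokens) {
        this.tokens = sortedTokens;
    }

    boolean isEmpty() { return tokens.length == 0; }
    long min() { return tokens[0]; }
    long max() { return tokens[tokens.length - 1]; }
    int count() { return tokens.length; }

    // Merge-style intersection; works regardless of which index produced each side.
    static List<Long> intersect(TokenRange a, TokenRange b) {
        List<Long> result = new ArrayList<>();
        // Metadata check: disjoint [min, max] ranges cannot intersect.
        if (a.isEmpty() || b.isEmpty() || a.max() < b.min() || b.max() < a.min())
            return result;
        int i = 0, j = 0;
        while (i < a.count() && j < b.count()) {
            long x = a.tokens[i], y = b.tokens[j];
            if (x == y) {
                result.add(x);
                i++;
                j++;
            } else if (x < y) {
                i++;
            } else {
                j++;
            }
        }
        return result;
    }
}
```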

> add RangeIterator interface and QueryPlan for SI
> 
>
> Key: CASSANDRA-10765
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10765
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local Write-Read Paths
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
> Fix For: 3.x
>
>
> Currently built-in indexes have only one way of handling 
> intersections/unions: pick the highest selectivity predicate and filter on 
> other index expressions. This is not always the most efficient approach. 
> Dynamic query planning based on the different index characteristics would be 
> more optimal. Query Plan should be able to choose how to do intersections and 
> unions based on the metadata provided by indexes (returned by RangeIterator), 
> and RangeIterator would become a base for cross-index interactions and should 
> have information such as min/max token, estimated number of wrapped tokens etc.




[jira] [Commented] (CASSANDRA-10765) add RangeIterator interface and QueryPlan for SI

2016-04-12 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236889#comment-15236889
 ] 

Pavel Yaskevich commented on CASSANDRA-10765:
-

I'm planning to start working on this pretty soon, so just to resurrect the 
discussion, here is the plan I have in mind: 

- since we rely heavily on ReadCommand and all interfaces are built around it, 
I'm going to try the least intrusive method: modify WhereClause (and maybe 
StatementRestrictions) to support OR and parentheses between indexes and 
regular columns only, so partition/clustering restrictions, if present, would 
still be required to always be separated by AND. I will try to make it easy 
to change that as well if we ever manage to support multiple partitions fetched 
by a single query (e.g. key = X OR key = Y).

-- WhereClause is going to become a stack of relations separated by logical 
operators (AND, OR) in postfix notation (to support parentheses);
-- Partition/Clustering restrictions are going to remain "restriction lists" in 
StatementRestrictions;
-- All other columns are going to be converted into a QueryPlan, which is going to 
return an UnfilteredPartitionIterator;
-- Instead of RowFilter, ReadCommand is going to accept a QueryPlan, which will 
drive the query execution from ReadCommand#executeLocally.
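
The postfix representation mentioned in the plan can be sketched with a standard shunting-yard conversion. PostfixWhere is a hypothetical helper operating on plain strings, not the real WhereClause; giving AND higher precedence than OR is an assumption made for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: converts relations joined by AND/OR (with parentheses) to postfix.
final class PostfixWhere {
    private static boolean isOperator(String token) {
        return token.equals("AND") || token.equals("OR");
    }

    // Assumption for this sketch: AND binds tighter than OR.
    private static int precedence(String operator) {
        return operator.equals("AND") ? 2 : 1;
    }

    static List<String> toPostfix(List<String> infix) {
        List<String> output = new ArrayList<>();
        Deque<String> operators = new ArrayDeque<>();
        for (String token : infix) {
            if (isOperator(token)) {
                // Pop operators of equal or higher precedence (left-associative).
                while (!operators.isEmpty() && isOperator(operators.peek())
                       && precedence(operators.peek()) >= precedence(token))
                    output.add(operators.pop());
                operators.push(token);
            } else if (token.equals("(")) {
                operators.push(token);
            } else if (token.equals(")")) {
                while (!operators.isEmpty() && !operators.peek().equals("("))
                    output.add(operators.pop());
                if (operators.isEmpty())
                    throw new IllegalArgumentException("Unbalanced parentheses");
                operators.pop(); // discard the matching "("
            } else {
                output.add(token); // a relation such as "a = 1"
            }
        }
        while (!operators.isEmpty()) {
            String operator = operators.pop();
            if (operator.equals("("))
                throw new IllegalArgumentException("Unbalanced parentheses");
            output.add(operator);
        }
        return output;
    }
}
```

Once in postfix form the parentheses disappear, and the expression can be evaluated or turned into a plan with a single stack.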



[jira] [Updated] (CASSANDRA-11525) StaticTokenTreeBuilder should respect posibility of duplicate tokens

2016-04-08 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-11525:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed.

> StaticTokenTreeBuilder should respect posibility of duplicate tokens
> 
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>Assignee: Jordan West
> Fix For: 3.5
>
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench ( 
>  dsr_id uuid,
>  rel_seq bigint,
>  seq bigint,
>  dsp_code varchar,
>  model_code varchar,
>  media_code varchar,
>  transfer_code varchar,
>  commercial_offer_code varchar,
>  territory_code varchar,
>  period_end_month_int int,
>  authorized_societies_txt text,
>  rel_type text,
>  status text,
>  dsp_release_code text,
>  title text,
>  contributors_name list,
>  unic_work text,
>  paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC); 
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench 
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH 
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When doing the request {{SELECT dsp_code, unic_work, paying_net_qty FROM 
> test.resource_bench WHERE period_end_month_int = 201401}} using server-side 
> paging.
> I bumped into this stack trace:
> {noformat}
> WARN  [SharedPool-Worker-1] 2016-04-06 00:00:30,825 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
>   at 
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:106)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
>

[jira] [Commented] (CASSANDRA-11525) StaticTokenTreeBuilder should respect posibility of duplicate tokens

2016-04-08 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233284#comment-15233284
 ] 

Pavel Yaskevich commented on CASSANDRA-11525:
-

[~doanduyhai] I've force pushed updated code/tests to the CASSANDRA-11525 
branch (testall/dtest are currently running). If you want to verify everything, 
you (unfortunately) will have to rebuild indexes again, but this time you can 
do it on the ma-2164 sstable only; everything else is unaffected. I'm going to 
wait until testall/dtest completes and then merge everything to unblock the 3.5 release.


[jira] [Commented] (CASSANDRA-11525) StaticTokenTreeBuilder should respect posibility of duplicate tokens

2016-04-08 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233148#comment-15233148
 ] 

Pavel Yaskevich commented on CASSANDRA-11525:
-

[~doanduyhai] Thanks for the information! We can reproduce it now. We are trying 
with 3.4 to make sure it's the bug we introduced in CASSANDRA-11383 and will keep 
you posted.

> StaticTokenTreeBuilder should respect posibility of duplicate tokens
> 
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>Assignee: Jordan West
> Fix For: 3.5
>
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench ( 
>  dsr_id uuid,
>  rel_seq bigint,
>  seq bigint,
>  dsp_code varchar,
>  model_code varchar,
>  media_code varchar,
>  transfer_code varchar,
>  commercial_offer_code varchar,
>  territory_code varchar,
>  period_end_month_int int,
>  authorized_societies_txt text,
>  rel_type text,
>  status text,
>  dsp_release_code text,
>  title text,
>  contributors_name list<text>,
>  unic_work text,
>  paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC); 
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench 
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH 
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When doing the request {{SELECT dsp_code, unic_work, paying_net_qty FROM 
> test.resource_bench WHERE period_end_month_int = 201401}} using server-side 
> paging, I bumped into this stack trace:
> {noformat}
> WARN  [SharedPool-Worker-1] 2016-04-06 00:00:30,825 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
>   at 
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:106)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
>

[jira] [Commented] (CASSANDRA-11525) StaticTokenTreeBuilder should respect posibility of duplicate tokens

2016-04-08 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232878#comment-15232878
 ] 

Pavel Yaskevich commented on CASSANDRA-11525:
-

[~doanduyhai] Thanks, we got it running and are now trying to reproduce with the 
SI files you included. Also, the {{CREATE INDEX}} command you added to the 
description creates an index with a different name: it doesn't have "_int_", 
which the SI files do have.


[jira] [Commented] (CASSANDRA-11525) StaticTokenTreeBuilder should respect posibility of duplicate tokens

2016-04-08 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232809#comment-15232809
 ] 

Pavel Yaskevich commented on CASSANDRA-11525:
-

[~doanduyhai] OK, we are working on reproducing it locally. In the meantime, would 
you be able to test against 3.4 to help us confirm or rule out that the issue was 
caused by the changes in CASSANDRA-11383? Also, can you provide the output of 
running your script so we can determine specifically where it errored out? Can 
you please also add the Stats component, which is currently missing? Without it 
we can't do much.


[jira] [Updated] (CASSANDRA-11525) StaticTokenTreeBuilder should respect posibility of duplicate tokens

2016-04-07 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-11525:

 Reviewer: Pavel Yaskevich
Fix Version/s: 3.5
  Summary: StaticTokenTreeBuilder should respect posibility of 
duplicate tokens  (was: SASI index corruption)


[jira] [Updated] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-11525:

Assignee: Jordan West

> SASI index corruption
> -
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>Assignee: Jordan West
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench ( 
>  dsr_id uuid,
>  rel_seq bigint,
>  seq bigint,
>  dsp_code varchar,
>  model_code varchar,
>  media_code varchar,
>  transfer_code varchar,
>  commercial_offer_code varchar,
>  territory_code varchar,
>  period_end_month_int int,
>  authorized_societies_txt text,
>  rel_type text,
>  status text,
>  dsp_release_code text,
>  title text,
>  contributors_name list<text>,
>  unic_work text,
>  paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC); 
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench 
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH 
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When doing the request {{SELECT dsp_code, unic_work, paying_net_qty FROM 
> test.resource_bench WHERE period_end_month_int = 201401}} using server-side 
> paging, I bumped into this stack trace:
> {noformat}
> WARN  [SharedPool-Worker-1] 2016-04-06 00:00:30,825 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
>   at 
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:106)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:72)
>

[jira] [Commented] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231400#comment-15231400
 ] 

Pavel Yaskevich commented on CASSANDRA-11525:
-

OK, as a quick update: I think we know what is going on here. [~jrwest] is 
working on the changes to TokenTree, and it's most definitely caused by the 
changes in CASSANDRA-11383. We are still going to use your files to validate, 
and we will add tests to prevent this in the future.

> SASI index corruption
> -
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>

[jira] [Commented] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231348#comment-15231348
 ] 

Pavel Yaskevich commented on CASSANDRA-11525:
-

Sounds good, [~doanduyhai]! Meanwhile we are trying to reproduce based on what 
we can figure out theoretically.

> SASI index corruption
> -
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench ( 
>  dsr_id uuid,
>  rel_seq bigint,
>  seq bigint,
>  dsp_code varchar,
>  model_code varchar,
>  media_code varchar,
>  transfer_code varchar,
>  commercial_offer_code varchar,
>  territory_code varchar,
>  period_end_month_int int,
>  authorized_societies_txt text,
>  rel_type text,
>  status text,
>  dsp_release_code text,
>  title text,
>  contributors_name list<text>,
>  unic_work text,
>  paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC); 
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench 
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH 
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When doing the request {{SELECT dsp_code, unic_work, paying_net_qty FROM 
> test.resource_bench WHERE period_end_month_int = 201401}} using server-side 
> paging, I bumped into this stack trace:
> {noformat}
> WARN  [SharedPool-Worker-1] 2016-04-06 00:00:30,825 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
>   at 
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:106)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
>

[jira] [Comment Edited] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231220#comment-15231220
 ] 

Pavel Yaskevich edited comment on CASSANDRA-11525 at 4/7/16 10:20 PM:
--

[~doanduyhai] Alright, it's most likely related to how the index is stitched 
together again; we'll wait for you to upload the files.

Edit: Meanwhile it would be great if you could test it on 3.4 and see if that 
produces the same error too.


was (Author: xedin):
[~doanduyhai] Alright, it's most likely related to how the index is stitched 
together again; we'll wait for you to upload the files.

> SASI index corruption
> -
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench ( 
>  dsr_id uuid,
>  rel_seq bigint,
>  seq bigint,
>  dsp_code varchar,
>  model_code varchar,
>  media_code varchar,
>  transfer_code varchar,
>  commercial_offer_code varchar,
>  territory_code varchar,
>  period_end_month_int int,
>  authorized_societies_txt text,
>  rel_type text,
>  status text,
>  dsp_release_code text,
>  title text,
>  contributors_name list,
>  unic_work text,
>  paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC); 
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench 
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH 
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When doing the request {{SELECT dsp_code, unic_work, paying_net_qty FROM 
> test.resource_bench WHERE period_end_month_int = 201401}} using server-side 
> paging, I bumped into this stack trace:
> {noformat}
> WARN  [SharedPool-Worker-1] 2016-04-06 00:00:30,825 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
>   at 
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:106)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
>

[jira] [Commented] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231220#comment-15231220
 ] 

Pavel Yaskevich commented on CASSANDRA-11525:
-

[~doanduyhai] Alright, it's most likely related to how the index is stitched 
together again; we'll wait for you to upload the files.

> SASI index corruption
> -
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench ( 
>  dsr_id uuid,
>  rel_seq bigint,
>  seq bigint,
>  dsp_code varchar,
>  model_code varchar,
>  media_code varchar,
>  transfer_code varchar,
>  commercial_offer_code varchar,
>  territory_code varchar,
>  period_end_month_int int,
>  authorized_societies_txt text,
>  rel_type text,
>  status text,
>  dsp_release_code text,
>  title text,
>  contributors_name list,
>  unic_work text,
>  paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC); 
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench 
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH 
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When doing the request {{SELECT dsp_code, unic_work, paying_net_qty FROM 
> test.resource_bench WHERE period_end_month_int = 201401}} using server-side 
> paging, I bumped into this stack trace:
> {noformat}
> WARN  [SharedPool-Worker-1] 2016-04-06 00:00:30,825 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
>   at 
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:106)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
>

[jira] [Commented] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230742#comment-15230742
 ] 

Pavel Yaskevich commented on CASSANDRA-11525:
-

/cc [~jrwest]

> SASI index corruption
> -
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench ( 
>  dsr_id uuid,
>  rel_seq bigint,
>  seq bigint,
>  dsp_code varchar,
>  model_code varchar,
>  media_code varchar,
>  transfer_code varchar,
>  commercial_offer_code varchar,
>  territory_code varchar,
>  period_end_month_int int,
>  authorized_societies_txt text,
>  rel_type text,
>  status text,
>  dsp_release_code text,
>  title text,
>  contributors_name list,
>  unic_work text,
>  paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC); 
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench 
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH 
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When doing the request {{SELECT dsp_code, unic_work, paying_net_qty FROM 
> test.resource_bench WHERE period_end_month_int = 201401}} using server-side 
> paging, I bumped into this stack trace:
> {noformat}
> WARN  [SharedPool-Worker-1] 2016-04-06 00:00:30,825 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
>   at 
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:106)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:72)
>

[jira] [Commented] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230714#comment-15230714
 ] 

Pavel Yaskevich commented on CASSANDRA-11525:
-

It also looks like the actual key from the index file might have been read 
successfully, so alongside the token you could also print the actual key 
data; that would help figure out whether it's a real key or just a random 
set of 32k bytes.
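
A minimal sketch of the kind of check suggested above, not code from the Cassandra tree. It assumes the partition key (dsr_id uuid, rel_seq bigint) is stored in the usual composite layout, where each component is a 2-byte length, the value bytes, and one end-of-component byte. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical debugging helper; none of these names exist in Cassandra.
final class KeySanityCheck {
    /** Hex-encodes the remaining bytes without moving the buffer's position. */
    static String toHex(ByteBuffer key) {
        ByteBuffer b = key.duplicate();
        StringBuilder sb = new StringBuilder(b.remaining() * 2);
        while (b.hasRemaining())
            sb.append(String.format("%02x", b.get() & 0xff));
        return sb.toString();
    }

    /**
     * Returns true if the buffer splits exactly into components of the
     * expected lengths, assuming the layout:
     * 2-byte length, value bytes, 1 end-of-component byte.
     */
    static boolean looksLikeCompositeKey(ByteBuffer key, int... expectedLengths) {
        ByteBuffer b = key.duplicate();
        for (int expected : expectedLengths) {
            if (b.remaining() < 2)
                return false;
            int len = b.getShort() & 0xffff;
            if (len != expected || b.remaining() < len + 1)
                return false;
            b.position(b.position() + len); // skip the component value
            b.get();                        // skip the end-of-component byte
        }
        return !b.hasRemaining();
    }

    public static void main(String[] args) {
        // Build a well-formed key: 16-byte uuid component, 8-byte bigint component.
        ByteBuffer key = ByteBuffer.allocate(30);
        key.putShort((short) 16).put(new byte[16]).put((byte) 0);
        key.putShort((short) 8).putLong(42L).put((byte) 0);
        key.flip();
        System.out.println(toHex(key) + " plausible=" + looksLikeCompositeKey(key, 16, 8));
    }
}
```

Logging the hex dump next to the token, together with the plausibility flag, should make it quick to tell a real (uuid, bigint) key from an arbitrary 32k block.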

> SASI index corruption
> -
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench ( 
>  dsr_id uuid,
>  rel_seq bigint,
>  seq bigint,
>  dsp_code varchar,
>  model_code varchar,
>  media_code varchar,
>  transfer_code varchar,
>  commercial_offer_code varchar,
>  territory_code varchar,
>  period_end_month_int int,
>  authorized_societies_txt text,
>  rel_type text,
>  status text,
>  dsp_release_code text,
>  title text,
>  contributors_name list,
>  unic_work text,
>  paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC); 
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench 
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH 
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When doing the request {{SELECT dsp_code, unic_work, paying_net_qty FROM 
> test.resource_bench WHERE period_end_month_int = 201401}} using server-side 
> paging, I bumped into this stack trace:
> {noformat}
> WARN  [SharedPool-Worker-1] 2016-04-06 00:00:30,825 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
>   at 
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:106)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
>

[jira] [Commented] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230687#comment-15230687
 ] 

Pavel Yaskevich commented on CASSANDRA-11525:
-

I think we just use an incorrect serializer in some situations, e.g. when 
{{clustering order by}} is used; that would be the problem, because it can't 
properly deserialize the index entry. 

[~doanduyhai] It would be great if you could share the sstables again so I can 
reproduce locally.
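
To illustrate how a reader/writer layout mismatch can surface as a negative array index, here is a self-contained toy. It is an assumption about the general failure mode, not the actual Cassandra deserialization code: the enum, class and method names are hypothetical, and the byte value 0xC9 was picked only because it reads back as -55 when interpreted as a signed byte.

```java
import java.nio.ByteBuffer;

// Toy model: a record is a 1-byte kind ordinal followed by an 8-byte payload.
final class MismatchedLayoutDemo {
    enum Kind { START, ROW, END } // hypothetical, for illustration only

    /** Writer layout: kind ordinal (1 byte), then payload (8 bytes, big-endian). */
    static ByteBuffer write(Kind kind, long payload) {
        ByteBuffer out = ByteBuffer.allocate(9);
        out.put((byte) kind.ordinal()).putLong(payload);
        out.flip();
        return out;
    }

    /**
     * Reader that assumes `skip` bytes precede the kind byte.
     * With skip = 0 it matches the writer; any other value reads payload
     * bytes as if they were the kind ordinal.
     */
    static Kind readKind(ByteBuffer data, int skip) {
        return Kind.values()[data.get(data.position() + skip)];
    }

    public static void main(String[] args) {
        // The first payload byte is 0xC9, which is -55 as a signed byte.
        ByteBuffer data = write(Kind.ROW, 0xC9L << 56);
        System.out.println("matching layout: " + readKind(data, 0));
        try {
            readKind(data, 1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("mismatched layout: " + e);
        }
    }
}
```

The point of the toy is only that the bytes themselves can be intact while a reader with the wrong layout still fails, which would be consistent with the key being read successfully while the index entry is not.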

> SASI index corruption
> -
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench ( 
>  dsr_id uuid,
>  rel_seq bigint,
>  seq bigint,
>  dsp_code varchar,
>  model_code varchar,
>  media_code varchar,
>  transfer_code varchar,
>  commercial_offer_code varchar,
>  territory_code varchar,
>  period_end_month_int int,
>  authorized_societies_txt text,
>  rel_type text,
>  status text,
>  dsp_release_code text,
>  title text,
>  contributors_name list,
>  unic_work text,
>  paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC); 
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench 
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH 
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When doing the request {{SELECT dsp_code, unic_work, paying_net_qty FROM 
> test.resource_bench WHERE period_end_month_int = 201401}} using server-side 
> paging, I bumped into this stack trace:
> {noformat}
> WARN  [SharedPool-Worker-1] 2016-04-06 00:00:30,825 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
>   at 
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:106)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
>

[jira] [Updated] (CASSANDRA-11183) Enable SASI index for static columns

2016-04-04 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-11183:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Merged. Thanks again, [~doanduyhai]!

> Enable SASI index for static columns
> 
>
> Key: CASSANDRA-11183
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11183
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
>Priority: Minor
> Fix For: 3.6
>
> Attachments: CASSANDRA-11183-statics.patch, 
> patch_SASI_for_Static_FINAL_Review.txt
>
>
> This is a follow up ticket for post Cassandra 3.4 SASI integration.
> Since [CASSANDRA-8103] it is possible to index static columns, which is 
> *extremely useful* for some scenarios (find all sensors whose characteristics 
> are saved in static columns)
> /cc [~xedin] [~rustyrazorblade] [~jkrupan]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-11183) Enable SASI index for static columns

2016-04-04 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223900#comment-15223900
 ] 

Pavel Yaskevich commented on CASSANDRA-11183:
-

[~beobal] My commit also fixes SecondaryIndexTest for the LIKE operator, now that 
CASSANDRA-11434 has added support for prefix/eq in CONTAINS mode.

> Enable SASI index for static columns
> 
>
> Key: CASSANDRA-11183
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11183
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
>Priority: Minor
> Fix For: 3.6
>
> Attachments: CASSANDRA-11183-statics.patch, 
> patch_SASI_for_Static_FINAL_Review.txt
>
>
> This is a follow up ticket for post Cassandra 3.4 SASI integration.
> Since [CASSANDRA-8103] it is possible to index static columns, which is 
> *extremely useful* for some scenarios (find all sensors whose characteristics 
> are saved in static columns)
> /cc [~xedin] [~rustyrazorblade] [~jkrupan]




[jira] [Commented] (CASSANDRA-11183) Enable SASI index for static columns

2016-04-04 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223802#comment-15223802
 ] 

Pavel Yaskevich commented on CASSANDRA-11183:
-

Thanks, [~doanduyhai]! I've fixed a couple of styling errors, removed an old 
param from the JavaDoc for localSatisfiedBy, fixed ColumnIndex to flush 
staticRow, and added a flush to the SASI test to make sure that it works for 
both memtable and sstable (it was only testing memtable). Everything is pushed 
and I've kicked off a CI build.

||branch||testall||dtest||
|[CASSANDRA-11183|https://github.com/xedin/cassandra/tree/CASSANDRA-11183]|[testall|http://cassci.datastax.com/job/xedin-CASSANDRA-11183-testall/]|[dtest|http://cassci.datastax.com/job/xedin-CASSANDRA-11183-dtest/]|

> Enable SASI index for static columns
> 
>
> Key: CASSANDRA-11183
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11183
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
>Priority: Minor
> Fix For: 3.6
>
> Attachments: CASSANDRA-11183-statics.patch, 
> patch_SASI_for_Static_FINAL_Review.txt
>
>
> This is a follow up ticket for post Cassandra 3.4 SASI integration.
> Since [CASSANDRA-8103] it is possible to index static columns, which is 
> *extremely useful* for some scenarios (find all sensors whose characteristics 
> are saved in static columns)
> /cc [~xedin] [~rustyrazorblade] [~jkrupan]




[jira] [Updated] (CASSANDRA-11183) Enable SASI index for static columns

2016-04-03 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-11183:

Reviewer: Pavel Yaskevich

> Enable SASI index for static columns
> 
>
> Key: CASSANDRA-11183
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11183
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
>Priority: Minor
> Fix For: 3.6
>
> Attachments: CASSANDRA-11183-statics.patch, 
> patch_SASI_for_Static_AFTER_Review.txt
>
>
> This is a follow up ticket for post Cassandra 3.4 SASI integration.
> Since [CASSANDRA-8103] it is possible to index static columns, which is 
> *extremely useful* for some scenarios (find all sensors whose characteristics 
> are saved in static columns)
> /cc [~xedin] [~rustyrazorblade] [~jkrupan]




[jira] [Updated] (CASSANDRA-11183) Enable SASI index for static columns

2016-04-03 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-11183:

Component/s: (was: CQL)
 sasi

> Enable SASI index for static columns
> 
>
> Key: CASSANDRA-11183
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11183
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
>Priority: Minor
> Fix For: 3.6
>
> Attachments: CASSANDRA-11183-statics.patch, 
> patch_SASI_for_Static_AFTER_Review.txt
>
>
> This is a follow up ticket for post Cassandra 3.4 SASI integration.
> Since [CASSANDRA-8103] it is possible to index static columns, which is 
> *extremely useful* for some scenarios (find all sensors whose characteristics 
> are saved in static columns)
> /cc [~xedin] [~rustyrazorblade] [~jkrupan]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-11183) Enable SASI index for static columns

2016-04-03 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-11183:

Attachment: CASSANDRA-11183-statics.patch

[~doanduyhai] The latest patch does not apply to trunk, so you will have to 
rebase. I'm also attaching changes (CASSANDRA-11183-statics.patch) for 
Operation, ColumnIndex.getValueOf and SASIIndexBuilder that I think are less 
intrusive than splitting getValue into getValueOf and getStaticValueOf, 
which has to check the column kind anyway.

> Enable SASI index for static columns
> 
>
> Key: CASSANDRA-11183
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11183
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
>Priority: Minor
> Fix For: 3.6
>
> Attachments: CASSANDRA-11183-statics.patch, 
> patch_SASI_for_Static_AFTER_Review.txt
>
>
> This is a follow up ticket for post Cassandra 3.4 SASI integration.
> Since [CASSANDRA-8103] it is possible to index static columns, which is 
> *extremely useful* for some scenarios (find all sensors whose characteristics 
> are saved in static columns)
> /cc [~xedin] [~rustyrazorblade] [~jkrupan]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Resolved] (CASSANDRA-11434) Support EQ/PREFIX queries in CONTAINS mode without tokenization by augmenting SA metadata per term

2016-04-03 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich resolved CASSANDRA-11434.
-
Resolution: Fixed

+1, Committed.

> Support EQ/PREFIX queries in CONTAINS mode without tokenization by augmenting 
> SA metadata per term
> --
>
> Key: CASSANDRA-11434
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11434
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Pavel Yaskevich
>Assignee: Jordan West
> Fix For: 3.6
>
>
> We can support EQ/PREFIX requests to CONTAINS indexes by tracking 
> "partiality" of the data stored in the OnDiskIndex and IndexMemtable, if we 
> know exactly if current match represents part of the term or it's original 
> form it would be trivial to support EQ/PREFIX since PREFIX is subset of 
> SUFFIX matches.
> Since we attach uint16 size to each term stored we can take advantage of sign 
> bit so size of the index is not impacted at all.
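
The sign-bit idea in the description above can be sketched roughly as follows. This is only an illustration, assuming term lengths fit in 15 bits so that the top bit of the 16-bit size field is free to carry a "partial" flag. The class and method names (TermSize, encode, length, isPartial) are hypothetical and are not taken from the SASI on-disk format code.

```java
// Illustrative only: packs a "partial term" flag into the high bit of a
// 16-bit size field, so the flag costs no extra bytes per term.
final class TermSize {
    private static final int PARTIAL_BIT = 0x8000; // high (sign) bit of the 16-bit field
    private static final int LENGTH_MASK = 0x7FFF; // remaining 15 bits hold the length

    // Assumption: lengths never exceed 32767, otherwise the flag bit is not free.
    public static short encode(int length, boolean partial) {
        if (length < 0 || length > LENGTH_MASK)
            throw new IllegalArgumentException("length out of range: " + length);
        return (short) (partial ? (length | PARTIAL_BIT) : length);
    }

    public static int length(short encoded) {
        return encoded & LENGTH_MASK;
    }

    public static boolean isPartial(short encoded) {
        return (encoded & PARTIAL_BIT) != 0;
    }

    public static void main(String[] args) {
        short full = encode(6, false); // e.g. the original term "Shlomo"
        short part = encode(3, true);  // e.g. a suffix of it stored for CONTAINS
        System.out.println(length(full) + " " + isPartial(full)); // 6 false
        System.out.println(length(part) + " " + isPartial(part)); // 3 true
    }
}
```

With a flag like this, an EQ or PREFIX lookup against a CONTAINS index could skip entries marked partial and only consider terms stored in their original form.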



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (CASSANDRA-11183) Enable SASI index for static columns

2016-04-02 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223100#comment-15223100
 ] 

Pavel Yaskevich edited comment on CASSANDRA-11183 at 4/3/16 3:46 AM:
-

I think it would be better if, instead of a Set, the satisfiedBy and 
localSatisfiedBy methods took a static row, e.g. 
{{\{local\}SatisfiedBy(Unfiltered currentCluster, Row statics, boolean 
allowMissingColumns)}}, and {{ColumnIndex.getValueOf(ColumnDefinition, Row 
cluster, Row statics, int now)}} would just pick the correct row to get value 
data from by adding {{case STATIC}}.

Edit: forgot to mention that doing it the way the patch does breaks OR support; 
the only feasible fix I see is the approach I've described, which doesn't 
really distinguish between normal and static columns.
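
A rough sketch of the row-selection idea, for illustration only. The Column, Row and Kind types below are simplified, hypothetical stand-ins and not Cassandra's ColumnDefinition or Row classes; the only point is that a single lookup method can choose between the clustering row and the static row by switching on the column kind.

```java
import java.util.Map;

// Illustrative only: one getValueOf that picks the row by column kind,
// so callers do not need separate code paths for static columns.
final class StaticAwareLookup {
    enum Kind { REGULAR, STATIC }

    // Hypothetical stand-in for a column definition.
    static final class Column {
        final String name;
        final Kind kind;
        Column(String name, Kind kind) { this.name = name; this.kind = kind; }
    }

    // Hypothetical stand-in for a row: column name -> value.
    static final class Row {
        final Map<String, String> cells;
        Row(Map<String, String> cells) { this.cells = cells; }
        String get(Column c) { return cells.get(c.name); }
    }

    public static String getValueOf(Column column, Row row, Row statics) {
        switch (column.kind) {
            case STATIC:
                // Static columns live in the partition's static row.
                return statics == null ? null : statics.get(column);
            case REGULAR:
            default:
                return row == null ? null : row.get(column);
        }
    }

    public static void main(String[] args) {
        Row statics = new Row(Map.of("sensor_type", "thermal"));
        Row row = new Row(Map.of("reading", "21.5"));
        System.out.println(getValueOf(new Column("sensor_type", Kind.STATIC), row, statics)); // thermal
        System.out.println(getValueOf(new Column("reading", Kind.REGULAR), row, statics));    // 21.5
    }
}
```

Because the expression evaluation never has to know which kind of column it is looking at, the same AND/OR logic applies to both.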


was (Author: xedin):
I think it would be better if instead of Set satisfiesBy and 
localSatisfiesBy methods are to have staticRow instead e.g. 
{{\{local\}SatisfiedBy(Unfiltered currentCluster, Row statics, boolean 
allowMissingColumns)}} and {{ColumnIndex.getValueOf(ColumnDefinition, Row 
cluster, Row statics, int now)}} would just pick correct row to get value data 
from by adding {{case STATIC}}.

> Enable SASI index for static columns
> 
>
> Key: CASSANDRA-11183
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11183
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
>Priority: Minor
> Fix For: 3.6
>
> Attachments: patch_SASI_for_Static.txt
>
>
> This is a follow up ticket for post Cassandra 3.4 SASI integration.
> Since [CASSANDRA-8103] it is possible to index static columns, which is 
> *extremely useful* for some scenarios (find all sensors whose characteristics 
> are saved in static columns)
> /cc [~xedin] [~rustyrazorblade] [~jkrupan]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-11183) Enable SASI index for static columns

2016-04-02 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223100#comment-15223100
 ] 

Pavel Yaskevich commented on CASSANDRA-11183:
-

I think it would be better if, instead of a Set, the satisfiedBy and 
localSatisfiedBy methods took a static row, e.g. 
{{\{local\}SatisfiedBy(Unfiltered currentCluster, Row statics, boolean 
allowMissingColumns)}}, and {{ColumnIndex.getValue(ColumnDefinition, Row 
cluster, Row statics, int now)}} would just pick the correct row to get value 
data from by adding {{case STATIC}}.

> Enable SASI index for static columns
> 
>
> Key: CASSANDRA-11183
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11183
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
>Priority: Minor
> Fix For: 3.6
>
> Attachments: patch_SASI_for_Static.txt
>
>
> This is a follow up ticket for post Cassandra 3.4 SASI integration.
> Since [CASSANDRA-8103] it is possible to index static columns, which is 
> *extremely useful* for some scenarios (find all sensors whose characteristics 
> are saved in static columns)
> /cc [~xedin] [~rustyrazorblade] [~jkrupan]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (CASSANDRA-11183) Enable SASI index for static columns

2016-04-02 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223100#comment-15223100
 ] 

Pavel Yaskevich edited comment on CASSANDRA-11183 at 4/3/16 2:00 AM:
-

I think it would be better if, instead of a Set, the satisfiedBy and 
localSatisfiedBy methods took a static row, e.g. 
{{\{local\}SatisfiedBy(Unfiltered currentCluster, Row statics, boolean 
allowMissingColumns)}}, and {{ColumnIndex.getValueOf(ColumnDefinition, Row 
cluster, Row statics, int now)}} would just pick the correct row to get value 
data from by adding {{case STATIC}}.


was (Author: xedin):
I think it would be better if instead of Set satisfiesBy and 
localSatisfiesBy methods are to have staticRow instead e.g. 
{{\{local\}SatisfiedBy(Unfiltered currentCluster, Row statics, boolean 
allowMissingColumns)}} and {{ColumnIndex.getValue(ColumnDefinition, Row 
cluster, Row statics, int now)}} would just pick correct row to get value data 
from by adding {{case STATIC}}.

> Enable SASI index for static columns
> 
>
> Key: CASSANDRA-11183
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11183
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
>Priority: Minor
> Fix For: 3.6
>
> Attachments: patch_SASI_for_Static.txt
>
>
> This is a follow up ticket for post Cassandra 3.4 SASI integration.
> Since [CASSANDRA-8103] it is possible to index static columns, which is 
> *extremely useful* for some scenarios (find all sensors whose characteristics 
> are saved in static columns)
> /cc [~xedin] [~rustyrazorblade] [~jkrupan]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Resolved] (CASSANDRA-11389) Case sensitive in LIKE query althogh index created with false

2016-04-01 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich resolved CASSANDRA-11389.
-
Resolution: Not A Bug
  Assignee: Pavel Yaskevich

> Case sensitive in LIKE query althogh index created with false
> -
>
> Key: CASSANDRA-11389
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11389
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Alon Levi
>Assignee: Pavel Yaskevich
>Priority: Minor
>  Labels: sasi
> Fix For: 3.x
>
>
> I created an index on user's first name as following: 
> CREATE CUSTOM INDEX ON users (first_name) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex'
> with options = {
> 'mode' : 'CONTAINS',
> 'case_sensitive' : 'false'
> };
> This is the data I have in my table
> user_id                              | first_name | last_name
> -------------------------------------+------------+-----------
> daa312ae-ecdf-4eb4-b6e9-206e33e5ca24 | Shlomo     | Cohen
> ab38ce9d-2823-4e6a-994f-7783953baef1 | Elad       | Karakuli
> 5e8371a7-3ed9-479f-9e4b-e4a07c750b12 | Alon       | Levi
> ae85cdc0-5eb7-4f08-8e42-2abd89e327ed | Gil        | Elias
> Although i mentioned the option 'case_sensitive' : 'false'
> when I run this query : 
> select user_id, first_name from users where first_name LIKE '%shl%';
> The query returns no results.
> However, when I run this query :
> select user_id, first_name from users where first_name LIKE '%Shl%';
> The query returns the right results,
> and the strangest thing is when I run this query:
> select user_id, first_name from users where first_name LIKE 'shl%';
> suddenly the query is no more case sensitive and the results are fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-11456) support for PreparedStatement with LIKE

2016-03-31 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220466#comment-15220466
 ] 

Pavel Yaskevich commented on CASSANDRA-11456:
-

[~beobal] LGTM, +1.

> support for PreparedStatement with LIKE
> ---
>
> Key: CASSANDRA-11456
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11456
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: Pavel Yaskevich
>Assignee: Sam Tunnicliffe
>Priority: Minor
> Fix For: 3.6
>
>
> Using the Java driver for example:
> {code}
> PreparedStatement pst = session.prepare("select * from test.users where 
> first_name LIKE ?");
> BoundStatement bs = pst.bind("Jon%");
> {code}
> The first line fails with {{SyntaxError: line 1:47 mismatched input '?' 
> expecting STRING_LITERAL}} (which makes sense since it's how it's declared in 
> the grammar). Other operators declare the right-hand side value as a 
> {{Term.Raw}}, which can also be a bind marker.
> I think users will expect to be able to bind the argument this way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-11456) support for PreparedStatement with LIKE

2016-03-31 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-11456:

Assignee: Sam Tunnicliffe  (was: Pavel Yaskevich)
Reviewer: Pavel Yaskevich

> support for PreparedStatement with LIKE
> ---
>
> Key: CASSANDRA-11456
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11456
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: Pavel Yaskevich
>Assignee: Sam Tunnicliffe
>Priority: Minor
> Fix For: 3.6
>
>
> Using the Java driver for example:
> {code}
> PreparedStatement pst = session.prepare("select * from test.users where 
> first_name LIKE ?");
> BoundStatement bs = pst.bind("Jon%");
> {code}
> The first line fails with {{SyntaxError: line 1:47 mismatched input '?' 
> expecting STRING_LITERAL}} (which makes sense since it's how it's declared in 
> the grammar). Other operators declare the right-hand side value as a 
> {{Term.Raw}}, which can also be a bind marker.
> I think users will expect to be able to bind the argument this way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-11456) support for PreparedStatement with LIKE

2016-03-30 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219078#comment-15219078
 ] 

Pavel Yaskevich commented on CASSANDRA-11456:
-

If we wanted to go with the LikeRestriction changes, it would mean that we need 
to add a base Operator.LIKE, which would be converted into the appropriate type 
at statement execution time.

> support for PreparedStatement with LIKE
> ---
>
> Key: CASSANDRA-11456
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11456
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
>Priority: Minor
> Fix For: 3.6
>
>
> Using the Java driver for example:
> {code}
> PreparedStatement pst = session.prepare("select * from test.users where 
> first_name LIKE ?");
> BoundStatement bs = pst.bind("Jon%");
> {code}
> The first line fails with {{SyntaxError: line 1:47 mismatched input '?' 
> expecting STRING_LITERAL}} (which makes sense since it's how it's declared in 
> the grammar). Other operators declare the right-hand side value as a 
> {{Term.Raw}}, which can also be a bind marker.
> I think users will expect to be able to bind the argument this way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-11456) support for PreparedStatement with LIKE

2016-03-30 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-11456:

Issue Type: Improvement  (was: Bug)

> support for PreparedStatement with LIKE
> ---
>
> Key: CASSANDRA-11456
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11456
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
>Priority: Minor
> Fix For: 3.6
>
>
> Using the Java driver for example:
> {code}
> PreparedStatement pst = session.prepare("select * from test.users where 
> first_name LIKE ?");
> BoundStatement bs = pst.bind("Jon%");
> {code}
> The first line fails with {{SyntaxError: line 1:47 mismatched input '?' 
> expecting STRING_LITERAL}} (which makes sense since it's how it's declared in 
> the grammar). Other operators declare the right-hand side value as a 
> {{Term.Raw}}, which can also be a bind marker.
> I think users will expect to be able to bind the argument this way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-11456) support for PreparedStatement with LIKE

2016-03-30 Thread Pavel Yaskevich (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-11456:

Fix Version/s: (was: 3.5)
   3.6

> support for PreparedStatement with LIKE
> ---
>
> Key: CASSANDRA-11456
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11456
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
>Priority: Minor
> Fix For: 3.6
>
>
> Using the Java driver for example:
> {code}
> PreparedStatement pst = session.prepare("select * from test.users where 
> first_name LIKE ?");
> BoundStatement bs = pst.bind("Jon%");
> {code}
> The first line fails with {{SyntaxError: line 1:47 mismatched input '?' 
> expecting STRING_LITERAL}} (which makes sense since it's how it's declared in 
> the grammar). Other operators declare the right-hand side value as a 
> {{Term.Raw}}, which can also be a bind marker.
> I think users will expect to be able to bind the argument this way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-11456) support for PreparedStatement with LIKE

2016-03-30 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218661#comment-15218661
 ] 

Pavel Yaskevich commented on CASSANDRA-11456:
-

This is actually going to be pretty tough to implement since the actual type of 
the LIKE operator (LIKE_\{PREFIX, SUFFIX, ...\}) is determined by the value, and 
we can't get a value while building restrictions, which means we can't say 
whether the restriction matches an index etc. until it's already too late. 

I guess we could cheat in the LikeRestriction\#isSupportedBy(Index) method and 
just accept purely based on the column definition, then fail once the query is 
actually executed, since that's the only point at which we can properly 
determine the type of the LIKE statement.

Or, as an alternative, we could support something like {{SELECT * FROM X WHERE 
user_name LIKE '?%'}} so that only the value of the LIKE could be bound, but I'm 
not sure that's even feasible in CQL currently.

WDYT, [~beobal]?
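
To illustrate why the operator type depends on the bound value, here is a minimal sketch of deriving the LIKE kind from where the '%' wildcards sit. It assumes the kinds beyond PREFIX and SUFFIX are CONTAINS and MATCHES; those two names, the LikePattern class and the classify method are hypothetical, and the sketch ignores escaping entirely. It is not the restriction code in Cassandra.

```java
// Illustrative only: the kind of LIKE query can only be decided once the
// actual value is known, which for a bind marker is at execution time.
final class LikePattern {
    enum Kind { PREFIX, SUFFIX, CONTAINS, MATCHES }

    public static Kind classify(String value) {
        boolean leading = value.startsWith("%");
        // A lone "%" counts as a leading wildcard only, so it is rejected below.
        boolean trailing = value.length() > 1 && value.endsWith("%");
        int termLength = value.length() - (leading ? 1 : 0) - (trailing ? 1 : 0);
        if (termLength <= 0)
            throw new IllegalArgumentException("LIKE value has no term: '" + value + "'");

        if (leading && trailing) return Kind.CONTAINS; // '%term%'
        if (leading) return Kind.SUFFIX;               // '%term'
        if (trailing) return Kind.PREFIX;              // 'term%'
        return Kind.MATCHES;                           // 'term'
    }

    public static void main(String[] args) {
        System.out.println(classify("Jon%"));  // PREFIX
        System.out.println(classify("%shl%")); // CONTAINS
        System.out.println(classify("%son"));  // SUFFIX
        System.out.println(classify("Jon"));   // MATCHES
    }
}
```

With {{LIKE ?}} none of this can run while restrictions are being built, which is why the check against the index would have to be deferred to execution.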

> support for PreparedStatement with LIKE
> ---
>
> Key: CASSANDRA-11456
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11456
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
>Priority: Minor
> Fix For: 3.5
>
>
> Using the Java driver for example:
> {code}
> PreparedStatement pst = session.prepare("select * from test.users where 
> first_name LIKE ?");
> BoundStatement bs = pst.bind("Jon%");
> {code}
> The first line fails with {{SyntaxError: line 1:47 mismatched input '?' 
> expecting STRING_LITERAL}} (which makes sense since it's how it's declared in 
> the grammar). Other operators declare the right-hand side value as a 
> {{Term.Raw}}, which can also be a bind marker.
> I think users will expect to be able to bind the argument this way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-11389) Case sensitive in LIKE query althogh index created with false

2016-03-30 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217678#comment-15217678
 ] 

Pavel Yaskevich commented on CASSANDRA-11389:
-

Sorry, it's actually both, not either, and LIKE 'shl%' (which is a prefix 
query) is not currently allowed with a CONTAINS index.

{noformat}
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.6-SNAPSHOT | CQL spec 3.4.0 | Native protocol v4]
Use HELP for help.
cqlsh> create KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': 1};
cqlsh> use test;
cqlsh:test> help create_table;
cqlsh:test> create table users (user_id uuid PRIMARY KEY, first_name text, 
last_name text);
cqlsh:test> CREATE CUSTOM INDEX ON users (first_name) USING 
'org.apache.cassandra.index.sasi.SASIIndex'
... with options = { 'mode': 'CONTAINS', 'analyzed': 'true', 
'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive' : 'false' };
cqlsh:test> help INSERT;
cqlsh:test> INSERT INTO users (user_id, first_name, last_name) VALUES 
(daa312ae-ecdf-4eb4-b6e9-206e33e5ca24, 'Shlomo', 'Cohen');
cqlsh:test> INSERT INTO users (user_id, first_name, last_name) VALUES 
(ab38ce9d-2823-4e6a-994f-7783953baef1, 'Elad', 'Karakuli');
cqlsh:test> INSERT INTO users (user_id, first_name, last_name) VALUES 
(5e8371a7-3ed9-479f-9e4b-e4a07c750b12, 'Alon', 'Levi');
cqlsh:test> INSERT INTO users (user_id, first_name, last_name) VALUES 
(ae85cdc0-5eb7-4f08-8e42-2abd89e327ed, 'Gil', 'Elias');
cqlsh:test> select user_id, first_name from users where first_name LIKE '%shl%';

 user_id                              | first_name
--------------------------------------+------------
 daa312ae-ecdf-4eb4-b6e9-206e33e5ca24 | Shlomo

(1 rows)
cqlsh:test> select user_id, first_name from users where first_name LIKE '%Shl%';

 user_id                              | first_name
--------------------------------------+------------
 daa312ae-ecdf-4eb4-b6e9-206e33e5ca24 | Shlomo

(1 rows)
cqlsh:test> select user_id, first_name from users where first_name LIKE 'shl%';
InvalidRequest: code=2200 [Invalid query] message="first_name LIKE '%' 
restriction is only supported on properly indexed columns"
cqlsh:test>
{noformat}

> Case sensitive in LIKE query althogh index created with false
> -
>
> Key: CASSANDRA-11389
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11389
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Alon Levi
>Priority: Minor
>  Labels: sasi
> Fix For: 3.x
>
>
> I created an index on user's first name as following: 
> CREATE CUSTOM INDEX ON users (first_name) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex'
> with options = {
> 'mode' : 'CONTAINS',
> 'case_sensitive' : 'false'
> };
> This is the data I have in my table
> user_id                              | first_name | last_name
> -------------------------------------+------------+-----------
> daa312ae-ecdf-4eb4-b6e9-206e33e5ca24 | Shlomo     | Cohen
> ab38ce9d-2823-4e6a-994f-7783953baef1 | Elad       | Karakuli
> 5e8371a7-3ed9-479f-9e4b-e4a07c750b12 | Alon       | Levi
> ae85cdc0-5eb7-4f08-8e42-2abd89e327ed | Gil        | Elias
> Although i mentioned the option 'case_sensitive' : 'false'
> when I run this query : 
> select user_id, first_name from users where first_name LIKE '%shl%';
> The query returns no results.
> However, when I run this query :
> select user_id, first_name from users where first_name LIKE '%Shl%';
> The query returns the right results,
> and the strangest thing is when I run this query:
> select user_id, first_name from users where first_name LIKE 'shl%';
> suddenly the query is no more case sensitive and the results are fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-11389) Case sensitive in LIKE query althogh index created with false

2016-03-29 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217187#comment-15217187
 ] 

Pavel Yaskevich commented on CASSANDRA-11389:
-

I think what is going on here is that "case_sensitive" is a feature of the 
analyzer, and indexes are not analyzed by default; that's why the index returns 
no results, since that flag is simply ignored. To fix this you should set either 
"analyzed": "true" or 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer' in the index 
options.
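
As a toy model of the behaviour described above (not SASI code), the sketch below applies case folding only when an analyzer actually runs. The class and method names are hypothetical; the point is just that a case_sensitive flag has no effect if nothing normalizes the stored term and the query term.

```java
import java.util.Locale;

// Illustrative only: the case_sensitive flag is honoured solely when the
// index is analyzed; otherwise terms are compared exactly as written.
final class CaseFoldingSketch {
    static String normalize(String term, boolean analyzed, boolean caseSensitive) {
        if (analyzed && !caseSensitive)
            return term.toLowerCase(Locale.ROOT);
        return term; // no analyzer: the flag is silently ignored
    }

    public static boolean containsMatch(String stored, String query,
                                        boolean analyzed, boolean caseSensitive) {
        return normalize(stored, analyzed, caseSensitive)
                .contains(normalize(query, analyzed, caseSensitive));
    }

    public static void main(String[] args) {
        // Not analyzed: 'case_sensitive': 'false' does nothing, so "shl" misses "Shlomo".
        System.out.println(containsMatch("Shlomo", "shl", false, false)); // false
        System.out.println(containsMatch("Shlomo", "Shl", false, false)); // true
        // Analyzed: both sides are lower-cased before comparison.
        System.out.println(containsMatch("Shlomo", "shl", true, false));  // true
    }
}
```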

> Case sensitive in LIKE query althogh index created with false
> -
>
> Key: CASSANDRA-11389
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11389
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Alon Levi
>Priority: Minor
>  Labels: sasi
> Fix For: 3.x
>
>
> I created an index on user's first name as following: 
> CREATE CUSTOM INDEX ON users (first_name) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex'
> with options = {
> 'mode' : 'CONTAINS',
> 'case_sensitive' : 'false'
> };
> This is the data I have in my table
> user_id                              | first_name | last_name
> -------------------------------------+------------+-----------
> daa312ae-ecdf-4eb4-b6e9-206e33e5ca24 | Shlomo     | Cohen
> ab38ce9d-2823-4e6a-994f-7783953baef1 | Elad       | Karakuli
> 5e8371a7-3ed9-479f-9e4b-e4a07c750b12 | Alon       | Levi
> ae85cdc0-5eb7-4f08-8e42-2abd89e327ed | Gil        | Elias
> Although i mentioned the option 'case_sensitive' : 'false'
> when I run this query : 
> select user_id, first_name from users where first_name LIKE '%shl%';
> The query returns no results.
> However, when I run this query :
> select user_id, first_name from users where first_name LIKE '%Shl%';
> The query returns the right results,
> and the strangest thing is when I run this query:
> select user_id, first_name from users where first_name LIKE 'shl%';
> suddenly the query is no more case sensitive and the results are fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (CASSANDRA-11389) Case sensitive in LIKE query althogh index created with false

2016-03-29 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217187#comment-15217187
 ] 

Pavel Yaskevich edited comment on CASSANDRA-11389 at 3/30/16 1:14 AM:
--

I think what is going on here is that "case_sensitive" is a feature of the 
analyzer, and indexes are not analyzed by default; that's why the index returns 
no results, since that flag is simply ignored. To fix this you should set either 
"analyzed": "true" or 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer' in the index 
options.


was (Author: xedin):
I think what is going on here is that "case_sensetive" is a feature of 
analyzer, indexes are not analyzed by default that's why index returns no 
results since that flag is simply ignored. To fix this you should set - either 
"analyzed": "true" or ‘analyzer_class’: 
‘org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer’ in the index 
options.

> Case sensitive in LIKE query althogh index created with false
> -
>
> Key: CASSANDRA-11389
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11389
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Alon Levi
>Priority: Minor
>  Labels: sasi
> Fix For: 3.x
>
>
> I created an index on user's first name as following: 
> CREATE CUSTOM INDEX ON users (first_name) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex'
> with options = {
> 'mode' : 'CONTAINS',
> 'case_sensitive' : 'false'
> };
> This is the data I have in my table
> user_id                              | first_name | last_name
> -------------------------------------+------------+-----------
> daa312ae-ecdf-4eb4-b6e9-206e33e5ca24 | Shlomo     | Cohen
> ab38ce9d-2823-4e6a-994f-7783953baef1 | Elad       | Karakuli
> 5e8371a7-3ed9-479f-9e4b-e4a07c750b12 | Alon       | Levi
> ae85cdc0-5eb7-4f08-8e42-2abd89e327ed | Gil        | Elias
> Although i mentioned the option 'case_sensitive' : 'false'
> when I run this query : 
> select user_id, first_name from users where first_name LIKE '%shl%';
> The query returns no results.
> However, when I run this query :
> select user_id, first_name from users where first_name LIKE '%Shl%';
> The query returns the right results,
> and the strangest thing is when I run this query:
> select user_id, first_name from users where first_name LIKE 'shl%';
> suddenly the query is no more case sensitive and the results are fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax

2016-03-29 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216616#comment-15216616
 ] 

Pavel Yaskevich commented on CASSANDRA-11067:
-

Definitely sounds like a bug, I've created CASSANDRA-11456 to track that.

> Improve SASI syntax
> ---
>
> Key: CASSANDRA-11067
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11067
> Project: Cassandra
>  Issue Type: Task
>  Components: CQL
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
>  Labels: client-impacting
> Fix For: 3.4
>
>
> I think everyone agrees that a LIKE operator would be ideal, but that's 
> probably not in scope for an initial 3.4 release.
> Still, I'm uncomfortable with the initial approach of overloading = to mean 
> "satisfies index expression."  The problem is that it will be very difficult 
> to back out of this behavior once people are using it.
> I propose adding a new operator in the interim instead.  Call it MATCHES, 
> maybe.  With the exact same behavior that SASI currently exposes, just with a 
> separate operator rather than being rolled into =.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
