[jira] [Commented] (CASSANDRA-12674) [SASI] Confusing AND/OR semantics for StandardAnalyzer

2018-04-06 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428229#comment-16428229
 ] 

DOAN DuyHai commented on CASSANDRA-12674:
-

Look like this ticket id dead.

Because the fix for the bug is far from trivial I don't see any solution soon, 
or ever 

> [SASI] Confusing AND/OR semantics for StandardAnalyzer 
> ---
>
> Key: CASSANDRA-12674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12674
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.7
>Reporter: DOAN DuyHai
>Assignee: Alex Petrov
>Priority: Major
>
> {code:sql}
> Connected to Test Cluster at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]
> Use HELP for help.
> cqlsh> use test;
> cqlsh:test> CREATE TABLE sasi_bug(id int, clustering int, val text, PRIMARY 
> KEY((id), clustering));
> cqlsh:test> CREATE CUSTOM INDEX ON sasi_bug(val) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
> 'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
> 'analyzed': 'true'};
> //1st example SAME PARTITION KEY
> cqlsh:test> INSERT INTO sasi_bug(id, clustering , val ) VALUES(1, 1, 
> 'homeworker');
> cqlsh:test> INSERT INTO sasi_bug(id, clustering , val ) VALUES(1, 2, 
> 'hardworker');
> cqlsh:test> SELECT * FROM sasi_bug WHERE val LIKE '%work home%';
>  id | clustering | val
> ++
>   1 |  1 | homeworker
>   1 |  2 | hardworker
> (2 rows)
> //2nd example DIFFERENT PARTITION KEY
> cqlsh:test> INSERT INTO sasi_bug(id, clustering, val) VALUES(10, 1, 
> 'speedrun');
> cqlsh:test> INSERT INTO sasi_bug(id, clustering, val) VALUES(11, 1, 
> 'longrun');
> cqlsh:test> SELECT * FROM sasi_bug WHERE val LIKE '%long run%';
>  id | clustering | val
> ++-
>  11 |  1 | longrun
> (1 rows)
> {code}
> In the 1st example, both rows belong to the same partition so SASI returns 
> both values. Indeed {{LIKE '%work home%'}} means {{contains 'work' OR 
> 'home'}} so the result makes sense
> In the 2nd example, only one row is returned whereas we expect 2 rows because 
> {{LIKE '%long run%'}} means {{contains 'long' OR 'run'}} so *speedrun* should 
> be returned too.
> So where is the problem ? Explanation:
> When there is only 1 predicate, the root operation type is an *AND*:
> {code:java|title=QueryPlan}
> private Operation analyze()
> {
> try
> {
> Operation.Builder and = new Operation.Builder(OperationType.AND, 
> controller);
> controller.getExpressions().forEach(and::add);
> return and.complete();
> }
>...
> }
> {code}
> During the parsing of {{LIKE '%long run%'}}, SASI creates 2 expressions for 
> the searched term: {{long}} and {{run}}, which corresponds to an *OR* logic. 
> However, this piece of code just ruins the *OR* logic:
> {code:java|title=Operation}
> public Operation complete()
> {
> if (!expressions.isEmpty())
> {
> ListMultimap 
> analyzedExpressions = analyzeGroup(controller, op, expressions);
> RangeIterator.Builder range = 
> controller.getIndexes(op, analyzedExpressions.values());
>  ...
> }
> {code}
> As you can see, we blindly take all the *values* of the MultiMap (which 
> contains a single entry for the {{val}} column with 2 expressions) and pass 
> it to {{controller.getIndexes(...)}}
> {code:java|title=QueryController}
> public RangeIterator.Builder getIndexes(OperationType op, 
> Collection expressions)
> {
> if (resources.containsKey(expressions))
> throw new IllegalArgumentException("Can't process the same 
> expressions multiple times.");
> RangeIterator.Builder builder = op == OperationType.OR
> ? RangeUnionIterator. Token>builder()
> : 
> RangeIntersectionIterator.builder();
> ...
> }
> {code}
> And because the root operation has *AND* type, the 
> {{RangeIntersectionIterator}} will be used on both expressions {{long}} and 
> {{run}}.
> So when data belong to different partitions, we have the *AND* logic that 
> applies and eliminates _speedrun_
> When data belong to the same partition but different row, the 
> {{RangeIntersectionIterator}} returns a single partition and then the rows 
> are filtered further by {{operationTree.satisfiedBy}} and the results are 
> correct
> {code:java|title=QueryPlan}
> while (currentKeys.hasNext())
> {
>

[jira] [Commented] (CASSANDRA-12674) [SASI] Confusing AND/OR semantics for StandardAnalyzer

2017-01-19 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830351#comment-15830351
 ] 

Alex Petrov commented on CASSANDRA-12674:
-

[~xedin] sure! I was working on finalising and testing [CASSANDRA-11990], so 
didn't get a chance continue on this one. Hope to get to it soon. 

> [SASI] Confusing AND/OR semantics for StandardAnalyzer 
> ---
>
> Key: CASSANDRA-12674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12674
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.7
>Reporter: DOAN DuyHai
>
> {code:sql}
> Connected to Test Cluster at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]
> Use HELP for help.
> cqlsh> use test;
> cqlsh:test> CREATE TABLE sasi_bug(id int, clustering int, val text, PRIMARY 
> KEY((id), clustering));
> cqlsh:test> CREATE CUSTOM INDEX ON sasi_bug(val) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
> 'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
> 'analyzed': 'true'};
> //1st example SAME PARTITION KEY
> cqlsh:test> INSERT INTO sasi_bug(id, clustering , val ) VALUES(1, 1, 
> 'homeworker');
> cqlsh:test> INSERT INTO sasi_bug(id, clustering , val ) VALUES(1, 2, 
> 'hardworker');
> cqlsh:test> SELECT * FROM sasi_bug WHERE val LIKE '%work home%';
>  id | clustering | val
> ++
>   1 |  1 | homeworker
>   1 |  2 | hardworker
> (2 rows)
> //2nd example DIFFERENT PARTITION KEY
> cqlsh:test> INSERT INTO sasi_bug(id, clustering, val) VALUES(10, 1, 
> 'speedrun');
> cqlsh:test> INSERT INTO sasi_bug(id, clustering, val) VALUES(11, 1, 
> 'longrun');
> cqlsh:test> SELECT * FROM sasi_bug WHERE val LIKE '%long run%';
>  id | clustering | val
> ++-
>  11 |  1 | longrun
> (1 rows)
> {code}
> In the 1st example, both rows belong to the same partition so SASI returns 
> both values. Indeed {{LIKE '%work home%'}} means {{contains 'work' OR 
> 'home'}} so the result makes sense
> In the 2nd example, only one row is returned whereas we expect 2 rows because 
> {{LIKE '%long run%'}} means {{contains 'long' OR 'run'}} so *speedrun* should 
> be returned too.
> So where is the problem ? Explanation:
> When there is only 1 predicate, the root operation type is an *AND*:
> {code:java|title=QueryPlan}
> private Operation analyze()
> {
> try
> {
> Operation.Builder and = new Operation.Builder(OperationType.AND, 
> controller);
> controller.getExpressions().forEach(and::add);
> return and.complete();
> }
>...
> }
> {code}
> During the parsing of {{LIKE '%long run%'}}, SASI creates 2 expressions for 
> the searched term: {{long}} and {{run}}, which corresponds to an *OR* logic. 
> However, this piece of code just ruins the *OR* logic:
> {code:java|title=Operation}
> public Operation complete()
> {
> if (!expressions.isEmpty())
> {
> ListMultimap 
> analyzedExpressions = analyzeGroup(controller, op, expressions);
> RangeIterator.Builder range = 
> controller.getIndexes(op, analyzedExpressions.values());
>  ...
> }
> {code}
> As you can see, we blindly take all the *values* of the MultiMap (which 
> contains a single entry for the {{val}} column with 2 expressions) and pass 
> it to {{controller.getIndexes(...)}}
> {code:java|title=QueryController}
> public RangeIterator.Builder getIndexes(OperationType op, 
> Collection expressions)
> {
> if (resources.containsKey(expressions))
> throw new IllegalArgumentException("Can't process the same 
> expressions multiple times.");
> RangeIterator.Builder builder = op == OperationType.OR
> ? RangeUnionIterator. Token>builder()
> : 
> RangeIntersectionIterator.builder();
> ...
> }
> {code}
> And because the root operation has *AND* type, the 
> {{RangeIntersectionIterator}} will be used on both expressions {{long}} and 
> {{run}}.
> So when data belong to different partitions, we have the *AND* logic that 
> applies and eliminates _speedrun_
> When data belong to the same partition but different row, the 
> {{RangeIntersectionIterator}} returns a single partition and then the rows 
> are filtered further by {{operationTree.satisfiedBy}} and the results are 
> correct
> {code:java|title=QueryPlan}
> while (currentKeys.hasNext())
> {
> DecoratedKey key = 

[jira] [Commented] (CASSANDRA-12674) [SASI] Confusing AND/OR semantics for StandardAnalyzer

2017-01-18 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829060#comment-15829060
 ] 

Pavel Yaskevich commented on CASSANDRA-12674:
-

[~ifesdjeen] Can you please take a look at this one?

> [SASI] Confusing AND/OR semantics for StandardAnalyzer 
> ---
>
> Key: CASSANDRA-12674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12674
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.7
>Reporter: DOAN DuyHai
>
> {code:sql}
> Connected to Test Cluster at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]
> Use HELP for help.
> cqlsh> use test;
> cqlsh:test> CREATE TABLE sasi_bug(id int, clustering int, val text, PRIMARY 
> KEY((id), clustering));
> cqlsh:test> CREATE CUSTOM INDEX ON sasi_bug(val) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
> 'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
> 'analyzed': 'true'};
> //1st example SAME PARTITION KEY
> cqlsh:test> INSERT INTO sasi_bug(id, clustering , val ) VALUES(1, 1, 
> 'homeworker');
> cqlsh:test> INSERT INTO sasi_bug(id, clustering , val ) VALUES(1, 2, 
> 'hardworker');
> cqlsh:test> SELECT * FROM sasi_bug WHERE val LIKE '%work home%';
>  id | clustering | val
> ++
>   1 |  1 | homeworker
>   1 |  2 | hardworker
> (2 rows)
> //2nd example DIFFERENT PARTITION KEY
> cqlsh:test> INSERT INTO sasi_bug(id, clustering, val) VALUES(10, 1, 
> 'speedrun');
> cqlsh:test> INSERT INTO sasi_bug(id, clustering, val) VALUES(11, 1, 
> 'longrun');
> cqlsh:test> SELECT * FROM sasi_bug WHERE val LIKE '%long run%';
>  id | clustering | val
> ++-
>  11 |  1 | longrun
> (1 rows)
> {code}
> In the 1st example, both rows belong to the same partition so SASI returns 
> both values. Indeed {{LIKE '%work home%'}} means {{contains 'work' OR 
> 'home'}} so the result makes sense
> In the 2nd example, only one row is returned whereas we expect 2 rows because 
> {{LIKE '%long run%'}} means {{contains 'long' OR 'run'}} so *speedrun* should 
> be returned too.
> So where is the problem ? Explanation:
> When there is only 1 predicate, the root operation type is an *AND*:
> {code:java|title=QueryPlan}
> private Operation analyze()
> {
> try
> {
> Operation.Builder and = new Operation.Builder(OperationType.AND, 
> controller);
> controller.getExpressions().forEach(and::add);
> return and.complete();
> }
>...
> }
> {code}
> During the parsing of {{LIKE '%long run%'}}, SASI creates 2 expressions for 
> the searched term: {{long}} and {{run}}, which corresponds to an *OR* logic. 
> However, this piece of code just ruins the *OR* logic:
> {code:java|title=Operation}
> public Operation complete()
> {
> if (!expressions.isEmpty())
> {
> ListMultimap 
> analyzedExpressions = analyzeGroup(controller, op, expressions);
> RangeIterator.Builder range = 
> controller.getIndexes(op, analyzedExpressions.values());
>  ...
> }
> {code}
> As you can see, we blindly take all the *values* of the MultiMap (which 
> contains a single entry for the {{val}} column with 2 expressions) and pass 
> it to {{controller.getIndexes(...)}}
> {code:java|title=QueryController}
> public RangeIterator.Builder getIndexes(OperationType op, 
> Collection expressions)
> {
> if (resources.containsKey(expressions))
> throw new IllegalArgumentException("Can't process the same 
> expressions multiple times.");
> RangeIterator.Builder builder = op == OperationType.OR
> ? RangeUnionIterator. Token>builder()
> : 
> RangeIntersectionIterator.builder();
> ...
> }
> {code}
> And because the root operation has *AND* type, the 
> {{RangeIntersectionIterator}} will be used on both expressions {{long}} and 
> {{run}}.
> So when data belong to different partitions, we have the *AND* logic that 
> applies and eliminates _speedrun_
> When data belong to the same partition but different row, the 
> {{RangeIntersectionIterator}} returns a single partition and then the rows 
> are filtered further by {{operationTree.satisfiedBy}} and the results are 
> correct
> {code:java|title=QueryPlan}
> while (currentKeys.hasNext())
> {
> DecoratedKey key = currentKeys.next();
> if (!keyRange.right.isMinimum() && 
>