[jira] [Updated] (CASSANDRA-12078) [SASI] Move skip_stop_words filter BEFORE stemming

2016-06-26 Thread Pavel Yaskevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-12078:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed, thanks [~doanduyhai]! I've removed already committed part from the 
patch and included only change for {{StemmingFilters}} and tests.

> [SASI] Move skip_stop_words filter BEFORE stemming
> --
>
> Key: CASSANDRA-12078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12078
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.7, Cassandra 3.8
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Fix For: 3.8
>
> Attachments: patch.txt, patch_V2.txt
>
>
> Right now, if skip stop words and stemming are enabled, SASI will put 
> stemming in the filter pipeline BEFORE skip_stop_words:
> {code:java}
> private FilterPipelineTask getFilterPipeline()
> {
> FilterPipelineBuilder builder = new FilterPipelineBuilder(new 
> BasicResultFilters.NoOperation());
>  ...
> if (options.shouldStemTerms())
> builder = builder.add("term_stemming", new 
> StemmingFilters.DefaultStemmingFilter(options.getLocale()));
> if (options.shouldIgnoreStopTerms())
> builder = builder.add("skip_stop_words", new 
> StopWordFilters.DefaultStopWordFilter(options.getLocale()));
> return builder.build();
> }
> {code}
> The problem is that stemming before removing stop words can yield wrong 
> results.
> I have an example:
> {code:sql}
> SELECT * FROM music.albums WHERE country='France' AND title LIKE 'danse' 
> ALLOW FILTERING;
> {code}
> Because of stemming *danse* ( *dance* in English) becomes *dans* (the final 
> vowel is removed). Then skip stop words is applied. Unfortunately *dans* 
> (*in* in English) is a stop word in French so it is removed completely.
> In the end the query is equivalent to {{SELECT * FROM music.albums WHERE 
> country='France'}} and of course the results are wrong.
> Attached is a trivial patch to move the skip_stop_words filter BEFORE 
> stemming filter
> /cc [~xedin] [~jrwest] [~beobal]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12078) [SASI] Move skip_stop_words filter BEFORE stemming

2016-06-26 Thread DOAN DuyHai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DOAN DuyHai updated CASSANDRA-12078:

Attachment: patch_V2.txt

Attached is {{patch_V2.txt}}

> [SASI] Move skip_stop_words filter BEFORE stemming
> --
>
> Key: CASSANDRA-12078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12078
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.7, Cassandra 3.8
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Fix For: 3.8
>
> Attachments: patch.txt, patch_V2.txt
>
>
> Right now, if skip stop words and stemming are enabled, SASI will put 
> stemming in the filter pipeline BEFORE skip_stop_words:
> {code:java}
> private FilterPipelineTask getFilterPipeline()
> {
> FilterPipelineBuilder builder = new FilterPipelineBuilder(new 
> BasicResultFilters.NoOperation());
>  ...
> if (options.shouldStemTerms())
> builder = builder.add("term_stemming", new 
> StemmingFilters.DefaultStemmingFilter(options.getLocale()));
> if (options.shouldIgnoreStopTerms())
> builder = builder.add("skip_stop_words", new 
> StopWordFilters.DefaultStopWordFilter(options.getLocale()));
> return builder.build();
> }
> {code}
> The problem is that stemming before removing stop words can yield wrong 
> results.
> I have an example:
> {code:sql}
> SELECT * FROM music.albums WHERE country='France' AND title LIKE 'danse' 
> ALLOW FILTERING;
> {code}
> Because of stemming *danse* ( *dance* in English) becomes *dans* (the final 
> vowel is removed). Then skip stop words is applied. Unfortunately *dans* 
> (*in* in English) is a stop word in French so it is removed completely.
> In the end the query is equivalent to {{SELECT * FROM music.albums WHERE 
> country='France'}} and of course the results are wrong.
> Attached is a trivial patch to move the skip_stop_words filter BEFORE 
> stemming filter
> /cc [~xedin] [~jrwest] [~beobal]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12078) [SASI] Move skip_stop_words filter BEFORE stemming

2016-06-26 Thread DOAN DuyHai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DOAN DuyHai updated CASSANDRA-12078:

Status: Patch Available  (was: Reopened)

> [SASI] Move skip_stop_words filter BEFORE stemming
> --
>
> Key: CASSANDRA-12078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12078
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.7, Cassandra 3.8
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Fix For: 3.8
>
> Attachments: patch.txt
>
>
> Right now, if skip stop words and stemming are enabled, SASI will put 
> stemming in the filter pipeline BEFORE skip_stop_words:
> {code:java}
> private FilterPipelineTask getFilterPipeline()
> {
> FilterPipelineBuilder builder = new FilterPipelineBuilder(new 
> BasicResultFilters.NoOperation());
>  ...
> if (options.shouldStemTerms())
> builder = builder.add("term_stemming", new 
> StemmingFilters.DefaultStemmingFilter(options.getLocale()));
> if (options.shouldIgnoreStopTerms())
> builder = builder.add("skip_stop_words", new 
> StopWordFilters.DefaultStopWordFilter(options.getLocale()));
> return builder.build();
> }
> {code}
> The problem is that stemming before removing stop words can yield wrong 
> results.
> I have an example:
> {code:sql}
> SELECT * FROM music.albums WHERE country='France' AND title LIKE 'danse' 
> ALLOW FILTERING;
> {code}
> Because of stemming *danse* ( *dance* in English) becomes *dans* (the final 
> vowel is removed). Then skip stop words is applied. Unfortunately *dans* 
> (*in* in English) is a stop word in French so it is removed completely.
> In the end the query is equivalent to {{SELECT * FROM music.albums WHERE 
> country='France'}} and of course the results are wrong.
> Attached is a trivial patch to move the skip_stop_words filter BEFORE 
> stemming filter
> /cc [~xedin] [~jrwest] [~beobal]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12078) [SASI] Move skip_stop_words filter BEFORE stemming

2016-06-23 Thread Pavel Yaskevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-12078:

   Resolution: Fixed
Fix Version/s: (was: 3.7)
   3.8
   Status: Resolved  (was: Patch Available)

Committed.

> [SASI] Move skip_stop_words filter BEFORE stemming
> --
>
> Key: CASSANDRA-12078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12078
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.7, Cassandra 3.8
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Fix For: 3.8
>
> Attachments: patch.txt
>
>
> Right now, if skip stop words and stemming are enabled, SASI will put 
> stemming in the filter pipeline BEFORE skip_stop_words:
> {code:java}
> private FilterPipelineTask getFilterPipeline()
> {
> FilterPipelineBuilder builder = new FilterPipelineBuilder(new 
> BasicResultFilters.NoOperation());
>  ...
> if (options.shouldStemTerms())
> builder = builder.add("term_stemming", new 
> StemmingFilters.DefaultStemmingFilter(options.getLocale()));
> if (options.shouldIgnoreStopTerms())
> builder = builder.add("skip_stop_words", new 
> StopWordFilters.DefaultStopWordFilter(options.getLocale()));
> return builder.build();
> }
> {code}
> The problem is that stemming before removing stop words can yield wrong 
> results.
> I have an example:
> {code:sql}
> SELECT * FROM music.albums WHERE country='France' AND title LIKE 'danse' 
> ALLOW FILTERING;
> {code}
> Because of stemming *danse* ( *dance* in English) becomes *dans* (the final 
> vowel is removed). Then skip stop words is applied. Unfortunately *dans* 
> (*in* in English) is a stop word in French so it is removed completely.
> In the end the query is equivalent to {{SELECT * FROM music.albums WHERE 
> country='France'}} and of course the results are wrong.
> Attached is a trivial patch to move the skip_stop_words filter BEFORE 
> stemming filter
> /cc [~xedin] [~jrwest] [~beobal]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12078) [SASI] Move skip_stop_words filter BEFORE stemming

2016-06-23 Thread Pavel Yaskevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-12078:

Issue Type: Bug  (was: Improvement)

> [SASI] Move skip_stop_words filter BEFORE stemming
> --
>
> Key: CASSANDRA-12078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12078
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.7, Cassandra 3.8
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Fix For: 3.7
>
> Attachments: patch.txt
>
>
> Right now, if skip stop words and stemming are enabled, SASI will put 
> stemming in the filter pipeline BEFORE skip_stop_words:
> {code:java}
> private FilterPipelineTask getFilterPipeline()
> {
> FilterPipelineBuilder builder = new FilterPipelineBuilder(new 
> BasicResultFilters.NoOperation());
>  ...
> if (options.shouldStemTerms())
> builder = builder.add("term_stemming", new 
> StemmingFilters.DefaultStemmingFilter(options.getLocale()));
> if (options.shouldIgnoreStopTerms())
> builder = builder.add("skip_stop_words", new 
> StopWordFilters.DefaultStopWordFilter(options.getLocale()));
> return builder.build();
> }
> {code}
> The problem is that stemming before removing stop words can yield wrong 
> results.
> I have an example:
> {code:sql}
> SELECT * FROM music.albums WHERE country='France' AND title LIKE 'danse' 
> ALLOW FILTERING;
> {code}
> Because of stemming *danse* ( *dance* in English) becomes *dans* (the final 
> vowel is removed). Then skip stop words is applied. Unfortunately *dans* 
> (*in* in English) is a stop word in French so it is removed completely.
> In the end the query is equivalent to {{SELECT * FROM music.albums WHERE 
> country='France'}} and of course the results are wrong.
> Attached is a trivial patch to move the skip_stop_words filter BEFORE 
> stemming filter
> /cc [~xedin] [~jrwest] [~beobal]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12078) [SASI] Move skip_stop_words filter BEFORE stemming

2016-06-23 Thread Pavel Yaskevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-12078:

Component/s: (was: CQL)
 sasi

> [SASI] Move skip_stop_words filter BEFORE stemming
> --
>
> Key: CASSANDRA-12078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12078
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
> Environment: Cassandra 3.7, Cassandra 3.8
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Fix For: 3.7
>
> Attachments: patch.txt
>
>
> Right now, if skip stop words and stemming are enabled, SASI will put 
> stemming in the filter pipeline BEFORE skip_stop_words:
> {code:java}
> private FilterPipelineTask getFilterPipeline()
> {
> FilterPipelineBuilder builder = new FilterPipelineBuilder(new 
> BasicResultFilters.NoOperation());
>  ...
> if (options.shouldStemTerms())
> builder = builder.add("term_stemming", new 
> StemmingFilters.DefaultStemmingFilter(options.getLocale()));
> if (options.shouldIgnoreStopTerms())
> builder = builder.add("skip_stop_words", new 
> StopWordFilters.DefaultStopWordFilter(options.getLocale()));
> return builder.build();
> }
> {code}
> The problem is that stemming before removing stop words can yield wrong 
> results.
> I have an example:
> {code:sql}
> SELECT * FROM music.albums WHERE country='France' AND title LIKE 'danse' 
> ALLOW FILTERING;
> {code}
> Because of stemming *danse* ( *dance* in English) becomes *dans* (the final 
> vowel is removed). Then skip stop words is applied. Unfortunately *dans* 
> (*in* in English) is a stop word in French so it is removed completely.
> In the end the query is equivalent to {{SELECT * FROM music.albums WHERE 
> country='France'}} and of course the results are wrong.
> Attached is a trivial patch to move the skip_stop_words filter BEFORE 
> stemming filter
> /cc [~xedin] [~jrwest] [~beobal]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12078) [SASI] Move skip_stop_words filter BEFORE stemming

2016-06-23 Thread Pavel Yaskevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-12078:

 Reviewer: Pavel Yaskevich
Fix Version/s: 3.7

> [SASI] Move skip_stop_words filter BEFORE stemming
> --
>
> Key: CASSANDRA-12078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12078
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
> Environment: Cassandra 3.7, Cassandra 3.8
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Fix For: 3.7
>
> Attachments: patch.txt
>
>
> Right now, if skip stop words and stemming are enabled, SASI will put 
> stemming in the filter pipeline BEFORE skip_stop_words:
> {code:java}
> private FilterPipelineTask getFilterPipeline()
> {
> FilterPipelineBuilder builder = new FilterPipelineBuilder(new 
> BasicResultFilters.NoOperation());
>  ...
> if (options.shouldStemTerms())
> builder = builder.add("term_stemming", new 
> StemmingFilters.DefaultStemmingFilter(options.getLocale()));
> if (options.shouldIgnoreStopTerms())
> builder = builder.add("skip_stop_words", new 
> StopWordFilters.DefaultStopWordFilter(options.getLocale()));
> return builder.build();
> }
> {code}
> The problem is that stemming before removing stop words can yield wrong 
> results.
> I have an example:
> {code:sql}
> SELECT * FROM music.albums WHERE country='France' AND title LIKE 'danse' 
> ALLOW FILTERING;
> {code}
> Because of stemming *danse* ( *dance* in English) becomes *dans* (the final 
> vowel is removed). Then skip stop words is applied. Unfortunately *dans* 
> (*in* in English) is a stop word in French so it is removed completely.
> In the end the query is equivalent to {{SELECT * FROM music.albums WHERE 
> country='France'}} and of course the results are wrong.
> Attached is a trivial patch to move the skip_stop_words filter BEFORE 
> stemming filter
> /cc [~xedin] [~jrwest] [~beobal]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12078) [SASI] Move skip_stop_words filter BEFORE stemming

2016-06-23 Thread DOAN DuyHai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DOAN DuyHai updated CASSANDRA-12078:

Status: Patch Available  (was: Open)

> [SASI] Move skip_stop_words filter BEFORE stemming
> --
>
> Key: CASSANDRA-12078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12078
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
> Environment: Cassandra 3.7, Cassandra 3.8
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Attachments: patch.txt
>
>
> Right now, if skip stop words and stemming are enabled, SASI will put 
> stemming in the filter pipeline BEFORE skip_stop_words:
> {code:java}
> private FilterPipelineTask getFilterPipeline()
> {
> FilterPipelineBuilder builder = new FilterPipelineBuilder(new 
> BasicResultFilters.NoOperation());
>  ...
> if (options.shouldStemTerms())
> builder = builder.add("term_stemming", new 
> StemmingFilters.DefaultStemmingFilter(options.getLocale()));
> if (options.shouldIgnoreStopTerms())
> builder = builder.add("skip_stop_words", new 
> StopWordFilters.DefaultStopWordFilter(options.getLocale()));
> return builder.build();
> }
> {code}
> The problem is that stemming before removing stop words can yield wrong 
> results.
> I have an example:
> {code:sql}
> SELECT * FROM music.albums WHERE country='France' AND title LIKE 'danse' 
> ALLOW FILTERING;
> {code}
> Because of stemming *danse* ( *dance* in English) becomes *dans* (the final 
> vowel is removed). Then skip stop words is applied. Unfortunately *dans* 
> (*in* in English) is a stop word in French so it is removed completely.
> In the end the query is equivalent to {{SELECT * FROM music.albums WHERE 
> country='France'}} and of course the results are wrong.
> Attached is a trivial patch to move the skip_stop_words filter BEFORE 
> stemming filter
> /cc [~xedin] [~jrwest] [~beobal]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12078) [SASI] Move skip_stop_words filter BEFORE stemming

2016-06-23 Thread DOAN DuyHai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DOAN DuyHai updated CASSANDRA-12078:

Description: 
Right now, if skip stop words and stemming are enabled, SASI will put stemming 
in the filter pipeline BEFORE skip_stop_words:

{code:java}

private FilterPipelineTask getFilterPipeline()
{
FilterPipelineBuilder builder = new FilterPipelineBuilder(new 
BasicResultFilters.NoOperation());
 ...
if (options.shouldStemTerms())
builder = builder.add("term_stemming", new 
StemmingFilters.DefaultStemmingFilter(options.getLocale()));
if (options.shouldIgnoreStopTerms())
builder = builder.add("skip_stop_words", new 
StopWordFilters.DefaultStopWordFilter(options.getLocale()));
return builder.build();
}
{code}

The problem is that stemming before removing stop words can yield wrong results.

I have an example:

{code:sql}
SELECT * FROM music.albums WHERE country='France' AND title LIKE 'danse' ALLOW 
FILTERING;
{code}

Because of stemming *danse* ( *dance* in English) becomes *dans* (the final 
vowel is removed). Then skip stop words is applied. Unfortunately *dans* (*in* 
in English) is a stop word in French so it is removed completely.

In the end the query is equivalent to {{SELECT * FROM music.albums WHERE 
country='France'}} and of course the results are wrong.

Attached is a trivial patch to move the skip_stop_words filter BEFORE stemming 
filter

/cc [~xedin] [~jrwest] [~beobal]

  was:
Right now, if skip stop words and stemming are enabled, SASI will put stemming 
in the filter pipeline BEFORE skip_stop_words:

{code:java}

private FilterPipelineTask getFilterPipeline()
{
FilterPipelineBuilder builder = new FilterPipelineBuilder(new 
BasicResultFilters.NoOperation());
 ...
if (options.shouldStemTerms())
builder = builder.add("term_stemming", new 
StemmingFilters.DefaultStemmingFilter(options.getLocale()));
if (options.shouldIgnoreStopTerms())
builder = builder.add("skip_stop_words", new 
StopWordFilters.DefaultStopWordFilter(options.getLocale()));
return builder.build();
}
{code}

The problem is that stemming before removing stop words can yield wrong results.

I have an example:

{code:sql}
SELECT * FROM music.albums WHERE country='France' AND title LIKE 'danse' ALLOW 
FILTERING;
{code}

*danse* = *dance* in English, and because of stemming, it becomes *dans* (the 
final vowel is removed). Then skip stop words is applied. Unfortunately *dans* 
= *in* in English, a stop word in French so it is removed completely.

In the end the query is equivalent to {{SELECT * FROM music.albums WHERE 
country='France'}} and of course the results are wrong.

Attached is a trivial patch to move the skip_stop_words filter BEFORE stemming 
filter


> [SASI] Move skip_stop_words filter BEFORE stemming
> --
>
> Key: CASSANDRA-12078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12078
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
> Environment: Cassandra 3.7, Cassandra 3.8
>Reporter: DOAN DuyHai
>Assignee: DOAN DuyHai
> Attachments: patch.txt
>
>
> Right now, if skip stop words and stemming are enabled, SASI will put 
> stemming in the filter pipeline BEFORE skip_stop_words:
> {code:java}
> private FilterPipelineTask getFilterPipeline()
> {
> FilterPipelineBuilder builder = new FilterPipelineBuilder(new 
> BasicResultFilters.NoOperation());
>  ...
> if (options.shouldStemTerms())
> builder = builder.add("term_stemming", new 
> StemmingFilters.DefaultStemmingFilter(options.getLocale()));
> if (options.shouldIgnoreStopTerms())
> builder = builder.add("skip_stop_words", new 
> StopWordFilters.DefaultStopWordFilter(options.getLocale()));
> return builder.build();
> }
> {code}
> The problem is that stemming before removing stop words can yield wrong 
> results.
> I have an example:
> {code:sql}
> SELECT * FROM music.albums WHERE country='France' AND title LIKE 'danse' 
> ALLOW FILTERING;
> {code}
> Because of stemming *danse* ( *dance* in English) becomes *dans* (the final 
> vowel is removed). Then skip stop words is applied. Unfortunately *dans* 
> (*in* in English) is a stop word in French so it is removed completely.
> In the end the query is equivalent to {{SELECT * FROM music.albums WHERE 
> country='France'}} and of course the results are wrong.
> Attached is a trivial patch to move the skip_stop_words filter BEFORE 
> stemming filter
> /cc [~xedin] [~jrwest] [~beobal]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)