DOAN DuyHai created CASSANDRA-12078:
---------------------------------------
             Summary: [SASI] Move skip_stop_words filter BEFORE stemming
                 Key: CASSANDRA-12078
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12078
             Project: Cassandra
          Issue Type: Improvement
          Components: CQL
         Environment: Cassandra 3.7, Cassandra 3.8
            Reporter: DOAN DuyHai
            Assignee: DOAN DuyHai
         Attachments: patch.txt

Right now, when both skip_stop_words and stemming are enabled, SASI adds the stemming filter to the filter pipeline BEFORE the skip_stop_words filter:

{code:java}
private FilterPipelineTask getFilterPipeline()
{
    FilterPipelineBuilder builder = new FilterPipelineBuilder(new BasicResultFilters.NoOperation());
    ...
    // Stemming is added first, so it runs before stop-word removal
    if (options.shouldStemTerms())
        builder = builder.add("term_stemming", new StemmingFilters.DefaultStemmingFilter(options.getLocale()));
    if (options.shouldIgnoreStopTerms())
        builder = builder.add("skip_stop_words", new StopWordFilters.DefaultStopWordFilter(options.getLocale()));
    return builder.build();
}
{code}

The problem is that stemming a term before removing stop words can turn an ordinary word into a stop word, which is then dropped. For example:

{code:sql}
SELECT * FROM music.albums WHERE country='France' AND title LIKE 'danse' ALLOW FILTERING;
{code}

*danse* is French for *dance*. Stemming removes the final vowel and turns it into *dans*. The skip_stop_words filter is applied next, but *dans* (French for *in*) is a French stop word, so the term is removed completely. In the end the query is equivalent to {{SELECT * FROM music.albums WHERE country='France'}}, which returns every French album instead of only those matching 'danse', so the results are wrong.

Attached is a trivial patch that moves the skip_stop_words filter BEFORE the stemming filter.
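To make the ordering problem concrete, here is a minimal, self-contained sketch that runs the term *danse* through a two-stage pipeline in both orders. It does NOT use SASI's FilterPipelineBuilder; the UnaryOperator pipeline, the trailing-'e' stemmer and the four-word stop list are hypothetical stand-ins, just enough to reproduce the example from this report.

```java
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

public class FilterOrderDemo {
    // Hypothetical stand-in for a French stemmer: it only strips a trailing 'e'.
    // The real SASI stemming filter is far more elaborate.
    static final UnaryOperator<String> STEM = t ->
        (t.length() > 3 && t.endsWith("e")) ? t.substring(0, t.length() - 1) : t;

    // Hypothetical, tiny French stop-word list; returning null means "term dropped".
    static final Set<String> STOP_WORDS = Set.of("dans", "le", "la", "de");
    static final UnaryOperator<String> SKIP_STOP_WORDS = t ->
        STOP_WORDS.contains(t) ? null : t;

    // Applies the filters in list order; once a term is dropped it stays dropped.
    static String run(List<UnaryOperator<String>> pipeline, String term) {
        String current = term;
        for (UnaryOperator<String> filter : pipeline) {
            if (current == null) {
                break;
            }
            current = filter.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        String term = "danse";
        // Current order: stemming, then stop words ("danse" -> "dans" -> dropped)
        String stemFirst = run(List.of(STEM, SKIP_STOP_WORDS), term);
        // Proposed order: stop words, then stemming ("danse" kept -> "dans")
        String stopFirst = run(List.of(SKIP_STOP_WORDS, STEM), term);
        System.out.println("stem -> skip_stop_words: " + stemFirst);
        System.out.println("skip_stop_words -> stem: " + stopFirst);
    }
}
```

With stemming first the term comes out as null, so the LIKE predicate effectively disappears; with stop-word removal first, *danse* is not a stop word, survives, and is only then reduced to the stem *dans*, which stays usable as a search term.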