[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elvis Rocha updated SOLR-6468:
------------------------------
    Comment: was deleted

(was: I created a filter that removes the position gaps between tokens:

{code:title=RemoveEmptyTokenFilterFactory.java|borderStyle=solid}
package filter;

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.util.TokenFilterFactory;

public class RemoveEmptyTokenFilterFactory extends TokenFilterFactory {

        public RemoveEmptyTokenFilterFactory(Map<String, String> args) {
                super(args);
        }

        @Override
        public TokenStream create(TokenStream input) {
                RemoveEmptyTokenFilter filter = new RemoveEmptyTokenFilter(input);
                return filter;
        }

}

final class RemoveEmptyTokenFilter extends TokenFilter {

        private final PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);

        public RemoveEmptyTokenFilter(TokenStream input) {
                super(input);
        }

        @Override
        public final boolean incrementToken() throws IOException {
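                // Force every surviving token to advance exactly one position,
                // discarding the gaps StopFilter leaves for removed stop words.
                if (input.incrementToken()) {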
                        posIncrAtt.setPositionIncrement(1);
                        return true;
                }
                return false;
        }
}
{code}



{code:title=schema.xml|borderStyle=solid}
<fieldType name="text_match" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
                <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_pt.txt" format="snowball"/>
                <filter class="filter.RemoveEmptyTokenFilterFactory" />
        </analyzer>
</fieldType>
{code})
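
To illustrate what the filter in the deleted comment does to token positions, here is a minimal plain-Java sketch. It does not use Lucene: the class PositionGapSketch and its methods are hypothetical names made up for this illustration, and the increments are hand-written to mimic a stream in which StopFilter removed one leading stop word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Lucene or Solr.
public class PositionGapSketch {

    // Absolute position of each token, given per-token position increments.
    // Counting starts at -1 so that a first increment of 1 lands on position 0.
    static List<Integer> positions(int[] increments) {
        List<Integer> result = new ArrayList<>();
        int pos = -1;
        for (int inc : increments) {
            pos += inc;
            result.add(pos);
        }
        return result;
    }

    // Mimics the filter above: every token gets an increment of exactly 1.
    static int[] resetIncrements(int[] increments) {
        int[] out = new int[increments.length];
        Arrays.fill(out, 1);
        return out;
    }

    public static void main(String[] args) {
        // "twitter com testuser" after a leading "https" was dropped:
        // "twitter" carries increment 2 to record the hole.
        int[] withGap = {2, 1, 1};
        System.out.println(positions(withGap));                  // prints [1, 2, 3]
        System.out.println(positions(resetIncrements(withGap))); // prints [0, 1, 2]
    }
}
```

With the increments reset, the first surviving token sits at position 0 again, which is the layout the unprefixed text would produce.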

> Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
> --------------------------------------------------------------------------------------------
>
>                 Key: SOLR-6468
>                 URL: https://issues.apache.org/jira/browse/SOLR-6468
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.8.1, 4.9
>            Reporter: Alexander S.
>
> Setup:
> * Schema version is 1.5
> * Field config:
> {code}
> <fieldType name="words_ngram" class="solr.TextField" omitNorms="false" autoGeneratePhraseQueries="true">
>   <analyzer>
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
>     <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true" />
>     <filter class="solr.LowerCaseFilterFactory" />
>   </analyzer>
> </fieldType>
> {code}
> * Stop words:
> {code}
> http 
> https 
> ftp 
> www
> {code}
> So very simple. In the index I have:
> * twitter.com/testuser
> All these queries do match:
> * twitter.com/testuser
> * com/testuser
> * testuser
> But none of these do:
> * https://twitter.com/testuser
> * https://www.twitter.com/testuser
> * www.twitter.com/testuser
> Debug output shows:
> "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
> But we need:
> "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"
> Complete debug outputs:
> * a valid search: http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
> * an invalid search: http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
> The complete discussion and explanation of the problem is here: http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
> I couldn't find a clear explanation of how we are supposed to upgrade Solr. There is 
> no replacement or workaround for this, so this is not just a major change but 
> a major disrespect to all existing Solr users who rely on this feature.
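
The "?" in the parsed query quoted above can be reproduced with a small plain-Java sketch. It is only a model of the analysis chain, not Lucene code: StopGapSketch and phraseWithGaps are hypothetical names, the split pattern and stop words are copied from the issue description, and showing a removed token as "?" simply imitates the debug output (real phrase-query rendering may differ, for example for trailing stop words).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of Lucene or Solr.
public class StopGapSketch {

    // Stop words listed in the issue description.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("http", "https", "ftp", "www"));

    // Splits on runs of non-word characters, then replaces each stop word
    // (compared case-insensitively) with a "?" placeholder for its position.
    static String phraseWithGaps(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("[^\\w]+")) {
            if (token.isEmpty()) {
                continue; // a leading delimiter yields an empty first piece
            }
            String lower = token.toLowerCase(Locale.ROOT);
            out.add(STOP_WORDS.contains(lower) ? "?" : lower);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(phraseWithGaps("twitter.com/testuser"));
        // prints twitter com testuser
        System.out.println(phraseWithGaps("https://twitter.com/testuser"));
        // prints ? twitter com testuser
    }
}
```

In this model the indexed value has no placeholder while the prefixed query does, which mirrors the difference between the two parsedquery_toString values in the report.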



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
