[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017843#comment-16017843 ]

Elvis Rocha commented on SOLR-6468:
-----------------------------------

I created a filter to remove token gaps:

{code:title=RemoveTokenGapsFilterFactory.java|borderStyle=solid}
package filters;

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.util.TokenFilterFactory;

public class RemoveTokenGapsFilterFactory extends TokenFilterFactory {

    public RemoveTokenGapsFilterFactory(Map<String, String> args) {
        super(args);
    }

    @Override
    public TokenStream create(TokenStream input) {
        return new RemoveTokenGapsFilter(input);
    }
}

final class RemoveTokenGapsFilter extends TokenFilter {

    private final PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);

    public RemoveTokenGapsFilter(TokenStream input) {
        super(input);
    }

    @Override
    public final boolean incrementToken() throws IOException {
        // Make every token directly follow the previous one, which closes
        // any gap left by an upstream stop filter.
        if (input.incrementToken()) {
            posIncrAtt.setPositionIncrement(1);
            return true;
        }
        return false;
    }
}
{code}

{code:title=schema.xml|borderStyle=solid}
{code}

!FieldValue.png!

> Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
> --------------------------------------------------------------------------------------------
>
>                 Key: SOLR-6468
>                 URL: https://issues.apache.org/jira/browse/SOLR-6468
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.8.1, 4.9
>            Reporter: Alexander S.
>         Attachments: FieldValue.png
>
> Setup:
> * Schema version is 1.5
> * Field config:
> {code}
> <fieldType name="words_ngram" class="solr.TextField" omitNorms="false" autoGeneratePhraseQueries="true">
>   <analyzer>
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
>     <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true" />
>     <filter class="solr.LowerCaseFilterFactory" />
>   </analyzer>
> </fieldType>
> {code}
> * Stop words:
> {code}
> http
> https
> ftp
> www
> {code}
> So very simple. In the index I have:
> * twitter.com/testuser
> All these queries do match:
> * twitter.com/testuser
> * com/testuser
> * testuser
> But none of these do:
> * https://twitter.com/testuser
> * https://www.twitter.com/testuser
> * www.twitter.com/testuser
> Debug output shows:
> "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
> But we need:
> "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"
> Complete debug outputs:
> * a valid search: http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
> * an invalid search: http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
> The complete discussion and explanation of the problem is here:
> http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
> I didn't find a clear explanation of how we can upgrade Solr. There is no
> replacement or workaround for this, so this is not just a major change but
> a major disrespect to all existing Solr users who are using this feature.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
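To make the difference between the two parsed queries in the report concrete, here is a minimal plain-Java sketch. It does not use any Lucene or Solr classes: `PositionGapDemo`, `tokenize` and `phrase` are hypothetical names invented for illustration, and the `?` placeholder only mimics how the debug output prints a position hole. The split pattern and the stop word list are taken from the issue description; everything else is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Plain-Java simulation (no Lucene classes) of stop word removal with and without position holes. */
final class PositionGapDemo {

    // Stop words from the issue description.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("http", "https", "ftp", "www"));

    /** Splits on runs of non-word characters, like the pattern [^\w]+ in the field config. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\w]+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /**
     * Renders the phrase that remains after stop word removal.
     * With keepGaps == true each removed stop word leaves a "?" hole;
     * with keepGaps == false the remaining tokens close up.
     */
    static String phrase(String text, boolean keepGaps) {
        List<String> kept = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (!STOP_WORDS.contains(token)) {
                kept.add(token);
            } else if (keepGaps) {
                kept.add("?");
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        String query = "https://twitter.com/testuser";
        System.out.println(phrase(query, true));
        System.out.println(phrase(query, false));
    }
}
```

With `keepGaps` set to `true` the sketch renders `https://twitter.com/testuser` as `? twitter com testuser`, the shape the report complains about; with `false` it renders `twitter com testuser`, the shape the reporter needs.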
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017816#comment-16017816 ]

Elvis Rocha commented on SOLR-6468:
-----------------------------------

I created a filter to remove gaps between tokens:

{code:title=RemoveEmptyTokenFilterFactory.java|borderStyle=solid}
package filter;

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.util.TokenFilterFactory;

public class RemoveEmptyTokenFilterFactory extends TokenFilterFactory {

    public RemoveEmptyTokenFilterFactory(Map<String, String> args) {
        super(args);
    }

    @Override
    public TokenStream create(TokenStream input) {
        return new RemoveEmptyTokenFilter(input);
    }
}

final class RemoveEmptyTokenFilter extends TokenFilter {

    private final PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);

    public RemoveEmptyTokenFilter(TokenStream input) {
        super(input);
    }

    @Override
    public final boolean incrementToken() throws IOException {
        // Make every token directly follow the previous one, which closes
        // any gap left by an upstream stop filter.
        if (input.incrementToken()) {
            posIncrAtt.setPositionIncrement(1);
            return true;
        }
        return false;
    }
}
{code}

{code:title=schema.xml|borderStyle=solid}
{code}
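The effect of a gap-removing filter like the one posted above can be sketched on bare position increments, without Lucene on the classpath. This is a simplified model under stated assumptions: `GapCollapseDemo` and its methods are hypothetical names, positions are counted from -1 so that a first increment of 1 lands on position 0, and `closeGapsOnly` is an illustrative variant that is not part of the posted filter.

```java
import java.util.Arrays;

/** Plain-Java model of position increments; not Lucene code. */
final class GapCollapseDemo {

    /** Absolute positions implied by a sequence of increments, counting from -1. */
    static int[] positions(int[] increments) {
        int[] result = new int[increments.length];
        int position = -1;
        for (int i = 0; i < increments.length; i++) {
            position += increments[i];
            result[i] = position;
        }
        return result;
    }

    /** What the posted filter does: every token gets an increment of exactly 1. */
    static int[] resetAll(int[] increments) {
        int[] result = new int[increments.length];
        Arrays.fill(result, 1);
        return result;
    }

    /**
     * Hypothetical variant: only shrink increments larger than 1, so tokens
     * stacked on the same position (increment 0, e.g. synonyms) stay stacked.
     */
    static int[] closeGapsOnly(int[] increments) {
        int[] result = increments.clone();
        for (int i = 0; i < result.length; i++) {
            if (result[i] > 1) {
                result[i] = 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "https://www.twitter.com/testuser": two removed stop words precede "twitter".
        int[] withGaps = {3, 1, 1};
        System.out.println(Arrays.toString(positions(withGaps)));
        System.out.println(Arrays.toString(positions(resetAll(withGaps))));
    }
}
```

One design point worth noting: resetting every increment to 1 also moves tokens that an earlier filter deliberately placed on the same position, which is why the sketch includes the `closeGapsOnly` variant for comparison.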
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638299#comment-15638299 ]

Diego Oliveira commented on SOLR-6468:
--------------------------------------

I read the whole discussion and can't believe this decision. I'm having the same problem!!! I need to use a stop word filter together with a shingle filter. But once the stop words are removed I am left with a hole that causes a bug in the shingle filter: it produces duplicate tokens that the Remove Duplicates filter cannot remove, because the shingles and the tokens start at different positions.

I can't believe the community cannot solve this problem by re-enabling this old feature, as people have said here. It is better to stay with the 'simplified' version than with the new (plus) version. Will it stay like this until Solr X? Come on!!!
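The interaction between position holes and shingles can be illustrated with a small plain-Java model. This is a sketch, not Lucene's `ShingleFilter`: `ShingleGapDemo`, `slots` and `bigrams` are hypothetical names, only two-word shingles are built, and the `_` filler for a hole is an assumption modelled on how shingle output commonly marks a missing position.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of two-word shingles over tokens with position increments. */
final class ShingleGapDemo {

    // Assumed placeholder for a position that holds no token.
    static final String FILLER = "_";

    /** Expands terms and increments into one slot per position, filling holes with FILLER. */
    static List<String> slots(String[] terms, int[] increments) {
        List<String> slots = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            for (int hole = 1; hole < increments[i]; hole++) {
                slots.add(FILLER);
            }
            slots.add(terms[i]);
        }
        return slots;
    }

    /** Builds every pair of adjacent slots. */
    static List<String> bigrams(String[] terms, int[] increments) {
        List<String> slots = slots(terms, increments);
        List<String> shingles = new ArrayList<>();
        for (int i = 0; i + 1 < slots.size(); i++) {
            shingles.add(slots.get(i) + " " + slots.get(i + 1));
        }
        return shingles;
    }

    public static void main(String[] args) {
        String[] terms = {"survey", "50", "faint"};
        // "survey of 50 faint" with "of" removed: a hole precedes "50".
        System.out.println(bigrams(terms, new int[] {1, 2, 1}));
        // Same terms with the hole closed.
        System.out.println(bigrams(terms, new int[] {1, 1, 1}));
    }
}
```

In this model the hole yields the shingles `survey _` and `_ 50`, while the closed-up stream yields `survey 50`, which is the kind of output people combining stop words with shingles tend to want.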
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15541697#comment-15541697 ]

Alexander S. commented on SOLR-6468:
------------------------------------

We now can't upgrade to Solr 6 due to this.
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514785#comment-15514785 ]

Roman Chyla commented on SOLR-6468:
-----------------------------------

Ha! :-) I've found my own comment above. Two years later I'm facing this situation again; I had completely forgotten about it (and, truth be told, preferred running the old Solr 4.x).

This is how the new Solr sees things:

A 350-MHz GBT Survey of 50 Faint Fermi γ ray Sources for Radio Millisecond Pulsars

is indexed as

```
null_1
1 :350|350mhz
2 :mhz|syn::mhz
3 :acr::gbt|gbt|syn::gbt|syn::green bank telescope
4 :survey|syn::survey
null_1
6 :50
```

The 1st and 5th positions are gaps, so the search for "350-MHz GBT Survey of 50 Faint" will fail, because 'of' is a stopword and the stop filter will always increment the position (what's the purpose of a stop filter if it leaves gaps?).

Anyway, the solution with CharFilterFactory cannot work for me. I have to do this:

1. search for synonyms (they can contain stopwords)
2. remove stopwords
3. search for other synonyms (that don't have stopwords)

I'm afraid real life is a little bit more complex than it seems; but there is a logic to your choices, SOLR devs, I'm afraid I can agree with you. People who understand the *why* will make it work again as it *should*. Others will happily keep using the 'simplified' version.
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319960#comment-14319960 ]

Okke Klein commented on SOLR-6468:
----------------------------------

I would also like to see enablePositionIncrements come back. It is very useful for removing stopwords when using shingles in the Suggester.
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225186#comment-14225186 ]

Roman Chyla commented on SOLR-6468:
-----------------------------------

I also find this change unfortunate. If this is just developers making decisions for users, then it causes problems for users who really know why they need that feature: for phrase search that should ignore stopwords. But if the underlying issue is something serious, with the indexer not being able to work with the position, then it would be even weirder, and actually very bad for many users. I don't really understand the benefits of this change. Any chance of returning to the original?
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130090#comment-14130090 ]

Alexander S. commented on SOLR-6468:
------------------------------------

Just tried to add matchVersion but got this error:

{code}
null:org.apache.solr.common.SolrException: Unable to create core: crm-prod
    at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:911)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:568)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:261)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:253)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Could not load core configuration for core crm-prod
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:66)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:554)
    ... 8 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType words_ngram: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.core.StopFilterFactory'. Schema file is /etc/solr/core2/schema.xml
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:616)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:166)
    at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
    at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
    at org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:89)
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:62)
    ... 9 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType words_ngram: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.core.StopFilterFactory'
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:470)
    ... 14 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.core.StopFilterFactory'
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
    at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:400)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:86)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
    ... 15 more
Caused by: org.apache.solr.common.SolrException: Error instantiating class: 'org.apache.lucene.analysis.core.StopFilterFactory'
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:606)
    at org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:382)
    at org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:376)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
    ... 19 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:603)
    ... 22 more
Caused by: java.lang.IllegalArgumentException: Unknown parameters: {matchVersion=4.3}
    at org.apache.lucene.analysis.core.StopFilterFactory.<init>(StopFilterFactory.java:91)
    ... 27 more
{code}
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130109#comment-14130109 ]

Steve Rowe commented on SOLR-6468:
----------------------------------

luceneMatchVersion is what you want, not matchVersion.
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130147#comment-14130147 ]

Uwe Schindler commented on SOLR-6468:
-------------------------------------

The parameter is luceneMatchVersion.
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130209#comment-14130209 ]

Alexander S. commented on SOLR-6468:
------------------------------------

Thanks, it does work with luceneMatchVersion=4.3, but isn't this deprecated? Any chance to return enablePositionIncrements?
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130227#comment-14130227 ]

Steve Rowe commented on SOLR-6468:
----------------------------------

bq. Any chance to return enablePositionIncrements?

Odds are extremely low that this will happen.

An alternative is to use a char filter (e.g. [MappingCharFilterFactory|http://lucene.apache.org/core/4_8_1/analyzers-common/org/apache/lucene/analysis/charfilter/MappingCharFilterFactory.html] or [PatternReplaceCharFilterFactory|http://lucene.apache.org/core/4_8_1/analyzers-common/org/apache/lucene/analysis/pattern/PatternReplaceCharFilterFactory.html]) to remove stuff you don't want; that way the tokenizer won't leave position holes.
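The char filter idea can be sketched in plain Java with `java.util.regex`, which shows why removing text before tokenization leaves no hole behind. This is only a simulation under stated assumptions: `PrefixStripDemo` and its methods are hypothetical names, and the regular expression is an illustrative guess that strips a leading scheme and a leading `www.` only, whereas the stop word list in the report would remove those words anywhere in the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Plain-Java simulation of stripping unwanted text before tokenization; not Lucene code. */
final class PrefixStripDemo {

    // Hypothetical pattern: an optional scheme followed by an optional "www." at the start.
    static final Pattern URL_PREFIX =
            Pattern.compile("^(?:(?:https?|ftp)://)?(?:www\\.)?", Pattern.CASE_INSENSITIVE);

    /** Plays the role of the char filter: edits the raw text before any token exists. */
    static String strip(String raw) {
        return URL_PREFIX.matcher(raw).replaceFirst("");
    }

    /** Splits on runs of non-word characters, like the pattern [^\w]+ in the field config. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\w]+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Strips first, then tokenizes, so every surviving token is adjacent to its neighbour. */
    static String analyze(String raw) {
        return String.join(" ", tokenize(strip(raw)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("https://www.twitter.com/testuser"));
        System.out.println(analyze("twitter.com/testuser"));
    }
}
```

Because the unwanted prefix never reaches the tokenizer, all three failing queries from the report and the indexed value reduce to the same token sequence in this model.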
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements=false
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118076#comment-14118076 ]

Uwe Schindler commented on SOLR-6468:
-------------------------------------

You have to set enablePositionIncrements=false and matchVersion=4.3 on the StopFilterFactory; otherwise you get "IllegalArgumentException: enablePositionIncrements=false is not supported anymore as of Lucene 4.4 as it can create broken token streams."
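The position holes discussed in this thread can be pictured with a simplified model: when a stop word is dropped, the next surviving token's position increment grows by one per dropped token, and with increments disabled every surviving token simply gets an increment of 1. The sketch below is plain Java and only models that bookkeeping; it is not Lucene's implementation, and StopFilterPositions, Token, and filter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified model of stop-word removal; not Lucene's actual implementation.
final class StopFilterPositions {
    // A surviving term and its position increment relative to the previous surviving term.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    static List<Token> filter(List<String> terms, Set<String> stopWords,
                              boolean enablePositionIncrements) {
        List<Token> kept = new ArrayList<>();
        int skipped = 0; // stop words dropped since the last surviving term
        for (String term : terms) {
            // Case-insensitive match, mirroring ignoreCase="true" in the issue's config.
            if (stopWords.contains(term.toLowerCase(Locale.ROOT))) {
                skipped++;
                continue;
            }
            // With increments enabled, the holes left by dropped terms are preserved.
            int increment = enablePositionIncrements ? 1 + skipped : 1;
            kept.add(new Token(term, increment));
            skipped = 0;
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("https", "www", "twitter", "com", "testuser");
        Set<String> stopWords = Set.of("http", "https", "ftp", "www");
        System.out.println(filter(terms, stopWords, true));
        System.out.println(filter(terms, stopWords, false));
    }
}
```

In this model, the first surviving term after two dropped stop words carries an increment of 3 when holes are preserved and 1 when they are not, which is the difference the reporter is asking to control.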
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements=false
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118078#comment-14118078 ]

Alexander S. commented on SOLR-6468:
------------------------------------

Correct, but isn't this behavior deprecated? I mean matchVersion=4.3. I was told this could be removed in 5.0 as well.

If I understand the problem correctly, enablePositionIncrements=false could generate wrong tokens for those who do not know how to use this option correctly? It seems it requires a custom tokenizer, and solr.PatternTokenizerFactory in my example should work properly. So instead of removing the option, the problem with wrong tokens could be explained in the readme and the option could be kept for those who really need it. That makes more sense to me than simply removing it.

Anyway, is there any chance the option could be restored? My use case should clearly show how useful it might be. I also tried to google the problem; there are a lot of complaints about this, but no solutions.