[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"

2017-05-19 Thread Elvis Rocha (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017843#comment-16017843
 ] 

Elvis Rocha commented on SOLR-6468:
---

I created a filter to remove token gaps

{code:title=RemoveTokenGapsFilterFactory.java|borderStyle=solid}
package filters;

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.util.TokenFilterFactory;

/** Factory that wraps a token stream in a RemoveTokenGapsFilter. */
public class RemoveTokenGapsFilterFactory extends TokenFilterFactory {

    public RemoveTokenGapsFilterFactory(Map<String, String> args) {
        super(args);
    }

    @Override
    public TokenStream create(TokenStream input) {
        return new RemoveTokenGapsFilter(input);
    }
}

/**
 * Forces every token's position increment to 1, closing the holes left by
 * removed tokens. Note that tokens stacked at the same position
 * (increment 0) are moved to their own position as well.
 */
final class RemoveTokenGapsFilter extends TokenFilter {

    private final PositionIncrementAttribute posIncrAtt =
            addAttribute(PositionIncrementAttribute.class);

    public RemoveTokenGapsFilter(TokenStream input) {
        super(input);
    }

    @Override
    public final boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false; // end of stream
        }
        posIncrAtt.setPositionIncrement(1);
        return true;
    }
}
{code}

{code:title=schema.xml|borderStyle=solid}









{code}

!FieldValue.png!
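To make the effect of the filter easier to see, here is a minimal plain-Java sketch of the position arithmetic. It is only a model and does not use the Lucene API: the class name `TokenGapModel` and both method names are made up for illustration, and it assumes the usual convention that positions start at -1 and each token adds its increment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of position increments; hypothetical, not Lucene code. */
public class TokenGapModel {

    /** Turns per-token position increments into absolute positions. */
    public static List<Integer> absolutePositions(int[] increments) {
        List<Integer> positions = new ArrayList<>();
        int position = -1; // a first token with increment 1 lands on position 0
        for (int increment : increments) {
            position += increment;
            positions.add(position);
        }
        return positions;
    }

    /** Mirrors what the filter above does: every increment becomes exactly 1. */
    public static int[] removeGaps(int[] increments) {
        int[] result = new int[increments.length];
        Arrays.fill(result, 1);
        return result;
    }

    public static void main(String[] args) {
        // Two stop words were removed in front of the first surviving token,
        // so that token carries an increment of 3.
        int[] withGap = {3, 1, 1};
        System.out.println(absolutePositions(withGap));             // [2, 3, 4]
        System.out.println(absolutePositions(removeGaps(withGap))); // [0, 1, 2]

        // Side effect: a token stacked on the previous one (increment 0,
        // e.g. a synonym) is pushed to its own position.
        int[] stacked = {1, 0, 1};
        System.out.println(absolutePositions(stacked));             // [0, 0, 1]
        System.out.println(absolutePositions(removeGaps(stacked))); // [0, 1, 2]
    }
}
```

The second pair of lines is the part to watch: if the chain contains a synonym filter before this one, the stacked tokens would no longer share a position.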


> Regression: StopFilterFactory doesn't work properly without 
> enablePositionIncrements="false"
> 
>
> Key: SOLR-6468
> URL: https://issues.apache.org/jira/browse/SOLR-6468
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.8.1, 4.9
>Reporter: Alexander S.
> Attachments: FieldValue.png
>
>
> Setup:
> * Schema version is 1.5
> * Field config:
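> {code}
> <fieldType name="words_ngram" class="solr.TextField" omitNorms="false" 
>  autoGeneratePhraseQueries="true">
>   <analyzer>
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
>     <filter class="solr.StopFilterFactory" words="url_stopwords.txt" 
>  ignoreCase="true" />
>     <filter class="solr.LowerCaseFilterFactory" />
>   </analyzer>
> </fieldType>
> {code}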
> * Stop words:
> {code}
> http 
> https 
> ftp 
> www
> {code}
> So very simple. In the index I have:
> * twitter.com/testuser
> All these queries do match:
> * twitter.com/testuser
> * com/testuser
> * testuser
> But none of these does:
> * https://twitter.com/testuser
> * https://www.twitter.com/testuser
> * www.twitter.com/testuser
> Debug output shows:
> "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
> But we need:
> "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"
> Complete debug outputs:
> * a valid search: 
> http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
> * an invalid search: 
> http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
> The complete discussion and explanation of the problem is here: 
> http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
> I couldn't find a clear explanation of how we can upgrade Solr; there is no 
> replacement or workaround for this, so this is not just a major change but 
> a major disrespect to all existing Solr users who rely on this feature.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"

2017-05-19 Thread Elvis Rocha (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017816#comment-16017816
 ] 

Elvis Rocha commented on SOLR-6468:
---

I created a filter to remove gaps between tokens

{code:title=RemoveEmptyTokenFilterFactory.java|borderStyle=solid}
package filter;

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.util.TokenFilterFactory;

/** Factory that wraps a token stream in a RemoveEmptyTokenFilter. */
public class RemoveEmptyTokenFilterFactory extends TokenFilterFactory {

    public RemoveEmptyTokenFilterFactory(Map<String, String> args) {
        super(args);
    }

    @Override
    public TokenStream create(TokenStream input) {
        return new RemoveEmptyTokenFilter(input);
    }
}

/** Sets every token's position increment to 1 so that no gaps remain. */
final class RemoveEmptyTokenFilter extends TokenFilter {

    private final PositionIncrementAttribute posIncrAtt =
            addAttribute(PositionIncrementAttribute.class);

    public RemoveEmptyTokenFilter(TokenStream input) {
        super(input);
    }

    @Override
    public final boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false; // end of stream
        }
        posIncrAtt.setPositionIncrement(1);
        return true;
    }
}
{code}



{code:title=schema.xml|borderStyle=solid}









{code}




[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"

2016-11-04 Thread Diego Oliveira (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638299#comment-15638299
 ] 

Diego Oliveira commented on SOLR-6468:
--

I read the whole discussion and can't believe this decision. I'm having the 
same problem! I need to use a stop word filter together with a shingle filter, 
but once the stop words are removed I am left with a hole that causes a bug in 
the shingle filter: it produces duplicate tokens that the Remove Duplicates 
filter cannot remove, because the shingle and the tokens start at distinct 
positions. I can't believe the community cannot solve this problem by 
re-enabling the old feature, as people have said here. It is better to stay 
with the 'simplified' version than with the new (plus) version. Will it stay 
like this until Solr X? Come on!




[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"

2016-10-03 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15541697#comment-15541697
 ] 

Alexander S. commented on SOLR-6468:


We now can't upgrade to Solr 6 due to this.




[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"

2016-09-22 Thread Roman Chyla (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514785#comment-15514785
 ] 

Roman Chyla commented on SOLR-6468:
---

Ha! :-)
I've found my own comment above, 2 years later I'm facing this situation again, 
I completely forgot (and truth be told: preferred running old solr 4x).

This is how the new solr sees things:

A 350-MHz GBT Survey of 50 Faint Fermi γ ray Sources for Radio Millisecond 
Pulsars

is indexed as
```
null_1
1   :350|350mhz
2   :mhz|syn::mhz
3   :acr::gbt|gbt|syn::gbt|syn::green bank telescope
4   :survey|syn::survey
null_1
6   :50
```

The 1st and 5th positions are gaps, so the search for "350-MHz GBT Survey of 50 
Faint" will fail, because 'of' is a stopword and the stop filter will always 
increment the position (what's the purpose of a stop filter if it leaves 
gaps?).

Anyway, the solution with CharFilterFactory cannot work for me; I have to do 
this:
 
 1. search for synonyms (they can contain stopwords)
 2. remove stopwords
 3. search for other synonyms (that don't have stopwords)

I'm afraid real life is a little bit more complex than it seems; but there is 
a logic to your choices, Solr devs, and I'm afraid I can agree with you. 
People who understand the *why* will make it work again as it *should*. Others 
will happily keep using the 'simplified' version.




[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements=false

2015-02-13 Thread Okke Klein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319960#comment-14319960
 ] 

Okke Klein commented on SOLR-6468:
--

I would also like to see enablePositionIncrements brought back. It is very 
useful for removing stopwords when using shingles in the Suggester.




[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements=false

2014-11-25 Thread Roman Chyla (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225186#comment-14225186
 ] 

Roman Chyla commented on SOLR-6468:
---

I also find this change unfortunate. If this is just developers making 
decisions for users, then it causes problems for users who really know why 
they need that feature: for phrase search that should ignore stopwords. But if 
the underlying issue is something serious, with the indexer not being able to 
work with the positions, then it would be even weirder, and actually very bad 
for many users. I don't really understand the benefits of this change. Any 
chance of returning to the original behavior?




[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements=false

2014-09-11 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130090#comment-14130090
 ] 

Alexander S. commented on SOLR-6468:


Just tried to add matchVersion but got this error:
{code}
null:org.apache.solr.common.SolrException: Unable to create core: crm-prod
    at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:911)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:568)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:261)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:253)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Could not load core configuration for core crm-prod
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:66)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:554)
    ... 8 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType words_ngram: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.core.StopFilterFactory'. Schema file is /etc/solr/core2/schema.xml
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:616)
    at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:166)
    at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
    at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
    at org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:89)
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:62)
    ... 9 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType words_ngram: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.core.StopFilterFactory'
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:470)
    ... 14 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.core.StopFilterFactory'
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
    at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:400)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:86)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
    ... 15 more
Caused by: org.apache.solr.common.SolrException: Error instantiating class: 'org.apache.lucene.analysis.core.StopFilterFactory'
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:606)
    at org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:382)
    at org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:376)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
    ... 19 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:603)
    ... 22 more
Caused by: java.lang.IllegalArgumentException: Unknown parameters: {matchVersion=4.3}
    at org.apache.lucene.analysis.core.StopFilterFactory.init(StopFilterFactory.java:91)
    ... 27 more
{code}


[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements=false

2014-09-11 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130109#comment-14130109
 ] 

Steve Rowe commented on SOLR-6468:
--

luceneMatchVersion is what you want, not matchVersion




[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements=false

2014-09-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130147#comment-14130147
 ] 

Uwe Schindler commented on SOLR-6468:
-

The parameter is luceneMatchVersion.




[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements=false

2014-09-11 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130209#comment-14130209
 ] 

Alexander S. commented on SOLR-6468:


Thanks, it does work with luceneMatchVersion="4.3", but isn't this deprecated? 
Any chance of bringing enablePositionIncrements back?




[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements=false

2014-09-11 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130227#comment-14130227
 ] 

Steve Rowe commented on SOLR-6468:
--

bq. Any chance to return enablePositionIncrements?

Odds are extremely low that this will happen.

An alternative is to use a char filter (e.g. 
[MappingCharFilterFactory|http://lucene.apache.org/core/4_8_1/analyzers-common/org/apache/lucene/analysis/charfilter/MappingCharFilterFactory.html]
 or 
[PatternReplaceCharFilterFactory|http://lucene.apache.org/core/4_8_1/analyzers-common/org/apache/lucene/analysis/pattern/PatternReplaceCharFilterFactory.html])
 to remove stuff you don't want; that way the tokenizer won't leave position 
holes.
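
As a rough sketch of the char-filter approach (not taken from the issue; the regex is an assumption and only strips a leading scheme and "www." rather than every stop word), the field type could look something like this:

{code}
<fieldType name="words_ngram" class="solr.TextField" omitNorms="false"
           autoGeneratePhraseQueries="true">
  <analyzer>
    <!-- Assumed pattern: drop a leading http://, https:// or ftp:// and an
         optional "www." before the tokenizer sees the text, so no token
         (and no position hole) is produced for them. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(?i)^(?:(?:https?|ftp)://(?:www\.)?|www\.)"
                replacement="" />
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
{code}

With this, https://www.twitter.com/testuser would reach the tokenizer as twitter.com/testuser, so the StopFilterFactory would no longer be needed for these prefixes.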

 Regression: StopFilterFactory doesn't work properly without 
 enablePositionIncrements="false"
 

 Key: SOLR-6468
 URL: https://issues.apache.org/jira/browse/SOLR-6468
 Project: Solr
 Issue Type: Bug
 Affects Versions: 4.8.1, 4.9
 Reporter: Alexander S.

 Setup:
 * Schema version is 1.5
 * Field config:
 {code}
 <fieldType name="words_ngram" class="solr.TextField" omitNorms="false" 
 autoGeneratePhraseQueries="true">
   <analyzer>
     <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
     <filter class="solr.StopFilterFactory" words="url_stopwords.txt" 
 ignoreCase="true" />
     <filter class="solr.LowerCaseFilterFactory" />
   </analyzer>
 </fieldType>
 {code}
 * Stop words:
 {code}
 http 
 https 
 ftp 
 www
 {code}
 So very simple. In the index I have:
 * twitter.com/testuser
 All these queries do match:
 * twitter.com/testuser
 * com/testuser
 * testuser
 But none of these does:
 * https://twitter.com/testuser
 * https://www.twitter.com/testuser
 * www.twitter.com/testuser
 Debug output shows:
 parsedquery_toString: +(url_words_ngram:"? twitter com testuser")
 But we need:
 parsedquery_toString: +(url_words_ngram:"twitter com testuser")
 Complete debug outputs:
 * a valid search: 
 http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
 * an invalid search: 
 http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
 The complete discussion and explanation of the problem is here: 
 http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
 I couldn't find a clear explanation of how we can upgrade Solr. There is no
 replacement or workaround for this, so this is not just a major change but
 a major disrespect to all existing Solr users who are using this feature.






[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"

2014-09-02 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118076#comment-14118076
 ] 

Uwe Schindler commented on SOLR-6468:
-

You have to set enablePositionIncrements="false" and matchVersion="4.4" on
the StopFilterFactory; otherwise you get IllegalArgumentException:
enablePositionIncrements=false is not supported anymore as of Lucene 4.4 as it
can create broken token streams.
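
A minimal sketch of that setting, assuming the per-filter attribute is spelled luceneMatchVersion in the schema and that, given the exception text, the version has to be lower than 4.4 (4.3 is used here only as an example):

{code}
<filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true"
        enablePositionIncrements="false" luceneMatchVersion="4.3" />
{code}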




[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"

2014-09-02 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118078#comment-14118078
 ] 

Alexander S. commented on SOLR-6468:


Correct, but isn't this behavior deprecated? I mean matchVersion="4.3". I was
told this could be removed in 5.0 as well.

If I understand the problem correctly, enablePositionIncrements="false" could
generate wrong tokens for those who do not know how to use this option
correctly? It seems to require a custom tokenizer, and
solr.PatternTokenizerFactory in my example should work properly. So instead of
removing the option, the problem with wrong tokens could be explained in the
readme and the option kept for those who really need it. That makes
more sense to me than simply removing it.

Anyway, is there any chance the option could be restored? My use case should
clearly show how useful it can be. I also tried to google the problem: there
are a lot of complaints about this, but no solutions.
