Re: Solr 1.4: StringIndexOutOfBoundsException in SpellCheckComponent with HTMLStripCharFilterFactory

2009-12-07 Thread Koji Sekiguchi

Robin Wojciki wrote:

Koji, I was able to create a minimal replication.

Attached zip has solr.xml, solrconf.xml and Main.java. I was able to
replicate the issue by replacing the conf files in
apache-solr-1.4.0/example/solr/conf and running the class Main. Could
you please confirm whether this replication is enough?

Also, please let me know if I should log the ticket with Lucene or Solr.

Thanks,
Robin
  


Robin,

I reproduced the problem with your sample data, but it is also
reproducible without HTMLStripCharFilter. I commented out the HTML
strippers in schema.xml and rebuilt the index with the following data:

<add>
 <doc>
   <field name="id">debug-1</field>
   <field name="description">hello world WGKEKW AWEHGSE</field>
 </doc>
</add>

The exception still occurred.
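For anyone who wants to index the same test document without the attached Main.java, here is a minimal sketch in plain Java. The class and method names are made up for illustration, and the update URL is an assumption (the stock example server usually listens at http://localhost:8983/solr/update); pass your own URL as the first argument. Without an argument it only prints the XML.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds an XML add message and optionally posts it. */
final class PostTestDocSketch {

    /** Escapes the characters that are unsafe in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Builds an add message containing one doc with the given fields, in order. */
    static String buildAddXml(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(f.getKey())).append("\">")
               .append(escape(f.getValue())).append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    /** Posts an XML body to the update URL and returns the HTTP status code. */
    static int post(String updateUrl, String body) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        OutputStream out = conn.getOutputStream();
        try {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        } finally {
            out.close();
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("id", "debug-1");
        doc.put("description", "hello world WGKEKW AWEHGSE");
        String xml = buildAddXml(doc);
        System.out.println(xml);
        if (args.length > 0) {
            // args[0] is the update URL; assumed, not taken from the thread.
            System.out.println("add: HTTP " + post(args[0], xml));
            System.out.println("commit: HTTP " + post(args[0], "<commit/>"));
        }
    }
}
```

The spellcheck index still has to be built afterwards according to your own solrconfig.xml; that part is not covered here.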

Can you check it and open a JIRA issue for Solr?

Thank you!

Koji

--
http://www.rondhuit.com/en/



Re: Solr 1.4: StringIndexOutOfBoundsException in SpellCheckComponent with HTMLStripCharFilterFactory

2009-12-07 Thread Robin Wojciki
Koji,

In the sample I sent, the exception occurs only when
HTMLStripCharFilter is present.

However, your test case seems to capture the essence. Sorry if I sent
you on a wild goose chase.

Thanks for taking the time! I will log a ticket.
Robin





Re: Solr 1.4: StringIndexOutOfBoundsException in SpellCheckComponent with HTMLStripCharFilterFactory

2009-12-07 Thread Robin Wojciki
Logged a ticket for Solr: https://issues.apache.org/jira/browse/SOLR-1630

Thanks,
Robin






Re: Solr 1.4: StringIndexOutOfBoundsException in SpellCheckComponent with HTMLStripCharFilterFactory

2009-12-05 Thread Koji Sekiguchi

I couldn't reproduce it with simple test data.
Can you open a JIRA issue and attach a test case that reproduces
the problem, along with the spellchecker definition from solrconfig.xml?

Koji

--
http://www.rondhuit.com/en/



Solr 1.4: StringIndexOutOfBoundsException in SpellCheckComponent with HTMLStripCharFilterFactory

2009-12-04 Thread Robin Wojciki
I am running a search in Solr 1.4 and I am getting the
StringIndexOutOfBoundsException pasted below. The spell check field
uses HTMLStripCharFilterFactory. However, the search works fine if I
do not use the HTMLStripCharFilterFactory.

If I set a breakpoint at SpellCheckComponent.java:248, the value of
the variable best is as shown in the screenshot:
http://yfrog.com/j5solrdebuginspectp

At the end of the first iteration, offset = 5 - (24 - 0) = -19.
This causes the index out of bounds exception.
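
To make the arithmetic concrete, here is a small standalone sketch. It is not the Solr source: it assumes, based on the StringBuilder.replace frame in the stack trace and the offset calculation above, that the collation step splices each correction into the query while carrying a running offset. The class, method names and token values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of offset bookkeeping while splicing corrections into a query. */
final class CollationOffsetSketch {

    /** A token's character span in the original query plus its suggested correction. */
    static final class Correction {
        final int start;
        final int end;
        final String text;

        Correction(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    /** Shift for later tokens after one replacement: new length minus old span. */
    static int shiftAfter(Correction c) {
        return c.text.length() - (c.end - c.start);
    }

    /** Applies corrections in list order, carrying a running offset. */
    static String collate(String query, List<Correction> corrections) {
        StringBuilder sb = new StringBuilder(query);
        int offset = 0;
        for (Correction c : corrections) {
            // StringBuilder.replace rejects a negative start index, which is
            // what happens once c.start + offset drops below zero.
            sb.replace(c.start + offset, c.end + offset, c.text);
            offset += shiftAfter(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String query = "abcdefghijklmnopqrstuvwx"; // 24 characters, made up
        List<Correction> corrections = new ArrayList<Correction>();
        corrections.add(new Correction(0, 24, "hello")); // shift: 5 - (24 - 0) = -19
        corrections.add(new Correction(0, 5, "world"));  // made-up second token at 0
        try {
            collate(query, corrections);
        } catch (StringIndexOutOfBoundsException e) {
            // The message text differs between JDK versions.
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

With non-overlapping tokens in ascending order the running offset stays consistent; for example, correcting "helo wrld" with (0, 4, "hello") and (5, 9, "world") gives "hello world". The failure needs a token whose shifted start falls below zero, as in main above.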

The spell check field is defined as:

<fieldType name="text_spell" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>



Stack Trace:
============
String index out of range: -19

java.lang.StringIndexOutOfBoundsException: String index out of range: -19
    at java.lang.AbstractStringBuilder.replace(Unknown Source)
    at java.lang.StringBuilder.replace(Unknown Source)
    at org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:143)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)