Highlighting failure caused by InvalidTokenOffsetsException
-----------------------------------------------------------
Key: SOLR-1883
URL: https://issues.apache.org/jira/browse/SOLR-1883
Project: Solr
Issue Type: Bug
Components: highlighter
Affects Versions: 1.4
Environment: {code:title=java}
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
{code}
{code:title=solr lib manifest}
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.0
Created-By: 14.1-b02-90 (Apple Inc.)
Extension-Name: org.apache.solr
Specification-Title: Apache Solr Search Server
Specification-Version: 1.4.0
Specification-Vendor: The Apache Software Foundation
Implementation-Title: org.apache.solr
Implementation-Version: 1.4.0 833479 - grantingersoll - 2009-11-06 12:33:40
Implementation-Vendor: The Apache Software Foundation
X-Compile-Source-JDK: 1.5
X-Compile-Target-JDK: 1.5
{code}
{code:title=OS}
Linux myhost 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
{code}
Reporter: Luke Forehand
This appears to be the same problem as a previous issue,
https://issues.apache.org/jira/browse/SOLR-1404, which was bulk closed in
Solr 1.4. Someone has also reported this bug against Lucene 2.9.1:
https://issues.apache.org/jira/browse/LUCENE-2208. We are experiencing it as
well.
I have pasted the relevant part of our schema.xml and the Solr exception below,
and attached the document that fails when it is queried with highlighting.
The invalid token appears to be 'system', which is the very last token in the
document field (see the attached file).
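To make the failure mode concrete, here is a minimal sketch of the kind of bounds check that produces the message in the log below. It is not Lucene's source: the class name, the method, the use of IllegalArgumentException in place of InvalidTokenOffsetsException, and the token offsets are all assumptions made for illustration. Only the text length (17063) and the token 'system' come from our log.

```java
// Simplified, hypothetical sketch; not Lucene's actual source.
class OffsetCheckSketch {

    // Stand-in for the check that raises InvalidTokenOffsetsException:
    // a token's offsets must fall inside the text being highlighted.
    static void checkOffsets(String term, int startOffset, int endOffset, String text) {
        if (endOffset > text.length() || startOffset > text.length()) {
            // IllegalArgumentException is used only to keep the sketch
            // free of Lucene dependencies.
            throw new IllegalArgumentException(
                "Token " + term + " exceeds length of provided text sized " + text.length());
        }
    }

    public static void main(String[] args) {
        // Placeholder text with the same length as reported in the log.
        String text = new String(new char[17063]);

        // A token that ends inside the text passes the check.
        checkOffsets("operating", 17040, 17049, text);

        // A final token whose end offset runs past the text fails it.
        // The offsets 17060-17066 are invented for illustration.
        try {
            checkOffsets("system", 17060, 17066, text);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this reading is right, the offsets recorded for the last token do not line up with the stored text that the highlighter receives. The sketch does not show why they differ.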
{code:title=schema.xml}
<?xml version="1.0" encoding="UTF-8"?>
<schema name="xxx" version="1.1">
  <types>
    <fieldType name="scrubbedText" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory" />
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <filter class="solr.StandardFilterFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.StopFilterFactory" />
      </analyzer>
    </fieldType>
    ...
  </types>
  <fields>
    <field name="id" type="string" stored="true" indexed="true" />
    <field name="textScrubbed" type="scrubbedText" stored="true" indexed="true" />
    ...
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>textScrubbed</defaultSearchField>
</schema>
{code}
{code:title=solr.log exception}
Apr 13, 2010 3:08:35 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token system exceeds length of provided text sized 17063
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:342)
    at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
    at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:574)
    at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1527)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token system exceeds length of provided text sized 17063
    at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:254)
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
    ... 18 more
{code}