Hi, SOLR gurus

We're experiencing a crash with SOLR 4.0 whenever the results contain
multibyte characters (more precisely, German umlauts encoded as UTF-8).

The crashes only occur when ReversedWildcardFilterFactory is in use
(which, as far as I understand, is necessary in 4.0 to allow wildcards
at the beginning of the search pattern) *and* the highlighter is on.
The stack trace (heavily snipped) looks like this:

 | 12.09.2012 13:08:12 org.apache.solr.common.SolrException log
 | SCHWERWIEGEND: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token substantial exceeds length of provided text sized 5107
 |         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:517)
 |         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
 |         at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:136)
 |         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
 | [...]
 |         at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
 |         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
 |         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
 |         at java.lang.Thread.run(Thread.java:662)
 | Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token substantial exceeds length of provided text sized 5107
 |         at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
 |         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:510)
 |         ... 32 more

(Excuse the German locale; SCHWERWIEGEND is the localized SEVERE log level.)
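
To make the failure concrete: as far as I can tell, the exception comes
from a sanity check that compares each token's start and end offsets
with the length of the text being highlighted. Below is a minimal
stand-alone sketch of such a check in Python. It is not Lucene code;
`check_token`, `InvalidTokenOffsets` and the sample text are made up for
illustration. It also shows one way multibyte characters could trip
such a check, namely offsets counted in UTF-8 bytes instead of
characters. That is only a guess; I have not verified that this is what
happens in our case.

```python
# Hypothetical stand-in for the highlighter's offset sanity check.
# None of these names come from Lucene or Solr.
class InvalidTokenOffsets(Exception):
    pass


def check_token(text, term, start, end):
    """Raise if a token's offsets point past the end of the text."""
    if start > len(text) or end > len(text):
        raise InvalidTokenOffsets(
            "Token %s exceeds length of provided text sized %d"
            % (term, len(text))
        )


text = "Größe für Bücher"  # made-up sample text, 16 characters
term = "Bücher"

# Offsets counted in characters stay inside the text.
char_start = text.index(term)
char_end = char_start + len(term)
check_token(text, term, char_start, char_end)  # does not raise

# Offsets counted in UTF-8 bytes run past the character length,
# because each umlaut (and the sharp s) takes two bytes.
raw = text.encode("utf-8")
byte_start = raw.index(term.encode("utf-8"))
byte_end = byte_start + len(term.encode("utf-8"))

try:
    check_token(text, term, byte_start, byte_end)
    tripped = False
except InvalidTokenOffsets as exc:
    tripped = True
    message = str(exc)
```

Here the character offsets (10, 16) pass the check, while the byte
offsets (13, 20) exceed the 16-character text and raise the error.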

Poking around in the sources seems to point (to my untrained eye, that
is) to:

  <https://issues.apache.org/jira/browse/LUCENE-3080>

Is this the issue biting us? Are there any known workarounds? Is there
anything we might try to pinpoint the problem or to fix the bug?

Thanks for any insights, regards
-- 
Tomás Zerolo
Axel Springer AG
Axel Springer media Systems
BILD Produktionssysteme
Axel-Springer-Straße 65
10888 Berlin
Tel.: +49 (30) 2591-72875
tomas.zer...@axelspringer.de
www.axelspringer.de

Axel Springer AG, Sitz Berlin, Amtsgericht Charlottenburg, HRB 4998
Vorsitzender des Aufsichtsrats: Dr. Giuseppe Vita
Vorstand: Dr. Mathias Döpfner (Vorsitzender)
Jan Bayer, Ralph Büchi, Lothar Lanz, Dr. Andreas Wiele