Hi, SOLR gurus,

we're experiencing a crash with SOLR 4.0 whenever the results contain multibyte characters (more precisely: German umlauts, UTF-8 encoded).
The crashes only occur when ReversedWildcardFilterFactory is in use (which, as far as I understand, is necessary in 4.0 to allow wildcards at the beginning of the search pattern) *and* the highlighter is on.

The stack trace (heavily snipped) looks like this:

| 12.09.2012 13:08:12 org.apache.solr.common.SolrException log
| SCHWERWIEGEND: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token substantial exceeds length of provided text sized 5107
|   at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:517)
|   at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
|   at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:136)
|   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
|   [...]
|   at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
|   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
|   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
|   at java.lang.Thread.run(Thread.java:662)
| Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token substantial exceeds length of provided text sized 5107
|   at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
|   at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:510)
|   ... 32 more

(Excuse the German locale; SCHWERWIEGEND is the SEVERE log level.)

Poking around in the sources seems to point (to my untrained eye, that is) to:

<https://issues.apache.org/jira/browse/LUCENE-3080>

Is this the issue biting us? Are there any known workarounds? Is there anything we might try to pinpoint the problem or to fix the bug?

Thanks for any insights,
regards
--
Tomás Zerolo
Axel Springer AG
Axel Springer media Systems
BILD Produktionssysteme
Axel-Springer-Straße 65
10888 Berlin
Tel.: +49 (30) 2591-72875
tomas.zer...@axelspringer.de
www.axelspringer.de

Axel Springer AG, Sitz Berlin, Amtsgericht Charlottenburg, HRB 4998
Vorsitzender des Aufsichtsrats: Dr. Giuseppe Vita
Vorstand: Dr. Mathias Döpfner (Vorsitzender) Jan Bayer, Ralph Büchi, Lothar Lanz, Dr. Andreas Wiele