Re: indexing errors when storeOffsetsWithPositions=true in solr 4.9.1
Is it possible to share the sequence of steps, with data, that causes this issue?

On Wed, Nov 5, 2014 at 4:51 PM, Alan Woodward a...@flax.co.uk wrote:

Hi Min,

Do you have the specific bit of text that caused this exception to be thrown?

Alan Woodward
www.flax.co.uk

On 4 Nov 2014, at 23:15, Min L wrote:

Hi All:

I am using Solr 4.9.1 and trying to use PostingsSolrHighlighter, but I got errors during indexing. I thought LUCENE-5111 had fixed the issues with WordDelimiterFilter. The error is below:

Caused by: java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=31,endOffset=44,lastStartOffset=37 for field 'description_texts'
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:630)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:342)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:301)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:451)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1539)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)

My schema.xml looks like this:

<dynamicField name="*_texts" stored="true" type="text" multiValued="true" indexed="true" storeOffsetsWithPositions="true"/>

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict_en.txt"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_english.txt" ignoreCase="true" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" splitOnNumerics="0" catenateWords="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_english.txt" ignoreCase="true" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" splitOnNumerics="0" catenateWords="1"/>
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict_en.txt"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>

Any help is appreciated.

Thanks.
Min
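
The three numbers in the exception message are enough to see which condition was violated. Below is a minimal Python sketch of the checks that the message describes. It is only an illustration, not Lucene's actual code, and the function name `check_offsets` is hypothetical.

```python
def check_offsets(start_offset, end_offset, last_start_offset):
    """Return the conditions named in the exception message that are violated."""
    problems = []
    if start_offset < 0:
        problems.append("startOffset is negative")
    if end_offset < start_offset:
        problems.append("endOffset is less than startOffset")
    if start_offset < last_start_offset:
        # The current token starts before the previous token did.
        problems.append("offsets go backwards")
    return problems


# Values copied from the exception message in this thread.
print(check_offsets(31, 44, 37))
```

With the reported values, only the last condition fails: the token starting at offset 31 was emitted after a token that started at offset 37.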
Re: indexing errors when storeOffsetsWithPositions=true in solr 4.9.1
Thanks for the replies. I found out that although we are using Solr 4.9.1, the luceneMatchVersion specified in solrconfig.xml is still 4.7, which doesn't have the bug fix.

On Sun, Nov 9, 2014 at 10:24 AM, Anurag Sharma anura...@gmail.com wrote:

Is it possible to share the sequence of steps, with data, that causes this issue?

On Wed, Nov 5, 2014 at 4:51 PM, Alan Woodward a...@flax.co.uk wrote:

Hi Min,

Do you have the specific bit of text that caused this exception to be thrown?

Alan Woodward
www.flax.co.uk
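
A mismatch like the one Min describes, where luceneMatchVersion lags behind the installed Solr release, can be spotted with a small script. The following is a rough Python sketch, assuming solrconfig.xml carries a top-level `<luceneMatchVersion>` element in dotted form such as `4.7`. The helper names and the inline sample config are hypothetical and stand in for a real file.

```python
import xml.etree.ElementTree as ET


def lucene_match_version(solrconfig_xml):
    """Return the text of the top-level <luceneMatchVersion> element, or None."""
    root = ET.fromstring(solrconfig_xml)
    element = root.find("luceneMatchVersion")
    if element is None or not element.text:
        return None
    return element.text.strip()


def version_tuple(version):
    """Turn a dotted version such as '4.9.1' into (major, minor)."""
    return tuple(int(part) for part in version.split(".")[:2])


# Hypothetical stand-in for the contents of solrconfig.xml.
sample = """<config>
  <luceneMatchVersion>4.7</luceneMatchVersion>
</config>"""

configured = lucene_match_version(sample)
installed = "4.9.1"  # the Solr release mentioned in this thread

if version_tuple(configured) < version_tuple(installed):
    print("luceneMatchVersion %s is older than Solr %s" % (configured, installed))
```

The comparison uses only the major and minor parts, since the configured value in this thread is given as 4.7 without a patch level.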
Re: indexing errors when storeOffsetsWithPositions=true in solr 4.9.1
Hi Min,

Do you have the specific bit of text that caused this exception to be thrown?

Alan Woodward
www.flax.co.uk