Re: indexing errors when storeOffsetsWithPositions=true in solr 4.9.1

2014-11-09 Thread Anurag Sharma
Is it possible to share the sequence of steps, along with the data, that causes this issue?



Re: indexing errors when storeOffsetsWithPositions=true in solr 4.9.1

2014-11-09 Thread Min L
Thanks for the replies. I found out that although we are running Solr 4.9.1,
the luceneMatchVersion specified in solrconfig.xml is still 4.7, which
does not include the bug fix.
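
For anyone hitting the same error, a minimal sketch of the setting involved follows. It assumes a stock solrconfig.xml in which luceneMatchVersion is a direct child of the config element, and that the WordDelimiterFilter fix only takes effect when the match version is 4.8 or later (consistent with the behaviour described above, but worth checking against the release notes for your version). The value 4.9 is illustrative.

```xml
<config>
  <!-- Assumption: was 4.7, which selects the older analysis behaviour -->
  <luceneMatchVersion>4.9</luceneMatchVersion>
  <!-- rest of solrconfig.xml unchanged -->
</config>
```

Because this changes how the index-time analyzer tokenizes text, existing documents would most likely need to be reindexed after the change.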



Re: indexing errors when storeOffsetsWithPositions=true in solr 4.9.1

2014-11-05 Thread Alan Woodward
Hi Min,

Do you have the specific bit of text that caused this exception to be thrown?

Alan Woodward
www.flax.co.uk


On 4 Nov 2014, at 23:15, Min L wrote:

 Hi All:
 
 I am using Solr 4.9.1 and trying to use PostingsSolrHighlighter, but I got
 errors during indexing. I thought LUCENE-5111 had fixed the issues with
 WordDelimiterFilter. The error is as below:
 
 Caused by: java.lang.IllegalArgumentException: startOffset must be
 non-negative, and endOffset must be >= startOffset, and offsets must
 not go backwards startOffset=31,endOffset=44,lastStartOffset=37 for
 field 'description_texts'
   at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:630)
   at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:342)
   at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:301)
   at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:451)
   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1539)
   at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
   at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
 
 
 My schema.xml looks like below:
 
 <dynamicField name="*_texts" stored="true" type="text" multiValued="true"
               indexed="true" storeOffsetsWithPositions="true"/>

 <fieldType name="text" class="solr.TextField" omitNorms="false">
   <analyzer type="index">
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict_en.txt"/>
     <filter class="solr.PatternReplaceFilterFactory"
             pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
     <filter class="solr.KStemFilterFactory"/>
     <filter class="solr.StopFilterFactory" words="stopwords_english.txt"
             ignoreCase="true" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
             splitOnNumerics="0" catenateWords="1"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StopFilterFactory" words="stopwords_english.txt"
             ignoreCase="true" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
             splitOnNumerics="0" catenateWords="1"/>
     <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict_en.txt"/>
     <filter class="solr.KStemFilterFactory"/>
   </analyzer>
 </fieldType>
 
 
 Any help is appreciated.
 
 
 Thanks.
 
 Min