This exception comes from OffsetAttributeImpl (i.e. you don't need to index anything to reproduce it).
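To illustrate why no indexing is needed: the exception is a plain argument check performed when the offsets are set. The sketch below uses a hypothetical stand-in class, `OffsetCheckDemo`, which is not Lucene's `OffsetAttributeImpl`. It mirrors that check, assuming the condition and message format match the log quoted below, and feeds it the values from that log.

```java
/**
 * Hypothetical stand-in that mirrors the argument check done when offsets are
 * set on an offset attribute. This is NOT Lucene's OffsetAttributeImpl; it only
 * reproduces the condition and message seen in the reported log.
 */
public class OffsetCheckDemo {
    private int startOffset;
    private int endOffset;

    /** Rejects a negative start or an end before the start, as in the log. */
    public void setOffset(int startOffset, int endOffset) {
        if (startOffset < 0 || endOffset < startOffset) {
            throw new IllegalArgumentException(
                "startOffset must be non-negative, and endOffset must be >= startOffset, "
                + "startOffset=" + startOffset + ",endOffset=" + endOffset);
        }
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    public int startOffset() { return startOffset; }

    public int endOffset() { return endOffset; }

    public static void main(String[] args) {
        OffsetCheckDemo att = new OffsetCheckDemo();
        att.setOffset(0, 5); // valid offsets are accepted
        try {
            // The values from the reported SEVERE log line.
            att.setOffset(-1811581632, -1811581632);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints the same message as the log line, with no index or Solr involved.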
Maybe you have a missing clearAttributes() call (your tokenizer 'returns true' without calling that first)? That could explain it if something like a StopFilter is also present in the chain: basically, the offsets overflow. The test infrastructure in BaseTokenStreamTestCase should be able to detect this as well...

On Fri, Jan 3, 2014 at 1:56 PM, Benson Margulies <ben...@basistech.com> wrote:
> Using Solr Cloud with 4.3.1.
>
> We've got a problem with a tokenizer that manifests as calling
> OffsetAtt.setOffsets() with invalid inputs. OK, so, we want to figure out
> what input provokes our code into getting into this pickle.
>
> The problem happens on SolrCloud nodes.
>
> The problem manifests as this sort of thing:
>
> Jan 3, 2014 6:05:33 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.IllegalArgumentException: startOffset must be
> non-negative, and endOffset must be >= startOffset,
> startOffset=-1811581632,endOffset=-1811581632
>
> How could we get a document ID so that we can tell which document was being
> processed?
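On the missing clearAttributes() theory: a tokenizer's incrementToken() is expected to call clearAttributes() before filling in a new token. Without it, values from earlier tokens stay in the reused attributes. The toy model below is not Lucene code and is not the confirmed mechanism in this case. It assumes, hypothetically, that some component updates a value relative to what is already in the reused attribute (`start += delta`). Under that assumption, skipping the clear lets the value accumulate across tokens until Java's int arithmetic silently wraps to a negative number, like the startOffset in the log. `StaleAttributeDemo`, `ReusedAttribute` and `run` are illustrative names only.

```java
/**
 * Toy model (not Lucene code) of stale state in a reused attribute.
 * Assumption: a hypothetical component updates the value relative to its
 * current contents instead of starting from a freshly cleared attribute.
 */
public class StaleAttributeDemo {
    /** Hypothetical reusable holder for one int, standing in for a reused attribute. */
    static final class ReusedAttribute {
        int start;

        /** Reset to the default value, as clearAttributes() would. */
        void clear() {
            start = 0;
        }
    }

    /**
     * Simulates {@code tokens} tokens, each adding {@code delta} to the
     * current value; optionally clears first, like a well-behaved tokenizer.
     */
    public static int run(int tokens, int delta, boolean clearEachToken) {
        ReusedAttribute att = new ReusedAttribute();
        for (int i = 0; i < tokens; i++) {
            if (clearEachToken) {
                att.clear(); // the step a missing clearAttributes() call would skip
            }
            att.start += delta; // int arithmetic wraps silently on overflow
        }
        return att.start;
    }

    public static void main(String[] args) {
        System.out.println(run(3, 1_000_000_000, true));  // 1000000000
        System.out.println(run(3, 1_000_000_000, false)); // -1294967296 (3e9 wrapped)
    }
}
```

With the clear, every token starts from the same baseline. Without it, the third addition exceeds Integer.MAX_VALUE and the value becomes negative, which is exactly the kind of input the offset check rejects.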