Re: IOException: Mark invalid while analyzing HTML

2008-12-06 Thread Grant Ingersoll
About the only thing you can do here is to increase the readAheadLimit on the BufferedReader, but, by the looks of it, that also means we need to modify the TokenStream Factories that create the HTMLStripReader so that they take in some optional attributes. If you can open a JIRA issue for
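For anyone following along, here is a minimal sketch of why the readAheadLimit matters. It uses only plain java.io and is not the HTMLStripReader code itself; the class name MarkInvalidDemo, the helper resetSucceeds, and the buffer sizes and limits are all made up for illustration. The idea is that once a BufferedReader has to refill its buffer after more characters than the limit passed to mark() have been read, the mark is dropped and reset() is allowed to fail.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class MarkInvalidDemo {

    // Hypothetical helper: marks the start of the text, reads charsToRead
    // characters, then tries to reset() back to the mark.
    // Returns true if reset() worked, false if it threw an IOException.
    static boolean resetSucceeds(String text, int bufferSize,
                                 int readAheadLimit, int charsToRead) {
        try (BufferedReader reader =
                 new BufferedReader(new StringReader(text), bufferSize)) {
            reader.mark(readAheadLimit);
            for (int i = 0; i < charsToRead; i++) {
                if (reader.read() == -1) {
                    break; // end of input reached early
                }
            }
            reader.reset(); // fails if the mark was invalidated
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String html =
            "<html><body><p>some text that is longer than the buffer</p></body></html>";

        // Small limit: 20 characters are read past a mark that only
        // promised 4, so reset() is expected to fail here.
        System.out.println(resetSucceeds(html, 8, 4, 20));

        // Larger limit: the reader keeps enough characters buffered,
        // so reset() is expected to succeed.
        System.out.println(resetSucceeds(html, 8, 64, 20));
    }
}
```

The small buffer size is only there to force a refill quickly. With the default buffer the same failure needs a much longer read past the mark, which would fit with it showing up on only some documents.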

Re: IOException: Mark invalid while analyzing HTML

2008-12-05 Thread Dean Thompson
Was this one ever addressed? I'm seeing it in some small percentage of the documents that I index in 1.4-dev 708596M. I don't see a corresponding JIRA issue.

James Brady-3 wrote:
>
> Hi,
> I'm seeing a problem mentioned in Solr-42, Highlighting problems with
> HTMLStripWhitespaceTokenizerFa

IOException: Mark invalid while analyzing HTML

2008-05-04 Thread James Brady
Hi,

I'm seeing a problem mentioned in Solr-42, Highlighting problems with HTMLStripWhitespaceTokenizerFactory:
https://issues.apache.org/jira/browse/SOLR-42

I'm indexing HTML documents, and am getting reams of "Mark invalid" IOExceptions:

SEVERE: java.io.IOException: Mark invalid at