About the only thing you can do here is increase the readAheadLimit
on the BufferedReader but, by the looks of it, that also means
modifying the TokenStream factories that create the
HTMLStripReader so that they accept some optional attributes. If you
can open a JIRA issue for
Was this one ever addressed? I'm seeing it in a small percentage of the
documents that I index with 1.4-dev 708596M. I don't see a corresponding JIRA
issue.
James Brady-3 wrote:
>
> Hi,
> I'm seeing a problem mentioned in Solr-42, Highlighting problems with
> HTMLStripWhitespaceTokenizerFa
Hi,
I'm seeing the problem described in SOLR-42, "Highlighting problems with
HTMLStripWhitespaceTokenizerFactory":
https://issues.apache.org/jira/browse/SOLR-42
I'm indexing HTML documents, and am getting reams of "Mark invalid"
IOExceptions:
SEVERE: java.io.IOException: Mark invalid
at
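
For anyone trying to reproduce the "Mark invalid" failure outside Solr, here is a
minimal sketch using only the JDK's BufferedReader. It is not the HTMLStripReader
code. The class name, the helper tryReset, the sample text and the 4-character
buffer are all made up for illustration (the tiny buffer just forces a refill
quickly). It shows that reset() fails once more characters have been read than the
readAheadLimit passed to mark(), and that a larger limit avoids the failure.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo class; not part of Solr or of HTMLStripReader.
class MarkInvalidDemo {

    // Marks the start of the text, reads charsToRead characters, then tries
    // to reset. Returns null if reset() succeeded, otherwise the message of
    // the IOException that was thrown.
    static String tryReset(String text, int readAheadLimit, int charsToRead) {
        // A deliberately tiny 4-char buffer, so the reader has to refill
        // (and can drop the mark) after only a few characters.
        try (BufferedReader reader =
                 new BufferedReader(new StringReader(text), 4)) {
            reader.mark(readAheadLimit);
            for (int i = 0; i < charsToRead; i++) {
                if (reader.read() == -1) {
                    break; // end of input
                }
            }
            reader.reset(); // throws if the mark was invalidated
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String text = "0123456789abcdefghij";
        // Reading 10 chars past a mark with limit 2: reset() fails.
        // On OpenJDK the exception message is "Mark invalid".
        System.out.println("limit 2:  " + tryReset(text, 2, 10));
        // Same reads with limit 16: reset() succeeds, so this prints null.
        System.out.println("limit 16: " + tryReset(text, 16, 10));
    }
}
```

Raising the limit makes the reader hold more characters in memory while the mark
is live, which is presumably why the reply above suggests exposing it as an
optional attribute rather than hard-coding a large value.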