It is not a bug. XML parsers are required to reject documents with undefined 
character entities.

Try parsing it as HTML or XHTML.
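
As a rough illustration of the difference, here is a sketch using the Python
standard library rather than Solr or Tika. The sample markup and the helper
names are made up for this example. A strict XML parser rejects an entity
that nothing declares, while an HTML parser already knows the HTML named
entities:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical list item resembling the crawled page; not from the real site.
SAMPLE = "<ul><li>&rsaquo; first item</li></ul>"


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # &rsaquo; is not one of XML's five predefined entities
        # and no DTD in this document declares it.
        return False


class TextCollector(HTMLParser):
    """Collects text content; HTMLParser resolves HTML named entities."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_as_html(markup):
    """Return the text content of the markup, parsed leniently as HTML."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


if __name__ == "__main__":
    print(parses_as_xml(SAMPLE))   # strict XML parse of the sample
    print(text_as_html(SAMPLE))    # same markup handled as HTML
```

Numeric character references such as &#8250; need no declaration, so the
same list item written that way should pass an XML parser as well.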

wunder

On Apr 4, 2013, at 11:14 AM, eShard wrote:

> Yes, that's it exactly.
> I crawled a link with these ( ›) in each list item, and Solr
> couldn't handle it: it threw the XML parse error and the crawler
> terminated the job.
> 
> Is this fixable? Or do I have to submit a bug to the Tika folks?
> 
> Thanks,
> 



