It is not a bug. XML parsers are required to reject documents with undefined character entities.
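A minimal sketch of the difference, in Python rather than anything from Solr or Tika. The snippet and the `&rsaquo;` entity are assumptions standing in for the crawled list items; the helper names are made up for illustration. A strict XML parser rejects the undeclared named entity, while an HTML parser knows the HTML entity set and decodes it.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical markup standing in for a crawled list item.
SNIPPET = "<ul><li>Home &rsaquo; Docs</li></ul>"


def parses_as_xml(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # expat reports an undeclared named entity as a parse error
        return False


class TextCollector(HTMLParser):
    """Collects text content, decoding HTML character references."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_as_html(text):
    """Parse the text as HTML and return its concatenated text content."""
    parser = TextCollector()
    parser.feed(text)
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    print("accepted as XML:", parses_as_xml(SNIPPET))
    print("text when parsed as HTML:", text_as_html(SNIPPET))
```

If the source can be changed, a numeric character reference such as `&#8250;` is accepted by XML parsers without any entity declaration.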
Try parsing it as HTML or XHTML.

wunder

On Apr 4, 2013, at 11:14 AM, eShard wrote:

> Yes, that's it exactly.
> I crawled a link with these ( ›) in each list item and solr
> couldn't handle it threw the xml parse error and the crawler terminated the
> job.
>
> Is this fixable? Or do I have to submit a bug to the tika folks?
>
> Thanks,