Re: detailed Error reporting in Solr

2013-04-05 Thread Walter Underwood
It is not a bug. XML parsers are required to reject documents with undefined character entities. Try parsing it as HTML or XHTML.

wunder

On Apr 4, 2013, at 11:14 AM, eShard wrote:
Yes, that's it exactly. I crawled a link with these (&nbsp;&rsaquo;) in each list item and solr couldn't handle
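To illustrate Walter's point, here is a minimal sketch that uses Python's standard library as a stand-in (it is not Tika or Solr). ElementTree, a conforming XML parser, rejects the undefined &nbsp; entity, while html.parser decodes HTML's named entities. The <li> snippet and the helper names parse_as_xml, TextCollector and parse_as_html are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical list item modeled on the one described in the thread.
SNIPPET = "<li>Cyber Systems and Technology&nbsp;&rsaquo;</li>"


def parse_as_xml(markup: str) -> str:
    """Strict XML parsing: an undeclared entity such as &nbsp; is a fatal error."""
    return ET.fromstring(markup).text


class TextCollector(HTMLParser):
    """Collects character data; with convert_charrefs=True, HTML named
    entities (&nbsp;, &rsaquo;, ...) are decoded before handle_data sees them."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def parse_as_html(markup: str) -> str:
    """Lenient HTML parsing: returns the decoded text content."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


if __name__ == "__main__":
    try:
        parse_as_xml(SNIPPET)
    except ET.ParseError as err:
        print("XML parser rejected it:", err)
    print(repr(parse_as_html(SNIPPET)))
```

The same markup fails under the XML rules but parses under the HTML rules, where &nbsp; and &rsaquo; are predefined.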

detailed Error reporting in Solr

2013-04-04 Thread eShard
Cyber Systems and Technology&nbsp;&rsaquo; /mission/CST/CST.html /li My question is twofold: 1) how do I get Solr to report more detailed errors, and 2) how do I get Tika to accept (or ignore) &nbsp;? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/detailed-Error

Re: detailed Error reporting in Solr

2013-04-04 Thread eShard
OK, one possible fix is to add the XML equivalent of &nbsp;, which is:

<?xml version="1.0"?>
<!DOCTYPE some_name [ <!ENTITY nbsp "&#160;"> ]>

But how do I add this to the Tika configuration?
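Tika's configuration is not covered in this thread, so the sketch below only demonstrates the XML mechanism eShard describes, again with Python's ElementTree as a stand-in. Wrapping a fragment in a document whose internal DTD subset declares nbsp (as &#160;) and rsaquo (as &#8250;) lets a strict XML parser expand those entities instead of rejecting them. The function name parse_with_html_entities, the default root name doc, and the assumption that the fragment carries no XML declaration or DOCTYPE of its own are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring the two HTML entities mentioned in the thread:
# &#160; is NO-BREAK SPACE, &#8250; is SINGLE RIGHT-POINTING ANGLE QUOTATION MARK.
ENTITY_DECLS = '<!ENTITY nbsp "&#160;"><!ENTITY rsaquo "&#8250;">'


def parse_with_html_entities(fragment: str, root: str = "doc") -> ET.Element:
    """Wrap an XML fragment in a root element whose DOCTYPE declares the entities.

    Assumes (hypothetically) that `fragment` has no XML declaration or DOCTYPE
    of its own, since both must appear only once, at the top of a document.
    """
    document = (
        '<?xml version="1.0"?>\n'
        f"<!DOCTYPE {root} [ {ENTITY_DECLS} ]>\n"
        f"<{root}>{fragment}</{root}>"
    )
    return ET.fromstring(document)


if __name__ == "__main__":
    tree = parse_with_html_entities("<li>Cyber Systems and Technology&nbsp;&rsaquo;</li>")
    print(repr(tree.find("li").text))
```

Each additional HTML entity the crawled pages use would need its own declaration, which is one reason Walter suggests parsing the pages as HTML instead.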

Re: detailed Error reporting in Solr

2013-04-04 Thread Jack Krupansky
I'm trying to understand what the context is here... are you trying to crawl web pages that have bad HTML? Or... what?

-- Jack Krupansky

-----Original Message-----
From: eShard
Sent: Thursday, April 04, 2013 10:23 AM
To: solr-user@lucene.apache.org
Subject: detailed Error reporting in Solr

Re: detailed Error reporting in Solr

2013-04-04 Thread eShard
://lucene.472066.n3.nabble.com/detailed-Error-reporting-in-Solr-tp4053821p4053882.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: detailed Error reporting in Solr

2013-04-04 Thread Jack Krupansky
simply be ignored. Yes, by all means ask on the Tika list. Solr is just wrapping the error Tika reports.

-- Jack Krupansky

-----Original Message-----
From: eShard
Sent: Thursday, April 04, 2013 2:14 PM
To: solr-user@lucene.apache.org
Subject: Re: detailed Error reporting in Solr

Yes