Amuseme,

   Thanks for the reply. I reviewed the exceptions listed at the link and I
am not getting any of those. I have crawled more than 5 million documents
and was able to index 120K of them into Solr before the invalid XML
character exception occurred.

I investigated this issue and found previous posts on the same topic where
a patch was applied to stripNonCharCodepoints(). That patch is already part
of Nutch 1.6, though, and I am still getting the same exception.
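
For reference, this is roughly what I understand such a filter has to do
before a field value is sent to Solr. It is only a sketch based on the Char
production in the XML 1.0 spec, not the actual Nutch code; the class name
XmlCharFilter and both method names are made up for illustration.

```java
public class XmlCharFilter {

    // True if the code point is allowed by the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the input with every disallowed code point dropped.
    // Iterates by code point so that surrogate pairs are kept together and
    // unpaired surrogates (which are not valid XML characters) are removed.
    public static String stripInvalidXmlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Note that this also drops control characters below 0x20 (other than tab,
LF and CR) and the code points U+FFFE and U+FFFF. I do not know whether the
character in my case falls into one of those ranges.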

My "parser.character.encoding.default" was set to windows-1252 when crawling
all these documents. Could that have led to this exception when indexing?

Any insight on this will be helpful.

Thanks,
Neeraj.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Nutch-1-6-Need-help-with-Indexing-tp4048290p4048391.html
Sent from the Nutch - User mailing list archive at Nabble.com.