Amuseme,

Thanks for the reply. I reviewed the exceptions listed at the link, and I am not getting any of those. I have more than 5 million documents crawled and was able to index 120K documents into Solr before this exception for an invalid XML character occurred.

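For reference, one way to find which character Solr is rejecting is to check each field value against the XML 1.0 "Char" production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). The Java below is only a minimal sketch: the class and method names are hypothetical (not part of Nutch or Solr), and it is not how stripNonCharCodepoints() is implemented. It reports the index of the first disallowed code point and can also strip them.

```java
// Hypothetical diagnostic helper, NOT part of Nutch or Solr: checks text
// against the XML 1.0 "Char" production and strips disallowed code points.
public final class XmlCharCheck {

    private XmlCharCheck() {}

    /** True if cp matches XML 1.0 Char:
     *  #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] */
    public static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the char index of the first disallowed code point, or -1 if none. */
    public static int firstInvalidIndex(String s) {
        int i = 0;
        while (i < s.length()) {
            // An unpaired surrogate comes back as its own value (0xD800-0xDFFF),
            // which isXmlChar rejects.
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    /** Returns a copy of s with every disallowed code point removed. */
    public static String stripInvalid(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints()
         .filter(XmlCharCheck::isXmlChar)
         .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // U+000B (vertical tab) is not allowed in XML 1.0 documents.
        String field = "caf\u00e9 " + (char) 0x0B + " menu";
        System.out.println(firstInvalidIndex(field)); // prints 5
        System.out.println(stripInvalid(field));
    }
}
```

Running main prints 5 on the first line, the char index of the vertical tab. Because the check walks code points rather than individual chars, a valid surrogate pair (e.g. an emoji) passes, while an unpaired surrogate is reported as invalid.
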
While investigating this issue, I found previous posts on the same topic in which a patch was applied to stripNonCharCodepoints(). However, that patch is already part of Nutch 1.6, and I am still getting the same exception. My "parser.character.encoding.default" was set to windows-1252 when crawling all these documents. Could that have led to this exception during indexing? Any insight on this would be helpful.

Thanks,
Neeraj

--
View this message in context: http://lucene.472066.n3.nabble.com/Nutch-1-6-Need-help-with-Indexing-tp4048290p4048391.html
Sent from the Nutch - User mailing list archive at Nabble.com.

