Hi, We are using Nutch 1.9 to crawl an internal website and index the content into Solr 3.5. We found that the page title indexed for certain HTML pages is wrong. For example, the "Contact us" page has "Login" as its page title in the Solr index. This only happens when we fetch with multiple threads (fetcher.threads.per.queue=5); fetching with a single thread seems to be fine.
Can someone please point me in the right direction on how to debug this problem in Nutch? I would like to find out at which stage the title gets messed up (fetching, parsing, or indexing), but I am not sure where to start. How can I examine the result of each step for a particular HTML page? Any suggestions are much appreciated! Alex

