Hi all, I recently downloaded Nutch onto my local machine. I wrote a few plugins for it and successfully crawled a few sites to make sure that my parsers and indexers worked well. I then moved the Nutch installation onto our pre-existing Hadoop cluster by copying the needed libs, confs, and the build/plugins dir onto every machine in the cluster, and I adjusted nutch-site.xml to point the plugin path at the hard-coded location where the plugins sit. The Nutch job runs without errors, but it never gets past a few pages: it seems to fetch only one page per level, and it fetches that same page on every pass. I have attached the interesting files and system logs for easy viewing. Anyone have any ideas on why it's not going forward? It also seems to abort threads; any ideas?
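For reference, here is the shape of the setting I changed. Nutch reads the plugin directory from the plugin.folders property in nutch-site.xml; the path below is a placeholder for the actual hard-coded path the plugins were copied to on each node:

```xml
<!-- nutch-site.xml: point Nutch at the plugin directory on each node.
     /path/to/nutch/build/plugins is a placeholder, not the real path. -->
<property>
  <name>plugin.folders</name>
  <value>/path/to/nutch/build/plugins</value>
  <description>Absolute path to the plugins dir on every Hadoop node.</description>
</property>
```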
2011-06-03 16:20:51,559 WARN org.apache.nutch.parse.ParserFactory: ParserFactory:Plugin: org.apache.nutch.parse.html.HtmlParser mapped to contentType application/xhtml+xml via parse-plugins.xml, but its plugin.xml file does not claim to support contentType: application/xhtml+xml
2011-06-03 16:20:51,629 INFO org.apache.nutch.fetcher.Fetcher: -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=19
2011-06-03 16:20:51,629 WARN org.apache.nutch.fetcher.Fetcher: Aborting with 10 hung threads.

--
Brian Griffey
ShopSavvy Android and Big Data Developer
650-352-1429

