The aborting does not look wrong; it always happens at the end of a fetch cycle.
Do you use the one-stop crawl command or the step-by-step commands? In the
latter case you have more ability to see where it might fail; a minimal sketch
of the step-by-step sequence is below.
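For reference, a sketch of that sequence on a Nutch 1.x install (the crawl and
urls directory names and the -topN value here are assumptions; adjust them to
your setup):

  # one-stop equivalent: bin/nutch crawl urls -dir crawl -depth 3 -topN 100
  bin/nutch inject crawl/crawldb urls    # seed the crawldb with your URL list
  bin/nutch generate crawl/crawldb crawl/segments -topN 100
  s1=`ls -d crawl/segments/2* | tail -1` # pick the newest segment
  bin/nutch fetch $s1                    # hung-thread aborts would show up here
  bin/nutch parse $s1
  bin/nutch updatedb crawl/crawldb $s1   # feed discovered links back in

Running the steps by hand also makes it easier to check, for example, that the
plugin.folders path in nutch-site.xml resolves on every node.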
We don't get attachments on this mailing list.

2011/6/3 Brian Griffey <[email protected]>

> Hi all,
>
> I recently downloaded Nutch onto my local machine. I wrote a few plugins
> for it and successfully crawled a few sites to make sure that my parsers
> and indexers worked well. I then moved the Nutch installation onto our
> pre-existing Hadoop cluster by copying the needed libs, confs, and the
> build/plugins dir onto every machine in the Hadoop cluster. I also
> adjusted nutch-site.xml to point the plugins to the hard-coded path where
> the plugins sit. The Nutch system runs without errors, but it never gets
> past a few pages. It just seems to get stuck grabbing only one page per
> level, and it fetches that same page on every pass. I have included the
> interesting files and system logs in the attachment for easy viewing.
> Does anyone have any ideas on why it's not going forward? It also seems
> to abort threads; any ideas?
>
> 2011-06-03 16:20:51,559 WARN org.apache.nutch.parse.ParserFactory:
> ParserFactory:Plugin: org.apache.nutch.parse.html.HtmlParser mapped to
> contentType application/xhtml+xml via parse-plugins.xml, but its plugin.xml
> file does not claim to support contentType: application/xhtml+xml
> 2011-06-03 16:20:51,629 INFO org.apache.nutch.fetcher.Fetcher:
> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=19
> 2011-06-03 16:20:51,629 WARN org.apache.nutch.fetcher.Fetcher: Aborting with
> 10 hung threads.
>
> --
> Brian Griffey
> ShopSavvy Android and Big Data Developer
> 650-352-1429

--
-MilleBii-

