Hi Brian,

It would be easier to simply generate a job file and use the script in bin/ to run the tasks. Copying the plugins + jars onto each machine by hand is not practical. The reason we separated the jars+plugins approach from the job file in the 1.3 runtimes was precisely to avoid this kind of conflict.
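For reference, a minimal sketch of that workflow, assuming a Nutch 1.3 source checkout with your plugins under src/plugin and the hadoop command on the PATH (the seed directory, depth and topN values below are just placeholders):

  # builds runtime/local and runtime/deploy; the .job file under
  # runtime/deploy bundles conf, libs and all plugins, so nothing
  # needs to be copied to the individual cluster nodes
  ant runtime

  # bin/nutch picks up the .job file and submits it to Hadoop
  cd runtime/deploy
  bin/nutch crawl urls -dir crawl -depth 3 -topN 1000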
Julien

> I recently downloaded Nutch onto my local machine. I wrote a few plugins
> for it and successfully crawled a few sites to make sure that my parsers and
> indexers worked well. I then moved the Nutch installation onto our
> pre-existing Hadoop cluster by copying the needed libs, confs, and the
> build/plugins dir onto every machine in the Hadoop cluster. I also adjusted
> the nutch-site.xml to point the plugins to the hard-coded path where the
> plugins sit. The Nutch system runs without errors, however it never gets past a
> few pages. It just seems to get stuck only grabbing one page per level and
> gets that page on every pass. I have included the interesting files and system
> logs in the attachment for easy viewing. Anyone have any ideas on why it's
> not going forward? It also just seems to abort threads; any ideas?
>
> 2011-06-03 16:20:51,559 WARN org.apache.nutch.parse.ParserFactory:
> ParserFactory:Plugin: org.apache.nutch.parse.html.HtmlParser mapped to
> contentType application/xhtml+xml via parse-plugins.xml, but its plugin.xml
> file does not claim to support contentType: application/xhtml+xml
> 2011-06-03 16:20:51,629 INFO org.apache.nutch.fetcher.Fetcher:
> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=19
> 2011-06-03 16:20:51,629 WARN org.apache.nutch.fetcher.Fetcher: Aborting with
> 10 hung threads.
>
> --
> Brian Griffey
> ShopSavvy Android and Big Data Developer
> 650-352-1429

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

