Hi, I have Nutch running on a Hadoop cluster. Inject, generate, and fetch are working fine and are executed on multiple nodes. However, we seem to get only one mapper for the parse job, so the parse step runs on a single node and takes a minute or so per page. Please see the log below (1 min 41 s to parse thetimes.co.uk):
2013-02-13 13:46:02,658 INFO org.apache.nutch.parse.ParserJob: Parsing http://www.thetimes.co.uk/tto/news/
2013-02-13 13:47:43,415 INFO org.apache.nutch.parse.ParserJob: Parsing http://online.wsj.com/home-page

I am using the parse-html plugin for parsing, with Cassandra as the storage backend. When running locally everything is fine. I am running parse with:

hadoop jar apache-nutch-2.1-SNAPSHOT.job org.apache.nutch.parse.ParserJob $id

Also including the jobtracker log for Hadoop job_201302131311_0006:

Job Name: parse
Job-ACLs: All users are allowed
Status: Succeeded
Started at: Wed Feb 13 13:44:06 GMT 2013
Finished at: Wed Feb 13 14:06:30 GMT 2013
Finished in: 22mins, 23sec

Counter                                                     Map      Reduce          Total
ParserStatus
  success                                                    13           0             13
  notparsed                                                   1           0              1
Job Counters
  SLOTS_MILLIS_MAPS                                           0           0      1,335,834
  Total time spent by all reduces waiting
    after reserving slots (ms)                                0           0              0
  Total time spent by all maps waiting
    after reserving slots (ms)                                0           0              0
  Launched map tasks                                          0           0              1
  SLOTS_MILLIS_REDUCES                                        0           0              0
File Output Format Counters
  Bytes Written                                               0           0              0
File Input Format Counters
  Bytes Read                                                  0           0              0
FileSystemCounters
  HDFS_BYTES_READ                                           689           0            689
  FILE_BYTES_WRITTEN                                     32,142           0         32,142
Map-Reduce Framework
  Map input records                                         138           0            138
  Physical memory (bytes) snapshot                  417,538,048           0    417,538,048
  Spilled Records                                             0           0              0
  Total committed heap usage (bytes)                186,449,920           0    186,449,920
  CPU time spent (ms)                                 1,379,340           0      1,379,340
  Virtual memory (bytes) snapshot                 1,163,165,696           0  1,163,165,696
  SPLIT_RAW_BYTES                                           689           0            689
  Map output records                                         14           0             14
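Would hinting a higher map-task count help here? A minimal sketch of what I mean, assuming ParserJob is launched through ToolRunner so Hadoop's generic options are parsed before the job's own arguments (the value 4 is just an example; mapred.map.tasks is only a hint to the framework, and for a Gora-backed job the actual split count is decided by the datastore's partitioning):

# Sketch only: pass a map-task hint via Hadoop generic options.
# mapred.map.tasks is a hint, not a guarantee; with Gora/Cassandra
# the input splits ultimately come from the datastore's partitioning.
hadoop jar apache-nutch-2.1-SNAPSHOT.job org.apache.nutch.parse.ParserJob \
    -D mapred.map.tasks=4 \
    $id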

