We're just now moving from a Nutch 0.9 installation to 1.0, so I'm not entirely new to this. However, I can't even get past the first fetch now, due to a Hadoop error.
Looking in the mailing list archives, this error is normally caused by either a permissions problem or a full disk. I overrode the use of /tmp by setting hadoop.tmp.dir to a location with plenty of space, and I'm running the crawl as root, yet I'm still getting the error below. Any thoughts? Running on AIX with plenty of disk and RAM.

2010-04-16 12:49:51,972 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
2010-04-16 12:49:52,267 INFO  fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
2010-04-16 12:49:52,268 INFO  fetcher.Fetcher - -activeThreads=0,
2010-04-16 12:49:52,270 WARN  mapred.LocalJobRunner - job_local_0005
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_local_0005/attempt_local_0005_m_000000_0/output/spill0.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:335)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:930)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:842)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
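For reference, this is roughly how my hadoop.tmp.dir override looks in conf/hadoop-site.xml (the path shown is just a placeholder for the large-disk directory I actually pointed it at):

    <!-- conf/hadoop-site.xml: redirect Hadoop's temp/working files away from /tmp.
         The value below is a placeholder path, not my real directory. -->
    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/path/with/plenty/of/space/hadoop-tmp</value>
      </property>
    </configuration>

My understanding is that in local mode the spill files in the stack trace go under mapred.local.dir, which by default resolves to ${hadoop.tmp.dir}/mapred/local, so I'd expect this override to cover the path the DiskChecker is complaining about.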