Hi Viksit,

It's a known issue now: https://issues.apache.org/jira/browse/NUTCH-1029
Cheers,

On Thursday 12 May 2011 22:10:12 Viksit Gaur wrote:
> Hi all,
>
> When trying to run Nutch's CrawlDb reader to get stats for my crawl
> database, I get the following error when calling it using Hadoop.
>
> Is this a known issue?
>
> Thanks,
> Viksit
>
> sudo -u hdfs hadoop jar /opt/nutch-build/build/nutch-1.2.job \
>   org.apache.nutch.crawl.CrawlDbReader \
>   /crawl/crawl-dir-1305167589/crawldb -stats
>
> 11/05/12 19:48:08 INFO crawl.CrawlDbReader: CrawlDb statistics start: /crawl/crawl-dir-1305167589/crawldb
> 11/05/12 19:48:08 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 11/05/12 19:48:09 INFO mapred.FileInputFormat: Total input paths to process : 10
> 11/05/12 19:48:09 INFO mapred.JobClient: Running job: job_201105120113_0202
> 11/05/12 19:48:10 INFO mapred.JobClient:  map 0% reduce 0%
> 11/05/12 19:48:18 INFO mapred.JobClient:  map 10% reduce 0%
> 11/05/12 19:48:19 INFO mapred.JobClient:  map 20% reduce 0%
> 11/05/12 19:48:20 INFO mapred.JobClient:  map 30% reduce 0%
> 11/05/12 19:48:23 INFO mapred.JobClient:  map 40% reduce 0%
> 11/05/12 19:48:24 INFO mapred.JobClient:  map 50% reduce 0%
> 11/05/12 19:48:25 INFO mapred.JobClient:  map 60% reduce 0%
> 11/05/12 19:48:27 INFO mapred.JobClient:  map 70% reduce 0%
> 11/05/12 19:48:28 INFO mapred.JobClient:  map 80% reduce 0%
> 11/05/12 19:48:30 INFO mapred.JobClient:  map 90% reduce 0%
> 11/05/12 19:48:31 INFO mapred.JobClient:  map 100% reduce 0%
> 11/05/12 19:52:22 INFO mapred.JobClient:  map 100% reduce 3%
> 11/05/12 19:52:23 INFO mapred.JobClient:  map 100% reduce 10%
> 11/05/12 19:52:38 INFO mapred.JobClient:  map 100% reduce 13%
> 11/05/12 19:52:39 INFO mapred.JobClient:  map 100% reduce 20%
> 11/05/12 19:52:48 INFO mapred.JobClient:  map 100% reduce 30%
> 11/05/12 19:53:01 INFO mapred.JobClient:  map 100% reduce 33%
> 11/05/12 19:53:02 INFO mapred.JobClient:  map 100% reduce 40%
> 11/05/12 19:53:20 INFO mapred.JobClient:  map 100% reduce 43%
> 11/05/12 19:53:21 INFO mapred.JobClient:  map 100% reduce 50%
> 11/05/12 19:53:36 INFO mapred.JobClient:  map 100% reduce 53%
> 11/05/12 19:53:38 INFO mapred.JobClient:  map 100% reduce 60%
> 11/05/12 19:53:44 INFO mapred.JobClient:  map 100% reduce 63%
> 11/05/12 19:53:46 INFO mapred.JobClient:  map 100% reduce 70%
> 11/05/12 19:53:54 INFO mapred.JobClient:  map 100% reduce 73%
> 11/05/12 19:53:55 INFO mapred.JobClient:  map 100% reduce 80%
> 11/05/12 19:53:57 INFO mapred.JobClient:  map 100% reduce 90%
> 11/05/12 19:54:05 INFO mapred.JobClient:  map 100% reduce 100%
> 11/05/12 19:54:07 INFO mapred.JobClient: Job complete: job_201105120113_0202
> 11/05/12 19:54:07 INFO mapred.JobClient: Counters: 23
> 11/05/12 19:54:07 INFO mapred.JobClient:   Job Counters
> 11/05/12 19:54:07 INFO mapred.JobClient:     Launched reduce tasks=10
> 11/05/12 19:54:07 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=46180
> 11/05/12 19:54:07 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 11/05/12 19:54:07 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 11/05/12 19:54:07 INFO mapred.JobClient:     Launched map tasks=10
> 11/05/12 19:54:07 INFO mapred.JobClient:     Data-local map tasks=10
> 11/05/12 19:54:07 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=87373
> 11/05/12 19:54:07 INFO mapred.JobClient:   FileSystemCounters
> 11/05/12 19:54:07 INFO mapred.JobClient:     FILE_BYTES_READ=34517
> 11/05/12 19:54:07 INFO mapred.JobClient:     HDFS_BYTES_READ=111602383
> 11/05/12 19:54:07 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1395398
> 11/05/12 19:54:07 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1871
> 11/05/12 19:54:07 INFO mapred.JobClient:   Map-Reduce Framework
> 11/05/12 19:54:07 INFO mapred.JobClient:     Reduce input groups=49
> 11/05/12 19:54:07 INFO mapred.JobClient:     Combine output records=219
> 11/05/12 19:54:07 INFO mapred.JobClient:     Map input records=808925
> 11/05/12 19:54:07 INFO mapred.JobClient:     Reduce shuffle bytes=3161
> 11/05/12 19:54:07 INFO mapred.JobClient:     Reduce output records=49
> 11/05/12 19:54:07 INFO mapred.JobClient:     Spilled Records=657
> 11/05/12 19:54:07 INFO mapred.JobClient:     Map output bytes=42873025
> 11/05/12 19:54:07 INFO mapred.JobClient:     Map input bytes=111599813
> 11/05/12 19:54:07 INFO mapred.JobClient:     Combine input records=3235700
> 11/05/12 19:54:07 INFO mapred.JobClient:     Map output records=3235700
> 11/05/12 19:54:07 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1710
> 11/05/12 19:54:07 INFO mapred.JobClient:     Reduce input records=219
>
> Exception in thread "main" java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at java.io.DataInputStream.readFully(DataInputStream.java:152)
>         at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1465)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1437)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
>         at org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:89)
>         at org.apache.nutch.crawl.CrawlDbReader.processStatJob(CrawlDbReader.java:320)
>         at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:502)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:186)

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
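For anyone hitting the same trace: the top frames show `SequenceFile$Reader.init` calling `DataInputStream.readFully`, which throws `EOFException` when the stream ends before the requested bytes are read, e.g. when the job left an empty part file in the output. This is a minimal, hypothetical illustration (plain JDK, not Nutch or Hadoop code) of that failure mode:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Illustration only: readFully() must fill the whole buffer, so an empty
// input stream (analogous to a zero-length SequenceFile part file) makes
// it throw EOFException instead of returning fewer bytes.
class EofDemo {
    static boolean failsOnEmpty() {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(new byte[0]))) {
            // A SequenceFile reader starts by reading a small fixed-size
            // header; we mimic that with a 4-byte read.
            in.readFully(new byte[4]);
            return false;
        } catch (EOFException e) {
            return true;  // empty input: the buffer can never be filled
        } catch (IOException e) {
            return false; // some other I/O error (not expected here)
        }
    }

    public static void main(String[] args) {
        System.out.println(failsOnEmpty() ? "EOFException" : "ok");
    }
}
```

If that is the cause, checking the job's output directory for zero-length `part-*` files would confirm it before digging further.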

