I know that a bug was reported against that version of trunk (the latest) and a
patch is currently in the works. If that isn't the cause of this, then the fact
that you're getting checksum errors signals that you're missing part of the file
or that corruption has occurred.
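
To illustrate why a checksum error points at a truncated or corrupted file,
here is a minimal sketch. It is not Hadoop's actual checksum code: the class
and method names (ChecksumSketch, crc32Of, matches) are made up for this
example, and it assumes a plain CRC-32 over the whole byte array. The idea is
only that a checksum recorded at write time stops matching once bytes are
missing or changed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Illustrative only: NOT Hadoop's checksum implementation.
// Class and method names are hypothetical.
class ChecksumSketch {

    // Compute a CRC-32 over the whole byte array.
    static long crc32Of(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // True when the data still hashes to the checksum recorded at write time.
    static boolean matches(byte[] data, long recorded) {
        return crc32Of(data) == recorded;
    }

    public static void main(String[] args) {
        byte[] written = "segment data as originally written"
                .getBytes(StandardCharsets.UTF_8);
        long recorded = crc32Of(written);

        // Simulate a file that lost its second half.
        byte[] truncated = Arrays.copyOf(written, written.length / 2);

        // Simulate a single flipped bit in the first byte.
        byte[] corrupted = written.clone();
        corrupted[0] ^= 0x01;

        System.out.println("intact:    " + matches(written, recorded));
        System.out.println("truncated: " + matches(truncated, recorded));
        System.out.println("corrupted: " + matches(corrupted, recorded));
    }
}
```

If the real files fail a check like this, re-running the step that produced
them is probably a safer bet than trying to index what is left.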


----- Original Message ----
From: Lukas Vlcek <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, January 10, 2007 11:29:30 AM
Subject: nutch-0.9 trunk is failing in Indexer


Hi,

I am using Nutch trunk version (493556) and it is failing in Indexer.

java.io.IOException: Not a file: /nutch/nutchcrawl/segments/20070110171621/crawl_fetch/part-00000/data
        at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:125)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:93)
Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:297)
        at org.apache.nutch.indexer.Indexer.run(Indexer.java:319)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
        at org.apache.nutch.indexer.Indexer.main(Indexer.java:302)

Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:297)
        at org.apache.nutch.indexer.Indexer.run(Indexer.java:319)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
        at org.apache.nutch.indexer.Indexer.main(Indexer.java:302)

I also noticed that there were some issues during parsing (which runs prior
to indexing). The following is what I got when I enabled finer logging:

Moving bad file
/nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/data to
/bad_files/data.1751375967
PhasedFileSystem failed to commit file :
/nutch/nutchcrawl/mergesegs_dir/20070110171621/content/part-00000/data error
: Checksum error:
/nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/data
at 0
map 100% reduce 0%
Moving bad file
/nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/part-00000
to /bad_files/part-00000.377330604
PhasedFileSystem failed to commit file :
/nutch/nutchcrawl/mergesegs_dir/20070110171621/crawl_parse/part-00000
error : Checksum error:
/nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/part-00000
at 0
PhasedFileSystem failed to commit file :
/nutch/nutchcrawl/mergesegs_dir/20070110171621/content/part-00000/index
error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/index:
No such file or directory
PhasedFileSystem failed to commit file :
/nutch/nutchcrawl/mergesegs_dir/20070110171621/parse_data/part-00000/index
error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/index:
No such file or directory
PhasedFileSystem failed to commit file :
/nutch/nutchcrawl/mergesegs_dir/20070110171621/crawl_fetch/part-00000/data
error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/data:
No such file or directory
PhasedFileSystem failed to commit file :
/nutch/nutchcrawl/mergesegs_dir/20070110171621/parse_text/part-00000/index
error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/index:
No such file or directory
PhasedFileSystem failed to commit file :
/nutch/nutchcrawl/mergesegs_dir/20070110171621/parse_text/part-00000/data
error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/data:
No such file or directory
PhasedFileSystem failed to commit file :
/nutch/nutchcrawl/mergesegs_dir/20070110171621/parse_data/part-00000/data
error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/data:
No such file or directory
PhasedFileSystem failed to commit file :
/nutch/nutchcrawl/mergesegs_dir/20070110171621/crawl_generate/part-00000
error : 
/nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/part-00000:
No such file or directory

I am not sure if this can cause the IOException described above. Does
anybody know what I did incorrectly?

Regards,
Lukas
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
