Thanks for the detailed report. This is a bug. By default, the
filesystem does not permit files to be overwritten. But when a reduce
task is re-executed (because an earlier attempt failed) it needs to
overwrite the data written by that attempt. My guess is that the
initial failure had the same cause: this was not your first attempt to
fetch this segment, so the task was trying to overwrite the output of
the previous attempt. Is that right, or did something else first cause
the reduce task to fail?
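To illustrate the failure mode, here is a minimal sketch that uses java.nio on the local disk as a stand-in for the filesystem create call. It is not the NDFS code; the class and method names (RetryCreateDemo, createNoOverwrite, attempt) are made up for this example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RetryCreateDemo {

    /** Hypothetical stand-in for a create() that refuses to overwrite. */
    static void createNoOverwrite(Path file, byte[] data) throws IOException {
        // CREATE_NEW fails with FileAlreadyExistsException if the file exists.
        try (OutputStream out = Files.newOutputStream(
                file, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            out.write(data);
        }
    }

    /** Returns true if the write succeeded, false if an existing file blocked it. */
    static boolean attempt(Path file, byte[] data) throws IOException {
        try {
            createNoOverwrite(file, data);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("reduce-out");
        Path data = dir.resolve("data");

        // First execution of the task creates the output file.
        System.out.println("first attempt ok: " + attempt(data, new byte[] {1}));
        // A re-execution targets the same path and is rejected.
        System.out.println("re-execution ok:  " + attempt(data, new byte[] {2}));
    }
}
```

With a no-overwrite default, every retry of the same task hits the file left behind by the previous attempt, so the task can never recover on its own.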
I think the fix is to change the filesystem code (both local and NDFS)
so that overwriting is permitted by default. With MapReduce, tasks may
be re-executed, so overwriting is normal. If we wish to prevent
unintentional overwriting, application code should check at startup
that its output files do not already exist.
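Roughly, the split of responsibilities I have in mind looks like the sketch below. It uses java.nio on the local disk only to show the idea and is not the actual Nutch filesystem API; OverwritePolicyDemo, createOverwriting and checkOutputAbsent are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OverwritePolicyDemo {

    /** Filesystem side: create replaces whatever an earlier attempt left behind. */
    static void createOverwriting(Path file, byte[] data) throws IOException {
        Files.createDirectories(file.getParent());
        try (OutputStream out = Files.newOutputStream(
                file,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            out.write(data);
        }
    }

    /** Application side: run once, before the job starts, to catch accidental reuse. */
    static void checkOutputAbsent(Path outputDir) throws IOException {
        if (Files.exists(outputDir)) {
            throw new IOException("Output already exists: " + outputDir);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("job");
        Path outputDir = root.resolve("crawl_fetch");

        // Up-front guard: passes because nothing has been written yet.
        checkOutputAbsent(outputDir);

        Path data = outputDir.resolve("part-00000").resolve("data");
        createOverwriting(data, new byte[] {1}); // first execution of the task
        createOverwriting(data, new byte[] {2}); // re-execution replaces the file

        System.out.println("final byte: " + Files.readAllBytes(data)[0]);
    }
}
```

The guard runs once per job rather than once per task, so re-executed tasks are free to overwrite while a second run against the same output directory is still caught.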
If there are no objections, I will make this change in the mapred branch.
Doug
Gal Nitzan wrote:
Hello,
I'm testing mapred on one machine only.
Everything worked fine from the start until I got the exception in the
reduce task:
Diagnostic Text
java.io.IOException: Cannot create file /user/root/crawl-20050927142856/segments/20050928075732/crawl_fetch/part-00000/data
at org.apache.nutch.ipc.Client.call(Client.java:294)
at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
at $Proxy1.create(Unknown Source)
at org.apache.nutch.ndfs.NDFSClient$NDFSOutputStream.nextBlockOutputStream(NDFSClient.java:574)
at org.apache.nutch.ndfs.NDFSClient$NDFSOutputStream.<init>(NDFSClient.java:549)
at org.apache.nutch.ndfs.NDFSClient.create(NDFSClient.java:83)
at org.apache.nutch.fs.NDFSFileSystem.create(NDFSFileSystem.java:76)
at org.apache.nutch.fs.NDFSFileSystem.create(NDFSFileSystem.java:71)
at org.apache.nutch.io.SequenceFile$Writer.<init>(SequenceFile.java:94)
at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:108)
at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:76)
at org.apache.nutch.crawl.FetcherOutputFormat.getRecordWriter(FetcherOutputFormat.java:48)
at org.apache.nutch.mapred.ReduceTask.run(ReduceTask.java:245)
at org.apache.nutch.mapred.TaskTracker$Child.main(TaskTracker.java:580)
In the jobtracker log:
050928 155253 Server connection on port 8011 from 127.0.0.1: exiting
050928 160814 Server connection on port 8011 from 127.0.0.1: starting
050928 160814 parsing file:/mapred/conf/nutch-default.xml
050928 160814 parsing file:/mapred/conf/mapred-default.xml
050928 160814 parsing /nutch/mapred/local/job_s4isvd.xml
050928 160814 parsing file:/mapred/conf/nutch-site.xml
050928 160814 parsing file:/mapred/conf/nutch-default.xml
050928 160815 parsing file:/mapred/conf/mapred-default.xml
050928 160815 parsing /nutch/mapred/local/job_s4isvd.xml
050928 160815 parsing file:/mapred/conf/nutch-site.xml
050928 160815 Adding task 'task_m_ax7n90' to set for tracker 'tracker_41883'
050928 160821 Task 'task_m_ax7n90' has finished successfully.
050928 160821 Adding task 'task_m_vl2bge' to set for tracker 'tracker_41883'
050928 160827 Task 'task_m_vl2bge' has finished successfully.
050928 160827 Adding task 'task_m_i54kht' to set for tracker 'tracker_41883'
050928 160830 Task 'task_m_i54kht' has finished successfully.
050928 160830 Adding task 'task_m_1eymym' to set for tracker 'tracker_41883'
050928 160833 Task 'task_m_1eymym' has finished successfully.
050928 160833 Adding task 'task_r_w9azpi' to set for tracker 'tracker_41883'
050928 160839 Task 'task_r_w9azpi' has finished successfully.
050928 160839 Server connection on port 8011 from 127.0.0.1: exiting
050928 171406 Task 'task_m_klo24y' has finished successfully.
050928 171406 Adding task 'task_r_x48xa3' to set for tracker 'tracker_41883'
050928 171434 Task 'task_r_x48xa3' has been lost.
050928 171434 Adding task 'task_r_x48xa3' to set for tracker 'tracker_41883'
050928 171501 Task 'task_r_x48xa3' has been lost.
050928 171501 Adding task 'task_r_x48xa3' to set for tracker 'tracker_41883'
050928 171520 Task 'task_r_x48xa3' has been lost.
050928 171520 Adding task 'task_r_x48xa3' to set for tracker 'tracker_41883'
050928 171551 Task 'task_r_x48xa3' has been lost.
050928 171551 Task task_r_x48xa3 has failed 4 times. Aborting owning job job_mtzp7h
050928 171552 Server connection on port 8011 from 127.0.0.1: exiting
In namenode log
050928 171547 Server handler on 8009 call error: java.io.IOException: Cannot create file /user/root/crawl-20050927142856/segments/20050928075732/crawl_fetch/part-00000/data
java.io.IOException: Cannot create file /user/root/crawl-20050927142856/segments/20050928075732/crawl_fetch/part-00000/data
at org.apache.nutch.ndfs.NameNode.create(NameNode.java:98)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:324)
at org.apache.nutch.ipc.RPC$1.call(RPC.java:186)
at org.apache.nutch.ipc.Server$Handler.run(Server.java:198)
In fetch log
050928 171526 reduce 47%
050928 171538 reduce 50%
050928 171551 reduce 100%
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:309)
at org.apache.nutch.crawl.Fetcher.fetch(Fetcher.java:335)
at org.apache.nutch.crawl.Fetcher.main(Fetcher.java:364)
Any idea, anyone?
Thanks, Gal