[ https://issues.apache.org/jira/browse/NUTCH-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-1315:
----------------------------------------

    Fix Version/s: 1.7

> reduce speculation on but ParseOutputFormat doesn't name output files correctly?
> --------------------------------------------------------------------------------
>
>                 Key: NUTCH-1315
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1315
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.4
>         Environment: ubuntu 64bit, hadoop 1.0.1, 3 Node Cluster, segment size 1.5M urls
>            Reporter: Rafael
>              Labels: hadoop, hdfs
>             Fix For: 1.7
>
>
> From time to time the Reducer log contains the following and one tasktracker
> gets blacklisted.
>
> org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /user/test/crawl/segments/20120316065507/parse_text/part-00001/data for DFSClient_attempt_201203151054_0028_r_000001_1 on client xx.x.xx.xx.10, because this file is already being created by DFSClient_attempt_201203151054_0028_r_000001_0 on xx.xx.xx.9
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1404)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1244)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1186)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:628)
>         at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1066)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>         at $Proxy2.create(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy2.create(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3245)
>         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:713)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:182)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:555)
>         at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.<init>(SequenceFile.java:1132)
>         at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:397)
>         at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:354)
>         at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:476)
>         at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:157)
>         at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:134)
>         at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:92)
>         at org.apache.nutch.parse.ParseOutputFormat.getRecordWriter(ParseOutputFormat.java:110)
>         at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:448)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:490)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
> I asked the hdfs-user mailing list and I got the following answer:
>
> "Looks like you have reduce speculation turned on, but the
> ParseOutputFormat you're using doesn't properly name its output files
> distinctly based on the task attempt ID. As a workaround you can
> probably turn off speculative execution for reduces, but you should
> also probably file a Nutch bug."
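
To illustrate the mailing-list diagnosis: with reduce speculation on, two attempts of the same reduce task (here `..._r_000001_0` and `..._r_000001_1`) run concurrently, and if both open the same final path the second creator is rejected. The sketch below is not the Nutch fix and does not use the Hadoop API. It only shows, with plain `java.nio.file`, the general idea of giving each attempt its own path and promoting one attempt's output at commit time. The class `AttemptScopedOutput`, its methods, and the `_temporary/<attemptId>/` layout are assumptions made for illustration, and a single file stands in for the `part-00001` MapFile directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only: not Nutch or Hadoop code.
public class AttemptScopedOutput {

    // Build a path that is unique per task attempt (assumed layout).
    static Path attemptPath(Path outputDir, String attemptId, String partName) {
        return outputDir.resolve("_temporary").resolve(attemptId).resolve(partName);
    }

    // Write one attempt's output. CREATE_NEW fails if the file already exists,
    // loosely mimicking a filesystem that refuses a second creator of one path.
    static Path writeAttempt(Path outputDir, String attemptId, String partName,
                             String payload) throws IOException {
        Path target = attemptPath(outputDir, attemptId, partName);
        Files.createDirectories(target.getParent());
        Files.write(target, payload.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE_NEW);
        return target;
    }

    // Promote an attempt's file to the final name. Returns false and discards
    // the attempt's file if another attempt has already been committed.
    static boolean commit(Path outputDir, String attemptId, String partName)
            throws IOException {
        Path source = attemptPath(outputDir, attemptId, partName);
        Path dest = outputDir.resolve(partName);
        try {
            Files.move(source, dest); // no REPLACE_EXISTING: first commit wins
            return true;
        } catch (FileAlreadyExistsException e) {
            Files.deleteIfExists(source);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("parse_text");
        String first = "attempt_201203151054_0028_r_000001_0";
        String second = "attempt_201203151054_0028_r_000001_1";

        // Both speculative attempts can write at the same time without clashing.
        writeAttempt(out, first, "part-00001", "data");
        writeAttempt(out, second, "part-00001", "data");

        System.out.println("first committed:  " + commit(out, first, "part-00001"));
        System.out.println("second committed: " + commit(out, second, "part-00001"));
    }
}
```

The suggested workaround is a separate, configuration-only change: in Hadoop 1.x, reduce speculation is controlled by the `mapred.reduce.tasks.speculative.execution` property, which can be set to `false` for the parse job.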