You can mix in a combination of Pipeline.run and Pipeline.cleanup calls to control job execution and cleanup.
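[Editor's sketch, not from the original thread: one way the run/cleanup interleaving suggested above might look, assuming Crunch's MRPipeline and a Pipeline.cleanup(boolean force) method; the class name and paths are hypothetical.]

    import org.apache.crunch.PCollection;
    import org.apache.crunch.Pipeline;
    import org.apache.crunch.impl.mr.MRPipeline;

    public class StagedPipeline {
      public static void main(String[] args) {
        Pipeline pipeline = new MRPipeline(StagedPipeline.class);

        // Stage 1: plan some work and execute it now.
        PCollection<String> stage1 = pipeline.readTextFile("/data/input");      // hypothetical path
        pipeline.writeTextFile(stage1, "/data/stage1-out");                     // hypothetical path
        pipeline.run();

        // Stage 2: reads stage 1's committed output rather than Crunch temp files.
        PCollection<String> stage2 = pipeline.readTextFile("/data/stage1-out");
        pipeline.writeTextFile(stage2, "/data/stage2-out");
        pipeline.run();

        // Clean up the pipeline's temporary /tmp/crunch-* artifacts once no
        // remaining stage needs them (cleanup(boolean) signature assumed);
        // pipeline.done() would run any remaining work and then clean up.
        pipeline.cleanup(false);
      }
    }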
On Sat, Sep 26, 2015 at 1:48 PM Everett Anderson <[email protected]> wrote:

> On Thu, Sep 24, 2015 at 5:46 PM, Josh Wills <[email protected]> wrote:
>
>> Hrm. If you never call Pipeline.done, you should never cleanup the
>> temporary files for the job...
>
> Interesting.
>
> We're currently exploring giving the datanodes more memory, as there's some
> evidence they were getting overloaded.
>
> Right now, our Crunch pipeline is long, with many stages, but not all data
> is used in each stage. If our problem is that we're overloading some part
> of HDFS (and in other cluster configs we have seen ourselves hit our disk
> capacity cap), I wonder if it'd help if we DID somehow prune away temporary
> outputs that were no longer necessary.
>
>> On Thu, Sep 24, 2015 at 5:44 PM, Everett Anderson <[email protected]> wrote:
>>
>>> While we tried to take comfort in the fact that we'd only seen this on
>>> HD-based cc2.8xlarges, I'm afraid we're now seeing it when processing
>>> larger amounts of data on SSD-based c3 instances.
>>>
>>> My two hypotheses are:
>>>
>>> 1) Somehow these temp files are getting cleaned up before they're
>>> accessed for the last time. Perhaps either something in HDFS or Hadoop
>>> cleans up these temp directories, or perhaps there's a bug in Crunch's
>>> planner.
>>>
>>> 2) HDFS has chosen 3 machines to replicate data to, but it is performing
>>> a very lopsided replication. While the cluster overall looks like it has
>>> HDFS capacity, perhaps a small subset of the machines is actually at
>>> capacity. Things seem to fail in obscure ways when running out of disk.
>>>
>>> 2015-09-24 23:28:58,850 WARN [main] org.apache.hadoop.mapred.YarnChild: >>> Exception running child : org.apache.crunch.CrunchRuntimeException: Could >>> not read runtime node information >>> at >>> org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:48) >>> at >>> org.apache.crunch.impl.mr.run.CrunchReducer.setup(CrunchReducer.java:40) >>> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:172) >>> at >>> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656) >>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394) >>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175) >>> at java.security.AccessController.doPrivileged(Native Method) >>> at javax.security.auth.Subject.doAs(Subject.java:415) >>> at >>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) >>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170) >>> Caused by: java.io.FileNotFoundException: File does not exist: >>> /tmp/crunch-2031291770/p567/REDUCE >>> at >>> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65) >>> at >>> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1726) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1669) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1649) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1621) >>> at >>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:497) >>> at >>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322) >>> at >>>
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) >>> at >>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:599) >>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) >>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) >>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) >>> at java.security.AccessController.doPrivileged(Native Method) >>> at javax.security.auth.Subject.doAs(Subject.java:415) >>> at >>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) >>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) >>> >>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) >>> at >>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) >>> at >>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) >>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526) >>> at >>> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) >>> at >>> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) >>> at >>> org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1147) >>> at >>> org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1135) >>> at >>> org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1125) >>> at >>> org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:273) >>> at >>> org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:240) >>> at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:233) >>> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1298) >>> at >>> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:300) >>> at >>> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:296) >>> at >>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) >>> at >>> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:296) >>> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:768) >>> at org.apache.crunch.util.DistCache.read(DistCache.java:72) >>> at >>> org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:46) >>> ... 
9 more >>> Caused by: >>> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File >>> does not exist: /tmp/crunch-2031291770/p567/REDUCE >>> at >>> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65) >>> at >>> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1726) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1669) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1649) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1621) >>> at >>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:497) >>> at >>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322) >>> at >>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) >>> at >>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:599) >>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) >>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) >>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) >>> at java.security.AccessController.doPrivileged(Native Method) >>> at javax.security.auth.Subject.doAs(Subject.java:415) >>> at >>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) >>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) >>> >>> at org.apache.hadoop.ipc.Client.call(Client.java:1410) >>> at org.apache.hadoop.ipc.Client.call(Client.java:1363) >>> at >>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:215) >>> at com.sun.proxy.$Proxy13.getBlockLocations(Unknown Source) >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>> at >>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>> at >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>> at java.lang.reflect.Method.invoke(Method.java:606) >>> at >>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) >>> at >>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) >>> at com.sun.proxy.$Proxy13.getBlockLocations(Unknown Source) >>> at >>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:219) >>> at >>> org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1145) >>> ... 22 more >>> >>> >>> On Fri, Aug 21, 2015 at 3:52 PM, Jeff Quinn <[email protected]> wrote: >>> >>>> Also worth noting, we inspected the hadoop configuration defaults that >>>> the AWS EMR service populates for the two different instance types, for >>>> mapred-site.xml, core-site.xml, and hdfs-site.xml all settings were >>>> identical, with the exception of slight differences in JVM memory allotted. >>>> Further investigated the max number of file descriptors for each instance >>>> type via ulimit, and saw no differences there either. 
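[Editor's sketch, not from the thread: the config comparison described above can also be done by dumping each cluster's effective Hadoop settings from Java and diffing the two outputs; the class name is hypothetical, and JobConf/HdfsConfiguration are used only to pull in mapred-site.xml and hdfs-site.xml alongside core-site.xml.]

    import java.util.Map;
    import java.util.TreeMap;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.HdfsConfiguration;
    import org.apache.hadoop.mapred.JobConf;

    public class DumpEffectiveConf {
      public static void main(String[] args) {
        // JobConf layers mapred-site.xml on top of HdfsConfiguration (hdfs-site.xml)
        // and the base Configuration (core-site.xml) from the cluster's classpath.
        Configuration conf = new JobConf(new HdfsConfiguration());
        Map<String, String> sorted = new TreeMap<>();
        for (Map.Entry<String, String> entry : conf) {
          sorted.put(entry.getKey(), entry.getValue());
        }
        // Print sorted key=value pairs; run on both clusters and diff the results.
        sorted.forEach((key, value) -> System.out.println(key + "=" + value));
      }
    }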
>>>> So not sure what the main difference is between these two clusters that
>>>> would cause these very different outcomes, other than cc2.8xlarge having
>>>> spinning disks and c3.8xlarge having SSDs.
>>>>
>>>> On Fri, Aug 21, 2015 at 1:03 PM, Everett Anderson <[email protected]> wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> Jeff graciously agreed to try it out.
>>>>>
>>>>> I'm afraid we're still getting failures on that instance type, though
>>>>> with 0.11 plus the patches, the cluster ended up in a state where no new
>>>>> applications could be submitted afterwards.
>>>>>
>>>>> The errors when running the pipeline seem to be similarly HDFS-related.
>>>>> It's quite odd.
>>>>>
>>>>> Examples when using 0.11 + the patches:
>>>>>
>>>>> 2015-08-20 23:17:50,455 WARN [Thread-38] >>>>> org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source >>>>> file >>>>> "/tmp/crunch-274499863/p504/output/_temporary/1/_temporary/attempt_1440102643297_out0_0107_r_000001_0/out0-r-00001" >>>>> - Aborting... >>>>> >>>>> 2015-08-20 22:39:42,184 WARN [Thread-51] >>>>> org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception >>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): >>>>> No lease on >>>>> /tmp/crunch-274499863/p510/output/_temporary/1/_temporary/attempt_1440102643297_out12_0103_r_000167_2/out12-r-00167 >>>>> (inode 83784): File does not exist. [Lease. Holder: >>>>> DFSClient_attempt_1440102643297_0103_r_000167_2_964529009_1, >>>>> pendingcreates: 24] >>>>> at >>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3516) >>>>> at >>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.abandonBlock(FSNamesystem.java:3486) >>>>> at >>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.abandonBlock(NameNodeRpcServer.java:687) >>>>> at >>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.abandonBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:467) >>>>> at >>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) >>>>> at >>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:635) >>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) >>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) >>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) >>>>> at java.security.AccessController.doPrivileged(Native Method) >>>>> at javax.security.auth.Subject.doAs(Subject.java:415) >>>>> at >>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) >>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) >>>>> >>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1468) >>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1399) >>>>> at >>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:241) >>>>> at com.sun.proxy.$Proxy13.abandonBlock(Unknown Source) >>>>> at >>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.abandonBlock(ClientNamenodeProtocolTranslatorPB.java:376) >>>>> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) >>>>> at >>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>>>> at java.lang.reflect.Method.invoke(Method.java:606) >>>>> at >>>>>
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) >>>>> at >>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) >>>>> at com.sun.proxy.$Proxy14.abandonBlock(Unknown Source) >>>>> at >>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1377) >>>>> at >>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594) >>>>> 2015-08-20 22:39:42,184 WARN [Thread-51] >>>>> org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source >>>>> file >>>>> "/tmp/crunch-274499863/p510/output/_temporary/1/_temporary/attempt_1440102643297_out12_0103_r_000167_2/out12-r-00167" >>>>> - Aborting... >>>>> >>>>> >>>>> >>>>> 2015-08-20 23:34:59,276 INFO [Thread-37] >>>>> org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream >>>>> java.io.IOException: Bad connect ack with firstBadLink as >>>>> 10.55.1.103:50010 >>>>> at >>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1472) >>>>> at >>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1373) >>>>> at >>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594) >>>>> 2015-08-20 23:34:59,276 INFO [Thread-37] >>>>> org.apache.hadoop.hdfs.DFSClient: Abandoning >>>>> BP-835517662-10.55.1.32-1440102626965:blk_1073828261_95268 >>>>> 2015-08-20 23:34:59,278 INFO [Thread-37] >>>>> org.apache.hadoop.hdfs.DFSClient: Excluding datanode 10.55.1.103:50010 >>>>> 2015-08-20 23:34:59,278 WARN [Thread-37] >>>>> org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception >>>>> java.io.IOException: Unable to create new block. >>>>> at >>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1386) >>>>> at >>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594) >>>>> 2015-08-20 23:34:59,278 WARN [Thread-37] >>>>> org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source >>>>> file >>>>> "/tmp/crunch-274499863/p504/output/_temporary/1/_temporary/attempt_1440102643297_out0_0107_r_000001_2/out0-r-00001" >>>>> - Aborting... 
>>>>> 2015-08-20 23:34:59,279 WARN [main] >>>>> org.apache.hadoop.mapred.YarnChild: Exception running child : >>>>> org.apache.crunch.CrunchRuntimeException: java.io.IOException: Bad connect >>>>> ack with firstBadLink as 10.55.1.103:50010 >>>>> at >>>>> org.apache.crunch.impl.mr.run.CrunchTaskContext.cleanup(CrunchTaskContext.java:74) >>>>> at >>>>> org.apache.crunch.impl.mr.run.CrunchReducer.cleanup(CrunchReducer.java:64) >>>>> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:195) >>>>> at >>>>> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656) >>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394) >>>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171) >>>>> at java.security.AccessController.doPrivileged(Native Method) >>>>> at javax.security.auth.Subject.doAs(Subject.java:415) >>>>> at >>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) >>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166) >>>>> Caused by: java.io.IOException: Bad connect ack with firstBadLink as >>>>> 10.55.1.103:50010 >>>>> at >>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1472) >>>>> at >>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1373) >>>>> at >>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594)
>>>>>
>>>>> On Fri, Aug 21, 2015 at 11:59 AM, Josh Wills <[email protected]> wrote:
>>>>>
>>>>>> Curious how this went. :)
>>>>>>
>>>>>> On Tue, Aug 18, 2015 at 4:26 PM, Everett Anderson <[email protected]> wrote:
>>>>>>
>>>>>>> Sure, let me give it a try. I'm going to take 0.11 and patch it with
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/CRUNCH-553
>>>>>>> https://issues.apache.org/jira/browse/CRUNCH-517
>>>>>>>
>>>>>>> as we also rely on 517.
>>>>>>>
>>>>>>> On Tue, Aug 18, 2015 at 4:09 PM, Josh Wills <[email protected]> wrote:
>>>>>>>
>>>>>>>> (In particular, I'm wondering if something in CRUNCH-481 is related
>>>>>>>> to this problem.)
>>>>>>>>
>>>>>>>> On Tue, Aug 18, 2015 at 4:07 PM, Josh Wills <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hey Everett,
>>>>>>>>>
>>>>>>>>> Shot in the dark-- would you mind trying it w/0.11.0-hadoop2 w/the
>>>>>>>>> 553 patch? Is that easy to do?
>>>>>>>>>
>>>>>>>>> J
>>>>>>>>>
>>>>>>>>> On Tue, Aug 18, 2015 at 3:18 PM, Everett Anderson <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I verified that the pipeline succeeds on the same cc2.8xlarge
>>>>>>>>>> hardware when setting crunch.max.running.jobs to 1. I generally
>>>>>>>>>> feel like the pipeline application logic itself is sound at this
>>>>>>>>>> point. It could be that this is just taxing these machines too hard
>>>>>>>>>> and we need to increase the number of retries?
>>>>>>>>>>
>>>>>>>>>> It reliably fails on this hardware when crunch.max.running.jobs is
>>>>>>>>>> set to its default.
>>>>>>>>>>
>>>>>>>>>> Can you explain a little what the /tmp/crunch-XXXXXXX files are, as
>>>>>>>>>> well as how Crunch uses side effect files? Do you know if HDFS
>>>>>>>>>> would clean up those directories from underneath Crunch?
>>>>>>>>>>
>>>>>>>>>> There are usually 4 failed applications, failing due to reduces.
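[Editor's sketch, not from the thread: one way the crunch.max.running.jobs experiment described above could be wired up, assuming the property is read from the Hadoop Configuration handed to MRPipeline; the class name and paths are hypothetical.]

    import org.apache.crunch.PCollection;
    import org.apache.crunch.Pipeline;
    import org.apache.crunch.impl.mr.MRPipeline;
    import org.apache.hadoop.conf.Configuration;

    public class SerialJobsRun {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Run only one of the planned MapReduce jobs at a time so a failure is
        // easier to attribute; the default allows several to run concurrently.
        conf.setInt("crunch.max.running.jobs", 1);

        Pipeline pipeline = new MRPipeline(SerialJobsRun.class, conf);
        PCollection<String> lines = pipeline.readTextFile("/data/input");   // hypothetical path
        pipeline.writeTextFile(lines, "/data/output");                      // hypothetical path
        pipeline.done();
      }
    }

If the job driver goes through ToolRunner, passing -Dcrunch.max.running.jobs=1 on the command line should have the same effect.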
>>>>>>>>>> The failures seem to be one of the following three kinds -- (1) No >>>>>>>>>> lease on >>>>>>>>>> <side effect file>, (2) File not found </tmp/crunch-XXXXXXX> file, >>>>>>>>>> (3) >>>>>>>>>> SocketTimeoutException. >>>>>>>>>> >>>>>>>>>> Examples: >>>>>>>>>> >>>>>>>>>> [1] No lease exception >>>>>>>>>> >>>>>>>>>> Error: org.apache.crunch.CrunchRuntimeException: >>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): >>>>>>>>>> No lease on >>>>>>>>>> /tmp/crunch-4694113/p662/output/_temporary/1/_temporary/attempt_1439917295505_out7_0018_r_000003_1/out7-r-00003: >>>>>>>>>> File does not exist. Holder >>>>>>>>>> DFSClient_attempt_1439917295505_0018_r_000003_1_824053899_1 does not >>>>>>>>>> have >>>>>>>>>> any open files. at >>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2944) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3008) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2988) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:641) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:484) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:599) >>>>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at >>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at >>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at >>>>>>>>>> java.security.AccessController.doPrivileged(Native Method) at >>>>>>>>>> javax.security.auth.Subject.doAs(Subject.java:415) at >>>>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) >>>>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at >>>>>>>>>> org.apache.crunch.impl.mr.run.CrunchTaskContext.cleanup(CrunchTaskContext.java:74) >>>>>>>>>> at >>>>>>>>>> org.apache.crunch.impl.mr.run.CrunchReducer.cleanup(CrunchReducer.java:64) >>>>>>>>>> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:195) at >>>>>>>>>> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394) at >>>>>>>>>> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175) at >>>>>>>>>> java.security.AccessController.doPrivileged(Native Method) at >>>>>>>>>> javax.security.auth.Subject.doAs(Subject.java:415) at >>>>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) >>>>>>>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170) >>>>>>>>>> Caused by: >>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): >>>>>>>>>> No lease on >>>>>>>>>> /tmp/crunch-4694113/p662/output/_temporary/1/_temporary/attempt_1439917295505_out7_0018_r_000003_1/out7-r-00003: >>>>>>>>>> File does not exist. Holder >>>>>>>>>> DFSClient_attempt_1439917295505_0018_r_000003_1_824053899_1 does not >>>>>>>>>> have >>>>>>>>>> any open files. 
at >>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2944) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3008) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2988) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:641) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:484) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:599) >>>>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at >>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at >>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at >>>>>>>>>> java.security.AccessController.doPrivileged(Native Method) at >>>>>>>>>> javax.security.auth.Subject.doAs(Subject.java:415) at >>>>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) >>>>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at >>>>>>>>>> org.apache.hadoop.ipc.Client.call(Client.java:1410) at >>>>>>>>>> org.apache.hadoop.ipc.Client.call(Client.java:1363) at >>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:215) >>>>>>>>>> at com.sun.proxy.$Proxy13.complete(Unknown Source) at >>>>>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at >>>>>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>>>>>>>>> at >>>>>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>>>>>>>>> at java.lang.reflect.Method.invoke(Method.java:606) at >>>>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) >>>>>>>>>> at com.sun.proxy.$Proxy13.complete(Unknown Source) at >>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:404) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2130) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2114) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1289) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:87) >>>>>>>>>> at >>>>>>>>>> org.apache.crunch.io.CrunchOutputs$OutputState.close(CrunchOutputs.java:300) >>>>>>>>>> at org.apache.crunch.io.CrunchOutputs.close(CrunchOutputs.java:180) >>>>>>>>>> at >>>>>>>>>> org.apache.crunch.impl.mr.run.CrunchTaskContext.cleanup(CrunchTaskContext.java:72) >>>>>>>>>> ... 
9 more >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [2] File does not exist >>>>>>>>>> >>>>>>>>>> 2015-08-18 17:36:10,195 INFO [AsyncDispatcher event handler] >>>>>>>>>> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: >>>>>>>>>> Diagnostics report from attempt_1439917295505_0034_r_000004_1: >>>>>>>>>> Error: org.apache.crunch.CrunchRuntimeException: Could not read >>>>>>>>>> runtime node information >>>>>>>>>> at >>>>>>>>>> org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:48) >>>>>>>>>> at >>>>>>>>>> org.apache.crunch.impl.mr.run.CrunchReducer.setup(CrunchReducer.java:40) >>>>>>>>>> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:172) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656) >>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394) >>>>>>>>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175) >>>>>>>>>> at java.security.AccessController.doPrivileged(Native Method) >>>>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) >>>>>>>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170) >>>>>>>>>> Caused by: java.io.FileNotFoundException: File does not exist: >>>>>>>>>> /tmp/crunch-4694113/p470/REDUCE >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1726) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1669) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1649) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1621) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:497) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:599) >>>>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) >>>>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) >>>>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) >>>>>>>>>> at java.security.AccessController.doPrivileged(Native Method) >>>>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) >>>>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) >>>>>>>>>> >>>>>>>>>> at >>>>>>>>>> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) >>>>>>>>>> at >>>>>>>>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) >>>>>>>>>> at >>>>>>>>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
>>>>>>>>>> at >>>>>>>>>> java.lang.reflect.Constructor.newInstance(Constructor.java:526) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1147) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1135) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1125) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:273) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:240) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:233) >>>>>>>>>> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1298) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:300) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:296) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:296) >>>>>>>>>> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:768) >>>>>>>>>> at org.apache.crunch.util.DistCache.read(DistCache.java:72) >>>>>>>>>> at >>>>>>>>>> org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:46) >>>>>>>>>> ... 9 more >>>>>>>>>> >>>>>>>>>> [3] SocketTimeoutException >>>>>>>>>> >>>>>>>>>> Error: org.apache.crunch.CrunchRuntimeException: >>>>>>>>>> java.net.SocketTimeoutException: 70000 millis timeout while waiting >>>>>>>>>> for channel to be ready for read. ch : >>>>>>>>>> java.nio.channels.SocketChannel[connected local=/10.55.1.229:35720 >>>>>>>>>> remote=/10.55.1.230:9200] at >>>>>>>>>> org.apache.crunch.impl.mr.run.CrunchTaskContext.cleanup(CrunchTaskContext.java:74) >>>>>>>>>> at >>>>>>>>>> org.apache.crunch.impl.mr.run.CrunchReducer.cleanup(CrunchReducer.java:64) >>>>>>>>>> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:195) at >>>>>>>>>> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656) >>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394) at >>>>>>>>>> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175) at >>>>>>>>>> java.security.AccessController.doPrivileged(Native Method) at >>>>>>>>>> javax.security.auth.Subject.doAs(Subject.java:415) at >>>>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) >>>>>>>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170) >>>>>>>>>> Caused by: java.net.SocketTimeoutException: 70000 millis timeout >>>>>>>>>> while waiting for channel to be ready for read. 
ch : >>>>>>>>>> java.nio.channels.SocketChannel[connected local=/10.55.1.229:35720 >>>>>>>>>> remote=/10.55.1.230:9200] at >>>>>>>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118) >>>>>>>>>> at java.io.FilterInputStream.read(FilterInputStream.java:83) at >>>>>>>>>> java.io.FilterInputStream.read(FilterInputStream.java:83) at >>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1985) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:1075) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1042) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1186) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:935) >>>>>>>>>> at >>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:491) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Aug 14, 2015 at 3:54 PM, Everett Anderson < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Aug 14, 2015 at 3:26 PM, Josh Wills <[email protected] >>>>>>>>>>> > wrote: >>>>>>>>>>> >>>>>>>>>>>> Hey Everett, >>>>>>>>>>>> >>>>>>>>>>>> Initial thought-- there are lots of reasons for lease expired >>>>>>>>>>>> exceptions, and their usually more symptomatic of other problems >>>>>>>>>>>> in the >>>>>>>>>>>> pipeline. Are you sure none of the jobs in the Crunch pipeline on >>>>>>>>>>>> the >>>>>>>>>>>> non-SSD instances are failing for some other reason? I'd be >>>>>>>>>>>> surprised if no >>>>>>>>>>>> other errors showed up in the app master, although there are >>>>>>>>>>>> reports of >>>>>>>>>>>> some weirdness around LeaseExpireds when writing to S3-- but >>>>>>>>>>>> you're not >>>>>>>>>>>> doing that here, right? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> We're reading from and writing to HDFS, here. (We've copied in >>>>>>>>>>> input from S3 to HDFS in another step.) >>>>>>>>>>> >>>>>>>>>>> There are a few exceptions in the logs. Most seem related to >>>>>>>>>>> missing temp files. >>>>>>>>>>> >>>>>>>>>>> Let me see if I can reproduce it with crunch.max.running.jobs >>>>>>>>>>> set to 1 to try to narrow down the originating failure. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> J >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Aug 14, 2015 at 2:10 PM, Everett Anderson < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I recently started trying to run our Crunch pipeline on more >>>>>>>>>>>>> data and have been trying out different AWS instance types in >>>>>>>>>>>>> anticipation >>>>>>>>>>>>> of our storage and compute needs. >>>>>>>>>>>>> >>>>>>>>>>>>> I was using EMR 3.8 (so Hadoop 2.4.0) with Crunch 0.12 >>>>>>>>>>>>> (patched with the CRUNCH-553 >>>>>>>>>>>>> <https://issues.apache.org/jira/browse/CRUNCH-553> fix). 
>>>>>>>>>>>>> Our pipeline finishes fine in these cluster configurations:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - 50 c3.4xlarge Core, 0 Task
>>>>>>>>>>>>> - 10 c3.8xlarge Core, 0 Task
>>>>>>>>>>>>> - 25 c3.8xlarge Core, 0 Task
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, it always fails on the same data when using 10
>>>>>>>>>>>>> cc2.8xlarge Core instances.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The biggest obvious hardware difference is that the cc2.8xlarges
>>>>>>>>>>>>> use hard disks instead of SSDs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> While it's a little hard to track down the exact originating
>>>>>>>>>>>>> failure, I think it's from errors like:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-08-13 21:34:38,379 ERROR [IPC Server handler 24 on 45711] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1439499407003_0028_r_000153_1 - exited : org.apache.crunch.CrunchRuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/crunch-970849245/p662/output/_temporary/1/_temporary/attempt_1439499407003_out7_0028_r_000153_1/out7-r-00153: File does not exist. Holder DFSClient_attempt_1439499407003_0028_r_000153_1_609888542_1 does not have any open files.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Those paths look like these side effect files
>>>>>>>>>>>>> <https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapred/FileOutputFormat.html#getWorkOutputPath(org.apache.hadoop.mapred.JobConf)>.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Would Crunch have generated applications that depend on side
>>>>>>>>>>>>> effect paths as input across MapReduce applications, and could
>>>>>>>>>>>>> something in HDFS be cleaning up those paths, unaware of the
>>>>>>>>>>>>> higher-level dependencies? AWS configures Hadoop differently for
>>>>>>>>>>>>> each instance type, and might have more aggressive cleanup
>>>>>>>>>>>>> settings on HDs, though this is a very uninformed hypothesis.
>>>>>>>>>>>>>
>>>>>>>>>>>>> A sample full log is attached.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for any guidance!
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Everett
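[Editor's note, not part of the thread: the "side effect files" linked above are extra files a task writes under its attempt's work directory; the output committer promotes them along with the attempt's regular output on commit and discards them if the attempt fails. Below is a minimal sketch using the classic mapred API from the linked javadoc; the file name, payload, and helper class are hypothetical, and it illustrates the Hadoop mechanism rather than what Crunch itself does internally.]

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class SideEffectFileExample {
      // Writes an extra ("side effect") file under the task attempt's work
      // directory, e.g. <output>/_temporary/1/_temporary/attempt_xxx/extra-stats.
      // It only becomes visible under the job's output directory if the
      // attempt commits; failed or killed attempts are cleaned up instead.
      public static void writeSideEffectFile(JobConf conf, String suffix) throws IOException {
        Path workDir = FileOutputFormat.getWorkOutputPath(conf);        // per-attempt temp dir
        Path sideEffect = new Path(workDir, "extra-stats-" + suffix);   // hypothetical file name
        FileSystem fs = sideEffect.getFileSystem(conf);
        try (FSDataOutputStream out = fs.create(sideEffect)) {
          out.writeBytes("records-seen=12345\n");                       // placeholder payload
        }
      }
    }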
