[jira] [Updated] (MAPREDUCE-7076) TestNNBench#testNNBenchCreateReadAndDelete failing in our internal build
[ https://issues.apache.org/jira/browse/MAPREDUCE-7076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated MAPREDUCE-7076:
--
Description:
TestNNBench#testNNBenchCreateReadAndDelete failed a couple of times in our internal Jenkins build.
{noformat}
java.lang.AssertionError: create_write should create the file
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.apache.hadoop.hdfs.TestNNBench.testNNBenchCreateReadAndDelete(TestNNBench.java:55)
{noformat}
Below is my analysis of why it didn't create the file.
{code:java|title=NNBench.java|borderStyle=solid}
public void map(Text key,
                LongWritable value,
                OutputCollector<Text, Text> output,
                Reporter reporter) throws IOException {
  if (barrier()) {
    String fileName = "file_" + value;
    if (op.equals(OP_CREATE_WRITE)) {
      startTimeTPmS = System.currentTimeMillis();
      doCreateWriteOp(fileName, reporter);
    } // ...
  } else {
    output.collect(new Text("l:latemaps"), new Text("1"));
  }
}

// Below are the relevant parts of the barrier() method
private boolean barrier() {
  // ...
  // If the sleep time is greater than 0, then sleep and return
  // ...
  LOG.info("Waiting in barrier for: " + sleepTime + " ms");
  return retVal;
}

// Below are the relevant parts of doCreateWriteOp
private void doCreateWriteOp(String name,
                             Reporter reporter) {
  FSDataOutputStream out;
  byte[] buffer = new byte[bytesToWrite];
  for (long l = 0L; l < numberOfFiles; l++) {
    Path filePath = new Path(new Path(baseDir, dataDirName),
                             name + "_" + l);
    // ...
  }
}
{code}
The file {{BASE_DIR/data/file_0_0}} is created only if the map task starts before the time given by {{startTime}}.
Refer to the chunk pasted above: {{map(..)}} calls {{barrier()}}, and *only if* {{barrier()}} evaluates to true does it call {{doCreateWriteOp}}, which eventually creates the file.
In the test case, the delay is 3 seconds, as set by {{"-startTime", "" + (Time.now() / 1000 + 3)}}.
In this failing run, I can see the task starting at least 6 seconds after the test case started.
{noformat}
2017-01-27 03:11:15,387 INFO [Thread-4] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(289)) - Submitting tokens for job: job_local1711545156_0001
2017-01-27 03:11:23,405 INFO [Thread-4] mapreduce.Job (Job.java:submit(1345)) - The url to track the job: http://localhost:8080/
{noformat}
Also, when I run this test on my laptop, I see the following line being printed.
{noformat}
2017-01-27 17:09:27,982 INFO [LocalJobRunner Map Task Executor #0] hdfs.NNBench (NNBench.java:barrier(676)) - Waiting in barrier for: 1018 ms
{noformat}
This line is printed only in the {{barrier()}} method, and I don't see it in the logs of the failed test.
In our environment, the Jenkins server was very slow, and it took more than 6 seconds to launch a map task.
In my opinion, the correct fix would be for {{barrier()}} to return true even when there is no need to sleep, and to return false only when an exception occurs.

was:
TestNNBench#testNNBenchCreateReadAndDelete failed a couple of times in our internal Jenkins build.
{noformat}
java.lang.AssertionError: create_write should create the file
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.apache.hadoop.hdfs.TestNNBench.testNNBenchCreateReadAndDelete(TestNNBench.java:55)
{noformat}
Below is my analysis of why it didn't create the file.
{code:title=NNBench.java|borderStyle=solid}
public void map(Text key,
                LongWritable value,
                OutputCollector<Text, Text> output,
                Reporter reporter) throws IOException {
  if (barrier()) {
    String fileName = "file_" + value;
    if (op.equals(OP_CREATE_WRITE)) {
      startTimeTPmS = System.currentTimeMillis();
      doCreateWriteOp(fileName, reporter);
    } // ...
  } else {
    output.collect(new Text("l:latemaps"), new Text("1"));
  }
}

// Below are the relevant parts of the barrier() method
private boolean barrier() {
  // ...
  // If the sleep time is greater than 0, then sleep and return
  // ...
  LOG.info("Waiting in barrier for: " + sleepTime + " ms");
  return retVal;
}

// Below are the relevant parts of doCreateWriteOp
private void doCreateWriteOp(String name,
                             Reporter reporter) {
  FSDataOutputStream out;
  byte[] buffer = new byte[bytesToWrite];
  for (long l = 0L; l < numberOfFiles; l++) {
    Path filePath = new Path(new Path(baseDir, dataDirName),
                             name + "_" + l);
    // ...
  }
}
{code}
This file {{BASE_DIR/data/file_0_0}} is getting created
[jira] [Updated] (MAPREDUCE-7076) TestNNBench#testNNBenchCreateReadAndDelete failing in our internal build
[ https://issues.apache.org/jira/browse/MAPREDUCE-7076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated MAPREDUCE-7076:
--
Labels: newbie (was: )

> TestNNBench#testNNBenchCreateReadAndDelete failing in our internal build
>
> Key: MAPREDUCE-7076
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7076
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: test
> Affects Versions: 2.8.0
> Reporter: Rushabh S Shah
> Priority: Minor
> Labels: newbie
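A minimal, self-contained Java sketch of the {{barrier()}} behavior proposed in the MAPREDUCE-7076 description follows. It is not NNBench's actual code: the class {{BarrierSketch}}, the millisecond {{startTimeMs}} field, and the injected clock and {{Sleeper}} are hypothetical, added so the logic can be exercised without Hadoop (NNBench itself takes {{-startTime}} in seconds). The barrier sleeps only while the start time is still in the future, returns true whether or not it slept, and returns false only when the sleep is interrupted.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical stand-in for NNBench's barrier() logic. The clock and the
 * sleep call are injected so the behavior can be checked without Hadoop.
 */
class BarrierSketch {
  interface Sleeper {
    void sleep(long ms) throws InterruptedException;
  }

  private final long startTimeMs;   // absolute start time, epoch millis
  private final LongSupplier clock; // source of "now", epoch millis
  private final Sleeper sleeper;

  BarrierSketch(long startTimeMs, LongSupplier clock, Sleeper sleeper) {
    this.startTimeMs = startTimeMs;
    this.clock = clock;
    this.sleeper = sleeper;
  }

  /**
   * Proposed behavior: wait until the start time if it is still ahead,
   * return true either way, and return false only if interrupted.
   */
  boolean barrier() {
    long sleepTime = startTimeMs - clock.getAsLong();
    if (sleepTime > 0) {
      try {
        sleeper.sleep(sleepTime);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status
        return false;
      }
    }
    // A late map (sleepTime <= 0) skips the wait but still does its work.
    return true;
  }
}
```

With the test's 3-second offset, a map task that launches 8 seconds after submission (roughly the gap between the two Jenkins log lines above) gets a negative sleep time and still proceeds to create its file. Whether such a late map should also be reported under {{l:latemaps}} is a separate question this sketch leaves open.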
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Created] (MAPREDUCE-7076) TestNNBench#testNNBenchCreateReadAndDelete failing in our internal build
Rushabh S Shah created MAPREDUCE-7076:
-
Summary: TestNNBench#testNNBenchCreateReadAndDelete failing in our internal build
Key: MAPREDUCE-7076
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7076
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: test
Affects Versions: 2.8.0
Reporter: Rushabh S Shah
[jira] [Commented] (MAPREDUCE-7059) Compatibility issue: job submission fails with RpcNoSuchMethodException when submitting to 2.x cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376952#comment-16376952 ]

Rushabh S Shah commented on MAPREDUCE-7059:
---
In the future, we could add a way for every server to report its current version, and then make the decision based on that. For example, we could add support in {{FsServerDefaults}} for returning the version number the server is running.

> Compatibility issue: job submission fails with RpcNoSuchMethodException when submitting to 2.x cluster
>
> Key: MAPREDUCE-7059
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7059
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Affects Versions: 3.0.0
> Reporter: Jiandan Yang
> Priority: Minor
>
> Running teragen with Hadoop 3.1 failed against an HDFS 2.8 server.
> {code:java}
> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0-SNAPSHOT.jar teragen 10 /teragen
> {code}
> The failure occurs because HDFS 2.8 does not have {{setErasureCodingPolicy}}.
> One solution is to catch the {{RemoteException}} in {{JobResourceUploader#disableErasure}}, like this:
> {code:java}
> private void disableErasureCodingForPath(FileSystem fs, Path path)
>     throws IOException {
>   try {
>     if (jtFs instanceof DistributedFileSystem) {
>       LOG.info("Disabling Erasure Coding for path: " + path);
>       DistributedFileSystem dfs = (DistributedFileSystem) jtFs;
>       dfs.setErasureCodingPolicy(path,
>           SystemErasureCodingPolicies.getReplicationPolicy().getName());
>     }
>   } catch (RemoteException e) {
>     // RemoteException carries the server-side exception type as a class
>     // name and has no Java cause, so compare getClassName() instead.
>     if (!RpcNoSuchMethodException.class.getName().equals(e.getClassName())) {
>       throw e;
>     }
>   }
> }
> {code}
> Does anyone have a better solution?
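Hadoop's {{RemoteException}} reports the server-side exception type through {{getClassName()}}; the client-side exception is typically created without a Java cause, so a {{getCause() instanceof RpcNoSuchMethodException}} test would not match. The self-contained sketch below shows the class-name comparison. {{FakeRemoteException}}, {{EcCall}}, and {{ErasureCodingSketch}} are hypothetical stand-ins so it runs without Hadoop on the classpath; only the class name {{org.apache.hadoop.ipc.RpcNoSuchMethodException}} comes from the trace in this issue.

```java
import java.io.IOException;

/** Hypothetical stand-in for org.apache.hadoop.ipc.RemoteException. */
class FakeRemoteException extends IOException {
  private final String className;

  FakeRemoteException(String className, String msg) {
    super(msg); // no Java cause is attached, only the remote class name
    this.className = className;
  }

  String getClassName() {
    return className;
  }
}

class ErasureCodingSketch {
  static final String NO_SUCH_METHOD =
      "org.apache.hadoop.ipc.RpcNoSuchMethodException";

  /** Hypothetical wrapper for the setErasureCodingPolicy RPC. */
  interface EcCall {
    void run() throws IOException;
  }

  /**
   * Runs the erasure-coding call, swallowing only the "method does not
   * exist on this older server" error; anything else is rethrown.
   * Returns true if the call went through.
   */
  static boolean disableErasureCoding(EcCall call) throws IOException {
    try {
      call.run();
      return true;
    } catch (FakeRemoteException e) {
      if (!NO_SUCH_METHOD.equals(e.getClassName())) {
        throw e;
      }
      return false; // older server: leave the path as it is
    }
  }
}
```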
> The detailed exception trace is:
> {code:java}
> 2018-02-26 11:22:53,178 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1518615699369_0006
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException): Unknown method setErasureCodingPolicy called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:436)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:846)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:789)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1804)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2457)
>     at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>     at com.sun.proxy.$Proxy11.setErasureCodingPolicy(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setErasureCodingPolicy(ClientNamenodeProtocolTranslatorPB.java:1583)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>     at com.sun.proxy.$Proxy12.setErasureCodingPolicy(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.setErasureCodingPolicy(DFSClient.java:2678)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$63.doCall(DistributedFileSystem.java:2665)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$63.doCall(DistributedFileSystem.java:2662)
>     at
[jira] [Created] (MAPREDUCE-6996) FileInputFormat#getBlockIndex should include file name in the exception.
Rushabh S Shah created MAPREDUCE-6996:
-
Summary: FileInputFormat#getBlockIndex should include file name in the exception.
Key: MAPREDUCE-6996
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6996
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Rushabh S Shah
Priority: Minor

{code:title=FileInputFormat.java|borderStyle=solid}
protected int getBlockIndex(BlockLocation[] blkLocations,
                            long offset) {
  // ...
  BlockLocation last = blkLocations[blkLocations.length - 1];
  long fileLength = last.getOffset() + last.getLength() - 1;
  throw new IllegalArgumentException("Offset " + offset +
                                     " is outside of file (0.." +
                                     fileLength + ")");
}
{code}
When the file is open for writing, {{last.getLength()}} and {{last.getOffset()}} will be zero, so {{fileLength}} is -1, and we see the following exception stack trace.
{noformat}
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:288)
Caused by: java.lang.IllegalArgumentException: Offset 0 is outside of file (0..-1)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getBlockIndex(FileInputFormat.java:453)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:413)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)
    ... 18 more
{noformat}
It's difficult to debug because the message doesn't say which file was open, so I am creating this ticket to include the file name in the exception.
Since {{FileInputFormat#getBlockIndex}} is protected, we can't change the signature of that method to add the file name as an argument. The only way I can think of to fix this is:
{code:title=FileInputFormat.java|borderStyle=solid}
public InputSplit[] getSplits(JobConf job, int numSplits)
    throws IOException {
  // ...
  for (FileStatus file: files) {
    Path path = file.getPath();
    long length = file.getLen();
    if (length != 0) {
      FileSystem fs = path.getFileSystem(job);
      BlockLocation[] blkLocations;
      if (file instanceof LocatedFileStatus) {
        blkLocations = ((LocatedFileStatus) file).getBlockLocations();
      } else {
        blkLocations = fs.getFileBlockLocations(file, 0, length);
      }
      if (isSplitable(fs, path)) {
        long blockSize = file.getBlockSize();
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);

        long bytesRemaining = length;
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
          String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,
              length - bytesRemaining, splitSize, clusterMap);
          splits.add(makeSplit(path, length - bytesRemaining, splitSize,
              splitHosts[0], splitHosts[1]));
          bytesRemaining -= splitSize;
        }

        if (bytesRemaining != 0) {
          String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,
              length - bytesRemaining, bytesRemaining, clusterMap);
          splits.add(makeSplit(path, length - bytesRemaining, bytesRemaining,
              splitHosts[0], splitHosts[1]));
        }
      } else {
        String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,
            0, length, clusterMap);
        splits.add(makeSplit(path, 0, length, splitHosts[0], splitHosts[1]));
      }
    } else {
      // Create empty hosts array for zero length files
      splits.add(makeSplit(path, 0, length, new String[0]));
    }
  }
  // ...
}
{code}
Wrap the above code chunk in a try-catch block that catches {{IllegalArgumentException}} and checks for the message {{Offset 0 is outside of file (0..-1)}}. If it matches, add the file name and rethrow the {{IllegalArgumentException}}.
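The arithmetic behind the message and the proposed try/catch can be sketched in plain Java. This is a hypothetical, self-contained illustration: {{Block}} stands in for {{BlockLocation}}, and {{getBlockIndex}} / {{getBlockIndexForFile}} are simplified versions rather than the real {{FileInputFormat}} methods. A file still open for writing reports a last block with offset 0 and length 0, so {{fileLength}} = 0 + 0 - 1 = -1, giving {{Offset 0 is outside of file (0..-1)}}. Instead of matching that exact message as the description suggests, the sketch rethrows any {{IllegalArgumentException}} from the lookup with the path prepended and the original exception kept as the cause.

```java
/** Hypothetical stand-in for org.apache.hadoop.fs.BlockLocation. */
class Block {
  final long offset;
  final long length;

  Block(long offset, long length) {
    this.offset = offset;
    this.length = length;
  }
}

class BlockIndexSketch {
  /** Simplified lookup: index of the block that contains offset. */
  static int getBlockIndex(Block[] blocks, long offset) {
    for (int i = 0; i < blocks.length; i++) {
      if (blocks[i].offset <= offset
          && offset < blocks[i].offset + blocks[i].length) {
        return i;
      }
    }
    Block last = blocks[blocks.length - 1];
    long fileLength = last.offset + last.length - 1; // -1 for an empty last block at 0
    throw new IllegalArgumentException("Offset " + offset
        + " is outside of file (0.." + fileLength + ")");
  }

  /** Wraps the lookup so the failing file's path appears in the message. */
  static int getBlockIndexForFile(String path, Block[] blocks, long offset) {
    try {
      return getBlockIndex(blocks, offset);
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException(
          "File " + path + ": " + e.getMessage(), e);
    }
  }
}
```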
[jira] [Commented] (MAPREDUCE-6958) Shuffle audit logger should log size of shuffle transfer
[ https://issues.apache.org/jira/browse/MAPREDUCE-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168504#comment-16168504 ]

Rushabh S Shah commented on MAPREDUCE-6958:
---
+1 LGTM (non-binding).

> Shuffle audit logger should log size of shuffle transfer
>
> Key: MAPREDUCE-6958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6958
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Minor
> Attachments: MAPREDUCE-6958.001.patch, MAPREDUCE-6958.002.patch
>
> The shuffle audit logger currently logs the job ID and reducer ID but nothing
> about the size of the requested transfer. It calculates this as part of the
> HTTP response headers, so it would be trivial to log the response size. This
> would be very valuable for debugging network traffic storms from the shuffle
> handler.
[jira] [Comment Edited] (MAPREDUCE-6958) Shuffle audit logger should log size of shuffle transfer
[ https://issues.apache.org/jira/browse/MAPREDUCE-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168013#comment-16168013 ]

Rushabh S Shah edited comment on MAPREDUCE-6958 at 9/15/17 3:19 PM:

Overall the patch looks good. Just one very minor nit.
{noformat}
for (String mapId : mapIds) {
  sb.append(" ");
  sb.append(mapId);
}
{noformat}
Instead of looping over each {{mapId}}, we could just print {{mapIds}} directly; it's less code.

was (Author: shahrs87):
Overall the patch looks good. Just one very minor nit.
{quote}
+for (String mapId : mapIds) {
+  sb.append(" ");
+  sb.append(mapId);
+}
{quote}
Instead of looping over each {{mapId}}, we could just print {{mapIds}} directly; it's less code.

> Shuffle audit logger should log size of shuffle transfer
>
> Key: MAPREDUCE-6958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6958
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Minor
> Attachments: MAPREDUCE-6958.001.patch
[jira] [Commented] (MAPREDUCE-6958) Shuffle audit logger should log size of shuffle transfer
[ https://issues.apache.org/jira/browse/MAPREDUCE-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168013#comment-16168013 ]

Rushabh S Shah commented on MAPREDUCE-6958:
---
Overall the patch looks good. Just one very minor nit.
{quote}
+for (String mapId : mapIds) {
+  sb.append(" ");
+  sb.append(mapId);
+}
{quote}
Instead of looping over each {{mapId}}, we could just print {{mapIds}} directly; it's less code.

> Shuffle audit logger should log size of shuffle transfer
>
> Key: MAPREDUCE-6958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6958
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Minor
> Attachments: MAPREDUCE-6958.001.patch
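The review nit is about replacing the explicit loop with a single append of the collection. One point worth checking before making that change: appending a {{List}} directly uses its {{toString()}}, which adds brackets and commas, so the audit line format changes. The self-contained sketch below compares three variants; treating {{mapIds}} as a {{List<String>}}, the {{mappers:}} prefix, and the sample IDs are assumptions for illustration, not taken from the patch.

```java
import java.util.Arrays;
import java.util.List;

class MapIdsFormatting {
  /** The loop from the patch: a space before each map ID. */
  static String withLoop(List<String> mapIds) {
    StringBuilder sb = new StringBuilder("mappers:");
    for (String mapId : mapIds) {
      sb.append(" ");
      sb.append(mapId);
    }
    return sb.toString();
  }

  /** Printing the list directly: shorter, but relies on List.toString(). */
  static String withToString(List<String> mapIds) {
    return new StringBuilder("mappers: ").append(mapIds).toString();
  }

  /** A one-liner that keeps the loop's space-separated format. */
  static String withJoin(List<String> mapIds) {
    return "mappers: " + String.join(" ", mapIds);
  }
}
```

For a non-empty list, {{withJoin}} produces the same text as the loop; the two differ only for an empty list, where the joined version leaves a trailing space.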
[jira] [Resolved] (MAPREDUCE-6938) Question
[ https://issues.apache.org/jira/browse/MAPREDUCE-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah resolved MAPREDUCE-6938.
---
Resolution: Invalid

[~remil] This JIRA is a bug/improvement/feature tracking system, not a place for general questions or requests for sample programs. Please send an email to {{gene...@hadoop.apache.org}} or {{u...@hadoop.apache.org}}, and hopefully someone will reply.
Thanks,
Rushabh Shah.

> Question
>
> Key: MAPREDUCE-6938
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6938
> Project: Hadoop Map/Reduce
> Issue Type: Task
> Reporter: Remil
> Priority: Minor
>
> I need help with two things.
> 1) A sample Java MapReduce program where multiple parameters are passed from the mapper to the reducer.
> 2) A Java MapReduce program that writes to a file inside HDFS and reads from a file inside HDFS, other than the normal input and output files specified for the mapper and reducer.
[jira] [Updated] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters compression related errors.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated MAPREDUCE-6633:
--
Target Version/s: 3.0.0, 2.8.0, 2.7.3

> AM should retry map attempts if the reduce task encounters compression related errors.
>
> Key: MAPREDUCE-6633
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6633
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.7.2
> Reporter: Rushabh S Shah
> Assignee: Rushabh S Shah
> Fix For: 2.8.0
> Attachments: MAPREDUCE-6633.patch
>
> When the reduce task encounters compression-related errors, the AM doesn't retry the corresponding map task.
> Here is the stack trace from one of the cases we encountered.
> {noformat}
> 2016-01-27 13:44:28,915 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#29
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>     at com.hadoop.compression.lzo.LzoDecompressor.setInput(LzoDecompressor.java:196)
>     at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104)
>     at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>     at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:537)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
> {noformat}
> In this case, the node on which the map task ran had a bad drive.
> If the AM had retried that map task somewhere else, the job would definitely have succeeded.
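The stack trace shows why this failure is easy to miss: {{ArrayIndexOutOfBoundsException}} is an unchecked {{RuntimeException}}, not an {{IOException}}, so any handler around the copy that catches only {{IOException}} lets it propagate and fail the whole reducer. A minimal sketch of the idea follows, under stated assumptions: {{FetchSketch}}, {{MapOutputReader}}, and this {{copyMapOutput}} are hypothetical simplifications, not the real {{Fetcher}} API. It only shows classifying an error thrown while reading a map's output as a bad map output, so that the map could be re-run elsewhere.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class FetchSketch {
  /** Hypothetical stand-in for reading one map's output through a codec. */
  interface MapOutputReader {
    void readFully(String mapId) throws IOException;
  }

  /** Map IDs whose output could not be read; these would be re-run. */
  final List<String> failedMaps = new ArrayList<>();

  /**
   * Copies one map output. Both I/O errors and unchecked errors thrown
   * while decompressing (e.g. ArrayIndexOutOfBoundsException from a
   * codec) are recorded as a bad map output instead of failing the reducer.
   */
  boolean copyMapOutput(String mapId, MapOutputReader reader) {
    try {
      reader.readFully(mapId);
      return true;
    } catch (IOException | RuntimeException e) {
      failedMaps.add(mapId);
      return false;
    }
  }
}
```

Catching every {{RuntimeException}} is the trade-off debated later in this thread: a genuine bug on the reduce side (for example an NPE) would also be blamed on the map output and trigger unnecessary re-runs.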
[jira] [Commented] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters compression related errors.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235139#comment-15235139 ]

Rushabh S Shah commented on MAPREDUCE-6633:
---
[~eepayne]: Thanks for the reviews and for committing. Does it make sense to fix this in the 2.7 branch as well?

> AM should retry map attempts if the reduce task encounters compression related errors.
>
> Key: MAPREDUCE-6633
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6633
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.7.2
> Reporter: Rushabh S Shah
> Assignee: Rushabh S Shah
> Fix For: 2.8.0
> Attachments: MAPREDUCE-6633.patch
[jira] [Commented] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters compression related errors.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229066#comment-15229066 ]

Rushabh S Shah commented on MAPREDUCE-6633:
---
I ran the failing JUnit test below on both JDK 7 and JDK 8, and it passed on both on my machine.
{noformat}
Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.54 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.tools.TestCLI
testGetJob(org.apache.hadoop.mapreduce.tools.TestCLI)  Time elapsed: 0.084 sec  <<< FAILURE!
java.lang.AssertionError: null
    at org.junit.Assert.fail(Assert.java:86)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertTrue(Assert.java:52)
    at org.apache.hadoop.mapreduce.tools.TestCLI.testGetJob(TestCLI.java:181)
{noformat}

> AM should retry map attempts if the reduce task encounters compression related errors.
>
> Key: MAPREDUCE-6633
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6633
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.7.2
> Reporter: Rushabh S Shah
> Assignee: Rushabh S Shah
> Attachments: MAPREDUCE-6633.patch
[jira] [Commented] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters commpression related errors.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229032#comment-15229032 ] Rushabh S Shah commented on MAPREDUCE-6633: --- bq. If there is a runtime exception on the reducer (memory error, NPE, etc.), maps would be re-run unnecessarily. In this case the decompressor threw a RuntimeException (ArrayIndexOutOfBoundsException is a subclass). If we had re-run the map on another node, the job would have succeeded. bq. I am a little nervous about re-fetching for any exception. I understand your concern, but I still think it's a good change.
[jira] [Updated] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters commpression related errors.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-6633: -- Status: Patch Available (was: In Progress) In the Fetcher#copyMapOutput method, I added Exception to the catch block, which currently handles only java.lang.InternalError, so that the fetch is retried on any compression-related exception:
{noformat}
try {
  // Go!
  LOG.info("fetcher#" + id + " about to shuffle output of map "
      + mapOutput.getMapId() + " decomp: " + decompressedLength
      + " len: " + compressedLength + " to " + mapOutput.getDescription());
  mapOutput.shuffle(host, is, compressedLength, decompressedLength,
      metrics, reporter);
} catch (java.lang.InternalError e) {
  LOG.warn("Failed to shuffle for fetcher#" + id, e);
  throw new IOException(e);
}
{noformat}
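A minimal sketch of the idea, assuming a simplified stand-in for the shuffle step: widening the catch from java.lang.InternalError alone to InternalError | Exception means that an unchecked exception thrown by a decompressor (such as the ArrayIndexOutOfBoundsException in the stack trace) also surfaces as an IOException, which the caller can then handle as a failed fetch. The names FetcherCatchSketch, ShuffleStep and copyMapOutput are hypothetical; this is not Hadoop's Fetcher class.

```java
import java.io.IOException;

/** Hypothetical sketch of the widened catch block; not Hadoop's Fetcher. */
public class FetcherCatchSketch {

  /** Stand-in for mapOutput.shuffle(...): any step that may fail while decompressing. */
  @FunctionalInterface
  public interface ShuffleStep {
    void run() throws IOException;
  }

  /**
   * Wraps any failure from the step, including unchecked exceptions and
   * InternalError, in an IOException so the caller's I/O-failure handling applies.
   * Note: an IOException thrown by the step is also caught and wrapped again.
   */
  public static void copyMapOutput(ShuffleStep step) throws IOException {
    try {
      step.run();
    } catch (InternalError | Exception e) {
      throw new IOException(e);
    }
  }

  public static void main(String[] args) {
    try {
      // Simulate a decompressor failing with an unchecked exception.
      copyMapOutput(() -> { throw new ArrayIndexOutOfBoundsException(); });
    } catch (IOException e) {
      // Prints: fetch failed, cause: java.lang.ArrayIndexOutOfBoundsException
      System.out.println("fetch failed, cause: " + e.getCause());
    }
  }
}
```

The sketch also shows the trade-off raised in review: because the catch is broad, an unrelated bug in the reducer (an NPE, say) is reported the same way as a corrupt map output.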
[jira] [Updated] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters commpression related errors.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-6633: -- Attachment: MAPREDUCE-6633.patch
[jira] [Work started] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters commpression related errors.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-6633 started by Rushabh S Shah.
[jira] [Created] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters commpression related errors.
Rushabh S Shah created MAPREDUCE-6633: - Summary: AM should retry map attempts if the reduce task encounters commpression related errors. Key: MAPREDUCE-6633 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6633 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.2 Reporter: Rushabh S Shah Assignee: Rushabh S Shah When reduce task encounters compression related errors, AM doesn't retry the corresponding map task. In one of the case we encountered, here is the stack trace. {noformat} 2016-01-27 13:44:28,915 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#29 at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.lang.ArrayIndexOutOfBoundsException at com.hadoop.compression.lzo.LzoDecompressor.setInput(LzoDecompressor.java:196) at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104) at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85) at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192) at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:537) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) {noformat} In this case, the node on which the map task ran had a bad drive. 
If the AM had retried running that map task somewhere else, the jib definitely would have succeeded.
[jira] [Updated] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters commpression related errors.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-6633: -- Description: When reduce task encounters compression related errors, AM doesn't retry the corresponding map task. In one of the case we encountered, here is the stack trace. {noformat} 2016-01-27 13:44:28,915 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#29 at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.lang.ArrayIndexOutOfBoundsException at com.hadoop.compression.lzo.LzoDecompressor.setInput(LzoDecompressor.java:196) at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104) at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85) at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192) at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:537) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) {noformat} In this case, the node on which the map task ran had a bad drive. If the AM had retried running that map task somewhere else, the job definitely would have succeeded. 
was: When reduce task encounters compression related errors, AM doesn't retry the corresponding map task. In one of the case we encountered, here is the stack trace. {noformat} 2016-01-27 13:44:28,915 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#29 at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.lang.ArrayIndexOutOfBoundsException at com.hadoop.compression.lzo.LzoDecompressor.setInput(LzoDecompressor.java:196) at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104) at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85) at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192) at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:537) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) {noformat} In this case, the node on which the map task ran had a bad drive. If the AM had retried running that map task somewhere else, the jib definitely would have succeeded.
[jira] [Commented] (MAPREDUCE-5948) org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well
[ https://issues.apache.org/jira/browse/MAPREDUCE-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569101#comment-14569101 ] Rushabh S Shah commented on MAPREDUCE-5948: --- Sorry, this fell off my radar too. I don't have enough cycles to work on this right now, so we can move it to the next release. If someone else is interested in working on this, I am more than happy to hand it over. org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well -- Key: MAPREDUCE-5948 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5948 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2, 0.23.9, 2.2.0 Environment: CDH3U2 Redhat linux 5.7 Reporter: Kris Geusebroek Assignee: Rushabh S Shah Priority: Critical Attachments: HADOOP-9867.patch, HADOOP-9867.patch, HADOOP-9867.patch, HADOOP-9867.patch Defining a multi-byte record delimiter in a new InputFileFormat sometimes causes records from the input to be skipped. This happens when an input split boundary falls just after a record separator. The starting point of the next split is then non-zero and skipFirstLine is true. The reader seeks to start - 1 and ignores the text up to the first record delimiter (on the presumption that this record was already handled by the previous map task). Because the record delimiter is multi-byte, the seek brings only the last byte of the delimiter into scope, so it is not recognized as a full delimiter. The text is therefore skipped until the next delimiter, dropping a full record.
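The skipping behaviour described above can be reproduced with a toy model, assuming a simplified reader that, for any split not starting at offset 0, backs up `backUp` bytes and discards everything through the first complete delimiter it finds; records starting before the split's start are assumed to belong to the previous split. The class and method names are hypothetical and this is not Hadoop's LineRecordReader. `backUp = 1` models the seek to start - 1; `backUp = delim.length` is shown only as one possible adjustment within this model, and it assumes the delimiter cannot overlap itself (for example, not "aa").

```java
import java.nio.charset.StandardCharsets;

/** Toy model of skipFirstLine with a multi-byte delimiter; not Hadoop's LineRecordReader. */
public class MultibyteDelimiterSkip {

  /** Index of the first full occurrence of delim in data at or after from, or -1. */
  static int indexOf(byte[] data, byte[] delim, int from) {
    outer:
    for (int i = from; i + delim.length <= data.length; i++) {
      for (int j = 0; j < delim.length; j++) {
        if (data[i + j] != delim[j]) {
          continue outer;
        }
      }
      return i;
    }
    return -1;
  }

  /**
   * Offset at which this split's first record begins: back up backUp bytes,
   * then skip past the first complete delimiter (data.length if none is found).
   */
  public static int firstRecordStart(byte[] data, byte[] delim, int splitStart, int backUp) {
    if (splitStart == 0) {
      return 0;
    }
    int d = indexOf(data, delim, Math.max(splitStart - backUp, 0));
    return d < 0 ? data.length : d + delim.length;
  }

  public static void main(String[] args) {
    byte[] single = "AAA\nBBB\nCCC".getBytes(StandardCharsets.US_ASCII);
    // Split starts at offset 4, right after the 1-byte delimiter at offset 3.
    System.out.println(firstRecordStart(single, new byte[] {'\n'}, 4, 1)); // 4: "BBB" kept

    byte[] multi = "AAA|#BBB|#CCC".getBytes(StandardCharsets.US_ASCII);
    byte[] delim = "|#".getBytes(StandardCharsets.US_ASCII);
    // Split starts at offset 5, right after the 2-byte delimiter at offsets 3-4.
    System.out.println(firstRecordStart(multi, delim, 5, 1)); // 10: "BBB" skipped
    System.out.println(firstRecordStart(multi, delim, 5, delim.length)); // 5: "BBB" kept
  }
}
```

With the 1-byte delimiter, backing up one byte lands exactly on the newline, so the split keeps "BBB". With the 2-byte "|#" delimiter, the backed-up position holds only "#", the scan runs on to the next "|#", and "BBB" belongs to neither split.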
[jira] [Commented] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003526#comment-14003526 ] Rushabh S Shah commented on MAPREDUCE-5309: --- Thanks, Jason, for reviewing and committing the patch. 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server - Key: MAPREDUCE-5309 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5309 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, mrv2 Affects Versions: 2.0.4-alpha Reporter: Vrushali C Assignee: Rushabh S Shah Fix For: 3.0.0, 2.5.0 Attachments: MAPREDUCE-5309-v2.patch, MAPREDUCE-5309-v3.patch, MAPREDUCE-5309-v4.patch, MAPREDUCE-5309-v5.patch, MAPREDUCE-5309.patch, Test20JobHistoryParsing.java, job_2_0_3-KILLED.jhist When the 2.0.4 JobHistoryParser tries to parse a job history file generated by Hadoop 2.0.3, the JobHistoryParser throws the following error: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Array cannot be cast to org.apache.hadoop.mapreduce.jobhistory.JhCounters at org.apache.hadoop.mapreduce.jobhistory.TaskAttemptUnsuccessfulCompletion.put(TaskAttemptUnsuccessfulCompletion.java:58) at org.apache.avro.generic.GenericData.setField(GenericData.java:463) at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142) at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129) at org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:93) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:111) at
org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:156) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:142) at com.twitter.somepackage.Test20JobHistoryParsing.testFileAvro(Test20JobHistoryParsing.java:23) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) Test code and the job history file are attached. Test code: package com.twitter.somepackagel; import java.io.IOException; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser; import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.JobInfo; import org.junit.Test; import org.apache.hadoop.yarn.YarnException; public class Test20JobHistoryParsing {
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5309: -- Status: Open (was: Patch Available) The current patch has a typo in one of the log statements.
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5309: -- Attachment: MAPREDUCE-5309-v5.patch Initially, EventReader#reader was initialized as: this.reader = new SpecificDatumReader(schema, schema); This assumed that the reader schema and the writer schema are the same. But when the schema was upgraded from 2.0.3 to 2.0.4, new fields were added in 2.0.4 that were not present in 2.0.3. When the parser tried to parse 2.0.3 logs (which don't have the new fields), it failed with errors. So we need to differentiate between the new (reader) schema and the schema of the input jhist file (the writer schema); Avro will then do the rest of the mapping by field name. The recently added fields need default values, so that when parsing jhist files written with the old schema, Avro assigns those defaults. [~vinodkv]: I hope this helps. [~viraj]: Yes, this patch will parse both 0.23.x and 2.4.x logs.
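The reader/writer schema distinction can be illustrated with a toy model, not the Avro library itself: records are stored positionally, so they must first be decoded with the writer's field order and then matched to the reader's fields by name, with defaults filling any field the writer did not have. The class name and field names below are hypothetical, not the real jhist event schema. In Avro's Java API the corresponding step is to construct the reader with both schemas, for example new SpecificDatumReader<>(writerSchema, readerSchema), instead of passing the same schema twice.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of Avro-style schema resolution; NOT the Avro library. */
public class SchemaResolutionSketch {

  /**
   * Decodes a positionally stored record with the writer's field order, then
   * projects it onto the reader's fields by name, using defaults for
   * reader-only fields.
   */
  public static Map<String, Object> read(List<String> writerFields, List<Object> stored,
      List<String> readerFields, Map<String, Object> readerDefaults) {
    if (stored.size() != writerFields.size()) {
      // Real Avro would misread the bytes instead; the toy just rejects the mismatch.
      throw new IllegalArgumentException("stored record has " + stored.size()
          + " values, writer schema expects " + writerFields.size());
    }
    // Step 1: positional decode using the writer schema.
    Map<String, Object> byName = new LinkedHashMap<>();
    for (int i = 0; i < writerFields.size(); i++) {
      byName.put(writerFields.get(i), stored.get(i));
    }
    // Step 2: resolve by field name against the reader schema.
    Map<String, Object> result = new LinkedHashMap<>();
    for (String field : readerFields) {
      if (byName.containsKey(field)) {
        result.put(field, byName.get(field));
      } else if (readerDefaults.containsKey(field)) {
        result.put(field, readerDefaults.get(field));
      } else {
        throw new IllegalStateException("no value or default for field " + field);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Hypothetical schemas: the "new" schema adds a field with a default.
    List<String> oldSchema = Arrays.asList("taskid", "error");
    List<String> newSchema = Arrays.asList("taskid", "error", "counters");
    List<Object> oldRecord = Arrays.<Object>asList("task_1", "disk failure");
    Map<String, Object> defaults = Collections.<String, Object>singletonMap("counters", "none");

    // Writer = old schema, reader = new schema: prints
    // {taskid=task_1, error=disk failure, counters=none}
    System.out.println(read(oldSchema, oldRecord, newSchema, defaults));

    // Same schema passed as writer and reader for old data: the data is misinterpreted.
    try {
      read(newSchema, oldRecord, newSchema, defaults);
    } catch (IllegalArgumentException e) {
      System.out.println("error: " + e.getMessage());
    }
  }
}
```

The design point matches the comment above: the reader's own schema only determines which fields come out, while the file's schema determines how the stored values are interpreted.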
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5309: -- Status: Patch Available (was: Open)

2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
Key: MAPREDUCE-5309
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5309
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobhistoryserver, mrv2
Affects Versions: 2.0.4-alpha
Reporter: Vrushali C
Assignee: Rushabh S Shah
Attachments: MAPREDUCE-5309-v2.patch, MAPREDUCE-5309-v3.patch, MAPREDUCE-5309-v4.patch, MAPREDUCE-5309-v5.patch, MAPREDUCE-5309.patch, Test20JobHistoryParsing.java, job_2_0_3-KILLED.jhist

When the 2.0.4 JobHistoryParser tries to parse a job history file generated by Hadoop 2.0.3, it throws the following error:

java.lang.ClassCastException: org.apache.avro.generic.GenericData$Array cannot be cast to org.apache.hadoop.mapreduce.jobhistory.JhCounters
	at org.apache.hadoop.mapreduce.jobhistory.TaskAttemptUnsuccessfulCompletion.put(TaskAttemptUnsuccessfulCompletion.java:58)
	at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
	at org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:93)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:111)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:156)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:142)
	at com.twitter.somepackage.Test20JobHistoryParsing.testFileAvro(Test20JobHistoryParsing.java:23)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

Test code and the job history file are attached.

Test code:

package com.twitter.somepackagel;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser;
import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.JobInfo;
import org.junit.Test;
import org.apache.hadoop.yarn.YarnException;

public class Test20JobHistoryParsing {
  @Test
  public void testFileAvro() throws IOException {
    Path local_path2 = new
[jira] [Commented] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998768#comment-13998768 ] Rushabh S Shah commented on MAPREDUCE-5309: --- Hey Viraj, This is not a stable fix. This will not parse the history files that are generated since 2.0.4.
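Rushabh's comment points out that a fix tailored to the 2.0.3 file layout would stop parsing files written since 2.0.4. A general way to accept both layouts is to match fields by name rather than by position and to supply a default for any field the writer did not record; Avro's writer/reader schema resolution works on this principle. The sketch below models the idea in plain Java with hypothetical names. It is not the patch attached to this issue and does not use the Avro API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Simplified model (not Hadoop or Avro code) of resolving writer fields
 * against reader fields by NAME, with a default for fields the writer lacked.
 * All names are hypothetical; this is not the patch attached to the issue.
 */
public class NameResolutionDemo {

    /** Hypothetical stand-in for a counters object. */
    static final class Counters {
        final String group;

        Counters(String group) {
            this.group = group;
        }
    }

    /** Reader-side record; field order no longer matters. */
    static final class UnsuccessfulAttempt {
        String status;
        Counters counters;
        List<Integer> clockSplits;
    }

    /** writerFields maps each field name the writer recorded to its value. */
    @SuppressWarnings("unchecked")
    static UnsuccessfulAttempt readByName(Map<String, Object> writerFields) {
        UnsuccessfulAttempt record = new UnsuccessfulAttempt();
        for (Map.Entry<String, Object> field : writerFields.entrySet()) {
            switch (field.getKey()) {
                case "status": record.status = (String) field.getValue(); break;
                case "counters": record.counters = (Counters) field.getValue(); break;
                case "clockSplits": record.clockSplits = (List<Integer>) field.getValue(); break;
                default: break; // field unknown to the reader: skip it
            }
        }
        if (record.counters == null) {
            // Field absent from the writer's layout: fall back to a default value.
            record.counters = new Counters("<empty>");
        }
        return record;
    }

    public static void main(String[] args) {
        // Assumed older layout: no counters field.
        Map<String, Object> olderLayout = new LinkedHashMap<>();
        olderLayout.put("status", "KILLED");
        olderLayout.put("clockSplits", Arrays.asList(120, 80, 40));

        // Assumed newer layout: counters present.
        Map<String, Object> newerLayout = new LinkedHashMap<>();
        newerLayout.put("status", "FAILED");
        newerLayout.put("counters", new Counters("example-group"));
        newerLayout.put("clockSplits", Arrays.asList(10, 20));

        System.out.println(readByName(olderLayout).counters.group); // <empty>
        System.out.println(readByName(newerLayout).counters.group); // example-group
    }
}
```

Because fields are looked up by name, the order in which the writer emitted them no longer matters, and a missing counters field becomes a default value instead of a cast failure, so both the older and newer layouts parse.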
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5309: -- Attachment: MAPREDUCE-5309-v2.patch
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5309: -- Status: Patch Available (was: Open) This patch will be parsing all the logs.
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5309: -- Status: Open (was: Patch Available) Broke a couple of test cases.
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5309: -- Attachment: MAPREDUCE-5309-v3.patch
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5309: -- Status: Patch Available (was: Open) Attaching a new patch correcting the previous test failures.
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated MAPREDUCE-5309:
Status: Open (was: Patch Available)

2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
Key: MAPREDUCE-5309
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5309
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobhistoryserver, mrv2
Affects Versions: 2.0.4-alpha
Reporter: Vrushali C
Assignee: Rushabh S Shah
Attachments: MAPREDUCE-5309-v2.patch, MAPREDUCE-5309-v3.patch, MAPREDUCE-5309.patch, Test20JobHistoryParsing.java, job_2_0_3-KILLED.jhist

When the 2.0.4 JobHistoryParser tries to parse a job history file generated by Hadoop 2.0.3, it throws the following error:

java.lang.ClassCastException: org.apache.avro.generic.GenericData$Array cannot be cast to org.apache.hadoop.mapreduce.jobhistory.JhCounters
	at org.apache.hadoop.mapreduce.jobhistory.TaskAttemptUnsuccessfulCompletion.put(TaskAttemptUnsuccessfulCompletion.java:58)
	at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
	at org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:93)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:111)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:156)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:142)
	at com.twitter.somepackage.Test20JobHistoryParsing.testFileAvro(Test20JobHistoryParsing.java:23)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

Test code and the job history file are attached.
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated MAPREDUCE-5309:
Status: Patch Available (was: Open)

Jason, thanks for reviewing my patch. Submitting a new patch incorporating all of the comments.

Attachments: MAPREDUCE-5309-v2.patch, MAPREDUCE-5309-v3.patch, MAPREDUCE-5309-v4.patch, MAPREDUCE-5309.patch, Test20JobHistoryParsing.java, job_2_0_3-KILLED.jhist
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated MAPREDUCE-5309:
Attachment: MAPREDUCE-5309-v4.patch

Attachments: MAPREDUCE-5309-v2.patch, MAPREDUCE-5309-v3.patch, MAPREDUCE-5309-v4.patch, MAPREDUCE-5309.patch, Test20JobHistoryParsing.java, job_2_0_3-KILLED.jhist
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated MAPREDUCE-5309:
Status: Patch Available (was: Open)

Changed the order of the counters field in Events.avpr.

Attachments: MAPREDUCE-5309.patch, Test20JobHistoryParsing.java, job_2_0_3-KILLED.jhist
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated MAPREDUCE-5309:
Status: Open (was: Patch Available)

This fixes the history files that were generated before 2.4.0 but breaks the history files that are generated since 2.4.0.

Attachments: MAPREDUCE-5309.patch, Test20JobHistoryParsing.java, job_2_0_3-KILLED.jhist
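The two comments above (reordering the counters field, then finding that this fixes older files but breaks newer ones) are consistent with positional decoding. The following is a minimal, dependency-free Java sketch under that assumption: it supposes that each record is filled slot by slot using the field order stored in the history file, while the compiled event class casts each slot to its own declared type, which is what the ClassCastException thrown from TaskAttemptUnsuccessfulCompletion.put suggests. FieldOrderDemo and the fields taskid, counters and splits are hypothetical and do not mirror the real Events.avpr schema.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: field names, types and orders are invented for
 * illustration and are not the real Events.avpr definitions.
 */
public class FieldOrderDemo {

  /** Stand-in for a counters record (plays the role of JhCounters). */
  static final class Counters {}

  /** Java type the compiled (reader-side) class expects for each field. */
  static Class<?> typeOf(String field) {
    switch (field) {
      case "taskid": return String.class;
      case "counters": return Counters.class;
      case "splits": return int[].class;
      default: throw new IllegalArgumentException("unknown field " + field);
    }
  }

  /** A sample value of the right type, as a writer would have stored it. */
  static Object sampleValue(String field) {
    switch (field) {
      case "taskid": return "attempt_0";
      case "counters": return new Counters();
      case "splits": return new int[] {1, 2, 3};
      default: throw new IllegalArgumentException("unknown field " + field);
    }
  }

  /**
   * Positional read: the value in slot i of the file goes into slot i of the
   * compiled class. Returns false where a real reader would hit a ClassCastException.
   */
  static boolean readsCleanly(List<String> compiledOrder, List<String> fileOrder) {
    for (int i = 0; i < fileOrder.size(); i++) {
      Object stored = sampleValue(fileOrder.get(i));
      Class<?> expected = typeOf(compiledOrder.get(i));
      if (!expected.isInstance(stored)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // File layouts: written before counters existed, and written with counters in the middle.
    List<String> oldFile = Arrays.asList("taskid", "splits");
    List<String> newFile = Arrays.asList("taskid", "counters", "splits");

    // Two candidate field orders for the compiled event class.
    List<String> countersInMiddle = Arrays.asList("taskid", "counters", "splits");
    List<String> countersAtEnd = Arrays.asList("taskid", "splits", "counters");

    System.out.println("counters in middle: old=" + readsCleanly(countersInMiddle, oldFile)
        + " new=" + readsCleanly(countersInMiddle, newFile));
    System.out.println("counters at end: old=" + readsCleanly(countersAtEnd, oldFile)
        + " new=" + readsCleanly(countersAtEnd, newFile));
  }
}
```

Running main prints `counters in middle: old=false new=true` followed by `counters at end: old=true new=false`. Under this assumption no single compiled field order reads both layouts, which suggests resolving fields by name rather than by position.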
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated MAPREDUCE-5309:
Description:
When the 2.0.4 JobHistoryParser tries to parse a job history file generated by Hadoop 2.0.3, it throws the following error:

java.lang.ClassCastException: org.apache.avro.generic.GenericData$Array cannot be cast to org.apache.hadoop.mapreduce.jobhistory.JhCounters

Test code and the job history file are attached.
Test code:

package com.twitter.somepackage;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser;
import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.JobInfo;
import org.apache.hadoop.yarn.YarnException;
import org.junit.Test;

public class Test20JobHistoryParsing {

  @Test
  public void testFileAvro() throws IOException {
    // Job history file written by a Hadoop 2.0.3 history server (attached to this issue).
    Path local_path2 = new Path("/tmp/job_2_0_3-KILLED.jhist");
    JobHistoryParser parser2 =
        new JobHistoryParser(FileSystem.getLocal(new Configuration()), local_path2);
    try {
      JobInfo ji2 = parser2.parse();
      System.out.println("job info: " + ji2.getJobname() + " "
          + ji2.getFinishedMaps() + " "
          + ji2.getTotalMaps() + " "
          + ji2.getJobId());
    } catch (IOException e) {
      throw new YarnException("Could not load history file " + local_path2.getName(), e);
    }
  }
}

This seems to stem from the fix in https://issues.apache.org/jira/browse/MAPREDUCE-4693, which added counters to the history server for failed tasks. This breaks backward compatibility with the JobHistoryServer.

was:
When the 2.0.4 JobHistoryParser tries to parse a job history file generated by hadoop 2.0.3, the
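The description traces the failure to MAPREDUCE-4693 adding counters to failed-task events. A minimal sketch of how that can surface as a ClassCastException, assuming (as the stack trace through GenericData.setField and TaskAttemptUnsuccessfulCompletion.put suggests) that the reader fills the event record by field index: a 2.0.3 record has no counters field, so the value it stores at that index lands in the slot the newer class reserves for counters. ResolveByNameDemo, FailedAttempt and its field names are hypothetical stand-ins rather than Hadoop classes, and the name-based reader is an illustration only, not the patch attached to this issue.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (invented field names and types, not the real Events.avpr):
 * a generated-style record whose put(int, Object) casts by position, read once
 * by index and once by field name.
 */
public class ResolveByNameDemo {

  /** Stand-in for JhCounters. */
  static final class Counters {}

  /** Stand-in for the newer generated event class: counters sits at index 1. */
  static final class FailedAttempt {
    static final List<String> FIELDS = Arrays.asList("taskid", "counters", "splits");
    String taskid;
    Counters counters;
    int[] splits;

    void put(int pos, Object value) {
      switch (pos) {
        case 0: taskid = (String) value; break;
        case 1: counters = (Counters) value; break; // ClassCastException if handed an int[]
        case 2: splits = (int[]) value; break;
        default: throw new IndexOutOfBoundsException("no field at index " + pos);
      }
    }
  }

  /** Index-based fill: slot i of the file's schema goes to slot i of the class. */
  static FailedAttempt readByIndex(List<String> fileFields, List<Object> values) {
    FailedAttempt rec = new FailedAttempt();
    for (int i = 0; i < fileFields.size(); i++) {
      rec.put(i, values.get(i));
    }
    return rec;
  }

  /** Name-based fill: look each file field up in the class's own field list. */
  static FailedAttempt readByName(List<String> fileFields, List<Object> values) {
    FailedAttempt rec = new FailedAttempt();
    for (int i = 0; i < fileFields.size(); i++) {
      int pos = FailedAttempt.FIELDS.indexOf(fileFields.get(i));
      if (pos >= 0) {            // fields the class does not know are skipped
        rec.put(pos, values.get(i));
      }                          // fields the file lacks (here: counters) stay null
    }
    return rec;
  }

  public static void main(String[] args) {
    // A record written before counters existed.
    List<String> oldFields = Arrays.asList("taskid", "splits");
    List<Object> oldValues = Arrays.<Object>asList("attempt_0", new int[] {1, 2});

    try {
      readByIndex(oldFields, oldValues);
      System.out.println("by index: ok");
    } catch (ClassCastException e) {
      System.out.println("by index: ClassCastException");
    }

    FailedAttempt rec = readByName(oldFields, oldValues);
    System.out.println("by name: splits=" + rec.splits.length + ", counters=" + rec.counters);
  }
}
```

Running main prints `by index: ClassCastException` and then `by name: splits=2, counters=null`. In this sketch a field missing from the file is simply left null; Avro's own schema resolution (for example, a SpecificDatumReader constructed with both the writer and the reader schema) also matches fields by name, but requires the reader schema to supply a default for any field the writer lacks.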
[jira] [Assigned] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah reassigned MAPREDUCE-5309:
Assignee: Rushabh S Shah

Attachments: Test20JobHistoryParsing.java, job_2_0_3-KILLED.jhist
[jira] [Updated] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5309: -- Attachment: MAPREDUCE-5309.patch 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server - Key: MAPREDUCE-5309 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5309 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, mrv2 Affects Versions: 2.0.4-alpha Reporter: Vrushali C Assignee: Rushabh S Shah Attachments: MAPREDUCE-5309.patch, Test20JobHistoryParsing.java, job_2_0_3-KILLED.jhist When the 2.0.4 JobHistoryParser tries to parse a job history file generated by Hadoop 2.0.3, the JobHistoryParser throws the following error: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Array cannot be cast to org.apache.hadoop.mapreduce.jobhistory.JhCounters at org.apache.hadoop.mapreduce.jobhistory.TaskAttemptUnsuccessfulCompletion.put(TaskAttemptUnsuccessfulCompletion.java:58) at org.apache.avro.generic.GenericData.setField(GenericData.java:463) at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142) at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129) at org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:93) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:111) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:156) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:142) at 
com.twitter.somepackage.Test20JobHistoryParsing.testFileAvro(Test20JobHistoryParsing.java:23) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) Test code and the job history file are attached. 
Test code: package com.twitter.somepackage; import java.io.IOException; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser; import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.JobInfo; import org.junit.Test; import org.apache.hadoop.yarn.YarnException; public class Test20JobHistoryParsing { @Test public void testFileAvro() throws IOException { Path local_path2 = new Path("/tmp/job_2_0_3-KILLED.jhist"); JobHistoryParser parser2 = new JobHistoryParser(FileSystem.getLocal(new
[jira] [Updated] (MAPREDUCE-4766) in diagnostics task ids and task attempt ids should become clickable links
[ https://issues.apache.org/jira/browse/MAPREDUCE-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-4766: -- Target Version/s: 3.0.0 (was: 3.0.0, 0.23.11) Talked with [~revans2] and removing the target version 0.23.11. in diagnostics task ids and task attempt ids should become clickable links -- Key: MAPREDUCE-4766 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4766 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.5 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans It would be great if, whenever a task ID or task attempt ID appears in the diagnostics, we rendered it as a clickable link. -- This message was sent by Atlassian JIRA (v6.2#6252)
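MAPREDUCE-4766 asks for task and task attempt IDs in diagnostics to become clickable links. The sketch below shows one way to do this with a regular expression. It assumes IDs of the form task_<cluster>_<job>_<m|r>_<task> and attempt_<cluster>_<job>_<m|r>_<task>_<n>. The class name DiagnosticsLinker and the URL prefixes are hypothetical placeholders, not the real web UI paths.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: wraps task and task attempt IDs found in a diagnostics
// string in HTML links. The URL prefixes passed in are placeholders.
class DiagnosticsLinker {
  // Attempt IDs carry a trailing attempt number; task IDs do not.
  private static final Pattern ID_PATTERN =
      Pattern.compile("\\b(attempt|task)_\\d+_\\d+_[mr]_\\d+(?:_\\d+)?\\b");

  static String linkify(String diagnostics, String taskUrlPrefix, String attemptUrlPrefix) {
    Matcher m = ID_PATTERN.matcher(diagnostics);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String id = m.group();
      String prefix = "attempt".equals(m.group(1)) ? attemptUrlPrefix : taskUrlPrefix;
      String link = "<a href=\"" + prefix + id + "\">" + id + "</a>";
      // quoteReplacement keeps any '$' or '\' in the link from being interpreted.
      m.appendReplacement(out, Matcher.quoteReplacement(link));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(linkify(
        "Task task_1352154010153_0001_m_000000 failed; last attempt attempt_1352154010153_0001_m_000000_3",
        "/task/", "/attempt/"));
  }
}
```

Text that contains no IDs passes through unchanged. In a real web UI the replaced text would also need HTML escaping before links are inserted.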
[jira] [Updated] (MAPREDUCE-4901) JobHistoryEventHandler errors should be fatal
[ https://issues.apache.org/jira/browse/MAPREDUCE-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-4901: -- Target Version/s: 3.0.0 (was: 3.0.0, 0.23.11) Talked to [~revans2] offline and removing 0.23.11 from target version. JobHistoryEventHandler errors should be fatal - Key: MAPREDUCE-4901 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4901 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0, 2.0.0-alpha Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: MR-4901-trunk.txt To be able to truly fix issues like MAPREDUCE-4819 and MAPREDUCE-4832, we need a two-phase commit in which a subsequent AM can know, as of a specific point in time, exactly whether any tasks/jobs are committing. The job history log is already used for similar functionality, so we would like to reuse it, but we need to make sure that errors while writing to the job history log become fatal. -- This message was sent by Atlassian JIRA (v6.2#6252)
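MAPREDUCE-4901 wants job history write errors to be fatal, so the history log can be trusted as a commit record. The minimal sketch below shows the difference between "log and continue" and "fail fast": every write error propagates to the caller. FatalHistoryWriter and EventSink are hypothetical names and are not the real JobHistoryEventHandler API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: a history event writer whose write failures are fatal
// instead of being logged and ignored.
class FatalHistoryWriter {
  // Stand-in for whatever actually persists history events.
  interface EventSink {
    void write(String event) throws IOException;
  }

  private final EventSink sink;

  FatalHistoryWriter(EventSink sink) {
    this.sink = sink;
  }

  // Propagate any write error so the caller (e.g. the AM) can abort, rather
  // than continue with a history log that no longer reflects what committed.
  void handle(String event) {
    try {
      sink.write(event);
    } catch (IOException e) {
      throw new UncheckedIOException("Failed to write job history event: " + event, e);
    }
  }
}
```

Throwing an unchecked exception is only one design choice. A real handler might instead signal the AM to shut down through its event dispatcher.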
[jira] [Updated] (MAPREDUCE-4775) Reducer will never commit suicide
[ https://issues.apache.org/jira/browse/MAPREDUCE-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-4775: -- Target Version/s: 3.0.0 (was: 3.0.0, 0.23.11) Talked to [~revans2] offline and removing 0.23.11 from target version. Reducer will never commit suicide --- Key: MAPREDUCE-4775 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4775 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans In 1.0 there are a number of conditions that will cause a reducer to commit suicide and exit. These include when it is stalled and when the error percentage of total fetches is too high. In the new code it will only commit suicide when the total number of failures for a single task attempt is >= max(30, totalMaps/10). In the best case, with the quadratic back-off, it would take 20.5 hours for a single map attempt to reach 30 failures. And unless there is only one reducer running, the map task would have been restarted before then. We should go back to including the same reducer suicide checks that are in 1.0. -- This message was sent by Atlassian JIRA (v6.2#6252)
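The MAPREDUCE-4775 report contrasts two kinds of check. The new code uses only a per-attempt failure threshold of max(30, totalMaps/10). The 1.0 code also gave up based on the overall fetch error fraction. The sketch below expresses both as plain predicates. ReducerSuicideCheck is hypothetical, and the fraction limit is a caller-supplied assumption rather than the actual 1.0 constant.

```java
// Hypothetical helpers illustrating the two kinds of reducer "give up" checks.
class ReducerSuicideCheck {
  static final int MIN_FAILURES = 30;

  // Per-attempt threshold quoted in the report: max(30, totalMaps / 10).
  static int threshold(int totalMaps) {
    return Math.max(MIN_FAILURES, totalMaps / 10);
  }

  // New-code rule: give up only once one map attempt has failed this often.
  static boolean shouldGiveUp(int failuresForAttempt, int totalMaps) {
    return failuresForAttempt >= threshold(totalMaps);
  }

  // 1.0-style rule (illustrative): give up when too large a fraction of all
  // fetches have failed. maxFailedFraction is an assumed, caller-chosen limit.
  static boolean tooManyFailedFetches(int failedFetches, int totalFetches, double maxFailedFraction) {
    return totalFetches > 0 && (double) failedFetches / totalFetches > maxFailedFraction;
  }
}
```

For jobs with fewer than 300 maps, the per-attempt threshold stays at 30. That is why a single slow-to-fail attempt can keep a reducer alive for a very long time.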
[jira] [Updated] (MAPREDUCE-5797) Elapsed time for failed tasks that never started is wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5797: -- Status: Patch Available (was: Open) Thanks, Jonathan, for your comments. Changes incorporated in the new patch. Elapsed time for failed tasks that never started is wrong Key: MAPREDUCE-5797 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5797 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.9 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: patch-MapReduce-5797-v2.patch, patch-MapReduce-5797-v2.patch, patch-MapReduce-5797.patch The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e.: start time = -1) but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times. -- This message was sent by Atlassian JIRA (v6.2#6252)
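To see why the MAPREDUCE-5797 description talks about "a ridiculous number of hours", compute finish - start when the start time is the sentinel -1. The finish timestamp here is a hypothetical value (2014-03-14T00:00:00Z) chosen only for illustration. NaiveElapsed is not Hadoop code.

```java
import java.util.concurrent.TimeUnit;

// Illustration of the reported bug: a task that never started keeps the
// sentinel start time -1, so finish - start spans almost the whole epoch.
class NaiveElapsed {
  static long elapsedMs(long startTime, long finishTime) {
    return finishTime - startTime; // no check for the -1 sentinel
  }

  static long elapsedHours(long startTime, long finishTime) {
    return TimeUnit.MILLISECONDS.toHours(elapsedMs(startTime, finishTime));
  }

  public static void main(String[] args) {
    long finish = 1394755200000L; // hypothetical failure time: 2014-03-14T00:00:00Z
    System.out.println(elapsedHours(-1L, finish) + " hours");
  }
}
```

With these inputs the elapsed time comes out to 387432 hours, about 44 years.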
[jira] [Updated] (MAPREDUCE-5797) Elapsed time for failed tasks that never started is wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5797: -- Status: Open (was: Patch Available) Elapsed time for failed tasks that never started is wrong Key: MAPREDUCE-5797 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5797 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.9 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: patch-MapReduce-5797-v2.patch, patch-MapReduce-5797-v2.patch, patch-MapReduce-5797.patch The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e.: start time = -1) but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5797) Elapsed time for failed tasks that never started is wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5797: -- Attachment: patch-MapReduce-5797-v2.patch Elapsed time for failed tasks that never started is wrong Key: MAPREDUCE-5797 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5797 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.9 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: patch-MapReduce-5797-v2.patch, patch-MapReduce-5797-v2.patch, patch-MapReduce-5797.patch The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e.: start time = -1) but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5797) Elapsed time for failed tasks that never started is wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5797: -- Status: Open (was: Patch Available) Indentation errors; need to update. Elapsed time for failed tasks that never started is wrong Key: MAPREDUCE-5797 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5797 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.9 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: patch-MapReduce-5797-v2.patch, patch-MapReduce-5797-v2.patch, patch-MapReduce-5797.patch The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e.: start time = -1) but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5797) Elapsed time for failed tasks that never started is wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5797: -- Attachment: MAPREDUCE-5797-v3.patch Elapsed time for failed tasks that never started is wrong Key: MAPREDUCE-5797 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5797 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.9 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: MAPREDUCE-5797-v3.patch, patch-MapReduce-5797-v2.patch, patch-MapReduce-5797-v2.patch, patch-MapReduce-5797.patch The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e.: start time = -1) but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5797) Elapsed time for failed tasks that never started is wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5797: -- Status: Patch Available (was: Open) Elapsed time for failed tasks that never started is wrong Key: MAPREDUCE-5797 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5797 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.9 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: MAPREDUCE-5797-v3.patch, patch-MapReduce-5797-v2.patch, patch-MapReduce-5797-v2.patch, patch-MapReduce-5797.patch The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e.: start time = -1) but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5797) Elapsed time for failed tasks that never started is wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937967#comment-13937967 ] Rushabh S Shah commented on MAPREDUCE-5797: --- Corrected indentation errors. Elapsed time for failed tasks that never started is wrong Key: MAPREDUCE-5797 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5797 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.9 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: MAPREDUCE-5797-v3.patch, patch-MapReduce-5797-v2.patch, patch-MapReduce-5797-v2.patch, patch-MapReduce-5797.patch The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e.: start time = -1) but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5570) Map task attempt with fetch failure has incorrect attempt finish time
[ https://issues.apache.org/jira/browse/MAPREDUCE-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5570: -- Attachment: patch-MapReduce-5570-v2.patch Thanks Jason for the comments. Incorporated them in the new patch. Map task attempt with fetch failure has incorrect attempt finish time - Key: MAPREDUCE-5570 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5570 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, mrv2 Affects Versions: 0.23.9, 2.1.1-beta Reporter: Jason Lowe Assignee: Rushabh S Shah Attachments: patch-MapReduce-5570-v2.patch, patch-MapReduce-5570.patch If a map task attempt is retroactively failed due to excessive fetch failures reported by reducers then the attempt's finish time is set to the time the task was retroactively failed rather than when the task attempt completed. This causes the map task attempt to appear to have run for much longer than it actually did. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5570) Map task attempt with fetch failure has incorrect attempt finish time
[ https://issues.apache.org/jira/browse/MAPREDUCE-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5570: -- Status: Patch Available (was: Open) Map task attempt with fetch failure has incorrect attempt finish time - Key: MAPREDUCE-5570 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5570 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, mrv2 Affects Versions: 2.1.1-beta, 0.23.9 Reporter: Jason Lowe Assignee: Rushabh S Shah Attachments: patch-MapReduce-5570-v2.patch, patch-MapReduce-5570.patch If a map task attempt is retroactively failed due to excessive fetch failures reported by reducers then the attempt's finish time is set to the time the task was retroactively failed rather than when the task attempt completed. This causes the map task attempt to appear to have run for much longer than it actually did. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5570) Map task attempt with fetch failure has incorrect attempt finish time
[ https://issues.apache.org/jira/browse/MAPREDUCE-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5570: -- Status: Open (was: Patch Available) Map task attempt with fetch failure has incorrect attempt finish time - Key: MAPREDUCE-5570 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5570 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, mrv2 Affects Versions: 2.1.1-beta, 0.23.9 Reporter: Jason Lowe Assignee: Rushabh S Shah Attachments: patch-MapReduce-5570.patch If a map task attempt is retroactively failed due to excessive fetch failures reported by reducers then the attempt's finish time is set to the time the task was retroactively failed rather than when the task attempt completed. This causes the map task attempt to appear to have run for much longer than it actually did. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-5797) The elapsed time for tasks in a failed job that were never started can be way off.
Rushabh S Shah created MAPREDUCE-5797: - Summary: The elapsed time for tasks in a failed job that were never started can be way off. Key: MAPREDUCE-5797 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5797 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.9 Reporter: Rushabh S Shah Assignee: Rushabh S Shah The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e.: start time = -1) but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5797) The elapsed time for tasks in a failed job that were never started can be way off.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5797: -- Attachment: patch-MapReduce-5797.patch The elapsed time for tasks in a failed job that were never started can be way off. --- Key: MAPREDUCE-5797 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5797 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.9 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: patch-MapReduce-5797.patch The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e.: start time = -1) but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5797) The elapsed time for tasks in a failed job that were never started can be way off.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5797: -- Status: Patch Available (was: Open) Added a new check in JavaScript for whether the returned date is '-1'; if it is, return N/A. Also made minor changes to Times.java and added a test case to confirm the behavior. The elapsed time for tasks in a failed job that were never started can be way off. --- Key: MAPREDUCE-5797 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5797 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.9 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: patch-MapReduce-5797.patch The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e.: start time = -1) but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times. -- This message was sent by Atlassian JIRA (v6.2#6252)
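According to the MAPREDUCE-5797 update, the patch checks for '-1' in the web UI's JavaScript and adjusts Times.java. The Java sketch below illustrates only the general guard: treat a non-positive start or finish time as unknown and display "N/A" instead of a duration. SafeElapsed and its formatting are hypothetical and are not the contents of Times.java.

```java
// Hypothetical sketch of the guard: unknown times produce "N/A" rather than
// a bogus duration.
class SafeElapsed {
  // Returns -1 when either timestamp is unknown (<= 0) or the order is invalid.
  static long elapsedMs(long startTime, long finishTime) {
    if (startTime <= 0 || finishTime <= 0 || finishTime < startTime) {
      return -1L;
    }
    return finishTime - startTime;
  }

  // Renders an elapsed time, using "N/A" for the -1 sentinel.
  static String format(long elapsedMs) {
    if (elapsedMs < 0) {
      return "N/A";
    }
    long secs = elapsedMs / 1000;
    return String.format("%dh %dm %ds", secs / 3600, (secs % 3600) / 60, secs % 60);
  }
}
```

Returning a sentinel and formatting it at display time keeps the "unknown" decision in one place. That matches the report's point that tasks which never started should show no start, finish or elapsed time.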
[jira] [Updated] (MAPREDUCE-5797) The elapsed time for tasks in a failed job is wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5797: -- Summary: The elapsed time for tasks in a failed job is wrong (was: The elapsed time for tasks in a failed job that were never started can be way off. ) The elapsed time for tasks in a failed job is wrong - Key: MAPREDUCE-5797 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5797 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.9 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: patch-MapReduce-5797.patch The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e.: start time = -1) but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5797) Elapsed time for failed tasks that never started is wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5797: -- Summary: Elapsed time for failed tasks that never started is wrong (was: The elapsed time for tasks in a failed job is wrong ) Elapsed time for failed tasks that never started is wrong Key: MAPREDUCE-5797 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5797 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.9 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: patch-MapReduce-5797.patch The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e.: start time = -1) but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5797) Elapsed time for failed tasks that never started is wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5797: -- Status: Open (was: Patch Available) Elapsed time for failed tasks that never started is wrong Key: MAPREDUCE-5797 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5797 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.9 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: patch-MapReduce-5797-v2.patch, patch-MapReduce-5797.patch The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e.: start time = -1) but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5797) Elapsed time for failed tasks that never started is wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5797: -- Attachment: patch-MapReduce-5797-v2.patch Elapsed time for failed tasks that never started is wrong Key: MAPREDUCE-5797 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5797 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.9 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: patch-MapReduce-5797-v2.patch, patch-MapReduce-5797.patch The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e.: start time = -1) but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5797) Elapsed time for failed tasks that never started is wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5797: -- Status: Patch Available (was: Open) Added the Apache License header to TestTimes.java. Elapsed time for failed tasks that never started is wrong Key: MAPREDUCE-5797 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5797 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.9 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: patch-MapReduce-5797-v2.patch, patch-MapReduce-5797.patch The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e.: start time = -1) but the finish time is when the task was marked as failed when the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5789) Average Reduce time is incorrect on Job Overview page
[ https://issues.apache.org/jira/browse/MAPREDUCE-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5789: -- Status: Open (was: Patch Available) Average Reduce time is incorrect on Job Overview page - Key: MAPREDUCE-5789 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5789 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 2.3.0, 0.23.10 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: patch-MapReduce-5789-v2.patch, patch-MapReduce-5789.patch The Average Reduce time displayed on the job overview page is incorrect. Previously, the reduce time was calculated as the difference between finishTime and shuffleFinishTime. It should be the difference between finishTime and sortFinishTime. -- This message was sent by Atlassian JIRA (v6.2#6252)
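The MAPREDUCE-5789 correction changes which phase boundary the reduce time is measured from. The reduce phase starts after sort finishes, so the per-attempt reduce time is finishTime - sortFinishTime. Measuring from shuffleFinishTime wrongly includes the sort phase. In the sketch below, AverageReduceTime and its ReduceAttempt holder are hypothetical stand-ins for the job history data.

```java
import java.util.List;

// Hypothetical sketch of the corrected "Average Reduce time" metric.
class AverageReduceTime {
  static final class ReduceAttempt {
    final long shuffleFinishTime;
    final long sortFinishTime;
    final long finishTime;

    ReduceAttempt(long shuffleFinishTime, long sortFinishTime, long finishTime) {
      this.shuffleFinishTime = shuffleFinishTime;
      this.sortFinishTime = sortFinishTime;
      this.finishTime = finishTime;
    }
  }

  // Reduce phase only: from the end of sort to the end of the attempt.
  // (Using shuffleFinishTime here would also count the sort phase.)
  static long averageReduceTimeMs(List<ReduceAttempt> attempts) {
    if (attempts.isEmpty()) {
      return 0L;
    }
    long total = 0L;
    for (ReduceAttempt a : attempts) {
      total += a.finishTime - a.sortFinishTime;
    }
    return total / attempts.size();
  }
}
```

For two attempts with (shuffle, sort, finish) = (100, 150, 400) and (200, 260, 500), the reduce times are 250 and 240 ms, an average of 245 ms. The old shuffle-based formula would give 300 ms.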
[jira] [Commented] (MAPREDUCE-5789) Average Reduce time is incorrect on Job Overview page
[ https://issues.apache.org/jira/browse/MAPREDUCE-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932274#comment-13932274 ] Rushabh S Shah commented on MAPREDUCE-5789: --- Thanks Jason for the comments. Changes incorporated. Average Reduce time is incorrect on Job Overview page - Key: MAPREDUCE-5789 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5789 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.10, 2.3.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: patch-MapReduce-5789-v2.patch, patch-MapReduce-5789.patch The Average Reduce time displayed on the job overview page is incorrect. Previously, the reduce time was calculated as the difference between finishTime and shuffleFinishTime. It should be the difference between finishTime and sortFinishTime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5789) Average Reduce time is incorrect on Job Overview page
[ https://issues.apache.org/jira/browse/MAPREDUCE-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5789: -- Attachment: patch-MapReduce-5789-v2.patch Average Reduce time is incorrect on Job Overview page - Key: MAPREDUCE-5789 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5789 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.10, 2.3.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: patch-MapReduce-5789-v2.patch, patch-MapReduce-5789.patch The Average Reduce time displayed on the job overview page is incorrect. Previously, the reduce time was calculated as the difference between finishTime and shuffleFinishTime. It should be the difference between finishTime and sortFinishTime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5789) Average Reduce time is incorrect on Job Overview page
[ https://issues.apache.org/jira/browse/MAPREDUCE-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5789: -- Status: Patch Available (was: Open) Changes incorporated. Average Reduce time is incorrect on Job Overview page - Key: MAPREDUCE-5789 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5789 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 2.3.0, 0.23.10 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: patch-MapReduce-5789-v2.patch, patch-MapReduce-5789.patch The Average Reduce time displayed on the job overview page is incorrect. Previously, the reduce time was calculated as the difference between finishTime and shuffleFinishTime. It should be the difference between finishTime and sortFinishTime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5570) Map task attempt with fetch failure has incorrect attempt finish time
[ https://issues.apache.org/jira/browse/MAPREDUCE-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5570: -- Attachment: patch-MapReduce-5570.patch Map task attempt with fetch failure has incorrect attempt finish time - Key: MAPREDUCE-5570 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5570 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, mrv2 Affects Versions: 0.23.9, 2.1.1-beta Reporter: Jason Lowe Assignee: Rushabh S Shah Attachments: patch-MapReduce-5570.patch If a map task attempt is retroactively failed due to excessive fetch failures reported by reducers then the attempt's finish time is set to the time the task was retroactively failed rather than when the task attempt completed. This causes the map task attempt to appear to have run for much longer than it actually did.
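A minimal Java sketch of the behavior the issue asks for, assuming a hypothetical `MapAttempt` class with a simple state and timestamps; it does not reproduce the MR AM's real task-attempt state machine or the attached patch. The point is that a retroactive failure changes the attempt's state but leaves the finish time recorded when the attempt actually completed, so the reported runtime stays accurate.

```java
// Hypothetical sketch: MapAttempt and State are illustrative, not MR AM classes.
public class RetroactiveFailure {

    enum State { RUNNING, SUCCEEDED, FAILED }

    static final class MapAttempt {
        State state = State.RUNNING;
        final long launchTime;
        long finishTime; // set when the attempt completes

        MapAttempt(long launchTime) {
            this.launchTime = launchTime;
        }

        // The attempt finishes normally and its output becomes available.
        void succeed(long now) {
            state = State.SUCCEEDED;
            finishTime = now;
        }

        // Called later, when reducers report too many fetch failures.
        void failDueToFetchFailures(long now) {
            state = State.FAILED;
            // Buggy behavior described in the issue: finishTime = now;
            // which stretches the apparent runtime. Here finishTime is left unchanged.
        }

        long runtime() {
            return finishTime - launchTime;
        }
    }

    public static void main(String[] args) {
        MapAttempt a = new MapAttempt(1_000L);
        a.succeed(5_000L);                 // map output written at t=5000
        a.failDueToFetchFailures(60_000L); // reducers give up at t=60000
        System.out.println(a.state + " runtime=" + a.runtime()); // prints FAILED runtime=4000
    }
}
```

Had `failDueToFetchFailures` overwritten `finishTime`, the same attempt would report a runtime of 59000 ms instead of 4000 ms.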
[jira] [Updated] (MAPREDUCE-5570) Map task attempt with fetch failure has incorrect attempt finish time
[ https://issues.apache.org/jira/browse/MAPREDUCE-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5570: -- Status: Patch Available (was: Open) Removed the code that was updating the finishTime. Added a test case to verify. Map task attempt with fetch failure has incorrect attempt finish time - Key: MAPREDUCE-5570 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5570 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, mrv2 Affects Versions: 2.1.1-beta, 0.23.9 Reporter: Jason Lowe Assignee: Rushabh S Shah Attachments: patch-MapReduce-5570.patch If a map task attempt is retroactively failed due to excessive fetch failures reported by reducers then the attempt's finish time is set to the time the task was retroactively failed rather than when the task attempt completed. This causes the map task attempt to appear to have run for much longer than it actually did.
[jira] [Assigned] (MAPREDUCE-5570) Map task attempt with fetch failure has incorrect attempt finish time
[ https://issues.apache.org/jira/browse/MAPREDUCE-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah reassigned MAPREDUCE-5570: - Assignee: Rushabh S Shah Map task attempt with fetch failure has incorrect attempt finish time - Key: MAPREDUCE-5570 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5570 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, mrv2 Affects Versions: 0.23.9, 2.1.1-beta Reporter: Jason Lowe Assignee: Rushabh S Shah If a map task attempt is retroactively failed due to excessive fetch failures reported by reducers then the attempt's finish time is set to the time the task was retroactively failed rather than when the task attempt completed. This causes the map task attempt to appear to have run for much longer than it actually did.
[jira] [Created] (MAPREDUCE-5789) Average Reduce time is incorrect on Job Overview page
Rushabh S Shah created MAPREDUCE-5789: - Summary: Average Reduce time is incorrect on Job Overview page Key: MAPREDUCE-5789 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5789 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 2.3.0, 0.23.10 Reporter: Rushabh S Shah Assignee: Rushabh S Shah The Average Reduce time displayed on the job overview page is incorrect. Previously, Reduce time was calculated as the difference between finishTime and shuffleFinishTime. It should be the difference between finishTime and sortFinishTime.
[jira] [Updated] (MAPREDUCE-5789) Average Reduce time is incorrect on Job Overview page
[ https://issues.apache.org/jira/browse/MAPREDUCE-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5789: -- Status: Patch Available (was: Open) Fixed the issue and confirmed it with a test case. Average Reduce time is incorrect on Job Overview page - Key: MAPREDUCE-5789 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5789 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 2.3.0, 0.23.10 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: patch-MapReduce-5789.patch The Average Reduce time displayed on the job overview page is incorrect. Previously, Reduce time was calculated as the difference between finishTime and shuffleFinishTime. It should be the difference between finishTime and sortFinishTime.
[jira] [Updated] (MAPREDUCE-5789) Average Reduce time is incorrect on Job Overview page
[ https://issues.apache.org/jira/browse/MAPREDUCE-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated MAPREDUCE-5789: -- Attachment: patch-MapReduce-5789.patch Average Reduce time is incorrect on Job Overview page - Key: MAPREDUCE-5789 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5789 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, webapps Affects Versions: 0.23.10, 2.3.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: patch-MapReduce-5789.patch The Average Reduce time displayed on the job overview page is incorrect. Previously, Reduce time was calculated as the difference between finishTime and shuffleFinishTime. It should be the difference between finishTime and sortFinishTime.