[jira] [Updated] (MAPREDUCE-7023) TestHadoopArchiveLogs.testCheckFilesAndSeedApps fails on rerun

2018-04-05 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated MAPREDUCE-7023:
-
Fix Version/s: (was: 3.0.2)
   3.0.3

> TestHadoopArchiveLogs.testCheckFilesAndSeedApps fails on rerun
> --
>
> Key: MAPREDUCE-7023
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7023
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Gergely Novák
>Assignee: Gergely Novák
>Priority: Minor
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.3
>
> Attachments: MAPREDUCE-7023.001.patch
>
>
> Because the test doesn't clean up the "logs" dir it creates, a rerun fails 
> on this line:
> {code}
> Assert.assertEquals(0, hal.eligibleApplications.size());
> hal.checkFilesAndSeedApps(fs, rootLogDir, suffix);
>  >>> Assert.assertEquals(1, hal.eligibleApplications.size());
> java.lang.AssertionError: 
> Expected :1
> Actual   :2 (or more for consecutive reruns)
> {code}
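
A minimal sketch of the failure mode and one way to avoid it, assuming the leftover directory is the only state carried between runs. It uses plain java.nio instead of the Hadoop FileSystem API and JUnit that the real test relies on. The class and method names (LogDirCleanupSketch, deleteRecursively, seedOneAppAndCount) are hypothetical and are not taken from the attached patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LogDirCleanupSketch {

  /** Deletes dir and everything under it; does nothing if dir is absent. */
  static void deleteRecursively(Path dir) {
    if (!Files.exists(dir)) {
      return;
    }
    try {
      List<Path> paths;
      try (Stream<Path> walk = Files.walk(dir)) {
        // Reverse order lists children before their parent directories.
        paths = walk.sorted(Comparator.reverseOrder())
            .collect(Collectors.toList());
      }
      for (Path p : paths) {
        Files.delete(p);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Stand-in for the test body: adds one app dir, returns how many exist. */
  static long seedOneAppAndCount(Path rootLogDir) {
    try {
      Files.createDirectories(rootLogDir);
      Files.createTempDirectory(rootLogDir, "application_");
      try (Stream<Path> apps = Files.list(rootLogDir)) {
        return apps.count();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Calling seedOneAppAndCount twice on the same directory returns 1 and then 2, which mirrors the "Expected :1, Actual :2" failure above. Calling deleteRecursively before each run (for example from a setup or teardown method) brings the count back to 1.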



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7064) Flaky test TestTaskAttempt#testReducerCustomResourceTypes

2018-04-05 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated MAPREDUCE-7064:
-
Fix Version/s: (was: 3.0.2)
   3.0.3

> Flaky test TestTaskAttempt#testReducerCustomResourceTypes
> -
>
> Key: MAPREDUCE-7064
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7064
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, test
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.1.0, 3.0.3
>
> Attachments: MAPREDUCE-7064-001.patch, MAPREDUCE-7064-002.patch, 
> MAPREDUCE-7064-003.patch
>
>
> The test {{TestTaskAttempt#testReducerCustomResourceTypes}} can occasionally 
> fail with the following error:
> {noformat}
> org.apache.hadoop.yarn.exceptions.ResourceNotFoundException: Unknown resource 'a-custom-resource'. Known resources are [name: memory-mb, units: Mi, type: COUNTABLE, value: 0, minimum allocation: 0, maximum allocation: 9223372036854775807, name: vcores, units: , type: COUNTABLE, value: 0, minimum allocation: 0, maximum allocation: 9223372036854775807]
>   at org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt.createReduceTaskAttemptImplForTest(TestTaskAttempt.java:434)
>   at org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt.testReducerCustomResourceTypes(TestTaskAttempt.java:1535)
> {noformat}
> The root cause seems to be interference from earlier tests that start 
> instance(s) of {{FailingAttemptsMRApp}} or 
> {{FailingAttemptsDuringAssignedMRApp}}. When I disabled these tests, 
> {{testReducerCustomResourceTypes}} always passed.
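
A sketch of one way such interference between tests can arise, assuming the known resource types are cached in JVM-wide static state on first use. The issue text does not confirm this mechanism; it only reports that disabling the earlier tests makes the failure go away. CachedResourceTypesSketch and its methods are hypothetical names and not YARN API.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CachedResourceTypesSketch {
  // Shared by every test that runs in the same JVM (assumed mechanism).
  private static Set<String> cache;

  /** Builds the set of known resources once; later calls ignore their argument. */
  static Set<String> knownResources(List<String> configuredExtras) {
    if (cache == null) {
      Set<String> known =
          new LinkedHashSet<>(Arrays.asList("memory-mb", "vcores"));
      known.addAll(configuredExtras);
      cache = known;
    }
    return cache;
  }

  /** Clears the shared state so the next caller's configuration is honoured. */
  static void reset() {
    cache = null;
  }
}
```

If an earlier test populates the cache without the custom resource, a later test that configures 'a-custom-resource' still sees only memory-mb and vcores. Under this assumption, resetting the shared state in the test's setup method makes the outcome independent of test order.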






[jira] [Updated] (MAPREDUCE-7053) Timed out tasks can fail to produce thread dump

2018-04-05 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated MAPREDUCE-7053:
-
Fix Version/s: (was: 3.0.2)
   3.0.3

> Timed out tasks can fail to produce thread dump
> ---
>
> Key: MAPREDUCE-7053
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7053
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.3
>
> Attachments: MAPREDUCE-7053-branch-2.001.patch, 
> MAPREDUCE-7053.001.patch
>
>
> TestMRJobs#testThreadDumpOnTaskTimeout has been failing sporadically 
> recently.  When the AM times out a task it immediately removes it from the 
> list of known tasks and then connects to the NM to request a thread dump 
> followed by a kill.  If the task heartbeats in after the task has been 
> removed from the list of known tasks but before the thread dump signal 
> arrives, then the task can exit with an "org.apache.hadoop.mapred.Task: 
> Parent died." message and no thread dump.
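
The ordering problem described above can be replayed deterministically. The sketch below is a single-threaded simulation and not the AM code: TimeoutRaceSketch, its methods, and the Outcome enum are hypothetical names. The second ordering (unregistering only after the signal has reached the task) is shown purely for contrast and is not necessarily what the attached patches do.

```java
import java.util.HashSet;
import java.util.Set;

class TimeoutRaceSketch {
  enum Outcome { THREAD_DUMP, PARENT_DIED }

  private final Set<String> knownTasks = new HashSet<>();

  void register(String taskId) {
    knownTasks.add(taskId);
  }

  void unregister(String taskId) {
    knownTasks.remove(taskId);
  }

  /** A heartbeat from a task the AM no longer knows makes that task exit. */
  boolean heartbeatAccepted(String taskId) {
    return knownTasks.contains(taskId);
  }

  /** Replays a timeout in which one heartbeat lands before the dump signal. */
  static Outcome timeOutTask(boolean unregisterBeforeSignal) {
    TimeoutRaceSketch am = new TimeoutRaceSketch();
    String task = "attempt_0";
    am.register(task);

    if (unregisterBeforeSignal) {
      am.unregister(task); // the ordering described in the issue
    }
    // The heartbeat arrives in the window before the signal reaches the task.
    if (!am.heartbeatAccepted(task)) {
      return Outcome.PARENT_DIED; // task exits, so no thread dump is produced
    }
    // The signal arrives while the task is still alive.
    am.unregister(task);
    return Outcome.THREAD_DUMP;
  }
}
```

timeOutTask(true) yields PARENT_DIED and timeOutTask(false) yields THREAD_DUMP, which is why the real test only fails when a heartbeat happens to land inside that window.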






[jira] [Updated] (MAPREDUCE-6930) mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores are both present twice in mapred-default.xml

2018-04-05 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated MAPREDUCE-6930:
-
Fix Version/s: (was: 3.0.2)
   3.0.3

> mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores are both present 
> twice in mapred-default.xml
> -
>
> Key: MAPREDUCE-6930
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6930
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.7.4, 2.8.1, 3.0.0-alpha4
>Reporter: Daniel Templeton
>Assignee: Sen Zhao
>Priority: Major
>  Labels: newbie
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.3
>
> Attachments: MAPREDUCE-6930.001.patch
>
>
> The second set should be deleted.
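
Duplicates like these can be found mechanically. The sketch below assumes the usual Hadoop configuration layout (property elements, each with a name child) and uses only the JDK's DOM parser. DuplicatePropertyFinder and findDuplicateNames are hypothetical names, not part of Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DuplicatePropertyFinder {

  /** Returns every property name that occurs more than once, in file order. */
  static Set<String> findDuplicateNames(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      NodeList props = doc.getElementsByTagName("property");
      Set<String> seen = new HashSet<>();
      Set<String> duplicates = new LinkedHashSet<>();
      for (int i = 0; i < props.getLength(); i++) {
        Element prop = (Element) props.item(i);
        NodeList names = prop.getElementsByTagName("name");
        if (names.getLength() == 0) {
          continue; // skip malformed entries without a name
        }
        String name = names.item(0).getTextContent().trim();
        if (!seen.add(name)) {
          duplicates.add(name); // add() returned false: name was seen before
        }
      }
      return duplicates;
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse configuration XML", e);
    }
  }
}
```

Passing the contents of mapred-default.xml as a string would list each repeated key once. A check of this kind could also run as a unit test to keep duplicates from returning.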






[jira] [Updated] (MAPREDUCE-7059) Downward Compatibility issue: MR job fails because of unknown setErasureCodingPolicy method from 3.x client to HDFS 2.x cluster

2018-04-04 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated MAPREDUCE-7059:
-
Fix Version/s: (was: 3.0.2)
   3.0.3

> Downward Compatibility issue: MR job fails because of unknown 
> setErasureCodingPolicy method from 3.x client to HDFS 2.x cluster
> ---
>
> Key: MAPREDUCE-7059
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7059
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission
>Affects Versions: 3.0.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Critical
> Fix For: 3.1.0, 3.0.3
>
> Attachments: MAPREDUCE-7059.001.patch, MAPREDUCE-7059.002.patch, 
> MAPREDUCE-7059.003.patch, MAPREDUCE-7059.004.patch, MAPREDUCE-7059.005.patch, 
> MAPREDUCE-7059.006.patch
>
>
> Running teragen with a hadoop-3.1 client fails when the HDFS server is 2.8.
> {code:java}
> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0-SNAPSHOT.jar teragen 10 /teragen
> {code}
> The failure occurs because HDFS 2.8 does not have setErasureCodingPolicy.
> One solution is to parse the RemoteException in 
> JobResourceUploader#disableErasure like this:
> {code:java}
> private void disableErasureCodingForPath(FileSystem fs, Path path)
>     throws IOException {
>   try {
>     if (jtFs instanceof DistributedFileSystem) {
>       LOG.info("Disabling Erasure Coding for path: " + path);
>       DistributedFileSystem dfs = (DistributedFileSystem) jtFs;
>       dfs.setErasureCodingPolicy(path,
>           SystemErasureCodingPolicies.getReplicationPolicy().getName());
>     }
>   } catch (RemoteException e) {
>     // Rethrow anything other than "method unknown to the server".
>     if (!e.getClassName().equals(RpcNoSuchMethodException.class.getName())) {
>       throw e;
>     }
>     // The server has no setErasureCodingPolicy RPC, so skip this step.
>     LOG.warn("hdfs server does not have method setErasureCodingPolicy,"
>         + " skipping disableErasureCodingForPath", e);
>   }
> }
> {code}
> Does anyone have a better solution?
> The detailed exception trace is:
> {code:java}
> 2018-02-26 11:22:53,178 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1518615699369_0006
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException): Unknown method setErasureCodingPolicy called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:436)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:846)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:789)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1804)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2457)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.setErasureCodingPolicy(Unknown Source)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setErasureCodingPolicy(ClientNamenodeProtocolTranslatorPB.java:1583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.setErasureCodingPolicy(Unknown Source)
>   at 
>