[jira] [Commented] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI
[ https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136323#comment-16136323 ]

Jian He commented on MAPREDUCE-6838:

Yep, comment race - I just resolved this jira too.

> [ATSv2 Security] Add timeline delegation token received in allocate response to UGI
> ---
>
> Key: MAPREDUCE-6838
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6838
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Reporter: Varun Saxena
> Assignee: Varun Saxena
> Labels: yarn-5355-merge-blocker
> Fix For: YARN-5355
>
> Attachments: MAPREDUCE-6838-YARN-5355.01.patch, MAPREDUCE-6838-YARN-5355.02.patch, MAPREDUCE-6838-YARN-5355.03.patch, MAPREDUCE-6838-YARN-5355.04.patch, MAPREDUCE-6838-YARN-5355.05.patch, MAPREDUCE-6838-YARN-5355.06.patch

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI
[ https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated MAPREDUCE-6838:
---
Resolution: Fixed
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Thanks [~varun_saxena] and [~rohithsharma]!
[jira] [Commented] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI
[ https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136305#comment-16136305 ]

Jian He commented on MAPREDUCE-6838:

I tried to commit to YARN-5355_branch2, but it looks like YARN-5355_branch2 has a compilation error even without this patch. [~rohithsharma], [~varun_saxena], can you check?
I've committed the patch to the YARN-5355 branch, but I forgot to update the aforementioned code comment. [~rohithsharma], [~varun_saxena], maybe you can just update it in whatever patch you upload next.
[jira] [Commented] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI
[ https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136288#comment-16136288 ]

Jian He commented on MAPREDUCE-6838:

bq. The code condition is correct. Will change the comment.
No worries, I can fix this at commit time; there is no need to upload a new patch just for this.

bq. Could not find any API to remove the token from UGI. Not sure why. Should we add one?
Yeah, I think we can open a JIRA in hadoop-common for this request and fix the issue later.

I'm committing the patch, thanks.
[jira] [Commented] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI
[ https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135858#comment-16135858 ]

Jian He commented on MAPREDUCE-6838:

- The comment describes an OR condition, whereas the code uses AND. Which one is correct? Also, when will "delegationToken.getService()" be empty? It looks like NodeTimelineCollectorManager#generateTokenAndSetTimer always sets the service field.
{code}
    // Token need not be updated if either address or token service does not
    // exist.
    String service = delegationToken.getService();
    if ((service == null || service.isEmpty()) &&
        (collectorAddr == null || collectorAddr.isEmpty())) {
      LOG.warn("Timeline token does not have service and timeline service " +
          "address is not yet set. Not updating the token");
      return;
    }
{code}
- Here, if this method is called for the first time, timelineServiceAddress is null, and so is collectorAddr:
{code}
    if (collectorAddr == null || collectorAddr.isEmpty()) {
      collectorAddr = timelineServiceAddress;
    }
{code}
Later, it uses "SecurityUtil.getTokenServiceAddr(timelineToken)" to set the token service. The next time, collectorAddr is not null because timelineServiceAddress is not null, so it always calls "NetUtils.createSocketAddr(collectorAddr)" to set the token service. Is my understanding correct? Why not consistently use one of them to keep it simpler?
{code}
    // Prefer timeline service address over service coming in the token for
    // updating the token service.
    InetSocketAddress serviceAddr =
        (collectorAddr != null && !collectorAddr.isEmpty()) ?
        NetUtils.createSocketAddr(collectorAddr) :
        SecurityUtil.getTokenServiceAddr(timelineToken);
    SecurityUtil.setTokenService(timelineToken, serviceAddr);
    authUgi.addToken(timelineToken);
{code}
- Does the collector address change if the NM restarts? If so, we may end up with two tokens stored under two keys (different addresses) in the UGI.
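To make the AND-versus-OR question and the address preference above concrete, here is a minimal Java sketch that models the quoted logic with plain strings. It is a hypothetical model, not the patch: the class and method names are made up, strings stand in for the token service and InetSocketAddress, and the NetUtils/SecurityUtil normalization is left out.

```java
/**
 * Hypothetical model of the token-update decision quoted above. Plain strings
 * stand in for the token service and InetSocketAddress; no NetUtils or
 * SecurityUtil normalization is performed.
 */
public class TimelineTokenServiceSketch {

  static boolean isBlank(String s) {
    return s == null || s.isEmpty();
  }

  // What the code does: skip only when BOTH the token service and the
  // collector address are missing (AND).
  static boolean skipUpdate(String tokenService, String collectorAddr) {
    return isBlank(tokenService) && isBlank(collectorAddr);
  }

  // What the code comment literally says: skip when EITHER is missing (OR).
  static boolean skipUpdateAsCommented(String tokenService, String collectorAddr) {
    return isBlank(tokenService) || isBlank(collectorAddr);
  }

  // Preference order from the quoted code: collector address first,
  // otherwise fall back to the service already carried by the token.
  static String chooseService(String tokenService, String collectorAddr) {
    return isBlank(collectorAddr) ? tokenService : collectorAddr;
  }

  public static void main(String[] args) {
    // First allocate response: the token has a service, the address is unknown.
    System.out.println(skipUpdate("10.0.0.5:8188", null));            // false
    System.out.println(skipUpdateAsCommented("10.0.0.5:8188", null)); // true
    System.out.println(chooseService("10.0.0.5:8188", null));         // 10.0.0.5:8188
    // Later responses: the collector address is known and takes precedence.
    System.out.println(chooseService("10.0.0.5:8188", "nm1.example.com:8188")); // nm1.example.com:8188
  }
}
```

With AND, a token that already carries a service is still applied on the first response; the OR wording in the comment would skip it, which matches the later reply that the code condition is the correct one.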
[jira] [Comment Edited] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI
[ https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134009#comment-16134009 ]

Jian He edited comment on MAPREDUCE-6838 at 8/19/17 8:08 AM:

Today, for other delegation tokens (RMDelegationToken and the old ATSv1 DelegationToken), the token service is not set on the server side; it is set on the client side. The client calls SecurityUtil#buildTokenService and then sets the token service. I don't know why it was done like that; maybe it avoids use_ip config inconsistency between client and server? Should we follow the same approach? Could the client construct the tokenService based on the collector address info? (One caveat is to make sure the old token gets replaced properly in case the IP changes on restart.)
CollectorInfo#getCollectorAddr is currently a string; should it be an address type?

was (Author: jianhe):
today, for other delegation tokens RMDelegationToken, the old ATSv1 DelegationToken, the token service is not set at server side, it is set at client side - the client call the SecurityUtils#buildTokenService and then set the token service. I don't know why it was done like that - maybe because it avoids the use_ip config inconsistency between client and serve ? Should we follow the same ? The client can construct the tokenService based on the collector address info ? (One caveat is to make sure the old token gets replaced properly - in case ip changes ?) The CollectorInfo#getCollectorAddr right now is a string, should it be an address type ?
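The caveat about replacing the old token can be illustrated with a small model. As far as we understand, UGI credentials keep tokens in a map keyed by alias, and adding a token uses its service as the alias, so a token for the same service overwrites the old one while a token for a new address is stored alongside it. The sketch below is a hypothetical model of that behavior using a plain LinkedHashMap (not Hadoop's Credentials class); replaceToken is an assumed helper, since this thread notes that UGI has no API to remove a token.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TokenKeyingSketch {
  // Stand-in for a UGI's credentials: service alias -> token identifier.
  private final Map<String, String> tokens = new LinkedHashMap<>();

  // Like adding a token keyed by its service: same key overwrites, new key adds.
  void addToken(String service, String tokenId) {
    tokens.put(service, tokenId);
  }

  // Hypothetical helper (no such UGI API per this thread): drop the entry
  // stored under the previous service before adding the refreshed token.
  void replaceToken(String oldService, String newService, String tokenId) {
    if (oldService != null) {
      tokens.remove(oldService);
    }
    tokens.put(newService, tokenId);
  }

  int size() {
    return tokens.size();
  }

  // Models a collector that comes back on a different address after a restart.
  static int tokensAfterRestart(boolean replace) {
    TokenKeyingSketch ugi = new TokenKeyingSketch();
    ugi.addToken("10.0.0.5:8188", "token-v1");
    if (replace) {
      ugi.replaceToken("10.0.0.5:8188", "10.0.0.6:8188", "token-v2");
    } else {
      ugi.addToken("10.0.0.6:8188", "token-v2");
    }
    return ugi.size();
  }

  public static void main(String[] args) {
    System.out.println(tokensAfterRestart(false)); // 2: stale token left behind
    System.out.println(tokensAfterRestart(true));  // 1
  }
}
```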
[jira] [Commented] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI
[ https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134014#comment-16134014 ]

Jian He commented on MAPREDUCE-6838:

I think one other way would be to create the token service in generateTokenForAppCollector using the same SecurityUtil#buildTokenService API. This approach requires the AM and NM to be consistent on the use_ip config.
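A rough illustration of why this approach needs the AM and NM to agree on use_ip: the sketch models how SecurityUtil#buildTokenService is understood to form the service string (IP:port when use_ip is enabled, lower-cased host:port otherwise). buildService is a hypothetical stand-in, not the Hadoop method, and the host name, IP, and port are made-up values; InetAddress.getByAddress is used so that no DNS lookup happens.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.Locale;

public class TokenServiceUseIpSketch {

  // Simplified model of the service string: "ip:port" when use_ip is on,
  // lower-cased "host:port" otherwise.
  static String buildService(InetSocketAddress addr, boolean useIp) {
    String host = useIp
        ? addr.getAddress().getHostAddress()
        : addr.getHostName().toLowerCase(Locale.ROOT);
    return host + ":" + addr.getPort();
  }

  // Hypothetical collector address, built without any DNS lookup.
  static InetSocketAddress collectorAddr() {
    try {
      InetAddress ip = InetAddress.getByAddress(
          "NM-Host1.example.com", new byte[] {10, 0, 0, 5});
      return new InetSocketAddress(ip, 8188);
    } catch (UnknownHostException e) {
      // Only thrown for an IP array of illegal length, which cannot happen here.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    InetSocketAddress addr = collectorAddr();
    String nmSide = buildService(addr, true);   // NM with use_ip enabled
    String amSide = buildService(addr, false);  // AM with use_ip disabled
    System.out.println(nmSide);                 // 10.0.0.5:8188
    System.out.println(amSide);                 // nm-host1.example.com:8188
    System.out.println(nmSide.equals(amSide));  // false: lookup by service misses
  }
}
```

If the NM stamps the service and the AM later selects the token by the service it computes itself, a mismatch like this would leave the token unused.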
[jira] [Comment Edited] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI
[ https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134009#comment-16134009 ] Jian He edited comment on MAPREDUCE-6838 at 8/19/17 7:57 AM: - today, for other delegation tokens RMDelegationToken, the old ATSv1 DelegationToken, the token service is not set at server side, it is set at client side - the client call the SecurityUtils#buildTokenService and then set the token service. I don't know why it was done like that - maybe because it avoids the use_ip config inconsistency between client and serve ? Should we follow the same ? The client can construct the tokenService based on the collector address info ? (One caveat is to make sure the old token gets replaced properly - in case ip changes ?) The CollectorInfo#getCollectorAddr right now is a string, should it be an address type ? was (Author: jianhe): today, for other delegation tokens RMDelegationToken, the old ATSv1 DelegationToken, the token service is not set at server side, it is set at client side - the client call the SecurityUtils#buildTokenService and then set the token service. I don't know what it was done like that - maybe because it avoids the use_ip config inconsistency between client and serve ? Should we follow the same ? The client can construct the tokenService based on the collector address info ? (One caveat is to make sure the old token gets replaced properly - in case ip changes ?) The CollectorInfo#getCollectorAddr right now is a string, should it be an address type ? 
[jira] [Comment Edited] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI
[ https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134009#comment-16134009 ] Jian He edited comment on MAPREDUCE-6838 at 8/19/17 7:56 AM: - today, for other delegation tokens RMDelegationToken, the old ATSv1 DelegationToken, the token service is not set at server side, it is set at client side - the client call the SecurityUtils#buildTokenService and then set the token service. I don't know what it was done like that - maybe because it avoids the use_ip config inconsistency between client and serve ? Should we follow the same ? The client can construct the tokenService based on the collector address info ? (One caveat is to make sure the old token gets replaced properly - in case ip changes ?) The CollectorInfo#getCollectorAddr right now is a string, should it be an address type ? was (Author: jianhe): today, for other delegation tokens RMDelegationToken, the old ATSv1 DelegationToken, the token service is not set at server side, it is set at client side - the client call the SecurityUtils#buildTokenService and then set the token service. I don't know what it was done like that - maybe because it avoids the use_ip config inconsistency between client and serve ? Should we follow the same ? The client can construct the tokenService based on the collector address info ? (One caveat is to make sure the old token gets probably replaced properly - in case ip changes ?) The CollectorInfo#getCollectorAddr right now is a string, should it be an address type ? 
[jira] [Commented] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI
[ https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134009#comment-16134009 ] Jian He commented on MAPREDUCE-6838: today, for other delegation tokens RMDelegationToken, the old ATSv1 DelegationToken, the token service is not set at server side, it is set at client side - the client call the SecurityUtils#buildTokenService and then set the token service. I don't know what it was done like that - maybe because it avoids the use_ip config inconsistency between client and serve ? Should we follow the same ? The client can construct the tokenService based on the collector address info ? (One caveat is to make sure the old token gets probably replaced properly - in case ip changes ?) The CollectorInfo#getCollectorAddr right now is a string, should it be an address type ?
[jira] [Commented] (MAPREDUCE-5621) mr-jobhistory-daemon.sh doesn't have to execute mkdir and chown all the time
[ https://issues.apache.org/jira/browse/MAPREDUCE-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084298#comment-16084298 ]

Jian He commented on MAPREDUCE-5621:

lgtm

> mr-jobhistory-daemon.sh doesn't have to execute mkdir and chown all the time
> ---
>
> Key: MAPREDUCE-5621
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5621
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver
> Affects Versions: 2.8.0
> Reporter: Shinichi Yamashita
> Assignee: Shinichi Yamashita
> Priority: Minor
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5621-branch-2.02.patch, MAPREDUCE-5621-branch-2.patch, MAPREDUCE-5621.patch
>
>
> mr-jobhistory-daemon.sh executes the mkdir and chown commands to set up the directory for its log files.
> These commands run whether or not the directory already exists, and they run not only when starting the daemon but also when stopping it.
> We should add an "if" check, as hadoop-daemon.sh and yarn-daemon.sh do, to control this.
[jira] [Commented] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037665#comment-16037665 ]

Jian He commented on MAPREDUCE-6288:

[~rkanter], your last comment mentioned that it is already reverted, but I still see it in branch-2 and trunk. There is one revert commit in branch-2 and trunk, but it only changed CHANGES.txt. Should we go ahead and revert the patch from trunk and branch-2?
{code}
commit 4cf44bef5ca5fee69f712c448f6969e2e046d495
Author: Vinod Kumar Vavilapalli
Date:   Tue Mar 31 13:29:20 2015 -0700

    Reverted MAPREDUCE-6286, MAPREDUCE-6199, and MAPREDUCE-5875 from branch-2.7. Editing CHANGES.txt to reflect this.

    (cherry picked from commit e428fea73029ea0c3494c71a50c5f6c994888fd2)
{code}

> mapred job -status fails with AccessControlException
> ---
>
> Key: MAPREDUCE-6288
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6288
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.8.0
> Reporter: Robert Kanter
> Priority: Blocker
> Attachments: MAPREDUCE-6288.002.patch, MAPREDUCE-6288-gera-001.patch, MAPREDUCE-6288.patch
>
>
> After MAPREDUCE-5875, we're seeing this Exception when trying to do {{mapred job -status job_1427080398288_0001}}
> {noformat}
> Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=jenkins, access=EXECUTE, inode="/user/history/done":mapred:hadoop:drwxrwx---
>     at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
>     at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
>     at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180)
>     at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137)
>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6553)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6535)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6460)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1919)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:545)
>     at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>     at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>     at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>     at
[jira] [Updated] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition
[ https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated MAPREDUCE-6852:
---
Resolution: Fixed
Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
Status: Resolved (was: Patch Available)

Committed to trunk and branch-2, thanks Junping!

> Job#updateStatus() failed with NPE due to race condition
> ---
>
> Key: MAPREDUCE-6852
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6852
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Junping Du
> Assignee: Junping Du
> Fix For: 2.9.0
>
> Attachments: MAPREDUCE-6852.patch, MAPREDUCE-6852-v2.patch
>
>
> Like MAPREDUCE-6762, we found this issue in a cluster where a Pig query occasionally failed with an NPE: "Pig uses JobControl API to track MR job status, but sometimes Job History Server failed to flush job meta files to HDFS which caused the status update failed." Besides the NPE in o.a.h.mapreduce.Job.getJobName, we also get an NPE in Job.updateStatus(); the exception is as follows:
> {noformat}
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
>     at org.apache.hadoop.mapreduce.Job$1.run(Job.java:320)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
>     at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
>     at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:604)
> {noformat}
> We found that the state here is null. However, we already check that the job state is RUNNING, as in the code below:
> {noformat}
> public boolean isComplete() throws IOException {
>   ensureState(JobState.RUNNING);
>   updateStatus();
>   return status.isJobComplete();
> }
> {noformat}
> The only possible explanation is that two threads call this at the same time: both pass the state check first, then one thread updates the state to null while the other thread hits the NPE.
> We should fix this NPE.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
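The sequence described above can be reproduced in a small single-threaded Java model: once a failed status lookup stores null in the shared status field, the next update that derives the job ID from that field throws a NullPointerException. Looking the job up by an immutable ID instead, in the spirit of the getJobID suggestion made in this thread, avoids it. The class, FlakyClient, and both update methods are hypothetical; this is not the real org.apache.hadoop.mapreduce.Job code or the committed patch.

```java
import java.io.IOException;
import java.util.function.Function;

public class JobStatusRaceSketch {

  // Minimal stand-in for JobStatus: only carries the job ID.
  static final class Status {
    final String jobId;
    Status(String jobId) { this.jobId = jobId; }
  }

  // Stand-in status lookup that returns null on its first call, as when the
  // history server has not yet flushed the job files.
  static final class FlakyClient implements Function<String, Status> {
    private int calls = 0;
    @Override
    public Status apply(String id) {
      return (calls++ == 0) ? null : new Status(id);
    }
  }

  private final String jobId;                    // immutable, always set
  private final Function<String, Status> client;
  private Status status;                         // overwritten on each update

  JobStatusRaceSketch(String jobId, Function<String, Status> client) {
    this.jobId = jobId;
    this.client = client;
    this.status = new Status(jobId);
  }

  // Fragile: derives the ID from the mutable status field, so once a failed
  // lookup has stored null there, the next call throws NullPointerException.
  synchronized void updateStatusFragile() throws IOException {
    status = client.apply(status.jobId);
    if (status == null) {
      throw new IOException("Job status not available");
    }
  }

  // Safer: always looks the job up by the immutable ID.
  synchronized void updateStatusSafe() throws IOException {
    status = client.apply(jobId);
    if (status == null) {
      throw new IOException("Job status not available");
    }
  }

  // Runs two updates (the first lookup fails) and reports how the second ends.
  static String secondCallOutcome(boolean safe) {
    JobStatusRaceSketch job = new JobStatusRaceSketch("job_1", new FlakyClient());
    for (int i = 0; i < 2; i++) {
      try {
        if (safe) {
          job.updateStatusSafe();
        } else {
          job.updateStatusFragile();
        }
        if (i == 1) {
          return "ok";
        }
      } catch (IOException e) {
        if (i == 1) {
          return "IOException";
        }
      } catch (NullPointerException e) {
        return "NPE";
      }
    }
    return "unreachable";
  }

  public static void main(String[] args) {
    System.out.println(secondCallOutcome(false)); // NPE
    System.out.println(secondCallOutcome(true));  // ok
  }
}
```

Because each update here is synchronized, the model also suggests that the failure needs no true overlap: two callers taking turns are enough once one of them has stored null.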
[jira] [Commented] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition
[ https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889376#comment-15889376 ]

Jian He commented on MAPREDUCE-6852:

lgtm, committing tomorrow
[jira] [Commented] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition
[ https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889206#comment-15889206 ]

Jian He commented on MAPREDUCE-6852:

It looks like getJobID is used in several other places in the same class; we may just use this method.
[jira] [Comment Edited] (MAPREDUCE-6726) YARN Registry based AM discovery with retry and in-flight task persistent via JHS
[ https://issues.apache.org/jira/browse/MAPREDUCE-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522514#comment-15522514 ]

Jian He edited comment on MAPREDUCE-6726 at 9/26/16 9:10 AM:

[~srikanth.sampath], thanks for the patch; I looked at it. IIUC, we are also going to have a different mechanism to retrieve the AM address via YARN-4758. The patch is currently hard-coded to depend on the registry approach only; this part of the code needs to be made pluggable so that the approach described in YARN-4758 can be plugged in. We could implement different FailoverProviders, such as RegistryBasedFailoverProvider for this JIRA or RPCBasedFailoverProvider for YARN-4758.
Regarding the JVMId changes, could you separate them out and upload them to MAPREDUCE-6754? We can get that reviewed and committed first.

was (Author: jianhe):
[~srikanth.sampath], thanks for the patch , I looked at it. IIUC, we are also going to have a different mechanism to retrieve the AM address via YARN-4758. The patch right now is hardcoded to depend on registry approach only, this part of the code needs to be made pluggable so that the approach listed in YARN-4758 can be plugged in. We could implement different FailoverProvider like RegistryBasedFailoverProvider or RPCBasedFailoverProvider. Regarding the JVMId changes, could you separate that out and upload it on to MAPREDUCE-6754 ? we can get that reviewed and committed first.
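One possible shape for the pluggable approach is sketched below: AM discovery hidden behind a small interface, with one registry-backed and one RPC-backed implementation chosen by name. Everything here is hypothetical (the interface, both classes, the "registry"/"rpc" keys, the registry path, and the addresses). In Hadoop itself, client-side failover is usually expressed through org.apache.hadoop.io.retry.FailoverProxyProvider, which this sketch deliberately does not use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class AmDiscoverySketch {

  // Hypothetical pluggable source of the current AM address.
  interface AmAddressProvider {
    String getAmAddress();
  }

  // Hypothetical registry-backed provider; a Map stands in for the YARN registry.
  static final class RegistryBasedProvider implements AmAddressProvider {
    private final Map<String, String> registry;
    private final String appPath;

    RegistryBasedProvider(Map<String, String> registry, String appPath) {
      this.registry = registry;
      this.appPath = appPath;
    }

    @Override
    public String getAmAddress() {
      return registry.get(appPath);
    }
  }

  // Hypothetical RPC-backed provider; the Supplier stands in for a remote call.
  static final class RpcBasedProvider implements AmAddressProvider {
    private final Supplier<String> rpcLookup;

    RpcBasedProvider(Supplier<String> rpcLookup) {
      this.rpcLookup = rpcLookup;
    }

    @Override
    public String getAmAddress() {
      return rpcLookup.get();
    }
  }

  // Chooses an implementation by name, as a configuration key might.
  static AmAddressProvider create(String kind, Map<String, String> registry,
      String appPath, Supplier<String> rpcLookup) {
    switch (kind) {
      case "registry":
        return new RegistryBasedProvider(registry, appPath);
      case "rpc":
        return new RpcBasedProvider(rpcLookup);
      default:
        throw new IllegalArgumentException("Unknown provider: " + kind);
    }
  }

  public static void main(String[] args) {
    Map<String, String> registry = new HashMap<>();
    registry.put("/apps/app_1", "am-host1:41234");
    Supplier<String> rpc = () -> "am-host2:41234";
    System.out.println(create("registry", registry, "/apps/app_1", rpc).getAmAddress()); // am-host1:41234
    System.out.println(create("rpc", registry, "/apps/app_1", rpc).getAmAddress());      // am-host2:41234
  }
}
```

Keeping the task-side retry logic dependent only on the interface is what would let a YARN-4758 style lookup replace the registry later without touching callers.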
> YARN Registry based AM discovery with retry and in-flight task persistent via > JHS > - > > Key: MAPREDUCE-6726 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6726 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: applicationmaster >Reporter: Junping Du >Assignee: Srikanth Sampath > Attachments: MAPREDUCE-6726-MAPREDUCE-6608.001.patch, > MAPREDUCE-6726-MAPREDUCE-6608.001.patch, > MAPREDUCE-6726-MAPREDUCE-6608.002.patch, WorkPreservingMRAppMaster.pdf > > > Several tasks will be achieved in this JIRA based on the demo patch in > MAPREDUCE-6608: > 1. AM discovery base on YARN register service. Could be replaced by YARN-4758 > later due to scale up issue. > 2. Retry logic for TaskUmbilicalProtocol RPC connection > 3. In-flight task recover after AM restart via JHS > 4. Configuration to control the behavior compatible with previous when not > enable this feature (by default). > All security related issues and other concerns discussed in MAPREDUCE-6608 > will be addressed in follow up JIRAs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6726) YARN Registry based AM discovery with retry and in-flight task persistent via JHS
[ https://issues.apache.org/jira/browse/MAPREDUCE-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522514#comment-15522514 ] Jian He commented on MAPREDUCE-6726: [~srikanth.sampath], thanks for the patch, I looked at it. IIUC, we are also going to have a different mechanism to retrieve the AM address via YARN-4758. The patch right now is hardcoded to depend on the registry approach only; this part of the code needs to be made pluggable so that the approach listed in YARN-4758 can be plugged in. We could implement different FailoverProviders, like RegistryBasedFailoverProvider or RPCBasedFailoverProvider. Regarding the JVMId changes, could you separate them out and upload them to MAPREDUCE-6754? We can get that reviewed and committed first. > YARN Registry based AM discovery with retry and in-flight task persistent via > JHS > - > > Key: MAPREDUCE-6726 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6726 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: applicationmaster >Reporter: Junping Du >Assignee: Srikanth Sampath > Attachments: MAPREDUCE-6726-MAPREDUCE-6608.001.patch, > MAPREDUCE-6726-MAPREDUCE-6608.001.patch, > MAPREDUCE-6726-MAPREDUCE-6608.002.patch, WorkPreservingMRAppMaster.pdf > > > Several tasks will be achieved in this JIRA based on the demo patch in > MAPREDUCE-6608: > 1. AM discovery base on YARN register service. Could be replaced by YARN-4758 > later due to scale up issue. > 2. Retry logic for TaskUmbilicalProtocol RPC connection > 3. In-flight task recover after AM restart via JHS > 4. Configuration to control the behavior compatible with previous when not > enable this feature (by default). > All security related issues and other concerns discussed in MAPREDUCE-6608 > will be addressed in follow up JIRAs. 
[jira] [Commented] (MAPREDUCE-6754) Container Ids for an yarn application should be monotonically increasing in the scope of the application
[ https://issues.apache.org/jira/browse/MAPREDUCE-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436220#comment-15436220 ] Jian He commented on MAPREDUCE-6754: Thanks for the feedback, Jason, Vinod. I think we can add an attemptId to the JvmID, given that it's internal only. [~srikanth.sampath], what is your opinion? > Container Ids for an yarn application should be monotonically increasing in > the scope of the application > > > Key: MAPREDUCE-6754 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6754 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Reporter: Srikanth Sampath >Assignee: Srikanth Sampath > > Currently across application attempts, container Ids are reused. The > container id is stored in AppSchedulingInfo and it is reinitialized with > every application attempt. So the containerId scope is limited to the > application attempt. > In the MR Framework, it is important to note that the containerId is being > used as part of the JvmId. JvmId has 3 componentscontainerId>. The JvmId is used in datastructures in TaskAttemptListener and > is passed between the AppMaster and the individual tasks. For an application > attempt, no two tasks have the same JvmId. > This works well currently, since inflight tasks get killed if the AppMaster > goes down. However, if we want to enable WorkPreserving nature for the AM, > containers (and hence containerIds) live across application attempts. If we > recycle containerIds across attempts, then two independent tasks (one > inflight from a previous attempt and another new in a succeeding attempt) > can have the same JvmId and cause havoc. > This can be solved in two ways: > *Approach A*: Include attempt id as part of the JvmId. This is a viable > solution, however, there is a change in the format of the JVMid. Changing > something that has existed so long for an optional feature is not persuasive.
> *Approach B*: Keep the container id to be a monotonically increasing id for > the life of an application. So, container ids are not reused across > application attempts containers should be able to outlive an application > attempt. This is the preferred approach as it is clean, logical and is > backwards compatible. Nothing changes for existing applications or the > internal workings. > *How this is achieved:* > Currently, we maintain latest containerId only for application attempts and > reinitialize for new attempts. With this approach, we retrieve the latest > containerId from the just-failed attempt and initialize the new attempt with > the latest containerId (instead of 0). I can provide the patch if it helps. > It currently exists in MAPREDUCE-6726 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
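Approach A (adding the attempt id to the JVM identity) can be illustrated with a small value class. `AttemptAwareJvmId` is hypothetical and is not Hadoop's `JVMId`. Its string form is likewise only an illustration, and changing the string form is exactly the compatibility concern the description raises. The sketch shows that two containers with the same container id from different attempts no longer collide.

```java
import java.util.Objects;

/**
 * Hypothetical JVM identity that also carries the application attempt.
 * NOT Hadoop's JVMId; it only illustrates Approach A.
 */
final class AttemptAwareJvmId {
  private final String jobId;
  private final boolean isMap;
  private final int appAttemptId;
  private final long containerId;

  AttemptAwareJvmId(String jobId, boolean isMap, int appAttemptId, long containerId) {
    this.jobId = jobId;
    this.isMap = isMap;
    this.appAttemptId = appAttemptId;
    this.containerId = containerId;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof AttemptAwareJvmId)) return false;
    AttemptAwareJvmId other = (AttemptAwareJvmId) o;
    // The attempt id participates in equality, so reused container ids no longer clash.
    return isMap == other.isMap && appAttemptId == other.appAttemptId
        && containerId == other.containerId && jobId.equals(other.jobId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(jobId, isMap, appAttemptId, containerId);
  }

  @Override
  public String toString() {
    // Illustrative format only; any real change here affects everything that parses JVM ids.
    return String.format("jvm_%s_%s_%d_%06d", jobId, isMap ? "m" : "r", appAttemptId, containerId);
  }
}
```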
[jira] [Commented] (MAPREDUCE-6754) Container Ids for an yarn application should be monotonically increasing in the scope of the application
[ https://issues.apache.org/jira/browse/MAPREDUCE-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434421#comment-15434421 ] Jian He commented on MAPREDUCE-6754: Hi [~jlowe], would you mind helping shed some light on this? Is there any reason the JvmID did not include the attemptId, or any problem if we add it? If we cannot add the attempt Id to the JvmID, we'll go with approach B to make ContainerId#getContainerId unique across attempts. > Container Ids for an yarn application should be monotonically increasing in > the scope of the application > > > Key: MAPREDUCE-6754 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6754 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Reporter: Srikanth Sampath >Assignee: Srikanth Sampath > > Currently across application attempts, container Ids are reused. The > container id is stored in AppSchedulingInfo and it is reinitialized with > every application attempt. So the containerId scope is limited to the > application attempt. > In the MR Framework, it is important to note that the containerId is being > used as part of the JvmId. JvmId has 3 componentscontainerId>. The JvmId is used in datastructures in TaskAttemptListener and > is passed between the AppMaster and the individual tasks. For an application > attempt, no two tasks have the same JvmId. > This works well currently, since inflight tasks get killed if the AppMaster > goes down. However, if we want to enable WorkPreserving nature for the AM, > containers (and hence containerIds) live across application attempts. If we > recycle containerIds across attempts, then two independent tasks (one > inflight from a previous attempt and another new in a succeeding attempt) > can have the same JvmId and cause havoc. > This can be solved in two ways: > *Approach A*: Include attempt id as part of the JvmId. This is a viable > solution, however, there is a change in the format of the JVMid.
Changing > something that has existed so long for an optional feature is not persuasive. > *Approach B*: Keep the container id to be a monotonically increasing id for > the life of an application. So, container ids are not reused across > application attempts containers should be able to outlive an application > attempt. This is the preferred approach as it is clean, logical and is > backwards compatible. Nothing changes for existing applications or the > internal workings. > *How this is achieved:* > Currently, we maintain latest containerId only for application attempts and > reinitialize for new attempts. With this approach, we retrieve the latest > containerId from the just-failed attempt and initialize the new attempt with > the latest containerId (instead of 0). I can provide the patch if it helps. > It currently exists in MAPREDUCE-6726 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
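Approach B keeps container ids unique for the whole application. A new attempt resumes numbering from the latest id issued by the failed attempt, instead of restarting at 0. Below is a minimal sketch of that seeding rule. `ContainerIdCounter` is a hypothetical class, not YARN's AppSchedulingInfo.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-attempt container id counter; NOT YARN's AppSchedulingInfo. */
final class ContainerIdCounter {
  private final AtomicLong latest;

  private ContainerIdCounter(long start) {
    this.latest = new AtomicLong(start);
  }

  /** The first attempt starts counting from 0, as today. */
  static ContainerIdCounter forFirstAttempt() {
    return new ContainerIdCounter(0L);
  }

  /** A new attempt resumes from the latest id issued by the just-failed attempt, not from 0. */
  static ContainerIdCounter forNextAttempt(ContainerIdCounter failedAttempt) {
    return new ContainerIdCounter(failedAttempt.latest());
  }

  /** Hands out the next id; atomic so concurrent allocations never share an id. */
  long nextId() {
    return latest.incrementAndGet();
  }

  long latest() {
    return latest.get();
  }
}
```

Because ids never repeat within the application, an in-flight container from attempt 1 and a new container from attempt 2 can no longer produce the same JvmId.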
[jira] [Updated] (MAPREDUCE-6197) Cache MapOutputLocations in ShuffleHandler
[ https://issues.apache.org/jira/browse/MAPREDUCE-6197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-6197: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to trunk, branch-2, thanks Junping ! > Cache MapOutputLocations in ShuffleHandler > -- > > Key: MAPREDUCE-6197 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6197 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Siddharth Seth >Assignee: Junping Du > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6197.patch > > > ShuffleHandler currently seems to create a map of mapId - mapInfo (file.out / > index information) when it receives a message. > This should be caching map info across requests, so that the a scan of all > directories is not required for each reducer fetching from the same map. > Also, the scan for each map output / index file is performed twice per mapId > within a request. In populateHeaders - once in the call to getMapOutputInfo, > and then directly in the method. > For an invocation where we do end up with more than 1000 (default) mapIds in > a single call, and don't cache them in the map - the path constructed for > such entries will be invalid. This is highly unlikely to be the case though, > until there's proper caching. > {code} > MapOutputInfo info = mapOutputInfoMap.get(mapId); > if (info == null) { > info = getMapOutputInfo(outputBasePathStr, mapId, reduceId, user); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6197) Cache MapOutputLocations in ShuffleHandler
[ https://issues.apache.org/jira/browse/MAPREDUCE-6197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333020#comment-15333020 ] Jian He commented on MAPREDUCE-6197: lgtm, one question is how/why do you choose such policy for determining the weight ? {code} maximumWeight(MAX_WEIGHT).weigher( new Weigher() { @Override public int weigh(AttemptPathIdentifier key, AttemptPathInfo value) { return key.jobId.length() + key.user.length() + key.attemptId.length()+ value.indexPath.toString().length() + value.dataPath.toString().length(); } } ) {code} > Cache MapOutputLocations in ShuffleHandler > -- > > Key: MAPREDUCE-6197 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6197 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Siddharth Seth >Assignee: Junping Du > Attachments: MAPREDUCE-6197.patch > > > ShuffleHandler currently seems to create a map of mapId - mapInfo (file.out / > index information) when it receives a message. > This should be caching map info across requests, so that the a scan of all > directories is not required for each reducer fetching from the same map. > Also, the scan for each map output / index file is performed twice per mapId > within a request. In populateHeaders - once in the call to getMapOutputInfo, > and then directly in the method. > For an invocation where we do end up with more than 1000 (default) mapIds in > a single call, and don't cache them in the map - the path constructed for > such entries will be invalid. This is highly unlikely to be the case though, > until there's proper caching. > {code} > MapOutputInfo info = mapOutputInfoMap.get(mapId); > if (info == null) { > info = getMapOutputInfo(outputBasePathStr, mapId, reduceId, user); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
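The patch under review uses a weight-bounded cache, where each entry's weight is the total length of its identifying strings and paths. The sketch below shows the same idea with only the JDK: an access-ordered `LinkedHashMap` that evicts least-recently-used entries until the total weight fits. It also combines the get-or-compute lookup from the issue description. It is a hypothetical illustration. It mimics the intent of Guava's `maximumWeight`/`weigher`, not Guava's exact eviction behaviour.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToIntBiFunction;

/**
 * Hypothetical weight-bounded LRU cache on top of LinkedHashMap.
 * The loader is assumed never to return null.
 */
final class WeightedLruCache<K, V> {
  private final long maxWeight;
  private final ToIntBiFunction<K, V> weigher;
  // accessOrder=true: iteration starts at the least recently used entry.
  private final LinkedHashMap<K, V> entries = new LinkedHashMap<>(16, 0.75f, true);
  private long totalWeight;

  WeightedLruCache(long maxWeight, ToIntBiFunction<K, V> weigher) {
    this.maxWeight = maxWeight;
    this.weigher = weigher;
  }

  /** Returns the cached value, or loads, stores and returns it (get-or-compute). */
  synchronized V get(K key, Function<K, V> loader) {
    V value = entries.get(key); // a hit also refreshes the entry's recency
    if (value != null) {
      return value;
    }
    value = loader.apply(key);
    entries.put(key, value);
    totalWeight += weigher.applyAsInt(key, value);
    evictIfNeeded();
    return value;
  }

  /** Drops least recently used entries until the total weight is within the bound. */
  private void evictIfNeeded() {
    Iterator<Map.Entry<K, V>> it = entries.entrySet().iterator();
    while (totalWeight > maxWeight && it.hasNext()) {
      Map.Entry<K, V> eldest = it.next();
      totalWeight -= weigher.applyAsInt(eldest.getKey(), eldest.getValue());
      it.remove();
    }
  }

  synchronized int size() {
    return entries.size();
  }

  synchronized boolean contains(K key) {
    return entries.containsKey(key); // containsKey does not change access order
  }
}
```

Weighing by string length is a rough proxy for heap usage. That is one plausible answer to the "why this weight?" question: it bounds memory roughly, not the number of entries.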
[jira] [Updated] (MAPREDUCE-6703) Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-6703: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.9.0 Status: Resolved (was: Patch Available) Committed to trunk, branch-2, thanks [~asuresh] ! > Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers > -- > > Key: MAPREDUCE-6703 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6703 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Arun Suresh >Assignee: Arun Suresh > Fix For: 2.9.0 > > Attachments: MAPREDUCE-6703.001.patch, MAPREDUCE-6703.002.patch, > MAPREDUCE-6703.003.patch > > > YARN-2882 and YARN-4335 introduces the concept of container ExecutionTypes > and specifically OPPORTUNISTIC containers. > The default ExecutionType is GUARANTEED. This JIRA proposes to allow users to > provide hints via config to the MR framework as to the number of containers > it would like to schedule as OPPORTUNISTIC. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
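The description says users give the MR framework a config hint about how many map containers to request as OPPORTUNISTIC, with GUARANTEED as the default. This thread does not show the real config key or its exact semantics. The sketch below assumes a hypothetical percentage setting and tags that share of map requests as OPPORTUNISTIC. `ExecutionKind` is a local enum, not YARN's `ExecutionType`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; the percentage semantics are an assumption for illustration. */
final class OpportunisticSplit {
  enum ExecutionKind { GUARANTEED, OPPORTUNISTIC }

  /** Tags the first opportunisticPercent% of map requests (rounded down) as OPPORTUNISTIC. */
  static List<ExecutionKind> assign(int numMaps, int opportunisticPercent) {
    if (opportunisticPercent < 0 || opportunisticPercent > 100) {
      throw new IllegalArgumentException("percent must be in [0, 100]: " + opportunisticPercent);
    }
    // Widen to long so large map counts cannot overflow during the multiplication.
    int numOpportunistic = (int) ((long) numMaps * opportunisticPercent / 100);
    List<ExecutionKind> kinds = new ArrayList<>(numMaps);
    for (int i = 0; i < numMaps; i++) {
      kinds.add(i < numOpportunistic ? ExecutionKind.OPPORTUNISTIC : ExecutionKind.GUARANTEED);
    }
    return kinds;
  }
}
```

Reducers are left out on purpose. The proposal only talks about the number of containers, and the comments in this thread focus on map requests.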
[jira] [Commented] (MAPREDUCE-6703) Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298653#comment-15298653 ] Jian He commented on MAPREDUCE-6703: lgtm, I can commit later today if no comments from others. > Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers > -- > > Key: MAPREDUCE-6703 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6703 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: MAPREDUCE-6703.001.patch, MAPREDUCE-6703.002.patch, > MAPREDUCE-6703.003.patch > > > YARN-2882 and YARN-4335 introduces the concept of container ExecutionTypes > and specifically OPPORTUNISTIC containers. > The default ExecutionType is GUARANTEED. This JIRA proposes to allow users to > provide hints via config to the MR framework as to the number of containers > it would like to schedule as OPPORTUNISTIC. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-6696: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.9.0 Target Version/s: 2.9.0 Status: Resolved (was: Patch Available) Committed to trunk, branch-2, thanks Zhihai ! > Add a configuration to limit the number of map tasks allowed per job. > - > > Key: MAPREDUCE-6696 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 2.8.0 >Reporter: zhihai xu >Assignee: zhihai xu > Fix For: 2.9.0 > > Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, > MAPREDUCE-6696.002.patch, MAPREDUCE-6696.003.patch, MAPREDUCE-6696.004.patch > > > Add a configuration "mapreduce.job.max.map" to limit the number of map tasks > allowed per job. It will be useful for Hadoop admin to save Hadoop cluster > resource by preventing users from submitting big mapreduce jobs. A mapredeuce > job with too many mappers may fail with OOM after running for long time. It > will be a big waste. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6703) Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292491#comment-15292491 ] Jian He commented on MAPREDUCE-6703: I see, just a few minor comments: - Could you add comments to the newly added config explaining what it means? - Here, we can just call addOpportunisticResourceRequest, so the addContainerReq method does not need to be refactored. {code} maps.put(event.getAttemptID(), request); addContainerReq(request, ExecutionType.OPPORTUNISTIC); {code} > Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers > -- > > Key: MAPREDUCE-6703 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6703 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: MAPREDUCE-6703.001.patch > > > YARN-2882 and YARN-4335 introduces the concept of container ExecutionTypes > and specifically OPPORTUNISTIC containers. > The default ExecutionType is GUARANTEED. This JIRA proposes to allow users to > provide hints via config to the MR framework as to the number of containers > it would like to schedule as OPPORTUNISTIC. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6703) Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292343#comment-15292343 ] Jian He commented on MAPREDUCE-6703: I see. Depending on how locality-sensitive the MR job is, this may not bring as much benefit. Do you have statistics showing how much this improves things, or is this mainly for example purposes? > Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers > -- > > Key: MAPREDUCE-6703 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6703 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: MAPREDUCE-6703.001.patch > > > YARN-2882 and YARN-4335 introduces the concept of container ExecutionTypes > and specifically OPPORTUNISTIC containers. > The default ExecutionType is GUARANTEED. This JIRA proposes to allow users to > provide hints via config to the MR framework as to the number of containers > it would like to schedule as OPPORTUNISTIC. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6703) Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292313#comment-15292313 ] Jian He commented on MAPREDUCE-6703: looks like the locality is ignored for opportunistic containers, does YARN-2877 consider locality for opportunistic containers ? > Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers > -- > > Key: MAPREDUCE-6703 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6703 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: MAPREDUCE-6703.001.patch > > > YARN-2882 and YARN-4335 introduces the concept of container ExecutionTypes > and specifically OPPORTUNISTIC containers. > The default ExecutionType is GUARANTEED. This JIRA proposes to allow users to > provide hints via config to the MR framework as to the number of containers > it would like to schedule as OPPORTUNISTIC. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292106#comment-15292106 ] Jian He commented on MAPREDUCE-6696: lgtm, +1 > Add a configuration to limit the number of map tasks allowed per job. > - > > Key: MAPREDUCE-6696 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 2.8.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, > MAPREDUCE-6696.002.patch, MAPREDUCE-6696.003.patch, MAPREDUCE-6696.004.patch > > > Add a configuration "mapreduce.job.max.map" to limit the number of map tasks > allowed per job. It will be useful for Hadoop admin to save Hadoop cluster > resource by preventing users from submitting big mapreduce jobs. A mapredeuce > job with too many mappers may fail with OOM after running for long time. It > will be a big waste. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290171#comment-15290171 ] Jian He commented on MAPREDUCE-6696: Also, maybe throw IllegalArgumentException instead of RuntimeException? > Add a configuration to limit the number of map tasks allowed per job. > - > > Key: MAPREDUCE-6696 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 2.8.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, > MAPREDUCE-6696.002.patch > > > Add a configuration "mapreduce.job.max.map" to limit the number of map tasks > allowed per job. It will be useful for Hadoop admin to save Hadoop cluster > resource by preventing users from submitting big mapreduce jobs. A mapredeuce > job with too many mappers may fail with OOM after running for long time. It > will be a big waste. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290170#comment-15290170 ] Jian He commented on MAPREDUCE-6696: I see, thanks for your explanation. The patch looks good to me. Minor nit: it may be useful to also print the current number of map tasks in the exception message, just to be clearer. {code} new RuntimeException("The number of map tasks exceeded limit " + maxMaps); {code} > Add a configuration to limit the number of map tasks allowed per job. > - > > Key: MAPREDUCE-6696 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 2.8.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, > MAPREDUCE-6696.002.patch > > > Add a configuration "mapreduce.job.max.map" to limit the number of map tasks > allowed per job. It will be useful for Hadoop admin to save Hadoop cluster > resource by preventing users from submitting big mapreduce jobs. A mapredeuce > job with too many mappers may fail with OOM after running for long time. It > will be a big waste. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
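The review suggestions on this patch are to throw `IllegalArgumentException` rather than a bare `RuntimeException`, and to report the actual map count alongside the limit. Combined, they could look like the sketch below. `MapLimitCheck` is hypothetical. The limit stands for the value of "mapreduce.job.max.map". Treating a negative limit as "no limit" is an assumption of this sketch. The count passed in should be the real number of map tasks (input splits), since MRJobConfig.NUM_MAPS is only a hint.

```java
/** Hypothetical submission-time guard modelled on the "mapreduce.job.max.map" proposal. */
final class MapLimitCheck {
  private MapLimitCheck() {
  }

  /**
   * @param numMaps actual number of map tasks (one per input split), not the NUM_MAPS hint
   * @param maxMaps configured limit; a negative value is assumed to mean "no limit"
   * @throws IllegalArgumentException if the job has more map tasks than allowed
   */
  static void check(int numMaps, int maxMaps) {
    if (maxMaps >= 0 && numMaps > maxMaps) {
      throw new IllegalArgumentException("The number of map tasks " + numMaps
          + " exceeded limit " + maxMaps);
    }
  }
}
```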
[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287888#comment-15287888 ] Jian He commented on MAPREDUCE-6696: I think MRJobConfig.NUM_MAPS gives a hint about the number of maps, not the actual number. Btw, it seems JobImpl#checkTaskLimits was the original code for the task limit; based on the git history, I guess it was removed when YARN was created. > Add a configuration to limit the number of map tasks allowed per job. > - > > Key: MAPREDUCE-6696 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 2.8.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, > MAPREDUCE-6696.002.patch > > > Add a configuration "mapreduce.job.max.map" to limit the number of map tasks > allowed per job. It will be useful for Hadoop admin to save Hadoop cluster > resource by preventing users from submitting big mapreduce jobs. A mapredeuce > job with too many mappers may fail with OOM after running for long time. It > will be a big waste. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-6513: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.7.3 2.8.0 Status: Resolved (was: Patch Available) Committed to branch-2.7, thanks Wangda ! Thanks Varun for reviewing the patch ! > MR job got hanged forever when one NM unstable for some time > > > Key: MAPREDUCE-6513 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: applicationmaster, resourcemanager >Affects Versions: 2.7.0 >Reporter: Bob.zhao >Assignee: Varun Saxena >Priority: Critical > Fix For: 2.8.0, 2.7.3 > > Attachments: MAPREDUCE-6513.01.patch, MAPREDUCE-6513.02.patch, > MAPREDUCE-6513.03.patch, MAPREDUCE-6513.3.branch-2.8.patch, > MAPREDUCE-6513.3_1.branch-2.7.patch, MAPREDUCE-6513.3_1.branch-2.8.patch > > > when job is in-progress which is having more tasks,one node became unstable > due to some OS issue.After the node became unstable, the map on this node > status changed to KILLED state. > Currently maps which were running on unstable node are rescheduled, and all > are in scheduled state and wait for RM assign container.Seen ask requests for > map till Node is good (all those failed), there are no ask request after > this. But AM keeps on preempting the reducers (it's recycling). > Finally reducers are waiting for complete mappers and mappers did n't get > container.. > My Question Is: > > why map requests did not sent AM ,once after node recovery.? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Resolved] (MAPREDUCE-6099) Adding getSplits(JobContext job, List stats) to mapreduce CombineFileInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved MAPREDUCE-6099. Resolution: Won't Fix Closing, as Jason mentioned. > Adding getSplits(JobContext job, List stats) to mapreduce > CombineFileInputFormat > - > > Key: MAPREDUCE-6099 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6099 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Affects Versions: 2.4.1 >Reporter: Pankit Thapar >Priority: Critical > Attachments: MAPREDUCE-6099.patch > > > Currently we have getSplits(JobContext job) in CombineFileInputFormat. > This api does not give freedom to the client to create a list of file statuses > itself and then create splits on the resultant List stats. > The client might be able to perform some filtering on its end on the File > sets in the input paths. For the reasons above, it would be a good idea to > have getSplits(JobContext, List). > Please let me know what you think about this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-4758) jobhistory web ui not showing correct # failed reducers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-4758: --- Target Version/s: 2.9.0 (was: 2.8.0) Priority: Major (was: Critical) This is a UI improvement and is unlikely to get done in time; moving it out. > jobhistory web ui not showing correct # failed reducers > --- > > Key: MAPREDUCE-4758 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4758 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver, webapps >Affects Versions: 0.23.4 >Reporter: Thomas Graves > > we had a job fail due to a reducer failing 4 times. Unfortunately the job > history UI didn't show this particular failed reducer which led to > confusion as to why the job failed. > This reducer failed to launch all 4 task attempts with a Token Expiration > error and the jobhistory file only gets an event when the task attempt > transitions to launched. The webapp JobInfo object only counts the task > attempts in the jobhistory file to display under the "Attempt Type" table, so > since this task didn't have an attempt with it, it did not show it on the UI. > We need to reconcile the task list with the task attempts or also show more > stats for the tasks vs task attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
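The reported gap is that failed tasks are counted from launched attempts. A task whose attempts never launched therefore never shows up. The sketch below contrasts that attempt-driven count with one taken from the task list itself. `FailedTaskCounts` and its types are hypothetical and are not the JobHistory `JobInfo` classes.

```java
import java.util.List;

/** Hypothetical task-level view; NOT the JobHistory JobInfo classes. */
final class FailedTaskCounts {
  enum TaskState { SUCCEEDED, FAILED, KILLED, RUNNING }

  static final class TaskSummary {
    final String taskId;
    final boolean isReduce;
    final TaskState state;
    final int launchedAttempts; // only launched attempts produce a history event

    TaskSummary(String taskId, boolean isReduce, TaskState state, int launchedAttempts) {
      this.taskId = taskId;
      this.isReduce = isReduce;
      this.state = state;
      this.launchedAttempts = launchedAttempts;
    }
  }

  /** Attempt-driven view: tasks with no launched attempt are invisible, as reported. */
  static long failedReducesSeenViaAttempts(List<TaskSummary> tasks) {
    return tasks.stream()
        .filter(t -> t.isReduce && t.state == TaskState.FAILED && t.launchedAttempts > 0)
        .count();
  }

  /** Reconciled view: count from the task list itself, regardless of attempts. */
  static long failedReducesFromTaskList(List<TaskSummary> tasks) {
    return tasks.stream()
        .filter(t -> t.isReduce && t.state == TaskState.FAILED)
        .count();
  }
}
```

A reducer that failed all 4 attempts with a token error before launching has `launchedAttempts == 0`. Only the task-list count includes it.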
[jira] [Updated] (MAPREDUCE-4683) We need to fix our build to create/distribute hadoop-mapreduce-client-core-tests.jar
[ https://issues.apache.org/jira/browse/MAPREDUCE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-4683: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) I guess this could break existing scripts; closing. > We need to fix our build to create/distribute > hadoop-mapreduce-client-core-tests.jar > > > Key: MAPREDUCE-4683 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4683 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: build >Reporter: Arun C Murthy >Assignee: Akira AJISAKA >Priority: Critical > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-4683.patch > > > We need to fix our build to create/distribute > hadoop-mapreduce-client-core-tests.jar, need this before MAPREDUCE-4253 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280963#comment-15280963 ] Jian He commented on MAPREDUCE-6513: Looks like TaskAttemptKillEvent will be sent twice for each mapper. First, at the code below in RMContainerAllocator#handleUpdatedNodes, JobImpl will in turn send a TaskAttemptKillEvent for each mapper on the unusable node. {code} // send event to the job to act upon completed tasks eventHandler.handle(new JobUpdatedNodesEvent(getJob().getID(), updatedNodes)); {code} Second, at this code in the same method: {code} // If map, reschedule next task attempt. boolean rescheduleNextAttempt = (i == 0) ? true : false; eventHandler.handle(new TaskAttemptKillEvent(tid, "TaskAttempt killed because it ran on unusable node" + taskAttemptNodeId, rescheduleNextAttempt)); } {code} It has been like this for a long time; not sure why. With the new change, will this cause more container requests to get scheduled? > MR job got hanged forever when one NM unstable for some time > > > Key: MAPREDUCE-6513 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: applicationmaster, resourcemanager >Affects Versions: 2.7.0 >Reporter: Bob.zhao >Assignee: Varun Saxena >Priority: Critical > Attachments: MAPREDUCE-6513.01.patch, MAPREDUCE-6513.02.patch, > MAPREDUCE-6513.03.patch, MAPREDUCE-6513.3.branch-2.8.patch, > MAPREDUCE-6513.3_1.branch-2.7.patch, MAPREDUCE-6513.3_1.branch-2.8.patch > > > While a job with many tasks was in progress, one node became unstable > due to some OS issue. After the node became unstable, the status of the maps on this > node changed to KILLED. > The maps which were running on the unstable node are rescheduled, and all of them > are in SCHEDULED state, waiting for the RM to assign containers. We saw ask requests for > the maps until the node was good again (all of those failed), and there were no ask requests after > that. 
But the AM keeps on preempting the reducers (it's recycling). > Finally, the reducers are waiting for the mappers to complete, and the mappers never got containers. > My question is: > > why did the AM not send map requests once the node recovered? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6680) JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-6680: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.7.3 2.8.0 Status: Resolved (was: Patch Available) Committed to trunk, branch-2, branch-2.8, branch-2.7, thanks Junping! > JHS UserLogDir scan algorithm sometime could skip directory with update in > CloudFS (Azure FileSystem, S3, etc.) > --- > > Key: MAPREDUCE-6680 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6680 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Reporter: Junping Du >Assignee: Junping Du > Labels: Azure, S3 > Fix For: 2.8.0, 2.7.3 > > Attachments: MAPREDUCE-6680-v2.patch, MAPREDUCE-6680-v3.patch, > MAPREDUCE-6680.patch > > > In our cluster based on a cloud file system, we noticed that the JHS could sometimes > skip a directory containing a .jhist file when scanning. > The behavior is as follows. > In the first round of scanning, no .jhist file is found: > {noformat} > 16/04/13 11:14:34 DEBUG azure.NativeAzureFileSystem: Found path as a > directory with 6 files in it. > 16/04/13 11:14:34 DEBUG hs.HistoryFileManager: Found 0 files > ... > {noformat} > Then we see "Scan not needed of ..." for the same directory every 3 minutes > until the application fails with a timeout. > From our analysis, the root cause is that most cloud file systems > (Azure FS, S3, etc.) truncate file/directory modification times to > seconds instead of milliseconds, which could be due to a limitation of the HTTP protocol > (from discussion at: > https://forums.aws.amazon.com/thread.jspa?messageID=476615). > Suppose the latest non-.jhist modification to the directory happens > at T1, the directory scan happens at T2, and the .jhist file is added to the directory at T3. If {{T1 < T2 < T3}} and T1 is equal to T3 > after truncation to seconds, this issue can appear. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
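The race can be modeled in a few lines of Java. This is a hypothetical model, not the actual HistoryFileManager code: it assumes the "scan not needed" decision simply compares the directory modification time seen now with the one recorded at the last scan, and it simulates a store that truncates timestamps to whole seconds.

```java
/**
 * Hypothetical model of the scan-skipping race; class and method names are
 * illustrative and do not come from Hadoop.
 */
final class ModTimeScanCheck {
    private ModTimeScanCheck() {}

    /** Simulates a store that reports modification times truncated to whole seconds. */
    static long truncateToSeconds(long millis) {
        return (millis / 1000L) * 1000L; // valid for non-negative epoch millis
    }

    /** Assumed rule: rescan only when the directory's modification time has changed. */
    static boolean scanNeeded(long lastScannedModTime, long currentModTime) {
        return currentModTime != lastScannedModTime;
    }
}
```

With T1 = base + 100 ms and T3 = base + 900 ms inside the same second, the scan at T2 records truncate(T1), the later check sees truncate(T3), the two values are equal, and the directory is never rescanned. With millisecond precision the same check would detect the change.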
[jira] [Updated] (MAPREDUCE-6619) HADOOP_CLASSPATH is overwritten in MR container
[ https://issues.apache.org/jira/browse/MAPREDUCE-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-6619: --- Resolution: Fixed Fix Version/s: 2.6.4 2.7.3 2.8.0 Status: Resolved (was: Patch Available) Committed to branch-2, branch-2.8, branch-2.7, branch-2.6, thanks [~djp]! Thanks [~shanyu] for reviewing the patch! > HADOOP_CLASSPATH is overwritten in MR container > --- > > Key: MAPREDUCE-6619 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6619 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.8.0, 2.7.2 >Reporter: shanyu zhao >Assignee: Junping Du > Fix For: 2.8.0, 2.7.3, 2.6.4 > > Attachments: MAPREDUCE-6619-branch-2.patch > > > Previously, the HADOOP_CLASSPATH environment variable in MR containers inherited the > defaults of the worker node. MAPREDUCE-6454 introduced a change that overwrites > HADOOP_CLASSPATH completely, which caused a regression. We need to add > additional entries to HADOOP_CLASSPATH instead of completely replacing it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
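The difference between overwriting and appending comes down to one branch. Below is a minimal Java sketch with a hypothetical helper, EnvAppend, which is not a Hadoop API. For simplicity it uses the local File.pathSeparator, whereas real MR code has to pick the separator appropriate to the node that launches the container.

```java
import java.io.File;
import java.util.Map;

/** Hypothetical helper illustrating "append, don't overwrite". */
final class EnvAppend {
    private EnvAppend() {}

    /**
     * Adds value to the variable named key, keeping whatever the node already
     * provided. File.pathSeparator is ":" on Unix and ";" on Windows.
     */
    static void appendToEnv(Map<String, String> env, String key, String value) {
        String existing = env.get(key);
        if (existing == null || existing.isEmpty()) {
            env.put(key, value);                                  // nothing to preserve
        } else {
            env.put(key, existing + File.pathSeparator + value);  // keep node defaults first
        }
    }
}
```

Calling env.put("HADOOP_CLASSPATH", value) directly would reproduce the regression, because it drops the worker node's defaults.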
[jira] [Commented] (MAPREDUCE-6619) HADOOP_CLASSPATH is overwritten in MR container
[ https://issues.apache.org/jira/browse/MAPREDUCE-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118626#comment-15118626 ] Jian He commented on MAPREDUCE-6619: LGTM, committing. > HADOOP_CLASSPATH is overwritten in MR container > --- > > Key: MAPREDUCE-6619 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6619 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.8.0, 2.7.2 >Reporter: shanyu zhao >Assignee: Junping Du > Attachments: MAPREDUCE-6619-branch-2.patch > > > Previously, the HADOOP_CLASSPATH environment variable in MR containers inherited the > defaults of the worker node. MAPREDUCE-6454 introduced a change that overwrites > HADOOP_CLASSPATH completely, which caused a regression. We need to add > additional entries to HADOOP_CLASSPATH instead of completely replacing it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6610) JobHistoryEventHandler should not swallow timeline response
[ https://issues.apache.org/jira/browse/MAPREDUCE-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-6610: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Target Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to trunk, branch-2, branch-2.8, thanks [~gtCarrera]! Thanks [~Naganarasimha] for reviewing! > JobHistoryEventHandler should not swallow timeline response > --- > > Key: MAPREDUCE-6610 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6610 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Li Lu >Assignee: Li Lu >Priority: Trivial > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6610-trunk.001.patch, > MAPREDUCE-6610-trunk.002.patch > > > As discussed in YARN-4596, JobHistoryEventHandler should process and log > timeline put errors after the timeline put call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6610) JobHistoryEventHandler should not swallow timeline response
[ https://issues.apache.org/jira/browse/MAPREDUCE-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116339#comment-15116339 ] Jian He commented on MAPREDUCE-6610: LGTM. > JobHistoryEventHandler should not swallow timeline response > --- > > Key: MAPREDUCE-6610 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6610 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Li Lu >Assignee: Li Lu >Priority: Trivial > Attachments: MAPREDUCE-6610-trunk.001.patch, > MAPREDUCE-6610-trunk.002.patch > > > As discussed in YARN-4596, JobHistoryEventHandler should process and log > timeline put errors after the timeline put call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007094#comment-15007094 ] Jian He commented on MAPREDUCE-5485: LGTM, committing. > Allow repeating job commit by extending OutputCommitter API > --- > > Key: MAPREDUCE-5485 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.1.0-beta >Reporter: Nemon Lou >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, > MAPREDUCE-5485-v1.patch, MAPREDUCE-5485-v2.patch, MAPREDUCE-5485-v3.1.patch, > MAPREDUCE-5485-v3.patch, MAPREDUCE-5485-v4.1.patch, MAPREDUCE-5485-v4.patch, > MAPREDUCE-5485-v5.patch > > > The MRAppMaster may crash during job commit, or a NodeManager > restart may cause the committing AM to exit due to container expiry. In these cases > the job will fail. > However, some jobs can redo the commit, so failing the job is unnecessary. > Letting clients tell the AM whether redoing the commit is allowed is a better choice. > This idea comes from Jason Lowe's comments in MAPREDUCE-4819 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007097#comment-15007097 ] Jian He commented on MAPREDUCE-5485: [~djp], there are some findbugs warnings and unit test failures, mind checking? > Allow repeating job commit by extending OutputCommitter API > --- > > Key: MAPREDUCE-5485 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.1.0-beta >Reporter: Nemon Lou >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, > MAPREDUCE-5485-v1.patch, MAPREDUCE-5485-v2.patch, MAPREDUCE-5485-v3.1.patch, > MAPREDUCE-5485-v3.patch, MAPREDUCE-5485-v4.1.patch, MAPREDUCE-5485-v4.patch, > MAPREDUCE-5485-v5.patch > > > The MRAppMaster may crash during job commit, or a NodeManager > restart may cause the committing AM to exit due to container expiry. In these cases > the job will fail. > However, some jobs can redo the commit, so failing the job is unnecessary. > Letting clients tell the AM whether redoing the commit is allowed is a better choice. > This idea comes from Jason Lowe's comments in MAPREDUCE-4819 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5485: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.7.3 Target Version/s: 2.7.3 (was: 2.6.3, 2.7.3) Status: Resolved (was: Patch Available) Committed to trunk, branch-2, branch-2.7, thanks Junping! Thanks Bikas for reviewing! > Allow repeating job commit by extending OutputCommitter API > --- > > Key: MAPREDUCE-5485 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.1.0-beta >Reporter: Nemon Lou >Assignee: Junping Du >Priority: Critical > Fix For: 2.7.3 > > Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, > MAPREDUCE-5485-v1.patch, MAPREDUCE-5485-v2.patch, MAPREDUCE-5485-v3.1.patch, > MAPREDUCE-5485-v3.patch, MAPREDUCE-5485-v4.1.patch, MAPREDUCE-5485-v4.patch, > MAPREDUCE-5485-v5-branch-2.7.patch, MAPREDUCE-5485-v5.patch > > > The MRAppMaster may crash during job commit, or a NodeManager > restart may cause the committing AM to exit due to container expiry. In these cases > the job will fail. > However, some jobs can redo the commit, so failing the job is unnecessary. > Letting clients tell the AM whether redoing the commit is allowed is a better choice. > This idea comes from Jason Lowe's comments in MAPREDUCE-4819 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
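The decision the description asks for ("let clients tell the AM whether redoing the commit is allowed") can be sketched as a small pure function. Everything below is hypothetical: CommitRepeatability and CommitRecovery are illustrative names, not the OutputCommitter API added by the patch, and the sketch assumes the restarted AM already knows whether a previous attempt started the job commit without finishing it.

```java
/** Hypothetical stand-in for a committer-side "is repeating the commit safe?" switch. */
@FunctionalInterface
interface CommitRepeatability {
    boolean isCommitJobRepeatable();
}

/** Hypothetical model of the AM's choice after a restart. */
final class CommitRecovery {
    enum Action { NORMAL_RECOVERY, REDO_COMMIT, FAIL_JOB }

    private CommitRecovery() {}

    /**
     * @param commitInterrupted true if a previous AM attempt started the job
     *        commit but did not record its completion
     */
    static Action onAmRestart(boolean commitInterrupted, CommitRepeatability committer) {
        if (!commitInterrupted) {
            return Action.NORMAL_RECOVERY;
        }
        // Old behavior: an interrupted commit always failed the job.
        // Proposed: the committer decides whether repeating the commit is safe.
        return committer.isCommitJobRepeatable() ? Action.REDO_COMMIT : Action.FAIL_JOB;
    }
}
```

A committer whose commit is idempotent (for example, one that only renames already-final output) would answer true. Any other committer keeps the conservative fail-the-job behavior.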
[jira] [Commented] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744709#comment-14744709 ] Jian He commented on MAPREDUCE-5870: I see. I'm OK with keeping support for the enum, as that's what I originally thought; I just wanted to bring this up. > Support for passing Job priority through Application Submission Context in > Mapreduce Side > - > > Key: MAPREDUCE-5870 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-MAPREDUCE-5870.patch, 0002-MAPREDUCE-5870.patch, > 0003-MAPREDUCE-5870.patch, 0004-MAPREDUCE-5870.patch, > 0005-MAPREDUCE-5870.patch, 0006-MAPREDUCE-5870.patch, Yarn-2002.1.patch > > > Job Priority can be set from the client side as below [configuration and API]. > a. JobConf.getJobPriority() and > Job.setPriority(JobPriority priority) > b. We can also use the configuration > "mapreduce.job.priority". > Now this job priority can be passed in the Application Submission > Context from the client side. > Here we can reuse the MRJobConfig.PRIORITY configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743154#comment-14743154 ] Jian He commented on MAPREDUCE-5870: I earlier thought we could keep backward compatibility with the enum priority, but now I'm questioning the value of doing this, since supporting both brings extra complexity. [~jlowe], do you know whether many apps from MR1 actually expect this enum-based priority to work? Since priority has not been supported in Hadoop 2 for such a long time, I'm wondering whether we can deprecate the old API and support only integers, to keep things simple and clear. > Support for passing Job priority through Application Submission Context in > Mapreduce Side > - > > Key: MAPREDUCE-5870 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-MAPREDUCE-5870.patch, 0002-MAPREDUCE-5870.patch, > 0003-MAPREDUCE-5870.patch, 0004-MAPREDUCE-5870.patch, > 0005-MAPREDUCE-5870.patch, 0006-MAPREDUCE-5870.patch, Yarn-2002.1.patch > > > Job Priority can be set from the client side as below [configuration and API]. > a. JobConf.getJobPriority() and > Job.setPriority(JobPriority priority) > b. We can also use the configuration > "mapreduce.job.priority". > Now this job priority can be passed in the Application Submission > Context from the client side. > Here we can reuse the MRJobConfig.PRIORITY configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14642080#comment-14642080 ] Jian He commented on MAPREDUCE-5870: [~sunilg], thanks for updating. - Should we use the JobConf.getJobPriority API, so that it can also accept a priority specified through the current CLI? {code} String jobPriority = jobConf.get(MRJobConfig.PRIORITY); {code} - To simplify a little, instead of {code} int iPriority = TypeConverter.toYarn(jobPriority); // If the given input not a JobPriority enum, verify whether its an // integer. if (0 == iPriority) { iPriority = Integer.parseInt(jobPriority); } {code} we can do something like {code} try { iPriority = TypeConverter.toYarn(jobPriority); } catch (IllegalArgumentException exception) { iPriority = Integer.parseInt(jobPriority); } {code} Support for passing Job priority through Application Submission Context in Mapreduce Side - Key: MAPREDUCE-5870 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Reporter: Sunil G Assignee: Sunil G Attachments: 0001-MAPREDUCE-5870.patch, 0002-MAPREDUCE-5870.patch, Yarn-2002.1.patch Job Priority can be set from the client side as below [configuration and API]. a. JobConf.getJobPriority() and Job.setPriority(JobPriority priority) b. We can also use the configuration mapreduce.job.priority. Now this job priority can be passed in the Application Submission Context from the client side. Here we can reuse the MRJobConfig.PRIORITY configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
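The try/catch shape suggested above can be tried out in isolation. The Java sketch below uses a hypothetical enum, LegacyJobPriority, in place of the real JobPriority and TypeConverter.toYarn. Its integer values are assumptions made for illustration and do not reproduce Hadoop's actual mapping.

```java
/**
 * Hypothetical stand-in for the MR1-style enum. The integer values are an
 * assumption for illustration, not the real TypeConverter mapping.
 */
enum LegacyJobPriority {
    VERY_LOW(1), LOW(2), NORMAL(3), HIGH(4), VERY_HIGH(5);

    final int yarnValue;

    LegacyJobPriority(int yarnValue) {
        this.yarnValue = yarnValue;
    }
}

final class PriorityParsing {
    private PriorityParsing() {}

    /** Accepts either an enum name such as "HIGH" or a plain integer such as "7". */
    static int toYarnPriority(String configured) {
        String value = configured.trim();
        try {
            // Enum.valueOf throws IllegalArgumentException for unknown names.
            return LegacyJobPriority.valueOf(value).yarnValue;
        } catch (IllegalArgumentException notAnEnumName) {
            // Fall back to an integer; anything else surfaces as NumberFormatException.
            return Integer.parseInt(value);
        }
    }
}
```

NumberFormatException is itself a subclass of IllegalArgumentException. Because parseInt runs inside the catch block, though, a value that is neither an enum name nor an integer still escapes to the caller and is not silently swallowed.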
[jira] [Commented] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640790#comment-14640790 ] Jian He commented on MAPREDUCE-5870: Patch looks good to me, triggering Jenkins. Support for passing Job priority through Application Submission Context in Mapreduce Side - Key: MAPREDUCE-5870 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Reporter: Sunil G Assignee: Sunil G Attachments: 0001-MAPREDUCE-5870.patch, Yarn-2002.1.patch Job Priority can be set from the client side as below [configuration and API]. a. JobConf.getJobPriority() and Job.setPriority(JobPriority priority) b. We can also use the configuration mapreduce.job.priority. Now this job priority can be passed in the Application Submission Context from the client side. Here we can reuse the MRJobConfig.PRIORITY configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5870: --- Status: Patch Available (was: Open) Support for passing Job priority through Application Submission Context in Mapreduce Side - Key: MAPREDUCE-5870 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Reporter: Sunil G Assignee: Sunil G Attachments: 0001-MAPREDUCE-5870.patch, Yarn-2002.1.patch Job Priority can be set from the client side as below [configuration and API]. a. JobConf.getJobPriority() and Job.setPriority(JobPriority priority) b. We can also use the configuration mapreduce.job.priority. Now this job priority can be passed in the Application Submission Context from the client side. Here we can reuse the MRJobConfig.PRIORITY configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641134#comment-14641134 ] Jian He commented on MAPREDUCE-5870: [~sunilg], could you check whether the test failure is related? Support for passing Job priority through Application Submission Context in Mapreduce Side - Key: MAPREDUCE-5870 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Reporter: Sunil G Assignee: Sunil G Attachments: 0001-MAPREDUCE-5870.patch, Yarn-2002.1.patch Job Priority can be set from the client side as below [configuration and API]. a. JobConf.getJobPriority() and Job.setPriority(JobPriority priority) b. We can also use the configuration mapreduce.job.priority. Now this job priority can be passed in the Application Submission Context from the client side. Here we can reuse the MRJobConfig.PRIORITY configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search
[ https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-6350: --- Component/s: jobhistoryserver JobHistory doesn't support fully-functional search -- Key: MAPREDUCE-6350 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6350 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-1614.v1.patch, YARN-1614.v2.patch The job history server only outputs the first 50 characters of job names in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search
[ https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He moved YARN-1614 to MAPREDUCE-6350: -- Key: MAPREDUCE-6350 (was: YARN-1614) Project: Hadoop Map/Reduce (was: Hadoop YARN) JobHistory doesn't support fully-functional search -- Key: MAPREDUCE-6350 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6350 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-1614.v1.patch, YARN-1614.v2.patch The job history server only outputs the first 50 characters of job names in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6230) MR AM does not survive RM restart if RM activated a new AMRM secret key
[ https://issues.apache.org/jira/browse/MAPREDUCE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296073#comment-14296073 ] Jian He commented on MAPREDUCE-6230: +1 MR AM does not survive RM restart if RM activated a new AMRM secret key --- Key: MAPREDUCE-6230 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6230 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: MAPREDUCE-6230.001.patch A MapReduce AM will fail to reconnect to an RM that performed restart in the following scenario: # MapReduce job launched with AMRM token generated from AMRM secret X # RM rolls new AMRM secret Y and activates the new key # RM performs a work-preserving restart # MapReduce job AM now unable to connect to RM with Invalid AMRMToken exception -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6230) MR AM does not survive RM restart if RM activated a new AMRM secret key
[ https://issues.apache.org/jira/browse/MAPREDUCE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-6230: --- Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and branch-2, thanks Jason ! MR AM does not survive RM restart if RM activated a new AMRM secret key --- Key: MAPREDUCE-6230 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6230 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Fix For: 2.7.0 Attachments: MAPREDUCE-6230.001.patch A MapReduce AM will fail to reconnect to an RM that performed restart in the following scenario: # MapReduce job launched with AMRM token generated from AMRM secret X # RM rolls new AMRM secret Y and activates the new key # RM performs a work-preserving restart # MapReduce job AM now unable to connect to RM with Invalid AMRMToken exception -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5568) JHS returns invalid string for reducer completion percentage if AM restarts with 0 reducer.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5568: --- Resolution: Fixed Fix Version/s: 2.7.0 Target Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and branch-2, thanks [~minjikim]! JHS returns invalid string for reducer completion percentage if AM restarts with 0 reducer. --- Key: MAPREDUCE-5568 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5568 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.4.1, 2.5.1 Reporter: Jian He Assignee: MinJi Kim Fix For: 2.7.0 Attachments: 5568.patch01, 5568.patch02, 5568.patch03, 5568.patch04 JobClient shows: {code} 13/10/05 16:26:09 INFO mapreduce.Job: map 100% reduce NaN% 13/10/05 16:26:09 INFO mapreduce.Job: Job job_1381015536254_0001 completed successfully 13/10/05 16:26:09 INFO mapreduce.Job: Counters: 26 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=76741 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=48 HDFS: Number of bytes written=0 HDFS: Number of read operations=1 HDFS: Number of large read operations=0 HDFS: Number of write operations=0 {code} With the mapred job -status command, it shows: {code} Uber job : false Number of maps: 1 Number of reduces: 0 map() completion: 1.0 reduce() completion: NaN Job state: SUCCEEDED retired: false reason for failure: {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5785) Derive heap size or mapreduce.*.memory.mb automatically
[ https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225462#comment-14225462 ] Jian He commented on MAPREDUCE-5785: After this patch, the job somehow fails because the task container cannot be launched: {{Error: Could not find or load main class null}}. (This might be a problem with my own setup.) Derive heap size or mapreduce.*.memory.mb automatically --- Key: MAPREDUCE-5785 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5785 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mr-am, task Reporter: Gera Shegalov Assignee: Gera Shegalov Fix For: 3.0.0 Attachments: MAPREDUCE-5785.v01.patch, MAPREDUCE-5785.v02.patch, MAPREDUCE-5785.v03.patch, mr-5785-4.patch, mr-5785-5.patch, mr-5785-6.patch Currently users have to set 2 memory-related configs per job / per task type. One first chooses some container size mapreduce.\*.memory.mb and then a corresponding maximum Java heap size Xmx < mapreduce.\*.memory.mb. This makes sure that the JVM's C-heap (native memory + Java heap) does not exceed this mapreduce.*.memory.mb. If one forgets to tune Xmx, the MR-AM might end up - allocating big containers whereas the JVM will only use the default -Xmx200m, or - allocating small containers that will OOM because Xmx is too high. With this JIRA, we propose to set Xmx automatically based on an empirical ratio that can be adjusted. Xmx is not changed automatically if provided by the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
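The proposal in the description (derive Xmx from the container size by an adjustable ratio, and leave a user-provided Xmx alone) can be sketched in one direction as follows. HeapFromContainer is a hypothetical name, and 0.8 is only an illustrative ratio, not necessarily the default the patch ships with. The reverse direction, deriving mapreduce.\*.memory.mb from a given Xmx, is not covered here.

```java
/**
 * Hypothetical sketch of deriving -Xmx from the container size. The ratio
 * stands in for the "empirical ratio that can be adjusted".
 */
final class HeapFromContainer {
    static final double EXAMPLE_HEAP_RATIO = 0.8; // illustrative value only

    private HeapFromContainer() {}

    static String javaOpts(String userOpts, int containerMb, double heapRatio) {
        String opts = userOpts == null ? "" : userOpts.trim();
        if (opts.contains("-Xmx")) {
            return opts; // a user-provided Xmx is never changed
        }
        // Keep the heap below the container size to leave room for native memory.
        int heapMb = (int) (containerMb * heapRatio);
        String xmx = "-Xmx" + heapMb + "m";
        return opts.isEmpty() ? xmx : opts + " " + xmx;
    }
}
```

For example, javaOpts(null, 2048, 0.8) yields "-Xmx1638m". The contains("-Xmx") check is a simplification of real option parsing.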
[jira] [Commented] (MAPREDUCE-5568) JHS returns invalid string for reducer completion percentage if AM restarts with 0 reducer.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14221569#comment-14221569 ] Jian He commented on MAPREDUCE-5568: [~minjikim], thanks for your contribution. The patch looks good, just some formatting issues: - The convention is to use two spaces for indentation. {code} if ( getTotalMaps() == 0 ) { report.setMapProgress(1.0f); } else { report.setMapProgress((float) getCompletedMaps() / getTotalMaps()); } if ( getTotalReduces() == 0 ) { report.setReduceProgress(1.0f); } else { report.setReduceProgress((float) getCompletedReduces() / getTotalReduces()); } {code} JHS returns invalid string for reducer completion percentage if AM restarts with 0 reducer. --- Key: MAPREDUCE-5568 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5568 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.4.1, 2.5.1 Reporter: Jian He Assignee: MinJi Kim Attachments: 5568.patch01, 5568.patch02, 5568.patch03 JobClient shows: {code} 13/10/05 16:26:09 INFO mapreduce.Job: map 100% reduce NaN% 13/10/05 16:26:09 INFO mapreduce.Job: Job job_1381015536254_0001 completed successfully 13/10/05 16:26:09 INFO mapreduce.Job: Counters: 26 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=76741 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=48 HDFS: Number of bytes written=0 HDFS: Number of read operations=1 HDFS: Number of large read operations=0 HDFS: Number of write operations=0 {code} With the mapred job -status command, it shows: {code} Uber job : false Number of maps: 1 Number of reduces: 0 map() completion: 1.0 reduce() completion: NaN Job state: SUCCEEDED retired: false reason for failure: {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
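The "NaN%" comes from floating-point division. In Java, (float) 0 / 0 evaluates to NaN and does not throw, so a job with zero reducers reports NaN unless the zero case is handled first. Below is a standalone sketch of the guard in the snippet above. ProgressMath is a hypothetical class name, not part of Hadoop.

```java
/** Hypothetical standalone version of the zero-task guard. */
final class ProgressMath {
    private ProgressMath() {}

    /** Unguarded: 0 completed out of 0 total evaluates 0f / 0, which is NaN. */
    static float naiveProgress(int completed, int total) {
        return (float) completed / total;
    }

    /** Guarded: a task type with no tasks counts as fully complete. */
    static float progress(int completed, int total) {
        if (total == 0) {
            return 1.0f;
        }
        return (float) completed / total;
    }
}
```

The cast applies to completed before the division, so the division is done in floating point. That is why it yields NaN where integer division would throw ArithmeticException.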
[jira] [Updated] (MAPREDUCE-6048) TestJavaSerialization fails in trunk build
[ https://issues.apache.org/jira/browse/MAPREDUCE-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-6048: --- Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) looks good. Committed to trunk, branch-2, branch-2.6, thanks Varun! TestJavaSerialization fails in trunk build -- Key: MAPREDUCE-6048 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6048 Project: Hadoop Map/Reduce Issue Type: Test Reporter: Ted Yu Assignee: Varun Vasudev Priority: Minor Fix For: 2.6.0 Attachments: apache-mapreduce-6048.0.patch This happened in builds #1871 and #1872 {code} testMapReduceJob(org.apache.hadoop.mapred.TestJavaSerialization) Time elapsed: 2.784 sec FAILURE! junit.framework.ComparisonFailure: expected:[a ]1 but was:[0 1]1 at junit.framework.Assert.assertEquals(Assert.java:100) at junit.framework.Assert.assertEquals(Assert.java:107) at junit.framework.TestCase.assertEquals(TestCase.java:269) at org.apache.hadoop.mapred.TestJavaSerialization.testMapReduceJob(TestJavaSerialization.java:127) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6126) (Rumen) Rumen tool returns error ava.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type
[ https://issues.apache.org/jira/browse/MAPREDUCE-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180218#comment-14180218 ] Jian He commented on MAPREDUCE-6126: Makes sense, +1 (Rumen) Rumen tool returns error ava.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type -- Key: MAPREDUCE-6126 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6126 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Attachments: MAPREDUCE-6126-v2.patch, MAPREDUCE-6126.patch java.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type at org.apache.hadoop.tools.rumen.JobBuilder.process(JobBuilder.java:172) at org.apache.hadoop.tools.rumen.TraceBuilder.processJobHistory(TraceBuilder.java:305) at org.apache.hadoop.tools.rumen.TraceBuilder.run(TraceBuilder.java:259) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.tools.rumen.TraceBuilder.main(TraceBuilder.java:186) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6126) (Rumen) Rumen tool returns error ava.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type
[ https://issues.apache.org/jira/browse/MAPREDUCE-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-6126: --- Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk, branch-2, branch-2.6. thanks Junping ! (Rumen) Rumen tool returns error ava.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type -- Key: MAPREDUCE-6126 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6126 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Fix For: 2.6.0 Attachments: MAPREDUCE-6126-v2.patch, MAPREDUCE-6126.patch java.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type at org.apache.hadoop.tools.rumen.JobBuilder.process(JobBuilder.java:172) at org.apache.hadoop.tools.rumen.TraceBuilder.processJobHistory(TraceBuilder.java:305) at org.apache.hadoop.tools.rumen.TraceBuilder.run(TraceBuilder.java:259) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.tools.rumen.TraceBuilder.main(TraceBuilder.java:186) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6126) (Rumen) Rumen tool returns error ava.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type
[ https://issues.apache.org/jira/browse/MAPREDUCE-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179529#comment-14179529 ] Jian He commented on MAPREDUCE-6126: in JobHistoryEventHandler, seems we already skip writing this event {code} HistoryEvent historyEvent = event.getHistoryEvent(); if (! (historyEvent instanceof NormalizedResourceEvent)) { mi.writeEvent(historyEvent); } {code} (Rumen) Rumen tool returns error ava.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type -- Key: MAPREDUCE-6126 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6126 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Attachments: MAPREDUCE-6126.patch java.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type at org.apache.hadoop.tools.rumen.JobBuilder.process(JobBuilder.java:172) at org.apache.hadoop.tools.rumen.TraceBuilder.processJobHistory(TraceBuilder.java:305) at org.apache.hadoop.tools.rumen.TraceBuilder.run(TraceBuilder.java:259) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.tools.rumen.TraceBuilder.main(TraceBuilder.java:186) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
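The comment above points at the general pattern: a consumer of history events can skip an event type it recognizes but does not need, and reserve the "unknown event type" exception for types it has genuinely never seen. A minimal Java sketch of that idea follows. The classes `HistoryEvt`, `JobSubmittedEvt`, `NormalizedResourceEvt`, `SomeFutureEvt` and `TraceBuilderSketch` are hypothetical stand-ins, not Rumen's real hierarchy, and this is not claimed to be what the committed MAPREDUCE-6126 patch does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for history events; these are NOT Rumen's real classes.
interface HistoryEvt {}
final class JobSubmittedEvt implements HistoryEvt {}
final class NormalizedResourceEvt implements HistoryEvt {}
final class SomeFutureEvt implements HistoryEvt {}

// Sketch of a trace builder that ignores an event type it recognizes but does not
// need, and only throws for event types it has never heard of.
class TraceBuilderSketch {
    private final List<String> processed = new ArrayList<>();

    void process(HistoryEvt event) {
        if (event instanceof NormalizedResourceEvt) {
            return; // known, but carries nothing the trace needs: skip it
        }
        if (event instanceof JobSubmittedEvt) {
            processed.add("JobSubmitted");
            return;
        }
        throw new IllegalArgumentException(
            "JobBuilder.process(HistoryEvent): unknown event type");
    }

    List<String> processed() {
        return processed;
    }

    public static void main(String[] args) {
        TraceBuilderSketch builder = new TraceBuilderSketch();
        builder.process(new JobSubmittedEvt());
        builder.process(new NormalizedResourceEvt()); // no exception
        System.out.println(builder.processed());
    }
}
```

Keeping the explicit throw for unrecognized types preserves the safety net the original `JobBuilder.process` provides; only the known, irrelevant type is whitelisted.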
[jira] [Updated] (MAPREDUCE-6087) MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS config name is wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-6087: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and branch-2, thanks [~ajisakaa] ! MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS config name is wrong Key: MAPREDUCE-6087 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6087 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.3.0 Reporter: Jian He Assignee: Akira AJISAKA Labels: newbie Attachments: MAPREDUCE-6087.2.patch, MAPREDUCE-6087.patch The config name for MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS now has double prefix as yarn.app.mapreduce. + yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6087) MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS config name is wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143583#comment-14143583 ] Jian He commented on MAPREDUCE-6087: mapred-default.xml has the correct name, which is good. {code} <name>yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts</name> <value>3</value> {code} Thanks [~ajisakaa] for working on the issue ! MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS config name is wrong Key: MAPREDUCE-6087 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6087 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.3.0 Reporter: Jian He Assignee: Akira AJISAKA Labels: newbie Attachments: MAPREDUCE-6087.2.patch, MAPREDUCE-6087.patch The config name for MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS now has double prefix as yarn.app.mapreduce. + yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6087) MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS config name is wrong
Jian He created MAPREDUCE-6087: -- Summary: MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS config name is wrong Key: MAPREDUCE-6087 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6087 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He The config name for MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS now has double prefix as yarn.app.mapreduce. + yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts -- This message was sent by Atlassian JIRA (v6.3.4#6332)
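The double-prefix bug comes from concatenating a shared prefix with a suffix that already contains that prefix. A minimal Java sketch of the mistake and the fix follows, assuming a prefix constant named `MR_PREFIX` and the hypothetical holder class `ConfigKeys`; the real `MRJobConfig` field layout may differ. The `DEFAULTS` map is only a stand-in for the `mapred-default.xml` entry quoted in the comments.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical constants mirroring the bug report; MR_PREFIX is an assumed name.
final class ConfigKeys {
    static final String MR_PREFIX = "yarn.app.mapreduce.";

    // Buggy: the suffix already contains the prefix, so it appears twice.
    static final String BUGGY_KEY =
        MR_PREFIX + "yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts";

    // Fixed: append only the part that follows the prefix.
    static final String FIXED_KEY =
        MR_PREFIX + "client-am.ipc.max-retries-on-timeouts";

    // Stand-in for the mapred-default.xml entry, which uses the correct name.
    static final Map<String, String> DEFAULTS = Collections.singletonMap(
        "yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts", "3");

    private ConfigKeys() {}

    public static void main(String[] args) {
        System.out.println(BUGGY_KEY + " -> " + DEFAULTS.get(BUGGY_KEY)); // lookup misses
        System.out.println(FIXED_KEY + " -> " + DEFAULTS.get(FIXED_KEY)); // finds "3"
    }
}
```

In this sketch a lookup with the doubled key never matches the documented property name, which is why a correct `mapred-default.xml` alone does not fix the problem.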
[jira] [Updated] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5910: --- Status: Open (was: Patch Available) MRAppMaster should handle Resync from RM instead of shutting down. -- Key: MAPREDUCE-5910 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910 Project: Hadoop Map/Reduce Issue Type: Task Components: applicationmaster Reporter: Rohith Assignee: Rohith Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch, MAPREDUCE-5910.3.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The MRAppMaster behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5910: --- Attachment: MAPREDUCE-5910.4.patch I see, thanks for investigating. added one code comment myself, re-submit the patch. MRAppMaster should handle Resync from RM instead of shutting down. -- Key: MAPREDUCE-5910 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910 Project: Hadoop Map/Reduce Issue Type: Task Components: applicationmaster Reporter: Rohith Assignee: Rohith Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch, MAPREDUCE-5910.3.patch, MAPREDUCE-5910.4.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The MRAppMaster behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5910: --- Status: Patch Available (was: Open) MRAppMaster should handle Resync from RM instead of shutting down. -- Key: MAPREDUCE-5910 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910 Project: Hadoop Map/Reduce Issue Type: Task Components: applicationmaster Reporter: Rohith Assignee: Rohith Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch, MAPREDUCE-5910.3.patch, MAPREDUCE-5910.4.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The MRAppMaster behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065309#comment-14065309 ] Jian He commented on MAPREDUCE-5910: committing this. MRAppMaster should handle Resync from RM instead of shutting down. -- Key: MAPREDUCE-5910 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910 Project: Hadoop Map/Reduce Issue Type: Task Components: applicationmaster Reporter: Rohith Assignee: Rohith Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch, MAPREDUCE-5910.3.patch, MAPREDUCE-5910.4.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The MRAppMaster behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063747#comment-14063747 ] Jian He commented on MAPREDUCE-5910: Thanks for updating the patch! patch looks good. submit to jenkins MRAppMaster should handle Resync from RM instead of shutting down. -- Key: MAPREDUCE-5910 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910 Project: Hadoop Map/Reduce Issue Type: Task Components: applicationmaster Reporter: Rohith Assignee: Rohith Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch, MAPREDUCE-5910.3.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The MRAppMaster behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5910: --- Status: Patch Available (was: Open) MRAppMaster should handle Resync from RM instead of shutting down. -- Key: MAPREDUCE-5910 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910 Project: Hadoop Map/Reduce Issue Type: Task Components: applicationmaster Reporter: Rohith Assignee: Rohith Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch, MAPREDUCE-5910.3.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The MRAppMaster behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063985#comment-14063985 ] Jian He commented on MAPREDUCE-5910: Rohith, can you look into the test failures? thanks! MRAppMaster should handle Resync from RM instead of shutting down. -- Key: MAPREDUCE-5910 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910 Project: Hadoop Map/Reduce Issue Type: Task Components: applicationmaster Reporter: Rohith Assignee: Rohith Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch, MAPREDUCE-5910.3.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The MRAppMaster behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5910: --- Status: Open (was: Patch Available) MRAppMaster should handle Resync from RM instead of shutting down. -- Key: MAPREDUCE-5910 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910 Project: Hadoop Map/Reduce Issue Type: Task Components: applicationmaster Reporter: Rohith Assignee: Rohith Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The MRAppMaster behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058156#comment-14058156 ] Jian He commented on MAPREDUCE-5910: patch looks good overall, some comments: - rename addOutstandingAllocateRequestOnResync -> addOutstandingRequestsOnResync - the MR_RM_WORKPRESERVING_RESTART_ENABLED flag is no longer needed, given that AM_RESYNC and AM_SHUTDOWN commands are now sent in different cases. MRAppMaster should handle Resync from RM instead of shutting down. -- Key: MAPREDUCE-5910 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910 Project: Hadoop Map/Reduce Issue Type: Task Components: applicationmaster Reporter: Rohith Assignee: Rohith Fix For: 2.5.0 Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The MRAppMaster behavior is expected to change to calling resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.2#6252)
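The issue description defines resync as three obligations on the AM: reset the allocate RPC sequence number to 0, resend the entire outstanding request, and tolerate container completions that are reported more than once. A minimal Java sketch of that bookkeeping follows. `ResyncSketch` and all of its methods are hypothetical illustrations, not the MRAppMaster or AMRMClient API, and the sketch omits decrementing requests as containers are granted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical AM-side bookkeeping for the resync contract described in the issue;
// names are illustrative and are not the MRAppMaster or AMRMClient API.
class ResyncSketch {
    private int sequence = 0;                                         // allocate RPC sequence number
    private final Map<String, Integer> outstanding = new HashMap<>(); // resource -> containers wanted
    private final Set<String> changed = new HashSet<>();              // asks to send on the next call
    private final Set<String> seenCompleted = new HashSet<>();        // completed container ids

    void addRequest(String resource, int containers) {
        outstanding.merge(resource, containers, Integer::sum);
        changed.add(resource);
    }

    // Builds the ask for the next allocate call, then advances the sequence number.
    Map<String, Integer> nextAllocate() {
        Map<String, Integer> ask = new HashMap<>();
        for (String resource : changed) {
            ask.put(resource, outstanding.get(resource));
        }
        changed.clear();
        sequence++;
        return ask;
    }

    // On resync: restart the sequence at 0 and resend every outstanding request.
    void onResync() {
        sequence = 0;
        changed.addAll(outstanding.keySet());
    }

    // Completions may be reported more than once; returns true only the first time.
    boolean onContainerCompleted(String containerId) {
        return seenCompleted.add(containerId);
    }

    int sequence() {
        return sequence;
    }

    public static void main(String[] args) {
        ResyncSketch am = new ResyncSketch();
        am.addRequest("host1", 2);
        am.addRequest("host2", 1);
        am.nextAllocate();                     // sends both asks
        am.onResync();                         // RM asked the AM to resync
        System.out.println(am.nextAllocate()); // both asks are resent
    }
}
```

Deduplicating completions by container id is one simple way to make "reported more than once" harmless; a real AM would also need to keep its task-attempt state consistent with those completions.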
[jira] [Commented] (MAPREDUCE-5924) Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING'
[ https://issues.apache.org/jira/browse/MAPREDUCE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034664#comment-14034664 ] Jian He commented on MAPREDUCE-5924: LGTM, +1 Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING' Key: MAPREDUCE-5924 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5924 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Yesha Vora Assignee: Zhijie Shen Attachments: MAPREDUCE-5924.1.patch The Sort job over 1GB data failed with below error {code} 2014-06-09 09:15:38,746 INFO [Socket Reader #1 for port 63415] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1402304714683_0002 (auth:SIMPLE) 2014-06-09 09:15:38,750 INFO [IPC Server handler 13 on 63415] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit-pending state update from attempt_1402304714683_0002_r_15_1000 2014-06-09 09:15:38,751 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1402304714683_0002_r_15_1000 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1058) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:145) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1271) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1263) at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:722) 2014-06-09 09:15:38,753 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1402304714683_0002Job Transitioned from RUNNING to ERROR {code} The JobHistory Url prints job state = ERROR -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5924) Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING'
[ https://issues.apache.org/jira/browse/MAPREDUCE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034668#comment-14034668 ] Jian He commented on MAPREDUCE-5924: committing.. Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING' Key: MAPREDUCE-5924 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5924 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Yesha Vora Assignee: Zhijie Shen Attachments: MAPREDUCE-5924.1.patch The Sort job over 1GB data failed with below error {code} 2014-06-09 09:15:38,746 INFO [Socket Reader #1 for port 63415] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1402304714683_0002 (auth:SIMPLE) 2014-06-09 09:15:38,750 INFO [IPC Server handler 13 on 63415] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit-pending state update from attempt_1402304714683_0002_r_15_1000 2014-06-09 09:15:38,751 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1402304714683_0002_r_15_1000 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1058) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:145) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1271) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1263) at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:722) 2014-06-09 09:15:38,753 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1402304714683_0002Job Transitioned from RUNNING to ERROR {code} The JobHistory Url prints job state = ERROR -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5924) Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING'
[ https://issues.apache.org/jira/browse/MAPREDUCE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034681#comment-14034681 ] Jian He commented on MAPREDUCE-5924: Zhijie, can you open jira for the exception issue on Windows you mentioned? thx Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING' Key: MAPREDUCE-5924 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5924 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Yesha Vora Assignee: Zhijie Shen Fix For: 2.5.0 Attachments: MAPREDUCE-5924.1.patch The Sort job over 1GB data failed with below error {code} 2014-06-09 09:15:38,746 INFO [Socket Reader #1 for port 63415] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1402304714683_0002 (auth:SIMPLE) 2014-06-09 09:15:38,750 INFO [IPC Server handler 13 on 63415] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit-pending state update from attempt_1402304714683_0002_r_15_1000 2014-06-09 09:15:38,751 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1402304714683_0002_r_15_1000 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1058) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:145) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1271) at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1263) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:722) 2014-06-09 09:15:38,753 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1402304714683_0002Job Transitioned from RUNNING to ERROR {code} The JobHistory Url prints job state = ERROR -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5924) Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING'
[ https://issues.apache.org/jira/browse/MAPREDUCE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5924: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and branch-2, Thanks Zhijie! Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING' Key: MAPREDUCE-5924 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5924 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Yesha Vora Assignee: Zhijie Shen Attachments: MAPREDUCE-5924.1.patch The Sort job over 1GB data failed with below error {code} 2014-06-09 09:15:38,746 INFO [Socket Reader #1 for port 63415] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1402304714683_0002 (auth:SIMPLE) 2014-06-09 09:15:38,750 INFO [IPC Server handler 13 on 63415] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit-pending state update from attempt_1402304714683_0002_r_15_1000 2014-06-09 09:15:38,751 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1402304714683_0002_r_15_1000 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1058) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:145) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1271) at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1263) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:722) 2014-06-09 09:15:38,753 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1402304714683_0002Job Transitioned from RUNNING to ERROR {code} The JobHistory Url prints job state = ERROR -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5924) Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING'
[ https://issues.apache.org/jira/browse/MAPREDUCE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5924: --- Fix Version/s: 2.5.0 Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING' Key: MAPREDUCE-5924 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5924 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Yesha Vora Assignee: Zhijie Shen Fix For: 2.5.0 Attachments: MAPREDUCE-5924.1.patch The Sort job over 1GB data failed with below error {code} 2014-06-09 09:15:38,746 INFO [Socket Reader #1 for port 63415] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1402304714683_0002 (auth:SIMPLE) 2014-06-09 09:15:38,750 INFO [IPC Server handler 13 on 63415] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit-pending state update from attempt_1402304714683_0002_r_15_1000 2014-06-09 09:15:38,751 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1402304714683_0002_r_15_1000 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1058) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:145) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1271) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1263) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:722) 2014-06-09 09:15:38,753 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1402304714683_0002Job Transitioned from RUNNING to ERROR {code} The JobHistory Url prints job state = ERROR -- This message was sent by Atlassian JIRA (v6.2#6252)
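The log shows a second commit-pending update for the same attempt arriving while that attempt is already in COMMIT_PENDING, which the state machine rejects and which drives the job to ERROR. A minimal Java sketch of one way to tolerate such a duplicate follows: register the repeated event as a self-transition. It assumes a simplified table-driven machine with hypothetical enums (`AttemptState`, `AttemptEvent`) and class (`AttemptStateMachine`); it is not Hadoop's `StateMachineFactory`, and it is not claimed to be the committed MAPREDUCE-5924 patch.

```java
import java.util.EnumMap;
import java.util.Map;

enum AttemptState { RUNNING, COMMIT_PENDING, SUCCEEDED }
enum AttemptEvent { TA_COMMIT_PENDING, TA_DONE }

// Minimal table-driven state machine; hypothetical, not Hadoop's StateMachineFactory.
class AttemptStateMachine {
    private final Map<AttemptState, Map<AttemptEvent, AttemptState>> table =
        new EnumMap<>(AttemptState.class);
    private AttemptState current = AttemptState.RUNNING;

    AttemptStateMachine() {
        add(AttemptState.RUNNING, AttemptEvent.TA_COMMIT_PENDING, AttemptState.COMMIT_PENDING);
        // Self-transition: a repeated commit-pending update is absorbed instead of
        // being rejected as an invalid transition.
        add(AttemptState.COMMIT_PENDING, AttemptEvent.TA_COMMIT_PENDING, AttemptState.COMMIT_PENDING);
        add(AttemptState.COMMIT_PENDING, AttemptEvent.TA_DONE, AttemptState.SUCCEEDED);
    }

    private void add(AttemptState from, AttemptEvent on, AttemptState to) {
        table.computeIfAbsent(from, k -> new EnumMap<AttemptEvent, AttemptState>(AttemptEvent.class))
             .put(on, to);
    }

    AttemptState handle(AttemptEvent event) {
        Map<AttemptEvent, AttemptState> row = table.get(current);
        AttemptState next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        AttemptStateMachine attempt = new AttemptStateMachine();
        attempt.handle(AttemptEvent.TA_COMMIT_PENDING);
        attempt.handle(AttemptEvent.TA_COMMIT_PENDING); // duplicate update: tolerated
        System.out.println(attempt.handle(AttemptEvent.TA_DONE)); // SUCCEEDED
    }
}
```

Events that are still not in the table keep failing loudly, so the self-transition only relaxes the one case the log shows.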
[jira] [Commented] (MAPREDUCE-5900) Container preemption interpreted as task failures and eventually job failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007680#comment-14007680 ] Jian He commented on MAPREDUCE-5900: Patch looks good overall. I think we need a test case to verify that the attempt actually moves to the killed state. Maybe we can combine the test cases from MAPREDUCE-5848? We can give credit to both. Container preemption interpreted as task failures and eventually job failures -- Key: MAPREDUCE-5900 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5900 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: applicationmaster, mr-am, mrv2 Affects Versions: 2.4.1 Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: MAPREDUCE-5900-1.patch, MAPREDUCE-5900-trunk-1.patch We have added a preemption exit code, and it needs to be incorporated: MR needs to recognize the special exit code value of -102 and interpret it as a container being killed instead of a container failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
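The requested behavior amounts to classifying a finished container by its exit status so that preemption (the special value -102 from the description) counts as a kill rather than a failure, and therefore does not use up task attempts. A minimal Java sketch follows. The names `AttemptOutcome`, `ExitStatusClassifier`, `PREEMPTED` and `countFailures` are assumptions for illustration, not the MR AM's actual code.

```java
// Hypothetical classification of a finished container into an attempt outcome.
enum AttemptOutcome { SUCCEEDED, KILLED, FAILED }

final class ExitStatusClassifier {
    static final int SUCCESS = 0;
    // -102 is the special preemption value quoted in the issue; the name is assumed.
    static final int PREEMPTED = -102;

    private ExitStatusClassifier() {}

    static AttemptOutcome classify(int exitStatus) {
        if (exitStatus == SUCCESS) {
            return AttemptOutcome.SUCCEEDED;
        }
        if (exitStatus == PREEMPTED) {
            return AttemptOutcome.KILLED; // preempted: not counted as a task failure
        }
        return AttemptOutcome.FAILED;
    }

    // Counts only genuine failures, e.g. to compare against a max-attempts limit.
    static int countFailures(int... exitStatuses) {
        int failures = 0;
        for (int status : exitStatuses) {
            if (classify(status) == AttemptOutcome.FAILED) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(classify(-102));                  // KILLED
        System.out.println(countFailures(-102, -102, 1, 0)); // 1
    }
}
```

With this split, an attempt preempted several times still has its full failure budget, which is the job-failure symptom the issue title describes.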
[jira] [Created] (MAPREDUCE-5838) TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently
Jian He created MAPREDUCE-5838: -- Summary: TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently Key: MAPREDUCE-5838 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5838 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5832) TestJobClient#testGetStagingAreaDir timeout sometimes on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5832: --- Status: Patch Available (was: Open) TestJobClient#testGetStagingAreaDir timeout sometimes on Windows Key: MAPREDUCE-5832 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5832 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: MAPREDUCE-5832.1.patch java.lang.Exception: test timed out after 1000 milliseconds at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:866) at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1258) at java.net.InetAddress.getLocalHost(InetAddress.java:1434) at sun.security.krb5.Config.getRealmFromDNS(Config.java:1174) at sun.security.krb5.Config.getDefaultRealm(Config.java:1081) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:75) at org.apache.hadoop.security.authentication.util.KerberosName.<clinit>(KerberosName.java:85) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:246) at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:233) at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:719) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:704) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:606) at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:81) at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75) at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:470) at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:460) at org.apache.hadoop.mapred.TestJobClient.testGetStagingAreaDir(TestJobClient.java:74)
[jira] [Created] (MAPREDUCE-5832) TestJobClient fails sometimes on Windows
Jian He created MAPREDUCE-5832: -- Summary: TestJobClient fails sometimes on Windows Key: MAPREDUCE-5832 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5832 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He
[jira] [Updated] (MAPREDUCE-5832) TestJobClient fails sometimes on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5832: --- Attachment: MAPREDUCE-5832.1.patch Increased the timeout; did not find a problem with the current test. TestJobClient fails sometimes on Windows Key: MAPREDUCE-5832 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5832 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: MAPREDUCE-5832.1.patch
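The stack trace for this issue shows the 1000 ms limit being consumed inside `InetAddress.getLocalHost()` while Kerberos resolves a default realm, so a slow name lookup on the test machine is enough to fail the test. The sketch below, assuming a simulated lookup (`Thread.sleep` as a hypothetical stand-in for the DNS call), shows the same work failing a tight deadline and passing a looser one. It approximates a per-test timeout with `ExecutorService` and `Future.get(timeout, unit)`; it does not reproduce JUnit's own timeout mechanism.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutSketch {

    // Returns true if the task completes within timeoutMillis, false if the
    // deadline passes first; roughly the check a per-test timeout performs.
    static boolean finishesWithin(Runnable task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> future = executor.submit(task);
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("task threw an exception", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow(); // interrupts the task if it is still running
        }
    }

    // Hypothetical stand-in for a slow first-time local-host name lookup.
    static Runnable slowLookup(long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    public static void main(String[] args) {
        // The same 1.5 s "lookup" misses a 1000 ms budget but fits in a 10 s one.
        System.out.println("1000 ms budget:  " + finishesWithin(slowLookup(1500), 1000));
        System.out.println("10000 ms budget: " + finishesWithin(slowLookup(1500), 10_000));
    }
}
```

Raising the limit, as the attached patch does, trades slower detection of genuine hangs for fewer spurious failures on hosts with slow name resolution.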
[jira] [Updated] (MAPREDUCE-5832) TestJobClient#testGetStagingAreaDir timeout sometimes on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5832: --- Description: java.lang.Exception: test timed out after 1000 milliseconds at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:866) at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1258) at java.net.InetAddress.getLocalHost(InetAddress.java:1434) at sun.security.krb5.Config.getRealmFromDNS(Config.java:1174) at sun.security.krb5.Config.getDefaultRealm(Config.java:1081) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:75) at org.apache.hadoop.security.authentication.util.KerberosName.<clinit>(KerberosName.java:85) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:246) at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:233) at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:719) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:704) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:606) at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:81) at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75) at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:470) at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:460) at org.apache.hadoop.mapred.TestJobClient.testGetStagingAreaDir(TestJobClient.java:74) TestJobClient#testGetStagingAreaDir timeout sometimes on Windows Key:
MAPREDUCE-5832 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5832 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: MAPREDUCE-5832.1.patch
[jira] [Updated] (MAPREDUCE-5832) TestJobClient#testGetStagingAreaDir timeout sometimes on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5832: --- Summary: TestJobClient#testGetStagingAreaDir timeout sometimes on Windows (was: TestJobClient fails sometimes on Windows) TestJobClient#testGetStagingAreaDir timeout sometimes on Windows Key: MAPREDUCE-5832 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5832 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: MAPREDUCE-5832.1.patch
[jira] [Commented] (MAPREDUCE-5655) Remote job submit from windows to a linux hadoop cluster fails due to wrong classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964967#comment-13964967 ] Jian He commented on MAPREDUCE-5655: Please refer to MAPREDUCE-4052 for the fix; the patch uploaded here is dead. Remote job submit from windows to a linux hadoop cluster fails due to wrong classpath - Key: MAPREDUCE-5655 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5655 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, job submission Affects Versions: 2.2.0, 2.3.0 Environment: Client machine is a Windows 7 box with Eclipse. Remote: a multi-node Hadoop cluster installed on Ubuntu boxes (any Linux). Reporter: Attila Pados Assignee: Joyoung Zhang Attachments: MRApps.patch, YARNRunner.patch I was trying to run a Java class in my Windows 7 client development environment that submits a job to the remote Hadoop cluster, runs a MapReduce job there, and then downloads the results back to the local machine. The general use case is to use Hadoop services from a web application installed on a non-cluster computer, or as part of a development environment. The problem was that the ApplicationMaster's startup shell script (launch_container.sh) was generated with a wrong CLASSPATH entry. These entries, together with the java process call at the bottom of the file, were generated in Windows style, using % as the shell variable marker and ; as the CLASSPATH delimiter. I tracked down the root cause and found that the MRApps.java and YARNRunner.java classes create these entries and pass them on to the ApplicationMaster, assuming that the OS running these classes matches the one running the ApplicationMaster. That is not the case: they run in two different JVMs, possibly on different operating systems, yet the strings are generated based on the client/submitter side's OS. I made some workaround changes to these two files so that I could launch my job; however, there may be more problems ahead.
Update, error message: 13/12/04 16:33:15 INFO mapreduce.Job: Job job_1386170530016_0001 failed with state FAILED due to: Application application_1386170530016_0001 failed 2 times due to AM Container for appattempt_1386170530016_0001_02 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: line 0: fg: no job control at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Update 2: It also requires adding the following property to mapred-site.xml (or mapred-default.xml) on the Windows box, so that the job launcher knows that the job runner will be Linux:
<property>
  <name>mapred.remote.os</name>
  <value>Linux</value>
  <description>Remote MapReduce framework's OS, can be either Linux or Windows</description>
</property>
Without this entry, the patched jar behaves the same as the unpatched one, so the property is required for the workaround to work!
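The reporter's workaround chooses the CLASSPATH separator and variable syntax from the configured remote OS (`mapred.remote.os`, the property read by the attached patch) instead of from the client JVM. A minimal sketch of that idea follows; `RemoteClasspath` and its methods are hypothetical names, and the classpath entries are illustrative only.

```java
import java.util.Arrays;
import java.util.List;

public class RemoteClasspath {

    // OS of the cluster node that will run the ApplicationMaster.
    enum TargetOs { LINUX, WINDOWS }

    // Resolves the target OS from a mapred.remote.os value ("Linux" or "Windows");
    // null means the property is absent.
    static TargetOs resolveTargetOs(String configured) {
        if (configured == null) {
            // No property: fall back to the client's own OS, i.e. the original behavior.
            return System.getProperty("os.name", "").startsWith("Windows")
                    ? TargetOs.WINDOWS : TargetOs.LINUX;
        }
        if (configured.equalsIgnoreCase("Windows")) {
            return TargetOs.WINDOWS;
        }
        if (configured.equalsIgnoreCase("Linux")) {
            return TargetOs.LINUX;
        }
        throw new IllegalArgumentException("Expected Linux or Windows, got: " + configured);
    }

    // Environment-variable reference in the target shell's syntax.
    static String envRef(String name, TargetOs os) {
        return os == TargetOs.WINDOWS ? "%" + name + "%" : "$" + name;
    }

    // Joins entries with the target OS's separator instead of the client's path.separator.
    static String joinClasspath(List<String> entries, TargetOs os) {
        return String.join(os == TargetOs.WINDOWS ? ";" : ":", entries);
    }

    public static void main(String[] args) {
        TargetOs os = resolveTargetOs("Linux"); // as if read from mapred.remote.os
        List<String> entries = Arrays.asList(
                envRef("PWD", os),
                envRef("PWD", os) + "/*",
                envRef("HADOOP_CONF_DIR", os));
        System.out.println(joinClasspath(entries, os)); // prints $PWD:$PWD/*:$HADOOP_CONF_DIR
    }
}
```

Falling back to the client's `os.name` when the property is absent mirrors the reporter's observation that, without the entry, the patched jar behaves like the unpatched one.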
[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd
[ https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5818: --- Status: Patch Available (was: Open) hsadmin cmd is missing in mapred.cmd Key: MAPREDUCE-5818 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: MAPREDUCE-5818.1.patch, MAPREDUCE-5818.3.patch
[jira] [Created] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd
Jian He created MAPREDUCE-5818: -- Summary: hsadmin cmd is missing in mapred.cmd Key: MAPREDUCE-5818 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He
[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd
[ https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5818: --- Attachment: MAPREDUCE-5818.1.patch Simple patch to add the missing command. hsadmin cmd is missing in mapred.cmd Key: MAPREDUCE-5818 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: MAPREDUCE-5818.1.patch
[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd
[ https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5818: --- Status: Patch Available (was: Open) hsadmin cmd is missing in mapred.cmd Key: MAPREDUCE-5818 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: MAPREDUCE-5818.1.patch
[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd
[ https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5818: --- Status: Open (was: Patch Available) hsadmin cmd is missing in mapred.cmd Key: MAPREDUCE-5818 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: MAPREDUCE-5818.1.patch
[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd
[ https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5818: --- Attachment: MAPREDUCE-5818.2.patch hsadmin cmd is missing in mapred.cmd Key: MAPREDUCE-5818 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: MAPREDUCE-5818.1.patch, MAPREDUCE-5818.2.patch
[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd
[ https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5818: --- Status: Patch Available (was: Open) hsadmin cmd is missing in mapred.cmd Key: MAPREDUCE-5818 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: MAPREDUCE-5818.1.patch, MAPREDUCE-5818.2.patch
[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd
[ https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5818: --- Status: Open (was: Patch Available) hsadmin cmd is missing in mapred.cmd Key: MAPREDUCE-5818 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: MAPREDUCE-5818.1.patch
[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd
[ https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5818: --- Attachment: (was: MAPREDUCE-5818.2.patch) hsadmin cmd is missing in mapred.cmd Key: MAPREDUCE-5818 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: MAPREDUCE-5818.1.patch
[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd
[ https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5818: --- Attachment: MAPREDUCE-5818.3.patch hsadmin cmd is missing in mapred.cmd Key: MAPREDUCE-5818 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: MAPREDUCE-5818.1.patch, MAPREDUCE-5818.3.patch
[jira] [Commented] (MAPREDUCE-5816) TestMRAppMaster fails in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954861#comment-13954861 ] Jian He commented on MAPREDUCE-5816: Dup of MAPREDUCE-5815? TestMRAppMaster fails in trunk -- Key: MAPREDUCE-5816 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5816 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Ted Yu As can be seen from https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1741/console: {code} Tests in error: TestMRAppMaster.testMRAppMasterMidLock:163 » NullPointer TestMRAppMaster.testMRAppMasterSuccessLock:202 » NullPointer TestMRAppMaster.testMRAppMasterFailLock:241 » NullPointer {code} I got the following locally: {code} Tests run: 7, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 2.964 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster testMRAppMasterMidLock(org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster) Time elapsed: 0.963 sec ERROR! java.lang.NullPointerException: null at org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.escapeDelimiters(FileNameIndexUtils.java:275) at org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.getDoneFileName(FileNameIndexUtils.java:97) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processDoneFiles(JobHistoryEventHandler.java:743) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1491) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1099) at
org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster.testMRAppMasterMidLock(TestMRAppMaster.java:163) testMRAppMasterSuccessLock(org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster) Time elapsed: 0.25 sec ERROR! java.lang.NullPointerException: null at org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.escapeDelimiters(FileNameIndexUtils.java:275) at org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.getDoneFileName(FileNameIndexUtils.java:97) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processDoneFiles(JobHistoryEventHandler.java:743) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1491) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1099) at org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster.testMRAppMasterSuccessLock(TestMRAppMaster.java:202) testMRAppMasterFailLock(org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster) Time elapsed: 0.232 sec ERROR! 
java.lang.NullPointerException: null at org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.escapeDelimiters(FileNameIndexUtils.java:275) at org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.getDoneFileName(FileNameIndexUtils.java:97) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processDoneFiles(JobHistoryEventHandler.java:743) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1491) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1099) at org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster.testMRAppMasterFailLock(TestMRAppMaster.java:241) {code}
[jira] [Commented] (MAPREDUCE-5397) AM crashes because Webapp failed to start on multi node cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949654#comment-13949654 ] Jian He commented on MAPREDUCE-5397: As I recall, I submitted a job and the first few attempts (2 or 3) of the job all failed for the reason above; eventually the last attempt passed. But after I made a clean build and redeployed the cluster, I couldn't reproduce it anymore. Feel free to reopen this if necessary, and please share some logs. Thanks. AM crashes because Webapp failed to start on multi node cluster --- Key: MAPREDUCE-5397 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5397 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: log.txt I set up a 12-node cluster and tried submitting jobs, but got this exception. However, the job is able to succeed after the AM crashes and retries a few times (2 or 3). {code} 2013-07-12 18:56:28,438 INFO [main] org.mortbay.log: Extract jar:file:/grid/0/dev/jhe/hadoop-2.1.0-beta/share/hadoop/yarn/hadoop-yarn-common-2.1.0-beta.jar!/webapps/mapreduce to /tmp/Jetty_0_0_0_0_43554_mapreduceljbmlg/webapp 2013-07-12 18:56:28,528 WARN [main] org.mortbay.log: Failed startup of context org.mortbay.jetty.webapp.WebAppContext@2726b2{/,jar:file:/grid/0/dev/jhe/hadoop-2.1.0-beta/share/hadoop/yarn/hadoop-yarn-common-2.1.0-beta.jar!/webapps/mapreduce} java.io.FileNotFoundException: /tmp/Jetty_0_0_0_0_43554_mapreduceljbmlg/webapp/webapps/mapreduce/.keep (No such file or directory) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.<init>(FileOutputStream.java:194) at java.io.FileOutputStream.<init>(FileOutputStream.java:145) at org.mortbay.resource.JarResource.extract(JarResource.java:215) at org.mortbay.jetty.webapp.WebAppContext.resolveWebApp(WebAppContext.java:974) at org.mortbay.jetty.webapp.WebAppContext.getWebInf(WebAppContext.java:832) at
org.mortbay.jetty.webapp.WebInfConfiguration.configureClassLoader(WebInfConfiguration.java:62) at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:489) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130) at org.mortbay.jetty.Server.doStart(Server.java:224) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.apache.hadoop.http.HttpServer.start(HttpServer.java:684) at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:211) at org.apache.hadoop.mapreduce.v2.app.client.MRClientService.serviceStart(MRClientService.java:134) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1019) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1394) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1390) {code}
[jira] [Updated] (MAPREDUCE-4052) Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-4052: --- Status: Open (was: Patch Available) Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster. --- Key: MAPREDUCE-4052 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4052 Project: Hadoop Map/Reduce Issue Type: Bug Components: job submission Affects Versions: 2.2.0, 0.23.1 Environment: client on Windows, the cluster on SUSE Reporter: xieguiming Assignee: Jian He Attachments: MAPREDUCE-4052-0.patch, MAPREDUCE-4052.1.patch, MAPREDUCE-4052.2.patch, MAPREDUCE-4052.3.patch, MAPREDUCE-4052.4.patch, MAPREDUCE-4052.5.patch, MAPREDUCE-4052.6.patch, MAPREDUCE-4052.7.patch, MAPREDUCE-4052.patch When I use Eclipse on Windows to submit the job, the ApplicationMaster throws this exception: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/v2/app/MRAppMaster Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.v2.app.MRAppMaster at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:248) Could not find the main class: org.apache.hadoop.mapreduce.v2.app.MRAppMaster. Program will exit. The reason is that the addToEnvironment function in class Apps uses private static final String SYSTEM_PATH_SEPARATOR = System.getProperty("path.separator"); which makes the MR ApplicationMaster classpath use the ; separator. I suggest that the NodeManager do the replacement.
[jira] [Updated] (MAPREDUCE-4052) Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-4052: --- Status: Patch Available (was: Open) Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster. --- Key: MAPREDUCE-4052 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4052 Project: Hadoop Map/Reduce Issue Type: Bug Components: job submission Affects Versions: 2.2.0, 0.23.1 Environment: client on Windows, the cluster on SUSE Reporter: xieguiming Assignee: Jian He Attachments: MAPREDUCE-4052-0.patch, MAPREDUCE-4052.1.patch, MAPREDUCE-4052.2.patch, MAPREDUCE-4052.3.patch, MAPREDUCE-4052.4.patch, MAPREDUCE-4052.5.patch, MAPREDUCE-4052.6.patch, MAPREDUCE-4052.7.patch, MAPREDUCE-4052.8.patch, MAPREDUCE-4052.patch When I use Eclipse on Windows to submit the job, the ApplicationMaster throws this exception: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/v2/app/MRAppMaster Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.v2.app.MRAppMaster at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:248) Could not find the main class: org.apache.hadoop.mapreduce.v2.app.MRAppMaster. Program will exit. The reason is that the addToEnvironment function in class Apps uses private static final String SYSTEM_PATH_SEPARATOR = System.getProperty("path.separator"); which makes the MR ApplicationMaster classpath use the ; separator. I suggest that the NodeManager do the replacement.
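The reporter suggests letting the NodeManager do the replacement. One way to sketch that idea, assuming the client writes hypothetical OS-neutral tokens (`{{NAME}}` for variables and `<SEP>` for the classpath separator) that the node rewrites into its own shell syntax. `NodeSideExpansion` and `expandForNode` are illustrative names, not Hadoop's API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NodeSideExpansion {

    // Hypothetical OS-neutral classpath separator written by the client.
    static final String SEP = "<SEP>";

    // Hypothetical OS-neutral variable reference of the form {{NAME}}.
    static final Pattern VAR = Pattern.compile("\\{\\{([A-Za-z_][A-Za-z0-9_]*)\\}\\}");

    // Rewrites a neutral classpath into the syntax of the node launching the container.
    static String expandForNode(String neutral, boolean nodeIsWindows) {
        Matcher m = VAR.matcher(neutral);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String ref = nodeIsWindows ? "%" + name + "%" : "$" + name;
            // quoteReplacement keeps '$' literal instead of treating it as a group reference
            m.appendReplacement(out, Matcher.quoteReplacement(ref));
        }
        m.appendTail(out);
        return out.toString().replace(SEP, nodeIsWindows ? ";" : ":");
    }

    public static void main(String[] args) {
        // What a client on any OS would send, instead of baking in its own path.separator.
        String neutral = "{{PWD}}" + SEP + "{{PWD}}/*" + SEP + "{{HADOOP_CONF_DIR}}";
        // On the node, the target syntax comes from the node's own JVM.
        boolean nodeIsWindows = System.getProperty("os.name", "").startsWith("Windows");
        System.out.println(expandForNode(neutral, nodeIsWindows));
    }
}
```

Because the expansion runs where the container is launched, the client's `path.separator` never reaches the ApplicationMaster's launch script.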