[jira] [Commented] (MAPREDUCE-6449) MR Code should not throw and catch YarnRuntimeException to communicate internal exceptions
[ https://issues.apache.org/jira/browse/MAPREDUCE-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904725#comment-14904725 ] Neelesh Srinivas Salian commented on MAPREDUCE-6449:

Started on version 1 of this patch. Questions to confirm my understanding:

1) As mentioned in MAPREDUCE-6439, I agree there are 3 files in MR with a catch (YarnRuntimeException e) {} block that need to be addressed in this JIRA:
~/mapreduce/v2/app/MRAppMaster.java
~/mapreduce/v2/hs/webapp/HsWebServices.java
~/mapreduce/v2/app/webapp/AMWebServices.java
The other occurrences are under YARN, which I believe is covered by YARN-4021.

2) The objective of this JIRA is to distinguish exceptions raised remotely from those raised locally, and to wrap them in a unified MR exception that also preserves backwards compatibility.

3) I observed that the other files in the mapred modules each take specific actions in the catch block, e.g. in TestRecordFactory:
catch (YarnRuntimeException e) {
  e.printStackTrace();
  Assert.fail("Failed to crete record");
}
So the idea in this JIRA is simply to map YarnRuntimeException to a single MR wrapper exception? There are also instances where a YarnException is expected to be caught, as in TestLocalContainerAllocator:
catch (YarnException e) {
  // YarnException is expected
}
Please correct/augment this comment to help confirm my understanding. Thank you.

> MR Code should not throw and catch YarnRuntimeException to communicate
> internal exceptions
> --
>
> Key: MAPREDUCE-6449
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6449
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Anubhav Dhoot
> Assignee: Neelesh Srinivas Salian
>
> In discussion of MAPREDUCE-6439 we discussed how throwing and catching
> YarnRuntimeException in MR code is incorrect and we should instead use some
> MR specific exception.
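The wrapping pattern under discussion can be sketched as follows. This is a minimal illustration only: the class name MRAppException and the method initAtBoundary are hypothetical assumptions, not names from any patch, and a plain RuntimeException stands in for YarnRuntimeException so the sketch is self-contained.

```java
// Hypothetical sketch: an MR-specific unchecked exception that wraps
// exceptions crossing the YARN/MR boundary. MRAppException and
// initAtBoundary are illustrative names, not the eventual patch.
class MRAppException extends RuntimeException {
  MRAppException(String message, Throwable cause) {
    super(message, cause);
  }

  // The rethrow pattern: catch the YARN-side exception at the boundary
  // and surface an MR-specific type to callers, preserving the cause.
  static void initAtBoundary(Runnable yarnCall) {
    try {
      yarnCall.run();
    } catch (RuntimeException e) { // stands in for YarnRuntimeException
      throw new MRAppException("error during init", e);
    }
  }
}
```

Callers would then catch MRAppException instead of the YARN type, and the original cause remains reachable via getCause() for diagnostics.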
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904854#comment-14904854 ] Arun Suresh commented on MAPREDUCE-6484:

Thanks for the patch, [~zxu]. Makes sense. Minor nit: instead of using {{yarnConf.getStringCollection()}} and then doing {{rmIds.toArray()}}, you can probably just use {{yarnConf.getStrings()}}, which returns an array itself. +1, pending that.

> Yarn Client uses local address instead of RM address as token renewer in a
> secure cluster when RM HA is enabled.
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: client, security
> Reporter: zhihai xu
> Assignee: zhihai xu
> Attachments: YARN-4187.000.patch
>
>
> Yarn Client uses local address instead of RM address as token renewer in a
> secure cluster when RM HA is enabled. This will cause HDFS token renew
> failure for renewer "nobody" if the rules from
> {{hadoop.security.auth_to_local}} exclude the client address in HDFS
> {{DelegationTokenIdentifier}}.
> The reason why the local address is returned is: when HA is enabled,
> "yarn.resourcemanager.address" may not be set. If
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal",
> the default address "0.0.0.0:8032" will be used. Based on the following code
> in SecurityUtil.java, the local address will be used to replace "0.0.0.0".
> {code}
> private static String replacePattern(String[] components, String hostname)
>     throws IOException {
>   String fqdn = hostname;
>   if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>     fqdn = getLocalHostName();
>   }
>   return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + components[2];
> }
>
> static String getLocalHostName() throws UnknownHostException {
>   return InetAddress.getLocalHost().getCanonicalHostName();
> }
>
> public static String getServerPrincipal(String principalConfig,
>     InetAddress addr) throws IOException {
>   String[] components = getComponents(principalConfig);
>   if (components == null || components.length != 3
>       || !components[1].equals(HOSTNAME_PATTERN)) {
>     return principalConfig;
>   } else {
>     if (addr == null) {
>       throw new IOException("Can't replace " + HOSTNAME_PATTERN
>           + " pattern since client address is null");
>     }
>     return replacePattern(components, addr.getCanonicalHostName());
>   }
> }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: PriviledgedActionException as:t...@example.com (auth:KERBEROS) cause:java.io.IOException: Failed to run job : yarn tries to renew a token with renewer nobody
> at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with renewer nobody
> at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at
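The fallback in the quoted replacePattern is the crux: when the host part resolves to "0.0.0.0", the *client's* local hostname is substituted, which is how the renewer ends up as a local address. A self-contained restatement of that logic (Hadoop types removed, checked exception wrapped, so it runs standalone):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

// Standalone mirror of the SecurityUtil.replacePattern logic quoted above,
// trimmed of Hadoop dependencies for illustration.
class PrincipalPattern {
  static String replacePattern(String[] components, String hostname) {
    String fqdn = hostname;
    if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
      // The fallback that substitutes the *client's* local hostname,
      // producing the wrong token renewer described in this issue.
      try {
        fqdn = InetAddress.getLocalHost().getCanonicalHostName();
      } catch (UnknownHostException e) {
        throw new RuntimeException(e);
      }
    }
    // principal is assembled as service "/" host "@" realm
    return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + components[2];
  }
}
```

With a real RM hostname the substitution is benign; only the unset-address default "0.0.0.0:8032" triggers the local-host fallback.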
[jira] [Created] (MAPREDUCE-6489) Fail fast rogue tasks that write too much to local disk
Maysam Yabandeh created MAPREDUCE-6489:
--
Summary: Fail fast rogue tasks that write too much to local disk
Key: MAPREDUCE-6489
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6489
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 2.7.1
Reporter: Maysam Yabandeh

Tasks of rogue jobs can write too much to local disk, negatively affecting jobs running in collocated containers. Ideally YARN will be able to limit the amount of local disk used by each task: YARN-4011. Until then, the MapReduce task can fail fast if it is writing too much (above a configured threshold) to local disk. As we discussed [here|https://issues.apache.org/jira/browse/YARN-4011?focusedCommentId=14902750=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14902750], the suggested approach is that the MapReduce task checks the BYTES_WRITTEN counter for the local disk and throws an exception when it goes beyond a configured value. It is true that bytes written is larger than the actual disk space used, but detecting a rogue task does not require the exact value, and a very large number of bytes written to local disk is a good indicator that the task is misbehaving.
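The suggested check reduces to a single threshold comparison; a sketch is below. Names, the exception type, and the disabling convention are assumptions for illustration: the real patch would read the limit from job configuration and the byte count from the task's BYTES_WRITTEN counter.

```java
// Illustrative sketch of the proposed fail-fast disk limit. Only the
// thresholding idea comes from the discussion above; the class, method,
// and message wording are hypothetical.
class LocalDiskGuard {
  /**
   * Throws if bytesWritten exceeds limitBytes.
   * A non-positive limit disables the check (assumed convention).
   */
  static void check(long bytesWritten, long limitBytes) {
    if (limitBytes > 0 && bytesWritten > limitBytes) {
      throw new IllegalStateException("Task wrote " + bytesWritten
          + " bytes to local disk, over the configured limit of " + limitBytes);
    }
  }
}
```

As the description notes, BYTES_WRITTEN over-counts actual disk usage, but for catching a rogue task a generous upper bound is sufficient.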
[jira] [Updated] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
[ https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob updated MAPREDUCE-6485:
---
Description:
The scenario is like this: with mapreduce.job.reduce.slowstart.completedmaps=0.8 configured, reduces will take resources and start to run before all of the maps have finished. It can then happen that all of the resources are taken up by running reduces while one map is still unfinished.
Under this condition, the last map has two task attempts.
The first attempt was killed due to timeout (mapreduce.task.timeout), and its state transitioned from RUNNING to FAIL_CONTAINER_CLEANUP and then to FAILED, but the failed map attempt is not restarted because a speculative map attempt is still in progress.
The second attempt, started because map task speculation is enabled, is pending in the UNASSIGNED state because no resources are available. But the second map attempt's request has a lower priority than the reduces, so preemption does not happen.
As a result, none of the reduces can finish because one map is left, and the last map hangs because no resources are available, so the job never finishes.

was:
The scenario is like this: with mapreduce.job.reduce.slowstart.completedmaps=0.8 configured, reduces will take resources and start to run before all of the maps have finished. It can then happen that all of the resources are taken up by running reduces while one map is still unfinished.
Under this condition, the last map has two task attempts.
The first attempt was killed due to timeout (mapreduce.task.timeout), and its state transitioned from RUNNING to FAIL_CONTAINER_CLEANUP, so the failed map attempt would not be started.
The second attempt, started because map task speculation is enabled, is pending in the UNASSIGNED state because no resources are available. But the second map attempt's request has a lower priority than the reduces, so preemption does not happen.
As a result, none of the reduces can finish because one map is left, and the last map hangs because no resources are available, so the job never finishes.

> MR job hanged forever because all resources are taken up by reducers and the
> last map attempt never get resource to run
> ---
>
> Key: MAPREDUCE-6485
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6485
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster
> Affects Versions: 3.0.0, 2.4.1, 2.6.0, 2.7.1
> Reporter: Bob
> Assignee: Xianyin Xin
> Priority: Critical
> Attachments: MAPREDUCE-6485.001.patch
>
>
> The scenario is like this: with mapreduce.job.reduce.slowstart.completedmaps=0.8
> configured, reduces will take resources and start to run before all of the maps
> have finished. It can then happen that all of the resources are taken up by
> running reduces while one map is still unfinished.
> Under this condition, the last map has two task attempts.
> The first attempt was killed due to timeout (mapreduce.task.timeout), and its
> state transitioned from RUNNING to FAIL_CONTAINER_CLEANUP and then to FAILED,
> but the failed map attempt is not restarted because a speculative map attempt
> is still in progress.
> The second attempt, started because map task speculation is enabled, is pending
> in the UNASSIGNED state because no resources are available. But the second map
> attempt's request has a lower priority than the reduces, so preemption does
> not happen.
> As a result, none of the reduces can finish because one map is left, and the
> last map hangs because no resources are available, so the job never finishes.
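The slowstart trigger behind this scenario reduces to a simple ratio check, sketched below. This is a simplification for illustration (the real AM logic also ramps reducers up and down); only the threshold semantics of mapreduce.job.reduce.slowstart.completedmaps are assumed from the description.

```java
// Simplified sketch of the slowstart condition: reduces become schedulable
// once the completed-map fraction reaches
// mapreduce.job.reduce.slowstart.completedmaps.
class Slowstart {
  static boolean reducesCanStart(int completedMaps, int totalMaps, float slowstart) {
    return totalMaps > 0 && (float) completedMaps / totalMaps >= slowstart;
  }
}
```

With slowstart=0.8 and 10 maps, reduces are eligible after the 8th map completes, which is how the remaining maps can find the cluster already full of reducers.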
[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905405#comment-14905405 ] zhihai xu commented on MAPREDUCE-6484:

thanks for the review [~asuresh]! That is a good suggestion. I attached a new patch MAPREDUCE-6484.001.patch, which addressed your comment.

> Yarn Client uses local address instead of RM address as token renewer in a
> secure cluster when RM HA is enabled.
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: client, security
> Reporter: zhihai xu
> Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6484:
-
Attachment: MAPREDUCE-6484.001.patch

> Yarn Client uses local address instead of RM address as token renewer in a
> secure cluster when RM HA is enabled.
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: client, security
> Reporter: zhihai xu
> Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
[jira] [Updated] (MAPREDUCE-6334) Fetcher#copyMapOutput is leaking usedMemory upon IOException during InMemoryMapOutput shuffle handler
[ https://issues.apache.org/jira/browse/MAPREDUCE-6334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-6334:
---
Target Version/s: 2.7.1, 2.6.2 (was: 2.7.1)

Targeting 2.6.2 per Eric's comment on the mailing lists.

> Fetcher#copyMapOutput is leaking usedMemory upon IOException during
> InMemoryMapOutput shuffle handler
> -
>
> Key: MAPREDUCE-6334
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6334
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: Eric Payne
> Assignee: Eric Payne
> Priority: Blocker
> Fix For: 2.7.1
>
> Attachments: MAPREDUCE-6334.001.patch, MAPREDUCE-6334.002.patch
>
>
> We are seeing this happen when
> - an NM's disk goes bad during the creation of map output(s)
> - the reducer's fetcher can read the shuffle header and reserve the memory
> - but gets an IOException when trying to shuffle for InMemoryMapOutput
> - shuffle fetch retry is enabled
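The leak pattern described above can be modeled generically: memory is reserved after reading the shuffle header, and if the copy then fails with an IOException, the reservation must be handed back, otherwise each retry reserves on top of the leaked amount. This is a simplified model with illustrative names, not the actual Fetcher code.

```java
// Simplified model of the reserve/copy flow. The counter stands in for the
// shuffle's usedMemory accounting; all names are illustrative.
class MemoryAccounting {
  long usedMemory = 0;

  void reserve(long bytes) { usedMemory += bytes; }
  void release(long bytes) { usedMemory -= bytes; }

  /** Copy that may fail; without the catch-and-release, a failed copy leaks the reservation. */
  void copyMapOutput(long size, boolean failCopy) {
    reserve(size); // reserved after reading the shuffle header
    try {
      if (failCopy) {
        throw new RuntimeException("simulated IOException during in-memory shuffle");
      }
      // on success the reservation is handed off to the merge phase
    } catch (RuntimeException e) {
      release(size); // the fix being illustrated: release on the error path
    }
  }
}
```

With fetch retry enabled, a leaked reservation compounds across attempts, eventually starving the in-memory shuffle, which matches the symptom in the summary.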
[jira] [Commented] (MAPREDUCE-5935) TestMRServerPorts#testTaskTrackerPorts fails with null pointer exception.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905183#comment-14905183 ] Jorge Gabriel Siqueira commented on MAPREDUCE-5935:

Is there a possibility that jt.fs is initialized after your check? If so, could that cause a problem (because I think it would then never be closed)?

> TestMRServerPorts#testTaskTrackerPorts fails with null pointer exception.
> -
>
> Key: MAPREDUCE-5935
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5935
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: test
> Affects Versions: 1.1.1, 1.2.0, 1.2.1
> Reporter: Jinghui Wang
> Assignee: Jinghui Wang
> Attachments: MAPREDUCE-5935.patch
>
>
> The exception is caused by a race condition. The test case calls
> JobTracker.offerService in a separate thread, JTRunner, which initializes the
> class variable FileSystem fs. In the main thread it tries to close fs in the
> finally block, but at that point jt.fs might still not be
> initialized, thus causing the NPE.
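The guard under discussion can be illustrated with a minimal model (names loosely mirror the test's jt.fs field; the Resource interface and TrackerModel class are assumptions, not the patch). Note it only prevents the NPE; it does not close the race Jorge asks about, since fs could still be initialized just after the check.

```java
// Simplified model of the race: another thread may or may not have set fs
// by the time the main thread's cleanup runs, so the close must be guarded.
interface Resource {
  void close(); // stands in for FileSystem.close(), minus checked exceptions
}

class TrackerModel {
  volatile Resource fs; // initialized asynchronously; may still be null

  void shutdown() {
    // Null-guarding here avoids the NPE described in the report.
    if (fs != null) {
      fs.close();
    }
  }
}
```

A fuller fix would synchronize the handoff (e.g. wait for initialization before cleanup) rather than merely tolerating a null field.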
[jira] [Commented] (MAPREDUCE-5935) TestMRServerPorts#testTaskTrackerPorts fails with null pointer exception.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905188#comment-14905188 ] Jorge Gabriel Siqueira commented on MAPREDUCE-5935:

Is this issue still reproducible?

> TestMRServerPorts#testTaskTrackerPorts fails with null pointer exception.
> -
>
> Key: MAPREDUCE-5935
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5935
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: test
> Affects Versions: 1.1.1, 1.2.0, 1.2.1
> Reporter: Jinghui Wang
> Assignee: Jinghui Wang
> Attachments: MAPREDUCE-5935.patch
[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6484:
-
Attachment: MAPREDUCE-6484.001.patch

> Yarn Client uses local address instead of RM address as token renewer in a
> secure cluster when RM HA is enabled.
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: client, security
> Reporter: zhihai xu
> Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6484: - Attachment: (was: MAPREDUCE-6484.001.patch) > Yarn Client uses local address instead of RM address as token renewer in a > secure cluster when RM HA is enabled. > > > Key: MAPREDUCE-6484 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, security >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch > > > Yarn Client uses local address instead of RM address as token renewer in a > secure cluster when RM HA is enabled. This will cause HDFS token renew > failure for renewer "nobody" if the rules from > {{hadoop.security.auth_to_local}} exclude the client address in HDFS > {{DelegationTokenIdentifier}}. > The reason why the local address is returned is: When HA is enabled, > "yarn.resourcemanager.address" may not be set, if > {{HOSTNAME_PATTERN}}("_HOST") is used in "yarn.resourcemanager.principal", > the default address "0.0.0.0:8032" will be used, Based on the following code > at SecurityUtil.java, the local address will be used to replace "0.0.0.0". 
> {code}
> private static String replacePattern(String[] components, String hostname)
>     throws IOException {
>   String fqdn = hostname;
>   if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>     fqdn = getLocalHostName();
>   }
>   return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" +
>       components[2];
> }
>
> static String getLocalHostName() throws UnknownHostException {
>   return InetAddress.getLocalHost().getCanonicalHostName();
> }
>
> public static String getServerPrincipal(String principalConfig,
>     InetAddress addr) throws IOException {
>   String[] components = getComponents(principalConfig);
>   if (components == null || components.length != 3
>       || !components[1].equals(HOSTNAME_PATTERN)) {
>     return principalConfig;
>   } else {
>     if (addr == null) {
>       throw new IOException("Can't replace " + HOSTNAME_PATTERN
>           + " pattern since client address is null");
>     }
>     return replacePattern(components, addr.getCanonicalHostName());
>   }
> }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: PriviledgedActionException as:t...@example.com (auth:KERBEROS) cause:java.io.IOException: Failed to run job : yarn tries to renew a token with renewer nobody
>     at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
>     at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with renewer nobody
>     at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
>     at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
>     at
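The failure mode quoted above can be sketched in isolation. The following is a hypothetical standalone re-implementation of the substitution logic (not the actual Hadoop SecurityUtil class): when the configured RM address resolves to "0.0.0.0", the "_HOST" pattern is filled with the local hostname, so the derived principal silently picks up the client machine's name instead of the RM's.

```java
import java.net.InetAddress;
import java.util.Locale;

public class HostPatternSketch {

    // Mirrors the replacePattern logic quoted above: falls back to the
    // local host when the configured address is unset or 0.0.0.0.
    static String replacePattern(String[] components, String hostname)
            throws Exception {
        String fqdn = hostname;
        if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
            // This fallback is the source of the bug described in the issue.
            fqdn = InetAddress.getLocalHost().getCanonicalHostName();
        }
        return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + components[2];
    }

    public static void main(String[] args) throws Exception {
        // components = {service, hostname-pattern, realm}, e.g. from a
        // principal like rm/_HOST@EXAMPLE.COM (illustrative values).
        String[] components = {"rm", "_HOST", "EXAMPLE.COM"};

        // Healthy case: a concrete RM hostname is substituted as-is.
        System.out.println(replacePattern(components, "rm1.example.com"));
        // prints rm/rm1.example.com@EXAMPLE.COM

        // RM HA case with yarn.resourcemanager.address unset: 0.0.0.0 is
        // passed in, so the principal carries this machine's hostname.
        System.out.println(replacePattern(components, "0.0.0.0"));
    }
}
```

The second call's output depends on the machine it runs on, which is exactly why the renewer computed by the client is wrong from the NameNode's point of view.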
[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905790#comment-14905790 ] Hadoop QA commented on MAPREDUCE-6484:
--
| (/) *{color:green}+1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 5s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 11s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests | 1m 47s | Tests passed in hadoop-mapreduce-client-core. |
| | | | 41m 50s | |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12762046/MAPREDUCE-6484.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 06d1c90 |
| hadoop-mapreduce-client-core test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6011/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6011/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6011/console |
This message was automatically generated.
[jira] [Commented] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
[ https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904277#comment-14904277 ] Rohith Sharma K S commented on MAPREDUCE-6485:
--
Looked deeper into the reducer preemption code and found that even if *headroom is available for assigning the map request*, {{pendingReducers}} is zero, and {{scheduledReducers}} is large, *neither preemption will be triggered nor will ramping down of reducers happen*. The RM always allocates containers to reducers, so if {{scheduledReducers}} is large, at some point the cluster resources are fully acquired by reducers. Say reducer memory is 5GB and mapper memory is 4GB: 4GB of headroom is available, into which 1 mapper could be assigned. But since more reducer requests are waiting to be assigned, the RM always skips the assignment, because the reducer capacity of 5GB is greater than the 4GB headroom.

> MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
> ---
>
> Key: MAPREDUCE-6485
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6485
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster
> Affects Versions: 3.0.0, 2.4.1, 2.6.0, 2.7.1
> Reporter: Bob
> Assignee: Xianyin Xin
> Priority: Critical
> Attachments: MAPREDUCE-6485.001.patch
>
> The scenario is like this:
> With mapreduce.job.reduce.slowstart.completedmaps=0.8 configured, reduces will take resources and start to run before all the maps have finished.
> But it can happen that when all the resources are taken up by running reduces, there is still one map not finished.
> Under this condition, the last map has two task attempts.
> The first attempt was killed due to timeout (mapreduce.task.timeout), and its state transitioned from RUNNING to FAIL_CONTAINER_CLEANUP and then to FAILED; the failed map attempt would not be restarted because there was still one speculative map attempt in progress.
> The second attempt, started because map task speculation is enabled, is pending in the UNASSIGNED state because no resource is available.
> But the second map attempt's request has lower priority than the reduces, so preemption would not happen.
> As a result, none of the reduces can finish because one map is left, and the last map hangs because no resource is available, so the job never finishes.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
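The starvation described in the comment and description above can be sketched with the numbers given. The following is an illustrative toy model, not actual RM scheduler code: because reducer requests sit at higher priority, a 5GB reducer request is tried first against 4GB of headroom, never fits, and keeps the 4GB map request from ever being reached.

```java
public class HeadroomStarvationSketch {

    // Simplified one-step scheduler: returns which request, if any,
    // would be served given the headroom and queued requests.
    static String assign(int headroomGB, int reducerGB, int mapperGB,
                         boolean reducerRequestsPending) {
        if (reducerRequestsPending) {
            // Reducer requests have higher priority, so they are tried
            // first and, while pending, block the map request entirely.
            return reducerGB <= headroomGB ? "reducer" : "none";
        }
        return mapperGB <= headroomGB ? "map" : "none";
    }

    public static void main(String[] args) {
        // 4GB headroom, 5GB reducers, 4GB mapper, reducer requests queued:
        // nothing is ever assigned, even though the mapper would fit.
        System.out.println(assign(4, 5, 4, true));   // prints none

        // If the reducer requests were ramped down or preempted, the
        // map request would fit immediately.
        System.out.println(assign(4, 5, 4, false));  // prints map
    }
}
```

This is the gap the comment points at: with {{pendingReducers}} at zero, neither preemption nor ramp-down fires, so the "none" branch repeats forever.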
[jira] [Commented] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
[ https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904279#comment-14904279 ] Rohith Sharma K S commented on MAPREDUCE-6485:
--
Overall the patch looks good to me. Can you add a test to guard against regression?
[jira] [Commented] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
[ https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904285#comment-14904285 ] Rohith Sharma K S commented on MAPREDUCE-6485:
--
nit: can you check for greater than rather than not equal? {{task.inProgressAttempts.size() != 0}}
[jira] [Updated] (MAPREDUCE-6355) 2.5 client cannot communicate with 2.5 job on 2.6 cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-6355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-6355:
---
Labels: (was: 2.6.1-candidate)
Target Version/s: 2.7.2, 2.6.2

Dropping the 2.6.1-candidate label since 2.6.1 is out now. Targeting 2.6.2 / 2.7.2.

> 2.5 client cannot communicate with 2.5 job on 2.6 cluster
> -
>
> Key: MAPREDUCE-6355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6355
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
>
> Trying to run a job on a Hadoop 2.6 cluster from a Hadoop 2.5 client submitting a job that uses Hadoop 2.5 jars results in a job that succeeds, but the client cannot communicate with the AM while the job is running.
[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905430#comment-14905430 ] Arun Suresh commented on MAPREDUCE-6484:
The latest patch looks good, thanks [~zxu]. +1, pending jenkins.
[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905644#comment-14905644 ] Hadoop QA commented on MAPREDUCE-6484:
--
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 22s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 8m 11s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 20s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 0m 49s | The applied patch generated 1 new checkstyle issues (total was 9, now 9). |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 23s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests | 1m 46s | Tests passed in hadoop-mapreduce-client-core. |
| | | | 42m 23s | |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12761995/MAPREDUCE-6484.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1f707ec |
| checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6010/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-core.txt |
| hadoop-mapreduce-client-core test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6010/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6010/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6010/console |
This message was automatically generated.
[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6484:
-
Hadoop Flags: Reviewed
[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905827#comment-14905827 ] zhihai xu commented on MAPREDUCE-6484:
--
Thanks for the review [~asuresh]! The new patch passed jenkins. I will commit it tomorrow if no one objects.