[jira] [Commented] (MAPREDUCE-6449) MR Code should not throw and catch YarnRuntimeException to communicate internal exceptions

2015-09-23 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904725#comment-14904725
 ] 

Neelesh Srinivas Salian commented on MAPREDUCE-6449:


Started on version 1 of this patch.
Questions to confirm my understanding:
1) As mentioned in MAPREDUCE-6439, I agree that there are 3 files in MR that 
have a catch (YarnRuntimeException e) {} block that needs to be addressed in 
this JIRA.
These files include 
~/mapreduce/v2/app/MRAppMaster.java 
~/mapreduce/v2/hs/webapp/HsWebServices.java
~/mapreduce/v2/app/webapp/AMWebServices.java
The other occurrences are under YARN, which I believe are handled in YARN-4021.

2) The objective in this JIRA is to distinguish exceptions from remote calls 
versus local ones and wrap them under a unified MR exception, which also helps 
with backward compatibility.

3) I observed that each of the other files in the mapred modules has specific 
actions in the catch block.
Like in TestRecordFactory:

catch (YarnRuntimeException e) {
  e.printStackTrace();
  Assert.fail("Failed to crete record");
}

So, the idea in this JIRA is simply to map YarnRuntimeException to a single 
wrapper MR exception?
There are also instances where a YarnException is expected to be caught, as in 
TestLocalContainerAllocator, which has a local exception catch block:
catch (YarnException e) {
  // YarnException is expected
}
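
To make question 2) concrete, here is a minimal sketch of the kind of wrapper I 
have in mind; the name MRAppException is an assumption on my part, not a 
decided name:

// Hypothetical MR-specific unchecked exception that could replace
// YarnRuntimeException for MR-internal errors.
public class MRAppException extends RuntimeException {
  public MRAppException(String message) {
    super(message);
  }
  public MRAppException(String message, Throwable cause) {
    super(message, cause);
  }
}

Call sites like MRAppMaster would then throw and catch this MR-specific type, 
and translate any YarnRuntimeException that crosses the YARN boundary into it.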

Please correct/augment this comment to help confirm my understanding.

Thank you.

> MR Code should not throw and catch YarnRuntimeException to communicate 
> internal exceptions
> --
>
> Key: MAPREDUCE-6449
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6449
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>
> In discussion of MAPREDUCE-6439 we discussed how throwing and catching 
> YarnRuntimeException in MR code is incorrect and we should instead use some 
> MR specific exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-23 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904854#comment-14904854
 ] 

Arun Suresh commented on MAPREDUCE-6484:


Thanks for the patch, [~zxu].. Makes sense..

Minor nit :
Instead of using {{yarnConf.getStringCollection()}} and then doing an 
{{rmIds.toArray()}}, you can probably just use {{yarnConf.getStrings()}}, which 
returns an array itself.
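
For reference, a quick sketch of the suggested simplification (the variable 
names and the use of {{YarnConfiguration.RM_HA_IDS}} are assumed from the 
discussion, not copied from the patch):

{code}
import java.util.Collection;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class GetStringsSketch {
  public static void main(String[] args) {
    YarnConfiguration yarnConf = new YarnConfiguration();

    // Before: fetch a Collection, then convert it to an array by hand.
    Collection<String> rmIdsCollection =
        yarnConf.getStringCollection(YarnConfiguration.RM_HA_IDS);
    String[] rmIdsBefore =
        rmIdsCollection.toArray(new String[rmIdsCollection.size()]);

    // After: Configuration#getStrings returns a String[] directly
    // (null when the key is unset, which callers should handle).
    String[] rmIdsAfter = yarnConf.getStrings(YarnConfiguration.RM_HA_IDS);

    System.out.println(rmIdsBefore.length + " / "
        + (rmIdsAfter == null ? 0 : rmIdsAfter.length));
  }
}
{code}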

+1, pending that


> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This will cause HDFS token 
> renewal failure for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The reason why the local address is returned is: when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" will be used. Based on the following code 
> in SecurityUtil.java, the local address will then be used to replace "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 

[jira] [Created] (MAPREDUCE-6489) Fail fast rogue tasks that write too much to local disk

2015-09-23 Thread Maysam Yabandeh (JIRA)
Maysam Yabandeh created MAPREDUCE-6489:
--

 Summary: Fail fast rogue tasks that write too much to local disk
 Key: MAPREDUCE-6489
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6489
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 2.7.1
Reporter: Maysam Yabandeh


Tasks of rogue jobs can write too much to local disk, negatively affecting the 
jobs running in collocated containers. Ideally YARN would be able to limit the 
amount of local disk used by each task (YARN-4011). Until then, the MapReduce 
task can fail fast if it is writing too much (above a configured threshold) to 
local disk.

As we discussed 
[here|https://issues.apache.org/jira/browse/YARN-4011?focusedCommentId=14902750&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14902750], 
the suggested approach is that the MapReduce task checks the BYTES_WRITTEN 
counter for the local disk and throws an exception when it goes beyond a 
configured value. It is true that the number of bytes written is larger than 
the actual disk space used, but to detect a rogue task the exact value is not 
required, and a very large number of bytes written to local disk is a good 
indication that the task is misbehaving.
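
A minimal sketch of such a check, assuming a hypothetical configuration key; 
neither the key name nor the class below comes from an attached patch:

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class LocalDiskWriteGuard {
  // Hypothetical threshold key; a negative value disables the check.
  public static final String LOCAL_WRITE_LIMIT_BYTES =
      "mapreduce.task.local-fs.write-limit.bytes";

  // Intended to be called periodically from the task (e.g. on progress).
  public static void checkLocalBytesWritten(Configuration conf)
      throws IOException {
    long limit = conf.getLong(LOCAL_WRITE_LIMIT_BYTES, -1L);
    if (limit < 0) {
      return; // check disabled
    }
    // FileSystem statistics track bytes written per scheme; "file" is the
    // local file system used for intermediate task output.
    for (FileSystem.Statistics stats : FileSystem.getAllStatistics()) {
      if ("file".equals(stats.getScheme())
          && stats.getBytesWritten() > limit) {
        throw new IOException("Local disk writes " + stats.getBytesWritten()
            + " exceeded configured limit " + limit + "; failing task fast");
      }
    }
  }
}
{code}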



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run

2015-09-23 Thread Bob (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob updated MAPREDUCE-6485:
---
Description: 
The scenarios is like this:
With configuring mapreduce.job.reduce.slowstart.completedmaps=0.8, reduces will 
take resource and  start to run when all the map have not finished. 
But It could happened that when all the resources are taken up by running 
reduces, there is still one map not finished. 
Under this condition , the last map have two task attempts .
As for the first attempt was killed due to timeout(mapreduce.task.timeout), and 
its state transitioned from RUNNING to FAIL_CONTAINER_CLEANUP then to FAILED, 
but failed map attempt would not be restarted for there is still one speculate 
map attempt in progressing. 
As for the second attempt which was started due to having enable map task 
speculative is pending at UNASSINGED state because of no resource available. 
But the second map attempt request have lower priority than reduces, so 
preemption would not happened.
As a result all reduces would not finished because of there is one map left. 
and the last map hanged there because of no resource available. so, the job 
would never finish.

  was:
The scenario is like this:
With mapreduce.job.reduce.slowstart.completedmaps=0.8 configured, reduces will 
take resources and start to run before all the maps have finished.
But it can happen that when all the resources are taken up by running reduces, 
there is still one map not finished.
Under this condition, the last map has two task attempts.
The first attempt was killed due to timeout (mapreduce.task.timeout), and its 
state transitioned from RUNNING to FAIL_CONTAINER_CLEANUP, so the failed map 
attempt would not be started.
The second attempt, which was started because map task speculation is enabled, 
is pending in the UNASSIGNED state because no resource is available. But the 
second map attempt's request has lower priority than the reduces, so preemption 
would not happen.
As a result, none of the reduces can finish because there is one map left, and 
the last map hangs there because no resource is available. So the job would 
never finish.


> MR job hanged forever because all resources are taken up by reducers and the 
> last map attempt never get resource to run
> ---
>
> Key: MAPREDUCE-6485
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6485
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 3.0.0, 2.4.1, 2.6.0, 2.7.1
>Reporter: Bob
>Assignee: Xianyin Xin
>Priority: Critical
> Attachments: MAPREDUCE-6485.001.patch
>
>
> The scenario is like this:
> With mapreduce.job.reduce.slowstart.completedmaps=0.8 configured, reduces 
> will take resources and start to run before all the maps have finished.
> But it can happen that when all the resources are taken up by running 
> reduces, there is still one map not finished.
> Under this condition, the last map has two task attempts.
> The first attempt was killed due to timeout (mapreduce.task.timeout), and 
> its state transitioned from RUNNING to FAIL_CONTAINER_CLEANUP and then to 
> FAILED, but the failed map attempt would not be restarted because there was 
> still one speculative map attempt in progress.
> The second attempt, which was started because map task speculation is 
> enabled, is pending in the UNASSIGNED state because no resource is available. 
> But the second map attempt's request has lower priority than the reduces, so 
> preemption would not happen.
> As a result, none of the reduces can finish because there is one map left, 
> and the last map hangs there because no resource is available. So the job 
> would never finish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-23 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905405#comment-14905405
 ] 

zhihai xu commented on MAPREDUCE-6484:
--

Thanks for the review, [~asuresh]! That is a good suggestion. I attached a new 
patch, MAPREDUCE-6484.001.patch, which addresses your comment.

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This will cause HDFS token 
> renewal failure for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The reason why the local address is returned is: when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" will be used. Based on the following code 
> in SecurityUtil.java, the local address will then be used to replace "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> 

[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-23 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6484:
-
Attachment: MAPREDUCE-6484.001.patch

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This will cause HDFS token 
> renewal failure for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The reason why the local address is returned is: when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" will be used. Based on the following code 
> in SecurityUtil.java, the local address will then be used to replace "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> 

[jira] [Updated] (MAPREDUCE-6334) Fetcher#copyMapOutput is leaking usedMemory upon IOException during InMemoryMapOutput shuffle handler

2015-09-23 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6334:
---
Target Version/s: 2.7.1, 2.6.2  (was: 2.7.1)

Targeting 2.6.2 per Eric's comment in the mailing lists.

> Fetcher#copyMapOutput is leaking usedMemory upon IOException during 
> InMemoryMapOutput shuffle handler
> -
>
> Key: MAPREDUCE-6334
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6334
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Blocker
> Fix For: 2.7.1
>
> Attachments: MAPREDUCE-6334.001.patch, MAPREDUCE-6334.002.patch
>
>
> We are seeing this happen when
> - an NM's disk goes bad during the creation of map output(s)
> - the reducer's fetcher can read the shuffle header and reserve the memory
> - but gets an IOException when trying to shuffle for InMemoryMapOutput
> - shuffle fetch retry is enabled



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5935) TestMRServerPorts#testTaskTrackerPorts fails with null pointer exception.

2015-09-23 Thread Jorge Gabriel Siqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905183#comment-14905183
 ] 

Jorge Gabriel Siqueira commented on MAPREDUCE-5935:
---

Is there a possibility that jt.fs is initialized after your check? If so, could 
it cause a problem (because I think it would then never be closed)?

> TestMRServerPorts#testTaskTrackerPorts fails with null pointer exception.
> -
>
> Key: MAPREDUCE-5935
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5935
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.1.1, 1.2.0, 1.2.1
>Reporter: Jinghui Wang
>Assignee: Jinghui Wang
> Attachments: MAPREDUCE-5935.patch
>
>
> The exception is caused by a race condition. The test case calls 
> JobTracker.offerService in a separate thread, JTRunner, which initializes the 
> class variable FileSystem fs. In the main thread it tries to close fs in the 
> finally block, but at that point jt.fs might still not be initialized, thus 
> causing the NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5935) TestMRServerPorts#testTaskTrackerPorts fails with null pointer exception.

2015-09-23 Thread Jorge Gabriel Siqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905188#comment-14905188
 ] 

Jorge Gabriel Siqueira commented on MAPREDUCE-5935:
---

Is this issue still reproducible?

> TestMRServerPorts#testTaskTrackerPorts fails with null pointer exception.
> -
>
> Key: MAPREDUCE-5935
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5935
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.1.1, 1.2.0, 1.2.1
>Reporter: Jinghui Wang
>Assignee: Jinghui Wang
> Attachments: MAPREDUCE-5935.patch
>
>
> The exception is caused by a race condition. The test case calls 
> JobTracker.offerService in a separate thread, JTRunner, which initializes the 
> class variable FileSystem fs. In the main thread it tries to close fs in the 
> finally block, but at that point jt.fs might still not be initialized, thus 
> causing the NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-23 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6484:
-
Attachment: MAPREDUCE-6484.001.patch

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This will cause HDFS token 
> renewal failure for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The reason why the local address is returned is: when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" will be used. Based on the following code 
> in SecurityUtil.java, the local address will then be used to replace "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> 

[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-23 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6484:
-
Attachment: (was: MAPREDUCE-6484.001.patch)

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This will cause HDFS token 
> renewal failure for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The reason why the local address is returned is: when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" will be used. Based on the following code 
> in SecurityUtil.java, the local address will then be used to replace "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> 

[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905790#comment-14905790
 ] 

Hadoop QA commented on MAPREDUCE-6484:
--

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m  5s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 59s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 11s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 48s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 26s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   1m 47s | Tests passed in 
hadoop-mapreduce-client-core. |
| | |  41m 50s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12762046/MAPREDUCE-6484.001.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 06d1c90 |
| hadoop-mapreduce-client-core test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6011/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6011/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6011/console |


This message was automatically generated.

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This will cause HDFS token 
> renewal failure for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The reason why the local address is returned is: when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" will be used. Based on the following code 
> in SecurityUtil.java, the local address will then be used to replace "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> 

[jira] [Commented] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run

2015-09-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904277#comment-14904277
 ] 

Rohith Sharma K S commented on MAPREDUCE-6485:
--

Looked deeper into the reducer preemption code and found that even if *headroom 
is available for assigning the map request*, when {{pendingReducers}} is zero 
and {{scheduledReducers}} is large, *neither preemption will be triggered nor 
will ramping down of reducers happen*. The RM always allocates containers to 
reducers. If there are many scheduledReducers, then at some point the cluster 
resources are fully acquired by reducers. Say reducer memory is 5GB and mapper 
memory is 4GB, and 4GB of headroom is available, so 1 mapper could be assigned 
to it. But since more reducer requests are there to assign, the RM always skips 
the assignment because the reducers' 5GB capacity is greater than the 4GB 
headroom.
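
A toy illustration of the stall (numbers from the comment above; this is just 
the arithmetic, not RM code):

{code}
public class HeadroomStallSketch {
  public static void main(String[] args) {
    int headroomGB = 4;  // free capacity the RM reports
    int reducerGB = 5;   // many reducer requests, higher priority
    int mapperGB = 4;    // one map request, lower priority

    // The scheduler considers requests in priority order, reducers first.
    boolean reducerFits = reducerGB <= headroomGB;  // false: 5 > 4
    boolean mapWouldFit = mapperGB <= headroomGB;   // true: 4 <= 4

    // Because the reducer at the head of the queue never fits, the lower
    // priority map request is never reached, and nothing gets assigned.
    System.out.println("reducer fits: " + reducerFits
        + ", map would fit: " + mapWouldFit);
  }
}
{code}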


> MR job hanged forever because all resources are taken up by reducers and the 
> last map attempt never get resource to run
> ---
>
> Key: MAPREDUCE-6485
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6485
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 3.0.0, 2.4.1, 2.6.0, 2.7.1
>Reporter: Bob
>Assignee: Xianyin Xin
>Priority: Critical
> Attachments: MAPREDUCE-6485.001.patch
>
>
> The scenario is like this:
> With mapreduce.job.reduce.slowstart.completedmaps=0.8 configured, reduces 
> will take resources and start to run before all the maps have finished.
> But it can happen that when all the resources are taken up by running 
> reduces, there is still one map not finished.
> Under this condition, the last map has two task attempts.
> The first attempt was killed due to timeout (mapreduce.task.timeout), and 
> its state transitioned from RUNNING to FAIL_CONTAINER_CLEANUP and then to 
> FAILED, but the failed map attempt would not be restarted because there was 
> still one speculative map attempt in progress.
> The second attempt, which was started because map task speculation is 
> enabled, is pending in the UNASSIGNED state because no resource is available. 
> But the second map attempt's request has lower priority than the reduces, so 
> preemption would not happen.
> As a result, none of the reduces can finish because there is one map left, 
> and the last map hangs there because no resource is available. So the job 
> would never finish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run

2015-09-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904279#comment-14904279
 ] 

Rohith Sharma K S commented on MAPREDUCE-6485:
--

Overall the patch looks good to me. Can you add a test to guard against 
regressions?

> MR job hanged forever because all resources are taken up by reducers and the 
> last map attempt never get resource to run
> ---
>
> Key: MAPREDUCE-6485
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6485
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 3.0.0, 2.4.1, 2.6.0, 2.7.1
>Reporter: Bob
>Assignee: Xianyin Xin
>Priority: Critical
> Attachments: MAPREDUCE-6485.001.patch
>
>
> The scenario is like this:
> With mapreduce.job.reduce.slowstart.completedmaps=0.8 configured, reduces 
> will take resources and start to run before all the maps have finished.
> But it can happen that when all the resources are taken up by running 
> reduces, there is still one map not finished.
> Under this condition, the last map has two task attempts.
> The first attempt was killed due to timeout (mapreduce.task.timeout), and 
> its state transitioned from RUNNING to FAIL_CONTAINER_CLEANUP and then to 
> FAILED, but the failed map attempt would not be restarted because there was 
> still one speculative map attempt in progress.
> The second attempt, which was started because map task speculation is 
> enabled, is pending in the UNASSIGNED state because no resource is available. 
> But the second map attempt's request has lower priority than the reduces, so 
> preemption would not happen.
> As a result, none of the reduces can finish because there is one map left, 
> and the last map hangs there because no resource is available. So the job 
> would never finish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run

2015-09-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904285#comment-14904285
 ] 

Rohith Sharma K S commented on MAPREDUCE-6485:
--

nit: can you check for greater than rather than not equal?
{{task.inProgressAttempts.size() != 0}}
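
i.e., something like the following (the surrounding patch context is assumed 
here, not quoted from the patch):

{code}
// current check in the patch
if (task.inProgressAttempts.size() != 0) {
  // ...
}
// suggested
if (task.inProgressAttempts.size() > 0) {
  // ...
}
{code}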

> MR job hanged forever because all resources are taken up by reducers and the 
> last map attempt never get resource to run
> ---
>
> Key: MAPREDUCE-6485
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6485
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 3.0.0, 2.4.1, 2.6.0, 2.7.1
>Reporter: Bob
>Assignee: Xianyin Xin
>Priority: Critical
> Attachments: MAPREDUCE-6485.001.patch
>
>
> The scenario is like this:
> With mapreduce.job.reduce.slowstart.completedmaps=0.8 configured, reduces 
> will take resources and start to run before all the maps have finished.
> But it can happen that when all the resources are taken up by running 
> reduces, there is still one map not finished.
> Under this condition, the last map has two task attempts.
> The first attempt was killed due to timeout (mapreduce.task.timeout), and 
> its state transitioned from RUNNING to FAIL_CONTAINER_CLEANUP and then to 
> FAILED, but the failed map attempt would not be restarted because there was 
> still one speculative map attempt in progress.
> The second attempt, which was started because map task speculation is 
> enabled, is pending in the UNASSIGNED state because no resource is available. 
> But the second map attempt's request has lower priority than the reduces, so 
> preemption would not happen.
> As a result, none of the reduces can finish because there is one map left, 
> and the last map hangs there because no resource is available. So the job 
> would never finish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6355) 2.5 client cannot communicate with 2.5 job on 2.6 cluster

2015-09-23 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6355:
---
  Labels:   (was: 2.6.1-candidate)
Target Version/s: 2.7.2, 2.6.2

Dropping the 2.6.1-candidate label; 2.6.1 is out now. Targeting 2.6.2 / 2.7.2.

> 2.5 client cannot communicate with 2.5 job on 2.6 cluster
> -
>
> Key: MAPREDUCE-6355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>
> Running a job on a Hadoop 2.6 cluster from a Hadoop 2.5 client, submitting a 
> job that uses Hadoop 2.5 jars, results in a job that succeeds, but the client 
> cannot communicate with the AM while the job is running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-23 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905430#comment-14905430
 ] 

Arun Suresh commented on MAPREDUCE-6484:


The latest patch looks good.. thanks [~zxu]

+1, pending jenkins

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This will cause HDFS token 
> renewal failure for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The reason why the local address is returned is: when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" will be used. Based on the following code 
> in SecurityUtil.java, the local address will then be used to replace "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> 

[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905644#comment-14905644
 ] 

Hadoop QA commented on MAPREDUCE-6484:
--

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 22s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 11s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 20s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 49s | The applied patch generated  1 
new checkstyle issues (total was 9, now 9). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 23s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   1m 46s | Tests passed in 
hadoop-mapreduce-client-core. |
| | |  42m 23s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761995/MAPREDUCE-6484.001.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1f707ec |
| checkstyle |  
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6010/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-core.txt
 |
| hadoop-mapreduce-client-core test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6010/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6010/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6010/console |


This message was automatically generated.

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
>
> Yarn Client uses the local address instead of the RM address as the token 
> renewer in a secure cluster when RM HA is enabled. This will cause HDFS token 
> renewal failure for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The reason why the local address is returned is: when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" will be used. Based on the following code 
> in SecurityUtil.java, the local address will then be used to replace "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
> The following is the exception which cause the job fail:
> {code}
> 15/09/12 

[jira] [Updated] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-23 Thread zhihai xu (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6484:
-
Hadoop Flags: Reviewed

> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled.
> 
>
> Key: MAPREDUCE-6484
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6484
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, security
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6484.001.patch, YARN-4187.000.patch
>
>
> Yarn Client uses local address instead of RM address as token renewer in a 
> secure cluster when RM HA is enabled. This causes HDFS token renewal to 
> fail for renewer "nobody" if the rules from 
> {{hadoop.security.auth_to_local}} exclude the client address in the HDFS 
> {{DelegationTokenIdentifier}}.
> The reason the local address is returned: when HA is enabled, 
> "yarn.resourcemanager.address" may not be set. If 
> {{HOSTNAME_PATTERN}} ("_HOST") is used in "yarn.resourcemanager.principal", 
> the default address "0.0.0.0:8032" is used, and, per the following code 
> in SecurityUtil.java, the local address then replaces "0.0.0.0".
> {code}
>   private static String replacePattern(String[] components, String hostname)
>   throws IOException {
> String fqdn = hostname;
> if (fqdn == null || fqdn.isEmpty() || fqdn.equals("0.0.0.0")) {
>   fqdn = getLocalHostName();
> }
> return components[0] + "/" + fqdn.toLowerCase(Locale.US) + "@" + 
> components[2];
>   }
>   static String getLocalHostName() throws UnknownHostException {
> return InetAddress.getLocalHost().getCanonicalHostName();
>   }
>   public static String getServerPrincipal(String principalConfig,
>   InetAddress addr) throws IOException {
> String[] components = getComponents(principalConfig);
> if (components == null || components.length != 3
> || !components[1].equals(HOSTNAME_PATTERN)) {
>   return principalConfig;
> } else {
>   if (addr == null) {
> throw new IOException("Can't replace " + HOSTNAME_PATTERN
> + " pattern since client address is null");
>   }
>   return replacePattern(components, addr.getCanonicalHostName());
> }
>   }
> {code}
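
To make the substitution concrete, here is a minimal, self-contained sketch that drives the {{SecurityUtil.getServerPrincipal()}} path quoted above with the wildcard default address. The driver class {{RenewerDemo}} is hypothetical and is not part of any attached patch:

{code}
import java.net.InetAddress;
import org.apache.hadoop.security.SecurityUtil;

// Hypothetical driver: shows why the renewer ends up as the client host.
public class RenewerDemo {
  public static void main(String[] args) throws Exception {
    // With RM HA, "yarn.resourcemanager.address" may be left unset, so the
    // client effectively passes the wildcard default 0.0.0.0 here.
    InetAddress rmAddr = InetAddress.getByName("0.0.0.0");
    // replacePattern() sees "0.0.0.0", falls back to getLocalHostName(),
    // and expands "_HOST" with the *client's* FQDN, not the RM's.
    String principal =
        SecurityUtil.getServerPrincipal("yarn/_HOST@EXAMPLE.COM", rmAddr);
    // Prints yarn/<local-fqdn>@EXAMPLE.COM; the renewer short name derived
    // from it via auth_to_local can then degrade to "nobody".
    System.out.println(principal);
  }
}
{code}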
> The following is the exception that causes the job to fail:
> {code}
> 15/09/12 16:27:24 WARN security.UserGroupInformation: 
> PriviledgedActionException as:t...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: Failed to run job : yarn tries to renew a token 
> with renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:975)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> java.io.IOException: Failed to run job : yarn tries to renew a token with 
> renewer nobody
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:464)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7109)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:512)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:648)
> at 
> 
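
Stepping back from the quoted log: one way the client could avoid the wildcard fallback when HA is enabled is to expand "_HOST" against each configured RM's own per-ID address. The sketch below illustrates that idea only; {{HaRenewerSketch}} and {{rmPrincipals}} are hypothetical names, and this is not the attached MAPREDUCE-6484.001.patch:

{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Hypothetical helper: resolve one renewer principal per RM ID so that
// "_HOST" expands to each RM's own address instead of the local host.
public class HaRenewerSketch {
  static String[] rmPrincipals(YarnConfiguration conf) throws IOException {
    // e.g. yarn.resourcemanager.ha.rm-ids = rm1,rm2 (null when HA is off)
    String[] rmIds = conf.getStrings(YarnConfiguration.RM_HA_IDS);
    String[] principals = new String[rmIds.length];
    for (int i = 0; i < rmIds.length; i++) {
      // Per-RM address key, e.g. yarn.resourcemanager.address.rm1; reading
      // it avoids falling back to the wildcard "0.0.0.0:8032" default.
      InetSocketAddress addr = conf.getSocketAddr(
          YarnConfiguration.RM_ADDRESS + "." + rmIds[i],
          YarnConfiguration.DEFAULT_RM_ADDRESS,
          YarnConfiguration.DEFAULT_RM_PORT);
      principals[i] = SecurityUtil.getServerPrincipal(
          conf.get(YarnConfiguration.RM_PRINCIPAL), addr.getAddress());
    }
    return principals;
  }
}
{code}

A caller could then use the principal of whichever RM is active as the token renewer, rather than one derived from the local hostname.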

[jira] [Commented] (MAPREDUCE-6484) Yarn Client uses local address instead of RM address as token renewer in a secure cluster when RM HA is enabled.

2015-09-23 Thread zhihai xu (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905827#comment-14905827 ]

zhihai xu commented on MAPREDUCE-6484:
--

Thanks for the review, [~asuresh]! The new patch passed Jenkins. I will commit 
it tomorrow if no one objects.
