from:"Jian He \(JIRA\)"

[jira] [Commented] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI

2017-08-21 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136323#comment-16136323
 ] 

Jian He commented on MAPREDUCE-6838:


Yep, comment race - I just resolved this jira too.

> [ATSv2 Security] Add timeline delegation token received in allocate response 
> to UGI
> ---
>
> Key: MAPREDUCE-6838
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6838
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-5355-merge-blocker
> Fix For: YARN-5355
>
> Attachments: MAPREDUCE-6838-YARN-5355.01.patch, 
> MAPREDUCE-6838-YARN-5355.02.patch, MAPREDUCE-6838-YARN-5355.03.patch, 
> MAPREDUCE-6838-YARN-5355.04.patch, MAPREDUCE-6838-YARN-5355.05.patch, 
> MAPREDUCE-6838-YARN-5355.06.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org

[jira] [Updated] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI

2017-08-21 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-6838:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks [~varun_saxena] and [~rohithsharma] !






[jira] [Commented] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI

2017-08-21 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136305#comment-16136305
 ] 

Jian He commented on MAPREDUCE-6838:


I tried to commit to YARN-5355_branch2, but it looks like YARN-5355_branch2 has a 
compilation error without this patch. [~rohithsharma], [~varun_saxena], can you 
check?

I've committed the patch to the YARN-5355 branch, but I forgot to update the 
aforementioned code comment. [~rohithsharma], [~varun_saxena], maybe you can 
just update it in whatever patch you have next.






[jira] [Commented] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI

2017-08-21 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136288#comment-16136288
 ] 

Jian He commented on MAPREDUCE-6838:


bq. The code condition is correct. Will change the comment.
No worries, I can fix this at commit time; no need to upload a new patch just for this.
bq. Could not find any API to remove the token from UGI. Not sure why. Should 
we add one?
Yeah, I think we can open a jira in hadoop-common for this request and fix the 
issue later.

I'm committing the patch, thanks.






[jira] [Commented] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI

2017-08-21 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135858#comment-16135858
 ] 

Jian He commented on MAPREDUCE-6838:


- The comment describes an OR condition whereas the code uses AND; which one is 
correct? Also, when will "delegationToken.getService()" be empty? It looks like 
NodeTimelineCollectorManager#generateTokenAndSetTimer always sets the 
service field.
{code}
// Token need not be updated if either address or token service does not
// exist.
String service = delegationToken.getService();
if ((service == null || service.isEmpty()) &&
    (collectorAddr == null || collectorAddr.isEmpty())) {
  LOG.warn("Timeline token does not have service and timeline service " +
      "address is not yet set. Not updating the token");
  return;
}
{code}

- Here, if this method is called for the first time, timelineServiceAddress is 
null and collectorAddr is null:
{code}
if (collectorAddr == null || collectorAddr.isEmpty()) {
  collectorAddr = timelineServiceAddress;
}
{code}
Later, it uses "SecurityUtil.getTokenServiceAddr(timelineToken)" to set 
the token service. The next time, collectorAddr is not null because 
timelineServiceAddress is not null, so it always calls 
"NetUtils.createSocketAddr(collectorAddr)" to set the token service. Is my 
understanding correct? Why not consistently use one of them to keep it 
simpler?
{code}
// Prefer timeline service address over service coming in the token for
// updating the token service.
InetSocketAddress serviceAddr =
    (collectorAddr != null && !collectorAddr.isEmpty()) ?
    NetUtils.createSocketAddr(collectorAddr) :
    SecurityUtil.getTokenServiceAddr(timelineToken);
SecurityUtil.setTokenService(timelineToken, serviceAddr);
authUgi.addToken(timelineToken);
{code}
- Does the collector address change if the NM restarts? If so, we may end up with two 
keys (different addresses) for two tokens in the UGI.
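
The gap between the code comment ("either ... does not exist") and the code (`&&`) raised in the first point can be made concrete with a small standalone sketch. The class and method names below are hypothetical and are not part of the patch, and the sample addresses are made up; the sketch only models the two possible readings of the guard.

```java
// Hypothetical, standalone model of the guard discussed above; not code from the patch.
final class TokenUpdateGuard {
  private TokenUpdateGuard() {}

  static boolean isBlank(String s) {
    return s == null || s.isEmpty();
  }

  // What the quoted code does: skip the update only when BOTH values are missing.
  static boolean skipWhenBothMissing(String service, String collectorAddr) {
    return isBlank(service) && isBlank(collectorAddr);
  }

  // What the quoted comment describes: skip the update when EITHER value is missing.
  static boolean skipWhenEitherMissing(String service, String collectorAddr) {
    return isBlank(service) || isBlank(collectorAddr);
  }

  public static void main(String[] args) {
    // Made-up sample values covering all four combinations.
    String[][] cases = {
        {null, null},
        {"10.0.0.5:8188", null},
        {null, "nm1.example.com:8188"},
        {"10.0.0.5:8188", "nm1.example.com:8188"}
    };
    for (String[] c : cases) {
      System.out.printf("service=%s addr=%s both=%b either=%b%n",
          c[0], c[1],
          skipWhenBothMissing(c[0], c[1]),
          skipWhenEitherMissing(c[0], c[1]));
    }
  }
}
```

The two predicates differ only when exactly one of the values is missing, which is the case the code comment has to describe accurately.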








[jira] [Comment Edited] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI

2017-08-19 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134009#comment-16134009
 ] 

Jian He edited comment on MAPREDUCE-6838 at 8/19/17 8:08 AM:
-

Today, for other delegation tokens (RMDelegationToken, the old ATSv1 
DelegationToken), the token service is not set on the server side; it is set on 
the client side: the client calls SecurityUtil#buildTokenService and then sets 
the token service. I don't know why it was done like that; maybe because it 
avoids a use_ip config inconsistency between client and server?

Should we follow the same approach? Could the client construct the tokenService based on 
the collector address info? (One caveat is to make sure the old token gets 
replaced properly, in case the IP changes on restart.)
CollectorInfo#getCollectorAddr is currently a string; should it be an 
address type?
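
The replacement caveat can be illustrated with a simplified model. It assumes that tokens are stored in a map keyed by their service string; that is an assumption made for this sketch, and the real UGI credential store is not reproduced here. The class name, token values, and addresses are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical model of a credential store keyed by token service.
final class TokenStoreModel {
  private final Map<String, String> tokensByService = new LinkedHashMap<>();

  // Stores a token under its service key; an entry with the same key is overwritten.
  void addToken(String service, String token) {
    tokensByService.put(service, token);
  }

  int size() {
    return tokensByService.size();
  }

  String get(String service) {
    return tokensByService.get(service);
  }

  public static void main(String[] args) {
    TokenStoreModel store = new TokenStoreModel();
    store.addToken("10.0.0.5:8188", "token-v1");
    store.addToken("10.0.0.5:8188", "token-v2"); // same key: replaces token-v1
    store.addToken("10.0.0.9:8188", "token-v3"); // changed address: kept next to token-v2
    System.out.println("entries: " + store.size());
  }
}
```

In this model a renewed token replaces the old one only while the service key stays the same; once the address changes, the stale entry stays behind unless it is removed explicitly.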





was (Author: jianhe):
today, for other delegation tokens RMDelegationToken, the old ATSv1 
DelegationToken, the token service is not set at server side, it is set at 
client side - the client call the SecurityUtils#buildTokenService and then set 
the token service. I don't know why it was done like that - maybe because it 
avoids the use_ip config inconsistency between client and serve ?

Should we follow the same ? The client can construct the tokenService based on 
the collector address info ? (One caveat is to make sure the old token gets 
replaced properly - in case ip changes ?)
The CollectorInfo#getCollectorAddr right now is a string, should it be an 
address type ?




> [ATSv2 Security] Add timeline delegation token received in allocate response 
> to UGI
> ---
>
> Key: MAPREDUCE-6838
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6838
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-5355-merge-blocker
> Fix For: YARN-5355
>
> Attachments: MAPREDUCE-6838-YARN-5355.01.patch, 
> MAPREDUCE-6838-YARN-5355.02.patch, MAPREDUCE-6838-YARN-5355.03.patch
>
>





[jira] [Commented] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI

2017-08-19 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134014#comment-16134014
 ] 

Jian He commented on MAPREDUCE-6838:


Another option would be to build the token service in 
generateTokenForAppCollector using the same SecurityUtil#buildTokenService API. 
This approach requires the AM and NM to be consistent on the use_ip config.
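
Why both sides would have to agree on use_ip can be sketched as follows. The helper below is a hypothetical stand-in, not the real SecurityUtil#buildTokenService; it assumes the service string is "ip:port" when use_ip is enabled and "hostname:port" otherwise, and the host name, IP, and port are made up.

```java
import java.util.Locale;

// Hypothetical stand-in for token service construction; not the Hadoop implementation.
final class TokenServiceSketch {
  private TokenServiceSketch() {}

  // Builds the service key from either the IP or the lower-cased host name.
  static String buildService(String hostname, String ip, int port, boolean useIp) {
    String host = useIp ? ip : hostname.toLowerCase(Locale.ROOT);
    return host + ":" + port;
  }

  public static void main(String[] args) {
    String nmSide = buildService("nm1.example.com", "10.0.0.5", 8188, true);
    String amSide = buildService("nm1.example.com", "10.0.0.5", 8188, false);
    // With inconsistent settings the two sides derive different keys for the same collector.
    System.out.println(nmSide + " vs " + amSide + " -> match=" + nmSide.equals(amSide));
  }
}
```

If the side that sets the service and the side that looks the token up derive different keys, the lookup misses even though the token itself is valid.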






[jira] [Comment Edited] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI

2017-08-19 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134009#comment-16134009
 ] 

Jian He edited comment on MAPREDUCE-6838 at 8/19/17 7:57 AM:
-

today, for other delegation tokens RMDelegationToken, the old ATSv1 
DelegationToken, the token service is not set at server side, it is set at 
client side - the client call the SecurityUtils#buildTokenService and then set 
the token service. I don't know why it was done like that - maybe because it 
avoids the use_ip config inconsistency between client and serve ?

Should we follow the same ? The client can construct the tokenService based on 
the collector address info ? (One caveat is to make sure the old token gets 
replaced properly - in case ip changes ?)
The CollectorInfo#getCollectorAddr right now is a string, should it be an 
address type ?





was (Author: jianhe):
today, for other delegation tokens RMDelegationToken, the old ATSv1 
DelegationToken, the token service is not set at server side, it is set at 
client side - the client call the SecurityUtils#buildTokenService and then set 
the token service. I don't know what it was done like that - maybe because it 
avoids the use_ip config inconsistency between client and serve ?

Should we follow the same ? The client can construct the tokenService based on 
the collector address info ? (One caveat is to make sure the old token gets 
replaced properly - in case ip changes ?)
The CollectorInfo#getCollectorAddr right now is a string, should it be an 
address type ?









[jira] [Comment Edited] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI

2017-08-19 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134009#comment-16134009
 ] 

Jian He edited comment on MAPREDUCE-6838 at 8/19/17 7:56 AM:
-

today, for other delegation tokens RMDelegationToken, the old ATSv1 
DelegationToken, the token service is not set at server side, it is set at 
client side - the client call the SecurityUtils#buildTokenService and then set 
the token service. I don't know what it was done like that - maybe because it 
avoids the use_ip config inconsistency between client and serve ?

Should we follow the same ? The client can construct the tokenService based on 
the collector address info ? (One caveat is to make sure the old token gets 
replaced properly - in case ip changes ?)
The CollectorInfo#getCollectorAddr right now is a string, should it be an 
address type ?





was (Author: jianhe):
today, for other delegation tokens RMDelegationToken, the old ATSv1 
DelegationToken, the token service is not set at server side, it is set at 
client side - the client call the SecurityUtils#buildTokenService and then set 
the token service. I don't know what it was done like that - maybe because it 
avoids the use_ip config inconsistency between client and serve ?

Should we follow the same ? The client can construct the tokenService based on 
the collector address info ? (One caveat is to make sure the old token gets 
probably replaced properly - in case ip changes ?)
The CollectorInfo#getCollectorAddr right now is a string, should it be an 
address type ?









[jira] [Commented] (MAPREDUCE-6838) [ATSv2 Security] Add timeline delegation token received in allocate response to UGI

2017-08-19 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134009#comment-16134009
 ] 

Jian He commented on MAPREDUCE-6838:


today, for other delegation tokens RMDelegationToken, the old ATSv1 
DelegationToken, the token service is not set at server side, it is set at 
client side - the client call the SecurityUtils#buildTokenService and then set 
the token service. I don't know what it was done like that - maybe because it 
avoids the use_ip config inconsistency between client and serve ?

Should we follow the same ? The client can construct the tokenService based on 
the collector address info ? (One caveat is to make sure the old token gets 
probably replaced properly - in case ip changes ?)
The CollectorInfo#getCollectorAddr right now is a string, should it be an 
address type ?









[jira] [Commented] (MAPREDUCE-5621) mr-jobhistory-daemon.sh doesn't have to execute mkdir and chown all the time

2017-07-12 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084298#comment-16084298
 ] 

Jian He commented on MAPREDUCE-5621:


lgtm

> mr-jobhistory-daemon.sh doesn't have to execute mkdir and chown all the time
> 
>
> Key: MAPREDUCE-5621
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5621
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 2.8.0
>Reporter: Shinichi Yamashita
>Assignee: Shinichi Yamashita
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5621-branch-2.02.patch, 
> MAPREDUCE-5621-branch-2.patch, MAPREDUCE-5621.patch
>
>
> mr-jobhistory-daemon.sh executes the mkdir and chown commands for the log output 
> directory.
> They are always executed, whether or not the directory exists. In addition, they are 
> executed not only when starting the daemon but also when stopping it.
> An "if" check, as in hadoop-daemon.sh and yarn-daemon.sh, should be added to control this.




[jira] [Commented] (MAPREDUCE-6288) mapred job -status fails with AccessControlException

2017-06-05 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037665#comment-16037665
 ] 

Jian He commented on MAPREDUCE-6288:


[~rkanter], your last comment mentioned it is already reverted.
But I still see it in branch-2 and trunk.
There's one revert commit in branch-2 and trunk, but that commit only 
changed CHANGES.txt. I think we should go ahead and revert the patch from 
trunk and branch-2?
{code}
commit 4cf44bef5ca5fee69f712c448f6969e2e046d495
Author: Vinod Kumar Vavilapalli 
Date:   Tue Mar 31 13:29:20 2015 -0700

Reverted MAPREDUCE-6286, MAPREDUCE-6199, and MAPREDUCE-5875 from 
branch-2.7. Editing CHANGES.txt to reflect this.

(cherry picked from commit e428fea73029ea0c3494c71a50c5f6c994888fd2)
{code}

> mapred job -status fails with AccessControlException 
> -
>
> Key: MAPREDUCE-6288
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6288
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Priority: Blocker
> Attachments: MAPREDUCE-6288.002.patch, MAPREDUCE-6288-gera-001.patch, 
> MAPREDUCE-6288.patch
>
>
> After MAPREDUCE-5875, we're seeing this Exception when trying to do {{mapred 
> job -status job_1427080398288_0001}}
> {noformat}
> Exception in thread "main" org.apache.hadoop.security.AccessControlException: 
> Permission denied: user=jenkins, access=EXECUTE, 
> inode="/user/history/done":mapred:hadoop:drwxrwx---
>   at 
> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
>   at 
> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
>   at 
> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6553)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6535)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6460)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1919)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:545)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>   at 
>

[jira] [Updated] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition

2017-03-02 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-6852:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2, thanks Junping !

> Job#updateStatus() failed with NPE due to race condition
> 
>
> Key: MAPREDUCE-6852
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6852
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.9.0
>
> Attachments: MAPREDUCE-6852.patch, MAPREDUCE-6852-v2.patch
>
>
> Like MAPREDUCE-6762, we found this issue in a cluster where a Pig query 
> occasionally failed with an NPE - "Pig uses JobControl API to track MR job status, 
> but sometimes Job History Server failed to flush job meta files to HDFS which 
> caused the status update failed." Besides the NPE in 
> o.a.h.mapreduce.Job.getJobName, we also get an NPE in Job.updateStatus(); the 
> exception is as follows:
> {noformat}
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
>   at org.apache.hadoop.mapreduce.Job$1.run(Job.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
>   at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
>   at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:604)
> {noformat}
> We found that the state here is null. However, we already check that the job state is 
> RUNNING, as in the code below:
> {noformat}
>   public boolean isComplete() throws IOException {
>     ensureState(JobState.RUNNING);
>     updateStatus();
>     return status.isJobComplete();
>   }
> {noformat}
> The only possible explanation is that two threads are calling this at the same 
> time: the state check passes first, then one thread updates the state to null while the 
> other thread hits the NPE.
> We should fix this NPE.
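
The check-then-use pattern described in the quoted report can be sketched in isolation. This is a simplified, hypothetical model (the class, field, and method names are invented for illustration) and is not the patch attached to this issue; it only shows how reading a shared field once into a local variable avoids dereferencing a value that another thread may have set to null in between.

```java
// Hypothetical model of the race; names do not correspond to the real Job class.
final class StatusHolder {
  // Shared field that another thread may reset to null at any time.
  private volatile String status;

  void setStatus(String newStatus) {
    this.status = newStatus;
  }

  // Racy: the field is read twice, so it can become null between the check and the use.
  boolean isCompleteRacy() {
    if (status == null) {
      throw new IllegalStateException("status not available");
    }
    return status.equals("SUCCEEDED");
  }

  // Safer: read the field once and work only with the local snapshot.
  boolean isCompleteSnapshot() {
    String snapshot = status;
    if (snapshot == null) {
      throw new IllegalStateException("status not available");
    }
    return snapshot.equals("SUCCEEDED");
  }

  public static void main(String[] args) {
    StatusHolder holder = new StatusHolder();
    holder.setStatus("SUCCEEDED");
    System.out.println(holder.isCompleteSnapshot());
  }
}
```

With the snapshot variant, a concurrent reset to null surfaces as a descriptive exception at the check rather than as a NullPointerException at the later dereference.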



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org

[jira] [Commented] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition

2017-02-28 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889376#comment-15889376
 ] 

Jian He commented on MAPREDUCE-6852:


lgtm, committing tomorrow  

> Job#updateStatus() failed with NPE due to race condition
> 
>
> Key: MAPREDUCE-6852
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6852
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-6852.patch, MAPREDUCE-6852-v2.patch
>
>




[jira] [Commented] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition

2017-02-28 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889206#comment-15889206
 ] 

Jian He commented on MAPREDUCE-6852:


It looks like getJobID is already used in several other places in the same class; we may 
just use this method.

> Job#updateStatus() failed with NPE due to race condition
> 
>
> Key: MAPREDUCE-6852
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6852
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-6852.patch
>
>




[jira] [Comment Edited] (MAPREDUCE-6726) YARN Registry based AM discovery with retry and in-flight task persistent via JHS

2016-09-26 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522514#comment-15522514
 ] 

Jian He edited comment on MAPREDUCE-6726 at 9/26/16 9:10 AM:
-

[~srikanth.sampath], thanks for the patch, I looked at it.

IIUC, we are also going to have a different mechanism to retrieve the AM 
address via YARN-4758. The patch is currently hardcoded to depend on the 
registry approach only. This part of the code needs to be made pluggable so 
that the approach listed in YARN-4758 can be plugged in. We could implement 
different FailoverProviders, such as RegistryBasedFailoverProvider for this 
jira and RPCBasedFailoverProvider for YARN-4758.

Regarding the JVMId changes, could you separate them out and upload them to 
MAPREDUCE-6754? We can get that reviewed and committed first.
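
The pluggable lookup suggested above could look roughly like the following. This is a minimal sketch under stated assumptions: `AmAddressResolver`, its `resolve` method, the two implementations and the `"registry"`/`"rpc"` selector strings are all hypothetical names chosen for illustration, and the registry and RPC calls are stubbed out.

```java
import java.util.Map;

/** Hypothetical abstraction over "how do I find the current AM address?". */
interface AmAddressResolver {
  String resolve(String appId);
}

/** Hypothetical registry-backed lookup; the registry is modelled as a map. */
final class RegistryAmAddressResolver implements AmAddressResolver {
  private final Map<String, String> registry;

  RegistryAmAddressResolver(Map<String, String> registry) {
    this.registry = registry;
  }

  @Override
  public String resolve(String appId) {
    return registry.get(appId);
  }
}

/** Hypothetical RPC-backed lookup; the remote call is replaced by a stub. */
final class RpcAmAddressResolver implements AmAddressResolver {
  @Override
  public String resolve(String appId) {
    return "rpc-resolved:" + appId;
  }
}

/** Picks an implementation from a configuration value (assumed selector strings). */
final class AmAddressResolvers {
  private AmAddressResolvers() {
  }

  static AmAddressResolver create(String kind, Map<String, String> registry) {
    switch (kind) {
      case "registry":
        return new RegistryAmAddressResolver(registry);
      case "rpc":
        return new RpcAmAddressResolver();
      default:
        throw new IllegalArgumentException("Unknown resolver kind: " + kind);
    }
  }
}
```

Callers depend only on the interface, so switching from the registry approach to an RPC approach becomes a configuration change rather than a code change.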



was (Author: jianhe):
[~srikanth.sampath], thanks for the patch , I looked at it.

IIUC, we are also going to have a different mechanism to retrieve the AM 
address via YARN-4758. The patch right now is hardcoded to depend on registry 
approach only, this part of the code  needs to be made pluggable so that the 
approach listed in YARN-4758 can be plugged in.  We could implement different 
FailoverProvider like RegistryBasedFailoverProvider or 
RPCBasedFailoverProvider. 

Regarding the JVMId changes, could you separate that out and upload it on to 
MAPREDUCE-6754 ? we can get that reviewed and committed first. 


> YARN Registry based AM discovery with retry and in-flight task persistent via 
> JHS
> -
>
> Key: MAPREDUCE-6726
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6726
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster
>Reporter: Junping Du
>Assignee: Srikanth Sampath
> Attachments: MAPREDUCE-6726-MAPREDUCE-6608.001.patch, 
> MAPREDUCE-6726-MAPREDUCE-6608.001.patch, 
> MAPREDUCE-6726-MAPREDUCE-6608.002.patch, WorkPreservingMRAppMaster.pdf
>
>
> Several tasks will be achieved in this JIRA based on the demo patch in 
> MAPREDUCE-6608:
> 1. AM discovery based on the YARN registry service. Could be replaced by 
> YARN-4758 later due to scale-up issues.
> 2. Retry logic for the TaskUmbilicalProtocol RPC connection
> 3. In-flight task recovery after AM restart via JHS
> 4. Configuration to keep the behavior compatible with previous releases when 
> this feature is not enabled (the default).
> All security related issues and other concerns discussed in MAPREDUCE-6608 
> will be addressed in follow-up JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org

[jira] [Commented] (MAPREDUCE-6726) YARN Registry based AM discovery with retry and in-flight task persistent via JHS

2016-09-26 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522514#comment-15522514
 ] 

Jian He commented on MAPREDUCE-6726:


[~srikanth.sampath], thanks for the patch, I looked at it.

IIUC, we are also going to have a different mechanism to retrieve the AM 
address via YARN-4758. The patch is currently hardcoded to depend on the 
registry approach only. This part of the code needs to be made pluggable so 
that the approach listed in YARN-4758 can be plugged in. We could implement 
different FailoverProviders, such as RegistryBasedFailoverProvider or 
RPCBasedFailoverProvider.

Regarding the JVMId changes, could you separate them out and upload them to 
MAPREDUCE-6754? We can get that reviewed and committed first.


> YARN Registry based AM discovery with retry and in-flight task persistent via 
> JHS
> -
>
> Key: MAPREDUCE-6726
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6726
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster
>Reporter: Junping Du
>Assignee: Srikanth Sampath
> Attachments: MAPREDUCE-6726-MAPREDUCE-6608.001.patch, 
> MAPREDUCE-6726-MAPREDUCE-6608.001.patch, 
> MAPREDUCE-6726-MAPREDUCE-6608.002.patch, WorkPreservingMRAppMaster.pdf
>
>
> Several tasks will be achieved in this JIRA based on the demo patch in 
> MAPREDUCE-6608:
> 1. AM discovery based on the YARN registry service. Could be replaced by 
> YARN-4758 later due to scale-up issues.
> 2. Retry logic for the TaskUmbilicalProtocol RPC connection
> 3. In-flight task recovery after AM restart via JHS
> 4. Configuration to keep the behavior compatible with previous releases when 
> this feature is not enabled (the default).
> All security related issues and other concerns discussed in MAPREDUCE-6608 
> will be addressed in follow-up JIRAs.




[jira] [Commented] (MAPREDUCE-6754) Container Ids for an yarn application should be monotonically increasing in the scope of the application

2016-08-24 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436220#comment-15436220
 ] 

Jian He commented on MAPREDUCE-6754:


Thanks for the feedback, Jason, Vinod.
I think we can add an attemptId to the JvmID, given that it's internal only. 
[~srikanth.sampath], what is your opinion?

> Container Ids for an yarn application should be monotonically increasing in 
> the scope of the application
> 
>
> Key: MAPREDUCE-6754
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6754
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Srikanth Sampath
>Assignee: Srikanth Sampath
>
> Currently across application attempts, container Ids are reused.  The 
> container id is stored in AppSchedulingInfo and it is reinitialized with 
> every application attempt.  So the containerId scope is limited to the 
> application attempt.
> In the MR Framework, It is important to note that the containerId is being 
> used as part of the JvmId.  JvmId has 3 components  containerId>.  The JvmId is used in datastructures in TaskAttemptListener and 
> is passed between the AppMaster and the individual tasks.  For an application 
> attempt, no two tasks have the same JvmId.
> This works well currently, since inflight tasks get killed if the AppMaster 
> goes down.  However, if we want to enable WorkPreserving nature for the AM, 
> containers (and hence containerIds) live across application attempts.  If we 
> recycle containerIds across attempts, then two independent tasks (one 
> inflight from a previous attempt  and another new in a succeeding attempt) 
> can have the same JvmId and cause havoc.
> This can be solved in two ways:
> *Approach A*: Include the attempt id as part of the JvmId. This is a viable 
> solution; however, it changes the format of the JvmId. Changing something 
> that has existed for so long for the sake of an optional feature is not 
> persuasive.
> *Approach B*: Keep the container id a monotonically increasing id for the 
> life of an application, so that container ids are not reused across 
> application attempts. Containers should be able to outlive an application 
> attempt. This is the preferred approach as it is clean, logical and 
> backwards compatible. Nothing changes for existing applications or the 
> internal workings.
> *How this is achieved:*
> Currently, we maintain the latest containerId only per application attempt 
> and reinitialize it for new attempts. With this approach, we retrieve the 
> latest containerId from the just-failed attempt and initialize the new 
> attempt with it (instead of 0). I can provide the patch if it helps. It 
> currently exists in MAPREDUCE-6726.
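
Approach B can be sketched as a counter that a new attempt seeds from the previous attempt's last value instead of 0. This is an illustration only: `ContainerIdAllocator` is a hypothetical class, not the real AppSchedulingInfo, and it ignores how the last id would actually be recovered from the failed attempt.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-attempt container id allocator; not the real AppSchedulingInfo. */
final class ContainerIdAllocator {
  private final AtomicLong counter;

  /**
   * @param lastIdOfPreviousAttempt 0 for the first attempt, otherwise the last
   *     id handed out by the previous attempt
   */
  ContainerIdAllocator(long lastIdOfPreviousAttempt) {
    this.counter = new AtomicLong(lastIdOfPreviousAttempt);
  }

  /** Returns the next id; ids keep increasing across attempts. */
  long next() {
    return counter.incrementAndGet();
  }

  /** Returns the last id handed out, to seed the next attempt. */
  long latest() {
    return counter.get();
  }
}
```

For example, if the first attempt hands out ids 1 and 2, a second attempt created with `new ContainerIdAllocator(firstAttempt.latest())` starts at 3, so a container surviving from the first attempt can never share an id with a new one.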




[jira] [Commented] (MAPREDUCE-6754) Container Ids for an yarn application should be monotonically increasing in the scope of the application

2016-08-24 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434421#comment-15434421
 ] 

Jian He commented on MAPREDUCE-6754:


Hi [~jlowe], would you mind shedding some light on this? Is there any reason 
the JvmID did not include the attemptId, or any problem if we add it? If we 
cannot add the attempt id to the JvmID, we'll go with approach B and make 
ContainerId#getContainerId unique across attempts.

> Container Ids for an yarn application should be monotonically increasing in 
> the scope of the application
> 
>
> Key: MAPREDUCE-6754
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6754
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Srikanth Sampath
>Assignee: Srikanth Sampath
>
> Currently across application attempts, container Ids are reused.  The 
> container id is stored in AppSchedulingInfo and it is reinitialized with 
> every application attempt.  So the containerId scope is limited to the 
> application attempt.
> In the MR Framework, It is important to note that the containerId is being 
> used as part of the JvmId.  JvmId has 3 components  containerId>.  The JvmId is used in datastructures in TaskAttemptListener and 
> is passed between the AppMaster and the individual tasks.  For an application 
> attempt, no two tasks have the same JvmId.
> This works well currently, since inflight tasks get killed if the AppMaster 
> goes down.  However, if we want to enable WorkPreserving nature for the AM, 
> containers (and hence containerIds) live across application attempts.  If we 
> recycle containerIds across attempts, then two independent tasks (one 
> inflight from a previous attempt  and another new in a succeeding attempt) 
> can have the same JvmId and cause havoc.
> This can be solved in two ways:
> *Approach A*: Include the attempt id as part of the JvmId. This is a viable 
> solution; however, it changes the format of the JvmId. Changing something 
> that has existed for so long for the sake of an optional feature is not 
> persuasive.
> *Approach B*: Keep the container id a monotonically increasing id for the 
> life of an application, so that container ids are not reused across 
> application attempts. Containers should be able to outlive an application 
> attempt. This is the preferred approach as it is clean, logical and 
> backwards compatible. Nothing changes for existing applications or the 
> internal workings.
> *How this is achieved:*
> Currently, we maintain the latest containerId only per application attempt 
> and reinitialize it for new attempts. With this approach, we retrieve the 
> latest containerId from the just-failed attempt and initialize the new 
> attempt with it (instead of 0). I can provide the patch if it helps. It 
> currently exists in MAPREDUCE-6726.




[jira] [Updated] (MAPREDUCE-6197) Cache MapOutputLocations in ShuffleHandler

2016-06-21 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-6197:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, thanks Junping !

> Cache MapOutputLocations in ShuffleHandler
> --
>
> Key: MAPREDUCE-6197
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6197
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Siddharth Seth
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6197.patch
>
>
> ShuffleHandler currently seems to create a map of mapId - mapInfo (file.out / 
> index information) when it receives a message.
> This should be caching map info across requests, so that a scan of all 
> directories is not required for each reducer fetching from the same map.
> Also, the scan for each map output / index file is performed twice per mapId 
> within a request. In populateHeaders - once in the call to getMapOutputInfo, 
> and then directly in the method.
> For an invocation where we do end up with more than 1000 (default) mapIds in 
> a single call, and don't cache them in the map - the path constructed for 
> such entries will be invalid. This is highly unlikely to be the case though, 
> until there's proper caching.
> {code}
> MapOutputInfo info = mapOutputInfoMap.get(mapId);
> if (info == null) {
>   info = getMapOutputInfo(outputBasePathStr, mapId, reduceId, user);
> }
> {code}
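
The cross-request caching asked for above can be sketched with a concurrent map that runs the expensive lookup only on a miss. This is a simplified illustration, not the ShuffleHandler code: `MapOutputPathCache`, `scanDirectories` and the returned path are hypothetical, and the sketch has no size bound or eviction, which a real cache would need.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical cache of map output locations shared across fetch requests. */
final class MapOutputPathCache {
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final AtomicInteger scans = new AtomicInteger();

  /** Stand-in for the expensive directory scan; counts how often it runs. */
  private String scanDirectories(String mapId) {
    scans.incrementAndGet();
    return "/hypothetical/output/" + mapId + "/file.out";
  }

  /** Returns the cached path, scanning only the first time a mapId is seen. */
  String getOutputPath(String mapId) {
    return cache.computeIfAbsent(mapId, this::scanDirectories);
  }

  /** Number of scans performed so far. */
  int scanCount() {
    return scans.get();
  }
}
```

With this shape, many reducers fetching from the same map trigger a single scan for that mapId.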




[jira] [Commented] (MAPREDUCE-6197) Cache MapOutputLocations in ShuffleHandler

2016-06-15 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333020#comment-15333020
 ] 

Jian He commented on MAPREDUCE-6197:


lgtm. One question: how and why did you choose this policy for determining 
the weight?
{code}
maximumWeight(MAX_WEIGHT).weigher(
    new Weigher<AttemptPathIdentifier, AttemptPathInfo>() {
      @Override
      public int weigh(AttemptPathIdentifier key,
          AttemptPathInfo value) {
        return key.jobId.length() + key.user.length() +
            key.attemptId.length() +
            value.indexPath.toString().length() +
            value.dataPath.toString().length();
      }
    }
)
{code}

> Cache MapOutputLocations in ShuffleHandler
> --
>
> Key: MAPREDUCE-6197
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6197
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Siddharth Seth
>Assignee: Junping Du
> Attachments: MAPREDUCE-6197.patch
>
>
> ShuffleHandler currently seems to create a map of mapId - mapInfo (file.out / 
> index information) when it receives a message.
> This should be caching map info across requests, so that a scan of all 
> directories is not required for each reducer fetching from the same map.
> Also, the scan for each map output / index file is performed twice per mapId 
> within a request. In populateHeaders - once in the call to getMapOutputInfo, 
> and then directly in the method.
> For an invocation where we do end up with more than 1000 (default) mapIds in 
> a single call, and don't cache them in the map - the path constructed for 
> such entries will be invalid. This is highly unlikely to be the case though, 
> until there's proper caching.
> {code}
> MapOutputInfo info = mapOutputInfoMap.get(mapId);
> if (info == null) {
>   info = getMapOutputInfo(outputBasePathStr, mapId, reduceId, user);
> }
> {code}




[jira] [Updated] (MAPREDUCE-6703) Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers

2016-05-24 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-6703:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2,  thanks [~asuresh] !

> Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers
> --
>
> Key: MAPREDUCE-6703
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6703
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Fix For: 2.9.0
>
> Attachments: MAPREDUCE-6703.001.patch, MAPREDUCE-6703.002.patch, 
> MAPREDUCE-6703.003.patch
>
>
> YARN-2882 and YARN-4335 introduce the concept of container ExecutionTypes 
> and specifically OPPORTUNISTIC containers.
> The default ExecutionType is GUARANTEED. This JIRA proposes to allow users to 
> provide hints via config to the MR framework as to the number of containers 
> it would like to schedule as OPPORTUNISTIC.
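
As a rough illustration of such a hint, the sketch below marks the first N map requests as OPPORTUNISTIC and the rest as GUARANTEED. Everything here is an assumption made for the example: `ExecType` and `ExecTypePlanner` are hypothetical, the hint is modelled as a plain count, and the actual patch may express the hint differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mirror of the two execution types named in the description. */
enum ExecType { GUARANTEED, OPPORTUNISTIC }

/** Hypothetical planner that turns a count hint into per-request execution types. */
final class ExecTypePlanner {
  private ExecTypePlanner() {
  }

  /**
   * @param totalMaps number of map container requests
   * @param opportunisticHint how many of them the user would like to run as
   *     OPPORTUNISTIC; values outside [0, totalMaps] are clamped
   */
  static List<ExecType> plan(int totalMaps, int opportunisticHint) {
    int total = Math.max(0, totalMaps);
    int opportunistic = Math.max(0, Math.min(opportunisticHint, total));
    List<ExecType> types = new ArrayList<>(total);
    for (int i = 0; i < total; i++) {
      types.add(i < opportunistic ? ExecType.OPPORTUNISTIC : ExecType.GUARANTEED);
    }
    return types;
  }
}
```

With a hint of 0 every request stays GUARANTEED, which matches the default described above.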




[jira] [Commented] (MAPREDUCE-6703) Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers

2016-05-24 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298653#comment-15298653
 ] 

Jian He commented on MAPREDUCE-6703:


lgtm, I can commit later today if no comments from others.

> Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers
> --
>
> Key: MAPREDUCE-6703
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6703
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: MAPREDUCE-6703.001.patch, MAPREDUCE-6703.002.patch, 
> MAPREDUCE-6703.003.patch
>
>
> YARN-2882 and YARN-4335 introduce the concept of container ExecutionTypes 
> and specifically OPPORTUNISTIC containers.
> The default ExecutionType is GUARANTEED. This JIRA proposes to allow users to 
> provide hints via config to the MR framework as to the number of containers 
> it would like to schedule as OPPORTUNISTIC.




[jira] [Updated] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-19 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-6696:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: 2.9.0
Target Version/s: 2.9.0
  Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, thanks Zhihai !

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.9.0
>
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch, MAPREDUCE-6696.003.patch, MAPREDUCE-6696.004.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admins to save Hadoop cluster 
> resources by preventing users from submitting big MapReduce jobs. A MapReduce 
> job with too many mappers may fail with OOM after running for a long time, 
> which is a big waste.




[jira] [Commented] (MAPREDUCE-6703) Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers

2016-05-19 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292491#comment-15292491
 ] 

Jian He commented on MAPREDUCE-6703:


I see, just a few minor comments:
- Could you add comments to the newly added config explaining what it means?
- Here, we can just call addOpportunisticResourceRequest, so the 
addContainerReq method does not need to be refactored.
{code}
maps.put(event.getAttemptID(), request); 
addContainerReq(request, ExecutionType.OPPORTUNISTIC);
{code}

> Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers
> --
>
> Key: MAPREDUCE-6703
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6703
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: MAPREDUCE-6703.001.patch
>
>
> YARN-2882 and YARN-4335 introduce the concept of container ExecutionTypes 
> and specifically OPPORTUNISTIC containers.
> The default ExecutionType is GUARANTEED. This JIRA proposes to allow users to 
> provide hints via config to the MR framework as to the number of containers 
> it would like to schedule as OPPORTUNISTIC.




[jira] [Commented] (MAPREDUCE-6703) Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers

2016-05-19 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292343#comment-15292343
 ] 

Jian He commented on MAPREDUCE-6703:


I see. Depending on how locality-sensitive the MR job is, this may not help 
as much. I wonder whether you have statistics showing how much this improves 
things, or is this mainly for example purposes?

> Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers
> --
>
> Key: MAPREDUCE-6703
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6703
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: MAPREDUCE-6703.001.patch
>
>
> YARN-2882 and YARN-4335 introduce the concept of container ExecutionTypes 
> and specifically OPPORTUNISTIC containers.
> The default ExecutionType is GUARANTEED. This JIRA proposes to allow users to 
> provide hints via config to the MR framework as to the number of containers 
> it would like to schedule as OPPORTUNISTIC.




[jira] [Commented] (MAPREDUCE-6703) Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers

2016-05-19 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292313#comment-15292313
 ] 

Jian He commented on MAPREDUCE-6703:


It looks like locality is ignored for opportunistic containers. Does YARN-2877 
consider locality for opportunistic containers?

> Add flag to allow MapReduce AM to request for OPPORTUNISTIC containers
> --
>
> Key: MAPREDUCE-6703
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6703
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: MAPREDUCE-6703.001.patch
>
>
> YARN-2882 and YARN-4335 introduce the concept of container ExecutionTypes 
> and specifically OPPORTUNISTIC containers.
> The default ExecutionType is GUARANTEED. This JIRA proposes to allow users to 
> provide hints via config to the MR framework as to the number of containers 
> it would like to schedule as OPPORTUNISTIC.




[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-19 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292106#comment-15292106
 ] 

Jian He commented on MAPREDUCE-6696:


lgtm, +1

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch, MAPREDUCE-6696.003.patch, MAPREDUCE-6696.004.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admins to save Hadoop cluster 
> resources by preventing users from submitting big MapReduce jobs. A MapReduce 
> job with too many mappers may fail with OOM after running for a long time, 
> which is a big waste.




[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-18 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290171#comment-15290171
 ] 

Jian He commented on MAPREDUCE-6696:


Also, maybe throw IllegalArgumentException instead of RuntimeException?

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admins to save Hadoop cluster 
> resources by preventing users from submitting big MapReduce jobs. A MapReduce 
> job with too many mappers may fail with OOM after running for a long time, 
> which is a big waste.




[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-18 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290170#comment-15290170
 ] 

Jian He commented on MAPREDUCE-6696:


I see, thanks for your explanation. The patch looks good to me. One minor nit: 
it may be useful to print the current number of map tasks in the exception 
message too, just to be clearer.
{code}
new RuntimeException("The number of map tasks exceeded limit " +
maxMaps);
{code}
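
A check along the lines suggested in this thread, reporting both the actual number of map tasks and the limit and throwing IllegalArgumentException, might look like the sketch below. `MapTaskLimit` and its `check` method are hypothetical, and treating a negative limit as "no limit" is an assumption made for the example, not something stated in the patch.

```java
/** Hypothetical helper enforcing a per-job limit on the number of map tasks. */
final class MapTaskLimit {
  private MapTaskLimit() {
  }

  /**
   * @param numMaps actual number of map tasks computed for the job
   * @param maxMaps configured limit; a negative value means "no limit" (assumption)
   */
  static void check(int numMaps, int maxMaps) {
    if (maxMaps >= 0 && numMaps > maxMaps) {
      // Include both numbers so the user can see by how much the job exceeds the limit.
      throw new IllegalArgumentException("The number of map tasks " + numMaps
          + " exceeded limit " + maxMaps);
    }
  }
}
```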

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admins to save Hadoop cluster 
> resources by preventing users from submitting big MapReduce jobs. A MapReduce 
> job with too many mappers may fail with OOM after running for a long time, 
> which is a big waste.




[jira] [Commented] (MAPREDUCE-6696) Add a configuration to limit the number of map tasks allowed per job.

2016-05-17 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287888#comment-15287888
 ] 

Jian He commented on MAPREDUCE-6696:


I think MRJobConfig.NUM_MAPS gives a hint about the number of maps, not the 
actual number.
Btw, it seems JobImpl#checkTaskLimits was the very first code for the task 
limit. Based on the git history, I guess it was removed when YARN was created.

> Add a configuration to limit the number of map tasks allowed per job.
> -
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 2.8.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admins to save Hadoop cluster 
> resources by preventing users from submitting big MapReduce jobs. A MapReduce 
> job with too many mappers may fail with OOM after running for a long time, 
> which is a big waste.




[jira] [Updated] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2016-05-13 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-6513:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.3
   2.8.0
   Status: Resolved  (was: Patch Available)

Committed to branch-2.7, thanks Wangda !
Thanks Varun for reviewing the patch !

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob.zhao
>Assignee: Varun Saxena
>Priority: Critical
> Fix For: 2.8.0, 2.7.3
>
> Attachments: MAPREDUCE-6513.01.patch, MAPREDUCE-6513.02.patch, 
> MAPREDUCE-6513.03.patch, MAPREDUCE-6513.3.branch-2.8.patch, 
> MAPREDUCE-6513.3_1.branch-2.7.patch, MAPREDUCE-6513.3_1.branch-2.8.patch
>
>
> While a job with many tasks was in progress, one node became unstable due to 
> some OS issue. After the node became unstable, the status of the maps on this 
> node changed to KILLED. 
> Currently the maps that were running on the unstable node are rescheduled; 
> all of them are in the scheduled state, waiting for the RM to assign 
> containers. Ask requests for the maps were seen until the node was good again 
> (all of those failed), and there were no ask requests after that. But the AM 
> keeps preempting the reducers (it's recycling).
> In the end the reducers are waiting for the mappers to complete, and the 
> mappers never get containers.
> My question is:
> 
> Why did the AM not send map requests once the node had recovered?




[jira] [Resolved] (MAPREDUCE-6099) Adding getSplits(JobContext job, List stats) to mapreduce CombineFileInputFormat

2016-05-11 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He resolved MAPREDUCE-6099.

Resolution: Won't Fix

Closing, as Jason mentioned.

> Adding  getSplits(JobContext job, List stats) to mapreduce 
> CombineFileInputFormat
> -
>
> Key: MAPREDUCE-6099
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6099
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.4.1
>Reporter: Pankit Thapar
>Priority: Critical
> Attachments: MAPREDUCE-6099.patch
>
>
> Currently we have getSplits(JobContext job) in CombineFileInputFormat. 
> This API does not give the client the freedom to create a list of file 
> statuses itself and then create splits from the resulting List stats.
> The client might be able to perform some filtering on its end on the file 
> sets in the input paths. For the reasons above, it would be a good idea to 
> have getSplits(JobContext, List).
> Please let me know what you think about this.




[jira] [Updated] (MAPREDUCE-4758) jobhistory web ui not showing correct # failed reducers

2016-05-11 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-4758:
---
Target Version/s: 2.9.0  (was: 2.8.0)
Priority: Major  (was: Critical)

This is an improvement to the UI.
It is unlikely that this will get done; moving it out.

> jobhistory web ui not showing correct # failed reducers
> ---
>
> Key: MAPREDUCE-4758
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4758
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver, webapps
>Affects Versions: 0.23.4
>Reporter: Thomas Graves
>
> We had a job fail due to a reducer failing 4 times. Unfortunately, the job 
> history UI didn't show this particular failed reducer, which led to 
> confusion as to why the job failed. 
> This reducer failed to launch all 4 task attempts with a Token Expiration 
> error, and the jobhistory file only gets an event when the task attempt 
> transitions to launched. The webapp JobInfo object only counts the task 
> attempts in the jobhistory file to display under the "Attempt Type" table, so 
> since this task didn't have an attempt with it, it didn't show up on the UI.
> We need to reconcile the task list with the task attempts, or also show more 
> stats for tasks vs. task attempts.




[jira] [Updated] (MAPREDUCE-4683) We need to fix our build to create/distribute hadoop-mapreduce-client-core-tests.jar

2016-05-11 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-4683:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

I guess this could break existing scripts; closing.

> We need to fix our build to create/distribute 
> hadoop-mapreduce-client-core-tests.jar
> 
>
> Key: MAPREDUCE-4683
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4683
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Reporter: Arun C Murthy
>Assignee: Akira AJISAKA
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4683.patch
>
>
> We need to fix our build to create/distribute 
> hadoop-mapreduce-client-core-tests.jar, need this before MAPREDUCE-4253




[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2016-05-11 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280963#comment-15280963
 ] 

Jian He commented on MAPREDUCE-6513:


It looks like TaskAttemptKillEvent will be sent twice for each mapper.
The first time is at the code below in RMContainerAllocator#handleUpdatedNodes: 
JobImpl will in turn send the TaskAttemptKillEvent for each mapper on the 
unusable node.
{code}
  // send event to the job to act upon completed tasks
  eventHandler.handle(new JobUpdatedNodesEvent(getJob().getID(),
  updatedNodes));
{code}
The second time is at this code in the same method:
{code}
// If map, reschedule next task attempt.
boolean rescheduleNextAttempt = (i == 0) ? true : false;
eventHandler.handle(new TaskAttemptKillEvent(tid,
"TaskAttempt killed because it ran on unusable node"
+ taskAttemptNodeId, rescheduleNextAttempt));
  }
{code}

This is how it has been for a long time; I'm not sure why. With the new change, 
will this cause more container requests to get scheduled?

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob.zhao
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: MAPREDUCE-6513.01.patch, MAPREDUCE-6513.02.patch, 
> MAPREDUCE-6513.03.patch, MAPREDUCE-6513.3.branch-2.8.patch, 
> MAPREDUCE-6513.3_1.branch-2.7.patch, MAPREDUCE-6513.3_1.branch-2.8.patch
>
>
> While a job with many tasks was in progress, one node became unstable 
> due to an OS issue. After the node became unstable, the maps on this node 
> changed to the KILLED state. 
> The maps that were running on the unstable node are rescheduled; all of 
> them are in the scheduled state, waiting for the RM to assign containers. 
> Ask requests for maps were seen until the node was good again (all of those 
> failed), and there are no ask requests after that. But the AM keeps 
> preempting the reducers (it is recycling them).
> In the end, the reducers are waiting for the mappers to complete, and the 
> mappers never get containers.
> My question is:
> 
> Why did the AM not send map requests once the node had recovered?




[jira] [Updated] (MAPREDUCE-6680) JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.)

2016-04-20 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-6680:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.3
   2.8.0
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, branch-2.8, branch-2.7, thanks Junping !

> JHS UserLogDir scan algorithm sometime could skip directory with update in 
> CloudFS (Azure FileSystem, S3, etc.)
> ---
>
> Key: MAPREDUCE-6680
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6680
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Junping Du
>Assignee: Junping Du
>  Labels: Azure, S3
> Fix For: 2.8.0, 2.7.3
>
> Attachments: MAPREDUCE-6680-v2.patch, MAPREDUCE-6680-v3.patch, 
> MAPREDUCE-6680.patch
>
>
> In our cluster based on a cloud FileSystem, we noticed that the JHS could 
> sometimes skip a directory containing a .jhist file during scanning.
> The behavior is like this:
> The first round of scanning doesn't find the .jhist file:
> {noformat}
> 16/04/13 11:14:34 DEBUG azure.NativeAzureFileSystem: Found path as a 
> directory with 6 files in it.
> 16/04/13 11:14:34 DEBUG hs.HistoryFileManager: Found 0 files
> ...
> {noformat}
> Then, we see "Scan not needed of ..." for the same directory every 3 minutes 
> until the application fails with a timeout.
> From our analysis, the root cause is that most cloud file systems 
> (Azure FS, S3, etc.) truncate file/directory modification times to 
> seconds instead of milliseconds, which could be due to a limit of the HTTP 
> protocol (from the discussion at: 
> https://forums.aws.amazon.com/thread.jspa?messageID=476615). 
> So suppose the time sequence happens to be: the latest non-.jhist file 
> modification on the directory happens at T1, the directory scan happens at 
> T2, and the .jhist file is added to the directory at T3. If we have 
> {{T1 < T2 < T3}} and T1 is equal to T3 after truncating to seconds, this 
> issue could appear.
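
The timing condition described above can be illustrated with a small sketch. The timestamps and the truncateToSeconds helper are hypothetical; they only model a store that keeps modification times at second granularity and are not the JHS scanning code.

```java
public class TruncatedMtimeDemo {
  // Hypothetical helper: models a file system that stores modification
  // times with second granularity by dropping the millisecond part.
  public static long truncateToSeconds(long millis) {
    return (millis / 1000L) * 1000L;
  }

  // A scanner that compares only the stored (truncated) modification
  // times cannot tell that the directory changed between the two times.
  public static boolean looksUnchanged(long mtimeBefore, long mtimeAfter) {
    return truncateToSeconds(mtimeBefore) == truncateToSeconds(mtimeAfter);
  }

  public static void main(String[] args) {
    long t1 = 1460545474100L; // last non-.jhist modification (made-up value)
    long t2 = 1460545474400L; // directory scan
    long t3 = 1460545474900L; // .jhist file added

    System.out.println("T1 < T2 < T3: " + (t1 < t2 && t2 < t3));
    System.out.println("change missed: " + looksUnchanged(t1, t3));
  }
}
```

All three made-up timestamps fall within the same second, so the directory's stored modification time after T3 equals the one seen at scan time and the later scan is skipped.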




[jira] [Updated] (MAPREDUCE-6619) HADOOP_CLASSPATH is overwritten in MR container

2016-01-27 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-6619:
---
   Resolution: Fixed
Fix Version/s: 2.6.4
   2.7.3
   2.8.0
   Status: Resolved  (was: Patch Available)

Committed to  branch-2, branch-2.8, branch-2.7, branch-2.6, thanks [~djp] !
Thanks [~shanyu] for reviewing the patch !

> HADOOP_CLASSPATH is overwritten in MR container
> ---
>
> Key: MAPREDUCE-6619
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6619
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.8.0, 2.7.2
>Reporter: shanyu zhao
>Assignee: Junping Du
> Fix For: 2.8.0, 2.7.3, 2.6.4
>
> Attachments: MAPREDUCE-6619-branch-2.patch
>
>
> Previously, the env variable HADOOP_CLASSPATH in MR containers was inherited 
> from the defaults of the worker node. MAPREDUCE-6454 introduced a change that 
> overwrites HADOOP_CLASSPATH completely. This caused a regression. We need to 
> add additional entries to HADOOP_CLASSPATH instead of completely replacing it.
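
The difference between overwriting and appending can be sketched as follows. This is only an illustration under assumptions: the appendToEnv helper, the paths, and the ":" separator are hypothetical and are not taken from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

public class ClasspathAppendDemo {
  // Hypothetical helper: adds extra entries to an existing variable
  // instead of replacing whatever the node already provided.
  public static void appendToEnv(Map<String, String> env, String name,
      String extra, String separator) {
    String current = env.get(name);
    if (current == null || current.isEmpty()) {
      env.put(name, extra);
    } else {
      env.put(name, current + separator + extra);
    }
  }

  public static void main(String[] args) {
    Map<String, String> env = new HashMap<>();
    // Value inherited from the worker node defaults (made-up path).
    env.put("HADOOP_CLASSPATH", "/opt/node/lib/*");

    // Overwriting would drop the inherited entry; appending keeps it.
    appendToEnv(env, "HADOOP_CLASSPATH", "/job/extra/*", ":");
    System.out.println(env.get("HADOOP_CLASSPATH"));
  }
}
```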




[jira] [Commented] (MAPREDUCE-6619) HADOOP_CLASSPATH is overwritten in MR container

2016-01-26 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118626#comment-15118626
 ] 

Jian He commented on MAPREDUCE-6619:


lgtm, committing.

> HADOOP_CLASSPATH is overwritten in MR container
> ---
>
> Key: MAPREDUCE-6619
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6619
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.8.0, 2.7.2
>Reporter: shanyu zhao
>Assignee: Junping Du
> Attachments: MAPREDUCE-6619-branch-2.patch
>
>
> Previously, the env variable HADOOP_CLASSPATH in MR containers was inherited 
> from the defaults of the worker node. MAPREDUCE-6454 introduced a change that 
> overwrites HADOOP_CLASSPATH completely. This caused a regression. We need to 
> add additional entries to HADOOP_CLASSPATH instead of completely replacing it.




[jira] [Updated] (MAPREDUCE-6610) JobHistoryEventHandler should not swallow timeline response

2016-01-25 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-6610:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0
  Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, branch-2.8  thanks [~gtCarrera]
Thanks [~Naganarasimha] for reviewing !

> JobHistoryEventHandler should not swallow timeline response
> ---
>
> Key: MAPREDUCE-6610
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6610
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6610-trunk.001.patch, 
> MAPREDUCE-6610-trunk.002.patch
>
>
> As discussed in YARN-4596, JobHistoryEventHandler should process and log 
> timeline put errors after the timeline put call. 




[jira] [Commented] (MAPREDUCE-6610) JobHistoryEventHandler should not swallow timeline response

2016-01-25 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116339#comment-15116339
 ] 

Jian He commented on MAPREDUCE-6610:


lgtm, 

> JobHistoryEventHandler should not swallow timeline response
> ---
>
> Key: MAPREDUCE-6610
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6610
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Trivial
> Attachments: MAPREDUCE-6610-trunk.001.patch, 
> MAPREDUCE-6610-trunk.002.patch
>
>
> As discussed in YARN-4596, JobHistoryEventHandler should process and log 
> timeline put errors after the timeline put call. 




[jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API

2015-11-16 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007094#comment-15007094
 ] 

Jian He commented on MAPREDUCE-5485:


lgtm, committing 

> Allow repeating job commit by extending OutputCommitter API
> ---
>
> Key: MAPREDUCE-5485
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.1.0-beta
>Reporter: Nemon Lou
>Assignee: Junping Du
>Priority: Critical
> Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, 
> MAPREDUCE-5485-v1.patch, MAPREDUCE-5485-v2.patch, MAPREDUCE-5485-v3.1.patch, 
> MAPREDUCE-5485-v3.patch, MAPREDUCE-5485-v4.1.patch, MAPREDUCE-5485-v4.patch, 
> MAPREDUCE-5485-v5.patch
>
>
> There is a chance that the MRAppMaster crashes during job commit, or that a 
> NodeManager restart causes the committing AM to exit due to container 
> expiry. In these cases, the job will fail.
> However, some jobs can redo the commit, so failing the job is unnecessary.
> Letting clients tell the AM whether to allow redoing the commit is a better 
> choice.
> This idea comes from Jason Lowe's comments in MAPREDUCE-4819 




[jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API

2015-11-16 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007097#comment-15007097
 ] 

Jian He commented on MAPREDUCE-5485:


[~djp], there are some findbugs and unit test failures; mind checking?

> Allow repeating job commit by extending OutputCommitter API
> ---
>
> Key: MAPREDUCE-5485
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.1.0-beta
>Reporter: Nemon Lou
>Assignee: Junping Du
>Priority: Critical
> Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, 
> MAPREDUCE-5485-v1.patch, MAPREDUCE-5485-v2.patch, MAPREDUCE-5485-v3.1.patch, 
> MAPREDUCE-5485-v3.patch, MAPREDUCE-5485-v4.1.patch, MAPREDUCE-5485-v4.patch, 
> MAPREDUCE-5485-v5.patch
>
>
> There is a chance that the MRAppMaster crashes during job commit, or that a 
> NodeManager restart causes the committing AM to exit due to container 
> expiry. In these cases, the job will fail.
> However, some jobs can redo the commit, so failing the job is unnecessary.
> Letting clients tell the AM whether to allow redoing the commit is a better 
> choice.
> This idea comes from Jason Lowe's comments in MAPREDUCE-4819 




[jira] [Updated] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API

2015-11-16 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5485:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: 2.7.3
Target Version/s: 2.7.3  (was: 2.6.3, 2.7.3)
  Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, branch-2.7, thanks Junping !
thanks Bikas for reviewing !

> Allow repeating job commit by extending OutputCommitter API
> ---
>
> Key: MAPREDUCE-5485
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.1.0-beta
>Reporter: Nemon Lou
>Assignee: Junping Du
>Priority: Critical
> Fix For: 2.7.3
>
> Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, 
> MAPREDUCE-5485-v1.patch, MAPREDUCE-5485-v2.patch, MAPREDUCE-5485-v3.1.patch, 
> MAPREDUCE-5485-v3.patch, MAPREDUCE-5485-v4.1.patch, MAPREDUCE-5485-v4.patch, 
> MAPREDUCE-5485-v5-branch-2.7.patch, MAPREDUCE-5485-v5.patch
>
>
> There is a chance that the MRAppMaster crashes during job commit, or that a 
> NodeManager restart causes the committing AM to exit due to container 
> expiry. In these cases, the job will fail.
> However, some jobs can redo the commit, so failing the job is unnecessary.
> Letting clients tell the AM whether to allow redoing the commit is a better 
> choice.
> This idea comes from Jason Lowe's comments in MAPREDUCE-4819 




[jira] [Commented] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side

2015-09-14 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744709#comment-14744709
 ] 

Jian He commented on MAPREDUCE-5870:


I see. I'm OK with continuing to support the enum, as that's what I originally 
thought. I just wanted to bring this up.

> Support for passing Job priority through Application Submission Context in 
> Mapreduce Side
> -
>
> Key: MAPREDUCE-5870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-MAPREDUCE-5870.patch, 0002-MAPREDUCE-5870.patch, 
> 0003-MAPREDUCE-5870.patch, 0004-MAPREDUCE-5870.patch, 
> 0005-MAPREDUCE-5870.patch, 0006-MAPREDUCE-5870.patch, Yarn-2002.1.patch
>
>
> Job priority can be set from the client side as below [configuration and API].
>   a.  JobConf.getJobPriority() and 
> Job.setPriority(JobPriority priority) 
>   b.  We can also use configuration 
> "mapreduce.job.priority".
>   Now this Job priority can be passed in Application Submission 
> context from Client side.
>   Here we can reuse the MRJobConfig.PRIORITY configuration. 




[jira] [Commented] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side

2015-09-14 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743154#comment-14743154
 ] 

Jian He commented on MAPREDUCE-5870:


I earlier thought we could stay backward compatible with the enum priority, but 
now I am questioning the value of doing this. It does bring extra complexity to 
support both. [~jlowe], do you know if many apps from MR1 actually expect this 
enum-based priority to work? Since priority has not been supported in Hadoop 2 
for such a long time, I'm wondering if we can deprecate the old API and claim 
to support only integers, to keep things simple and clear.

> Support for passing Job priority through Application Submission Context in 
> Mapreduce Side
> -
>
> Key: MAPREDUCE-5870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-MAPREDUCE-5870.patch, 0002-MAPREDUCE-5870.patch, 
> 0003-MAPREDUCE-5870.patch, 0004-MAPREDUCE-5870.patch, 
> 0005-MAPREDUCE-5870.patch, 0006-MAPREDUCE-5870.patch, Yarn-2002.1.patch
>
>
> Job priority can be set from the client side as below [configuration and API].
>   a.  JobConf.getJobPriority() and 
> Job.setPriority(JobPriority priority) 
>   b.  We can also use configuration 
> "mapreduce.job.priority".
>   Now this Job priority can be passed in Application Submission 
> context from Client side.
>   Here we can reuse the MRJobConfig.PRIORITY configuration. 




[jira] [Commented] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side

2015-07-26 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14642080#comment-14642080
 ] 

Jian He commented on MAPREDUCE-5870:


[~sunilg], thanks for updating.
- Should we use the JobConf.getJobPriority API so that it can accept the 
priority currently specified via the CLI too?
{code}
String jobPriority = jobConf.get(MRJobConfig.PRIORITY);
{code}
- To simplify a little bit, instead of
{code}
int iPriority = TypeConverter.toYarn(jobPriority);
// If the given input not a JobPriority enum, verify whether its an
// integer.
if (0 == iPriority) {
  iPriority = Integer.parseInt(jobPriority);
}
{code}
we can do something like:
{code}
try {
  iPriority = TypeConverter.toYarn(jobPriority);
} catch (IllegalArgumentException exception) {
  iPriority = Integer.parseInt(jobPriority);
}
{code}
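
A self-contained sketch of the same "enum name first, integer second" fallback is below. It uses a hypothetical DemoPriority enum with made-up numeric values rather than the real JobPriority and TypeConverter classes, and it relies on Enum.valueOf throwing IllegalArgumentException for unknown names; whether TypeConverter.toYarn behaves the same way for non-enum strings is not confirmed here.

```java
public class PriorityParseDemo {
  // Hypothetical stand-in for an enum-based priority; the numeric
  // values are made up for illustration only.
  enum DemoPriority {
    LOW(1), NORMAL(2), HIGH(3);

    final int value;

    DemoPriority(int value) {
      this.value = value;
    }
  }

  // Try to read the string as an enum name first. Enum.valueOf throws
  // IllegalArgumentException for unknown names, in which case the
  // string is parsed as a plain integer instead.
  public static int toIntPriority(String raw) {
    try {
      return DemoPriority.valueOf(raw).value;
    } catch (IllegalArgumentException e) {
      return Integer.parseInt(raw);
    }
  }

  public static void main(String[] args) {
    System.out.println(toIntPriority("HIGH"));
    System.out.println(toIntPriority("7"));
  }
}
```

A string that is neither an enum name nor an integer still fails, with the NumberFormatException from Integer.parseInt, so the caller has to decide how to report that.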

 Support for passing Job priority through Application Submission Context in 
 Mapreduce Side
 -

 Key: MAPREDUCE-5870
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-MAPREDUCE-5870.patch, 0002-MAPREDUCE-5870.patch, 
 Yarn-2002.1.patch


 Job priority can be set from the client side as below [configuration and API].
   a.  JobConf.getJobPriority() and 
 Job.setPriority(JobPriority priority) 
   b.  We can also use configuration 
 mapreduce.job.priority.
   Now this Job priority can be passed in Application Submission 
 context from Client side.
   Here we can reuse the MRJobConfig.PRIORITY configuration. 




[jira] [Commented] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side

2015-07-24 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640790#comment-14640790
 ] 

Jian He commented on MAPREDUCE-5870:


patch looks good to me, triggering jenkins

 Support for passing Job priority through Application Submission Context in 
 Mapreduce Side
 -

 Key: MAPREDUCE-5870
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-MAPREDUCE-5870.patch, Yarn-2002.1.patch


 Job priority can be set from the client side as below [configuration and API].
   a.  JobConf.getJobPriority() and 
 Job.setPriority(JobPriority priority) 
   b.  We can also use configuration 
 mapreduce.job.priority.
   Now this Job priority can be passed in Application Submission 
 context from Client side.
   Here we can reuse the MRJobConfig.PRIORITY configuration. 




[jira] [Updated] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side

2015-07-24 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5870:
---
Status: Patch Available  (was: Open)

 Support for passing Job priority through Application Submission Context in 
 Mapreduce Side
 -

 Key: MAPREDUCE-5870
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-MAPREDUCE-5870.patch, Yarn-2002.1.patch


 Job priority can be set from the client side as below [configuration and API].
   a.  JobConf.getJobPriority() and 
 Job.setPriority(JobPriority priority) 
   b.  We can also use configuration 
 mapreduce.job.priority.
   Now this Job priority can be passed in Application Submission 
 context from Client side.
   Here we can reuse the MRJobConfig.PRIORITY configuration. 




[jira] [Commented] (MAPREDUCE-5870) Support for passing Job priority through Application Submission Context in Mapreduce Side

2015-07-24 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641134#comment-14641134
 ] 

Jian He commented on MAPREDUCE-5870:


[~sunilg], could you check if the test failure is related ? 

 Support for passing Job priority through Application Submission Context in 
 Mapreduce Side
 -

 Key: MAPREDUCE-5870
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5870
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-MAPREDUCE-5870.patch, Yarn-2002.1.patch


 Job priority can be set from the client side as below [configuration and API].
   a.  JobConf.getJobPriority() and 
 Job.setPriority(JobPriority priority) 
   b.  We can also use configuration 
 mapreduce.job.priority.
   Now this Job priority can be passed in Application Submission 
 context from Client side.
   Here we can reuse the MRJobConfig.PRIORITY configuration. 




[jira] [Updated] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search

2015-05-01 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-6350:
---
Component/s: jobhistoryserver

 JobHistory doesn't support fully-functional search
 --

 Key: MAPREDUCE-6350
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6350
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-1614.v1.patch, YARN-1614.v2.patch


 The job history server only outputs the first 50 characters of job names 
 in the web UI.




[jira] [Moved] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search

2015-05-01 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He moved YARN-1614 to MAPREDUCE-6350:
--

Key: MAPREDUCE-6350  (was: YARN-1614)
Project: Hadoop Map/Reduce  (was: Hadoop YARN)

 JobHistory doesn't support fully-functional search
 --

 Key: MAPREDUCE-6350
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6350
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-1614.v1.patch, YARN-1614.v2.patch


 The job history server only outputs the first 50 characters of job names 
 in the web UI.




[jira] [Commented] (MAPREDUCE-6230) MR AM does not survive RM restart if RM activated a new AMRM secret key

2015-01-28 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296073#comment-14296073
 ] 

Jian He commented on MAPREDUCE-6230:


+1

 MR AM does not survive RM restart if RM activated a new AMRM secret key
 ---

 Key: MAPREDUCE-6230
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6230
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
 Attachments: MAPREDUCE-6230.001.patch


 A MapReduce AM will fail to reconnect to an RM that performed restart in the 
 following scenario:
 # MapReduce job launched with AMRM token generated from AMRM secret X
 # RM rolls new AMRM secret Y and activates the new key
 # RM performs a work-preserving restart
 # MapReduce job AM now unable to connect to RM with Invalid AMRMToken 
 exception




[jira] [Updated] (MAPREDUCE-6230) MR AM does not survive RM restart if RM activated a new AMRM secret key

2015-01-28 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-6230:
---
   Resolution: Fixed
Fix Version/s: 2.7.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2,  thanks Jason !

 MR AM does not survive RM restart if RM activated a new AMRM secret key
 ---

 Key: MAPREDUCE-6230
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6230
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
 Fix For: 2.7.0

 Attachments: MAPREDUCE-6230.001.patch


 A MapReduce AM will fail to reconnect to an RM that performed restart in the 
 following scenario:
 # MapReduce job launched with AMRM token generated from AMRM secret X
 # RM rolls new AMRM secret Y and activates the new key
 # RM performs a work-preserving restart
 # MapReduce job AM now unable to connect to RM with Invalid AMRMToken 
 exception




[jira] [Updated] (MAPREDUCE-5568) JHS returns invalid string for reducer completion percentage if AM restarts with 0 reducer.

2014-11-25 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5568:
---
  Resolution: Fixed
   Fix Version/s: 2.7.0
Target Version/s: 2.7.0
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2,  thanks [~minjikim] !

 JHS returns invalid string for reducer completion percentage if AM restarts 
 with 0 reducer.
 ---

 Key: MAPREDUCE-5568
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5568
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.4.1, 2.5.1
Reporter: Jian He
Assignee: MinJi Kim
 Fix For: 2.7.0

 Attachments: 5568.patch01, 5568.patch02, 5568.patch03, 5568.patch04


 JobClient shows output like:
 {code}
 13/10/05 16:26:09 INFO mapreduce.Job:  map 100% reduce NaN%
 13/10/05 16:26:09 INFO mapreduce.Job: Job job_1381015536254_0001 completed 
 successfully
 13/10/05 16:26:09 INFO mapreduce.Job: Counters: 26
   File System Counters
   FILE: Number of bytes read=0
   FILE: Number of bytes written=76741
   FILE: Number of read operations=0
   FILE: Number of large read operations=0
   FILE: Number of write operations=0
   HDFS: Number of bytes read=48
   HDFS: Number of bytes written=0
   HDFS: Number of read operations=1
   HDFS: Number of large read operations=0
   HDFS: Number of write operations=0
 {code}
 With the mapred job -status command, it shows:
 {code}
 Uber job : false
 Number of maps: 1
 Number of reduces: 0
 map() completion: 1.0
 reduce() completion: NaN
 Job state: SUCCEEDED
 retired: false
 reason for failure:
 {code}




[jira] [Commented] (MAPREDUCE-5785) Derive heap size or mapreduce.*.memory.mb automatically

2014-11-25 Thread Jian He (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225462#comment-14225462
]

Jian He commented on MAPREDUCE-5785:

After this patch, jobs somehow fail because the task container cannot be
launched: {{Error: Could not find or load main class null}}. (This might be a
problem with my own setup.)

Derive heap size or mapreduce.*.memory.mb automatically
---

Key: MAPREDUCE-5785
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5785
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: mr-am, task
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Fix For: 3.0.0

Attachments: MAPREDUCE-5785.v01.patch, MAPREDUCE-5785.v02.patch,
MAPREDUCE-5785.v03.patch, mr-5785-4.patch, mr-5785-5.patch, mr-5785-6.patch

Currently users have to set 2 memory-related configs per job / per task type.
One first chooses some container size mapreduce.\*.memory.mb and then a
corresponding maximum Java heap size Xmx < mapreduce.\*.memory.mb. This
makes sure that the JVM's C-heap (native memory + Java heap) does not exceed
this mapreduce.\*.memory.mb. If one forgets to tune Xmx, the MR-AM might be
- allocating big containers whereas the JVM will only use the default
-Xmx200m.
- allocating small containers that will OOM because Xmx is too high.
With this JIRA, we propose to set Xmx automatically based on an empirical
ratio that can be adjusted. Xmx is not changed automatically if provided by
the user.
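
A minimal sketch of the proposal above, not the code from the attached patches. The class name, the method name and the 0.8 ratio are assumptions for illustration; the description only says the ratio is empirical and adjustable.

```java
// Hypothetical helper illustrating the proposal; not taken from the patches.
final class HeapSizeSketch {
  // Assumed heap-to-container ratio; the JIRA only says it is adjustable.
  static final float ASSUMED_HEAP_RATIO = 0.8f;

  /**
   * Returns the JVM options to use: unchanged if the user already set
   * -Xmx, otherwise with an -Xmx derived from the container size.
   */
  static String withDerivedXmx(String javaOpts, int containerMb, float ratio) {
    if (javaOpts.contains("-Xmx")) {
      return javaOpts; // a user-provided Xmx is never overridden
    }
    int heapMb = (int) Math.floor(containerMb * ratio);
    String xmx = "-Xmx" + heapMb + "m";
    return javaOpts.isEmpty() ? xmx : javaOpts + " " + xmx;
  }

  public static void main(String[] args) {
    System.out.println(withDerivedXmx("", 1024, ASSUMED_HEAP_RATIO));
    System.out.println(withDerivedXmx("-Xmx512m", 2048, ASSUMED_HEAP_RATIO));
  }
}
```

Keeping the heap below the container size leaves headroom for native memory, which is the reason the description asks for Xmx < mapreduce.*.memory.mb.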


[jira] [Commented] (MAPREDUCE-5568) JHS returns invalid string for reducer completion percentage if AM restarts with 0 reducer.

2014-11-21 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14221569#comment-14221569
 ] 

Jian He commented on MAPREDUCE-5568:


[~minjikim], thanks for your contribution. The patch looks good, just some 
formatting issues:
- The convention is to use two spaces for indentation.
{code}
if ( getTotalMaps() == 0 ) {
report.setMapProgress(1.0f);
} else {
report.setMapProgress((float) getCompletedMaps() / getTotalMaps());
}
if ( getTotalReduces() == 0 ) {
report.setReduceProgress(1.0f);
} else {
report.setReduceProgress((float) getCompletedReduces() / getTotalReduces());
}
{code}
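
For context, a small self-contained sketch of why the guard matters. It is not the patch itself, and the class and method names are made up for illustration. In Java, dividing a float zero by zero yields NaN rather than throwing, which is presumably where the "reduce NaN%" output in this issue comes from.

```java
// Hypothetical illustration of the zero-task case; names are not from the patch.
final class ProgressSketch {
  // Unguarded: 0 completed out of 0 total is 0.0f / 0, which is NaN.
  static float unguarded(int completed, int total) {
    return (float) completed / total;
  }

  // Guarded, mirroring the check in the quoted code: nothing to run means done.
  static float guarded(int completed, int total) {
    if (total == 0) {
      return 1.0f;
    }
    return (float) completed / total;
  }

  public static void main(String[] args) {
    System.out.println(unguarded(0, 0));
    System.out.println(guarded(0, 0));
    System.out.println(guarded(1, 4));
  }
}
```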

 JHS returns invalid string for reducer completion percentage if AM restarts 
 with 0 reducer.
 ---

 Key: MAPREDUCE-5568
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5568
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.4.1, 2.5.1
Reporter: Jian He
Assignee: MinJi Kim
 Attachments: 5568.patch01, 5568.patch02, 5568.patch03


 JobClient shows output like:
 {code}
 13/10/05 16:26:09 INFO mapreduce.Job:  map 100% reduce NaN%
 13/10/05 16:26:09 INFO mapreduce.Job: Job job_1381015536254_0001 completed 
 successfully
 13/10/05 16:26:09 INFO mapreduce.Job: Counters: 26
   File System Counters
   FILE: Number of bytes read=0
   FILE: Number of bytes written=76741
   FILE: Number of read operations=0
   FILE: Number of large read operations=0
   FILE: Number of write operations=0
   HDFS: Number of bytes read=48
   HDFS: Number of bytes written=0
   HDFS: Number of read operations=1
   HDFS: Number of large read operations=0
   HDFS: Number of write operations=0
 {code}
 With the mapred job -status command, it shows:
 {code}
 Uber job : false
 Number of maps: 1
 Number of reduces: 0
 map() completion: 1.0
 reduce() completion: NaN
 Job state: SUCCEEDED
 retired: false
 reason for failure:
 {code}




[jira] [Updated] (MAPREDUCE-6048) TestJavaSerialization fails in trunk build

2014-11-04 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-6048:
---
   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

looks good.
Committed to trunk, branch-2, branch-2.6, thanks Varun!

 TestJavaSerialization fails in trunk build
 --

 Key: MAPREDUCE-6048
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6048
 Project: Hadoop Map/Reduce
  Issue Type: Test
Reporter: Ted Yu
Assignee: Varun Vasudev
Priority: Minor
 Fix For: 2.6.0

 Attachments: apache-mapreduce-6048.0.patch


 This happened in builds #1871 and #1872
 {code}
 testMapReduceJob(org.apache.hadoop.mapred.TestJavaSerialization)  Time 
 elapsed: 2.784 sec   FAILURE!
 junit.framework.ComparisonFailure: expected:[a   ]1 but was:[0 1]1
   at junit.framework.Assert.assertEquals(Assert.java:100)
   at junit.framework.Assert.assertEquals(Assert.java:107)
   at junit.framework.TestCase.assertEquals(TestCase.java:269)
   at 
 org.apache.hadoop.mapred.TestJavaSerialization.testMapReduceJob(TestJavaSerialization.java:127)
 {code}




[jira] [Commented] (MAPREDUCE-6126) (Rumen) Rumen tool returns error ava.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type

2014-10-22 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180218#comment-14180218
 ] 

Jian He commented on MAPREDUCE-6126:


Makes sense, +1

 (Rumen) Rumen tool returns error ava.lang.IllegalArgumentException: 
 JobBuilder.process(HistoryEvent): unknown event type
 --

 Key: MAPREDUCE-6126
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6126
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Junping Du
Assignee: Junping Du
 Attachments: MAPREDUCE-6126-v2.patch, MAPREDUCE-6126.patch


 java.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown 
 event type 
 at org.apache.hadoop.tools.rumen.JobBuilder.process(JobBuilder.java:172) 
 at 
 org.apache.hadoop.tools.rumen.TraceBuilder.processJobHistory(TraceBuilder.java:305)
 at org.apache.hadoop.tools.rumen.TraceBuilder.run(TraceBuilder.java:259) 
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) 
 at org.apache.hadoop.tools.rumen.TraceBuilder.main(TraceBuilder.java:186) 




[jira] [Updated] (MAPREDUCE-6126) (Rumen) Rumen tool returns error ava.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type

2014-10-22 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-6126:
---
   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, branch-2.6. thanks Junping !

 (Rumen) Rumen tool returns error ava.lang.IllegalArgumentException: 
 JobBuilder.process(HistoryEvent): unknown event type
 --

 Key: MAPREDUCE-6126
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6126
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Junping Du
Assignee: Junping Du
 Fix For: 2.6.0

 Attachments: MAPREDUCE-6126-v2.patch, MAPREDUCE-6126.patch


 java.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown 
 event type 
 at org.apache.hadoop.tools.rumen.JobBuilder.process(JobBuilder.java:172) 
 at 
 org.apache.hadoop.tools.rumen.TraceBuilder.processJobHistory(TraceBuilder.java:305)
 at org.apache.hadoop.tools.rumen.TraceBuilder.run(TraceBuilder.java:259) 
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) 
 at org.apache.hadoop.tools.rumen.TraceBuilder.main(TraceBuilder.java:186) 




[jira] [Commented] (MAPREDUCE-6126) (Rumen) Rumen tool returns error ava.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown event type

2014-10-21 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179529#comment-14179529
 ] 

Jian He commented on MAPREDUCE-6126:


In JobHistoryEventHandler, it seems we already skip writing this event:
{code}
HistoryEvent historyEvent = event.getHistoryEvent();
if (! (historyEvent instanceof NormalizedResourceEvent)) {
  mi.writeEvent(historyEvent);
}
{code}
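
The check quoted above filters on the writer side. As a hedged sketch only, the same idea could be applied on the consumer side so that an event type the consumer does not need is ignored instead of rejected. All types below are hypothetical stand-ins, not the real Hadoop or Rumen classes, and this is not claimed to be what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical consumer; the event classes are stand-ins for illustration.
final class EventFilterSketch {
  interface HistoryEvent {}
  static final class TaskStartedEvent implements HistoryEvent {}
  static final class NormalizedResourceEvent implements HistoryEvent {}
  static final class UnexpectedEvent implements HistoryEvent {}

  final List<HistoryEvent> processed = new ArrayList<>();

  // Ignores the event type that carries nothing this consumer needs,
  // and still rejects anything it genuinely does not recognize.
  void process(HistoryEvent event) {
    if (event instanceof NormalizedResourceEvent) {
      return;
    }
    if (event instanceof TaskStartedEvent) {
      processed.add(event);
      return;
    }
    throw new IllegalArgumentException("unknown event type");
  }
}
```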

 (Rumen) Rumen tool returns error ava.lang.IllegalArgumentException: 
 JobBuilder.process(HistoryEvent): unknown event type
 --

 Key: MAPREDUCE-6126
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6126
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Junping Du
Assignee: Junping Du
 Attachments: MAPREDUCE-6126.patch


 java.lang.IllegalArgumentException: JobBuilder.process(HistoryEvent): unknown 
 event type 
 at org.apache.hadoop.tools.rumen.JobBuilder.process(JobBuilder.java:172) 
 at 
 org.apache.hadoop.tools.rumen.TraceBuilder.processJobHistory(TraceBuilder.java:305)
 at org.apache.hadoop.tools.rumen.TraceBuilder.run(TraceBuilder.java:259) 
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) 
 at org.apache.hadoop.tools.rumen.TraceBuilder.main(TraceBuilder.java:186) 




[jira] [Updated] (MAPREDUCE-6087) MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS config name is wrong

2014-09-26 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-6087:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2, thanks [~ajisakaa] !

 MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS config name is wrong
 

 Key: MAPREDUCE-6087
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6087
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Jian He
Assignee: Akira AJISAKA
  Labels: newbie
 Attachments: MAPREDUCE-6087.2.patch, MAPREDUCE-6087.patch


 The config name for MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS 
 now has double prefix as yarn.app.mapreduce. + 
 yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts




[jira] [Commented] (MAPREDUCE-6087) MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS config name is wrong

2014-09-22 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143583#comment-14143583
 ] 

Jian He commented on MAPREDUCE-6087:


mapred-default.xml has the correct name, which is good.
{code}
  <name>yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts</name>
  <value>3</value>
{code}
Thanks [~ajisakaa] for working on the issue !

 MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS config name is wrong
 

 Key: MAPREDUCE-6087
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6087
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Jian He
Assignee: Akira AJISAKA
  Labels: newbie
 Attachments: MAPREDUCE-6087.2.patch, MAPREDUCE-6087.patch


 The config name for MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS 
 now has double prefix as yarn.app.mapreduce. + 
 yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts




[jira] [Created] (MAPREDUCE-6087) MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS config name is wrong

2014-09-12 Thread Jian He (JIRA)

Jian He created MAPREDUCE-6087:
--

 Summary: MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS 
config name is wrong
 Key: MAPREDUCE-6087
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6087
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He


The config name for MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS now 
has double prefix as yarn.app.mapreduce. + 
yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts
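
To illustrate the symptom, here is a small sketch using a plain map in place of the real Configuration class. The constant names and the getInt helper are assumptions for illustration; only the two key strings come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the config constants; not the real MRJobConfig.
final class ConfigKeySketch {
  static final String MR_PREFIX = "yarn.app.mapreduce.";

  // The bug as described: the prefix is prepended to a name that already has it.
  static final String DOUBLE_PREFIXED =
      MR_PREFIX + "yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts";

  // Assumed shape of a fix: prepend the prefix to the unprefixed remainder only.
  static final String SINGLE_PREFIXED =
      MR_PREFIX + "client-am.ipc.max-retries-on-timeouts";

  // Simplified lookup: returns the default when the key is absent.
  static int getInt(Map<String, String> conf, String key, int defaultValue) {
    String value = conf.get(key);
    return value == null ? defaultValue : Integer.parseInt(value);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // A user sets the name under which the property is documented.
    conf.put("yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts", "5");
    System.out.println(getInt(conf, DOUBLE_PREFIXED, 3)); // setting is missed
    System.out.println(getInt(conf, SINGLE_PREFIXED, 3)); // setting is found
  }
}
```

With the double prefix, a value set under the documented name is silently ignored and the default is used instead.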




[jira] [Updated] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.

2014-07-17 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5910:
---

Status: Open  (was: Patch Available)

 MRAppMaster should handle Resync from RM instead of shutting down.
 --

 Key: MAPREDUCE-5910
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: applicationmaster
Reporter: Rohith
Assignee: Rohith
 Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch, 
 MAPREDUCE-5910.3.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The MRAppMaster behavior is expected to change 
 to calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.
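
 The resync behavior described above could be sketched roughly as follows. This is a simplified model with made-up names, not the MRAppMaster code: asks are plain strings, and the set used to ignore repeated container completions is an assumption about how duplicates might be tolerated.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of an AM-side allocator; not the real MRAppMaster classes.
final class ResyncSketch {
  int lastResponseId = 0;                               // allocate RPC sequence number
  final List<String> outstanding = new ArrayList<>();   // every ask not yet satisfied
  final List<String> pendingAsk = new ArrayList<>();    // asks to send on the next allocate
  final Set<String> completed = new LinkedHashSet<>();  // completions already handled

  void request(String ask) {
    outstanding.add(ask);
    pendingAsk.add(ask);
  }

  // Normal heartbeat: sends only the asks added since the last call.
  List<String> heartbeat() {
    List<String> sent = new ArrayList<>(pendingAsk);
    pendingAsk.clear();
    lastResponseId++;
    return sent;
  }

  // On resync: reset the sequence number and queue the entire outstanding request.
  void onResync() {
    lastResponseId = 0;
    pendingAsk.clear();
    pendingAsk.addAll(outstanding);
  }

  // Returns true only the first time a completion is seen, so repeats are harmless.
  boolean onContainerCompleted(String containerId) {
    return completed.add(containerId);
  }
}
```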



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.

2014-07-17 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5910:
---

Attachment: MAPREDUCE-5910.4.patch

I see, thanks for investigating. I added one code comment myself and 
re-submitted the patch.

 MRAppMaster should handle Resync from RM instead of shutting down.
 --

 Key: MAPREDUCE-5910
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: applicationmaster
Reporter: Rohith
Assignee: Rohith
 Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch, 
 MAPREDUCE-5910.3.patch, MAPREDUCE-5910.4.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The MRAppMaster behavior is expected to change 
 to calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.




[jira] [Updated] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.

2014-07-17 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5910:
---

Status: Patch Available  (was: Open)

 MRAppMaster should handle Resync from RM instead of shutting down.
 --

 Key: MAPREDUCE-5910
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: applicationmaster
Reporter: Rohith
Assignee: Rohith
 Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch, 
 MAPREDUCE-5910.3.patch, MAPREDUCE-5910.4.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The MRAppMaster behavior is expected to change 
 to calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.




[jira] [Commented] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.

2014-07-17 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065309#comment-14065309
 ] 

Jian He commented on MAPREDUCE-5910:


committing this.

 MRAppMaster should handle Resync from RM instead of shutting down.
 --

 Key: MAPREDUCE-5910
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: applicationmaster
Reporter: Rohith
Assignee: Rohith
 Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch, 
 MAPREDUCE-5910.3.patch, MAPREDUCE-5910.4.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The MRAppMaster behavior is expected to change 
 to calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.




[jira] [Commented] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.

2014-07-16 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063747#comment-14063747
 ] 

Jian He commented on MAPREDUCE-5910:


Thanks for updating the patch!
The patch looks good. Submitting to Jenkins.

 MRAppMaster should handle Resync from RM instead of shutting down.
 --

 Key: MAPREDUCE-5910
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: applicationmaster
Reporter: Rohith
Assignee: Rohith
 Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch, 
 MAPREDUCE-5910.3.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The MRAppMaster behavior is expected to change 
 to calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.

2014-07-16 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5910:
---

Status: Patch Available  (was: Open)

 MRAppMaster should handle Resync from RM instead of shutting down.
 --

 Key: MAPREDUCE-5910
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: applicationmaster
Reporter: Rohith
Assignee: Rohith
 Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch, 
 MAPREDUCE-5910.3.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The MRAppMaster behavior is expected to change 
 to calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.




[jira] [Commented] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.

2014-07-16 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063985#comment-14063985
 ] 

Jian He commented on MAPREDUCE-5910:


Rohith, can you look into the test failures? thanks!

 MRAppMaster should handle Resync from RM instead of shutting down.
 --

 Key: MAPREDUCE-5910
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: applicationmaster
Reporter: Rohith
Assignee: Rohith
 Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch, 
 MAPREDUCE-5910.3.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The MRAppMaster behavior is expected to change 
 to calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.




[jira] [Updated] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.

2014-07-15 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5910:
---

Status: Open  (was: Patch Available)

 MRAppMaster should handle Resync from RM instead of shutting down.
 --

 Key: MAPREDUCE-5910
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: applicationmaster
Reporter: Rohith
Assignee: Rohith
 Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The MRAppMaster behavior is expected to change 
 to calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (MAPREDUCE-5910) MRAppMaster should handle Resync from RM instead of shutting down.

2014-07-10 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058156#comment-14058156
 ] 

Jian He commented on MAPREDUCE-5910:


The patch looks good overall, some comments:
- addOutstandingAllocateRequestOnResync -> addOutstandingRequestsOnResync
- The MR_RM_WORKPRESERVING_RESTART_ENABLED flag is not needed any more, given 
that the AM_RESYNC and AM_SHUTDOWN commands are now sent in different cases.

 MRAppMaster should handle Resync from RM instead of shutting down.
 --

 Key: MAPREDUCE-5910
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5910
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: applicationmaster
Reporter: Rohith
Assignee: Rohith
 Fix For: 2.5.0

 Attachments: MAPREDUCE-5910.1.patch, MAPREDUCE-5910.2.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The MRAppMaster behavior is expected to change 
 to calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.




[jira] [Commented] (MAPREDUCE-5924) Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING'

2014-06-17 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034664#comment-14034664
 ] 

Jian He commented on MAPREDUCE-5924:


LGTM, +1

 Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at 
 COMMIT_PENDING'
 

 Key: MAPREDUCE-5924
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5924
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Zhijie Shen
 Attachments: MAPREDUCE-5924.1.patch


 The Sort job over 1GB data failed with below error
 {code}
 2014-06-09 09:15:38,746 INFO [Socket Reader #1 for port 63415] 
 SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for 
 job_1402304714683_0002 (auth:SIMPLE)
 2014-06-09 09:15:38,750 INFO [IPC Server handler 13 on 63415] 
 org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit-pending state update 
 from attempt_1402304714683_0002_r_15_1000
 2014-06-09 09:15:38,751 ERROR [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
 this event at current state for attempt_1402304714683_0002_r_15_1000
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 TA_COMMIT_PENDING at COMMIT_PENDING
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1058)
 at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:145)
 at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1271)
 at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1263)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:722)
 2014-06-09 09:15:38,753 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
 job_1402304714683_0002Job Transitioned from RUNNING to ERROR
 {code}
 The JobHistory Url prints job state = ERROR
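
 The failure above is a state machine receiving an event for which the current state has no registered transition. A minimal sketch, assuming a table-driven state machine with made-up names (this is not the YARN StateMachineFactory API, and not necessarily what the attached patch does), shows how a self-transition would make a repeated TA_COMMIT_PENDING harmless:

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical table-driven state machine; not the YARN StateMachineFactory.
final class StateMachineSketch {
  enum State { RUNNING, COMMIT_PENDING }
  enum Event { TA_COMMIT_PENDING }

  private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
  State current = State.RUNNING;

  void addTransition(State from, Event on, State to) {
    table.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class))
        .put(on, to);
  }

  // Applies the event, or fails if the current state has no transition for it.
  void handle(Event event) {
    Map<Event, State> row = table.get(current);
    State next = (row == null) ? null : row.get(event);
    if (next == null) {
      throw new IllegalStateException(
          "Invalid event: " + event + " at " + current);
    }
    current = next;
  }
}
```

 Registering COMMIT_PENDING -> COMMIT_PENDING on TA_COMMIT_PENDING in addition to RUNNING -> COMMIT_PENDING lets the second, duplicate event pass without moving the job to ERROR.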



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (MAPREDUCE-5924) Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING'

2014-06-17 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034668#comment-14034668
 ] 

Jian He commented on MAPREDUCE-5924:


committing..

 Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at 
 COMMIT_PENDING'
 

 Key: MAPREDUCE-5924
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5924
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Zhijie Shen
 Attachments: MAPREDUCE-5924.1.patch


 The Sort job over 1GB data failed with below error
 {code}
 2014-06-09 09:15:38,746 INFO [Socket Reader #1 for port 63415] 
 SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for 
 job_1402304714683_0002 (auth:SIMPLE)
 2014-06-09 09:15:38,750 INFO [IPC Server handler 13 on 63415] 
 org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit-pending state update 
 from attempt_1402304714683_0002_r_15_1000
 2014-06-09 09:15:38,751 ERROR [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
 this event at current state for attempt_1402304714683_0002_r_15_1000
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 TA_COMMIT_PENDING at COMMIT_PENDING
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1058)
 at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:145)
 at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1271)
 at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1263)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:722)
 2014-06-09 09:15:38,753 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
 job_1402304714683_0002Job Transitioned from RUNNING to ERROR
 {code}
 The JobHistory Url prints job state = ERROR



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (MAPREDUCE-5924) Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING'

2014-06-17 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034681#comment-14034681
 ] 

Jian He commented on MAPREDUCE-5924:


Zhijie, can you open a JIRA for the exception issue on Windows you mentioned? Thanks.

 Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at 
 COMMIT_PENDING'
 

 Key: MAPREDUCE-5924
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5924
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Zhijie Shen
 Fix For: 2.5.0

 Attachments: MAPREDUCE-5924.1.patch


 The Sort job over 1GB data failed with below error
 {code}
 2014-06-09 09:15:38,746 INFO [Socket Reader #1 for port 63415] 
 SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for 
 job_1402304714683_0002 (auth:SIMPLE)
 2014-06-09 09:15:38,750 INFO [IPC Server handler 13 on 63415] 
 org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit-pending state update 
 from attempt_1402304714683_0002_r_15_1000
 2014-06-09 09:15:38,751 ERROR [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
 this event at current state for attempt_1402304714683_0002_r_15_1000
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 TA_COMMIT_PENDING at COMMIT_PENDING
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1058)
 at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:145)
 at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1271)
 at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1263)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:722)
 2014-06-09 09:15:38,753 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
 job_1402304714683_0002Job Transitioned from RUNNING to ERROR
 {code}
 The JobHistory URL shows job state = ERROR.
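
To illustrate why the log above ends in "Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING", here is a minimal sketch of a table-driven state machine. It is a simplified, hypothetical model, not Hadoop's StateMachineFactory: the class name, the reduced state and event sets, and the `tolerateDuplicateCommitPending` flag are all assumptions made for illustration. A second TA_COMMIT_PENDING event arriving while the attempt is already in COMMIT_PENDING finds no entry in the table and is rejected; registering a self-transition is one possible way to tolerate the duplicate. The issue text does not say what the committed patch actually changed.

```java
import java.util.EnumMap;
import java.util.Map;

/** Simplified, hypothetical model of a table-driven state machine (not Hadoop code). */
final class AttemptStateMachineSketch {
    enum State { RUNNING, COMMIT_PENDING, SUCCEEDED }
    enum Event { TA_COMMIT_PENDING, TA_DONE }

    // Transition table: current state -> (event -> next state).
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.RUNNING;

    AttemptStateMachineSketch(boolean tolerateDuplicateCommitPending) {
        addTransition(State.RUNNING, Event.TA_COMMIT_PENDING, State.COMMIT_PENDING);
        addTransition(State.COMMIT_PENDING, Event.TA_DONE, State.SUCCEEDED);
        if (tolerateDuplicateCommitPending) {
            // Assumed remedy for illustration: a repeated event becomes a self-transition.
            addTransition(State.COMMIT_PENDING, Event.TA_COMMIT_PENDING, State.COMMIT_PENDING);
        }
    }

    private void addTransition(State from, Event on, State to) {
        table.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    /** Applies the event, or throws if the table has no entry for (current state, event). */
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }
}
```

With `new AttemptStateMachineSketch(false)`, sending TA_COMMIT_PENDING twice throws on the second call; with `true`, the second call leaves the attempt in COMMIT_PENDING.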



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (MAPREDUCE-5924) Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING'

2014-06-17 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5924:
---

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2. Thanks, Zhijie!

 Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at 
 COMMIT_PENDING'
 

 Key: MAPREDUCE-5924
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5924
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Zhijie Shen
 Attachments: MAPREDUCE-5924.1.patch






[jira] [Updated] (MAPREDUCE-5924) Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at COMMIT_PENDING'

2014-06-17 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5924:
---

Fix Version/s: 2.5.0

 Windows: Sort Job failed due to 'Invalid event: TA_COMMIT_PENDING at 
 COMMIT_PENDING'
 

 Key: MAPREDUCE-5924
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5924
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Zhijie Shen
 Fix For: 2.5.0

 Attachments: MAPREDUCE-5924.1.patch






[jira] [Commented] (MAPREDUCE-5900) Container preemption interpreted as task failures and eventually job failures

2014-05-23 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007680#comment-14007680
 ] 

Jian He commented on MAPREDUCE-5900:


Patch looks good overall.
I think we need a test case to verify that the attempt actually transitions to 
the KILLED state. Maybe we can combine the test cases from MAPREDUCE-5848? We 
can give credit to both.

 Container preemption interpreted as task failures and eventually job failures 
 --

 Key: MAPREDUCE-5900
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5900
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: applicationmaster, mr-am, mrv2
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: MAPREDUCE-5900-1.patch, MAPREDUCE-5900-trunk-1.patch


 A preemption exit code has been added and needs to be incorporated into MR.
 MR needs to recognize the special exit code value of -102 and interpret it as 
 a container being killed instead of a container failure.
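
The rule described above can be sketched as a small classification function. This is a minimal illustration, not the attached patch: the class name, the `Outcome` enum, and the helper `countsTowardFailureLimit` are hypothetical, and only the value -102 comes from the issue text. Treating exit code 0 as success is an additional assumption.

```java
/** Hypothetical sketch: map a container exit code to a task-attempt outcome. */
final class PreemptionExitCodeSketch {
    /** Special exit code for a preempted container, as stated in the issue. */
    static final int PREEMPTED_EXIT_CODE = -102;

    enum Outcome { SUCCEEDED, KILLED, FAILED }

    static Outcome classify(int exitCode) {
        if (exitCode == 0) {
            return Outcome.SUCCEEDED;   // assumption: 0 means a clean exit
        }
        if (exitCode == PREEMPTED_EXIT_CODE) {
            return Outcome.KILLED;      // preemption is a kill, not a failure
        }
        return Outcome.FAILED;          // every other non-zero code is a failure
    }

    /** Only real failures should count against the attempt's failure limit. */
    static boolean countsTowardFailureLimit(int exitCode) {
        return classify(exitCode) == Outcome.FAILED;
    }
}
```

Under this sketch a preempted container yields `KILLED`, so repeated preemptions would not add up to a job failure.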




[jira] [Created] (MAPREDUCE-5838) TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently

2014-04-15 Thread Jian He (JIRA)

Jian He created MAPREDUCE-5838:
--

 Summary: 
TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing 
intermittently
 Key: MAPREDUCE-5838
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5838
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He







[jira] [Updated] (MAPREDUCE-5832) TestJobClient#testGetStagingAreaDir timeout sometimes on Windows

2014-04-12 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5832:
---

Status: Patch Available  (was: Open)

 TestJobClient#testGetStagingAreaDir timeout sometimes on Windows
 

 Key: MAPREDUCE-5832
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5832
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: MAPREDUCE-5832.1.patch


 java.lang.Exception: test timed out after 1000 milliseconds
   at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
   at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:866)
   at 
 java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1258)
   at java.net.InetAddress.getLocalHost(InetAddress.java:1434)
   at sun.security.krb5.Config.getRealmFromDNS(Config.java:1174)
   at sun.security.krb5.Config.getDefaultRealm(Config.java:1081)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:601)
   at 
 org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:75)
   at 
 org.apache.hadoop.security.authentication.util.KerberosName.<clinit>(KerberosName.java:85)
   at 
 org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:246)
   at 
 org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:233)
   at 
 org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:719)
   at 
 org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:704)
   at 
 org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:606)
   at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:81)
   at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
   at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:470)
   at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:460)
   at 
 org.apache.hadoop.mapred.TestJobClient.testGetStagingAreaDir(TestJobClient.java:74)




[jira] [Created] (MAPREDUCE-5832) TestJobClient fails sometimes on Windows

2014-04-11 Thread Jian He (JIRA)

Jian He created MAPREDUCE-5832:
--

 Summary: TestJobClient fails sometimes on Windows
 Key: MAPREDUCE-5832
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5832
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He







[jira] [Updated] (MAPREDUCE-5832) TestJobClient fails sometimes on Windows

2014-04-11 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5832:
---

Attachment: MAPREDUCE-5832.1.patch

Increased the timeout; did not find a problem with the current test.

 TestJobClient fails sometimes on Windows
 

 Key: MAPREDUCE-5832
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5832
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: MAPREDUCE-5832.1.patch







[jira] [Updated] (MAPREDUCE-5832) TestJobClient#testGetStagingAreaDir timeout sometimes on Windows

2014-04-11 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5832:
---

Description: 
java.lang.Exception: test timed out after 1000 milliseconds
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:866)
at 
java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1258)
at java.net.InetAddress.getLocalHost(InetAddress.java:1434)
at sun.security.krb5.Config.getRealmFromDNS(Config.java:1174)
at sun.security.krb5.Config.getDefaultRealm(Config.java:1081)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:75)
at 
org.apache.hadoop.security.authentication.util.KerberosName.<clinit>(KerberosName.java:85)
at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:246)
at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:233)
at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:719)
at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:704)
at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:606)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:81)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:470)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:460)
at 
org.apache.hadoop.mapred.TestJobClient.testGetStagingAreaDir(TestJobClient.java:74)

 TestJobClient#testGetStagingAreaDir timeout sometimes on Windows
 

 Key: MAPREDUCE-5832
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5832
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: MAPREDUCE-5832.1.patch






[jira] [Updated] (MAPREDUCE-5832) TestJobClient#testGetStagingAreaDir timeout sometimes on Windows

2014-04-11 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5832:
---

Summary: TestJobClient#testGetStagingAreaDir timeout sometimes on Windows  
(was: TestJobClient fails sometimes on Windows)

 TestJobClient#testGetStagingAreaDir timeout sometimes on Windows
 

 Key: MAPREDUCE-5832
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5832
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: MAPREDUCE-5832.1.patch







[jira] [Commented] (MAPREDUCE-5655) Remote job submit from windows to a linux hadoop cluster fails due to wrong classpath

2014-04-09 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964967#comment-13964967
 ] 

Jian He commented on MAPREDUCE-5655:


Please refer to MAPREDUCE-4052 for the fix; the patch uploaded here is dead.

 Remote job submit from windows to a linux hadoop cluster fails due to wrong 
 classpath
 -

 Key: MAPREDUCE-5655
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5655
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, job submission
Affects Versions: 2.2.0, 2.3.0
 Environment: Client machine is a Windows 7 box, with Eclipse
 Remote: there is a multi node hadoop cluster, installed on Ubuntu boxes (any 
 linux)
Reporter: Attila Pados
Assignee: Joyoung Zhang
 Attachments: MRApps.patch, YARNRunner.patch


 I was trying to run a Java class in my client-side Windows 7 developer 
 environment that submits a job to the remote Hadoop cluster, runs a 
 MapReduce job there, and then downloads the results to the local machine.
 The general use case is to use Hadoop services from a web application 
 installed on a non-cluster computer, or as part of a developer environment.
 The problem was that the ApplicationMaster's startup shell script 
 (launch_container.sh) was generated with a wrong CLASSPATH entry. Together 
 with the java process call at the bottom of the file, these entries were 
 generated in Windows style, using % as the shell variable marker and ; as the 
 CLASSPATH delimiter.
 I tracked down the root cause and found that the MRApps.java and 
 YARNRunner.java classes create these entries and pass them on to the 
 ApplicationMaster, assuming that the OS running these classes matches the 
 one running the ApplicationMaster. That is not the case: the two run in 
 different JVMs, possibly on different operating systems, and the strings are 
 generated based on the client/submitter side's OS.
 I made some workaround changes to these two files so I could launch my job; 
 however, there may be more problems ahead.
 update
  error message:
 13/12/04 16:33:15 INFO mapreduce.Job: Job job_1386170530016_0001 failed with 
 state FAILED due to: Application application_1386170530016_0001 failed 2 
 times due to AM Container for appattempt_1386170530016_0001_02 exited 
 with  exitCode: 1 due to: Exception from container-launch: 
 org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: line 0: fg: no job 
 control
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
   at org.apache.hadoop.util.Shell.run(Shell.java:379)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:724)
 update2: 
  It is also required to add the following property to 
  mapred-site.xml (or mapred-default.xml) on the Windows box, so that the job 
 launcher knows that the job runner will be a Linux machine:
   <property>
   <name>mapred.remote.os</name>
   <value>Linux</value>
   <description>Remote MapReduce framework's OS, can be either Linux or 
 Windows</description>
  </property>
 Without this entry, the patched jar behaves the same as the unpatched one, so 
 it is required for the workaround to function.
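
The mismatch described in this issue comes down to choosing the variable syntax and the CLASSPATH delimiter from the submitting machine's OS instead of the OS that will run the container. A minimal sketch of the idea behind the reporter's workaround follows, assuming the remote OS is passed in explicitly (for example, read from a setting such as the `mapred.remote.os` property above). The class and method names are hypothetical; this is not the code in MRApps or YARNRunner, nor the fix referenced in MAPREDUCE-4052.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: build launch-script fragments for the remote OS, not the client OS. */
final class RemoteClasspathSketch {
    enum RemoteOs { LINUX, WINDOWS }

    /** Shell syntax for referencing an environment variable on the remote side. */
    static String envRef(String name, RemoteOs os) {
        return os == RemoteOs.WINDOWS ? "%" + name + "%" : "$" + name;
    }

    /** CLASSPATH delimiter used by the remote side. */
    static String separator(RemoteOs os) {
        return os == RemoteOs.WINDOWS ? ";" : ":";
    }

    /** Joins classpath entries with the remote side's delimiter. */
    static String joinClasspath(List<String> entries, RemoteOs os) {
        return String.join(separator(os), entries);
    }

    public static void main(String[] args) {
        RemoteOs remote = RemoteOs.LINUX; // assumed to come from configuration
        List<String> entries = Arrays.asList(envRef("HADOOP_CONF_DIR", remote), "job.jar");
        System.out.println(joinClasspath(entries, remote));
    }
}
```

The design point is that nothing in these helpers consults the local platform (for instance `File.pathSeparator`), so a Windows client produces the same strings for a Linux cluster as a Linux client would.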




[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd

2014-04-03 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5818:
---

Status: Patch Available  (was: Open)

 hsadmin cmd is missing in mapred.cmd
 

 Key: MAPREDUCE-5818
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: MAPREDUCE-5818.1.patch, MAPREDUCE-5818.3.patch







[jira] [Created] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd

2014-04-02 Thread Jian He (JIRA)

Jian He created MAPREDUCE-5818:
--

 Summary: hsadmin cmd is missing in mapred.cmd
 Key: MAPREDUCE-5818
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He







[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd

2014-04-02 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5818:
---

Attachment: MAPREDUCE-5818.1.patch

Simple patch to add the missing command.


 hsadmin cmd is missing in mapred.cmd
 

 Key: MAPREDUCE-5818
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: MAPREDUCE-5818.1.patch







[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd

2014-04-02 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5818:
---

Status: Patch Available  (was: Open)

 hsadmin cmd is missing in mapred.cmd
 

 Key: MAPREDUCE-5818
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: MAPREDUCE-5818.1.patch







[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd

2014-04-02 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5818:
---

Status: Open  (was: Patch Available)

 hsadmin cmd is missing in mapred.cmd
 

 Key: MAPREDUCE-5818
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: MAPREDUCE-5818.1.patch







[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd

2014-04-02 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5818:
---

Attachment: MAPREDUCE-5818.2.patch

 hsadmin cmd is missing in mapred.cmd
 

 Key: MAPREDUCE-5818
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: MAPREDUCE-5818.1.patch, MAPREDUCE-5818.2.patch







[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd

2014-04-02 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5818:
---

Status: Patch Available  (was: Open)

 hsadmin cmd is missing in mapred.cmd
 

 Key: MAPREDUCE-5818
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: MAPREDUCE-5818.1.patch, MAPREDUCE-5818.2.patch







[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd

2014-04-02 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5818:
---

Status: Open  (was: Patch Available)

 hsadmin cmd is missing in mapred.cmd
 

 Key: MAPREDUCE-5818
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: MAPREDUCE-5818.1.patch







[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd

2014-04-02 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5818:
---

Attachment: (was: MAPREDUCE-5818.2.patch)

 hsadmin cmd is missing in mapred.cmd
 

 Key: MAPREDUCE-5818
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: MAPREDUCE-5818.1.patch







[jira] [Updated] (MAPREDUCE-5818) hsadmin cmd is missing in mapred.cmd

2014-04-02 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5818:
---

Attachment: MAPREDUCE-5818.3.patch

 hsadmin cmd is missing in mapred.cmd
 

 Key: MAPREDUCE-5818
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5818
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: MAPREDUCE-5818.1.patch, MAPREDUCE-5818.3.patch







[jira] [Commented] (MAPREDUCE-5816) TestMRAppMaster fails in trunk

2014-03-30 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954861#comment-13954861
 ] 

Jian He commented on MAPREDUCE-5816:


Is this a duplicate of MAPREDUCE-5815?

 TestMRAppMaster fails in trunk
 --

 Key: MAPREDUCE-5816
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5816
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ted Yu

 As can be seen from 
 https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1741/console:
 {code}
 Tests in error: 
   TestMRAppMaster.testMRAppMasterMidLock:163 » NullPointer
   TestMRAppMaster.testMRAppMasterSuccessLock:202 » NullPointer
   TestMRAppMaster.testMRAppMasterFailLock:241 » NullPointer
 {code}
 I got the following locally:
 {code}
 Tests run: 7, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 2.964 sec  
 FAILURE! - in org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster
 testMRAppMasterMidLock(org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster)  
 Time elapsed: 0.963 sec   ERROR!
 java.lang.NullPointerException: null
   at 
 org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.escapeDelimiters(FileNameIndexUtils.java:275)
   at 
 org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.getDoneFileName(FileNameIndexUtils.java:97)
   at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processDoneFiles(JobHistoryEventHandler.java:743)
   at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
   at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
   at 
 org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
   at 
 org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1491)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1099)
   at 
 org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster.testMRAppMasterMidLock(TestMRAppMaster.java:163)
 testMRAppMasterSuccessLock(org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster)
   Time elapsed: 0.25 sec   ERROR!
 java.lang.NullPointerException: null
   at 
 org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.escapeDelimiters(FileNameIndexUtils.java:275)
   at 
 org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.getDoneFileName(FileNameIndexUtils.java:97)
   at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processDoneFiles(JobHistoryEventHandler.java:743)
   at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
   at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
   at 
 org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
   at 
 org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1491)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1099)
   at 
 org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster.testMRAppMasterSuccessLock(TestMRAppMaster.java:202)
 testMRAppMasterFailLock(org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster)  
 Time elapsed: 0.232 sec   ERROR!
 java.lang.NullPointerException: null
   at 
 org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.escapeDelimiters(FileNameIndexUtils.java:275)
   at 
 org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.getDoneFileName(FileNameIndexUtils.java:97)
   at 
 org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processDoneFiles(JobHistoryEventHandler.java:743)
   at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
   at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
   at 
 org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
   at 
 org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1491)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1099)
   at 
 org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster.testMRAppMasterFailLock(TestMRAppMaster.java:241)
 {code}




[jira] [Commented] (MAPREDUCE-5397) AM crashes because Webapp failed to start on multi node cluster

2014-03-27 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949654#comment-13949654
 ] 

Jian He commented on MAPREDUCE-5397:


My recollection of this issue is that I submitted a job and the first few 
attempts (2 or 3) all failed for the above reason; eventually the last attempt 
passed. But after I made a clean build and redeployed the cluster, I could no 
longer reproduce it. Feel free to reopen this if necessary, and please share 
some logs. Thanks.

 AM crashes because Webapp failed to start on multi node cluster
 ---

 Key: MAPREDUCE-5397
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5397
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: log.txt


 I set up a 12-node cluster and tried submitting jobs, but got this exception.
 The job is able to succeed after the AM crashes and retries a few times (2 or 3).
 {code}
 2013-07-12 18:56:28,438 INFO [main] org.mortbay.log: Extract jar:file:/grid/0/dev/jhe/hadoop-2.1.0-beta/share/hadoop/yarn/hadoop-yarn-common-2.1.0-beta.jar!/webapps/mapreduce to /tmp/Jetty_0_0_0_0_43554_mapreduceljbmlg/webapp
 2013-07-12 18:56:28,528 WARN [main] org.mortbay.log: Failed startup of context org.mortbay.jetty.webapp.WebAppContext@2726b2{/,jar:file:/grid/0/dev/jhe/hadoop-2.1.0-beta/share/hadoop/yarn/hadoop-yarn-common-2.1.0-beta.jar!/webapps/mapreduce}
 java.io.FileNotFoundException: /tmp/Jetty_0_0_0_0_43554_mapreduceljbmlg/webapp/webapps/mapreduce/.keep (No such file or directory)
   at java.io.FileOutputStream.open(Native Method)
   at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
   at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
   at org.mortbay.resource.JarResource.extract(JarResource.java:215)
   at org.mortbay.jetty.webapp.WebAppContext.resolveWebApp(WebAppContext.java:974)
   at org.mortbay.jetty.webapp.WebAppContext.getWebInf(WebAppContext.java:832)
   at org.mortbay.jetty.webapp.WebInfConfiguration.configureClassLoader(WebInfConfiguration.java:62)
   at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:489)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
   at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
   at org.mortbay.jetty.Server.doStart(Server.java:224)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.apache.hadoop.http.HttpServer.start(HttpServer.java:684)
   at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:211)
   at org.apache.hadoop.mapreduce.v2.app.client.MRClientService.serviceStart(MRClientService.java:134)
   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
   at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1019)
   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1394)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1390)
 {code}




[jira] [Updated] (MAPREDUCE-4052) Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster.

2014-03-14 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-4052:
---

Status: Open  (was: Patch Available)

 Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop 
 cluster.
 ---

 Key: MAPREDUCE-4052
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4052
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission
Affects Versions: 2.2.0, 0.23.1
 Environment: client on Windows, cluster on SUSE
Reporter: xieguiming
Assignee: Jian He
 Attachments: MAPREDUCE-4052-0.patch, MAPREDUCE-4052.1.patch, 
 MAPREDUCE-4052.2.patch, MAPREDUCE-4052.3.patch, MAPREDUCE-4052.4.patch, 
 MAPREDUCE-4052.5.patch, MAPREDUCE-4052.6.patch, MAPREDUCE-4052.7.patch, 
 MAPREDUCE-4052.patch


 When I use Eclipse on Windows to submit the job, the ApplicationMaster throws 
 this exception:
 Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/v2/app/MRAppMaster
 Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.v2.app.MRAppMaster
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
 Could not find the main class: org.apache.hadoop.mapreduce.v2.app.MRAppMaster.  Program will exit.
 The reason is:
 the addToEnvironment function in class Apps uses
 private static final String SYSTEM_PATH_SEPARATOR =
   System.getProperty("path.separator");
 Because this is evaluated on the Windows client, the MRApplicationMaster 
 classpath ends up using the ; separator.
 I suggest that the NodeManager do the replacement.
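
 The separator mismatch described above can be illustrated with a minimal, 
 standalone sketch. It does not use any Hadoop classes; the class name, method 
 names, the CLUSTER_PATH_SEPARATOR constant, and the sample classpath entries 
 are all hypothetical and chosen only for illustration. It contrasts joining 
 classpath entries with the submitting JVM's separator against joining them 
 with the separator a Linux/Unix cluster expects.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: not Hadoop code. All names here are hypothetical.
class ClasspathSeparatorSketch {

    // Assumption: the cluster runs on Linux/Unix, where ':' separates classpath entries.
    static final String CLUSTER_PATH_SEPARATOR = ":";

    // Mirrors the reported behaviour: entries are joined with whatever
    // separator the client JVM reports (';' on Windows, ':' on Linux/Unix).
    public static String joinWithClientSeparator(List<String> entries, String clientSeparator) {
        return String.join(clientSeparator, entries);
    }

    // One possible alternative: always join with the separator of the target platform.
    public static String joinForCluster(List<String> entries) {
        return String.join(CLUSTER_PATH_SEPARATOR, entries);
    }

    public static void main(String[] args) {
        // Hypothetical classpath entries.
        List<String> entries = Arrays.asList("/opt/hadoop/conf", "job.jar");

        // The value of this property depends on the OS of the JVM running this code.
        String clientSeparator = System.getProperty("path.separator");

        System.out.println("client-side join:  " + joinWithClientSeparator(entries, clientSeparator));
        System.out.println("cluster-side join: " + joinForCluster(entries));
    }
}
```

 On a Windows client the first join yields a single ;-separated string, which 
 a Linux JVM treats as one classpath entry, consistent with the 
 ClassNotFoundException above. Whether the fix belongs on the client or in the 
 NodeManager is a design question this sketch does not settle.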




[jira] [Updated] (MAPREDUCE-4052) Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster.

2014-03-14 Thread Jian He (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-4052:
---

Status: Patch Available  (was: Open)

 Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop 
 cluster.
 ---

 Key: MAPREDUCE-4052
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4052
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission
Affects Versions: 2.2.0, 0.23.1
 Environment: client on Windows, cluster on SUSE
Reporter: xieguiming
Assignee: Jian He
 Attachments: MAPREDUCE-4052-0.patch, MAPREDUCE-4052.1.patch, 
 MAPREDUCE-4052.2.patch, MAPREDUCE-4052.3.patch, MAPREDUCE-4052.4.patch, 
 MAPREDUCE-4052.5.patch, MAPREDUCE-4052.6.patch, MAPREDUCE-4052.7.patch, 
 MAPREDUCE-4052.8.patch, MAPREDUCE-4052.patch


 When I use Eclipse on Windows to submit the job, the ApplicationMaster throws 
 this exception:
 Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/v2/app/MRAppMaster
 Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.v2.app.MRAppMaster
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
 Could not find the main class: org.apache.hadoop.mapreduce.v2.app.MRAppMaster.  Program will exit.
 The reason is:
 the addToEnvironment function in class Apps uses
 private static final String SYSTEM_PATH_SEPARATOR =
   System.getProperty("path.separator");
 Because this is evaluated on the Windows client, the MRApplicationMaster 
 classpath ends up using the ; separator.
 I suggest that the NodeManager do the replacement.



