[jira] [Commented] (YARN-10854) Support marking inactive node as untracked without configured include path

2021-07-20 Thread Kuhu Shukla (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384386#comment-17384386
 ] 

Kuhu Shukla commented on YARN-10854:


Proposal seems good but since I have been away from the land of YARN for a 
while, could [~brahma], [~templedf] or others chime in on the idea as well? I 
would love to review the code for this change.

> Support marking inactive node as untracked without configured include path
> --
>
> Key: YARN-10854
> URL: https://issues.apache.org/jira/browse/YARN-10854
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-10854.001.patch
>
>
> Currently, inactive nodes that have been decommissioned/shutdown/lost for a 
> while (longer than the expiration time defined via 
> {{yarn.resourcemanager.node-removal-untracked.timeout-ms}}, 60 seconds by 
> default) and that exist in neither the include nor the exclude file can be 
> marked as untracked nodes and removed from RM state (YARN-4311). This is very 
> useful when auto-scaling is enabled in an elastic cloud environment, since it 
> avoids an unbounded increase of inactive nodes (mostly decommissioned nodes).
> However, this only works when the include path is configured, which does not 
> match most of our cloud environments: they run without a configured white 
> list of nodes, which keeps auto-scaling of nodes easy to control when there 
> are no further security requirements.
> So I propose supporting marking inactive nodes as untracked without a 
> configured include path. To stay compatible with former versions, we can add 
> a switch config for this.
> Any thoughts/suggestions/feedback are welcome!
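As a rough, hypothetical sketch of the proposal (not taken from the attached patch), the RM-side logic could read the existing timeout together with a new opt-in switch. The switch key below is made up for illustration; only the timeout property name comes from the description above.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class UntrackedNodeConfigSketch {
  // Existing property from the description; defaults to 60 seconds.
  static final String UNTRACKED_TIMEOUT_MS =
      "yarn.resourcemanager.node-removal-untracked.timeout-ms";
  // Hypothetical switch key for the proposed behavior; the real name would be
  // decided in the patch.
  static final String UNTRACKED_WITHOUT_INCLUDE_PATH =
      "yarn.resourcemanager.node-removal-untracked.enable-without-include-path";

  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    long timeoutMs = conf.getLong(UNTRACKED_TIMEOUT_MS, 60000L);
    // Default false keeps the current behavior for existing deployments.
    boolean allowWithoutIncludePath =
        conf.getBoolean(UNTRACKED_WITHOUT_INCLUDE_PATH, false);
    System.out.println("untracked-node timeout (ms): " + timeoutMs
        + ", allowed without include path: " + allowWithoutIncludePath);
  }
}
{code}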



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2019-08-06 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901428#comment-16901428
 ] 

Kuhu Shukla commented on YARN-6315:
---

Thanks for the ping, Eric, and sorry about the delay on this. This is not a 
trivial change when it comes to archives and directories, and I would have 
difficulty making time to rework this patch. I apologize; please feel free to 
reassign and use the existing patch if it is any good. :(

> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-6315.001.patch, YARN-6315.002.patch, 
> YARN-6315.003.patch, YARN-6315.004.patch, YARN-6315.005.patch, 
> YARN-6315.006.patch
>
>
> We currently check whether a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks that 
> it has the resource because the file exists, but its size is 0 or less than 
> the "expected" size of the LocalResource. This JIRA tracks the change to 
> harden the isResourcePresent call to address that case.
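A minimal sketch of the hardened check described above, assuming a plain file whose expected size is known (it deliberately sidesteps the archive/directory cases mentioned later in this thread, and is not the attached patch):

{code:java}
import java.io.File;

public class ResourcePresenceCheckSketch {
  /**
   * Treat a resource as present only if the localized file exists and is at
   * least as large as the size recorded for the LocalResource; a zero-length
   * or truncated file is reported as absent so it can be re-localized.
   */
  static boolean isResourcePresent(String localPath, long expectedSize) {
    File file = new File(localPath);
    return file.exists() && file.length() >= expectedSize;
  }

  public static void main(String[] args) {
    // Hypothetical path and size, for illustration only.
    System.out.println(
        isResourcePresent("/tmp/nm-local-dir/filecache/10/job.jar", 1024L));
  }
}
{code}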



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9202) RM does not track nodes that are in the include list and never register

2019-06-21 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869632#comment-16869632
 ] 

Kuhu Shukla commented on YARN-9202:
---

This attempts to add the logic to refresh nodes, but there are test failures 
around it that call for further investigation of scenarios where we get node 
heartbeats after a refresh that has taken those nodes out of the include list.

> RM does not track nodes that are in the include list and never register
> ---
>
> Key: YARN-9202
> URL: https://issues.apache.org/jira/browse/YARN-9202
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.2, 3.0.3, 2.8.5
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9202.001.patch, YARN-9202.002.patch, 
> YARN-9202.003.patch
>
>
> The RM state machine decides to put new or running nodes in inactive state 
> only past the point of either registration or being in the exclude list. This 
> does not cover the case where a node is in the include list but never 
> registers and since all state changes are based on these NodeState 
> transitions, having NEW nodes be listed as inactive first may help. This 
> would change the semantics of how inactiveNodes are looked at today. Another 
> state addition might help this case too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9202) RM does not track nodes that are in the include list and never register

2019-06-21 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-9202:
--
Attachment: YARN-9202.003.patch

> RM does not track nodes that are in the include list and never register
> ---
>
> Key: YARN-9202
> URL: https://issues.apache.org/jira/browse/YARN-9202
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.2, 3.0.3, 2.8.5
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9202.001.patch, YARN-9202.002.patch, 
> YARN-9202.003.patch
>
>
> The RM state machine decides to put new or running nodes in inactive state 
> only past the point of either registration or being in the exclude list. This 
> does not cover the case where a node is in the include list but never 
> registers and since all state changes are based on these NodeState 
> transitions, having NEW nodes be listed as inactive first may help. This 
> would change the semantics of how inactiveNodes are looked at today. Another 
> state addition might help this case too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9202) RM does not track nodes that are in the include list and never register

2019-06-11 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861171#comment-16861171
 ] 

Kuhu Shukla commented on YARN-9202:
---

I am unable to reproduce this case locally but am investigating some more. 
AFAICT, so far, it seems unrelated.

> RM does not track nodes that are in the include list and never register
> ---
>
> Key: YARN-9202
> URL: https://issues.apache.org/jira/browse/YARN-9202
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.2, 3.0.3, 2.8.5
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9202.001.patch, YARN-9202.002.patch
>
>
> The RM state machine decides to put new or running nodes in inactive state 
> only past the point of either registration or being in the exclude list. This 
> does not cover the case where a node is in the include list but never 
> registers and since all state changes are based on these NodeState 
> transitions, having NEW nodes be listed as inactive first may help. This 
> would change the semantics of how inactiveNodes are looked at today. Another 
> state addition might help this case too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9202) RM does not track nodes that are in the include list and never register

2019-06-11 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861170#comment-16861170
 ] 

Kuhu Shukla commented on YARN-9202:
---

[~Jim_Brennan], the nodes from the inactive list (with port = -1) are thrown 
away once the actual NM registration comes through and creates the new RMNode 
object. Since that is the case for any new node trying to register, we do not 
need the SHUTDOWN-to-RUNNING transition, because the RMNode object that is in 
the SHUTDOWN state is never really used, so to speak.
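To make the placeholder identity concrete, a small illustrative sketch (the host name and ports are made up; this is not code from any patch):

{code:java}
import org.apache.hadoop.yarn.api.records.NodeId;

public class PlaceholderNodeIdSketch {
  public static void main(String[] args) {
    // An unregistered host from the include list sits in the inactive map
    // under a placeholder identity with port -1.
    NodeId placeholder = NodeId.newInstance("nm-host.example.com", -1);
    // When the NodeManager registers, a fresh RMNode is created with the real
    // RPC port and the placeholder entry is simply discarded, so no
    // SHUTDOWN-to-RUNNING transition is needed for it.
    NodeId registered = NodeId.newInstance("nm-host.example.com", 45454);
    System.out.println(placeholder + " is replaced by " + registered);
  }
}
{code}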

 

> RM does not track nodes that are in the include list and never register
> ---
>
> Key: YARN-9202
> URL: https://issues.apache.org/jira/browse/YARN-9202
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.2, 3.0.3, 2.8.5
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9202.001.patch, YARN-9202.002.patch
>
>
> The RM state machine decides to put new or running nodes in inactive state 
> only past the point of either registration or being in the exclude list. This 
> does not cover the case where a node is in the include list but never 
> registers and since all state changes are based on these NodeState 
> transitions, having NEW nodes be listed as inactive first may help. This 
> would change the semantics of how inactiveNodes are looked at today. Another 
> state addition might help this case too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9202) RM does not track nodes that are in the include list and never register

2019-06-04 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-9202:
--
Attachment: YARN-9202.002.patch

> RM does not track nodes that are in the include list and never register
> ---
>
> Key: YARN-9202
> URL: https://issues.apache.org/jira/browse/YARN-9202
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.2, 3.0.3, 2.8.5
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9202.001.patch, YARN-9202.002.patch
>
>
> The RM state machine decides to put new or running nodes in inactive state 
> only past the point of either registration or being in the exclude list. This 
> does not cover the case where a node is in the include list but never 
> registers and since all state changes are based on these NodeState 
> transitions, having NEW nodes be listed as inactive first may help. This 
> would change the semantics of how inactiveNodes are looked at today. Another 
> state addition might help this case too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9202) RM does not track nodes that are in the include list and never register

2019-04-16 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16819467#comment-16819467
 ] 

Kuhu Shukla commented on YARN-9202:
---

I do not think we can get away with creating new RMNodeImpl objects, since 
anything that has not registered may not have valid values for cmPort, 
NmVersion, and the other fields that are populated through the constructor only 
upon registration. Even for the case where we could just have the REST APIs 
return nodes in the NEW state, the issue is that none of the lists the 
webservice has access to contain nodes in that state. [~eepayne], I would 
appreciate thoughts on how to move forward given this inherent design of 
RMNodeImpl. I could expose some fields and add setters to get over this issue, 
but I am not sure that is the right way to proceed.

> RM does not track nodes that are in the include list and never register
> ---
>
> Key: YARN-9202
> URL: https://issues.apache.org/jira/browse/YARN-9202
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.2, 3.0.3, 2.8.5
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9202.001.patch
>
>
> The RM state machine decides to put new or running nodes in inactive state 
> only past the point of either registration or being in the exclude list. This 
> does not cover the case where a node is in the include list but never 
> registers and since all state changes are based on these NodeState 
> transitions, having NEW nodes be listed as inactive first may help. This 
> would change the semantics of how inactiveNodes are looked at today. Another 
> state addition might help this case too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9202) RM does not track nodes that are in the include list and never register

2019-03-26 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802025#comment-16802025
 ] 

Kuhu Shukla commented on YARN-9202:
---

On second thought:
{quote}
bq. can they just be reused and moved to the NEW or RUNNING state when the host 
registers?

in the next patch.
{quote}
This is trickier, as the old node will be set with nodeId = unknownNodeID, and 
updating that requires changes to RMNodeImpl fields, some of which are private 
and final. Also, when I tried making this change, there were several typecasts 
to RMNodeImpl, which makes this less than ideal. I will explore the two other 
possibilities to see which one makes more sense.

> RM does not track nodes that are in the include list and never register
> ---
>
> Key: YARN-9202
> URL: https://issues.apache.org/jira/browse/YARN-9202
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.2, 3.0.3, 2.8.5
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9202.001.patch
>
>
> The RM state machine decides to put new or running nodes in inactive state 
> only past the point of either registration or being in the exclude list. This 
> does not cover the case where a node is in the include list but never 
> registers and since all state changes are based on these NodeState 
> transitions, having NEW nodes be listed as inactive first may help. This 
> would change the semantics of how inactiveNodes are looked at today. Another 
> state addition might help this case too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-02-05 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-9206:
--
Attachment: YARN-9206-branch-2.8.001.patch

> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-9206-branch-2.8.001.patch, 
> YARN-9206-branch-3.1.001.patch, YARN-9206.001.patch, YARN-9206.002.patch, 
> YARN-9206.003.patch, YARN-9206.004.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-02-05 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-9206:
--
Attachment: YARN-9206-branch-3.1.001.patch

> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-9206-branch-3.1.001.patch, YARN-9206.001.patch, 
> YARN-9206.002.patch, YARN-9206.003.patch, YARN-9206.004.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-02-05 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761251#comment-16761251
 ] 

Kuhu Shukla commented on YARN-9206:
---

Thank you [~sunilg] for the commit and the review. Thank you [~Jim_Brennan] for 
the reviews. The branch-2.8 version has no test file in the repo and I didn't 
find one that was relevant, so I skipped the test. Hope that is ok; if needed I 
can add one.

> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-9206-branch-2.8.001.patch, 
> YARN-9206-branch-3.1.001.patch, YARN-9206.001.patch, YARN-9206.002.patch, 
> YARN-9206.003.patch, YARN-9206.004.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-02-05 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-9206:
--
Attachment: (was: YARN-9206-branch-3.1.001.patch)

> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-9206-branch-3.1.001.patch, YARN-9206.001.patch, 
> YARN-9206.002.patch, YARN-9206.003.patch, YARN-9206.004.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-02-05 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-9206:
--
Attachment: YARN-9206-branch-3.1.001.patch

> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-9206-branch-3.1.001.patch, YARN-9206.001.patch, 
> YARN-9206.002.patch, YARN-9206.003.patch, YARN-9206.004.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-01-31 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-9206:
--
Attachment: YARN-9206.004.patch

> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9206.001.patch, YARN-9206.002.patch, 
> YARN-9206.003.patch, YARN-9206.004.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-01-30 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756075#comment-16756075
 ] 

Kuhu Shukla commented on YARN-9206:
---

Thank you [~sunilg], will update patch shortly.

> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9206.001.patch, YARN-9206.002.patch, 
> YARN-9206.003.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-01-23 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750094#comment-16750094
 ] 

Kuhu Shukla commented on YARN-9206:
---

Spoke to [~Jim_Brennan] offline and he clarified things with the following 
solution. It is certainly better, but it does add some if blocks to the 
computation, which I was trying to avoid (at the cost of complexity, which is 
not ideal). [~sunilg], please review the following suggestion by 
[~Jim_Brennan], and if it looks ok to you I will revise my patch.
{code:java}
public static List<RMNode> queryRMNodes(RMContext context,
    EnumSet<NodeState> acceptedStates) {
  // nodes contains nodes that are NEW, RUNNING, UNHEALTHY or DECOMMISSIONING.
  boolean has_active = false;
  boolean has_inactive = false;
  ArrayList<RMNode> results = new ArrayList<RMNode>();
  for (NodeState nodeState : acceptedStates) {
    if (!has_inactive && nodeState.isInactiveState()) {
      has_inactive = true;
    }
    if (!has_active && nodeState.isActiveState()) {
      has_active = true;
    }
    if (has_active && has_inactive) {
      break;
    }
  }
  if (has_inactive) {
    for (RMNode rmNode : context.getInactiveRMNodes().values()) {
      if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
        results.add(rmNode);
      }
    }
  }
  if (has_active) {
    for (RMNode rmNode : context.getRMNodes().values()) {
      if (acceptedStates.contains(rmNode.getState())) {
        results.add(rmNode);
      }
    }
  }
  return results;
}
{code}
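A hedged usage sketch of the helper above, just to show which branches a mixed query would take; the RMContext call is only referenced in a comment since it needs a live RM.

{code:java}
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.NodeState;

public class QueryRMNodesUsageSketch {
  public static void main(String[] args) {
    // A request mixing active and inactive states walks both node maps in the
    // reworked helper: the active map for RUNNING and the inactive map for
    // DECOMMISSIONED and SHUTDOWN (the state this issue adds).
    EnumSet<NodeState> accepted = EnumSet.of(
        NodeState.RUNNING, NodeState.DECOMMISSIONED, NodeState.SHUTDOWN);
    // Inside the RM this would be passed as
    // RMServerUtils.queryRMNodes(rmContext, accepted).
    System.out.println("accepted states: " + accepted);
  }
}
{code}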

> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9206.001.patch, YARN-9206.002.patch, 
> YARN-9206.003.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9018) Add functionality to AuxiliaryLocalPathHandler to return all locations to read for a given path

2019-01-23 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750058#comment-16750058
 ] 

Kuhu Shukla commented on YARN-9018:
---

[~eepayne], could you help review this patch? Thanks a lot!

> Add functionality to AuxiliaryLocalPathHandler to return all locations to 
> read for a given path
> ---
>
> Key: YARN-9018
> URL: https://issues.apache.org/jira/browse/YARN-9018
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.0.3, 2.8.5
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9018.001.patch
>
>
> Analogous to LocalDirAllocator#getAllLocalPathsToRead, this will allow aux 
> services(and other components) to use this function that they rely on when 
> using the former class objects.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9202) RM does not track nodes that are in the include list and never register

2019-01-22 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749362#comment-16749362
 ] 

Kuhu Shukla commented on YARN-9202:
---

[~eepayne], thank you so much for the review!
{quote} I do not see the nodes from the include list in the UI shutdown list.
{quote}
I tested it on my pseudo-distributed cluster and it works, per our offline 
discussion. I will address the other comments, especially around
{quote}can they just be reused and moved to the NEW or RUNNING state when the 
host registers?
{quote}
in the next patch.

> RM does not track nodes that are in the include list and never register
> ---
>
> Key: YARN-9202
> URL: https://issues.apache.org/jira/browse/YARN-9202
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.2, 3.0.3, 2.8.5
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9202.001.patch
>
>
> The RM state machine decides to put new or running nodes in inactive state 
> only past the point of either registration or being in the exclude list. This 
> does not cover the case where a node is in the include list but never 
> registers and since all state changes are based on these NodeState 
> transitions, having NEW nodes be listed as inactive first may help. This 
> would change the semantics of how inactiveNodes are looked at today. Another 
> state addition might help this case too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-01-22 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749361#comment-16749361
 ] 

Kuhu Shukla commented on YARN-9206:
---

Thank you [~sunilg] for the review. I was not super happy with it either. Can 
you comment on the v1 and v2 patches and see if those make more sense? Thanks 
again!

> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9206.001.patch, YARN-9206.002.patch, 
> YARN-9206.003.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-01-22 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749333#comment-16749333
 ] 

Kuhu Shukla edited comment on YARN-9206 at 1/23/19 12:37 AM:
-

Complexity is a bit worse but I tried not to add another boolean. Appreciate 
corrections, comments.


was (Author: kshukla):
Complexity is a bit worse but I tried not to add another boolean. Appreciate 
correctiosn, comments.

> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9206.001.patch, YARN-9206.002.patch, 
> YARN-9206.003.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-01-22 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749333#comment-16749333
 ] 

Kuhu Shukla commented on YARN-9206:
---

Complexity is a bit worse but I tried not to add another boolean. Appreciate 
corrections, comments.

> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9206.001.patch, YARN-9206.002.patch, 
> YARN-9206.003.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-01-22 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-9206:
--
Attachment: YARN-9206.003.patch

> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9206.001.patch, YARN-9206.002.patch, 
> YARN-9206.003.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-01-22 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749140#comment-16749140
 ] 

Kuhu Shukla commented on YARN-9206:
---

I see! Will update patch shortly.

> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9206.001.patch, YARN-9206.002.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-01-22 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749021#comment-16749021
 ] 

Kuhu Shukla commented on YARN-9206:
---

Thank you [~Jim_Brennan] for the review!
bq. I think you need to iterate the acceptedStates() and call isInactiveState() 
on each one to determine if it contains one.
Yes, this was something I was trying to avoid, as EnumSet.contains, at least in 
my understanding, is faster than iterating over the elements of the enum set.
bq. if there should also be a NodeState.isActiveState that can be used in the 
same way for the first part of QueryRMNodes().
Agree that it would be a good addition.
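To make the shape of those two predicates concrete, an illustrative stand-in enum (not the real org.apache.hadoop.yarn.api.records.NodeState) might look like this:

{code:java}
public class NodeStatePredicateSketch {
  // Illustrative only: mirrors the idea of isInactiveState() plus the proposed
  // isActiveState(), without claiming to match YARN's actual enum.
  enum State {
    NEW, RUNNING, UNHEALTHY, DECOMMISSIONING,
    DECOMMISSIONED, LOST, REBOOTED, SHUTDOWN;

    boolean isInactiveState() {
      return this == DECOMMISSIONED || this == LOST
          || this == REBOOTED || this == SHUTDOWN;
    }

    boolean isActiveState() {
      return !isInactiveState();
    }
  }

  public static void main(String[] args) {
    System.out.println(State.SHUTDOWN.isInactiveState()); // true
    System.out.println(State.RUNNING.isActiveState());    // true
  }
}
{code}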



> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9206.001.patch, YARN-9206.002.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-01-18 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746600#comment-16746600
 ] 

Kuhu Shukla commented on YARN-9206:
---

Thank you for the comments [~leftnoteasy]. I guess the v2 patch is what you 
were looking for? I would have preferred the param to the new method not to be 
an Enum, but it made more sense than iterating over the acceptedStates. This 
patch includes a test for inactive node states.

> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9206.001.patch, YARN-9206.002.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-01-18 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-9206:
--
Attachment: YARN-9206.002.patch

> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9206.001.patch, YARN-9206.002.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-01-17 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-9206:
--
Attachment: YARN-9206.001.patch

> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9206.001.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-01-17 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745595#comment-16745595
 ] 

Kuhu Shukla commented on YARN-9206:
---

Patch needs a test still, but just to get things going.

> RMServerUtils does not count SHUTDOWN as an accepted state
> --
>
> Key: YARN-9206
> URL: https://issues.apache.org/jira/browse/YARN-9206
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9206.001.patch
>
>
> {code}
> if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
> acceptedStates.contains(NodeState.LOST) ||
> acceptedStates.contains(NodeState.REBOOTED)) {
>   for (RMNode rmNode : context.getInactiveRMNodes().values()) {
> if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
>   results.add(rmNode);
> }
>   }
> }
> return results;
>   }
> {code}
> This should include SHUTDOWN state as they are inactive too. This method is 
> used for node reports and such so might be useful to account for them as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9206) RMServerUtils does not count SHUTDOWN as an accepted state

2019-01-17 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created YARN-9206:
-

 Summary: RMServerUtils does not count SHUTDOWN as an accepted state
 Key: YARN-9206
 URL: https://issues.apache.org/jira/browse/YARN-9206
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.3
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


{code}
if (acceptedStates.contains(NodeState.DECOMMISSIONED) ||
acceptedStates.contains(NodeState.LOST) ||
acceptedStates.contains(NodeState.REBOOTED)) {
  for (RMNode rmNode : context.getInactiveRMNodes().values()) {
if ((rmNode != null) && acceptedStates.contains(rmNode.getState())) {
  results.add(rmNode);
}
  }
}
return results;
  }
{code}
This should include SHUTDOWN state as they are inactive too. This method is 
used for node reports and such so might be useful to account for them as well.
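A hedged sketch of the most direct form of the fix the title asks for, widening the guard so SHUTDOWN nodes are also scanned. The helper below is self-contained for illustration and is not necessarily what the committed patch (which was reworked in later revisions) does.

{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.EnumSet;
import java.util.List;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode;

public class ShutdownStateFixSketch {
  // Same filtering as the snippet above, but with SHUTDOWN treated as one of
  // the inactive states worth scanning for.
  static List<RMNode> filterInactive(Collection<RMNode> inactiveNodes,
      EnumSet<NodeState> acceptedStates) {
    List<RMNode> results = new ArrayList<RMNode>();
    if (acceptedStates.contains(NodeState.DECOMMISSIONED)
        || acceptedStates.contains(NodeState.LOST)
        || acceptedStates.contains(NodeState.REBOOTED)
        || acceptedStates.contains(NodeState.SHUTDOWN)) {
      for (RMNode rmNode : inactiveNodes) {
        if (rmNode != null && acceptedStates.contains(rmNode.getState())) {
          results.add(rmNode);
        }
      }
    }
    return results;
  }

  public static void main(String[] args) {
    System.out.println(filterInactive(new ArrayList<RMNode>(),
        EnumSet.of(NodeState.SHUTDOWN)));  // prints []
  }
}
{code}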



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9202) RM does not track nodes that are in the include list and never register

2019-01-17 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745549#comment-16745549
 ] 

Kuhu Shukla commented on YARN-9202:
---

Thank you Jim for the review. Appreciate it.
bq.  If nodes are in the include list, but never register, what is it that we 
are missing
Currently there is no way to know which nodes should have been a part of the 
cluster, unless one manually goes and checks the include list. This is 
different from the NameNode, where nodes that have not registered are still 
listed as dead or in other categories.
bq. Is it just that those nodes are not included in any metrics? 
More or less, yes; tracking what *should* be there is harder for operations 
teams.
bq. Can the desired result be accomplished by just adding these nodes to the 
inactive list and leaving them in the NEW state? 
I did think about that, and since there was no place where NEW nodes were 
exposed on the UI, I thought maybe moving them to a somewhat terminal state 
would be nicer; but of course, I like the idea of having NEW nodes in the 
inactive list as well. I will have to see how much semantic difference it 
makes in the code, to which end I will update shortly.
bq. testIncludeHostsWithNoRegister() - it's not clear to me why the latter half 
of the test is needed?  Looks like it was copied from the previous test but I 
don't see why it needs to be repeated in this one?
True. I will prune the test in the next version.

If keeping the nodes in the NEW state while they get listed as inactive is 
fairly straightforward, the next version will have that change as well.



> RM does not track nodes that are in the include list and never register
> ---
>
> Key: YARN-9202
> URL: https://issues.apache.org/jira/browse/YARN-9202
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.2, 3.0.3, 2.8.5
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9202.001.patch
>
>
> The RM state machine decides to put new or running nodes in inactive state 
> only past the point of either registration or being in the exclude list. This 
> does not cover the case where a node is in the include list but never 
> registers and since all state changes are based on these NodeState 
> transitions, having NEW nodes be listed as inactive first may help. This 
> would change the semantics of how inactiveNodes are looked at today. Another 
> state addition might help this case too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8625) Aggregate Resource Allocation for each job is not present in ATS

2019-01-17 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745285#comment-16745285
 ] 

Kuhu Shukla commented on YARN-8625:
---

Thank you for the patch and the report.
A minor checkstyle issue needs fixing, but the change seems straightforward. 
How do you plan to use it on the AHS side, since the new field has not been 
leveraged yet, I think? Please correct me if I am wrong, [~Prabhu Joseph].

> Aggregate Resource Allocation for each job is not present in ATS
> 
>
> Key: YARN-8625
> URL: https://issues.apache.org/jira/browse/YARN-8625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 2.7.4
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: 0001-YARN-8625.patch
>
>
> Aggregate Resource Allocation shown on RM UI for finished job is very useful 
> metric to understand how much resource a job has consumed. But this does not 
> get stored in ATS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9202) RM does not track nodes that are in the include list and never register

2019-01-17 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745268#comment-16745268
 ] 

Kuhu Shukla commented on YARN-9202:
---

The test fails with and without the patch, and YARN-8494 tracks that. 
[~Jim_Brennan], [~eepayne], [~nroberts]: requesting initial thoughts and 
comments.

> RM does not track nodes that are in the include list and never register
> ---
>
> Key: YARN-9202
> URL: https://issues.apache.org/jira/browse/YARN-9202
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.2, 3.0.3, 2.8.5
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9202.001.patch
>
>
> The RM state machine decides to put new or running nodes in inactive state 
> only past the point of either registration or being in the exclude list. This 
> does not cover the case where a node is in the include list but never 
> registers and since all state changes are based on these NodeState 
> transitions, having NEW nodes be listed as inactive first may help. This 
> would change the semantics of how inactiveNodes are looked at today. Another 
> state addition might help this case too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6616) YARN AHS shows submitTime for jobs same as startTime

2019-01-17 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745192#comment-16745192
 ] 

Kuhu Shukla commented on YARN-6616:
---

Minor comment on 
{code}

  @Public
  @Stable
  public abstract long getSubmitTime();

  @Private
  @Unstable
  public abstract void setSubmitTime(long submitTime);
{code}
I am wondering how the getter is considered Stable while the setter is not. I 
see other methods do the same, but is that intentional for these particular 
ones or just an artifact from older code?
I also wonder whether this fix can go in without breaking compatibility in a 
minor release of 3.2. I am no expert on compatibility, so pinging [~eepayne] 
and [~haibochen] for help with this.

Otherwise the patch looks good to me, and I verified that the test failures are 
unrelated. We need to check out the MapReduce build failure, however.

> YARN AHS shows submitTime for jobs same as startTime
> 
>
> Key: YARN-6616
> URL: https://issues.apache.org/jira/browse/YARN-6616
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: 0001-YARN-6616.patch, 0002-YARN-6616.patch, 
> 0003-YARN-6616.patch
>
>
> YARN AHS returns startTime value for both submitTime and startTime for the 
> jobs.  Looks the code sets the submitTime with startTime value. 
> https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java#L80
> {code}
> curl --negotiate -u: 
> http://prabhuzeppelin3.openstacklocal:8188/ws/v1/applicationhistory/apps
> 149501553757414950155375741495016384084
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6616) YARN AHS shows submitTime for jobs same as startTime

2019-01-16 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744435#comment-16744435
 ] 

Kuhu Shukla commented on YARN-6616:
---

Taking a look. Thanks for the update!

> YARN AHS shows submitTime for jobs same as startTime
> 
>
> Key: YARN-6616
> URL: https://issues.apache.org/jira/browse/YARN-6616
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: 0001-YARN-6616.patch, 0002-YARN-6616.patch, 
> 0003-YARN-6616.patch
>
>
> YARN AHS returns the startTime value for both submitTime and startTime for 
> jobs. It looks like the code sets submitTime to the startTime value. 
> https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java#L80
> {code}
> curl --negotiate -u: 
> http://prabhuzeppelin3.openstacklocal:8188/ws/v1/applicationhistory/apps
> 149501553757414950155375741495016384084
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9202) RM does not track nodes that are in the include list and never register

2019-01-16 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744369#comment-16744369
 ] 

Kuhu Shukla commented on YARN-9202:
---

Here is an initial patch that tackles this problem by listing new nodes as 
SHUTDOWN first. This means that nodes can now be shut down and brought back up, 
making SHUTDOWN a non-terminal state within a single life cycle of the RM. Any 
concerns about this change breaking existing semantics would be good to point 
out here. I will wait for precommit before formally requesting review, but any 
ideas on this patch would be awesome!
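
To make the intent concrete, here is a minimal, self-contained sketch of the 
proposed bookkeeping; it is an illustration only and does not use the actual 
RMNodeImpl/StateMachineFactory classes (the class and method names below are 
hypothetical):
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustration of the proposed semantics: a node from the include list starts
 * out as SHUTDOWN (inactive) and only moves to RUNNING once it registers, so
 * SHUTDOWN is no longer a terminal state within one RM life cycle.
 */
public class NodeLifecycleSketch {
  enum NodeState { NEW, RUNNING, SHUTDOWN, DECOMMISSIONED, LOST }

  private final Map<String, NodeState> nodes = new ConcurrentHashMap<>();

  /** Called for every host read from the include list at RM startup. */
  public void addIncludedNode(String host) {
    // Listed but not yet registered: show it as inactive (SHUTDOWN) instead
    // of silently not tracking it at all.
    nodes.putIfAbsent(host, NodeState.SHUTDOWN);
  }

  /** Called when the NodeManager on the host registers with the RM. */
  public void onRegistration(String host) {
    // SHUTDOWN -> RUNNING: the state is not terminal in this life cycle.
    nodes.put(host, NodeState.RUNNING);
  }

  public NodeState stateOf(String host) {
    return nodes.get(host);
  }
}
{code}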

> RM does not track nodes that are in the include list and never register
> ---
>
> Key: YARN-9202
> URL: https://issues.apache.org/jira/browse/YARN-9202
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.2, 3.0.3, 2.8.5
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9202.001.patch
>
>
> The RM state machine decides to put new or running nodes in the inactive state 
> only past the point of either registration or being in the exclude list. This 
> does not cover the case where a node is in the include list but never 
> registers; since all state changes are based on these NodeState transitions, 
> having NEW nodes listed as inactive first may help. This would change the 
> semantics of how inactiveNodes are looked at today. Another state addition 
> might help this case too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9202) RM does not track nodes that are in the include list and never register

2019-01-16 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-9202:
--
Attachment: YARN-9202.001.patch

> RM does not track nodes that are in the include list and never register
> ---
>
> Key: YARN-9202
> URL: https://issues.apache.org/jira/browse/YARN-9202
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.2, 3.0.3, 2.8.5
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9202.001.patch
>
>
> The RM state machine decides to put new or running nodes in the inactive state 
> only past the point of either registration or being in the exclude list. This 
> does not cover the case where a node is in the include list but never 
> registers; since all state changes are based on these NodeState transitions, 
> having NEW nodes listed as inactive first may help. This would change the 
> semantics of how inactiveNodes are looked at today. Another state addition 
> might help this case too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9202) RM does not track nodes that are in the include list and never register

2019-01-16 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created YARN-9202:
-

 Summary: RM does not track nodes that are in the include list and 
never register
 Key: YARN-9202
 URL: https://issues.apache.org/jira/browse/YARN-9202
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.8.5, 3.0.3, 2.9.2
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


The RM state machine decides to put new or running nodes in the inactive state 
only past the point of either registration or being in the exclude list. This 
does not cover the case where a node is in the include list but never 
registers; since all state changes are based on these NodeState transitions, 
having NEW nodes listed as inactive first may help. This would change the 
semantics of how inactiveNodes are looked at today. Another state addition 
might help this case too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6616) YARN AHS shows submitTime for jobs same as startTime

2019-01-14 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742520#comment-16742520
 ] 

Kuhu Shukla commented on YARN-6616:
---

I am currently reviewing this patch and the proposed change is a good one. My 
one concern is breaking backward compatibility with the change to the protos 
and, more importantly, to the newInstance for ApplicationReport, which at least 
for us is used by upstream and peer projects. Can we add a new 
constructor/newInstance that takes submitTime and switch the non-public usages 
over to it, so that the submit time is available to new or modified consumers 
but the option to use the old (buggy) way is kept if need be?
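
As a sketch of what that could look like (names simplified and hypothetical; 
the real ApplicationReport#newInstance has a much longer parameter list), the 
new overload would carry submitTime while the old signature keeps its current 
behavior:
{code}
/** Hypothetical, simplified stand-in for the report class. */
public class SimpleAppReport {
  private final long submitTime;
  private final long startTime;

  private SimpleAppReport(long submitTime, long startTime) {
    this.submitTime = submitTime;
    this.startTime = startTime;
  }

  /** Existing factory method: unchanged signature for old callers. */
  public static SimpleAppReport newInstance(long startTime) {
    // Old (buggy) behavior preserved: submitTime falls back to startTime.
    return new SimpleAppReport(startTime, startTime);
  }

  /** New overload: callers that know the real submit time use this one. */
  public static SimpleAppReport newInstance(long submitTime, long startTime) {
    return new SimpleAppReport(submitTime, startTime);
  }

  public long getSubmitTime() { return submitTime; }
  public long getStartTime()  { return startTime; }
}
{code}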

> YARN AHS shows submitTime for jobs same as startTime
> 
>
> Key: YARN-6616
> URL: https://issues.apache.org/jira/browse/YARN-6616
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: 0001-YARN-6616.patch
>
>
> YARN AHS returns the startTime value for both submitTime and startTime for 
> jobs. It looks like the code sets submitTime to the startTime value. 
> https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java#L80
> {code}
> curl --negotiate -u: 
> http://prabhuzeppelin3.openstacklocal:8188/ws/v1/applicationhistory/apps
> 149501553757414950155375741495016384084
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9018) Add functionality to AuxiliaryLocalPathHandler to return all locations to read for a given path

2018-11-13 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-9018:
--
Attachment: YARN-9018.001.patch

> Add functionality to AuxiliaryLocalPathHandler to return all locations to 
> read for a given path
> ---
>
> Key: YARN-9018
> URL: https://issues.apache.org/jira/browse/YARN-9018
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.0.3, 2.8.5
>Reporter: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9018.001.patch
>
>
> Analogous to LocalDirAllocator#getAllLocalPathsToRead, this will allow aux 
> services (and other components) to use this function, which they rely on when 
> working with objects of the former class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9018) Add functionality to AuxiliaryLocalPathHandler to return all locations to read for a given path

2018-11-13 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla reassigned YARN-9018:
-

Assignee: Kuhu Shukla

> Add functionality to AuxiliaryLocalPathHandler to return all locations to 
> read for a given path
> ---
>
> Key: YARN-9018
> URL: https://issues.apache.org/jira/browse/YARN-9018
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.5, 3.0.3
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-9018.001.patch
>
>
> Analogous to LocalDirAllocator#getAllLocalPathsToRead, this will allow aux 
> services (and other components) to use this function, which they rely on when 
> working with objects of the former class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9018) Add functionality to AuxiliaryLocalPathHandler to return all locations to read for a given path

2018-11-13 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created YARN-9018:
-

 Summary: Add functionality to AuxiliaryLocalPathHandler to return 
all locations to read for a given path
 Key: YARN-9018
 URL: https://issues.apache.org/jira/browse/YARN-9018
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.8.5, 3.0.3
Reporter: Kuhu Shukla


Analogous to LocalDirAllocator#getAllLocalPathsToRead, this will allow aux 
services (and other components) to use this function, which they rely on when 
working with objects of the former class.
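
As a rough sketch of the shape of that addition (not the committed patch; it 
assumes LocalDirAllocator's Iterable<Path> getAllLocalPathsToRead(String, 
Configuration) signature, which the summary above refers to):
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.LocalDirAllocator;
import org.apache.hadoop.fs.Path;

/**
 * Sketch of the proposed addition: expose "all locations to read" to aux
 * services by delegating to LocalDirAllocator#getAllLocalPathsToRead.
 */
public class LocalPathsForReadSketch {
  private final LocalDirAllocator dirAllocator;
  private final Configuration conf;

  public LocalPathsForReadSketch(LocalDirAllocator dirAllocator,
      Configuration conf) {
    this.dirAllocator = dirAllocator;
    this.conf = conf;
  }

  /** Returns every local-directory copy of the given relative path. */
  public Iterable<Path> getAllLocalPathsForRead(String relativePath)
      throws IOException {
    return dirAllocator.getAllLocalPathsToRead(relativePath, conf);
  }
}
{code}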



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8082) Include LocalizedResource size information in the NM download log for localization

2018-04-02 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-8082:
--
Attachment: YARN-8082.002.patch

> Include LocalizedResource size information in the NM download log for 
> localization
> --
>
> Key: YARN-8082
> URL: https://issues.apache.org/jira/browse/YARN-8082
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-8082.001.patch, YARN-8082.002.patch
>
>
> The size of the resource that finished downloading helps with debugging 
> localization delays and failures. A close approximation of the local size of 
> the resource is available in the LocalizedResource object, which can be used 
> for this minor change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8082) Include LocalizedResource size information in the NM download log for localization

2018-03-27 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-8082:
--
Attachment: YARN-8082.001.patch

> Include LocalizedResource size information in the NM download log for 
> localization
> --
>
> Key: YARN-8082
> URL: https://issues.apache.org/jira/browse/YARN-8082
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-8082.001.patch
>
>
> The size of the resource that finished downloading helps with debugging 
> localization delays and failures. A close approximation of the local size of 
> the resource is available in the LocalizedResource object, which can be used 
> for this minor change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8082) Include LocalizedResource size information in the NM download log for localization

2018-03-27 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created YARN-8082:
-

 Summary: Include LocalizedResource size information in the NM 
download log for localization
 Key: YARN-8082
 URL: https://issues.apache.org/jira/browse/YARN-8082
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


The size of the resource that finished downloading helps with debugging 
localization delays and failures. A close approximation of the local size of 
the resource is available in the LocalizedResource object, which can be used 
for this minor change.
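
As a rough illustration of the kind of log line being proposed (logger and 
method names here are illustrative, not the actual NM code):
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DownloadLogSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(DownloadLogSketch.class);

  /** Log the finished download together with its approximate local size. */
  public static void logDownloaded(String resourcePath, long sizeBytes) {
    LOG.info("Resource " + resourcePath + " downloaded, size " + sizeBytes
        + " bytes");
  }
}
{code}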



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8054) Improve robustness of the LocalDirsHandlerService MonitoringTimerTask thread

2018-03-23 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla reassigned YARN-8054:
-

Assignee: Jonathan Eagles  (was: Jason Lowe)

> Improve robustness of the LocalDirsHandlerService MonitoringTimerTask thread
> 
>
> Key: YARN-8054
> URL: https://issues.apache.org/jira/browse/YARN-8054
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Fix For: 2.10.0, 2.9.1, 2.8.4, 3.0.2, 3.1.1
>
> Attachments: YARN-8054.001.patch, YARN-8054.002.patch
>
>
> The DeprecatedRawLocalFileStatus#loadPermissionInfo method can throw a 
> RuntimeException, which can kill the MonitoringTimerTask thread. This can 
> leave the node in a bad state where all NM local directories are marked 
> "bad" and there is no automatic recovery. In the case below the error was 
> "too many open files", but it could be any of a number of other recoverable 
> failures.
> {noformat}
> 2018-03-18 02:37:42,960 [DiskHealthMonitor-Timer] ERROR 
> yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[DiskHealthMonitor-Timer,5,main] threw an Exception.
> java.lang.RuntimeException: Error while running command to get file 
> permissions : java.io.IOException: Cannot run program "ls": error=24, Too 
> many open files
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:942)
> at org.apache.hadoop.util.Shell.run(Shell.java:898)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:1307)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:1289)
> at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1078)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:697)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1556)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkAndInitializeLocalDirs(ResourceLocalizationService.java:1521)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$1.onDirsChanged(ResourceLocalizationService.java:271)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:381)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:449)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$500(LocalDirsHandlerService.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.run(LocalDirsHandlerService.java:166)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> Caused by: java.io.IOException: error=24, Too many open files
> at java.lang.UNIXProcess.forkAndExec(Native Method)
> at java.lang.UNIXProcess.(UNIXProcess.java:247)
> at java.lang.ProcessImpl.start(ProcessImpl.java:134)
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
> ... 17 more
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:737)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1556)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkAndInitializeLocalDirs(ResourceLocalizationService.java:1521)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$1.onDirsChanged(ResourceLocalizationService.java:271)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:381)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:449)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$500(LocalDirsHandlerService.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$Moni

[jira] [Commented] (YARN-8054) Improve robustness of the LocalDirsHandlerService MonitoringTimerTask thread

2018-03-21 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408088#comment-16408088
 ] 

Kuhu Shukla commented on YARN-8054:
---

Since the stack trace is already printed in the NM log, the log.warn seems 
good.

+1 (non-binding).
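
For context, the fix under review boils down to catching the runtime failure 
inside the periodic task so the monitoring thread survives; a self-contained 
sketch of that pattern (not the actual LocalDirsHandlerService code):
{code}
import java.util.Timer;
import java.util.TimerTask;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ResilientDiskMonitor {
  private static final Logger LOG =
      LoggerFactory.getLogger(ResilientDiskMonitor.class);

  /** Periodic disk check that survives transient runtime failures. */
  static class MonitoringTask extends TimerTask {
    @Override
    public void run() {
      try {
        checkDirs();
      } catch (RuntimeException e) {
        // e.g. "Too many open files" while shelling out for permissions:
        // log and keep the timer thread alive instead of letting it die.
        LOG.warn("Disk check failed, will retry on next interval", e);
      }
    }

    private void checkDirs() {
      // Placeholder for the real directory health checks.
    }
  }

  public static void main(String[] args) {
    new Timer("DiskHealthMonitor-Timer", true)
        .schedule(new MonitoringTask(), 0, 2 * 60 * 1000L);
  }
}
{code}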

> Improve robustness of the LocalDirsHandlerService MonitoringTimerTask thread
> 
>
> Key: YARN-8054
> URL: https://issues.apache.org/jira/browse/YARN-8054
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: YARN-8054.001.patch
>
>
> The DeprecatedRawLocalFileStatus#loadPermissionInfo method can throw a 
> RuntimeException, which can kill the MonitoringTimerTask thread. This can 
> leave the node in a bad state where all NM local directories are marked 
> "bad" and there is no automatic recovery. In the case below the error was 
> "too many open files", but it could be any of a number of other recoverable 
> failures.
> {noformat}
> 2018-03-18 02:37:42,960 [DiskHealthMonitor-Timer] ERROR 
> yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[DiskHealthMonitor-Timer,5,main] threw an Exception.
> java.lang.RuntimeException: Error while running command to get file 
> permissions : java.io.IOException: Cannot run program "ls": error=24, Too 
> many open files
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:942)
> at org.apache.hadoop.util.Shell.run(Shell.java:898)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:1307)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:1289)
> at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1078)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:697)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1556)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkAndInitializeLocalDirs(ResourceLocalizationService.java:1521)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$1.onDirsChanged(ResourceLocalizationService.java:271)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:381)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:449)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$500(LocalDirsHandlerService.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.run(LocalDirsHandlerService.java:166)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> Caused by: java.io.IOException: error=24, Too many open files
> at java.lang.UNIXProcess.forkAndExec(Native Method)
> at java.lang.UNIXProcess.(UNIXProcess.java:247)
> at java.lang.ProcessImpl.start(ProcessImpl.java:134)
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
> ... 17 more
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:737)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1556)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkAndInitializeLocalDirs(ResourceLocalizationService.java:1521)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$1.onDirsChanged(ResourceLocalizationService.java:271)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:381)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:449)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$500(LocalDirsHandlerService.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.Local

[jira] [Commented] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2017-12-13 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289657#comment-16289657
 ] 

Kuhu Shukla commented on YARN-6315:
---

The test failures are known and unrelated. I would appreciate any comments on 
the patch/approach, [~jlowe], [~jrottinghuis]. Thanks a lot.

> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6315.001.patch, YARN-6315.002.patch, 
> YARN-6315.003.patch, YARN-6315.004.patch, YARN-6315.005.patch, 
> YARN-6315.006.patch
>
>
> We currently check if a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks that 
> it has the resource if the file exists but with size 0 or less than the 
> "expected" size of the LocalResource. This JIRA tracks the change to harden 
> the isResourcePresent call to address that case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2017-12-12 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-6315:
--
Attachment: YARN-6315.006.patch

Updated patch that fixes all but one of the checkstyle issues; the remaining 
indentation warning seems trivial. Also, the findbugs warning is present both 
with and without the patch. No test results were reported since the test builds 
failed with 
{code}
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
[ERROR] ExecutionException The forked VM terminated without properly saying 
goodbye. VM crash or System.exit called?
{code}
Hopefully this precommit will go through without this error.

> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6315.001.patch, YARN-6315.002.patch, 
> YARN-6315.003.patch, YARN-6315.004.patch, YARN-6315.005.patch, 
> YARN-6315.006.patch
>
>
> We currently check if a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks that 
> it has the resource if the file exists but with size 0 or less than the 
> "expected" size of the LocalResource. This JIRA tracks the change to harden 
> the isResourcePresent call to address that case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2017-12-11 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-6315:
--
Attachment: YARN-6315.005.patch

Updated patch with a revised approach that keeps track of the actual size of 
the file via downloadSize. Changes were also made to YARNRunner and 
LocalResourceProto for this added field. If the download size is never updated 
and remains -1 (it could be changed to a named constant to indicate that the 
value was not set at any point), we ignore the file attribute mismatch. Would 
appreciate any initial comments/modifications on the approach. Thanks a lot!
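
A minimal sketch of the check this enables in isResourcePresent (illustrative 
only; everything except the downloadSize idea is an assumption, and the real 
implementation also has to handle archives and directories):
{code}
import java.io.File;

public class ResourcePresenceCheck {
  /** Sentinel meaning "download size was never recorded". */
  public static final long SIZE_NOT_SET = -1L;

  /**
   * Returns true only if the local file exists and, when a download size was
   * recorded, its on-disk length matches that recorded size.
   */
  public static boolean isResourcePresent(String localPath, long downloadSize) {
    File file = new File(localPath);
    if (!file.exists()) {
      return false;
    }
    if (downloadSize == SIZE_NOT_SET) {
      // No recorded size (e.g. resource localized before the upgrade):
      // fall back to the old existence-only check.
      return true;
    }
    // A truncated or corrupted file shows up as a length mismatch.
    return file.length() == downloadSize;
  }
}
{code}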

> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6315.001.patch, YARN-6315.002.patch, 
> YARN-6315.003.patch, YARN-6315.004.patch, YARN-6315.005.patch
>
>
> We currently check if a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks that 
> it has the resource if the file exists but with size 0 or less than the 
> "expected" size of the LocalResource. This JIRA tracks the change to harden 
> the isResourcePresent call to address that case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7422) Application History Server URL does not direct to the appropriate UI for failed/killed jobs

2017-11-01 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234089#comment-16234089
 ] 

Kuhu Shukla commented on YARN-7422:
---

Thank you [~jlowe].
bq. One simple method that would work for Tez and possibly other frameworks 
would be supporting a history URL during registration in addition to the one 
already supported at unregistration.
That would solve most cases and I agree that AM that dies before registration 
does not need a valid tracking URL.

> Application History Server URL does not direct to the appropriate UI for 
> failed/killed jobs
> ---
>
> Key: YARN-7422
> URL: https://issues.apache.org/jira/browse/YARN-7422
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.1
>Reporter: Kuhu Shukla
>Priority: Major
>
> In cases where the AM fails fatally, the AHS page's history link does not 
> work since the AM was not able to update the trackingURL for the job. This 
> JIRA is to track any last-attempt effort we can make from the AM to provide 
> a tracking URL in cases where the AM failure does not occur immediately at 
> startup. Any ideas and corrections would be appreciated. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7422) Application History Server URL does not direct to the appropriate UI for failed/killed jobs

2017-10-31 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created YARN-7422:
-

 Summary: Application History Server URL does not direct to the 
appropriate UI for failed/killed jobs
 Key: YARN-7422
 URL: https://issues.apache.org/jira/browse/YARN-7422
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.8.1
Reporter: Kuhu Shukla


In cases where the AM fails fatally, the AHS page's history link does not work 
since the AM was not able to update the trackingURL for the job. This JIRA is 
to track any last-attempt effort we can make from the AM to provide a tracking 
URL in cases where the AM failure does not occur immediately at startup. Any 
ideas and corrections would be appreciated. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-30 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225477#comment-16225477
 ] 

Kuhu Shukla commented on YARN-7244:
---

[~jlowe], request for comments on the 2.8 version of the patch. Appreciate it!

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 2.9.0, 3.0.0
>
> Attachments: YARN-7244-branch-2.8.001.patch, 
> YARN-7244-branch-2.8.002.patch, YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch, 
> YARN-7244.006.patch, YARN-7244.007.patch, YARN-7244.008.patch, 
> YARN-7244.009.patch, YARN-7244.010.patch, YARN-7244.011.patch, 
> YARN-7244.012.patch, YARN-7244.013.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-30 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-7244:
--
Attachment: YARN-7244-branch-2.8.002.patch

Fixing a minor newline checkstyle issue.

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 2.9.0, 3.0.0
>
> Attachments: YARN-7244-branch-2.8.001.patch, 
> YARN-7244-branch-2.8.002.patch, YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch, 
> YARN-7244.006.patch, YARN-7244.007.patch, YARN-7244.008.patch, 
> YARN-7244.009.patch, YARN-7244.010.patch, YARN-7244.011.patch, 
> YARN-7244.012.patch, YARN-7244.013.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-30 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-7244:
--
Attachment: YARN-7244-branch-2.8.001.patch

Attaching the 2.8 version of the patch, which needed some extra changes. The 
important one is in LocalDirsHandlerService, which was missing the 
getLocalPathForRead() method from trunk that went in as part of YARN-3998. I 
have added just that method rather than change the visibility of 
getPathToRead(). 
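
Roughly, the added method just delegates the read-path lookup to a 
LocalDirAllocator; a sketch under that assumption (class and field names 
simplified, not the exact branch-2.8 code):
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.LocalDirAllocator;
import org.apache.hadoop.fs.Path;

/** Sketch of the getLocalPathForRead() addition described above. */
public class LocalPathForReadSketch {
  private final LocalDirAllocator localDirsAllocator;
  private final Configuration conf;

  public LocalPathForReadSketch(LocalDirAllocator localDirsAllocator,
      Configuration conf) {
    this.localDirsAllocator = localDirsAllocator;
    this.conf = conf;
  }

  /** Find an existing copy of the given relative path among the local dirs. */
  public Path getLocalPathForRead(String relativePath) throws IOException {
    return localDirsAllocator.getLocalPathToRead(relativePath, conf);
  }
}
{code}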

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 2.9.0, 3.0.0
>
> Attachments: YARN-7244-branch-2.8.001.patch, YARN-7244.001.patch, 
> YARN-7244.002.patch, YARN-7244.003.patch, YARN-7244.004.patch, 
> YARN-7244.005.patch, YARN-7244.006.patch, YARN-7244.007.patch, 
> YARN-7244.008.patch, YARN-7244.009.patch, YARN-7244.010.patch, 
> YARN-7244.011.patch, YARN-7244.012.patch, YARN-7244.013.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-30 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla reopened YARN-7244:
---

Re-opening to attach the 2.8 version of the patch.

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 2.9.0, 3.0.0
>
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch, 
> YARN-7244.006.patch, YARN-7244.007.patch, YARN-7244.008.patch, 
> YARN-7244.009.patch, YARN-7244.010.patch, YARN-7244.011.patch, 
> YARN-7244.012.patch, YARN-7244.013.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-25 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218956#comment-16218956
 ] 

Kuhu Shukla commented on YARN-7244:
---

[~jlowe]/[~sunilg] appreciate any comments on the latest patch! Thank you.

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch, 
> YARN-7244.006.patch, YARN-7244.007.patch, YARN-7244.008.patch, 
> YARN-7244.009.patch, YARN-7244.010.patch, YARN-7244.011.patch, 
> YARN-7244.012.patch, YARN-7244.013.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-18 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209152#comment-16209152
 ] 

Kuhu Shukla commented on YARN-7244:
---

[~jlowe], [~sunilg] request for comments/review. Thanks a lot!

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch, 
> YARN-7244.006.patch, YARN-7244.007.patch, YARN-7244.008.patch, 
> YARN-7244.009.patch, YARN-7244.010.patch, YARN-7244.011.patch, 
> YARN-7244.012.patch, YARN-7244.013.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-17 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-7244:
--
Attachment: YARN-7244.013.patch

The build failed with two separate and unrelated issues, I believe. The second 
time, the cache seems to be picking up the old package for 
AuxiliaryLocalPathHandler. Re-triggering by uploading the same patch again. 
Please let me know if I missed something.

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch, 
> YARN-7244.006.patch, YARN-7244.007.patch, YARN-7244.008.patch, 
> YARN-7244.009.patch, YARN-7244.010.patch, YARN-7244.011.patch, 
> YARN-7244.012.patch, YARN-7244.013.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-17 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-7244:
--
Attachment: YARN-7244.012.patch

Fixing minor checkstyle issues. :(

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch, 
> YARN-7244.006.patch, YARN-7244.007.patch, YARN-7244.008.patch, 
> YARN-7244.009.patch, YARN-7244.010.patch, YARN-7244.011.patch, 
> YARN-7244.012.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-17 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-7244:
--
Attachment: YARN-7244.011.patch

Updated patch addressing comments from [~sunilg].

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch, 
> YARN-7244.006.patch, YARN-7244.007.patch, YARN-7244.008.patch, 
> YARN-7244.009.patch, YARN-7244.010.patch, YARN-7244.011.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-17 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208089#comment-16208089
 ] 

Kuhu Shukla commented on YARN-7244:
---

Thank you [~sunilg] for the comments!
bq. Do you think is it better to have a setter and update 
AuxiliaryLocalPathHandler to AuxServices rather than changing AuxServices ctor.
The constructor change makes sure that we always initialize the pathHandler, 
which seems safer to me.
bq. AuxiliaryLocalPathHandler could be in org.apache.hadoop.yarn.server.api? 
any reasons to move to api?
You are right. This needs to be in server apis.
bq. All apis in AuxiliaryLocalPathHandlerImpl could have Override annotation.
Will do.
bq. Does ContainerManagerImpl need to have a getAuxiliaryLocalPathHandler ?
I added the getter to assist any testing in the future and made it 
package-private. I can mark it as VisibleForTesting or take it out; either way 
would be fine.
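
For illustration, a minimal sketch of the constructor-injection choice 
(interface and class names here are simplified stand-ins, not the actual 
AuxServices code):
{code}
/**
 * Sketch of constructor injection: the path handler is required at
 * construction time, so it can never be left uninitialized, unlike a setter
 * that callers could forget to invoke.
 */
interface LocalPathHandler {
  String getPathForRead(String relativePath);
}

class AuxServiceHost {
  private final LocalPathHandler pathHandler;

  AuxServiceHost(LocalPathHandler pathHandler) {
    if (pathHandler == null) {
      throw new IllegalArgumentException("pathHandler must not be null");
    }
    this.pathHandler = pathHandler;
  }

  String resolveForRead(String relativePath) {
    return pathHandler.getPathForRead(relativePath);
  }
}
{code}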


> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch, 
> YARN-7244.006.patch, YARN-7244.007.patch, YARN-7244.008.patch, 
> YARN-7244.009.patch, YARN-7244.010.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-16 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-7244:
--
Attachment: YARN-7244.010.patch

Attaching a revised patch that addresses the review comments. Thanks a lot!

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch, 
> YARN-7244.006.patch, YARN-7244.007.patch, YARN-7244.008.patch, 
> YARN-7244.009.patch, YARN-7244.010.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-16 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205971#comment-16205971
 ] 

Kuhu Shukla commented on YARN-7244:
---

[~jlowe], request for review/comments on the latest patch. Thanks again.

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch, 
> YARN-7244.006.patch, YARN-7244.007.patch, YARN-7244.008.patch, 
> YARN-7244.009.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-16 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-7244:
--
Attachment: YARN-7244.009.patch

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch, 
> YARN-7244.006.patch, YARN-7244.007.patch, YARN-7244.008.patch, 
> YARN-7244.009.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-16 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-7244:
--
Attachment: YARN-7244.008.patch

Fixing minor javadoc comments. I think the patch is ready for review!

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch, 
> YARN-7244.006.patch, YARN-7244.007.patch, YARN-7244.008.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-15 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-7244:
--
Attachment: YARN-7244.007.patch

Updated patch that fixes the checkstyle issues (almost all; there is one asking 
to add a getter in a test, which seems excessive to me) and the test failure in 
testMapFileAccess. My setup did not allow that test to run and required 
overhauling. Verified that it passes now.

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch, 
> YARN-7244.006.patch, YARN-7244.007.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-15 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-7244:
--
Attachment: YARN-7244.006.patch

Thank you for the comments/review [~jlowe]! Updated patch. Will wait for 
PreCommit before any review requests.

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch, 
> YARN-7244.006.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-12 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-7244:
--
Attachment: YARN-7244.005.patch

Fixing TestShuffleHandler failures. The TestDistributedScheduler failure is 
documented in YARN-7299.

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-12 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201883#comment-16201883
 ] 

Kuhu Shukla commented on YARN-7244:
---

Test failures are related. Will update shortly.

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-11 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-7244:
--
Attachment: YARN-7244.004.patch

Rebasing patch on trunk.

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-11 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-7244:
--
Attachment: YARN-7244.003.patch

Updated the patch to be closer to the design Jason mentioned earlier. It adds a new path 
handler that is passed from the ContainerManager -> AuxServices -> 
AuxiliaryService -> ShuffleHandler.
Appreciate any comments on the approach/patch. Thanks a lot!
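
For context, here is a minimal sketch of the pull-style wiring this patch aims at; the 
interface and method names are illustrative assumptions, not necessarily the exact ones 
in the attached patch.
{code}
import java.io.IOException;
import org.apache.hadoop.fs.Path;

// Hypothetical handler the NM builds around its LocalDirsHandlerService and
// hands down (ContainerManager -> AuxServices -> AuxiliaryService), so that
// ShuffleHandler path lookups always reflect the current set of good dirs.
public interface AuxLocalPathHandler {
  /** Resolve a relative path against the currently good local dirs for reading. */
  Path getLocalPathForRead(String relativePath) throws IOException;

  /** Resolve a relative path against the currently good local dirs for writing. */
  Path getLocalPathForWrite(String relativePath) throws IOException;
}
{code}
With a shape like this, the ShuffleHandler never snapshots the dir list at startup; every 
shuffle read asks the handler, so disks added after NM start are picked up automatically.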

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-09-28 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184239#comment-16184239
 ] 

Kuhu Shukla commented on YARN-7244:
---

Thank you [~jlowe], [~sunilg] for the review/comments.
bq. We could make a pull API where the aux service can essentially directly 
call the NM's LocalDirHandlerService for getting a path to read or a path to 
write, then the aux service doesn't even have to manage the directories itself 
if all it cares about is finding a place to write or read.

A pull model where the ShuffleHandler/aux service does not maintain valid-dirs 
state would be my preference, but the other pull approach would work too. I 
will start reworking the patch in the meantime and will finalize it based on what 
we decide. Appreciate your thoughts.


> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-09-27 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16182450#comment-16182450
 ] 

Kuhu Shukla commented on YARN-7244:
---

Thank you [~sunilg] for the review comments! 
bq. We could push this config name to LocalDirAllocator and then read from NM 
end
I am not sure how we can initialize the config specifically for 
LocalDirAllocator (maybe add a constructor?) or what reading it from the NM 
end would mean. Agreed that code separation is important here. Maybe not having 
this as a config at all would help?
bq. Do you think, we can improve this to skip as default behavior itself?
I did not fully get what you have in mind here. Could you elaborate a bit? 
Thanks a lot!

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-09-27 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181059#comment-16181059
 ] 

Kuhu Shukla edited comment on YARN-7244 at 9/27/17 12:06 PM:
-

Thank you [~bibinchundatt] for the review comments!
bq. Better to check directory exists first if we are not concerned of 
permission . thoughts?
Makes sense to me.
bq. Testcase is successful even if YARN_SHUFFLE_BAD_DIRS_FILTER_ENABLED set to 
false.
That is expected. This config essentially decides if we should stick to 
existing behavior (value=true) of removing directories if they are bad or not 
(value=false).


was (Author: kshukla):
bq. Better to check directory exists first if we are not concerned of 
permission . thoughts?
Makes sense to me.
bq. Testcase is successful even if YARN_SHUFFLE_BAD_DIRS_FILTER_ENABLED set to 
false.
That is expected. This config essentially decides if we should stick to 
existing behavior (value=true) of removing directories if they are bad or not 
(value=false).

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-09-26 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181059#comment-16181059
 ] 

Kuhu Shukla commented on YARN-7244:
---

bq. Better to check directory exists first if we are not concerned of 
permission . thoughts?
Makes sense to me.
bq. Testcase is successful even if YARN_SHUFFLE_BAD_DIRS_FILTER_ENABLED set to 
false.
That is expected. This config essentially decides if we should stick to 
existing behavior (value=true) of removing directories if they are bad or not 
(value=false).

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-09-22 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-7244:
--
Attachment: YARN-7244.002.patch

Fixing minor test issue for newly added yarn config key.

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-09-22 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-7244:
--
Attachment: YARN-7244.001.patch

v1 patch that adds a new LocalDirAllocator#getLocalPathToRead() overload that decides 
whether to filter out the bad directories based on a boolean; changing the original call 
would be more pervasive. The patch does modify the 
AllocatorPerContext#getLocalPathToRead() signature, since that is a private 
static class of LocalDirAllocator. The ShuffleHandler uses a YARN config to 
decide whether or not to filter bad dirs. When this value is false, bad directories are 
never taken out, so any later changes to the local dirs do not impact the shuffle 
handler reads. Even if the mkdirs and exists checks fail, we want the dirs to remain 
listed in the localDirs member when the config is false. For testing reasons, I have 
added a getter for the lDirAllocator, which is package private. Appreciate any 
comments/corrections to this patch.
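
As a rough illustration of what the boolean controls, here is a standalone sketch of the 
filtering decision; the class and method names are assumptions for illustration, not the 
patch's exact code.
{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: filterBadDirs=true mirrors the existing behavior (drop
// dirs that fail the disk checks); filterBadDirs=false keeps every configured
// dir, so shuffle reads are unaffected by dirs going bad or being added later.
public final class ShuffleDirFilterSketch {
  public static List<String> dirsToSearch(List<String> configuredDirs,
      boolean filterBadDirs) {
    if (!filterBadDirs) {
      return configuredDirs;            // never drop anything
    }
    List<String> good = new ArrayList<>();
    for (String dir : configuredDirs) {
      if (new File(dir).exists()) {     // stand-in for the real mkdirs/exists checks
        good.add(dir);
      }
    }
    return good;
  }
}
{code}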

Another way to handle this would have been to change the AuxiliaryServices to 
pass the NMContext or the LocalDirAllocator from the NM. The former approach 
needs nodemanager dependencies to be added, and the latter is tricky, as I am not 
sure how the AuxServices class would pass the object without adding it as a 
member. Would appreciate any suggestions on alternative approaches as well.

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks later are added to the node then map tasks will start using 
> them but the ShuffleHandler will not be aware of them. The end result is that 
> the data cannot be shuffled from the node leading to fetch failures and 
> re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-09-22 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created YARN-7244:
-

 Summary: ShuffleHandler is not aware of disks that are added
 Key: YARN-7244
 URL: https://issues.apache.org/jira/browse/YARN-7244
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


The ShuffleHandler permanently remembers the list of "good" disks on NM 
startup. If disks later are added to the node then map tasks will start using 
them but the ShuffleHandler will not be aware of them. The end result is that 
the data cannot be shuffled from the node leading to fetch failures and re-runs 
of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2017-06-20 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055849#comment-16055849
 ] 

Kuhu Shukla commented on YARN-6315:
---

Adding an 'actualSize' to LocalizedResource and then checking it against the 
file attributes in isResourcePresent() covers a subset of corruption scenarios, 
namely when the file size changes after a successful download. I am leaning 
towards having actualSize reflect the "hdfs resource size" and comparing that 
with the local file size, which would also cover any corruption caused during 
download. Special cases here are directories and archives.
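
A minimal sketch of that comparison, assuming the size recorded at download time is 
carried on the resource; the helper and parameter names are illustrative, not the 
eventual patch code.
{code}
import java.io.File;

// Illustrative only: compare a recorded size against what is on local disk.
// Directories (and exploded archives) are skipped because their on-disk size
// does not line up with the original resource size.
public final class ResourceSizeCheckSketch {
  /**
   * @param localPath    path the resource was localized to
   * @param expectedSize size recorded when the download completed (the
   *                     "actualSize" discussed above), or -1 if unknown
   */
  public static boolean looksIntact(String localPath, long expectedSize) {
    File file = new File(localPath);
    if (!file.exists()) {
      return false;                        // preserves the old exists() check
    }
    if (file.isDirectory() || expectedSize < 0) {
      return true;                         // size check not applicable
    }
    return file.length() == expectedSize;  // catches truncated/altered files
  }
}
{code}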

> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6315.001.patch, YARN-6315.002.patch, 
> YARN-6315.003.patch, YARN-6315.004.patch
>
>
> We currently check if a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks that 
> it has the resource if the file exists but with size 0 or less than the 
> "expected" size of the LocalResource. This JIRA tracks the change to harden 
> the isResourcePresent call to address that case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6641) Non-public resource localization on a bad disk causes subsequent containers failure

2017-05-26 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026350#comment-16026350
 ] 

Kuhu Shukla commented on YARN-6641:
---

Thanks [~jlowe], let me know if a 2.8 patch is required as well.

> Non-public resource localization on a bad disk causes subsequent containers 
> failure
> ---
>
> Key: YARN-6641
> URL: https://issues.apache.org/jira/browse/YARN-6641
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6641.001.patch, YARN-6641.002.patch, 
> YARN-6641.003.patch, YARN-6641.004.patch
>
>
> YARN-3591 added the {{checkLocalResource}} method to {{isResourcePresent()}} 
> call to allow checking an already localized resource against the list of 
> good/full directories.
> Since LocalResourcesTrackerImpl instantiations for app level resources and 
> private resources do not use the new constructor, such resources that are on 
> bad disk will never be checked against good dirs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6641) Non-public resource localization on a bad disk causes subsequent containers failure

2017-05-26 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-6641:
--
Attachment: YARN-6641.004.patch

Updated patch to make the getter for dirsHandler package private. Thanks 
[~jlowe] for the review comments.

> Non-public resource localization on a bad disk causes subsequent containers 
> failure
> ---
>
> Key: YARN-6641
> URL: https://issues.apache.org/jira/browse/YARN-6641
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6641.001.patch, YARN-6641.002.patch, 
> YARN-6641.003.patch, YARN-6641.004.patch
>
>
> YARN-3591 added the {{checkLocalResource}} method to {{isResourcePresent()}} 
> call to allow checking an already localized resource against the list of 
> good/full directories.
> Since LocalResourcesTrackerImpl instantiations for app level resources and 
> private resources do not use the new constructor, such resources that are on 
> bad disk will never be checked against good dirs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6641) Non-public resource localization on a bad disk causes subsequent containers failure

2017-05-25 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025341#comment-16025341
 ] 

Kuhu Shukla commented on YARN-6641:
---

[~jlowe], request for some more comments. Thanks a lot!

> Non-public resource localization on a bad disk causes subsequent containers 
> failure
> ---
>
> Key: YARN-6641
> URL: https://issues.apache.org/jira/browse/YARN-6641
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6641.001.patch, YARN-6641.002.patch, 
> YARN-6641.003.patch
>
>
> YARN-3591 added the {{checkLocalResource}} method to {{isResourcePresent()}} 
> call to allow checking an already localized resource against the list of 
> good/full directories.
> Since LocalResourcesTrackerImpl instantiations for app level resources and 
> private resources do not use the new constructor, such resources that are on 
> bad disk will never be checked against good dirs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6641) Non-public resource localization on a bad disk causes subsequent containers failure

2017-05-25 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-6641:
--
Attachment: YARN-6641.003.patch

Thanks [~jlowe] for the quick response. I have updated the patch.

> Non-public resource localization on a bad disk causes subsequent containers 
> failure
> ---
>
> Key: YARN-6641
> URL: https://issues.apache.org/jira/browse/YARN-6641
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6641.001.patch, YARN-6641.002.patch, 
> YARN-6641.003.patch
>
>
> YARN-3591 added the {{checkLocalResource}} method to {{isResourcePresent()}} 
> call to allow checking an already localized resource against the list of 
> good/full directories.
> Since LocalResourcesTrackerImpl instantiations for app level resources and 
> private resources do not use the new constructor, such resources that are on 
> bad disk will never be checked against good dirs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6641) Non-public resource localization on a bad disk causes subsequent containers failure

2017-05-25 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024814#comment-16024814
 ] 

Kuhu Shukla commented on YARN-6641:
---

Minor checkstyle issues; will fix them in upcoming patches. Requesting a review of 
the approach and any concerns with this change, [~jlowe] / [~nroberts]. 

> Non-public resource localization on a bad disk causes subsequent containers 
> failure
> ---
>
> Key: YARN-6641
> URL: https://issues.apache.org/jira/browse/YARN-6641
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6641.001.patch, YARN-6641.002.patch
>
>
> YARN-3591 added the {{checkLocalResource}} method to {{isResourcePresent()}} 
> call to allow checking an already localized resource against the list of 
> good/full directories.
> Since LocalResourcesTrackerImpl instantiations for app level resources and 
> private resources do not use the new constructor, such resources that are on 
> bad disk will never be checked against good dirs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6641) Non-public resource localization on a bad disk causes subsequent containers failure

2017-05-24 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-6641:
--
Attachment: YARN-6641.002.patch

Fixed minor checkstyle issues. The Findbugs warnings are in files this patch has 
not touched, so I am leaving them unaddressed for now.

> Non-public resource localization on a bad disk causes subsequent containers 
> failure
> ---
>
> Key: YARN-6641
> URL: https://issues.apache.org/jira/browse/YARN-6641
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6641.001.patch, YARN-6641.002.patch
>
>
> YARN-3591 added the {{checkLocalResource}} method to {{isResourcePresent()}} 
> call to allow checking an already localized resource against the list of 
> good/full directories.
> Since LocalResourcesTrackerImpl instantiations for app level resources and 
> private resources do not use the new constructor, such resources that are on 
> bad disk will never be checked against good dirs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6641) Non-public resource localization on a bad disk causes subsequent containers failure

2017-05-24 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-6641:
--
Attachment: YARN-6641.001.patch

v1 patch that calls the constructor for LocalResourcesTrackerImpl with the 
LocalDirsHandlerService object. It does this also in the case where it is 
trying to recover resources.

> Non-public resource localization on a bad disk causes subsequent containers 
> failure
> ---
>
> Key: YARN-6641
> URL: https://issues.apache.org/jira/browse/YARN-6641
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6641.001.patch
>
>
> YARN-3591 added the {{checkLocalResource}} method to {{isResourcePresent()}} 
> call to allow checking an already localized resource against the list of 
> good/full directories.
> Since LocalResourcesTrackerImpl instantiations for app level resources and 
> private resources do not use the new constructor, such resources that are on 
> bad disk will never be checked against good dirs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6641) Non-public resource localization on a bad disk causes subsequent containers failure

2017-05-24 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created YARN-6641:
-

 Summary: Non-public resource localization on a bad disk causes 
subsequent containers failure
 Key: YARN-6641
 URL: https://issues.apache.org/jira/browse/YARN-6641
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


YARN-3591 added the {{checkLocalResource}} method to {{isResourcePresent()}} 
call to allow checking an already localized resource against the list of 
good/full directories.

Since LocalResourcesTrackerImpl instantiations for app level resources and 
private resources do not use the new constructor, such resources that are on 
bad disk will never be checked against good dirs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6277) Nodemanager heap memory leak

2017-04-01 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952157#comment-15952157
 ] 

Kuhu Shukla commented on YARN-6277:
---

[~Feng Yuan], did you see this after YARN-4095 went in? Thanks!

> Nodemanager heap memory leak
> 
>
> Key: YARN-6277
> URL: https://issues.apache.org/jira/browse/YARN-6277
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.3, 2.8.1, 3.0.0-alpha2
>Reporter: Feng Yuan
>Assignee: Feng Yuan
> Attachments: YARN-6277.branch-2.8.001.patch
>
>
> Because of LocalDirHandlerService/LocalDirAllocator's mechanism, they will create 
> massive numbers of LocalFileSystem instances, which leads to a heap leak.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2017-03-27 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943132#comment-15943132
 ] 

Kuhu Shukla commented on YARN-6315:
---

Thank you [~jlowe] for the reviews and for helping find the bug with this 
approach. The initial idea, which seemed to work when I tested it, is to add 
"actualSize" to LocalizedResource and use that instead of the request's size. 
Will update the patch shortly.

> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6315.001.patch, YARN-6315.002.patch, 
> YARN-6315.003.patch, YARN-6315.004.patch
>
>
> We currently check if a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks that 
> it has the resource if the file exists but with size 0 or less than the 
> "expected" size of the LocalResource. This JIRA tracks the change to harden 
> the isResourcePresent call to address that case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2017-03-16 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928563#comment-15928563
 ] 

Kuhu Shukla commented on YARN-6315:
---

mvn install is marking a lot of hdfs files as duplicate. I have asked on 
HDFS-11431 since that seems related.
{code}
[WARNING] Rule 1: org.apache.maven.plugins.enforcer.BanDuplicateClasses failed 
with message:
Duplicate classes found:

  Found in:
org.apache.hadoop:hadoop-client-api:jar:3.0.0-alpha3-SNAPSHOT:compile

org.apache.hadoop:hadoop-client-minicluster:jar:3.0.0-alpha3-SNAPSHOT:compile
  Duplicate classes:

org/apache/hadoop/hdfs/qjournal/protocol/QJournalProtocolProtos$GetJournalStateRequestProto$Builder.class
 
{code}

> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6315.001.patch, YARN-6315.002.patch, 
> YARN-6315.003.patch, YARN-6315.004.patch
>
>
> We currently check if a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks that 
> it has the resource if the file exists but with size 0 or less than the 
> "expected" size of the LocalResource. This JIRA tracks the change to harden 
> the isResourcePresent call to address that case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2017-03-16 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-6315:
--
Attachment: YARN-6315.004.patch

Thank you [~jlowe] for the feedback. I have made the changes accordingly.

> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6315.001.patch, YARN-6315.002.patch, 
> YARN-6315.003.patch, YARN-6315.004.patch
>
>
> We currently check if a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks that 
> it has the resource if the file exists but with size 0 or less than the 
> "expected" size of the LocalResource. This JIRA tracks the change to harden 
> the isResourcePresent call to address that case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2017-03-15 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927045#comment-15927045
 ] 

Kuhu Shukla commented on YARN-6315:
---

[~jlowe], Request for some more comments on the latest patch. Appreciate it.

> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6315.001.patch, YARN-6315.002.patch, 
> YARN-6315.003.patch
>
>
> We currently check if a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks that 
> it has the resource if the file exists but with size 0 or less than the 
> "expected" size of the LocalResource. This JIRA tracks the change to harden 
> the isResourcePresent call to address that case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2017-03-15 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-6315:
--
Attachment: YARN-6315.003.patch

Thank you Jason for the review. Updated patch. Also, I now catch Exception 
instead of just IOException to cover the cases where the readAttributes call 
could throw SecurityException or UnsupportedOperationException. Will wait for 
the precommit build before requesting a review.

> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6315.001.patch, YARN-6315.002.patch, 
> YARN-6315.003.patch
>
>
> We currently check if a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks that 
> it has the resource if the file exists but with size 0 or less than the 
> "expected" size of the LocalResource. This JIRA tracks the change to harden 
> the isResourcePresent call to address that case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2017-03-13 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907917#comment-15907917
 ] 

Kuhu Shukla commented on YARN-6315:
---

[~jlowe], [~eepayne], Request for comments/review. Thanks a lot!

> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6315.001.patch, YARN-6315.002.patch
>
>
> We currently check if a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks that 
> it has the resource if the file exists but with size 0 or less than the 
> "expected" size of the LocalResource. This JIRA tracks the change to harden 
> the isResourcePresent call to address that case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2017-03-13 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-6315:
--
Attachment: YARN-6315.002.patch

Fixing checkstyle warnings. 

> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6315.001.patch, YARN-6315.002.patch
>
>
> We currently check if a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks that 
> it has the resource if the file exists but with size 0 or less than the 
> "expected" size of the LocalResource. This JIRA tracks the change to harden 
> the isResourcePresent call to address that case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2017-03-12 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15906665#comment-15906665
 ] 

Kuhu Shukla edited comment on YARN-6315 at 3/12/17 7:36 PM:


Some performance numbers from instrumenting the test and profiling it through 
YourKit on my Macbook Pro.

The current patch spends an average of 1900 ms for 10,002 runs (189 
micro-seconds per call).
An equivalent patch that uses file.isDirectory(), file.exists(), file.length() 
as shown below takes 2080.8 ms for 10,002 runs (208 micro seconds per call).

{code}
 if ((!file.isDirectory() && file.length() != req.getSize()) || !file.exists()) 
{
  ret = false;
} else if (dirsHandler != null) {
  ret = checkLocalResource(rsrc);
}
{code}


was (Author: kshukla):
Some performance numbers from instrumenting the test and profiling it through 
YourKit on my Macbook Pro.

The current patch spends an average of 1900 ms for 10,002 runs (189 
micro-seconds per call).
An equivalent patch that uses file.isDirectory(), file.exists(), file.length() 
as shown below takes 2080.8 ms for 10,002 runs (0.208 micro seconds per call).

{code}
 if ((!file.isDirectory() && file.length() != req.getSize()) || !file.exists()) 
{
  ret = false;
} else if (dirsHandler != null) {
  ret = checkLocalResource(rsrc);
}
{code}

> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6315.001.patch
>
>
> We currently check if a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks that 
> it has the resource if the file exists but with size 0 or less than the 
> "expected" size of the LocalResource. This JIRA tracks the change to harden 
> the isResourcePresent call to address that case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2017-03-12 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15906665#comment-15906665
 ] 

Kuhu Shukla commented on YARN-6315:
---

Some performance numbers from instrumenting the test and profiling it through 
YourKit on my Macbook Pro.

The current patch spends an average of 1900 ms for 10,002 runs (189 
micro-seconds per call).
An equivalent patch that uses file.isDirectory(), file.exists(), file.length() 
as shown below takes 2080.8 ms for 10,002 runs (0.208 micro seconds per call).

{code}
 if ((!file.isDirectory() && file.length() != req.getSize()) || !file.exists()) 
{
  ret = false;
} else if (dirsHandler != null) {
  ret = checkLocalResource(rsrc);
}
{code}

> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6315.001.patch
>
>
> We currently check if a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks that 
> it has the resource if the file exists but with size 0 or less than the 
> "expected" size of the LocalResource. This JIRA tracks the change to harden 
> the isResourcePresent call to address that case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2017-03-12 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-6315:
--
Attachment: YARN-6315.001.patch

First version of the patch. It uses the readAttributes bulk operation to match the 
size for resources that are not directories, since the size of a directory may not 
always match up. It maintains the exists() behavior by setting ret = false when a 
FileNotFoundException is thrown. The method also catches IOException to maintain 
the previous behavior/signature. 
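
For readers unfamiliar with the NIO call, a minimal sketch of the readAttributes-based 
check described above; the helper name and the value returned on a generic IOException 
are assumptions, not the patch's exact code.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

// Illustrative only: one bulk readAttributes call replaces separate
// exists()/isDirectory()/length() calls on the localized file.
public final class ReadAttributesSketch {
  public static boolean isPresentWithExpectedSize(String localPath,
      long expectedSize) {
    try {
      BasicFileAttributes attrs =
          Files.readAttributes(Paths.get(localPath), BasicFileAttributes.class);
      // Directories are not size-checked; their size rarely matches the
      // original resource size.
      return attrs.isDirectory() || attrs.size() == expectedSize;
    } catch (NoSuchFileException e) {
      return false;   // same outcome as the old exists() returning false
    } catch (IOException e) {
      return false;   // swallow to keep the old no-throw signature (assumption)
    }
  }
}
{code}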

> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-6315.001.patch
>
>
> We currently check if a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks that 
> it has the resource if the file exists but with size 0 or less than the 
> "expected" size of the LocalResource. This JIRA tracks the change to harden 
> the isResourcePresent call to address that case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2017-03-09 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created YARN-6315:
-

 Summary: Improve LocalResourcesTrackerImpl#isResourcePresent to 
return false for corrupted files
 Key: YARN-6315
 URL: https://issues.apache.org/jira/browse/YARN-6315
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.3, 2.8.1
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


We currently check if a resource is present by making sure that the file exists 
locally. There can be a case where the LocalizationTracker thinks that it has 
the resource if the file exists but with size 0 or less than the "expected" 
size of the LocalResource. This JIRA tracks the change to harden the 
isResourcePresent call to address that case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


