[jira] [Comment Edited] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-19 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258890#comment-16258890
 ] 

Arun Suresh edited comment on YARN-6483 at 11/20/17 7:33 AM:
-

Thanks for working on this [~juanrh].

Apart from the checkstyle and findbugs warnings above, which need to be fixed, a 
couple of comments:
* NodeReport is unfortunately marked as Public and Stable. This means that older 
versions of the client should not have to be recompiled even if they use the 
newer version of the class - which would be required now, since the new setter 
and getter are abstract. As a workaround, what we do in such situations is 
provide default/no-op implementations in the base class (NodeReport here) and 
override them in the PBImpl class; see the sketch after this list.
* Given that we are taking the trouble of notifying the AM of decommissioned / 
decommissioning nodes, maybe we should include the update type in the 
NodeReport as well?
* It looks like the only change in {{RMNodeDecommissioningEvent}} is from 
Integer to int. Can we revert this?
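
To make the compatibility point concrete, here is a minimal sketch of that 
workaround (the {{NodeUpdateType}} accessor and enum are illustrative 
placeholders, not necessarily the names used in the patch):
{code:java}
// Hypothetical update-type marker, for illustration only.
enum NodeUpdateType { NODE_USABLE, NODE_UNUSABLE, NODE_DECOMMISSIONING }

// Public/Stable abstract class: the new accessors get default no-op bodies
// instead of being declared abstract, so existing subclasses and older
// clients keep compiling and linking against the new version.
public abstract class NodeReport {
  public NodeUpdateType getNodeUpdateType() {
    return null; // harmless default
  }

  public void setNodeUpdateType(NodeUpdateType nodeUpdateType) {
    // no-op by default
  }
}

// The protobuf-backed implementation overrides the defaults with real logic.
class NodeReportPBImpl extends NodeReport {
  private NodeUpdateType nodeUpdateType;

  @Override
  public NodeUpdateType getNodeUpdateType() {
    return nodeUpdateType;
  }

  @Override
  public void setNodeUpdateType(NodeUpdateType nodeUpdateType) {
    this.nodeUpdateType = nodeUpdateType;
  }
}
{code}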


was (Author: asuresh):
Thanks for working on this [~juanrh].

Apart from the checkstyle and findbugs warnings above, which need to be fixed, a 
couple of comments:
* NodeReport is unfortunately marked as Public and Stable. This means that older 
versions of the client should not have to be recompiled even if they use the 
newer version of the class - which will happen now, since the new setter and 
getter are abstract. As a workaround, what we do now is have default/no-op 
implementations in the NodeReport class and override them in the PBImpl class.
* Given that we are taking the trouble of notifying the AM of decommissioned / 
decommissioning nodes, maybe we should include the update type in the 
NodeReport as well?
* It looks like the only change in {{RMNodeDecommissioningEvent}} is from 
Integer to int. Can we revert this?

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned by the Resource Manager as a response to the Application Master 
> heartbeat
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete on a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks on that node. Also, YARN effectively blacklists nodes in the 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers from being launched on those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so that nodes transitioning to DECOMMISSIONING are 
> added to the list of updated nodes returned by the Resource Manager as a 
> response to the Application Master heartbeat. This way a Spark application 
> master would be able to blacklist a DECOMMISSIONING node at the Spark level.
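
For illustration, here is a minimal sketch of how an application master could 
consume such updates once the RM reports them ({{AllocateResponse}}, 
{{NodeState}} and {{AMRMClient#updateBlacklist}} are existing YARN client APIs; 
the surrounding class and the blacklisting policy are assumptions, not part of 
the patch):
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class DecommissioningNodeTracker {

  /**
   * Inspect the updated-node list of an allocate (heartbeat) response and
   * blacklist every node that is draining, so no new containers land there.
   */
  public void handleUpdatedNodes(AllocateResponse response,
      AMRMClient<ContainerRequest> amRMClient) {
    List<String> additions = new ArrayList<>();
    for (NodeReport report : response.getUpdatedNodes()) {
      if (report.getNodeState() == NodeState.DECOMMISSIONING) {
        additions.add(report.getNodeId().getHost());
      }
    }
    if (!additions.isEmpty()) {
      // Add the draining hosts to the AM-level blacklist; nothing is removed.
      amRMClient.updateBlacklist(additions, Collections.<String>emptyList());
    }
  }
}
{code}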






[jira] [Commented] (YARN-7534) Fair scheduler assign resources may exceed maxResources

2017-11-19 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258892#comment-16258892
 ] 

Yufei Gu commented on YARN-7534:


That's valid. The solution would be to add the resource request to the current 
usage and compare the resulting usage with *maxResources*; a sketch of that 
check is below.
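
A minimal sketch of that idea ({{Resources#add}} and {{Resources#fitsIn}} are 
the existing YARN utility methods; the helper itself and where it would be 
called from are assumptions, not the actual FairScheduler change):
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public final class MaxResourcesCheck {

  private MaxResourcesCheck() {
  }

  /**
   * Returns true only if assigning {@code request} on top of the queue's
   * current usage would still stay within the queue's maxResources.
   */
  public static boolean fitsWithinMaxResources(Resource currentUsage,
      Resource request, Resource maxResources) {
    // Project the usage as it would look after the assignment...
    Resource usageAfterAssignment = Resources.add(currentUsage, request);
    // ...and compare that, rather than the current usage, against the limit.
    return Resources.fitsIn(usageAfterAssignment, maxResources);
  }
}
{code}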

> Fair scheduler assign resources may exceed maxResources
> ---
>
> Key: YARN-7534
> URL: https://issues.apache.org/jira/browse/YARN-7534
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: YunFan Zhou
>
> The current scheduling logic checks whether the resources used by the queue 
> have exceeded *maxResources* before assigning a container. As a result, after 
> this container is assigned, the queue can end up using more resources than 
> *maxResources*.






[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-19 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258890#comment-16258890
 ] 

Arun Suresh commented on YARN-6483:
---

Thanks for working on this [~juanrh].

Apart from the checkstyle and findbugs warnings above, which need to be fixed, a 
couple of comments:
* NodeReport is unfortunately marked as Public and Stable. This means that older 
versions of the client should not have to be recompiled even if they use the 
newer version of the class - which will happen now, since the new setter and 
getter are abstract. As a workaround, what we do now is have default/no-op 
implementations in the NodeReport class and override them in the PBImpl class.
* Given that we are taking the trouble of notifying the AM of decommissioned / 
decommissioning nodes, maybe we should include the update type in the 
NodeReport as well?
* It looks like the only change in {{RMNodeDecommissioningEvent}} is from 
Integer to int. Can we revert this?

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned by the Resource Manager as a response to the Application Master 
> heartbeat
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete on a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks on that node. Also, YARN effectively blacklists nodes in the 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers from being launched on those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so that nodes transitioning to DECOMMISSIONING are 
> added to the list of updated nodes returned by the Resource Manager as a 
> response to the Application Master heartbeat. This way a Spark application 
> master would be able to blacklist a DECOMMISSIONING node at the Spark level.






[jira] [Commented] (YARN-7480) Render tooltips on columns where text is clipped

2017-11-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258876#comment-16258876
 ] 

Sunil G commented on YARN-7480:
---

The ASF license issues need to be fixed.

Kindly fix this.

> Render tooltips on columns where text is clipped
> 
>
> Key: YARN-7480
> URL: https://issues.apache.org/jira/browse/YARN-7480
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Vasudevan Skm
>Assignee: Vasudevan Skm
> Attachments: YARN-7480.001.patch, YARN-7480.002.patch
>
>
> In em-table, when text gets clipped, the information is lost. We need to 
> render a tooltip to show the full text in these cases.






[jira] [Comment Edited] (YARN-7534) Fair scheduler assign resources may exceed maxResources

2017-11-19 Thread YunFan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258849#comment-16258849
 ] 

YunFan Zhou edited comment on YARN-7534 at 11/20/17 6:34 AM:
-

[~templedf]
For example, suppose a queue's resource usage at some point in time is as follows:
Max Resources: **
Current used resources: **
Pending resource request: **

At this point a node manager reports a heartbeat with ** available resources.
Before assigning containers, the following check is performed:
{code:java}
@Override
public Resource assignContainer(FSSchedulerNode node) {
  Resource assigned = Resources.none();
  if (LOG.isDebugEnabled()) {
    LOG.debug("Node " + node.getNodeName() + " offered to queue: " +
        getName() + " fairShare: " + getFairShare());
  }

  // The pre-check compares the queue's current usage (not usage plus the
  // incoming request) with maxResources.
  if (!assignContainerPreCheck(node)) {
    return assigned;
  }
{code}

Because the queue's used resources are still less than maxResources, the check 
passes and ** is assigned to this queue; at that point the queue's used 
resources exceed *maxResources*.




was (Author: daemon):
[~templedf]
For example, suppose a queue's resource usage at some point in time is as follows:
Max Resources: **
Current used resources: **
Pending resource request: **

At this point a node manager reports a heartbeat with ** available resources.
Before assigning containers, the following check is performed:
{code:java}
@Override
public Resource assignContainer(FSSchedulerNode node) {
  Resource assigned = Resources.none();
  if (LOG.isDebugEnabled()) {
    LOG.debug("Node " + node.getNodeName() + " offered to queue: " +
        getName() + " fairShare: " + getFairShare());
  }

  // The pre-check compares the queue's current usage (not usage plus the
  // incoming request) with maxResources.
  if (!assignContainerPreCheck(node)) {
    return assigned;
  }
{code}

Because the queue's used resources are still less than maxResources, the check 
passes and ** is assigned to this queue; at that point the queue's used 
resources exceed *maxResources*.



> Fair scheduler assign resources may exceed maxResources
> ---
>
> Key: YARN-7534
> URL: https://issues.apache.org/jira/browse/YARN-7534
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: YunFan Zhou
>
> The current scheduling logic checks whether the resources used by the queue 
> have exceeded *maxResources* before assigning a container. As a result, after 
> this container is assigned, the queue can end up using more resources than 
> *maxResources*.






[jira] [Comment Edited] (YARN-7534) Fair scheduler assign resources may exceed maxResources

2017-11-19 Thread YunFan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258849#comment-16258849
 ] 

YunFan Zhou edited comment on YARN-7534 at 11/20/17 6:33 AM:
-

[~templedf]
For example, suppose a queue's resource usage at some point in time is as follows:
Max Resources: **
Current used resources: **
Pending resource request: **

At this point a node manager reports a heartbeat with ** available resources.
Before assigning containers, the following check is performed:
{code:java}
@Override
public Resource assignContainer(FSSchedulerNode node) {
  Resource assigned = Resources.none();
  if (LOG.isDebugEnabled()) {
    LOG.debug("Node " + node.getNodeName() + " offered to queue: " +
        getName() + " fairShare: " + getFairShare());
  }

  // The pre-check compares the queue's current usage (not usage plus the
  // incoming request) with maxResources.
  if (!assignContainerPreCheck(node)) {
    return assigned;
  }
{code}

Because the queue's used resources are still less than maxResources, the check 
passes and ** is assigned to this queue; at that point the queue's used 
resources exceed *maxResources*.




was (Author: daemon):
[~templedf]
For example, suppose a queue's resource usage at some point in time is as follows:
Max Resources: **
Current used resources: **
Pending resource request: **

At this point a node manager reports a heartbeat with ** available resources.
Before assigning containers, the following check is performed:
{code:java}
@Override
public Resource assignContainer(FSSchedulerNode node) {
  Resource assigned = Resources.none();
  if (LOG.isDebugEnabled()) {
    LOG.debug("Node " + node.getNodeName() + " offered to queue: " +
        getName() + " fairShare: " + getFairShare());
  }

  // The pre-check compares the queue's current usage (not usage plus the
  // incoming request) with maxResources.
  if (!assignContainerPreCheck(node)) {
    return assigned;
  }
{code}

Because the queue's used resources are still less than maxResources, the check 
passes and ** is assigned to this queue; at that point the queue's used 
resources are over the limit.



> Fair scheduler assign resources may exceed maxResources
> ---
>
> Key: YARN-7534
> URL: https://issues.apache.org/jira/browse/YARN-7534
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: YunFan Zhou
>
> The current scheduling logic checks whether the resources used by the queue 
> have exceeded *maxResources* before assigning a container. As a result, after 
> this container is assigned, the queue can end up using more resources than 
> *maxResources*.






[jira] [Commented] (YARN-7534) Fair scheduler assign resources may exceed maxResources

2017-11-19 Thread YunFan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258849#comment-16258849
 ] 

YunFan Zhou commented on YARN-7534:
---

[~templedf]
For example, suppose a queue's resource usage at some point in time is as follows:
Max Resources: **
Current used resources: **
Pending resource request: **

At this point a node manager reports a heartbeat with ** available resources.
Before assigning containers, the following check is performed:
{code:java}
@Override
public Resource assignContainer(FSSchedulerNode node) {
  Resource assigned = Resources.none();
  if (LOG.isDebugEnabled()) {
    LOG.debug("Node " + node.getNodeName() + " offered to queue: " +
        getName() + " fairShare: " + getFairShare());
  }

  // The pre-check compares the queue's current usage (not usage plus the
  // incoming request) with maxResources.
  if (!assignContainerPreCheck(node)) {
    return assigned;
  }
{code}

Because the queue's used resources are still less than maxResources, the check 
passes and ** is assigned to this queue; at that point the queue's used 
resources are over the limit.



> Fair scheduler assign resources may exceed maxResources
> ---
>
> Key: YARN-7534
> URL: https://issues.apache.org/jira/browse/YARN-7534
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: YunFan Zhou
>
> The current scheduling logic checks whether the resources used by the queue 
> have exceeded *maxResources* before assigning a container. As a result, after 
> this container is assigned, the queue can end up using more resources than 
> *maxResources*.






[jira] [Commented] (YARN-7489) ConcurrentModificationException in RMAppImpl#getRMAppMetrics

2017-11-19 Thread Tao Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258833#comment-16258833
 ] 

Tao Yang commented on YARN-7489:


Thanks [~bibinchundatt] for your review and commit.

> ConcurrentModificationException in RMAppImpl#getRMAppMetrics
> 
>
> Key: YARN-7489
> URL: https://issues.apache.org/jira/browse/YARN-7489
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-7489.001.patch
>
>
> The REST clients have sometimes failed to query applications through the apps 
> REST API in RMWebServices. This happens when iterating over 
> attempts (RMWebServices#getApps --> AppInfo#<init> --> 
> RMAppImpl#getRMAppMetrics) while those attempts are being 
> changed (AttemptFailedTransition#transition --> 
> RMAppImpl#createAndStartNewAttempt --> RMAppImpl#createNewAttempt). 
> The application state change already happens under the writeLock in RMAppImpl, 
> so we can take the readLock before iterating over attempts to fix this 
> problem; a sketch of the locking idea follows the stack trace below.
> Exception stack:
> {noformat}
> java.util.ConcurrentModificationException
> at 
> java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
> at 
> java.util.LinkedHashMap$LinkedValueIterator.next(LinkedHashMap.java:747)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getRMAppMetrics(RMAppImpl.java:1487)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.<init>(AppInfo.java:199)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:597)
> at sun.reflect.GeneratedMethodAccessor81.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
> at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
> at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
> at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
> at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
> at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
> at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
> at 
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
> at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
> at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
> at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
> at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
> at 
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:178)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
> at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
> at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
> at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
> {noformat}
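
For reference, a minimal sketch of the read/write-lock pattern the description 
refers to (the class and its fields are simplified illustrations, not the 
actual RMAppImpl code):
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class AttemptMetricsExample {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final ReentrantReadWriteLock.ReadLock readLock = lock.readLock();
  private final ReentrantReadWriteLock.WriteLock writeLock = lock.writeLock();
  private final Map<String, Integer> attempts = new LinkedHashMap<>();

  /** Writers (e.g. creating a new attempt) modify the map under the write lock. */
  public void addAttempt(String attemptId, int containersUsed) {
    writeLock.lock();
    try {
      attempts.put(attemptId, containersUsed);
    } finally {
      writeLock.unlock();
    }
  }

  /**
   * Readers take the read lock before iterating, so the map cannot change
   * mid-iteration and no ConcurrentModificationException is thrown.
   */
  public int aggregateContainersUsed() {
    readLock.lock();
    try {
      int total = 0;
      for (int used : attempts.values()) {
        total += used;
      }
      return total;
    } finally {
      readLock.unlock();
    }
  }
}
{code}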






[jira] [Updated] (YARN-7534) Fair scheduler assign resources may exceed maxResources

2017-11-19 Thread YunFan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YunFan Zhou updated YARN-7534:
--
Description: The logic we're scheduling now is to check whether the 
resources used by the queue has exceeded *maxResources* before assigning the 
container. This will leads to the fact that after assigning this container the 
queue uses more resources than *maxResources*.  (was: The logic we're 
scheduling now is to check whether the resources used by the queue has exceeded 
*maxResources* before assigning the container. This will leads to the fact that 
after assigning this container the queue uses more resources than maxResources.)

> Fair scheduler assign resources may exceed maxResources
> ---
>
> Key: YARN-7534
> URL: https://issues.apache.org/jira/browse/YARN-7534
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: YunFan Zhou
>
> The logic we're scheduling now is to check whether the resources used by the 
> queue has exceeded *maxResources* before assigning the container. This will 
> leads to the fact that after assigning this container the queue uses more 
> resources than *maxResources*.






[jira] [Updated] (YARN-7534) Fair scheduler assign resources may exceed maxResources

2017-11-19 Thread YunFan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YunFan Zhou updated YARN-7534:
--
Description: The logic we're scheduling now is to check whether the 
resources used by the queue has exceeded *maxResources* before assigning the 
container. This will leads to the fact that after assigning this container the 
queue uses more resources than maxResources.  (was: The logic we're scheduling 
now is to check whether the resources used by the queue has exceeded 
*maxResources* before assigning the container. This leads to the fact that 
after dispatching this Container, the queue USES more resources than 
maxResources.)

> Fair scheduler assign resources may exceed maxResources
> ---
>
> Key: YARN-7534
> URL: https://issues.apache.org/jira/browse/YARN-7534
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: YunFan Zhou
>
> The logic we're scheduling now is to check whether the resources used by the 
> queue has exceeded *maxResources* before assigning the container. This will 
> leads to the fact that after assigning this container the queue uses more 
> resources than maxResources.






[jira] [Updated] (YARN-7534) Fair scheduler assign resources may exceed maxResources

2017-11-19 Thread YunFan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YunFan Zhou updated YARN-7534:
--
Description: The logic we're scheduling now is to check whether the 
resources used by the queue has exceeded *maxResources* before assigning the 
container. This leads to the fact that after dispatching this Container, the 
queue USES more resources than maxResources.  (was: The logic we're scheduling 
now is to check whether the resources used by the queue before assigning the 
container have exceeded maxResources. This leads to the fact that after 
dispatching this Container, the queue USES more resources than maxResources.)

> Fair scheduler assign resources may exceed maxResources
> ---
>
> Key: YARN-7534
> URL: https://issues.apache.org/jira/browse/YARN-7534
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: YunFan Zhou
>
> The logic we're scheduling now is to check whether the resources used by the 
> queue has exceeded *maxResources* before assigning the container. This leads 
> to the fact that after dispatching this Container, the queue USES more 
> resources than maxResources.






[jira] [Updated] (YARN-7534) Fair scheduler assign resources may exceed maxResources

2017-11-19 Thread YunFan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YunFan Zhou updated YARN-7534:
--
Description: The logic we're scheduling now is to check whether the 
resources used by the queue before assigning the container have exceeded 
maxResources. This leads to the fact that after dispatching this Container, the 
queue USES more resources than maxResources.

> Fair scheduler assign resources may exceed maxResources
> ---
>
> Key: YARN-7534
> URL: https://issues.apache.org/jira/browse/YARN-7534
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: YunFan Zhou
>
> The logic we're scheduling now is to check whether the resources used by the 
> queue before assigning the container have exceeded maxResources. This leads 
> to the fact that after dispatching this Container, the queue USES more 
> resources than maxResources.






[jira] [Commented] (YARN-7534) Fair scheduler assign resources may exceed maxResources

2017-11-19 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258808#comment-16258808
 ] 

Daniel Templeton commented on YARN-7534:


Any other details?

> Fair scheduler assign resources may exceed maxResources
> ---
>
> Key: YARN-7534
> URL: https://issues.apache.org/jira/browse/YARN-7534
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: YunFan Zhou
>







[jira] [Resolved] (YARN-7515) HA related tests fails in trunk

2017-11-19 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt resolved YARN-7515.

Resolution: Duplicate

Thank you [~jlowe] for pointing out the duplicate JIRA. Closing this issue as a 
duplicate.


> HA related tests fails in trunk
> ---
>
> Key: YARN-7515
> URL: https://issues.apache.org/jira/browse/YARN-7515
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta1
>Reporter: Bibin A Chundatt
>
>   https://builds.apache.org/job/PreCommit-YARN-Build/18498/testReport/
> {noformat}
> org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands
>   
> org.apache.hadoop.yarn.server.resourcemanager.TestReservationSystemWithRMHA
>   
> org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
>   
> org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
> {noformat}






[jira] [Created] (YARN-7534) Fair scheduler dispatch resources may exceed maxResources

2017-11-19 Thread YunFan Zhou (JIRA)
YunFan Zhou created YARN-7534:
-

 Summary: Fair scheduler dispatch resources may exceed maxResources
 Key: YARN-7534
 URL: https://issues.apache.org/jira/browse/YARN-7534
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: YunFan Zhou









[jira] [Updated] (YARN-7534) Fair scheduler assign resources may exceed maxResources

2017-11-19 Thread YunFan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YunFan Zhou updated YARN-7534:
--
Summary: Fair scheduler assign resources may exceed maxResources  (was: 
Fair scheduler dispatch resources may exceed maxResources)

> Fair scheduler assign resources may exceed maxResources
> ---
>
> Key: YARN-7534
> URL: https://issues.apache.org/jira/browse/YARN-7534
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: YunFan Zhou
>







[jira] [Updated] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-11-19 Thread Steven Rand (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rand updated YARN-7290:
--
Attachment: YARN-7290.003.patch

Uploaded a new patch to try to make the test a bit nicer.

[~templedf], would it be possible for you or someone else to take a look? This 
bug seems to still exist on trunk, and I think it'd be good to fix it.

> canContainerBePreempted can return true when it shouldn't
> -
>
> Key: YARN-7290
> URL: https://issues.apache.org/jira/browse/YARN-7290
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.0.0-beta1
>Reporter: Steven Rand
>Assignee: Steven Rand
> Attachments: YARN-7290-failing-test.patch, YARN-7290.001.patch, 
> YARN-7290.002.patch, YARN-7290.003.patch
>
>
> In FSAppAttempt#canContainerBePreempted, we make sure that preempting the 
> given container would not put the app below its fair share:
> {code}
> // Check if the app's allocation will be over its fairshare even
> // after preempting this container
> Resource usageAfterPreemption = Resources.clone(getResourceUsage());
> // Subtract resources of containers already queued for preemption
> synchronized (preemptionVariablesLock) {
>   Resources.subtractFrom(usageAfterPreemption, resourcesToBePreempted);
> }
> // Subtract this container's allocation to compute usage after preemption
> Resources.subtractFrom(
>     usageAfterPreemption, container.getAllocatedResource());
> return !isUsageBelowShare(usageAfterPreemption, getFairShare());
> {code}
> However, this only considers one container in isolation, and fails to 
> consider containers for the same app that we already added to 
> {{preemptableContainers}} in 
> FSPreemptionThread#identifyContainersToPreemptOnNode. Therefore we can have a 
> case where we preempt multiple containers from the same app, none of which by 
> itself puts the app below fair share, but which cumulatively do so.
> I've attached a patch with a test to show this behavior. The flow is:
> 1. Initially greedyApp runs in {{root.preemptable.child-1}} and is allocated 
> all the resources (8g and 8vcores)
> 2. Then starvingApp runs in {{root.preemptable.child-2}} and requests 2 
> containers, each of which is 3g and 3vcores in size. At this point both 
> greedyApp and starvingApp have a fair share of 4g (with DRF not in use).
> 3. For the first container requested by starvingApp, we (correctly) preempt 3 
> containers from greedyApp, each of which is 1g and 1vcore.
> 4. For the second container requested by starvingApp, we again (this time 
> incorrectly) preempt 3 containers from greedyApp. This puts greedyApp below 
> its fair share, but happens anyway because all six times that we call 
> {{return !isUsageBelowShare(usageAfterPreemption, getFairShare());}}, the 
> value of {{usageAfterPreemption}} is 7g and 7vcores (confirmed using 
> debugger).
> So in addition to accounting for {{resourcesToBePreempted}}, we also need to 
> account for containers that we're already planning on preempting in 
> FSPreemptionThread#identifyContainersToPreemptOnNode. 
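
Below is a minimal sketch of the cumulative accounting described in the last 
paragraph (the helper and its parameters are illustrative assumptions, not the 
actual FSAppAttempt/FSPreemptionThread code; the {{Resources}} methods are the 
existing YARN utilities):
{code:java}
import java.util.List;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public final class PreemptionAccounting {

  private PreemptionAccounting() {
  }

  /**
   * Decide whether one more container may be preempted from an app, given the
   * resources already queued for preemption elsewhere AND the containers
   * already selected from this app during the current node scan.
   */
  public static boolean canPreemptAnother(Resource appUsage,
      Resource resourcesToBePreempted,
      List<Resource> alreadySelectedFromThisApp,
      Resource candidateAllocation,
      Resource fairShare) {
    Resource usageAfterPreemption = Resources.clone(appUsage);
    // Containers queued for preemption by earlier decisions.
    Resources.subtractFrom(usageAfterPreemption, resourcesToBePreempted);
    // Containers already picked from this app while scanning the current
    // node; this is the part the description says is missing today.
    for (Resource selected : alreadySelectedFromThisApp) {
      Resources.subtractFrom(usageAfterPreemption, selected);
    }
    // Finally subtract the candidate container itself.
    Resources.subtractFrom(usageAfterPreemption, candidateAllocation);
    // Only allow the preemption if the app stays at or above its fair share.
    return Resources.fitsIn(fairShare, usageAfterPreemption);
  }
}
{code}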






[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258614#comment-16258614
 ] 

Hadoop QA commented on YARN-6483:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 
47s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
57s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 11s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
11s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
7s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
45s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
17s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m  4s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 33 new + 606 unchanged - 3 fixed = 639 total (was 609) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
9s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
1s{color} | {color:red} The patch has 32 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  6s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
15s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
38s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  2m 53s{color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
4s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 14s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 
58s{color} | {color:green} hadoop-yarn-client in the 

[jira] [Assigned] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-19 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh reassigned YARN-6483:
-

Assignee: Juan Rodríguez Hortalá

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned by the Resource Manager as a response to the Application Master 
> heartbeat
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete on a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks on that node. Also, YARN effectively blacklists nodes in the 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers from being launched on those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so that nodes transitioning to DECOMMISSIONING are 
> added to the list of updated nodes returned by the Resource Manager as a 
> response to the Application Master heartbeat. This way a Spark application 
> master would be able to blacklist a DECOMMISSIONING node at the Spark level.


