[jira] [Updated] (YARN-8477) Minor code refactor on ProcfsBasedProcessTree.java

2018-06-28 Thread Shuai Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuai Zhang updated YARN-8477:
--
Attachment: YARN-8477.001.patch

> Minor code refactor on ProcfsBasedProcessTree.java
> --
>
> Key: YARN-8477
> URL: https://issues.apache.org/jira/browse/YARN-8477
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.1.0
>Reporter: Shuai Zhang
>Priority: Trivial
> Attachments: YARN-8477.001.patch
>
>
> Minor code refactor on ProcfsBasedProcessTree.java to improve readability.
> Split the functionality of "read the first line of a file" into a separate 
> function.






[jira] [Created] (YARN-8477) Minor code refactor on ProcfsBasedProcessTree.java

2018-06-28 Thread Shuai Zhang (JIRA)
Shuai Zhang created YARN-8477:
-

 Summary: Minor code refactor on ProcfsBasedProcessTree.java
 Key: YARN-8477
 URL: https://issues.apache.org/jira/browse/YARN-8477
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 3.1.0
Reporter: Shuai Zhang


Minor code refactor on ProcfsBasedProcessTree.java to improve readability.

Split the functionality of "read the first line of a file" into a separate function.
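
For illustration, a minimal sketch of the kind of helper such a split could produce 
(illustrative only -- the class and method names are assumptions, not the contents of 
YARN-8477.001.patch):

{code:java}
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Illustrative helper only. Reads and returns the first line of a procfs file,
// or null if the file cannot be read (procfs entries may disappear while the
// process tree is being scanned).
final class ProcfsLines {
  static String readFirstLine(File file) {
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
      return in.readLine();
    } catch (IOException e) {
      return null;
    }
  }
}
{code}

Callers that currently open, read, and close each /proc file inline could then simply call 
readFirstLine(...) on the file in question and handle a null return.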






[jira] [Updated] (YARN-8467) AsyncDispatcher should have a name & display it in logs to improve debug

2018-06-28 Thread Shuai Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuai Zhang updated YARN-8467:
--
Attachment: YARN-8467.002.patch

> AsyncDispatcher should have a name & display it in logs to improve debug
> 
>
> Key: YARN-8467
> URL: https://issues.apache.org/jira/browse/YARN-8467
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.1.0
>Reporter: Shuai Zhang
>Priority: Trivial
> Attachments: YARN-8467.001.patch, YARN-8467.002.patch
>
>
> Currently each AbstractService has a dispatcher, but the dispatcher is not 
> named. Logs from the dispatchers are mixed together, which makes it quite hard 
> to debug any hang issues. I suggest:
>  # Make it possible to name AsyncDispatcher & its thread (partially done in 
> YARN-6015)
>  # Mention the AsyncDispatcher name in all its logs
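
A toy sketch of the naming idea (not YARN code and not the attached patches; the 
thread-name format and log prefix are assumptions): give each dispatcher a name, use it 
for the event-handling thread, and prefix its log lines with it so output from different 
dispatchers can be told apart when debugging a hang.

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy named dispatcher, illustrative only.
final class ToyNamedDispatcher {
  private final String name;
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  private final Thread handler;

  ToyNamedDispatcher(String name) {
    this.name = name;
    // The name shows up in thread dumps, which is what helps with hang debugging.
    this.handler = new Thread(this::run, "AsyncDispatcher event handler [" + name + "]");
  }

  void start() {
    handler.start();
  }

  void dispatch(Runnable event) {
    queue.add(event);
  }

  private void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        Runnable event = queue.take();
        // Stand-in for LOG.info; the point is the per-dispatcher prefix.
        System.out.println("[" + name + "] handling " + event);
        event.run();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
{code}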






[jira] [Resolved] (YARN-8476) Should check the resource of assignment is greater than Resources.none() before submitResourceCommitRequest

2018-06-28 Thread YunFan Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YunFan Zhou resolved YARN-8476.
---
Resolution: Won't Fix

> Should check the resource of assignment is greater than Resources.none() 
> before submitResourceCommitRequest
> ---
>
> Key: YARN-8476
> URL: https://issues.apache.org/jira/browse/YARN-8476
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Reporter: YunFan Zhou
>Assignee: YunFan Zhou
>Priority: Minor
>







[jira] [Updated] (YARN-8476) Should check the resource of assignment is greater than Resources.none() before submitResourceCommitRequest

2018-06-28 Thread YunFan Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YunFan Zhou updated YARN-8476:
--
Description: (was: Hi, [~leftnoteasy]

We recently merged https://issues.apache.org/jira/browse/YARN-5139 into our 
version and found some bugs.

Below is the most serious one I've encountered:

 
{code:java}
  LeafQueue queue = ((LeafQueue) reservedApplication.getQueue());
  assignment = queue.assignContainers(getClusterResource(), candidates,
  // TODO, now we only consider limits for parent for non-labeled
  // resources, should consider labeled resources as well.
  new ResourceLimits(labelManager
  .getResourceByLabel(RMNodeLabelsManager.NO_LABEL,
  getClusterResource())),
  SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY);

  if (assignment.isFulfilledReservation()) {
if (withNodeHeartbeat) {
  // Only update SchedulerHealth in sync scheduling, existing
  // Data structure of SchedulerHealth need to be updated for
  // Async mode
  updateSchedulerHealth(lastNodeUpdateTime, node.getNodeID(),
  assignment);
}

schedulerHealth.updateSchedulerFulfilledReservationCounts(1);

ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
queue.getParent().getQueueName(), queue.getQueueName(),
ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
ActivitiesLogger.NODE.finishAllocatedNodeAllocation(activitiesManager,
node, reservedContainer.getContainerId(),
AllocationState.ALLOCATED_FROM_RESERVED);
  } else{
ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
queue.getParent().getQueueName(), queue.getQueueName(),
ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
ActivitiesLogger.NODE.finishAllocatedNodeAllocation(activitiesManager,
node, reservedContainer.getContainerId(), AllocationState.SKIPPED);
  }

  assignment.setSchedulingMode(
  SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY);
  submitResourceCommitRequest(getClusterResource(), assignment);
}
{code}
 

Before we submit an assignment to the *resourceCommitterService*, we must 
check that the assignment is greater than *Resources.none()*.

The assignment can be *CSAssignment(Resources.createResource(0, 0), 
NodeType.NODE_LOCAL)* after calling *getRootQueue().assignContainers*, which is 
a meaningless value.

But we still submit it to the *resourceCommitterService*, and the resulting 
pile of meaningless assignments blocks other meaningful event processing.

I think this is a very serious bug! Any suggestions?)

> Should check the resource of assignment is greater than Resources.none() 
> before submitResourceCommitRequest
> ---
>
> Key: YARN-8476
> URL: https://issues.apache.org/jira/browse/YARN-8476
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Reporter: YunFan Zhou
>Assignee: YunFan Zhou
>Priority: Minor
>







[jira] [Updated] (YARN-8476) Should check the resource of assignment is greater than Resources.none() before submitResourceCommitRequest

2018-06-28 Thread YunFan Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YunFan Zhou updated YARN-8476:
--
Priority: Minor  (was: Blocker)

> Should check the resource of assignment is greater than Resources.none() 
> before submitResourceCommitRequest
> ---
>
> Key: YARN-8476
> URL: https://issues.apache.org/jira/browse/YARN-8476
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Reporter: YunFan Zhou
>Assignee: YunFan Zhou
>Priority: Minor
>
> Hi, [~leftnoteasy]
> We recently merged https://issues.apache.org/jira/browse/YARN-5139 into our 
> version and found some bugs.
> Below is the most serious one I've encountered:
>  
> {code:java}
>   LeafQueue queue = ((LeafQueue) reservedApplication.getQueue());
>   assignment = queue.assignContainers(getClusterResource(), candidates,
>   // TODO, now we only consider limits for parent for non-labeled
>   // resources, should consider labeled resources as well.
>   new ResourceLimits(labelManager
>   .getResourceByLabel(RMNodeLabelsManager.NO_LABEL,
>   getClusterResource())),
>   SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY);
>   if (assignment.isFulfilledReservation()) {
> if (withNodeHeartbeat) {
>   // Only update SchedulerHealth in sync scheduling, existing
>   // Data structure of SchedulerHealth need to be updated for
>   // Async mode
>   updateSchedulerHealth(lastNodeUpdateTime, node.getNodeID(),
>   assignment);
> }
> schedulerHealth.updateSchedulerFulfilledReservationCounts(1);
> ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
> queue.getParent().getQueueName(), queue.getQueueName(),
> ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
> ActivitiesLogger.NODE.finishAllocatedNodeAllocation(activitiesManager,
> node, reservedContainer.getContainerId(),
> AllocationState.ALLOCATED_FROM_RESERVED);
>   } else{
> ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
> queue.getParent().getQueueName(), queue.getQueueName(),
> ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
> ActivitiesLogger.NODE.finishAllocatedNodeAllocation(activitiesManager,
> node, reservedContainer.getContainerId(), AllocationState.SKIPPED);
>   }
>   assignment.setSchedulingMode(
>   SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY);
>   submitResourceCommitRequest(getClusterResource(), assignment);
> }
> {code}
>  
> Before we submit an assignment to the *resourceCommitterService*, we must 
> check that the assignment is greater than *Resources.none()*.
> The assignment can be *CSAssignment(Resources.createResource(0, 0), 
> NodeType.NODE_LOCAL)* after calling *getRootQueue().assignContainers*, which 
> is a meaningless value.
>  
> But we still submit it to the *resourceCommitterService*, and the resulting 
> pile of meaningless assignments blocks other meaningful event processing.
>  
> I think this is a very serious bug! Any suggestions?






[jira] [Updated] (YARN-8476) Should check the resource of assignment is greater than Resources.none() before submitResourceCommitRequest

2018-06-28 Thread YunFan Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YunFan Zhou updated YARN-8476:
--
Description: 
Hi, [~leftnoteasy]

We recently merged https://issues.apache.org/jira/browse/YARN-5139 into our 
version and found some bugs.

Below is the most serious one I've encountered:

 
{code:java}
  LeafQueue queue = ((LeafQueue) reservedApplication.getQueue());
  assignment = queue.assignContainers(getClusterResource(), candidates,
  // TODO, now we only consider limits for parent for non-labeled
  // resources, should consider labeled resources as well.
  new ResourceLimits(labelManager
  .getResourceByLabel(RMNodeLabelsManager.NO_LABEL,
  getClusterResource())),
  SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY);

  if (assignment.isFulfilledReservation()) {
if (withNodeHeartbeat) {
  // Only update SchedulerHealth in sync scheduling, existing
  // Data structure of SchedulerHealth need to be updated for
  // Async mode
  updateSchedulerHealth(lastNodeUpdateTime, node.getNodeID(),
  assignment);
}

schedulerHealth.updateSchedulerFulfilledReservationCounts(1);

ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
queue.getParent().getQueueName(), queue.getQueueName(),
ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
ActivitiesLogger.NODE.finishAllocatedNodeAllocation(activitiesManager,
node, reservedContainer.getContainerId(),
AllocationState.ALLOCATED_FROM_RESERVED);
  } else{
ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
queue.getParent().getQueueName(), queue.getQueueName(),
ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
ActivitiesLogger.NODE.finishAllocatedNodeAllocation(activitiesManager,
node, reservedContainer.getContainerId(), AllocationState.SKIPPED);
  }

  assignment.setSchedulingMode(
  SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY);
  submitResourceCommitRequest(getClusterResource(), assignment);
}
{code}
 

Before we submit an assignment to the *resourceCommitterService*, we must 
check that the assignment is greater than *Resources.none()*.

The assignment can be *CSAssignment(Resources.createResource(0, 0), 
NodeType.NODE_LOCAL)* after calling *getRootQueue().assignContainers*, which is 
a meaningless value.

But we still submit it to the *resourceCommitterService*, and the resulting 
pile of meaningless assignments blocks other meaningful event processing.

I think this is a very serious bug! Any suggestions?
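
A minimal sketch of the guard being proposed (illustrative only -- the helper name is 
hypothetical and this is not a committed fix; in CapacityScheduler the check would sit 
right before the submitResourceCommitRequest(getClusterResource(), assignment) call shown 
above, e.g. via Resources.greaterThan against Resources.none()):

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

// Illustrative helper only.
final class AssignmentGuard {
  // True only when the proposed assignment actually carries resources, i.e. it
  // is strictly greater than Resources.none() (0 memory, 0 vcores).
  static boolean worthCommitting(Resource assigned) {
    return assigned != null
        && (assigned.getMemorySize() > 0 || assigned.getVirtualCores() > 0);
  }
}
{code}

With such a check in place, a CSAssignment built from Resources.createResource(0, 0) would 
simply be dropped instead of being queued on the resourceCommitterService.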

> Should check the resource of assignment is greater than Resources.none() 
> before submitResourceCommitRequest
> ---
>
> Key: YARN-8476
> URL: https://issues.apache.org/jira/browse/YARN-8476
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Reporter: YunFan Zhou
>Assignee: YunFan Zhou
>Priority: Blocker
>
> Hi, [~leftnoteasy]
> We recently merged https://issues.apache.org/jira/browse/YARN-5139 into our 
> version and found some bugs.
>  
> Below is the most serious one I've encountered:
>  
> {code:java}
>   LeafQueue queue = ((LeafQueue) reservedApplication.getQueue());
>   assignment = queue.assignContainers(getClusterResource(), candidates,
>   // TODO, now we only consider limits for parent for non-labeled
>   // resources, should consider labeled resources as well.
>   new ResourceLimits(labelManager
>   .getResourceByLabel(RMNodeLabelsManager.NO_LABEL,
>   getClusterResource())),
>   SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY);
>   if (assignment.isFulfilledReservation()) {
> if (withNodeHeartbeat) {
>   // Only update SchedulerHealth in sync scheduling, existing
>   // Data structure of SchedulerHealth need to be updated for
>   // Async mode
>   updateSchedulerHealth(lastNodeUpdateTime, node.getNodeID(),
>   assignment);
> }
> schedulerHealth.updateSchedulerFulfilledReservationCounts(1);
> ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
> queue.getParent().getQueueName(), queue.getQueueName(),
> ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
> ActivitiesLogger.NODE.finishAllocatedNodeAllocation(activitiesManager,
> node, reservedContainer.getContainerId(),
> AllocationState.ALLOCATED_FROM_RESERVED);
>   } else{
> ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
> queue.getParent().getQueueName(), queue.getQueueName(),
> ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
> 

[jira] [Updated] (YARN-8476) Should check the resource of assignment is greater than Resources.none() before submitResourceCommitRequest

2018-06-28 Thread YunFan Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YunFan Zhou updated YARN-8476:
--
Description: 
Hi, [~leftnoteasy]

We recently merged https://issues.apache.org/jira/browse/YARN-5139 into our 
version and found some bugs.

Below is the most serious one I've encountered:

 
{code:java}
  LeafQueue queue = ((LeafQueue) reservedApplication.getQueue());
  assignment = queue.assignContainers(getClusterResource(), candidates,
  // TODO, now we only consider limits for parent for non-labeled
  // resources, should consider labeled resources as well.
  new ResourceLimits(labelManager
  .getResourceByLabel(RMNodeLabelsManager.NO_LABEL,
  getClusterResource())),
  SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY);

  if (assignment.isFulfilledReservation()) {
if (withNodeHeartbeat) {
  // Only update SchedulerHealth in sync scheduling, existing
  // Data structure of SchedulerHealth need to be updated for
  // Async mode
  updateSchedulerHealth(lastNodeUpdateTime, node.getNodeID(),
  assignment);
}

schedulerHealth.updateSchedulerFulfilledReservationCounts(1);

ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
queue.getParent().getQueueName(), queue.getQueueName(),
ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
ActivitiesLogger.NODE.finishAllocatedNodeAllocation(activitiesManager,
node, reservedContainer.getContainerId(),
AllocationState.ALLOCATED_FROM_RESERVED);
  } else{
ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
queue.getParent().getQueueName(), queue.getQueueName(),
ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
ActivitiesLogger.NODE.finishAllocatedNodeAllocation(activitiesManager,
node, reservedContainer.getContainerId(), AllocationState.SKIPPED);
  }

  assignment.setSchedulingMode(
  SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY);
  submitResourceCommitRequest(getClusterResource(), assignment);
}
{code}
 

Before we submit an assignment to the *resourceCommitterService*, we must 
check that the assignment is greater than *Resources.none()*.

The assignment can be *CSAssignment(Resources.createResource(0, 0), 
NodeType.NODE_LOCAL)* after calling *getRootQueue().assignContainers*, which is 
a meaningless value.

But we still submit it to the *resourceCommitterService*, and the resulting 
pile of meaningless assignments blocks other meaningful event processing.

I think this is a very serious bug! Any suggestions?

  was:
Hi, [~leftnoteasy]

We recently merged https://issues.apache.org/jira/browse/YARN-5139 into our 
version and found some bugs.

Below is the most serious one I've encountered:

 
{code:java}
  LeafQueue queue = ((LeafQueue) reservedApplication.getQueue());
  assignment = queue.assignContainers(getClusterResource(), candidates,
  // TODO, now we only consider limits for parent for non-labeled
  // resources, should consider labeled resources as well.
  new ResourceLimits(labelManager
  .getResourceByLabel(RMNodeLabelsManager.NO_LABEL,
  getClusterResource())),
  SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY);

  if (assignment.isFulfilledReservation()) {
if (withNodeHeartbeat) {
  // Only update SchedulerHealth in sync scheduling, existing
  // Data structure of SchedulerHealth need to be updated for
  // Async mode
  updateSchedulerHealth(lastNodeUpdateTime, node.getNodeID(),
  assignment);
}

schedulerHealth.updateSchedulerFulfilledReservationCounts(1);

ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
queue.getParent().getQueueName(), queue.getQueueName(),
ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
ActivitiesLogger.NODE.finishAllocatedNodeAllocation(activitiesManager,
node, reservedContainer.getContainerId(),
AllocationState.ALLOCATED_FROM_RESERVED);
  } else{
ActivitiesLogger.QUEUE.recordQueueActivity(activitiesManager, node,
queue.getParent().getQueueName(), queue.getQueueName(),
ActivityState.ACCEPTED, ActivityDiagnosticConstant.EMPTY);
ActivitiesLogger.NODE.finishAllocatedNodeAllocation(activitiesManager,
node, reservedContainer.getContainerId(), AllocationState.SKIPPED);
  }

  assignment.setSchedulingMode(
  SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY);
  submitResourceCommitRequest(getClusterResource(), assignment);
}
{code}
 

Before we submit an assignment to the *resourceCommitterService*, we must 
check that the assignment is greater than *Resources.none()*.

The assignment can be *CSAssignment(Resources.createResource(0, 0), 
NodeType.NODE_LOCAL)* after calling *getRootQueue().assignContainers*, which is 
a meaningless value.

 


[jira] [Updated] (YARN-8455) Add basic acl check for all TS v2 REST APIs

2018-06-28 Thread Rohith Sharma K S (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-8455:

Attachment: YARN-8455.004.patch

> Add basic acl check for all TS v2 REST APIs
> ---
>
> Key: YARN-8455
> URL: https://issues.apache.org/jira/browse/YARN-8455
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8455.001.patch, YARN-8455.002.patch, 
> YARN-8455.003.patch, YARN-8455.004.patch
>
>
> YARN-8319 added a filter check for the flows pages. The same behavior needs to 
> be added for all other REST APIs as long as ATS provides support for ACLs.
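
A hypothetical sketch of the kind of per-endpoint guard being described (the class, field, 
and method names here are illustrative only, not the actual timeline reader code or the 
attached patches):

{code:java}
import java.util.Set;
import javax.servlet.http.HttpServletRequest;
import javax.ws.rs.ForbiddenException;

// Hypothetical sketch only: every TS v2 REST handler would run the same ACL
// check before building a reader query.
final class TimelineAclCheckSketch {
  private final Set<String> allowedReaders;   // e.g. derived from an admin/reader ACL

  TimelineAclCheckSketch(Set<String> allowedReaders) {
    this.allowedReaders = allowedReaders;
  }

  // Reject callers that are neither the owner of the requested entities nor
  // whitelisted readers.
  void checkAccess(HttpServletRequest req, String entityOwner) {
    String caller = req.getRemoteUser();
    if (caller == null
        || (!caller.equals(entityOwner) && !allowedReaders.contains(caller))) {
      throw new ForbiddenException("User " + caller
          + " is not allowed to read timeline entities of " + entityOwner);
    }
  }
}
{code}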






[jira] [Created] (YARN-8476) Should check the resource of assignment is greater than Resources.none() before submitResourceCommitRequest

2018-06-28 Thread YunFan Zhou (JIRA)
YunFan Zhou created YARN-8476:
-

 Summary: Should check the resource of assignment is greater than 
Resources.none() before submitResourceCommitRequest
 Key: YARN-8476
 URL: https://issues.apache.org/jira/browse/YARN-8476
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, capacityscheduler
Reporter: YunFan Zhou
Assignee: YunFan Zhou









[jira] [Created] (YARN-8475) Should check the resource of assignment is greater than Resources.none() before submitResourceCommitRequest

2018-06-28 Thread JackZhou (JIRA)
JackZhou created YARN-8475:
--

 Summary: Should check the resource of assignment is greater than 
Resources.none() before submitResourceCommitRequest
 Key: YARN-8475
 URL: https://issues.apache.org/jira/browse/YARN-8475
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: JackZhou









[jira] [Commented] (YARN-8474) sleeper service fails to launch with "Authentication Required"

2018-06-28 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527148#comment-16527148
 ] 

genericqa commented on YARN-8474:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 43s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
50s{color} | {color:green} hadoop-yarn-services-api in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 61m 47s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8474 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929658/YARN-8474.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux bce47b9a478d 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e4d7227 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21143/testReport/ |
| Max. process+thread count | 539 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21143/console |
| Powered by | 

[jira] [Comment Edited] (YARN-8471) YARN RM hangs and stops allocating resources when applications successively running

2018-06-28 Thread tianjuan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527118#comment-16527118
 ] 

tianjuan edited comment on YARN-8471 at 6/29/18 3:17 AM:
-

yes, it's related to YARN-8193.   this is an applicable patch for 2.9.0


was (Author: jutia):
yes, it's related to YARN-8193.   this is a applicable patch for 2.9.0

> YARN RM hangs and stops allocating resources when applications successively 
> running
> ---
>
> Key: YARN-8471
> URL: https://issues.apache.org/jira/browse/YARN-8471
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0
>Reporter: tianjuan
>Assignee: tianjuan
>Priority: Major
> Fix For: 2.9.0
>
> Attachments: YARN-8471.001.patch
>
>
> At some point the RM just hangs and stops allocating resources. At the point 
> the RM hangs, YARN throws a NullPointerException in 
> RegularContainerAllocator#allocate, 
> RegularContainerAllocator#preCheckForPlacementSet, and 
> RegularContainerAllocator#getLocalityWaitFactor.






[jira] [Commented] (YARN-8471) YARN RM hangs and stops allocating resources when applications successively running

2018-06-28 Thread tianjuan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527118#comment-16527118
 ] 

tianjuan commented on YARN-8471:


yes, it's related to YARN-8193.   this is a applicable patch for 2.9.0

> YARN RM hangs and stops allocating resources when applications successively 
> running
> ---
>
> Key: YARN-8471
> URL: https://issues.apache.org/jira/browse/YARN-8471
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0
>Reporter: tianjuan
>Assignee: tianjuan
>Priority: Major
> Fix For: 2.9.0
>
> Attachments: YARN-8471.001.patch
>
>
> At some point the RM just hangs and stops allocating resources. At the point 
> the RM hangs, YARN throws a NullPointerException in 
> RegularContainerAllocator#allocate, 
> RegularContainerAllocator#preCheckForPlacementSet, and 
> RegularContainerAllocator#getLocalityWaitFactor.






[jira] [Assigned] (YARN-8474) sleeper service fails to launch with "Authentication Required"

2018-06-28 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang reassigned YARN-8474:
---

 Assignee: Eric Yang
Affects Version/s: 3.1.0
 Target Version/s: 3.2.0, 3.1.1

- Fix Kerberos challenge from client side.

> sleeper service fails to launch with "Authentication Required"
> --
>
> Key: YARN-8474
> URL: https://issues.apache.org/jira/browse/YARN-8474
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Sumana Sathish
>Assignee: Eric Yang
>Priority: Critical
> Attachments: YARN-8474.001.patch
>
>
> Sleeper job fails with Authentication required.
> {code}
> yarn app -launch sl1 a/YarnServiceLogs/sleeper-orig.json
>  18/06/28 22:00:43 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /a/YarnServiceLogs/sleeper-orig.json
>  18/06/28 22:00:44 ERROR client.ApiServiceClient: Authentication required
> {code}






[jira] [Updated] (YARN-8474) sleeper service fails to launch with "Authentication Required"

2018-06-28 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8474:

Attachment: YARN-8474.001.patch

> sleeper service fails to launch with "Authentication Required"
> --
>
> Key: YARN-8474
> URL: https://issues.apache.org/jira/browse/YARN-8474
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Priority: Critical
> Attachments: YARN-8474.001.patch
>
>
> Sleeper job fails with Authentication required.
> {code}
> yarn app -launch sl1 a/YarnServiceLogs/sleeper-orig.json
>  18/06/28 22:00:43 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /a/YarnServiceLogs/sleeper-orig.json
>  18/06/28 22:00:44 ERROR client.ApiServiceClient: Authentication required
> {code}






[jira] [Commented] (YARN-8455) Add basic acl check for all TS v2 REST APIs

2018-06-28 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527047#comment-16527047
 ] 

genericqa commented on YARN-8455:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 32m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 19s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m 14s{color} 
| {color:red} hadoop-yarn-server-timelineservice in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m 54s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServices |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8455 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929653/YARN-8455.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 30216ddc3a5d 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e4d7227 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/21142/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21142/testReport/ |
| Max. process+thread count | 334 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice
 U: 

[jira] [Commented] (YARN-8455) Add basic acl check for all TS v2 REST APIs

2018-06-28 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527000#comment-16527000
 ] 

Rohith Sharma K S commented on YARN-8455:
-

While testing I found that one of the REST queries was not handled properly. I 
attached a patch fixing that API.

> Add basic acl check for all TS v2 REST APIs
> ---
>
> Key: YARN-8455
> URL: https://issues.apache.org/jira/browse/YARN-8455
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8455.001.patch, YARN-8455.002.patch, 
> YARN-8455.003.patch
>
>
> YARN-8319 added a filter check for the flows pages. The same behavior needs to 
> be added for all other REST APIs as long as ATS provides support for ACLs.






[jira] [Updated] (YARN-8455) Add basic acl check for all TS v2 REST APIs

2018-06-28 Thread Rohith Sharma K S (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-8455:

Attachment: YARN-8455.003.patch

> Add basic acl check for all TS v2 REST APIs
> ---
>
> Key: YARN-8455
> URL: https://issues.apache.org/jira/browse/YARN-8455
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8455.001.patch, YARN-8455.002.patch, 
> YARN-8455.003.patch
>
>
> YARN-8319 added a filter check for the flows pages. The same behavior needs to 
> be added for all other REST APIs as long as ATS provides support for ACLs.






[jira] [Updated] (YARN-8474) sleeper service fails to launch with "Authentication Required"

2018-06-28 Thread Sumana Sathish (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumana Sathish updated YARN-8474:
-
Description: 
Sleeper job fails with Authentication required.

{code}
yarn app -launch sl1 a/YarnServiceLogs/sleeper-orig.json
 18/06/28 22:00:43 INFO client.ApiServiceClient: Loading service definition 
from local FS: /a/YarnServiceLogs/sleeper-orig.json
 18/06/28 22:00:44 ERROR client.ApiServiceClient: Authentication required
{code}

  was:
Sleeper job fails with Authentication required.

yarn app -launch sl1 a/YarnServiceLogs/sleeper-orig.json
18/06/28 22:00:43 INFO client.ApiServiceClient: Loading service definition from 
local FS: /a/YarnServiceLogs/sleeper-orig.json
18/06/28 22:00:44 ERROR client.ApiServiceClient: Authentication required


> sleeper service fails to launch with "Authentication Required"
> --
>
> Key: YARN-8474
> URL: https://issues.apache.org/jira/browse/YARN-8474
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Priority: Critical
>
> Sleeper job fails with Authentication required.
> {code}
> yarn app -launch sl1 a/YarnServiceLogs/sleeper-orig.json
>  18/06/28 22:00:43 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /a/YarnServiceLogs/sleeper-orig.json
>  18/06/28 22:00:44 ERROR client.ApiServiceClient: Authentication required
> {code}






[jira] [Created] (YARN-8474) sleeper service fails to launch with "Authentication Required"

2018-06-28 Thread Sumana Sathish (JIRA)
Sumana Sathish created YARN-8474:


 Summary: sleeper service fails to launch with "Authentication 
Required"
 Key: YARN-8474
 URL: https://issues.apache.org/jira/browse/YARN-8474
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: Sumana Sathish


Sleeper job fails with Authentication required.

yarn app -launch sl1 a/YarnServiceLogs/sleeper-orig.json
18/06/28 22:00:43 INFO client.ApiServiceClient: Loading service definition from 
local FS: /a/YarnServiceLogs/sleeper-orig.json
18/06/28 22:00:44 ERROR client.ApiServiceClient: Authentication required






[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes

2018-06-28 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526965#comment-16526965
 ] 

Sunil Govindan commented on YARN-7863:
--

Attaching the first version of this patch. I will add more test cases and other 
enhancements in the next patch.

> Modify placement constraints to support node attributes
> ---
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7863.v0.patch
>
>
> This Jira will track the work to *modify existing placement constraints to 
> support node attributes.*






[jira] [Updated] (YARN-7863) Modify placement constraints to support node attributes

2018-06-28 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-7863:
-
Attachment: YARN-7863.v0.patch

> Modify placement constraints to support node attributes
> ---
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7863.v0.patch
>
>
> This Jira will track the work to *modify existing placement constraints to 
> support node attributes.*






[jira] [Commented] (YARN-1013) CS should watch resource utilization of containers and allocate speculative containers if appropriate

2018-06-28 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526926#comment-16526926
 ] 

Haibo Chen commented on YARN-1013:
--

{quote} where is the enforcement flag?
{quote}
It is per ResourceRequest, included in the ExecutionTypeRequest of a 
ResourceRequest.  Essentially, a ResourceRequest can opt out of oversubscription 
by setting its enforcement flag to true.  (G, false) requests can start eagerly 
as O containers, but there is a possibility that the O containers can sometimes 
be preempted if the node is running hot. Applications can decide for themselves 
which tasks are critical enough that the risk of starting as O containers and 
being preempted is not acceptable.  YARN-8240 added control at the queue level; 
that is, if a queue opts out of oversubscription, applications running in the 
queue will never get Opportunistic containers for their (G, false) requests.
{quote}Does this consider resource usage for O containers, or does it only 
consider G container usage?
{quote}
The fair scheduler policy (SchedulingPolicy) is pluggable, so FairScheduler 
queues can be sorted with the queue's O resource usage in mind.
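
For reference, a minimal sketch of how an application would set that per-request flag 
(assuming the standard ExecutionTypeRequest/ResourceRequest records API; the priority and 
resource values are arbitrary):

{code:java}
import org.apache.hadoop.yarn.api.records.ExecutionType;
import org.apache.hadoop.yarn.api.records.ExecutionTypeRequest;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Sketch of the per-request opt-out described above.
// (GUARANTEED, enforce=true) asks the scheduler never to satisfy this request
// with an opportunistic/oversubscribed container; (GUARANTEED, enforce=false)
// leaves the scheduler free to start it early as an O container.
public class EnforcementFlagExample {
  public static ResourceRequest strictlyGuaranteed() {
    ResourceRequest req = ResourceRequest.newInstance(
        Priority.newInstance(0), ResourceRequest.ANY,
        Resource.newInstance(2048, 1), 1);
    req.setExecutionTypeRequest(
        ExecutionTypeRequest.newInstance(ExecutionType.GUARANTEED, true));
    return req;
  }
}
{code}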

> CS should watch resource utilization of containers and allocate speculative 
> containers if appropriate
> -
>
> Key: YARN-1013
> URL: https://issues.apache.org/jira/browse/YARN-1013
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Weiwei Yang
>Priority: Major
>
> CS should watch resource utilization of containers (provided by NM in 
> heartbeat) and allocate speculative containers (at lower OS priority) if 
> appropriate.






[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-06-28 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526924#comment-16526924
 ] 

Yufei Gu commented on YARN-8468:


Sounds good to me. Thanks [~mrbillau].

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
>  Labels: patch
> Attachments: YARN-8468.000.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited by queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per-queue basis.
>  
> The use case: a user has two pools, one for ad hoc jobs and one for enterprise 
> apps. The user wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. Setting 
> yarn.scheduler.maximum-allocation-mb would provide the default maximum 
> container size for all queues, while the per-queue maximum would be set with a 
> “maxContainerResources” queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.
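
A rough sketch of the getMaximumResourceCapability(String queueName) item above 
(illustrative only -- the stub interface and getMaxContainerAllocation() getter stand in 
for whatever the real patch adds to FSParentQueue/FSLeafQueue):

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Illustrative sketch of the per-queue lookup, not the attached patch.
final class PerQueueMaxAllocationSketch {
  interface FSQueueStub {
    Resource getMaxContainerAllocation();   // null means "no per-queue cap"
  }

  static Resource getMaximumResourceCapability(
      FSQueueStub queue, Resource schedulerMaxAllocation) {
    if (queue == null || queue.getMaxContainerAllocation() == null) {
      // Fall back to the scheduler-wide yarn.scheduler.maximum-allocation-* cap.
      return schedulerMaxAllocation;
    }
    // A per-queue cap should never exceed the scheduler-wide maximum.
    return Resources.componentwiseMin(
        queue.getMaxContainerAllocation(), schedulerMaxAllocation);
  }
}
{code}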






[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-06-28 Thread Mike Billau (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526900#comment-16526900
 ] 

Mike Billau commented on YARN-8468:
---

Hi [~yufeigu], [~bsteinbach], and team - sorry for the delay.

The motivation behind this case came from one of my customers who has a very 
large cluster and many different users. They are using FairScheduler and have 
many different rules set up. Overall they are using 
"yarn.scheduler.maximum-allocation-mb" to limit the size of containers that 
their users create - this is to gently encourage the users to write "better" 
jobs and not just request massive containers. This is working fine, except once 
in a while they actually DO need to create massive containers for enterprise 
jobs. Originally we were looking for ways to "exclude" these specific 
enterprise jobs from this maximum-allocation-mb, but since this property is set 
globally and applies to all queues, there was no way to do this. If we could 
set this property at a per-queue basis we could achieve this.

Additionally, it looks like you CAN already set this maximum-allocation-mb 
setting on a per-queue basis for the CapacityScheduler, so this ticket would 
bring the FairScheduler to feature parity. Under queue properties on the 
CapacityScheduler doc page, we read: 
"The per queue maximum limit of memory to allocate to each container request at 
the Resource Manager. This setting overrides the cluster configuration 
yarn.scheduler.maximum-allocation-mb. This value must be smaller than or equal 
to the cluster maximum."

Hopefully that is enough justification - please let me know if you guys need 
anything else! I don't have voting power but I agree that the naming scheme is 
not friendly to newcomers.

 

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
>  Labels: patch
> Attachments: YARN-8468.000.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited by queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per-queue basis.
>  
> The use case: a user has two pools, one for ad hoc jobs and one for enterprise 
> apps. The user wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. Setting 
> yarn.scheduler.maximum-allocation-mb would provide the default maximum 
> container size for all queues, while the per-queue maximum would be set with a 
> “maxContainerResources” queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.






[jira] [Commented] (YARN-8471) YARN RM hangs and stops allocating resources when applications successively running

2018-06-28 Thread Xiao Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526881#comment-16526881
 ] 

Xiao Liang commented on YARN-8471:
--

Thanks [~jutia], is it related to YARN-8193 ?

> YARN RM hangs and stops allocating resources when applications successively 
> running
> ---
>
> Key: YARN-8471
> URL: https://issues.apache.org/jira/browse/YARN-8471
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0
>Reporter: tianjuan
>Assignee: tianjuan
>Priority: Major
> Fix For: 2.9.0
>
> Attachments: YARN-8471.001.patch
>
>
> At some point the RM just hangs and stops allocating resources. At the point 
> the RM hangs, YARN throws a NullPointerException in 
> RegularContainerAllocator#allocate, 
> RegularContainerAllocator#preCheckForPlacementSet, and 
> RegularContainerAllocator#getLocalityWaitFactor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-06-28 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526869#comment-16526869
 ] 

Yufei Gu commented on YARN-8468:


[~bsteinbach] since you filed this jira and provided the patch, you have the 
responsibility to justify the motivation. However, I am OK with this feature.

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
>  Labels: patch
> Attachments: YARN-8468.000.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited by queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per-queue basis.
>  
> The use case: a user has two pools, one for ad hoc jobs and one for enterprise 
> apps. The user wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. Setting 
> yarn.scheduler.maximum-allocation-mb would provide the default maximum 
> container size for all queues, while the per-queue maximum would be set with a 
> “maxContainerResources” queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.






[jira] [Commented] (YARN-8473) Containers being launched as app tears down can leave containers in NEW state

2018-06-28 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526815#comment-16526815
 ] 

Jason Lowe commented on YARN-8473:
--

Sample error transitions from the NM log:
{noformat}
2018-06-21 22:10:08,433 [AsyncDispatcher event handler] WARN 
application.ApplicationImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
INIT_CONTAINER at FINISHING_CONTAINERS_WAIT
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:458)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:63)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1325)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1317)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
{noformat}
{noformat}
2018-06-21 22:10:09,020 [AsyncDispatcher event handler] WARN 
application.ApplicationImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
INIT_CONTAINER at FINISHED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:458)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:63)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1325)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1317)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
{noformat}


> Containers being launched as app tears down can leave containers in NEW state
> -
>
> Key: YARN-8473
> URL: https://issues.apache.org/jira/browse/YARN-8473
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.4
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Major
>
> I saw a case where containers were stuck on a nodemanager in the NEW state 
> because they tried to launch just as an application was tearing down.  The 
> container sent an INIT_CONTAINER event to the ApplicationImpl which then 
> executed an invalid transition since that event is not handled/expected when 
> the application is in the process of tearing down.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8473) Containers being launched as app tears down can leave containers in NEW state

2018-06-28 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-8473:


 Summary: Containers being launched as app tears down can leave 
containers in NEW state
 Key: YARN-8473
 URL: https://issues.apache.org/jira/browse/YARN-8473
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.8.4
Reporter: Jason Lowe
Assignee: Jason Lowe


I saw a case where containers were stuck on a nodemanager in the NEW state 
because they tried to launch just as an application was tearing down.  The 
container sent an INIT_CONTAINER event to the ApplicationImpl which then 
executed an invalid transition since that event is not handled/expected when 
the application is in the process of tearing down.
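
A self-contained toy sketch of the gap, not the actual ApplicationImpl state 
machine: it only illustrates the kind of transition that appears to be missing 
(an explicit reaction to INIT_CONTAINER while the application is tearing down), 
and the cleanup behavior shown is a hypothetical choice rather than the fix 
committed for this issue.

{code:java}
import java.util.EnumSet;

// Toy model only. The point: a container init event that arrives after the
// application has started tearing down should get an explicit, harmless
// handling path instead of surfacing an InvalidStateTransitionException and
// leaving the container stuck in NEW.
class LateContainerSketch {
  enum AppState { RUNNING, FINISHING_CONTAINERS_WAIT, FINISHED }
  enum AppEvent { INIT_CONTAINER, FINISH_APPLICATION }

  private static final EnumSet<AppState> TEARING_DOWN =
      EnumSet.of(AppState.FINISHING_CONTAINERS_WAIT, AppState.FINISHED);

  private AppState state = AppState.RUNNING;

  void handle(AppEvent event) {
    if (event == AppEvent.INIT_CONTAINER && TEARING_DOWN.contains(state)) {
      // Hypothetical handling: clean up / reject the late container so it
      // does not linger in NEW while the application finishes.
      System.out.println("App is " + state + "; cleaning up late container");
      return;
    }
    switch (event) {
      case INIT_CONTAINER -> System.out.println("Launching container");
      case FINISH_APPLICATION -> state = AppState.FINISHING_CONTAINERS_WAIT;
    }
  }

  public static void main(String[] args) {
    LateContainerSketch app = new LateContainerSketch();
    app.handle(AppEvent.FINISH_APPLICATION); // application starts tearing down
    app.handle(AppEvent.INIT_CONTAINER);     // a container launch races with it
  }
}
{code}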




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8451) Multiple NM heartbeat thread created when a slow NM resync with RM

2018-06-28 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526745#comment-16526745
 ] 

genericqa commented on YARN-8451:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 12s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 
58s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 77m 27s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8451 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929613/YARN-8451.v2.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux b2897a23a46d 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2911943 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21141/testReport/ |
| Max. process+thread count | 302 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21141/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Multiple NM heartbeat thread created when a slow NM resync 

[jira] [Commented] (YARN-8451) Multiple NM heartbeat thread created when a slow NM resync with RM

2018-06-28 Thread Botong Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526655#comment-16526655
 ] 

Botong Huang commented on YARN-8451:


Good point, fixed in v2 patch!

> Multiple NM heartbeat thread created when a slow NM resync with RM
> --
>
> Key: YARN-8451
> URL: https://issues.apache.org/jira/browse/YARN-8451
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8451.v1.patch, YARN-8451.v2.patch
>
>
> During an NM resync with the RM (say the RM did a master-slave switch), if the 
> NM is running slow, more than one RESYNC event may be put into the NM 
> dispatcher by the existing heartbeat thread before they are processed. As a 
> result, multiple new heartbeat threads are later created and start to heartbeat 
> to the RM concurrently, each with its own responseId. If at some point one 
> thread falls more than one step behind the others, the RM will send back a 
> resync signal in that heartbeat response, killing all containers on this NM. 
> See comments below for details on how this can happen. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8451) Multiple NM heartbeat thread created when a slow NM resync with RM

2018-06-28 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8451:
---
Attachment: YARN-8451.v2.patch

> Multiple NM heartbeat thread created when a slow NM resync with RM
> --
>
> Key: YARN-8451
> URL: https://issues.apache.org/jira/browse/YARN-8451
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8451.v1.patch, YARN-8451.v2.patch
>
>
> During an NM resync with the RM (say the RM did a master-slave switch), if the 
> NM is running slow, more than one RESYNC event may be put into the NM 
> dispatcher by the existing heartbeat thread before they are processed. As a 
> result, multiple new heartbeat threads are later created and start to heartbeat 
> to the RM concurrently, each with its own responseId. If at some point one 
> thread falls more than one step behind the others, the RM will send back a 
> resync signal in that heartbeat response, killing all containers on this NM. 
> See comments below for details on how this can happen. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8451) Multiple NM heartbeat thread created when a slow NM resync with RM

2018-06-28 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526650#comment-16526650
 ] 

Jason Lowe commented on YARN-8451:
--

Ah, sorry, I missed that there was a thread earlier as well.  Should have used 
{{diff -b}} after applying the patch. ;-)

Since the AtomicBoolean is referred to as a lock, I'd like to see it treated as 
such where the release of it is in a {{finally}} block so it's always released 
even if exceptions occur.  Otherwise looks good.
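
For illustration, a minimal self-contained sketch of that pattern, assuming the 
lock is an AtomicBoolean guarding the resync work (the field and method names 
below are made up for the sketch, not taken from the patch):

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

class ResyncGuardSketch {
  // true while a resync/re-registration is already in progress
  private final AtomicBoolean resyncLock = new AtomicBoolean(false);

  void onResyncEvent() {
    // Acquire: only the first RESYNC event wins; duplicates are dropped, so
    // at most one new heartbeat thread can ever be spawned.
    if (!resyncLock.compareAndSet(false, true)) {
      return;
    }
    new Thread(() -> {
      try {
        reRegisterWithRM();   // may block for a long time or throw
      } finally {
        // Release in finally so the "lock" is freed even if an exception
        // is thrown, as requested above.
        resyncLock.set(false);
      }
    }, "nm-resync").start();
  }

  private void reRegisterWithRM() {
    // placeholder for draining the old heartbeat thread and re-registering
  }
}
{code}
The compareAndSet acquisition is also what keeps duplicate RESYNC events from 
spawning more than one heartbeat thread, which is the failure mode described in 
this issue.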


> Multiple NM heartbeat thread created when a slow NM resync with RM
> --
>
> Key: YARN-8451
> URL: https://issues.apache.org/jira/browse/YARN-8451
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8451.v1.patch
>
>
> During an NM resync with the RM (say the RM did a master-slave switch), if the 
> NM is running slow, more than one RESYNC event may be put into the NM 
> dispatcher by the existing heartbeat thread before they are processed. As a 
> result, multiple new heartbeat threads are later created and start to heartbeat 
> to the RM concurrently, each with its own responseId. If at some point one 
> thread falls more than one step behind the others, the RM will send back a 
> resync signal in that heartbeat response, killing all containers on this NM. 
> See comments below for details on how this can happen. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8451) Multiple NM heartbeat thread created when a slow NM resync with RM

2018-06-28 Thread Botong Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526640#comment-16526640
 ] 

Botong Huang commented on YARN-8451:


Hi [~jlowe], I am actually not changing this behavior (not blocking the 
dispatcher for resync); the existing code already creates a new thread for it. I 
think the reason is that resync involves draining the existing heartbeat thread 
and a register call to the RM, which can take a long time (say the network is 
slow or the RM is down during a master-slave switch). We don't want to block the 
entire NM for this. It may be much more involved if we want to change this 
behavior. 

> Multiple NM heartbeat thread created when a slow NM resync with RM
> --
>
> Key: YARN-8451
> URL: https://issues.apache.org/jira/browse/YARN-8451
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8451.v1.patch
>
>
> During an NM resync with the RM (say the RM did a master-slave switch), if the 
> NM is running slow, more than one RESYNC event may be put into the NM 
> dispatcher by the existing heartbeat thread before they are processed. As a 
> result, multiple new heartbeat threads are later created and start to heartbeat 
> to the RM concurrently, each with its own responseId. If at some point one 
> thread falls more than one step behind the others, the RM will send back a 
> resync signal in that heartbeat response, killing all containers on this NM. 
> See comments below for details on how this can happen. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types

2018-06-28 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526621#comment-16526621
 ] 

Wangda Tan commented on YARN-8453:
--

+1 to the patch, thanks [~sunilg].

> Additional Unit  tests to verify queue limit and max-limit with multiple 
> resource types
> ---
>
> Key: YARN-8453
> URL: https://issues.apache.org/jira/browse/YARN-8453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.2
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-8453.001.patch
>
>
> After adding support for resource types other than CPU and memory, it is 
> possible that one such new resource has exhausted its quota on a given queue 
> while other resources such as memory/CPU are still available beyond the 
> guaranteed limit (under the max-limit). Adding more unit tests to ensure we are 
> not starving such allocation requests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8379) Improve balancing resources in already satisfied queues by using Capacity Scheduler preemption

2018-06-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526603#comment-16526603
 ] 

Hudson commented on YARN-8379:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14497 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14497/])
YARN-8379. Improve balancing resources in already satisfied queues by (sunilg: 
rev 291194302cc1a875d6d94ea93cf1184a3f1fc2cc)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/PreemptableResourceCalculator.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/CapacitySchedulerPreemptionContext.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/CapacitySchedulerPreemptionUtils.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/FifoCandidatesSelector.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/AbstractPreemptableResourceCalculator.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/QueuePriorityContainerCandidateSelector.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestPreemptionForQueueWithPriorities.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TempQueuePerPartition.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/IntraQueueCandidatesSelector.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyPreemptToBalance.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/PreemptionCandidatesSelector.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ReservedContainerCandidatesSelector.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerSurgicalPreemption.java


> Improve balancing resources in already satisfied queues by using Capacity 
> Scheduler preemption
> --
>
> Key: YARN-8379
> URL: https://issues.apache.org/jira/browse/YARN-8379
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Zian Chen
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8379.001.patch, YARN-8379.002.patch, 
> YARN-8379.003.patch, YARN-8379.004.patch, YARN-8379.005.patch, 
> YARN-8379.006.patch, ericpayne.confs.tgz
>
>
> The existing capacity scheduler only supports preemption for an underutilized 
> queue to reach its guaranteed resource. In addition to that, there is a 
> requirement to get a better balance between queues when all of them have 
> reached their guaranteed resources but with different degrees of fairness.
> An example: 3 queues with capacities queue_a = 30%, queue_b = 30%, queue_c 
> = 40%. At time T, queue_a is using 30% and queue_b is using 70%. Existing 
> scheduler preemption won't happen. But this is unfair to queue_a since 
> queue_a has the same guaranteed resources.
> Before YARN-5864, the capacity scheduler did additional preemption to balance 
> queues. We changed the logic since it could preempt too many containers 
> between queues when all queues are satisfied.

[jira] [Updated] (YARN-8472) YARN Container Phase 2

2018-06-28 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8472:

Description: 
In YARN-3611, we have implemented basic Docker container support for YARN.  
This story is the next phase to improve container usability.

Several areas for improvement are:
 # Software defined network support
 # Interactive shell to container
 # User management sss/nscd integration
 # Runc/containerd support
 # Metrics/Logs integration with Timeline service v2

  was:
In YARN-3611, we have implemented basic Docker container support for YARN.  
This story is the next phase to improve container usability.

Several area for improvements are:
 # Software defined network support
 # Interactive shell to container
 # User management sss/nscd integration
 # Runc/containerd support


> YARN Container Phase 2
> --
>
> Key: YARN-8472
> URL: https://issues.apache.org/jira/browse/YARN-8472
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
>
> In YARN-3611, we have implemented basic Docker container support for YARN.  
> This story is the next phase to improve container usability.
> Several areas for improvement are:
>  # Software defined network support
>  # Interactive shell to container
>  # User management sss/nscd integration
>  # Runc/containerd support
>  # Metrics/Logs integration with Timeline service v2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8472) YARN Container Phase 2

2018-06-28 Thread Eric Yang (JIRA)
Eric Yang created YARN-8472:
---

 Summary: YARN Container Phase 2
 Key: YARN-8472
 URL: https://issues.apache.org/jira/browse/YARN-8472
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Eric Yang


In YARN-3611, we have implemented basic Docker container support for YARN.  
This story is the next phase to improve container usability.

Several areas for improvement are:
 # Software defined network support
 # Interactive shell to container
 # User management sss/nscd integration
 # Runc/containerd support



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8455) Add basic acl check for all TS v2 REST APIs

2018-06-28 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526586#comment-16526586
 ] 

Sunil Govindan commented on YARN-8455:
--

Latest patch looks good to me. +1

Committing shortly.

> Add basic acl check for all TS v2 REST APIs
> ---
>
> Key: YARN-8455
> URL: https://issues.apache.org/jira/browse/YARN-8455
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8455.001.patch, YARN-8455.002.patch
>
>
> YARN-8319 added a filter check for the flows pages. The same behavior needs to 
> be added for all other REST APIs as long as ATS provides support for ACLs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8379) Improve balancing resources in already satisfied queues by using Capacity Scheduler preemption

2018-06-28 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-8379:
-
Summary: Improve balancing resources in already satisfied queues by using 
Capacity Scheduler preemption  (was: Add an option to allow Capacity Scheduler 
preemption to balance satisfied queues)

> Improve balancing resources in already satisfied queues by using Capacity 
> Scheduler preemption
> --
>
> Key: YARN-8379
> URL: https://issues.apache.org/jira/browse/YARN-8379
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8379.001.patch, YARN-8379.002.patch, 
> YARN-8379.003.patch, YARN-8379.004.patch, YARN-8379.005.patch, 
> YARN-8379.006.patch, ericpayne.confs.tgz
>
>
> The existing capacity scheduler only supports preemption for an underutilized 
> queue to reach its guaranteed resource. In addition to that, there is a 
> requirement to get a better balance between queues when all of them have 
> reached their guaranteed resources but with different degrees of fairness.
> An example: 3 queues with capacities queue_a = 30%, queue_b = 30%, queue_c 
> = 40%. At time T, queue_a is using 30% and queue_b is using 70%. Existing 
> scheduler preemption won't happen. But this is unfair to queue_a since 
> queue_a has the same guaranteed resources.
> Before YARN-5864, the capacity scheduler did additional preemption to balance 
> queues. We changed the logic since it could preempt too many containers 
> between queues when all queues are satisfied.
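>  
> For illustration, a small worked version of the balancing math implied by the 
> example above (a deliberate simplification of the real preemption calculation, 
> assuming queue_a has pending demand and that idle capacity is shared in 
> proportion to the active queues' guarantees):
>  
> {code:java}
> public class BalanceSketch {
>   public static void main(String[] args) {
>     // From the example: guarantees a=30, b=30, c=40 (percent); at time T,
>     // queue_a uses 30, queue_b uses 70, and queue_c is idle.
>     double guaranteeA = 30, guaranteeB = 30, idleCapacity = 40;
>     double activeGuarantees = guaranteeA + guaranteeB;
> 
>     // Share queue_c's idle 40% between the active queues in proportion to
>     // their (equal) guarantees: each ideally ends up with 50%.
>     double idealA = guaranteeA + idleCapacity * guaranteeA / activeGuarantees;
>     double idealB = guaranteeB + idleCapacity * guaranteeB / activeGuarantees;
> 
>     double usedB = 70;
>     double preemptFromB = Math.max(0, usedB - idealB); // 20% would move to queue_a
> 
>     System.out.printf("ideal_a=%.0f ideal_b=%.0f preempt_from_b=%.0f%n",
>         idealA, idealB, preemptFromB);
>   }
> }
> {code}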



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8409) ActiveStandbyElectorBasedElectorService is failing with NPE

2018-06-28 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526578#comment-16526578
 ] 

Chandni Singh commented on YARN-8409:
-

Thanks [~eyang] for reviewing and merging the patch.

> ActiveStandbyElectorBasedElectorService is failing with NPE
> ---
>
> Key: YARN-8409
> URL: https://issues.apache.org/jira/browse/YARN-8409
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8409.002.patch
>
>
> In an RM-HA env, kill the ZK leader and then perform an RM failover. 
> Sometimes the active RM gets an NPE and fails to come up successfully
> {code:java}
> 2018-06-08 10:31:03,007 INFO  client.ZooKeeperSaslClient 
> (ZooKeeperSaslClient.java:run(289)) - Client will use GSSAPI as SASL 
> mechanism.
> 2018-06-08 10:31:03,008 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server 
> xxx/xxx:2181. Will attempt to SASL-authenticate using Login Context section 
> 'Client'
> 2018-06-08 10:31:03,009 WARN  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1146)) - Session 0x0 for server null, unexpected error, 
> closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
> 2018-06-08 10:31:03,344 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService
>  failed in state INITED
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1033)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1087)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.createWithRetries(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:347)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.serviceInit(ActiveStandbyElectorBasedElectorService.java:110)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:336)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1479)
> 2018-06-08 10:31:03,345 INFO  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:quitElection(409)) - Yielding from election{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8378) Missing default implementation of loading application with FileSystemApplicationHistoryStore

2018-06-28 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526570#comment-16526570
 ] 

Sunil Govindan commented on YARN-8378:
--

Changes seem fine to me. [~rohithsharma], could you please help take a look?

> Missing default implementation of loading application with 
> FileSystemApplicationHistoryStore 
> -
>
> Key: YARN-8378
> URL: https://issues.apache.org/jira/browse/YARN-8378
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, yarn
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Minor
> Attachments: YARN-8378.1.patch
>
>
> [YARN-3700|https://issues.apache.org/jira/browse/YARN-3700] and 
> [YARN-3787|https://issues.apache.org/jira/browse/YARN-3787] added some 
> limits (number, time) when loading applications from the YARN timeline service. 
> But this API is missing a default implementation when 
> FileSystemApplicationHistoryStore is used for the application history service 
> instead of the timeline service.
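>  
> As an illustration of the missing piece, a minimal sketch of what a default 
> implementation could look like; the interface name, method names, and signature 
> below are hypothetical stand-ins, not the actual ApplicationHistoryStore API:
>  
> {code:java}
> import java.util.List;
> import java.util.stream.Collectors;
> 
> // Hypothetical store interface, for illustration only.
> interface HistoryStoreSketch {
>   List<String> getAllApplications();
> 
>   // Default implementation: a store that cannot apply the limit itself
>   // (e.g. a file-system based store) falls back to loading everything and
>   // truncating, so callers of the limited API still get an answer.
>   default List<String> getApplications(long appsNum) {
>     return getAllApplications().stream()
>         .limit(appsNum)
>         .collect(Collectors.toList());
>   }
> }
> {code}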



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8409) ActiveStandbyElectorBasedElectorService is failing with NPE

2018-06-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526561#comment-16526561
 ] 

Hudson commented on YARN-8409:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14496 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14496/])
YARN-8409.  Fixed NPE in ActiveStandbyElectorBasedElectorService.
(eyang: rev 384764cdeac6490bc47fa0eb7b936baa4c0d3230)
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java


> ActiveStandbyElectorBasedElectorService is failing with NPE
> ---
>
> Key: YARN-8409
> URL: https://issues.apache.org/jira/browse/YARN-8409
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8409.002.patch
>
>
> In an RM-HA env, kill the ZK leader and then perform an RM failover. 
> Sometimes the active RM gets an NPE and fails to come up successfully
> {code:java}
> 2018-06-08 10:31:03,007 INFO  client.ZooKeeperSaslClient 
> (ZooKeeperSaslClient.java:run(289)) - Client will use GSSAPI as SASL 
> mechanism.
> 2018-06-08 10:31:03,008 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server 
> xxx/xxx:2181. Will attempt to SASL-authenticate using Login Context section 
> 'Client'
> 2018-06-08 10:31:03,009 WARN  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1146)) - Session 0x0 for server null, unexpected error, 
> closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
> 2018-06-08 10:31:03,344 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService
>  failed in state INITED
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1033)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1087)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.createWithRetries(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:347)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.serviceInit(ActiveStandbyElectorBasedElectorService.java:110)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:336)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1479)
> 2018-06-08 10:31:03,345 INFO  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:quitElection(409)) - Yielding from election{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-28 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang resolved YARN-8414.
-
Resolution: Cannot Reproduce

This has not happened in the last two weeks of stress testing.  Closing this as 
cannot reproduce.

> Nodemanager crashes soon if ATSv2 HBase is either down or absent
> 
>
> Key: YARN-8414
> URL: https://issues.apache.org/jira/browse/YARN-8414
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Priority: Critical
>
> The test cluster has 1000 apps running, and a user triggered capacity scheduler 
> queue changes.  This crashes all node managers.  It looks like the node 
> managers encounter too many open files while aggregating logs for containers:
> {code}
> 2018-06-07 21:17:59,307 WARN  server.AbstractConnector 
> (AbstractConnector.java:handleAcceptFailure(544)) -
> java.io.IOException: Too many open files
> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> at 
> org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:371)
> at 
> org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:601)
> at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-07 21:17:59,758 WARN  util.SysInfoLinux 
> (SysInfoLinux.java:readProcMemInfoFile(238)) - Couldn't read /proc/meminfo; 
> can't determine memory settings
> 2018-06-07 21:17:59,758 WARN  util.SysInfoLinux 
> (SysInfoLinux.java:readProcMemInfoFile(238)) - Couldn't read /proc/meminfo; 
> can't determine memory settings
> 2018-06-07 21:18:00,842 WARN  client.ConnectionUtils 
> (ConnectionUtils.java:getStubKey(236)) - Can not resolve host12.example.com, 
> please check your network
> java.net.UnknownHostException: host1.example.com: System error
> at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
> at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
> at 
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
> at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
> at java.net.InetAddress.getAllByName(InetAddress.java:1192)
> at java.net.InetAddress.getAllByName(InetAddress.java:1126)
> at java.net.InetAddress.getByName(InetAddress.java:1076)
> at 
> org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:233)
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.getClient(ConnectionImplementation.java:1189)
> at 
> org.apache.hadoop.hbase.client.ReversedScannerCallable.prepare(ReversedScannerCallable.java:111)
> at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:399)
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
> at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Timeline service has thousands of exceptions:
> {code}
> 2018-06-07 21:18:34,182 ERROR client.AsyncProcess 
> (AsyncProcess.java:submit(291)) - Failed to get region location
> java.io.InterruptedIOException
> at 
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:265)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597)
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:834)
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732)
> at 
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
> at 
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:236)
> at 
> 

[jira] [Commented] (YARN-7690) Expose reserved Memory/Vcores of Node Manager at WebUI

2018-06-28 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526550#comment-16526550
 ] 

genericqa commented on YARN-7690:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} YARN-7690 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-7690 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12903982/YARN-7690.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21140/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Expose reserved Memory/Vcores of Node Manager at WebUI
> --
>
> Key: YARN-7690
> URL: https://issues.apache.org/jira/browse/YARN-7690
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
>Reporter: tianjuan
>Assignee: tianjuan
>Priority: Major
> Attachments: YARN-7690.patch
>
>
> Currently only the total reserved memory/Vcores are exposed in the RM Web UI; 
> the reserved memory/Vcores of a single NodeManager are hard to find out. It 
> confuses users when they observe available memory/Vcores on the Nodes page 
> while their jobs are stuck waiting for resources to be allocated. For 
> debugging, it is helpful to expose the reserved memory/Vcores of every single 
> NodeManager, and the memory/Vcores that can still be allocated (unallocated 
> minus reserved).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7690) Expose reserved Memory/Vcores of Node Manager at WebUI

2018-06-28 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-7690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526505#comment-16526505
 ] 

Íñigo Goiri commented on YARN-7690:
---

Thanks [~jutia] for the patch.
Can you add unit tests for this?
A couple screenshots would also be helpful.

> Expose reserved Memory/Vcores of Node Manager at WebUI
> --
>
> Key: YARN-7690
> URL: https://issues.apache.org/jira/browse/YARN-7690
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
>Reporter: tianjuan
>Assignee: tianjuan
>Priority: Major
> Attachments: YARN-7690.patch
>
>
> Currently only the total reserved memory/Vcores are exposed in the RM Web UI; 
> the reserved memory/Vcores of a single NodeManager are hard to find out. It 
> confuses users when they observe available memory/Vcores on the Nodes page 
> while their jobs are stuck waiting for resources to be allocated. For 
> debugging, it is helpful to expose the reserved memory/Vcores of every single 
> NodeManager, and the memory/Vcores that can still be allocated (unallocated 
> minus reserved).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7690) Expose reserved Memory/Vcores of Node Manager at WebUI

2018-06-28 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-7690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-7690:
--
Description: Currently only the total reserved memory/Vcores are exposed in the RM Web 
UI; the reserved memory/Vcores of a single NodeManager are hard to find out. It 
confuses users when they observe available memory/Vcores on the Nodes page while 
their jobs are stuck waiting for resources to be allocated. For debugging, it is 
helpful to expose the reserved memory/Vcores of every single NodeManager, and 
the memory/Vcores that can still be allocated (unallocated minus 
reserved).  (was: now only total reserved memory/Vcores are exposed at RM 
webUI, reserved memory/Vcores of a single nodemanager is hard to find out. it 
confuses users that they obeserve that there are available memory/Vcores at 
nodes page, but their jobs are stuck and waiting for resouce to be allocated. 
It is helpful for bedug to expose reserved memory/Vcores of every single 
nodemanager, and memory/Vcores that can be allocated( unallocated minus 
reserved))

> Expose reserved Memory/Vcores of Node Manager at WebUI
> --
>
> Key: YARN-7690
> URL: https://issues.apache.org/jira/browse/YARN-7690
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
>Reporter: tianjuan
>Assignee: tianjuan
>Priority: Major
> Attachments: YARN-7690.patch
>
>
> Currently only the total reserved memory/Vcores are exposed in the RM Web UI; 
> the reserved memory/Vcores of a single NodeManager are hard to find out. It 
> confuses users when they observe available memory/Vcores on the Nodes page 
> while their jobs are stuck waiting for resources to be allocated. For 
> debugging, it is helpful to expose the reserved memory/Vcores of every single 
> NodeManager, and the memory/Vcores that can still be allocated (unallocated 
> minus reserved).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8471) YARN RM hangs and stops allocating resources when applications successively running

2018-06-28 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526499#comment-16526499
 ] 

Íñigo Goiri commented on YARN-8471:
---

Can you also post the stack traces and describe how this can happen?

> YARN RM hangs and stops allocating resources when applications successively 
> running
> ---
>
> Key: YARN-8471
> URL: https://issues.apache.org/jira/browse/YARN-8471
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0
>Reporter: tianjuan
>Assignee: tianjuan
>Priority: Major
> Fix For: 2.9.0
>
> Attachments: YARN-8471.001.patch
>
>
> At some point the RM just hangs and stops allocating resources. At the point 
> the RM hangs, YARN throws a NullPointerException in 
> RegularContainerAllocator#allocate, 
> RegularContainerAllocator#preCheckForPlacementSet, and 
> RegularContainerAllocator#getLocalityWaitFactor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7690) Expose reserved Memory/Vcores of Node Manager at WebUI

2018-06-28 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-7690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri reassigned YARN-7690:
-

Assignee: tianjuan

> Expose reserved Memory/Vcores of Node Manager at WebUI
> --
>
> Key: YARN-7690
> URL: https://issues.apache.org/jira/browse/YARN-7690
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
>Reporter: tianjuan
>Assignee: tianjuan
>Priority: Major
> Attachments: YARN-7690.patch
>
>
> now only total reserved memory/Vcores are exposed at the RM web UI; reserved 
> memory/Vcores of a single nodemanager are hard to find out. It confuses users 
> that they observe that there are available memory/Vcores at the nodes page, but 
> their jobs are stuck and waiting for resources to be allocated. It is helpful 
> for debugging to expose reserved memory/Vcores of every single nodemanager, and 
> memory/Vcores that can be allocated (unallocated minus reserved)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7690) Expose reserved Memory/Vcores of Node Manager at WebUI

2018-06-28 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-7690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-7690:
--
Summary: Expose reserved Memory/Vcores of Node Manager at WebUI  (was: 
expose reserved memory/Vcores of  nodemanager at webUI)

> Expose reserved Memory/Vcores of Node Manager at WebUI
> --
>
> Key: YARN-7690
> URL: https://issues.apache.org/jira/browse/YARN-7690
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
>Reporter: tianjuan
>Priority: Major
> Attachments: YARN-7690.patch
>
>
> now only total reserved memory/Vcores are exposed at the RM web UI; reserved 
> memory/Vcores of a single nodemanager are hard to find out. It confuses users 
> that they observe that there are available memory/Vcores at the nodes page, but 
> their jobs are stuck and waiting for resources to be allocated. It is helpful 
> for debugging to expose reserved memory/Vcores of every single nodemanager, and 
> memory/Vcores that can be allocated (unallocated minus reserved)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8409) ActiveStandbyElectorBasedElectorService is failing with NPE

2018-06-28 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526497#comment-16526497
 ] 

Eric Yang commented on YARN-8409:
-

[~csingh] Thank you for the patch.  The TestAppManager error seems to be Jenkins 
running out of resources to fork.  The error doesn't happen when I run the unit 
test locally.  The RM handles ZooKeeper unavailability more gracefully with this 
patch.  +1 on this patch; will commit shortly.

> ActiveStandbyElectorBasedElectorService is failing with NPE
> ---
>
> Key: YARN-8409
> URL: https://issues.apache.org/jira/browse/YARN-8409
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8409.002.patch
>
>
> In an RM-HA env, kill the ZK leader and then perform an RM failover. 
> Sometimes the active RM gets an NPE and fails to come up successfully
> {code:java}
> 2018-06-08 10:31:03,007 INFO  client.ZooKeeperSaslClient 
> (ZooKeeperSaslClient.java:run(289)) - Client will use GSSAPI as SASL 
> mechanism.
> 2018-06-08 10:31:03,008 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server 
> xxx/xxx:2181. Will attempt to SASL-authenticate using Login Context section 
> 'Client'
> 2018-06-08 10:31:03,009 WARN  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1146)) - Session 0x0 for server null, unexpected error, 
> closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
> 2018-06-08 10:31:03,344 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService
>  failed in state INITED
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1033)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1087)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.createWithRetries(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:347)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.serviceInit(ActiveStandbyElectorBasedElectorService.java:110)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:336)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1479)
> 2018-06-08 10:31:03,345 INFO  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:quitElection(409)) - Yielding from election{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8471) YARN RM hangs and stops allocating resources when applications successively running

2018-06-28 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri reassigned YARN-8471:
-

Assignee: tianjuan

> YARN RM hangs and stops allocating resources when applications successively 
> running
> ---
>
> Key: YARN-8471
> URL: https://issues.apache.org/jira/browse/YARN-8471
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0
>Reporter: tianjuan
>Assignee: tianjuan
>Priority: Major
> Fix For: 2.9.0
>
> Attachments: YARN-8471.001.patch
>
>
> At some point the RM just hangs and stops allocating resources. At the point 
> the RM hangs, YARN throws a NullPointerException in 
> RegularContainerAllocator#allocate, 
> RegularContainerAllocator#preCheckForPlacementSet, and 
> RegularContainerAllocator#getLocalityWaitFactor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8471) YARN RM hangs and stops allocating resources when applications successively running

2018-06-28 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526496#comment-16526496
 ] 

Íñigo Goiri commented on YARN-8471:
---

Thanks [~jutia] for the patch, does this apply to trunk too? If so you need to 
provide one for trunk and another one for branch-2 (or branch-2.9).
Can we add a unit test to reproduce this?

> YARN RM hangs and stops allocating resources when applications successively 
> running
> ---
>
> Key: YARN-8471
> URL: https://issues.apache.org/jira/browse/YARN-8471
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0
>Reporter: tianjuan
>Priority: Major
> Fix For: 2.9.0
>
> Attachments: YARN-8471.001.patch
>
>
> At some point the RM just hangs and stops allocating resources. At the point 
> the RM hangs, YARN throws a NullPointerException in 
> RegularContainerAllocator#allocate, 
> RegularContainerAllocator#preCheckForPlacementSet, and 
> RegularContainerAllocator#getLocalityWaitFactor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8471) YARN RM hangs and stops allocating resources when applications successively running

2018-06-28 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-8471:
--
Description: At some point RM just hangs and stops allocating resources. At 
the point RM get hangs, YARN throws NullPointerException at 
RegularContainerAllocator#allocate, and 
RegularContainerAllocator#preCheckForPlacementSet, and 
RegularContainerAllocator#getLocalityWaitFactor.  (was: at some point RM just 
hangs and stops allocating resources. At the point RM get hangs, YARN throw 
NullPointerException  at RegularContainerAllocator#allocate, and 
RegularContainerAllocator#preCheckForPlacementSet, and 
RegularContainerAllocator#getLocalityWaitFactor)

> YARN RM hangs and stops allocating resources when applications successively 
> running
> ---
>
> Key: YARN-8471
> URL: https://issues.apache.org/jira/browse/YARN-8471
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0
>Reporter: tianjuan
>Priority: Major
> Fix For: 2.9.0
>
> Attachments: YARN-8471.001.patch
>
>
> At some point the RM just hangs and stops allocating resources. At the point
> the RM hangs, YARN throws a NullPointerException at
> RegularContainerAllocator#allocate,
> RegularContainerAllocator#preCheckForPlacementSet, and
> RegularContainerAllocator#getLocalityWaitFactor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8103) Add CLI interface to query node attributes

2018-06-28 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526412#comment-16526412
 ] 

Bibin A Chundatt commented on YARN-8103:


Thank you [~Naganarasimha] for review and commit

> Add CLI interface to  query node attributes
> ---
>
> Key: YARN-8103
> URL: https://issues.apache.org/jira/browse/YARN-8103
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Fix For: YARN-3409
>
> Attachments: YARN-8103-YARN-3409.001.patch, 
> YARN-8103-YARN-3409.002.patch, YARN-8103-YARN-3409.003.patch, 
> YARN-8103-YARN-3409.004.patch, YARN-8103-YARN-3409.005.patch, 
> YARN-8103-YARN-3409.006.patch, YARN-8103-YARN-3409.WIP.patch
>
>
> YARN-8100 will add an API interface for querying the attributes. This ticket
> adds a CLI interface for querying the node attributes of each node and for
> listing all attributes in the cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8459) Improve logs of Capacity Scheduler to better debug invalid states

2018-06-28 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526393#comment-16526393
 ] 

Bibin A Chundatt commented on YARN-8459:


[~leftnoteasy]

Can we remove the following log in CapacityScheduler#allocate?

{code}
LOG.info("Allocation for application " + applicationAttemptId + " : " +
allocation + " with cluster resource : " + getClusterResource());
{code}
Observed in one cluster that this seems to be flooding the logs, since it is
printed even when the allocation is empty. Thoughts?
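
As a sketch of one way to cut the noise (illustrative only, not a proposed patch; it assumes Allocation#getContainers() is usable at this call site):

{code}
// Illustrative sketch: keep INFO for allocations that actually carry
// containers, and push the empty case down to DEBUG.
if (allocation.getContainers() != null && !allocation.getContainers().isEmpty()) {
  LOG.info("Allocation for application " + applicationAttemptId + " : "
      + allocation + " with cluster resource : " + getClusterResource());
} else if (LOG.isDebugEnabled()) {
  LOG.debug("Empty allocation for application " + applicationAttemptId);
}
{code}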

> Improve logs of Capacity Scheduler to better debug invalid states
> -
>
> Key: YARN-8459
> URL: https://issues.apache.org/jira/browse/YARN-8459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8459.001.patch, YARN-8459.002.patch, 
> YARN-8459.003.patch
>
>
> Improve logs in CS to better debug invalid states



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-06-28 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526350#comment-16526350
 ] 

Antal Bálint Steinbach commented on YARN-8468:
--

[~szegedim], "yarn.scheduler.maximum-allocation-mb" is the existing property for
the general container resource allocation setting, not a new one, but I agree the
name is a bit confusing.

[~yufeigu], can you please ask the details from [~mrbillau]?

[~haibochen], can you please review YARN-7556? This patch is created based on 
that.

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
>  Labels: patch
> Attachments: YARN-8468.000.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per queue basis.
>  
> The use case: User has two pools, one for ad hoc jobs and one for enterprise 
> apps. User wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. Setting
> yarn.scheduler.maximum-allocation-mb would provide the default maximum
> container size for all queues, while the per-queue maximum would be set with
> the “maxContainerResources” queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.
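
For illustration, a rough sketch of the clamping that a per-queue maximum would need against the scheduler-wide maximum; the class and method names below are assumptions, not the attached patch:

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Hypothetical helper: a queue-level maximum can only tighten the limit,
// never exceed the scheduler-wide maximum.
public final class QueueMaxAllocationSketch {
  private QueueMaxAllocationSketch() { }

  public static Resource effectiveMaxAllocation(Resource queueMax, Resource schedulerMax) {
    if (queueMax == null) {
      // Queue did not set maxContainerResources; fall back to the scheduler default.
      return schedulerMax;
    }
    // Component-wise minimum: memory and vcores are each capped separately.
    return Resources.componentwiseMin(queueMax, schedulerMax);
  }
}
{code}

getMaximumResourceCapability(String queueName) in the FairScheduler could then return this effective value for the queue, and getMaximumResourceCapability() in FSParentQueue/FSLeafQueue could delegate to it.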



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8451) Multiple NM heartbeat thread created when a slow NM resync with RM

2018-06-28 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526343#comment-16526343
 ] 

Jason Lowe commented on YARN-8451:
--

Thanks for the report and patch!

Are we sure it's safe to unblock the Nodemanager's async dispatcher during this 
reboot?  I'm worried that other events could be dispatched to subsystems while 
they are trying to reset and cause other problems.  I think it would be simpler 
and safer to have NodeManager#resyncWithRM check a "resyncing" boolean when 
it's called, avoiding redundantly resyncing if it is currently resyncing.  No 
need for separate threads and atomic booleans.
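
A minimal sketch of that guard, with placeholder class and field names rather than the actual NodeManager code:

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustration only: ignore a RESYNC while a resync is already in flight.
public class ResyncGuardSketch {
  private static final Logger LOG = LoggerFactory.getLogger(ResyncGuardSketch.class);

  // Set while a resync is in progress; duplicate RESYNC events are ignored.
  private volatile boolean resyncing = false;

  public void resyncWithRM(Runnable doResync) {
    if (resyncing) {
      LOG.info("Resync with RM already in progress, ignoring duplicate RESYNC event");
      return;
    }
    resyncing = true;
    try {
      doResync.run();   // the existing resync logic: re-register with the RM, etc.
    } finally {
      resyncing = false;
    }
  }
}
{code}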


> Multiple NM heartbeat thread created when a slow NM resync with RM
> --
>
> Key: YARN-8451
> URL: https://issues.apache.org/jira/browse/YARN-8451
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8451.v1.patch
>
>
> During a NM resync with RM (say RM did a master slave switch), if NM is 
> running slow, more than one RESYNC event may be put into the NM dispatcher by 
> the existing heartbeat thread before they are processed. As a result, 
> multiple new heartbeat threads are later created and start to heartbeat to the
> RM concurrently, each with its own responseId. If at some point one thread
> becomes more than one step behind the others, the RM will send back a resync
> signal in that heartbeat response, killing all containers on this NM.
> See comments below for details on how this can happen. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-06-28 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526331#comment-16526331
 ] 

genericqa commented on YARN-8468:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} YARN-8468 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8468 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929579/YARN-8468.000.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21139/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
>  Labels: patch
> Attachments: YARN-8468.000.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per queue basis.
>  
> The use case: User has two pools, one for ad hoc jobs and one for enterprise 
> apps. User wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. Setting
> yarn.scheduler.maximum-allocation-mb would provide the default maximum
> container size for all queues, while the per-queue maximum would be set with
> the “maxContainerResources” queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-06-28 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526328#comment-16526328
 ] 

Manikandan R commented on YARN-4606:


[~eepayne] Thank you for the great explanation. I am able to understand the flow
better now.

I revisited the "move apps" problem I raised earlier against the new patch and
don't think it requires any changes, as the variables needed to calculate
numActiveUsersWithOnlyPendingApps are already being set through the
submitApplication, finishApplication, etc. calls. However, I am seeing a minor
update issue as described below:

Let's say we want to move all apps from queue A1 to queue B1. A1 has 4 apps
(only 2 were activated because of the max AM limit constraint, so the remaining
2 are not yet activated). These 4 apps were submitted by different users, u1 to
u4, for example app1 by u1 and so on. Only app1 and app2 have an allocate
request in the pipeline. At this point, {{numActiveUsers}} is 4 and
{{numActiveUsersWithOnlyPendingApps}} is 2 in queue A1. Now the move is
triggered. Since there were running containers for both app1 and app2, app3 and
app4 were activated before app1 and app2 in queue B1, because app1 and app2
were busy detaching and attaching containers. After the move operation and a
thread sleep of 5s, I pulled these counts expecting u1 and u2 to appear as
ActiveUsersWithOnlyPendingApps, but could not see that: {{numActiveUsers}} was
2, as u3 and u4 had become active users, and
{{numActiveUsersWithOnlyPendingApps}} was 0 in queue B1. Then I introduced a
NodeUpdate event after the move operation, just to force the user limit
computation and see the impact on these counts. Now I can see
ActiveUsersWithOnlyPendingApps as 2 and ActiveUsers as 0 (as both u3 and u4 had
become non-active users by this time, since there are no pending allocate
requests).

So, after a move-app operation, if there are no events that can trigger the user
limit computation for a brief amount of time, I am seeing an incorrect
{{numActiveUsersWithOnlyPendingApps}} count. Is this acceptable, or should we
trigger the user limit computation after the move operation, like we do in other
places? Please share your thoughts and correct my understanding if you see a
gap.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.3.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are
> pending (caused by max-am-percent, etc.), ActiveUsersManager still considers
> the user an active user. This could lead to starvation of active applications,
> for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs
> to user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new resources, so
> the computed user-limit-resource could be lower than expected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8467) AsyncDispatcher should have a name & display it in logs to improve debug

2018-06-28 Thread Shuai Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526178#comment-16526178
 ] 

Shuai Zhang commented on YARN-8467:
---

Because it's just related to debug logs, there's no need to add new unit tests.
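
For context, a minimal sketch of what a named dispatcher could look like; illustrative only, not the attached patch, and it assumes AsyncDispatcher#dispatch(Event) remains overridable:

{code}
import org.apache.hadoop.yarn.event.AsyncDispatcher;
import org.apache.hadoop.yarn.event.Event;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustration only: tag every dispatched event with the dispatcher's name so
// logs from different dispatchers can be told apart.
public class NamedAsyncDispatcher extends AsyncDispatcher {
  private static final Logger LOG =
      LoggerFactory.getLogger(NamedAsyncDispatcher.class);

  private final String name;

  public NamedAsyncDispatcher(String name) {
    this.name = name;
  }

  @Override
  protected void dispatch(Event event) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("[" + name + "] dispatching " + event.getType());
    }
    super.dispatch(event);
  }
}
{code}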

 

> AsyncDispatcher should have a name & display it in logs to improve debug
> 
>
> Key: YARN-8467
> URL: https://issues.apache.org/jira/browse/YARN-8467
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.1.0
>Reporter: Shuai Zhang
>Priority: Trivial
> Attachments: YARN-8467.001.patch
>
>
> Currently each AbstractService has a dispatcher, but the dispatcher is not 
> named. Logs from dispatchers are mixed together, which makes it quite hard to
> debug any hang issues. I suggest:
>  # Make it possible to name AsyncDispatcher & its thread (partially done in 
> YARN-6015)
>  # Mention the AsyncDispatcher name in all its logs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8378) Missing default implementation of loading application with FileSystemApplicationHistoryStore

2018-06-28 Thread Lantao Jin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526165#comment-16526165
 ] 

Lantao Jin commented on YARN-8378:
--

[~sunilg] Could you find some time to review this?

> Missing default implementation of loading application with 
> FileSystemApplicationHistoryStore 
> -
>
> Key: YARN-8378
> URL: https://issues.apache.org/jira/browse/YARN-8378
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, yarn
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Minor
> Attachments: YARN-8378.1.patch
>
>
> [YARN-3700|https://issues.apache.org/jira/browse/YARN-3700] and 
> [YARN-3787|https://issues.apache.org/jira/browse/YARN-3787] add some 
> limits (on number and time) to loading applications from the YARN timeline
> service. But this API is missing a default implementation when we use
> FileSystemApplicationHistoryStore for the application history service instead
> of the timeline service.
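
For illustration, a store-agnostic way such number/time limits could be applied to whatever the history store loads; AppSummary below is a made-up placeholder type, not the actual API:

{code}
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustration only: cap the returned applications by count and start-time window.
class AppHistoryLimitSketch {
  static class AppSummary {
    final long startTime;
    AppSummary(long startTime) { this.startTime = startTime; }
  }

  static List<AppSummary> limit(List<AppSummary> apps, long appsNum,
      long startedTimeBegin, long startedTimeEnd) {
    return apps.stream()
        .filter(a -> a.startTime >= startedTimeBegin && a.startTime <= startedTimeEnd)
        .sorted(Comparator.comparingLong((AppSummary a) -> a.startTime).reversed())
        .limit(appsNum)
        .collect(Collectors.toList());
  }
}
{code}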



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8471) YARN RM hangs and stops allocating resources when applications successively running

2018-06-28 Thread tianjuan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianjuan updated YARN-8471:
---
Affects Version/s: 2.9.0

> YARN RM hangs and stops allocating resources when applications successively 
> running
> ---
>
> Key: YARN-8471
> URL: https://issues.apache.org/jira/browse/YARN-8471
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0
>Reporter: tianjuan
>Priority: Major
> Fix For: 2.9.0
>
> Attachments: YARN-8471.001.patch
>
>
> At some point the RM just hangs and stops allocating resources. At the point
> the RM hangs, YARN throws a NullPointerException at
> RegularContainerAllocator#allocate,
> RegularContainerAllocator#preCheckForPlacementSet, and
> RegularContainerAllocator#getLocalityWaitFactor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8471) YARN RM hangs and stops allocating resources when applications successively running

2018-06-28 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526071#comment-16526071
 ] 

genericqa commented on YARN-8471:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} YARN-8471 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8471 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929529/YARN-8471.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21138/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> YARN RM hangs and stops allocating resources when applications successively 
> running
> ---
>
> Key: YARN-8471
> URL: https://issues.apache.org/jira/browse/YARN-8471
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: tianjuan
>Priority: Major
> Fix For: 2.9.0
>
> Attachments: YARN-8471.001.patch
>
>
> At some point the RM just hangs and stops allocating resources. At the point
> the RM hangs, YARN throws a NullPointerException at
> RegularContainerAllocator#allocate,
> RegularContainerAllocator#preCheckForPlacementSet, and
> RegularContainerAllocator#getLocalityWaitFactor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8471) YARN RM hangs and stops allocating resources when applications successively running

2018-06-28 Thread tianjuan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianjuan updated YARN-8471:
---
Summary: YARN RM hangs and stops allocating resources when applications 
successively running  (was: YARN throw NullPointerException at 
RegularContainerAllocator)

> YARN RM hangs and stops allocating resources when applications successively 
> running
> ---
>
> Key: YARN-8471
> URL: https://issues.apache.org/jira/browse/YARN-8471
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: tianjuan
>Priority: Major
> Fix For: 2.9.0
>
>
> At some point the RM just hangs and stops allocating resources. At the point
> the RM hangs, YARN throws a NullPointerException at
> RegularContainerAllocator#allocate,
> RegularContainerAllocator#preCheckForPlacementSet, and
> RegularContainerAllocator#getLocalityWaitFactor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8471) YARN throw NullPointerException at RegularContainerAllocator

2018-06-28 Thread tianjuan (JIRA)
tianjuan created YARN-8471:
--

 Summary: YARN throw NullPointerException at 
RegularContainerAllocator
 Key: YARN-8471
 URL: https://issues.apache.org/jira/browse/YARN-8471
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: tianjuan
 Fix For: 2.9.0


At some point the RM just hangs and stops allocating resources. At the point the
RM hangs, YARN throws a NullPointerException at RegularContainerAllocator#allocate,
RegularContainerAllocator#preCheckForPlacementSet, and
RegularContainerAllocator#getLocalityWaitFactor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8435) NPE when the same client simultaneously contact for the first time Yarn Router

2018-06-28 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526017#comment-16526017
 ] 

genericqa commented on YARN-8435:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 34s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 31s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
35s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 57m 54s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8435 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929515/YARN-8435.v5.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 6d12671ea775 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ddbff7c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21137/testReport/ |
| Max. process+thread count | 676 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21137/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> NPE when the same client simultaneously contact for the first time Yarn Router

[jira] [Commented] (YARN-8435) NPE when the same client simultaneously contact for the first time Yarn Router

2018-06-28 Thread rangjiaheng (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525975#comment-16525975
 ] 

rangjiaheng commented on YARN-8435:
---

Thanks [~giovanni.fumarola] for the review. Sorry for misunderstanding what you
meant yesterday.

The java doc is OK in YARN-8435.v5.patch.

 

> NPE when the same client simultaneously contact for the first time Yarn Router
> --
>
> Key: YARN-8435
> URL: https://issues.apache.org/jira/browse/YARN-8435
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: router
>Affects Versions: 2.9.0, 3.0.2
>Reporter: rangjiaheng
>Priority: Critical
> Attachments: YARN-8435.v1.patch, YARN-8435.v2.patch, 
> YARN-8435.v3.patch, YARN-8435.v4.patch, YARN-8435.v5.patch
>
>
> When two client processes (with the same user name and the same hostname)
> connect to the Yarn Router at the same time, to submit an application, kill an
> application, ... and so on, a java.lang.NullPointerException may be thrown
> from the Yarn Router.
>  
>  
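
For illustration, one common way such a first-contact race is avoided, with placeholder names rather than the actual Router code:

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Illustration only: build the per-user pipeline at most once, even when two
// first-time requests from the same user arrive simultaneously.
public class UserPipelineCacheSketch<P> {
  private final ConcurrentMap<String, P> userPipelines = new ConcurrentHashMap<>();

  public P getOrCreate(String user, Function<String, P> factory) {
    // computeIfAbsent is atomic per key, so only one thread creates the pipeline.
    return userPipelines.computeIfAbsent(user, factory);
  }
}
{code}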



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8435) NPE when the same client simultaneously contact for the first time Yarn Router

2018-06-28 Thread rangjiaheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rangjiaheng updated YARN-8435:
--
Attachment: YARN-8435.v5.patch

> NPE when the same client simultaneously contact for the first time Yarn Router
> --
>
> Key: YARN-8435
> URL: https://issues.apache.org/jira/browse/YARN-8435
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: router
>Affects Versions: 2.9.0, 3.0.2
>Reporter: rangjiaheng
>Priority: Critical
> Attachments: YARN-8435.v1.patch, YARN-8435.v2.patch, 
> YARN-8435.v3.patch, YARN-8435.v4.patch, YARN-8435.v5.patch
>
>
> When two client processes (with the same user name and the same hostname)
> connect to the Yarn Router at the same time, to submit an application, kill an
> application, ... and so on, a java.lang.NullPointerException may be thrown
> from the Yarn Router.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org