[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716178#comment-14716178 ] Inigo Goiri commented on YARN-1012: --- I think this is very YARN specific. It relies on the ResourceCalculator and so on which come from Common though. Regarding adding network and disk usage, I fully agree. You guys should first extend ResourceUtilization (as done in this patch) to support disk and network and then extend the node resource monitor (YARN-3534) to collect it from the node. > Report NM aggregated container resource utilization in heartbeat > > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Fix For: 2.8.0 > > Attachments: YARN-1012-1.patch, YARN-1012-10.patch, > YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, > YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, > YARN-1012-9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
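As a rough illustration of the direction described in the comment above (extending ResourceUtilization to carry disk and network alongside memory and CPU), here is a minimal, hedged sketch; the class name, field names, and units below are assumptions for illustration only, not the actual patch or the YARN API.

{code}
/**
 * Sketch of a utilization record extended with disk and network, in the
 * spirit of extending ResourceUtilization. Names and units are illustrative
 * assumptions, not real YARN code.
 */
public final class ExtendedResourceUtilization {
  private int physicalMemoryMB;   // physical memory used, in MB
  private float cpu;              // CPU used, as a fraction of vcores
  private long diskReadBytes;     // bytes read from local disks
  private long diskWriteBytes;    // bytes written to local disks
  private long networkRxBytes;    // bytes received on the network
  private long networkTxBytes;    // bytes transmitted on the network

  public ExtendedResourceUtilization(int pmemMB, float cpu,
      long diskRead, long diskWrite, long netRx, long netTx) {
    this.physicalMemoryMB = pmemMB;
    this.cpu = cpu;
    this.diskReadBytes = diskRead;
    this.diskWriteBytes = diskWrite;
    this.networkRxBytes = netRx;
    this.networkTxBytes = netTx;
  }

  /** Accumulate another sample, e.g. when aggregating per-container usage. */
  public void add(ExtendedResourceUtilization other) {
    this.physicalMemoryMB += other.physicalMemoryMB;
    this.cpu += other.cpu;
    this.diskReadBytes += other.diskReadBytes;
    this.diskWriteBytes += other.diskWriteBytes;
    this.networkRxBytes += other.networkRxBytes;
    this.networkTxBytes += other.networkTxBytes;
  }
}
{code}

The node resource monitor from YARN-3534 would then populate the extra fields the same way it populates memory and CPU today.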
[jira] [Commented] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers
[ https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716162#comment-14716162 ] Hadoop QA commented on YARN-3920: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:red}-1{color} | javac | 3m 37s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752667/YARN-3920.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4cbbfa2 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8924/console | This message was automatically generated. > FairScheduler Reserving a node for a container should be configurable to > allow it used only for large containers > > > Key: YARN-3920 > URL: https://issues.apache.org/jira/browse/YARN-3920 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3920.004.patch, YARN-3920.004.patch, > YARN-3920.004.patch, yARN-3920.001.patch, yARN-3920.002.patch, > yARN-3920.003.patch > > > Reserving a node for a container was designed for preventing large containers > from starvation from small requests that keep getting into a node. Today we > let this be used even for a small container request. This has a huge impact > on scheduling since we block other scheduling requests until that reservation > is fulfilled. We should make this configurable so its impact can be minimized > by limiting it for large container requests as originally intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716153#comment-14716153 ] Jian He commented on YARN-2884: --- Looks good to me overall; I think there are still some problems with the AMRMProxyToken implementation. Basically, a long-running service may not work with the AMRMProxy.

1) The code below in DefaultRequestInterceptor should create and return a new AMRMProxyToken in the final returned allocate response when needed. Otherwise, the AM will fail to talk with the AMRMProxy after the key is rolled over in the AMRMProxyTokenSecretManager.

{code}
@Override
public AllocateResponse allocate(AllocateRequest request)
    throws YarnException, IOException {
  if (LOG.isDebugEnabled()) {
    LOG.debug("Forwarding allocate request to the real YARN RM");
  }
  AllocateResponse allocateResponse = rmClient.allocate(request);
  if (allocateResponse.getAMRMToken() != null) {
    updateAMRMToken(allocateResponse.getAMRMToken());
  }
  return allocateResponse;
}
{code}

The code below in ApplicationMasterService#allocate shows how that is done.

{code}
if (nextMasterKey != null
    && nextMasterKey.getMasterKey().getKeyId() != amrmTokenIdentifier.getKeyId()) {
  RMAppAttemptImpl appAttemptImpl = (RMAppAttemptImpl) appAttempt;
  Token<AMRMTokenIdentifier> amrmToken = appAttempt.getAMRMToken();
  if (nextMasterKey.getMasterKey().getKeyId() != appAttemptImpl.getAMRMTokenKeyId()) {
    LOG.info("The AMRMToken has been rolled-over. Send new AMRMToken back"
        + " to application: " + applicationId);
    amrmToken = rmContext.getAMRMTokenSecretManager()
        .createAndGetAMRMToken(appAttemptId);
    appAttemptImpl.setAMRMToken(amrmToken);
  }
  allocateResponse.setAMRMToken(org.apache.hadoop.yarn.api.records.Token
      .newInstance(amrmToken.getIdentifier(), amrmToken.getKind().toString(),
          amrmToken.getPassword(), amrmToken.getService().toString()));
}
{code}

2) Some methods inside the AMRMProxyTokenSecretManager are not used at all. We may remove them?

3) I think we need at least one end-to-end test for this. We can use MiniYarnCluster to simulate the whole thing: the AM talks with the AMRMProxy, which talks with the RM to register/allocate/finish. In the test, we should also reduce RM_AMRM_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS so that we can simulate the token renewal behavior. I'm OK with having a separate JIRA to track the end-to-end test, as this is a bit of work.

> Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, > YARN-2884-V11.patch, YARN-2884-V2.patch, YARN-2884-V3.patch, > YARN-2884-V4.patch, YARN-2884-V5.patch, YARN-2884-V6.patch, > YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start the AM is forced (via tokens and configuration) to direct > all its requests to a new service running on the NM that provides a proxy to > the central RM. > This gives us a place to: > 1) perform distributed scheduling decisions > 2) throttling mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
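For reference, a hedged sketch of what point 1 could look like inside DefaultRequestInterceptor#allocate: after forwarding the call to the real RM, check whether the proxy's own master key has rolled over and, if so, mint and attach a fresh AMRMProxy token, mirroring the ApplicationMasterService code quoted above. The fields and methods proxyTokenSecretManager, getCurrentMasterKeyId(), and amKeyId are assumptions for illustration, not the actual AMRMProxyTokenSecretManager API.

{code}
@Override
public AllocateResponse allocate(AllocateRequest request)
    throws YarnException, IOException {
  AllocateResponse allocateResponse = rmClient.allocate(request);
  if (allocateResponse.getAMRMToken() != null) {
    // Keep absorbing the real RM's token roll-overs, as the current code does.
    updateAMRMToken(allocateResponse.getAMRMToken());
  }
  // Sketch only -- proxyTokenSecretManager, getCurrentMasterKeyId() and
  // amKeyId (the key id of the AMRMProxy token the AM currently holds) are
  // assumed members, not the real AMRMProxyTokenSecretManager API.
  if (proxyTokenSecretManager.getCurrentMasterKeyId() != amKeyId) {
    org.apache.hadoop.security.token.Token<AMRMTokenIdentifier> proxyToken =
        proxyTokenSecretManager.createAndGetAMRMToken(appAttemptId);
    // Hand the AM a freshly minted AMRMProxy token, mirroring what
    // ApplicationMasterService#allocate does for regular AMRMTokens.
    allocateResponse.setAMRMToken(org.apache.hadoop.yarn.api.records.Token
        .newInstance(proxyToken.getIdentifier(), proxyToken.getKind().toString(),
            proxyToken.getPassword(), proxyToken.getService().toString()));
  }
  return allocateResponse;
}
{code}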
[jira] [Commented] (YARN-4083) Add a discovery mechanism for the scheduler address
[ https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716146#comment-14716146 ] Jian He commented on YARN-4083: --- One other thing to think about: what if the NM dies, should the AM fall back to the RM? Also, in case of RM HA, there will be multiple RM scheduler addresses; simply swapping out a single scheduler address will not work. > Add a discovery mechanism for the scheduler address > > > Key: YARN-4083 > URL: https://issues.apache.org/jira/browse/YARN-4083 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > > Today many apps like Distributed Shell, REEF, etc. rely on the fact that the > HADOOP_CONF_DIR of the NM is on the classpath to discover the scheduler > address. This JIRA proposes the addition of an explicit discovery mechanism > for the scheduler address -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers
[ https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3920: Attachment: YARN-3920.004.patch retrigger > FairScheduler Reserving a node for a container should be configurable to > allow it used only for large containers > > > Key: YARN-3920 > URL: https://issues.apache.org/jira/browse/YARN-3920 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3920.004.patch, YARN-3920.004.patch, > YARN-3920.004.patch, yARN-3920.001.patch, yARN-3920.002.patch, > yARN-3920.003.patch > > > Reserving a node for a container was designed for preventing large containers > from starvation from small requests that keep getting into a node. Today we > let this be used even for a small container request. This has a huge impact > on scheduling since we block other scheduling requests until that reservation > is fulfilled. We should make this configurable so its impact can be minimized > by limiting it for large container requests as originally intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers
[ https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716119#comment-14716119 ] Hadoop QA commented on YARN-3920: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 51s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:red}-1{color} | javac | 3m 39s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752657/YARN-3920.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4cbbfa2 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8923/console | This message was automatically generated. > FairScheduler Reserving a node for a container should be configurable to > allow it used only for large containers > > > Key: YARN-3920 > URL: https://issues.apache.org/jira/browse/YARN-3920 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3920.004.patch, YARN-3920.004.patch, > yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch > > > Reserving a node for a container was designed for preventing large containers > from starvation from small requests that keep getting into a node. Today we > let this be used even for a small container request. This has a huge impact > on scheduling since we block other scheduling requests until that reservation > is fulfilled. We should make this configurable so its impact can be minimized > by limiting it for large container requests as originally intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-4074: -- Attachment: YARN-4074-YARN-2928.POC.001.patch Posting a v.1 POC patch. This implements the first query (the flow activity query). I'll follow it up with another one tomorrow that implements the second one too. This is to get the design choices and correctness reviewed first. It does the following: (1) includes the flow activity query as part of getEntities(), and (2) creates a data container for the flow activity table called FlowActivityEntity. It probably needs a fair amount of refactoring to make the reader code more manageable. Also, I need to add unit tests. They will come later. > [timeline reader] implement support for querying for flows and flow runs > > > Key: YARN-4074 > URL: https://issues.apache.org/jira/browse/YARN-4074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-4074-YARN-2928.POC.001.patch > > > Implement support for querying for flows and flow runs. > We should be able to query for the most recent N flows, etc. > This includes changes to the {{TimelineReader}} API if necessary, as well as > implementation of the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers
[ https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3920: Attachment: YARN-3920.004.patch Updated configuration in FairSchedulerConfiguration > FairScheduler Reserving a node for a container should be configurable to > allow it used only for large containers > > > Key: YARN-3920 > URL: https://issues.apache.org/jira/browse/YARN-3920 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3920.004.patch, YARN-3920.004.patch, > yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch > > > Reserving a node for a container was designed for preventing large containers > from starvation from small requests that keep getting into a node. Today we > let this be used even for a small container request. This has a huge impact > on scheduling since we block other scheduling requests until that reservation > is fulfilled. We should make this configurable so its impact can be minimized > by limiting it for large container requests as originally intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers
[ https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3920: Attachment: YARN-3920.004.patch Attaching a patch based on the multiple of increment approach > FairScheduler Reserving a node for a container should be configurable to > allow it used only for large containers > > > Key: YARN-3920 > URL: https://issues.apache.org/jira/browse/YARN-3920 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3920.004.patch, yARN-3920.001.patch, > yARN-3920.002.patch, yARN-3920.003.patch > > > Reserving a node for a container was designed for preventing large containers > from starvation from small requests that keep getting into a node. Today we > let this be used even for a small container request. This has a huge impact > on scheduling since we block other scheduling requests until that reservation > is fulfilled. We should make this configurable so its impact can be minimized > by limiting it for large container requests as originally intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1011) [Umbrella] RM should dynamically schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716062#comment-14716062 ] Srikanth Kandula commented on YARN-1011: +1 > [Umbrella] RM should dynamically schedule containers based on utilization of > currently allocated containers > --- > > Key: YARN-1011 > URL: https://issues.apache.org/jira/browse/YARN-1011 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun C Murthy > > Currently RM allocates containers and assumes resources allocated are > utilized. > RM can, and should, get to a point where it measures utilization of allocated > containers and, if appropriate, allocate more (speculative?) containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4088) RM should be able to process heartbeats from NM asynchronously
Srikanth Kandula created YARN-4088: -- Summary: RM should be able to process heartbeats from NM asynchronously Key: YARN-4088 URL: https://issues.apache.org/jira/browse/YARN-4088 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Reporter: Srikanth Kandula Today, the RM sequentially processes one heartbeat after another. Imagine a 3000-server cluster with each server heart-beating every 3s. This gives the RM 1ms on average to process each NM heartbeat. That is tough. It is true that there are several underlying data structures that will be touched during heartbeat processing, so it is non-trivial to parallelize the NM heartbeat. Yet, it is quite doable... Parallelizing the NM heartbeat would substantially improve the scalability of the RM, allowing it to either a) run larger clusters, b) support faster heartbeats or dynamic scaling of heartbeats, c) take more asks from each application, or d) use cleverer/more expensive algorithms such as node labels or better packing. Indeed, the RM's scalability limit has been cited as the motivating reason for a variety of efforts which will become less needed if this can be solved. Ditto for slow heartbeats. See the Sparrow and Mercury papers for example. Can we take a shot at this? If not, could we discuss why? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
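To make the proposal concrete, here is a minimal, hypothetical sketch of asynchronous heartbeat handling: the per-node work is fanned out to a thread pool, and only the commit into the shared scheduler data structures is serialized. None of these class or method names exist in YARN today; they are assumptions for illustration only.

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: fan NM heartbeat processing out to a pool of workers. */
public class AsyncNodeHeartbeatProcessor {
  private final ExecutorService pool =
      Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
  private final Object schedulerLock = new Object();

  /** Called by the RPC layer for every NM heartbeat. */
  public void onHeartbeat(NodeStatus status) {
    pool.submit(() -> {
      // Parallel part: decode the report, update per-node bookkeeping,
      // compute container completions -- no shared scheduler state touched.
      NodeUpdate update = preprocess(status);
      // Serialized part: only the final commit into the shared scheduler
      // data structures is done under the lock.
      synchronized (schedulerLock) {
        commitToScheduler(update);
      }
    });
  }

  private NodeUpdate preprocess(NodeStatus status) { /* ... */ return new NodeUpdate(); }
  private void commitToScheduler(NodeUpdate update) { /* ... */ }

  // Placeholder types standing in for the real NM report objects.
  public static class NodeStatus {}
  public static class NodeUpdate {}
}
{code}

The hard part, as the description notes, is shrinking the serialized section; the sketch only shows where the boundary would sit.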
[jira] [Commented] (YARN-1011) [Umbrella] RM should dynamically schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716034#comment-14716034 ] Srikanth Kandula commented on YARN-1011: This is a great idea. Is there an ETA for this? Could you comment on whether it is being deprioritized for some reason? > [Umbrella] RM should dynamically schedule containers based on utilization of > currently allocated containers > --- > > Key: YARN-1011 > URL: https://issues.apache.org/jira/browse/YARN-1011 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun C Murthy > > Currently RM allocates containers and assumes resources allocated are > utilized. > RM can, and should, get to a point where it measures utilization of allocated > containers and, if appropriate, allocate more (speculative?) containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716032#comment-14716032 ] Srikanth Kandula commented on YARN-3980: +1 this would be very useful to have... Will enable even better packing. > Plumb resource-utilization info in node heartbeat through to the scheduler > -- > > Key: YARN-3980 > URL: https://issues.apache.org/jira/browse/YARN-3980 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.7.1 >Reporter: Karthik Kambatla >Assignee: Inigo Goiri > Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, > YARN-3980-v2.patch > > > YARN-1012 and YARN-3534 collect resource utilization information for all > containers and the node respectively and send it to the RM on node heartbeat. > We should plumb it through to the scheduler so the scheduler can make use of > it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3534) Collect memory/cpu usage on the node
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716031#comment-14716031 ] Srikanth Kandula commented on YARN-3534: [~elgoiri], [~kasha], could you comment on extending this to also take in network and disk usage information? > Collect memory/cpu usage on the node > > > Key: YARN-3534 > URL: https://issues.apache.org/jira/browse/YARN-3534 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: 2.7.0 >Reporter: Inigo Goiri >Assignee: Inigo Goiri > Fix For: 2.8.0 > > Attachments: YARN-3534-1.patch, YARN-3534-10.patch, > YARN-3534-11.patch, YARN-3534-12.patch, YARN-3534-14.patch, > YARN-3534-15.patch, YARN-3534-16.patch, YARN-3534-16.patch, > YARN-3534-17.patch, YARN-3534-17.patch, YARN-3534-18.patch, > YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, > YARN-3534-5.patch, YARN-3534-6.patch, YARN-3534-7.patch, YARN-3534-8.patch, > YARN-3534-9.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > YARN should be aware of the resource utilization of the nodes when scheduling > containers. For this, this task will implement the collection of memory/cpu > usage on the node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716029#comment-14716029 ] Srikanth Kandula commented on YARN-1012: [~elgoiri], [~kasha] Could you comment on whether this should go into Hadoop Common? Also, it may be worthwhile to extend this to also account for the network and disk usage of the containers... See HADOOP-12210. > Report NM aggregated container resource utilization in heartbeat > > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Fix For: 2.8.0 > > Attachments: YARN-1012-1.patch, YARN-1012-10.patch, > YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, > YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, > YARN-1012-9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2745) Extend YARN to support multi-resource packing of tasks
[ https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716023#comment-14716023 ] Srikanth Kandula commented on YARN-2745: [~aw] Done by [~chris.douglas]! > Extend YARN to support multi-resource packing of tasks > -- > > Key: YARN-2745 > URL: https://issues.apache.org/jira/browse/YARN-2745 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, scheduler >Reporter: Robert Grandl >Assignee: Robert Grandl > Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, > tetris_paper.pdf > > > In this umbrella JIRA we propose an extension to existing scheduling > techniques, which accounts for all resources used by a task (CPU, memory, > disk, network) and it is able to achieve three competing objectives: > fairness, improve cluster utilization and reduces average job completion time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2745) Extend YARN to support multi-resource packing of tasks
[ https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716021#comment-14716021 ] Srikanth Kandula commented on YARN-2745: [~vinodkv] Thanks for the related JIRAs. The efforts are complementary. Indeed, adapting assignment based on dynamic usage would be a good thing to have. This JIRA is more about packing based on anticipated usage as indicated by the ask. Dynamic packing would be even better. > Extend YARN to support multi-resource packing of tasks > -- > > Key: YARN-2745 > URL: https://issues.apache.org/jira/browse/YARN-2745 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, scheduler >Reporter: Robert Grandl >Assignee: Robert Grandl > Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, > tetris_paper.pdf > > > In this umbrella JIRA we propose an extension to existing scheduling > techniques, which accounts for all resources used by a task (CPU, memory, > disk, network) and it is able to achieve three competing objectives: > fairness, improve cluster utilization and reduces average job completion time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2745) Extend YARN to support multi-resource packing of tasks
[ https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716019#comment-14716019 ] Srikanth Kandula commented on YARN-2745: Just a brief update on this JIRA... 1) [~chris.douglas] pushed through "collection" of network and disk usage to Hadoop Common. See HADOOP-12210. 2) [~elgoiri] and [~kasha], in YARN-3534 and YARN-3980, collect CPU and memory info for containers, push that information from the NM to the RM, and make it available to the scheduler. 3) Packing requires the scheduler to look past the first "schedulable" task discovered by the capacity scheduler loop. Based on the feedback above, we have decoupled the architectural change needed from the actual packing policy. See YARN-4056, called bundling. Many different packing policies are allowed in the bundle. 4) These changes are complementary and orthogonal to YARN-1011. That JIRA recommends, rightly, adapting RM allocation based on the dynamic resource usage of the allocated containers. This JIRA is more about packing containers. It currently does so based on expected resource usage as indicated in the ask. Indeed, packing based on dynamic usage information would be strictly better and is left for future work. > Extend YARN to support multi-resource packing of tasks > -- > > Key: YARN-2745 > URL: https://issues.apache.org/jira/browse/YARN-2745 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, scheduler >Reporter: Robert Grandl >Assignee: Robert Grandl > Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, > tetris_paper.pdf > > > In this umbrella JIRA we propose an extension to existing scheduling > techniques, which accounts for all resources used by a task (CPU, memory, > disk, network) and it is able to achieve three competing objectives: > fairness, improve cluster utilization and reduces average job completion time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3250) Support admin cli interface for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715998#comment-14715998 ] Rohith Sharma K S commented on YARN-3250: - Thanks Sunil G for reviewing the patch. The test case failures are unrelated to this patch! > Support admin cli interface for Application Priority > --- > > Key: YARN-3250 > URL: https://issues.apache.org/jira/browse/YARN-3250 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, > 0003-YARN-3250.patch > > > Current Application Priority Manager supports only configuration via file. > To support runtime configurations for admin cli and REST, a common management > interface has to be added which can be shared with NodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class
[ https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715991#comment-14715991 ] Srikanth Kandula commented on YARN-4081: Extending to multiple resources is great, but why use a Map? Is there a rough idea of how many different resources one may want to encode? It seems overkill to incur so much additional overhead if, say, all that is needed is a handful more resources. Ditto for encapsulating strings in URIs and the ResourceInformation wrapper over doubles. It would perhaps be okay if this data structure were used less often, but if I understand correctly, a Resource is created/destroyed at least once per ask/assignment and often many more times... > Add support for multiple resource types in the Resource class > - > > Key: YARN-4081 > URL: https://issues.apache.org/jira/browse/YARN-4081 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4081-YARN-3926.001.patch > > > For adding support for multiple resource types, we need to add support for > this in the Resource class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
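To make the overhead concern concrete, here is a rough, self-contained comparison (not the YARN-4081 patch code) of a map-backed resource versus a plain-fields resource; the per-instance HashMap, boxing, and entry allocations on the map side are the costs the comment is pointing at when a Resource object is built for every ask and assignment.

{code}
import java.util.HashMap;
import java.util.Map;

public class ResourceLayouts {
  /** Map-backed: flexible, but each instance allocates a HashMap plus
   *  boxed values and entry objects -- costly if built per ask/assignment. */
  static class MapResource {
    private final Map<String, Long> values = new HashMap<>();
    void set(String name, long value) { values.put(name, value); }
    long get(String name) { return values.getOrDefault(name, 0L); }
  }

  /** Plain fields: fixed set of resources, but a single small object with
   *  no boxing and no per-entry allocations. */
  static class FieldResource {
    long memoryMB;
    long vcores;
    long diskIO;     // hypothetical extra resource
    long networkIO;  // hypothetical extra resource
  }

  public static void main(String[] args) {
    MapResource m = new MapResource();
    m.set("memory-mb", 2048);
    m.set("vcores", 2);

    FieldResource f = new FieldResource();
    f.memoryMB = 2048;
    f.vcores = 2;
    System.out.println(m.get("memory-mb") + " " + f.memoryMB);
  }
}
{code}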
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715892#comment-14715892 ] Hadoop QA commented on YARN-4087: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 27s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 2s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 9s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 58s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 10s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. | | | | 46m 40s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752615/YARN-4087.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f44b599 | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8922/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8922/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8922/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8922/console | This message was automatically generated. > Set YARN_FAIL_FAST to be false by default > - > > Key: YARN-4087 > URL: https://issues.apache.org/jira/browse/YARN-4087 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4087.1.patch > > > Increasingly, I feel setting this property to be false makes more sense > especially in production environment, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715872#comment-14715872 ] Bibin A Chundatt commented on YARN-4087: So by default in yarn-default.xml we would have yarn.resourcemanager.fail-fast=true and yarn.fail-fast=false. In YarnConfiguration:

{code}
public static boolean shouldRMFailFast(Configuration conf) {
  return conf.getBoolean(YarnConfiguration.RM_FAIL_FAST,
      conf.getBoolean(YarnConfiguration.YARN_FAIL_FAST,
          YarnConfiguration.DEFAULT_YARN_FAIL_FAST));
}
{code}

Some mismatch, right? No plans to change YarnConfiguration.RM_FAIL_FAST?

> Set YARN_FAIL_FAST to be false by default > - > > Key: YARN-4087 > URL: https://issues.apache.org/jira/browse/YARN-4087 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4087.1.patch > > > Increasingly, I feel setting this property to be false makes more sense > especially in production environment, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
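A small, hedged illustration of the precedence encoded in shouldRMFailFast above: yarn.resourcemanager.fail-fast, when set, wins, and only the unset case falls back to yarn.fail-fast. The sketch uses an empty Configuration so that only the explicitly set values matter; it does not assert what yarn-default.xml actually ships.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class FailFastPrecedence {
  public static void main(String[] args) {
    // Empty configuration (no default resources loaded), so only the values
    // we set below matter; this isolates the precedence logic.
    Configuration conf = new Configuration(false);

    // Only the generic switch set: the RM falls back to yarn.fail-fast.
    conf.setBoolean(YarnConfiguration.YARN_FAIL_FAST, false);
    System.out.println(YarnConfiguration.shouldRMFailFast(conf)); // false

    // RM-specific switch set: it takes precedence over yarn.fail-fast.
    conf.setBoolean(YarnConfiguration.RM_FAIL_FAST, true);
    System.out.println(YarnConfiguration.shouldRMFailFast(conf)); // true
  }
}
{code}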
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715829#comment-14715829 ] Karthik Kambatla commented on YARN-4087: +1, if fail-fast hasn't been in any prior release and we are not drastically altering the behavior. In any case, it would be nice to release note this new behavior for 2.8.0. > Set YARN_FAIL_FAST to be false by default > - > > Key: YARN-4087 > URL: https://issues.apache.org/jira/browse/YARN-4087 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4087.1.patch > > > Increasingly, I feel setting this property to be false makes more sense > especially in production environment, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4087: -- Attachment: YARN-4087.1.patch simple patch which flips the config > Set YARN_FAIL_FAST to be false by default > - > > Key: YARN-4087 > URL: https://issues.apache.org/jira/browse/YARN-4087 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4087.1.patch > > > Increasingly, I feel setting this property to be false makes more sense > especially in production environment, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715809#comment-14715809 ] Li Lu commented on YARN-3816: - Hi [~djp], I briefly looked at the patch and have one quick question: in the application table, we no longer store the type of the incoming entities, IIUC. All entity types from the application table will be added in HBaseReader, as in:

{code}
String entityType = isApplication
    ? TimelineEntityType.YARN_APPLICATION.toString()
    : EntityColumn.TYPE.readResult(result).toString();
{code}

In this case, maybe we're missing YARN_APPLICATION_AGGREGATION types and can no longer differentiate them? Or is there any other way we can recognize whether an entity comes from the application itself or from aggregation? (Am I missing anything?)

> [Aggregation] App-level Aggregation for YARN system metrics > --- > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4087: -- Summary: Set YARN_FAIL_FAST to be false by default (was: Set RM_FAIL_FAST to be false by default) > Set YARN_FAIL_FAST to be false by default > - > > Key: YARN-4087 > URL: https://issues.apache.org/jira/browse/YARN-4087 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > > Increasingly, I feel setting this property to be false makes more sense > especially in production environment, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4087) Set RM_FAIL_FAST to be false by default
Jian He created YARN-4087: - Summary: Set RM_FAIL_FAST to be false by default Key: YARN-4087 URL: https://issues.apache.org/jira/browse/YARN-4087 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Increasingly, I feel setting this property to be false makes more sense especially in production environment, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715765#comment-14715765 ] Sangjin Lee commented on YARN-4074: --- I am about 90% done with the POC patch for this. I'm shooting for some time tomorrow to be able to post the patch. In the meantime, in order to enable [~varun_saxena] and others to make progress, the following is the proposal that I'm implementing. Please *do* let me know if you have any questions or issues with the proposal so we can adjust accordingly. (REST API) In order to support the POC UI, we will implement 2 new queries: # given the cluster, return the N most recent flows from the flow activity table # given the cluster, user, flow id, and flow run id, return the flow run (with metrics) from the flow run table At the REST level, they can be represented as follows for example: # /listFlows/clusterId?limit=100 # /flow/clusterId/userId/flowName/flowRun (UI) With these URLs, the UI can invoke the first URL to render the landing page with the table. The REST output contains the flow activity records along with all the flow runs that were active during the day. If the user drills down on a single flow, then the client side can generate the second query against all the flow runs for that flow to fetch the metrics at the flow run level. If the user further drills down into a single flow run, then it can do an (existing) query to retrieve all applications for a given flow run to get the application entities. (reader interface) Currently I am *not* planning to add new flow-specific methods to the {{TimelineReader}} interface. Instead, you can use the existing {{getEntities()}} and {{getEntity()}} methods to perform the above new queries: # {{getEntities()}} with cluster specified and entity type = YARN_FLOW_ACTIVITY (a new timeline entity type) # {{getEntity()}} with cluster, user, flow id, flow run id specified and entity type = YARN_FLOW > [timeline reader] implement support for querying for flows and flow runs > > > Key: YARN-4074 > URL: https://issues.apache.org/jira/browse/YARN-4074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > > Implement support for querying for flows and flow runs. > We should be able to query for the most recent N flows, etc. > This includes changes to the {{TimelineReader}} API if necessary, as well as > implementation of the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
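For the REST side, a hedged client sketch against the two proposed endpoints; only the URL shapes come from the proposal above, while the timeline reader host/port, the example cluster/user/flow/run values, and the idea of consuming the response as plain text are assumptions for illustration.

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class FlowQueryClient {
  // Hypothetical timeline reader endpoint; host and port are assumptions.
  private static final String BASE = "http://timelinereader.example.com:8188";

  static String get(String path) throws Exception {
    HttpURLConnection conn =
        (HttpURLConnection) new URL(BASE + path).openConnection();
    conn.setRequestMethod("GET");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
      StringBuilder body = new StringBuilder();
      String line;
      while ((line = in.readLine()) != null) {
        body.append(line);
      }
      return body.toString();
    }
  }

  public static void main(String[] args) throws Exception {
    // 1) N most recent flows for a cluster (flow activity query).
    String flows = get("/listFlows/myCluster?limit=100");
    // 2) A single flow run, with metrics, for cluster/user/flow/run.
    String flowRun = get("/flow/myCluster/alice/wordcount/1409550134000");
    System.out.println(flows.length() + " " + flowRun.length());
  }
}
{code}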
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715754#comment-14715754 ] Varun Saxena commented on YARN-3528: # In TestNodeStatusUpdater#createNMConfig, the change has been missed; the port is still hardcoded.

{code}
conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS, localhostAddress + ":12346");
{code}

# In TestContainer, the port is just used for creating a container token. No need to call ServerSocketUtil#getPort. # Nit: in TestNodeManagerShutdown#startContainer, the commented-out line below can be removed.

{code}
//final int port = ServerSocketUtil.getPort(49156, 10);
{code}

# As you will be changing other things, maybe change the code below as well. In TestNodeManagerShutdown I don't see any need to add a try-catch block here; we have just replaced 12345 with a passed port.

{code}
-InetSocketAddress containerManagerBindAddress =
-    NetUtils.createSocketAddrForHost("127.0.0.1", 12345);
+InetSocketAddress containerManagerBindAddress = null;
+try {
+  containerManagerBindAddress = NetUtils.createSocketAddrForHost("127.0.0.1", port);
+} catch (Exception e) {
+  throw new RuntimeException("Fail To Get the Port");
+}
{code}

Other things look fine.

> Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
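As a companion to the review points above, a small sketch of the pattern the patch is moving toward: instead of hard-coding 12345/12346, a test asks ServerSocketUtil (the existing Hadoop test utility referenced in point 2) for a free port and builds the NM addresses from it. The base ports and retry counts below are arbitrary illustrative values.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.ServerSocketUtil;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TestPortAllocation {
  /** Build an NM config with dynamically chosen ports instead of 12345/12346. */
  static Configuration createNMConfig() throws IOException {
    // Start probing at an arbitrary base port and retry a few times if busy.
    int nmPort = ServerSocketUtil.getPort(49152, 10);
    int localizerPort = ServerSocketUtil.getPort(49160, 10);

    Configuration conf = new YarnConfiguration();
    conf.set(YarnConfiguration.NM_ADDRESS, "127.0.0.1:" + nmPort);
    conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS,
        "127.0.0.1:" + localizerPort);
    return conf;
  }
}
{code}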
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715681#comment-14715681 ] Varun Saxena commented on YARN-3528: Will have a look. > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715641#comment-14715641 ] Robert Kanter commented on YARN-3528: - +1 LGTM. Any other comments [~varun_saxena]? > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715591#comment-14715591 ] Ben Podgursky commented on YARN-2962: - Got it. Thanks for the details. It sounds like we'll have some workarounds available if we do run into trouble, which is hopefully good enough for now. > ZKRMStateStore: Limit the number of znodes under a znode > > > Key: YARN-2962 > URL: https://issues.apache.org/jira/browse/YARN-2962 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Varun Saxena >Priority: Critical > Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch > > > We ran into this issue where we were hitting the default ZK server message > size configs, primarily because the message had too many znodes even though > they individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4048) Linux kernel panic under strict CPU limits
[ https://issues.apache.org/jira/browse/YARN-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715561#comment-14715561 ] Craig Condit commented on YARN-4048: Just my two cents: Using cgroups on CentOS/RHEL 6.x is asking for it... We've experienced similar crashes using anything that utilizes cgroups, not just YARN (for example -- docker). Cgroups is widely regarded as unstable in Linux kernel versions < 3.10 or so. > Linux kernel panic under strict CPU limits > -- > > Key: YARN-4048 > URL: https://issues.apache.org/jira/browse/YARN-4048 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Chengbing Liu >Priority: Critical > Attachments: panic.png > > > With YARN-2440 and YARN-2531, we have seen some kernel panics happening under > heavy pressure. Even with YARN-2809, it still panics. > We are using CentOS 6.5, hadoop 2.5.0-cdh5.2.0 with the above patches. I > guess the latest version also has the same issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4086) Allow Aggregated Log readers to handle HAR files
[ https://issues.apache.org/jira/browse/YARN-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715545#comment-14715545 ] Hadoop QA commented on YARN-4086: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 27s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 20s | The applied patch generated 4 release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 25s | The applied patch generated 3 new checkstyle issues (total was 23, now 26). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 6m 57s | Tests failed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. | | | | 53m 23s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.client.cli.TestLogsCLI | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752560/YARN-4086.001.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / a4d9acc | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/8921/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8921/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8921/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8921/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8921/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8921/console | This message was automatically generated. > Allow Aggregated Log readers to handle HAR files > > > Key: YARN-4086 > URL: https://issues.apache.org/jira/browse/YARN-4086 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-4086.001.patch > > > This is for the YARN changes for MAPREDUCE-6415. It allows the yarn CLI and > web UIs to read aggregated logs from HAR files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715519#comment-14715519 ] Varun Saxena commented on YARN-2962: bq. how many applications did you have in the RM store before this became a problem Will have to check. I think it was more than 1 apps in our case. Will let you know. bq. switching the zk max message size via -Djute.maxbuffer= a viable workaround? Yes, that works. Also, we can set a lower config value for the number of completed apps to be stored in the state store. Even 0 can be set. bq. Also, is there a sense of how close this ticket is to being merged? The patches currently here have to be rebased because of recent changes. Had put this on the back burner as this will go into trunk and not branch-2. If it's required to be handled earlier, I will focus on it. I plan to take this up in the coming month anyway. > ZKRMStateStore: Limit the number of znodes under a znode > > > Key: YARN-2962 > URL: https://issues.apache.org/jira/browse/YARN-2962 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Varun Saxena >Priority: Critical > Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch > > > We ran into this issue where we were hitting the default ZK server message > size configs, primarily because the message had too many znodes even though > they individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4077) FairScheduler Reservation should wait for most relaxed scheduling delay permitted before issuing reservation
[ https://issues.apache.org/jira/browse/YARN-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4077: --- Component/s: fairscheduler > FairScheduler Reservation should wait for most relaxed scheduling delay > permitted before issuing reservation > > > Key: YARN-4077 > URL: https://issues.apache.org/jira/browse/YARN-4077 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot > > Today if an allocation has a node local request that allows for relaxation, > we do not wait for the relaxation delay before issuing the reservation. This > can be too aggressive. Instead we should allow the scheduling delays of > relaxation to expire before we choose to allow reserving a node for the > container. This allows for the request to be satisfied on a different node > instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4076) FairScheduler does not allow AM to choose which containers to preempt
[ https://issues.apache.org/jira/browse/YARN-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4076: --- Component/s: fairscheduler > FairScheduler does not allow AM to choose which containers to preempt > - > > Key: YARN-4076 > URL: https://issues.apache.org/jira/browse/YARN-4076 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > > Capacity scheduler allows for AM to choose which containers will be > preempted. See comment about corresponding work pending for FairScheduler > https://issues.apache.org/jira/browse/YARN-568?focusedCommentId=13649126&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13649126 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4086) Allow Aggregated Log readers to handle HAR files
[ https://issues.apache.org/jira/browse/YARN-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-4086: Attachment: YARN-4086.001.patch The YARN-4086.001.patch allows the yarn CLI and web UIs to read aggregated logs from HAR files. It's mostly the same as the prelim patch in MAPREDUCE-6415, with some minor changes and unit tests. The patches for this and MAPREDUCE-6415 can be applied independently. *Important:* For the unit tests, I had to include some HAR files, which are basically folders with a few files in them. One of the files is a binary file, which makes generating and applying the patch tricky. I got it to work by generating it with {{git diff --binary > FILE}} and applying it with {{git apply}}. The regular {{patch}} command won't work, and it has to be {{-p1}} and not {{-p0}}. I'm not sure if Jenkins will be able to handle this. > Allow Aggregated Log readers to handle HAR files > > > Key: YARN-4086 > URL: https://issues.apache.org/jira/browse/YARN-4086 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-4086.001.patch > > > This is for the YARN changes for MAPREDUCE-6415. It allows the yarn CLI and > web UIs to read aggregated logs from HAR files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4086) Allow Aggregated Log readers to handle HAR files
Robert Kanter created YARN-4086: --- Summary: Allow Aggregated Log readers to handle HAR files Key: YARN-4086 URL: https://issues.apache.org/jira/browse/YARN-4086 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter This is for the YARN changes for MAPREDUCE-6415. It allows the yarn CLI and web UIs to read aggregated logs from HAR files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..
[ https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715429#comment-14715429 ] Allen Wittenauer edited comment on YARN-4084 at 8/26/15 8:10 PM: - So then "mvn compile" is actually what you want (I think) :) was (Author: aw): So then "mvn compile" is actually what you want > Yarn should allow to skip hadoop-yarn-server-tests project from build.. > --- > > Key: YARN-4084 > URL: https://issues.apache.org/jira/browse/YARN-4084 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 2.7.1 >Reporter: Ved Prakash Pandey >Priority: Minor > Attachments: YARN-4084.patch > > > For fast compilation one can try to skip the test code compilation by using > {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this > option is used. This is because, it depends on hadoop-yarn-server-tests > project. > Below is the exception : > {noformat} > [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find > attachment with classifier: tests in module project: > org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this > module from the module-set. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..
[ https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715429#comment-14715429 ] Allen Wittenauer commented on YARN-4084: So then "mvn compile" is actually what you want > Yarn should allow to skip hadoop-yarn-server-tests project from build.. > --- > > Key: YARN-4084 > URL: https://issues.apache.org/jira/browse/YARN-4084 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 2.7.1 >Reporter: Ved Prakash Pandey >Priority: Minor > Attachments: YARN-4084.patch > > > For fast compilation one can try to skip the test code compilation by using > {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this > option is used. This is because, it depends on hadoop-yarn-server-tests > project. > Below is the exception : > {noformat} > [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find > attachment with classifier: tests in module project: > org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this > module from the module-set. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4082) Container shouldn't be killed when node's label updated.
[ https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715393#comment-14715393 ] Hadoop QA commented on YARN-4082: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 1s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 9m 12s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 8s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 33s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 32s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 6s | The patch has 23 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 51s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 53m 31s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 95m 1s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752539/YARN-4082.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a4d9acc | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8920/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8920/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8920/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8920/console | This message was automatically generated. > Container shouldn't be killed when node's label updated. > > > Key: YARN-4082 > URL: https://issues.apache.org/jira/browse/YARN-4082 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4082.1.patch, YARN-4082.2.patch > > > From YARN-2920, containers will be killed if partition of a node changed. > Instead of killing containers, we should update resource-usage-by-partition > properly when node's partition updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715342#comment-14715342 ] Ben Podgursky commented on YARN-2962: - Hi, We're looking at switching to a HA RM and I'm a bit concerned about this ticket, since we have a very active RM. Couple questions for those who encountered the bug: - how many applications did you have in the RM store before this became a problem? - was switching the zk max messages size via -Djute.maxbuffer= a viable workaround? Also, is there a sense of how close this ticket is to being merged? Thanks, Ben > ZKRMStateStore: Limit the number of znodes under a znode > > > Key: YARN-2962 > URL: https://issues.apache.org/jira/browse/YARN-2962 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Varun Saxena >Priority: Critical > Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch > > > We ran into this issue where we were hitting the default ZK server message > size configs, primarily because the message had too many znodes even though > they individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3717) Improve RM node labels web UI
[ https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715341#comment-14715341 ] Hadoop QA commented on YARN-3717: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 25m 15s | Pre-patch trunk has 7 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:green}+1{color} | javac | 8m 3s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 6s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 3m 1s | Site still builds. | | {color:red}-1{color} | checkstyle | 2m 43s | The applied patch generated 3 new checkstyle issues (total was 16, now 18). | | {color:green}+1{color} | whitespace | 0m 12s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 6m 21s | Post-patch findbugs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager compilation is broken. | | {color:green}+1{color} | findbugs | 6m 21s | The patch does not introduce any new Findbugs (version ) warnings. | | {color:red}-1{color} | yarn tests | 0m 17s | Tests failed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 0m 12s | Tests failed in hadoop-yarn-client. | | {color:red}-1{color} | yarn tests | 0m 19s | Tests failed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 0m 13s | Tests failed in hadoop-yarn-server-applicationhistoryservice. | | {color:red}-1{color} | yarn tests | 0m 13s | Tests failed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 0m 18s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 60m 41s | | \\ \\ || Reason || Tests || | Failed build | hadoop-yarn-api | | | hadoop-yarn-client | | | hadoop-yarn-common | | | hadoop-yarn-server-applicationhistoryservice | | | hadoop-yarn-server-common | | | hadoop-yarn-server-resourcemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752534/YARN-3717.20150826-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle site | | git revision | trunk / a4d9acc | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8919/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8919/console | This message was automatically generated. > Improve RM node labels web UI > - > > Key: YARN-3717 > URL: https://issues.apache.org/jira/browse/YARN-3717 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, > YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, > YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch >
[jira] [Commented] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..
[ https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715338#comment-14715338 ] Ved Prakash Pandey commented on YARN-4084: -- I realize that my patch forces the use of the {{-Penable-yarn-server-test-module}} option for normal builds. This is my bad. Instead, I will provide a patch tomorrow that has a switch like {{-Pdisable-yarn-server-test-module}}, with which the hadoop-yarn-server-tests project can be skipped from the build. Please let me know if that sounds ok. > Yarn should allow to skip hadoop-yarn-server-tests project from build.. > --- > > Key: YARN-4084 > URL: https://issues.apache.org/jira/browse/YARN-4084 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 2.7.1 >Reporter: Ved Prakash Pandey >Priority: Minor > Attachments: YARN-4084.patch > > > For fast compilation one can try to skip the test code compilation by using > {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this > option is used. This is because, it depends on hadoop-yarn-server-tests > project. > Below is the exception : > {noformat} > [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find > attachment with classifier: tests in module project: > org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this > module from the module-set. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..
[ https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715306#comment-14715306 ] Ved Prakash Pandey commented on YARN-4084: -- Thanks for the reply, Allen. Actually, I am using -DskipTests in addition to -Dmaven.test.skip=true. The problem comes when I use -Dmaven.test.skip=true, which skips the test code compilation. For clarity: the maven.test.skip option skips both test code compilation and test case execution, whereas skipTests skips only execution (not compilation). It may sound questionable to compile the source code without compiling the test sources, and in fact for the open source community this may never be a scenario. But I ran into a requirement where, in my Continuous Integration environment, I have to make a complete build as fast as possible and every minute counts. In such a case, disabling test code compilation saves close to 3 to 4 minutes. > Yarn should allow to skip hadoop-yarn-server-tests project from build.. > --- > > Key: YARN-4084 > URL: https://issues.apache.org/jira/browse/YARN-4084 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 2.7.1 >Reporter: Ved Prakash Pandey >Priority: Minor > Attachments: YARN-4084.patch > > > For fast compilation one can try to skip the test code compilation by using > {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this > option is used. This is because, it depends on hadoop-yarn-server-tests > project. > Below is the exception : > {noformat} > [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find > attachment with classifier: tests in module project: > org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this > module from the module-set. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
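To make the distinction above concrete, the three Maven modes look roughly like this (module selection and other options omitted):
{noformat}
# compile everything and run the tests (default)
mvn install

# compile the test sources but skip running them
mvn install -DskipTests

# skip test compilation and test execution entirely
# (the mode that currently breaks the hadoop-yarn-project build)
mvn install -Dmaven.test.skip=true
{noformat}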
[jira] [Updated] (YARN-4082) Container shouldn't be killed when node's label updated.
[ https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4082: - Attachment: YARN-4082.2.patch Attached .2 patch, fixed findbugs warnings. > Container shouldn't be killed when node's label updated. > > > Key: YARN-4082 > URL: https://issues.apache.org/jira/browse/YARN-4082 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4082.1.patch, YARN-4082.2.patch > > > From YARN-2920, containers will be killed if partition of a node changed. > Instead of killing containers, we should update resource-usage-by-partition > properly when node's partition updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4085) Generate file with container resource limits in the container work dir
Varun Vasudev created YARN-4085: --- Summary: Generate file with container resource limits in the container work dir Key: YARN-4085 URL: https://issues.apache.org/jira/browse/YARN-4085 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Minor Currently, a container doesn't know what resource limits are being imposed on it. It would be helpful if the NM generated a simple file in the container work dir with the resource limits specified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
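A purely illustrative sketch of what such a file could look like; the file name, format, and keys below are hypothetical, not the proposed design:
{code}
// Hypothetical helper: write the limits imposed on a container into its work dir.
// The file name ("container-limits.properties") and property keys are made up
// for illustration only.
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;

public class ContainerLimitsWriter {
  public static void writeLimits(String containerWorkDir, long memoryMb, int vcores)
      throws IOException {
    Properties limits = new Properties();
    limits.setProperty("memory.mb", Long.toString(memoryMb));
    limits.setProperty("vcores", Integer.toString(vcores));
    try (FileOutputStream out = new FileOutputStream(
        containerWorkDir + "/container-limits.properties")) {
      limits.store(out, "Resource limits imposed on this container");
    }
  }
}
{code}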
[jira] [Updated] (YARN-3717) Improve RM node labels web UI
[ https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3717: Attachment: YARN-3717.20150826-1.patch Fixed the reported testcase failure; also ran findbugs locally and didn't find any issues induced by the code. > Improve RM node labels web UI > - > > Key: YARN-3717 > URL: https://issues.apache.org/jira/browse/YARN-3717 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, > YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, > YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch > > > 1> Add the default-node-Label expression for each queue in scheduler page. > 2> In Application/Appattempt page show the app configured node label > expression for AM and Job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715222#comment-14715222 ] Hadoop QA commented on YARN-3635: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745158/YARN-3635.6.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a4d9acc | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8918/console | This message was automatically generated. > Get-queue-mapping should be a common interface of YarnScheduler > --- > > Key: YARN-3635 > URL: https://issues.apache.org/jira/browse/YARN-3635 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Wangda Tan >Assignee: Tan, Wangda > Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, > YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch > > > Currently, both of fair/capacity scheduler support queue mapping, which makes > scheduler can change queue of an application after submitted to scheduler. > One issue of doing this in specific scheduler is: If the queue after mapping > has different maximum_allocation/default-node-label-expression of the > original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks > the wrong queue. > I propose to make the queue mapping as a common interface of scheduler, and > RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715208#comment-14715208 ] Vrushali C commented on YARN-4074: -- My take is that we can make things as generic as possible, but we should have separate APIs for flows and flow runs. I had put up an initial proposal for flow-based queries in ATS when we started off on this at https://issues.apache.org/jira/secure/attachment/12695071/Flow%20based%20queries.docx I believe for the two queries you have listed above [~sjlee0], there would be two REST APIs: 1) Get All Flows Path: /listFlows// Returns: paginated list of apps with aggregated stats (to populate the flows list tab on the UI) Sample URL: http://timelineservice.example.com/ws/v2/listFlows/clusterid?limit=2&&startTime=20140510&endTime=20140601 This would be a UI-related aggregation query 2) Get specific Flow's runs Path: /flow[version] Returns: list of flows Sample URL: http://timelineservice.example.com/ws/v2/flow/clusterid/userName/someFlowName_idenitying_a_flow?limit=2&&startTime=1390939248000&endTime=139361764800 > [timeline reader] implement support for querying for flows and flow runs > > > Key: YARN-4074 > URL: https://issues.apache.org/jira/browse/YARN-4074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > > Implement support for querying for flows and flow runs. > We should be able to query for the most recent N flows, etc. > This includes changes to the {{TimelineReader}} API if necessary, as well as > implementation of the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715192#comment-14715192 ] Vrushali C commented on YARN-4053: -- The way I see this, it comes down to a basic question of whether we really *need* floating point precision in metric values. For instance, cost is a metric which could have a decimal value upon calculation. But, in my opinion, a cost of 5 dollars versus 5.347891 dollars versus 5.78913 dollars is not that different. A cost of 6.x dollars is different from 5.x. I believe it does not matter THAT much whether the cost is 5.347891 or 5.79813. These are Hadoop applications; the run duration is rarely going to be exactly consistent for the exact same code, so metrics will usually fluctuate slightly between different runs of the exact same job. Storage and querying of Longs is straightforward and clean, with no ambiguity in serialization. Contrast that with storing various numerical data types in metrics: - all the complexity of storing column prefixes that tell us which type is stored, so that serialization to/from HBase can be done correctly. - filtering in HBase becomes much more complicated with all these different datatypes. > Change the way metric values are stored in HBase Storage > > > Key: YARN-4053 > URL: https://issues.apache.org/jira/browse/YARN-4053 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4053-YARN-2928.01.patch > > > Currently HBase implementation uses GenericObjectMapper to convert and store > values in backend HBase storage. This converts everything into a string > representation(ASCII/UTF-8 encoded byte array). > While this is fine in most cases, it does not quite serve our use case for > metrics. > So we need to decide how are we going to encode and decode metric values and > store them in HBase. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
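As a sketch of the serialization point (assuming hbase-common's {{Bytes}} utility is available; this is not the YARN-4053 patch), collapsing every metric value to a long yields one fixed 8-byte encoding that both writers and HBase filters can rely on:
{code}
import org.apache.hadoop.hbase.util.Bytes;

// Sketch only: encode all metric values as longs so the stored bytes always
// have the same, unambiguous 8-byte layout.
public class MetricValueCodec {
  public static byte[] encode(Number value) {
    // The fractional part is dropped deliberately, per the discussion above.
    return Bytes.toBytes(value.longValue());
  }

  public static long decode(byte[] cellValue) {
    return Bytes.toLong(cellValue);
  }
}
{code}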
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14714416#comment-14714416 ] Junping Du commented on YARN-3816: -- Thanks [~varun_saxena] for the review and comments! bq. If we use same scheme for long or double, we may end up with 4 ORs' for a single metric. Maybe we can use cell tags for aggregation. That's a good point! When I was doing the PoC patch a few weeks ago, YARN-4053 hadn't been brought up for discussion, so I thought it was a little overkill to use a cell tag for specifying only a boolean value. Now it seems to be a good way, but I would prefer to defer this decision to YARN-4053, since there are other priority comments to address here so we can move faster. What do you think? bq. Maybe in TimelineCollector#aggregateMetrics, we should do aggregation only if the flag is enabled. That's true. That's part of the reason why the aggregation flag was added to the metric. Will add the check in the next patch. bq. In TimelineCollector#appendAggregatedMetricsToEntities any reason we are creating separate TimelineEntity objects for each metric ? Maybe create a single entity containing a set of metrics. Nice catch. bq. 3 new maps have been introduced in TimelineCollector and these are used as base to calculate aggregated value. What if the daemon crashes? For the RM, it could persist the maps to the RMStateStore. For the NM, that may not be enough as the NM could be lost as well. We need a mechanism so that, if the TimelineCollector is relaunched somewhere else, it reads the raw metrics and recovers the maps before it starts working. This will be part of the failover JIRAs like YARN-3115, YARN-3359, etc. bq. In TimelineMetricCalculator some functions have duplicate if conditions for long. Fixed. bq. In TimelineMetricCalculator#sum, to avoid negative values due to overflow, we can change conditions like below... As in the comment above, the overflow case will be handled in the next patch. bq. In TimelineMetric#aggregateTo, maybe use getValues instead of getValuesJAXB? I would prefer to use TreeMap because it sorts the keys (timestamps) when accessing them; the aggregateTo() algorithm assumes metrics are sorted by timestamp. bq. Also I was wondering if TimelineMetric#aggregateTo should be moved to some util class. TimelineMetric is part of the object model and exposed to the client, and IIUC aggregateTo won't be called by the client. As Li mentions below, it is a bit tricky to have a utility class for any of the API classes, because it would mislead users into using it, which is not our intention, at least for now. aggregateTo is not as straightforward and generally useful as the methods in TimelineMetricCalculator, so let's hold off on exposing it as a utility class for now. Making it static sounds good though. bq. What is EntityColumnPrefix#AGGREGATED_METRICS meant for? It is something developed at the PoC stage a few weeks ago, and it should be removed after we move to the ApplicationTable.
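Since the overflow handling is deferred to the next patch, here is a generic sketch of guarding a long sum against wrap-around (illustrative only, not the actual YARN-3816 code):
{code}
// Saturate at the extremes instead of silently overflowing into a wrong value.
public final class OverflowSafeSum {
  private OverflowSafeSum() {
  }

  public static long sum(long a, long b) {
    if (b > 0 && a > Long.MAX_VALUE - b) {
      return Long.MAX_VALUE;
    }
    if (b < 0 && a < Long.MIN_VALUE - b) {
      return Long.MIN_VALUE;
    }
    return a + b;
  }
}
{code}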
> [Aggregation] App-level Aggregation for YARN system metrics > --- > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..
[ https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713541#comment-14713541 ] Allen Wittenauer edited comment on YARN-4084 at 8/26/15 2:36 PM: - Use -DskipTests in addition to -Dmaven.test.skip=true was (Author: aw): Use -PskipTests in addition to -Dmaven.test.skip=true > Yarn should allow to skip hadoop-yarn-server-tests project from build.. > --- > > Key: YARN-4084 > URL: https://issues.apache.org/jira/browse/YARN-4084 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 2.7.1 >Reporter: Ved Prakash Pandey >Priority: Minor > Attachments: YARN-4084.patch > > > For fast compilation one can try to skip the test code compilation by using > {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this > option is used. This is because, it depends on hadoop-yarn-server-tests > project. > Below is the exception : > {noformat} > [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find > attachment with classifier: tests in module project: > org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this > module from the module-set. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..
[ https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713541#comment-14713541 ] Allen Wittenauer commented on YARN-4084: Use -PskipTests in addition to -Dmaven.test.skip=true > Yarn should allow to skip hadoop-yarn-server-tests project from build.. > --- > > Key: YARN-4084 > URL: https://issues.apache.org/jira/browse/YARN-4084 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 2.7.1 >Reporter: Ved Prakash Pandey >Priority: Minor > Attachments: YARN-4084.patch > > > For fast compilation one can try to skip the test code compilation by using > {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this > option is used. This is because, it depends on hadoop-yarn-server-tests > project. > Below is the exception : > {noformat} > [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find > attachment with classifier: tests in module project: > org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this > module from the module-set. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4029) Update LogAggregationStatus to store on finish
[ https://issues.apache.org/jira/browse/YARN-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713076#comment-14713076 ] Bibin A Chundatt commented on YARN-4029: Hi [~xgong], could you please review the attached patch? Also, can we add this jira as a subtask of YARN-431? > Update LogAggregationStatus to store on finish > -- > > Key: YARN-4029 > URL: https://issues.apache.org/jira/browse/YARN-4029 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4029.patch, Image.jpg > > > Currently the log aggregation status is not getting updated to Store. When RM > is restarted will show NOT_START. > Steps to reproduce > > 1.Submit mapreduce application > 2.Wait for completion > 3.Once application is completed switch RM > *Log Aggregation Status* are changing > *Log Aggregation Status* from SUCCESS to NOT_START -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713002#comment-14713002 ] Bibin A Chundatt commented on YARN-3893: Above comments are for https://builds.apache.org/job/PreCommit-YARN-Build/8915/testReport/ > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713001#comment-14713001 ] Bibin A Chundatt commented on YARN-3893: Test failures are not related to this patch. I have looked into the failed testcases: {{hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens}} - due to a Bind exception. {{hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService}} - verified locally, it's working fine and passes. {{hadoop.yarn.server.resourcemanager.TestClientRMService}} - ran locally in Eclipse, it's working fine. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712999#comment-14712999 ] Hadoop QA commented on YARN-3893: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 8s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 51s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 31s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 53m 39s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 93m 18s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752437/0008-YARN-3893.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a4d9acc | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8916/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8916/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8916/console | This message was automatically generated. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712984#comment-14712984 ] Brahma Reddy Battula commented on YARN-3528: Testcase failures are unrelated. {{TestResourceLocalizationService}} is failing while cleaning up dirs: {noformat} Tests run: 13, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.205 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService testPublicResourceInitializesLocalDir(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) Time elapsed: 0.275 sec <<< ERROR! java.lang.IllegalArgumentException: target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/3/filecache/10 does not exist at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1637) at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535) at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270) {noformat} > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712975#comment-14712975 ] Hadoop QA commented on YARN-3528: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 8m 27s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 9 new or modified test files. | | {color:green}+1{color} | javac | 7m 50s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 48s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 3s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 22m 48s | Tests passed in hadoop-common. | | {color:red}-1{color} | yarn tests | 7m 27s | Tests failed in hadoop-yarn-server-nodemanager. | | | | 53m 46s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752441/YARN-3528-006.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / a4d9acc | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8917/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8917/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8917/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8917/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8917/console | This message was automatically generated. > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. 
> * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712974#comment-14712974 ] Hadoop QA commented on YARN-3893: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 32s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 48s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 51s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 29s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 53m 19s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 92m 14s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | | | hadoop.yarn.server.resourcemanager.TestClientRMService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752434/0007-YARN-3893.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a4d9acc | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8915/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8915/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8915/console | This message was automatically generated. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. 
> # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712942#comment-14712942 ] Hadoop QA commented on YARN-3893: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 15s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 59s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 50s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 32s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 52m 9s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 90m 50s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752428/0006-YARN-3893.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a4d9acc | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8914/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8914/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8914/console | This message was automatically generated. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712931#comment-14712931 ] Brahma Reddy Battula commented on YARN-3528: [~rkanter] Sorry for the delay and thanks for pinging. Attached the patch, kindly review. > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3528: --- Attachment: YARN-3528-006.patch > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712896#comment-14712896 ] Varun Saxena commented on YARN-3893: The latest patch, 0008-YARN-3893.patch LGTM. +1 pending Jenkins. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3893: --- Attachment: 0008-YARN-3893.patch > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3893: --- Attachment: 0007-YARN-3893.patch Missed one comment: the {{isRMActive}} check is not required. Attaching the patch again. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, 0007-YARN-3893.patch, yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3893: --- Attachment: 0006-YARN-3893.patch So JVM exit is the conclusion after discussion. Attaching patch based on the same > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > 0006-YARN-3893.patch, yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712767#comment-14712767 ] Varun Saxena commented on YARN-3893: Yes I agree. We can exit JVM directly. No need of using fail fast. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712745#comment-14712745 ] Sunil G commented on YARN-3893: --- As I see it, exiting the JVM is reasonable, as Rohith proposed earlier. In most of these cases the scheduler configuration itself is wrong, so switching to standby or relying on fail-fast is not required. Exiting the JVM directly is clean, and the logs will carry enough information to analyze the configuration failure. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
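To make the proposal concrete, here is a rough sketch of the JVM-exit approach under discussion, assuming a small helper is invoked when refreshAll() throws during transition to active. This is illustrative only and is not the code in the attached patches:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.util.ExitUtil;

/**
 * Illustrative sketch: when a config refresh fails while transitioning to
 * active, log the cause and exit the JVM so the peer RM can take over and
 * the bad configuration is surfaced in the logs.
 */
public final class RefreshFailurePolicy {
  private static final Log LOG = LogFactory.getLog(RefreshFailurePolicy.class);

  private RefreshFailurePolicy() {
  }

  public static void onRefreshFailure(Throwable cause) {
    LOG.fatal("refreshAll() failed during transition to active; exiting RM", cause);
    // ExitUtil rather than System.exit so tests can intercept the termination.
    ExitUtil.terminate(-1, "refreshAll() failed during transition to active");
  }
}
{code}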
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712700#comment-14712700 ] Tsuyoshi Ozawa commented on YARN-3798: -- [~vinodkv] [~zxu] could you check the latest patches? > ZKRMStateStore shouldn't create new session without occurrance of > SESSIONEXPIED > --- > > Key: YARN-3798 > URL: https://issues.apache.org/jira/browse/YARN-3798 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Varun Saxena >Priority: Blocker > Labels: 2.6.1-candidate > Attachments: RM.log, YARN-3798-2.7.002.patch, > YARN-3798-branch-2.6.01.patch, YARN-3798-branch-2.7.002.patch, > YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, > YARN-3798-branch-2.7.005.patch, YARN-3798-branch-2.7.006.patch, > YARN-3798-branch-2.7.patch > > > RM going down with NoNode exception during create of znode for appattempt > *Please find the exception logs* > {code} > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 2015-06-09 10:09:44,886 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Exception while executing a ZK operation. > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) > at java.lang.Thread.run(Thread.java:745) > 2015-06-09 10:09:44,887 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed > out ZK retries. Giving up! > 2015-06-09 10:09:44,887 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error > updating appAttempt: appattempt_1433764310492_7152_01 > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712672#comment-14712672 ] Hadoop QA commented on YARN-2884: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 21m 18s | Pre-patch trunk has 7 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 2s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 31s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 52s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 7m 44s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 53m 29s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 116m 18s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752399/YARN-2884-V11.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a4d9acc | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8913/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8913/console | This message was automatically generated. > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, > YARN-2884-V11.patch, YARN-2884-V2.patch, YARN-2884-V3.patch, > YARN-2884-V4.patch, YARN-2884-V5.patch, YARN-2884-V6.patch, > YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start the AM is forced (via tokens and configuration) to direct > all its requests to a new services running on the NM that provide a proxy to > the central RM. > This give us a place to: > 1) perform distributed scheduling decisions > 2) throttling mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712637#comment-14712637 ] Varun Saxena commented on YARN-3893: In fact, in my view we can crash the RM in all cases if the config is wrong, because until the config is corrected, the RM with the wrong config cannot become active (and hence is unusable). In that case the fail-fast config would not even be required. So should we change the behavior to keep the RM in standby (but up) if fail-fast is set to false? Anyway, we can discuss this in more detail face to face. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712619#comment-14712619 ] Varun Saxena commented on YARN-3893: I do not have any concern about exiting the JVM. If fail-fast is true (the default behavior), the JVM will exit anyway. I was wondering whether it would be semantically appropriate to make the JVM exit in some cases when somebody has explicitly changed the fail-fast config to false. Logs can fill up if yarn-site.xml is wrong on both RMs too. I am not sure about the webapp part though. Does it require the client RM service to be initialized? AFAIK, if the RM is standby it will hit the webapp filter and redirect to the other RM (which may be active). I haven't tested the UI after applying the previous patches, so maybe Bibin can tell. If there are issues with the webapp, we will have to exit the JVM when the transition to standby fails, because there may be no other way out then. I will discuss this further with you offline. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
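For context on the webapp question, here is a simplified sketch of the kind of redirect a standby RM's web filter could perform, using only the generic servlet API; this is not the actual RMWebAppFilter implementation, and the class and field names are illustrative:
{code}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/**
 * Illustrative only: when this RM is standby, redirect the browser to the
 * other RM's web address instead of serving the request locally.
 */
public class StandbyRedirectFilter implements Filter {
  private volatile boolean standby;        // would be driven by the RM HA state
  private volatile String otherRMWebUrl;   // e.g. "http://rm2:8088"

  @Override
  public void init(FilterConfig conf) {
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest httpReq = (HttpServletRequest) req;
    HttpServletResponse httpRes = (HttpServletResponse) res;
    if (standby && otherRMWebUrl != null) {
      // Send the browser to the peer RM rather than serving a standby page.
      httpRes.sendRedirect(otherRMWebUrl + httpReq.getRequestURI());
    } else {
      chain.doFilter(req, res);
    }
  }

  @Override
  public void destroy() {
  }
}
{code}
If this RM later becomes active again, clearing the standby flag lets requests be served locally without restarting the web server.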