[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-05-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555691#comment-14555691
 ] 

Karthik Kambatla commented on YARN-3655:


I would like to take a look at the patch as well.  

 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation 
 -

 Key: YARN-3655
 URL: https://issues.apache.org/jira/browse/YARN-3655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-3655.000.patch, YARN-3655.001.patch, 
 YARN-3655.002.patch


 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation.
 If a node is reserved by an application, all the other applications don't 
 have any chance to assign a new container on this node, unless the 
 application which reserves the node assigns a new container on this node or 
 releases the reserved container on this node.
 The problem is that if an application calls assignReservedContainer and 
 fails to get a new container due to the maxAMShare limitation, it will block 
 all other applications from using the nodes it has reserved. If all other 
 running applications can't release their AM containers because they are 
 blocked by these reserved containers, a livelock can occur.
 The following is the code at FSAppAttempt#assignContainer which can cause 
 this potential livelock.
 {code}
 // Check the AM resource usage for the leaf queue
 if (!isAmRunning() && !getUnmanagedAM()) {
   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
   if (ask.isEmpty() || !getQueue().canRunAppAM(
       ask.get(0).getCapability())) {
     if (LOG.isDebugEnabled()) {
       LOG.debug("Skipping allocation because maxAMShare limit would " +
           "be exceeded");
     }
     return Resources.none();
   }
 }
 {code}
 To fix this issue, we can unreserve the node if we can't allocate the AM 
 container on it due to the maxAMShare limitation and the node is reserved 
 by the application.
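 A minimal sketch of that idea, shown against the snippet above. The helper
 names used here (node.getReservedContainer(), getReservedPriority(),
 unreserve(...)) are assumptions for illustration and are not taken from the
 actual patch:
 {code}
 // Sketch only: if the AM container cannot be allocated because of the
 // maxAMShare limit and this application holds the reservation on the node,
 // release the reservation so other applications can use the node.
 if (ask.isEmpty() || !getQueue().canRunAppAM(ask.get(0).getCapability())) {
   RMContainer reserved = node.getReservedContainer();
   if (reserved != null && reserved.getApplicationAttemptId()
       .equals(getApplicationAttemptId())) {
     unreserve(reserved.getReservedPriority(), node);  // assumed signature
   }
   return Resources.none();
 }
 {code}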



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3594) WintuilsProcessStubExecutor.startStreamReader leaks streams

2015-05-22 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555810#comment-14555810
 ] 

Lars Francke commented on YARN-3594:


I don't think an additional test is needed as it should already be covered by 
existing tests.

 WintuilsProcessStubExecutor.startStreamReader leaks streams
 ---

 Key: YARN-3594
 URL: https://issues.apache.org/jira/browse/YARN-3594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Steve Loughran
Assignee: Lars Francke
Priority: Trivial
  Labels: newbie
 Attachments: YARN-3594.1.patch


 While looking at the file, my IDE highlights that the thread runnables 
 started by 
 {{WintuilsProcessStubExecutor.startStreamReader()}} don't close their streams 
 as they exit.
 A Java 7 try-with-resources would trivially fix this.
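 For illustration, a minimal self-contained sketch of the try-with-resources
 pattern being suggested; the reader setup here is generic and is not the
 actual WintuilsProcessStubExecutor code:
 {code}
 import java.io.BufferedReader;
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.InputStreamReader;
 import java.nio.charset.StandardCharsets;

 public class StreamReaderSketch {
   static Runnable streamReader(final InputStream stream) {
     return new Runnable() {
       @Override
       public void run() {
         // The reader (and the underlying stream) is closed automatically
         // when this block exits, even if readLine() throws.
         try (BufferedReader reader = new BufferedReader(
             new InputStreamReader(stream, StandardCharsets.UTF_8))) {
           String line;
           while ((line = reader.readLine()) != null) {
             System.out.println(line);
           }
         } catch (IOException e) {
           e.printStackTrace();
         }
       }
     };
   }
 }
 {code}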



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure

2015-05-22 Thread Lavkesh Lahngir (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555819#comment-14555819
 ] 

Lavkesh Lahngir commented on YARN-3591:
---

Hmm.. Got your point.
Is the DirectoryCollection class a good place to add newErrorDirs and 
newRepairedDirs?
So finally this is my understanding; please correct me if I am wrong. 
Def:
newErrorDirs - dirs which turned bad from localdirs or fulldirs.
newRepairedDirs - dirs which turned good from errorDirs.
After calling checkLocalizedResources() with localdirs and fulldirs, we can 
call {code}cleanUpLocalDir(lfs, del, localDir);{code} on newRepairedDirs. 
We will put newErrorDirs into the state store so that when the NM restarts it 
can do a cleanup. We also need to remove them from the state store if they 
become repaired.
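To make the intent concrete, a small self-contained sketch (plain sets standing
in for DirectoryCollection state; the names are only illustrative, not actual
NM fields) of how the two new lists could be derived after each disk check:
{code}
import java.util.HashSet;
import java.util.Set;

public class DirDiffSketch {
  // Dirs that were usable (local or full) before but are in the error set now.
  static Set<String> newErrorDirs(Set<String> prevLocalOrFull, Set<String> currErrors) {
    Set<String> result = new HashSet<>(prevLocalOrFull);
    result.retainAll(currErrors);
    return result;
  }

  // Dirs that were in the error set before but are usable again now.
  static Set<String> newRepairedDirs(Set<String> prevErrors, Set<String> currLocalDirs) {
    Set<String> result = new HashSet<>(prevErrors);
    result.retainAll(currLocalDirs);
    return result;
  }
}
{code}
The newRepairedDirs result is what cleanUpLocalDir() would be called on, and the
newErrorDirs result is what would be persisted to the state store.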



 Resource Localisation on a bad disk causes subsequent containers failure 
 -

 Key: YARN-3591
 URL: https://issues.apache.org/jira/browse/YARN-3591
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, 
 YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch


 It happens when a resource is localised on a disk and, after localisation, that 
 disk goes bad. The NM keeps paths for localised resources in memory. At the 
 time of a resource request, isResourcePresent(rsrc) is called, which calls 
 file.exists() on the localised path.
 In some cases when the disk has gone bad, inodes are still cached and 
 file.exists() returns true, but at the time of reading the file will not open.
 Note: file.exists() actually calls stat64 natively, which returns true because 
 it was able to find inode information from the OS.
 A proposal is to call file.list() on the parent path of the resource, which 
 will call open() natively. If the disk is good, it should return an array of 
 paths with length at least 1.
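 A minimal sketch of the proposed check, written as a stand-alone method purely
 for illustration (not the NM's actual isResourcePresent implementation):
 {code}
 import java.io.File;

 public class ResourcePresenceCheckSketch {
   static boolean isResourcePresent(String localizedPath) {
     File file = new File(localizedPath);
     File parent = file.getParentFile();
     if (parent == null) {
       return false;
     }
     // list() triggers a native open() of the directory; on a bad disk it
     // returns null, while file.exists() alone may still return true from
     // cached inode information.
     String[] children = parent.list();
     return children != null && children.length >= 1 && file.exists();
   }

   public static void main(String[] args) {
     System.out.println(isResourcePresent("/tmp/resource.jar"));
   }
 }
 {code}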



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3707) RM Web UI queue filter doesn't work

2015-05-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3707:
-
Attachment: YARN-3707.1.patch

Attached ver.1 patch.

 RM Web UI queue filter doesn't work
 ---

 Key: YARN-3707
 URL: https://issues.apache.org/jira/browse/YARN-3707
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Wangda Tan
Priority: Blocker
 Attachments: YARN-3707.1.patch


 Queues under root cannot be filtered; it looks like YARN-3362 caused this issue. 
 It changed the .q field, so the queue filter cannot get the correct queue name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-05-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556926#comment-14556926
 ] 

Karthik Kambatla commented on YARN-2942:


Thanks for your persistence through the multiple versions of this design, 
Robert. I think we have an actionable plan now, thanks Jason and Vinod for your 
inputs on the JIRA and offline. 

 Aggregated Log Files should be combined
 ---

 Key: YARN-2942
 URL: https://issues.apache.org/jira/browse/YARN-2942
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
 CombinedAggregatedLogsProposal_v6.pdf, CombinedAggregatedLogsProposal_v7.pdf, 
 CompactedAggregatedLogsProposal_v1.pdf, 
 CompactedAggregatedLogsProposal_v2.pdf, 
 ConcatableAggregatedLogsProposal_v4.pdf, 
 ConcatableAggregatedLogsProposal_v5.pdf, 
 ConcatableAggregatedLogsProposal_v8.pdf, YARN-2942-preliminary.001.patch, 
 YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
 YARN-2942.003.patch


 Turning on log aggregation allows users to easily store container logs in 
 HDFS and subsequently view them in the YARN web UIs from a central place.  
 Currently, there is a separate log file for each Node Manager.  This can be a 
 problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
 accumulating many (possibly small) files per YARN application.  The current 
 “solution” for this problem is to configure YARN (actually the JHS) to 
 automatically delete these files after some amount of time.  
 We should improve this by compacting the per-node aggregated log files into 
 one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3705) forcemanual transition of RM active/standby state in automatic-failover mode should change elector state

2015-05-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556946#comment-14556946
 ] 

Karthik Kambatla commented on YARN-3705:


Yarn follows HDFS behavior here. Force-manual is discouraged and messes with 
automatic failover. I understand the proposed improvement would help with this, 
but would like to make sure the behavior is consistent across HDFS and Yarn. 

 forcemanual transition of RM active/standby state in automatic-failover mode 
 should change elector state
 

 Key: YARN-3705
 URL: https://issues.apache.org/jira/browse/YARN-3705
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki

 Executing {{rmadmin -transitionToActive --forcemanual}} and {{rmadmin 
 -transitionToStandby --forcemanual}} in automatic-failover.enabled mode 
 changes the active/standby state of the ResourceManager while keeping the 
 state of the ActiveStandbyElector. It should make the elector quit and 
 rejoin; otherwise, forcemanual transition should not be allowed in 
 automatic-failover mode, in order to avoid confusion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes

2015-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556899#comment-14556899
 ] 

Hadoop QA commented on YARN-3632:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 36s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:red}-1{color} | javac |   7m 32s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | javadoc |   9m 28s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 49s | The applied patch generated  1 
new checkstyle issues (total was 237, now 238). |
| {color:red}-1{color} | whitespace |   0m  2s | The patch has 6  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 14s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  60m 26s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  96m 39s | |
\\
\\
|| Reason || Tests ||
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734921/YARN-3632.7.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f346383 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/8061/artifact/patchprocess/diffJavacWarnings.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8061/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8061/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8061/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8061/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8061/console |


This message was automatically generated.

 Ordering policy should be allowed to reorder an application when demand 
 changes
 ---

 Key: YARN-3632
 URL: https://issues.apache.org/jira/browse/YARN-3632
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, 
 YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch


 At present, ordering policies have the option to have an application 
 re-ordered (for allocation and preemption) when it is allocated to or a 
 container is recovered from the application.  Some ordering policies may also 
 need to reorder when demand changes if that is part of the ordering 
 comparison, this needs to be made available (and used by the 
 fairorderingpolicy when sizebasedweight is true)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2942) Aggregated Log Files should be combined

2015-05-22 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-2942:

Attachment: ConcatableAggregatedLogsProposal_v8.pdf

Uploaded v8 doc based on the latest discussions.  

 Aggregated Log Files should be combined
 ---

 Key: YARN-2942
 URL: https://issues.apache.org/jira/browse/YARN-2942
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
 CombinedAggregatedLogsProposal_v6.pdf, CombinedAggregatedLogsProposal_v7.pdf, 
 CompactedAggregatedLogsProposal_v1.pdf, 
 CompactedAggregatedLogsProposal_v2.pdf, 
 ConcatableAggregatedLogsProposal_v4.pdf, 
 ConcatableAggregatedLogsProposal_v5.pdf, 
 ConcatableAggregatedLogsProposal_v8.pdf, YARN-2942-preliminary.001.patch, 
 YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
 YARN-2942.003.patch


 Turning on log aggregation allows users to easily store container logs in 
 HDFS and subsequently view them in the YARN web UIs from a central place.  
 Currently, there is a separate log file for each Node Manager.  This can be a 
 problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
 accumulating many (possibly small) files per YARN application.  The current 
 “solution” for this problem is to configure YARN (actually the JHS) to 
 automatically delete these files after some amount of time.  
 We should improve this by compacting the per-node aggregated log files into 
 one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3707) RM Web UI queue filter doesn't work

2015-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556927#comment-14556927
 ] 

Hadoop QA commented on YARN-3707:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 41s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 37s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 34s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 23s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 16s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  50m  1s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  86m  6s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734932/YARN-3707.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f346383 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8062/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8062/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8062/console |


This message was automatically generated.

 RM Web UI queue filter doesn't work
 ---

 Key: YARN-3707
 URL: https://issues.apache.org/jira/browse/YARN-3707
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3707.1.patch


 Queues under root cannot be filtered; it looks like YARN-3362 caused this issue. 
 It changed the .q field, so the queue filter cannot get the correct queue name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-22 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556868#comment-14556868
 ] 

MENG DING commented on YARN-1197:
-

So to summarize the current dilemma:

Situation:
- A container resource increase request has been granted, and a token has been 
issued to AM, and
- The increase action has not been fulfilled, and the token is not expired yet

Problem:
- AM can initiate a container resource decrease action to NM, and NM will 
fulfill it and notify RM, and then
- Before the token expires, AM can still initiate a container resource increase 
action to NM with the token, and NM will fulfill it and notify RM

Proposed solution:
- When RM receives a container decrease message from NM, it will first check if 
there is an outstanding container increase action (by checking the 
ContainerResourceIncreaseExpirer)
- If the answer is no, RM will go ahead and update its internal resource 
bookkeeping and reduce the container resource allocation for this container.
- If the answer is yes, RM will skip the resource reduction in this cycle, keep 
the resource decrease message in its newlyDecreasedContainers data structure, 
and check again in the next NM-RM heartbeat cycle.
- If in the next heartbeat, a resource increase message to the same container 
comes, the previous resource decrease message will be dropped.

Not sure if there is a better solution to this problem. Let me know if this 
makes sense or not.
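A rough, self-contained model of the proposed bookkeeping. Plain collections
stand in for the RM's real data structures (e.g. ContainerResourceIncreaseExpirer,
newlyDecreasedContainers); all names here are illustrative only:
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DecreaseVsPendingIncreaseSketch {
  // Containers with a granted but not yet fulfilled increase (token outstanding).
  private final Set<String> pendingIncrease = new HashSet<>();
  // Decrease reports deferred to the next NM-RM heartbeat: containerId -> MB.
  private final Map<String, Integer> deferredDecreases = new HashMap<>();
  // Current RM-side allocation per container, in MB.
  private final Map<String, Integer> allocationMb = new HashMap<>();

  void onDecreaseReported(String containerId, int newMb) {
    if (pendingIncrease.contains(containerId)) {
      // An increase is still outstanding: hold the decrease and re-check
      // on the next heartbeat instead of shrinking the allocation now.
      deferredDecreases.put(containerId, newMb);
    } else {
      allocationMb.put(containerId, newMb);
    }
  }

  void onIncreaseReported(String containerId, int newMb) {
    pendingIncrease.remove(containerId);
    // The fulfilled increase supersedes any decrease deferred earlier.
    deferredDecreases.remove(containerId);
    allocationMb.put(containerId, newMb);
  }
}
{code}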

Thanks,
Meng


 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: mapreduce-project.patch.ver.1, 
 tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, 
 yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, 
 yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
 yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
 yarn-server-resourcemanager.patch.ver.1


 The current YARN resource management logic assumes resource allocated to a 
 container is fixed during the lifetime of it. When users want to change a 
 resource 
 of an allocated container the only way is releasing it and allocating a new 
 container with expected size.
 Allowing run-time changing resources of an allocated container will give us 
 better control of resource usage in application side



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes

2015-05-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556914#comment-14556914
 ] 

Wangda Tan commented on YARN-3632:
--

bq. No, I want to avoid any possible interleaving of locks between the 
application and the queue, getting the ordering policy locks the queue briefly 
and this should not happen inside an application lock.
Makes sense to me; I think both are fine.

bq. The demand is being updated for that queue, I think the naming is clear 
enough.
I'm OK with this. 

Any other comments? [~jianhe].

 Ordering policy should be allowed to reorder an application when demand 
 changes
 ---

 Key: YARN-3632
 URL: https://issues.apache.org/jira/browse/YARN-3632
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, 
 YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch


 At present, ordering policies have the option to have an application 
 re-ordered (for allocation and preemption) when it is allocated to or a 
 container is recovered from the application.  Some ordering policies may also 
 need to reorder when demand changes if that is part of the ordering 
 comparison, this needs to be made available (and used by the 
 fairorderingpolicy when sizebasedweight is true)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2238) filtering on UI sticks even if I move away from the page

2015-05-22 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2238:
--
Attachment: YARN-2238.patch

Uploaded a patch to fix the filter problem.
Since the app table is actually shared between the app page and the scheduler 
page, the main idea is to not preserve the filter state of the data table. 

 filtering on UI sticks even if I move away from the page
 

 Key: YARN-2238
 URL: https://issues.apache.org/jira/browse/YARN-2238
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.4.0
Reporter: Sangjin Lee
  Labels: usability
 Attachments: YARN-2238.patch, filtered.png


 The main data table in many web pages (RM, AM, etc.) seems to show an 
 unexpected filtering behavior.
 If I filter the table by typing something in the key or value field (or I 
 suspect any search field), the data table gets filtered. The example I used 
 is the job configuration page for a MR job. That is expected.
 However, when I move away from that page and visit any other web page of the 
 same type (e.g. a job configuration page), the page is rendered with the 
 filtering! That is unexpected.
 What's even stranger is that it does not render the filtering term. As a 
 result, I have a page that's mysteriously filtered but doesn't tell me what 
 it's filtering on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-22 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556803#comment-14556803
 ] 

MENG DING commented on YARN-1197:
-

Well, I think I spoke too soon :-)

The example I gave above is not entirely correct:

1. A container is currently using 6G
2. AM asks RM to increase it to 8G
3. RM grants the increase request, increases the container's resource allocation 
to 8G, and issues a token to the AM. It starts a timer and remembers the original 
resource allocation before the increase as 6G.
4. AM, instead of initiating the resource increase to NM, requests a resource 
decrease to NM to decrease it to 4G
5. The decrease is successful and RM gets the notification, and updates the 
container resource to 4G
6. Before the token expires, the AM requests the resource increase to NM
7. The increase is successful, RM gets the notification, and updates the 
container resource back to 8G

Steps 6 and 7 should not be allowed because the RM has already reduced the 
container resource to 4G, which effectively invalidated the previously granted 
increase request (8G), even though the token has not yet expired. 




 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: mapreduce-project.patch.ver.1, 
 tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, 
 yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, 
 yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
 yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
 yarn-server-resourcemanager.patch.ver.1


 The current YARN resource management logic assumes resource allocated to a 
 container is fixed during the lifetime of it. When users want to change a 
 resource 
 of an allocated container the only way is releasing it and allocating a new 
 container with expected size.
 Allowing run-time changing resources of an allocated container will give us 
 better control of resource usage in application side



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes

2015-05-22 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556814#comment-14556814
 ] 

Craig Welch commented on YARN-3632:
---

bq. {code} if (application.updateResourceRequests(ask)) { } {code}

No, I want to avoid any possible interleaving of locks between the application 
and the queue, getting the ordering policy locks the queue briefly and this 
should not happen inside an application lock.

bq. {code} updateDemandForQueue {code}

The demand is being updated for that queue, I think the naming is clear enough.



 Ordering policy should be allowed to reorder an application when demand 
 changes
 ---

 Key: YARN-3632
 URL: https://issues.apache.org/jira/browse/YARN-3632
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, 
 YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch


 At present, ordering policies have the option to have an application 
 re-ordered (for allocation and preemption) when it is allocated to or a 
 container is recovered from the application.  Some ordering policies may also 
 need to reorder when demand changes if that is part of the ordering 
 comparison, this needs to be made available (and used by the 
 fairorderingpolicy when sizebasedweight is true)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3701) Isolating the error of generating a single app report when getting all apps from generic history service

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556877#comment-14556877
 ] 

Hudson commented on YARN-3701:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7896 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7896/])
YARN-3701. Isolating the error of generating a single app report when (xgong: 
rev 455b3acf0e82b214e06bd7b538968252945cd3c4)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java


 Isolating the error of generating a single app report when getting all apps 
 from generic history service
 

 Key: YARN-3701
 URL: https://issues.apache.org/jira/browse/YARN-3701
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Fix For: 2.7.1

 Attachments: YARN-3701.1.patch


 Currently, if there is an error generating a single app report when getting the 
 application list from the generic history service, the exception is thrown. 
 Therefore, even if just 1 out of 100 apps has something wrong, the whole 
 app list is broken. The worst impact is making the default page (app list) 
 of the GHS web UI crash, while the REST API /applicationhistory/apps will also break.
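 A generic, self-contained sketch of the per-app isolation pattern described
 here (illustrative only; not the actual ApplicationHistoryManagerOnTimelineStore
 code):
 {code}
 import java.util.Arrays;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;

 public class PerAppErrorIsolationSketch {
   static String buildReport(String appId) {
     if ("app_2".equals(appId)) {
       throw new RuntimeException("corrupt history data for " + appId);
     }
     return "report(" + appId + ")";
   }

   public static void main(String[] args) {
     List<String> appIds = Arrays.asList("app_1", "app_2", "app_3");
     Map<String, String> reports = new HashMap<>();
     for (String appId : appIds) {
       try {
         reports.put(appId, buildReport(appId));
       } catch (RuntimeException e) {
         // Log and skip the bad app instead of failing the whole listing.
         System.err.println("Skipping " + appId + ": " + e.getMessage());
       }
     }
     System.out.println(reports); // app_1 and app_3 are still returned
   }
 }
 {code}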



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3701) Isolating the error of generating a single app report when getting all apps from generic history service

2015-05-22 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556701#comment-14556701
 ] 

Xuan Gong commented on YARN-3701:
-

Makes sense. For getAttempts and getContainers, probably, instead of throwing 
the exception, we could create a blank page saying the attempt/container does 
not exist. We could do that separately. The patch for this ticket is good enough.

+1. Will commit shortly

 Isolating the error of generating a single app report when getting all apps 
 from generic history service
 

 Key: YARN-3701
 URL: https://issues.apache.org/jira/browse/YARN-3701
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Attachments: YARN-3701.1.patch


 Currently, if there is an error generating a single app report when getting the 
 application list from the generic history service, the exception is thrown. 
 Therefore, even if just 1 out of 100 apps has something wrong, the whole 
 app list is broken. The worst impact is making the default page (app list) 
 of the GHS web UI crash, while the REST API /applicationhistory/apps will also break.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes

2015-05-22 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556767#comment-14556767
 ] 

Craig Welch commented on YARN-3632:
---


bq. 1) ...

done

bq. 2) ...

done

bq. 3) ...

done

 Ordering policy should be allowed to reorder an application when demand 
 changes
 ---

 Key: YARN-3632
 URL: https://issues.apache.org/jira/browse/YARN-3632
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, 
 YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch


 At present, ordering policies have the option to have an application 
 re-ordered (for allocation and preemption) when it is allocated to or a 
 container is recovered from the application.  Some ordering policies may also 
 need to reorder when demand changes if that is part of the ordering 
 comparison, this needs to be made available (and used by the 
 fairorderingpolicy when sizebasedweight is true)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes

2015-05-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556807#comment-14556807
 ] 

Wangda Tan commented on YARN-3632:
--

Thanks for the update [~cwelch],
The only comment from my side is that you can still simplify the 
CapacityScheduler changes a little bit, in 
{code}
if (application.updateResourceRequests(ask)) {
}
{code}
You can simply get the queue and call demandUpdated within the if {...} block; 
you don't need to save the allocation as well as the queue object outside of the 
synchronized block, correct?
And the name {{updateDemandForQueue}} sounds like a boolean; maybe renaming it to 
leafQueue would be clear enough.

 Ordering policy should be allowed to reorder an application when demand 
 changes
 ---

 Key: YARN-3632
 URL: https://issues.apache.org/jira/browse/YARN-3632
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, 
 YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch


 At present, ordering policies have the option to have an application 
 re-ordered (for allocation and preemption) when it is allocated to or a 
 container is recovered from the application.  Some ordering policies may also 
 need to reorder when demand changes if that is part of the ordering 
 comparison, this needs to be made available (and used by the 
 fairorderingpolicy when sizebasedweight is true)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3700) ATS Web Performance issue at load time when large number of jobs

2015-05-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556791#comment-14556791
 ] 

Zhijie Shen commented on YARN-3700:
---

[~xgong], thanks for the patch. Some comments below:

1. Actually, not only the webapp will be affected by this config; the REST API and 
application history protocol will be too. Can we rename it to 
yarn.timeline-service.generic-application-history.max-applications ?
{code}
/** Defines how many applications can be loaded into timeline service web ui.*/
public static final String TIMELINE_SERVICE_WEBAPP_MAX_APPS =
    TIMELINE_SERVICE_PREFIX + "webapp.max-applications";
{code}

2. In TestApplicationHistoryClientService, would you please validate that the 
applications are retrieved in descending order according to submission time (if 
I remember it correctly)?

3. I'm thinking whether it would be good to support overriding the default config 
by passing a query param in the URL, such as ?max-applications=100. Thoughts?

 ATS Web Performance issue at load time when large number of jobs
 

 Key: YARN-3700
 URL: https://issues.apache.org/jira/browse/YARN-3700
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3700.1.patch


 Currently, we will load all the apps when we try to load the yarn 
 timelineservice web page. If we have a large number of jobs, it will be very 
 slow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3707) RM Web UI queue filter doesn't work

2015-05-22 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3707:
--
Assignee: Wangda Tan

 RM Web UI queue filter doesn't work
 ---

 Key: YARN-3707
 URL: https://issues.apache.org/jira/browse/YARN-3707
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3707.1.patch


 Queues under root cannot be filtered; it looks like YARN-3362 caused this issue. 
 It changed the .q field, so the queue filter cannot get the correct queue name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3701) Isolating the error of generating a single app report when getting all apps from generic history service

2015-05-22 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556872#comment-14556872
 ] 

Xuan Gong commented on YARN-3701:
-

Committed into trunk/branch-2/branch-2.7. Thanks, zhijie

 Isolating the error of generating a single app report when getting all apps 
 from generic history service
 

 Key: YARN-3701
 URL: https://issues.apache.org/jira/browse/YARN-3701
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Attachments: YARN-3701.1.patch


 Currently, if there is an error generating a single app report when getting the 
 application list from the generic history service, the exception is thrown. 
 Therefore, even if just 1 out of 100 apps has something wrong, the whole 
 app list is broken. The worst impact is making the default page (app list) 
 of the GHS web UI crash, while the REST API /applicationhistory/apps will also break.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes

2015-05-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3632:
-
Target Version/s: 2.8.0

 Ordering policy should be allowed to reorder an application when demand 
 changes
 ---

 Key: YARN-3632
 URL: https://issues.apache.org/jira/browse/YARN-3632
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, 
 YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch


 At present, ordering policies have the option to have an application 
 re-ordered (for allocation and preemption) when it is allocated to or a 
 container is recovered from the application.  Some ordering policies may also 
 need to reorder when demand changes if that is part of the ordering 
 comparison, this needs to be made available (and used by the 
 fairorderingpolicy when sizebasedweight is true)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3707) RM Web UI queue filter doesn't work

2015-05-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556938#comment-14556938
 ] 

Wangda Tan commented on YARN-3707:
--

Failed test is tracked by https://issues.apache.org/jira/browse/YARN-2871

 RM Web UI queue filter doesn't work
 ---

 Key: YARN-3707
 URL: https://issues.apache.org/jira/browse/YARN-3707
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3707.1.patch


 Queues under root cannot be filtered; it looks like YARN-3362 caused this issue. 
 It changed the .q field, so the queue filter cannot get the correct queue name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3707) RM Web UI queue filter doesn't work

2015-05-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556956#comment-14556956
 ] 

Jian He commented on YARN-3707:
---

looks good, +1

 RM Web UI queue filter doesn't work
 ---

 Key: YARN-3707
 URL: https://issues.apache.org/jira/browse/YARN-3707
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3707.1.patch


 Queues under root cannot be filtered; it looks like YARN-3362 caused this issue. 
 It changed the .q field, so the queue filter cannot get the correct queue name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2238) filtering on UI sticks even if I move away from the page

2015-05-22 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2238:
--
Attachment: YARN-2238.patch

 filtering on UI sticks even if I move away from the page
 

 Key: YARN-2238
 URL: https://issues.apache.org/jira/browse/YARN-2238
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Jian He
  Labels: usability
 Attachments: YARN-2238.patch, filtered.png


 The main data table in many web pages (RM, AM, etc.) seems to show an 
 unexpected filtering behavior.
 If I filter the table by typing something in the key or value field (or I 
 suspect any search field), the data table gets filtered. The example I used 
 is the job configuration page for a MR job. That is expected.
 However, when I move away from that page and visit any other web page of the 
 same type (e.g. a job configuration page), the page is rendered with the 
 filtering! That is unexpected.
 What's even stranger is that it does not render the filtering term. As a 
 result, I have a page that's mysteriously filtered but doesn't tell me what 
 it's filtering on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2238) filtering on UI sticks even if I move away from the page

2015-05-22 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2238:
--
Attachment: (was: YARN-2238.patch)

 filtering on UI sticks even if I move away from the page
 

 Key: YARN-2238
 URL: https://issues.apache.org/jira/browse/YARN-2238
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Jian He
  Labels: usability
 Attachments: YARN-2238.patch, filtered.png


 The main data table in many web pages (RM, AM, etc.) seems to show an 
 unexpected filtering behavior.
 If I filter the table by typing something in the key or value field (or I 
 suspect any search field), the data table gets filtered. The example I used 
 is the job configuration page for a MR job. That is expected.
 However, when I move away from that page and visit any other web page of the 
 same type (e.g. a job configuration page), the page is rendered with the 
 filtering! That is unexpected.
 What's even stranger is that it does not render the filtering term. As a 
 result, I have a page that's mysteriously filtered but doesn't tell me what 
 it's filtering on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3676) Disregard 'assignMultiple' directive while scheduling apps with NODE_LOCAL resource requests

2015-05-22 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-3676:
--
Attachment: YARN-3676.5.patch

 Disregard 'assignMultiple' directive while scheduling apps with NODE_LOCAL 
 resource requests
 

 Key: YARN-3676
 URL: https://issues.apache.org/jira/browse/YARN-3676
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Arun Suresh
Assignee: Arun Suresh
 Attachments: YARN-3676.1.patch, YARN-3676.2.patch, YARN-3676.3.patch, 
 YARN-3676.4.patch, YARN-3676.5.patch


 AssignMultiple is generally set to false to prevent overloading a node (e.g. 
 new NMs that have just joined).
 A possible scheduling optimization would be to disregard this directive for 
 apps whose allowed locality is NODE_LOCAL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location

2015-05-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-314:
-

Assignee: Karthik Kambatla

 Schedulers should allow resource requests of different sizes at the same 
 priority and location
 --

 Key: YARN-314
 URL: https://issues.apache.org/jira/browse/YARN-314
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Karthik Kambatla
 Attachments: yarn-314-prelim.patch


 Currently, resource requests for the same container and locality are expected 
 to all be the same size.
 While it doesn't look like it's needed for apps currently, and it can be 
 circumvented by specifying different priorities if absolutely necessary, it 
 seems to me that the ability to request containers with different resource 
 requirements at the same priority level should be there for the future and 
 for completeness' sake.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes

2015-05-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557004#comment-14557004
 ] 

Jian He commented on YARN-3632:
---

looks good to me too.

 Ordering policy should be allowed to reorder an application when demand 
 changes
 ---

 Key: YARN-3632
 URL: https://issues.apache.org/jira/browse/YARN-3632
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, 
 YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch


 At present, ordering policies have the option to have an application 
 re-ordered (for allocation and preemption) when it is allocated to or a 
 container is recovered from the application.  Some ordering policies may also 
 need to reorder when demand changes if that is part of the ordering 
 comparison, this needs to be made available (and used by the 
 fairorderingpolicy when sizebasedweight is true)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3700) ATS Web Performance issue at load time when large number of jobs

2015-05-22 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557007#comment-14557007
 ] 

Xuan Gong commented on YARN-3700:
-

Uploaded a new patch to address all the latest comments.

 ATS Web Performance issue at load time when large number of jobs
 

 Key: YARN-3700
 URL: https://issues.apache.org/jira/browse/YARN-3700
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3700.1.patch, YARN-3700.2.patch


 Currently, we will load all the apps when we try to load the yarn 
 timelineservice web page. If we have a large number of jobs, it will be very 
 slow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-05-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557005#comment-14557005
 ] 

Karthik Kambatla commented on YARN-3655:


Oh, and I found it hard to understand the test. Can we add some documentation 
to clarify what the test is doing? We should essentially test the following 
(a rough model of the three cases is sketched below):
# Container gets reserved when not over maxAMShare
# Container doesn't get reserved when over maxAMShare
# If the maxAMShare were to go down due to fairshare going down, the container 
gets unreserved. 
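A toy, self-contained model of those three outcomes (not FairScheduler code;
resources are plain MB values, maxAMShare is a fraction of the queue's fair
share, and all names are illustrative assumptions):
{code}
public class MaxAmShareReservationModel {
  enum Decision { RESERVE, SKIP_AND_UNRESERVE }

  // Decide whether an AM container of amMb may be reserved, given the current
  // AM resource usage, the queue's fair share, and the maxAMShare fraction.
  static Decision decide(int amUsageMb, int amMb, int fairShareMb, double maxAMShare) {
    boolean overLimit = amUsageMb + amMb > fairShareMb * maxAMShare;
    return overLimit ? Decision.SKIP_AND_UNRESERVE : Decision.RESERVE;
  }

  public static void main(String[] args) {
    // 1. Not over maxAMShare: the AM container may be reserved.
    System.out.println(decide(0, 1024, 8192, 0.5));     // RESERVE
    // 2. Over maxAMShare: the AM container is not reserved.
    System.out.println(decide(3072, 2048, 8192, 0.5));  // SKIP_AND_UNRESERVE
    // 3. Fair share drops, so the effective AM limit drops: an existing
    //    reservation should be released.
    System.out.println(decide(3072, 1024, 4096, 0.5));  // SKIP_AND_UNRESERVE
  }
}
{code}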

 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation 
 -

 Key: YARN-3655
 URL: https://issues.apache.org/jira/browse/YARN-3655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-3655.000.patch, YARN-3655.001.patch, 
 YARN-3655.002.patch


 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation.
 If a node is reserved by an application, all the other applications don't 
 have any chance to assign a new container on this node, unless the 
 application which reserves the node assigns a new container on this node or 
 releases the reserved container on this node.
 The problem is that if an application calls assignReservedContainer and 
 fails to get a new container due to the maxAMShare limitation, it will block 
 all other applications from using the nodes it has reserved. If all other 
 running applications can't release their AM containers because they are 
 blocked by these reserved containers, a livelock can occur.
 The following is the code at FSAppAttempt#assignContainer which can cause 
 this potential livelock.
 {code}
 // Check the AM resource usage for the leaf queue
 if (!isAmRunning() && !getUnmanagedAM()) {
   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
   if (ask.isEmpty() || !getQueue().canRunAppAM(
       ask.get(0).getCapability())) {
     if (LOG.isDebugEnabled()) {
       LOG.debug("Skipping allocation because maxAMShare limit would " +
           "be exceeded");
     }
     return Resources.none();
   }
 }
 {code}
 To fix this issue, we can unreserve the node if we can't allocate the AM 
 container on it due to the maxAMShare limitation and the node is reserved 
 by the application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3700) ATS Web Performance issue at load time when large number of jobs

2015-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557052#comment-14557052
 ] 

Hadoop QA commented on YARN-3700:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 42s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 32s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m 43s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 25s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 29s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |   3m  2s | Tests failed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-server-common. |
| | |  47m 25s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryManagerOnTimelineStore
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734964/YARN-3700.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 446d515 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8064/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8064/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8064/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8064/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8064/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8064/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8064/console |


This message was automatically generated.

 ATS Web Performance issue at load time when large number of jobs
 

 Key: YARN-3700
 URL: https://issues.apache.org/jira/browse/YARN-3700
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3700.1.patch, YARN-3700.2.patch


 Currently, we will load all the apps when we try to load the yarn 
 timelineservice web page. If we have a large number of jobs, it will be very 
 slow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2238) filtering on UI sticks even if I move away from the page

2015-05-22 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557022#comment-14557022
 ] 

Xuan Gong commented on YARN-2238:
-

Tested locally, the patch works fine.

+1 LGTM. Will commit later if no objection.

[~Naganarasimha] [~sjlee0] Could you take a look at this patch, too ?

 filtering on UI sticks even if I move away from the page
 

 Key: YARN-2238
 URL: https://issues.apache.org/jira/browse/YARN-2238
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Jian He
  Labels: usability
 Attachments: YARN-2238.patch, filtered.png


 The main data table in many web pages (RM, AM, etc.) seems to show an 
 unexpected filtering behavior.
 If I filter the table by typing something in the key or value field (or I 
 suspect any search field), the data table gets filtered. The example I used 
 is the job configuration page for a MR job. That is expected.
 However, when I move away from that page and visit any other web page of the 
 same type (e.g. a job configuration page), the page is rendered with the 
 filtering! That is unexpected.
 What's even stranger is that it does not render the filtering term. As a 
 result, I have a page that's mysteriously filtered but doesn't tell me what 
 it's filtering on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2238) filtering on UI sticks even if I move away from the page

2015-05-22 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2238:
--
Target Version/s: 2.7.1  (was: 2.8.0)

 filtering on UI sticks even if I move away from the page
 

 Key: YARN-2238
 URL: https://issues.apache.org/jira/browse/YARN-2238
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Jian He
  Labels: usability
 Attachments: YARN-2238.patch, filtered.png


 The main data table in many web pages (RM, AM, etc.) seems to show an 
 unexpected filtering behavior.
 If I filter the table by typing something in the key or value field (or I 
 suspect any search field), the data table gets filtered. The example I used 
 is the job configuration page for a MR job. That is expected.
 However, when I move away from that page and visit any other web page of the 
 same type (e.g. a job configuration page), the page is rendered with the 
 filtering! That is unexpected.
 What's even stranger is that it does not render the filtering term. As a 
 result, I have a page that's mysteriously filtered but doesn't tell me what 
 it's filtering on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3700) ATS Web Performance issue at load time when large number of jobs

2015-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557096#comment-14557096
 ] 

Hadoop QA commented on YARN-3700:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 36s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 32s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 35s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m 44s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  2s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 21s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 22s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   3m  9s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-server-common. |
| | |  47m 13s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734975/YARN-3700.2.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 446d515 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8065/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8065/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8065/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8065/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8065/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8065/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8065/console |


This message was automatically generated.

 ATS Web Performance issue at load time when large number of jobs
 

 Key: YARN-3700
 URL: https://issues.apache.org/jira/browse/YARN-3700
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3700.1.patch, YARN-3700.2.1.patch, YARN-3700.2.patch


 Currently, we load all the apps when we try to load the yarn timelineservice 
 web page. If we have a large number of jobs, it will be very slow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2238) filtering on UI sticks even if I move away from the page

2015-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557011#comment-14557011
 ] 

Hadoop QA commented on YARN-2238:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 41s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 32s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 52s | The applied patch generated  1 
new checkstyle issues (total was 18, now 18). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 23s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| | |  38m 33s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734957/YARN-2238.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 446d515 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8063/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8063/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8063/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8063/console |


This message was automatically generated.

 filtering on UI sticks even if I move away from the page
 

 Key: YARN-2238
 URL: https://issues.apache.org/jira/browse/YARN-2238
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Jian He
  Labels: usability
 Attachments: YARN-2238.patch, filtered.png


 The main data table in many web pages (RM, AM, etc.) seems to show an 
 unexpected filtering behavior.
 If I filter the table by typing something in the key or value field (or I 
 suspect any search field), the data table gets filtered. The example I used 
 is the job configuration page for a MR job. That is expected.
 However, when I move away from that page and visit any other web page of the 
 same type (e.g. a job configuration page), the page is rendered with the 
 filtering! That is unexpected.
 What's even stranger is that it does not render the filtering term. As a 
 result, I have a page that's mysteriously filtered but doesn't tell me what 
 it's filtering on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3705) forcemanual transition of RM active/standby state in automatic-failover mode should change elector state

2015-05-22 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557067#comment-14557067
 ] 

Masatake Iwasaki commented on YARN-3705:


Thanks for the comment, [~kasha]. I expected that the ZKFailoverController (running 
as an external process of the NameNode) detects the status change of the NameNode and 
quits the election in the HDFS case. I will check the behaviour in HDFS again.

 forcemanual transition of RM active/standby state in automatic-failover mode 
 should change elector state
 

 Key: YARN-3705
 URL: https://issues.apache.org/jira/browse/YARN-3705
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki

 Executing {{rmadmin -transitionToActive --forcemanual}} and {{rmadmin 
 -transitionToStandby --forcemanual}} in automatic-failover.enabled mode 
 changes the active/standby state of the ResourceManager while keeping the state of 
 the ActiveStandbyElector. It should make the elector quit and rejoin; otherwise 
 forcemanual transition should not be allowed in automatic-failover mode, in 
 order to avoid confusion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3700) ATS Web Performance issue at load time when large number of jobs

2015-05-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3700:

Attachment: YARN-3700.2.1.patch

fix the testcase failure

 ATS Web Performance issue at load time when large number of jobs
 

 Key: YARN-3700
 URL: https://issues.apache.org/jira/browse/YARN-3700
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3700.1.patch, YARN-3700.2.1.patch, YARN-3700.2.patch


 Currently, we load all the apps when we try to load the yarn timelineservice 
 web page. If we have a large number of jobs, it will be very slow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation

2015-05-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557003#comment-14557003
 ] 

Karthik Kambatla commented on YARN-3655:


Comments on the patch:
# okToUnreserve 
## It was a little hard to wrap my head around. Can we negate it and call it 
{{isValidReservation(FSSchedulerNode)}}? 
## Can we get rid of the if-else and have a simple {{return hasContainerForNode 
&& fitsInMaxShare && !isOverAMShareLimit}}? (A minimal sketch of this follows the 
comments below.)
# Add an {{if (isValidReservation)}} check in {{FSAppAttempt#reserve}} so all 
the reservation logic stays in one place? 
# In {{FSAppAttempt#assignContainer(node, request, nodeType, reserved)}}, 
## We can get rid of the fitsInMaxShare check immediately preceding the call to 
{{reserve}}.
## Given {{if (fitsIn(capability, available))}}-block ends in return, we don't 
need to put the continuation in else. 
# While adding this check in {{FSAppAttempt#assignContainer(node)}} might work 
in practice, it somehow feels out of place. Also, assignReservedContainer could 
also lead to a reservation? 
# Instead of calling {{okToUnreserve}}/{{!isValidReservation}} in 
{{FairScheduler#attemptScheduling}}, we should likely add it as the first check 
in {{FSAppAttempt#assignReservedContainer}}.
# Looks like assign-multiple is broken with reserved-containers. The while-loop 
for assign-multiple should look at both reserved and un-reserved containers 
assigned. Can we file a follow-up JIRA to fix this?  
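To make comment 1 concrete, here is a minimal sketch of the negated helper; the helper names ({{hasContainerForNode}}, {{fitsInMaxShare}}, {{isOverAMShareLimit}}) and their parameters are assumptions based on the discussion above, not the actual FSAppAttempt code:
{code}
// Hypothetical sketch: a single predicate replacing the if-else in
// okToUnreserve, combining the three checks discussed above.
private boolean isValidReservation(FSSchedulerNode node) {
  return hasContainerForNode(node) && fitsInMaxShare(node)
      && !isOverAMShareLimit();
}
{code}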

 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation 
 -

 Key: YARN-3655
 URL: https://issues.apache.org/jira/browse/YARN-3655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-3655.000.patch, YARN-3655.001.patch, 
 YARN-3655.002.patch


 FairScheduler: potential livelock due to maxAMShare limitation and container 
 reservation.
 If a node is reserved by an application, all the other applications don't 
 have any chance to assign a new container on this node, unless the 
 application which reserves the node assigns a new container on this node or 
 releases the reserved container on this node.
 The problem is that if an application calls assignReservedContainer and fails to 
 get a new container due to the maxAMShare limitation, it will block all other 
 applications from using the nodes it reserves. If all other running applications 
 can't release their AM containers because they are blocked by these reserved 
 containers, a livelock situation can happen.
 The following is the code at FSAppAttempt#assignContainer which can cause 
 this potential livelock.
 {code}
 // Check the AM resource usage for the leaf queue
 if (!isAmRunning() && !getUnmanagedAM()) {
   List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
   if (ask.isEmpty() || !getQueue().canRunAppAM(
       ask.get(0).getCapability())) {
     if (LOG.isDebugEnabled()) {
       LOG.debug("Skipping allocation because maxAMShare limit would " +
           "be exceeded");
     }
     return Resources.none();
   }
 }
 {code}
 To fix this issue, we can unreserve the node if we can't allocate the AM 
 container on the node due to Max AM share limitation and the node is reserved 
 by the application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3700) ATS Web Performance issue at load time when large number of jobs

2015-05-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3700:

Attachment: YARN-3700.2.patch

 ATS Web Performance issue at load time when large number of jobs
 

 Key: YARN-3700
 URL: https://issues.apache.org/jira/browse/YARN-3700
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3700.1.patch, YARN-3700.2.patch


 Currently, we load all the apps when we try to load the yarn timelineservice 
 web page. If we have a large number of jobs, it will be very slow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes

2015-05-22 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3632:
--
Attachment: YARN-3632.7.patch

 Ordering policy should be allowed to reorder an application when demand 
 changes
 ---

 Key: YARN-3632
 URL: https://issues.apache.org/jira/browse/YARN-3632
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, 
 YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch


 At present, ordering policies have the option to have an application re-ordered 
 (for allocation and preemption) when it is allocated to, or a container is 
 recovered from, the application. Some ordering policies may also need to reorder 
 when demand changes if that is part of the ordering comparison; this needs to be 
 made available (and used by the fairorderingpolicy when sizebasedweight is true).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3676) Disregard 'assignMultiple' directive while scheduling apps with NODE_LOCAL resource requests

2015-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556858#comment-14556858
 ] 

Hadoop QA commented on YARN-3676:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 34s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 30s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 33s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 45s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 17s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  49m 56s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  86m  8s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
|   | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734915/YARN-3676.5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f346383 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8060/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8060/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8060/console |


This message was automatically generated.

 Disregard 'assignMultiple' directive while scheduling apps with NODE_LOCAL 
 resource requests
 

 Key: YARN-3676
 URL: https://issues.apache.org/jira/browse/YARN-3676
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Arun Suresh
Assignee: Arun Suresh
 Attachments: YARN-3676.1.patch, YARN-3676.2.patch, YARN-3676.3.patch, 
 YARN-3676.4.patch, YARN-3676.5.patch


 AssignMultiple is generally set to false to prevent overloading a node (e.g., 
 new NMs that have just joined).
 A possible scheduling optimization would be to disregard this directive for 
 apps whose allowed locality is NODE_LOCAL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes

2015-05-22 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556922#comment-14556922
 ] 

Craig Welch commented on YARN-3632:
---

BTW, the whitespace and checkstyle look to be unimportant, the javac unrelated, 
and TestNodeLabelContainerAllocation passes fine for me with the patch so it is 
also unrelated.

 Ordering policy should be allowed to reorder an application when demand 
 changes
 ---

 Key: YARN-3632
 URL: https://issues.apache.org/jira/browse/YARN-3632
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, 
 YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch


 At present, ordering policies have the option to have an application re-ordered 
 (for allocation and preemption) when it is allocated to, or a container is 
 recovered from, the application. Some ordering policies may also need to reorder 
 when demand changes if that is part of the ordering comparison; this needs to be 
 made available (and used by the fairorderingpolicy when sizebasedweight is true).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2238) filtering on UI sticks even if I move away from the page

2015-05-22 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned YARN-2238:
-

Assignee: Jian He

 filtering on UI sticks even if I move away from the page
 

 Key: YARN-2238
 URL: https://issues.apache.org/jira/browse/YARN-2238
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Jian He
  Labels: usability
 Attachments: YARN-2238.patch, filtered.png


 The main data table in many web pages (RM, AM, etc.) seems to show an 
 unexpected filtering behavior.
 If I filter the table by typing something in the key or value field (or I 
 suspect any search field), the data table gets filtered. The example I used 
 is the job configuration page for a MR job. That is expected.
 However, when I move away from that page and visit any other web page of the 
 same type (e.g. a job configuration page), the page is rendered with the 
 filtering! That is unexpected.
 What's even stranger is that it does not render the filtering term. As a 
 result, I have a page that's mysteriously filtered but doesn't tell me what 
 it's filtering on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3707) RM Web UI queue filter doesn't work

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556963#comment-14556963
 ] 

Hudson commented on YARN-3707:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7897 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7897/])
YARN-3707. RM Web UI queue filter doesn't work. Contributed by Wangda Tan 
(jianhe: rev 446d51591e6e99cc60a85c4b9fbac379a8caa49d)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java


 RM Web UI queue filter doesn't work
 ---

 Key: YARN-3707
 URL: https://issues.apache.org/jira/browse/YARN-3707
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker
 Fix For: 2.8.0

 Attachments: YARN-3707.1.patch


 It cannot filter queues under root; it looks like YARN-3362 caused this issue. 
 It changed the .q field so that the queue filter cannot get the correct queue name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3706) Generalize native HBase writer for additional tables

2015-05-22 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557149#comment-14557149
 ] 

Joep Rottinghuis commented on YARN-3706:


[~vrushalic] if we convert column names to lower case (to ensure that they stay 
lower case, in the face of potential later changes), shouldn't we also convert 
_all_ column names to lower case and cleanse them of separators?
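For illustration, a minimal sketch of the kind of sanitization being discussed; the method name and the separator value are made up, not the actual YARN-3411 constants:
{code}
// Hypothetical sketch: lower-case a column qualifier and strip the separator
// before it is used as an HBase column name.
static String sanitizeColumnName(String name, String separator) {
  return name.toLowerCase(java.util.Locale.ENGLISH).replace(separator, "");
}
// e.g. sanitizeColumnName("MAP_SLOT_MILLIS!total", "!") returns "map_slot_millistotal"
{code}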

 Generalize native HBase writer for additional tables
 

 Key: YARN-3706
 URL: https://issues.apache.org/jira/browse/YARN-3706
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Minor

 When reviewing YARN-3411 we noticed that we could change the class hierarchy 
 a little in order to accommodate additional tables easily.
 In order to get ready for benchmark testing we left the original layout in 
 place, as performance would not be impacted by the code hierarchy.
 Here is a separate jira to address the hierarchy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3706) Generalize native HBase writer for additional tables

2015-05-22 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557155#comment-14557155
 ] 

Vrushali C commented on YARN-3706:
--

Hmm, yes, that would be a good thing to do. It would be a one-way conversion 
though, which should be fine, but it is something we need to be aware of while 
generating the response. 
We don't do that in hraven for metrics and config, and I think similarly in the 
YARN-3411 patch we are not doing that for config and metrics. 


 Generalize native HBase writer for additional tables
 

 Key: YARN-3706
 URL: https://issues.apache.org/jira/browse/YARN-3706
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Minor

 When reviewing YARN-3411 we noticed that we could change the class hierarchy 
 a little in order to accommodate additional tables easily.
 In order to get ready for benchmark testing we left the original layout in 
 place, as performance would not be impacted by the code hierarchy.
 Here is a separate jira to address the hierarchy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3594) WintuilsProcessStubExecutor.startStreamReader leaks streams

2015-05-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555964#comment-14555964
 ] 

Junping Du commented on YARN-3594:
--

+1. Patch LGTM. Committing this in.

 WintuilsProcessStubExecutor.startStreamReader leaks streams
 ---

 Key: YARN-3594
 URL: https://issues.apache.org/jira/browse/YARN-3594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Steve Loughran
Assignee: Lars Francke
Priority: Trivial
  Labels: newbie
 Attachments: YARN-3594.1.patch


 while looking at the file, my IDE highlights that the thread runnables 
 started by 
 {{WintuilsProcessStubExecutor.startStreamReader()}} don't close their streams 
 as they exit.
 a java7 try-with-resources would trivially fix this
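 As an illustration only, a minimal sketch of the try-with-resources pattern being suggested; the stream field and logging are assumptions, not the actual startStreamReader() code:
 {code}
 // Hypothetical sketch: the reader is closed automatically when the runnable
 // exits, even if readLine() throws. Assumes java.io.* and
 // java.nio.charset.StandardCharsets imports, an InputStream field named
 // stream, and a LOG object.
 public void run() {
   try (BufferedReader reader = new BufferedReader(
       new InputStreamReader(stream, StandardCharsets.UTF_8))) {
     String line;
     while ((line = reader.readLine()) != null) {
       LOG.debug(line); // forward the child process output to the NM log
     }
   } catch (IOException e) {
     LOG.warn("Error reading stream", e);
   }
 }
 {code}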



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.

2015-05-22 Thread Priyank Rastogi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Priyank Rastogi updated YARN-3543:
--
Labels:   (was: BB2015-05-TBR)

 ApplicationReport should be able to tell whether the Application is AM 
 managed or not. 
 ---

 Key: YARN-3543
 URL: https://issues.apache.org/jira/browse/YARN-3543
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.6.0
Reporter: Spandan Dutta
Assignee: Rohith
 Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 
 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 
 0004-YARN-3543.patch, 0004-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG


 Currently we can know whether the application submitted by the user is AM 
 managed from the applicationSubmissionContext. This can only be done at the 
 time the user submits the job. We should have access to this info from the 
 ApplicationReport as well, so that we can check whether an app is AM managed 
 or not at any time.
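 For context, a minimal sketch of the current situation described above, i.e. the flag is only visible on the submission context at submit time (the client variable names are illustrative):
 {code}
 // Hypothetical sketch: today the unmanaged-AM flag can only be read from the
 // ApplicationSubmissionContext at submission time.
 YarnClientApplication app = yarnClient.createApplication();
 ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
 boolean unmanagedAM = ctx.getUnmanagedAM(); // not exposed via ApplicationReport today
 {code}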



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2832) Wrong Check Logic of NodeHealthCheckerService Causes Latent Errors

2015-05-22 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K resolved YARN-2832.
-
Resolution: Duplicate

It is fixed as part of YARN-3375, closing as duplicate.

 Wrong Check Logic of NodeHealthCheckerService Causes Latent Errors
 --

 Key: YARN-2832
 URL: https://issues.apache.org/jira/browse/YARN-2832
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1, 2.5.1
 Environment: Any environment
Reporter: Tianyin Xu
 Attachments: health.check.service.1.patch


 NodeManager allows users to specify the health checker script that will be 
 invoked by the health-checker service via the configuration parameter, 
 _yarn.nodemanager.health-checker.script.path_ 
 During the _serviceInit()_ of the health-check service, NM checks whether the 
 parameter is set correctly using _shouldRun()_, as follows,
 {code:title=/* NodeHealthCheckerService.java */|borderStyle=solid}
   protected void serviceInit(Configuration conf) throws Exception {
 if (NodeHealthScriptRunner.shouldRun(conf)) {
   nodeHealthScriptRunner = new NodeHealthScriptRunner();
   addService(nodeHealthScriptRunner);
 }
 addService(dirsHandler);
 super.serviceInit(conf);
   }
 {code}
 The problem is that if the parameter is misconfigured (e.g., permission 
 problem, wrong path), NM does not log any message to inform users, which could 
 cause latent errors or mysterious problems (e.g., why does my script not work?).
 I see the checking and printing logic is put in the _serviceStart()_ function in 
 _NodeHealthScriptRunner.java_ (see the following code snippet). However, this 
 logic is flawed: for an incorrect parameter that does not pass the shouldRun 
 check, _serviceStart()_ would never be called, because the 
 _NodeHealthScriptRunner_ instance never gets the chance to be created (see 
 the code snippet above).
 {code:title=/* NodeHealthScriptRunner.java */|borderStyle=solid}
   protected void serviceStart() throws Exception {
 // if health script path is not configured don't start the thread.
 if (!shouldRun(conf)) {
   LOG.info("Not starting node health monitor");
   return;
 }
 ... 
   }  
 {code}
 Basically, I think the checking and printing logic should be put in the 
 serviceInit() in NodeHealthCheckerService instead of serviceStart() in 
 NodeHealthScriptRunner.
 See the attachment for the simple patch.
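 For illustration, here is a minimal sketch of the proposed placement of the check; this is not the attached patch, and the exact log message is made up:
 {code:title=/* NodeHealthCheckerService.java (sketch) */|borderStyle=solid}
   protected void serviceInit(Configuration conf) throws Exception {
     if (NodeHealthScriptRunner.shouldRun(conf)) {
       nodeHealthScriptRunner = new NodeHealthScriptRunner();
       addService(nodeHealthScriptRunner);
     } else {
       // log here so a misconfigured health-checker script path is reported
       LOG.info("Not starting node health monitor: the health checker script"
           + " is not configured, not found or not executable");
     }
     addService(dirsHandler);
     super.serviceInit(conf);
   }
 {code}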



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure

2015-05-22 Thread Lavkesh Lahngir (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555996#comment-14555996
 ] 

Lavkesh Lahngir commented on YARN-3591:
---

For adding newErrorDirs do we have to create a new protobuf message and 
implement methods for storing and loading in all statestores?


 Resource Localisation on a bad disk causes subsequent containers failure 
 -

 Key: YARN-3591
 URL: https://issues.apache.org/jira/browse/YARN-3591
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, 
 YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch


 It happens when a resource is localised on a disk and, after localising, that 
 disk has gone bad. NM keeps paths for localised resources in memory. At the 
 time of a resource request, isResourcePresent(rsrc) will be called, which calls 
 file.exists() on the localised path.
 In some cases when the disk has gone bad, inodes are still cached and 
 file.exists() returns true. But at the time of reading, the file will not open.
 Note: file.exists() actually calls stat64 natively, which returns true because 
 it was able to find inode information from the OS.
 A proposal is to call file.list() on the parent path of the resource, which 
 will call open() natively. If the disk is good it should return an array of 
 paths with length at least 1.
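 For illustration, a minimal sketch of the proposed check (the method and variable names here are illustrative, not the actual NodeManager code):
 {code}
 // Hypothetical sketch: File#list() on the parent directory forces a native
 // open(), so it fails on a bad disk where a cached stat64 would still let
 // file.exists() succeed. Assumes java.io.File.
 private boolean isResourcePresent(File localizedPath) {
   File parent = localizedPath.getParentFile();
   String[] children = (parent == null) ? null : parent.list();
   // on a good disk the parent listing contains at least the resource itself
   return children != null && children.length >= 1 && localizedPath.exists();
 }
 {code}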



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure

2015-05-22 Thread Lavkesh Lahngir (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555961#comment-14555961
 ] 

Lavkesh Lahngir commented on YARN-3591:
---

typo:  cleanUpLocalDir(lfs, del, newRepairedDirs);


 Resource Localisation on a bad disk causes subsequent containers failure 
 -

 Key: YARN-3591
 URL: https://issues.apache.org/jira/browse/YARN-3591
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, 
 YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch


 It happens when a resource is localised on a disk and, after localising, that 
 disk has gone bad. NM keeps paths for localised resources in memory. At the 
 time of a resource request, isResourcePresent(rsrc) will be called, which calls 
 file.exists() on the localised path.
 In some cases when the disk has gone bad, inodes are still cached and 
 file.exists() returns true. But at the time of reading, the file will not open.
 Note: file.exists() actually calls stat64 natively, which returns true because 
 it was able to find inode information from the OS.
 A proposal is to call file.list() on the parent path of the resource, which 
 will call open() natively. If the disk is good it should return an array of 
 paths with length at least 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3704) Container Launch fails with exitcode 127 with DefaultContainerExecutor

2015-05-22 Thread Devaraj K (JIRA)
Devaraj K created YARN-3704:
---

 Summary: Container Launch fails with exitcode 127 with 
DefaultContainerExecutor
 Key: YARN-3704
 URL: https://issues.apache.org/jira/browse/YARN-3704
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.0, 3.0.0
Reporter: Devaraj K
Priority: Minor


Please find the below NM log when the issue occurs.

{code:xml}
2015-05-22 08:08:53,165 WARN 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code 
from container container_1432208816246_0930_01_37 is : 127
2015-05-22 08:08:53,166 WARN 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception 
from container-launch with container ID: container_1432208816246_0930_01_37 
and exit code: 127
ExitCodeException exitCode=127:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-05-22 08:08:53,179 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
2015-05-22 08:08:53,179 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1432208816246_0930_01_37
2015-05-22 08:08:53,179 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 127
2015-05-22 08:08:53,179 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=127:
2015-05-22 08:08:53,179 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
2015-05-22 08:08:53,179 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.run(Shell.java:456)
2015-05-22 08:08:53,179 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
2015-05-22 08:08:53,179 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
2015-05-22 08:08:53,179 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
2015-05-22 08:08:53,180 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
2015-05-22 08:08:53,180 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.FutureTask.run(FutureTask.java:262)
2015-05-22 08:08:53,180 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
2015-05-22 08:08:53,180 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
2015-05-22 08:08:53,180 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.lang.Thread.run(Thread.java:745)
2015-05-22 08:08:53,180 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Container exited with a non-zero exit code 127
2015-05-22 08:08:53,180 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1432208816246_0930_01_37 transitioned from RUNNING to 
EXITED_WITH_FAILURE
2015-05-22 08:08:53,180 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Cleaning up container container_1432208816246_0930_01_37
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-05-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556037#comment-14556037
 ] 

Junping Du commented on YARN-41:


Thanks [~devaraj.k] for updating the patch! Just finish my review.
Two major comments:
{code}
+// the isStopped check is for avoiding multiple unregistrations.
+if (this.registeredWithRM && !this.isStopped && !isNMUnderSupervision()) {
+  unRegisterNM();
+}
{code}
According to the discussion above, I think we need to check 
yarn.nodemanager.recovery.enabled as well: even if the NM is under supervision, 
if NM recovery is disabled we should still unregister the NM from the RM.
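For illustration, a minimal sketch of the combined condition (a sketch only, reusing the field names from the snippet above; YarnConfiguration.NM_RECOVERY_ENABLED maps to yarn.nodemanager.recovery.enabled):
{code}
// Only skip unregistration when NM recovery is enabled AND the NM is running
// under supervision; otherwise unregister from the RM as usual.
boolean recoveryEnabled = conf.getBoolean(
    YarnConfiguration.NM_RECOVERY_ENABLED,
    YarnConfiguration.DEFAULT_NM_RECOVERY_ENABLED);
if (this.registeredWithRM && !this.isStopped
    && !(recoveryEnabled && isNMUnderSupervision())) {
  unRegisterNM();
}
{code}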

{code}
  .addTransition(NodeState.RUNNING, NodeState.DECOMMISSIONED,
- RMNodeEventType.DECOMMISSION,
+ EnumSet.of(RMNodeEventType.DECOMMISSION, RMNodeEventType.SHUTDOWN),
{code}
So, the node after shutdown will end up with DECOMMISSIONED as its final state? I 
don't think we expect these nodes to show up in the DECOMMISSIONED list, do we? 
Maybe we should have a new NodeState such as SHUTDOWN for this case. This could 
make the change incompatible, at least for behavior and UI. We may need to mark 
this JIRA as incompatible and document these changes somewhere when the patch is done.

Some minor comments:
Add tests for new PB objects UnRegisterNodeManagerRequestPBImpl, 
UnRegisterNodeManagerResponsePBImpl into TestYarnServerApiClasses.java.

{code}
catch (YarnException e) {
+  throw new ServiceException(e);
+} catch (IOException e) {
+  throw new ServiceException(e);
+}
{code}
Better to replace with
{code}
catch (YarnException | IOException e) {
throw new ServiceException(e);
}
{code} 

{code}
+  @Test
+  public void testUnRegisterNodeManager() throws Exception {
+UnRegisterNodeManagerRequest request = recordFactory
+.newRecordInstance(UnRegisterNodeManagerRequest.class);
+assertNotNull(client.unRegisterNodeManager(request));
+
+ResourceTrackerTestImpl.exception = true;
+try {
+  client.unRegisterNodeManager(request);
+  fail("there should be YarnException");
+} catch (YarnException e) {
+  assertTrue(e.getMessage().startsWith(testMessage));
+} finally {
+  ResourceTrackerTestImpl.exception = false;
+}
+  }
{code}
If another exception gets thrown here with the wrong message, the test would still 
pass, wouldn't it? It would be better to catch all exceptions and check whether it 
is a YarnException. 
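For illustration, a minimal sketch of the stricter check (a sketch only, reusing the names from the test snippet above):
{code}
try {
  client.unRegisterNodeManager(request);
  fail("there should be a YarnException");
} catch (Exception e) {
  // catch everything, then assert both the exception type and the message
  assertTrue(e instanceof YarnException);
  assertTrue(e.getMessage().startsWith(testMessage));
} finally {
  ResourceTrackerTestImpl.exception = false;
}
{code}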

{code}
+  private void unRegisterNM() {
+RecordFactory recordFactory = RecordFactoryPBImpl.get();
+UnRegisterNodeManagerRequest request = recordFactory
+.newRecordInstance(UnRegisterNodeManagerRequest.class);
+request.setNodeId(this.nodeId);
+try {
+  resourceTracker.unRegisterNodeManager(request);
+  LOG.info("Successfully Unregistered the Node with ResourceManager");
+} catch (Exception e) {
+  LOG.warn("Unregistration of Node failed.", e);
+}
+  }
{code}
Putting the nodeId in the log could help with troubleshooting; also add the 
missing period to the log message.

 The RM should handle the graceful shutdown of the NM.
 -

 Key: YARN-41
 URL: https://issues.apache.org/jira/browse/YARN-41
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Ravi Teja Ch N V
Assignee: Devaraj K
 Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
 MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
 YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41.patch


 Instead of waiting for the NM expiry, RM should remove and handle the NM, 
 which is shutdown gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3594) WintuilsProcessStubExecutor.startStreamReader leaks streams

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556000#comment-14556000
 ] 

Hudson commented on YARN-3594:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7892 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7892/])
YARN-3594. WintuilsProcessStubExecutor.startStreamReader leaks streams. 
Contributed by Lars Francke. (junping_du: rev 
132d909d4a6509af9e63e24cbb719be10006b6cd)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java


 WintuilsProcessStubExecutor.startStreamReader leaks streams
 ---

 Key: YARN-3594
 URL: https://issues.apache.org/jira/browse/YARN-3594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Steve Loughran
Assignee: Lars Francke
Priority: Trivial
  Labels: newbie
 Fix For: 2.8.0

 Attachments: YARN-3594.1.patch


 while looking at the file, my IDE highlights that the thread runnables 
 started by 
 {{WintuilsProcessStubExecutor.startStreamReader()}} don't close their streams 
 as they exit.
 a java7 try-with-resources would trivially fix this



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3703) Container Launch fails with exitcode 2 with DefaultContainerExecutor

2015-05-22 Thread Devaraj K (JIRA)
Devaraj K created YARN-3703:
---

 Summary: Container Launch fails with exitcode 2 with 
DefaultContainerExecutor
 Key: YARN-3703
 URL: https://issues.apache.org/jira/browse/YARN-3703
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.0, 3.0.0
Reporter: Devaraj K
Priority: Minor


Please find the below NM log when the issue occurs.

{code:xml}
2015-05-21 20:14:53,907 WARN 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code 
from container container_1432208816246_0225_01_34 is : 2
2015-05-21 20:14:53,908 WARN 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception 
from container-launch with container ID: container_1432208816246_0225_01_34 
and exit code: 2
ExitCodeException exitCode=2:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-05-21 20:14:53,910 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
2015-05-21 20:14:53,910 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1432208816246_0225_01_34
2015-05-21 20:14:53,910 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 2
2015-05-21 20:14:53,910 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=2:
2015-05-21 20:14:53,910 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
2015-05-21 20:14:53,910 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.run(Shell.java:456)
2015-05-21 20:14:53,910 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
2015-05-21 20:14:53,910 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
2015-05-21 20:14:53,910 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
2015-05-21 20:14:53,910 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
2015-05-21 20:14:53,910 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.FutureTask.run(FutureTask.java:262)
2015-05-21 20:14:53,910 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
2015-05-21 20:14:53,910 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
2015-05-21 20:14:53,910 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.lang.Thread.run(Thread.java:745)
2015-05-21 20:14:53,910 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Container exited with a non-zero exit code 2
2015-05-21 20:14:53,911 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1432208816246_0225_01_34 transitioned from RUNNING to 
EXITED_WITH_FAILURE
2015-05-21 20:14:53,911 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Cleaning up container container_1432208816246_0225_01_34
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3477) TimelineClientImpl swallows exceptions

2015-05-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3477:
-
Labels:   (was: BB2015-05-TBR)

 TimelineClientImpl swallows exceptions
 --

 Key: YARN-3477
 URL: https://issues.apache.org/jira/browse/YARN-3477
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0, 2.7.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-3477-001.patch, YARN-3477-002.patch


 If timeline client fails more than the retry count, the original exception is 
 not thrown. Instead some runtime exception is raised saying retries run out
 # the failing exception should be rethrown, ideally via 
 NetUtils.wrapException to include URL of the failing endpoing
 # Otherwise, the raised RTE should (a) state that URL and (b) set the 
 original fault as the inner cause



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-05-22 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556098#comment-14556098
 ] 

Devaraj K commented on YARN-41:
---

Thanks a lot [~djp] for review and for the comments.

bq. So, the node after shutdown will end up with DECOMMISSIONED as its final state? 
I don't think we expect these nodes to show up in the DECOMMISSIONED list, do we? 
Maybe we should have a new NodeState such as SHUTDOWN for this case.

{code:xml}Shouldn't be counting shut-down nodes in LOST. Adding a new state is 
perhaps an over-kill, DECOMMISSIONED is the closest I can think of.{code}

I initially got the above comment from Vinod about the node state. I feel we can 
proceed here with the DECOMMISSIONED state as Vinod suggested, and it will not 
create any compatibility issue. Adding a new state and the UI change can be done as 
per your suggestions in a follow-up JIRA, and can be implemented for trunk without 
causing a compatibility issue in the 2.x versions. Please post your comments on this.


bq. If another exception gets thrown here with the wrong message, the test would 
still pass, wouldn't it? It would be better to catch all exceptions and check 
whether it is a YarnException.
I don't see any problem with this test. If any other exception gets thrown here, 
it would be thrown directly from the test, and the test would fail with that 
exception anyway. Please correct me if I am missing anything here.

 The RM should handle the graceful shutdown of the NM.
 -

 Key: YARN-41
 URL: https://issues.apache.org/jira/browse/YARN-41
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Ravi Teja Ch N V
Assignee: Devaraj K
 Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
 MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
 YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41.patch


 Instead of waiting for the NM expiry, RM should remove and handle the NM, 
 which is shutdown gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1391) Lost node list should be identify by NodeId

2015-05-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1391:
-
Labels:   (was: BB2015-05-TBR)

 Lost node list should be identify by NodeId
 ---

 Key: YARN-1391
 URL: https://issues.apache.org/jira/browse/YARN-1391
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-1391.v1.patch, YARN-1391.v2.patch


 in case of multiple node managers on a single machine. each of them should be 
 identified by NodeId, which is more unique than just host name



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1426) YARN Components need to unregister their beans upon shutdown

2015-05-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1426:
-
Labels:   (was: BB2015-05-TBR)

 YARN Components need to unregister their beans upon shutdown
 

 Key: YARN-1426
 URL: https://issues.apache.org/jira/browse/YARN-1426
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.0.0, 2.3.0
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-1426.2.patch, YARN-1426.patch, YARN-1426.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-05-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556170#comment-14556170
 ] 

Junping Du commented on YARN-41:


bq. I initially got the above comment from Vinod about the node state. I feel we 
can proceed here with the DECOMMISSIONED state as Vinod suggested, and it will 
not create any compatibility issue.
Sorry, I must have missed that comment before. 
OK. I think we are talking about UI and behavior compatibility rather than API 
compatibility. The former we can tolerate within major releases, but the latter we 
cannot. Neither way breaks API compatibility, since NodeState is marked UNSTABLE 
and we were just adding a DECOMMISSIONING state, weren't we?
The compatibility issue I mentioned above is still there as a behavior change: 
users (and management tools) could be confused that, after this patch, a node 
will show up in the decommissioned list after a normal shutdown. For a long time, 
decommissioning has meant that we want to retire some nodes, so the RM will reject 
the registration of those nodes when they are restarted (unless we explicitly 
remove them from the list); that is not the case after this patch. I agree with 
Vinod that LOST is pretty far off here, but DECOMMISSIONED is also not ideal, I think.
Thoughts?
 


 The RM should handle the graceful shutdown of the NM.
 -

 Key: YARN-41
 URL: https://issues.apache.org/jira/browse/YARN-41
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Ravi Teja Ch N V
Assignee: Devaraj K
 Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
 MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
 YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41.patch


 Instead of waiting for the NM expiry, RM should remove and handle the NM, 
 which is shutdown gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556095#comment-14556095
 ] 

Hudson commented on YARN-3675:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #204 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/204/])
YARN-3675. FairScheduler: RM quits when node removal races with 
continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 
4513761869c732cf2f462763043067ebf8749df7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 FairScheduler: RM quits when node removal races with continousscheduling on 
 the same node
 -

 Key: YARN-3675
 URL: https://issues.apache.org/jira/browse/YARN-3675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical
 Fix For: 2.7.1

 Attachments: YARN-3675.001.patch, YARN-3675.002.patch, 
 YARN-3675.003.patch


 With continuous scheduling, scheduling can be done on a node that has just 
 been removed, causing errors like the ones below.
 {noformat}
 12:28:53.782 AM FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
 Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
   at java.lang.Thread.run(Thread.java:745)
 12:28:53.783 AMINFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye..
 {noformat}
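 For context, below is a small standalone sketch of the kind of guard that avoids 
 this NPE. It is illustrative only and does not reproduce the actual YARN-3675 
 patch; the class, field, and method names are made up. The idea is simply that a 
 handler which looks up a node that a racing NODE_REMOVED event may already have 
 deleted must tolerate a null result instead of dereferencing it and killing the 
 event dispatcher.
 {code}
 import java.util.Map;
 import java.util.concurrent.ConcurrentHashMap;

 // Standalone, hypothetical sketch -- not the actual FairScheduler fix.
 public class UnreserveGuardSketch {
   private final Map<String, String> nodes = new ConcurrentHashMap<>();

   void addNode(String nodeId) { nodes.put(nodeId, "reservation on " + nodeId); }
   void removeNode(String nodeId) { nodes.remove(nodeId); }

   // Returns true if the reservation was released, false if the node was already gone.
   boolean unreserveIfPresent(String nodeId) {
     String reservation = nodes.get(nodeId);
     if (reservation == null) {
       // Racing removal: log and skip instead of throwing the NPE shown above.
       System.out.println("Node " + nodeId + " already removed; skipping unreserve");
       return false;
     }
     System.out.println("Released " + reservation);
     return true;
   }

   public static void main(String[] args) {
     UnreserveGuardSketch s = new UnreserveGuardSketch();
     s.addNode("host1:8041");
     s.removeNode("host1:8041");          // simulates the racing node removal
     s.unreserveIfPresent("host1:8041");  // safely skips instead of crashing the RM
   }
 }
 {code}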



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556093#comment-14556093
 ] 

Hudson commented on YARN-3646:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #204 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/204/])
YARN-3646. Applications are getting stuck some times in case of retry (devaraj: 
rev 0305316d6932e6f1a05021354d77b6934e57e171)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java


 Applications are getting stuck some times in case of retry policy forever
 -

 Key: YARN-3646
 URL: https://issues.apache.org/jira/browse/YARN-3646
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Raju Bairishetti
Assignee: Raju Bairishetti
 Fix For: 2.7.1

 Attachments: YARN-3646.001.patch, YARN-3646.002.patch, YARN-3646.patch


 We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER 
 retry policy.
 The YARN client retries infinitely on exceptions from the RM because its retry 
 policy is FOREVER. The problem is that it retries for all kinds of exceptions 
 (such as ApplicationNotFoundException), even when the failure is not a 
 connection failure. Because of this, my application does not progress any 
 further.
 *The YARN client should not retry infinitely for non-connection failures.*
 We have written a simple yarn-client which tries to get an application report 
 for an invalid or old appId. The ResourceManager throws an 
 ApplicationNotFoundException because the appId is invalid or old, but because 
 of the FOREVER retry policy the client keeps retrying getApplicationReport and 
 the ResourceManager keeps throwing ApplicationNotFoundException.
 {code}
 private void testYarnClientRetryPolicy() throws Exception {
   YarnConfiguration conf = new YarnConfiguration();
   conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
   YarnClient yarnClient = YarnClient.createYarnClient();
   yarnClient.init(conf);
   yarnClient.start();
   ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
   ApplicationReport report = yarnClient.getApplicationReport(appId);
 }
 {code}
 *RM logs:*
 {noformat}
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875162 Retry#0
 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
 with id 'application_1430126768987_10645' doesn't exist in RM.
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
 
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875163 Retry#0
 
 {noformat}
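 As a side note, the behaviour being asked for can be pictured with a small 
 standalone sketch: keep retrying only while the failure is a connection 
 problem, and let everything else (such as ApplicationNotFoundException) 
 propagate immediately. This is illustrative only and is not the RMProxy change 
 that was actually committed for this issue; the class and method names are 
 made up.
 {code}
 import java.net.ConnectException;
 import java.util.concurrent.Callable;

 // Standalone, hypothetical sketch -- not the actual RMProxy retry policy.
 public final class RetryOnlyOnConnectFailure {
   public static <T> T call(Callable<T> action, long sleepMs) throws Exception {
     while (true) {
       try {
         return action.call();
       } catch (ConnectException e) {
         // Connection-level failure: the RM may simply be down, so wait and retry.
         Thread.sleep(sleepMs);
       }
       // Any other exception propagates to the caller instead of looping forever.
     }
   }

   public static void main(String[] args) throws Exception {
     // Toy usage: the callable fails once with a ConnectException, then succeeds.
     final int[] attempts = {0};
     String result = call(() -> {
       if (attempts[0]++ == 0) {
         throw new ConnectException("RM not reachable yet");
       }
       return "report";
     }, 100L);
     System.out.println("Got " + result + " after " + attempts[0] + " attempts");
   }
 }
 {code}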



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1126) Add validation of users input nodes-states options to nodes CLI

2015-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556156#comment-14556156
 ] 

Hadoop QA commented on YARN-1126:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12697127/YARN-905-addendum.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 55ed655 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8056/console |


This message was automatically generated.

 Add validation of users input nodes-states options to nodes CLI
 ---

 Key: YARN-1126
 URL: https://issues.apache.org/jira/browse/YARN-1126
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-905-addendum.patch


 Follow the discussion in YARN-905.
 (1) Case-insensitive checks for all states.
 (2) Validation of user input: exit with a non-zero code and print all valid 
 states when the user gives an invalid state.
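 A standalone sketch of the validation described above might look like the 
 following. It is illustrative only (NodeState is stubbed with a local enum so 
 the example compiles on its own) and is not the YARN-1126 patch itself.
 {code}
 import java.util.Arrays;
 import java.util.Locale;

 // Standalone, hypothetical sketch -- not the actual NodeCLI change.
 public class NodeStateArgSketch {
   enum NodeState { NEW, RUNNING, UNHEALTHY, DECOMMISSIONED, LOST, REBOOTED }

   static NodeState parseOrExit(String arg) {
     try {
       // Case-insensitive match against the enum values.
       return NodeState.valueOf(arg.trim().toUpperCase(Locale.ENGLISH));
     } catch (IllegalArgumentException e) {
       System.err.println("Invalid node state: " + arg);
       System.err.println("Valid states are: " + Arrays.toString(NodeState.values()));
       System.exit(1);     // non-zero exit code on invalid input
       return null;        // unreachable in practice; keeps the compiler happy
     }
   }

   public static void main(String[] args) {
     System.out.println(parseOrExit(args.length > 0 ? args[0] : "running"));
   }
 }
 {code}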



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3594) WintuilsProcessStubExecutor.startStreamReader leaks streams

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556083#comment-14556083
 ] 

Hudson commented on YARN-3594:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #204 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/204/])
YARN-3594. WintuilsProcessStubExecutor.startStreamReader leaks streams. 
Contributed by Lars Francke. (junping_du: rev 
132d909d4a6509af9e63e24cbb719be10006b6cd)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt


 WintuilsProcessStubExecutor.startStreamReader leaks streams
 ---

 Key: YARN-3594
 URL: https://issues.apache.org/jira/browse/YARN-3594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Steve Loughran
Assignee: Lars Francke
Priority: Trivial
  Labels: newbie
 Fix For: 2.8.0

 Attachments: YARN-3594.1.patch


 While looking at the file, my IDE highlights that the thread runnables started 
 by {{WintuilsProcessStubExecutor.startStreamReader()}} don't close their 
 streams as they exit.
 A Java 7 try-with-resources would trivially fix this.
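 For illustration, a standalone sketch of the suggested try-with-resources shape 
 is shown below; it is not copied from the actual WindowsSecureContainerExecutor 
 change, and the helper names are made up.
 {code}
 import java.io.BufferedReader;
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.InputStreamReader;
 import java.nio.charset.StandardCharsets;

 // Standalone, hypothetical sketch -- not the committed patch.
 public final class StreamReaderSketch {
   static Thread startStreamReader(final InputStream stream) {
     Thread t = new Thread(() -> {
       // try-with-resources guarantees the reader (and the underlying stream)
       // is closed when the loop ends or an IOException is thrown.
       try (BufferedReader lines =
                new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8))) {
         String line;
         while ((line = lines.readLine()) != null) {
           System.out.println(line);
         }
       } catch (IOException e) {
         System.err.println("Stream reader terminated: " + e.getMessage());
       }
     }, "stream-reader");
     t.start();
     return t;
   }

   public static void main(String[] args) throws Exception {
     startStreamReader(new java.io.ByteArrayInputStream(
         "hello\nworld\n".getBytes(StandardCharsets.UTF_8))).join();
   }
 }
 {code}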



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3694) Fix dead link for TimelineServer REST API

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556094#comment-14556094
 ] 

Hudson commented on YARN-3694:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #204 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/204/])
YARN-3694. Fix dead link for TimelineServer REST API. Contributed by Jagadesh 
Kiran N. (aajisaka: rev a5def580879428bc7af3c030ef33554e0519f072)
* hadoop-yarn-project/CHANGES.txt
* hadoop-project/src/site/site.xml


 Fix dead link for TimelineServer REST API
 -

 Key: YARN-3694
 URL: https://issues.apache.org/jira/browse/YARN-3694
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Jagadesh Kiran N
Priority: Minor
  Labels: newbie
 Fix For: 2.7.1

 Attachments: YARN-3694.patch


 There is a dead link in the index.
 {code:title=hadoop-project/src/site/site.xml}
   <item name="Timeline Server"
         href="TimelineServer.html#Timeline_Server_REST_API_v1"/>
 {code}
 should be fixed as
 {code}
   <item name="Timeline Server"
         href="hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Timeline_Server_REST_API_v1"/>
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3684) Change ContainerExecutor's primary lifecycle methods to use a more extensible mechanism for passing information.

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556092#comment-14556092
 ] 

Hudson commented on YARN-3684:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #204 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/204/])
YARN-3684. Changed ContainerExecutor's primary lifecycle methods to use a more 
extensible mechanism of context objects. Contributed by Sidharta Seethana. 
(vinodkv: rev 53fafcf061616516c24e2e2007a66a93d23d3e25)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerSignalContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerLivenessContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/LocalizerStartContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerReacquisitionContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutorWithMocks.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerStartContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/DeletionAsUserContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDeletionService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* 

[jira] [Commented] (YARN-3684) Change ContainerExecutor's primary lifecycle methods to use a more extensible mechanism for passing information.

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556078#comment-14556078
 ] 

Hudson commented on YARN-3684:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #935 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/935/])
YARN-3684. Changed ContainerExecutor's primary lifecycle methods to use a more 
extensible mechanism of context objects. Contributed by Sidharta Seethana. 
(vinodkv: rev 53fafcf061616516c24e2e2007a66a93d23d3e25)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerStartContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerReacquisitionContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerSignalContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutorWithMocks.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/LocalizerStartContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDeletionService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/DeletionAsUserContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerLivenessContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java
* 

[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556081#comment-14556081
 ] 

Hudson commented on YARN-3675:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #935 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/935/])
YARN-3675. FairScheduler: RM quits when node removal races with 
continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 
4513761869c732cf2f462763043067ebf8749df7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


 FairScheduler: RM quits when node removal races with continousscheduling on 
 the same node
 -

 Key: YARN-3675
 URL: https://issues.apache.org/jira/browse/YARN-3675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical
 Fix For: 2.7.1

 Attachments: YARN-3675.001.patch, YARN-3675.002.patch, 
 YARN-3675.003.patch


 With continuous scheduling, scheduling can be done on a node that has just 
 been removed, causing errors like the ones below.
 {noformat}
 12:28:53.782 AM FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
 Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
   at java.lang.Thread.run(Thread.java:745)
 12:28:53.783 AMINFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye..
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556079#comment-14556079
 ] 

Hudson commented on YARN-3646:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #935 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/935/])
YARN-3646. Applications are getting stuck some times in case of retry (devaraj: 
rev 0305316d6932e6f1a05021354d77b6934e57e171)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java


 Applications are getting stuck some times in case of retry policy forever
 -

 Key: YARN-3646
 URL: https://issues.apache.org/jira/browse/YARN-3646
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Raju Bairishetti
Assignee: Raju Bairishetti
 Fix For: 2.7.1

 Attachments: YARN-3646.001.patch, YARN-3646.002.patch, YARN-3646.patch


 We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER 
 retry policy.
 The YARN client retries infinitely on exceptions from the RM because its retry 
 policy is FOREVER. The problem is that it retries for all kinds of exceptions 
 (such as ApplicationNotFoundException), even when the failure is not a 
 connection failure. Because of this, my application does not progress any 
 further.
 *The YARN client should not retry infinitely for non-connection failures.*
 We have written a simple yarn-client which tries to get an application report 
 for an invalid or old appId. The ResourceManager throws an 
 ApplicationNotFoundException because the appId is invalid or old, but because 
 of the FOREVER retry policy the client keeps retrying getApplicationReport and 
 the ResourceManager keeps throwing ApplicationNotFoundException.
 {code}
 private void testYarnClientRetryPolicy() throws Exception {
   YarnConfiguration conf = new YarnConfiguration();
   conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
   YarnClient yarnClient = YarnClient.createYarnClient();
   yarnClient.init(conf);
   yarnClient.start();
   ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
   ApplicationReport report = yarnClient.getApplicationReport(appId);
 }
 {code}
 *RM logs:*
 {noformat}
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875162 Retry#0
 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
 with id 'application_1430126768987_10645' doesn't exist in RM.
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
 
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875163 Retry#0
 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3594) WintuilsProcessStubExecutor.startStreamReader leaks streams

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556069#comment-14556069
 ] 

Hudson commented on YARN-3594:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #935 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/935/])
YARN-3594. WintuilsProcessStubExecutor.startStreamReader leaks streams. 
Contributed by Lars Francke. (junping_du: rev 
132d909d4a6509af9e63e24cbb719be10006b6cd)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java


 WintuilsProcessStubExecutor.startStreamReader leaks streams
 ---

 Key: YARN-3594
 URL: https://issues.apache.org/jira/browse/YARN-3594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Steve Loughran
Assignee: Lars Francke
Priority: Trivial
  Labels: newbie
 Fix For: 2.8.0

 Attachments: YARN-3594.1.patch


 While looking at the file, my IDE highlights that the thread runnables started 
 by {{WintuilsProcessStubExecutor.startStreamReader()}} don't close their 
 streams as they exit.
 A Java 7 try-with-resources would trivially fix this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3694) Fix dead link for TimelineServer REST API

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556080#comment-14556080
 ] 

Hudson commented on YARN-3694:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #935 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/935/])
YARN-3694. Fix dead link for TimelineServer REST API. Contributed by Jagadesh 
Kiran N. (aajisaka: rev a5def580879428bc7af3c030ef33554e0519f072)
* hadoop-project/src/site/site.xml
* hadoop-yarn-project/CHANGES.txt


 Fix dead link for TimelineServer REST API
 -

 Key: YARN-3694
 URL: https://issues.apache.org/jira/browse/YARN-3694
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Jagadesh Kiran N
Priority: Minor
  Labels: newbie
 Fix For: 2.7.1

 Attachments: YARN-3694.patch


 There is a dead link in the index.
 {code:title=hadoop-project/src/site/site.xml}
   <item name="Timeline Server"
         href="TimelineServer.html#Timeline_Server_REST_API_v1"/>
 {code}
 should be fixed as
 {code}
   <item name="Timeline Server"
         href="hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Timeline_Server_REST_API_v1"/>
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3705) forcemanual transition of RM active/standby state in automatic-failover mode should change elector state

2015-05-22 Thread Masatake Iwasaki (JIRA)
Masatake Iwasaki created YARN-3705:
--

 Summary: forcemanual transition of RM active/standby state in 
automatic-failover mode should change elector state
 Key: YARN-3705
 URL: https://issues.apache.org/jira/browse/YARN-3705
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki


Executing {{rmadmin -transitionToActive --forcemanual}} and {{rmadmin 
-transitionToStandby --forcemanual}} in automatic-failover.enabled mode changes 
the active/standby state of the ResourceManager while keeping the state of the 
ActiveStandbyElector unchanged. It should make the elector quit and rejoin; 
otherwise, forcemanual transition should not be allowed in automatic-failover 
mode, in order to avoid confusion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1126) Add validation of users input nodes-states options to nodes CLI

2015-05-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556118#comment-14556118
 ] 

Junping Du commented on YARN-1126:
--

The patch has been uploaded for a long time. Re-kicking the Jenkins test to see 
if it still applies.

 Add validation of users input nodes-states options to nodes CLI
 ---

 Key: YARN-1126
 URL: https://issues.apache.org/jira/browse/YARN-1126
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-905-addendum.patch


 Follow the discussion in YARN-905.
 (1) Case-insensitive checks for all states.
 (2) Validation of user input: exit with a non-zero code and print all valid 
 states when the user gives an invalid state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1126) Add validation of users input nodes-states options to nodes CLI

2015-05-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1126:
-
Labels:   (was: BB2015-05-TBR)

 Add validation of users input nodes-states options to nodes CLI
 ---

 Key: YARN-1126
 URL: https://issues.apache.org/jira/browse/YARN-1126
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-905-addendum.patch


 Follow the discussion in YARN-905.
 (1) Case-insensitive checks for all states.
 (2) Validation of user input: exit with a non-zero code and print all valid 
 states when the user gives an invalid state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3644) Node manager shuts down if unable to connect with RM

2015-05-22 Thread Raju Bairishetti (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raju Bairishetti reassigned YARN-3644:
--

Assignee: Raju Bairishetti

 Node manager shuts down if unable to connect with RM
 

 Key: YARN-3644
 URL: https://issues.apache.org/jira/browse/YARN-3644
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Srikanth Sundarrajan
Assignee: Raju Bairishetti

 When the NM is unable to connect to the RM, the NM shuts itself down.
 {code}
   } catch (ConnectException e) {
     // catch and throw the exception if we tried for the MAX wait time to
     // connect to the RM
     dispatcher.getEventHandler().handle(
         new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
     throw new YarnRuntimeException(e);
 {code}
 In large clusters, if the RM is down for maintenance for a longer period, all 
 the NMs shut themselves down, requiring additional work to bring them back up.
 Setting yarn.resourcemanager.connect.wait-ms to -1 has other side effects: 
 non-connection failures are then retried infinitely by all YarnClients (via 
 RMProxy).
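 To illustrate the alternative the report implies, here is a small standalone 
 sketch that keeps retrying registration with a capped exponential backoff 
 instead of shutting the NM down. The names and numbers are made up for 
 illustration; this is not the change eventually made for this issue.
 {code}
 import java.net.ConnectException;

 // Standalone, hypothetical sketch -- not the actual NodeStatusUpdater code.
 public class RegisterWithBackoffSketch {

   // Stand-in for the real "register with RM" RPC; the test one below fails a few times.
   interface Registration { void attempt() throws ConnectException; }

   static void registerUntilConnected(Registration reg, long initialSleepMs, long maxSleepMs)
       throws InterruptedException {
     long sleepMs = initialSleepMs;
     while (true) {
       try {
         reg.attempt();
         return;                                       // connected: carry on normally
       } catch (ConnectException e) {
         System.out.println("RM unreachable (" + e.getMessage()
             + "); retrying in " + sleepMs + " ms instead of shutting down");
         Thread.sleep(sleepMs);
         sleepMs = Math.min(sleepMs * 2, maxSleepMs);  // capped exponential backoff
       }
     }
   }

   public static void main(String[] args) throws InterruptedException {
     final int[] remainingFailures = {3};
     registerUntilConnected(() -> {
       if (remainingFailures[0]-- > 0) {
         throw new ConnectException("maintenance window");
       }
     }, 100, 1000);
     System.out.println("Registered with the RM");
   }
 }
 {code}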



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1515) Provide ContainerManagementProtocol#signalContainer processing a batch of signals

2015-05-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1515:
-
Target Version/s:   (was: 2.6.0)

 Provide ContainerManagementProtocol#signalContainer processing a batch of 
 signals 
 --

 Key: YARN-1515
 URL: https://issues.apache.org/jira/browse/YARN-1515
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, nodemanager
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: YARN-1515.v01.patch, YARN-1515.v02.patch, 
 YARN-1515.v03.patch, YARN-1515.v04.patch, YARN-1515.v05.patch, 
 YARN-1515.v06.patch, YARN-1515.v07.patch, YARN-1515.v08.patch


 This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for 
 timed-out task attempts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556334#comment-14556334
 ] 

Hudson commented on YARN-3675:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2151 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2151/])
YARN-3675. FairScheduler: RM quits when node removal races with 
continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 
4513761869c732cf2f462763043067ebf8749df7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* hadoop-yarn-project/CHANGES.txt


 FairScheduler: RM quits when node removal races with continousscheduling on 
 the same node
 -

 Key: YARN-3675
 URL: https://issues.apache.org/jira/browse/YARN-3675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical
 Fix For: 2.7.1

 Attachments: YARN-3675.001.patch, YARN-3675.002.patch, 
 YARN-3675.003.patch


 With continuous scheduling, scheduling can be done on a node that has just 
 been removed, causing errors like the ones below.
 {noformat}
 12:28:53.782 AM FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
 Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
   at java.lang.Thread.run(Thread.java:745)
 12:28:53.783 AMINFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye..
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1426) YARN Components need to unregister their beans upon shutdown

2015-05-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556183#comment-14556183
 ] 

Junping Du commented on YARN-1426:
--

Thanks [~jeagles] for delivering the patch! The patch LGTM overall.
Just one minor issue:
{code}
-  new RMNMInfo(rmContext, scheduler);
+  rmNMInfo = new RMNMInfo(rmContext, scheduler);
+  rmNMInfo.registerMBean();
{code}
I think we may prefer to move {{rmNMInfo.registerMBean();}} to serviceStart() 
instead of serviceInit(). Theoretically, serviceStop() should undo the things we 
did in serviceStart(), shouldn't it?
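To illustrate the symmetry being asked for, here is a small standalone sketch 
that registers a bean in serviceStart() and unregisters it in serviceStop(); the 
bean and service names are made up, and this is not the YARN-1426 patch.
{code}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Standalone, hypothetical sketch -- not the actual RMNMInfo change.
public class MBeanLifecycleSketch {
  public interface DemoInfoMBean {      // JMX standard MBean naming: <Impl>MBean
    int getLiveNodes();
  }

  public static class DemoInfo implements DemoInfoMBean {
    @Override public int getLiveNodes() { return 42; }
  }

  private final MBeanServer server = ManagementFactory.getPlatformMBeanServer();
  private ObjectName name;

  public void serviceStart() throws Exception {
    name = new ObjectName("Hadoop:service=ResourceManager,name=DemoInfo");
    server.registerMBean(new DemoInfo(), name);   // done in start ...
  }

  public void serviceStop() throws Exception {
    if (name != null && server.isRegistered(name)) {
      server.unregisterMBean(name);               // ... and undone in stop
      name = null;
    }
  }

  public static void main(String[] args) throws Exception {
    MBeanLifecycleSketch s = new MBeanLifecycleSketch();
    s.serviceStart();
    s.serviceStop();
    System.out.println("MBean registered on start and unregistered on stop");
  }
}
{code}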

 YARN Components need to unregister their beans upon shutdown
 

 Key: YARN-1426
 URL: https://issues.apache.org/jira/browse/YARN-1426
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.0.0, 2.3.0
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-1426.2.patch, YARN-1426.patch, YARN-1426.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1743) Decorate event transitions and the event-types with their behaviour

2015-05-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1743:
-
Labels: documentation  (was: BB2015-05-TBR documentation)

 Decorate event transitions and the event-types with their behaviour
 ---

 Key: YARN-1743
 URL: https://issues.apache.org/jira/browse/YARN-1743
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Jeff Zhang
  Labels: documentation
 Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743-2.patch, 
 YARN-1743-3.patch, YARN-1743.patch


 Helps to annotate the transitions with (start-state, end-state) pairs and the 
 events with (source, destination) pairs.
 Beyond readability, we may also use them to generate the event diagrams 
 across components.
 Not a blocker for 0.23, but let's see.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3706) Generalize native HBase writer for additional tables

2015-05-22 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556277#comment-14556277
 ] 

Joep Rottinghuis commented on YARN-3706:


[~sjlee0] I've created this jira and will upload a partial initial patch with 
the layout of the code. I can't seem to assign this jira to myself. Could you 
please do so for me? cc [~vrushalic]

 Generalize native HBase writer for additional tables
 

 Key: YARN-3706
 URL: https://issues.apache.org/jira/browse/YARN-3706
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Joep Rottinghuis
Priority: Minor

 When reviewing YARN-3411 we noticed that we could change the class hierarchy 
 a little in order to accommodate additional tables easily.
 In order to get ready for benchmark testing we left the original layout in 
 place, as performance would not be impacted by the code hierarchy.
 Here is a separate jira to address the hierarchy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1515) Provide ContainerManagementProtocol#signalContainer processing a batch of signals

2015-05-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1515:
-
Labels:   (was: BB2015-05-TBR)

 Provide ContainerManagementProtocol#signalContainer processing a batch of 
 signals 
 --

 Key: YARN-1515
 URL: https://issues.apache.org/jira/browse/YARN-1515
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, nodemanager
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: YARN-1515.v01.patch, YARN-1515.v02.patch, 
 YARN-1515.v03.patch, YARN-1515.v04.patch, YARN-1515.v05.patch, 
 YARN-1515.v06.patch, YARN-1515.v07.patch, YARN-1515.v08.patch


 This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for 
 timed-out task attempts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3694) Fix dead link for TimelineServer REST API

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556200#comment-14556200
 ] 

Hudson commented on YARN-3694:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2133 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2133/])
YARN-3694. Fix dead link for TimelineServer REST API. Contributed by Jagadesh 
Kiran N. (aajisaka: rev a5def580879428bc7af3c030ef33554e0519f072)
* hadoop-project/src/site/site.xml
* hadoop-yarn-project/CHANGES.txt


 Fix dead link for TimelineServer REST API
 -

 Key: YARN-3694
 URL: https://issues.apache.org/jira/browse/YARN-3694
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Jagadesh Kiran N
Priority: Minor
  Labels: newbie
 Fix For: 2.7.1

 Attachments: YARN-3694.patch


 There is a dead link in the index.
 {code:title=hadoop-project/src/site/site.xml}
   <item name="Timeline Server"
         href="TimelineServer.html#Timeline_Server_REST_API_v1"/>
 {code}
 should be fixed as
 {code}
   <item name="Timeline Server"
         href="hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Timeline_Server_REST_API_v1"/>
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3594) WintuilsProcessStubExecutor.startStreamReader leaks streams

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556188#comment-14556188
 ] 

Hudson commented on YARN-3594:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2133 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2133/])
YARN-3594. WintuilsProcessStubExecutor.startStreamReader leaks streams. 
Contributed by Lars Francke. (junping_du: rev 
132d909d4a6509af9e63e24cbb719be10006b6cd)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt


 WintuilsProcessStubExecutor.startStreamReader leaks streams
 ---

 Key: YARN-3594
 URL: https://issues.apache.org/jira/browse/YARN-3594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Steve Loughran
Assignee: Lars Francke
Priority: Trivial
  Labels: newbie
 Fix For: 2.8.0

 Attachments: YARN-3594.1.patch


 While looking at the file, my IDE highlights that the thread runnables started 
 by {{WintuilsProcessStubExecutor.startStreamReader()}} don't close their 
 streams as they exit.
 A Java 7 try-with-resources would trivially fix this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556226#comment-14556226
 ] 

Hudson commented on YARN-3646:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #193 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/193/])
YARN-3646. Applications are getting stuck some times in case of retry (devaraj: 
rev 0305316d6932e6f1a05021354d77b6934e57e171)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* hadoop-yarn-project/CHANGES.txt


 Applications are getting stuck some times in case of retry policy forever
 -

 Key: YARN-3646
 URL: https://issues.apache.org/jira/browse/YARN-3646
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Raju Bairishetti
Assignee: Raju Bairishetti
 Fix For: 2.7.1

 Attachments: YARN-3646.001.patch, YARN-3646.002.patch, YARN-3646.patch


 We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER 
 retry policy.
 The YARN client retries infinitely on exceptions from the RM because its retry 
 policy is FOREVER. The problem is that it retries for all kinds of exceptions 
 (such as ApplicationNotFoundException), even when the failure is not a 
 connection failure. Because of this, my application does not progress any 
 further.
 *The YARN client should not retry infinitely for non-connection failures.*
 We have written a simple yarn-client which tries to get an application report 
 for an invalid or old appId. The ResourceManager throws an 
 ApplicationNotFoundException because the appId is invalid or old, but because 
 of the FOREVER retry policy the client keeps retrying getApplicationReport and 
 the ResourceManager keeps throwing ApplicationNotFoundException.
 {code}
 private void testYarnClientRetryPolicy() throws Exception {
   YarnConfiguration conf = new YarnConfiguration();
   conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
   YarnClient yarnClient = YarnClient.createYarnClient();
   yarnClient.init(conf);
   yarnClient.start();
   ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
   ApplicationReport report = yarnClient.getApplicationReport(appId);
 }
 {code}
 *RM logs:*
 {noformat}
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875162 Retry#0
 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
 with id 'application_1430126768987_10645' doesn't exist in RM.
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
 
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875163 Retry#0
 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3594) WintuilsProcessStubExecutor.startStreamReader leaks streams

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556215#comment-14556215
 ] 

Hudson commented on YARN-3594:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #193 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/193/])
YARN-3594. WintuilsProcessStubExecutor.startStreamReader leaks streams. 
Contributed by Lars Francke. (junping_du: rev 
132d909d4a6509af9e63e24cbb719be10006b6cd)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt


 WintuilsProcessStubExecutor.startStreamReader leaks streams
 ---

 Key: YARN-3594
 URL: https://issues.apache.org/jira/browse/YARN-3594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Steve Loughran
Assignee: Lars Francke
Priority: Trivial
  Labels: newbie
 Fix For: 2.8.0

 Attachments: YARN-3594.1.patch


 While looking at the file, my IDE highlights that the thread runnables started 
 by {{WintuilsProcessStubExecutor.startStreamReader()}} don't close their 
 streams as they exit.
 A Java 7 try-with-resources would trivially fix this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556228#comment-14556228
 ] 

Hudson commented on YARN-3675:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #193 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/193/])
YARN-3675. FairScheduler: RM quits when node removal races with 
continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 
4513761869c732cf2f462763043067ebf8749df7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt


 FairScheduler: RM quits when node removal races with continousscheduling on 
 the same node
 -

 Key: YARN-3675
 URL: https://issues.apache.org/jira/browse/YARN-3675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical
 Fix For: 2.7.1

 Attachments: YARN-3675.001.patch, YARN-3675.002.patch, 
 YARN-3675.003.patch


 With continuous scheduling, scheduling can be done on a node that has just 
 been removed, causing errors like the ones below.
 {noformat}
 12:28:53.782 AM FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
 Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
   at java.lang.Thread.run(Thread.java:745)
 12:28:53.783 AMINFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye..
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3684) Change ContainerExecutor's primary lifecycle methods to use a more extensible mechanism for passing information.

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556330#comment-14556330
 ] 

Hudson commented on YARN-3684:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2151 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2151/])
YARN-3684. Changed ContainerExecutor's primary lifecycle methods to use a more 
extensible mechanism of context objects. Contributed by Sidharta Seethana. 
(vinodkv: rev 53fafcf061616516c24e2e2007a66a93d23d3e25)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerSignalContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerReacquisitionContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDeletionService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerStartContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerLivenessContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/DeletionAsUserContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutorWithMocks.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 

[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556332#comment-14556332
 ] 

Hudson commented on YARN-3646:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2151 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2151/])
YARN-3646. Applications are getting stuck some times in case of retry (devaraj: 
rev 0305316d6932e6f1a05021354d77b6934e57e171)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java


 Applications are getting stuck some times in case of retry policy forever
 -

 Key: YARN-3646
 URL: https://issues.apache.org/jira/browse/YARN-3646
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Raju Bairishetti
Assignee: Raju Bairishetti
 Fix For: 2.7.1

 Attachments: YARN-3646.001.patch, YARN-3646.002.patch, YARN-3646.patch


 We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER 
 retry policy.
 The Yarn client retries infinitely on exceptions from the RM because it is 
 using the FOREVER retry policy. The problem is that it retries for all kinds 
 of exceptions (like ApplicationNotFoundException), even when the failure is 
 not a connection failure. Because of this, my application does not progress.
 *The Yarn client should not retry infinitely for non-connection failures.*
 We have written a simple yarn client that tries to get an application 
 report for an invalid or old appId. The ResourceManager throws an 
 ApplicationNotFoundException because the appId is invalid or old, but 
 because of the FOREVER retry policy the client keeps retrying to fetch the 
 application report and the ResourceManager keeps throwing 
 ApplicationNotFoundException.
 {code}
 private void testYarnClientRetryPolicy() throws Exception {
   YarnConfiguration conf = new YarnConfiguration();
   conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
   YarnClient yarnClient = YarnClient.createYarnClient();
   yarnClient.init(conf);
   yarnClient.start();
   ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
   ApplicationReport report = yarnClient.getApplicationReport(appId);
 }
 {code}
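 A minimal sketch of an exception-aware wrapper around the existing policy, 
 assuming Hadoop's {{org.apache.hadoop.io.retry.RetryPolicy}} interface; the 
 class below is hypothetical and is not the committed fix:
 {code}
 import org.apache.hadoop.io.retry.RetryPolicy;
 import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

 public class NonConnectionAwareRetryPolicy implements RetryPolicy {
   private final RetryPolicy delegate;   // e.g. the FOREVER policy

   public NonConnectionAwareRetryPolicy(RetryPolicy delegate) {
     this.delegate = delegate;
   }

   @Override
   public RetryAction shouldRetry(Exception e, int retries, int failovers,
       boolean isIdempotentOrAtMostOnce) throws Exception {
     // Application-level errors are not connection failures: fail fast.
     if (e instanceof ApplicationNotFoundException) {
       return RetryAction.FAIL;
     }
     return delegate.shouldRetry(e, retries, failovers, isIdempotentOrAtMostOnce);
   }
 }
 {code}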
 *RM logs:*
 {noformat}
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875162 Retry#0
 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
 with id 'application_1430126768987_10645' doesn't exist in RM.
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
 
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875163 Retry#0
 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3594) WintuilsProcessStubExecutor.startStreamReader leaks streams

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556321#comment-14556321
 ] 

Hudson commented on YARN-3594:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2151 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2151/])
YARN-3594. WintuilsProcessStubExecutor.startStreamReader leaks streams. 
Contributed by Lars Francke. (junping_du: rev 
132d909d4a6509af9e63e24cbb719be10006b6cd)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt


 WintuilsProcessStubExecutor.startStreamReader leaks streams
 ---

 Key: YARN-3594
 URL: https://issues.apache.org/jira/browse/YARN-3594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Steve Loughran
Assignee: Lars Francke
Priority: Trivial
  Labels: newbie
 Fix For: 2.8.0

 Attachments: YARN-3594.1.patch


 While looking at the file, my IDE highlights that the thread runnables started by 
 {{WintuilsProcessStubExecutor.startStreamReader()}} don't close their streams 
 when they exit.
 A java7 try-with-resources would trivially fix this.
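 A minimal sketch of the suggested java7 try-with-resources shape for the 
 stream-reader runnable; {{stream}} and {{LOG}} stand in for the executor's 
 process stream and logger (assumed, not taken from the patch), and the usual 
 java.io / java.nio.charset imports are implied:
 {code}
 @Override
 public void run() {
   // try-with-resources closes the reader (and the wrapped stream) when the
   // runnable exits, even if an exception is thrown.
   try (BufferedReader reader = new BufferedReader(
       new InputStreamReader(stream, Charset.forName("UTF-8")))) {
     String line;
     while ((line = reader.readLine()) != null) {
       LOG.debug(line);
     }
   } catch (IOException e) {
     LOG.warn("Error reading process stream", e);
   }
 }
 {code}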



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3694) Fix dead link for TimelineServer REST API

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556333#comment-14556333
 ] 

Hudson commented on YARN-3694:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2151 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2151/])
YARN-3694. Fix dead link for TimelineServer REST API. Contributed by Jagadesh 
Kiran N. (aajisaka: rev a5def580879428bc7af3c030ef33554e0519f072)
* hadoop-project/src/site/site.xml
* hadoop-yarn-project/CHANGES.txt


 Fix dead link for TimelineServer REST API
 -

 Key: YARN-3694
 URL: https://issues.apache.org/jira/browse/YARN-3694
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Jagadesh Kiran N
Priority: Minor
  Labels: newbie
 Fix For: 2.7.1

 Attachments: YARN-3694.patch


 There is a dead link in the index.
 {code:title=hadoop-project/src/site/site.xml}
   <item name="Timeline Server"
         href="TimelineServer.html#Timeline_Server_REST_API_v1"/>
 {code}
 should be fixed as
 {code}
   <item name="Timeline Server"
         href="hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Timeline_Server_REST_API_v1"/>
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-05-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1462:
-
Labels:   (was: BB2015-05-TBR)

 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Attachments: YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, 
 YARN-1462.2.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3684) Change ContainerExecutor's primary lifecycle methods to use a more extensible mechanism for passing information.

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556197#comment-14556197
 ] 

Hudson commented on YARN-3684:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2133 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2133/])
YARN-3684. Changed ContainerExecutor's primary lifecycle methods to use a more 
extensible mechanism of context objects. Contributed by Sidharta Seethana. 
(vinodkv: rev 53fafcf061616516c24e2e2007a66a93d23d3e25)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerReacquisitionContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutorWithMocks.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/DeletionAsUserContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerSignalContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerStartContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerLivenessContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/LocalizerStartContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java
* 

[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556201#comment-14556201
 ] 

Hudson commented on YARN-3675:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2133 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2133/])
YARN-3675. FairScheduler: RM quits when node removal races with 
continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 
4513761869c732cf2f462763043067ebf8749df7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 FairScheduler: RM quits when node removal races with continousscheduling on 
 the same node
 -

 Key: YARN-3675
 URL: https://issues.apache.org/jira/browse/YARN-3675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical
 Fix For: 2.7.1

 Attachments: YARN-3675.001.patch, YARN-3675.002.patch, 
 YARN-3675.003.patch


 With continuous scheduling, scheduling can be attempted on a node that has just 
 been removed, causing errors like the one below.
 {noformat}
 12:28:53.782 AM FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
 Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
   at java.lang.Thread.run(Thread.java:745)
 12:28:53.783 AM INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye..
 {noformat}
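 A minimal sketch of the kind of null guard that would avoid the NPE when the 
 node has already been removed; the method and call shapes below are 
 illustrative, not the committed patch:
 {code}
 // The FSSchedulerNode may already be gone if the node was removed
 // concurrently by another scheduler event, so check before unreserving.
 FSSchedulerNode node = getFSSchedulerNode(container.getNodeId());
 if (node == null) {
   LOG.info("Skipping unreserve: node " + container.getNodeId()
       + " has already been removed");
   return;
 }
 node.unreserveResource(application);   // hypothetical call shape
 {code}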



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556199#comment-14556199
 ] 

Hudson commented on YARN-3646:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2133 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2133/])
YARN-3646. Applications are getting stuck some times in case of retry (devaraj: 
rev 0305316d6932e6f1a05021354d77b6934e57e171)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java


 Applications are getting stuck some times in case of retry policy forever
 -

 Key: YARN-3646
 URL: https://issues.apache.org/jira/browse/YARN-3646
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Raju Bairishetti
Assignee: Raju Bairishetti
 Fix For: 2.7.1

 Attachments: YARN-3646.001.patch, YARN-3646.002.patch, YARN-3646.patch


 We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER 
 retry policy.
 The Yarn client retries infinitely on exceptions from the RM because it is 
 using the FOREVER retry policy. The problem is that it retries for all kinds 
 of exceptions (like ApplicationNotFoundException), even when the failure is 
 not a connection failure. Because of this, my application does not progress.
 *The Yarn client should not retry infinitely for non-connection failures.*
 We have written a simple yarn client that tries to get an application 
 report for an invalid or old appId. The ResourceManager throws an 
 ApplicationNotFoundException because the appId is invalid or old, but 
 because of the FOREVER retry policy the client keeps retrying to fetch the 
 application report and the ResourceManager keeps throwing 
 ApplicationNotFoundException.
 {code}
 private void testYarnClientRetryPolicy() throws Exception {
   YarnConfiguration conf = new YarnConfiguration();
   conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
   YarnClient yarnClient = YarnClient.createYarnClient();
   yarnClient.init(conf);
   yarnClient.start();
   ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
   ApplicationReport report = yarnClient.getApplicationReport(appId);
 }
 {code}
 *RM logs:*
 {noformat}
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875162 Retry#0
 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
 with id 'application_1430126768987_10645' doesn't exist in RM.
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
 
 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call 
 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
 from 10.14.120.231:61621 Call#875163 Retry#0
 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1526) change owner before setting setguid

2015-05-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1526:
-
Labels:   (was: BB2015-05-TBR)

 change owner before setting setguid
 ---

 Key: YARN-1526
 URL: https://issues.apache.org/jira/browse/YARN-1526
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.3.0
 Environment: BSD
Reporter: Radim Kolar
 Attachments: create-order-chown.txt, create-order.txt


 If the nodemanager work directory has the copy-group-from-parent flag (often 
 set on /tmp), chmod fails to set the setgid bit because the owner's group does 
 not match the caller's group.
 The fix is to chown first, then chmod.
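 A minimal sketch of the proposed ordering, shelling out to POSIX tools purely 
 for illustration (the path, user and group below are hypothetical; the real 
 nodemanager code uses its own file utilities):
 {code}
 import java.io.IOException;

 public class CreateOrderSketch {
   private static void run(String... cmd) throws IOException, InterruptedException {
     new ProcessBuilder(cmd).inheritIO().start().waitFor();
   }

   public static void main(String[] args) throws Exception {
     String dir = "/tmp/nm-local-dir";     // hypothetical work directory
     run("mkdir", "-p", dir);
     run("chown", "yarn:hadoop", dir);     // change owner/group first...
     run("chmod", "2750", dir);            // ...so chmod can then set the setgid bit
   }
 }
 {code}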



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3684) Change ContainerExecutor's primary lifecycle methods to use a more extensible mechanism for passing information.

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556224#comment-14556224
 ] 

Hudson commented on YARN-3684:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #193 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/193/])
YARN-3684. Changed ContainerExecutor's primary lifecycle methods to use a more 
extensible mechanism of context objects. Contributed by Sidharta Seethana. 
(vinodkv: rev 53fafcf061616516c24e2e2007a66a93d23d3e25)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerLivenessContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/LocalizerStartContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerReacquisitionContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerStartContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerSignalContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDeletionService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutorWithMocks.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/DeletionAsUserContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* 

[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.

2015-05-22 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556239#comment-14556239
 ] 

Rohith commented on YARN-3543:
--

[~aw] Would you help to understand and resolve the build issue? Basically, what I 
observe is that the patch contains file changes spanning many projects. When the 
test cases are triggered, the build ignores the applied patch and picks up the 
existing class files, which causes the compilation failure and other issues. But 
if I apply the patch and build locally, it succeeds.

 ApplicationReport should be able to tell whether the Application is AM 
 managed or not. 
 ---

 Key: YARN-3543
 URL: https://issues.apache.org/jira/browse/YARN-3543
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.6.0
Reporter: Spandan Dutta
Assignee: Rohith
 Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 
 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 
 0004-YARN-3543.patch, 0004-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG


 Currently we can only know whether an application submitted by the user is AM 
 managed from the applicationSubmissionContext, and only at the time the user 
 submits the job. We should have access to this information from the 
 ApplicationReport as well, so that we can check whether an app is AM managed 
 at any time.
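 A minimal usage sketch of what this could look like once exposed on 
 ApplicationReport; the {{isUnmanagedApp()}} accessor below is hypothetical:
 {code}
 YarnClient yarnClient = YarnClient.createYarnClient();
 yarnClient.init(new YarnConfiguration());
 yarnClient.start();

 // appId obtained elsewhere, e.g. from yarnClient.getApplications()
 ApplicationReport report = yarnClient.getApplicationReport(appId);
 boolean unmanagedAM = report.isUnmanagedApp();   // hypothetical accessor
 {code}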



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3706) Generalize native HBase writer for additional tables

2015-05-22 Thread Joep Rottinghuis (JIRA)
Joep Rottinghuis created YARN-3706:
--

 Summary: Generalize native HBase writer for additional tables
 Key: YARN-3706
 URL: https://issues.apache.org/jira/browse/YARN-3706
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Joep Rottinghuis
Priority: Minor


When reviewing YARN-3411 we noticed that we could change the class hierarchy a 
little in order to accommodate additional tables easily.
In order to get ready for benchmark testing we left the original layout in 
place, as performance would not be impacted by the code hierarchy.

Here is a separate jira to address the hierarchy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node

2015-05-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556299#comment-14556299
 ] 

Hudson commented on YARN-3675:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #203 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/203/])
YARN-3675. FairScheduler: RM quits when node removal races with 
continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 
4513761869c732cf2f462763043067ebf8749df7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


 FairScheduler: RM quits when node removal races with continousscheduling on 
 the same node
 -

 Key: YARN-3675
 URL: https://issues.apache.org/jira/browse/YARN-3675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical
 Fix For: 2.7.1

 Attachments: YARN-3675.001.patch, YARN-3675.002.patch, 
 YARN-3675.003.patch


 With continuous scheduling, scheduling can be attempted on a node that has just 
 been removed, causing errors like the one below.
 {noformat}
 12:28:53.782 AM FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
 Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
   at java.lang.Thread.run(Thread.java:745)
 12:28:53.783 AM INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye..
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

