[jira] [Commented] (YARN-4237) Support additional queries for ATSv2 Web UI
[ https://issues.apache.org/jira/browse/YARN-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948161#comment-14948161 ] Varun Saxena commented on YARN-4237: I have started. In fact, I would have updated the patch for YARN-4179 yesterday itself, but I was facing some issues with the tests. We take the system time to set a flow's activity time, so in that case I cannot write deterministic tests. Maybe I can use PowerMock to mock System.currentTimeMillis(), which is a static method. > Support additional queries for ATSv2 Web UI > --- > > Key: YARN-4237 > URL: https://issues.apache.org/jira/browse/YARN-4237 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
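[Editor's note] An alternative to mocking the static call is to inject the time source, which makes such tests deterministic without PowerMock. A minimal sketch in plain Java; the class and method names here are illustrative, not from the actual patch:

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: instead of calling System.currentTimeMillis() directly,
// the flow-activity code takes an injectable time source, so a test can supply
// a fixed value. FlowActivityTimestamp is an illustrative name.
class FlowActivityTimestamp {
    private final LongSupplier clock;

    FlowActivityTimestamp(LongSupplier clock) {
        this.clock = clock;
    }

    // Production code would construct this with System::currentTimeMillis;
    // a test passes () -> someFixedMillis instead.
    long activityTime() {
        return clock.getAsLong();
    }
}
```

A test can then assert on an exact timestamp instead of mocking a static method.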
[jira] [Commented] (YARN-261) Ability to fail AM attempts
[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948134#comment-14948134 ] Rohith Sharma K S commented on YARN-261: Attached a patch fixing the review comments. The attached patch implements failing an attempt. Regarding killing an attempt, I tried some initial prototyping, but it will take some time. I would prefer this to go in, and to take up killing an attempt later in a different JIRA if required. [~jlowe], the attached patch 0002-YARN-261.patch is the latest one. Kindly review it. > Ability to fail AM attempts > --- > > Key: YARN-261 > URL: https://issues.apache.org/jira/browse/YARN-261 > Project: Hadoop YARN > Issue Type: New Feature > Components: api >Affects Versions: 2.0.3-alpha >Reporter: Jason Lowe >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-261.patch, 0002-YARN-261.patch, > YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4.patch, > YARN-261--n5.patch, YARN-261--n6.patch, YARN-261--n7.patch, YARN-261.patch > > > It would be nice if clients could ask for an AM attempt to be killed. This > is analogous to the task attempt kill support provided by MapReduce. > This feature would be useful in a scenario where AM retries are enabled, the > AM supports recovery, and a particular AM attempt is stuck. Currently if > this occurs the user's only recourse is to kill the entire application, > requiring them to resubmit a new application and potentially breaking > downstream dependent jobs if it's part of a bigger workflow. Killing the > attempt would allow a new attempt to be started by the RM without killing the > entire application, and if the AM supports recovery it could potentially save > a lot of work. It could also be useful in workflow scenarios where the > failure of the entire application kills the workflow, but the ability to kill > an attempt can keep the workflow going if the subsequent attempt succeeds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-261) Ability to fail AM attempts
[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948124#comment-14948124 ] Rohith Sharma K S commented on YARN-261: Updated the issue description as per the patch implementation. > Ability to fail AM attempts > --- > > Key: YARN-261 > URL: https://issues.apache.org/jira/browse/YARN-261 > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-261) Ability to fail AM attempts
[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-261: --- Attachment: 0002-YARN-261.patch > Ability to fail AM attempts > --- > > Key: YARN-261 > URL: https://issues.apache.org/jira/browse/YARN-261 > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-261) Ability to fail AM attempts
[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-261: --- Attachment: (was: 0002-YARN-261.patch) > Ability to fail AM attempts > --- > > Key: YARN-261 > URL: https://issues.apache.org/jira/browse/YARN-261 > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-261) Ability to fail AM attempts
[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-261: --- Summary: Ability to fail AM attempts (was: Ability to kill AM attempts) > Ability to fail AM attempts > --- > > Key: YARN-261 > URL: https://issues.apache.org/jira/browse/YARN-261 > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-261) Ability to kill AM attempts
[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-261: --- Attachment: 0002-YARN-261.patch > Ability to kill AM attempts > --- > > Key: YARN-261 > URL: https://issues.apache.org/jira/browse/YARN-261 > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4201) AMBlacklist does not work for minicluster
[ https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948093#comment-14948093 ] Jun Gong commented on YARN-4201: Could anyone help review it, please? > AMBlacklist does not work for minicluster > - > > Key: YARN-4201 > URL: https://issues.apache.org/jira/browse/YARN-4201 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-4021.001.patch > > > For a minicluster (scheduler.include-port-in-node-name is set to TRUE), > AMBlacklist does not work. This is because RM puts only the host into the AMBlacklist, > whether scheduler.include-port-in-node-name is set or not. RM should instead > put "host + port" into the AMBlacklist when it is set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
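[Editor's note] The rule the bug report describes can be sketched in a few lines. This is a hedged illustration, not the actual patch code; BlacklistEntry and nodeName are hypothetical names. When the include-port setting is on (as in a minicluster, where several NodeManagers share one host), the scheduler identifies nodes as host:port, so the blacklist key must carry the port to match:

```java
// Illustrative sketch of the YARN-4201 rule; names are hypothetical.
class BlacklistEntry {
    // When include-port-in-node-name is true, the scheduler keys nodes by
    // "host:port", so a blacklist entry of just "host" would never match.
    static String nodeName(String host, int port, boolean includePortInNodeName) {
        return includePortInNodeName ? host + ":" + port : host;
    }
}
```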
[jira] [Commented] (YARN-4237) Support additional queries for ATSv2 Web UI
[ https://issues.apache.org/jira/browse/YARN-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948051#comment-14948051 ] Naganarasimha G R commented on YARN-4237: - Hi [~varun_saxena], if you have not already started on this or YARN-4179, can I work on them? > Support additional queries for ATSv2 Web UI > --- > > Key: YARN-4237 > URL: https://issues.apache.org/jira/browse/YARN-4237 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948044#comment-14948044 ] Naganarasimha G R commented on YARN-3367: - bq. There is still no hard guarantee that they will be received by the server in the same order. But as per the current patch, if events are queued and sent one by one from the client, then the server should receive them in the same order, right? Basically it is not a multi-threaded dispatcher; all events are queued and dispatched in a single separate thread. > Replace starting a separate thread for post entity with event loop in > TimelineClient > > > Key: YARN-3367 > URL: https://issues.apache.org/jira/browse/YARN-3367 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Naganarasimha G R > Attachments: YARN-3367.YARN-2928.001.patch > > > Since YARN-3039, we added a loop in TimelineClient to wait for > collectorServiceAddress to be ready before posting any entity. In consumers of > TimelineClient (like the AM), we are starting a new thread for each call to get > rid of a potential deadlock in the main thread. This approach has at least 3 major > defects: > 1. The consumer needs some additional code to wrap a thread before calling > putEntities() in TimelineClient. > 2. It costs many thread resources, which is unnecessary. > 3. The sequence of events could be out of order because each posting > thread gets out of the waiting loop randomly. > We should have something like an event loop on the TimelineClient side: > putEntities() only puts the related entities into a queue, and a > separate thread delivers the entities in the queue to the collector via REST > calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
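[Editor's note] The ordering argument above can be demonstrated with a stripped-down sketch: producers only enqueue, and a single dispatcher thread drains the queue, so entities leave the client in exactly FIFO order. This is an illustration of the pattern, not the actual TimelineClient code; all names are hypothetical, and the list append stands in for the REST call to the collector.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of a single-threaded event loop preserving send order.
class OrderedDispatcher {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Stand-in for "delivered to the collector"; thread-safe for the demo.
    final List<String> delivered = new CopyOnWriteArrayList<>();
    private final Thread worker = new Thread(() -> {
        try {
            while (true) {
                // Single consumer: entities are delivered in queue (FIFO) order.
                delivered.add(queue.take()); // stands in for the REST call
            }
        } catch (InterruptedException e) {
            // interrupted: shutting down
        }
    });

    OrderedDispatcher() { worker.start(); }

    // Analogous to putEntities(): the caller never blocks on delivery.
    void putEntity(String entity) { queue.offer(entity); }

    void stop() throws InterruptedException {
        worker.interrupt();
        worker.join();
    }
}
```

Because there is only one consumer thread, no two deliveries can race each other, which is the point being made about the patch.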
[jira] [Commented] (YARN-4238) createdTime is not reported while publishing entities to ATSv2
[ https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948041#comment-14948041 ] Naganarasimha G R commented on YARN-4238: - Hi [~varun_saxena], similar to V1, we have already captured this information in TimelineServiceV2Publisher too; it is captured as part of the events of those entities. If you want it to be part of the entity, I can add it there, but IMO it makes more sense as part of the event, as the different states of an application (basically any entity) are captured as events. > createdTime is not reported while publishing entities to ATSv2 > -- > > Key: YARN-4238 > URL: https://issues.apache.org/jira/browse/YARN-4238 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Varun Saxena >Assignee: Varun Saxena > > While publishing entities from RM and elsewhere we are not sending created > time. For instance, created time in TimelineServiceV2Publisher class and for > other entities in other such similar classes is not updated. We can easily > update created time when sending application created event. Likewise for > modification time on every write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948032#comment-14948032 ] Hadoop QA commented on YARN-3943: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 2s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 0s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 28s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 19s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 51s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 24s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 27s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 8m 46s | Tests passed in hadoop-yarn-server-nodemanager. 
| | | | 59m 2s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765505/YARN-3943.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 35affec | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9375/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-nodemanager.html | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9375/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9375/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9375/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9375/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9375/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9375/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9375/console | This message was automatically generated. > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3943.000.patch, YARN-3943.001.patch, > YARN-3943.002.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. 
Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
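[Editor's note] The two-threshold idea described above is classic hysteresis. A minimal sketch follows; the class name, constructor arguments, and example values are illustrative, not the proposed configuration keys:

```java
// Hypothetical sketch of disk-full detection with hysteresis (YARN-3943 idea).
class DiskHealth {
    private final float fullThreshold;     // e.g. 90.0f (%): declare disk full
    private final float notFullThreshold;  // e.g. 85.0f (%): declare disk good again
    private boolean full = false;

    DiskHealth(float fullThreshold, float notFullThreshold) {
        this.fullThreshold = fullThreshold;
        this.notFullThreshold = notFullThreshold;
    }

    // Returns true while the disk is considered full. A utilization reading
    // between the two thresholds keeps the current state, so the disk does
    // not oscillate around a single cutoff.
    boolean update(float utilizationPercent) {
        if (!full && utilizationPercent > fullThreshold) {
            full = true;
        } else if (full && utilizationPercent < notFullThreshold) {
            full = false;
        }
        return full;
    }
}
```

Setting the disk-full threshold higher than the disk-not-full threshold, as the description suggests, is exactly what creates the stabilizing gap.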
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947996#comment-14947996 ] zhihai xu commented on YARN-3943: - Thanks [~jlowe]! Yes, the comments are great. Nice catch on the backwards-compatibility problem! I uploaded a new patch, YARN-3943.002.patch, which addresses all your comments. Please review it. > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3943: Attachment: YARN-3943.002.patch > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4239) Flow page for Web UI
Varun Saxena created YARN-4239: -- Summary: Flow page for Web UI Key: YARN-4239 URL: https://issues.apache.org/jira/browse/YARN-4239 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4238) createdTime is not reported while publishing entities to ATSv2
Varun Saxena created YARN-4238: -- Summary: createdTime is not reported while publishing entities to ATSv2 Key: YARN-4238 URL: https://issues.apache.org/jira/browse/YARN-4238 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Saxena Assignee: Varun Saxena While publishing entities from RM and elsewhere we are not sending created time. For instance, created time in TimelineServiceV2Publisher class and for other entities in other such similar classes is not updated. We can easily update created time when sending application created event. Likewise for modification time on every write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947847#comment-14947847 ] Eric Payne commented on YARN-3769: -- Investigating test failures and checkstyle warnings > Preemption occurring unnecessarily because preemption doesn't consider user > limit > - > > Key: YARN-3769 > URL: https://issues.apache.org/jira/browse/YARN-3769 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: YARN-3769-branch-2.002.patch, > YARN-3769-branch-2.7.002.patch, YARN-3769.001.branch-2.7.patch, > YARN-3769.001.branch-2.8.patch > > > We are seeing the preemption monitor preempting containers from queue A and > then seeing the capacity scheduler giving them immediately back to queue A. > This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4237) Support additional queries for ATSv2 Web UI
[ https://issues.apache.org/jira/browse/YARN-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4237: --- Issue Type: Sub-task (was: Bug) Parent: YARN-2928 > Support additional queries for ATSv2 Web UI > --- > > Key: YARN-4237 > URL: https://issues.apache.org/jira/browse/YARN-4237 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4237) Support additional queries for ATSv2 Web UI
[ https://issues.apache.org/jira/browse/YARN-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947833#comment-14947833 ] Varun Saxena commented on YARN-4237: - Given a time range and user name, list flow runs in a given flow id. - Given a time range and user name, list all flows in a given cluster. The date range will be supported in YARN-4179. Do we need to support user? > Support additional queries for ATSv2 Web UI > --- > > Key: YARN-4237 > URL: https://issues.apache.org/jira/browse/YARN-4237 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4237) Support additional queries for ATSv2 Web UI
Varun Saxena created YARN-4237: -- Summary: Support additional queries for ATSv2 Web UI Key: YARN-4237 URL: https://issues.apache.org/jira/browse/YARN-4237 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user
[ https://issues.apache.org/jira/browse/YARN-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947822#comment-14947822 ] Hadoop QA commented on YARN-4235: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 7s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 58s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 16s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 19s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 51s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 29s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 28s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 56m 36s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 96m 43s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765465/YARN-4235.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6dd47d7 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9373/artifact/patchprocess/patchReleaseAuditProblems.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9373/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9373/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9373/console | This message was automatically generated. > FairScheduler PrimaryGroup does not handle empty groups returned for a user > > > Key: YARN-4235 > URL: https://issues.apache.org/jira/browse/YARN-4235 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-4235.001.patch > > > We see NPE if empty groups are returned for a user. 
This causes a NPE and > cause RM to crash as below > {noformat} > 2015-09-22 16:51:52,780 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ADDED to the scheduler > java.lang.IndexOutOfBoundsException: Index: 0 > at java.util.Collections$EmptyList.get(Collections.java:3212) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 2015-09-22 16:51:52,797 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
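[Editor's note] The fix implied by the trace above is a guard before taking the first group. A hedged sketch, not the actual QueuePlacementRule code (the class and method names are hypothetical): when the group list is empty, return no assignment so placement can fall through to the next rule instead of throwing.

```java
import java.util.List;

// Hypothetical sketch of guarding the primary-group placement rule.
class PrimaryGroupRule {
    // Returns the queue for the user's primary group, or null when the user
    // has no groups, letting the caller fall through to the next rule
    // instead of crashing on groups.get(0).
    static String assignQueue(List<String> groups) {
        if (groups == null || groups.isEmpty()) {
            return null;
        }
        return "root." + groups.get(0);
    }
}
```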
[jira] [Updated] (YARN-4233) YARN Timeline Service plugin: ATS v1.5
[ https://issues.apache.org/jira/browse/YARN-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4233: Description: Copy the description from YARN-3942: This adds a new timeline store plugin that is intended as a stop-gap measure to mitigate some of the issues we've seen with ATS v1 while waiting for ATS v2. The intent of this plugin is to provide a workable solution for running the Tez UI against the timeline server on a large-scale clusters running many thousands of jobs per day. > YARN Timeline Service plugin: ATS v1.5 > -- > > Key: YARN-4233 > URL: https://issues.apache.org/jira/browse/YARN-4233 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > > Copy the description from YARN-3942: > This adds a new timeline store plugin that is intended as a stop-gap measure > to mitigate some of the issues we've seen with ATS v1 while waiting for ATS > v2. The intent of this plugin is to provide a workable solution for running > the Tez UI against the timeline server on a large-scale clusters running many > thousands of jobs per day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4233) YARN Timeline Service plugin: ATS v1.5
[ https://issues.apache.org/jira/browse/YARN-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4233: Description: Copy the description from YARN-3942: Timeline store to read events from HDFS This adds a new timeline store plugin that is intended as a stop-gap measure to mitigate some of the issues we've seen with ATS v1 while waiting for ATS v2. The intent of this plugin is to provide a workable solution for running the Tez UI against the timeline server on a large-scale clusters running many thousands of jobs per day. was: Copy the description from YARN-3942: This adds a new timeline store plugin that is intended as a stop-gap measure to mitigate some of the issues we've seen with ATS v1 while waiting for ATS v2. The intent of this plugin is to provide a workable solution for running the Tez UI against the timeline server on a large-scale clusters running many thousands of jobs per day. > YARN Timeline Service plugin: ATS v1.5 > -- > > Key: YARN-4233 > URL: https://issues.apache.org/jira/browse/YARN-4233 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > > Copy the description from YARN-3942: Timeline store to read events from HDFS > This adds a new timeline store plugin that is intended as a stop-gap measure > to mitigate some of the issues we've seen with ATS v1 while waiting for ATS > v2. The intent of this plugin is to provide a workable solution for running > the Tez UI against the timeline server on a large-scale clusters running many > thousands of jobs per day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4233) YARN Timeline Service plugin: ATS v1.5
[ https://issues.apache.org/jira/browse/YARN-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947699#comment-14947699 ] Lars Francke commented on YARN-4233: Hi Xuan, could you add a bit of detail to this issue? I'm trying to understand what's going on but don't really know what it's about. Thank you! > YARN Timeline Service plugin: ATS v1.5 > -- > > Key: YARN-4233 > URL: https://issues.apache.org/jira/browse/YARN-4233 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4236) Metric for aggregated resources allocation per queue
[ https://issues.apache.org/jira/browse/YARN-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4236: --- Attachment: YARN-4236.patch > Metric for aggregated resources allocation per queue > > > Key: YARN-4236 > URL: https://issues.apache.org/jira/browse/YARN-4236 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4236.patch > > > We currently track allocated memory and allocated vcores per queue but we > don't have a good rate metric on how fast we're allocating these things. In > other words, a straight line in allocatedmb could equally be one extreme of > no new containers are being allocated or allocating a bunch of containers > where we free exactly what we allocate each time. Adding a resources > allocated per second per queue would give us a better insight into the rate > of resource churn on a queue. Based on this aggregated resource allocation > per queue we can easily have some tools to measure the rate of resource > allocation per queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4236) Metric for aggregated resources allocation per queue
Chang Li created YARN-4236: -- Summary: Metric for aggregated resources allocation per queue Key: YARN-4236 URL: https://issues.apache.org/jira/browse/YARN-4236 Project: Hadoop YARN Issue Type: Improvement Reporter: Chang Li Assignee: Chang Li We currently track allocated memory and allocated vcores per queue but we don't have a good rate metric on how fast we're allocating these things. In other words, a straight line in allocatedmb could equally be one extreme of no new containers are being allocated or allocating a bunch of containers where we free exactly what we allocate each time. Adding a resources allocated per second per queue would give us a better insight into the rate of resource churn on a queue. Based on this aggregated resource allocation per queue we can easily have some tools to measure the rate of resource allocation per queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
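The distinction the description draws between an instantaneous gauge and a rate can be seen in a small sketch (illustrative only, not the actual QueueMetrics change): a monotonically increasing aggregate counter per queue lets a sampler derive an allocation rate, which the flat `allocatedMB` gauge cannot provide.

```java
// Minimal sketch of a per-queue aggregate-allocation counter: every
// container allocation adds its resources to a running total. Sampling
// the counter at two points in time and dividing by the interval gives
// resources allocated per second, even when frees exactly offset
// allocations and the instantaneous gauge stays flat.
class QueueAllocationCounterSketch {
  private long aggregateMemoryMb = 0;
  private long aggregateVcores = 0;

  synchronized void recordAllocation(long memoryMb, int vcores) {
    aggregateMemoryMb += memoryMb;
    aggregateVcores += vcores;
  }

  synchronized long getAggregateMemoryMb() { return aggregateMemoryMb; }
  synchronized long getAggregateVcores() { return aggregateVcores; }
}
```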
[jira] [Commented] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947680#comment-14947680 ] MENG DING commented on YARN-3026: - Hi, [~leftnoteasy] I've got another question while studying this patch. Regarding the following code change: {code} @@ -1106,6 +958,11 @@ private Resource computeUserLimit(FiCaSchedulerApp application, queueCapacities.getAbsoluteCapacity(nodePartition), minimumAllocation); +// Assume we have required resource equals to minimumAllocation, this can +// make sure user limit can continuously increase till queueMaxResource +// reached. +Resource required = minimumAllocation; + {code} Before this patch, the required resource is passed into the {{computeUserLimit}} function as a parameter, indicating the actual resource requirement. Now it is always set to minimumAllocation. I understand that the leaf queue won't know the required resource any more since the patch moves the application specific logic out of the {{LeafQueue.java}}, but the fact that the required resource can be set to some arbitrary value seems quite odd. More specifically, it seems that the *required* resource only applies when the queue is over capacity when calculating the userLimit, but I am not sure how useful this userLimit is under this circumstance (i.e., over capacity situation)? I know it is used to calculate application headroom, but this headroom is not being checked for resource allocation (which only checks the {{ResourceLimits.headroom}} set in {{AbstractCSQueue.canAssignToThisQueue}}). Not sure if I've made myself clear... 
Thanks in advance for shedding some light on this :-) > Move application-specific container allocation logic from LeafQueue to > FiCaSchedulerApp > --- > > Key: YARN-3026 > URL: https://issues.apache.org/jira/browse/YARN-3026 > Project: Hadoop YARN > Issue Type: Task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 2.8.0 > > Attachments: YARN-3026.1.patch, YARN-3026.2.patch, YARN-3026.3.patch, > YARN-3026.4.patch, YARN-3026.5.patch, YARN-3026.6.patch > > > Have a discussion with [~vinodkv] and [~jianhe]: > In existing Capacity Scheduler, all allocation logics of and under LeafQueue > are located in LeafQueue.java in implementation. To make a cleaner scope of > LeafQueue, we'd better move some of them to FiCaSchedulerApp. > Ideal scope of LeafQueue should be: when a LeafQueue receives some resources > from ParentQueue (like 15% of cluster resource), and it distributes resources > to children apps, and it should be agnostic to internal logic of children > apps (like delayed-scheduling, etc.). IAW, LeafQueue shouldn't decide how > application allocating container from given resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user
[ https://issues.apache.org/jira/browse/YARN-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4235: Attachment: YARN-4235.001.patch Handle empty groups > FairScheduler PrimaryGroup does not handle empty groups returned for a user > > > Key: YARN-4235 > URL: https://issues.apache.org/jira/browse/YARN-4235 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-4235.001.patch > > > We see NPE if empty groups are returned for a user. This causes a NPE and > cause RM to crash as below > {noformat} > 2015-09-22 16:51:52,780 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ADDED to the scheduler > java.lang.IndexOutOfBoundsException: Index: 0 > at java.util.Collections$EmptyList.get(Collections.java:3212) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) > at java.lang.Thread.run(Thread.java:745) > 2015-09-22 16:51:52,797 INFO > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947641#comment-14947641 ] Jason Lowe edited comment on YARN-3943 at 10/7/15 9:36 PM: --- Thanks for the patch, [~zxu]! I think this patch causes backwards compatibility problems for users who have increased the max-disk-utilization-per-disk-percentage from the default. For example, if someone changed it to 100 and only wants disks to be treated full if they actually fill up completely then this patch will silently make those disks unusable once they fill until they free up at least 10% of their space. That's not going to be good if the cluster runs the disks over 90% most of the time, as it will effectively take the disk out of rotation for a long, long time. A more compatible change would be to _not_ set a value in yarn-default for this property. If the property is set then we use it, but if we get a null for it then we know it isn't specified and we should just use the same threshold for low as for high. Shouldn't we be capping the max of diskUtilizationPercentageCutoffLow based on diskUtilizationPercentageCutoffHigh rather than 100.0? Nit: I think using Math.max / Math.min could make the range capping more readable (certainly less redundant with the identifiers and constants), but it's not must-fix. Nit: "Use" should be "Using" in the following log message. "Use" implies we are giving a directive requiring the user to do it, while "Using" states the code is doing it for them. {code} + if (lowUsableSpacePercentagePerDisk > highUsableSpacePercentagePerDisk) { +LOG.warn("Use " + YarnConfiguration. +NM_MAX_PER_DISK_UTILIZATION_PERCENTAGE + " as " + +YarnConfiguration.NM_WM_LOW_PER_DISK_UTILIZATION_PERCENTAGE + +", because " + YarnConfiguration. +NM_WM_LOW_PER_DISK_UTILIZATION_PERCENTAGE + +" is not configured properly."); {code} was (Author: jlowe): Thanks for the patch, [~zxu]! 
I think this patch causes backwards compatibility problems for users who have increased the max-disk-utilization-per-disk-percentage from the default. For example, if someone changed it to 100 and only wants disks to be treated full if they actually fill up completely then this patch will silently make those disks unusable once they fill until they free up at least 10% of their space. That's not going to be good if the cluster runs the disks over 90% most of the time, as it will effectively take the disk out of rotation for a long, long time. A more compatible change would be to _not_ set a value in yarn-default for this property. If the property is set then we use it, but if we get a null for it then we know it isn't specified and we shouldn't just use the same threshold for low as for high. Shouldn't we be capping the max of diskUtilizationPercentageCutoffLow based on diskUtilizationPercentageCutoffHigh rather than 100.0? Nit: I think using Math.max / Math.min could make the range capping more readable (certainly less redundant with the identifiers and constants), but it's not must-fix. Nit: "Use" should be "Using" in the following log message. "Use" implies we are giving a directive requiring the user to do it, while "Using" states the code is doing it for them. {code} + if (lowUsableSpacePercentagePerDisk > highUsableSpacePercentagePerDisk) { +LOG.warn("Use " + YarnConfiguration. +NM_MAX_PER_DISK_UTILIZATION_PERCENTAGE + " as " + +YarnConfiguration.NM_WM_LOW_PER_DISK_UTILIZATION_PERCENTAGE + +", because " + YarnConfiguration. +NM_WM_LOW_PER_DISK_UTILIZATION_PERCENTAGE + +" is not configured properly."); {code} > Use separate threshold configurations for disk-full detection and > disk-not-full detection. 
> -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3943.000.patch, YARN-3943.001.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947641#comment-14947641 ] Jason Lowe commented on YARN-3943: -- Thanks for the patch, [~zxu]! I think this patch causes backwards compatibility problems for users who have increased the max-disk-utilization-per-disk-percentage from the default. For example, if someone changed it to 100 and only wants disks to be treated full if they actually fill up completely then this patch will silently make those disks unusable once they fill until they free up at least 10% of their space. That's not going to be good if the cluster runs the disks over 90% most of the time, as it will effectively take the disk out of rotation for a long, long time. A more compatible change would be to _not_ set a value in yarn-default for this property. If the property is set then we use it, but if we get a null for it then we know it isn't specified and we shouldn't just use the same threshold for low as for high. Shouldn't we be capping the max of diskUtilizationPercentageCutoffLow based on diskUtilizationPercentageCutoffHigh rather than 100.0? Nit: I think using Math.max / Math.min could make the range capping more readable (certainly less redundant with the identifiers and constants), but it's not must-fix. Nit: "Use" should be "Using" in the following log message. "Use" implies we are giving a directive requiring the user to do it, while "Using" states the code is doing it for them. {code} + if (lowUsableSpacePercentagePerDisk > highUsableSpacePercentagePerDisk) { +LOG.warn("Use " + YarnConfiguration. +NM_MAX_PER_DISK_UTILIZATION_PERCENTAGE + " as " + +YarnConfiguration.NM_WM_LOW_PER_DISK_UTILIZATION_PERCENTAGE + +", because " + YarnConfiguration. +NM_WM_LOW_PER_DISK_UTILIZATION_PERCENTAGE + +" is not configured properly."); {code} > Use separate threshold configurations for disk-full detection and > disk-not-full detection. 
> -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3943.000.patch, YARN-3943.001.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
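The two-threshold hysteresis discussed above, together with the backward-compatibility behavior Jason describes (an unset low cutoff falling back to the high cutoff), can be sketched as follows. This is illustrative only, not the YARN-3943 patch; the class and parameter names are assumptions.

```java
// Sketch of disk-full hysteresis: a disk flips to FULL when utilization
// exceeds the high cutoff and only flips back to GOOD once it drops
// below the low cutoff, avoiding oscillation near a single threshold.
// A null low cutoff falls back to the high cutoff, preserving the old
// single-threshold behavior for users who tuned the max-utilization
// property (e.g. to 100).
class DiskFullnessCheckSketch {
  private final float highCutoff;
  private final float lowCutoff;
  private boolean full = false;

  DiskFullnessCheckSketch(float highCutoff, Float lowCutoffOrNull) {
    this.highCutoff = highCutoff;
    // Cap the low cutoff at the high cutoff (rather than at 100.0),
    // per the review comment above.
    this.lowCutoff = lowCutoffOrNull == null
        ? highCutoff : Math.min(lowCutoffOrNull, highCutoff);
  }

  boolean isFull(float utilizationPercent) {
    if (full) {
      full = utilizationPercent > lowCutoff;   // stay full until below low
    } else {
      full = utilizationPercent > highCutoff;  // become full above high
    }
    return full;
  }
}
```

With high = 90 and low = 80, a disk at 85% stays out of rotation if it was full but stays usable if it was not, which is exactly the oscillation-damping behavior the issue asks for.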
[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947620#comment-14947620 ] Sangjin Lee commented on YARN-3367: --- {quote} You mean that flush need not be coupled with the sync call ? if so will we be exposing one more interface in client side ? Felt coupling flush with sync was a better option. {quote} I meant either coupling async with no flush or coupling sync with explicit flush. But I vaguely remember [~djp] was of the opinion that sync doesn't necessarily imply flush. So this requires a more discussion. {quote} Agree with your points only concerned what were issues/points Junping Du had, when he had mentioned in the description as 3. The sequence of events could be out of order because each posting operation thread get out of waiting loop randomly. We should have something like event loop in TimelineClient side, putEntities() only put related entities into a queue of entities and a separated thread handle to deliver entities in queue to collector via REST call. {quote} Yes, understood. I guess I am trying to say that even if we can preserve the order of event writing, there is still no hard guarantee that they will be received by the server in the same order. Therefore, I am of the opinion that we do not need to spend too much effort ensuring this order (would be nice, but would not be fatal if we didn't do it). The server would just need to rely on explicit timestamps if it needs to determine the order of events. 
> Replace starting a separate thread for post entity with event loop in > TimelineClient > > > Key: YARN-3367 > URL: https://issues.apache.org/jira/browse/YARN-3367 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Naganarasimha G R > Attachments: YARN-3367.YARN-2928.001.patch > > > Since YARN-3039, we add loop in TimelineClient to wait for > collectorServiceAddress ready before posting any entity. In consumer of > TimelineClient (like AM), we are starting a new thread for each call to get > rid of potential deadlock in main thread. This way has at least 3 major > defects: > 1. The consumer need some additional code to wrap a thread before calling > putEntities() in TimelineClient. > 2. It cost many thread resources which is unnecessary. > 3. The sequence of events could be out of order because each posting > operation thread get out of waiting loop randomly. > We should have something like event loop in TimelineClient side, > putEntities() only put related entities into a queue of entities and a > separated thread handle to deliver entities in queue to collector via REST > call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
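The event-loop shape the description calls for can be sketched roughly as below. This is illustrative only, not the actual TimelineClient code; the class, method, and entity representation are assumptions.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Sketch of the proposed pattern: putEntitiesAsync() only enqueues, and a
// single dispatcher thread drains the queue and performs the REST call.
// Callers no longer spawn a thread per put, and entities leave the client
// in enqueue order -- though, as noted in the comment above, the server
// still cannot assume arrival order and should rely on explicit timestamps.
class TimelineEventLoopSketch {
  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  private final Thread dispatcher;
  private volatile boolean stopped = false;

  TimelineEventLoopSketch(Consumer<String> postToCollector) {
    dispatcher = new Thread(() -> {
      try {
        // Keep draining until stop() is called AND the queue is empty.
        while (!stopped || !queue.isEmpty()) {
          String entity = queue.poll(100, TimeUnit.MILLISECONDS);
          if (entity != null) {
            postToCollector.accept(entity); // the REST call in a real client
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    dispatcher.start();
  }

  void putEntitiesAsync(String entity) {
    queue.add(entity); // returns immediately; no per-call thread needed
  }

  void stop() {
    stopped = true;
    try {
      dispatcher.join(); // drains remaining entities before returning
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

A sync put or an explicit flush could then be layered on top by blocking until the queue drains, which is the coupling question debated above.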
[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4234: Attachment: YARN-4234.1.patch > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234.1.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user
Anubhav Dhoot created YARN-4235: --- Summary: FairScheduler PrimaryGroup does not handle empty groups returned for a user Key: YARN-4235 URL: https://issues.apache.org/jira/browse/YARN-4235 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot We see NPE if empty groups are returned for a user. This causes a NPE and cause RM to crash as below {noformat} 2015-09-22 16:51:52,780 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ADDED to the scheduler java.lang.IndexOutOfBoundsException: Index: 0 at java.util.Collections$EmptyList.get(Collections.java:3212) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) at java.lang.Thread.run(Thread.java:745) 2015-09-22 16:51:52,797 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers
[ https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947315#comment-14947315 ] Bikas Saha commented on YARN-1509: -- Sorry for coming in late on this. I have some questions on the API. Why are there separate methods for increase and decrease instead of a single method to change the container resource size? By comparing the existing resource allocation to a container and the new requested resource allocation, it should be clear whether an increase or decrease is being requested. Also, for completeness, is there a need for a cancelContainerResourceChange()? After a container resource change request has been submitted, what are my options as a user other than to wait for the request to be satisfied by the RM? If I release the container, then does it mean all pending change requests for that container should be removed? From a quick look at the patch, it does not look like that is being covered, unless I am missing something. What will happen if the AM restarts after submitting a change request. Does the AM-RM re-register protocol need an update to handle the case of re-synchronizing on the change requests? Whats happens if the RM restarts? If these are explained in a document, then please point me to the document. The patch did not seem to have anything around this area. So I thought I would ask. Also, why have the callback interface methods been made non-public? Would that be an incompatible change? 
> Make AMRMClient support send increase container request and get > increased/decreased containers > -- > > Key: YARN-1509 > URL: https://issues.apache.org/jira/browse/YARN-1509 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan (No longer used) >Assignee: MENG DING > Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, > YARN-1509.4.patch, YARN-1509.5.patch > > > As described in YARN-1197, we need add API in AMRMClient to support > 1) Add increase request > 2) Can get successfully increased/decreased containers from RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
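Bikas's first question above — why two methods instead of one — hinges on the fact that comparing the current and requested capabilities already determines the direction. A hypothetical sketch of that classification (all names are illustrative; this is not the real AMRMClient API):

```java
// Hypothetical single "change container resource" entry point: the
// caller supplies only the target capability, and the client derives
// whether the request is an increase or a decrease by comparing it to
// the container's current allocation.
class ResourceChangeSketch {
  enum Direction { INCREASE, DECREASE, MIXED, NONE }

  static Direction classify(long currentMb, int currentVcores,
                            long targetMb, int targetVcores) {
    boolean up = targetMb > currentMb || targetVcores > currentVcores;
    boolean down = targetMb < currentMb || targetVcores < currentVcores;
    if (up && down) {
      return Direction.MIXED; // e.g. more memory but fewer vcores
    }
    if (up) {
      return Direction.INCREASE;
    }
    if (down) {
      return Direction.DECREASE;
    }
    return Direction.NONE; // no-op request; could be rejected up front
  }
}
```

The MIXED case shows why the design choice matters: a single API must decide whether such requests are allowed, split into two operations, or rejected.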
[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4234: Description: In this ticket, we will add new put APIs in timelineClient to let clients/applications have the option to use ATS v1.5 (was: In this ticket, we will add new put api in timelineClient to let clients/applications have the option to use ATS v1.5) > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4234: Summary: New put APIs in TimelineClient for ats v1.5 (was: New put api in TimelineClient for ats v1.5) > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > > In this ticket, we will add new put api in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4234) New put api in TimelineClient for ats v1.5
Xuan Gong created YARN-4234: --- Summary: New put api in TimelineClient for ats v1.5 Key: YARN-4234 URL: https://issues.apache.org/jira/browse/YARN-4234 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Xuan Gong Assignee: Xuan Gong In this ticket, we will add new put api in timelineClient to let clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947261#comment-14947261 ] Naganarasimha G R commented on YARN-3367: - Hi [~sjlee0] , Thanks for the comments, Went through the comments which had pointed out in YARN-4061. Felt most of them valid. bq. Whether we will do that on the sync side or not, I think we have some flexibility. You mean that flush need not be coupled with the sync call ? if so will we be exposing one more interface in client side ? Felt coupling flush with sync was a better option. bq. As for the timestamps, I am also arguing that the timestamps should always be set explicitly for entities/metrics/events, and that the server should rely on the explicit timestamps, rather than on time of receipt. Agree with your points only concerned what were issues/points [~djp] had, when he had mentioned in the description as bq. 3. The sequence of events could be out of order because each posting operation thread get out of waiting loop randomly. We should have something like event loop in TimelineClient side, putEntities() only put related entities into a queue of entities and a separated thread handle to deliver entities in queue to collector via REST call. > Replace starting a separate thread for post entity with event loop in > TimelineClient > > > Key: YARN-3367 > URL: https://issues.apache.org/jira/browse/YARN-3367 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Naganarasimha G R > Attachments: YARN-3367.YARN-2928.001.patch > > > Since YARN-3039, we add loop in TimelineClient to wait for > collectorServiceAddress ready before posting any entity. In consumer of > TimelineClient (like AM), we are starting a new thread for each call to get > rid of potential deadlock in main thread. This way has at least 3 major > defects: > 1. 
The consumer need some additional code to wrap a thread before calling > putEntities() in TimelineClient. > 2. It cost many thread resources which is unnecessary. > 3. The sequence of events could be out of order because each posting > operation thread get out of waiting loop randomly. > We should have something like event loop in TimelineClient side, > putEntities() only put related entities into a queue of entities and a > separated thread handle to deliver entities in queue to collector via REST > call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4233) YARN Timeline Service plugin: ATS v1.5
[ https://issues.apache.org/jira/browse/YARN-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-4233: --- Assignee: Xuan Gong > YARN Timeline Service plugin: ATS v1.5 > -- > > Key: YARN-4233 > URL: https://issues.apache.org/jira/browse/YARN-4233 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4219) New levelDB cache storage for timeline v1.5
[ https://issues.apache.org/jira/browse/YARN-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4219: Issue Type: Sub-task (was: Bug) Parent: YARN-4233 > New levelDB cache storage for timeline v1.5 > --- > > Key: YARN-4219 > URL: https://issues.apache.org/jira/browse/YARN-4219 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > > We need to have an "offline" caching storage for timeline server v1.5 after > the changes in YARN-3942. The in memory timeline storage may run into OOM > issues when used as a cache storage for entity file timeline storage. We can > refactor the code and have a level db based caching storage for this use > case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3942) Timeline store to read events from HDFS
[ https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3942: Issue Type: Sub-task (was: Improvement) Parent: YARN-4233 > Timeline store to read events from HDFS > --- > > Key: YARN-3942 > URL: https://issues.apache.org/jira/browse/YARN-3942 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-3942-leveldb.001.patch, > YARN-3942-leveldb.002.patch, YARN-3942.001.patch, YARN-3942.002.patch > > > This adds a new timeline store plugin that is intended as a stop-gap measure > to mitigate some of the issues we've seen with ATS v1 while waiting for ATS > v2. The intent of this plugin is to provide a workable solution for running > the Tez UI against the timeline server on a large-scale clusters running many > thousands of jobs per day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4233) YARN Timeline Service plugin: ATS v1.5
Xuan Gong created YARN-4233: --- Summary: YARN Timeline Service plugin: ATS v1.5 Key: YARN-4233 URL: https://issues.apache.org/jira/browse/YARN-4233 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Xuan Gong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305
[ https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947142#comment-14947142 ] Neelesh Srinivas Salian commented on YARN-3996: --- I'll go back and fix those. I know why this happened: the added implementation of incrementAllocationCapability on the Fifo and Capacity schedulers. I need to do this in a better way and will update soon. Thank you. > YARN-789 (Support for zero capabilities in fairscheduler) is broken after > YARN-3305 > --- > > Key: YARN-3996 > URL: https://issues.apache.org/jira/browse/YARN-3996 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, fairscheduler >Reporter: Anubhav Dhoot >Assignee: Neelesh Srinivas Salian >Priority: Critical > Attachments: YARN-3996.001.patch, YARN-3996.prelim.patch > > > RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest > with minimumResource for the incrementResource. This causes normalize to > return zero if the minimum is set to zero as per YARN-789 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
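The failure mode described here (normalization returning zero when the minimum, reused as the increment, is zero) can be illustrated with a simplified round-up. This is a sketch of the arithmetic only, not the actual DefaultResourceCalculator code, and the explicit guard for a zero increment is an assumption made to mirror the reported behavior.

```java
// Simplified sketch of memory normalization: round the ask up to the
// nearest multiple of the increment, then clamp to [minimum, maximum].
// When the minimum is passed in as the increment and is zero (as YARN-789
// allows for the fair scheduler), every request collapses to zero.
class NormalizeSketch {
  static long normalize(long ask, long minimum, long maximum, long increment) {
    if (increment <= 0) {
      // Degenerate case from the bug report: a zero increment wipes out
      // the request instead of passing it through unchanged.
      return 0;
    }
    long rounded = ((ask + increment - 1) / increment) * increment;
    return Math.max(minimum, Math.min(maximum, rounded));
  }
}
```

For example, a 1500 MB ask with a 1024 MB increment rounds up to 2048 MB, while the same ask with a zero minimum/increment yields 0, which is the reported breakage.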
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947108#comment-14947108 ] Sangjin Lee commented on YARN-3798: --- Please disregard. I thought the patch was misnamed, but it wasn't. Somehow jenkins checked out trunk for this test. > ZKRMStateStore shouldn't create new session without occurrance of > SESSIONEXPIED > --- > > Key: YARN-3798 > URL: https://issues.apache.org/jira/browse/YARN-3798 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Varun Saxena >Priority: Blocker > Attachments: RM.log, YARN-3798-2.7.002.patch, > YARN-3798-branch-2.6.01.patch, YARN-3798-branch-2.7.002.patch, > YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, > YARN-3798-branch-2.7.005.patch, YARN-3798-branch-2.7.006.patch, > YARN-3798-branch-2.7.patch > > > RM going down with NoNode exception during create of znode for appattempt > *Please find the exception logs* > {code} > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 2015-06-09 10:09:44,886 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Exception while executing a ZK operation. 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) > at java.lang.Thread.run(Thread.java:745) > 2015-06-09 10:09:44,887 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed > out ZK retries. Giving up! > 2015-06-09 10:09:44,887 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error > updating appAttempt: appattempt_1433764310492_7152_01 > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(Zo
[jira] [Commented] (YARN-4221) Store user in app to flow table
[ https://issues.apache.org/jira/browse/YARN-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947070#comment-14947070 ] Hadoop QA commented on YARN-4221: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 52s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 2s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 7s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 16s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 48s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 47s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 55s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 3m 43s | Tests passed in hadoop-yarn-server-timelineservice. 
| | | | 42m 0s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765396/YARN-4221-YARN-2928.02.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 5a3db96 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/9372/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9372/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9372/console | This message was automatically generated. > Store user in app to flow table > --- > > Key: YARN-4221 > URL: https://issues.apache.org/jira/browse/YARN-4221 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4221-YARN-2928.01.patch, > YARN-4221-YARN-2928.02.patch > > > We should store the user as well in the app-to-flow table. > For queries where the user is not supplied and the flow context can be retrieved from > the app-to-flow table, we should take the user from the app-to-flow table instead of > considering the UGI as the default user. > This is as per the discussion on YARN-3864 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers
[ https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947056#comment-14947056 ] MENG DING commented on YARN-1509: - Test failure is not related to this patch. Checkstyle warning is the same as before. > Make AMRMClient support send increase container request and get > increased/decreased containers > -- > > Key: YARN-1509 > URL: https://issues.apache.org/jira/browse/YARN-1509 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan (No longer used) >Assignee: MENG DING > Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, > YARN-1509.4.patch, YARN-1509.5.patch > > > As described in YARN-1197, we need add API in AMRMClient to support > 1) Add increase request > 2) Can get successfully increased/decreased containers from RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers
[ https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947025#comment-14947025 ] Hadoop QA commented on YARN-1509: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 52s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 8m 3s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 28s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 19s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 28s | The applied patch generated 5 new checkstyle issues (total was 79, now 78). | | {color:green}+1{color} | whitespace | 0m 8s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 52s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 7m 23s | Tests failed in hadoop-yarn-client. 
| | | | 46m 45s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.client.api.impl.TestYarnClient | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765393/YARN-1509.5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 61b3547 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9371/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9371/artifact/patchprocess/diffcheckstylehadoop-yarn-client.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/9371/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9371/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9371/console | This message was automatically generated. > Make AMRMClient support send increase container request and get > increased/decreased containers > -- > > Key: YARN-1509 > URL: https://issues.apache.org/jira/browse/YARN-1509 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan (No longer used) >Assignee: MENG DING > Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, > YARN-1509.4.patch, YARN-1509.5.patch > > > As described in YARN-1197, we need add API in AMRMClient to support > 1) Add increase request > 2) Can get successfully increased/decreased containers from RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947014#comment-14947014 ] zhihai xu commented on YARN-3943: - Hi [~jlowe], could you help review the patch? Thanks. > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3943.000.patch, YARN-3943.001.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configurations > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It would > be better to use two configurations: one used when a disk goes from not-full > to full, and the other used when it goes from full back to not-full, so > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
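The two-threshold proposal amounts to hysteresis: mark a disk full above a high watermark, and mark it good again only once utilization drops below a lower watermark. A minimal sketch follows; the class and method names are illustrative, not the NodeManager's actual disk-health-checker code.

```java
// Sketch of the proposed hysteresis: with a single threshold, a disk
// hovering around it flips between full and good on every check. With
// two thresholds, the disk must cross the *other* watermark to change
// state, which suppresses the oscillation.
class DiskHealthSketch {
  private final float fullThreshold;     // e.g. 95% -> mark the disk full
  private final float notFullThreshold;  // e.g. 90% -> mark it good again
  private boolean full = false;

  DiskHealthSketch(float fullThreshold, float notFullThreshold) {
    this.fullThreshold = fullThreshold;
    this.notFullThreshold = notFullThreshold;
  }

  // Feed the current utilization percentage; returns true if the disk is
  // considered full after this observation.
  boolean update(float utilization) {
    if (!full && utilization > fullThreshold) {
      full = true;                        // crossed the high watermark
    } else if (full && utilization < notFullThreshold) {
      full = false;                       // dropped below the low watermark
    }
    return full;                          // in between: state is unchanged
  }
}
```

With thresholds of 95% and 90%, a disk bouncing between 92% and 94% keeps whatever state it last had instead of oscillating every health check.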
[jira] [Updated] (YARN-4221) Store user in app to flow table
[ https://issues.apache.org/jira/browse/YARN-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4221: --- Attachment: YARN-4221-YARN-2928.02.patch > Store user in app to flow table > --- > > Key: YARN-4221 > URL: https://issues.apache.org/jira/browse/YARN-4221 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4221-YARN-2928.01.patch, > YARN-4221-YARN-2928.02.patch > > > We should store the user as well in the app-to-flow table. > For queries where the user is not supplied and the flow context can be retrieved from > the app-to-flow table, we should take the user from the app-to-flow table instead of > considering the UGI as the default user. > This is as per the discussion on YARN-3864 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers
[ https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-1509: Attachment: YARN-1509.5.patch Thanks [~leftnoteasy]. Attaching the patch that addresses the comments. > Make AMRMClient support send increase container request and get > increased/decreased containers > -- > > Key: YARN-1509 > URL: https://issues.apache.org/jira/browse/YARN-1509 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan (No longer used) >Assignee: MENG DING > Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, > YARN-1509.4.patch, YARN-1509.5.patch > > > As described in YARN-1197, we need add API in AMRMClient to support > 1) Add increase request > 2) Can get successfully increased/decreased containers from RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4232) TopCLI console shows exceptions for help command
[ https://issues.apache.org/jira/browse/YARN-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946658#comment-14946658 ] Bibin A Chundatt commented on YARN-4232: The test case failure is not related to this JIRA patch. > TopCLI console shows exceptions for help command > - > > Key: YARN-4232 > URL: https://issues.apache.org/jira/browse/YARN-4232 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: 0001-YARN-4232.patch > > > *Steps to reproduce* > Start Top command in YARN in HA mode > ./yarn top > {noformat} > usage: yarn top > -cols Number of columns on the terminal > -delay The refresh delay(in seconds), default is 3 seconds > -help Print usage; for help while the tool is running press 'h' > + Enter > -queuesComma separated list of queues to restrict applications > -rows Number of rows on the terminal > -types Comma separated list of types to restrict applications, > case sensitive(though the display is lower case) > -users Comma separated list of users to restrict applications > {noformat} > Execute *for help while the tool is running press 'h' + Enter* while top > tool is running > Exception is thrown in console continuously > {noformat} > 15/10/07 14:59:28 ERROR cli.TopCLI: Could not fetch RM start time > java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:589) > at java.net.Socket.connect(Socket.java:538) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at 
sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient.(HttpClient.java:211) > at sun.net.www.http.HttpClient.New(HttpClient.java:308) > at sun.net.www.http.HttpClient.New(HttpClient.java:326) > at > sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1168) > at > sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1104) > at > sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:998) > at > sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:932) > at > org.apache.hadoop.yarn.client.cli.TopCLI.getRMStartTime(TopCLI.java:742) > at org.apache.hadoop.yarn.client.cli.TopCLI.run(TopCLI.java:467) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at org.apache.hadoop.yarn.client.cli.TopCLI.main(TopCLI.java:420) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4228) FileSystemRMStateStore use IOUtils#close instead of fs#close
[ https://issues.apache.org/jira/browse/YARN-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946648#comment-14946648 ] Hudson commented on YARN-4228: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2403 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2403/]) YARN-4228. FileSystemRMStateStore use IOUtils#close instead of fs#close. (rohithsharmaks: rev 3793cbe4c3cce5d03c4a18d562cbcb7cacd8f743) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/CHANGES.txt > FileSystemRMStateStore use IOUtils#close instead of fs#close > > > Key: YARN-4228 > URL: https://issues.apache.org/jira/browse/YARN-4228 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 2.8.0 > > Attachments: 0001-YARN-4228.patch > > > NPE on {{FileSystemRMStateStore#closeWithRetries}} when active service > initialization fails on rm start up > {noformat} > 2015-10-05 19:56:38,626 INFO org.apache.hadoop.service.AbstractService: > Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore > failed in state STOPPED; cause: java.lang.NullPointerException > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:721) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:718) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$FSAction.runWithRetries(FileSystemRMStateStore.java:734) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeWithRetries(FileSystemRMStateStore.java:718) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeInternal(FileSystemRMStateStore.java:169) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:618) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.AbstractService.close(AbstractService.java:250) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:609) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:965) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1195) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
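The NPE above occurs because close is invoked on a FileSystem that was never initialized (active-service init failed before it was opened). The fix replaces the direct close with a null-safe one in the spirit of Hadoop's IOUtils; the helper below is a minimal stand-in for illustration, not the actual IOUtils API.

```java
import java.io.Closeable;
import java.io.IOException;

// Minimal stand-in for a null-safe close: a null resource (for example a
// FileSystem that failed to initialize) is ignored instead of triggering
// a NullPointerException during shutdown, and IOExceptions from close()
// do not abort the rest of the shutdown sequence.
class CloseSketch {
  static void closeQuietly(Closeable c) {
    if (c == null) {
      return; // nothing was ever opened; closing is a no-op
    }
    try {
      c.close();
    } catch (IOException e) {
      // a real implementation would log here; shutdown must not fail
      // just because close failed
    }
  }
}
```

Service teardown paths run even when initialization aborted halfway, so every resource they touch has to tolerate being null.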
[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition
[ https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946646#comment-14946646 ] Hudson commented on YARN-4209: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2403 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2403/]) YARN-4209. RMStateStore FENCED state doesn't work due to (rohithsharmaks: rev 9156fc60c654e9305411686878acb443f3be1e67) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestMemoryRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java > RMStateStore FENCED state doesn’t work due to updateFencedState called by > stateMachine.doTransition > --- > > Key: YARN-4209 > URL: https://issues.apache.org/jira/browse/YARN-4209 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Fix For: 2.7.2 > > Attachments: YARN-4209.000.patch, YARN-4209.001.patch, > YARN-4209.002.patch, YARN-4209.branch-2.7.patch > > > RMStateStore FENCED state doesn’t work due to {{updateFencedState}} called by > {{stateMachine.doTransition}}. The reason is > {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded > in {{stateMachine.doTransition}} called from public > API(removeRMDelegationToken...) or {{ForwardingEventHandler#handle}}. So > right after the internal state transition from {{updateFencedState}} changes > the state to FENCED state, the external state transition changes the state > back to ACTIVE state. The end result is that RMStateStore is still in ACTIVE > state even after {{notifyStoreOperationFailed}} is called. 
The only working > case for FENCED state is {{notifyStoreOperationFailed}} called from > {{ZKRMStateStore#VerifyActiveStatusThread}}. > For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter > external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => > {{notifyStoreOperationFailed}} > =>{{updateFencedState}}=>{{handleStoreEvent}}=> enter internal > {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}} > change state to FENCED => exit external {{stateMachine.doTransition}} change > state to ACTIVE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
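The re-entrancy problem described above can be reproduced with a deliberately simplified sketch (this is not the actual StateMachineFactory code; the class and method names are illustrative): the outer transition applies its post-state after the nested transition has already run, clobbering the nested change to FENCED.

```java
// Minimal reproduction of the nested-transition problem: the outer
// transition's post-state is written unconditionally after the handler
// body runs, so a state change made by a nested transition inside that
// body (here, ACTIVE -> FENCED) is silently overwritten.
class FencedSketch {
  enum State { ACTIVE, FENCED }
  State state = State.ACTIVE;

  // Outer transition, standing in for RemoveRMDTTransition: on failure it
  // calls notifyStoreOperationFailed(), which performs a nested transition
  // to FENCED -- but the outer transition then applies its own post-state.
  void handleRemoveToken(boolean storeOperationFails) {
    if (storeOperationFails) {
      notifyStoreOperationFailed();  // nested transition: state -> FENCED
    }
    state = State.ACTIVE;            // outer post-state clobbers FENCED
  }

  // Stand-in for the internal updateFencedState transition.
  void notifyStoreOperationFailed() {
    state = State.FENCED;
  }
}
```

Running the failure path leaves the store ACTIVE rather than FENCED, which is exactly the end result the description reports; the fix has to make the FENCED transition survive (or bypass) the enclosing transition's post-state.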
[jira] [Commented] (YARN-4232) TopCLI console shows exceptions for help command
[ https://issues.apache.org/jira/browse/YARN-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946598#comment-14946598 ] Hadoop QA commented on YARN-4232: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 10s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 6s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 28s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 19s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 31s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 58s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 8m 0s | Tests failed in hadoop-yarn-client. 
| | | | 47m 46s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.client.TestApplicationClientProtocolOnHA | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765343/0001-YARN-4232.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3112f26 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9370/artifact/patchprocess/patchReleaseAuditProblems.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/9370/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9370/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9370/console | This message was automatically generated. 
> TopCLI console shows exceptions for help command > - > > Key: YARN-4232 > URL: https://issues.apache.org/jira/browse/YARN-4232 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: 0001-YARN-4232.patch > > > *Steps to reproduce* > Start Top command in YARN in HA mode > ./yarn top > {noformat} > usage: yarn top > -cols Number of columns on the terminal > -delay The refresh delay(in seconds), default is 3 seconds > -help Print usage; for help while the tool is running press 'h' > + Enter > -queuesComma separated list of queues to restrict applications > -rows Number of rows on the terminal > -types Comma separated list of types to restrict applications, > case sensitive(though the display is lower case) > -users Comma separated list of users to restrict applications > {noformat} > Execute *for help while the tool is running press 'h' + Enter* while top > tool is running > Exception is thrown in console continuously > {noformat} > 15/10/07 14:59:28 ERROR cli.TopCLI: Could not fetch RM start time > java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:589) > at java.net.Socket.connect(Socket.java:538) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient.(HttpClient.java:211) > at sun.net.www.http.HttpClient.New(HttpClient.java:308) > at sun.net.www.http.HttpClient.New(HttpClient.java:326) > at > sun.net.www.protocol
[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API
[ https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946540#comment-14946540 ] Varun Vasudev commented on YARN-4009: - [~jeagles] - just to make sure we're on the same page - you're suggesting that if we find the HttpCrossOriginFilterInitializer (which is present in hadoop-common) in the filter initializers list, we should just remove it from the initializers list and use the timeline server configurations instead? > CORS support for ResourceManager REST API > - > > Key: YARN-4009 > URL: https://issues.apache.org/jira/browse/YARN-4009 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Varun Vasudev > Attachments: YARN-4009.001.patch, YARN-4009.002.patch, > YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch, > YARN-4009.006.patch > > > Currently the REST APIs do not have CORS support. This means any UI (running > in a browser) cannot consume the REST APIs. For example, the Tez UI would like to use > the REST API for getting the application and application attempt information exposed > by the APIs. > It would be very useful if CORS were enabled for the REST APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
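The idea being discussed (dropping HttpCrossOriginFilterInitializer from the configured filter-initializer list so the timeline server's CORS settings take effect instead) can be sketched roughly as follows. This is a hypothetical illustration, not the YARN-4009 patch; the helper name and the sample initializer list are invented.

```java
// Hypothetical sketch of the approach discussed above (not the actual
// YARN-4009 patch): filter out HttpCrossOriginFilterInitializer from a
// comma-separated filter-initializer list before applying it.
import java.util.ArrayList;
import java.util.List;

public class FilterInitializers {
    static final String CROSS_ORIGIN =
        "org.apache.hadoop.security.HttpCrossOriginFilterInitializer";

    // Returns the list with the cross-origin initializer removed.
    static String removeCrossOrigin(String initializers) {
        List<String> kept = new ArrayList<>();
        for (String name : initializers.split(",")) {
            if (!name.trim().equals(CROSS_ORIGIN)) {
                kept.add(name.trim());
            }
        }
        return String.join(",", kept);
    }

    public static void main(String[] args) {
        // Invented sample configuration value.
        String conf = "org.example.SomeFilterInitializer," + CROSS_ORIGIN;
        System.out.println(removeCrossOrigin(conf));
        // org.example.SomeFilterInitializer
    }
}
```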
[jira] [Commented] (YARN-4228) FileSystemRMStateStore use IOUtils#close instead of fs#close
[ https://issues.apache.org/jira/browse/YARN-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946513#comment-14946513 ] Hudson commented on YARN-4228: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #464 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/464/]) YARN-4228. FileSystemRMStateStore use IOUtils#close instead of fs#close. (rohithsharmaks: rev 3793cbe4c3cce5d03c4a18d562cbcb7cacd8f743) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java > FileSystemRMStateStore use IOUtils#close instead of fs#close > > > Key: YARN-4228 > URL: https://issues.apache.org/jira/browse/YARN-4228 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 2.8.0 > > Attachments: 0001-YARN-4228.patch > > > NPE on {{FileSystemRMStateStore#closeWithRetries}} when active service > initialization fails on rm start up > {noformat} > 2015-10-05 19:56:38,626 INFO org.apache.hadoop.service.AbstractService: > Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore > failed in state STOPPED; cause: java.lang.NullPointerException > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:721) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:718) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$FSAction.runWithRetries(FileSystemRMStateStore.java:734) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeWithRetries(FileSystemRMStateStore.java:718) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeInternal(FileSystemRMStateStore.java:169) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:618) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.AbstractService.close(AbstractService.java:250) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:609) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:965) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1195) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
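The NPE above arises because serviceStop still runs when active-service initialization fails before the FileSystem field is ever assigned, so a direct fs.close() dereferences null. A minimal sketch (simplified and with hypothetical names, in the spirit of Hadoop's IOUtils null-safe cleanup, not the actual patch) of why a null-safe close helper avoids the crash:

```java
// Illustrative sketch: a null-safe close helper skips fields that were
// never initialized instead of throwing NullPointerException.
import java.io.Closeable;
import java.io.IOException;

public class NullSafeClose {
    // Mimics the shape of IOUtils-style cleanup: ignore nulls and
    // swallow close() failures rather than propagating them.
    static void cleanup(Closeable... closeables) {
        for (Closeable c : closeables) {
            if (c == null) {
                continue; // e.g. fs was never assigned; skip, don't NPE
            }
            try {
                c.close();
            } catch (IOException e) {
                // a real implementation would log and continue
            }
        }
    }

    public static void main(String[] args) {
        Closeable fs = null;  // init failed before the field was set
        cleanup(fs);          // safe: no NullPointerException
        System.out.println("shutdown completed cleanly");
    }
}
```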
[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition
[ https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946512#comment-14946512 ] Hudson commented on YARN-4209: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #464 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/464/]) YARN-4209. RMStateStore FENCED state doesn’t work due to (rohithsharmaks: rev 9156fc60c654e9305411686878acb443f3be1e67) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestMemoryRMStateStore.java > RMStateStore FENCED state doesn’t work due to updateFencedState called by > stateMachine.doTransition > --- > > Key: YARN-4209 > URL: https://issues.apache.org/jira/browse/YARN-4209 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Fix For: 2.7.2 > > Attachments: YARN-4209.000.patch, YARN-4209.001.patch, > YARN-4209.002.patch, YARN-4209.branch-2.7.patch > > > RMStateStore FENCED state doesn’t work due to {{updateFencedState}} called by > {{stateMachine.doTransition}}. The reason is > {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded > in {{stateMachine.doTransition}} called from public > API(removeRMDelegationToken...) or {{ForwardingEventHandler#handle}}. So > right after the internal state transition from {{updateFencedState}} changes > the state to FENCED state, the external state transition changes the state > back to ACTIVE state. The end result is that RMStateStore is still in ACTIVE > state even after {{notifyStoreOperationFailed}} is called. 
The only working > case for FENCED state is {{notifyStoreOperationFailed}} called from > {{ZKRMStateStore#VerifyActiveStatusThread}}. > For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter > external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => > {{notifyStoreOperationFailed}} > =>{{updateFencedState}}=>{{handleStoreEvent}}=> enter internal > {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}} > change state to FENCED => exit external {{stateMachine.doTransition}} change > state to ACTIVE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
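The re-entrancy described above can be boiled down to a toy state machine. This is a hypothetical sketch, not the RMStateStore code; the point is only that when a transition hook re-enters doTransition, the outer call's precomputed target state overwrites whatever state the nested call set:

```java
// Toy reproduction of the nested-transition bug described above
// (invented names, not the YARN state-machine framework).
public class NestedTransition {
    enum State { ACTIVE, FENCED }

    State state = State.ACTIVE;

    // Buggy shape: the post-state is decided before the hook runs, so a
    // nested doTransition inside the hook gets clobbered afterwards.
    void doTransition(State target, Runnable hook) {
        hook.run();      // may re-enter doTransition and set FENCED
        state = target;  // overwrites the nested call's result
    }

    public static void main(String[] args) {
        NestedTransition sm = new NestedTransition();
        // Outer: an ordinary store operation whose failure handler fences.
        sm.doTransition(NestedTransition.State.ACTIVE, () ->
            // Inner: an updateFencedState-style nested transition.
            sm.doTransition(NestedTransition.State.FENCED, () -> { }));
        // The machine ends up ACTIVE even though fencing was requested.
        System.out.println(sm.state);
    }
}
```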
[jira] [Commented] (YARN-4228) FileSystemRMStateStore use IOUtils#close instead of fs#close
[ https://issues.apache.org/jira/browse/YARN-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946497#comment-14946497 ] Hudson commented on YARN-4228: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2435 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2435/]) YARN-4228. FileSystemRMStateStore use IOUtils#close instead of fs#close. (rohithsharmaks: rev 3793cbe4c3cce5d03c4a18d562cbcb7cacd8f743) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/CHANGES.txt > FileSystemRMStateStore use IOUtils#close instead of fs#close -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4232) TopCLI console shows exceptions for help command
[ https://issues.apache.org/jira/browse/YARN-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4232: --- Attachment: 0001-YARN-4232.patch Currently HA mode is not supported. For getting the RM start time, the HTTP request is submitted to the default IP and port. Clearing the screen and showing the help message when the request fails. Attaching patch for the same. > TopCLI console shows exceptions for help command > - > > Key: YARN-4232 > URL: https://issues.apache.org/jira/browse/YARN-4232 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: 0001-YARN-4232.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering
[ https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946465#comment-14946465 ] Varun Saxena commented on YARN-4178: Thanks for the commit, [~sjlee0], and thanks to the others for the reviews. > [storage implementation] app id as string in row keys can cause incorrect > ordering > -- > > Key: YARN-4178 > URL: https://issues.apache.org/jira/browse/YARN-4178 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Fix For: YARN-2928 > > Attachments: YARN-4178-YARN-2928.01.patch, > YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch, > YARN-4178-YARN-2928.04.patch, YARN-4178-YARN-2928.05.patch > > > Currently the app id is used in various places as part of row keys. However, > it is currently treated as a string. This will cause a problem with > ordering when the id portion of the app id rolls over to the next digit. > For example, "app_1234567890_1" will be considered *earlier* than > "app_1234567890_". We should correct this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
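The ordering problem quoted above can be reproduced in a few lines. This is an illustrative sketch, not the YARN-4178 patch (which changes how the id is encoded into row keys); the padding helper and the sample application ids are hypothetical:

```java
// Why app ids sorted as raw strings order incorrectly, and how a
// fixed-width encoding of the sequence number restores numeric order.
// Sample ids and the pad() helper are invented for illustration.
import java.util.Arrays;

public class AppIdOrdering {
    public static void main(String[] args) {
        String[] ids = {"app_1234567890_9", "app_1234567890_10"};

        // Lexicographic (byte-wise) ordering, as in a row key:
        // "_10" sorts before "_9" because '1' < '9'.
        String[] lex = ids.clone();
        Arrays.sort(lex);
        System.out.println(Arrays.toString(lex));
        // [app_1234567890_10, app_1234567890_9]

        // Zero-padding the sequence number to a fixed width makes the
        // lexicographic order match the numeric order.
        String[] padded = {pad(ids[0]), pad(ids[1])};
        Arrays.sort(padded);
        System.out.println(Arrays.toString(padded));
        // [app_1234567890_0000000009, app_1234567890_0000000010]
    }

    // Rewrites the trailing sequence number as a zero-padded 10-digit field.
    static String pad(String appId) {
        int idx = appId.lastIndexOf('_');
        long seq = Long.parseLong(appId.substring(idx + 1));
        return appId.substring(0, idx + 1) + String.format("%010d", seq);
    }
}
```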
[jira] [Created] (YARN-4232) TopCLI console shows exceptions for help command
Bibin A Chundatt created YARN-4232: -- Summary: TopCLI console shows exceptions for help command Key: YARN-4232 URL: https://issues.apache.org/jira/browse/YARN-4232 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor *Steps to reproduce* Start Top command in YARN in HA mode ./yarn top {noformat} usage: yarn top -cols Number of columns on the terminal -delay The refresh delay(in seconds), default is 3 seconds -help Print usage; for help while the tool is running press 'h' + Enter -queuesComma separated list of queues to restrict applications -rows Number of rows on the terminal -types Comma separated list of types to restrict applications, case sensitive(though the display is lower case) -users Comma separated list of users to restrict applications {noformat} Execute *for help while the tool is running press 'h' + Enter* while top tool is running Exception is thrown in console continuously {noformat} 15/10/07 14:59:28 ERROR cli.TopCLI: Could not fetch RM start time java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at java.net.Socket.connect(Socket.java:538) at sun.net.NetworkClient.doConnect(NetworkClient.java:180) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.(HttpClient.java:211) at sun.net.www.http.HttpClient.New(HttpClient.java:308) at sun.net.www.http.HttpClient.New(HttpClient.java:326) at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1168) at 
sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1104) at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:998) at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:932) at org.apache.hadoop.yarn.client.cli.TopCLI.getRMStartTime(TopCLI.java:742) at org.apache.hadoop.yarn.client.cli.TopCLI.run(TopCLI.java:467) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.yarn.client.cli.TopCLI.main(TopCLI.java:420) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4228) FileSystemRMStateStore use IOUtils#close instead of fs#close
[ https://issues.apache.org/jira/browse/YARN-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946451#comment-14946451 ] Hudson commented on YARN-4228: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #499 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/499/]) YARN-4228. FileSystemRMStateStore use IOUtils#close instead of fs#close. (rohithsharmaks: rev 3793cbe4c3cce5d03c4a18d562cbcb7cacd8f743) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java > FileSystemRMStateStore use IOUtils#close instead of fs#close -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition
[ https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946394#comment-14946394 ] Hudson commented on YARN-4209: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2434 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2434/]) YARN-4209. RMStateStore FENCED state doesn't work due to (rohithsharmaks: rev 9156fc60c654e9305411686878acb443f3be1e67) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestMemoryRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java > RMStateStore FENCED state doesn't work due to updateFencedState called by > stateMachine.doTransition -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4228) FileSystemRMStateStore use IOUtils#close instead of fs#close
[ https://issues.apache.org/jira/browse/YARN-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946392#comment-14946392 ] Hudson commented on YARN-4228: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #1228 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1228/]) YARN-4228. FileSystemRMStateStore use IOUtils#close instead of fs#close. (rohithsharmaks: rev 3793cbe4c3cce5d03c4a18d562cbcb7cacd8f743) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/CHANGES.txt > FileSystemRMStateStore use IOUtils#close instead of fs#close -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition
[ https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946390#comment-14946390 ] Hudson commented on YARN-4209: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #1228 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1228/]) YARN-4209. RMStateStore FENCED state doesn't work due to (rohithsharmaks: rev 9156fc60c654e9305411686878acb443f3be1e67) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestMemoryRMStateStore.java > RMStateStore FENCED state doesn't work due to updateFencedState called by > stateMachine.doTransition -- This message was sent by Atlassian JIRA (v6.3.4#6332)