[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372561#comment-14372561
 ] 

Hadoop QA commented on YARN-3069:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12706130/YARN-3069.003.patch
  against trunk revision e1feb4e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7062//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7062//console

This message is automatically generated.

> Document missing properties in yarn-default.xml
> ---
>
> Key: YARN-3069
> URL: https://issues.apache.org/jira/browse/YARN-3069
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: supportability
> Attachments: YARN-3069.001.patch, YARN-3069.002.patch, 
> YARN-3069.003.patch
>
>
> The following properties are currently not defined in yarn-default.xml.
> These properties should either be
>   A) documented in yarn-default.xml, OR
>   B) listed as an exception (with comments, e.g. for internal use) in the
> TestYarnConfigurationFields unit test.
> Comments on any of the properties below are welcome.
>   org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker
>   org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore
>   security.applicationhistory.protocol.acl
>   yarn.app.container.log.backups
>   yarn.app.container.log.dir
>   yarn.app.container.log.filesize
>   yarn.client.app-submission.poll-interval
>   yarn.client.application-client-protocol.poll-timeout-ms
>   yarn.is.minicluster
>   yarn.log.server.url
>   yarn.minicluster.control-resource-monitoring
>   yarn.minicluster.fixed.ports
>   yarn.minicluster.use-rpc
>   yarn.node-labels.fs-store.retry-policy-spec
>   yarn.node-labels.fs-store.root-dir
>   yarn.node-labels.manager-class
>   yarn.nodemanager.container-executor.os.sched.priority.adjustment
>   yarn.nodemanager.container-monitor.process-tree.class
>   yarn.nodemanager.disk-health-checker.enable
>   yarn.nodemanager.docker-container-executor.image-name
>   yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms
>   yarn.nodemanager.linux-container-executor.group
>   yarn.nodemanager.log.deletion-threads-count
>   yarn.nodemanager.user-home-dir
>   yarn.nodemanager.webapp.https.address
>   yarn.nodemanager.webapp.spnego-keytab-file
>   yarn.nodemanager.webapp.spnego-principal
>   yarn.nodemanager.windows-secure-container-executor.group
>   yarn.resourcemanager.configuration.file-system-based-store
>   yarn.resourcemanager.delegation-token-renewer.thread-count
>   yarn.resourcemanager.delegation.key.update-interval
>   yarn.resourcemanager.delegation.token.max-lifetime
>   yarn.resourcemanager.delegation.token.renew-interval
>   yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size
>   yarn.resourcemanager.metrics.runtime.buckets
>   yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs
>   yarn.resourcemanager.reservation-system.class
>   yarn.resourcemanager.reservation-system.enable
>   yarn.resourcemanager.reservation-system.plan.follower
>   yarn.resourcemanager.reservation-system.planfollower.time-step
>   yarn.resourcemanager.rm.container-allocation.expiry-interval-ms
>   yarn.resourcemanager.webapp.spnego-keytab-file
>   yarn.resourcemanager.webapp.spnego-principal
>   yarn.scheduler.include-port-in-node-name
>   yarn.timeline-service.delegation.key.update-interval
>   yarn.timeline-service.delegation.token.max-lifetime
>   yarn.timeline-service.delegation.token.renew-interval
>   yarn.timeline-service.generic-application-history.enabled
>   
> yarn.timeline-service.generic-application-history.fs-history-store.compression-type
>   yarn.timeline-service.generic-application-history.fs-history-store.uri
>   yarn.timeline-service.generic-application-history.store-class
>   yarn.timeline-service.http-cross-origin.enabled
>   yarn.tracking.url.generator

[jira] [Updated] (YARN-3069) Document missing properties in yarn-default.xml

2015-03-20 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-3069:
-
Attachment: YARN-3069.003.patch

- Added minimal comments to most of the missing properties.
- Could use help adding descriptions to any remaining properties that still have
empty description sections.
- Turned on error checking in both directions (XML->Java, Java->XML); a rough
sketch of the idea follows below.
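
For context, here is a minimal standalone sketch of what "error checking in both
directions" can look like: reflect over the String constants in YarnConfiguration
and compare them against a Configuration loaded only from yarn-default.xml. This is
illustrative only (it is not the TestYarnConfigurationFields code in the patch), and
the real test additionally needs an exception list for intentionally undocumented
properties (option B in the description below).
{code}
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnDefaultXmlCrossCheck {
  public static void main(String[] args) throws Exception {
    // Direction Java->XML: property names declared as constants in YarnConfiguration.
    Set<String> javaProps = new HashSet<String>();
    for (Field f : YarnConfiguration.class.getDeclaredFields()) {
      int mod = f.getModifiers();
      if (Modifier.isPublic(mod) && Modifier.isStatic(mod)
          && f.getType() == String.class) {
        String value = (String) f.get(null);
        if (value != null && value.startsWith("yarn.")) {
          javaProps.add(value);
        }
      }
    }

    // Direction XML->Java: property names documented in yarn-default.xml.
    Configuration conf = new Configuration(false);
    conf.addResource("yarn-default.xml");
    Set<String> xmlProps = new HashSet<String>();
    for (Map.Entry<String, String> e : conf) {
      xmlProps.add(e.getKey());
    }

    for (String p : javaProps) {
      if (!xmlProps.contains(p)) {
        System.out.println("Not documented in yarn-default.xml: " + p);
      }
    }
    for (String p : xmlProps) {
      if (!javaProps.contains(p)) {
        System.out.println("No matching YarnConfiguration constant: " + p);
      }
    }
  }
}
{code}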

> Document missing properties in yarn-default.xml
> ---
>
> Key: YARN-3069
> URL: https://issues.apache.org/jira/browse/YARN-3069
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: supportability
> Attachments: YARN-3069.001.patch, YARN-3069.002.patch, 
> YARN-3069.003.patch
>
>
> The following properties are currently not defined in yarn-default.xml.
> These properties should either be
>   A) documented in yarn-default.xml, OR
>   B) listed as an exception (with comments, e.g. for internal use) in the
> TestYarnConfigurationFields unit test.
> Comments on any of the properties below are welcome.
>   org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker
>   org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore
>   security.applicationhistory.protocol.acl
>   yarn.app.container.log.backups
>   yarn.app.container.log.dir
>   yarn.app.container.log.filesize
>   yarn.client.app-submission.poll-interval
>   yarn.client.application-client-protocol.poll-timeout-ms
>   yarn.is.minicluster
>   yarn.log.server.url
>   yarn.minicluster.control-resource-monitoring
>   yarn.minicluster.fixed.ports
>   yarn.minicluster.use-rpc
>   yarn.node-labels.fs-store.retry-policy-spec
>   yarn.node-labels.fs-store.root-dir
>   yarn.node-labels.manager-class
>   yarn.nodemanager.container-executor.os.sched.priority.adjustment
>   yarn.nodemanager.container-monitor.process-tree.class
>   yarn.nodemanager.disk-health-checker.enable
>   yarn.nodemanager.docker-container-executor.image-name
>   yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms
>   yarn.nodemanager.linux-container-executor.group
>   yarn.nodemanager.log.deletion-threads-count
>   yarn.nodemanager.user-home-dir
>   yarn.nodemanager.webapp.https.address
>   yarn.nodemanager.webapp.spnego-keytab-file
>   yarn.nodemanager.webapp.spnego-principal
>   yarn.nodemanager.windows-secure-container-executor.group
>   yarn.resourcemanager.configuration.file-system-based-store
>   yarn.resourcemanager.delegation-token-renewer.thread-count
>   yarn.resourcemanager.delegation.key.update-interval
>   yarn.resourcemanager.delegation.token.max-lifetime
>   yarn.resourcemanager.delegation.token.renew-interval
>   yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size
>   yarn.resourcemanager.metrics.runtime.buckets
>   yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs
>   yarn.resourcemanager.reservation-system.class
>   yarn.resourcemanager.reservation-system.enable
>   yarn.resourcemanager.reservation-system.plan.follower
>   yarn.resourcemanager.reservation-system.planfollower.time-step
>   yarn.resourcemanager.rm.container-allocation.expiry-interval-ms
>   yarn.resourcemanager.webapp.spnego-keytab-file
>   yarn.resourcemanager.webapp.spnego-principal
>   yarn.scheduler.include-port-in-node-name
>   yarn.timeline-service.delegation.key.update-interval
>   yarn.timeline-service.delegation.token.max-lifetime
>   yarn.timeline-service.delegation.token.renew-interval
>   yarn.timeline-service.generic-application-history.enabled
>   
> yarn.timeline-service.generic-application-history.fs-history-store.compression-type
>   yarn.timeline-service.generic-application-history.fs-history-store.uri
>   yarn.timeline-service.generic-application-history.store-class
>   yarn.timeline-service.http-cross-origin.enabled
>   yarn.tracking.url.generator



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2868) Add metric for initial container launch time to FairScheduler

2015-03-20 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-2868:
-
Attachment: YARN-2868.012.patch

Fix patching conflict introduced by YARN-3356.

> Add metric for initial container launch time to FairScheduler
> -
>
> Key: YARN-2868
> URL: https://issues.apache.org/jira/browse/YARN-2868
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: metrics, supportability
> Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
> YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, 
> YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, 
> YARN-2868.009.patch, YARN-2868.010.patch, YARN-2868.011.patch, 
> YARN-2868.012.patch
>
>
> Add a metric to measure the latency between "starting container allocation" 
> and "first container actually allocated".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372531#comment-14372531
 ] 

Hadoop QA commented on YARN-3241:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12706123/YARN-3241.002.patch
  against trunk revision e1feb4e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7060//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7060//console

This message is automatically generated.

> Leading space, trailing space and empty sub queue name may cause 
> MetricsException for fair scheduler
> 
>
> Key: YARN-3241
> URL: https://issues.apache.org/jira/browse/YARN-3241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3241.000.patch, YARN-3241.001.patch, 
> YARN-3241.002.patch
>
>
> Leading spaces, trailing spaces, and empty sub-queue names may cause a
> MetricsException ("Metrics source XXX already exists!") when adding an
> application to the FairScheduler.
> The reason is that QueueMetrics parses the queue name differently from
> QueueManager.
> QueueMetrics uses Q_SPLITTER to parse the queue name; it removes leading and
> trailing spaces from each sub-queue name and also drops empty sub-queue names.
> {code}
>   static final Splitter Q_SPLITTER =
>       Splitter.on('.').omitEmptyStrings().trimResults();
> {code}
> But QueueManager does not remove leading spaces, trailing spaces, or empty
> sub-queue names.
> This causes FSQueue and FSQueueMetrics to get out of sync:
> QueueManager treats the two queue names as different and tries to create a
> new queue, but FSQueueMetrics treats them as the same queue, which triggers
> the "Metrics source XXX already exists!" MetricsException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372524#comment-14372524
 ] 

Hadoop QA commented on YARN-2495:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12706116/YARN-2495.20150321-1.patch
  against trunk revision e1feb4e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7059//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7059//console

This message is automatically generated.

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, 
> YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, 
> YARN-2495.20150321-1.patch, YARN-2495_20141022.1.patch
>
>
> The target of this JIRA is to allow admins to specify labels on each NM. This covers:
> - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or by
> using the script suggested by [~aw] (YARN-2729))
> - The NM will send labels to the RM via the ResourceTracker API
> - The RM will set labels in the NodeLabelManager when NMs register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized

2015-03-20 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372515#comment-14372515
 ] 

Chengbing Liu commented on YARN-3024:
-

[~kasha], I think we should use {{==}} for enum members, since it is both
null-safe and avoids a method call.
The TODOs were there before this patch. Previously there were 5 TODOs; I did
some refactoring to remove the duplicated code, and now there are 3. Would you
like me to create JIRAs to follow up on the issue?
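
As a quick illustration of the null-safety point (a toy example, not code from the
patch): {{==}} on an enum reference never throws, while {{equals()}} dereferences
the left-hand side.
{code}
public class EnumCompareDemo {
  enum Action { LIVE, DIE }

  public static void main(String[] args) {
    Action action = null;

    // == is null-safe: a null reference simply compares as false.
    System.out.println(action == Action.DIE);            // false

    // equals() throws NullPointerException on the same comparison, and is a
    // virtual method call even when the reference is non-null.
    try {
      System.out.println(action.equals(Action.DIE));
    } catch (NullPointerException e) {
      System.out.println("NPE from equals() on a null enum reference");
    }
  }
}
{code}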

> LocalizerRunner should give DIE action when all resources are localized
> ---
>
> Key: YARN-3024
> URL: https://issues.apache.org/jira/browse/YARN-3024
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.7.0
>
> Attachments: YARN-3024.01.patch, YARN-3024.02.patch, 
> YARN-3024.03.patch, YARN-3024.04.patch
>
>
> We have observed that {{LocalizerRunner}} always gives a LIVE action at the
> end of the localization process.
> The problem is that {{findNextResource()}} can return null even when {{pending}}
> was not empty prior to the call. This method removes already-localized resources
> from {{pending}}; therefore we should check the return value and give a DIE
> action when it returns null.
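
A toy model of the behavior described above, with stand-in types (the real code works
with LocalResource requests and a LocalizerHeartbeatResponse rather than plain
strings); the point is simply that a null from findNextResource() should turn into a
DIE action instead of another LIVE:
{code}
import java.util.ArrayDeque;
import java.util.Queue;

public class LocalizerHeartbeatSketch {
  enum LocalizerAction { LIVE, DIE }

  // Stand-in for the per-localizer pending resource list.
  static Queue<String> pending = new ArrayDeque<String>();

  // Stand-in for findNextResource(): removes the next pending resource and
  // returns null once nothing is left, mirroring the description above.
  static String findNextResource() {
    return pending.poll();
  }

  static LocalizerAction heartbeat() {
    String next = findNextResource();
    return (next != null) ? LocalizerAction.LIVE : LocalizerAction.DIE;
  }

  public static void main(String[] args) {
    pending.add("hdfs:///apps/job.jar");
    System.out.println(heartbeat()); // LIVE: a resource is still pending
    System.out.println(heartbeat()); // DIE: everything is localized, let the localizer exit
  }
}
{code}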



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler

2015-03-20 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3241:

Attachment: YARN-3241.002.patch

> Leading space, trailing space and empty sub queue name may cause 
> MetricsException for fair scheduler
> 
>
> Key: YARN-3241
> URL: https://issues.apache.org/jira/browse/YARN-3241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3241.000.patch, YARN-3241.001.patch, 
> YARN-3241.002.patch
>
>
> Leading spaces, trailing spaces, and empty sub-queue names may cause a
> MetricsException ("Metrics source XXX already exists!") when adding an
> application to the FairScheduler.
> The reason is that QueueMetrics parses the queue name differently from
> QueueManager.
> QueueMetrics uses Q_SPLITTER to parse the queue name; it removes leading and
> trailing spaces from each sub-queue name and also drops empty sub-queue names.
> {code}
>   static final Splitter Q_SPLITTER =
>       Splitter.on('.').omitEmptyStrings().trimResults();
> {code}
> But QueueManager does not remove leading spaces, trailing spaces, or empty
> sub-queue names.
> This causes FSQueue and FSQueueMetrics to get out of sync:
> QueueManager treats the two queue names as different and tries to create a
> new queue, but FSQueueMetrics treats them as the same queue, which triggers
> the "Metrics source XXX already exists!" MetricsException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler

2015-03-20 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3241:

Attachment: (was: YARN-3241.002.patch)

> Leading space, trailing space and empty sub queue name may cause 
> MetricsException for fair scheduler
> 
>
> Key: YARN-3241
> URL: https://issues.apache.org/jira/browse/YARN-3241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3241.000.patch, YARN-3241.001.patch
>
>
> Leading spaces, trailing spaces, and empty sub-queue names may cause a
> MetricsException ("Metrics source XXX already exists!") when adding an
> application to the FairScheduler.
> The reason is that QueueMetrics parses the queue name differently from
> QueueManager.
> QueueMetrics uses Q_SPLITTER to parse the queue name; it removes leading and
> trailing spaces from each sub-queue name and also drops empty sub-queue names.
> {code}
>   static final Splitter Q_SPLITTER =
>       Splitter.on('.').omitEmptyStrings().trimResults();
> {code}
> But QueueManager does not remove leading spaces, trailing spaces, or empty
> sub-queue names.
> This causes FSQueue and FSQueueMetrics to get out of sync:
> QueueManager treats the two queue names as different and tries to create a
> new queue, but FSQueueMetrics treats them as the same queue, which triggers
> the "Metrics source XXX already exists!" MetricsException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-20 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2495:

Attachment: YARN-2495.20150321-1.patch

Hopefully the final patch :)
I have taken care of the following:
* StringArrayProto.stringElement -> elements
* remove original.setNodeLabels(null) in 
testNodeHeartbeatRequestPBImplWithNullLabels
* NodeLabelsProviderService -> NodeLabelsProvider modifications
* nodeLabelsLastUpdatedToRM -> lastUpdatedNodeLabelsToRM

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, 
> YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, 
> YARN-2495.20150321-1.patch, YARN-2495_20141022.1.patch
>
>
> The target of this JIRA is to allow admins to specify labels on each NM. This covers:
> - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or by
> using the script suggested by [~aw] (YARN-2729))
> - The NM will send labels to the RM via the ResourceTracker API
> - The RM will set labels in the NodeLabelManager when NMs register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372472#comment-14372472
 ] 

Hadoop QA commented on YARN-3241:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12706102/YARN-3241.002.patch
  against trunk revision 7f1e2f9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7058//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7058//console

This message is automatically generated.

> Leading space, trailing space and empty sub queue name may cause 
> MetricsException for fair scheduler
> 
>
> Key: YARN-3241
> URL: https://issues.apache.org/jira/browse/YARN-3241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3241.000.patch, YARN-3241.001.patch, 
> YARN-3241.002.patch
>
>
> Leading spaces, trailing spaces, and empty sub-queue names may cause a
> MetricsException ("Metrics source XXX already exists!") when adding an
> application to the FairScheduler.
> The reason is that QueueMetrics parses the queue name differently from
> QueueManager.
> QueueMetrics uses Q_SPLITTER to parse the queue name; it removes leading and
> trailing spaces from each sub-queue name and also drops empty sub-queue names.
> {code}
>   static final Splitter Q_SPLITTER =
>       Splitter.on('.').omitEmptyStrings().trimResults();
> {code}
> But QueueManager does not remove leading spaces, trailing spaces, or empty
> sub-queue names.
> This causes FSQueue and FSQueueMetrics to get out of sync:
> QueueManager treats the two queue names as different and tries to create a
> new queue, but FSQueueMetrics treats them as the same queue, which triggers
> the "Metrics source XXX already exists!" MetricsException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector

2015-03-20 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372468#comment-14372468
 ] 

Naganarasimha G R commented on YARN-3034:
-

Ok [~zjshen], I will have a look at YARN-3040 and raise the issue there, but could
you or [~djp] review the current patch and commit it if it looks fine?

> [Collector wireup] Implement RM starting its timeline collector
> ---
>
> Key: YARN-3034
> URL: https://issues.apache.org/jira/browse/YARN-3034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, 
> YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, 
> YARN-3034.20150320-1.patch
>
>
> Per design in YARN-2928, implement resource managers starting their own ATS 
> writers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3345) Add non-exclusive node label API to RMAdmin protocol and NodeLabelsManager

2015-03-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372456#comment-14372456
 ] 

Hudson commented on YARN-3345:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7392 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7392/])
YARN-3345. Add non-exclusive node label API. Contributed by Wangda Tan (jianhe: 
rev e1feb4ea1a532d680d6ca69b55ffcae1552d64f0)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/NullRMNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceManagerAdministrationProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/FileSystemNodeLabelsStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/RMNodeLabel.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/NodeLabelPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/DummyCommonNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UpdateNodeLabelsRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/event/NodeLabelsStoreEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/resourcemanager_administration_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UpdateNodeLabelsRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeLabel.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/event/StoreUpdateNodeLabelsEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/NodeLabel.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/NodeLabelsStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UpdateNodeLabelsResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UpdateNodeLabelsResponsePBImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceManagerAdministrationProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodeLabelsPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java


> Add non-exclusive node label API to RMAdmin protocol and NodeLabelsManager
> --
>
> Key: YARN-3345
> URL: https://issues.apache.org/jira/browse/YARN-3345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 2.8.0
>
> 

[jira] [Commented] (YARN-1612) Change Fair Scheduler to not disable delay scheduling by default

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372445#comment-14372445
 ] 

Hadoop QA commented on YARN-1612:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12642521/YARN-1612-v2.patch
  against trunk revision fe5c23b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7055//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7055//console

This message is automatically generated.

> Change Fair Scheduler to not disable delay scheduling by default
> 
>
> Key: YARN-1612
> URL: https://issues.apache.org/jira/browse/YARN-1612
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Sandy Ryza
>Assignee: Chen He
> Attachments: YARN-1612-v2.patch, YARN-1612.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372436#comment-14372436
 ] 

Hadoop QA commented on YARN-2306:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12666133/YARN-2306-2.patch
  against trunk revision fe5c23b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7053//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7053//console

This message is automatically generated.

> leak of reservation metrics (fair scheduler)
> 
>
> Key: YARN-2306
> URL: https://issues.apache.org/jira/browse/YARN-2306
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Minor
> Attachments: YARN-2306-2.patch, YARN-2306.patch
>
>
> This only applies to the fair scheduler; the capacity scheduler is OK.
> When an appAttempt or node is removed, the reservation metrics
> (reservedContainers, reservedMB, reservedVCores) are not reduced back.
> These are important metrics for administrators, and the wrong values may
> confuse them.
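
A hedged sketch of the symmetric bookkeeping the description calls for, using a
stand-in class rather than the real QueueMetrics: whatever increments the
reservation gauges when a container is reserved must be mirrored when the
appAttempt or node holding the reservation is removed.
{code}
public class ReservationMetricsSketch {
  long reservedContainers;
  long reservedMB;
  long reservedVCores;

  void reserve(int memoryMB, int vcores) {
    reservedContainers++;
    reservedMB += memoryMB;
    reservedVCores += vcores;
  }

  // The leak in the description: removal paths skip this mirror-image update,
  // so the gauges stay inflated forever.
  void unreserve(int memoryMB, int vcores) {
    reservedContainers--;
    reservedMB -= memoryMB;
    reservedVCores -= vcores;
  }

  public static void main(String[] args) {
    ReservationMetricsSketch m = new ReservationMetricsSketch();
    m.reserve(2048, 2);
    // When the reserving appAttempt or its node goes away, unreserve must be
    // called with the same resources; otherwise reservedMB stays at 2048.
    m.unreserve(2048, 2);
    System.out.println(m.reservedMB + " MB reserved"); // 0 MB reserved
  }
}
{code}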



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372434#comment-14372434
 ] 

Hadoop QA commented on YARN-3336:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12706081/YARN-3336.004.patch
  against trunk revision fe5c23b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7051//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7051//console

This message is automatically generated.

> FileSystem memory leak in DelegationTokenRenewer
> 
>
> Key: YARN-3336
> URL: https://issues.apache.org/jira/browse/YARN-3336
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3336.000.patch, YARN-3336.001.patch, 
> YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch
>
>
> FileSystem memory leak in DelegationTokenRenewer.
> Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new
> FileSystem entry is added to FileSystem#CACHE, and it will never be
> garbage collected.
> This is the implementation of obtainSystemTokensForUser:
> {code}
>   protected Token<?>[] obtainSystemTokensForUser(String user,
>       final Credentials credentials) throws IOException, InterruptedException {
>     // Get new hdfs tokens on behalf of this user
>     UserGroupInformation proxyUser =
>         UserGroupInformation.createProxyUser(user,
>           UserGroupInformation.getLoginUser());
>     Token<?>[] newTokens =
>         proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
>           @Override
>           public Token<?>[] run() throws Exception {
>             return FileSystem.get(getConfig()).addDelegationTokens(
>               UserGroupInformation.getLoginUser().getUserName(), credentials);
>           }
>         });
>     return newTokens;
>   }
> {code}
> The memory leak happens when FileSystem.get(getConfig()) is called with a
> new proxy user, because createProxyUser always creates a new Subject.
> The calling sequence is
> FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf),
> conf) => FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri,
> conf, key) => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
> {code}
>   public static UserGroupInformation createProxyUser(String user,
>       UserGroupInformation realUser) {
>     if (user == null || user.isEmpty()) {
>       throw new IllegalArgumentException("Null user");
>     }
>     if (realUser == null) {
>       throw new IllegalArgumentException("Null real user");
>     }
>     Subject subject = new Subject();
>     Set<Principal> principals = subject.getPrincipals();
>     principals.add(new User(user));
>     principals.add(new RealUser(realUser));
>     UserGroupInformation result = new UserGroupInformation(subject);
>     result.setAuthenticationMethod(AuthenticationMethod.PROXY);
>     return result;
>   }
> {code}
> FileSystem#Cache#Key.equals will compare the ugi
> {code}
>   Key(URI uri, Configuration conf, long unique) throws IOException {
> scheme = uri.getScheme()==null?"":uri.getScheme().toLowerCase();
> authority = 
> uri.getAuthority()==null?"":uri.getAuthority().toLowerCase();
> this.unique = unique;
> this.ugi = UserGroupInformation.getCurrentUser();
>   }
>   public boolean equals(Object obj) {
> if (obj == this) {
>   return true;
> }
> if (obj != null && obj instanceof Key) {
>   Key that = (Key)obj;
>   return isEqual(this.scheme, that.scheme)
>  && isEqual(this.authority, that.authority)
>  && isEqual(this.ugi, that.ugi)
>  && (this.unique == that.unique);
> }
> return false;   

[jira] [Commented] (YARN-3350) YARN RackResolver spams logs with messages at info level

2015-03-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372431#comment-14372431
 ] 

Hudson commented on YARN-3350:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7391 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7391/])
YARN-3350. YARN RackResolver spams logs with messages at info level. 
Contributed by Wilfred Spiegelenburg (junping_du: rev 
7f1e2f996995e1883d9336f720c27621cf1b73b6)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/RackResolver.java


> YARN RackResolver spams logs with messages at info level
> 
>
> Key: YARN-3350
> URL: https://issues.apache.org/jira/browse/YARN-3350
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Fix For: 2.8.0
>
> Attachments: YARN-3350.2.patch, YARN-3350.patch, 
> yarn-RackResolver-log.txt
>
>
> When you run an application, the container logs show a lot of messages from
> the RackResolver:
> 2015-03-10 00:58:30,483 INFO [RMCommunicator Allocator]
> org.apache.hadoop.yarn.util.RackResolver: Resolved node175.example.com to
> /rack15
> A real-world example for a large job was generating 20+ messages in 2
> milliseconds over a sustained period of time, flooding the logs and causing the
> node to run out of disk space.
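
A minimal sketch of the usual remedy, assuming commons-logging as used elsewhere in
YARN; this is illustrative and not the committed RackResolver change itself:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class RackResolverLoggingSketch {
  private static final Log LOG =
      LogFactory.getLog(RackResolverLoggingSketch.class);

  static String resolve(String hostName) {
    String rack = "/rack15"; // stand-in for the real topology lookup
    // Demote the per-node message from INFO to DEBUG and guard it, so a large
    // job resolving thousands of nodes no longer floods the container logs.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Resolved " + hostName + " to " + rack);
    }
    return rack;
  }

  public static void main(String[] args) {
    System.out.println(resolve("node175.example.com"));
  }
}
{code}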



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Commented] (YARN-3383) AdminService should use "warn" instead of "info" to log exception when operation fails

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372430#comment-14372430
 ] 

Hadoop QA commented on YARN-3383:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12706096/YARN-3383-032015.patch
  against trunk revision 7f1e2f9.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7056//console

This message is automatically generated.

> AdminService should use "warn" instead of "info" to log exception when 
> operation fails
> --
>
> Key: YARN-3383
> URL: https://issues.apache.org/jira/browse/YARN-3383
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Li Lu
> Attachments: YARN-3383-032015.patch
>
>
> Now it uses info:
> {code}
>   private YarnException logAndWrapException(IOException ioe, String user,
>   String argName, String msg) throws YarnException {
> LOG.info("Exception " + msg, ioe);
> {code}
> But it should use warn instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2098) App priority support in Fair Scheduler

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372428#comment-14372428
 ] 

Hadoop QA commented on YARN-2098:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647316/YARN-2098.patch
  against trunk revision 7f1e2f9.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7057//console

This message is automatically generated.

> App priority support in Fair Scheduler
> --
>
> Key: YARN-2098
> URL: https://issues.apache.org/jira/browse/YARN-2098
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.5.0
>Reporter: Ashwin Shankar
>Assignee: Wei Yan
> Attachments: YARN-2098.patch, YARN-2098.patch
>
>
> This JIRA is created to support app priorities in the fair scheduler.
> AppSchedulable hard-codes the priority of apps to 1; we should
> change this to get the priority from ApplicationSubmissionContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3334) [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2.

2015-03-20 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372421#comment-14372421
 ] 

Junping Du commented on YARN-3334:
--

Hi [~gtCarrera9], thanks for the questions here. I agree that at the end of the
day we should have a dedicated TimelineServiceTest for v1 and v2 to which we can
submit different applications (including but not limited to DistributedShell) and
do the related checks. I remember you filed a JIRA to refactor the timeline test
case for TestDistributedShell. Maybe we can start from there? Just 2 cents.

> [Event Producers] NM start to posting some app related metrics in early POC 
> stage of phase 2.
> -
>
> Key: YARN-3334
> URL: https://issues.apache.org/jira/browse/YARN-3334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context

2015-03-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372412#comment-14372412
 ] 

Sangjin Lee commented on YARN-3040:
---

[~zjshen], thanks for your updated patch and prompt answers! I'll go over the 
new patch in some more detail, and get back to you. I haven't looked at the 
patch just yet, and therefore I might be saying something dumb, but I thought 
I'd reply to some of your points. Hopefully this will move things forward.

bq. RM will have all the above context info. When constructing and starting RM 
collector, we should make sure it be setup.
Since RM's collector will handle multiple applications, there is no one-to-one 
relationship between flow/flow-run/app and an instance of the RM collector. RM 
will just have to retain that information in memory for multiple apps, and pass 
that along on a per-call basis to the storage.

bq. Personally, I prefer to user ID to be uniform among the all the context 
properties. ID indicates it can be used to identify a flow.
I'm OK with "flow id" if it increases consistency.

bq. I thought version is part of flow id. I think we can revisit it once the 
schema is done, and we finalized the generic description about the flow 
structure and the notation. So far I'd like to keep it as what it is now. 
Thoughts?
Hmm, I didn't think of the version as part of the flow id. Here we're thinking a bit
ahead to the storage and query aspects of it, but it's perfectly feasible to
ask questions like "give me the latest 10 runs of the flow named 'foo.pig'".
Note that those latest 10 runs can have different versions. This implies there
needs to be a semantic differentiation between the flow id (name) and the flow
version. Namely, in this query the flow version is *not* used to retrieve the
last 10 runs. So I would advocate having a "flow version" field/attribute separate
from the "flow id".

As for the run id being numeric, as Li alluded to it, there is a significant 
advantage in having run id's as numbers (longs really) as it lends itself to 
super-easy sorting. It's a little bit of storage concern leaking to the higher 
level abstraction, but it's a strong reason to qualify it as a number IMO.
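
As a concrete illustration of that point (the values and types here are made up):
with run ids as longs, "give me the latest 10 runs of the flow" is a plain
descending sort, whereas string ids would need zero-padding to order correctly.
{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FlowRunSortSketch {
  public static void main(String[] args) {
    // Run ids as longs (e.g. run start timestamps) sort naturally.
    List<Long> runIds =
        Arrays.asList(1426800000000L, 1426886400000L, 1426713600000L);
    Collections.sort(runIds, Collections.<Long>reverseOrder());
    // Latest runs first; take the first N for "latest N runs of the flow".
    System.out.println(runIds); // [1426886400000, 1426800000000, 1426713600000]

    // The same ids stored as strings would sort lexicographically, which breaks
    // the query: "999" compares greater than "1000".
    System.out.println("999".compareTo("1000") > 0); // true
  }
}
{code}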

bq. It makes sense, but when RM restarts we use the new start time of RM to 
identify the app instead of the one before. In current way, cluster_xyz will 
contain the application_xyz_123. This was my rationale before. And this default 
cluster id construction is only used in the case the user didn't specify the 
cluster id in config file. In production, user should specify one. I'll thought 
about the question again.
I'm still not sure why it would make sense to have different logical cluster 
id's every time the RM/cluster restarts. Logically, a single cluster should be 
identified by a long-lived name. For example, UIs will be built on questions 
like "give me top 10 flows on cluster ABC". Queries like that surely wouldn't 
care about cluster restarts.

As for the default value, in fact I would imagine most use cases would not set 
the cluster id (just assuming the cluster default would be filled in). That 
would be the norm, not the exception.

Hope these help...

> [Data Model] Make putEntities operation be aware of the app's context
> -
>
> Key: YARN-3040
> URL: https://issues.apache.org/jira/browse/YARN-3040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3040.1.patch, YARN-3040.2.patch
>
>
> Per design in YARN-2928, implement client-side API for handling *flows*. 
> Frameworks should be able to define and pass in all attributes of flows and 
> flow runs to YARN, and they should be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1297) Miscellaneous Fair Scheduler speedups

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372395#comment-14372395
 ] 

Hadoop QA commented on YARN-1297:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12628664/YARN-1297-2.patch
  against trunk revision fe5c23b.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7054//console

This message is automatically generated.

> Miscellaneous Fair Scheduler speedups
> -
>
> Key: YARN-1297
> URL: https://issues.apache.org/jira/browse/YARN-1297
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1297-1.patch, YARN-1297-2.patch, YARN-1297.patch, 
> YARN-1297.patch
>
>
> I ran the Fair Scheduler's core scheduling loop through a profiler tool and 
> identified a bunch of minimally invasive changes that can shave off a few 
> milliseconds.
> The main one is demoting a couple of INFO log messages to DEBUG, which brought
> my benchmark down from 16000 ms to 6000 ms.
> A few others (which had far less of an impact) were:
> * Most of the time in comparisons was being spent in Math.signum. I switched
> this to direct ifs and elses, which halved the percentage of time spent in
> comparisons (see the sketch after this list).
> * I removed some unnecessary instantiations of Resource objects
> * I made it so that queues' usage wasn't calculated from the applications up 
> each time getResourceUsage was called.
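
For the Math.signum bullet above, a hedged illustration of that kind of rewrite; the
values being compared are made up and this is not the actual FairScheduler
comparator:
{code}
import java.util.Comparator;

public class SignumFreeComparator implements Comparator<Double> {
  // Before: return (int) Math.signum(a - b);
  // The floating-point subtraction plus the signum call showed up prominently
  // in the scheduling-loop profile.
  @Override
  public int compare(Double a, Double b) {
    // After: direct branches, no Math.signum call and no intermediate double.
    if (a < b) {
      return -1;
    } else if (a > b) {
      return 1;
    } else {
      return 0;
    }
  }

  public static void main(String[] args) {
    System.out.println(new SignumFreeComparator().compare(0.25, 0.75)); // -1
  }
}
{code}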



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3345) Add non-exclusive node label API to RMAdmin protocol and NodeLabelsManager

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372393#comment-14372393
 ] 

Hadoop QA commented on YARN-3345:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12706077/YARN-3345.7.patch
  against trunk revision 586348e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7049//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7049//console

This message is automatically generated.

> Add non-exclusive node label API to RMAdmin protocol and NodeLabelsManager
> --
>
> Key: YARN-3345
> URL: https://issues.apache.org/jira/browse/YARN-3345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3345.1.patch, YARN-3345.2.patch, YARN-3345.3.patch, 
> YARN-3345.4.patch, YARN-3345.5.patch, YARN-3345.6.patch, YARN-3345.7.patch
>
>
> As described in YARN-3214 (see design doc attached to that JIRA), we need add 
> non-exclusive node label RMAdmin API and CLI implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler

2015-03-20 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372389#comment-14372389
 ] 

zhihai xu commented on YARN-3241:
-

Hi [~kasha], many thanks for the review. These are very good suggestions. I 
uploaded a new patch YARN-3241.002.patch, which addressed all your comments. 
Please review it.

> Leading space, trailing space and empty sub queue name may cause 
> MetricsException for fair scheduler
> 
>
> Key: YARN-3241
> URL: https://issues.apache.org/jira/browse/YARN-3241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3241.000.patch, YARN-3241.001.patch, 
> YARN-3241.002.patch
>
>
> Leading spaces, trailing spaces, and empty sub-queue names may cause a
> MetricsException ("Metrics source XXX already exists!") when adding an
> application to the FairScheduler.
> The reason is that QueueMetrics parses the queue name differently from
> QueueManager.
> QueueMetrics uses Q_SPLITTER to parse the queue name; it removes leading and
> trailing spaces from each sub-queue name and also drops empty sub-queue names.
> {code}
>   static final Splitter Q_SPLITTER =
>       Splitter.on('.').omitEmptyStrings().trimResults();
> {code}
> But QueueManager does not remove leading spaces, trailing spaces, or empty
> sub-queue names.
> This causes FSQueue and FSQueueMetrics to get out of sync:
> QueueManager treats the two queue names as different and tries to create a
> new queue, but FSQueueMetrics treats them as the same queue, which triggers
> the "Metrics source XXX already exists!" MetricsException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context

2015-03-20 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372384#comment-14372384
 ] 

Li Lu commented on YARN-3040:
-

Hi [~zjshen], some quick thoughts...

bq. It sounds like each NM will need to have multiple timeline clients (one for 
each application).
bq. That's correct.
bq. The RM will have its own collector, and it does not go through the 
TimelineClient API. How would that work?
bq. RM will have all the above context info. When constructing and starting RM 
collector, we should make sure it be setup.

Both the RM and the NMs post predefined "application history info", rather 
than "generic" data (I'm trying to use the wording from ATS v1, but correct me 
if I'm wrong). I'm wondering if it's possible to have another client 
implementation, based on our existing one, that can handle multiple 
applications within the same client? Having one client per app in the RM does 
not sound very scalable...

bq. I thought the version was part of the flow id. I think we can revisit this 
once the schema is done and we have finalized the generic description of the 
flow structure and notation. For now I'd like to keep it as it is. Thoughts?

One significant advantage of having run ids as integers is that we can easily 
sort all existing runs of a flow in ascending or descending order. That seems 
like a solid use case in general.
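
As a tiny illustration (the values are made up), numeric run ids order 
naturally, whereas string ids would sort lexicographically:
{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Numeric run ids give 2 < 3 < 11; as strings, "11" would sort before "2".
public class FlowRunOrdering {
  public static void main(String[] args) {
    List<Long> runIds = Arrays.asList(3L, 11L, 2L);
    Collections.sort(runIds);      // ascending:  [2, 3, 11]
    Collections.reverse(runIds);   // descending: [11, 3, 2]
    System.out.println(runIds);
  }
}
{code}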

bq. It makes sense, but when the RM restarts we would use the new RM start time 
to identify the app instead of the previous one. With the current scheme, 
cluster_xyz will contain application_xyz_123. That was my rationale. Also, this 
default cluster id construction is only used when the user didn't specify a 
cluster id in the config file; in production, the user should specify one. I'll 
think about the question again.

Mostly fine, but I have some concerns about rolling upgrades. With rolling 
upgrades, if we're not specifying cluster ids explicitly, applications that 
live across an upgrade will have two different primary keys. Even though we may 
merge this in our reader (which still sounds suboptimal), this may pose a 
challenge to our aggregators (data will be aggregated to two different entities 
across time). Any suggestions on this? 

> [Data Model] Make putEntities operation be aware of the app's context
> -
>
> Key: YARN-3040
> URL: https://issues.apache.org/jira/browse/YARN-3040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3040.1.patch, YARN-3040.2.patch
>
>
> Per design in YARN-2928, implement client-side API for handling *flows*. 
> Frameworks should be able to define and pass in all attributes of flows and 
> flow runs to YARN, and they should be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler

2015-03-20 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3241:

Attachment: YARN-3241.002.patch

> Leading space, trailing space and empty sub queue name may cause 
> MetricsException for fair scheduler
> 
>
> Key: YARN-3241
> URL: https://issues.apache.org/jira/browse/YARN-3241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3241.000.patch, YARN-3241.001.patch, 
> YARN-3241.002.patch
>
>
> Leading space, trailing space and empty sub queue name may cause 
> MetricsException (Metrics source XXX already exists!) when adding an 
> application to FairScheduler.
> The reason is that QueueMetrics parses the queue name differently from 
> QueueManager.
> QueueMetrics uses Q_SPLITTER to parse the queue name; it removes leading and 
> trailing spaces in each sub queue name and also drops empty sub queue names.
> {code}
>   static final Splitter Q_SPLITTER =
>   Splitter.on('.').omitEmptyStrings().trimResults(); 
> {code}
> But QueueManager does not remove leading spaces, trailing spaces, or empty 
> sub queue names.
> This causes FSQueue and FSQueueMetrics to get out of sync:
> QueueManager treats the two queue names as different, so it tries to create a 
> new queue, but FSQueueMetrics treats them as the same queue, which triggers 
> the "Metrics source XXX already exists!" MetricsException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1501) Fair Scheduler will NPE if it hits IOException on queue assignment

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372374#comment-14372374
 ] 

Hadoop QA commented on YARN-1501:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12620440/YARN-1501.patch
  against trunk revision fe5c23b.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7052//console

This message is automatically generated.

> Fair Scheduler will NPE if it hits IOException on queue assignment
> --
>
> Key: YARN-1501
> URL: https://issues.apache.org/jira/browse/YARN-1501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: haosdent
>  Labels: newbie
> Fix For: 2.2.1
>
> Attachments: YARN-1501.patch
>
>
> {code}
> try {
>   QueuePlacementPolicy placementPolicy = allocConf.getPlacementPolicy();
>   queueName = placementPolicy.assignAppToQueue(queueName, user);
>   if (queueName == null) {
> return null;
>   }
>   queue = queueMgr.getLeafQueue(queueName, true);
> } catch (IOException ex) {
>   LOG.error("Error assigning app to queue, rejecting", ex);
> }
> 
> if (rmApp != null) {
>   rmApp.setQueue(queue.getName());
> } else {
>   LOG.warn("Couldn't find RM app to set queue name on");
> }
> {code}
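
For context, here is a minimal sketch of one way to avoid the NPE described 
above (not necessarily what the attached patch does): bail out when queue 
assignment failed instead of dereferencing a possibly-null queue.
{code}
} catch (IOException ex) {
  LOG.error("Error assigning app to queue, rejecting", ex);
}

// Sketch: if the placement policy threw, queue is still null at this point, so
// return (rejecting the app) rather than calling queue.getName() and hitting
// the NPE.
if (queue == null) {
  return null;
}

if (rmApp != null) {
  rmApp.setQueue(queue.getName());
} else {
  LOG.warn("Couldn't find RM app to set queue name on");
}
{code}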



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372364#comment-14372364
 ] 

Hadoop QA commented on YARN-2868:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12706083/YARN-2868.011.patch
  against trunk revision fe5c23b.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7050//console

This message is automatically generated.

> Add metric for initial container launch time to FairScheduler
> -
>
> Key: YARN-2868
> URL: https://issues.apache.org/jira/browse/YARN-2868
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: metrics, supportability
> Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
> YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, 
> YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, 
> YARN-2868.009.patch, YARN-2868.010.patch, YARN-2868.011.patch
>
>
> Add a metric to measure the latency between "starting container allocation" 
> and "first container actually allocated".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)

2015-03-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2306:
---
Component/s: (was: scheduler)
 fairscheduler

> leak of reservation metrics (fair scheduler)
> 
>
> Key: YARN-2306
> URL: https://issues.apache.org/jira/browse/YARN-2306
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Minor
> Attachments: YARN-2306-2.patch, YARN-2306.patch
>
>
> This only applies to fair scheduler. Capacity scheduler is OK.
> When appAttempt or node is removed, the metrics for 
> reservation(reservedContainers, reservedMB, reservedVCores) is not reduced 
> back.
> These are important metrics for administrators, and incorrect values may 
> confuse them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2098) App priority support in Fair Scheduler

2015-03-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2098:
---
Component/s: (was: scheduler)
 fairscheduler

> App priority support in Fair Scheduler
> --
>
> Key: YARN-2098
> URL: https://issues.apache.org/jira/browse/YARN-2098
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.5.0
>Reporter: Ashwin Shankar
>Assignee: Wei Yan
> Attachments: YARN-2098.patch, YARN-2098.patch
>
>
> This jira is created for supporting app priorities in the fair scheduler. 
> AppSchedulable hard-codes the priority of apps to 1; we should change this to 
> get the priority from ApplicationSubmissionContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1612) Change Fair Scheduler to not disable delay scheduling by default

2015-03-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1612:
---
Component/s: (was: scheduler)
 fairscheduler

> Change Fair Scheduler to not disable delay scheduling by default
> 
>
> Key: YARN-1612
> URL: https://issues.apache.org/jira/browse/YARN-1612
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Sandy Ryza
>Assignee: Chen He
> Attachments: YARN-1612-v2.patch, YARN-1612.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit

2015-03-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2083:
---
Component/s: (was: scheduler)
 fairscheduler

> In fair scheduler, Queue should not been assigned more containers when its 
> usedResource had reach the maxResource limit
> ---
>
> Key: YARN-2083
> URL: https://issues.apache.org/jira/browse/YARN-2083
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.3.0
>Reporter: Yi Tian
> Attachments: YARN-2083-1.patch, YARN-2083-2.patch, YARN-2083-3.patch, 
> YARN-2083.patch
>
>
> In the fair scheduler, FSParentQueue and FSLeafQueue do an 
> assignContainerPreCheck to guarantee the queue is not over its limit.
> But the fitsIn function in Resource.java does not return false when 
> usedResource equals maxResource.
> I think we should create a new function "fitsInWithoutEqual" and use it 
> instead of "fitsIn" in this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit

2015-03-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2083:
---
Labels:   (was: fairscheduler)

> In fair scheduler, Queue should not been assigned more containers when its 
> usedResource had reach the maxResource limit
> ---
>
> Key: YARN-2083
> URL: https://issues.apache.org/jira/browse/YARN-2083
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.3.0
>Reporter: Yi Tian
> Attachments: YARN-2083-1.patch, YARN-2083-2.patch, YARN-2083-3.patch, 
> YARN-2083.patch
>
>
> In the fair scheduler, FSParentQueue and FSLeafQueue do an 
> assignContainerPreCheck to guarantee the queue is not over its limit.
> But the fitsIn function in Resource.java does not return false when 
> usedResource equals maxResource.
> I think we should create a new function "fitsInWithoutEqual" and use it 
> instead of "fitsIn" in this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1501) Fair Scheduler will NPE if it hits IOException on queue assignment

2015-03-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1501:
---
Component/s: (was: scheduler)
 fairscheduler

> Fair Scheduler will NPE if it hits IOException on queue assignment
> --
>
> Key: YARN-1501
> URL: https://issues.apache.org/jira/browse/YARN-1501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.2.0
>Reporter: Sandy Ryza
>Assignee: haosdent
>  Labels: newbie
> Fix For: 2.2.1
>
> Attachments: YARN-1501.patch
>
>
> {code}
> try {
>   QueuePlacementPolicy placementPolicy = allocConf.getPlacementPolicy();
>   queueName = placementPolicy.assignAppToQueue(queueName, user);
>   if (queueName == null) {
> return null;
>   }
>   queue = queueMgr.getLeafQueue(queueName, true);
> } catch (IOException ex) {
>   LOG.error("Error assigning app to queue, rejecting", ex);
> }
> 
> if (rmApp != null) {
>   rmApp.setQueue(queue.getName());
> } else {
>   LOG.warn("Couldn't find RM app to set queue name on");
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1297) Miscellaneous Fair Scheduler speedups

2015-03-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372354#comment-14372354
 ] 

Karthik Kambatla commented on YARN-1297:


[~sandyr] - sorry for dropping the ball on this. Are you able to update the 
patch? 

> Miscellaneous Fair Scheduler speedups
> -
>
> Key: YARN-1297
> URL: https://issues.apache.org/jira/browse/YARN-1297
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1297-1.patch, YARN-1297-2.patch, YARN-1297.patch, 
> YARN-1297.patch
>
>
> I ran the Fair Scheduler's core scheduling loop through a profiler tool and 
> identified a bunch of minimally invasive changes that can shave off a few 
> milliseconds.
> The main one is demoting a couple INFO log messages to DEBUG, which brought 
> my benchmark down from 16000 ms to 6000.
> A few others (which had way less of an impact) were
> * Most of the time in comparisons was being spent in Math.signum.  I switched 
> this to direct ifs and elses and it halved the percent of time spent in 
> comparisons.
> * I removed some unnecessary instantiations of Resource objects
> * I made it so that queues' usage wasn't calculated from the applications up 
> each time getResourceUsage was called.
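
For illustration, the Math.signum change mentioned above looks roughly like 
this inside the comparator (variable names here are illustrative):
{code}
// Before: every comparison pays for a double subtraction plus Math.signum.
// return (int) Math.signum(useToWeightRatio1 - useToWeightRatio2);

// After (sketch): plain branches, no Math.signum in the hot path.
if (useToWeightRatio1 < useToWeightRatio2) {
  return -1;
} else if (useToWeightRatio1 > useToWeightRatio2) {
  return 1;
} else {
  return 0;
}
{code}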



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1050) Document the Fair Scheduler REST API

2015-03-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1050:
---
Component/s: fairscheduler

> Document the Fair Scheduler REST API
> 
>
> Key: YARN-1050
> URL: https://issues.apache.org/jira/browse/YARN-1050
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation, fairscheduler
>Reporter: Sandy Ryza
>Assignee: Kenji Kikushima
> Attachments: YARN-1050-2.patch, YARN-1050-3.patch, YARN-1050.patch
>
>
> The documentation should be placed here along with the Capacity Scheduler 
> documentation: 
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1297) Miscellaneous Fair Scheduler speedups

2015-03-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1297:
---
Component/s: (was: scheduler)
 fairscheduler

> Miscellaneous Fair Scheduler speedups
> -
>
> Key: YARN-1297
> URL: https://issues.apache.org/jira/browse/YARN-1297
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1297-1.patch, YARN-1297-2.patch, YARN-1297.patch, 
> YARN-1297.patch
>
>
> I ran the Fair Scheduler's core scheduling loop through a profiler tool and 
> identified a bunch of minimally invasive changes that can shave off a few 
> milliseconds.
> The main one is demoting a couple INFO log messages to DEBUG, which brought 
> my benchmark down from 16000 ms to 6000.
> A few others (which had way less of an impact) were
> * Most of the time in comparisons was being spent in Math.signum.  I switched 
> this to direct ifs and elses and it halved the percent of time spent in 
> comparisons.
> * I removed some unnecessary instantiations of Resource objects
> * I made it so that queues' usage wasn't calculated from the applications up 
> each time getResourceUsage was called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2901) Add errors and warning stats to RM, NM web UI

2015-03-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372351#comment-14372351
 ] 

Wangda Tan commented on YARN-2901:
--

Applied and tried the patch; the UI looks great! One suggestion: do we need to 
fold a single-line message when the line is too long, i.e. showing
{{Unable to load native-hadoop library for your platform... using builtin-java 
cla}}
instead of
{{Unable to load native-hadoop library for your platform... using builtin-java 
classes where applicable}}?
Actually the second message can be shown entirely on one line.
I think this is caused by the hard-coded column width; can we make it dynamic 
across different environments?

Maybe we should never fold single-line messages?

I will include an implementation review in the next round.

> Add errors and warning stats to RM, NM web UI
> -
>
> Key: YARN-2901
> URL: https://issues.apache.org/jira/browse/YARN-2901
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, 
> apache-yarn-2901.1.patch
>
>
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender?(I'm open 
> to suggestions on alternate mechanisms for implementing this).
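
As a rough illustration of the custom-appender idea (a sketch against the 
log4j 1.x API; the class and accessor names below are placeholders, not part 
of any patch):
{code}
import java.util.concurrent.atomic.AtomicLong;

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Level;
import org.apache.log4j.spi.LoggingEvent;

// Sketch: count WARN and ERROR events so a web UI page could report totals.
public class ErrorWarningCountingAppender extends AppenderSkeleton {
  private static final AtomicLong ERRORS = new AtomicLong();
  private static final AtomicLong WARNINGS = new AtomicLong();

  @Override
  protected void append(LoggingEvent event) {
    if (event.getLevel().isGreaterOrEqual(Level.ERROR)) {
      ERRORS.incrementAndGet();
    } else if (Level.WARN.equals(event.getLevel())) {
      WARNINGS.incrementAndGet();
    }
  }

  @Override
  public void close() {
  }

  @Override
  public boolean requiresLayout() {
    return false;
  }

  public static long getErrorCount() {
    return ERRORS.get();
  }

  public static long getWarningCount() {
    return WARNINGS.get();
  }
}
{code}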



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3383) AdminService should use "warn" instead of "info" to log exception when operation fails

2015-03-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372350#comment-14372350
 ] 

Wangda Tan commented on YARN-3383:
--

Patch LGTM, will commit later.

> AdminService should use "warn" instead of "info" to log exception when 
> operation fails
> --
>
> Key: YARN-3383
> URL: https://issues.apache.org/jira/browse/YARN-3383
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Li Lu
> Attachments: YARN-3383-032015.patch
>
>
> Now it uses info:
> {code}
>   private YarnException logAndWrapException(IOException ioe, String user,
>   String argName, String msg) throws YarnException {
> LOG.info("Exception " + msg, ioe);
> {code}
> But it should use warn instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3383) AdminService should use "warn" instead of "info" to log exception when operation fails

2015-03-20 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3383:

Attachment: YARN-3383-032015.patch

Agreed. Uploaded a simple patch to quickly fix this problem. 
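
For clarity, the intended change is essentially the following (a sketch; see 
the attached patch for the actual diff):
{code}
  private YarnException logAndWrapException(IOException ioe, String user,
      String argName, String msg) throws YarnException {
    // A failed admin operation is a warning, not routine information.
    LOG.warn("Exception " + msg, ioe);
{code}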

> AdminService should use "warn" instead of "info" to log exception when 
> operation fails
> --
>
> Key: YARN-3383
> URL: https://issues.apache.org/jira/browse/YARN-3383
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Li Lu
> Attachments: YARN-3383-032015.patch
>
>
> Now it uses info:
> {code}
>   private YarnException logAndWrapException(IOException ioe, String user,
>   String argName, String msg) throws YarnException {
> LOG.info("Exception " + msg, ioe);
> {code}
> But it should use warn instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector

2015-03-20 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372342#comment-14372342
 ] 

Zhijie Shen commented on YARN-3034:
---

Kindly raising one issue that's not covered in YARN-3040: RMTimelineCollector 
needs to have the context info set up when it is constructed and started.

> [Collector wireup] Implement RM starting its timeline collector
> ---
>
> Key: YARN-3034
> URL: https://issues.apache.org/jira/browse/YARN-3034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, 
> YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, 
> YARN-3034.20150320-1.patch
>
>
> Per design in YARN-2928, implement resource managers starting their own ATS 
> writers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler

2015-03-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372341#comment-14372341
 ] 

Karthik Kambatla commented on YARN-2868:


+1, pending Jenkins.

> Add metric for initial container launch time to FairScheduler
> -
>
> Key: YARN-2868
> URL: https://issues.apache.org/jira/browse/YARN-2868
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: metrics, supportability
> Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
> YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, 
> YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, 
> YARN-2868.009.patch, YARN-2868.010.patch, YARN-2868.011.patch
>
>
> Add a metric to measure the latency between "starting container allocation" 
> and "first container actually allocated".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3383) AdminService should use "warn" instead of "info" to log exception when operation fails

2015-03-20 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3383:

Priority: Major  (was: Minor)

> AdminService should use "warn" instead of "info" to log exception when 
> operation fails
> --
>
> Key: YARN-3383
> URL: https://issues.apache.org/jira/browse/YARN-3383
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Li Lu
>
> Now it uses info:
> {code}
>   private YarnException logAndWrapException(IOException ioe, String user,
>   String argName, String msg) throws YarnException {
> LOG.info("Exception " + msg, ioe);
> {code}
> But it should use warn instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3383) AdminService should use "warn" instead of "info" to log exception when operation fails

2015-03-20 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3383:

Priority: Minor  (was: Major)

> AdminService should use "warn" instead of "info" to log exception when 
> operation fails
> --
>
> Key: YARN-3383
> URL: https://issues.apache.org/jira/browse/YARN-3383
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Li Lu
>Priority: Minor
>
> Now it uses info:
> {code}
>   private YarnException logAndWrapException(IOException ioe, String user,
>   String argName, String msg) throws YarnException {
> LOG.info("Exception " + msg, ioe);
> {code}
> But it should use warn instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3383) AdminService should use "warn" instead of "info" to log exception when operation fails

2015-03-20 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu reassigned YARN-3383:
---

Assignee: Li Lu

> AdminService should use "warn" instead of "info" to log exception when 
> operation fails
> --
>
> Key: YARN-3383
> URL: https://issues.apache.org/jira/browse/YARN-3383
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Li Lu
>
> Now it uses info:
> {code}
>   private YarnException logAndWrapException(IOException ioe, String user,
>   String argName, String msg) throws YarnException {
> LOG.info("Exception " + msg, ioe);
> {code}
> But it should use warn instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3383) AdminService should use "warn" instead of "info" to log exception when operation fails

2015-03-20 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3383:


 Summary: AdminService should use "warn" instead of "info" to log 
exception when operation fails
 Key: YARN-3383
 URL: https://issues.apache.org/jira/browse/YARN-3383
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Wangda Tan


Now it uses info:
{code}
  private YarnException logAndWrapException(IOException ioe, String user,
  String argName, String msg) throws YarnException {
LOG.info("Exception " + msg, ioe);
{code}
But it should use warn instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context

2015-03-20 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3040:
--
Attachment: YARN-3040.2.patch

The new patch changes the way the context information is passed to the 
aggregator. Again, it's based on the assumption that the context won't change 
during the lifecycle of the app. Therefore, we don't need to specify the 
context info on every put-entity request; instead we set it on the timeline 
collector when it starts. The backend and the set of context information to 
keep are not altered in the new patch.

In the new data flow of context information, clusterId is obtained from the 
configuration and appId is obtained when constructing the timeline collector. 
User, flow, and flow run info are passed to the collector at start-up via the 
collector<->NM RPC interface. Among the three, the user info is already 
available in the NM; the flow and flow run need to be provided by the user when 
submitting the application via the tag field. This info is passed to the NM 
when starting the AM container via the env of the CLC, and the collector 
queries the NM for it.

The distributed shell has been updated to show how the client can pass flow and 
flow run info into the application. Test cases have been modified and added to 
verify that (1) the newly added RPC call works, and (2) the context info works 
end to end.
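
For illustration, here is a sketch of how a client could hand the flow and flow 
run info to YARN through application tags (the tag prefixes and values below 
are made up for the example; see the updated distributed shell in the patch for 
the real format):
{code}
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.Records;

// Sketch: the submitter tags the app with its flow name and flow run id; the
// tags travel with the app and can later be handed to the per-app collector.
ApplicationSubmissionContext ctx =
    Records.newRecord(ApplicationSubmissionContext.class);
Set<String> tags = new HashSet<String>();
tags.add("TIMELINE_FLOW_NAME_TAG:my-flow");   // assumed tag format
tags.add("TIMELINE_FLOW_RUN_ID_TAG:1");       // assumed tag format
ctx.setApplicationTags(tags);
{code}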

To answer Sangjin's questions:

bq. How can individual frameworks (MR, tez, ...) set these attributes and pass 
them to the RM at the time of the application launch? How does that information 
get passed to the TimelineClient and to the timeline collector?

See the description of the context information data flow above, and take a 
look at the DS app for reference.

bq. It sounds like each NM will need to have multiple timeline clients (one for 
each application).

That's correct.

bq. The RM will have its own collector, and it does not go through the 
TimelineClient API. How would that work?

RM will have all the above context info. When constructing and starting RM 
collector, we should make sure it be setup.

bq. flowId should be flowName (that's the standard terminology we're using)

Personally, I prefer "ID", to be uniform across all the context properties. 
"ID" indicates it can be used to identify a flow.

bq. flow version seems to be missing from this; while flow version is not part 
of the primary key of the entity, it is a necessary attribute
bq. I think flow run id can (and should) be a long; it doesn't have to be a 
generic string

I thought the version was part of the flow id. I think we can revisit this once 
the schema is done and we have finalized the *generic* description of the flow 
structure and notation. For now I'd like to keep it as it is. Thoughts?

bq. the default cluster id should be just the cluster name; I'm not sure why we 
need to add the cluster start timestamp; 

It makes sense, but when the RM restarts we would use the new RM start time to 
identify the app instead of the previous one. With the current scheme, 
cluster_xyz will contain application_xyz_123. That was my rationale. Also, this 
default cluster id construction is only used when the user didn't specify a 
cluster id in the config file; in production, the user should specify one. I'll 
think about the question again.

bq. hopefully isUnitTest can be removed with the changes I made in the previous 
commit

Right. It's not necessary.

> [Data Model] Make putEntities operation be aware of the app's context
> -
>
> Key: YARN-3040
> URL: https://issues.apache.org/jira/browse/YARN-3040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3040.1.patch, YARN-3040.2.patch
>
>
> Per design in YARN-2928, implement client-side API for handling *flows*. 
> Frameworks should be able to define and pass in all attributes of flows and 
> flow runs to YARN, and they should be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3334) [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2.

2015-03-20 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372329#comment-14372329
 ] 

Li Lu commented on YARN-3334:
-

Hi [~djp], thanks for the patch! I'd like to review it later, but just a quick 
(and general) question here. We're using TestDistributedShell for our POC for 
both stage 1 and 2, and probably for the next few phases as well. Most of our 
tests are mainly based on the infrastructure of the YARN distributed shell, 
rather than verifying the correctness of the distributed shell itself. Maybe at 
some point we'd like to build our own end-to-end test for timeline service v2, 
based on, but independent from, TestDistributedShell? If we only start to move 
things out of TestDistributedShell right before we merge YARN-2928 back, it may 
be a little late.

I'm not asking for an immediate answer to this in this specific JIRA, but it's 
something we may need to keep in mind...

> [Event Producers] NM start to posting some app related metrics in early POC 
> stage of phase 2.
> -
>
> Key: YARN-3334
> URL: https://issues.apache.org/jira/browse/YARN-3334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer

2015-03-20 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-3336:

Target Version/s: 2.7.0
Hadoop Flags: Reviewed

+1 for patch v004 pending Jenkins.

> FileSystem memory leak in DelegationTokenRenewer
> 
>
> Key: YARN-3336
> URL: https://issues.apache.org/jira/browse/YARN-3336
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3336.000.patch, YARN-3336.001.patch, 
> YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch
>
>
> FileSystem memory leak in DelegationTokenRenewer.
> Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new 
> FileSystem entry will be added to  FileSystem#CACHE which will never be 
> garbage collected.
> This is the implementation of obtainSystemTokensForUser:
> {code}
>   protected Token<?>[] obtainSystemTokensForUser(String user,
>   final Credentials credentials) throws IOException, InterruptedException 
> {
> // Get new hdfs tokens on behalf of this user
> UserGroupInformation proxyUser =
> UserGroupInformation.createProxyUser(user,
>   UserGroupInformation.getLoginUser());
> Token<?>[] newTokens =
> proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
>   @Override
>   public Token<?>[] run() throws Exception {
> return FileSystem.get(getConfig()).addDelegationTokens(
>   UserGroupInformation.getLoginUser().getUserName(), credentials);
>   }
> });
> return newTokens;
>   }
> {code}
> The memory leak happened when FileSystem.get(getConfig()) is called with a 
> new proxy user.
> Because createProxyUser will always create a new Subject.
> The calling sequence is 
> FileSystem.get(getConfig())=>FileSystem.get(getDefaultUri(conf), 
> conf)=>FileSystem.CACHE.get(uri, conf)=>FileSystem.CACHE.getInternal(uri, 
> conf, key)=>FileSystem.CACHE.map.get(key)=>createFileSystem(uri, conf)
> {code}
> public static UserGroupInformation createProxyUser(String user,
>   UserGroupInformation realUser) {
> if (user == null || user.isEmpty()) {
>   throw new IllegalArgumentException("Null user");
> }
> if (realUser == null) {
>   throw new IllegalArgumentException("Null real user");
> }
> Subject subject = new Subject();
> Set<Principal> principals = subject.getPrincipals();
> principals.add(new User(user));
> principals.add(new RealUser(realUser));
> UserGroupInformation result =new UserGroupInformation(subject);
> result.setAuthenticationMethod(AuthenticationMethod.PROXY);
> return result;
>   }
> {code}
> FileSystem#Cache#Key.equals will compare the ugi
> {code}
>   Key(URI uri, Configuration conf, long unique) throws IOException {
> scheme = uri.getScheme()==null?"":uri.getScheme().toLowerCase();
> authority = 
> uri.getAuthority()==null?"":uri.getAuthority().toLowerCase();
> this.unique = unique;
> this.ugi = UserGroupInformation.getCurrentUser();
>   }
>   public boolean equals(Object obj) {
> if (obj == this) {
>   return true;
> }
> if (obj != null && obj instanceof Key) {
>   Key that = (Key)obj;
>   return isEqual(this.scheme, that.scheme)
>  && isEqual(this.authority, that.authority)
>  && isEqual(this.ugi, that.ugi)
>  && (this.unique == that.unique);
> }
> return false;
>   }
> {code}
> UserGroupInformation.equals will compare subject by reference.
> {code}
>   public boolean equals(Object o) {
> if (o == this) {
>   return true;
> } else if (o == null || getClass() != o.getClass()) {
>   return false;
> } else {
>   return subject == ((UserGroupInformation) o).subject;
> }
>   }
> {code}
> So in this case, every time createProxyUser and FileSystem.get(getConfig()) 
> are called, a new FileSystem will be created and a new entry will be added to 
> FileSystem.CACHE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer

2015-03-20 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372290#comment-14372290
 ] 

zhihai xu commented on YARN-3336:
-

Oh, that is a good idea. I uploaded a new patch YARN-3336.004.patch, which uses 
the instance counter for the verification. Please review it.
Many thanks, [~cnauroth]! 
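
For context, one way to keep the cache from growing for every proxy user is to 
close the FileSystem created inside doAs, which removes its entry from 
FileSystem.CACHE (a sketch reusing the variables from the snippet below; not 
necessarily what YARN-3336.004.patch does):
{code}
Token<?>[] newTokens =
    proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
      @Override
      public Token<?>[] run() throws Exception {
        FileSystem fs = FileSystem.get(getConfig());
        try {
          return fs.addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        } finally {
          // Closing removes this FileSystem from FileSystem.CACHE, so the
          // per-proxy-user entry does not accumulate.
          fs.close();
        }
      }
    });
{code}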

> FileSystem memory leak in DelegationTokenRenewer
> 
>
> Key: YARN-3336
> URL: https://issues.apache.org/jira/browse/YARN-3336
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3336.000.patch, YARN-3336.001.patch, 
> YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch
>
>
> FileSystem memory leak in DelegationTokenRenewer.
> Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new 
> FileSystem entry will be added to  FileSystem#CACHE which will never be 
> garbage collected.
> This is the implementation of obtainSystemTokensForUser:
> {code}
>   protected Token<?>[] obtainSystemTokensForUser(String user,
>   final Credentials credentials) throws IOException, InterruptedException 
> {
> // Get new hdfs tokens on behalf of this user
> UserGroupInformation proxyUser =
> UserGroupInformation.createProxyUser(user,
>   UserGroupInformation.getLoginUser());
> Token<?>[] newTokens =
> proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
>   @Override
>   public Token<?>[] run() throws Exception {
> return FileSystem.get(getConfig()).addDelegationTokens(
>   UserGroupInformation.getLoginUser().getUserName(), credentials);
>   }
> });
> return newTokens;
>   }
> {code}
> The memory leak happened when FileSystem.get(getConfig()) is called with a 
> new proxy user.
> Because createProxyUser will always create a new Subject.
> The calling sequence is 
> FileSystem.get(getConfig())=>FileSystem.get(getDefaultUri(conf), 
> conf)=>FileSystem.CACHE.get(uri, conf)=>FileSystem.CACHE.getInternal(uri, 
> conf, key)=>FileSystem.CACHE.map.get(key)=>createFileSystem(uri, conf)
> {code}
> public static UserGroupInformation createProxyUser(String user,
>   UserGroupInformation realUser) {
> if (user == null || user.isEmpty()) {
>   throw new IllegalArgumentException("Null user");
> }
> if (realUser == null) {
>   throw new IllegalArgumentException("Null real user");
> }
> Subject subject = new Subject();
> Set<Principal> principals = subject.getPrincipals();
> principals.add(new User(user));
> principals.add(new RealUser(realUser));
> UserGroupInformation result =new UserGroupInformation(subject);
> result.setAuthenticationMethod(AuthenticationMethod.PROXY);
> return result;
>   }
> {code}
> FileSystem#Cache#Key.equals will compare the ugi
> {code}
>   Key(URI uri, Configuration conf, long unique) throws IOException {
> scheme = uri.getScheme()==null?"":uri.getScheme().toLowerCase();
> authority = 
> uri.getAuthority()==null?"":uri.getAuthority().toLowerCase();
> this.unique = unique;
> this.ugi = UserGroupInformation.getCurrentUser();
>   }
>   public boolean equals(Object obj) {
> if (obj == this) {
>   return true;
> }
> if (obj != null && obj instanceof Key) {
>   Key that = (Key)obj;
>   return isEqual(this.scheme, that.scheme)
>  && isEqual(this.authority, that.authority)
>  && isEqual(this.ugi, that.ugi)
>  && (this.unique == that.unique);
> }
> return false;
>   }
> {code}
> UserGroupInformation.equals will compare subject by reference.
> {code}
>   public boolean equals(Object o) {
> if (o == this) {
>   return true;
> } else if (o == null || getClass() != o.getClass()) {
>   return false;
> } else {
>   return subject == ((UserGroupInformation) o).subject;
> }
>   }
> {code}
> So in this case, every time createProxyUser and FileSystem.get(getConfig()) 
> are called, a new FileSystem will be created and a new entry will be added to 
> FileSystem.CACHE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2868) Add metric for initial container launch time to FairScheduler

2015-03-20 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-2868:
-
Attachment: YARN-2868.011.patch

Updated again

> Add metric for initial container launch time to FairScheduler
> -
>
> Key: YARN-2868
> URL: https://issues.apache.org/jira/browse/YARN-2868
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: metrics, supportability
> Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
> YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, 
> YARN-2868.006.patch, YARN-2868.007.patch, YARN-2868.008.patch, 
> YARN-2868.009.patch, YARN-2868.010.patch, YARN-2868.011.patch
>
>
> Add a metric to measure the latency between "starting container allocation" 
> and "first container actually allocated".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer

2015-03-20 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3336:

Attachment: YARN-3336.004.patch

> FileSystem memory leak in DelegationTokenRenewer
> 
>
> Key: YARN-3336
> URL: https://issues.apache.org/jira/browse/YARN-3336
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3336.000.patch, YARN-3336.001.patch, 
> YARN-3336.002.patch, YARN-3336.003.patch, YARN-3336.004.patch
>
>
> FileSystem memory leak in DelegationTokenRenewer.
> Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new 
> FileSystem entry will be added to  FileSystem#CACHE which will never be 
> garbage collected.
> This is the implementation of obtainSystemTokensForUser:
> {code}
>   protected Token<?>[] obtainSystemTokensForUser(String user,
>   final Credentials credentials) throws IOException, InterruptedException 
> {
> // Get new hdfs tokens on behalf of this user
> UserGroupInformation proxyUser =
> UserGroupInformation.createProxyUser(user,
>   UserGroupInformation.getLoginUser());
> Token<?>[] newTokens =
> proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
>   @Override
>   public Token<?>[] run() throws Exception {
> return FileSystem.get(getConfig()).addDelegationTokens(
>   UserGroupInformation.getLoginUser().getUserName(), credentials);
>   }
> });
> return newTokens;
>   }
> {code}
> The memory leak happened when FileSystem.get(getConfig()) is called with a 
> new proxy user.
> Because createProxyUser will always create a new Subject.
> The calling sequence is 
> FileSystem.get(getConfig())=>FileSystem.get(getDefaultUri(conf), 
> conf)=>FileSystem.CACHE.get(uri, conf)=>FileSystem.CACHE.getInternal(uri, 
> conf, key)=>FileSystem.CACHE.map.get(key)=>createFileSystem(uri, conf)
> {code}
> public static UserGroupInformation createProxyUser(String user,
>   UserGroupInformation realUser) {
> if (user == null || user.isEmpty()) {
>   throw new IllegalArgumentException("Null user");
> }
> if (realUser == null) {
>   throw new IllegalArgumentException("Null real user");
> }
> Subject subject = new Subject();
> Set<Principal> principals = subject.getPrincipals();
> principals.add(new User(user));
> principals.add(new RealUser(realUser));
> UserGroupInformation result =new UserGroupInformation(subject);
> result.setAuthenticationMethod(AuthenticationMethod.PROXY);
> return result;
>   }
> {code}
> FileSystem#Cache#Key.equals will compare the ugi
> {code}
>   Key(URI uri, Configuration conf, long unique) throws IOException {
> scheme = uri.getScheme()==null?"":uri.getScheme().toLowerCase();
> authority = 
> uri.getAuthority()==null?"":uri.getAuthority().toLowerCase();
> this.unique = unique;
> this.ugi = UserGroupInformation.getCurrentUser();
>   }
>   public boolean equals(Object obj) {
> if (obj == this) {
>   return true;
> }
> if (obj != null && obj instanceof Key) {
>   Key that = (Key)obj;
>   return isEqual(this.scheme, that.scheme)
>  && isEqual(this.authority, that.authority)
>  && isEqual(this.ugi, that.ugi)
>  && (this.unique == that.unique);
> }
> return false;
>   }
> {code}
> UserGroupInformation.equals will compare subject by reference.
> {code}
>   public boolean equals(Object o) {
> if (o == this) {
>   return true;
> } else if (o == null || getClass() != o.getClass()) {
>   return false;
> } else {
>   return subject == ((UserGroupInformation) o).subject;
> }
>   }
> {code}
> So in this case, every time createProxyUser and FileSystem.get(getConfig()) 
> are called, a new FileSystem will be created and a new entry will be added to 
> FileSystem.CACHE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372280#comment-14372280
 ] 

Wangda Tan commented on YARN-2495:
--

2) You decide. :)
5)/7) You're correct: the provider should only know its labels, not whether 
they were "updated", so null from the provider should be treated as "empty". 
And since areNodeLabelsUpdated only receives non-null parameters, add a comment 
before areNodeLabelsUpdated to indicate that null values are handled before it 
is called.
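
In other words, something along these lines (the identifiers below are 
illustrative, not the exact names in the patch):
{code}
// Illustrative sketch: treat a null result from the provider as "no labels"
// before calling areNodeLabelsUpdated, which expects non-null parameters.
Set<String> providedLabels = nodeLabelsProvider.getNodeLabels();
if (providedLabels == null) {
  providedLabels = Collections.emptySet();
}
if (areNodeLabelsUpdated(providedLabels, lastReportedLabels)) {
  // include the new labels in the next ResourceTracker registration/heartbeat
}
{code}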

Thanks,

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, 
> YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, 
> YARN-2495_20141022.1.patch
>
>
> Target of this JIRA is to allow admins to specify labels on each NM. This covers:
> - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or 
> using the script suggested by [~aw] (YARN-2729))
> - The NM sends its labels to the RM via the ResourceTracker API
> - The RM sets the labels in NodeLabelManager when the NM registers/updates labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-03-20 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372277#comment-14372277
 ] 

Li Lu commented on YARN-3047:
-

Hi [~varun_saxena]! Thanks for the patch. Here are some of my comments, sorry 
for the late reply. 

# In TimelineReaderStore, {{DEFAULT_LIMIT}} is never used. Do we want to keep 
this value here, allow subclasses to access it, or set up another configuration 
for it?
# NameValuePair duplicates the same file in ahs, timeline/NameValuePair. Since 
this is something used by both versions, maybe we can unify the two files 
somewhere? We'd probably like to avoid duplicating these files, especially 
since their contents are mostly the same and their paths are quite confusing. 
# In the original design, there was a reader pool that holds a set of readers. 
Each reader queries the underlying storage layer (and in future, the active 
apps). I was expecting this reader pool to be in TimelineReaderManager or 
TimelineReaderServer, but I could not find the logic to add or dispatch 
readers. Am I missing something here?
# In TimelineReaderWebServices, 
{code}
try {
  return new NameValuePair(strs[0].trim(),
  GenericObjectMapper.OBJECT_READER.readValue(strs[1].trim()));
} catch (Exception e) {
  return new NameValuePair(strs[0].trim(), strs[1].trim());
}
{code}
Why are we catching Exceptions, rather than the precise exception readValue may 
throw?
# In TimelineReaderWebServices
{code}
  public AboutInfo about(
  @Context HttpServletRequest req,
  @Context HttpServletResponse res) {
init(res);
return new AboutInfo("Timeline API");
  }
{code}
This about info seems to be confusing. It's exactly the same as the v1 timeline 
server, but on our endpoint we only have the reader APIs available. 
# Just want to double-check that we do want to have the same endpoint for both 
timeline readers and collectors. (Seems fine with me, since the reader process 
runs on different machines from our collectors.)
# We're using Java 7 now, so we can use a switch statement on strings here (a 
sketch follows after this list):
{code}
if (s.equals("EVENTS")) {
fieldList.add(Field.EVENTS);
  } else if (s.equals("LASTEVENTONLY")) {
fieldList.add(Field.LAST_EVENT_ONLY);
  } else if (s.equals("METRICS")) {
fieldList.add(Field.METRICS);
  } else if (s.equals("INFO")) {
fieldList.add(Field.INFO);
  } else if (s.equals("CONFIGS")) {
fieldList.add(Field.CONFIGS);
  } else {
throw new IllegalArgumentException("Requested nonexistent field " + s);
  }
{code}
For more information: 
http://docs.oracle.com/javase/7/docs/technotes/guides/language/strings-switch.html
# TimelineReaderServer: do we have a fixed order for starting/stopping services 
relative to super?
# TimelineReaderServer, notice that the following lines:
{code}
TimelineReaderServer timelineReaderServer = null;
try {
  timelineReaderServer = new TimelineReaderServer();
{code}
the line with the try statement starts with a tab instead of spaces. 
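
Regarding the string-switch point in item 7 above, the same parsing could look 
like this (a sketch):
{code}
switch (s) {
  case "EVENTS":
    fieldList.add(Field.EVENTS);
    break;
  case "LASTEVENTONLY":
    fieldList.add(Field.LAST_EVENT_ONLY);
    break;
  case "METRICS":
    fieldList.add(Field.METRICS);
    break;
  case "INFO":
    fieldList.add(Field.INFO);
    break;
  case "CONFIGS":
    fieldList.add(Field.CONFIGS);
    break;
  default:
    throw new IllegalArgumentException("Requested nonexistent field " + s);
}
{code}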

Still, I would really appreciate it if you could upload a short writeup of the 
reader design. Thanks for working on this! 



> [Data Serving] Set up ATS reader with basic request serving structure and 
> lifecycle
> ---
>
> Key: YARN-3047
> URL: https://issues.apache.org/jira/browse/YARN-3047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3047.001.patch, YARN-3047.02.patch
>
>
> Per design in YARN-2938, set up the ATS reader as a service and implement the 
> basic structure as a service. It includes lifecycle management, request 
> serving, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3345) Add non-exclusive node label API to RMAdmin protocol and NodeLabelsManager

2015-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3345:
-
Attachment: YARN-3345.7.patch

Changed "shareable" to "exclusive" to make the API more accurate.

> Add non-exclusive node label API to RMAdmin protocol and NodeLabelsManager
> --
>
> Key: YARN-3345
> URL: https://issues.apache.org/jira/browse/YARN-3345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3345.1.patch, YARN-3345.2.patch, YARN-3345.3.patch, 
> YARN-3345.4.patch, YARN-3345.5.patch, YARN-3345.6.patch, YARN-3345.7.patch
>
>
> As described in YARN-3214 (see design doc attached to that JIRA), we need to 
> add the non-exclusive node label RMAdmin API and CLI implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3345) Add non-exclusive node label API to RMAdmin protocol and NodeLabelsManager

2015-03-20 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372209#comment-14372209
 ] 

Jian He commented on YARN-3345:
---

looks good. +1

> Add non-exclusive node label API to RMAdmin protocol and NodeLabelsManager
> --
>
> Key: YARN-3345
> URL: https://issues.apache.org/jira/browse/YARN-3345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3345.1.patch, YARN-3345.2.patch, YARN-3345.3.patch, 
> YARN-3345.4.patch, YARN-3345.5.patch, YARN-3345.6.patch
>
>
> As described in YARN-3214 (see design doc attached to that JIRA), we need to 
> add the non-exclusive node label RMAdmin API and CLI implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3345) Add non-exclusive node label API to RMAdmin protocol and NodeLabelsManager

2015-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3345:
-
Summary: Add non-exclusive node label API to RMAdmin protocol and 
NodeLabelsManager  (was: Add non-exclusive node label RMAdmin CLI/API)

> Add non-exclusive node label API to RMAdmin protocol and NodeLabelsManager
> --
>
> Key: YARN-3345
> URL: https://issues.apache.org/jira/browse/YARN-3345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3345.1.patch, YARN-3345.2.patch, YARN-3345.3.patch, 
> YARN-3345.4.patch, YARN-3345.5.patch, YARN-3345.6.patch
>
>
> As described in YARN-3214 (see design doc attached to that JIRA), we need to 
> add the non-exclusive node label RMAdmin API and CLI implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3345) Add non-exclusive node label RMAdmin CLI/API

2015-03-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372165#comment-14372165
 ] 

Wangda Tan commented on YARN-3345:
--

Failed tests are not related to this patch.

> Add non-exclusive node label RMAdmin CLI/API
> 
>
> Key: YARN-3345
> URL: https://issues.apache.org/jira/browse/YARN-3345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3345.1.patch, YARN-3345.2.patch, YARN-3345.3.patch, 
> YARN-3345.4.patch, YARN-3345.5.patch, YARN-3345.6.patch
>
>
> As described in YARN-3214 (see design doc attached to that JIRA), we need to 
> add the non-exclusive node label RMAdmin API and CLI implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer

2015-03-20 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372127#comment-14372127
 ] 

Chris Nauroth commented on YARN-3336:
-

Hello, [~zxu].  Thank you for providing the new patch and adding the test.

I think we can avoid the changes in {{FileSystem}} by adding an instance 
counter to {{MyFS}}.  We can increment it in the constructor and decrement it 
in {{close}}.  Then, the test can get the value of the counter before making 
the calls to {{obtainSystemTokensForUser}} and assert that the counter has the 
same value after those calls.
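
Something along these lines, as a sketch (the exact shape of {{MyFS}} and the 
test assertions is up to you):
{code}
// Sketch: count live MyFS instances so the test can assert nothing leaked.
static final AtomicInteger INSTANCE_COUNT = new AtomicInteger();

public MyFS() {
  INSTANCE_COUNT.incrementAndGet();
}

@Override
public void close() throws IOException {
  INSTANCE_COUNT.decrementAndGet();
  super.close();
}

// In the test body:
int before = MyFS.INSTANCE_COUNT.get();
// ... call obtainSystemTokensForUser for a couple of proxy users ...
assertEquals(before, MyFS.INSTANCE_COUNT.get());
{code}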

> FileSystem memory leak in DelegationTokenRenewer
> 
>
> Key: YARN-3336
> URL: https://issues.apache.org/jira/browse/YARN-3336
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3336.000.patch, YARN-3336.001.patch, 
> YARN-3336.002.patch, YARN-3336.003.patch
>
>
> FileSystem memory leak in DelegationTokenRenewer.
> Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new 
> FileSystem entry will be added to  FileSystem#CACHE which will never be 
> garbage collected.
> This is the implementation of obtainSystemTokensForUser:
> {code}
>   protected Token<?>[] obtainSystemTokensForUser(String user,
>       final Credentials credentials) throws IOException, InterruptedException {
>     // Get new hdfs tokens on behalf of this user
>     UserGroupInformation proxyUser =
>         UserGroupInformation.createProxyUser(user,
>             UserGroupInformation.getLoginUser());
>     Token<?>[] newTokens =
>         proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
>           @Override
>           public Token<?>[] run() throws Exception {
>             return FileSystem.get(getConfig()).addDelegationTokens(
>                 UserGroupInformation.getLoginUser().getUserName(), credentials);
>           }
>         });
>     return newTokens;
>   }
> {code}
> The memory leak happens when FileSystem.get(getConfig()) is called with a 
> new proxy user, because createProxyUser always creates a new Subject.
> The calling sequence is 
> FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), 
> conf) => FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, 
> conf, key) => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
> {code}
>   public static UserGroupInformation createProxyUser(String user,
>       UserGroupInformation realUser) {
>     if (user == null || user.isEmpty()) {
>       throw new IllegalArgumentException("Null user");
>     }
>     if (realUser == null) {
>       throw new IllegalArgumentException("Null real user");
>     }
>     Subject subject = new Subject();
>     Set<Principal> principals = subject.getPrincipals();
>     principals.add(new User(user));
>     principals.add(new RealUser(realUser));
>     UserGroupInformation result = new UserGroupInformation(subject);
>     result.setAuthenticationMethod(AuthenticationMethod.PROXY);
>     return result;
>   }
> {code}
> FileSystem#Cache#Key.equals compares the ugi:
> {code}
>   Key(URI uri, Configuration conf, long unique) throws IOException {
> scheme = uri.getScheme()==null?"":uri.getScheme().toLowerCase();
> authority = 
> uri.getAuthority()==null?"":uri.getAuthority().toLowerCase();
> this.unique = unique;
> this.ugi = UserGroupInformation.getCurrentUser();
>   }
>   public boolean equals(Object obj) {
> if (obj == this) {
>   return true;
> }
> if (obj != null && obj instanceof Key) {
>   Key that = (Key)obj;
>   return isEqual(this.scheme, that.scheme)
>  && isEqual(this.authority, that.authority)
>  && isEqual(this.ugi, that.ugi)
>  && (this.unique == that.unique);
> }
> return false;
>   }
> {code}
> UserGroupInformation.equals compares the subject by reference.
> {code}
>   public boolean equals(Object o) {
> if (o == this) {
>   return true;
> } else if (o == null || getClass() != o.getClass()) {
>   return false;
> } else {
>   return subject == ((UserGroupInformation) o).subject;
> }
>   }
> {code}
> So in this case, every time createProxyUser and FileSystem.get(getConfig()) 
> are called, a new FileSystem will be created and a new entry will be added to 
> FileSystem.CACHE.
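
One way to avoid growing the cache, sketched here only as an illustration (this is not necessarily the change that ends up in the patch), is to close the FileSystem instances cached for the short-lived proxy UGI once the tokens have been obtained:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  try {
    return proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
      @Override
      public Token<?>[] run() throws Exception {
        return FileSystem.get(getConfig()).addDelegationTokens(
            UserGroupInformation.getLoginUser().getUserName(), credentials);
      }
    });
  } finally {
    // Release the FileSystem entries keyed by the throw-away proxy UGI so
    // FileSystem.CACHE does not grow on every call.
    FileSystem.closeAllForUGI(proxyUser);
  }
}
{code}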



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3356) Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track used-resources-by-label.

2015-03-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372106#comment-14372106
 ] 

Hudson commented on YARN-3356:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7389 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7389/])
YARN-3356. Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to 
track used-resources-by-label. Contributed by Wangda Tan (jianhe: rev 
586348e4cbf197188057d6b843a6701cfffdaff3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/Queue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java


> Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track 
> used-resources-by-label.
> --
>
> Key: YARN-3356
> URL: https://issues.apache.org/jira/browse/YARN-3356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 2.8.0
>
> Attachments: YARN-3356.1.patch, YARN-3356.2.patch, YARN-3356.3.patch, 
> YARN-3356.4.patch, YARN-3356.5.patch
>
>
> Similar to YARN-3099, Capacity Scheduler's LeafQueue.User/FiCaSchedulerApp 
> should use ResourceUsage to track resource-usage/pending by label for 
> better resource tracking and preemption. 
> Also, when an application's pending resource changes (container allocated, 
> app completed, moved, etc.), we need to update the ResourceUsage of the queue 
> hierarchy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3378) a load test client that can replay a volume of history files

2015-03-20 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee reassigned YARN-3378:
-

Assignee: Sangjin Lee  (was: Li Lu)

> a load test client that can replay a volume of history files
> 
>
> Key: YARN-3378
> URL: https://issues.apache.org/jira/browse/YARN-3378
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>
> It might be good to create a load test client that can replay a large volume 
> of history files into the timeline service. One can envision running such a 
> load test client as a mapreduce job to generate a fair amount of load. It 
> would be useful to spot-check correctness and, more importantly, observe 
> performance characteristics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path

2015-03-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372049#comment-14372049
 ] 

Hudson commented on YARN-3269:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7388 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7388/])
YARN-3269. Yarn.nodemanager.remote-app-log-dir could not be configured to fully 
qualified path. Contributed by Xuan Gong (junping_du: rev 
d81109e588493cef31e68508a3d671203bd23e12)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java


> Yarn.nodemanager.remote-app-log-dir could not be configured to fully 
> qualified path
> ---
>
> Key: YARN-3269
> URL: https://issues.apache.org/jira/browse/YARN-3269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-3269.1.patch, YARN-3269.2.patch
>
>
> Log aggregation currently is always relative to the default file system, not 
> an arbitrary file system identified by URI. So we can't put an arbitrary 
> fully-qualified URI into yarn.nodemanager.remote-app-log-dir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler

2015-03-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3241:
---
 Component/s: (was: scheduler)
  fairscheduler
Hadoop Flags: Incompatible change

> Leading space, trailing space and empty sub queue name may cause 
> MetricsException for fair scheduler
> 
>
> Key: YARN-3241
> URL: https://issues.apache.org/jira/browse/YARN-3241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3241.000.patch, YARN-3241.001.patch
>
>
> Leading space, trailing space and empty sub queue names may cause a 
> MetricsException (Metrics source XXX already exists!) when adding an application to 
> the FairScheduler.
> The reason is that QueueMetrics parses the queue name differently from the 
> QueueManager.
> QueueMetrics uses Q_SPLITTER to parse the queue name: it removes leading and 
> trailing spaces in each sub queue name and also removes empty sub queue 
> names.
> {code}
>   static final Splitter Q_SPLITTER =
>   Splitter.on('.').omitEmptyStrings().trimResults(); 
> {code}
> But QueueManager won't remove leading spaces, trailing spaces or empty sub 
> queue names.
> This causes FSQueue and FSQueueMetrics to get out of sync:
> QueueManager will think the two queue names are different and try to 
> create a new queue,
> but FSQueueMetrics will treat the two names as the same queue, which 
> triggers the "Metrics source XXX already exists!" MetricsException.
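
To make the mismatch concrete, a small stand-alone example (the queue name is hypothetical; the splitter configuration is the one quoted above):
{code}
import java.util.List;
import com.google.common.base.Splitter;
import com.google.common.collect.Lists;

public class QueueNameSplitDemo {
  // Same configuration as QueueMetrics.Q_SPLITTER
  static final Splitter Q_SPLITTER =
      Splitter.on('.').omitEmptyStrings().trimResults();

  public static void main(String[] args) {
    String withSpaces = "root.queue1 .";   // trailing space and empty sub queue name
    String clean = "root.queue1";

    List<String> a = Lists.newArrayList(Q_SPLITTER.split(withSpaces));
    List<String> b = Lists.newArrayList(Q_SPLITTER.split(clean));

    System.out.println(a + " vs " + b);           // [root, queue1] vs [root, queue1]
    System.out.println(a.equals(b));              // true  -> same metrics source name
    System.out.println(withSpaces.equals(clean)); // false -> QueueManager sees two queues
  }
}
{code}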



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2963) Helper library that allows requesting containers from multiple queues

2015-03-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2963:
---
Target Version/s: 2.8.0  (was: 2.7.0)

> Helper library that allows requesting containers from multiple queues
> -
>
> Key: YARN-2963
> URL: https://issues.apache.org/jira/browse/YARN-2963
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-2963-preview.patch
>
>
> As proposed on the mailing list (yarn-dev), it would be nice to have a way 
> for YARN applications to request containers from multiple queues. 
> e.g. Oozie might want to run a single AM for all user jobs and request one 
> container per launcher.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2015-03-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1492:
---
Target Version/s: 2.7.0  (was: 2.6.0)

> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: YARN-1492-all-trunk-v1.patch, 
> YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
> YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
> shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler

2015-03-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371907#comment-14371907
 ] 

Karthik Kambatla commented on YARN-3241:


Thanks for the update, Zhihai. A couple of minor comments:
# Let us rename QueueNameException to InvalidQueueNameException
# QueueManager#checkQueueNodeName: If we want this method to return a boolean, I propose 
we call it isQueueNameValid. Otherwise, we should probably make it void and 
throw the exception from within this method rather than have a separate check at 
the caller site (a rough sketch of this option follows below).
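
A minimal sketch of the second option (method and exception names follow the comment above and are only illustrative):
{code}
// void variant: throws instead of returning a boolean, so callers need no
// separate check at the call site.
private void checkQueueNodeName(String node) {
  if (node.isEmpty() || !node.equals(node.trim())) {
    throw new InvalidQueueNameException("Illegal queue node name: '" + node + "'");
  }
}
{code}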

> Leading space, trailing space and empty sub queue name may cause 
> MetricsException for fair scheduler
> 
>
> Key: YARN-3241
> URL: https://issues.apache.org/jira/browse/YARN-3241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3241.000.patch, YARN-3241.001.patch
>
>
> Leading space, trailing space and empty sub queue names may cause a 
> MetricsException (Metrics source XXX already exists!) when adding an application to 
> the FairScheduler.
> The reason is that QueueMetrics parses the queue name differently from the 
> QueueManager.
> QueueMetrics uses Q_SPLITTER to parse the queue name: it removes leading and 
> trailing spaces in each sub queue name and also removes empty sub queue 
> names.
> {code}
>   static final Splitter Q_SPLITTER =
>   Splitter.on('.').omitEmptyStrings().trimResults(); 
> {code}
> But QueueManager won't remove leading spaces, trailing spaces or empty sub 
> queue names.
> This causes FSQueue and FSQueueMetrics to get out of sync:
> QueueManager will think the two queue names are different and try to 
> create a new queue,
> but FSQueueMetrics will treat the two names as the same queue, which 
> triggers the "Metrics source XXX already exists!" MetricsException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context

2015-03-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371899#comment-14371899
 ] 

Sangjin Lee commented on YARN-3040:
---

Hi [~zjshen], thanks much for working on this. I just took a quick look at the 
patch and the discussion. It seems like you'll update it soon, but I'll pass 
along my comments just in case.

One high level comment: the original intent of this JIRA is more of an 
end-to-end flow of the flow information (flow name, flow version, and flow run 
id). How can individual frameworks (MR, tez, ...) set these attributes and pass 
them to the RM at the time of the application launch? How does that information 
get passed to the TimelineClient and to the timeline collector? We do need the 
API from the beginning portion of the end-to-end picture as well.

bq. new TimelineClient is constructed per application, and in the context of 
one application, we can reasonably assume this context information should be 
unchanged.

There are a couple of things to consider here (and it sounds like that may be 
part of the offline discussion). We need to make sure we handle the case of 
NM's writing container-related info. It sounds like each NM will need to have 
multiple timeline clients (one for each application).

More importantly, we need to think about the RM use case. The RM will have its 
own collector, and it does not go through the TimelineClient API. How would 
that work?

More individual comments:
- flowId should be flowName (that's the standard terminology we're using)
- flow version seems to be missing from this; while flow version is not part of 
the primary key of the entity, it is a necessary attribute
- I think flow run id can (and should) be a long; it doesn't have to be a 
generic string
- in light of this, it might be slightly better to have a (flow) context API 
rather than individual arguments where you can set all these flow-related 
attributes (a rough sketch follows at the end of this comment)
- the default cluster id should be just the cluster name; I'm not sure why we 
need to add the cluster start timestamp; it would mean that every restart of 
the resource manager would create a new logical cluster in the timeline 
service; I'm not sure I agree with that
- hopefully isUnitTest can be removed with the changes I made in the previous 
commit
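
A rough sketch of what such a flow context could look like (entirely hypothetical; the class name and fields below are not from any patch):
{code}
// Hypothetical holder for the flow-related attributes discussed above.
public class FlowContext {
  private final String clusterId;    // default could be just the cluster name
  private final String flowName;     // "flowName" rather than "flowId"
  private final String flowVersion;  // not part of the entity key, but still needed
  private final long flowRunId;      // a long rather than a generic string

  public FlowContext(String clusterId, String flowName,
      String flowVersion, long flowRunId) {
    this.clusterId = clusterId;
    this.flowName = flowName;
    this.flowVersion = flowVersion;
    this.flowRunId = flowRunId;
  }
  // getters omitted for brevity
}

// Frameworks would then pass the whole context once, instead of individual
// arguments, e.g. (hypothetical factory method):
// TimelineClient client = TimelineClient.createTimelineClient(appId, flowContext);
{code}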


> [Data Model] Make putEntities operation be aware of the app's context
> -
>
> Key: YARN-3040
> URL: https://issues.apache.org/jira/browse/YARN-3040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3040.1.patch
>
>
> Per design in YARN-2928, implement client-side API for handling *flows*. 
> Frameworks should be able to define and pass in all attributes of flows and 
> flow runs to YARN, and they should be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period

2015-03-20 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371884#comment-14371884
 ] 

Jian He edited comment on YARN-3294 at 3/20/15 7:19 PM:


Thanks for updating the patch, looks good overall.
- how about passing the ‘logHierarchy’ as a parameter too, so that the web 
service also works for user-provided class names? Maybe also take in logLevel 
and dumpLocation as parameters?
{code}
   @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })

public String dumpSchedulerLogs(String time) throws IOException {
{code} 
- question - do we need the “@QueryParam” annotation for the 'time' 
parameter, as some other places in the same file do, e.g. getApps?
- if the RM has no permission to write logs, will the web service throw an error?
- {{new File(System.getProperty("hadoop.log.dir"), targetFilename);}} - should 
this be “yarn.log.dir”?
- {{return "Capacity scheduler logs are being created.";}} - it should probably 
return a JAXB-formatted response. The response seems broken if I directly access 
the web service from my browser.
- AdHocLogDumper#appenderLevels - an entry is never removed from the map; is this 
expected?
- maybe mark AdHocLogDumper as private/unstable for now


was (Author: jianhe):
Thanks for updating the patch, looks good overall.
- how about passing the ‘logHierarchy’ as a parameter too, so that the web 
service also works for user-provided class names? Maybe also take in logLevel 
and dumpLocation as parameters?
{code}
   @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })

public String dumpSchedulerLogs(String time) throws IOException {
{code} 
- question - do we need the “@QueryParam” annotation for the 'time' 
parameter, as some other places in the same file do, e.g. getApps?
- if the RM has no permission to write logs, will the web service throw an error?
- {{new File(System.getProperty("hadoop.log.dir"), targetFilename);}} - should 
this be “yarn.log.dir”?
- {{return "Capacity scheduler logs are being created.";}} - it should probably 
return a JAXB-formatted response. The response seems broken if I directly access 
the web service from my browser.
- AdHocLogDumper#appenderLevels - an entry is never removed from the map; is this 
expected?

> Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time 
> period
> -
>
> Key: YARN-3294
> URL: https://issues.apache.org/jira/browse/YARN-3294
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, 
> apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, 
> apache-yarn-3294.3.patch
>
>
> It would be nice to have a button on the web UI that would allow dumping of 
> debug logs for just the capacity scheduler for a fixed period of time(1 min, 
> 5 min or so) in a separate log file. It would be useful when debugging 
> scheduler behavior without affecting the rest of the resourcemanager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period

2015-03-20 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371884#comment-14371884
 ] 

Jian He edited comment on YARN-3294 at 3/20/15 7:15 PM:


Thanks for updating the patch, looks good overall.
- how about passing the ‘logHierarchy’ as a parameter too, so that the web 
service also works for user-provided class names? Maybe also take in logLevel 
and dumpLocation as parameters?
{code}
   @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })

public String dumpSchedulerLogs(String time) throws IOException {
{code} 
- question - do we need the “@QueryParam” annotation for the 'time' 
parameter, as some other places in the same file do, e.g. getApps?
- if the RM has no permission to write logs, will the web service throw an error?
- {{new File(System.getProperty("hadoop.log.dir"), targetFilename);}} - should 
this be “yarn.log.dir”?
- {{return "Capacity scheduler logs are being created.";}} - it should probably 
return a JAXB-formatted response. The response seems broken if I directly access 
the web service from my browser.
- AdHocLogDumper#appenderLevels - an entry is never removed from the map; is this 
expected?


was (Author: jianhe):
Thanks for updating the patch, looks good overall.
- how about passing the ‘logHierarchy’ as a parameter too, so that the web 
service also works for user-provided class names? Maybe also take in 
dumpLocation as a parameter too?
{code}
   @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })

public String dumpSchedulerLogs(String time) throws IOException {
{code} 
- question - do we need the “@QueryParam” annotation for the 'time' 
parameter, as some other places in the same file do, e.g. getApps?
- if the RM has no permission to write logs, will the web service throw an error?
- {{new File(System.getProperty("hadoop.log.dir"), targetFilename);}} - should 
this be “yarn.log.dir”?
- {{return "Capacity scheduler logs are being created.";}} - it should probably 
return a JAXB-formatted response. The response seems broken if I directly access 
the web service from my browser.
- AdHocLogDumper#appenderLevels - an entry is never removed from the map; is this 
expected?

> Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time 
> period
> -
>
> Key: YARN-3294
> URL: https://issues.apache.org/jira/browse/YARN-3294
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, 
> apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, 
> apache-yarn-3294.3.patch
>
>
> It would be nice to have a button on the web UI that would allow dumping of 
> debug logs for just the capacity scheduler for a fixed period of time(1 min, 
> 5 min or so) in a separate log file. It would be useful when debugging 
> scheduler behavior without affecting the rest of the resourcemanager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period

2015-03-20 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371884#comment-14371884
 ] 

Jian He commented on YARN-3294:
---

Thanks for updating the patch, looks good overall.
- how about passing the ‘logHierarchy’ as a parameter too, so that the web 
service also works for user-provided class names? Maybe also take in 
dumpLocation as a parameter too?
{code}
   @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })

public String dumpSchedulerLogs(String time) throws IOException {
{code} 
- question - do we need the “@QueryParam” annotation for the 'time' 
parameter, as some other places in the same file do, e.g. getApps? (see the sketch after this list)
- if the RM has no permission to write logs, will the web service throw an error?
- {{new File(System.getProperty("hadoop.log.dir"), targetFilename);}} - should 
this be “yarn.log.dir”?
- {{return "Capacity scheduler logs are being created.";}} - it should probably 
return a JAXB-formatted response. The response seems broken if I directly access 
the web service from my browser.
- AdHocLogDumper#appenderLevels - an entry is never removed from the map; is this 
expected?
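
For illustration, a minimal sketch of binding the parameters with JAX-RS annotations (a fragment of a hypothetical resource method, not the actual patch):
{code}
@POST
@Path("/scheduler/logs")
@Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
public String dumpSchedulerLogs(@QueryParam("time") String time,
    @QueryParam("logHierarchy") String logHierarchy) throws IOException {
  // validate 'time', fall back to the CapacityScheduler package when
  // 'logHierarchy' is not supplied, then hand off to the ad hoc log dumper
  return "Capacity scheduler logs are being created.";
}
{code}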

> Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time 
> period
> -
>
> Key: YARN-3294
> URL: https://issues.apache.org/jira/browse/YARN-3294
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, 
> apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, 
> apache-yarn-3294.3.patch
>
>
> It would be nice to have a button on the web UI that would allow dumping of 
> debug logs for just the capacity scheduler for a fixed period of time(1 min, 
> 5 min or so) in a separate log file. It would be useful when debugging 
> scheduler behavior without affecting the rest of the resourcemanager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3345) Add non-exclusive node label RMAdmin CLI/API

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371835#comment-14371835
 ] 

Hadoop QA commented on YARN-3345:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705976/YARN-3345.6.patch
  against trunk revision 1561231.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRM
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7046//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7046//console

This message is automatically generated.

> Add non-exclusive node label RMAdmin CLI/API
> 
>
> Key: YARN-3345
> URL: https://issues.apache.org/jira/browse/YARN-3345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3345.1.patch, YARN-3345.2.patch, YARN-3345.3.patch, 
> YARN-3345.4.patch, YARN-3345.5.patch, YARN-3345.6.patch
>
>
> As described in YARN-3214 (see the design doc attached to that JIRA), we need to add 
> a non-exclusive node label RMAdmin API and CLI implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-20 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371830#comment-14371830
 ] 

Naganarasimha G R commented on YARN-2495:
-

Thanks for the review, [~wangda]:
1) {{StringArrayProto.stringElement -> elements}}: the naming was done based 
on protoc conventions, but as you mentioned, 
elements would make more sense in terms of the Java API; will correct it as part of 
the next patch.

2) {quote} After thought, I think optional bool areNodeLabelsAcceptedByRM = 7 
\[default = false\]; should be true to be more defensive: We need make sure 
there's no error when somebody forget to set this field. {quote} 
Well, my view as explained earlier is a little different: even if somebody forgets it, 
the test case will ensure they do not miss it, but the case I mentioned 
earlier checks whether it is a legitimate scenario.
??But consider the case where NM gets upgraded first; then it should not be the 
case that the NM sends labels, the older RM ignores the additional labels, but the 
response by default says the labels are accepted. Also, by name/functionality it 
felt like it should be set to true only after the RM accepts the labels??

3) Will be taken care of in the next patch.

4) {quote}NodeLabelsProviderService -> NodeLabelsProvider, like most other 
modules, we don't need to make "service" as a part of the classname, change sub 
classes and NodeManager.createNodeLabelsProviderService as well.{quote}
Well, I agree with this, but a while back you had asked me to change it to 
NodeLabelsProviderService [previous comment, 4th 
point|https://issues.apache.org/jira/browse/YARN-2495?focusedCommentId=14181031&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14181031].
 Will get it done in the next patch.

5) {quote}NodeStatusUpdaterImpl.run:
617 int lastHeartbeatID = 0;
618 Set nodeLabelsLastUpdatedToRM = null;
619 if (hasNodeLabelsProvider)
No matter if hasNodeLabelsProvider, nodeLabelsLastUpdatedToRM should be null? 
By default is "not change" instead of "empty", correct?  {quote}
nodeLabelsLastUpdatedToRM => lastUpdatedNodeLabelsToRM, i.e. it is not the one 
that will be sent as part of the heartbeat (nodeLabelsForHeartbeat is used); 
nodeLabelsLastUpdatedToRM is the reference used to compare whether labels have 
changed since the last call. And as per the logic, even if the NodeLabelsProvider 
returns null we consider it as empty labels, so I feel it is correctly set.

6) Will take care of it in the next patch.

7) areNodeLabelsUpdated: Need check null? And could you add more test to cover 
the case when new fetched node labels and/or last node labels are null?
{{Need check null}} has been taken care of before calling this method; also, you 
had asked for a comment on this, which I have added in NodeLabelsProvider, and I will 
do the same here too.
{{add more test to cover the case when new fetched node labels}}: this is already 
covered in TestNodeStatusUpdaterForLabels.testNodeStatusUpdaterForNodeLabels (line numbers 271 
-277).
As explained in the previous comment, even though the NodeLabelsProvider may return null 
node labels we treat that as an empty set, so there is no way the last node labels 
are null. Null is sent as part of the heartbeat only when lastUpdatedNodeLabelsToRM 
== nodeLabelsForHeartbeat. Will put some comments in for this. 

Please let me know your comments; I will try to provide the patch ASAP. :)




> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, 
> YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, 
> YARN-2495_20141022.1.patch
>
>
> The target of this JIRA is to allow admins to specify labels on each NM. This covers:
> - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or 
> using the script suggested by [~aw] (YARN-2729))
> - The NM will send labels to the RM via the ResourceTracker API
> - The RM will set labels in the NodeLabelManager when the NM registers/updates labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371812#comment-14371812
 ] 

Hadoop QA commented on YARN-3034:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12705983/YARN-3034.20150320-1.patch
  against trunk revision a6a5aae.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7048//console

This message is automatically generated.

> [Collector wireup] Implement RM starting its timeline collector
> ---
>
> Key: YARN-3034
> URL: https://issues.apache.org/jira/browse/YARN-3034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, 
> YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, 
> YARN-3034.20150320-1.patch
>
>
> Per design in YARN-2928, implement resource managers starting their own ATS 
> writers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3347) Improve YARN log command to get AMContainer logs

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371798#comment-14371798
 ] 

Hadoop QA commented on YARN-3347:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705973/YARN-3347.1.patch
  against trunk revision 1561231.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestRM

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7045//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7045//console

This message is automatically generated.

> Improve YARN log command to get AMContainer logs
> 
>
> Key: YARN-3347
> URL: https://issues.apache.org/jira/browse/YARN-3347
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3347.1.patch
>
>
> Right now, we can specify the applicationId, node HTTP address and container ID 
> to get a specific container's log, or we can specify only the applicationId to 
> get all the container logs. It is very hard for users to get the logs for the AM 
> container, which is a problem since the AMContainer logs have the most useful 
> information: users need to know the AMContainer's container ID and the related 
> node HTTP address.
> We could improve the YARN log command to allow users to get AMContainer logs 
> directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2828) Enable auto refresh of web pages (using http parameter)

2015-03-20 Thread Vijay Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371786#comment-14371786
 ] 

Vijay Bhat commented on YARN-2828:
--

[~aw], I've gotten all the tests to pass again for this patch. Could you please 
review when you get a chance? Thanks!

> Enable auto refresh of web pages (using http parameter)
> ---
>
> Key: YARN-2828
> URL: https://issues.apache.org/jira/browse/YARN-2828
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Tim Robertson
>Assignee: Vijay Bhat
>Priority: Minor
> Attachments: YARN-2828.001.patch, YARN-2828.002.patch, 
> YARN-2828.003.patch, YARN-2828.004.patch, YARN-2828.005.patch, 
> YARN-2828.006.patch
>
>
> The MR1 Job Tracker had a useful HTTP parameter of e.g. "&refresh=3" that 
> could be appended to URLs which enabled a page reload.  This was very useful 
> when developing mapreduce jobs, especially to watch counters changing.  This 
> is lost in the YARN interface.
> Could be implemented as a page element (e.g. drop down or so), but I'd 
> recommend that the page not be more cluttered, and simply bring back the 
> optional "refresh" HTTP param.  It worked really nicely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371783#comment-14371783
 ] 

Hadoop QA commented on YARN-3294:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12705966/apache-yarn-3294.3.patch
  against trunk revision 1561231.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7044//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7044//console

This message is automatically generated.

> Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time 
> period
> -
>
> Key: YARN-3294
> URL: https://issues.apache.org/jira/browse/YARN-3294
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, 
> apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, 
> apache-yarn-3294.3.patch
>
>
> It would be nice to have a button on the web UI that would allow dumping of 
> debug logs for just the capacity scheduler for a fixed period of time(1 min, 
> 5 min or so) in a separate log file. It would be useful when debugging 
> scheduler behavior without affecting the rest of the resourcemanager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die

2015-03-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371772#comment-14371772
 ] 

Hudson commented on YARN-3369:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7384 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7384/])
YARN-3369. Missing NullPointer check in AppSchedulingInfo causes RM to die. 
(Brahma Reddy Battula via wangda) (wangda: rev 
6bc7710ec7f2592c4c87dd940fbe5827ef81fe72)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java


> Missing NullPointer check in AppSchedulingInfo causes RM to die 
> 
>
> Key: YARN-3369
> URL: https://issues.apache.org/jira/browse/YARN-3369
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-3369-003.patch, YARN-3369.2.patch, YARN-3369.patch
>
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> the first line calls getResourceRequest and it can return null.
> {code}
> synchronized public ResourceRequest getResourceRequest(
> Priority priority, String resourceName) {
> Map<String, ResourceRequest> nodeRequests = requests.get(priority);
> return  (nodeRequests == null) ? {color:red} null : 
> nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
> at java.lang.Thread.run(Thread.java:722)
> {color:red} *2015-03-17 14:14:04,758 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, 
> bbye..*{color} {quote}
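
For reference, the kind of guard being requested looks roughly like this (a sketch only, not necessarily the committed fix):
{code}
ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
// getResourceRequest() can return null when there is no request map for this
// priority, so guard before dereferencing.
if (request != null && request.getNumContainers() > 0) {
  // only here is it safe to use the request
}
{code}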



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-03-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371773#comment-14371773
 ] 

Hudson commented on YARN-2777:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7384 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7384/])
YARN-2777. Mark the end of individual log in aggregated log. Contributed 
(xgong: rev 1a4b52869191b7e39c0101d3585efc12d6362c1c)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java


> Mark the end of individual log in aggregated log
> 
>
> Key: YARN-2777
> URL: https://issues.apache.org/jira/browse/YARN-2777
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Varun Saxena
>  Labels: log-aggregation
> Fix For: 2.7.0
>
> Attachments: YARN-2777.001.patch, YARN-2777.02.patch
>
>
> Below is a snippet of an aggregated log showing the hbase master log:
> {code}
> LogType: hbase-hbase-master-ip-172-31-34-167.log
> LogUploadTime: 29-Oct-2014 22:31:55
> LogLength: 24103045
> Log Contents:
> Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
> ...
>   at 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
>   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
>   at java.lang.Thread.run(Thread.java:745)
> LogType: hbase-hbase-master-ip-172-31-34-167.out
> {code}
> Since logs from various daemons are aggregated in one log file, it would be 
> desirable to mark the end of one log before starting with the next.
> e.g. with such a line:
> {code}
> End of LogType: hbase-hbase-master-ip-172-31-34-167.log
> {code}
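
A sketch of how such a terminator could be written for each log type (a hypothetical helper; the real writer lives in AggregatedLogFormat):
{code}
// Hypothetical helper in the aggregated log writer: after copying one log
// file's contents, emit an explicit end marker so readers can tell where it ends.
private void writeEndOfLogMarker(PrintWriter out, File logFile) {
  out.println();
  out.println("End of LogType:" + logFile.getName());
  out.println();
}
{code}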



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized

2015-03-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371766#comment-14371766
 ] 

Karthik Kambatla commented on YARN-3024:


[~chengbing.liu], [~xgong] - is there a particular reason for changing 
{{.equals()}} calls to {{==}}? Also, the patch seems to add some TODOs. Were 
there any follow-up JIRAs filed? 

> LocalizerRunner should give DIE action when all resources are localized
> ---
>
> Key: YARN-3024
> URL: https://issues.apache.org/jira/browse/YARN-3024
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.7.0
>
> Attachments: YARN-3024.01.patch, YARN-3024.02.patch, 
> YARN-3024.03.patch, YARN-3024.04.patch
>
>
> We have observed that {{LocalizerRunner}} always gives a LIVE action at the 
> end of the localization process.
> The problem is that {{findNextResource()}} can return null even when {{pending}} 
> was not empty prior to the call. This method removes localized resources from 
> {{pending}}, therefore we should check the return value and give a DIE action 
> when it returns null.
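
A sketch of the suggested check (type, field and method names are assumptions based on the description above, not the actual patch):
{code}
// Inside LocalizerRunner's heartbeat handling (sketch):
LocalResource next = findNextResource();
if (next == null) {
  // findNextResource() may have just drained 'pending'; if nothing is left to
  // localize, answer with DIE instead of another LIVE action.
  response.setLocalizerAction(LocalizerAction.DIE);
} else {
  response.setLocalizerAction(LocalizerAction.LIVE);
}
{code}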



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371739#comment-14371739
 ] 

Hadoop QA commented on YARN-3269:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701264/YARN-3269.2.patch
  against trunk revision 1561231.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7047//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7047//console

This message is automatically generated.

> Yarn.nodemanager.remote-app-log-dir could not be configured to fully 
> qualified path
> ---
>
> Key: YARN-3269
> URL: https://issues.apache.org/jira/browse/YARN-3269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3269.1.patch, YARN-3269.2.patch
>
>
> Log aggregation currently is always relative to the default file system, not 
> an arbitrary file system identified by URI. So we can't put an arbitrary 
> fully-qualified URI into yarn.nodemanager.remote-app-log-dir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector

2015-03-20 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3034:

Attachment: YARN-3034.20150320-1.patch

Corrected the following:
* TIMELINE_SERVICE_PREFIX + "version"
* equals => equalsIgnoreCase, as the user may input v1 or v2 (in lower case), which 
should also be accepted (a small sketch follows below).
* log whether RMTimelineCollector is configured or not
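
A small sketch of that case-insensitive check (the exact configuration key and log message are illustrative):
{code}
// Accept "v2" and "V2" alike when deciding whether to start the RM timeline
// collector (key name shown here is illustrative).
String version = conf.get(YarnConfiguration.TIMELINE_SERVICE_PREFIX + "version", "v1");
boolean v2Enabled = "v2".equalsIgnoreCase(version);
LOG.info("RMTimelineCollector is " + (v2Enabled ? "" : "not ")
    + "configured (timeline service version = " + version + ")");
{code}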

> [Collector wireup] Implement RM starting its timeline collector
> ---
>
> Key: YARN-3034
> URL: https://issues.apache.org/jira/browse/YARN-3034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, 
> YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch, 
> YARN-3034.20150320-1.patch
>
>
> Per design in YARN-2928, implement resource managers starting their own ATS 
> writers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-03-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371687#comment-14371687
 ] 

Xuan Gong commented on YARN-2777:
-

Committed to trunk/branch-2/branch-2.7. Thanks, Varun!

> Mark the end of individual log in aggregated log
> 
>
> Key: YARN-2777
> URL: https://issues.apache.org/jira/browse/YARN-2777
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Varun Saxena
>  Labels: log-aggregation
> Fix For: 2.7.0
>
> Attachments: YARN-2777.001.patch, YARN-2777.02.patch
>
>
> Below is a snippet of an aggregated log showing the hbase master log:
> {code}
> LogType: hbase-hbase-master-ip-172-31-34-167.log
> LogUploadTime: 29-Oct-2014 22:31:55
> LogLength: 24103045
> Log Contents:
> Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
> ...
>   at 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
>   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
>   at java.lang.Thread.run(Thread.java:745)
> LogType: hbase-hbase-master-ip-172-31-34-167.out
> {code}
> Since logs from various daemons are aggregated in one log file, it would be 
> desirable to mark the end of one log before starting with the next.
> e.g. with such a line:
> {code}
> End of LogType: hbase-hbase-master-ip-172-31-34-167.log
> {code}
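A minimal sketch of the proposed end-of-log marker (not the YARN-2777 patch
itself; the writer method and its parameters are made up for illustration):

{code}
// Sketch of the idea: append an explicit end marker after each log so
// readers can tell where one daemon's log stops and the next begins.
import java.io.IOException;
import java.io.Writer;

public class AggregatedLogWriterSketch {
  static void writeOneLog(Writer out, String logType, String contents)
      throws IOException {
    out.write("LogType: " + logType + "\n");
    out.write("LogLength: " + contents.length() + "\n");
    out.write("Log Contents:\n");
    out.write(contents);
    // The marker proposed in this JIRA: make the end of each log explicit.
    out.write("\nEnd of LogType: " + logType + "\n\n");
  }
}
{code}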



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector

2015-03-20 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371680#comment-14371680
 ] 

Naganarasimha G R commented on YARN-3034:
-

Thanks [~sjlee0]. Surprisingly, yesterday I faced a problem where it did not 
update, but today it seems to be working fine.
[~djp],
??Also, we should add a warning message log if user put something illegal here 
or it just get silent without any warn.?? I feel this is not required, as we do 
not do this for any other configuration, and the possible values are clearly 
captured in yarn-default.xml. Instead, I will log whether RMTimelineCollector 
is configured. What is your opinion? I will include the other comments in the 
patch.

> [Collector wireup] Implement RM starting its timeline collector
> ---
>
> Key: YARN-3034
> URL: https://issues.apache.org/jira/browse/YARN-3034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, 
> YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch
>
>
> Per design in YARN-2928, implement resource managers starting their own ATS 
> writers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path

2015-03-20 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371650#comment-14371650
 ] 

Junping Du commented on YARN-3269:
--

+1. v2 patch LGTM. Will commit it shortly.

> Yarn.nodemanager.remote-app-log-dir could not be configured to fully 
> qualified path
> ---
>
> Key: YARN-3269
> URL: https://issues.apache.org/jira/browse/YARN-3269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3269.1.patch, YARN-3269.2.patch
>
>
> Log aggregation currently is always relative to the default file system, not 
> an arbitrary file system identified by URI. So we can't put an arbitrary 
> fully-qualified URI into yarn.nodemanager.remote-app-log-dir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-03-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371660#comment-14371660
 ] 

Xuan Gong commented on YARN-2777:
-

+1 LGTM

> Mark the end of individual log in aggregated log
> 
>
> Key: YARN-2777
> URL: https://issues.apache.org/jira/browse/YARN-2777
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Varun Saxena
>  Labels: log-aggregation
> Attachments: YARN-2777.001.patch, YARN-2777.02.patch
>
>
> Below is snippet of aggregated log showing hbase master log:
> {code}
> LogType: hbase-hbase-master-ip-172-31-34-167.log
> LogUploadTime: 29-Oct-2014 22:31:55
> LogLength: 24103045
> Log Contents:
> Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
> ...
>   at 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
>   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
>   at java.lang.Thread.run(Thread.java:745)
> LogType: hbase-hbase-master-ip-172-31-34-167.out
> {code}
> Since logs from various daemons are aggregated in one log file, it would be 
> desirable to mark the end of one log before starting with the next.
> e.g. with such a line:
> {code}
> End of LogType: hbase-hbase-master-ip-172-31-34-167.log
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path

2015-03-20 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371652#comment-14371652
 ] 

Junping Du commented on YARN-3269:
--

Kicking off the Jenkins test again in case any conflict happened during the 
last submit.

> Yarn.nodemanager.remote-app-log-dir could not be configured to fully 
> qualified path
> ---
>
> Key: YARN-3269
> URL: https://issues.apache.org/jira/browse/YARN-3269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3269.1.patch, YARN-3269.2.patch
>
>
> Log aggregation currently is always relative to the default file system, not 
> an arbitrary file system identified by URI. So we can't put an arbitrary 
> fully-qualified URI into yarn.nodemanager.remote-app-log-dir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3345) Add non-exclusive node label RMAdmin CLI/API

2015-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3345:
-
Attachment: YARN-3345.6.patch

Attached ver.6, which addresses the test failure.

> Add non-exclusive node label RMAdmin CLI/API
> 
>
> Key: YARN-3345
> URL: https://issues.apache.org/jira/browse/YARN-3345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3345.1.patch, YARN-3345.2.patch, YARN-3345.3.patch, 
> YARN-3345.4.patch, YARN-3345.5.patch, YARN-3345.6.patch
>
>
> As described in YARN-3214 (see the design doc attached to that JIRA), we need 
> to add the non-exclusive node label RMAdmin API and CLI implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-03-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371638#comment-14371638
 ] 

Sangjin Lee commented on YARN-3047:
---

Hi [~varun_saxena], could you kindly update your patch with the latest from the 
branch and also address the feedback? It'd also be great if you could write up 
a short document on the reader design. Thanks much!

> [Data Serving] Set up ATS reader with basic request serving structure and 
> lifecycle
> ---
>
> Key: YARN-3047
> URL: https://issues.apache.org/jira/browse/YARN-3047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3047.001.patch, YARN-3047.02.patch
>
>
> Per design in YARN-2938, set up the ATS reader as a service and implement the 
> basic structure as a service. It includes lifecycle management, request 
> serving, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3378) a load test client that can replay a volume of history files

2015-03-20 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371626#comment-14371626
 ] 

Li Lu commented on YARN-3378:
-

Thanks [~Naganarasimha] for reminding me of YARN-2556. I looked at it. It seems 
some storage-level implementations needed to adopt that patch are missing in 
our v2 branch. Since we're benchmarking a work-in-progress project, maybe we'd 
like to organize the benchmarks in a different way? I'll definitely keep an eye 
on it while working on this one for v2. 

> a load test client that can replay a volume of history files
> 
>
> Key: YARN-3378
> URL: https://issues.apache.org/jira/browse/YARN-3378
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
>
> It might be good to create a load test client that can replay a large volume 
> of history files into the timeline service. One can envision running such a 
> load test client as a MapReduce job to generate a fair amount of load. It 
> would be useful to spot-check correctness and, more importantly, to observe 
> performance characteristics.
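As a rough illustration of the replay idea only: a toy driver that posts
synthetic entities using the existing v1 TimelineClient API, since that is the
client shape available today. The entity type, counts, and mapping from history
files are placeholders, and the v2 writer interface discussed on this branch
would differ.

{code}
// Toy sketch of a replay driver using the v1 TimelineClient API.
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class HistoryReplaySketch {
  public static void main(String[] args) throws IOException, YarnException {
    YarnConfiguration conf = new YarnConfiguration();
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    try {
      for (int i = 0; i < 1000; i++) {        // stand-in for iterating history files
        TimelineEntity entity = new TimelineEntity();
        entity.setEntityType("REPLAYED_JOB"); // hypothetical entity type
        entity.setEntityId("job_" + i);
        entity.setStartTime(System.currentTimeMillis());
        client.putEntities(entity);           // generates write load on the service
      }
    } finally {
      client.stop();
    }
  }
}
{code}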



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3334) [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2.

2015-03-20 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3334:
-
Attachment: YARN-3334-v1.patch

Uploaded the v1 patch, which adds a test in TestDistributedShell to verify that 
the NM posts metrics info to the new timeline service; verified that it works 
locally.

> [Event Producers] NM start to posting some app related metrics in early POC 
> stage of phase 2.
> -
>
> Key: YARN-3334
> URL: https://issues.apache.org/jira/browse/YARN-3334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3347) Improve YARN log command to get AMContainer logs

2015-03-20 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3347:

Attachment: YARN-3347.1.patch

> Improve YARN log command to get AMContainer logs
> 
>
> Key: YARN-3347
> URL: https://issues.apache.org/jira/browse/YARN-3347
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3347.1.patch
>
>
> Right now, we can specify an applicationId, node HTTP address, and container 
> ID to get a specific container log, or we can specify only an applicationId 
> to get all the container logs. It is very hard for users to get the logs for 
> the AM container, even though the AMContainer logs contain the most useful 
> information, because users need to know the AMContainer's container ID and 
> the related node HTTP address.
> We could improve the YARN log command to allow users to get AMContainer logs 
> directly.
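A sketch of how such a command could locate the AM container on the user's
behalf. This is not the attached YARN-3347 patch; it only shows the lookup that
removes the need to know the container ID and node address up front.

{code}
// Sketch: resolve the AM container id for an application via YarnClient.
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptReport;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class AmContainerLookupSketch {
  static ContainerId findAmContainer(ApplicationId appId)
      throws IOException, YarnException {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();
    try {
      ApplicationReport app = yarnClient.getApplicationReport(appId);
      ApplicationAttemptReport attempt =
          yarnClient.getApplicationAttemptReport(
              app.getCurrentApplicationAttemptId());
      // With the AM container id in hand, the CLI can fetch just that
      // container's logs on the user's behalf.
      return attempt.getAMContainerId();
    } finally {
      yarnClient.stop();
    }
  }
}
{code}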



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3347) Improve YARN log command to get AMContainer logs

2015-03-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371611#comment-14371611
 ] 

Xuan Gong commented on YARN-3347:
-

In the same patch, we also show the logs for the running containers.

> Improve YARN log command to get AMContainer logs
> 
>
> Key: YARN-3347
> URL: https://issues.apache.org/jira/browse/YARN-3347
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>
> Right now, we can specify an applicationId, node HTTP address, and container 
> ID to get a specific container log, or we can specify only an applicationId 
> to get all the container logs. It is very hard for users to get the logs for 
> the AM container, even though the AMContainer logs contain the most useful 
> information, because users need to know the AMContainer's container ID and 
> the related node HTTP address.
> We could improve the YARN log command to allow users to get AMContainer logs 
> directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period

2015-03-20 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-3294:

Attachment: apache-yarn-3294.3.patch

Uploaded a new patch to address the findbugs errors.

> Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time 
> period
> -
>
> Key: YARN-3294
> URL: https://issues.apache.org/jira/browse/YARN-3294
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, 
> apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, 
> apache-yarn-3294.3.patch
>
>
> It would be nice to have a button on the web UI that would allow dumping of 
> debug logs for just the capacity scheduler, for a fixed period of time (1 min, 
> 5 min, or so), into a separate log file. It would be useful for debugging 
> scheduler behavior without affecting the rest of the ResourceManager.
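A sketch of the timed-debug idea using plain log4j. This is not the attached
patch: the real feature also routes output to a separate log file, which this
sketch omits, and the target logger name is an assumption.

{code}
// Sketch: bump a logger to DEBUG, restore the previous level after a
// fixed period via a timer.
import java.util.Timer;
import java.util.TimerTask;
import org.apache.log4j.Level;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;

public class TimedDebugLogSketch {
  static void enableDebugFor(String loggerName, long millis) {
    final Logger logger = LogManager.getLogger(loggerName);
    final Level previous = logger.getEffectiveLevel();
    logger.setLevel(Level.DEBUG);
    new Timer("restore-log-level", true).schedule(new TimerTask() {
      @Override
      public void run() {
        logger.setLevel(previous); // put the level back after the window
      }
    }, millis);
  }

  public static void main(String[] args) {
    // e.g. one minute of CapacityScheduler debug logging
    enableDebugFor(
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler",
        60000L);
  }
}
{code}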



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3378) a load test client that can replay a volume of history files

2015-03-20 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371586#comment-14371586
 ] 

Naganarasimha G R commented on YARN-3378:
-

Thanks [~sjlee0], that's fine with me :). Anyway, Jonathan shared a patch for 
YARN-2556 two days back; maybe you and [~gtCarrera9] can have a look at it.

> a load test client that can replay a volume of history files
> 
>
> Key: YARN-3378
> URL: https://issues.apache.org/jira/browse/YARN-3378
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
>
> It might be good to create a load test client that can replay a large volume 
> of history files into the timeline service. One can envision running such a 
> load test client as a MapReduce job to generate a fair amount of load. It 
> would be useful to spot-check correctness and, more importantly, to observe 
> performance characteristics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2901) Add errors and warning stats to RM, NM web UI

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371588#comment-14371588
 ] 

Hadoop QA commented on YARN-2901:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12705935/apache-yarn-2901.1.patch
  against trunk revision 8041267.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  org.apache.hadoop.yarn.server.resourcemanager.TestRM

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7043//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7043//console

This message is automatically generated.

> Add errors and warning stats to RM, NM web UI
> -
>
> Key: YARN-2901
> URL: https://issues.apache.org/jira/browse/YARN-2901
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, 
> apache-yarn-2901.1.patch
>
>
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about:
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n' (20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender? (I'm 
> open to suggestions on alternate mechanisms for implementing this.)
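A sketch of the custom-appender idea mentioned above, assuming log4j 1.2 and
using simple running totals rather than the windowed 5 min/1 hour/12 hour
stats; this is not the attached patch.

{code}
// Sketch: a log4j appender that counts ERROR and WARN events.
import java.util.concurrent.atomic.AtomicLong;
import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Level;
import org.apache.log4j.spi.LoggingEvent;

public class ErrorWarningCountingAppender extends AppenderSkeleton {
  private final AtomicLong errors = new AtomicLong();
  private final AtomicLong warnings = new AtomicLong();

  @Override
  protected void append(LoggingEvent event) {
    if (event.getLevel().isGreaterOrEqual(Level.ERROR)) {
      errors.incrementAndGet();
    } else if (event.getLevel().isGreaterOrEqual(Level.WARN)) {
      warnings.incrementAndGet();
    }
    // A web UI block could read these counters to render the stats.
  }

  public long getErrorCount()   { return errors.get(); }
  public long getWarningCount() { return warnings.get(); }

  @Override
  public void close() { }

  @Override
  public boolean requiresLayout() { return false; }
}
{code}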



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3378) a load test client that can replay a volume of history files

2015-03-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371560#comment-14371560
 ] 

Sangjin Lee commented on YARN-3378:
---

Thanks [~Naganarasimha] for reminding me of YARN-2556. I forgot about that one. 
I do agree that the purpose of these JIRAs is quite similar. It would be ideal 
if we can use what comes out of YARN-2556 with little or no modification. In 
the meantime, we can leave this open until we're getting close to doing this. 
If YARN-2556 can be used as-is, we could close this one then. Does that sound 
reasonable?

> a load test client that can replay a volume of history files
> 
>
> Key: YARN-3378
> URL: https://issues.apache.org/jira/browse/YARN-3378
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
>
> It might be good to create a load test client that can replay a large volume 
> of history files into the timeline service. One can envision running such a 
> load test client as a MapReduce job to generate a fair amount of load. It 
> would be useful to spot-check correctness and, more importantly, to observe 
> performance characteristics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371531#comment-14371531
 ] 

Hadoop QA commented on YARN-3294:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12705928/apache-yarn-3294.2.patch
  against trunk revision 8041267.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRM
  
org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7042//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7042//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7042//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7042//console

This message is automatically generated.

> Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time 
> period
> -
>
> Key: YARN-3294
> URL: https://issues.apache.org/jira/browse/YARN-3294
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, 
> apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch
>
>
> It would be nice to have a button on the web UI that would allow dumping of 
> debug logs for just the capacity scheduler, for a fixed period of time (1 min, 
> 5 min, or so), into a separate log file. It would be useful for debugging 
> scheduler behavior without affecting the rest of the ResourceManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI

2015-03-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371472#comment-14371472
 ] 

Hudson commented on YARN-3379:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2088 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2088/])
YARN-3379. Fixed missing data in localityTable and ResourceRequests table in RM 
WebUI. Contributed by Xuan Gong (jianhe: rev 
4e886eb9cbd2dcb128bbfd17309c734083093a4c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppPage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppAttemptPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java


> Missing data in localityTable and ResourceRequests table in RM WebUI
> 
>
> Key: YARN-3379
> URL: https://issues.apache.org/jira/browse/YARN-3379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.7.0
>
> Attachments: YARN-3379.1.patch, YARN-3379.2.patch, 
> YARN-3379.3.1.patch, YARN-3379.3.patch
>
>
> After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and 
> AHS WebUI.
> But some information, such as containerLocalityStatistics and 
> ResourceRequests, is only useful for running applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI

2015-03-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371451#comment-14371451
 ] 

Hudson commented on YARN-3379:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #138 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/138/])
YARN-3379. Fixed missing data in localityTable and ResourceRequests table in RM 
WebUI. Contributed by Xuan Gong (jianhe: rev 
4e886eb9cbd2dcb128bbfd17309c734083093a4c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppAttemptPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java


> Missing data in localityTable and ResourceRequests table in RM WebUI
> 
>
> Key: YARN-3379
> URL: https://issues.apache.org/jira/browse/YARN-3379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.7.0
>
> Attachments: YARN-3379.1.patch, YARN-3379.2.patch, 
> YARN-3379.3.1.patch, YARN-3379.3.patch
>
>
> After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and 
> AHS WebUI.
> But some information, such as containerLocalityStatistics and 
> ResourceRequests, is only useful for running applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3345) Add non-exclusive node label RMAdmin CLI/API

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371444#comment-14371444
 ] 

Hadoop QA commented on YARN-3345:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705763/YARN-3345.5.patch
  against trunk revision 8041267.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.api.TestPBImplRecords

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7041//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7041//console

This message is automatically generated.

> Add non-exclusive node label RMAdmin CLI/API
> 
>
> Key: YARN-3345
> URL: https://issues.apache.org/jira/browse/YARN-3345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3345.1.patch, YARN-3345.2.patch, YARN-3345.3.patch, 
> YARN-3345.4.patch, YARN-3345.5.patch
>
>
> As described in YARN-3214 (see the design doc attached to that JIRA), we need 
> to add the non-exclusive node label RMAdmin API and CLI implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2901) Add errors and warning stats to RM, NM web UI

2015-03-20 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2901:

Attachment: apache-yarn-2901.1.patch

Uploaded a new patch to address the findbugs errors.

> Add errors and warning stats to RM, NM web UI
> -
>
> Key: YARN-2901
> URL: https://issues.apache.org/jira/browse/YARN-2901
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, 
> apache-yarn-2901.1.patch
>
>
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about:
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n' (20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender? (I'm 
> open to suggestions on alternate mechanisms for implementing this.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

