[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-09-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909073#comment-14909073
 ] 

Hadoop QA commented on YARN-3943:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 51s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m  8s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 25s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 51s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  4s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |   8m 26s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  59m  1s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
 |
| Timed out tests | 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.TestApplication
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12762493/YARN-3943.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 67b0e96 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9270/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9270/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9270/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9270/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9270/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9270/console |


This message was automatically generated.

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configurations 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It would 
> be better to use two configurations: one used when a disk goes from not-full 
> to full, and the other used when a disk goes from full back to not-full, so 
> we can avoid frequent oscillation.
> For example, we can set the threshold for disk-full detection higher than the 
> one for disk-not-full detection.
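
To illustrate the hysteresis the description asks for, here is a minimal sketch 
(not the attached patch; the class and field names are hypothetical): a disk is 
marked full only above the higher threshold and marked good again only below 
the lower one, so utilization hovering near a single threshold no longer flips 
the state back and forth.

{code}
// Hypothetical two-threshold (hysteresis) check, for illustration only.
public class DiskFullnessChecker {
  private final float fullThresholdPercent;     // e.g. 90.0f
  private final float notFullThresholdPercent;  // e.g. 85.0f, <= fullThresholdPercent
  private boolean diskFull = false;

  public DiskFullnessChecker(float fullThresholdPercent,
      float notFullThresholdPercent) {
    this.fullThresholdPercent = fullThresholdPercent;
    this.notFullThresholdPercent = notFullThresholdPercent;
  }

  /** @param usedPercent current disk utilization in percent */
  public synchronized boolean isDiskFull(float usedPercent) {
    if (diskFull) {
      // Only transition back to "good" once usage drops below the lower threshold.
      if (usedPercent < notFullThresholdPercent) {
        diskFull = false;
      }
    } else if (usedPercent > fullThresholdPercent) {
      // Only mark the disk full once usage rises above the higher threshold.
      diskFull = true;
    }
    return diskFull;
  }
}
{code}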



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-09-25 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3943:

Attachment: (was: YARN-3943.001.patch)

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configurations 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It would 
> be better to use two configurations: one used when a disk goes from not-full 
> to full, and the other used when a disk goes from full back to not-full, so 
> we can avoid frequent oscillation.
> For example, we can set the threshold for disk-full detection higher than the 
> one for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-09-25 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3943:

Attachment: YARN-3943.001.patch

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configurations 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It would 
> be better to use two configurations: one used when a disk goes from not-full 
> to full, and the other used when a disk goes from full back to not-full, so 
> we can avoid frequent oscillation.
> For example, we can set the threshold for disk-full detection higher than the 
> one for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909070#comment-14909070
 ] 

Hadoop QA commented on YARN-3964:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m  1s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   8m  3s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 13s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m  3s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  3s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 50s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  4s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |  60m 11s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 111m  2s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.nodelabels.TestDelegatedNodeLabelsUpdater |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12762490/YARN-3964.009.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 67b0e96 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9269/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9269/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9269/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9269/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9269/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9269/console |


This message was automatically generated.

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.1.patch
>
>
> Currently, a CLI/REST API is provided in the Resource Manager to allow users 
> to specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will give users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4205) Add a service for monitoring application life time out

2015-09-25 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4205:

Description: 
This JIRA intends to provide a lifetime monitor service. 
The service will monitor the applications for which a lifetime is configured. 
If an application runs beyond its lifetime, it will be killed. 
The lifetime is measured from the submit time.

The thread monitoring interval is configurable.
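
A rough sketch of such a monitor is below, using a plain Java thread and 
hypothetical names (the actual patch wires this into an RM service instead):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Illustrative only: one thread wakes up at a configurable interval and kills
// any monitored application whose configured lifetime (measured from submit
// time) has elapsed.
public class AppLifetimeMonitor implements Runnable {
  private final Map<String, Long> submitTimeMs = new ConcurrentHashMap<>();
  private final Map<String, Long> lifetimeMs = new ConcurrentHashMap<>();
  private final long monitorIntervalMs;
  private final Consumer<String> killer;  // callback that kills an app by id

  public AppLifetimeMonitor(long monitorIntervalMs, Consumer<String> killer) {
    this.monitorIntervalMs = monitorIntervalMs;
    this.killer = killer;
  }

  public void register(String appId, long submitTimeMillis, long lifetimeMillis) {
    submitTimeMs.put(appId, submitTimeMillis);
    lifetimeMs.put(appId, lifetimeMillis);
  }

  public void unregister(String appId) {
    submitTimeMs.remove(appId);
    lifetimeMs.remove(appId);
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      long now = System.currentTimeMillis();
      for (Map.Entry<String, Long> e : submitTimeMs.entrySet()) {
        String appId = e.getKey();
        Long lifetime = lifetimeMs.get(appId);
        if (lifetime != null && now - e.getValue() > lifetime) {
          killer.accept(appId);  // ran beyond its configured lifetime: kill it
          unregister(appId);
        }
      }
      try {
        Thread.sleep(monitorIntervalMs);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
{code}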


> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch
>
>
> This JIRA intends to provide a lifetime monitor service. 
> The service will monitor the applications for which a lifetime is configured. 
> If an application runs beyond its lifetime, it will be killed. 
> The lifetime is measured from the submit time.
> The thread monitoring interval is configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-162) nodemanager log aggregation has scaling issues with namenode

2015-09-25 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908237#comment-14908237
 ] 

Kuhu Shukla commented on YARN-162:
--

[~sseth] If you are not working on this, may I assign it to myself? Thanks!


> nodemanager log aggregation has scaling issues with namenode
> 
>
> Key: YARN-162
> URL: https://issues.apache.org/jira/browse/YARN-162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 0.23.3
>Reporter: Nathan Roberts
>Assignee: Siddharth Seth
>Priority: Critical
> Attachments: YARN-162.txt, YARN-162_WIP.txt, YARN-162_v2.txt, 
> YARN-162_v2.txt
>
>
> Log aggregation causes fd explosion on the namenode. On large clusters this 
> can exhaust FDs to the point where datanodes can't check-in.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-09-25 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2729:

Attachment: YARN-2729.20150925-1.patch

Hi [~wangda] & [~rohithsharma],
Please find the attached patch with the modifications we discussed offline.
{quote}
*For Config:*
We currently support the configuration param 
"yarn.nodemanager.node-labels.provider.configured-node-labels". Instead, can we 
have yarn.nodemanager.node-labels.provider.configured-node-partition, which 
accepts only a single partition? In the future we can add 
yarn.nodemanager.node-labels.provider.configured-node-constraints, which can 
support multiple constraints.

*For Script:*
Currently in the patch I search the script output for lines starting with 
"NODE_LABELS:", with labels separated by ",", even though we currently support 
only a single partition per node.
My suggestion is that for now we parse lines starting with "NODE_PARTITION:" 
and send the string following that prefix as the node label. Later we can 
support "NODE_CONSTRAINTS:" and accept multiple constraints for it. 
{quote}
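
For illustration only (not the attached patch), parsing the proposed 
"NODE_PARTITION:" prefix from the script output could look like this:

{code}
// Illustrative helper: return the partition from the last "NODE_PARTITION:"
// line of the node-labels script output, or null if no such line exists.
public final class NodePartitionScriptParser {
  private static final String PREFIX = "NODE_PARTITION:";

  public static String parseNodePartition(String scriptOutput) {
    String partition = null;
    for (String line : scriptOutput.split("\n")) {
      line = line.trim();
      if (line.startsWith(PREFIX)) {
        partition = line.substring(PREFIX.length()).trim();
      }
    }
    return partition;
  }
}
{code}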

> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup
> ---
>
> Key: YARN-2729
> URL: https://issues.apache.org/jira/browse/YARN-2729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
> YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
> YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
> YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
> YARN-2729.20150402-1.patch, YARN-2729.20150404-1.patch, 
> YARN-2729.20150517-1.patch, YARN-2729.20150830-1.patch, 
> YARN-2729.20150925-1.patch
>
>
> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3417) AM to be able to exit with a request saying "restart me with these (possibly updated) resource requirements"

2015-09-25 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908122#comment-14908122
 ] 

Kuhu Shukla commented on YARN-3417:
---

[~varun_saxena] Are you working on this JIRA currently? If not may I take this 
up? Thanks!

> AM to be able to exit with a request saying "restart me with these (possibly 
> updated) resource requirements"
> 
>
> Key: YARN-3417
> URL: https://issues.apache.org/jira/browse/YARN-3417
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Steve Loughran
>Assignee: Varun Saxena
>Priority: Minor
>
> If an AM wants to reconfigure itself or restart with new resources, there's 
> no way to do this without the active participation of a client.
> It can call System.exit and rely on YARN to restart it - but that counts as a 
> failure and may lose the entire app. Furthermore, that doesn't allow the AM 
> to resize itself.
> A simple exit code to be interpreted as restart-without-failure could handle 
> the first case; an explicit call to indicate restart, including potentially 
> new resource/label requirements, would be more reliable, and certainly more 
> flexible.
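
Purely as an illustration of the first option (nothing in YARN currently 
interprets exit codes this way, and the exit-code value below is hypothetical):

{code}
// Hypothetical sketch only: the proposal is that an AM could exit with a
// designated "restart me" exit code instead of a plain failure.
public final class AMExitCodes {
  /** Hypothetical code meaning "restart this attempt without counting a failure". */
  public static final int RESTART_WITHOUT_FAILURE = 64;

  public static void requestRestart() {
    // The RM would need to recognize this exit code, relaunch the AM, and skip
    // incrementing the attempt-failure count; updated resource/label
    // requirements would have to be supplied through a separate channel.
    System.exit(RESTART_WITHOUT_FAILURE);
  }
}
{code}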



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4186) Make WebAppUtils a public API for yarn

2015-09-25 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla reassigned YARN-4186:
-

Assignee: Kuhu Shukla

> Make WebAppUtils a public API for yarn
> --
>
> Key: YARN-4186
> URL: https://issues.apache.org/jira/browse/YARN-4186
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
>
> Application types like Tez will want to expose AM container log location to 
> users without having to make a call to the RM.
> Exposing getRunningLogURL as public will provide this functionality.
> Jon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out

2015-09-25 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908204#comment-14908204
 ] 

nijel commented on YARN-4205:
-

Thanks [~leftnoteasy] for the comments
Sorry for the small mistakes

Updated the patch
bq. RMAppLifeTimeMonitorService.rmApps -> applicationIdToLifetime? (or shorter 
name if you prefer), it's not rmApps actually
changed to monitoredapps

bq. public synchronized void unregister, synchronized could be removed?
Done

Other comments are fixed.

> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4205) Add a service for monitoring application life time out

2015-09-25 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4205:

Attachment: YARN-4205_02.patch

> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: nijel
> Attachments: YARN-4205_01.patch, YARN-4205_02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.

2015-09-25 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908224#comment-14908224
 ] 

Kuhu Shukla commented on YARN-2024:
---

[~xgong] Are you still working on this JIRA? Thanks!

> IOException in AppLogAggregatorImpl does not give stacktrace and leaves 
> aggregated TFile in a bad state.
> 
>
> Key: YARN-2024
> URL: https://issues.apache.org/jira/browse/YARN-2024
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Eric Payne
>Assignee: Xuan Gong
>Priority: Critical
>
> Multiple issues were encountered when AppLogAggregatorImpl encountered an 
> IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating 
> yarn-logs for an application that had very large (>150G each) error logs.
> - An IOException was encountered during the LogWriter#append call, and a 
> message was printed, but no stacktrace was provided. Message: "ERROR: 
> Couldn't upload logs for container_n_nnn_nn_nn. Skipping 
> this container."
> - After the IOException, the TFile is in a bad state, so subsequent calls to 
> LogWriter#append fail with the following stacktrace:
> 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[LogAggregationService #17907,5,main] threw an Exception.
> java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE
> at 
> org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528)
> at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164)
> ...
> - At this point, the yarn-logs cleaner still thinks the thread is 
> aggregating, so the huge yarn-logs never get cleaned up for that application.
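
A sketch of the pattern that would address the first two points, using a 
stand-in interface rather than the real AggregatedLogFormat.LogWriter: log the 
exception with its full stack trace on the first failure and stop reusing the 
broken writer.

{code}
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Illustrative pattern only, not the eventual fix: after the first append
// failure the underlying file is treated as unusable, so later containers are
// skipped instead of triggering further exceptions.
public class SafeLogAppender {
  private static final Log LOG = LogFactory.getLog(SafeLogAppender.class);

  /** Minimal stand-in for AggregatedLogFormat.LogWriter#append. */
  public interface ContainerLogAppender {
    void append(String containerId) throws IOException;
  }

  private final ContainerLogAppender appender;
  private boolean broken = false;

  public SafeLogAppender(ContainerLogAppender appender) {
    this.appender = appender;
  }

  public void upload(String containerId) {
    if (broken) {
      LOG.warn("Skipping " + containerId + ": log writer already failed");
      return;
    }
    try {
      appender.append(containerId);
    } catch (IOException e) {
      // Pass the exception so the full stack trace is logged, not just a message.
      LOG.error("Couldn't upload logs for " + containerId
          + ". Skipping this container.", e);
      broken = true;
    }
  }
}
{code}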



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-09-25 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908250#comment-14908250
 ] 

Sunil G commented on YARN-4140:
---

Hi [~bibinchundatt]
One more minor nit:
bq.private boolean isRequestLabelChanged(ResourceRequest currentRequest, 
ResourceRequest previousRequest)
I think it's better to change the argument names here to {{requestOne}} and 
{{requestTwo}}, as we are already using them in both contexts.

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch
>
>
> While trying to run an application on a node label partition, I found that 
> the application execution time is delayed by 5 – 10 min for 500 containers. 
> In total there were 3 machines; 2 machines were in the same partition and the 
> app was submitted to that partition.
> After enabling debug logging I was able to find the following:
> # From the AM the container ask is for OFF_SWITCH
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes to allocate the 
> 1st map after AM allocation.
> # Tested with about 1K maps using a Pi job; it took 17 minutes to allocate the 
> next container after AM allocation.
> Once the 500 NODE_LOCAL container allocations are done, the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, 

[jira] [Commented] (YARN-4198) CapacityScheduler locking / synchronization improvements

2015-09-25 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908136#comment-14908136
 ] 

Kuhu Shukla commented on YARN-4198:
---

[~curino] May I assign this JIRA to myself in case its not taken.

> CapacityScheduler locking / synchronization improvements
> 
>
> Key: YARN-4198
> URL: https://issues.apache.org/jira/browse/YARN-4198
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>
> In the context of YARN-4193 (which stresses the RM/CS performance) we found 
> several performance problems in the locking/synchronization of the 
> CapacityScheduler, as well as inconsistencies that do not normally surface 
> (incorrect locking order of queues protected by CS locks, etc.). This JIRA 
> proposes several refactorings to improve this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-342) RM doesn't retry token renewals

2015-09-25 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla reassigned YARN-342:


Assignee: Kuhu Shukla

> RM doesn't retry token renewals
> ---
>
> Key: YARN-342
> URL: https://issues.apache.org/jira/browse/YARN-342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.0.0, 2.0.0-alpha, 0.23.6
>Reporter: Daryn Sharp
>Assignee: Kuhu Shukla
>Priority: Critical
>
> The RM stops trying to renew tokens if any exception occurs during renewal. 
> This should be changed to abort only if the exception is {{InvalidToken}}, to 
> allow resilience to transient network failures, issues associated with 
> aborted connections when the NN is overloaded, cluster upgrades, etc.
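
A minimal sketch of the suggested behaviour (an illustrative helper, not the 
RM's actual renewer code): retry transient IOExceptions with backoff and abort 
only on InvalidToken.

{code}
import java.io.IOException;
import org.apache.hadoop.security.token.SecretManager.InvalidToken;

// Illustrative only: keep retrying renewal on transient errors, give up only
// when the token itself is invalid.
public final class RenewWithRetry {
  public interface Renewal {
    void renew() throws IOException;
  }

  public static void renewWithRetry(Renewal renewal, int maxAttempts,
      long backoffMs) throws IOException, InterruptedException {
    for (int attempt = 1; ; attempt++) {
      try {
        renewal.renew();
        return;
      } catch (InvalidToken e) {
        throw e;                       // token is genuinely invalid: abort
      } catch (IOException e) {
        if (attempt >= maxAttempts) {  // transient error: retry with backoff
          throw e;
        }
        Thread.sleep(backoffMs);
      }
    }
  }
}
{code}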



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4111) Killed application diagnostics message should be set rather having static mesage

2015-09-25 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908201#comment-14908201
 ] 

Sunil G commented on YARN-4111:
---

Yes [~rohithsharma]. This will be better.
But one thing to remember is that RMAppEvent or RMAppAttemptEvent is used in 
success cases also. Hence diagnostics are not mandatory in all RMApp or 
RMAppAttempt events raised. 
So in places where we want to use diagnostics, they can be passed as needed 
(failure cases like KILL or TIMEOUT). The existing ctor needs to be retained in 
these events.
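
A simplified illustration of that idea (a hypothetical event class, not the 
real RMAppEvent): keep the existing constructor for success paths and add an 
overload carrying diagnostics for failure paths.

{code}
// Hypothetical simplified event class, for illustration only.
public class AppEvent {
  private final String applicationId;
  private final String type;
  private final String diagnostics;   // may be null for success events

  public AppEvent(String applicationId, String type) {
    this(applicationId, type, null);  // existing ctor retained, no diagnostics
  }

  public AppEvent(String applicationId, String type, String diagnostics) {
    this.applicationId = applicationId;
    this.type = type;
    this.diagnostics = diagnostics;   // e.g. "killed by scheduler: lifetime expired"
  }

  public String getDiagnostics() {
    return diagnostics;
  }
}
{code}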

> Killed application diagnostics message should be set rather having static 
> mesage
> 
>
> Key: YARN-4111
> URL: https://issues.apache.org/jira/browse/YARN-4111
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: nijel
> Attachments: YARN-4111_1.patch, YARN-4111_2.patch, YARN-4111_3.patch, 
> YARN-4111_4.patch
>
>
> Application can be killed either by *user via ClientRMService* OR *from 
> scheduler*. Currently the diagnostic message is set statically, i.e. 
> {{Application killed by user.}}, regardless of whether the application was 
> killed by the scheduler. This confuses the user after an application is 
> killed: they did not kill the application at all, but the diagnostic message 
> states that 'application is killed by user'.
> It would be useful if the diagnostic messages were different for each cause of 
> KILL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3638) Yarn Resource Manager Scheduler page - show percentage of total cluster that each queue is using

2015-09-25 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla reassigned YARN-3638:
-

Assignee: Kuhu Shukla

> Yarn Resource Manager Scheduler page - show percentage of total cluster that 
> each queue is using
> 
>
> Key: YARN-3638
> URL: https://issues.apache.org/jira/browse/YARN-3638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.6.0
> Environment: HDP 2.2
>Reporter: Hari Sekhon
>Assignee: Kuhu Shukla
>Priority: Minor
>
> Request to show % of total cluster resources each queue is currently 
> consuming for jobs on the Yarn Resource Manager Scheduler page.
> Currently the Yarn Resource Manager Scheduler page shows the % of total used 
> for root queue and the % of each given queue's configured capacity that is 
> used (often showing say 150% if the max capacity is greater than configured 
> capacity to allow bursting where there are free resources). This is fine, but 
> it would be good to additionally show the % of total cluster that each given 
> queue is consuming and not just the % of that queue's configured capacity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-09-25 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4140:
---
Attachment: 0011-YARN-4140.patch

Hi [~sunilg]

Updating params as per comment

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch
>
>
> While trying to run an application on a node label partition, I found that 
> the application execution time is delayed by 5 – 10 min for 500 containers. 
> In total there were 3 machines; 2 machines were in the same partition and the 
> app was submitted to that partition.
> After enabling debug logging I was able to find the following:
> # From the AM the container ask is for OFF_SWITCH
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes to allocate the 
> 1st map after AM allocation.
> # Tested with about 1K maps using a Pi job; it took 17 minutes to allocate the 
> next container after AM allocation.
> Once the 500 NODE_LOCAL container allocations are done, the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat 

[jira] [Commented] (YARN-4198) CapacityScheduler locking / synchronization improvements

2015-09-25 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908355#comment-14908355
 ] 

Carlo Curino commented on YARN-4198:


[~kshukla], I am happy to collaborate, but we have a patch in the works with 
[~atumanov] and [~chris.douglas]. We tested it at scale and it seems to work 
well; we now want to double-check it carefully, clean it up, and submit it for 
review. However, as this is a very delicate piece, it would be great if you 
could help us go over it and analyze it carefully. It is also likely that we 
missed some further opportunities for improvement.  

The general observation is that we are holding a bunch of big locks (e.g., CS) 
to make modifications to data structures that could be protected by much more 
fine-grained locks, or made concurrency-safe with no locking at all (the entire 
CS operates on a stale view of the cluster state anyway, due to heartbeats, 
etc.). 

We will post something soon, and I would really like your help on 
reviewing/extending this.
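
As a generic illustration of that direction (not the patch under discussion): 
read-mostly scheduler state guarded by one big synchronized lock can often be 
guarded by a ReadWriteLock instead, so readers stop serializing behind each 
other. The class and field names below are hypothetical.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Generic fine-grained locking illustration, not CapacityScheduler code.
public class QueueUsageView {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, Long> usedMemoryByQueue = new HashMap<>();

  public long getUsedMemory(String queue) {
    lock.readLock().lock();           // many readers may hold this concurrently
    try {
      return usedMemoryByQueue.getOrDefault(queue, 0L);
    } finally {
      lock.readLock().unlock();
    }
  }

  public void setUsedMemory(String queue, long usedMemory) {
    lock.writeLock().lock();          // writers are exclusive
    try {
      usedMemoryByQueue.put(queue, usedMemory);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}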

> CapacityScheduler locking / synchronization improvements
> 
>
> Key: YARN-4198
> URL: https://issues.apache.org/jira/browse/YARN-4198
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>
> In the context of YARN-4193 (which stresses the RM/CS performance) we found 
> several performance problems in the locking/synchronization of the 
> CapacityScheduler, as well as inconsistencies that do not normally surface 
> (incorrect locking order of queues protected by CS locks, etc.). This JIRA 
> proposes several refactorings to improve this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-09-25 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908364#comment-14908364
 ] 

Sangjin Lee commented on YARN-4075:
---

Kicking off the build one more time. It might be due to the issue of concurrent 
hadoop builds on the same machine (see [the email thread on 
common-dev|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201509.mbox/%3CCAEB75Kj68TS2nELmgOkyobDyagzS-Bm0Fij8tb3zJRz8_1MMZQ%40mail.gmail.com%3E]).

> [reader REST API] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4075
> URL: https://issues.apache.org/jira/browse/YARN-4075
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4075-YARN-2928.03.patch, 
> YARN-4075-YARN-2928.04.patch, YARN-4075-YARN-2928.05.patch, 
> YARN-4075-YARN-2928.POC.1.patch, YARN-4075-YARN-2928.POC.2.patch
>
>
> We need to be able to query for flows and flow runs via REST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-09-25 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908458#comment-14908458
 ] 

Vrushali C commented on YARN-4075:
--

Thanks @sjlee! The tests passed this time. Committing the patch (the network is 
very slow; updates/sync are still in progress).

> [reader REST API] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4075
> URL: https://issues.apache.org/jira/browse/YARN-4075
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4075-YARN-2928.03.patch, 
> YARN-4075-YARN-2928.04.patch, YARN-4075-YARN-2928.05.patch, 
> YARN-4075-YARN-2928.POC.1.patch, YARN-4075-YARN-2928.POC.2.patch
>
>
> We need to be able to query for flows and flow runs via REST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4175) Example of use YARN-1197

2015-09-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4175:
-
Assignee: MENG DING  (was: Wangda Tan)

> Example of use YARN-1197
> 
>
> Key: YARN-4175
> URL: https://issues.apache.org/jira/browse/YARN-4175
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: MENG DING
>
> Like YARN-2609, we need an example program to demonstrate how to use YARN-1197 
> from end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4175) Example of use YARN-1197

2015-09-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908676#comment-14908676
 ] 

Wangda Tan commented on YARN-4175:
--

Thanks for taking this [~mding], assigned to you!

> Example of use YARN-1197
> 
>
> Key: YARN-4175
> URL: https://issues.apache.org/jira/browse/YARN-4175
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: MENG DING
>
> Like YARN-2609, we need an example program to demonstrate how to use YARN-1197 
> from end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4175) Example of use YARN-1197

2015-09-25 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908669#comment-14908669
 ] 

MENG DING commented on YARN-4175:
-

Hi, [~leftnoteasy], if you are not working on this right now, I will be happy 
to take this one after YARN-1509 and YARN-1510 are done. I understand the 
urgency of the end-to-end test, and will treat this as the highest priority.

> Example of use YARN-1197
> 
>
> Key: YARN-4175
> URL: https://issues.apache.org/jira/browse/YARN-4175
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>
> Like YARN-2609, we need an example program to demonstrate how to use YARN-1197 
> from end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-25 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908424#comment-14908424
 ] 

Sunil G commented on YARN-3964:
---

Hi [~dian.fu]
A couple of points:
1. Would it be better to also check {{yarn.node-labels.enabled}} in 
{{DelegatedNodeLabelsUpdater}} init? Otherwise the timer thread will print an 
exception every time. 
2. {{createNodeLabelsMappingProvider}} throws an exception from {{serviceInit}} 
if the NodeLabelsMappingProvider class is not configured. This will disrupt the 
RM. I feel you could create a {{SimpleNodeLabelsMappingProvider}} and keep that 
as the default, and the timer could also be stopped from the provider in this 
case. Thoughts?
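
A fragment sketch of point 1, assuming Hadoop's Configuration; the surrounding 
class and the actual timer wiring are elided, and this is not the patch itself:

{code}
// Fragment only: skip scheduling the updater's timer when node labels are
// disabled, instead of letting the timer thread log an exception on every run.
protected void serviceInit(Configuration conf) throws Exception {
  boolean nodeLabelsEnabled = conf.getBoolean("yarn.node-labels.enabled", false);
  if (!nodeLabelsEnabled) {
    // Nothing to delegate; leave the update timer unscheduled.
    return;
  }
  // ... create the NodeLabelsMappingProvider and schedule the update timer ...
}
{code}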

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.1.patch
>
>
> Currently, a CLI/REST API is provided in the Resource Manager to allow users 
> to specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will give users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1510) Make NMClient support change container resources

2015-09-25 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908436#comment-14908436
 ] 

MENG DING commented on YARN-1510:
-

I forgot to say that the failed tests are not related, as they all passed in my 
local environment.

> Make NMClient support change container resources
> 
>
> Key: YARN-1510
> URL: https://issues.apache.org/jira/browse/YARN-1510
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1510-YARN-1197.1.patch, 
> YARN-1510-YARN-1197.2.patch, YARN-1510.3.patch
>
>
> As described in YARN-1197 and YARN-1449, we need to add APIs in NMClient to 
> support
> 1) sending requests to increase/decrease container resource limits
> 2) getting the succeeded/failed changed-container responses from the NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1510) Make NMClient support change container resources

2015-09-25 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908456#comment-14908456
 ] 

MENG DING commented on YARN-1510:
-

One thing I want to bring up for discussion is that this ticket adds two 
methods to the public callback interface {{NMClientAsync.CallbackHandler}}. 
This means that if users want to rebuild their ApplicationMaster against the 
new client library, they will have to modify their code to implement these two 
methods. I don't see a way around this unless we decide not to add these two 
methods.

{code}
+/**
+ * The API is called when NodeManager responds to indicate
+ * the container resource has been successfully increased.
+ * @param containerId the Id of the container
+ * @param resource the target resource of the container
+ */
+void onContainerResourceIncreased(
+ContainerId containerId, Resource resource);


+/**
+ * The API is called when an exception is raised in the process of
+ * increasing container resource.
+ * @param containerId the Id of the container
+ * @param t the raised exception
+ */
+void onIncreaseContainerResourceError(
+ContainerId containerId, Throwable t);
{code}
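
For reference, an AM would then add something like the following to its 
existing CallbackHandler implementation (fragment only; the other required 
callback methods and the LOG field are elided and assumed to exist already):

{code}
// Fragment: the two new callbacks quoted above, as an AM might implement them.
@Override
public void onContainerResourceIncreased(ContainerId containerId, Resource resource) {
  LOG.info("Resource of container " + containerId + " increased to " + resource);
}

@Override
public void onIncreaseContainerResourceError(ContainerId containerId, Throwable t) {
  LOG.error("Failed to increase resource of container " + containerId, t);
}
{code}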

> Make NMClient support change container resources
> 
>
> Key: YARN-1510
> URL: https://issues.apache.org/jira/browse/YARN-1510
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1510-YARN-1197.1.patch, 
> YARN-1510-YARN-1197.2.patch, YARN-1510.3.patch
>
>
> As described in YARN-1197 and YARN-1449, we need to add APIs in NMClient to 
> support
> 1) sending requests to increase/decrease container resource limits
> 2) getting the succeeded/failed changed-container responses from the NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908503#comment-14908503
 ] 

Li Lu commented on YARN-3942:
-

Ah I see. BTW, seems like the cache loading process in refreshCache is properly 
synchronized. Please do let me know if there are any other problems and thank 
you for your help! 

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942-leveldb.001.patch, 
> YARN-3942-leveldb.002.patch, YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-09-25 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908548#comment-14908548
 ] 

Vrushali C commented on YARN-4075:
--

Committed patch v05, thanks [~varun_saxena] for the patch and [~gtCarrera] , 
[~sjlee0] and [~jrottinghuis] for the discussions. 


> [reader REST API] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4075
> URL: https://issues.apache.org/jira/browse/YARN-4075
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4075-YARN-2928.03.patch, 
> YARN-4075-YARN-2928.04.patch, YARN-4075-YARN-2928.05.patch, 
> YARN-4075-YARN-2928.POC.1.patch, YARN-4075-YARN-2928.POC.2.patch
>
>
> We need to be able to query for flows and flow runs via REST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out

2015-09-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908575#comment-14908575
 ] 

Hadoop QA commented on YARN-4205:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 10s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 59s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 14s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 53s | The applied patch generated  1 
new checkstyle issues (total was 238, now 238). |
| {color:red}-1{color} | whitespace |   0m  3s | The patch has 7  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 40s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  6s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests | 185m 18s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 234m 45s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
|   | hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestApplicationACLs |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterService |
|   | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.server.resourcemanager.security.TestAMRMTokens |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
|   | hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
|   | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
|   | hadoop.yarn.server.resourcemanager.TestApplicationCleanup |
|   | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAppRunnability |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher |
|   | hadoop.yarn.server.resourcemanager.TestMoveApplication |
|   | hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA |
|   | hadoop.yarn.server.resourcemanager.rmapp.TestNodesListManager |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs |
|   | 
hadoop.yarn.server.resourcemanager.logaggregationstatus.TestRMAppLogAggregationStatus
 |
|   | hadoop.yarn.server.resourcemanager.TestResourceManager |
|   | hadoop.yarn.server.resourcemanager.TestContainerResourceUsage |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerHealth |
|   | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId |
|   | hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens |
|   | 
hadoop.yarn.server.resourcemanager.rmapp.TestApplicationLifeTimeMonitorService |
|   | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority |
|   | hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl |
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
|   | org.apache.hadoop.yarn.server.resourcemanager.TestRM |
|   | 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 |
|   | 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12762396/YARN-4205_02.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 83e65c5 |
| checkstyle |  

[jira] [Updated] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3942:

Attachment: YARN-3942-leveldb.002.patch

Thanks [~gss2002]! I modified the patch with your changes. As a quick update, 
the current Leveldb cache storage patch addresses the (potential) OOM issue. 
However, we now inevitably have to load all entities as a whole when refreshing 
the cache, which will introduce longer latency. I discussed this with [~hitesh] 
and [~xgong], and it seems like the best solution to the latency problem will 
be to reduce the granularity of the caches. Instead of loading (and caching) 
entities for a whole app, we would load and cache a set of entities, like 
those in the same Tez DAG. Let's focus on the current fix in this JIRA, 
and open a new JIRA for the next step? Comments and suggestions are more than 
welcome. 
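
An illustration of the smaller-granularity caching idea (hypothetical types and 
keys, not this patch): cache loaded entity groups keyed by an entity-group id 
such as a Tez DAG id, instead of caching everything for a whole application.

{code}
import java.util.List;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

// Illustrative only: a bounded cache of entity groups, loaded one group at a time.
public class EntityGroupCache {
  /** Stand-in for whatever per-group store the plugin loads from HDFS. */
  public interface GroupLoader {
    List<String> loadGroup(String groupId) throws Exception;
  }

  private final LoadingCache<String, List<String>> cache;

  public EntityGroupCache(final GroupLoader loader, long maxGroups) {
    this.cache = CacheBuilder.newBuilder()
        .maximumSize(maxGroups)               // bound memory per reader instance
        .build(new CacheLoader<String, List<String>>() {
          @Override
          public List<String> load(String groupId) throws Exception {
            return loader.loadGroup(groupId); // load one group, not the whole app
          }
        });
  }

  public List<String> getGroup(String groupId) throws Exception {
    return cache.get(groupId);
  }
}
{code}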

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942-leveldb.001.patch, 
> YARN-3942-leveldb.002.patch, YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-09-25 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908403#comment-14908403
 ] 

Naganarasimha G R commented on YARN-2729:
-

{{TestNodeStatusUpdaterForLabels}} is passing locally, and the checkstyle issue is 
about the YarnConfiguration file size, so not much can be done about it. 
[~rohithsharma], can you please take a look at this jira?

> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup
> ---
>
> Key: YARN-2729
> URL: https://issues.apache.org/jira/browse/YARN-2729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
> YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
> YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
> YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
> YARN-2729.20150402-1.patch, YARN-2729.20150404-1.patch, 
> YARN-2729.20150517-1.patch, YARN-2729.20150830-1.patch, 
> YARN-2729.20150925-1.patch
>
>
> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908423#comment-14908423
 ] 

Greg Senia commented on YARN-3942:
--

Noticed this in the timeline server logs with certain Hive jobs, specifically 
at the beginning of the job. Is it just not caching fast enough?

2015-09-25 14:03:21,381 - INFO  
[1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:32,409 - INFO  
[1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:32,449 - INFO  
[1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:32,476 - INFO  
[1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:32,486 - INFO  
[1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:32,941 - INFO  
[1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:32,963 - INFO  
[1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:32,965 - INFO  
[1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:32,967 - INFO  
[1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:33,410 - INFO  
[1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:33,451 - INFO  
[1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:33,459 - INFO  
[865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:33,468 - INFO  
[865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:33,628 - INFO  
[865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:33,630 - INFO  
[865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:33,632 - INFO  
[865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:33,634 - INFO  
[865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:34,413 - INFO  
[865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:34,454 - INFO  
[865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:34,632 - INFO  
[865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003
2015-09-25 14:03:34,634 - INFO  
[865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load 
cached store for application_1443203299712_0003

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942-leveldb.001.patch, 
> YARN-3942-leveldb.002.patch, YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-09-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908361#comment-14908361
 ] 

Hadoop QA commented on YARN-2729:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 43s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m  1s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  9s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 53s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 23s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m 11s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |   9m  1s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  58m 48s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12762403/YARN-2729.20150925-1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 83e65c5 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9267/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9267/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9267/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9267/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9267/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9267/console |


This message was automatically generated.

> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup
> ---
>
> Key: YARN-2729
> URL: https://issues.apache.org/jira/browse/YARN-2729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
> YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
> YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
> YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
> YARN-2729.20150402-1.patch, YARN-2729.20150404-1.patch, 
> YARN-2729.20150517-1.patch, YARN-2729.20150830-1.patch, 
> YARN-2729.20150925-1.patch
>
>
> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908435#comment-14908435
 ] 

Li Lu commented on YARN-3942:
-

H... Seems quite likely, but let me double check it...

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942-leveldb.001.patch, 
> YARN-3942-leveldb.002.patch, YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on a large-scale clusters running many 
> thousands of jobs per day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-09-25 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908471#comment-14908471
 ] 

Li Lu commented on YARN-4075:
-

Thanks [~vrushalic]! I'll start to connect the web UI with this end point 
immediately. 

> [reader REST API] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4075
> URL: https://issues.apache.org/jira/browse/YARN-4075
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4075-YARN-2928.03.patch, 
> YARN-4075-YARN-2928.04.patch, YARN-4075-YARN-2928.05.patch, 
> YARN-4075-YARN-2928.POC.1.patch, YARN-4075-YARN-2928.POC.2.patch
>
>
> We need to be able to query for flows and flow runs via REST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-09-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908440#comment-14908440
 ] 

Hadoop QA commented on YARN-4075:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 36s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   8m 14s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 15s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 21s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  4s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   3m  0s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  46m 10s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12762170/YARN-4075-YARN-2928.05.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 2e7e0f0 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9268/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9268/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9268/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9268/console |


This message was automatically generated.

> [reader REST API] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4075
> URL: https://issues.apache.org/jira/browse/YARN-4075
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4075-YARN-2928.03.patch, 
> YARN-4075-YARN-2928.04.patch, YARN-4075-YARN-2928.05.patch, 
> YARN-4075-YARN-2928.POC.1.patch, YARN-4075-YARN-2928.POC.2.patch
>
>
> We need to be able to query for flows and flow runs via REST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908482#comment-14908482
 ] 

Greg Senia commented on YARN-3942:
--

I see the issue. Our security folks mandated a umask change and my teammate 
was testing in the lab; the umask is now 006.

2015-09-25 14:37:29,056 - WARN  
[1438603067@qtp-117005517-38:TimelineDataManager@353] - Skip the timeline 
entity: { id: container_1443203299712_0007_01_11, type: YARN_CONTAINER }
org.apache.hadoop.security.AccessControlException: Permission denied: 
user=yarn, access=READ_EXECUTE, 
inode="/tmp/entity-file-history/active/application_1443203299712_0007":greg:hdfs:drwxrwx---
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:185)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6815)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6797)
at org.apache.hadoop.hdfs.server.nam

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942-leveldb.001.patch, 
> YARN-3942-leveldb.002.patch, YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-25 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908489#comment-14908489
 ] 

Naganarasimha G R commented on YARN-3964:
-

Hi [~sunilg],
Good points, Sunil, but I feel we can take a different approach.
bq. Is it better to check for yarn.node-labels.enabled also in 
DelegatedNodeLabelsUpdater init. 
Instead, we can avoid creating the DelegatedNodeLabelsUpdater in 
{{createDelegatedNodeLabelsUpdater}} if {{yarn.node-labels.enabled}} is set to 
false.
bq. createNodeLabelsMappingProvider throws exception from serviceInit if 
NodeLabelsMappingProvider class is not configured. 
If the configuration is explicitly set to delegated-centralized, then I prefer 
to fail fast if the provider is not configured properly, similar to what we plan 
to do on the NM side when the node labels script path is not configured but the 
provider is set to script. Thoughts?
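A minimal sketch of the two points above, assuming an illustrative property key for the 
provider class (the actual patch and configuration names may differ):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;

// Sketch only: skip creating the DelegatedNodeLabelsUpdater when node labels
// are disabled, and fail fast when delegated-centralized is configured
// without a NodeLabelsMappingProvider.
public class NodeLabelsProviderChecks {
  // "yarn.node-labels.enabled" and "yarn.node-labels.configuration-type" are
  // existing keys; the provider key below is an assumption for illustration.
  static final String PROVIDER_CLASS_KEY =
      "yarn.resourcemanager.node-labels.provider";

  static boolean shouldCreateUpdater(Configuration conf) {
    return conf.getBoolean("yarn.node-labels.enabled", false)
        && "delegated-centralized".equals(
            conf.get("yarn.node-labels.configuration-type", "centralized"));
  }

  static void failFastIfProviderMissing(Configuration conf) {
    if (shouldCreateUpdater(conf) && conf.get(PROVIDER_CLASS_KEY) == null) {
      throw new YarnRuntimeException("Node label configuration type is "
          + "delegated-centralized but no NodeLabelsMappingProvider is configured");
    }
  }
}
{code}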

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will provide users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908790#comment-14908790
 ] 

Greg Senia commented on YARN-3942:
--

I was even able to reproduce the problem with the REST/WS endpoint:  
http://xlabhadnnh2.example.com:8188/ws/v1/applicationhistory/apps/application_1443218824767_0002/appattempts/appattempt_1443218824767_0002_01/containers/container_1443218824767_0002_01_01

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942-leveldb.001.patch, 
> YARN-3942-leveldb.002.patch, YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-09-25 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908798#comment-14908798
 ] 

Bibin A Chundatt commented on YARN-4140:


Failures are not related to this patch.

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch
>
>
> Trying to run an application on a node label partition, I found that the 
> application execution time is delayed by 5 – 10 min for 500 containers. 
> Total 3 machines; 2 machines were in the same partition and the app was submitted to that partition.
> After enabling debug I was able to find the below:
> # From the AM the container ask is for OFF-SWITCH
> # RM is allocating all containers to NODE_LOCAL as shown in the logs below.
> # Since I was having about 500 containers, the time taken was about 6 minutes 
> to allocate the 1st map after AM allocation.
> # Tested with about 1K maps using a PI job; it took 17 minutes to allocate the next 
> container after AM allocation.
> Once the 500 container allocations on NODE_LOCAL are done, the next container 
> allocation is done on OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat 

[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908845#comment-14908845
 ] 

Jonathan Eagles commented on YARN-3942:
---

Also, I'm getting an error while starting up the timeline server while the 
namenode is in safe mode. Can we wrap the mkdirs command in the 
EntityFileTimelineStore to do an exists check on the directories before doing a 
mkdirs? That way we can bring the service up while the namenode is in safe mode, 
similar to the other timeline stores.
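For reference, a minimal sketch of the suggested guard (class, method, and variable 
names are illustrative, not the actual plugin code):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of the suggestion above: only call mkdirs when the directory is
// actually missing, so the store can come up while the NameNode is still in
// safe mode and the directories already exist from a previous run.
public class TimelineDirSetup {
  static void ensureDirExists(FileSystem fs, Path dir) throws IOException {
    if (!fs.exists(dir)) {
      if (!fs.mkdirs(dir)) {
        throw new IOException("Could not create " + dir);
      }
    }
  }
}
{code}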

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942-leveldb.001.patch, 
> YARN-3942-leveldb.002.patch, YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908765#comment-14908765
 ] 

Greg Senia commented on YARN-3942:
--

Got beyond that and now getting this:

2015-09-25 17:43:32,249 - ERROR [1121457019@qtp-365579627-11:AppBlock@169] - 
Failed to read the AM container of the application attempt 
appattempt_1443215518288_0002_01.
java.lang.reflect.UndeclaredThrowableException
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
at 
org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:156)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
at 
org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
at 
org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
at 
org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
at 
com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
at 
com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572)
at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:269)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:542)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1224)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 

[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908855#comment-14908855
 ] 

Li Lu commented on YARN-3942:
-

[~gss2002] the exception stack looks like it's related to security settings? I'm not 
sure the call has reached the underlying cache storage. Maybe there's something 
interesting in the ATS log? 

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942-leveldb.001.patch, 
> YARN-3942-leveldb.002.patch, YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908893#comment-14908893
 ] 

Greg Senia commented on YARN-3942:
--

[~gtCarrera9] that is all that is in the ATS log with debug enabled on the root 
logger. I'm thinking of adding some debug lines to print out what is happening 
inside those methods.

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942-leveldb.001.patch, 
> YARN-3942-leveldb.002.patch, YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-09-25 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4092:
--
Target Version/s: 2.8.0, 2.7.2, 2.6.2  (was: 2.8.0, 2.7.2)

> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0, 2.7.2, 2.6.2
>
> Attachments: YARN-4092-branch-2.6.patch, YARN-4092.1.patch, 
> YARN-4092.2.patch, YARN-4092.3.patch, YARN-4092.4.patch
>
>
> In an RM HA environment, if both RMs act as standby, the RM UI will not be 
> accessible. It will keep redirecting between both RMs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4092) RM HA UI redirection needs to be fixed when both RMs are in standby mode

2015-09-25 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4092:
--
Fix Version/s: 2.6.2

> RM HA UI redirection needs to be fixed when both RMs are in standby mode
> 
>
> Key: YARN-4092
> URL: https://issues.apache.org/jira/browse/YARN-4092
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0, 2.7.2, 2.6.2
>
> Attachments: YARN-4092-branch-2.6.patch, YARN-4092.1.patch, 
> YARN-4092.2.patch, YARN-4092.3.patch, YARN-4092.4.patch
>
>
> In an RM HA environment, if both RMs act as standby, the RM UI will not be 
> accessible. It will keep redirecting between both RMs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity

2015-09-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908935#comment-14908935
 ] 

Wangda Tan commented on YARN-4091:
--

[~sunilg], sure :), cannot wait to see it!

> Improvement: Introduce more debug/diagnostics information to detail out 
> scheduler activity
> --
>
> Key: YARN-4091
> URL: https://issues.apache.org/jira/browse/YARN-4091
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Improvement on debugdiagnostic information - YARN.pdf
>
>
> As schedulers are improved with various new capabilities, more configurations 
> which tune the schedulers start to take actions such as limiting the assignment of 
> containers to an application, or introducing a delay before allocating a container, etc. 
> There is no clear information passed down from the scheduler to the outer world under 
> these various scenarios. This makes debugging much tougher.
> This ticket is an effort to introduce more defined states at various points in the 
> scheduler where it skips/rejects container assignment, activates an application, 
> etc. Such information will help users know what is happening in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve 
> on this as we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-25 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908992#comment-14908992
 ] 

Sunil G commented on YARN-3964:
---

Hi [~Naganarasimha]
Yes, with point 1 we would not even create the provider. Much better.
For point 2, I am still not very convinced. Fail-fast may be too aggressive, 
right? We could look at it both ways here. As we are giving the user the 
flexibility of providing a provider, the implementation can be anything. So keeping 
a dummy one if no config is present and giving a warning is also an option. I 
raised this only because we are making the RM fail fast, and I am doubting 
whether that is too aggressive or not.

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will provide users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-25 Thread Dian Fu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated YARN-3964:
--
Attachment: YARN-3964.009.patch

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will provide users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-25 Thread Dian Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909028#comment-14909028
 ] 

Dian Fu commented on YARN-3964:
---

Hi [~Naganarasimha], [~sunilg],
Thanks a lot for your review. Updated the patch according to your comments, with 
the following exceptions:
{quote}
If updateInterval is disabled then only on register of nodes the labels are 
fetched, is that fine or is it better to fetch once and not run timer ?
{quote}
You mean not only fetching the labels on register, but also fetching the labels 
once for all the nodes when all nodes are registered? If so, I think this is 
unnecessary and difficult to implement. Firstly, the labels of all the 
registered nodes have already been fetched once on register. Secondly, it's 
difficult to determine when to fetch the labels for all the nodes as some nodes 
may still not be registered. Thoughts?
{quote}
createNodeLabelsMappingProvider throws exception from serviceInit if 
NodeLabelsMappingProvider class is not configured.
{quote}
I would agree with [~Naganarasimha] on this point. By default, the {{centralized}} 
node label configuration type is enabled. If the user explicitly sets the node 
label configuration type to {{delegated-centralized}}, we can assume that they 
know what {{delegated-centralized}} means and that they should provide a 
{{NodeLabelsMappingProvider}}. On the other hand, providing a 
{{SimpleNodeLabelsMappingProvider}} and making it the default would also be 
acceptable to me. If we decide to do this, then we should log the following at 
INFO level to let the user know which {{NodeLabelsMappingProvider}} is being used:
{noformat}
LOG.debug("Node labels mapping provider class is : "
    + nodeLabelsMappingProvider.getClass().toString());
{noformat}
Thoughts?

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will provide users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-25 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909029#comment-14909029
 ] 

Naganarasimha G R commented on YARN-3964:
-

Hi [~sunilg],
Point 2 is debatable. My 2 cents here: if {{delegated-centralized}} were the 
default I could agree, and if we had a default implementation which serves some 
purpose, like {{MemoryRMStateStore}}, then fine; if not, I am not too convinced we 
should create a dummy provider instance when the user is explicitly setting 
{{delegated-centralized}}. cc [~leftnoteasy], your thoughts?

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will provide users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-09-25 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3943:

Attachment: YARN-3943.001.patch

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It will 
> be better to use two configurations: one is used when disks become full from 
> not-full and the other one is used when disks become not-full from full. So 
> we can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-25 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909031#comment-14909031
 ] 

Naganarasimha G R commented on YARN-3964:
-

Hi [~dian.fu],
bq. Secondly, it's difficult to determine when to fetch the labels for all the 
nodes as some nodes may still not be registered. Thoughts?
Makes sense; I missed the point that the provider provides labels for the 
requested nodes only!
I also have a feeling that the default interval can be as long as 30 minutes, as 
we are updating the labels for all nodes without checking whether they were 
modified or not. Thoughts?

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will provide users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-09-25 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909043#comment-14909043
 ] 

zhihai xu commented on YARN-3943:
-

I attached a new patch, YARN-3943.001.patch, for review. The new patch keeps 
backward compatibility by using the old configuration 
"yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" 
as the high water mark threshold for disk-full detection and by creating a new 
configuration 
"yarn.nodemanager.disk-health-checker.disk-utilization-watermark-low-per-disk-percentage"
 as the low water mark threshold for disk-not-full detection. It also makes both 
configurations use the same default value, and if the low water mark threshold is 
set higher than the high water mark threshold, it will be set to the same value as 
the high water mark threshold.
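For readers following the discussion, a rough sketch of the resulting hysteresis 
check using the property names above; the class and method names, and the 90.0 
default, are illustrative assumptions rather than the patch itself:
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch of the high/low water mark behaviour described above: a disk is
// marked full once usage exceeds the high threshold, and only marked good
// again once usage drops below the low threshold, avoiding oscillation.
public class DiskWatermarkCheck {
  private final float highWaterMark;  // disk-full detection
  private final float lowWaterMark;   // disk-not-full detection

  public DiskWatermarkCheck(Configuration conf) {
    highWaterMark = conf.getFloat(
        "yarn.nodemanager.disk-health-checker."
            + "max-disk-utilization-per-disk-percentage", 90.0f);
    float low = conf.getFloat(
        "yarn.nodemanager.disk-health-checker."
            + "disk-utilization-watermark-low-per-disk-percentage",
        highWaterMark);  // both configurations share the same default
    // If the low water mark is configured above the high water mark, clamp it.
    lowWaterMark = Math.min(low, highWaterMark);
  }

  public boolean isDiskFull(float usedPercentage, boolean currentlyFull) {
    float threshold = currentlyFull ? lowWaterMark : highWaterMark;
    return usedPercentage > threshold;
  }
}
{code}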

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch, YARN-3943.001.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It will 
> be better to use two configurations: one is used when disks become full from 
> not-full and the other one is used when disks become not-full from full. So 
> we can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-25 Thread Dian Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907806#comment-14907806
 ] 

Dian Fu commented on YARN-3964:
---

Hi [~Naganarasimha],
Thanks a lot for your review. I have updated the patch according to your comments.
{quote}
CommonNodeLabelsManager.isDistributedNodeLabelConfiguration is set in 
distributed mode and we avoid reading and storing to FileSystemNodeLabelsStore. 
But in your case it's not done, hence the LabelStore edit logs keep increasing 
over time as we are not checking whether the labels are unmodified for a given 
node while replacing. Eventually FileSystemNodeLabelsStore.recover during 
startup (/failover) might become slow.
{quote}
In fact, in delegated-centralized mode, we should also avoid storing the 
node->labels map to FileSystemNodeLabelsStore. The new patch should have fixed 
this.
{quote}
During update, even if one node has labels which are not part of the cluster 
labels, it fails to update for the other nodes; is that fine?
{quote}
This issue also exists when updating multiple nodes' labels through CLI/REST. 
Maybe we can address this in a separate JIRA if we think this is a problem.


> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will provide users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-09-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907787#comment-14907787
 ] 

Varun Saxena commented on YARN-4075:


Test failures are weird. They seem to be triggered by 
java.lang.NoClassDefFoundError: com/lmax/disruptor/WaitStrategy.
A class-not-found error should not be happening here.
The tests pass locally.

> [reader REST API] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4075
> URL: https://issues.apache.org/jira/browse/YARN-4075
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4075-YARN-2928.03.patch, 
> YARN-4075-YARN-2928.04.patch, YARN-4075-YARN-2928.05.patch, 
> YARN-4075-YARN-2928.POC.1.patch, YARN-4075-YARN-2928.POC.2.patch
>
>
> We need to be able to query for flows and flow runs via REST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-25 Thread Dian Fu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated YARN-3964:
--
Attachment: YARN-3964.008.patch

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will provide users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3679) Add documentation for timeline server filter ordering

2015-09-25 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907987#comment-14907987
 ] 

Mit Desai commented on YARN-3679:
-

[~xgong], thanks for looking.
YARN-3624 only changes the timeline server, but that includes the 
authentication part as well. Filter initialization is done during 
authentication, so I have made the change in HttpAuthentication.

> Add documentation for timeline server filter ordering
> -
>
> Key: YARN-3679
> URL: https://issues.apache.org/jira/browse/YARN-3679
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-3679.patch
>
>
> Currently the auth filter is placed before the static user filter by default. 
> After YARN-3624, the filter order is no longer reversed, so the pseudo auth's 
> allow-anonymous config is useless with both filters loaded in the new order, 
> because the static user is created before the request is presented to the auth 
> filter. Users can remove the static user filter from the config to make 
> anonymous access work.
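
As a hedged illustration of the workaround mentioned in the description, removing the static user filter amounts to overriding hadoop.http.filter.initializers in core-site.xml so that org.apache.hadoop.http.lib.StaticUserWebFilter no longer appears in the list. A sketch only; keep whatever other initializers your deployment actually needs:

{code:xml}
<!-- Sketch: load only the authentication filter initializer and drop the
     static user filter so anonymous requests reach the auth filter. -->
<property>
  <name>hadoop.http.filter.initializers</name>
  <value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
</property>
{code}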



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4141) Runtime Application Priority change should not throw exception for applications at finishing states

2015-09-25 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907901#comment-14907901
 ] 

Sunil G commented on YARN-4141:
---

Test case failures are unrelated.

> Runtime Application Priority change should not throw exception for 
> applications at finishing states
> ---
>
> Key: YARN-4141
> URL: https://issues.apache.org/jira/browse/YARN-4141
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4141.patch, 0002-YARN-4141.patch, 
> 0003-YARN-4141.patch, 0004-YARN-4141.patch, 0005-YARN-4141.patch, 
> 0006-YARN-4141.patch, 0007-YARN-4141.patch
>
>
> As suggested by [~jlowe] in 
> [MAPREDUCE-5870-comment|https://issues.apache.org/jira/browse/MAPREDUCE-5870?focusedCommentId=14737035=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14737035]
> , it would be good if YARN could suppress exceptions from 
> change-application-priority calls for applications in their finishing stages.
> Currently it is difficult for clients to handle this. This would be similar to 
> the kill-application behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2973) Capacity scheduler configuration ACLs not work.

2015-09-25 Thread apachehadoop (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908046#comment-14908046
 ] 

apachehadoop commented on YARN-2973:


I have also hit the same problem. Can you tell me how you solved it?

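For reference, the usual cause of this symptom is that yarn.acl.enable is left at its default (false) and the root queue ACLs are left at their default of *; a queue's effective ACL is the union of its own ACL and its ancestors', so everyone keeps submit access to every leaf queue. A hedged sketch of the two extra pieces that are typically needed, on top of the per-queue ACLs quoted below:

{code:xml}
<!-- yarn-site.xml: ACL checking is off unless this is set. -->
<property>
  <name>yarn.acl.enable</name>
  <value>true</value>
</property>

<!-- capacity-scheduler.xml: restrict the root queue, otherwise its default
     of "*" is inherited by every child queue and the per-queue ACLs have no
     effect. A single-space value means "nobody". -->
<property>
  <name>yarn.scheduler.capacity.root.acl_submit_applications</name>
  <value> </value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.acl_administer_queue</name>
  <value> </value>
</property>
{code}
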
> Capacity scheduler configuration ACLs not work.
> ---
>
> Key: YARN-2973
> URL: https://issues.apache.org/jira/browse/YARN-2973
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.5.0
> Environment: ubuntu 12.04, cloudera manager, cdh5.2.1
>Reporter: Jimmy Song
>Assignee: Rohith Sharma K S
>  Labels: acl, capacity-scheduler, yarn
>
> I followed this page to configure YARN: 
> http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html.
> I configured YARN to use the capacity scheduler by setting 
> yarn.resourcemanager.scheduler.class to 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler 
> in yarn-site.xml, and then modified capacity-scheduler.xml:
> ___
> <configuration>
>   <property>
>     <name>yarn.scheduler.capacity.root.queues</name>
>     <value>default,extract,report,tool</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.state</name>
>     <value>RUNNING</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
>     <value>jcsong2, y2</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
>     <value>jcsong2, y2</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.default.capacity</name>
>     <value>35</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.extract.acl_submit_applications</name>
>     <value>jcsong2</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.extract.acl_administer_queue</name>
>     <value>jcsong2</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.extract.capacity</name>
>     <value>15</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.report.acl_submit_applications</name>
>     <value>y2</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.report.acl_administer_queue</name>
>     <value>y2</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.report.capacity</name>
>     <value>35</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.tool.acl_submit_applications</name>
>     <value> </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.tool.acl_administer_queue</name>
>     <value> </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.tool.capacity</name>
>     <value>15</value>
>   </property>
> </configuration>
> ___
> I have enabled ACLs in yarn-site.xml, but the user jcsong2 can submit 
> applications to every queue. The queue ACLs don't work! And the queues use 
> more capacity than they were configured with!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-09-25 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908048#comment-14908048
 ] 

Mit Desai commented on YARN-4183:
-

Here is the scenario. We want the YARN application not to use the timeline 
server during execution, but still use the application history server for the 
logs. This is not possible with the current implementation; it is either both 
or none.

If we check whether application history is enabled, that indirectly tells us 
the timeline service is enabled, because the history server cannot be enabled 
without enabling the timeline server. This way, the system metrics publisher 
can publish events to the history server even if applications do not use the 
timeline server during execution.
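
To make the scenario concrete, the two flags involved are shown below. The intent described above is for the publisher to key off the history flag rather than forcing yarn.timeline-service.enabled onto every client. A hedged yarn-site.xml sketch of the server-side setup, not the behavior of any released version:

{code:xml}
<!-- Generic application history on the server side. -->
<property>
  <name>yarn.timeline-service.generic-application-history.enabled</name>
  <value>true</value>
</property>
<!-- Under the current implementation this must also be true for the system
     metrics publisher to publish, which is what forces every application to
     fetch a timeline delegation token. -->
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
{code}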

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When only the Generic History Server is enabled and the timeline server is 
> not, the system metrics publisher will not publish events to the timeline 
> store, because it checks whether the timeline server and the system metrics 
> publisher are enabled before creating a timeline client.
> To make it work, the timeline service flag has to be turned on, which forces 
> every YARN application to get a delegation token.
> Instead of checking whether the timeline service is enabled, we should check 
> whether the application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2973) Capacity scheduler configuration ACLs not work.

2015-09-25 Thread apachehadoop (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

apachehadoop updated YARN-2973:
---
Environment: ubuntu 12.04, cloudera manager, cdh5.3.2  (was: ubuntu 12.04, 
cloudera manager, cdh5.2.1)

> Capacity scheduler configuration ACLs not work.
> ---
>
> Key: YARN-2973
> URL: https://issues.apache.org/jira/browse/YARN-2973
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.5.0
> Environment: ubuntu 12.04, cloudera manager, cdh5.3.2
>Reporter: Jimmy Song
>Assignee: Rohith Sharma K S
>  Labels: acl, capacity-scheduler, yarn
>
> I followed this page to configure YARN: 
> http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html.
> I configured YARN to use the capacity scheduler by setting 
> yarn.resourcemanager.scheduler.class to 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler 
> in yarn-site.xml, and then modified capacity-scheduler.xml:
> ___
> <configuration>
>   <property>
>     <name>yarn.scheduler.capacity.root.queues</name>
>     <value>default,extract,report,tool</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.state</name>
>     <value>RUNNING</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
>     <value>jcsong2, y2</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
>     <value>jcsong2, y2</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.default.capacity</name>
>     <value>35</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.extract.acl_submit_applications</name>
>     <value>jcsong2</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.extract.acl_administer_queue</name>
>     <value>jcsong2</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.extract.capacity</name>
>     <value>15</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.report.acl_submit_applications</name>
>     <value>y2</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.report.acl_administer_queue</name>
>     <value>y2</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.report.capacity</name>
>     <value>35</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.tool.acl_submit_applications</name>
>     <value> </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.tool.acl_administer_queue</name>
>     <value> </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.tool.capacity</name>
>     <value>15</value>
>   </property>
> </configuration>
> ___
> I have enabled ACLs in yarn-site.xml, but the user jcsong2 can submit 
> applications to every queue. The queue ACLs don't work! And the queues use 
> more capacity than they were configured with!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907892#comment-14907892
 ] 

Hadoop QA commented on YARN-3964:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 45s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   7m 44s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 19s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 57s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  4s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 45s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  3s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  55m 44s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 105m 47s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12762352/YARN-3964.008.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 83e65c5 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9264/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9264/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9264/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9264/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9264/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9264/console |


This message was automatically generated.

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.1.patch
>
>
> Currently, a CLI/REST API is provided in the Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users have to 
> run a cron job to update the labels. This has the following limitations:
> - The cron job needs to run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will give users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity

2015-09-25 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908065#comment-14908065
 ] 

Sunil G commented on YARN-4091:
---

No problem. Thank you [~leftnoteasy] for sharing your thoughts.
bq.I think we can only store the next allocation data once request received,
Perfect. This will also help reduce the memory footprint.

Since we are mostly aligned on the approach, I feel I could come up with a 
prototype here. Is that a good enough starting point?

> Improvement: Introduce more debug/diagnostics information to detail out 
> scheduler activity
> --
>
> Key: YARN-4091
> URL: https://issues.apache.org/jira/browse/YARN-4091
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Improvement on debugdiagnostic information - YARN.pdf
>
>
> As schedulers are improved with various new capabilities, more of the 
> configurations that tune them start to take actions such as limiting the 
> containers assigned to an application or introducing a delay before allocating 
> a container. No clear information is passed from the scheduler to the outside 
> world in these scenarios, which makes debugging much harder.
> This ticket is an effort to introduce more clearly defined states at the 
> various points where the scheduler skips or rejects a container assignment, 
> activates an application, etc. Such information will help users understand 
> what is happening in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve on 
> this as we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)