[jira] [Updated] (YARN-3999) RM hangs on draining events

2015-09-08 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3999:
--
Attachment: YARN-3999-branch-2.6.1.txt

Attaching patch that I committed to 2.6.1.

> RM hangs on draining events
> ---
>
> Key: YARN-3999
> URL: https://issues.apache.org/jira/browse/YARN-3999
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1, 2.7.2
>
> Attachments: YARN-3999-branch-2.6.1.txt, YARN-3999-branch-2.7.patch, 
> YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, 
> YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch
>
>
> If external systems like ATS or ZK become very slow, draining all the 
> events takes a long time. If this takes longer than 10 minutes, all 
> applications will expire. Fixes include:
> 1. Add a timeout and stop the dispatcher even if not all events are drained 
> (sketched below).
> 2. Move the ATS service out of the RM active services so that the RM doesn't 
> need to wait for ATS to flush the events when transitioning to standby.
> 3. Stop client-facing services (ClientRMService etc.) first so that clients 
> get fast notification that the RM is stopping/transitioning.
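To make fix 1 concrete, here is a minimal, self-contained sketch of a bounded drain; the class, field, and timeout names are invented for illustration and this is not the attached patch:
{code}
// Illustrative sketch only: wait for pending events to drain, but never
// longer than a fixed timeout, so a slow downstream system (ATS, ZK) cannot
// block the shutdown or the transition to standby indefinitely.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BoundedDrainExample {
  private static final long DRAIN_TIMEOUT_MS = 60000L;
  private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<Runnable>();

  /** Wait for pending events to drain, giving up once the timeout expires. */
  public void stopWithBoundedDrain() throws InterruptedException {
    long deadline = System.currentTimeMillis() + DRAIN_TIMEOUT_MS;
    while (!eventQueue.isEmpty() && System.currentTimeMillis() < deadline) {
      Thread.sleep(100);  // give the dispatcher thread time to catch up
    }
    // Proceed with shutdown here even if some events were not drained.
  }
}
{code}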



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4047) ClientRMService getApplications has high scheduler lock contention

2015-09-08 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-4047:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. The patch applies cleanly. Ran compilation before the 
push.

> ClientRMService getApplications has high scheduler lock contention
> --
>
> Key: YARN-4047
> URL: https://issues.apache.org/jira/browse/YARN-4047
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1, 2.7.2
>
> Attachments: YARN-4047.001.patch
>
>
> The getApplications call can be particularly expensive because the code can 
> call checkAccess on every application being tracked by the RM. checkAccess 
> will often call scheduler.checkAccess, which will grab the big scheduler 
> lock. This can cause a lot of contention with the scheduler thread, which is 
> busy trying to process node heartbeats, app allocation requests, etc.
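As an illustration of why the ordering of checks matters, here is a small, self-contained sketch of the general idea (run the cheap request filters first, the lock-taking access check last); the types and method names are invented and this is not necessarily what the committed patch does:
{code}
// Minimal sketch: only call the expensive, possibly lock-taking access check
// for applications that already match the cheap per-request filters.
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class FilterBeforeAccessCheck {

  interface AccessChecker {
    boolean checkAccess(String caller, String appId); // may take the scheduler lock
  }

  static class App {
    final String id, user, queue;
    App(String id, String user, String queue) {
      this.id = id; this.user = user; this.queue = queue;
    }
  }

  static List<App> listApps(List<App> apps, Set<String> queues,
      Set<String> users, String caller, AccessChecker checker) {
    List<App> result = new ArrayList<App>();
    for (App app : apps) {
      if (queues != null && !queues.contains(app.queue)) continue; // cheap filter
      if (users != null && !users.contains(app.user)) continue;    // cheap filter
      if (!checker.checkAccess(caller, app.id)) continue;          // expensive check
      result.add(app);
    }
    return result;
  }
}
{code}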



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3999) RM hangs on draining events

2015-09-08 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3999:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1.

Had to fix a couple of minor merge conflicts. Dropped changes to 
TestAsyncDispatcher.java and TestRMAppLogAggregationStatus.java which don't 
exist in 2.6.1.

Ran compilation and TestAppManager, TestResourceManager, TestRMAppTransitions, 
TestRMAppAttemptTransitions, TestUtils, TestFifoScheduler before the push.

> RM hangs on draining events
> ---
>
> Key: YARN-3999
> URL: https://issues.apache.org/jira/browse/YARN-3999
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1, 2.7.2
>
> Attachments: YARN-3999-branch-2.7.patch, YARN-3999.1.patch, 
> YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, 
> YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch
>
>
> If external systems like ATS or ZK become very slow, draining all the 
> events takes a long time. If this takes longer than 10 minutes, all 
> applications will expire. Fixes include:
> 1. Add a timeout and stop the dispatcher even if not all events are drained.
> 2. Move the ATS service out of the RM active services so that the RM doesn't 
> need to wait for ATS to flush the events when transitioning to standby.
> 3. Stop client-facing services (ClientRMService etc.) first so that clients 
> get fast notification that the RM is stopping/transitioning (sketched below).
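To make fix 3 concrete, here is a small, self-contained sketch of stopping services in reverse order of registration so the client-facing service goes down first; the types are invented for illustration and this is not the committed change:
{code}
// Illustrative sketch only: stop the most recently added (client-facing)
// service first, so callers immediately see that the RM is going down instead
// of blocking behind slower internal services.
import java.util.ArrayDeque;
import java.util.Deque;

public class StopOrderExample {

  interface Service { void stop(); }

  private final Deque<Service> services = new ArrayDeque<Service>();

  void add(Service s) { services.push(s); }   // last added is stopped first

  /** Stop in reverse order of addition: client-facing services go down first. */
  void stopAll() {
    while (!services.isEmpty()) {
      services.pop().stop();
    }
  }
}
{code}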



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3978) Configurably turn off the saving of container info in Generic AHS

2015-09-08 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3978:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. The patch applies cleanly for the most part except for 
a couple of minor merge conflicts in test-cases which I fixed.

Ran compilation and TestClientRMService, TestRMContainerImpl, 
TestChildQueueOrder, TestLeafQueue, TestReservations, TestFifoScheduler before 
the push.

> Configurably turn off the saving of container info in Generic AHS
> -
>
> Key: YARN-3978
> URL: https://issues.apache.org/jira/browse/YARN-3978
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver, yarn
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>  Labels: 2.6.1-candidate
> Fix For: 3.0.0, 2.6.1, 2.8.0, 2.7.2
>
> Attachments: YARN-3978.001.patch, YARN-3978.002.patch, 
> YARN-3978.003.patch, YARN-3978.004.patch
>
>
> Depending on how each application's metadata is stored, one week's worth of 
> data stored in the Generic Application History Server's database can grow to 
> almost a terabyte of local disk space. To alleviate this, I suggest adding a 
> configuration option to turn off the saving of non-AM container metadata in 
> the GAHS data store.
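For illustration only, a sketch of how such a switch might be read on the server side. The property name below is a placeholder assumption, not necessarily the name this patch introduces; check the release's yarn-default.xml for the real key.
{code}
// Hypothetical illustration: the key is a placeholder, not confirmed by this patch.
import org.apache.hadoop.conf.Configuration;

public class ContainerInfoSwitchExample {
  // Placeholder key; the actual property name may differ.
  static final String SAVE_NON_AM_CONTAINER_INFO =
      "yarn.timeline-service.generic-application-history.save-non-am-container-meta-info";

  static boolean shouldSaveNonAmContainerInfo(Configuration conf) {
    // Default true so existing behavior is preserved unless explicitly disabled.
    return conf.getBoolean(SAVE_NON_AM_CONTAINER_INFO, true);
  }
}
{code}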



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml

2015-09-08 Thread forrestchen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

forrestchen updated YARN-4022:
--
Attachment: YARN-4022.003.patch

Fix checkstyle

> queue not remove from webpage(/cluster/scheduler) when delete queue in 
> xxx-scheduler.xml
> 
>
> Key: YARN-4022
> URL: https://issues.apache.org/jira/browse/YARN-4022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: forrestchen
>  Labels: scheduler
> Attachments: YARN-4022.001.patch, YARN-4022.002.patch, 
> YARN-4022.003.patch
>
>
> When I delete an existing queue by modifying the xxx-scheduler.xml, I can 
> still see the queue's information block on the web page (/cluster/scheduler), 
> though the 'Min Resources' items all become zero and there is no 'Max Running 
> Applications' item.
> I can still submit an application to the deleted queue and the application 
> will run in the 'root.default' queue instead, but submitting to a 
> non-existent queue causes an exception.
> My expectation is that the deleted queue will not be displayed on the web 
> page and that submitting an application to the deleted queue will behave just 
> as if the queue doesn't exist.
> PS: There is no application running in the queue I delete.
> Some related config in yarn-site.xml:
> {code}
> <property>
>   <name>yarn.scheduler.fair.user-as-default-queue</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.scheduler.fair.allow-undeclared-pools</name>
>   <value>false</value>
> </property>
> {code}
> A related question is here: 
> http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2301) Improve yarn container command

2015-09-08 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2301:
--
Attachment: YARN-2301-branch-2.6.1.txt

Attaching patch that I committed to 2.6.1.

> Improve yarn container command
> --
>
> Key: YARN-2301
> URL: https://issues.apache.org/jira/browse/YARN-2301
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Jian He
>Assignee: Naganarasimha G R
>  Labels: 2.6.1-candidate, usability
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-2301-branch-2.6.1.txt, YARN-2301.01.patch, 
> YARN-2301.03.patch, YARN-2301.20141120-1.patch, YARN-2301.20141203-1.patch, 
> YARN-2301.20141204-1.patch, YARN-2303.patch
>
>
> While running the yarn container -list command, some observations:
> 1) The scheme (e.g. http/https) before the LOG-URL is missing.
> 2) The start-time is printed as milliseconds (e.g. 1405540544844). Better to 
> print it in a time format (see the example below).
> 3) finish-time is 0 if the container is not yet finished. It could be shown 
> as "N/A".
> 4) May also have an option to run it as yarn container -list or yarn 
> application -list-containers with the appId.
> As the attempt Id is not shown on the console, this makes it easier for the 
> user to just copy the appId and run it; it may also be useful for 
> container-preserving AM restart.
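A small runnable illustration of observation 2, formatting the epoch-millisecond start time instead of printing the raw long (sketch only, not the committed CLI change):
{code}
// Format the epoch-millisecond start time (e.g. 1405540544844) as a
// human-readable timestamp instead of the raw long value.
import java.text.SimpleDateFormat;
import java.util.Date;

public class StartTimeFormatExample {
  public static void main(String[] args) {
    long startTimeMillis = 1405540544844L;
    SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss Z yyyy");
    System.out.println(fmt.format(new Date(startTimeMillis)));
  }
}
{code}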



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2301) Improve yarn container command

2015-09-08 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2301:
--
   Labels: 2.6.1-candidate usability  (was: usability)
Fix Version/s: 2.6.1

Pulled this into 2.6.1 as a dependency for YARN-3978. The patch applied 
cleanly; I had to make a minor change to TestYarnCLI to make it work correctly 
on 2.6.1.

Ran compilation and TestYarnCLI, TestClientRMService, TestRMContainerImpl 
before the push.

> Improve yarn container command
> --
>
> Key: YARN-2301
> URL: https://issues.apache.org/jira/browse/YARN-2301
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Jian He
>Assignee: Naganarasimha G R
>  Labels: 2.6.1-candidate, usability
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-2301.01.patch, YARN-2301.03.patch, 
> YARN-2301.20141120-1.patch, YARN-2301.20141203-1.patch, 
> YARN-2301.20141204-1.patch, YARN-2303.patch
>
>
> While running the yarn container -list command, some observations:
> 1) The scheme (e.g. http/https) before the LOG-URL is missing.
> 2) The start-time is printed as milliseconds (e.g. 1405540544844). Better to 
> print it in a time format.
> 3) finish-time is 0 if the container is not yet finished. It could be shown 
> as "N/A".
> 4) May also have an option to run it as yarn container -list or yarn 
> application -list-containers with the appId.
> As the attempt Id is not shown on the console, this makes it easier for the 
> user to just copy the appId and run it; it may also be useful for 
> container-preserving AM restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736193#comment-14736193
 ] 

Hadoop QA commented on YARN-4126:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 59s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   8m  6s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 12s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 50s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 26s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |  23m 18s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | yarn tests |  57m  0s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 125m 24s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.security.token.delegation.web.TestWebDelegationToken |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
|   | hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754796/0004-YARN-4126.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a153b96 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9050/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9050/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9050/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9050/console |


This message was automatically generated.

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch, 0004-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken currently returns a delegation token even 
> in insecure mode. We should not return the token when security is disabled.
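A sketch of the intended guard (not necessarily the exact patch): refuse to issue a delegation token when Hadoop security is disabled.
{code}
// Guard sketch: only issue delegation tokens when security is enabled.
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

public class DelegationTokenGuardExample {
  static void checkSecurityEnabled() throws IOException {
    if (!UserGroupInformation.isSecurityEnabled()) {
      throw new IOException(
          "Delegation tokens are only issued when security is enabled");
    }
  }
}
{code}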



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736177#comment-14736177
 ] 

Hadoop QA commented on YARN-4133:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 57s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m  7s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 20s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 54s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 32s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  54m 34s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  94m 53s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754810/YARN-4133.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a153b96 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9052/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9052/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9052/console |


This message was automatically generated.

> Containers to be preempted leaks in FairScheduler preemption logic.
> ---
>
> Key: YARN-4133
> URL: https://issues.apache.org/jira/browse/YARN-4133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4133.000.patch
>
>
> Containers to be preempted leak in the FairScheduler preemption logic. This 
> may cause preemption to be missed because containers are wrongly removed from 
> {{warnedContainers}}. The problem is in {{preemptResources}}:
> There are two issues that can cause containers to be wrongly removed from 
> {{warnedContainers}}:
> First, the container state {{RMContainerState.ACQUIRED}} is missing from the 
> condition check:
> {code}
> (container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED)
> {code}
> Second, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we 
> shouldn't remove the container from {{warnedContainers}}. We should only 
> remove a container from {{warnedContainers}} if it is not in state 
> {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}}, or 
> {{RMContainerState.ACQUIRED}}.
> {code}
>   if ((container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED) &&
>   isResourceGreaterThanNone(toPreempt)) {
> warnOrKillContainer(container);
> Resources.subtractFrom(toPreempt, 
> container.getContainer().getResource());
>   } else {
> warnedIter.remove();
>   }
> {code}
> Also, once containers in {{warnedContainers}} are wrongly removed, they will 
> never be preempted, because these containers are already in 
> {{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't 
> return containers that are in {{FSAppAttempt#preemptionMap}}.
> {code}
>   public RMContainer preemptContainer() {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("App " + getName() + " is going to preempt a running " +
>   "container");
> }
> RMContainer toBePreempted = null;
> for (RMContainer container : getLiveContainers()) {
>   if (!getPreemptionContainers().contains(container) &&
>   (toBePreempted == null ||
>   comparator.compare(toBePreempted, container) > 0)) {
> toBePreempted = container;
>   }
> }
> return toBePreempted;
>   }
> {code}
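A self-contained sketch of the adjusted check the description argues for; the simplified types below are invented for illustration and this is not the attached YARN-4133.000.patch:
{code}
// Keep a warned container while it is still RUNNING, ALLOCATED, or ACQUIRED,
// and only warn/kill and subtract from toPreempt while more resource is
// actually needed.
import java.util.Iterator;
import java.util.List;

public class WarnedContainersExample {
  enum State { RUNNING, ALLOCATED, ACQUIRED, COMPLETED }

  static class Container {
    final State state;
    final int resource;
    Container(State state, int resource) { this.state = state; this.resource = resource; }
  }

  static void processWarned(List<Container> warnedContainers, int toPreempt) {
    Iterator<Container> warnedIter = warnedContainers.iterator();
    while (warnedIter.hasNext()) {
      Container c = warnedIter.next();
      boolean live = c.state == State.RUNNING
          || c.state == State.ALLOCATED
          || c.state == State.ACQUIRED;   // ACQUIRED added to the check
      if (!live) {
        warnedIter.remove();              // only drop containers that are gone
      } else if (toPreempt > 0) {
        // warnOrKillContainer(c) would be invoked here in the real scheduler
        toPreempt -= c.resource;
      }
      // live containers stay in warnedContainers even when toPreempt is zero
    }
  }
}
{code}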

[jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-09-08 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4074:
--
Attachment: YARN-4074-YARN-2928.POC.004.patch

Posted the v.4 POC patch.

- added the XmlElement annotation for flow runs in the flow activity entity
- rebased against the v.5 patch for YARN-3901
- added more unit tests
- made sure the IDs are set correctly on flow run entities and flow activity 
entities

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch, YARN-4074-YARN-2928.POC.003.patch, 
> YARN-4074-YARN-2928.POC.004.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-08 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736158#comment-14736158
 ] 

Bibin A Chundatt commented on YARN-4106:


The Findbugs report isn't showing anything; looks like a problem with the build report.

> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch
>
>
> Node labels for the NM in distributed mode are not updated even after 
> cluster node labels are added in the RM.
> Steps to reproduce
> ===
> # Configure node labels in distributed mode (see the sketch below):
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start the RM and the NM
> # Once NM registration is done, add node labels in the RM
> The node labels are not getting updated on the RM side.
> *This jira also handles the below issue*
> The timer task for label updates is not getting triggered in the NodeManager 
> for distributed scheduling.
> The task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}.
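For illustration, the reproduction configuration expressed through the Configuration API; the full key for "provider = config" is assumed to be yarn.nodemanager.node-labels.provider and may differ from the actual release.
{code}
// Sketch only; property names other than those listed in the steps above are
// assumptions, not confirmed by this jira.
import org.apache.hadoop.conf.Configuration;

public class DistributedNodeLabelConfExample {
  static Configuration buildConf() {
    Configuration conf = new Configuration();
    conf.set("yarn.node-labels.configuration-type", "distributed");
    // Assumed full key for the "provider = config" step above.
    conf.set("yarn.nodemanager.node-labels.provider", "config");
    // The fetch-interval property from the steps above would also be set here.
    return conf;
  }
}
{code}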



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736140#comment-14736140
 ] 

Sangjin Lee commented on YARN-3901:
---

Somehow the jenkins info didn't make it to the JIRA: 
https://builds.apache.org/job/PreCommit-YARN-Build/9044/

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| -1 | pre-patch | 15m 55s | Findbugs (version ) appears to be broken on YARN-2928. |
| +1 | @author | 0m 1s | The patch does not contain any @author tags. |
| +1 | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | javac | 8m 6s | There were no new javac warning messages. |
| +1 | javadoc | 10m 12s | There were no new javadoc warning messages. |
| +1 | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| +1 | checkstyle | 0m 16s | There were no new checkstyle issues. |
| -1 | whitespace | 0m 31s | The patch has 7 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | install | 1m 34s | mvn install still works. |
| +1 | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. |
| -1 | findbugs | 0m 56s | The patch appears to introduce 7 new Findbugs (version 3.0.0) warnings. |
| +1 | yarn tests | 1m 54s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 40m 33s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-timelineservice |


|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754731/YARN-3901-YARN-2928.5.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / e6afe26 |
| whitespace | 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-timelineservice.html
 |
| hadoop-yarn-server-timelineservice test log | 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9044/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |



> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing this jira to track creation and population of data in the flow run 
> table. Some points being considered:
> - Stores per-flow-run information aggregated across applications, plus the 
> flow version; the RM's collector writes to it on app creation and app 
> completion.
> - The per-app collector writes to it for metric updates at a slower frequency 
> than the metric updates to the application table.
> - Primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values.
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all th

[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736141#comment-14736141
 ] 

Hadoop QA commented on YARN-4131:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  21m 58s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   7m 44s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 56s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 54s | The applied patch generated  4 
new checkstyle issues (total was 32, now 36). |
| {color:red}-1{color} | whitespace |   0m 13s | The patch has 4  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   7m 28s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | mapreduce tests | 118m 56s | Tests failed in 
hadoop-mapreduce-client-jobclient. |
| {color:green}+1{color} | yarn tests |   0m 29s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   6m 55s | Tests failed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  4s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   7m 36s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests |  54m 15s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 243m 43s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.mapred.TestMRIntermediateDataEncryption |
|   | hadoop.yarn.client.api.impl.TestYarnClient |
|   | hadoop.yarn.client.cli.TestYarnCLI |
| Timed out tests | 
org.apache.hadoop.mapreduce.lib.jobcontrol.TestMapReduceJobControl |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754772/YARN-4131-v1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d9c1fab |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9047/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9047/artifact/patchprocess/whitespace.txt
 |
| hadoop-mapreduce-client-jobclient test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9047/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9047/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9047/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9047/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9047/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9047/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9047/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9047/console |


This message was automatically generated.

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch, 
> YARN-4131-v1.patch
>
>
> Per YARN-3337, we need a handy tool to kill containers in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4120) FSAppAttempt.getResourceUsage() should not take preemptedResource into account

2015-09-08 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736130#comment-14736130
 ] 

Xianyin Xin commented on YARN-4120:
---

Created YARN-4134 to track it.

> FSAppAttempt.getResourceUsage() should not take preemptedResource into account
> --
>
> Key: YARN-4120
> URL: https://issues.apache.org/jira/browse/YARN-4120
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Xianyin Xin
>
> When computing resource usage for Schedulables, the following code is 
> involved, {{FSAppAttempt.getResourceUsage}}:
> {code}
> public Resource getResourceUsage() {
>   return Resources.subtract(getCurrentConsumption(), getPreemptedResources());
> }
> {code}
> and this value is aggregated up to the FSLeafQueues and FSParentQueues. In my 
> opinion, taking {{preemptedResource}} into account here is not reasonable, 
> for two main reasons:
> # It is something in the future, i.e., even though these resources are marked 
> as preempted, they are currently used by the app, and they will be subtracted 
> from {{currentConsumption}} once the preemption is finished. It's not 
> reasonable to account for them ahead of time.
> # There's another problem here; consider the following case:
> {code}
>          root
>         /    \
>    queue1    queue2
>     /    \
> queue1.3  queue1.4
> {code}
> Suppose queue1.3 needs resources and it can preempt resources from queue1.4; 
> the preemption happens in the interior of queue1. But when computing the 
> resource usage of queue1, {{queue1.resourceUsage = its_current_resource_usage 
> - preemption}} according to the current code, which is unfair to queue2 when 
> allocating resources.
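A sketch of what the report argues for (illustration only, not a committed patch): report only what the app currently holds, without pre-subtracting resources that are merely marked for preemption.
{code}
// Fragment of FSAppAttempt for illustration, under the assumption that the
// proposal above is adopted; this is not an actual patch.
public Resource getResourceUsage() {
  // Preempted-but-still-running containers keep counting until they are
  // actually released, at which point getCurrentConsumption() drops.
  return getCurrentConsumption();
}
{code}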



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4134) FairScheduler preemption stops at queue level that all child queues are not over their fairshare

2015-09-08 Thread Xianyin Xin (JIRA)
Xianyin Xin created YARN-4134:
-

 Summary: FairScheduler preemption stops at queue level that all 
child queues are not over their fairshare
 Key: YARN-4134
 URL: https://issues.apache.org/jira/browse/YARN-4134
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Xianyin Xin


Now FairScheduler uses a choose-a-candidate method to select, from the leaf 
queues, a container to be preempted, in {{FSParentQueue.preemptContainer()}}:
{code}
readLock.lock();
try {
  for (FSQueue queue : childQueues) {
if (candidateQueue == null ||
comparator.compare(queue, candidateQueue) > 0) {
  candidateQueue = queue;
}
  }
} finally {
  readLock.unlock();
}

// Let the selected queue choose which of its container to preempt
if (candidateQueue != null) {
  toBePreempted = candidateQueue.preemptContainer();
}
{code}
a candidate child queue is selected. However, if the queue's usage isn't over 
its fairshare, preemption will not happen:
{code}
if (!preemptContainerPreCheck()) {
  return toBePreempted;
}
{code}
 A scenario:
{code}
        root
       /    \
  queue1    queue2
   /    \
queue1.3  ( queue1.4 )
{code}
Suppose there are 8 containers and queues at every level have the same weight. 
queue1.3 takes 4 and queue2 takes 4, so both queue1 and queue2 are at their 
fairshare. Now we submit an app in queue1.4 that needs 4 containers; it should 
preempt 2 from queue1.3, but the candidate-container selection procedure will 
stop at a level where none of the child queues are over their fairshare, and 
none of the containers will be preempted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736112#comment-14736112
 ] 

Hadoop QA commented on YARN-4106:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 37s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  4s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 36s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 15s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   7m 36s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  46m 34s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754800/0006-YARN-4106.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a153b96 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9051/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9051/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9051/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9051/console |


This message was automatically generated.

> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch
>
>
> Node labels for the NM in distributed mode are not updated even after 
> cluster node labels are added in the RM.
> Steps to reproduce
> ===
> # Configure node labels in distributed mode:
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start the RM and the NM
> # Once NM registration is done, add node labels in the RM
> The node labels are not getting updated on the RM side.
> *This jira also handles the below issue*
> The timer task for label updates is not getting triggered in the NodeManager 
> for distributed scheduling.
> The task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4120) FSAppAttempt.getResourceUsage() should not take preemptedResource into account

2015-09-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736100#comment-14736100
 ] 

Karthik Kambatla commented on YARN-4120:


That is also a valid concern. Can we track it in a separate JIRA? 

The preemption logic definitely needs revisiting. YARN-2154 is a starting 
point. [~asuresh] and I have been considering significant logic changes to 
better accommodate both preemption and future features like node-labeling, but 
haven't found the time to write it up and post here. 



> FSAppAttempt.getResourceUsage() should not take preemptedResource into account
> --
>
> Key: YARN-4120
> URL: https://issues.apache.org/jira/browse/YARN-4120
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Xianyin Xin
>
> When computing resource usage for Schedulables, the following code is 
> involved, {{FSAppAttempt.getResourceUsage}}:
> {code}
> public Resource getResourceUsage() {
>   return Resources.subtract(getCurrentConsumption(), getPreemptedResources());
> }
> {code}
> and this value is aggregated up to the FSLeafQueues and FSParentQueues. In my 
> opinion, taking {{preemptedResource}} into account here is not reasonable, 
> for two main reasons:
> # It is something in the future, i.e., even though these resources are marked 
> as preempted, they are currently used by the app, and they will be subtracted 
> from {{currentConsumption}} once the preemption is finished. It's not 
> reasonable to account for them ahead of time.
> # There's another problem here; consider the following case:
> {code}
>          root
>         /    \
>    queue1    queue2
>     /    \
> queue1.3  queue1.4
> {code}
> Suppose queue1.3 needs resources and it can preempt resources from queue1.4; 
> the preemption happens in the interior of queue1. But when computing the 
> resource usage of queue1, {{queue1.resourceUsage = its_current_resource_usage 
> - preemption}} according to the current code, which is unfair to queue2 when 
> allocating resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.

2015-09-08 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4133:

Attachment: YARN-4133.000.patch

> Containers to be preempted leaks in FairScheduler preemption logic.
> ---
>
> Key: YARN-4133
> URL: https://issues.apache.org/jira/browse/YARN-4133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4133.000.patch
>
>
> Containers to be preempted leak in the FairScheduler preemption logic. This 
> may cause preemption to be missed because containers are wrongly removed from 
> {{warnedContainers}}. The problem is in {{preemptResources}}:
> There are two issues that can cause containers to be wrongly removed from 
> {{warnedContainers}}:
> First, the container state {{RMContainerState.ACQUIRED}} is missing from the 
> condition check:
> {code}
> (container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED)
> {code}
> Second, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we 
> shouldn't remove the container from {{warnedContainers}}. We should only 
> remove a container from {{warnedContainers}} if it is not in state 
> {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}}, or 
> {{RMContainerState.ACQUIRED}}.
> {code}
>   if ((container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED) &&
>   isResourceGreaterThanNone(toPreempt)) {
> warnOrKillContainer(container);
> Resources.subtractFrom(toPreempt, 
> container.getContainer().getResource());
>   } else {
> warnedIter.remove();
>   }
> {code}
> Also, once containers in {{warnedContainers}} are wrongly removed, they will 
> never be preempted, because these containers are already in 
> {{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't 
> return containers that are in {{FSAppAttempt#preemptionMap}}.
> {code}
>   public RMContainer preemptContainer() {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("App " + getName() + " is going to preempt a running " +
>   "container");
> }
> RMContainer toBePreempted = null;
> for (RMContainer container : getLiveContainers()) {
>   if (!getPreemptionContainers().contains(container) &&
>   (toBePreempted == null ||
>   comparator.compare(toBePreempted, container) > 0)) {
> toBePreempted = container;
>   }
> }
> return toBePreempted;
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.

2015-09-08 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4133:

Attachment: (was: YARN-4133.000.patch)

> Containers to be preempted leaks in FairScheduler preemption logic.
> ---
>
> Key: YARN-4133
> URL: https://issues.apache.org/jira/browse/YARN-4133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4133.000.patch
>
>
> Containers to be preempted leak in the FairScheduler preemption logic. This 
> may cause preemption to be missed because containers are wrongly removed from 
> {{warnedContainers}}. The problem is in {{preemptResources}}:
> There are two issues that can cause containers to be wrongly removed from 
> {{warnedContainers}}:
> First, the container state {{RMContainerState.ACQUIRED}} is missing from the 
> condition check:
> {code}
> (container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED)
> {code}
> Second, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we 
> shouldn't remove the container from {{warnedContainers}}. We should only 
> remove a container from {{warnedContainers}} if it is not in state 
> {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}}, or 
> {{RMContainerState.ACQUIRED}}.
> {code}
>   if ((container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED) &&
>   isResourceGreaterThanNone(toPreempt)) {
> warnOrKillContainer(container);
> Resources.subtractFrom(toPreempt, 
> container.getContainer().getResource());
>   } else {
> warnedIter.remove();
>   }
> {code}
> Also, once containers in {{warnedContainers}} are wrongly removed, they will 
> never be preempted, because these containers are already in 
> {{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't 
> return containers that are in {{FSAppAttempt#preemptionMap}}.
> {code}
>   public RMContainer preemptContainer() {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("App " + getName() + " is going to preempt a running " +
>   "container");
> }
> RMContainer toBePreempted = null;
> for (RMContainer container : getLiveContainers()) {
>   if (!getPreemptionContainers().contains(container) &&
>   (toBePreempted == null ||
>   comparator.compare(toBePreempted, container) > 0)) {
> toBePreempted = container;
>   }
> }
> return toBePreempted;
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2015-09-08 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736068#comment-14736068
 ] 

Xianyin Xin commented on YARN-4090:
---

Hi [~leftnoteasy], [~kasha], would you please take a look? Since this change 
is related to preemption, I have linked it with YARN-4120.

> Make Collections.sort() more efficient in FSParentQueue.java
> 
>
> Key: YARN-4090
> URL: https://issues.apache.org/jira/browse/YARN-4090
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, 
> sampling1.jpg, sampling2.jpg
>
>
> Collections.sort() consumes too much time in a scheduling round.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736052#comment-14736052
 ] 

Hadoop QA commented on YARN-4133:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 41s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 52s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  3s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 25s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  54m 10s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m  6s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754780/YARN-4133.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d9c1fab |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9049/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9049/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9049/console |


This message was automatically generated.

> Containers to be preempted leaks in FairScheduler preemption logic.
> ---
>
> Key: YARN-4133
> URL: https://issues.apache.org/jira/browse/YARN-4133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4133.000.patch
>
>
> Containers to be preempted leak in the FairScheduler preemption logic. This 
> may cause preemption to be missed because containers are wrongly removed from 
> {{warnedContainers}}. The problem is in {{preemptResources}}:
> There are two issues that can cause containers to be wrongly removed from 
> {{warnedContainers}}:
> First, the container state {{RMContainerState.ACQUIRED}} is missing from the 
> condition check:
> {code}
> (container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED)
> {code}
> Second, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we 
> shouldn't remove the container from {{warnedContainers}}. We should only 
> remove a container from {{warnedContainers}} if it is not in state 
> {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}}, or 
> {{RMContainerState.ACQUIRED}}.
> {code}
>   if ((container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED) &&
>   isResourceGreaterThanNone(toPreempt)) {
> warnOrKillContainer(container);
> Resources.subtractFrom(toPreempt, 
> container.getContainer().getResource());
>   } else {
> warnedIter.remove();
>   }
> {code}
> Also, once containers in {{warnedContainers}} are wrongly removed, they will 
> never be preempted, because these containers are already in 
> {{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't 
> return containers that are in {{FSAppAttempt#preemptionMap}}.
> {code}
>   public RMContainer preemptContainer() {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("App " + getName() + " is going to preempt a running " +
>   "container");
> }
> RMContainer toBePreempted = null;
> for (RMContainer container : getLiveContainers()) {
>   if (!getPreemptionContainers().contains(container) &&
>   (toBePreempted == null ||
>   comparator.compare(toBePreempted, container) > 0)) {
> toBePreempted = container;
>   }
> }
> return toBePreempted;
>   }
> {code}

[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-08 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736025#comment-14736025
 ] 

Bibin A Chundatt commented on YARN-4106:


Hi [~leftnoteasy],
Thanks for the comments. Updated patch uploaded.

> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch
>
>
> Node labels for the NM in distributed mode are not updated even after 
> cluster node labels are added in the RM.
> Steps to reproduce
> ===
> # Configure node labels in distributed mode:
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start the RM and the NM
> # Once NM registration is done, add node labels in the RM
> The node labels are not getting updated on the RM side.
> *This jira also handles the below issue*
> The timer task for label updates is not getting triggered in the NodeManager 
> for distributed scheduling.
> The task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-08 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4106:
---
Attachment: 0006-YARN-4106.patch

> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, 
> 0006-YARN-4106.patch
>
>
> Node labels for the NM in distributed mode are not updated even after 
> cluster node labels are added in the RM.
> Steps to reproduce
> ===
> # Configure node labels in distributed mode:
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start the RM and the NM
> # Once NM registration is done, add node labels in the RM
> The node labels are not getting updated on the RM side.
> *This jira also handles the below issue*
> The timer task for label updates is not getting triggered in the NodeManager 
> for distributed scheduling.
> The task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4126:
---
Attachment: 0004-YARN-4126.patch

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch, 0004-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken currently returns a delegation token even 
> in insecure mode. We should not return the token when security is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.

2015-09-08 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735987#comment-14735987
 ] 

Xianyin Xin commented on YARN-4133:
---

Of course, we can also address these problems one by one in different jiras. 
If you prefer that, just ignore the above comment.

> Containers to be preempted leaks in FairScheduler preemption logic.
> ---
>
> Key: YARN-4133
> URL: https://issues.apache.org/jira/browse/YARN-4133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4133.000.patch
>
>
> Containers to be preempted leak in the FairScheduler preemption logic. This 
> may cause missed preemptions because containers are wrongly removed from 
> {{warnedContainers}}. The problem is in {{preemptResources}}:
> There are two issues which can cause containers to be wrongly removed from 
> {{warnedContainers}}:
> First, the container state {{RMContainerState.ACQUIRED}} is missing from the 
> condition check:
> {code}
> (container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED)
> {code}
> Second, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we 
> shouldn't remove the container from {{warnedContainers}}. We should only 
> remove a container from {{warnedContainers}} if it is not in state 
> {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}}, or 
> {{RMContainerState.ACQUIRED}}.
> {code}
>   if ((container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED) &&
>   isResourceGreaterThanNone(toPreempt)) {
> warnOrKillContainer(container);
> Resources.subtractFrom(toPreempt, 
> container.getContainer().getResource());
>   } else {
> warnedIter.remove();
>   }
> {code}
> Also, once containers in {{warnedContainers}} are wrongly removed, they will 
> never be preempted, because they are already in 
> {{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't 
> return containers that are in {{FSAppAttempt#preemptionMap}}.
> {code}
>   public RMContainer preemptContainer() {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("App " + getName() + " is going to preempt a running " +
>   "container");
> }
> RMContainer toBePreempted = null;
> for (RMContainer container : getLiveContainers()) {
>   if (!getPreemptionContainers().contains(container) &&
>   (toBePreempted == null ||
>   comparator.compare(toBePreempted, container) > 0)) {
> toBePreempted = container;
>   }
> }
> return toBePreempted;
>   }
> {code}
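A minimal sketch of the adjusted check described above, reusing the names from the quoted snippet; this is illustrative only, not the attached YARN-4133.000.patch.
{code}
// Treat ACQUIRED containers as still live, and never drop a live container
// from warnedContainers just because toPreempt has already reached zero.
boolean stillLive =
    container.getState() == RMContainerState.RUNNING ||
    container.getState() == RMContainerState.ALLOCATED ||
    container.getState() == RMContainerState.ACQUIRED;
if (stillLive) {
  if (isResourceGreaterThanNone(toPreempt)) {
    warnOrKillContainer(container);
    Resources.subtractFrom(toPreempt,
        container.getContainer().getResource());
  }
} else {
  // Only containers that are no longer live leave warnedContainers.
  warnedIter.remove();
}
{code}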



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.

2015-09-08 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4133:

Description: 
Containers to be preempted leak in the FairScheduler preemption logic. This 
may cause missed preemptions because containers are wrongly removed from 
{{warnedContainers}}. The problem is in {{preemptResources}}:
There are two issues which can cause containers to be wrongly removed from 
{{warnedContainers}}:
First, the container state {{RMContainerState.ACQUIRED}} is missing from the 
condition check:
{code}
(container.getState() == RMContainerState.RUNNING ||
  container.getState() == RMContainerState.ALLOCATED)
{code}
Second, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we 
shouldn't remove the container from {{warnedContainers}}. We should only 
remove a container from {{warnedContainers}} if it is not in state 
{{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}}, or 
{{RMContainerState.ACQUIRED}}.
{code}
  if ((container.getState() == RMContainerState.RUNNING ||
  container.getState() == RMContainerState.ALLOCATED) &&
  isResourceGreaterThanNone(toPreempt)) {
warnOrKillContainer(container);
Resources.subtractFrom(toPreempt, 
container.getContainer().getResource());
  } else {
warnedIter.remove();
  }
{code}
Also, once containers in {{warnedContainers}} are wrongly removed, they will 
never be preempted, because they are already in 
{{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't 
return containers that are in {{FSAppAttempt#preemptionMap}}.
{code}
  public RMContainer preemptContainer() {
if (LOG.isDebugEnabled()) {
  LOG.debug("App " + getName() + " is going to preempt a running " +
  "container");
}

RMContainer toBePreempted = null;
for (RMContainer container : getLiveContainers()) {
  if (!getPreemptionContainers().contains(container) &&
  (toBePreempted == null ||
  comparator.compare(toBePreempted, container) > 0)) {
toBePreempted = container;
  }
}
return toBePreempted;
  }
{code}

  was:
Containers to be preempted leaks in FairScheduler preemption logic. It may 
cause missing preemption due to containers in {{warnedContainers}} wrongly 
removed. The problem is in {{preemptResources}}:
There are two issues which can cause containers  wrongly removed from 
{{warnedContainers}}:
Firstly missing the container state {{RMContainerState.ACQUIRED}} in the 
condition check:
{code}
(container.getState() == RMContainerState.RUNNING ||
  container.getState() == RMContainerState.ALLOCATED)
{code}
Secondly if  {{isResourceGreaterThanNone(toPreempt)}} return false, we 
shouldn't remove container from {{warnedContainers}}, We should only remove 
container from {{warnedContainers}}, if container is not in state 
{{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} and 
{{RMContainerState.ACQUIRED}}.
{code}
  if ((container.getState() == RMContainerState.RUNNING ||
  container.getState() == RMContainerState.ALLOCATED) &&
  isResourceGreaterThanNone(toPreempt)) {
warnOrKillContainer(container);
Resources.subtractFrom(toPreempt, 
container.getContainer().getResource());
  } else {
warnedIter.remove();
  }
{code}


> Containers to be preempted leaks in FairScheduler preemption logic.
> ---
>
> Key: YARN-4133
> URL: https://issues.apache.org/jira/browse/YARN-4133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4133.000.patch
>
>
> Containers to be preempted leak in the FairScheduler preemption logic. This 
> may cause missed preemptions because containers are wrongly removed from 
> {{warnedContainers}}. The problem is in {{preemptResources}}:
> There are two issues which can cause containers to be wrongly removed from 
> {{warnedContainers}}:
> First, the container state {{RMContainerState.ACQUIRED}} is missing from the 
> condition check:
> {code}
> (container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED)
> {code}
> Second, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we 
> shouldn't remove the container from {{warnedContainers}}. We should only 
> remove a container from {{warnedContainers}} if it is not in state 
> {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}}, or 
> {{RMContainerState.ACQUIRED}}.
> {code}
>   if ((container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED) &&
>   isResourceGreaterThanNone(toPreempt)) {
> warnOrKillContainer(container);
> Resources.subtractFrom(toPreempt, 
> container.getContainer().getResource());
>   } else {
> warnedIter.remove();
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.

2015-09-08 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735959#comment-14735959
 ] 

Xianyin Xin commented on YARN-4133:
---

Hi [~zxu], it seems the current preemption logic has many problems. I just 
updated one in 
[https://issues.apache.org/jira/browse/YARN-4120?focusedCommentId=14735952&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14735952].
 I think a logic refactor is needed; what do you think?

> Containers to be preempted leaks in FairScheduler preemption logic.
> ---
>
> Key: YARN-4133
> URL: https://issues.apache.org/jira/browse/YARN-4133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4133.000.patch
>
>
> Containers to be preempted leak in the FairScheduler preemption logic. This 
> may cause missed preemptions because containers are wrongly removed from 
> {{warnedContainers}}. The problem is in {{preemptResources}}:
> There are two issues which can cause containers to be wrongly removed from 
> {{warnedContainers}}:
> First, the container state {{RMContainerState.ACQUIRED}} is missing from the 
> condition check:
> {code}
> (container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED)
> {code}
> Second, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we 
> shouldn't remove the container from {{warnedContainers}}. We should only 
> remove a container from {{warnedContainers}} if it is not in state 
> {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}}, or 
> {{RMContainerState.ACQUIRED}}.
> {code}
>   if ((container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED) &&
>   isResourceGreaterThanNone(toPreempt)) {
> warnOrKillContainer(container);
> Resources.subtractFrom(toPreempt, 
> container.getContainer().getResource());
>   } else {
> warnedIter.remove();
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4120) FSAppAttempt.getResourceUsage() should not take preemptedResource into account

2015-09-08 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735952#comment-14735952
 ] 

Xianyin Xin commented on YARN-4120:
---

Hi [~kasha], there's another issue in the current preemption logic, it's in 
{{FSParentQueue.java}} and {{FSLeafQueue.java}},
{code}
  public RMContainer preemptContainer() {
RMContainer toBePreempted = null;

// Find the childQueue which is most over fair share
FSQueue candidateQueue = null;
Comparator comparator = policy.getComparator();

readLock.lock();
try {
  for (FSQueue queue : childQueues) {
if (candidateQueue == null ||
comparator.compare(queue, candidateQueue) > 0) {
  candidateQueue = queue;
}
  }
} finally {
  readLock.unlock();
}

// Let the selected queue choose which of its container to preempt
if (candidateQueue != null) {
  toBePreempted = candidateQueue.preemptContainer();
}
return toBePreempted;
  }
{code}
{code}
  public RMContainer preemptContainer() {
RMContainer toBePreempted = null;

// If this queue is not over its fair share, reject
if (!preemptContainerPreCheck()) {
  return toBePreempted;
}
{code}
If the queue hierarchy is like the one in the *Description*, suppose queue1 
and queue2 have the same weight, and the cluster has 8 containers, 4 occupied 
by queue1.1 and 4 occupied by queue2. If a new app is added to queue1.2, 2 
containers should be preempted from queue1.1. However, according to the above 
code, queue1 and queue2 are both at their fair share, so the preemption will 
not happen.

So if all of the child queues at every level are at their fair share, 
preemption will not happen even though there are resource deficits in some 
leaf queues.

I think we have to drop this logic in this case. As a candidate, we can 
calculate an ideal preemption distribution by traversing the queues. Any 
thoughts?
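A rough sketch of that idea, i.e. collecting per-leaf deficits by traversing the queue tree instead of relying only on the top-level fair-share comparison. The accessor names follow the FairScheduler code quoted in this thread but should be treated as assumptions, not as a proposed patch.
{code}
// Illustrative traversal only. It records, per leaf queue, how far the queue
// is below its fair share; that map could then drive how much to preempt on
// each leaf queue's behalf.
private void collectDeficits(FSQueue queue, Map<FSLeafQueue, Resource> deficits) {
  if (queue instanceof FSLeafQueue) {
    Resource deficit =
        Resources.subtract(queue.getFairShare(), queue.getResourceUsage());
    // record only queues that are actually below their fair share
    if (deficit.getMemory() > 0 || deficit.getVirtualCores() > 0) {
      deficits.put((FSLeafQueue) queue, deficit);
    }
  } else {
    for (FSQueue child : ((FSParentQueue) queue).getChildQueues()) {
      collectDeficits(child, deficits);
    }
  }
}
{code}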

> FSAppAttempt.getResourceUsage() should not take preemptedResource into account
> --
>
> Key: YARN-4120
> URL: https://issues.apache.org/jira/browse/YARN-4120
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Xianyin Xin
>
> When computing resource usage for Schedulables, the following code is 
> involved,
> {{FSAppAttempt.getResourceUsage}},
> {code}
> public Resource getResourceUsage() {
>   return Resources.subtract(getCurrentConsumption(), getPreemptedResources());
> }
> {code}
> and this value is aggregated into FSLeafQueues and FSParentQueues. In my 
> opinion, taking {{preemptedResource}} into account here is not reasonable, 
> for two main reasons:
> # it is something in the future, i.e., even though these resources are 
> marked as preempted, they are currently used by the app, and they will be 
> subtracted from {{currentConsumption}} once the preemption is finished. It's 
> not reasonable to make arrangements for them ahead of time. 
> # there's another problem here; consider the following case,
> {code}
> root
>/\
>   queue1   queue2
>   /\
> queue1.3, queue1.4
> {code}
> suppose queue1.3 needs resources and it can preempt resources from 
> queue1.4; the preemption happens in the interior of queue1. But when 
> computing the resource usage of queue1, {{queue1.resourceUsage = 
> its_current_resource_usage - preemption}} according to the current code, 
> which is unfair to queue2 when doing resource allocation.
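A minimal sketch of the change being argued for here (dropping the subtraction); illustrative only, since whether and how to do this is exactly what is under discussion.
{code}
// FSAppAttempt sketch: report what the app actually holds right now.
// Containers marked for preemption stop counting only once preemption
// completes and currentConsumption is actually reduced.
public Resource getResourceUsage() {
  return getCurrentConsumption();
}
{code}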



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.

2015-09-08 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4133:

Attachment: YARN-4133.000.patch

> Containers to be preempted leaks in FairScheduler preemption logic.
> ---
>
> Key: YARN-4133
> URL: https://issues.apache.org/jira/browse/YARN-4133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4133.000.patch
>
>
> Containers to be preempted leak in the FairScheduler preemption logic. This 
> may cause missed preemptions because containers are wrongly removed from 
> {{warnedContainers}}. The problem is in {{preemptResources}}:
> There are two issues which can cause containers to be wrongly removed from 
> {{warnedContainers}}:
> First, the container state {{RMContainerState.ACQUIRED}} is missing from the 
> condition check:
> {code}
> (container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED)
> {code}
> Second, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we 
> shouldn't remove the container from {{warnedContainers}}. We should only 
> remove a container from {{warnedContainers}} if it is not in state 
> {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}}, or 
> {{RMContainerState.ACQUIRED}}.
> {code}
>   if ((container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED) &&
>   isResourceGreaterThanNone(toPreempt)) {
> warnOrKillContainer(container);
> Resources.subtractFrom(toPreempt, 
> container.getContainer().getResource());
>   } else {
> warnedIter.remove();
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4086) Allow Aggregated Log readers to handle HAR files

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735947#comment-14735947
 ] 

Hadoop QA commented on YARN-4086:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 39s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 6 new or modified test files. |
| {color:green}+1{color} | javac |   7m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  7s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 24s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   6m 55s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  2s | Tests passed in 
hadoop-yarn-common. |
| | |  51m  4s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754773/YARN-4086.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d9c1fab |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9048/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9048/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9048/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9048/console |


This message was automatically generated.

> Allow Aggregated Log readers to handle HAR files
> 
>
> Key: YARN-4086
> URL: https://issues.apache.org/jira/browse/YARN-4086
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: YARN-4086.001.patch, YARN-4086.002.patch
>
>
> This is for the YARN changes for MAPREDUCE-6415.  It allows the yarn CLI and 
> web UIs to read aggregated logs from HAR files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-08 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735900#comment-14735900
 ] 

Joep Rottinghuis commented on YARN-3901:


The one remaining issue we have to tackle is when there are two app attempts. 
The previous app attempt ends up buffering some writes, and the new app attempt 
ends up writing a final_value.
Now if the flush happens before the first attempt's write comes in, we no 
longer have the unaggregated value for that app_id to discard against (the 
timestamp should have taken care of this ordering).
We can deal with this issue in three ways:
1) Ignore (risky and very hard to debug if it ever happens)
2) Keep the final value around until it has aged a certain time. Upside is that 
the value is initially kept (for, for example, 1-2 days?) and then later 
discarded. Downside is that we won't collapse values as quickly on flush as we 
can. The collapse would probably happen when a compaction happens, possibly 
only when a major compaction happens. But previous unaggregated values may have 
been written to disk anyway, so not sure how much of an issue this really is.
3) keep a list of the last x app_ids (aggregation compaction dimension values) 
on the aggregated flow-level data. What we would then do in the aggregator is 
to go through all the values as we currently do. We'd collapse all the values 
to keep only the latest per flow. Before we sum an item for the flow, we'd 
check whether the app_id is in the list of the most recent x (10) apps that 
were completed and collapsed. 
Pro is that with a lower app completion rate in a flow, we'd be guarded against 
stale writes for longer than a fixed time period. We'd still limit the size of 
extra storage in tags to a list of x (10?) items.
Downside is that if apps complete in very rapid succession, we would 
potentially be protected from stale writes from an app for a shorter period of 
time. Given that there is a correlation between an app completion and its 
previous run, this may not be a huge factor. It's not like random previous app 
attempts are launched. This is really to cover the case when a new app attempt 
is launched, but the previous writer had some buffered writes that somehow 
still got through.

I'm sort of tempted towards 2, since that is the most similar to the existing 
TTL functionality, and probably the easiest to code and understand. Simply 
compact only after a certain time period has passed.
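A tiny sketch of what option 2 amounts to, assuming a cell timestamp and a new, hypothetical {{collapseRetentionMillis}} setting are available at compaction time.
{code}
// Only fold a completed app's cell into the flow-level aggregate once it has
// aged past the retention window, so a late write from an earlier attempt can
// still be matched against it and discarded.
boolean oldEnough = (now - cellTimestamp) > collapseRetentionMillis;
if (appCompleted && oldEnough) {
  // safe to collapse this cell into the flow sum and drop the per-app value
} else {
  // keep the per-app cell for now; it may still be needed to discard
  // stale writes from a previous app attempt
}
{code}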

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-08 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735887#comment-14735887
 ] 

Joep Rottinghuis commented on YARN-3901:


Thanks [~vrushalic]. I'm going to dig through the details on the latest patch.
Separately [~sjlee0] and I further discussed the challenges of taking the 
timestamp on the coprocessor, buffering writes, app restarts, timestamp 
collisions, and the ordering of various writes that come in.

1) Given that we have timestamps in millis, multiplying by 1,000 should 
suffice. It is unlikely that we'd have > 1M writes for one column in one region 
server for one flow. If we multiply by 1M we get close to the total date range 
that can fit in a long (still years to come, but still).

2) If we do any shifting of time, we should do the same everywhere to keep 
things consistent, and to keep the ability to ask what a particular row 
(roughly) looked like at any particular time (like last night midnight, what 
was the state of this entire row).

3) We think in the column helper, if the ATS client supplies a timestamp, we 
should multiply by 1,000. If we read any timestamp from HBase, we'll divide by 
1,000.

4) If the ATS client doesn't supply the timestamp, we'll grab the timestamp in 
the ats writer the moment the write arrives (and before it is batched / 
buffered in the buffered mutator, HBase client, or RS queue). We then take this 
time and multiply by 1,000. Reads again divide by 1,000 to get back to millis 
in epoch as before.

5) For the Agg operations SUM, MIN, and MAX we take the least significant 3 
digits of the app_id and add this to the (timestamp*1000), so that we create 
a unique timestamp per app in an active flow-run. This should avoid any 
collisions.
This takes care of uniqueness (no collisions within a single ms), but also 
solves for older instances of a writer (in the case of a second AM attempt, 
for example) or any other kind of ordering issue. The writes are timestamped 
when they arrive at the writer.

6) If some piece of client code doesn't set any timestamp (this should be an 
error) then we cannot effectively order the writes as per the previous point. 
We still need to ensure that we don't have collisions. If the client-supplied 
timestamp is Long.MAX_VALUE, then we can generate the timestamp in the 
coprocessor on the server side, modulo the counter to ensure uniqueness. We 
should still multiply by 1,000 to leave the same amount of space for the 
unique counter.
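A small sketch of the encoding described in points 3) through 5); the helper names are made up for illustration and are not the actual patch.
{code}
// Shift epoch millis left by 3 decimal digits and use the last 3 digits of
// the app id's sequence number as a per-app disambiguator within the same ms.
public static long encodeCellTimestamp(long epochMillis, long appIdSequenceNum) {
  return epochMillis * 1000L + (appIdSequenceNum % 1000L);
}

// Reads divide back down so callers see plain epoch millis again.
public static long decodeCellTimestamp(long cellTimestamp) {
  return cellTimestamp / 1000L;
}
{code}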

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-4086) Allow Aggregated Log readers to handle HAR files

2015-09-08 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-4086:

Attachment: YARN-4086.002.patch

The 002 patch makes that test less brittle.  I also fixed the RAT and 
checkstyle warnings.  The test failure was because test-patch couldn't handle 
the binary part of the patch.

> Allow Aggregated Log readers to handle HAR files
> 
>
> Key: YARN-4086
> URL: https://issues.apache.org/jira/browse/YARN-4086
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: YARN-4086.001.patch, YARN-4086.002.patch
>
>
> This is for the YARN changes for MAPREDUCE-6415.  It allows the yarn CLI and 
> web UIs to read aggregated logs from HAR files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4131) Add API and CLI to kill container on given containerId

2015-09-08 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4131:
-
Attachment: YARN-4131-v1.patch

Updated the patch with the following changes:
1. Add ContainerKilledType in KillContainerRequest to indicate whether the 
container will be killed as preempted or expired (failed).
2. Add an async call in YarnClient per Steve's comments above.
3. Add more unit tests and fix build failures.

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch, 
> YARN-4131-v1.patch
>
>
> Per YARN-3337, we need a handy tool to kill containers in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3999) RM hangs on draining events

2015-09-08 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3999:
--
Labels: 2.6.1-candidate  (was: )

Adding to 2.6.1 per Jian's comment on the mailing list, which I had missed 
before.

> RM hangs on draining events
> ---
>
> Key: YARN-3999
> URL: https://issues.apache.org/jira/browse/YARN-3999
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
>  Labels: 2.6.1-candidate
> Fix For: 2.7.2
>
> Attachments: YARN-3999-branch-2.7.patch, YARN-3999.1.patch, 
> YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, 
> YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch
>
>
> If external systems like ATS, or ZK becomes very slow, draining all the 
> events take a lot of time. If this time becomes larger than 10 mins, all 
> applications will expire. Fixes include:
> 1. add a timeout and stop the dispatcher even if not all events are drained.
> 2. Move ATS service out from RM active service so that RM doesn't need to 
> wait for ATS to flush the events when transitioning to standby.
> 3. Stop client-facing services (ClientRMService etc.) first so that clients 
> get fast notification that RM is stopping/transitioning.
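As an illustration of fix (1) only, a bounded drain could look roughly like this; a sketch, not the attached patch, and {{drainTimeoutMillis}} is a hypothetical setting.
{code}
// Wait for queued events to drain on stop, but give up after a timeout so a
// slow downstream system (ATS/ZK) cannot block the dispatcher forever.
void drainWithTimeout(java.util.concurrent.BlockingQueue<Event> eventQueue,
    long drainTimeoutMillis) throws InterruptedException {
  long deadline = System.currentTimeMillis() + drainTimeoutMillis;
  while (!eventQueue.isEmpty() && System.currentTimeMillis() < deadline) {
    Thread.sleep(100);
  }
  // If events remain, proceed with the stop anyway and let callers log it.
}
{code}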



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.

2015-09-08 Thread zhihai xu (JIRA)
zhihai xu created YARN-4133:
---

 Summary: Containers to be preempted leaks in FairScheduler 
preemption logic.
 Key: YARN-4133
 URL: https://issues.apache.org/jira/browse/YARN-4133
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.1
Reporter: zhihai xu
Assignee: zhihai xu


Containers to be preempted leak in the FairScheduler preemption logic. This 
may cause missed preemptions because containers are wrongly removed from 
{{warnedContainers}}. The problem is in {{preemptResources}}:
There are two issues which can cause containers to be wrongly removed from 
{{warnedContainers}}:
First, the container state {{RMContainerState.ACQUIRED}} is missing from the 
condition check:
{code}
(container.getState() == RMContainerState.RUNNING ||
  container.getState() == RMContainerState.ALLOCATED)
{code}
Second, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we 
shouldn't remove the container from {{warnedContainers}}. We should only 
remove a container from {{warnedContainers}} if it is not in state 
{{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}}, or 
{{RMContainerState.ACQUIRED}}.
{code}
  if ((container.getState() == RMContainerState.RUNNING ||
  container.getState() == RMContainerState.ALLOCATED) &&
  isResourceGreaterThanNone(toPreempt)) {
warnOrKillContainer(container);
Resources.subtractFrom(toPreempt, 
container.getContainer().getResource());
  } else {
warnedIter.remove();
  }
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735859#comment-14735859
 ] 

Hadoop QA commented on YARN-1651:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m  2s | Findbugs (version ) appears to 
be broken on YARN-1197. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 20 new or modified test files. |
| {color:red}-1{color} | javac |   8m 10s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | javadoc |  10m 17s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 55s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |  31m  2s | The patch has 163  line(s) 
that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   5m 29s | The patch appears to introduce 7 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   9m 26s | Tests passed in 
hadoop-mapreduce-client-app. |
| {color:green}+1{color} | tools/hadoop tests |   0m 53s | Tests passed in 
hadoop-sls. |
| {color:green}+1{color} | yarn tests |   6m 58s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   0m 26s | Tests passed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |  59m 24s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 154m 43s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-common |
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754736/YARN-1651-4.YARN-1197.patch
 |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-1197 / f86eae1 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/diffJavacWarnings.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html
 |
| hadoop-mapreduce-client-app test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt
 |
| hadoop-sls test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/testrun_hadoop-sls.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9045/console |


This message was automatically generated.

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735837#comment-14735837
 ] 

Hudson commented on YARN-4096:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #345 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/345/])
YARN-4096. App local logs are leaked if log aggregation fails to initialize for 
the app. Contributed by Jason Lowe. (zxu: rev 
16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* hadoop-yarn-project/CHANGES.txt


> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.2
>
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4126:
--
Comment: was deleted

(was: yes, oozie has fixed its own. This is just YARN side fix.)

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken currently returns a delegation token in 
> insecure mode. We should not return the token when security is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735791#comment-14735791
 ] 

Hudson commented on YARN-4096:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2284 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2284/])
YARN-4096. App local logs are leaked if log aggregation fails to initialize for 
the app. Contributed by Jason Lowe. (zxu: rev 
16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.2
>
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735786#comment-14735786
 ] 

Jian He commented on YARN-4126:
---

Yes, Oozie has fixed this on its side. This is just the YARN-side fix.

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken currently returns a delegation token in 
> insecure mode. We should not return the token when security is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735787#comment-14735787
 ] 

Jian He commented on YARN-4126:
---

Yes, Oozie has fixed this on its side. This is just the YARN-side fix.

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken currently returns a delegation token in 
> insecure mode. We should not return the token when security is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2410) Nodemanager ShuffleHandler can possible exhaust file descriptors

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735766#comment-14735766
 ] 

Hadoop QA commented on YARN-2410:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 59s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  7s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 21s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 44s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   0m 19s | Tests passed in 
hadoop-mapreduce-client-shuffle. |
| | |  37m 47s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754746/YARN-2410-v7.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d9c1fab |
| hadoop-mapreduce-client-shuffle test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9046/artifact/patchprocess/testrun_hadoop-mapreduce-client-shuffle.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9046/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9046/console |


This message was automatically generated.

> Nodemanager ShuffleHandler can possible exhaust file descriptors
> 
>
> Key: YARN-2410
> URL: https://issues.apache.org/jira/browse/YARN-2410
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Nathan Roberts
>Assignee: Kuhu Shukla
> Attachments: YARN-2410-v1.patch, YARN-2410-v2.patch, 
> YARN-2410-v3.patch, YARN-2410-v4.patch, YARN-2410-v5.patch, 
> YARN-2410-v6.patch, YARN-2410-v7.patch
>
>
> The async nature of the ShuffleHandler can cause it to open a huge number 
> of file descriptors; when it runs out, it crashes.
> Scenario:
> Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node.
> Let's say all 6K reduces hit a node at about the same time asking for their 
> outputs. Each reducer will ask for all 40 map outputs over a single socket 
> in a single request (not necessarily all 40 at once, but with coalescing it 
> is likely to be a large number).
> sendMapOutput() will open the file for random reading and then perform an 
> async transfer of the particular portion of this file. This will 
> theoretically happen 6000*40 = 240,000 times, which will run the NM out of 
> file descriptors and cause it to crash.
> The algorithm should be refactored a little to not open the fds until they're
> actually needed. 
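A minimal sketch of the lazy-open idea from the last paragraph; the names are illustrative, not the actual ShuffleHandler code.
{code}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;

// Illustration of "don't open the fd until it's needed": the channel is only
// created right before the transfer starts and should be closed when the
// transfer completes, bounding the number of concurrently open descriptors
// even when many requests are queued behind the async channel.
final class LazyMapOutput {
  private final File mapOutputFile;

  LazyMapOutput(File mapOutputFile) {
    this.mapOutputFile = mapOutputFile;
  }

  FileChannel openForTransfer() throws IOException {
    return new RandomAccessFile(mapOutputFile, "r").getChannel();
  }
}
{code}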



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-09-08 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3985:

Component/s: (was: fairscheduler)
 (was: capacityscheduler)

> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735714#comment-14735714
 ] 

Hadoop QA commented on YARN-4126:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 25s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 54s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 57s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 50s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 26s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  23m  2s | Tests passed in 
hadoop-common. |
| {color:red}-1{color} | yarn tests |  53m 35s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 120m 41s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754713/0003-YARN-4126.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 16b9037 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9042/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9042/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9042/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9042/console |


This message was automatically generated.

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken currently returns a delegation token in 
> insecure mode. We should not return the token when security is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735704#comment-14735704
 ] 

Hudson commented on YARN-4096:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2307 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2307/])
YARN-4096. App local logs are leaked if log aggregation fails to initialize for 
the app. Contributed by Jason Lowe. (zxu: rev 
16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java


> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.2
>
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2410) Nodemanager ShuffleHandler can possible exhaust file descriptors

2015-09-08 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-2410:
--
Attachment: YARN-2410-v7.patch

Modified ShuffleHandler to not use channel attachments. Moved MockNetty code to 
a helper method.

> Nodemanager ShuffleHandler can possible exhaust file descriptors
> 
>
> Key: YARN-2410
> URL: https://issues.apache.org/jira/browse/YARN-2410
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Nathan Roberts
>Assignee: Kuhu Shukla
> Attachments: YARN-2410-v1.patch, YARN-2410-v2.patch, 
> YARN-2410-v3.patch, YARN-2410-v4.patch, YARN-2410-v5.patch, 
> YARN-2410-v6.patch, YARN-2410-v7.patch
>
>
> The async nature of the ShuffleHandler can cause it to open a huge number 
> of file descriptors; when it runs out, it crashes.
> Scenario:
> Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node.
> Let's say all 6K reduces hit a node at about the same time asking for their 
> outputs. Each reducer will ask for all 40 map outputs over a single socket 
> in a single request (not necessarily all 40 at once, but with coalescing it 
> is likely to be a large number).
> sendMapOutput() will open the file for random reading and then perform an 
> async transfer of the particular portion of this file. This will 
> theoretically happen 6000*40 = 240,000 times, which will run the NM out of 
> file descriptors and cause it to crash.
> The algorithm should be refactored a little to not open the fds until they're
> actually needed. 
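
Not the actual ShuffleHandler change, but a minimal sketch of the "don't open the fd until it's needed" idea, with hypothetical names and without the real Netty plumbing:

{code}
// Hedged sketch; not the real Netty-based ShuffleHandler code.
class MapOutputSend {
  private final java.io.File file;

  MapOutputSend(java.io.File file) { this.file = file; }

  // Open the descriptor lazily, right before this transfer starts, instead of
  // opening all ~240,000 files up front when the requests arrive.
  void transfer(java.io.OutputStream out) throws java.io.IOException {
    try (java.io.FileInputStream in = new java.io.FileInputStream(file)) {
      byte[] buf = new byte[64 * 1024];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    } // the fd is closed as soon as this map output has been sent
  }
}
{code}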



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-09-08 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735683#comment-14735683
 ] 

Chang Li commented on YARN-4132:


[~jlowe] please help review the latest patch. Thanks!

> Nodemanagers should try harder to connect to the RM
> ---
>
> Key: YARN-4132
> URL: https://issues.apache.org/jira/browse/YARN-4132
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4132.2.patch, YARN-4132.patch
>
>
> Being part of the cluster, nodemanagers should try very hard (and possibly 
> never give up) to connect to a resourcemanager. Minimally we should have a 
> separate config to set how aggressively a nodemanager will connect to the RM 
> separate from what clients will do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735677#comment-14735677
 ] 

Hadoop QA commented on YARN-4132:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 15s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 52s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  1s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 52s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 20s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   7m 55s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  56m 39s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754730/YARN-4132.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d9c1fab |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9043/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9043/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9043/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9043/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9043/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9043/console |


This message was automatically generated.

> Nodemanagers should try harder to connect to the RM
> ---
>
> Key: YARN-4132
> URL: https://issues.apache.org/jira/browse/YARN-4132
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4132.2.patch, YARN-4132.patch
>
>
> Being part of the cluster, nodemanagers should try very hard (and possibly 
> never give up) to connect to a resourcemanager. Minimally we should have a 
> separate config to set how aggressively a nodemanager will connect to the RM 
> separate from what clients will do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-08 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735676#comment-14735676
 ] 

MENG DING commented on YARN-1651:
-

Hi, [~leftnoteasy]
bq. I agree with the general idea, and we should do something similar. However, 
I'm not sure caching in the RM is a good idea: a malicious AM could potentially 
send millions of unknown-to-be-decreased containers to the RM when the RM 
starts. Maybe it's better to cache on the AMRMClient side. I think we can do 
this in a separate JIRA? Could you file a new JIRA for this if you agree?

Your proposal makes sense. I will file a JIRA for this.

Thanks for addressing my comments. I don't have any more comments for now. As 
per our discussion, I will come up with an end-to-end test based on 
distributedshell and post it to this JIRA for review.

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735672#comment-14735672
 ] 

Hudson commented on YARN-4096:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #357 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/357/])
YARN-4096. App local logs are leaked if log aggregation fails to initialize for 
the app. Contributed by Jason Lowe. (zxu: rev 
16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* hadoop-yarn-project/CHANGES.txt


> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.2
>
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1651:
-
Attachment: YARN-1651-4.YARN-1197.patch

Thanks for the comments, [~mding]!

bq. I only mention this because pullNewlyAllocatedContainers() has a check for 
null for the same logic, so I think we may want to make it consistent?
Yes, you're correct; updated the code, thanks.

bq. So, based on my understanding, if an application has reserved some resource 
for a container resource increase request on a node, that amount of resource 
should never be unreserved in order for the application to allocate a regular 
container on some other node. But that doesn't seem to be the case right now? 
Can you confirm?
Done; I added a check to {{getNodeIdToUnreserve}} that verifies whether a 
container is an increase reservation before cancelling it.

bq. I think it will be desirable to implement a pendingDecrease set in 
SchedulerApplicationAttempt, and corresponding logic, just like 
SchedulerApplicationAttempt.pendingRelease. This is to guard against the 
situation when decrease requests are received while RM is in the middle of 
recovery, and has not received all container statuses from NM yet.
I agree with the general idea, and we should do something similar. However, I'm 
not sure caching in the RM is a good idea: a malicious AM could potentially send 
millions of unknown-to-be-decreased containers to the RM when the RM starts. 
Maybe it's better to cache on the AMRMClient side. I think we can do this in a 
separate JIRA? Could you file a new JIRA for this if you agree?

bq. Some nits...
Addressed.

Uploaded ver.4 patch.
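
For reference, a rough illustration of the {{getNodeIdToUnreserve}} check described above, with hypothetical names (the real CapacityScheduler types and method signatures differ):

{code}
// Hedged sketch; names are hypothetical, not the actual CapacityScheduler code.
class UnreserveSketch {
  interface Reservation { boolean isIncreaseReservation(); }

  static String getNodeIdToUnreserve(java.util.Map<String, Reservation> reservedByNode) {
    for (java.util.Map.Entry<String, Reservation> e : reservedByNode.entrySet()) {
      // A reservation held for a container *increase* must not be cancelled
      // just to place a regular container on some other node.
      if (!e.getValue().isIncreaseReservation()) {
        return e.getKey();
      }
    }
    return null; // nothing safe to unreserve
  }
}
{code}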

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, 
> YARN-1651-4.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-08 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3901:
-
Attachment: YARN-3901-YARN-2928.5.patch


Uploading patch v5 that incorporates Sangjin's review suggestions. 

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-09-08 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4132:
---
Attachment: YARN-4132.2.patch

Fixed the broken test in TestYarnConfigurationFields. The other broken tests are 
not related to my changes (they appear to be caused by a network problem on the 
testing platform); those tests all pass with my .2 patch on my local machine.

> Nodemanagers should try harder to connect to the RM
> ---
>
> Key: YARN-4132
> URL: https://issues.apache.org/jira/browse/YARN-4132
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4132.2.patch, YARN-4132.patch
>
>
> Being part of the cluster, nodemanagers should try very hard (and possibly 
> never give up) to connect to a resourcemanager. Minimally we should have a 
> separate config to set how aggressively a nodemanager will connect to the RM 
> separate from what clients will do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-09-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735545#comment-14735545
 ] 

Sangjin Lee commented on YARN-4074:
---

It'd be great if you could take a look at the latest patch and let me know your 
feedback. Thanks!

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch, YARN-4074-YARN-2928.POC.003.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-09-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735541#comment-14735541
 ] 

Sangjin Lee commented on YARN-4075:
---

Sorry [~varun_saxena], it took me a while to review this. The patch looks good 
for the most part.

FYI, I incorporated the XmlElement annotation for flow runs in 
{{FlowActivityEntity}} in YARN-4074. This change will be in the next patch 
(once I rebase with Vrushali's latest for YARN-3091). I also implemented the 
full {{compareTo()}} method already in the current patch for YARN-4074.
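
For context, the kind of JAXB annotation being referred to looks roughly like this; the field and element names are illustrative, not the actual FlowActivityEntity source:

{code}
// Illustrative only; the real FlowActivityEntity has its own field and element types.
class FlowActivitySketch {
  private final java.util.List<String> flowRuns = new java.util.ArrayList<String>();

  @javax.xml.bind.annotation.XmlElement(name = "flowruns")
  public java.util.List<String> getFlowRuns() {
    return flowRuns; // exposed as a "flowruns" element in the REST output
  }
}
{code}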


> [reader REST API] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4075
> URL: https://issues.apache.org/jira/browse/YARN-4075
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4075-YARN-2928.POC.1.patch
>
>
> We need to be able to query for flows and flow runs via REST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735533#comment-14735533
 ] 

Hadoop QA commented on YARN-4132:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 56s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 59s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 56s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 23s | The applied patch generated  3 
new checkstyle issues (total was 211, now 213). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 48s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |   0m 22s | Tests failed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   6m 52s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  49m 56s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.conf.TestYarnConfigurationFields |
|   | hadoop.yarn.server.nodemanager.TestNodeStatusUpdater |
|   | hadoop.yarn.server.nodemanager.TestNodeManagerShutdown |
|   | hadoop.yarn.server.nodemanager.containermanager.TestNMProxy |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754710/YARN-4132.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 970daaa |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9041/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9041/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9041/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9041/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9041/console |


This message was automatically generated.

> Nodemanagers should try harder to connect to the RM
> ---
>
> Key: YARN-4132
> URL: https://issues.apache.org/jira/browse/YARN-4132
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4132.patch
>
>
> Being part of the cluster, nodemanagers should try very hard (and possibly 
> never give up) to connect to a resourcemanager. Minimally we should have a 
> separate config to set how aggressively a nodemanager will connect to the RM 
> separate from what clients will do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735537#comment-14735537
 ] 

Hudson commented on YARN-4096:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1095 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1095/])
YARN-4096. App local logs are leaked if log aggregation fails to initialize for 
the app. Contributed by Jason Lowe. (zxu: rev 
16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java


> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.2
>
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735497#comment-14735497
 ] 

Hudson commented on YARN-4096:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8416 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8416/])
YARN-4096. App local logs are leaked if log aggregation fails to initialize for 
the app. Contributed by Jason Lowe. (zxu: rev 
16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.2
>
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735490#comment-14735490
 ] 

zhihai xu commented on YARN-4096:
-

Thanks, Jason, for the contribution! Committed it to branch-2.7.2, branch-2, and 
trunk.

> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4096:

Hadoop Flags: Reviewed

> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735489#comment-14735489
 ] 

Hudson commented on YARN-4096:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #364 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/364/])
YARN-4096. App local logs are leaked if log aggregation fails to initialize for 
the app. Contributed by Jason Lowe. (zxu: rev 
16b9037dc1300b8bdbe54ba7cd47c53fe16e93d8)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java


> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735479#comment-14735479
 ] 

Sangjin Lee commented on YARN-3901:
---

I think something like the following would work:

{code}
210   long currentMinValue = ((Number) GenericObjectMapper.read(CellUtil
211       .cloneValue(currentMinCell))).longValue();
212   long currentCellValue = ((Number) GenericObjectMapper.read(CellUtil
213       .cloneValue(cell))).longValue();
{code}
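
The comparison itself would then stay in plain {{long}} terms, e.g. (sketch only, continuing the fragment above):

{code}
      if (currentCellValue < currentMinValue) {
        // keep the running minimum as a long once both values are read back
        currentMinValue = currentCellValue;
      }
{code}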

bq. I am thinking I will need this when the flush/compaction scanner is added 
in. If you'd like, I can move it in as a non-public class for now and then move 
it out if needed.

+1.

bq. I actually needed this in the unit test while checking the 
FlowActivityTable contents, if you want I can take it out and you can add that 
test case in when you add in the RowKey changes?

If it is to help your unit test, it's fine to include it here (as long as it's 
identical to what we have in YARN-4074; that would help my rebasing).

bq. Yeah, I was thinking about that too. Right now, metrics will get their own 
timestamps. For other columns, we'd be using the nanoseconds. I am trying to 
see if we can just use milliseconds.

We do need the timestamps that are generated here to be in nanoseconds, as they 
are multiplied by a factor of 1 million in {{TimestampGenerator}}. They cannot 
be converted to milliseconds, or it would defeat the purpose of using 
{{TimestampGenerator}}. The comment was about the concern of always being able 
to distinguish these two types of "timestamps" without confusion.
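
As a rough illustration of the idea (an assumed shape only, not the actual TimestampGenerator): multiplying the wall-clock milliseconds by 1,000,000 leaves room below the millisecond to keep concurrent writes to the same column from colliding on the HBase cell timestamp.

{code}
// Hedged sketch of the idea, not the real TimestampGenerator implementation.
final class UniqueTimestamps {
  private static final long TS_MULTIPLIER = 1000000L;
  private long last = 0L;

  synchronized long next(long wallClockMillis) {
    long base = wallClockMillis * TS_MULTIPLIER;
    // Bump past the previous value so two writes within the same millisecond
    // still get distinct cell timestamps.
    last = Math.max(last + 1, base);
    return last;
  }
}
{code}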

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735471#comment-14735471
 ] 

Hadoop QA commented on YARN-3635:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 40s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  8s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 52s | The applied patch generated  
14 new checkstyle issues (total was 236, now 242). |
| {color:red}-1{color} | whitespace |   0m  3s | The patch has 15  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  54m 13s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  93m 49s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754296/YARN-3635.7.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 970daaa |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9040/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9040/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9040/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9040/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9040/console |


This message was automatically generated.

> Get-queue-mapping should be a common interface of YarnScheduler
> ---
>
> Key: YARN-3635
> URL: https://issues.apache.org/jira/browse/YARN-3635
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Tan, Wangda
> Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, 
> YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch, YARN-3635.7.patch
>
>
> Currently, both of fair/capacity scheduler support queue mapping, which makes 
> scheduler can change queue of an application after submitted to scheduler.
> One issue of doing this in specific scheduler is: If the queue after mapping 
> has different maximum_allocation/default-node-label-expression of the 
> original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks 
> the wrong queue.
> I propose to make the queue mapping as a common interface of scheduler, and 
> RMAppManager set the queue after mapping before doing validations.
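
A hypothetical shape for such a common interface, purely for illustration (the actual method name and signature are whatever the patch defines):

{code}
// Illustrative sketch of a scheduler-agnostic queue-mapping hook.
interface QueueMappingSketch {
  /**
   * Returns the queue the application would actually land in after any
   * configured queue mapping, so RMAppManager can validate the request
   * against the mapped queue rather than the originally submitted one.
   */
  String getMappedQueueForApp(String requestedQueue, String user);
}
{code}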



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4126:
---
Attachment: 0003-YARN-4126.patch

Hi [~jianhe],

Attaching a patch after updating the test cases.
{{TestRMWebServicesDelegationTokens}} hasn't been corrected yet.
In non-secure mode, what should the behaviour be for
{{RMWebServicesDelegationTokens}}?

Currently it returns {{500 Internal Error}}.
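
Whatever behaviour is chosen, the check itself would presumably hinge on {{UserGroupInformation.isSecurityEnabled()}}; a minimal sketch, with the surrounding ClientRMService/web-service wiring elided:

{code}
// Hedged sketch; only the guard is shown, not the real service code.
if (!org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled()) {
  // Refuse to mint a delegation token when security is off, instead of
  // letting the request fail later with a 500 Internal Error.
  throw new java.io.IOException("Delegation tokens are only issued in secure mode");
}
{code}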

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken is currently  returning a delegation token 
> in insecure mode. We should not return the token if it's in insecure mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-09-08 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4132:
---
Attachment: YARN-4132.patch

> Nodemanagers should try harder to connect to the RM
> ---
>
> Key: YARN-4132
> URL: https://issues.apache.org/jira/browse/YARN-4132
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4132.patch
>
>
> Being part of the cluster, nodemanagers should try very hard (and possibly 
> never give up) to connect to a resourcemanager. Minimally we should have a 
> separate config to set how aggressively a nodemanager will connect to the RM 
> separate from what clients will do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4132) Nodemanagers should try harder to connect to the RM

2015-09-08 Thread Chang Li (JIRA)
Chang Li created YARN-4132:
--

 Summary: Nodemanagers should try harder to connect to the RM
 Key: YARN-4132
 URL: https://issues.apache.org/jira/browse/YARN-4132
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li


Being part of the cluster, nodemanagers should try very hard (and possibly 
never give up) to connect to a resourcemanager. Minimally we should have a 
separate config to set how aggressively a nodemanager will connect to the RM 
separate from what clients will do.
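
A rough sketch of the sort of NM-side retry loop and separate configuration knobs being proposed; the property names below are placeholders, not the keys an actual patch would add, and registerWithRM() stands in for the real registration call:

{code}
// Hedged sketch with placeholder config keys; `conf` is an
// org.apache.hadoop.conf.Configuration assumed to be in scope.
long maxWaitMs = conf.getLong("nm.rm-connect.max-wait-ms.PLACEHOLDER", -1L);        // -1: never give up
long retryIntervalMs = conf.getLong("nm.rm-connect.retry-interval-ms.PLACEHOLDER", 30000L);

long start = System.currentTimeMillis();
while (true) {
  try {
    registerWithRM();   // attempt the RM connection
    break;
  } catch (java.io.IOException e) {
    if (maxWaitMs >= 0 && System.currentTimeMillis() - start > maxWaitMs) {
      throw e;          // give up only if a finite max wait was configured
    }
    try {
      Thread.sleep(retryIntervalMs);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw e;
    }
  }
}
{code}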



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-08 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735452#comment-14735452
 ] 

Vrushali C commented on YARN-3901:
--

Thanks [~sjlee0] for the review!

I will correct the variable ordering for static and private members, as well as 
make the variables final.

bq. l.210: Strictly speaking, GenericObjectMapper will return an integer if the 
value fits within an integer; so it's not exactly a concern for min/max 
(timestamps) but for caution we might want to stay with Number instead of long
Comparisons are not allowed for the {{Number}} datatype. 
{code} 
The operator < is undefined for the argument type(s) java.lang.Number, 
java.lang.Number
{code} 

So I would have to do something like {code} Number d = a.longValue() + 
b.longValue(); {code}  Do you think this is better? 

bq. l.52: Is the TimestampGenerator class going to be used outside 
FlowRunCoprocessor? If not, I would argue that we should make it an inner class 
of FlowRunCoprocessor. At least we should make it non-public to keep it within 
the package. If it would see general use outside this class, then it might be 
better to make it a true public class in the common package. I suspect a 
non-public class might be what we want here.
I am thinking I will need this when the flush/compaction scanner is added in. 
If you'd like, I can move it in as a non-public class for now and then move it 
out if needed. 

bq. It's up to you, but you could leave the row key improvement to YARN-4074. 
That might be easier to manage the changes between yours and mine. I'm 
restructuring all *RowKey classes uniformly.
I actually needed this in the unit test while checking the FlowActivityTable 
contents, if you want I can take it out and you can add that test case in when 
you add in the RowKey changes? 

bq. l.144: This would mean that some cell timestamps would have the unit of the 
milliseconds and others would be in nanoseconds. I'm a little bit concerned if 
we ever interpret these timestamps incorrectly. Could there be a more explicit 
way of clearly differentiating them? I don't have good suggestions at the 
moment.
Yeah, I was thinking about that too. Right now, metrics will get their own 
timestamps. For other columns, we'd be using the nanoseconds. I am trying to 
see if we can just use milliseconds.

bq. it might be good to have short comments on what each method is testing
I did try to make the unit test names themselves descriptive, like 
testFlowActivityTable, testWriteFlowRunMinMaxToHBase, 
testWriteFlowRunMetricsOneFlow, or testWriteFlowActivityOneFlow, but I agree 
that some more comments in the unit tests will surely help. 

Will upload a new patch shortly, thanks! 


> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 

[jira] [Commented] (YARN-4096) App local logs are leaked if log aggregation fails to initialize for the app

2015-09-08 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735448#comment-14735448
 ] 

zhihai xu commented on YARN-4096:
-

+1. Committing it in.

> App local logs are leaked if log aggregation fails to initialize for the app
> 
>
> Key: YARN-4096
> URL: https://issues.apache.org/jira/browse/YARN-4096
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-4096.001.patch
>
>
> If log aggregation fails to initialize for an application then the local logs 
> will never be deleted.  This is similar to YARN-3476 except this is a failure 
> when log aggregation tries to initialize the app-specific log aggregator 
> rather than a failure during a log upload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735384#comment-14735384
 ] 

Sangjin Lee commented on YARN-3901:
---

Thanks for the updated patch [~vrushalic]! I went over the new patch, and the 
following is the quick feedback. I'll also apply it with YARN-4074, and test it 
a little more.

(HBaseTimelineWriterImpl.java)
- l.141-155: the whole thing could be inside {{if (isApplication)...}}
- l.264: this null check is not needed

(FlowRunCoprocessor.java)
- l.52: Is the {{TimestampGenerator}} class going to be used outside 
{{FlowRunCoprocessor}}? If not, I would argue that we should make it an inner 
class of {{FlowRunCoprocessor}}. At least we should make it non-public to keep 
it within the package. If it would see general use outside this class, then it 
might be better to make it a true public class in the common package. I suspect 
a non-public class might be what we want here.
- l.52: let's make it final
- l.54: style nit: I think the common style is to place the static variables 
before instance variables
- Also, overall it seems we're using both the diamond operator (<>) and the old 
style generic declaration. It might be good to stick with one style (in which 
case the diamond operator might be better).
- l.144: This would mean that some cell timestamps would have the unit of the 
milliseconds and others would be in nanoseconds. I'm a little bit concerned if 
we ever interpret these timestamps incorrectly. Could there be a more explicit 
way of clearly differentiating them? I don't have good suggestions at the 
moment.

(FlowScanner.java)
- variable ordering
- l.210: Strictly speaking, {{GenericObjectMapper}} will return an integer if 
the value fits within an integer; so it's not exactly a concern for min/max 
(timestamps) but for caution we might want to stay with {{Number}} instead of 
long.

(TimestampGenerator.java)
- l.29: make it final
- variable ordering
- see above for the public/non-public comment

(FlowActivityRowKey.java)
- It's up to you, but you could leave the row key improvement to YARN-4074. 
That might be easier to manage the changes between yours and mine. I'm 
restructuring all *RowKey classes uniformly.

(TestHBaseTimelineWriterImplFlowRun.java)
- it might be good to have short comments on what each method is testing


> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId

2015-09-08 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735349#comment-14735349
 ] 

Junping Du commented on YARN-4131:
--

bq. For exit codes, I'd like to be able to have the AM think that the container 
crashed, was pre-empted or went OOM, so we can test the different codepaths.
Oh, I see. I think we can add an enum to KillContainerRequest and pass it to the 
RM. To keep it simple, the CLI may support only the one option (preempted)?
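
For illustration, the enum being suggested might look something like this (purely hypothetical; KillContainerRequest and the value names are not settled on this JIRA):

{code}
// Hypothetical sketch of the proposed kill-reason enum.
enum ContainerKillReason {
  PREEMPTED,       // AM sees the container as preempted (the simple CLI default)
  OUT_OF_MEMORY,   // simulate an OOM kill
  CRASHED          // simulate an abnormal container exit
}
{code}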

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch
>
>
> Per YARN-3337, we need a handy tools to kill container in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4086) Allow Aggregated Log readers to handle HAR files

2015-09-08 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735347#comment-14735347
 ] 

Arun Suresh commented on YARN-4086:
---

Thanks for the patch, Robert.

It looks good save for a minor nit:
* In 'testFetchApplictionLogsHar', when asserting the contents of the 
sysoutStream, you may want to just check that the output contains some 
important/relevant strings rather than matching the whole output; otherwise the 
test case would end up quite brittle and require constant changes (especially 
if the output format changes). See the sketch below.

+1 post jenkins
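
i.e. something along these lines (sketch only; the expected* values are placeholders for whatever the test already knows):

{code}
// Assert on a few key substrings instead of the entire formatted output.
String out = sysOutStream.toString();
org.junit.Assert.assertTrue(out.contains(expectedContainerId));
org.junit.Assert.assertTrue(out.contains(expectedLogLine));
{code}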

> Allow Aggregated Log readers to handle HAR files
> 
>
> Key: YARN-4086
> URL: https://issues.apache.org/jira/browse/YARN-4086
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: YARN-4086.001.patch
>
>
> This is for the YARN changes for MAPREDUCE-6415.  It allows the yarn CLI and 
> web UIs to read aggregated logs from HAR files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId

2015-09-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735301#comment-14735301
 ] 

Steve Loughran commented on YARN-4131:
--

I was thinking I need even less than what you'd done. I just want to kill a 
container and wait for the AM to react, or kill the AM and wait for it to 
restart: no need for synchronous operations.

For exit codes, I'd like to be able to have the AM think that the container 
crashed, was pre-empted or went OOM, so we can test the different codepaths.

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch
>
>
> Per YARN-3337, we need a handy tools to kill container in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735278#comment-14735278
 ] 

Hadoop QA commented on YARN-4131:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  21m 21s | Findbugs (version 3.0.0) 
appears to be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:red}-1{color} | javac |   4m 51s | The patch appears to cause the 
build to fail. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754675/YARN-4131-demo-2.patch 
|
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / 970daaa |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9039/console |


This message was automatically generated.

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch
>
>
> Per YARN-3337, we need a handy tools to kill container in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity

2015-09-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735266#comment-14735266
 ] 

Wangda Tan commented on YARN-4091:
--

Thanks [~sunilg].

I can understand why you have this proposal, but I'm not sure your approach 
works in the following scenarios. I feel that an overall state of an app plus a 
last-container-assignment state may not work well for them:

- App wants only a small proportion of the cluster (such as hard locality)
- Similar to the above, app wants to run on a specific partition only
- App's leaf queue or parent queue is beyond its limit
- App asks for mappers in one partition (A) and reducers in another partition 
(B), when A has little available resource and B has more. The user wants to see 
why mapper allocation is slow.

Also, we cannot get the order of allocations with your approach, which is an 
important thing to look at when we enable fairness/priority scheduling for apps.

> Improvement: Introduce more debug/diagnostics information to detail out 
> scheduler activity
> --
>
> Key: YARN-4091
> URL: https://issues.apache.org/jira/browse/YARN-4091
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Improvement on debugdiagnostic information - YARN.pdf
>
>
> As schedulers gain various new capabilities, more configurations that tune the 
> schedulers start to take actions such as limiting container assignment to an 
> application, or introducing a delay before allocating a container, etc. 
> There is no clear information passed down from the scheduler to the outside 
> world under these various scenarios. This makes debugging much tougher.
> This ticket is an effort to introduce more defined states at various points in 
> the scheduler where it skips/rejects container assignment, activates an 
> application, etc. Such information will help users know what is happening in 
> the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve 
> on this as we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735252#comment-14735252
 ] 

Allen Wittenauer commented on YARN-4126:


That sounds like a bigger set of bugs than not issuing delegation tokens.

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken is currently  returning a delegation token 
> in insecure mode. We should not return the token if it's in insecure mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId

2015-09-08 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735238#comment-14735238
 ] 

Junping Du commented on YARN-4131:
--

bq.  I'd actually leave out the waiting for the operation to complete: make it 
fully async and let caller wait if they want to.
The logic in YarnClientImpl should demonstrate the async way of consuming this 
API. It basically calls killContainer() and loops to check the return code (true 
means the container is still active, so the kill event got sent). Is there 
anything else to do, like putting some explicit async tag on this API?

bq.  is there any way to set the exit code? I'd like to signal pre-emption and 
out of memory events at some point.
Do you mean how we can know whether the container got killed successfully? 
Basically two ways: one is, as mentioned above, calling killContainer(), where a 
false return means the container is gone; the other is calling 
getContainerReport() or getContainers() in ApplicationBaseProtocol, which return 
active containers only.
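
For illustration, a minimal sketch of the async usage described above. Note that 
killContainer() here refers to the demo-patch API (an assumption, not an existing 
YarnClient method):
{code}
// Sketch only: killContainer(ContainerId) is the demo-patch API described
// above: it sends the kill event and returns true while the container is
// still live, false once it is gone.
void killAndWait(YarnClient client, ContainerId containerId, long pollMs)
    throws Exception {
  while (client.killContainer(containerId)) {  // hypothetical demo-patch call
    Thread.sleep(pollMs);                      // caller decides how long to wait
  }
  // Alternatively, poll getContainerReport()/getContainers(), which (as noted
  // above) only return active containers, until the target disappears.
}
{code}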

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch
>
>
> Per YARN-3337, we need a handy tool to kill containers in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4113) RM should respect retry-interval when uses RetryPolicies.RETRY_FOREVER

2015-09-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735215#comment-14735215
 ] 

Wangda Tan commented on YARN-4113:
--

Created HADOOP-12386 to track RETRY_FOREVER changes.

> RM should respect retry-interval when uses RetryPolicies.RETRY_FOREVER
> --
>
> Key: YARN-4113
> URL: https://issues.apache.org/jira/browse/YARN-4113
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
>
> Found one issue in how RMProxy initializes its RetryPolicy, in 
> RMProxy#createRetryPolicy: when rmConnectWaitMS is set to -1 (wait forever), 
> it uses RetryPolicies.RETRY_FOREVER, which doesn't respect the 
> {{yarn.resourcemanager.connect.retry-interval.ms}} setting.
> RetryPolicies.RETRY_FOREVER uses 0 as the interval. When I ran the test 
> without a properly set up localhost name: 
> {{TestYarnClient#testShouldNotRetryForeverForNonNetworkExceptions}}, it wrote 
> 14GB of DEBUG exception messages to the system before it died. This would be 
> very bad if we did the same thing in a production cluster.
> We should fix two places:
> - Make RETRY_FOREVER able to take the retry-interval as a constructor parameter.
> - Respect the retry-interval when we use the RETRY_FOREVER policy.
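
To make the intended behavior concrete, here is a minimal, generic sketch (plain 
Java, not the actual RMProxy/RetryPolicies code) of "retry forever, but honor a 
configured sleep between attempts":
{code}
// Illustrative only: a retry-forever loop that sleeps retryIntervalMs between
// attempts instead of spinning with a 0 ms interval.
static <T> T retryForever(java.util.concurrent.Callable<T> call,
    long retryIntervalMs) throws InterruptedException {
  while (true) {
    try {
      return call.call();
    } catch (Exception e) {
      // The real code would log this at DEBUG; the key point is the sleep,
      // which bounds how fast failures (and log output) can pile up.
      Thread.sleep(retryIntervalMs);
    }
  }
}
{code}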



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4131) Add API and CLI to kill container on given containerId

2015-09-08 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4131:
-
Attachment: YARN-4131-demo-2.patch

Sounds like a new class, "MockResourceManagerFacade", was just introduced on 
trunk, which caused the previous patch to fail the build. demo-2 should fix it.

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch
>
>
> Per YARN-3337, we need a handy tool to kill containers in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735207#comment-14735207
 ] 

Wangda Tan commented on YARN-4106:
--

Thanks for the update [~bibinchundatt], and for the comments from 
[~Naganarasimha]. A few minor comments:

1) Make failLabelResendInterval final before we can configure it.
2) The testConfigTimer sleep time is too long. I don't know if Clock can be used 
in Timer; I think you can set NM_NODE_LABELS_PROVIDER_FETCH_INTERVAL_MS to a 
lower value, like 1000, and sleep 1500 ms.
3) With the changes in your patch, testNodeLabelsFromConfig doesn't need to 
sleep any more?

> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch
>
>
> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> Steps to reproduce
> ===
> # Configure nodelabel in distributed mode
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start the RM and the NM
> # Once NM registration is done, add nodelabels in RM
> Nodelabels are not getting updated on the RM side
> *This jira also handles the below issue*
> Timer Task is not getting triggered in the Nodemanager for label update for 
> distributed scheduling
> Task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId

2015-09-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735194#comment-14735194
 ] 

Steve Loughran commented on YARN-4131:
--

LGTM. 

# I'd actually leave out the waiting for the operation to complete: make it 
fully async and let caller wait if they want to
# is there any way to set the exit code? I'd like to signal pre-emption and out 
of memory events at some point

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo.patch
>
>
> Per YARN-3337, we need a handy tool to kill containers in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3337) Provide YARN chaos monkey

2015-09-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735172#comment-14735172
 ] 

Steve Loughran commented on YARN-3337:
--

I'm happy with an operation that doesn't return whether or not the container 
gets killed...could also declare that it's potentially async and callers should 
poll for the container going away

> Provide YARN chaos monkey
> -
>
> Key: YARN-3337
> URL: https://issues.apache.org/jira/browse/YARN-3337
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Steve Loughran
>
> To test failure resilience today you either need custom scripts or have to 
> implement Chaos Monkey-like logic in your application (SLIDER-202). 
> Killing AMs and containers on a schedule & probability is the core activity 
> here, one that could be handled by a CLI App/client lib that does this. 
> # entry point to have a startup delay before acting
> # frequency of chaos wakeup/polling
> # probability of AM failure generation (0-100)
> # probability of non-AM container kill
> # future: other operations
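
For illustration, a rough sketch of the kind of loop described above; every name 
here (killAm, killRandomContainer, the probability and interval variables) is a 
hypothetical placeholder, not an existing API:
{code}
// Sketch only: wake up periodically and roll the dice against the configured
// probabilities (0-100) for AM kills and non-AM container kills.
Thread.sleep(startupDelayMs);                    // startup delay before acting
Random random = new Random();
while (running) {
  Thread.sleep(chaosIntervalMs);                 // frequency of chaos wakeup/polling
  if (random.nextInt(100) < amKillProbability) {
    killAm();                                    // AM failure generation
  }
  if (random.nextInt(100) < containerKillProbability) {
    killRandomContainer();                       // non-AM container kill
  }
}
{code}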



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735170#comment-14735170
 ] 

Jian He commented on YARN-4126:
---

Yes, there is. For example, Oozie grabs this token in insecure mode and passes 
it around, which actually breaks in some places.

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken is currently  returning a delegation token 
> in insecure mode. We should not return the token if it's in insecure mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM

2015-09-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4106:
-
Priority: Major  (was: Blocker)

> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> ---
>
> Key: YARN-4106
> URL: https://issues.apache.org/jira/browse/YARN-4106
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, 
> 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch
>
>
> NodeLabels for NM in distributed mode is not updated even after 
> clusterNodelabel addition in RM
> Steps to reproduce
> ===
> # Configure nodelabel in distributed mode
> yarn.node-labels.configuration-type=distributed
> provider = config
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms
> # Start the RM and the NM
> # Once NM registration is done, add nodelabels in RM
> Nodelabels are not getting updated on the RM side
> *This jira also handles the below issue*
> Timer Task is not getting triggered in the Nodemanager for label update for 
> distributed scheduling
> Task is supposed to trigger every 
> {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2015-09-08 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735132#comment-14735132
 ] 

Allen Wittenauer commented on YARN-4126:


Is there any actual harm in returning a useless delegation token?  I know on 
the HDFS side of the house, returning null tokens has been extremely beneficial 
in streamlining the code.

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken is currently  returning a delegation token 
> in insecure mode. We should not return the token if it's in insecure mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735128#comment-14735128
 ] 

Hadoop QA commented on YARN-3943:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m  3s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 57s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 51s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 24s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  0s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   7m 39s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  56m 14s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754660/YARN-3943.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 090d266 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9037/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9037/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9037/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9037/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9037/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9037/console |


This message was automatically generated.

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It will 
> be better to use two configurations: one is used when disks become full from 
> not-full and the other one is used when disks become not-full from full. So 
> we can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.
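
As a rough illustration of the hysteresis being proposed (the method and parameter 
names here are hypothetical, not actual NM config keys or code):
{code}
// Sketch only: a disk that is currently marked full stays full until utilization
// drops below the lower (not-full) threshold; a good disk only becomes full once
// it exceeds the higher (full) threshold. This avoids oscillating around a
// single cutoff.
boolean isDiskFull(float utilizationPercent, boolean currentlyFull,
    float fullThresholdPercent, float notFullThresholdPercent) {
  // e.g. fullThresholdPercent = 95, notFullThresholdPercent = 90
  return currentlyFull
      ? utilizationPercent > notFullThresholdPercent
      : utilizationPercent > fullThresholdPercent;
}
{code}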



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735127#comment-14735127
 ] 

Hadoop QA commented on YARN-4131:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m 25s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:red}-1{color} | javac |   3m  5s | The patch appears to cause the 
build to fail. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754664/YARN-4131-demo.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / 090d266 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9038/console |


This message was automatically generated.

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo.patch
>
>
> Per YARN-3337, we need a handy tool to kill containers in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.

2015-09-08 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735070#comment-14735070
 ] 

MENG DING commented on YARN-1651:
-

Hi, [~leftnoteasy]

I am ok with most of the reply comments. Thanks.

bq. It seems no need to do the null check here. When it becomes null? I prefer 
to keep it as-is and it will throw NPE if any fatal issue happens.
The {{updateContainerAndNMToken}} may return null:
{code}
  Container updatedContainer =
  updateContainerAndNMToken(rmContainer, false, increase);
  returnContainerList.add(updatedContainer);
{code}

I only mention this because {{pullNewlyAllocatedContainers()}} has a check for 
null for the same logic, so I think we may want to make it consistent?
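
A sketch of the guard being suggested, based on the snippet above and mirroring 
what {{pullNewlyAllocatedContainers()}} already does:
{code}
  Container updatedContainer =
      updateContainerAndNMToken(rmContainer, false, increase);
  // Skip the container if the token/update could not be created, instead of
  // adding null to the returned list.
  if (updatedContainer != null) {
    returnContainerList.add(updatedContainer);
  }
{code}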

Some remaining comments:
* As you mentioned in the code, currently reserved resource increase request 
does not participate in the continuous reservation looking logic. So, based on 
my understanding, if an application has reserved some resource for a container 
resource increase request on a node, that amount of resource should never be 
unreserved in order for the application to allocate a regular container on some 
other node. But that doesn't seem to be the case right now? Can you confirm?
If so, I am thinking a simple solution would be to *exclude* resources reserved 
for increased containers when trying to find an unreserved container for 
regular container allocation.
{code:title=RegularContainerAllocator.assignContainer}
  ...
  ...
  unreservedContainer =
      application.findNodeToUnreserve(clusterResource, node, priority,
          resourceNeedToUnReserve);
      // <= Don't consider resources reserved for container increase request
  ...
{code}
* I think it will be desirable to implement a {{pendingDecrease}} set in 
{{SchedulerApplicationAttempt}}, and corresponding logic, just like 
{{SchedulerApplicationAttempt.pendingRelease}}. This is to guard against the 
situation *when decrease requests are received while RM is in the middle of 
recovery, and has not received all container statuses from NM yet*.

* Some nits
** Comments in {{NMReportedContainerChangeIsDoneTransition}} don't seem right.
** IncreaseContainerAllocator: {{LOG.debug("  Headroom is satisifed, 
skip..");}} --> {{LOG.debug("  Headroom is not satisfied, skip..");}}

> CapacityScheduler side changes to support increase/decrease container 
> resource.
> ---
>
> Key: YARN-1651
> URL: https://issues.apache.org/jira/browse/YARN-1651
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-1651-1.YARN-1197.patch, 
> YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4131) Add API and CLI to kill container on given containerId

2015-09-08 Thread Junping Du (JIRA)
Junping Du created YARN-4131:


 Summary: Add API and CLI to kill container on given containerId
 Key: YARN-4131
 URL: https://issues.apache.org/jira/browse/YARN-4131
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: applications, client
Reporter: Junping Du
Assignee: Junping Du


Per YARN-3337, we need a handy tool to kill containers in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3337) Provide YARN chaos monkey

2015-09-08 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735063#comment-14735063
 ] 

Junping Du commented on YARN-3337:
--

Put a demo patch on the sub-JIRA. [~ste...@apache.org], mind taking a quick look 
to see if this is also what you had in mind? I can do more polishing work on 
that patch later.

> Provide YARN chaos monkey
> -
>
> Key: YARN-3337
> URL: https://issues.apache.org/jira/browse/YARN-3337
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Steve Loughran
>
> To test failure resilience today you either need custom scripts or have to 
> implement Chaos Monkey-like logic in your application (SLIDER-202). 
> Killing AMs and containers on a schedule & probability is the core activity 
> here, one that could be handled by a CLI App/client lib that does this. 
> # entry point to have a startup delay before acting
> # frequency of chaos wakeup/polling
> # probability of AM failure generation (0-100)
> # probability of non-AM container kill
> # future: other operations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4131) Add API and CLI to kill container on given containerId

2015-09-08 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4131:
-
Attachment: YARN-4131-demo.patch

Attaching a demo patch; more test work is still needed.

> Add API and CLI to kill container on given containerId
> --
>
> Key: YARN-4131
> URL: https://issues.apache.org/jira/browse/YARN-4131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, client
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-4131-demo.patch
>
>
> Per YARN-3337, we need a handy tool to kill containers in some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-09-08 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3943:

Attachment: (was: YARN-3943.000.patch)

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It will 
> be better to use two configurations: one is used when disks become full from 
> not-full and the other one is used when disks become not-full from full. So 
> we can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.

2015-09-08 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3943:

Attachment: YARN-3943.000.patch

> Use separate threshold configurations for disk-full detection and 
> disk-not-full detection.
> --
>
> Key: YARN-3943
> URL: https://issues.apache.org/jira/browse/YARN-3943
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3943.000.patch
>
>
> Use separate threshold configurations to check when disks become full and 
> when disks become good. Currently the configuration 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are 
> used to check both when disks become full and when disks become good. It will 
> be better to use two configurations: one is used when disks become full from 
> not-full and the other one is used when disks become not-full from full. So 
> we can avoid oscillating frequently.
> For example: we can set the one for disk-full detection higher than the one 
> for disk-not-full detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2884) Proxying all AM-RM communications

2015-09-08 Thread Kishore Chaliparambil (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734967#comment-14734967
 ] 

Kishore Chaliparambil commented on YARN-2884:
-

Thanks [~subru]

> Proxying all AM-RM communications
> -
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Kishore Chaliparambil
> Fix For: 2.8.0
>
> Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, 
> YARN-2884-V11.patch, YARN-2884-V12.patch, YARN-2884-V13.patch, 
> YARN-2884-V2.patch, YARN-2884-V3.patch, YARN-2884-V4.patch, 
> YARN-2884-V5.patch, YARN-2884-V6.patch, YARN-2884-V7.patch, 
> YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start, the AM is forced (via tokens and configuration) to direct 
> all its requests to a new service running on the NM that provides a proxy to 
> the central RM. 
> This gives us a place to:
> 1) perform distributed scheduling decisions
> 2) throttle mis-behaving AMs
> 3) mask the access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2884) Proxying all AM-RM communications

2015-09-08 Thread Kishore Chaliparambil (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734966#comment-14734966
 ] 

Kishore Chaliparambil commented on YARN-2884:
-

Thanks Jian!

> Proxying all AM-RM communications
> -
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Kishore Chaliparambil
> Fix For: 2.8.0
>
> Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, 
> YARN-2884-V11.patch, YARN-2884-V12.patch, YARN-2884-V13.patch, 
> YARN-2884-V2.patch, YARN-2884-V3.patch, YARN-2884-V4.patch, 
> YARN-2884-V5.patch, YARN-2884-V6.patch, YARN-2884-V7.patch, 
> YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start, the AM is forced (via tokens and configuration) to direct 
> all its requests to a new service running on the NM that provides a proxy to 
> the central RM. 
> This gives us a place to:
> 1) perform distributed scheduling decisions
> 2) throttle mis-behaving AMs
> 3) mask the access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2410) Nodemanager ShuffleHandler can possible exhaust file descriptors

2015-09-08 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734931#comment-14734931
 ] 

Jason Lowe commented on YARN-2410:
--

Thanks for updating the patch!

bq. The only reason was findbugs which does not allow more than 7 parameters in 
a function call
Normally a builder pattern is used to make the code more readable in those 
situations.  However I don't think we need more than 7.  ReduceContext really 
only needs mapIds, reduceId, channelCtx, user, infoMap, and outputBasePathStr.  
The other two parameters are either known to be zero (should not be passed) or 
can be derived from another (the size of mapIds).  As such we don't need 
SendMapOutputParams.

bq. The reduceContext is a variable holds the value set by the setAttachment() 
method and is used by the getAttachment() answer. If I declare it in the test 
method, it needs be final which cannot be done due to it being used by the 
setter.
createMockChannel can simply have a ReduceContext parameter, marked final, and 
that should solve that problem.  But I thought we were getting rid of the use 
of channel attachments and just associating the context with the listener 
directly?

Related to the last comment, we're still using channel attachments.  sendMap 
can just take a ReduceContext parameter, and the listener can provide its 
context when calling it.  No need for channel attachments.

This can NPE since we're checking for null after we already use it:
{noformat}
+nextMap = sendMapOutput(
+reduceContext.getSendMapOutputParams().getCtx(),
+reduceContext.getSendMapOutputParams().getCtx().getChannel(),
+reduceContext.getSendMapOutputParams().getUser(), mapId,
+reduceContext.getSendMapOutputParams().getReduceId(), info);
+nextMap.addListener(new ReduceMapFileCount(reduceContext));
+if (null == nextMap) {
{noformat}
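
In other words, the null check needs to come before the dereference; a sketch 
reusing the names from the snippet above:
{code}
nextMap = sendMapOutput(
    reduceContext.getSendMapOutputParams().getCtx(),
    reduceContext.getSendMapOutputParams().getCtx().getChannel(),
    reduceContext.getSendMapOutputParams().getUser(), mapId,
    reduceContext.getSendMapOutputParams().getReduceId(), info);
if (null == nextMap) {
  // handle the failure (e.g. close the channel) and return before touching nextMap
  return;
}
nextMap.addListener(new ReduceMapFileCount(reduceContext));
{code}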

maxSendMapCount should be cached during serviceInit like the other conf-derived 
settings so we aren't doing conf lookups on every shuffle.

The indentation in sendMap isn't correct, as code is indented after a 
conditional block at the same level as the contents of the conditional block.  
There are other places that are over-indented.

MockShuffleHandler only needs to override one thing, getShuffle, but the mock 
that method returns has to override a bunch of stuff.  It makes more sense to 
create a separate class for the mocked Shuffle than the mocked ShuffleHandler.

Should the mock Future stuff be part of creating the mocked channel?  Can 
simply pass the listener list to use as an arg to the method that mocks up the 
channel.

Nit: SHUFFLE_MAX_SEND_COUNT should probably be something like 
SHUFFLE_MAX_SESSION_OPEN_FILES to better match the property name.  Similarly 
maxSendMapCount could have a more appropriate name.

Nit: Format for 80 columns

Nit: There are still instances where we have a class definition immediately after 
variable definitions and a lack of whitespace between classes and methods or 
between methods. Whitespace would help readability in those places.

> Nodemanager ShuffleHandler can possible exhaust file descriptors
> 
>
> Key: YARN-2410
> URL: https://issues.apache.org/jira/browse/YARN-2410
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Nathan Roberts
>Assignee: Kuhu Shukla
> Attachments: YARN-2410-v1.patch, YARN-2410-v2.patch, 
> YARN-2410-v3.patch, YARN-2410-v4.patch, YARN-2410-v5.patch, YARN-2410-v6.patch
>
>
> The async nature of the ShuffleHandler can cause it to open a huge number of
> file descriptors; when it runs out, it crashes.
> Scenario:
> Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node.
> Let's say all 6K reduces hit a node at about the same time asking for their
> outputs. Each reducer will ask for all 40 map outputs over a single socket in
> a single request (not necessarily all 40 at once, but with coalescing it is
> likely to be a large number).
> sendMapOutput() will open the file for random reading and then perform an
> async transfer of the particular portion of this file. This will theoretically
> happen 6000*40 = 240,000 times, which will run the NM out of file descriptors
> and cause it to crash.
> The algorithm should be refactored a little to not open the fds until they're
> actually needed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3337) Provide YARN chaos monkey

2015-09-08 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734800#comment-14734800
 ] 

Junping Du commented on YARN-3337:
--

I think there is one difficulty here: it looks like we don't keep finished 
container info in the RM scheduler state, only live container info (in 
SchedulerApplicationAttempt). If no dead container info is preserved in the RM, 
the newly added API can only send the kill-container event, with no way to know 
whether the container actually got killed (no way to differentiate a wrong 
container ID from the ID of a finished container). The CLI could do better, as 
it can query the running container list first, then kill the container and wait 
until it is no longer active. 
If we want exactly the same semantics as the kill-apps API, then we have to make 
the RM track info for dead containers, which sounds like overkill to me, as it 
forces the RM to track all containers for all applications (the complexity 
becomes the same as MRv1).
Maybe a better trade-off here is: the semantics of forceKillContainer() only 
mean that kill-container events are sent, not that the container actually got 
killed. A boolean response from forceKillContainer() indicates whether we found 
a live container to kill. So we would lose the idempotent property for this API?

> Provide YARN chaos monkey
> -
>
> Key: YARN-3337
> URL: https://issues.apache.org/jira/browse/YARN-3337
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: test
>Affects Versions: 2.7.0
>Reporter: Steve Loughran
>
> To test failure resilience today you either need custom scripts or have to 
> implement Chaos Monkey-like logic in your application (SLIDER-202). 
> Killing AMs and containers on a schedule & probability is the core activity 
> here, one that could be handled by a CLI App/client lib that does this. 
> # entry point to have a startup delay before acting
> # frequency of chaos wakeup/polling
> # probability of AM failure generation (0-100)
> # probability of non-AM container kill
> # future: other operations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3771) "final" behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734790#comment-14734790
 ] 

Hadoop QA commented on YARN-3771:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  19m  6s | Findbugs (version 3.0.0) 
appears to be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 58s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 13s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 15s | The applied patch generated  4 
new checkstyle issues (total was 211, now 201). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 20s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   0m 46s | Tests passed in 
hadoop-mapreduce-client-common. |
| {color:green}+1{color} | mapreduce tests | 107m 23s | Tests passed in 
hadoop-mapreduce-client-jobclient. |
| {color:green}+1{color} | yarn tests |   0m 29s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   7m  4s | Tests passed in 
hadoop-yarn-applications-distributedshell. |
| | | 162m  8s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737924/0001-YARN-3771.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 435f935 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9033/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-mapreduce-client-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9033/artifact/patchprocess/testrun_hadoop-mapreduce-client-common.txt
 |
| hadoop-mapreduce-client-jobclient test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9033/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9033/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-applications-distributedshell test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9033/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9033/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9033/console |


This message was automatically generated.

> "final" behavior is not honored for 
> YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH  since it is a String[]
> 
>
> Key: YARN-3771
> URL: https://issues.apache.org/jira/browse/YARN-3771
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
> Attachments: 0001-YARN-3771.patch
>
>
> I was going through some FindBugs rules. One issue reported there is that 
>  public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = {
> and 
>   public static final String[] 
> DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH=
> do not honor the final qualifier. The string array contents can be 
> reassigned!
> Simple test:
> {code}
> public class TestClass {
>   static final String[] t = { "1", "2" };
>   public static void main(String[] args) {
>     System.out.println(12 < 10);
>     String[] t1 = { "u" };
>     // t = t1; // this will show a compilation error
>     t[0] = t1[0]; // But this works: the array elements are still mutable
>   }
> }
> {code}
> One option is to use Collections.unmodifiableList.
> Any thoughts?
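
For reference, a minimal sketch of the Collections.unmodifiableList option (the 
class name and list entries below are placeholders, not the real classpath values):
{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ClasspathDefaults {
  // Illustrative only: the real default classpath entries are not shown here.
  public static final List<String> DEFAULT_CLASSPATH =
      Collections.unmodifiableList(Arrays.asList("entry1", "entry2"));

  public static void main(String[] args) {
    // Unlike the String[] version, element mutation now fails at runtime:
    DEFAULT_CLASSPATH.set(0, "other");  // throws UnsupportedOperationException
  }
}
{code}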



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4130) Duplicate declaration of ApplicationId in RMAppManager

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734776#comment-14734776
 ] 

Hadoop QA commented on YARN-4130:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 36s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 41s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 51s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 49s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 28s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  54m 31s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  93m 23s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMHA |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754621/YARN-4130.00.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 435f935 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9036/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9036/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9036/console |


This message was automatically generated.

> Duplicate declaration of ApplicationId in RMAppManager
> --
>
> Key: YARN-4130
> URL: https://issues.apache.org/jira/browse/YARN-4130
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Trivial
>  Labels: resourcemanager
> Attachments: YARN-4130.00.patch
>
>
> ApplicationId is declared twice in {{RMAppManager}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734751#comment-14734751
 ] 

Hadoop QA commented on YARN-4022:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 20s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 38s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 59s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 50s | The applied patch generated  
11 new checkstyle issues (total was 85, now 94). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 33s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  54m  7s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m 55s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12754618/YARN-4022.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 435f935 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9035/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9035/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9035/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9035/console |


This message was automatically generated.

> queue not remove from webpage(/cluster/scheduler) when delete queue in 
> xxx-scheduler.xml
> 
>
> Key: YARN-4022
> URL: https://issues.apache.org/jira/browse/YARN-4022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: forrestchen
>  Labels: scheduler
> Attachments: YARN-4022.001.patch, YARN-4022.002.patch
>
>
> When I delete an existing queue by modifying the xxx-scheduler.xml, I can still 
> see the queue information block on the webpage (/cluster/scheduler), though the 
> 'Min Resources' items all become zero and there is no 'Max Running 
> Applications' item.
> I can still submit an application to the deleted queue and the application 
> will run in the 'root.default' queue instead, but submitting to a non-existent 
> queue will cause an exception.
> My expectation is that the deleted queue will not be displayed on the webpage 
> and that submitting an application to the deleted queue will act just like the 
> queue doesn't exist.
> PS: There is no application running in the queue I delete.
> Some related config in yarn-site.xml:
> {code}
> <property>
>   <name>yarn.scheduler.fair.user-as-default-queue</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.scheduler.fair.allow-undeclared-pools</name>
>   <value>false</value>
> </property>
> {code}
> a related question is here: 
> http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

