[jira] [Commented] (YARN-4354) Public resource localization fails with NPE

2015-11-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006883#comment-15006883
 ] 

Junping Du commented on YARN-4354:
--

bq. I don't think there's anything magical about localization vs. the other 
things the NM is doing. The async dispatcher will only exit if an exception 
leaks up to the top, and when it does that's a programming error since it 
doesn't handle an exception properly.
I agree there is not much difference overall. However, back to this case: from a 
user's perspective, an occasional NPE during localization of a resource that is 
being cancelled would be better ignored (but logged) than allowed to crash the 
NM. The price of ignoring the exception is potentially leaking a half-localized 
file (which could be removed later), but the gain is that the NM survives and 
keeps working. We should at least offer this trade-off to the user as a 
configurable choice, shouldn't we?

bq.  If we're willing for NPEs in localization to not take down the NM, why are 
we willing to do the same if it happens in another NM subsystem that also uses 
the AsyncDispatcher? IMHO we should be consistent about the unexpected 
exception handling.
I am not against keeping localization event handling consistent with the other 
subsystems, but I am not sure whether ignoring other exceptional events could 
leave the NM in a bad state. I think that is the motivation for separating 
SchedulerEventDispatcher from the RM's general-event dispatcher, with different 
settings/behavior. No?
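
For illustration only, here is a rough sketch of the "separate dispatcher" idea 
(this is not the actual patch; it assumes the existing AsyncDispatcher and 
Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY plumbing from org.apache.hadoop.yarn.event, 
and the tracker/LOG references are placeholders):

{code}
// Give localization its own dispatcher that logs unexpected errors
// instead of exiting the NM.
Configuration dispatcherConf = new Configuration(conf);
dispatcherConf.setBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY, false);

AsyncDispatcher localizationDispatcher = new AsyncDispatcher();
localizationDispatcher.init(dispatcherConf);
localizationDispatcher.register(LocalizationEventType.class,
    new EventHandler<LocalizationEvent>() {
      @Override
      public void handle(LocalizationEvent event) {
        try {
          tracker.handle(event);   // delegate to the existing handler logic
        } catch (RuntimeException e) {
          // Price: a half-localized file may be leaked (cleaned up later);
          // gain: the NM keeps running.
          LOG.error("Ignoring unexpected error during localization", e);
        }
      }
    });
localizationDispatcher.start();
{code}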

> Public resource localization fails with NPE
> ---
>
> Key: YARN-4354
> URL: https://issues.apache.org/jira/browse/YARN-4354
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-4354-branch-2.7.002.patch, 
> YARN-4354-unittest.patch, YARN-4354.001.patch, YARN-4354.002.patch
>
>
> I saw public localization on nodemanagers get stuck because it was constantly 
> rejecting requests to the thread pool executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (YARN-4354) Public resource localization fails with NPE

2015-11-16 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4354:
-
Comment: was deleted

(was: Sorry for making a mistake. I was paying more attention to other 
conflicts than to this change...
Thanks [~jlowe] for fixing this.)

> Public resource localization fails with NPE
> ---
>
> Key: YARN-4354
> URL: https://issues.apache.org/jira/browse/YARN-4354
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-4354-branch-2.7.002.patch, 
> YARN-4354-unittest.patch, YARN-4354.001.patch, YARN-4354.002.patch
>
>
> I saw public localization on nodemanagers get stuck because it was constantly 
> rejecting requests to the thread pool executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4354) Public resource localization fails with NPE

2015-11-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006780#comment-15006780
 ] 

Junping Du commented on YARN-4354:
--

Sorry for making a mistake. I was paying more attention to other conflicts 
than to this change...
Thanks [~jlowe] for fixing this.

> Public resource localization fails with NPE
> ---
>
> Key: YARN-4354
> URL: https://issues.apache.org/jira/browse/YARN-4354
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-4354-branch-2.7.002.patch, 
> YARN-4354-unittest.patch, YARN-4354.001.patch, YARN-4354.002.patch
>
>
> I saw public localization on nodemanagers get stuck because it was constantly 
> rejecting requests to the thread pool executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4354) Public resource localization fails with NPE

2015-11-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006779#comment-15006779
 ] 

Junping Du commented on YARN-4354:
--

Sorry for making a mistake. I was paying more attention to other conflicts 
than to this change...
Thanks [~jlowe] for fixing this.

> Public resource localization fails with NPE
> ---
>
> Key: YARN-4354
> URL: https://issues.apache.org/jira/browse/YARN-4354
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-4354-branch-2.7.002.patch, 
> YARN-4354-unittest.patch, YARN-4354.001.patch, YARN-4354.002.patch
>
>
> I saw public localization on nodemanagers get stuck because it was constantly 
> rejecting requests to the thread pool executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4354) Public resource localization fails with NPE

2015-11-16 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-4354:
-
Attachment: YARN-4354-branch-2.7.002.patch

The commit to branch-2.7 broke the build because the LocalResourcesTrackerImpl 
constructor arguments differ from those in branch-2.  Attached is the version 
of the patch I committed to branch-2.7.

> Public resource localization fails with NPE
> ---
>
> Key: YARN-4354
> URL: https://issues.apache.org/jira/browse/YARN-4354
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-4354-branch-2.7.002.patch, 
> YARN-4354-unittest.patch, YARN-4354.001.patch, YARN-4354.002.patch
>
>
> I saw public localization on nodemanagers get stuck because it was constantly 
> rejecting requests to the thread pool executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4354) Public resource localization fails with NPE

2015-11-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006683#comment-15006683
 ] 

Jason Lowe commented on YARN-4354:
--

bq. To make the NM more robust, I think we should tolerate this kind of 
failure/exception in LocalResourcesTracker rather than letting the NM's 
dispatcher crash and exit. Maybe we can give LocalResourcesTracker a separate 
AsyncDispatcher with "DISPATCHER_EXIT_ON_ERROR_KEY" set to false, like what we 
do in the RM for SchedulerEventDispatcher?

I don't think there's anything magical about localization vs. the other things 
the NM is doing.  The async dispatcher will only exit if an exception leaks up 
to the top, and when it does that's a programming error since it doesn't handle 
an exception properly.  If we're willing for NPEs in localization to not take 
down the NM, why are we willing to do the same if it happens in another NM 
subsystem that also uses the AsyncDispatcher?  IMHO we should be consistent 
about the unexpected exception handling.

> Public resource localization fails with NPE
> ---
>
> Key: YARN-4354
> URL: https://issues.apache.org/jira/browse/YARN-4354
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-4354-unittest.patch, YARN-4354.001.patch, 
> YARN-4354.002.patch
>
>
> I saw public localization on nodemanagers get stuck because it was constantly 
> rejecting requests to the thread pool executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information

2015-11-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007106#comment-15007106
 ] 

Wangda Tan commented on YARN-4290:
--

[~sunilg]/[~Naganarasimha],
There are several resources in the output that I think are a little confusing. 
How about calling them as follows (a rough sketch of such output follows the list):
- Configured Resources: MEM/CPU
- Allocated Resources: MEM/CPU
- Used Resources by NM node: MEM/CPU
- Used Resources by launched containers: MEM/CPU
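
For illustration, a minimal sketch of what such per-node output could look like 
when built from a NodeReport (getCapability()/getUsed() are existing getters; 
the "NM node" vs. "launched containers" utilization split is only assumed here):

{code}
for (NodeReport node : yarnClient.getNodeReports(NodeState.RUNNING)) {
  Resource configured = node.getCapability();  // total resources on the node
  Resource allocated = node.getUsed();         // resources allocated by the RM
  System.out.println("Node : " + node.getNodeId());
  System.out.println("  Configured Resources : " + configured.getMemory()
      + " MB, " + configured.getVirtualCores() + " vcores");
  System.out.println("  Allocated Resources  : " + allocated.getMemory()
      + " MB, " + allocated.getVirtualCores() + " vcores");
  // "Used Resources by NM node" / "by launched containers" would come from
  // node utilization reports, which are not shown in this sketch.
}
{code}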

> "yarn nodes -list" should print all nodes reports information
> -
>
> Key: YARN-4290
> URL: https://issues.apache.org/jira/browse/YARN-4290
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Wangda Tan
>Assignee: Sunil G
>
> Currently, "yarn nodes -list" command only shows 
> - "Node-Id", 
> - "Node-State", 
> - "Node-Http-Address",
> - "Number-of-Running-Containers"
> I think we need to show more information such as used resource, just like 
> "yarn nodes -status" command.
> Maybe we can add a parameter to -list, such as "-show-details" to enable 
> printing all detailed information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4350) TestDistributedShell fails

2015-11-16 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007077#comment-15007077
 ] 

Sangjin Lee commented on YARN-4350:
---

Thanks [~Naganarasimha] for the quick patch. I applied the patch on our branch, 
and the test passes now.

Are these changes specific to YARN-2928 only, or is part of them required for 
trunk as well (i.e. YARN-2859)?

> TestDistributedShell fails
> --
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.008.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These test fail more often than not if tested by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)

2015-11-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007076#comment-15007076
 ] 

Jason Lowe commented on YARN-3840:
--

Did anyone do a performance test with this change?  Recently I have been seeing 
the main RM UI page take a _lot_ longer to load with a full set of 10,000+ 
applications, and it appears to be all client-side processing.  It takes so 
long that the client-side browser complains:
{noformat}
A script on this page may be busy, or it may have stopped responding. You can 
stop the script now, open the script in the debugger, or let the script 
continue.

Script: http://rmhost:8088/static/dt-plugin-1.10.7/sorting/natural.js
{noformat}


> Resource Manager web ui issue when sorting application by id (with 
> application having id > 9999)
> 
>
> Key: YARN-3840
> URL: https://issues.apache.org/jira/browse/YARN-3840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: LINTE
>Assignee: Mohammad Shahid Khan
> Fix For: 2.8.0, 2.7.3
>
> Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, 
> YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch, 
> yarn-3840-7.patch
>
>
> On the WEBUI, the global main view page: 
> http://resourcemanager:8088/cluster/apps doesn't display applications over 
> 9999.
> With the command line it works (# yarn application -list).
> Regards,
> Alexandre



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-11-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007083#comment-15007083
 ] 

Wangda Tan commented on YARN-3769:
--

[~eepayne], thanks for the update:

bq. Would it be more efficient to just do the following? ... 
The problem is that getUserResourceLimit is not always updated by the scheduler. 
If a queue is not traversed by the scheduler, or the apps of a queue-user have a 
long heartbeat interval, the user resource limit could be stale.

I found that the 0005 patch for trunk computes the user limit every time, while 
the 0005 patch for 2.7 uses getUserResourceLimit.

Thoughts? 

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> -
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: YARN-3769-branch-2.002.patch, 
> YARN-3769-branch-2.7.002.patch, YARN-3769-branch-2.7.003.patch, 
> YARN-3769-branch-2.7.005.patch, YARN-3769.001.branch-2.7.patch, 
> YARN-3769.001.branch-2.8.patch, YARN-3769.003.patch, YARN-3769.004.patch, 
> YARN-3769.005.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster

2015-11-16 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007082#comment-15007082
 ] 

Sangjin Lee commented on YARN-2859:
---

Just to be sure, those 3 issues you mentioned impact only the timeline service 
v.2 (YARN-2928) branch and do not impact trunk, correct?

> ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
> --
>
> Key: YARN-2859
> URL: https://issues.apache.org/jira/browse/YARN-2859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Hitesh Shah
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Fix For: 2.8.0, 2.7.2, 2.6.3
>
> Attachments: YARN-2859-addendum.txt, YARN-2859.txt
>
>
> In the mini cluster, a random port should be used. 
> Also, the config is not updated with the host that the process actually bound to.
> {code}
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer 
> address: localhost:10200
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer 
> web address: 0.0.0.0:8188
> {code}
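
For a test that hits this, a minimal sketch of the usual workaround (binding to 
an ephemeral port and reading back the bound address) might look like the 
following; this is an assumption about the fix direction, not the committed patch:

{code}
Configuration conf = new YarnConfiguration();
conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, true);
// Ask for ephemeral ports instead of the 10200/8188 defaults.
conf.set(YarnConfiguration.TIMELINE_SERVICE_ADDRESS, "localhost:0");
conf.set(YarnConfiguration.TIMELINE_SERVICE_WEBAPP_ADDRESS, "localhost:0");

MiniYARNCluster cluster = new MiniYARNCluster("ahs-port-test", 1, 1, 1);
cluster.init(conf);
cluster.start();

// The config should then reflect the address the AHS actually bound to.
String ahsWebAddress =
    cluster.getConfig().get(YarnConfiguration.TIMELINE_SERVICE_WEBAPP_ADDRESS);
{code}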



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement

2015-11-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007129#comment-15007129
 ] 

Wangda Tan commented on YARN-4287:
--

Thanks for the update [~nroberts]. I tried this patch on 2.7 and all CS tests 
passed with it. I will commit this to branch-2.7 today if there are no objections.

> Capacity Scheduler: Rack Locality improvement
> -
>
> Key: YARN-4287
> URL: https://issues.apache.org/jira/browse/YARN-4287
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.8.0
>
> Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, 
> YARN-4287-minimal-v4-branch-2.7.patch, YARN-4287-minimal-v4.patch, 
> YARN-4287-minimal.patch, YARN-4287-v2.patch, YARN-4287-v3.patch, 
> YARN-4287-v4.patch, YARN-4287.patch
>
>
> YARN-4189 does an excellent job describing the issues with the current delay 
> scheduling algorithms within the capacity scheduler. The design proposal also 
> seems like a good direction.
> This jira proposes a simple interim solution to the key issue we've been 
> experiencing on a regular basis:
>  - rackLocal assignments trickle out due to nodeLocalityDelay. This can have 
> significant impact on things like CombineFileInputFormat which targets very 
> specific nodes in its split calculations.
> I'm not sure when YARN-4189 will become reality so I thought a simple interim 
> patch might make sense. The basic idea is simple: 
> 1) Separate delays for rackLocal, and OffSwitch (today there is only 1)
> 2) When we're getting rackLocal assignments, subsequent rackLocal assignments 
> should not be delayed
> Patch will be uploaded shortly. No big deal if the consensus is to go 
> straight to YARN-4189. 
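
For illustration, a hypothetical sketch of the two-delay idea from points 1) and 
2) above (the names below are invented for this sketch and are not from the 
actual patch):

{code}
// missedOpportunities = scheduling opportunities skipped for this priority.
boolean canAssignRackLocal(int missedOpportunities, boolean lastWasRackLocal) {
  // 2) once rack-local assignments start flowing, don't keep delaying them
  if (lastWasRackLocal) {
    return true;
  }
  return missedOpportunities >= nodeLocalityDelay;
}

boolean canAssignOffSwitch(int missedOpportunities) {
  // 1) a separate, typically larger, delay before falling back to off-switch
  return missedOpportunities >= rackLocalityDelay;
}
{code}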



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-11-16 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-4225:
-
Attachment: YARN-4225.001.patch

Attaching YARN-4225.001.patch for both trunk and branch-2.8

> Add preemption status to yarn queue -status for capacity scheduler
> --
>
> Key: YARN-4225
> URL: https://issues.apache.org/jira/browse/YARN-4225
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Attachments: YARN-4225.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues

2015-11-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007172#comment-15007172
 ] 

Wangda Tan commented on YARN-3849:
--

[~sunilg], I tried to apply the patch to branch-2.7 but it failed. Could you 
update the patch for branch-2.7?

Thanks,

> Too much of preemption activity causing continuos killing of containers 
> across queues
> -
>
> Key: YARN-3849
> URL: https://issues.apache.org/jira/browse/YARN-3849
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch, 
> 0003-YARN-3849.patch, 0004-YARN-3849.patch
>
>
> Two queues are used. Each queue is given a capacity of 0.5. The Dominant 
> Resource policy is used.
> 1. An app is submitted in QueueA, which consumes the full cluster capacity.
> 2. After submitting an app in QueueB, there is some demand, and preemption is 
> invoked in QueueA.
> 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we 
> observed that all containers other than the AM are getting killed in QueueA.
> 4. Now the app in QueueB tries to take over the cluster with the current free 
> space. But there is updated demand from the app in QueueA, which lost its 
> containers earlier, and preemption now kicks in on QueueB.
> The scenario in steps 3 and 4 keeps happening in a loop. Thus none of the 
> apps complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-16 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007196#comment-15007196
 ] 

Xuan Gong commented on YARN-4234:
-

Fixed the test case failures and the whitespace warning.

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015.2.patch, YARN-4234.1.patch, 
> YARN-4234.2.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.20151109.patch, 
> YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3454) RLESparseResourceAllocation does not handle removal of partial intervals (+ introducing support for efficient "merge" operations)

2015-11-16 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-3454:
---
Description: 
The RLESparseResourceAllocation.removeInterval(...) method handles exact-match 
interval removals well, but does not correctly handle partial overlaps. 
In the context of this fix, we also introduced static methods to "merge" two 
RLESparseResourceAllocations while applying an operator in the process 
(add/subtract/min/max/subtractTestPositive)

  was:The RLESparseResourceAllocation.removeInterval(...) method handles well 
exact match interval removals, but does not handles correctly partial overlaps. 


> RLESparseResourceAllocation does not handle removal of partial intervals (+ 
> introducing support for efficient "merge" operations) 
> --
>
> Key: YARN-3454
> URL: https://issues.apache.org/jira/browse/YARN-3454
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>
> The RLESparseResourceAllocation.removeInterval(...) method handles well exact 
> match interval removals, but does not handles correctly partial overlaps. 
> In the context of this fix, we also introduced static methods to "merge" two 
> RLESparseResourceAllocation, while applying an operator in the process 
> (add/subtract/min/max/subtractTestPositive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3454) RLESparseResourceAllocation does not handle removal of partial intervals (+ introducing support for efficient "merge" operations)

2015-11-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007518#comment-15007518
 ] 

Carlo Curino commented on YARN-3454:


The new "merge" functionalities we added has been used both to substitute the 
previously buggy logic that add/remove intervals, as well as to provide new 
functionalities. 
It is now easy to take two RLESparseResourceAllocation and subtract one from 
the other, or subtract and test the result is not negative (which we plan to 
use to improve the Plan/ReservationAgent interactions), as well as computing 
max/min etc.
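
As a purely hypothetical usage sketch (the method name, signature, and operator 
enum below are assumptions based on this description, not the committed API):

{code}
// Subtract one allocation from the other, failing if the result would
// ever go negative (useful for Plan/ReservationAgent checks).
RLESparseResourceAllocation remaining =
    RLESparseResourceAllocation.merge(
        planAllocation, reservationAllocation, RLEOperator.subtractTestPositive);
{code}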
 

> RLESparseResourceAllocation does not handle removal of partial intervals (+ 
> introducing support for efficient "merge" operations) 
> --
>
> Key: YARN-3454
> URL: https://issues.apache.org/jira/browse/YARN-3454
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>
> The RLESparseResourceAllocation.removeInterval(...) method handles well exact 
> match interval removals, but does not handles correctly partial overlaps. 
> In the context of this fix, we also introduced static methods to "merge" two 
> RLESparseResourceAllocation, while applying an operator in the process 
> (add/subtract/min/max/subtractTestPositive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007357#comment-15007357
 ] 

Hadoop QA commented on YARN-4234:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 14s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
41s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 14s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
trunk has 3 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 36s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 25s 
{color} | {color:red} Patch generated 41 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 235, now 273). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 49s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 52s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_85. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 5s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 6s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense 

[jira] [Updated] (YARN-3454) RLESparseResourceAllocation does not handle removal of partial intervals (+ introducing support for efficient "merge" operations)

2015-11-16 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-3454:
---
Summary: RLESparseResourceAllocation does not handle removal of partial 
intervals (+ introducing support for efficient "merge" operations)   (was: 
RLESparseResourceAllocation does not handle removal of partial intervals)

> RLESparseResourceAllocation does not handle removal of partial intervals (+ 
> introducing support for efficient "merge" operations) 
> --
>
> Key: YARN-3454
> URL: https://issues.apache.org/jira/browse/YARN-3454
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>
> The RLESparseResourceAllocation.removeInterval(...) method handles well exact 
> match interval removals, but does not handles correctly partial overlaps. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4297) TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 branch

2015-11-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007499#comment-15007499
 ] 

Hadoop QA commented on YARN-4297:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 6s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
4s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 26s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 17s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
4s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
36s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
23s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 16s 
{color} | {color:red} hadoop-yarn-server-timelineservice in feature-YARN-2928 
failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 43s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice
 introduced 1 new FindBugs issues. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 15s 
{color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 33s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 57s 
{color} | {color:green} hadoop-mapreduce-client-app in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 50s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 33s 
{color} | {color:green} hadoop-mapreduce-client-app in the patch passed with 
JDK v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 24s 
{color} | {color:red} Patch generated 7 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m 44s {color} 
| {color:black} {color} |
\\
\\
|| Reason || 

[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-11-16 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007543#comment-15007543
 ] 

Naganarasimha G R commented on YARN-4183:
-

Hi [~sjlee0],
Thanks for sharing your thoughts
bq. If the answer is "yarn.resourcemanager.system-metrics-publisher.enabled 
should not be set to true if the timeline service is disabled", then it only 
makes it clear that yarn.resourcemanager.system-metrics-publisher.enabled=true 
implies yarn.timeline-service.enabled=true. Then we should check it explicitly. 
Thoughts?
{{yarn.timeline-service.enabled}} does not imply that the timeline service is 
actually running: we can enable it, and if the timeline server is not running 
we still face the same problem. We can also set {{yarn.timeline-service.enabled}} 
to false and still start the AHS / timeline server; IIUC it is not used anywhere 
in the AHS / timeline server during startup. Hence I interpreted it as a 
client-side config that tells the YarnClient "I am trying to use the timeline 
server". My initial thoughts about this configuration were similar to your 
approach, but it has flaws, because another configuration alone cannot guarantee 
that the timeline service is running in the cluster. 
Maybe we could fail RM startup if the SMP is not able to connect to the timeline 
server, but IIUC [~jeagles] and [~jlowe] said in another ATS 1.5 jira that the 
cluster should keep running even if the timeline server is not running, so 
failing RM startup is not desirable. The most we can do is document that 
*"yarn.resourcemanager.system-metrics-publisher.enabled"* requires the timeline 
service to be running.

bq. but the way I view it is that it should act as a "master switch" for the 
timeline service; i.e. the highest level switch that toggles the feature on and 
off on all sides
This implies that when we start the timeline service daemon we need to check 
{{yarn.timeline-service.enabled}} and, if it is false, shut the daemon down? 
None of the other daemons behave that way, so would that be OK? Also, if the 
configuration used by the timeline server has it but the other daemons' 
configurations do not, we face the same issue again.

bq. Also, consider the fact that the system metrics publisher may not be the 
only server-side component that interacts with the timeline service. There may 
be others and there will be more with the timeline service v.2 (e.g. NM 
collector service, etc.).
I agree that we will have a lot of additions in future versions, but we will 
also have further configurations such as the ATS version; on top of that, will 
one more configuration like {{yarn.timeline-service.enabled}} really be useful?

These are my views, but if we still want to go ahead with 
{{yarn.timeline-service.enabled}}, then we might need to come up with a *new 
configuration* to indicate that the client wants to use the timeline server and 
hence create the timeline client and the timeline server delegation tokens. With 
the current approach we will hit the issues mentioned by [~jeagles]: server 
configurations are copied to all clients, and if the timeline server is enabled 
then delegation tokens are created. So each client would have to explicitly 
reset {{yarn.timeline-service.enabled}} to false if it does not want to use it.
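
For reference, a minimal sketch of the kind of explicit check being debated, 
using the existing YarnConfiguration keys (whether such a check is the right 
answer is exactly the open question here; the LOG reference is a placeholder):

{code}
boolean smpEnabled = conf.getBoolean(
    YarnConfiguration.SYSTEM_METRICS_PUBLISHER_ENABLED,
    YarnConfiguration.DEFAULT_SYSTEM_METRICS_PUBLISHER_ENABLED);
boolean timelineEnabled = conf.getBoolean(
    YarnConfiguration.TIMELINE_SERVICE_ENABLED,
    YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);

if (smpEnabled && !timelineEnabled) {
  // Can only warn: even both flags being true does not guarantee that a
  // timeline server is actually reachable in the cluster.
  LOG.warn("system-metrics-publisher is enabled but the timeline service is "
      + "disabled; events will not be published.");
}
{code}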

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-11-16 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4140:
-
Target Version/s: 2.7.3

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> Trying to run an application on a Nodelabel partition, I found that the 
> application execution time is delayed by 5 – 10 min for 500 containers. 
> In total 3 machines; 2 machines were in the same partition and the app was 
> submitted to it.
> After enabling debug I was able to find the below:
> # From the AM the container ask is for OFF-SWITCH
> # The RM is allocating all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes to allocate the 
> 1st map after AM allocation.
> # Tested with about 1K maps using a PI job; it took 17 minutes to allocate the 
> next container after AM allocation.
> Once the 500 container allocations on NODE_LOCAL are done, the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat 

[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-11-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007360#comment-15007360
 ] 

Wangda Tan commented on YARN-4140:
--

I think we may need to consider putting this into the next 2.7 release; it 
solves an allocation issue when using MR (or other applications that require 
locality) under node labels. Thoughts? [~bibinchundatt]/[~sunilg]/[~Naganarasimha].

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> Trying to run an application on a Nodelabel partition, I found that the 
> application execution time is delayed by 5 – 10 min for 500 containers. 
> In total 3 machines; 2 machines were in the same partition and the app was 
> submitted to it.
> After enabling debug I was able to find the below:
> # From the AM the container ask is for OFF-SWITCH
> # The RM is allocating all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes to allocate the 
> 1st map after AM allocation.
> # Tested with about 1K maps using a PI job; it took 17 minutes to allocate the 
> next container after AM allocation.
> Once the 500 container allocations on NODE_LOCAL are done, the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, 

[jira] [Commented] (YARN-4350) TestDistributedShell fails

2015-11-16 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007392#comment-15007392
 ] 

Naganarasimha G R commented on YARN-4350:
-

Thanks [~sjlee0] for verifying it.
A few points, though:
* Although the race condition gets fixed, the approach I had taken had other 
impacts, which I rectified in YARN-3127, so YARN-3127 should go in first. I can 
rebase the patch, but I would need your support to get that one in first; it 
too has been pending for a long time.
* For the port issue we need to see what approach will be taken in YARN-2859; 
based on that I can rework things here.

> TestDistributedShell fails
> --
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.008.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These test fail more often than not if tested by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-16 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007493#comment-15007493
 ] 

Li Lu commented on YARN-4234:
-

One more problem: looking into the writer, I can see that it maintains one 
file descriptor per app attempt:
private Map entityLogFDs;
While this is totally fine for summary logs, it will cause entities belonging to 
different entity groups to be redirected to the wrong file.
Maybe we need to maintain a mapping from active entity groups (will there be 
too many?) to open file descriptors? That way we can find the right file to 
write to for each entity group id. 
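
A rough sketch of the alternative keying being suggested (all types and names 
here are placeholders; the real writer and ID classes in the patch may differ):

{code}
// Key open writers by (app attempt, entity group) instead of app attempt only,
// so entities from different groups never share a file.
private final Map<String, Writer> entityLogFDs = new HashMap<>();

private Writer getWriterForGroup(String appAttemptId, String entityGroupId)
    throws IOException {
  String key = appAttemptId + "_" + entityGroupId;
  Writer writer = entityLogFDs.get(key);
  if (writer == null) {
    // One file per active entity group; how many of these can stay open at
    // once is the open question raised above.
    writer = new FileWriter(new File(logDir, key + ".log"), true);
    entityLogFDs.put(key, writer);
  }
  return writer;
}
{code}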

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015.2.patch, YARN-4234.1.patch, 
> YARN-4234.2.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.20151109.patch, 
> YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage

2015-11-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007358#comment-15007358
 ] 

Hadoop QA commented on YARN-4053:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 6s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
8s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
38s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 17s 
{color} | {color:red} hadoop-yarn-server-timelineservice in feature-YARN-2928 
failed with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
41s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 15s 
{color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed 
with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 35s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 48s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 21m 36s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 
Image:test-patch-base-hadoop-date2015-11-16 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12772577/YARN-4053-feature-YARN-2928.04.patch
 |
| JIRA Issue | YARN-4053 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 2f046c7ddf23 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 

[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster

2015-11-16 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007382#comment-15007382
 ] 

Naganarasimha G R commented on YARN-2859:
-

Hi [~sjlee0],
Ok to reword it. Suppose we enable 
{{yarn.resourcemanager.system-metrics-publisher.enabled}} in 2.7.2 and run 
TestDistributedShell; then we face the same problem (SMP will fail to connect to 
the timeline server) with the MiniYARN cluster. Hence this fix is not a complete 
fix, although none of the test cases fail. So if *the MiniYARN cluster is 
required to be usable with the system-metrics-publisher enabled*, then it needs 
to be fixed, and it is faster to fix it my way.  If the MiniYARN cluster doesn't 
need *the system-metrics-publisher enabled*, then for ATS v2 we will have to 
change it as part of YARN-4350.

> ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
> --
>
> Key: YARN-2859
> URL: https://issues.apache.org/jira/browse/YARN-2859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Hitesh Shah
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Fix For: 2.8.0, 2.7.2, 2.6.3
>
> Attachments: YARN-2859-addendum.txt, YARN-2859.txt
>
>
> In mini cluster, a random port should be used. 
> Also, the config is not updated to the host that the process got bound to.
> {code}
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer 
> address: localhost:10200
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster 
> (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer 
> web address: 0.0.0.0:8188
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3454) RLESparseResourceAllocation does not handle removal of partial intervals

2015-11-16 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-3454:
--

Assignee: Carlo Curino

> RLESparseResourceAllocation does not handle removal of partial intervals
> 
>
> Key: YARN-3454
> URL: https://issues.apache.org/jira/browse/YARN-3454
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>
> The RLESparseResourceAllocation.removeInterval(...) method handles exact-match 
> interval removals well, but does not correctly handle partial overlaps. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)

2015-11-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007329#comment-15007329
 ] 

Jason Lowe commented on YARN-3840:
--

Filed YARN-4357 to track the performance issue with the apps page.

> Resource Manager web ui issue when sorting application by id (with 
> application having id > 9999)
> 
>
> Key: YARN-3840
> URL: https://issues.apache.org/jira/browse/YARN-3840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: LINTE
>Assignee: Mohammad Shahid Khan
> Fix For: 2.8.0, 2.7.3
>
> Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, 
> YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch, 
> yarn-3840-7.patch
>
>
> On the WEBUI, the global main view page: 
> http://resourcemanager:8088/cluster/apps doesn't display applications over 
> 9999.
> With command line it works (# yarn application -list).
> Regards,
> Alexandre



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4357) Applications page loads very slowly when there are lots of applications

2015-11-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007326#comment-15007326
 ] 

Jason Lowe commented on YARN-4357:
--

The extra time appears to be all client-side processing in the browser.  The 
browser is complaining about a script taking too long:
{noformat}
A script on this page may be busy, or it may have stopped responding. You can 
stop the script now, open the script in the debugger, or let the script 
continue.

Script: http://rmhost:8088/static/dt-plugin-1.10.7/sorting/natural.js
{noformat}

Looks like this was introduced by YARN-3840.

> Applications page loads very slowly when there are lots of applications
> ---
>
> Key: YARN-4357
> URL: https://issues.apache.org/jira/browse/YARN-4357
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Priority: Critical
>
> It takes a long time (on the order of minutes) to load the application page 
> when there are many applications (e.g.: 10,000 or so).  This page used to 
> load much faster (on the order of a few seconds).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-16 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007208#comment-15007208
 ] 

Li Lu commented on YARN-4234:
-

Hi [~xgong], not sure if you've noticed [~sjlee0]'s comments in YARN-4183:

bq. (yarn.timeline-service.version)
bq. I'd like to point out an interesting possibility raised in another JIRA by 
Joep Rottinghuis. With v.2 and especially early on with v.2, it would be rather 
useful to be able to enable both v.1 (or v.1.5) and v.2. That would provide a 
useful verification and comparison environment with a single cluster. The way 
it's being discussed right now, it sounds like the version would be a single 
value (mutually exclusive). Wouldn't it be good to have a possibility to be 
able to enable more than one version? Thoughts?

I think the proposal is still compatible with the current design here, with 
some simple changes on the sanity-check side. How about implementing this 
proposal? Thanks! 
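
For what it's worth, a minimal sketch of that multi-version idea, assuming the 
version property were allowed to hold a comma-separated list (the list handling 
is purely an assumption, not current behavior):

{code}
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;

public class TimelineVersionCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Hypothetical: read the version setting as a list instead of a single value.
    List<String> versions =
        Arrays.asList(conf.getStrings("yarn.timeline-service.version", "1.0"));
    boolean useV15 = versions.contains("1.5");
    boolean useV2 = versions.contains("2.0");
    System.out.println("v1.5 enabled: " + useV15 + ", v2 enabled: " + useV2);
  }
}
{code}

The sanity check would then only need to reject unknown values rather than 
enforce a single one.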

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015.2.patch, YARN-4234.1.patch, 
> YARN-4234.2.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.20151109.patch, 
> YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-11-16 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007242#comment-15007242
 ] 

Sangjin Lee commented on YARN-4183:
---

I agree we probably shouldn't put too many points of discussion here that may 
not be core to this JIRA at hand. I'd like to focus on the 
SystemMetricsPublisher and 
yarn.resourcemanager.system-metrics-publisher.enabled and 
yarn.timeline-service.enabled.

bq. as far as 2.7.2 is concerned i feel 
yarn.resourcemanager.system-metrics-publisher.enabled is sufficient to be 
configured.

I'm not sure if that is desirable. Here is a key question. Suppose the timeline 
service is disabled, and no timeline daemons are running. And suppose 
yarn.resourcemanager.system-metrics-publisher.enabled is *true*, and we changed 
SystemMetricsPublisher to check only that flag. What would happen? AFAICT, the 
SystemMetricsPublisher will fire up the timeline client, and will try to send 
all the events actively to the timeline server. But since the timeline server 
is down, it will lead to continuous failures of writing to the timeline server, 
right? IMO, this type of very late failure is deeply unsatisfying and 
problematic.

If the answer is "yarn.resourcemanager.system-metrics-publisher.enabled should 
not be set to true if the timeline service is disabled", then it only makes it 
clear that yarn.resourcemanager.system-metrics-publisher.enabled=true implies 
yarn.timeline-service.enabled=true. Then we should check it explicitly. 
Thoughts?

bq. As far as i view it "yarn.timeline-service.enabled"* name is misleading, it 
should be more to signify client requires the timeline service's delegation 
token. Which will not be a server side config. Thoughts?

I'm not sure if that's how it's currently interpreted, but the way I view it is 
that it should act as a "master switch" for the timeline service; i.e. the 
highest level switch that toggles the feature on and off on all sides. There 
can be "sub-switches" that can control finer-grained parts of the feature (e.g. 
the system metrics publisher). But those subfeatures should always check the 
master switch before checking their own. This will lead to a clean and 
consistent pattern of using the feature everywhere.

Also, consider the fact that the system metrics publisher may not be the only 
server-side component that interacts with the timeline service. There may be 
others and there will be more with the timeline service v.2 (e.g. NM collector 
service, etc.). If they all handle the failure case of the timeline server not 
being up in their own way, it would be quite confusing and error-prone. It 
would be consistent and easy to handle if everyone checks the master switch 
(and possibly their own subfeature switch), and wires off the feature as early 
as possible. So I would argue that yarn.timeline-service.enabled should be 
interpreted as such a "master switch", both for server-side and client-side.
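
To make that pattern concrete, here is a rough sketch of the check a subfeature 
would do (the property names are the ones discussed in this thread; the 
surrounding class is illustrative, not the actual SystemMetricsPublisher code):

{code}
import org.apache.hadoop.conf.Configuration;

public class MasterSwitchCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Master switch for the whole timeline service feature.
    boolean timelineEnabled =
        conf.getBoolean("yarn.timeline-service.enabled", false);
    // Sub-switch for the system metrics publisher.
    boolean smpEnabled =
        conf.getBoolean("yarn.resourcemanager.system-metrics-publisher.enabled", false);

    // The subfeature is wired up only if both switches are on, so a disabled
    // timeline service is caught at init time instead of via late write failures.
    if (timelineEnabled && smpEnabled) {
      System.out.println("Would start the system metrics publisher");
    } else {
      System.out.println("System metrics publisher stays off");
    }
  }
}
{code}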

I'd like to hear your thoughts. Thanks!

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4053) Change the way metric values are stored in HBase Storage

2015-11-16 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4053:
---
Attachment: (was: YARN-4053-feature-YARN-2928.04.patch)

> Change the way metric values are stored in HBase Storage
> 
>
> Key: YARN-4053
> URL: https://issues.apache.org/jira/browse/YARN-4053
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4053-YARN-2928.01.patch, 
> YARN-4053-YARN-2928.02.patch, YARN-4053-feature-YARN-2928.03.patch
>
>
> Currently HBase implementation uses GenericObjectMapper to convert and store 
> values in backend HBase storage. This converts everything into a string 
> representation(ASCII/UTF-8 encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for 
> metrics. 
> So we need to decide how are we going to encode and decode metric values and 
> store them in HBase.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4334) Ability to avoid ResourceManager recovery if state store is "too old"

2015-11-16 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4334:
---
Attachment: YARN-4334.wip.2.patch

Thanks [~jlowe] for the review! I have updated the .2 prototype patch, please 
try it out. When the RM state store is too old, RM recovery will transition the 
previously running apps and app attempts to KILLED. The .2 prototype patch also 
addresses your other concerns. Will work on the implementations for the other 
state stores soon.
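
For reference, a minimal sketch of the "too old" check the prototype aims at 
(the last-update timestamp and the max-age value are assumptions for 
illustration, not the actual state-store schema or config key):

{code}
import java.util.concurrent.TimeUnit;

public class StateStoreAgeCheck {
  /**
   * Decide whether recovery should be skipped because the state store has not
   * been updated within the configured maximum age. A non-positive maxAgeMs
   * disables the check (always recover).
   */
  static boolean shouldSkipRecovery(long lastStoreUpdateMs, long nowMs, long maxAgeMs) {
    if (maxAgeMs <= 0) {
      return false;
    }
    return nowMs - lastStoreUpdateMs > maxAgeMs;
  }

  public static void main(String[] args) {
    long lastUpdate = System.currentTimeMillis() - TimeUnit.HOURS.toMillis(6);
    long maxAge = TimeUnit.HOURS.toMillis(1);
    // With a 1-hour limit and a 6-hour-old store, recovery would be skipped and
    // previously running apps/attempts marked KILLED instead of re-attempted.
    System.out.println(shouldSkipRecovery(lastUpdate, System.currentTimeMillis(), maxAge));
  }
}
{code}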

> Ability to avoid ResourceManager recovery if state store is "too old"
> -
>
> Key: YARN-4334
> URL: https://issues.apache.org/jira/browse/YARN-4334
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Chang Li
> Attachments: YARN-4334.wip.2.patch, YARN-4334.wip.patch
>
>
> There are times when a ResourceManager has been down long enough that 
> ApplicationMasters and potentially external client-side monitoring mechanisms 
> have given up completely.  If the ResourceManager starts back up and tries to 
> recover we can get into situations where the RM launches new application 
> attempts for the AMs that gave up, but then the client _also_ launches 
> another instance of the app because it assumed everything was dead.
> It would be nice if the RM could be optionally configured to avoid trying to 
> recover if the state store was "too old."  The RM would come up without any 
> applications recovered, but we would avoid a double-submission situation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4297) TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 branch

2015-11-16 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007344#comment-15007344
 ] 

Varun Saxena commented on YARN-4297:


Rebased the patch.
Have also handled the issue raised in YARN-3407 here, as it's a single-line fix.

> TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 
> branch
> ---
>
> Key: YARN-4297
> URL: https://issues.apache.org/jira/browse/YARN-4297
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4297-YARN-2928.01.patch, 
> YARN-4297-feature-YARN-2928.02.patch
>
>
> {noformat}
> Tests run: 13, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 16.09 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 0.11 sec  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.mapreduce.v2.app.AppContext$$EnhancerByMockitoWithCGLIB$$95d3ddbe
>  cannot be cast to 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$RunningAppContext
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceInit(JobHistoryEventHandler.java:271)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:495)
> {noformat}
> {noformat}
> testRMContainerAllocatorResendsRequestsOnRMRestart(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
>   Time elapsed: 2.649 sec  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.mapreduce.v2.app.AppContext$$EnhancerByMockitoWithCGLIB$$8e08559a
>  cannot be cast to 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$RunningAppContext
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:802)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:269)
> Tests in error: 
>   TestRMContainerAllocator.testExcessReduceContainerAssign:669 » ClassCast 
> org.a...
>   TestRMContainerAllocator.testReportedAppProgress:970 » NullPointer
>   TestRMContainerAllocator.testBlackListedNodesWithSchedulingToThatNode:1578 
> » ClassCast
>   TestRMContainerAllocator.testBlackListedNodes:1292 » ClassCast 
> org.apache.hado...
>   TestRMContainerAllocator.testAMRMTokenUpdate:2691 » ClassCast 
> org.apache.hadoo...
>   TestRMContainerAllocator.testMapReduceAllocationWithNodeLabelExpression:722 
> » ClassCast
>   TestRMContainerAllocator.testReducerRampdownDiagnostics:443 » ClassCast 
> org.ap...
>   TestRMContainerAllocator.testReportedAppProgressWithOnlyMaps:1118 » 
> NullPointer
>   TestRMContainerAllocator.testMapReduceScheduling:819 » ClassCast 
> org.apache.ha...
>   TestRMContainerAllocator.testResource:390 » ClassCast 
> org.apache.hadoop.mapred...
>   TestRMContainerAllocator.testUpdatedNodes:1190 » ClassCast 
> org.apache.hadoop.m...
>   TestRMContainerAllocator.testCompletedTasksRecalculateSchedule:2249 » 
> ClassCast
>   TestRMContainerAllocator.testConcurrentTaskLimits:2779 » ClassCast 
> org.apache
>   TestRMContainerAllocator.testSimple:219 » ClassCast 
> org.apache.hadoop.mapreduc...
>   
> TestRMContainerAllocator.testIgnoreBlacklisting:1378->getContainerOnHost:1511 
> » ClassCast
>   TestRMContainerAllocator.testMapNodeLocality:310 » ClassCast 
> org.apache.hadoop...
>   
> TestRMContainerAllocator.testRMContainerAllocatorResendsRequestsOnRMRestart:2489
>  » ClassCast
> Tests run: 26, Failures: 0, Errors: 17, Skipped: 0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4349) Support CallerContext in YARN

2015-11-16 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4349:
-
Attachment: YARN-4349.3.patch

Attached ver.3 patch, which makes the MR app-master's caller context start with 
"mr_appmaster_".

> Support CallerContext in YARN
> -
>
> Key: YARN-4349
> URL: https://issues.apache.org/jira/browse/YARN-4349
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4349.1.patch, YARN-4349.2.patch, YARN-4349.3.patch
>
>
> More details about CallerContext please refer to description of HDFS-9184.
> From YARN's perspective, we should make following changes:
> - RMAuditLogger logs application's caller context when application submit by 
> user
> - Add caller context to application's data in ATS along with application 
> creation event
> From MR's perspective:
> - Set AppMaster container's context to YARN's application Id
> - Set Mapper/Reducer containers' context to task attempt id
> Protocol and RPC changes are done in HDFS-9184.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4354) Public resource localization fails with NPE

2015-11-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007289#comment-15007289
 ] 

Jason Lowe commented on YARN-4354:
--

I committed the 2.7 patch to branch-2.7.2 as well, since it was missing from 
that release branch.

bq. I am not against to keep consistent for localization event handling with 
other subsystems, but not sure if ignoring other exceptional events could 
potentially cause NM ends up in a bad state.

From my perspective, any escaped exception at the Async Dispatcher level is 
capable of leaving the NM in a bad state.  Since it's escaped we don't know 
where it occurred and what we were trying to do at the time.  That's why I 
think it's a bit dangerous to assume the decisions we will make from that bad 
state are better than crashing.  Anyway if we want to do this then we should 
take up the discussion in a JIRA targeting that feature.

> Public resource localization fails with NPE
> ---
>
> Key: YARN-4354
> URL: https://issues.apache.org/jira/browse/YARN-4354
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-4354-branch-2.7.002.patch, 
> YARN-4354-unittest.patch, YARN-4354.001.patch, YARN-4354.002.patch
>
>
> I saw public localization on nodemanagers get stuck because it was constantly 
> rejecting requests to the thread pool executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-16 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4234:

Attachment: YARN-4234-2015-11-16.1.patch

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015.2.patch, YARN-4234.1.patch, 
> YARN-4234.2.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.20151109.patch, 
> YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4053) Change the way metric values are stored in HBase Storage

2015-11-16 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4053:
---
Attachment: YARN-4053-feature-YARN-2928.04.patch

> Change the way metric values are stored in HBase Storage
> 
>
> Key: YARN-4053
> URL: https://issues.apache.org/jira/browse/YARN-4053
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4053-YARN-2928.01.patch, 
> YARN-4053-YARN-2928.02.patch, YARN-4053-feature-YARN-2928.03.patch, 
> YARN-4053-feature-YARN-2928.04.patch
>
>
> Currently HBase implementation uses GenericObjectMapper to convert and store 
> values in backend HBase storage. This converts everything into a string 
> representation(ASCII/UTF-8 encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for 
> metrics. 
> So we need to decide how are we going to encode and decode metric values and 
> store them in HBase.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4053) Change the way metric values are stored in HBase Storage

2015-11-16 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4053:
---
Attachment: YARN-4053-feature-YARN-2928.04.patch

> Change the way metric values are stored in HBase Storage
> 
>
> Key: YARN-4053
> URL: https://issues.apache.org/jira/browse/YARN-4053
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4053-YARN-2928.01.patch, 
> YARN-4053-YARN-2928.02.patch, YARN-4053-feature-YARN-2928.03.patch, 
> YARN-4053-feature-YARN-2928.04.patch
>
>
> Currently HBase implementation uses GenericObjectMapper to convert and store 
> values in backend HBase storage. This converts everything into a string 
> representation(ASCII/UTF-8 encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for 
> metrics. 
> So we need to decide how are we going to encode and decode metric values and 
> store them in HBase.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4357) Applications page loads very slowly when there are lots of applications

2015-11-16 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-4357:


 Summary: Applications page loads very slowly when there are lots 
of applications
 Key: YARN-4357
 URL: https://issues.apache.org/jira/browse/YARN-4357
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.2
Reporter: Jason Lowe
Priority: Critical


It takes a long time (on the order of minutes) to load the application page 
when there are many applications (e.g.: 10,000 or so).  This page used to load 
much faster (on the order of a few seconds).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4297) TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 branch

2015-11-16 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4297:
---
Attachment: (was: YARN-4297-YARN-2928.02.patch)

> TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 
> branch
> ---
>
> Key: YARN-4297
> URL: https://issues.apache.org/jira/browse/YARN-4297
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4297-YARN-2928.01.patch, 
> YARN-4297-feature-YARN-2928.02.patch
>
>
> {noformat}
> Tests run: 13, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 16.09 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 0.11 sec  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.mapreduce.v2.app.AppContext$$EnhancerByMockitoWithCGLIB$$95d3ddbe
>  cannot be cast to 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$RunningAppContext
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceInit(JobHistoryEventHandler.java:271)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:495)
> {noformat}
> {noformat}
> testRMContainerAllocatorResendsRequestsOnRMRestart(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
>   Time elapsed: 2.649 sec  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.mapreduce.v2.app.AppContext$$EnhancerByMockitoWithCGLIB$$8e08559a
>  cannot be cast to 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$RunningAppContext
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:802)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:269)
> Tests in error: 
>   TestRMContainerAllocator.testExcessReduceContainerAssign:669 » ClassCast 
> org.a...
>   TestRMContainerAllocator.testReportedAppProgress:970 » NullPointer
>   TestRMContainerAllocator.testBlackListedNodesWithSchedulingToThatNode:1578 
> » ClassCast
>   TestRMContainerAllocator.testBlackListedNodes:1292 » ClassCast 
> org.apache.hado...
>   TestRMContainerAllocator.testAMRMTokenUpdate:2691 » ClassCast 
> org.apache.hadoo...
>   TestRMContainerAllocator.testMapReduceAllocationWithNodeLabelExpression:722 
> » ClassCast
>   TestRMContainerAllocator.testReducerRampdownDiagnostics:443 » ClassCast 
> org.ap...
>   TestRMContainerAllocator.testReportedAppProgressWithOnlyMaps:1118 » 
> NullPointer
>   TestRMContainerAllocator.testMapReduceScheduling:819 » ClassCast 
> org.apache.ha...
>   TestRMContainerAllocator.testResource:390 » ClassCast 
> org.apache.hadoop.mapred...
>   TestRMContainerAllocator.testUpdatedNodes:1190 » ClassCast 
> org.apache.hadoop.m...
>   TestRMContainerAllocator.testCompletedTasksRecalculateSchedule:2249 » 
> ClassCast
>   TestRMContainerAllocator.testConcurrentTaskLimits:2779 » ClassCast 
> org.apache
>   TestRMContainerAllocator.testSimple:219 » ClassCast 
> org.apache.hadoop.mapreduc...
>   
> TestRMContainerAllocator.testIgnoreBlacklisting:1378->getContainerOnHost:1511 
> » ClassCast
>   TestRMContainerAllocator.testMapNodeLocality:310 » ClassCast 
> org.apache.hadoop...
>   
> TestRMContainerAllocator.testRMContainerAllocatorResendsRequestsOnRMRestart:2489
>  » ClassCast
> Tests run: 26, Failures: 0, Errors: 17, Skipped: 0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4297) TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 branch

2015-11-16 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4297:
---
Attachment: YARN-4297-feature-YARN-2928.02.patch

> TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 
> branch
> ---
>
> Key: YARN-4297
> URL: https://issues.apache.org/jira/browse/YARN-4297
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4297-YARN-2928.01.patch, 
> YARN-4297-feature-YARN-2928.02.patch
>
>
> {noformat}
> Tests run: 13, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 16.09 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 0.11 sec  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.mapreduce.v2.app.AppContext$$EnhancerByMockitoWithCGLIB$$95d3ddbe
>  cannot be cast to 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$RunningAppContext
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceInit(JobHistoryEventHandler.java:271)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:495)
> {noformat}
> {noformat}
> testRMContainerAllocatorResendsRequestsOnRMRestart(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
>   Time elapsed: 2.649 sec  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.mapreduce.v2.app.AppContext$$EnhancerByMockitoWithCGLIB$$8e08559a
>  cannot be cast to 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$RunningAppContext
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:802)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:269)
> Tests in error: 
>   TestRMContainerAllocator.testExcessReduceContainerAssign:669 » ClassCast 
> org.a...
>   TestRMContainerAllocator.testReportedAppProgress:970 » NullPointer
>   TestRMContainerAllocator.testBlackListedNodesWithSchedulingToThatNode:1578 
> » ClassCast
>   TestRMContainerAllocator.testBlackListedNodes:1292 » ClassCast 
> org.apache.hado...
>   TestRMContainerAllocator.testAMRMTokenUpdate:2691 » ClassCast 
> org.apache.hadoo...
>   TestRMContainerAllocator.testMapReduceAllocationWithNodeLabelExpression:722 
> » ClassCast
>   TestRMContainerAllocator.testReducerRampdownDiagnostics:443 » ClassCast 
> org.ap...
>   TestRMContainerAllocator.testReportedAppProgressWithOnlyMaps:1118 » 
> NullPointer
>   TestRMContainerAllocator.testMapReduceScheduling:819 » ClassCast 
> org.apache.ha...
>   TestRMContainerAllocator.testResource:390 » ClassCast 
> org.apache.hadoop.mapred...
>   TestRMContainerAllocator.testUpdatedNodes:1190 » ClassCast 
> org.apache.hadoop.m...
>   TestRMContainerAllocator.testCompletedTasksRecalculateSchedule:2249 » 
> ClassCast
>   TestRMContainerAllocator.testConcurrentTaskLimits:2779 » ClassCast 
> org.apache
>   TestRMContainerAllocator.testSimple:219 » ClassCast 
> org.apache.hadoop.mapreduc...
>   
> TestRMContainerAllocator.testIgnoreBlacklisting:1378->getContainerOnHost:1511 
> » ClassCast
>   TestRMContainerAllocator.testMapNodeLocality:310 » ClassCast 
> org.apache.hadoop...
>   
> TestRMContainerAllocator.testRMContainerAllocatorResendsRequestsOnRMRestart:2489
>  » ClassCast
> Tests run: 26, Failures: 0, Errors: 17, Skipped: 0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3454) RLESparseResourceAllocation does not handle removal of partial intervals (+ introducing support for efficient "merge" operations)

2015-11-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007623#comment-15007623
 ] 

Hadoop QA commented on YARN-3454:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 7s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 20s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 18s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 18s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 20s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.7.0_79. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 20s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_79. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} Patch generated 4 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 23, now 27). {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 22s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 17 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 21s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 25s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 0s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_79
 with JDK v1.7.0_79 generated 4 new issues (was 2, now 6). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 18s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 20s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | 

[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-11-16 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007588#comment-15007588
 ] 

Naganarasimha G R commented on YARN-4140:
-

Yes [~wangda], we need to at least selectively pick a few JIRAs from the list I 
shared in the forum; if not, usability will be an issue for the folks who are 
running on 2.7-based versions. I think even the web UI showing the partition 
information and the REST API are basic requirements for using NodeLabels.

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> Trying to run application on Nodelabel partition I  found that the 
> application execution time is delayed by 5 – 10 min for 500 containers . 
> Total 3 machines 2 machines were in same partition and app submitted to same.
> After enabling debug was able to find the below
> # From AM the container ask is for OFF-SWITCH
> # RM allocating all containers to NODE_LOCAL as shown in logs below.
> # So since I was having about 500 containers time taken was about – 6 minutes 
> to allocate 1st map after AM allocation.
> # Tested with about 1K maps using PI job took 17 minutes to allocate  next 
> container after AM allocation
> Once 500 container allocation on NODE_LOCAL is done the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: 

[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-11-16 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007589#comment-15007589
 ] 

Naganarasimha G R commented on YARN-3216:
-

Hi [~wangda] and [~sunilg], What about this for 2.7.3 ?

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, 
> 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, 
> 0009-YARN-3216.patch, 0010-YARN-3216.patch, 0011-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3454) RLESparseResourceAllocation does not handle removal of partial intervals (+ introducing support for efficient "merge" operations)

2015-11-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007603#comment-15007603
 ] 

Carlo Curino commented on YARN-3454:


We also removed the minAlloc input parameter, which was unused (and misleading, 
as it was easy to confuse with the clusterResource needed for 
ResourceCalculator computations). 
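
As background for the partial-overlap removal described in the quoted issue 
text below, here is a toy sketch over an integer-valued run-length-encoded 
timeline (a TreeMap from time to value); the real class works on Resource 
objects with a ResourceCalculator, so this is illustrative only:

{code}
import java.util.Map;
import java.util.TreeMap;

public class RleRemoveSketch {
  // time -> value, run-length encoded: a key marks where the value changes.
  private final TreeMap<Long, Integer> rle = new TreeMap<Long, Integer>();

  void removeInterval(long start, long end, int amount) {
    if (start >= end) {
      return;
    }
    Map.Entry<Long, Integer> atStart = rle.floorEntry(start);
    int valueAtStart = atStart == null ? 0 : atStart.getValue();
    Map.Entry<Long, Integer> atEnd = rle.floorEntry(end);
    int valueAtEnd = atEnd == null ? 0 : atEnd.getValue();
    // Lower every run that starts inside [start, end).
    for (Map.Entry<Long, Integer> e : rle.subMap(start, true, end, false).entrySet()) {
      e.setValue(e.getValue() - amount);
    }
    // Split the runs straddling the boundaries so the removal stays local.
    rle.put(start, valueAtStart - amount);
    rle.put(end, valueAtEnd);
  }

  public static void main(String[] args) {
    RleRemoveSketch s = new RleRemoveSketch();
    s.rle.put(0L, 10);            // 10 units from t=0 onward
    s.removeInterval(5, 15, 4);   // partial overlap of the single run
    System.out.println(s.rle);    // {0=10, 5=6, 15=10}
  }
}
{code}

An exact-match removal falls out of the same code path, since the boundary 
splits become no-ops in that case.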

> RLESparseResourceAllocation does not handle removal of partial intervals (+ 
> introducing support for efficient "merge" operations) 
> --
>
> Key: YARN-3454
> URL: https://issues.apache.org/jira/browse/YARN-3454
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-3454.patch
>
>
> The RLESparseResourceAllocation.removeInterval(...) method handles exact-match 
> interval removals well, but does not correctly handle partial overlaps. 
> In the context of this fix, we also introduced static methods to "merge" two 
> RLESparseResourceAllocation, while applying an operator in the process 
> (add/subtract/min/max/subtractTestPositive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3454) RLESparseResourceAllocation does not handle removal of partial intervals (+ introducing support for efficient "merge" operations)

2015-11-16 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-3454:
---
Attachment: YARN-3454.patch

> RLESparseResourceAllocation does not handle removal of partial intervals (+ 
> introducing support for efficient "merge" operations) 
> --
>
> Key: YARN-3454
> URL: https://issues.apache.org/jira/browse/YARN-3454
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-3454.patch
>
>
> The RLESparseResourceAllocation.removeInterval(...) method handles exact-match 
> interval removals well, but does not correctly handle partial overlaps. 
> In the context of this fix, we also introduced static methods to "merge" two 
> RLESparseResourceAllocation, while applying an operator in the process 
> (add/subtract/min/max/subtractTestPositive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout

2015-11-16 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007618#comment-15007618
 ] 

Tsuyoshi Ozawa commented on YARN-4348:
--

YARN-4306 for the assertion error of getProxy.

I'll check the related test failure, TestZKRMStateStore.

> ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of 
> zkSessionTimeout
> 
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2, 2.6.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Attachments: YARN-4348-branch-2.7.002.patch, YARN-4348.001.patch, 
> YARN-4348.001.patch, log.txt
>
>
> Jian mentioned that the current internal ZK configuration of ZKRMStateStore 
> can cause the following situation:
> 1. syncInternal times out, 
> 2. but the sync succeeds later on.
> We should use zkResyncWaitTime as the timeout value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information

2015-11-16 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007628#comment-15007628
 ] 

Naganarasimha G R commented on YARN-4290:
-

Yes [~wangda], it would be better given that way, as just putting it in braces, 
e.g. {{0MB (8192MB)}}, will not signify that it is the configured resources; 
and anyway the title is for capturing the detailed information of the node, not 
just usage information. 
Only one thing: we can put {{Used Resources by NM node: MEM/CPU}} at the end, 
and we need to somehow word it such that *Used Resources by NM node* is not 
just for the NM process but for the node overall. At first I presumed it to be 
for the NM process only.


> "yarn nodes -list" should print all nodes reports information
> -
>
> Key: YARN-4290
> URL: https://issues.apache.org/jira/browse/YARN-4290
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Wangda Tan
>Assignee: Sunil G
>
> Currently, "yarn nodes -list" command only shows 
> - "Node-Id", 
> - "Node-State", 
> - "Node-Http-Address",
> - "Number-of-Running-Containers"
> I think we need to show more information such as used resource, just like 
> "yarn nodes -status" command.
> Maybe we can add a parameter to -list, such as "-show-details" to enable 
> printing all detailed information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4350) TestDistributedShell fails

2015-11-16 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4350:

Attachment: (was: YARN-4350-feature-YARN-2928.008.patch)

> TestDistributedShell fails
> --
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These tests fail more often than not if run by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4350) TestDistributedShell fails

2015-11-16 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4350:

Attachment: YARN-4350-feature-YARN-2928.001.patch

updated the name of the patch file

> TestDistributedShell fails
> --
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.001.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These tests fail more often than not if run by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent

2015-11-16 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-4358:
--

 Summary: Improve relationship between SharingPolicy and 
ReservationAgent
 Key: YARN-4358
 URL: https://issues.apache.org/jira/browse/YARN-4358
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino


At the moment an agent places reservations based on available resources, but has 
no visibility into extra constraints imposed by the SharingPolicy. While not all 
constraints are easily represented, some (e.g., a max-instantaneous-resources 
cap) are.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4349) Support CallerContext in YARN

2015-11-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007665#comment-15007665
 ] 

Hadoop QA commented on YARN-4349:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 7s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 10 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 10s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 51s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 27s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
1s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 14s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
57s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 8s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 50s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 50s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 2s 
{color} | {color:red} Patch generated 10 new checkstyle issues in root (total 
was 488, now 491). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
2s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 24 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 30s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 20s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 30s {color} 
| {color:red} hadoop-mapreduce-client-app in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 51s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 31s 
{color} | 

[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-16 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4234:

Attachment: YARN-4234-2015-11-16.2.patch

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, 
> YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-16 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007736#comment-15007736
 ] 

Xuan Gong commented on YARN-4234:
-

bq. While this is totally fine for summary logs, it will cause entities belonging 
to different entity groups to be redirected to the wrong file.

Right. The new patch addresses this issue. I also added a test case to cover it.
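
To illustrate, a rough sketch of how a client could use the group-aware put path; the putEntities overload taking a TimelineEntityGroupId is an assumption based on this discussion, not necessarily the final API:

{code}
// Hypothetical usage sketch for the ATS v1.5 put path discussed here.
// The group-aware putEntities(attemptId, groupId, entity) overload is an
// assumption from this thread, not necessarily the committed signature.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntityGroupId;
import org.apache.hadoop.yarn.client.api.TimelineClient;

public class AtsV15PutSketch {
  public static void main(String[] args) throws Exception {
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(new Configuration());
    client.start();

    ApplicationId appId = ApplicationId.newInstance(System.currentTimeMillis(), 1);
    ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(appId, 1);
    TimelineEntityGroupId groupId =
        TimelineEntityGroupId.newInstance(appId, "group_1");

    TimelineEntity entity = new TimelineEntity();
    entity.setEntityType("SKETCH_ENTITY");
    entity.setEntityId("entity_1");

    // Entities tagged with different group ids must land in different
    // entity-log files; that is the redirection issue discussed above.
    client.putEntities(attemptId, groupId, entity);
    client.stop();
  }
}
{code}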

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, 
> YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-11-16 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007967#comment-15007967
 ] 

Bibin A Chundatt commented on YARN-4140:


+1 for merging into 2.7.

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> While trying to run an application on a node-label partition, I found that the 
> application execution time was delayed by 5 – 10 min for 500 containers. 
> Of 3 machines in total, 2 were in the same partition, and the app was submitted to it.
> After enabling debug logging I was able to find the following:
> # From the AM, the container ask is for OFF_SWITCH.
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes 
> to allocate the 1st map after the AM allocation.
> # Testing with about 1K maps using a PI job, it took 17 minutes to allocate the next 
> container after the AM allocation.
> Once the 500 container allocations on NODE_LOCAL are done, the next container 
> allocation is done as OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
> {code}
> 

[jira] [Created] (YARN-4361) Total resource count mistake:NodeRemovedSchedulerEvent in ReconnectNodeTransition will reduce the newNode.getTotalCapability() in Multi-thread model

2015-11-16 Thread jialei weng (JIRA)
jialei weng created YARN-4361:
-

 Summary: Total resource count mistake:NodeRemovedSchedulerEvent in 
ReconnectNodeTransition will reduce the newNode.getTotalCapability() in 
Multi-thread model
 Key: YARN-4361
 URL: https://issues.apache.org/jira/browse/YARN-4361
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: jialei weng


Total resource count mistake:
In ReconnectNodeTransition, the NodeRemovedSchedulerEvent can reduce the cluster 
total by newNode.getTotalCapability() in a multi-threaded model. Because the 
RMNode and the scheduler handle events on different dispatcher queues, the 
remove-update-add sequence is not guaranteed to execute in order, so handling 
the NodeRemovedSchedulerEvent typically ends up subtracting 
newNode.getTotalCapability() instead of the old capability.
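
A toy sketch of the hazard (simplified names, illustrative only, not the actual RM code):

{code}
// Illustrative sketch of the reconnect race, with simplified names.
// ReconnectNodeTransition fires NODE_REMOVED followed by NODE_ADDED for
// the reconnecting node, but the RMNode and the scheduler consume events
// on different dispatcher threads, so the remove-update-add sequence is
// not guaranteed to be observed in order.
public class ReconnectRaceSketch {
  private long totalMemoryMb = 0;

  // Scheduler-side handling (simplified).
  void handleNodeRemoved(long nodeCapabilityMb) {
    // BUG: if the RMNode was already updated to the new capability, this
    // subtracts newNode.getTotalCapability() instead of the old one, and
    // the cluster total drifts by (new - old).
    totalMemoryMb -= nodeCapabilityMb;
  }

  void handleNodeAdded(long nodeCapabilityMb) {
    totalMemoryMb += nodeCapabilityMb;
  }
}
{code}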



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)

2015-11-16 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007996#comment-15007996
 ] 

Jian He commented on YARN-3840:
---

I verified locally by simulating 10k apps; the patch does affect browser 
performance. 
[~mohdshahidkhan], would you like to take a crack at it?

> Resource Manager web ui issue when sorting application by id (with 
> application having id > 9999)
> 
>
> Key: YARN-3840
> URL: https://issues.apache.org/jira/browse/YARN-3840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: LINTE
>Assignee: Mohammad Shahid Khan
> Fix For: 2.8.0, 2.7.3
>
> Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, 
> YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch, 
> yarn-3840-7.patch
>
>
> On the WEBUI, the global main view page : 
> http://resourcemanager:8088/cluster/apps doesn't display applications over 
> 9999.
> With command line it works (# yarn application -list).
> Regards,
> Alexandre



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4350) TestDistributedShell fails

2015-11-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008015#comment-15008015
 ] 

Hadoop QA commented on YARN-4350:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 20s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
15s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
41s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in feature-YARN-2928 
failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
54s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 44s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 5m 14s {color} 
| {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 53s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 5m 14s {color} 
| {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.7.0_85. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
28s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 159m 43s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
|   

[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)

2015-11-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008019#comment-15008019
 ] 

Hudson commented on YARN-3840:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8811 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8811/])
Revert "YARN-3840. Resource Manager web ui issue when sorting (jianhe: rev 
fcd7888029a8e07cb0b22d1f47f9e7df82c4a304)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllApplicationsPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllContainersPage.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TasksPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TaskPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/dt-plugin-1.10.7/sorting/natural.js.gz
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java


> Resource Manager web ui issue when sorting application by id (with 
> application having id > 9999)
> 
>
> Key: YARN-3840
> URL: https://issues.apache.org/jira/browse/YARN-3840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: LINTE
>Assignee: Mohammad Shahid Khan
> Fix For: 2.8.0, 2.7.3
>
> Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, 
> YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch, 
> yarn-3840-7.patch
>
>
> On the WEBUI, the global main view page : 
> http://resourcemanager:8088/cluster/apps doesn't display applications over 
> 9999.
> With command line it works (# yarn application -list).
> Regards,
> Alexandre



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)

2015-11-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008038#comment-15008038
 ] 

Hudson commented on YARN-3840:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #684 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/684/])
Revert "YARN-3840. Resource Manager web ui issue when sorting (jianhe: rev 
fcd7888029a8e07cb0b22d1f47f9e7df82c4a304)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllApplicationsPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/dt-plugin-1.10.7/sorting/natural.js.gz
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TaskPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllContainersPage.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TasksPage.java


> Resource Manager web ui issue when sorting application by id (with 
> application having id > 9999)
> 
>
> Key: YARN-3840
> URL: https://issues.apache.org/jira/browse/YARN-3840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: LINTE
>Assignee: Mohammad Shahid Khan
> Fix For: 2.8.0, 2.7.3
>
> Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, 
> YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch, 
> yarn-3840-7.patch
>
>
> On the WEBUI, the global main view page : 
> http://resourcemanager:8088/cluster/apps doesn't display applications over 
> 9999.
> With command line it works (# yarn application -list).
> Regards,
> Alexandre



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage

2015-11-16 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008041#comment-15008041
 ] 

Joep Rottinghuis commented on YARN-4053:


Looks good [~varun_saxena]; this is a nice separation of the value conversion 
from the numeric conversion/comparison and general numeric manipulation.

Nit: LongConverter.compare and LongConverter.add should probably handle null 
values. 
Question: any reason you don't simply have
public interface NumericValueConverter extends ValueConverter, 
Comparable ?
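
To make the nit concrete, a rough null-tolerant sketch; the interface shapes below are assumptions based on this review thread, not the committed code:

{code}
// Sketch only: interface shapes are assumptions from this review thread.
interface ValueConverter {
  byte[] encodeValue(Object value);
  Object decodeValue(byte[] bytes);
}

interface NumericValueConverter extends ValueConverter {
  int compare(Number n1, Number n2);
  Number add(Number n1, Number n2);
}

class LongConverter implements NumericValueConverter {
  public byte[] encodeValue(Object value) {
    return org.apache.hadoop.hbase.util.Bytes.toBytes(((Number) value).longValue());
  }
  public Object decodeValue(byte[] bytes) {
    return org.apache.hadoop.hbase.util.Bytes.toLong(bytes);
  }
  // Null-tolerant compare: order missing values first instead of throwing NPE.
  public int compare(Number n1, Number n2) {
    if (n1 == null && n2 == null) { return 0; }
    if (n1 == null) { return -1; }
    if (n2 == null) { return 1; }
    return Long.compare(n1.longValue(), n2.longValue());
  }
  // Null-tolerant add: treat null as "no contribution".
  public Number add(Number n1, Number n2) {
    long v1 = (n1 == null) ? 0L : n1.longValue();
    long v2 = (n2 == null) ? 0L : n2.longValue();
    return v1 + v2;
  }
}
{code}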


> Change the way metric values are stored in HBase Storage
> 
>
> Key: YARN-4053
> URL: https://issues.apache.org/jira/browse/YARN-4053
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4053-YARN-2928.01.patch, 
> YARN-4053-YARN-2928.02.patch, YARN-4053-feature-YARN-2928.03.patch, 
> YARN-4053-feature-YARN-2928.04.patch
>
>
> Currently the HBase implementation uses GenericObjectMapper to convert and store 
> values in the backend HBase storage. This converts everything into a string 
> representation (ASCII/UTF-8 encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for 
> metrics. 
> So we need to decide how we are going to encode and decode metric values and 
> store them in HBase.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4362) Too many preemption activity when nodelabels are non exclusive

2015-11-16 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-4362:
--

 Summary: Too many preemption activity when nodelabels are non 
exclusive
 Key: YARN-4362
 URL: https://issues.apache.org/jira/browse/YARN-4362
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Priority: Critical


Steps to reproduce
===
1. Configure an HA cluster with 6 nodes and 3 partitions (1, 2, 3), all non-exclusive.

*Partition configuration is as follows*

NMs 1 and 2 mapped to label 1
NM 3 mapped to label 2
NMs 4 and 5 mapped to label 3
NM 6 in the DEFAULT partition

In the capacity scheduler, the queues are linked only to partitions 1 and 3.
NM 3 (label 2) is a backup node: whenever any partition needs it, its label is 
changed.

Submit an application/job with 200 containers to the default queue.
All containers that get assigned to partition 2 get preempted.

The application/map task execution takes much longer, since 30-40 tasks get 
assigned to partition 2, then get preempted, and all of them need to be 
relaunched.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)

2015-11-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008056#comment-15008056
 ] 

Hudson commented on YARN-3840:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1408 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1408/])
Revert "YARN-3840. Resource Manager web ui issue when sorting (jianhe: rev 
fcd7888029a8e07cb0b22d1f47f9e7df82c4a304)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TasksPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllContainersPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppPage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TaskPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllApplicationsPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/dt-plugin-1.10.7/sorting/natural.js.gz


> Resource Manager web ui issue when sorting application by id (with 
> application having id > 9999)
> 
>
> Key: YARN-3840
> URL: https://issues.apache.org/jira/browse/YARN-3840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: LINTE
>Assignee: Mohammad Shahid Khan
> Fix For: 2.8.0, 2.7.3
>
> Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, 
> YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch, 
> yarn-3840-7.patch
>
>
> On the WEBUI, the global main view page : 
> http://resourcemanager:8088/cluster/apps doesn't display applications over 
> 9999.
> With command line it works (# yarn application -list).
> Regards,
> Alexandre



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4362) Too many preemption activity when nodelabels are non exclusive

2015-11-16 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-4362:
--

Assignee: Varun Saxena

> Too many preemption activity when nodelabels are non exclusive
> --
>
> Key: YARN-4362
> URL: https://issues.apache.org/jira/browse/YARN-4362
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>Priority: Critical
>
> Steps to reproduce
> ===
> 1. Configure an HA cluster with 6 nodes and 3 partitions (1, 2, 3), all non-exclusive.
> *Partition configuration is as follows*
> NMs 1 and 2 mapped to label 1
> NM 3 mapped to label 2
> NMs 4 and 5 mapped to label 3
> NM 6 in the DEFAULT partition
> In the capacity scheduler, the queues are linked only to partitions 1 and 3.
> NM 3 (label 2) is a backup node: whenever any partition needs it, 
> its label is changed.
> Submit an application/job with 200 containers to the default queue.
> All containers that get assigned to partition 2 get preempted.
> The application/map task execution takes much longer, since 30-40 tasks get 
> assigned to partition 2, then get preempted, and all of them need to be 
> relaunched.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4362) Too many preemption activity when nodelabels are non exclusive

2015-11-16 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4362:
---
Attachment: capacity-scheduler.xml
Preemptedpartition.log
ProportionalPolicy.log
ProportionalDefaultQueue.log

> Too many preemption activity when nodelabels are non exclusive
> --
>
> Key: YARN-4362
> URL: https://issues.apache.org/jira/browse/YARN-4362
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: Preemptedpartition.log, ProportionalDefaultQueue.log, 
> ProportionalPolicy.log, capacity-scheduler.xml
>
>
> Steps to reproduce
> ===
> 1. Configure an HA cluster with 6 nodes and 3 partitions (1, 2, 3), all non-exclusive.
> *Partition configuration is as follows*
> NMs 1 and 2 mapped to label 1
> NM 3 mapped to label 2
> NMs 4 and 5 mapped to label 3
> NM 6 in the DEFAULT partition
> In the capacity scheduler, the queues are linked only to partitions 1 and 3.
> NM 3 (label 2) is a backup node: whenever any partition needs it, 
> its label is changed.
> Submit an application/job with 200 containers to the default queue.
> All containers that get assigned to partition 2 get preempted.
> The application/map task execution takes much longer, since 30-40 tasks get 
> assigned to partition 2, then get preempted, and all of them need to be 
> relaunched.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4350) TestDistributedShell fails

2015-11-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007649#comment-15007649
 ] 

Hadoop QA commented on YARN-4350:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 12m 53s 
{color} | {color:red} Docker failed to build yetus/hadoop:date2015-11-16. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12772402/YARN-4350-feature-YARN-2928.008.patch
 |
| JIRA Issue | YARN-4350 |
| Powered by | Apache Yetus   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9702/console |


This message was automatically generated.



> TestDistributedShell fails
> --
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.001.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These tests fail more often than not if run by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4350) TestDistributedShell fails

2015-11-16 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007652#comment-15007652
 ] 

Allen Wittenauer commented on YARN-4350:


Already looking at it. ;)

> TestDistributedShell fails
> --
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.001.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These tests fail more often than not if run by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3454) RLESparseResourceAllocation does not handle removal of partial intervals (+ introducing support for efficient "merge" operations)

2015-11-16 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-3454:
---
Attachment: YARN-3454.1.patch

Updating to include the missing class.

> RLESparseResourceAllocation does not handle removal of partial intervals (+ 
> introducing support for efficient "merge" operations) 
> --
>
> Key: YARN-3454
> URL: https://issues.apache.org/jira/browse/YARN-3454
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0, 2.7.1, 2.6.2
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-3454.1.patch, YARN-3454.patch
>
>
> The RLESparseResourceAllocation.removeInterval(...) method handles exact-match 
> interval removals well, but does not correctly handle partial overlaps. 
> In the context of this fix, we also introduce static methods to "merge" two 
> RLESparseResourceAllocations, applying an operator in the process 
> (add/subtract/min/max/subtractTestPositive).
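
As a rough illustration of the merge-with-operator idea quoted above, a simplified sketch over TreeMap-based run-length sequences, not the actual RLESparseResourceAllocation code:

{code}
// Simplified sketch: a run-length-encoded sequence as TreeMap<Long, Long>
// (time -> value in effect from that time on). Merging walks the union of
// the two change points and applies the operator to the values in effect.
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.LongBinaryOperator;

final class RleMergeSketch {

  static NavigableMap<Long, Long> merge(NavigableMap<Long, Long> a,
                                        NavigableMap<Long, Long> b,
                                        LongBinaryOperator op) {
    TreeMap<Long, Long> changePoints = new TreeMap<>();
    changePoints.putAll(a);
    changePoints.putAll(b);

    NavigableMap<Long, Long> out = new TreeMap<>();
    Long prev = null;
    for (Long t : changePoints.keySet()) {
      long merged = op.applyAsLong(valueAt(a, t), valueAt(b, t));
      // Keep the encoding compact: only record changes in value.
      if (prev == null || merged != prev) {
        out.put(t, merged);
        prev = merged;
      }
    }
    return out;
  }

  // Value in effect at time t (0 before the first change point).
  static long valueAt(NavigableMap<Long, Long> rle, long t) {
    Map.Entry<Long, Long> e = rle.floorEntry(t);
    return e == null ? 0L : e.getValue();
  }
}
{code}

With this shape, subtract is merge(a, b, (x, y) -> x - y), and a "subtractTestPositive" variant would additionally fail if any merged value goes negative.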



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent

2015-11-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007657#comment-15007657
 ] 

Carlo Curino commented on YARN-4358:


In this patch, we propose to change the relationship between 
ReservationAgent/Plan/SharingPolicy for two reasons:
 * agents ignore constraints that the SharingPolicy knows about (e.g., an 
instantaneous max Resource)
 * agents repeatedly query the plan to obtain a view of resources (and subtract 
what is used from the total available)

We introduce a new API in the Plan that provides a compact, 
RLESparseResourceAllocation-based view of available resources.
Furthermore, the InMemoryPlan interacts with the SharingPolicy to "customize" 
the view, so that the agent only sees the resources 
that are available to the given tenant at the given time.
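
A rough sketch of the shape of such an API; the names and signatures here are illustrative, not the committed ones:

{code}
// Illustrative sketch: the Plan exposes a compact RLE view of what is
// still available, already narrowed by the SharingPolicy for one tenant.
// Names and signatures are assumptions for discussion.
interface PlanAvailabilityView {
  // Total capacity minus existing reservations over [start, end), further
  // constrained by the SharingPolicy for the given user (e.g., an
  // instantaneous max-capacity cap), as a run-length-encoded sequence.
  RLESparseResourceAllocation getAvailableResourceOverTime(
      String user, ReservationId excludeReservation, long start, long end);
}

// Agent side: consume the compact view once, instead of repeatedly
// querying the plan step by step.
// RLESparseResourceAllocation free =
//     plan.getAvailableResourceOverTime(user, null, stageStart, stageDeadline);
// Resource freeAtT = free.getCapacityAtTime(t);
{code}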

> Improve relationship between SharingPolicy and ReservationAgent
> ---
>
> Key: YARN-4358
> URL: https://issues.apache.org/jira/browse/YARN-4358
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>
> At the moment an agent places reservations based on available resources, but has no 
> visibility into extra constraints imposed by the SharingPolicy. While not all 
> constraints are easily represented, some (e.g., a max-instantaneous-resources 
> cap) are.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4360) Improve GreedyReservationAgent to support "early" allocations, and performance improvements

2015-11-16 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-4360:
--

 Summary: Improve GreedyReservationAgent to support "early" 
allocations, and performance improvements 
 Key: YARN-4360
 URL: https://issues.apache.org/jira/browse/YARN-4360
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino


The GreedyReservationAgent allocates "as late as possible". Per various 
conversations, it seems useful to have a mirror behavior that allocates as 
early as possible. In the process, we also leverage the improvements from 
YARN-4358 and implement an RLE-aware StageAllocatorGreedy(RLE), which 
significantly speeds up allocation.
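
To make the "late vs. early" distinction concrete, a toy sketch of the two anchoring strategies (simplified, not the actual agent code):

{code}
// Toy sketch: choose a start time for a stage of duration d that must run
// inside [arrival, deadline]. The existing greedy agent anchors the
// allocation as late as possible; the proposed mirror anchors it as early
// as possible. Either anchor must still be validated against the RLE view
// of free capacity (see YARN-4358) before committing to the plan.
final class StageAnchorSketch {
  static long placeLate(long arrival, long deadline, long d) {
    return deadline - d; // finish exactly at the deadline
  }

  static long placeEarly(long arrival, long deadline, long d) {
    return arrival; // start as soon as the window opens
  }
}
{code}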



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)

2015-11-16 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reopened YARN-3840:
---

[~jlowe], thanks for reporting this. I had only tested locally with a small 
number of apps. 
I'm going to revert this patch to avoid the regression.

> Resource Manager web ui issue when sorting application by id (with 
> application having id > 9999)
> 
>
> Key: YARN-3840
> URL: https://issues.apache.org/jira/browse/YARN-3840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: LINTE
>Assignee: Mohammad Shahid Khan
> Fix For: 2.8.0, 2.7.3
>
> Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, 
> YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch, 
> yarn-3840-7.patch
>
>
> On the WEBUI, the global main view page : 
> http://resourcemanager:8088/cluster/apps doesn't display applications over 
> 9999.
> With command line it works (# yarn application -list).
> Regards,
> Alexandre



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-11-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007744#comment-15007744
 ] 

Hadoop QA commented on YARN-4225:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 31s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 29s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
17s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 46s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
trunk has 3 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 56s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 39s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 50, now 50). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
9s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 58s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
introduced 1 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 30s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 9s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 32s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 56s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 41s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 51s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | 

[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-11-16 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007767#comment-15007767
 ] 

Sunil G commented on YARN-4140:
---

This ticket is a good candidate for the 2.7 line, as we will mostly run into this 
problem with MR cases. +1.

As for the list, I feel most of them are fine, but we can try to avoid some of the 
big patches, as they may not be tested thoroughly. WebUI- and REST-related 
tickets will be very useful for the 2.7 release line, as they give more detail and 
clarity for labels (allocation details are easier to understand, and they help 
with debugging).

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> While trying to run an application on a node-label partition, I found that the 
> application execution time was delayed by 5 – 10 min for 500 containers. 
> Of 3 machines in total, 2 were in the same partition, and the app was submitted to it.
> After enabling debug logging I was able to find the following:
> # From the AM, the container ask is for OFF_SWITCH.
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes 
> to allocate the 1st map after the AM allocation.
> # Testing with about 1K maps using a PI job, it took 17 minutes to allocate the next 
> container after the AM allocation.
> Once the 500 container allocations on NODE_LOCAL are done, the next container 
> allocation is done as OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 

[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-11-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007793#comment-15007793
 ] 

Wangda Tan commented on YARN-3216:
--

I feel this could be risky given the complexity of this patch, and I'm not 
very sure that all of its dependencies are in branch-2.7. I would prefer to 
make a decision after we have a branch-2.7 patch for this.

Will review YARN-4304 tomorrow.

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, 
> 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, 
> 0009-YARN-3216.patch, 0010-YARN-3216.patch, 0011-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4350) TestDistributedShell fails

2015-11-16 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007846#comment-15007846
 ] 

Naganarasimha G R commented on YARN-4350:
-

Thanks for the fast reply, I only saw it now! :)

> TestDistributedShell fails
> --
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.001.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These tests fail more often than not when run by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4359) Update LowCost agents logic to take advantage of YARN-4358

2015-11-16 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-4359:
--

 Summary: Update LowCost agents logic to take advantage of YARN-4358
 Key: YARN-4359
 URL: https://issues.apache.org/jira/browse/YARN-4359
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Ishai Menache


Given the improvements of YARN-4358, the LowCost agent should be improved to 
leverage this and operate on RLESparseResourceAllocation (ideally leveraging 
the improvements of YARN-3454 to compute available resources).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information

2015-11-16 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007759#comment-15007759
 ] 

Sunil G commented on YARN-4290:
---

Thanks [~leftnoteasy] for the input and thanks [~Naganarasimha Garla] for the 
suggestions.

Yes, this gives more clarity in terms of what we are displaying, and arranges 
it in an ordered manner. I am also wondering whether we need the *(MEM/CPU)* 
suffix in the title; could we simply show it as {{}}? The same information is 
prefixed before the resource value is shown. In this case, for any new resource 
addition (maybe with resource profiles too), we can show it in a similar 
syntax. Will this be fine?
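
For reference, a minimal, hypothetical sketch (not part of any patch here) of 
how the 2.x {{Resource}} API renders such a value; the memory/vCores numbers 
are made up:
{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class NodeReportColumnSketch {
  public static void main(String[] args) {
    // Hypothetical used/total values, only to show the rendered format.
    Resource used = Resource.newInstance(4096, 4);
    Resource total = Resource.newInstance(8192, 8);

    // Resource#toString renders as "<memory:4096, vCores:4>", so a single
    // "Used/Total Resources" column could show both without a MEM/CPU suffix.
    System.out.println("Used Resources  : " + used);
    System.out.println("Total Resources : " + total);
  }
}
{code}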

> "yarn nodes -list" should print all nodes reports information
> -
>
> Key: YARN-4290
> URL: https://issues.apache.org/jira/browse/YARN-4290
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Wangda Tan
>Assignee: Sunil G
>
> Currently, "yarn nodes -list" command only shows 
> - "Node-Id", 
> - "Node-State", 
> - "Node-Http-Address",
> - "Number-of-Running-Containers"
> I think we need to show more information such as used resource, just like 
> "yarn nodes -status" command.
> Maybe we can add a parameter to -list, such as "-show-details" to enable 
> printing all detailed information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-11-16 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1500#comment-1500
 ] 

Sunil G commented on YARN-3216:
---

Yes [~Naganarasimha Garla], it will cover a good fix for AM allocation. 
However, we need YARN-4304 along with this, else the UI will not give correct 
information. I missed adding some screenshots there, which I'll add today.

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, 
> 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, 
> 0009-YARN-3216.patch, 0010-YARN-3216.patch, 0011-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues

2015-11-16 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007780#comment-15007780
 ] 

Sunil G commented on YARN-3849:
---

Yes, it is not applying there. I will share a 2.7 patch for the same. Thank you 
[~leftnoteasy].

> Too much of preemption activity causing continuos killing of containers 
> across queues
> -
>
> Key: YARN-3849
> URL: https://issues.apache.org/jira/browse/YARN-3849
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch, 
> 0003-YARN-3849.patch, 0004-YARN-3849.patch
>
>
> Two queues are used. Each queue has been given a capacity of 0.5. The 
> Dominant Resource policy is used.
> 1. An app is submitted in QueueA and consumes the full cluster capacity.
> 2. After an app is submitted in QueueB, there is some demand, which invokes 
> preemption in QueueA.
> 3. Instead of only the excess over the 0.5 guaranteed capacity being killed, 
> we observed that all containers other than the AM are getting killed in QueueA.
> 4. Now the app in QueueB tries to take over the cluster with the current free 
> space. But there is updated demand from the app in QueueA, which lost its 
> containers earlier, and preemption now kicks in on QueueB.
> The scenario in steps 3 and 4 keeps happening in a loop, so none of the apps 
> complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4349) Support CallerContext in YARN

2015-11-16 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007829#comment-15007829
 ] 

Jian He commented on YARN-4349:
---

lgtm

> Support CallerContext in YARN
> -
>
> Key: YARN-4349
> URL: https://issues.apache.org/jira/browse/YARN-4349
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4349.1.patch, YARN-4349.2.patch, YARN-4349.3.patch
>
>
> For more details about CallerContext, please refer to the description of 
> HDFS-9184.
> From YARN's perspective, we should make the following changes:
> - RMAuditLogger logs the application's caller context when the application 
> is submitted by a user
> - Add the caller context to the application's data in ATS along with the 
> application creation event
> From MR's perspective:
> - Set the AppMaster container's context to YARN's application id
> - Set Mapper/Reducer containers' context to the task attempt id
> Protocol and RPC changes are done in HDFS-9184.
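
As a rough illustration only (a sketch assuming the HDFS-9184 
{{org.apache.hadoop.ipc.CallerContext}} API, not the actual YARN-4349 patch), 
attaching a caller context on the submitting thread could look like this; the 
context string here is hypothetical:
{code}
import org.apache.hadoop.ipc.CallerContext;

public class CallerContextSketch {
  public static void main(String[] args) {
    // Hypothetical context value; the proposal is to use e.g. the YARN
    // application id for the AM container and task attempt ids for tasks.
    CallerContext context =
        new CallerContext.Builder("application_1441791998224_0001").build();
    CallerContext.setCurrent(context);

    // RPC calls made from this thread now carry the context, so audit logs
    // can attribute them to the originating application.
    System.out.println("caller context = "
        + CallerContext.getCurrent().getContext());
  }
}
{code}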



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information

2015-11-16 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007836#comment-15007836
 ] 

Naganarasimha G R commented on YARN-4290:
-

Hi [~sunilg], 
I was assuming *(MEM/CPU)* was just to indicate that the memory and CPU values 
are specified like Resource.toString()!

> "yarn nodes -list" should print all nodes reports information
> -
>
> Key: YARN-4290
> URL: https://issues.apache.org/jira/browse/YARN-4290
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Wangda Tan
>Assignee: Sunil G
>
> Currently, "yarn nodes -list" command only shows 
> - "Node-Id", 
> - "Node-State", 
> - "Node-Http-Address",
> - "Number-of-Running-Containers"
> I think we need to show more information such as used resource, just like 
> "yarn nodes -status" command.
> Maybe we can add a parameter to -list, such as "-show-details" to enable 
> printing all detailed information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-11-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007841#comment-15007841
 ] 

Hadoop QA commented on YARN-4234:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 11s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 26s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
44s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 27s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
trunk has 3 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 35s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 5s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 30s 
{color} | {color:red} Patch generated 41 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 235, now 273). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 37s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 0s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 9s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 56s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_85. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 22s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 7s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | 

[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-11-16 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007855#comment-15007855
 ] 

Sunil G commented on YARN-3216:
---

Yes, the patch covers major changes in the scheduler and depends on a few 
metric items in NodeLabel, some of which do not seem to be present in the 2.7 
line. I can double-confirm that.
I will work on a 2.7 patch and post it here. Thank you 
[~leftnoteasy].

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, 
> 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, 
> 0009-YARN-3216.patch, 0010-YARN-3216.patch, 0011-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information

2015-11-16 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007851#comment-15007851
 ] 

Sunil G commented on YARN-4290:
---

OK. I feel Resource.toString is not separated by "/". From the {{Resource}} 
class, {{toString}} looks like,
{code}
"<memory:" + getMemory() + ", vCores:" + getVirtualCores() + ">"
{code}
Am I missing something?

> "yarn nodes -list" should print all nodes reports information
> -
>
> Key: YARN-4290
> URL: https://issues.apache.org/jira/browse/YARN-4290
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Wangda Tan
>Assignee: Sunil G
>
> Currently, "yarn nodes -list" command only shows 
> - "Node-Id", 
> - "Node-State", 
> - "Node-Http-Address",
> - "Number-of-Running-Containers"
> I think we need to show more information such as used resource, just like 
> "yarn nodes -status" command.
> Maybe we can add a parameter to -list, such as "-show-details" to enable 
> printing all detailed information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-11-16 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007864#comment-15007864
 ] 

Naganarasimha G R commented on YARN-3216:
-

That would be good [~sunilg]. 
Please try once, as I feel this is one of the important limitations of the 
Node Label feature!

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, 
> 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, 
> 0009-YARN-3216.patch, 0010-YARN-3216.patch, 0011-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent

2015-11-16 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-4358:
---
Attachment: YARN-4358.patch

This patch is marked-as-ready, but depends on YARN-3454 to apply correctly.

> Improve relationship between SharingPolicy and ReservationAgent
> ---
>
> Key: YARN-4358
> URL: https://issues.apache.org/jira/browse/YARN-4358
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4358.patch
>
>
> At the moment an agent places reservations based on available resources, but 
> has no visibility into extra constraints imposed by the SharingPolicy. While 
> not all constraints are easily represented, some (e.g., max-instantaneous 
> resources) are.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent

2015-11-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007882#comment-15007882
 ] 

Carlo Curino commented on YARN-4358:


As side effects, this patch also: 
* introduces tracking of the reservation count (useful for future 
SharingPolicies that limit the number of parallel jobs), 
* provides support for user-keyed queries of the plan,
* fixes some issues with comparators of reservations (spotted in the context of 
other related patches).

> Improve relationship between SharingPolicy and ReservationAgent
> ---
>
> Key: YARN-4358
> URL: https://issues.apache.org/jira/browse/YARN-4358
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4358.patch
>
>
> At the moment an agent places reservations based on available resources, but 
> has no visibility into extra constraints imposed by the SharingPolicy. While 
> not all constraints are easily represented, some (e.g., max-instantaneous 
> resources) are.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent

2015-11-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007896#comment-15007896
 ] 

Carlo Curino commented on YARN-4358:


 This patch fixes the behavior for the GreedyReservationAgent. YARN-4359 will 
fix the LowCost agent to leverage this structural change. 

> Improve relationship between SharingPolicy and ReservationAgent
> ---
>
> Key: YARN-4358
> URL: https://issues.apache.org/jira/browse/YARN-4358
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4358.patch
>
>
> At the moment an agent places reservations based on available resources, but 
> has no visibility into extra constraints imposed by the SharingPolicy. While 
> not all constraints are easily represented, some (e.g., max-instantaneous 
> resources) are.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)

2015-11-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008064#comment-15008064
 ] 

Hudson commented on YARN-3840:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2613 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2613/])
Revert "YARN-3840. Resource Manager web ui issue when sorting (jianhe: rev 
fcd7888029a8e07cb0b22d1f47f9e7df82c4a304)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TaskPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllContainersPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/dt-plugin-1.10.7/sorting/natural.js.gz
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllApplicationsPage.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TasksPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java


> Resource Manager web ui issue when sorting application by id (with 
> application having id > 9999)
> 
>
> Key: YARN-3840
> URL: https://issues.apache.org/jira/browse/YARN-3840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: LINTE
>Assignee: Mohammad Shahid Khan
> Fix For: 2.8.0, 2.7.3
>
> Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, 
> YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch, 
> yarn-3840-7.patch
>
>
> On the WEBUI, the global main view page: 
> http://resourcemanager:8088/cluster/apps doesn't display applications over 
> 9999.
> With command line it works (# yarn application -list).
> Regards,
> Alexandre



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4362) Too many preemption activity when nodelabels are non exclusive

2015-11-16 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008072#comment-15008072
 ] 

Bibin A Chundatt commented on YARN-4362:


Attached logs and xml.

Looks like the guaranteed resource for partition 2 for the default queue will 
always be zero, so any container assigned to partition 2 will get preempted by 
the ProportionalCapacityPreemptionPolicy even when no other application is 
running.

We should restrict assigning to partition 2.
Thoughts?
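
To illustrate why everything on partition 2 is considered over capacity, a toy 
calculation (all numbers hypothetical, not taken from the attached logs):
{code}
public class PartitionGuaranteeSketch {
  public static void main(String[] args) {
    // Hypothetical partition-2 total and the default queue's absolute
    // capacity on that partition (zero, since the queue is not linked to it).
    int partition2TotalMb = 8192;
    double defaultQueueAbsCapacityOnP2 = 0.0;

    int guaranteedMb = (int) (partition2TotalMb * defaultQueueAbsCapacityOnP2);
    int usedMb = 2048; // containers that happened to be assigned there

    // With a zero guarantee, all usage is above the ideal assignment, so the
    // ProportionalCapacityPreemptionPolicy keeps selecting these containers.
    System.out.println("guaranteed=" + guaranteedMb + "MB used=" + usedMb
        + "MB over-capacity=" + Math.max(0, usedMb - guaranteedMb) + "MB");
  }
}
{code}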

> Too many preemption activity when nodelabels are non exclusive
> --
>
> Key: YARN-4362
> URL: https://issues.apache.org/jira/browse/YARN-4362
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: Preemptedpartition.log, ProportionalDefaultQueue.log, 
> ProportionalPolicy.log, capacity-scheduler.xml
>
>
> Steps to reproduce
> ===
> 1. Configure an HA cluster with 6 nodes and 3 partitions (1, 2, 3), all 
> non-exclusive.
> *Partition configuration is as follows*
> NMs 1 and 2 are mapped to label 1
> NM 3 is mapped to label 2
> NMs 4 and 5 are mapped to label 3
> NM 6 is in the DEFAULT partition
> In the capacity scheduler the queues are linked only to partitions 1 and 3.
> NM 3 with label 2 is a backup node for any partition; whenever required, its 
> label will be changed.
> Submit an application/job with 200 containers to the default queue.
> All containers that get assigned to partition 2 get preempted.
> The application/map task execution takes more time since 30-40 tasks get 
> assigned to partition 2 and then get preempted, and all of them need to be 
> relaunched.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)

2015-11-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008085#comment-15008085
 ] 

Hudson commented on YARN-3840:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #672 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/672/])
Revert "YARN-3840. Resource Manager web ui issue when sorting (jianhe: rev 
fcd7888029a8e07cb0b22d1f47f9e7df82c4a304)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/dt-plugin-1.10.7/sorting/natural.js.gz
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllContainersPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllApplicationsPage.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TasksPage.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TaskPage.java


> Resource Manager web ui issue when sorting application by id (with 
> application having id > 9999)
> 
>
> Key: YARN-3840
> URL: https://issues.apache.org/jira/browse/YARN-3840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: LINTE
>Assignee: Mohammad Shahid Khan
> Fix For: 2.8.0, 2.7.3
>
> Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, 
> YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch, 
> yarn-3840-7.patch
>
>
> On the WEBUI, the global main view page: 
> http://resourcemanager:8088/cluster/apps doesn't display applications over 
> 9999.
> With command line it works (# yarn application -list).
> Regards,
> Alexandre



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4361) Total resource count mistake:NodeRemovedSchedulerEvent in ReconnectNodeTransition will reduce the newNode.getTotalCapability() in Multi-thread model

2015-11-16 Thread jialei weng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jialei weng updated YARN-4361:
--
Attachment: 0001-Fix-Total-resource-count-mistake-NodeRemovedSchedule.patch

An appropriate way to handle this issue: just remove the 'if' logic.

> Total resource count mistake:NodeRemovedSchedulerEvent in 
> ReconnectNodeTransition will reduce the newNode.getTotalCapability() in 
> Multi-thread model
> 
>
> Key: YARN-4361
> URL: https://issues.apache.org/jira/browse/YARN-4361
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.2
>Reporter: jialei weng
>  Labels: patch
>
> Total resource count mistake:
> The NodeRemovedSchedulerEvent in ReconnectNodeTransition will subtract 
> newNode.getTotalCapability() in the multi-threaded model. Since the RMNode 
> and the scheduler use different event queues, the remove-update-add 
> operations are not guaranteed to happen in sequence. Usually the total 
> resource is reduced by newNode.getTotalCapability() when handling the 
> NodeRemovedSchedulerEvent.
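
A toy model of the ordering problem described above (not the actual RM code; 
it only shows why subtracting the reconnected node's new capability leaves the 
total wrong):
{code}
public class ReconnectAccountingSketch {
  public static void main(String[] args) {
    // Hypothetical capabilities: the node registered with 8 GB and
    // reconnects with 16 GB.
    int clusterTotalMb = 8192;   // scheduler's view of the cluster
    int nodeCapabilityMb = 8192; // shared, mutable "RMNode" state

    // ReconnectNodeTransition updates the shared node state first...
    nodeCapabilityMb = 16384;

    // ...then the scheduler handles NODE_REMOVED using the *new* value,
    clusterTotalMb -= nodeCapabilityMb; // 8192 - 16384 = -8192

    // ...and NODE_ADDED adds the new value back.
    clusterTotalMb += nodeCapabilityMb; // -8192 + 16384 = 8192

    // The correct total after the reconnect would be 16384, but because the
    // removal subtracted newNode.getTotalCapability(), we are left with 8192.
    System.out.println("cluster total = " + clusterTotalMb
        + " MB (expected 16384)");
  }
}
{code}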



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Updated] (YARN-4361) Total resource count mistake:NodeRemovedSchedulerEvent in ReconnectNodeTransition will reduce the newNode.getTotalCapability() in Multi-thread model

2015-11-16 Thread jialei weng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jialei weng updated YARN-4361:
--
Attachment: (was: 
0001-Fix-Total-resource-count-mistake-NodeRemovedSchedule.patch)

> Total resource count mistake:NodeRemovedSchedulerEvent in 
> ReconnectNodeTransition will reduce the newNode.getTotalCapability() in 
> Multi-thread model
> 
>
> Key: YARN-4361
> URL: https://issues.apache.org/jira/browse/YARN-4361
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.2
>Reporter: jialei weng
>  Labels: patch
>
> Total resource count mistake:
> The NodeRemovedSchedulerEvent in ReconnectNodeTransition will subtract 
> newNode.getTotalCapability() in the multi-threaded model. Since the RMNode 
> and the scheduler use different event queues, the remove-update-add 
> operations are not guaranteed to happen in sequence. Usually the total 
> resource is reduced by newNode.getTotalCapability() when handling the 
> NodeRemovedSchedulerEvent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4361) Total resource count mistake:NodeRemovedSchedulerEvent in ReconnectNodeTransition will reduce the newNode.getTotalCapability() in Multi-thread model

2015-11-16 Thread jialei weng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jialei weng updated YARN-4361:
--
Attachment: YARN-4361v1.patch

An appropriate way to solve the issue: just remove the 'if' logic.

> Total resource count mistake:NodeRemovedSchedulerEvent in 
> ReconnectNodeTransition will reduce the newNode.getTotalCapability() in 
> Multi-thread model
> 
>
> Key: YARN-4361
> URL: https://issues.apache.org/jira/browse/YARN-4361
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.2
>Reporter: jialei weng
>  Labels: patch
> Attachments: YARN-4361v1.patch
>
>
> Total resource count mistake:
> The NodeRemovedSchedulerEvent in ReconnectNodeTransition will subtract 
> newNode.getTotalCapability() in the multi-threaded model. Since the RMNode 
> and the scheduler use different event queues, the remove-update-add 
> operations are not guaranteed to happen in sequence. Usually the total 
> resource is reduced by newNode.getTotalCapability() when handling the 
> NodeRemovedSchedulerEvent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4361) Total resource count mistake:NodeRemovedSchedulerEvent in ReconnectNodeTransition will reduce the newNode.getTotalCapability() in Multi-thread model

2015-11-16 Thread jialei weng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jialei weng updated YARN-4361:
--
Description: 
Total resource count mistake:
The NodeRemovedSchedulerEvent in ReconnectNodeTransition will subtract 
newNode.getTotalCapability() in the multi-threaded model. Since the RMNode and 
the scheduler use different event queues, the remove-update-add operations are 
not guaranteed to happen in sequence. Sometimes the total resource is reduced 
by newNode.getTotalCapability() when handling the NodeRemovedSchedulerEvent.

  was:
Total resource count mistake:
The NodeRemovedSchedulerEvent in ReconnectNodeTransition will subtract 
newNode.getTotalCapability() in the multi-threaded model. Since the RMNode and 
the scheduler use different event queues, the remove-update-add operations are 
not guaranteed to happen in sequence. Usually the total resource is reduced 
by newNode.getTotalCapability() when handling the NodeRemovedSchedulerEvent.


> Total resource count mistake:NodeRemovedSchedulerEvent in 
> ReconnectNodeTransition will reduce the newNode.getTotalCapability() in 
> Multi-thread model
> 
>
> Key: YARN-4361
> URL: https://issues.apache.org/jira/browse/YARN-4361
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.2
>Reporter: jialei weng
>  Labels: patch
> Attachments: YARN-4361v1.patch
>
>
> Total resource count mistake:
> The NodeRemovedSchedulerEvent in ReconnectNodeTransition will subtract 
> newNode.getTotalCapability() in the multi-threaded model. Since the RMNode 
> and the scheduler use different event queues, the remove-update-add 
> operations are not guaranteed to happen in sequence. Sometimes the total 
> resource is reduced by newNode.getTotalCapability() when handling the 
> NodeRemovedSchedulerEvent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout

2015-11-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006454#comment-15006454
 ] 

Hadoop QA commented on YARN-4348:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 7m 24s 
{color} | {color:red} root in branch-2.7 failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} branch-2.7 passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} branch-2.7 passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} branch-2.7 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 20s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 in branch-2.7 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} branch-2.7 passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} branch-2.7 passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 229, now 230). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 2s 
{color} | {color:red} The patch has 700 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 19s 
{color} | {color:red} The patch has 95 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 52m 48s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 7s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_79. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 44m 16s 
{color} | {color:red} Patch generated 71 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 167m 48s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit 

[jira] [Created] (YARN-4363) In TestFairScheduler, testcase should not create FairScheduler redundantly

2015-11-16 Thread Tao Jie (JIRA)
Tao Jie created YARN-4363:
-

 Summary: In TestFairScheduler, testcase should not create 
FairScheduler redundantly
 Key: YARN-4363
 URL: https://issues.apache.org/jira/browse/YARN-4363
 Project: Hadoop YARN
  Issue Type: Test
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Tao Jie
Priority: Trivial


I am trying to make some improvements to the fair scheduler, but I get test 
failures in TestFairScheduler due to redundant FairScheduler creation.
In TestFairScheduler, a FairScheduler and an RM are created, and then the RM's 
RMContext is set on the scheduler.
{code}
@Before
  public void setUp() throws IOException {
scheduler = new FairScheduler();
conf = createConfiguration();
resourceManager = new MockRM(conf);
scheduler.setRMContext(resourceManager.getRMContext());
  }
{code}
However, in several cases the scheduler is created anew, and as a result the 
RMContext in the scheduler is null.
{code}
 @Test  
  public void testMinZeroResourcesSettings() throws IOException {  
scheduler = new FairScheduler();
YarnConfiguration conf = new YarnConfiguration();
...
scheduler.init(conf);
{code}
Then, on scheduler.init(conf), I get an NPE (I try to get something from the 
RMContext during scheduler initialization).
So FairScheduler should not be re-created inside the test body.
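
One possible direction (a hypothetical sketch, not the attached patch) is to 
keep using the scheduler and MockRM created in {{setUp()}} instead of 
re-creating the scheduler, so the RMContext is still set when {{init()}} runs:
{code}
  @Test
  public void testMinZeroResourcesSettings() throws IOException {
    // Reuse the scheduler created in setUp(); it already has the RMContext
    // from MockRM, so init() does not hit the NPE described above.
    YarnConfiguration conf = new YarnConfiguration();
    conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 0);
    conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_VCORES, 0);
    scheduler.init(conf);
  }
{code}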




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4363) In TestFairScheduler, testcase should not create FairScheduler redundantly

2015-11-16 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated YARN-4363:
--
Attachment: YARN-4363.001.patch

> In TestFairScheduler, testcase should not create FairScheduler redundantly
> --
>
> Key: YARN-4363
> URL: https://issues.apache.org/jira/browse/YARN-4363
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Priority: Trivial
> Attachments: YARN-4363.001.patch
>
>
> I am trying to make some improvements to the fair scheduler, but I get test 
> failures in TestFairScheduler due to redundant FairScheduler creation.
> In TestFairScheduler, a FairScheduler and an RM are created, and then the 
> RM's RMContext is set on the scheduler.
> {code}
> @Before
>   public void setUp() throws IOException {
> scheduler = new FairScheduler();
> conf = createConfiguration();
> resourceManager = new MockRM(conf);
> scheduler.setRMContext(resourceManager.getRMContext());
>   }
> {code}
> However, in several cases the scheduler is created anew, and as a result the 
> RMContext in the scheduler is null.
> {code}
>  @Test  
>   public void testMinZeroResourcesSettings() throws IOException {  
> scheduler = new FairScheduler();
> YarnConfiguration conf = new YarnConfiguration();
> ...
> scheduler.init(conf);
> {code}
> Then, on scheduler.init(conf), I get an NPE (I try to get something from the 
> RMContext during scheduler initialization).
> So FairScheduler should not be re-created inside the test body.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

