date:20150805


[ 
https://issues.apache.org/jira/browse/YARN-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14655239#comment-14655239
 ] 

Junping Du commented on YARN-4019:
--

Hi [~rkanter], thanks for contributing the patch! The Patch LGTM in overall. A 
small comment is: can we move below line from serviceInit() to serviceStart()?
{code}
+  pauseMonitor.start();
{code}
Other looks fine to me. I think the test failure is not related to your patch. 
Isn't it?

 Add JvmPauseMonitor to ResourceManager and NodeManager
 --

 Key: YARN-4019
 URL: https://issues.apache.org/jira/browse/YARN-4019
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Affects Versions: 2.8.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-4019.001.patch, YARN-4019.002.patch


 We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the ResourceManager 
 and NodeManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS

[
https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658386#comment-14658386
]

Junping Du commented on YARN-3045:
--

Thanks [~sjlee0] and [~Naganarasimha] for quickly reply.
bq. If these events are attributes of applications, then they should be on the
application entities. If I want to find out all events for some application,
then I should be able to query only the application entity and get all events.
Some of these events are related to both application and NodeManager. We can
claim that it belongs to application but we can see that some events are too
detailed to application but could be more interested for YARN daemons. I can
understand that our design is more application centric now but should be
generic enough to store/retrival YARN daemon centric entities later. Anyway,
before making NM/RM onboard as the first class consumer of ATSv2, I am fine
with making them as application events.

bq. The need to have NodeManagerEntity is something different IMO. Note that
today there are challenges in emitting data without any application context
(e.g. node manager's configuration) as we discussed a few times. If we need to
support that, that needs a different discussion.
I see. I remember to see a JIRA work is to get ride of application context but
cannot find it now. In case we don't have it, how about move this discussion to
YARN-3959? The original scope of that JIRA is application related configuration
only but we could extend it to include daemon configuration if necessary.

bq. my assumption was that the sync/async distinction from the client
perspective mapped to whether the writer may be flushed or not. If not, then we
need to support a 2x2 matrix of possibilities: sync put w/ flush, sync put w/o
flush, async put w/ flush, and async put w/o flush. I thought it would be a
simplifying assumption to align those dimensions.
I think we can simplify 2x2 matrix by omitting the case of sync put w/o flush
as I cannot think a valid case that ack from TimelineCollector without flush
can help on. Rest of three cases sounds solid to me. To make TimelineCollector
can identify flush strategies with async calls, we may need to set severity on
entities need to put and TimelineCollector is configured to flush entities only
above specific severity just like log level does.

bq. I was under the impression that YARN-3367 is only for invoking REST calls
in nonblocking way and thus avoiding threads in the clients. Is it also related
to flush when called only putEntities and not on putEntitiesAsync?
You are right that the goal of YARN-3367 is to get rid of blocking call to put
entities, no matter it calls putEntities() or something else.
putEntitiesAsync() is exactly what we need, and it should be rare case to use
putEntities() once we have putEntitiesAsync except client logic rely on return
results tightly.

bq. I see currently async parameter as part of REST request is ignored now,
so i thought based on this param we may need to further flush the writer or is
your thoughts similar to support 2*2 matrix as Sangjin was informing?
Actually, from my above comments, I would prefer the way of (2*2 - 1). :) To
speed up this JIRA's progress, I am fine with keep ignoring sync/async
parameter and do everything async for now and left it out to a dedicated JIRA
to figure out.

Will look at latest patch soon.

[Event producers] Implement NM writing container lifecycle events to ATS

Key: YARN-3045
URL: https://issues.apache.org/jira/browse/YARN-3045
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
Attachments: YARN-3045-YARN-2928.002.patch,
YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch,
YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch,
YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch,
YARN-3045.20150420-1.patch

Per design in YARN-2928, implement NM writing container lifecycle events and
container system metrics to ATS.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3952) Fix new findbugs warnings in resourcemanager in YARN-2928 branch


[ 
https://issues.apache.org/jira/browse/YARN-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658419#comment-14658419
 ] 

Junping Du commented on YARN-3952:
--

Great. Thx! Which jira patch you plan to have this fix? We can resolve this as 
duplicated to that one.

 Fix new findbugs warnings in resourcemanager in YARN-2928 branch
 

 Key: YARN-3952
 URL: https://issues.apache.org/jira/browse/YARN-3952
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: YARN-3952-YARN-2928.01.patch


 {noformat}
 file 
 classname='org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher'
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptFinishedEvent 
 in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='79'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptRegisteredEvent
  in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='76'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationACLsUpdatedEvent
  in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='73'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationCreatedEvent 
 in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='67'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationFinishedEvent
  in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='70'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.ContainerCreatedEvent 
 in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='82'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.ContainerFinishedEvent 
 in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='85'/
 /file
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3952) Fix new findbugs warnings in resourcemanager in YARN-2928 branch


[ 
https://issues.apache.org/jira/browse/YARN-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658420#comment-14658420
 ] 

Junping Du commented on YARN-3952:
--

Great. Thx! Which jira patch you plan to have this fix? We can resolve this as 
duplicated to that one.

 Fix new findbugs warnings in resourcemanager in YARN-2928 branch
 

 Key: YARN-3952
 URL: https://issues.apache.org/jira/browse/YARN-3952
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: YARN-3952-YARN-2928.01.patch


 {noformat}
 file 
 classname='org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher'
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptFinishedEvent 
 in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='79'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptRegisteredEvent
  in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='76'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationACLsUpdatedEvent
  in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='73'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationCreatedEvent 
 in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='67'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationFinishedEvent
  in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='70'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.ContainerCreatedEvent 
 in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='82'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.ContainerFinishedEvent 
 in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='85'/
 /file
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Issue Comment Deleted] (YARN-3952) Fix new findbugs warnings in resourcemanager in YARN-2928 branch


 [ 
https://issues.apache.org/jira/browse/YARN-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3952:
-
Comment: was deleted

(was: Great. Thx! Which jira patch you plan to have this fix? We can resolve 
this as duplicated to that one.)

 Fix new findbugs warnings in resourcemanager in YARN-2928 branch
 

 Key: YARN-3952
 URL: https://issues.apache.org/jira/browse/YARN-3952
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: YARN-3952-YARN-2928.01.patch


 {noformat}
 file 
 classname='org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher'
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptFinishedEvent 
 in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='79'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptRegisteredEvent
  in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='76'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationACLsUpdatedEvent
  in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='73'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationCreatedEvent 
 in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='67'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationFinishedEvent
  in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='70'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.ContainerCreatedEvent 
 in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='82'/
 BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' 
 message='Unchecked/unconfirmed cast from 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.ContainerFinishedEvent 
 in 
 org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
  lineNumber='85'/
 /file
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently

2015-08-05 Thread Sunil G (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3992:
--
Attachment: 0003-YARN-3992.patch

Thanks [~rohithsharma]. Change method signature so that existing test cases can 
use old method itself.

 TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
 --

 Key: YARN-3992
 URL: https://issues.apache.org/jira/browse/YARN-3992
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen
Assignee: Sunil G
 Attachments: 0001-YARN-3992.patch, 0002-YARN-3992.patch, 
 0003-YARN-3992.patch


 {code}
 java.lang.AssertionError: expected:7 but was:5
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently


[ 
https://issues.apache.org/jira/browse/YARN-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658535#comment-14658535
 ] 

Hadoop QA commented on YARN-3992:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   7m  9s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m 11s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 20s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 50s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 24s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 32s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  54m 20s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  74m 29s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12748870/0003-YARN-3992.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 52f3525 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8772/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8772/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8772/console |


This message was automatically generated.

 TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
 --

 Key: YARN-3992
 URL: https://issues.apache.org/jira/browse/YARN-3992
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen
Assignee: Sunil G
 Attachments: 0001-YARN-3992.patch, 0002-YARN-3992.patch, 
 0003-YARN-3992.patch


 {code}
 java.lang.AssertionError: expected:7 but was:5
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-2672) Improve Gridmix (synthetic generator + reservation support)

2015-08-05 Thread Carlo Curino (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-2672:
---
Attachment: YARN-2672.1.patch

Moving the patch forward to trunk as is. It will need more cleaning before 
being committed, but people are using it / interested in it. So it is good to 
have it up to date.

 Improve Gridmix (synthetic generator + reservation support)
 ---

 Key: YARN-2672
 URL: https://issues.apache.org/jira/browse/YARN-2672
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: BB2015-05-TBR
 Attachments: YARN-2672.1.patch, YARN-2672.patch


 This JIRA proposes an enhancement of Gridmix that contains:
 1) a synthetic generator to produce load without the need of a trace, but 
 based on distributions
 2) include negotiation of reservations (to test YARN-1051). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3487) CapacityScheduler scheduler lock obtained unnecessarily when calling getQueue


 [ 
https://issues.apache.org/jira/browse/YARN-3487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3487:
-
Labels: 2.6.1-candidate  (was: )

 CapacityScheduler scheduler lock obtained unnecessarily when calling getQueue
 -

 Key: YARN-3487
 URL: https://issues.apache.org/jira/browse/YARN-3487
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
  Labels: 2.6.1-candidate
 Fix For: 2.7.1

 Attachments: YARN-3487.001.patch, YARN-3487.002.patch, 
 YARN-3487.003.patch


 Recently saw a significant slowdown of applications on a large cluster, and 
 we noticed there were a large number of blocked threads on the RM.  Most of 
 the blocked threads were waiting for the CapacityScheduler lock while calling 
 getQueueInfo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-2922) ConcurrentModificationException in CapacityScheduler's LeafQueue


 [ 
https://issues.apache.org/jira/browse/YARN-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2922:
-
Labels: 2.6.1-candidate  (was: )

 ConcurrentModificationException in CapacityScheduler's LeafQueue
 

 Key: YARN-2922
 URL: https://issues.apache.org/jira/browse/YARN-2922
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager, scheduler
Affects Versions: 2.5.1
Reporter: Jason Tufo
Assignee: Rohith Sharma K S
  Labels: 2.6.1-candidate
 Fix For: 2.7.0

 Attachments: 0001-YARN-2922.patch, 0001-YARN-2922.patch


 java.util.ConcurrentModificationException
 at 
 java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
 at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.collectSchedulerApplications(LeafQueue.java:1618)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getAppsInQueue(CapacityScheduler.java:1119)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:798)
 at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:234)
 at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
 at 
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
 at 
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3487) CapacityScheduler scheduler lock obtained unnecessarily when calling getQueue


 [ 
https://issues.apache.org/jira/browse/YARN-3487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3487:
-
Labels:   (was: 2.6.1-candidate)

 CapacityScheduler scheduler lock obtained unnecessarily when calling getQueue
 -

 Key: YARN-3487
 URL: https://issues.apache.org/jira/browse/YARN-3487
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Fix For: 2.7.1

 Attachments: YARN-3487.001.patch, YARN-3487.002.patch, 
 YARN-3487.003.patch


 Recently saw a significant slowdown of applications on a large cluster, and 
 we noticed there were a large number of blocked threads on the RM.  Most of 
 the blocked threads were waiting for the CapacityScheduler lock while calling 
 getQueueInfo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-2922) ConcurrentModificationException in CapacityScheduler's LeafQueue


 [ 
https://issues.apache.org/jira/browse/YARN-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2922:
-
Labels:   (was: 2.6.1-candidate)

 ConcurrentModificationException in CapacityScheduler's LeafQueue
 

 Key: YARN-2922
 URL: https://issues.apache.org/jira/browse/YARN-2922
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager, scheduler
Affects Versions: 2.5.1
Reporter: Jason Tufo
Assignee: Rohith Sharma K S
 Fix For: 2.7.0

 Attachments: 0001-YARN-2922.patch, 0001-YARN-2922.patch


 java.util.ConcurrentModificationException
 at 
 java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
 at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.collectSchedulerApplications(LeafQueue.java:1618)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getAppsInQueue(CapacityScheduler.java:1119)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:798)
 at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:234)
 at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
 at 
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
 at 
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3988) DockerContainerExecutor should allow user specify docker run parameters

2015-08-05 Thread Chen He (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658592#comment-14658592
 ] 

Chen He commented on YARN-3988:
---

I will post WIP patch, working on unit test, it requires docker ready on the 
compiling machine. 

 DockerContainerExecutor should allow user specify docker run parameters
 -

 Key: YARN-3988
 URL: https://issues.apache.org/jira/browse/YARN-3988
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: Chen He
Assignee: Chen He

 In current DockerContainerExecutor, the docker run command has fixed 
 parameters:
 String commandStr = commands.append(dockerExecutor)
   .append( )
   .append(run)
   .append( )
   .append(--rm --net=host)
   .append( )
   .append( --name  + containerIdStr)
   .append(localDirMount)
   .append(logDirMount)
   .append(containerWorkDirMount)
   .append( )
   .append(containerImageName)
   .toString();
 For example, it is not flexible if users want to start a docker container 
 with attaching extra volume(s) and other docker run parameters. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3736) Add RMStateStore apis to store and load accepted reservations for failover


[ 
https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658619#comment-14658619
 ] 

Anubhav Dhoot commented on YARN-3736:
-

Findbugs result is clean 
https://builds.apache.org/job/PreCommit-YARN-Build/8765/artifact/patchprocess/patchFindbugsWarningshadoop-yarn-server-resourcemanager.html.

 Add RMStateStore apis to store and load accepted reservations for failover
 --

 Key: YARN-3736
 URL: https://issues.apache.org/jira/browse/YARN-3736
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Subru Krishnan
Assignee: Anubhav Dhoot
 Attachments: YARN-3736.001.patch, YARN-3736.001.patch, 
 YARN-3736.002.patch, YARN-3736.003.patch, YARN-3736.004.patch, 
 YARN-3736.005.patch


 We need to persist the current state of the plan, i.e. the accepted 
 ReservationAllocations  corresponding RLESpareseResourceAllocations  to the 
 RMStateStore so that we can recover them on RM failover. This involves making 
 all the reservation system data structures protobuf friendly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-2672) Improve Gridmix (synthetic generator + reservation support)


[ 
https://issues.apache.org/jira/browse/YARN-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658648#comment-14658648
 ] 

Hadoop QA commented on YARN-2672:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 31s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 5 new or modified test files. |
| {color:green}+1{color} | javac |   8m 18s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 13s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 23s | The applied patch generated 
7 release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 31s | The applied patch generated  
177 new checkstyle issues (total was 126, now 300). |
| {color:red}-1{color} | whitespace |   0m  2s | The patch has 14  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m  1s | The patch appears to introduce 5 
new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | tools/hadoop tests |  11m 11s | Tests failed in 
hadoop-gridmix. |
| | |  50m 12s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-gridmix |
| Failed unit tests | hadoop.mapred.gridmix.TestGridmixMemoryEmulation |
|   | hadoop.mapred.gridmix.TestSleepJob |
|   | hadoop.mapred.gridmix.TestHighRamJob |
| Timed out tests | org.apache.hadoop.mapred.gridmix.TestGridmixSubmission |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12748880/YARN-2672.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4ab49a4 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/8773/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8773/artifact/patchprocess/diffcheckstylehadoop-gridmix.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8773/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8773/artifact/patchprocess/newPatchFindbugsWarningshadoop-gridmix.html
 |
| hadoop-gridmix test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8773/artifact/patchprocess/testrun_hadoop-gridmix.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8773/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8773/console |


This message was automatically generated.

 Improve Gridmix (synthetic generator + reservation support)
 ---

 Key: YARN-2672
 URL: https://issues.apache.org/jira/browse/YARN-2672
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: BB2015-05-TBR
 Attachments: YARN-2672.1.patch, YARN-2672.patch


 This JIRA proposes an enhancement of Gridmix that contains:
 1) a synthetic generator to produce load without the need of a trace, but 
 based on distributions
 2) include negotiation of reservations (to test YARN-1051). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-4003) ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is not consistent

2015-08-05 Thread Carlo Curino (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658665#comment-14658665
 ] 

Carlo Curino commented on YARN-4003:


I think you are correct, but given that the size of the allocation for a 
ReservationQueue can fluctuate substantially, I think we need to rely on either 
hard-limits on the number of applications running per queue, or careful user 
submissions will need to protect from the bad scenario you describe. 
Picking the opposite strategy (using actual reservation size to limit number of 
AMs) prevents us to scavange resources early in the reservation, and I am not 
sure even works properly if the reservation shrinks (does ProportionalCPP 
kill AMs running in a queue, if the capacity is shrinked?).

I agree with your concerns, but I don't see a better way out.



 ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is 
 not consistent
 

 Key: YARN-4003
 URL: https://issues.apache.org/jira/browse/YARN-4003
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Carlo Curino
 Attachments: YARN-4003.patch


 The inherited behavior from LeafQueue (limit AM % based on capacity) is not a 
 good fit for ReservationQueue (that have highly dynamic capacity). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS

2015-08-05 Thread Joep Rottinghuis (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658671#comment-14658671
 ] 

Joep Rottinghuis commented on YARN-3045:


Yeah, the discussion thread here seems to be rather deep. The first patch was 
put up in April and later versions of the patch seem to have been +1'ed already 
by several folks.
If there is a fundamental problem with the patch, we should address it.

It would be good to keep the comments here really focused on this patch and 
open separate jiras for separate topics to that we can keep making progress.

 [Event producers] Implement NM writing container lifecycle events to ATS
 

 Key: YARN-3045
 URL: https://issues.apache.org/jira/browse/YARN-3045
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3045-YARN-2928.002.patch, 
 YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, 
 YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, 
 YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch, 
 YARN-3045.20150420-1.patch


 Per design in YARN-2928, implement NM writing container lifecycle events and 
 container system metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers

2015-08-05 Thread Matthew Jacobs (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658674#comment-14658674
 ] 

Matthew Jacobs commented on YARN-3920:
--

[~adhoot]
{quote}
The problem is if you get that too high (such that it exceeds maximum resource 
allocation) one can accidentally disable reservation.
{quote}
That might be desired in some circumstances, no?

 FairScheduler Reserving a node for a container should be configurable to 
 allow it used only for large containers
 

 Key: YARN-3920
 URL: https://issues.apache.org/jira/browse/YARN-3920
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: yARN-3920.001.patch, yARN-3920.002.patch


 Reserving a node for a container was designed for preventing large containers 
 from starvation from small requests that keep getting into a node. Today we 
 let this be used even for a small container request. This has a huge impact 
 on scheduling since we block other scheduling requests until that reservation 
 is fulfilled. We should make this configurable so its impact can be minimized 
 by limiting it for large container requests as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-4019) Add JvmPauseMonitor to ResourceManager and NodeManager

2015-08-05 Thread Robert Kanter (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-4019:

Attachment: YARN-4019.003.patch

Sure.  The 003 patch moves the start lines.  The test failure was unrelated.

 Add JvmPauseMonitor to ResourceManager and NodeManager
 --

 Key: YARN-4019
 URL: https://issues.apache.org/jira/browse/YARN-4019
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Affects Versions: 2.8.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-4019.001.patch, YARN-4019.002.patch, 
 YARN-4019.003.patch


 We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the ResourceManager 
 and NodeManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3736) Add RMStateStore apis to store and load accepted reservations for failover

2015-08-05 Thread Arun Suresh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658765#comment-14658765
 ] 

Arun Suresh commented on YARN-3736:
---

Looks good. Thanks for the patch [~adhoot] and the reviews [~subru]
Will be committing this shortly..

 Add RMStateStore apis to store and load accepted reservations for failover
 --

 Key: YARN-3736
 URL: https://issues.apache.org/jira/browse/YARN-3736
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Subru Krishnan
Assignee: Anubhav Dhoot
 Attachments: YARN-3736.001.patch, YARN-3736.001.patch, 
 YARN-3736.002.patch, YARN-3736.003.patch, YARN-3736.004.patch, 
 YARN-3736.005.patch


 We need to persist the current state of the plan, i.e. the accepted 
 ReservationAllocations  corresponding RLESpareseResourceAllocations  to the 
 RMStateStore so that we can recover them on RM failover. This involves making 
 all the reservation system data structures protobuf friendly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3736) Add RMStateStore apis to store and load accepted reservations for failover

2015-08-05 Thread Arun Suresh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658773#comment-14658773
 ] 

Arun Suresh commented on YARN-3736:
---

Committed to trunk and branch-2

 Add RMStateStore apis to store and load accepted reservations for failover
 --

 Key: YARN-3736
 URL: https://issues.apache.org/jira/browse/YARN-3736
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Subru Krishnan
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3736.001.patch, YARN-3736.001.patch, 
 YARN-3736.002.patch, YARN-3736.003.patch, YARN-3736.004.patch, 
 YARN-3736.005.patch


 We need to persist the current state of the plan, i.e. the accepted 
 ReservationAllocations  corresponding RLESpareseResourceAllocations  to the 
 RMStateStore so that we can recover them on RM failover. This involves making 
 all the reservation system data structures protobuf friendly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3736) Add RMStateStore apis to store and load accepted reservations for failover

2015-08-05 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658798#comment-14658798
 ] 

Bikas Saha commented on YARN-3736:
--

Folks, how much load is this going to add to the state store (in terms of 
actual data volume and rate of stores/updates)? The original design of the RM 
state store assumes low volume of data and updates (proportional to number of 
new applications entering the system). So asking. Please let me know if this 
has been discussed and I missed that part.

 Add RMStateStore apis to store and load accepted reservations for failover
 --

 Key: YARN-3736
 URL: https://issues.apache.org/jira/browse/YARN-3736
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Subru Krishnan
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3736.001.patch, YARN-3736.001.patch, 
 YARN-3736.002.patch, YARN-3736.003.patch, YARN-3736.004.patch, 
 YARN-3736.005.patch


 We need to persist the current state of the plan, i.e. the accepted 
 ReservationAllocations  corresponding RLESpareseResourceAllocations  to the 
 RMStateStore so that we can recover them on RM failover. This involves making 
 all the reservation system data structures protobuf friendly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3961) Expose queue container information (pending, running, reserved) in REST api and yarn top

2015-08-05 Thread Arun Suresh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-3961:
--
Attachment: YARN-3961.001.patch

Re-uploading last patch to kick jenkins

 Expose queue container information (pending, running, reserved) in REST api 
 and yarn top
 

 Key: YARN-3961
 URL: https://issues.apache.org/jira/browse/YARN-3961
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, webapp
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: Screen Shot 2015-07-22 at 6.17.38 PM.png, Screen Shot 
 2015-07-22 at 6.19.31 PM.png, Screen Shot 2015-07-22 at 6.28.05 PM.png, 
 YARN-3961.001.patch, YARN-3961.001.patch


 It would be nice to expose container (allocated, pending, reserved) 
 information in the rest API and in yarn top tool



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3736) Add RMStateStore apis to store and load accepted reservations for failover

2015-08-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658829#comment-14658829
 ] 

Hudson commented on YARN-3736:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8265 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8265/])
YARN-3736. Add RMStateStore apis to store and load accepted reservations for 
failover (adhoot via asuresh) (Arun Suresh: rev 
f271d377357ad680924d19f07e6c8315e7c89bae)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestLeveldbRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreStoreReservationEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryReservationAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/LeveldbRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java


 Add RMStateStore apis to store and load accepted reservations for failover
 --

 Key: YARN-3736
 URL: https://issues.apache.org/jira/browse/YARN-3736
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Subru Krishnan
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3736.001.patch, YARN-3736.001.patch, 
 YARN-3736.002.patch, YARN-3736.003.patch, YARN-3736.004.patch, 
 YARN-3736.005.patch


 We need to persist the current state of the plan, i.e. the accepted 
 ReservationAllocations  corresponding RLESpareseResourceAllocations  to the 
 RMStateStore so that we can recover them on RM failover. This involves making 
 all the reservation system data structures protobuf friendly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-08-05 Thread Ming Ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated YARN-221:
-
Attachment: YARN-221-8.patch

The javac warning isn't related to this patch. That is due to 
{{TestAuxServices}} cast Object to ArrayListInteger. Updated the patch to 
take care of that anyway. The new patch also addresses the checkstyle and the 
whitespace issue.

 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, 
 YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, 
 YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that either the 
 logs should not be aggregated, that they should be aggregated with a high 
 priority, or that they should be aggregated but with a lower priority.  The 
 AM should be able to do this in the ContainerLaunch context to provide a 
 default value, but should also be able to update the value when the container 
 is released.
 This would allow for the NM to not aggregate logs in some cases, and avoid 
 connection to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3983) Make CapacityScheduler to easier extend application allocation logic

2015-08-05 Thread Jian He (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658899#comment-14658899
]

Jian He commented on YARN-3983:
---

failed test pass locally, not related
committing

Make CapacityScheduler to easier extend application allocation logic

Key: YARN-3983
URL: https://issues.apache.org/jira/browse/YARN-3983
Project: Hadoop YARN
Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
Fix For: 2.8.0

Attachments: YARN-3983.1.patch, YARN-3983.2.patch, YARN-3983.3.patch,
YARN-3983.4.patch

While working on YARN-1651 (resource allocation for increasing container), I
found it is very hard to extend existing CapacityScheduler resource
allocation logic to support different types of resource allocation.
For example, there's a lot of differences between increasing a container and
allocating a container:
- Increasing a container doesn't need to check locality delay.
- Increasing a container doesn't need to build/modify a resource request tree
(ANY-RACK/HOST).
- Increasing a container doesn't need to check allocation/reservation
starvation (see {{shouldAllocOrReserveNewContainer}}).
- After increasing a container is approved by scheduler, it need to update an
existing container token instead of creating new container.
And there're lots of similarities when allocating different types of
resources.
- User-limit/queue-limit will be enforced for both of them.
- Both of them needs resource reservation logic. (Maybe continuous
reservation looking is needed for both of them).
The purpose of this JIRA is to make easier extending CapacityScheduler
resource allocation logic to support different types of resource allocation,
make common code reusable, and also better code organization.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-4019) Add JvmPauseMonitor to ResourceManager and NodeManager


[ 
https://issues.apache.org/jira/browse/YARN-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658943#comment-14658943
 ] 

Hadoop QA commented on YARN-4019:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 25s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  1s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m  1s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 10s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 44s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 26s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 52s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |   5m 51s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  51m 53s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  98m 27s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.nodemanager.TestNodeManagerReboot |
|   | hadoop.yarn.server.nodemanager.TestDeletionService |
|   | hadoop.yarn.server.nodemanager.TestNodeManagerResync |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
|   | hadoop.yarn.server.resourcemanager.TestRMAdminService |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12748898/YARN-4019.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4ab49a4 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8774/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8774/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8774/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8774/console |


This message was automatically generated.

 Add JvmPauseMonitor to ResourceManager and NodeManager
 --

 Key: YARN-4019
 URL: https://issues.apache.org/jira/browse/YARN-4019
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Affects Versions: 2.8.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-4019.001.patch, YARN-4019.002.patch, 
 YARN-4019.003.patch


 We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the ResourceManager 
 and NodeManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3983) Make CapacityScheduler to easier extend application allocation logic

2015-08-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658955#comment-14658955
 ] 

Hudson commented on YARN-3983:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8266 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8266/])
YARN-3983. Refactored CapacityScheduleri#FiCaSchedulerApp to easier extend 
container allocation logic. Contributed by Wangda Tan (jianhe: rev 
ba2313d6145a1234777938a747187373f4cd58d9)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/AllocationState.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/ContainerAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/ContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java


 Make CapacityScheduler to easier extend application allocation logic
 

 Key: YARN-3983
 URL: https://issues.apache.org/jira/browse/YARN-3983
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.8.0

 Attachments: YARN-3983.1.patch, YARN-3983.2.patch, YARN-3983.3.patch, 
 YARN-3983.4.patch


 While working on YARN-1651 (resource allocation for increasing container), I 
 found it is very hard to extend existing CapacityScheduler resource 
 allocation logic to support different types of resource allocation.
 For example, there's a lot of differences between increasing a container and 
 allocating a container:
 - Increasing a container doesn't need to check locality delay.
 - Increasing a container doesn't need to build/modify a resource request tree 
 (ANY-RACK/HOST).
 - Increasing a container doesn't need to check allocation/reservation 
 starvation (see {{shouldAllocOrReserveNewContainer}}).
 - After increasing a container is approved by scheduler, it need to update an 
 existing container token instead of creating new container.
 And there're lots of similarities when allocating different types of 
 resources.
 - User-limit/queue-limit will be enforced for both of them.
 - Both of them needs resource reservation logic. (Maybe continuous 
 reservation looking is needed for both of them).
 The purpose of this JIRA is to make easier extending CapacityScheduler 
 resource allocation logic to support different types of resource allocation, 
 make common code reusable, and also better code organization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-08-05 Thread Xuan Gong (JIRA)

[
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658963#comment-14658963
]

Xuan Gong commented on YARN-221:

Thanks for the latest patch. I think that we are close. The patch looks good
overall. One nit:
* could we modify this doc in AppLogAggregatorImpl, too
{code}
// Create a set of Containers whose logs will be uploaded in this cycle.
// It includes:
// a) all containers in pendingContainers: those containers are finished
//and satisfy the retentionPolicy.
// b) some set of running containers: For all the Running containers,
// we have ContainerLogsRetentionPolicy.AM_AND_FAILED_CONTAINERS_ONLY,
// so simply set wasContainerSuccessful as true to
// bypass FAILED_CONTAINERS check and find the running containers
// which satisfy the retentionPolicy.
{code}

Also, I realized that ContainerTokenIdentifier is used here
{code}
boolean shouldDoLogAggregation(ContainerTokenIdentifier containerToken, int
exitCode);
{code}
Currently, it is fine. But if in future, we might need other information which
the ContainerTokenIdentifier can not provide. So, probably, we could have our
own ContainerLogContext instead of using ContainerTokenIdentifier ? In that
case, if we have requirement to use other information, we could add.

Thoughts ?

NM should provide a way for AM to tell it not to aggregate logs.

Key: YARN-221
URL: https://issues.apache.org/jira/browse/YARN-221
Project: Hadoop YARN
Issue Type: Sub-task
Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch,
YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch,
YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch

The NodeManager should provide a way for an AM to tell it that either the
logs should not be aggregated, that they should be aggregated with a high
priority, or that they should be aggregated but with a lower priority. The
AM should be able to do this in the ContainerLaunch context to provide a
default value, but should also be able to update the value when the container
is released.
This would allow for the NM to not aggregate logs in some cases, and avoid
connection to the NN at all.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-08-05 Thread Xuan Gong (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-221:
---
Labels: new  (was: )

 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
  Labels: new
 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, 
 YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, 
 YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that either the 
 logs should not be aggregated, that they should be aggregated with a high 
 priority, or that they should be aggregated but with a lower priority.  The 
 AM should be able to do this in the ContainerLaunch context to provide a 
 default value, but should also be able to update the value when the container 
 is released.
 This would allow for the NM to not aggregate logs in some cases, and avoid 
 connection to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-08-05 Thread Xuan Gong (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-221:
---
Labels:   (was: new)

 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, 
 YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, 
 YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that either the 
 logs should not be aggregated, that they should be aggregated with a high 
 priority, or that they should be aggregated but with a lower priority.  The 
 AM should be able to do this in the ContainerLaunch context to provide a 
 default value, but should also be able to update the value when the container 
 is released.
 This would allow for the NM to not aggregate logs in some cases, and avoid 
 connection to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-08-05 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659012#comment-14659012
 ] 

Li Lu commented on YARN-3049:
-

Hi [~zjshen], letting HBase implementation locally looks good to me. One minor 
comment for the latest patch is, maybe we want to separate the logic like {{if 
(te.getType().equals(TimelineEntityType.YARN_APPLICATION.toString()))}} in 
HBaseWriterImpl into a separate private method? I think it will be much clearer 
to say something like:
{code}
if (te.getType().equals(TimelineEntityType.YARN_APPLICATION.toString())) {
  updateAppToFlowTable(te);
}
{code}

As [~sjlee0] mentioned above that we may have some other specialization within 
HBaseWriterImpl, so maybe it's helpful to let these special designs stand out? 

 [Storage Implementation] Implement storage reader interface to fetch raw data 
 from HBase backend
 

 Key: YARN-3049
 URL: https://issues.apache.org/jira/browse/YARN-3049
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Zhijie Shen
 Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
 YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, 
 YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch, 
 YARN-3049-YARN-2928.5.patch, YARN-3049-YARN-2928.6.patch


 Implement existing ATS queries with the new ATS reader design.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-08-05 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659016#comment-14659016
 ] 

Li Lu commented on YARN-3904:
-

Thanks [~sjlee0]! Any other comments from anyone? This JIRA is currently 
blocking the POC patch of YARN-3817. 

 Refactor timelineservice.storage to add support to online and offline 
 aggregation writers
 -

 Key: YARN-3904
 URL: https://issues.apache.org/jira/browse/YARN-3904
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-3904-YARN-2928.001.patch, 
 YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, 
 YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, 
 YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch


 After we finished the design for time-based aggregation, we can adopt our 
 existing Phoenix storage into the storage of the aggregated data. In this 
 JIRA, I'm proposing to refactor writers to add support to aggregation 
 writers. Offline aggregation writers typically has less contextual 
 information. We can distinguish these writers by special naming. We can also 
 use CollectorContexts to model all contextual information and use it in our 
 writer interfaces. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers

[
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659048#comment-14659048
]

Anubhav Dhoot commented on YARN-3920:
-

Yeah that should be possible to do when thats the intention and thats ok. I am
trying to avoid cases when its not the intention to disable reservation but it
happens due to environment+configuration.
In ratio of max case, one can disable reservation only by setting the ratio to
above 1.0.
In the multiple of increment case, one can disable it by setting say setting a
multiple to say 4 times increment and your max on the cluster (which depends
on the min node size of cluster in addition to max config) is 3 times the
increment. That can happen even if it was not your intention to disable it.

FairScheduler Reserving a node for a container should be configurable to
allow it used only for large containers

Key: YARN-3920
URL: https://issues.apache.org/jira/browse/YARN-3920
Project: Hadoop YARN
Issue Type: Improvement
Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Attachments: yARN-3920.001.patch, yARN-3920.002.patch

Reserving a node for a container was designed for preventing large containers
from starvation from small requests that keep getting into a node. Today we
let this be used even for a small container request. This has a huge impact
on scheduling since we block other scheduling requests until that reservation
is fulfilled. We should make this configurable so its impact can be minimized
by limiting it for large container requests as originally intended.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers


 [ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3920:

Attachment: yARN-3920.003.patch

Attaching patch that updates reservation threshold in sync with changes in the 
max resource

 FairScheduler Reserving a node for a container should be configurable to 
 allow it used only for large containers
 

 Key: YARN-3920
 URL: https://issues.apache.org/jira/browse/YARN-3920
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: yARN-3920.001.patch, yARN-3920.002.patch, 
 yARN-3920.003.patch


 Reserving a node for a container was designed for preventing large containers 
 from starvation from small requests that keep getting into a node. Today we 
 let this be used even for a small container request. This has a huge impact 
 on scheduling since we block other scheduling requests until that reservation 
 is fulfilled. We should make this configurable so its impact can be minimized 
 by limiting it for large container requests as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3961) Expose queue container information (pending, running, reserved) in REST api and yarn top


[ 
https://issues.apache.org/jira/browse/YARN-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659383#comment-14659383
 ] 

Hadoop QA commented on YARN-3961:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  21m  1s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 15s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 11s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m 56s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 47s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   6m 58s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  1s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  53m 26s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 113m 33s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12748960/YARN-3961.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f59612e |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8777/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8777/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8777/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8777/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8777/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8777/console |


This message was automatically generated.

 Expose queue container information (pending, running, reserved) in REST api 
 and yarn top
 

 Key: YARN-3961
 URL: https://issues.apache.org/jira/browse/YARN-3961
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, webapp
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: Screen Shot 2015-07-22 at 6.17.38 PM.png, Screen Shot 
 2015-07-22 at 6.19.31 PM.png, Screen Shot 2015-07-22 at 6.28.05 PM.png, 
 YARN-3961.001.patch, YARN-3961.001.patch, YARN-3961.002.patch


 It would be nice to expose container (allocated, pending, reserved) 
 information in the rest API and in yarn top tool



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently

2015-08-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659514#comment-14659514
 ] 

Hudson commented on YARN-3992:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8269 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8269/])
YARN-3992. TestApplicationPriority.testApplicationPriorityAllocation fails 
intermittently. (Contributed by Sunil G) (rohithsharmaks: rev 
df9e7280db58baddd02d6e23d3685efb8d5f1b97)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java


 TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
 --

 Key: YARN-3992
 URL: https://issues.apache.org/jira/browse/YARN-3992
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen
Assignee: Sunil G
 Fix For: 2.8.0

 Attachments: 0001-YARN-3992.patch, 0002-YARN-3992.patch, 
 0003-YARN-3992.patch


 {code}
 java.lang.AssertionError: expected:7 but was:5
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently

2015-08-05 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659491#comment-14659491
 ] 

Rohith Sharma K S commented on YARN-3992:
-

+1 lgtm

 TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
 --

 Key: YARN-3992
 URL: https://issues.apache.org/jira/browse/YARN-3992
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen
Assignee: Sunil G
 Attachments: 0001-YARN-3992.patch, 0002-YARN-3992.patch, 
 0003-YARN-3992.patch


 {code}
 java.lang.AssertionError: expected:7 but was:5
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend


[ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659413#comment-14659413
 ] 

Hadoop QA commented on YARN-3049:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 11s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 48s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 49s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 17s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 11s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 20s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 24s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  43m  2s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12748985/YARN-3049-YARN-2928.7.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 895ccfa |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8779/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8779/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8779/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8779/console |


This message was automatically generated.

 [Storage Implementation] Implement storage reader interface to fetch raw data 
 from HBase backend
 

 Key: YARN-3049
 URL: https://issues.apache.org/jira/browse/YARN-3049
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Zhijie Shen
 Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
 YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, 
 YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch, 
 YARN-3049-YARN-2928.5.patch, YARN-3049-YARN-2928.6.patch, 
 YARN-3049-YARN-2928.7.patch


 Implement existing ATS queries with the new ATS reader design.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3974) Refactor the reservation system test cases to use parameterized base test


[ 
https://issues.apache.org/jira/browse/YARN-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659409#comment-14659409
 ] 

Hadoop QA commented on YARN-3974:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 14s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 6 new or modified test files. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 40s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 56s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 21s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 27s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  52m 38s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  91m  6s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12748970/YARN-3974-v5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f59612e |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8778/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8778/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8778/console |


This message was automatically generated.

 Refactor the reservation system test cases to use parameterized base test
 -

 Key: YARN-3974
 URL: https://issues.apache.org/jira/browse/YARN-3974
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Attachments: YARN-3974-v1.patch, YARN-3974-v2.patch, 
 YARN-3974-v3.patch, YARN-3974-v4.patch, YARN-3974-v5.patch


 We have two test suites for testing ReservationSystem against Capacity  Fair 
 scheduler. We should combine them using a parametrized reservation system 
 base test similar to YARN-2797



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS

2015-08-05 Thread Naganarasimha G R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659301#comment-14659301
 ] 

Naganarasimha G R commented on YARN-3045:
-

[~djp]  [~sjlee0]
Waiting for comments on the approach taken and wanted what exact aspects of 
localization needs to be captured. Is it related to 
{{ResourceLocalizationService events, i.e. 
LocalizationEventType.INIT_CONTAINER_RESOURCES  
CONTAINER_RESOURCES_LOCALIZED}} / {{events of each individual Localized 
Resource i.e. ResourceEventType.REQUEST, LOCALIZED  LOCALIZATION_FAILED}} / 
{{ContainerEventType.RESOURCE_LOCALIZED   RESOURCE_FAILED}} is sufficient to 
be captured ? 
In my opinion last option is sufficient or shall we handle localization events 
in another jira. Please share your thoughts ?

bq. . Anyway, before making NM/RM onboard as the first class consumer of ATSv2, 
I am fine with making them as application events.
Ok, going ahead to capture them under Application Entity.


 [Event producers] Implement NM writing container lifecycle events to ATS
 

 Key: YARN-3045
 URL: https://issues.apache.org/jira/browse/YARN-3045
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3045-YARN-2928.002.patch, 
 YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, 
 YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, 
 YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch, 
 YARN-3045.20150420-1.patch


 Per design in YARN-2928, implement NM writing container lifecycle events and 
 container system metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS

2015-08-05 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659323#comment-14659323
 ] 

Sangjin Lee commented on YARN-3045:
---

bq. In my opinion last option is sufficient or shall we handle localization 
events in another jira. Please share your thoughts ?

Sorry but it's not clear what the 2 options are. Could you kindly rephrase the 
options?

Also, one clarifying question. Are some of these events already existing 
container events? If so, they shouldn't be repeated as application events 
redundantly, right? What would be the application-specific events that are 
*not* captured by container events?

 [Event producers] Implement NM writing container lifecycle events to ATS
 

 Key: YARN-3045
 URL: https://issues.apache.org/jira/browse/YARN-3045
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3045-YARN-2928.002.patch, 
 YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, 
 YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, 
 YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch, 
 YARN-3045.20150420-1.patch


 Per design in YARN-2928, implement NM writing container lifecycle events and 
 container system metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend

2015-08-05 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3049:
--
Attachment: YARN-3049-YARN-2928.7.patch

Attach a new patch:

1. Rebase against YARN-3984.
2. Address Sangjin and Li's comments.

There's still a remaining issue: the timestamp will not be ser/des correctly by 
using UTF-8. I didn't figure the reason, but I did an experiment that the bytes 
were converted into string and then bytes, and they became different. Still 
need to do more investigation about this problem.

 [Storage Implementation] Implement storage reader interface to fetch raw data 
 from HBase backend
 

 Key: YARN-3049
 URL: https://issues.apache.org/jira/browse/YARN-3049
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Zhijie Shen
 Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, 
 YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, 
 YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch, 
 YARN-3049-YARN-2928.5.patch, YARN-3049-YARN-2928.6.patch, 
 YARN-3049-YARN-2928.7.patch


 Implement existing ATS queries with the new ATS reader design.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3984) Rethink event column key issue

2015-08-05 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659183#comment-14659183
 ] 

Zhijie Shen commented on YARN-3984:
---

bq. If the info map is not empty, this record would be redundant and will take 
up storage space.

Make sense. The patch looks good to me. Will commit it.

 Rethink event column key issue
 --

 Key: YARN-3984
 URL: https://issues.apache.org/jira/browse/YARN-3984
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Vrushali C
 Fix For: YARN-2928

 Attachments: YARN-3984-YARN-2928.001.patch


 Currently, the event column key is event_id?info_key?timestamp, which is not 
 so friendly to fetching all the events of an entity and sorting them in a 
 chronologic order. IMHO, timestamp?event_id?info_key may be a better key 
 schema. I open this jira to continue the discussion about it which was 
 commented on YARN-3908.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (YARN-4020) Exception happens while stopContainer in AM

2015-08-05 Thread sandflee (JIRA)

sandflee created YARN-4020:
--

 Summary: Exception happens while stopContainer in AM
 Key: YARN-4020
 URL: https://issues.apache.org/jira/browse/YARN-4020
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: sandflee


work-preserving-enable, RM HA, NM restart all enabled. The AM has running for a 
few weeks and when using NMClient.stopContainer, the following exception 
happens:

WARN ipc.Client: Exception encountered while connecting to the server : 
org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): 
DIGEST-MD5: digest response format violation. Mismatched response.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.


[ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659204#comment-14659204
 ] 

Hadoop QA commented on YARN-221:


\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  21m 47s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   7m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 41s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   3m 21s | The applied patch generated  1 
new checkstyle issues (total was 212, now 212). |
| {color:green}+1{color} | whitespace |   1m 49s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 22s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   7m 39s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  22m 29s | Tests passed in 
hadoop-common. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 55s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   7m 35s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests |  52m 48s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 140m 16s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12748921/YARN-221-8.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / ba2313d |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8776/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8776/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8776/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8776/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8776/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8776/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8776/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8776/console |


This message was automatically generated.

 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, 
 YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, 
 YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that either the 
 logs should not be aggregated, that they should be aggregated with a high 
 priority, or that they should be aggregated but with a lower priority.  The 
 AM should be able to do this in the ContainerLaunch context to provide a 
 default value, but should also be able to update the value when the container 
 is released.
 This would allow for the NM to not aggregate logs in some cases, and avoid 
 connection to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (YARN-4021) RuntimeException/YarnRuntimeException sent over to the client can cause client to assume a local fatal failure

Anubhav Dhoot created YARN-4021:
---

 Summary: RuntimeException/YarnRuntimeException sent over to the 
client can cause client to assume a local fatal failure 
 Key: YARN-4021
 URL: https://issues.apache.org/jira/browse/YARN-4021
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot


Currently RuntimeException and its derived types such as YarnRuntimeExceptions 
are serialized over to the client and thrown at the client after YARN-731. This 
can cause issues like MAPREDUCE-6439 where we assume a local fatal exception 
has happened. 
Instead we should have a way to distinguish local RuntimeException versus 
remote RuntimeException to avoid these issues. We need to go over all the 
current client side code that is expecting a remote RuntimeException inorder to 
make it work with this change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3736) Add RMStateStore apis to store and load accepted reservations for failover

2015-08-05 Thread Subru Krishnan (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659080#comment-14659080
]

Subru Krishnan commented on YARN-3736:
--

Thanks [~adhoot] for the patch and [~asuresh] for reviewing/committing the
patch.

[~bikassaha], we did have this discussion. The number of new reservations will
be bounded by the number of applications as having a reservation per
application is the worst case scenario. Practically you will have a reservation
per job pipeline as a production pipeline is rarely composed of one job.
Similarly Hive/Pig/Tez DAGs which are composed of multiple YARN apps can also
be represented as a single reservation. Moreover not all new applications will
use reservations, just those which need SLA and those generally tend to be
bigger but fewer. Considering all this, our approach is aligned with the design
of the state store. Makes sense?

Add RMStateStore apis to store and load accepted reservations for failover
--

Key: YARN-3736
URL: https://issues.apache.org/jira/browse/YARN-3736
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Subru Krishnan
Assignee: Anubhav Dhoot
Fix For: 2.8.0

Attachments: YARN-3736.001.patch, YARN-3736.001.patch,
YARN-3736.002.patch, YARN-3736.003.patch, YARN-3736.004.patch,
YARN-3736.005.patch

We need to persist the current state of the plan, i.e. the accepted
ReservationAllocations corresponding RLESpareseResourceAllocations to the
RMStateStore so that we can recover them on RM failover. This involves making
all the reservation system data structures protobuf friendly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3961) Expose queue container information (pending, running, reserved) in REST api and yarn top


[ 
https://issues.apache.org/jira/browse/YARN-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659109#comment-14659109
 ] 

Hadoop QA commented on YARN-3961:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 51s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 43s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 42s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m 40s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 23s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   6m 57s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m 10s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |  54m 49s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 112m  4s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12748913/YARN-3961.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f271d37 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8775/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8775/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8775/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8775/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8775/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8775/console |


This message was automatically generated.

 Expose queue container information (pending, running, reserved) in REST api 
 and yarn top
 

 Key: YARN-3961
 URL: https://issues.apache.org/jira/browse/YARN-3961
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, webapp
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: Screen Shot 2015-07-22 at 6.17.38 PM.png, Screen Shot 
 2015-07-22 at 6.19.31 PM.png, Screen Shot 2015-07-22 at 6.28.05 PM.png, 
 YARN-3961.001.patch, YARN-3961.001.patch


 It would be nice to expose container (allocated, pending, reserved) 
 information in the rest API and in yarn top tool



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS

2015-08-05 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659121#comment-14659121
 ] 

Li Lu commented on YARN-3045:
-

bq. To speed up this JIRA's progress, I am fine with keep ignoring sync/async 
parameter and do everything async for now and left it out to a dedicated JIRA 
to figure out.

+1. This JIRA has been hanging there for quite a while. Let's move forward with 
pending storage API problems addressed in separate JIRAs. 

 [Event producers] Implement NM writing container lifecycle events to ATS
 

 Key: YARN-3045
 URL: https://issues.apache.org/jira/browse/YARN-3045
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3045-YARN-2928.002.patch, 
 YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, 
 YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, 
 YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch, 
 YARN-3045.20150420-1.patch


 Per design in YARN-2928, implement NM writing container lifecycle events and 
 container system metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3961) Expose queue container information (pending, running, reserved) in REST api and yarn top