[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624370#comment-14624370 ] Hadoop QA commented on YARN-3381: -
\\ \\
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 22m 4s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 8m 16s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 8s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 3m 4s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 6m 42s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | mapreduce tests | 9m 8s | Tests failed in hadoop-mapreduce-client-app. |
| {color:red}-1{color} | yarn tests | 6m 53s | Tests failed in hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests | 6m 4s | Tests failed in hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests | 51m 22s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | | 128m 10s | |
\\ \\
|| Reason || Tests ||
| Failed unit tests | hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator |
| | hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart |
| | hadoop.yarn.server.nodemanager.TestDeletionService |
| | hadoop.yarn.server.nodemanager.containermanager.container.TestContainer |
| | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService |
| | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService |
| | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates |
| | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
| | hadoop.yarn.server.resourcemanager.TestApplicationCleanup |
| | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
\\ \\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12744987/YARN-3381-011.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e04faf8 |
| hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/8520/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt |
| hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8520/artifact/patchprocess/testrun_hadoop-yarn-client.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8520/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8520/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8520/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8520/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8520/console |
This message was automatically generated.
A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch,
[jira] [Updated] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3381: Labels: (was: BB2015-05-TBR) Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1462#comment-1462 ] Rohith Sharma K S commented on YARN-2003: - All the above test failures are related to YARN-3916 Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Labels: BB2015-05-TBR Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
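The description above proposes carrying the priority from the ApplicationSubmissionContext into the scheduler via AppAttemptAddedSchedulerEvent. A rough sketch of that flow is below, with hypothetical class and field names; it is an illustration of the idea, not the YARN-2003 patch.
{code}
import java.util.HashMap;
import java.util.Map;

public class AppPrioritySketch {
  /** Hypothetical event carrying the priority taken from the application submission context. */
  static class AppAttemptAddedEvent {
    final String appAttemptId;
    final int priority;   // e.g. copied from ApplicationSubmissionContext#getPriority()

    AppAttemptAddedEvent(String appAttemptId, int priority) {
      this.appAttemptId = appAttemptId;
      this.priority = priority;
    }
  }

  /** Simplified scheduler: stores the priority so later scheduling decisions can use it. */
  static class PriorityAwareScheduler {
    private final Map<String, Integer> appPriorities = new HashMap<>();

    void handle(AppAttemptAddedEvent event) {
      appPriorities.put(event.appAttemptId, event.priority);
    }

    int getPriority(String appAttemptId) {
      return appPriorities.getOrDefault(appAttemptId, 0);  // default priority when none was set
    }
  }

  public static void main(String[] args) {
    PriorityAwareScheduler scheduler = new PriorityAwareScheduler();
    scheduler.handle(new AppAttemptAddedEvent("appattempt_1_0001_000001", 10));
    System.out.println("Priority seen by scheduler: " + scheduler.getPriority("appattempt_1_0001_000001"));
  }
}
{code}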
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624472#comment-14624472 ] Naganarasimha G R commented on YARN-3045: - Hi [~djp], sorry for the delayed response! A few points to discuss for your queries: bq. why we hook the track of container start event in ContainerManagerImpl, but for container finished event, we do it inside of ContainerImpl? We should try to keep NMTimelinePublisher get referenced in one place if no necessary for other places. This was done intentionally, to avoid resending timeline events during recovery. The same was happening in the RM's case (which is being handled in YARN-3127), hence to avoid duplicate events I have kept it there. If there is a better way to avoid this, I am open to it. I will take care of the other comments; some of them are due to forgetting to revert code changes made while testing ... [Event producers] Implement NM writing container lifecycle events to ATS Key: YARN-3045 URL: https://issues.apache.org/jira/browse/YARN-3045 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045.20150420-1.patch Per design in YARN-2928, implement NM writing container lifecycle events and container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624506#comment-14624506 ] Hudson commented on YARN-3381: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #255 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/255/]) YARN-3381. Fix typo InvalidStateTransitonException. Contributed by Brahma Reddy Battula. (aajisaka: rev 19295b36d90e26616accee73b1f7743aab5df692) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachine.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachineFactory.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitionException.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitonException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, 
YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
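Note that the commit above keeps both the misspelled InvalidStateTransitonException.java and the new InvalidStateTransitionException.java, which suggests the old name was retained as a deprecated alias rather than deleted outright. Below is a minimal sketch of that rename-with-alias pattern; the simplified constructors are assumptions for illustration (the real YARN classes carry the current state and event type rather than a plain message), and the exact inheritance direction should be checked against the commit itself.
{code}
/** New, correctly spelled exception thrown when a state machine gets an event
 *  that is not valid for its current state. (Sketch of the pattern only.) */
class InvalidStateTransitionException extends RuntimeException {
  InvalidStateTransitionException(String message) {
    super(message);
  }
}

/** Old, misspelled name kept as a deprecated alias so existing references still compile. */
@Deprecated
class InvalidStateTransitonException extends InvalidStateTransitionException {
  InvalidStateTransitonException(String message) {
    super(message);
  }
}
{code}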
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624503#comment-14624503 ] Hudson commented on YARN-3069: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #255 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/255/]) YARN-3069. Document missing properties in yarn-default.xml. Contributed by Ray Chiang. (aajisaka: rev d6675606dc5f141c9af4f76a37128f8de4cfedad) * hadoop-common-project/hadoop-common/src/site/markdown/DeprecatedProperties.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: supportability Fix For: 2.8.0 Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step 
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
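As context for why undocumented properties hurt supportability: when a key is missing from yarn-default.xml, its effective default is whatever fallback the reading code supplies, which operators cannot see by inspecting the bundled defaults. A small example using {{org.apache.hadoop.conf.Configuration}} and one property from the list above; the empty-string fallback here is made up for illustration.
{code}
import org.apache.hadoop.conf.Configuration;

public class UndocumentedPropertyExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // yarn.log.server.url appears in the list above; if it is absent from
    // yarn-default.xml, its effective default is whatever the reading code
    // passes here -- invisible to anyone browsing the shipped defaults file.
    String logServerUrl = conf.get("yarn.log.server.url", "" /* hard-coded fallback */);
    System.out.println("Effective yarn.log.server.url = '" + logServerUrl + "'");
  }
}
{code}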
[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration
[ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624505#comment-14624505 ] Hudson commented on YARN-3894: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #255 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/255/]) YARN-3894. RM startup should fail for wrong CS xml NodeLabel capacity configuration. (Bibin A Chundatt via wangda) (wangda: rev 5ed1fead6b5ec24bb0ce1a3db033c2ee1ede4bb4) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/CHANGES.txt RM startup should fail for wrong CS xml NodeLabel capacity configuration - Key: YARN-3894 URL: https://issues.apache.org/jira/browse/YARN-3894 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Fix For: 2.8.0 Attachments: 0001-YARN-3894.patch, 0002-YARN-3894.patch, capacity-scheduler.xml
Currently in the Capacity Scheduler, when the capacity configuration is wrong the RM will shut down, but not in case of a NodeLabel capacity mismatch. In {{CapacityScheduler#initializeQueues}}:
{code}
private void initializeQueues(CapacitySchedulerConfiguration conf)
    throws IOException {
  root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT,
      queues, queues, noop);
  labelManager.reinitializeQueueLabels(getQueueToLabels());
  root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT,
      queues, queues, noop);
  LOG.info("Initialized root queue " + root);
  initializeQueueMappings();
  setQueueAcls(authorizer, queues);
}
{code}
{{labelManager}} is initialized from the queues, and the calculation for label-level capacity mismatch happens in {{parseQueue}}. So during the initial {{parseQueue}} call the labels will be empty.
*Steps to reproduce*
# Configure the RM with the Capacity Scheduler
# Add one or two node labels via rmadmin
# Configure the capacity XML with the node label, but with a wrong capacity configuration for the already added label
# Restart both RMs
# Check that on service init of the Capacity Scheduler the node label list is populated
*Expected*
RM should not start.
*Current exception on reinitialize check*
{code}
2015-07-07 19:18:25,655 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh queues.
java.io.IOException: Failed to re-init queues
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for children of queue root for label=node2
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
    ... 8 more
2015-07-07 19:18:25,656 WARN
{code}
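The failure above ultimately comes from a per-label capacity sum check in {{ParentQueue#setChildQueues}}: for each node label, the children of a parent queue must add up to 100%, and the check can only fire once the label is actually known to the scheduler. The following self-contained sketch (written for illustration; it is not the Hadoop implementation, and the {{ChildQueue}} class and method names are invented) shows the kind of per-label sum validation that produces the "Illegal capacity ... for label=node2" error.
{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LabelCapacityCheckSketch {

  /** One child queue with its per-label capacities, as fractions of the parent (1.0 == 100%). */
  static class ChildQueue {
    final String name;
    final Map<String, Float> capacityByLabel;

    ChildQueue(String name, Map<String, Float> capacityByLabel) {
      this.name = name;
      this.capacityByLabel = capacityByLabel;
    }
  }

  /** For every known label, the children of a parent queue must sum to exactly 100%. */
  static void checkChildCapacities(String parent, Set<String> labels, List<ChildQueue> children) {
    final float epsilon = 1e-5f;
    for (String label : labels) {
      float sum = 0f;
      for (ChildQueue child : children) {
        sum += child.capacityByLabel.getOrDefault(label, 0f);
      }
      if (Math.abs(sum - 1.0f) > epsilon) {
        throw new IllegalArgumentException("Illegal capacity of " + sum
            + " for children of queue " + parent + " for label=" + label);
      }
    }
  }

  public static void main(String[] args) {
    // Only 50% is configured for label "node2" across the children, so validation fails,
    // mirroring the IllegalArgumentException in the log above.
    List<ChildQueue> children = Arrays.asList(
        new ChildQueue("default", Collections.singletonMap("node2", 0.5f)));
    try {
      checkChildCapacities("root", Collections.singleton("node2"), children);
    } catch (IllegalArgumentException e) {
      System.out.println("Validation failed as expected: " + e.getMessage());
    }
  }
}
{code}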
[jira] [Commented] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624469#comment-14624469 ] Brahma Reddy Battula commented on YARN-3381: [~ajisakaa] thanks a lot for the review and commit, and thanks to all who discussed the problem. Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3381: Priority: Minor (was: Major) Summary: Fix typo InvalidStateTransitonException (was: A typographical error in InvalidStateTransitonException) Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624380#comment-14624380 ] Akira AJISAKA commented on YARN-3381: - +1, the test failures look unrelated to the patch. Thanks Brahma. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624394#comment-14624394 ] Hudson commented on YARN-3381: -- FAILURE: Integrated in Hadoop-trunk-Commit #8156 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8156/]) YARN-3381. Fix typo InvalidStateTransitonException. Contributed by Brahma Reddy Battula. (aajisaka: rev 19295b36d90e26616accee73b1f7743aab5df692) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachineFactory.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachine.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitionException.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitonException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, 
YARN-3381-011.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624465#comment-14624465 ] Rohith Sharma K S commented on YARN-2003: - Oho, did not see Sunil's comment earlier!! Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Labels: BB2015-05-TBR Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624477#comment-14624477 ] Naganarasimha G R commented on YARN-2934: - Hi [~jira.shegalov], sorry for the long gap in handling this, but I had a query related to it: {{tail}} only works on Linux systems, so I was wondering how to keep the implementation platform-neutral. Maybe RandomAccessFile? Thoughts? Improve handling of container's stderr --- Key: YARN-2934 URL: https://issues.apache.org/jira/browse/YARN-2934 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Naganarasimha G R Priority: Critical Most YARN applications redirect stderr to some file. That's why when a container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3911) Add tail of stderr to diagnostics if container fails to launch or if container logs are empty
[ https://issues.apache.org/jira/browse/YARN-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624476#comment-14624476 ] Naganarasimha G R commented on YARN-3911: - Hi [~bikassaha], YARN-2688 and YARN-2934 have similar intentions; if required, I can finish YARN-2934... but {{tail}} only works on Linux systems, so I was wondering how to keep the implementation platform-neutral. Maybe RandomAccessFile? Thoughts? Add tail of stderr to diagnostics if container fails to launch or if container logs are empty - Key: YARN-3911 URL: https://issues.apache.org/jira/browse/YARN-3911 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha The stderr may have useful info in those cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
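Both comments above ask whether {{RandomAccessFile}} could replace a platform-specific {{tail}} command when capturing the last part of a container's stderr. A minimal sketch of that idea follows; the file path, byte budget, and class name are arbitrary, and this is not the implementation that eventually went into YARN-2934/YARN-3911.
{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class StderrTail {
  /** Read at most the last {@code maxBytes} bytes of a file, platform-independently. */
  static String tail(String path, int maxBytes) throws IOException {
    try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
      long length = file.length();
      long start = Math.max(0, length - maxBytes);
      byte[] buffer = new byte[(int) (length - start)];
      file.seek(start);              // jump near the end instead of streaming the whole file
      file.readFully(buffer);
      return new String(buffer, StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical container log path; in the NM this would come from the container's log dir.
    System.out.println(tail(args.length > 0 ? args[0] : "stderr", 4096));
  }
}
{code}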
[jira] [Commented] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624513#comment-14624513 ] Hudson commented on YARN-3381: -- FAILURE: Integrated in Hadoop-Yarn-trunk #985 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/985/]) YARN-3381. Fix typo InvalidStateTransitonException. Contributed by Brahma Reddy Battula. (aajisaka: rev 19295b36d90e26616accee73b1f7743aab5df692) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachineFactory.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitonException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachine.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitionException.java Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, 
YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration
[ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624512#comment-14624512 ] Hudson commented on YARN-3894: -- FAILURE: Integrated in Hadoop-Yarn-trunk #985 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/985/]) YARN-3894. RM startup should fail for wrong CS xml NodeLabel capacity configuration. (Bibin A Chundatt via wangda) (wangda: rev 5ed1fead6b5ec24bb0ce1a3db033c2ee1ede4bb4) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java RM startup should fail for wrong CS xml NodeLabel capacity configuration - Key: YARN-3894 URL: https://issues.apache.org/jira/browse/YARN-3894 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Fix For: 2.8.0 Attachments: 0001-YARN-3894.patch, 0002-YARN-3894.patch, capacity-scheduler.xml
Currently in the Capacity Scheduler, when the capacity configuration is wrong the RM will shut down, but not in case of a NodeLabel capacity mismatch. In {{CapacityScheduler#initializeQueues}}:
{code}
private void initializeQueues(CapacitySchedulerConfiguration conf)
    throws IOException {
  root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT,
      queues, queues, noop);
  labelManager.reinitializeQueueLabels(getQueueToLabels());
  root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT,
      queues, queues, noop);
  LOG.info("Initialized root queue " + root);
  initializeQueueMappings();
  setQueueAcls(authorizer, queues);
}
{code}
{{labelManager}} is initialized from the queues, and the calculation for label-level capacity mismatch happens in {{parseQueue}}. So during the initial {{parseQueue}} call the labels will be empty.
*Steps to reproduce*
# Configure the RM with the Capacity Scheduler
# Add one or two node labels via rmadmin
# Configure the capacity XML with the node label, but with a wrong capacity configuration for the already added label
# Restart both RMs
# Check that on service init of the Capacity Scheduler the node label list is populated
*Expected*
RM should not start.
*Current exception on reinitialize check*
{code}
2015-07-07 19:18:25,655 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh queues.
java.io.IOException: Failed to re-init queues
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for children of queue root for label=node2
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
    ... 8 more
2015-07-07 19:18:25,656 WARN
{code}
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624510#comment-14624510 ] Hudson commented on YARN-3069: -- FAILURE: Integrated in Hadoop-Yarn-trunk #985 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/985/]) YARN-3069. Document missing properties in yarn-default.xml. Contributed by Ray Chiang. (aajisaka: rev d6675606dc5f141c9af4f76a37128f8de4cfedad) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/site/markdown/DeprecatedProperties.md Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: supportability Fix For: 2.8.0 Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step 
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625732#comment-14625732 ] Hadoop QA commented on YARN-3635: -
\\ \\
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 15s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 0m 49s | The applied patch generated 14 new checkstyle issues (total was 234, now 241). |
| {color:red}-1{color} | whitespace | 0m 4s | The patch has 15 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 21s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 51m 7s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | | 89m 23s | |
\\ \\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12745158/YARN-3635.6.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a431ed9 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8526/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8526/artifact/patchprocess/whitespace.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8526/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8526/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8526/console |
This message was automatically generated.
Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch
Currently, both the fair and capacity schedulers support queue mapping, which means the scheduler can change the queue of an application after it has been submitted. One issue with doing this inside a specific scheduler is: if the queue after mapping has a different maximum_allocation/default-node-label-expression from the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make queue mapping a common interface of the scheduler, and have RMAppManager set the mapped queue before doing validations.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
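The proposal above is to move queue mapping out of the individual schedulers so that RMAppManager can resolve the target queue before running its validations. A rough sketch of how such a common hook could be shaped is below; the interface and method names are hypothetical and are not taken from the YARN-3635 patch.
{code}
public class QueueMappingSketch {
  /** Hypothetical common scheduler hook: resolve the queue an app will actually run in. */
  interface QueuePlacementAware {
    String getMappedQueueForApp(String user, String requestedQueue);
  }

  /** Simplified stand-in for RMAppManager: validate against the mapped queue, not the requested one. */
  static class AppManager {
    private final QueuePlacementAware scheduler;

    AppManager(QueuePlacementAware scheduler) {
      this.scheduler = scheduler;
    }

    void submitApplication(String user, String requestedQueue) {
      String effectiveQueue = scheduler.getMappedQueueForApp(user, requestedQueue);
      // validateAndCreateResourceRequest-style checks (max allocation, node-label expression)
      // should now consult effectiveQueue instead of requestedQueue.
      System.out.println("Validating submission of user " + user + " against queue " + effectiveQueue);
    }
  }

  public static void main(String[] args) {
    // Example mapping: every user is placed into a per-user queue under root.users.
    AppManager manager = new AppManager((user, queue) -> "root.users." + user);
    manager.submitApplication("alice", "default");
  }
}
{code}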
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625826#comment-14625826 ] Varun Saxena commented on YARN-3878: [~jianhe], will update a patch soon. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch
The sequence of events is as follows:
# The RM is stopped while putting an RMStateStore event onto the RMStateStore's AsyncDispatcher. This leads to an InterruptedException being thrown.
# As the RM is being stopped, the RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we check whether all events have been drained and wait for the event queue to drain (as the RM State Store dispatcher is configured to drain its queue on stop).
# This condition never becomes true, and the AsyncDispatcher keeps waiting for the dispatcher event queue to drain until the JVM exits.
*Initial exception while posting RM State store event to queue*
{noformat}
2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED
2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
{noformat}
*JStack of AsyncDispatcher hanging on stop*
{noformat}
AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
    at java.lang.Thread.run(Thread.java:744)
main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at
{noformat}
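The hang described above boils down to a serviceStop that waits for a "drained" condition which can no longer become true once events are lost or the dispatcher thread stops making progress. The stripped-down sketch below illustrates the drain-on-stop pattern and where such a wait can block forever; it is an illustration only, not the actual org.apache.hadoop.yarn.event.AsyncDispatcher code, and the field names are chosen for the example.
{code}
import java.util.concurrent.LinkedBlockingQueue;

public class DrainOnStopSketch {
  private final LinkedBlockingQueue<Object> eventQueue = new LinkedBlockingQueue<>();
  private final Object waitForDrained = new Object();
  private volatile boolean drained = true;
  private volatile boolean stopped = false;

  /** Dispatcher thread: updates 'drained' as it empties the queue. */
  void runDispatcher() throws InterruptedException {
    while (!stopped) {
      drained = eventQueue.isEmpty();
      synchronized (waitForDrained) {
        if (drained) {
          waitForDrained.notifyAll();      // wake a stopper waiting for the queue to drain
        }
      }
      Object event = eventQueue.take();    // blocks; an interrupt here ends this loop
      drained = false;
      handle(event);
    }
  }

  /** Producer side: an interrupt during put() means the event never reaches the queue. */
  void post(Object event) {
    try {
      eventQueue.put(event);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Stop path with drain-on-stop semantics: waits until 'drained' becomes true.
   *  If the dispatcher thread has already been interrupted or has exited while the
   *  queue is non-empty, nothing ever sets 'drained' back to true, so this loop
   *  never terminates -- the hang reported in this issue. */
  void serviceStop() throws InterruptedException {
    synchronized (waitForDrained) {
      while (!drained) {
        waitForDrained.wait(1000);
      }
    }
    stopped = true;
  }

  private void handle(Object event) {
    System.out.println("handled " + event);
  }
}
{code}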
[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3878: --- Attachment: (was: YARN-3878-addendum.patch) AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch
The sequence of events is as follows:
# The RM is stopped while putting an RMStateStore event onto the RMStateStore's AsyncDispatcher. This leads to an InterruptedException being thrown.
# As the RM is being stopped, the RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we check whether all events have been drained and wait for the event queue to drain (as the RM State Store dispatcher is configured to drain its queue on stop).
# This condition never becomes true, and the AsyncDispatcher keeps waiting for the dispatcher event queue to drain until the JVM exits.
*Initial exception while posting RM State store event to queue*
{noformat}
2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED
2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
{noformat}
*JStack of AsyncDispatcher hanging on stop*
{noformat}
AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
    at java.lang.Thread.run(Thread.java:744)
main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0x000700b79430 (a
{noformat}
[jira] [Commented] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624703#comment-14624703 ] Hudson commented on YARN-3381: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #243 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/243/]) YARN-3381. Fix typo InvalidStateTransitonException. Contributed by Brahma Reddy Battula. (aajisaka: rev 19295b36d90e26616accee73b1f7743aab5df692) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachineFactory.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachine.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitonException.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitionException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, 
YARN-3381.patch It appears that InvalidStateTransitonException should be InvalidStateTransitionException. "Transition" was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
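A minimal sketch of a backward-compatible rename of this sort, assuming (as the commit file list above suggests, since both the old and the new exception classes appear in it) that the misspelled class was kept as a deprecated alias of the corrected one; the constructor signature is illustrative:
{code}
// Old, misspelled class kept only so that existing catch blocks keep compiling.
@Deprecated
public class InvalidStateTransitonException extends InvalidStateTransitionException {
  public InvalidStateTransitonException(Enum<?> currentState, Enum<?> event) {
    super(currentState, event);
  }
}
{code}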
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624700#comment-14624700 ] Hudson commented on YARN-3069: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #243 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/243/]) YARN-3069. Document missing properties in yarn-default.xml. Contributed by Ray Chiang. (aajisaka: rev d6675606dc5f141c9af4f76a37128f8de4cfedad) * hadoop-common-project/hadoop-common/src/site/markdown/DeprecatedProperties.md * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: supportability Fix For: 2.8.0 Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step 
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1449) AM-NM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624713#comment-14624713 ] MENG DING commented on YARN-1449: - [~jianhe], I think the {{initContainersToIncrease()}} is still needed, because it may need to create the {{this.containersToIncrease}} if it is null. The {{AllocateRequestPBImpl.setAskList()}} uses the same logic. It seemed a little awkward to me too at first though. AM-NM protocol changes to support container resizing Key: YARN-1449 URL: https://issues.apache.org/jira/browse/YARN-1449 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan (No longer used) Assignee: MENG DING Attachments: YARN-1449.1.patch, YARN-1449.2.patch, YARN-1449.3.patch, yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, yarn-1449.5.patch AM-NM protocol changes to support container resizing 1) IncreaseContainersResourceRequest and IncreaseContainersResourceResponse PB protocol and implementation 2) increaseContainersResources method in ContainerManagementProtocol 3) Update ContainerStatus protocol to include Resource 4) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
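A minimal sketch of the lazy-init PBImpl pattern being discussed; the setter name and element type are illustrative assumptions, not the exact patch:
{code}
public void setContainersToIncrease(List<Token> containersToIncrease) {
  if (containersToIncrease == null) {
    return;
  }
  // initContainersToIncrease() creates this.containersToIncrease when it is
  // still null, which is why the init method is needed even on the setter path,
  // mirroring the logic in AllocateRequestPBImpl.setAskList().
  initContainersToIncrease();
  this.containersToIncrease.clear();
  this.containersToIncrease.addAll(containersToIncrease);
}
{code}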
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624720#comment-14624720 ] Hudson commented on YARN-3069: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2182 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2182/]) YARN-3069. Document missing properties in yarn-default.xml. Contributed by Ray Chiang. (aajisaka: rev d6675606dc5f141c9af4f76a37128f8de4cfedad) * hadoop-common-project/hadoop-common/src/site/markdown/DeprecatedProperties.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: supportability Fix For: 2.8.0 Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step 
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration
[ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624722#comment-14624722 ] Hudson commented on YARN-3894: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2182 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2182/]) YARN-3894. RM startup should fail for wrong CS xml NodeLabel capacity configuration. (Bibin A Chundatt via wangda) (wangda: rev 5ed1fead6b5ec24bb0ce1a3db033c2ee1ede4bb4) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/CHANGES.txt RM startup should fail for wrong CS xml NodeLabel capacity configuration - Key: YARN-3894 URL: https://issues.apache.org/jira/browse/YARN-3894 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Fix For: 2.8.0 Attachments: 0001-YARN-3894.patch, 0002-YARN-3894.patch, capacity-scheduler.xml Currently in capacity Scheduler when capacity configuration is wrong RM will shutdown, but not incase of NodeLabels capacity mismatch In {{CapacityScheduler#initializeQueues}} {code} private void initializeQueues(CapacitySchedulerConfiguration conf) throws IOException { root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); labelManager.reinitializeQueueLabels(getQueueToLabels()); root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); LOG.info(Initialized root queue + root); initializeQueueMappings(); setQueueAcls(authorizer, queues); } {code} {{labelManager}} is initialized from queues and calculation for Label level capacity mismatch happens in {{parseQueue}} . So during initialization {{parseQueue}} the labels will be empty . *Steps to reproduce* # Configure RM with capacity scheduler # Add one or two node label from rmadmin # Configure capacity xml with nodelabel but issue with capacity configuration for already added label # Restart both RM # Check on service init of capacity scheduler node label list is populated *Expected* RM should not start *Current exception on reintialize check* {code} 2015-07-07 19:18:25,655 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, usedResources=memory:0, vCores:0, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0 2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh queues. 
java.io.IOException: Failed to re-init queues at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for children of queue root for label=node2 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379) ... 8 more 2015-07-07 19:18:25,656 WARN
[jira] [Updated] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-3866: Attachment: YARN-3866.3.patch Thanks all for the review and comments! Updated the patch to: * Mark all new API methods Unstable * Reuse the Container object for decreased/increased containers AM-RM protocol changes to support container resizing Key: YARN-3866 URL: https://issues.apache.org/jira/browse/YARN-3866 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3866.1.patch, YARN-3866.2.patch, YARN-3866.3.patch YARN-1447 and YARN-1448 are outdated. This ticket deals with AM-RM Protocol changes to support container resize according to the latest design in YARN-1197. 1) Add increase/decrease requests in AllocateRequest 2) Get approved increase/decrease requests from RM in AllocateResponse 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
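A minimal sketch of what the two bullet points in the update could look like on the AM-RM protocol records; the accessor names are assumptions rather than the committed API:
{code}
// New resize-related accessors marked @Unstable, with the existing Container
// record reused for both increased and decreased containers.
@Public
@Unstable
public abstract List<Container> getIncreasedContainers();

@Public
@Unstable
public abstract List<Container> getDecreasedContainers();
{code}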
[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration
[ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624702#comment-14624702 ] Hudson commented on YARN-3894: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #243 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/243/]) YARN-3894. RM startup should fail for wrong CS xml NodeLabel capacity configuration. (Bibin A Chundatt via wangda) (wangda: rev 5ed1fead6b5ec24bb0ce1a3db033c2ee1ede4bb4) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java RM startup should fail for wrong CS xml NodeLabel capacity configuration - Key: YARN-3894 URL: https://issues.apache.org/jira/browse/YARN-3894 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Fix For: 2.8.0 Attachments: 0001-YARN-3894.patch, 0002-YARN-3894.patch, capacity-scheduler.xml Currently in capacity Scheduler when capacity configuration is wrong RM will shutdown, but not incase of NodeLabels capacity mismatch In {{CapacityScheduler#initializeQueues}} {code} private void initializeQueues(CapacitySchedulerConfiguration conf) throws IOException { root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); labelManager.reinitializeQueueLabels(getQueueToLabels()); root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); LOG.info(Initialized root queue + root); initializeQueueMappings(); setQueueAcls(authorizer, queues); } {code} {{labelManager}} is initialized from queues and calculation for Label level capacity mismatch happens in {{parseQueue}} . So during initialization {{parseQueue}} the labels will be empty . *Steps to reproduce* # Configure RM with capacity scheduler # Add one or two node label from rmadmin # Configure capacity xml with nodelabel but issue with capacity configuration for already added label # Restart both RM # Check on service init of capacity scheduler node label list is populated *Expected* RM should not start *Current exception on reintialize check* {code} 2015-07-07 19:18:25,655 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, usedResources=memory:0, vCores:0, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0 2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh queues. 
java.io.IOException: Failed to re-init queues at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for children of queue root for label=node2 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379) ... 8 more 2015-07-07 19:18:25,656 WARN
[jira] [Commented] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624723#comment-14624723 ] Hudson commented on YARN-3381: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2182 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2182/]) YARN-3381. Fix typo InvalidStateTransitonException. Contributed by Brahma Reddy Battula. (aajisaka: rev 19295b36d90e26616accee73b1f7743aab5df692) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitonException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachineFactory.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachine.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitionException.java Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, 
YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625866#comment-14625866 ] Varun Saxena commented on YARN-3893: *Reinitialization of Active Services is required*. When we stop the active services, the service state for all of them changes to STOPPED. If this RM were to become active again, we would try to start all the active services, and services can't transition to the STARTED state from the STOPPED state. They can only do so when they are in the INITED state. Both RM in active state when Admin#transitionToActive failure from refeshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: yarn-site.xml Cases that can cause this: # Capacity scheduler xml is wrongly configured during switch # Refresh ACL failure due to configuration # Refresh user group failure due to configuration Both RMs will continuously try to become active {code} dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm1 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm2 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active {code} # Both Web UIs show active # Status shown as active for both RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
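A minimal sketch of the lifecycle constraint described in the comment, using the standard YARN service API; the service instance and configuration are illustrative:
{code}
CompositeService activeServices = new CompositeService("RMActiveServices");
activeServices.init(conf);   // NOTINITED -> INITED
activeServices.start();      // INITED    -> STARTED
activeServices.stop();       // STARTED   -> STOPPED
// A second start() is rejected because STOPPED -> STARTED is not a valid
// transition; the active services have to be re-created and re-initialized
// before the RM can transition back to active.
activeServices.start();      // throws ServiceStateException
{code}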
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625689#comment-14625689 ] Anubhav Dhoot commented on YARN-2005: - The actual blacklist is already available in the RM REST API: http://localhost:23188/ws/v1/cluster/apps/application_1436839322176_0001/appattempts. We can add a metric if you still feel it's needed. Blacklisting support for scheduling AMs --- Key: YARN-2005 URL: https://issues.apache.org/jira/browse/YARN-2005 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe Assignee: Anubhav Dhoot Attachments: YARN-2005.001.patch, YARN-2005.002.patch, YARN-2005.003.patch It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM, but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625703#comment-14625703 ] Junping Du commented on YARN-3815: -- Hi [~sjlee0], sorry for replying to your comments late. I have been busy delivering a quick POC patch for app-level aggregation (system metrics only, not including the conflicting-idea part) in YARN-3816. I will get back to your questions once I figure that out. [Aggregation] Application/Flow/User/Queue Level Aggregations Key: YARN-3815 URL: https://issues.apache.org/jira/browse/YARN-3815 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Priority: Critical Attachments: Timeline Service Nextgen Flow, User, Queue Level Aggregations (v1).pdf, aggregation-design-discussion.pdf, hbase-schema-proposal-for-aggregation.pdf Per previous discussions in some design documents for YARN-2928, the basic scenario is that the query for stats can happen at: - Application level, expected return: an application with aggregated stats - Flow level, expected return: aggregated stats for a flow_run, flow_version and flow - User level, expected return: aggregated stats for applications submitted by the user - Queue level, expected return: aggregated stats for applications within the queue Application states are the basic building block for all other levels of aggregation. We can provide Flow/User/Queue level aggregated statistics based on application states (a dedicated table for application states is needed, which is missing from previous design documents such as the HBase/Phoenix schema design). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container
[ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625207#comment-14625207 ] Bartosz Ługowski commented on YARN-1621: I would appreciate if anyone could review this patch. Thanks. Add CLI to list rows of task attempt ID, container ID, host of container, state of container -- Key: YARN-1621 URL: https://issues.apache.org/jira/browse/YARN-1621 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Tassapol Athiapinya Assignee: Bartosz Ługowski Labels: BB2015-05-TBR Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch, YARN-1621.4.patch, YARN-1621.5.patch, YARN-1621.6.patch As more applications are moved to YARN, we need generic CLI to list rows of task attempt ID, container ID, host of container, state of container. Today if YARN application running in a container does hang, there is no way to find out more info because a user does not know where each attempt is running in. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers. {code:title=proposed yarn cli} $ yarn application -list-containers -applicationId appId [-containerState state of container] where containerState is optional filter to list container in given state only. container state can be running/succeeded/killed/failed/all. A user can specify more than one container state at once e.g. KILLED,FAILED. task attempt ID container ID host of container state of container {code} CLI should work with running application/completed application. If a container runs many task attempts, all attempts should be shown. That will likely be the case of Tez container-reuse application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3816: - Attachment: YARN-3816-poc-v1.patch Uploading a quick POC patch for app-level aggregation that aggregates application metrics at the YARN system-metrics level (not including framework-specific counts/metrics). Please note that the patch is not ready for review yet, as it lacks basic polish and end-to-end testing. Significant changes will happen later, including at least: - separate calculations (SUM, AVG) out of TimelineMetrics as static methods - writing aggregated data should be moved from the entity table to a separate application table - newly added/modified APIs need more considerable refactoring - more key/complete unit tests should be added [Aggregation] App-level Aggregation for YARN system metrics --- Key: YARN-3816 URL: https://issues.apache.org/jira/browse/YARN-3816 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-poc-v1.patch We need application-level aggregation of Timeline data: - To present end users aggregated stats for each application, including: resource (CPU, memory) consumption across all containers, number of containers launched/completed/failed, etc. We need this for apps while they are running as well as when they are done. - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be aggregated to show details of states at the framework level. - Other levels of aggregation (Flow/User/Queue) can be computed more efficiently from application-level aggregations rather than raw entity-level data, as far fewer rows need to be scanned (filtering out non-aggregated entities such as events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
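A minimal sketch of the first planned change (pulling SUM/AVG-style calculations out of TimelineMetrics as static helpers); the method placement and name are assumptions, not the eventual API:
{code}
// Sum all recorded values across a collection of per-application metrics.
public static long sum(Collection<TimelineMetric> metrics) {
  long total = 0;
  for (TimelineMetric metric : metrics) {
    for (Number value : metric.getValues().values()) {
      total += value.longValue();
    }
  }
  return total;
}
{code}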
[jira] [Commented] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625137#comment-14625137 ] Wangda Tan commented on YARN-3866: -- Latest patch looks good, [~mding], could you set status of this JIRA to patch available to kick Jenkins? AM-RM protocol changes to support container resizing Key: YARN-3866 URL: https://issues.apache.org/jira/browse/YARN-3866 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3866.1.patch, YARN-3866.2.patch, YARN-3866.3.patch YARN-1447 and YARN-1448 are outdated. This ticket deals with AM-RM Protocol changes to support container resize according to the latest design in YARN-1197. 1) Add increase/decrease requests in AllocateRequest 2) Get approved increase/decrease requests from RM in AllocateResponse 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3814: --- Summary: REST API implementation for getting raw entities in TimelineReader (was: REST API implementation for TimelineReader) REST API implementation for getting raw entities in TimelineReader -- Key: YARN-3814 URL: https://issues.apache.org/jira/browse/YARN-3814 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625163#comment-14625163 ] Wangda Tan commented on YARN-3873: -- Hi [~sunilg], I can understand the value of supporting customized comparator for pending applications (for example priority-based activation), but I'm not sure if using same comparator of orderingPolicy is also valid for pendingApplications. For example, fair comparator considers demand resources, this may not make sense when comparing pending applications. It makes more sense to me if we activate application by its submission order instead of demand (size of AM container resource request). How about change the JIRA purpose to be: support customized comparator to activate applications. You can do this by adding a getActivateIterator to OrderingPolicy or creating new interface for it. I also suggest to put this as a sub JIRA of YARN-3306 for better tracking. Thoughts? pendingApplications in LeafQueue should also use OrderingPolicy --- Key: YARN-3873 URL: https://issues.apache.org/jira/browse/YARN-3873 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch Currently *pendingApplications* in LeafQueue is using {{applicationComparator}} from CapacityScheduler. This can be changed and pendingApplications can use the OrderingPolicy configured in Queue level (Fifo/Fair as configured). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
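A minimal sketch of the suggestion above, i.e. a separate activation-order hook on OrderingPolicy; the method name getActivateIterator comes from the comment, the rest is assumed:
{code}
public interface OrderingPolicy<S extends SchedulableEntity> {
  // Existing: the order in which the scheduler offers resources to applications.
  Iterator<S> getAssignmentIterator();

  // Proposed: the order in which pending applications are activated, e.g.
  // submission order or priority rather than demand size.
  Iterator<S> getActivateIterator();
}
{code}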
[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3049: -- Attachment: YARN-3049-WIP.1.patch Attaching a WIP patch so that the community can take a look; I still need to add the app-flow mapping and some missing fields. [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend Key: YARN-3049 URL: https://issues.apache.org/jira/browse/YARN-3049 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3049-WIP.1.patch Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3904) Adopt PhoenixTimelineWriter into time-based aggregation storage
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625206#comment-14625206 ] Hadoop QA commented on YARN-3904: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 21s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 55s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 15s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 47s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 7m 58s | Tests failed in hadoop-yarn-server-timelineservice. | | | | 44m 44s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl | | | hadoop.yarn.server.timelineservice.aggregation.timebased.TestPhoenixAggregatorWriter | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745086/YARN-3904-YARN-2928.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 2d4a8f4 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8522/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8522/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8522/console | This message was automatically generated. Adopt PhoenixTimelineWriter into time-based aggregation storage --- Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. This JIRA proposes to move the Phoenix storage implementation from o.a.h.yarn.server.timelineservice.storage to o.a.h.yarn.server.timelineservice.aggregation.timebased, and make it a fully devoted writer for time-based aggregation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625212#comment-14625212 ] Subru Krishnan commented on YARN-3116: -- Thanks [~zjshen] for reviewing and committing the patch. [Collector wireup] We need an assured way to determine if a container is an AM container on NM -- Key: YARN-3116 URL: https://issues.apache.org/jira/browse/YARN-3116 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, timelineserver Reporter: Zhijie Shen Assignee: Giovanni Matteo Fumarola Fix For: 2.8.0 Attachments: YARN-3116.patch, YARN-3116.v10.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch In YARN-3030, to start the per-app aggregator only for a started AM container, we need to determine whether the container is an AM container or not from the context in the NM (we can do it on the RM). This information is missing, so we worked around it by treating the container with ID _01 as the AM container. Unfortunately, this is neither a necessary nor a sufficient condition. We need a way to determine whether a container is an AM container on the NM. We can add a flag to the container object or create an API to make that judgement. Perhaps the distributed AM information may also be useful to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
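A minimal sketch of the fragile workaround the description refers to, assuming the sequential container id within the app attempt is used; it illustrates why an explicit flag or API is needed:
{code}
// Heuristic only: treats the first container of an app attempt as the AM,
// which, as the JIRA notes, is neither a necessary nor a sufficient condition.
boolean assumedAMContainer = containerId.getContainerId() == 1; // "_000001"
{code}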
[jira] [Updated] (YARN-3904) Adopt PhoenixTimelineWriter into time-based aggregation storage
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3904: Attachment: YARN-3904-YARN-2928.003.patch Fix findbugs warnings and some code formatting. Adopt PhoenixTimelineWriter into time-based aggregation storage --- Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. This JIRA proposes to move the Phoenix storage implementation from o.a.h.yarn.server.timelineservice.storage to o.a.h.yarn.server.timelineservice.aggregation.timebased, and make it a fully devoted writer for time-based aggregation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM
[ https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625088#comment-14625088 ] Varun Vasudev commented on YARN-3644: - One other minor comment - can you please change yarn.nodemanager.shutdown.on.RM.connection.failures to yarn.nodemanager.shutdown-on-rm-connection-failures? Node manager shuts down if unable to connect with RM Key: YARN-3644 URL: https://issues.apache.org/jira/browse/YARN-3644 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Srikanth Sundarrajan Assignee: Raju Bairishetti Attachments: YARN-3644.001.patch, YARN-3644.001.patch, YARN-3644.002.patch, YARN-3644.003.patch, YARN-3644.patch When NM is unable to connect to RM, NM shuts itself down. {code} } catch (ConnectException e) { //catch and throw the exception if tried MAX wait time to connect RM dispatcher.getEventHandler().handle( new NodeManagerEvent(NodeManagerEventType.SHUTDOWN)); throw new YarnRuntimeException(e); {code} In large clusters, if RM is down for maintenance for longer period, all the NMs shuts themselves down, requiring additional work to bring up the NMs. Setting the yarn.resourcemanager.connect.wait-ms to -1 has other side effects, where non connection failures are being retried infinitely by all YarnClients (via RMProxy). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
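A minimal sketch of how the shutdown behaviour could be made configurable, using the property name suggested in the review comment above; the surrounding method and the default value are assumptions, not the actual NodeStatusUpdater code:
{code}
private void handleRMConnectionFailure(ConnectException e) {
  boolean shutdownOnFailure = conf.getBoolean(
      "yarn.nodemanager.shutdown-on-rm-connection-failures", true);
  if (shutdownOnFailure) {
    // Current behaviour: tell the NM to shut itself down and rethrow.
    dispatcher.getEventHandler().handle(
        new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
    throw new YarnRuntimeException(e);
  }
  // Alternative behaviour for long RM maintenance windows: keep retrying.
  LOG.warn("Cannot connect to RM; will keep retrying", e);
}
{code}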
[jira] [Commented] (YARN-3453) Fair Scheduler: Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625019#comment-14625019 ] Arun Suresh commented on YARN-3453: --- Thanks for the reviews [~kasha], [~ashwinshankar77] and [~peng.zhang] Will be committing this shortly.. Fair Scheduler: Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing --- Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, YARN-3453.4.patch, YARN-3453.5.patch There are two places in preemption code flow where DefaultResourceCalculator is used, even in DRF mode. Which basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue since scheduling logic is based on DRF's Calculator. Following are the two places : 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is = fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode is : during a preemption round,if preempting a few containers results in satisfying needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
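A minimal sketch of the first change the description asks for, assuming the queue is handed the scheduler's configured calculator (DominantResourceCalculator in DRF mode) as {{rc}} instead of the hard-coded default:
{code}
private boolean isStarved(Resource share) {
  // rc is assumed to be the scheduler's configured ResourceCalculator, i.e.
  // DominantResourceCalculator under DRF, so starvation is judged on the
  // dominant resource rather than on memory alone.
  Resource desiredShare = Resources.min(rc, clusterResource, share, getDemand());
  return Resources.lessThan(rc, clusterResource, getResourceUsage(), desiredShare);
}
{code}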
[jira] [Commented] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625050#comment-14625050 ] Hadoop QA commented on YARN-3844: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 17s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | yarn tests | 6m 2s | Tests failed in hadoop-yarn-server-nodemanager. | | | | 21m 8s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.container.TestContainer | | | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService | | | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745078/YARN-3844.006.patch | | Optional Tests | javac unit | | git revision | trunk / 19295b3 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8521/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8521/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8521/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8521/console | This message was automatically generated. Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.006.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3877) YarnClientImpl.submitApplication swallows exceptions
[ https://issues.apache.org/jira/browse/YARN-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624782#comment-14624782 ] Varun Saxena commented on YARN-3877: [~ste...@apache.org] / [~chris.douglas] kindly review. YarnClientImpl.submitApplication swallows exceptions Key: YARN-3877 URL: https://issues.apache.org/jira/browse/YARN-3877 Project: Hadoop YARN Issue Type: Improvement Components: client Affects Versions: 2.7.2 Reporter: Steve Loughran Assignee: Varun Saxena Priority: Minor Attachments: YARN-3877.01.patch When {{YarnClientImpl.submitApplication}} spins waiting for the application to be accepted, any interruption during its sleep() calls is logged and swallowed. This makes it hard to interrupt the thread during shutdown. Really, it should throw some form of exception and let the caller deal with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
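A minimal sketch of the behaviour the description asks for, assuming a sleep between poll attempts; the variable names are illustrative, not the committed patch:
{code}
try {
  Thread.sleep(submitPollIntervalMillis);
} catch (InterruptedException e) {
  // Restore the interrupt status and surface the failure to the caller
  // instead of logging and swallowing it.
  Thread.currentThread().interrupt();
  throw new YarnException("Interrupted while waiting for application "
      + applicationId + " to be accepted", e);
}
{code}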
[jira] [Commented] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624815#comment-14624815 ] Hudson commented on YARN-3381: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #253 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/253/]) YARN-3381. Fix typo InvalidStateTransitonException. Contributed by Brahma Reddy Battula. (aajisaka: rev 19295b36d90e26616accee73b1f7743aab5df692) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachineFactory.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitonException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitionException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachine.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, 
YARN-3381-011.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
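Since the fix is essentially a class rename that touches many callers, a common compatibility trick is to keep the misspelled class as a deprecated alias of the corrected one. The sketch below is a simplified illustration of that idea (plain String message, no generics), not the actual Hadoop sources.
{code}
/** Old, misspelled name kept only as a deprecated alias so existing catch blocks still match. */
@Deprecated
public class InvalidStateTransitonException extends RuntimeException {
  public InvalidStateTransitonException(String message) {
    super(message);
  }
}

/** Correctly spelled exception; being a subtype, it is still caught by code catching the old name. */
class InvalidStateTransitionException extends InvalidStateTransitonException {
  public InvalidStateTransitionException(String currentState, String event) {
    super("Invalid event: " + event + " at " + currentState);
  }
}
{code}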
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624973#comment-14624973 ] Sangjin Lee commented on YARN-3908: --- I would appreciate your review on this. Thanks! Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
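For illustration only, a minimal sketch of how the two missing pieces could be persisted with the HBase 1.x client API. The column family name, qualifier layout and value encoding below are assumptions made for the example, not the writer's actual schema.
{code}
import java.util.Map;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public final class InfoColumnSketch {
  private static final byte[] INFO_FAMILY = Bytes.toBytes("i");

  /** Flatten the entity#info map into one qualifier per key under the info family. */
  static void addEntityInfo(Put put, Map<String, String> info) {
    for (Map.Entry<String, String> e : info.entrySet()) {
      put.addColumn(INFO_FAMILY, Bytes.toBytes("info!" + e.getKey()),
          Bytes.toBytes(e.getValue()));
    }
  }

  /** Persist the event timestamp explicitly instead of dropping it. */
  static void addEventTimestamp(Put put, String eventId, long timestamp) {
    put.addColumn(INFO_FAMILY, Bytes.toBytes("event!" + eventId),
        Bytes.toBytes(timestamp));
  }

  private InfoColumnSketch() {
  }
}
{code}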
[jira] [Updated] (YARN-3453) Fair Scheduler: Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3453: --- Summary: Fair Scheduler: Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing (was: Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing) Fair Scheduler: Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing --- Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, YARN-3453.4.patch, YARN-3453.5.patch There are two places in the preemption code flow where DefaultResourceCalculator is used, even in DRF mode. This basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue since the scheduling logic is based on DRF's Calculator. Following are the two places: 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode is: during a preemption round, if preempting a few containers results in satisfying the needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
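To make the first point concrete, here is a small self-contained sketch (a hypothetical helper, not FSLeafQueue itself) of what a DRF-aware starvation check looks like: the comparison is done on the dominant share rather than on memory alone.
{code}
public final class DrfStarvationSketch {
  /**
   * Returns true only if the queue's dominant usage share is below its
   * dominant (min/fair) share; comparing memory alone over-reports starvation.
   */
  static boolean isStarvedUnderDrf(long usedMemMb, int usedVcores,
                                   long shareMemMb, int shareVcores,
                                   long clusterMemMb, int clusterVcores) {
    double usedDominant = Math.max((double) usedMemMb / clusterMemMb,
        (double) usedVcores / clusterVcores);
    double shareDominant = Math.max((double) shareMemMb / clusterMemMb,
        (double) shareVcores / clusterVcores);
    return usedDominant < shareDominant;
  }

  private DrfStarvationSketch() {
  }
}
{code}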
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625060#comment-14625060 ] Gera Shegalov commented on YARN-2934: - Hi [~Naganarasimha], yes I was thinking the same, we should try to do it in the Java land. I'd prefer using RawLocalFileSystem#read(buf, off, len) in order not to mix in the java.io API. Since the NM webUI can read logs, we should have no problems accessing them from the NM JVM. Improve handling of container's stderr --- Key: YARN-2934 URL: https://issues.apache.org/jira/browse/YARN-2934 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Naganarasimha G R Priority: Critical Most YARN applications redirect stderr to some file. That's why, when a container launch fails with {{ExitCodeException}}, the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
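As a rough illustration of the approach being discussed (plain JDK I/O here; the actual change may well go through the Hadoop FileSystem API instead), reading only the tail of the container's stderr keeps the diagnostics small while still surfacing the real error:
{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public final class StderrTailSketch {
  /** Read at most maxBytes from the end of the given stderr file. */
  static String readTail(String stderrPath, int maxBytes) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(stderrPath, "r")) {
      long length = raf.length();
      int toRead = (int) Math.min(maxBytes, length);
      raf.seek(length - toRead);        // jump to the start of the tail
      byte[] buf = new byte[toRead];
      raf.readFully(buf);
      return new String(buf, StandardCharsets.UTF_8);
    }
  }

  private StderrTailSketch() {
  }
}
{code}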
[jira] [Updated] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Burlison updated YARN-3844: Attachment: (was: YARN-3844.005.patch) Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.006.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3453) Fair Scheduler: Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625000#comment-14625000 ] Karthik Kambatla commented on YARN-3453: +1 Fair Scheduler: Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing --- Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, YARN-3453.4.patch, YARN-3453.5.patch There are two places in the preemption code flow where DefaultResourceCalculator is used, even in DRF mode. This basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue since the scheduling logic is based on DRF's Calculator. Following are the two places: 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode is: during a preemption round, if preempting a few containers results in satisfying the needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Burlison updated YARN-3844: Attachment: YARN-3844.006.patch Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.006.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624769#comment-14624769 ] Varun Saxena commented on YARN-3878: [~jianhe] / [~kasha], added an addendum patch. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) 
{noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at
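The hang described above comes from serviceStop() waiting unconditionally for a queue that will never drain once the handler thread has been interrupted. A minimal sketch of a safer wait, using hypothetical names rather than the actual AsyncDispatcher fields, bounds the wait and gives up when the handler thread is no longer alive:
{code}
import java.util.concurrent.BlockingQueue;

public final class BoundedDrainSketch {
  /** Wait for the queue to drain, but never longer than timeoutMs, and stop if the handler died. */
  static void awaitDrained(BlockingQueue<?> eventQueue, Thread eventHandlingThread,
      long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!eventQueue.isEmpty()
        && eventHandlingThread.isAlive()
        && System.currentTimeMillis() < deadline) {
      Thread.sleep(100);   // re-check periodically instead of blocking forever
    }
  }

  private BoundedDrainSketch() {
  }
}
{code}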
[jira] [Commented] (YARN-1449) AM-NM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624818#comment-14624818 ] MENG DING commented on YARN-1449: - This patch will not build by itself, as it has dependency on YARN-3866 (in particular, {{IncreaseContainersResourceRequestProto}}, {{IncreaseContainersResourceResponseProto}}). It is very difficult to cleanly separate out each patch. Currently I generate a big patch and split it into multiple ones based on files. If YARN-3866 passes initial review, maybe I can combine YARN-3866 and YARN-1449 into one patch and submit that for pre-commit build? AM-NM protocol changes to support container resizing Key: YARN-1449 URL: https://issues.apache.org/jira/browse/YARN-1449 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan (No longer used) Assignee: MENG DING Attachments: YARN-1449.1.patch, YARN-1449.2.patch, YARN-1449.3.patch, yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, yarn-1449.5.patch AM-NM protocol changes to support container resizing 1) IncreaseContainersResourceRequest and IncreaseContainersResourceResponse PB protocol and implementation 2) increaseContainersResources method in ContainerManagementProtocol 3) Update ContainerStatus protocol to include Resource 4) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3878: --- Attachment: (was: YARN-3878-addendum.patch) AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} 
AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x000700b79430 (a
[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM
[ https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624909#comment-14624909 ] Varun Vasudev commented on YARN-3644: - [~raju.bairishetti] the latest patch also conflicts with a recent commit in NodeStatusUpdaterImpl. Can you please check it out? Thanks! Node manager shuts down if unable to connect with RM Key: YARN-3644 URL: https://issues.apache.org/jira/browse/YARN-3644 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Srikanth Sundarrajan Assignee: Raju Bairishetti Attachments: YARN-3644.001.patch, YARN-3644.001.patch, YARN-3644.002.patch, YARN-3644.003.patch, YARN-3644.patch When the NM is unable to connect to the RM, the NM shuts itself down. {code} } catch (ConnectException e) { //catch and throw the exception if tried MAX wait time to connect RM dispatcher.getEventHandler().handle( new NodeManagerEvent(NodeManagerEventType.SHUTDOWN)); throw new YarnRuntimeException(e); {code} In large clusters, if the RM is down for maintenance for a longer period, all the NMs shut themselves down, requiring additional work to bring up the NMs. Setting yarn.resourcemanager.connect.wait-ms to -1 has other side effects, where non-connection failures are retried infinitely by all YarnClients (via RMProxy). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
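A sketch of the behaviour being asked for, with a hypothetical helper rather than NodeStatusUpdaterImpl itself: keep retrying the RM connection with a fixed backoff and only report failure (leaving the shutdown decision to the caller) once an optional cap on the total wait is exceeded.
{code}
import java.net.ConnectException;

public final class RmReconnectSketch {
  interface RmCall {
    void run() throws ConnectException;
  }

  /** maxWaitMs < 0 means retry forever. Returns true once the call succeeds. */
  static boolean callWithRetry(RmCall call, long retryIntervalMs, long maxWaitMs)
      throws InterruptedException {
    long waited = 0;
    while (true) {
      try {
        call.run();
        return true;
      } catch (ConnectException e) {
        if (maxWaitMs >= 0 && waited >= maxWaitMs) {
          return false;            // caller decides whether to shut the NM down
        }
        Thread.sleep(retryIntervalMs);
        waited += retryIntervalMs;
      }
    }
  }

  private RmReconnectSketch() {
  }
}
{code}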
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624805#comment-14624805 ] Hudson commented on YARN-3069: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2201/]) YARN-3069. Document missing properties in yarn-default.xml. Contributed by Ray Chiang. (aajisaka: rev d6675606dc5f141c9af4f76a37128f8de4cfedad) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/site/markdown/DeprecatedProperties.md Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: supportability Fix For: 2.8.0 Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step 
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
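For reference, the kind of check involved can be sketched with plain JDK XML parsing: collect every property name declared in yarn-default.xml so it can be diffed against the constants in YarnConfiguration. This is a standalone illustration, not the actual TestYarnConfigurationFields code.
{code}
import java.io.InputStream;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public final class DeclaredPropertiesSketch {
  /** Names of all <name> elements declared in yarn-default.xml (assumed to be on the classpath). */
  static Set<String> declaredProperties() throws Exception {
    Set<String> names = new HashSet<String>();
    try (InputStream in = DeclaredPropertiesSketch.class.getClassLoader()
        .getResourceAsStream("yarn-default.xml")) {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().parse(in);
      NodeList nodes = doc.getElementsByTagName("name");
      for (int i = 0; i < nodes.getLength(); i++) {
        names.add(nodes.item(i).getTextContent().trim());
      }
    }
    return names;
  }

  private DeclaredPropertiesSketch() {
  }
}
{code}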
[jira] [Commented] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624808#comment-14624808 ] Hudson commented on YARN-3381: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2201/]) YARN-3381. Fix typo InvalidStateTransitonException. Contributed by Brahma Reddy Battula. (aajisaka: rev 19295b36d90e26616accee73b1f7743aab5df692) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitionException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachine.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitonException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachineFactory.java Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, 
YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration
[ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624807#comment-14624807 ] Hudson commented on YARN-3894: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2201/]) YARN-3894. RM startup should fail for wrong CS xml NodeLabel capacity configuration. (Bibin A Chundatt via wangda) (wangda: rev 5ed1fead6b5ec24bb0ce1a3db033c2ee1ede4bb4) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java RM startup should fail for wrong CS xml NodeLabel capacity configuration - Key: YARN-3894 URL: https://issues.apache.org/jira/browse/YARN-3894 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Fix For: 2.8.0 Attachments: 0001-YARN-3894.patch, 0002-YARN-3894.patch, capacity-scheduler.xml Currently in Capacity Scheduler, when the capacity configuration is wrong the RM will shut down, but not in case of a NodeLabels capacity mismatch. In {{CapacityScheduler#initializeQueues}} {code} private void initializeQueues(CapacitySchedulerConfiguration conf) throws IOException { root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); labelManager.reinitializeQueueLabels(getQueueToLabels()); root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); LOG.info("Initialized root queue " + root); initializeQueueMappings(); setQueueAcls(authorizer, queues); } {code} {{labelManager}} is initialized from the queues, and the calculation for label-level capacity mismatch happens in {{parseQueue}}. So during the initial {{parseQueue}} the labels will be empty. *Steps to reproduce* # Configure RM with capacity scheduler # Add one or two node labels from rmadmin # Configure the capacity xml with the nodelabel, but with a wrong capacity configuration for the already added label # Restart both RMs # Check that on service init of the capacity scheduler the node label list is populated *Expected* RM should not start *Current exception on reinitialize check* {code} 2015-07-07 19:18:25,655 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0 2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh queues. 
java.io.IOException: Failed to re-init queues at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for children of queue root for label=node2 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379) ... 8 more 2015-07-07 19:18:25,656 WARN
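The validation being requested can be pictured with a small sketch (hypothetical types, not ParentQueue): when parsing a parent queue, the children's configured capacities are checked per accessible node label, not just for the default partition, and the RM fails fast on a mismatch.
{code}
import java.util.List;
import java.util.Map;

public final class LabelCapacityCheckSketch {
  private static final float EPSILON = 0.0001f;

  /** childCapacitiesByLabel: label -> configured capacities (as fractions) of each child queue. */
  static void validate(String parentQueue, Map<String, List<Float>> childCapacitiesByLabel) {
    for (Map.Entry<String, List<Float>> e : childCapacitiesByLabel.entrySet()) {
      float sum = 0f;
      for (float capacity : e.getValue()) {
        sum += capacity;
      }
      if (Math.abs(sum - 1.0f) > EPSILON) {
        // mirrors the message seen in the stack trace above
        throw new IllegalArgumentException("Illegal capacity of " + sum
            + " for children of queue " + parentQueue + " for label=" + e.getKey());
      }
    }
  }

  private LabelCapacityCheckSketch() {
  }
}
{code}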
[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3878: --- Attachment: YARN-3878-addendum.patch AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* 
{noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624834#comment-14624834 ] Varun Saxena commented on YARN-3878: Should I reopen the issue ? AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack 
of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at
[jira] [Updated] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1680: -- Assignee: Tan, Wangda (was: Chen He) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Sharma K S Assignee: Tan, Wangda Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A job is running and its reducer tasks occupied 29GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks got killed), and the MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are running in the cluster now. MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, the headRoom is considering the blacklisted nodes' memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes but returns an availableResources value that still considers the cluster free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
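The headroom adjustment being proposed can be sketched as follows (a hypothetical helper, not the scheduler code): before reporting availableResources to the AM, subtract the free capacity sitting on nodes that the application has blacklisted, so the MRAppMaster's reducer-preemption math is not fooled by memory it can never be allocated.
{code}
import java.util.Collection;

public final class BlacklistAwareHeadroomSketch {
  static final class NodeFree {
    final long freeMemMb;
    final int freeVcores;
    NodeFree(long freeMemMb, int freeVcores) {
      this.freeMemMb = freeMemMb;
      this.freeVcores = freeVcores;
    }
  }

  /** Returns {memoryMb, vcores} headroom after excluding blacklisted nodes, clamped at zero. */
  static long[] adjustedHeadroom(long headroomMemMb, int headroomVcores,
      Collection<NodeFree> blacklistedNodes) {
    long mem = headroomMemMb;
    long vcores = headroomVcores;
    for (NodeFree node : blacklistedNodes) {
      mem -= node.freeMemMb;
      vcores -= node.freeVcores;
    }
    return new long[] { Math.max(0L, mem), Math.max(0L, vcores) };
  }

  private BlacklistAwareHeadroomSketch() {
  }
}
{code}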
[jira] [Commented] (YARN-1645) ContainerManager implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624921#comment-14624921 ] MENG DING commented on YARN-1645: - Thanks for the review [~jianhe] ! bq. This check should not be needed, because AM should be able to resize an existing container no matter RM restarted or not. I have some concerns regarding this that I hope to get some clarification on. According to the work-preserving RM restart documentation (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html): bq. RM recovers its running state by taking advantage of the container statuses sent from all NMs. NM will not kill the containers when it re-syncs with the restarted RM. It continues managing the containers and sends the container statuses across to RM when it re-registers. RM reconstructs the container instances and the associated applications’ scheduling status by absorbing these containers’ information Consider this scenario: * RM approves a container resource increase request and sends an increase token to AM. * Before AM actually increases the resource on NM, RM crashes and then restarts. Because of the work-preserving recovery, RM re-constructs the container resource based on the information sent by NM, and it is still the old resource allocation for the container before the increase. * Now AM does the increase action on NM. If NM doesn't reject this, it will start to enforce the container with the increased resource. Now the views of resource allocation between RM and NM are inconsistent. Thoughts? bq. A lot of code is duplicated between authorizeStartRequest and authorizeResourceIncreaseRequest - could you refactor the code to share the same code? Will do bq. Portion of the code belongs to YARN-1644 and the patch won't compile. This is the same situation as with YARN-1449. Everything is intertwined :-( May need to combine everything into a big patch to submit for the Jenkins build. ContainerManager implementation to support container resizing - Key: YARN-1645 URL: https://issues.apache.org/jira/browse/YARN-1645 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Wangda Tan Assignee: MENG DING Attachments: YARN-1645.1.patch, YARN-1645.2.patch, yarn-1645.1.patch Implementation of ContainerManager for container resize, including: 1) ContainerManager resize logic 2) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
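The NM-side check being debated can be sketched along these lines (hypothetical names, not the actual ContainerManagerImpl change): an increase is accepted only while its token is valid and its target resource is at least what the NM currently enforces, so a stale token from before an RM restart cannot silently push the NM out of sync with the RM's recovered view.
{code}
public final class IncreaseAuthorizationSketch {
  static void authorizeIncrease(long nowMs, long tokenExpiryMs,
      long targetMemMb, int targetVcores,
      long currentMemMb, int currentVcores) {
    if (nowMs > tokenExpiryMs) {
      throw new IllegalArgumentException("Container resource increase token has expired");
    }
    if (targetMemMb < currentMemMb || targetVcores < currentVcores) {
      // rejecting here keeps the NM's enforced resource consistent with the RM's view
      throw new IllegalArgumentException("Target resource <" + targetMemMb + " MB, "
          + targetVcores + " vcores> is smaller than the currently allocated resource");
    }
  }

  private IncreaseAuthorizationSketch() {
  }
}
{code}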
[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration
[ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624814#comment-14624814 ] Hudson commented on YARN-3894: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #253 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/253/]) YARN-3894. RM startup should fail for wrong CS xml NodeLabel capacity configuration. (Bibin A Chundatt via wangda) (wangda: rev 5ed1fead6b5ec24bb0ce1a3db033c2ee1ede4bb4) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java RM startup should fail for wrong CS xml NodeLabel capacity configuration - Key: YARN-3894 URL: https://issues.apache.org/jira/browse/YARN-3894 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Fix For: 2.8.0 Attachments: 0001-YARN-3894.patch, 0002-YARN-3894.patch, capacity-scheduler.xml Currently in Capacity Scheduler, when the capacity configuration is wrong the RM will shut down, but not in case of a NodeLabels capacity mismatch. In {{CapacityScheduler#initializeQueues}} {code} private void initializeQueues(CapacitySchedulerConfiguration conf) throws IOException { root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); labelManager.reinitializeQueueLabels(getQueueToLabels()); root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); LOG.info("Initialized root queue " + root); initializeQueueMappings(); setQueueAcls(authorizer, queues); } {code} {{labelManager}} is initialized from the queues, and the calculation for label-level capacity mismatch happens in {{parseQueue}}. So during the initial {{parseQueue}} the labels will be empty. *Steps to reproduce* # Configure RM with capacity scheduler # Add one or two node labels from rmadmin # Configure the capacity xml with the nodelabel, but with a wrong capacity configuration for the already added label # Restart both RMs # Check that on service init of the capacity scheduler the node label list is populated *Expected* RM should not start *Current exception on reinitialize check* {code} 2015-07-07 19:18:25,655 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0 2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh queues. 
java.io.IOException: Failed to re-init queues at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for children of queue root for label=node2 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379) ... 8 more 2015-07-07 19:18:25,656 WARN
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624812#comment-14624812 ] Hudson commented on YARN-3069: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #253 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/253/]) YARN-3069. Document missing properties in yarn-default.xml. Contributed by Ray Chiang. (aajisaka: rev d6675606dc5f141c9af4f76a37128f8de4cfedad) * hadoop-common-project/hadoop-common/src/site/markdown/DeprecatedProperties.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: supportability Fix For: 2.8.0 Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step 
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625238#comment-14625238 ] Arun Suresh commented on YARN-3535: --- Thanks for working on this [~peng.zhang]. We seem to be hitting this on our scale clusters as well, so it would be good to get this in soon. In our case the NM re-registration was caused by YARN-3842. The patch looks good to me. Any idea why the tests failed? ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED - Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Attachments: 0003-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During a rolling update of the NM, the AM's start of a container on that NM failed, and the job then hung. AM logs are attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625339#comment-14625339 ] Colin Patrick McCabe commented on YARN-3844: {code} snprintf(pid_buf, sizeof(pid_buf), "%ld", (int64_t)pid); {code} This will generate a warning on 32-bit platforms, where {{long}} is only 32 bits and does not match {{int64_t}}. Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.006.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625302#comment-14625302 ] Wangda Tan commented on YARN-3885: -- Increased to blocker since this is a bad bug. Thanks for working on this [~ajithshetty]. I think the patch looks generally good except one thing also mentioned by [~xinxianyin]: bq. for non-leaf queues, min(sum of children's PREEMPTABLEs, extra) - (this is what CHILDRENPREEMPTABLE does in the patch) Setting {{ret.preemptableExtra = childrensPreemptable;}} does not seem to be enough. Could you add a test to verify that when sum(queue.children.preemptable) > extra, we still get the correct preemption result? ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Assignee: Ajith S Priority: Blocker Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.patch In {{ProportionalCapacityPreemptionPolicy.cloneQueues}}, the piece of code that calculates {{untouchable}} doesn't consider all the children; it considers only the immediate children. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
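The rule quoted in the comment above can be written down concretely. The following is a minimal, illustrative sketch of min(sum of children's preemptable, extra), not the actual ProportionalCapacityPreemptionPolicy code; the class, method, and variable names are assumptions made only for this example.
{code:title=PreemptableSketch.java|borderStyle=solid}
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: a parent queue's preemptable "extra" is capped both by its
// own extra (usage above guaranteed capacity) and by the sum of its children's
// preemptable amounts.
public final class PreemptableSketch {

  // ownExtra: usage above the queue's guaranteed capacity.
  // childrenPreemptable: preemptable amounts already computed for the child queues.
  public static long preemptableExtra(long ownExtra, List<Long> childrenPreemptable) {
    long childrenSum = 0L;
    for (long child : childrenPreemptable) {
      childrenSum += child;
    }
    // min(sum of children's preemptable, extra): the interesting case is
    // childrenSum > ownExtra, where the parent must still be capped at its own extra.
    return Math.min(childrenSum, ownExtra);
  }

  public static void main(String[] args) {
    // children could give up 6 in total, but the parent is only 4 over its guarantee
    System.out.println(preemptableExtra(4L, Arrays.asList(2L, 4L))); // prints 4
  }
}
{code}
The test asked for in the comment would exercise exactly the case in {{main}}, where the children's sum exceeds the parent's own extra.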
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refreshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625304#comment-14625304 ] Bibin A Chundatt commented on YARN-3893: [~varun_saxena] and [~sunilg], we only need to call {{rm.transitionToStandby(false)}} on exception, since it handles the transition to standby in the RM context, stops the active services, and does not reinitialize the queues. Both RM in active state when Admin#transitionToActive failure from refreshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: yarn-site.xml Cases that can cause this: # Capacity scheduler xml is wrongly configured during switch # Refresh ACL failure due to configuration # Refresh User group failure due to configuration Both RMs will then continuously try to become active: {code} dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm1 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm2 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active {code} # Both Web UIs are active # Status shown as active for both RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
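To make the suggested handling concrete, here is a minimal sketch of the idea in the comment above, not the actual AdminService/ResourceManager code. Only the call to {{rm.transitionToStandby(false)}} on failure is taken from the comment; the interface and the {{refreshAll()}} placeholder are assumptions for illustration.
{code:title=TransitionSketch.java|borderStyle=solid}
// Illustrative sketch: if refreshing configuration fails after the RM has started its
// active services, roll back to standby instead of leaving both RMs in active state.
public final class TransitionSketch {

  // Hypothetical stand-in for the real ResourceManager; only the two calls used below.
  interface ResourceManagerLike {
    void transitionToActive() throws Exception;
    void transitionToStandby(boolean reinitialize) throws Exception;
  }

  private final ResourceManagerLike rm;

  TransitionSketch(ResourceManagerLike rm) {
    this.rm = rm;
  }

  void transitionToActive() throws Exception {
    rm.transitionToActive();      // start active services
    try {
      refreshAll();               // refresh queues, ACLs, user-group mappings, ...
    } catch (Exception e) {
      // Roll back so only one RM stays active: stop the active services via the RM
      // context and do not reinitialize the queues.
      rm.transitionToStandby(false);
      throw e;
    }
  }

  private void refreshAll() throws Exception {
    // placeholder for the configuration refreshes that can fail (see the cases above)
  }
}
{code}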
[jira] [Created] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it to be used only for large containers
Anubhav Dhoot created YARN-3920: --- Summary: FairScheduler Reserving a node for a container should be configurable to allow it to be used only for large containers Key: YARN-3920 URL: https://issues.apache.org/jira/browse/YARN-3920 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Reserving a node for a container was designed to prevent large containers from being starved by small requests that keep landing on a node. Today we let this be used even for small container requests. This has a huge impact on scheduling, since we block other scheduling requests until that reservation is fulfilled. We should make this configurable so its impact can be minimized by limiting it to large container requests, as originally intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
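As an illustration of "limiting it to large container requests", here is a minimal sketch of such a threshold check. It is not FairScheduler code; the class name, the threshold semantics, and the idea of expressing the threshold in MB are all assumptions made for this example.
{code:title=ReservationThresholdSketch.java|borderStyle=solid}
// Illustrative sketch: only fall back to reserving a node when the pending request is
// "large" relative to a configurable threshold; small requests simply wait for another
// node with enough free capacity instead of blocking one.
public final class ReservationThresholdSketch {

  private final long reservationThresholdMb;  // hypothetical configurable threshold

  public ReservationThresholdSketch(long reservationThresholdMb) {
    this.reservationThresholdMb = reservationThresholdMb;
  }

  /** Reserve only if the request does not fit now and is large enough to risk starvation. */
  public boolean shouldReserve(long requestedMb, long nodeFreeMb) {
    boolean doesNotFitNow = requestedMb > nodeFreeMb;
    boolean isLargeRequest = requestedMb >= reservationThresholdMb;
    return doesNotFitNow && isLargeRequest;
  }

  public static void main(String[] args) {
    ReservationThresholdSketch policy = new ReservationThresholdSketch(4096);
    System.out.println(policy.shouldReserve(512, 256));   // false: small request, no reservation
    System.out.println(policy.shouldReserve(8192, 2048)); // true: large request may reserve
  }
}
{code}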
[jira] [Commented] (YARN-3884) RMContainerImpl transition from RESERVED to KILL apphistory status not updated
[ https://issues.apache.org/jira/browse/YARN-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625307#comment-14625307 ] Bibin A Chundatt commented on YARN-3884: Please review patch attached RMContainerImpl transition from RESERVED to KILL apphistory status not updated -- Key: YARN-3884 URL: https://issues.apache.org/jira/browse/YARN-3884 Project: Hadoop YARN Issue Type: Bug Environment: Suse11 Sp3 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Attachments: 0001-YARN-3884.patch, Apphistory Container Status.jpg, Elapsed Time.jpg, Test Result-Container status.jpg Setup === 1 NM 3072 16 cores each Steps to reproduce === 1.Submit apps to Queue 1 with 512 mb 1 core 2.Submit apps to Queue 2 with 512 mb and 5 core lots of containers get reserved and unreserved in this case {code} 2015-07-02 20:45:31,169 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0002_01_13 Container Transitioned from NEW to RESERVED 2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Reserved container application=application_1435849994778_0002 resource=memory:512, vCores:5 queue=QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=memory:2560, vCores:21, usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, numContainers=5 usedCapacity=1.6410257 absoluteUsedCapacity=0.65625 used=memory:2560, vCores:21 cluster=memory:6144, vCores:32 2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting assigned queue: root.QueueA stats: QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=memory:3072, vCores:26, usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, numContainers=6 2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.96875 absoluteUsedCapacity=0.96875 used=memory:5632, vCores:31 cluster=memory:6144, vCores:32 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0001_01_14 Container Transitioned from NEW to ALLOCATED 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf OPERATION=AM Allocated ContainerTARGET=SchedulerApp RESULT=SUCCESS APPID=application_1435849994778_0001 CONTAINERID=container_e24_1435849994778_0001_01_14 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_e24_1435849994778_0001_01_14 of capacity memory:512, vCores:1 on host host-10-19-92-117:64318, which has 6 containers, memory:3072, vCores:14 used and memory:0, vCores:2 available after allocation 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: assignedContainer application attempt=appattempt_1435849994778_0001_01 container=Container: [ContainerId: container_e24_1435849994778_0001_01_14, NodeId: host-10-19-92-117:64318, NodeHttpAddress: host-10-19-92-117:65321, Resource: memory:512, vCores:1, Priority: 20, Token: null, ] queue=default: capacity=0.2, absoluteCapacity=0.2, usedResources=memory:2560, vCores:5, usedCapacity=2.0846906, absoluteUsedCapacity=0.4166, numApps=1, numContainers=5 clusterResource=memory:6144, vCores:32 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting assigned 
queue: root.default stats: default: capacity=0.2, absoluteCapacity=0.2, usedResources=memory:3072, vCores:6, usedCapacity=2.5016286, absoluteUsedCapacity=0.5, numApps=1, numContainers=6 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0 absoluteUsedCapacity=1.0 used=memory:6144, vCores:32 cluster=memory:6144, vCores:32 2015-07-02 20:45:32,143 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0001_01_14 Container Transitioned from ALLOCATED to ACQUIRED 2015-07-02 20:45:32,174 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to fulfill reservation for application application_1435849994778_0002 on node: host-10-19-92-143:64318 2015-07-02 20:45:32,174 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Reserved container
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625369#comment-14625369 ] Zhijie Shen commented on YARN-3908: --- [~vrushalic] and [~sjlee0], thanks for helping fix the problems. I have two questions: 1. In fact, I'm wondering if we should put info and events into a separate column family, like what we did for configs/metrics? 2. We don't want to store the metric type, do we? Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3885: - Priority: Blocker (was: Critical) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Assignee: Ajith S Priority: Blocker Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.patch In {{ProportionalCapacityPreemptionPolicy.cloneQueues}}, the piece of code that calculates {{untouchable}} doesn't consider all the children; it considers only the immediate children. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-3866: Attachment: YARN-3866-YARN-1197.4.patch YARN-3866.3.patch doesn't build by itself because the {{IncreaseContainersResourceRequest}} and {{IncreaseContainersResourceResponse}} in {{TestPBImplRecords.java}} are defined in YARN-1449. I have removed them from {{TestPBImplRecords.java}} and generated new patch YARN-3866-YARN-1197.4.patch. I will add them back in YARN-1449. AM-RM protocol changes to support container resizing Key: YARN-3866 URL: https://issues.apache.org/jira/browse/YARN-3866 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3866-YARN-1197.4.patch, YARN-3866.1.patch, YARN-3866.2.patch, YARN-3866.3.patch YARN-1447 and YARN-1448 are outdated. This ticket deals with AM-RM Protocol changes to support container resize according to the latest design in YARN-1197. 1) Add increase/decrease requests in AllocateRequest 2) Get approved increase/decrease requests from RM in AllocateResponse 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1449) AM-NM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625331#comment-14625331 ] Wangda Tan commented on YARN-1449: -- [~mding], you can submit them together to trigger pre-commit build, I think YARN-3866 will go first. AM-NM protocol changes to support container resizing Key: YARN-1449 URL: https://issues.apache.org/jira/browse/YARN-1449 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan (No longer used) Assignee: MENG DING Attachments: YARN-1449.1.patch, YARN-1449.2.patch, YARN-1449.3.patch, yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, yarn-1449.5.patch AM-NM protocol changes to support container resizing 1) IncreaseContainersResourceRequest and IncreaseContainersResourceResponse PB protocol and implementation 2) increaseContainersResources method in ContainerManagementProtocol 3) Update ContainerStatus protocol to include Resource 4) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625371#comment-14625371 ] Alan Burlison commented on YARN-3844: - Oops yes, good catch - thanks. I'll change it to use PRId64 like the others, and I'll check to see if there are other instances elsewhere. Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.006.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3916) DrainDispatcher#await should wait till event has been completely handled
[ https://issues.apache.org/jira/browse/YARN-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624773#comment-14624773 ] Varun Saxena commented on YARN-3916: [~jianhe], I have added an addendum patch on YARN-3878. It adds back the previous drained flag, resets it on InterruptedException, and keeps the bits related to YARN-3878 that were required. DrainDispatcher#await should wait till event has been completely handled Key: YARN-3916 URL: https://issues.apache.org/jira/browse/YARN-3916 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3916.01.patch, YARN-3916.02.patch DrainDispatcher#await should wait till the event has been completely handled. Currently it only checks whether the event queue has become empty, and in many tests we directly check for a state change after calling await. Sometimes the state has not changed by the time we check it because the event has not been completely handled. *This is causing test failures* such as YARN-3909 and YARN-3910 and may cause other test failures as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
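To make the requirement concrete, here is a minimal sketch of an await() that returns only after every enqueued event has been completely handled, rather than merely when the queue is empty. It is not the real DrainDispatcher or the YARN-3916 patch; the counter-based approach and all names below are assumptions for illustration.
{code:title=DrainSketch.java|borderStyle=solid}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: count each event from enqueue until the handler has finished
// with it, so await() cannot return while the last event is still being processed.
public final class DrainSketch {

  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  private final AtomicInteger pending = new AtomicInteger();

  public void handle(Runnable event) {
    pending.incrementAndGet();          // counted from enqueue ...
    queue.add(event);
  }

  /** Dispatcher thread body. */
  public void dispatchLoop() throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      Runnable event = queue.take();
      try {
        event.run();
      } finally {
        pending.decrementAndGet();      // ... until completely handled
      }
    }
  }

  /** Returns only after every enqueued event has been completely handled. */
  public void await() throws InterruptedException {
    while (pending.get() > 0) {
      Thread.sleep(10);
    }
  }
}
{code}
A check based on queue emptiness alone can observe stale state in the window between dequeuing an event and finishing its handling; counting until the handler completes closes that window.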
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624917#comment-14624917 ] Chen He commented on YARN-1680: --- Sorry for the lateness. [~wangda], I have assigned this issue to you. availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Sharma K S Assignee: Tan, Wangda Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each, so the total cluster capacity is 32GB. Cluster slow start is set to 1. A running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks got killed), so the MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are now running in the cluster. The MRAppMaster does not preempt the reducers because, in the reducer preemption calculation, the headroom still counts the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResources it returns still reflects the cluster's free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
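The headroom problem described above can be illustrated with a small sketch: when computing the resources reported back to an AM, skip nodes that the AM has blacklisted. This is not the CapacityScheduler code; the map-based representation and all names are assumptions for this example.
{code:title=HeadroomSketch.java|borderStyle=solid}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: headroom reported to an AM should not count free memory on
// nodes the AM has blacklisted, because it can never get containers there.
public final class HeadroomSketch {

  /** freeMbByNode: nodeId -> free memory (MB); blacklisted: nodes the AM will not use. */
  public static long headroomMb(Map<String, Long> freeMbByNode, Set<String> blacklisted) {
    long headroom = 0L;
    for (Map.Entry<String, Long> entry : freeMbByNode.entrySet()) {
      if (!blacklisted.contains(entry.getKey())) {
        headroom += entry.getValue();   // only count nodes the AM can actually use
      }
    }
    return headroom;
  }

  public static void main(String[] args) {
    Map<String, Long> free = new HashMap<>();
    free.put("NM-1", 1024L);
    free.put("NM-4", 3072L);            // unstable node, blacklisted by the AM
    Set<String> blacklisted = new HashSet<>();
    blacklisted.add("NM-4");
    System.out.println(headroomMb(free, blacklisted)); // 1024, not 4096
  }
}
{code}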
[jira] [Commented] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625521#comment-14625521 ] Wangda Tan commented on YARN-3866: -- Rekicked Jenkins to see if the problem still exists. AM-RM protocol changes to support container resizing Key: YARN-3866 URL: https://issues.apache.org/jira/browse/YARN-3866 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3866-YARN-1197.4.patch, YARN-3866.1.patch, YARN-3866.2.patch, YARN-3866.3.patch YARN-1447 and YARN-1448 are outdated. This ticket deals with AM-RM Protocol changes to support container resize according to the latest design in YARN-1197. 1) Add increase/decrease requests in AllocateRequest 2) Get approved increase/decrease requests from RM in AllocateResponse 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625387#comment-14625387 ] Sangjin Lee commented on YARN-3908: --- bq. 2. We don't want to store the metric type, do we? Maybe I was mistaken when I read your comment that said "I also realized that the metric type is not persisted too." I took it to mean that you were suggesting that we persist it. I also had an offline chat with [~vrushalic], and she clarified that we probably do not need to persist the metric type and that we can distinguish between a single-value metric and a time series as you described. We still need to make some changes to ensure that the HBase writer sets the right min version when it writes a single-value metric (currently it's not being done). Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
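The distinction under discussion, a single-value metric versus a time series, can be sketched as below. This is purely illustrative and not the HBase writer code; the map-based representation of a metric's values is an assumption for this example, and, as a later comment notes, inferring the kind from the number of values is a heuristic rather than a guarantee.
{code:title=MetricKindSketch.java|borderStyle=solid}
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: treat a metric with exactly one (timestamp, value) pair as a
// single value, and anything with more points as a time series.
public final class MetricKindSketch {

  public static boolean isSingleValue(Map<Long, Number> timestampToValue) {
    return timestampToValue.size() == 1;
  }

  public static void main(String[] args) {
    Map<Long, Number> series = new TreeMap<>();
    series.put(1L, 10);
    series.put(2L, 12);
    System.out.println(isSingleValue(series));  // false -> treated as a time series
  }
}
{code}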
[jira] [Created] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
Zack Marsh created YARN-3921: Summary: Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster Key: YARN-3921 URL: https://issues.apache.org/jira/browse/YARN-3921 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Environment: sles11sp3 Reporter: Zack Marsh Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3 Job Counters Launched map tasks=16 Launched reduce tasks=1 Data-local map tasks=16 Total time spent by all maps in occupied slots (ms)=160498 Total time spent by all reduces in occupied slots (ms)=27302 Total time spent by all map tasks (ms)=80249 Total time spent by all reduce tasks (ms)=13651 Total vcore-seconds taken by all map tasks=80249 Total vcore-seconds taken by all reduce tasks=13651 Total megabyte-seconds taken by all map tasks=246524928 Total megabyte-seconds taken by all reduce tasks=41935872 
Map-Reduce Framework Map input records=16 Map output records=32 Map output bytes=288 Map output materialized bytes=448 Input split bytes=2310 Combine input records=0 Combine output records=0 Reduce input groups=2 Reduce shuffle bytes=448 Reduce input records=32 Reduce output records=0 Spilled Records=64 Shuffled Maps
[jira] [Comment Edited] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625438#comment-14625438 ] Allen Wittenauer edited comment on YARN-3921 at 7/13/15 9:57 PM: - AFAIK, YARN won't change the permissions on the work dirs when you switch modes. The assumption is the ops folks/tools will handle this as part of the transition. amabari-qa's dir changing seems to be more related to something else (are these machines being managed via ambari and, like a naughty child, ambari is putting these where they don't belong?) given that you didn't say that a job belonging to the user ambari-qa job was run... was (Author: aw): AFAIK, YARN won't change the permissions on the work dirs when you switch modes. The assumption is the ops folks/tools will handle this as part of the transition. amabari-qa's dir changing seems to be more related to something else (are these machines being managed via ambari and, like a naughty child, ambari is putting these were they don't belong?) given that you didn't say that a job belonging to the user ambari-qa job was run... Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster -- Key: YARN-3921 URL: https://issues.apache.org/jira/browse/YARN-3921 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Environment: sles11sp3 Reporter: Zack Marsh Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 
31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3
[jira] [Commented] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625438#comment-14625438 ] Allen Wittenauer commented on YARN-3921: AFAIK, YARN won't change the permissions on the work dirs when you switch modes. The assumption is the ops folks/tools will handle this as part of the transition. amabari-qa's dir changing seems to be more related to something else (are these machines being managed via ambari and, like a naughty child, ambari is putting these were they don't belong?) given that you didn't say that a job belonging to the user ambari-qa job was run... Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster -- Key: YARN-3921 URL: https://issues.apache.org/jira/browse/YARN-3921 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Environment: sles11sp3 Reporter: Zack Marsh Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of 
read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3 Job Counters Launched map tasks=16 Launched reduce tasks=1 Data-local map tasks=16 Total time spent by all maps in occupied slots (ms)=160498 Total time spent by all reduces in occupied slots (ms)=27302 Total time spent by all map tasks (ms)=80249 Total time spent by all reduce tasks (ms)=13651 Total vcore-seconds taken by all map
[jira] [Commented] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625448#comment-14625448 ] Zack Marsh commented on YARN-3921: -- Yes this cluster is being manager by Ambari, and yes there were jobs belonging to the user ambari-qa ran before and after enabling Kerberos. Given what you said, it sounds like the fix for this issue is just changing the ownersip on these usercache directories and the directories/files within to the appropriate user. Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster -- Key: YARN-3921 URL: https://issues.apache.org/jira/browse/YARN-3921 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Environment: sles11sp3 Reporter: Zack Marsh Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3 Job Counters Launched map 
tasks=16 Launched reduce tasks=1 Data-local map tasks=16 Total time spent by all maps in occupied slots (ms)=160498 Total time spent by all reduces in occupied slots (ms)=27302 Total time spent by all map tasks (ms)=80249 Total time spent by all reduce tasks (ms)=13651 Total vcore-seconds taken by all map tasks=80249 Total vcore-seconds taken by all reduce tasks=13651 Total megabyte-seconds taken by
[jira] [Resolved] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zack Marsh resolved YARN-3921. -- Resolution: Not A Problem Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster -- Key: YARN-3921 URL: https://issues.apache.org/jira/browse/YARN-3921 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Environment: sles11sp3 Reporter: Zack Marsh Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3 Job Counters Launched map tasks=16 Launched reduce tasks=1 Data-local map tasks=16 Total time spent by all maps in occupied slots (ms)=160498 Total time spent by all reduces in occupied slots (ms)=27302 Total time spent by all map tasks (ms)=80249 Total time spent by all reduce tasks (ms)=13651 Total vcore-seconds taken by all map tasks=80249 Total vcore-seconds 
taken by all reduce tasks=13651 Total megabyte-seconds taken by all map tasks=246524928 Total megabyte-seconds taken by all reduce tasks=41935872 Map-Reduce Framework Map input records=16 Map output records=32 Map output bytes=288 Map output materialized bytes=448 Input
[jira] [Commented] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625480#comment-14625480 ] Zack Marsh commented on YARN-3921: -- Okay, thanks for the responses. I've created AMBARI-12402 for this issue (I don't think I have the permissions to move this issue to the Ambari project). Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster -- Key: YARN-3921 URL: https://issues.apache.org/jira/browse/YARN-3921 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Environment: sles11sp3 Reporter: Zack Marsh Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3 Job Counters Launched map tasks=16 Launched reduce tasks=1 Data-local map tasks=16 Total time spent by all maps in occupied slots (ms)=160498 Total time spent by all reduces in occupied slots 
(ms)=27302 Total time spent by all map tasks (ms)=80249 Total time spent by all reduce tasks (ms)=13651 Total vcore-seconds taken by all map tasks=80249 Total vcore-seconds taken by all reduce tasks=13651 Total megabyte-seconds taken by all map tasks=246524928 Total megabyte-seconds taken by all reduce tasks=41935872 Map-Reduce Framework Map input
[jira] [Commented] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625513#comment-14625513 ] MENG DING commented on YARN-3866: - * FindBugs doesn't seem to be working in the YARN-1197 branch (my own tests work fine, though) * The checkstyle issues are false alarms; the flagged lines are needed for the Javadoc to work {code:title=ContainerResourceChangeRequest.java|borderStyle=solid} +import org.apache.hadoop.classification.InterfaceAudience.Public; +import org.apache.hadoop.classification.InterfaceStability.Unstable; +import org.apache.hadoop.yarn.api.ApplicationMasterProtocol; --- (needed for the Javadoc-referenced method) +import org.apache.hadoop.yarn.util.Records; + +/** + * {@code ContainerResourceChangeRequest} represents the request made by an + * application to the {@code ResourceManager} to change resource allocation of + * a running {@code Container}. + * <p> + * It includes: + * <ul> + * <li>{@link ContainerId} for the container.</li> + * <li> + * {@link Resource} capability of the container after the resource change + * is completed. + * </li> + * </ul> + * + * @see ApplicationMasterProtocol#allocate(org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest) (can't break this line, otherwise the Javadoc won't work.) + */ {code} AM-RM protocol changes to support container resizing Key: YARN-3866 URL: https://issues.apache.org/jira/browse/YARN-3866 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3866-YARN-1197.4.patch, YARN-3866.1.patch, YARN-3866.2.patch, YARN-3866.3.patch YARN-1447 and YARN-1448 are outdated. This ticket deals with AM-RM Protocol changes to support container resize according to the latest design in YARN-1197. 1) Add increase/decrease requests in AllocateRequest 2) Get approved increase/decrease requests from RM in AllocateResponse 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625396#comment-14625396 ] Jian He commented on YARN-3878: --- re-opened this, also reverted the previous patch. [~varun_saxena], could you upload a clean delta patch for this ? thanks ! AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000]
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625397#comment-14625397 ] Zhijie Shen commented on YARN-3908: --- Yeah, but the method based on the number of metric values is not guaranteed; are we okay with that? Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zack Marsh updated YARN-3921: - Description: Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3 Job Counters Launched map tasks=16 Launched reduce tasks=1 Data-local map tasks=16 Total time spent by all maps in occupied slots (ms)=160498 Total time spent by all reduces in occupied slots (ms)=27302 Total time spent by all map tasks (ms)=80249 Total time spent by all reduce tasks (ms)=13651 Total vcore-seconds taken by all map tasks=80249 Total vcore-seconds taken by all reduce tasks=13651 Total megabyte-seconds taken by all map tasks=246524928 Total megabyte-seconds taken by all reduce tasks=41935872 Map-Reduce Framework Map input records=16 Map output records=32 Map output bytes=288 Map output materialized bytes=448 Input split bytes=2310 Combine input records=0 
Combine output records=0 Reduce input groups=2 Reduce shuffle bytes=448 Reduce input records=32 Reduce output records=0 Spilled Records=64 Shuffled Maps =16 Failed Shuffles=0 Merged Map outputs=16 GC time elapsed (ms)=1501 CPU time spent (ms)=13670 Physical memory (bytes) snapshot=13480296448
[jira] [Commented] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625455#comment-14625455 ] Allen Wittenauer commented on YARN-3921: Yeah. In general when switching from non-K to K, people move to LinuxContainerExecutor first. When they do that, a general going over of all permissions is done first since things like temp dirs tend to require rwx+sticky. LCE is what is almost certainly forcing the permissions failures in your jobs. Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster -- Key: YARN-3921 URL: https://issues.apache.org/jira/browse/YARN-3921 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Environment: sles11sp3 Reporter: Zack Marsh Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3 Job Counters Launched map tasks=16 
Launched reduce tasks=1 Data-local map tasks=16 Total time spent by all maps in occupied slots (ms)=160498 Total time spent by all reduces in occupied slots (ms)=27302 Total time spent by all map tasks (ms)=80249 Total time spent by all reduce tasks (ms)=13651 Total vcore-seconds taken by all map tasks=80249 Total vcore-seconds taken by all reduce tasks=13651 Total megabyte-seconds taken by all
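A minimal sketch of the rwx+sticky check implied by the YARN-3921 comment above, assuming a Linux JDK that exposes the "unix:mode" file attribute; the class name and the default path are hypothetical, and the directory to check would be whatever yarn.nodemanager.local-dirs (or the relevant temp dir) points to:
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper: reports whether a NodeManager local/temp directory is
 * world-accessible (rwx for all) and has the sticky bit set, the kind of
 * layout LinuxContainerExecutor typically expects for shared directories.
 */
public class LocalDirPermissionCheck {
  public static void main(String[] args) throws IOException {
    Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp");
    // "unix:mode" is available on Linux JDKs; it returns the raw mode bits.
    int mode = (Integer) Files.getAttribute(dir, "unix:mode");
    boolean sticky = (mode & 01000) != 0;      // sticky bit
    boolean worldRwx = (mode & 0777) == 0777;  // rwx for owner, group, others
    System.out.printf("%s mode=%o sticky=%b worldRwx=%b%n",
        dir, mode & 07777, sticky, worldRwx);
  }
}
{code}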
[jira] [Commented] (YARN-1449) AM-NM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625459#comment-14625459 ] Jian He commented on YARN-1449: --- bq. because it may need to create this.containersToIncrease if it is null. If this.containersToIncrease is null, won't this call return upfront? {code} if (containersToIncrease == null) { return; } {code} bq. The AllocateRequestPBImpl.setAskList() uses the same logic. Actually, many places use this logic. No need to follow that pattern. The PB implementations are often copy-and-paste code that people pay less attention to. AM-NM protocol changes to support container resizing Key: YARN-1449 URL: https://issues.apache.org/jira/browse/YARN-1449 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan (No longer used) Assignee: MENG DING Attachments: YARN-1449.1.patch, YARN-1449.2.patch, YARN-1449.3.patch, yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, yarn-1449.5.patch AM-NM protocol changes to support container resizing 1) IncreaseContainersResourceRequest and IncreaseContainersResourceResponse PB protocol and implementation 2) increaseContainersResources method in ContainerManagementProtocol 3) Update ContainerStatus protocol to include Resource 4) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625486#comment-14625486 ] Hadoop QA commented on YARN-3866: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 48s | Findbugs (version ) appears to be broken on YARN-1197. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:green}+1{color} | javac | 7m 42s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 31s | The applied patch generated 2 new checkstyle issues (total was 22, now 15). | | {color:green}+1{color} | whitespace | 0m 8s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 16s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 9m 5s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 53s | Tests passed in hadoop-yarn-common. | | | | 54m 1s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-mapreduce-client-app | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745125/YARN-3866-YARN-1197.4.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | YARN-1197 / 47f4c54 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8523/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8523/artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/8523/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8523/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8523/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8523/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8523/console | This message was automatically generated. AM-RM protocol changes to support container resizing Key: YARN-3866 URL: https://issues.apache.org/jira/browse/YARN-3866 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3866-YARN-1197.4.patch, YARN-3866.1.patch, YARN-3866.2.patch, YARN-3866.3.patch YARN-1447 and YARN-1448 are outdated. 
This ticket deals with AM-RM Protocol changes to support container resize according to the latest design in YARN-1197. 1) Add increase/decrease requests in AllocateRequest 2) Get approved increase/decrease requests from RM in AllocateResponse 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3916) DrainDispatcher#await should wait till event has been completely handled
[ https://issues.apache.org/jira/browse/YARN-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625382#comment-14625382 ] Jian He commented on YARN-3916: --- Thanks [~varun_saxena]! I will re-open YARN-3878 and close this as a duplicate of it, and will look at YARN-3878. DrainDispatcher#await should wait till event has been completely handled Key: YARN-3916 URL: https://issues.apache.org/jira/browse/YARN-3916 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3916.01.patch, YARN-3916.02.patch DrainDispatcher#await should wait till the event has been completely handled. Currently it only checks whether the event queue has become empty. In many tests we directly check for a state change after calling await. Sometimes the states have not changed by the time we check them because the event has not been completely handled. *This is causing test failures* such as YARN-3909 and YARN-3910 and may cause other test failures as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
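A minimal sketch of the idea behind the fix discussed above, assuming a dispatcher that counts events still being handled; the class and field names are hypothetical, not the actual DrainDispatcher code:
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class DrainAwareDispatcherSketch {
  private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();
  // Counts events that have been enqueued but not yet completely handled.
  private final AtomicInteger pending = new AtomicInteger();

  public void handle(Runnable event) throws InterruptedException {
    pending.incrementAndGet();
    eventQueue.put(event);
  }

  // Dispatcher thread body: the counter is decremented only after the
  // handler has finished, not when the event is taken off the queue.
  void dispatchLoop() throws InterruptedException {
    while (true) {
      Runnable event = eventQueue.take();
      try {
        event.run();
      } finally {
        pending.decrementAndGet();
      }
    }
  }

  // Returns only when every enqueued event has been fully handled,
  // not merely when the queue itself has become empty.
  public void await() throws InterruptedException {
    while (pending.get() > 0) {
      Thread.sleep(10);
    }
  }
}
{code}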
[jira] [Reopened] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reopened YARN-3878: --- AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler 
prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x000700b79430 (a java.lang.Object) at
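A minimal sketch of the wait-for-drained loop described in the hang above, assuming a drained flag that is set by the handler thread; if that flag is never set (for example because the enqueueing thread was interrupted), an unbounded wait blocks exactly as in the jstack, so this sketch bounds the wait. Names are illustrative, not the actual AsyncDispatcher code:
{code}
public class DrainOnStopSketch {
  private volatile boolean drained = false;
  private final Object waitForDrained = new Object();

  // Called from serviceStop() when the dispatcher is configured to drain
  // events on stop. An unbounded "while (!drained) wait()" reproduces the
  // hang; the deadline keeps JVM shutdown from blocking forever.
  public void waitForDrain(long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    synchronized (waitForDrained) {
      while (!drained && System.currentTimeMillis() < deadline) {
        waitForDrained.wait(100);
      }
    }
  }

  // Called by the handler thread once the event queue is observed empty.
  public void markDrained() {
    synchronized (waitForDrained) {
      drained = true;
      waitForDrained.notifyAll();
    }
  }
}
{code}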
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625414#comment-14625414 ] Sangjin Lee commented on YARN-3908: --- When time series data expires after the TTL (except for the latest value), the metric will only contain a single value. For all practical purposes, the metric at that point would act like a single value. We thought that would be fine. Do you see a situation where (probably on the read path) we need to recognize some metric as a time series and do something different *even though* there is only one value in the column? Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
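For context on how a metric time series can collapse to a single value under a TTL, a minimal write-path sketch, assuming an HBase 1.x client; the table, column family, and qualifier layout here are hypothetical, not the actual HBaseTimelineWriterImpl schema:
{code}
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MetricWriteSketch {
  // Each metric reading is written as a cell version keyed by its own
  // timestamp, so a column-family TTL can age out old versions while the
  // latest reading survives as a single remaining value.
  static void writeMetric(Connection conn, byte[] rowKey, String metricId,
                          long timestamp, long value) throws java.io.IOException {
    try (Table table = conn.getTable(TableName.valueOf("timeline_entity"))) {
      Put put = new Put(rowKey);
      put.addColumn(Bytes.toBytes("m"), Bytes.toBytes(metricId), timestamp,
          Bytes.toBytes(value));
      table.put(put);
    }
  }
}
{code}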
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625436#comment-14625436 ] Hudson commented on YARN-3878: -- SUCCESS: Integrated in Hadoop-trunk-Commit #8157 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8157/]) Revert YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (jianhe: rev 2466460d4cd13ad5837c044476b26e63082c1d37) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. 
*Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625463#comment-14625463 ] Allen Wittenauer commented on YARN-3921: So, this is probably a bug in Ambari, really. Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster -- Key: YARN-3921 URL: https://issues.apache.org/jira/browse/YARN-3921 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Environment: sles11sp3 Reporter: Zack Marsh Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3 Job Counters Launched map tasks=16 Launched reduce tasks=1 Data-local map tasks=16 Total time spent by all maps in occupied slots (ms)=160498 Total time spent by all reduces in occupied slots (ms)=27302 Total time spent by all map tasks (ms)=80249 Total time spent by all reduce tasks (ms)=13651 
Total vcore-seconds taken by all map tasks=80249 Total vcore-seconds taken by all reduce tasks=13651 Total megabyte-seconds taken by all map tasks=246524928 Total megabyte-seconds taken by all reduce tasks=41935872 Map-Reduce Framework Map input records=16 Map output records=32 Map output bytes=288
[jira] [Commented] (YARN-1449) AM-NM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625472#comment-14625472 ] MENG DING commented on YARN-1449: - Oh, the diff file doesn't show the entire context. The {{containersToIncrease}} refers to the parameter being passed in, so it is only in the scope of the {{setContainersToIncrease}} function.
{code}
+  @Override
+  public void setContainersToIncrease(List<Token> containersToIncrease) {
+    if (containersToIncrease == null) {
+      return;
+    }
+    initContainersToIncrease();
+    this.containersToIncrease.clear();
+    this.containersToIncrease.addAll(containersToIncrease);
+  }
{code}
AM-NM protocol changes to support container resizing Key: YARN-1449 URL: https://issues.apache.org/jira/browse/YARN-1449 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan (No longer used) Assignee: MENG DING Attachments: YARN-1449.1.patch, YARN-1449.2.patch, YARN-1449.3.patch, yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, yarn-1449.5.patch AM-NM protocol changes to support container resizing 1) IncreaseContainersResourceRequest and IncreaseContainersResourceResponse PB protocol and implementation 2) increaseContainersResources method in ContainerManagementProtocol 3) Update ContainerStatus protocol to include Resource 4) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1449) AM-NM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625479#comment-14625479 ] Jian He commented on YARN-1449: --- Ah, I missed that too. I found one example; I think we can use logic similar to ApplicationSubmissionContextPBImpl. AM-NM protocol changes to support container resizing Key: YARN-1449 URL: https://issues.apache.org/jira/browse/YARN-1449 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan (No longer used) Assignee: MENG DING Attachments: YARN-1449.1.patch, YARN-1449.2.patch, YARN-1449.3.patch, yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, yarn-1449.5.patch AM-NM protocol changes to support container resizing 1) IncreaseContainersResourceRequest and IncreaseContainersResourceResponse PB protocol and implementation 2) increaseContainersResources method in ContainerManagementProtocol 3) Update ContainerStatus protocol to include Resource 4) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
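A sketch of the alternative being suggested, modeled loosely on the setter style of ApplicationSubmissionContextPBImpl rather than on the committed patch; it is a method-level fragment in the same form as the quoted diff, and {{maybeInitBuilder}} and the {{clearIncreaseContainers}} proto field name are assumptions:
{code}
@Override
public void setContainersToIncrease(List<Token> containersToIncrease) {
  maybeInitBuilder();
  if (containersToIncrease == null) {
    // Clear the proto field instead of returning early, so that a null
    // argument actually resets any previously set list.
    builder.clearIncreaseContainers();
    this.containersToIncrease = null;
    return;
  }
  this.containersToIncrease = new ArrayList<>(containersToIncrease);
}
{code}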