[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-09-21 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142404#comment-14142404
 ] 

Wangda Tan commented on YARN-2056:
--

Hi [~eepayne],
Thanks for updating. I took a look at your latest patch; a few comments:
bq. Even if a leaf queue could inherit its parent's disable preemption value, 
there will likely be cases where part of the parent queue's over-capacity 
resources are untouchable and part of them are preemptable.
I don't quite understand this. Is there any issue if we make the parent 
queue's disable-preemption setting only a default value that its children 
can override?

bq. I collected untouchableExtra instead of preemptableExtra at the TempQueue 
level. in computeFixpointAllocation,
Makes sense to me.

With the following changes:
bq. In TempQueue#offer, one of the calculations is current + pending - 
idealAssigned ... and TempQueue#offer would end up assigning more to that queue 
than it should.
bq. I looped through each queue, and if one has any untouchableExtra, then the 
queue's idealAssigned = guaranteed + untouchableExtra
A problem I can see, illustrated with a negative example:

{code}
root has A,B,C, total capacity = 90
A.guaranteed = 30, A.pending = 20, A.current = 40
B.guaranteed = 30, B.pending = 0, B.current = 50
C.guaranteed = 30, C.pending = 0, C.current = 0

initial:
A.idealAssigned = 40
B.idealAssigned = 0
unassigned = 50 (90 - 40)

1st round (A/B/C's normalized_guarantee = 1/3)
A.idealAssigned = 40 (unchanged because changes of TempQueue#offer 
implementation, A will be removed)
B.idealAssigned = 17 (50/3)
C.idealAssigned = 0 (satisfied already, C will be removed)
unassigned = 33

2nd round (A/B's normalized_guarantee = 0.5)
B.idealAssigned = 17 + 33 = 50
unassigned = 0
{code}

However, since A and B have the same guarantee, they should both end up with 
ideal_assigned = 45, and A should be able to preempt 5 from B. In short, the 
problem is that a queue with preemption disabled cannot preempt more resource 
than its current usage when some queues in the cluster don't consume their 
resources. Do you have any idea how to solve this problem?
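To make the arithmetic above concrete, here is a minimal, self-contained sketch
(my illustration only, not the actual CapacityScheduler or patch code; class and
field names are assumptions) of the offer/fixpoint loop described above. The
per-round split differs slightly from the hand-computed rounds, but it ends at
the same 40/50 state:
{code}
import java.util.ArrayList;
import java.util.List;

public class FixpointSketch {
  static class TempQueue {
    double guaranteed, pending, current, idealAssigned;
    TempQueue(double g, double p, double c) { guaranteed = g; pending = p; current = c; }
    // Mirrors the "current + pending - idealAssigned" acceptance rule of TempQueue#offer.
    double offer(double share) {
      double accepted = Math.max(0, Math.min(share, current + pending - idealAssigned));
      idealAssigned += accepted;
      return accepted;
    }
  }

  public static void main(String[] args) {
    TempQueue a = new TempQueue(30, 20, 40);   // preemption disabled
    TempQueue b = new TempQueue(30, 0, 50);
    TempQueue c = new TempQueue(30, 0, 0);

    // Per the change under discussion, the preemption-disabled queue is pinned to
    // guaranteed + untouchableExtra (= its current usage here) and is then removed
    // from the offer rounds.
    a.idealAssigned = a.current;                 // 40
    double unassigned = 90 - a.idealAssigned;    // 50

    List<TempQueue> queues = new ArrayList<TempQueue>();
    queues.add(b);
    queues.add(c);
    while (unassigned > 1e-6 && !queues.isEmpty()) {
      double totalGuaranteed = 0;
      for (TempQueue q : queues) {
        totalGuaranteed += q.guaranteed;
      }
      double assigned = 0;
      List<TempQueue> satisfied = new ArrayList<TempQueue>();
      for (TempQueue q : queues) {
        double share = unassigned * q.guaranteed / totalGuaranteed;
        double taken = q.offer(share);
        assigned += taken;
        if (taken < share) {
          satisfied.add(q);                      // queue wants no more; drop it
        }
      }
      unassigned -= assigned;
      queues.removeAll(satisfied);
    }
    // Prints A=40.0 B=50.0 C=0.0 -- the imbalance described above (fair would be 45/45).
    System.out.println("A=" + a.idealAssigned + " B=" + b.idealAssigned
        + " C=" + c.idealAssigned);
  }
}
{code}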

Thanks,
Wangda

 Disable preemption at Queue level
 -

 Key: YARN-2056
 URL: https://issues.apache.org/jira/browse/YARN-2056
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne
 Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, 
 YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, 
 YARN-2056.201409181916.txt, YARN-2056.201409210049.txt


 We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy

2014-09-21 Thread Jonathan Maron (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142488#comment-14142488
 ] 

Jonathan Maron commented on YARN-2554:
--

Let me try to address each of these concerns:

- the key-store needs to be present on all machines to distribute certificates: 
AMs may come up anywhere.

The client trust store appears to be an already defined keystore in the secure 
deployments I've seen so far.

- the key-store used by Hadoop daemons CANNOT be shared with AMs: AMs run 
user-code as the user
- the key-store cannot be shared across AMs of different users: Assuming I am 
running three different Slider apps as three different users, you don't want to 
have a single key-store instance accessible by all Slider AMs.

Your concern about sharing elements between daemon processes and users is 
correct in the context of kerberos - principals are used for authentication and 
authorization, and it wouldn't be prudent to have those shared.  When code is 
running within an AM, the identity that is established, and that governs its 
authorized interactions, comes from kerberos.  But SSL is a security offering 
targeted at the transport layer.  Typically with SSL, for client C and server 
S, SSL only establishes that:

C believes S is who it intended to contact
C believes it has a secure connection to S
S believes it has a secure connection to C

In addition, SSL typically ensures authentication of the server alone (the CN 
is generally the host name), so the trust between C and S is more along the 
lines of trusting that C is actually talking to host S.  The session key for 
the RM (C) and AM (S), given the host certificate of the AM, will be unique - 
it is created by the RM, encrypted using the AM cert (the AM public key), and 
sent to the AM to be decrypted by the AM's private key.  I guess I don't see 
the issue with hadoop applications, whether daemon or AM, using the available 
hadoop keystores.  The transport layer mechanism doesn't play a role in the 
application-level authorization since it has no role to play in establishing 
the user's subject etc.

Having said all that, I may not be aware of some uses of the keystores in a 
hadoop cluster (or perhaps my assumption about the availability of  a client 
trust store is wrong).  In addition, it'd be nice to get the perspective of 
some of the security folks about the role of SSL in the security layers 
provided in a secure cluster.
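
For what it's worth, here is a rough sketch of wiring an outbound HTTPS call to
the ssl-client.xml trust store via Hadoop's SSLFactory, as the issue description
suggests. This is illustrative only: it uses a plain HttpsURLConnection and a
placeholder URL rather than the proxy servlet's actual HttpClient plumbing.
{code}
import java.net.URL;
import javax.net.ssl.HttpsURLConnection;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.ssl.SSLFactory;

public class SecureProxyCallSketch {
  public static void main(String[] args) throws Exception {
    // SSLFactory in CLIENT mode reads ssl-client.xml (hadoop.ssl.client.conf)
    // and loads the client trust store configured there.
    SSLFactory clientSslFactory = new SSLFactory(SSLFactory.Mode.CLIENT, new Configuration());
    clientSslFactory.init();
    try {
      // Placeholder address standing in for the proxied AM web UI.
      HttpsURLConnection conn =
          (HttpsURLConnection) new URL("https://am-host:8042/").openConnection();
      conn.setSSLSocketFactory(clientSslFactory.createSSLSocketFactory());
      conn.setHostnameVerifier(clientSslFactory.getHostnameVerifier());
      System.out.println("HTTP " + conn.getResponseCode());
    } finally {
      clientSslFactory.destroy();
    }
  }
}
{code}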


 Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
 -

 Key: YARN-2554
 URL: https://issues.apache.org/jira/browse/YARN-2554
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.6.0
Reporter: Jonathan Maron
 Attachments: YARN-2554.1.patch, YARN-2554.2.patch, YARN-2554.3.patch, 
 YARN-2554.3.patch


 If the HTTP policy to enable HTTPS is specified, the RM and AM are 
 initialized with SSL listeners.  The RM has a web app proxy servlet that acts 
 as a proxy for incoming AM requests.  In order to forward the requests to the 
 AM the proxy servlet makes use of HttpClient.  However, the HttpClient 
 utilized is not initialized correctly with the necessary certs to allow for 
 successful one way SSL invocations to the other nodes in the cluster (it is 
 not configured to access/load the client truststore specified in 
 ssl-client.xml).   I imagine SSLFactory.createSSLSocketFactory() could be 
 utilized to create an instance that can be assigned to the HttpClient.
 The symptoms of this issue are:
 AM: Displays unknown_certificate exception
 RM:  Displays an exception such as javax.net.ssl.SSLHandshakeException: 
 sun.security.validator.ValidatorException: PKIX path building failed: 
 sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
 valid certification path to requested target



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2498) [YARN-796] Respect labels in preemption policy of capacity scheduler

2014-09-21 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142513#comment-14142513
 ] 

Sunil G commented on YARN-2498:
---

Hi [~wangda]

Great work, Wangda. Very good design to solve the issues. I have a few doubts, 
though :)

1. 
bq. 2G resource in node1 with label x, and queueA has label y, so queueA cannot 
access node1
Consider a scenario where *node1* has more than 50% (say 60%) of cluster 
resources, and queueA is given 50% in CS. In that case, is there any chance of 
under-utilization?

2. When two or more nodes are merged into a single resource tree, and these 
nodes share some common labels, I can see that a common total sum is finally 
stored per label in the consolidated tree.

Here I feel we may need to split up the resources of a label at each node level.

For example, labels *x* and *y* are needed for *queueA*, and labels *y* and *z* 
are needed for *queueB*. If label *y* is shared between 2 or more nodes, I am 
not sure whether it will cause some problem.

3. 
bq.(container.resource can be used by pendingResourceLeaf)

For preemption, we just calculate to match the totalResourceToPreempt from the 
over-utilized queues. But which node is a given container from, under which 
label, and which queue does that label belong to? Do we need to do this check 
for each container?


 [YARN-796] Respect labels in preemption policy of capacity scheduler
 

 Key: YARN-2498
 URL: https://issues.apache.org/jira/browse/YARN-2498
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2498.patch, YARN-2498.patch, YARN-2498.patch, 
 yarn-2498-implementation-notes.pdf


 There are 3 stages in ProportionalCapacityPreemptionPolicy:
 # Recursively calculate {{ideal_assigned}} for each queue. This depends on the 
 available resource, the resource used/pending in each queue, and the guaranteed 
 capacity of each queue.
 # Mark to-be-preempted containers: for each over-satisfied queue, it will 
 mark some containers to be preempted.
 # Notify the scheduler about to-be-preempted containers.
 We need to respect labels in the cluster for both #1 and #2:
 For #1, when there is some resource available in the cluster, we shouldn't 
 assign it to a queue (by increasing {{ideal_assigned}}) if the queue cannot 
 access such labels.
 For #2, when we make a decision about whether we need to preempt a container, 
 we need to make sure the resource of this container is *possibly* usable by a 
 queue which is under-satisfied and has pending resource.
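 For illustration only, a minimal sketch of the accessibility check implied by 
 #1 (the names here are assumptions, not the patch's API): a queue should only 
 receive {{ideal_assigned}} from resources whose labels it can access.
 {code}
 import java.util.Arrays;
 import java.util.HashSet;
 import java.util.Set;

 public class LabelCheckSketch {
   // May a queue take ideal_assigned from a resource carrying these node labels?
   static boolean queueCanAccess(Set<String> queueAccessibleLabels, Set<String> nodeLabels) {
     if (nodeLabels.isEmpty()) {
       return true;                   // unlabeled resource: any queue may use it
     }
     for (String label : nodeLabels) {
       if (queueAccessibleLabels.contains(label)) {
         return true;                 // queue can access at least one of the labels
       }
     }
     return false;
   }

   public static void main(String[] args) {
     Set<String> queueA = new HashSet<String>(Arrays.asList("y"));
     Set<String> node1 = new HashSet<String>(Arrays.asList("x"));
     // queueA (label y) cannot use resource from node1 (label x): prints false.
     System.out.println(queueCanAccess(queueA, node1));
   }
 }
 {code}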



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2452) TestRMApplicationHistoryWriter fails with FairScheduler

2014-09-21 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2452:
---
Summary: TestRMApplicationHistoryWriter fails with FairScheduler  (was: 
TestRMApplicationHistoryWriter is failed for FairScheduler)

 TestRMApplicationHistoryWriter fails with FairScheduler
 ---

 Key: YARN-2452
 URL: https://issues.apache.org/jira/browse/YARN-2452
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2452.000.patch, YARN-2452.001.patch, 
 YARN-2452.002.patch


 TestRMApplicationHistoryWriter fails with FairScheduler. The failure is 
 the following:
 T E S T S
 ---
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
 Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 69.311 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
 testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter)
   Time elapsed: 66.261 sec   FAILURE!
 java.lang.AssertionError: expected:1 but was:200
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2452) TestRMApplicationHistoryWriter is failed for FairScheduler

2014-09-21 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142667#comment-14142667
 ] 

Karthik Kambatla commented on YARN-2452:


Patch looks reasonable to me, can't think of an easier way to get the test to 
work with FS. +1. Committing this. 

 TestRMApplicationHistoryWriter is failed for FairScheduler
 --

 Key: YARN-2452
 URL: https://issues.apache.org/jira/browse/YARN-2452
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2452.000.patch, YARN-2452.001.patch, 
 YARN-2452.002.patch


 TestRMApplicationHistoryWriter fails with FairScheduler. The failure is 
 the following:
 T E S T S
 ---
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
 Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 69.311 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
 testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter)
   Time elapsed: 66.261 sec   FAILURE!
 java.lang.AssertionError: expected:1 but was:200
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2252) Intermittent failure for testcase TestFairScheduler.testContinuousScheduling

2014-09-21 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2252:
---
Attachment: yarn-2252-2.patch

Here is a patch that modifies Ratandeep's patch. 

 Intermittent failure for testcase TestFairScheduler.testContinuousScheduling
 

 Key: YARN-2252
 URL: https://issues.apache.org/jira/browse/YARN-2252
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: trunk-win
Reporter: Ratandeep Ratti
  Labels: hadoop2, scheduler, yarn
 Attachments: YARN-2252-1.patch, yarn-2252-2.patch


 This test-case is failing sporadically on my machine. I think I have a 
 plausible explanation  for this.
 It seems that when the Scheduler is being asked for resources, the resource 
 requests that are being constructed have no preference for the hosts (nodes).
 The two mock hosts constructed both have a memory of 8192 MB.
 The containers (resources) being requested each require a memory of 1024 MB, 
 hence a single node can execute both the resource requests for the 
 application.
 At the end of the test-case it is asserted that the containers 
 (resource requests) be executed on different nodes, but since we haven't 
 specified any preferences for nodes when requesting the resources, the 
 scheduler (at times) executes both the containers (requests) on the same node.
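 A possible way to express the node preference the description says is missing 
 (illustrative only; the host name and values are placeholders, and this is not 
 necessarily how the attached patch fixes the test):
 {code}
 import org.apache.hadoop.yarn.api.records.Priority;
 import org.apache.hadoop.yarn.api.records.Resource;
 import org.apache.hadoop.yarn.api.records.ResourceRequest;

 public class NodeLocalRequestSketch {
   public static void main(String[] args) {
     Priority priority = Priority.newInstance(1);
     Resource capability = Resource.newInstance(1024, 1);   // 1024 MB, 1 vcore
     // A node-local request naming a specific host; the current test only issues
     // ResourceRequest.ANY requests, so placement is left entirely to the scheduler.
     ResourceRequest nodeLocal =
         ResourceRequest.newInstance(priority, "127.0.0.1", capability, 1);
     // The protocol also expects matching rack-level and ANY-level requests; their
     // relaxLocality flags control whether the scheduler may fall back to other nodes.
     ResourceRequest any =
         ResourceRequest.newInstance(priority, ResourceRequest.ANY, capability, 1);
     System.out.println(nodeLocal + " / " + any);
   }
 }
 {code}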



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler

2014-09-21 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142684#comment-14142684
 ] 

Karthik Kambatla commented on YARN-2453:


As of now, CapacityScheduler is the only scheduler that implements 
PreemptableResourceScheduler. How about we set the scheduler explicitly in the 
setup method? The test should just reuse the configuration instead of creating 
another instance. 
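
A minimal sketch of what that could look like (illustrative only; the test-class
shape and field names are assumptions, not the actual patch):
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.junit.Before;

public class SetupSketch {
  private YarnConfiguration conf;

  @Before
  public void setup() {
    conf = new YarnConfiguration();
    // Force the CapacityScheduler so the SchedulingMonitor/preemption policy is
    // actually instantiated, regardless of the default scheduler in the build.
    conf.setClass(YarnConfiguration.RM_SCHEDULER,
        CapacityScheduler.class, ResourceScheduler.class);
    conf.setBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);
    // ...and every test method reuses this conf instead of creating a new one.
  }
}
{code}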

 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler
 

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2453.000.patch


 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler.
 The following is error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test should only work for the capacity scheduler, because the following 
 source code in ResourceManager.java proves it will only work for the capacity 
 scheduler:
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
     && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
         YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 This is because CapacityScheduler is an instance of PreemptableResourceScheduler 
 and FairScheduler is not an instance of PreemptableResourceScheduler.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2252) Intermittent failure for testcase TestFairScheduler.testContinuousScheduling

2014-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142718#comment-14142718
 ] 

Hadoop QA commented on YARN-2252:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670318/yarn-2252-2.patch
  against trunk revision c50fc92.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5066//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5066//console

This message is automatically generated.

 Intermittent failure for testcase TestFairScheduler.testContinuousScheduling
 

 Key: YARN-2252
 URL: https://issues.apache.org/jira/browse/YARN-2252
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: trunk-win
Reporter: Ratandeep Ratti
  Labels: hadoop2, scheduler, yarn
 Attachments: YARN-2252-1.patch, yarn-2252-2.patch


 This test-case is failing sporadically on my machine. I think I have a 
 plausible explanation  for this.
 It seems that when the Scheduler is being asked for resources, the resource 
 requests that are being constructed have no preference for the hosts (nodes).
 The two mock hosts constructed both have a memory of 8192 MB.
 The containers (resources) being requested each require a memory of 1024 MB, 
 hence a single node can execute both the resource requests for the 
 application.
 At the end of the test-case it is asserted that the containers 
 (resource requests) be executed on different nodes, but since we haven't 
 specified any preferences for nodes when requesting the resources, the 
 scheduler (at times) executes both the containers (requests) on the same node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2252) Intermittent failure for testcase TestFairScheduler.testContinuousScheduling

2014-09-21 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142720#comment-14142720
 ] 

Karthik Kambatla commented on YARN-2252:


The patch touches only TestFairScheduler, and couldn't have caused all these 
test failures. 

 Intermittent failure for testcase TestFairScheduler.testContinuousScheduling
 

 Key: YARN-2252
 URL: https://issues.apache.org/jira/browse/YARN-2252
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: trunk-win
Reporter: Ratandeep Ratti
  Labels: hadoop2, scheduler, yarn
 Attachments: YARN-2252-1.patch, yarn-2252-2.patch


 This test-case is failing sporadically on my machine. I think I have a 
 plausible explanation  for this.
 It seems that when the Scheduler is being asked for resources, the resource 
 requests that are being constructed have no preference for the hosts (nodes).
 The two mock hosts constructed both have a memory of 8192 MB.
 The containers (resources) being requested each require a memory of 1024 MB, 
 hence a single node can execute both the resource requests for the 
 application.
 At the end of the test-case it is asserted that the containers 
 (resource requests) be executed on different nodes, but since we haven't 
 specified any preferences for nodes when requesting the resources, the 
 scheduler (at times) executes both the containers (requests) on the same node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2252) Intermittent failure for testcase TestFairScheduler.testContinuousScheduling

2014-09-21 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142745#comment-14142745
 ] 

Wei Yan commented on YARN-2252:
---

Patch looks good to me.

 Intermittent failure for testcase TestFairScheduler.testContinuousScheduling
 

 Key: YARN-2252
 URL: https://issues.apache.org/jira/browse/YARN-2252
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: trunk-win
Reporter: Ratandeep Ratti
  Labels: hadoop2, scheduler, yarn
 Attachments: YARN-2252-1.patch, yarn-2252-2.patch


 This test-case is failing sporadically on my machine. I think I have a 
 plausible explanation  for this.
 It seems that when the Scheduler is being asked for resources, the resource 
 requests that are being constructed have no preference for the hosts (nodes).
 The two mock hosts constructed both have a memory of 8192 MB.
 The containers (resources) being requested each require a memory of 1024 MB, 
 hence a single node can execute both the resource requests for the 
 application.
 At the end of the test-case it is asserted that the containers 
 (resource requests) be executed on different nodes, but since we haven't 
 specified any preferences for nodes when requesting the resources, the 
 scheduler (at times) executes both the containers (requests) on the same node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler

2014-09-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2453:

Attachment: YARN-2453.001.patch

 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler
 

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2453.000.patch, YARN-2453.001.patch


 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler.
 The following is error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test should only work for the capacity scheduler, because the following 
 source code in ResourceManager.java proves it will only work for the capacity 
 scheduler:
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
     && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
         YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 This is because CapacityScheduler is an instance of PreemptableResourceScheduler 
 and FairScheduler is not an instance of PreemptableResourceScheduler.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler

2014-09-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2453:

Attachment: (was: YARN-2453.001.patch)

 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler
 

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2453.000.patch


 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler.
 The following is error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test should only work for the capacity scheduler, because the following 
 source code in ResourceManager.java proves it will only work for the capacity 
 scheduler:
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
     && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
         YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 This is because CapacityScheduler is an instance of PreemptableResourceScheduler 
 and FairScheduler is not an instance of PreemptableResourceScheduler.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler

2014-09-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2453:

Attachment: YARN-2453.001.patch

 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler
 

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2453.000.patch, YARN-2453.001.patch


 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler.
 The following is error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test should only work for the capacity scheduler, because the following 
 source code in ResourceManager.java proves it will only work for the capacity 
 scheduler:
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
     && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
         YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 This is because CapacityScheduler is an instance of PreemptableResourceScheduler 
 and FairScheduler is not an instance of PreemptableResourceScheduler.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler

2014-09-21 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142758#comment-14142758
 ] 

zhihai xu commented on YARN-2453:
-

Hi [~kasha], your suggestion is good. I made the change to set 
CapacityScheduler as the scheduler for this test.
I attached a new patch, YARN-2453.001.patch; please review it. Thanks.

 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler
 

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2453.000.patch, YARN-2453.001.patch


 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler.
 The following is error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test should only work for the capacity scheduler, because the following 
 source code in ResourceManager.java proves it will only work for the capacity 
 scheduler:
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
     && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
         YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 This is because CapacityScheduler is an instance of PreemptableResourceScheduler 
 and FairScheduler is not an instance of PreemptableResourceScheduler.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler

2014-09-21 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142768#comment-14142768
 ] 

Karthik Kambatla commented on YARN-2453:


Thanks for the quick update, Zhihai. Should we set the scheduler to 
CapacityScheduler in the setup method so the entire class can use it? 



 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler
 

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2453.000.patch, YARN-2453.001.patch


 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler.
 The following is error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test should only work for the capacity scheduler, because the following 
 source code in ResourceManager.java proves it will only work for the capacity 
 scheduler:
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
     && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
         YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 This is because CapacityScheduler is an instance of PreemptableResourceScheduler 
 and FairScheduler is not an instance of PreemptableResourceScheduler.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2014-09-21 Thread Chuan Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142770#comment-14142770
 ] 

Chuan Liu commented on YARN-2190:
-

[~vvasudev], thanks for the review! Please see my answers below.

bq. 1. What is the behaviour of a process that tries to exceed the allocated 
memory? Will it start swapping or will it be killed?
The OS will limit the process's memory to the specified value. After the limit 
is reached, further memory-allocation API calls, e.g. {{malloc()}}, will get an 
error. What happens next depends on the process: in a JVM, the application will 
get an OOM error and terminate; in C#, I was told the CLR can detect memory 
pressure. I think swapping is an orthogonal OS feature and not related to this.

bq. 2. Your code assumes a 1-1 mapping of physical cores to vcores. This 
assumption is/will be problematic, especially in heterogeneous clusters. You're 
better off using the ratio of (container-vcores/node-vcores) to determine cpu 
limits.
The implementation follows the existing 
{{org.apache.hadoop.yarn.api.records.Resource}}. The existing doc says "A 
node's capacity should be configured with virtual cores equal to its number of 
physical cores." (Source: 
http://hadoop.apache.org/docs/current/api/index.html) I checked the existing 
{{CgroupsLCEResourcesHandler}} implementation, which also does not use 
node-vcores, i.e. {{yarn.nodemanager.resource.cpu-vcores}}, to determine CPU 
shares.

bq. 3. Can you explain why you are modifying DefaultContainerExecutor? You've 
added a method for the old signature in ContainerExecutor.
I need the additional {{Resource}}, which does not exist in the existing 
{{getRunCommand()}} method parameters. I added a method with the old signature 
to maintain backward compatibility, as this is an abstract class and there are 
other child implementations out there.

bq. 4. Can you modify the comments/usage to specify the units of memory(bytes, 
MB, GB)?
This is already documented in the existing patch, which I have cited below. 
Where exactly do you want to see more comments? 
{noformat}
+ OPTIONS: -c [cores] set virtual core limits on the job object.\n\
+  -m [memory] set the memory limit on the job object.\n\
+ The core limit is an integral value of number of cores. The memory\n\
+ limit is an integral number of memory in MB. The definition\n\
+ follows the org.apache.hadoop.yarn.api.records.Resource model.\n\
+ The limit will not be set if 0 or negative value is passed in as\n\
+ parameter(s).\n\
{noformat}
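
As an aside, here is a rough sketch of how the documented -c/-m options could be
filled in from a container's {{Resource}} when building the run command. This is
illustrative only; the helper name and the winutils sub-command layout are
assumptions, not the patch itself.
{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class WinutilsCommandSketch {
  // Builds a hypothetical "winutils task create" command line carrying the core
  // and memory limits, following the usage text quoted above.
  static String buildRunCommand(Resource resource, String jobName, String cmdLine) {
    StringBuilder sb = new StringBuilder("winutils task create");
    if (resource != null) {
      // Per the usage text, 0 or negative values mean "do not set a limit".
      sb.append(" -c ").append(resource.getVirtualCores());   // integral core count
      sb.append(" -m ").append(resource.getMemory());         // memory in MB
    }
    return sb.append(' ').append(jobName).append(' ').append(cmdLine).toString();
  }

  public static void main(String[] args) {
    System.out.println(
        buildRunCommand(Resource.newInstance(2048, 2), "container_01", "echo hello"));
  }
}
{code}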




 Provide a Windows container executor that can limit memory and CPU
 --

 Key: YARN-2190
 URL: https://issues.apache.org/jira/browse/YARN-2190
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Reporter: Chuan Liu
Assignee: Chuan Liu
 Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
 YARN-2190.2.patch, YARN-2190.3.patch, YARN-2190.4.patch, YARN-2190.5.patch


 Yarn default container executor on Windows does not set the resource limit on 
 the containers currently. The memory limit is enforced by a separate 
 monitoring thread. The container implementation on Windows uses Job Object 
 right now. The latest Windows (8 or later) API allows CPU and memory limits 
 on the job objects. We want to create a Windows container executor that sets 
 the limits on job objects thus provides resource enforcement at OS level.
 http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler

2014-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142774#comment-14142774
 ] 

Hadoop QA commented on YARN-2453:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670327/YARN-2453.001.patch
  against trunk revision c50fc92.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5067//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5067//console

This message is automatically generated.

 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler
 

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2453.000.patch, YARN-2453.001.patch


 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler.
 The following is error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test should only work for the capacity scheduler, because the following 
 source code in ResourceManager.java proves it will only work for the capacity 
 scheduler:
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
     && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
         YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 This is because CapacityScheduler is an instance of PreemptableResourceScheduler 
 and FairScheduler is not an instance of PreemptableResourceScheduler.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler

2014-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142775#comment-14142775
 ] 

Hadoop QA commented on YARN-2453:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670328/YARN-2453.001.patch
  against trunk revision c50fc92.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5068//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5068//console

This message is automatically generated.

 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler
 

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2453.000.patch, YARN-2453.001.patch


 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler.
 The following is error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test should only work for the capacity scheduler, because the following 
 source code in ResourceManager.java proves it will only work for the capacity 
 scheduler:
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
     && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
         YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 This is because CapacityScheduler is an instance of PreemptableResourceScheduler 
 and FairScheduler is not an instance of PreemptableResourceScheduler.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler

2014-09-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2453:

Attachment: YARN-2453.002.patch

 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler
 

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2453.000.patch, YARN-2453.001.patch, 
 YARN-2453.002.patch


 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler.
 The following is error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test should only work for the capacity scheduler, because the following 
 source code in ResourceManager.java proves it will only work for the capacity 
 scheduler:
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
     && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
         YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 This is because CapacityScheduler is an instance of PreemptableResourceScheduler 
 and FairScheduler is not an instance of PreemptableResourceScheduler.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler

2014-09-21 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142826#comment-14142826
 ] 

zhihai xu commented on YARN-2453:
-

Hi [~Karthik Kambatla], I moved all the configuration to the setup method in 
YARN-2453.002.patch. Thanks for the quick response.

 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler
 

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2453.000.patch, YARN-2453.001.patch, 
 YARN-2453.002.patch


 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler.
 The following is error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test should only work for the capacity scheduler, because the following 
 source code in ResourceManager.java proves it will only work for the capacity 
 scheduler:
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
     && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
         YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 This is because CapacityScheduler is an instance of PreemptableResourceScheduler 
 and FairScheduler is not an instance of PreemptableResourceScheduler.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2453) TestProportionalCapacityPreemptionPolicy is failed for FairScheduler

2014-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142836#comment-14142836
 ] 

Hadoop QA commented on YARN-2453:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670339/YARN-2453.002.patch
  against trunk revision c50fc92.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5069//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5069//console

This message is automatically generated.

 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler
 

 Key: YARN-2453
 URL: https://issues.apache.org/jira/browse/YARN-2453
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2453.000.patch, YARN-2453.001.patch, 
 YARN-2453.002.patch


 TestProportionalCapacityPreemptionPolicy is failed for FairScheduler.
 The following is error message:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.94 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 testPolicyInitializeAfterSchedulerInitialized(org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy)
   Time elapsed: 1.61 sec   FAILURE!
 java.lang.AssertionError: Failed to find SchedulingMonitor service, please 
 check what happened
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy.testPolicyInitializeAfterSchedulerInitialized(TestProportionalCapacityPreemptionPolicy.java:469)
 This test should only work for the capacity scheduler, because the following 
 source code in ResourceManager.java proves it will only work for the capacity 
 scheduler:
 {code}
 if (scheduler instanceof PreemptableResourceScheduler
     && conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
         YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
 {code}
 This is because CapacityScheduler is an instance of PreemptableResourceScheduler 
 and FairScheduler is not an instance of PreemptableResourceScheduler.
 I will upload a patch to fix this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142875#comment-14142875
 ] 

Vinod Kumar Vavilapalli commented on YARN-913:
--

I started looking at this. I have done my fair share of reviews in my life, but 
this is a monster patch. Can we please split this to have a realistic timeline 
for commit to 2.6? I propose the following sub-tasks:
 - The addition of the registry as a stand-alone module, which in itself is 
massive and can be further split: server-side abstraction, ZK-based server-side 
implementation, client side, web services, etc.
 - Integration of the registry into YARN
 - Changing YARN apps like distributed-shell
 - Documentation

I just applied the massive patch against the latest trunk and am digging through 
it. If you agree, we should file sub-tasks and divide up the patch. Thanks.

 Add a way to register long-lived services in a YARN cluster
 ---

 Key: YARN-913
 URL: https://issues.apache.org/jira/browse/YARN-913
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Affects Versions: 2.5.0, 2.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
 YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
 YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
 YARN-913-007.patch, YARN-913-008.patch, yarnregistry.pdf, yarnregistry.tla


 In a YARN cluster you can't predict where services will come up -or on what 
 ports. The services need to work those things out as they come up and then 
 publish them somewhere.
 Applications need to be able to find the service instance they are to bond to 
 -and not any others in the cluster.
 Some kind of service registry -in the RM, in ZK, could do this. If the RM 
 held the write access to the ZK nodes, it would be more secure than having 
 apps register with ZK themselves.
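 A minimal sketch of the kind of registration being described, assuming a plain 
 ZooKeeper-backed registry written with Curator; the znode layout and the JSON 
 record are invented for illustration, and this is not the API in the attached 
 patches:
 {code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class RegistrySketch {
  public static void main(String[] args) throws Exception {
    CuratorFramework zk = CuratorFrameworkFactory.newClient(
        "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
    zk.start();

    // An AM publishes the host:port it actually bound to. An ephemeral node
    // disappears with the ZK session, so stale entries clean themselves up.
    String path = "/services/alice/my-long-lived-service";
    byte[] record = "{\"host\":\"nm-host-17\",\"port\":41217}".getBytes("UTF-8");
    zk.create().creatingParentsIfNeeded()
        .withMode(CreateMode.EPHEMERAL).forPath(path, record);

    // A client that must bind to exactly this instance reads the record back.
    System.out.println(new String(zk.getData().forPath(path), "UTF-8"));
  }
}
 {code}
 Having the RM own the write path, as suggested above, would mean the AM asks 
 the RM to publish on its behalf instead of writing to ZK directly.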



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2460) Remove obsolete entries from yarn-default.xml

2014-09-21 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142898#comment-14142898
 ] 

Ray Chiang commented on YARN-2460:
--

Thanks for the help committing.

 Remove obsolete entries from yarn-default.xml
 -

 Key: YARN-2460
 URL: https://issues.apache.org/jira/browse/YARN-2460
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: newbie
 Fix For: 2.6.0

 Attachments: YARN-2460-01.patch, YARN-2460-02.patch


 The following properties are defined in yarn-default.xml, but do not exist in 
 YarnConfiguration.
   mapreduce.job.hdfs-servers
   mapreduce.job.jar
   yarn.ipc.exception.factory.class
   yarn.ipc.serializer.type
   yarn.nodemanager.aux-services.mapreduce_shuffle.class
   yarn.nodemanager.hostname
   yarn.nodemanager.resourcemanager.connect.retry_interval.secs
   yarn.nodemanager.resourcemanager.connect.wait.secs
   yarn.resourcemanager.amliveliness-monitor.interval-ms
   yarn.resourcemanager.application-tokens.master-key-rolling-interval-secs
   yarn.resourcemanager.container.liveness-monitor.interval-ms
   yarn.resourcemanager.nm.liveness-monitor.interval-ms
   yarn.timeline-service.hostname
   yarn.timeline-service.http-authentication.simple.anonymous.allowed
   yarn.timeline-service.http-authentication.type
 Presumably, the mapreduce.* properties are okay.  Similarly, the 
 yarn.timeline-service.* properties are for the future TimelineService.  
 However, the rest are likely fully deprecated.
 Submitting this bug for comment/feedback about which other properties should be 
 kept in yarn-default.xml.
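 A rough sketch of the kind of check that can surface these mismatches, assuming 
 yarn-default.xml is on the classpath; it compares the file's keys against the 
 String constants declared in YarnConfiguration via reflection (illustration 
 only, not the attached patches):
 {code}
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class FindUndeclaredYarnDefaults {
  public static void main(String[] args) throws Exception {
    // Collect every public static String constant declared in YarnConfiguration.
    Set<String> known = new HashSet<String>();
    for (Field f : YarnConfiguration.class.getFields()) {
      if (Modifier.isStatic(f.getModifiers()) && f.getType() == String.class) {
        known.add((String) f.get(null));
      }
    }
    // Load only yarn-default.xml (no other defaults) and report unmatched keys.
    Configuration defaults = new Configuration(false);
    defaults.addResource("yarn-default.xml");
    for (Map.Entry<String, String> e : defaults) {
      if (!known.contains(e.getKey())) {
        System.out.println("No YarnConfiguration constant for: " + e.getKey());
      }
    }
  }
}
 {code}
 The reflection pass also picks up default-value strings, so a few mismatches 
 could be masked, but anything it prints has no matching constant at all.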



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2578) NM does not failover timely if RM node network connection fails

2014-09-21 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-2578:
---

 Summary: NM does not failover timely if RM node network connection 
fails
 Key: YARN-2578
 URL: https://issues.apache.org/jira/browse/YARN-2578
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Wilfred Spiegelenburg


The NM does not fail over correctly when the network cable of the RM is 
unplugged or the failure is simulated by a service network stop or a firewall 
that drops all traffic on the node. The RM fails over to the standby node when 
the failure is detected, as expected. The NM should then re-register with the 
new active RM. This re-register takes a long time (15 minutes or more). Until 
then the cluster has no nodes for processing and applications are stuck.

Reproduction test case which can be used in any environment:
- create a cluster with 3 nodes
node 1: ZK, NN, JN, ZKFC, DN, RM, NM
node 2: ZK, NN, JN, ZKFC, DN, RM, NM
node 3: ZK, JN, DN, NM
- start all services and make sure they are in good health
- kill the network connection of the RM that is active using one of the network 
kills from above
- observe the NN and RM failover
- the DN's fail over to the new active NN
- the NM does not recover for a long time
- the logs show a long delay and traces show no change at all

The stack traces of the NM all show the same set of threads. The main thread 
that should drive the re-register is the Node Status Updater. This 
thread is stuck in:
{code}
"Node Status Updater" prio=10 tid=0x7f5a6cc99800 nid=0x18d0 in 
Object.wait() [0x7f5a51fc1000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0xed62f488> (a org.apache.hadoop.ipc.Client$Call)
at java.lang.Object.wait(Object.java:503)
at org.apache.hadoop.ipc.Client.call(Client.java:1395)
- locked <0xed62f488> (a org.apache.hadoop.ipc.Client$Call)
at org.apache.hadoop.ipc.Client.call(Client.java:1362)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy26.nodeHeartbeat(Unknown Source)
at 
org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
{code}

The client connection that goes through the proxy can be traced back to 
ResourceTrackerPBClientImpl. The generated proxy does not time out, and we 
should be using a variant that takes the RPC timeout (from the configuration) 
as a parameter.
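 A rough sketch of the kind of change described above, not the attached patch: 
 create the ResourceTracker proxy with an explicit RPC timeout so a dead RM 
 surfaces as a timeout instead of a call that waits forever. The configuration 
 key used here is invented for the illustration:
 {code}
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.ProtobufRpcEngine;
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.net.NetUtils;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.server.api.ResourceTrackerPB;

public class TimedResourceTrackerProxy {
  public static ResourceTrackerPB create(InetSocketAddress rmAddress,
      Configuration conf) throws Exception {
    // Invented key for this sketch; any configured timeout would do.
    int rpcTimeout = conf.getInt(
        "yarn.resourcemanager.connect.rpc-timeout.ms", 60 * 1000);
    RPC.setProtocolEngine(conf, ResourceTrackerPB.class, ProtobufRpcEngine.class);
    // The getProxy overload that accepts rpcTimeout makes the heartbeat call
    // fail with a timeout instead of blocking in Client.call indefinitely.
    return RPC.getProxy(ResourceTrackerPB.class,
        RPC.getProtocolVersion(ResourceTrackerPB.class), rmAddress,
        UserGroupInformation.getCurrentUser(), conf,
        NetUtils.getDefaultSocketFactory(conf), rpcTimeout);
  }
}
 {code}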



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2578) NM does not failover timely if RM node network connection fails

2014-09-21 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-2578:

Attachment: YARN-2578.patch

Attached patch for initial review.
In the patch the proxy uses the RPC timeout that is set in the configuration to 
time out the connection.

 NM does not failover timely if RM node network connection fails
 ---

 Key: YARN-2578
 URL: https://issues.apache.org/jira/browse/YARN-2578
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.1
Reporter: Wilfred Spiegelenburg
 Attachments: YARN-2578.patch


 The NM does not fail over correctly when the network cable of the RM is 
 unplugged or the failure is simulated by a service network stop or a 
 firewall that drops all traffic on the node. The RM fails over to the standby 
 node when the failure is detected, as expected. The NM should then re-register 
 with the new active RM. This re-register takes a long time (15 minutes or 
 more). Until then the cluster has no nodes for processing and applications 
 are stuck.
 Reproduction test case which can be used in any environment:
 - create a cluster with 3 nodes
 node 1: ZK, NN, JN, ZKFC, DN, RM, NM
 node 2: ZK, NN, JN, ZKFC, DN, RM, NM
 node 3: ZK, JN, DN, NM
 - start all services and make sure they are in good health
 - kill the network connection of the RM that is active using one of the 
 network kills from above
 - observe the NN and RM failover
 - the DN's fail over to the new active NN
 - the NM does not recover for a long time
 - the logs show a long delay and traces show no change at all
 The stack traces of the NM all show the same set of threads. The main thread 
 that should drive the re-register is the Node Status Updater. This 
 thread is stuck in:
 {code}
 "Node Status Updater" prio=10 tid=0x7f5a6cc99800 nid=0x18d0 in 
 Object.wait() [0x7f5a51fc1000]
    java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on <0xed62f488> (a org.apache.hadoop.ipc.Client$Call)
   at java.lang.Object.wait(Object.java:503)
   at org.apache.hadoop.ipc.Client.call(Client.java:1395)
   - locked <0xed62f488> (a org.apache.hadoop.ipc.Client$Call)
   at org.apache.hadoop.ipc.Client.call(Client.java:1362)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
   at com.sun.proxy.$Proxy26.nodeHeartbeat(Unknown Source)
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
 {code}
 The client connection that goes through the proxy can be traced back to 
 ResourceTrackerPBClientImpl. The generated proxy does not time out, and we 
 should be using a variant that takes the RPC timeout (from the 
 configuration) as a parameter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)