[jira] [Commented] (YARN-2252) Intermittent failure for testcase TestFairScheduler.testContinuousScheduling

2014-07-16 Thread Ratandeep Ratti (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064467#comment-14064467
 ] 

Ratandeep Ratti commented on YARN-2252:
---

Thanks Wei!

 Intermittent failure for testcase TestFairScheduler.testContinuousScheduling
 

 Key: YARN-2252
 URL: https://issues.apache.org/jira/browse/YARN-2252
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: trunk-win
Reporter: Ratandeep Ratti
  Labels: hadoop2, scheduler, yarn
 Attachments: YARN-2252-1.patch


 This test-case is failing sporadically on my machine. I think I have a 
 plausible explanation  for this.
 It seems that when the Scheduler is being asked for resources, the resource 
 requests that are being constructed have no preference for the hosts (nodes).
 The two mock hosts constructed, both have a memory of 8192 mb.
 The containers(resources) being requested each require a memory of 1024mb, 
 hence a single node can execute both the resource requests for the 
 application.
 In the end of the test-case it is being asserted that the containers 
 (resource requests) be executed on different nodes, but since we haven't 
 specified any preferences for nodes when requesting the resources, the 
 scheduler (at times) executes both the containers (requests) on the same node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2252) Intermittent failure for testcase TestFairScheduler.testContinuousScheduling

2014-07-04 Thread Ratandeep Ratti (JIRA)
Ratandeep Ratti created YARN-2252:
-

 Summary: Intermittent failure for testcase 
TestFairScheduler.testContinuousScheduling
 Key: YARN-2252
 URL: https://issues.apache.org/jira/browse/YARN-2252
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: trunk-win
Reporter: Ratandeep Ratti


This test-case is failing sporadically on my machine. I think I have a 
plausible explanation  for this.

It seems that when the Scheduler is being asked for resources, the resource 
requests that are being constructed have no preference for the hosts (nodes).
The two mock hosts constructed, both have a memory of 8192 mb.
The containers(resources) being requested each require a memory of 1024mb, 
hence a single node can execute both the resource requests for the application.
In the end of the test-case it is being asserted that the containers (resource 
requests) be executed on different nodes, but since we haven't specified any 
preferences for nodes when requesting the resources, the scheduler (at times) 
executes both the containers (requests) on the same node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2252) Intermittent failure for testcase TestFairScheduler.testContinuousScheduling

2014-07-04 Thread Ratandeep Ratti (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052398#comment-14052398
 ] 

Ratandeep Ratti commented on YARN-2252:
---

Pasting test-case failure message
{quote}
Running 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 180.775 sec  
FAILURE! - in org.apache.hadoop.yarn.server.resour
tFairScheduler
testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
  Time elapsed: 180.495 se
java.lang.AssertionError: expected:2 but was:1
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairSchedul


Results :

Failed tests:
  TestFairScheduler.testContinuousScheduling:2758 expected:2 but was:1
{quote}

 Intermittent failure for testcase TestFairScheduler.testContinuousScheduling
 

 Key: YARN-2252
 URL: https://issues.apache.org/jira/browse/YARN-2252
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: trunk-win
Reporter: Ratandeep Ratti
  Labels: hadoop2, scheduler, yarn

 This test-case is failing sporadically on my machine. I think I have a 
 plausible explanation  for this.
 It seems that when the Scheduler is being asked for resources, the resource 
 requests that are being constructed have no preference for the hosts (nodes).
 The two mock hosts constructed, both have a memory of 8192 mb.
 The containers(resources) being requested each require a memory of 1024mb, 
 hence a single node can execute both the resource requests for the 
 application.
 In the end of the test-case it is being asserted that the containers 
 (resource requests) be executed on different nodes, but since we haven't 
 specified any preferences for nodes when requesting the resources, the 
 scheduler (at times) executes both the containers (requests) on the same node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2252) Intermittent failure for testcase TestFairScheduler.testContinuousScheduling

2014-07-06 Thread Ratandeep Ratti (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053129#comment-14053129
 ] 

Ratandeep Ratti commented on YARN-2252:
---

Thanks for the help Wei. I think I might have found the reason for this 
failure.  Notice that the test-case 
TestFairScheduler.testLoadConfigurationOnInitialize() first starts up the 
continuousScheduling thread (under FairScheduler object). When the execution 
flow reaches the test-case TestFairScheduler.testContinuousScheduling, even 
though we create a new FairScheduler object, the old thread 
(continuousScheduling) is still live, and now, there are now two live 
continuousScheduling threads trying to assign containers to nodes. 

The problem does not seem to occur when I properly stop each thread before 
starting another execution of this test-case.

 Intermittent failure for testcase TestFairScheduler.testContinuousScheduling
 

 Key: YARN-2252
 URL: https://issues.apache.org/jira/browse/YARN-2252
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: trunk-win
Reporter: Ratandeep Ratti
  Labels: hadoop2, scheduler, yarn

 This test-case is failing sporadically on my machine. I think I have a 
 plausible explanation  for this.
 It seems that when the Scheduler is being asked for resources, the resource 
 requests that are being constructed have no preference for the hosts (nodes).
 The two mock hosts constructed, both have a memory of 8192 mb.
 The containers(resources) being requested each require a memory of 1024mb, 
 hence a single node can execute both the resource requests for the 
 application.
 In the end of the test-case it is being asserted that the containers 
 (resource requests) be executed on different nodes, but since we haven't 
 specified any preferences for nodes when requesting the resources, the 
 scheduler (at times) executes both the containers (requests) on the same node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2252) Intermittent failure for testcase TestFairScheduler.testContinuousScheduling

2014-07-06 Thread Ratandeep Ratti (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053280#comment-14053280
 ] 

Ratandeep Ratti commented on YARN-2252:
---

Wei,  while analyzing this failure, I stumbled upon another potential problem.
The method SchedulerApplication.getCurrentConsumption() is not synchronized, 
this may lead to visibility (staleness) problems

 Intermittent failure for testcase TestFairScheduler.testContinuousScheduling
 

 Key: YARN-2252
 URL: https://issues.apache.org/jira/browse/YARN-2252
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: trunk-win
Reporter: Ratandeep Ratti
Assignee: Wei Yan
  Labels: hadoop2, scheduler, yarn

 This test-case is failing sporadically on my machine. I think I have a 
 plausible explanation  for this.
 It seems that when the Scheduler is being asked for resources, the resource 
 requests that are being constructed have no preference for the hosts (nodes).
 The two mock hosts constructed, both have a memory of 8192 mb.
 The containers(resources) being requested each require a memory of 1024mb, 
 hence a single node can execute both the resource requests for the 
 application.
 In the end of the test-case it is being asserted that the containers 
 (resource requests) be executed on different nodes, but since we haven't 
 specified any preferences for nodes when requesting the resources, the 
 scheduler (at times) executes both the containers (requests) on the same node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2252) Intermittent failure for testcase TestFairScheduler.testContinuousScheduling

2014-07-07 Thread Ratandeep Ratti (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053413#comment-14053413
 ] 

Ratandeep Ratti commented on YARN-2252:
---

Wei, calling fairscheduler.stop() will not stop the threads. It seems that the 
threads continuousScheduling thread and update thread are not handling 
interrupt properly. Though we are doing [schedulingthread | 
updateThread].interrupt(), we do need to keep checking the interrupt flag in 
the while loop of these threads.

 Intermittent failure for testcase TestFairScheduler.testContinuousScheduling
 

 Key: YARN-2252
 URL: https://issues.apache.org/jira/browse/YARN-2252
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: trunk-win
Reporter: Ratandeep Ratti
Assignee: Wei Yan
  Labels: hadoop2, scheduler, yarn

 This test-case is failing sporadically on my machine. I think I have a 
 plausible explanation  for this.
 It seems that when the Scheduler is being asked for resources, the resource 
 requests that are being constructed have no preference for the hosts (nodes).
 The two mock hosts constructed, both have a memory of 8192 mb.
 The containers(resources) being requested each require a memory of 1024mb, 
 hence a single node can execute both the resource requests for the 
 application.
 In the end of the test-case it is being asserted that the containers 
 (resource requests) be executed on different nodes, but since we haven't 
 specified any preferences for nodes when requesting the resources, the 
 scheduler (at times) executes both the containers (requests) on the same node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2252) Intermittent failure for testcase TestFairScheduler.testContinuousScheduling

2014-07-07 Thread Ratandeep Ratti (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ratandeep Ratti updated YARN-2252:
--

Attachment: YARN-2252-1.patch

Wei, since I already have a patch for this on my 2.3 hadoop branch (for my 
internal org). I'm also submitting it here. Please have a look to see if it is 
fine by you.

 Intermittent failure for testcase TestFairScheduler.testContinuousScheduling
 

 Key: YARN-2252
 URL: https://issues.apache.org/jira/browse/YARN-2252
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: trunk-win
Reporter: Ratandeep Ratti
Assignee: Wei Yan
  Labels: hadoop2, scheduler, yarn
 Attachments: YARN-2252-1.patch


 This test-case is failing sporadically on my machine. I think I have a 
 plausible explanation  for this.
 It seems that when the Scheduler is being asked for resources, the resource 
 requests that are being constructed have no preference for the hosts (nodes).
 The two mock hosts constructed, both have a memory of 8192 mb.
 The containers(resources) being requested each require a memory of 1024mb, 
 hence a single node can execute both the resource requests for the 
 application.
 In the end of the test-case it is being asserted that the containers 
 (resource requests) be executed on different nodes, but since we haven't 
 specified any preferences for nodes when requesting the resources, the 
 scheduler (at times) executes both the containers (requests) on the same node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)