[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-03-28 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951785#comment-13951785
 ] 

Jian He commented on YARN-1879:
---

Both register and unregister check whether the application has already 
registered/unregistered. If it has, duplicate register/unregister attempts can 
cause exceptions. Are they still idempotent?
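
For readers following the thread, here is a minimal, self-contained sketch (hypothetical names, not the actual ApplicationMasterService code) of the kind of duplicate-registration guard being described, which is what makes a blind client retry fail after a lost reply:

{code}
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only: a repeated register call for the same attempt
// throws instead of returning the previously computed response, so a retry of a
// call whose reply was lost surfaces as an exception rather than a no-op.
public class RegistrationGuardSketch {
  private final Set<String> registeredAttempts = new HashSet<>();

  public synchronized String registerApplicationMaster(String attemptId) {
    if (!registeredAttempts.add(attemptId)) {
      throw new IllegalStateException(
          "Application Master is already registered: " + attemptId);
    }
    return "registration response for " + attemptId;
  }
}
{code}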

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-1879.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1890) Too many unnecessary logs are logged while accessing applicationMaster web UI.

2014-03-28 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951783#comment-13951783
 ] 

Jian He commented on YARN-1890:
---

+1 for cleaning it up
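
The description below points at per-request INFO logging in WebAppProxyServlet.doGet(). A minimal sketch of the kind of cleanup being voted for, assuming the fix is simply to demote the per-request message to DEBUG behind a guard (class and message names here are placeholders, not the actual patch):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Sketch only: demote the chatty per-request message to DEBUG and guard it,
// so ordinary proxy traffic no longer floods the RM and proxy logs at INFO.
public class ProxyLoggingSketch {
  private static final Log LOG = LogFactory.getLog(ProxyLoggingSketch.class);

  void logProxiedRequest(String remoteUser, String uri) {
    if (LOG.isDebugEnabled()) {
      LOG.debug(remoteUser + " is accessing " + uri);
    }
  }
}
{code}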

> Too many unnecessary logs are logged while accessing applicationMaster web UI.
> --
>
> Key: YARN-1890
> URL: https://issues.apache.org/jira/browse/YARN-1890
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
>
> Accessing the ApplicationMaster UI, which is redirected from the RM UI, writes 
> too many entries to the ResourceManager and ProxyServer logs. On every refresh, 
> logging is done in WebAppProxyServlet.doGet(). All my RM and ProxyServer logs are 
> filled with UI information messages that are not really necessary for the user.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-03-28 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951780#comment-13951780
 ] 

Xuan Gong commented on YARN-1879:
-

[~ozawa] Mind if I take over this ticket? I will make some progress this 
weekend. Feel free to take it back.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
> Attachments: YARN-1879.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-03-28 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1879:


Attachment: YARN-1879.1.patch

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-1879.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-03-28 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-1879:
---

Assignee: Xuan Gong  (was: Tsuyoshi OZAWA)

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-1879.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-03-28 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951779#comment-13951779
 ] 

Xuan Gong commented on YARN-1879:
-

ApplicationMasterService#registerApplicationMaster does have the lastResponse 
mechanism, so I think it is OK for us to add the annotation in this ticket.

ApplicationMasterService#finishApplicationMaster only submits an 
RMAppAttemptUnregistrationEvent and checks status, so adding the Idempotent 
annotation is OK, too.

So we can fix all three APIs together in this ticket and mark:
* allocate --> AtMostOnce
* registerApplicationMaster --> Idempotent
* finishApplicationMaster --> Idempotent
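
A minimal sketch of what the markings listed above could look like on the protocol interface, assuming the @Idempotent and @AtMostOnce annotations from org.apache.hadoop.io.retry and using simplified request/response types (the real ApplicationMasterProtocol signatures are not reproduced here):

{code}
import org.apache.hadoop.io.retry.AtMostOnce;
import org.apache.hadoop.io.retry.Idempotent;

// Sketch of the proposed markings; the RM-side retry cache (lastResponse) is
// what makes retried register/unregister calls safe, while allocate relies on
// the previous-response replay, hence AtMostOnce.
public interface ApplicationMasterProtocolSketch {

  @Idempotent
  String registerApplicationMaster(String request);

  @AtMostOnce
  String allocate(String request);

  @Idempotent
  String finishApplicationMaster(String request);
}
{code}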

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator

2014-03-28 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951778#comment-13951778
 ] 

Fengdong Yu commented on YARN-1889:
---

Hi Zhiguo,
which of my comments did you address in your new patch? I cannot see any change.

1. There are still tabs in the patch.
2. Move the following initialization into the constructor:
{code}
+  private Priority priority = recordFactory.newRecordInstance(Priority.class);
+  private ResourceWeights resourceWeights = new ResourceWeights();
{code}
3. As Sandy said, don't use recordFactory.newRecordInstance(Priority.class); 
instead, use Priority.newInstance(1).
4. Consequently, remove priority.setPriority(1):
{code}
   public Priority getPriority() {
 // Right now per-app priorities are not passed to scheduler,
 // so everyone has the same priority.
-Priority p = recordFactory.newRecordInstance(Priority.class);
-p.setPriority(1);
-return p;
+priority.setPriority(1);
+return priority;
   }
{code}

5. Please rename this to getResourceWeights():
{code}
+  public ResourceWeights getResourceWeightsObject() {
+   return resourceWeights;
+  }
{code}
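
Taken together, the points above amount to something like the following sketch (an illustration of the intended shape rather than the final patch; the ResourceWeights import path is assumed from the resourcemanager module):

{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceWeights;

// Illustration of the requested shape: allocate once in the constructor, hand
// out the cached instances from the getters, and name the new getter
// getResourceWeights() rather than getResourceWeightsObject().
public class AppSchedulableSketch {
  private final Priority priority;
  private final ResourceWeights resourceWeights;

  public AppSchedulableSketch() {
    this.priority = Priority.newInstance(1);
    this.resourceWeights = new ResourceWeights();
  }

  public Priority getPriority() {
    // Per-app priorities are not passed to the scheduler yet, so every call
    // returns the same cached instance instead of allocating a new one.
    return priority;
  }

  public ResourceWeights getResourceWeights() {
    return resourceWeights;
  }
}
{code}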

> avoid creating new objects on each fair scheduler call to AppSchedulable 
> comparator
> ---
>
> Key: YARN-1889
> URL: https://issues.apache.org/jira/browse/YARN-1889
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Hong Zhiguo
>Priority: Minor
> Attachments: YARN-1889.patch
>
>
> In fair scheduler, in each scheduling attempt, a full sort is
> performed on List of AppSchedulable, which invokes Comparator.compare
> method many times. Both FairShareComparator and DRFComparator call
> AppSchedulable.getWeights, and AppSchedulable.getPriority.
> A new ResourceWeights object is allocated on each call of getWeights,
> and the same for getPriority. This introduces a lot of pressure to
> GC because these methods are called very very frequently.
> The test case below shows the improvement in performance and GC behaviour. The 
> results show that the GC pressure during NodeUpdate processing is reduced by 
> half by this patch.
> The code to show the improvement: (Add it to TestFairScheduler.java)
> import java.lang.management.GarbageCollectorMXBean;
> import java.lang.management.ManagementFactory;
>   public void printGCStats() {
> long totalGarbageCollections = 0;
> long garbageCollectionTime = 0;
> for(GarbageCollectorMXBean gc :
>   ManagementFactory.getGarbageCollectorMXBeans()) {
>   long count = gc.getCollectionCount();
>   if(count >= 0) {
> totalGarbageCollections += count;
>   }
>   long time = gc.getCollectionTime();
>   if(time >= 0) {
> garbageCollectionTime += time;
>   }
> }
> System.out.println("Total Garbage Collections: "
> + totalGarbageCollections);
> System.out.println("Total Garbage Collection Time (ms): "
> + garbageCollectionTime);
>   }
>   @Test
>   public void testImpactOnGC() throws Exception {
> scheduler.reinitialize(conf, resourceManager.getRMContext());
> // Add nodes
> int numNode = 1;
> for (int i = 0; i < numNode; ++i) {
> String host = String.format("192.1.%d.%d", i/256, i%256);
> RMNode node =
> MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, 
> host);
> NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node);
> scheduler.handle(nodeEvent);
> assertEquals(1024 * 64 * (i+1), 
> scheduler.getClusterCapacity().getMemory());
> }
> assertEquals(numNode, scheduler.getNumClusterNodes());
> assertEquals(1024 * 64 * numNode, 
> scheduler.getClusterCapacity().getMemory());
> // add apps, each app has 100 containers.
> int minReqSize =
> 
> FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
> int numApp = 8000;
> int priority = 1;
> for (int i = 1; i < numApp + 1; ++i) {
> ApplicationAttemptId attemptId = createAppAttemptId(i, 1);
> AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent(
> attemptId.getApplicationId(), "queue1", "user1");
> scheduler.handle(appAddedEvent);
> AppAttemptAddedSchedulerEvent attemptAddedEvent =
> new AppAttemptAddedSchedulerEvent(attemptId, false);
> scheduler.handle(attemptAddedEvent);
> createSchedulingRequestExistingApplication(minReqSize * 2, 1, 
> priority, attemptId);
> }
> scheduler.update();
> assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", 
> true)
> .getRunnableAppSchedulables().size());
> System.out.println("GC stats before NodeUpdate processing:");
> printGCStats();
> int hb_num = 5000;
> long start = System.nanoTime();
> for (int i = 0; i < hb_num; ++i) {
>   String host = String.format("192.1.%d.%d", i/256, i%256);
>   RMNode node =
>   MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, 
> host);
>   NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node);
>   scheduler.handle(nodeEvent);
> }
> long end = System.nanoTime();
> System.out.printf("processing time for a NodeUpdate in average: %d us\n",
> (end - start)/(hb_num * 1000));
> System.out.println("GC stats after NodeUpdate processing:");
> printGCStats();
>   }

[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator

2014-03-28 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951774#comment-13951774
 ] 

Hong Zhiguo commented on YARN-1889:
---

Hi Sandy,
during NodeUpdate event processing, both the number of GCs and the accumulated GC 
time are reduced by about half.


> avoid creating new objects on each fair scheduler call to AppSchedulable 
> comparator
> ---
>
> Key: YARN-1889
> URL: https://issues.apache.org/jira/browse/YARN-1889
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Hong Zhiguo
>Priority: Minor
> Attachments: YARN-1889.patch
>
>
> In fair scheduler, in each scheduling attempt, a full sort is
> performed on List of AppSchedulable, which invokes Comparator.compare
> method many times. Both FairShareComparator and DRFComparator call
> AppSchedulable.getWeights, and AppSchedulable.getPriority.
> A new ResourceWeights object is allocated on each call of getWeights,
> and the same for getPriority. This introduces a lot of pressure to
> GC because these methods are called very very frequently.
> The test case below shows the improvement in performance and GC behaviour. The 
> results show that the GC pressure during NodeUpdate processing is reduced by 
> half by this patch.
> The code to show the improvement: (Add it to TestFairScheduler.java)
> import java.lang.management.GarbageCollectorMXBean;
> import java.lang.management.ManagementFactory;
>   public void printGCStats() {
> long totalGarbageCollections = 0;
> long garbageCollectionTime = 0;
> for(GarbageCollectorMXBean gc :
>   ManagementFactory.getGarbageCollectorMXBeans()) {
>   long count = gc.getCollectionCount();
>   if(count >= 0) {
> totalGarbageCollections += count;
>   }
>   long time = gc.getCollectionTime();
>   if(time >= 0) {
> garbageCollectionTime += time;
>   }
> }
> System.out.println("Total Garbage Collections: "
> + totalGarbageCollections);
> System.out.println("Total Garbage Collection Time (ms): "
> + garbageCollectionTime);
>   }
>   @Test
>   public void testImpactOnGC() throws Exception {
> scheduler.reinitialize(conf, resourceManager.getRMContext());
> // Add nodes
> int numNode = 1;
> for (int i = 0; i < numNode; ++i) {
> String host = String.format("192.1.%d.%d", i/256, i%256);
> RMNode node =
> MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, 
> host);
> NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node);
> scheduler.handle(nodeEvent);
> assertEquals(1024 * 64 * (i+1), 
> scheduler.getClusterCapacity().getMemory());
> }
> assertEquals(numNode, scheduler.getNumClusterNodes());
> assertEquals(1024 * 64 * numNode, 
> scheduler.getClusterCapacity().getMemory());
> // add apps, each app has 100 containers.
> int minReqSize =
> 
> FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
> int numApp = 8000;
> int priority = 1;
> for (int i = 1; i < numApp + 1; ++i) {
> ApplicationAttemptId attemptId = createAppAttemptId(i, 1);
> AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent(
> attemptId.getApplicationId(), "queue1", "user1");
> scheduler.handle(appAddedEvent);
> AppAttemptAddedSchedulerEvent attemptAddedEvent =
> new AppAttemptAddedSchedulerEvent(attemptId, false);
> scheduler.handle(attemptAddedEvent);
> createSchedulingRequestExistingApplication(minReqSize * 2, 1, 
> priority, attemptId);
> }
> scheduler.update();
> assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", 
> true)
> .getRunnableAppSchedulables().size());
> System.out.println("GC stats before NodeUpdate processing:");
> printGCStats();
> int hb_num = 5000;
> long start = System.nanoTime();
> for (int i = 0; i < hb_num; ++i) {
>   String host = String.format("192.1.%d.%d", i/256, i%256);
>   RMNode node =
>   MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, 
> host);
>   NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node);
>   scheduler.handle(nodeEvent);
> }
> long end = System.nanoTime();
> System.out.printf("processing time for a NodeUpdate in average: %d us\n",
> (end - start)/(hb_num * 1000));
> System.out.println("GC stats after NodeUpdate processing:");
> printGCStats();
>   }



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator

2014-03-28 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951773#comment-13951773
 ] 

Hong Zhiguo commented on YARN-1889:
---

Hi Fengdong, I'll update the patch according to your comments. Thanks.

> avoid creating new objects on each fair scheduler call to AppSchedulable 
> comparator
> ---
>
> Key: YARN-1889
> URL: https://issues.apache.org/jira/browse/YARN-1889
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Hong Zhiguo
>Priority: Minor
> Attachments: YARN-1889.patch
>
>
> In fair scheduler, in each scheduling attempt, a full sort is
> performed on List of AppSchedulable, which invokes Comparator.compare
> method many times. Both FairShareComparator and DRFComparator call
> AppSchedulable.getWeights, and AppSchedulable.getPriority.
> A new ResourceWeights object is allocated on each call of getWeights,
> and the same for getPriority. This introduces a lot of pressure to
> GC because these methods are called very very frequently.
> The test case below shows the improvement in performance and GC behaviour. The 
> results show that the GC pressure during NodeUpdate processing is reduced by 
> half by this patch.
> The code to show the improvement: (Add it to TestFairScheduler.java)
> import java.lang.management.GarbageCollectorMXBean;
> import java.lang.management.ManagementFactory;
>   public void printGCStats() {
> long totalGarbageCollections = 0;
> long garbageCollectionTime = 0;
> for(GarbageCollectorMXBean gc :
>   ManagementFactory.getGarbageCollectorMXBeans()) {
>   long count = gc.getCollectionCount();
>   if(count >= 0) {
> totalGarbageCollections += count;
>   }
>   long time = gc.getCollectionTime();
>   if(time >= 0) {
> garbageCollectionTime += time;
>   }
> }
> System.out.println("Total Garbage Collections: "
> + totalGarbageCollections);
> System.out.println("Total Garbage Collection Time (ms): "
> + garbageCollectionTime);
>   }
>   @Test
>   public void testImpactOnGC() throws Exception {
> scheduler.reinitialize(conf, resourceManager.getRMContext());
> // Add nodes
> int numNode = 1;
> for (int i = 0; i < numNode; ++i) {
> String host = String.format("192.1.%d.%d", i/256, i%256);
> RMNode node =
> MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, 
> host);
> NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node);
> scheduler.handle(nodeEvent);
> assertEquals(1024 * 64 * (i+1), 
> scheduler.getClusterCapacity().getMemory());
> }
> assertEquals(numNode, scheduler.getNumClusterNodes());
> assertEquals(1024 * 64 * numNode, 
> scheduler.getClusterCapacity().getMemory());
> // add apps, each app has 100 containers.
> int minReqSize =
> 
> FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
> int numApp = 8000;
> int priority = 1;
> for (int i = 1; i < numApp + 1; ++i) {
> ApplicationAttemptId attemptId = createAppAttemptId(i, 1);
> AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent(
> attemptId.getApplicationId(), "queue1", "user1");
> scheduler.handle(appAddedEvent);
> AppAttemptAddedSchedulerEvent attemptAddedEvent =
> new AppAttemptAddedSchedulerEvent(attemptId, false);
> scheduler.handle(attemptAddedEvent);
> createSchedulingRequestExistingApplication(minReqSize * 2, 1, 
> priority, attemptId);
> }
> scheduler.update();
> assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", 
> true)
> .getRunnableAppSchedulables().size());
> System.out.println("GC stats before NodeUpdate processing:");
> printGCStats();
> int hb_num = 5000;
> long start = System.nanoTime();
> for (int i = 0; i < hb_num; ++i) {
>   String host = String.format("192.1.%d.%d", i/256, i%256);
>   RMNode node =
>   MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, 
> host);
>   NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node);
>   scheduler.handle(nodeEvent);
> }
> long end = System.nanoTime();
> System.out.printf("processing time for a NodeUpdate in average: %d us\n",
> (end - start)/(hb_num * 1000));
> System.out.println("GC stats after NodeUpdate processing:");
> printGCStats();
>   }



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-03-28 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951704#comment-13951704
 ] 

Vinod Kumar Vavilapalli commented on YARN-1879:
---

*Sigh* I meant "For now, split this JIRA and just make allocate() API 
{{AtMostOnce}}? That is the API that is causing our tests to fail. Thoughts?"

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-03-28 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951700#comment-13951700
 ] 

Vinod Kumar Vavilapalli commented on YARN-1879:
---

That's true. I mentioned retry-cache but didn't say AtMostOnce. My bad.

But I remember the register/unregister APIs didn't have the lastResponse 
mechanism, so register/unregister should be enhanced with retry-cache support.

For now, split this JIRA and just make the allocate() API idempotent? That is the 
API that is causing our tests to fail. Thoughts?

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-03-28 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951696#comment-13951696
 ] 

Jian He commented on YARN-1879:
---

allocate() is similar to nodeHeartBeat, which returns the previous response even 
across multiple retries, so it should be AtMostOnce?
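
For context, a generic, self-contained sketch of the previous-response replay pattern being referred to (invented names; not the actual ApplicationMasterService or NodeManager heartbeat code):

{code}
import java.util.HashMap;
import java.util.Map;

// Generic sketch: the caller echoes the id of the last response it received.
// If that id is one behind the cached response, the cached reply was lost in
// flight and is replayed; otherwise a fresh response is produced and cached.
public class AllocateRetryCacheSketch {
  private static final class CachedResponse {
    final int responseId;
    final String payload;
    CachedResponse(int responseId, String payload) {
      this.responseId = responseId;
      this.payload = payload;
    }
  }

  private final Map<String, CachedResponse> lastResponses = new HashMap<>();

  public synchronized String allocate(String attemptId, int lastSeenResponseId) {
    CachedResponse last = lastResponses.computeIfAbsent(
        attemptId, id -> new CachedResponse(0, "initial response"));
    if (lastSeenResponseId == last.responseId - 1) {
      return last.payload; // duplicate of the previous request: replay it
    }
    CachedResponse fresh = new CachedResponse(
        last.responseId + 1, "allocation #" + (last.responseId + 1));
    lastResponses.put(attemptId, fresh);
    return fresh.payload;
  }
}
{code}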

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-03-28 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951693#comment-13951693
 ] 

Vinod Kumar Vavilapalli commented on YARN-1879:
---

BTW, +1 for marking all of them as idempotent. We already have a retry-cache-like 
mechanism built into the ResourceManager.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-03-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1879:
--

Priority: Blocker  (was: Major)
Target Version/s: 2.4.0  (was: 2.5.0)

Okay, we have been running some tests with RM HA, and because of these missing 
annotations we are running into apps that fail during fail-over.

I think this should be fixed in 2.4.0 itself. [~ozawa]/[~xgong]/[~jianhe], can 
we make progress on this? Thanks!

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1017) Document RM Restart feature

2014-03-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951647#comment-13951647
 ] 

Hudson commented on YARN-1017:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5429 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5429/])
YARN-1017. Added documentation for ResourceManager Restart. (jianhe) (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582913)
* /hadoop/common/trunk/hadoop-project/src/site/site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRestart.apt.vm


> Document RM Restart feature
> ---
>
> Key: YARN-1017
> URL: https://issues.apache.org/jira/browse/YARN-1017
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>Priority: Blocker
> Fix For: 2.4.0
>
> Attachments: rm-restart-doc-1.patch, rm-restart-doc-2.patch, 
> rm-restart-doc-3.patch
>
>
> This should give users a general idea about how RM Restart works and how to 
> use RM Restart



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1885) yarn logs command does not provide the application logs for some applications

2014-03-28 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951644#comment-13951644
 ] 

Jian He commented on YARN-1885:
---

One possible fix: today, on NM re-sync, we ignore all the container statuses 
except the AM container's. We can change this to notify the RMAppAttempt about 
the other containers too, so that the new attempt knows the nodes on which the 
previous containers ran.

> yarn logs command does not provide the application logs for some applications
> -
>
> Key: YARN-1885
> URL: https://issues.apache.org/jira/browse/YARN-1885
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Arpit Gupta
>
> During our HA testing we have seen cases where YARN application logs are not 
> available through the CLI, but I can look at AM logs through the UI. The RM was 
> also being restarted in the background while the application was running.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1888) Not add NodeManager to inactiveRMNodes when reboot NodeManager which have different port

2014-03-28 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951591#comment-13951591
 ] 

Karthik Kambatla commented on YARN-1888:


I don't think this is a bug. People should be able to run multiple NMs on a node. 
Port 0 is primarily for convenience.

> Not add NodeManager to inactiveRMNodes when reboot NodeManager which have 
> different port
> 
>
> Key: YARN-1888
> URL: https://issues.apache.org/jira/browse/YARN-1888
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: zhaoyunjiong
>Priority: Minor
> Attachments: YARN-1888.patch
>
>
> When the NodeManager's port is set to 0, rebooting the NodeManager will make the 
> "Lost Nodes" count inaccurate.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1885) yarn logs command does not provide the application logs for some applications

2014-03-28 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951573#comment-13951573
 ] 

Jian He commented on YARN-1885:
---

If this is the last AM retry, no new AM will be created, and the NM, on re-sync 
after the RM restarts, won't get a notification and won't aggregate the container 
logs for this application at all.

> yarn logs command does not provide the application logs for some applications
> -
>
> Key: YARN-1885
> URL: https://issues.apache.org/jira/browse/YARN-1885
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Arpit Gupta
>
> During our HA testing we have seen cases where YARN application logs are not 
> available through the CLI, but I can look at AM logs through the UI. The RM was 
> also being restarted in the background while the application was running.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1885) yarn logs command does not provide the application logs for some applications

2014-03-28 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951556#comment-13951556
 ] 

Jian He commented on YARN-1885:
---

Talked with Vinod offline. One thing we observed is that after RM restart, if 
the new AM is not scheduled any containers on the nodes on which its previous 
AM's containers were running, such an NM won't get the application_completed 
signal from the RM when the new AM completes, and so it won't aggregate the 
logs for the previous containers of this application. But that should only be a 
leak of local container logs. yarn logs -applicationId shouldn't return 
nothing; it should at least return the latest AM's logs.

> yarn logs command does not provide the application logs for some applications
> -
>
> Key: YARN-1885
> URL: https://issues.apache.org/jira/browse/YARN-1885
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Arpit Gupta
>
> During our HA testing we have seen cases where YARN application logs are not 
> available through the CLI, but I can look at AM logs through the UI. The RM was 
> also being restarted in the background while the application was running.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1892) Excessive logging in RM

2014-03-28 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-1892:


 Summary: Excessive logging in RM
 Key: YARN-1892
 URL: https://issues.apache.org/jira/browse/YARN-1892
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siddharth Seth
Priority: Minor


Mostly in the CapacityScheduler, I believe.

{code}
 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
 Application application_1395435468498_0011 reserved container 
container_1395435468498_0011_01_000213 on node host:  #containers=5 
available=4096 used=20960, currently has 1 at priority 4; currentReservation 
4096
{code}

{code}
INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
hive2 usedResources:  clusterResources:  currentCapacity 0.25 required  
potentialNewCapacity: 0.255 (  max-capacity: 0.25)
{code}
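
As a stop-gap while the statements themselves are cleaned up, the two chatty loggers quoted above can be turned down; a minimal sketch, assuming log4j 1.x (the logger names are taken from the messages above, and the same effect can be had with two lines in log4j.properties):

{code}
import org.apache.log4j.Level;
import org.apache.log4j.LogManager;

// Sketch only: raise the threshold of the two chatty scheduler loggers to WARN
// without touching the rest of the ResourceManager's INFO logging.
public class QuietSchedulerLogging {
  public static void main(String[] args) {
    LogManager.getLogger("org.apache.hadoop.yarn.server.resourcemanager"
        + ".scheduler.SchedulerApplicationAttempt").setLevel(Level.WARN);
    LogManager.getLogger("org.apache.hadoop.yarn.server.resourcemanager"
        + ".scheduler.capacity.LeafQueue").setLevel(Level.WARN);
  }
}
{code}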





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1017) Document RM Restart feature

2014-03-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951540#comment-13951540
 ] 

Hadoop QA commented on YARN-1017:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12637556/rm-restart-doc-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3487//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3487//console

This message is automatically generated.

> Document RM Restart feature
> ---
>
> Key: YARN-1017
> URL: https://issues.apache.org/jira/browse/YARN-1017
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>Priority: Blocker
> Attachments: rm-restart-doc-1.patch, rm-restart-doc-2.patch, 
> rm-restart-doc-3.patch
>
>
> This should give users a general idea about how RM Restart works and how to 
> use RM Restart



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1891) Document NodeManager health-monitoring

2014-03-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951485#comment-13951485
 ] 

Hudson commented on YARN-1891:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5427 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5427/])
YARN-1891. Added documentation for NodeManager health-monitoring. Contributed 
by Varun Vasudev. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582891)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManager.apt.vm


> Document NodeManager health-monitoring
> --
>
> Key: YARN-1891
> URL: https://issues.apache.org/jira/browse/YARN-1891
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: apache-yarn-1891.0.patch
>
>
> Start documenting node manager starting with the health monitoring.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1017) Document RM Restart feature

2014-03-28 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1017:
--

Attachment: rm-restart-doc-3.patch

Thanks Vinod for the review; updated the doc.

> Document RM Restart feature
> ---
>
> Key: YARN-1017
> URL: https://issues.apache.org/jira/browse/YARN-1017
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>Priority: Blocker
> Attachments: rm-restart-doc-1.patch, rm-restart-doc-2.patch, 
> rm-restart-doc-3.patch
>
>
> This should give users a general idea about how RM Restart works and how to 
> use RM Restart



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1891) Document NodeManager health-monitoring

2014-03-28 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951459#comment-13951459
 ] 

Vinod Kumar Vavilapalli commented on YARN-1891:
---

Great, this looks good, +1. Checking this in.

> Document NodeManager health-monitoring
> --
>
> Key: YARN-1891
> URL: https://issues.apache.org/jira/browse/YARN-1891
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Minor
> Attachments: apache-yarn-1891.0.patch
>
>
> Start documenting node manager starting with the health monitoring.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues

2014-03-28 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951458#comment-13951458
 ] 

Sandy Ryza commented on YARN-1864:
--

With your proposal, would it be possible to choose different parent queues 
depending on the submission? A couple of policies we'd like to ultimately be able 
to support would be "choose the parent queue based on the user's primary group, 
and then place the job into a queue underneath it with the user's name" and "if 
a user is any of a defined set of users, put it in a particular parent queue, 
and then create a child queue underneath it with the user's name".
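
A small, purely hypothetical sketch of the two policies described above (the class and method names are invented for illustration and are not the Fair Scheduler's actual placement-rule API):

{code}
import java.util.Set;

// Hypothetical illustration: pick a parent queue from the submission's
// attributes, then nest a per-user child queue underneath it.
public class ParentQueuePlacementSketch {
  private final Set<String> specialUsers;

  public ParentQueuePlacementSketch(Set<String> specialUsers) {
    this.specialUsers = specialUsers;
  }

  /** Policy 1: parent queue from the user's primary group, child from the user. */
  public String byPrimaryGroup(String user, String primaryGroup) {
    return "root." + primaryGroup + "." + user;
  }

  /** Policy 2: a fixed parent queue for a defined set of users, child from the user. */
  public String bySpecialUserSet(String user, String specialParent, String defaultParent) {
    String parent = specialUsers.contains(user) ? specialParent : defaultParent;
    return parent + "." + user;
  }
}
{code}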

> Fair Scheduler Dynamic Hierarchical User Queues
> ---
>
> Key: YARN-1864
> URL: https://issues.apache.org/jira/browse/YARN-1864
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Ashwin Shankar
>  Labels: scheduler
> Attachments: YARN-1864-v1.txt
>
>
> In Fair Scheduler, we want to be able to create user queues under any parent 
> queue in the hierarchy. For example, say user1 submits a job to a parent queue 
> called root.allUserQueues; we want to be able to create a new queue called 
> root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted 
> by this user to root.allUserQueues will run in this newly created 
> root.allUserQueues.user1.
> This is very similar to the 'user-as-default' feature in Fair Scheduler, which 
> creates user queues under the root queue. But we want the ability to create user 
> queues under ANY parent queue.
> Why do we want this?
> 1. Preemption: these dynamically created user queues can preempt each other 
> if their fair share is not met, so there is fairness among users. User queues 
> can also preempt other non-user leaf queues if they are below their fair share.
> 2. Allocation to user queues: we want all the user queries (ad hoc) to consume 
> only a fraction of the resources in the shared cluster. With this feature, we 
> could do that by giving a fair share to the parent user queue, which is then 
> redistributed to all the dynamically created user queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions

2014-03-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951456#comment-13951456
 ] 

Hadoop QA commented on YARN-596:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12637524/YARN-596.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3486//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3486//console

This message is automatically generated.

> In fair scheduler, intra-application container priorities affect 
> inter-application preemption decisions
> ---
>
> Key: YARN-596
> URL: https://issues.apache.org/jira/browse/YARN-596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
> YARN-596.patch, YARN-596.patch
>
>
> In the fair scheduler, containers are chosen for preemption in the following 
> way:
> All containers for all apps that are in queues that are over their fair share 
> are put in a list.
> The list is sorted in order of the priority that the container was requested 
> in.
> This means that an application can shield itself from preemption by 
> requesting its containers at higher priorities, which doesn't really make 
> sense.
> Also, an application that is not over its fair share, but that is in a queue 
> that is over its fair share, is just as likely to have containers preempted 
> as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1017) Document RM Restart feature

2014-03-28 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951427#comment-13951427
 ] 

Vinod Kumar Vavilapalli commented on YARN-1017:
---

+1, except for a couple of places where you say "configures" instead of 
"configuration".

> Document RM Restart feature
> ---
>
> Key: YARN-1017
> URL: https://issues.apache.org/jira/browse/YARN-1017
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>Priority: Blocker
> Attachments: rm-restart-doc-1.patch, rm-restart-doc-2.patch
>
>
> This should give users a general idea about how RM Restart works and how to 
> use RM Restart



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions

2014-03-28 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-596:
-

Attachment: YARN-596.patch

Uploaded a new patch without cloneQueueApps(). Instead, it adds preemptedResources 
to the FSSchedulerApp.

> In fair scheduler, intra-application container priorities affect 
> inter-application preemption decisions
> ---
>
> Key: YARN-596
> URL: https://issues.apache.org/jira/browse/YARN-596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
> YARN-596.patch, YARN-596.patch
>
>
> In the fair scheduler, containers are chosen for preemption in the following 
> way:
> All containers for all apps that are in queues that are over their fair share 
> are put in a list.
> The list is sorted in order of the priority that the container was requested 
> in.
> This means that an application can shield itself from preemption by 
> requesting its containers at higher priorities, which doesn't really make 
> sense.
> Also, an application that is not over its fair share, but that is in a queue 
> that is over its fair share, is just as likely to have containers preempted 
> as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1883) TestRMAdminService fails due to inconsistent entries in UserGroups

2014-03-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951290#comment-13951290
 ] 

Hudson commented on YARN-1883:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5424 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5424/])
YARN-1883. TestRMAdminService fails due to inconsistent entries in UserGroups 
(Mit Desai via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582862)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java


> TestRMAdminService fails due to inconsistent entries in UserGroups
> --
>
> Key: YARN-1883
> URL: https://issues.apache.org/jira/browse/YARN-1883
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>  Labels: java7
> Fix For: 3.0.0, 2.5.0
>
> Attachments: YARN-1883.patch, YARN-1883.patch
>
>
> testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider fails 
> with the following error:
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:92)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertTrue(Assert.java:54)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:421)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testOrder(TestRMAdminService.java:104)
> {noformat}
> Line numbers will be inconsistent because I was running the tests in a particular 
> order, but the line on which the failure occurs is
> {code}
> Assert.assertTrue(groupBefore.contains("test_group_A")
> && groupBefore.contains("test_group_B")
> && groupBefore.contains("test_group_C") && groupBefore.size() == 3);
> {code}
> testRMInitialsWithFileSystemBasedConfigurationProvider() and
> testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider()
> call the function {{MockUnixGroupsMapping.updateGroups();}}, which changes 
> the list of userGroups.
> testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() 
> tries to verify the groups before changing them and fails if 
> testRMInitialsWithFileSystemBasedConfigurationProvider() already ran and made 
> the changes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1883) TestRMAdminService fails due to inconsistent entries in UserGroups

2014-03-28 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951270#comment-13951270
 ] 

Jonathan Eagles commented on YARN-1883:
---

+1. Thanks for cleaning this test up. The double-brace initialization that 
was there before is considered a hack since it creates an anonymous 
subclass with an instance initializer. Committing to trunk and branch-2.

jeagles
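
For context, a small self-contained illustration of the pattern being removed and a plain alternative (generic group names; the test's actual data is not reproduced here):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DoubleBraceExample {
  // "Double brace" initialization: this creates an anonymous subclass of
  // ArrayList whose instance initializer adds the elements. Every use defines
  // a new class (and, in an instance context, captures the enclosing object),
  // which is why it is generally considered a hack.
  static final List<String> GROUPS_DOUBLE_BRACE = new ArrayList<String>() {{
    add("group_A");
    add("group_B");
  }};

  // Plain alternative with the same contents and no extra class.
  static final List<String> GROUPS_PLAIN =
      new ArrayList<>(Arrays.asList("group_A", "group_B"));
}
{code}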

> TestRMAdminService fails due to inconsistent entries in UserGroups
> --
>
> Key: YARN-1883
> URL: https://issues.apache.org/jira/browse/YARN-1883
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>  Labels: java7
> Attachments: YARN-1883.patch, YARN-1883.patch
>
>
> testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider fails 
> with the following error:
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:92)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertTrue(Assert.java:54)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:421)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testOrder(TestRMAdminService.java:104)
> {noformat}
> Line numbers will be inconsistent because I was running the tests in a particular 
> order, but the line on which the failure occurs is
> {code}
> Assert.assertTrue(groupBefore.contains("test_group_A")
> && groupBefore.contains("test_group_B")
> && groupBefore.contains("test_group_C") && groupBefore.size() == 3);
> {code}
> testRMInitialsWithFileSystemBasedConfigurationProvider() and
> testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider()
> call the function {{MockUnixGroupsMapping.updateGroups();}}, which changes 
> the list of userGroups.
> testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() 
> tries to verify the groups before changing them and fails if 
> testRMInitialsWithFileSystemBasedConfigurationProvider() already ran and made 
> the changes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator

2014-03-28 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950749#comment-13950749
 ] 

Fengdong Yu commented on YARN-1889:
---

Good catch, Zhiguo.

Can you add some test cases to your patch?
Please replace tabs in your code with spaces.

{code}
+  private Priority priority = recordFactory.newRecordInstance(Priority.class);
+  private ResourceWeights resourceWeights = new ResourceWeights();
{code}

Can you move these into the constructor?

{code}
+  public ResourceWeights getResourceWeightsObject() {
+   return resourceWeights;
+  }
{code}

It would be better to name this "getResourceWeights()".


> avoid creating new objects on each fair scheduler call to AppSchedulable 
> comparator
> ---
>
> Key: YARN-1889
> URL: https://issues.apache.org/jira/browse/YARN-1889
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Hong Zhiguo
>Priority: Minor
> Attachments: YARN-1889.patch
>
>
> In fair scheduler, in each scheduling attempt, a full sort is
> performed on List of AppSchedulable, which invokes Comparator.compare
> method many times. Both FairShareComparator and DRFComparator call
> AppSchedulable.getWeights, and AppSchedulable.getPriority.
> A new ResourceWeights object is allocated on each call of getWeights,
> and the same for getPriority. This introduces a lot of pressure to
> GC because these methods are called very very frequently.
> The test case below shows the improvement in performance and GC behaviour. The 
> results show that the GC pressure during NodeUpdate processing is reduced by 
> half by this patch.
> The code to show the improvement: (Add it to TestFairScheduler.java)
> import java.lang.management.GarbageCollectorMXBean;
> import java.lang.management.ManagementFactory;
>   public void printGCStats() {
> long totalGarbageCollections = 0;
> long garbageCollectionTime = 0;
> for(GarbageCollectorMXBean gc :
>   ManagementFactory.getGarbageCollectorMXBeans()) {
>   long count = gc.getCollectionCount();
>   if(count >= 0) {
> totalGarbageCollections += count;
>   }
>   long time = gc.getCollectionTime();
>   if(time >= 0) {
> garbageCollectionTime += time;
>   }
> }
> System.out.println("Total Garbage Collections: "
> + totalGarbageCollections);
> System.out.println("Total Garbage Collection Time (ms): "
> + garbageCollectionTime);
>   }
>   @Test
>   public void testImpactOnGC() throws Exception {
> scheduler.reinitialize(conf, resourceManager.getRMContext());
> // Add nodes
> int numNode = 1;
> for (int i = 0; i < numNode; ++i) {
> String host = String.format("192.1.%d.%d", i/256, i%256);
> RMNode node =
> MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, 
> host);
> NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node);
> scheduler.handle(nodeEvent);
> assertEquals(1024 * 64 * (i+1), 
> scheduler.getClusterCapacity().getMemory());
> }
> assertEquals(numNode, scheduler.getNumClusterNodes());
> assertEquals(1024 * 64 * numNode, 
> scheduler.getClusterCapacity().getMemory());
> // add apps, each app has 100 containers.
> int minReqSize =
> 
> FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
> int numApp = 8000;
> int priority = 1;
> for (int i = 1; i < numApp + 1; ++i) {
> ApplicationAttemptId attemptId = createAppAttemptId(i, 1);
> AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent(
> attemptId.getApplicationId(), "queue1", "user1");
> scheduler.handle(appAddedEvent);
> AppAttemptAddedSchedulerEvent attemptAddedEvent =
> new AppAttemptAddedSchedulerEvent(attemptId, false);
> scheduler.handle(attemptAddedEvent);
> createSchedulingRequestExistingApplication(minReqSize * 2, 1, 
> priority, attemptId);
> }
> scheduler.update();
> assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", 
> true)
> .getRunnableAppSchedulables().size());
> System.out.println("GC stats before NodeUpdate processing:");
> printGCStats();
> int hb_num = 5000;
> long start = System.nanoTime();
> for (int i = 0; i < hb_num; ++i) {
>   String host = String.format("192.1.%d.%d", i/256, i%256);
>   RMNode node =
>   MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, 
> host);
>   NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node);
>   scheduler.handle(nodeEvent);
> }
> long end = System.nanoTime();
> System.out.printf("processing time for a NodeUp

[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions

2014-03-28 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951126#comment-13951126
 ] 

Wei Yan commented on YARN-596:
--

I thought about that approach before, but I forget why I gave it up.
I'll update the patch using that approach.

> In fair scheduler, intra-application container priorities affect 
> inter-application preemption decisions
> ---
>
> Key: YARN-596
> URL: https://issues.apache.org/jira/browse/YARN-596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
> YARN-596.patch
>
>
> In the fair scheduler, containers are chosen for preemption in the following 
> way:
> All containers for all apps that are in queues that are over their fair share 
> are put in a list.
> The list is sorted in order of the priority that the container was requested 
> in.
> This means that an application can shield itself from preemption by 
> requesting its containers at higher priorities, which doesn't really make 
> sense.
> Also, an application that is not over its fair share, but that is in a queue 
> that is over its fair share, is just as likely to have containers preempted 
> as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions

2014-03-28 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951040#comment-13951040
 ] 

Sandy Ryza commented on YARN-596:
-

Cloning the full queue/app tree makes preemption O(apps) vs. O(log(apps)), 
which seems expensive to do every time we want to preempt.

Could we possibly add a field to Schedulable called something like 
preemptedResources that gets subtracted when calculating getResourceUsage, and 
cleared out at the end of preemptResources?

Also, minor nit: better to use "sched" than "sche" for consistency with other 
places in the code.
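
A rough sketch of that suggestion, with the Resource arithmetic reduced to a single long for brevity (the real patch would use the Resources helpers and live in Schedulable/FSSchedulerApp):

{code}
// Simplified illustration of the proposal: track resources that have been
// preempted but not yet released, subtract them when reporting usage, and
// clear the counter once a preemption round finishes.
public class SchedulableSketch {
  private long currentUsage;       // stand-in for the real Resource usage
  private long preemptedResources; // cleared at the end of preemptResources()

  public long getResourceUsage() {
    return currentUsage - preemptedResources;
  }

  public void addPreemption(long amount) {
    preemptedResources += amount;
  }

  public void clearPreemptedResources() {
    preemptedResources = 0;
  }

  public void setCurrentUsage(long usage) {
    currentUsage = usage;
  }
}
{code}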

> In fair scheduler, intra-application container priorities affect 
> inter-application preemption decisions
> ---
>
> Key: YARN-596
> URL: https://issues.apache.org/jira/browse/YARN-596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
> YARN-596.patch
>
>
> In the fair scheduler, containers are chosen for preemption in the following 
> way:
> All containers for all apps that are in queues that are over their fair share 
> are put in a list.
> The list is sorted in order of the priority that the container was requested 
> in.
> This means that an application can shield itself from preemption by 
> requesting its containers at higher priorities, which doesn't really make 
> sense.
> Also, an application that is not over its fair share, but that is in a queue 
> that is over its fair share, is just as likely to have containers preempted 
> as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1853) Allow containers to be ran under real user even in insecure mode

2014-03-28 Thread Andrey Stepachev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Stepachev updated YARN-1853:
---

Component/s: resourcemanager

> Allow containers to be ran under real user even in insecure mode
> 
>
> Key: YARN-1853
> URL: https://issues.apache.org/jira/browse/YARN-1853
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.2.0
>Reporter: Andrey Stepachev
> Attachments: YARN-1853.patch, YARN-1853.patch
>
>
> Currently an insecure cluster runs all containers under one user (typically 
> nobody). That is not appropriate, because YARN applications don't play well 
> with HDFS when permissions are enabled: applications try to write data (as 
> expected) into /user/nobody regardless of the user who launched the application.
> Another side effect is that it is not possible to configure cgroups for 
> particular users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1885) yarn logs command does not provide the application logs for some applications

2014-03-28 Thread Arpit Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950976#comment-13950976
 ] 

Arpit Gupta commented on YARN-1885:
---

yarn logs -applicationId returned nothing, and the application was done when I 
checked the RM UI.

> yarn logs command does not provide the application logs for some applications
> -
>
> Key: YARN-1885
> URL: https://issues.apache.org/jira/browse/YARN-1885
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Arpit Gupta
>
> During our HA testing we have seen cases where YARN application logs are not 
> available through the CLI, but I can look at the AM logs through the UI. The 
> RM was also being restarted in the background while the application was 
> running.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator

2014-03-28 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950999#comment-13950999
 ] 

Sandy Ryza commented on YARN-1889:
--

When you say "gc pressure", which is going down?  The number of gc's or the 
time spent in each gc (or both)?

> avoid creating new objects on each fair scheduler call to AppSchedulable 
> comparator
> ---
>
> Key: YARN-1889
> URL: https://issues.apache.org/jira/browse/YARN-1889
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Hong Zhiguo
>Priority: Minor
> Attachments: YARN-1889.patch
>
>
> In fair scheduler, in each scheduling attempt, a full sort is
> performed on List of AppSchedulable, which invokes Comparator.compare
> method many times. Both FairShareComparator and DRFComparator call
> AppSchedulable.getWeights, and AppSchedulable.getPriority.
> A new ResourceWeights object is allocated on each call of getWeights,
> and the same for getPriority. This puts a lot of pressure on the
> GC because these methods are called very frequently.
> The test case below shows the improvement in performance and GC behaviour. 
> The results show that the GC pressure while processing NodeUpdate is reduced 
> by half with this patch.
> The code to show the improvement: (Add it to TestFairScheduler.java)
> import java.lang.management.GarbageCollectorMXBean;
> import java.lang.management.ManagementFactory;
>   public void printGCStats() {
> long totalGarbageCollections = 0;
> long garbageCollectionTime = 0;
> for(GarbageCollectorMXBean gc :
>   ManagementFactory.getGarbageCollectorMXBeans()) {
>   long count = gc.getCollectionCount();
>   if(count >= 0) {
> totalGarbageCollections += count;
>   }
>   long time = gc.getCollectionTime();
>   if(time >= 0) {
> garbageCollectionTime += time;
>   }
> }
> System.out.println("Total Garbage Collections: "
> + totalGarbageCollections);
> System.out.println("Total Garbage Collection Time (ms): "
> + garbageCollectionTime);
>   }
>   @Test
>   public void testImpactOnGC() throws Exception {
> scheduler.reinitialize(conf, resourceManager.getRMContext());
> // Add nodes
> int numNode = 1;
> for (int i = 0; i < numNode; ++i) {
> String host = String.format("192.1.%d.%d", i/256, i%256);
> RMNode node =
> MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, 
> host);
> NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node);
> scheduler.handle(nodeEvent);
> assertEquals(1024 * 64 * (i+1), 
> scheduler.getClusterCapacity().getMemory());
> }
> assertEquals(numNode, scheduler.getNumClusterNodes());
> assertEquals(1024 * 64 * numNode, 
> scheduler.getClusterCapacity().getMemory());
> // add apps, each app has 100 containers.
> int minReqSize =
> 
> FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
> int numApp = 8000;
> int priority = 1;
> for (int i = 1; i < numApp + 1; ++i) {
> ApplicationAttemptId attemptId = createAppAttemptId(i, 1);
> AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent(
> attemptId.getApplicationId(), "queue1", "user1");
> scheduler.handle(appAddedEvent);
> AppAttemptAddedSchedulerEvent attemptAddedEvent =
> new AppAttemptAddedSchedulerEvent(attemptId, false);
> scheduler.handle(attemptAddedEvent);
> createSchedulingRequestExistingApplication(minReqSize * 2, 1, 
> priority, attemptId);
> }
> scheduler.update();
> assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", 
> true)
> .getRunnableAppSchedulables().size());
> System.out.println("GC stats before NodeUpdate processing:");
> printGCStats();
> int hb_num = 5000;
> long start = System.nanoTime();
> for (int i = 0; i < hb_num; ++i) {
>   String host = String.format("192.1.%d.%d", i/256, i%256);
>   RMNode node =
>   MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, 
> host);
>   NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node);
>   scheduler.handle(nodeEvent);
> }
> long end = System.nanoTime();
> System.out.printf("processing time for a NodeUpdate in average: %d us\n",
> (end - start)/(hb_num * 1000));
> System.out.println("GC stats after NodeUpdate processing:");
> printGCStats();
>   }



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator

2014-03-28 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951002#comment-13951002
 ] 

Sandy Ryza commented on YARN-1889:
--

Another nit:
Priority.newInstance should be used instead of recordFactory
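
For reference, a small sketch of the two styles; the record-factory form is 
assumed to be what the patch currently uses, since the patch itself is not 
quoted in this thread:

{noformat}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.factories.RecordFactory;
import org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider;

public class PrioritySketch {
  public static void main(String[] args) {
    // Record-factory style (what the nit suggests replacing):
    RecordFactory recordFactory = RecordFactoryProvider.getRecordFactory(null);
    Priority viaFactory = recordFactory.newRecordInstance(Priority.class);
    viaFactory.setPriority(1);

    // Suggested style:
    Priority viaNewInstance = Priority.newInstance(1);

    System.out.println(viaFactory.getPriority() == viaNewInstance.getPriority()); // true
  }
}
{noformat}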

> avoid creating new objects on each fair scheduler call to AppSchedulable 
> comparator
> ---
>
> Key: YARN-1889
> URL: https://issues.apache.org/jira/browse/YARN-1889
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Hong Zhiguo
>Priority: Minor
> Attachments: YARN-1889.patch
>
>
> In fair scheduler, in each scheduling attempt, a full sort is
> performed on List of AppSchedulable, which invokes Comparator.compare
> method many times. Both FairShareComparator and DRFComparator call
> AppSchedulable.getWeights, and AppSchedulable.getPriority.
> A new ResourceWeights object is allocated on each call of getWeights,
> and the same for getPriority. This puts a lot of pressure on the
> GC because these methods are called very frequently.
> The test case below shows the improvement in performance and GC behaviour. 
> The results show that the GC pressure while processing NodeUpdate is reduced 
> by half with this patch.
> The code to show the improvement: (Add it to TestFairScheduler.java)
> import java.lang.management.GarbageCollectorMXBean;
> import java.lang.management.ManagementFactory;
>   public void printGCStats() {
> long totalGarbageCollections = 0;
> long garbageCollectionTime = 0;
> for(GarbageCollectorMXBean gc :
>   ManagementFactory.getGarbageCollectorMXBeans()) {
>   long count = gc.getCollectionCount();
>   if(count >= 0) {
> totalGarbageCollections += count;
>   }
>   long time = gc.getCollectionTime();
>   if(time >= 0) {
> garbageCollectionTime += time;
>   }
> }
> System.out.println("Total Garbage Collections: "
> + totalGarbageCollections);
> System.out.println("Total Garbage Collection Time (ms): "
> + garbageCollectionTime);
>   }
>   @Test
>   public void testImpactOnGC() throws Exception {
> scheduler.reinitialize(conf, resourceManager.getRMContext());
> // Add nodes
> int numNode = 1;
> for (int i = 0; i < numNode; ++i) {
> String host = String.format("192.1.%d.%d", i/256, i%256);
> RMNode node =
> MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, 
> host);
> NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node);
> scheduler.handle(nodeEvent);
> assertEquals(1024 * 64 * (i+1), 
> scheduler.getClusterCapacity().getMemory());
> }
> assertEquals(numNode, scheduler.getNumClusterNodes());
> assertEquals(1024 * 64 * numNode, 
> scheduler.getClusterCapacity().getMemory());
> // add apps, each app has 100 containers.
> int minReqSize =
> 
> FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
> int numApp = 8000;
> int priority = 1;
> for (int i = 1; i < numApp + 1; ++i) {
> ApplicationAttemptId attemptId = createAppAttemptId(i, 1);
> AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent(
> attemptId.getApplicationId(), "queue1", "user1");
> scheduler.handle(appAddedEvent);
> AppAttemptAddedSchedulerEvent attemptAddedEvent =
> new AppAttemptAddedSchedulerEvent(attemptId, false);
> scheduler.handle(attemptAddedEvent);
> createSchedulingRequestExistingApplication(minReqSize * 2, 1, 
> priority, attemptId);
> }
> scheduler.update();
> assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", 
> true)
> .getRunnableAppSchedulables().size());
> System.out.println("GC stats before NodeUpdate processing:");
> printGCStats();
> int hb_num = 5000;
> long start = System.nanoTime();
> for (int i = 0; i < hb_num; ++i) {
>   String host = String.format("192.1.%d.%d", i/256, i%256);
>   RMNode node =
>   MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, 
> host);
>   NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node);
>   scheduler.handle(nodeEvent);
> }
> long end = System.nanoTime();
> System.out.printf("processing time for a NodeUpdate in average: %d us\n",
> (end - start)/(hb_num * 1000));
> System.out.println("GC stats after NodeUpdate processing:");
> printGCStats();
>   }



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1853) Allow containers to be run under real user even in insecure mode

2014-03-28 Thread Andrey Stepachev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Stepachev updated YARN-1853:
---

Attachment: YARN-1853.patch

Updated patch. RMAppManager should check that the user exists before submitting 
the app in insecure mode, and reject the submission if no such user is found.
(This patch is defensive: it checks the user only in non-impersonation mode.)
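
A rough sketch of that kind of check follows; the helper names and the 
shell-based lookup are assumptions for illustration, not what the attached 
patch actually does:

{noformat}
import java.io.IOException;

import org.apache.hadoop.util.Shell;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Illustration only; not the attached patch. RMAppManager wiring is omitted
// and the helper names are made up.
public class UserExistenceCheckSketch {

  static boolean userExists(String user) {
    try {
      // "id <user>" exits non-zero (and Shell throws) when the user is unknown.
      Shell.execCommand("id", user);
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  static void checkSubmission(String user, boolean impersonationEnabled)
      throws YarnException {
    // Defensive: only enforce the check when not impersonating.
    if (!impersonationEnabled && !userExists(user)) {
      throw new YarnException("Rejecting application: unknown user " + user);
    }
  }

  public static void main(String[] args) throws YarnException {
    checkSubmission(System.getProperty("user.name"), false); // passes for a real user
  }
}
{noformat}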

> Allow containers to be run under real user even in insecure mode
> 
>
> Key: YARN-1853
> URL: https://issues.apache.org/jira/browse/YARN-1853
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Andrey Stepachev
> Attachments: YARN-1853.patch, YARN-1853.patch
>
>
> Currently an insecure cluster runs all containers under one user (typically 
> nobody). That is not appropriate, because YARN applications don't play well 
> with HDFS when permissions are enabled. YARN applications try to write data 
> (as expected) into /user/nobody regardless of the user who launched the 
> application.
> Another side effect is that it is not possible to configure cgroups for 
> particular users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1891) Document NodeManager health-monitoring

2014-03-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950624#comment-13950624
 ] 

Hadoop QA commented on YARN-1891:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12637374/apache-yarn-1891.0.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3485//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3485//console

This message is automatically generated.

> Document NodeManager health-monitoring
> --
>
> Key: YARN-1891
> URL: https://issues.apache.org/jira/browse/YARN-1891
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Minor
> Attachments: apache-yarn-1891.0.patch
>
>
> Start documenting node manager starting with the health monitoring.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1703) Too many connections are opened for proxy server when applicationMaster UI is accessed.

2014-03-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950590#comment-13950590
 ] 

Hadoop QA commented on YARN-1703:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12637370/YARN-1703.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3484//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3484//console

This message is automatically generated.

> Too many connections are opened for proxy server when applicationMaster UI is 
> accessed.
> ---
>
> Key: YARN-1703
> URL: https://issues.apache.org/jira/browse/YARN-1703
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: YARN-1703.1.patch, YARN-1703.2.patch
>
>
> If a running job is accessed for progress, there are many CLOSE_WAIT 
> connections for the proxy server. Even though the connection is released, it 
> is made available again to the HttpClient instance, but it is not closed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1891) Document NodeManager health-monitoring

2014-03-28 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-1891:


Attachment: apache-yarn-1891.0.patch

Patch to create NodeManager documentation and document health monitoring.

> Document NodeManager health-monitoring
> --
>
> Key: YARN-1891
> URL: https://issues.apache.org/jira/browse/YARN-1891
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Minor
> Attachments: apache-yarn-1891.0.patch
>
>
> Start documenting node manager starting with the health monitoring.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1891) Document NodeManager health-monitoring

2014-03-28 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-1891:
---

 Summary: Document NodeManager health-monitoring
 Key: YARN-1891
 URL: https://issues.apache.org/jira/browse/YARN-1891
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Minor


Start documenting node manager starting with the health monitoring.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator

2014-03-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950576#comment-13950576
 ] 

Hadoop QA commented on YARN-1889:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12637367/YARN-1889.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3483//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3483//console

This message is automatically generated.

> avoid creating new objects on each fair scheduler call to AppSchedulable 
> comparator
> ---
>
> Key: YARN-1889
> URL: https://issues.apache.org/jira/browse/YARN-1889
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Hong Zhiguo
>Priority: Minor
> Attachments: YARN-1889.patch
>
>
> In fair scheduler, in each scheduling attempt, a full sort is
> performed on List of AppSchedulable, which invokes Comparator.compare
> method many times. Both FairShareComparator and DRFComparator call
> AppSchedulable.getWeights, and AppSchedulable.getPriority.
> A new ResourceWeights object is allocated on each call of getWeights,
> and the same for getPriority. This puts a lot of pressure on the
> GC because these methods are called very frequently.
> The test case below shows the improvement in performance and GC behaviour. 
> The results show that the GC pressure while processing NodeUpdate is reduced 
> by half with this patch.
> The code to show the improvement: (Add it to TestFairScheduler.java)
> import java.lang.management.GarbageCollectorMXBean;
> import java.lang.management.ManagementFactory;
>   public void printGCStats() {
> long totalGarbageCollections = 0;
> long garbageCollectionTime = 0;
> for(GarbageCollectorMXBean gc :
>   ManagementFactory.getGarbageCollectorMXBeans()) {
>   long count = gc.getCollectionCount();
>   if(count >= 0) {
> totalGarbageCollections += count;
>   }
>   long time = gc.getCollectionTime();
>   if(time >= 0) {
> garbageCollectionTime += time;
>   }
> }
> System.out.println("Total Garbage Collections: "
> + totalGarbageCollections);
> System.out.println("Total Garbage Collection Time (ms): "
> + garbageCollectionTime);
>   }
>   @Test
>   public void testImpactOnGC() throws Exception {
> scheduler.reinitialize(conf, resourceManager.getRMContext());
> // Add nodes
> int numNode = 1;
> for (int i = 0; i < numNode; ++i) {
> String host = String.format("192.1.%d.%d", i/256, i%256);
> RMNode node =
> MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, 
> host);
> NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node);
> scheduler.handle(nodeEvent);
> assertEquals(1024 * 64 * (i+1), 
> scheduler.getClusterCapacity().getMemory());
> }
> assertEquals(numNode, scheduler.getNumClusterNodes());
> assertEquals(1024 * 64 * numNode, 
> scheduler.getClusterCapacity().getMemory());
> // add apps, each app has 100 containers.
> int minReqSize =
> 
> FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
> int numApp = 8000;
> int priority = 1;
> for (int i = 1; i < numApp + 1; ++i) {
> ApplicationAttemptId attemptId = createAppAttemptId(i, 1);
> AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent(
> attemptId.getApplicationId(), "queue1", "user1");
> scheduler.h

[jira] [Commented] (YARN-1890) Too many unnecessary logs are logged while accessing applicationMaster web UI.

2014-03-28 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950566#comment-13950566
 ] 

Rohith commented on YARN-1890:
--

Should the log priority be changed to DEBUG? I do not understand why so many 
GET requests are being sent :-(
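
A minimal sketch of what a DEBUG-level guard could look like for the 
per-request line in WebAppProxyServlet.doGet(); the LOG field and message text 
below are assumptions, not the actual YARN source:

{noformat}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Sketch only: the field name and message text mirror the log lines below,
// not the actual WebAppProxyServlet code.
public class ProxyLoggingSketch {
  private static final Log LOG = LogFactory.getLog(ProxyLoggingSketch.class);

  // Emit the per-request line only when DEBUG is enabled, so routine GETs for
  // static resources no longer flood the INFO logs.
  static void logAccess(String remoteUser, String target, String appId, String owner) {
    if (LOG.isDebugEnabled()) {
      LOG.debug(remoteUser + " is accessing unchecked " + target
          + " which is the app master GUI of " + appId + " owned by " + owner);
    }
  }

  public static void main(String[] args) {
    logAccess("dr.who", "http://host-10-18-40-71:42769/mapreduce",
        "application_1395977591056_0008", "root");
  }
}
{noformat}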

> Too many unnecessary logs are logged while accessing applicationMaster web UI.
> --
>
> Key: YARN-1890
> URL: https://issues.apache.org/jira/browse/YARN-1890
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
>
> Accessing the ApplicationMaster UI, which is redirected from the RM UI, writes 
> too many logs to the ResourceManager and ProxyServer logs. On every refresh, 
> logging is done in WebAppProxyServlet.doGet(). All my RM and ProxyServer logs 
> are filled with UI information logs which are not really necessary for the user.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1890) Too many unnecessary logs are logged while accessing applicationMaster web UI.

2014-03-28 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950564#comment-13950564
 ] 

Rohith commented on YARN-1890:
--

The logs below are all written in one shot on a single refresh.
{noformat}
2014-03-28 15:48:24,456 INFO 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing 
unchecked http://host-10-18-40-71:42769/mapreduce which is the app master GUI 
of application_1395977591056_0008 owned by root
2014-03-28 15:48:24,506 INFO 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing 
unchecked http://host-10-18-40-71:42769/static/yarn.css which is the app master 
GUI of application_1395977591056_0008 owned by root
2014-03-28 15:48:24,507 INFO 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing 
unchecked http://host-10-18-40-71:42769/static/jquery/jquery-1.8.2.min.js which 
is the app master GUI of application_1395977591056_0008 owned by root
2014-03-28 15:48:24,508 INFO 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing 
unchecked 
http://host-10-18-40-71:42769/static/jquery/jquery-ui-1.9.1.custom.min.js which 
is the app master GUI of application_1395977591056_0008 owned by root
2014-03-28 15:48:24,510 INFO 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing 
unchecked 
http://host-10-18-40-71:42769/static/dt-1.9.4/js/jquery.dataTables.min.js which 
is the app master GUI of application_1395977591056_0008 owned by root
2014-03-28 15:48:24,511 INFO 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing 
unchecked 
http://host-10-18-40-71:42769/static/jquery/themes-1.9.1/base/jquery-ui.css 
which is the app master GUI of application_1395977591056_0008 owned by root
2014-03-28 15:48:24,548 INFO 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing 
unchecked http://host-10-18-40-71:42769/static/dt-1.9.4/css/jui-dt.css which is 
the app master GUI of application_1395977591056_0008 owned by root
2014-03-28 15:48:24,626 INFO 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing 
unchecked http://host-10-18-40-71:42769/static/yarn.dt.plugins.js which is the 
app master GUI of application_1395977591056_0008 owned by root
2014-03-28 15:48:24,836 INFO 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing 
unchecked 
http://host-10-18-40-71:42769/static/jquery/themes-1.9.1/base/images/ui-bg_glass_95_fef1ec_1x400.png
 which is the app master GUI of application_1395977591056_0008 owned by root
2014-03-28 15:48:24,841 INFO 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing 
unchecked 
http://host-10-18-40-71:42769/static/jquery/themes-1.9.1/base/images/ui-bg_flat_75_ff_40x100.png
 which is the app master GUI of application_1395977591056_0008 owned by root
2014-03-28 15:48:24,841 INFO 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing 
unchecked 
http://host-10-18-40-71:42769/static/jquery/themes-1.9.1/base/images/ui-bg_highlight-soft_75_cc_1x100.png
 which is the app master GUI of application_1395977591056_0008 owned by root
2014-03-28 15:48:24,843 INFO 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing 
unchecked 
http://host-10-18-40-71:42769/static/jquery/themes-1.9.1/base/images/ui-icons_88_256x240.png
 which is the app master GUI of application_1395977591056_0008 owned by root
2014-03-28 15:48:24,844 INFO 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing 
unchecked 
http://host-10-18-40-71:42769/static/jquery/themes-1.9.1/base/images/ui-icons_454545_256x240.png
 which is the app master GUI of application_1395977591056_0008 owned by root
2014-03-28 15:48:24,844 INFO 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing 
unchecked 
http://host-10-18-40-71:42769/static/jquery/themes-1.9.1/base/images/ui-bg_glass_75_e6e6e6_1x400.png
 which is the app master GUI of application_1395977591056_0008 owned by root
2014-03-28 15:48:24,871 INFO 
org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing 
unchecked 
http://host-10-18-40-71:42769/static/jquery/themes-1.9.1/base/images/ui-bg_glass_65_ff_1x400.png
 which is the app master GUI of application_1395977591056_0008 owned by root

{noformat}

> Too many unnecessary logs are logged while accessing applicationMaster web UI.
> --
>
> Key: YARN-1890
> URL: https://issues.apache.org/jira/browse/YARN-1890
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
>
> Accessing applicationMaster UI which is redirected from RM UI, logs too many 
> logs in ResourceManager

[jira] [Created] (YARN-1890) Too many unnecessary logs are logged while accessing applicationMaster web UI.

2014-03-28 Thread Rohith (JIRA)
Rohith created YARN-1890:


 Summary: Too many unnecessary logs are logged while accessing 
applicationMaster web UI.
 Key: YARN-1890
 URL: https://issues.apache.org/jira/browse/YARN-1890
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Rohith
Assignee: Rohith
Priority: Minor


Accessing the ApplicationMaster UI, which is redirected from the RM UI, writes 
too many logs to the ResourceManager and ProxyServer logs. On every refresh, 
logging is done in WebAppProxyServlet.doGet(). All my RM and ProxyServer logs 
are filled with UI information logs which are not really necessary for the user.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1703) Too many connections are opened for proxy server when applicationMaster UI is accessed.

2014-03-28 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1703:
-

Attachment: YARN-1703.2.patch

Updated patch, rebased to the latest code. I verified that it is working after 
this change. Please review the patch.

> Too many connections are opened for proxy server when applicationMaster UI is 
> accessed.
> ---
>
> Key: YARN-1703
> URL: https://issues.apache.org/jira/browse/YARN-1703
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: YARN-1703.1.patch, YARN-1703.2.patch
>
>
> If a running job is accessed for progress, there are many CLOSE_WAIT 
> connections for the proxy server. Even though the connection is released, it 
> is made available again to the HttpClient instance, but it is not closed.
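
As an illustration of how such a leak is typically avoided with 
commons-httpclient 3.x (not necessarily what the attached patch does), the 
connection manager can be shut down after the request so the socket is really 
closed instead of being parked in CLOSE_WAIT:

{noformat}
import java.io.IOException;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.SimpleHttpConnectionManager;
import org.apache.commons.httpclient.methods.GetMethod;

// Illustration of the usual commons-httpclient 3.x cleanup; not necessarily
// what the attached patch does. The URL is a placeholder.
public class ProxyFetchSketch {
  public static void main(String[] args) throws IOException {
    SimpleHttpConnectionManager connectionManager =
        new SimpleHttpConnectionManager(true /* alwaysClose */);
    HttpClient client = new HttpClient(connectionManager);
    GetMethod get = new GetMethod("http://localhost:8088/");
    try {
      client.executeMethod(get);
      System.out.println(get.getStatusCode());
    } finally {
      get.releaseConnection();      // hand the connection back ...
      connectionManager.shutdown(); // ... and actually close the socket
    }
  }
}
{noformat}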



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1703) Too many connections are opened for proxy server when applicationMaster UI is accessed.

2014-03-28 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950552#comment-13950552
 ] 

Rohith commented on YARN-1703:
--

I am attaching the connections established for accessing one ApplicationMaster 
UI. On every manual refresh, it increases by more than 30 connections.


{noformat}
host-10-18-40-71:/home/OpenSource/hadoop-3.0.0-SNAPSHOT/sbin # netstat -tnpla | 
grep 
tcp0  0 0.0.0.0:49960.0.0.0:*   LISTEN  
27664/java
tcp0  0 10.18.40.71:45029   :::*LISTEN  
27664/java
tcp1  0 10.18.40.71:58696   10.18.40.71:56297   CLOSE_WAIT  
27664/java
tcp1  0 10.18.40.71:10375   10.18.40.71:56297   CLOSE_WAIT  
27664/java
tcp1  0 10.18.40.71:27934   10.18.40.71:56297   CLOSE_WAIT  
27664/java
tcp0  0 10.18.40.71:29802   10.18.40.77:45022   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:43589   10.18.40.71:56297   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:45347   10.18.40.71:56297   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:45029   10.18.40.77:10980   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:37156   10.18.40.71:56297   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:45029   10.18.40.77:37989   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:45029   10.18.40.77:18116   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:50766   10.18.40.71:56297   ESTABLISHED 
27664/java
tcp1  0 10.18.40.71:18836   10.18.40.71:56297   CLOSE_WAIT  
27664/java
tcp1  0 10.18.40.71:19376   10.18.40.71:56297   CLOSE_WAIT  
27664/java
tcp1  0 10.18.40.71:25214   10.18.40.71:56297   CLOSE_WAIT  
27664/java
tcp0  0 10.18.40.71:45029   10.18.40.77:54847   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:45029   10.18.40.77:12605   ESTABLISHED 
27664/java
tcp1  0 10.18.40.71:50339   10.18.40.71:56297   CLOSE_WAIT  
27664/java
tcp0  0 10.18.40.71:49870   10.18.40.71:56297   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:50956   10.18.40.71:56297   ESTABLISHED 
27664/java
tcp1  0 10.18.40.71:16896   10.18.40.71:56297   CLOSE_WAIT  
27664/java
tcp0  0 10.18.40.71:43026   10.18.40.71:56297   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:45029   10.18.40.77:12091   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:49205   10.18.40.71:56297   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:29968   10.18.40.71:56297   ESTABLISHED 
27664/java
tcp1  0 10.18.40.71:34355   10.18.40.71:56297   CLOSE_WAIT  
27664/java
tcp0  0 10.18.40.71:54701   10.18.40.71:56297   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:47084   10.18.40.71:56297   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:45029   10.18.40.77:32938   ESTABLISHED 
27664/java
tcp1  0 10.18.40.71:28034   10.18.40.71:56297   CLOSE_WAIT  
27664/java
tcp1  0 10.18.40.71:40430   10.18.40.71:56297   CLOSE_WAIT  
27664/java
tcp0  0 10.18.40.71:45029   10.18.40.77:56572   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:45029   10.18.40.77:60771   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:45029   10.18.40.77:39165   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:20341   10.18.40.71:56297   ESTABLISHED 
27664/java
tcp1  0 10.18.40.71:22649   10.18.40.71:56297   CLOSE_WAIT  
27664/java
tcp1  0 10.18.40.71:23049   10.18.40.71:56297   CLOSE_WAIT  
27664/java
tcp0  0 10.18.40.71:52172   10.18.40.71:56297   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:45029   10.18.40.77:57790   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:17880   10.18.40.71:56297   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:45029   10.18.40.77:22911   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:52107   10.18.40.71:56297   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:45029   10.18.40.77:38345   ESTABLISHED 
27664/java
tcp0  0 10.18.40.71:45029   10.18.40.77:40127   ESTABLISHED 
27664/java
tcp1  0 10.18.40.71:18287   10.18.40.71:56297   CLOSE_WAIT  
27664/java
tcp0  0 10.18.40.71:45029   10.18.40.77:30754   ESTABLISHED 
27664/java

{noformat}

> Too many connections are opened for proxy server when applicationMaster UI is 
> accessed.
> -

[jira] [Updated] (YARN-1703) Too many connections are opened for proxy server when applicationMaster UI is accessed.

2014-03-28 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1703:
-

Priority: Critical  (was: Major)

> Too many connections are opened for proxy server when applicationMaster UI is 
> accessed.
> ---
>
> Key: YARN-1703
> URL: https://issues.apache.org/jira/browse/YARN-1703
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: YARN-1703.1.patch
>
>
> If a running job is accessed for progress, there are many CLOSE_WAIT 
> connections for the proxy server. Even though the connection is released, it 
> is made available again to the HttpClient instance, but it is not closed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1703) Too many connections are opened for proxy server when applicationMaster UI is accessed.

2014-03-28 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1703:
-

Summary: Too many connections are opened for proxy server when 
applicationMaster UI is accessed.  (was: There many CLOSE_WAIT connections  for 
proxy server.)

> Too many connections are opened for proxy server when applicationMaster UI is 
> accessed.
> ---
>
> Key: YARN-1703
> URL: https://issues.apache.org/jira/browse/YARN-1703
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Rohith
>Assignee: Rohith
> Attachments: YARN-1703.1.patch
>
>
> If a running job is accessed for progress, there are many CLOSE_WAIT 
> connections for the proxy server. Even though the connection is released, it 
> is made available again to the HttpClient instance, but it is not closed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator

2014-03-28 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-1889:
--

Attachment: YARN-1889.patch

> avoid creating new objects on each fair scheduler call to AppSchedulable 
> comparator
> ---
>
> Key: YARN-1889
> URL: https://issues.apache.org/jira/browse/YARN-1889
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Hong Zhiguo
>Priority: Minor
> Attachments: YARN-1889.patch
>
>
> In fair scheduler, in each scheduling attempt, a full sort is
> performed on List of AppSchedulable, which invokes Comparator.compare
> method many times. Both FairShareComparator and DRFComparator call
> AppSchedulable.getWeights, and AppSchedulable.getPriority.
> A new ResourceWeights object is allocated on each call of getWeights,
> and the same for getPriority. This puts a lot of pressure on the
> GC because these methods are called very frequently.
> The test case below shows the improvement in performance and GC behaviour. 
> The results show that the GC pressure while processing NodeUpdate is reduced 
> by half with this patch.
> The code to show the improvement: (Add it to TestFairScheduler.java)
> import java.lang.management.GarbageCollectorMXBean;
> import java.lang.management.ManagementFactory;
>   public void printGCStats() {
> long totalGarbageCollections = 0;
> long garbageCollectionTime = 0;
> for(GarbageCollectorMXBean gc :
>   ManagementFactory.getGarbageCollectorMXBeans()) {
>   long count = gc.getCollectionCount();
>   if(count >= 0) {
> totalGarbageCollections += count;
>   }
>   long time = gc.getCollectionTime();
>   if(time >= 0) {
> garbageCollectionTime += time;
>   }
> }
> System.out.println("Total Garbage Collections: "
> + totalGarbageCollections);
> System.out.println("Total Garbage Collection Time (ms): "
> + garbageCollectionTime);
>   }
>   @Test
>   public void testImpactOnGC() throws Exception {
> scheduler.reinitialize(conf, resourceManager.getRMContext());
> // Add nodes
> int numNode = 1;
> for (int i = 0; i < numNode; ++i) {
> String host = String.format("192.1.%d.%d", i/256, i%256);
> RMNode node =
> MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, 
> host);
> NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node);
> scheduler.handle(nodeEvent);
> assertEquals(1024 * 64 * (i+1), 
> scheduler.getClusterCapacity().getMemory());
> }
> assertEquals(numNode, scheduler.getNumClusterNodes());
> assertEquals(1024 * 64 * numNode, 
> scheduler.getClusterCapacity().getMemory());
> // add apps, each app has 100 containers.
> int minReqSize =
> 
> FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
> int numApp = 8000;
> int priority = 1;
> for (int i = 1; i < numApp + 1; ++i) {
> ApplicationAttemptId attemptId = createAppAttemptId(i, 1);
> AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent(
> attemptId.getApplicationId(), "queue1", "user1");
> scheduler.handle(appAddedEvent);
> AppAttemptAddedSchedulerEvent attemptAddedEvent =
> new AppAttemptAddedSchedulerEvent(attemptId, false);
> scheduler.handle(attemptAddedEvent);
> createSchedulingRequestExistingApplication(minReqSize * 2, 1, 
> priority, attemptId);
> }
> scheduler.update();
> assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", 
> true)
> .getRunnableAppSchedulables().size());
> System.out.println("GC stats before NodeUpdate processing:");
> printGCStats();
> int hb_num = 5000;
> long start = System.nanoTime();
> for (int i = 0; i < hb_num; ++i) {
>   String host = String.format("192.1.%d.%d", i/256, i%256);
>   RMNode node =
>   MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, 
> host);
>   NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node);
>   scheduler.handle(nodeEvent);
> }
> long end = System.nanoTime();
> System.out.printf("processing time for a NodeUpdate in average: %d us\n",
> (end - start)/(hb_num * 1000));
> System.out.println("GC stats after NodeUpdate processing:");
> printGCStats();
>   }



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1889) avoid creating new objects on each fair scheduler call to AppSchedulable comparator

2014-03-28 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-1889:
-

 Summary: avoid creating new objects on each fair scheduler call to 
AppSchedulable comparator
 Key: YARN-1889
 URL: https://issues.apache.org/jira/browse/YARN-1889
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Hong Zhiguo
Priority: Minor


In fair scheduler, in each scheduling attempt, a full sort is
performed on List of AppSchedulable, which invokes Comparator.compare
method many times. Both FairShareComparator and DRFComparator call
AppSchedulable.getWeights, and AppSchedulable.getPriority.

A new ResourceWeights object is allocated on each call of getWeights,
and the same for getPriority. This puts a lot of pressure on the
GC because these methods are called very frequently.
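
As a minimal illustration of the reuse idea (not the attached patch; the 
classes below are hypothetical stand-ins for AppSchedulable and 
ResourceWeights), the getter can hand back one cached mutable instance instead 
of allocating a new object per call:

{noformat}
public class CachedWeightsSketch {

  // Hypothetical stand-in for ResourceWeights: a small mutable holder.
  static class Weights {
    private float weight;
    void setWeight(float w) { this.weight = w; }
    float getWeight()       { return weight; }
  }

  // Hypothetical stand-in for AppSchedulable.
  static class AppSketch {
    private final Weights cachedWeights = new Weights();

    // One instance per app, refreshed in place; the comparators only read it,
    // so no allocation happens during the sort.
    Weights getWeights(float configuredWeight) {
      cachedWeights.setWeight(configuredWeight);
      return cachedWeights;
    }
  }

  public static void main(String[] args) {
    AppSketch app = new AppSketch();
    System.out.println(app.getWeights(1.0f) == app.getWeights(2.0f)); // true: same object
  }
}
{noformat}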

The test case below shows the improvement in performance and GC behaviour. The 
results show that the GC pressure while processing NodeUpdate is reduced by 
half with this patch.

The code to show the improvement: (Add it to TestFairScheduler.java)

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
  public void printGCStats() {
long totalGarbageCollections = 0;
long garbageCollectionTime = 0;

for(GarbageCollectorMXBean gc :
  ManagementFactory.getGarbageCollectorMXBeans()) {
  long count = gc.getCollectionCount();
  if(count >= 0) {
totalGarbageCollections += count;
  }

  long time = gc.getCollectionTime();
  if(time >= 0) {
garbageCollectionTime += time;
  }
}

System.out.println("Total Garbage Collections: "
+ totalGarbageCollections);
System.out.println("Total Garbage Collection Time (ms): "
+ garbageCollectionTime);
  }

  @Test
  public void testImpactOnGC() throws Exception {
scheduler.reinitialize(conf, resourceManager.getRMContext());

// Add nodes
int numNode = 1;

for (int i = 0; i < numNode; ++i) {
String host = String.format("192.1.%d.%d", i/256, i%256);
RMNode node =
MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), i, 
host);
NodeAddedSchedulerEvent nodeEvent = new NodeAddedSchedulerEvent(node);
scheduler.handle(nodeEvent);
assertEquals(1024 * 64 * (i+1), 
scheduler.getClusterCapacity().getMemory());
}
assertEquals(numNode, scheduler.getNumClusterNodes());
assertEquals(1024 * 64 * numNode, 
scheduler.getClusterCapacity().getMemory());

// add apps, each app has 100 containers.
int minReqSize =
FairSchedulerConfiguration.DEFAULT_RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
int numApp = 8000;
int priority = 1;

for (int i = 1; i < numApp + 1; ++i) {
ApplicationAttemptId attemptId = createAppAttemptId(i, 1);
AppAddedSchedulerEvent appAddedEvent = new AppAddedSchedulerEvent(
attemptId.getApplicationId(), "queue1", "user1");
scheduler.handle(appAddedEvent);
AppAttemptAddedSchedulerEvent attemptAddedEvent =
new AppAttemptAddedSchedulerEvent(attemptId, false);
scheduler.handle(attemptAddedEvent);
createSchedulingRequestExistingApplication(minReqSize * 2, 1, priority, 
attemptId);
}
scheduler.update();

assertEquals(numApp, scheduler.getQueueManager().getLeafQueue("queue1", 
true)
.getRunnableAppSchedulables().size());

System.out.println("GC stats before NodeUpdate processing:");
printGCStats();
int hb_num = 5000;
long start = System.nanoTime();
for (int i = 0; i < hb_num; ++i) {
  String host = String.format("192.1.%d.%d", i/256, i%256);
  RMNode node =
  MockNodes.newNodeInfo(1, Resources.createResource(1024 * 64), 5000, 
host);
  NodeUpdateSchedulerEvent nodeEvent = new NodeUpdateSchedulerEvent(node);
  scheduler.handle(nodeEvent);
}
long end = System.nanoTime();

System.out.printf("processing time for a NodeUpdate in average: %d us\n",
(end - start)/(hb_num * 1000));

System.out.println("GC stats after NodeUpdate processing:");
printGCStats();
  }





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1888) Not add NodeManager to inactiveRMNodes when reboot NodeManager which have different port

2014-03-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950521#comment-13950521
 ] 

Hadoop QA commented on YARN-1888:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12637354/YARN-1888.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3482//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3482//console

This message is automatically generated.

> Not add NodeManager to inactiveRMNodes when reboot NodeManager which have 
> different port
> 
>
> Key: YARN-1888
> URL: https://issues.apache.org/jira/browse/YARN-1888
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: zhaoyunjiong
>Priority: Minor
> Attachments: YARN-1888.patch
>
>
> When the NodeManager's port is set to 0, rebooting the NodeManager makes the 
> "Lost Nodes" count inaccurate.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1888) Not add NodeManager to inactiveRMNodes when reboot NodeManager which have different port

2014-03-28 Thread zhaoyunjiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaoyunjiong updated YARN-1888:
---

Attachment: YARN-1888.patch

When RMNodeImpl performs the DeactivateNodeTransition, check whether there is 
already a new NodeManager registered from the same host with a different port; 
if so, don't add the old node to "Lost Nodes".
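
A simplified illustration of that check follows; the map and helper below are 
hypothetical stand-ins, while the real logic lives in RMNodeImpl's 
DeactivateNodeTransition and the RMContext node maps:

{noformat}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified illustration of the check; hostnames map to the port of the
// currently active NodeManager. Not the actual RMNodeImpl code.
public class LostNodeCheckSketch {

  // True when the same host has already re-registered on a different port
  // (the port-0 / NodeManager-reboot case), so the old node is not "lost".
  static boolean newNodeManagerOnSameHost(String host, int oldPort,
      Map<String, Integer> activeNodePorts) {
    Integer activePort = activeNodePorts.get(host);
    return activePort != null && activePort.intValue() != oldPort;
  }

  public static void main(String[] args) {
    Map<String, Integer> active = new ConcurrentHashMap<String, Integer>();
    active.put("host-10-18-40-71", 45123); // NM came back with a new ephemeral port
    // Old instance on port 38771: skip adding it to the lost-nodes map.
    System.out.println(newNodeManagerOnSameHost("host-10-18-40-71", 38771, active)); // true
    // A host with no active NM really is lost.
    System.out.println(newNodeManagerOnSameHost("host-10-18-40-99", 38771, active)); // false
  }
}
{noformat}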

> Not add NodeManager to inactiveRMNodes when reboot NodeManager which have 
> different port
> 
>
> Key: YARN-1888
> URL: https://issues.apache.org/jira/browse/YARN-1888
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: zhaoyunjiong
>Priority: Minor
> Attachments: YARN-1888.patch
>
>
> When the NodeManager's port is set to 0, rebooting the NodeManager makes the 
> "Lost Nodes" count inaccurate.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1888) Not add NodeManager to inactiveRMNodes when reboot NodeManager which have different port

2014-03-28 Thread zhaoyunjiong (JIRA)
zhaoyunjiong created YARN-1888:
--

 Summary: Not add NodeManager to inactiveRMNodes when reboot 
NodeManager which have different port
 Key: YARN-1888
 URL: https://issues.apache.org/jira/browse/YARN-1888
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: zhaoyunjiong
Priority: Minor


When the NodeManager's port is set to 0, rebooting the NodeManager makes the 
"Lost Nodes" count inaccurate.



--
This message was sent by Atlassian JIRA
(v6.2#6252)