[jira] [Commented] (YARN-6637) Deadlock in NativeIO

2017-05-23 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022273#comment-16022273
 ] 

Ajith S commented on YARN-6637:
---

A flavor of https://bugs.openjdk.java.net/browse/JDK-8037567, but not the same; 
this one is caused by application code.

> Deadlock in NativeIO
> 
>
> Key: YARN-6637
> URL: https://issues.apache.org/jira/browse/YARN-6637
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Critical
>







[jira] [Commented] (YARN-6637) Deadlock in NativeIO

2017-05-23 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022270#comment-16022270
 ] 

Ajith S commented on YARN-6637:
---

Below are excerpts from the stack traces:

Thread1
{code}
  java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:739)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:224)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:208)
{code}

Thread2
{code}
   java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.mapred.FadvisedFileRegion.transferSuccessful(FadvisedFileRegion.java:160)
at org.apache.hadoop.mapred.ShuffleHandler$Shuffle$1.operationComplete(ShuffleHandler.java:1166)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:413)
{code}


The above threads appear to be blocked by the two stacks below.

Stack1:
{code}
"New I/O worker #1" #135 prio=5 os_prio=0 tid=0x7f1f60817800 nid=0x697d in 
Object.wait() [0x7f1f4429a000]
   java.lang.Thread.State: RUNNABLE
at 
org.apache.hadoop.io.nativeio.NativeIO$POSIX.(NativeIO.java:184)
at 
org.apache.hadoop.mapred.FadvisedFileRegion.transferSuccessful(FadvisedFileRegion.java:160)
at 
org.apache.hadoop.mapred.ShuffleHandler$Shuffle$1.operationComplete(ShuffleHandler.java:1166)
{code}

Stack2:
{code}
"ContainersLauncher #16" #365 prio=5 os_prio=0 tid=0x7f1f49c8a800 
nid=0x7cd0 in Object.wait() [0x7f1f32891000]
   java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.io.nativeio.NativeIO.initNative(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO.(NativeIO.java:645)
at 
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:739)
{code}

*Stack1* is blocked by *Stack2*, because the *Stack1* thread needs the *NativeIO* 
class initialization to finish; so the problematic stack looks to be Stack2.
Next, in Stack2 at NativeIO.java:645, initNative is a native call, and it tries 
to initialize the native-hadoop library.

So in Stack2 it tries to do this in NativeIO.c via 
*Java_org_apache_hadoop_io_nativeio_NativeIO_initNative*:
{code}
#define NATIVE_IO_POSIX_CLASS "org/apache/hadoop/io/nativeio/NativeIO$POSIX"

static void consts_init(JNIEnv *env) {
  /* i.e. load and initialize org.apache.hadoop.io.nativeio.NativeIO$POSIX */
  jclass clazz = (*env)->FindClass(env, NATIVE_IO_POSIX_CLASS);
}
{code}
But *Stack1* is already inside 
{{org.apache.hadoop.io.nativeio.NativeIO$POSIX.<clinit>}},
so the two threads deadlock and all dependent threads hang.
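
For reference, below is a minimal standalone sketch (an illustration of the 
general pattern only, not Hadoop code; the class names A and B are 
hypothetical) reproducing this kind of static-initializer deadlock: two 
threads concurrently trigger the {{<clinit>}} of two classes that each 
reference the other, so each thread ends up waiting on the other's 
initialization lock.

{code}
// Illustration only: a static-initializer deadlock of the kind described above.
public class ClinitDeadlockDemo {

  static class A {
    static {
      pause();     // widen the race window
      B.touch();   // A.<clinit> now waits for B's initialization lock
    }
    static void touch() { }
  }

  static class B {
    static {
      pause();
      A.touch();   // B.<clinit> now waits for A's initialization lock
    }
    static void touch() { }
  }

  static void pause() {
    try { Thread.sleep(200); } catch (InterruptedException ignored) { }
  }

  public static void main(String[] args) {
    new Thread(A::touch).start();  // triggers A.<clinit>
    new Thread(B::touch).start();  // triggers B.<clinit>
  }
}
{code}

With this timing, both threads block inside class initialization, and a jstack 
dump shows them in Object.wait() while reported as RUNNABLE, exactly as in 
Stack1/Stack2 above.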

> Deadlock in NativeIO
> 
>
> Key: YARN-6637
> URL: https://issues.apache.org/jira/browse/YARN-6637
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Critical
>







[jira] [Created] (YARN-6637) Deadlock in NativeIO

2017-05-23 Thread Ajith S (JIRA)
Ajith S created YARN-6637:
-

 Summary: Deadlock in NativeIO
 Key: YARN-6637
 URL: https://issues.apache.org/jira/browse/YARN-6637
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ajith S
Assignee: Ajith S
Priority: Critical









[jira] [Commented] (YARN-6140) start time key in NM leveldb store should be removed when container is removed

2017-02-03 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851220#comment-15851220
 ] 

Ajith S commented on YARN-6140:
---

I would like to work on this and will assign it to myself; please feel free to 
assign it back if you have already started.

> start time key in NM leveldb store should be removed when container is removed
> --
>
> Key: YARN-6140
> URL: https://issues.apache.org/jira/browse/YARN-6140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: YARN-5355
>Reporter: Sangjin Lee
>  Labels: yarn-5355-merge-blocker
>
> It appears that the start time key is not removed when the container is 
> removed. The key was introduced in YARN-5792.
> I found this while backporting the YARN-5355-branch-2 branch to our internal 
> branch loosely based on 2.6.0. The {{TestNMLeveldbStateStoreService}} test 
> was failing because of this.
> I'm not sure why we didn't see this earlier.






[jira] [Assigned] (YARN-6140) start time key in NM leveldb store should be removed when container is removed

2017-02-03 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned YARN-6140:
-

Assignee: Ajith S

> start time key in NM leveldb store should be removed when container is removed
> --
>
> Key: YARN-6140
> URL: https://issues.apache.org/jira/browse/YARN-6140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: YARN-5355
>Reporter: Sangjin Lee
>Assignee: Ajith S
>  Labels: yarn-5355-merge-blocker
>
> It appears that the start time key is not removed when the container is 
> removed. The key was introduced in YARN-5792.
> I found this while backporting the YARN-5355-branch-2 branch to our internal 
> branch loosely based on 2.6.0. The {{TestNMLeveldbStateStoreService}} test 
> was failing because of this.
> I'm not sure why we didn't see this earlier.






[jira] [Comment Edited] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2017-01-25 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837584#comment-15837584
 ] 

Ajith S edited comment on YARN-574 at 1/25/17 11:21 AM:


[~jlowe] thanks for the clarification. Attaching a patch with the suggested 
approach of controlling multiple heartbeats using an atomic counter. Please review.


was (Author: ajithshetty):
Attaching a patch with the suggested approach of controlling multiple heartbeats 
using an atomic counter. Please review.

> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> ---
>
> Key: YARN-574
> URL: https://issues.apache.org/jira/browse/YARN-574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Omkar Vinit Joshi
>Assignee: Ajith S
> Attachments: YARN-574.03.patch, YARN-574.04.patch, YARN-574.05.patch, 
> YARN-574.1.patch, YARN-574.2.patch
>
>
> At present, private resources are downloaded in parallel only if multiple 
> containers request the same resource; otherwise downloads are serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads, but this is not used and only one resource is sent for 
> downloading at a time.
> I think we can increase / assure parallelism (even for single container 
> requesting resource) for private/application resources by making multiple 
> downloads per ContainerLocalizer.
> Total Parallelism before
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers[private and application resource]
> Total Parallelism after
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers * max downloads per container [private and application resource]
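
To make the arithmetic above concrete with purely hypothetical numbers: with 4 
PublicLocalizer threads and 10 containers, total parallelism before is 
4 + 10 = 14; allowing, say, 4 downloads per ContainerLocalizer raises it to 
4 + 10 * 4 = 44.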






[jira] [Updated] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2017-01-25 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-574:
-
Attachment: YARN-574.05.patch

Attaching a patch with the suggested approach of controlling multiple heartbeats 
using an atomic counter. Please review.

> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> ---
>
> Key: YARN-574
> URL: https://issues.apache.org/jira/browse/YARN-574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Omkar Vinit Joshi
>Assignee: Ajith S
> Attachments: YARN-574.03.patch, YARN-574.04.patch, YARN-574.05.patch, 
> YARN-574.1.patch, YARN-574.2.patch
>
>
> At present, private resources are downloaded in parallel only if multiple 
> containers request the same resource; otherwise downloads are serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads, but this is not used and only one resource is sent for 
> downloading at a time.
> I think we can increase / assure parallelism (even for single container 
> requesting resource) for private/application resources by making multiple 
> downloads per ContainerLocalizer.
> Total Parallelism before
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers[private and application resource]
> Total Parallelism after
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers * max downloads per container [private and application resource]






[jira] [Commented] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys

2017-01-23 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15834338#comment-15834338
 ] 

Ajith S commented on YARN-5547:
---

[~jlowe] thanks for the comments.
I have updated the patch (without storing the killed state, plus checkstyle fixes).

> NMLeveldbStateStore should be more tolerant of unknown keys
> ---
>
> Key: YARN-5547
> URL: https://issues.apache.org/jira/browse/YARN-5547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-5547.01.patch, YARN-5547.02.patch, 
> YARN-5547.03.patch, YARN-5547.04.patch, YARN-5547.05.patch
>
>
> Whenever new keys are added to the NM state store it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys it could be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g.: we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store to prevent leaking unrecognized keys.
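
As a loose illustration of the skip-unknown-keys idea (my own sketch, with a 
plain sorted map standing in for the NM leveldb store; all key names are 
hypothetical):

{code}
import java.util.Map;
import java.util.TreeMap;

// Illustration only: tolerate unknown keys instead of throwing, so a rolling
// downgrade survives keys written by a newer NM.
public class TolerantKeyLoad {
  public static void main(String[] args) {
    Map<String, String> store = new TreeMap<>();
    store.put("container_1/diagnostics", "ok");
    store.put("container_1/someFutureKey", "written by a newer NM");

    for (Map.Entry<String, String> e : store.entrySet()) {
      String suffix = e.getKey().substring(e.getKey().indexOf('/') + 1);
      switch (suffix) {
        case "diagnostics":
          System.out.println("loaded diagnostics=" + e.getValue());
          break;
        default:
          // Skip rather than throw; on container removal, delete everything
          // under the container prefix so unknown keys cannot leak.
          System.out.println("skipping unknown key " + e.getKey());
      }
    }
  }
}
{code}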






[jira] [Updated] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys

2017-01-23 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-5547:
--
Attachment: YARN-5547.05.patch

> NMLeveldbStateStore should be more tolerant of unknown keys
> ---
>
> Key: YARN-5547
> URL: https://issues.apache.org/jira/browse/YARN-5547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-5547.01.patch, YARN-5547.02.patch, 
> YARN-5547.03.patch, YARN-5547.04.patch, YARN-5547.05.patch
>
>
> Whenever new keys are added to the NM state store it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys it could be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g.: we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store to prevent leaking unrecognized keys.






[jira] [Commented] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2017-01-20 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15831378#comment-15831378
 ] 

Ajith S commented on YARN-574:
--

Thanks for the input [~jlowe]; I agree with the race condition you mentioned.
To simplify the approach, can we instead track the size of the 
{{LinkedBlockingQueue}} passed to the executor and avoid the quick heartbeats 
in case the {{LinkedBlockingQueue}} size is greater than zero?
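
A minimal sketch of that suggestion (not the actual ContainerLocalizer code; 
the heartbeat action is a hypothetical placeholder):

{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustration only: skip the quick follow-up heartbeat while the executor's
// work queue is non-empty, i.e. while downloads are still pending.
public class HeartbeatGate {
  private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
      2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

  void maybeQuickHeartbeat() {
    // getQueue() exposes the LinkedBlockingQueue passed at construction.
    if (executor.getQueue().isEmpty()) {
      System.out.println("send quick heartbeat"); // hypothetical action
    }
  }

  public static void main(String[] args) {
    HeartbeatGate gate = new HeartbeatGate();
    gate.executor.submit(() -> { /* simulated resource download */ });
    gate.maybeQuickHeartbeat();
    gate.executor.shutdown();
  }
}
{code}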

> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> ---
>
> Key: YARN-574
> URL: https://issues.apache.org/jira/browse/YARN-574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Omkar Vinit Joshi
>Assignee: Ajith S
> Attachments: YARN-574.03.patch, YARN-574.04.patch, YARN-574.1.patch, 
> YARN-574.2.patch
>
>
> At present, private resources are downloaded in parallel only if multiple 
> containers request the same resource; otherwise downloads are serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads, but this is not used and only one resource is sent for 
> downloading at a time.
> I think we can increase / assure parallelism (even for single container 
> requesting resource) for private/application resources by making multiple 
> downloads per ContainerLocalizer.
> Total Parallelism before
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers[private and application resource]
> Total Parallelism after
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers * max downloads per container [private and application resource]






[jira] [Commented] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys

2017-01-19 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15831325#comment-15831325
 ] 

Ajith S commented on YARN-5547:
---

Thanks for the detailed explanation [~jlowe].
I have updated the patch with the expected approach. Please review.

> NMLeveldbStateStore should be more tolerant of unknown keys
> ---
>
> Key: YARN-5547
> URL: https://issues.apache.org/jira/browse/YARN-5547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-5547.01.patch, YARN-5547.02.patch, 
> YARN-5547.03.patch, YARN-5547.04.patch
>
>
> Whenever new keys are added to the NM state store it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys it could be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g.: we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store to prevent leaking unrecognized keys.






[jira] [Updated] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys

2017-01-19 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-5547:
--
Attachment: YARN-5547.04.patch

> NMLeveldbStateStore should be more tolerant of unknown keys
> ---
>
> Key: YARN-5547
> URL: https://issues.apache.org/jira/browse/YARN-5547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-5547.01.patch, YARN-5547.02.patch, 
> YARN-5547.03.patch, YARN-5547.04.patch
>
>
> Whenever new keys are added to the NM state store it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys it could be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g.: we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store to prevent leaking unrecognized keys.






[jira] [Updated] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher

2017-01-17 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-6102:
--
Attachment: eventOrder.JPG

Attaching the event order for reference.

> On failover RM can crash due to unregistered event to AsyncDispatcher
> -
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Critical
> Attachments: eventOrder.JPG
>
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> The same stack was also noticed when {{TestResourceTrackerOnHA}} exited 
> abnormally; after some analysis, I was able to reproduce it.
> Once the nodeHeartBeat is sent to the RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before it is sent to the dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}}, 
> if RM failover is called, the dispatcher is reset.
> The new dispatcher is, however, first started, and only then are the events 
> registered, at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}.
> So the event order will look like:
> 1. Send a node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before it is passed to the 
> dispatcher, RM failover is called
> 3. In RM failover, the current active resets the dispatcher in reinitialize, 
> i.e. {{resetDispatcher();}} + {{createAndInitActiveServices();}}
> Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}}, 
> {{ResourceTrackerService.nodeHeartbeat}} invokes the dispatcher.
> This causes the above error because, at the point in time when the 
> {{STATUS_UPDATE}} event is handed to the dispatcher in 
> {{ResourceTrackerService}}, the new dispatcher (from the failover) may be 
> started but not yet registered for events.
> Using the same steps (pausing the JVM in the debugger), I was able to 
> reproduce this in a production cluster as well: the {{STATUS_UPDATE}} active 
> service event fires when the service has yet to forward the event to the RM 
> dispatcher, while a failover is called and the dispatcher reset is between 
> {{resetDispatcher();}} and {{createAndInitActiveServices();}}






[jira] [Comment Edited] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher

2017-01-17 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825803#comment-15825803
 ] 

Ajith S edited comment on YARN-6102 at 1/17/17 10:31 AM:
-

[~rohithsharma] and [~sunilg]
As I mentioned, the node heartbeat has been triggered and received by the 
ResourceTracker service before the service stop, but not yet passed to the 
dispatcher, so its drainEvent will not process it.


was (Author: ajithshetty):
[~rohithsharma] 
As I mentioned, the node heartbeat has been triggered and received by the 
ResourceTracker service before the service stop, but not yet passed to the 
dispatcher, so its drainEvent will not process it.

> On failover RM can crash due to unregistered event to AsyncDispatcher
> -
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Critical
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> The same stack was also noticed when {{TestResourceTrackerOnHA}} exited 
> abnormally; after some analysis, I was able to reproduce it.
> Once the nodeHeartBeat is sent to the RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before it is sent to the dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}}, 
> if RM failover is called, the dispatcher is reset.
> The new dispatcher is, however, first started, and only then are the events 
> registered, at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}.
> So the event order will look like:
> 1. Send a node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before it is passed to the 
> dispatcher, RM failover is called
> 3. In RM failover, the current active resets the dispatcher in reinitialize, 
> i.e. {{resetDispatcher();}} + {{createAndInitActiveServices();}}
> Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}}, 
> {{ResourceTrackerService.nodeHeartbeat}} invokes the dispatcher.
> This causes the above error because, at the point in time when the 
> {{STATUS_UPDATE}} event is handed to the dispatcher in 
> {{ResourceTrackerService}}, the new dispatcher (from the failover) may be 
> started but not yet registered for events.
> Using the same steps (pausing the JVM in the debugger), I was able to 
> reproduce this in a production cluster as well: the {{STATUS_UPDATE}} active 
> service event fires when the service has yet to forward the event to the RM 
> dispatcher, while a failover is called and the dispatcher reset is between 
> {{resetDispatcher();}} and {{createAndInitActiveServices();}}






[jira] [Commented] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher

2017-01-17 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825803#comment-15825803
 ] 

Ajith S commented on YARN-6102:
---

[~rohithsharma] 
As I mentioned, the node heartbeat has been triggered and received by the 
ResourceTracker service before the service stop, but not yet passed to the 
dispatcher, so its drainEvent will not process it.

> On failover RM can crash due to unregistered event to AsyncDispatcher
> -
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Critical
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> The same stack was also noticed when {{TestResourceTrackerOnHA}} exited 
> abnormally; after some analysis, I was able to reproduce it.
> Once the nodeHeartBeat is sent to the RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before it is sent to the dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}}, 
> if RM failover is called, the dispatcher is reset.
> The new dispatcher is, however, first started, and only then are the events 
> registered, at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}.
> So the event order will look like:
> 1. Send a node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before it is passed to the 
> dispatcher, RM failover is called
> 3. In RM failover, the current active resets the dispatcher in reinitialize, 
> i.e. {{resetDispatcher();}} + {{createAndInitActiveServices();}}
> Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}}, 
> {{ResourceTrackerService.nodeHeartbeat}} invokes the dispatcher.
> This causes the above error because, at the point in time when the 
> {{STATUS_UPDATE}} event is handed to the dispatcher in 
> {{ResourceTrackerService}}, the new dispatcher (from the failover) may be 
> started but not yet registered for events.
> Using the same steps (pausing the JVM in the debugger), I was able to 
> reproduce this in a production cluster as well: the {{STATUS_UPDATE}} active 
> service event fires when the service has yet to forward the event to the RM 
> dispatcher, while a failover is called and the dispatcher reset is between 
> {{resetDispatcher();}} and {{createAndInitActiveServices();}}






[jira] [Commented] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher

2017-01-17 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825721#comment-15825721
 ] 

Ajith S commented on YARN-6102:
---

Here I can think of two approaches:
1. Correct the flow of starting the {{AsyncDispatcher}} in 
{{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}},
 i.e. first register the event types and then start the dispatcher
2. Pass the dispatcher object instance to the active services and use it for 
all operations, so that old events are not handed to the new dispatcher on 
failover

Please suggest.
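
To illustrate approach 1, here is a toy dispatcher (my own sketch, not YARN's 
{{AsyncDispatcher}}; all names are illustrative) that closes the window by 
registering every handler before {{start()}}:

{code}
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Illustration only: if start() runs before register(), an early event hits
// the "no handler registered" path, which is the failure mode described above.
public class ToyDispatcher {
  public interface Handler { void handle(Enum<?> event); }

  private final BlockingQueue<Enum<?>> queue = new LinkedBlockingQueue<>();
  private final Map<Class<?>, Handler> handlers = new ConcurrentHashMap<>();

  public void register(Class<? extends Enum<?>> type, Handler h) {
    handlers.put(type, h);
  }

  public void start() {
    Thread loop = new Thread(() -> {
      while (true) {
        try {
          Enum<?> event = queue.take();
          Handler h = handlers.get(event.getDeclaringClass());
          if (h == null) {  // the YARN-6102 window: started but unregistered
            throw new IllegalStateException("No handler for " + event);
          }
          h.handle(event);
        } catch (InterruptedException e) {
          return;
        }
      }
    });
    loop.setDaemon(true);
    loop.start();
  }

  public void dispatch(Enum<?> event) { queue.add(event); }

  enum NodeEvent { STATUS_UPDATE }

  public static void main(String[] args) throws InterruptedException {
    ToyDispatcher d = new ToyDispatcher();
    // Approach 1: register first, then start, so the race window never opens.
    d.register(NodeEvent.class, e -> System.out.println("handled " + e));
    d.start();
    d.dispatch(NodeEvent.STATUS_UPDATE);
    Thread.sleep(100);  // let the daemon event loop drain before exit
  }
}
{code}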

> On failover RM can crash due to unregistered event to AsyncDispatcher
> -
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Critical
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> The same stack was also noticed when {{TestResourceTrackerOnHA}} exited 
> abnormally; after some analysis, I was able to reproduce it.
> Once the nodeHeartBeat is sent to the RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before it is sent to the dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}}, 
> if RM failover is called, the dispatcher is reset.
> The new dispatcher is, however, first started, and only then are the events 
> registered, at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}.
> So the event order will look like:
> 1. Send a node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before it is passed to the 
> dispatcher, RM failover is called
> 3. In RM failover, the current active resets the dispatcher in reinitialize, 
> i.e. {{resetDispatcher();}} + {{createAndInitActiveServices();}}
> Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}}, 
> {{ResourceTrackerService.nodeHeartbeat}} invokes the dispatcher.
> This causes the above error because, at the point in time when the 
> {{STATUS_UPDATE}} event is handed to the dispatcher in 
> {{ResourceTrackerService}}, the new dispatcher (from the failover) may be 
> started but not yet registered for events.
> Using the same steps (pausing the JVM in the debugger), I was able to 
> reproduce this in a production cluster as well: the {{STATUS_UPDATE}} active 
> service event fires when the service has yet to forward the event to the RM 
> dispatcher, while a failover is called and the dispatcher reset is between 
> {{resetDispatcher();}} and {{createAndInitActiveServices();}}






[jira] [Created] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher

2017-01-17 Thread Ajith S (JIRA)
Ajith S created YARN-6102:
-

 Summary: On failover RM can crash due to unregistered event to 
AsyncDispatcher
 Key: YARN-6102
 URL: https://issues.apache.org/jira/browse/YARN-6102
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ajith S
Assignee: Ajith S
Priority: Critical


{code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
dispatcher thread
java.lang.Exception: No handler for registered for class 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
at java.lang.Thread.run(Thread.java:745)
2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}

The same stack was also noticed when {{TestResourceTrackerOnHA}} exited 
abnormally; after some analysis, I was able to reproduce it.

Once the nodeHeartBeat is sent to the RM, inside 
{{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
 before it is sent to the dispatcher through
{{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}}, 
if RM failover is called, the dispatcher is reset.
The new dispatcher is, however, first started, and only then are the events 
registered, at 
{{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}.

So the event order will look like:
1. Send a node heartbeat to {{ResourceTrackerService}}
2. In {{ResourceTrackerService.nodeHeartbeat}}, before it is passed to the 
dispatcher, RM failover is called
3. In RM failover, the current active resets the dispatcher in reinitialize, 
i.e. {{resetDispatcher();}} + {{createAndInitActiveServices();}}

Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}}, the 
{{ResourceTrackerService.nodeHeartbeat}} invokes the dispatcher.

This causes the above error because, at the point in time when the 
{{STATUS_UPDATE}} event is handed to the dispatcher in 
{{ResourceTrackerService}}, the new dispatcher (from the failover) may be 
started but not yet registered for events.
Using the same steps (pausing the JVM in the debugger), I was able to 
reproduce this in a production cluster as well: the {{STATUS_UPDATE}} active 
service event fires when the service has yet to forward the event to the RM 
dispatcher, while a failover is called and the dispatcher reset is between 
{{resetDispatcher();}} and {{createAndInitActiveServices();}}






[jira] [Commented] (YARN-6092) Refreshing CapacityScheduler page throws NPE

2017-01-13 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821683#comment-15821683
 ] 

Ajith S commented on YARN-6092:
---

Hi [~rohithsharma],
I would like to work on this and will assign it to myself; feel free to assign 
it back if you have already started.

> Refreshing CapacityScheduler page throws NPE
> 
>
> Key: YARN-6092
> URL: https://issues.apache.org/jira/browse/YARN-6092
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Ajith S
>
> It is observed that the RM CapacityScheduler page throws an NPE in the RM logs. 
> An application was running and asking for a new allocation. 
> {noformat}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$HealthBlock.render(CapacitySchedulerPage.java:551)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet._(Hamlet.java:30354)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:471)
> {noformat}






[jira] [Assigned] (YARN-6092) Refreshing CapacityScheduler page throws NPE

2017-01-13 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned YARN-6092:
-

Assignee: Ajith S

> Refreshing CapacityScheduler page throws NPE
> 
>
> Key: YARN-6092
> URL: https://issues.apache.org/jira/browse/YARN-6092
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Ajith S
>
> It is observed that the RM CapacityScheduler page throws an NPE in the RM logs. 
> An application was running and asking for a new allocation. 
> {noformat}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$HealthBlock.render(CapacitySchedulerPage.java:551)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet._(Hamlet.java:30354)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:471)
> {noformat}






[jira] [Commented] (YARN-6008) Fetch container list for failed application attempt

2017-01-11 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15820160#comment-15820160
 ] 

Ajith S commented on YARN-6008:
---

I agree with this; I will upload an initial patch for it shortly.

> Fetch container list for failed application attempt
> ---
>
> Key: YARN-6008
> URL: https://issues.apache.org/jira/browse/YARN-6008
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: hadoop version 2.6
>Reporter: Priyanka Gugale
>Assignee: Ajith S
>
> When we run the command "yarn container -list" with a failed application 
> attempt, we should either get the containers from that attempt or get an 
> empty list, as the containers are no longer in the running state.
> Steps to reproduce:
> 1. Launch a YARN application. 
> 2. Kill the app master; it tries to restart the application with a new attempt id. 
> 3. Now run the yarn command,
> yarn container -list <Application Attempt ID>
> where the Application Attempt ID is that of the failed attempt; 
> it lists the containers from the next attempt, which is in the "RUNNING" state 
> right now.
> Expected behavior:
> It should return the list of killed containers from attempt 1, or an empty list.






[jira] [Commented] (YARN-6072) RM unable to start in secure mode

2017-01-10 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817222#comment-15817222
 ] 

Ajith S commented on YARN-6072:
---

Thanks for the comments [~djp], [~jianhe], [~naganarasimha...@apache.org] and 
[~bibinchundatt].
I have considered all the comments and reworked the patch. Please review.

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: YARN-6072.01.branch-2.8.patch, 
> YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, 
> YARN-6072.03.branch-2.8.patch, YARN-6072.03.patch, 
> hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are ad

[jira] [Updated] (YARN-6072) RM unable to start in secure mode

2017-01-10 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-6072:
--
Attachment: YARN-6072.03.branch-2.8.patch
YARN-6072.03.patch

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: YARN-6072.01.branch-2.8.patch, 
> YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, 
> YARN-6072.03.branch-2.8.patch, YARN-6072.03.patch, 
> hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in the following order:
> # EmbeddedElector
> # AdminService
> During resource manager service start(), EmbeddedElector starts first and 
> invokes 

[jira] [Commented] (YARN-6072) RM unable to start in secure mode

2017-01-10 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815035#comment-15815035
 ] 

Ajith S commented on YARN-6072:
---

These test failures on the latest patch look unrelated. Can someone please 
retrigger the build?

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: YARN-6072.01.branch-2.8.patch, 
> YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, 
> hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in the following order:
> # EmbeddedElector
> # AdminService
> During resource manager service start(), EmbeddedElector starts first and

[jira] [Commented] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2017-01-10 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815023#comment-15815023
 ] 

Ajith S commented on YARN-574:
--

Thanks [~naganarasimha...@apache.org] and [~varun_saxena] for the review 
comments.
I have updated the patch as per the review comments.

> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> ---
>
> Key: YARN-574
> URL: https://issues.apache.org/jira/browse/YARN-574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Omkar Vinit Joshi
>Assignee: Ajith S
> Attachments: YARN-574.03.patch, YARN-574.04.patch, YARN-574.1.patch, 
> YARN-574.2.patch
>
>
> At present, private resources are downloaded in parallel only if multiple 
> containers request the same resource; otherwise downloads are serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads, but this is not used and only one resource is sent for 
> downloading at a time.
> I think we can increase / assure parallelism (even for single container 
> requesting resource) for private/application resources by making multiple 
> downloads per ContainerLocalizer.
> Total Parallelism before
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers[private and application resource]
> Total Parallelism after
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers * max downloads per container [private and application resource]






[jira] [Updated] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2017-01-10 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-574:
-
Attachment: YARN-574.04.patch

> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> ---
>
> Key: YARN-574
> URL: https://issues.apache.org/jira/browse/YARN-574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Omkar Vinit Joshi
>Assignee: Ajith S
> Attachments: YARN-574.03.patch, YARN-574.04.patch, YARN-574.1.patch, 
> YARN-574.2.patch
>
>
> At present, private resources are downloaded in parallel only if multiple 
> containers request the same resource; otherwise downloads are serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads, but this is not used: only one resource is sent for 
> downloading at a time.
> I think we can increase/assure parallelism (even for a single container 
> requesting resources) for private/application resources by performing 
> multiple downloads per ContainerLocalizer.
> Total parallelism before
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers [private and application resource]
> Total parallelism after
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers * max downloads per container [private and application resource]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (YARN-6008) Fetch container list for failed application attempt

2017-01-10 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-6008:
--
Comment: was deleted

(was: Would it be sufficient to always show only RUNNING containers of this 
attempt, to keep it in sync with the RM attempt page where only running 
containers can be seen?)

> Fetch container list for failed application attempt
> ---
>
> Key: YARN-6008
> URL: https://issues.apache.org/jira/browse/YARN-6008
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: hadoop version 2.6
>Reporter: Priyanka Gugale
>Assignee: Ajith S
>
> When we run the command "yarn container -list" with a failed application 
> attempt, we should either get the containers from that attempt or get back an 
> empty list, as its containers are no longer in the running state.
> Steps to reproduce:
> 1. Launch a YARN application. 
> 2. Kill the app master; it tries to restart the application with a new attempt id. 
> 3. Now run the yarn command,
> yarn container -list <Application Attempt ID>
> where the Application Attempt ID is of the failed attempt; 
> it lists the containers from the next attempt, which are in "RUNNING" state 
> right now.
> Expected behavior:
> It should return the list of killed containers from attempt 1 or an empty list.
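
For reference, a minimal sketch of querying the container list of a specific 
attempt through YarnClient (the attempt-id string below is hypothetical; 
ConverterUtils is the Hadoop 2.x parser for attempt ids):
{code}
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class ListAttemptContainers {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new Configuration());
    yarnClient.start();

    // Hypothetical failed-attempt id; substitute a real one
    ApplicationAttemptId attemptId =
        ConverterUtils.toApplicationAttemptId("appattempt_1483000000000_0001_000001");

    // Per this issue, the expectation is the failed attempt's (killed)
    // containers, or an empty list -- not the next attempt's containers
    List<ContainerReport> reports = yarnClient.getContainers(attemptId);
    for (ContainerReport report : reports) {
      System.out.println(report.getContainerId() + " " + report.getContainerState());
    }
    yarnClient.stop();
  }
}
{code}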



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6008) Fetch container list for failed application attempt

2017-01-10 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814588#comment-15814588
 ] 

Ajith S commented on YARN-6008:
---

Would it be sufficient to always show only RUNNING containers of this attempt, 
to keep it in sync with the RM attempt page where only running containers can be 
seen?

> Fetch container list for failed application attempt
> ---
>
> Key: YARN-6008
> URL: https://issues.apache.org/jira/browse/YARN-6008
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: hadoop version 2.6
>Reporter: Priyanka Gugale
>Assignee: Ajith S
>
> When we run the command "yarn container -list" with a failed application 
> attempt, we should either get the containers from that attempt or get back an 
> empty list, as its containers are no longer in the running state.
> Steps to reproduce:
> 1. Launch a YARN application. 
> 2. Kill the app master; it tries to restart the application with a new attempt id. 
> 3. Now run the yarn command,
> yarn container -list <Application Attempt ID>
> where the Application Attempt ID is of the failed attempt; 
> it lists the containers from the next attempt, which are in "RUNNING" state 
> right now.
> Expected behavior:
> It should return the list of killed containers from attempt 1 or an empty list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6008) Fetch container list for failed application attempt

2017-01-10 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned YARN-6008:
-

Assignee: Ajith S

> Fetch container list for failed application attempt
> ---
>
> Key: YARN-6008
> URL: https://issues.apache.org/jira/browse/YARN-6008
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: hadoop version 2.6
>Reporter: Priyanka Gugale
>Assignee: Ajith S
>
> When we run the command "yarn container -list" with a failed application 
> attempt, we should either get the containers from that attempt or get back an 
> empty list, as its containers are no longer in the running state.
> Steps to reproduce:
> 1. Launch a YARN application. 
> 2. Kill the app master; it tries to restart the application with a new attempt id. 
> 3. Now run the yarn command,
> yarn container -list <Application Attempt ID>
> where the Application Attempt ID is of the failed attempt; 
> it lists the containers from the next attempt, which are in "RUNNING" state 
> right now.
> Expected behavior:
> It should return the list of killed containers from attempt 1 or an empty list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6015) AsyncDispatcher thread name can be set to improve debugging

2017-01-10 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-6015:
--
Attachment: YARN-6015.02.branch-2.patch

Attaching patch for branch-2.
Please review

> AsyncDispatcher thread name can be set to improve debugging
> 
>
> Key: YARN-6015
> URL: https://issues.apache.org/jira/browse/YARN-6015
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ajith S
>Assignee: Ajith S
> Attachments: YARN-6015.01.patch, YARN-6015.02.branch-2.patch, 
> YARN-6015.02.patch
>
>
> Currently all the running instances of AsyncDispatcher have the same thread 
> name. To improve debugging, we can have an option to set the thread name.
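
A minimal sketch of the idea (simplified; the actual AsyncDispatcher 
constructor and fields may differ): accept an optional name and apply it to the 
event-handling thread.
{code}
public class NamedDispatcherSketch {
  private final String dispatcherName;
  private Thread eventHandlingThread;

  public NamedDispatcherSketch(String dispatcherName) {
    this.dispatcherName = dispatcherName;
  }

  public void start(Runnable eventLoop) {
    eventHandlingThread = new Thread(eventLoop);
    // Distinct names make thread dumps attributable to a dispatcher instance
    eventHandlingThread.setName(dispatcherName + " Event Dispatcher");
    eventHandlingThread.start();
  }
}
{code}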



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6072) RM unable to start in secure mode

2017-01-10 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-6072:
--
Attachment: YARN-6072.01.branch-2.8.patch
YARN-6072.02.patch

Handled the checkstyle issue and updated the patch

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: YARN-6072.01.branch-2.8.patch, 
> YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, 
> hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in the following order:
> # EmbeddedElector
> # AdminService
> During ResourceManager service start(), EmbeddedElector starts first and 
> invokes  {{AdminService#

[jira] [Updated] (YARN-6072) RM unable to start in secure mode

2017-01-09 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-6072:
--
Attachment: YARN-6072.01.branch-2.patch

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: YARN-6072.01.branch-2.patch, YARN-6072.01.patch, 
> hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in the following order:
> # EmbeddedElector
> # AdminService
> During ResourceManager service start(), EmbeddedElector starts first and 
> invokes  {{AdminService#refreshAll()}} but {{AdminService#serviceStart()}} 
> happens after {{ActiveStandbyElectorBasedElectorService}} service start is 

[jira] [Updated] (YARN-6072) RM unable to start in secure mode

2017-01-09 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-6072:
--
Attachment: YARN-6072.01.patch

Attaching patch for trunk. Will update for branch-2 and branch-2.8 shortly

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: YARN-6072.01.patch, 
> hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in the following order:
> # EmbeddedElector
> # AdminService
> During ResourceManager service start(), EmbeddedElector starts first and 
> invokes  {{AdminService#refreshAll()}} but {{AdminService#serviceStart()}} 
> happens after {{ActiveStandbyElectorBasedEle

[jira] [Commented] (YARN-6017) node manager physical memory leak

2017-01-09 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15811100#comment-15811100
 ] 

Ajith S commented on YARN-6017:
---

To clear my understanding: as per the smaps and jmap output, it looks like the 
Java heap *capacity = 724828160 (691.25MB)* is well within the 2GB limit.
The [heap] shown in the smaps output is not the same as the Java heap, right? I 
just ran a shell program and its smaps output also contained a [heap] entry. The 
Java heap is carved out of the process's OS memory, so I feel it is the OS heap 
size *10193716* that we are talking about. Correct?
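
For reference, a minimal sketch (assuming a Linux /proc filesystem; the default 
pid below is taken from the report) that sums the Rss lines of a process's 
smaps, which is the resident size top reports as RES:
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SmapsRss {
  public static void main(String[] args) throws IOException {
    String pid = args.length > 0 ? args[0] : "31169"; // pid from the report
    long rssKb = Files.lines(Paths.get("/proc/" + pid + "/smaps"))
        .filter(line -> line.startsWith("Rss:"))
        // Each Rss line looks like "Rss:        1234 kB"
        .mapToLong(line -> Long.parseLong(line.replaceAll("[^0-9]", "")))
        .sum();
    System.out.println("Total RSS: " + rssKb + " kB");
  }
}
{code}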

> node manager physical memory leak
> -
>
> Key: YARN-6017
> URL: https://issues.apache.org/jira/browse/YARN-6017
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
> Environment: OS:
> Linux guomai124041 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 
> x86_64 x86_64 x86_64 GNU/Linux
> jvm:
> java version "1.7.0_65"
> Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
>Reporter: chenrongwei
> Attachments: 31169_smaps.txt, 31169_smaps.txt
>
>
> In our production environment, the node manager's JVM memory has been set to 
> '-Xmx2048m', but we notice that after running for a long time the process's 
> actual physical memory size had reached 12g (we got this value from the top 
> command as follows).
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> 31169 data  20   0 13.2g  12g 6092 S 16.9 13.0  49183:13 java
> 31169:   /usr/local/jdk/bin/java -Dproc_nodemanager -Xmx2048m 
> -Dhadoop.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs 
> -Dyarn.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs 
> -Dhadoop.log.file=yarn-data-nodemanager.log 
> -Dyarn.log.file=yarn-data-nodemanager.log -Dyarn.home.dir= -Dyarn.id.str=data 
> -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA 
> -Djava.library.path=/home/data/programs/apache-hadoop-2.7.1/lib/native 
> -Dyarn.policy.file=hadoop-policy.xml -XX:PermSize=128M -XX:MaxPermSize=256M 
> -XX:+UseC
> Address   Kbytes Mode  Offset   DeviceMapping
> 0040   4 r-x--  008:1 java
> 0060   4 rw---  008:1 java
> 00601000 10094936 rw---  000:0   [ anon ]
> 00077000 2228224 rw---  000:0   [ anon ]
> 0007f800  131072 rw---  000:0   [ anon ]
> 00325ee0 128 r-x--  008:1 ld-2.12.so
> 00325f01f000   4 r 0001f000 008:1 ld-2.12.so
> 00325f02   4 rw--- 0002 008:1 ld-2.12.so
> 00325f021000   4 rw---  000:0   [ anon ]
> 00325f201576 r-x--  008:1 libc-2.12.so
> 00325f38a0002048 - 0018a000 008:1 libc-2.12.so
> 00325f58a000  16 r 0018a000 008:1 libc-2.12.so
> 00325f58e000   4 rw--- 0018e000 008:1 libc-2.12.so
> 00325f58f000  20 rw---  000:0   [ anon ]
> 00325f60  92 r-x--  008:1 libpthread-2.12.so
> 00325f6170002048 - 00017000 008:1 libpthread-2.12.so
> 00325f817000   4 r 00017000 008:1 libpthread-2.12.so
> 00325f818000   4 rw--- 00018000 008:1 libpthread-2.12.so
> 00325f819000  16 rw---  000:0   [ anon ]
> 00325fa0   8 r-x--  008:1 libdl-2.12.so
> 00325fa020002048 - 2000 008:1 libdl-2.12.so
> 00325fc02000   4 r 2000 008:1 libdl-2.12.so
> 00325fc03000   4 rw--- 3000 008:1 libdl-2.12.so
> 00325fe0  28 r-x--  008:1 librt-2.12.so
> 00325fe070002044 - 7000 008:1 librt-2.12.so
> 003260006000   4 r 6000 008:1 librt-2.12.so
> 003260007000   4 rw--- 7000 008:1 librt-2.12.so
> 00326020 524 r-x--  008:1 libm-2.12.so
> 0032602830002044 - 00083000 008:1 libm-2.12.so
> 003260482000   4 r 00082000 008:1 libm-2.12.so
> 003260483000   4 rw--- 00083000 008:1 libm-2.12.so
> 00326120  88 r-x--  008:1 libresolv-2.12.so
> 0032612160002048 - 00016000 008:1 libresolv-2.12.so
> 003261416000   4 r 00016000 008:1 libresolv-2.12.so
> 003261417000   4 rw--- 00017000 008:1 libresolv-2.12.so



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Assigned] (YARN-6072) RM unable to start in secure mode

2017-01-08 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned YARN-6072:
-

Assignee: Ajith S  (was: Naganarasimha G R)

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in the following order:
> # EmbeddedElector
> # AdminService
> During ResourceManager service start(), EmbeddedElector starts first and 
> invokes  {{AdminService#refreshAll()}} but {{AdminService#serviceStart()}} 
> happens after {{ActiveStandbyElectorBasedElectorService}} service start is 
> complete. So {{AdminService#server}} will be *null* which causes  
> {{AdminServi
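
A minimal sketch of the ordering hazard described above (simplified, 
hypothetical names; not the actual AdminService code): an elector callback that 
fires before serviceStart() has initialized a field dereferences null.
{code}
public class StartOrderSketch {
  // Stands in for AdminService#server, created only in serviceStart()
  private Object server;

  /** Invoked by the elector callback, possibly before serviceStart(). */
  public void transitionToActive() {
    // NPE here if the elector wins the election before serviceStart() ran;
    // a guard (or reordering service startup) avoids the fatal event
    if (server == null) {
      throw new IllegalStateException("server not yet started");
    }
    server.toString(); // placeholder for refreshServiceAcls(server, ...)
  }

  public void serviceStart() {
    server = new Object();
  }
}
{code}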

[jira] [Assigned] (YARN-5816) TestDelegationTokenRenewer#testCancelWithMultipleAppSubmissions is still flakey

2017-01-06 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned YARN-5816:
-

Assignee: Ajith S

> TestDelegationTokenRenewer#testCancelWithMultipleAppSubmissions is still 
> flakey
> ---
>
> Key: YARN-5816
> URL: https://issues.apache.org/jira/browse/YARN-5816
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, test
>Reporter: Daniel Templeton
>Assignee: Ajith S
>Priority: Minor
>
> Even after YARN-5057, 
> TestDelegationTokenRenewer#testCancelWithMultipleAppSubmissions is still 
> flakey:
> {noformat}
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.796 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
> testCancelWithMultipleAppSubmissions(org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer)
>   Time elapsed: 2.307 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer.testCancelWithMultipleAppSubmissions(TestDelegationTokenRenewer.java:1260)
> {noformat}
> Note that it's the same error as YARN-5057, but on a different line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6015) AsyncDispatcher thread name can be set to improve debugging

2017-01-05 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-6015:
--
Attachment: YARN-6015.02.patch

Thanks for the review. I have updated the patch based on your comments. Please 
review

> AsyncDispatcher thread name can be set to improve debugging
> 
>
> Key: YARN-6015
> URL: https://issues.apache.org/jira/browse/YARN-6015
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ajith S
>Assignee: Ajith S
> Attachments: YARN-6015.01.patch, YARN-6015.02.patch
>
>
> Currently all the running instances of AsyncDispatcher have the same thread 
> name. To improve debugging, we can have an option to set the thread name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys

2017-01-05 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15800932#comment-15800932
 ] 

Ajith S commented on YARN-5547:
---

Hi guys, sorry for the delay. [~jlowe], thanks for your comments. You are right: 
we can avoid storing the killed state for a container which will not be 
recovered. Also, for deleting the unknown keys, would it be ok to remove them in 
{{NMLeveldbStateStoreService.loadContainerState(ContainerId, LeveldbIterator, 
String)}}? As per the patch, this would happen after the warning log about the 
unknown keys.
This would avoid any extra scan of the store, and hence avoid the performance 
penalty.
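
A minimal sketch of the tolerant-load idea (hypothetical key suffixes and a 
plain iterator; the real NM state-store schema and LeveldbIterator API differ): 
skip, and optionally delete, keys the loader does not recognize instead of 
throwing.
{code}
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.Map;

public class TolerantLoadSketch {
  interface Store { void delete(byte[] key); } // stand-in for the leveldb handle

  static void loadContainerState(Iterator<Map.Entry<byte[], byte[]>> it,
                                 Store store, boolean removeUnknown) {
    while (it.hasNext()) {
      Map.Entry<byte[], byte[]> entry = it.next();
      String key = new String(entry.getKey(), StandardCharsets.UTF_8);
      if (key.endsWith("/diagnostics") || key.endsWith("/exitcode")) {
        // known suffixes: parse entry.getValue() as before
      } else {
        System.err.println("Skipping unknown key " + key);
        if (removeUnknown) {
          store.delete(entry.getKey()); // avoids leaking keys after downgrade
        }
      }
    }
  }
}
{code}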

> NMLeveldbStateStore should be more tolerant of unknown keys
> ---
>
> Key: YARN-5547
> URL: https://issues.apache.org/jira/browse/YARN-5547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-5547.01.patch, YARN-5547.02.patch, 
> YARN-5547.03.patch
>
>
> Whenever new keys are added to the NM state store it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys it could be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g.: we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store to prevent leaking unrecognized keys.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6008) Fetch container list for failed application attempt

2017-01-03 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794822#comment-15794822
 ] 

Ajith S commented on YARN-6008:
---

Hi,
I would like to work on this; can I assign this to myself?

> Fetch container list for failed application attempt
> ---
>
> Key: YARN-6008
> URL: https://issues.apache.org/jira/browse/YARN-6008
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: hadoop version 2.6
>Reporter: Priyanka Gugale
>
> When we run the command "yarn container -list" with a failed application 
> attempt, we should either get the containers from that attempt or get back an 
> empty list, as its containers are no longer in the running state.
> Steps to reproduce:
> 1. Launch a YARN application. 
> 2. Kill the app master; it tries to restart the application with a new attempt id. 
> 3. Now run the yarn command,
> yarn container -list <Application Attempt ID>
> where the Application Attempt ID is of the failed attempt; 
> it lists the containers from the next attempt, which are in "RUNNING" state 
> right now.
> Expected behavior:
> It should return the list of killed containers from attempt 1 or an empty list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (YARN-6048) YarnClient API call that is supposed to return containers of a previous attempt returns containers of the current application attempt

2017-01-03 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-6048:
--
Comment: was deleted

(was: Hi [~sunilg] and [~davidyan],
I would like to work on this; can you assign it to me?)

> YarnClient API call that is supposed to return containers of a previous 
> attempt returns containers of the current application attempt 
> --
>
> Key: YARN-6048
> URL: https://issues.apache.org/jira/browse/YARN-6048
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver, yarn
>Affects Versions: 2.7.3
> Environment: Ubuntu 16.10, Hadoop 2.7.3, Java 1.8
>Reporter: David Yan
>
> I have enabled the timeline server.
> Sample code:
> {code}
> YarnClient yarnClient;
> ...
> List<ContainerReport> containerReports = 
> yarnClient.getContainers(somePreviousAttemptId);
> {code}
> YarnClient.getContainers(ApplicationAttemptId) returns the containers of the 
> *current* attempt instead of the given attempt. This is true up until the 
> application is actually terminated. After the application is terminated, the 
> call returns the correct container list.
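
A minimal sketch of how the mismatch can be observed while the application is 
still running (assumes a started YarnClient and a previous attempt id, as in 
the report): compare the attempt id of each returned container against the 
requested one.
{code}
import java.util.List;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class AttemptMismatchCheck {
  static void check(YarnClient yarnClient, ApplicationAttemptId previousAttempt)
      throws Exception {
    List<ContainerReport> reports = yarnClient.getContainers(previousAttempt);
    for (ContainerReport r : reports) {
      ApplicationAttemptId actual =
          r.getContainerId().getApplicationAttemptId();
      if (!actual.equals(previousAttempt)) {
        // Per this issue, this fires until the application terminates
        System.out.println("Mismatch: asked for " + previousAttempt
            + " but got a container of " + actual);
      }
    }
  }
}
{code}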



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6048) YarnClient API call that is supposed to return containers of a previous attempt returns containers of the current application attempt

2017-01-03 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794818#comment-15794818
 ] 

Ajith S commented on YARN-6048:
---

Hi [~sunilg] and [~davidyan],
I would like to work on this; can you assign it to me?

> YarnClient API call that is supposed to return containers of a previous 
> attempt returns containers of the current application attempt 
> --
>
> Key: YARN-6048
> URL: https://issues.apache.org/jira/browse/YARN-6048
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver, yarn
>Affects Versions: 2.7.3
> Environment: Ubuntu 16.10, Hadoop 2.7.3, Java 1.8
>Reporter: David Yan
>
> I have enabled the timeline server.
> Sample code:
> {code}
> YarnClient yarnClient;
> ...
> List<ContainerReport> containerReports = 
> yarnClient.getContainers(somePreviousAttemptId);
> {code}
> YarnClient.getContainers(ApplicationAttemptId) returns the containers of the 
> *current* attempt instead of the given attempt. This is true up until the 
> application is actually terminated. After the application is terminated, the 
> call returns the correct container list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6015) AsyncDispatcher thread name can be set to improve debugging

2017-01-03 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-6015:
--
Attachment: YARN-6015.01.patch

Here is the initial patch. Please review.

> AsyncDispatcher thread name can be set to improve debugging
> 
>
> Key: YARN-6015
> URL: https://issues.apache.org/jira/browse/YARN-6015
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ajith S
>Assignee: Ajith S
> Attachments: YARN-6015.01.patch
>
>
> Currently all the running instances of AsyncDispatcher have the same thread 
> name. To improve debugging, we can have an option to set the thread name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6015) AsyncDispatcher thread name can be set to improve debugging

2016-12-20 Thread Ajith S (JIRA)
Ajith S created YARN-6015:
-

 Summary: AsyncDispatcher thread name can be set to improve 
debugging
 Key: YARN-6015
 URL: https://issues.apache.org/jira/browse/YARN-6015
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ajith S
Assignee: Ajith S


Currently all the running instances of AsyncDispatcher have the same thread 
name. To improve debugging, we can have an option to set the thread name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5988) RM unable to start in secure setup

2016-12-18 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-5988:
--
Attachment: YARN-5988.04.patch

> RM unable to start in secure setup
> --
>
> Key: YARN-5988
> URL: https://issues.apache.org/jira/browse/YARN-5988
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Blocker
> Attachments: YARN-5988.01.patch, YARN-5988.02.patch, 
> YARN-5988.03.patch, YARN-5988.04.patch
>
>
> When CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION=true
> RM is unable to start



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5988) RM unable to start in secure setup

2016-12-18 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-5988:
--
Attachment: YARN-5988.03.patch

Thanks for the review comments; I have handled them and updated the patch. 

> RM unable to start in secure setup
> --
>
> Key: YARN-5988
> URL: https://issues.apache.org/jira/browse/YARN-5988
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Blocker
> Attachments: YARN-5988.01.patch, YARN-5988.02.patch, 
> YARN-5988.03.patch
>
>
> When CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION=true
> RM is unable to start



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5988) RM unable to start in secure setup

2016-12-13 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-5988:
--
Attachment: YARN-5988.02.patch

Have updated the code and added a test case. Please review

> RM unable to start in secure setup
> --
>
> Key: YARN-5988
> URL: https://issues.apache.org/jira/browse/YARN-5988
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Blocker
> Attachments: YARN-5988.01.patch, YARN-5988.02.patch
>
>
> When CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION=true
> RM is unable to start



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5988) RM unable to start in secure setup

2016-12-09 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-5988:
--
Attachment: YARN-5988.01.patch

Attaching initial patch. Will add a test case soon.

> RM unable to start in secure setup
> --
>
> Key: YARN-5988
> URL: https://issues.apache.org/jira/browse/YARN-5988
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Blocker
> Attachments: YARN-5988.01.patch
>
>
> When CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION=true
> RM is unable to start



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5988) RM unable to start in secure setup

2016-12-09 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734863#comment-15734863
 ] 

Ajith S commented on YARN-5988:
---

{code}java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.refreshServiceAcls(ClientRMService.java:1271)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:557)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:653)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:315)
at 
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832)
at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:723)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:595)
{code}

> RM unable to start in secure setup
> --
>
> Key: YARN-5988
> URL: https://issues.apache.org/jira/browse/YARN-5988
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Blocker
>
> When CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION=true
> RM is unable to start



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (YARN-5988) RM unable to start in secure setup

2016-12-09 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-5988:
--
Comment: was deleted

(was: {code}2016-12-09 14:54:27,028 ERROR [Thread-788] 
resourcemanager.AdminService (AdminService.java:refreshAll(733)) - Failure on 
refresh
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:591)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:581)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:729)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:326)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testRefreshQueuesWhenRMHA(TestFairScheduler.java:4888)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
2016-12-09 14:54:27,029 ERROR [Thread-788] resourcemanager.AdminService 
(AdminService.java:transitionToActive(328)) - RefreshAll failed so firing fatal 
event
org.apache.hadoop.ha.ServiceFailedException
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:734)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:326)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testRefreshQueuesWhenRMHA(TestFairScheduler.java:4888)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17){code})

> RM unable to start in secure setup
> --
>
> Key: YARN-5988
> URL: https://issues.apache.org/jira/browse/YARN-5988
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Blocker
>
> When CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION=true
> RM is unable to start



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5988) RM unable to start in secure setup

2016-12-09 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-5988:
--
Target Version/s: 2.9.0, 3.0.0-alpha2  (was: 2.8.0, 2.9.0, 2.7.4, 
3.0.0-alpha2, 2.6.6)

> RM unable to start in secure setup
> --
>
> Key: YARN-5988
> URL: https://issues.apache.org/jira/browse/YARN-5988
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Blocker
>
> When CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION=true
> RM is unable to start



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5988) RM unable to start in secure setup

2016-12-09 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-5988:
--
Priority: Blocker  (was: Major)

> RM unable to start in secure setup
> --
>
> Key: YARN-5988
> URL: https://issues.apache.org/jira/browse/YARN-5988
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Blocker
>
> When CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION=true
> RM is unable to start



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5988) RM unable to start in secure setup

2016-12-09 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734801#comment-15734801
 ] 

Ajith S commented on YARN-5988:
---

{code}2016-12-09 14:54:27,028 ERROR [Thread-788] resourcemanager.AdminService 
(AdminService.java:refreshAll(733)) - Failure on refresh
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:591)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:581)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:729)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:326)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testRefreshQueuesWhenRMHA(TestFairScheduler.java:4888)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
2016-12-09 14:54:27,029 ERROR [Thread-788] resourcemanager.AdminService 
(AdminService.java:transitionToActive(328)) - RefreshAll failed so firing fatal 
event
org.apache.hadoop.ha.ServiceFailedException
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:734)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:326)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testRefreshQueuesWhenRMHA(TestFairScheduler.java:4888)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17){code}

> RM unable to start in secure setup
> --
>
> Key: YARN-5988
> URL: https://issues.apache.org/jira/browse/YARN-5988
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ajith S
>Assignee: Ajith S
>
> When CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION=true
> RM is unable to start



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5988) RM unable to start in secure setup

2016-12-09 Thread Ajith S (JIRA)
Ajith S created YARN-5988:
-

 Summary: RM unable to start in secure setup
 Key: YARN-5988
 URL: https://issues.apache.org/jira/browse/YARN-5988
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ajith S
Assignee: Ajith S


When CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION=true
RM is unable to start



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5955) Use threadpool or multiple threads to recover app

2016-12-01 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned YARN-5955:
-

Assignee: Ajith S

> Use threadpool or multiple threads to recover app
> 
>
> Key: YARN-5955
> URL: https://issues.apache.org/jira/browse/YARN-5955
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: 5feixiang
>Assignee: Ajith S
> Fix For: 2.7.1
>
>
> Currently app recovery happens one by one; using a thread pool can make 
> recovery faster.
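
A minimal sketch of the proposal (hypothetical names; not the actual RM 
recovery code): submit each application's recovery to a fixed-size pool instead 
of recovering serially.
{code}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelRecoverySketch {
  interface AppState { String appId(); }

  static void recoverAll(List<AppState> apps, int poolSize)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    for (AppState app : apps) {
      // Each recovery is independent, so they can proceed concurrently
      pool.submit(() -> System.out.println("recovering " + app.appId()));
    }
    pool.shutdown();
    // Wait for all recoveries before the RM finishes its transition
    pool.awaitTermination(10, TimeUnit.MINUTES);
  }
}
{code}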



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5955) Use threadpool or multiple threads to recover app

2016-12-01 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711870#comment-15711870
 ] 

Ajith S commented on YARN-5955:
---

I would like to work on this if you have not yet started.

> Use threadpool or multiple threads to recover app
> 
>
> Key: YARN-5955
> URL: https://issues.apache.org/jira/browse/YARN-5955
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: 5feixiang
> Fix For: 2.7.1
>
>
> Current app recovery is done one by one; using a thread pool can make recovery faster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5876) TestResourceTrackerService#testGracefulDecommissionWithApp fails intermittently on trunk

2016-11-13 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned YARN-5876:
-

Assignee: Ajith S

> TestResourceTrackerService#testGracefulDecommissionWithApp fails 
> intermittently on trunk
> 
>
> Key: YARN-5876
> URL: https://issues.apache.org/jira/browse/YARN-5876
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: Ajith S
>
> {noformat}
> java.lang.AssertionError: node shouldn't be null
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertNotNull(Assert.java:621)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:750)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testGracefulDecommissionWithApp(TestResourceTrackerService.java:318)
> {noformat}
> Refer to https://builds.apache.org/job/PreCommit-YARN-Build/13884/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5858) TestDiskFailures.testLogDirsFailures fails on trunk

2016-11-12 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15659702#comment-15659702
 ] 

Ajith S commented on YARN-5858:
---

Thanks [~varun_saxena] for reporting this issue.
The LocalDirsHandlerService failed to detect the disk failure because its 
monitor thread was terminated by the exception below
{code}Exception in thread "DiskHealthMonitor-Timer" 
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not initialize 
log dir 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDir(ResourceLocalizationService.java:1391)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDirs(ResourceLocalizationService.java:1379)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$000(ResourceLocalizationService.java:147)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$2.onDirsChanged(ResourceLocalizationService.java:284)
at 
org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:397)
at 
org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:470)
at 
org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$500(LocalDirsHandlerService.java:52)
at 
org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.run(LocalDirsHandlerService.java:166)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505){code}
So even the current run of disk failure detection failed, which in turn fails 
the test case.
I suggest that in 
{{org.apache.hadoop.yarn.server.TestDiskFailures.waitForDiskHealthCheck()}} we 
check whether 
{{org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLastDisksCheckTime()}}
 has been updated; only then can we be sure the monitor thread completed 
successfully and continue with our assertions
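
To make that concrete, here is a minimal sketch of how the wait could poll 
{{getLastDisksCheckTime()}} before asserting (illustrative only; the body of 
the existing test helper is hypothetical):
{code}
// Illustrative sketch: poll until the DiskHealthMonitor timer task has
// completed another full run, then it is safe to assert on disk health.
private void waitForDiskHealthCheck(LocalDirsHandlerService dirsHandler)
    throws InterruptedException {
  long lastCheck = dirsHandler.getLastDisksCheckTime();
  // bounded wait: getLastDisksCheckTime() advancing tells us the monitor
  // thread actually finished a fresh disk check
  for (int i = 0; i < 20; i++) {
    if (dirsHandler.getLastDisksCheckTime() > lastCheck) {
      return;
    }
    Thread.sleep(500);
  }
}
{code}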

> TestDiskFailures.testLogDirsFailures fails on trunk
> ---
>
> Key: YARN-5858
> URL: https://issues.apache.org/jira/browse/YARN-5858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: Ajith S
>Priority: Minor
>
> {noformat}
> java.lang.AssertionError: NodeManager could not identify disk failure.
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.yarn.server.TestDiskFailures.verifyDisksHealth(TestDiskFailures.java:239)
>   at 
> org.apache.hadoop.yarn.server.TestDiskFailures.testDirsFailures(TestDiskFailures.java:202)
>   at 
> org.apache.hadoop.yarn.server.TestDiskFailures.testLogDirsFailures(TestDiskFailures.java:111)
> {noformat}
> Refer to https://builds.apache.org/job/PreCommit-YARN-Build/13828/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5858) TestDiskFailures.testLogDirsFailures fails on trunk

2016-11-12 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned YARN-5858:
-

Assignee: Ajith S

> TestDiskFailures.testLogDirsFailures fails on trunk
> ---
>
> Key: YARN-5858
> URL: https://issues.apache.org/jira/browse/YARN-5858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: Ajith S
>Priority: Minor
>
> {noformat}
> java.lang.AssertionError: NodeManager could not identify disk failure.
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.yarn.server.TestDiskFailures.verifyDisksHealth(TestDiskFailures.java:239)
>   at 
> org.apache.hadoop.yarn.server.TestDiskFailures.testDirsFailures(TestDiskFailures.java:202)
>   at 
> org.apache.hadoop.yarn.server.TestDiskFailures.testLogDirsFailures(TestDiskFailures.java:111)
> {noformat}
> Refer to https://builds.apache.org/job/PreCommit-YARN-Build/13828/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5857) TestLogAggregationService.testFixedSizeThreadPool fails intermittently on trunk

2016-11-12 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15659659#comment-15659659
 ] 

Ajith S commented on YARN-5857:
---

I guess this issue is the result of the test case relying on 
java.util.concurrent.ThreadPoolExecutor.getActiveCount().
As the javadoc for this method 
(https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html#getActiveCount()
 ) says, it returns only an approximate count at the moment it is called.
I feel we can rewrite this test case using countdown latches to make it more 
robust; please suggest
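
As a rough illustration of the latch idea (a hypothetical standalone sketch, 
not the actual TestLogAggregationService code): block every pooled task on a 
latch, so reaching the expected concurrency is observed deterministically 
instead of sampling {{getActiveCount()}}.
{code}
import static org.junit.Assert.assertTrue;
import java.util.concurrent.*;

public class FixedPoolConcurrencyTest {
  @org.junit.Test
  public void poolReachesConfiguredConcurrency() throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(3);
    CountDownLatch started = new CountDownLatch(3); // trips once 3 tasks run
    CountDownLatch release = new CountDownLatch(1); // keeps tasks in flight
    for (int i = 0; i < 5; i++) {
      pool.submit(() -> {
        started.countDown();
        try {
          release.await(); // park so the task stays active
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    // Deterministic: await succeeds only once 3 tasks really ran
    // concurrently; no reliance on the approximate getActiveCount().
    assertTrue(started.await(10, TimeUnit.SECONDS));
    release.countDown(); // let the parked tasks finish
    pool.shutdown();
  }
}
{code}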

> TestLogAggregationService.testFixedSizeThreadPool fails intermittently on 
> trunk
> ---
>
> Key: YARN-5857
> URL: https://issues.apache.org/jira/browse/YARN-5857
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: Ajith S
>Priority: Minor
>
> {noformat}
> testFixedSizeThreadPool(org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService)
>   Time elapsed: 0.11 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<3> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testFixedSizeThreadPool(TestLogAggregationService.java:1139)
> {noformat}
> Refer to https://builds.apache.org/job/PreCommit-YARN-Build/13829/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5857) TestLogAggregationService.testFixedSizeThreadPool fails intermittently on trunk

2016-11-12 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned YARN-5857:
-

Assignee: Ajith S

> TestLogAggregationService.testFixedSizeThreadPool fails intermittently on 
> trunk
> ---
>
> Key: YARN-5857
> URL: https://issues.apache.org/jira/browse/YARN-5857
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: Ajith S
>Priority: Minor
>
> {noformat}
> testFixedSizeThreadPool(org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService)
>   Time elapsed: 0.11 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<3> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testFixedSizeThreadPool(TestLogAggregationService.java:1139)
> {noformat}
> Refer to https://builds.apache.org/job/PreCommit-YARN-Build/13829/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5820) yarn node CLI help should be clearer

2016-11-08 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15647175#comment-15647175
 ] 

Ajith S commented on YARN-5820:
---

Using Apache HelpFormatter wraps the text after a certain line length, so the 
overflowing part is automatically moved to the next line after formatting; 
hence the test case is modified to accommodate this
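
For reference, a small sketch of the wrapping behaviour (assuming 
commons-cli; the option shown is just an example):
{code}
// HelpFormatter wraps rendered help text at its configured width
// (74 characters by default), so a long description spills onto the
// next line -- which is why the expected test output changed.
Options options = new Options();
options.addOption("list", false,
    "List all running nodes. Supports optional use of -states to filter "
    + "nodes based on node state, and -all to list all nodes.");
HelpFormatter formatter = new HelpFormatter();
formatter.printHelp(74, "node", null, options, null); // width = 74
{code}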

> yarn node CLI help should be clearer
> 
>
> Key: YARN-5820
> URL: https://issues.apache.org/jira/browse/YARN-5820
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Grant Sohn
>Assignee: Ajith S
>Priority: Trivial
> Attachments: YARN-5820.01.patch, YARN-5820.02.patch, 
> YARN-5820.03.patch, YARN-5820.04.patch
>
>
> Current message is:
> {noformat}
> usage: node
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> It should be either this:
> {noformat}
> usage: yarn node [-list [-states |-all] | -status ]
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> or that.
> {noformat}
> usage: yarn node -list [-states |-all] 
>yarn node -status 
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> The latter is the least ambiguous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2016-11-07 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643868#comment-15643868
 ] 

Ajith S commented on YARN-574:
--

I have rebased the patch and added a test case. Please review

> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> ---
>
> Key: YARN-574
> URL: https://issues.apache.org/jira/browse/YARN-574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-574.03.patch, YARN-574.1.patch, YARN-574.2.patch
>
>
> At present private resources will be downloaded in parallel only if multiple 
> containers request the same resource. However otherwise it will be serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads however it is not used and only one resource is sent for 
> downloading at a time.
> I think we can increase / assure parallelism (even for single container 
> requesting resource) for private/application resources by making multiple 
> downloads per ContainerLocalizer.
> Total Parallelism before
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers[private and application resource]
> Total Parallelism after
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers * max downloads per container [private and application resource]
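
Plugging illustrative numbers into the quoted formula (say 4 PublicLocalizer 
threads, 10 containers, and 4 downloads per container; numbers are examples 
only):
{code}
int publicThreads = 4, containers = 10, downloadsPerContainer = 4;
int before = publicThreads + containers;                          // = 14
int after  = publicThreads + containers * downloadsPerContainer;  // = 44
{code}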



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2016-11-07 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned YARN-574:


Assignee: Ajith S  (was: Omkar Vinit Joshi)

> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> ---
>
> Key: YARN-574
> URL: https://issues.apache.org/jira/browse/YARN-574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Omkar Vinit Joshi
>Assignee: Ajith S
> Attachments: YARN-574.03.patch, YARN-574.1.patch, YARN-574.2.patch
>
>
> At present private resources will be downloaded in parallel only if multiple 
> containers request the same resource. However otherwise it will be serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads however it is not used and only one resource is sent for 
> downloading at a time.
> I think we can increase / assure parallelism (even for single container 
> requesting resource) for private/application resources by making multiple 
> downloads per ContainerLocalizer.
> Total Parallelism before
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers[private and application resource]
> Total Parallelism after
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers * max downloads per container [private and application resource]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2016-11-07 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643867#comment-15643867
 ] 

Ajith S commented on YARN-574:
--

I will take this over; if you are working on it, please reassign

> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> ---
>
> Key: YARN-574
> URL: https://issues.apache.org/jira/browse/YARN-574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-574.03.patch, YARN-574.1.patch, YARN-574.2.patch
>
>
> At present private resources will be downloaded in parallel only if multiple 
> containers request the same resource. However otherwise it will be serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads however it is not used and only one resource is sent for 
> downloading at a time.
> I think we can increase / assure parallelism (even for single container 
> requesting resource) for private/application resources by making multiple 
> downloads per ContainerLocalizer.
> Total Parallelism before
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers[private and application resource]
> Total Parallelism after
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers * max downloads per container [private and application resource]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2016-11-07 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-574:
-
Attachment: YARN-574.03.patch

> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> ---
>
> Key: YARN-574
> URL: https://issues.apache.org/jira/browse/YARN-574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-574.03.patch, YARN-574.1.patch, YARN-574.2.patch
>
>
> At present private resources will be downloaded in parallel only if multiple 
> containers request the same resource. However otherwise it will be serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads however it is not used and only one resource is sent for 
> downloading at a time.
> I think we can increase / assure parallelism (even for single container 
> requesting resource) for private/application resources by making multiple 
> downloads per ContainerLocalizer.
> Total Parallelism before
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers[private and application resource]
> Total Parallelism after
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers * max downloads per container [private and application resource]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys

2016-11-07 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643548#comment-15643548
 ] 

Ajith S commented on YARN-5547:
---

Thanks [~Naganarasimha] for your comments. I have attached the latest patch 
after handling the review comments

> NMLeveldbStateStore should be more tolerant of unknown keys
> ---
>
> Key: YARN-5547
> URL: https://issues.apache.org/jira/browse/YARN-5547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-5547.01.patch, YARN-5547.02.patch, 
> YARN-5547.03.patch
>
>
> Whenever new keys are added to the NM state store it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys it could be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g.: we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store to prevent leaking unrecognized keys.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys

2016-11-07 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-5547:
--
Attachment: YARN-5547.03.patch

> NMLeveldbStateStore should be more tolerant of unknown keys
> ---
>
> Key: YARN-5547
> URL: https://issues.apache.org/jira/browse/YARN-5547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-5547.01.patch, YARN-5547.02.patch, 
> YARN-5547.03.patch
>
>
> Whenever new keys are added to the NM state store it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys it could be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g.: we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store to prevent leaking unrecognized keys.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5820) yarn node CLI help should be clearer

2016-11-06 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643334#comment-15643334
 ] 

Ajith S commented on YARN-5820:
---

Thanks [~sunilg] and [~Naganarasimha] for your comments.
I have updated the patch based on them. Please review

> yarn node CLI help should be clearer
> 
>
> Key: YARN-5820
> URL: https://issues.apache.org/jira/browse/YARN-5820
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Grant Sohn
>Assignee: Ajith S
>Priority: Trivial
> Attachments: YARN-5820.01.patch, YARN-5820.02.patch, 
> YARN-5820.03.patch, YARN-5820.04.patch
>
>
> Current message is:
> {noformat}
> usage: node
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> It should be either this:
> {noformat}
> usage: yarn node [-list [-states |-all] | -status ]
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> or that.
> {noformat}
> usage: yarn node -list [-states |-all] 
>yarn node -status 
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> The latter is the least ambiguous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5820) yarn node CLI help should be clearer

2016-11-06 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-5820:
--
Attachment: YARN-5820.04.patch

> yarn node CLI help should be clearer
> 
>
> Key: YARN-5820
> URL: https://issues.apache.org/jira/browse/YARN-5820
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Grant Sohn
>Assignee: Ajith S
>Priority: Trivial
> Attachments: YARN-5820.01.patch, YARN-5820.02.patch, 
> YARN-5820.03.patch, YARN-5820.04.patch
>
>
> Current message is:
> {noformat}
> usage: node
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> It should be either this:
> {noformat}
> usage: yarn node [-list [-states |-all] | -status ]
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> or that.
> {noformat}
> usage: yarn node -list [-states |-all] 
>yarn node -status 
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> The latter is the least ambiguous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5820) yarn node CLI help should be clearer

2016-11-06 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-5820:
--
Attachment: YARN-5820.03.patch

Fixed test case failures. Please review

> yarn node CLI help should be clearer
> 
>
> Key: YARN-5820
> URL: https://issues.apache.org/jira/browse/YARN-5820
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Grant Sohn
>Assignee: Ajith S
>Priority: Trivial
> Attachments: YARN-5820.01.patch, YARN-5820.02.patch, 
> YARN-5820.03.patch
>
>
> Current message is:
> {noformat}
> usage: node
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> It should be either this:
> {noformat}
> usage: yarn node [-list [-states |-all] | -status ]
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> or that.
> {noformat}
> usage: yarn node -list [-states |-all] 
>yarn node -status 
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> The latter is the least ambiguous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5820) yarn node CLI help should be clearer

2016-11-05 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-5820:
--
Attachment: YARN-5820.02.patch

> yarn node CLI help should be clearer
> 
>
> Key: YARN-5820
> URL: https://issues.apache.org/jira/browse/YARN-5820
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Grant Sohn
>Assignee: Ajith S
>Priority: Trivial
> Attachments: YARN-5820.01.patch, YARN-5820.02.patch
>
>
> Current message is:
> {noformat}
> usage: node
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> It should be either this:
> {noformat}
> usage: yarn node [-list [-states |-all] | -status ]
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> or that.
> {noformat}
> usage: yarn node -list [-states |-all] 
>yarn node -status 
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> The latter is the least ambiguous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5826) HelpFormatter ordered with insertion order

2016-11-02 Thread Ajith S (JIRA)
Ajith S created YARN-5826:
-

 Summary: HelpFormatter ordered with insertion order
 Key: YARN-5826
 URL: https://issues.apache.org/jira/browse/YARN-5826
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ajith S
Assignee: Ajith S
Priority: Minor


Here is a sample help
{code}
usage: yarn node [-list [-states |-all] | -status ]

 -all   Works with -list to list all nodes.
 -list  List all running nodes. Supports optional use of
-states to filter nodes based on node state, all -all
to list all nodes.
 -statesWorks with -list to filter nodes based on input
comma-separated list of node states.
 -statusPrints the status report of the node.
{code}

Instead, it would be better if the options were ordered:
{code}

usage: yarn node [-list [-states |-all] | -status ]

 -list  List all running nodes.
 -statesWorks with -list to filter nodes based on input
comma-separated list of node states.
 -all   Works with -list to list all nodes.
 -statusPrints the status report of the node.
{code}

Currently the HelpFormatter provided by commons-cli-1.2 orders the options 
alphabetically while printing. Even though there is an option to add a custom 
comparator, it may be difficult. As of v1.3 we can just do
{code}
HelpFormatter formatter = new HelpFormatter();
formatter.setOptionComparator(null);
{code}
so that sorting is skipped and the insertion order is maintained
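
A minimal end-to-end sketch of that approach (assuming commons-cli 1.3+; 
option descriptions abbreviated):
{code}
Options opts = new Options();
opts.addOption("list", false, "List all running nodes.");
opts.addOption("states", true,
    "Works with -list to filter nodes based on node states.");
opts.addOption("all", false, "Works with -list to list all nodes.");
opts.addOption("status", true, "Prints the status report of the node.");

HelpFormatter formatter = new HelpFormatter();
formatter.setOptionComparator(null); // skip sorting, keep insertion order
formatter.printHelp("yarn node [-list [-states |-all] | -status ]", opts);
{code}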



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5820) yarn node CLI help should be clearer

2016-11-02 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-5820:
--
Attachment: YARN-5820.01.patch

[~naganarasimha...@apache.org] as per our offline discussion, I feel 
option 1 will be better. Please review

> yarn node CLI help should be clearer
> 
>
> Key: YARN-5820
> URL: https://issues.apache.org/jira/browse/YARN-5820
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Grant Sohn
>Assignee: Ajith S
>Priority: Trivial
> Attachments: YARN-5820.01.patch
>
>
> Current message is:
> {noformat}
> usage: node
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> It should be either this:
> {noformat}
> usage: yarn node [-list [-states |-all] | -status ]
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> or that.
> {noformat}
> usage: yarn node -list [-states |-all] 
>yarn node -status 
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> The latter is the least ambiguous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5820) yarn node CLI help should be clearer

2016-11-02 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned YARN-5820:
-

Assignee: Ajith S

> yarn node CLI help should be clearer
> 
>
> Key: YARN-5820
> URL: https://issues.apache.org/jira/browse/YARN-5820
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Grant Sohn
>Assignee: Ajith S
>Priority: Trivial
>
> Current message is:
> {noformat}
> usage: node
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> It should be either this:
> {noformat}
> usage: yarn node [-list [-states |-all] | -status ]
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> or that.
> {noformat}
> usage: yarn node -list [-states |-all] 
>yarn node -status 
>  -all   Works with -list to list all nodes.
>  -list  List all running nodes. Supports optional use of
> -states to filter nodes based on node state, all -all
> to list all nodes.
>  -statesWorks with -list to filter nodes based on input
> comma-separated list of node states.
>  -statusPrints the status report of the node.
> {noformat}
> The latter is the least ambiguous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys

2016-10-24 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-5547:
--
Attachment: YARN-5547.02.patch

Please review

> NMLeveldbStateStore should be more tolerant of unknown keys
> ---
>
> Key: YARN-5547
> URL: https://issues.apache.org/jira/browse/YARN-5547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-5547.01.patch, YARN-5547.02.patch
>
>
> Whenever new keys are added to the NM state store it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys it could be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g.: we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store to prevent leaking unrecognized keys.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys

2016-09-14 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15490429#comment-15490429
 ] 

Ajith S commented on YARN-5547:
---

As per the offline discussion with [~Naganarasimha Garla] and [~varun_saxena] 
regarding {{If the old software could consult a table in the database that 
lists what keys are ignorable then it can fail for any unrecognized key that 
isn't in that list and safely ignore ones that are}}: we can add a suffix to 
the keys if they are ignorable, so that even a lower version will know which 
keys can be skipped safely
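
A hypothetical sketch of that convention (the {{.ignorable}} suffix and the 
helper names are illustrative, not the actual NM state store keys):
{code}
private static final String IGNORABLE_SUFFIX = ".ignorable";

private void recoverKey(String key, byte[] value) throws IOException {
  if (isKnownKey(key)) {
    applyKnownKey(key, value); // normal recovery path
  } else if (key.endsWith(IGNORABLE_SUFFIX)) {
    // written by a newer NM, but marked safe to skip on downgrade
    LOG.warn("Skipping ignorable state store key " + key);
  } else {
    // unknown and not marked ignorable: fail recovery as before
    throw new IOException("Unrecognized state store key " + key);
  }
}
{code}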

> NMLeveldbStateStore should be more tolerant of unknown keys
> ---
>
> Key: YARN-5547
> URL: https://issues.apache.org/jira/browse/YARN-5547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-5547.01.patch
>
>
> Whenever new keys are added to the NM state store it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys it could be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g.: we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store to prevent leaking unrecognized keys.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys

2016-09-14 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15490385#comment-15490385
 ] 

Ajith S commented on YARN-5547:
---

So we have two approaches discussed here:
1. Skip container recovery - this will cause unmonitored containers
2. Kill/fail the container

I am OK with the second approach, but as per [~jlowe] {{The NM has to unregister 
with a service as part of the container failure}}, I don't see a solution for 
that scenario yet. If we can handle that case separately, I can update the 
patch based on the second approach

> NMLeveldbStateStore should be more tolerant of unknown keys
> ---
>
> Key: YARN-5547
> URL: https://issues.apache.org/jira/browse/YARN-5547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-5547.01.patch
>
>
> Whenever new keys are added to the NM state store it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys it could be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g.: we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store to prevent leaking unrecognized keys.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5256) [YARN-3368] Add REST endpoint to support detailed NodeLabel Informations

2016-09-08 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15473063#comment-15473063
 ] 

Ajith S commented on YARN-5256:
---

I have some suggestions:
1. Can this jira be merged to branch-2, since we need this patch and it seems 
more generic and not specific to YARN-3368?
2. When labels are null, can we return details of all labels?
3. Instead of one label, can we accept a set of labels?

> [YARN-3368] Add REST endpoint to support detailed NodeLabel Informations
> 
>
> Key: YARN-5256
> URL: https://issues.apache.org/jira/browse/YARN-5256
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-5256-YARN-3368.1.patch, YARN-5256-YARN-3368.2.patch
>
>
> Add a new REST endpoint to fetch few more detailed information about node 
> labels such as resource, list of nodes etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5584) Include name of JAR/Tar/Zip on failure to expand artifact on download

2016-08-30 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned YARN-5584:
-

Assignee: Ajith S

> Include name of JAR/Tar/Zip on failure to expand artifact on download
> -
>
> Key: YARN-5584
> URL: https://issues.apache.org/jira/browse/YARN-5584
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Ajith S
>
> If yarn can't expand a JAR/ZIP/tar file on download, the exception is passed 
> back to the AM —but not the name of which file is failing. This makes it 
> harder to track down the problem than one would like.
> {code}
> java.util.zip.ZipException: invalid CEN header (bad signature)
>   at java.util.zip.ZipFile.open(Native Method)
>   at java.util.zip.ZipFile.(ZipFile.java:215)
>   at java.util.zip.ZipFile.(ZipFile.java:145)
>   at java.util.zip.ZipFile.(ZipFile.java:159)
>   at org.apache.hadoop.fs.FileUtil.unZip(FileUtil.java:589)
>   at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:277)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:362)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4202) TestYarnClient#testReservationAPIs fails intermittently

2016-08-26 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438866#comment-15438866
 ] 

Ajith S commented on YARN-4202:
---

Found that this issue is fixed by YARN-4686

> TestYarnClient#testReservationAPIs fails intermittently
> ---
>
> Key: YARN-4202
> URL: https://issues.apache.org/jira/browse/YARN-4202
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Mit Desai
>Assignee: Ajith S
>
> Found this failure while looking at the Pre-run on one of my Jiras.
> {noformat}
> org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningException:
>  The planning algorithm could not find a valid allocation for your request
>  at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1149)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitReservation(ApplicationClientProtocolPBServiceImpl.java:428)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:465)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2230)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2224)
> Caused by: 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningException:
>  The planning algorithm could not find a valid allocation for your request
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.PlanningAlgorithm.allocateUser(PlanningAlgorithm.java:69)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.PlanningAlgorithm.createReservation(PlanningAlgorithm.java:140)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.TryManyReservationAgents.createReservation(TryManyReservationAgents.java:55)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.AlignedPlannerWithGreedy.createReservation(AlignedPlannerWithGreedy.java:84)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1132)
>  ... 10 more
> {noformat}
> TestReport Link: 
> https://builds.apache.org/job/PreCommit-YARN-Build/9243/testReport/
> When I ran this on my local box branch-2, it succeeds.
> {noformat}
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.yarn.client.api.impl.TestYarnClient
> Tests run: 21, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.999 sec - 
> in org.apache.hadoop.yarn.client.api.impl.TestYarnClient
> Results :
> Tests run: 21, Failures: 0, Errors: 0, Skipped: 0
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> [INFO] 
> 
> [INFO] Total time: 52.029 s
> [INFO] Finished at: 2015-09-23T11:25:04-06:00
> [INFO] Final Memory: 31M/391M
> [INFO] 
> 
> {noformat}
> Haven't tried if it is a problem in branch-2.7 or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-4202) TestYarnClient#testReservationAPIs fails intermittently

2016-08-26 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned YARN-4202:
-

Assignee: Ajith S  (was: nijel)

> TestYarnClient#testReservationAPIs fails intermittently
> ---
>
> Key: YARN-4202
> URL: https://issues.apache.org/jira/browse/YARN-4202
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Mit Desai
>Assignee: Ajith S
>
> Found this failure while looking at the Pre-run on one of my Jiras.
> {noformat}
> org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningException:
>  The planning algorithm could not find a valid allocation for your request
>  at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1149)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitReservation(ApplicationClientProtocolPBServiceImpl.java:428)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:465)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2230)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2224)
> Caused by: 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningException:
>  The planning algorithm could not find a valid allocation for your request
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.PlanningAlgorithm.allocateUser(PlanningAlgorithm.java:69)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.PlanningAlgorithm.createReservation(PlanningAlgorithm.java:140)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.TryManyReservationAgents.createReservation(TryManyReservationAgents.java:55)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.AlignedPlannerWithGreedy.createReservation(AlignedPlannerWithGreedy.java:84)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1132)
>  ... 10 more
> {noformat}
> TestReport Link: 
> https://builds.apache.org/job/PreCommit-YARN-Build/9243/testReport/
> When I ran this on my local box branch-2, it succeeds.
> {noformat}
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.yarn.client.api.impl.TestYarnClient
> Tests run: 21, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.999 sec - 
> in org.apache.hadoop.yarn.client.api.impl.TestYarnClient
> Results :
> Tests run: 21, Failures: 0, Errors: 0, Skipped: 0
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> [INFO] 
> 
> [INFO] Total time: 52.029 s
> [INFO] Finished at: 2015-09-23T11:25:04-06:00
> [INFO] Final Memory: 31M/391M
> [INFO] 
> 
> {noformat}
> Haven't tried if it is a problem in branch-2.7 or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4202) TestYarnClient#testReservationAPIs fails intermittently

2016-08-26 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438806#comment-15438806
 ] 

Ajith S commented on YARN-4202:
---

Hi [~nijel]

Would like to work on this; in case you are already working on it, please feel 
free to assign it back

> TestYarnClient#testReservationAPIs fails intermittently
> ---
>
> Key: YARN-4202
> URL: https://issues.apache.org/jira/browse/YARN-4202
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Mit Desai
>Assignee: nijel
>
> Found this failure while looking at the Pre-run on one of my Jiras.
> {noformat}
> org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningException:
>  The planning algorithm could not find a valid allocation for your request
>  at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1149)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitReservation(ApplicationClientProtocolPBServiceImpl.java:428)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:465)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2230)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2224)
> Caused by: 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningException:
>  The planning algorithm could not find a valid allocation for your request
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.PlanningAlgorithm.allocateUser(PlanningAlgorithm.java:69)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.PlanningAlgorithm.createReservation(PlanningAlgorithm.java:140)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.TryManyReservationAgents.createReservation(TryManyReservationAgents.java:55)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.AlignedPlannerWithGreedy.createReservation(AlignedPlannerWithGreedy.java:84)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1132)
>  ... 10 more
> {noformat}
> TestReport Link: 
> https://builds.apache.org/job/PreCommit-YARN-Build/9243/testReport/
> When I ran this on my local box branch-2, it succeeds.
> {noformat}
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.yarn.client.api.impl.TestYarnClient
> Tests run: 21, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.999 sec - 
> in org.apache.hadoop.yarn.client.api.impl.TestYarnClient
> Results :
> Tests run: 21, Failures: 0, Errors: 0, Skipped: 0
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> [INFO] 
> 
> [INFO] Total time: 52.029 s
> [INFO] Finished at: 2015-09-23T11:25:04-06:00
> [INFO] Final Memory: 31M/391M
> [INFO] 
> 
> {noformat}
> Haven't tried if it is a problem in branch-2.7 or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys

2016-08-26 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438766#comment-15438766
 ] 

Ajith S commented on YARN-5547:
---

Hi [~jlowe] 

I have attached a patch for handling the exception, but I have a doubt about 
{{prevent leaking unrecognized keys}}. Can you please elaborate?

> NMLeveldbStateStore should be more tolerant of unknown keys
> ---
>
> Key: YARN-5547
> URL: https://issues.apache.org/jira/browse/YARN-5547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-5547.01.patch
>
>
> Whenever new keys are added to the NM state store it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys it could be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g.: we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store to prevent leaking unrecognized keys.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys

2016-08-26 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-5547:
--
Attachment: YARN-5547.01.patch

> NMLeveldbStateStore should be more tolerant of unknown keys
> ---
>
> Key: YARN-5547
> URL: https://issues.apache.org/jira/browse/YARN-5547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-5547.01.patch
>
>
> Whenever new keys are added to the NM state store it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys it could be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g.: we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store to prevent leaking unrecognized keys.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys

2016-08-22 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432153#comment-15432153
 ] 

Ajith S commented on YARN-5547:
---

Thanks [~jlowe]. I would like to work on this. In case you have already 
started working on it, please feel free to assign it back.

> NMLeveldbStateStore should be more tolerant of unknown keys
> ---
>
> Key: YARN-5547
> URL: https://issues.apache.org/jira/browse/YARN-5547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Ajith S
>
> Whenever new keys are added to the NM state store it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys it could be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g.: we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store to prevent leaking unrecognized keys.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys

2016-08-22 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned YARN-5547:
-

Assignee: Ajith S

> NMLeveldbStateStore should be more tolerant of unknown keys
> ---
>
> Key: YARN-5547
> URL: https://issues.apache.org/jira/browse/YARN-5547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Ajith S
>
> Whenever new keys are added to the NM state store it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys it could be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g.: we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store to prevent leaking unrecognized keys.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2016-08-17 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425918#comment-15425918
 ] 

Ajith S commented on YARN-574:
--

I have a requirement for this. [~ojoshi], may I work on this?

> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> ---
>
> Key: YARN-574
> URL: https://issues.apache.org/jira/browse/YARN-574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-574.1.patch, YARN-574.2.patch
>
>
> At present private resources will be downloaded in parallel only if multiple 
> containers request the same resource. However otherwise it will be serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads however it is not used and only one resource is sent for 
> downloading at a time.
> I think we can increase / assure parallelism (even for single container 
> requesting resource) for private/application resources by making multiple 
> downloads per ContainerLocalizer.
> Total Parallelism before
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers[private and application resource]
> Total Parallelism after
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers * max downloads per container [private and application resource]
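
(To make the quoted arithmetic concrete with hypothetical numbers: with 4 
PublicLocalizer threads and 10 containers, total parallelism today would be 
4 + 10 = 14 concurrent downloads; allowing, say, 4 downloads per 
ContainerLocalizer would raise it to 4 + 10 * 4 = 44.)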



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5491) Random Failure TestCapacityScheduler#testCSQueueBlocked

2016-08-11 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned YARN-5491:
-

Assignee: Ajith S

> Random Failure TestCapacityScheduler#testCSQueueBlocked
> ---
>
> Key: YARN-5491
> URL: https://issues.apache.org/jira/browse/YARN-5491
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>
> Random testcase failure in trunk for 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler.testCSQueueBlocked
> https://builds.apache.org/job/PreCommit-YARN-Build/12694/testReport/org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity/TestCapacityScheduler/testCSQueueBlocked/
> {noformat}
> java.lang.AssertionError: B Used Resource should be 12 GB expected:<12288> 
> but was:<11264>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler.testCSQueueBlocked(TestCapacityScheduler.java:3667)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2398) TestResourceTrackerOnHA crashes

2016-08-09 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15413425#comment-15413425
 ] 

Ajith S commented on YARN-2398:
---

The old test failures are fixed. This new set of failures doesn't look 
related to the patch.

> TestResourceTrackerOnHA crashes
> ---
>
> Key: YARN-2398
> URL: https://issues.apache.org/jira/browse/YARN-2398
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-2398.02.patch, YARN-2398.patch
>
>
> TestResourceTrackerOnHA is currently crashing and failing trunk builds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2398) TestResourceTrackerOnHA crashes

2016-08-09 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-2398:
--
Attachment: YARN-2398.02.patch

While waiting, we can check whether any failover thread is running; if none 
is, the test case has done an explicit failover, so the wait can return true. 
Attaching an updated patch with the proposed fix.
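
A minimal sketch of the idea; {{failoverTriggered}} and {{failoverThread}} 
are assumed fields of MiniYARNClusterForHATesting, and the names are 
illustrative rather than the actual patch:

{code}
private boolean waittingForFailOver() {
  int maximumWaittingTime = 50;
  int count = 0;
  while (!failoverTriggered.get() && count <= maximumWaittingTime) {
    // If no failover thread is running, the test case has already performed
    // an explicit failover, so there is nothing left to wait for.
    if (failoverThread == null || !failoverThread.isAlive()) {
      return true;
    }
    try {
      Thread.sleep(100);
    } catch (InterruptedException e) {
      // DO NOTHING
    }
    count++;
  }
  return failoverTriggered.get();
}
{code}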

> TestResourceTrackerOnHA crashes
> ---
>
> Key: YARN-2398
> URL: https://issues.apache.org/jira/browse/YARN-2398
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-2398.02.patch, YARN-2398.patch
>
>
> TestResourceTrackerOnHA is currently crashing and failing trunk builds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2398) TestResourceTrackerOnHA crashes

2016-08-09 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15413122#comment-15413122
 ] 

Ajith S commented on YARN-2398:
---

The test failure seems to be caused by 
{{org.apache.hadoop.yarn.client.ProtocolHATestBase.MiniYARNClusterForHATesting.CustomedClientRMService.getClusterMetrics(GetClusterMetricsRequest)}},
 called through 
{{org.apache.hadoop.yarn.client.ProtocolHATestBase.verifyConnections()}}
in {{org.apache.hadoop.yarn.client.ProtocolHATestBase.startHACluster(int, 
boolean, boolean, boolean)}}.
The method verifyConnections() is supposed to just check the connection, but 
it internally also waits for a failover, which is wrong.
Any thoughts?

> TestResourceTrackerOnHA crashes
> ---
>
> Key: YARN-2398
> URL: https://issues.apache.org/jira/browse/YARN-2398
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-2398.patch
>
>
> TestResourceTrackerOnHA is currently crashing and failing trunk builds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2398) TestResourceTrackerOnHA crashes

2016-08-09 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15413092#comment-15413092
 ] 

Ajith S commented on YARN-2398:
---

Thanks for the review, [~rohithsharma].
The test case failures are related to the patch. It looks like the patch 
uncovered some more bugs in the test code; I will fix them and update the 
patch.

> TestResourceTrackerOnHA crashes
> ---
>
> Key: YARN-2398
> URL: https://issues.apache.org/jira/browse/YARN-2398
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-2398.patch
>
>
> TestResourceTrackerOnHA is currently crashing and failing trunk builds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2398) TestResourceTrackerOnHA crashes

2016-08-08 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-2398:
--
Attachment: YARN-2398.patch

Attached a patch with the modification. Please review.

> TestResourceTrackerOnHA crashes
> ---
>
> Key: YARN-2398
> URL: https://issues.apache.org/jira/browse/YARN-2398
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-2398.patch
>
>
> TestResourceTrackerOnHA is currently crashing and failing trunk builds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2398) TestResourceTrackerOnHA crashes

2016-08-08 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411790#comment-15411790
 ] 

Ajith S commented on YARN-2398:
---

I encountered the same scenario. I believe the bug is in 
{{org.apache.hadoop.yarn.client.ProtocolHATestBase.MiniYARNClusterForHATesting.waittingForFailOver()}}

{code}
 private boolean waittingForFailOver() {
  int maximumWaittingTime = 50;
  int count = 0;
  while (!failoverTriggered.get() && count >= maximumWaittingTime) {
try {
  Thread.sleep(100);
} catch (InterruptedException e) {
  // DO NOTHING
}
count++;
  }
...
{code}

Here it should be {{count <= maximumWaittingTime}}.
Otherwise, the while loop exits at the very first check, so there is no 
actual wait, and that causes a race condition between the failover and the 
registerNM event.
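
With the comparison fixed, the loop would actually poll (sketch; only the 
condition changes):

{code}
  while (!failoverTriggered.get() && count <= maximumWaittingTime) {
    try {
      Thread.sleep(100); // now really waits, up to ~5 seconds (50 * 100 ms)
    } catch (InterruptedException e) {
      // DO NOTHING
    }
    count++;
  }
{code}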

> TestResourceTrackerOnHA crashes
> ---
>
> Key: YARN-2398
> URL: https://issues.apache.org/jira/browse/YARN-2398
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Jason Lowe
>Assignee: Ajith S
>
> TestResourceTrackerOnHA is currently crashing and failing trunk builds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-2398) TestResourceTrackerOnHA crashes

2016-08-08 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reopened YARN-2398:
---

> TestResourceTrackerOnHA crashes
> ---
>
> Key: YARN-2398
> URL: https://issues.apache.org/jira/browse/YARN-2398
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Jason Lowe
>Assignee: Ajith S
>
> TestResourceTrackerOnHA is currently crashing and failing trunk builds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-2398) TestResourceTrackerOnHA crashes

2016-08-08 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned YARN-2398:
-

Assignee: Ajith S

> TestResourceTrackerOnHA crashes
> ---
>
> Key: YARN-2398
> URL: https://issues.apache.org/jira/browse/YARN-2398
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Jason Lowe
>Assignee: Ajith S
>
> TestResourceTrackerOnHA is currently crashing and failing trunk builds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4989) TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails intermittently

2016-05-10 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-4989:
--
Attachment: YARN-4989.patch

Please review

> TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails 
> intermittently 
> ---
>
> Key: YARN-4989
> URL: https://issues.apache.org/jira/browse/YARN-4989
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Ajith S
> Attachments: YARN-4989.patch
>
>
> Sometimes TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails 
> randomly.
> {noformat}
> java.lang.AssertionError: expected:<> but 
> was:<>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkCSLeafQueue(TestWorkPreservingRMRestart.java:289)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testCapacitySchedulerRecovery(TestWorkPreservingRMRestart.java:501)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3924) Submitting an application to standby ResourceManager should respond better than Connection Refused

2015-08-12 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694797#comment-14694797
 ] 

Ajith S commented on YARN-3924:
---

Adding to the suggestion: the *AdminService* in the RM already follows this 
pattern; when the RM is in standby, it throws a StandbyException. We can 
implement the other RM services in a similar way.
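
For illustration, a minimal sketch of that pattern, assuming 
{{org.apache.hadoop.ipc.StandbyException}}; the class and the HA-state flag 
below are hypothetical stand-ins, not the actual AdminService code:

{code}
import org.apache.hadoop.ipc.StandbyException;

// Each client-facing RPC method would first call throwIfStandby(); the
// isActive flag stands in for whatever HA-state check the service uses.
public class StandbyAwareService {
  private volatile boolean isActive; // hypothetical HA state

  private void throwIfStandby() throws StandbyException {
    if (!isActive) {
      throw new StandbyException(
          "ResourceManager is in standby state; retry against the active RM");
    }
  }
}
{code}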

> Submitting an application to standby ResourceManager should respond better 
> than Connection Refused
> --
>
> Key: YARN-3924
> URL: https://issues.apache.org/jira/browse/YARN-3924
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Dustin Cote
>Assignee: Ajith S
>Priority: Minor
>
> When submitting an application directly to a standby resource manager, the 
> resource manager responds with 'Connection Refused' rather than indicating 
> that it is a standby resource manager.  Because the resource manager is aware 
> of its own state, I feel like we can have the 8032 port open for standby 
> resource managers and reject the request with something like 'Cannot process 
> application submission from this standby resource manager'.  
> This would be especially helpful for debugging oozie problems when users put 
> in the wrong address for the 'jobtracker' (i.e. they don't put the logical RM 
> address but rather point to a specific resource manager).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3924) Submitting an application to standby ResourceManager should respond better than Connection Refused

2015-08-12 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694792#comment-14694792
 ] 

Ajith S commented on YARN-3924:
---

Hi [~kasha] and [~rohithsharma]

Thank you for the inputs. The message *None of the RMs specified by ha-ids 
appear to be active.* is fine, but again, as [~cote] and I previously 
mentioned, it does not distinguish between two cases: RMs being down or 
client configuration problems versus RMs being in standby. I agree that in 
the current design this is hard to convey, since the RPC servers are started 
only when an RM is active. So, may I suggest that we change this so that when 
an RM is in standby, its RPC servers are up and throw a StandbyException to 
the client?

> Submitting an application to standby ResourceManager should respond better 
> than Connection Refused
> --
>
> Key: YARN-3924
> URL: https://issues.apache.org/jira/browse/YARN-3924
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Dustin Cote
>Assignee: Ajith S
>Priority: Minor
>
> When submitting an application directly to a standby resource manager, the 
> resource manager responds with 'Connection Refused' rather than indicating 
> that it is a standby resource manager.  Because the resource manager is aware 
> of its own state, I feel like we can have the 8032 port open for standby 
> resource managers and reject the request with something like 'Cannot process 
> application submission from this standby resource manager'.  
> This would be especially helpful for debugging oozie problems when users put 
> in the wrong address for the 'jobtracker' (i.e. they don't put the logical RM 
> address but rather point to a specific resource manager).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3689) FifoComparator logic is wrong. In method "compare" in "FifoPolicy.java" file, the "s1" and "s2" should change position when compare priority

2015-08-12 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694784#comment-14694784
 ] 

Ajith S commented on YARN-3689:
---

Please ignore my previous comment; the current code is fine if {quote} *Higher 
Integer* indicates *Higher Priority* {quote}
I overlooked it. :)

> FifoComparator logic is wrong. In method "compare" in "FifoPolicy.java" file, 
> the "s1" and "s2" should change position when compare priority 
> -
>
> Key: YARN-3689
> URL: https://issues.apache.org/jira/browse/YARN-3689
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, scheduler
>Affects Versions: 2.5.0
>Reporter: zhoulinlin
>Assignee: Ajith S
>
> In method "compare" in "FifoPolicy.java" file, the "s1" and "s2" should 
> change position when compare priority.
> I did a test. Configured the schedulerpolicy "fifo",  submitted 2 jobs to the 
> same queue.
> The result is below:
> 2015-05-20 11:57:41,449 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> before sort --  
> 2015-05-20 11:57:41,449 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> appName:application_1432094103221_0001 appPririty:4  
> appStartTime:1432094170038
> 2015-05-20 11:57:41,449 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> appName:application_1432094103221_0002 appPririty:2  
> appStartTime:1432094173131
> 2015-05-20 11:57:41,449 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> after sort % 
> 2015-05-20 11:57:41,449 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> appName:application_1432094103221_0001 appPririty:4  
> appStartTime:1432094170038  
> 2015-05-20 11:57:41,449 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> appName:application_1432094103221_0002 appPririty:2  
> appStartTime:1432094173131  
> But when change the "s1" and "s2" position like below:
> public int compare(Schedulable s1, Schedulable s2) {
>   int res = s2.getPriority().compareTo(s1.getPriority());
> .}
> The result:
> 2015-05-20 11:36:37,119 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> before sort -- 
> 2015-05-20 11:36:37,119 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> appName:application_1432090734333_0009 appPririty:4  
> appStartTime:1432092992503
> 2015-05-20 11:36:37,119 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> appName:application_1432090734333_0010 appPririty:2  
> appStartTime:1432092996437
> 2015-05-20 11:36:37,119 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> after sort % 
> 2015-05-20 11:36:37,119 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> appName:application_1432090734333_0010 appPririty:2  
> appStartTime:1432092996437
> 2015-05-20 11:36:37,119 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> appName:application_1432090734333_0009 appPririty:4  
> appStartTime:1432092992503 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3689) FifoComparator logic is wrong. In method "compare" in "FifoPolicy.java" file, the "s1" and "s2" should change position when compare priority

2015-08-12 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694771#comment-14694771
 ] 

Ajith S commented on YARN-3689:
---

Hi [~rohithsharma]

Thanks for the input. 

{quote} *Higher Integer* indicates *Higher priority* {quote}

So, I think we should flip the operands in 
{{org.apache.hadoop.yarn.api.records.Priority.compareTo(Priority)}}.
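
A sketch of what the flipped comparison could look like (illustrative, not 
the actual patch):

{code}
// In org.apache.hadoop.yarn.api.records.Priority (sketch only):
@Override
public int compareTo(Priority other) {
  // Flipped operands: a higher integer value now sorts as higher priority.
  return other.getPriority() - this.getPriority();
}
{code}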

> FifoComparator logic is wrong. In method "compare" in "FifoPolicy.java" file, 
> the "s1" and "s2" should change position when compare priority 
> -
>
> Key: YARN-3689
> URL: https://issues.apache.org/jira/browse/YARN-3689
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, scheduler
>Affects Versions: 2.5.0
>Reporter: zhoulinlin
>Assignee: Ajith S
>
> In method "compare" in "FifoPolicy.java" file, the "s1" and "s2" should 
> change position when compare priority.
> I did a test. Configured the schedulerpolicy "fifo",  submitted 2 jobs to the 
> same queue.
> The result is below:
> 2015-05-20 11:57:41,449 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> before sort --  
> 2015-05-20 11:57:41,449 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> appName:application_1432094103221_0001 appPririty:4  
> appStartTime:1432094170038
> 2015-05-20 11:57:41,449 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> appName:application_1432094103221_0002 appPririty:2  
> appStartTime:1432094173131
> 2015-05-20 11:57:41,449 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> after sort % 
> 2015-05-20 11:57:41,449 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> appName:application_1432094103221_0001 appPririty:4  
> appStartTime:1432094170038  
> 2015-05-20 11:57:41,449 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> appName:application_1432094103221_0002 appPririty:2  
> appStartTime:1432094173131  
> But when change the "s1" and "s2" position like below:
> public int compare(Schedulable s1, Schedulable s2) {
>   int res = s2.getPriority().compareTo(s1.getPriority());
> .}
> The result:
> 2015-05-20 11:36:37,119 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> before sort -- 
> 2015-05-20 11:36:37,119 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> appName:application_1432090734333_0009 appPririty:4  
> appStartTime:1432092992503
> 2015-05-20 11:36:37,119 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> appName:application_1432090734333_0010 appPririty:2  
> appStartTime:1432092996437
> 2015-05-20 11:36:37,119 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> after sort % 
> 2015-05-20 11:36:37,119 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> appName:application_1432090734333_0010 appPririty:2  
> appStartTime:1432092996437
> 2015-05-20 11:36:37,119 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: 
> appName:application_1432090734333_0009 appPririty:4  
> appStartTime:1432092992503 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

