[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2017-06-07 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042057#comment-16042057
 ] 

Hong Zhiguo commented on YARN-4024:
---

[~maobaolong], this depends on the probability of a node getting a new IP 
address without shutting down or restarting its NM.
If you are sure that probability is zero, you can assign the expiry interval a 
very large value. That is actually the situation in our clusters.
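
For readers following along, here is a minimal, self-contained sketch of the kind of expiring host-to-IP cache this thread is about, assuming Guava's CacheBuilder (already on the Hadoop classpath); the class and method names are invented for illustration and are not the actual YARN-4024 patch code:
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class ExpiringHostIpCache {
  private final LoadingCache<String, String> ipCache;

  public ExpiringHostIpCache(long expirySecs) {
    // A very large expirySecs effectively means "never re-resolve", which is
    // what the comment above suggests for clusters whose node IPs never change.
    this.ipCache = CacheBuilder.newBuilder()
        .expireAfterWrite(expirySecs, TimeUnit.SECONDS)
        .build(new CacheLoader<String, String>() {
          @Override
          public String load(String hostName) throws UnknownHostException {
            // Only reached on a cache miss, so a slow DNS server no longer
            // stalls every single heartbeat.
            return InetAddress.getByName(hostName).getHostAddress();
          }
        });
  }

  public String resolve(String hostName) {
    return ipCache.getUnchecked(hostName);
  }
}
{code}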

> YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
> --
>
> Key: YARN-4024
> URL: https://issues.apache.org/jira/browse/YARN-4024
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Hong Zhiguo
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-4024-draft.patch, YARN-4024-draft-v2.patch, 
> YARN-4024-draft-v3.patch, YARN-4024-v4.patch, YARN-4024-v5.patch, 
> YARN-4024-v6.patch, YARN-4024-v7.patch
>
>
> Currently, the YARN RM NodesListManager resolves the IP address every time a 
> node heartbeats. When the DNS server becomes slow, NM heartbeats are blocked 
> and cannot make progress.






[jira] [Comment Edited] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2017-06-06 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16040075#comment-16040075
 ] 

Hong Zhiguo edited comment on YARN-4024 at 6/7/17 3:23 AM:
---

[~maobaolong], we don't turn on log aggregation, to avoid the pressure on the 
network and HDFS.

For the code you questioned: when the node status changes, we MUST invalidate 
the corresponding cache item that was learned while the node was in another 
state, because the IP address may have changed along with the status.

I think it's OK to add a break statement in the "default" case, and it's also OK 
to invalidate the cache item when we get an unexpected event.
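
A self-contained sketch of the invalidation rule described above, with invented names (this is not the patch code): whatever state-change event arrives for a node, drop its cached IP so the next heartbeat resolves it afresh.
{code}
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class NodeIpInvalidation {
  enum NodeEventType { NODE_USABLE, NODE_UNUSABLE, NODE_DECOMMISSIONING }

  // Stand-in for the expiring host-to-IP cache sketched earlier in this thread.
  private final Cache<String, String> ipCache = CacheBuilder.newBuilder().build();

  void onNodeEvent(String hostName, NodeEventType type) {
    switch (type) {
      case NODE_USABLE:
      case NODE_UNUSABLE:
        // The cached IP was learned while the node was in another state and
        // may be stale, so drop it.
        ipCache.invalidate(hostName);
        break;
      default:
        // Unexpected event: invalidating anyway is harmless; it only costs one
        // extra DNS lookup on the next heartbeat. Note the explicit break in
        // the "default" case, as discussed above.
        ipCache.invalidate(hostName);
        break;
    }
  }
}
{code}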


was (Author: zhiguohong):
[~maobaolong], we don't turn on log aggregation, to avoid the pressure on the 
network and HDFS.




[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2017-06-06 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16040075#comment-16040075
 ] 

Hong Zhiguo commented on YARN-4024:
---

[~maobaolong], we don't turn on log aggregation, to avoid the pressure on the 
network and HDFS.




[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-16 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929330#comment-15929330
 ] 

Hong Zhiguo commented on YARN-6319:
---

[~haibochen], the post-callback will not serialize container cleanup; the 
cleanups of different containers still run in parallel.

> race condition between deleting app dir and deleting container dir
> --
>
> Key: YARN-6319
> URL: https://issues.apache.org/jira/browse/YARN-6319
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>
> The last container (on one node) of an app completes
> |--> triggers async deletion of the container dir (container cleanup)
> |--> triggers async deletion of the app dir (app cleanup)
> For LCE, deletion is done by container-executor. The "app cleanup" lists the 
> sub-dirs (step 1) and then unlinks the items one by one (step 2). If a file is 
> deleted by "container cleanup" between step 1 and step 2, it reports the error 
> below and aborts the deletion.
> {code}
> ContainerExecutor: Couldn't delete file 
> $LOCAL/usercache/$USER/appcache/application_1481785469354_353539/container_1481785469354_353539_01_28/$FILE
>  - No such file or directory
> {code}
> This app dir then escapes the cleanup, which is why we always have many app 
> dirs left behind.
> solution 1: just ignore the error, without aborting, in 
> container-executor.c::delete_path()
> solution 2: use a lock to serialize the cleanup of the same app dir
> solution 3: back off and retry on error
> Comments are welcome.






[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-15 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927351#comment-15927351
 ] 

Hong Zhiguo commented on YARN-6319:
---

[~haibochen], the CONTAINER_RESOURCES_CLEANEDUP event should be sent by every 
container, not only the last one, so no special logic is needed to pick the last 
one.
The post-callback can be invoked without checking whether the deletion succeeded 
or failed. Just make sure the event is sent **after** the "container cleanup" to 
avoid the race condition. This does not reduce robustness.
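
A self-contained toy model of the ordering proposed here, with invented names (not the actual FileDeletionTask or DeletionService API): the "resources cleaned up" event for a container is fired from a post-callback of the async deletion, so it can never overtake the deletion it describes.
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PostCallbackSketch {
  interface Callback { void onDone(); }

  static void deleteAsync(ExecutorService pool, String dir, Callback callback) {
    pool.submit(() -> {
      System.out.println("deleting " + dir);   // the async "container cleanup"
      callback.onDone();                       // only now is the event sent
    });
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    deleteAsync(pool, "appcache/app_1/container_1",
        () -> System.out.println("CONTAINER_RESOURCES_CLEANEDUP container_1"));
    pool.shutdown();
  }
}
{code}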





[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-14 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925421#comment-15925421
 ] 

Hong Zhiguo commented on YARN-6319:
---

[~haibochen], thanks for your comments. I found that solution 1 may need to be 
implemented separately for each ContainerExecutor. Solution 2 is more general.





[jira] [Comment Edited] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-14 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923890#comment-15923890
 ] 

Hong Zhiguo edited comment on YARN-6319 at 3/14/17 9:50 AM:


One "locking" solution:
Add a post-callback to FileDeletionTask. And CONTAINER_RESOURCES_CLEANEDUP 
event is only sent by that callback.

Comments please.


was (Author: zhiguohong):
One "locking" solution:
Add a post-callback to FileDeletionTask. And CONTAINER_RESOURCES_CLEANEDUP 
event is sent by that callback.

Comments please.




[jira] [Comment Edited] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-14 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923890#comment-15923890
 ] 

Hong Zhiguo edited comment on YARN-6319 at 3/14/17 9:49 AM:


One "locking" solution:
Add a post-callback to FileDeletionTask. And CONTAINER_RESOURCES_CLEANEDUP 
event is sent by that callback.

Comments please.


was (Author: zhiguohong):
One "serialize" solution:
Add a post-callback to FileDeletionTask. And CONTAINER_RESOURCES_CLEANEDUP 
event is sent by that callback.

Comments please.




[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-14 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923890#comment-15923890
 ] 

Hong Zhiguo commented on YARN-6319:
---

One "serialize" solution:
Add a post-callback to FileDeletionTask. And CONTAINER_RESOURCES_CLEANEDUP 
event is sent by that callback.

Comments please.




[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-14 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923881#comment-15923881
 ] 

Hong Zhiguo commented on YARN-6319:
---

The "app dir cleanup" is triggered in 
ApplicationImpl.AppFinishTriggeredTransition or 
ApplicationImpl.AppFinishTransition. There's pre-condition that all container 
dirs are cleaned up.
{code}
if (app.containers.isEmpty()) {
// No container to cleanup. Cleanup app level resources.
app.handleAppFinishWithContainersCleanedup();
return ApplicationState.APPLICATION_RESOURCES_CLEANINGUP;
}
{code}

But this doesn't work. Because in 
ResourceLocalizationService.handleCleanupContainerResources, only async 
"container dir cleanup" is triggered, and then CONTAINER_RESOURCES_CLEANEDUP 
event is sent out, which then leads to ApplicationImpl.AppFinishTransition and 
ApplicationImpl.containers.Remove(...). So ApplicationImpl.containers could be 
empty but  "container dir cleanup"  is still in fly.






[jira] [Updated] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-13 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-6319:
--
Description: 
The last container (on one node) of an app completes
|--> triggers async deletion of the container dir (container cleanup)
|--> triggers async deletion of the app dir (app cleanup)

For LCE, deletion is done by container-executor. The "app cleanup" lists the 
sub-dirs (step 1) and then unlinks the items one by one (step 2). If a file is 
deleted by "container cleanup" between step 1 and step 2, it reports the error 
below and aborts the deletion.
{code}
ContainerExecutor: Couldn't delete file 
$LOCAL/usercache/$USER/appcache/application_1481785469354_353539/container_1481785469354_353539_01_28/$FILE
 - No such file or directory
{code}

This app dir then escapes the cleanup, which is why we always have many app 
dirs left behind.

solution 1: just ignore the error, without aborting, in 
container-executor.c::delete_path()
solution 2: use a lock to serialize the cleanup of the same app dir
solution 3: back off and retry on error

Comments are welcome.



  was:
The last container (on one node) of an app completes
|--> triggers async deletion of the container dir (container cleanup)
|--> triggers async deletion of the app dir (app cleanup)

For LCE, deletion is done by container-executor. The "app cleanup" lists the 
sub-dirs (step 1) and then unlinks the items one by one (step 2). If a file is 
deleted by "container cleanup" between step 1 and step 2, it reports the error 
below and aborts the deletion.
{code}
ContainerExecutor: Couldn't delete file 
$LOCAL/usercache/$USER/appcache/application_1481785469354_353539/container_1481785469354_353539_01_28/$FILE
 - No such file or directory
{code}

This app dir then escapes the cleanup, which is why we always have many app 
dirs left behind.

solution 1: just ignore the error, without aborting, in 
container-executor.c::delete_path()
solution 2: use a lock to serialize the cleanup of the same app dir
solution 3: back off and retry on error

Suggestions are welcome.







[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-12 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906816#comment-15906816
 ] 

Hong Zhiguo commented on YARN-6319:
---

The race condition can be reproduced with the script below:
{code}
# Placeholders: set USER/GRP to an existing user and group, and CE to the
# path of the container-executor binary.
USER=xxx
GRP=yyy
CE=/PATH/TO/container-executor
SIZE=200

# Build a fake app dir containing one container dir with several large files,
# so the two deletions run long enough to overlap.
mkdir app
mkdir -p app/container
dd if=/dev/zero of=app/container/a count=$SIZE bs=1M
dd if=/dev/zero of=app/container/b count=$SIZE bs=1M
dd if=/dev/zero of=app/container/c count=$SIZE bs=1M
dd if=/dev/zero of=app/container/d count=$SIZE bs=1M
dd if=/dev/zero of=app/container/e count=$SIZE bs=1M
chown -R $USER:$GRP app/

# Delete the container dir ("container cleanup") and the whole app dir
# ("app cleanup") concurrently, as the NM does.
$CE $USER 3 ./app/container &
$CE $USER 3 ./app
{code}





[jira] [Created] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-10 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-6319:
-

 Summary: race condition between deleting app dir and deleting 
container dir
 Key: YARN-6319
 URL: https://issues.apache.org/jira/browse/YARN-6319
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo


The last container (on one node) of an app completes
|--> triggers async deletion of the container dir (container cleanup)
|--> triggers async deletion of the app dir (app cleanup)

For LCE, deletion is done by container-executor. The "app cleanup" lists the 
sub-dirs (step 1) and then unlinks the items one by one (step 2). If a file is 
deleted by "container cleanup" between step 1 and step 2, it reports the error 
below and aborts the deletion.
{code}
ContainerExecutor: Couldn't delete file 
$LOCAL/usercache/$USER/appcache/application_1481785469354_353539/container_1481785469354_353539_01_28/$FILE
 - No such file or directory
{code}

This app dir then escapes the cleanup, which is why we always have many app 
dirs left behind.

solution 1: just ignore the error, without aborting, in 
container-executor.c::delete_path()
solution 2: use a lock to serialize the cleanup of the same app dir
solution 3: back off and retry on error

Suggestions are welcome.








[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)

2016-04-14 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242280#comment-15242280
 ] 

Hong Zhiguo commented on YARN-2306:
---

The patch is available; do you have any comments?

> leak of reservation metrics (fair scheduler)
> 
>
> Key: YARN-2306
> URL: https://issues.apache.org/jira/browse/YARN-2306
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Minor
> Attachments: YARN-2306-2.patch, YARN-2306-3.patch, YARN-2306.patch
>
>
> This only applies to the fair scheduler; the capacity scheduler is OK.
> When an appAttempt or a node is removed, the reservation metrics 
> (reservedContainers, reservedMB, reservedVCores) are not reduced back.
> These are important metrics for administrators, and wrong values may 
> confuse them.





[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-03-29 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217334#comment-15217334
 ] 

Hong Zhiguo commented on YARN-4002:
---

Including the return statement in the read-lock critical section makes no 
difference except a longer critical section and worse performance.
Since hostsList and excludeList of hostsReader are updated by reference 
assignment, no race condition exists even if the lookup is not protected by the 
read lock.
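
A self-contained sketch of the lockless-read idea argued here: readers never lock, they just dereference whichever sets are current, because the whole reference is swapped atomically (volatile is assumed for cross-thread visibility, which plain assignment alone would not give). The field and method names are illustrative, not the actual HostsFileReader fields.
{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class LocklessHostLists {
  private volatile Set<String> includes = Collections.emptySet();
  private volatile Set<String> excludes = Collections.emptySet();

  // Called only on "refresh nodes": build new sets, then publish them with
  // two atomic reference assignments.
  public void refresh(Set<String> newIncludes, Set<String> newExcludes) {
    this.includes = Collections.unmodifiableSet(new HashSet<>(newIncludes));
    this.excludes = Collections.unmodifiableSet(new HashSet<>(newExcludes));
  }

  // Heartbeat path: no lock taken. Each read sees a consistent snapshot of one
  // list, though (as the rwlock-v2 comment in this thread notes) the two lists
  // may mix an old and a new snapshot.
  public boolean isValidNode(String host) {
    Set<String> in = includes;
    Set<String> ex = excludes;
    return (in.isEmpty() || in.contains(host)) && !ex.contains(host);
  }
}
{code}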

> make ResourceTrackerService.nodeHeartbeat more concurrent
> -
>
> Key: YARN-4002
> URL: https://issues.apache.org/jira/browse/YARN-4002
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Critical
> Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, 
> YARN-4002-rwlock-v2.patch, YARN-4002-rwlock.patch, YARN-4002-v0.patch
>
>
> We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
> design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
> enough to scale for large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think it's 
> unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only 
> updated on "refresh nodes".  All RPC threads handling node heartbeats are 
> only readers.  So RWLock could be used to  alow concurrent access by RPC 
> threads.
> Second, since he fields "includes" and "excludes" of HostsFileReader are 
> always updated by "reference assignment", which is atomic in Java, the reader 
> side lock could just be skipped.





[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-03-20 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4002:
--
Attachment: YARN-4002-rwlock-v2.patch

Uploaded YARN-4002-rwlock-v2.patch for an improvement: make the read-side 
critical section smaller.
{code}
this.hostsReadLock.lock();
try {
  hostsList = hostsReader.getHosts();
  excludeList = hostsReader.getExcludedHosts();
} finally {
  this.hostsReadLock.unlock();
}
{code}

As explained by [~rohithsharma], this prevents mixing an old value of 
hostsReader.getHosts() with a new value of hostsReader.getExcludedHosts(), and 
that is the only reason someone may prefer the rwlock solution over the lockless 
one.

If the mixing is not considered a problem (by myself, for example), the lockless 
solution is good enough.



[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-03-01 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174795#comment-15174795
 ] 

Hong Zhiguo commented on YARN-4002:
---

Hi [~rohithsharma], thanks for the refinement.
But why not take the lockless version?



[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2015-12-03 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4002:
--
Attachment: YARN-4002-rwlock.patch
YARN-4002-lockless-read.patch

Two patches for the two proposed solutions have been submitted.



[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2015-12-03 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039597#comment-15039597
 ] 

Hong Zhiguo commented on YARN-4002:
---

I'm working on it. I've proposed two different solutions and am waiting for 
specific comments.



[jira] [Updated] (YARN-4181) node blacklist for AM launching

2015-09-18 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4181:
--
Description: 
In some cases a node goes bad and most containers launched on it fail, 
including the AM containers being launched there.
This node then has more available resource than the other nodes in the cluster. 
The application whose AM keeps failing has a zero minShareRatio. With the fair 
scheduler, this node is always rated first, and the unfortunate application is 
also likely rated first. The result is that attempts of this application fail 
again and again on the same node.

We should avoid such a deadlock situation.

Solution 1: the NM could detect the failure rate of containers. If the rate is 
high, the NM marks itself unhealthy for a period. But we should be careful not 
to let a buggy application turn all nodes unhealthy; maybe use per-application 
container failure rates.

Solution 2: have an application-level blacklist in AMLauncher, in addition to 
the existing blacklist maintained by the AM (see the sketch below).

  was:
In some cases a node goes bad and most containers launched on it fail, 
including the AM containers being launched there.
This node then has more available resource than the other nodes in the cluster. 
The application whose AM keeps failing has a zero minShareRatio. With the fair 
scheduler, this node is always rated first, and the unfortunate application is 
also likely rated first. The result is that attempts of this application fail 
again and again on the same node.

Solution 1: the NM could detect the failure rate of containers. If the rate is 
high, the NM marks itself unhealthy for a period. But we should be careful not 
to let a buggy application turn all nodes unhealthy; maybe use per-application 
container failure rates.

Solution 2: have an application-level blacklist in AMLauncher, in addition to 
the existing blacklist maintained by the AM.
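
A self-contained toy sketch of what the Solution 2 blacklist could look like: count AM-launch failures per (application, node) and skip nodes that exceeded a threshold when placing the next AM attempt. All names and the threshold are invented for illustration; this is not the AMLauncher API.
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class AmLaunchBlacklist {
  private static final int MAX_FAILURES_PER_NODE = 2;   // assumed threshold
  // appId -> (nodeId -> AM launch failures on that node)
  private final Map<String, Map<String, Integer>> failures = new HashMap<>();

  // Record an AM container launch failure for this app on this node.
  public synchronized void recordFailure(String appId, String nodeId) {
    failures.computeIfAbsent(appId, a -> new HashMap<>())
            .merge(nodeId, 1, Integer::sum);
  }

  // Nodes the launcher should avoid for the app's next AM attempt.
  public synchronized Set<String> getBlacklistedNodes(String appId) {
    Set<String> banned = new HashSet<>();
    failures.getOrDefault(appId, new HashMap<>()).forEach((node, count) -> {
      if (count >= MAX_FAILURES_PER_NODE) {
        banned.add(node);
      }
    });
    return banned;
  }
}
{code}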




[jira] [Created] (YARN-4181) node blacklist for AM launching

2015-09-18 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-4181:
-

 Summary: node blacklist for AM launching
 Key: YARN-4181
 URL: https://issues.apache.org/jira/browse/YARN-4181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor


In some cases a node goes bad and most containers launched on it fail, 
including the AM containers being launched there.
This node then has more available resource than the other nodes in the cluster. 
The application whose AM keeps failing has a zero minShareRatio. With the fair 
scheduler, this node is always rated first, and the unfortunate application is 
also likely rated first. The result is that attempts of this application fail 
again and again on the same node.

Solution 1: the NM could detect the failure rate of containers. If the rate is 
high, the NM marks itself unhealthy for a period. But we should be careful not 
to let a buggy application turn all nodes unhealthy; maybe use per-application 
container failure rates.

Solution 2: have an application-level blacklist in AMLauncher, in addition to 
the existing blacklist maintained by the AM.





[jira] [Updated] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain

2015-09-01 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4104:
--
Description: 
We have more than a thousand queues and several hundred tenants in a busy 
cluster. We get a lot of complaints/questions from the owners/operators of 
queues along the lines of "Why can't my queue/app get resources for such a long 
while?"

It's really hard to answer such questions.

So we added a diagnostic REST endpoint 
"/ws/v1/cluster/schedule/dryrun/{parentQueueName}" which returns the sorted 
list of its children according to its SchedulingPolicy.getComparator(). All 
scheduling parameters of the children are also displayed, such as minShare, 
usage, demand, weight, priority etc.
Usually we just call "/ws/v1/cluster/schedule/root", and the result 
answers the questions by itself.
I feel it's really useful for multi-tenant clusters, and hope it could be 
merged into the mainline.

  was:
We have more than 1 thousand queues and several handreds of tenants in a busy 
cluster. We get a lot of complains/questions from owner/operator of queues 
about "Why my queue/app can't get resource for a long while? "

It's realy hard to answer such questions.

So we added an diagnostic REST endpoint 
"/ws/v1/cluster/schedule/dryrun/{parentQueueName}" which returns the sorted 
list of it's children according to it's SchedulingPolicy.getComparator().  All 
scheduling parameters of the chidren are also displayed, such as minShare, 
usage, demand, weight, priority etc.
Usually we just call "/ws/v1/cluster/schedule/root", and the result 
self-explains to the questions.
I feel it's really usefull for multi-tenant clusters, and hope it could be 
merged into the mainline.




[jira] [Created] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain

2015-09-01 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-4104:
-

 Summary: dryrun of schedule for diagnostic and tenant's complain
 Key: YARN-4104
 URL: https://issues.apache.org/jira/browse/YARN-4104
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor


We have more than 1 thousand queues and several handreds of tenants in a busy 
cluster. We get a lot of complains/questions from owner/operator of queues 
about "Why my queue/app can't get resource for a long while? "

It's realy hard to answer such questions.

So we added an diagnostic REST endpoint 
"/ws/v1/cluster/schedule/dryrun/{parentQueueName}" which returns the sorted 
list of it's children according to it's SchedulingPolicy.getComparator().  All 
scheduling parameters of the chidren are also displayed, such as minShare, 
usage, demand, weight, priority etc.
Usually we just call "/ws/v1/cluster/schedule/root", and the result 
self-explains to the questions.
I feel it's really usefull for multi-tenant clusters, and hope it could be 
merged into the mainline.





[jira] [Updated] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain

2015-09-01 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4104:
--
Description: 
We have more than a thousand queues and several hundred tenants in a busy 
cluster. We get a lot of complaints/questions from the owners/operators of 
queues along the lines of "Why can't my queue/app get resources for such a long 
while?"

It's really hard to answer such questions.

So we added a diagnostic REST endpoint 
"/ws/v1/cluster/schedule/dryrun/{parentQueueName}" which returns the sorted 
list of its children according to its SchedulingPolicy.getComparator(). All 
scheduling parameters of the children are also displayed, such as minShare, 
usage, demand, weight, priority etc.
Usually we just call "/ws/v1/cluster/schedule/dryrun/root", and the result 
answers the questions by itself.
I feel it's really useful for multi-tenant clusters, and hope it could be 
merged into the mainline.

  was:
We have more than a thousand queues and several hundred tenants in a busy 
cluster. We get a lot of complaints/questions from the owners/operators of 
queues along the lines of "Why can't my queue/app get resources for such a long 
while?"

It's really hard to answer such questions.

So we added a diagnostic REST endpoint 
"/ws/v1/cluster/schedule/dryrun/{parentQueueName}" which returns the sorted 
list of its children according to its SchedulingPolicy.getComparator(). All 
scheduling parameters of the children are also displayed, such as minShare, 
usage, demand, weight, priority etc.
Usually we just call "/ws/v1/cluster/schedule/root", and the result 
answers the questions by itself.
I feel it's really useful for multi-tenant clusters, and hope it could be 
merged into the mainline.




[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain

2015-09-01 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726780#comment-14726780
 ] 

Hong Zhiguo commented on YARN-4104:
---

For better human readability, the output is plain text:
{code}
0001 root.g_isd_999 min(6143,1) max(12288,4) dem(12288,4) use(52194816,15652) weight=1
0002 root.g_ieg_ttlz_ttlz_import_tdbank min(61439,19) max(1228800,400) dem(13056,6) use(1536,1) weight=800
0003 root.g_ieg_wegalaxy_wegalaxy_import_tdbank min(61439,19) max(1228800,400) dem(13056,6) use(1536,1) weight=800
0004 root.safety_cloud min(18432000,6000) max(18432000,6000) dem(18432000,6000) use(10585088,5169) weight=6000
0005 root.g_teg_datacompress min(6144000,2000) max(12288000,4000) dem(52224,27) use(32256,18) weight=2400
0006 root.g_input_output_hlw min(368639,119) max(1474560,480) dem(20480,19) use(13312,12) weight=800
0007 root.g_ecc_express_ecc_express min(1474559,479) max(5898240,1920) dem(9472,4) use(6272,3) weight=832
0008 root.g_iegv2_datacompress min(6144000,2000) max(12288000,4000) dem(46080,24) use(34560,19) weight=2400
0009 root.g_raid_datacompress min(6144000,2000) max(12288000,4000) dem(65280,35) use(52992,29) weight=2400
0010 root.g_input_output_ieg_tdbank min(2764799,899) max(11059200,3600) dem(177408,90) use(145152,76) weight=1210
0011 root.g_ieg_iegpdata_idata_subject_analysis min(1228799,459) max(9830400,3680) dem(1372928,601) use(1022720,449) weight=814
...
{code}
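
As an illustration of what the dryrun endpoint computes, here is a self-contained toy sketch: sort a parent's child queues with a comparator standing in for SchedulingPolicy.getComparator() and print one line per child. The Queue class, its fields, the sample numbers and the comparator are all invented stand-ins, not the FairScheduler API.
{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DryrunSketch {
  static class Queue {
    String name; int minShareMb; int demandMb; int usageMb; double weight;
    Queue(String n, int min, int dem, int use, double w) {
      name = n; minShareMb = min; demandMb = dem; usageMb = use; weight = w;
    }
  }

  public static void main(String[] args) {
    List<Queue> children = new ArrayList<>();
    children.add(new Queue("root.a", 6000, 12000, 52000, 1));
    children.add(new Queue("root.b", 61000, 13000, 1500, 800));

    // Stand-in for the scheduling policy's comparator: here, order by usage
    // relative to min share, just to have a deterministic example.
    Comparator<Queue> policy = Comparator.comparingDouble(
        q -> (double) q.usageMb / Math.max(q.minShareMb, 1));

    children.sort(policy);
    int rank = 1;
    for (Queue q : children) {
      System.out.printf("%04d %s min(%d) dem(%d) use(%d) weight=%s%n",
          rank++, q.name, q.minShareMb, q.demandMb, q.usageMb, q.weight);
    }
  }
}
{code}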



[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain

2015-09-01 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726697#comment-14726697
 ] 

Hong Zhiguo commented on YARN-4104:
---

It only works for the fair scheduler at the moment, because we only use the fair 
scheduler, but it would be easy to support other schedulers.
Can I create several third-level subtasks under this one?



[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-29 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721144#comment-14721144
 ] 

Hong Zhiguo commented on YARN-4024:
---

Why doesn't Jenkins run against the latest patch?



[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-23 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4024:
--
Attachment: YARN-4024-v7.patch

Thanks for your comments, [~adhoot]; I've updated the patch.



[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-20 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4024:
--
Attachment: YARN-4024-v6.patch

The findbugs warning is about unchecked raw types in AMLivelinessMonitor.java. 
I fixed it in the v6 patch.



[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-19 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4024:
--
Attachment: YARN-4024-v5.patch

Thanks for your comments, [~sunilg] and [~leftnoteasy]. I updated the patch (v5) 
accordingly.
The Resolver interface is left public, annotated with @VisibleForTesting, 
because it is accessed from test cases.



[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-18 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4024:
--
Attachment: YARN-4024-draft-v3.patch

YARN-4024-draft-v3.patch: fixes the checkstyle warning and the test case failure

 YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
 --

 Key: YARN-4024
 URL: https://issues.apache.org/jira/browse/YARN-4024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wangda Tan
Assignee: Hong Zhiguo
 Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, 
 YARN-4024-draft.patch


 Currently, YARN RM NodesListManager will resolve IP address every time when 
 node doing heartbeat. When DNS server becomes slow, NM heartbeat will be 
 blocked and cannot make progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-18 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4024:
--
Attachment: YARN-4024-v4.patch

Thanks for your comments, [~leftnoteasy]. I didn't notice such events already 
exist. I updated the patch accordingly.

 YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
 --

 Key: YARN-4024
 URL: https://issues.apache.org/jira/browse/YARN-4024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wangda Tan
Assignee: Hong Zhiguo
 Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, 
 YARN-4024-draft.patch, YARN-4024-v4.patch


 Currently, YARN RM NodesListManager will resolve IP address every time when 
 node doing heartbeat. When DNS server becomes slow, NM heartbeat will be 
 blocked and cannot make progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-17 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699285#comment-14699285
 ] 

Hong Zhiguo commented on YARN-4024:
---

In this patch, both positive and negative lookup results are cached, with the 
same expiry interval.
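
For illustration, a minimal sketch of such a cache (Guava-based; the class, 
method and sentinel names here are illustrative, not necessarily those used in 
the patch):
{code}
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.TimeUnit;

public class CachedResolver {
  // a failed lookup is stored under a sentinel value so it expires like any
  // other entry, giving positive and negative results the same interval
  private static final String NEGATIVE = "UNRESOLVABLE";
  private final Cache<String, String> cache;

  public CachedResolver(long expirySecs) {
    this.cache = CacheBuilder.newBuilder()
        .expireAfterWrite(expirySecs, TimeUnit.SECONDS)
        .build();
  }

  public String resolve(String hostName) {
    String ip = cache.getIfPresent(hostName);
    if (ip == null) {
      try {
        ip = InetAddress.getByName(hostName).getHostAddress();
      } catch (UnknownHostException e) {
        ip = NEGATIVE;                    // cache the negative result too
      }
      cache.put(hostName, ip);
    }
    return NEGATIVE.equals(ip) ? null : ip;
  }
}
{code}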

 YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
 --

 Key: YARN-4024
 URL: https://issues.apache.org/jira/browse/YARN-4024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wangda Tan
Assignee: Hong Zhiguo
 Attachments: YARN-4024-draft.patch


 Currently, YARN RM NodesListManager will resolve IP address every time when 
 node doing heartbeat. When DNS server becomes slow, NM heartbeat will be 
 blocked and cannot make progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-17 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4024:
--
Attachment: YARN-4024-draft.patch

Added a configuration option, 
yarn.resourcemanager.node-ip-cache.expiry-interval-secs; -1 disables 
caching.
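
Roughly how the option could be wired up (a sketch only, where conf is the 
YarnConfiguration; the default value and the resolver class names below are 
assumptions, not taken from the patch):
{code}
// read the expiry interval; a negative value (-1) disables caching
int expirySecs = conf.getInt(
    "yarn.resourcemanager.node-ip-cache.expiry-interval-secs", -1);
Resolver resolver = (expirySecs < 0)
    ? new DirectResolver()              // resolve on every heartbeat, as before
    : new CachedResolver(expirySecs);   // resolve at most once per interval
{code}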

 YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
 --

 Key: YARN-4024
 URL: https://issues.apache.org/jira/browse/YARN-4024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wangda Tan
Assignee: Hong Zhiguo
 Attachments: YARN-4024-draft.patch


 Currently, YARN RM NodesListManager will resolve IP address every time when 
 node doing heartbeat. When DNS server becomes slow, NM heartbeat will be 
 blocked and cannot make progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-17 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4024:
--
Attachment: YARN-4024-draft-v2.patch

Updated the patch to flush the cached entry when a node's state transitions 
between USABLE and UNUSABLE.
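
A simplified sketch of that flushing (event and accessor names are abridged; 
see the real NodesListManager for the exact types):
{code}
@Override
public void handle(NodesListManagerEvent event) {
  switch (event.getType()) {
    case NODE_USABLE:
    case NODE_UNUSABLE:
      // the node changed state, so its cached IP may be stale; drop it and
      // let the next heartbeat re-resolve the hostname
      cachedResolver.removeFromCache(event.getNode().getNodeID().getHost());
      break;
    default:
      break;
  }
}
{code}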

 YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
 --

 Key: YARN-4024
 URL: https://issues.apache.org/jira/browse/YARN-4024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wangda Tan
Assignee: Hong Zhiguo
 Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft.patch


 Currently, YARN RM NodesListManager will resolve IP address every time when 
 node doing heartbeat. When DNS server becomes slow, NM heartbeat will be 
 blocked and cannot make progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-16 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698927#comment-14698927
 ] 

Hong Zhiguo commented on YARN-4024:
---

That's a good reason to have this cache.
[~leftnoteasy],  in earlier comments, you said
{code}
1) If a host_a, has IP=IP1, IP1 is on whitelist. If we change the IP of host_a 
to IP2, IP2 is in blacklist. We won't do the re-resolve since the cached IP1 is 
on whitelist.
2) If a host_a, has IP=IP1, IP1 is on blacklist. We may need to do re-resolve 
every time when the node doing heartbeat since it may change to its IP to a one 
not on the blacklist.
{code}
I think that's too complicated. The cache lookup is part of resolving (name 
to address), and the check against the IP whitelist/blacklist is just the 
following stage. I think a cache with configurable expiration is enough; we'd 
better keep the two stages orthogonal rather than mixing them up.
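
In other words, something like the following shape, where caching lives 
entirely inside stage 1 and the list check in stage 2 never cares whether the 
IP came from the cache or from DNS (a method fragment, with simplified 
HostsFileReader usage):
{code}
public boolean isValidNode(String hostName) {
  String ip = resolver.resolve(hostName);            // stage 1: cache + DNS
  synchronized (hostsFileReader) {                   // stage 2: list check
    Set<String> includes = hostsFileReader.getHosts();
    Set<String> excludes = hostsFileReader.getExcludedHosts();
    return (includes.isEmpty() || includes.contains(hostName)
        || includes.contains(ip))
        && !(excludes.contains(hostName) || excludes.contains(ip));
  }
}
{code}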

BTW, I think it's not good to have the Name in NodeId but the Address in the 
whitelist/blacklist; different layers of abstraction are mixed up. We wouldn't 
have this issue if either Name or Address were used for both NodeId and the 
whitelist/blacklist.
a better way is to have Name in whitelist/blacklist, instead of Address. 



 YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
 --

 Key: YARN-4024
 URL: https://issues.apache.org/jira/browse/YARN-4024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wangda Tan
Assignee: Hong Zhiguo

 Currently, YARN RM NodesListManager will resolve IP address every time when 
 node doing heartbeat. When DNS server becomes slow, NM heartbeat will be 
 blocked and cannot make progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-16 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698929#comment-14698929
 ] 

Hong Zhiguo commented on YARN-4024:
---

Please ignore the last sentence ("a better way is to have Name in 
whitelist/blacklist, instead of Address"), or could someone help to delete it?

 YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
 --

 Key: YARN-4024
 URL: https://issues.apache.org/jira/browse/YARN-4024
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wangda Tan
Assignee: Hong Zhiguo

 Currently, YARN RM NodesListManager will resolve IP address every time when 
 node doing heartbeat. When DNS server becomes slow, NM heartbeat will be 
 blocked and cannot make progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-13 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695011#comment-14695011
 ] 

Hong Zhiguo commented on YARN-4024:
---

There's a DNS cache in InetAddress. What's the benefit of having another layer 
of cache in memory?  Maybe that it's easier to control?
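
For reference, the JVM-level cache is only tunable globally, through security 
properties, e.g. (values illustrative):
{code}
// TTL of successful and failed lookups in the InetAddress-level cache
java.security.Security.setProperty("networkaddress.cache.ttl", "30");
java.security.Security.setProperty("networkaddress.cache.negative.ttl", "10");
{code}
An application-level cache, by contrast, can be sized, expired and invalidated 
per node from YARN configuration, which is the "easier to control" part.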

 YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
 --

 Key: YARN-4024
 URL: https://issues.apache.org/jira/browse/YARN-4024
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Hong Zhiguo

 Currently, YARN RM NodesListManager will resolve IP address every time when 
 node doing heartbeat. When DNS server becomes slow, NM heartbeat will be 
 blocked and cannot make progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-09 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14679444#comment-14679444
 ] 

Hong Zhiguo commented on YARN-4024:
---

We did this one year ago in our 5k+ node cluster. Can I take this issue?

 YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
 --

 Key: YARN-4024
 URL: https://issues.apache.org/jira/browse/YARN-4024
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan

 Currently, YARN RM NodesListManager will resolve IP address every time when 
 node doing heartbeat. When DNS server becomes slow, NM heartbeat will be 
 blocked and cannot make progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4018) correct docker image name is rejected by DockerContainerExecutor

2015-08-04 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-4018:
-

 Summary: correct docker image name is rejected by 
DockerContainerExecutor
 Key: YARN-4018
 URL: https://issues.apache.org/jira/browse/YARN-4018
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo


For example:
www.dockerbase.net/library/mongo
www.dockerbase.net:5000/library/mongo:latest
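
For illustration, a more permissive pattern that would accept both example 
names above, including a registry host with a port and an optional tag 
(hypothetical regex, not the one in DockerContainerExecutor):
{code}
private static final String IMAGE_NAME_REGEX =
    "^(([a-zA-Z0-9.-]+)(:[0-9]+)?/)?[a-z0-9._/-]+(:[\\w.-]+)?$";

boolean isValidImageName(String name) {
  return name.matches(IMAGE_NAME_REGEX);
}
{code}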



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4018) correct docker image name is rejected by DockerContainerExecutor

2015-08-04 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4018:
--
Description: 
For example:
www.dockerbase.net/library/mongo
www.dockerbase.net:5000/library/mongo:latest
leads to error:
Image: www.dockerbase.net/library/mongo is not a proper docker image
Image: www.dockerbase.net:5000/library/mongo:latest is not a proper docker image

  was:
For example:
www.dockerbase.net/library/mongo
www.dockerbase.net:5000/library/mongo:latest


 correct docker image name is rejected by DockerContainerExecutor
 

 Key: YARN-4018
 URL: https://issues.apache.org/jira/browse/YARN-4018
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo

 For example:
 www.dockerbase.net/library/mongo
 www.dockerbase.net:5000/library/mongo:latest
 leads to error:
 Image: www.dockerbase.net/library/mongo is not a proper docker image
 Image: www.dockerbase.net:5000/library/mongo:latest is not a proper docker 
 image



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4018) correct docker image name is rejected by DockerContainerExecutor

2015-08-04 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4018:
--
Attachment: YARN-4018.patch

 correct docker image name is rejected by DockerContainerExecutor
 

 Key: YARN-4018
 URL: https://issues.apache.org/jira/browse/YARN-4018
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
 Attachments: YARN-4018.patch


 For example:
 www.dockerbase.net/library/mongo
 www.dockerbase.net:5000/library/mongo:latest
 leads to error:
 Image: www.dockerbase.net/library/mongo is not a proper docker image
 Image: www.dockerbase.net:5000/library/mongo:latest is not a proper docker 
 image



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4016) docker container is still running when app is killed

2015-08-04 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-4016:
-

 Summary: docker container is still running when app is killed
 Key: YARN-4016
 URL: https://issues.apache.org/jira/browse/YARN-4016
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo


The docker_container_executor_session.sh is generated like below:
{code}
### get the pid of docker container by docker inspect
echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1438681002528_0001_01_02` > .../container_1438681002528_0001_01_02.pid.tmp

### rename *.pid.tmp to *.pid
/bin/mv -f .../container_1438681002528_0001_01_02.pid.tmp .../container_1438681002528_0001_01_02.pid

### launch the docker container
/usr/bin/docker run --rm --net=host --name container_1438681002528_0001_01_02 -v ... library/mysql /container_1438681002528_0001_01_02/launch_container.sh
{code}

This is obviously wrong because you cannot get the pid of a docker container 
before starting it.
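
A sketch of the expected ordering, generated from Java as the executor does 
(abbreviated options and paths, purely illustrative, not the actual 
DockerContainerExecutor code): launch the container first, then ask Docker for 
its PID.
{code}
String cid = "container_1438681002528_0001_01_02";
String script =
    // 1) launch the docker container (detached) so it has a PID to inspect
    "/usr/bin/docker run -d --net=host --name " + cid
        + " library/mysql /" + cid + "/launch_container.sh\n"
    // 2) only now can docker inspect report the container's PID
    + "echo `/usr/bin/docker inspect --format {{.State.Pid}} " + cid + "` > "
        + cid + ".pid.tmp\n"
    // 3) rename *.pid.tmp to *.pid
    + "/bin/mv -f " + cid + ".pid.tmp " + cid + ".pid\n";
{code}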



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3965) Add starup timestamp for nodemanager

2015-07-30 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-3965:
--
Attachment: YARN-3965-3.patch

 Add starup timestamp for nodemanager
 

 Key: YARN-3965
 URL: https://issues.apache.org/jira/browse/YARN-3965
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-3965-2.patch, YARN-3965-3.patch, YARN-3965.patch


 We have startup timestamp for RM already, but don't for NM.
 Sometimes cluster operator modified configuration of all nodes and kicked off 
 command to restart all NMs.  He found out it's hard for him to check whether 
 all NMs are restarted.  Actually there's always some NMs didn't restart as he 
 expected, which leads to some error later due to inconsistent configuration.
 If we have startup timestamp for NM,  the operator could easily fetch it via 
 NM webservice and find out which NM didn't restart, and take mannaul action 
 for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3965) Add starup timestamp for nodemanager

2015-07-30 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647709#comment-14647709
 ] 

Hong Zhiguo commented on YARN-3965:
---

Made it private with a getter.
Hi, [~zxu], [~jlowe], could you please review the patch?

 Add starup timestamp for nodemanager
 

 Key: YARN-3965
 URL: https://issues.apache.org/jira/browse/YARN-3965
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-3965-2.patch, YARN-3965-3.patch, YARN-3965.patch


 We have startup timestamp for RM already, but don't for NM.
 Sometimes cluster operator modified configuration of all nodes and kicked off 
 command to restart all NMs.  He found out it's hard for him to check whether 
 all NMs are restarted.  Actually there's always some NMs didn't restart as he 
 expected, which leads to some error later due to inconsistent configuration.
 If we have startup timestamp for NM,  the operator could easily fetch it via 
 NM webservice and find out which NM didn't restart, and take mannaul action 
 for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3965) Add startup timestamp to nodemanager UI

2015-07-30 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-3965:
--
Attachment: YARN-3965-4.patch

 Add startup timestamp to nodemanager UI
 ---

 Key: YARN-3965
 URL: https://issues.apache.org/jira/browse/YARN-3965
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-3965-2.patch, YARN-3965-3.patch, YARN-3965-4.patch, 
 YARN-3965.patch


 We have startup timestamp for RM already, but don't for NM.
 Sometimes cluster operator modified configuration of all nodes and kicked off 
 command to restart all NMs.  He found out it's hard for him to check whether 
 all NMs are restarted.  Actually there's always some NMs didn't restart as he 
 expected, which leads to some error later due to inconsistent configuration.
 If we have startup timestamp for NM,  the operator could easily fetch it via 
 NM webservice and find out which NM didn't restart, and take mannaul action 
 for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4001) normalizeHostName takes too much of execution time

2015-07-30 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-4001:
-

 Summary: normalizeHostName takes too much of execution time
 Key: YARN-4001
 URL: https://issues.apache.org/jira/browse/YARN-4001
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor


For each NodeHeartbeatRequest, NetUtils.normalizeHostName is called under a 
lock.  I profiled a very large cluster and found that 
NetUtils.normalizeHostName takes most of the execution time of 
ResourceTrackerService.nodeHeartbeat(...).
We'd better have an option to use the raw IP (plus port) as the Node identity 
to scale to large clusters.
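
For context, NetUtils.normalizeHostName is essentially the following 
(simplified paraphrase), which is why each call may block on DNS:
{code}
public static String normalizeHostName(String name) {
  try {
    // may hit the DNS server if the name is not an IP and not cached
    return InetAddress.getByName(name).getHostAddress();
  } catch (UnknownHostException e) {
    return name;   // fall back to the raw value
  }
}
{code}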



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2015-07-30 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-4002:
-

 Summary: make ResourceTrackerService.nodeHeartbeat more concurrent
 Key: YARN-4002
 URL: https://issues.apache.org/jira/browse/YARN-4002
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Critical


We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design 
the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to 
scale for large clusters.
But we have a BIG lock in NodesListManager.isValidNode which I think is 
unnecessary.
First, the fields includes and excludes of HostsFileReader are only updated 
when the nodes lists are refreshed. All RPC threads handling node heartbeats 
are only readers, so a RWLock could be used to allow concurrent access by the 
RPC threads.
Second, since the fields includes and excludes of HostsFileReader are always 
updated by reference assignment, which is atomic in Java, the reader-side 
lock could just be skipped.
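
A minimal sketch of the lock-free reader idea (field and method names are 
illustrative), assuming the sets are rebuilt and published as a whole on each 
refresh:
{code}
private volatile Set<String> includes = Collections.emptySet();
private volatile Set<String> excludes = Collections.emptySet();

public void refreshHostsLists(Set<String> newIncludes, Set<String> newExcludes) {
  // writer: build immutable copies, then publish by reference assignment
  includes = Collections.unmodifiableSet(new HashSet<String>(newIncludes));
  excludes = Collections.unmodifiableSet(new HashSet<String>(newExcludes));
}

public boolean isValidNode(String host) {
  Set<String> in = includes;   // reader: a single volatile read, no lock
  Set<String> ex = excludes;
  return (in.isEmpty() || in.contains(host)) && !ex.contains(host);
}
{code}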



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3965) Add startup timestamp to nodemanager UI

2015-07-30 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648666#comment-14648666
 ] 

Hong Zhiguo commented on YARN-3965:
---

Hi, [~jlowe], version 4 of the patch is uploaded with 2 changes:
1) NodeInfo.getNmStartupTime -> NodeInfo.getNMStartupTime
2) removed the final qualifier on NodeManager.nmStartupTime to avoid the 
checkstyle error:
   {code}
   Name 'nmStartupTime' must match pattern '^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$'
   {code}
   It's private with a getter, so it's OK for it not to be final.
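
   i.e. roughly (a sketch of the shape, not the exact patch hunk):
   {code}
   // private, non-final so the static-constant naming rule doesn't apply,
   // initialized once when the NodeManager starts
   private static long nmStartupTime = System.currentTimeMillis();

   public static long getNMStartupTime() {
     return nmStartupTime;
   }
   {code}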

 Add startup timestamp to nodemanager UI
 ---

 Key: YARN-3965
 URL: https://issues.apache.org/jira/browse/YARN-3965
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-3965-2.patch, YARN-3965-3.patch, YARN-3965-4.patch, 
 YARN-3965.patch


 We have startup timestamp for RM already, but don't for NM.
 Sometimes cluster operator modified configuration of all nodes and kicked off 
 command to restart all NMs.  He found out it's hard for him to check whether 
 all NMs are restarted.  Actually there's always some NMs didn't restart as he 
 expected, which leads to some error later due to inconsistent configuration.
 If we have startup timestamp for NM,  the operator could easily fetch it via 
 NM webservice and find out which NM didn't restart, and take mannaul action 
 for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2015-07-30 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4002:
--
Description: 
We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design 
the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to 
scale for large clusters.
But we have a BIG lock in NodesListManager.isValidNode which I think it's 
unnecessary.
First, the fields includes and excludes of HostsFileReader are only updated 
on refresh nodes.  All RPC threads handling node heartbeats are only readers. 
 So RWLock could be used to have alow concurrently access by RPC threads.
Second, since he fields includes and excludes of HostsFileReader are always 
updated by reference assignment, which is atomic in Java, the reader side 
lock could just be skipped.

  was:
We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design 
the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to 
scale for large clusters.
But we have a BIG log in NodesListManager.isValidNode which I think it's 
unnecessary.
First, the fields includes and excludes of HostsFileReader are only updated 
on refresh nodes.  All RPC threads handling node heartbeats are only readers. 
 So RWLock could be used to have alow concurrently access by RPC threads.
Second, since he fields includes and excludes of HostsFileReader are always 
updated by reference assignment, which is atomic in Java, the reader side 
lock could just be skipped.


 make ResourceTrackerService.nodeHeartbeat more concurrent
 -

 Key: YARN-4002
 URL: https://issues.apache.org/jira/browse/YARN-4002
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Critical

 We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
 design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
 enough to scale for large clusters.
 But we have a BIG lock in NodesListManager.isValidNode which I think it's 
 unnecessary.
 First, the fields includes and excludes of HostsFileReader are only 
 updated on refresh nodes.  All RPC threads handling node heartbeats are 
 only readers.  So RWLock could be used to have alow concurrently access by 
 RPC threads.
 Second, since he fields includes and excludes of HostsFileReader are 
 always updated by reference assignment, which is atomic in Java, the reader 
 side lock could just be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2015-07-30 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4002:
--
Description: 
We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design 
the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to 
scale for large clusters.
But we have a BIG lock in NodesListManager.isValidNode which I think it's 
unnecessary.
First, the fields includes and excludes of HostsFileReader are only updated 
on refresh nodes.  All RPC threads handling node heartbeats are only readers. 
 So RWLock could be used to  alow concurrently access by RPC threads.
Second, since he fields includes and excludes of HostsFileReader are always 
updated by reference assignment, which is atomic in Java, the reader side 
lock could just be skipped.

  was:
We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design 
the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to 
scale for large clusters.
But we have a BIG lock in NodesListManager.isValidNode which I think it's 
unnecessary.
First, the fields includes and excludes of HostsFileReader are only updated 
on refresh nodes.  All RPC threads handling node heartbeats are only readers. 
 So RWLock could be used to have alow concurrently access by RPC threads.
Second, since he fields includes and excludes of HostsFileReader are always 
updated by reference assignment, which is atomic in Java, the reader side 
lock could just be skipped.


 make ResourceTrackerService.nodeHeartbeat more concurrent
 -

 Key: YARN-4002
 URL: https://issues.apache.org/jira/browse/YARN-4002
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Critical

 We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
 design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
 enough to scale for large clusters.
 But we have a BIG lock in NodesListManager.isValidNode which I think it's 
 unnecessary.
 First, the fields includes and excludes of HostsFileReader are only 
 updated on refresh nodes.  All RPC threads handling node heartbeats are 
 only readers.  So RWLock could be used to  alow concurrently access by RPC 
 threads.
 Second, since he fields includes and excludes of HostsFileReader are 
 always updated by reference assignment, which is atomic in Java, the reader 
 side lock could just be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2015-07-30 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4002:
--
Description: 
We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design 
the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to 
scale for large clusters.
But we have a BIG lock in NodesListManager.isValidNode which I think it's 
unnecessary.
First, the fields includes and excludes of HostsFileReader are only updated 
on refresh nodes.  All RPC threads handling node heartbeats are only readers. 
 So RWLock could be used to  alow concurrent access by RPC threads.
Second, since he fields includes and excludes of HostsFileReader are always 
updated by reference assignment, which is atomic in Java, the reader side 
lock could just be skipped.

  was:
We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design 
the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to 
scale for large clusters.
But we have a BIG lock in NodesListManager.isValidNode which I think it's 
unnecessary.
First, the fields includes and excludes of HostsFileReader are only updated 
on refresh nodes.  All RPC threads handling node heartbeats are only readers. 
 So RWLock could be used to  alow concurrently access by RPC threads.
Second, since he fields includes and excludes of HostsFileReader are always 
updated by reference assignment, which is atomic in Java, the reader side 
lock could just be skipped.


 make ResourceTrackerService.nodeHeartbeat more concurrent
 -

 Key: YARN-4002
 URL: https://issues.apache.org/jira/browse/YARN-4002
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Critical

 We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
 design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
 enough to scale for large clusters.
 But we have a BIG lock in NodesListManager.isValidNode which I think it's 
 unnecessary.
 First, the fields includes and excludes of HostsFileReader are only 
 updated on refresh nodes.  All RPC threads handling node heartbeats are 
 only readers.  So RWLock could be used to  alow concurrent access by RPC 
 threads.
 Second, since he fields includes and excludes of HostsFileReader are 
 always updated by reference assignment, which is atomic in Java, the reader 
 side lock could just be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3965) Add starup timestamp for nodemanager

2015-07-25 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641473#comment-14641473
 ] 

Hong Zhiguo commented on YARN-3965:
---

Hi, [~zxu], thanks for your comments. Here is my reconsideration.

1. nmStartupTime could be a non-static field of NodeManager, but that makes it 
harder to access, since the accessor must have a reference to the NodeManager 
instance.  For example, there's no such reference in the current implementation 
of the NodeInfo constructor.  One option is to make nmStartupTime a non-static 
field of NMContext, but I doubt it's worth making a simple thing complicated.  
BTW, the startup timestamp of the ResourceManager is also static.

2. It's final, so we don't need to worry about that. A private field with a 
getter is also OK if you think it's better.

 Add starup timestamp for nodemanager
 

 Key: YARN-3965
 URL: https://issues.apache.org/jira/browse/YARN-3965
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-3965-2.patch, YARN-3965.patch


 We have startup timestamp for RM already, but don't for NM.
 Sometimes cluster operator modified configuration of all nodes and kicked off 
 command to restart all NMs.  He found out it's hard for him to check whether 
 all NMs are restarted.  Actually there's always some NMs didn't restart as he 
 expected, which leads to some error later due to inconsistent configuration.
 If we have startup timestamp for NM,  the operator could easily fetch it via 
 NM webservice and find out which NM didn't restart, and take mannaul action 
 for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3965) Add starup timestamp for nodemanager

2015-07-24 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640195#comment-14640195
 ] 

Hong Zhiguo commented on YARN-3965:
---

The polling doesn't need to happen frequently, only after the operator upgrades 
the NMs or changes the NM configuration.

 Add starup timestamp for nodemanager
 

 Key: YARN-3965
 URL: https://issues.apache.org/jira/browse/YARN-3965
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor

 We have startup timestamp for RM already, but don't for NM.
 Sometimes cluster operator modified configuration of all nodes and kicked off 
 command to restart all NMs.  He found out it's hard for him to check whether 
 all NMs are restarted.  Actually there's always some NMs didn't restart as he 
 expected, which leads to some error later due to inconsistent configuration.
 If we have startup timestamp for NM,  the operator could easily fetch it via 
 NM webservice and find out which NM didn't restart, and take mannaul action 
 for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3965) Add starup timestamp for nodemanager

2015-07-24 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-3965:
--
Attachment: YARN-3965.patch

 Add starup timestamp for nodemanager
 

 Key: YARN-3965
 URL: https://issues.apache.org/jira/browse/YARN-3965
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-3965.patch


 We have startup timestamp for RM already, but don't for NM.
 Sometimes cluster operator modified configuration of all nodes and kicked off 
 command to restart all NMs.  He found out it's hard for him to check whether 
 all NMs are restarted.  Actually there's always some NMs didn't restart as he 
 expected, which leads to some error later due to inconsistent configuration.
 If we have startup timestamp for NM,  the operator could easily fetch it via 
 NM webservice and find out which NM didn't restart, and take mannaul action 
 for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3965) Add starup timestamp for nodemanager

2015-07-24 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-3965:
--
Attachment: YARN-3965-2.patch

The first patch breaks TestNMWebServices.verifyNodeInfo. Corrected in this one.

 Add starup timestamp for nodemanager
 

 Key: YARN-3965
 URL: https://issues.apache.org/jira/browse/YARN-3965
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-3965-2.patch, YARN-3965.patch


 We have startup timestamp for RM already, but don't for NM.
 Sometimes cluster operator modified configuration of all nodes and kicked off 
 command to restart all NMs.  He found out it's hard for him to check whether 
 all NMs are restarted.  Actually there's always some NMs didn't restart as he 
 expected, which leads to some error later due to inconsistent configuration.
 If we have startup timestamp for NM,  the operator could easily fetch it via 
 NM webservice and find out which NM didn't restart, and take mannaul action 
 for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2545) RMApp should transit to FAILED when AM calls finishApplicationMaster with FAILED

2015-07-23 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638610#comment-14638610
 ] 

Hong Zhiguo commented on YARN-2545:
---

RMAppEventType#ATTEMPT_FAILED is not suitable because it leads to a check of 
maxAppAttempt. 
Here the AM unregistered with getFinalApplicationStatus()==FAILED, so the RMApp 
should transition to FAILED without checking maxAppAttempt.

In the current implementation of RMAppImpl, the targetedFinalState of 
FinalSavingTransition is statically determined by (preState, eventType). A 
simple solution is to replace the ATTEMPT_UNREGISTERED event with two event 
types: ATTEMPT_UNREGISTERED_SUCC and ATTEMPT_UNREGISTERED_FAIL.

Any suggestion?
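
For illustration, the transition table would then contain two statically 
distinguishable arcs, roughly like this (the FinalSavingTransition constructor 
shown is simplified, not the real RMAppImpl signature):
{code}
.addTransition(RMAppState.RUNNING, RMAppState.FINAL_SAVING,
    RMAppEventType.ATTEMPT_UNREGISTERED_SUCC,
    new FinalSavingTransition(RMAppState.FINISHED))
.addTransition(RMAppState.RUNNING, RMAppState.FINAL_SAVING,
    RMAppEventType.ATTEMPT_UNREGISTERED_FAIL,
    new FinalSavingTransition(RMAppState.FAILED))
{code}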

 RMApp should transit to FAILED when AM calls finishApplicationMaster with 
 FAILED
 

 Key: YARN-2545
 URL: https://issues.apache.org/jira/browse/YARN-2545
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor

 If AM calls finishApplicationMaster with getFinalApplicationStatus()==FAILED, 
 and then exits, the corresponding RMApp and RMAppAttempt transit to state 
 FINISHED.
 I think this is wrong and confusing. On RM WebUI, this application is 
 displayed as State=FINISHED, FinalStatus=FAILED, and is counted as Apps 
 Completed, not as Apps Failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3965) Add starup timestamp for nodemanager

2015-07-23 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-3965:
-

 Summary: Add starup timestamp for nodemanager
 Key: YARN-3965
 URL: https://issues.apache.org/jira/browse/YARN-3965
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Hong Zhiguo
Priority: Minor


We have a startup timestamp for the RM already, but not for the NM.
Sometimes a cluster operator modifies the configuration of all nodes and kicks 
off a command to restart all NMs.  It's then hard to check whether all NMs were 
actually restarted.  In practice some NMs don't restart as expected, which 
later leads to errors due to inconsistent configuration.

If we had a startup timestamp for the NM, the operator could easily fetch it 
via the NM web service, find out which NMs didn't restart, and take manual 
action on them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3965) Add starup timestamp for nodemanager

2015-07-23 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo reassigned YARN-3965:
-

Assignee: Hong Zhiguo

 Add starup timestamp for nodemanager
 

 Key: YARN-3965
 URL: https://issues.apache.org/jira/browse/YARN-3965
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor

 We have startup timestamp for RM already, but don't for NM.
 Sometimes cluster operator modified configuration of all nodes and kicked off 
 command to restart all NMs.  He found out it's hard for him to check whether 
 all NMs are restarted.  Actually there's always some NMs didn't restart as he 
 expected, which leads to some error later due to inconsistent configuration.
 If we have startup timestamp for NM,  the operator could easily fetch it via 
 NM webservice and find out which NM didn't restart, and take mannaul action 
 for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)

2015-07-16 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630694#comment-14630694
 ] 

Hong Zhiguo commented on YARN-2306:
---

Hi, [~rchiang], do you mean running the unit test in the patch against trunk?

 leak of reservation metrics (fair scheduler)
 

 Key: YARN-2306
 URL: https://issues.apache.org/jira/browse/YARN-2306
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2306-2.patch, YARN-2306.patch


 This only applies to fair scheduler. Capacity scheduler is OK.
 When appAttempt or node is removed, the metrics for 
 reservation(reservedContainers, reservedMB, reservedVCores) is not reduced 
 back.
 These are important metrics for administrator. The wrong metrics confuses may 
 confuse them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1974) add args for DistributedShell to specify a set of nodes on which the tasks run

2015-07-16 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo resolved YARN-1974.
---
Resolution: Not A Problem

 add args for DistributedShell to specify a set of nodes on which the tasks run
 --

 Key: YARN-1974
 URL: https://issues.apache.org/jira/browse/YARN-1974
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Affects Versions: 2.7.0
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-1974.patch


 It's very useful to execute a script on a specific set of machines for both 
 testing and maintenance purpose.
 The args --nodes and --relax_locality are added to DistributedShell. 
 Together with an unit test using miniCluster.
 It's also tested on our real cluster with Fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)

2015-07-16 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630742#comment-14630742
 ] 

Hong Zhiguo commented on YARN-2306:
---

Updated the patch. I ran testReservationMetrics several times and saw no 
failures.

 leak of reservation metrics (fair scheduler)
 

 Key: YARN-2306
 URL: https://issues.apache.org/jira/browse/YARN-2306
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2306-2.patch, YARN-2306-3.patch, YARN-2306.patch


 This only applies to fair scheduler. Capacity scheduler is OK.
 When appAttempt or node is removed, the metrics for 
 reservation(reservedContainers, reservedMB, reservedVCores) is not reduced 
 back.
 These are important metrics for administrator. The wrong metrics confuses may 
 confuse them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2015-07-16 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630688#comment-14630688
 ] 

Hong Zhiguo commented on YARN-2768:
---

[~kasha], could you please review the patch?

 optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% 
 of computing time of update thread
 

 Key: YARN-2768
 URL: https://issues.apache.org/jira/browse/YARN-2768
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2768.patch, profiling_FairScheduler_update.png


 See the attached picture of profiling result. The clone of Resource object 
 within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the 
 function FairScheduler.update().
 The code of FSAppAttempt.updateDemand:
 {code}
 public void updateDemand() {
 demand = Resources.createResource(0);
 // Demand is current consumption plus outstanding requests
 Resources.addTo(demand, app.getCurrentConsumption());
 // Add up outstanding resource requests
 synchronized (app) {
   for (Priority p : app.getPriorities()) {
 for (ResourceRequest r : app.getResourceRequests(p).values()) {
   Resource total = Resources.multiply(r.getCapability(), 
 r.getNumContainers());
   Resources.addTo(demand, total);
 }
   }
 }
   }
 {code}
 The code of Resources.multiply:
 {code}
 public static Resource multiply(Resource lhs, double by) {
 return multiplyTo(clone(lhs), by);
 }
 {code}
 The clone could be skipped by directly update the value of this.demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3897) Too many links in NM log dir

2015-07-16 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-3897:
--
Description: 
Users need to left container logs more than one day. On some nodes of our busy 
cluster, the number of subdirs of {yarn.nodemanager.log-dirs} may reach 32000, 
which is the defaul limit of ext3 file system. As a result, we got errors when 
initiating containers:
Failed to create directory 
{yarn.nodemanager.log-dirs}/application_1435111082717_1341740 - Too many links

log aggregation is not an option for us because of the heavy pressure on 
namenode. With a cluster of 5K nodes and 20k log files per node, it's not 
acceptable to aggregate so many files to hdfs.

Since ext3 is still widely used, we'd better do something to avoid such error.

  was:
Users need to left container logs more than one day. On some nodes of our busy 
cluster, the number of subdirs of {yarn.nodemanager.log-dirs} may reach 32000, 
which is the defaul limit of ext3 file system. As a result, we got errors when 
initiating containers:
Failed to create directory 
{yarn.nodemanager.log-dirs}/logs/application_1435111082717_1341740 - Too many 
links

log aggregation is not an option for us because of the heavy pressure on 
namenode. With a cluster of 5K nodes and 20k log files per node, it's not 
acceptable to aggregate so many files to hdfs.

Since ext3 is still widely used, we'd better do something to avoid such error.


 Too many links in NM log dir
 --

 Key: YARN-3897
 URL: https://issues.apache.org/jira/browse/YARN-3897
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor

 Users need to left container logs more than one day. On some nodes of our 
 busy cluster, the number of subdirs of {yarn.nodemanager.log-dirs} may reach 
 32000, which is the defaul limit of ext3 file system. As a result, we got 
 errors when initiating containers:
 Failed to create directory 
 {yarn.nodemanager.log-dirs}/application_1435111082717_1341740 - Too many 
 links
 log aggregation is not an option for us because of the heavy pressure on 
 namenode. With a cluster of 5K nodes and 20k log files per node, it's not 
 acceptable to aggregate so many files to hdfs.
 Since ext3 is still widely used, we'd better do something to avoid such error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)

2015-07-16 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630741#comment-14630741
 ] 

Hong Zhiguo commented on YARN-2306:
---

I checked the code of tearDown and it shows someone  already did this.

 leak of reservation metrics (fair scheduler)
 

 Key: YARN-2306
 URL: https://issues.apache.org/jira/browse/YARN-2306
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2306-2.patch, YARN-2306-3.patch, YARN-2306.patch


 This only applies to fair scheduler. Capacity scheduler is OK.
 When appAttempt or node is removed, the metrics for 
 reservation(reservedContainers, reservedMB, reservedVCores) is not reduced 
 back.
 These are important metrics for administrator. The wrong metrics confuses may 
 confuse them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)

2015-07-16 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2306:
--
Attachment: YARN-2306-3.patch

 leak of reservation metrics (fair scheduler)
 

 Key: YARN-2306
 URL: https://issues.apache.org/jira/browse/YARN-2306
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2306-2.patch, YARN-2306-3.patch, YARN-2306.patch


 This only applies to fair scheduler. Capacity scheduler is OK.
 When appAttempt or node is removed, the metrics for 
 reservation(reservedContainers, reservedMB, reservedVCores) is not reduced 
 back.
 These are important metrics for administrator. The wrong metrics confuses may 
 confuse them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)

2015-07-16 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2306:
--
Attachment: YARN-2306.patch-3

 leak of reservation metrics (fair scheduler)
 

 Key: YARN-2306
 URL: https://issues.apache.org/jira/browse/YARN-2306
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2306-2.patch, YARN-2306-3.patch, YARN-2306.patch


 This only applies to fair scheduler. Capacity scheduler is OK.
 When appAttempt or node is removed, the metrics for 
 reservation(reservedContainers, reservedMB, reservedVCores) is not reduced 
 back.
 These are important metrics for administrator. The wrong metrics confuses may 
 confuse them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)

2015-07-16 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2306:
--
Attachment: (was: YARN-2306.patch-3)

 leak of reservation metrics (fair scheduler)
 

 Key: YARN-2306
 URL: https://issues.apache.org/jira/browse/YARN-2306
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2306-2.patch, YARN-2306-3.patch, YARN-2306.patch


 This only applies to fair scheduler. Capacity scheduler is OK.
 When appAttempt or node is removed, the metrics for 
 reservation(reservedContainers, reservedMB, reservedVCores) is not reduced 
 back.
 These are important metrics for administrator. The wrong metrics confuses may 
 confuse them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3897) Too many links in NM log dir

2015-07-08 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-3897:
--
Description: 
Users need to left container logs more than one day. On some nodes of our busy 
cluster, the number of subdirs of {yarn.nodemanager.log-dirs} may reach 32000, 
which is the defaul limit of ext3 file system. As a result, we got errors when 
initiating containers:
Failed to create directory 
{yarn.nodemanager.log-dirs}/logs/application_1435111082717_1341740 - Too many 
links

log aggregation is not an option for us because of the heavy pressure on 
namenode. With a cluster of 5K nodes and 20k log files per node, it's not 
acceptable to aggregate so many files to hdfs.

Since ext3 is still widely used, we'd better do something to avoid such error.

  was:
Users need to left container logs more than one day. On some nodes of our busy 
cluster, the number of subdirs of {yarn.nodemanager.log-dirs} may reach 32000, 
which is the defaul limit of ext3 file system. As a result, we got errors when 
initiating containers:
Failed to create directory 
{yarn.nodemanager.log-dirs}/logs/application_1435111082717_1341740 - Too many 
links

log aggregation is not an option for us because of the heavy pressure on 
namenode. With a cluster of 5K nodes and 20k log files per node, it's not 
acceptable to aggregate some many files to hdfs.

Since ext3 is still widely used, we'd better do something to avoid such error.


 Too many links in NM log dir
 --

 Key: YARN-3897
 URL: https://issues.apache.org/jira/browse/YARN-3897
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor

 Users need to left container logs more than one day. On some nodes of our 
 busy cluster, the number of subdirs of {yarn.nodemanager.log-dirs} may reach 
 32000, which is the defaul limit of ext3 file system. As a result, we got 
 errors when initiating containers:
 Failed to create directory 
 {yarn.nodemanager.log-dirs}/logs/application_1435111082717_1341740 - Too many 
 links
 log aggregation is not an option for us because of the heavy pressure on 
 namenode. With a cluster of 5K nodes and 20k log files per node, it's not 
 acceptable to aggregate so many files to hdfs.
 Since ext3 is still widely used, we'd better do something to avoid such error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3897) Too many links in NM log dir

2015-07-08 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618365#comment-14618365
 ] 

Hong Zhiguo commented on YARN-3897:
---

One solution is to have an extra layer of dirs as the parent of the appId 
subdirs.  The middle layer of dirs could be named with a hash of the appId. 
This behaviour should be configurable.
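
A sketch of the layout with such a hashed middle layer (the bucket count and 
formatting are illustrative):
{code}
// <log-dir>/<bucket>/<appId>/... instead of <log-dir>/<appId>/..., so each
// directory stays well under the ext3 link limit
String hashedSubDir(String appId, int buckets) {
  int bucket = (appId.hashCode() & Integer.MAX_VALUE) % buckets;
  return String.format("%03d/%s", bucket, appId);
}
{code}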

 Too many links in NM log dir
 --

 Key: YARN-3897
 URL: https://issues.apache.org/jira/browse/YARN-3897
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor

 Users need to left container logs more than one day. On some nodes of our 
 busy cluster, the number of subdirs of {yarn.nodemanager.log-dirs} may reach 
 32000, which is the defaul limit of ext3 file system. As a result, we got 
 errors when initiating containers:
 Failed to create directory 
 {yarn.nodemanager.log-dirs}/logs/application_1435111082717_1341740 - Too many 
 links
 log aggregation is not an option for us because of the heavy pressure on 
 namenode. With a cluster of 5K nodes and 20k log files per node, it's not 
 acceptable to aggregate so many files to hdfs.
 Since ext3 is still widely used, we'd better do something to avoid such error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3897) Too many links in NM log dir

2015-07-08 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-3897:
-

 Summary: Too many links in NM log dir
 Key: YARN-3897
 URL: https://issues.apache.org/jira/browse/YARN-3897
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor


Users need to keep container logs for more than one day. On some nodes of our 
busy cluster, the number of subdirs of {yarn.nodemanager.log-dirs} may reach 
32000, which is the default limit of the ext3 file system. As a result, we got 
errors when initiating containers:
Failed to create directory 
{yarn.nodemanager.log-dirs}/logs/application_1435111082717_1341740 - Too many 
links

Log aggregation is not an option for us because of the heavy pressure on the 
namenode. With a cluster of 5K nodes and 20k log files per node, it's not 
acceptable to aggregate so many files to HDFS.

Since ext3 is still widely used, we'd better do something to avoid such errors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2015-06-10 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581349#comment-14581349
 ] 

Hong Zhiguo commented on YARN-2768:
---

[~kasha], the execution time displayed in the profiling output is cumulative.
Actually, I repeated such profiling a lot of times and got the same ratio.
The profiling was done with a cluster of NM/AM simulators, and I don't have 
such resources now.

I wrote a testcase which creates 8000 nodes, 4500 apps within 1200 queues, and 
then performs 1 rounds of FairScheduler.update(), and print the average 
execution time of one call to update. With this patch, the average execution 
time decreased from about 35ms to 20ms.

I think the effect comes from GC and memory allocation since in each round of 
FairScheduler.update(), Resource.multiply is called as many times as the number 
of pending ResourceRequests, which is more than 3 million in our production 
cluster.
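
For reference, the shape of the optimization being measured (a simplified 
sketch of updateDemand without the per-request clone; the real patch may 
differ in details):
{code}
public void updateDemand() {
  demand = Resources.createResource(0);
  Resources.addTo(demand, app.getCurrentConsumption());
  synchronized (app) {
    for (Priority p : app.getPriorities()) {
      for (ResourceRequest r : app.getResourceRequests(p).values()) {
        // accumulate directly into demand instead of cloning a temporary
        // Resource via Resources.multiply() for every request
        demand.setMemory(demand.getMemory()
            + r.getCapability().getMemory() * r.getNumContainers());
        demand.setVirtualCores(demand.getVirtualCores()
            + r.getCapability().getVirtualCores() * r.getNumContainers());
      }
    }
  }
}
{code}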

 optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% 
 of computing time of update thread
 

 Key: YARN-2768
 URL: https://issues.apache.org/jira/browse/YARN-2768
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2768.patch, profiling_FairScheduler_update.png


 See the attached picture of profiling result. The clone of Resource object 
 within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the 
 function FairScheduler.update().
 The code of FSAppAttempt.updateDemand:
 {code}
 public void updateDemand() {
 demand = Resources.createResource(0);
 // Demand is current consumption plus outstanding requests
 Resources.addTo(demand, app.getCurrentConsumption());
 // Add up outstanding resource requests
 synchronized (app) {
   for (Priority p : app.getPriorities()) {
 for (ResourceRequest r : app.getResourceRequests(p).values()) {
   Resource total = Resources.multiply(r.getCapability(), 
 r.getNumContainers());
   Resources.addTo(demand, total);
 }
   }
 }
   }
 {code}
 The code of Resources.multiply:
 {code}
 public static Resource multiply(Resource lhs, double by) {
 return multiplyTo(clone(lhs), by);
 }
 {code}
 The clone could be skipped by directly update the value of this.demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container

2015-05-27 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560663#comment-14560663
 ] 

Hong Zhiguo commented on YARN-3678:
---

First, stopping containers happens frequently.
Second, the pid recycle doesn't need to complete a whole round within 250ms. It 
only needs to complete one or more rounds during the container's lifetime.

If container stops happen 100 times on one node per day, we have roughly 
100/32768, about a 0.3% chance per node per day. That's not very low, 
especially when we have 5000 nodes.
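
For concreteness, a back-of-the-envelope sketch of that estimate (the 100 stops per 
day and 5000 nodes are the example figures above; 32768 is the default pid_max, and 
treating pid reuse as uniform is only a rough approximation):
{code}
// Back-of-the-envelope estimate, not a rigorous model of pid reuse.
public class PidCollisionEstimate {
  public static void main(String[] args) {
    final double pidSpace = 32768;      // default kernel pid_max
    final int stopsPerNodePerDay = 100; // example figure from the comment
    final int nodes = 5000;             // cluster size from the comment

    double perNodePerDay = stopsPerNodePerDay / pidSpace; // about 0.3%
    double perClusterPerDay = 1 - Math.pow(1 - perNodePerDay, nodes);

    System.out.printf("per node per day:    %.2f%%%n", perNodePerDay * 100);
    System.out.printf("per cluster per day: %.2f%%%n", perClusterPerDay * 100);
  }
}
{code}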


 DelayedProcessKiller may kill other process other than container
 

 Key: YARN-3678
 URL: https://issues.apache.org/jira/browse/YARN-3678
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: gu-chi
Priority: Critical

 Suppose one container finished, then it will do clean up, the PID file still 
 exist and will trigger once singalContainer, this will kill the process with 
 the pid in PID file, but as container already finished, so this PID may be 
 occupied by other process, this may cause serious issue.
 As I know, my NM was killed unexpectedly, what I described can be the cause. 
 Even rarely occur.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container

2015-05-27 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560748#comment-14560748
 ] 

Hong Zhiguo commented on YARN-3678:
---

The event sequence:
call SEND SIGTERM -> pid recycle -> call SEND SIGKILL -> check process live time 
(based on current time)

The time between [call SEND SIGTERM] and [call SEND SIGKILL] is 250ms.
The time between [pid recycle] and [check process live time] may be shorter or 
longer than 250ms. When it's longer than 250ms, there's a chance we make a 
false-positive judgement.

 DelayedProcessKiller may kill other process other than container
 

 Key: YARN-3678
 URL: https://issues.apache.org/jira/browse/YARN-3678
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: gu-chi
Priority: Critical

 Suppose one container finished, then it will do clean up, the PID file still 
 exist and will trigger once singalContainer, this will kill the process with 
 the pid in PID file, but as container already finished, so this PID may be 
 occupied by other process, this may cause serious issue.
 As I know, my NM was killed unexpectedly, what I described can be the cause. 
 Even rarely occur.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container

2015-05-27 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560578#comment-14560578
 ] 

Hong Zhiguo commented on YARN-3678:
---

We met the same issue on our production cluster last year. The same user is used 
for the NM and some app submitters.
I hooked the kernel __send_signal function via kprobe 
(https://github.com/honkiko/signal-monitor) and confirmed the following sequence:
 - container-executor sends SIGTERM to a container (say, pid = X)
 - the container exits quickly (within 250ms)
 - pid X is recycled and taken by a newly spawned thread of the NM
 - after 250ms, container-executor sends SIGKILL to pid X
 - the NM is killed

I added a check of the process's living time before container-executor sends 
SIGKILL. If the process has been alive for less than 250ms, it cannot be the target 
process we sent SIGTERM to, so we just skip it.

With this fix, the accident rate is reduced from several times per day to 
nearly zero.
If you think such a fix is acceptable, I'll post it here.
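
The real change lives in the native container-executor, but here is a minimal Java 
sketch of the same idea, reading the process start time from /proc (the class name 
and the 100 ticks-per-second clock rate are assumptions):
{code}
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch only: skip SIGKILL when the process behind this pid has been alive
// for less than the SIGTERM->SIGKILL delay, because then it cannot be the
// process the SIGTERM was sent to.
public class ProcessAgeCheck {
  private static final long CLOCK_TICKS_PER_SEC = 100; // assumed; normally sysconf(_SC_CLK_TCK)

  /** Milliseconds the process has been alive, from /proc/<pid>/stat field 22. */
  static long liveTimeMillis(int pid) throws Exception {
    String stat = new String(Files.readAllBytes(Paths.get("/proc/" + pid + "/stat")));
    // The command name "(comm)" may contain spaces, so split after the ')'.
    String[] fields = stat.substring(stat.lastIndexOf(')') + 2).split(" ");
    long startTicks = Long.parseLong(fields[19]); // overall field 22: starttime
    double uptimeSec = Double.parseDouble(
        new String(Files.readAllBytes(Paths.get("/proc/uptime"))).split(" ")[0]);
    return (long) (uptimeSec * 1000) - startTicks * 1000 / CLOCK_TICKS_PER_SEC;
  }

  static boolean safeToSigkill(int pid, long delayMillis) throws Exception {
    return liveTimeMillis(pid) >= delayMillis;
  }

  public static void main(String[] args) throws Exception {
    int pid = Integer.parseInt(args[0]);
    System.out.println("alive for " + liveTimeMillis(pid) + " ms, safe to SIGKILL: "
        + safeToSigkill(pid, 250));
  }
}
{code}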

 DelayedProcessKiller may kill other process other than container
 

 Key: YARN-3678
 URL: https://issues.apache.org/jira/browse/YARN-3678
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: gu-chi
Priority: Critical

 Suppose one container finished, then it will do clean up, the PID file still 
 exist and will trigger once singalContainer, this will kill the process with 
 the pid in PID file, but as container already finished, so this PID may be 
 occupied by other process, this may cause serious issue.
 As I know, my NM was killed unexpectedly, what I described can be the cause. 
 Even rarely occur.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3102) Decommisioned Nodes not listed in Web UI

2015-04-07 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482891#comment-14482891
 ] 

Hong Zhiguo commented on YARN-3102:
---

I met the same problem. Hi,  [~Naganarasimha], can I take this issue?

 Decommisioned Nodes not listed in Web UI
 

 Key: YARN-3102
 URL: https://issues.apache.org/jira/browse/YARN-3102
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
 Environment: 2 Node Manager and 1 Resource Manager 
Reporter: Bibin A Chundatt
Assignee: Naganarasimha G R
Priority: Minor

 Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to 
 yarn.exlude file In RM1 machine
 Add Yarn.exclude with NM1 Host Name 
 Start the node as listed below NM1,NM2 Resource manager
 Now check Nodes decommisioned in /cluster/nodes
 Number of decommisioned node is listed as 1 but Table is empty in 
 /cluster/nodes/decommissioned (detail of Decommision node not shown)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2014-10-29 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-2768:
-

 Summary: optimize FSAppAttempt.updateDemand by avoid clone of 
Resource which takes 85% of computing time of update thread
 Key: YARN-2768
 URL: https://issues.apache.org/jira/browse/YARN-2768
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor


See the attached picture of profiling result. The clone of Resource object 
within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the 
function FairScheduler.update().

The code of FSAppAttempt.updateDemand:
{code}
public void updateDemand() {
demand = Resources.createResource(0);
// Demand is current consumption plus outstanding requests
Resources.addTo(demand, app.getCurrentConsumption());

// Add up outstanding resource requests
synchronized (app) {
  for (Priority p : app.getPriorities()) {
for (ResourceRequest r : app.getResourceRequests(p).values()) {
  Resource total = Resources.**multiply**(r.getCapability(), 
r.getNumContainers());
  Resources.addTo(demand, total);
}
  }
}
  }
{code}

The code of Resources.multiply:
{code}
public static Resource multiply(Resource lhs, double by) {
return multiplyTo(**clone**(lhs), by);
}
{code}

The clone could be skipped by directly update the value of this.demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2014-10-29 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2768:
--
Attachment: profiling_FairScheduler_update.png

 optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% 
 of computing time of update thread
 

 Key: YARN-2768
 URL: https://issues.apache.org/jira/browse/YARN-2768
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: profiling_FairScheduler_update.png


 See the attached picture of profiling result. The clone of Resource object 
 within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the 
 function FairScheduler.update().
 The code of FSAppAttempt.updateDemand:
 {code}
 public void updateDemand() {
 demand = Resources.createResource(0);
 // Demand is current consumption plus outstanding requests
 Resources.addTo(demand, app.getCurrentConsumption());
 // Add up outstanding resource requests
 synchronized (app) {
   for (Priority p : app.getPriorities()) {
 for (ResourceRequest r : app.getResourceRequests(p).values()) {
   Resource total = Resources.multiply(r.getCapability(), 
 r.getNumContainers());
   Resources.addTo(demand, total);
 }
   }
 }
   }
 {code}
 The code of Resources.multiply:
 {code}
 public static Resource multiply(Resource lhs, double by) {
 return multiplyTo(clone(lhs), by);
 }
 {code}
 The clone could be skipped by directly update the value of this.demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2014-10-29 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2768:
--
Description: 
See the attached picture of profiling result. The clone of Resource object 
within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the 
function FairScheduler.update().

The code of FSAppAttempt.updateDemand:
{code}
public void updateDemand() {
demand = Resources.createResource(0);
// Demand is current consumption plus outstanding requests
Resources.addTo(demand, app.getCurrentConsumption());

// Add up outstanding resource requests
synchronized (app) {
  for (Priority p : app.getPriorities()) {
for (ResourceRequest r : app.getResourceRequests(p).values()) {
  Resource total = Resources.multiply(r.getCapability(), 
r.getNumContainers());
  Resources.addTo(demand, total);
}
  }
}
  }
{code}

The code of Resources.multiply:
{code}
public static Resource multiply(Resource lhs, double by) {
return multiplyTo(clone(lhs), by);
}
{code}

The clone could be skipped by directly update the value of this.demand.

  was:
See the attached picture of profiling result. The clone of Resource object 
within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the 
function FairScheduler.update().

The code of FSAppAttempt.updateDemand:
{code}
public void updateDemand() {
demand = Resources.createResource(0);
// Demand is current consumption plus outstanding requests
Resources.addTo(demand, app.getCurrentConsumption());

// Add up outstanding resource requests
synchronized (app) {
  for (Priority p : app.getPriorities()) {
for (ResourceRequest r : app.getResourceRequests(p).values()) {
  Resource total = Resources.**multiply**(r.getCapability(), 
r.getNumContainers());
  Resources.addTo(demand, total);
}
  }
}
  }
{code}

The code of Resources.multiply:
{code}
public static Resource multiply(Resource lhs, double by) {
return multiplyTo(**clone**(lhs), by);
}
{code}

The clone could be skipped by directly update the value of this.demand.


 optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% 
 of computing time of update thread
 

 Key: YARN-2768
 URL: https://issues.apache.org/jira/browse/YARN-2768
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: profiling_FairScheduler_update.png


 See the attached picture of profiling result. The clone of Resource object 
 within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the 
 function FairScheduler.update().
 The code of FSAppAttempt.updateDemand:
 {code}
 public void updateDemand() {
 demand = Resources.createResource(0);
 // Demand is current consumption plus outstanding requests
 Resources.addTo(demand, app.getCurrentConsumption());
 // Add up outstanding resource requests
 synchronized (app) {
   for (Priority p : app.getPriorities()) {
 for (ResourceRequest r : app.getResourceRequests(p).values()) {
   Resource total = Resources.multiply(r.getCapability(), 
 r.getNumContainers());
   Resources.addTo(demand, total);
 }
   }
 }
   }
 {code}
 The code of Resources.multiply:
 {code}
 public static Resource multiply(Resource lhs, double by) {
 return multiplyTo(clone(lhs), by);
 }
 {code}
 The clone could be skipped by directly update the value of this.demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2014-10-29 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2768:
--
Attachment: YARN-2768.patch

Avoided the clone by adding a three-argument helper, Resources.multiplyAndAddTo.
After this optimization, the average time taken by FairScheduler.update (in a 
test case with 10k apps) is reduced by 40%.

I'm not sure whether it's better to also submit such test cases.
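
A minimal sketch of how the rewritten loop body uses the new helper (it assumes the 
Resources.multiplyAndAddTo(Resource, Resource, double) signature added by this 
patch; the wrapper class is made up):
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// multiplyAndAddTo scales the request and adds it into "demand" in place,
// so no temporary Resource is cloned per pending ResourceRequest.
public class MultiplyAndAddToSketch {
  public static void main(String[] args) {
    Resource demand = Resources.createResource(0);
    Resource capability = Resource.newInstance(1024, 1); // one ResourceRequest
    int numContainers = 10;

    // before: Resources.addTo(demand, Resources.multiply(capability, numContainers));
    // after, with no clone:
    Resources.multiplyAndAddTo(demand, capability, numContainers);

    System.out.println("demand = " + demand); // expect memory 10240, 10 vCores
  }
}
{code}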

 optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% 
 of computing time of update thread
 

 Key: YARN-2768
 URL: https://issues.apache.org/jira/browse/YARN-2768
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2768.patch, profiling_FairScheduler_update.png


 See the attached picture of profiling result. The clone of Resource object 
 within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the 
 function FairScheduler.update().
 The code of FSAppAttempt.updateDemand:
 {code}
 public void updateDemand() {
 demand = Resources.createResource(0);
 // Demand is current consumption plus outstanding requests
 Resources.addTo(demand, app.getCurrentConsumption());
 // Add up outstanding resource requests
 synchronized (app) {
   for (Priority p : app.getPriorities()) {
 for (ResourceRequest r : app.getResourceRequests(p).values()) {
   Resource total = Resources.multiply(r.getCapability(), 
 r.getNumContainers());
   Resources.addTo(demand, total);
 }
   }
 }
   }
 {code}
 The code of Resources.multiply:
 {code}
 public static Resource multiply(Resource lhs, double by) {
 return multiplyTo(clone(lhs), by);
 }
 {code}
 The clone could be skipped by directly update the value of this.demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2545) RMApp should transit to FAILED when AM calls finishApplicationMaster with FAILED

2014-10-01 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14154946#comment-14154946
 ] 

Hong Zhiguo commented on YARN-2545:
---

How about the state of the appAttempt? Should it finally be FAILED instead of 
FINISHED?

 RMApp should transit to FAILED when AM calls finishApplicationMaster with 
 FAILED
 

 Key: YARN-2545
 URL: https://issues.apache.org/jira/browse/YARN-2545
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor

 If AM calls finishApplicationMaster with getFinalApplicationStatus()==FAILED, 
 and then exits, the corresponding RMApp and RMAppAttempt transit to state 
 FINISHED.
 I think this is wrong and confusing. On RM WebUI, this application is 
 displayed as State=FINISHED, FinalStatus=FAILED, and is counted as Apps 
 Completed, not as Apps Failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2545) RMApp should transit to FAILED when AM calls finishApplicationMaster with FAILED

2014-09-29 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152804#comment-14152804
 ] 

Hong Zhiguo commented on YARN-2545:
---

[~leftnoteasy], [~jianhe], [~ozawa], please have a look: should we set the state of 
the app/appAttempt to FAILED instead of FINISHED, or just count it as Apps Failed 
instead of Apps Completed?

 RMApp should transit to FAILED when AM calls finishApplicationMaster with 
 FAILED
 

 Key: YARN-2545
 URL: https://issues.apache.org/jira/browse/YARN-2545
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor

 If AM calls finishApplicationMaster with getFinalApplicationStatus()==FAILED, 
 and then exits, the corresponding RMApp and RMAppAttempt transit to state 
 FINISHED.
 I think this is wrong and confusing. On RM WebUI, this application is 
 displayed as State=FINISHED, FinalStatus=FAILED, and is counted as Apps 
 Completed, not as Apps Failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2545) RMApp should transit to FAILED when AM calls finishApplicationMaster with FAILED

2014-09-12 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-2545:
-

 Summary: RMApp should transit to FAILED when AM calls 
finishApplicationMaster with FAILED
 Key: YARN-2545
 URL: https://issues.apache.org/jira/browse/YARN-2545
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor


If AM calls finishApplicationMaster with getFinalApplicationStatus()==FAILED, 
and then exits, the corresponding RMApp and RMAppAttempt transit to state 
FINISHED.

I think this is wrong and confusing. On RM WebUI, this application is displayed 
as State=FINISHED, FinalStatus=FAILED, and is counted as Apps Completed, 
not as Apps Failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)

2014-09-02 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2306:
--
Attachment: YARN-2306-2.patch

Updated the patch with only the new unit test, since it seems this bug is already 
fixed in trunk.
The unit test succeeded.
I suggest including this unit test as a regression test for this bug.
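
Not the test from the patch, but a rough sketch of what such a regression check 
asserts (queue name and class are made up; it drives QueueMetrics directly instead 
of going through the FairScheduler):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics;

// Sketch only: the reservation counters must come back to zero once the
// reservation goes away (app attempt or node removed), otherwise the leak
// described in this JIRA shows up on the RM metrics.
public class ReservationMetricsSketch {
  public static void main(String[] args) {
    QueueMetrics metrics =
        QueueMetrics.forQueue("root.test", null, false, new Configuration());
    Resource reserved = Resource.newInstance(2048, 1);

    metrics.reserveResource("user1", reserved);
    System.out.println("reservedMB after reserve:   " + metrics.getReservedMB());

    // The scheduler must issue the matching unreserve on app/node removal.
    metrics.unreserveResource("user1", reserved);
    System.out.println("reservedMB after unreserve: " + metrics.getReservedMB()); // expect 0
  }
}
{code}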

 leak of reservation metrics (fair scheduler)
 

 Key: YARN-2306
 URL: https://issues.apache.org/jira/browse/YARN-2306
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2306-2.patch, YARN-2306.patch


 This only applies to fair scheduler. Capacity scheduler is OK.
 When appAttempt or node is removed, the metrics for 
 reservation(reservedContainers, reservedMB, reservedVCores) is not reduced 
 back.
 These are important metrics for administrators. The wrong metrics may confuse 
 them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1801) NPE in public localizer

2014-08-22 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106555#comment-14106555
 ] 

Hong Zhiguo commented on YARN-1801:
---

I think YARN-1575 already fixed this NPE. We could mark this one as a duplicate.

 NPE in public localizer
 ---

 Key: YARN-1801
 URL: https://issues.apache.org/jira/browse/YARN-1801
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.2.0
Reporter: Jason Lowe
Assignee: Hong Zhiguo
Priority: Critical
 Attachments: YARN-1801.patch


 While investigating YARN-1800 found this in the NM logs that caused the 
 public localizer to shutdown:
 {noformat}
 2014-01-23 01:26:38,655 INFO  localizer.ResourceLocalizationService 
 (ResourceLocalizationService.java:addResource(651)) - Downloading public 
 rsrc:{ 
 hdfs://colo-2:8020/user/fertrist/oozie-oozi/601-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar,
  1390440382009, FILE, null }
 2014-01-23 01:26:38,656 FATAL localizer.ResourceLocalizationService 
 (ResourceLocalizationService.java:run(726)) - Error: Shutting down
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.run(ResourceLocalizationService.java:712)
 2014-01-23 01:26:38,656 INFO  localizer.ResourceLocalizationService 
 (ResourceLocalizationService.java:run(728)) - Public cache exiting
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2371) Wrong NMToken is issued when NM preserving restart with containers running

2014-07-30 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-2371:
-

 Summary: Wrong NMToken is issued when NM preserving restart with 
containers running
 Key: YARN-2371
 URL: https://issues.apache.org/jira/browse/YARN-2371
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo


When application is submitted with 
ApplicationSubmissionContext.getKeepContainersAcrossApplicationAttempts() == 
true, and NM is restarted with containers running, wrong NMToken is issued to 
AM through RegisterApplicationMasterResponse.
See the NM log:
{code}
2014-07-30 11:59:58,941 ERROR 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Unauthorized request to start container.-
NMToken for application attempt : appattempt_1406691610864_0002_01 was used 
for starting container with container token issued for application attempt : 
appattempt_1406691610864_0002_02
{code}

The reason is in below code:
{code} 
createAndGetNMToken(String applicationSubmitter,
  ApplicationAttemptId appAttemptId, Container container) {
  ..
  Token token =
  createNMToken(container.getId().getApplicationAttemptId(),
container.getNodeId(), applicationSubmitter);
 ..
}
{code} 
appAttemptId instead of container.getId().getApplicationAttemptId() should 
be passed to createNMToken.
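
A small, self-contained illustration of the point (the createNMToken stand-in below 
is hypothetical; the real method lives in the RM's NMToken secret manager): with 
work-preserving NM restart, recovered containers still carry attempt 1 in their 
ContainerId, but the token must be issued for the attempt that is actually 
registering.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;

public class NMTokenAttemptIdDemo {
  // Hypothetical stand-in for the real createNMToken: only the choice of
  // attempt id matters for this illustration.
  static String createNMToken(ApplicationAttemptId attemptId, String node, String user) {
    return "NMToken[" + attemptId + " @ " + node + " for " + user + "]";
  }

  public static void main(String[] args) {
    ApplicationId app = ApplicationId.newInstance(1406691610864L, 2);
    ApplicationAttemptId previousAttempt = ApplicationAttemptId.newInstance(app, 1);
    ApplicationAttemptId currentAttempt  = ApplicationAttemptId.newInstance(app, 2);

    // Buggy: token built from the attempt recorded in the recovered
    // container's id (attempt 1), which the NM later rejects.
    System.out.println("wrong: " + createNMToken(previousAttempt, "nm-host:45454", "user"));

    // Fixed: token built from the registering attempt (the appAttemptId
    // parameter, attempt 2).
    System.out.println("right: " + createNMToken(currentAttempt, "nm-host:45454", "user"));
  }
}
{code}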



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2371) Wrong NMToken is issued when NM preserving restart with containers running

2014-07-30 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2371:
--

Attachment: YARN-2371.patch

 Wrong NMToken is issued when NM preserving restart with containers running
 --

 Key: YARN-2371
 URL: https://issues.apache.org/jira/browse/YARN-2371
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
 Attachments: YARN-2371.patch


 When application is submitted with 
 ApplicationSubmissionContext.getKeepContainersAcrossApplicationAttempts() == 
 true, and NM is restarted with containers running, wrong NMToken is issued 
 to AM through RegisterApplicationMasterResponse.
 See the NM log:
 {code}
 2014-07-30 11:59:58,941 ERROR 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
  Unauthorized request to start container.-
 NMToken for application attempt : appattempt_1406691610864_0002_01 was 
 used for starting container with container token issued for application 
 attempt : appattempt_1406691610864_0002_02
 {code}
 The reason is in below code:
 {code} 
 createAndGetNMToken(String applicationSubmitter,
   ApplicationAttemptId appAttemptId, Container container) {
   ..
   Token token =
   createNMToken(container.getId().getApplicationAttemptId(),
 container.getNodeId(), applicationSubmitter);
  ..
 }
 {code} 
 appAttemptId instead of container.getId().getApplicationAttemptId() 
 should be passed to createNMToken.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2371) Wrong NMToken is issued when NM preserving restarts with containers running

2014-07-30 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2371:
--

Summary: Wrong NMToken is issued when NM preserving restarts with 
containers running  (was: Wrong NMToken is issued when NM preserving restart 
with containers running)

 Wrong NMToken is issued when NM preserving restarts with containers running
 ---

 Key: YARN-2371
 URL: https://issues.apache.org/jira/browse/YARN-2371
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
 Attachments: YARN-2371.patch


 When application is submitted with 
 ApplicationSubmissionContext.getKeepContainersAcrossApplicationAttempts() == 
 true, and NM is restarted with containers running, wrong NMToken is issued 
 to AM through RegisterApplicationMasterResponse.
 See the NM log:
 {code}
 2014-07-30 11:59:58,941 ERROR 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
  Unauthorized request to start container.-
 NMToken for application attempt : appattempt_1406691610864_0002_01 was 
 used for starting container with container token issued for application 
 attempt : appattempt_1406691610864_0002_02
 {code}
 The reason is in below code:
 {code} 
 createAndGetNMToken(String applicationSubmitter,
   ApplicationAttemptId appAttemptId, Container container) {
   ..
   Token token =
   createNMToken(container.getId().getApplicationAttemptId(),
 container.getNodeId(), applicationSubmitter);
  ..
 }
 {code} 
 appAttemptId instead of container.getId().getApplicationAttemptId() 
 should be passed to createNMToken.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2371) Wrong NMToken is issued when NM preserving restarts with containers running

2014-07-30 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2371:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-1489

 Wrong NMToken is issued when NM preserving restarts with containers running
 ---

 Key: YARN-2371
 URL: https://issues.apache.org/jira/browse/YARN-2371
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
 Attachments: YARN-2371.patch


 When application is submitted with 
 ApplicationSubmissionContext.getKeepContainersAcrossApplicationAttempts() == 
 true, and NM is restarted with containers running, wrong NMToken is issued 
 to AM through RegisterApplicationMasterResponse.
 See the NM log:
 {code}
 2014-07-30 11:59:58,941 ERROR 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
  Unauthorized request to start container.-
 NMToken for application attempt : appattempt_1406691610864_0002_01 was 
 used for starting container with container token issued for application 
 attempt : appattempt_1406691610864_0002_02
 {code}
 The reason is in below code:
 {code} 
 createAndGetNMToken(String applicationSubmitter,
   ApplicationAttemptId appAttemptId, Container container) {
   ..
   Token token =
   createNMToken(container.getId().getApplicationAttemptId(),
 container.getNodeId(), applicationSubmitter);
  ..
 }
 {code} 
 appAttemptId instead of container.getId().getApplicationAttemptId() 
 should be passed to createNMToken.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2323) FairShareComparator creates too much Resource object

2014-07-20 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2323:
--

Attachment: YARN-2323-2.patch

patch revised according to [~sandyr]'s comments.

 FairShareComparator creates too much Resource object
 

 Key: YARN-2323
 URL: https://issues.apache.org/jira/browse/YARN-2323
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2323-2.patch, YARN-2323.patch


 Each call of {{FairShareComparator}} creates a new Resource object one:
 {code}
 Resource one = Resources.createResource(1);
 {code}
 At the volume of 1000 nodes and 1000 apps, the comparator will be called more 
 than 10 million times per second, thus creating more than 10 million object 
 one, which is unnecessary.
 Since the object one is read-only and is never referenced outside of 
 comparator, we could make it static.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2323) FairShareComparator creates too much Resource object

2014-07-19 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-2323:
-

 Summary: FairShareComparator creates too much Resource object
 Key: YARN-2323
 URL: https://issues.apache.org/jira/browse/YARN-2323
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor


Each call of {{FairShareComparator}} creates a new Resource object one:
{code}
Resource one = Resources.createResource(1);
{code}

At the scale of 1000 nodes and 1000 apps, the comparator is called more than 
10 million times per second, thus creating more than 10 million "one" objects, 
which is unnecessary.

Since the object "one" is read-only and is never referenced outside of the 
comparator, we could make it static.
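
A sketch of the suggested change (hypothetical class; the real FairShareComparator 
compares Schedulables, not raw Resources): the constant is hoisted into a static 
final field so it is created once per JVM instead of once per compare() call.
{code}
import java.io.Serializable;
import java.util.Comparator;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class StaticOneComparatorSketch implements Comparator<Resource>, Serializable {
  // Created once instead of on every compare() call.
  private static final Resource ONE = Resources.createResource(1);

  @Override
  public int compare(Resource a, Resource b) {
    // Illustrative only: use ONE as a floor, roughly the way the real
    // comparator uses it to avoid dividing by a zero minShare.
    int memA = Math.max(a.getMemory(), ONE.getMemory());
    int memB = Math.max(b.getMemory(), ONE.getMemory());
    return Integer.compare(memA, memB);
  }
}
{code}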



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2323) FairShareComparator creates too much Resource object

2014-07-19 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2323:
--

Attachment: YARN-2323.patch

 FairShareComparator creates too much Resource object
 

 Key: YARN-2323
 URL: https://issues.apache.org/jira/browse/YARN-2323
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2323.patch


 Each call of {{FairShareComparator}} creates a new Resource object one:
 {code}
 Resource one = Resources.createResource(1);
 {code}
 At the volume of 1000 nodes and 1000 apps, the comparator will be called more 
 than 10 million times per second, thus creating more than 10 million object 
 one, which is unnecessary.
 Since the object one is read-only and is never referenced outside of 
 comparator, we could make it static.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2299) inconsistency at identifying node

2014-07-17 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064711#comment-14064711
 ] 

Hong Zhiguo commented on YARN-2299:
---

Or make use of the existing config property 
yarn.scheduler.include-port-in-node-name when differentiating nodes.
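
A rough sketch of that idea (hypothetical class; only the property name comes from 
the comment above): the RM-side node name would include the port only when 
yarn.scheduler.include-port-in-node-name is set, so two NM instances started one 
after another on the same host collapse to one node by default.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeNameSketch {
  static String nodeName(Configuration conf, NodeId id) {
    boolean usePort = conf.getBoolean("yarn.scheduler.include-port-in-node-name", false);
    return usePort ? id.getHost() + ":" + id.getPort() : id.getHost();
  }

  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Same host, different ports: one logical node unless the port is included.
    System.out.println(nodeName(conf, NodeId.newInstance("host-1", 45454)));
    System.out.println(nodeName(conf, NodeId.newInstance("host-1", 50123)));
  }
}
{code}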

 inconsistency at identifying node
 -

 Key: YARN-2299
 URL: https://issues.apache.org/jira/browse/YARN-2299
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Critical

 If port of yarn.nodemanager.address is not specified at NM, NM will choose 
 random port. If the NM is ungracefully dead(OOM kill, kill -9, or OS restart) 
 and then restarted within yarn.nm.liveness-monitor.expiry-interval-ms, 
 host:port1 and host:port2 will both be present in Active Nodes on WebUI 
 for a while, and after host:port1 expiration, we get host:port1 in Lost 
 Nodes and host:port2 in Active Nodes. If the NM is ungracefully dead 
 again, we get only host:port1 in Lost Nodes. host:port2 is neither in 
 Active Nodes nor in  Lost Nodes.
 Another case: two NMs are running on the same host (miniYarnCluster or other test 
 purposes); if both of them are lost, only one shows up in Lost Nodes in the WebUI.
 In both cases, the sum of Active Nodes and Lost Nodes is not the number of 
 nodes we expected.
 The root cause is an inconsistency in how we decide whether two Nodes are 
 identical.
 When we manage active nodes (RMContextImpl.nodes), we use NodeId, which 
 contains the port. Two nodes with the same host but different ports are treated as 
 different nodes.
 But when we manage inactive nodes (RMContextImpl.inactiveNodes), we only 
 use the host. Two nodes with the same host but different ports are treated as 
 identical.
 To fix the inconsistency, we should differentiate below 2 cases and be 
 consistent for both of them:
  - intentionally multiple NMs per host
  - NM instances one after another on same host
 Two possible solutions:
 1) Introduce a boolean config like one-node-per-host(default as true), 
 and use host to differentiate nodes on RM if it's true.
 2) Make it mandatory to have valid port in yarn.nodemanager.address config. 
  In this situation, NM instances one after another on the same host will have the 
 same NodeId, while intentionally multiple NMs per host will have different 
 NodeId.
 Personally I prefer option 1 because it's easier for users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.

2014-07-17 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo reassigned YARN-2305:
-

Assignee: Hong Zhiguo

 When a container is in reserved state then total cluster memory is displayed 
 wrongly.
 -

 Key: YARN-2305
 URL: https://issues.apache.org/jira/browse/YARN-2305
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: J.Andreina
Assignee: Hong Zhiguo

 ENV Details:
 =  
  3 queues  :  a(50%),b(25%),c(25%) --- All max utilization is set to 
 100
  2 Node cluster with total memory as 16GB
 TestSteps:
 =
   Execute following 3 jobs with different memory configurations for 
 Map , reducer and AM task
   ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 
 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 
 /dir8 /preempt_85 (application_1405414066690_0023)
  ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 
 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 
 /dir2 /preempt_86 (application_1405414066690_0025)
  
  ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 
 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 
 /dir2 /preempt_62
 Issue
 =
   when 2GB memory is in the reserved state, total memory is shown as 
 15GB and used as 15GB (while total memory is 16GB)
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.

2014-07-17 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064732#comment-14064732
 ] 

Hong Zhiguo commented on YARN-2305:
---

Are you using the fair scheduler? If yes, then I think it's the same cause as 
YARN-2306.

 When a container is in reserved state then total cluster memory is displayed 
 wrongly.
 -

 Key: YARN-2305
 URL: https://issues.apache.org/jira/browse/YARN-2305
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: J.Andreina

 ENV Details:
 =  
  3 queues  :  a(50%),b(25%),c(25%) --- All max utilization is set to 
 100
  2 Node cluster with total memory as 16GB
 TestSteps:
 =
   Execute following 3 jobs with different memory configurations for 
 Map , reducer and AM task
   ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 
 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 
 /dir8 /preempt_85 (application_1405414066690_0023)
  ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 
 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 
 /dir2 /preempt_86 (application_1405414066690_0025)
  
  ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 
 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 
 /dir2 /preempt_62
 Issue
 =
   when 2GB memory is in the reserved state, total memory is shown as 
 15GB and used as 15GB (while total memory is 16GB)
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)

2014-07-17 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2306:
--

Summary: leak of reservation metrics (fair scheduler)  (was: leak of 
reservation metrics)

 leak of reservation metrics (fair scheduler)
 

 Key: YARN-2306
 URL: https://issues.apache.org/jira/browse/YARN-2306
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor

 This only applies to fair scheduler. Capacity scheduler is OK.
 When appAttempt or node is removed, the metrics for 
 reservation(reservedContainers, reservedMB, reservedVCores) is not reduced 
 back.
 These are important metrics for administrators. The wrong metrics may confuse 
 them. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.

2014-07-17 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064738#comment-14064738
 ] 

Hong Zhiguo commented on YARN-2305:
---

OK

 When a container is in reserved state then total cluster memory is displayed 
 wrongly.
 -

 Key: YARN-2305
 URL: https://issues.apache.org/jira/browse/YARN-2305
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: J.Andreina
Assignee: Hong Zhiguo
 Attachments: Capture.jpg


 ENV Details:
 =  
  3 queues  :  a(50%),b(25%),c(25%) --- All max utilization is set to 
 100
  2 Node cluster with total memory as 16GB
 TestSteps:
 =
   Execute following 3 jobs with different memory configurations for 
 Map , reducer and AM task
   ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 
 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 
 /dir8 /preempt_85 (application_1405414066690_0023)
  ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 
 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 
 /dir2 /preempt_86 (application_1405414066690_0025)
  
  ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 
 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 
 /dir2 /preempt_62
 Issue
 =
   when 2GB memory is in the reserved state, total memory is shown as 
 15GB and used as 15GB (while total memory is 16GB)
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)

2014-07-17 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2306:
--

Attachment: YARN-2306.patch

 leak of reservation metrics (fair scheduler)
 

 Key: YARN-2306
 URL: https://issues.apache.org/jira/browse/YARN-2306
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2306.patch


 This only applies to fair scheduler. Capacity scheduler is OK.
 When appAttempt or node is removed, the metrics for 
 reservation(reservedContainers, reservedMB, reservedVCores) is not reduced 
 back.
 These are important metrics for administrators. The wrong metrics may confuse 
 them. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2299) inconsistency at identifying node

2014-07-15 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-2299:
-

 Summary: inconsistency at identifying node
 Key: YARN-2299
 URL: https://issues.apache.org/jira/browse/YARN-2299
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Critical


If port of yarn.nodemanager.address is not specified at NM, NM will choose 
random port. If the NM is ungracefully dead(OOM kill, kill -9, or OS restart) 
and then restarted within yarn.nm.liveness-monitor.expiry-interval-ms, 
host:port1 and host:port2 will both be present in Active Nodes on WebUI 
for a while, and after host:port1 expiration, we get host:port1 in Lost Nodes 
and host:port2 in Active Nodes. If the NM is ungracefully dead again, we get 
only host:port1 in Lost Nodes. host:port2 is neither in Active Nodes nor 
in  Lost Nodes.

Another case: two NMs are running on the same host (miniYarnCluster or other test 
purposes); if both of them are lost, only one shows up in Lost Nodes in the WebUI.

In both cases, the sum of Active Nodes and Lost Nodes is not the number of nodes 
we expected.

The root cause is an inconsistency in how we decide whether two Nodes are identical.
When we manage active nodes (RMContextImpl.nodes), we use NodeId, which contains 
the port. Two nodes with the same host but different ports are treated as different 
nodes.
But when we manage inactive nodes (RMContextImpl.inactiveNodes), we only use the 
host. Two nodes with the same host but different ports are treated as identical.

We should differentiate 2 cases: 
 - intentionally multiple NMs per host
 - NM instances one after another on same host

Two possible solutions:
1) Introduce a boolean config like one-node-per-host(default as true), and 
use host to differentiate nodes on RM if it's true.

2) Make it mandatory to have valid port in yarn.nodemanager.address config.  
In this situation, NM instances one after another on the same host will have the same 
NodeId, while intentionally multiple NMs per host will have different NodeId.

Personally I prefer option 1 because it's easier for users.




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2299) inconsistency at identifying node

2014-07-15 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2299:
--

Description: 
If port of yarn.nodemanager.address is not specified at NM, NM will choose 
random port. If the NM is ungracefully dead(OOM kill, kill -9, or OS restart) 
and then restarted within yarn.nm.liveness-monitor.expiry-interval-ms, 
host:port1 and host:port2 will both be present in Active Nodes on WebUI 
for a while, and after host:port1 expiration, we get host:port1 in Lost Nodes 
and host:port2 in Active Nodes. If the NM is ungracefully dead again, we get 
only host:port1 in Lost Nodes. host:port2 is neither in Active Nodes nor 
in  Lost Nodes.

Another case: two NMs are running on the same host (miniYarnCluster or other test 
purposes); if both of them are lost, only one shows up in Lost Nodes in the WebUI.

In both cases, the sum of Active Nodes and Lost Nodes is not the number of nodes 
we expected.

The root cause is an inconsistency in how we decide whether two Nodes are identical.
When we manage active nodes (RMContextImpl.nodes), we use NodeId, which contains 
the port. Two nodes with the same host but different ports are treated as different 
nodes.
But when we manage inactive nodes (RMContextImpl.inactiveNodes), we only use the 
host. Two nodes with the same host but different ports are treated as identical.

To fix the inconsistency, we should differentiate below 2 cases and support 
both of them:
 - intentionally multiple NMs per host
 - NM instances one after another on same host

Two possible solutions:
1) Introduce a boolean config like one-node-per-host(default as true), and 
use host to differentiate nodes on RM if it's true.

2) Make it mandatory to have valid port in yarn.nodemanager.address config.  
In this situation, NM instances one after another on the same host will have the same 
NodeId, while intentionally multiple NMs per host will have different NodeId.

Personally I prefer option 1 because it's easier for users.


  was:
If port of yarn.nodemanager.address is not specified at NM, NM will choose 
random port. If the NM is ungracefully dead(OOM kill, kill -9, or OS restart) 
and then restarted within yarn.nm.liveness-monitor.expiry-interval-ms, 
host:port1 and host:port2 will both be present in Active Nodes on WebUI 
for a while, and after host:port1 expiration, we get host:port1 in Lost Nodes 
and host:port2 in Active Nodes. If the NM is ungracefully dead again, we get 
only host:port1 in Lost Nodes. host:port2 is neither in Active Nodes nor 
in  Lost Nodes.

Another case: two NMs are running on the same host (miniYarnCluster or other test 
purposes); if both of them are lost, only one shows up in Lost Nodes in the WebUI.

In both cases, the sum of Active Nodes and Lost Nodes is not the number of nodes 
we expected.

The root cause is an inconsistency in how we decide whether two Nodes are identical.
When we manage active nodes (RMContextImpl.nodes), we use NodeId, which contains 
the port. Two nodes with the same host but different ports are treated as different 
nodes.
But when we manage inactive nodes (RMContextImpl.inactiveNodes), we only use the 
host. Two nodes with the same host but different ports are treated as identical.

We should differentiate 2 cases: 
 - intentionally multiple NMs per host
 - NM instances one after another on same host

Two possible solutions:
1) Introduce a boolean config like one-node-per-host(default as true), and 
use host to differentiate nodes on RM if it's true.

2) Make it mandatory to have valid port in yarn.nodemanager.address config.  
In this situation, NM instances one after another on the same host will have the same 
NodeId, while intentionally multiple NMs per host will have different NodeId.

Personally I prefer option 1 because it's easier for users.



 inconsistency at identifying node
 -

 Key: YARN-2299
 URL: https://issues.apache.org/jira/browse/YARN-2299
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Critical

 If port of yarn.nodemanager.address is not specified at NM, NM will choose 
 random port. If the NM is ungracefully dead(OOM kill, kill -9, or OS restart) 
 and then restarted within yarn.nm.liveness-monitor.expiry-interval-ms, 
 host:port1 and host:port2 will both be present in Active Nodes on WebUI 
 for a while, and after host:port1 expiration, we get host:port1 in Lost 
 Nodes and host:port2 in Active Nodes. If the NM is ungracefully dead 
 again, we get only host:port1 in Lost Nodes. host:port2 is neither in 
 Active Nodes nor in  Lost Nodes.
 Another case: two NMs are running on the same host (miniYarnCluster or other test 
 purposes); if both of them are lost, only one shows up in Lost Nodes in the WebUI.
 In both cases, the sum of Active Nodes and Lost Nodes is not the number of 
 nodes we expected.
 The root cause is due 
