[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042057#comment-16042057 ]

Hong Zhiguo commented on YARN-4024:
-----------------------------------
[~maobaolong], this depends on the probability of a node getting a new IP address without shutting down or restarting the NM. If you are sure it is zero, you can assign it a very large value. That is actually the situation in our clusters.

> YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
> ----------------------------------------------------------------------
>                 Key: YARN-4024
>                 URL: https://issues.apache.org/jira/browse/YARN-4024
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Wangda Tan
>            Assignee: Hong Zhiguo
>             Fix For: 2.8.0, 3.0.0-alpha1
>
>         Attachments: YARN-4024-draft.patch, YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-v4.patch, YARN-4024-v5.patch, YARN-4024-v6.patch, YARN-4024-v7.patch
>
> Currently, the YARN RM NodesListManager resolves the IP address every time a node heartbeats. When the DNS server becomes slow, NM heartbeats are blocked and cannot make progress.
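The expiry value being discussed here is the yarn.resourcemanager.node-ip-cache.expiry-interval-secs option introduced later in this thread. As a rough illustration of the caching idea only (not the actual patch; the class and method names are invented), a resolver that caches both positive and negative lookups with a configurable expiry could look like this, using Guava's cache:

{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

// Hypothetical sketch of the caching idea discussed above. Both successful
// and failed lookups are cached for the same configurable interval, so a
// slow or flaky DNS server is consulted at most once per host per interval.
public class CachedHostResolver {
  // A failed lookup is cached as this sentinel so negative results
  // also avoid repeated DNS round-trips.
  private static final String UNRESOLVABLE = "";

  private final Cache<String, String> cache;

  public CachedHostResolver(long expirySecs) {
    // expirySecs would come from configuration; assigning a very large
    // value effectively pins entries, matching the "assign it a very
    // big value" advice above for clusters whose node IPs never change.
    this.cache = CacheBuilder.newBuilder()
        .expireAfterWrite(expirySecs, TimeUnit.SECONDS)
        .build();
  }

  /** Returns the resolved IP, or null if the host is unresolvable. */
  public String resolve(String hostName) {
    String ip = cache.getIfPresent(hostName);
    if (ip == null) {                      // cache miss: hit DNS once
      try {
        ip = InetAddress.getByName(hostName).getHostAddress();
      } catch (UnknownHostException e) {
        ip = UNRESOLVABLE;                 // cache the failure too
      }
      cache.put(hostName, ip);
    }
    return UNRESOLVABLE.equals(ip) ? null : ip;
  }

  /** Invalidate one entry, e.g. when a node's state changes. */
  public void invalidate(String hostName) {
    cache.invalidate(hostName);
  }
}
{code}

A real implementation would additionally handle the "-1 disables caching" case and the invalidation-on-state-change logic discussed in the comments below.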
[jira] [Comment Edited] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040075#comment-16040075 ]

Hong Zhiguo edited comment on YARN-4024 at 6/7/17 3:23 AM:
-----------------------------------------------------------
[~maobaolong], we don't turn on log-aggregation, to avoid the pressure on the network and HDFS.

For the code you questioned: when the node status changes, we MUST invalidate the corresponding cache item, which was learned while the node was in the other state, because the IP address may have changed along with the status. I think it's OK to add a break statement in the "default" case, and it's also OK to invalidate the cache item when we get an unexpected event.

was (Author: zhiguohong):
[~maobaolong], we don't turn on log-aggregation to avoid the pressure to network and HDFS.
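The pattern under discussion, reconstructed schematically (simplified, self-contained types; not the exact patch code):

{code}
// Sketch of the invalidate-on-state-change pattern described above:
// any transition between USABLE and UNUSABLE drops the cached address,
// because the node may have come back with a different IP.
public class NodeEventCacheHook {
  enum NodeEventType { NODE_USABLE, NODE_UNUSABLE, NODE_DECOMMISSIONING }

  private final CachedHostResolver resolver; // sketch class from above

  public NodeEventCacheHook(CachedHostResolver resolver) {
    this.resolver = resolver;
  }

  public void onNodeEvent(NodeEventType type, String hostName) {
    switch (type) {
      case NODE_USABLE:
      case NODE_UNUSABLE:
        // The cached entry was learned while the node was in the other
        // state; the IP may have changed together with the status, so
        // the entry MUST be invalidated on every such transition.
        resolver.invalidate(hostName);
        break;
      default:
        // Unexpected event: breaking here is fine, and invalidating
        // would also be safe, costing at most one extra DNS lookup.
        break;
    }
  }
}
{code}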
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040075#comment-16040075 ]

Hong Zhiguo commented on YARN-4024:
-----------------------------------
[~maobaolong], we don't turn on log-aggregation, to avoid the pressure on the network and HDFS.
[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929330#comment-15929330 ]

Hong Zhiguo commented on YARN-6319:
-----------------------------------
[~haibochen], the post-callback will not linearize container cleanup. The cleanups still run in parallel.

> race condition between deleting app dir and deleting container dir
> -------------------------------------------------------------------
>                 Key: YARN-6319
>                 URL: https://issues.apache.org/jira/browse/YARN-6319
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Hong Zhiguo
>            Assignee: Hong Zhiguo
>
> When the last container (on one node) of an app completes, it
> |--> triggers async deletion of the container dir (container cleanup)
> |--> triggers async deletion of the app dir (app cleanup)
> For LCE, deletion is done by container-executor. The "app cleanup" lists the sub-dirs (step 1) and then unlinks the items one by one (step 2). If a file is deleted by "container cleanup" between step 1 and step 2, it reports the error below and aborts the deletion.
> {code}
> ContainerExecutor: Couldn't delete file
> $LOCAL/usercache/$USER/appcache/application_1481785469354_353539/container_1481785469354_353539_01_28/$FILE
> - No such file or directory
> {code}
> The app dir then escapes cleanup, which is why we always have many app dirs left behind.
> Solution 1: just ignore the error, without aborting, in container-executor.c::delete_path().
> Solution 2: use a lock to serialize cleanup of the same app dir.
> Solution 3: back off and retry on error.
> Comments are welcome.
[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927351#comment-15927351 ]

Hong Zhiguo commented on YARN-6319:
-----------------------------------
[~haibochen], the CONTAINER_RESOURCES_CLEANEDUP event should be sent by every container, not only the last one, so no special logic is needed to pick out the last one. The post-callback can be invoked without checking whether the deletion succeeded or failed; it just has to be sent **after** the "container cleanup" to avoid the race condition. This does not reduce robustness.
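To make the post-callback idea concrete, here is a minimal, self-contained sketch (not the actual Hadoop FileDeletionTask; names and structure are simplified) of a deletion task that fires a completion callback after the delete attempt finishes, whether or not it succeeded:

{code}
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Sketch: the callback runs strictly after the deletion attempt, so the
// caller can dispatch CONTAINER_RESOURCES_CLEANEDUP from it and be sure
// no container-dir deletion is still in flight when app cleanup starts.
class FileDeletionTask implements Runnable {
  private final Path target;
  private final Runnable onComplete; // fired after deletion, success or not

  FileDeletionTask(Path target, Runnable onComplete) {
    this.target = target;
    this.onComplete = onComplete;
  }

  @Override
  public void run() {
    try (Stream<Path> tree = Files.walk(target)) {
      // Delete children before parents.
      tree.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.deleteIfExists(p);
        } catch (Exception e) {
          // a file already removed by another cleanup is not fatal
        }
      });
    } catch (Exception e) {
      // target already gone or listing failed; the callback still runs
    } finally {
      if (onComplete != null) {
        onComplete.run(); // e.g. dispatch CONTAINER_RESOURCES_CLEANEDUP
      }
    }
  }
}
{code}

The NM would then send CONTAINER_RESOURCES_CLEANEDUP from the callback instead of right after submitting the deletion, so the app-level cleanup can only be triggered once every container dir is actually gone.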
[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925421#comment-15925421 ]

Hong Zhiguo commented on YARN-6319:
-----------------------------------
[~haibochen], thanks for your comments. I found that solution 1 may need to be implemented in each ContainerExecutor, while solution 2 is general.
[jira] [Comment Edited] (YARN-6319) race condition between deleting app dir and deleting container dir
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923890#comment-15923890 ]

Hong Zhiguo edited comment on YARN-6319 at 3/14/17 9:50 AM:
------------------------------------------------------------
One "locking" solution: add a post-callback to FileDeletionTask, and send the CONTAINER_RESOURCES_CLEANEDUP event only from that callback. Comments please.

was (Author: zhiguohong):
One "locking" solution: Add a post-callback to FileDeletionTask. And CONTAINER_RESOURCES_CLEANEDUP event is sent by that callback. Comments please.
[jira] [Comment Edited] (YARN-6319) race condition between deleting app dir and deleting container dir
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923890#comment-15923890 ]

Hong Zhiguo edited comment on YARN-6319 at 3/14/17 9:49 AM:
------------------------------------------------------------
One "locking" solution: add a post-callback to FileDeletionTask, and send the CONTAINER_RESOURCES_CLEANEDUP event from that callback. Comments please.

was (Author: zhiguohong):
One "serialize" solution: Add a post-callback to FileDeletionTask. And CONTAINER_RESOURCES_CLEANEDUP event is sent by that callback. Comments please.
[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923890#comment-15923890 ]

Hong Zhiguo commented on YARN-6319:
-----------------------------------
One "serialize" solution: add a post-callback to FileDeletionTask, and send the CONTAINER_RESOURCES_CLEANEDUP event from that callback. Comments please.
[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923881#comment-15923881 ]

Hong Zhiguo commented on YARN-6319:
-----------------------------------
The "app dir cleanup" is triggered in ApplicationImpl.AppFinishTriggeredTransition or ApplicationImpl.AppFinishTransition. There is a pre-condition that all container dirs have already been cleaned up:
{code}
if (app.containers.isEmpty()) {
  // No container to cleanup. Cleanup app level resources.
  app.handleAppFinishWithContainersCleanedup();
  return ApplicationState.APPLICATION_RESOURCES_CLEANINGUP;
}
{code}
But this doesn't work, because ResourceLocalizationService.handleCleanupContainerResources only triggers the async "container dir cleanup" and then immediately sends out the CONTAINER_RESOURCES_CLEANEDUP event, which leads to ApplicationImpl.AppFinishTransition and ApplicationImpl.containers.remove(...). So ApplicationImpl.containers can be empty while a "container dir cleanup" is still in flight.
[jira] [Updated] (YARN-6319) race condition between deleting app dir and deleting container dir
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Zhiguo updated YARN-6319:
------------------------------
Description:
When the last container (on one node) of an app completes, it
|--> triggers async deletion of the container dir (container cleanup)
|--> triggers async deletion of the app dir (app cleanup)
For LCE, deletion is done by container-executor. The "app cleanup" lists the sub-dirs (step 1) and then unlinks the items one by one (step 2). If a file is deleted by "container cleanup" between step 1 and step 2, it reports the error below and aborts the deletion.
{code}
ContainerExecutor: Couldn't delete file
$LOCAL/usercache/$USER/appcache/application_1481785469354_353539/container_1481785469354_353539_01_28/$FILE
- No such file or directory
{code}
The app dir then escapes cleanup, which is why we always have many app dirs left behind.
Solution 1: just ignore the error, without aborting, in container-executor.c::delete_path().
Solution 2: use a lock to serialize cleanup of the same app dir.
Solution 3: back off and retry on error.
Comments are welcome.

was: (the same description, but ending with "Suggestions are welcome." instead of "Comments are welcome.")
[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15906816#comment-15906816 ]

Hong Zhiguo commented on YARN-6319:
-----------------------------------
The race condition can be reproduced with the script below:
{code}
USER=xxx
GRP=yyy
CE=/PATH/TO/container-executor
SIZE=200

# Build an app dir with one container dir holding several large files,
# so the two deletions overlap long enough to race.
mkdir app
mkdir -p app/container
dd if=/dev/zero of=app/container/a count=$SIZE bs=1M
dd if=/dev/zero of=app/container/b count=$SIZE bs=1M
dd if=/dev/zero of=app/container/c count=$SIZE bs=1M
dd if=/dev/zero of=app/container/d count=$SIZE bs=1M
dd if=/dev/zero of=app/container/e count=$SIZE bs=1M
chown $USER:$GRP -R app/

# Delete the container dir and the app dir concurrently.
$CE $USER 3 ./app/container &
$CE $USER 3 ./app
{code}
[jira] [Created] (YARN-6319) race condition between deleting app dir and deleting container dir
Hong Zhiguo created YARN-6319:
---------------------------------
             Summary: race condition between deleting app dir and deleting container dir
                 Key: YARN-6319
                 URL: https://issues.apache.org/jira/browse/YARN-6319
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
            Reporter: Hong Zhiguo
            Assignee: Hong Zhiguo

When the last container (on one node) of an app completes, it
|--> triggers async deletion of the container dir (container cleanup)
|--> triggers async deletion of the app dir (app cleanup)
For LCE, deletion is done by container-executor. The "app cleanup" lists the sub-dirs (step 1) and then unlinks the items one by one (step 2). If a file is deleted by "container cleanup" between step 1 and step 2, it reports the error below and aborts the deletion.
{code}
ContainerExecutor: Couldn't delete file
$LOCAL/usercache/$USER/appcache/application_1481785469354_353539/container_1481785469354_353539_01_28/$FILE
- No such file or directory
{code}
The app dir then escapes cleanup, which is why we always have many app dirs left behind.
Solution 1: just ignore the error, without aborting, in container-executor.c::delete_path().
Solution 2: use a lock to serialize cleanup of the same app dir.
Solution 3: back off and retry on error.
Suggestions are welcome.
[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15242280#comment-15242280 ]

Hong Zhiguo commented on YARN-2306:
-----------------------------------
The patch is available. Do you have any comments?

> leak of reservation metrics (fair scheduler)
> ---------------------------------------------
>                 Key: YARN-2306
>                 URL: https://issues.apache.org/jira/browse/YARN-2306
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>            Reporter: Hong Zhiguo
>            Assignee: Hong Zhiguo
>            Priority: Minor
>         Attachments: YARN-2306-2.patch, YARN-2306-3.patch, YARN-2306.patch
>
> This only applies to the fair scheduler; the capacity scheduler is OK.
> When an appAttempt or node is removed, the reservation metrics (reservedContainers, reservedMB, reservedVCores) are not reduced back.
> These are important metrics for administrators, and the wrong values may confuse them.
[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217334#comment-15217334 ]

Hong Zhiguo commented on YARN-4002:
-----------------------------------
Including the return statement in the read-lock critical section makes no difference except a longer critical section and worse performance. Since the hostsList and excludeList of the HostsFileReader are updated by reference assignment, no race condition exists even if the lookup is not protected by the read lock.

> make ResourceTrackerService.nodeHeartbeat more concurrent
> ----------------------------------------------------------
>                 Key: YARN-4002
>                 URL: https://issues.apache.org/jira/browse/YARN-4002
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Hong Zhiguo
>            Assignee: Hong Zhiguo
>            Priority: Critical
>         Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, YARN-4002-rwlock-v2.patch, YARN-4002-rwlock.patch, YARN-4002-v0.patch
>
> We have multiple RPC threads handling NodeHeartbeatRequest from NMs. By design, the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to scale to large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think is unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only updated on "refresh nodes", and all RPC threads handling node heartbeats are only readers, so an RWLock could be used to allow concurrent access by the RPC threads.
> Second, since the fields "includes" and "excludes" of HostsFileReader are always updated by "reference assignment", which is atomic in Java, the reader-side lock could simply be skipped.
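A minimal sketch of the lockless-read idea (simplified names, not the actual HostsFileReader): the writer builds fresh sets and publishes them with a single volatile reference assignment, so heartbeat-handling readers never take a lock at all.

{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class HostLists {
  // volatile makes each reference assignment a safe publication point:
  // a reader sees either the complete old set or the complete new one.
  private volatile Set<String> includes = Collections.emptySet();
  private volatile Set<String> excludes = Collections.emptySet();

  // Called only on "refresh nodes".
  public void refresh(Set<String> newIncludes, Set<String> newExcludes) {
    // Build immutable snapshots first, then swap the references.
    this.includes = Collections.unmodifiableSet(new HashSet<>(newIncludes));
    this.excludes = Collections.unmodifiableSet(new HashSet<>(newExcludes));
  }

  // Called from every node heartbeat; no lock taken.
  public boolean isValidNode(String hostName) {
    Set<String> in = includes;  // read each volatile field exactly once
    Set<String> ex = excludes;
    return (in.isEmpty() || in.contains(hostName)) && !ex.contains(hostName);
  }
}
{code}

The one behavior this allows, as discussed below, is that a reader running between the two assignments may pair the new includes with the old excludes; the rwlock variant shown in the next message exists precisely to rule that out.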
[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Zhiguo updated YARN-4002:
------------------------------
Attachment: YARN-4002-rwlock-v2.patch

Uploaded YARN-4002-rwlock-v2.patch with an improvement: a smaller read-side critical section.
{code}
this.hostsReadLock.lock();
try {
  hostsList = hostsReader.getHosts();
  excludeList = hostsReader.getExcludedHosts();
} finally {
  this.hostsReadLock.unlock();
}
{code}
As explained by [~rohithsharma], this prevents mixing the old value of hostsReader.getHosts() with the new value of hostsReader.getExcludedHosts(), and that is the only reason someone might prefer the rwlock solution over the lockless one. If the mixing is not considered a problem (I, for one, don't consider it one), the lockless solution is good enough.
[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174795#comment-15174795 ]

Hong Zhiguo commented on YARN-4002:
-----------------------------------
Hi [~rohithsharma], thanks for the refinement. But why not take the lockless version?
[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Zhiguo updated YARN-4002:
------------------------------
Attachment: YARN-4002-rwlock.patch
            YARN-4002-lockless-read.patch

Submitted two patches for the two proposed solutions.
[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039597#comment-15039597 ]

Hong Zhiguo commented on YARN-4002:
-----------------------------------
I'm working on it. I've proposed two different solutions and am waiting for specific comments.
[jira] [Updated] (YARN-4181) node blacklist for AM launching
[ https://issues.apache.org/jira/browse/YARN-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Zhiguo updated YARN-4181:
------------------------------
Description:
In some cases a node goes problematic and most containers launched on it fail, including launching AM containers. This node then has more available resource than the other nodes in the cluster.
The application whose AM keeps failing has a minShare ratio of zero. With the fair scheduler, this node is always rated first, and the unfortunate application is also likely rated first. The result: attempts of this application fail again and again on the same node.
We should avoid such a deadlock situation.
Solution 1: the NM could detect the failure rate of containers; if the rate is high, the NM marks itself unhealthy for a period. But we should be careful not to let a buggy application turn all nodes unhealthy; maybe use per-application container failure rates.
Solution 2: have an application-level blacklist in AMLauncher, in addition to the existing blacklist maintained by the AM.

was: (the same description, without the sentence "We should avoid such a deadlock situation.")
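A rough, purely hypothetical sketch of the per-application failure-rate idea in solution 1 (all names and thresholds invented; this is not from any patch): track container launch failures per application, and treat the node as suspect only when containers of several different applications are failing, so a single buggy application cannot poison every node.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ContainerFailureTracker {
  private static final int MIN_SAMPLES = 10;       // ignore tiny samples
  private static final double BAD_RATE = 0.8;      // 80% failures = bad
  private static final int MIN_FAILING_APPS = 3;   // require several apps

  private static final class Stats {
    final AtomicInteger launched = new AtomicInteger();
    final AtomicInteger failed = new AtomicInteger();
  }

  private final Map<String, Stats> byApp = new ConcurrentHashMap<>();

  public void record(String appId, boolean launchFailed) {
    Stats s = byApp.computeIfAbsent(appId, k -> new Stats());
    s.launched.incrementAndGet();
    if (launchFailed) {
      s.failed.incrementAndGet();
    }
  }

  // The NM health check could consult this and report the node unhealthy
  // for a cool-down period when it returns true.
  public boolean looksUnhealthy() {
    int failingApps = 0;
    for (Stats s : byApp.values()) {
      int n = s.launched.get();
      if (n >= MIN_SAMPLES && s.failed.get() >= n * BAD_RATE) {
        failingApps++;
      }
    }
    return failingApps >= MIN_FAILING_APPS;
  }
}
{code}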
[jira] [Created] (YARN-4181) node blacklist for AM launching
Hong Zhiguo created YARN-4181:
---------------------------------
             Summary: node blacklist for AM launching
                 Key: YARN-4181
                 URL: https://issues.apache.org/jira/browse/YARN-4181
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
            Reporter: Hong Zhiguo
            Assignee: Hong Zhiguo
            Priority: Minor

In some cases a node goes problematic and most containers launched on it fail, including launching AM containers. This node then has more available resource than the other nodes in the cluster.
The application whose AM keeps failing has a minShare ratio of zero. With the fair scheduler, this node is always rated first, and the unfortunate application is also likely rated first. The result: attempts of this application fail again and again on the same node.
Solution 1: the NM could detect the failure rate of containers; if the rate is high, the NM marks itself unhealthy for a period. But we should be careful not to let a buggy application turn all nodes unhealthy; maybe use per-application container failure rates.
Solution 2: have an application-level blacklist in AMLauncher, in addition to the existing blacklist maintained by the AM.
[jira] [Updated] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain
[ https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Zhiguo updated YARN-4104:
------------------------------
Description:
We have more than a thousand queues and several hundred tenants in a busy cluster, and we get a lot of complaints/questions from queue owners/operators such as "Why can't my queue/app get resources for a long while?" Such questions are really hard to answer.
So we added a diagnostic REST endpoint "/ws/v1/cluster/schedule/dryrun/{parentQueueName}" which returns the sorted list of a parent queue's children according to its SchedulingPolicy.getComparator(). All scheduling parameters of the children are also displayed, such as minShare, usage, demand, weight, priority, etc.
Usually we just call "/ws/v1/cluster/schedule/root", and the result answers the question by itself.
I feel it's really useful for multi-tenant clusters, and hope it can be merged into the mainline.

was: (the same description, before the author's typo fixes)
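A sketch of what the dryrun rendering might look like (invented types standing in for the fair scheduler's queue classes; the actual patch is not shown in this thread): rank the children with the comparator from the parent's SchedulingPolicy, then print one line per child in the min/max/dem/use format shown in a later comment.

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ScheduleDryrun {
  // Stand-in for FSQueue with just the parameters the dryrun displays.
  interface Queue {
    String getName();
    long getMinShareMB();  int getMinShareVCores();
    long getMaxShareMB();  int getMaxShareVCores();
    long getDemandMB();    int getDemandVCores();
    long getUsageMB();     int getUsageVCores();
    int getWeight();
  }

  // cmp would be the parent queue's SchedulingPolicy.getComparator(),
  // so the printed order is the order the scheduler would offer resources.
  public static String dryrun(List<Queue> children, Comparator<Queue> cmp) {
    List<Queue> sorted = new ArrayList<>(children);
    sorted.sort(cmp);
    StringBuilder out = new StringBuilder();
    int rank = 0;
    for (Queue q : sorted) {
      out.append(String.format(
          "%04d %s min(%d,%d) max(%d,%d) dem(%d,%d) use(%d,%d) weight=%d%n",
          ++rank, q.getName(),
          q.getMinShareMB(), q.getMinShareVCores(),
          q.getMaxShareMB(), q.getMaxShareVCores(),
          q.getDemandMB(), q.getDemandVCores(),
          q.getUsageMB(), q.getUsageVCores(),
          q.getWeight()));
    }
    return out.toString();
  }
}
{code}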
[jira] [Created] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain
Hong Zhiguo created YARN-4104:
---------------------------------
             Summary: dryrun of schedule for diagnostic and tenant's complain
                 Key: YARN-4104
                 URL: https://issues.apache.org/jira/browse/YARN-4104
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: scheduler
            Reporter: Hong Zhiguo
            Assignee: Hong Zhiguo
            Priority: Minor

We have more than a thousand queues and several hundred tenants in a busy cluster, and we get a lot of complaints/questions from queue owners/operators such as "Why can't my queue/app get resources for a long while?" Such questions are really hard to answer.
So we added a diagnostic REST endpoint "/ws/v1/cluster/schedule/dryrun/{parentQueueName}" which returns the sorted list of a parent queue's children according to its SchedulingPolicy.getComparator(). All scheduling parameters of the children are also displayed, such as minShare, usage, demand, weight, priority, etc.
Usually we just call "/ws/v1/cluster/schedule/root", and the result answers the question by itself.
I feel it's really useful for multi-tenant clusters, and hope it can be merged into the mainline.
[jira] [Updated] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain
[ https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Zhiguo updated YARN-4104:
------------------------------
Description:
We have more than a thousand queues and several hundred tenants in a busy cluster, and we get a lot of complaints/questions from queue owners/operators such as "Why can't my queue/app get resources for a long while?" Such questions are really hard to answer.
So we added a diagnostic REST endpoint "/ws/v1/cluster/schedule/dryrun/{parentQueueName}" which returns the sorted list of a parent queue's children according to its SchedulingPolicy.getComparator(). All scheduling parameters of the children are also displayed, such as minShare, usage, demand, weight, priority, etc.
Usually we just call "/ws/v1/cluster/schedule/dryrun/root", and the result answers the question by itself.
I feel it's really useful for multi-tenant clusters, and hope it can be merged into the mainline.

was: (the same description, with the example call given as "/ws/v1/cluster/schedule/root")
[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain
[ https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726780#comment-14726780 ]

Hong Zhiguo commented on YARN-4104:
-----------------------------------
For better human readability, the output is plain text:
{code}
0001 root.g_isd_999 min(6143,1) max(12288,4) dem(12288,4) use(52194816,15652) weight=1
0002 root.g_ieg_ttlz_ttlz_import_tdbank min(61439,19) max(1228800,400) dem(13056,6) use(1536,1) weight=800
0003 root.g_ieg_wegalaxy_wegalaxy_import_tdbank min(61439,19) max(1228800,400) dem(13056,6) use(1536,1) weight=800
0004 root.safety_cloud min(18432000,6000) max(18432000,6000) dem(18432000,6000) use(10585088,5169) weight=6000
0005 root.g_teg_datacompress min(6144000,2000) max(12288000,4000) dem(52224,27) use(32256,18) weight=2400
0006 root.g_input_output_hlw min(368639,119) max(1474560,480) dem(20480,19) use(13312,12) weight=800
0007 root.g_ecc_express_ecc_express min(1474559,479) max(5898240,1920) dem(9472,4) use(6272,3) weight=832
0008 root.g_iegv2_datacompress min(6144000,2000) max(12288000,4000) dem(46080,24) use(34560,19) weight=2400
0009 root.g_raid_datacompress min(6144000,2000) max(12288000,4000) dem(65280,35) use(52992,29) weight=2400
0010 root.g_input_output_ieg_tdbank min(2764799,899) max(11059200,3600) dem(177408,90) use(145152,76) weight=1210
0011 root.g_ieg_iegpdata_idata_subject_analysis min(1228799,459) max(9830400,3680) dem(1372928,601) use(1022720,449) weight=814
...
{code}
[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain
[ https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726697#comment-14726697 ]

Hong Zhiguo commented on YARN-4104:
-----------------------------------
It only works for the fair scheduler at the moment, because we only use the fair scheduler, but it would be easy to support other schedulers. Can I create several third-level subtasks under this one?
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721144#comment-14721144 ]

Hong Zhiguo commented on YARN-4024:
-----------------------------------
Why doesn't Jenkins run against the latest patch?
[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Zhiguo updated YARN-4024:
------------------------------
Attachment: YARN-4024-v7.patch

Thanks for your comments, [~adhoot]; I've updated the patch.
[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Zhiguo updated YARN-4024:
------------------------------
Attachment: YARN-4024-v6.patch

The findbugs warning is about unchecked raw types in AMLivelinessMonitor.java. I fixed it in the v6 patch.
[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Zhiguo updated YARN-4024:
------------------------------
Attachment: YARN-4024-v5.patch

Thanks for your comments, [~sunilg] and [~leftnoteasy]. I updated patch v5 accordingly. The Resolver interface is left public, with @VisibleForTesting, because it is accessed from test cases.
[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Zhiguo updated YARN-4024:
------------------------------
Attachment: YARN-4024-draft-v3.patch

YARN-4024-draft-v3.patch fixes the checkstyle warning and the test-case failure.
[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Zhiguo updated YARN-4024:
------------------------------
Attachment: YARN-4024-v4.patch

Thanks for your comments, [~leftnoteasy]. I didn't notice there are already such events. I updated the patch accordingly.
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699285#comment-14699285 ]

Hong Zhiguo commented on YARN-4024:
-----------------------------------
In this patch, both positive and negative lookup results are cached, with the same expiry interval.
[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4024: -- Attachment: YARN-4024-draft.patch Added a configuration option, yarn.resourcemanager.node-ip-cache.expiry-interval-secs; -1 disables caching. YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft.patch Currently, YARN RM NodesListManager will resolve the IP address every time a node does a heartbeat. When the DNS server becomes slow, NM heartbeats will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
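For reference, reading the new setting could look roughly like this; the key name comes from the comment above, while the helper class and the default of -1 (caching disabled) are assumptions:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public final class NodeIpCacheConfig {
  // -1 keeps the old behaviour: resolve on every heartbeat, no caching.
  public static long getExpiryIntervalSecs() {
    Configuration conf = new YarnConfiguration();
    return conf.getLong(
        "yarn.resourcemanager.node-ip-cache.expiry-interval-secs", -1L);
  }
}
{code}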
[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4024: -- Attachment: YARN-4024-draft-v2.patch Updated the patch to flush the cache entry when a node's state transitions between USABLE and UNUSABLE. YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft.patch Currently, YARN RM NodesListManager will resolve the IP address every time a node does a heartbeat. When the DNS server becomes slow, NM heartbeats will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
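A sketch of that invalidation, assuming a handler over NodesListManager events (NODE_USABLE and NODE_UNUSABLE exist in NodesListManagerEventType; the cachedResolver field and its invalidate method are illustrative):
{code}
// Flush the cached address whenever a node moves between the USABLE
// and UNUSABLE sets, since its IP may have changed in the meantime.
public void handle(NodesListManagerEvent event) {
  switch (event.getType()) {
    case NODE_USABLE:
    case NODE_UNUSABLE:
      cachedResolver.invalidate(event.getNode().getHostName());
      break;
    default:
      break;
  }
}
{code}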
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698927#comment-14698927 ] Hong Zhiguo commented on YARN-4024: --- That's a good reason to have this cache. [~leftnoteasy], in earlier comments, you said {code} 1) If a host_a, has IP=IP1, IP1 is on whitelist. If we change the IP of host_a to IP2, IP2 is in blacklist. We won't do the re-resolve since the cached IP1 is on whitelist. 2) If a host_a, has IP=IP1, IP1 is on blacklist. We may need to do re-resolve every time when the node doing heartbeat since it may change to its IP to a one not on the blacklist. {code} I think that's too complicated. The cache lookup is a part of resolving (name to address), and the check against the IP whitelist/blacklist is just the following stage. I think a cache with configurable expiration is enough; we'd better leave the 2 stages orthogonal and not mix them up. BTW, I think it's not good to have the Name in NodeId but the Address in the whitelist/blacklist. Different layers of abstraction are mixed up. We wouldn't have this issue if either Name or Address were used for both NodeId and the whitelist/blacklist. A better way is to have the Name in the whitelist/blacklist, instead of the Address. YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Currently, YARN RM NodesListManager will resolve the IP address every time a node does a heartbeat. When the DNS server becomes slow, NM heartbeats will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698929#comment-14698929 ] Hong Zhiguo commented on YARN-4024: --- Please ignore the last sentence: "A better way is to have the Name in the whitelist/blacklist, instead of the Address." Or could someone help delete it? YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Currently, YARN RM NodesListManager will resolve the IP address every time a node does a heartbeat. When the DNS server becomes slow, NM heartbeats will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695011#comment-14695011 ] Hong Zhiguo commented on YARN-4024: --- There's a DNS cache in InetAddress. What's the benefit of having another layer of cache in memory? Maybe it's easier to control? YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Hong Zhiguo Currently, YARN RM NodesListManager will resolve the IP address every time a node does a heartbeat. When the DNS server becomes slow, NM heartbeats will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
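For context, the JDK cache mentioned here is tuned globally via security properties rather than per component, which is one argument for an application-level cache with explicit invalidation; for example:
{code}
import java.security.Security;

public class DnsCacheTuning {
  public static void main(String[] args) {
    // JDK-level DNS cache knobs (values in seconds; -1 caches forever).
    // They apply to the whole JVM, with no way to invalidate a single
    // host when, say, a node transitions between USABLE and UNUSABLE.
    Security.setProperty("networkaddress.cache.ttl", "30");
    Security.setProperty("networkaddress.cache.negative.ttl", "10");
  }
}
{code}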
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14679444#comment-14679444 ] Hong Zhiguo commented on YARN-4024: --- We did this one year ago in our 5k+ cluster. Can I take this issue? YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Currently, YARN RM NodesListManager will resolve the IP address every time a node does a heartbeat. When the DNS server becomes slow, NM heartbeats will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4018) correct docker image name is rejected by DockerContainerExecutor
Hong Zhiguo created YARN-4018: - Summary: correct docker image name is rejected by DockerContainerExecutor Key: YARN-4018 URL: https://issues.apache.org/jira/browse/YARN-4018 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo For example: www.dockerbase.net/library/mongo www.dockerbase.net:5000/library/mongo:latest -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4018) correct docker image name is rejected by DockerContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4018: -- Description: For example: www.dockerbase.net/library/mongo www.dockerbase.net:5000/library/mongo:latest leads to error: Image: www.dockerbase.net/library/mongo is not a proper docker image Image: www.dockerbase.net:5000/library/mongo:latest is not a proper docker image was: For example: www.dockerbase.net/library/mongo www.dockerbase.net:5000/library/mongo:latest correct docker image name is rejected by DockerContainerExecutor Key: YARN-4018 URL: https://issues.apache.org/jira/browse/YARN-4018 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo For example: www.dockerbase.net/library/mongo www.dockerbase.net:5000/library/mongo:latest leads to error: Image: www.dockerbase.net/library/mongo is not a proper docker image Image: www.dockerbase.net:5000/library/mongo:latest is not a proper docker image -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4018) correct docker image name is rejected by DockerContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4018: -- Attachment: YARN-4018.patch correct docker image name is rejected by DockerContainerExecutor Key: YARN-4018 URL: https://issues.apache.org/jira/browse/YARN-4018 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Attachments: YARN-4018.patch For example: www.dockerbase.net/library/mongo www.dockerbase.net:5000/library/mongo:latest leads to error: Image: www.dockerbase.net/library/mongo is not a proper docker image Image: www.dockerbase.net:5000/library/mongo:latest is not a proper docker image -- This message was sent by Atlassian JIRA (v6.3.4#6332)
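The fix presumably relaxes the image-name check to accept an optional registry host (with optional port) and an optional tag; a sketch of such a pattern (this regex and class are illustrative, not the one in YARN-4018.patch):
{code}
// Accepts "mongo", "library/mongo", "www.dockerbase.net/library/mongo"
// and "www.dockerbase.net:5000/library/mongo:latest".
public class DockerImageName {
  private static final String IMAGE_PATTERN =
      "^(?:[a-zA-Z0-9.-]+(?::\\d+)?/)?(?:[a-z0-9._-]+/)*[a-z0-9._-]+(?::[a-zA-Z0-9._-]+)?$";

  public static boolean isValid(String image) {
    return image != null && image.matches(IMAGE_PATTERN);
  }
}
{code}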
[jira] [Created] (YARN-4016) docker container is still running when app is killed
Hong Zhiguo created YARN-4016: - Summary: docker container is still running when app is killed Key: YARN-4016 URL: https://issues.apache.org/jira/browse/YARN-4016 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo The docker_container_executor_session.sh is generated like below: {code} ### get the pid of docker container by docker inspect echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1438681002528_0001_01_02` > .../container_1438681002528_0001_01_02.pid.tmp ### rename *.pid.tmp to *.pid /bin/mv -f .../container_1438681002528_0001_01_02.pid.tmp .../container_1438681002528_0001_01_02.pid ### launch the docker container /usr/bin/docker run --rm --net=host --name container_1438681002528_0001_01_02 -v ... library/mysql /container_1438681002528_0001_01_02/launch_container.sh {code} This is obviously wrong because you can not get the pid of a docker container before starting it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3965) Add startup timestamp for nodemanager
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-3965: -- Attachment: YARN-3965-3.patch Add startup timestamp for nodemanager Key: YARN-3965 URL: https://issues.apache.org/jira/browse/YARN-3965 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-3965-2.patch, YARN-3965-3.patch, YARN-3965.patch We have a startup timestamp for RM already, but not for NM. Sometimes a cluster operator modifies the configuration of all nodes and kicks off a command to restart all NMs, and it's hard to check whether all NMs have restarted. Actually there are always some NMs that didn't restart as expected, which leads to errors later due to inconsistent configuration. If we have a startup timestamp for NM, the operator can easily fetch it via the NM webservice, find out which NMs didn't restart, and take manual action for them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3965) Add startup timestamp for nodemanager
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647709#comment-14647709 ] Hong Zhiguo commented on YARN-3965: --- Made it private with a getter. Hi, [~zxu], [~jlowe], could you please review the patch? Add startup timestamp for nodemanager Key: YARN-3965 URL: https://issues.apache.org/jira/browse/YARN-3965 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-3965-2.patch, YARN-3965-3.patch, YARN-3965.patch We have a startup timestamp for RM already, but not for NM. Sometimes a cluster operator modifies the configuration of all nodes and kicks off a command to restart all NMs, and it's hard to check whether all NMs have restarted. Actually there are always some NMs that didn't restart as expected, which leads to errors later due to inconsistent configuration. If we have a startup timestamp for NM, the operator can easily fetch it via the NM webservice, find out which NMs didn't restart, and take manual action for them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3965) Add startup timestamp to nodemanager UI
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-3965: -- Attachment: YARN-3965-4.patch Add startup timestamp to nodemanager UI --- Key: YARN-3965 URL: https://issues.apache.org/jira/browse/YARN-3965 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-3965-2.patch, YARN-3965-3.patch, YARN-3965-4.patch, YARN-3965.patch We have a startup timestamp for RM already, but not for NM. Sometimes a cluster operator modifies the configuration of all nodes and kicks off a command to restart all NMs, and it's hard to check whether all NMs have restarted. Actually there are always some NMs that didn't restart as expected, which leads to errors later due to inconsistent configuration. If we have a startup timestamp for NM, the operator can easily fetch it via the NM webservice, find out which NMs didn't restart, and take manual action for them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4001) normalizeHostName takes too much execution time
Hong Zhiguo created YARN-4001: - Summary: normalizeHostName takes too much execution time Key: YARN-4001 URL: https://issues.apache.org/jira/browse/YARN-4001 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor For each NodeHeartbeatRequest, NetUtils.normalizeHostName is called under a lock. I did profiling on a very large cluster and found that NetUtils.normalizeHostName takes most of the execution time of ResourceTrackerService.nodeHeartbeat(...). We'd better have an option to use the raw IP (plus port) as the Node identity to scale for large clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
Hong Zhiguo created YARN-4002: - Summary: make ResourceTrackerService.nodeHeartbeat more concurrent Key: YARN-4002 URL: https://issues.apache.org/jira/browse/YARN-4002 Project: Hadoop YARN Issue Type: Improvement Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Critical We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to scale for large clusters. But we have a BIG log in NodesListManager.isValidNode which I think it's unnecessary. First, the fields includes and excludes of HostsFileReader are only updated on refresh nodes. All RPC threads handling node heartbeats are only readers. So RWLock could be used to have alow concurrently access by RPC threads. Second, since he fields includes and excludes of HostsFileReader are always updated by reference assignment, which is atomic in Java, the reader side lock could just be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3965) Add startup timestamp to nodemanager UI
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648666#comment-14648666 ] Hong Zhiguo commented on YARN-3965: --- Hi, [~jlowe], version 4 of the patch is uploaded with 2 changes: 1) NodeInfo.getNmStartupTime -> NodeInfo.getNMStartupTime 2) removed the final qualifier on NodeManager.nmStartupTime to avoid the checkstyle error: {code} Name 'nmStartupTime' must match pattern '^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$' {code} It's private with a getter, so it's OK for it not to be final. Add startup timestamp to nodemanager UI --- Key: YARN-3965 URL: https://issues.apache.org/jira/browse/YARN-3965 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-3965-2.patch, YARN-3965-3.patch, YARN-3965-4.patch, YARN-3965.patch We have a startup timestamp for RM already, but not for NM. Sometimes a cluster operator modifies the configuration of all nodes and kicks off a command to restart all NMs, and it's hard to check whether all NMs have restarted. Actually there are always some NMs that didn't restart as expected, which leads to errors later due to inconsistent configuration. If we have a startup timestamp for NM, the operator can easily fetch it via the NM webservice, find out which NMs didn't restart, and take manual action for them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4002: -- Description: We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to scale for large clusters. But we have a BIG lock in NodesListManager.isValidNode which I think it's unnecessary. First, the fields includes and excludes of HostsFileReader are only updated on refresh nodes. All RPC threads handling node heartbeats are only readers. So RWLock could be used to have alow concurrently access by RPC threads. Second, since he fields includes and excludes of HostsFileReader are always updated by reference assignment, which is atomic in Java, the reader side lock could just be skipped. was: We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to scale for large clusters. But we have a BIG log in NodesListManager.isValidNode which I think it's unnecessary. First, the fields includes and excludes of HostsFileReader are only updated on refresh nodes. All RPC threads handling node heartbeats are only readers. So RWLock could be used to have alow concurrently access by RPC threads. Second, since he fields includes and excludes of HostsFileReader are always updated by reference assignment, which is atomic in Java, the reader side lock could just be skipped. make ResourceTrackerService.nodeHeartbeat more concurrent - Key: YARN-4002 URL: https://issues.apache.org/jira/browse/YARN-4002 Project: Hadoop YARN Issue Type: Improvement Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Critical We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to scale for large clusters. But we have a BIG lock in NodesListManager.isValidNode which I think it's unnecessary. First, the fields includes and excludes of HostsFileReader are only updated on refresh nodes. All RPC threads handling node heartbeats are only readers. So RWLock could be used to have alow concurrently access by RPC threads. Second, since he fields includes and excludes of HostsFileReader are always updated by reference assignment, which is atomic in Java, the reader side lock could just be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4002: -- Description: We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to scale for large clusters. But we have a BIG lock in NodesListManager.isValidNode which I think it's unnecessary. First, the fields includes and excludes of HostsFileReader are only updated on refresh nodes. All RPC threads handling node heartbeats are only readers. So RWLock could be used to alow concurrently access by RPC threads. Second, since he fields includes and excludes of HostsFileReader are always updated by reference assignment, which is atomic in Java, the reader side lock could just be skipped. was: We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to scale for large clusters. But we have a BIG lock in NodesListManager.isValidNode which I think it's unnecessary. First, the fields includes and excludes of HostsFileReader are only updated on refresh nodes. All RPC threads handling node heartbeats are only readers. So RWLock could be used to have alow concurrently access by RPC threads. Second, since he fields includes and excludes of HostsFileReader are always updated by reference assignment, which is atomic in Java, the reader side lock could just be skipped. make ResourceTrackerService.nodeHeartbeat more concurrent - Key: YARN-4002 URL: https://issues.apache.org/jira/browse/YARN-4002 Project: Hadoop YARN Issue Type: Improvement Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Critical We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to scale for large clusters. But we have a BIG lock in NodesListManager.isValidNode which I think it's unnecessary. First, the fields includes and excludes of HostsFileReader are only updated on refresh nodes. All RPC threads handling node heartbeats are only readers. So RWLock could be used to alow concurrently access by RPC threads. Second, since he fields includes and excludes of HostsFileReader are always updated by reference assignment, which is atomic in Java, the reader side lock could just be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4002: -- Description: We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to scale for large clusters. But we have a BIG lock in NodesListManager.isValidNode which I think is unnecessary. First, the fields includes and excludes of HostsFileReader are only updated on refresh nodes. All RPC threads handling node heartbeats are only readers. So an RWLock could be used to allow concurrent access by RPC threads. Second, since the fields includes and excludes of HostsFileReader are always updated by reference assignment, which is atomic in Java, the reader-side lock could just be skipped. was: We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to scale for large clusters. But we have a BIG lock in NodesListManager.isValidNode which I think it's unnecessary. First, the fields includes and excludes of HostsFileReader are only updated on refresh nodes. All RPC threads handling node heartbeats are only readers. So RWLock could be used to alow concurrently access by RPC threads. Second, since he fields includes and excludes of HostsFileReader are always updated by reference assignment, which is atomic in Java, the reader side lock could just be skipped. make ResourceTrackerService.nodeHeartbeat more concurrent - Key: YARN-4002 URL: https://issues.apache.org/jira/browse/YARN-4002 Project: Hadoop YARN Issue Type: Improvement Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Critical We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design the method ResourceTrackerService.nodeHeartbeat should be concurrent enough to scale for large clusters. But we have a BIG lock in NodesListManager.isValidNode which I think is unnecessary. First, the fields includes and excludes of HostsFileReader are only updated on refresh nodes. All RPC threads handling node heartbeats are only readers. So an RWLock could be used to allow concurrent access by RPC threads. Second, since the fields includes and excludes of HostsFileReader are always updated by reference assignment, which is atomic in Java, the reader-side lock could just be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
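A sketch of the lock-free reader pattern the description argues for (illustrative names, not the actual HostsFileReader code; volatile is what makes the reference swap visible to the RPC threads):
{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class HostsLists {
  // Readers see either the old set or the new set, never a partially
  // built one: the sets are fully constructed before being published
  // by an atomic, volatile reference assignment.
  private volatile Set<String> includes = Collections.emptySet();
  private volatile Set<String> excludes = Collections.emptySet();

  // Called only on "refresh nodes".
  public void refresh(Set<String> newIncludes, Set<String> newExcludes) {
    includes = Collections.unmodifiableSet(new HashSet<String>(newIncludes));
    excludes = Collections.unmodifiableSet(new HashSet<String>(newExcludes));
  }

  // Hot path, called for every NM heartbeat: no lock taken.
  public boolean isValidNode(String hostName) {
    Set<String> in = includes; // read each volatile reference once
    Set<String> ex = excludes;
    return (in.isEmpty() || in.contains(hostName)) && !ex.contains(hostName);
  }
}
{code}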
[jira] [Commented] (YARN-3965) Add startup timestamp for nodemanager
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641473#comment-14641473 ] Hong Zhiguo commented on YARN-3965: --- Hi, [~zxu], thanks for your comments. Here comes my reconsideration. 1. The nmStartupTime could be a non-static field of NodeManager, but that makes it harder to access, since the accessor must have a reference to the NodeManager instance. For example, there's no such reference in the current implementation of the NodeInfo constructor. One option is to make nmStartupTime a non-static field of NMContext, but I doubt it's worth making a simple thing complicated. BTW, the startup timestamp of ResourceManager is also static. 2. It's final, so we don't need to worry about that. A private field with a getter is also OK if you think it's better. Add startup timestamp for nodemanager Key: YARN-3965 URL: https://issues.apache.org/jira/browse/YARN-3965 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-3965-2.patch, YARN-3965.patch We have a startup timestamp for RM already, but not for NM. Sometimes a cluster operator modifies the configuration of all nodes and kicks off a command to restart all NMs, and it's hard to check whether all NMs have restarted. Actually there are always some NMs that didn't restart as expected, which leads to errors later due to inconsistent configuration. If we have a startup timestamp for NM, the operator can easily fetch it via the NM webservice, find out which NMs didn't restart, and take manual action for them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3965) Add startup timestamp for nodemanager
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640195#comment-14640195 ] Hong Zhiguo commented on YARN-3965: --- The polling doesn't need to happen frequently, only when the operator upgrades NMs or changes the NM configuration. Add startup timestamp for nodemanager Key: YARN-3965 URL: https://issues.apache.org/jira/browse/YARN-3965 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor We have a startup timestamp for RM already, but not for NM. Sometimes a cluster operator modifies the configuration of all nodes and kicks off a command to restart all NMs, and it's hard to check whether all NMs have restarted. Actually there are always some NMs that didn't restart as expected, which leads to errors later due to inconsistent configuration. If we have a startup timestamp for NM, the operator can easily fetch it via the NM webservice, find out which NMs didn't restart, and take manual action for them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3965) Add startup timestamp for nodemanager
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-3965: -- Attachment: YARN-3965.patch Add startup timestamp for nodemanager Key: YARN-3965 URL: https://issues.apache.org/jira/browse/YARN-3965 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-3965.patch We have a startup timestamp for RM already, but not for NM. Sometimes a cluster operator modifies the configuration of all nodes and kicks off a command to restart all NMs, and it's hard to check whether all NMs have restarted. Actually there are always some NMs that didn't restart as expected, which leads to errors later due to inconsistent configuration. If we have a startup timestamp for NM, the operator can easily fetch it via the NM webservice, find out which NMs didn't restart, and take manual action for them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3965) Add startup timestamp for nodemanager
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-3965: -- Attachment: YARN-3965-2.patch The first patch breaks TestNMWebServices.verifyNodeInfo; corrected in this one. Add startup timestamp for nodemanager Key: YARN-3965 URL: https://issues.apache.org/jira/browse/YARN-3965 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-3965-2.patch, YARN-3965.patch We have a startup timestamp for RM already, but not for NM. Sometimes a cluster operator modifies the configuration of all nodes and kicks off a command to restart all NMs, and it's hard to check whether all NMs have restarted. Actually there are always some NMs that didn't restart as expected, which leads to errors later due to inconsistent configuration. If we have a startup timestamp for NM, the operator can easily fetch it via the NM webservice, find out which NMs didn't restart, and take manual action for them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2545) RMApp should transition to FAILED when AM calls finishApplicationMaster with FAILED
[ https://issues.apache.org/jira/browse/YARN-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638610#comment-14638610 ] Hong Zhiguo commented on YARN-2545: --- RMAppEventType#ATTEMPT_FAILED is not suitable because it leads to a check of maxAppAttempt. Here the AM unregistered with getFinalApplicationStatus()==FAILED, so the RMApp should transition to FAILED without checking maxAppAttempt. In the current implementation of RMAppImpl, the targetedFinalState of FinalSavingTransition is statically determined by (preState, eventType). A simple solution is to replace the ATTEMPT_UNREGISTERED event with 2 types of events: ATTEMPT_UNREGISTERED_SUCC and ATTEMPT_UNREGISTERED_FAIL. Any suggestions? RMApp should transition to FAILED when AM calls finishApplicationMaster with FAILED Key: YARN-2545 URL: https://issues.apache.org/jira/browse/YARN-2545 Project: Hadoop YARN Issue Type: Bug Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor If the AM calls finishApplicationMaster with getFinalApplicationStatus()==FAILED and then exits, the corresponding RMApp and RMAppAttempt transition to state FINISHED. I think this is wrong and confusing. On the RM WebUI, this application is displayed as State=FINISHED, FinalStatus=FAILED, and is counted as Apps Completed, not as Apps Failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
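If one wanted to sketch this proposal in StateMachineFactory terms, it might look as below; the two event names are the ones proposed above, while the wiring, target states, and the stored transition are illustrative guesses, not actual RMAppImpl code:
{code}
// Illustrative only: with two distinct event types, FinalSavingTransition
// can derive the final state statically from (preState, eventType).
.addTransition(RMAppState.RUNNING, RMAppState.FINAL_SAVING,
    RMAppEventType.ATTEMPT_UNREGISTERED_SUCC,
    new FinalSavingTransition(new RMAppFinishedTransition(),
        RMAppState.FINISHING))
.addTransition(RMAppState.RUNNING, RMAppState.FINAL_SAVING,
    RMAppEventType.ATTEMPT_UNREGISTERED_FAIL,
    new FinalSavingTransition(new RMAppFinishedTransition(),
        RMAppState.FAILED))
{code}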
[jira] [Created] (YARN-3965) Add startup timestamp for nodemanager
Hong Zhiguo created YARN-3965: - Summary: Add startup timestamp for nodemanager Key: YARN-3965 URL: https://issues.apache.org/jira/browse/YARN-3965 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Hong Zhiguo Priority: Minor We have a startup timestamp for RM already, but not for NM. Sometimes a cluster operator modifies the configuration of all nodes and kicks off a command to restart all NMs, and it's hard to check whether all NMs have restarted. Actually there are always some NMs that didn't restart as expected, which leads to errors later due to inconsistent configuration. If we have a startup timestamp for NM, the operator can easily fetch it via the NM webservice, find out which NMs didn't restart, and take manual action for them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3965) Add startup timestamp for nodemanager
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo reassigned YARN-3965: - Assignee: Hong Zhiguo Add startup timestamp for nodemanager Key: YARN-3965 URL: https://issues.apache.org/jira/browse/YARN-3965 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor We have a startup timestamp for RM already, but not for NM. Sometimes a cluster operator modifies the configuration of all nodes and kicks off a command to restart all NMs, and it's hard to check whether all NMs have restarted. Actually there are always some NMs that didn't restart as expected, which leads to errors later due to inconsistent configuration. If we have a startup timestamp for NM, the operator can easily fetch it via the NM webservice, find out which NMs didn't restart, and take manual action for them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630694#comment-14630694 ] Hong Zhiguo commented on YARN-2306: --- Hi, [~rchiang], do you mean running the unit test in the patch against trunk? leak of reservation metrics (fair scheduler) Key: YARN-2306 URL: https://issues.apache.org/jira/browse/YARN-2306 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2306-2.patch, YARN-2306.patch This only applies to the fair scheduler; the capacity scheduler is OK. When an appAttempt or node is removed, the metrics for reservations (reservedContainers, reservedMB, reservedVCores) are not reduced back. These are important metrics for administrators, and wrong metrics may confuse them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1974) add args for DistributedShell to specify a set of nodes on which the tasks run
[ https://issues.apache.org/jira/browse/YARN-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo resolved YARN-1974. --- Resolution: Not A Problem add args for DistributedShell to specify a set of nodes on which the tasks run -- Key: YARN-1974 URL: https://issues.apache.org/jira/browse/YARN-1974 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Affects Versions: 2.7.0 Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-1974.patch It's very useful to execute a script on a specific set of machines for both testing and maintenance purposes. The args --nodes and --relax_locality are added to DistributedShell, together with a unit test using miniCluster. It has also been tested on our real cluster with the Fair scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630742#comment-14630742 ] Hong Zhiguo commented on YARN-2306: --- Updated the patch. I ran testReservationMetrics several times and there are no failures now. leak of reservation metrics (fair scheduler) Key: YARN-2306 URL: https://issues.apache.org/jira/browse/YARN-2306 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2306-2.patch, YARN-2306-3.patch, YARN-2306.patch This only applies to the fair scheduler; the capacity scheduler is OK. When an appAttempt or node is removed, the metrics for reservations (reservedContainers, reservedMB, reservedVCores) are not reduced back. These are important metrics for administrators, and wrong metrics may confuse them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630688#comment-14630688 ] Hong Zhiguo commented on YARN-2768: --- [~kasha], could you please review the patch? optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2768.patch, profiling_FairScheduler_update.png See the attached picture of profiling result. The clone of Resource object within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the function FairScheduler.update(). The code of FSAppAttempt.updateDemand: {code} public void updateDemand() { demand = Resources.createResource(0); // Demand is current consumption plus outstanding requests Resources.addTo(demand, app.getCurrentConsumption()); // Add up outstanding resource requests synchronized (app) { for (Priority p : app.getPriorities()) { for (ResourceRequest r : app.getResourceRequests(p).values()) { Resource total = Resources.multiply(r.getCapability(), r.getNumContainers()); Resources.addTo(demand, total); } } } } {code} The code of Resources.multiply: {code} public static Resource multiply(Resource lhs, double by) { return multiplyTo(clone(lhs), by); } {code} The clone could be skipped by directly updating the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
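The proposed change boils down to accumulating into demand in place instead of cloning per request; a sketch, assuming a mutating helper in the style of Resources.multiplyAndAddTo:
{code}
// Before: one temporary Resource per ResourceRequest, i.e. millions of
// allocations per update round on a large cluster:
//   Resource total = Resources.multiply(r.getCapability(), r.getNumContainers());
//   Resources.addTo(demand, total);
// After: multiply-and-accumulate directly into demand, no clone:
Resources.multiplyAndAddTo(demand, r.getCapability(), r.getNumContainers());
{code}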
[jira] [Updated] (YARN-3897) Too many links in NM log dir
[ https://issues.apache.org/jira/browse/YARN-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-3897: -- Description: Users need to keep container logs for more than one day. On some nodes of our busy cluster, the number of subdirs under {yarn.nodemanager.log-dirs} may reach 32000, which is the default limit of the ext3 file system. As a result, we got errors when initiating containers: Failed to create directory {yarn.nodemanager.log-dirs}/application_1435111082717_1341740 - Too many links. Log aggregation is not an option for us because of the heavy pressure on the namenode: with a cluster of 5K nodes and 20k log files per node, it's not acceptable to aggregate so many files to HDFS. Since ext3 is still widely used, we'd better do something to avoid such errors. was: Users need to left container logs more than one day. On some nodes of our busy cluster, the number of subdirs of {yarn.nodemanager.log-dirs} may reach 32000, which is the defaul limit of ext3 file system. As a result, we got errors when initiating containers: Failed to create directory {yarn.nodemanager.log-dirs}/logs/application_1435111082717_1341740 - Too many links log aggregation is not an option for us because of the heavy pressure on namenode. With a cluster of 5K nodes and 20k log files per node, it's not acceptable to aggregate so many files to hdfs. Since ext3 is still widely used, we'd better do something to avoid such error. Too many links in NM log dir -- Key: YARN-3897 URL: https://issues.apache.org/jira/browse/YARN-3897 Project: Hadoop YARN Issue Type: Improvement Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Users need to keep container logs for more than one day. On some nodes of our busy cluster, the number of subdirs under {yarn.nodemanager.log-dirs} may reach 32000, which is the default limit of the ext3 file system. As a result, we got errors when initiating containers: Failed to create directory {yarn.nodemanager.log-dirs}/application_1435111082717_1341740 - Too many links. Log aggregation is not an option for us because of the heavy pressure on the namenode: with a cluster of 5K nodes and 20k log files per node, it's not acceptable to aggregate so many files to HDFS. Since ext3 is still widely used, we'd better do something to avoid such errors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630741#comment-14630741 ] Hong Zhiguo commented on YARN-2306: --- I checked the code of tearDown and it shows someone already did this. leak of reservation metrics (fair scheduler) Key: YARN-2306 URL: https://issues.apache.org/jira/browse/YARN-2306 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2306-2.patch, YARN-2306-3.patch, YARN-2306.patch This only applies to the fair scheduler; the capacity scheduler is OK. When an appAttempt or node is removed, the metrics for reservations (reservedContainers, reservedMB, reservedVCores) are not reduced back. These are important metrics for administrators, and wrong metrics may confuse them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2306: -- Attachment: YARN-2306-3.patch leak of reservation metrics (fair scheduler) Key: YARN-2306 URL: https://issues.apache.org/jira/browse/YARN-2306 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2306-2.patch, YARN-2306-3.patch, YARN-2306.patch This only applies to the fair scheduler; the capacity scheduler is OK. When an appAttempt or node is removed, the metrics for reservations (reservedContainers, reservedMB, reservedVCores) are not reduced back. These are important metrics for administrators, and wrong metrics may confuse them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2306: -- Attachment: YARN-2306.patch-3 leak of reservation metrics (fair scheduler) Key: YARN-2306 URL: https://issues.apache.org/jira/browse/YARN-2306 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2306-2.patch, YARN-2306-3.patch, YARN-2306.patch This only applies to the fair scheduler; the capacity scheduler is OK. When an appAttempt or node is removed, the metrics for reservations (reservedContainers, reservedMB, reservedVCores) are not reduced back. These are important metrics for administrators, and wrong metrics may confuse them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2306: -- Attachment: (was: YARN-2306.patch-3) leak of reservation metrics (fair scheduler) Key: YARN-2306 URL: https://issues.apache.org/jira/browse/YARN-2306 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2306-2.patch, YARN-2306-3.patch, YARN-2306.patch This only applies to the fair scheduler; the capacity scheduler is OK. When an appAttempt or node is removed, the metrics for reservations (reservedContainers, reservedMB, reservedVCores) are not reduced back. These are important metrics for administrators, and wrong metrics may confuse them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3897) Too many links in NM log dir
[ https://issues.apache.org/jira/browse/YARN-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-3897: -- Description: Users need to left container logs more than one day. On some nodes of our busy cluster, the number of subdirs of {yarn.nodemanager.log-dirs} may reach 32000, which is the defaul limit of ext3 file system. As a result, we got errors when initiating containers: Failed to create directory {yarn.nodemanager.log-dirs}/logs/application_1435111082717_1341740 - Too many links log aggregation is not an option for us because of the heavy pressure on namenode. With a cluster of 5K nodes and 20k log files per node, it's not acceptable to aggregate so many files to hdfs. Since ext3 is still widely used, we'd better do something to avoid such error. was: Users need to left container logs more than one day. On some nodes of our busy cluster, the number of subdirs of {yarn.nodemanager.log-dirs} may reach 32000, which is the defaul limit of ext3 file system. As a result, we got errors when initiating containers: Failed to create directory {yarn.nodemanager.log-dirs}/logs/application_1435111082717_1341740 - Too many links log aggregation is not an option for us because of the heavy pressure on namenode. With a cluster of 5K nodes and 20k log files per node, it's not acceptable to aggregate some many files to hdfs. Since ext3 is still widely used, we'd better do something to avoid such error. Too many links in NM log dir -- Key: YARN-3897 URL: https://issues.apache.org/jira/browse/YARN-3897 Project: Hadoop YARN Issue Type: Improvement Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Users need to left container logs more than one day. On some nodes of our busy cluster, the number of subdirs of {yarn.nodemanager.log-dirs} may reach 32000, which is the defaul limit of ext3 file system. As a result, we got errors when initiating containers: Failed to create directory {yarn.nodemanager.log-dirs}/logs/application_1435111082717_1341740 - Too many links log aggregation is not an option for us because of the heavy pressure on namenode. With a cluster of 5K nodes and 20k log files per node, it's not acceptable to aggregate so many files to hdfs. Since ext3 is still widely used, we'd better do something to avoid such error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3897) Too many links in NM log dir
[ https://issues.apache.org/jira/browse/YARN-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618365#comment-14618365 ] Hong Zhiguo commented on YARN-3897: --- One solution is to have an extra layer of dirs as the parent of the appId subdirs. The middle layer of dirs could be named with a hash of the appId; a sketch follows below. This behaviour should be configurable. Too many links in NM log dir -- Key: YARN-3897 URL: https://issues.apache.org/jira/browse/YARN-3897 Project: Hadoop YARN Issue Type: Improvement Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Users need to keep container logs for more than one day. On some nodes of our busy cluster, the number of subdirs under {yarn.nodemanager.log-dirs} may reach 32000, which is the default limit of the ext3 file system. As a result, we got errors when initiating containers: Failed to create directory {yarn.nodemanager.log-dirs}/logs/application_1435111082717_1341740 - Too many links. Log aggregation is not an option for us because of the heavy pressure on the namenode: with a cluster of 5K nodes and 20k log files per node, it's not acceptable to aggregate so many files to HDFS. Since ext3 is still widely used, we'd better do something to avoid such errors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
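For illustration, the extra layer could be keyed like this; the bucket count, naming, and helper are assumptions, not the eventual patch:
{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.ApplicationId;

public final class AppLogDirs {
  // Spread per-application log dirs across N buckets so that no single
  // directory hits ext3's ~32000 sub-directory (link count) limit.
  private static final int LOG_DIR_BUCKETS = 256;

  public static Path getAppLogDir(Path logRoot, ApplicationId appId) {
    int bucket = (appId.hashCode() & Integer.MAX_VALUE) % LOG_DIR_BUCKETS;
    // e.g. {yarn.nodemanager.log-dirs}/137/application_1435111082717_1341740
    return new Path(new Path(logRoot, String.valueOf(bucket)),
        appId.toString());
  }
}
{code}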
[jira] [Created] (YARN-3897) Too many links in NM log dir
Hong Zhiguo created YARN-3897: - Summary: Too many links in NM log dir Key: YARN-3897 URL: https://issues.apache.org/jira/browse/YARN-3897 Project: Hadoop YARN Issue Type: Improvement Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Users need to left container logs more than one day. On some nodes of our busy cluster, the number of subdirs of {yarn.nodemanager.log-dirs} may reach 32000, which is the defaul limit of ext3 file system. As a result, we got errors when initiating containers: Failed to create directory {yarn.nodemanager.log-dirs}/logs/application_1435111082717_1341740 - Too many links log aggregation is not an option for us because of the heavy pressure on namenode. With a cluster of 5K nodes and 20k log files per node, it's not acceptable to aggregate some many files to hdfs. Since ext3 is still widely used, we'd better do something to avoid such error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581349#comment-14581349 ] Hong Zhiguo commented on YARN-2768: --- [~kasha], the execution time displayed in the profiling output is cumulative. Actually, I repeated such profiling a lot of times and got the same ratio. The profiling was done with a cluster of NM/AM simulators, and I don't have such resources now. I wrote a testcase which creates 8000 nodes and 4500 apps within 1200 queues, then performs repeated rounds of FairScheduler.update() and prints the average execution time of one call. With this patch, the average execution time decreased from about 35ms to 20ms. I think the effect comes from GC and memory allocation, since in each round of FairScheduler.update(), Resource.multiply is called as many times as the number of pending ResourceRequests, which is more than 3 million in our production cluster. optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2768.patch, profiling_FairScheduler_update.png See the attached picture of profiling result. The clone of Resource object within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the function FairScheduler.update(). The code of FSAppAttempt.updateDemand: {code} public void updateDemand() { demand = Resources.createResource(0); // Demand is current consumption plus outstanding requests Resources.addTo(demand, app.getCurrentConsumption()); // Add up outstanding resource requests synchronized (app) { for (Priority p : app.getPriorities()) { for (ResourceRequest r : app.getResourceRequests(p).values()) { Resource total = Resources.multiply(r.getCapability(), r.getNumContainers()); Resources.addTo(demand, total); } } } } {code} The code of Resources.multiply: {code} public static Resource multiply(Resource lhs, double by) { return multiplyTo(clone(lhs), by); } {code} The clone could be skipped by directly updating the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill a process other than the container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560663#comment-14560663 ] Hong Zhiguo commented on YARN-3678: --- First, stopping containers happens frequently. Second, the pid recycling doesn't need to complete a whole wrap-around within 250ms; it only needs to complete one or more wrap-arounds during the container's lifetime. If stop-container happens 100 times on one node per day, we have 100/32768, about a 0.3% chance, for one node in one day. That's not very low, especially when we have 5000 nodes. DelayedProcessKiller may kill a process other than the container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished; it will then do cleanup. The PID file still exists and will trigger one signalContainer, which will kill the process with the pid from the PID file. But as the container already finished, this PID may be occupied by another process, and this may cause a serious issue. As far as I know, my NM was killed unexpectedly, and what I described can be the cause, even if it rarely occurs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill a process other than the container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560748#comment-14560748 ] Hong Zhiguo commented on YARN-3678: --- The event sequence: call SEND SIGTERM -> pid recycle -> call SEND SIGKILL -> check process live time (based on current time). The time between [call SEND SIGTERM] and [call SEND SIGKILL] is 250ms. The time between [pid recycle] and [check process live time] may be shorter or longer than 250ms. When it's longer than 250ms, there's a chance we make a false-positive judgement. DelayedProcessKiller may kill a process other than the container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished; it will then do cleanup. The PID file still exists and will trigger one signalContainer, which will kill the process with the pid from the PID file. But as the container already finished, this PID may be occupied by another process, and this may cause a serious issue. As far as I know, my NM was killed unexpectedly, and what I described can be the cause, even if it rarely occurs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill a process other than the container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560578#comment-14560578 ] Hong Zhiguo commented on YARN-3678: --- We met the same issue on our production cluster last year. The same user is used for the NM and some app-submitters. I hacked the kernel __send_signal function via kprobe (https://github.com/honkiko/signal-monitor) and confirmed what happens: - container-executor sends SIGTERM to a container (say, pid = X) - the container exits quickly (within 250ms) - pid X is recycled and taken by a newly spawned thread of the NM - after 250ms, container-executor sends SIGKILL to pid X - the NM is killed. I added a check of living time before container-executor sends SIGKILL: if the process has a living time shorter than 250ms, it's not the target process we sent SIGTERM to, and we just skip it; a sketch follows below. With this fix, the accident rate was reduced from several times per day to nearly zero. If you think such a fix is acceptable, I'll post it here. DelayedProcessKiller may kill a process other than the container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished; it will then do cleanup. The PID file still exists and will trigger one signalContainer, which will kill the process with the pid from the PID file. But as the container already finished, this PID may be occupied by another process, and this may cause a serious issue. As far as I know, my NM was killed unexpectedly, and what I described can be the cause, even if it rarely occurs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
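A sketch of the living-time check described above, reading the process start time from /proc (field 22 of /proc/<pid>/stat); the real guard lives in the native container-executor, so treat this Java version purely as an illustration:
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public final class PidLivingTime {
  // Returns true if the process was already alive when SIGTERM was sent
  // (sigtermUptimeTicks = system uptime in clock ticks at SIGTERM time).
  // If it started later, the pid was recycled and must not get SIGKILL.
  public static boolean startedBefore(int pid, long sigtermUptimeTicks)
      throws IOException {
    String stat = new String(Files.readAllBytes(
        Paths.get("/proc/" + pid + "/stat")));
    // Skip past the parenthesised command name, which may contain spaces.
    String[] fields = stat.substring(stat.lastIndexOf(')') + 2).split(" ");
    long startTimeTicks = Long.parseLong(fields[19]); // field 22 overall
    return startTimeTicks < sigtermUptimeTicks;
  }
}
{code}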
[jira] [Commented] (YARN-3102) Decommissioned Nodes not listed in Web UI
[ https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482891#comment-14482891 ] Hong Zhiguo commented on YARN-3102: --- I met the same problem. Hi, [~Naganarasimha], can I take this issue? Decommissioned Nodes not listed in Web UI Key: YARN-3102 URL: https://issues.apache.org/jira/browse/YARN-3102 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Environment: 2 Node Manager and 1 Resource Manager Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Minor Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to the yarn.exclude file. On the RM1 machine, add yarn.exclude with the NM1 host name. Start the nodes as listed below: NM1, NM2, Resource manager. Now check the decommissioned nodes in /cluster/nodes. The number of decommissioned nodes is listed as 1, but the table is empty in /cluster/nodes/decommissioned (details of the decommissioned node are not shown). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2768) optimize FSAppAttempt.updateDemand by avoiding clone of Resource, which takes 85% of the computing time of the update thread
Hong Zhiguo created YARN-2768: - Summary: optimize FSAppAttempt.updateDemand by avoiding clone of Resource, which takes 85% of the computing time of the update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor See the attached picture of the profiling result. The clone of the Resource object within Resources.multiply() takes up **85%** (19.2 / 22.6) of the CPU time of the function FairScheduler.update(). The code of FSAppAttempt.updateDemand:
{code}
public void updateDemand() {
  demand = Resources.createResource(0);
  // Demand is current consumption plus outstanding requests
  Resources.addTo(demand, app.getCurrentConsumption());
  // Add up outstanding resource requests
  synchronized (app) {
    for (Priority p : app.getPriorities()) {
      for (ResourceRequest r : app.getResourceRequests(p).values()) {
        Resource total = Resources.multiply(r.getCapability(), r.getNumContainers());
        Resources.addTo(demand, total);
      }
    }
  }
}
{code}
The code of Resources.multiply:
{code}
public static Resource multiply(Resource lhs, double by) {
  return multiplyTo(clone(lhs), by);
}
{code}
The clone could be skipped by directly updating the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoiding clone of Resource, which takes 85% of the computing time of the update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2768: -- Attachment: profiling_FairScheduler_update.png optimize FSAppAttempt.updateDemand by avoiding clone of Resource, which takes 85% of the computing time of the update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: profiling_FairScheduler_update.png See the attached picture of the profiling result. The clone of the Resource object within Resources.multiply() takes up **85%** (19.2 / 22.6) of the CPU time of the function FairScheduler.update(). The code of FSAppAttempt.updateDemand:
{code}
public void updateDemand() {
  demand = Resources.createResource(0);
  // Demand is current consumption plus outstanding requests
  Resources.addTo(demand, app.getCurrentConsumption());
  // Add up outstanding resource requests
  synchronized (app) {
    for (Priority p : app.getPriorities()) {
      for (ResourceRequest r : app.getResourceRequests(p).values()) {
        Resource total = Resources.multiply(r.getCapability(), r.getNumContainers());
        Resources.addTo(demand, total);
      }
    }
  }
}
{code}
The code of Resources.multiply:
{code}
public static Resource multiply(Resource lhs, double by) {
  return multiplyTo(clone(lhs), by);
}
{code}
The clone could be skipped by directly updating the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoiding clone of Resource, which takes 85% of the computing time of the update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2768: -- Description: See the attached picture of the profiling result. The clone of the Resource object within Resources.multiply() takes up **85%** (19.2 / 22.6) of the CPU time of the function FairScheduler.update(). The code of FSAppAttempt.updateDemand:
{code}
public void updateDemand() {
  demand = Resources.createResource(0);
  // Demand is current consumption plus outstanding requests
  Resources.addTo(demand, app.getCurrentConsumption());
  // Add up outstanding resource requests
  synchronized (app) {
    for (Priority p : app.getPriorities()) {
      for (ResourceRequest r : app.getResourceRequests(p).values()) {
        Resource total = Resources.multiply(r.getCapability(), r.getNumContainers());
        Resources.addTo(demand, total);
      }
    }
  }
}
{code}
The code of Resources.multiply:
{code}
public static Resource multiply(Resource lhs, double by) {
  return multiplyTo(clone(lhs), by);
}
{code}
The clone could be skipped by directly updating the value of this.demand.

was: See the attached picture of the profiling result. The clone of the Resource object within Resources.multiply() takes up **85%** (19.2 / 22.6) of the CPU time of the function FairScheduler.update(). The code of FSAppAttempt.updateDemand:
{code}
public void updateDemand() {
  demand = Resources.createResource(0);
  // Demand is current consumption plus outstanding requests
  Resources.addTo(demand, app.getCurrentConsumption());
  // Add up outstanding resource requests
  synchronized (app) {
    for (Priority p : app.getPriorities()) {
      for (ResourceRequest r : app.getResourceRequests(p).values()) {
        Resource total = Resources.**multiply**(r.getCapability(), r.getNumContainers());
        Resources.addTo(demand, total);
      }
    }
  }
}
{code}
The code of Resources.multiply:
{code}
public static Resource multiply(Resource lhs, double by) {
  return multiplyTo(**clone**(lhs), by);
}
{code}
The clone could be skipped by directly updating the value of this.demand.

optimize FSAppAttempt.updateDemand by avoiding clone of Resource, which takes 85% of the computing time of the update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: profiling_FairScheduler_update.png See the attached picture of the profiling result. The clone of the Resource object within Resources.multiply() takes up **85%** (19.2 / 22.6) of the CPU time of the function FairScheduler.update(). The code of FSAppAttempt.updateDemand:
{code}
public void updateDemand() {
  demand = Resources.createResource(0);
  // Demand is current consumption plus outstanding requests
  Resources.addTo(demand, app.getCurrentConsumption());
  // Add up outstanding resource requests
  synchronized (app) {
    for (Priority p : app.getPriorities()) {
      for (ResourceRequest r : app.getResourceRequests(p).values()) {
        Resource total = Resources.multiply(r.getCapability(), r.getNumContainers());
        Resources.addTo(demand, total);
      }
    }
  }
}
{code}
The code of Resources.multiply:
{code}
public static Resource multiply(Resource lhs, double by) {
  return multiplyTo(clone(lhs), by);
}
{code}
The clone could be skipped by directly updating the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoiding clone of Resource, which takes 85% of the computing time of the update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2768: -- Attachment: YARN-2768.patch Avoid the clone by adding a three-argument method, Resources.multiplyAndAddTo. After this optimization, the average time taken by FairScheduler.update (in a test case with 10k apps) is reduced by 40%. I'm not sure whether it would be better to also submit such test cases. optimize FSAppAttempt.updateDemand by avoiding clone of Resource, which takes 85% of the computing time of the update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2768.patch, profiling_FairScheduler_update.png See the attached picture of the profiling result. The clone of the Resource object within Resources.multiply() takes up **85%** (19.2 / 22.6) of the CPU time of the function FairScheduler.update(). The code of FSAppAttempt.updateDemand:
{code}
public void updateDemand() {
  demand = Resources.createResource(0);
  // Demand is current consumption plus outstanding requests
  Resources.addTo(demand, app.getCurrentConsumption());
  // Add up outstanding resource requests
  synchronized (app) {
    for (Priority p : app.getPriorities()) {
      for (ResourceRequest r : app.getResourceRequests(p).values()) {
        Resource total = Resources.multiply(r.getCapability(), r.getNumContainers());
        Resources.addTo(demand, total);
      }
    }
  }
}
{code}
The code of Resources.multiply:
{code}
public static Resource multiply(Resource lhs, double by) {
  return multiplyTo(clone(lhs), by);
}
{code}
The clone could be skipped by directly updating the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2545) RMApp should transition to FAILED when AM calls finishApplicationMaster with FAILED
[ https://issues.apache.org/jira/browse/YARN-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14154946#comment-14154946 ] Hong Zhiguo commented on YARN-2545: --- How about the state of the appAttempt? Should it finally be FAILED instead of FINISHED? RMApp should transition to FAILED when AM calls finishApplicationMaster with FAILED Key: YARN-2545 URL: https://issues.apache.org/jira/browse/YARN-2545 Project: Hadoop YARN Issue Type: Bug Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor If the AM calls finishApplicationMaster with getFinalApplicationStatus()==FAILED and then exits, the corresponding RMApp and RMAppAttempt transition to state FINISHED. I think this is wrong and confusing. On the RM WebUI, this application is displayed as State=FINISHED, FinalStatus=FAILED, and is counted as Apps Completed, not as Apps Failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2545) RMApp should transition to FAILED when AM calls finishApplicationMaster with FAILED
[ https://issues.apache.org/jira/browse/YARN-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152804#comment-14152804 ] Hong Zhiguo commented on YARN-2545: --- [~leftnoteasy], [~jianhe], [~ozawa], please have a look: should we set the state of the app/appAttempt to FAILED instead of FINISHED, or just count it as Apps Failed instead of Apps Completed? RMApp should transition to FAILED when AM calls finishApplicationMaster with FAILED Key: YARN-2545 URL: https://issues.apache.org/jira/browse/YARN-2545 Project: Hadoop YARN Issue Type: Bug Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor If the AM calls finishApplicationMaster with getFinalApplicationStatus()==FAILED and then exits, the corresponding RMApp and RMAppAttempt transition to state FINISHED. I think this is wrong and confusing. On the RM WebUI, this application is displayed as State=FINISHED, FinalStatus=FAILED, and is counted as Apps Completed, not as Apps Failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2545) RMApp should transition to FAILED when AM calls finishApplicationMaster with FAILED
Hong Zhiguo created YARN-2545: - Summary: RMApp should transition to FAILED when AM calls finishApplicationMaster with FAILED Key: YARN-2545 URL: https://issues.apache.org/jira/browse/YARN-2545 Project: Hadoop YARN Issue Type: Bug Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor If the AM calls finishApplicationMaster with getFinalApplicationStatus()==FAILED and then exits, the corresponding RMApp and RMAppAttempt transition to state FINISHED. I think this is wrong and confusing. On the RM WebUI, this application is displayed as State=FINISHED, FinalStatus=FAILED, and is counted as Apps Completed, not as Apps Failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
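[Editor's note] To make the scenario concrete, this is roughly how an AM ends up in this state (a hedged sketch using the public AMRMClient API; registration and the rest of the AM lifecycle are elided):
{code}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class FailedAmExample {
  // The AM unregisters with FinalApplicationStatus.FAILED and then exits.
  // Per this issue, the RMApp/RMAppAttempt still end up in state FINISHED
  // and the app is counted under "Apps Completed" on the RM WebUI.
  static void finishAsFailed(AMRMClient<ContainerRequest> amClient)
      throws YarnException, IOException {
    amClient.unregisterApplicationMaster(
        FinalApplicationStatus.FAILED, "application failed", null);
  }
}
{code}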
[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2306: -- Attachment: YARN-2306-2.patch Updated the patch with only the new unit test, since it seems this bug is already fixed in trunk. The unit test succeeded. I suggest including this unit test to provide regression coverage for this bug. leak of reservation metrics (fair scheduler) Key: YARN-2306 URL: https://issues.apache.org/jira/browse/YARN-2306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2306-2.patch, YARN-2306.patch This only applies to the fair scheduler; the capacity scheduler is OK. When an appAttempt or node is removed, the reservation metrics (reservedContainers, reservedMB, reservedVCores) are not reduced back. These are important metrics for administrators, and wrong values may confuse them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
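[Editor's note] The shape of such a regression check might look like this (an illustrative sketch, not the attached test; it assumes a fair scheduler test harness in which a reservation was made on {{node}}):
{code}
import static org.junit.Assert.assertEquals;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeRemovedSchedulerEvent;

// After removing the node (or the app attempt) that held the reservation,
// all reservation metrics should drop back to zero.
scheduler.handle(new NodeRemovedSchedulerEvent(node));
QueueMetrics metrics = scheduler.getRootQueueMetrics();
assertEquals(0, metrics.getReservedContainers());
assertEquals(0, metrics.getReservedMB());
assertEquals(0, metrics.getReservedVirtualCores());
{code}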
[jira] [Commented] (YARN-1801) NPE in public localizer
[ https://issues.apache.org/jira/browse/YARN-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106555#comment-14106555 ] Hong Zhiguo commented on YARN-1801: --- I think YARN-1575 already fixed this NPE. We could mark this as a duplicate. NPE in public localizer --- Key: YARN-1801 URL: https://issues.apache.org/jira/browse/YARN-1801 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.2.0 Reporter: Jason Lowe Assignee: Hong Zhiguo Priority: Critical Attachments: YARN-1801.patch While investigating YARN-1800, I found this in the NM logs, which caused the public localizer to shut down:
{noformat}
2014-01-23 01:26:38,655 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:addResource(651)) - Downloading public rsrc:{ hdfs://colo-2:8020/user/fertrist/oozie-oozi/601-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar, 1390440382009, FILE, null }
2014-01-23 01:26:38,656 FATAL localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(726)) - Error: Shutting down
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.run(ResourceLocalizationService.java:712)
2014-01-23 01:26:38,656 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(728)) - Public cache exiting
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2371) Wrong NMToken is issued when NM preserving restart with containers running
Hong Zhiguo created YARN-2371: - Summary: Wrong NMToken is issued when NM preserving restart with containers running Key: YARN-2371 URL: https://issues.apache.org/jira/browse/YARN-2371 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo When an application is submitted with ApplicationSubmissionContext.getKeepContainersAcrossApplicationAttempts() == true, and the NM is restarted with containers running, a wrong NMToken is issued to the AM through RegisterApplicationMasterResponse. See the NM log:
{code}
2014-07-30 11:59:58,941 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Unauthorized request to start container.- NMToken for application attempt : appattempt_1406691610864_0002_01 was used for starting container with container token issued for application attempt : appattempt_1406691610864_0002_02
{code}
The reason is in the code below:
{code}
createAndGetNMToken(String applicationSubmitter, ApplicationAttemptId appAttemptId, Container container) {
  ..
  Token token = createNMToken(container.getId().getApplicationAttemptId(), container.getNodeId(), applicationSubmitter);
  ..
}
{code}
appAttemptId, instead of container.getId().getApplicationAttemptId(), should be passed to createNMToken. -- This message was sent by Atlassian JIRA (v6.2#6252)
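[Editor's note] Accordingly, the fix is a one-argument change in createAndGetNMToken, sketched below against the snippet above (the elided parts stay as they are):
{code}
createAndGetNMToken(String applicationSubmitter, ApplicationAttemptId appAttemptId, Container container) {
  ..
  // Pass the current attempt id rather than the attempt that originally
  // started the container; with keep-containers-across-attempts the two
  // can differ after an NM restart.
  Token token = createNMToken(appAttemptId, container.getNodeId(), applicationSubmitter);
  ..
}
{code}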
[jira] [Updated] (YARN-2371) Wrong NMToken is issued when NM preserving restart with containers running
[ https://issues.apache.org/jira/browse/YARN-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2371: -- Attachment: YARN-2371.patch Wrong NMToken is issued when NM preserving restart with containers running -- Key: YARN-2371 URL: https://issues.apache.org/jira/browse/YARN-2371 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Attachments: YARN-2371.patch When an application is submitted with ApplicationSubmissionContext.getKeepContainersAcrossApplicationAttempts() == true, and the NM is restarted with containers running, a wrong NMToken is issued to the AM through RegisterApplicationMasterResponse. See the NM log:
{code}
2014-07-30 11:59:58,941 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Unauthorized request to start container.- NMToken for application attempt : appattempt_1406691610864_0002_01 was used for starting container with container token issued for application attempt : appattempt_1406691610864_0002_02
{code}
The reason is in the code below:
{code}
createAndGetNMToken(String applicationSubmitter, ApplicationAttemptId appAttemptId, Container container) {
  ..
  Token token = createNMToken(container.getId().getApplicationAttemptId(), container.getNodeId(), applicationSubmitter);
  ..
}
{code}
appAttemptId, instead of container.getId().getApplicationAttemptId(), should be passed to createNMToken. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2371) Wrong NMToken is issued when NM preserving restarts with containers running
[ https://issues.apache.org/jira/browse/YARN-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2371: -- Summary: Wrong NMToken is issued when NM preserving restarts with containers running (was: Wrong NMToken is issued when NM preserving restart with containers running) Wrong NMToken is issued when NM preserving restarts with containers running --- Key: YARN-2371 URL: https://issues.apache.org/jira/browse/YARN-2371 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Attachments: YARN-2371.patch When an application is submitted with ApplicationSubmissionContext.getKeepContainersAcrossApplicationAttempts() == true, and the NM is restarted with containers running, a wrong NMToken is issued to the AM through RegisterApplicationMasterResponse. See the NM log:
{code}
2014-07-30 11:59:58,941 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Unauthorized request to start container.- NMToken for application attempt : appattempt_1406691610864_0002_01 was used for starting container with container token issued for application attempt : appattempt_1406691610864_0002_02
{code}
The reason is in the code below:
{code}
createAndGetNMToken(String applicationSubmitter, ApplicationAttemptId appAttemptId, Container container) {
  ..
  Token token = createNMToken(container.getId().getApplicationAttemptId(), container.getNodeId(), applicationSubmitter);
  ..
}
{code}
appAttemptId, instead of container.getId().getApplicationAttemptId(), should be passed to createNMToken. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2371) Wrong NMToken is issued when NM preserving restarts with containers running
[ https://issues.apache.org/jira/browse/YARN-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2371: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1489 Wrong NMToken is issued when NM preserving restarts with containers running --- Key: YARN-2371 URL: https://issues.apache.org/jira/browse/YARN-2371 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Attachments: YARN-2371.patch When an application is submitted with ApplicationSubmissionContext.getKeepContainersAcrossApplicationAttempts() == true, and the NM is restarted with containers running, a wrong NMToken is issued to the AM through RegisterApplicationMasterResponse. See the NM log:
{code}
2014-07-30 11:59:58,941 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Unauthorized request to start container.- NMToken for application attempt : appattempt_1406691610864_0002_01 was used for starting container with container token issued for application attempt : appattempt_1406691610864_0002_02
{code}
The reason is in the code below:
{code}
createAndGetNMToken(String applicationSubmitter, ApplicationAttemptId appAttemptId, Container container) {
  ..
  Token token = createNMToken(container.getId().getApplicationAttemptId(), container.getNodeId(), applicationSubmitter);
  ..
}
{code}
appAttemptId, instead of container.getId().getApplicationAttemptId(), should be passed to createNMToken. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2323) FairShareComparator creates too many Resource objects
[ https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2323: -- Attachment: YARN-2323-2.patch Patch revised according to [~sandyr]'s comments. FairShareComparator creates too many Resource objects Key: YARN-2323 URL: https://issues.apache.org/jira/browse/YARN-2323 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2323-2.patch, YARN-2323.patch Each call of {{FairShareComparator}} creates a new Resource object, {{one}}:
{code}
Resource one = Resources.createResource(1);
{code}
At the volume of 1000 nodes and 1000 apps, the comparator will be called more than 10 million times per second, thus creating more than 10 million {{one}} objects, which is unnecessary. Since the object {{one}} is read-only and is never referenced outside of the comparator, we could make it static. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2323) FairShareComparator creates too many Resource objects
Hong Zhiguo created YARN-2323: - Summary: FairShareComparator creates too many Resource objects Key: YARN-2323 URL: https://issues.apache.org/jira/browse/YARN-2323 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Each call of {{FairShareComparator}} creates a new Resource object, {{one}}:
{code}
Resource one = Resources.createResource(1);
{code}
At the volume of 1000 nodes and 1000 apps, the comparator will be called more than 10 million times per second, thus creating more than 10 million {{one}} objects, which is unnecessary. Since the object {{one}} is read-only and is never referenced outside of the comparator, we could make it static. -- This message was sent by Atlassian JIRA (v6.2#6252)
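[Editor's note] A minimal sketch of the proposed change (the comparison logic is elided; only the allocation moves out of compare()):
{code}
private static class FairShareComparator implements Comparator<Schedulable> {
  // Created once instead of on every compare() call; the object is
  // read-only and never escapes the comparator, so sharing is safe.
  private static final Resource ONE = Resources.createResource(1);

  @Override
  public int compare(Schedulable s1, Schedulable s2) {
    // ... unchanged comparison logic, using ONE wherever the local
    // "Resource one = Resources.createResource(1);" used to be ...
    return 0; // placeholder for the elided logic
  }
}
{code}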
[jira] [Updated] (YARN-2323) FairShareComparator creates too many Resource objects
[ https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2323: -- Attachment: YARN-2323.patch FairShareComparator creates too many Resource objects Key: YARN-2323 URL: https://issues.apache.org/jira/browse/YARN-2323 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2323.patch Each call of {{FairShareComparator}} creates a new Resource object, {{one}}:
{code}
Resource one = Resources.createResource(1);
{code}
At the volume of 1000 nodes and 1000 apps, the comparator will be called more than 10 million times per second, thus creating more than 10 million {{one}} objects, which is unnecessary. Since the object {{one}} is read-only and is never referenced outside of the comparator, we could make it static. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2299) inconsistency in identifying nodes
[ https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064711#comment-14064711 ] Hong Zhiguo commented on YARN-2299: --- Or make use of the existing config property yarn.scheduler.include-port-in-node-name when differentiating nodes. inconsistency in identifying nodes - Key: YARN-2299 URL: https://issues.apache.org/jira/browse/YARN-2299 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Critical If the port of yarn.nodemanager.address is not specified at the NM, the NM will choose a random port. If the NM dies ungracefully (OOM kill, kill -9, or OS restart) and is then restarted within yarn.nm.liveness-monitor.expiry-interval-ms, host:port1 and host:port2 will both be present in Active Nodes on the WebUI for a while, and after host:port1 expires, we get host:port1 in Lost Nodes and host:port2 in Active Nodes. If the NM dies ungracefully again, we get only host:port1 in Lost Nodes; host:port2 is neither in Active Nodes nor in Lost Nodes. Another case: two NMs are running on the same host (miniYarnCluster or other test purposes); if both of them are lost, we get only one node in Lost Nodes on the WebUI. In both cases, the sum of Active Nodes and Lost Nodes is not the number of nodes we expected. The root cause is an inconsistency in how we decide whether two nodes are identical. When we manage active nodes (RMContextImpl.nodes), we use NodeId, which contains the port: two nodes with the same host but different ports are considered different nodes. But when we manage inactive nodes (RMContextImpl.inactiveNodes), we only use the host: two nodes with the same host but different ports are considered identical. To fix the inconsistency, we should differentiate the two cases below and be consistent for both of them: - intentionally running multiple NMs per host - NM instances one after another on the same host Two possible solutions: 1) Introduce a boolean config like one-node-per-host (default true), and use the host to differentiate nodes on the RM if it's true. 2) Make it mandatory to have a valid port in the yarn.nodemanager.address config. In this situation, NM instances one after another on the same host will have the same NodeId, while intentionally running multiple NMs per host will have different NodeIds. Personally I prefer option 1 because it's easier for users. -- This message was sent by Atlassian JIRA (v6.2#6252)
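[Editor's note] For reference, the flag mentioned above is an ordinary YARN property (a sketch; whether to reuse it for node identity is exactly the open question in this comment):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// When true, the scheduler keys nodes by host:port rather than host alone;
// the suggestion is to consult the same flag when the RM differentiates
// nodes in its active (RMContextImpl.nodes) and inactive maps.
Configuration conf = new YarnConfiguration();
conf.setBoolean("yarn.scheduler.include-port-in-node-name", true);
{code}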
[jira] [Assigned] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo reassigned YARN-2305: - Assignee: Hong Zhiguo When a container is in reserved state then total cluster memory is displayed wrongly. - Key: YARN-2305 URL: https://issues.apache.org/jira/browse/YARN-2305 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: J.Andreina Assignee: Hong Zhiguo ENV Details: 3 queues: a (50%), b (25%), c (25%); all max utilization is set to 100. 2-node cluster with total memory of 16GB. Test Steps: Execute the following 3 jobs with different memory configurations for the map, reduce and AM tasks: ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62 Issue: when 2GB of memory is in the reserved state, total memory is shown as 15GB and used as 15GB (while total memory is 16GB). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064732#comment-14064732 ] Hong Zhiguo commented on YARN-2305: --- Are you using the fair scheduler? If yes, then I think it's the same root cause as YARN-2306. When a container is in reserved state then total cluster memory is displayed wrongly. - Key: YARN-2305 URL: https://issues.apache.org/jira/browse/YARN-2305 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: J.Andreina ENV Details: 3 queues: a (50%), b (25%), c (25%); all max utilization is set to 100. 2-node cluster with total memory of 16GB. Test Steps: Execute the following 3 jobs with different memory configurations for the map, reduce and AM tasks: ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62 Issue: when 2GB of memory is in the reserved state, total memory is shown as 15GB and used as 15GB (while total memory is 16GB). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2306: -- Summary: leak of reservation metrics (fair scheduler) (was: leak of reservation metrics) leak of reservation metrics (fair scheduler) Key: YARN-2306 URL: https://issues.apache.org/jira/browse/YARN-2306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor This only applies to the fair scheduler; the capacity scheduler is OK. When an appAttempt or node is removed, the reservation metrics (reservedContainers, reservedMB, reservedVCores) are not reduced back. These are important metrics for administrators, and wrong values may confuse them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064738#comment-14064738 ] Hong Zhiguo commented on YARN-2305: --- OK When a container is in reserved state then total cluster memory is displayed wrongly. - Key: YARN-2305 URL: https://issues.apache.org/jira/browse/YARN-2305 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: J.Andreina Assignee: Hong Zhiguo Attachments: Capture.jpg ENV Details: 3 queues: a (50%), b (25%), c (25%); all max utilization is set to 100. 2-node cluster with total memory of 16GB. Test Steps: Execute the following 3 jobs with different memory configurations for the map, reduce and AM tasks: ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 /dir8 /preempt_85 (application_1405414066690_0023) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 /dir2 /preempt_86 (application_1405414066690_0025) ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 /dir2 /preempt_62 Issue: when 2GB of memory is in the reserved state, total memory is shown as 15GB and used as 15GB (while total memory is 16GB). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2306: -- Attachment: YARN-2306.patch leak of reservation metrics (fair scheduler) Key: YARN-2306 URL: https://issues.apache.org/jira/browse/YARN-2306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2306.patch This only applies to the fair scheduler; the capacity scheduler is OK. When an appAttempt or node is removed, the reservation metrics (reservedContainers, reservedMB, reservedVCores) are not reduced back. These are important metrics for administrators, and wrong values may confuse them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2299) inconsistency in identifying nodes
Hong Zhiguo created YARN-2299: - Summary: inconsistency in identifying nodes Key: YARN-2299 URL: https://issues.apache.org/jira/browse/YARN-2299 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Critical If the port of yarn.nodemanager.address is not specified at the NM, the NM will choose a random port. If the NM dies ungracefully (OOM kill, kill -9, or OS restart) and is then restarted within yarn.nm.liveness-monitor.expiry-interval-ms, host:port1 and host:port2 will both be present in Active Nodes on the WebUI for a while, and after host:port1 expires, we get host:port1 in Lost Nodes and host:port2 in Active Nodes. If the NM dies ungracefully again, we get only host:port1 in Lost Nodes; host:port2 is neither in Active Nodes nor in Lost Nodes. Another case: two NMs are running on the same host (miniYarnCluster or other test purposes); if both of them are lost, we get only one node in Lost Nodes on the WebUI. In both cases, the sum of Active Nodes and Lost Nodes is not the number of nodes we expected. The root cause is an inconsistency in how we decide whether two nodes are identical. When we manage active nodes (RMContextImpl.nodes), we use NodeId, which contains the port: two nodes with the same host but different ports are considered different nodes. But when we manage inactive nodes (RMContextImpl.inactiveNodes), we only use the host: two nodes with the same host but different ports are considered identical. We should differentiate 2 cases: - intentionally running multiple NMs per host - NM instances one after another on the same host Two possible solutions: 1) Introduce a boolean config like one-node-per-host (default true), and use the host to differentiate nodes on the RM if it's true. 2) Make it mandatory to have a valid port in the yarn.nodemanager.address config. In this situation, NM instances one after another on the same host will have the same NodeId, while intentionally running multiple NMs per host will have different NodeIds. Personally I prefer option 1 because it's easier for users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2299) inconsistency in identifying nodes
[ https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2299: -- Description: If the port of yarn.nodemanager.address is not specified at the NM, the NM will choose a random port. If the NM dies ungracefully (OOM kill, kill -9, or OS restart) and is then restarted within yarn.nm.liveness-monitor.expiry-interval-ms, host:port1 and host:port2 will both be present in Active Nodes on the WebUI for a while, and after host:port1 expires, we get host:port1 in Lost Nodes and host:port2 in Active Nodes. If the NM dies ungracefully again, we get only host:port1 in Lost Nodes; host:port2 is neither in Active Nodes nor in Lost Nodes. Another case: two NMs are running on the same host (miniYarnCluster or other test purposes); if both of them are lost, we get only one node in Lost Nodes on the WebUI. In both cases, the sum of Active Nodes and Lost Nodes is not the number of nodes we expected. The root cause is an inconsistency in how we decide whether two nodes are identical. When we manage active nodes (RMContextImpl.nodes), we use NodeId, which contains the port: two nodes with the same host but different ports are considered different nodes. But when we manage inactive nodes (RMContextImpl.inactiveNodes), we only use the host: two nodes with the same host but different ports are considered identical. To fix the inconsistency, we should differentiate the two cases below and support both of them: - intentionally running multiple NMs per host - NM instances one after another on the same host Two possible solutions: 1) Introduce a boolean config like one-node-per-host (default true), and use the host to differentiate nodes on the RM if it's true. 2) Make it mandatory to have a valid port in the yarn.nodemanager.address config. In this situation, NM instances one after another on the same host will have the same NodeId, while intentionally running multiple NMs per host will have different NodeIds. Personally I prefer option 1 because it's easier for users. was: If the port of yarn.nodemanager.address is not specified at the NM, the NM will choose a random port. If the NM dies ungracefully (OOM kill, kill -9, or OS restart) and is then restarted within yarn.nm.liveness-monitor.expiry-interval-ms, host:port1 and host:port2 will both be present in Active Nodes on the WebUI for a while, and after host:port1 expires, we get host:port1 in Lost Nodes and host:port2 in Active Nodes. If the NM dies ungracefully again, we get only host:port1 in Lost Nodes; host:port2 is neither in Active Nodes nor in Lost Nodes. Another case: two NMs are running on the same host (miniYarnCluster or other test purposes); if both of them are lost, we get only one node in Lost Nodes on the WebUI. In both cases, the sum of Active Nodes and Lost Nodes is not the number of nodes we expected. The root cause is an inconsistency in how we decide whether two nodes are identical. When we manage active nodes (RMContextImpl.nodes), we use NodeId, which contains the port: two nodes with the same host but different ports are considered different nodes. But when we manage inactive nodes (RMContextImpl.inactiveNodes), we only use the host: two nodes with the same host but different ports are considered identical. We should differentiate 2 cases: - intentionally running multiple NMs per host - NM instances one after another on the same host Two possible solutions: 1) Introduce a boolean config like one-node-per-host (default true), and use the host to differentiate nodes on the RM if it's true. 2) Make it mandatory to have a valid port in the yarn.nodemanager.address config. In this situation, NM instances one after another on the same host will have the same NodeId, while intentionally running multiple NMs per host will have different NodeIds. Personally I prefer option 1 because it's easier for users. inconsistency in identifying nodes - Key: YARN-2299 URL: https://issues.apache.org/jira/browse/YARN-2299 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Critical If the port of yarn.nodemanager.address is not specified at the NM, the NM will choose a random port. If the NM dies ungracefully (OOM kill, kill -9, or OS restart) and is then restarted within yarn.nm.liveness-monitor.expiry-interval-ms, host:port1 and host:port2 will both be present in Active Nodes on the WebUI for a while, and after host:port1 expires, we get host:port1 in Lost Nodes and host:port2 in Active Nodes. If the NM dies ungracefully again, we get only host:port1 in Lost Nodes; host:port2 is neither in Active Nodes nor in Lost Nodes. Another case: two NMs are running on the same host (miniYarnCluster or other test purposes); if both of them are lost, we get only one node in Lost Nodes on the WebUI. In both cases, the sum of Active Nodes and Lost Nodes is not the number of nodes we expected. The root cause is due