[jira] [Commented] (YARN-8776) Container Executor change to create stdin/stdout pipeline
    [ https://issues.apache.org/jira/browse/YARN-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650894#comment-16650894 ]

Zian Chen commented on YARN-8776:
---------------------------------

Still working on refining the patch. Will update the initial one later today or tomorrow.

> Container Executor change to create stdin/stdout pipeline
> ---------------------------------------------------------
>
>                 Key: YARN-8776
>                 URL: https://issues.apache.org/jira/browse/YARN-8776
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Major
>              Labels: Docker
>
> The pipeline is built to connect the stdin/stdout channel from the
> WebSocket servlet through container-executor to the docker executor. When
> the WebSocket servlet is started, it needs to invoke the container-executor
> “dockerExec” method (which will be implemented) to create a new docker
> executor and use the “docker exec -it $ContainerId” command, which executes
> an interactive bash shell in the container.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
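The stdin/stdout relay described above can be sketched in a few lines. This is a minimal, self-contained illustration of the pipeline idea only: `cat` stands in for the eventual `container-executor`/`docker exec -it $ContainerId` child process, since bytes written to the child's stdin coming back on its stdout is exactly the relay the WebSocket servlet would perform.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class ShellPipelineSketch {

  // Write one line to the child's stdin and read one line back from its
  // stdout -- the round trip the servlet would perform per WebSocket frame.
  // "cat" is a stand-in for the interactive docker shell process.
  public static String roundTrip(String line) throws IOException {
    ProcessBuilder pb = new ProcessBuilder("cat");
    pb.redirectErrorStream(true);
    Process child = pb.start();
    try (BufferedWriter toChild = new BufferedWriter(
             new OutputStreamWriter(child.getOutputStream(), StandardCharsets.UTF_8));
         BufferedReader fromChild = new BufferedReader(
             new InputStreamReader(child.getInputStream(), StandardCharsets.UTF_8))) {
      toChild.write(line);
      toChild.newLine();
      toChild.flush();
      return fromChild.readLine();
    } finally {
      child.destroy();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(roundTrip("echo through the pipeline"));
  }
}
```

In the real feature the child process would be long-lived and the servlet would pump both directions concurrently; this sketch only shows a single synchronous round trip.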
[jira] [Assigned] (YARN-8778) Add Command Line interface to invoke interactive docker shell
    [ https://issues.apache.org/jira/browse/YARN-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zian Chen reassigned YARN-8778:
-------------------------------

    Assignee: Eric Yang  (was: Zian Chen)

> Add Command Line interface to invoke interactive docker shell
> -------------------------------------------------------------
>
>                 Key: YARN-8778
>                 URL: https://issues.apache.org/jira/browse/YARN-8778
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zian Chen
>            Assignee: Eric Yang
>            Priority: Major
>              Labels: Docker
>
> The CLI will be the mandatory interface we provide for users of the
> interactive docker shell feature. We will need to create a new class,
> “InteractiveDockerShellCLI”, to read the command line into the servlet and
> pass it all the way down to the docker executor.
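A rough shape for such a CLI entry point might look like the following. The class name comes from the issue description, but the argument handling and the relayed docker command are assumptions for illustration, not the actual patch.

```java
import java.util.Arrays;
import java.util.List;

public class InteractiveDockerShellCLI {

  // Translate a container id from the command line into the docker command
  // that the lower layers would ultimately run for the interactive shell.
  // The exact argv ("bash", "-it") is an assumption here.
  public static List<String> toDockerCommand(String containerId) {
    return Arrays.asList("docker", "exec", "-it", containerId, "bash");
  }

  public static void main(String[] args) {
    // Sample container id, purely for illustration.
    String containerId = args.length > 0
        ? args[0] : "container_1536476159258_0004_02_000001";
    System.out.println(String.join(" ", toDockerCommand(containerId)));
  }
}
```

In the real feature this command would not be executed directly by the CLI; it would be passed through the WebSocket servlet and container-executor as described in the sub-tasks above.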
[jira] [Commented] (YARN-8778) Add Command Line interface to invoke interactive docker shell
    [ https://issues.apache.org/jira/browse/YARN-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650892#comment-16650892 ]

Zian Chen commented on YARN-8778:
---------------------------------

Hi [~eyang], sure, please take this one. I'll assign it to you.

> Add Command Line interface to invoke interactive docker shell
> -------------------------------------------------------------
>
>                 Key: YARN-8778
>                 URL: https://issues.apache.org/jira/browse/YARN-8778
[jira] [Commented] (YARN-8777) Container Executor C binary change to execute interactive docker command
    [ https://issues.apache.org/jira/browse/YARN-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642220#comment-16642220 ]

Zian Chen commented on YARN-8777:
---------------------------------

+1 for patch 7.

> Container Executor C binary change to execute interactive docker command
> ------------------------------------------------------------------------
>
>                 Key: YARN-8777
>                 URL: https://issues.apache.org/jira/browse/YARN-8777
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zian Chen
>            Assignee: Eric Yang
>            Priority: Major
>              Labels: Docker
>         Attachments: YARN-8777.001.patch, YARN-8777.002.patch,
> YARN-8777.003.patch, YARN-8777.004.patch, YARN-8777.005.patch,
> YARN-8777.006.patch, YARN-8777.007.patch
>
> Since Container Executor provides container execution through the native
> container-executor binary, we also need to change it to accept a new
> “dockerExec” method that invokes the corresponding native function to run
> the docker exec command against the running container.
[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
    [ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642219#comment-16642219 ]

Zian Chen commented on YARN-8763:
---------------------------------

[~eyang], thanks for the +1. Could you help commit this patch? Thanks.

> Add WebSocket logic to the Node Manager web server to establish servlet
> -----------------------------------------------------------------------
>
>                 Key: YARN-8763
>                 URL: https://issues.apache.org/jira/browse/YARN-8763
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Major
>              Labels: Docker
>         Attachments: YARN-8763-001.patch, YARN-8763.002.patch,
> YARN-8763.003.patch, YARN-8763.004.patch, YARN-8763.005.patch
>
> The reason we want to use a WebSocket servlet to serve the backend, instead
> of establishing the connection over plain HTTP, is that WebSocket solves a
> few issues with HTTP that matter for our scenario:
> # In HTTP, the request is always initiated by the client and the response
> is processed by the server, making HTTP a unidirectional protocol.
> WebSocket is a bi-directional protocol: either the client or the server can
> send a message to the other party.
> # Full-duplex communication: the client and server can talk to each other
> independently at the same time.
> # Single TCP connection: after the initial HTTP connection upgrade, the
> client and server communicate over that same TCP connection throughout the
> lifecycle of the WebSocket connection.
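To make the single-upgraded-connection point concrete, a client would address the NodeManager servlet with a `ws://` URI and upgrade once. The path `/container/{id}/ws/shell` and port 8042 below are illustrative guesses, not the endpoint the patch actually registers.

```java
import java.net.URI;

public class NodeManagerWsEndpointSketch {

  // Build the WebSocket URI a client would upgrade to. Host, port, and path
  // layout are assumptions for illustration.
  public static URI shellEndpoint(String nmHost, int nmPort, String containerId) {
    return URI.create(
        "ws://" + nmHost + ":" + nmPort + "/container/" + containerId + "/ws/shell");
  }

  public static void main(String[] args) {
    URI uri = shellEndpoint("nm-host.example.com", 8042,
        "container_1536476159258_0004_02_000001");
    // On Java 11+ a client could then upgrade this one TCP connection and use
    // it full-duplex for the lifetime of the session:
    //   HttpClient.newHttpClient().newWebSocketBuilder()
    //       .buildAsync(uri, listener);
    System.out.println(uri);
  }
}
```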
[jira] [Updated] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
    [ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zian Chen updated YARN-8763:
----------------------------
    Attachment: YARN-8763.005.patch

> Add WebSocket logic to the Node Manager web server to establish servlet
> -----------------------------------------------------------------------
>
>                 Key: YARN-8763
>                 URL: https://issues.apache.org/jira/browse/YARN-8763
[jira] [Comment Edited] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
    [ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640482#comment-16640482 ]

Zian Chen edited comment on YARN-8763 at 10/6/18 12:01 AM:
-----------------------------------------------------------

[~eyang], sorry for getting back to you late. I'm curious why TestContainerManager was triggered too. Anyway, I tried patch 004 locally and the TestContainerManager UTs all passed, and I updated patch 005 with your suggestions. Let's see how patch 005 goes.

was (Author: zian chen):
[~eyang], sorry for getting back to you late. I'm curious why TestContainerManager was triggered too. Anyway, I tried patch 003 locally and the TestContainerManager UTs all passed, and I updated patch 004 with your suggestions. Let's see how patch 004 goes.

> Add WebSocket logic to the Node Manager web server to establish servlet
> -----------------------------------------------------------------------
>
>                 Key: YARN-8763
>                 URL: https://issues.apache.org/jira/browse/YARN-8763
[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
    [ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640482#comment-16640482 ]

Zian Chen commented on YARN-8763:
---------------------------------

[~eyang], sorry for getting back to you late. I'm curious why TestContainerManager was triggered too. Anyway, I tried patch 003 locally and the TestContainerManager UTs all passed, and I updated patch 004 with your suggestions. Let's see how patch 004 goes.

> Add WebSocket logic to the Node Manager web server to establish servlet
> -----------------------------------------------------------------------
>
>                 Key: YARN-8763
>                 URL: https://issues.apache.org/jira/browse/YARN-8763
[jira] [Commented] (YARN-8777) Container Executor C binary change to execute interactive docker command
    [ https://issues.apache.org/jira/browse/YARN-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638734#comment-16638734 ]

Zian Chen commented on YARN-8777:
---------------------------------

Hi [~eyang], thanks for patch 006. It seems we still have whitespace errors in the latest Jenkins build; could you help fix them? Overall the patch looks good to me.

> Container Executor C binary change to execute interactive docker command
> ------------------------------------------------------------------------
>
>                 Key: YARN-8777
>                 URL: https://issues.apache.org/jira/browse/YARN-8777
[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
    [ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638730#comment-16638730 ]

Zian Chen commented on YARN-8763:
---------------------------------

[~eyang], I just uploaded patch 004; please help review it. Thanks.

> Add WebSocket logic to the Node Manager web server to establish servlet
> -----------------------------------------------------------------------
>
>                 Key: YARN-8763
>                 URL: https://issues.apache.org/jira/browse/YARN-8763
[jira] [Updated] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
    [ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zian Chen updated YARN-8763:
----------------------------
    Attachment: YARN-8763.004.patch

> Add WebSocket logic to the Node Manager web server to establish servlet
> -----------------------------------------------------------------------
>
>                 Key: YARN-8763
>                 URL: https://issues.apache.org/jira/browse/YARN-8763
[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
    [ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638658#comment-16638658 ]

Zian Chen commented on YARN-8763:
---------------------------------

Thanks for the comments, Eric. I'll update the patch later today.

> Add WebSocket logic to the Node Manager web server to establish servlet
> -----------------------------------------------------------------------
>
>                 Key: YARN-8763
>                 URL: https://issues.apache.org/jira/browse/YARN-8763
[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
    [ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636543#comment-16636543 ]

Zian Chen commented on YARN-8763:
---------------------------------

Hi [~eyang], could you help review patch 003? It addresses the comments we discussed above. Thanks.

> Add WebSocket logic to the Node Manager web server to establish servlet
> -----------------------------------------------------------------------
>
>                 Key: YARN-8763
>                 URL: https://issues.apache.org/jira/browse/YARN-8763
[jira] [Updated] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
    [ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zian Chen updated YARN-8763:
----------------------------
    Attachment: YARN-8763.003.patch

> Add WebSocket logic to the Node Manager web server to establish servlet
> -----------------------------------------------------------------------
>
>                 Key: YARN-8763
>                 URL: https://issues.apache.org/jira/browse/YARN-8763
[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
    [ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16634486#comment-16634486 ]

Zian Chen commented on YARN-8763:
---------------------------------

Hi [~eyang], that makes sense. I'll work on patch 003 to address the comments and the Jenkins failures.

> Add WebSocket logic to the Node Manager web server to establish servlet
> -----------------------------------------------------------------------
>
>                 Key: YARN-8763
>                 URL: https://issues.apache.org/jira/browse/YARN-8763
[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
    [ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16632636#comment-16632636 ]

Zian Chen commented on YARN-8763:
---------------------------------

Updated patch 002 for review. I really appreciate the help from [~eyang] on this patch.

> Add WebSocket logic to the Node Manager web server to establish servlet
> -----------------------------------------------------------------------
>
>                 Key: YARN-8763
>                 URL: https://issues.apache.org/jira/browse/YARN-8763
[jira] [Updated] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
    [ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zian Chen updated YARN-8763:
----------------------------
    Attachment: YARN-8763.002.patch

> Add WebSocket logic to the Node Manager web server to establish servlet
> -----------------------------------------------------------------------
>
>                 Key: YARN-8763
>                 URL: https://issues.apache.org/jira/browse/YARN-8763
[jira] [Commented] (YARN-8758) PreemptionMessage when using AMRMClientAsync
    [ https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626601#comment-16626601 ]

Zian Chen commented on YARN-8758:
---------------------------------

Hi [~sunilg] [~weiweiyagn666], could you help review the patch? Thanks.

> PreemptionMessage when using AMRMClientAsync
> --------------------------------------------
>
>                 Key: YARN-8758
>                 URL: https://issues.apache.org/jira/browse/YARN-8758
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 3.1.1
>            Reporter: Krishna Kishore
>            Assignee: Zian Chen
>            Priority: Major
>         Attachments: YARN-8758.001.patch
>
> Hi,
> The preemption notification messages sent within the time period defined by
> the following parameter currently work only with AMRMClient, not with
> AMRMClientAsync:
> *yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill*
> We want this to work with AMRMClientAsync as well, because our
> implementations are based on it.
>
> Thanks,
> Kishore
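The request above boils down to delivering the PreemptionMessage through the async callback path. A simplified, self-contained analogue of that style is sketched below: responses arrive on a heartbeat thread and are dispatched to a handler. The `onPreemptionMessage` hook is hypothetical; it stands in for however the patch actually surfaces the message to AMRMClientAsync users.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class PreemptionCallbackSketch {

  // Hypothetical callback, analogous in spirit to the handler interface that
  // AMRMClientAsync users implement for other async events.
  public interface CallbackHandler {
    void onPreemptionMessage(String message);
  }

  private final CallbackHandler handler;

  public PreemptionCallbackSketch(CallbackHandler handler) {
    this.handler = handler;
  }

  // Simulates one asynchronous heartbeat whose allocate response carried a
  // preemption message: the handler runs on the heartbeat thread, not the
  // caller's thread.
  public void heartbeatOnce(String preemptionMessage) {
    new Thread(() -> {
      if (preemptionMessage != null) {
        handler.onPreemptionMessage(preemptionMessage);
      }
    }).start();
  }

  // Deliver a message and wait (up to 5s) for the callback to observe it.
  public static boolean deliverAndWait(String message) throws InterruptedException {
    CountDownLatch received = new CountDownLatch(1);
    PreemptionCallbackSketch client =
        new PreemptionCallbackSketch(msg -> received.countDown());
    client.heartbeatOnce(message);
    return received.await(5, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("callback fired: " + deliverAndWait("sample preemption notice"));
  }
}
```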
[jira] [Updated] (YARN-8758) PreemptionMessage when using AMRMClientAsync
    [ https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zian Chen updated YARN-8758:
----------------------------
    Attachment: YARN-8758.001.patch

> PreemptionMessage when using AMRMClientAsync
> --------------------------------------------
>
>                 Key: YARN-8758
>                 URL: https://issues.apache.org/jira/browse/YARN-8758
[jira] [Commented] (YARN-8758) PreemptionMessage when using AMRMClientAsync
    [ https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626513#comment-16626513 ]

Zian Chen commented on YARN-8758:
---------------------------------

I'll work on this Jira and provide an initial patch.

> PreemptionMessage when using AMRMClientAsync
> --------------------------------------------
>
>                 Key: YARN-8758
>                 URL: https://issues.apache.org/jira/browse/YARN-8758
[jira] [Assigned] (YARN-8758) PreemptionMessage when using AMRMClientAsync
    [ https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zian Chen reassigned YARN-8758:
-------------------------------

    Assignee: Zian Chen

> PreemptionMessage when using AMRMClientAsync
> --------------------------------------------
>
>                 Key: YARN-8758
>                 URL: https://issues.apache.org/jira/browse/YARN-8758
[jira] [Commented] (YARN-8785) Error Message "Invalid docker rw mount" not helpful
    [ https://issues.apache.org/jira/browse/YARN-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624007#comment-16624007 ]

Zian Chen commented on YARN-8785:
---------------------------------

Hi [~simonprewo], thanks for the patch. The patch itself looks good to me. One addition to [~eyang]'s comments: after renaming the patch to "YARN-8785.001.patch", please click Submit Patch at the top and attach the patch file; that will trigger a Jenkins build to verify whether anything is affected by this patch. Thanks for the effort.

> Error Message "Invalid docker rw mount" not helpful
> ---------------------------------------------------
>
>                 Key: YARN-8785
>                 URL: https://issues.apache.org/jira/browse/YARN-8785
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.9.1, 3.1.1
>            Reporter: Simon Prewo
>            Assignee: Simon Prewo
>            Priority: Major
>              Labels: Docker
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> A user receives the error message _Invalid docker rw mount_ when a container
> tries to mount a directory which is not configured in property
> *docker.allowed.rw-mounts*.
> {code:java}
> Invalid docker rw mount
> '/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01:/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01',
> realpath=/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01{code}
> The error message makes the user think "it is not possible due to a docker
> issue". My suggestion would be to use a message like *Configuration of the
> container executor does not allow mounting directory.*
>
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c
>
> CURRENT:
> {code:java}
> permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, mount_src);
> permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, mount_src);
> if (permitted_ro == -1 || permitted_rw == -1) {
>   fprintf(ERRORFILE, "Invalid docker mount '%s', realpath=%s\n", values[i], mount_src);
>   ...
> {code}
> NEW:
> {code:java}
> permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, mount_src);
> permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, mount_src);
> if (permitted_ro == -1 || permitted_rw == -1) {
>   fprintf(ERRORFILE, "Configuration of the container executor does not allow mounting directory '%s', realpath=%s\n", values[i], mount_src);
>   ...
> {code}
[jira] [Commented] (YARN-8777) Container Executor C binary change to execute interactive docker command
[ https://issues.apache.org/jira/browse/YARN-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623999#comment-16623999 ] Zian Chen commented on YARN-8777: - Thanks [~eyang] for the work. I'm ok with patch 003. One quick question: you mentioned {code:java} It is entirely possible to use ProcessBuilder and launch container-executor to run docker exec, and send unix command to be executed. {code} Is ProcessBuilder meant as a possible way to reuse code for passing arbitrary commands? If so, this approach might run into the same issue as the enum approach, which can only handle a small set of command options, not arbitrary commands. > Container Executor C binary change to execute interactive docker command > > > Key: YARN-8777 > URL: https://issues.apache.org/jira/browse/YARN-8777 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8777.001.patch, YARN-8777.002.patch, > YARN-8777.003.patch > > > Since Container Executor provides Container execution using the native > container-executor binary, we also need to make changes to accept new > “dockerExec” method to invoke the corresponding native function to execute > docker exec command to the running container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8805) Automatically convert the launch command to the exec form when using entrypoint support
[ https://issues.apache.org/jira/browse/YARN-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622555#comment-16622555 ] Zian Chen commented on YARN-8805: - Thanks [~shaneku...@gmail.com], I'll work on the patch > Automatically convert the launch command to the exec form when using > entrypoint support > --- > > Key: YARN-8805 > URL: https://issues.apache.org/jira/browse/YARN-8805 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Zian Chen >Priority: Major > Labels: Docker > > When {{YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE}} is true, and a > launch command is provided, it is expected that the launch command is > provided by the user in exec form. > For example: > {code:java} > "/usr/bin/sleep 6000"{code} > must be changed to: > {code}"/usr/bin/sleep,6000"{code} > If this is not done, the container will never start and will be in a Created > state. We should automatically do this conversion vs making the user > understand this nuance of using the entrypoint support. Docs should be > updated to reflect this change. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8805) Automatically convert the launch command to the exec form when using entrypoint support
[ https://issues.apache.org/jira/browse/YARN-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen reassigned YARN-8805: --- Assignee: Zian Chen > Automatically convert the launch command to the exec form when using > entrypoint support > --- > > Key: YARN-8805 > URL: https://issues.apache.org/jira/browse/YARN-8805 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Zian Chen >Priority: Major > Labels: Docker > > When {{YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE}} is true, and a > launch command is provided, it is expected that the launch command is > provided by the user in exec form. > For example: > {code:java} > "/usr/bin/sleep 6000"{code} > must be changed to: > {code}"/usr/bin/sleep,6000"{code} > If this is not done, the container will never start and will be in a Created > state. We should automatically do this conversion vs making the user > understand this nuance of using the entrypoint support. Docs should be > updated to reflect this change. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8805) Automatically convert the launch command to the exec form when using entrypoint support
[ https://issues.apache.org/jira/browse/YARN-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622499#comment-16622499 ] Zian Chen commented on YARN-8805: - Yes, I just checked the latest released doc, [https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/yarn-service/Examples.html,] and the format needs to be fixed. I also agree with [~shaneku...@gmail.com] that we should perform the conversion automatically when YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE is set to true. Would you like to provide a patch for this, [~shaneku...@gmail.com], or I can help. > Automatically convert the launch command to the exec form when using > entrypoint support > --- > > Key: YARN-8805 > URL: https://issues.apache.org/jira/browse/YARN-8805 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Priority: Major > Labels: Docker > > When {{YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE}} is true, and a > launch command is provided, it is expected that the launch command is > provided by the user in exec form. > For example: > {code:java} > "/usr/bin/sleep 6000"{code} > must be changed to: > {code}"/usr/bin/sleep,6000"{code} > If this is not done, the container will never start and will be in a Created > state. We should automatically do this conversion vs making the user > understand this nuance of using the entrypoint support. Docs should be > updated to reflect this change. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
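The shell-form to exec-form conversion described in YARN-8805 amounts to tokenizing the launch command and joining the tokens with commas. A minimal sketch of such a converter follows; the class and method names are illustrative, not the actual YARN implementation, and a real version would also need to handle quoted arguments that contain spaces:

```java
public class ExecFormConverter {

    // Convert a shell-form launch command such as "/usr/bin/sleep 6000"
    // into the comma-delimited exec form "/usr/bin/sleep,6000" expected
    // when YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE is true.
    static String toExecForm(String launchCommand) {
        // Trim, split on runs of whitespace, and rejoin with commas.
        return String.join(",", launchCommand.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(toExecForm("/usr/bin/sleep 6000")); // prints "/usr/bin/sleep,6000"
    }
}
```

Note that naive whitespace splitting would mangle a command like `bash -c "sleep 6000"`, which is why quoting is the tricky part of doing this conversion automatically on the user's behalf.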
[jira] [Commented] (YARN-8785) Error Message "Invalid docker rw mount" not helpful
[ https://issues.apache.org/jira/browse/YARN-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622495#comment-16622495 ] Zian Chen commented on YARN-8785: - Hi [~simonprewo], would you like to work on this Jira and provide a patch? Or I can help with it. > Error Message "Invalid docker rw mount" not helpful > --- > > Key: YARN-8785 > URL: https://issues.apache.org/jira/browse/YARN-8785 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.9.1, 3.1.1 >Reporter: Simon Prewo >Priority: Major > Labels: Docker > Original Estimate: 2h > Remaining Estimate: 2h > > A user receives the error message _Invalid docker rw mount_ when a container > tries to mount a directory which is not configured in property > *docker.allowed.rw-mounts*. > {code:java} > Invalid docker rw mount > '/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01:/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01', > > realpath=/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01{code} > The error message makes the user think "It is not possible due to a docker > issue". My suggestion would be to put there a message like *Configuration of > the container executor does not allow mounting directory.*. > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c > CURRENT: > {code:java} > permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, > mount_src); > permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, > mount_src); > if (permitted_ro == -1 || permitted_rw == -1) { > fprintf(ERRORFILE, "Invalid docker mount '%s', realpath=%s\n", > values[i], mount_src); > ... 
> {code} > NEW: > {code:java} > permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, > mount_src); > permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, > mount_src); > if (permitted_ro == -1 || permitted_rw == -1) { > fprintf(ERRORFILE, "Configuration of the container executor does not > allow mounting directory '%s', realpath=%s\n", values[i], mount_src); > ... > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8801) java doc comments in docker-util.h is confusing
[ https://issues.apache.org/jira/browse/YARN-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622438#comment-16622438 ] Zian Chen commented on YARN-8801: - Thank you [~eyang] > java doc comments in docker-util.h is confusing > --- > > Key: YARN-8801 > URL: https://issues.apache.org/jira/browse/YARN-8801 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Minor > Labels: Docker > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8801.001.patch > > > {code:java} > /** > + * Get the Docker exec command line string. The function will verify that > the params file is meant for the exec command. > + * @param command_file File containing the params for the Docker start > command > + * @param conf Configuration struct containing the container-executor.cfg > details > + * @param out Buffer to fill with the exec command > + * @param outlen Size of the output buffer > + * @return Return code with 0 indicating success and non-zero codes > indicating error > + */ > +int get_docker_exec_command(const char* command_file, const struct > configuration* conf, args *args);{code} > The method param list has out and outlen, which don't match the signature, > and the description for param args is missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8801) java doc comments in docker-util.h is confusing
[ https://issues.apache.org/jira/browse/YARN-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621350#comment-16621350 ] Zian Chen commented on YARN-8801: - Provided a patch for the fix. No need to add UTs here. > java doc comments in docker-util.h is confusing > --- > > Key: YARN-8801 > URL: https://issues.apache.org/jira/browse/YARN-8801 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Minor > Labels: Docker > > {code:java} > /** > + * Get the Docker exec command line string. The function will verify that > the params file is meant for the exec command. > + * @param command_file File containing the params for the Docker start > command > + * @param conf Configuration struct containing the container-executor.cfg > details > + * @param out Buffer to fill with the exec command > + * @param outlen Size of the output buffer > + * @return Return code with 0 indicating success and non-zero codes > indicating error > + */ > +int get_docker_exec_command(const char* command_file, const struct > configuration* conf, args *args);{code} > The method param list has out and outlen, which don't match the signature, > and the description for param args is missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8801) java doc comments in docker-util.h is confusing
[ https://issues.apache.org/jira/browse/YARN-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8801: Labels: Docker (was: ) > java doc comments in docker-util.h is confusing > --- > > Key: YARN-8801 > URL: https://issues.apache.org/jira/browse/YARN-8801 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Minor > Labels: Docker > > {code:java} > /** > + * Get the Docker exec command line string. The function will verify that > the params file is meant for the exec command. > + * @param command_file File containing the params for the Docker start > command > + * @param conf Configuration struct containing the container-executor.cfg > details > + * @param out Buffer to fill with the exec command > + * @param outlen Size of the output buffer > + * @return Return code with 0 indicating success and non-zero codes > indicating error > + */ > +int get_docker_exec_command(const char* command_file, const struct > configuration* conf, args *args);{code} > The method param list has out and outlen, which don't match the signature, > and the description for param args is missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8801) java doc comments in docker-util.h is confusing
Zian Chen created YARN-8801: --- Summary: java doc comments in docker-util.h is confusing Key: YARN-8801 URL: https://issues.apache.org/jira/browse/YARN-8801 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zian Chen Assignee: Zian Chen {code:java} /** + * Get the Docker exec command line string. The function will verify that the params file is meant for the exec command. + * @param command_file File containing the params for the Docker start command + * @param conf Configuration struct containing the container-executor.cfg details + * @param out Buffer to fill with the exec command + * @param outlen Size of the output buffer + * @return Return code with 0 indicating success and non-zero codes indicating error + */ +int get_docker_exec_command(const char* command_file, const struct configuration* conf, args *args);{code} The method param list has out and outlen, which don't match the signature, and the description for param args is missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8790) Authentication Filter change to force security check
Zian Chen created YARN-8790: --- Summary: Authentication Filter change to force security check Key: YARN-8790 URL: https://issues.apache.org/jira/browse/YARN-8790 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zian Chen Hadoop node manager REST API is authenticated using AuthenticationFilter from Hadoop-auth project. AuthenticationFilter is added to the new WebSocket URL path spec. The requested remote user is verified to match the container owner to allow WebSocket connection to be established. WebSocket servlet code enforces the username match check. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8777) Container Executor C binary change to execute interactive docker command
[ https://issues.apache.org/jira/browse/YARN-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619569#comment-16619569 ] Zian Chen commented on YARN-8777: - Hi [~eyang], thanks for the patch. Some quick suggestions and questions: 1. {code:java} /** + * Get the Docker exec command line string. The function will verify that the params file is meant for the exec command. + * @param command_file File containing the params for the Docker start command + * @param conf Configuration struct containing the container-executor.cfg details + * @param out Buffer to fill with the exec command + * @param outlen Size of the output buffer + * @return Return code with 0 indicating success and non-zero codes indicating error + */ +int get_docker_exec_command(const char* command_file, const struct configuration* conf, args *args);{code} The method param list has out and outlen, which don't match the signature, and the description for param args is missing; is this a typo? 2. For the code reuse you discussed with [~ebadger], my quick thought is that instead of passing parameters from the node manager, we could define an enum indexing several commonly used command options and have the node manager pass only an index matching one of the enum elements. This way we get some flexibility without opening up a bigger attack surface. 3. This patch seems focused on running the docker exec -it command to attach to a running container, but later on when the pipeline is built, should we also take care of passing shell commands into the container? 
> Container Executor C binary change to execute interactive docker command > > > Key: YARN-8777 > URL: https://issues.apache.org/jira/browse/YARN-8777 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8777.001.patch > > > Since Container Executor provides Container execution using the native > container-executor binary, we also need to make changes to accept new > “dockerExec” method to invoke the corresponding native function to execute > docker exec command to the running container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
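The enum idea floated in the comment above (indexing a small, fixed set of permitted commands rather than accepting arbitrary strings from the node manager) could look roughly like the sketch below. The command set, names, and paths are purely illustrative assumptions, not part of any patch:

```java
public class DockerShellCommand {

    // Whitelist of shell commands the node manager may request by index.
    // Anything outside this set is rejected before it ever reaches
    // container-executor, keeping the attack surface small.
    enum AllowedCommand {
        BASH("/bin/bash"),
        SH("/bin/sh");

        private final String path;

        AllowedCommand(String path) {
            this.path = path;
        }

        String path() {
            return path;
        }
    }

    // Resolve an index passed by the node manager into a permitted command,
    // rejecting out-of-range values instead of executing arbitrary input.
    static String resolve(int index) {
        AllowedCommand[] all = AllowedCommand.values();
        if (index < 0 || index >= all.length) {
            throw new IllegalArgumentException("command index not permitted: " + index);
        }
        return all[index].path();
    }
}
```

This makes concrete the trade-off raised in the comment: the flexibility of the interface is capped at whatever the enum enumerates, which is exactly why it cannot serve arbitrary commands.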
[jira] [Resolved] (YARN-8781) back-port YARN-8091 to branch-2.6.4
[ https://issues.apache.org/jira/browse/YARN-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen resolved YARN-8781. - Resolution: Invalid > back-port YARN-8091 to branch-2.6.4 > --- > > Key: YARN-8781 > URL: https://issues.apache.org/jira/browse/YARN-8781 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.4 >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Minor > Fix For: 2.6.4 > > > We suggest a patch that back-ports the change > https://issues.apache.org/jira/browse/YARN-8091 to branch 2.6.4 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8781) back-port YARN-8091 to branch-2.6.4
[ https://issues.apache.org/jira/browse/YARN-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16616961#comment-16616961 ] Zian Chen commented on YARN-8781: - Close as invalid. > back-port YARN-8091 to branch-2.6.4 > --- > > Key: YARN-8781 > URL: https://issues.apache.org/jira/browse/YARN-8781 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.4 >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Minor > Fix For: 2.6.4 > > > We suggest a patch that back-ports the change > https://issues.apache.org/jira/browse/YARN-8091 to branch 2.6.4 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8781) back-port YARN-8091 to branch-2.6.4
[ https://issues.apache.org/jira/browse/YARN-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16616416#comment-16616416 ] Zian Chen commented on YARN-8781: - Reworked the YARN-8091 patch to fix conflicts with trunk, since a lot of changes have been made since 2.6.4. > back-port YARN-8091 to branch-2.6.4 > --- > > Key: YARN-8781 > URL: https://issues.apache.org/jira/browse/YARN-8781 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.4 >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Minor > Fix For: 2.6.4 > > > We suggest a patch that back-ports the change > https://issues.apache.org/jira/browse/YARN-8091 to branch 2.6.4 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8781) back-port YARN-8091 to branch-2.6.4
Zian Chen created YARN-8781: --- Summary: back-port YARN-8091 to branch-2.6.4 Key: YARN-8781 URL: https://issues.apache.org/jira/browse/YARN-8781 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.4 Reporter: Zian Chen Assignee: Zian Chen Fix For: 2.6.4 We suggest a patch that back-ports the change https://issues.apache.org/jira/browse/YARN-8091 to branch 2.6.4 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8780) back-port YARN-8028 to branch-2.6.4
[ https://issues.apache.org/jira/browse/YARN-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8780: Attachment: YARN-8028-branch-2.6.4-001.patch > back-port YARN-8028 to branch-2.6.4 > --- > > Key: YARN-8780 > URL: https://issues.apache.org/jira/browse/YARN-8780 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Minor > Fix For: 2.6.4 > > Attachments: YARN-8028-branch-2.6.4-001.patch > > > We suggest a patch that back-ports the change > https://issues.apache.org/jira/browse/YARN-8028 to branch 2.6.4 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8780) back-port YARN-802 to branch-2.6.4
[ https://issues.apache.org/jira/browse/YARN-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16616409#comment-16616409 ] Zian Chen commented on YARN-8780: - Reworked the YARN-8028 patch to fix conflicts with trunk, since a lot of changes have been made since 2.6.4. > back-port YARN-802 to branch-2.6.4 > -- > > Key: YARN-8780 > URL: https://issues.apache.org/jira/browse/YARN-8780 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Minor > Fix For: 2.6.4 > > > We suggest a patch that back-ports the change > https://issues.apache.org/jira/browse/YARN-8028 to branch 2.6.4 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8780) back-port YARN-8028 to branch-2.6.4
[ https://issues.apache.org/jira/browse/YARN-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8780: Summary: back-port YARN-8028 to branch-2.6.4 (was: back-port YARN-802 to branch-2.6.4) > back-port YARN-8028 to branch-2.6.4 > --- > > Key: YARN-8780 > URL: https://issues.apache.org/jira/browse/YARN-8780 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Minor > Fix For: 2.6.4 > > > We suggest a patch that back-ports the change > https://issues.apache.org/jira/browse/YARN-8028 to branch 2.6.4 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8780) back-port YARN-802 to branch-2.6.4
Zian Chen created YARN-8780: --- Summary: back-port YARN-802 to branch-2.6.4 Key: YARN-8780 URL: https://issues.apache.org/jira/browse/YARN-8780 Project: Hadoop YARN Issue Type: Bug Reporter: Zian Chen Assignee: Zian Chen Fix For: 2.6.4 We suggest a patch that back-ports the change https://issues.apache.org/jira/browse/YARN-8028 to branch 2.6.4 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8778) Add Command Line interface to invoke interactive docker shell
Zian Chen created YARN-8778: --- Summary: Add Command Line interface to invoke interactive docker shell Key: YARN-8778 URL: https://issues.apache.org/jira/browse/YARN-8778 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zian Chen Assignee: Zian Chen CLI will be the mandatory interface we are providing for a user to use interactive docker shell feature. We will need to create a new class “InteractiveDockerShellCLI” to read command line into the servlet and pass all the way down to docker executor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
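As a rough illustration of the entry point YARN-8778 proposes, the sketch below parses a container id out of CLI arguments. The class name comes from the issue description, but the option syntax and helper are hypothetical assumptions, not the committed interface:

```java
public class InteractiveDockerShellCLI {

    // Hypothetical argument parsing: extract the container id that follows a
    // "-container" flag, i.e. the container the shell session is opened against.
    static String parseContainerId(String[] args) {
        for (int i = 0; i < args.length - 1; i++) {
            if ("-container".equals(args[i])) {
                return args[i + 1];
            }
        }
        // Option name shown in the usage string is illustrative only.
        throw new IllegalArgumentException("usage: -container <containerId>");
    }
}
```

The parsed id would then travel through the servlet down to the docker executor, as the description above outlines.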
[jira] [Created] (YARN-8777) Container Executor C binary change to execute interactive docker command
Zian Chen created YARN-8777: --- Summary: Container Executor C binary change to execute interactive docker command Key: YARN-8777 URL: https://issues.apache.org/jira/browse/YARN-8777 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zian Chen Since Container Executor provides Container execution using the native container-executor binary, we also need to make changes to accept new “dockerExec” method to invoke the corresponding native function to execute docker exec command to the running container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8776) Container Executor change to create stdin/stdout pipeline
Zian Chen created YARN-8776: --- Summary: Container Executor change to create stdin/stdout pipeline Key: YARN-8776 URL: https://issues.apache.org/jira/browse/YARN-8776 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zian Chen Assignee: Zian Chen The pipeline is built to connect the stdin/stdout channel from WebSocket servlet through container-executor to docker executor. So when the WebSocket servlet is started, we need to invoke container-executor “dockerExec” method (which will be implemented) to create a new docker executor and use “docker exec -it $ContainerId” command which executes an interactive bash shell on the container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
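At its core, the pipeline described in YARN-8776 forwards bytes in both directions between the WebSocket servlet and the stdin/stdout of the process running "docker exec -it". A minimal sketch of the forwarding loop follows, demonstrated with in-memory streams; in the real pipeline the endpoints would be the WebSocket session and the Process streams, so this is an illustration under assumptions, not the implementation:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StdioPipeline {

    // Copy bytes from source to sink until EOF, flushing after every read
    // so an interactive shell sees keystrokes and output promptly rather
    // than waiting for a buffer to fill.
    static void pump(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the WebSocket input and the docker exec stdin;
        // a real pipeline would run one pump per direction on its own thread.
        InputStream fromClient = new ByteArrayInputStream("ls -l\n".getBytes());
        ByteArrayOutputStream toShell = new ByteArrayOutputStream();
        pump(fromClient, toShell);
        System.out.print(toShell);
    }
}
```

One pump instance per direction (client-to-stdin and stdout-to-client), each on its own thread, is what makes the channel behave full-duplex end to end.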
[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
[ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610186#comment-16610186 ] Zian Chen commented on YARN-8763: - Hi [~eyang], thanks for the detailed suggestions. Makes sense. Let me address these comments as well as the Jenkins errors. > Add WebSocket logic to the Node Manager web server to establish servlet > --- > > Key: YARN-8763 > URL: https://issues.apache.org/jira/browse/YARN-8763 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: Docker > Attachments: YARN-8763-001.patch > > > The reason we want to use WebSocket servlet to serve the backend instead of > establishing the connection through HTTP is that WebSocket solves a few > issues with HTTP which needed for our scenario, > # In HTTP, the request is always initiated by the client and the response is > processed by the server — making HTTP a unidirectional protocol, while web > socket provides the Bi-directional protocol which means either client/server > can send a message to the other party. > # Full-duplex communication — client and server can talk to each other > independently at the same time > # Single TCP connection — After upgrading the HTTP connection in the > beginning, client and server communicate over that same TCP connection > throughout the lifecycle of WebSocket connection -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
[ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609919#comment-16609919 ] Zian Chen commented on YARN-8763: - Hi [~eyang], could you help review the patch? Thanks! > Add WebSocket logic to the Node Manager web server to establish servlet > --- > > Key: YARN-8763 > URL: https://issues.apache.org/jira/browse/YARN-8763 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: Docker > Attachments: YARN-8763-001.patch > > > The reason we want to use WebSocket servlet to serve the backend instead of > establishing the connection through HTTP is that WebSocket solves a few > issues with HTTP which needed for our scenario, > # In HTTP, the request is always initiated by the client and the response is > processed by the server — making HTTP a unidirectional protocol, while web > socket provides the Bi-directional protocol which means either client/server > can send a message to the other party. > # Full-duplex communication — client and server can talk to each other > independently at the same time > # Single TCP connection — After upgrading the HTTP connection in the > beginning, client and server communicate over that same TCP connection > throughout the lifecycle of WebSocket connection -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
[ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8763: Attachment: YARN-8763-001.patch > Add WebSocket logic to the Node Manager web server to establish servlet > --- > > Key: YARN-8763 > URL: https://issues.apache.org/jira/browse/YARN-8763 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: Docker > Attachments: YARN-8763-001.patch > > > The reason we want to use WebSocket servlet to serve the backend instead of > establishing the connection through HTTP is that WebSocket solves a few > issues with HTTP which needed for our scenario, > # In HTTP, the request is always initiated by the client and the response is > processed by the server — making HTTP a unidirectional protocol, while web > socket provides the Bi-directional protocol which means either client/server > can send a message to the other party. > # Full-duplex communication — client and server can talk to each other > independently at the same time > # Single TCP connection — After upgrading the HTTP connection in the > beginning, client and server communicate over that same TCP connection > throughout the lifecycle of WebSocket connection -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
[ https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609917#comment-16609917 ] Zian Chen commented on YARN-8763: - Provided an initial patch for this. > Add WebSocket logic to the Node Manager web server to establish servlet > --- > > Key: YARN-8763 > URL: https://issues.apache.org/jira/browse/YARN-8763 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: Docker
[jira] [Created] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet
Zian Chen created YARN-8763: --- Summary: Add WebSocket logic to the Node Manager web server to establish servlet Key: YARN-8763 URL: https://issues.apache.org/jira/browse/YARN-8763 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zian Chen Assignee: Zian Chen
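The three WebSocket properties listed above (bi-directional, full-duplex, single TCP connection) can be sketched from the client side with the JDK's built-in java.net.http WebSocket API. This is only an illustration, not the patch's actual servlet code; the endpoint URI and message contents are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;

// Accumulates text frames pushed by the server; with WebSocket the server can
// send at any time, unlike HTTP where only the client initiates requests.
public class ShellOutputListener implements WebSocket.Listener {
    private final StringBuilder output = new StringBuilder();

    @Override
    public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
        output.append(data);
        if (ws != null) {
            ws.request(1); // ask the implementation for the next frame
        }
        return null;
    }

    public String output() {
        return output.toString();
    }

    // Hypothetical wiring: one HTTP upgrade, then both sides share the same
    // TCP connection; sendText() writes while onText() concurrently receives.
    static void connect(String uri) {
        HttpClient.newHttpClient()
            .newWebSocketBuilder()
            .buildAsync(URI.create(uri), new ShellOutputListener())
            .thenAccept(ws -> ws.sendText("ls /", true));
    }
}
```

Here `connect("ws://nm-host:8042/container/shell")` stands in for whatever endpoint the YARN-8763 servlet ends up exposing.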
[jira] [Updated] (YARN-8762) [Umbrella] Support Interactive Docker Shell to running Containers
[ https://issues.apache.org/jira/browse/YARN-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8762: Attachment: Interactive Docker Shell design doc.pdf > [Umbrella] Support Interactive Docker Shell to running Containers > - > > Key: YARN-8762 > URL: https://issues.apache.org/jira/browse/YARN-8762 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Zian Chen >Priority: Major > Labels: Docker > Attachments: Interactive Docker Shell design doc.pdf > > > Debugging distributed applications on Hadoop can be challenging; Hadoop provides only limited debugging ability through application log files. One of the most frequently requested features is an interactive shell to assist real-time debugging. This feature is inspired by docker exec, which provides the ability to run arbitrary commands in a docker container.
[jira] [Commented] (YARN-8762) [Umbrella] Support Interactive Docker Shell to running Containers
[ https://issues.apache.org/jira/browse/YARN-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609916#comment-16609916 ] Zian Chen commented on YARN-8762: - Provided a design doc for this. > [Umbrella] Support Interactive Docker Shell to running Containers > - > > Key: YARN-8762 > URL: https://issues.apache.org/jira/browse/YARN-8762 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Zian Chen >Priority: Major > Labels: Docker
[jira] [Commented] (YARN-8523) Interactive docker shell
[ https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609914#comment-16609914 ] Zian Chen commented on YARN-8523: - Discussed offline with Eric and Wangda: this feature involves creating a pipeline among the NM, container-executor, and docker exec, which requires a lot of changes to the container stack. Created umbrella Jira YARN-8762 to track progress. > Interactive docker shell > > > Key: YARN-8523 > URL: https://issues.apache.org/jira/browse/YARN-8523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Zian Chen >Priority: Major > Labels: Docker > > Some applications might require interactive Unix command execution to carry out operations. Container-executor can interface with docker exec to debug or analyze docker containers while the application is running. It would be nice to support an API that invokes docker exec to run Unix commands and reports the output back to the application master, which can distribute and aggregate execution of the commands and record the results in its log file.
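As a rough sketch of what the pipeline ultimately runs, the argv that container-executor would hand to docker for an interactive shell can be assembled as below. The class and method names are hypothetical; only the `docker exec -it $ContainerId` shape with an interactive bash shell comes from the design notes on these JIRAs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the argv for attaching an interactive shell
// to a running docker container, as the interactive-shell pipeline would.
public class DockerExecCmd {
    public static List<String> interactiveShell(String containerId) {
        List<String> argv = new ArrayList<>();
        argv.add("docker");
        argv.add("exec");
        argv.add("-it");          // interactive, with a pseudo-TTY allocated
        argv.add(containerId);
        argv.add("/bin/bash");    // shell to run inside the container
        return argv;
    }
}
```

In the real feature this command line would be produced inside container-executor rather than in Java, but the argument shape is the same.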
[jira] [Created] (YARN-8762) [Umbrella] Support Interactive Docker Shell to running Containers
Zian Chen created YARN-8762: --- Summary: [Umbrella] Support Interactive Docker Shell to running Containers Key: YARN-8762 URL: https://issues.apache.org/jira/browse/YARN-8762 Project: Hadoop YARN Issue Type: New Feature Reporter: Zian Chen
[jira] [Commented] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592307#comment-16592307 ] Zian Chen commented on YARN-8509: - Hi [~eepayne], sorry for getting back late. I took some time to run SLS tests to evaluate whether this change introduces unnecessary preemption. However, there is no suitable dataset for testing preemption behavior, since almost all the datasets submit applications at the same time, which lets the scheduler consider all the resource requests in the allocation stage and leaves no chance for preemption to come into play. I'll work offline on generating a dataset for an SLS preemption test, which may take several weeks. I'll comment with updates once I have some progress. > Total pending resource calculation in preemption should use user-limit factor > instead of minimum-user-limit-percent > --- > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: capacityscheduler > Attachments: YARN-8509.001.patch, YARN-8509.002.patch, > YARN-8509.003.patch, YARN-8509.004.patch, YARN-8509.005.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total pending resource based on user-limit percent and user-limit factor, which caps the pending resource for each user to the minimum of the user-limit pending and the actual pending. This prevents a queue from taking more pending resource to achieve queue balance after every queue is satisfied with its ideal allocation. We need to change the logic to let queue pending go beyond the user limit.
[jira] [Commented] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588010#comment-16588010 ] Zian Chen commented on YARN-8509: - Fixed the failed UTs and re-uploaded the patch. > Total pending resource calculation in preemption should use user-limit factor > instead of minimum-user-limit-percent > --- > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: capacityscheduler > Attachments: YARN-8509.001.patch, YARN-8509.002.patch, > YARN-8509.003.patch, YARN-8509.004.patch, YARN-8509.005.patch
[jira] [Updated] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8509: Attachment: YARN-8509.005.patch > Total pending resource calculation in preemption should use user-limit factor > instead of minimum-user-limit-percent > --- > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: capacityscheduler > Attachments: YARN-8509.001.patch, YARN-8509.002.patch, > YARN-8509.003.patch, YARN-8509.004.patch, YARN-8509.005.patch
[jira] [Updated] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8509: Attachment: YARN-8509.004.patch > Total pending resource calculation in preemption should use user-limit factor > instead of minimum-user-limit-percent > --- > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: capacityscheduler > Attachments: YARN-8509.001.patch, YARN-8509.002.patch, > YARN-8509.003.patch, YARN-8509.004.patch
[jira] [Commented] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581573#comment-16581573 ] Zian Chen commented on YARN-8509: - Discussed offline with Eric and Wangda; will upload a new patch to verify that the algorithm we proposed here works as expected and does not cause any over-preemption. > Total pending resource calculation in preemption should use user-limit factor > instead of minimum-user-limit-percent > --- > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch, YARN-8509.002.patch, > YARN-8509.003.patch
[jira] [Commented] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578859#comment-16578859 ] Zian Chen commented on YARN-7417: - Hi [~eyang], thanks for the comments. I don't think we need to add extra UTs; we already have a number of UTs located in hadoop-mapreduce-client, hadoop-yarn-common, and elsewhere that test IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock. As long as all the UTs pass, it means we didn't break anything with this refactoring. > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch, YARN-7417.002.patch, > YARN-7417.003.patch > > > This Jira focuses on refactoring code for IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock: > # We have duplicate code in the current implementations of IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock which can be abstracted into common methods. > # The render method is too long in both of these classes; we want to make it clearer by abstracting some helper methods out.
[jira] [Updated] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-7417: Attachment: YARN-7417.003.patch > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch, YARN-7417.002.patch, > YARN-7417.003.patch
[jira] [Assigned] (YARN-8523) Interactive docker shell
[ https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen reassigned YARN-8523: --- Assignee: Zian Chen > Interactive docker shell > > > Key: YARN-8523 > URL: https://issues.apache.org/jira/browse/YARN-8523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Zian Chen >Priority: Major > Labels: Docker
[jira] [Commented] (YARN-8523) Interactive docker shell
[ https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16576945#comment-16576945 ] Zian Chen commented on YARN-8523: - Makes sense. I'll work on providing an initial patch for this idea. Thanks [~eyang]! > Interactive docker shell > > > Key: YARN-8523 > URL: https://issues.apache.org/jira/browse/YARN-8523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Priority: Major > Labels: Docker
[jira] [Commented] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16576917#comment-16576917 ] Zian Chen commented on YARN-7417: - It looks like we can make AggregatedLogFormat.ContainerLogsReader extend InputStream to achieve this. Let me update the patch. > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch, YARN-7417.002.patch
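The idea in the comment above, making the TFile reader usable where IndexedFileAggregatedLogsBlock expects an InputStream, can be sketched as a thin adapter. `LogReader` below is a hypothetical stand-in for AggregatedLogFormat.ContainerLogsReader's read interface, not the real Hadoop class; the point is only that an InputStream subclass can delegate to any byte-producing reader so both log blocks share one render path.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for the reader type used by TFileAggregatedLogsBlock.
interface LogReader {
    int read(byte[] buf, int off, int len) throws IOException;
}

// Adapter: presents any LogReader as a java.io.InputStream, so code written
// against InputStream (as in IndexedFileAggregatedLogsBlock) can consume it.
public class LogReaderInputStream extends InputStream {
    private final LogReader reader;

    public LogReaderInputStream(LogReader reader) {
        this.reader = reader;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = reader.read(one, 0, 1);
        return n <= 0 ? -1 : one[0] & 0xff;   // -1 signals end of stream
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        return reader.read(buf, off, len);    // delegate bulk reads directly
    }
}
```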
[jira] [Commented] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16576900#comment-16576900 ] Zian Chen commented on YARN-8509: - Hi [~eepayne], sure, let me address these two questions.

1) The summation is over users: for each user we take the minimum of two expressions, one being the user's pending resource per partition, the other being the user limit (which is queue_capacity * user_limit_factor) minus the user's used resource per partition.

2) I think there is some misunderstanding here. First of all, after the title change, this Jira does not intend to only support balancing of queues after they are satisfied; it intends to change the general strategy of how the user limit is calculated in the preemption scenario. So the queue capacities I mentioned in the example are an initial state, like this:

|| ||queue-a||queue-b||queue-c||queue-d||
|Guaranteed|30|30|30|10|
|Used|10|40|50|0|
|Pending|6|30|30|0|

This configuration should be able to occur if we set user_limit_percent to 50 and user_limit_factor to 1.0f, 3.0f, 3.0f, and 2.0f respectively. But with the current equation, this initial state won't happen:

{code:java}
user_limit = min(max(current_capacity / #active_users, current_capacity * user_limit_percent), queue_capacity * user_limit_factor)
{code}

In the above case, queue-b's queue_capacity * user_limit_factor is 90GB while max(current_capacity / #active_users, current_capacity * user_limit_percent) is 40GB, so user-limit-factor has no effect at all and the headroom for queue-b becomes zero. So the point is, we should let the user limit reach at most queue_capacity * user_limit_factor. > Total pending resource calculation in preemption should use user-limit factor > instead of minimum-user-limit-percent > --- > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch, YARN-8509.002.patch, > YARN-8509.003.patch
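The 90GB-vs-40GB claim for queue-b can be checked numerically. The sketch below encodes the old user-limit formula with the example's figures; taking current_capacity as queue-b's 40G of consumed capacity and a single active user is an assumption made here to reproduce the 40GB result quoted above.

```java
// Worked numbers from the example (single user), showing why the old
// user-limit formula zeroes queue-b's headroom despite user_limit_factor = 3.
public class OldUserLimit {
    static long userLimit(long currentCapacity, int activeUsers,
                          double userLimitPercent, long queueCapacity,
                          double userLimitFactor) {
        // lower bound: the larger of an even share and the percent floor
        long lower = (long) Math.max((double) currentCapacity / activeUsers,
                                     currentCapacity * userLimitPercent);
        // capped from above by queue_capacity * user_limit_factor
        return (long) Math.min(lower, queueCapacity * userLimitFactor);
    }

    public static void main(String[] args) {
        // queue-b: guaranteed 30G, factor 3.0 => cap would be 90G,
        // but the formula yields 40G, exactly equal to the 40G already used.
        long limit = userLimit(40, 1, 0.5, 30, 3.0);
        long headroom = limit - 40;              // used = 40G
        System.out.println(limit + " " + headroom);  // 40 0
    }
}
```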
[jira] [Comment Edited] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573997#comment-16573997 ] Zian Chen edited comment on YARN-8509 at 8/10/18 9:21 PM: -- Hi Eric, thanks for the comments. After discussing with Wangda, the patch uploaded before is not correct due to a misunderstanding of the original problem, and I have changed the Jira title. The intention of this Jira is to fix the calculation of pending resource considering the user limit in the preemption scenario. Currently, the pending resource calculation in preemption uses the same algorithm as scheduling:

{code:java}
user_limit = min(max(current_capacity / #active_users, current_capacity * user_limit_percent), queue_capacity * user_limit_factor)
{code}

This is good for scheduling, because we want to make sure users can get at least "minimum-user-limit-percent" of resource to use, which acts as a lower bound on the user limit. However, we should not cap the total pending resource a leaf queue can get by minimum-user-limit-percent; instead, we want to use user-limit-factor, which is the upper bound, to cap pending resource in preemption. If we use minimum-user-limit-percent to cap pending resource, resource under-utilization will happen in the preemption scenario. Thus, we suggest the pending resource calculation for preemption use this formula:

{code:java}
total_pending(partition, queue) = min(Q_max(partition) - Q_used(partition), Σ_users min(User.ulf(partition) - User.used(partition), User.pending(partition)))
{code}

Let me give an example:

{code:java}
        Root
    /   |   \   \
   a    b    c   d
  30   30   30  10

1) Only one node (n1) in the cluster; it has 100G.
2) app1, submitted to queue-a, asks for 10G used, 6G pending.
3) app2, submitted to queue-b, asks for 40G used, 30G pending.
4) app3, submitted to queue-c, asks for 50G used, 30G pending.
{code}

Here we only have one user, and the per-queue limits are:

||Queue name||minimum-user-limit-percent||user-limit-factor||
|a|50|1.0f|
|b|50|3.0f|
|c|50|3.0f|
|d|50|2.0f|

With the old calculation, the user limit for queue-a is 30G, which lets app1 keep its 6G pending, but the user limit for queue-b becomes 40G, which makes the headroom zero after subtracting the 40G used, so the 30G of pending resource being asked for cannot be accepted; the same happens with queue-c. However, if we look at this test case from the preemption point of view, we should allow queue-b and queue-c to take more pending resources. Even though queue-a has 30G guaranteed configured, it is under-utilized, and with pending resource capped by the old algorithm, queue-b and queue-c cannot take the available resource through preemption, which means cluster resource is not used effectively. To summarize, since user-limit-factor maintains the hard limit of how much resource can be used by a user, we should calculate pending resource considering user-limit-factor instead of minimum-user-limit-percent. Could you share your opinion on this, [~eepayne]?
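Applying the proposed total_pending formula to queue-b from the same example gives the intended result: the full 30G pending survives the cap. The sketch below is illustrative only; Q_max is assumed to be the 100G cluster, since the example does not state per-queue maximums.

```java
// Sketch of the proposed formula applied to one queue:
// total_pending = min(Q_max - Q_used, sum over users of
//                     min(user_ulf_cap - user_used, user_pending)).
public class NewTotalPending {
    static long totalPending(long qMax, long qUsed, long[][] users) {
        long sum = 0;
        for (long[] u : users) {                 // u = {ulfCap, used, pending}
            sum += Math.min(u[0] - u[1], u[2]);  // per-user capped pending
        }
        return Math.min(qMax - qUsed, sum);      // queue-level cap
    }

    public static void main(String[] args) {
        // queue-b: one user, ulf cap 90G (30G * 3.0), used 40G, pending 30G.
        // Old formula counted 0G of this pending; new formula counts all 30G.
        System.out.println(totalPending(100, 40, new long[][]{{90, 40, 30}}));  // 30
    }
}
```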
[jira] [Commented] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16576815#comment-16576815 ] Zian Chen commented on YARN-7417: - Thanks for the review [~eyang]. That was my original plan, to make it reusable, but after investigating the logic it turns out to be almost impossible. The main reason is that one formal parameter cannot be abstracted into a common class type: the "AggregatedLogFormat.ContainerLogsReader logReader" parameter in TFileAggregatedLogsBlock is a static class which cannot be converted into any parent class of the formal parameter "InputStream in" in IndexedFileAggregatedLogsBlock. > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch, YARN-7417.002.patch
[jira] [Updated] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-7417: Description: This Jira focuses on refactoring code for IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock: # We have duplicate code in the current implementations of IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock which can be abstracted into common methods. # The render method is too long in both of these classes; we want to make it clearer by abstracting some helper methods out. was: This Jira is focus on refactor code for IndexedFileAggregatedLogsBlock We have duplicate code in current implementation of IndexedFileAggregatedLogsBlock and IndexedFileAggregatedLogsBlock which can be abstract into common method. > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch, YARN-7417.002.patch
[jira] [Updated] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-7417: Description: This Jira is focus on refactor code for IndexedFileAggregatedLogsBlock We have duplicate code in current implementation of IndexedFileAggregatedLogsBlock and IndexedFileAggregatedLogsBlock which can be abstract into common method. was:We have duplicate code in current implementation of IndexedFileAggregatedLogsBlock and > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch, YARN-7417.002.patch > > > This Jira is focus on refactor code for IndexedFileAggregatedLogsBlock > We have duplicate code in current implementation of > IndexedFileAggregatedLogsBlock and IndexedFileAggregatedLogsBlock which can > be abstract into common method. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-7417: Description: We have duplicate code in current implementation of IndexedFileAggregatedLogsBlock and > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch, YARN-7417.002.patch > > > We have duplicate code in current implementation of > IndexedFileAggregatedLogsBlock and -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8523) Interactive docker shell
[ https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16576697#comment-16576697 ] Zian Chen commented on YARN-8523: - Good point. I think we can make this Jira focus on building this pipeline and create a second Jira for persisting docker exec state across NM restart. Two more questions here: # Should we give the user some kind of notification while the NM restarts and we are trying to resume the docker exec? What if several reconnect retries do not succeed? We may need to give the user a friendly reminder so the session does not appear stuck for too long, right? # How do we handle an unexpected NM shutdown (crash, etc.)? > Interactive docker shell > > > Key: YARN-8523 > URL: https://issues.apache.org/jira/browse/YARN-8523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Priority: Major > Labels: Docker > > Some applications might require interactive unix command execution to carry > out operations. Container-executor can interface with docker exec to debug > or analyze docker containers while the application is running. It would be > nice to support an API to invoke docker exec to perform unix commands and > report the output back to the application master. The application master can > distribute and aggregate execution of the commands and record them in its > log file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573997#comment-16573997 ] Zian Chen commented on YARN-8509: - Hi Eric, thanks for the comments. After discussing with Wangda, the patch uploaded before is not correct due to a misunderstanding of the original problem, and I have changed the Jira title. The intention of this Jira is to fix the calculation of pending resource, considering user-limit, in the preemption scenario. Currently, pending resource calculation in preemption uses the same algorithm as scheduling, which is:
{code:java}
user_limit = min(max(current_capacity / #active_users, current_capacity * user_limit_percent), queue_capacity * user_limit_factor)
{code}
This is good for scheduling because we want to make sure each user gets at least "minimum-user-limit-percent" of the resource, which acts as a lower bound on the user limit. However, we should not cap the total pending resource a leaf queue can get by minimum-user-limit-percent; instead, we want to use user-limit-factor, the upper bound, to capture pending resource in preemption. If we use minimum-user-limit-percent to capture pending resource, resource under-utilization will happen in the preemption scenario. Thus, we suggest the pending resource calculation for preemption use this formula:
{code:java}
total_pending(partition, queue) = min(Q_max(partition) - Q_used(partition), Σ min(User.ulf(partition) - User.used(partition), User.pending(partition)))
{code}
Let me give an example:
{code:java}
        Root
      / |  \  \
     a  b   c  d
    30 30  30 10

1) Only one node (n1) in the cluster; it has 100G.
2) app1 is submitted to queue-a: 10G used, 6G pending.
3) app2 is submitted to queue-b: 40G used, 30G pending.
4) app3 is submitted to queue-c: 50G used, 30G pending.
{code}
Here we only have one user, and the minimum-user-limit-percent / user-limit-factor settings for the queues are:
||Queue name||minimum-user-limit-percent||user-limit-factor||
|a|1|1.0f|
|b|1|2.0f|
|c|1|2.0f|
|d|1|2.0f|
With the old calculation, the user limit for queue-a is 30G, which lets app1 keep its 6G pending; but the user limit for queue-b becomes 40G, which leaves zero headroom after subtracting the 40G used, so the 30G of pending resource cannot be accepted, and the same happens for queue-c. However, if we look at this test case from the preemption point of view, we should allow queue-b and queue-c to take more pending resources: even though queue-a has 30G guaranteed configured, it is under-utilized, and with pending resource captured by the old algorithm, queue-b and queue-c cannot take the available resource through preemption, so cluster resources are not used effectively. To summarize, since user-limit-factor maintains the hard limit on how much resource a user can use, we should calculate pending resource considering user-limit-factor instead of minimum-user-limit-percent. Could you share your opinion on this, [~eepayne]? > Total pending resource calculation in preemption should use user-limit factor > instead of minimum-user-limit-percent > --- > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch, YARN-8509.002.patch, > YARN-8509.003.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total > pending resource based on user-limit percent and user-limit factor, which caps > pending resource for each user to the minimum of user-limit pending and actual > pending. This prevents a queue from taking more pending resource to achieve > queue balance after all queues are satisfied with their ideal allocation. > We need to change the logic to let queue pending go beyond the user limit.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
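The proposed formula can be checked with a minimal sketch using the numbers from the example above. The method and variable names below are illustrative stand-ins, not the actual LeafQueue API; it contrasts queue-b's pending resource under the new user-limit-factor cap with the old minimum-user-limit-percent-derived cap.

```java
public class PendingResourceDemo {
    // total_pending(queue) = min(Q_max - Q_used,
    //                            Σ over users of min(User.ulfCap - User.used, User.pending))
    static long totalPendingNew(long qMax, long qUsed, long[][] users) {
        long sum = 0;
        for (long[] u : users) {                 // u = {ulfCap, used, pending}
            sum += Math.max(0, Math.min(u[0] - u[1], u[2]));
        }
        return Math.min(qMax - qUsed, sum);
    }

    // Old behavior: pending capped by the scheduling-time user limit.
    static long totalPendingOld(long userLimit, long used, long pending) {
        return Math.max(0, Math.min(userLimit - used, pending));
    }

    public static void main(String[] args) {
        // queue-b from the example: capacity 30G, user-limit-factor 2.0
        // => ulfCap = 60G; one user with 40G used and 30G pending.
        long newPending = totalPendingNew(100, 40, new long[][]{{60, 40, 30}});
        // Old calculation: the user limit worked out to 40G => zero headroom.
        long oldPending = totalPendingOld(40, 40, 30);
        System.out.println("queue-b pending: new=" + newPending + "G old=" + oldPending + "G");
    }
}
```

Under the new formula queue-b can expose 20G of pending resource to preemption, whereas the old cap exposes none, matching the under-utilization argument above.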
[jira] [Commented] (YARN-8523) Interactive docker shell
[ https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573946#comment-16573946 ] Zian Chen commented on YARN-8523: - [~eyang], thanks for raising this feature. It is very useful for live debugging and container diagnosis. We can add a series of interactive commands to let users debug more effectively, like tailing the container log, checking container resource usage, etc. For handling the nodemanager restart scenario, we can register an event listener for the restart or shutdown signal of the node manager web socket and respond in the xterm.js terminal accordingly (e.g. print an NM restart/shutdown message to the user) and retry the reconnect several times after the typical NM restart interval. If the NM hits an unexpected issue and cannot resume its service, that is something the interactive docker shell cannot solve by itself, and we should just give the user a reasonable alert message describing the situation (e.g. "retry failed with timeout, please check the NM log for more information"). I think passing commands through the NM web socket and reusing the container-executor security check would be a good prototype to build first, without taking on the burden of handling a root daemon by carving out another secure channel. > Interactive docker shell > > > Key: YARN-8523 > URL: https://issues.apache.org/jira/browse/YARN-8523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Priority: Major > Labels: Docker > > Some applications might require interactive unix command execution to carry > out operations. Container-executor can interface with docker exec to debug > or analyze docker containers while the application is running. It would be > nice to support an API to invoke docker exec to perform unix commands and > report the output back to the application master. The application master can > distribute and aggregate execution of the commands and record them in its > log file.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
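The reconnect-with-alert idea could be outlined roughly as follows. This is a hypothetical sketch only: the connectOnce hook, retry count, and back-off interval are illustrative assumptions, not the actual NM web socket API.

```java
import java.util.function.IntPredicate;

public class ReconnectSketch {
    /**
     * Try to re-attach to the NM web socket after a restart signal and return
     * a user-facing status line for the xterm.js terminal.
     */
    static String reconnect(IntPredicate connectOnce, int maxRetries, long backoffMillis) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            if (connectOnce.test(attempt)) {
                return "Reconnected to NodeManager (attempt " + attempt + ")";
            }
            try {
                Thread.sleep(backoffMillis); // wait roughly one NM restart interval
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        // Retries exhausted: surface a friendly alert instead of hanging the session.
        return "Retry failed with timeout, please check the NM log for more information";
    }

    public static void main(String[] args) {
        // Simulated NM that comes back on the third attempt.
        System.out.println(reconnect(attempt -> attempt >= 3, 5, 10L));
        // Simulated NM that never comes back.
        System.out.println(reconnect(attempt -> false, 3, 10L));
    }
}
```

Bounding the retries and printing a terminal message covers both questions raised earlier: the user is told a resume is in progress, and is told clearly when it gives up.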
[jira] [Comment Edited] (YARN-8523) Interactive docker shell
[ https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573946#comment-16573946 ] Zian Chen edited comment on YARN-8523 at 8/8/18 10:07 PM: -- [~eyang], thanks for raising this feature. It is very useful for live debugging and container diagnosis. We can add a series of interactive commands to let users debug more effectively, like tailing the container log, checking container resource usage, etc. For handling the nodemanager restart scenario, we can register an event listener for the restart or shutdown signal of the node manager web socket and respond in the xterm.js terminal accordingly (e.g. print an NM restart/shutdown message to the user) and retry the reconnect several times after the typical NM restart interval. If the NM hits an unexpected issue and cannot resume its service, that is something the interactive docker shell cannot solve by itself, and we should just give the user a reasonable alert message describing the situation (e.g. "retry failed with timeout, please check the NM log for more information"). I think passing commands through the NM web socket and reusing the container-executor security check would be a good prototype to build first, without taking on the burden of handling a root daemon by carving out another secure channel. was (Author: zian chen): [~eyang], thanks for raising this feature. This is very useful for live debug of container diagnosis. we can add a series of interactive commands to let user debug more effectively, like tail -f container log, container resource usage, etc. For handling nodemanager restart scenario, we can register a event listener to listen restart or shutdown signal of node manager web socket and respond in xterm js terminal accordingly, (like print out NM restart/shutdown message to user, etc) and do reconnect retries several times after typical nm restart interval.
Again, if NM meet any unexpected issue which can not resume its service, that's something we can not solve on this interactive docker shell by itself and we should just give user reasonable alert message to inform the current situation (like retry failed with timeout, please check NM log to get more information, etc). I think pass command through NM web socket and reuse container-executor security check would be a good prototype we can build first without have too much burden on handling root daemon by carving another secure channel. > Interactive docker shell > > > Key: YARN-8523 > URL: https://issues.apache.org/jira/browse/YARN-8523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Priority: Major > Labels: Docker > > Some applications might require interactive unix command execution to carry > out operations. Container-executor can interface with docker exec to debug > or analyze docker containers while the application is running. It would be > nice to support an API to invoke docker exec to perform unix commands and > report the output back to the application master. The application master can > distribute and aggregate execution of the commands and record them in its > log file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8509: Summary: Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent (was: Fix UserLimit calculation for preemption to balance scenario after queue satisfied ) > Total pending resource calculation in preemption should use user-limit factor > instead of minimum-user-limit-percent > --- > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch, YARN-8509.002.patch, > YARN-8509.003.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total > pending resource based on user-limit percent and user-limit factor which will > cap pending resource for each user to the minimum of user-limit pending and > actual pending. This will prevent queue from taking more pending resource to > achieve queue balance after all queue satisfied with its ideal allocation. > > We need to change the logic to let queue pending can go beyond userlimit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570836#comment-16570836 ] Zian Chen commented on YARN-7417: - Updated patch 003 to address the findbugs issue. > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch, YARN-7417.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570838#comment-16570838 ] Zian Chen commented on YARN-7417: - [~sunilg], could you help review the patch? Thanks > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch, YARN-7417.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-7417: Attachment: YARN-7417.002.patch > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch, YARN-7417.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7089) Mark the log-aggregation-controller APIs as public
[ https://issues.apache.org/jira/browse/YARN-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570780#comment-16570780 ] Zian Chen commented on YARN-7089: - Hi [~leftnoteasy], could you help review this patch? Thanks > Mark the log-aggregation-controller APIs as public > -- > > Key: YARN-7089 > URL: https://issues.apache.org/jira/browse/YARN-7089 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7089.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-7417: Attachment: YARN-7417.001.patch > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7417.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568795#comment-16568795 ] Zian Chen commented on YARN-7417: - Thanks [~xgong] for reporting this issue. I'll work on it and provide a patch shortly. > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7417) re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to remove duplicate codes
[ https://issues.apache.org/jira/browse/YARN-7417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen reassigned YARN-7417: --- Assignee: Zian Chen > re-factory IndexedFileAggregatedLogsBlock and TFileAggregatedLogsBlock to > remove duplicate codes > > > Key: YARN-7417 > URL: https://issues.apache.org/jira/browse/YARN-7417 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566490#comment-16566490 ] Zian Chen commented on YARN-8509: - Thanks [~csingh] for the review. The failed UTs are not related. [~sunilg], can you help commit the patch if everything looks good? Thanks > Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch, YARN-8509.002.patch, > YARN-8509.003.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total > pending resource based on user-limit percent and user-limit factor, which caps > pending resource for each user to the minimum of user-limit pending and actual > pending. This prevents a queue from taking more pending resource to achieve > queue balance after all queues are satisfied with their ideal allocation. > We need to change the logic to let queue pending go beyond the user limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7089) Mark the log-aggregation-controller APIs as public
[ https://issues.apache.org/jira/browse/YARN-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen reassigned YARN-7089: --- Assignee: Zian Chen (was: Xuan Gong) > Mark the log-aggregation-controller APIs as public > -- > > Key: YARN-7089 > URL: https://issues.apache.org/jira/browse/YARN-7089 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7089) Mark the log-aggregation-controller APIs as public
[ https://issues.apache.org/jira/browse/YARN-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565895#comment-16565895 ] Zian Chen commented on YARN-7089: - [~djp] [~xgong], [~rkanter] could you help review the patch? Thanks > Mark the log-aggregation-controller APIs as public > -- > > Key: YARN-7089 > URL: https://issues.apache.org/jira/browse/YARN-7089 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7089.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7089) Mark the log-aggregation-controller APIs as public
[ https://issues.apache.org/jira/browse/YARN-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-7089: Attachment: YARN-7089.001.patch > Mark the log-aggregation-controller APIs as public > -- > > Key: YARN-7089 > URL: https://issues.apache.org/jira/browse/YARN-7089 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Zian Chen >Priority: Major > Attachments: YARN-7089.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8509: Attachment: YARN-8509.003.patch > Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch, YARN-8509.002.patch, > YARN-8509.003.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total > pending resource based on user-limit percent and user-limit factor which will > cap pending resource for each user to the minimum of user-limit pending and > actual pending. This will prevent queue from taking more pending resource to > achieve queue balance after all queue satisfied with its ideal allocation. > > We need to change the logic to let queue pending can go beyond userlimit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562513#comment-16562513 ] Zian Chen commented on YARN-8509: - Thanks [~csingh] for reviewing the patch. I fixed the javadoc block, removed the debug level, and changed the comments for item 3 to be more straightforward. Does it look better now? > Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch, YARN-8509.002.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total > pending resource based on user-limit percent and user-limit factor, which caps > pending resource for each user to the minimum of user-limit pending and actual > pending. This prevents a queue from taking more pending resource to achieve > queue balance after all queues are satisfied with their ideal allocation. > We need to change the logic to let queue pending go beyond the user limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8509: Attachment: (was: YARN-8509.003.patch) > Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch, YARN-8509.002.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total > pending resource based on user-limit percent and user-limit factor which will > cap pending resource for each user to the minimum of user-limit pending and > actual pending. This will prevent queue from taking more pending resource to > achieve queue balance after all queue satisfied with its ideal allocation. > > We need to change the logic to let queue pending can go beyond userlimit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8509: Attachment: YARN-8509.003.patch > Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch, YARN-8509.002.patch, > YARN-8509.003.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total > pending resource based on user-limit percent and user-limit factor which will > cap pending resource for each user to the minimum of user-limit pending and > actual pending. This will prevent queue from taking more pending resource to > achieve queue balance after all queue satisfied with its ideal allocation. > > We need to change the logic to let queue pending can go beyond userlimit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8522) Application fails with InvalidResourceRequestException
[ https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558965#comment-16558965 ] Zian Chen commented on YARN-8522: - [~sunilg], could you help review the latest patch? > Application fails with InvalidResourceRequestException > -- > > Key: YARN-8522 > URL: https://issues.apache.org/jira/browse/YARN-8522 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8522.001.patch, YARN-8522.002.patch > > > Launch multiple streaming app simultaneously. Here, sometimes one of the > application fails with below stack trace. > {code} > 18/07/02 07:14:32 INFO retry.RetryInvocationHandler: > java.net.ConnectException: Call From xx.xx.xx.xx/xx.xx.xx.xx to > xx.xx.xx.xx:8032 failed on connection exception: java.net.ConnectException: > Connection refused; For more details see: > http://wiki.apache.org/hadoop/ConnectionRefused, while invoking > ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying > after sleeping for 3ms. 
> 18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: > Invocation returned exception: > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, only one resource request with * is allowed > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > on [rm2], so propagating back to caller. 
> 18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area > /user/hrt_qa/.staging/job_1530515284077_0007 > 18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, only one resource request with * is allowed > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > Streaming Command Failed!{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsub
[jira] [Commented] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558936#comment-16558936 ] Zian Chen commented on YARN-8509: - Updated the patch to fix the failed UTs. > Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch, YARN-8509.002.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the total > pending resource based on the user-limit percent and user-limit factor, which > caps the pending resource for each user to the minimum of the user-limit pending and > the actual pending. This prevents the queue from taking more pending resource to > achieve queue balance after all queues are satisfied with their ideal allocation. > > We need to change the logic so that queue pending can go beyond the user limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
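The capping behavior described in the YARN-8509 issue text above can be sketched as follows. This is a hypothetical simplification, not the actual LeafQueue code; the class, method, and variable names are illustrative assumptions.

```java
// Hypothetical sketch of the per-user pending-resource capping described in
// YARN-8509; NOT the actual LeafQueue implementation. Names are illustrative.
public class UserLimitPendingSketch {

    // Current behavior: each user's counted pending resource is capped at the
    // headroom left under the user limit, i.e. the minimum of the user-limit
    // pending and the actual pending, as the issue description puts it.
    static long cappedPending(long actualPending, long userLimit, long userUsed) {
        long userLimitPending = Math.max(0, userLimit - userUsed);
        return Math.min(actualPending, userLimitPending);
    }

    // Proposed behavior: count the full actual pending, letting the queue's
    // total pending go beyond the user limit.
    static long uncappedPending(long actualPending) {
        return actualPending;
    }

    public static void main(String[] args) {
        // A user with limit 100 and usage 80 asking for 50 more:
        // the old logic counts only 20; the proposed logic counts all 50.
        System.out.println(cappedPending(50, 100, 80));  // 20
        System.out.println(uncappedPending(50));         // 50
    }
}
```

With the capped variant, a queue whose users are all near their limits reports almost no pending resource, which is why preemption-to-balance stalls once every queue reaches its ideal allocation.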
[jira] [Updated] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8509: Attachment: YARN-8509.002.patch > Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch, YARN-8509.002.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the total > pending resource based on the user-limit percent and user-limit factor, which > caps the pending resource for each user to the minimum of the user-limit pending and > the actual pending. This prevents the queue from taking more pending resource to > achieve queue balance after all queues are satisfied with their ideal allocation. > > We need to change the logic so that queue pending can go beyond the user limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8509: Description: In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the total pending resource based on the user-limit percent and user-limit factor, which caps the pending resource for each user to the minimum of the user-limit pending and the actual pending. This prevents the queue from taking more pending resource to achieve queue balance after all queues are satisfied with their ideal allocation. We need to change the logic so that queue pending can go beyond the user limit. was: In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the total pending resource based on the user-limit percent and user-limit factor, which caps the pending resource for each user to the minimum of the user-limit pending and the actual pending. This prevents the queue from taking more pending resource to achieve queue balance after all queues are satisfied with their ideal allocation. We need to change the logic so that queue pending can reach at most (Queue_max_capacity - Queue_used_capacity). > Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the total > pending resource based on the user-limit percent and user-limit factor, which > caps the pending resource for each user to the minimum of the user-limit pending and > the actual pending. This prevents the queue from taking more pending resource to > achieve queue balance after all queues are satisfied with their ideal allocation. > > We need to change the logic so that queue pending can go beyond the user limit. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558882#comment-16558882 ] Zian Chen commented on YARN-8509: - Talked with [~sunilg]: it can go beyond maxCap - usedCap, because a user can ask for 1maps while the cluster can run a max of 1000. In that case, as soon as each map finishes, another pending one gets scheduled. > Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the total > pending resource based on the user-limit percent and user-limit factor, which > caps the pending resource for each user to the minimum of the user-limit pending and > the actual pending. This prevents the queue from taking more pending resource to > achieve queue balance after all queues are satisfied with their ideal allocation. > > We need to change the logic so that queue pending can reach at most > (Queue_max_capacity - Queue_used_capacity). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
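The reasoning in the comment above — that pending demand can legitimately exceed the queue's free capacity because finished containers keep freeing slots for pending ones — can be illustrated with a back-of-the-envelope sketch. The class, method names, and concrete numbers below are made up for illustration only.

```java
// Hypothetical illustration of the comment above: even if a cluster can run
// at most `maxConcurrent` maps at once, an arbitrarily large pending backlog
// still drains, because each finished map frees a slot for a pending one.
public class BacklogDrainSketch {

    // Number of scheduling "waves" needed to run `pending` maps when at most
    // `maxConcurrent` can run at the same time (ceiling division).
    static long wavesToDrain(long pending, long maxConcurrent) {
        return (pending + maxConcurrent - 1) / maxConcurrent;
    }

    public static void main(String[] args) {
        // 10,000 pending maps on a cluster that runs at most 1,000 at a time:
        // the backlog drains in 10 waves, so capping counted pending at the
        // queue's free capacity would undercount legitimate demand.
        System.out.println(wavesToDrain(10_000, 1_000));  // 10
    }
}
```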
[jira] [Updated] (YARN-8522) Application fails with InvalidResourceRequestException
[ https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8522: Attachment: YARN-8522.002.patch > Application fails with InvalidResourceRequestException > -- > > Key: YARN-8522 > URL: https://issues.apache.org/jira/browse/YARN-8522 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8522.001.patch, YARN-8522.002.patch > > > Launch multiple streaming apps simultaneously. Sometimes one of the > applications fails with the stack trace below. > {code} > 18/07/02 07:14:32 INFO retry.RetryInvocationHandler: > java.net.ConnectException: Call From xx.xx.xx.xx/xx.xx.xx.xx to > xx.xx.xx.xx:8032 failed on connection exception: java.net.ConnectException: > Connection refused; For more details see: > http://wiki.apache.org/hadoop/ConnectionRefused, while invoking > ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying > after sleeping for 3ms. 
> 18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: > Invocation returned exception: > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, only one resource request with * is allowed > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > on [rm2], so propagating back to caller. 
> 18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area > /user/hrt_qa/.staging/job_1530515284077_0007 > 18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, only one resource request with * is allowed > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > Streaming Command Failed!{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional
[jira] [Commented] (YARN-8522) Application fails with InvalidResourceRequestException
[ https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558870#comment-16558870 ] Zian Chen commented on YARN-8522: - Thanks for the suggestions, [~sunilg]. Updated patch 002. > Application fails with InvalidResourceRequestException > -- > > Key: YARN-8522 > URL: https://issues.apache.org/jira/browse/YARN-8522 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8522.001.patch, YARN-8522.002.patch > > > Launch multiple streaming apps simultaneously. Sometimes one of the > applications fails with the stack trace below. > {code} > 18/07/02 07:14:32 INFO retry.RetryInvocationHandler: > java.net.ConnectException: Call From xx.xx.xx.xx/xx.xx.xx.xx to > xx.xx.xx.xx:8032 failed on connection exception: java.net.ConnectException: > Connection refused; For more details see: > http://wiki.apache.org/hadoop/ConnectionRefused, while invoking > ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying > after sleeping for 3ms. 
> 18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: > Invocation returned exception: > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, only one resource request with * is allowed > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > on [rm2], so propagating back to caller. 
> 18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area > /user/hrt_qa/.staging/job_1530515284077_0007 > 18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, only one resource request with * is allowed > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > Streaming Command Failed!{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To
[jira] [Commented] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554956#comment-16554956 ] Zian Chen commented on YARN-8509: - [~sunilg], I found something interesting while addressing the failed UT TestContainerAllocation#testPendingResourcesConsideringUserLimit: after we change the logic in patch 001, we actually allow the pending resource to reach whatever is pending on the current app, regardless of the max-capacity hard limit. I didn't notice this before in my own UTs. My opinion is that the pending resource can go beyond the user limit, but still cannot go beyond the maxCap - usedCap limit. What's your opinion? > Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the total > pending resource based on the user-limit percent and user-limit factor, which > caps the pending resource for each user to the minimum of the user-limit pending and > the actual pending. This prevents the queue from taking more pending resource to > achieve queue balance after all queues are satisfied with their ideal allocation. > > We need to change the logic so that queue pending can reach at most > (Queue_max_capacity - Queue_used_capacity). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
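The middle ground discussed in this comment — letting pending exceed the user limit while still clamping it to the queue's remaining capacity — could be sketched as below. This is a hypothetical sketch of the idea under discussion, not the eventual patch; all names are illustrative.

```java
// Hypothetical sketch of the compromise discussed in this comment: ignore the
// per-user limit when counting pending, but clamp the queue's total pending
// to its remaining capacity (maxCap - usedCap). NOT the actual patch.
public class QueueHeadroomClampSketch {

    static long clampToQueueHeadroom(long totalPending, long maxCap, long usedCap) {
        long headroom = Math.max(0, maxCap - usedCap);
        return Math.min(totalPending, headroom);
    }

    public static void main(String[] args) {
        // Queue max 1000, used 600: a 700-unit backlog counts as only 400.
        System.out.println(clampToQueueHeadroom(700, 1000, 600));   // 400
        // A fully-used queue contributes no pending under this rule.
        System.out.println(clampToQueueHeadroom(700, 1000, 1000));  // 0
    }
}
```

The follow-up comment in this thread argues against even this clamp: since containers finish and free capacity over time, demand beyond maxCap - usedCap is still real demand.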
[jira] [Commented] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551467#comment-16551467 ] Zian Chen commented on YARN-8509: - [~sunilg], Let me address the failed UTs first. > Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the total > pending resource based on the user-limit percent and user-limit factor, which > caps the pending resource for each user to the minimum of the user-limit pending and > the actual pending. This prevents the queue from taking more pending resource to > achieve queue balance after all queues are satisfied with their ideal allocation. > > We need to change the logic so that queue pending can reach at most > (Queue_max_capacity - Queue_used_capacity). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551334#comment-16551334 ] Zian Chen commented on YARN-8509: - Uploaded the first patch for review. [~sunilg], could you help review the patch? Thanks > Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the total > pending resource based on the user-limit percent and user-limit factor, which > caps the pending resource for each user to the minimum of the user-limit pending and > the actual pending. This prevents the queue from taking more pending resource to > achieve queue balance after all queues are satisfied with their ideal allocation. > > We need to change the logic so that queue pending can reach at most > (Queue_max_capacity - Queue_used_capacity). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen updated YARN-8509: Attachment: YARN-8509.001.patch > Fix UserLimit calculation for preemption to balance scenario after queue > satisfied > > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8509.001.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the total > pending resource based on the user-limit percent and user-limit factor, which > caps the pending resource for each user to the minimum of the user-limit pending and > the actual pending. This prevents the queue from taking more pending resource to > achieve queue balance after all queues are satisfied with their ideal allocation. > > We need to change the logic so that queue pending can reach at most > (Queue_max_capacity - Queue_used_capacity). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org