[jira] [Commented] (YARN-1418) Add Tracing to YARN
[ https://issues.apache.org/jira/browse/YARN-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976465#comment-13976465 ]

Masatake Iwasaki commented on YARN-1418:

One of the YARN-specific TODOs is adding a way to pass tracing information to containers forked from NodeManagers. Using a configuration property is straightforward.

> Add Tracing to YARN
> ---
>
> Key: YARN-1418
> URL: https://issues.apache.org/jira/browse/YARN-1418
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: api, nodemanager, resourcemanager
> Reporter: Masatake Iwasaki
>
> Adding tracing using HTrace in the same way as HBASE-6449 and HDFS-5274.
> The most part of changes needed for basis such as RPC seems to be almost
> ready in HDFS-5274.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1897) Define SignalContainerRequest and SignalContainerResponse
[ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976408#comment-13976408 ]

Hadoop QA commented on YARN-1897:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12641175/YARN-1897-2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/3605//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3605//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3605//console

This message is automatically generated.

> Define SignalContainerRequest and SignalContainerResponse
> -
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api
> Reporter: Ming Ma
> Attachments: YARN-1897-2.patch, YARN-1897.1.patch
>
>
> We need to define SignalContainerRequest and SignalContainerResponse first as
> they are needed by other sub tasks. SignalContainerRequest should use
> OS-independent commands and provide a way to application to specify "reason"
> for diagnosis. SignalContainerResponse might be empty.
[jira] [Updated] (YARN-1897) Define SignalContainerRequest and SignalContainerResponse
[ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated YARN-1897:

    Attachment: YARN-1897-2.patch

Xuan, thanks for the early patch. Here is the updated version, which expands SignalContainerCommand and renames some methods.

1. SignalContainerResponse has a flag to indicate whether the request was submitted successfully. If it fails, the application doesn't know why. Is that what the diagnosis string is for? The previous patch just throws an exception.
2. There is no unit test just for this patch. I tested it manually with related changes in YarnCLI and RM to verify that messages are being passed properly. When the other work items in RM and NM are added, unit tests will be added accordingly.

> Define SignalContainerRequest and SignalContainerResponse
> -
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api
> Reporter: Ming Ma
> Attachments: YARN-1897-2.patch, YARN-1897.1.patch
>
>
> We need to define SignalContainerRequest and SignalContainerResponse first as
> they are needed by other sub tasks. SignalContainerRequest should use
> OS-independent commands and provide a way to application to specify "reason"
> for diagnosis. SignalContainerResponse might be empty.
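The request/response shape discussed for YARN-1897 (an OS-independent command, an application-supplied "reason", and a response carrying a success flag plus a diagnosis string) could look roughly like the sketch below. Every name here is hypothetical and is not taken from the attached patches: the enum values, the `newInstance` factories, and the mapping to POSIX signals are assumptions made only for illustration.

```java
/** Illustrative OS-independent commands; the names in the actual patch may differ. */
enum SignalCommandSketch { OUTPUT_THREAD_DUMP, GRACEFUL_SHUTDOWN, FORCEFUL_SHUTDOWN }

/** Hypothetical request: which container, which command, and why. */
final class SignalRequestSketch {
    final String containerId;
    final SignalCommandSketch command;
    final String reason; // free-form text supplied by the application for diagnosis

    private SignalRequestSketch(String containerId, SignalCommandSketch command, String reason) {
        this.containerId = containerId;
        this.command = command;
        this.reason = reason;
    }

    static SignalRequestSketch newInstance(String containerId, SignalCommandSketch command,
                                           String reason) {
        if (containerId == null || command == null) {
            throw new IllegalArgumentException("containerId and command are required");
        }
        return new SignalRequestSketch(containerId, command, reason == null ? "" : reason);
    }
}

/** Hypothetical response: a submitted flag plus an optional diagnosis string. */
final class SignalResponseSketch {
    final boolean submitted;
    final String diagnostics;

    private SignalResponseSketch(boolean submitted, String diagnostics) {
        this.submitted = submitted;
        this.diagnostics = diagnostics;
    }

    static SignalResponseSketch accepted() {
        return new SignalResponseSketch(true, "");
    }

    static SignalResponseSketch rejected(String why) {
        return new SignalResponseSketch(false, why);
    }
}

/** Assumed translation a Unix NodeManager might apply; other platforms would differ. */
final class SignalTranslatorSketch {
    static String toPosixSignal(SignalCommandSketch command) {
        switch (command) {
            case OUTPUT_THREAD_DUMP: return "SIGQUIT"; // a JVM prints a thread dump on SIGQUIT
            case GRACEFUL_SHUTDOWN:  return "SIGTERM";
            case FORCEFUL_SHUTDOWN:  return "SIGKILL";
            default: throw new IllegalArgumentException("unknown command: " + command);
        }
    }
}
```

Keeping the command as an enum rather than a raw signal number is what makes the request OS-independent; the diagnosis string on the response is one way to tell the application why a request was not submitted.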
[jira] [Commented] (YARN-1897) Define SignalContainerRequest and SignalContainerResponse
[ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976310#comment-13976310 ]

Ming Ma commented on YARN-1897:

Thanks, Xuan. I will merge this one with the version I have and provide an update shortly. BTW, why does SignalContainerResponse need to provide a diagnosis string? Is it to explain why the request can't be processed?

> Define SignalContainerRequest and SignalContainerResponse
> -
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api
> Reporter: Ming Ma
> Attachments: YARN-1897.1.patch
>
>
> We need to define SignalContainerRequest and SignalContainerResponse first as
> they are needed by other sub tasks. SignalContainerRequest should use
> OS-independent commands and provide a way to application to specify "reason"
> for diagnosis. SignalContainerResponse might be empty.
[jira] [Commented] (YARN-1796) container-executor shouldn't require o-r permissions
[ https://issues.apache.org/jira/browse/YARN-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976282#comment-13976282 ] Sandy Ryza commented on YARN-1796: -- +1 > container-executor shouldn't require o-r permissions > > > Key: YARN-1796 > URL: https://issues.apache.org/jira/browse/YARN-1796 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers >Priority: Minor > Attachments: YARN-1796.patch > > > The container-executor currently checks that "other" users don't have read > permissions. This is unnecessary and runs contrary to the debian packaging > policy manual. > This is the analogous fix for YARN that was done for MR1 in MAPREDUCE-2103. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1796) container-executor shouldn't require o-r permissions
[ https://issues.apache.org/jira/browse/YARN-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976278#comment-13976278 ]

Hadoop QA commented on YARN-1796:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12633282/YARN-1796.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3604//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3604//console

This message is automatically generated.

> container-executor shouldn't require o-r permissions
>
>
> Key: YARN-1796
> URL: https://issues.apache.org/jira/browse/YARN-1796
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.4.0
> Reporter: Aaron T. Myers
> Assignee: Aaron T. Myers
> Priority: Minor
> Attachments: YARN-1796.patch
>
>
> The container-executor currently checks that "other" users don't have read
> permissions. This is unnecessary and runs contrary to the debian packaging
> policy manual.
> This is the analogous fix for YARN that was done for MR1 in MAPREDUCE-2103.
[jira] [Commented] (YARN-1897) Define SignalContainerRequest and SignalContainerResponse
[ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976268#comment-13976268 ]

Xuan Gong commented on YARN-1897:

[~mingma] I uploaded an initial patch for this. Please take a look, and feel free to make any edits, renames, etc.

> Define SignalContainerRequest and SignalContainerResponse
> -
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api
> Reporter: Ming Ma
> Attachments: YARN-1897.1.patch
>
>
> We need to define SignalContainerRequest and SignalContainerResponse first as
> they are needed by other sub tasks. SignalContainerRequest should use
> OS-independent commands and provide a way to application to specify "reason"
> for diagnosis. SignalContainerResponse might be empty.
[jira] [Updated] (YARN-1897) Define SignalContainerRequest and SignalContainerResponse
[ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1897: Attachment: YARN-1897.1.patch > Define SignalContainerRequest and SignalContainerResponse > - > > Key: YARN-1897 > URL: https://issues.apache.org/jira/browse/YARN-1897 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Ming Ma > Attachments: YARN-1897.1.patch > > > We need to define SignalContainerRequest and SignalContainerResponse first as > they are needed by other sub tasks. SignalContainerRequest should use > OS-independent commands and provide a way to application to specify "reason" > for diagnosis. SignalContainerResponse might be empty. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1969) Fair Scheduler: Add policy for Earliest Deadline First
[ https://issues.apache.org/jira/browse/YARN-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-1969:

    Summary: Fair Scheduler: Add policy for Earliest Deadline First (was: Earliest Deadline First Scheduling in the Fair Scheduler)

> Fair Scheduler: Add policy for Earliest Deadline First
> --
>
> Key: YARN-1969
> URL: https://issues.apache.org/jira/browse/YARN-1969
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Maysam Yabandeh
> Assignee: Maysam Yabandeh
>
> What we are observing is that some big jobs with many allocated containers
> are waiting for a few containers to finish. Under *fair-share scheduling*
> however they have a low priority since there are other jobs (usually much
> smaller, new comers) that are using resources way below their fair share,
> hence new released containers are not offered to the big, yet
> close-to-be-finished job. Nevertheless, everybody would benefit from an
> "unfair" scheduling that offers the resource to the big job since the sooner
> the big job finishes, the sooner it releases its "many" allocated resources
> to be used by other jobs. In other words, what we require is a kind of
> variation of *Earliest Deadline First scheduling*, that takes into account
> the number of already-allocated resources and estimated time to finish.
> http://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling
> For example, if a job is using MEM GB of memory and is expected to finish in
> TIME minutes, the priority in scheduling would be a function p of (MEM,
> TIME). The expected time to finish can be estimated by the AppMaster using
> TaskRuntimeEstimator#estimatedRuntime and be supplied to RM in the resource
> request messages. To be less susceptible to the issue of apps gaming the
> system, we can have this scheduling limited to *only within a queue*: i.e.,
> adding a EarliestDeadlinePolicy extends SchedulingPolicy and let the queues
> to use it by setting the "schedulingPolicy" field.
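The YARN-1969 description leaves the priority function p(MEM, TIME) unspecified. As a minimal sketch, assuming p is simply "GB of memory released per remaining minute", jobs within a queue could be ordered as below. The classes `JobEstimate` and `EarliestDeadlineSketch` are hypothetical stand-ins and do not use the Fair Scheduler's SchedulingPolicy API; the choice of p is likewise an assumption, not the reporter's proposal.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical snapshot of a running job; not an actual YARN class. */
final class JobEstimate {
    final String name;
    final double memGb;        // MEM: memory currently allocated, in GB
    final double minutesLeft;  // TIME: estimated minutes until completion

    JobEstimate(String name, double memGb, double minutesLeft) {
        this.name = name;
        this.memGb = memGb;
        this.minutesLeft = minutesLeft;
    }
}

final class EarliestDeadlineSketch {
    /** Assumed form of p(MEM, TIME): GB released per remaining minute. */
    static double priority(JobEstimate job) {
        // Clamp the estimate so a zero or negative TIME cannot divide by zero.
        double time = Math.max(job.minutesLeft, 0.001);
        return job.memGb / time;
    }

    /** Comparator that places the highest-priority job first. */
    static final Comparator<JobEstimate> BY_PRIORITY =
        (a, b) -> Double.compare(priority(b), priority(a));

    /** Returns a new list ordered by descending priority; the input is not modified. */
    static List<JobEstimate> order(List<JobEstimate> jobs) {
        List<JobEstimate> sorted = new ArrayList<>(jobs);
        sorted.sort(BY_PRIORITY);
        return sorted;
    }

    public static void main(String[] args) {
        List<JobEstimate> jobs = new ArrayList<>();
        jobs.add(new JobEstimate("small-new-job", 4, 30));
        jobs.add(new JobEstimate("big-almost-done", 100, 2));
        for (JobEstimate job : order(jobs)) {
            System.out.println(job.name + " p=" + priority(job));
        }
    }
}
```

With this choice of p, a large job that is about to finish outranks a small newcomer, which is the "unfair" behavior the description asks for. Because TIME is self-reported by the AppMaster, restricting such a policy to a single queue limits how much an application can gain by understating it.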
[jira] [Commented] (YARN-1970) Prepare YARN codebase for JUnit 4.11.
[ https://issues.apache.org/jira/browse/YARN-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976205#comment-13976205 ] Hudson commented on YARN-1970: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5547 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5547/]) YARN-1970. Prepare YARN codebase for JUnit 4.11. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1589001) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/utils/TestSLSUtils.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/web/TestSLSWebApp.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceOnHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java > Prepare YARN codebase for JUnit 4.11. > - > > Key: YARN-1970 > URL: https://issues.apache.org/jira/browse/YARN-1970 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 3.0.0, 2.4.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth >Priority: Minor > Fix For: 3.0.0, 2.5.0 > > Attachments: YARN-1970.1.patch > > > HADOOP-10503 upgrades the entire Hadoop repo to use JUnit 4.11. Some of the > YARN code needs some minor updates to fix deprecation warnings and test > isolation problems before the upgrade. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1970) Prepare YARN codebase for JUnit 4.11.
[ https://issues.apache.org/jira/browse/YARN-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976178#comment-13976178 ] Chris Nauroth commented on YARN-1970: - Thanks, Arpit. I'll commit this soon. BTW, I meant to mention that HADOOP-10503 contains some comments with more explanation of the need for these changes. > Prepare YARN codebase for JUnit 4.11. > - > > Key: YARN-1970 > URL: https://issues.apache.org/jira/browse/YARN-1970 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 3.0.0, 2.4.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth >Priority: Minor > Attachments: YARN-1970.1.patch > > > HADOOP-10503 upgrades the entire Hadoop repo to use JUnit 4.11. Some of the > YARN code needs some minor updates to fix deprecation warnings and test > isolation problems before the upgrade. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1970) Prepare YARN codebase for JUnit 4.11.
[ https://issues.apache.org/jira/browse/YARN-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976155#comment-13976155 ] Arpit Agarwal commented on YARN-1970: - +1 for the patch. > Prepare YARN codebase for JUnit 4.11. > - > > Key: YARN-1970 > URL: https://issues.apache.org/jira/browse/YARN-1970 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 3.0.0, 2.4.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth >Priority: Minor > Attachments: YARN-1970.1.patch > > > HADOOP-10503 upgrades the entire Hadoop repo to use JUnit 4.11. Some of the > YARN code needs some minor updates to fix deprecation warnings and test > isolation problems before the upgrade. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976127#comment-13976127 ] Hadoop QA commented on YARN-1962: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641129/YARN-1962.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3603//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3603//console This message is automatically generated. > Timeline server is enabled by default > - > > Key: YARN-1962 > URL: https://issues.apache.org/jira/browse/YARN-1962 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: YARN-1962.1.patch, YARN-1962.2.patch > > > Since Timeline server is not matured and secured yet, enabling it by default > might create some confusion. 
> We were playing with 2.4.0 and found a lot of exceptions for distributed > shell example related to connection refused error. Btw, we didn't run TS > because it is not secured yet. > Although it is possible to explicitly turn it off through yarn-site config. > In my opinion, this extra change for this new service is not worthy at this > point,. > This JIRA is to turn it off by default. > If there is an agreement, i can put a simple patch about this. > {noformat} > 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response > from the timeline server. > com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: > Connection refused > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) > Caused by: java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) > at > 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:579) > at java.net.Socket.connect(Socket.java:528) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient. impl.TimelineClientImpl: Failed to get the response from the timeline server. > com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: > Connection refused > at > com.sun.jersey.client.urlconnection.URL
[jira] [Commented] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976094#comment-13976094 ] Mohammad Kamrul Islam commented on YARN-1962: - Testing done: 1. Tested in 2.4.0 cluster of 100 nodes with [~tthompso] 2. Ran the relevant unit test including the new one. > Timeline server is enabled by default > - > > Key: YARN-1962 > URL: https://issues.apache.org/jira/browse/YARN-1962 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: YARN-1962.1.patch, YARN-1962.2.patch > > > Since Timeline server is not matured and secured yet, enabling it by default > might create some confusion. > We were playing with 2.4.0 and found a lot of exceptions for distributed > shell example related to connection refused error. Btw, we didn't run TS > because it is not secured yet. > Although it is possible to explicitly turn it off through yarn-site config. > In my opinion, this extra change for this new service is not worthy at this > point,. > This JIRA is to turn it off by default. > If there is an agreement, i can put a simple patch about this. > {noformat} > 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response > from the timeline server. 
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: > Connection refused > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) > Caused by: java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:579) > at java.net.Socket.connect(Socket.java:528) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient. impl.TimelineClientImpl: Failed to get the response from the timeline server. 
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: > Connection refused > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) > Caused by: java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) > at > java.net.AbstractPlainSocketImpl.connectToAddre
[jira] [Updated] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated YARN-1962: Attachment: YARN-1962.2.patch Patch with review comments. > Timeline server is enabled by default > - > > Key: YARN-1962 > URL: https://issues.apache.org/jira/browse/YARN-1962 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: YARN-1962.1.patch, YARN-1962.2.patch > > > Since Timeline server is not matured and secured yet, enabling it by default > might create some confusion. > We were playing with 2.4.0 and found a lot of exceptions for distributed > shell example related to connection refused error. Btw, we didn't run TS > because it is not secured yet. > Although it is possible to explicitly turn it off through yarn-site config. > In my opinion, this extra change for this new service is not worthy at this > point,. > This JIRA is to turn it off by default. > If there is an agreement, i can put a simple patch about this. > {noformat} > 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response > from the timeline server. 
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: > Connection refused > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) > Caused by: java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:579) > at java.net.Socket.connect(Socket.java:528) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient. impl.TimelineClientImpl: Failed to get the response from the timeline server. 
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: > Connection refused > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) > Caused by: java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.jav
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976061#comment-13976061 ]

Jian He commented on YARN-1506:

AdminService.updateNodeResource should use RMAuditLogger to log the operations as well.

> Replace set resource change on RMNode/SchedulerNode directly with event
> notification.
> -
>
> Key: YARN-1506
> URL: https://issues.apache.org/jira/browse/YARN-1506
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, scheduler
> Reporter: Junping Du
> Assignee: Junping Du
> Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch,
> YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch,
> YARN-1506-v5.patch, YARN-1506-v6.patch, YARN-1506-v7.patch,
> YARN-1506-v8.patch, YARN-1506-v9.patch
>
>
> According to Vinod's comments on YARN-312
> (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
> we should replace RMNode.setResourceOption() with some resource change event.
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976056#comment-13976056 ] Jian He commented on YARN-1506: --- Some comments on the patch, mostly cosmetic changes:
- Since it’s changed to asynchronous, we may change the log to not say *successfully*.
{code}
LOG.info("Update resource successfully on node(" + node.getNodeID() +") with resource(" + newResourceOption.toString() + ")");
{code}
- Log inside UpdateNodeResourceWhenNonRunningTransition, good for debugging as this should be an unusual case. UpdateNodeResourceWhenNonRunningTransition -> UpdateNodeResourceWhenNotRunningTransition ?
- IMO, since UpdateNodeResourceWhenUnusableTransition and UpdateNodeResourceWhenNonRunningTransition are the same except one extra logging, we can do the logging for both and just keep one transition?
- If possible, nodeResourceUpdate method can be moved into AbstractYarnScheduler, a new common base class for sharing common code among all the schedulers.
- SchedulerNode.setTotalResource -> SchedulerNode.updateTotalAndAvailableResource() ?
- UpdateNodeResourceResponse should be an abstract class which implements newInstance() method.
> Replace set resource change on RMNode/SchedulerNode directly with event > notification. 
> - > > Key: YARN-1506 > URL: https://issues.apache.org/jira/browse/YARN-1506 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, > YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, > YARN-1506-v5.patch, YARN-1506-v6.patch, YARN-1506-v7.patch, > YARN-1506-v8.patch, YARN-1506-v9.patch > > > According to Vinod's comments on YARN-312 > (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), > we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976010#comment-13976010 ] Junping Du commented on YARN-313: - Thanks [~kj-ki] for contributing a sample patch here. Although I think some code in the sample patch duplicates what we already have in YARN-312 (the proto stuff on refreshResource), I will check whether some code here can be integrated with my patch after YARN-1506 is figured out. > Add Admin API for supporting node resource configuration in command line > > > Key: YARN-313 > URL: https://issues.apache.org/jira/browse/YARN-313 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-313-sample.patch > > > We should provide some admin interface, e.g. "yarn rmadmin -refreshResources" > to support changes of node's resource specified in a config file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1970) Prepare YARN codebase for JUnit 4.11.
[ https://issues.apache.org/jira/browse/YARN-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976000#comment-13976000 ] Hadoop QA commented on YARN-1970: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641104/YARN-1970.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3602//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3602//console This message is automatically generated. > Prepare YARN codebase for JUnit 4.11. > - > > Key: YARN-1970 > URL: https://issues.apache.org/jira/browse/YARN-1970 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 3.0.0, 2.4.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth >Priority: Minor > Attachments: YARN-1970.1.patch > > > HADOOP-10503 upgrades the entire Hadoop repo to use JUnit 4.11. 
Some of the > YARN code needs some minor updates to fix deprecation warnings and test > isolation problems before the upgrade. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1964) Support Docker containers in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975937#comment-13975937 ] jay vyas commented on YARN-1964: Ah, sorry, I thought this was meant to be done at a different part of the stack ... So is this jira specifically to create a "DockerContainerExecutor" class? Then that would be a really good idea, and I'm pretty sure it would be feasible. I guess you'd need to add a few parameters to core-site.xml
{noformat}
yarn.nodemanager.container-executor.class DockerContainerExecutor
{noformat}
and then maybe have a docker-site.xml
{noformat}
docker.container.container.impl PythonCentOSContainer
docker.container.container1 PythonCentOSContainer
docker.container.container2 MyContainerWithPostgres
...
{noformat}
And then somehow localize resources in a docker-ish sort of way so that the containers can see all the task resources properly. Is that the idea here? > Support Docker containers in YARN > - > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun C Murthy >Assignee: Arun C Murthy > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975920#comment-13975920 ] Bikas Saha commented on YARN-1506: -- This sleep is too long for a test. Same for other sleeps in the patch.
{code}
+while (alloc1Response.getAllocatedContainers().size() < 1) {
+  LOG.info("Waiting for containers to be created for app 1...");
+  Thread.sleep(1000);
{code}
In the test it will be useful to have 2 outstanding container requests for that machine (while the machine is fully booked) so we know that the scheduler will be trying to allocate on that machine whenever the machine heartbeats. One outstanding request should be for 2GB and the other for 3GB. Also, after the node resource is changed, the test should do a few node heartbeats to make sure that the scheduler will try to allocate a new container (for the outstanding request) on that node and fail to do so (+ not hit any NPE). Then, after the first container completes, the test should check that the 2GB outstanding container request is satisfied but not the 3GB request. Thereafter, complete the second container and verify that the 3GB request is still not satisfied (because the NM has only 2GB of resource).
Which brings another question to my mind. What happens when this change resource command reduces the NM size to something less than the max container size allowed? E.g., let's say that the NM is 8GB and the max allowed container size is 4GB. So the RM accepts a 4GB request. Now the admin changes the NM to 2GB. At this point the previously accepted 4GB request cannot be satisfied and the application will get stuck. We may need to follow this up in a different jira. There may be some existing jiras related to max container size and actual NM resource size.
bq. Update transition as only 2 lines of code can be shared and shared a method across different class seems over-kill in this case
It's probably about personal stylistic choices. 
The number of lines of code is less important. What is more important is that having a shared method declares that these pieces of code are related to each other in a logical way (if such a relation exists). The dependency may be via a method in RMNodeImpl that's called by both transitions, or one transition extending the other one. The choice depends on how the 2 pieces of code are related to each other. > Replace set resource change on RMNode/SchedulerNode directly with event > notification. > - > > Key: YARN-1506 > URL: https://issues.apache.org/jira/browse/YARN-1506 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, > YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, > YARN-1506-v5.patch, YARN-1506-v6.patch, YARN-1506-v7.patch, > YARN-1506-v8.patch, YARN-1506-v9.patch > > > According to Vinod's comments on YARN-312 > (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), > we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.2#6252)
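The point about long sleeps in the YARN-1506 test can be illustrated with a small polling helper. This is only a sketch under stated assumptions: `PollingWait` and `waitUntil` are hypothetical names, not part of the patch or of Hadoop, and the interval and timeout values are left to the caller. It polls at a short interval and gives up after a deadline instead of sleeping a full second per iteration with no upper bound.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical test helper for illustration; it is not part of any YARN-1506 patch. */
final class PollingWait {
  private PollingWait() {}

  /**
   * Polls the condition every intervalMs milliseconds until it holds or
   * timeoutMs milliseconds have elapsed. Returns true if the condition was
   * observed to hold, false on timeout or if the thread was interrupted.
   */
  static boolean waitUntil(BooleanSupplier condition, long intervalMs, long timeoutMs) {
    final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // give up instead of hanging the test forever
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }
}
```

In a test like the one quoted above, the loop body could then become something along the lines of `PollingWait.waitUntil(() -> response.getAllocatedContainers().size() >= 1, 50, 10_000)`, with the test failing when the call returns false. The 50 ms and 10 s figures are illustrative only.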
[jira] [Updated] (YARN-1970) Prepare YARN codebase for JUnit 4.11.
[ https://issues.apache.org/jira/browse/YARN-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-1970: Attachment: YARN-1970.1.patch > Prepare YARN codebase for JUnit 4.11. > - > > Key: YARN-1970 > URL: https://issues.apache.org/jira/browse/YARN-1970 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 3.0.0, 2.4.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth >Priority: Minor > Attachments: YARN-1970.1.patch > > > HADOOP-10503 upgrades the entire Hadoop repo to use JUnit 4.11. Some of the > YARN code needs some minor updates to fix deprecation warnings and test > isolation problems before the upgrade. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1970) Prepare YARN codebase for JUnit 4.11.
Chris Nauroth created YARN-1970: --- Summary: Prepare YARN codebase for JUnit 4.11. Key: YARN-1970 URL: https://issues.apache.org/jira/browse/YARN-1970 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0, 3.0.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Attachments: YARN-1970.1.patch HADOOP-10503 upgrades the entire Hadoop repo to use JUnit 4.11. Some of the YARN code needs some minor updates to fix deprecation warnings and test isolation problems before the upgrade. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975885#comment-13975885 ] Mohammad Kamrul Islam commented on YARN-1962: - [~zjshen] Thanks for the feedback. I will upload a new patch. > Timeline server is enabled by default > - > > Key: YARN-1962 > URL: https://issues.apache.org/jira/browse/YARN-1962 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: YARN-1962.1.patch > > > Since Timeline server is not matured and secured yet, enabling it by default > might create some confusion. > We were playing with 2.4.0 and found a lot of exceptions for distributed > shell example related to connection refused error. Btw, we didn't run TS > because it is not secured yet. > Although it is possible to explicitly turn it off through yarn-site config. > In my opinion, this extra change for this new service is not worthy at this > point,. > This JIRA is to turn it off by default. > If there is an agreement, i can put a simple patch about this. > {noformat} > 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response > from the timeline server. 
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: > Connection refused > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) > Caused by: java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:579) > at java.net.Socket.connect(Socket.java:528) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient. impl.TimelineClientImpl: Failed to get the response from the timeline server. 
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: > Connection refused > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) > Caused by: java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) > at > java.net.AbstractPlainSocketIm
[jira] [Commented] (YARN-1964) Support Docker containers in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975790#comment-13975790 ] Vinod Kumar Vavilapalli commented on YARN-1964: --- YARN already has platform-specific plugins like LinuxContainerExecutor; this would just be one more option. > Support Docker containers in YARN > - > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun C Murthy >Assignee: Arun C Murthy > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1969) Earliest Deadline First Scheduling in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1969: - Summary: Earliest Deadline First Scheduling in the Fair Scheduler (was: Earliest Deadline First Scheduling) > Earliest Deadline First Scheduling in the Fair Scheduler > > > Key: YARN-1969 > URL: https://issues.apache.org/jira/browse/YARN-1969 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Maysam Yabandeh >Assignee: Maysam Yabandeh > > What we are observing is that some big jobs with many allocated containers > are waiting for a few containers to finish. Under *fair-share scheduling* > however they have a low priority since there are other jobs (usually much > smaller, newcomers) that are using resources way below their fair share, > hence newly released containers are not offered to the big, yet > close-to-be-finished job. Nevertheless, everybody would benefit from an > "unfair" scheduling that offers the resource to the big job since the sooner > the big job finishes, the sooner it releases its "many" allocated resources > to be used by other jobs. In other words, what we require is a kind of > variation of *Earliest Deadline First scheduling*, that takes into account > the number of already-allocated resources and estimated time to finish. > http://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling > For example, if a job is using MEM GB of memory and is expected to finish in > TIME minutes, the priority in scheduling would be a function p of (MEM, > TIME). The expected time to finish can be estimated by the AppMaster using > TaskRuntimeEstimator#estimatedRuntime and be supplied to RM in the resource > request messages. To be less susceptible to the issue of apps gaming the > system, we can have this scheduling limited to *only within a queue*: i.e., > adding an EarliestDeadlinePolicy that extends SchedulingPolicy and letting the queues > use it by setting the "schedulingPolicy" field. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1969) Earliest Deadline First Scheduling
[ https://issues.apache.org/jira/browse/YARN-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975776#comment-13975776 ] Maysam Yabandeh commented on YARN-1969: --- An example of this behavior is when a job preempts its reducers to free space for its mappers. The freed space is, however, first offered to the app that has already made a reservation on the node. Then it is offered to the queues that are using less than their fair share, then to the queue to which the app belongs, and at the end it is offered to the app that released the resource in the first place. Note that preemption is just one example, and we observe similar inefficiencies when preemption is not involved. There are already open jiras that could alleviate the problem, e.g., if YARN-1197 is finished the MRAppMaster can reuse the reducer's container instead of returning it to RM. Or YARN-1404 would allow for a more flexible scheduling for individual apps. Nevertheless, it seems to us that augmenting the fair scheduler to take such priorities into account addresses the problem in a more general fashion. I would highly appreciate your feedback. > Earliest Deadline First Scheduling > -- > > Key: YARN-1969 > URL: https://issues.apache.org/jira/browse/YARN-1969 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Maysam Yabandeh >Assignee: Maysam Yabandeh > > What we are observing is that some big jobs with many allocated containers > are waiting for a few containers to finish. Under *fair-share scheduling* > however they have a low priority since there are other jobs (usually much > smaller, newcomers) that are using resources way below their fair share, > hence newly released containers are not offered to the big, yet > close-to-be-finished job. 
Nevertheless, everybody would benefit from an > "unfair" scheduling that offers the resource to the big job since the sooner > the big job finishes, the sooner it releases its "many" allocated resources > to be used by other jobs. In other words, what we require is a kind of > variation of *Earliest Deadline First scheduling*, that takes into account > the number of already-allocated resources and estimated time to finish. > http://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling > For example, if a job is using MEM GB of memory and is expected to finish in > TIME minutes, the priority in scheduling would be a function p of (MEM, > TIME). The expected time to finish can be estimated by the AppMaster using > TaskRuntimeEstimator#estimatedRuntime and be supplied to RM in the resource > request messages. To be less susceptible to the issue of apps gaming the > system, we can have this scheduling limited to *only within a queue*: i.e., > adding an EarliestDeadlinePolicy that extends SchedulingPolicy and letting the queues > use it by setting the "schedulingPolicy" field. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1969) Earliest Deadline First Scheduling
Maysam Yabandeh created YARN-1969: - Summary: Earliest Deadline First Scheduling Key: YARN-1969 URL: https://issues.apache.org/jira/browse/YARN-1969 Project: Hadoop YARN Issue Type: Improvement Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh What we are observing is that some big jobs with many allocated containers are waiting for a few containers to finish. Under *fair-share scheduling* however they have a low priority since there are other jobs (usually much smaller, newcomers) that are using resources way below their fair share, hence newly released containers are not offered to the big, yet close-to-be-finished job. Nevertheless, everybody would benefit from an "unfair" scheduling that offers the resource to the big job since the sooner the big job finishes, the sooner it releases its "many" allocated resources to be used by other jobs. In other words, what we require is a kind of variation of *Earliest Deadline First scheduling*, that takes into account the number of already-allocated resources and estimated time to finish. http://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling For example, if a job is using MEM GB of memory and is expected to finish in TIME minutes, the priority in scheduling would be a function p of (MEM, TIME). The expected time to finish can be estimated by the AppMaster using TaskRuntimeEstimator#estimatedRuntime and be supplied to RM in the resource request messages. To be less susceptible to the issue of apps gaming the system, we can have this scheduling limited to *only within a queue*: i.e., adding an EarliestDeadlinePolicy that extends SchedulingPolicy and letting the queues use it by setting the "schedulingPolicy" field. -- This message was sent by Atlassian JIRA (v6.2#6252)
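The description leaves the priority function p(MEM, TIME) unspecified. As a rough illustration only, the sketch below assumes p = MEM / TIME (memory that would be released per remaining minute), which is one possible choice and is not stated in the JIRA. `EdfSketch`, `App`, `priority`, and `order` are hypothetical names; nothing here uses the real SchedulingPolicy API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative ordering of apps by an assumed p(MEM, TIME); not FairScheduler code. */
final class EdfSketch {
  private EdfSketch() {}

  /** Hypothetical view of an app: memory held (GB) and estimated minutes to finish. */
  static final class App {
    final String name;
    final double memGb;
    final double minutesLeft;

    App(String name, double memGb, double minutesLeft) {
      this.name = name;
      this.memGb = memGb;
      this.minutesLeft = minutesLeft;
    }
  }

  /** Assumed form of p: GB released per remaining minute. The floor avoids division by zero. */
  static double priority(App app) {
    return app.memGb / Math.max(app.minutesLeft, 1e-3);
  }

  /** Returns app names from highest to lowest assumed priority. */
  static List<String> order(List<App> apps) {
    List<App> sorted = new ArrayList<>(apps);
    sorted.sort(Comparator.<App>comparingDouble(EdfSketch::priority).reversed());
    List<String> names = new ArrayList<>();
    for (App app : sorted) {
      names.add(app.name);
    }
    return names;
  }
}
```

Under this assumed p, a job holding 100 GB with 2 minutes left (p = 50) is served before a job holding 4 GB with 1 minute left (p = 4). Any real policy would also need to bound how long small jobs can be starved, which this sketch does not attempt.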
[jira] [Updated] (YARN-1968) YARN Admin service should have more fine-grained ACL which is based on mapping of users with methods/operations.
[ https://issues.apache.org/jira/browse/YARN-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1968: - Description: AdminService's operation today have different dimensions of management, some are on user management while others are on cluster management, etc. Today, we only check if user belongs to some authorized group to see if he can execute operations in admin service. The result is who can either execute all operations or none which is a simple strategy but not very precisely so we cannot separate different management roles to several admins. We may need more fine-grained ACLs which can authorized user with partial operations in AdminService. was: AdminService's operation today have different dimensions of management, some is on user management while other is on cluster management. Today, we only check if user belongs to some authorized group to see if he can execute operations in admin service. The result is he can either execute all operations or none which is a simple strategy but not very precisely. We may need more fine-grained ACLs which can authorized user with partial operations in AdminService. > YARN Admin service should have more fine-grained ACL which is based on > mapping of users with methods/operations. > > > Key: YARN-1968 > URL: https://issues.apache.org/jira/browse/YARN-1968 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Junping Du > > AdminService's operation today have different dimensions of management, some > are on user management while others are on cluster management, etc. > Today, we only check if user belongs to some authorized group to see if he > can execute operations in admin service. The result is who can either execute > all operations or none which is a simple strategy but not very precisely so > we cannot separate different management roles to several admins. We may need > more fine-grained ACLs which can authorized user with partial operations in > AdminService. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1968) YARN Admin service should have more fine-grained ACL which is based on mapping of users with methods/operations.
Junping Du created YARN-1968: Summary: YARN Admin service should have more fine-grained ACL which is based on mapping of users with methods/operations. Key: YARN-1968 URL: https://issues.apache.org/jira/browse/YARN-1968 Project: Hadoop YARN Issue Type: Improvement Reporter: Junping Du AdminService's operation today have different dimensions of management, some is on user management while other is on cluster management. Today, we only check if user belongs to some authorized group to see if he can execute operations in admin service. The result is he can either execute all operations or none which is a simple strategy but not very precisely. We may need more fine-grained ACLs which can authorized user with partial operations in AdminService. -- This message was sent by Atlassian JIRA (v6.2#6252)
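One way to picture the per-operation ACL proposed here is a map from operation name to the set of users allowed to invoke it. The sketch below is an assumption-laden illustration: `OperationAclSketch`, `allow`, and `isAllowed` are hypothetical names, the user names are made up, and it does not reflect how AdminService checks access today or how the JIRA would eventually be implemented.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative per-operation ACL: operation name -> users allowed to call it. */
final class OperationAclSketch {
  private final Map<String, Set<String>> allowedUsersByOperation = new HashMap<>();

  /** Grants the given users access to a single admin operation. */
  void allow(String operation, String... users) {
    allowedUsersByOperation
        .computeIfAbsent(operation, key -> new HashSet<>())
        .addAll(Arrays.asList(users));
  }

  /** Returns true only if the user was explicitly granted this operation (default deny). */
  boolean isAllowed(String user, String operation) {
    return allowedUsersByOperation
        .getOrDefault(operation, Collections.<String>emptySet())
        .contains(user);
  }
}
```

With such a mapping, one admin could be granted a cluster-management operation such as refreshQueues while another is granted only a user-management operation such as refreshUserToGroupsMappings, which is the separation of roles the description asks for.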
[jira] [Commented] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975708#comment-13975708 ] Zhijie Shen commented on YARN-1962: --- [~kamrul], thanks for the patch. I have some comments on it:
1. You need to change the default in yarn-default.xml as well.
{code}
yarn.timeline-service.enabled true
{code}
2. By doing this, you probably need to update other test cases to make the timeline client enabled. Please search through all the calls of TimelineClient.
> Timeline server is enabled by default > - > > Key: YARN-1962 > URL: https://issues.apache.org/jira/browse/YARN-1962 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: YARN-1962.1.patch > > > Since Timeline server is not matured and secured yet, enabling it by default > might create some confusion. > We were playing with 2.4.0 and found a lot of exceptions for distributed > shell example related to connection refused error. Btw, we didn't run TS > because it is not secured yet. > Although it is possible to explicitly turn it off through yarn-site config. > In my opinion, this extra change for this new service is not worthy at this > point,. > This JIRA is to turn it off by default. > If there is an agreement, i can put a simple patch about this. > {noformat} > 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response > from the timeline server. 
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: > Connection refused > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) > Caused by: java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:579) > at java.net.Socket.connect(Socket.java:528) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient. impl.TimelineClientImpl: Failed to get the response from the timeline server. 
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: > Connection refused > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) > Caused by: java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketCon
[jira] [Commented] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975687#comment-13975687 ] Hadoop QA commented on YARN-1954: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641071/YARN-1954.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3601//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3601//console This message is automatically generated. > Add waitFor to AMRMClient(Async) > > > Key: YARN-1954 > URL: https://issues.apache.org/jira/browse/YARN-1954 > Project: Hadoop YARN > Issue Type: New Feature > Components: client >Affects Versions: 3.0.0, 2.4.0 >Reporter: Zhijie Shen >Assignee: Tsuyoshi OZAWA > Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch > > > Recently, I saw some use cases of AMRMClient(Async). 
The painful thing is > that the main non-daemon thread has to sit in a dummy loop to keep the AM > process from exiting before all the tasks are done, while unregistration is > triggered on a separate daemon thread by callback methods (in > particular when using AMRMClientAsync). IMHO, it would be beneficial to add > a waitFor method to AMRMClient(Async) to block the AM until unregistration or > a user-supplied check point, so that users don't need to write the loop > themselves. -- This message was sent by Atlassian JIRA (v6.2#6252)
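The waitFor idea described above can be sketched in plain Java. This is only a minimal illustration under stated assumptions, not the API from the attached patches: the class name WaitForSketch, the method signature, and the polling interval are all made up for this example, and no YARN classes are used.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class WaitForSketch {

    /**
     * Hypothetical helper: blocks the calling thread until the user-supplied
     * check returns true, polling every intervalMillis milliseconds.
     * Returns false if the thread is interrupted while waiting.
     */
    public static boolean waitFor(BooleanSupplier check, long intervalMillis) {
        while (!check.getAsBoolean()) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and report that the wait was cut short.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stands in for "unregistration has happened"; in an AM this would be
        // set from a callback running on a separate daemon thread.
        AtomicBoolean unregistered = new AtomicBoolean(false);

        Thread callback = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            unregistered.set(true);
        });
        callback.setDaemon(true);
        callback.start();

        // The main non-daemon thread blocks here instead of in a hand-written loop.
        boolean completed = waitFor(unregistered::get, 50);
        System.out.println("wait completed normally: " + completed);
    }
}
```

Polling keeps the sketch short. A real implementation might also want a timeout and periodic logging, which is the trade-off discussed in the comments on this issue.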
[jira] [Commented] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975658#comment-13975658 ] Tsuyoshi OZAWA commented on YARN-1954: -- Thank you for the comment, [~zjshen]. I updated the patch: 1. Changed the APIs to wait indefinitely. 2. Added logging in the main loop. I am concerned that logging too often could add overhead. Should we add a logging interval as a new parameter? 3. Created YARN-1967 to address this issue. 4. Added AMRMClient support. 5. Added methods to test waitFor() in other methods. > Add waitFor to AMRMClient(Async) > > > Key: YARN-1954 > URL: https://issues.apache.org/jira/browse/YARN-1954 > Project: Hadoop YARN > Issue Type: New Feature > Components: client >Affects Versions: 3.0.0, 2.4.0 >Reporter: Zhijie Shen >Assignee: Tsuyoshi OZAWA > Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch > > > Recently, I saw some use cases of AMRMClient(Async). The painful thing is > that the main non-daemon thread has to sit in a dummy loop to keep the AM > process from exiting before all the tasks are done, while unregistration is > triggered on a separate daemon thread by callback methods (in > particular when using AMRMClientAsync). IMHO, it would be beneficial to add > a waitFor method to AMRMClient(Async) to block the AM until unregistration or > a user-supplied check point, so that users don't need to write the loop > themselves. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1954: - Attachment: YARN-1954.3.patch > Add waitFor to AMRMClient(Async) > > > Key: YARN-1954 > URL: https://issues.apache.org/jira/browse/YARN-1954 > Project: Hadoop YARN > Issue Type: New Feature > Components: client >Affects Versions: 3.0.0, 2.4.0 >Reporter: Zhijie Shen >Assignee: Tsuyoshi OZAWA > Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch > > > Recently, I saw some use cases of AMRMClient(Async). The painful thing is > that the main non-daemon thread has to sit in a dummy loop to keep the AM > process from exiting before all the tasks are done, while unregistration is > triggered on a separate daemon thread by callback methods (in > particular when using AMRMClientAsync). IMHO, it would be beneficial to add > a waitFor method to AMRMClient(Async) to block the AM until unregistration or > a user-supplied check point, so that users don't need to write the loop > themselves. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1964) Support Docker containers in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975610#comment-13975610 ] Tsuyoshi OZAWA commented on YARN-1964: -- Do you mean we will have DockerContainerExecutor instead of LinuxContainerExecutor? > Support Docker containers in YARN > - > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun C Murthy >Assignee: Arun C Murthy > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1964) Support Docker containers in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975599#comment-13975599 ] jay vyas commented on YARN-1964: This takes us away from the Java idiom of packaging apps as jars. It's a pretty bold step, so let me play devil's advocate, because I think it might not be the best idea. 1) Tying YARN's JVM NodeManagers to LCEs adds a new dependency to the YARN stack and adds new building/compiling/maintenance costs. 2) NodeManagers are quite commonly run in Linux containers. This could lead to containers launching NMs, which again launch containers. Seems kind of yucky, don't you think? 3) It forces YARN to be aware of Linux containers. This might encourage the creation of application code that doesn't run easily outside of a containerized environment. As awesome as LCEs are, most java and bigdata apps running in YARN with the hadoop . > Support Docker containers in YARN > - > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun C Murthy >Assignee: Arun C Murthy > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-1966) Capacity Scheduler acl_submit_applications in Leaf Queue finally considers root queue default always
[ https://issues.apache.org/jira/browse/YARN-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-1966. -- Resolution: Duplicate This is a duplicate of YARN-1269 and related to YARN-1941 and YARN-1951. In any case I don't think we want to special-case the root queue, as the same issue could exist in a subtree where access to the subtree root allows access to any queue within the subtree. Actually I believe this is by design. It allows admins to configure access to an entire subtree of queues by giving access to the root of the subtree rather than having to add the access to each leaf queue. So for your example above you'll want to set the root queue's ACLs to be empty so that one must have access to the leaf queue in order to submit. > Capacity Scheduler acl_submit_applications in Leaf Queue finally considers > root queue default always > > > Key: YARN-1966 > URL: https://issues.apache.org/jira/browse/YARN-1966 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Sunil G > Attachments: Yarn-1966.1.patch > > > Given with below configurations, > > yarn.scheduler.capacity.root.queues > fast,medium > > > yarn.scheduler.capacity.root.fast.acl_submit_applications > hadoop > > > yarn.scheduler.capacity.root.slow.acl_submit_applications > hadoop > > In this case, the expectation is like "hadoop" user can only submit job to > "fast" or "slow" queue. > But now any user can submit job to these queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
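The subtree behaviour described in this resolution can be illustrated with a small stand-alone model. This is a hedged sketch only: the Queue class below is hypothetical and is not the CapacityScheduler code, the "*" wildcard stands for the default root ACL mentioned in this thread, and an empty user list stands for a root ACL that grants access to nobody.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class QueueAclSketch {

    /** Hypothetical queue node with a submit ACL and an optional parent. */
    public static final class Queue {
        private final Queue parent;
        private final Set<String> submitters;

        public Queue(Queue parent, String... submitters) {
            this.parent = parent;
            this.submitters = new HashSet<>(Arrays.asList(submitters));
        }

        /**
         * A user may submit if this queue's ACL allows it, or if any
         * ancestor's ACL allows it (access to a subtree root covers the subtree).
         */
        public boolean hasAccess(String user) {
            if (submitters.contains("*") || submitters.contains(user)) {
                return true;
            }
            return parent != null && parent.hasAccess(user);
        }
    }

    public static void main(String[] args) {
        // Root left at the wildcard default: the leaf ACL is effectively ignored.
        Queue openRoot = new Queue(null, "*");
        Queue fastUnderOpenRoot = new Queue(openRoot, "hadoop");
        System.out.println(fastUnderOpenRoot.hasAccess("alice")); // true

        // Root ACL emptied: only users listed on the leaf can submit.
        Queue closedRoot = new Queue(null);
        Queue fast = new Queue(closedRoot, "hadoop");
        System.out.println(fast.hasAccess("alice"));  // false
        System.out.println(fast.hasAccess("hadoop")); // true
    }
}
```

The second half of main mirrors the suggested remedy: with nothing granted at the root, the leaf queue's own list decides who can submit.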
[jira] [Created] (YARN-1967) onShutdownRequest() is not called when AMRMClientAsyncImpl#unregisterApplicationMaster() is called
Tsuyoshi OZAWA created YARN-1967: Summary: onShutdownRequest() is not called when AMRMClientAsyncImpl#unregisterApplicationMaster() is called Key: YARN-1967 URL: https://issues.apache.org/jira/browse/YARN-1967 Project: Hadoop YARN Issue Type: Bug Reporter: Tsuyoshi OZAWA While working on YARN-1954, I found that onShutdownRequest() is not called when AMRMClientAsyncImpl#unregisterApplicationMaster() is called. Should we fix it by calling onShutdownRequest() or by adding a new hook (onUnregisteredApplicationRequest) to CallbackHandler? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-1657) Timeout occurs in TestNMClient
[ https://issues.apache.org/jira/browse/YARN-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA resolved YARN-1657. - Resolution: Cannot Reproduce > Timeout occurs in TestNMClient > -- > > Key: YARN-1657 > URL: https://issues.apache.org/jira/browse/YARN-1657 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 3.0.0 >Reporter: Akira AJISAKA > > A timeout occurs in TestNMClient when a patch is tested by Jenkins. > The following comment can be seen in YARN-1480, YARN-1611, and YARN-888. > {code} > {color:red}-1 core tests{color}. The following test timeouts occurred in > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: > org.apache.hadoop.yarn.client.api.impl.TestNMClient > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1657) Timeout occurs in TestNMClient
[ https://issues.apache.org/jira/browse/YARN-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975543#comment-13975543 ] Akira AJISAKA commented on YARN-1657: - Okay, closing this JIRA. I'll reopen this if the timeout occurs again. > Timeout occurs in TestNMClient > -- > > Key: YARN-1657 > URL: https://issues.apache.org/jira/browse/YARN-1657 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 3.0.0 >Reporter: Akira AJISAKA > > A timeout occurs in TestNMClient when a patch is tested by Jenkins. > The following comment can be seen in YARN-1480, YARN-1611, and YARN-888. > {code} > {color:red}-1 core tests{color}. The following test timeouts occurred in > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: > org.apache.hadoop.yarn.client.api.impl.TestNMClient > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1657) Timeout occurs in TestNMClient
[ https://issues.apache.org/jira/browse/YARN-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975542#comment-13975542 ] Tsuyoshi OZAWA commented on YARN-1657: -- I cannot reproduce the problem either. [~ajisakaa], how about closing this JIRA and reopening it if we hit the problem again? > Timeout occurs in TestNMClient > -- > > Key: YARN-1657 > URL: https://issues.apache.org/jira/browse/YARN-1657 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 3.0.0 >Reporter: Akira AJISAKA > > A timeout occurs in TestNMClient when a patch is tested by Jenkins. > The following comment can be seen in YARN-1480, YARN-1611, and YARN-888. > {code} > {color:red}-1 core tests{color}. The following test timeouts occurred in > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: > org.apache.hadoop.yarn.client.api.impl.TestNMClient > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975541#comment-13975541 ] Tsuyoshi OZAWA commented on YARN-556: - [~adhoot], I glanced over your patch. 1. Can you split your code across the individual subtasks? Your patch includes all the changes for this task. We should discuss the finer points on each subtask JIRA. 2. IMO, the prototype is enough to validate the design. Do you have any additional comments about the design docs? I'd like to include this feature in 2.5.0 (maybe May - June?), so let's work together :-) > RM Restart phase 2 - Work preserving restart > > > Key: YARN-556 > URL: https://issues.apache.org/jira/browse/YARN-556 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Bikas Saha > Attachments: Work Preserving RM Restart.pdf, > WorkPreservingRestartPrototype.001.patch > > > YARN-128 covered storing the state needed for the RM to recover critical > information. This umbrella jira will track changes needed to recover the > running state of the cluster so that work can be preserved across RM restarts. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1966) Capacity Scheduler acl_submit_applications in Leaf Queue finally considers root queue default always
[ https://issues.apache.org/jira/browse/YARN-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-1966: -- Attachment: Yarn-1966.1.patch The issue occurs because, when an application is submitted, the hasAccess() check always reaches ParentQueue::hasAccess(). Eventually the parent is "root", and the check passes there (* is the default for acl_submit_applications and acl_administer_queue in the root queue). So, to ensure that only one specified user can submit jobs to a leaf queue, the configurations below are mandatory in "root": root.acl_submit_applications root.acl_administer_queue For submitting a job, the acl_administer_queue check has no relevance, but we are forced to configure it as well to achieve what the problem statement of this issue describes. Also, if each leaf queue wants its own set of users, all of those users ultimately have to be listed in root. This is not good. So it is better to skip the hasAccess() check if the parent queue is "root", as below: if(rootQueue){ return false; } > Capacity Scheduler acl_submit_applications in Leaf Queue finally considers > root queue default always > > > Key: YARN-1966 > URL: https://issues.apache.org/jira/browse/YARN-1966 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Sunil G > Attachments: Yarn-1966.1.patch > > > Given with below configurations, > > yarn.scheduler.capacity.root.queues > fast,medium > > > yarn.scheduler.capacity.root.fast.acl_submit_applications > hadoop > > > yarn.scheduler.capacity.root.slow.acl_submit_applications > hadoop > > In this case, the expectation is like "hadoop" user can only submit job to > "fast" or "slow" queue. > But now any user can submit job to these queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1966) Capacity Scheduler acl_submit_applications in Leaf Queue finally considers root queue default always
Sunil G created YARN-1966: - Summary: Capacity Scheduler acl_submit_applications in Leaf Queue finally considers root queue default always Key: YARN-1966 URL: https://issues.apache.org/jira/browse/YARN-1966 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Given the configuration below: <property> <name>yarn.scheduler.capacity.root.queues</name> <value>fast,medium</value> </property> <property> <name>yarn.scheduler.capacity.root.fast.acl_submit_applications</name> <value>hadoop</value> </property> <property> <name>yarn.scheduler.capacity.root.slow.acl_submit_applications</name> <value>hadoop</value> </property> In this case, the expectation is that only the "hadoop" user can submit jobs to the "fast" or "slow" queue. But currently any user can submit jobs to these queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975504#comment-13975504 ] Hadoop QA commented on YARN-1962: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641040/YARN-1962.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3600//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3600//console This message is automatically generated. > Timeline server is enabled by default > - > > Key: YARN-1962 > URL: https://issues.apache.org/jira/browse/YARN-1962 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: YARN-1962.1.patch > > > Since Timeline server is not matured and secured yet, enabling it by default > might create some confusion. 
> We were playing with 2.4.0 and found a lot of exceptions for distributed > shell example related to connection refused error. Btw, we didn't run TS > because it is not secured yet. > Although it is possible to explicitly turn it off through yarn-site config. > In my opinion, this extra change for this new service is not worthy at this > point,. > This JIRA is to turn it off by default. > If there is an agreement, i can put a simple patch about this. > {noformat} > 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response > from the timeline server. > com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: > Connection refused > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) > Caused by: java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) > at > 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:579) > at java.net.Socket.connect(Socket.java:528) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient. impl.TimelineClientImpl: Failed to get the response from the timeline server. > com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: > Connection refused > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) >
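One way to avoid the repeated "Connection refused" errors shown above is for the client to check an enabled flag before posting anything to the timeline server. The sketch below is an assumption-laden illustration, not the distributed shell code: java.util.Properties stands in for the Hadoop configuration object, the class and method names are hypothetical, and the default of false reflects what this JIRA proposes rather than what any particular release ships. The only name taken from YARN is the property key yarn.timeline-service.enabled.

```java
import java.util.Properties;

public class TimelineGuardSketch {

    // Property key that turns the timeline service on or off in yarn-site config.
    static final String TIMELINE_ENABLED_KEY = "yarn.timeline-service.enabled";

    /** Reads the flag; an absent value is treated as "disabled" in this sketch. */
    public static boolean timelineEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(TIMELINE_ENABLED_KEY, "false"));
    }

    /**
     * Hypothetical publish step: skips the call entirely when the service is
     * disabled, so no connection attempt (and no stack trace) is produced.
     */
    public static String publishEvent(Properties conf, String event) {
        if (!timelineEnabled(conf)) {
            return "skipped: " + event;
        }
        // A real client would post the entity to the timeline server here.
        return "posted: " + event;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(publishEvent(conf, "attempt-start"));

        conf.setProperty(TIMELINE_ENABLED_KEY, "true");
        System.out.println(publishEvent(conf, "attempt-start"));
    }
}
```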
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975499#comment-13975499 ] Tsuyoshi OZAWA commented on YARN-1879: -- [~xgong] and [~jianhe], do you have additional comments or opinions? Let me know if anything is unclear. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, > YARN-1879.4.patch, YARN-1879.5.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima updated YARN-313: - Attachment: YARN-313-sample.patch Hi, I implemented a refreshResources option as a prototype. I know this ticket is already assigned. My aim is to help move it forward, not to interfere. Please refer to this patch if you are interested. Thanks. - This patch contains RefreshResourcesRequest and RefreshResourcesResponse, which let a client call updateNodeResource indirectly. - The DynamicResourceConfiguration class and dynamic-resources.xml are introduced to configure resource options. > Add Admin API for supporting node resource configuration in command line > > > Key: YARN-313 > URL: https://issues.apache.org/jira/browse/YARN-313 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-313-sample.patch > > > We should provide some admin interface, e.g. "yarn rmadmin -refreshResources" > to support changes of node's resource specified in a config file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-996) REST API support for node resource configuration
[ https://issues.apache.org/jira/browse/YARN-996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima updated YARN-996: - Attachment: YARN-996-2.patch Updated for YARN-1949. (Added rm.start() in tests.) > REST API support for node resource configuration > > > Key: YARN-996 > URL: https://issues.apache.org/jira/browse/YARN-996 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, scheduler >Reporter: Junping Du >Assignee: Kenji Kikushima > Attachments: YARN-996-2.patch, YARN-996-sample.patch > > > Besides admin protocol and CLI, REST API should also be supported for node > resource configuration -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated YARN-1962: Attachment: YARN-1962.1.patch Thanks [~vinodkv]. Patch added > Timeline server is enabled by default > - > > Key: YARN-1962 > URL: https://issues.apache.org/jira/browse/YARN-1962 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: YARN-1962.1.patch > > > Since Timeline server is not matured and secured yet, enabling it by default > might create some confusion. > We were playing with 2.4.0 and found a lot of exceptions for distributed > shell example related to connection refused error. Btw, we didn't run TS > because it is not secured yet. > Although it is possible to explicitly turn it off through yarn-site config. > In my opinion, this extra change for this new service is not worthy at this > point,. > This JIRA is to turn it off by default. > If there is an agreement, i can put a simple patch about this. > {noformat} > 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response > from the timeline server. 
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: > Connection refused > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) > Caused by: java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:579) > at java.net.Socket.connect(Socket.java:528) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient. impl.TimelineClientImpl: Failed to get the response from the timeline server. 
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: > Connection refused > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) > Caused by: java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) >