[jira] [Commented] (YARN-1917) Add waitForApplicationState interface to YarnClient
[ https://issues.apache.org/jira/browse/YARN-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975428#comment-13975428 ] Hadoop QA commented on YARN-1917: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641019/YARN-1917.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3599//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3599//console This message is automatically generated. Add waitForApplicationState interface to YarnClient - Key: YARN-1917 URL: https://issues.apache.org/jira/browse/YARN-1917 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-1917.patch, YARN-1917.patch, YARN-1917.patch Currently, YARN doesn't have this method.
Users need to write implementations like UnmanagedAMLauncher.monitorApplication or mapreduce.Job.monitorAndPrintJob on their own. This feature should be helpful to end users. -- This message was sent by Atlassian JIRA (v6.2#6252)
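For illustration only, the kind of loop users currently write by hand might look like the sketch below. It is not the attached patch: AppState, StateWaiter and waitForState are hypothetical names, and the state source is a plain Supplier so the example runs without a cluster.

```java
import java.util.Optional;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for an application state; not YARN's own enum.
enum AppState { SUBMITTED, RUNNING, FINISHED, FAILED, KILLED }

final class StateWaiter {
    private StateWaiter() {}

    /**
     * Polls stateSource until it reports one of the target states or the
     * timeout elapses. Returns the state reached, or empty on timeout or
     * if the waiting thread is interrupted.
     */
    static Optional<AppState> waitForState(Supplier<AppState> stateSource,
                                           Set<AppState> targets,
                                           long timeoutMillis,
                                           long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            AppState current = stateSource.get();
            if (targets.contains(current)) {
                return Optional.of(current);
            }
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                return Optional.empty();
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for callers and stop waiting.
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
    }
}
```

With a real client, the supplier would presumably wrap a call that fetches the application report and handles its checked exceptions; that wiring is left out here.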
[jira] [Updated] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated YARN-1962: Attachment: YARN-1962.1.patch Thanks [~vinodkv]. Patch added. Timeline server is enabled by default - Key: YARN-1962 URL: https://issues.apache.org/jira/browse/YARN-1962 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: YARN-1962.1.patch Since the Timeline server is not yet mature or secured, enabling it by default might cause confusion. We were experimenting with 2.4.0 and saw many connection-refused exceptions in the distributed shell example. Btw, we didn't run TS because it is not secured yet. It is possible to turn it off explicitly through the yarn-site config, but in my opinion requiring this extra change for a new service is not worthwhile at this point. This JIRA is to turn it off by default. If there is agreement, I can put up a simple patch for this. {noformat} 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server.
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) at com.sun.jersey.api.client.Client.handle(Client.java:648) at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) Caused by: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at java.net.Socket.connect(Socket.java:528) at sun.net.NetworkClient.doConnect(NetworkClient.java:180) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.in14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server. 
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) at com.sun.jersey.api.client.Client.handle(Client.java:648) at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) Caused by: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975499#comment-13975499 ] Tsuyoshi OZAWA commented on YARN-1879: -- [~xgong] and [~jianhe], do you have any additional comments or opinions? Let me know if anything is unclear. Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol --- Key: YARN-1879 URL: https://issues.apache.org/jira/browse/YARN-1879 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Tsuyoshi OZAWA Priority: Critical Attachments: YARN-1879.1.patch, YARN-1879.1.patch, YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch
[jira] [Commented] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975504#comment-13975504 ] Hadoop QA commented on YARN-1962: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641040/YARN-1962.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3600//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3600//console This message is automatically generated. Timeline server is enabled by default - Key: YARN-1962 URL: https://issues.apache.org/jira/browse/YARN-1962 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: YARN-1962.1.patch Since the Timeline server is not yet mature or secured, enabling it by default might cause confusion. We were experimenting with 2.4.0 and saw many connection-refused exceptions in the distributed shell example.
Btw, we didn't run TS because it is not secured yet. It is possible to turn it off explicitly through the yarn-site config, but in my opinion requiring this extra change for a new service is not worthwhile at this point. This JIRA is to turn it off by default. If there is agreement, I can put up a simple patch for this.
[jira] [Updated] (YARN-1966) Capacity Scheduler acl_submit_applications in Leaf Queue finally considers root queue default always
[ https://issues.apache.org/jira/browse/YARN-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-1966: -- Attachment: Yarn-1966.1.patch The issue happens because, when an application is submitted, the hasAccess() check always reaches ParentQueue::hasAccess(). The chain eventually reaches the root queue, which passes the check (* is the default for acl_submit_applications and acl_administer_queue in the root queue). So, to ensure that only one specified user can submit jobs to a leaf queue, the following must be configured on root: root.acl_submit_applications and root.acl_administer_queue. The acl_administer_queue check has no relevance to submitting a job, but we are forced to configure it as well to achieve what the problem statement of this issue describes. Also, if each leaf queue wants its own set of users, all of those users must ultimately be listed in root. This is not good. So it is better to skip the hasAccess() check if the parent queue is root, as below: if(rootQueue){ return false; } Capacity Scheduler acl_submit_applications in Leaf Queue finally considers root queue default always Key: YARN-1966 URL: https://issues.apache.org/jira/browse/YARN-1966 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Attachments: Yarn-1966.1.patch Given the configuration below, <property> <name>yarn.scheduler.capacity.root.queues</name> <value>fast,medium</value> </property> <property> <name>yarn.scheduler.capacity.root.fast.acl_submit_applications</name> <value>hadoop</value> </property> <property> <name>yarn.scheduler.capacity.root.slow.acl_submit_applications</name> <value>hadoop</value> </property> the expectation is that only the hadoop user can submit jobs to the fast or slow queue. But currently any user can submit jobs to these queues.
[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975541#comment-13975541 ] Tsuyoshi OZAWA commented on YARN-556: - [~adhoot], I glanced over your patch. 1. Can you split your code by subtask? Your patch includes the overall changes for this task. We should discuss the finer points on each subtask JIRA. 2. IMO, a prototype is enough to validate the design. Do you have any additional comments about the design docs? I'd like to include this feature in 2.5.0 (maybe May - June?), so let's work together :-) RM Restart phase 2 - Work preserving restart Key: YARN-556 URL: https://issues.apache.org/jira/browse/YARN-556 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Bikas Saha Assignee: Bikas Saha Attachments: Work Preserving RM Restart.pdf, WorkPreservingRestartPrototype.001.patch YARN-128 covered storing the state needed for the RM to recover critical information. This umbrella jira will track changes needed to recover the running state of the cluster so that work can be preserved across RM restarts.
[jira] [Resolved] (YARN-1657) Timeout occurs in TestNMClient
[ https://issues.apache.org/jira/browse/YARN-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA resolved YARN-1657. - Resolution: Cannot Reproduce Timeout occurs in TestNMClient -- Key: YARN-1657 URL: https://issues.apache.org/jira/browse/YARN-1657 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0 Reporter: Akira AJISAKA A timeout occurs in TestNMClient when a patch is tested by Jenkins. The following comment can be seen in YARN-1480, YARN-1611, and YARN-888. {code} {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.impl.TestNMClient {code}
[jira] [Created] (YARN-1967) onShutdownRequest() is not called when AMRMClientAsyncImpl#unregisterApplicationMaster() is called
Tsuyoshi OZAWA created YARN-1967: Summary: onShutdownRequest() is not called when AMRMClientAsyncImpl#unregisterApplicationMaster() is called Key: YARN-1967 URL: https://issues.apache.org/jira/browse/YARN-1967 Project: Hadoop YARN Issue Type: Bug Reporter: Tsuyoshi OZAWA While working on YARN-1954, I found that onShutdownRequest() is not called when AMRMClientAsyncImpl#unregisterApplicationMaster() is called. Should we fix it by calling onShutdownRequest() or by adding a new hook (onUnregisteredApplicationRequest) to CallbackHandler?
[jira] [Resolved] (YARN-1966) Capacity Scheduler acl_submit_applications in Leaf Queue finally considers root queue default always
[ https://issues.apache.org/jira/browse/YARN-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-1966. -- Resolution: Duplicate This is a duplicate of YARN-1269 and related to YARN-1941 and YARN-1951. In any case I don't think we want to special-case the root queue, as the same issue could exist in a subtree where access to the subtree root allows access to any queue within the subtree. Actually I believe this is by design. It allows admins to configure access to an entire subtree of queues by giving access to the root of the subtree rather than having to add the access to each leaf queue. So for your example above you'll want to set the root queue's ACLs to be empty so that one must have access to the leaf queue in order to submit. Capacity Scheduler acl_submit_applications in Leaf Queue finally considers root queue default always Key: YARN-1966 URL: https://issues.apache.org/jira/browse/YARN-1966 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Attachments: Yarn-1966.1.patch Given the configuration below, <property> <name>yarn.scheduler.capacity.root.queues</name> <value>fast,medium</value> </property> <property> <name>yarn.scheduler.capacity.root.fast.acl_submit_applications</name> <value>hadoop</value> </property> <property> <name>yarn.scheduler.capacity.root.slow.acl_submit_applications</name> <value>hadoop</value> </property> the expectation is that only the hadoop user can submit jobs to the fast or slow queue. But currently any user can submit jobs to these queues.
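The subtree behaviour described in this resolution can be modelled in a few lines. The sketch below is a simplified illustration, not the CapacityScheduler code: QueueAcl and canSubmit are hypothetical names, and "*" is assumed to mean every user.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified model of hierarchical queue ACLs; not the CapacityScheduler classes.
final class QueueAcl {
    private final QueueAcl parent;        // null for the root queue
    private final Set<String> submitters; // "*" is assumed to mean every user

    QueueAcl(QueueAcl parent, String... submitters) {
        this.parent = parent;
        this.submitters = new HashSet<>(Arrays.asList(submitters));
    }

    /** Access is granted by this queue's ACL or by any ancestor's ACL. */
    boolean canSubmit(String user) {
        if (submitters.contains("*") || submitters.contains(user)) {
            return true;
        }
        return parent != null && parent.canSubmit(user);
    }
}
```

In this model, a root queue created with "*" lets any user submit to every leaf, while a root created with no submitters leaves the decision to the leaf queue's own list, which matches the advice to empty the root ACLs.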
[jira] [Commented] (YARN-1964) Support Docker containers in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975599#comment-13975599 ] jay vyas commented on YARN-1964: This takes us away from the Java idiom of packaging apps as jars. It's a pretty bold step, so let me play devil's advocate, because I think it might not be the best idea. 1) Tying YARN's JVM NodeManagers to LCEs adds a new dependency to the YARN stack and adds new building/compiling/maintaining costs. 2) NodeManagers are quite commonly run in Linux containers. This could lead to containers launching NMs, which in turn launch containers. Seems kind of yucky, don't you think? 3) It forces YARN to be aware of Linux containers. This might encourage the creation of application code that doesn't run easily outside of a containerized environment. As awesome as LCE's are, most java and bigdata apps running in YARN with the hadoop . Support Docker containers in YARN - Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy Assignee: Arun C Murthy Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In the context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine).
[jira] [Commented] (YARN-1964) Support Docker containers in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975610#comment-13975610 ] Tsuyoshi OZAWA commented on YARN-1964: -- Do you mean we will have DockerContainerExecutor instead of LinuxContainerExecutor? Support Docker containers in YARN - Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy Assignee: Arun C Murthy Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine).
[jira] [Commented] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975687#comment-13975687 ] Hadoop QA commented on YARN-1954: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641071/YARN-1954.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3601//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3601//console This message is automatically generated. Add waitFor to AMRMClient(Async) Key: YARN-1954 URL: https://issues.apache.org/jira/browse/YARN-1954 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 3.0.0, 2.4.0 Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Attachments: YARN-1954.1.patch, YARN-1954.2.patch, YARN-1954.3.patch Recently, I saw some use cases of AMRMClient(Async).
The painful thing is that the main non-daemon thread has to sit in a dummy loop to prevent the AM process from exiting before all the tasks are done, while unregistration is triggered on a separate daemon thread by callback methods (in particular when using AMRMClientAsync). IMHO, it would be beneficial to add a waitFor method to AMRMClient(Async) that blocks the AM until unregistration or a user-supplied check point, so that users don't need to write the loop themselves.
[jira] [Created] (YARN-1968) YARN Admin service should have more fine-grained ACL which is based on mapping of users with methods/operations.
Junping Du created YARN-1968: Summary: YARN Admin service should have more fine-grained ACL which is based on mapping of users with methods/operations. Key: YARN-1968 URL: https://issues.apache.org/jira/browse/YARN-1968 Project: Hadoop YARN Issue Type: Improvement Reporter: Junping Du AdminService's operations today cover different dimensions of management: some concern user management while others concern cluster management. Today, we only check whether the user belongs to an authorized group to decide whether he can execute operations in the admin service. The result is that he can execute either all operations or none, which is a simple strategy but not a very precise one. We may need more fine-grained ACLs that can authorize a user for a subset of the operations in AdminService.
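One way to picture the proposed mapping of users to operations is a per-operation allow list, sketched below. This is only an illustration under assumptions: OperationAcl is a hypothetical class, the operation names are used as plain strings, and nothing here reflects how AdminService would actually implement it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical per-operation ACL; not part of AdminService.
final class OperationAcl {
    private final Map<String, Set<String>> allowedUsersByOperation = new HashMap<>();

    /** Grants one user the right to run one named operation. */
    void allow(String operation, String user) {
        allowedUsersByOperation
                .computeIfAbsent(operation, key -> new HashSet<>())
                .add(user);
    }

    /** Operations with no entry are denied to everyone. */
    boolean isAllowed(String operation, String user) {
        return allowedUsersByOperation
                .getOrDefault(operation, Collections.emptySet())
                .contains(user);
    }
}
```

Compared with a single group check, a user granted only one operation here can run that operation and nothing else.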
[jira] [Updated] (YARN-1969) Earliest Deadline First Scheduling in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1969: - Summary: Earliest Deadline First Scheduling in the Fair Scheduler (was: Earliest Deadline First Scheduling) Earliest Deadline First Scheduling in the Fair Scheduler Key: YARN-1969 URL: https://issues.apache.org/jira/browse/YARN-1969 Project: Hadoop YARN Issue Type: Improvement Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh What we are observing is that some big jobs with many allocated containers are waiting for a few containers to finish. Under *fair-share scheduling*, however, they have a low priority since there are other jobs (usually much smaller newcomers) that are using resources way below their fair share, hence newly released containers are not offered to the big, yet close-to-finished job. Nevertheless, everybody would benefit from an unfair scheduling that offers the resource to the big job, since the sooner the big job finishes, the sooner it releases its many allocated resources to be used by other jobs. In other words, what we require is a kind of variation of *Earliest Deadline First scheduling* that takes into account the number of already-allocated resources and the estimated time to finish. http://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling For example, if a job is using MEM GB of memory and is expected to finish in TIME minutes, the priority in scheduling would be a function p of (MEM, TIME). The expected time to finish can be estimated by the AppMaster using TaskRuntimeEstimator#estimatedRuntime and be supplied to the RM in the resource request messages. To be less susceptible to the issue of apps gaming the system, we can have this scheduling limited to *only within a queue*: i.e., adding an EarliestDeadlinePolicy that extends SchedulingPolicy and letting queues use it by setting the schedulingPolicy field.
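The issue leaves the priority function p(MEM, TIME) unspecified. As a sketch only, the example below assumes p = MEM / TIME (memory released per minute of remaining runtime) and orders jobs by it; that formula, and the names JobEstimate and DeadlineOrdering, are assumptions made for illustration and do not come from the proposal or the Fair Scheduler.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical view of a job: memory held now and estimated time to finish.
final class JobEstimate {
    final String name;
    final double memGb;        // MEM: memory currently allocated, in GB
    final double minutesLeft;  // TIME: estimated minutes until the job finishes

    JobEstimate(String name, double memGb, double minutesLeft) {
        if (minutesLeft <= 0) {
            throw new IllegalArgumentException("minutesLeft must be positive");
        }
        this.name = name;
        this.memGb = memGb;
        this.minutesLeft = minutesLeft;
    }

    /** Assumed p(MEM, TIME) = MEM / TIME; the issue does not fix this formula. */
    double priority() {
        return memGb / minutesLeft;
    }
}

final class DeadlineOrdering {
    private DeadlineOrdering() {}

    /** Returns a copy of the jobs, highest assumed priority first. */
    static List<JobEstimate> byPriority(List<JobEstimate> jobs) {
        List<JobEstimate> sorted = new ArrayList<>(jobs);
        sorted.sort(Comparator.comparingDouble(JobEstimate::priority).reversed());
        return sorted;
    }
}
```

Under this assumed formula, a job holding 100 GB with 2 minutes left outranks a job holding 4 GB with 10 minutes left, which is the behaviour the description asks for.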
[jira] [Commented] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975885#comment-13975885 ] Mohammad Kamrul Islam commented on YARN-1962: - [~zjshen] Thanks for the feedback. I will upload a new patch. Timeline server is enabled by default - Key: YARN-1962 URL: https://issues.apache.org/jira/browse/YARN-1962 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: YARN-1962.1.patch Since the Timeline server is not yet mature or secured, enabling it by default might cause confusion. We were experimenting with 2.4.0 and saw many connection-refused exceptions in the distributed shell example. Btw, we didn't run TS because it is not secured yet. It is possible to turn it off explicitly through the yarn-site config, but in my opinion requiring this extra change for a new service is not worthwhile at this point. This JIRA is to turn it off by default. If there is agreement, I can put up a simple patch for this.
[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976010#comment-13976010 ] Junping Du commented on YARN-313: - Thanks [~kj-ki] for contributing a sample patch here. Although I think some code in the sample patch duplicates what we already have in YARN-312 (the proto stuff on refreshResource), I will check whether some of the code here can be integrated with my patch after YARN-1506 is figured out. Add Admin API for supporting node resource configuration in command line Key: YARN-313 URL: https://issues.apache.org/jira/browse/YARN-313 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Junping Du Assignee: Junping Du Attachments: YARN-313-sample.patch We should provide an admin interface, e.g. yarn rmadmin -refreshResources, to support changes to a node's resources specified in a config file.
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13976056#comment-13976056 ] Jian He commented on YARN-1506: ---
Some comments on the patch, mostly cosmetic changes:
- Since it’s changed to asynchronous, we may change the log to not say *successfully*. {code} LOG.info("Update resource successfully on node(" + node.getNodeID() + ") with resource(" + newResourceOption.toString() + ")"); {code}
- Log inside UpdateNodeResourceWhenNonRunningTransition, good for debugging as this should be an unusual case. UpdateNodeResourceWhenNonRunningTransition -> UpdateNodeResourceWhenNotRunningTransition ?
- IMO, since UpdateNodeResourceWhenUnusableTransition and UpdateNodeResourceWhenNonRunningTransition are the same except for one extra log statement, we can do the logging for both and just keep one transition?
- If possible, the nodeResourceUpdate method can be moved into AbstractYarnScheduler, a new common base class for sharing common code among all the schedulers.
- SchedulerNode.setTotalResource -> SchedulerNode.updateTotalAndAvailableResource() ?
- UpdateNodeResourceResponse should be an abstract class which implements the newInstance() method.
Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event.
-- This message was sent by Atlassian JIRA (v6.2#6252)
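The first review point (an asynchronous update should not be logged as already successful) can be illustrated with a minimal Java sketch. Everything here is hypothetical: the class, the event type and the message wording are made up for illustration and are not the types or text from the YARN-1506 patches. The admin-side call only queues an event, so its message says "requested"; the node state changes later, when the queue is drained.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch; none of these types are the real YARN classes. */
class ResourceUpdateEventSketch {

    /** Event carrying the requested new memory size for one node. */
    static final class NodeResourceUpdateEvent {
        final String nodeId;
        final int memoryMb;

        NodeResourceUpdateEvent(String nodeId, int memoryMb) {
            this.nodeId = nodeId;
            this.memoryMb = memoryMb;
        }
    }

    private final Queue<NodeResourceUpdateEvent> pending = new ArrayDeque<>();
    private final Map<String, Integer> nodeMemoryMb = new HashMap<>();

    /** Admin side: only enqueues the event, so the message must not claim success. */
    String requestUpdate(String nodeId, int memoryMb) {
        pending.add(new NodeResourceUpdateEvent(nodeId, memoryMb));
        return "Update resource requested on node(" + nodeId
            + ") with resource(" + memoryMb + " MB)";
    }

    /** Dispatcher side: applies all queued events and returns how many were applied. */
    int drain() {
        int applied = 0;
        NodeResourceUpdateEvent event;
        while ((event = pending.poll()) != null) {
            nodeMemoryMb.put(event.nodeId, event.memoryMb);
            applied++;
        }
        return applied;
    }

    /** Returns the applied memory size, or null if no update has been applied yet. */
    Integer memoryOf(String nodeId) {
        return nodeMemoryMb.get(nodeId);
    }

    public static void main(String[] args) {
        ResourceUpdateEventSketch rm = new ResourceUpdateEventSketch();
        System.out.println(rm.requestUpdate("node-1", 4096));
        System.out.println("before drain: " + rm.memoryOf("node-1"));
        rm.drain();
        System.out.println("after drain: " + rm.memoryOf("node-1"));
    }
}
```

Between requestUpdate and drain the node still reports its old state, which is why a "successfully" message at request time would be misleading.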
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13976061#comment-13976061 ] Jian He commented on YARN-1506: --- AdminService.updateNodeResource should use RMAuditLogger to log the operations as well. Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated YARN-1962: Attachment: YARN-1962.2.patch Patch with review comments. Timeline server is enabled by default - Key: YARN-1962 URL: https://issues.apache.org/jira/browse/YARN-1962 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: YARN-1962.1.patch, YARN-1962.2.patch Since the Timeline server is not mature and secured yet, enabling it by default might create some confusion. We were playing with 2.4.0 and found a lot of exceptions in the distributed shell example related to connection refused errors. By the way, we didn't run TS because it is not secured yet. It is possible to explicitly turn it off through the yarn-site config, but in my opinion this extra change for this new service is not worthwhile at this point. This JIRA is to turn it off by default. If there is agreement, I can put up a simple patch for this. {noformat} 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server. 
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) at com.sun.jersey.api.client.Client.handle(Client.java:648) at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) Caused by: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at java.net.Socket.connect(Socket.java:528) at sun.net.NetworkClient.doConnect(NetworkClient.java:180) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.in14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server. 
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) at com.sun.jersey.api.client.Client.handle(Client.java:648) at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) Caused by: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at
[jira] [Commented] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13976094#comment-13976094 ] Mohammad Kamrul Islam commented on YARN-1962: - Testing done: 1. Tested on a 2.4.0 cluster of 100 nodes with [~tthompso]. 2. Ran the relevant unit tests, including the new one. Timeline server is enabled by default - Key: YARN-1962 URL: https://issues.apache.org/jira/browse/YARN-1962 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: YARN-1962.1.patch, YARN-1962.2.patch Since the Timeline server is not mature and secured yet, enabling it by default might create some confusion. We were playing with 2.4.0 and found a lot of exceptions in the distributed shell example related to connection refused errors. By the way, we didn't run TS because it is not secured yet. It is possible to explicitly turn it off through the yarn-site config, but in my opinion this extra change for this new service is not worthwhile at this point. This JIRA is to turn it off by default. If there is agreement, I can put up a simple patch for this. {noformat} 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server. 
[jira] [Commented] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13976127#comment-13976127 ] Hadoop QA commented on YARN-1962: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641129/YARN-1962.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3603//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3603//console This message is automatically generated. Timeline server is enabled by default - Key: YARN-1962 URL: https://issues.apache.org/jira/browse/YARN-1962 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: YARN-1962.1.patch, YARN-1962.2.patch Since Timeline server is not matured and secured yet, enabling it by default might create some confusion. 
We were playing with 2.4.0 and found a lot of exceptions in the distributed shell example related to connection refused errors. By the way, we didn't run TS because it is not secured yet. It is possible to explicitly turn it off through the yarn-site config, but in my opinion this extra change for this new service is not worthwhile at this point. This JIRA is to turn it off by default. If there is agreement, I can put up a simple patch for this. {noformat} 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server.
[jira] [Commented] (YARN-1970) Prepare YARN codebase for JUnit 4.11.
[ https://issues.apache.org/jira/browse/YARN-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13976205#comment-13976205 ] Hudson commented on YARN-1970: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5547 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5547/]) YARN-1970. Prepare YARN codebase for JUnit 4.11. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1589001) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/utils/TestSLSUtils.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/web/TestSLSWebApp.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceOnHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java Prepare YARN codebase for JUnit 4.11. - Key: YARN-1970 URL: https://issues.apache.org/jira/browse/YARN-1970 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.4.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Fix For: 3.0.0, 2.5.0 Attachments: YARN-1970.1.patch HADOOP-10503 upgrades the entire Hadoop repo to use JUnit 4.11. Some of the YARN code needs some minor updates to fix deprecation warnings and test isolation problems before the upgrade. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1969) Fair Scheduler: Add policy for Earliest Deadline First
[ https://issues.apache.org/jira/browse/YARN-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1969: --- Summary: Fair Scheduler: Add policy for Earliest Deadline First (was: Earliest Deadline First Scheduling in the Fair Scheduler) Fair Scheduler: Add policy for Earliest Deadline First -- Key: YARN-1969 URL: https://issues.apache.org/jira/browse/YARN-1969 Project: Hadoop YARN Issue Type: Improvement Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh What we are observing is that some big jobs with many allocated containers are waiting for a few containers to finish. Under *fair-share scheduling*, however, they have a low priority since there are other jobs (usually much smaller newcomers) that are using resources way below their fair share, hence newly released containers are not offered to the big, yet close-to-finished job. Nevertheless, everybody would benefit from an unfair scheduling that offers the resource to the big job, since the sooner the big job finishes, the sooner it releases its many allocated resources to be used by other jobs. In other words, what we require is a variation of *Earliest Deadline First scheduling* that takes into account the number of already-allocated resources and the estimated time to finish. http://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling For example, if a job is using MEM GB of memory and is expected to finish in TIME minutes, the priority in scheduling would be a function p of (MEM, TIME). The expected time to finish can be estimated by the AppMaster using TaskRuntimeEstimator#estimatedRuntime and be supplied to the RM in the resource request messages. To be less susceptible to apps gaming the system, we can limit this scheduling to *only within a queue*: i.e., add an EarliestDeadlinePolicy that extends SchedulingPolicy and let the queues use it by setting the schedulingPolicy field. -- This message was sent by Atlassian JIRA (v6.2#6252)
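The idea of ordering jobs by a function p of (MEM, TIME) can be sketched in a few lines of Java. The issue does not define p, so the choice p = MEM / TIME below is purely an assumption made for illustration (more memory held and less time remaining gives a higher priority). The class and field names are hypothetical, and nothing here uses the real Fair Scheduler SchedulingPolicy API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a deadline-aware ordering; not the Fair Scheduler API. */
class EarliestDeadlineSketch {

    /** A running job: memory it currently holds and its estimated time to finish. */
    static final class App {
        final String name;
        final double memGb;
        final double minutesLeft;

        App(String name, double memGb, double minutesLeft) {
            this.name = name;
            this.memGb = memGb;
            this.minutesLeft = minutesLeft;
        }
    }

    /** Assumed priority p(MEM, TIME) = MEM / TIME; the issue leaves p unspecified. */
    static double priority(App app) {
        // Guard against a zero estimate so a nearly finished job does not divide by zero.
        return app.memGb / Math.max(app.minutesLeft, 1e-9);
    }

    /** Returns a copy of the apps sorted so the highest-priority app comes first. */
    static List<App> order(List<App> apps) {
        List<App> sorted = new ArrayList<>(apps);
        sorted.sort((a, b) -> Double.compare(priority(b), priority(a)));
        return sorted;
    }

    public static void main(String[] args) {
        List<App> apps = Arrays.asList(
            new App("small-newcomer", 4, 30),   // p is about 0.13
            new App("big-almost-done", 200, 2)  // p = 100
        );
        for (App app : order(apps)) {
            System.out.println(app.name + " p=" + priority(app));
        }
    }
}
```

With these sample numbers the big, nearly finished job is offered the next free container first, which is the behaviour the description asks for. Any other monotone choice of p would slot into priority() without changing the rest.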
[jira] [Updated] (YARN-1897) Define SignalContainerRequest and SignalContainerResponse
[ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1897: Attachment: YARN-1897.1.patch Define SignalContainerRequest and SignalContainerResponse - Key: YARN-1897 URL: https://issues.apache.org/jira/browse/YARN-1897 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Ming Ma Attachments: YARN-1897.1.patch We need to define SignalContainerRequest and SignalContainerResponse first, as they are needed by other sub-tasks. SignalContainerRequest should use OS-independent commands and provide a way for the application to specify a reason for diagnosis. SignalContainerResponse might be empty. -- This message was sent by Atlassian JIRA (v6.2#6252)
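As a rough illustration of the shape described above (an OS-independent command, a free-text reason, and an empty response), here is a minimal Java sketch. All names are hypothetical and are not taken from the attached patch. The mapping from commands to POSIX signal names is likewise an assumption about how a node-side component might translate them.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch; these are not the types from the attached patch. */
class SignalContainerSketch {

    /** OS-independent commands an application could request (names assumed). */
    enum SignalCommand { OUTPUT_THREAD_DUMP, GRACEFUL_SHUTDOWN, FORCEFUL_SHUTDOWN }

    /** Request: which container, which command, and a reason for diagnosis. */
    static final class SignalContainerRequest {
        final String containerId;
        final SignalCommand command;
        final String reason;

        SignalContainerRequest(String containerId, SignalCommand command, String reason) {
            this.containerId = containerId;
            this.command = command;
            this.reason = reason;
        }
    }

    /** Response carries no fields, matching "might be empty" in the description. */
    static final class SignalContainerResponse { }

    /** Assumed node-side translation from portable commands to POSIX signal names. */
    private static final Map<SignalCommand, String> POSIX = new EnumMap<>(SignalCommand.class);
    static {
        POSIX.put(SignalCommand.OUTPUT_THREAD_DUMP, "SIGQUIT"); // a JVM dumps threads on SIGQUIT
        POSIX.put(SignalCommand.GRACEFUL_SHUTDOWN, "SIGTERM");
        POSIX.put(SignalCommand.FORCEFUL_SHUTDOWN, "SIGKILL");
    }

    /** Looks up the POSIX signal name for the command carried by the request. */
    static String toPosixSignal(SignalContainerRequest request) {
        return POSIX.get(request.command);
    }

    public static void main(String[] args) {
        SignalContainerRequest request = new SignalContainerRequest(
            "container_0001", SignalCommand.OUTPUT_THREAD_DUMP, "task appears hung");
        System.out.println(request.containerId + " -> " + toPosixSignal(request)
            + " (" + request.reason + ")");
    }
}
```

Keeping the enum free of signal numbers is what makes the request OS-independent: a non-POSIX node could map the same commands to whatever mechanism it has.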
[jira] [Commented] (YARN-1897) Define SignalContainerRequest and SignalContainerResponse
[ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13976268#comment-13976268 ] Xuan Gong commented on YARN-1897: - [~mingma] I uploaded an initial patch for this. Please take a look and feel free to make any edits, renaming, etc. Define SignalContainerRequest and SignalContainerResponse - Key: YARN-1897 URL: https://issues.apache.org/jira/browse/YARN-1897 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Ming Ma Attachments: YARN-1897.1.patch We need to define SignalContainerRequest and SignalContainerResponse first, as they are needed by other sub-tasks. SignalContainerRequest should use OS-independent commands and provide a way for the application to specify a reason for diagnosis. SignalContainerResponse might be empty. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1796) container-executor shouldn't require o-r permissions
[ https://issues.apache.org/jira/browse/YARN-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13976278#comment-13976278 ] Hadoop QA commented on YARN-1796: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12633282/YARN-1796.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3604//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3604//console This message is automatically generated. container-executor shouldn't require o-r permissions Key: YARN-1796 URL: https://issues.apache.org/jira/browse/YARN-1796 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Minor Attachments: YARN-1796.patch The container-executor currently checks that other users don't have read permissions. 
This is unnecessary and runs contrary to the Debian packaging policy manual. This is the YARN analogue of the fix that was done for MR1 in MAPREDUCE-2103. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1796) container-executor shouldn't require o-r permissions
[ https://issues.apache.org/jira/browse/YARN-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13976282#comment-13976282 ] Sandy Ryza commented on YARN-1796: -- +1 container-executor shouldn't require o-r permissions Key: YARN-1796 URL: https://issues.apache.org/jira/browse/YARN-1796 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Minor Attachments: YARN-1796.patch The container-executor currently checks that other users don't have read permissions. This is unnecessary and runs contrary to the Debian packaging policy manual. This is the YARN analogue of the fix that was done for MR1 in MAPREDUCE-2103. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1897) Define SignalContainerRequest and SignalContainerResponse
[ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13976310#comment-13976310 ] Ming Ma commented on YARN-1897: --- Thanks, Xuan. I will merge this one with the version I have and provide an update shortly. BTW, why does SignalContainerResponse need to provide a diagnosis string? Is it to explain why the request can't be processed? Define SignalContainerRequest and SignalContainerResponse - Key: YARN-1897 URL: https://issues.apache.org/jira/browse/YARN-1897 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Ming Ma Attachments: YARN-1897.1.patch We need to define SignalContainerRequest and SignalContainerResponse first, as they are needed by other sub-tasks. SignalContainerRequest should use OS-independent commands and provide a way for the application to specify a reason for diagnosis. SignalContainerResponse might be empty. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1897) Define SignalContainerRequest and SignalContainerResponse
[ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13976408#comment-13976408 ] Hadoop QA commented on YARN-1897: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641175/YARN-1897-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/3605//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3605//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3605//console This message is automatically generated. 
Define SignalContainerRequest and SignalContainerResponse - Key: YARN-1897 URL: https://issues.apache.org/jira/browse/YARN-1897 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Ming Ma Attachments: YARN-1897-2.patch, YARN-1897.1.patch We need to define SignalContainerRequest and SignalContainerResponse first, as they are needed by other sub-tasks. SignalContainerRequest should use OS-independent commands and provide a way for the application to specify a reason for diagnosis. SignalContainerResponse might be empty. -- This message was sent by Atlassian JIRA (v6.2#6252)