[jira] [Comment Edited] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472837#comment-15472837
 ] 

Varun Saxena edited comment on YARN-5585 at 9/8/16 5:29 AM:


IIUC, the Tez DAG ID is a combination of the YARN App ID and a DAG sequence ID.
Isn't this DAG sequence ID monotonically increasing, assigned to DAGs in 
sequence as they are run?
I was assuming it was. That is why I suggested storing the DAG ID as 16 bytes 
(8 bytes of inverted cluster timestamp from the app id + 4 bytes of inverted 
sequence id from the app id + 4 bytes of inverted DAG sequence number). Padding 
would not be required in this case.

Anyway, other solutions have been proposed and we can come back to this only if 
necessary.
Or maybe we can have both the above solution and the one below.
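To make the proposed byte layout concrete, here is a minimal sketch in Java 
(an illustration only: the helper name is hypothetical, and the assumption is 
that inversion is done by subtracting from the type's MAX_VALUE, the usual 
trick to get newest-first lexicographic ordering):

{code}
import java.nio.ByteBuffer;

// Minimal sketch of the suggested 16-byte DAG row key. All fields are
// fixed-width and inverted, so keys sort newest-first without any padding.
public final class DagRowKeySketch {
  static byte[] buildDagRowKey(long clusterTimestamp, int appSeqId, int dagSeqNum) {
    ByteBuffer buf = ByteBuffer.allocate(16);
    buf.putLong(Long.MAX_VALUE - clusterTimestamp); // 8 bytes: inverted app cluster timestamp
    buf.putInt(Integer.MAX_VALUE - appSeqId);       // 4 bytes: inverted app sequence id
    buf.putInt(Integer.MAX_VALUE - dagSeqNum);      // 4 bytes: inverted DAG sequence number
    return buf.array();
  }
}
{code}

Because every field is fixed-width, a plain lexicographic scan over such keys 
returns the most recent DAGs first, with no string padding needed.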


was (Author: varun_saxena):
IIUC, the Tez DAG ID is a combination of the YARN App ID and a DAG sequence ID.
Isn't this DAG sequence ID monotonically increasing, assigned to DAGs in 
sequence as they are run?
I was assuming it was. That is why I suggested storing the DAG ID as 16 bytes 
(8 bytes of inverted cluster timestamp from the app id + 4 bytes of inverted 
sequence id from the app id + 4 bytes of inverted DAG sequence number). Padding 
would not be required in this case.

Anyway, other solutions have been proposed and we can come back to this only if 
necessary.
Or maybe we can have both the above solution and the one below.

> [Atsv2] Add a new filter fromId in REST endpoints
> -
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide a lot of filters to retrieve 
> applications. Along with those, it would be good to add a new filter, i.e. 
> fromId, so that entities can be retrieved after the fromId.
> Example: if applications are stored in the database as app-1, app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But retrieving the next 5 apps is 
> difficult.
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10.
> This is very useful for pagination in the web UI.






[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472837#comment-15472837
 ] 

Varun Saxena commented on YARN-5585:


IIUC, the Tez DAG ID is a combination of the YARN App ID and a DAG sequence ID.
Isn't this DAG sequence ID monotonically increasing, assigned to DAGs in 
sequence as they are run?
I was assuming it was. That is why I suggested storing the DAG ID as 16 bytes 
(8 bytes of inverted cluster timestamp from the app id + 4 bytes of inverted 
sequence id from the app id + 4 bytes of inverted DAG sequence number). Padding 
would not be required in this case.

Anyway, other solutions have been proposed and we can come back to this only if 
necessary.
Or maybe we can have both the above solution and the one below.

> [Atsv2] Add a new filter fromId in REST endpoints
> -
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide a lot of filters to retrieve 
> applications. Along with those, it would be good to add a new filter, i.e. 
> fromId, so that entities can be retrieved after the fromId.
> Example: if applications are stored in the database as app-1, app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But retrieving the next 5 apps is 
> difficult.
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10.
> This is very useful for pagination in the web UI.






[jira] [Comment Edited] (YARN-5621) Support LinuxContainerExecutor to create symlinks

2016-09-07 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472814#comment-15472814
 ] 

Varun Vasudev edited comment on YARN-5621 at 9/8/16 5:17 AM:
-

Thanks for the patch [~jianhe].

1)
{code}
+  File srcFile =
+  writeScriptToNMPrivateDir(nmPrivateDir, scriptBuilder.toString());
+  String dstFile =
+  container.getWorkDir() + Path.SEPARATOR + srcFile.getName();
{code}

Rename srcFile to privateScriptFile and dstFile to userScriptFile

2)
{code}
+} catch (IOException e) {
+  LOG.error("Error when creating symlink for  " + 
container.getContainerId()
+  + ": " + src + " -> " + symlink, e);
+}
{code}
We should propagate the exception, or at least surface that the resource 
localization has failed? (A minimal sketch of surfacing the failure follows 
after this list.)

3)
{code}
+  LOG.info("Copy " + srcFile + " to " + dstFile);
{code}
Debug instead of info? We already log successful symlink creation in 
ContainerImpl.java

4)
{code}
+} catch (PrivilegedOperationException e) {
+  int exitCode = e.getExitCode();
+  LOG.error("Error when running script [" + dstFile + "], exitcode = "
+  + exitCode + ", output: " + e.getOutput(), e);
+}
{code}
Like (2) above, we should surface the error to the AM somehow?

5)
{code}
-  if (dst.isAbsolute()) {
-throw new IOException("Destination must be relative");
-  }
{code}
Can you explain why we need to remove this check? Can’t we just pass the 
absolute path of linkFile?

6)
{code}
+// give up root privs
+if (change_user(user_detail->pw_uid, user_detail->pw_gid) != 0) {
+unlink(src_script_file);
+return -1;
+}
+
{code}
Instead of -1, we should return an error code indicating that the change_user 
call failed.

7)
{code}
+unlink(src_script_file);
+unlink(dst_script_file);
+return 0;
{code}
This code will never be executed - execl doesn't return unless there's an error 
- you'll need to use fork/exec or clean up in the NodeManager itself.
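For points (2) and (4), a minimal sketch (not the actual patch; the helper 
interface and names are hypothetical) of what surfacing the failure could look 
like:

{code}
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Sketch only: log AND rethrow, so the caller can mark the resource
// localization (or script run) as failed and report it back to the AM,
// instead of swallowing the exception.
public final class SurfaceSymlinkFailureSketch {
  private static final Log LOG =
      LogFactory.getLog(SurfaceSymlinkFailureSketch.class);

  /** Hypothetical callback standing in for the executor's symlink call. */
  interface SymlinkCreator {
    void create(String src, String symlink) throws IOException;
  }

  static void createSymlinkOrFail(SymlinkCreator creator, String src,
      String symlink, String containerId) throws IOException {
    try {
      creator.create(src, symlink);
    } catch (IOException e) {
      String msg = "Failed to create symlink " + symlink + " -> " + src
          + " for container " + containerId;
      LOG.error(msg, e);
      throw new IOException(msg, e); // propagate so the failure is surfaced
    }
  }
}
{code}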


was (Author: vvasudev):
1)
{code}
+  File srcFile =
+  writeScriptToNMPrivateDir(nmPrivateDir, scriptBuilder.toString());
+  String dstFile =
+  container.getWorkDir() + Path.SEPARATOR + srcFile.getName();
{code}

Rename srcFile to privateScriptFile and dstFile to userScriptFile

2)
{code}
+} catch (IOException e) {
+  LOG.error("Error when creating symlink for  " + 
container.getContainerId()
+  + ": " + src + " -> " + symlink, e);
+}
{code}
We should propagate the exception, or at least surface that the resource 
localization has failed?

3)
{code}
+  LOG.info("Copy " + srcFile + " to " + dstFile);
{code}
Debug instead of info? We already log successful symlink creation in 
ContainerImpl.java

4)
{code}
+} catch (PrivilegedOperationException e) {
+  int exitCode = e.getExitCode();
+  LOG.error("Error when running script [" + dstFile + "], exitcode = "
+  + exitCode + ", output: " + e.getOutput(), e);
+}
{code}
Like (2) above, we should surface the error to the AM somehow?

5)
{code}
-  if (dst.isAbsolute()) {
-throw new IOException("Destination must be relative");
-  }
{code}
Can you explain why we need to remove this check? Can’t we just pass the 
absolute path of linkFile?

6)
{code}
+// give up root privs
+if (change_user(user_detail->pw_uid, user_detail->pw_gid) != 0) {
+unlink(src_script_file);
+return -1;
+}
+
{code}
Instead of -1, we should return an error code indicating that the change_user 
call failed.

7)
{code}
+unlink(src_script_file);
+unlink(dst_script_file);
+return 0;
{code}
This code will never be executed - execl doesn't return unless there's an error 
- you'll need to use fork/exec or clean up in the NodeManager itself.

> Support LinuxContainerExecutor to create symlinks
> -
>
> Key: YARN-5621
> URL: https://issues.apache.org/jira/browse/YARN-5621
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5621.1.patch, YARN-5621.2.patch
>
>
> When new resources are localized, a new symlink needs to be created for the 
> localized resource. This is the change for the LinuxContainerExecutor to 
> create the symlinks.






[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks

2016-09-07 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472814#comment-15472814
 ] 

Varun Vasudev commented on YARN-5621:
-

1)
{code}
+  File srcFile =
+  writeScriptToNMPrivateDir(nmPrivateDir, scriptBuilder.toString());
+  String dstFile =
+  container.getWorkDir() + Path.SEPARATOR + srcFile.getName();
{code}

Rename srcFile to privateScriptFile and dstFile to userScriptFile

2)
{code}
+} catch (IOException e) {
+  LOG.error("Error when creating symlink for  " + 
container.getContainerId()
+  + ": " + src + " -> " + symlink, e);
+}
{code}
We should propagate the exception, or at least surface that the resource 
localization has failed?

3)
{code}
+  LOG.info("Copy " + srcFile + " to " + dstFile);
{code}
Debug instead of info? We already log successful symlink creation in 
ContainerImpl.java

4)
{code}
+} catch (PrivilegedOperationException e) {
+  int exitCode = e.getExitCode();
+  LOG.error("Error when running script [" + dstFile + "], exitcode = "
+  + exitCode + ", output: " + e.getOutput(), e);
+}
{code}
Like (2) above, we should surface the error to the AM somehow?

5)
{code}
-  if (dst.isAbsolute()) {
-throw new IOException("Destination must be relative");
-  }
{code}
Can you explain why we need to remove this check? Can’t we just pass the 
absolute path of linkFile?

6)
{code}
+// give up root privs
+if (change_user(user_detail->pw_uid, user_detail->pw_gid) != 0) {
+unlink(src_script_file);
+return -1;
+}
+
{code}
Instead of -1, we should return an error code indicating that the change_user 
call failed.

7)
{code}
+unlink(src_script_file);
+unlink(dst_script_file);
+return 0;
{code}
This code will never be executed - execl doesn't return unless there's an error 
- you'll need to use fork/exec or clean up in the NodeManager itself.

> Support LinuxContainerExecutor to create symlinks
> -
>
> Key: YARN-5621
> URL: https://issues.apache.org/jira/browse/YARN-5621
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5621.1.patch, YARN-5621.2.patch
>
>
> When new resources are localized, a new symlink needs to be created for the 
> localized resource. This is the change for the LinuxContainerExecutor to 
> create the symlinks.






[jira] [Comment Edited] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472777#comment-15472777
 ] 

Jian He edited comment on YARN-5620 at 9/8/16 5:06 AM:
---

[~asuresh], thanks for the explanation.
bq. Only the AM knows if the upgrade is actually successful.
How does the AM determine whether the upgrade is successful (i.e. what kind of 
signal should the AM depend on)? I feel that once the container starts running, 
even for the AM it's hard to distinguish whether a failure is caused by the 
upgrade or by the runtime. IMO, if the container fails to launch on upgrade, it 
should be considered an upgrade failure. Once the container starts running, if 
the container fails, it can be considered a runtime failure. If the user does 
want to roll back, the user can call the upgradeContainer/rollback command 
again to roll back.
bq.  But, in my opinion rollback should not be provided with an explicit 
launchContext, it should always be the just previous context.
I also agree the AM can take care of tying the context to the version. In our 
case, the Slider AM (also YARN code) will have the prior context and call 
upgradeContainer with the corresponding context, so the NM does not need to 
remember the prior context.

I think the upgrade itself is enough work for a single JIRA, with enough corner 
cases, and we have consensus on that. Could you separate the patch to include 
only the upgrade piece in this JIRA? That also makes the review easier.




was (Author: jianhe):
[~asuresh], thanks for the explanation.
bq. Only the AM knows if the upgrade is actually successful.
How does the AM determine whether the upgrade is successful (i.e. what kind of 
signal should the AM depend on)? I feel that once the container starts running, 
even for the AM it's hard to distinguish whether a failure is caused by the 
upgrade or by the runtime. IMO, if the container fails to launch on upgrade, it 
should be considered an upgrade failure. Once the container starts running, if 
the container fails, it can be considered a runtime failure. If the user does 
want to roll back, the user can call the upgradeContainer/rollback command 
again to roll back.
bq.  But, in my opinion rollback should not be provided with an explicit 
launchContext, it should always be the just previous context.
I also agree the AM can take care of tying the context to the version. In our 
case, the Slider AM (also YARN code) will have the prior context and call 
upgradeContainer with the corresponding context, so the NM does not need to 
remember the prior context.

I think the upgrade itself is enough work for a single JIRA, with enough corner 
cases. Could you separate the patch to include only the upgrade piece in this 
JIRA? That also makes the review easier.



> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch
>
>
> This JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrading a running container with a new {{ContainerLaunchContext}}, 
> as well as the ability to roll back the upgrade if the container is not able 
> to restart using the new launch context.






[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472777#comment-15472777
 ] 

Jian He commented on YARN-5620:
---

[~asuresh], thanks for the explanation.
bq. Only the AM knows if the upgrade is actually successful.
How does the AM determine whether the upgrade is successful (i.e. what kind of 
signal should the AM depend on)? I feel that once the container starts running, 
even for the AM it's hard to distinguish whether a failure is caused by the 
upgrade or by the runtime. IMO, if the container fails to launch on upgrade, it 
should be considered an upgrade failure. Once the container starts running, if 
the container fails, it can be considered a runtime failure. If the user does 
want to roll back, the user can call the upgradeContainer/rollback command 
again to roll back.
bq.  But, in my opinion rollback should not be provided with an explicit 
launchContext, it should always be the just previous context.
I also agree the AM can take care of tying the context to the version. In our 
case, the Slider AM (also YARN code) will have the prior context and call 
upgradeContainer with the corresponding context, so the NM does not need to 
remember the prior context.

I think the upgrade itself is enough work for a single JIRA, with enough corner 
cases. Could you separate the patch to include only the upgrade piece in this 
JIRA? That also makes the review easier.



> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch
>
>
> This JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrading a running container with a new {{ContainerLaunchContext}}, 
> as well as the ability to roll back the upgrade if the container is not able 
> to restart using the new launch context.






[jira] [Commented] (YARN-4091) Add REST API to retrieve scheduler activity

2016-09-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472737#comment-15472737
 ] 

Wangda Tan commented on YARN-4091:
--

The failed tests are not related; I will commit it next Monday if there are no 
objections.

> Add REST API to retrieve scheduler activity
> ---
>
> Key: YARN-4091
> URL: https://issues.apache.org/jira/browse/YARN-4091
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Chen Ge
> Fix For: 3.0.0-alpha2
>
> Attachments: Improvement on debugdiagnostic information - YARN.pdf, 
> SchedulerActivityManager-TestReport v2.pdf, 
> SchedulerActivityManager-TestReport.pdf, YARN-4091-branch-2.001.patch, 
> YARN-4091-design-doc-v1.pdf, YARN-4091.1.patch, YARN-4091.2.patch, 
> YARN-4091.3.patch, YARN-4091.4.patch, YARN-4091.5.patch, YARN-4091.5.patch, 
> YARN-4091.6.patch, YARN-4091.7.patch, YARN-4091.8.patch, 
> YARN-4091.preliminary.1.patch, app_activities v2.json, app_activities.json, 
> node_activities v2.json, node_activities.json
>
>
> As schedulers gain various new capabilities, more configurations that tune 
> the schedulers start to take actions such as limiting the containers assigned 
> to an application, or introducing a delay before allocating a container, etc. 
> There is no clear information passed down from the scheduler to the outside 
> world under these various scenarios, which makes debugging much tougher.
> This ticket is an effort to introduce more defined states at the various 
> points in the scheduler where it skips/rejects a container assignment, 
> activates an application, etc. Such information will help users know what is 
> happening in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve 
> on this as we discuss.






[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out

2016-09-07 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472732#comment-15472732
 ] 

Vinod Kumar Vavilapalli commented on YARN-4205:
---

bq. And I believe monitoring interval should be configurable at RM level. 
Okay, that makes sense. As long as we have a reasonable default that no-one 
touches in practice (like the AM-liveliness configs), we are good.

Also yarn.app.lifetime.monitor.interval-sec -> 
yarn.resourcemanager.app-timeouts-monitor.interval-sec ?

> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4205.patch, 0002-YARN-4205.patch, 
> YARN-4205_01.patch, YARN-4205_02.patch, YARN-4205_03.patch
>
>
> This JIRA intends to provide a lifetime monitor service. 
> The service will monitor the applications for which a lifetime is configured. 
> If an application runs beyond its lifetime, it will be killed. 
> The lifetime is measured from the submit time.
> The thread's monitoring interval is configurable.






[jira] [Commented] (YARN-5626) Support long running apps handling multiple flows

2016-09-07 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472701#comment-15472701
 ] 

Varun Saxena commented on YARN-5626:


As discussed in the call, we can still serve this use case by allowing flow 
context information to be passed in the TimelineEntity and, if it is supplied, 
using it to write to different tables.

But we also need to devise a mechanism to distribute writes to different 
collectors for such a use case.

This JIRA has been opened with the intention of discussing this further.
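A minimal sketch of the writer-side idea (the info key name and the helper are 
hypothetical, not existing ATSv2 API; it only illustrates "use the 
entity-supplied flow context if present, otherwise fall back to the app's flow 
context"):

{code}
import java.util.Map;

// Sketch: pick the flow context to use for a write. If the long-running AM
// put explicit flow information into the entity's info map, use it; otherwise
// fall back to the flow context of the application that published the entity.
public final class FlowContextSketch {
  // Hypothetical info key; the real key name would need to be agreed upon.
  static final String FLOW_NAME_INFO_KEY = "SYSTEM_INFO_FLOW_NAME";

  static String resolveFlowName(Map<String, Object> entityInfo, String appFlowName) {
    Object supplied = entityInfo == null ? null : entityInfo.get(FLOW_NAME_INFO_KEY);
    return supplied != null ? supplied.toString() : appFlowName;
  }
}
{code}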

> Support long running apps handling multiple flows
> -
>
> Key: YARN-5626
> URL: https://issues.apache.org/jira/browse/YARN-5626
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> Many applications which can potentially use ATS have one or a few 
> long-running AMs which handle multiple tasks or serve multiple queries. As 
> ATS scopes everything within an app, it's not possible for us to 
> differentiate different flows.
> Moreover, all entities will be written to one or very few node collectors, as 
> writers are distributed based on the app.






[jira] [Assigned] (YARN-5626) Support long running apps handling multiple flows

2016-09-07 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-5626:
--

Assignee: Varun Saxena

> Support long running apps handling multiple flows
> -
>
> Key: YARN-5626
> URL: https://issues.apache.org/jira/browse/YARN-5626
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> Many applications which can potentially use ATS have one or a few 
> long-running AMs which handle multiple tasks or serve multiple queries. As 
> ATS scopes everything within an app, it's not possible for us to 
> differentiate different flows.
> Moreover, all entities will be written to one or very few node collectors, as 
> writers are distributed based on the app.






[jira] [Created] (YARN-5626) Support long running apps handling multiple flows

2016-09-07 Thread Varun Saxena (JIRA)
Varun Saxena created YARN-5626:
--

 Summary: Support long running apps handling multiple flows
 Key: YARN-5626
 URL: https://issues.apache.org/jira/browse/YARN-5626
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Saxena


Many applications which can potentially use ATS have one or a few long-running 
AMs which handle multiple tasks or serve multiple queries. As ATS scopes 
everything within an app, it's not possible for us to differentiate different 
flows.
Moreover, all entities will be written to one or very few node collectors, as 
writers are distributed based on the app.






[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472689#comment-15472689
 ] 

Varun Saxena commented on YARN-5585:


Also, [~rohithsharma], if it's feasible, kindly consolidate all the use cases 
of Tez (from an ATS perspective) and send out a mail to the ATS team so that we 
can have further discussion on it with everyone in the team.

> [Atsv2] Add a new filter fromId in REST endpoints
> -
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide a lot of filters to retrieve 
> applications. Along with those, it would be good to add a new filter, i.e. 
> fromId, so that entities can be retrieved after the fromId.
> Example: if applications are stored in the database as app-1, app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But retrieving the next 5 apps is 
> difficult.
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10.
> This is very useful for pagination in the web UI.






[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472676#comment-15472676
 ] 

Varun Saxena commented on YARN-5585:


Another option would be to change the row key of the entity table to 
{{cluster!user!flow!flowrun!app!entitytype!reverse entity creation 
time!entityid}} and have another table that maps 
{{cluster!user!flow!flowrun!app!entitytype!entityid}} to the entity's created 
time.
So for a single-entity call (HBase Get) we will have to first peek into the new 
table and then get the record from the entity table.
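A rough sketch of that two-step single-entity read with the HBase client API 
(the table handles, column family, qualifier and row-key layout are assumptions 
for illustration, not the actual schema):

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: look up the entity's created time in the mapping table, then build
// the entity-table row key (which embeds the inverted created time) and Get it.
public final class TwoStepEntityGetSketch {
  private static final byte[] CF = Bytes.toBytes("i");                      // assumed family
  private static final byte[] CREATED_COL = Bytes.toBytes("created_time");  // assumed qualifier

  static Result getEntity(Table mappingTable, Table entityTable,
      byte[] mappingRowKey, byte[] entityKeyPrefix, byte[] entityId)
      throws IOException {
    Result mapping = mappingTable.get(new Get(mappingRowKey));
    long createdTime = Bytes.toLong(mapping.getValue(CF, CREATED_COL));
    // Assumed row-key layout: prefix + inverted created time + entity id.
    byte[] entityRow = Bytes.add(entityKeyPrefix,
        Bytes.toBytes(Long.MAX_VALUE - createdTime), entityId);
    return entityTable.get(new Get(entityRow));
  }
}
{code}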

> [Atsv2] Add a new filter fromId in REST endpoints
> -
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide a lot of filters to retrieve 
> applications. Along with those, it would be good to add a new filter, i.e. 
> fromId, so that entities can be retrieved after the fromId.
> Example: if applications are stored in the database as app-1, app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But retrieving the next 5 apps is 
> difficult.
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10.
> This is very useful for pagination in the web UI.






[jira] [Commented] (YARN-4091) Add REST API to retrieve scheduler activity

2016-09-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472674#comment-15472674
 ] 

Hadoop QA commented on YARN-4091:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 39s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
13s {color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s 
{color} | {color:green} branch-2 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 19s 
{color} | {color:green} branch-2 passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
43s {color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 28s 
{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
43s {color} | {color:green} branch-2 passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
11s {color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s 
{color} | {color:green} branch-2 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 1s 
{color} | {color:green} branch-2 passed with JDK v1.7.0_111 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 43s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 79 
new + 368 unchanged - 0 fixed = 447 total (was 368) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s 
{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 29s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_101. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 38m 17s 
{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed 
with JDK v1.8.0_101. {color} |
| 

[jira] [Commented] (YARN-4091) Add REST API to retrieve scheduler activity

2016-09-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472668#comment-15472668
 ] 

Hadoop QA commented on YARN-4091:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 46s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
57s {color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s 
{color} | {color:green} branch-2 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 22s 
{color} | {color:green} branch-2 passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
50s {color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 27s 
{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
42s {color} | {color:green} branch-2 passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s 
{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s 
{color} | {color:green} branch-2 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 6s 
{color} | {color:green} branch-2 passed with JDK v1.7.0_111 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 22s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 43s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 79 
new + 368 unchanged - 0 fixed = 447 total (was 368) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 12s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_101. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 38m 32s 
{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed 
with JDK v1.8.0_101. {color} |
| 

[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472669#comment-15472669
 ] 

Varun Saxena commented on YARN-5585:


Another solution which comes to mind is that we keep another table, say an 
EntityCreationTable, with row key 
{{cluster!user!flow!flowrun!app!entitytype!reverse entity creation 
time!entityid}}. So we will make an entry into this table whenever a created 
time is reported for the entity. The real data would still reside in the main 
entity table. Entities in this table will be sorted in descending order of 
creation time.

And as the goal is to achieve pagination, we can introduce something like a 
fromCreatedTime query param.
The pagination use case will be to get chunks of data. Let us say we want the 
first 10 records. In this case, we will send a query with a limit of 10 and no 
fromCreatedTime query param.
So when a query arrives and fromCreatedTime is not there, we start reading from 
this table with the start row as {{cluster!user!flow!flowrun!app!entitytype!}}, 
up to the number of records specified by the {{limit}} query param. We can 
break as soon as 10 records are found and need not parse through all rows, as 
is done right now for the entity table.

Now, if what we want is to return only the default view of the entity, i.e. 
entity id, type and created time, we can return a result set straight away. 
Otherwise, to get more detailed data, we need to get hold of the first and last 
entities retrieved from the EntityCreationTable and make a scan of the entity 
table with a single column value filter on a created-time range (the code is 
already there for this).

This would still require a full scan within the scope of the entity type, but 
most results will be filtered out by HBase at the server end itself because of 
the created-time range filter. Which approach will be better (dipping directly 
into the entity table, or querying 2 tables) depends entirely on how many 
records we have in the entity table within the scope of that entity type.

Now, once the client gets the first 10 records, it can make the next query to 
get records 11-20 by populating fromCreatedTime with the created time of the 
10th record. The next scan of the EntityCreationTable can be made on the basis 
of that. fromId must also be used in conjunction with fromCreatedTime, though.

For this solution, the client must not report the created time for an entity 
multiple times.

Also, I am not 100% sure, but could a coprocessor be used for this extra call, 
so that the client is not involved?
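A rough sketch of the first-page read against such an EntityCreationTable using 
the HBase client API (the table, its key layout and the prefix are assumptions 
for illustration, not the actual schema):

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

// Sketch: read at most 'limit' row keys from the (hypothetical)
// EntityCreationTable, whose keys within a scope are sorted by inverted
// created time, so the newest entities come back first and we can stop early.
public final class CreationTablePaginationSketch {
  static List<byte[]> firstPage(Table creationTable, byte[] scopePrefix, int limit)
      throws IOException {
    Scan scan = new Scan();
    scan.setRowPrefixFilter(scopePrefix); // cluster!user!flow!flowrun!app!entitytype!
    scan.setCaching(limit);
    List<byte[]> rowKeys = new ArrayList<>();
    try (ResultScanner scanner = creationTable.getScanner(scan)) {
      for (Result r : scanner) {
        rowKeys.add(r.getRow()); // created time and entity id are parsed from the key
        if (rowKeys.size() >= limit) {
          break;                 // no need to scan the rest of the scope
        }
      }
    }
    return rowKeys;
  }
}
{code}

The first and last keys in this list then give the created-time range used to 
scan the entity table for the detailed view, as described above.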

> [Atsv2] Add a new filter fromId in REST endpoints
> -
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide a lot of filters to retrieve 
> applications. Along with those, it would be good to add a new filter, i.e. 
> fromId, so that entities can be retrieved after the fromId.
> Example: if applications are stored in the database as app-1, app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But retrieving the next 5 apps is 
> difficult.
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10.
> This is very useful for pagination in the web UI.






[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472636#comment-15472636
 ] 

Li Lu commented on YARN-5585:
-

bq. DAG ID seems to be generated same way.
Unfortunately this is not true for Tez... There is no proper padding for the 
DAG number, so we cannot do the pagination by the entity ID itself... 

> [Atsv2] Add a new filter fromId in REST endpoints
> -
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide a lot of filters to retrieve 
> applications. Along with those, it would be good to add a new filter, i.e. 
> fromId, so that entities can be retrieved after the fromId.
> Example: if applications are stored in the database as app-1, app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But retrieving the next 5 apps is 
> difficult.
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10.
> This is very useful for pagination in the web UI.






[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472526#comment-15472526
 ] 

Varun Saxena commented on YARN-5585:


bq. Just realized that a normal converter will not address the use case where 
users really want entities sorted by their creation time
Yes, this is just for use cases where entity IDs are structured in a manner 
where there is a direct correlation between the entity ID and being sorted by 
creation time. The DAG ID seems to be generated the same way, and this seems to 
be the case in Spark too. If row keys are sorted, that will be the best 
solution from a performance perspective. However, there is the disadvantage of 
putting the burden on the application to write some extra code and make sure 
that the JARs are placed during deployment. That is why I asked whether this 
would be acceptable to Tez.
On further thought though, we could also end up breaking ATS-related behavior 
for an application if it chooses to change its IDs in a manner where they are 
no longer sorted in future, however unlikely that may be.

Let me think of something else then.

> [Atsv2] Add a new filter fromId in REST endpoints
> -
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide a lot of filters to retrieve 
> applications. Along with those, it would be good to add a new filter, i.e. 
> fromId, so that entities can be retrieved after the fromId.
> Example: if applications are stored in the database as app-1, app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But retrieving the next 5 apps is 
> difficult.
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10.
> This is very useful for pagination in the web UI.






[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-07 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472384#comment-15472384
 ] 

Allen Wittenauer commented on YARN-5567:


-1  Please revert this change.

The exit code getting ignored is *intentional*.  We don't want to bring the 
nodemanager down in case the script has a syntax error in it.  Such a condition 
would bring down *entire clusters* at once, instantaneously.



> Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
> --
>
> Key: YARN-5567
> URL: https://issues.apache.org/jira/browse/YARN-5567
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 3.0.0-alpha1
>
> Attachments: YARN-5567.001.patch
>
>
> In case of FAILED_WITH_EXIT_CODE, health status should be false.
> {code}
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(true, "", now);
> break;
> {code}
> should be 
> {code}
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(false, "", now);
> break;
> {code}






[jira] [Reopened] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-07 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer reopened YARN-5567:


> Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
> --
>
> Key: YARN-5567
> URL: https://issues.apache.org/jira/browse/YARN-5567
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 3.0.0-alpha1
>
> Attachments: YARN-5567.001.patch
>
>
> In case of FAILED_WITH_EXIT_CODE, health status should be false.
> {code}
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(true, "", now);
> break;
> {code}
> should be 
> {code}
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(false, "", now);
> break;
> {code}






[jira] [Commented] (YARN-5561) [Atsv2] : Support for ability to retrieve apps/app-attempt/containers and entities via REST

2016-09-07 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472330#comment-15472330
 ] 

Li Lu commented on YARN-5561:
-

Sequence 2 in this list seems to replace the major part of the old AHS APIs, 
except for container logs. We may use them as a starting point to support 
AHS-like use cases in timeline v2.

> [Atsv2] : Support for ability to retrieve apps/app-attempt/containers and 
> entities via REST
> ---
>
> Key: YARN-5561
> URL: https://issues.apache.org/jira/browse/YARN-5561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5561.patch, YARN-5561.v0.patch
>
>
> The ATSv2 model lacks retrieval of {{list-of-all-apps}}, 
> {{list-of-all-app-attempts}} and {{list-of-all-containers-per-attempt}} via 
> the REST APIs. It is also necessary to know about all the entities in an 
> application.
> These URLs are very much required for the web UI.
> The new REST URLs would be:
> # GET {{/ws/v2/timeline/apps}}
> # GET {{/ws/v2/timeline/apps/\{app-id\}/appattempts}}
> # GET 
> {{/ws/v2/timeline/apps/\{app-id\}/appattempts/\{attempt-id\}/containers}}
> # GET {{/ws/v2/timeline/apps/\{app id\}/entities}} should display the list of 
> entities that can be queried.






[jira] [Commented] (YARN-5561) [Atsv2] : Support for ability to retrieve apps/app-attempt/containers and entities via REST

2016-09-07 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472318#comment-15472318
 ] 

Li Lu commented on YARN-5561:
-

OK, I revisited the patch and our reader REST APIs. Right now we have 26 (!) 
APIs for the reader server. I tried to organize them in a way that is easier to 
understand:
{code}
0. cluster activity (2)
  - /flows/
  - /clusters/{clusterid}/flows/

1. (cluster - )user - flow - run - app - entity_type - entity_id sequence
  
  1.1 hierarchical (12)
  
- 
/clusters/{clusterid}/users/{userid}/flows/{flowname}/runs/{flowrunid}/apps/
- /clusters/{clusterid}/users/{userid}/flows/{flowname}/runs/{flowrunid}/
- /clusters/{clusterid}/users/{userid}/flows/{flowname}/runs/
- /users/{userid}/flows/{flowname}/runs/{flowrunid}/apps/
- /users/{userid}/flows/{flowname}/runs/{flowrunid}/
- /users/{userid}/flows/{flowname}/runs/

(user, flow, run information omitted since an application is unique in a 
cluster)
- /clusters/{clusterid}/apps/{appid}/entities/{entitytype}/{entityid}/
- /clusters/{clusterid}/apps/{appid}/entities/{entitytype}
- /apps/{appid}/entities/{entitytype}/{entityid}/
- /apps/{appid}/entities/{entitytype}
  
- /clusters/{clusterid}/users/{userid}/flows/{flowname}/apps/ (looks weird, 
jumping levels)
- /users/{userid}/flows/{flowname}/apps/ (looks weird, jumping levels)
  
  1.2 uid (6)
  
- /flow-uid/{uid}/runs/
- /run-uid/{uid}/
- /run-uid/{uid}/apps
- /app-uid/{uid}/
- /app-uid/{uid}/entities/{entitytype} (entity type looks weird)
- /entity-uid/{uid}/
  
2. (cluster - )app - app_attempt - container sequence (6)

  - /clusters/{clusterid}/apps/{appid}/appattempts/{appattemptid}/containers
  - /clusters/{clusterid}/apps/{appid}/appattempts
  - /clusters/{clusterid}/apps/{appid}/
  - /apps/{appid}/appattempts/{appattemptid}/containers
  - /apps/{appid}/appattempts
  - /apps/{appid}/
  {code}

So the new addition looks fine to me. Do we want to reorganize the code in a 
way consistent with this list? Right now the code seems to be a little bit 
messy. We can do it in this JIRA, or we can open a new JIRA to reorganize these 
APIs and discuss the endpoints that are marked as weird? Thanks!

> [Atsv2] : Support for ability to retrieve apps/app-attempt/containers and 
> entities via REST
> ---
>
> Key: YARN-5561
> URL: https://issues.apache.org/jira/browse/YARN-5561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5561.patch, YARN-5561.v0.patch
>
>
> The ATSv2 model lacks retrieval of {{list-of-all-apps}}, 
> {{list-of-all-app-attempts}} and {{list-of-all-containers-per-attempt}} via 
> the REST APIs. It is also necessary to know about all the entities in an 
> application.
> These URLs are very much required for the web UI.
> The new REST URLs would be:
> # GET {{/ws/v2/timeline/apps}}
> # GET {{/ws/v2/timeline/apps/\{app-id\}/appattempts}}
> # GET 
> {{/ws/v2/timeline/apps/\{app-id\}/appattempts/\{attempt-id\}/containers}}
> # GET {{/ws/v2/timeline/apps/\{app id\}/entities}} should display the list of 
> entities that can be queried.






[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472306#comment-15472306
 ] 

Karthik Kambatla commented on YARN-5605:


Thanks for the reviews, [~templedf]. Just pushed a second commit to the PR to 
address feedback, and left comments for things that I did not fix. 

> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers






[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472292#comment-15472292
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77930504
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSPreemptionThread.java
 ---
@@ -0,0 +1,173 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
+import org.apache.hadoop.yarn.api.records.ContainerStatus;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;
+import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerEventType;
+import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils;
+import org.apache.hadoop.yarn.util.resource.Resources;
+
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Timer;
+import java.util.TimerTask;
+
+/**
+ * Thread that handles FairScheduler preemption
+ */
+public class FSPreemptionThread extends Thread {
+  private static final Log LOG = 
LogFactory.getLog(FSPreemptionThread.class);
+  private final FSContext context;
+  private final FairScheduler scheduler;
+  private final long warnTimeBeforeKill;
+  private final Timer preemptionTimer;
+
+  public FSPreemptionThread(FairScheduler scheduler) {
+this.scheduler = scheduler;
+this.context = scheduler.getContext();
+FairSchedulerConfiguration fsConf = scheduler.getConf();
+context.setPreemptionEnabled();
+context.setPreemptionUtilizationThreshold(
+fsConf.getPreemptionUtilizationThreshold());
+warnTimeBeforeKill = fsConf.getWaitTimeBeforeKill();
+preemptionTimer = new Timer("Preemption Timer", true);
+
+setDaemon(true);
+setName("FSPreemptionThread");
+  }
+
+  public void run() {
+while (!Thread.interrupted()) {
+  FSAppAttempt starvedApp;
+  try{
+starvedApp = context.getStarvedApps().take();
+if (Resources.none().equals(starvedApp.getStarvation())) {
+  continue;
+}
+  } catch (InterruptedException e) {
+LOG.info("Preemption thread interrupted! Exiting.");
+return;
--- End diff --

Are you suggesting having the run method return a boolean? 


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5323) Policies APIs (for Router and AMRMProxy policies)

2016-09-07 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472220#comment-15472220
 ] 

Giovanni Matteo Fumarola edited comment on YARN-5323 at 9/8/16 12:45 AM:
-

+1 (non-binding)
Thanks [~curino].


was (Author: giovanni.fumarola):
+1
Thanks [~curino].

> Policies APIs (for Router and AMRMProxy policies)
> -
>
> Key: YARN-5323
> URL: https://issues.apache.org/jira/browse/YARN-5323
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: YARN-2915
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: YARN-2915
>
> Attachments: YARN-5323-YARN-2915.05.patch, 
> YARN-5323-YARN-2915.06.patch, YARN-5323-YARN-2915.07.patch, 
> YARN-5323-YARN-2915.08.patch, YARN-5323-YARN-2915.09.patch, 
> YARN-5323-YARN-2915.10.patch, YARN-5323-YARN-2915.11.patch, 
> YARN-5323.01.patch, YARN-5323.02.patch, YARN-5323.03.patch, YARN-5323.04.patch
>
>
> This JIRA tracks APIs for the policies that will guide the Router and 
> AMRMProxy decisions on where to fwd the jobs submission/query requests as 
> well as ResourceRequests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472287#comment-15472287
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77930405
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSPreemptionThread.java
 ---
@@ -0,0 +1,173 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
+import org.apache.hadoop.yarn.api.records.ContainerStatus;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;
+import 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerEventType;
+import 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils;
+import org.apache.hadoop.yarn.util.resource.Resources;
+
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Timer;
+import java.util.TimerTask;
+
+/**
+ * Thread that handles FairScheduler preemption
+ */
+public class FSPreemptionThread extends Thread {
+  private static final Log LOG = 
LogFactory.getLog(FSPreemptionThread.class);
+  private final FSContext context;
+  private final FairScheduler scheduler;
+  private final long warnTimeBeforeKill;
+  private final Timer preemptionTimer;
+
+  public FSPreemptionThread(FairScheduler scheduler) {
+this.scheduler = scheduler;
+this.context = scheduler.getContext();
+FairSchedulerConfiguration fsConf = scheduler.getConf();
+context.setPreemptionEnabled();
+context.setPreemptionUtilizationThreshold(
+fsConf.getPreemptionUtilizationThreshold());
+warnTimeBeforeKill = fsConf.getWaitTimeBeforeKill();
+preemptionTimer = new Timer("Preemption Timer", true);
+
+setDaemon(true);
+setName("FSPreemptionThread");
+  }
+
+  public void run() {
+while (!Thread.interrupted()) {
+  FSAppAttempt starvedApp;
+  try{
+starvedApp = context.getStarvedApps().take();
+if (Resources.none().equals(starvedApp.getStarvation())) {
+  continue;
+}
+  } catch (InterruptedException e) {
+LOG.info("Preemption thread interrupted! Exiting.");
+return;
--- End diff --

FairScheduler#serviceStop calls preemptionThread.interrupt. Are you 
referring to that? 


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472279#comment-15472279
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77930153
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -557,28 +599,33 @@ private boolean preemptContainerPreCheck() {
 getFairShare());
   }
 
-  /**
-   * Is a queue being starved for its min share.
-   */
-  @VisibleForTesting
-  boolean isStarvedForMinShare() {
-return isStarved(getMinShare());
+  private Resource minShareStarvation() {
+Resource desiredShare = Resources.min(policy.getResourceCalculator(),
+scheduler.getClusterResource(), getMinShare(), getDemand());
+
+Resource starvation = Resources.subtract(desiredShare, 
getResourceUsage());
+boolean starved = Resources.greaterThan(policy.getResourceCalculator(),
+scheduler.getClusterResource(), starvation, none());
+
+long now = scheduler.getClock().getTime();
+if (!starved) {
+  setLastTimeAtMinShare(now);
+}
+
+if (starved &&
+(now - lastTimeAtMinShare > getMinSharePreemptionTimeout())) {
+  return starvation;
+} else {
+  return Resources.clone(Resources.none());
--- End diff --

Adding a new variable and returning that seems like overkill. We can't use 
the same variable or initialize it to none(), as that requires creating a new 
object. 

Leaving it as is. 


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472267#comment-15472267
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77929638
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -557,28 +599,33 @@ private boolean preemptContainerPreCheck() {
 getFairShare());
   }
 
-  /**
-   * Is a queue being starved for its min share.
-   */
-  @VisibleForTesting
-  boolean isStarvedForMinShare() {
-return isStarved(getMinShare());
+  private Resource minShareStarvation() {
--- End diff --

You are right that:

starvation = desiredShare - currentUsage;
starved = starvation > 0;

However, you are missing desiredShare = min(minShare, demand). Essentially, 
the missing case is when the demand is less than minShare but still more than 
current allocation. 

That said, I see a minor simplification. Will do that. 
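
To make the missing case concrete, a tiny illustrative calculation (single 
resource, made-up numbers, not from the patch):
{code}
// Hypothetical single-resource numbers (in GB), purely for illustration:
long minShare = 10, demand = 6, currentUsage = 4;

long desiredShare = Math.min(minShare, demand);   // = 6
long starvation = desiredShare - currentUsage;    // = 2, so the queue is starved
// Comparing usage directly against minShare would report 10 - 4 = 6 GB of
// starvation, more than the queue can even use, and could over-preempt.
{code}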


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue

2016-09-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472262#comment-15472262
 ] 

Wangda Tan commented on YARN-4945:
--

[~eepayne],

bq. I actually would like to have the priority policy and the 
minimum-user-limit-percent policy be turned on separately. I'm not sure of the 
best way to do that, but our users don't use application priority very much.

Application priority is not very popular right now because it's not included in 
any release yet, but I bet it will be a popular feature in the future. :)

I would prefer to have a unified preemption policy to handle all intra-queue 
preemption, because:
- We have different combinations of intra-queue preemption criteria, like 
priority, fairness, user-limit and FIFO. We cannot have a separate policy for 
each combination, for example fairness + user-limit and priority + user-limit.
- Intra-queue preemption policies share a lot of common logic; in particular, 
once the ideal-allocation resource is calculated for each app, the logic to 
select containers is the same.
- We may need different implementations of the ideal-allocation resource 
calculator, one for fairness and one for FIFO, both of which consider user-limit 
and priority.

bq. Perhaps CapacitySchedulerConfiguration could have something like:
As I mentioned above, we should enable intra-queue preemption by default for 
all limits. If we really need some parameters for better control, we can add 
them in the future. Otherwise it causes trouble; for example, considering 
priority without considering user-limit could lead to excessive preemption.

> [Umbrella] Capacity Scheduler Preemption Within a queue
> ---
>
> Key: YARN-4945
> URL: https://issues.apache.org/jira/browse/YARN-4945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
> Attachments: Intra-Queue Preemption Use Cases.pdf, 
> IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, 
> YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch
>
>
> This is umbrella ticket to track efforts of preemption within a queue to 
> support features like:
> YARN-2009. YARN-2113. YARN-4781.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5323) Policies APIs (for Router and AMRMProxy policies)

2016-09-07 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472253#comment-15472253
 ] 

Subru Krishnan commented on YARN-5323:
--

The latest patch LGTM too, committing it shortly.

> Policies APIs (for Router and AMRMProxy policies)
> -
>
> Key: YARN-5323
> URL: https://issues.apache.org/jira/browse/YARN-5323
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: YARN-2915
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-5323-YARN-2915.05.patch, 
> YARN-5323-YARN-2915.06.patch, YARN-5323-YARN-2915.07.patch, 
> YARN-5323-YARN-2915.08.patch, YARN-5323-YARN-2915.09.patch, 
> YARN-5323-YARN-2915.10.patch, YARN-5323-YARN-2915.11.patch, 
> YARN-5323.01.patch, YARN-5323.02.patch, YARN-5323.03.patch, YARN-5323.04.patch
>
>
> This JIRA tracks APIs for the policies that will guide the Router and 
> AMRMProxy decisions on where to fwd the jobs submission/query requests as 
> well as ResourceRequests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472254#comment-15472254
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77929020
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -223,17 +225,76 @@ public void setPolicy(SchedulingPolicy policy)
 }
 super.policy = policy;
   }
-  
+
   @Override
-  public void recomputeShares() {
+  public void updateInternal(boolean checkStarvation) {
 readLock.lock();
 try {
   policy.computeShares(runnableApps, getFairShare());
+  if (checkStarvation) {
+identifyStarvedApplications();
+  }
 } finally {
   readLock.unlock();
 }
   }
 
+  /**
+   * Helper method to identify starved applications. This needs to be 
called
+   * ONLY from {@link #updateInternal}, after the application shares
+   * are updated.
+   *
+   * A queue can be starving due to fairshare or minshare.
+   *
+   * Minshare is defined only on the queue and not the applications.
+   * Fairshare is defined for both the queue and the applications.
+   *
+   * If this queue is starved due to minshare, we need to identify the most
+   * deserving apps if they themselves are not starved due to fairshare.
+   *
+   * If this queue is starving due to fairshare, there must be at least
+   * one application that is starved. And, even if the queue is not
+   * starved due to fairshare, there might still be starved applications.
+   */
+  private void identifyStarvedApplications() {
+// First identify starved applications and track total amount of
+// starvation (in resources)
+Resource fairShareStarvation = Resources.clone(none());
+TreeSet<FSAppAttempt> appsWithDemand = fetchAppsWithDemand();
+for (FSAppAttempt app : appsWithDemand) {
+  Resource appStarvation = app.fairShareStarvation();
+  if (Resources.equals(Resources.none(), appStarvation))  {
+break;
+  } else {
+context.getStarvedApps().addStarvedApp(app);
--- End diff --

I was tempted to, but then thought FSContext is just a structure to hold 
context and should only have fields with getters and setters. 
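
Roughly the shape being described, as an illustrative sketch (field and type 
names here are made up, not necessarily what the patch ends up with):
{code}
// FSContext as a plain holder of scheduler-wide state: fields with
// getters/setters only, no scheduling logic.
public class FSContext {
  private boolean preemptionEnabled = false;
  private float preemptionUtilizationThreshold;
  private FSStarvedApps starvedApps;

  public boolean isPreemptionEnabled() {
    return preemptionEnabled;
  }

  public void setPreemptionEnabled() {
    this.preemptionEnabled = true;
  }

  public void setPreemptionUtilizationThreshold(float threshold) {
    this.preemptionUtilizationThreshold = threshold;
  }

  public FSStarvedApps getStarvedApps() {
    return starvedApps;
  }
}
{code}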


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472246#comment-15472246
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77928672
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -223,17 +225,76 @@ public void setPolicy(SchedulingPolicy policy)
 }
 super.policy = policy;
   }
-  
+
   @Override
-  public void recomputeShares() {
+  public void updateInternal(boolean checkStarvation) {
 readLock.lock();
 try {
   policy.computeShares(runnableApps, getFairShare());
+  if (checkStarvation) {
+identifyStarvedApplications();
+  }
 } finally {
   readLock.unlock();
 }
   }
 
+  /**
+   * Helper method to identify starved applications. This needs to be 
called
+   * ONLY from {@link #updateInternal}, after the application shares
+   * are updated.
+   *
+   * A queue can be starving due to fairshare or minshare.
+   *
+   * Minshare is defined only on the queue and not the applications.
+   * Fairshare is defined for both the queue and the applications.
+   *
+   * If this queue is starved due to minshare, we need to identify the most
+   * deserving apps if they themselves are not starved due to fairshare.
+   *
+   * If this queue is starving due to fairshare, there must be at least
+   * one application that is starved. And, even if the queue is not
+   * starved due to fairshare, there might still be starved applications.
+   */
+  private void identifyStarvedApplications() {
+// First identify starved applications and track total amount of
+// starvation (in resources)
+Resource fairShareStarvation = Resources.clone(none());
+TreeSet<FSAppAttempt> appsWithDemand = fetchAppsWithDemand();
+for (FSAppAttempt app : appsWithDemand) {
+  Resource appStarvation = app.fairShareStarvation();
+  if (Resources.equals(Resources.none(), appStarvation))  {
+break;
+  } else {
--- End diff --

The else is required. We are iterating through all apps with unmet demand. 
Since the list is sorted by fairshare starvation, we could stop iterating when 
we hit an app that is at or above its fairshare. Added a comment to clarify 
that. 


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472237#comment-15472237
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77928395
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -223,17 +225,76 @@ public void setPolicy(SchedulingPolicy policy)
 }
 super.policy = policy;
   }
-  
+
   @Override
-  public void recomputeShares() {
+  public void updateInternal(boolean checkStarvation) {
 readLock.lock();
 try {
   policy.computeShares(runnableApps, getFairShare());
+  if (checkStarvation) {
+identifyStarvedApplications();
+  }
 } finally {
   readLock.unlock();
 }
   }
 
+  /**
+   * Helper method to identify starved applications. This needs to be 
called
+   * ONLY from {@link #updateInternal}, after the application shares
+   * are updated.
+   *
+   * A queue can be starving due to fairshare or minshare.
+   *
+   * Minshare is defined only on the queue and not the applications.
+   * Fairshare is defined for both the queue and the applications.
+   *
+   * If this queue is starved due to minshare, we need to identify the most
+   * deserving apps if they themselves are not starved due to fairshare.
+   *
+   * If this queue is starving due to fairshare, there must be at least
+   * one application that is starved. And, even if the queue is not
+   * starved due to fairshare, there might still be starved applications.
+   */
+  private void identifyStarvedApplications() {
+// First identify starved applications and track total amount of
+// starvation (in resources)
+Resource fairShareStarvation = Resources.clone(none());
+TreeSet<FSAppAttempt> appsWithDemand = fetchAppsWithDemand();
+for (FSAppAttempt app : appsWithDemand) {
--- End diff --

I am not clear on the convention. I always thought there should be a space 
before and after the colon, and online references seem to use both. Leaving it 
as is. 


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue

2016-09-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472235#comment-15472235
 ] 

Eric Payne commented on YARN-4945:
--

[~sunilg], thanks again for all of the great work you are doing on this issue.

\\
- Separate switches for priority and user-limit-percent preemption?

{{ProportionalCapacityPreemptionPolicy#init}} uses 
{{SELECT_CANDIDATES_FOR_INTRAQUEUE_PREEMPTION}} to turn on all intra-queue 
preemption policies, but the config property name for 
{{SELECT_CANDIDATES_FOR_INTRAQUEUE_PREEMPTION}} is 
{{select_based_on_priority_of_applications}}.

I actually would like to have the priority policy and the 
minimum-user-limit-percent policy be turned on separately. I'm not sure of the 
best way to do that, but our users don't use application priority very much.

Perhaps {{CapacitySchedulerConfiguration}} could have something like:
{code}
  /**
   * For intra-queue preemption, priority based selector can help to preempt
   * containers of lowest priority apps to find resources for high priority
   * apps.
   */
  public static final String
      PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_APP_PRIORITY =
      PREEMPTION_CONFIG_PREFIX + "select_based_on_priority_of_applications";
  public static final boolean
      DEFAULT_PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_APP_PRIORITY = false;

  /**
   * For intra-queue preemption, minimum-user-limit-percent based selector can
   * help to preempt containers to ensure users are not starved of their
   * guaranteed percentage of a queue.
   */
  public static final String
      PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_USER_PERCENT_GUARANTEE =
      PREEMPTION_CONFIG_PREFIX + "select_based_on_user_percentage_guarantee";
  public static final boolean
      DEFAULT_SELECT_INTRAQUEUE_CANDIDATES_BY_USER_PERCENT_GUARANTEE = false;

{code}

And then {{ProportionalCapacityPreemptionPolicy#init}} can turn on intra-queue 
preemption if either one is set:
{code}
boolean selectIntraQueuePreemptCandidatesByPriority = csConfig.getBoolean(
    CapacitySchedulerConfiguration.PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_APP_PRIORITY,
    CapacitySchedulerConfiguration.DEFAULT_PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_APP_PRIORITY);
boolean selectIntraQueuePreemptCandidatesByUserPercentGuarantee =
    csConfig.getBoolean(
        CapacitySchedulerConfiguration.PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_USER_PERCENT_GUARANTEE,
        CapacitySchedulerConfiguration.DEFAULT_SELECT_INTRAQUEUE_CANDIDATES_BY_USER_PERCENT_GUARANTEE);
if (selectIntraQueuePreemptCandidatesByPriority ||
    selectIntraQueuePreemptCandidatesByUserPercentGuarantee) {
  candidatesSelectionPolicies.add(new IntraQueueCandidatesSelector(this));
}
{code}

Then, in {{IntraQueueCandidatesSelector}}, logic could be added to apply either 
one or both intra-queue preemption policies. What do you think?
\\
\\

\\
- Could headroom check allow priority inversion?

{{PriorityIntraQueuePreemptionPolicy#getResourceDemandFromAppsPerQueue}}:
{code}
  // Can skip apps which are already crossing user-limit.
  // For this, Get the userlimit from scheduler and ensure that app is
  // not crossing userlimit here. Such apps can be skipped.
  Resource userHeadroom = leafQueue.getUserLimitHeadRoomPerApp(
  a1.getFiCaSchedulerApp(), context.getPartitionResource(partition),
  partition);
  if (Resources.lessThanOrEqual(rc,
  context.getPartitionResource(partition), userHeadroom,
  Resources.none())) {
continue;
  }
{code}
I think this code will allow a priority inversion when a user has apps of 
different priorities. For example, in a situation like the following, {{App1}} 
from {{User1}} is already taking up all of the resources, so its headroom is 0. 
But, since {{App2}} is also from {{User1}}, the above code will never allow 
preemption to occur. Is that correct?

||Queue Name||User Name||App Name||App Priority||Used Resources||Pending 
Resources||
|QUEUE1|User1|App1|1|200|0|
|QUEUE1|User1|App2|10|0|50|


> [Umbrella] Capacity Scheduler Preemption Within a queue
> ---
>
> Key: YARN-4945
> URL: https://issues.apache.org/jira/browse/YARN-4945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
> Attachments: Intra-Queue Preemption Use Cases.pdf, 
> IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, 
> YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch
>
>
> This is umbrella ticket to track efforts of preemption within a queue to 
> support features like:
> YARN-2009. YARN-2113. YARN-4781.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472223#comment-15472223
 ] 

Wangda Tan commented on YARN-5620:
--

Just had an offline chat with [~asuresh]. Since this is an implementation-only 
change, and container changes (like updating the binary) are orthogonal to 
allocation changes (like updating resources), I'm OK with the overall approach.

Thanks,

> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch
>
>
> This JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrading a running container with a new {{ContainerLaunchContext}}, 
> as well as the ability to roll back the upgrade if the container is not able 
> to restart using the new launch context. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472225#comment-15472225
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77928093
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -45,16 +45,19 @@
 import 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt;
 import org.apache.hadoop.yarn.util.resource.Resources;
 
+import static org.apache.hadoop.yarn.util.resource.Resources.none;
+
 @Private
 @Unstable
 public class FSLeafQueue extends FSQueue {
   private static final Log LOG = LogFactory.getLog(
   FSLeafQueue.class.getName());
+  private FairScheduler scheduler;
--- End diff --

Filed YARN-5625. 


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5625) FairScheduler should use FSContext more aggressively to avoid constructors with many parameters

2016-09-07 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-5625:
--

 Summary: FairScheduler should use FSContext more aggressively to 
avoid constructors with many parameters
 Key: YARN-5625
 URL: https://issues.apache.org/jira/browse/YARN-5625
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.9.0
Reporter: Karthik Kambatla


YARN-5609 introduces FSContext, a structure to capture basic FairScheduler 
information. In addition to preemption details, it could host references to the 
scheduler, QueueManager, AllocationConfiguration etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5323) Policies APIs (for Router and AMRMProxy policies)

2016-09-07 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472220#comment-15472220
 ] 

Giovanni Matteo Fumarola commented on YARN-5323:


+1
Thanks [~curino].

> Policies APIs (for Router and AMRMProxy policies)
> -
>
> Key: YARN-5323
> URL: https://issues.apache.org/jira/browse/YARN-5323
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: YARN-2915
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-5323-YARN-2915.05.patch, 
> YARN-5323-YARN-2915.06.patch, YARN-5323-YARN-2915.07.patch, 
> YARN-5323-YARN-2915.08.patch, YARN-5323-YARN-2915.09.patch, 
> YARN-5323-YARN-2915.10.patch, YARN-5323-YARN-2915.11.patch, 
> YARN-5323.01.patch, YARN-5323.02.patch, YARN-5323.03.patch, YARN-5323.04.patch
>
>
> This JIRA tracks APIs for the policies that will guide the Router and 
> AMRMProxy decisions on where to fwd the jobs submission/query requests as 
> well as ResourceRequests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472217#comment-15472217
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77927875
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -45,16 +45,19 @@
 import 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt;
 import org.apache.hadoop.yarn.util.resource.Resources;
 
+import static org.apache.hadoop.yarn.util.resource.Resources.none;
+
 @Private
 @Unstable
 public class FSLeafQueue extends FSQueue {
   private static final Log LOG = LogFactory.getLog(
   FSLeafQueue.class.getName());
+  private FairScheduler scheduler;
--- End diff --

Did that initially. That bloated up the patch quite a lot, and is somewhat 
orthogonal. Will file a follow-up JIRA for that work. 


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472206#comment-15472206
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77927367
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
 ---
@@ -535,6 +535,23 @@ public synchronized Resource 
getResource(SchedulerRequestKey schedulerKey) {
   }
 
   /**
+   * Method to return the next resource request to be serviced.
+   *
+   * In the initial implementation, we just pick any {@link 
ResourceRequest}
+   * corresponding to the highest priority.
+   *
+   * @return next {@link ResourceRequest} to allocate resources for.
+   */
+  @Unstable
+  public synchronized ResourceRequest getNextResourceRequest() {
+for (ResourceRequest rr:
+resourceRequestMap.get(schedulerKeys.first()).values()) {
+  return rr;
--- End diff --

Adding another variable and breaking out of the for loop seems more 
complicated than it is worth. Leaving it as is unless you insist. 
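
For the record, the variable-plus-break variant under discussion would look 
roughly like this (sketch only, not part of the patch):
{code}
// Alternative shape: assign to a local and break, instead of returning from
// inside the loop. Functionally equivalent for picking the first request at
// the highest scheduler key.
ResourceRequest nextRequest = null;
for (ResourceRequest rr :
    resourceRequestMap.get(schedulerKeys.first()).values()) {
  nextRequest = rr;
  break;
}
return nextRequest;
{code}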


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue

2016-09-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472191#comment-15472191
 ] 

Eric Payne commented on YARN-4945:
--

Thank you [~leftnoteasy]. I see now that 
{{IntraQueueCandidatesSelector#tryPreemptContainerAndDeductResToObtain}} is 
checking {{totalPreemptionAllowed}} before selecting each container.

> [Umbrella] Capacity Scheduler Preemption Within a queue
> ---
>
> Key: YARN-4945
> URL: https://issues.apache.org/jira/browse/YARN-4945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
> Attachments: Intra-Queue Preemption Use Cases.pdf, 
> IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, 
> YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch
>
>
> This is umbrella ticket to track efforts of preemption within a queue to 
> support features like:
> YARN-2009. YARN-2113. YARN-4781.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-4091) Add REST API to retrieve scheduler activity

2016-09-07 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reopened YARN-4091:
--

> Add REST API to retrieve scheduler activity
> ---
>
> Key: YARN-4091
> URL: https://issues.apache.org/jira/browse/YARN-4091
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Chen Ge
> Fix For: 3.0.0-alpha2
>
> Attachments: Improvement on debugdiagnostic information - YARN.pdf, 
> SchedulerActivityManager-TestReport v2.pdf, 
> SchedulerActivityManager-TestReport.pdf, YARN-4091-branch-2.001.patch, 
> YARN-4091-design-doc-v1.pdf, YARN-4091.1.patch, YARN-4091.2.patch, 
> YARN-4091.3.patch, YARN-4091.4.patch, YARN-4091.5.patch, YARN-4091.5.patch, 
> YARN-4091.6.patch, YARN-4091.7.patch, YARN-4091.8.patch, 
> YARN-4091.preliminary.1.patch, app_activities v2.json, app_activities.json, 
> node_activities v2.json, node_activities.json
>
>
> As schedulers are improved with various new capabilities, more configurations 
> that tune the schedulers start to take actions such as limiting container 
> assignment to an application, or introducing a delay before allocating a 
> container, etc. There is no clear information passed down from the scheduler 
> to the outside world under these various scenarios. This makes debugging much 
> tougher.
> This ticket is an effort to introduce more well-defined states at the various 
> points in the scheduler where it skips/rejects container assignment, activates 
> an application, etc. Such information will help users know what is happening 
> in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve 
> on this as we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4091) Add REST API to retrieve scheduler activity

2016-09-07 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4091:
-
Attachment: YARN-4091-branch-2.001.patch

Attached patch for branch-2; it will be good to put this into branch-2 for 
better debuggability and fewer divergences between trunk and branch-2.

> Add REST API to retrieve scheduler activity
> ---
>
> Key: YARN-4091
> URL: https://issues.apache.org/jira/browse/YARN-4091
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Chen Ge
> Fix For: 3.0.0-alpha2
>
> Attachments: Improvement on debugdiagnostic information - YARN.pdf, 
> SchedulerActivityManager-TestReport v2.pdf, 
> SchedulerActivityManager-TestReport.pdf, YARN-4091-branch-2.001.patch, 
> YARN-4091-design-doc-v1.pdf, YARN-4091.1.patch, YARN-4091.2.patch, 
> YARN-4091.3.patch, YARN-4091.4.patch, YARN-4091.5.patch, YARN-4091.5.patch, 
> YARN-4091.6.patch, YARN-4091.7.patch, YARN-4091.8.patch, 
> YARN-4091.preliminary.1.patch, app_activities v2.json, app_activities.json, 
> node_activities v2.json, node_activities.json
>
>
> As schedulers are improved with various new capabilities, more configurations 
> that tune the schedulers start to take actions such as limiting container 
> assignment to an application, or introducing a delay before allocating a 
> container, etc. There is no clear information passed down from the scheduler 
> to the outside world under these various scenarios. This makes debugging much 
> tougher.
> This ticket is an effort to introduce more well-defined states at the various 
> points in the scheduler where it skips/rejects container assignment, activates 
> an application, etc. Such information will help users know what is happening 
> in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve 
> on this as we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472129#comment-15472129
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77923085
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSPreemptionThread.java
 ---
@@ -0,0 +1,173 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
+import org.apache.hadoop.yarn.api.records.ContainerStatus;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;
+import 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerEventType;
+import 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils;
+import org.apache.hadoop.yarn.util.resource.Resources;
+
+import java.util.ArrayList;
+import java.util.Comparator;
--- End diff --

Unused import


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472111#comment-15472111
 ] 

Li Lu commented on YARN-5585:
-

Just realized that a normal converter will not address the use case where users 
really want entities sorted by their creation time, unless we introduce a 
second table to index those data... 

> [Atsv2] Add a new filter fromId in REST endpoints
> -
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5585.v0.patch
>
>
> TimelineReader REST API's provides lot of filters to retrieve the 
> applications. Along with those, it would be good to add new filter i.e fromId 
> so that entities can be retrieved after the fromId. 
> Example : If applications are stored database, app-1 app-2 ... app-10.
> *getApps?limit=5* gives app-1 to app-10. But to retrieve next 5 apps, it is 
> difficult.
> So proposal is to have fromId in the filter like 
> *getApps?limit=5&=app-5* which gives list of apps from app-6 to 
> app-10. 
> This is very useful for pagination in web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2016-09-07 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472088#comment-15472088
 ] 

Ming Ma commented on YARN-5464:
---

Maybe this was discussed in the other JIRA. Currently the timeout value 
specified in the config is relative, not an absolute timestamp. How does it 
work if the RM is restarted with the same config? Given that the value is 
relative, won't the expiration be extended? In other words, should we change 
the config from a relative value to an absolute one?
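
To illustrate the concern (variable names below are only for illustration, not 
actual config keys):
{code}
// With a relative timeout, the deadline is recomputed from "now" whenever the
// config is applied, so an RM restart silently pushes the expiration out:
long deadlineAfterRestart = System.currentTimeMillis() + relativeTimeoutMs;

// With an absolute timestamp in the config, the deadline is unaffected by an
// RM restart:
long deadlineUnchanged = absoluteDecommissionTimestampMs;
{code}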

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5585) [Atsv2] Add a new filter fromId in REST endpoints

2016-09-07 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472083#comment-15472083
 ] 

Li Lu commented on YARN-5585:
-

I think we're overcomplicating the problem here... I believe the general use 
case of this JIRA is mostly about pagination: given a uniquely defined type of 
entities in one application, if the total number of entities is greater than 
the given limit, can we provide an API that allows fetching the data in 
multiple batches? So right now we have entity_001, entity_002, ... (more 
entities than the limit), and limit = 10. What we want is: initially we fetch 
entity_001 to entity_010, then, given fromId = entity_010, we fetch entity_011 
to entity_020, and so on and so forth. According to Rohith's use case, I think 
it's totally fine to say that all entities are ordered by their IDs 
lexicographically (especially for entities with proper padding on numbers, like 
container IDs). Actually, any consistent order will do the work for pagination; 
the only problem is how to make it make sense to the users. 

The real problem here is that we need to return everything in an order sorted 
by creation time, which seems to be quite hard in our current data model. This 
was pretty easy in ATS v1, where the creation time is baked into the row key 
for each entity. I remember there were some discussions about this a while ago, 
but the general conclusion was that we mainly rely on the use cases themselves 
to guarantee consistency between creation time and entity ID. To me, the 
potential problem with sorting entities by their creation time to implement 
pagination is that we have to first fetch _all_ of them from HBase to form the 
order, which really kills the main advantage of pagination. 

An ID encoder/decoder will be very helpful for this use case. However, having 
the application write the encode/decode process seems to put more load on 
application programmers. It also introduces extra work for deployments, since 
cluster operators need to handle third-party plugins. Can we provide several 
"SORT BY" options for timeline entity types, so that we store their IDs 
accordingly? 
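
As a rough sketch of the reader-side behaviour being discussed, assuming 
entities are kept in one consistent ID order (the method and the in-memory map 
here are illustrative; the real reader works against HBase):
{code}
// Return the next "limit" entities strictly after "fromId", relying only on a
// consistent ordering of entity IDs.
List<TimelineEntity> nextPage(NavigableMap<String, TimelineEntity> entitiesById,
    String fromId, int limit) {
  List<TimelineEntity> page = new ArrayList<>();
  // tailMap(fromId, false) starts just past fromId, so each batch resumes
  // exactly where the previous one stopped.
  for (TimelineEntity entity : entitiesById.tailMap(fromId, false).values()) {
    page.add(entity);
    if (page.size() == limit) {
      break;
    }
  }
  return page;
}
{code}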

> [Atsv2] Add a new filter fromId in REST endpoints
> -
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-5585.v0.patch
>
>
> TimelineReader REST API's provides lot of filters to retrieve the 
> applications. Along with those, it would be good to add new filter i.e fromId 
> so that entities can be retrieved after the fromId. 
> Example : If applications are stored database, app-1 app-2 ... app-10.
> *getApps?limit=5* gives app-1 to app-10. But to retrieve next 5 apps, it is 
> difficult.
> So proposal is to have fromId in the filter like 
> *getApps?limit=5&=app-5* which gives list of apps from app-6 to 
> app-10. 
> This is very useful for pagination in web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472064#comment-15472064
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77919563
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -223,17 +225,76 @@ public void setPolicy(SchedulingPolicy policy)
 }
 super.policy = policy;
   }
-  
+
   @Override
-  public void recomputeShares() {
+  public void updateInternal(boolean checkStarvation) {
 readLock.lock();
 try {
   policy.computeShares(runnableApps, getFairShare());
+  if (checkStarvation) {
+identifyStarvedApplications();
+  }
 } finally {
   readLock.unlock();
 }
   }
 
+  /**
+   * Helper method to identify starved applications. This needs to be 
called
+   * ONLY from {@link #updateInternal}, after the application shares
+   * are updated.
+   *
+   * A queue can be starving due to fairshare or minshare.
+   *
+   * Minshare is defined only on the queue and not the applications.
+   * Fairshare is defined for both the queue and the applications.
+   *
+   * If this queue is starved due to minshare, we need to identify the most
+   * deserving apps if they themselves are not starved due to fairshare.
+   *
+   * If this queue is starving due to fairshare, there must be at least
+   * one application that is starved. And, even if the queue is not
+   * starved due to fairshare, there might still be starved applications.
+   */
+  private void identifyStarvedApplications() {
+// First identify starved applications and track total amount of
+// starvation (in resources)
+Resource fairShareStarvation = Resources.clone(none());
+TreeSet<FSAppAttempt> appsWithDemand = fetchAppsWithDemand();
+for (FSAppAttempt app : appsWithDemand) {
--- End diff --

Spurious space before the colon


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-09-07 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472063#comment-15472063
 ] 

Ming Ma commented on YARN-4676:
---

Thanks all! [~djp], sorry I missed your earlier question about the format. Yes, 
making YARN-5536 a blocker for 2.9 will work.

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> 
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Zhi
>Assignee: Daniel Zhi
>  Labels: features
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: GracefulDecommissionYarnNode.pdf, 
> GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, 
> YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, 
> YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, 
> YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, 
> YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch, 
> YARN-4676.018.patch, YARN-4676.019.patch, YARN-4676.020.patch, 
> YARN-4676.021.patch, YARN-4676.022.patch, YARN-4676.023.patch, 
> YARN-4676.024.patch
>
>
> YARN-4676 implements an automatic, asynchronous and flexible mechanism to 
> gracefully decommission YARN nodes. After the user issues the refreshNodes 
> request, the ResourceManager automatically evaluates the status of all affected 
> nodes to kick off decommission or recommission actions. The RM asynchronously 
> tracks container and application status related to DECOMMISSIONING nodes and 
> decommissions the nodes as soon as they are ready to be decommissioned. A 
> decommissioning timeout at individual node granularity is supported and can be 
> dynamically updated. The mechanism naturally supports multiple independent 
> graceful decommissioning “sessions”, where each one involves different sets of 
> nodes with different timeout settings. Such support is ideal and necessary for 
> graceful decommission requests issued by external cluster management software 
> instead of a human.
> DecommissioningNodeWatcher inside ResourceTrackingService tracks 
> DECOMMISSIONING node status automatically and asynchronously after the 
> client/admin makes the graceful decommission request. It tracks 
> DECOMMISSIONING node status to decide when, after all running containers on 
> the node have completed, the node will be transitioned into the DECOMMISSIONED 
> state. NodesListManager detects and handles include and exclude list changes to 
> kick off decommission or recommission as necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472060#comment-15472060
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77919411
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -223,17 +225,76 @@ public void setPolicy(SchedulingPolicy policy)
 }
 super.policy = policy;
   }
-  
+
   @Override
-  public void recomputeShares() {
+  public void updateInternal(boolean checkStarvation) {
 readLock.lock();
 try {
   policy.computeShares(runnableApps, getFairShare());
+  if (checkStarvation) {
+identifyStarvedApplications();
+  }
 } finally {
   readLock.unlock();
 }
   }
 
+  /**
+   * Helper method to identify starved applications. This needs to be 
called
+   * ONLY from {@link #updateInternal}, after the application shares
+   * are updated.
+   *
+   * A queue can be starving due to fairshare or minshare.
+   *
+   * Minshare is defined only on the queue and not the applications.
+   * Fairshare is defined for both the queue and the applications.
+   *
+   * If this queue is starved due to minshare, we need to identify the most
+   * deserving apps if they themselves are not starved due to fairshare.
+   *
+   * If this queue is starving due to fairshare, there must be at least
+   * one application that is starved. And, even if the queue is not
+   * starved due to fairshare, there might still be starved applications.
+   */
+  private void identifyStarvedApplications() {
+// First identify starved applications and track total amount of
+// starvation (in resources)
+Resource fairShareStarvation = Resources.clone(none());
+TreeSet appsWithDemand = fetchAppsWithDemand();
+for (FSAppAttempt app : appsWithDemand) {
+  Resource appStarvation = app.fairShareStarvation();
+  if (Resources.equals(Resources.none(), appStarvation))  {
+break;
+  } else {
+context.getStarvedApps().addStarvedApp(app);
--- End diff --

Feels like FSContext should have a wrapper method for addStarvedApp().
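
A rough sketch of such a wrapper (hypothetical, just to illustrate the suggestion):

{code}
// On FSContext: delegate to the starved-apps collection so callers don't
// need to reach through getStarvedApps() themselves.
public void addStarvedApp(FSAppAttempt app) {
  getStarvedApps().addStarvedApp(app);
}
{code}

The call site above would then simply read {{context.addStarvedApp(app);}}.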


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472054#comment-15472054
 ] 

Wangda Tan commented on YARN-5620:
--

[~asuresh],

I still think two APIs should be merged.

In YARN-5221, it updates properties of a container in the RM, like resource / 
execution-type.
So I think we should have a symmetric API in the NM to update properties of a 
container in the NM. 

To me, localized resources, resource, execution type, and any fields in 
ContainerLaunchContext are all properties of a container in the NM, so we should 
be able to use a unified API to update them.

And should we expose a rollback API to application? 
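
Purely as an illustration of what a unified NM-side update could look like (all 
names below are hypothetical, not an actual YARN API), the AM would send one 
request carrying whichever properties it wants changed:

{code}
// Hypothetical unified AM-NM update request; unset fields mean "no change".
NMContainerUpdateRequest request = NMContainerUpdateRequest.newInstance(containerId);
request.setResource(newResource);                     // resizing (YARN-1643/YARN-1449)
request.setExecutionType(newExecutionType);           // execution type (YARN-5221)
request.setContainerLaunchContext(newLaunchContext);  // upgrade/rollback (this JIRA)
containerManagementProtocol.updateContainer(request);
{code}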

> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472056#comment-15472056
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77919341
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -223,17 +225,76 @@ public void setPolicy(SchedulingPolicy policy)
 }
 super.policy = policy;
   }
-  
+
   @Override
-  public void recomputeShares() {
+  public void updateInternal(boolean checkStarvation) {
 readLock.lock();
 try {
   policy.computeShares(runnableApps, getFairShare());
+  if (checkStarvation) {
+identifyStarvedApplications();
+  }
 } finally {
   readLock.unlock();
 }
   }
 
+  /**
+   * Helper method to identify starved applications. This needs to be 
called
+   * ONLY from {@link #updateInternal}, after the application shares
+   * are updated.
+   *
+   * A queue can be starving due to fairshare or minshare.
+   *
+   * Minshare is defined only on the queue and not the applications.
+   * Fairshare is defined for both the queue and the applications.
+   *
+   * If this queue is starved due to minshare, we need to identify the most
+   * deserving apps if they themselves are not starved due to fairshare.
+   *
+   * If this queue is starving due to fairshare, there must be at least
+   * one application that is starved. And, even if the queue is not
+   * starved due to fairshare, there might still be starved applications.
+   */
+  private void identifyStarvedApplications() {
--- End diff --

I don't like this name.  It feels to me like an identifyX() should return 
something.  I'd rather have a name that says what it does, e.g. 
updateStarvedApplications().


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472049#comment-15472049
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77918966
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
 ---
@@ -80,13 +80,13 @@ public void removeChildQueue(FSQueue child) {
   }
 
   @Override
-  public void recomputeShares() {
+  public void updateInternal(boolean checkStarvation) {
 readLock.lock();
 try {
   policy.computeShares(childQueues, getFairShare());
   for (FSQueue childQueue : childQueues) {
 childQueue.getMetrics().setFairShare(childQueue.getFairShare());
--- End diff --

Seems like this line should be pushed down into 
FSChildQueue.updateInternal()
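
For illustration, with the code in this patch that would look roughly like the 
following in FSLeafQueue (sketch only, not taken from the patch):

{code}
  @Override
  public void updateInternal(boolean checkStarvation) {
    readLock.lock();
    try {
      policy.computeShares(runnableApps, getFairShare());
      // moved down from FSParentQueue.updateInternal()
      getMetrics().setFairShare(getFairShare());
      if (checkStarvation) {
        identifyStarvedApplications();
      }
    } finally {
      readLock.unlock();
    }
  }
{code}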


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-07 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472021#comment-15472021
 ] 

Arun Suresh edited comment on YARN-5620 at 9/7/16 10:43 PM:


[~leftnoteasy], I feel that YARN-5221 and this are orthogonal.

YARN-5221 deals with the update of a _container allocation_: properties of the 
container that are of interest to the Scheduler for allocation of the resource 
request, like Resource size and ExecutionType.

This JIRA pertains to upgrade/rollback of the _container Process_: properties 
that are of interest only to the NM for running the container and that are 
encapsulated in the *ContainerLaunchContext* and the Resources that need to be 
localized. The ContainerLaunchContext and LocalResources are not even known to 
the RM/Scheduler.

With regard to the AM-NM protocol, I feel we should keep the two separate.
# One for updating the container allocation, which is handled by YARN-5221. 
Maybe we should rename those to *updateContainerAllocation* rather than just 
*updateContainer*?
# One for updating the container process, which this JIRA proposes to call 
*upgrade* and *rollback*




was (Author: asuresh):
[~leftnoteasy], I feel that YARN-5221 and this are orthogonal.

YARN-5221 deals with the update of a _container allocation_, properties of the 
container that are of interest to the Scheduler for allocation of the resource 
request like Resource size, and ExecutionType.

This JIRA is pertaining to upgrade/rollback of the of the _container Process_, 
which are of interest only to the NM for the running of the container, which 
are encapsulated in the *ContainerLaunchContext* and Resources that need to be 
localized.

With regard to the AM-NM protocol, I feel we should keep both different.
# One for updating the container allocation which is handled by YARN-5221. 
Maybe we should rename those to *updateContainerAllocation* rather than just 
*updateContainer* ?
# One for updating the container process, which this JIRA proposes to call 
*upgrade* and *rollback*



> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-07 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472021#comment-15472021
 ] 

Arun Suresh edited comment on YARN-5620 at 9/7/16 10:42 PM:


[~leftnoteasy], I feel that YARN-5221 and this are orthogonal.

YARN-5221 deals with the update of a _container allocation_: properties of the 
container that are of interest to the Scheduler for allocation of the resource 
request, like Resource size and ExecutionType.

This JIRA pertains to upgrade/rollback of the _container Process_: properties 
that are of interest only to the NM for running the container and that are 
encapsulated in the *ContainerLaunchContext* and the Resources that need to be 
localized.

With regard to the AM-NM protocol, I feel we should keep the two separate.
# One for updating the container allocation, which is handled by YARN-5221. 
Maybe we should rename those to *updateContainerAllocation* rather than just 
*updateContainer*?
# One for updating the container process, which this JIRA proposes to call 
*upgrade* and *rollback*




was (Author: asuresh):
[~leftnoteasy], I feel the YARN-5221 and this are orthogonal.

YARN-5221 deals with the update of a _container allocation_, properties of the 
container that are of interest to the Scheduler for allocation of the resource 
request like Resource size, and ExecutionType.

This JIRA is pertaining to upgrade/rollback of the of the _container Process_, 
which are of interest only to the NM for the running of the container, which 
are encapsulated in the *ContainerLaunchContext* and Resources that need to be 
localized.

With regard to the AM-NM protocol, I feel we should keep both different.
# One for updating the container allocation which is handled by YARN-5221. 
Maybe we should rename those to *updateContainerAllocation* rather than just 
*updateContainer* ?
# One for updating the container process, which this JIRA proposes to call 
*upgrade* and *rollback*



> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-07 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472021#comment-15472021
 ] 

Arun Suresh commented on YARN-5620:
---

[~leftnoteasy], I feel that YARN-5221 and this are orthogonal.

YARN-5221 deals with the update of a _container allocation_: properties of the 
container that are of interest to the Scheduler for allocation of the resource 
request, like Resource size and ExecutionType.

This JIRA pertains to upgrade/rollback of the _container Process_: properties 
that are of interest only to the NM for running the container and that are 
encapsulated in the *ContainerLaunchContext* and the Resources that need to be 
localized.

With regard to the AM-NM protocol, I feel we should keep the two separate.
# One for updating the container allocation, which is handled by YARN-5221. 
Maybe we should rename those to *updateContainerAllocation* rather than just 
*updateContainer*?
# One for updating the container process, which this JIRA proposes to call 
*upgrade* and *rollback*
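
A naming sketch of the two groups being proposed (hypothetical signatures, only 
to make the distinction concrete):

{code}
// Naming sketch only -- hypothetical signatures, not actual YARN APIs.
public interface ContainerUpdateApisSketch {
  // 1. Allocation updates (YARN-5221): properties the Scheduler cares about.
  void updateContainerAllocation(ContainerId containerId, Resource resource,
      ExecutionType executionType);

  // 2. Container-process updates (this JIRA): properties only the NM cares about.
  void upgradeContainer(ContainerId containerId, ContainerLaunchContext newContext);
  void rollbackContainer(ContainerId containerId);
}
{code}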



> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471981#comment-15471981
 ] 

Wangda Tan commented on YARN-5620:
--

[~asuresh],

In YARN-5221 we have merged updating container resource and execution type. I 
think it's better to merge the API on the NM side as well. We have updated the NM 
for container resizing in YARN-1643/YARN-1449, and there's one pending ticket for 
updating cgroups: YARN-4166.

I would suggest having:
- A unified API in the AM-NM protocol to update a container
- A unified implementation inside to update the container manager / CGroups

to avoid making the API/implementation fragmented.

Thoughts?

> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-09-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471959#comment-15471959
 ] 

Wangda Tan commented on YARN-4734:
--

[~sunilg], the Javadoc warnings should be related to the changes for UI hosting; 
could you check?

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.10-NOT_READY.patch, 
> YARN-4734.2.patch, YARN-4734.3.patch, YARN-4734.4.patch, YARN-4734.5.patch, 
> YARN-4734.6.patch, YARN-4734.7.patch, YARN-4734.8.patch, 
> YARN-4734.9-NOT_READY.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-5539) AM fails due to "java.net.SocketTimeoutException: Read timed out"

2016-09-07 Thread Sumana Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumana Sathish resolved YARN-5539.
--
Resolution: Cannot Reproduce

Not able to reproduce the issue. 

> AM fails due to "java.net.SocketTimeoutException: Read timed out"
> -
>
> Key: YARN-5539
> URL: https://issues.apache.org/jira/browse/YARN-5539
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
>
> AM fails with the following exception
> {code}
> FATAL distributedshell.ApplicationMaster: Error running ApplicationMaster
> com.sun.jersey.api.client.ClientHandlerException: 
> java.net.SocketTimeoutException: Read timed out
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:236)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:185)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:247)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:154)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:345)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1166)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:567)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:298)
> Caused by: java.net.SocketTimeoutException: Read timed out
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:170)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
>   at 
> java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:253)
>   at 
> org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:132)
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:216)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.openConnection(DelegationTokenAuthenticatedURL.java:322)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineURLConnectionFactory.getHttpURLConnection(TimelineClientImpl.java:472)
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:159)
>   at 
> 

[jira] [Comment Edited] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-07 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471809#comment-15471809
 ] 

Arun Suresh edited comment on YARN-5620 at 9/7/16 9:17 PM:
---

Thanks for the review [~jianhe]

bq. The COMMIT_UPGRADE API: I don’t quite get the necessity of this API. Could 
you explain under what scenario should the user call this API ?
Consider an AM that upgrades a container with a new binary and the process is 
subsequently restarted. Now after say around 10 mins the process dies. There is 
no way for the NM to know if the process died because of the upgrade (memory 
leak?) or due to some transient failure, and therefore it cannot make the 
decision to Retry the process or Rollback the upgrade. Only the AM knows if the 
upgrade is actually successful. Essentially, the commit API should be used by 
the AM to notify the NM that upgrade is fine and any subsequent failure can be 
handled by the existing Retry Policy AFTER it has performed some upgrade 
diagnostics on the container. We can provide an *autoCommit* convenience method 
that clubs upgrade + commit. But I feel it is important we keep the explicit 
commit API.

bq. The ROLLBACK_UPGRADE API: I think it should be able to rollback to any 
previous version, rather than only the immediate previous one. In some sense, 
it’s the same as upgrade.
I agree AM should be able to move to any previous version, but,
# I feel the versioning should NOT be managed by the NM, since a) the launch 
context is provided and managed by the AM, the AM should take care of tying the 
context with the version b) There are (possibly huge) storage implications the 
NM would have to deal with to keep track of all the earlier versions.
# It should not be called *rollback*. The AM should call 
{{upgradeContainer(launchContext)}} with some previous context. 



bq. IMHO, we probably can use one API restartContainer(context) for both 
upgrade and downgrade
I agree that both *rollback* (explicit rollback via API) and *upgrade* can be 
implemented as wrappers over {{restartContainer(launchContext)}}. But, in my 
opinion *rollback* should not be provided with an _explicit_ launchContext; it 
should always be just the previous context.

bq. Also, Forcing containers to be restarted with previous version if upgrade 
fails may not be suitable in all cases, User wants to troubleshoot the failure 
first before triggering a new wave of restarts.
Agreed... I can include an UpgradePolicy which allows users to *terminate* or 
*rollBack* (implicit rollback) on failure. Also COMMIT is useful here if the 
user wants to verify if one wave has successfully upgraded, commit upgrade in 
those instances and then move on to the next wave.

bq. IMO, as first cut implementation, we can fail the container if upgrade 
fails. we can add retry,  rollback, or release the container as RetryPolicy on 
failure later. your opinion ?
Yup.. will include a policy, as I mentioned above. Don't think *retry* makes 
sense though.





was (Author: asuresh):
Thanks for the review [~jianhe]

bq. The COMMIT_UPGRADE API: I don’t quite get the necessity of this API. Could 
you explain under what scenario should the user call this API ?
Consider an AM that upgrades a container with a new binary and the process is 
subsequently restarted. Now after say around 10 mins the process dies. There is 
no way form the NM to know if the process died because of the upgrade (memory 
leak ?) or due to some transient failure.. and therefore it cannot make the 
decision to Retry the process or Rollback the upgrade. Only the AM knows if the 
upgrade is actually successful. Essentially, the commit API should be used by 
the AM to notify the NM that upgrade is fine and any subsequent failure can be 
handled by the existing Retry Policy AFTER it has performed some upgrade 
diagnostics on the container. We can provide an *autoCommit* convenience method 
that clubs upgrade + commit. But I feel it is important we keep the explicit 
commit API.

bq. The ROLLBACK_UPGRADE API: I think it should be able to rollback to any 
previous version, rather than only the immediate previous one. In some sense, 
it’s the same as upgrade.
I agree AM should be able to move to any previous version, but,
# I feel the versioning should NOT be managed by the NM, since a) the launch 
context is provided and managed by the AM, the AM should take care of tying the 
context with the version b) There are (possibly huge) storage implications the 
NM would have to deal with to keep track of all the earlier versions.
# It should not be called *rollback*. The AM should call 
{{restartContainer(launchContext)}} with some previous context. 


bq. IMHO, we probably can use one API restartContainer(context) for both 
upgrade and downgrade
I agree that both *rollback* (explicit rollback via API) and *upgrade* can be 
implemented as wrappers over 

[jira] [Comment Edited] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-07 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471809#comment-15471809
 ] 

Arun Suresh edited comment on YARN-5620 at 9/7/16 9:07 PM:
---

Thanks for the review [~jianhe]

bq. The COMMIT_UPGRADE API: I don’t quite get the necessity of this API. Could 
you explain under what scenario should the user call this API ?
Consider an AM that upgrades a container with a new binary and the process is 
subsequently restarted. Now after say around 10 mins the process dies. There is 
no way form the NM to know if the process died because of the upgrade (memory 
leak ?) or due to some transient failure.. and therefore it cannot make the 
decision to Retry the process or Rollback the upgrade. Only the AM knows if the 
upgrade is actually successful. Essentially, the commit API should be used by 
the AM to notify the NM that upgrade is fine and any subsequent failure can be 
handled by the existing Retry Policy AFTER it has performed some upgrade 
diagnostics on the container. We can provide an *autoCommit* convenience method 
that clubs upgrade + commit. But I feel it is important we keep the explicit 
commit API.

bq. The ROLLBACK_UPGRADE API: I think it should be able to rollback to any 
previous version, rather than only the immediate previous one. In some sense, 
it’s the same as upgrade.
I agree AM should be able to move to any previous version, but,
# I feel the versioning should NOT be managed by the NM, since a) the launch 
context is provided and managed by the AM, the AM should take care of tying the 
context with the version b) There are (possibly huge) storage implications the 
NM would have to deal with to keep track of all the earlier versions.
# It should not be called *rollback*. The AM should call 
{{restartContainer(launchContext)}} with some previous context. 


bq. IMHO, we probably can use one API restartContainer(context) for both 
upgrade and downgrade
I agree that both *rollback* (explicit rollback via API) and *upgrade* can be 
implemented as wrappers over {{restartContainer(launchContext)}}. But, in my 
opinion *rollback* should not be provided with an _explicit_ launchContext; it 
should always be just the previous context.







was (Author: asuresh):
Thanks for the review [~jianhe]

bq. The COMMIT_UPGRADE API: I don’t quite get the necessity of this API. Could 
you explain under what scenario should the user call this API ?
Consider an AM that upgrades a container with a new binary and the process is 
subsequently restarted. Now after say around 10 mins the process dies. There is 
no way form the NM to know if the process died because of the upgrade (memory 
leak ?) or due to some transient failure.. and therefore it cannot make the 
decision to Retry the process or Rollback the upgrade. Only the AM knows if the 
upgrade is actually successful. Essentially, the commit API should be used by 
the AM to notify the NM that upgrade is fine and any subsequent failure can be 
handled by the existing Retry Policy AFTER it has performed some upgrade 
diagnostics on the container. We can provide an *autoCommit* convenience method 
that clubs upgrade + commit. But I feel it is important we keep the explicit 
commit API.

bq. The ROLLBACK_UPGRADE API: I think it should be able to rollback to any 
previous version, rather than only the immediate previous one. In some sense, 
it’s the same as upgrade.
I agree AM should be able to move to any previous version, but,
# I feel the versioning should NOT be managed by the NM, since a) the launch 
context is provided and managed by the AM, the AM should take care of tying the 
context with the version b) There are (possibly huge) storage implications the 
NM would have to deal with to keep track of all the earlier versions.
# It should not be called *rollback*. The AM should call 
{{restartContainer(launchContext)}} with some previous context. 

bq. IMHO, we probably can use one API restartContainer(context) for both 
upgrade and downgrade
I agree that both *rollback* (explicit rollback via API) and *upgrade* can be 
implemented as wrappers over {{restartContainer(launchContext)}}. But, in my 
opinion *rollback* should not be provided with an _explicit_ launchContext, it 
should always be the just previous context.






> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with 

[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-07 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471809#comment-15471809
 ] 

Arun Suresh commented on YARN-5620:
---

Thanks for the review [~jianhe]

bq. The COMMIT_UPGRADE API: I don’t quite get the necessity of this API. Could 
you explain under what scenario should the user call this API ?
Consider an AM that upgrades a container with a new binary and the process is 
subsequently restarted. Now after say around 10 mins the process dies. There is 
no way for the NM to know if the process died because of the upgrade (memory 
leak?) or due to some transient failure, and therefore it cannot make the 
decision to Retry the process or Rollback the upgrade. Only the AM knows if the 
upgrade is actually successful. Essentially, the commit API should be used by 
the AM to notify the NM that upgrade is fine and any subsequent failure can be 
handled by the existing Retry Policy AFTER it has performed some upgrade 
diagnostics on the container. We can provide an *autoCommit* convenience method 
that clubs upgrade + commit. But I feel it is important we keep the explicit 
commit API.

bq. The ROLLBACK_UPGRADE API: I think it should be able to rollback to any 
previous version, rather than only the immediate previous one. In some sense, 
it’s the same as upgrade.
I agree AM should be able to move to any previous version, but,
# I feel the versioning should NOT be managed by the NM, since a) the launch 
context is provided and managed by the AM, the AM should take care of tying the 
context with the version b) There are (possibly huge) storage implications the 
NM would have to deal with to keep track of all the earlier versions.
# It should not be called *rollback*. The AM should call 
{{restartContainer(launchContext)}} with some previous context. 

bq. IMHO, we probably can use one API restartContainer(context) for both 
upgrade and downgrade
I agree that both *rollback* (explicit rollback via API) and *upgrade* can be 
implemented as wrappers over {{restartContainer(launchContext)}}. But, in my 
opinion *rollback* should not be provided with an _explicit_ launchContext; it 
should always be just the previous context.
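
A rough sketch of the relationship described above (hypothetical helper names and 
bookkeeping inside the NM's container manager, not the actual patch): upgrade and 
rollback as thin wrappers over a restart with a given launch context, where 
rollback always uses just the previous context and commit discards it:

{code}
private final Map<ContainerId, ContainerLaunchContext> previousContexts =
    new ConcurrentHashMap<>();

public void upgradeContainer(ContainerId id, ContainerLaunchContext newContext) {
  previousContexts.put(id, getCurrentLaunchContext(id)); // only the last one is kept
  restartContainer(id, newContext);
}

public void rollbackContainer(ContainerId id) {
  restartContainer(id, previousContexts.remove(id));     // implicit "previous" context
}

public void commitUpgrade(ContainerId id) {
  previousContexts.remove(id); // AM says the upgrade is good; normal retry policy applies
}
{code}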






> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrade of a running container with a new {{ContainerLaunchContext}} 
> as well as the ability to rollback the upgrade if the container is not able 
> to restart using the new launch Context. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue

2016-09-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471794#comment-15471794
 ] 

Wangda Tan commented on YARN-4945:
--

Hi [~eepayne],

Maybe there are some misunderstandings. First of all, I'm not quite sure why, in 
step 5, the priority intra-queue preemption policy can select 5 more resources.
Step 7 is reasonable to me; imbalance between queues has higher priority than 
priority inversion within a queue.
 
In my mind, the whole preemption process will be (same as your examples):
Assume each queue has total-preemption-per-round, which is the total preemption 
allowed for inter-queue + intra-queue preemption.

Steps 1-4 will be the same as what you described.
Step 5 will not happen because there are 10 containers already marked for 
preemption.

So Step 4 will be repeated until:

{code}
Queue 1:
   User1, Used=100, Pending=50
   User2, Used=0, Pending=50
Queue 2: Used=100,
{code}

Once the inter-queue resource usage is back in balance, the intra-queue 
preemption policy can start to preempt resources.

So Step 5 will be:

{code}
10 containers marked for preemption from User1 in Queue 1; after these 
containers are preempted, they will be picked up by User2 in Queue 1.
{code}

[~sunilg] please add your thoughts if you think different.

Thanks,


> [Umbrella] Capacity Scheduler Preemption Within a queue
> ---
>
> Key: YARN-4945
> URL: https://issues.apache.org/jira/browse/YARN-4945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
> Attachments: Intra-Queue Preemption Use Cases.pdf, 
> IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, 
> YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch
>
>
> This is umbrella ticket to track efforts of preemption within a queue to 
> support features like:
> YARN-2009. YARN-2113. YARN-4781.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471783#comment-15471783
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77902532
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSPreemptionThread.java
 ---
@@ -0,0 +1,173 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
+import org.apache.hadoop.yarn.api.records.ContainerStatus;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;
+import 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerEventType;
+import 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils;
+import org.apache.hadoop.yarn.util.resource.Resources;
+
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Timer;
+import java.util.TimerTask;
+
+/**
+ * Thread that handles FairScheduler preemption
+ */
+public class FSPreemptionThread extends Thread {
+  private static final Log LOG = 
LogFactory.getLog(FSPreemptionThread.class);
+  private final FSContext context;
+  private final FairScheduler scheduler;
+  private final long warnTimeBeforeKill;
+  private final Timer preemptionTimer;
+
+  public FSPreemptionThread(FairScheduler scheduler) {
+this.scheduler = scheduler;
+this.context = scheduler.getContext();
+FairSchedulerConfiguration fsConf = scheduler.getConf();
+context.setPreemptionEnabled();
+context.setPreemptionUtilizationThreshold(
+fsConf.getPreemptionUtilizationThreshold());
+warnTimeBeforeKill = fsConf.getWaitTimeBeforeKill();
+preemptionTimer = new Timer("Preemption Timer", true);
+
+setDaemon(true);
+setName("FSPreemptionThread");
+  }
+
+  public void run() {
+while (!Thread.interrupted()) {
+  FSAppAttempt starvedApp;
+  try{
+starvedApp = context.getStarvedApps().take();
+if (Resources.none().equals(starvedApp.getStarvation())) {
+  continue;
+}
+  } catch (InterruptedException e) {
+LOG.info("Preemption thread interrupted! Exiting.");
+return;
--- End diff --

You could also replace this return with starvedApp = null, and then put a 
guard at the beginning of identifyContainersToPreempt(), which is arguably a 
good thing to do in any case.
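
For illustration, that alternative could look roughly like this (sketch only; 
identifyContainersToPreempt() is the method referenced in this review):

{code}
FSAppAttempt starvedApp = null;
try {
  starvedApp = context.getStarvedApps().take();
} catch (InterruptedException e) {
  LOG.info("Preemption thread interrupted! Exiting.");
}
// identifyContainersToPreempt() would then start with a null guard, e.g.:
//   if (starvedApp == null) { return; }   // hypothetical, depends on its return type
identifyContainersToPreempt(starvedApp);
{code}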


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471772#comment-15471772
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77901904
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSPreemptionThread.java
 ---
@@ -0,0 +1,173 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
+import org.apache.hadoop.yarn.api.records.ContainerStatus;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;
+import 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerEventType;
+import 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils;
+import org.apache.hadoop.yarn.util.resource.Resources;
+
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Timer;
+import java.util.TimerTask;
+
+/**
+ * Thread that handles FairScheduler preemption
+ */
+public class FSPreemptionThread extends Thread {
+  private static final Log LOG = 
LogFactory.getLog(FSPreemptionThread.class);
+  private final FSContext context;
+  private final FairScheduler scheduler;
+  private final long warnTimeBeforeKill;
+  private final Timer preemptionTimer;
+
+  public FSPreemptionThread(FairScheduler scheduler) {
+this.scheduler = scheduler;
+this.context = scheduler.getContext();
+FairSchedulerConfiguration fsConf = scheduler.getConf();
+context.setPreemptionEnabled();
+context.setPreemptionUtilizationThreshold(
+fsConf.getPreemptionUtilizationThreshold());
+warnTimeBeforeKill = fsConf.getWaitTimeBeforeKill();
+preemptionTimer = new Timer("Preemption Timer", true);
+
+setDaemon(true);
+setName("FSPreemptionThread");
+  }
+
+  public void run() {
+while (!Thread.interrupted()) {
+  FSAppAttempt starvedApp;
+  try{
+starvedApp = context.getStarvedApps().take();
+if (Resources.none().equals(starvedApp.getStarvation())) {
+  continue;
+}
+  } catch (InterruptedException e) {
+LOG.info("Preemption thread interrupted! Exiting.");
+return;
--- End diff --

You should call interrupt();
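
For reference, the suggested change would make the catch block look roughly like 
this (sketch only):

{code}
} catch (InterruptedException e) {
  LOG.info("Preemption thread interrupted! Exiting.");
  // Restore the interrupt status so the owner of the thread can observe it.
  Thread.currentThread().interrupt();
  return;
}
{code}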


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471764#comment-15471764
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77901500
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSPreemptionThread.java
 ---
@@ -0,0 +1,173 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
+import org.apache.hadoop.yarn.api.records.ContainerStatus;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;
+import 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerEventType;
+import 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils;
+import org.apache.hadoop.yarn.util.resource.Resources;
+
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Timer;
+import java.util.TimerTask;
+
+/**
+ * Thread that handles FairScheduler preemption
+ */
+public class FSPreemptionThread extends Thread {
+  private static final Log LOG = 
LogFactory.getLog(FSPreemptionThread.class);
+  private final FSContext context;
+  private final FairScheduler scheduler;
+  private final long warnTimeBeforeKill;
+  private final Timer preemptionTimer;
+
+  public FSPreemptionThread(FairScheduler scheduler) {
+this.scheduler = scheduler;
+this.context = scheduler.getContext();
+FairSchedulerConfiguration fsConf = scheduler.getConf();
+context.setPreemptionEnabled();
+context.setPreemptionUtilizationThreshold(
+fsConf.getPreemptionUtilizationThreshold());
+warnTimeBeforeKill = fsConf.getWaitTimeBeforeKill();
+preemptionTimer = new Timer("Preemption Timer", true);
+
+setDaemon(true);
+setName("FSPreemptionThread");
+  }
+
+  public void run() {
+while (!Thread.interrupted()) {
+  FSAppAttempt starvedApp;
+  try{
+starvedApp = context.getStarvedApps().take();
+if (Resources.none().equals(starvedApp.getStarvation())) {
--- End diff --

Feels like this should be a Resources.isNone() call.
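
That is, assuming a {{Resources.isNone(Resource)}} helper exists or is added 
(sketch only):

{code}
if (Resources.isNone(starvedApp.getStarvation())) {
  continue;
}
{code}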


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5555) Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically nested.

2016-09-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471662#comment-15471662
 ] 

Eric Payne commented on YARN-5555:
--

Any objections if I backport this to branch-2.8?

> Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically 
> nested.
> 
>
> Key: YARN-5555
> URL: https://issues.apache.org/jira/browse/YARN-5555
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: PctOfQueueIsInaccurate.jpg, YARN-.001.patch
>
>
> If a leaf queue is hierarchically nested (e.g., {{root.a.a1}}, 
> {{root.a.a2}}), the values in the "*% of Queue*" column in the apps section 
> of the Scheduler UI is calculated as if the leaf queue ({{a1}}) were a direct 
> child of {{root}}.






[jira] [Updated] (YARN-4954) TestYarnClient.testReservationAPIs fails on machines with less than 4 GB available memory

2016-09-07 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4954:
--
Attachment: YARN-4954.002.patch

Attaching a new patch for trunk, since the context was changed by YARN-4957

> TestYarnClient.testReservationAPIs fails on machines with less than 4 GB 
> available memory
> -
>
> Key: YARN-4954
> URL: https://issues.apache.org/jira/browse/YARN-4954
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0-alpha1
>Reporter: Gergely Novák
>Assignee: Gergely Novák
>Priority: Critical
> Attachments: YARN-4954.001.patch, YARN-4954.002.patch
>
>
> TestYarnClient.testReservationAPIs sometimes fails with this error:
> {noformat}
> java.lang.AssertionError: 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningException:
>  The request cannot be satisfied
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1254)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitReservation(ApplicationClientProtocolPBServiceImpl.java:457)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:515)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416)
> Caused by: 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningException:
>  The request cannot be satisfied
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.IterativePlanner.computeJobAllocation(IterativePlanner.java:151)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.PlanningAlgorithm.allocateUser(PlanningAlgorithm.java:64)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.PlanningAlgorithm.createReservation(PlanningAlgorithm.java:140)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.TryManyReservationAgents.createReservation(TryManyReservationAgents.java:55)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.AlignedPlannerWithGreedy.createReservation(AlignedPlannerWithGreedy.java:84)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1237)
>   ... 10 more
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestYarnClient.testReservationAPIs(TestYarnClient.java:1227)
> {noformat}
> This is caused by really not having enough available memory to complete the 
> reservation (4 * 1024 MB). In my opinion lowering the required memory (either 
> by lowering the number of containers to 2, or the memory to 512 MB) would 
> make the test more stable. 






[jira] [Commented] (YARN-5622) TestYarnCLI.testGetContainers fails due to mismatched date formats

2016-09-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471591#comment-15471591
 ] 

Hadoop QA commented on YARN-5622:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
45s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 55s 
{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
14s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 56s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12827427/YARN-5622.001.patch |
| JIRA Issue | YARN-5622 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux caf7aec27c64 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / f414d5e |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/13037/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/13037/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> TestYarnCLI.testGetContainers fails due to mismatched date formats
> --
>
> Key: YARN-5622
> URL: https://issues.apache.org/jira/browse/YARN-5622
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-5622.001.patch
>
>
> ApplicationCLI.listContainers uses Times.format to print timestamps, while 
> TestYarnCLI.testGetContainers formats them using dateFormat.format with its 
> own defined format. The 

[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue

2016-09-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471536#comment-15471536
 ] 

Eric Payne commented on YARN-4945:
--

[~leftnoteasy] and [~sunilg], I'm still concerned about having the intra-queue
preemption policies add containers to the {{selectedCandidates}} list when the
inter-queue policies have already added containers. In that case, the
containers selected by the intra-queue policies may not go back to the correct
queue. Consider this use case:

Queues (all are preemptable):
||Queue Name||Guaranteed Resources||Max 
Resources||{{total_preemption_per_round}}||
|root|200|200|0.1|
|QUEUE1|100|200|0.1|
|QUEUE2|100|200|0.1|

# {{User1}} starts {{App1}} on {{QUEUE1}} and uses all 200 resources. These 
containers are long-running and will not be released any time soon:
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending 
Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|200|0|0|
# {{User2}} starts {{App2}} on {{QUEUE2}} and requests 100 resources:
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending 
Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|200|0|0|
|QUEUE2|User2|App2|1|0|100|0|
# {{User1}} starts {{App3}} at a high priority on {{QUEUE1}} and requests 50 
resources:
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending 
Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|200|0|0|
|QUEUE1|User1|App3|10|0|50|0|
|QUEUE2|User2|App2|1|0|100|0|
# Since {{total_preemption_per_round}} is 0.1, only 10% of the needed resources 
will be selected per round. So, the inter-queue preemption policies select 10 
resources to be preempted from {{App1}}.
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending 
Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|200|0|10|
|QUEUE1|User1|App3|10|0|50|0|
|QUEUE2|User2|App2|1|0|100|0|
# Then, the priority-intra-queue preemption policy selects 5 more resources to 
be preempted from {{App1}}.
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending 
Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|200|0|15|
|QUEUE1|User1|App3|10|0|50|0|
|QUEUE2|User2|App2|1|0|100|0|
# At this point, 15 resources are preempted from {{App1}}.
# Since {{QUEUE2}} is asking for 100 resources, and is extremely underserved 
(from an inter-queue point of view), the capacity scheduler gives all 15 
resources to {{QUEUE2}}, and the priority inversion remains in {{QUEUE1}}.
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending 
Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|185|15|0|
|QUEUE1|User1|App3|10|0|50|0|
|QUEUE2|User2|App2|1|15|85|0|

This is why I am concerned that when containers are already selected by the 
inter-queue preemption policies, it may not be beneficial to have the 
intra-queue policies preempt containers as well.


> [Umbrella] Capacity Scheduler Preemption Within a queue
> ---
>
> Key: YARN-4945
> URL: https://issues.apache.org/jira/browse/YARN-4945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
> Attachments: Intra-Queue Preemption Use Cases.pdf, 
> IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, 
> YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch
>
>
> This is umbrella ticket to track efforts of preemption within a queue to 
> support features like:
> YARN-2009. YARN-2113. YARN-4781.






[jira] [Updated] (YARN-5622) TestYarnCLI.testGetContainers fails due to mismatched date formats

2016-09-07 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5622:
--
Attachment: YARN-5622.001.patch

Attaching a patch that uses Times.format() in TestYarnCLI.testGetContainers
instead of dateFormat.format().
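Roughly, the change is the following (illustrative only, not the literal patch hunk; the timestamp variable here is a stand-in for the test's fake container times):

{code}
// before: the test built its expected timestamps with its own local format
//   DateFormat dateFormat = new SimpleDateFormat("d-MMM-yyyy HH:mm:ss");
//   String expected = dateFormat.format(new Date(startTime));

// after: reuse the same formatter ApplicationCLI uses, so the two can't drift
String expected = Times.format(startTime);
{code}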

> TestYarnCLI.testGetContainers fails due to mismatched date formats
> --
>
> Key: YARN-5622
> URL: https://issues.apache.org/jira/browse/YARN-5622
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-5622.001.patch
>
>
> ApplicationCLI.listContainers uses Times.format to print timestamps, while 
> TestYarnCLI.testGetContainers formats them using dateFormat.format with its 
> own defined format. The test should be consistent and use Times.format. 
> {noformat}
> org.junit.ComparisonFailure: expected:<...1234_0005_01_01 [Thu Jan 01 
> 00:00:01 + 1970 Thu Jan 01 00:00:05 + 1970  COMPLETE  
>  host:1234http://host:2345 
> logURL
>  container_1234_0005_01_02Thu Jan 01 00:00:01 + 1970  Thu Jan 
> 01 00:00:05 + 1970  COMPLETE   host:1234
> http://host:2345 logURL
>  container_1234_0005_01_03Thu Jan 01 00:00:01 + 1970] 
>  N/...> but was:<...1234_0005_01_01 [ 1-Jan-1970 00:00:01
> 1-Jan-1970 00:00:05COMPLETE   host:1234
> http://host:2345 logURL
>  container_1234_0005_01_02 1-Jan-1970 00:00:01 1-Jan-1970 
> 00:00:05COMPLETE   host:1234
> http://host:2345 logURL
>  container_1234_0005_01_03 1-Jan-1970 00:00:01]   
>  N/...>
>   at org.junit.Assert.assertEquals(Assert.java:115)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.yarn.client.cli.TestYarnCLI.testGetContainers(TestYarnCLI.java:330)
> {noformat}






[jira] [Commented] (YARN-4232) TopCLI console support for HA mode

2016-09-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471511#comment-15471511
 ] 

Hadoop QA commented on YARN-4232:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
3s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 30s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
39s {color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch 
generated 0 new + 164 unchanged - 1 fixed = 164 total (was 165) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s 
{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 16m 3s 
{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 41m 19s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12827416/YARN-4232.005.patch |
| JIRA Issue | YARN-4232 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 2031d2848342 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / f414d5e |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/13035/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: 
hadoop-yarn-project/hadoop-yarn |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/13035/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> TopCLI console support for HA mode
> 

[jira] [Commented] (YARN-5623) Apply SLIDER-1166 to yarn-native-services branch

2016-09-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471499#comment-15471499
 ] 

Hadoop QA commented on YARN-5623:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
8s {color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
25s {color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} yarn-native-services passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 54s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-slider/hadoop-yarn-slider-core
 in yarn-native-services has 317 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 26s 
{color} | {color:red} hadoop-yarn-slider-core in yarn-native-services failed. 
{color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 22s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-slider/hadoop-yarn-slider-core:
 The patch generated 2 new + 402 unchanged - 1 fixed = 404 total (was 403) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s 
{color} | {color:red} hadoop-yarn-slider-core in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 17s 
{color} | {color:green} hadoop-yarn-slider-core in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 17s 
{color} | {color:red} The patch generated 10 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 15m 26s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12827420/YARN-5623-yarn-native-services.001.patch
 |
| JIRA Issue | YARN-5623 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux abc1ab94e072 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | yarn-native-services / c64a |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| findbugs | 
https://builds.apache.org/job/PreCommit-YARN-Build/13036/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-slider_hadoop-yarn-slider-core-warnings.html
 |
| javadoc | 

[jira] [Commented] (YARN-5586) Update the Resources class to consider all resource types

2016-09-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471463#comment-15471463
 ] 

Hadoop QA commented on YARN-5586:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 3s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
26s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s 
{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
38s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s 
{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
32s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
53s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s 
{color} | {color:green} YARN-3926 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
35s {color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch 
generated 0 new + 42 unchanged - 1 fixed = 42 total (was 43) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 19s 
{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 41m 47s 
{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m 32s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12827402/YARN-5586-YARN-3926.002.patch
 |
| JIRA Issue | YARN-5586 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 4b5771a827c2 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-3926 / d39495c |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/13034/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: hadoop-yarn-project/hadoop-yarn |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/13034/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |



[jira] [Updated] (YARN-5623) Apply SLIDER-1166 to yarn-native-services branch

2016-09-07 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-5623:

Attachment: YARN-5623-yarn-native-services.001.patch

> Apply SLIDER-1166 to yarn-native-services branch
> 
>
> Key: YARN-5623
> URL: https://issues.apache.org/jira/browse/YARN-5623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Gour Saha
> Fix For: yarn-native-services
>
> Attachments: YARN-5623-yarn-native-services.001.patch
>
>
> SLIDER-1166 fixes a critical issue in SliderClient when used as a service. It 
> needs to be merged into yarn-native-services branch as well.






[jira] [Updated] (YARN-4232) TopCLI console support for HA mode

2016-09-07 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4232:
---
Attachment: YARN-4232.005.patch

Attaching a patch after handling the checkstyle issue.

> TopCLI console support for HA mode
> --
>
> Key: YARN-4232
> URL: https://issues.apache.org/jira/browse/YARN-4232
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4232.patch, 0002-YARN-4232.patch, 
> YARN-4232.003.patch, YARN-4232.004.patch, YARN-4232.005.patch
>
>
> *Steps to reproduce*
> Start Top command in YARN in HA mode
> ./yarn top
> {noformat}
> usage: yarn top
>  -cols  Number of columns on the terminal
>  -delay The refresh delay(in seconds), default is 3 seconds
>  -help   Print usage; for help while the tool is running press 'h'
>  + Enter
>  -queuesComma separated list of queues to restrict applications
>  -rows  Number of rows on the terminal
>  -types Comma separated list of types to restrict applications,
>  case sensitive(though the display is lower case)
>  -users Comma separated list of users to restrict applications
> {noformat}
> Execute *for help while the tool is running press 'h'  + Enter* while top 
> tool is running
> Exception is thrown in console continuously
> {noformat}
> 15/10/07 14:59:28 ERROR cli.TopCLI: Could not fetch RM start time
> java.net.ConnectException: Connection refused
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
> at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
> at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> at java.net.Socket.connect(Socket.java:538)
> at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
> at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
> at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
> at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
> at sun.net.www.http.HttpClient.New(HttpClient.java:308)
> at sun.net.www.http.HttpClient.New(HttpClient.java:326)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1168)
> at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1104)
> at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:998)
> at 
> sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:932)
> at 
> org.apache.hadoop.yarn.client.cli.TopCLI.getRMStartTime(TopCLI.java:742)
> at org.apache.hadoop.yarn.client.cli.TopCLI.run(TopCLI.java:467)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.yarn.client.cli.TopCLI.main(TopCLI.java:420)
> {noformat}






[jira] [Commented] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.

2016-09-07 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471359#comment-15471359
 ] 

Sunil G commented on YARN-3224:
---

Yes, sure. There is a related ticket in the preemption framework, YARN-3784. With
that patch, we can send a per-container preemption timeout to the AM. I will try
to rebase both and will check with other community members for reviews.

> Notify AM with containers (on decommissioning node) could be preempted after 
> timeout.
> -
>
> Key: YARN-3224
> URL: https://issues.apache.org/jira/browse/YARN-3224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3224.patch, 0002-YARN-3224.patch
>
>
> We should leverage YARN preemption framework to notify AM that some 
> containers will be preempted after a timeout.






[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471319#comment-15471319
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77872281
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -557,28 +599,33 @@ private boolean preemptContainerPreCheck() {
 getFairShare());
   }
 
-  /**
-   * Is a queue being starved for its min share.
-   */
-  @VisibleForTesting
-  boolean isStarvedForMinShare() {
-    return isStarved(getMinShare());
+  private Resource minShareStarvation() {
+    Resource desiredShare = Resources.min(policy.getResourceCalculator(),
+        scheduler.getClusterResource(), getMinShare(), getDemand());
+
+    Resource starvation =
+        Resources.subtract(desiredShare, getResourceUsage());
+    boolean starved = Resources.greaterThan(policy.getResourceCalculator(),
+        scheduler.getClusterResource(), starvation, none());
+
+    long now = scheduler.getClock().getTime();
+    if (!starved) {
+      setLastTimeAtMinShare(now);
+    }
+
+    if (starved &&
+        (now - lastTimeAtMinShare > getMinSharePreemptionTimeout())) {
+      return starvation;
+    } else {
+      return Resources.clone(Resources.none());
--- End diff --

Single exit point, please
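One possible single-exit shape for the tail of minShareStarvation(), purely to illustrate the comment (not the author's code):

    long now = scheduler.getClock().getTime();
    if (!starved) {
      setLastTimeAtMinShare(now);
    }

    // fall through to a single return
    if (!starved ||
        (now - lastTimeAtMinShare <= getMinSharePreemptionTimeout())) {
      starvation = Resources.clone(Resources.none());
    }
    return starvation;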


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers






[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471313#comment-15471313
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77871756
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -557,28 +599,33 @@ private boolean preemptContainerPreCheck() {
 getFairShare());
   }
 
-  /**
-   * Is a queue being starved for its min share.
-   */
-  @VisibleForTesting
-  boolean isStarvedForMinShare() {
-    return isStarved(getMinShare());
+  private Resource minShareStarvation() {
--- End diff --

This seems convoluted. Why not:

boolean starved = false;
Resource starvation = none();
if (Resources.greaterThan(policy.getResourceCalculator(),
    scheduler.getClusterResource(), getDemand(), getMinShare())) {
  starvation = Resources.subtract(getMinShare(), getResourceUsage());
  starved = Resources.greaterThan(policy.getResourceCalculator(),
      scheduler.getClusterResource(), starvation, none());
}

Nets out to the same thing, but the logic isn't as hard (for me) to follow.


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers






[jira] [Updated] (YARN-5624) config.getResource is giving wrong information in case of FileSystemBasedConfigurationProvider

2016-09-07 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-5624:
--
Description: 
When FileSystemBasedConfigurationProvider is configured, {{conf.getResource}} 
has to give the path from HDFS. Currently it returns a local path.

Also improve logs during init and in rmAdmin commands to indicate clear 
information about the FileSystemBasedConfigurationProvider path (hdfs).

  was:
When FileSystemBasedConfigurationProvider is configures, {{conf.getResource}} 
has to give patch from HDFS. Currently it returns local patch.

Also improve logs during init and in rmAdmin commands to indicate clear 
information about the file system provided.


> config.getResource is giving wrong information in case of 
> FileSystemBasedConfigurationProvider 
> ---
>
> Key: YARN-5624
> URL: https://issues.apache.org/jira/browse/YARN-5624
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sunil G
>Assignee: Sunil G
>
> When FileSystemBasedConfigurationProvider is configured, {{conf.getResource}} 
> has to give the path from HDFS. Currently it returns a local path.
> Also improve logs during init and in rmAdmin commands to indicate clear 
> information about the FileSystemBasedConfigurationProvider path (hdfs).






[jira] [Created] (YARN-5624) config.getResource is giving wrong information in case of FileSystemBasedConfigurationProvider

2016-09-07 Thread Sunil G (JIRA)
Sunil G created YARN-5624:
-

 Summary: config.getResource is giving wrong information in case of 
FileSystemBasedConfigurationProvider 
 Key: YARN-5624
 URL: https://issues.apache.org/jira/browse/YARN-5624
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: Sunil G
Assignee: Sunil G


When FileSystemBasedConfigurationProvider is configures, {{conf.getResource}} 
has to give patch from HDFS. Currently it returns local patch.

Also improve logs during init and in rmAdmin commands to indicate clear 
information about the file system provided.






[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471261#comment-15471261
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77868830
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -316,26 +377,12 @@ public Resource assignContainer(FSSchedulerNode node) {
       return assigned;
     }
 
-    // Apps that have resource demands.
-    TreeSet<FSAppAttempt> pendingForResourceApps =
-        new TreeSet<FSAppAttempt>(policy.getComparator());
-    readLock.lock();
-    try {
-      for (FSAppAttempt app : runnableApps) {
-        Resource pending = app.getAppAttemptResourceUsage().getPending();
-        if (!pending.equals(Resources.none())) {
-          pendingForResourceApps.add(app);
-        }
-      }
-    } finally {
-      readLock.unlock();
-    }
-    for (FSAppAttempt sched : pendingForResourceApps) {
+    for (FSAppAttempt sched : fetchAppsWithDemand()) {
       if (SchedulerAppUtils.isPlaceBlacklisted(sched, node, LOG)) {
         continue;
--- End diff --

It would be nice to replace this _continue_ by wrapping the next few lines 
in the _if_.
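A sketch of what that would look like (the loop body is elided as a comment, since the rest of the hunk is not quoted here):

    for (FSAppAttempt sched : fetchAppsWithDemand()) {
      if (!SchedulerAppUtils.isPlaceBlacklisted(sched, node, LOG)) {
        // ... the assignment logic that currently follows the continue ...
      }
    }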


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers






[jira] [Commented] (YARN-5323) Policies APIs (for Router and AMRMProxy policies)

2016-09-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471260#comment-15471260
 ] 

Hadoop QA commented on YARN-5323:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
8s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
47s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
1s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 32s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 25s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12827400/YARN-5323-YARN-2915.11.patch
 |
| JIRA Issue | YARN-5323 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 7ec8d32f72dc 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-2915 / f2985a3 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/13033/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/13033/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Policies APIs (for Router and AMRMProxy policies)
> -
>
> Key: YARN-5323
> URL: https://issues.apache.org/jira/browse/YARN-5323
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: YARN-2915
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-5323-YARN-2915.05.patch, 
> 

[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471241#comment-15471241
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77868142
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -223,17 +225,76 @@ public void setPolicy(SchedulingPolicy policy)
     }
     super.policy = policy;
   }
-  
+
   @Override
-  public void recomputeShares() {
+  public void updateInternal(boolean checkStarvation) {
     readLock.lock();
     try {
       policy.computeShares(runnableApps, getFairShare());
+      if (checkStarvation) {
+        identifyStarvedApplications();
+      }
     } finally {
       readLock.unlock();
     }
   }
 
+  /**
+   * Helper method to identify starved applications. This needs to be called
+   * ONLY from {@link #updateInternal}, after the application shares
+   * are updated.
+   *
+   * A queue can be starving due to fairshare or minshare.
+   *
+   * Minshare is defined only on the queue and not the applications.
+   * Fairshare is defined for both the queue and the applications.
+   *
+   * If this queue is starved due to minshare, we need to identify the most
+   * deserving apps if they themselves are not starved due to fairshare.
+   *
+   * If this queue is starving due to fairshare, there must be at least
+   * one application that is starved. And, even if the queue is not
+   * starved due to fairshare, there might still be starved applications.
+   */
+  private void identifyStarvedApplications() {
+    // First identify starved applications and track total amount of
+    // starvation (in resources)
+    Resource fairShareStarvation = Resources.clone(none());
+    TreeSet<FSAppAttempt> appsWithDemand = fetchAppsWithDemand();
+    for (FSAppAttempt app : appsWithDemand) {
+      Resource appStarvation = app.fairShareStarvation();
+      if (Resources.equals(Resources.none(), appStarvation)) {
+        break;
+      } else {
+        context.getStarvedApps().addStarvedApp(app);
+        Resources.addTo(fairShareStarvation, appStarvation);
+      }
+    }
+
+    // Compute extent of minshare starvation
+    Resource minShareStarvation = minShareStarvation();
+
+    // Compute minshare starvation that is not subsumed by fairshare starvation
+    Resources.subtractFrom(minShareStarvation, fairShareStarvation);
+
+    // Keep adding apps to the starved list until the unmet demand goes over
+    // the remaining minshare
+    for (FSAppAttempt app : appsWithDemand) {
--- End diff --

If minShareStarvation is 0, you're doing a bunch of looping for no reason.
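A guard along these lines would avoid the extra pass; this is only a sketch of the idea, not the patch itself:

    // Compute minshare starvation that is not subsumed by fairshare starvation
    Resources.subtractFrom(minShareStarvation, fairShareStarvation);

    // Walk the apps again only if there is minshare starvation left to cover
    if (!Resources.equals(Resources.none(), minShareStarvation)) {
      for (FSAppAttempt app : appsWithDemand) {
        // keep adding apps until the remaining minshare starvation is covered
      }
    }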


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers






[jira] [Updated] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue

2016-09-07 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4945:
--
Attachment: YARN-2009.v0.patch

Attaching v0 patch.

Thanks [~eepayne] for the comments

bq.keep looking for another container to mark as preemptable. Is that correct?
Yes. My intention was more or less the same. In this new patch, I used a
similar approach to the one in FifoCandidatesSelector. Please help to check the same.

bq.I think that priority and user-limit-percent preemption policies should be 
separate policies
I was planning to make {{IntraQueueCandidatesSelector}} a basic framework, and
then use a priority- or user-limit-specific policy to calculate
{{resourceToObtain}}. I have made this separation in the new patch, and also
defined a basic interface for {{IntraQueuePreemptionPolicy}}.
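To make the intent concrete, here is a purely illustrative shape for that interface (the actual names and signature in the attached patch may differ):

{code}
public interface IntraQueuePreemptionPolicy {
  /**
   * Compute how much should be preempted within the given leaf queue,
   * for example based on application priority or on user-limit-percent.
   */
  Resource getResourceToObtain(LeafQueue queue, Resource clusterResource);
}
{code}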

> [Umbrella] Capacity Scheduler Preemption Within a queue
> ---
>
> Key: YARN-4945
> URL: https://issues.apache.org/jira/browse/YARN-4945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
> Attachments: Intra-Queue Preemption Use Cases.pdf, 
> IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, 
> YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch
>
>
> This is umbrella ticket to track efforts of preemption within a queue to 
> support features like:
> YARN-2009. YARN-2113. YARN-4781.






[jira] [Updated] (YARN-5586) Update the Resources class to consider all resource types

2016-09-07 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-5586:

Attachment: YARN-5586-YARN-3926.002.patch

Good point. The root bug here is really in the getVirtualCores function, which
returns an int. The special handling was just to get around that. I've fixed it
in a cleaner way.

> Update the Resources class to consider all resource types
> -
>
> Key: YARN-5586
> URL: https://issues.apache.org/jira/browse/YARN-5586
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-5586-YARN-3926.001.patch, 
> YARN-5586-YARN-3926.002.patch
>
>
> The Resources class provides a bunch of useful functions like clone, addTo, 
> etc. These need to be updated to consider all resource types instead of just 
> memory and cpu.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471191#comment-15471191
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77866033
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -223,17 +225,76 @@ public void setPolicy(SchedulingPolicy policy)
 }
 super.policy = policy;
   }
-  
+
   @Override
-  public void recomputeShares() {
+  public void updateInternal(boolean checkStarvation) {
 readLock.lock();
 try {
   policy.computeShares(runnableApps, getFairShare());
+  if (checkStarvation) {
+identifyStarvedApplications();
+  }
 } finally {
   readLock.unlock();
 }
   }
 
+  /**
+   * Helper method to identify starved applications. This needs to be 
called
+   * ONLY from {@link #updateInternal}, after the application shares
+   * are updated.
+   *
+   * A queue can be starving due to fairshare or minshare.
+   *
+   * Minshare is defined only on the queue and not the applications.
+   * Fairshare is defined for both the queue and the applications.
+   *
+   * If this queue is starved due to minshare, we need to identify the most
+   * deserving apps if they themselves are not starved due to fairshare.
+   *
+   * If this queue is starving due to fairshare, there must be at least
+   * one application that is starved. And, even if the queue is not
+   * starved due to fairshare, there might still be starved applications.
+   */
+  private void identifyStarvedApplications() {
+// First identify starved applications and track total amount of
+// starvation (in resources)
+Resource fairShareStarvation = Resources.clone(none());
+TreeSet appsWithDemand = fetchAppsWithDemand();
+for (FSAppAttempt app : appsWithDemand) {
+  Resource appStarvation = app.fairShareStarvation();
+  if (Resources.equals(Resources.none(), appStarvation))  {
+break;
+  } else {
--- End diff --

Drop the else.
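
(To illustrate the suggestion: the loop body already breaks out once an app is not starved, so the else wrapper can simply be dropped. A sketch only, with the accumulation line assumed since the diff excerpt is cut off inside the else:)

{code}
for (FSAppAttempt app : appsWithDemand) {
  Resource appStarvation = app.fairShareStarvation();
  if (Resources.equals(Resources.none(), appStarvation)) {
    // As in the patch: once we hit a non-starved app, stop looking.
    break;
  }
  // No else needed; just fall through and account for the starvation.
  Resources.addTo(fairShareStarvation, appStarvation);
}
{code}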


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5323) Policies APIs (for Router and AMRMProxy policies)

2016-09-07 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-5323:
---
Attachment: YARN-5323-YARN-2915.11.patch

> Policies APIs (for Router and AMRMProxy policies)
> -
>
> Key: YARN-5323
> URL: https://issues.apache.org/jira/browse/YARN-5323
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: YARN-2915
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-5323-YARN-2915.05.patch, 
> YARN-5323-YARN-2915.06.patch, YARN-5323-YARN-2915.07.patch, 
> YARN-5323-YARN-2915.08.patch, YARN-5323-YARN-2915.09.patch, 
> YARN-5323-YARN-2915.10.patch, YARN-5323-YARN-2915.11.patch, 
> YARN-5323.01.patch, YARN-5323.02.patch, YARN-5323.03.patch, YARN-5323.04.patch
>
>
> This JIRA tracks APIs for the policies that will guide the Router and 
> AMRMProxy decisions on where to forward job submission/query requests as 
> well as ResourceRequests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5079) [Umbrella] Native YARN framework layer for services and beyond

2016-09-07 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471170#comment-15471170
 ] 

Gour Saha commented on YARN-5079:
-

I agree with you [~asuresh]. This is what we did for YARN-5505. I think you 
were implicit on this point, but I will mention it explicitly here: for bug 
fixes and enhancements, we should start with a SLIDER JIRA, since standalone 
Slider will be supported for a while (e.g. SLIDER-1166). Then, if the patch 
contains Slider core and client specific code, we should file a YARN 
counterpart like YARN-5623 and merge it into the yarn-native-services branch 
as well.

> [Umbrella] Native YARN framework layer for services and beyond
> --
>
> Key: YARN-5079
> URL: https://issues.apache.org/jira/browse/YARN-5079
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> (See overview doc at YARN-4692, modifying and copy-pasting some of the 
> relevant pieces and sub-section 3.3.1 to track the specific sub-item.)
> (This is a companion to YARN-4793 in our effort to simplify the entire story, 
> but focusing on APIs)
> So far, YARN by design has restricted itself to having a very low-level API 
> that can support any type of application. Frameworks like Apache Hadoop 
> MapReduce, Apache Tez, Apache Spark, Apache REEF, Apache Twill, Apache Helix 
> and others ended up exposing higher-level APIs that end-users can directly 
> leverage to build their applications on top of YARN. On the services side, 
> Apache Slider has done something similar.
> With our current attention on making services first-class and simplified, 
> it's time to take a fresh look at how we can make Apache Hadoop YARN support 
> services well out of the box. Beyond the functionality that I outlined in the 
> previous sections in the doc on how NodeManagers can be enhanced to help 
> services, the biggest missing piece is the framework itself. There is a lot 
> of very important functionality that a services framework can own together 
> with YARN in executing services end-to-end.
> In this JIRA I propose we look at having a native Apache Hadoop framework for 
> running services natively on YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471165#comment-15471165
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77864746
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -64,19 +67,18 @@
   
   // Variables used for preemption
   private long lastTimeAtMinShare;
-  private long lastTimeAtFairShareThreshold;
-  
+
--- End diff --

Spurious newline


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers

2016-09-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471166#comment-15471166
 ] 

Jian He commented on YARN-5620:
---

Thanks Arun, some questions on the API:
- The COMMIT_UPGRADE API: I don't quite get the necessity of this API. Could 
you explain under what scenario the user should call it?
- The ROLLBACK_UPGRADE API: I think it should be able to roll back to any 
previous version, rather than only the immediate previous one. In some sense, 
it's the same as an upgrade.
IMHO, we can probably use one API, restartContainer(context), for both upgrade 
and downgrade; if the upgrade fails, the user should be notified, and the AM 
(in our case Slider), on the user's instruction, can choose to call 
restartContainer with a different version.
- Also, forcing containers to be restarted with the previous version if an 
upgrade fails may not be suitable in all cases, as containers are often 
upgraded in waves and the user may want to troubleshoot the failure first 
before triggering a new wave of restarts.
IMO, as a first-cut implementation, we can fail the container if the upgrade 
fails; retry, rollback, or releasing the container can be added later as a 
RetryPolicy on failure. Your opinion?
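
(Purely to illustrate the single-API idea above; the names below are hypothetical and not an existing NodeManager or NMClient interface, and YARN-5620 may end up with a different surface.)

{code:title=Single restart API sketch}
/**
 * One entry point for both upgrade and downgrade: the caller (the AM, e.g.
 * Slider) supplies whichever ContainerLaunchContext/version it wants next.
 */
public interface ContainerRestartApi {
  /**
   * Relaunch the container with the given launch context. If the relaunch
   * fails, the caller is notified and can invoke this again with another
   * version, retry, or release the container, according to its own policy.
   */
  void restartContainer(ContainerId containerId,
      ContainerLaunchContext newLaunchContext)
      throws YarnException, IOException;
}
{code}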

> Core changes in NodeManager to support for upgrade and rollback of Containers
> -
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch
>
>
> This JIRA proposes to modify the ContainerManager (and other core classes) to 
> support upgrading a running container with a new {{ContainerLaunchContext}}, 
> as well as the ability to roll back the upgrade if the container is not able 
> to restart using the new launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471161#comment-15471161
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77864635
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -45,16 +45,19 @@
 import 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt;
 import org.apache.hadoop.yarn.util.resource.Resources;
 
+import static org.apache.hadoop.yarn.util.resource.Resources.none;
+
 @Private
 @Unstable
 public class FSLeafQueue extends FSQueue {
   private static final Log LOG = LogFactory.getLog(
   FSLeafQueue.class.getName());
+  private FairScheduler scheduler;
--- End diff --

Maybe put the scheduler into the context?


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471149#comment-15471149
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77863924
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
 ---
@@ -478,6 +492,31 @@ public void clearPreemptedResources() {
 preemptedResources.setVirtualCores(0);
   }
 
+  public boolean canContainerBePreempted(RMContainer container) {
+// Sanity check that the app owns this container
+if (!liveContainers.containsKey(container.getContainerId()) &&
+!newlyAllocatedContainers.contains(container)) {
+  LOG.error("Looking to preempt container " + container +
+  ". Container does not belong to app " + getApplicationId());
+  return false;
+}
+
+// Check if any of the parent queues are not preemptable
+// TODO (KK): Propagate the "preemptable" flag all the way down to the 
app
+// to avoid recursing up every time.
+FSQueue queue = getQueue();
+while (!queue.getQueueName().equals("root")) {
+  if (!queue.isPreemptable()) {
+return false;
+  }
+}
+
+// Check if the app's allocation will be over its fairshare even
+// after preempting this container
+return (Resources.fitsIn(container.getAllocatedResource(),
+Resources.subtract(getResourceUsage(), getFairShare(;
+  }
--- End diff --

Indentation


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4743) ResourceManager crash because TimSort

2016-09-07 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471139#comment-15471139
 ] 

Yufei Gu commented on YARN-4743:


Thanks [~benedict jin] for the valuable information. Nice catch, I will try it 
later. Could you please rewrite it in English so that more people can understand?

> ResourceManager crash because TimSort
> -
>
> Key: YARN-4743
> URL: https://issues.apache.org/jira/browse/YARN-4743
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.4
>Reporter: Zephyr Guo
>Assignee: Yufei Gu
> Attachments: YARN-4743-cdh5.4.7.patch
>
>
> {code}
> 2016-02-26 14:08:50,821 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>  at java.util.TimSort.mergeHi(TimSort.java:868)
>  at java.util.TimSort.mergeAt(TimSort.java:485)
>  at java.util.TimSort.mergeCollapse(TimSort.java:410)
>  at java.util.TimSort.sort(TimSort.java:214)
>  at java.util.TimSort.sort(TimSort.java:173)
>  at java.util.Arrays.sort(Arrays.java:659)
>  at java.util.Collections.sort(Collections.java:217)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>  at java.lang.Thread.run(Thread.java:745)
> 2016-02-26 14:08:50,822 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> Actually, this issue was found in 2.6.0-cdh5.4.7.
> I think the cause is that we modify {{Resource}} while we are sorting 
> {{runnableApps}}.
> {code:title=FSLeafQueue.java}
> Comparator comparator = policy.getComparator();
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> {code}
> {code:title=FairShareComparator}
> public int compare(Schedulable s1, Schedulable s2) {
> ..
>   s1.getResourceUsage(), minShare1);
>   boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
>   s2.getResourceUsage(), minShare2);
>   minShareRatio1 = (double) s1.getResourceUsage().getMemory()
>   / Resources.max(RESOURCE_CALCULATOR, null, minShare1, 
> ONE).getMemory();
>   minShareRatio2 = (double) s2.getResourceUsage().getMemory()
>   / Resources.max(RESOURCE_CALCULATOR, null, minShare2, 
> ONE).getMemory();
> ..
> {code}
> {{getResourceUsage}} will return the current Resource. The current Resource 
> is unstable.
> {code:title=FSAppAttempt.java}
> @Override
>   public Resource getResourceUsage() {
> // Here the getPreemptedResources() always return zero, except in
> // a preemption round
> return Resources.subtract(getCurrentConsumption(), 
> getPreemptedResources());
>   }
> {code}
> {code:title=SchedulerApplicationAttempt}
>  public Resource getCurrentConsumption() {
> return currentConsumption;
>   }
> // This method may modify current Resource.
> public synchronized void recoverContainer(RMContainer rmContainer) {
> ..
> Resources.addTo(currentConsumption, rmContainer.getContainer()
>   .getResource());
> ..
>   }
> {code}
> I suggest using a stable Resource in the comparator.
> Is there something I am getting wrong?
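
A minimal sketch of the "stable Resource" suggestion above (not the committed fix; YARN-4743's actual resolution may differ): freeze each app's usage once before the sort so the comparator never sees a value that changes mid-sort. The comparison below uses memory only, for brevity, rather than the full FairShareComparator logic.

{code:title=Stable-usage sort sketch}
List<FSAppAttempt> apps = new ArrayList<>(runnableApps);
// Clone each usage up front; TimSort then compares immutable snapshots,
// so its comparator contract cannot be violated by concurrent updates.
final Map<FSAppAttempt, Resource> usage = new IdentityHashMap<>();
for (FSAppAttempt app : apps) {
  usage.put(app, Resources.clone(app.getResourceUsage()));
}
Collections.sort(apps, new Comparator<FSAppAttempt>() {
  @Override
  public int compare(FSAppAttempt a, FSAppAttempt b) {
    return Long.compare(usage.get(a).getMemory(), usage.get(b).getMemory());
  }
});
{code}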



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471136#comment-15471136
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77863007
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
 ---
@@ -442,25 +448,33 @@ public synchronized void resetAllowedLocalityLevel(
 allowedLocalityLevel.put(schedulerKey, level);
   }
 
-  // related methods
-  public void addPreemption(RMContainer container, long time) {
-assert preemptionMap.get(container) == null;
-preemptionMap.put(container, time);
-Resources.addTo(preemptedResources, container.getAllocatedResource());
+  @Override
+  public FSLeafQueue getQueue() {
+return (FSLeafQueue)super.getQueue();
--- End diff --

Seems like this should be more defensive, i.e. check for the type before 
casting.
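
(A sketch of what the defensive version might look like; the wording is hypothetical and the patch may address it differently:)

{code}
@Override
public FSLeafQueue getQueue() {
  Queue queue = super.getQueue();
  // Guard the downcast instead of assuming every attempt sits in a leaf queue.
  if (!(queue instanceof FSLeafQueue)) {
    throw new IllegalStateException("App " + getApplicationId()
        + " is not assigned to a leaf queue: " + queue);
  }
  return (FSLeafQueue) queue;
}
{code}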


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5605) Preempt containers (all on one node) to meet the requirement of starved applications

2016-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471118#comment-15471118
 ] 

ASF GitHub Bot commented on YARN-5605:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/124#discussion_r77861935
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
 ---
@@ -535,6 +535,23 @@ public synchronized Resource 
getResource(SchedulerRequestKey schedulerKey) {
   }
 
   /**
+   * Method to return the next resource request to be serviced.
+   *
+   * In the initial implementation, we just pick any {@link 
ResourceRequest}
+   * corresponding to the highest priority.
+   *
+   * @return next {@link ResourceRequest} to allocate resources for.
+   */
+  @Unstable
+  public synchronized ResourceRequest getNextResourceRequest() {
+for (ResourceRequest rr:
+resourceRequestMap.get(schedulerKeys.first()).values()) {
+  return rr;
--- End diff --

Can we please have one exit point?
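
(For example, a single-exit version of the same method could read as follows; a sketch only:)

{code}
@Unstable
public synchronized ResourceRequest getNextResourceRequest() {
  ResourceRequest next = null;
  // Same behaviour as the patch: pick any request at the highest-priority
  // scheduler key, but with exactly one return statement.
  for (ResourceRequest rr :
      resourceRequestMap.get(schedulerKeys.first()).values()) {
    next = rr;
    break;
  }
  return next;
}
{code}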


> Preempt containers (all on one node) to meet the requirement of starved 
> applications
> 
>
> Key: YARN-5605
> URL: https://issues.apache.org/jira/browse/YARN-5605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5605-1.patch
>
>
> Required items:
> # Identify starved applications
> # Identify a node that has enough containers from applications over their 
> fairshare.
> # Preempt those containers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5623) Apply SLIDER-1166 to yarn-native-services branch

2016-09-07 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-5623:

Assignee: Gour Saha

Assigning it to myself, as I worked on SLIDER-1166.

> Apply SLIDER-1166 to yarn-native-services branch
> 
>
> Key: YARN-5623
> URL: https://issues.apache.org/jira/browse/YARN-5623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Gour Saha
> Fix For: yarn-native-services
>
>
> SLIDER-1166 fixes a critical issue in SliderClient when used as a service. It 
> needs to be merged into the yarn-native-services branch as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5623) Apply SLIDER-1166 to yarn-native-services branch

2016-09-07 Thread Gour Saha (JIRA)
Gour Saha created YARN-5623:
---

 Summary: Apply SLIDER-1166 to yarn-native-services branch
 Key: YARN-5623
 URL: https://issues.apache.org/jira/browse/YARN-5623
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Gour Saha
 Fix For: yarn-native-services


SLIDER-1166 fixes a critical issue in SliderClient when used as a service. It 
needs to be merged into the yarn-native-services branch as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4232) TopCLI console support for HA mode

2016-09-07 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4232:
---
Attachment: YARN-4232.004.patch

Attaching the updated patch.

> TopCLI console support for HA mode
> --
>
> Key: YARN-4232
> URL: https://issues.apache.org/jira/browse/YARN-4232
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4232.patch, 0002-YARN-4232.patch, 
> YARN-4232.003.patch, YARN-4232.004.patch
>
>
> *Steps to reproduce*
> Start Top command in YARN in HA mode
> ./yarn top
> {noformat}
> usage: yarn top
>  -cols  Number of columns on the terminal
>  -delay The refresh delay(in seconds), default is 3 seconds
>  -help   Print usage; for help while the tool is running press 'h'
>  + Enter
>  -queuesComma separated list of queues to restrict applications
>  -rows  Number of rows on the terminal
>  -types Comma separated list of types to restrict applications,
>  case sensitive(though the display is lower case)
>  -users Comma separated list of users to restrict applications
> {noformat}
> Execute *for help while the tool is running press 'h'  + Enter* while top 
> tool is running
> Exception is thrown in console continuously
> {noformat}
> 15/10/07 14:59:28 ERROR cli.TopCLI: Could not fetch RM start time
> java.net.ConnectException: Connection refused
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
> at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
> at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> at java.net.Socket.connect(Socket.java:538)
> at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
> at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
> at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
> at sun.net.www.http.HttpClient.(HttpClient.java:211)
> at sun.net.www.http.HttpClient.New(HttpClient.java:308)
> at sun.net.www.http.HttpClient.New(HttpClient.java:326)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1168)
> at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1104)
> at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:998)
> at 
> sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:932)
> at 
> org.apache.hadoop.yarn.client.cli.TopCLI.getRMStartTime(TopCLI.java:742)
> at org.apache.hadoop.yarn.client.cli.TopCLI.run(TopCLI.java:467)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.yarn.client.cli.TopCLI.main(TopCLI.java:420)
> {noformat}
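
One possible direction (a sketch only; the actual YARN-4232 patch may solve this differently) is to stop assuming a single RM web address and instead walk the configured HA rm-ids, using whichever RM actually answers:

{code:title=HA-aware RM webapp address lookup sketch}
// Collect candidate RM webapp addresses; in HA mode there is one per rm-id.
List<String> getRMWebAddresses(Configuration conf) {
  List<String> addresses = new ArrayList<>();
  String rmIds = conf.get("yarn.resourcemanager.ha.rm-ids");
  if (rmIds == null) {
    addresses.add(conf.get("yarn.resourcemanager.webapp.address"));
  } else {
    for (String id : rmIds.split(",")) {
      addresses.add(conf.get("yarn.resourcemanager.webapp.address." + id.trim()));
    }
  }
  return addresses;
}
// TopCLI could then try each address for /ws/v1/cluster/info and use the one
// that responds (the active RM); a standby either redirects or refuses.
{code}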



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5616) Clean up WeightAdjuster

2016-09-07 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471092#comment-15471092
 ] 

Yufei Gu commented on YARN-5616:


Thanks [~kasha] for the review and the commit.

> Clean up WeightAdjuster
> ---
>
> Key: YARN-5616
> URL: https://issues.apache.org/jira/browse/YARN-5616
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 2.9.0
>
> Attachments: YARN-5616.001.patch
>
>
> {{WeightAdjuster}} and its implementation {{NewAppWeightBooster}} are never 
> used. We should clean up this code. 
> It seems it never got cleaned up when we migrated the fair scheduler from MR1 
> to YARN. The original documentation is here: 
> https://hadoop.apache.org/docs/r1.2.1/fair_scheduler.html.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


