[jira] [Commented] (YARN-5777) TestLogsCLI#testFetchApplictionLogsAsAnotherUser fails

2016-10-24 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604291#comment-15604291
 ] 

Akira Ajisaka commented on YARN-5777:
-

I ran git bisect and found that HADOOP-7352 broke this.

> TestLogsCLI#testFetchApplictionLogsAsAnotherUser fails
> --
>
> Key: YARN-5777
> URL: https://issues.apache.org/jira/browse/YARN-5777
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Akira Ajisaka
>
> {noformat}
> Running org.apache.hadoop.yarn.client.cli.TestLogsCLI
> Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.876 sec 
> <<< FAILURE! - in org.apache.hadoop.yarn.client.cli.TestLogsCLI
> testFetchApplictionLogsAsAnotherUser(org.apache.hadoop.yarn.client.cli.TestLogsCLI)
>   Time elapsed: 0.199 sec  <<< ERROR!
> java.io.IOException: Invalid directory or I/O error occurred for dir: 
> /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/target/logs/priority/logs/application_1477371285256_1000
> at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1148)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:469)
> at 
> org.apache.hadoop.fs.DelegateToFileSystem.listStatus(DelegateToFileSystem.java:169)
> at org.apache.hadoop.fs.ChecksumFs.listStatus(ChecksumFs.java:519)
> at 
> org.apache.hadoop.fs.AbstractFileSystem$1.<init>(AbstractFileSystem.java:890)
> at 
> org.apache.hadoop.fs.AbstractFileSystem.listStatusIterator(AbstractFileSystem.java:888)
> at org.apache.hadoop.fs.FileContext$22.next(FileContext.java:1492)
> at org.apache.hadoop.fs.FileContext$22.next(FileContext.java:1487)
> at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1494)
> at 
> org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.getRemoteNodeFileDir(LogCLIHelpers.java:592)
> at 
> org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:348)
> at 
> org.apache.hadoop.yarn.client.cli.LogsCLI.fetchApplicationLogs(LogsCLI.java:971)
> at 
> org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:299)
> at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:106)
> at 
> org.apache.hadoop.yarn.client.cli.TestLogsCLI.testFetchApplictionLogsAsAnotherUser(TestLogsCLI.java:868)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()

2016-10-24 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604287#comment-15604287
 ] 

Bibin A Chundatt commented on YARN-5773:


{quote}
3. As mentioned by Bibin A Chundatt, when each app fails to get activated due 
to the upper cut of resource limit, one INFO log is emitted (because amLimit is 
0). During recovery, this is costly.  
{quote}
Thanks [~sunilg] for mentioning the logging; I missed it in my earlier comment.
The logging is costly during recovery because amLimit is always zero:
{noformat}
  LOG.info("Not activating application " + applicationId
+ " as  amIfStarted: " + amIfStarted + " exceeds amLimit: "
+ amLimit);
{noformat}
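
As an editor's illustration (not the actual patch), a minimal sketch of one way
to avoid that cost, assuming the standard {{LOG}} field in {{LeafQueue}}:
{code}
// Sketch: demote the per-application message to debug level so that
// recovering 10K pending apps does not pay the cost of formatting and
// writing 10K INFO lines.
if (LOG.isDebugEnabled()) {
  LOG.debug("Not activating application " + applicationId
      + " as amIfStarted: " + amIfStarted + " exceeds amLimit: " + amLimit);
}
{code}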

> RM recovery too slow due to LeafQueue#activateApplication()
> ---
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
>
> # Submit 10K applications to the default queue.
> # All applications are in the ACCEPTED state.
> # Now restart the ResourceManager.
> For each recovered application {{LeafQueue#activateApplications()}} is 
> invoked, resulting in the AM limit check being done even before NodeManagers 
> are registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for 
> {{10K}} applications that is roughly {{5*10^7}} iterations, causing the RM to 
> take more than 10 minutes to become active.
> Since NM resources are not yet added during recovery, we should skip 
> {{activateApplications()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5754) Null check missing for earliest in FifoPolicy

2016-10-24 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-5754:
---
Summary: Null check missing for earliest in FifoPolicy  (was: Variable 
earliest missing null check in computeShares() in FifoPolicy.java)
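
For context, a minimal sketch of the likely shape of the fix (an editor's 
sketch inferred from the summary; the {{computeShares()}} body shown here is an 
assumption, not the actual patch):
{code}
// "earliest" stays null when the schedulable collection is empty, so it
// must be null-checked before the fair share is assigned.
Schedulable earliest = null;
for (Schedulable schedulable : schedulables) {
  if (earliest == null
      || schedulable.getStartTime() < earliest.getStartTime()) {
    earliest = schedulable;
  }
}
if (earliest != null) {
  earliest.setFairShare(Resources.clone(totalResources));
}
{code}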

> Null check missing for earliest in FifoPolicy
> -
>
> Key: YARN-5754
> URL: https://issues.apache.org/jira/browse/YARN-5754
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.0.0-alpha1
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-5754.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5777) TestLogsCLI#testFetchApplictionLogsAsAnotherUser fails

2016-10-24 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-5777:

Description: 
{noformat}
Running org.apache.hadoop.yarn.client.cli.TestLogsCLI
Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.876 sec <<< 
FAILURE! - in org.apache.hadoop.yarn.client.cli.TestLogsCLI
testFetchApplictionLogsAsAnotherUser(org.apache.hadoop.yarn.client.cli.TestLogsCLI)
  Time elapsed: 0.199 sec  <<< ERROR!
java.io.IOException: Invalid directory or I/O error occurred for dir: 
/Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/target/logs/priority/logs/application_1477371285256_1000
at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1148)
at 
org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:469)
at 
org.apache.hadoop.fs.DelegateToFileSystem.listStatus(DelegateToFileSystem.java:169)
at org.apache.hadoop.fs.ChecksumFs.listStatus(ChecksumFs.java:519)
at 
org.apache.hadoop.fs.AbstractFileSystem$1.<init>(AbstractFileSystem.java:890)
at 
org.apache.hadoop.fs.AbstractFileSystem.listStatusIterator(AbstractFileSystem.java:888)
at org.apache.hadoop.fs.FileContext$22.next(FileContext.java:1492)
at org.apache.hadoop.fs.FileContext$22.next(FileContext.java:1487)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1494)
at 
org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.getRemoteNodeFileDir(LogCLIHelpers.java:592)
at 
org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:348)
at 
org.apache.hadoop.yarn.client.cli.LogsCLI.fetchApplicationLogs(LogsCLI.java:971)
at 
org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:299)
at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:106)
at 
org.apache.hadoop.yarn.client.cli.TestLogsCLI.testFetchApplictionLogsAsAnotherUser(TestLogsCLI.java:868)
{noformat}

> TestLogsCLI#testFetchApplictionLogsAsAnotherUser fails
> --
>
> Key: YARN-5777
> URL: https://issues.apache.org/jira/browse/YARN-5777
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Akira Ajisaka
>
> {noformat}
> Running org.apache.hadoop.yarn.client.cli.TestLogsCLI
> Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.876 sec 
> <<< FAILURE! - in org.apache.hadoop.yarn.client.cli.TestLogsCLI
> testFetchApplictionLogsAsAnotherUser(org.apache.hadoop.yarn.client.cli.TestLogsCLI)
>   Time elapsed: 0.199 sec  <<< ERROR!
> java.io.IOException: Invalid directory or I/O error occurred for dir: 
> /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/target/logs/priority/logs/application_1477371285256_1000
> at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1148)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:469)
> at 
> org.apache.hadoop.fs.DelegateToFileSystem.listStatus(DelegateToFileSystem.java:169)
> at org.apache.hadoop.fs.ChecksumFs.listStatus(ChecksumFs.java:519)
> at 
> org.apache.hadoop.fs.AbstractFileSystem$1.<init>(AbstractFileSystem.java:890)
> at 
> org.apache.hadoop.fs.AbstractFileSystem.listStatusIterator(AbstractFileSystem.java:888)
> at org.apache.hadoop.fs.FileContext$22.next(FileContext.java:1492)
> at org.apache.hadoop.fs.FileContext$22.next(FileContext.java:1487)
> at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1494)
> at 
> org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.getRemoteNodeFileDir(LogCLIHelpers.java:592)
> at 
> org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:348)
> at 
> org.apache.hadoop.yarn.client.cli.LogsCLI.fetchApplicationLogs(LogsCLI.java:971)
> at 
> org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:299)
> at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:106)
> at 
> org.apache.hadoop.yarn.client.cli.TestLogsCLI.testFetchApplictionLogsAsAnotherUser(TestLogsCLI.java:868)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5313) TestLogsCLI.testFetchApplictionLogsAsAnotherUser fails in trunk

2016-10-24 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604224#comment-15604224
 ] 

Akira Ajisaka commented on YARN-5313:
-

Now the unit test is failing for another reason, so I filed another JIRA: 
YARN-5777

> TestLogsCLI.testFetchApplictionLogsAsAnotherUser fails in trunk
> ---
>
> Key: YARN-5313
> URL: https://issues.apache.org/jira/browse/YARN-5313
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Xuan Gong
>Priority: Blocker
>
> We recently reverted HADOOP-12718, which caused this failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5777) TestLogsCLI#testFetchApplictionLogsAsAnotherUser fails

2016-10-24 Thread Akira Ajisaka (JIRA)
Akira Ajisaka created YARN-5777:
---

 Summary: TestLogsCLI#testFetchApplictionLogsAsAnotherUser fails
 Key: YARN-5777
 URL: https://issues.apache.org/jira/browse/YARN-5777
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Akira Ajisaka






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-5313) TestLogsCLI.testFetchApplictionLogsAsAnotherUser fails in trunk

2016-10-24 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka resolved YARN-5313.
-
Resolution: Not A Problem

Closing this issue because HADOOP-12718 has been reverted.

> TestLogsCLI.testFetchApplictionLogsAsAnotherUser fails in trunk
> ---
>
> Key: YARN-5313
> URL: https://issues.apache.org/jira/browse/YARN-5313
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Xuan Gong
>Priority: Blocker
>
> We recently reverted HADOOP-12718, which caused this failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5575) Many classes use bare yarn. properties instead of the defined constants

2016-10-24 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604194#comment-15604194
 ] 

Akira Ajisaka commented on YARN-5575:
-

Hi [~templedf], would you fix the checkstyle warnings? I'm +1 once that is 
addressed. Thanks.

> Many classes use bare yarn. properties instead of the defined constants
> ---
>
> Key: YARN-5575
> URL: https://issues.apache.org/jira/browse/YARN-5575
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-5575.001.patch, YARN-5575.002.patch, 
> YARN-5575.003.patch
>
>
> MAPREDUCE-5870 introduced the following line:
> {code}
>   conf.setInt("yarn.cluster.max-application-priority", 10);
> {code}
> It should instead be:
> {code}
>   conf.setInt(YarnConfiguration.MAX_CLUSTER_LEVEL_APPLICATION_PRIORITY, 
> 10);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()

2016-10-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604144#comment-15604144
 ] 

Sunil G edited comment on YARN-5773 at 10/25/16 4:46 AM:
-

*Issues in recovery of apps:*
1. activateApplications works under a write lock.
2. If one application is found to overflow the AM resource limit, instead of 
breaking out of the loop, we continue and scan the complete set of apps in 
pendingOrderingPolicy. We may need to iterate over all apps because apps belong 
to different partitions and pendingOrderingPolicy does not order apps by 
partition.
3. As mentioned by [~bibinchundatt], when each app fails to get activated due 
to the upper cut of the resource limit, one INFO log is emitted (because 
*amLimit* is 0). During recovery, this is costly.

[~leftnoteasy] and [~rohithsharma]
bq.If a given app's AM resource amount > AM headroom, should we skip the AM and 
activate following app which AM resource amount <= AM headroom?
bq.But one point to be considered is for each Node registration, head room 
changes. So, user head room changes as new node registered. This need to be 
taken care.
Currently activateApplications is invoked whenever the cluster resource 
changes, so any change in cluster resource ensures a call to 
activateApplications and lets us recalculate this headroom. I am not very sure 
about the suggested map. Will this check come before the existing AM resource 
percentage check for the queue/partition (not user based), or are we replacing 
those checks?


was (Author: sunilg):
*Issues in Recovery of apps:*
1. activateApplications works under a write lock.
2. If one application is found of overflowing AM resource limit, instead of 
breaking from loop, we continue and play complete apps from 
pendingOrderingPolicy. We may need to iterate all apps because we have apps 
belongs to different partition and pendingOrderingPolicy does not provide any 
order for apps based on partition.
3. As mentioned by [~bibinchundatt], when each app fails to get activated due 
to the upper cut of resource  limit, one INFO log is emitted. During recovery, 
this is costly.

[~leftnoteasy] and [~rohithsharma]
bq.If a given app's AM resource amount > AM headroom, should we skip the AM and 
activate following app which AM resource amount <= AM headroom?
bq.But one point to be considered is for each Node registration, head room 
changes. So, user head room changes as new node registered. This need to be 
taken care.
Currently activateApplications is invoked when there is a change in cluster 
resource. So any change in cluster resource will ensure a call to 
activateApplications and we can recalculate this headroom. I am not very sure 
about the suggested map. Will this check be coming before we do the existing AM 
resource percentage check for queue/partition (not user based) ? OR are we 
replacing this checks?

> RM recovery too slow due to LeafQueue#activateApplication()
> ---
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
>
> # Submit application 10K application to default queue.
> # All applications are in accepted state
> # Now restart resourcemanager
> For each application recovery {{LeafQueue#activateApplications()}} is 
> invoked.Resulting in AM limit check to be done even before Node managers are 
> getting registered.
> Total iteration for N application is about {{N(N+1)/2}} for {{10K}} 
> application   {{5000}} iterations causing time take for Rm to be active 
> more than 10 min.
> Since NM resources are not yet added to during recovery we should skip 
> {{activateApplicaiton()}} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()

2016-10-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604144#comment-15604144
 ] 

Sunil G commented on YARN-5773:
---

*Issues in recovery of apps:*
1. activateApplications works under a write lock.
2. If one application is found to overflow the AM resource limit, instead of 
breaking out of the loop, we continue and scan the complete set of apps in 
pendingOrderingPolicy. We may need to iterate over all apps because apps belong 
to different partitions and pendingOrderingPolicy does not order apps by 
partition.
3. As mentioned by [~bibinchundatt], when each app fails to get activated due 
to the upper cut of the resource limit, one INFO log is emitted. During 
recovery, this is costly.

[~leftnoteasy] and [~rohithsharma]
bq.If a given app's AM resource amount > AM headroom, should we skip the AM and 
activate following app which AM resource amount <= AM headroom?
bq.But one point to be considered is for each Node registration, head room 
changes. So, user head room changes as new node registered. This need to be 
taken care.
Currently activateApplications is invoked whenever the cluster resource 
changes, so any change in cluster resource ensures a call to 
activateApplications and lets us recalculate this headroom. I am not very sure 
about the suggested map. Will this check come before the existing AM resource 
percentage check for the queue/partition (not user based), or are we replacing 
those checks?

> RM recovery too slow due to LeafQueue#activateApplication()
> ---
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
>
> # Submit application 10K application to default queue.
> # All applications are in accepted state
> # Now restart resourcemanager
> For each application recovery {{LeafQueue#activateApplications()}} is 
> invoked.Resulting in AM limit check to be done even before Node managers are 
> getting registered.
> Total iteration for N application is about {{N(N+1)/2}} for {{10K}} 
> application   {{5000}} iterations causing time take for Rm to be active 
> more than 10 min.
> Since NM resources are not yet added to during recovery we should skip 
> {{activateApplicaiton()}} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5587) Add support for resource profiles

2016-10-24 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604055#comment-15604055
 ] 

Arun Suresh commented on YARN-5587:
---

Thanks [~vvasudev]. First-pass comments:

# In {{Resources}}, you moved the deprecation-warning suppression from the 
{{setMemorySize(long)}} method to {{setMemory(int)}}. Was that intentional?
# {{AMRMClient::ContainerRequest}}: wondering if we need to allow a container 
request to specify both a profile name and a Resource (capability). If both are 
specified, what does that mean? (See the sketch after this list.)
# Similarly, in the {{RemoteRequestTable}}, the RR should be keyed using the 
Resource (capability) derived from the profileName.
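
A minimal sketch of what points 2 and 3 could look like (an editor's 
illustration; the name-to-Resource lookup map is an assumption, not the actual 
YARN-3926 API):
{code}
// If both a profile name and an explicit capability are allowed, resolve
// the profile to a concrete Resource first and let the explicit capability
// win; the result is then what the RemoteRequestTable should be keyed on.
Resource resolveCapability(String profileName, Resource explicit,
    Map<String, Resource> profileToResource) {
  Resource fromProfile = profileToResource.get(profileName);
  return (explicit != null) ? explicit : fromProfile;
}
{code}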



> Add support for resource profiles
> -
>
> Key: YARN-5587
> URL: https://issues.apache.org/jira/browse/YARN-5587
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-5587-YARN-3926.001.patch, 
> YARN-5587-YARN-3926.002.patch, YARN-5587-YARN-3926.003.patch, 
> YARN-5587-YARN-3926.004.patch, YARN-5587-YARN-3926.005.patch
>
>
> Add support for resource profiles on the RM side to allow users to use 
> shorthands to specify resource requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()

2016-10-24 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603964#comment-15603964
 ] 

Rohith Sharma K S commented on YARN-5773:
-

Thanks folks for the discussion.
I went through the overall discussion above, and I have one doubt: how can *RM 
recovery* be too slow? In the current RM restart there are 2 stages.
# Recover: read all the application data from ZooKeeper and replay it. 
Basically, for running/pending apps an event is triggered to the scheduler, and 
the scheduler has a *separate dispatcher* to handle it.
# Service start: once the recover process is completed, all the RM services are 
started.
IIUC, the RM service is up and able to accept new requests from clients. So the 
problem is that activating applications is delayed after the RM services start, 
because nodes are not yet registered; it is not the actual recovery that is 
slow. It would be better if the JIRA summary were updated to something like 
"Scheduler takes longer time for activating recovered apps when RM is 
restarted".

As for the improvement, as Wangda suggested, maybe we can keep a map that would 
optimize the headroom computation in activateApplications. But one point to be 
considered is that the headroom changes with each node registration. So the 
user headroom changes as new nodes register; this needs to be taken care of.
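
A minimal sketch of the suggested map (an editor's sketch; the field and method 
names are assumptions layered on {{LeafQueue}}, not the actual proposal):
{code}
// Cache the per-partition AM limit and invalidate the cache whenever the
// cluster resource changes (e.g. on node registration), so repeated
// activateApplications() calls do not recompute identical limits.
private Resource lastCachedClusterResource;                  // assumed field
private final Map<String, Resource> amLimitCache = new HashMap<>();

Resource getCachedAMResourceLimit(String partition, Resource clusterResource) {
  if (!clusterResource.equals(lastCachedClusterResource)) {
    amLimitCache.clear();
    lastCachedClusterResource = Resources.clone(clusterResource);
  }
  return amLimitCache.computeIfAbsent(partition,
      this::calculateAndGetAMResourceLimitPerPartition);
}
{code}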

> RM recovery too slow due to LeafQueue#activateApplication()
> ---
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
>
> # Submit 10K applications to the default queue.
> # All applications are in the ACCEPTED state.
> # Now restart the ResourceManager.
> For each recovered application {{LeafQueue#activateApplications()}} is 
> invoked, resulting in the AM limit check being done even before NodeManagers 
> are registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for 
> {{10K}} applications that is roughly {{5*10^7}} iterations, causing the RM to 
> take more than 10 minutes to become active.
> Since NM resources are not yet added during recovery, we should skip 
> {{activateApplications()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()

2016-10-24 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603947#comment-15603947
 ] 

Bibin A Chundatt commented on YARN-5773:


{quote}
If a given app's AM resource amount > AM headroom, should we skip the AM and 
activate following app which AM resource amount <= AM headroom? 
{quote}
We should skip all apps only when {{queueUsage.getAMUsed > amLimit}}. Since AMs 
can belong to different partitions and each partition can have a different AM 
limit, the AM limit has to be exceeded for all partitions.

Check both cases before iterating through all the apps:

{noformat}
  if (!Resources.greaterThan(resourceCalculator, lastClusterResource,
      lastClusterResource, Resources.none())
      && !(getNumActiveApplications() < 1)) {
    return;
  }

  // Generic type arguments restored; they were stripped from the mail text.
  Map<String, Resource> userAmPartitionLimit =
      new HashMap<String, Resource>();

  // AM Resource Limit for accessible labels can be pre-calculated.
  // This will help in updating AMResourceLimit for all labels when queue
  // is initialized for the first time (when no applications are present).
  for (String nodePartition : getNodeLabelsForQueue()) {
    calculateAndGetAMResourceLimitPerPartition(nodePartition);
  }

  // Pseudocode: skip activation when the AM limit is exceeded for every
  // partition and at least one application is already active.
  if (allPartitionLimitsExceeded() && !(getNumActiveApplications() < 1)) {
    return;
  }
{noformat}



> RM recovery too slow due to LeafQueue#activateApplication()
> ---
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
>
> # Submit 10K applications to the default queue.
> # All applications are in the ACCEPTED state.
> # Now restart the ResourceManager.
> For each recovered application {{LeafQueue#activateApplications()}} is 
> invoked, resulting in the AM limit check being done even before NodeManagers 
> are registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for 
> {{10K}} applications that is roughly {{5*10^7}} iterations, causing the RM to 
> take more than 10 minutes to become active.
> Since NM resources are not yet added during recovery, we should skip 
> {{activateApplications()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4743) ResourceManager crash because TimSort

2016-10-24 Thread Zephyr Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603931#comment-15603931
 ] 

Zephyr Guo commented on YARN-4743:
--

[~yufeigu], thanks for reviewing.

{quote}
5. Not sure why startTimeColloection and nameCollection are needed. Can you 
explain a little bit?
{quote}

Because some pieces of code involve these two variables.
{code:title=FairShareComparator}
  if (res == 0) {
// Apps are tied in fairness ratio. Break the tie by submit time and job
// name to get a deterministic ordering, which is useful for unit tests.
res = (int) Math.signum(s1.getStartTime() - s2.getStartTime());
if (res == 0)
  res = s1.getName().compareTo(s2.getName());
  }
{code}

I will submit a new patch this week.
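
For context on the non-transitivity described below, here is a self-contained 
demo (an editor's illustration, not project code) of why the NaN case breaks 
the {{Comparator}} contract that TimSort enforces:
{code}
public class NaNComparatorDemo {
  public static void main(String[] args) {
    long memorySize = 0L;   // app uses no memory
    double weight = 0.0;    // and has zero weight
    double ratio = memorySize / weight;      // 0/0.0 -> NaN
    // Every ordered comparison against NaN is false, so apps with a NaN
    // ratio compare as "equal" to everything; that breaks transitivity
    // and trips "Comparison method violates its general contract!".
    System.out.println(ratio < 1.0);         // false
    System.out.println(ratio > 1.0);         // false
    System.out.println(Math.signum(ratio));  // NaN, not -1/0/1
  }
}
{code}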

> ResourceManager crash because TimSort
> -
>
> Key: YARN-4743
> URL: https://issues.apache.org/jira/browse/YARN-4743
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.0.0-alpha1
>Reporter: Zephyr Guo
>Assignee: Zephyr Guo
> Attachments: YARN-4743-v1.patch, YARN-4743-v2.patch, timsort.log
>
>
> {code}
> 2016-02-26 14:08:50,821 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>  at java.util.TimSort.mergeHi(TimSort.java:868)
>  at java.util.TimSort.mergeAt(TimSort.java:485)
>  at java.util.TimSort.mergeCollapse(TimSort.java:410)
>  at java.util.TimSort.sort(TimSort.java:214)
>  at java.util.TimSort.sort(TimSort.java:173)
>  at java.util.Arrays.sort(Arrays.java:659)
>  at java.util.Collections.sort(Collections.java:217)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>  at java.lang.Thread.run(Thread.java:745)
> 2016-02-26 14:08:50,822 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> Actually, this bug was found in 2.6.0-cdh. {{FairShareComparator}} is not 
> transitive: we get NaN when memorySize=0 and weight=0.
> {code:title=FairSharePolicy.java}
> useToWeightRatio1 = s1.getResourceUsage().getMemorySize() /
>   s1.getWeights().getWeight(ResourceType.MEMORY)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5776) Checkstyle: MonitoringThread.Run method length is too long

2016-10-24 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603924#comment-15603924
 ] 

Miklos Szegedi commented on YARN-5776:
--

Note: the javac warning in the result file is in ContainerManagerImpl.java, not 
in ContainersMonitorImpl.java, which is the file I changed.

> Checkstyle: MonitoringThread.Run method length is too long
> --
>
> Key: YARN-5776
> URL: https://issues.apache.org/jira/browse/YARN-5776
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Trivial
> Attachments: YARN-5776.000.patch
>
>
> YARN-5725 had a checkstyle violation that should be resolved by refactoring 
> the function.
> Details:
> ContainersMonitorImpl.java:395 MonitioringThread.Run @Override:5: Method 
> length is 233 lines (max allowed is 150).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5711) Propogate exceptions back to client when using hedging RM failover provider

2016-10-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603925#comment-15603925
 ] 

Hudson commented on YARN-5711:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10669 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10669/])
YARN-5711. Propogate exceptions back to client when using hedging RM (subru: 
rev 0a166b13472213db0a0cd2dfdaddb2b1746b3957)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RequestHedgingRMFailoverProxyProvider.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestHedgingRequestRMFailoverProxyProvider.java
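
The core idea, as an editor's sketch (the {{CompletionService}} wiring here is 
an assumption, not the committed code): a {{YarnException}} thrown on a worker 
thread surfaces as the cause of an {{ExecutionException}}, so it must be 
unwrapped and rethrown for the AM to see it and re-register.
{code}
import java.io.IOException;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import org.apache.hadoop.yarn.exceptions.YarnException;

class HedgingSketch {
  static Object firstSuccessfulResult(CompletionService<Object> invocations)
      throws YarnException, IOException, InterruptedException {
    try {
      return invocations.take().get();
    } catch (ExecutionException ex) {
      Throwable cause = ex.getCause();
      if (cause instanceof YarnException) {
        throw (YarnException) cause;  // propagate to the client as-is
      }
      throw new IOException(cause);   // wrap anything else
    }
  }
}
{code}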


> Propogate exceptions back to client when using hedging RM failover provider
> ---
>
> Key: YARN-5711
> URL: https://issues.apache.org/jira/browse/YARN-5711
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, resourcemanager
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-5711-v1.patch, YARN-5711-v2.patch, 
> YARN-5711.v1.1.patch
>
>
> When the RM fails over, it does _not_ automatically re-register running apps, 
> so they need to re-register when reconnecting to the new primary. This is 
> done by catching {{ApplicationMasterNotRegisteredException}} in *allocate* 
> calls and re-registering. But *RequestHedgingRMFailoverProxyProvider* does 
> _not_ propagate {{YarnException}}, because the actual invocation is done 
> asynchronously on separate threads, so AMs cannot reconnect to the RM after 
> failover. 
> This JIRA proposes that *RequestHedgingRMFailoverProxyProvider* propagate 
> any {{YarnException}} that it encounters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5776) Checkstyle: MonitoringThread.Run method length is too long

2016-10-24 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-5776:
-
Summary: Checkstyle: MonitoringThread.Run method length is too long  (was: 
Checkstyle: MonitioringThread.Run method length is too long)

> Checkstyle: MonitoringThread.Run method length is too long
> --
>
> Key: YARN-5776
> URL: https://issues.apache.org/jira/browse/YARN-5776
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Trivial
> Attachments: YARN-5776.000.patch
>
>
> YARN-5725 had a checkstyle violation that should be resolved by refactoring 
> the function.
> Details:
> ContainersMonitorImpl.java:395 MonitioringThread.Run @Override:5: Method 
> length is 233 lines (max allowed is 150).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5776) Checkstyle: MonitioringThread.Run method length is too long

2016-10-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603905#comment-15603905
 ] 

Hadoop QA commented on YARN-5776:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 23s {color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager
 generated 1 new + 16 unchanged - 1 fixed = 17 total (was 17) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 0 new + 0 unchanged - 18 fixed = 0 total (was 18) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 7s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m 9s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12835052/YARN-5776.000.patch |
| JIRA Issue | YARN-5776 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 14bf13a50894 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / dc3272b |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/13495/artifact/patchprocess/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/13495/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/13495/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.

[jira] [Updated] (YARN-5711) Propogate exceptions back to client when using hedging RM failover provider

2016-10-24 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-5711:
-
Summary: Propogate exceptions back to client when using hedging RM failover 
provider  (was: Propogate exceptions back to client after RM failover when 
using hedging failover provider)

> Propogate exceptions back to client when using hedging RM failover provider
> ---
>
> Key: YARN-5711
> URL: https://issues.apache.org/jira/browse/YARN-5711
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, resourcemanager
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>Priority: Critical
> Attachments: YARN-5711-v1.patch, YARN-5711-v2.patch, 
> YARN-5711.v1.1.patch
>
>
> When the RM fails over, it does _not_ automatically re-register running apps, 
> so they need to re-register when reconnecting to the new primary. This is 
> done by catching {{ApplicationMasterNotRegisteredException}} in *allocate* 
> calls and re-registering. But *RequestHedgingRMFailoverProxyProvider* does 
> _not_ propagate {{YarnException}}, because the actual invocation is done 
> asynchronously on separate threads, so AMs cannot reconnect to the RM after 
> failover. 
> This JIRA proposes that *RequestHedgingRMFailoverProxyProvider* propagate 
> any {{YarnException}} that it encounters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5711) Propogate exceptions back to client after RM failover when using RequestHedgingRMFailoverProxyProvider

2016-10-24 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-5711:
-
Summary: Propogate exceptions back to client after RM failover when using 
RequestHedgingRMFailoverProxyProvider  (was: AM cannot reconnect to RM after 
failover when using RequestHedgingRMFailoverProxyProvider)

> Propogate exceptions back to client after RM failover when using 
> RequestHedgingRMFailoverProxyProvider
> --
>
> Key: YARN-5711
> URL: https://issues.apache.org/jira/browse/YARN-5711
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, resourcemanager
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>Priority: Critical
> Attachments: YARN-5711-v1.patch, YARN-5711-v2.patch, 
> YARN-5711.v1.1.patch
>
>
> When the RM fails over, it does _not_ automatically re-register running apps, 
> so they need to re-register when reconnecting to the new primary. This is 
> done by catching {{ApplicationMasterNotRegisteredException}} in *allocate* 
> calls and re-registering. But *RequestHedgingRMFailoverProxyProvider* does 
> _not_ propagate {{YarnException}}, because the actual invocation is done 
> asynchronously on separate threads, so AMs cannot reconnect to the RM after 
> failover. 
> This JIRA proposes that *RequestHedgingRMFailoverProxyProvider* propagate 
> any {{YarnException}} that it encounters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5711) Propogate exceptions back to client after RM failover when using hedging failover provider

2016-10-24 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-5711:
-
Summary: Propogate exceptions back to client after RM failover when using 
hedging failover provider  (was: Propogate exceptions back to client after RM 
failover when using RequestHedgingRMFailoverProxyProvider)

> Propogate exceptions back to client after RM failover when using hedging 
> failover provider
> --
>
> Key: YARN-5711
> URL: https://issues.apache.org/jira/browse/YARN-5711
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, resourcemanager
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>Priority: Critical
> Attachments: YARN-5711-v1.patch, YARN-5711-v2.patch, 
> YARN-5711.v1.1.patch
>
>
> When the RM fails over, it does _not_ automatically re-register running apps, 
> so they need to re-register when reconnecting to the new primary. This is 
> done by catching {{ApplicationMasterNotRegisteredException}} in *allocate* 
> calls and re-registering. But *RequestHedgingRMFailoverProxyProvider* does 
> _not_ propagate {{YarnException}}, because the actual invocation is done 
> asynchronously on separate threads, so AMs cannot reconnect to the RM after 
> failover. 
> This JIRA proposes that *RequestHedgingRMFailoverProxyProvider* propagate 
> any {{YarnException}} that it encounters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5776) Checkstyle: MonitioringThread.Run method length is too long

2016-10-24 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-5776:
-
Attachment: YARN-5776.000.patch

Removes all relevant checkstyle violations for YARN-5725. Note: no unit test 
changes; the behavior should be identical.

> Checkstyle: MonitioringThread.Run method length is too long
> ---
>
> Key: YARN-5776
> URL: https://issues.apache.org/jira/browse/YARN-5776
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Trivial
> Attachments: YARN-5776.000.patch
>
>
> YARN-5725 had a check style violation that should be resolved by refactoring 
> the function
> Details:
> ContainersMonitorImpl.java:395 MonitioringThread.Run @Override:5: Method 
> length is 233 lines (max allowed is 150).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5775) Bug fixes in swagger definition

2016-10-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603837#comment-15603837
 ] 

Hadoop QA commented on YARN-5775:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
25s {color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} yarn-native-services passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 10s 
{color} | {color:red} hadoop-yarn-services-api in yarn-native-services failed. 
{color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 7s 
{color} | {color:red} hadoop-yarn-services-api in the patch failed. {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 14s 
{color} | {color:green} hadoop-yarn-services-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s 
{color} | {color:red} The patch generated 10 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 10m 33s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12835051/YARN-5775-yarn-native-services.001.patch
 |
| JIRA Issue | YARN-5775 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  |
| uname | Linux ff1ac33f8536 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | yarn-native-services / 023be93 |
| Default Java | 1.8.0_101 |
| mvnsite | 
https://builds.apache.org/job/PreCommit-YARN-Build/13494/artifact/patchprocess/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services-api.txt
 |
| mvnsite | 
https://builds.apache.org/job/PreCommit-YARN-Build/13494/artifact/patchprocess/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services-api.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/13494/testReport/ |
| asflicense | 
https://builds.apache.org/job/PreCommit-YARN-Build/13494/artifact/patchprocess/patch-asflicense-problems.txt
 |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/13494/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Bug fixes in swagger definition
> ---
>
> Key: YARN-5775
> URL: https://issues.apache.org/jira/browse/YARN-5775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Gour Saha
> Fix For: yarn-native-services

[jira] [Commented] (YARN-5725) Test uncaught exception in TestContainersMonitorResourceChange.testContainersResourceChange when setting IP and host

2016-10-24 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603810#comment-15603810
 ] 

Miklos Szegedi commented on YARN-5725:
--

All right, I opened YARN-5776 for the checkstyle violation.

> Test uncaught exception in 
> TestContainersMonitorResourceChange.testContainersResourceChange when setting 
> IP and host
> 
>
> Key: YARN-5725
> URL: https://issues.apache.org/jira/browse/YARN-5725
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Attachments: YARN-5725.000.patch, YARN-5725.001.patch, 
> YARN-5725.002.patch, YARN-5725.003.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The issue is a warning but it prevents container monitor to continue
> 2016-10-12 14:38:23,280 WARN  [Container Monitor] 
> monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(594)) - 
> Uncaught exception in ContainersMonitorImpl while monitoring resource of 
> container_123456_0001_01_01
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:455)
> 2016-10-12 14:38:23,281 WARN  [Container Monitor] 
> monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(613)) - 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
>  is interrupted. Exiting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5776) Checkstyle: MonitioringThread.Run method length is too long

2016-10-24 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-5776:


 Summary: Checkstyle: MonitioringThread.Run method length is too 
long
 Key: YARN-5776
 URL: https://issues.apache.org/jira/browse/YARN-5776
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Trivial


YARN-5725 had a checkstyle violation that should be resolved by refactoring 
the function.

Details:
ContainersMonitorImpl.java:395 MonitioringThread.Run @Override:5: Method length 
is 233 lines (max allowed is 150).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5775) Bug fixes in swagger definition

2016-10-24 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-5775:

Attachment: YARN-5775-yarn-native-services.001.patch

> Bug fixes in swagger definition
> ---
>
> Key: YARN-5775
> URL: https://issues.apache.org/jira/browse/YARN-5775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Gour Saha
> Fix For: yarn-native-services
>
> Attachments: YARN-5775-yarn-native-services.001.patch
>
>
> All enums have been listed in lowercase. Need to convert all of them to 
> uppercase.
> For example, ContainerState:
> {noformat}
> enum:
>   - init
>   - ready
> {noformat}
> needs to be changed to -
> {noformat}
> enum:
>   - INIT
>   - READY
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5556) Support for deleting queues without requiring a RM restart

2016-10-24 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-5556:

Attachment: YARN-5556.v1.003.patch

Thanks for the review, [~templedf]. I have attached a patch addressing your 
review comments.

> Support for deleting queues without requiring a RM restart
> --
>
> Key: YARN-5556
> URL: https://issues.apache.org/jira/browse/YARN-5556
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Xuan Gong
>Assignee: Naganarasimha G R
> Attachments: YARN-5556.v1.001.patch, YARN-5556.v1.002.patch, 
> YARN-5556.v1.003.patch
>
>
> Today, we can add or modify queues without restarting the RM, via a CS 
> refresh. But to delete a queue, we have to restart the ResourceManager. We 
> could support deleting queues without requiring an RM restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5775) Bug fixes in swagger definition

2016-10-24 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha reassigned YARN-5775:
---

Assignee: Gour Saha

> Bug fixes in swagger definition
> ---
>
> Key: YARN-5775
> URL: https://issues.apache.org/jira/browse/YARN-5775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Gour Saha
> Fix For: yarn-native-services
>
>
> All enums have been listed in lowercase. Need to convert all of them to 
> uppercase.
> For example, ContainerState:
> {noformat}
> enum:
>   - init
>   - ready
> {noformat}
> needs to be changed to -
> {noformat}
> enum:
>   - INIT
>   - READY
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5123) SQL based RM state store

2016-10-24 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603768#comment-15603768
 ] 

Subru Krishnan commented on YARN-5123:
--

[~lavkesh], are you planning to update the patch based on the above discussions 
(and to address the Yetus warnings)? I feel this will be a nice addition, hence 
the follow-up. Thanks.

> SQL based RM state store
> 
>
> Key: YARN-5123
> URL: https://issues.apache.org/jira/browse/YARN-5123
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
> Attachments: 0001-SQL-Based-RM-state-store-trunk.patch, High 
> Availability In YARN Resource Manager using SQL Based StateStore.pdf, 
> sqlstatestore.patch
>
>
> In our setup, the ZooKeeper-based RM state store didn't work. We ended up 
> implementing our own SQL-based state store. Here is a patch, if anybody else 
> wants to use it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5775) Bug fixes in swagger definition

2016-10-24 Thread Gour Saha (JIRA)
Gour Saha created YARN-5775:
---

 Summary: Bug fixes in swagger definition
 Key: YARN-5775
 URL: https://issues.apache.org/jira/browse/YARN-5775
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Gour Saha
 Fix For: yarn-native-services


All enums have been listed in lowercase. Need to convert all of them to 
uppercase.

For example, ContainerState:
{noformat}
enum:
  - init
  - ready
{noformat}
needs to be changed to -
{noformat}
enum:
  - INIT
  - READY
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-10-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603673#comment-15603673
 ] 

Wangda Tan commented on YARN-4734:
--

[~aw],

Maybe, since you're pretty familiar with the AltKerberos stuff, you can also 
share what the problem is on YARN-4006. [~vvasudev] asked you a 
[question|https://issues.apache.org/jira/browse/YARN-4006?focusedCommentId=15297651=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15297651]
 but you may have missed it.

I will add a note to the doc saying that running in a secure environment is not 
tested, which I think includes the AltKerberos setup.

Anyway, I think our goal is to make sure it doesn't break any other components; 
please let us know if you see any critical issues for the merge.

Thanks,



> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.10-NOT_READY.patch, 
> YARN-4734.11-NOT_READY.patch, YARN-4734.12-NOT_READY.patch, 
> YARN-4734.13.patch, YARN-4734.14.patch, YARN-4734.15.patch, 
> YARN-4734.2.patch, YARN-4734.3.patch, YARN-4734.4.patch, YARN-4734.5.patch, 
> YARN-4734.6.patch, YARN-4734.7.patch, YARN-4734.8.patch, 
> YARN-4734.9-NOT_READY.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-10-24 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603642#comment-15603642
 ] 

Allen Wittenauer commented on YARN-4734:


Oh oh oh.  That means this likely won't work with AltKerberos deployments since 
YARN REST is completely broken with it too. The docs will need an update to 
mention that.

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.10-NOT_READY.patch, 
> YARN-4734.11-NOT_READY.patch, YARN-4734.12-NOT_READY.patch, 
> YARN-4734.13.patch, YARN-4734.14.patch, YARN-4734.15.patch, 
> YARN-4734.2.patch, YARN-4734.3.patch, YARN-4734.4.patch, YARN-4734.5.patch, 
> YARN-4734.6.patch, YARN-4734.7.patch, YARN-4734.8.patch, 
> YARN-4734.9-NOT_READY.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5774) MR Job stuck in ACCEPTED status without any progress in Fair Scheduler if {{yarn.scheduler.minimum-allocation-mb}} is 0.

2016-10-24 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-5774:
---
Description: 
An MR job gets stuck in ACCEPTED status without any progress in the Fair 
Scheduler because there is no resource request for the AM. This happens when 
you configure {{yarn.scheduler.minimum-allocation-mb}} to zero.

The problem is in code used by both the Capacity Scheduler and the Fair 
Scheduler: {{scheduler.increment-allocation-mb}} is a concept in FS but not in 
CS, so when the common code in class RMAppManager normalizes the resource 
requests, it passes {{yarn.scheduler.minimum-allocation-mb}} as the increment 
because CS has no increment.
{code}
 SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
  scheduler.getClusterResource(),
  scheduler.getMinimumResourceCapability(),
  scheduler.getMaximumResourceCapability(),
  scheduler.getMinimumResourceCapability()); // <-- the increment resource should be passed here
{code}

  was:
MR Job stuck in ACCEPTED status without any progress in Fair Scheduler because 
there is no resource request for the AM. This happened when you configure 
{{yarn.scheduler.minimum-allocation-mb}} to zero.

The problem is in the code used by both Capacity Scheduler and Fair Scheduler. 
scheduler.increment-allocation-mb is a concept in FS, but not CS. So the common 
code in class RMAppManager passes the yarn.scheduler.minimum-allocation-mb as 
incremental one because there is no incremental one for CS when it tried to 
normalize the resource requests.
{code}
 SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
  scheduler.getClusterResource(),
  scheduler.getMinimumResourceCapability(),
  scheduler.getMaximumResourceCapability(),
  scheduler.getMinimumResourceCapability());  --> incrementResource 
should be passed here.
{code}


> MR Job stuck in ACCEPTED status without any progress in Fair Scheduler if 
> {{yarn.scheduler.minimum-allocation-mb}} is 0.
> 
>
> Key: YARN-5774
> URL: https://issues.apache.org/jira/browse/YARN-5774
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>
> An MR job gets stuck in ACCEPTED status without any progress in the Fair 
> Scheduler because there is no resource request for the AM. This happens when 
> you configure {{yarn.scheduler.minimum-allocation-mb}} to zero.
> The problem is in code used by both the Capacity Scheduler and the Fair 
> Scheduler: {{scheduler.increment-allocation-mb}} is a concept in FS but not 
> in CS, so when the common code in class RMAppManager normalizes the resource 
> requests, it passes {{yarn.scheduler.minimum-allocation-mb}} as the 
> increment because CS has no increment.
> {code}
>  SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
>   scheduler.getClusterResource(),
>   scheduler.getMinimumResourceCapability(),
>   scheduler.getMaximumResourceCapability(),
>   scheduler.getMinimumResourceCapability()); // <-- the increment resource should be passed here
> {code}
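
For reference, a hedged sketch of the intended call. Normalization rounds each 
ask up to a multiple of the increment and clamps it to [minimum, maximum], 
which is why a degenerate increment breaks the AM request; the 
{{getIncrementResourceCapability()}} accessor below is hypothetical (FS keeps 
its increment internally, and CS, which has none, would keep using the 
minimum):

{code:java}
// Sketch only -- getIncrementResourceCapability() is a hypothetical accessor,
// not an existing YarnScheduler method.
SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
    scheduler.getClusterResource(),
    scheduler.getMinimumResourceCapability(),
    scheduler.getMaximumResourceCapability(),
    scheduler.getIncrementResourceCapability()); // FS: increment-allocation-mb
{code}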



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches

2016-10-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603469#comment-15603469
 ] 

Hadoop QA commented on YARN-5767:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
3s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 0 new + 159 unchanged - 27 fixed = 159 total (was 186) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
55s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 58s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 30m 4s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12835025/YARN-5767-trunk-v3.patch
 |
| JIRA Issue | YARN-5767 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 815705b05614 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 9d17585 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/13493/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/13493/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Fix the order that resources are cleaned up from the local Public/Private 
> caches
> 
>
> Key: YARN-5767
> URL: https://issues.apache.org/jira/browse/YARN-5767
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0, 2.7.0, 

[jira] [Updated] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches

2016-10-24 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-5767:
---
Attachment: YARN-5767-trunk-v3.patch

V3 attached. Fixed whitespace.

> Fix the order that resources are cleaned up from the local Public/Private 
> caches
> 
>
> Key: YARN-5767
> URL: https://issues.apache.org/jira/browse/YARN-5767
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-5767-trunk-v1.patch, YARN-5767-trunk-v2.patch, 
> YARN-5767-trunk-v3.patch
>
>
> If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can 
> see that public resources are added to the {{ResourceRetentionSet}} first 
> followed by private resources:
> {code:java}
> private void handleCacheCleanup(LocalizationEvent event) {
>   ResourceRetentionSet retain =
> new ResourceRetentionSet(delService, cacheTargetSize);
>   retain.addResources(publicRsrc);
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Resource cleanup (public) " + retain);
>   }
>   for (LocalResourcesTracker t : privateRsrc.values()) {
> retain.addResources(t);
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Resource cleanup " + t.getUser() + ":" + retain);
> }
>   }
>   //TODO Check if appRsrcs should also be added to the retention set.
> }
> {code}
> Unfortunately, if we look at {{ResourceRetentionSet#addResources}} we see 
> that this means public resources are deleted first until the target cache 
> size is met:
> {code:java}
> public void addResources(LocalResourcesTracker newTracker) {
>   for (LocalizedResource resource : newTracker) {
> currentSize += resource.getSize();
> if (resource.getRefCount() > 0) {
>   // always retain resources in use
>   continue;
> }
> retain.put(resource, newTracker);
>   }
> for (Iterator<Map.Entry<LocalizedResource, LocalResourcesTracker>> i =
>  retain.entrySet().iterator();
>currentSize - delSize > targetSize && i.hasNext();) {
> Map.Entry<LocalizedResource, LocalResourcesTracker> rsrc = i.next();
> LocalizedResource resource = rsrc.getKey();
> LocalResourcesTracker tracker = rsrc.getValue();
> if (tracker.remove(resource, delService)) {
>   delSize += resource.getSize();
>   i.remove();
> }
>   }
> }
> {code}
> The result of this is that resources in the private cache are only deleted in 
> the cases where:
> # The cache size is larger than the target cache size and the public cache is 
> empty.
> # The cache size is larger than the target cache size and everything in the 
> public cache is being used by a running container.
> For clusters that primarily use the public cache (i.e. make use of the shared 
> cache), this means that the most commonly used resources can be deleted 
> before old resources in the private cache. Furthermore, the private cache can 
> continue to grow over time causing more and more churn in the public cache.
> Additionally, the same problem exists within the private cache. Since 
> resources are added to the retention set on a user by user basis, resources 
> will get cleaned up one user at a time in the order that privateRsrc.values() 
> returns the LocalResourcesTracker. So if user1 has 10MB in their cache and 
> user2 has 100MB in their cache and the target size of the cache is 50MB, 
> user1 could potentially have their entire cache removed before anything is 
> deleted from the user2 cache.
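
A self-contained illustration of the global-LRU direction such a fix can take 
(hypothetical types, not the attached patch): gather unreferenced entries from 
every cache, then evict least-recently-used first until the total fits the 
target size.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class LruCacheCleaner {
  static class Entry {
    final long size;
    final long lastUsed;
    final int refCount;
    Entry(long size, long lastUsed, int refCount) {
      this.size = size;
      this.lastUsed = lastUsed;
      this.refCount = refCount;
    }
  }

  /** Returns the number of bytes evicted to bring currentSize under target. */
  static long cleanup(List<Entry> allCaches, long currentSize, long targetSize) {
    List<Entry> candidates = new ArrayList<>();
    for (Entry e : allCaches) {
      if (e.refCount == 0) {       // always retain resources in use
        candidates.add(e);
      }
    }
    // One ordering across public and private caches: oldest first.
    candidates.sort(Comparator.comparingLong(e -> e.lastUsed));
    long deleted = 0;
    for (Entry e : candidates) {
      if (currentSize - deleted <= targetSize) {
        break;                     // cache now fits the target
      }
      deleted += e.size;           // evict this resource
    }
    return deleted;
  }
}
{code}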



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()

2016-10-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603373#comment-15603373
 ] 

Varun Saxena commented on YARN-5773:


[~sunilg], we still need to add the apps to pendingOrderingPolicy. It's just 
that there is no need to run over all the pending apps on recovery of each 
unfinished app, as NMs have not yet registered (they won't until recovery 
finishes). Iterating over all the apps on recovery of each unfinished app is, 
I feel, unnecessary, as it will hit the same condition time and again and be 
unable to activate any application.
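
A rough, self-contained sketch of that shortcut (the flag name and structure 
are hypothetical, not the attached patch): queue the recovered app but defer 
the activation scan until recovery finishes.

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

class LeafQueueSketch {
  private final Queue<String> pendingApps = new ArrayDeque<>();

  void submitAttempt(String appId, boolean isAppRecovering) {
    pendingApps.add(appId);
    if (!isAppRecovering) {
      activateApplications();  // O(pending) scan, skipped during recovery
    }
  }

  void activateApplications() {
    // AM-limit checks over pendingApps would run here once NMs register
  }
}
{code}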

> RM recovery too slow due to LeafQueue#activateApplication()
> ---
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
>
> # Submit 10K applications to the default queue.
> # All applications are in ACCEPTED state.
> # Now restart the ResourceManager.
> For each recovered application, {{LeafQueue#activateApplications()}} is 
> invoked, resulting in the AM limit check being done even before NodeManagers 
> have registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for 
> {{10K}} applications that is roughly {{5 * 10^7}} iterations, causing the RM 
> to take more than 10 min to become active.
> Since NM resources are not yet added during recovery, we should skip 
> {{activateApplications()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches

2016-10-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603367#comment-15603367
 ] 

Hadoop QA commented on YARN-5767:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 0 new + 160 unchanged - 27 fixed = 160 total (was 187) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 53s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 42s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12835016/YARN-5767-trunk-v2.patch
 |
| JIRA Issue | YARN-5767 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 2f2b8c5e8486 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 
21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / a1a0281 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/13492/artifact/patchprocess/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/13492/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/13492/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Fix the order that resources are cleaned up from the local Public/Private 
> caches
> 
>
> Key: YARN-5767
> URL: 

[jira] [Updated] (YARN-5774) MR Job stuck in ACCEPTED status without any progress in Fair Scheduler if {{yarn.scheduler.minimum-allocation-mb}} is 0.

2016-10-24 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-5774:
---
Summary: MR Job stuck in ACCEPTED status without any progress in Fair 
Scheduler if {{yarn.scheduler.minimum-allocation-mb}} is 0.  (was: MR Job stuck 
in ACCEPTED status without any progress in Fair Scheduler)

> MR Job stuck in ACCEPTED status without any progress in Fair Scheduler if 
> {{yarn.scheduler.minimum-allocation-mb}} is 0.
> 
>
> Key: YARN-5774
> URL: https://issues.apache.org/jira/browse/YARN-5774
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>
> An MR job gets stuck in ACCEPTED status without any progress in the Fair 
> Scheduler because there is no resource request for the AM. This happens when 
> you configure {{yarn.scheduler.minimum-allocation-mb}} to zero.
> The problem is in code used by both the Capacity Scheduler and the Fair 
> Scheduler: {{scheduler.increment-allocation-mb}} is a concept in FS but not 
> in CS, so when the common code in class RMAppManager normalizes the resource 
> requests, it passes {{yarn.scheduler.minimum-allocation-mb}} as the 
> increment because CS has no increment.
> {code}
>  SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
>   scheduler.getClusterResource(),
>   scheduler.getMinimumResourceCapability(),
>   scheduler.getMaximumResourceCapability(),
>   scheduler.getMinimumResourceCapability()); // <-- the increment resource should be passed here
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5774) MR Job stuck in ACCEPTED status without any progress in Fair Scheduler

2016-10-24 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-5774:
---
Description: 
An MR job gets stuck in ACCEPTED status without any progress in the Fair 
Scheduler because there is no resource request for the AM. This happens when 
you configure {{yarn.scheduler.minimum-allocation-mb}} to zero.

The problem is in code used by both the Capacity Scheduler and the Fair 
Scheduler: scheduler.increment-allocation-mb is a concept in FS but not in CS, 
so when the common code in class RMAppManager normalizes the resource 
requests, it passes yarn.scheduler.minimum-allocation-mb as the increment 
because CS has no increment.
{code}
 SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
  scheduler.getClusterResource(),
  scheduler.getMinimumResourceCapability(),
  scheduler.getMaximumResourceCapability(),
  scheduler.getMinimumResourceCapability()); // <-- the increment resource should be passed here
{code}

  was:
MR Job stuck in ACCEPTED status without any progress in Fair Scheduler because 
there is no resource request for the AM. 

The problem is in the code used by both Capacity Scheduler and Fair Scheduler. 
scheduler.increment-allocation-mb is a concept in FS, but not CS. So the common 
code in class RMAppManager passes the yarn.scheduler.minimum-allocation-mb as 
incremental one because there is no incremental one for CS when it tried to 
normalize the resource requests.
{code}
 SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
  scheduler.getClusterResource(),
  scheduler.getMinimumResourceCapability(),
  scheduler.getMaximumResourceCapability(),
  scheduler.getMinimumResourceCapability());  --> incrementResource 
should be passed here.
{code}


> MR Job stuck in ACCEPTED status without any progress in Fair Scheduler
> --
>
> Key: YARN-5774
> URL: https://issues.apache.org/jira/browse/YARN-5774
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>
> An MR job gets stuck in ACCEPTED status without any progress in the Fair 
> Scheduler because there is no resource request for the AM. This happens when 
> you configure {{yarn.scheduler.minimum-allocation-mb}} to zero.
> The problem is in code used by both the Capacity Scheduler and the Fair 
> Scheduler: {{scheduler.increment-allocation-mb}} is a concept in FS but not 
> in CS, so when the common code in class RMAppManager normalizes the resource 
> requests, it passes {{yarn.scheduler.minimum-allocation-mb}} as the 
> increment because CS has no increment.
> {code}
>  SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
>   scheduler.getClusterResource(),
>   scheduler.getMinimumResourceCapability(),
>   scheduler.getMaximumResourceCapability(),
>   scheduler.getMinimumResourceCapability()); // <-- the increment resource should be passed here
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.

2016-10-24 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5765:
-
Summary: LinuxContainerExecutor creates appcache and its subdirectories 
with wrong group owner.  (was: LinuxContainerExecutor creates appcache/{appId} 
with wrong group owner.)

> LinuxContainerExecutor creates appcache and its subdirectories with wrong 
> group owner.
> --
>
> Key: YARN-5765
> URL: https://issues.apache.org/jira/browse/YARN-5765
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Blocker
>
> LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with 
> the wrong group owner, causing log aggregation and the ShuffleHandler to 
> fail because the NodeManager process does not have permission to read the 
> files under the directory.
> This can easily be reproduced by enabling LCE and submitting an MR example 
> job. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5774) MR Job stuck in ACCEPTED status without any progress in Fair Scheduler

2016-10-24 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-5774:
---
Summary: MR Job stuck in ACCEPTED status without any progress in Fair 
Scheduler  (was: A wrong parameter is passed when normalizing resource requests)

> MR Job stuck in ACCEPTED status without any progress in Fair Scheduler
> --
>
> Key: YARN-5774
> URL: https://issues.apache.org/jira/browse/YARN-5774
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>
> An MR job gets stuck in ACCEPTED status without any progress in the Fair 
> Scheduler because there is no resource request for the AM. 
> The problem is in code used by both the Capacity Scheduler and the Fair 
> Scheduler: scheduler.increment-allocation-mb is a concept in FS but not in 
> CS, so when the common code in class RMAppManager normalizes the resource 
> requests, it passes yarn.scheduler.minimum-allocation-mb as the increment 
> because CS has no increment.
> {code}
>  SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
>   scheduler.getClusterResource(),
>   scheduler.getMinimumResourceCapability(),
>   scheduler.getMaximumResourceCapability(),
>   scheduler.getMinimumResourceCapability()); // <-- the increment resource should be passed here
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5774) A wrong parameter is passed when normalizing resource requests

2016-10-24 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-5774:
---
Description: 
An MR job gets stuck in ACCEPTED status without any progress in the Fair 
Scheduler because there is no resource request for the AM. 

The problem is in code used by both the Capacity Scheduler and the Fair 
Scheduler: scheduler.increment-allocation-mb is a concept in FS but not in CS, 
so when the common code in class RMAppManager normalizes the resource 
requests, it passes yarn.scheduler.minimum-allocation-mb as the increment 
because CS has no increment.
{code}
 SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
  scheduler.getClusterResource(),
  scheduler.getMinimumResourceCapability(),
  scheduler.getMaximumResourceCapability(),
  scheduler.getMinimumResourceCapability()); // <-- the increment resource should be passed here
{code}

  was:
The problem is in the code used by both Capacity Scheduler and Fair Scheduler. 
scheduler.increment-allocation-mb is a concept in FS, but not CS. So the common 
code in class RMAppManager passes the yarn.scheduler.minimum-allocation-mb as 
incremental one because there is no incremental one for CS when it tried to 
normalize the resource requests.
{code}
 SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
  scheduler.getClusterResource(),
  scheduler.getMinimumResourceCapability(),
  scheduler.getMaximumResourceCapability(),
  scheduler.getMinimumResourceCapability());  --> incrementResource 
should be passed here.
{code}


> A wrong parameter is passed when normalizing resource requests
> --
>
> Key: YARN-5774
> URL: https://issues.apache.org/jira/browse/YARN-5774
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>
> An MR job gets stuck in ACCEPTED status without any progress in the Fair 
> Scheduler because there is no resource request for the AM. 
> The problem is in code used by both the Capacity Scheduler and the Fair 
> Scheduler: scheduler.increment-allocation-mb is a concept in FS but not in 
> CS, so when the common code in class RMAppManager normalizes the resource 
> requests, it passes yarn.scheduler.minimum-allocation-mb as the increment 
> because CS has no increment.
> {code}
>  SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
>   scheduler.getClusterResource(),
>   scheduler.getMinimumResourceCapability(),
>   scheduler.getMaximumResourceCapability(),
>   scheduler.getMinimumResourceCapability()); // <-- the increment resource should be passed here
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5774) A wrong parameter is passed when normalizing resource requests

2016-10-24 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-5774:
---
Description: 
The problem is in code used by both the Capacity Scheduler and the Fair 
Scheduler: scheduler.increment-allocation-mb is a concept in FS but not in CS, 
so when the common code in class RMAppManager normalizes the resource 
requests, it passes yarn.scheduler.minimum-allocation-mb as the increment 
because CS has no increment.
{code}
 SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
  scheduler.getClusterResource(),
  scheduler.getMinimumResourceCapability(),
  scheduler.getMaximumResourceCapability(),
  scheduler.getMinimumResourceCapability()); // <-- the increment resource should be passed here
{code}

  was:
{code}
 SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
  scheduler.getClusterResource(),
  scheduler.getMinimumResourceCapability(),
  scheduler.getMaximumResourceCapability(),
  scheduler.getMinimumResourceCapability());  --> incrementResource 
should be passed here.
{code}


> A wrong parameter is passed when normalizing resource requests
> --
>
> Key: YARN-5774
> URL: https://issues.apache.org/jira/browse/YARN-5774
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>
> The problem is in code used by both the Capacity Scheduler and the Fair 
> Scheduler: scheduler.increment-allocation-mb is a concept in FS but not in 
> CS, so when the common code in class RMAppManager normalizes the resource 
> requests, it passes yarn.scheduler.minimum-allocation-mb as the increment 
> because CS has no increment.
> {code}
>  SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
>   scheduler.getClusterResource(),
>   scheduler.getMinimumResourceCapability(),
>   scheduler.getMaximumResourceCapability(),
>   scheduler.getMinimumResourceCapability()); // <-- the increment resource should be passed here
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache/{appId} with wrong group owner.

2016-10-24 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603320#comment-15603320
 ] 

Haibo Chen commented on YARN-5765:
--

I believe this is broken by YARN-5287. 

"chmod clears the set-group-ID bit of a regular file if the file's group ID 
does not match the user's effective group ID or one of the user's supplementary 
group IDs, unless the user has appropriate privileges," according to the Linux 
man page. This is in line with the reproduction setup I had. 

Walking through the container-executor.c code, {nm_root}/usercache/{userName} 
is created with the correct permissions, with the group owner being that of the 
NM process and the setgid bit set. However, in create_validate_dir(), 
"mkdir(npath, perm) != 0" returns false on the directory 
{nm_root}/usercache/{userName}/appcache, so chmod(npath, perm) is executed on 
that directory, clearing the setgid bit. Consequently, all directories/files 
created under the appcache directory have the wrong group owner. 

The container working directory is also created by the same code and therefore 
has the wrong group owner as well.



> LinuxContainerExecutor creates appcache/{appId} with wrong group owner.
> ---
>
> Key: YARN-5765
> URL: https://issues.apache.org/jira/browse/YARN-5765
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Blocker
>
> LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with 
> the wrong group owner, causing log aggregation and the ShuffleHandler to 
> fail because the NodeManager process does not have permission to read the 
> files under the directory.
> This can easily be reproduced by enabling LCE and submitting an MR example 
> job. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches

2016-10-24 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-5767:
---
Attachment: YARN-5767-trunk-v2.patch

Attaching a v2 patch for trunk. This new version simply fixes checkstyle and 
findbugs issues. Here is a summary:

# Add javadoc comments and fix spacing.
# Add a hashCode method and the Serializable interface to 
{{LocalCacheCleaner#LRUComparator}} (sketched below).
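
The comparator change follows the usual findbugs pattern. A sketch of that 
shape with a stand-in entry type (the real comparator orders localized 
resources by last use):

{code:java}
import java.io.Serializable;
import java.util.Comparator;

// Stand-in for the resource type being ordered.
class CacheEntry {
  final long lastUsed;
  CacheEntry(long lastUsed) { this.lastUsed = lastUsed; }
}

// Serializable so the comparator can be serialized with any sorted collection
// holding it; equals() and hashCode() kept in sync for a stateless comparator.
class LRUComparator implements Comparator<CacheEntry>, Serializable {
  private static final long serialVersionUID = 1L;

  @Override
  public int compare(CacheEntry a, CacheEntry b) {
    return Long.compare(a.lastUsed, b.lastUsed); // least recently used first
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof LRUComparator;           // all instances are equal
  }

  @Override
  public int hashCode() {
    return LRUComparator.class.hashCode();
  }
}
{code}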

> Fix the order that resources are cleaned up from the local Public/Private 
> caches
> 
>
> Key: YARN-5767
> URL: https://issues.apache.org/jira/browse/YARN-5767
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-5767-trunk-v1.patch, YARN-5767-trunk-v2.patch
>
>
> If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can 
> see that public resources are added to the {{ResourceRetentionSet}} first 
> followed by private resources:
> {code:java}
> private void handleCacheCleanup(LocalizationEvent event) {
>   ResourceRetentionSet retain =
> new ResourceRetentionSet(delService, cacheTargetSize);
>   retain.addResources(publicRsrc);
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Resource cleanup (public) " + retain);
>   }
>   for (LocalResourcesTracker t : privateRsrc.values()) {
> retain.addResources(t);
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Resource cleanup " + t.getUser() + ":" + retain);
> }
>   }
>   //TODO Check if appRsrcs should also be added to the retention set.
> }
> {code}
> Unfortunately, if we look at {{ResourceRetentionSet#addResources}} we see 
> that this means public resources are deleted first until the target cache 
> size is met:
> {code:java}
> public void addResources(LocalResourcesTracker newTracker) {
>   for (LocalizedResource resource : newTracker) {
> currentSize += resource.getSize();
> if (resource.getRefCount() > 0) {
>   // always retain resources in use
>   continue;
> }
> retain.put(resource, newTracker);
>   }
> for (Iterator<Map.Entry<LocalizedResource, LocalResourcesTracker>> i =
>  retain.entrySet().iterator();
>currentSize - delSize > targetSize && i.hasNext();) {
> Map.Entry<LocalizedResource, LocalResourcesTracker> rsrc = i.next();
> LocalizedResource resource = rsrc.getKey();
> LocalResourcesTracker tracker = rsrc.getValue();
> if (tracker.remove(resource, delService)) {
>   delSize += resource.getSize();
>   i.remove();
> }
>   }
> }
> {code}
> The result of this is that resources in the private cache are only deleted in 
> the cases where:
> # The cache size is larger than the target cache size and the public cache is 
> empty.
> # The cache size is larger than the target cache size and everything in the 
> public cache is being used by a running container.
> For clusters that primarily use the public cache (i.e. make use of the shared 
> cache), this means that the most commonly used resources can be deleted 
> before old resources in the private cache. Furthermore, the private cache can 
> continue to grow over time causing more and more churn in the public cache.
> Additionally, the same problem exists within the private cache. Since 
> resources are added to the retention set on a user by user basis, resources 
> will get cleaned up one user at a time in the order that privateRsrc.values() 
> returns the LocalResourcesTracker. So if user1 has 10MB in their cache and 
> user2 has 100MB in their cache and the target size of the cache is 50MB, 
> user1 could potentially have their entire cache removed before anything is 
> deleted from the user2 cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy

2016-10-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603244#comment-15603244
 ] 

Wangda Tan commented on YARN-2009:
--

Thanks for update [~sunilg],

I have one doubt: should we deduct each user's own AM-used from that user's 
limit? The behavior in the patch is to deduct the sum of AM-used across all 
users in the queue.
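
The two candidate deductions, as a sketch (names invented for illustration):

{code:java}
// Illustration only -- invented names, not YARN code.
class AmHeadroomSketch {
  static long perUser(long userLimit, long amUsedByThisUser) {
    return userLimit - amUsedByThisUser;   // deduct only this user's AM usage
  }
  static long queueWide(long userLimit, long amUsedByAllUsers) {
    return userLimit - amUsedByAllUsers;   // patch: deduct all users' AM usage
  }
}
{code}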

> Priority support for preemption in ProportionalCapacityPreemptionPolicy
> ---
>
> Key: YARN-2009
> URL: https://issues.apache.org/jira/browse/YARN-2009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Devaraj K
>Assignee: Sunil G
> Attachments: YARN-2009.0001.patch, YARN-2009.0002.patch, 
> YARN-2009.0003.patch, YARN-2009.0004.patch, YARN-2009.0005.patch, 
> YARN-2009.0006.patch, YARN-2009.0007.patch, YARN-2009.0008.patch, 
> YARN-2009.0009.patch, YARN-2009.0010.patch, YARN-2009.0011.patch, 
> YARN-2009.0012.patch, YARN-2009.0013.patch, YARN-2009.0014.patch
>
>
> While preempting containers based on the queue ideal assignment, we may need 
> to consider preempting the low priority application containers first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5774) A wrong parameter is passed when normalizing resource requests

2016-10-24 Thread Yufei Gu (JIRA)
Yufei Gu created YARN-5774:
--

 Summary: A wrong parameter is passed when normalizing resource 
requests
 Key: YARN-5774
 URL: https://issues.apache.org/jira/browse/YARN-5774
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0-alpha1
Reporter: Yufei Gu
Assignee: Yufei Gu


{code}
 SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
  scheduler.getClusterResource(),
  scheduler.getMinimumResourceCapability(),
  scheduler.getMaximumResourceCapability(),
  scheduler.getMinimumResourceCapability()); // <-- the increment resource should be passed here
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()

2016-10-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603166#comment-15603166
 ] 

Wangda Tan commented on YARN-5773:
--

I feel we may need an overhaul of the existing activateApplications logic.

To describe what activateApplications is trying to solve: given a set of 
pending applications in a queue, where each application belongs to one user, 
each application has its own AM request, each user has a quota, and the queue 
has a total quota, determine which applications will be activated.

There's an additional question:
If a given app's AM resource amount > the AM headroom, should we skip it and 
activate a following app whose AM resource amount <= the AM headroom? 

If the answer to the above question is yes, we can maintain a map from each 
user to that user's pending apps (ordered by AM resource); when doing 
application activation, we don't need to check all the apps, instead we only 
need to check each user once in most cases.
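
A rough, self-contained sketch of that index (hypothetical structures, 
assuming the answer to the skip question is yes): keep each user's pending AM 
asks ordered by size, so activation inspects at most the smallest ask per user 
instead of rescanning every pending app in the queue.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class PendingAmIndex {
  // user -> AM memory (MB) of that user's pending apps, smallest first
  private final Map<String, PriorityQueue<Long>> pendingByUser = new HashMap<>();

  void addPending(String user, long amMb) {
    pendingByUser.computeIfAbsent(user, u -> new PriorityQueue<>()).add(amMb);
  }

  /** Activate apps while they fit both the queue and per-user AM headroom. */
  long activate(long queueHeadroomMb, Map<String, Long> userHeadroomMb) {
    long activated = 0;
    for (Map.Entry<String, PriorityQueue<Long>> e : pendingByUser.entrySet()) {
      PriorityQueue<Long> asks = e.getValue();
      long userRoom = userHeadroomMb.getOrDefault(e.getKey(), 0L);
      while (!asks.isEmpty() && asks.peek() <= queueHeadroomMb
          && asks.peek() <= userRoom) {
        long am = asks.poll();
        queueHeadroomMb -= am;
        userRoom -= am;
        activated++;
      }
    }
    return activated;
  }
}
{code}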

> RM recovery too slow due to LeafQueue#activateApplication()
> ---
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
>
> # Submit 10K applications to the default queue.
> # All applications are in ACCEPTED state.
> # Now restart the ResourceManager.
> For each recovered application, {{LeafQueue#activateApplications()}} is 
> invoked, resulting in the AM limit check being done even before NodeManagers 
> have registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for 
> {{10K}} applications that is roughly {{5 * 10^7}} iterations, causing the RM 
> to take more than 10 min to become active.
> Since NM resources are not yet added during recovery, we should skip 
> {{activateApplications()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5716) Add global scheduler interface definition and update CapacityScheduler to use it.

2016-10-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603128#comment-15603128
 ] 

Hadoop QA commented on YARN-5716:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 13 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 40s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
3s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
51s {color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 7s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
7s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 26s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 58s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 139 
new + 1470 unchanged - 164 fixed = 1609 total (was 1634) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s 
{color} | {color:green} hadoop-yarn-project_hadoop-yarn generated 0 new + 6484 
unchanged - 10 fixed = 6484 total (was 6494) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager
 generated 0 new + 928 unchanged - 10 fixed = 928 total (was 938) {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 30s {color} 
| {color:red} hadoop-yarn in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 35m 30s 
{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 137m 56s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.TestMiniYarnClusterNodeUtilization |
|   | hadoop.yarn.server.TestContainerManagerSecurity |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA 

[jira] [Comment Edited] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-10-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603108#comment-15603108
 ] 

Wangda Tan edited comment on YARN-4734 at 10/24/16 8:28 PM:


Thanks [~aw],

bq. A question. Given the ... circumstances... lately of patches going into 
YARN, what's the security status of this branch?

I double-checked offline with [~sunilg] / [~hsreenath] about your question; 
here are the answers:

a. Existing security status:
Currently the new UI runs on the same HttpServer2 instance which hosts the 
REST service / old UI, so we should be able to get security support from the 
existing logic. However, until we can do sufficient testing of the security 
support, I would suggest users not expect security support for the UI for now.

b. Any possible vulnerabilities?
- This feature can be completely disabled; the newly added code is all 
packaged inside a WAR file. When this feature is disabled, we do not even 
place the WAR file on the classpath where the Jersey server would extract it.
- As you know, our new UI is not a conventional web application; it's an SPA 
(Single Page Application). In conventional apps there is server-side code that 
has to consider security.
Our app just uses REST APIs to get data from the server. In other words, every 
hack that a user could possibly do with the UI, they would be able to do using 
other tools like Postman. The user can also inject code from the console and 
tweak the UI functionality.
What this basically implies is that it's not worth worrying about security on 
the UI side :)
Instead we just need to ensure that the REST endpoints are secure.

bq. Has anyone done an audit? (Web security is outside my area of expertise, so 
I'd prefer another set of eyes on this one.)

Many folks have looked at the newly added code and we believe it is safe. You 
or any other folks are more than welcome to do this check; just let us know if 
you have any questions/concerns.


was (Author: leftnoteasy):
Thanks [~aw],

bq. A question. Given the ... circumstances... lately of patches going into 
YARN, what's the security status of this branch?

Offline double confirmed with [~sunilg] / [~hsreenath] about your question and 
following answers:

a. Existing security status:
Currently the new UI runs on the same HttpServer2 instance which hosts the REST 
service / old UI, so we should be able to get security support from the existing 
logic. However, before we have done sufficient testing of that security support, 
I would suggest that users not expect security support for the UI for now.

b. Any possible vulnerabilities?

1)
This feature can be completely disabled; the newly added code is all packaged 
inside a WAR file. When this feature is disabled, we do not even place the WAR 
file on the class path where the Jersey server would extract it.

2)
As you know, our new UI is not a conventional web application; it's an SPA 
(Single Page Application). In conventional apps there is server-side code that 
has to consider security.
Our app just uses REST APIs to get data from the server. In other words, every 
hack that a user could possibly do through the UI, he could also do using other 
tools like Postman. The user can also inject code from the console and tweak the 
UI functionality.

What this basically implies is that it's not worth worrying about security on the 
UI side :)
Instead we just need to ensure that the REST endpoints are secure.

bq. Has anyone done an audit? (Web security is outside my area of expertise, so 
I'd prefer another set of eyes on this one.)

Many folks have looked at the newly added code and we believe it is safe. You or 
any other folks are more than welcome to do this check as well; just let us know 
if you have any questions/concerns.

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.10-NOT_READY.patch, 
> YARN-4734.11-NOT_READY.patch, YARN-4734.12-NOT_READY.patch, 
> YARN-4734.13.patch, YARN-4734.14.patch, YARN-4734.15.patch, 
> YARN-4734.2.patch, YARN-4734.3.patch, YARN-4734.4.patch, YARN-4734.5.patch, 
> YARN-4734.6.patch, YARN-4734.7.patch, YARN-4734.8.patch, 
> YARN-4734.9-NOT_READY.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-10-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603108#comment-15603108
 ] 

Wangda Tan commented on YARN-4734:
--

Thanks [~aw],

bq. A question. Given the ... circumstances... lately of patches going into 
YARN, what's the security status of this branch?

I double-checked offline with [~sunilg] / [~hsreenath] about your question; here 
are the answers:

a. Existing security status:
Currently the new UI runs on the same HttpServer2 instance which hosts the REST 
service / old UI, so we should be able to get security support from the existing 
logic. However, before we have done sufficient testing of that security support, 
I would suggest that users not expect security support for the UI for now.

b. Any possible vulnerabilities?

1)
This feature can be completely disabled; the newly added code is all packaged 
inside a WAR file. When this feature is disabled, we do not even place the WAR 
file on the class path where the Jersey server would extract it.

2)
As you know, our new UI is not a conventional web application; it's an SPA 
(Single Page Application). In conventional apps there is server-side code that 
has to consider security.
Our app just uses REST APIs to get data from the server. In other words, every 
hack that a user could possibly do through the UI, he could also do using other 
tools like Postman. The user can also inject code from the console and tweak the 
UI functionality.

What this basically implies is that it's not worth worrying about security on the 
UI side :)
Instead we just need to ensure that the REST endpoints are secure.

bq. Has anyone done an audit? (Web security is outside my area of expertise, so 
I'd prefer another set of eyes on this one.)

Many folks have looked at the newly added code and we believe it is safe. You or 
any other folks are more than welcome to do this check as well; just let us know 
if you have any questions/concerns.

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.10-NOT_READY.patch, 
> YARN-4734.11-NOT_READY.patch, YARN-4734.12-NOT_READY.patch, 
> YARN-4734.13.patch, YARN-4734.14.patch, YARN-4734.15.patch, 
> YARN-4734.2.patch, YARN-4734.3.patch, YARN-4734.4.patch, YARN-4734.5.patch, 
> YARN-4734.6.patch, YARN-4734.7.patch, YARN-4734.8.patch, 
> YARN-4734.9-NOT_READY.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()

2016-10-24 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602941#comment-15602941
 ] 

Bibin A Chundatt commented on YARN-5773:


[~sunilg]
bq. Till then, only one app will be activated and all the rest will be in pending 
state.
- So for the remaining N-1 applications the AM check happens about (N-1)(N-2)/2 
times, right? And we can be sure it will not be satisfied, since NM registration 
happens later. Correct me if I am wrong. So for all those apps it is not required 
to check the AM limit, right?
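
For concreteness (my arithmetic, following the formula in the description): each 
attempt-submit event rescans every app still pending at that point, so the total 
number of AM limit checks is about

{noformat}
1 + 2 + ... + N = N(N+1)/2, i.e. ~5 * 10^7 checks for N = 10,000
{noformat}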

> RM recovery too slow due to LeafQueue#activateApplication()
> ---
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
>
> # Submit 10K applications to the default queue.
> # All applications are in the accepted state.
> # Now restart the resourcemanager.
> For each application recovered, {{LeafQueue#activateApplications()}} is 
> invoked, resulting in the AM limit check being done even before node managers 
> have registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for {{10K}} 
> applications that is roughly {{5*10^7}} iterations, causing the time taken for 
> the RM to become active to exceed 10 min.
> Since NM resources are not yet added during recovery, we should skip 
> {{activateApplications()}} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()

2016-10-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602907#comment-15602907
 ] 

Sunil G commented on YARN-5773:
---

bq. 1. If cluster resource is zero don't check AM limit. 2. Skip all apps if 
queue's AM limit is reached.
I am not so sure about this. {{recover}} happens first for all apps and a 
{{Recover}} event will be fired for each app. {{serviceStart}} happens later, so 
NMs will connect to the RM later. Until then, only one app will be activated and 
all the rest will be in pending state. As NMs come up and register, the remaining 
apps will be activated from {{pendingOrderingPolicy}}.

> RM recovery too slow due to LeafQueue#activateApplication()
> ---
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
>
> # Submit 10K applications to the default queue.
> # All applications are in the accepted state.
> # Now restart the resourcemanager.
> For each application recovered, {{LeafQueue#activateApplications()}} is 
> invoked, resulting in the AM limit check being done even before node managers 
> have registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for {{10K}} 
> applications that is roughly {{5*10^7}} iterations, causing the time taken for 
> the RM to become active to exceed 10 min.
> Since NM resources are not yet added during recovery, we should skip 
> {{activateApplications()}} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()

2016-10-24 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602871#comment-15602871
 ] 

Bibin A Chundatt commented on YARN-5773:


Thank you [~leftnoteasy] for the review comment.

{quote}
I'm not sure if this is safe: activateApplications is mainly to avoid too many 
applications running inside one queue. If we skip the AM limit check for 
recovering apps, it looks like some problems may occur.
{quote}
Yes, we should not skip {{activateApplications()}}.

The RM restart issue with too many pending apps was the main motivation of this 
jira. If there are too many pending apps in a leaf queue and the RM is restarted, 
{{LeafQueue#activateApplications()}} gets invoked for each app attempt submit 
event, and for each pending app the AM limit is checked. Restart time therefore 
grows with the number of apps, consuming too much time on restart.

Will handle the following two cases (sketched below):
# If cluster resource is zero, don't check the AM limit.
# Skip all apps if the queue's AM limit is reached.
Will upload a patch soon.
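
A minimal, self-contained sketch of the two early exits, using simplified types 
(all names below are illustrative assumptions, not the actual patch):

{noformat}
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

class LeafQueueSketch {
  static class App { long amMemoryMb; }

  private final Set<App> pendingApps = new LinkedHashSet<>();
  private long clusterMemoryMb; // stays 0 until the first NM registers
  private long amUsedMemoryMb;  // AM memory of already-activated apps

  void activateApplications(float maxAmPercent) {
    // Case 1: during recovery no NM has registered yet, so the cluster
    // resource is zero and no AM limit can be satisfied -- skip the scan.
    if (clusterMemoryMb == 0) {
      return;
    }
    long amLimitMb = (long) (clusterMemoryMb * maxAmPercent);
    for (Iterator<App> it = pendingApps.iterator(); it.hasNext();) {
      App app = it.next();
      if (amUsedMemoryMb + app.amMemoryMb > amLimitMb) {
        // Case 2: the queue's AM limit is reached -- per the proposal, stop
        // scanning the remaining pending apps instead of checking each one.
        break;
      }
      amUsedMemoryMb += app.amMemoryMb; // activate: move pending -> active
      it.remove();
    }
  }
}
{noformat}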

> RM recovery too slow due to LeafQueue#activateApplication()
> ---
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
>
> # Submit 10K applications to the default queue.
> # All applications are in the accepted state.
> # Now restart the resourcemanager.
> For each application recovered, {{LeafQueue#activateApplications()}} is 
> invoked, resulting in the AM limit check being done even before node managers 
> have registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for {{10K}} 
> applications that is roughly {{5*10^7}} iterations, causing the time taken for 
> the RM to become active to exceed 10 min.
> Since NM resources are not yet added during recovery, we should skip 
> {{activateApplications()}} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-10-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602841#comment-15602841
 ] 

Hadoop QA commented on YARN-4597:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 19 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 55s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 21s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
46s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 9s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 17s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 7m 17s {color} 
| {color:red} root generated 2 new + 701 unchanged - 2 fixed = 703 total (was 
703) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 40s 
{color} | {color:red} root: The patch generated 7 new + 1018 unchanged - 15 
fixed = 1025 total (was 1033) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager
 generated 0 new + 236 unchanged - 1 fixed = 236 total (was 237) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s 
{color} | {color:green} hadoop-yarn-server-tests in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} hadoop-mapreduce-client-jobclient in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 17s 
{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| 

[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()

2016-10-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602831#comment-15602831
 ] 

Hadoop QA commented on YARN-5773:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
57s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 209 unchanged - 0 fixed = 210 total (was 209) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 18s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager
 generated 3 new + 938 unchanged - 0 fixed = 941 total (was 938) {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 34m 51s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 49m 40s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12834978/YARN-5773.0002.patch |
| JIRA Issue | YARN-5773 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux e6af7d98acab 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / b18f35f |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/13490/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| javadoc | 

[jira] [Resolved] (YARN-5750) YARN-4126 broke Oozie on unsecure cluster

2016-10-24 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter resolved YARN-5750.
-
Resolution: Duplicate

YARN-4126 has been reverted from branch-2 and 2.8.  It's now only in 3.x, where 
it's okay to break this.

> YARN-4126 broke Oozie on unsecure cluster
> -
>
> Key: YARN-5750
> URL: https://issues.apache.org/jira/browse/YARN-5750
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Peter Cseh
>
> Oozie is using a DummyRenewer on unsecure clusters and can't submit workflows 
> on an unsecure cluster after YARN-4126.
> {noformat}
> org.apache.oozie.action.ActionExecutorException: JA009: 
> org.apache.hadoop.yarn.exceptions.YarnException: java.io.IOException: 
> Delegation Token can be issued only with kerberos authentication
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getDelegationToken(ClientRMService.java:1092)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getDelegationToken(ApplicationClientProtocolPBServiceImpl.java:335)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:515)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:663)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2423)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2419)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2419)
> Caused by: java.io.IOException: Delegation Token can be issued only with 
> kerberos authentication
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getDelegationToken(ClientRMService.java:1065)
>   ... 10 more
>   at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:457)
>   at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:437)
>   at 
> org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1128)
>   at 
> org.apache.oozie.action.hadoop.TestJavaActionExecutor.submitAction(TestJavaActionExecutor.java:343)
>   at 
> org.apache.oozie.action.hadoop.TestJavaActionExecutor.submitAction(TestJavaActionExecutor.java:363)
>   at 
> org.apache.oozie.action.hadoop.TestJavaActionExecutor.testKill(TestJavaActionExecutor.java:602)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at junit.framework.TestCase.runTest(TestCase.java:168)
>   at junit.framework.TestCase.runBare(TestCase.java:134)
>   at junit.framework.TestResult$1.protect(TestResult.java:110)
>   at junit.framework.TestResult.runProtected(TestResult.java:128)
>   at junit.framework.TestResult.run(TestResult.java:113)
>   at junit.framework.TestCase.run(TestCase.java:124)
>   at junit.framework.TestSuite.runTest(TestSuite.java:232)
>   at junit.framework.TestSuite.run(TestSuite.java:227)
>   at 
> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:24)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: 
> org.apache.hadoop.yarn.exceptions.YarnException: java.io.IOException: 
> Delegation Token can be issued only with kerberos authentication
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getDelegationToken(ClientRMService.java:1092)
>   at 
> 

[jira] [Updated] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2016-10-24 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-4126:

Hadoop Flags: Incompatible change,Reviewed  (was: Reviewed)
Release Note: YARN now only issues and allows delegation tokens in secure 
clusters.  Clients should no longer request delegation tokens in a non-secure 
cluster, or they'll receive an IOException.

I've also marked this as incompatible and put something in the Release Note 
field.  I'll also close YARN-5750.
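
For client code, a guard along these lines avoids the IOException (a minimal 
sketch of what the release note implies, not Oozie's actual fix):

{noformat}
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.Token;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public final class RmTokenGuard {
  // Fetch an RM delegation token only when Kerberos security is enabled;
  // on a non-secure 3.x cluster the RM now rejects the request.
  public static Token maybeGetRMDelegationToken(YarnClient client, Text renewer)
      throws YarnException, IOException {
    if (!UserGroupInformation.isSecurityEnabled()) {
      return null; // non-secure cluster: no delegation token needed
    }
    return client.getRMDelegationToken(renewer);
  }
}
{noformat}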

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Fix For: 3.0.0-alpha1
>
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch, 0004-YARN-4126.patch, 0005-YARN-4126.patch, 
> 0006-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken is currently  returning a delegation token 
> in insecure mode. We should not return the token if it's in insecure mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode

2016-10-24 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602796#comment-15602796
 ] 

Robert Kanter commented on YARN-4126:
-

Thanks!

> RM should not issue delegation tokens in unsecure mode
> --
>
> Key: YARN-4126
> URL: https://issues.apache.org/jira/browse/YARN-4126
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Bibin A Chundatt
> Fix For: 3.0.0-alpha1
>
> Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, 
> 0003-YARN-4126.patch, 0004-YARN-4126.patch, 0005-YARN-4126.patch, 
> 0006-YARN-4126.patch
>
>
> ClientRMService#getDelegationToken is currently  returning a delegation token 
> in insecure mode. We should not return the token if it's in insecure mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5369) Improve Yarn logs command to get container logs based on Node Id

2016-10-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602794#comment-15602794
 ] 

Hadoop QA commented on YARN-5369:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 53s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
0s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 24s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 25s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 1 
new + 100 unchanged - 1 fixed = 101 total (was 101) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 30s 
{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m 31s {color} 
| {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 29s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.client.cli.TestLogsCLI |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12834980/YARN-5369.5.patch |
| JIRA Issue | YARN-5369 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 51f67dd2227c 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 
21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / b18f35f |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/13489/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/13489/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 |
| unit test logs |  

[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()

2016-10-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602776#comment-15602776
 ] 

Wangda Tan commented on YARN-5773:
--

Thanks [~bibinchundatt] for reporting and working on this issue.

I'm not sure if this is safe: activateApplications is mainly to avoid too many 
applications running inside one queue. If we skip the AM limit check for 
recovering apps, it looks like some problems may occur. For example, if a cluster 
with 4K nodes restarts with only 2K nodes left, should we activate only some of 
the originally submitted apps?

In my mind we need to optimize the activateApplications method: right now it 
scans through all pending apps inside the queue under all conditions. We should 
be able to optimize this, for example by skipping all apps once the queue's AM 
limit is reached.

> RM recovery too slow due to LeafQueue#activateApplication()
> ---
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
>
> # Submit 10K applications to the default queue.
> # All applications are in the accepted state.
> # Now restart the resourcemanager.
> For each application recovered, {{LeafQueue#activateApplications()}} is 
> invoked, resulting in the AM limit check being done even before node managers 
> have registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for {{10K}} 
> applications that is roughly {{5*10^7}} iterations, causing the time taken for 
> the RM to become active to exceed 10 min.
> Since NM resources are not yet added during recovery, we should skip 
> {{activateApplications()}} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5716) Add global scheduler interface definition and update CapacityScheduler to use it.

2016-10-24 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5716:
-
Attachment: YARN-5716.008.patch

[~sunilg], makes sense.

I just uploaded the ver.8 patch, which removes all set interfaces from the newly 
added APIs. I also added a comment to the ResourceCommitRequest constructor to 
explain the to-release resource behavior.
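
Illustratively, removing the set interfaces means the new API objects expose only 
getters; a made-up example (names are not from the patch):

{noformat}
// Hypothetical read-only view: callers can inspect the request but cannot
// mutate it; any mutation stays internal to the scheduler.
interface ResourceCommitView {
  String getQueueName();
  long getToReleaseMemoryMb();
  // intentionally no setQueueName() / setToReleaseMemoryMb()
}
{noformat}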

> Add global scheduler interface definition and update CapacityScheduler to use 
> it.
> -
>
> Key: YARN-5716
> URL: https://issues.apache.org/jira/browse/YARN-5716
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-5716.001.patch, YARN-5716.002.patch, 
> YARN-5716.003.patch, YARN-5716.004.patch, YARN-5716.005.patch, 
> YARN-5716.006.patch, YARN-5716.007.patch, YARN-5716.008.patch
>
>
> Target of this JIRA:
> - Definition of interfaces / objects which will be used by global scheduling, 
> this will be shared by different schedulers.
> - Modify CapacityScheduler to use it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5369) Improve Yarn logs command to get container logs based on Node Id

2016-10-24 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-5369:

Attachment: YARN-5369.5.patch

> Improve Yarn logs command to get container logs based on Node Id
> 
>
> Key: YARN-5369
> URL: https://issues.apache.org/jira/browse/YARN-5369
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-5369.1.patch, YARN-5369.2.patch, YARN-5369.3.patch, 
> YARN-5369.4.patch, YARN-5369.5.patch
>
>
> It would be helpful if we could have yarn logs --applicationId appId 
> --nodeAddress ${nodeId} to get the logs of all containers which ran on the 
> specific NM.
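
An invocation would look something like this (illustrative IDs, not output from 
the patch):

{noformat}
yarn logs --applicationId application_1477371285256_0001 --nodeAddress nm-host-1:45454
{noformat}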



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()

2016-10-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602665#comment-15602665
 ] 

Hadoop QA commented on YARN-5773:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 210 unchanged - 0 fixed = 211 total (was 210) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 18s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager
 generated 3 new + 938 unchanged - 0 fixed = 941 total (was 938) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 35m 22s 
{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m 39s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12834967/YARN-5773.0001.patch |
| JIRA Issue | YARN-5773 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 03b6c1d73bc8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / b18f35f |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/13488/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/13488/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/13488/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Updated] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()

2016-10-24 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-5773:
---
Attachment: YARN-5773.0002.patch

> RM recovery too slow due to LeafQueue#activateApplication()
> ---
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
>
> # Submit 10K applications to the default queue.
> # All applications are in the accepted state.
> # Now restart the resourcemanager.
> For each application recovered, {{LeafQueue#activateApplications()}} is 
> invoked, resulting in the AM limit check being done even before node managers 
> have registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for {{10K}} 
> applications that is roughly {{5*10^7}} iterations, causing the time taken for 
> the RM to become active to exceed 10 min.
> Since NM resources are not yet added during recovery, we should skip 
> {{activateApplications()}} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5772) Replace old Hadoop logo with new one

2016-10-24 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602622#comment-15602622
 ] 

Akira Ajisaka commented on YARN-5772:
-

LGTM, +1.

> Replace old Hadoop logo with new one
> 
>
> Key: YARN-5772
> URL: https://issues.apache.org/jira/browse/YARN-5772
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-ui-v2
>Affects Versions: YARN-3368
>Reporter: Akira Ajisaka
>Assignee: Akhil PB
> Attachments: YARN-5772-YARN-3368.0001.patch, ui2-with-newlogo.png
>
>
> YARN-5161 added Apache Hadoop logo in the UI but the logo is old.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5587) Add support for resource profiles

2016-10-24 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602519#comment-15602519
 ] 

Varun Vasudev commented on YARN-5587:
-

[~leftnoteasy], [~asuresh] - can you take a look at the latest patch? It adds 
the core resource profile functionality. Thanks!

> Add support for resource profiles
> -
>
> Key: YARN-5587
> URL: https://issues.apache.org/jira/browse/YARN-5587
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-5587-YARN-3926.001.patch, 
> YARN-5587-YARN-3926.002.patch, YARN-5587-YARN-3926.003.patch, 
> YARN-5587-YARN-3926.004.patch, YARN-5587-YARN-3926.005.patch
>
>
> Add support for resource profiles on the RM side to allow users to use 
> shorthands to specify resource requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5770) Performance improvement of native-services REST API service

2016-10-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602512#comment-15602512
 ] 

Hadoop QA commented on YARN-5770:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 52s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
15s {color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} yarn-native-services passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 10s 
{color} | {color:red} hadoop-yarn-services-api in yarn-native-services failed. 
{color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} yarn-native-services passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 58s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-slider/hadoop-yarn-slider-core
 in yarn-native-services has 314 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 29s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api
 in yarn-native-services has 5 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 26s 
{color} | {color:red} hadoop-yarn-slider-core in yarn-native-services failed. 
{color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 7s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 26s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications: 
The patch generated 3 new + 460 unchanged - 6 fixed = 463 total (was 466) 
{color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 6s 
{color} | {color:red} hadoop-yarn-services-api in the patch failed. {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s 
{color} | {color:green} hadoop-yarn-slider-core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
35s {color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api
 generated 0 new + 1 unchanged - 4 fixed = 1 total (was 5) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 1m 1s 
{color} | {color:red} hadoop-yarn-slider-core in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 52s {color} 
| {color:red} hadoop-yarn-slider-core in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 47s 
{color} | {color:green} hadoop-yarn-services-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 45s 
{color} | {color:red} The patch generated 10 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 15s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
slider.core.registry.docstore.TestPublishedConfigurationOutputter |
\\
\\
|| Subsystem || 

[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-10-24 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602497#comment-15602497
 ] 

Allen Wittenauer commented on YARN-4734:


Just got back from a trip. I'll try to take a look at this over the next few 
days.

A question.  Given the ... circumstances... lately of patches going into YARN, 
what's the security status of this branch?  Has anyone done an audit?  (Web 
security is outside my area of expertise, so I'd prefer another set of eyes on 
this one.)

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.10-NOT_READY.patch, 
> YARN-4734.11-NOT_READY.patch, YARN-4734.12-NOT_READY.patch, 
> YARN-4734.13.patch, YARN-4734.14.patch, YARN-4734.15.patch, 
> YARN-4734.2.patch, YARN-4734.3.patch, YARN-4734.4.patch, YARN-4734.5.patch, 
> YARN-4734.6.patch, YARN-4734.7.patch, YARN-4734.8.patch, 
> YARN-4734.9-NOT_READY.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()

2016-10-24 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-5773:
---
Attachment: YARN-5773.0001.patch

Attaching a patch for the same. On recovery, the capacity scheduler knows whether 
an attempt is of type recovery or not; the patch skips 
LeafQueue#activateApplications() when the attempt is being recovered (sketched 
below).
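
A rough sketch of the idea, again with simplified types (the flag name 
isAttemptRecovering is an illustrative assumption, not necessarily the real 
parameter):

{noformat}
class LeafQueueRecoverySketch {
  // Called both for live submissions and for attempts replayed from the
  // RM state store during recovery.
  void submitApplicationAttempt(String applicationId, boolean isAttemptRecovering) {
    // ... bookkeeping: add the attempt to the pending set ...
    if (isAttemptRecovering) {
      // Skip the O(pending) activation scan: nothing can be activated
      // before NMs register, so defer until node resources are added.
      return;
    }
    activateApplications();
  }

  private void activateApplications() {
    // scan pending apps against the AM limit, as in the earlier sketch
  }
}
{noformat}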

> RM recovery too slow due to LeafQueue#activateApplication()
> ---
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-5773.0001.patch
>
>
> # Submit 10K applications to the default queue.
> # All applications are in the accepted state.
> # Now restart the resourcemanager.
> For each application recovered, {{LeafQueue#activateApplications()}} is 
> invoked, resulting in the AM limit check being done even before node managers 
> have registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for {{10K}} 
> applications that is roughly {{5*10^7}} iterations, causing the time taken for 
> the RM to become active to exceed 10 min.
> Since NM resources are not yet added during recovery, we should skip 
> {{activateApplications()}} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5770) Performance improvement of native-services REST API service

2016-10-24 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-5770:

Attachment: YARN-5770-yarn-native-services.phase1.002.patch

> Performance improvement of native-services REST API service
> ---
>
> Key: YARN-5770
> URL: https://issues.apache.org/jira/browse/YARN-5770
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Gour Saha
> Fix For: yarn-native-services
>
> Attachments: YARN-5770-yarn-native-services.phase1.001.patch, 
> YARN-5770-yarn-native-services.phase1.002.patch
>
>
> Make enhancements and bug fixes to eliminate frequent full GC of the REST API 
> Service. Dependent on a few Slider fixes like SLIDER-1168 as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5770) Performance improvement of native-services REST API service

2016-10-24 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602408#comment-15602408
 ] 

Gour Saha edited comment on YARN-5770 at 10/24/16 3:59 PM:
---

Uploading 002 patch with 2 findbugs fixes.


was (Author: gsaha):
Uploading 002 patch with 2 findbug fixes.

> Performance improvement of native-services REST API service
> ---
>
> Key: YARN-5770
> URL: https://issues.apache.org/jira/browse/YARN-5770
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Gour Saha
> Fix For: yarn-native-services
>
> Attachments: YARN-5770-yarn-native-services.phase1.001.patch, 
> YARN-5770-yarn-native-services.phase1.002.patch
>
>
> Make enhancements and bug fixes to eliminate frequent full GC of the REST API 
> Service. Dependent on a few Slider fixes like SLIDER-1168 as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5770) Performance improvement of native-services REST API service

2016-10-24 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602408#comment-15602408
 ] 

Gour Saha commented on YARN-5770:
-

Uploading 002 patch with 2 findbugs fixes.

> Performance improvement of native-services REST API service
> ---
>
> Key: YARN-5770
> URL: https://issues.apache.org/jira/browse/YARN-5770
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Gour Saha
> Fix For: yarn-native-services
>
> Attachments: YARN-5770-yarn-native-services.phase1.001.patch, 
> YARN-5770-yarn-native-services.phase1.002.patch
>
>
> Make enhancements and bug fixes to eliminate frequent full GCs of the REST API 
> service. This is dependent on a few Slider fixes such as SLIDER-1168 as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5716) Add global scheduler interface definition and update CapacityScheduler to use it.

2016-10-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602390#comment-15602390
 ] 

Sunil G commented on YARN-5716:
---

Thanks [~leftnoteasy].

bq.do you want to add comments to indicate it should be a read-only class or 
you want to remove writing APIs from these classes?
I was expecting to remove the setter APIs from this interface. Thoughts?
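For instance, something along these lines (a hypothetical sketch of the split, 
not the patch code; {{NodeUsageView}} is a made-up name):
{code}
// Hypothetical sketch: consumers see only getters; mutation stays on a
// package-private implementation type.
// Assumed import: org.apache.hadoop.yarn.api.records.Resource
public interface NodeUsageView {                 // read-only, made-up name
  Resource getAvailableResource();
  Resource getUsedResource();
}

class MutableNodeUsage implements NodeUsageView {
  private Resource available;
  private Resource used;

  @Override public Resource getAvailableResource() { return available; }
  @Override public Resource getUsedResource() { return used; }

  // Setters exist only here, invisible through the read-only interface.
  void setAvailableResource(Resource r) { this.available = r; }
  void setUsedResource(Resource r) { this.used = r; }
}
{code}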

bq.continuous-reservation-looking
I think the code is slightly complicated, but functionality seems fine. I am 
checking lazy-preemption now.

> Add global scheduler interface definition and update CapacityScheduler to use 
> it.
> -
>
> Key: YARN-5716
> URL: https://issues.apache.org/jira/browse/YARN-5716
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-5716.001.patch, YARN-5716.002.patch, 
> YARN-5716.003.patch, YARN-5716.004.patch, YARN-5716.005.patch, 
> YARN-5716.006.patch, YARN-5716.007.patch
>
>
> Target of this JIRA:
> - Definition of interfaces / objects which will be used by global scheduling, 
> this will be shared by different schedulers.
> - Modify CapacityScheduler to use it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5690) Integrate native services modules into maven build

2016-10-24 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602350#comment-15602350
 ] 

Billie Rinaldi commented on YARN-5690:
--

This could be due to log4j settings; the slider command uses log4j for output. 
Are you using the default log4j.properties?

> Integrate native services modules into maven build
> --
>
> Key: YARN-5690
> URL: https://issues.apache.org/jira/browse/YARN-5690
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Attachments: YARN-5690-yarn-native-services.001.patch, 
> YARN-5690-yarn-native-services.002.patch, 
> YARN-5690-yarn-native-services.003.patch
>
>
> The yarn dist assembly should include jars for the new modules as well as 
> their new dependencies. We may want to create new lib directories in the 
> tarball for the dependencies of the slider-core and services API modules, to 
> avoid adding these dependencies into the general YARN classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()

2016-10-24 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-5773:
---
Summary: RM recovery too slow due to LeafQueue#activateApplication()  (was: 
Skip LeafQueue#activateApplication for running application on recovery)

> RM recovery too slow due to LeafQueue#activateApplication()
> ---
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> # Submit 10K applications to the default queue.
> # All applications are in the ACCEPTED state.
> # Now restart the ResourceManager.
> For each application recovered, {{LeafQueue#activateApplications()}} is 
> invoked, resulting in the AM limit check being done even before node managers 
> have registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for 
> {{10K}} applications that is roughly 5*10^7 iterations, causing the RM to 
> take more than 10 minutes to become active.
> Since NM resources are not yet added during recovery, we should skip 
> {{activateApplications()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-10-24 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4597:
--
Attachment: YARN-4597.005.patch

Updated patch v005 (also updated the Pull Request).

* Added a test case to verify that the situation noted by [~jianhe] is correctly 
handled and does not happen (see the selection sketch below):
bq. The logic to select opportunisitic container: we may kill more 
opportunistic containers than required. e.g...
* Added some more javadocs and fixed some checkstyle issues.
* Rebased with trunk.
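The selection concern, in sketch form (a hypothetical helper, not code from the 
patch): stop as soon as the resources to be freed cover what is required, so no 
extra opportunistic containers are killed.
{code}
// Hypothetical illustration of the selection invariant, not patch code.
// Assumed imports: java.util.*, org.apache.hadoop.yarn.api.records.*,
// org.apache.hadoop.yarn.util.resource.Resources
List<Container> selectToKill(List<Container> opportunistic, Resource required) {
  List<Container> selected = new ArrayList<>();
  Resource freed = Resource.newInstance(0, 0);
  for (Container c : opportunistic) {
    if (Resources.fitsIn(required, freed)) {
      break;  // enough reclaimed already; never kill more than required
    }
    selected.add(c);
    Resources.addTo(freed, c.getResource());
  }
  return selected;
}
{code}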

> Add SCHEDULE to NM container lifecycle
> --
>
> Key: YARN-4597
> URL: https://issues.apache.org/jira/browse/YARN-4597
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Chris Douglas
>Assignee: Arun Suresh
> Attachments: YARN-4597.001.patch, YARN-4597.002.patch, 
> YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch
>
>
> Currently, the NM immediately launches containers after resource 
> localization. Several features could be more cleanly implemented if the NM 
> included a separate stage for reserving resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5773) Skip LeafQueue#activateApplication for running application on recovery

2016-10-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601974#comment-15601974
 ] 

Varun Saxena edited comment on YARN-5773 at 10/24/16 1:30 PM:
--

Thanks [~bibinchundatt] for filing the JIRA.
I agree that we do not need to iterate over all the pending apps on recovery, as 
NMs are not yet registered.
If there are a large number of running apps, the RM unnecessarily spends quite a 
bit of time in this loop.

Applications can be activated as and when NMs register again.




was (Author: varun_saxena):
Thanks [~bibinchundatt] for filing the JIRA.
Agree that we do not need to iterate over all the pending apps on recovery as 
NMs' are not yet registered.
If there are large number of running apps, RM unnecessarily spends quite a bit 
of time in this loop.

Applications can be activated as and when nodes are added.



> Skip LeafQueue#activateApplication for running application on recovery
> --
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> # Submit 10K applications to the default queue.
> # All applications are in the ACCEPTED state.
> # Now restart the ResourceManager.
> For each application recovered, {{LeafQueue#activateApplications()}} is 
> invoked, resulting in the AM limit check being done even before node managers 
> have registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for 
> {{10K}} applications that is roughly 5*10^7 iterations, causing the RM to 
> take more than 10 minutes to become active.
> Since NM resources are not yet added during recovery, we should skip 
> {{activateApplications()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5773) Skip LeafQueue#activateApplication for running application on recovery

2016-10-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601974#comment-15601974
 ] 

Varun Saxena edited comment on YARN-5773 at 10/24/16 1:30 PM:
--

Thanks [~bibinchundatt] for filing the JIRA.
I agree that we do not need to iterate over all the pending apps on recovery, as 
NMs are not yet registered.
If there are a large number of running apps, the RM unnecessarily spends quite a 
bit of time in this loop.

Applications can be activated as and when NMs register again.




was (Author: varun_saxena):
Thanks [~bibinchundatt] for filing the JIRA.
Agree that we do not need to iterate over all the pending apps on recovery as 
NMs' are not yet registered.
If there are large number of running apps, RM unnecessarily spends quite a bit 
of time in this loop.

Applications can be activated as and when NMs' register again.



> Skip LeafQueue#activateApplication for running application on recovery
> --
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> # Submit 10K applications to the default queue.
> # All applications are in the ACCEPTED state.
> # Now restart the ResourceManager.
> For each application recovered, {{LeafQueue#activateApplications()}} is 
> invoked, resulting in the AM limit check being done even before node managers 
> have registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for 
> {{10K}} applications that is roughly 5*10^7 iterations, causing the RM to 
> take more than 10 minutes to become active.
> Since NM resources are not yet added during recovery, we should skip 
> {{activateApplications()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5773) Skip LeafQueue#activateApplication for running application on recovery

2016-10-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601974#comment-15601974
 ] 

Varun Saxena commented on YARN-5773:


Thanks [~bibinchundatt] for filing the JIRA.
I agree that we do not need to iterate over all the pending apps on recovery, as 
NMs are not yet registered.
If there are a large number of running apps, the RM unnecessarily spends quite a 
bit of time in this loop.

Applications can be activated as and when nodes are added.
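Roughly speaking (an illustrative sketch, not actual CapacityScheduler code):
{code}
// Illustrative sketch, not actual scheduler code: once a node registers,
// the cluster resource becomes non-zero and pending applications can be
// activated against a meaningful AM limit.
private void addNode(SchedulerNode node) {
  Resources.addTo(clusterResource, node.getTotalResource());
  for (LeafQueue queue : leafQueues) {
    queue.activateApplications();  // re-driven here instead of on recovery
  }
}
{code}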



> Skip LeafQueue#activateApplication for running application on recovery
> --
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> # Submit 10K applications to the default queue.
> # All applications are in the ACCEPTED state.
> # Now restart the ResourceManager.
> For each application recovered, {{LeafQueue#activateApplications()}} is 
> invoked, resulting in the AM limit check being done even before node managers 
> have registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for 
> {{10K}} applications that is roughly 5*10^7 iterations, causing the RM to 
> take more than 10 minutes to become active.
> Since NM resources are not yet added during recovery, we should skip 
> {{activateApplications()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5773) Skip LeafQueue#activateApplication for running application on recovery

2016-10-24 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601944#comment-15601944
 ] 

Bibin A Chundatt commented on YARN-5773:


*Solution*
The following change, which skips {{activateApplications()}} on recovery, solved 
the problem.
{noformat}
  private synchronized void activateApplications() {
    if (!Resources.greaterThan(resourceCalculator, lastClusterResource,
        lastClusterResource, Resources.none())) {
      return;
    }
    ...
{noformat}

Thoughts?
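To spell out why this guard short-circuits recovery, here is the same check 
annotated (a commented restatement, assuming {{lastClusterResource}} stays at 
zero until NMs re-register):
{code}
// Commented restatement of the guard above (not new patch code).
// Assumed context: LeafQueue fields resourceCalculator and
// lastClusterResource; during recovery no NM has registered yet, so
// lastClusterResource is still <memory:0, vCores:0>.
private synchronized void activateApplications() {
  // greaterThan(calc, clusterResource, lhs, rhs) asks: is lhs > rhs?
  // With lhs == lastClusterResource == zero and rhs == Resources.none(),
  // this is false during recovery, so we return immediately and avoid
  // scanning the whole pendingApplications list for every recovered app.
  if (!Resources.greaterThan(resourceCalculator, lastClusterResource,
      lastClusterResource, Resources.none())) {
    return;
  }
  // ... normal path: check each pending app against the AM resource limit
}
{code}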

> Skip LeafQueue#activateApplication for running application on recovery
> --
>
> Key: YARN-5773
> URL: https://issues.apache.org/jira/browse/YARN-5773
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> # Submit 10K applications to the default queue.
> # All applications are in the ACCEPTED state.
> # Now restart the ResourceManager.
> For each application recovered, {{LeafQueue#activateApplications()}} is 
> invoked, resulting in the AM limit check being done even before node managers 
> have registered.
> The total iteration count for N applications is about {{N(N+1)/2}}; for 
> {{10K}} applications that is roughly 5*10^7 iterations, causing the RM to 
> take more than 10 minutes to become active.
> Since NM resources are not yet added during recovery, we should skip 
> {{activateApplications()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5773) Skip LeafQueue#activateApplication for running application on recovery

2016-10-24 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-5773:
--

 Summary: Skip LeafQueue#activateApplication for running 
application on recovery
 Key: YARN-5773
 URL: https://issues.apache.org/jira/browse/YARN-5773
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical


# Submit 10K applications to the default queue.
# All applications are in the ACCEPTED state.
# Now restart the ResourceManager.

For each application recovered, {{LeafQueue#activateApplications()}} is 
invoked, resulting in the AM limit check being done even before node managers 
have registered.

The total iteration count for N applications is about {{N(N+1)/2}}; for {{10K}} 
applications that is roughly 5*10^7 iterations, causing the RM to take more 
than 10 minutes to become active.

Since NM resources are not yet added during recovery, we should skip 
{{activateApplications()}}.











--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys

2016-10-24 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-5547:
--
Attachment: YARN-5547.02.patch

Please review

> NMLeveldbStateStore should be more tolerant of unknown keys
> ---
>
> Key: YARN-5547
> URL: https://issues.apache.org/jira/browse/YARN-5547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-5547.01.patch, YARN-5547.02.patch
>
>
> Whenever new keys are added to the NM state store, it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys, it would be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g. we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store, to prevent leaking unrecognized keys.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5771) Provide option to send env to be whitelisted in ContainerLaunchContext

2016-10-24 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601851#comment-15601851
 ] 

Bibin A Chundatt edited comment on YARN-5771 at 10/24/16 12:32 PM:
---

In addition to the above implementation, the following changes also need to be 
made:
# A configuration on the nodemanager side to enable this feature.
# A configuration for ENV properties of the NM which should never be 
whitelisted, even if sent as part of the ContainerLaunchContext.

Thoughts?
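For illustration, the NM-side handling could look roughly like this (both 
property names are made up for this sketch; they are not existing YARN keys):
{code}
// Hypothetical sketch -- neither configuration key below exists in YARN.
// Assumed imports: java.util.*, org.apache.hadoop.conf.Configuration
static final String ENABLE_KEY =
    "yarn.nodemanager.container-whitelist-env.enabled";    // made up
static final String FORBIDDEN_KEY =
    "yarn.nodemanager.container-whitelist-env.forbidden";  // made up

Set<String> effectiveWhitelist(Configuration conf, Set<String> fromClc) {
  if (!conf.getBoolean(ENABLE_KEY, false)) {
    return Collections.emptySet();  // feature disabled: ignore CLC additions
  }
  Set<String> result = new HashSet<>(fromClc);
  // The NM-side deny list always wins, even if sent in the CLC.
  result.removeAll(conf.getTrimmedStringCollection(FORBIDDEN_KEY));
  return result;
}
{code}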



was (Author: bibinchundatt):
Additional above implementation the following changes also need to be done.
# Configuration in nodemanager side to enable this feature.
# Add configuration for ENV properties of NM which should never get whitelisted 
even if send as part of ContainerLaunchContext.

Thoughts?


> Provide option to send env to be whitelisted in ContainerLaunchContext 
> ---
>
> Key: YARN-5771
> URL: https://issues.apache.org/jira/browse/YARN-5771
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: container-whitelist-env-wip.patch
>
>
> As per the current implementation, the ENV variables to be whitelisted for 
> container launch are configured via {{yarn.nodemanager.env-whitelist}}.
> We cannot specify additional container-specific ENV properties to be 
> whitelisted. As part of this JIRA we provide an option to supply an 
> additional whitelist ENV.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5771) Provide option to send env to be whitelisted in ContainerLaunchContext

2016-10-24 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601851#comment-15601851
 ] 

Bibin A Chundatt commented on YARN-5771:


Additional above implementation the following changes also need to be done.
# Configuration in nodemanager side to enable this feature.
# Add configuration for ENV properties of NM which should never get whitelisted 
even if send as part of ContainerLaunchContext.

Thoughts?


> Provide option to send env to be whitelisted in ContainerLaunchContext 
> ---
>
> Key: YARN-5771
> URL: https://issues.apache.org/jira/browse/YARN-5771
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: container-whitelist-env-wip.patch
>
>
> As per the current implementation, the ENV variables to be whitelisted for 
> container launch are configured via {{yarn.nodemanager.env-whitelist}}.
> We cannot specify additional container-specific ENV properties to be 
> whitelisted. As part of this JIRA we provide an option to supply an 
> additional whitelist ENV.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5772) Replace old Hadoop logo with new one

2016-10-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601694#comment-15601694
 ] 

Hadoop QA commented on YARN-5772:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 3m 43s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 4m 21s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:5a4801a |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12834918/YARN-5772-YARN-3368.0001.patch
 |
| JIRA Issue | YARN-5772 |
| Optional Tests |  asflicense  |
| uname | Linux d47170ae53d6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-3368 / 9690f29 |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/13485/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Replace old Hadoop logo with new one
> 
>
> Key: YARN-5772
> URL: https://issues.apache.org/jira/browse/YARN-5772
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-ui-v2
>Affects Versions: YARN-3368
>Reporter: Akira Ajisaka
>Assignee: Akhil PB
> Attachments: YARN-5772-YARN-3368.0001.patch, ui2-with-newlogo.png
>
>
> YARN-5161 added Apache Hadoop logo in the UI but the logo is old.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5772) Replace old Hadoop logo with new one

2016-10-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601658#comment-15601658
 ] 

Sunil G edited comment on YARN-5772 at 10/24/16 11:09 AM:
--

Thanks [~akhilpb]. It looks fine to me. I have also attached the screenshot.

[~ajisakaa] and [~leftnoteasy]/[~Sreenath], please take a look. If it is fine, I 
will commit the change after Jenkins has run.


was (Author: sunilg):
Thanks [~akhilpb]. It looks fine for me.. Also attached the screen shot.

[~ajisakaa] and [~leftnoteasy]/[~Sreenath]. pls take a look. If its fine, i ll 
commit the change.

> Replace old Hadoop logo with new one
> 
>
> Key: YARN-5772
> URL: https://issues.apache.org/jira/browse/YARN-5772
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-ui-v2
>Affects Versions: YARN-3368
>Reporter: Akira Ajisaka
>Assignee: Akhil PB
> Attachments: YARN-5772-YARN-3368.0001.patch, ui2-with-newlogo.png
>
>
> YARN-5161 added Apache Hadoop logo in the UI but the logo is old.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5772) Replace old Hadoop logo with new one

2016-10-24 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-5772:
--
Attachment: YARN-5772-YARN-3368.0001.patch

> Replace old Hadoop logo with new one
> 
>
> Key: YARN-5772
> URL: https://issues.apache.org/jira/browse/YARN-5772
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-ui-v2
>Affects Versions: YARN-3368
>Reporter: Akira Ajisaka
>Assignee: Akhil PB
> Attachments: YARN-5772-YARN-3368.0001.patch, ui2-with-newlogo.png
>
>
> YARN-5161 added Apache Hadoop logo in the UI but the logo is old.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5772) Replace old Hadoop logo with new one

2016-10-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601658#comment-15601658
 ] 

Sunil G commented on YARN-5772:
---

Thanks [~akhilpb]. It looks fine to me. I have also attached the screenshot.

[~ajisakaa] and [~leftnoteasy]/[~Sreenath], please take a look. If it is fine, I 
will commit the change.

> Replace old Hadoop logo with new one
> 
>
> Key: YARN-5772
> URL: https://issues.apache.org/jira/browse/YARN-5772
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-ui-v2
>Affects Versions: YARN-3368
>Reporter: Akira Ajisaka
>Assignee: Akhil PB
> Attachments: ui2-with-newlogo.png
>
>
> YARN-5161 added Apache Hadoop logo in the UI but the logo is old.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5772) Replace old Hadoop logo with new one

2016-10-24 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-5772:
--
Attachment: ui2-with-newlogo.png

> Replace old Hadoop logo with new one
> 
>
> Key: YARN-5772
> URL: https://issues.apache.org/jira/browse/YARN-5772
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-ui-v2
>Affects Versions: YARN-3368
>Reporter: Akira Ajisaka
>Assignee: Akhil PB
> Attachments: ui2-with-newlogo.png
>
>
> YARN-5161 added Apache Hadoop logo in the UI but the logo is old.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5772) Replace old Hadoop logo with new one

2016-10-24 Thread Akhil PB (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB reassigned YARN-5772:
--

Assignee: Akhil PB

> Replace old Hadoop logo with new one
> 
>
> Key: YARN-5772
> URL: https://issues.apache.org/jira/browse/YARN-5772
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-ui-v2
>Affects Versions: YARN-3368
>Reporter: Akira Ajisaka
>Assignee: Akhil PB
>
> YARN-5161 added Apache Hadoop logo in the UI but the logo is old.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5705) [YARN-3368] Add support for Timeline V2 to new web UI

2016-10-24 Thread Akhil PB (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB updated YARN-5705:
---
Attachment: YARN-5705.010.patch

> [YARN-3368] Add support for Timeline V2 to new web UI
> -
>
> Key: YARN-5705
> URL: https://issues.apache.org/jira/browse/YARN-5705
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil G
>Assignee: Akhil PB
> Attachments: YARN-5705.001.patch, YARN-5705.002.patch, 
> YARN-5705.003.patch, YARN-5705.004.patch, YARN-5705.005.patch, 
> YARN-5705.006.patch, YARN-5705.007.patch, YARN-5705.008.patch, 
> YARN-5705.009.patch, YARN-5705.010.patch
>
>
> Integrate timeline v2 to YARN-3368. This is a clone JIRA for YARN-4097



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5611) Provide an API to update lifetime of an application.

2016-10-24 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601420#comment-15601420
 ] 

Varun Vasudev commented on YARN-5611:
-

Instead of using a long as part of the API and expecting clients to convert time 
to and from the UTC epoch, it is much cleaner to use an ISO-8601 formatted 
string. You can avoid writing the utility functions as well, since there are 
plenty of libraries that handle ISO-8601 dates.
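For example, with the JDK's {{java.time}} alone (a generic illustration, not 
code from any patch here):
{code}
// Generic illustration of the ISO-8601 suggestion, JDK only.
import java.time.Instant;
import java.time.format.DateTimeFormatter;

public class Iso8601Demo {
  public static void main(String[] args) {
    // The client sends an ISO-8601 string instead of a raw epoch long.
    Instant expiry = Instant.parse("2016-10-24T15:59:00Z");
    // The server can still obtain the epoch value where it needs one.
    long epochMillis = expiry.toEpochMilli();
    // Formatting back requires no hand-rolled utility functions.
    System.out.println(
        DateTimeFormatter.ISO_INSTANT.format(expiry) + " = " + epochMillis);
  }
}
{code}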

> Provide an API to update lifetime of an application.
> 
>
> Key: YARN-5611
> URL: https://issues.apache.org/jira/browse/YARN-5611
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-5611.patch, 0002-YARN-5611.patch, 
> 0003-YARN-5611.patch, YARN-5611.v0.patch
>
>
> YARN-4205 monitors the lifetime of an application if required. Add a client 
> API to update the lifetime of an application. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5772) Replace old Hadoop logo with new one

2016-10-24 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-5772:

Description: YARN-5161 added Apache Hadoop logo in the UI but the logo is 
old.

> Replace old Hadoop logo with new one
> 
>
> Key: YARN-5772
> URL: https://issues.apache.org/jira/browse/YARN-5772
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-ui-v2
>Affects Versions: YARN-3368
>Reporter: Akira Ajisaka
>
> YARN-5161 added Apache Hadoop logo in the UI but the logo is old.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5772) Replace old Hadoop logo with new one

2016-10-24 Thread Akira Ajisaka (JIRA)
Akira Ajisaka created YARN-5772:
---

 Summary: Replace old Hadoop logo with new one
 Key: YARN-5772
 URL: https://issues.apache.org/jira/browse/YARN-5772
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: yarn-ui-v2
Affects Versions: YARN-3368
Reporter: Akira Ajisaka






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4743) ResourceManager crash because TimSort

2016-10-24 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601328#comment-15601328
 ] 

Yufei Gu commented on YARN-4743:


Hi [~gzh1992n], thanks for working on this. 
The patch v2 looks generally good to me. Some nits:
1. If you want to use if-else statements, it is better to use {{weight1 == 0}} 
instead of {{weight1 != 0}} for readability. Alternatively, we can avoid the 
if-else statements entirely:
{code}
useToWeightRatio1 = -weight1;
useToWeightRatio2 = -weight2;
{code}
2. Please describe the change in the doc of {{FairShareComparator}}.
3. Please fix all style issues reported in Hadoop QA's comment.
4. Can we put {{TestFairShareComparator}} into {{TestSchedulingPolicy}}, 
and add doc for the function in the unit test?
5. Not sure why {{startTimeColloection}} and {{nameCollection}} are needed. Can 
you explain a little bit?
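For context, the underlying NaN issue is easy to demonstrate in isolation 
(standalone snippet, not FairScheduler code):
{code}
// Standalone demonstration of the NaN problem; not FairScheduler code.
public class NanCompareDemo {
  public static void main(String[] args) {
    double ratio = 0 / 0.0;            // memorySize=0, weight=0 -> NaN
    System.out.println(ratio);         // NaN
    // Every ordered comparison against NaN is false, so a comparator
    // built on "<" and ">" reports NaN as "equal" to everything,
    // which violates transitivity and trips TimSort's contract check.
    System.out.println(ratio < 1.0);   // false
    System.out.println(ratio > 1.0);   // false
  }
}
{code}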

> ResourceManager crash because TimSort
> -
>
> Key: YARN-4743
> URL: https://issues.apache.org/jira/browse/YARN-4743
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.0.0-alpha1
>Reporter: Zephyr Guo
>Assignee: Zephyr Guo
> Attachments: YARN-4743-v1.patch, YARN-4743-v2.patch, timsort.log
>
>
> {code}
> 2016-02-26 14:08:50,821 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>  at java.util.TimSort.mergeHi(TimSort.java:868)
>  at java.util.TimSort.mergeAt(TimSort.java:485)
>  at java.util.TimSort.mergeCollapse(TimSort.java:410)
>  at java.util.TimSort.sort(TimSort.java:214)
>  at java.util.TimSort.sort(TimSort.java:173)
>  at java.util.Arrays.sort(Arrays.java:659)
>  at java.util.Collections.sort(Collections.java:217)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>  at java.lang.Thread.run(Thread.java:745)
> 2016-02-26 14:08:50,822 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> Actually, this bug was found in 2.6.0-cdh: {{FairShareComparator}} is not 
> transitive.
> We get NaN when memorySize=0 and weight=0:
> {code:title=FairSharePolicy.java}
> useToWeightRatio1 = s1.getResourceUsage().getMemorySize() /
>   s1.getWeights().getWeight(ResourceType.MEMORY)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5375) invoke MockRM#drainEvents implicitly in MockRM methods to reduce test failures

2016-10-24 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601173#comment-15601173
 ] 

sandflee commented on YARN-5375:


Sorry for the delay; I will get to this in the next few days.

> invoke MockRM#drainEvents implicitly in MockRM methods to reduce test failures
> --
>
> Key: YARN-5375
> URL: https://issues.apache.org/jira/browse/YARN-5375
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-5375.01.patch, YARN-5375.03.patch, 
> YARN-5375.04.patch, YARN-5375.05.patch, YARN-5375.06.patch, 
> YARN-5375.07-drain-statestore.patch, YARN-5375.07-sync-statestore.patch
>
>
> We have seen many test failures where an RMApp/RMAppAttempt reaches some state 
> but some events are not yet processed in the RM event queue or the scheduler 
> event queue, causing the test to fail. It seems we could implicitly invoke 
> drainEvents (which should also drain scheduler events) in MockRM methods such 
> as waitForState.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy

2016-10-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601105#comment-15601105
 ] 

Sunil G commented on YARN-2009:
---

Test case failures are not related; YARN-5362 is tracking the same failures.

> Priority support for preemption in ProportionalCapacityPreemptionPolicy
> ---
>
> Key: YARN-2009
> URL: https://issues.apache.org/jira/browse/YARN-2009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Devaraj K
>Assignee: Sunil G
> Attachments: YARN-2009.0001.patch, YARN-2009.0002.patch, 
> YARN-2009.0003.patch, YARN-2009.0004.patch, YARN-2009.0005.patch, 
> YARN-2009.0006.patch, YARN-2009.0007.patch, YARN-2009.0008.patch, 
> YARN-2009.0009.patch, YARN-2009.0010.patch, YARN-2009.0011.patch, 
> YARN-2009.0012.patch, YARN-2009.0013.patch, YARN-2009.0014.patch
>
>
> While preempting containers based on the queue ideal assignment, we may need 
> to consider preempting the low priority application containers first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5375) invoke MockRM#drainEvents implicitly in MockRM methods to reduce test failures

2016-10-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601101#comment-15601101
 ] 

Sunil G commented on YARN-5375:
---

I think we need to get this in, as many tests are failing randomly.

[~sandflee], it seems we have a consensus on the state-store patch approach. In 
that case, could you please submit it as a proper patch here?

> invoke MockRM#drainEvents implicitly in MockRM methods to reduce test failures
> --
>
> Key: YARN-5375
> URL: https://issues.apache.org/jira/browse/YARN-5375
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-5375.01.patch, YARN-5375.03.patch, 
> YARN-5375.04.patch, YARN-5375.05.patch, YARN-5375.06.patch, 
> YARN-5375.07-drain-statestore.patch, YARN-5375.07-sync-statestore.patch
>
>
> We have seen many test failures where an RMApp/RMAppAttempt reaches some state 
> but some events are not yet processed in the RM event queue or the scheduler 
> event queue, causing the test to fail. It seems we could implicitly invoke 
> drainEvents (which should also drain scheduler events) in MockRM methods such 
> as waitForState.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5362) TestRMRestart#testFinishedAppRemovalAfterRMRestart can fail

2016-10-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601096#comment-15601096
 ] 

Sunil G commented on YARN-5362:
---

+1. I am still getting the same failure as [~naganarasimha...@apache.org]: 
https://builds.apache.org/job/PreCommit-YARN-Build/13472/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/


I think the events coming from the StateStore are not fully drained here. 
YARN-5375 would have been a clean solution for this; I think we can make 
progress there with review.
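As a stop-gap inside a single test, the usual shape is to drain events before 
asserting (a sketch; {{MockRM#drainEvents}} is the method YARN-5375 proposes to 
invoke implicitly):
{code}
// Sketch of the stop-gap pattern; YARN-5375 would fold the drain into
// MockRM methods such as waitForState() so tests need not do it by hand.
MockRM rm = new MockRM(conf);
rm.start();
// ... app submission, RM restart and recovery steps elided ...
rm.drainEvents();  // let dispatcher and scheduler event queues settle
Assert.assertNull(rm.getRMContext().getRMApps().get(appId));
{code}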

> TestRMRestart#testFinishedAppRemovalAfterRMRestart can fail
> ---
>
> Key: YARN-5362
> URL: https://issues.apache.org/jira/browse/YARN-5362
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jason Lowe
>Assignee: sandflee
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-5362.01.patch
>
>
> Saw the following in a precommit build that only changed an unrelated unit 
> test:
> {noformat}
> Tests run: 29, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 101.265 sec 
> <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> testFinishedAppRemovalAfterRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)
>   Time elapsed: 0.411 sec  <<< FAILURE!
> java.lang.AssertionError: expected null, but 
> was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1653)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


