[jira] [Commented] (YARN-10011) Catch all exception during init app in LogAggregationService

2020-02-03 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029614#comment-17029614
 ] 

Prabhu Joseph commented on YARN-10011:
--

The patch looks good; I will change the state to Patch Available.

> Catch all exception  during init app in LogAggregationService 
> --
>
> Key: YARN-10011
> URL: https://issues.apache.org/jira/browse/YARN-10011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-10011-001.patch
>
>
> We should catch all exceptions during app initialization in 
> LogAggregationService; otherwise an unhandled exception can make the NM 
> exit:
> {code:java}
> 2019-06-12,09:36:03,652 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.IllegalStateException
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:129)
> at 
> org.apache.hadoop.ipc.Client.setCallIdAndRetryCount(Client.java:118)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2115)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1300)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1296)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1312)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:193)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:319)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:116)
> at java.lang.Thread.run(Thread.java:745)
> {code}
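> A minimal sketch of the kind of guard this suggests (an assumption about the 
> approach, not the actual YARN-10011 patch): catch any exception thrown while 
> initializing log aggregation for one app, so the failure stays contained 
> instead of killing the dispatcher thread.
> {code:java}
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> class InitAppGuard {
>   private static final Logger LOG =
>       LoggerFactory.getLogger(InitAppGuard.class);
>
>   interface AppInitializer {
>     void initApp(String appId) throws Exception;
>   }
>
>   // Mark log aggregation as failed for this app only; the NM keeps running.
>   void handleAppStarted(String appId, AppInitializer initializer) {
>     try {
>       initializer.initApp(appId);
>     } catch (Exception e) {
>       LOG.error("Failed to init log aggregation for " + appId, e);
>     }
>   }
> }
> {code}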



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10109) Allow stop and convert from leaf to parent queue in a single Mutation API call

2020-02-03 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029546#comment-17029546
 ] 

Sunil G commented on YARN-10109:


[~prabhujoseph] Thanks for the patch.

I am kind of +0 on having an UNDEFINED state for a queue. I presume a queue 
will land in the UNDEFINED state during some race condition scenarios. After 
that, how can the queue be recovered? What does the user need to do at that 
point? This is my assumption for now. Please check the scenario below for 
more details.

*My scenario is as follows:* The user is NOT invoking the validate API, but 
calling the mutation API directly (maybe with a wrong payload). If only the 
payload config is updated with the UNDEFINED state for some queue due to 
issues, and you later discard this config, then I think it's fine. I would 
like to ensure that the actual queue in memory never lands in the UNDEFINED 
state in any case. 

> Allow stop and convert from leaf to parent queue in a single Mutation API call
> --
>
> Key: YARN-10109
> URL: https://issues.apache.org/jira/browse/YARN-10109
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10109-001.patch, YARN-10109-002.patch, 
> YARN-10109-003.patch
>
>
> The SchedulerConf Mutation API does not allow stopping a queue and adding a 
> queue under an existing leaf queue in a single call. 
> *Repro:*
>  
> {code:java}
> Capacity-Scheduler.xml: 
> yarn.scheduler.capacity.root.queues = default
> yarn.scheduler.capacity.root.default.capacity = 100 
> cat abc.xml 
> <sched-conf>
>   <add-queue>
>     <queue-name>root.default.v1</queue-name>
>     <params>
>       <entry>
>         <key>capacity</key>
>         <value>100</value>
>       </entry>
>     </params>
>   </add-queue>
>   <update-queue>
>     <queue-name>root.default</queue-name>
>     <params>
>       <entry>
>         <key>state</key>
>         <value>STOPPED</value>
>       </entry>
>     </params>
>   </update-queue>
> </sched-conf>
> [yarn@pjoseph-1 tmp]$ curl --negotiate -u : -X PUT -d @add.xml -H 
> "Content-type: application/xml" 
> 'http://:8088/ws/v1/cluster/scheduler-conf?user.name=yarn'
> Failed to re-init queues : Can not convert the leaf queue: root.default to 
> parent queue since it is not yet in stopped state. Current State : RUNNING
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9011) Race condition during decommissioning

2020-02-03 Thread Mikayla Konst (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029396#comment-17029396
 ] 

Mikayla Konst commented on YARN-9011:
-

We experienced this exact same race condition recently (resource manager 
sending SHUTDOWN signal to node manager because it received a heartbeat from 
the node manager *after* the HostDetails reference was updated, but *before* 
the node was transitioned to state DECOMMISSIONING).

I think this patch is a huge improvement over the previous behavior, but I 
think there is still a narrow race that can happen when refresh nodes is 
called multiple times in quick succession with the same set of nodes in the 
exclude file:
 # lazy-loaded HostDetails reference is updated
 # nodes are added to gracefullyDecommissionableNodes set
 # current HostDetails reference is updated
 # event to update node status to DECOMMISSIONING is added to asynchronous 
event handler's event queue, but hasn't been processed yet
 # refresh nodes is called a second time
 # lazy-loaded HostDetails reference is updated
 # gracefullyDecommissionableNodes set is cleared
 # node manager heartbeats to resource manager. It is not in state 
DECOMMISSIONING and not in the gracefullyDecommissionableNodes set, but is an 
excluded node in the HostDetails, so it is sent a SHUTDOWN signal
 # node is added to gracefullyDecommissionableNodes set
 # event handler transitions node to state DECOMMISSIONING at some point

This would be fixed if you used an AtomicReference for your set of 
"gracefullyDecommissionableNodes" and swapped out the reference, similar to how 
you handled the HostDetails.
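
A minimal sketch of that swap (field and method names are illustrative 
assumptions, not the actual NodesListManager code):
{code:java}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

class DecommissionableNodesTracker {
  // Readers always see either the old set or the fully built new set,
  // never a half-cleared one.
  private final AtomicReference<Set<String>> gracefullyDecommissionableNodes =
      new AtomicReference<>(Collections.<String>emptySet());

  // refresh nodes path: build the new set completely, then publish atomically.
  void onRefreshNodes(Set<String> nodesToDecommission) {
    gracefullyDecommissionableNodes.set(
        Collections.unmodifiableSet(new HashSet<>(nodesToDecommission)));
  }

  // heartbeat path: a single atomic read; there is no window in which the
  // set appears empty mid-refresh.
  boolean isGracefullyDecommissionable(String nodeId) {
    return gracefullyDecommissionableNodes.get().contains(nodeId);
  }
}
{code}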

Alternatively, instead of using an asynchronous event handler to update the 
state of the nodes to DECOMMISSIONING, you could update the state 
synchronously: grab a lock, update HostDetails and synchronously update the 
states of the nodes being gracefully decommissioned, then release the lock. 
When the resource tracker service receives a heartbeat and needs to check 
whether a node should be shut down (i.e. it is excluded but not gracefully 
decommissioning), it would grab the lock right before doing the check. Having 
the resource tracker service wait on a lock doesn't sound great, but the wait 
would likely be on the order of milliseconds, and only when refresh nodes is 
called.
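
A rough sketch of that synchronous alternative (again with hypothetical 
names; the real ResourceTrackerService/NodesListManager code is structured 
differently):
{code:java}
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

class SynchronousDecommissionManager {
  private final ReentrantLock refreshLock = new ReentrantLock();
  private Set<String> excludedNodes = new HashSet<>();
  private final Set<String> decommissioningNodes = new HashSet<>();

  // refresh nodes path: update the exclude list and the node states under
  // one lock, instead of queueing the transition on the async dispatcher.
  void refreshNodes(Set<String> newExcludes) {
    refreshLock.lock();
    try {
      excludedNodes = new HashSet<>(newExcludes);
      decommissioningNodes.addAll(newExcludes);
    } finally {
      refreshLock.unlock();
    }
  }

  // heartbeat path: take the same lock so the check never observes a
  // half-applied refresh (excluded but not yet DECOMMISSIONING).
  boolean shouldShutDown(String nodeId) {
    refreshLock.lock();
    try {
      return excludedNodes.contains(nodeId)
          && !decommissioningNodes.contains(nodeId);
    } finally {
      refreshLock.unlock();
    }
  }
}
{code}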

> Race condition during decommissioning
> -
>
> Key: YARN-9011
> URL: https://issues.apache.org/jira/browse/YARN-9011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.1
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9011-001.patch, YARN-9011-002.patch, 
> YARN-9011-003.patch, YARN-9011-004.patch, YARN-9011-005.patch, 
> YARN-9011-006.patch, YARN-9011-007.patch, YARN-9011-008.patch, 
> YARN-9011-009.patch, YARN-9011-branch-3.1.001.patch, 
> YARN-9011-branch-3.2.001.patch
>
>
> During internal testing, we found a nasty race condition which occurs during 
> decommissioning.
> Node manager, incorrect behaviour:
> {noformat}
> 2018-06-18 21:00:17,634 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received 
> SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting 
> down.
> 2018-06-18 21:00:17,634 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from 
> ResourceManager: Disallowed NodeManager nodeId: node-6.hostname.com:8041 
> hostname:node-6.hostname.com
> {noformat}
> Node manager, expected behaviour:
> {noformat}
> 2018-06-18 21:07:37,377 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received 
> SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting 
> down.
> 2018-06-18 21:07:37,377 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from 
> ResourceManager: DECOMMISSIONING  node-6.hostname.com:8041 is ready to be 
> decommissioned
> {noformat}
> Note the two different messages from the RM ("Disallowed NodeManager" vs 
> "DECOMMISSIONING"). The problem is that {{ResourceTrackerService}} can see an 
> inconsistent state of nodes while they're being updated:
> {noformat}
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: hostsReader 
> include:{172.26.12.198,node-7.hostname.com,node-2.hostname.com,node-5.hostname.com,172.26.8.205,node-8.hostname.com,172.26.23.76,172.26.22.223,node-6.hostname.com,172.26.9.218,node-4.hostname.com,node-3.hostname.com,172.26.13.167,node-9.hostname.com,172.26.21.221,172.26.10.219}
>  exclude:{node-6.hostname.com}
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: Gracefully 
> decommission node 

[jira] [Commented] (YARN-9538) Document scheduler/app activities and REST APIs

2020-02-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029303#comment-17029303
 ] 

Hadoop QA commented on YARN-9538:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
33m 24s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 38s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 49m 11s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | YARN-9538 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12991360/YARN-9538.004.patch |
| Optional Tests |  dupname  asflicense  mvnsite  |
| uname | Linux 345c70808307 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1e3a0b0 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 310 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25492/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Document scheduler/app activities and REST APIs
> ---
>
> Key: YARN-9538
> URL: https://issues.apache.org/jira/browse/YARN-9538
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9538.001.patch, YARN-9538.002.patch, 
> YARN-9538.003.patch, YARN-9538.004.patch
>
>
> Add documentation for scheduler/app activities in CapacityScheduler.md and 
> ResourceManagerRest.md.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9538) Document scheduler/app activities and REST APIs

2020-02-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029269#comment-17029269
 ] 

Hadoop QA commented on YARN-9538:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 24m  
7s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
29m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 40s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m 55s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | YARN-9538 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12991360/YARN-9538.004.patch |
| Optional Tests |  dupname  asflicense  mvnsite  |
| uname | Linux 7685f4838b81 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1e3a0b0 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 413 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25491/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Document scheduler/app activities and REST APIs
> ---
>
> Key: YARN-9538
> URL: https://issues.apache.org/jira/browse/YARN-9538
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9538.001.patch, YARN-9538.002.patch, 
> YARN-9538.003.patch, YARN-9538.004.patch
>
>
> Add documentation for scheduler/app activities in CapacityScheduler.md and 
> ResourceManagerRest.md.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9567) Add diagnostics for outstanding resource requests on app attempts page

2020-02-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029247#comment-17029247
 ] 

Hadoop QA commented on YARN-9567:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  9s{color} 
| {color:red} YARN-9567 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9567 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25493/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Add diagnostics for outstanding resource requests on app attempts page
> --
>
> Key: YARN-9567
> URL: https://issues.apache.org/jira/browse/YARN-9567
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9567.001.patch, YARN-9567.002.patch, 
> YARN-9567.003.patch, app-activities-example.png, 
> image-2019-06-04-17-29-29-368.png, image-2019-06-04-17-31-31-820.png, 
> image-2019-06-04-17-58-11-886.png, image-2019-06-14-11-21-41-066.png, 
> no_diagnostic_at_first.png, scheduler-activities-example.png, 
> show_diagnostics_after_requesting_app_activities_REST_API.png
>
>
> Currently on the app attempt page we can see outstanding resource requests; 
> it would be helpful for users to know why they are outstanding if we can 
> join this app's diagnostics with them.
> As discussed with [~cheersyang], we can passively load diagnostics from the 
> cache of completed app activities instead of actively triggering them, which 
> may bring uncontrollable risks.
> For example:
> (1) At first, if app activities have not been triggered, no diagnostic from 
> the cache is shown below the outstanding requests.
> !no_diagnostic_at_first.png|width=793,height=248!
> (2) After requesting the application activities REST API, we can see 
> diagnostics now.
> !show_diagnostics_after_requesting_app_activities_REST_API.png|width=1046,height=276!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9538) Document scheduler/app activities and REST APIs

2020-02-03 Thread Weiwei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029227#comment-17029227
 ] 

Weiwei Yang commented on YARN-9538:
---

Manually triggered the Jenkins job: 
[https://builds.apache.org/job/PreCommit-YARN-Build/25491/]

> Document scheduler/app activities and REST APIs
> ---
>
> Key: YARN-9538
> URL: https://issues.apache.org/jira/browse/YARN-9538
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9538.001.patch, YARN-9538.002.patch, 
> YARN-9538.003.patch, YARN-9538.004.patch
>
>
> Add documentation for scheduler/app activities in CapacityScheduler.md and 
> ResourceManagerRest.md.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9567) Add diagnostics for outstanding resource requests on app attempts page

2020-02-03 Thread Weiwei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029226#comment-17029226
 ] 

Weiwei Yang commented on YARN-9567:
---

Hi [~Tao Yang]

Can you please rebase the patch onto the latest trunk?

> Add diagnostics for outstanding resource requests on app attempts page
> --
>
> Key: YARN-9567
> URL: https://issues.apache.org/jira/browse/YARN-9567
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9567.001.patch, YARN-9567.002.patch, 
> YARN-9567.003.patch, app-activities-example.png, 
> image-2019-06-04-17-29-29-368.png, image-2019-06-04-17-31-31-820.png, 
> image-2019-06-04-17-58-11-886.png, image-2019-06-14-11-21-41-066.png, 
> no_diagnostic_at_first.png, scheduler-activities-example.png, 
> show_diagnostics_after_requesting_app_activities_REST_API.png
>
>
> Currently on the app attempt page we can see outstanding resource requests; 
> it would be helpful for users to know why they are outstanding if we can 
> join this app's diagnostics with them.
> As discussed with [~cheersyang], we can passively load diagnostics from the 
> cache of completed app activities instead of actively triggering them, which 
> may bring uncontrollable risks.
> For example:
> (1) At first, if app activities have not been triggered, no diagnostic from 
> the cache is shown below the outstanding requests.
> !no_diagnostic_at_first.png|width=793,height=248!
> (2) After requesting the application activities REST API, we can see 
> diagnostics now.
> !show_diagnostics_after_requesting_app_activities_REST_API.png|width=1046,height=276!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9567) Add diagnostics for outstanding resource requests on app attempts page

2020-02-03 Thread Weiwei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029223#comment-17029223
 ] 

Weiwei Yang commented on YARN-9567:
---

Thanks. Looks good.

Somehow the latest patch did not trigger the Jenkins job; I manually triggered 
[https://builds.apache.org/job/PreCommit-YARN-Build/25490/]

> Add diagnostics for outstanding resource requests on app attempts page
> --
>
> Key: YARN-9567
> URL: https://issues.apache.org/jira/browse/YARN-9567
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9567.001.patch, YARN-9567.002.patch, 
> YARN-9567.003.patch, app-activities-example.png, 
> image-2019-06-04-17-29-29-368.png, image-2019-06-04-17-31-31-820.png, 
> image-2019-06-04-17-58-11-886.png, image-2019-06-14-11-21-41-066.png, 
> no_diagnostic_at_first.png, scheduler-activities-example.png, 
> show_diagnostics_after_requesting_app_activities_REST_API.png
>
>
> Currently on the app attempt page we can see outstanding resource requests; 
> it would be helpful for users to know why they are outstanding if we can 
> join this app's diagnostics with them.
> As discussed with [~cheersyang], we can passively load diagnostics from the 
> cache of completed app activities instead of actively triggering them, which 
> may bring uncontrollable risks.
> For example:
> (1) At first, if app activities have not been triggered, no diagnostic from 
> the cache is shown below the outstanding requests.
> !no_diagnostic_at_first.png|width=793,height=248!
> (2) After requesting the application activities REST API, we can see 
> diagnostics now.
> !show_diagnostics_after_requesting_app_activities_REST_API.png|width=1046,height=276!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9567) Add diagnostics for outstanding resource requests on app attempts page

2020-02-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029224#comment-17029224
 ] 

Hadoop QA commented on YARN-9567:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 11s{color} 
| {color:red} YARN-9567 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9567 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25490/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Add diagnostics for outstanding resource requests on app attempts page
> --
>
> Key: YARN-9567
> URL: https://issues.apache.org/jira/browse/YARN-9567
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9567.001.patch, YARN-9567.002.patch, 
> YARN-9567.003.patch, app-activities-example.png, 
> image-2019-06-04-17-29-29-368.png, image-2019-06-04-17-31-31-820.png, 
> image-2019-06-04-17-58-11-886.png, image-2019-06-14-11-21-41-066.png, 
> no_diagnostic_at_first.png, scheduler-activities-example.png, 
> show_diagnostics_after_requesting_app_activities_REST_API.png
>
>
> Currently on the app attempt page we can see outstanding resource requests; 
> it would be helpful for users to know why they are outstanding if we can 
> join this app's diagnostics with them.
> As discussed with [~cheersyang], we can passively load diagnostics from the 
> cache of completed app activities instead of actively triggering them, which 
> may bring uncontrollable risks.
> For example:
> (1) At first, if app activities have not been triggered, no diagnostic from 
> the cache is shown below the outstanding requests.
> !no_diagnostic_at_first.png|width=793,height=248!
> (2) After requesting the application activities REST API, we can see 
> diagnostics now.
> !show_diagnostics_after_requesting_app_activities_REST_API.png|width=1046,height=276!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10022) Create RM Rest API to validate a CapacityScheduler Configuration

2020-02-03 Thread Kinga Marton (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029205#comment-17029205
 ] 

Kinga Marton commented on YARN-10022:
-

The compilation failure is unrelated; it is caused by HDFS. I checked the 
tests as well, and most of the failures are due to test case timeouts.

> Create RM Rest API to validate a CapacityScheduler Configuration
> 
>
> Key: YARN-10022
> URL: https://issues.apache.org/jira/browse/YARN-10022
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Kinga Marton
>Assignee: Kinga Marton
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-10022-branch-3.1.001.patch, 
> YARN-10022-branch-3.2.001.patch, YARN-10022.001.patch, YARN-10022.002.patch, 
> YARN-10022.003.patch, YARN-10022.004.patch, YARN-10022.005.patch, 
> YARN-10022.006.patch, YARN-10022.007.patch, YARN-10022.WIP.patch, 
> YARN-10022.WIP2.patch
>
>
> RMWebService should expose a new API which takes a CapacityScheduler 
> configuration as input, validates it, and returns success or failure.
>   
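> A hedged sketch of the kind of endpoint this asks for (the path, types, and 
> wiring here are assumptions for illustration, not the actual YARN-10022 
> patch):
> {code:java}
> import javax.ws.rs.PUT;
> import javax.ws.rs.Path;
> import javax.ws.rs.Produces;
> import javax.ws.rs.core.MediaType;
> import javax.ws.rs.core.Response;
>
> // Assumed resource path; the real endpoint may differ.
> @Path("/ws/v1/cluster/scheduler-conf")
> public class SchedulerConfValidationResource {
>
>   @PUT
>   @Path("/validate")
>   @Produces(MediaType.APPLICATION_JSON)
>   public Response validate(String schedulerConfXml) {
>     try {
>       // Parse and check the proposed configuration without applying it.
>       validateConfiguration(schedulerConfXml);
>       return Response.ok("Configuration is valid.").build();
>     } catch (IllegalArgumentException e) {
>       return Response.status(Response.Status.BAD_REQUEST)
>           .entity("Invalid configuration: " + e.getMessage()).build();
>     }
>   }
>
>   // Placeholder: a real implementation would build a CapacityScheduler
>   // configuration from the payload and run the scheduler's checks on it.
>   private void validateConfiguration(String xml) {
>     if (xml == null || xml.isEmpty()) {
>       throw new IllegalArgumentException("empty configuration");
>     }
>   }
> }
> {code}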



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10101) Support listing of aggregated logs for containers belonging to an application attempt

2020-02-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029171#comment-17029171
 ] 

Hadoop QA commented on YARN-10101:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
20m  2s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m  
3s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 34s{color} | {color:orange} root: The patch generated 2 new + 87 unchanged - 
0 fixed = 89 total (was 87) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
48s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
32s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
28s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
48s{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 25m 
42s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
40s{color} | {color:green} hadoop-mapreduce-client-hs in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}184m 23s{color} | 
{color:black} {color} |
\\
\\
|| 

[jira] [Commented] (YARN-10022) Create RM Rest API to validate a CapacityScheduler Configuration

2020-02-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029083#comment-17029083
 ] 

Hadoop QA commented on YARN-10022:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.1 Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  9m 
24s{color} | {color:red} root in branch-3.1 failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
16s{color} | {color:red} hadoop-yarn-server-resourcemanager in branch-3.1 
failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} branch-3.1 passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
14s{color} | {color:red} hadoop-yarn-server-resourcemanager in branch-3.1 
failed. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  4m 
34s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
15s{color} | {color:red} hadoop-yarn-server-resourcemanager in branch-3.1 
failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
18s{color} | {color:red} hadoop-yarn-server-resourcemanager in branch-3.1 
failed. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
12s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
12s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 12s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 29s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 112 unchanged - 9 fixed = 113 total (was 121) 
{color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
12s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  4m 
30s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
14s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
13s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 13s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 36m  3s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:70a0ef5d4a6 |
| JIRA Issue | YARN-10022 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12992247/YARN-10022-branch-3.1.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 103f4ba21149 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.1 / 8ea4787 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-YARN-Build/25489/artifact/out/branch-mvninstall-root.txt
 |
| compile 

[jira] [Commented] (YARN-10029) Add option to UIv2 to get container logs from the new JHS API

2020-02-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029074#comment-17029074
 ] 

Hadoop QA commented on YARN-10029:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
46s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
34m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 13s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | YARN-10029 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12992503/YARN-10029.002.patch |
| Optional Tests |  dupname  asflicense  shadedclient  |
| uname | Linux 93ebdb3f471c 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1e3a0b0 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 312 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25487/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Add option to UIv2 to get container logs from the new JHS API
> -
>
> Key: YARN-10029
> URL: https://issues.apache.org/jira/browse/YARN-10029
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-10029.001.patch, YARN-10029.002.patch
>
>
> Provided the new API is ready to use (it is also integrated into the JHS in 
> YARN-10028), we can add a new config option to UIv2 that makes it request 
> logs from the JHS API, similarly to the ATSv2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10101) Support listing of aggregated logs for containers belonging to an application attempt

2020-02-03 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029034#comment-17029034
 ] 

Adam Antal commented on YARN-10101:
---

Thanks for the review [~pbacsko]!

Your remarks were perfectly valid; I fixed them in v6. 
I also fixed 2 checkstyle warnings and rebased onto trunk.

> Support listing of aggregated logs for containers belonging to an application 
> attempt
> -
>
> Key: YARN-10101
> URL: https://issues.apache.org/jira/browse/YARN-10101
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-10101.001.patch, YARN-10101.002.patch, 
> YARN-10101.003.patch, YARN-10101.004.patch, YARN-10101.005.patch, 
> YARN-10101.006.patch
>
>
> To display logs without access to the timeline server, we need an interface 
> where we can query the list of containers with aggregated logs belonging to 
> an application attempt.
> We should add support for this.
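> A minimal sketch of such an interface (names here are illustrative 
> assumptions, not the actual YARN-10101 API):
> {code:java}
> import java.util.List;
>
> interface AggregatedLogLister {
>   // Returns the IDs of containers under the given application attempt for
>   // which aggregated logs exist in the remote log directory.
>   List<String> listContainersWithAggregatedLogs(String appAttemptId)
>       throws Exception;
> }
> {code}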



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10101) Support listing of aggregated logs for containers belonging to an application attempt

2020-02-03 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-10101:
--
Attachment: YARN-10101.006.patch

> Support listing of aggregated logs for containers belonging to an application 
> attempt
> -
>
> Key: YARN-10101
> URL: https://issues.apache.org/jira/browse/YARN-10101
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-10101.001.patch, YARN-10101.002.patch, 
> YARN-10101.003.patch, YARN-10101.004.patch, YARN-10101.005.patch, 
> YARN-10101.006.patch
>
>
> To display logs without access to the timeline server, we need an interface 
> where we can query the list of containers with aggregated logs belonging to 
> an application attempt.
> We should add support for this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10029) Add option to UIv2 to get container logs from the new JHS API

2020-02-03 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029011#comment-17029011
 ] 

Adam Antal commented on YARN-10029:
---

Uploaded patch set v2, which provides the desired behaviour on top of 
YARN-10101.

> Add option to UIv2 to get container logs from the new JHS API
> -
>
> Key: YARN-10029
> URL: https://issues.apache.org/jira/browse/YARN-10029
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-10029.001.patch, YARN-10029.002.patch
>
>
> Provided the new API is ready to use (it is also integrated into the JHS in 
> YARN-10028), we can add a new config option to UIv2 that makes it request 
> logs from the JHS API, similarly to the ATSv2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10029) Add option to UIv2 to get container logs from the new JHS API

2020-02-03 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-10029:
--
Attachment: YARN-10029.002.patch

> Add option to UIv2 to get container logs from the new JHS API
> -
>
> Key: YARN-10029
> URL: https://issues.apache.org/jira/browse/YARN-10029
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-10029.001.patch, YARN-10029.002.patch
>
>
> Provided the new API is ready to use (it is also integrated into the JHS in 
> YARN-10028), we can add a new config option to UIv2 that makes it request 
> logs from the JHS API, similarly to the ATSv2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10011) Catch all exception during init app in LogAggregationService

2020-02-03 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028930#comment-17028930
 ] 

Adam Antal commented on YARN-10011:
---

Thanks for the patch [~cane].

I agree with the modification (better not let this exception slip to the 
dispatcher thread).

I was wondering whether catching the {{IllegalStateException}} somewhere 
around {{LogAggregationFileController#verifyAndCreateRemoteLogDir}} would 
make sense, but that path is already well protected (it catches IOExceptions 
as expected), so I think we did our best here.

+1 (non-binding).

Could you please take a look at this [~prabhujoseph]? 

> Catch all exception  during init app in LogAggregationService 
> --
>
> Key: YARN-10011
> URL: https://issues.apache.org/jira/browse/YARN-10011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-10011-001.patch
>
>
> We should catch all exceptions during app initialization in 
> LogAggregationService; otherwise an unhandled exception can make the NM 
> exit:
> {code:java}
> 2019-06-12,09:36:03,652 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.IllegalStateException
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:129)
> at 
> org.apache.hadoop.ipc.Client.setCallIdAndRetryCount(Client.java:118)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2115)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1300)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1296)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1312)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:193)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:319)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:116)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10011) Catch all exception during init app in LogAggregationService

2020-02-03 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028931#comment-17028931
 ] 

Adam Antal commented on YARN-10011:
---

Oh, before I forget, [~cane]: please move the jira to Patch Available state 
so that the Jenkins check can run on it.

> Catch all exception  during init app in LogAggregationService 
> --
>
> Key: YARN-10011
> URL: https://issues.apache.org/jira/browse/YARN-10011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-10011-001.patch
>
>
> We should catch all exceptions during app initialization in 
> LogAggregationService; otherwise an unhandled exception can make the NM 
> exit:
> {code:java}
> 2019-06-12,09:36:03,652 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.IllegalStateException
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:129)
> at 
> org.apache.hadoop.ipc.Client.setCallIdAndRetryCount(Client.java:118)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2115)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1300)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1296)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1312)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:193)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:319)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:116)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS

2020-02-03 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028892#comment-17028892
 ] 

Peter Bacsko commented on YARN-9879:


The latest build picked up {{CSQueue.getQueueUsage.txt}}. [~shuzirra], you 
might want to re-upload the patch.

> Allow multiple leaf queues with the same name in CS
> ---
>
> Key: YARN-9879
> URL: https://issues.apache.org/jira/browse/YARN-9879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: fs2cs
> Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, 
> YARN-9879.POC001.patch, YARN-9879.POC002.patch
>
>
> Currently a leaf queue's name must be unique regardless of its position in 
> the queue hierarchy. 
> A design doc and first proposal are being made; I'll attach them as soon as 
> they're done.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9908) Make ZK calls in parallel to load application states.

2020-02-03 Thread Cyrus Jackson (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyrus Jackson reassigned YARN-9908:
---

Assignee: Cyrus Jackson  (was: Abhishek Modi)

> Make ZK calls in parallel to load application states.
> -
>
> Key: YARN-9908
> URL: https://issues.apache.org/jira/browse/YARN-9908
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Abhishek Modi
>Assignee: Cyrus Jackson
>Priority: Major
>
> At present, all application states are fetched sequentially from ZooKeeper. 
> This can be optimized by using a thread pool to fetch them in parallel, as 
> sketched below.
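> A minimal sketch of the idea (names here are illustrative assumptions, not 
> the actual ZKRMStateStore code):
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.Future;
>
> class ParallelStateLoader {
>   interface ZkReader {
>     byte[] readAppState(String appId) throws Exception;
>   }
>
>   // Submit one read per application to a fixed pool instead of reading
>   // each znode sequentially, then collect the results.
>   List<byte[]> loadAll(List<String> appIds, ZkReader zk, int threads)
>       throws Exception {
>     ExecutorService pool = Executors.newFixedThreadPool(threads);
>     try {
>       List<Future<byte[]>> futures = new ArrayList<>();
>       for (String appId : appIds) {
>         futures.add(pool.submit(() -> zk.readAppState(appId)));
>       }
>       List<byte[]> states = new ArrayList<>(appIds.size());
>       for (Future<byte[]> f : futures) {
>         states.add(f.get()); // propagates any per-app read failure
>       }
>       return states;
>     } finally {
>       pool.shutdown();
>     }
>   }
> }
> {code}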



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10057) Upgrade the dependencies managed by yarnpkg

2020-02-03 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028799#comment-17028799
 ] 

Akira Ajisaka commented on YARN-10057:
--

Yes, we should cherry-pick this change.

> Upgrade the dependencies managed by yarnpkg
> ---
>
> Key: YARN-10057
> URL: https://issues.apache.org/jira/browse/YARN-10057
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, yarn-ui-v2
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
> Fix For: 3.3.0
>
>
> Run "yarn upgrade" to update the dependencies managed by yarnpkg.
> Dependabot automatically created the following pull requests and this issue 
> is to close them.
> * https://github.com/apache/hadoop/pull/1741
> * https://github.com/apache/hadoop/pull/1742
> * https://github.com/apache/hadoop/pull/1743
> * https://github.com/apache/hadoop/pull/1744



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (YARN-10113) SystemServiceManagerImpl fails to initialize

2020-02-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028798#comment-17028798
 ] 

Hadoop QA commented on YARN-10113:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
49s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 41s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 45s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
58s{color} | {color:green} hadoop-yarn-services-api in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 58m 51s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | YARN-10113 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12992469/YARN-10113-002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 9cc7f5436b07 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1e3a0b0 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25486/testReport/ |
| Max. process+thread count | 585 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25486/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> SystemServiceManagerImpl 

[jira] [Commented] (YARN-10109) Allow stop and convert from leaf to parent queue in a single Mutation API call

2020-02-03 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028788#comment-17028788
 ] 

Prabhu Joseph commented on YARN-10109:
--

Thanks [~kmarton].

[~sunil.gov...@gmail.com] [~snemeth] Could you review the patch when you get time?

> Allow stop and convert from leaf to parent queue in a single Mutation API call
> --
>
> Key: YARN-10109
> URL: https://issues.apache.org/jira/browse/YARN-10109
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10109-001.patch, YARN-10109-002.patch, 
> YARN-10109-003.patch
>
>
> The SchedulerConf Mutation API does not allow stopping an existing leaf 
> queue and adding a queue under it in a single call. 
> *Repro:*
>  
> {code:java}
> Capacity-Scheduler.xml: 
> yarn.scheduler.capacity.root.queues = default
> yarn.scheduler.capacity.root.default.capacity = 100 
> cat abc.xml 
> <sched-conf>
>   <add-queue>
>     <queue-name>root.default.v1</queue-name>
>     <params>
>       <entry>
>         <key>capacity</key>
>         <value>100</value>
>       </entry>
>     </params>
>   </add-queue>
>   <update-queue>
>     <queue-name>root.default</queue-name>
>     <params>
>       <entry>
>         <key>state</key>
>         <value>STOPPED</value>
>       </entry>
>     </params>
>   </update-queue>
> </sched-conf>
> [yarn@pjoseph-1 tmp]$ curl --negotiate -u : -X PUT -d @add.xml -H 
> "Content-type: application/xml" 
> 'http://:8088/ws/v1/cluster/scheduler-conf?user.name=yarn'
> Failed to re-init queues : Can not convert the leaf queue: root.default to 
> parent queue since it is not yet in stopped state. Current State : RUNNING
>  {code}
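
The error above comes from the leaf-to-parent conversion check during queue reinitialization; roughly (a hedged sketch of the shape of that check, with illustrative names rather than the actual CapacityScheduler source), the conversion is only accepted once the old leaf queue is already STOPPED, which is why the stop and the conversion cannot land in one call today:

{code:java}
import java.io.IOException;

public class ConversionCheckSketch {
  enum QueueState { RUNNING, STOPPED }

  // Hedged sketch of the check; names are illustrative, not the actual
  // CapacityScheduler source.
  static void checkLeafToParent(String path, QueueState oldState,
      boolean newVersionHasChildren) throws IOException {
    if (newVersionHasChildren && oldState != QueueState.STOPPED) {
      throw new IOException("Can not convert the leaf queue: " + path
          + " to parent queue since it is not yet in stopped state."
          + " Current State : " + oldState);
    }
  }

  public static void main(String[] args) throws IOException {
    checkLeafToParent("root.default", QueueState.STOPPED, true); // accepted
    try {
      checkLeafToParent("root.default", QueueState.RUNNING, true);
    } catch (IOException expected) {
      System.out.println(expected.getMessage()); // matches the repro error
    }
  }
}
{code}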



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (YARN-10109) Allow stop and convert from leaf to parent queue in a single Mutation API call

2020-02-03 Thread Kinga Marton (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028787#comment-17028787
 ] 

Kinga Marton commented on YARN-10109:
-

Thank you [~prabhujoseph]. LGTM, +1 (non-binding)

> Allow stop and convert from leaf to parent queue in a single Mutation API call
> --
>
> Key: YARN-10109
> URL: https://issues.apache.org/jira/browse/YARN-10109
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10109-001.patch, YARN-10109-002.patch, 
> YARN-10109-003.patch
>
>
> The SchedulerConf Mutation API does not allow stopping an existing leaf 
> queue and adding a queue under it in a single call. 
> *Repro:*
>  
> {code:java}
> Capacity-Scheduler.xml: 
> yarn.scheduler.capacity.root.queues = default
> yarn.scheduler.capacity.root.default.capacity = 100 
> cat abc.xml 
> <sched-conf>
>   <add-queue>
>     <queue-name>root.default.v1</queue-name>
>     <params>
>       <entry>
>         <key>capacity</key>
>         <value>100</value>
>       </entry>
>     </params>
>   </add-queue>
>   <update-queue>
>     <queue-name>root.default</queue-name>
>     <params>
>       <entry>
>         <key>state</key>
>         <value>STOPPED</value>
>       </entry>
>     </params>
>   </update-queue>
> </sched-conf>
> [yarn@pjoseph-1 tmp]$ curl --negotiate -u : -X PUT -d @add.xml -H 
> "Content-type: application/xml" 
> 'http://:8088/ws/v1/cluster/scheduler-conf?user.name=yarn'
> Failed to re-init queues : Can not convert the leaf queue: root.default to 
> parent queue since it is not yet in stopped state. Current State : RUNNING
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Updated] (YARN-10113) SystemServiceManagerImpl fails to initialize

2020-02-03 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-10113:
-
Attachment: YARN-10113-002.patch

> SystemServiceManagerImpl fails to initialize 
> -
>
> Key: YARN-10113
> URL: https://issues.apache.org/jira/browse/YARN-10113
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10113-001.patch, YARN-10113-002.patch
>
>
> RM fails to start because SystemServiceManagerImpl fails to initialize.
> {code}
> 2020-01-28 12:20:43,631 WARN  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:becomeActive(900)) - Exception handling the 
> winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:636)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> java.io.IOException: Filesystem closed
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:881)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1257)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1298)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1294)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1294)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> ... 5 more
> Caused by: java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:475)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1219)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1235)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1202)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1181)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1177)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1189)
> at 
> org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.list(SystemServiceManagerImpl.java:375)
> at 
> org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.scanForUserServices(SystemServiceManagerImpl.java:282)
> at 
> org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.serviceStart(SystemServiceManagerImpl.java:126)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> ... 16 more
> {code}
> This happens when a cached FileSystem object is reused after it has been closed.
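
A small sketch of the underlying pitfall (the assumed cause per the description above; the path is made up): {{FileSystem.get()}} returns a JVM-wide cached instance per filesystem and user, so a close() by one holder breaks every other holder of the same object, while {{FileSystem.newInstance()}} returns an uncached instance the caller owns outright.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsCacheSketch {
  public static void main(String[] args) throws Exception {
    // Assumes fs.defaultFS points at an HDFS cluster; with DFSClient,
    // operations on a closed instance raise "Filesystem closed".
    Configuration conf = new Configuration();

    FileSystem fs1 = FileSystem.get(conf);   // cached, shared instance
    FileSystem fs2 = FileSystem.get(conf);   // the very same object as fs1
    System.out.println("shared: " + (fs1 == fs2));

    fs1.close();                             // closes fs2's handle too
    try {
      fs2.listStatusIterator(new Path("/services"));
    } catch (Exception e) {
      System.out.println("after close: " + e); // "Filesystem closed" on HDFS
    }

    // An uncached instance is owned by its creator and safe to close.
    try (FileSystem own = FileSystem.newInstance(conf)) {
      System.out.println("own uri: " + own.getUri());
    }
  }
}
{code}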



--
This message was