[jira] [Updated] (YARN-3852) Add docker container support to container-executor
     [ https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abin Shahab updated YARN-3852:
------------------------------
    Attachment: YARN-3852-2.patch

> Add docker container support to container-executor
> ---------------------------------------------------
>
>                 Key: YARN-3852
>                 URL: https://issues.apache.org/jira/browse/YARN-3852
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Sidharta Seethana
>            Assignee: Abin Shahab
>         Attachments: YARN-3852-1.patch, YARN-3852-2.patch, YARN-3852.patch
>
>
> For security reasons, we need to ensure that access to the docker daemon and
> the ability to run docker containers is restricted to privileged users (i.e.
> users running applications should not have direct access to docker). In order
> to ensure the node manager can run docker commands, we need to add docker
> support to the container-executor binary.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
     [ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636369#comment-14636369 ]

Brahma Reddy Battula commented on YARN-3528:
--------------------------------------------

Yes, I was on leave these days. Do you have any other comments apart from [~varun_saxena]'s?

> Tests with 12345 as hard-coded port break jenkins
> --------------------------------------------------
>
>                 Key: YARN-3528
>                 URL: https://issues.apache.org/jira/browse/YARN-3528
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>         Environment: ASF Jenkins
>            Reporter: Steve Loughran
>            Assignee: Brahma Reddy Battula
>            Priority: Blocker
>              Labels: test
>         Attachments: YARN-3528-002.patch, YARN-3528.patch
>
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to
> come up on.
> This makes it impossible to have scheduled or precommit tests run
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and
> appear to get ignored completely.
> A quick grep of "12345" shows up many places in the test suite where this
> practice has developed:
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> * others
> This needs to be addressed through port scanning and dynamic port allocation.
> Please can someone do this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
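[Editor's note] The remedy the report asks for is dynamic port allocation. A minimal sketch of how test setup code can obtain a currently free port instead of hard-coding 12345; this is an illustrative helper, not code from the attached patches:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortUtil {
    /**
     * Ask the OS for a free ephemeral port by binding to port 0, then
     * release it so the test service can bind to it. There is a small race
     * window between release and reuse, so test harnesses typically wrap
     * this pattern in a retry loop when starting the service.
     */
    public static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            socket.setReuseAddress(true);
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("allocated free port: " + findFreePort());
    }
}
```

Each concurrent Jenkins executor then gets its own port, which is exactly what the fixed value 12345 prevented.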
[jira] [Commented] (YARN-451) Add more metrics to RM page
     [ https://issues.apache.org/jira/browse/YARN-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636358#comment-14636358 ]

Joep Rottinghuis commented on YARN-451:
---------------------------------------

Just for the record: at Twitter we've been running with YARN-2417 in production and are finding it very useful in clusters of many thousands of nodes with tens of thousands of jobs a day.

> Add more metrics to RM page
> ---------------------------
>
>                 Key: YARN-451
>                 URL: https://issues.apache.org/jira/browse/YARN-451
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.0.3-alpha
>            Reporter: Lohit Vijayarenu
>            Assignee: Sangjin Lee
>         Attachments: in_progress_2x.png, yarn-451-trunk-20130916.1.patch
>
>
> The ResourceManager web UI shows the list of RUNNING applications, but it
> does not show which applications are requesting more resources than others.
> With a cluster running hundreds of applications at once, it would be useful
> to have some kind of metric to show high-resource-usage applications vs.
> low-resource-usage ones. At a minimum, showing the number of containers
> would be a good option.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-3952) Fix new findbugs warnings in resourcemanager in YARN-2928 branch
[ https://issues.apache.org/jira/browse/YARN-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3952: --- Description: {noformat} {noformat} was: {noformat} {noformat} > Fix new findbugs warnings in resourcemanager in YARN-2928 branch > > > Key: YARN-3952 > URL: https://issues.apache.org/jira/browse/YARN-3952 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > > {noformat} > classname='org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher'> > message='Unchecked/unconfirmed cast from > org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to > org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptFinishedEvent > in > org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)' > lineNumber='79'/> > message='Unchecked/unconfirmed cast from > org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to > org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptRegisteredEvent > in > org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)' > lineNumber='76'/> > message='Unchecked/unconfirmed cast from > org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to > org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationACLsUpdatedEvent > in > org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)' > lineNumber='73'/> > message='Unchecked/unconfirmed cast from > org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to > org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationCreatedEvent > in > org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)' > 
lineNumber='67'/> > message='Unchecked/unconfirmed cast from > org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to > org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationFinishedEvent > in > org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)' > lineNumber='70'/> > message='Unchecked/unconfirmed cast from > org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to > org.apache.hadoop.yarn.server.resourcemanager.metrics.ContainerCreatedEvent > in > org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)' > lineNumber='82'/> > message='Unchecked/unconfirmed cast from > org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to > org.apache.hadoop.yarn.server.resourcemanager.metrics.ContainerFinishedEvent > in > org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)' > lineNumber='85'/> > > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3250) Support admin cli interface in for Application Priority
     [ https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3250:
--------------------------
    Summary: Support admin cli interface in for Application Priority  (was: Support admin cli interface in Application Priority Manager (server side))

> Support admin cli interface in for Application Priority
> --------------------------------------------------------
>
>                 Key: YARN-3250
>                 URL: https://issues.apache.org/jira/browse/YARN-3250
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Sunil G
>            Assignee: Sunil G
>
>
> The current Application Priority Manager supports configuration only via a
> file. To support runtime configuration through the admin cli and REST, a
> common management interface has to be added which can be shared with
> NodeLabelsManager.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3176) In Fair Scheduler, child queue should inherit maxApp from its parent
     [ https://issues.apache.org/jira/browse/YARN-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636244#comment-14636244 ]

Joep Rottinghuis commented on YARN-3176:
----------------------------------------

For the record, we're running with this patch in production at Twitter.

> In Fair Scheduler, child queue should inherit maxApp from its parent
> ---------------------------------------------------------------------
>
>                 Key: YARN-3176
>                 URL: https://issues.apache.org/jira/browse/YARN-3176
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Siqi Li
>            Assignee: Siqi Li
>         Attachments: YARN-3176.v1.patch, YARN-3176.v2.patch
>
>
> If a child queue does not have a maxRunningApps limit, it falls back to
> queueMaxAppsDefault. This behavior is not quite right, since
> queueMaxAppsDefault is normally a small number, whereas some parent queues
> do have maxRunningApps set higher than the default.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
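[Editor's note] For context, the limits under discussion live in the fair scheduler allocation file (fair-scheduler.xml). A minimal hypothetical fragment illustrating the situation; the queue names and numbers are invented for the example:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Small cluster-wide fallback for queues without an explicit limit. -->
  <queueMaxAppsDefault>5</queueMaxAppsDefault>

  <queue name="research">
    <maxRunningApps>100</maxRunningApps>
    <!-- team-a and team-b set no maxRunningApps of their own. Today they
         fall back to queueMaxAppsDefault (5); the issue proposes they
         inherit the parent's 100 instead. -->
    <queue name="team-a"/>
    <queue name="team-b"/>
  </queue>
</allocations>
```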
[jira] [Commented] (YARN-445) Ability to signal containers
     [ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636221#comment-14636221 ]

Joep Rottinghuis commented on YARN-445:
---------------------------------------

Can we rekindle this discussion? We've had folks ask how we let users debug their own containers at Twitter, and the answer is that we're running with the patch supplied by Ming. Giving users a mechanism to jstack is absolutely awesome. In fact, we're using a capability in our JVM that lets users do a perf record/perf report right from a link on the UI using the very same mechanism.

> Ability to signal containers
> -----------------------------
>
>                 Key: YARN-445
>                 URL: https://issues.apache.org/jira/browse/YARN-445
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: nodemanager
>            Reporter: Jason Lowe
>              Labels: BB2015-05-TBR
>         Attachments: MRJob.png, MRTasks.png, YARN-445--n2.patch,
>                      YARN-445--n3.patch, YARN-445--n4.patch,
>                      YARN-445-signal-container-via-rm.patch, YARN-445.patch, YARNContainers.png
>
>
> It would be nice if an ApplicationMaster could send signals to containers,
> such as SIGQUIT, SIGUSR1, etc.
> For example, in order to replicate the jstack-on-task-timeout feature
> implemented by MAPREDUCE-1119 in Hadoop 0.21, the NodeManager needs an
> interface for sending SIGQUIT to a container. For that specific feature we
> could implement it as an additional field in the StopContainerRequest.
> However, that would not address other potential features like the ability
> for an AM to trigger jstacks on arbitrary tasks *without* killing them. The
> latter feature would be a very useful debugging tool for users who do not
> have shell access to the nodes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
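[Editor's note] The jstack-without-killing behavior discussed here boils down to delivering SIGQUIT to a JVM, which makes HotSpot print a full thread dump to its stderr and keep running. A rough standalone sketch of that mechanism; the helper and the external `kill` invocation are illustrative, not the patch's actual API:

```java
import java.io.IOException;

public class SignalDemo {
    /**
     * Deliver SIGQUIT to a process by pid. A HotSpot JVM responds by
     * writing a thread dump (what jstack shows) to stderr and continues
     * running; most non-JVM processes are terminated by it instead.
     * Assumes a POSIX `kill` binary on the PATH.
     */
    public static void sendSigQuit(long pid) throws IOException, InterruptedException {
        Process kill = new ProcessBuilder("kill", "-QUIT", Long.toString(pid)).start();
        kill.waitFor();
    }
}
```

In YARN's case the NodeManager would do the signalling on the AM's behalf, since the AM has no shell access to the node.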
[jira] [Updated] (YARN-3952) Fix new findbugs warning in resourcemanager in YARN-2928 branch
     [ https://issues.apache.org/jira/browse/YARN-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-3952:
-------------------------------
     Issue Type: Sub-task  (was: Bug)
         Parent: YARN-2928

> Fix new findbugs warning in resourcemanager in YARN-2928 branch
> ----------------------------------------------------------------
>
>                 Key: YARN-3952
>                 URL: https://issues.apache.org/jira/browse/YARN-3952
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-3952) Fix new findbugs warnings in resourcemanager in YARN-2928 branch
     [ https://issues.apache.org/jira/browse/YARN-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-3952:
-------------------------------
    Summary: Fix new findbugs warnings in resourcemanager in YARN-2928 branch  (was: Fix new findbugs warning in resourcemanager in YARN-2928 branch)

> Fix new findbugs warnings in resourcemanager in YARN-2928 branch
> -----------------------------------------------------------------
>
>                 Key: YARN-3952
>                 URL: https://issues.apache.org/jira/browse/YARN-3952
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-3952) Fix new findbugs warning in resourcemanager in YARN-2928 branch
     [ https://issues.apache.org/jira/browse/YARN-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-3952:
-------------------------------
    Description: {noformat} {noformat}

was: {noformat}

> Fix new findbugs warning in resourcemanager in YARN-2928 branch
> ----------------------------------------------------------------
>
>                 Key: YARN-3952
>                 URL: https://issues.apache.org/jira/browse/YARN-3952
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-3952) Fix new findbugs warning in resourcemanager in YARN-2928 branch
     [ https://issues.apache.org/jira/browse/YARN-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-3952:
-------------------------------
    Description: {noformat}

> Fix new findbugs warning in resourcemanager in YARN-2928 branch
> ----------------------------------------------------------------
>
>                 Key: YARN-3952
>                 URL: https://issues.apache.org/jira/browse/YARN-3952
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (YARN-3952) Fix new findbugs warning in resourcemanager in YARN-2928 branch
Varun Saxena created YARN-3952:
-------------------------------

             Summary: Fix new findbugs warning in resourcemanager in YARN-2928 branch
                 Key: YARN-3952
                 URL: https://issues.apache.org/jira/browse/YARN-3952
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: YARN-2928
            Reporter: Varun Saxena
            Assignee: Varun Saxena

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3874) Optimize and synchronize FS Reader and Writer Implementations
     [ https://issues.apache.org/jira/browse/YARN-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636204#comment-14636204 ]

Varun Saxena commented on YARN-3874:
------------------------------------

The test failures are unrelated. They are due to a recent commit which was reverted in the main branch; that revert has not yet reached the YARN-2928 branch.
The findbugs warnings in timelineservice are related, primarily to default encoding. Will fix.
The ones in resourcemanager are not related. Will raise a separate issue for them.

> Optimize and synchronize FS Reader and Writer Implementations
> --------------------------------------------------------------
>
>                 Key: YARN-3874
>                 URL: https://issues.apache.org/jira/browse/YARN-3874
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>         Attachments: YARN-3874-YARN-2928.01.patch, YARN-3874-YARN-2928.02.patch
>
>
> Combine FS Reader and Writer Implementations and make them consistent with
> each other.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3903) Disable preemption at Queue level for Fair Scheduler
     [ https://issues.apache.org/jira/browse/YARN-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636173#comment-14636173 ]

He Tianyi commented on YARN-3903:
---------------------------------

The latter one. I encountered a requirement to prevent particular jobs from being preempted. This can be done either at queue level or at job level. IMHO, queue level has the advantage of transparency over job level.

> Disable preemption at Queue level for Fair Scheduler
> -----------------------------------------------------
>
>                 Key: YARN-3903
>                 URL: https://issues.apache.org/jira/browse/YARN-3903
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>    Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0
>         Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 (2014-12-08) x86_64
>            Reporter: He Tianyi
>            Priority: Trivial
>         Attachments: YARN-3093.1.patch, YARN-3093.2.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> YARN-2056 supports disabling preemption at queue level for the
> CapacityScheduler. For the fair scheduler, we recently encountered the
> same need.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
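[Editor's note] A hypothetical sketch of what the proposed queue-level switch could look like in fair-scheduler.xml. The element name and queue names are illustrative assumptions, not the patch's actual configuration; the real name depends on whichever patch lands:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Hypothetical: containers in "latency-critical" would never be
       preempted, while "batch" keeps the default (preemptable) behavior,
       mirroring what YARN-2056 added for the CapacityScheduler. -->
  <queue name="latency-critical">
    <allowPreemptionFrom>false</allowPreemptionFrom>
  </queue>
  <queue name="batch"/>
</allocations>
```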
[jira] [Commented] (YARN-3950) Add unique SHELL_ID environment variable to DistributedShell
     [ https://issues.apache.org/jira/browse/YARN-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636134#comment-14636134 ]

Allen Wittenauer commented on YARN-3950:
----------------------------------------

This needs to be either YARN_SHELL_ID or HADOOP_SHELL_ID. Having a 'naked' SHELL_ID pollutes the shell environment space.

> Add unique SHELL_ID environment variable to DistributedShell
> -------------------------------------------------------------
>
>                 Key: YARN-3950
>                 URL: https://issues.apache.org/jira/browse/YARN-3950
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: applications/distributed-shell
>    Affects Versions: 2.8.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: YARN-3950.001.patch
>
>
> As discussed in [this comment|https://issues.apache.org/jira/browse/MAPREDUCE-6415?focusedCommentId=14636027&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14636027],
> it would be useful to have a monotonically increasing and independent ID of
> some kind that is unique per shell in the distributed shell program.
> We can do that by adding a SHELL_ID env var.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
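[Editor's note] A hypothetical user script showing how the per-shell ID would be consumed once exported. The variable name YARN_SHELL_ID follows the naming suggestion in the comment above and is an assumption, not the committed name:

```shell
#!/bin/sh
# Hypothetical DistributedShell user script: each shell instance uses the
# proposed YARN_SHELL_ID variable to write to its own output file instead
# of racing with its siblings on a shared path.
id="${YARN_SHELL_ID:-0}"
echo "shell instance ${id} started" > "part-${id}.log"
cat "part-${id}.log"
```

Without a unique, monotonically increasing ID, every shell launched by the AM would see an identical environment and have no cheap way to distinguish itself.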
[jira] [Commented] (YARN-3950) Add unique SHELL_ID environment variable to DistributedShell
     [ https://issues.apache.org/jira/browse/YARN-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636122#comment-14636122 ]

Hadoop QA commented on YARN-3950:
---------------------------------

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 15m 39s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 0m 22s | The applied patch generated 2 new checkstyle issues (total was 46, now 48). |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 21s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 0m 42s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 6m 57s | Tests passed in hadoop-yarn-applications-distributedshell. |
| | | 43m 27s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12746452/YARN-3950.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 31f1171 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8602/artifact/patchprocess/diffcheckstylehadoop-yarn-applications-distributedshell.txt |
| hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8602/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8602/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8602/console |

This message was automatically generated.

> Add unique SHELL_ID environment variable to DistributedShell
> -------------------------------------------------------------
>
>                 Key: YARN-3950
>                 URL: https://issues.apache.org/jira/browse/YARN-3950
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: applications/distributed-shell
>    Affects Versions: 2.8.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: YARN-3950.001.patch
>
>
> As discussed in [this comment|https://issues.apache.org/jira/browse/MAPREDUCE-6415?focusedCommentId=14636027&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14636027],
> it would be useful to have a monotonically increasing and independent ID of
> some kind that is unique per shell in the distributed shell program.
> We can do that by adding a SHELL_ID env var.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3874) Optimize and synchronize FS Reader and Writer Implementations
[ https://issues.apache.org/jira/browse/YARN-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636107#comment-14636107 ] Hadoop QA commented on YARN-3874: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 21s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 7m 49s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 48s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 13s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 21s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 58s | The patch appears to introduce 13 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 112m 28s | Tests passed in hadoop-mapreduce-client-jobclient. | | {color:green}+1{color} | yarn tests | 14m 3s | Tests passed in hadoop-yarn-applications-distributedshell. | | {color:red}-1{color} | yarn tests | 51m 42s | Tests failed in hadoop-yarn-server-resourcemanager. | | {color:green}+1{color} | yarn tests | 1m 30s | Tests passed in hadoop-yarn-server-timelineservice. 
| | | | 221m 50s | |
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
| FindBugs | module:hadoop-yarn-server-timelineservice |
| Failed unit tests | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2 |
| | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates |
| | hadoop.yarn.server.resourcemanager.TestApplicationCleanup |
| | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
| | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
| | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12746423/YARN-3874-YARN-2928.02.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-2928 / eb1932d |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8601/artifact/patchprocess/whitespace.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8601/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8601/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-timelineservice.html |
| hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/8601/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt |
| hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8601/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8601/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| hadoop-yarn-server-timelineservice test log |
https://builds.apache.org/job/PreCommit-YARN-Build/8601/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8601/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8601/console |
This message was automatically generated. > Optimize and synchronize FS Reader and Writer Implementations > - > > Key: YARN-3874 > URL: https://issues.apache.org/jira/browse/YARN-3874 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3874-YARN-2928.01.patch, > YARN-3874-YARN-2928.02.patch > > > Combine FS Reader and Writer Implementations and make them consistent with > each other. -- This message was sent by Atlassian JIRA (v6.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636089#comment-14636089 ] Naganarasimha G R commented on YARN-3045: - The test case failures and the whitespace issue are not related to this patch. I will try to rectify the whitespace issue along with any other review comments, and I have raised YARN-3941 for the test case failures. So either [~djp] or [~sjlee0] can further review this jira. > [Event producers] Implement NM writing container lifecycle events to ATS > > > Key: YARN-3045 > URL: https://issues.apache.org/jira/browse/YARN-3045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3045-YARN-2928.002.patch, > YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, > YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, > YARN-3045.20150420-1.patch > > > Per design in YARN-2928, implement NM writing container lifecycle events and > container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3951) Test case failures in TestLogAggregationService, TestResourceLocalizationService & TestContainer
Naganarasimha G R created YARN-3951: --- Summary: Test case failures in TestLogAggregationService, TestResourceLocalizationService & TestContainer Key: YARN-3951 URL: https://issues.apache.org/jira/browse/YARN-3951 Project: Hadoop YARN Issue Type: Bug Reporter: Naganarasimha G R Assignee: Naganarasimha G R Found some test case failures in the YARN-3045 build that are not related to the YARN-3045 patch:
TestContainer.testKillOnLocalizedWhenContainerNotLaunched
{quote}
java.lang.AssertionError: expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.TestContainer.testKillOnLocalizedWhenContainerNotLaunched(TestContainer.java:413)
{quote}
TestResourceLocalizationService.testLocalizationHeartbeat
{quote}
Wanted but not invoked: eventHandler.handle( );
-> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testLocalizationHeartbeat(TestResourceLocalizationService.java:900)
Actually, there were zero interactions with this mock.
{quote}
TestResourceLocalizationService.testPublicResourceAddResourceExceptions
{quote}
java.lang.AssertionError: expected null, but was:<\{ \{ file:/local/PRIVATE/ef9783a7514fda92, 2411, FILE, null \},pending,\[(container_314159265358979_0003_01_42)\],2661055154305048,DOWNLOADING}>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotNull(Assert.java:664)
at org.junit.Assert.assertNull(Assert.java:646)
at org.junit.Assert.assertNull(Assert.java:656)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testPublicResourceAddResourceExceptions(TestResourceLocalizationService.java:1366)
{quote}
TestLogAggregationService.testLogAggregationCreateDirsFailsWithoutKillingNM
{quote}
org.mortbay.util.MultiException: Multiple exceptions
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.checkEvents(TestLogAggregationService.java:1046)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationCreateDirsFailsWithoutKillingNM(TestLogAggregationService.java:736)
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3950) Add unique SHELL_ID environment variable to DistributedShell
[ https://issues.apache.org/jira/browse/YARN-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-3950: Attachment: YARN-3950.001.patch > Add unique SHELL_ID environment variable to DistributedShell > > > Key: YARN-3950 > URL: https://issues.apache.org/jira/browse/YARN-3950 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications/distributed-shell >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-3950.001.patch > > > As discussed in [this > comment|https://issues.apache.org/jira/browse/MAPREDUCE-6415?focusedCommentId=14636027&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14636027], > it would be useful to have a monotonically increasing and independent ID of > some kind that is unique per shell in the distributed shell program. > We can do that by adding a SHELL_ID env var. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3950) Add unique SHELL_ID environment variable to DistributedShell
Robert Kanter created YARN-3950: --- Summary: Add unique SHELL_ID environment variable to DistributedShell Key: YARN-3950 URL: https://issues.apache.org/jira/browse/YARN-3950 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter As discussed in [this comment|https://issues.apache.org/jira/browse/MAPREDUCE-6415?focusedCommentId=14636027&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14636027], it would be useful to have a monotonically increasing and independent ID of some kind that is unique per shell in the distributed shell program. We can do that by adding a SHELL_ID env var. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
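[Editor's note] The proposal above is small enough to sketch: hand each launched shell a monotonically increasing ID through its environment. The class and method names below are hypothetical, a minimal illustration only; the actual patch wires the ID into the DistributedShell ApplicationMaster's container launch context.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: ShellIdAssigner and envForNextShell are invented
// names, not part of the YARN-3950 patch.
class ShellIdAssigner {
    private final AtomicInteger nextId = new AtomicInteger(1);

    // Build the extra environment for the next shell container, carrying a
    // monotonically increasing, per-shell-unique SHELL_ID.
    Map<String, String> envForNextShell() {
        Map<String, String> env = new HashMap<>();
        env.put("SHELL_ID", Integer.toString(nextId.getAndIncrement()));
        return env;
    }
}
```

Each shell can then read {{SHELL_ID}} from its environment to distinguish itself from its siblings, which is exactly the "monotonically increasing and independent ID" the description asks for.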
[jira] [Commented] (YARN-3926) Extend the YARN resource model for easier resource-type management and profiles
[ https://issues.apache.org/jira/browse/YARN-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636068#comment-14636068 ] Varun Vasudev commented on YARN-3926: - Thanks for the feedback [~kasha]! I'm fine with changing the config variable nomenclature to the one you suggested. I just wanted to clarify that simply using the same config file won't avoid the questions you raised (specifically 2 and 3). One way we can avoid (2) and (3) is to have versioned resource configs, but I think that's a little complex. We could mitigate the issue by building some tools to verify that a proposed config file would work with the existing RM/NM. I'm open to suggestions. With regards to node labels, I had initial conversations with [~leftnoteasy], but I haven't thought through the model in enough detail. My initial thinking is that we would modify the ResourceMapEntry to add a string/list of strings which can be used to specify node labels. > Extend the YARN resource model for easier resource-type management and > profiles > --- > > Key: YARN-3926 > URL: https://issues.apache.org/jira/browse/YARN-3926 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Proposal for modifying resource model and profiles.pdf > > > Currently, there are efforts to add support for various resource-types such > as disk(YARN-2139), network(YARN-2140), and HDFS bandwidth(YARN-2681). These > efforts all aim to add support for a new resource type and are fairly > involved efforts. In addition, once support is added, it becomes harder for > users to specify the resources they need. All existing jobs have to be > modified, or have to use the minimum allocation. > This ticket is a proposal to extend the YARN resource model to a more > flexible model which makes it easier to support additional resource-types. 
It > also considers the related aspect of “resource profiles” which allow users to > easily specify the various resources they need for any given container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636058#comment-14636058 ] Li Lu commented on YARN-3908: - Hi [~sjlee0], I think the code you posted here belongs to timeline v1 (o.a.h.yarn.api.records.timeline.*), but the v2 version is in o.a.h.yarn.api.records.timelineservice.*. TimelineEvent in v2, modified in YARN-3836, does use id for all related tasks. We're no longer using event info for equality check in that version. > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, > YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, > YARN-3908-YARN-2928.005.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636057#comment-14636057 ] Sangjin Lee commented on YARN-3908: --- Sorry my bad. I mistakenly pulled up the v.1 of {{TimelineEvent}}. Our version uses only the id and the timestamp for equality:
{code:title=TimelineEvent.java|borderStyle=solid}
@Override
public int hashCode() {
  int result = (int) (timestamp ^ (timestamp >>> 32));
  result = 31 * result + id.hashCode();
  return result;
}

@Override
public boolean equals(Object o) {
  if (this == o)
    return true;
  if (!(o instanceof TimelineEvent))
    return false;
  TimelineEvent event = (TimelineEvent) o;
  if (timestamp != event.timestamp)
    return false;
  if (!id.equals(event.id)) {
    return false;
  }
  return true;
}

@Override
public int compareTo(TimelineEvent other) {
  if (timestamp > other.timestamp) {
    return -1;
  } else if (timestamp < other.timestamp) {
    return 1;
  } else {
    return id.compareTo(other.id);
  }
}
{code}
So that answers my first question. Sorry for the confusion! Only the second question remains... > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, > YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, > YARN-3908-YARN-2928.005.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
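[Editor's note] The quoted semantics can be exercised with a minimal stand-alone stand-in. This is NOT the real org.apache.hadoop.yarn.api.records.timelineservice.TimelineEvent, just a sketch mirroring its hashCode/equals/compareTo so the id-plus-timestamp identity and newest-first ordering are easy to check:

```java
// Stand-in for the v2 TimelineEvent equality semantics quoted above;
// the class name EventKey is hypothetical.
class EventKey implements Comparable<EventKey> {
    final String id;
    final long timestamp;

    EventKey(String id, long timestamp) {
        this.id = id;
        this.timestamp = timestamp;
    }

    @Override
    public int hashCode() {
        int result = (int) (timestamp ^ (timestamp >>> 32));
        return 31 * result + id.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof EventKey)) return false;
        EventKey e = (EventKey) o;
        // Only id and timestamp matter; event info is ignored, per YARN-3836.
        return timestamp == e.timestamp && id.equals(e.id);
    }

    // Newer events sort first; ties fall back to the id.
    @Override
    public int compareTo(EventKey other) {
        if (timestamp > other.timestamp) return -1;
        if (timestamp < other.timestamp) return 1;
        return id.compareTo(other.id);
    }
}
```

With this, two events with the same id and timestamp are equal regardless of their info maps, which is the property the schema discussion below relies on.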
[jira] [Commented] (YARN-3949) ensure timely flush of timeline writes
[ https://issues.apache.org/jira/browse/YARN-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636056#comment-14636056 ] Sangjin Lee commented on YARN-3949: --- For background, HBase's {{BufferedMutatorImpl}} does flush writes to region servers when the write buffer becomes full per configuration. In a high throughput situation, writes should normally appear on the storage in a timely manner. However, it still remains the case that there is no hard guarantee that the data will be available on the storage by "X seconds/minutes". This problem will be more pronounced if the writer is mostly idle. Here is one proposal. First, we would introduce {{flush()}} on the {{TimelineWriter}} interface. Users of {{TimelineWriter}} would call {{flush()}} to force writes to be flushed to the backend storage. Implementations of {{TimelineWriter}} would implement {{flush()}} as appropriate for the respective storage. In the case of HBase, it would result in {{BufferedMutator.flush()}} for all tables. Second, we would implement periodic invocation of {{TimelineWriter.flush()}} in the layer that calls {{TimelineWriter}}, where the frequency of flush is configurable. For example, in the timeline collector we could have a background thread that calls {{TimelineWriter.flush()}} regularly. The {{flush()}} method may also be called for critical writes such as lifecycle events. In those cases, the timeline collector code could call {{TimelineWriter.write()}} followed by {{TimelineWriter.flush()}} before returning to the caller. Let me know what you think of the proposal. Thanks! > ensure timely flush of timeline writes > -- > > Key: YARN-3949 > URL: https://issues.apache.org/jira/browse/YARN-3949 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > > Currently flushing of timeline writes is not really handled. 
For example, > {{HBaseTimelineWriterImpl}} relies on HBase's {{BufferedMutator}} to batch > and write puts asynchronously. However, {{BufferedMutator}} may not flush > them to HBase unless the internal buffer fills up. > We do need a flush functionality first to ensure that data are written in a > reasonably timely manner, and to be able to ensure some critical writes are > done synchronously (e.g. key lifecycle events). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
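[Editor's note] A rough sketch of the two-part proposal above. Only the idea of adding {{TimelineWriter.flush()}} comes from the comment; all other names and signatures here are illustrative (the real API would deal in timeline entities and checked IOExceptions, both simplified away for brevity):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Simplified, hypothetical shape of the writer interface with the proposed
// flush() addition.
interface TimelineWriter {
    void write(String entity);
    void flush();   // proposed: force buffered writes out to the backing store
}

// Caller-side layer (e.g. the timeline collector) doing periodic and
// synchronous-on-critical-write flushes, as the proposal describes.
class TimelineCollectorSketch {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final TimelineWriter writer;

    TimelineCollectorSketch(TimelineWriter writer) {
        this.writer = writer;
    }

    // Part two of the proposal: periodic flush with a configurable period.
    void startPeriodicFlush(long periodSeconds) {
        scheduler.scheduleWithFixedDelay(
            writer::flush, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    // Critical writes (e.g. lifecycle events) flush synchronously before
    // returning to the caller.
    void writeCritical(String entity) {
        writer.write(entity);
        writer.flush();
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

Keeping the periodic flush in the calling layer, rather than inside each writer, lets every {{TimelineWriter}} implementation stay a thin adapter over its storage while the flush policy is configured in one place.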
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636050#comment-14636050 ] Sangjin Lee commented on YARN-3908: --- Hmm, then isn't this incorrect?
{code:title=TimelineEvent.java|borderStyle=solid}
@Override
public int compareTo(TimelineEvent other) {
  if (timestamp > other.timestamp) {
    return -1;
  } else if (timestamp < other.timestamp) {
    return 1;
  } else {
    return eventType.compareTo(other.eventType);
  }
}

@Override
public boolean equals(Object o) {
  if (this == o)
    return true;
  if (o == null || getClass() != o.getClass())
    return false;
  TimelineEvent event = (TimelineEvent) o;
  if (timestamp != event.timestamp)
    return false;
  if (!eventType.equals(event.eventType))
    return false;
  if (eventInfo != null ? !eventInfo.equals(event.eventInfo) : event.eventInfo != null)
    return false;
  return true;
}

@Override
public int hashCode() {
  int result = (int) (timestamp ^ (timestamp >>> 32));
  result = 31 * result + eventType.hashCode();
  result = 31 * result + (eventInfo != null ? eventInfo.hashCode() : 0);
  return result;
}
{code}
First of all, id is not even used. Instead type is used. Also, event info is part of the equality semantics. > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, > YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, > YARN-3908-YARN-2928.005.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636009#comment-14636009 ] Li Lu commented on YARN-3908: - Hi [~sjlee0], I don't think we're still using event type in new TimelineEvent v2. However, the behavior you mentioned is quite consistent with the v1 TimelineEvent. Could you please double check this? Thanks! > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, > YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, > YARN-3908-YARN-2928.005.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635996#comment-14635996 ] Sangjin Lee commented on YARN-3908: --- [~jrottinghuis], [~vrushalic], and I had offline chats, and we feel that we may need to revisit how we store events. Currently (with this patch) we store the event with the column name "e!eventId?infoKey" and the column value being the info value. The event timestamp is stored as the cell timestamp. We're realizing that this may not be a correct way to store events. I'm basing this on the [discussion|https://issues.apache.org/jira/browse/YARN-3836?focusedCommentId=14619729&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14619729] we had when we talked about the equality and identity semantics of {{TimelineEvent}}. Namely, the id *and* the timestamp form the identity of a {{TimelineEvent}}. Then I think storing the timestamp in the HBase cell timestamp does not work. Some questions for you, [~zjshen] and [~gtCarrera9]. (1) *What defines the identity of a {{TimelineEvent}}?* Is it the event id + timestamp? How about the event type? If you look at the {{equals()}} and the {{hashCode()}} implementations of {{TimelineEvent}}, it uses the timestamp, the event type, and even the info as a whole, but the id is not used for equality. How does that square with the stated intent that the event id and the timestamp form the identity? (2) *What would be the access pattern for {{TimelineEvent}}s?* Is pretty much the only access pattern "give me all the events that belong to this entity"? Also specifically, would you ever query for an event with the id *and* the timestamp? It is not reasonable to expect readers to provide the event timestamp for queries, right? Would you also query for just the event id? What other access patterns need to be supported? Clarifying those things would help us correctly implement the schema. Thanks! 
> Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, > YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, > YARN-3908-YARN-2928.005.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
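[Editor's note] The column layout under discussion can be stated as a tiny encoder. The separators below mirror the "e!eventId?infoKey" text in the comment above, but the authoritative constants live in the HBase storage classes on the YARN-2928 branch and may differ; this class is illustrative only:

```java
// Hypothetical encoder for the event column qualifier discussed above.
final class EventColumnName {
    static final String EVENT_PREFIX = "e";
    static final String ID_SEPARATOR = "!";
    static final String INFO_SEPARATOR = "?";

    private EventColumnName() {}

    // Column qualifier for one info entry of one event. The event timestamp
    // is (per the current patch) carried in the HBase cell timestamp rather
    // than in the qualifier, which is exactly what the comment questions.
    static String of(String eventId, String infoKey) {
        return EVENT_PREFIX + ID_SEPARATOR + eventId + INFO_SEPARATOR + infoKey;
    }
}
```

Spelled out this way, the problem is visible: two events with the same id but different timestamps collide on the same qualifier, so the cell timestamp alone must disambiguate them, which is what the identity discussion casts doubt on.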
[jira] [Commented] (YARN-1645) ContainerManager implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635991#comment-14635991 ] Wangda Tan commented on YARN-1645: -- +1 to latest patch, thanks [~mding]. > ContainerManager implementation to support container resizing > - > > Key: YARN-1645 > URL: https://issues.apache.org/jira/browse/YARN-1645 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Wangda Tan >Assignee: MENG DING > Attachments: YARN-1645-YARN-1197.3.patch, > YARN-1645-YARN-1197.4.patch, YARN-1645-YARN-1197.5.patch, YARN-1645.1.patch, > YARN-1645.2.patch, yarn-1645.1.patch > > > Implementation of ContainerManager for container resize, including: > 1) ContainerManager resize logic > 2) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2921) Fix MockRM/MockAM#waitForState sleep too long
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-2921. Resolution: Fixed > Fix MockRM/MockAM#waitForState sleep too long > - > > Key: YARN-2921 > URL: https://issues.apache.org/jira/browse/YARN-2921 > Project: Hadoop YARN > Issue Type: Improvement > Components: test >Affects Versions: 2.6.0, 2.7.0 >Reporter: Karthik Kambatla >Assignee: Tsuyoshi Ozawa > Fix For: 2.8.0 > > Attachments: YARN-2921.001.patch, YARN-2921.002.patch, > YARN-2921.003.patch, YARN-2921.004.patch, YARN-2921.005.patch, > YARN-2921.006.patch, YARN-2921.007.patch, YARN-2921.008.patch, > YARN-2921.008.patch > > > MockRM#waitForState methods currently sleep for too long (2 seconds and 1 > second). This leads to slow tests and sometimes failures if the > App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-2921) Fix MockRM/MockAM#waitForState sleep too long
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer reopened YARN-2921: > Fix MockRM/MockAM#waitForState sleep too long > - > > Key: YARN-2921 > URL: https://issues.apache.org/jira/browse/YARN-2921 > Project: Hadoop YARN > Issue Type: Improvement > Components: test >Affects Versions: 2.6.0, 2.7.0 >Reporter: Karthik Kambatla >Assignee: Tsuyoshi Ozawa > Fix For: 2.8.0 > > Attachments: YARN-2921.001.patch, YARN-2921.002.patch, > YARN-2921.003.patch, YARN-2921.004.patch, YARN-2921.005.patch, > YARN-2921.006.patch, YARN-2921.007.patch, YARN-2921.008.patch, > YARN-2921.008.patch > > > MockRM#waitForState methods currently sleep for too long (2 seconds and 1 > second). This leads to slow tests and sometimes failures if the > App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3641) NodeManager: stopRecoveryStore() shouldn't be skipped when exceptions happen in stopping NM's sub-services.
[ https://issues.apache.org/jira/browse/YARN-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3641: --- Assignee: Junping Du (was: Allen Wittenauer) > NodeManager: stopRecoveryStore() shouldn't be skipped when exceptions happen > in stopping NM's sub-services. > --- > > Key: YARN-3641 > URL: https://issues.apache.org/jira/browse/YARN-3641 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, rolling upgrade >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.7.1 > > Attachments: YARN-3641.patch > > > If NM' services not get stopped properly, we cannot start NM with enabling NM > restart with work preserving. The exception is as following: > {noformat} > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock > /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource > temporarily unavailable > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:175) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:217) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:507) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:555) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: > lock /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: > Resource temporarily unavailable > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at 
org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:930) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2015-05-12 00:34:45,262 INFO nodemanager.NodeManager > (LogAdapter.java:info(45)) - SHUTDOWN_MSG: > / > SHUTDOWN_MSG: Shutting down NodeManager at > c6403.ambari.apache.org/192.168.64.103 > / > {noformat} > The related code is as below in NodeManager.java: > {code} > @Override > protected void serviceStop() throws Exception { > if (isStopping.getAndSet(true)) { > return; > } > super.serviceStop(); > stopRecoveryStore(); > DefaultMetricsSystem.shutdown(); > } > {code} > We can see we stop all NM registered services (NodeStatusUpdater, > LogAggregationService, ResourceLocalizationService, etc.) first. Any of > services get stopped with exception could cause stopRecoveryStore() get > skipped which means levelDB store is not get closed. So next time NM start, > it will get failed with exception above. > We should put stopRecoveryStore(); in a finally block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
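[Editor's note] The fix suggested above is easy to demonstrate in isolation. The class below is a stand-in that mirrors the quoted serviceStop() shape; it is not the real NodeManager, just a sketch of the try/finally change under the assumption that a sub-service throws during stop:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative stand-in for the proposed NodeManager fix; class and field
// names beyond serviceStop/stopRecoveryStore are hypothetical.
class ServiceStopSketch {
    private final AtomicBoolean isStopping = new AtomicBoolean(false);
    boolean storeClosed = false;

    // Simulates a registered sub-service failing during stop.
    void stopSubServices() throws Exception {
        throw new Exception("sub-service failed to stop");
    }

    void stopRecoveryStore() {
        storeClosed = true;   // in the real NM this closes the leveldb store
    }

    void serviceStop() throws Exception {
        if (isStopping.getAndSet(true)) {
            return;
        }
        try {
            stopSubServices();   // stands in for super.serviceStop()
        } finally {
            stopRecoveryStore(); // now runs even when a sub-service throws
        }
    }
}
```

With the finally block, the leveldb state store's lock is released even on a failed shutdown, so the next NM start no longer hits the "Resource temporarily unavailable" lock error shown in the trace.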
[jira] [Updated] (YARN-3641) NodeManager: stopRecoveryStore() shouldn't be skipped when exceptions happen in stopping NM's sub-services.
[ https://issues.apache.org/jira/browse/YARN-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3641: --- Reporter: Allen Wittenauer (was: Junping Du) > NodeManager: stopRecoveryStore() shouldn't be skipped when exceptions happen > in stopping NM's sub-services. > --- > > Key: YARN-3641 > URL: https://issues.apache.org/jira/browse/YARN-3641 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, rolling upgrade >Affects Versions: 2.6.0 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer >Priority: Critical > Fix For: 2.7.1 > > Attachments: YARN-3641.patch > > > If NM' services not get stopped properly, we cannot start NM with enabling NM > restart with work preserving. The exception is as following: > {noformat} > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock > /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource > temporarily unavailable > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:175) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:217) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:507) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:555) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: > lock /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: > Resource temporarily unavailable > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at 
org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:930) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2015-05-12 00:34:45,262 INFO nodemanager.NodeManager > (LogAdapter.java:info(45)) - SHUTDOWN_MSG: > / > SHUTDOWN_MSG: Shutting down NodeManager at > c6403.ambari.apache.org/192.168.64.103 > / > {noformat} > The related code is as below in NodeManager.java: > {code} > @Override > protected void serviceStop() throws Exception { > if (isStopping.getAndSet(true)) { > return; > } > super.serviceStop(); > stopRecoveryStore(); > DefaultMetricsSystem.shutdown(); > } > {code} > We can see we stop all NM registered services (NodeStatusUpdater, > LogAggregationService, ResourceLocalizationService, etc.) first. Any of > services get stopped with exception could cause stopRecoveryStore() get > skipped which means levelDB store is not get closed. So next time NM start, > it will get failed with exception above. > We should put stopRecoveryStore(); in a finally block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3641) NodeManager: stopRecoveryStore() shouldn't be skipped when exceptions happen in stopping NM's sub-services.
[ https://issues.apache.org/jira/browse/YARN-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3641: --- Reporter: Junping Du (was: Allen Wittenauer) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3641) NodeManager: stopRecoveryStore() shouldn't be skipped when exceptions happen in stopping NM's sub-services.
[ https://issues.apache.org/jira/browse/YARN-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer reassigned YARN-3641: -- Assignee: Allen Wittenauer (was: Junping Du) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1645) ContainerManager implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635954#comment-14635954 ] Jian He commented on YARN-1645: --- looks good, +1 > ContainerManager implementation to support container resizing > - > > Key: YARN-1645 > URL: https://issues.apache.org/jira/browse/YARN-1645 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Wangda Tan >Assignee: MENG DING > Attachments: YARN-1645-YARN-1197.3.patch, > YARN-1645-YARN-1197.4.patch, YARN-1645-YARN-1197.5.patch, YARN-1645.1.patch, > YARN-1645.2.patch, yarn-1645.1.patch > > > Implementation of ContainerManager for container resize, including: > 1) ContainerManager resize logic > 2) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3299) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-3299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong resolved YARN-3299. - Resolution: Fixed > Synchronize RM and Generic History Service Web-UIs > -- > > Key: YARN-3299 > URL: https://issues.apache.org/jira/browse/YARN-3299 > Project: Hadoop YARN > Issue Type: Task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong > > After YARN-1809, we are using the same protocol to fetch the information and > display in their webUI. RM webUI will use ApplicationClientProtocol, and > Generic History Service web ui will use ApplicationHistoryProtocol. Both of > them extend the same protocol. > Also, we have common appblock/attemptblock/containerblock shared by both RM > webUI and ATS webUI. > But we are still missing some information, such as outstanding resource > requests, preemption metrics, etc. > This ticket will be used as parent ticket to track all the remaining issues > for RM webUI and ATS webUI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3299) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-3299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635940#comment-14635940 ] Xuan Gong commented on YARN-3299: - Resolving this umbrella JIRA. And as new requirements/bugs come in, we can open new tickets. * Will leave the open sub-tasks as they are. * No fix-version as this was done across releases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635927#comment-14635927 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-trunk-Commit #8197 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8197/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. Contributed by Varun Saxena (jianhe: rev 393fe71771e3ac6bc0efe59d9aaf19d3576411b3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java > AsyncDispatcher can hang while stopping if it is configured for draining > events on stop > --- > > Key: YARN-3878 > URL: https://issues.apache.org/jira/browse/YARN-3878 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena >Priority: Critical > Fix For: 2.7.2 > > Attachments: YARN-3878.01.patch, YARN-3878.02.patch, > YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, > YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, > YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h > > > The sequence of events is as follows: > # The RM is stopped while an RMStateStore event is being put on the RMStateStore's > AsyncDispatcher queue. This leads to an InterruptedException being thrown. > # As the RM is being stopped, the RMStateStore's AsyncDispatcher is also stopped. On > {{serviceStop}}, we check whether all events have been drained and wait for the > event queue to drain (as the RM state store dispatcher is configured to drain its queue on stop). 
> # This condition never becomes true and AsyncDispatcher keeps on waiting > incessantly for dispatcher event queue to drain till JVM exits. > *Initial exception while posting RM State store event to queue* > {noformat} > 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService > (AbstractService.java:enterState(452)) - Service: Dispatcher entered state > STOPPED > 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher > thread interrupted > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) > at > java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) > at > java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) > {noformat} > *JStack of AsyncDispatcher hanging on stop* > {noformat} > "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e > waiting on condition [0x7fb9654e9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Nativ
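The hang can be reduced to a small self-contained model (hypothetical names below, not the actual AsyncDispatcher code): the stopping thread waits for the event queue to drain, but the handler thread that would drain it has already died from the InterruptedException, so nothing ever empties the queue. The sketch avoids the hang by also checking whether the handler thread is still alive before continuing to wait.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, simplified model of the reported hang and one way to avoid it.
public class DrainOnStopDemo {
    final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Returns true if the queue drained, false if we bailed out because the
    // handler thread is gone (waiting any longer would be pointless -- this
    // is the kind of extra condition the fix effectively adds).
    boolean waitForDrain(Thread handler) throws InterruptedException {
        while (!queue.isEmpty()) {
            if (!handler.isAlive()) {
                return false; // handler died; the queue will never drain
            }
            Thread.sleep(10);
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        DrainOnStopDemo d = new DrainOnStopDemo();
        d.queue.offer("pending-event");
        // Handler exits immediately, as if killed by an InterruptedException.
        Thread handler = new Thread(() -> { });
        handler.start();
        handler.join();
        System.out.println("drained=" + d.waitForDrain(handler));
    }
}
```

With an unconditional `while (!queue.isEmpty())` loop, this program would spin forever, which mirrors the WAITING jstack above where serviceStop never returns.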
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635897#comment-14635897 ] Jian He commented on YARN-3878: --- thanks ! committing this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635892#comment-14635892 ] Anubhav Dhoot commented on YARN-3878: - Agree this is ok to ignore -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3852) Add docker container support to container-executor
[ https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635879#comment-14635879 ] Varun Vasudev commented on YARN-3852: - Thanks for the latest patch [~ashahab]. Patch looks good to me, just a couple of minor changes - # In container-executor.c and container-executor.h {code} -int check_dir(const char* npath, mode_t st_mode, mode_t desired, int finalComponent) { +int check_dir(char* npath, mode_t st_mode, mode_t desired, int finalComponent) { {code} and {code} -int check_dir(const char* npath, mode_t st_mode, mode_t desired, +int check_dir(char* npath, mode_t st_mode, mode_t desired, int finalComponent); -int create_validate_dir(const char* npath, mode_t perm, const char* path, +int create_validate_dir(char* npath, mode_t perm, char* path, int finalComponent); {code} You've removed the const-ness of npath. # In container-executor.c {code} +int create_script_paths(const char *work_dir, + const char *script_name, const char *cred_file, + char** script_file_dest, char** cred_file_dest, + int* container_file_source, int* cred_file_source ) { {code} The rest of the patch looks good to me. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2019) Retrospect on the decision to crash the RM on any exception thrown in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635876#comment-14635876 ] Junping Du commented on YARN-2019: -- If so, I think we should at least differentiate the RM and NM policies - a user could be conservative about RM state store failures but aggressive about NM state store failures. Maybe use "yarn.resourcemanager.fail-fast" here? Then we can use "yarn.nodemanager.fail-fast" later, and possibly similar flags for other daemons (timeline service, etc.). > Retrospect on the decision to crash the RM on any exception thrown in > ZKRMStateStore > > > Key: YARN-2019 > URL: https://issues.apache.org/jira/browse/YARN-2019 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Junping Du >Assignee: Jian He >Priority: Critical > Labels: ha > Attachments: YARN-2019.1-wip.patch > > > Currently, if anything abnormal happens in ZKRMStateStore, it throws a fatal > exception that crashes the RM. As shown in YARN-1924, the cause could be an internal > RM HA bug rather than a truly fatal condition. We should revisit this decision, as > the HA feature is designed to protect a key component, not to disturb it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
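The per-daemon switch suggested in the comment above could look like the following plain-Java sketch. Only "yarn.resourcemanager.fail-fast" and "yarn.nodemanager.fail-fast" are named in the comment; the global "yarn.fail-fast" fallback key is my assumption, and java.util.Properties stands in for Hadoop's Configuration class.

```java
import java.util.Properties;

// Sketch of a per-daemon fail-fast switch with a global fallback, so the RM
// can be conservative about state-store failures while the NM is aggressive.
public class FailFastDemo {
    static boolean shouldFailFast(Properties conf, String daemonKey) {
        // "yarn.fail-fast" is a hypothetical cluster-wide default.
        String global = conf.getProperty("yarn.fail-fast", "false");
        return Boolean.parseBoolean(conf.getProperty(daemonKey, global));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("yarn.resourcemanager.fail-fast", "false");
        conf.setProperty("yarn.nodemanager.fail-fast", "true");
        // prints false, then true: each daemon gets its own policy
        System.out.println(shouldFailFast(conf, "yarn.resourcemanager.fail-fast"));
        System.out.println(shouldFailFast(conf, "yarn.nodemanager.fail-fast"));
    }
}
```

A daemon whose key is unset would inherit the global default, which keeps the behavior backward compatible when no flag is configured.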
[jira] [Commented] (YARN-3852) Add docker container support to container-executor
[ https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635872#comment-14635872 ] Abin Shahab commented on YARN-3852: --- The test failures are unrelated to the container-executor.c changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3874) Optimize and synchronize FS Reader and Writer Implementations
[ https://issues.apache.org/jira/browse/YARN-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3874: --- Summary: Optimize and synchronize FS Reader and Writer Implementations (was: Combine FS Reader and Writer Implementations) > Optimize and synchronize FS Reader and Writer Implementations > - > > Key: YARN-3874 > URL: https://issues.apache.org/jira/browse/YARN-3874 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3874-YARN-2928.01.patch, > YARN-3874-YARN-2928.02.patch > > > Combine FS Reader and Writer Implementations and make them consistent with > each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3874) Optimize and synchronize FS Reader and Writer Implementations
[ https://issues.apache.org/jira/browse/YARN-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3874: --- Attachment: YARN-3874-YARN-2928.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create a new session without an occurrence of SESSIONEXPIRED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635761#comment-14635761 ] Tsuyoshi Ozawa commented on YARN-3798: -- The test result is as follows: {quote} -1 overall. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated 48 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. {quote} The javadoc warnings look unrelated to the patch. > ZKRMStateStore shouldn't create a new session without an occurrence of > SESSIONEXPIRED > --- > > Key: YARN-3798 > URL: https://issues.apache.org/jira/browse/YARN-3798 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Varun Saxena >Priority: Blocker > Attachments: RM.log, YARN-3798-2.7.002.patch, > YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, > YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.005.patch, > YARN-3798-branch-2.7.006.patch, YARN-3798-branch-2.7.patch > > > The RM goes down with a NoNode exception while creating the znode for an app attempt. > *Please find the exception logs* > {code} > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 
2015-06-09 10:09:44,886 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Exception while executing a ZK operation. > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) 
> at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) > at > org.apache.hadoop
[jira] [Created] (YARN-3949) ensure timely flush of timeline writes
Sangjin Lee created YARN-3949: - Summary: ensure timely flush of timeline writes Key: YARN-3949 URL: https://issues.apache.org/jira/browse/YARN-3949 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Currently flushing of timeline writes is not really handled. For example, {{HBaseTimelineWriterImpl}} relies on HBase's {{BufferedMutator}} to batch and write puts asynchronously. However, {{BufferedMutator}} may not flush them to HBase unless the internal buffer fills up. We do need a flush functionality first to ensure that data are written in a reasonably timely manner, and to be able to ensure some critical writes are done synchronously (e.g. key lifecycle events). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
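The buffering behavior described above can be illustrated with a minimal, stdlib-only sketch — the class and method names below are hypothetical stand-ins, not the HBase {{BufferedMutator}} API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the problem described in YARN-3949: writes sit in an internal
// buffer and only reach the backing store when the buffer fills, unless an
// explicit flush() is exposed and called. All names are illustrative;
// this is not the HBase BufferedMutator API.
public class BufferedTimelineWriter {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> store;   // stands in for the backing table
    private final int capacity;

    public BufferedTimelineWriter(List<String> store, int capacity) {
        this.store = store;
        this.capacity = capacity;
    }

    public void write(String entity) {
        buffer.add(entity);
        if (buffer.size() >= capacity) {
            flush();                    // auto-flush only when full
        }
    }

    // The explicit flush the JIRA asks for: callable from a timer for
    // timeliness, or directly for critical writes such as lifecycle events.
    public void flush() {
        store.addAll(buffer);
        buffer.clear();
    }

    public int buffered() {
        return buffer.size();
    }
}
```

With only capacity-triggered flushing, entities written below the capacity threshold never reach the store — which is exactly the timeliness gap the issue describes.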
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635754#comment-14635754 ] Sangjin Lee commented on YARN-3908: --- I filed YARN-3949 to address the need for timely flush of writes. > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, > YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, > YARN-3908-YARN-2928.005.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3932) SchedulerApplicationAttempt#getResourceUsageReport should be based on NodeLabel
[ https://issues.apache.org/jira/browse/YARN-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635727#comment-14635727 ] Bibin A Chundatt commented on YARN-3932: [~leftnoteasy] checkstyle issue is already existing one and test failure are unrelated to this patch. > SchedulerApplicationAttempt#getResourceUsageReport should be based on > NodeLabel > --- > > Key: YARN-3932 > URL: https://issues.apache.org/jira/browse/YARN-3932 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-3932.patch, 0002-YARN-3932.patch, > 0003-YARN-3932.patch, 0004-YARN-3932.patch, 0005-YARN-3932.patch, > 0006-YARN-3932.patch, ApplicationReport.jpg, TestResult.jpg > > > Application Resource Report shown wrong when node Label is used. > 1.Submit application with NodeLabel > 2.Check RM UI for resources used > Allocated CPU VCores and Allocated Memory MB is always {{zero}} > {code} > public synchronized ApplicationResourceUsageReport getResourceUsageReport() { > AggregateAppResourceUsage runningResourceUsage = > getRunningAggregateAppResourceUsage(); > Resource usedResourceClone = > Resources.clone(attemptResourceUsage.getUsed()); > Resource reservedResourceClone = > Resources.clone(attemptResourceUsage.getReserved()); > return ApplicationResourceUsageReport.newInstance(liveContainers.size(), > reservedContainers.size(), usedResourceClone, reservedResourceClone, > Resources.add(usedResourceClone, reservedResourceClone), > runningResourceUsage.getMemorySeconds(), > runningResourceUsage.getVcoreSeconds()); > } > {code} > should be {{attemptResourceUsage.getUsed(label)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
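The suggested fix — reading used resources per node label rather than the unlabeled aggregate — can be sketched with a toy model (hypothetical names; not the actual SchedulerApplicationAttempt/ResourceUsage API):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the fix proposed in YARN-3932: track used resources per
// node label, and have the usage report read the per-label value instead
// of only the default partition. Names are illustrative, not the real
// YARN scheduler API.
public class LabeledResourceUsage {
    private final Map<String, Long> usedMemoryByLabel = new HashMap<>();

    public void allocate(String label, long memoryMb) {
        usedMemoryByLabel.merge(label, memoryMb, Long::sum);
    }

    // Analogous to attemptResourceUsage.getUsed() with no label argument:
    // only the default ("") partition is reported, so an app running
    // entirely on a labeled partition shows zero usage.
    public long getUsedDefaultPartition() {
        return usedMemoryByLabel.getOrDefault("", 0L);
    }

    // Analogous to the proposed attemptResourceUsage.getUsed(label).
    public long getUsed(String label) {
        return usedMemoryByLabel.getOrDefault(label, 0L);
    }
}
```

The toy model reproduces the reported symptom: an allocation under a label is invisible to the unlabeled lookup but visible to the label-aware one.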
[jira] [Commented] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
[ https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635721#comment-14635721 ] Hadoop QA commented on YARN-3460: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 54s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 8m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 25s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 24s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 49s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 0s | Tests passed in hadoop-yarn-registry. 
| | | | 40m 25s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731250/YARN-3460-3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 5137b38 | | hadoop-yarn-registry test log | https://builds.apache.org/job/PreCommit-YARN-Build/8600/artifact/patchprocess/testrun_hadoop-yarn-registry.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8600/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8600/console | This message was automatically generated. > Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM > > > Key: YARN-3460 > URL: https://issues.apache.org/jira/browse/YARN-3460 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 3.0.0, 2.6.0 > Environment: $ mvn -version > Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; > 2014-02-14T11:37:52-06:00) > Maven home: /opt/apache-maven-3.2.1 > Java version: 1.7.0, vendor: IBM Corporation > Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "3.10.0-229.ael7b.ppc64le", arch: "ppc64le", > family: "unix" >Reporter: pascal oliva > Attachments: HADOOP-11810-1.patch, YARN-3460-1.patch, > YARN-3460-2.patch, YARN-3460-3.patch > > > TestSecureRMRegistryOperations failed with JBM IBM JAVA > mvn test -X > -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations > ModuleTotal Failure Error Skipped > - > hadoop-yarn-registry 12 0 12 0 > - > Total 12 0 12 0 > With > javax.security.auth.login.LoginException: Bad JAAS configuration: > unrecognized option: isInitiator > and > Bad JAAS configuration: unrecognized option: storeKey -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3932) SchedulerApplicationAttempt#getResourceUsageReport should be based on NodeLabel
[ https://issues.apache.org/jira/browse/YARN-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635701#comment-14635701 ] Hadoop QA commented on YARN-3932: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 23m 16s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 9m 52s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 54s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 55s | The applied patch generated 1 new checkstyle issues (total was 186, now 186). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 23s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 50m 21s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 100m 3s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12746389/0006-YARN-3932.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / cf74772 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8599/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8599/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8599/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8599/console | This message was automatically generated. > SchedulerApplicationAttempt#getResourceUsageReport should be based on > NodeLabel > --- > > Key: YARN-3932 > URL: https://issues.apache.org/jira/browse/YARN-3932 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-3932.patch, 0002-YARN-3932.patch, > 0003-YARN-3932.patch, 0004-YARN-3932.patch, 0005-YARN-3932.patch, > 0006-YARN-3932.patch, ApplicationReport.jpg, TestResult.jpg > > > Application Resource Report shown wrong when node Label is used. 
> 1.Submit application with NodeLabel > 2.Check RM UI for resources used > Allocated CPU VCores and Allocated Memory MB is always {{zero}} > {code} > public synchronized ApplicationResourceUsageReport getResourceUsageReport() { > AggregateAppResourceUsage runningResourceUsage = > getRunningAggregateAppResourceUsage(); > Resource usedResourceClone = > Resources.clone(attemptResourceUsage.getUsed()); > Resource reservedResourceClone = > Resources.clone(attemptResourceUsage.getReserved()); > return ApplicationResourceUsageReport.newInstance(liveContainers.size(), > reservedContainers.size(), usedResourceClone, reservedResourceClone, > Resources.add(usedResourceClone, reservedResourceClone), > runningResourceUsage.getMemorySeconds(), > runningResourceUsage.getVcoreSeconds()); > } > {code} > should be {{attemptResourceUsage.getUsed(label)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
[ https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635624#comment-14635624 ] Allen Wittenauer commented on YARN-3460: {code} +Method methodInitialize = +kerb5LoginObject.getClass().getMethod("initialize", Subject.class, CallbackHandler.class, Map.class, Map.class); +methodInitialize.invoke(kerb5LoginObject, subject, null, new HashMap(), options ); {code} There's a tab here. Would you mind removing it? If no one else has any other comments, I'll be committing this by the end of the day. P.S.: downloading the IBM JDK is an exercise in frustration. > Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM > > > Key: YARN-3460 > URL: https://issues.apache.org/jira/browse/YARN-3460 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 3.0.0, 2.6.0 > Environment: $ mvn -version > Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; > 2014-02-14T11:37:52-06:00) > Maven home: /opt/apache-maven-3.2.1 > Java version: 1.7.0, vendor: IBM Corporation > Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "3.10.0-229.ael7b.ppc64le", arch: "ppc64le", > family: "unix" >Reporter: pascal oliva > Attachments: HADOOP-11810-1.patch, YARN-3460-1.patch, > YARN-3460-2.patch, YARN-3460-3.patch > > > TestSecureRMRegistryOperations failed with JBM IBM JAVA > mvn test -X > -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations > ModuleTotal Failure Error Skipped > - > hadoop-yarn-registry 12 0 12 0 > - > Total 12 0 12 0 > With > javax.security.auth.login.LoginException: Bad JAAS configuration: > unrecognized option: isInitiator > and > Bad JAAS configuration: unrecognized option: storeKey -- This message was sent by Atlassian JIRA (v6.3.4#6332)
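The reflective initialization in the snippet above follows the usual getMethod/invoke pattern; here is a minimal self-contained illustration of that pattern — the target class is a stand-in, not the vendor Krb5LoginModule:

```java
import java.lang.reflect.Method;
import java.util.Map;

// Minimal demonstration of the reflective-call pattern used in the patch
// snippet above: look a method up by name and parameter types, then
// invoke it without a compile-time dependency on the target class.
// FakeLoginModule is a stand-in for illustration only.
public class ReflectiveInit {
    public static class FakeLoginModule {
        public Map<String, ?> savedOptions;

        public void initialize(Map<String, ?> options) {
            this.savedOptions = options;
        }
    }

    // Equivalent in shape to methodInitialize.invoke(kerb5LoginObject, ...)
    // in the patch; returns false if lookup or invocation fails.
    public static boolean initViaReflection(Object module, Map<String, ?> options) {
        try {
            Method init = module.getClass().getMethod("initialize", Map.class);
            init.invoke(module, options);
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }
}
```

Because the lookup is by erased parameter types, a single call site can drive vendor-specific login modules (Oracle vs. IBM) whose concrete classes differ.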
[jira] [Commented] (YARN-433) When RM is catching up with node updates then it should not expire acquired containers
[ https://issues.apache.org/jira/browse/YARN-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635590#comment-14635590 ] Hadoop QA commented on YARN-433: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 1s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 47s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 19s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 51m 51s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 89m 37s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12746382/YARN-433.4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3b7ffc4 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8597/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8597/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8597/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8597/console | This message was automatically generated. > When RM is catching up with node updates then it should not expire acquired > containers > -- > > Key: YARN-433 > URL: https://issues.apache.org/jira/browse/YARN-433 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-433.1.patch, YARN-433.2.patch, YARN-433.3.patch, > YARN-433.4.patch > > > RM expires containers that are not launched within some time of being > allocated. The default is 10mins. When an RM is not keeping up with node > updates then it may not be aware of new launched containers. If the expire > thread fires for such containers then the RM can expire them even though they > may have launched. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.
[ https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635546#comment-14635546 ] Jason Lowe commented on YARN-3925: -- Nice catch, [~zxu]! I think the patch will work, but it is a bit convoluted to use a new dir allocator and a new config property as a roundabout way to call LocalDirAllocator.AllocatorPerContext.getLocalPathToRead which is not a very complicated function. LocalDirsHandlerService already has the list of Paths to check, so if we just refactored the getLocalPathToRead functionality into a reusable Path utility function to locate a subpath given a list of top-level paths to search it would be very straightforward. > ContainerLogsUtils#getContainerLogFile fails to read container log files from > full disks. > - > > Key: YARN-3925 > URL: https://issues.apache.org/jira/browse/YARN-3925 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3925.000.patch > > > ContainerLogsUtils#getContainerLogFile fails to read files from full disks. 
> {{getContainerLogFile}} depends on > {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but > {{LocalDirsHandlerService#getLogPathToRead}} calls > {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses > configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not > include full disks in {{LocalDirsHandlerService#checkDirs}}: > {code} > Configuration conf = getConfig(); > List localDirs = getLocalDirs(); > conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS, > localDirs.toArray(new String[localDirs.size()])); > List logDirs = getLogDirs(); > conf.setStrings(YarnConfiguration.NM_LOG_DIRS, > logDirs.toArray(new String[logDirs.size()])); > {code} > ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and > ContainerLogsPage.ContainersLogsBlock#render to read the log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
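The refactoring suggested above — a reusable utility that locates a sub-path under a list of top-level directories, rather than going through an allocator configured from a possibly stale property — might look roughly like this (names are illustrative, not the actual Hadoop API):

```java
import java.io.File;
import java.util.List;

// Sketch of the refactoring suggested in the comment: a standalone
// search over a caller-supplied list of root directories, so the caller
// (e.g. LocalDirsHandlerService) controls which dirs are searched.
// Names are illustrative.
public final class PathFinder {
    private PathFinder() {}

    // Returns the first existing root/relativePath combination,
    // or null if the path exists under none of the roots.
    public static File findPathToRead(List<String> roots, String relativePath) {
        for (String root : roots) {
            File candidate = new File(root, relativePath);
            if (candidate.exists()) {
                return candidate;
            }
        }
        return null;
    }

    // Tiny self-check: builds a temp layout and searches it.
    public static boolean demo() {
        try {
            File root = java.nio.file.Files.createTempDirectory("logs").toFile();
            File sub = new File(root, "app_1");
            if (!sub.mkdirs()) {
                return false;
            }
            if (!new File(sub, "syslog").createNewFile()) {
                return false;
            }
            File found = findPathToRead(
                java.util.Arrays.asList("/no-such-dir", root.getPath()),
                "app_1/syslog");
            return found != null && found.exists();
        } catch (java.io.IOException e) {
            return false;
        }
    }
}
```

Since the caller passes the full list of log dirs (including full disks) explicitly, reads keep working even after the writable-dir configuration has been pruned.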
[jira] [Commented] (YARN-3838) Rest API failing when ip configured in RM address in secure https mode
[ https://issues.apache.org/jira/browse/YARN-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635534#comment-14635534 ] Bibin A Chundatt commented on YARN-3838: Hi [~xgong] any comments on this > Rest API failing when ip configured in RM address in secure https mode > -- > > Key: YARN-3838 > URL: https://issues.apache.org/jira/browse/YARN-3838 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-HADOOP-12096.patch, 0001-YARN-3810.patch, > 0001-YARN-3838.patch, 0002-YARN-3810.patch, 0002-YARN-3838.patch > > > Steps to reproduce > === > 1.Configure hadoop.http.authentication.kerberos.principal as below > {code:xml} > > hadoop.http.authentication.kerberos.principal > HTTP/_h...@hadoop.com > > {code} > 2. In RM web address also configure IP > 3. Startup RM > Call Rest API for RM {{ curl -i -k --insecure --negotiate -u : https IP > /ws/v1/cluster/info"}} > *Actual* > Rest API failing > {code} > 2015-06-16 19:03:49,845 DEBUG > org.apache.hadoop.security.authentication.server.AuthenticationFilter: > Authentication exception: GSSException: No valid credentials provided > (Mechanism level: Failed to find any Kerberos credentails) > org.apache.hadoop.security.authentication.client.AuthenticationException: > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos credentails) > at > org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:399) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:348) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:519) > at > org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82) > 
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635531#comment-14635531 ] Robert Kanter commented on YARN-3528: - [~brahmareddy], are you still working on this? > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
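One common way to replace a hard-coded port such as 12345 is to bind to port 0 and let the OS assign a free ephemeral port — a generic sketch, not the eventual YARN test utility:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Generic dynamic-port-allocation sketch for tests: binding to port 0
// asks the OS for any currently free ephemeral port, avoiding collisions
// between concurrent Jenkins builds on shared hosts.
public final class FreePort {
    private FreePort() {}

    public static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            socket.setReuseAddress(true);
            return socket.getLocalPort();   // OS-assigned, free at probe time
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note there is a small race window between closing the probe socket and the service binding the returned port; having the service itself bind port 0 and then querying its bound port avoids the race entirely.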
[jira] [Commented] (YARN-3874) Combine FS Reader and Writer Implementations
[ https://issues.apache.org/jira/browse/YARN-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635512#comment-14635512 ] Varun Saxena commented on YARN-3874: Many of them are unrelated as well. > Combine FS Reader and Writer Implementations > > > Key: YARN-3874 > URL: https://issues.apache.org/jira/browse/YARN-3874 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3874-YARN-2928.01.patch > > > Combine FS Reader and Writer Implementations and make them consistent with > each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)

[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635513#comment-14635513 ] Junping Du commented on YARN-3816: -- Thanks [~sjlee0] for review and comments! bq. If I understand correctly, this patch basically does a time integral of a given metric, or "the area under the curve" for the metric as a function of time. For example, if the underlying metric is a container CPU usage, the "aggregated" metric according to TimelineMetric.aggregateTo() would be a cumulative CPU usage over time for that container (in the units of CPU-millis). That's correct. As a PoC patch for app aggregation, we only picked some metrics to aggregate in a particular way to demonstrate the overall end-to-end flow. I understand there could be more important aggregated metrics, and I will try to add more in the following patches. bq. While this is certainly a useful number to keep track of, this was not the app-level aggregation I had in mind. IMO, the app-level aggregation (or any aggregation for that matter) is all about rolling metrics up from child entities to the parent entity. I would have thought that it would be the first thing we want to get to. It looks, however, as though that aggregation is not done in this patch. I don't see any code that rolls up values from containers to the application. Are you planning to introduce that soon? Yes. I will add that part in the PoC v2 patch, taking a "snapshot" of resource consumption for an application. The previous area value is also kept for a different purpose (resource billing/charge, etc.). bq. This type of time integral works only if the underlying metric is a gauge. For example, for any counter-like metric (e.g. HDFS bytes read) which is cumulative in nature, the time integral does not make sense. We will need to introduce another type dimension to the metrics that signifies whether it is a counter or a gauge, but this is just to note that the time integral works only for gauges. I agree that we should differentiate counters from gauges. For the former, the focus is on the cumulative property, while the latter is more of a "snapshot". However, in practice, there are cases where an aggregated metric has both properties, like the "area" value here - we need its cumulative value and could also be interested in values within a given time interval. Isn't that the case? bq. Also, this is pretty similar to what we talked about during the offline meeting as "average/max" for gauges, except that it's not divided over time. We discussed that we want to introduce time averages and maxes for gauges (see "time average & max" in https://issues.apache.org/jira/secure/attachment/12743390/aggregation-design-discussion.pdf). Are we thinking of replacing that with this? No. Nothing has changed in the design since our last discussions. The average and max are also important, but I just haven't had the bandwidth to add them at the PoC stage, as adding existing things is more straightforward. I will add them later. bq. In the specific case of container CPU usage, it seems to me that emitting the actual CPU time millis directly would be a far easier and more accurate way to capture this info. I believe it's readily available, and it would be a counter-like metric instead of a gauge. Therefore the time integral doesn't apply (as it already is one). But all you need to do at the app-level aggregation for it is just to sum it up. I recognize that this time integral would be useful for other things, but just wanted to point that out. Thanks for pointing that out. I agree this is more precise and will update it in a following patch.
> [Aggregation] App-level Aggregation for YARN system metrics > --- > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, config
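The time integral ("area under the curve") discussed in the comments above reduces to summing value × elapsed-time over gauge samples; a minimal sketch under that assumption, not the actual TimelineMetric API:

```java
// Sketch of the time-integral ("area under the curve") aggregation
// discussed in YARN-3816: given gauge samples (timestampMillis, value),
// sum value * elapsed-time over each interval. For a CPU-usage gauge
// this yields CPU-millis. Illustrative only.
public final class GaugeIntegral {
    private GaugeIntegral() {}

    // timestamps and values are parallel arrays, timestamps ascending.
    // Each sample value is treated as constant until the next sample
    // (a step function), so the integral is a simple rectangle sum.
    public static double integrate(long[] timestamps, double[] values) {
        double area = 0.0;
        for (int i = 1; i < timestamps.length; i++) {
            area += values[i - 1] * (timestamps[i] - timestamps[i - 1]);
        }
        return area;
    }
}
```

As the comments note, this only makes sense for gauges: applying it to a counter-like metric that is already cumulative would integrate an integral.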
[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635499#comment-14635499 ] zhihai xu commented on YARN-3591: - +1 for [~jlowe]'s comment. Yes, it fixes some problems we have today without creating new ones. > Resource Localisation on a bad disk causes subsequent containers failure > - > > Key: YARN-3591 > URL: https://issues.apache.org/jira/browse/YARN-3591 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, > YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch, YARN-3591.5.patch > > > It happens when a resource is localised on the disk, after localising that > disk has gone bad. NM keeps paths for localised resources in memory. At the > time of resource request isResourcePresent(rsrc) will be called which calls > file.exists() on the localised path. > In some cases when the disk has gone bad, inodes are still cached and > file.exists() returns true. But at the time of reading, the file will not open. > Note: file.exists() actually calls stat64 natively which returns true because > it was able to find inode information from the OS. > A proposal is to call file.list() on the parent path of the resource, which > will call open() natively. If the disk is good it should return an array of > paths with length at least 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
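The proposal in the description — listing the parent directory to force an open() instead of trusting file.exists() — might be sketched as follows (illustrative; not the NM's actual isResourcePresent()):

```java
import java.io.File;

// Sketch of the YARN-3591 proposal: instead of trusting file.exists(),
// which can return true from cached inode data on a failed disk, list
// the parent directory, which forces an open() on the underlying
// filesystem. File.list() returns null when the directory cannot be
// read, which we treat as the disk being bad.
public final class ResourceCheck {
    private ResourceCheck() {}

    public static boolean isResourcePresent(File resource) {
        File parent = resource.getParentFile();
        if (parent == null) {
            return resource.exists();
        }
        String[] children = parent.list();   // null on I/O failure
        if (children == null) {
            return false;                    // disk likely bad
        }
        for (String name : children) {
            if (name.equals(resource.getName())) {
                return true;
            }
        }
        return false;
    }

    // Tiny self-check on a healthy temp directory.
    public static boolean demo() {
        try {
            File dir = java.nio.file.Files.createTempDirectory("rsrc").toFile();
            File f = new File(dir, "data");
            return f.createNewFile()
                && isResourcePresent(f)
                && !isResourcePresent(new File(dir, "missing"));
        } catch (java.io.IOException e) {
            return false;
        }
    }
}
```

Listing the parent is costlier than a stat, which is part of the trade-off debated in the comments above.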
[jira] [Updated] (YARN-3932) SchedulerApplicationAttempt#getResourceUsageReport should be based on NodeLabel
[ https://issues.apache.org/jira/browse/YARN-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3932: --- Attachment: 0006-YARN-3932.patch Hi [~leftnoteasy], attaching the patch again to retrigger CI and for review. > SchedulerApplicationAttempt#getResourceUsageReport should be based on > NodeLabel > --- > > Key: YARN-3932 > URL: https://issues.apache.org/jira/browse/YARN-3932 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-3932.patch, 0002-YARN-3932.patch, > 0003-YARN-3932.patch, 0004-YARN-3932.patch, 0005-YARN-3932.patch, > 0006-YARN-3932.patch, ApplicationReport.jpg, TestResult.jpg > > > The application resource report is shown wrong when a node label is used. > 1. Submit an application with a NodeLabel > 2. Check the RM UI for resources used > Allocated CPU VCores and Allocated Memory MB are always {{zero}} > {code} > public synchronized ApplicationResourceUsageReport getResourceUsageReport() { > AggregateAppResourceUsage runningResourceUsage = > getRunningAggregateAppResourceUsage(); > Resource usedResourceClone = > Resources.clone(attemptResourceUsage.getUsed()); > Resource reservedResourceClone = > Resources.clone(attemptResourceUsage.getReserved()); > return ApplicationResourceUsageReport.newInstance(liveContainers.size(), > reservedContainers.size(), usedResourceClone, reservedResourceClone, > Resources.add(usedResourceClone, reservedResourceClone), > runningResourceUsage.getMemorySeconds(), > runningResourceUsage.getVcoreSeconds()); > } > {code} > should be {{attemptResourceUsage.getUsed(label)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
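The zero values can be reproduced with a toy per-partition usage map. The class and method names below are illustrative stand-ins, not YARN's actual ResourceUsage API: querying the default partition misses usage recorded under a label, which is why the report needs to be built from getUsed(label).

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of per-node-label resource accounting.
public class LabeledUsage {
    private final Map<String, Long> usedMbByLabel = new HashMap<>();

    void incUsed(String label, long mb) {
        usedMbByLabel.merge(label, mb, Long::sum);
    }

    // No-argument form queries the default (empty) partition only.
    long getUsed() {
        return getUsed("");
    }

    long getUsed(String label) {
        return usedMbByLabel.getOrDefault(label, 0L);
    }

    public static void main(String[] args) {
        LabeledUsage usage = new LabeledUsage();
        usage.incUsed("gpu", 4096);  // container allocated under label "gpu"
        System.out.println(usage.getUsed());       // 0 -- default partition shows nothing
        System.out.println(usage.getUsed("gpu"));  // 4096
    }
}
```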
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrence of SESSIONEXPIRED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635483#comment-14635483 ] Hadoop QA commented on YARN-3798: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12746388/YARN-3798-branch-2.7.006.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3b7ffc4 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8598/console | This message was automatically generated. > ZKRMStateStore shouldn't create new session without occurrance of > SESSIONEXPIED > --- > > Key: YARN-3798 > URL: https://issues.apache.org/jira/browse/YARN-3798 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Varun Saxena >Priority: Blocker > Attachments: RM.log, YARN-3798-2.7.002.patch, > YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, > YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.005.patch, > YARN-3798-branch-2.7.006.patch, YARN-3798-branch-2.7.patch > > > RM going down with NoNode exception during create of znode for appattempt > *Please find the exception logs* > {code} > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 2015-06-09 10:09:44,886 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Exception while executing a ZK operation. 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) > at java.lang.Thread.run(Thread.java:745) > 2015-06-09 10:09:44,887 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed > o
[jira] [Commented] (YARN-3874) Combine FS Reader and Writer Implementations
[ https://issues.apache.org/jira/browse/YARN-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635482#comment-14635482 ] Varun Saxena commented on YARN-3874: Test failures are related. Will fix > Combine FS Reader and Writer Implementations > > > Key: YARN-3874 > URL: https://issues.apache.org/jira/browse/YARN-3874 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3874-YARN-2928.01.patch > > > Combine FS Reader and Writer Implementations and make them consistent with > each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrence of SESSIONEXPIRED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3798: - Attachment: YARN-3798-branch-2.7.006.patch [~zxu] thank you for the comment. Attaching a patch to address your comment. 1. Using rc == Code.OK.intValue() instead of rc == 0. 2. Calling Thread.currentThread().interrupt(); to restore the interrupted status after catching InterruptedException from syncInternal. > ZKRMStateStore shouldn't create new session without occurrance of > SESSIONEXPIED > --- > > Key: YARN-3798 > URL: https://issues.apache.org/jira/browse/YARN-3798 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Varun Saxena >Priority: Blocker > Attachments: RM.log, YARN-3798-2.7.002.patch, > YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, > YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.005.patch, > YARN-3798-branch-2.7.006.patch, YARN-3798-branch-2.7.patch > > > RM going down with NoNode exception during create of znode for appattempt > *Please find the exception logs* > {code} > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 2015-06-09 10:09:44,886 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Exception while executing a ZK operation. 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) > at java.lang.Thread.run(Thread.java:745) > 2015-06-09 10:09:44,887 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed > out ZK retries. Giving up! > 2015-06-09 10:09:44,887 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error > updating appAttempt: appattempt_1433764310492_7152_01 > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zo
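The interrupt-restoration change described in point 2 of the patch notes above follows the standard Java pattern. The sketch below uses a hypothetical syncAndWait standing in for the store's syncInternal call:

```java
public class InterruptRestore {

    // After catching InterruptedException, re-set the thread's interrupted
    // status so callers further up the stack can still observe it.
    static void syncAndWait(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // restore interrupted status
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt();  // simulate an interrupt arriving
        syncAndWait(10);                     // sleep throws immediately; catch restores the flag
        System.out.println(Thread.currentThread().isInterrupted());  // true
    }
}
```

Without the interrupt() call, catching InterruptedException silently clears the flag and the interrupt is lost.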
[jira] [Commented] (YARN-3874) Combine FS Reader and Writer Implementations
[ https://issues.apache.org/jira/browse/YARN-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635473#comment-14635473 ] Hadoop QA commented on YARN-3874: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 36s | Pre-patch YARN-2928 has 7 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 1s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 8m 0s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 55s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 53s | The applied patch generated 9 new checkstyle issues (total was 6, now 13). | | {color:green}+1{color} | whitespace | 0m 18s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 55s | The patch appears to introduce 6 new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | mapreduce tests | 109m 58s | Tests failed in hadoop-mapreduce-client-jobclient. | | {color:red}-1{color} | yarn tests | 6m 54s | Tests failed in hadoop-yarn-applications-distributedshell. | | {color:red}-1{color} | yarn tests | 13m 37s | Tests failed in hadoop-yarn-server-resourcemanager. | | {color:red}-1{color} | yarn tests | 1m 23s | Tests failed in hadoop-yarn-server-timelineservice. 
| | | | 177m 7s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-timelineservice | | Failed unit tests | hadoop.mapred.TestMRTimelineEventHandling | | | hadoop.yarn.applications.distributedshell.TestDistributedShell | | | hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens | | | hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens | | | hadoop.yarn.server.resourcemanager.TestRMRestart | | | hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl | | | hadoop.yarn.server.resourcemanager.security.TestAMRMTokens | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2 | | | hadoop.yarn.server.resourcemanager.TestApplicationCleanup | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId | | | hadoop.yarn.server.resourcemanager.TestRMHA | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs | | | hadoop.yarn.server.resourcemanager.rmapp.attempt.TestAMLivelinessMonitor | | | hadoop.yarn.server.resourcemanager.TestApplicationMasterService | | | hadoop.yarn.server.resourcemanager.TestRMHAForNodeLabels | | | hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens | | | hadoop.yarn.server.resourcemanager.TestRMAdminService | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue | | | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel | | | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher | | | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppAttempt | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs | | | hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior | | | hadoop.yarn.server.resourcemanager.TestApplicationACLs | | | hadoop.yarn.server.resourcemanager.scheduler.TestAbstra
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635427#comment-14635427 ] Hudson commented on YARN-2003: -- FAILURE: Integrated in Hadoop-trunk-Commit #8193 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8193/]) YARN-2003. Support for Application priority : Changes in RM and Capacity Scheduler. (Sunil G via wangda) (wangda: rev c39ca541f498712133890961598bbff50d89d68b) * hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/Queue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAddedSchedulerEvent.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoComparator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/SchedulableEntity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyForNodePartitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hado
[jira] [Commented] (YARN-3915) scmadmin help message correction
[ https://issues.apache.org/jira/browse/YARN-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635429#comment-14635429 ] Hudson commented on YARN-3915: -- FAILURE: Integrated in Hadoop-trunk-Commit #8193 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8193/]) YARN-3915. scmadmin help message correction (Bibin A Chundatt via aw) (aw: rev da2d1ac4bc0bf0812b9a2a1ffbb7748113cdaf6d) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/SCMAdmin.java > scmadmin help message correction > - > > Key: YARN-3915 > URL: https://issues.apache.org/jira/browse/YARN-3915 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 3.0.0 > > Attachments: 0001-YARN-3915.patch > > > Help message for scmadmin > *Actual* {{hadoop scmadmin}} *expected* {{yarn scmadmin}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3261) rewrite resourcemanager restart doc to remove roadmap bits
[ https://issues.apache.org/jira/browse/YARN-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635428#comment-14635428 ] Hudson commented on YARN-3261: -- FAILURE: Integrated in Hadoop-trunk-Commit #8193 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8193/]) YARN-3261. rewrite resourcemanager restart doc to remove roadmap bits (Gururaj Shetty via aw) (aw: rev 3b7ffc4f3f0ffb0fa6c324da6d88803f5b233832) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRestart.md * hadoop-yarn-project/CHANGES.txt > rewrite resourcemanager restart doc to remove roadmap bits > --- > > Key: YARN-3261 > URL: https://issues.apache.org/jira/browse/YARN-3261 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Allen Wittenauer >Assignee: Gururaj Shetty > Fix For: 3.0.0 > > Attachments: YARN-3261.01.patch > > > Another mixture of roadmap and instruction manual that seems to be ever > present in a lot of the recently written documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3948) Display Application Priority in RM Web UI
Sunil G created YARN-3948: - Summary: Display Application Priority in RM Web UI Key: YARN-3948 URL: https://issues.apache.org/jira/browse/YARN-3948 Project: Hadoop YARN Issue Type: Sub-task Components: webapp Affects Versions: 2.7.1 Reporter: Sunil G Assignee: Sunil G Application Priority can be displayed in RM Web UI Application page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3941) Proportional Preemption policy should try to avoid sending duplicate PREEMPT_CONTAINER event to scheduler
[ https://issues.apache.org/jira/browse/YARN-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635421#comment-14635421 ] Hadoop QA commented on YARN-3941: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 56s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 21s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 51m 29s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 89m 14s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12746368/0001-YARN-3941.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 29cf887b | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8596/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8596/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8596/console | This message was automatically generated. > Proportional Preemption policy should try to avoid sending duplicate > PREEMPT_CONTAINER event to scheduler > - > > Key: YARN-3941 > URL: https://issues.apache.org/jira/browse/YARN-3941 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-3941.patch > > > Currently ProportionalCPP tries to send multiple PREEMPT_CONTAINER events to > scheduler during every cycle of preemption check till the container is either > forcefully killed or preempted by AM. > This can be throttled from ProportionalPreemptionPolicy to avoid excess of > events to scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
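One way to realize the throttling proposed in YARN-3941 is to remember which containers the scheduler has already been told to preempt. The names below are an illustrative sketch, not the actual ProportionalCapacityPreemptionPolicy API:

```java
import java.util.HashSet;
import java.util.Set;

// Toy sketch: send PREEMPT_CONTAINER to the scheduler at most once per
// container, instead of once per preemption-check cycle.
public class PreemptionThrottle {
    private final Set<String> notified = new HashSet<>();

    // true only the first time a container is selected for preemption
    boolean shouldSendPreemptEvent(String containerId) {
        return notified.add(containerId);
    }

    // Forget the container once it has been killed or preempted by the AM.
    void onContainerCompleted(String containerId) {
        notified.remove(containerId);
    }

    public static void main(String[] args) {
        PreemptionThrottle t = new PreemptionThrottle();
        System.out.println(t.shouldSendPreemptEvent("container_1"));  // true
        System.out.println(t.shouldSendPreemptEvent("container_1"));  // false: duplicate suppressed
        t.onContainerCompleted("container_1");
    }
}
```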
[jira] [Updated] (YARN-3944) Connection refused to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-3944: -- Priority: Critical (was: Blocker) > Connection refused to nodemanagers are retried at multiple levels > - > > Key: YARN-3944 > URL: https://issues.apache.org/jira/browse/YARN-3944 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Attachments: YARN-3944.v1.patch > > > This is related to YARN-3238. When the NM is down, the ipc client will get > ConnectException. > Caused by: java.net.ConnectException: Connection refused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705) > at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521) > at org.apache.hadoop.ipc.Client.call(Client.java:1438) > However, retries happen at two layers (the ipc layer retrying 40 times and > serverProxy retrying 91 times), which can compound to roughly an hour of > retrying. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
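The compounding of the two retry layers can be sanity-checked with the counts from the report. The one-second cost per refused connection attempt is an assumption used only to show the order of magnitude:

```java
public class RetryMath {
    public static void main(String[] args) {
        // Two independent retry layers compound multiplicatively.
        int ipcRetries = 40;    // ipc-level retries per proxy attempt (from the report)
        int proxyRetries = 91;  // serverProxy-level retries (from the report)
        int totalAttempts = ipcRetries * proxyRetries;
        System.out.println(totalAttempts);  // 3640

        // Assuming ~1 second per refused connect attempt:
        double hours = totalAttempts * 1.0 / 3600.0;
        System.out.printf("%.1f hours%n", hours);  // 1.0 hours
    }
}
```

This matches the ~1 hour figure in the report and shows why the fix needs to collapse one of the layers rather than tune either count alone.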
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635417#comment-14635417 ] Sunil G commented on YARN-2003: --- Thank you very much [~jianhe] and [~leftnoteasy] for support. > Support for Application priority : Changes in RM and Capacity Scheduler > --- > > Key: YARN-2003 > URL: https://issues.apache.org/jira/browse/YARN-2003 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, > 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, > 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, > 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, > 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, > 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, > 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch, > 0021-YARN-2003.patch, 0022-YARN-2003.patch, 0023-YARN-2003.patch, > 0024-YARN-2003.patch > > > AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from > Submission Context and store. > Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3261) rewrite resourcemanager restart doc to remove roadmap bits
[ https://issues.apache.org/jira/browse/YARN-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3261: --- Issue Type: Improvement (was: Bug) > rewrite resourcemanager restart doc to remove roadmap bits > --- > > Key: YARN-3261 > URL: https://issues.apache.org/jira/browse/YARN-3261 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Allen Wittenauer >Assignee: Gururaj Shetty > Attachments: YARN-3261.01.patch > > > Another mixture of roadmap and instruction manual that seems to be ever > present in a lot of the recently written documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-433) When RM is catching up with node updates then it should not expire acquired containers
[ https://issues.apache.org/jira/browse/YARN-433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-433: --- Attachment: YARN-433.4.patch [~zxu] Thanks for the comments. Upload a new patch to address all the latest comments > When RM is catching up with node updates then it should not expire acquired > containers > -- > > Key: YARN-433 > URL: https://issues.apache.org/jira/browse/YARN-433 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-433.1.patch, YARN-433.2.patch, YARN-433.3.patch, > YARN-433.4.patch > > > RM expires containers that are not launched within some time of being > allocated. The default is 10mins. When an RM is not keeping up with node > updates then it may not be aware of new launched containers. If the expire > thread fires for such containers then the RM can expire them even though they > may have launched. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3641) NodeManager: stopRecoveryStore() shouldn't be skipped when exceptions happen in stopping NM's sub-services.
[ https://issues.apache.org/jira/browse/YARN-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635380#comment-14635380 ] Allen Wittenauer commented on YARN-3641: I can't see how to change this from 'Pending Closed' to 'Fixed'. :( > NodeManager: stopRecoveryStore() shouldn't be skipped when exceptions happen > in stopping NM's sub-services. > --- > > Key: YARN-3641 > URL: https://issues.apache.org/jira/browse/YARN-3641 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, rolling upgrade >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.7.1 > > Attachments: YARN-3641.patch > > > If NM' services not get stopped properly, we cannot start NM with enabling NM > restart with work preserving. The exception is as following: > {noformat} > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock > /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource > temporarily unavailable > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:175) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:217) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:507) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:555) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: > lock /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: > Resource temporarily unavailable > at > 
org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:930) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2015-05-12 00:34:45,262 INFO nodemanager.NodeManager > (LogAdapter.java:info(45)) - SHUTDOWN_MSG: > / > SHUTDOWN_MSG: Shutting down NodeManager at > c6403.ambari.apache.org/192.168.64.103 > / > {noformat} > The related code is as below in NodeManager.java: > {code} > @Override > protected void serviceStop() throws Exception { > if (isStopping.getAndSet(true)) { > return; > } > super.serviceStop(); > stopRecoveryStore(); > DefaultMetricsSystem.shutdown(); > } > {code} > We can see that all registered NM services (NodeStatusUpdater, > LogAggregationService, ResourceLocalizationService, etc.) are stopped first. If any of > these services throws an exception while stopping, stopRecoveryStore() is > skipped, which means the leveldb store is never closed. The next time the NM starts, > it fails with the exception above. > We should put stopRecoveryStore() in a finally block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
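The fix suggested above, running stopRecoveryStore() in a finally block, guarantees the leveldb store is closed even when a sub-service throws during shutdown. A minimal standalone sketch of that guarantee (method names mirror NodeManager.serviceStop() for illustration only; this is not the Hadoop class):

```java
// Standalone sketch: a finally block guarantees stopRecoveryStore()
// runs even when stopping a sub-service throws.
public class StopOrderDemo {
    static boolean storeClosed = false;

    static void stopSubServices() throws Exception {
        // Simulate a sub-service (e.g. NodeStatusUpdater) failing to stop.
        throw new Exception("sub-service failed to stop");
    }

    static void stopRecoveryStore() {
        // In the real NM this closes leveldb and releases the LOCK file.
        storeClosed = true;
    }

    static void serviceStop() {
        try {
            stopSubServices();
        } catch (Exception e) {
            System.out.println("stop failed: " + e.getMessage());
        } finally {
            stopRecoveryStore(); // runs regardless of the failure above
        }
    }

    public static void main(String[] args) {
        serviceStop();
        System.out.println("storeClosed=" + storeClosed); // prints storeClosed=true
    }
}
```

Without the finally block, the simulated exception would propagate past stopRecoveryStore() and leave the LOCK file held, which is exactly the restart failure shown in the stack trace.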
[jira] [Assigned] (YARN-3641) NodeManager: stopRecoveryStore() shouldn't be skipped when exceptions happen in stopping NM's sub-services.
[ https://issues.apache.org/jira/browse/YARN-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer reassigned YARN-3641: -- Assignee: Junping Du (was: Allen Wittenauer) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3641) NodeManager: stopRecoveryStore() shouldn't be skipped when exceptions happen in stopping NM's sub-services.
[ https://issues.apache.org/jira/browse/YARN-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer reassigned YARN-3641: -- Assignee: Allen Wittenauer (was: Junping Du) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3903) Disable preemption at Queue level for Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635332#comment-14635332 ] Karthik Kambatla commented on YARN-3903: I would like to understand the request better. Is the request to prevent a specific queue from preempting resources from other queues? Or is it to prevent resources from being preempted from a specific queue? > Disable preemption at Queue level for Fair Scheduler > > > Key: YARN-3903 > URL: https://issues.apache.org/jira/browse/YARN-3903 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0 > Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 > (2014-12-08) x86_64 >Reporter: He Tianyi >Assignee: Karthik Kambatla >Priority: Trivial > Attachments: YARN-3093.1.patch, YARN-3093.2.patch > > Original Estimate: 72h > Remaining Estimate: 72h > > YARN-2056 supports disabling preemption at the queue level for the > CapacityScheduler. We recently encountered the same need for the fair > scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
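For reference, YARN-2056 exposes the CapacityScheduler feature as a queue-level disable_preemption setting. A Fair Scheduler equivalent would most naturally live in the allocation file; one possible shape is sketched below. The <allowPreemptionFrom> element is illustrative of the requested behavior, not an existing configuration in this release:

```xml
<!-- Hypothetical fair-scheduler.xml sketch: <allowPreemptionFrom> is an
     illustrative element name, not a committed Fair Scheduler config. -->
<allocations>
  <queue name="production">
    <weight>3.0</weight>
    <!-- Containers in this queue would not be preempted by other queues. -->
    <allowPreemptionFrom>false</allowPreemptionFrom>
  </queue>
  <queue name="adhoc">
    <weight>1.0</weight>
  </queue>
</allocations>
```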
[jira] [Updated] (YARN-3903) Disable preemption at Queue level for Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3903: --- Fix Version/s: (was: 3.0.0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3903) Disable preemption at Queue level for Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3903: --- Assignee: (was: Karthik Kambatla) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635326#comment-14635326 ] Karthik Kambatla commented on YARN-3943: Marking it critical, since some users see very poor performance when the disk utilization hovers around the max-disk-utilization mark. > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configurations > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good again. It would > be better to use two configurations: one applied when a disk goes from not-full to > full, and the other when it goes from full back to not-full, so > we can avoid frequent oscillation. > For example, we can set the threshold for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
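The two-threshold scheme described above is classic hysteresis: a disk is marked full above a high-water mark and marked good again only below a lower mark. A small sketch of the idea, with illustrative threshold values rather than the eventual YARN configuration keys:

```java
// Sketch of the two-threshold (hysteresis) proposal. The 90%/85%
// values are illustrative, not YARN defaults.
public class DiskHealthHysteresis {
    static final float FULL_THRESHOLD = 90.0f;     // disk marked full above this
    static final float NOT_FULL_THRESHOLD = 85.0f; // disk marked good only below this

    private boolean full = false;

    boolean update(float utilizationPercent) {
        if (!full && utilizationPercent > FULL_THRESHOLD) {
            full = true;                 // crossed the high-water mark
        } else if (full && utilizationPercent < NOT_FULL_THRESHOLD) {
            full = false;                // dropped below the low-water mark
        }
        // Between the two thresholds the state is sticky, so a disk
        // hovering around one mark no longer oscillates good/full.
        return full;
    }

    public static void main(String[] args) {
        DiskHealthHysteresis d = new DiskHealthHysteresis();
        System.out.println(d.update(91f)); // true: now full
        System.out.println(d.update(88f)); // true: still full, no flapping
        System.out.println(d.update(84f)); // false: good again
    }
}
```

With a single threshold, the 88% sample above would flip the disk back to good and the next sample could flip it full again, which is the oscillation the issue wants to avoid.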
[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3943: --- Priority: Critical (was: Major) > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > > Use separate threshold configurations to check when disks become full and > when disks become good. Currently the configuration > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are > used to check both when disks become full and when disks become good. It will > be better to use two configurations: one is used when disks become full from > not-full and the other one is used when disks become not-full from full. So > we can avoid oscillating frequently. > For example: we can set the one for disk-full detection higher than the one > for disk-not-full detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3947) Add support for short host names in yarn decommissioning process
[ https://issues.apache.org/jira/browse/YARN-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635306#comment-14635306 ] Sunil G commented on YARN-3947: --- Hi [~aanand001c] I have a doubt here. If I configure {{yarn.nodemanager.hostname}} to a short name, and assume the short name has a mapping to its FQDN in /etc/hosts or other DNS configuration, then I think I can use {{short_name:port}}. Here the port will be mandatory, as we can run multiple node managers on the same node. Thoughts? > Add support for short host names in yarn decommissioning process > > > Key: YARN-3947 > URL: https://issues.apache.org/jira/browse/YARN-3947 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Amit Anand >Priority: Minor > > When running {{yarn decommissioning}} the {{yarn rmadmin -refreshNodes}} > doesn't like short host names for the nodes to be decommissioned in the > {{yarn.exclude}} file. It requires the {{FQDN}} for the host name to be present > to successfully decommission a node. The decommissioning behavior > in {{HDFS}} is different, as it accepts short host names. 
> Below are the details of what I am seeing: > My {{yarn.exclude}} file has the short name for the host: > {code} > bcpc-vm1 > {code} > Running: > {code} > sudo -u yarn yarn rmadmin -refreshNodes > {code} > shows the following entries in the log file: > {code} > 2015-07-21 11:14:18,795 INFO org.apache.hadoop.conf.Configuration: found > resource yarn-site.xml at file:/etc/hadoop/conf/yarn-site.xml > 2015-07-21 11:14:18,802 INFO org.apache.hadoop.util.HostsFileReader: Setting > the includes file to > 2015-07-21 11:14:18,802 INFO org.apache.hadoop.util.HostsFileReader: Setting > the excludes file to /etc/hadoop/conf/yarn.exclude > 2015-07-21 11:14:18,803 INFO org.apache.hadoop.util.HostsFileReader: > Refreshing hosts (include/exclude) list > 2015-07-21 11:14:18,803 INFO org.apache.hadoop.util.HostsFileReader: Adding > bcpc-vm1 to the list of excluded hosts from /etc/hadoop/conf/yarn.exclude > 2015-07-21 11:14:18,803 INFO org.apache.hadoop.util.HostsFileReader: Adding > bcpc-vm1 to the list of excluded hosts from /etc/hadoop/conf/yarn.exclude > 2015-07-21 11:14:18,803 INFO > org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn > IP=10.0.100.12 OPERATION=refreshNodes TARGET=AdminService RESULT=SUCCESS > {code} > And the node is not decommissioned. 
> When I add the {{FQDN}} for the host name the decommissioning works > successfully and I see following in the RM logs: > {code} > 2015-07-21 11:14:43,453 INFO org.apache.hadoop.conf.Configuration: found > resource yarn-site.xml at file:/etc/hadoop/conf/yarn-site.xml > 2015-07-21 11:14:43,456 INFO org.apache.hadoop.util.HostsFileReader: Setting > the includes file to > 2015-07-21 11:14:43,456 INFO org.apache.hadoop.util.HostsFileReader: Setting > the excludes file to /etc/hadoop/conf/yarn.exclude > 2015-07-21 11:14:43,456 INFO org.apache.hadoop.util.HostsFileReader: > Refreshing hosts (include/exclude) list > 2015-07-21 11:14:43,456 INFO org.apache.hadoop.util.HostsFileReader: Adding > bcpc-vm1.example.com to the list of excluded hosts from > /etc/hadoop/conf/yarn.exclude > 2015-07-21 11:14:43,456 INFO > org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn > IP=10.100.0.11 OPERATION=refreshNodes TARGET=AdminService RESULT=SUCCESS > 2015-07-21 11:14:44,198 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Disallowed NodeManager nodeId: bcpc-vm1.example.com:35197 hostname: > bcpc-vm1.example.com:35197 > 2015-07-21 11:14:44,198 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node bcpc-vm1.example.com:35197 as it is now DECOMMISSIONED > 2015-07-21 11:14:44,199 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > bcpc-vm1.example.com:35197 Node Transitioned from RUNNING to DECOMMISSIONED > 2015-07-21 11:14:44,199 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Removed node bcpc-vm1.example.com:35197 cluster capacity: vCores:96> > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
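The mismatch reported above comes down to a plain string comparison: the exclude file holds bcpc-vm1 while the NodeManager registers as bcpc-vm1.example.com:35197. One way to make the check short-name tolerant is to resolve both sides to a canonical name before matching, roughly what HDFS decommissioning tolerates. In this sketch, canonical() and isExcluded() are hypothetical helpers, not existing YARN methods:

```java
import java.net.InetAddress;
import java.util.Set;

// Sketch: compare canonical host names so a short name in yarn.exclude
// can match an NM that registered with its FQDN. canonical() and
// isExcluded() are illustrative helpers, not Hadoop APIs.
public class ExcludeCheck {
    static String canonical(String host) {
        try {
            return InetAddress.getByName(host).getCanonicalHostName();
        } catch (Exception e) {
            return host; // fall back to the raw name if resolution fails
        }
    }

    static boolean isExcluded(String nodeHostname, Set<String> excludes) {
        if (excludes.contains(nodeHostname)) {
            return true; // exact match: today's behaviour
        }
        // Canonical comparison lets "bcpc-vm1" match
        // "bcpc-vm1.example.com" when /etc/hosts or DNS maps them.
        String canonicalNode = canonical(nodeHostname);
        for (String entry : excludes) {
            if (canonical(entry).equals(canonicalNode)) {
                return true;
            }
        }
        return false;
    }
}
```

Note that any real fix would also have to handle the optional :port suffix, since, as the comment above points out, multiple node managers can run on one node.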
[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635301#comment-14635301 ] Karthik Kambatla commented on YARN-2019: We could have two fail-fast configs: one for daemons and one for apps/containers. If we can make do with general fail-fast configs, we should try to avoid adding component-specific configs; otherwise, we'll end up making YARN even harder to configure. > Retrospect on decision of making RM crashed if any exception throw in > ZKRMStateStore > > > Key: YARN-2019 > URL: https://issues.apache.org/jira/browse/YARN-2019 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Junping Du >Assignee: Jian He >Priority: Critical > Labels: ha > Attachments: YARN-2019.1-wip.patch > > > Currently, if anything abnormal happens in ZKRMStateStore, it throws a fatal > exception that crashes the RM. As shown in YARN-1924, the cause could be an > internal RM HA bug itself, not a truly fatal condition. We should revisit some > decisions here, as the HA feature is designed to protect key components, not > disturb them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635292#comment-14635292 ] Jason Lowe commented on YARN-3591: -- Sorry for the delay, as I was on vacation and am still working through the backlog. An incremental improvement where we try to avoid using bad/non-existent resources for future containers but still fail to clean up old resources on bad disks sounds fine to me. IIUC it fixes some problems we have today without creating new ones. > Resource Localisation on a bad disk causes subsequent containers failure > - > > Key: YARN-3591 > URL: https://issues.apache.org/jira/browse/YARN-3591 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, > YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch, YARN-3591.5.patch > > > It happens when a resource is localised on a disk and, after localisation, that > disk goes bad. The NM keeps paths for localised resources in memory. At the > time of a resource request, isResourcePresent(rsrc) is called, which calls > file.exists() on the localised path. > In some cases when the disk has gone bad, inodes are still cached and > file.exists() returns true, but at read time the file will not open. > Note: file.exists() actually calls stat64 natively, which returns true because > it was able to find inode information from the OS. > A proposal is to call file.list() on the parent path of the resource, which > calls open() natively. If the disk is good, it should return an array of > paths with length at least 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
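The proposed file.list() check could look roughly like the sketch below. The method name mirrors the isResourcePresent(rsrc) call mentioned in the description, but this body is an illustration of the idea, not the actual patch:

```java
import java.io.File;

// Sketch of the proposal above: file.exists() is a stat that can succeed
// from cached inode data on a failed disk, while File.list() on the
// parent forces an open() that fails when the disk is actually bad.
public class ResourcePresenceCheck {
    static boolean isResourcePresent(File localizedPath) {
        File parent = localizedPath.getParentFile();
        if (parent == null) {
            return localizedPath.exists();
        }
        // list() returns null when the directory cannot be opened,
        // which catches the bad-disk case that exists() misses.
        String[] entries = parent.list();
        if (entries == null || entries.length < 1) {
            return false; // bad disk, or the resource directory is gone
        }
        return localizedPath.exists();
    }
}
```

The trade-off is an extra directory read per presence check, which is why the discussion above treats this as an incremental improvement rather than a full cleanup of resources on bad disks.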
[jira] [Updated] (YARN-3947) Add support for short host names in yarn decommissioning process
[ https://issues.apache.org/jira/browse/YARN-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Anand updated YARN-3947: - Description: When running {{yarn decommissioning}} the {{yarn rmadmin -refreshNodes}} doesn't like short host names for the nodes to be decommissioned in {{yarn.exclude}} file. It requires {{FQDN}} for the host name to be present to be able to successfully decommission a node. The decommissioning behavior in {{HDFS}} is different as it can take short host names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3947) Add support for short host names in yarn decommissioning process
[ https://issues.apache.org/jira/browse/YARN-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Anand updated YARN-3947: - Description: When running {{yarn decommissioning}} the {{yarn rmadmin -refreshNodes}} doesn't like short host names for the nodes to be decommissioned in {{yarn.exclude}} file. It requires {{FQDN}} for the host name to be present to be able to successfully decommission a node. The decommissioning behavior in {{HDFS}} is different as it can take short host names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3947) Add support for short host names in yarn decommissioning process
[ https://issues.apache.org/jira/browse/YARN-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Anand updated YARN-3947: - Description: When decommissioning YARN nodes, {{yarn rmadmin -refreshNodes}} does not accept short host names for the nodes to be decommissioned in the {{yarn.exclude}} file. It requires the {{FQDN}} of the host to successfully decommission a node. The decommissioning behavior in {{HDFS}} is different, as it accepts short host names. Below are the details of what I am seeing. My {{yarn.exclude}} has the short name for the host:
{code}
bcpc-vm1
{code}
Running:
{code}
sudo -u yarn yarn rmadmin -refreshNodes
{code}
shows the following entries in the log file:
{code}
2015-07-21 11:14:18,795 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/etc/hadoop/conf/yarn-site.xml
2015-07-21 11:14:18,802 INFO org.apache.hadoop.util.HostsFileReader: Setting the includes file to
2015-07-21 11:14:18,802 INFO org.apache.hadoop.util.HostsFileReader: Setting the excludes file to /etc/hadoop/conf/yarn.exclude
2015-07-21 11:14:18,803 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2015-07-21 11:14:18,803 INFO org.apache.hadoop.util.HostsFileReader: Adding bcpc-vm1 to the list of excluded hosts from /etc/hadoop/conf/yarn.exclude
2015-07-21 11:14:18,803 INFO org.apache.hadoop.util.HostsFileReader: Adding bcpc-vm1 to the list of excluded hosts from /etc/hadoop/conf/yarn.exclude
2015-07-21 11:14:18,803 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn IP=10.0.100.12 OPERATION=refreshNodes TARGET=AdminService RESULT=SUCCESS
{code}
And the node is not decommissioned.
When I add the {{FQDN}} for the host, decommissioning works successfully and I see the following in the RM logs:
{code}
2015-07-21 11:14:43,453 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/etc/hadoop/conf.LAB-A/yarn-site.xml
2015-07-21 11:14:43,456 INFO org.apache.hadoop.util.HostsFileReader: Setting the includes file to
2015-07-21 11:14:43,456 INFO org.apache.hadoop.util.HostsFileReader: Setting the excludes file to /etc/hadoop/conf/yarn.exclude
2015-07-21 11:14:43,456 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2015-07-21 11:14:43,456 INFO org.apache.hadoop.util.HostsFileReader: Adding bcpc-vm1.example.com to the list of excluded hosts from /etc/hadoop/conf/yarn.exclude
2015-07-21 11:14:43,456 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn IP=10.100.0.11 OPERATION=refreshNodes TARGET=AdminService RESULT=SUCCESS
2015-07-21 11:14:44,198 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Disallowed NodeManager nodeId: bcpc-vm1.example.com:35197 hostname: bcpc-vm1.example.com:35197
2015-07-21 11:14:44,198 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node bcpc-vm1.example.com:35197 as it is now DECOMMISSIONED
2015-07-21 11:14:44,199 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: bcpc-vm1.example.com:35197 Node Transitioned from RUNNING to DECOMMISSIONED
2015-07-21 11:14:44,199 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Removed node bcpc-vm1.example.com:35197 cluster capacity:
{code}

was: When decommissioning YARN nodes, {{yarn rmadmin -refreshNodes}} does not accept short host names for the nodes to be decommissioned in the {{yarn.exclude}} file. It requires the {{FQDN}} of the host to successfully decommission a node. The decommissioning behavior in {{HDFS}} is different, as it accepts short host names.
[jira] [Created] (YARN-3947) Add support for short host names in yarn decommissioning process
Amit Anand created YARN-3947:

Summary: Add support for short host names in yarn decommissioning process
Key: YARN-3947
URL: https://issues.apache.org/jira/browse/YARN-3947
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Amit Anand
Priority: Minor

When decommissioning YARN nodes, {{yarn rmadmin -refreshNodes}} does not accept short host names for the nodes to be decommissioned in the {{yarn.exclude}} file. It requires the {{FQDN}} of the host to successfully decommission a node. The decommissioning behavior in {{HDFS}} is different, as it accepts short host names. Below are the details of what I am seeing. My {{yarn.exclude}} has the short name for the host: bcpc-vm1. Running:
{code}
sudo -u yarn yarn rmadmin -refreshNodes
{code}
shows the following entries in the log file:
{code}
2015-07-21 11:14:18,795 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/etc/hadoop/conf/yarn-site.xml
2015-07-21 11:14:18,802 INFO org.apache.hadoop.util.HostsFileReader: Setting the includes file to
2015-07-21 11:14:18,802 INFO org.apache.hadoop.util.HostsFileReader: Setting the excludes file to /etc/hadoop/conf/yarn.exclude
2015-07-21 11:14:18,803 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2015-07-21 11:14:18,803 INFO org.apache.hadoop.util.HostsFileReader: Adding bcpc-vm1 to the list of excluded hosts from /etc/hadoop/conf/yarn.exclude
2015-07-21 11:14:18,803 INFO org.apache.hadoop.util.HostsFileReader: Adding bcpc-vm1 to the list of excluded hosts from /etc/hadoop/conf/yarn.exclude
2015-07-21 11:14:18,803 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn IP=10.0.100.12 OPERATION=refreshNodes TARGET=AdminService RESULT=SUCCESS
{code}
And the node is not decommissioned.
When I add the {{FQDN}} for the host, decommissioning works successfully and I see the following in the RM logs:
{code}
2015-07-21 11:14:43,453 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/etc/hadoop/conf.LAB-A/yarn-site.xml
2015-07-21 11:14:43,456 INFO org.apache.hadoop.util.HostsFileReader: Setting the includes file to
2015-07-21 11:14:43,456 INFO org.apache.hadoop.util.HostsFileReader: Setting the excludes file to /etc/hadoop/conf/yarn.exclude
2015-07-21 11:14:43,456 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2015-07-21 11:14:43,456 INFO org.apache.hadoop.util.HostsFileReader: Adding bcpc-vm1.example.com to the list of excluded hosts from /etc/hadoop/conf/yarn.exclude
2015-07-21 11:14:43,456 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn IP=10.100.0.11 OPERATION=refreshNodes TARGET=AdminService RESULT=SUCCESS
2015-07-21 11:14:44,198 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Disallowed NodeManager nodeId: bcpc-vm1.example.com:35197 hostname: bcpc-vm1.example.com:35197
2015-07-21 11:14:44,198 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node bcpc-vm1.example.com:35197 as it is now DECOMMISSIONED
2015-07-21 11:14:44,199 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: bcpc-vm1.example.com:35197 Node Transitioned from RUNNING to DECOMMISSIONED
2015-07-21 11:14:44,199 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Removed node bcpc-vm1.example.com:35197 cluster capacity:
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
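The improvement requested here — accepting either the short name or the FQDN in the exclude file — can be sketched as a small matching helper. This is an illustrative sketch only, not the actual YARN implementation; the function name `is_excluded` is hypothetical:

```python
def is_excluded(node_fqdn, exclude_entries):
    """Return True if the node matches an exclude entry, accepting either the
    FQDN or its short (first-label) host name -- the behavior the report says
    HDFS already has for decommissioning."""
    # "bcpc-vm1.example.com" -> "bcpc-vm1"
    short_name = node_fqdn.split(".", 1)[0]
    entries = set(exclude_entries)
    return node_fqdn in entries or short_name in entries
```

With a `yarn.exclude` containing only `bcpc-vm1`, a NodeManager registered as `bcpc-vm1.example.com` would still match under this scheme, which is the behavior the issue asks for.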
[jira] [Commented] (YARN-3945) maxApplicationsPerUser is wrongly calculated
[ https://issues.apache.org/jira/browse/YARN-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635246#comment-14635246 ] Nathan Roberts commented on YARN-3945: -- My feeling is that the documentation on minimum-user-limit-percent needs a rewrite. It makes it sound like minimum-user-limit-percent caps the amount of resource at, say, 50% if there are 2 applications submitted to the queue. This isn't the case (afaik). My understanding is that it tries to guarantee all active applications this percentage of a queue's capacity (configured or current, whichever is larger). Note: an active application is one that is currently requesting resources; a running application that has all the resources it needs is NOT active. If one application stops asking for additional resources, the other applications can certainly go higher than the 50%. user-limit-factor is what determines the absolute maximum capacity a user can consume within a queue. Basically, minimum-user-limit-percent defines how fair the queue is: the lower the value, the sooner the queue will try to spread resources evenly across all users in the queue; the higher the value, the more FIFO-like it behaves. > maxApplicationsPerUser is wrongly calculated > > > Key: YARN-3945 > URL: https://issues.apache.org/jira/browse/YARN-3945 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > > maxApplicationsPerUser is currently calculated based on the formula > {{maxApplicationsPerUser = (int)(maxApplications * (userLimit / 100.0f) * > userLimitFactor)}} but the description of userlimit is > {quote} > Each queue enforces a limit on the percentage of resources allocated to a > user at any given time, if there is demand for resources. 
The user limit can > vary between a minimum and maximum value.{color:red} The the former (the > minimum value) is set to this property value {color} and the latter (the > maximum value) depends on the number of users who have submitted > applications. For e.g., suppose the value of this property is 25. If two > users have submitted applications to a queue, no single user can use more > than 50% of the queue resources. If a third user submits an application, no > single user can use more than 33% of the queue resources. With 4 or more > users, no user can use more than 25% of the queues resources. A value of 100 > implies no user limits are imposed. The default is 100. Value is specified as > a integer. > {quote} > The configuration for the minimum limit should not be used in a formula that > calculates the maximum number of applications for a user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
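The two formulas under discussion can be made concrete with a small sketch. The function names are hypothetical; the arithmetic follows the formula quoted in the description and the documentation quoted above:

```python
def max_apps_per_user(max_applications, user_limit, user_limit_factor):
    # The formula the issue flags as wrong: it plugs the *minimum*
    # user-limit percentage into a cap on applications per user,
    # conflating the minimum guarantee with the maximum.
    return int(max_applications * (user_limit / 100.0) * user_limit_factor)

def effective_user_share(min_user_limit_percent, active_users):
    # Per the quoted docs: with N users submitting applications, each is
    # limited to max(min-user-limit-percent, 100/N) percent of the queue.
    return max(float(min_user_limit_percent), 100.0 / active_users)
```

With `min-user-limit-percent = 25`, two active users may each use up to 50% of the queue and four users 25% each, while `max_apps_per_user(10000, 25, 1.0)` yields 2500 — a cap derived from the minimum guarantee, which is exactly the inconsistency being reported.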
[jira] [Updated] (YARN-3941) Proportional Preemption policy should try to avoid sending duplicate PREEMPT_CONTAINER event to scheduler
[ https://issues.apache.org/jira/browse/YARN-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3941: -- Attachment: 0001-YARN-3941.patch Hi [~leftnoteasy], uploading an initial version of the patch. *testExpireKill* already covers the verification of this change. Multiple PREEMPT_CONTAINER events will no longer be raised for the same container on each interval check of ProportionalCPP. > Proportional Preemption policy should try to avoid sending duplicate > PREEMPT_CONTAINER event to scheduler > - > > Key: YARN-3941 > URL: https://issues.apache.org/jira/browse/YARN-3941 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-3941.patch > > > Currently ProportionalCPP tries to send multiple PREEMPT_CONTAINER events to > the scheduler during every cycle of the preemption check until the container is > either forcefully killed or preempted by the AM. > This can be throttled from ProportionalPreemptionPolicy to avoid an excess of > events to the scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
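The throttling idea described above can be illustrated with a minimal sketch. This is not the actual patch; the class and method names are hypothetical:

```python
class PreemptionEventThrottler:
    """Remember containers that have already been sent a PREEMPT_CONTAINER
    event so repeated policy cycles do not re-send it."""

    def __init__(self):
        self._notified = set()

    def should_send_preempt(self, container_id):
        # The first request for a container passes; later ones are
        # suppressed until the container is killed or released.
        if container_id in self._notified:
            return False
        self._notified.add(container_id)
        return True

    def on_container_finished(self, container_id):
        # Forget finished containers so their ids do not accumulate.
        self._notified.discard(container_id)
```

Each preemption cycle would consult `should_send_preempt` before raising the event, which is one way to keep the event count per container at one.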
[jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state.
[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635202#comment-14635202 ] Sumit Nigam commented on YARN-3946: --- Hi [~varun_saxena] - Yes, the idea is not only to debug the issue (which, as you rightly mentioned, an admin can). I am currently on 2.6.0 and will try 2.7.0 when I can, for sure. There are too many possible reasons to correlate what may have happened - AM level, resource level, queue level, possibly a combination of these, etc. A programmatic API is also useful for applying corrective measures - say, I can program my app to be submitted to a whole new queue, or try reserving a container, etc., after I notice it is a queue-level capacity issue - all programmatically! Another important use case is attempting to submit the app (say, through one's own AM) and, after a period of it remaining in ACCEPTED state, automatically reporting back why the state remains so. A REST API is extremely useful in such a case. With this, it would even be possible to ascertain when a job moves to ACCEPTED state from RUNNING state itself (RM restart, AM crash + restart). Again, this currently requires looking through logs / UI to ascertain what happened. Especially in big clusters, this is non-trivial. I'd agree with Naganarasimha that we should be able to know that without an administrative understanding of the system. Plus, I am not working on this. > Allow fetching exact reason as to why a submitted app is in ACCEPTED state. > --- > > Key: YARN-3946 > URL: https://issues.apache.org/jira/browse/YARN-3946 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Sumit Nigam > > Currently there is no direct way to get the exact reason as to why a > submitted app is still in ACCEPTED state. 
It should be possible to know > through RM REST API as to what aspect is not being met - say, queue limits > being reached, or core/ memory requirement not being met, or AM limit being > reached, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
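The application *state* itself is already exposed by the RM REST API (`GET /ws/v1/cluster/apps/{appid}`); what the issue asks for is a richer, machine-readable reason in fields such as `diagnostics`. A client-side check could look like the sketch below — the sample response fragment is illustrative, not captured from a real cluster:

```python
def accepted_reason(app_info):
    """Given the 'app' object parsed from an RM REST API response, return
    its diagnostics string if the app is stuck in ACCEPTED, else None."""
    if app_info.get("state") == "ACCEPTED":
        # An empty diagnostics field is today's common case -- the gap
        # this issue is about.
        return app_info.get("diagnostics", "") or "no reason reported"
    return None

# Illustrative response fragment:
sample = {"state": "ACCEPTED", "diagnostics": "AM container is waiting for resources"}
```

A monitoring script could poll this per application and trigger corrective action (resubmit to another queue, alert, etc.) when a non-empty reason appears.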
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635181#comment-14635181 ] Varun Saxena commented on YARN-3814: bq. For example, characters such as ":" are reserved characters in URL that are not directly allowed in queries or other parts of the URL. They should always be properly encoded (e.g. "%3A"). [~sjlee0], I think this depends on whether the class used to construct the URL on the client is RFC 2396 compliant or not. For instance, in my unit tests I am able to send both ":" and "%3A", and both are interpreted as ":" on the server side. We were already using ":" in ATSv1, so the behavior would be the same for current users. > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
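The ":" vs "%3A" point is easy to verify with the standard library (RFC 3986 is the current successor to the RFC 2396 mentioned above). A strict encoder percent-encodes the reserved character, but both forms decode to the same value, which is why the server sees ":" either way:

```python
from urllib.parse import quote, unquote

# ":" is reserved, so a strict encoder emits "%3A" ...
assert quote(":", safe="") == "%3A"
# ... but both the encoded and the raw form decode to the same character.
assert unquote("%3A") == ":"
assert unquote(":") == ":"
```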
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635152#comment-14635152 ] Hadoop QA commented on YARN-3045: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 15m 56s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. |
| {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 50s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 33s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 3s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 26s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 58s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 8m 8s | Tests passed in hadoop-yarn-applications-distributedshell. |
| {color:red}-1{color} | yarn tests | 6m 6s | Tests failed in hadoop-yarn-server-nodemanager. |
| | | | 53m 7s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService |
| | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService |
| | hadoop.yarn.server.nodemanager.containermanager.container.TestContainer |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12746328/YARN-3045-YARN-2928.006.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-2928 / eb1932d |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8594/artifact/patchprocess/whitespace.txt |
| hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8594/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8594/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8594/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8594/console |
This message was automatically generated. 
> [Event producers] Implement NM writing container lifecycle events to ATS > > > Key: YARN-3045 > URL: https://issues.apache.org/jira/browse/YARN-3045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3045-YARN-2928.002.patch, > YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, > YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, > YARN-3045.20150420-1.patch > > > Per design in YARN-2928, implement NM writing container lifecycle events and > container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3874) Combine FS Reader and Writer Implementations
[ https://issues.apache.org/jira/browse/YARN-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635115#comment-14635115 ] Varun Saxena commented on YARN-3874: [~sjlee0], [~zjshen], kindly review. The following has been done:
# Both FS reader and writer implementations have been made consistent with each other. The classes have not yet been combined, as no final decision was taken on that in YARN-3051; we can still decide whether or not to combine them.
# Moved some of the code common to the reader and writer impls into {{TimelineStorageUtils}}.
# The writer impl will now write the app flow mapping file.
# As [~zjshen] said in YARN-3051, used the {{FileSystem}} class so that it can be used for HDFS as well.
# Entity files now have the created time in the file name. This will be used to limit the entities returned and to filter based on created time.
# Modified times, as of now, won't be part of the file name, as we would have to keep renaming files consistently if they were. If required, that can be included as well using a rename operation. Thoughts?
# The writer chooses the file based on entity ID if the created time is not given in the request. If the created time is 0 or negative and the file for an entity ID is being written for the first time, an error is sent back.
# The app flow mapping will be cached in the reader and writer on start, and entries will be added in the reader whenever the mapping has to be queried from file.
> Combine FS Reader and Writer Implementations > > > Key: YARN-3874 > URL: https://issues.apache.org/jira/browse/YARN-3874 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3874-YARN-2928.01.patch > > > Combine FS Reader and Writer Implementations and make them consistent with > each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3874) Combine FS Reader and Writer Implementations
[ https://issues.apache.org/jira/browse/YARN-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3874: --- Attachment: YARN-3874-YARN-2928.01.patch > Combine FS Reader and Writer Implementations > > > Key: YARN-3874 > URL: https://issues.apache.org/jira/browse/YARN-3874 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3874-YARN-2928.01.patch > > > Combine FS Reader and Writer Implementations and make them consistent with > each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3874) Combine FS Reader and Writer Implementations
[ https://issues.apache.org/jira/browse/YARN-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3874: --- Attachment: (was: YARN-3874-YARN-2928.01.patch) > Combine FS Reader and Writer Implementations > > > Key: YARN-3874 > URL: https://issues.apache.org/jira/browse/YARN-3874 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Saxena >Assignee: Varun Saxena > > Combine FS Reader and Writer Implementations and make them consistent with > each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)