[jira] [Resolved] (YARN-4838) TestLogAggregationService. testLocalFileDeletionOnDiskFull failed
[ https://issues.apache.org/jira/browse/YARN-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen resolved YARN-4838. -- Resolution: Duplicate > TestLogAggregationService. testLocalFileDeletionOnDiskFull failed > - > > Key: YARN-4838 > URL: https://issues.apache.org/jira/browse/YARN-4838 > Project: Hadoop YARN > Issue Type: Test > Components: log-aggregation >Reporter: Haibo Chen >Assignee: Haibo Chen > > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService > testLocalFileDeletionOnDiskFull failed > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.verifyLocalFileDeletion(TestLogAggregationService.java:232) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLocalFileDeletionOnDiskFull(TestLogAggregationService.java:288) > The failure is caused by a timing issue in DeletionService. DeletionService > runs its own thread pool to delete files. When the verifyLocalFileDeletion() > method checks file existence, it is possible that the FileDeletionTask has > not yet been executed by the thread pool in DeletionService. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
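The usual remedy for this kind of race is to poll for the expected state with a timeout instead of asserting a single snapshot. A minimal sketch of the pattern (the helper name and signature below are illustrative, not YARN's actual test utility; Hadoop's test code has a similar GenericTestUtils.waitFor):

```java
import java.util.function.BooleanSupplier;

// Illustrative helper (not YARN's actual API): poll a condition until it
// holds or a timeout expires, instead of asserting a single snapshot.
public class WaitFor {
    public static boolean waitFor(BooleanSupplier check, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (check.getAsBoolean()) {
                return true; // condition reached before the deadline
            }
            Thread.sleep(intervalMs);
        }
        return check.getAsBoolean(); // one last check at the deadline
    }
}
```

A test would then wait for the local log file to disappear, e.g. `waitFor(() -> !logFile.exists(), 10_000, 100)`, rather than calling `assertFalse(logFile.exists())` immediately, tolerating the DeletionService thread pool's scheduling delay.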
[jira] [Created] (YARN-5841) Report only local collectors on node upon resync with RM after RM fails over
Varun Saxena created YARN-5841: -- Summary: Report only local collectors on node upon resync with RM after RM fails over Key: YARN-5841 URL: https://issues.apache.org/jira/browse/YARN-5841 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Saxena As per discussion on YARN-3359, we can potentially optimize reporting of collectors to RM after RM fails over. Currently the NM reports all the collectors known to itself in a heartbeat after resync with the RM. This means many NMs may report pretty much the same set of collector info in their first heartbeat on reconnection. This JIRA is to explore how to optimize this flow and, if possible, fix it.
[jira] [Created] (YARN-5840) Yarn queues not being tracked correctly by Yarn Timeline
ramtin created YARN-5840: Summary: Yarn queues not being tracked correctly by Yarn Timeline Key: YARN-5840 URL: https://issues.apache.org/jira/browse/YARN-5840 Project: Hadoop YARN Issue Type: Bug Reporter: ramtin Assignee: Weiwei Yang Priority: Minor
[jira] [Created] (YARN-5839) ClusterApplication API does not include a ReservationID
Sean Po created YARN-5839: - Summary: ClusterApplication API does not include a ReservationID Key: YARN-5839 URL: https://issues.apache.org/jira/browse/YARN-5839 Project: Hadoop YARN Issue Type: Improvement Reporter: Sean Po Assignee: Sean Po Currently, the ClusterApplication and ClusterApplications APIs do not allow users to find the reservation queue that an application is running in. YARN-5435 proposes to add ReservationId to the ClusterApplication and ClusterApplications APIs.
[jira] [Resolved] (YARN-4998) Minor cleanup to UGI use in AdminService
[ https://issues.apache.org/jira/browse/YARN-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved YARN-4998. - Resolution: Fixed > Minor cleanup to UGI use in AdminService > > > Key: YARN-4998 > URL: https://issues.apache.org/jira/browse/YARN-4998 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Trivial > Attachments: YARN-4998.001.patch, YARN-4998.002.patch > > > Instead of calling {{UserGroupInformation.getCurrentUser()}} over and over, > we should just use the stored {{daemonUser}}.
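The cleanup described above amounts to resolving the daemon's identity once at service initialization and reusing the stored reference on every later call. A minimal sketch of that pattern (class, field, and method names are illustrative; in YARN the stored value is a UserGroupInformation, not a String):

```java
// Illustrative sketch (not YARN's actual AdminService): capture the daemon
// user once during init and reuse it, instead of re-resolving it per call.
public class AdminServiceSketch {
    private String daemonUser; // in YARN this would be a UserGroupInformation

    // Stand-in for UserGroupInformation.getCurrentUser(); imagine it is costly.
    private String resolveCurrentUser() {
        return System.getProperty("user.name", "yarn");
    }

    public void serviceInit() {
        this.daemonUser = resolveCurrentUser(); // resolved exactly once
    }

    public String getDaemonUser() {
        return daemonUser; // every later caller reuses the stored value
    }
}
```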
Re: [VOTE] Merge YARN-3368 (new web UI) to trunk
Thanks for all the effort, Wangda! New Web UI looks very nice. I deployed in a 10-node cluster, ran some sample jobs, and then also changed the capacity scheduler configuration adding some extra queues. Everything worked well and looked good. +1 from me (non-binding). Thanks, Konstantinos On Thu, Nov 3, 2016 at 2:30 AM, Sreenath Somarajapuram < ssomarajapu...@hortonworks.com> wrote: > > Tried out the UI in my local cluster, and looks good to me. > +1(non-binding) > > Thanks, > Sreenath > > > On 11/1/16, 11:13 PM, "Sunil Govind" wrote: > > >Thank you very much Wangda for starting the vote thread. > > > >I have launched new UI in a 5 node cluster and tested. All pages are > >loading fine and provide necessary information. > >I think the new UI will be a very good addition and it can provide a lot of > >information which users are seeking, in a much easier way compared to now. > > > >+1(non-binding) from me. > > > >Thanks, > >Sunil > > > > > >On Tue, Nov 1, 2016 at 4:24 AM Wangda Tan wrote: > > > >> YARN Devs, > >> > >> We propose to merge YARN-3368 (YARN next generation web UI) development > >> branch into trunk for better development, would like to hear your > >>thoughts > >> before sending out vote mail. > >> > >> The new UI will co-exist with the old YARN UI, by default it is > >>disabled. > >> Please refer to User documentation of the new YARN UI > >> < > >> > >>https://github.com/apache/hadoop/blob/YARN-3368/hadoop- yarn-project/hadoo > >>p-yarn/hadoop-yarn-site/src/site/markdown/YarnUI2.md > >> > > >> for > >> more details. > >> > >> In addition, there're two work-in-progress features that need the new UI > >>to > >> be merged to trunk for further development. > >> > >> 1) UI of YARN Timeline Server v2 (YARN-2928) > >> 2) UI of YARN ResourceManager Federation (YARN-2915).
> >> > >> *Status of YARN next generation web UI* > >> > >> Completed features > >> > >>- Cluster Overview Page > >>- Scheduler page > >>- Applications / Application / Application-attempts pages > >>- Nodes / Node page > >> > >> Integration to YARN > >> > >>- Hosts new web UI in RM > >>- Integrates to maven build / package > >> > >> Miscs: > >> > >>- Added dependencies to LICENSE.txt/NOTICE.txt > >>- Documented how to use it. (In > >>hadoop-yarn-project/hadoop-yarn/hadoop- > >>yarn-site/src/site/markdown/YarnUI2.md) > >> > >> Major items will finish on trunk: > >> > >>- Security support > >> > >> We have run the new UI in our internal cluster for more than 3 months, > >>lots > >> of people have tried the new UI and gave lots of valuable feedbacks and > >> reported suggestions / issues to us. We fixed many of them so now we > >> believe it is more ready for wider folks to try. > >> > >> Merge JIRA for Jenkins is: > >>https://issues.apache.org/jira/browse/YARN-4734 > >> . > >> The latest Jenkins run > >> < > >> > >>https://issues.apache.org/jira/browse/YARN-4734? > focusedCommentId=15620808 > >>=com.atlassian.jira.plugin.system.issuetabpanels: > comment-tabpanel#co > >>mment-15620808 > >> > > >> gave > >> +1. > >> > >> The vote will run for 7 days, ending Sun, 11/06/2016. Please feel free > >>to > >> comment if you have any questions/doubts. I'll start with my +1 > >>(binding). > >> > >> Please share your thoughts about this. > >> > >> Thanks, > >> Wangda > >> > > > - > To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org > For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org > >
Re: [DISCUSS] Release cadence and EOL
I'm certainly willing to try this policy. There's definitely room for improvement when it comes to streamlining the release process. The create-release script that Allen wrote helps, but there are still a lot of manual steps in HowToRelease for staging and publishing a release. Another perennial problem is reconciling git log with the changes and release notes and JIRA information. I think each RM has written their own scripts for this, but it could probably be automated into a Jenkins report. And the final problem is that branches are often not in a releasable state. This is because we don't have any upstream integration testing. For instance, testing with 3.0.0-alpha1 has found a number of latent incompatibilities in the 2.8.0 branch. If we want to meaningfully speed up the minor release cycle, continuous integration testing is a must. Best, Andrew On Fri, Nov 4, 2016 at 10:33 AM, Sangjin Lee wrote: > Thanks for your thoughts and more data points Andrew. > > I share your concern that the proposal may be more aggressive than what we > have been able to accomplish so far. I'd like to hear from the community > what is a desirable release cadence which is still within the realm of the > possible. > > The EOL policy can also be a bit of a forcing function. By having a > defined EOL, hopefully it would prod the community to move faster with > releases. Of course, automating releases and testing should help. > > > On Tue, Nov 1, 2016 at 4:31 PM, Andrew Wang > wrote: > >> Thanks for pushing on this Sangjin. The proposal sounds reasonable. >> >> However, for it to have teeth, we need to be *very* disciplined about the >> release cadence. Looking at our release history, we've done 4 maintenance >> releases in 2016 and no minor releases. 2015 had 4 maintenance and 1 minor >> release. The proposal advocates for 12 maintenance releases and 2 minors >> per year, or about 3.5x more releases than we've historically done. 
I >> think >> achieving this will require significantly streamlining our release and >> testing process. >> >> For some data points, here are a few EOL lifecycles for some major >> projects. They talk about support in terms of time (not number of >> releases), and release on a cadence. >> >> Ubuntu maintains LTS for 5 years: >> https://www.ubuntu.com/info/release-end-of-life >> >> Linux LTS kernels have EOLs ranging from 2 to 6 years, though it seems >> only >> one has actually ever been EOL'd: >> https://www.kernel.org/category/releases.html >> >> Mesos supports minor releases for 6 months, with a new minor every 2 >> months: >> http://mesos.apache.org/documentation/latest/versioning/ >> >> Eclipse maintains each minor for ~9 months before moving onto a new minor: >> http://stackoverflow.com/questions/35997352/how-to-determine >> -end-of-life-for-eclipse-versions >> >> >> >> On Fri, Oct 28, 2016 at 10:55 AM, Sangjin Lee wrote: >> >> > Reviving an old thread. I think we had a fairly concrete proposal on the >> > table that we can vote for. >> > >> > The proposal is a minor release on the latest major line every 6 months, >> > and a maintenance release on a minor release (as there may be >> concurrently >> > maintained minor releases) every 2 months. >> > >> > A minor release line is EOLed 2 years after it is first released or >> there >> > are 2 newer minor releases, whichever is sooner. The community reserves >> the >> > right to extend or shorten the life of a release line if there is a good >> > reason to do so. >> > >> > Comments? Objections? >> > >> > Regards, >> > Sangjin >> > >> > >> > On Tue, Aug 23, 2016 at 9:33 AM, Karthik Kambatla >> > wrote: >> > >> > > >> > >> Here is just an idea to get started. How about "a minor release line >> is >> > >> EOLed 2 years after it is released or there are 2 newer minor >> releases, >> > >> whichever is sooner. 
The community reserves the right to extend or >> > shorten >> > >> the life of a release line if there is a good reason to do so." >> > >> >> > >> >> > > Sounds reasonable, especially for our first commitment. For current >> > > releases, this essentially means 2.6.x is maintained until Nov 2016 >> and >> > Apr >> > > 2017 if 2.8 and 2.9 are not released by those dates. >> > > >> > > IIUC EOL does two things - (1) eases the maintenance cost for >> developers >> > > past EOL, and (2) indicates to the user when they must upgrade by. For >> > the >> > > latter, would users appreciate a specific timeline without any caveats >> > for >> > > number of subsequent minor releases? >> > > >> > > If we were to give folks a specific period for EOL for x.y.z, we >> should >> > > plan on releasing at least x.y+1.1 by then. 2 years might be a good >> > number >> > > to start with given our current cadence, and adjusted in the future as >> > > needed. >> > > >> > > >> > > >> > >> > >
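The proposed rule ("EOLed 2 years after first release, or when there are 2 newer minor releases, whichever is sooner") pins down to a small calculation. A hedged sketch, purely illustrative — the community reserves the right to adjust any actual EOL, and the class and method names here are invented for the example:

```java
import java.time.LocalDate;

// Sketch of the proposed EOL rule: a minor line is EOLed two years after its
// first release, or on the date the second newer minor release ships,
// whichever comes first.
public class EolPolicy {
    public static LocalDate eolDate(LocalDate firstRelease, LocalDate secondNewerMinorDate) {
        LocalDate twoYearsOut = firstRelease.plusYears(2);
        if (secondNewerMinorDate == null) {
            return twoYearsOut; // no second newer minor yet: the 2-year cap applies
        }
        return secondNewerMinorDate.isBefore(twoYearsOut) ? secondNewerMinorDate : twoYearsOut;
    }
}
```

For example, a line first released on 2014-11-18 with no second newer minor shipped would reach EOL on 2016-11-18, matching the "2.6.x is maintained until Nov 2016 if 2.8 and 2.9 are not released by then" reading in the thread.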
Apache Hadoop qbt Report: trunk+JDK8 on Linux/ppc64le
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/145/ [Nov 3, 2016 5:03:43 PM] (vvasudev) YARN-5822. Log ContainerRuntime initialization error in [Nov 3, 2016 5:27:52 PM] (arp) HDFS-11097. Fix warnings for deprecated StorageReceivedDeletedBlocks [Nov 3, 2016 6:16:07 PM] (xiao) HADOOP-13787. Azure testGlobStatusThrowsExceptionForUnreadableDir fails. [Nov 3, 2016 8:09:03 PM] (xiao) HADOOP-12453. Support decoding KMS Delegation Token with its own [Nov 3, 2016 8:49:10 PM] (liuml07) HDFS-11076. Add unit test for extended Acls. Contributed by Chen Liang [Nov 3, 2016 10:31:07 PM] (wang) HADOOP-11088. Quash unnecessary safemode WARN message during NameNode [Nov 4, 2016 3:01:43 AM] (vinayakumarb) HDFS-11098. Datanode in tests cannot start in Windows after HDFS-10638 [Nov 4, 2016 10:37:28 AM] (Sunil) YARN-5802. updateApplicationPriority api in scheduler should ensure to -1 overall The following subsystems voted -1: compile unit The following subsystems voted -1 but were configured to be filtered/ignored: cc javac The following subsystems are considered long running: (runtime bigger than 1h 0m 0s) unit Specific tests: Failed junit tests : hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer hadoop.hdfs.TestFileLengthOnClusterRestart hadoop.hdfs.TestFileAppend3 hadoop.hdfs.web.TestWebHdfsTimeouts hadoop.yarn.server.nodemanager.recovery.TestNMLeveldbStateStoreService hadoop.yarn.server.nodemanager.TestNodeManagerShutdown hadoop.yarn.server.timeline.TestRollingLevelDB hadoop.yarn.server.timeline.TestTimelineDataManager hadoop.yarn.server.timeline.TestLeveldbTimelineStore hadoop.yarn.server.timeline.recovery.TestLeveldbTimelineStateStore hadoop.yarn.server.timeline.TestRollingLevelDBTimelineStore hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer hadoop.yarn.server.timelineservice.storage.common.TestRowKeys hadoop.yarn.server.timelineservice.storage.common.TestKeyConverters 
hadoop.yarn.server.timelineservice.storage.common.TestSeparator hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore hadoop.yarn.server.resourcemanager.TestRMRestart hadoop.yarn.server.resourcemanager.TestResourceTrackerService hadoop.yarn.server.TestMiniYarnClusterNodeUtilization hadoop.yarn.server.TestContainerManagerSecurity hadoop.yarn.server.timeline.TestLevelDBCacheTimelineStore hadoop.yarn.server.timeline.TestOverrideTimelineStoreYarnClient hadoop.yarn.server.timeline.TestEntityGroupFSTimelineStore hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageApps hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRunCompaction hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageEntities hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRun hadoop.yarn.server.timelineservice.storage.TestPhoenixOfflineAggregationWriterImpl hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowActivity hadoop.yarn.applications.distributedshell.TestDistributedShell hadoop.mapred.TestShuffleHandler hadoop.mapreduce.v2.hs.TestHistoryServerLeveldbStateStoreService Timed out junit tests : org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache org.apache.hadoop.tools.TestHadoopArchives compile: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/145/artifact/out/patch-compile-root.txt [316K] cc: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/145/artifact/out/patch-compile-root.txt [316K] javac: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/145/artifact/out/patch-compile-root.txt [316K] unit: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/145/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [200K] 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/145/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt [52K] https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/145/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt [52K] https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/145/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice.txt [20K] https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/145/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
[jira] [Created] (YARN-5838) windows - environment variables aren't accessible on Yarn 3.0 alpha-1
Kanthirekha created YARN-5838: - Summary: windows - environment variables aren't accessible on Yarn 3.0 alpha-1 Key: YARN-5838 URL: https://issues.apache.org/jira/browse/YARN-5838 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0-alpha1 Environment: windows 7 Reporter: Kanthirekha Windows environment variables aren't accessible on Yarn 3.0 alpha-1. Tried fetching %Path% from the Application Master and in the container script (after a container is allocated by the Application Master for task execution): echo %Path% Result: "ECHO is on." i.e., the variable expands to blank. Could you please let us know the necessary steps to access env variables from the Yarn 3.0 alpha1 version?
[jira] [Resolved] (YARN-5759) Capability to register for a notification/callback on the expiry of timeouts
[ https://issues.apache.org/jira/browse/YARN-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-5759. --- Resolution: Duplicate Agree with [~rohithsharma]. Closing this as a dup of YARN-2261. > Capability to register for a notification/callback on the expiry of timeouts > > > Key: YARN-5759 > URL: https://issues.apache.org/jira/browse/YARN-5759 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Gour Saha > > There is a need for the YARN native services REST-API service, to take > certain actions once a timeout of an application expires. For example, an > immediate requirement is to destroy a Slider application, once its lifetime > timeout expires and YARN has stopped the application. Destroying a Slider > application means cleanup of Slider HDFS state store and ZK paths for that > application. > Potentially, there will be advanced requirements from the REST-API service > and other services in the future, which will make this feature very handy.
[jira] [Created] (YARN-5837) NPE when getting node status of a decommissioned node after an RM restart
Robert Kanter created YARN-5837: --- Summary: NPE when getting node status of a decommissioned node after an RM restart Key: YARN-5837 URL: https://issues.apache.org/jira/browse/YARN-5837 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0-alpha1, 2.7.3 Reporter: Robert Kanter Assignee: Robert Kanter If you decommission a node, the {{yarn node}} command shows it like this: {noformat} >> bin/yarn node -list -all 2016-11-04 08:54:37,169 INFO client.RMProxy: Connecting to ResourceManager at 0.0.0.0/0.0.0.0:8032 Total Nodes:1 Node-Id Node-State Node-Http-Address Number-of-Running-Containers 192.168.1.69:57560 DECOMMISSIONED 192.168.1.69:8042 0 {noformat} And a full report like this: {noformat} >> bin/yarn node -status 192.168.1.69:57560 2016-11-04 08:55:08,928 INFO client.RMProxy: Connecting to ResourceManager at 0.0.0.0/0.0.0.0:8032 Node Report : Node-Id : 192.168.1.69:57560 Rack : /default-rack Node-State : DECOMMISSIONED Node-Http-Address : 192.168.1.69:8042 Last-Health-Update : Fri 04/Nov/16 08:53:58:802PDT Health-Report : Containers : 0 Memory-Used : 0MB Memory-Capacity : 8192MB CPU-Used : 0 vcores CPU-Capacity : 8 vcores Node-Labels : Resource Utilization by Node : Resource Utilization by Containers : PMem:0 MB, VMem:0 MB, VCores:0.0 {noformat} If you then restart the ResourceManager, you get this report: {noformat} >> bin/yarn node -list -all 2016-11-04 08:57:18,512 INFO client.RMProxy: Connecting to ResourceManager at 0.0.0.0/0.0.0.0:8032 Total Nodes:4 Node-Id Node-State Node-Http-Address Number-of-Running-Containers 192.168.1.69:-1 DECOMMISSIONED 192.168.1.69:-1 0 {noformat} And when you try to get the full report on the now "-1" node, you get an NPE: {noformat} >> bin/yarn node -status 192.168.1.69:-1 2016-11-04 08:57:57,385 INFO client.RMProxy: Connecting to ResourceManager at 0.0.0.0/0.0.0.0:8032 Exception in thread "main" java.lang.NullPointerException at org.apache.hadoop.yarn.client.cli.NodeCLI.printNodeStatus(NodeCLI.java:296) at 
org.apache.hadoop.yarn.client.cli.NodeCLI.run(NodeCLI.java:116) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) at org.apache.hadoop.yarn.client.cli.NodeCLI.main(NodeCLI.java:63) {noformat}
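A straightforward defensive fix for an NPE like this is to null-guard the report fields that can be absent for a DECOMMISSIONED node recovered after an RM restart, before formatting them. A minimal sketch — the stub type and field names below are hypothetical stand-ins for YARN's NodeReport, not the actual patch for this JIRA:

```java
// Hypothetical minimal stand-in for the fields NodeCLI prints; after an RM
// restart, a recovered DECOMMISSIONED node may be missing most of them.
public class NodeStatusPrinter {
    static class NodeReportStub {
        String httpAddress;     // may be null
        String healthReport;    // may be null
        Integer numContainers;  // may be null
    }

    // Format with explicit fallbacks instead of dereferencing possibly-null fields.
    static String format(NodeReportStub r) {
        StringBuilder sb = new StringBuilder();
        sb.append("Node-Http-Address : ")
          .append(r.httpAddress != null ? r.httpAddress : "N/A").append('\n');
        sb.append("Health-Report : ")
          .append(r.healthReport != null ? r.healthReport : "").append('\n');
        sb.append("Containers : ")
          .append(r.numContainers != null ? r.numContainers : 0);
        return sb.toString();
    }
}
```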
[jira] [Created] (YARN-5836) NMToken passwd not checked in ContainerManagerImpl, so that malicious AM can fake the Token and kill containers of other apps at will
Botong Huang created YARN-5836: -- Summary: NMToken passwd not checked in ContainerManagerImpl, so that malicious AM can fake the Token and kill containers of other apps at will Key: YARN-5836 URL: https://issues.apache.org/jira/browse/YARN-5836 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Botong Huang Assignee: Botong Huang Priority: Minor When AM calls NM via stopContainers in ContainerManagementProtocol, the NMToken (generated by RM) is passed along via the user ugi. However currently ContainerManagerImpl is not validating this token correctly, specifically in authorizeGetAndStopContainerRequest in ContainerManagerImpl. Basically it blindly trusts the content in the NMTokenIdentifier without verifying the password (RM generated signature) in the NMToken, so that malicious AM can just fake the content in the NMTokenIdentifier and pass it to NMs. Moreover, currently even for plain text checking, when the appId doesn’t match, all it does is log it as a warning and continues to kill the container… For startContainers the NMToken is not checked correctly in authorizeUser as well, however the ContainerToken is verified properly by regenerating and comparing the password in verifyAndGetContainerTokenIdentifier, so that malicious AM cannot launch containers at will.
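The missing check amounts to recomputing the token password from the identifier bytes with the server-side secret and comparing it against the password the client presented, as the report says verifyAndGetContainerTokenIdentifier already does for ContainerTokens. A hedged sketch of that idea — the HMAC algorithm, class, and method names here are assumptions for illustration, not YARN's actual SecretManager API:

```java
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Illustrative token check: regenerate the password (an HMAC over the
// identifier bytes) with the shared secret and compare it, constant-time,
// to the password presented by the client.
public class TokenCheck {
    static byte[] sign(byte[] identifier, byte[] secret) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        return mac.doFinal(identifier);
    }

    // Returns false for any tampered identifier: without the secret, a
    // malicious AM cannot forge a matching password for faked contents.
    static boolean verify(byte[] identifier, byte[] presentedPassword, byte[] secret)
            throws Exception {
        return MessageDigest.isEqual(sign(identifier, secret), presentedPassword);
    }
}
```

MessageDigest.isEqual is used for the comparison so the check does not leak timing information about how many leading bytes matched.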
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/215/ [Nov 3, 2016 7:02:22 AM] (naganarasimha_gr) YARN-5720. Update document for "rmadmin -replaceLabelOnNode". [Nov 3, 2016 1:54:31 PM] (jlowe) YARN-4862. Handle duplicate completed containers in RMNodeImpl. [Nov 3, 2016 5:03:43 PM] (vvasudev) YARN-5822. Log ContainerRuntime initialization error in [Nov 3, 2016 5:27:52 PM] (arp) HDFS-11097. Fix warnings for deprecated StorageReceivedDeletedBlocks [Nov 3, 2016 6:16:07 PM] (xiao) HADOOP-13787. Azure testGlobStatusThrowsExceptionForUnreadableDir fails. [Nov 3, 2016 8:09:03 PM] (xiao) HADOOP-12453. Support decoding KMS Delegation Token with its own [Nov 3, 2016 8:49:10 PM] (liuml07) HDFS-11076. Add unit test for extended Acls. Contributed by Chen Liang [Nov 3, 2016 10:31:07 PM] (wang) HADOOP-11088. Quash unnecessary safemode WARN message during NameNode [Nov 4, 2016 3:01:43 AM] (vinayakumarb) HDFS-11098. Datanode in tests cannot start in Windows after HDFS-10638 -1 overall The following subsystems voted -1: asflicense unit The following subsystems voted -1 but were configured to be filtered/ignored: cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace The following subsystems are considered long running: (runtime bigger than 1h 0m 0s) unit Specific tests: Failed junit tests : hadoop.ha.TestZKFailoverController hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens hadoop.yarn.server.TestMiniYarnClusterNodeUtilization hadoop.yarn.server.TestContainerManagerSecurity Timed out junit tests : org.apache.hadoop.tools.TestHadoopArchives cc: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/215/artifact/out/diff-compile-cc-root.txt [4.0K] javac: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/215/artifact/out/diff-compile-javac-root.txt [168K] checkstyle: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/215/artifact/out/diff-checkstyle-root.txt [16M] pylint: 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/215/artifact/out/diff-patch-pylint.txt [20K] shellcheck: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/215/artifact/out/diff-patch-shellcheck.txt [20K] shelldocs: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/215/artifact/out/diff-patch-shelldocs.txt [16K] whitespace: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/215/artifact/out/whitespace-eol.txt [11M] https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/215/artifact/out/whitespace-tabs.txt [1.3M] javadoc: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/215/artifact/out/diff-javadoc-javadoc-root.txt [2.2M] unit: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/215/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [124K] https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/215/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt [56K] https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/215/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt [316K] https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/215/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-nativetask.txt [124K] https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/215/artifact/out/patch-unit-hadoop-tools_hadoop-archives.txt [8.0K] asflicense: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/215/artifact/out/patch-asflicense-problems.txt [4.0K] Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org