[jira] [Updated] (YARN-457) Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl
[ https://issues.apache.org/jira/browse/YARN-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima updated YARN-457: - Attachment: YARN-457-2.patch Sorry, I changed it to call initLocalNewNodeReportList() before clearing this.updatedNodes. Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl Key: YARN-457 URL: https://issues.apache.org/jira/browse/YARN-457 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Kenji Kikushima Priority: Minor Labels: Newbie Attachments: YARN-457-2.patch, YARN-457.patch
{code}
if (updatedNodes == null) {
  this.updatedNodes.clear();
  return;
}
{code}
If updatedNodes is already null, a NullPointerException is thrown. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
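The fix described in the comment can be sketched as follows. This is an illustrative stand-in, not the actual YARN-457-2.patch: the real AllocateResponsePBImpl wraps protobuf builders and holds a List<NodeReport>, so the class shape, List<String>, and the initLocalUpdatedNodeList name here are simplifications.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for AllocateResponsePBImpl (names are illustrative).
public class AllocateResponseSketch {
    private List<String> updatedNodes;

    // Mirrors the role of initLocalNewNodeReportList(): make sure the
    // local list exists before anything tries to clear or populate it.
    private void initLocalUpdatedNodeList() {
        if (updatedNodes == null) {
            updatedNodes = new ArrayList<>();
        }
    }

    public void setUpdatedNodes(final List<String> nodes) {
        if (nodes == null) {
            // Initialize first, so a null-to-null update clears an empty
            // list instead of throwing a NullPointerException.
            initLocalUpdatedNodeList();
            updatedNodes.clear();
            return;
        }
        initLocalUpdatedNodeList();
        updatedNodes.clear();
        updatedNodes.addAll(nodes);
    }

    public List<String> getUpdatedNodes() {
        return updatedNodes;
    }
}
```

The key point is ordering: the lazy initialization runs before the clear() in both branches, so the setter is safe regardless of whether the field was ever populated.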
[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality
[ https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620705#comment-13620705 ] Sandy Ryza commented on YARN-392: - OK, I will work on a patch for the non-blacklist proposal. To clarify, should location-specific requests be able to coexist with non-location-specific requests at the same priority? Make it possible to schedule to specific nodes without dropping locality Key: YARN-392 URL: https://issues.apache.org/jira/browse/YARN-392 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Sandy Ryza Attachments: YARN-392-1.patch, YARN-392.patch Currently it's not possible to specify scheduling requests for specific nodes and nowhere else. The RM automatically relaxes locality to rack and * and assigns non-specified machines to the app.
[jira] [Commented] (YARN-527) Local filecache mkdir fails
[ https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620711#comment-13620711 ] Knut O. Hellan commented on YARN-527: - There is really no difference in how the directories are created. What probably happened under the hood was that the file system reached the maximum number of files in the filecache directory. This maximum is 32000 since we use ext3. I don't have the exact numbers for any of the disks from my checks, but I remember seeing above 30k some places. The reason we were able to manually create directories might be that there was some automatic cleanup happening. Does YARN clean the file cache? Local filecache mkdir fails --- Key: YARN-527 URL: https://issues.apache.org/jira/browse/YARN-527 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.0-alpha Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes and six worker nodes. Reporter: Knut O. Hellan Priority: Minor Attachments: yarn-site.xml Jobs failed with no other explanation than this stack trace: 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1364591875320_0017_m_00_0: java.io.IOException: mkdir of /disk3/yarn/local/filecache/-4230789355400878397 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Manually creating the directory worked. This behavior was common to at least several nodes in the cluster. The situation was resolved by removing and recreating all /disk?/yarn/local/filecache directories on all nodes. It is unclear whether Yarn struggled with the number of files or if there were corrupt files in the caches. The situation was triggered by a node dying.
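The numbers in the comment line up with ext3's hard limit of about 32000 entries (subdirectory link count) per directory. A quick way to check whether a cluster is near that limit is to count entries in each filecache directory. This is an illustrative sketch, not a YARN tool; the /diskN path pattern is taken from the report and the 32000 constant is the ext3 limit, so adjust both for other layouts or filesystems.

```java
import java.io.File;

public class FilecacheEntryCount {
    // ext3 caps a directory at roughly 32000 subdirectories.
    static final int EXT3_LINK_LIMIT = 32000;

    // Count direct children of a directory; 0 if it is missing/unreadable.
    static int countEntries(File dir) {
        String[] children = dir.list();
        return children == null ? 0 : children.length;
    }

    public static void main(String[] args) {
        // Disk layout as described in the report (six worker-node disks).
        for (int disk = 1; disk <= 6; disk++) {
            File cache = new File("/disk" + disk + "/yarn/local/filecache");
            int n = countEntries(cache);
            System.out.println(cache + ": " + n + " entries"
                + (n > EXT3_LINK_LIMIT - 2000 ? "  <-- near ext3 limit" : ""));
        }
    }
}
```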
[jira] [Updated] (YARN-425) coverage fix for yarn api
[ https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-425: -- Attachment: YARN-425-branch-2-b.patch coverage fix for yarn api - Key: YARN-425 URL: https://issues.apache.org/jira/browse/YARN-425 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-425-branch-0.23.patch, YARN-425-branch-2-b.patch, YARN-425-branch-2.patch, YARN-425-trunk-a.patch, YARN-425-trunk-b.patch, YARN-425-trunk.patch coverage fix for yarn api patch YARN-425-trunk-a.patch for trunk patch YARN-425-branch-2.patch for branch-2 patch YARN-425-branch-0.23.patch for branch-0.23
[jira] [Updated] (YARN-425) coverage fix for yarn api
[ https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-425: -- Attachment: YARN-425-trunk-b.patch coverage fix for yarn api - Key: YARN-425 URL: https://issues.apache.org/jira/browse/YARN-425 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-425-branch-0.23.patch, YARN-425-branch-2-b.patch, YARN-425-branch-2.patch, YARN-425-trunk-a.patch, YARN-425-trunk-b.patch, YARN-425-trunk.patch coverage fix for yarn api patch YARN-425-trunk-a.patch for trunk patch YARN-425-branch-2.patch for branch-2 patch YARN-425-branch-0.23.patch for branch-0.23
[jira] [Commented] (YARN-425) coverage fix for yarn api
[ https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620843#comment-13620843 ] Aleksey Gorshkov commented on YARN-425: --- Updated patches: YARN-425-trunk-b.patch for trunk and YARN-425-branch-2-b.patch for branch-2. coverage fix for yarn api - Key: YARN-425 URL: https://issues.apache.org/jira/browse/YARN-425 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-425-branch-0.23.patch, YARN-425-branch-2-b.patch, YARN-425-branch-2.patch, YARN-425-trunk-a.patch, YARN-425-trunk-b.patch, YARN-425-trunk.patch coverage fix for yarn api patch YARN-425-trunk-a.patch for trunk patch YARN-425-branch-2.patch for branch-2 patch YARN-425-branch-0.23.patch for branch-0.23
[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-427: -- Attachment: YARN-427-trunk-a.patch YARN-427-branch-2-a.patch Coverage fix for org.apache.hadoop.yarn.server.api.* Key: YARN-427 URL: https://issues.apache.org/jira/browse/YARN-427 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, YARN-427-trunk.patch Coverage fix for org.apache.hadoop.yarn.server.api.* patch YARN-427-trunk.patch for trunk patch YARN-427-branch-2.patch for branch-2 and branch-0.23
[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620845#comment-13620845 ] Aleksey Gorshkov commented on YARN-427: --- Patches were updated: YARN-427-trunk-a.patch for trunk, YARN-427-branch-2-a.patch for branch-2 and branch-0.23. Coverage fix for org.apache.hadoop.yarn.server.api.* Key: YARN-427 URL: https://issues.apache.org/jira/browse/YARN-427 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, YARN-427-trunk.patch Coverage fix for org.apache.hadoop.yarn.server.api.* patch YARN-427-trunk.patch for trunk patch YARN-427-branch-2.patch for branch-2 and branch-0.23
[jira] [Updated] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-465: -- Attachment: YARN-465-branch-2-a.patch YARN-465-branch-0.23-a.patch fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is an issue in branch-0.23: the patch does not create the .keep file. To fix it, run:
mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy
touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep
[jira] [Updated] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-465: -- Attachment: YARN-465-trunk-a.patch fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is an issue in branch-0.23: the patch does not create the .keep file. To fix it, run:
mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy
touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep
[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620914#comment-13620914 ] Aleksey Gorshkov commented on YARN-465: --- Patches updated: YARN-465-trunk-a.patch for trunk, YARN-465-branch-2-a.patch for branch-2, YARN-465-branch-0.23-a.patch for branch-0.23. fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is an issue in branch-0.23: the patch does not create the .keep file. To fix it, run:
mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy
touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep
[jira] [Commented] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620913#comment-13620913 ] Steve Loughran commented on YARN-117: - I'm not seeing all those tests failing locally, only {{TestUnmanagedAMLauncher}} and {{TestNMExpiry}}. {code} testNMExpiry(org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry) Time elapsed: 2797 sec FAILURE! junit.framework.AssertionFailedError: expected:2 but was:0 at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.failNotEquals(Assert.java:283) at junit.framework.Assert.assertEquals(Assert.java:64) at junit.framework.Assert.assertEquals(Assert.java:195) at junit.framework.Assert.assertEquals(Assert.java:201) at org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry.testNMExpiry(TestNMExpiry.java:157) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) {code} and {code} Running org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.579 sec FAILURE! org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher Time elapsed: 579 sec ERROR! 
org.apache.hadoop.yarn.YarnException: could not cleanup test dir: java.lang.RuntimeException: Error parsing 'yarn-site.xml' : org.xml.sax.SAXParseException: Premature end of file. at org.apache.hadoop.yarn.server.MiniYARNCluster.init(MiniYARNCluster.java:95) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.setup(TestUnmanagedAMLauncher.java:52) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) Enhance YARN service model -- Key: YARN-117 URL: https://issues.apache.org/jira/browse/YARN-117 Project: Hadoop YARN Issue Type: Improvement Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117.patch Having played with the YARN service model, there are some issues that I've identified based on past work and initial use.
This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs.

h2. state model prevents stopped state being entered if you could not successfully start the service.

In the current lifecycle you cannot stop a service unless it was successfully started, but
* {{init()}} may acquire resources that need to be explicitly released
* if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources.

*Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields
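The proposed fix can be sketched as a stop() that is legal from every state and defensive about fields that a failed start() may never have set. The class, enum, and field names below are illustrative only, not the actual YARN Service API:

```java
// Illustrative sketch of a stop() that is valid from any lifecycle state.
public class ServiceSketch {
    enum State { NOTINITED, INITED, STARTED, STOPPED }

    private State state = State.NOTINITED;
    private Object resource; // may still be null if init()/start() failed early

    public void stop() {
        // Idempotent: re-entrant stop is a no-op rather than an error.
        if (state == State.STOPPED) {
            return;
        }
        // Release only what was actually acquired; never assume start()
        // completed, so every field must be null-checked before cleanup.
        if (resource != null) {
            resource = null;
        }
        state = State.STOPPED;
    }

    public State getState() {
        return state;
    }
}
```

The design choice is that cleanup logic, not the state machine, carries the burden of partial initialization: stop() tolerates null fields instead of the lifecycle forbidding the transition.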
[jira] [Created] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs
Steve Loughran created YARN-535: --- Summary: TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs Key: YARN-535 URL: https://issues.apache.org/jira/browse/YARN-535 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 3.0.0 Environment: OS/X laptop, HFS+ filesystem Reporter: Steve Loughran Priority: Minor the setup phase of {{TestUnmanagedAMLauncher}} overwrites {{yarn-site.xml}}. As {{Configuration.writeXml()}} does a reread of all resources, this will break if the (open-for-writing) resource is already visible as an empty file. This leaves a corrupted {{target/test-classes/yarn-site.xml}}, which breaks later test runs - because it is not overwritten by later incremental builds, due to timestamps.
[jira] [Commented] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs
[ https://issues.apache.org/jira/browse/YARN-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620974#comment-13620974 ] Steve Loughran commented on YARN-535: - {code} org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher Time elapsed: 4137 sec ERROR! java.lang.RuntimeException: Error parsing 'yarn-site.xml' : org.xml.sax.SAXParseException: Premature end of file. at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2050) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1899) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1816) at org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:465) at org.apache.hadoop.conf.Configuration.asXmlDocument(Configuration.java:2127) at org.apache.hadoop.conf.Configuration.writeXml(Configuration.java:2096) at org.apache.hadoop.conf.Configuration.writeXml(Configuration.java:2086) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.setup(TestUnmanagedAMLauncher.java:63) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) Caused by: org.xml.sax.SAXParseException: Premature end of file. at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:246) at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284) at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:153) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:1887) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:1875) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1946) ... 29 more {code} This stack trace is a failure to read the file yarn-site.xml, which is actually being written on line 63 of TestUnmanagedAMLauncher -a file that is already open for writing. 
It is possible that some filesystems (here, HFS+) make that write visible while it is still going on, triggering a failure which then corrupts later builds at init time.
{code}
$ ls -l target/test-classes/yarn-site.xml
-rw-r--r-- 1 stevel staff 0 3 Apr 15:37 target/test-classes/yarn-site.xml
{code}
This is newer than the one in src/test/resources, so Maven doesn't fix it on the next test run
{code}
$ ls -l src/test/resources/yarn-site.xml
-rw-r--r--@ 1 stevel staff 830 28 Nov 16:29 src/test/resources/yarn-site.xml
{code}
as a result, follow-on tests fail when MiniYARNCluster tries to read it.
{code}
org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher Time elapsed: 515 sec ERROR! java.lang.RuntimeException: Error parsing 'yarn-site.xml' : org.xml.sax.SAXParseException: Premature end of file. at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2050) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1899) at
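A generic way to avoid readers observing a half-written file is to write to a temporary file in the same directory and atomically rename it over the target. This sketch uses plain java.nio rather than Hadoop's Configuration.writeXml, and it illustrates the pattern only; it is not the fix adopted for YARN-535:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicWriteSketch {
    // Write content to a temp file in the target's own directory, then
    // atomically move it over the target; no reader can ever observe an
    // empty or partially written file at the target path.
    static void writeAtomically(Path target, String content) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "tmp-", ".xml");
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, target,
                   StandardCopyOption.ATOMIC_MOVE,
                   StandardCopyOption.REPLACE_EXISTING);
    }
}
```

The temp file must live on the same filesystem as the target (hence the same directory) for the rename to be atomic.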
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621023#comment-13621023 ] Robert Joseph Evans commented on YARN-528: -- OK, I understand now. I will try to find some time to play around with getting the AM ID to not have a wrapper at all. Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that: it makes the various ID objects immutable. If this patch is well received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and the YARN APIs can no longer be changed.
[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated YARN-412: -- Attachment: (was: YARN-412.patch) FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Assignee: Roger Hoover Priority: Minor Labels: patch Attachments: YARN-412.patch In the FifoScheduler, the assignNodeLocalContainers method checks whether the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. This is incorrect: it should check the hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are keyed by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234). In the CapacityScheduler, it's done using the hostname. See LeafQueue.assignNodeLocalContainers, line 1129: application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler: even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it is rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node-local.
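The mismatch described above comes down to a failed map lookup. This sketch uses a String-keyed map of request counts as a stand-in for the application's per-location resource requests (the real code keys ResourceRequest objects by resource name); it is illustrative, not YARN code:

```java
import java.util.HashMap;
import java.util.Map;

public class LocalityLookupSketch {
    // Outstanding requests are keyed by hostname, as applications submit them.
    static final Map<String, Integer> requests = new HashMap<>();
    static {
        requests.put("host1.foo.com", 3); // 3 outstanding node-local requests
    }

    // Looking up by nodeAddress (host:port) never matches a hostname key;
    // looking up by hostname does.
    static Integer lookup(String key) {
        return requests.get(key);
    }

    public static void main(String[] args) {
        System.out.println(lookup("host1.foo.com:1234")); // no match: null
        System.out.println(lookup("host1.foo.com"));      // node-local match
    }
}
```

This mirrors why the FifoScheduler's nodeAddress-based lookup silently misses node-local requests while the CapacityScheduler's hostname-based lookup finds them.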
[jira] [Commented] (YARN-527) Local filecache mkdir fails
[ https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621042#comment-13621042 ] Vinod Kumar Vavilapalli commented on YARN-527: -- If it is the 32K limit that caused it, the timing can't be more perfect. I just committed YARN-467 which addresses it for the public cache, and YARN-99 is in progress which takes care of the private cache. These two JIRAs enforce a limit in YARN itself; the default is 8192. Looking back again at your stack trace, I agree that it is very likely you are hitting the 32K limit. Can I close this as a duplicate of YARN-467? You can verify the fix on 2.0.5-beta when it is out. Local filecache mkdir fails --- Key: YARN-527 URL: https://issues.apache.org/jira/browse/YARN-527 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.0-alpha Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes and six worker nodes. Reporter: Knut O. Hellan Priority: Minor Attachments: yarn-site.xml Jobs failed with no other explanation than this stack trace: 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1364591875320_0017_m_00_0: java.io.IOException: mkdir of /disk3/yarn/local/filecache/-4230789355400878397 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Manually creating the directory worked. This behavior was common to at least several nodes in the cluster. The situation was resolved by removing and recreating all /disk?/yarn/local/filecache directories on all nodes. It is unclear whether Yarn struggled with the number of files or if there were corrupt files in the caches. The situation was triggered by a node dying.
[jira] [Updated] (YARN-430) Add HDFS based store for RM which manages the store using directories
[ https://issues.apache.org/jira/browse/YARN-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-430: - Summary: Add HDFS based store for RM which manages the store using directories (was: Add HDFS based store for RM) Add HDFS based store for RM which manages the store using directories - Key: YARN-430 URL: https://issues.apache.org/jira/browse/YARN-430 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He There is a generic FileSystem store but it does not take advantage of HDFS features like directories, replication, DFSClient advanced settings for HA, retries, etc. Writing a store that's optimized for HDFS would be good. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-248) Security related work for RM restart
[ https://issues.apache.org/jira/browse/YARN-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-248: - Summary: Security related work for RM restart (was: Restore RMDelegationTokenSecretManager state on restart) Security related work for RM restart Key: YARN-248 URL: https://issues.apache.org/jira/browse/YARN-248 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tom White Assignee: Bikas Saha On restart, the RM creates a new RMDelegationTokenSecretManager with fresh state. This will cause problems for Oozie jobs running on secure clusters since the delegation tokens stored in the job credentials (used by the Oozie launcher job to submit a job to the RM) will not be recognized by the RM, and recovery will fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621114#comment-13621114 ] Sandy Ryza commented on YARN-458: - Verified on a pseudo-distributed cluster that both the old and new configs work. YARN daemon addresses must be placed in many different configs -- Key: YARN-458 URL: https://issues.apache.org/jira/browse/YARN-458 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-458.patch The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address. A new user trying to configure a cluster needs to know the names of all four of these configs. The same issue exists for nodemanagers. It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname, and default ports for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
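[Editorial note: a minimal yarn-site.xml sketch of what the proposed simplification would look like. The hostname property names are the ones proposed in this issue; the port numbers shown in the commented-out block are the standard RM defaults and are illustrative only.]

```xml
<configuration>
  <!-- Proposed: set the RM host once; the four addresses would default to
       this hostname with their standard ports. -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm.example.com</value>
  </property>

  <!-- Today, each address must be spelled out explicitly, e.g.: -->
  <!--
  <property><name>yarn.resourcemanager.address</name><value>rm.example.com:8032</value></property>
  <property><name>yarn.resourcemanager.scheduler.address</name><value>rm.example.com:8030</value></property>
  <property><name>yarn.resourcemanager.resource-tracker.address</name><value>rm.example.com:8031</value></property>
  <property><name>yarn.resourcemanager.admin.address</name><value>rm.example.com:8033</value></property>
  -->
</configuration>
```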
[jira] [Updated] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-193: - Attachment: YARN-193.13.patch Fix the twice setting bug and change default max vcores to 4. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, YARN-193.13.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621128#comment-13621128 ] Omkar Vinit Joshi commented on YARN-99: --- Rebasing the patch as YARN-467 is now committed. This issue is related to YARN-467 and the detailed information can be found here [underlying problem and proposed/implemented Solution | https://issues.apache.org/jira/browse/YARN-467?focusedCommentId=13615894&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13615894] The only difference here is that the same problem is present in local-dir/usercache/user-name/filecache (private user cache). We are using LocalCacheDirectoryManager for the user cache but not for the app cache, as it is highly unlikely for an application to have that many localized files. The earlier implementation for the private cache computed the localized path inside ContainerLocalizer, i.e. in a different process. In order to centralize this, we have moved it to ResourceLocalizationService.LocalizerRunner, and the path is communicated to each ContainerLocalizer as part of the heartbeat. Thereby we can now manage the local cache directories in one place. Jobs fail during resource localization when private distributed-cache hits unix directory limits Key: YARN-99 URL: https://issues.apache.org/jira/browse/YARN-99 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Devaraj K Assignee: Omkar Vinit Joshi Attachments: yarn-99-20130324.patch If we have multiple jobs which use the distributed cache with many small files, the directory limit is reached before the cache size limit, and no further directories can be created in the file cache. The jobs start failing with the below exception. 
{code:xml} java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} We should have a mechanism to clean the cache files when the number of directories crosses a specified limit, as is done for cache size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
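[Editorial note: a simplified sketch of the hierarchical-directory idea behind LocalCacheDirectoryManager, not YARN's actual implementation; the class and method names here are hypothetical. The point is that spreading localized resources over a tree of subdirectories keeps every directory below a per-directory cap, which is what trips the filesystem's ~32K entries-per-directory limit in a flat filecache/.]

```java
// Hypothetical sketch: map the i-th localized resource to a nested
// sub-directory so no single directory holds more than PER_DIR_CAP entries.
class CacheDirSketch {
    // 8192 is the default limit YARN-467 introduced for the public cache.
    static final int PER_DIR_CAP = 8192;

    // Returns a relative sub-path ("" for the first PER_DIR_CAP resources,
    // then "1/", "2/", ... and eventually nested paths like "1/0/").
    static String relativePath(long i) {
        StringBuilder sb = new StringBuilder();
        long dir = i / PER_DIR_CAP;           // which bucket this resource is in
        while (dir > 0) {
            // encode each level base-36, one directory component per digit
            sb.insert(0, "/").insert(0, Long.toString(dir % 36, 36));
            dir /= 36;
        }
        return sb.toString();
    }
}
```

Under this scheme a cache can grow far beyond the per-directory limit while every directory stays small.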
[jira] [Assigned] (YARN-514) Delayed store operations should not result in RM unavailability for app submission
[ https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-514: Assignee: Zhijie Shen (was: Bikas Saha) Delayed store operations should not result in RM unavailability for app submission -- Key: YARN-514 URL: https://issues.apache.org/jira/browse/YARN-514 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Zhijie Shen Currently, app submission is the only store operation performed synchronously because the app must be stored before the request returns with success. This makes the RM susceptible to blocking all client threads on slow store operations, resulting in RM being perceived as unavailable by clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
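[Editorial note: a minimal sketch of the async-store pattern this issue asks for, with hypothetical names, not the RM's actual API. The client-facing call returns immediately; a dedicated executor performs the durable write, and the app only becomes schedulable after the store succeeds.]

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AsyncStoreSketch {
    // Single writer thread keeps store operations ordered without
    // blocking the RPC handler threads that accept submissions.
    private final ExecutorService storeExecutor = Executors.newSingleThreadExecutor();
    final Set<String> storedApps = ConcurrentHashMap.newKeySet();

    void submitApplication(String appId, byte[] appState) {
        // Return to the client right away; do not block on the store.
        storeExecutor.execute(() -> {
            // The slow, durable write (e.g. to HDFS or ZooKeeper) would
            // happen here; only afterwards is the app marked schedulable.
            storedApps.add(appId);
        });
    }

    // Test/shutdown helper: wait for queued store operations to drain.
    void awaitQuiesce() {
        storeExecutor.shutdown();
        try {
            storeExecutor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```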
[jira] [Created] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object
Xuan Gong created YARN-536: -- Summary: Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object Key: YARN-536 URL: https://issues.apache.org/jira/browse/YARN-536 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Remove ContainerState and ContainerStatus from the Container interface. They will not be called by the container object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object
[ https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-536: -- Assignee: Xuan Gong Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object -- Key: YARN-536 URL: https://issues.apache.org/jira/browse/YARN-536 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Remove containerstate, containerStatus from container interface. They will not be called by container object -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-404) Node Manager leaks Data Node connections
[ https://issues.apache.org/jira/browse/YARN-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-404: - Priority: Major (was: Blocker) Moving it off blocker status. Devaraj, can you give us more information? Is this still happening? Tx. Node Manager leaks Data Node connections Key: YARN-404 URL: https://issues.apache.org/jira/browse/YARN-404 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha, 0.23.6 Reporter: Devaraj K Assignee: Devaraj K The RM fails to hand some applications to the NM for cleanup. Because of this, log aggregation does not happen for those applications, and DataNode connections are leaked on the NM side. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object
[ https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621180#comment-13621180 ] Xuan Gong commented on YARN-536: Remove the getter and setter for ContainerState and ContainerStatus from the Container interface, and remove those fields from the proto file. There is some test code that used the getter and setter to get the ContainerState or ContainerStatus from a container object: /hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java /hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java /hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object -- Key: YARN-536 URL: https://issues.apache.org/jira/browse/YARN-536 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Remove ContainerState and ContainerStatus from the Container interface. They will not be called by the container object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621201#comment-13621201 ] Hudson commented on YARN-101: - Integrated in Hadoop-trunk-Commit #3554 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3554/]) YARN-101. Fix NodeManager heartbeat processing to not lose track of completed containers in case of dropped heartbeats. Contributed by Xuan Gong. (Revision 1464105) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464105 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java If the heartbeat message loss, the nodestatus info of complete container will loss too. Key: YARN-101 URL: https://issues.apache.org/jira/browse/YARN-101 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: suse. 
Reporter: xieguiming Assignee: Xuan Gong Priority: Minor Fix For: 2.0.5-beta Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, YARN-101.4.patch, YARN-101.5.patch, YARN-101.6.patch see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
protected void startStatusUpdater() {
  new Thread("Node Status Updater") {
    @Override
    @SuppressWarnings("unchecked")
    public void run() {
      int lastHeartBeatID = 0;
      while (!isStopped) {
        // Send heartbeat
        try {
          synchronized (heartbeatMonitor) {
            heartbeatMonitor.wait(heartBeatInterval);
          }
          {color:red} // Before we send the heartbeat, we get the NodeStatus,
          // whose method removes completed containers.
          NodeStatus nodeStatus = getNodeStatus(); {color}
          nodeStatus.setResponseId(lastHeartBeatID);
          NodeHeartbeatRequest request = recordFactory
              .newRecordInstance(NodeHeartbeatRequest.class);
          request.setNodeStatus(nodeStatus);
          {color:red} // But if the nodeHeartbeat fails, we've already removed the
          // completed containers and have no way to know about it. We aren't
          // handling the nodeHeartbeat failure case here.
          HeartbeatResponse response =
              resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); {color}
          if (response.getNodeAction() == NodeAction.SHUTDOWN) {
            LOG.info("Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat,"
                + " hence shutting down.");
            NodeStatusUpdaterImpl.this.stop();
            break;
          }
          if (response.getNodeAction() == NodeAction.REBOOT) {
            LOG.info("Node is out of sync with ResourceManager,"
                + " hence rebooting.");
            NodeStatusUpdaterImpl.this.reboot();
            break;
          }
          lastHeartBeatID = response.getResponseId();
          List<ContainerId> containersToCleanup = response
              .getContainersToCleanupList();
          if (containersToCleanup.size() != 0) {
            dispatcher.getEventHandler().handle(
                new CMgrCompletedContainersEvent(containersToCleanup));
          }
          List<ApplicationId> appsToCleanup =
              response.getApplicationsToCleanupList();
          // Only start tracking for keepAlive on FINISH_APP
          trackAppsForKeepAlive(appsToCleanup);
          if (appsToCleanup.size() != 0) {
            dispatcher.getEventHandler().handle(
                new CMgrCompletedAppsEvent(appsToCleanup));
          }
        } catch (Throwable e) {
          // TODO Better error handling. Thread can die with the rest of the
          // NM still running.
          LOG.error("Caught exception in status-updater", e);
        }
      }
    }
  }.start();
}

private NodeStatus getNodeStatus() {
  NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class);
  nodeStatus.setNodeId(this.nodeId);
  int numActiveContainers = 0;
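[Editorial note: a minimal sketch of the general fix pattern this issue calls for, with hypothetical names, not the committed YARN-101 code. Instead of destructively removing completed containers while building a heartbeat, the NM remembers what it sent and forgets it only after the RM's response acknowledges the heartbeat; a failed heartbeat then leads to a re-send rather than lost state.]

```java
import java.util.ArrayList;
import java.util.List;

class HeartbeatSketch {
    // Completed containers not yet acknowledged by the RM.
    private final List<String> pendingCompleted = new ArrayList<>();
    // What the last heartbeat carried; cleared only on a successful response.
    private List<String> lastSent = new ArrayList<>();

    void containerCompleted(String containerId) {
        pendingCompleted.add(containerId);
    }

    // Build the status to send; nothing is dropped at this point.
    List<String> buildNodeStatus() {
        lastSent = new ArrayList<>(pendingCompleted);
        return lastSent;
    }

    // Called only after resourceTracker.nodeHeartbeat(...) succeeds.
    // If the heartbeat throws instead, this is never called and the
    // completed containers are simply re-sent on the next heartbeat.
    void heartbeatAcked() {
        pendingCompleted.removeAll(lastSent);
        lastSent = new ArrayList<>();
    }
}
```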
[jira] [Commented] (YARN-381) Improve FS docs
[ https://issues.apache.org/jira/browse/YARN-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621203#comment-13621203 ] Hudson commented on YARN-381: - Integrated in Hadoop-trunk-Commit #3554 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3554/]) YARN-381. Improve fair scheduler docs. Contributed by Sandy Ryza. (Revision 1464130) Result = SUCCESS tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464130 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm Improve FS docs --- Key: YARN-381 URL: https://issues.apache.org/jira/browse/YARN-381 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Sandy Ryza Priority: Minor Fix For: 2.0.5-beta Attachments: YARN-381.patch The MR2 FS docs could use some improvements. Configuration: - sizebasedweight - what is the size here? Total memory usage? Pool properties: - minResources - what does min amount of aggregate memory mean given that this is not a reservation? - maxResources - is this a hard limit? - weight: How is this ratio configured? Eg base is 1 and all weights are relative to that? - schedulingMode - what is the default? Is fifo pure fifo, eg waits until all tasks for the job are finished before launching the next job? There's no mention of ACLs, even though they're supported. See the CS docs for comparison. Also there are a couple typos worth fixing while we're at it, eg finish. apps to run Worth keeping in mind that some of these will need to be updated to reflect that resource calculators are now pluggable. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621204#comment-13621204 ] Hadoop QA commented on YARN-427: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576767/YARN-427-trunk-a.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/660//console This message is automatically generated. Coverage fix for org.apache.hadoop.yarn.server.api.* Key: YARN-427 URL: https://issues.apache.org/jira/browse/YARN-427 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, YARN-427-trunk.patch Coverage fix for org.apache.hadoop.yarn.server.api.* patch YARN-427-trunk.patch for trunk patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621215#comment-13621215 ] Hadoop QA commented on YARN-465: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576782/YARN-465-trunk-a.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/659//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/659//console This message is automatically generated. 
fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is issue in branch-0.23 . Patch does not creating .keep file. For fix it need to run commands: mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-425) coverage fix for yarn api
[ https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621218#comment-13621218 ] Hadoop QA commented on YARN-425: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576764/YARN-425-trunk-b.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/661//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/661//console This message is automatically generated. 
coverage fix for yarn api - Key: YARN-425 URL: https://issues.apache.org/jira/browse/YARN-425 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-425-branch-0.23.patch, YARN-425-branch-2-b.patch, YARN-425-branch-2.patch, YARN-425-trunk-a.patch, YARN-425-trunk-b.patch, YARN-425-trunk.patch coverage fix for yarn api patch YARN-425-trunk-a.patch for trunk patch YARN-425-branch-2.patch for branch-2 patch YARN-425-branch-0.23.patch for branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621238#comment-13621238 ] Hadoop QA commented on YARN-193: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576820/YARN-193.13.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/662//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/662//console This message is automatically generated. 
Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, YARN-193.13.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621243#comment-13621243 ] Hadoop QA commented on YARN-99: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576823/yarn-99-20130403.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/663//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/663//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/663//console This message is automatically generated. 
Jobs fail during resource localization when private distributed-cache hits unix directory limits Key: YARN-99 URL: https://issues.apache.org/jira/browse/YARN-99 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Devaraj K Assignee: Omkar Vinit Joshi Attachments: yarn-99-20130324.patch, yarn-99-20130403.patch If we have multiple jobs that use the distributed cache with many small files, the per-directory limit is hit before the cache-size limit is reached, and no further directories can be created in the file cache. The jobs then start failing with the exception below. {code:xml} java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} We should have a mechanism to clean up the cache once it crosses a specified number of directories, just as we do for cache size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
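The failure above is a count limit, not a byte limit: ext3 caps a directory at roughly 32,000 subdirectories, so many small cache entries exhaust the directory long before a size-based cleanup would ever trigger. The arithmetic can be sketched as follows; the class and constant names are illustrative, not YARN's actual API.

```java
// Back-of-envelope check: with many small files, a flat filecache directory
// overflows ext3's ~32,000-subdirectory cap long before a byte-size cleanup
// would trigger. Class and constant names are illustrative, not YARN's API.
public class DirectoryCapacityCheck {
    // ext3 allows roughly 32,000 subdirectories per directory; keep a margin.
    public static final int PER_DIR_LIMIT = 32_000 - 2;

    /** Nesting depth needed so no directory holds more than perDir entries
     *  while accommodating 'resources' cache entries in total. */
    public static int levelsNeeded(long resources, int perDir) {
        int levels = 1;
        long capacity = perDir;
        while (capacity < resources) {
            capacity *= perDir;
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        // One million small resources need a two-level layout; a flat layout fails.
        System.out.println(levelsNeeded(1_000_000, PER_DIR_LIMIT)); // prints 2
    }
}
```

This is why a fix along the lines of a hierarchical filecache layout (or cleanup keyed on directory count) is needed in addition to the existing size-based cleanup.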
[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing
[ https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621255#comment-13621255 ] Eli Collins commented on YARN-516: -- I reverted this change (and the initial HADOOP-9357 patch). We'll put this fix back in the HADOOP-9357 patch if we do another rev. TestContainerLocalizer.testContainerLocalizerMain is failing Key: YARN-516 URL: https://issues.apache.org/jira/browse/YARN-516 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Andrew Wang Fix For: 2.0.5-beta Attachments: YARN-516.txt
[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated YARN-412: -- Attachment: YARN-412.patch FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Assignee: Roger Hoover Priority: Minor Labels: patch Attachments: YARN-412.patch In the FifoScheduler, the assignNodeLocalContainers method checks whether the data is local to a node by searching for the node's nodeAddress in the set of outstanding requests for the app. This seems to be incorrect; it should be checking the hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are keyed by hostname (e.g. host1.foo.com), whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234). In the CapacityScheduler, it's done using the hostname. See LeafQueue.assignNodeLocalContainers, line 1129: application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler: even though it incorrectly determines that a request is not local to the node, it still schedules the request immediately because it is rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node-local.
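The mismatch is easy to reproduce with a toy map: a request stored under a hostname key is invisible to a lookup keyed by "hostname:port". The class and methods below are illustrative stand-ins, not the scheduler's real data structures.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the YARN-412 bug: per-priority resource requests are keyed by
// hostname, so a lookup keyed by nodeAddress ("hostname:port") never matches.
public class LocalityLookup {
    private final Map<String, Integer> requests = new HashMap<>();

    public void addRequest(String host, int containers) {
        requests.put(host, containers); // requests are keyed by plain hostname
    }

    // Mirrors application.getResourceRequest(priority, resourceName):
    // returns null when no request is stored under that exact key.
    public Integer getResourceRequest(String resourceName) {
        return requests.get(resourceName);
    }
}
```

Looking up "host1.foo.com:1234" (the buggy FifoScheduler key) returns null, while "host1.foo.com" (the CapacityScheduler-style key) finds the request, which is exactly why the node-local counter is underreported.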
[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated YARN-412: -- Attachment: (was: YARN-412.patch) FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Assignee: Roger Hoover Priority: Minor Labels: patch Attachments: YARN-412.patch In the FifoScheduler, the assignNodeLocalContainers method checks whether the data is local to a node by searching for the node's nodeAddress in the set of outstanding requests for the app. This seems to be incorrect; it should be checking the hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are keyed by hostname (e.g. host1.foo.com), whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234). In the CapacityScheduler, it's done using the hostname. See LeafQueue.assignNodeLocalContainers, line 1129: application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler: even though it incorrectly determines that a request is not local to the node, it still schedules the request immediately because it is rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node-local.
[jira] [Updated] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-99: -- Attachment: yarn-99-20130403.1.patch Jobs fail during resource localization when private distributed-cache hits unix directory limits Key: YARN-99 URL: https://issues.apache.org/jira/browse/YARN-99 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Devaraj K Assignee: Omkar Vinit Joshi Attachments: yarn-99-20130324.patch, yarn-99-20130403.1.patch, yarn-99-20130403.patch If we have multiple jobs that use the distributed cache with many small files, the per-directory limit is hit before the cache-size limit is reached, and no further directories can be created in the file cache. The jobs then start failing with the exception below. {code:xml} java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} We should have a mechanism to clean up the cache once it crosses a specified number of directories, just as we do for cache size.
[jira] [Updated] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object
[ https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-536: --- Attachment: YARN-536.1.patch Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object -- Key: YARN-536 URL: https://issues.apache.org/jira/browse/YARN-536 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-536.1.patch Remove containerstate, containerStatus from container interface. They will not be called by container object -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-537) Waiting containers are not informed if private localization for a resource fails.
Omkar Vinit Joshi created YARN-537: -- Summary: Waiting containers are not informed if private localization for a resource fails. Key: YARN-537 URL: https://issues.apache.org/jira/browse/YARN-537 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi In ResourceLocalizationService.LocalizerRunner.update(), if localization fails, the other waiting containers are not informed; only the initiator is.
[jira] [Moved] (YARN-538) RM address DNS lookup can cause unnecessary slowness on every JHS page load
[ https://issues.apache.org/jira/browse/YARN-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur moved MAPREDUCE-5111 to YARN-538: Component/s: (was: jobhistoryserver) Affects Version/s: (was: 2.0.3-alpha) 2.0.3-alpha Key: YARN-538 (was: MAPREDUCE-5111) Project: Hadoop YARN (was: Hadoop Map/Reduce) RM address DNS lookup can cause unnecessary slowness on every JHS page load Key: YARN-538 URL: https://issues.apache.org/jira/browse/YARN-538 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5111.patch When I run the job history server locally, every page load takes tens of seconds. I profiled the process and discovered that all the extra time was spent inside YarnConfiguration#getRMWebAppURL, trying to resolve 0.0.0.0 to a hostname. When I changed my yarn.resourcemanager.address to localhost, the page load times decreased drastically. There's no reason that we need to perform this resolution on every page load.
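One straightforward remedy for a per-page-load DNS resolution is to compute the RM web-app URL once and memoize it. The generic thread-safe memoizer below is a sketch of that pattern under the assumption that the resolved URL does not change at runtime; it is not the patch actually attached to this JIRA.

```java
import java.util.function.Supplier;

// Thread-safe lazy memoizer: the expensive delegate (e.g. a DNS-resolving
// YarnConfiguration#getRMWebAppURL call) runs at most once; later get()
// calls return the cached value.
public class MemoizedSupplier<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private volatile T value; // null until first computed

    public MemoizedSupplier(Supplier<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public T get() {
        T v = value;
        if (v == null) {
            synchronized (this) {
                if (value == null) {
                    value = delegate.get(); // single expensive resolution
                }
                v = value;
            }
        }
        return v;
    }
}
```

Wrapping the URL computation in such a supplier at servlet-context setup would move the cost from every page render to the first one.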
[jira] [Created] (YARN-539) Memory leak in case resource localization fails. LocalizedResource remains in memory.
Omkar Vinit Joshi created YARN-539: -- Summary: Memory leak in case resource localization fails. LocalizedResource remains in memory. Key: YARN-539 URL: https://issues.apache.org/jira/browse/YARN-539 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi If resource localization fails, the resource remains in memory and is either 1) cleaned up the next time cache cleanup runs and there is a space crunch (if sufficient space is available in the cache, it will remain in memory), or 2) reused if a LocalizationRequest comes in again for the same resource. I think when resource localization fails, that event should be sent to the LocalResourceTracker, which will then remove it from its cache.
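The proposed fix can be sketched with a toy tracker: on a localization-failure event the entry is evicted immediately, instead of lingering until a space-pressure cleanup. This class only models the cache map; it is not YARN's real LocalResourceTracker.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a resource tracker's cache (illustrative, not YARN's API).
public class FailureAwareTracker {
    private final Map<String, String> cache = new HashMap<>();

    public void localized(String key, String localPath) {
        cache.put(key, localPath); // successful localization is cached
    }

    // Proposed behavior: a localization-failure event evicts the entry, so a
    // later LocalizationRequest re-localizes instead of reusing a bad resource.
    public void localizationFailed(String key) {
        cache.remove(key);
    }

    public boolean contains(String key) {
        return cache.containsKey(key);
    }
}
```

Without the localizationFailed() hook, the failed entry would sit in the map until size-based cleanup happened to run, which is exactly the leak described above.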
[jira] [Commented] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621319#comment-13621319 ] Alejandro Abdelnur commented on YARN-458: - +1. Do we need to do this for HS as well? If so please open a new JIRA. YARN daemon addresses must be placed in many different configs -- Key: YARN-458 URL: https://issues.apache.org/jira/browse/YARN-458 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-458.patch The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address. A new user trying to configure a cluster needs to know the names of all four of these configs. The same issue exists for nodemanagers. It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname and have default ports kick in for the others.
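The defaulting rule being proposed is simple: an explicitly configured per-service address wins; otherwise the shared hostname is combined with that service's default port. A minimal sketch of that precedence (the method name is illustrative, and the port used in the assertions below, 8030 for the scheduler, is YARN's conventional default, cited here as an assumption):

```java
// Sketch of hostname-based address defaulting (illustrative, not
// YarnConfiguration's real implementation).
public class AddressDefaults {
    // Resolve a service address: an explicit per-service value wins;
    // otherwise fall back to the shared hostname plus the default port.
    public static String resolve(String explicitAddress, String hostname, int defaultPort) {
        if (explicitAddress != null) {
            return explicitAddress;
        }
        String host = (hostname != null) ? hostname : "0.0.0.0";
        return host + ":" + defaultPort;
    }
}
```

With this rule, setting only yarn.resourcemanager.hostname yields sensible values for all four RM addresses, while any explicitly set address still takes precedence.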
[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing
[ https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621324#comment-13621324 ] Hudson commented on YARN-516: - Integrated in Hadoop-trunk-Commit #3555 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3555/]) Revert YARN-516 per HADOOP-9357. (Revision 1464181) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1464181 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java TestContainerLocalizer.testContainerLocalizerMain is failing Key: YARN-516 URL: https://issues.apache.org/jira/browse/YARN-516 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Andrew Wang Fix For: 2.0.5-beta Attachments: YARN-516.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621338#comment-13621338 ] Hadoop QA commented on YARN-412: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576838/YARN-412.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/665//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/665//console This message is automatically generated. FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Assignee: Roger Hoover Priority: Minor Labels: patch Attachments: YARN-412.patch In the FifoScheduler, the assignNodeLocalContainers method is checking if the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. 
This seems to be incorrect; it should be checking the hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are keyed by hostname (e.g. host1.foo.com), whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234). In the CapacityScheduler, it's done using the hostname. See LeafQueue.assignNodeLocalContainers, line 1129: application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler: even though it incorrectly determines that a request is not local to the node, it still schedules the request immediately because it is rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node-local.
[jira] [Updated] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-193: - Attachment: YARN-193.14.patch Fixed the buggy test TestResourceManager#testResourceManagerInitConfigValidation Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, YARN-193.13.patch, YARN-193.14.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-537) Waiting containers are not informed if private localization for a resource fails.
[ https://issues.apache.org/jira/browse/YARN-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621381#comment-13621381 ] Vinod Kumar Vavilapalli commented on YARN-537: -- Yup, I put in a comment a long (long) time back asking why it isn't getting informed through the LocalizedResource, which knows about all the waiting containers. I think we should do that. Waiting containers are not informed if private localization for a resource fails. - Key: YARN-537 URL: https://issues.apache.org/jira/browse/YARN-537 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi In ResourceLocalizationService.LocalizerRunner.update(), if localization fails, the other waiting containers are not informed; only the initiator is.
[jira] [Updated] (YARN-539) LocalizedResources are leaked in memory in case resource localization fails
[ https://issues.apache.org/jira/browse/YARN-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-539: - Summary: LocalizedResources are leaked in memory in case resource localization fails (was: Memory leak in case resource localization fails. LocalizedResource remains in memory.) LocalizedResources are leaked in memory in case resource localization fails --- Key: YARN-539 URL: https://issues.apache.org/jira/browse/YARN-539 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi If resource localization fails, the resource remains in memory and is either 1) cleaned up the next time cache cleanup runs and there is a space crunch (if sufficient space is available in the cache, it will remain in memory), or 2) reused if a LocalizationRequest comes in again for the same resource. I think when resource localization fails, that event should be sent to the LocalResourceTracker, which will then remove it from its cache.
[jira] [Commented] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621385#comment-13621385 ] Vinod Kumar Vavilapalli commented on YARN-458: -- +1 for the patch after the fact. Thanks for doing this Sandy. YARN daemon addresses must be placed in many different configs -- Key: YARN-458 URL: https://issues.apache.org/jira/browse/YARN-458 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.0.5-beta Attachments: YARN-458.patch The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address. A new user trying to configure a cluster needs to know the names of all four of these configs. The same issue exists for nodemanagers. It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname and have default ports kick in for the others.
[jira] [Commented] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object
[ https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621390#comment-13621390 ] Hadoop QA commented on YARN-536: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576856/YARN-536.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/667//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/667//console This message is automatically generated. 
Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object -- Key: YARN-536 URL: https://issues.apache.org/jira/browse/YARN-536 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-536.1.patch, YARN-536.2.patch Remove containerState and containerStatus from the Container interface; they will not be called by the container object.
[jira] [Commented] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs
[ https://issues.apache.org/jira/browse/YARN-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621573#comment-13621573 ] Chris Nauroth commented on YARN-535: {{TestDistributedShell#setup}} has nearly identical code to overwrite yarn-site.xml. TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs Key: YARN-535 URL: https://issues.apache.org/jira/browse/YARN-535 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 3.0.0 Environment: OS/X laptop, HFS+ filesystem Reporter: Steve Loughran Priority: Minor The setup phase of {{TestUnmanagedAMLauncher}} overwrites {{yarn-site.xml}}. As {{Configuration.writeXml()}} does a reread of all resources, this will break if the (open-for-writing) resource is already visible as an empty file. This leaves a corrupted {{target/test-classes/yarn-site.xml}}, which breaks later test runs, because it is not overwritten by later incremental builds due to timestamps.
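A standard way to avoid this class of corruption is to write the new config to a sibling temp file and atomically rename it over yarn-site.xml, so a concurrent re-read never observes an open, half-written file. The sketch below shows the pattern; it is a general technique, not the fix actually applied on this JIRA.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Write-then-rename: readers see either the old file or the complete new one,
// never an empty in-progress file (illustrative, not the test's actual fix).
public class AtomicConfigWrite {
    public static void writeAtomically(Path target, byte[] contents) {
        try {
            // Temp file in the same directory so the rename stays on one filesystem.
            Path tmp = Files.createTempFile(target.getParent(), "yarn-site", ".tmp");
            Files.write(tmp, contents);
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING,
                       StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience round-trip for demonstration: write, then read back.
    public static String roundTrip(String contents) {
        try {
            Path dir = Files.createTempDirectory("yarn-conf-test");
            Path target = dir.resolve("yarn-site.xml");
            writeAtomically(target, contents.getBytes());
            return new String(Files.readAllBytes(target));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same pattern would also protect the nearly identical overwrite code Chris points out in {{TestDistributedShell#setup}}.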
[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated YARN-412: -- Attachment: (was: YARN-412.patch) FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Assignee: Roger Hoover Priority: Minor Labels: patch Attachments: YARN-412.patch In the FifoScheduler, the assignNodeLocalContainers method checks whether the data is local to a node by searching for the node's nodeAddress in the set of outstanding requests for the app. This seems to be incorrect; it should be checking the hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are keyed by hostname (e.g. host1.foo.com), whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234). In the CapacityScheduler, it's done using the hostname. See LeafQueue.assignNodeLocalContainers, line 1129: application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler: even though it incorrectly determines that a request is not local to the node, it still schedules the request immediately because it is rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node-local.
[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated YARN-412: -- Attachment: YARN-412.patch FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Assignee: Roger Hoover Priority: Minor Labels: patch Attachments: YARN-412.patch In the FifoScheduler, the assignNodeLocalContainers method checks whether the data is local to a node by searching for the node's nodeAddress in the set of outstanding requests for the app. This seems to be incorrect; it should be checking the hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are keyed by hostname (e.g. host1.foo.com), whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234). In the CapacityScheduler, it's done using the hostname. See LeafQueue.assignNodeLocalContainers, line 1129: application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler: even though it incorrectly determines that a request is not local to the node, it still schedules the request immediately because it is rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node-local.
[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621709#comment-13621709 ] Hitesh Shah commented on YARN-412: -- @Roger, for future reference (may not be applicable to this jira), it is good to leave earlier patch attachments lying around and not delete them when uploading newer patches. This can be used to trace review comments/feedback etc. As for the hadoop-common mvn eclipse:eclipse failure, it can be ignored for now; it is a known issue with an open jira that has not been addressed yet. FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Assignee: Roger Hoover Priority: Minor Labels: patch Attachments: YARN-412.patch In the FifoScheduler, the assignNodeLocalContainers method checks whether the data is local to a node by searching for the node's nodeAddress in the set of outstanding requests for the app. This seems to be incorrect; it should be checking the hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are keyed by hostname (e.g. host1.foo.com), whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234). In the CapacityScheduler, it's done using the hostname. See LeafQueue.assignNodeLocalContainers, line 1129: application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler: even though it incorrectly determines that a request is not local to the node, it still schedules the request immediately because it is rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node-local.
[jira] [Commented] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object
[ https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621720#comment-13621720 ] Vinod Kumar Vavilapalli commented on YARN-536: -- +1, this looks good. Checking it in. Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object -- Key: YARN-536 URL: https://issues.apache.org/jira/browse/YARN-536 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-536.1.patch, YARN-536.2.patch Remove ContainerState and ContainerStatus from the Container interface. They will not be called by the container object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object
[ https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621733#comment-13621733 ] Hudson commented on YARN-536: - Integrated in Hadoop-trunk-Commit #3560 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3560/]) YARN-536. Removed the unused objects ContainerStatus and ContainerState from Container, which also don't belong to the container. Contributed by Xuan Gong. (Revision 1464271) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464271 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Container.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object -- Key: YARN-536 URL: https://issues.apache.org/jira/browse/YARN-536 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.0.5-beta Attachments: YARN-536.1.patch, YARN-536.2.patch Remove ContainerState and ContainerStatus from the Container interface. They will not be called by the container object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621735#comment-13621735 ] Hadoop QA commented on YARN-412: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576914/YARN-412.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/669//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/669//console This message is automatically generated. FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Assignee: Roger Hoover Priority: Minor Labels: patch Attachments: YARN-412.patch In the FifoScheduler, the assignNodeLocalContainers method is checking if the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. 
This seems to be incorrect, as it should be checking the hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are keyed by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234) In the CapacityScheduler, it's done using hostname. See LeafQueue.assignNodeLocalContainers, line 1129 application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler because even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it's rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira