[jira] [Assigned] (YARN-9989) Typo in CapacityScheduler documentation: Runtime Configuration
[ https://issues.apache.org/jira/browse/YARN-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kevin su reassigned YARN-9989: -- Assignee: kevin su > Typo in CapacityScheduler documentation: Runtime Configuration > --- > > Key: YARN-9989 > URL: https://issues.apache.org/jira/browse/YARN-9989 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > > {quote} > Administrators can add additional queues at runtime, but queues cannot be > deleted at runtime unless the queue is STOPPED and *nhas* no pending/running > apps. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10017) Hardcoded in ZKClient class
[ https://issues.apache.org/jira/browse/YARN-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kevin su reassigned YARN-10017: --- Assignee: kevin su > Hardcoded in ZKClient class > --- > > Key: YARN-10017 > URL: https://issues.apache.org/jira/browse/YARN-10017 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.1.3 >Reporter: bianqi >Assignee: kevin su >Priority: Major > > Hardcoded value in the ZKClient class. > The second parameter of the constructor call of the ZooKeeper class is > hard-coded. Please fix! > This is the GitHub address: > [https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/lib/ZKClient.java#L46|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/lib/ZKClient.java#L46]
[jira] [Commented] (YARN-9966) Code duplication in UserGroupMappingPlacementRule
[ https://issues.apache.org/jira/browse/YARN-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981341#comment-16981341 ] kevin su commented on YARN-9966: Thanks [~aajisaka] for the review and commit > Code duplication in UserGroupMappingPlacementRule > - > > Key: YARN-9966 > URL: https://issues.apache.org/jira/browse/YARN-9966 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > Fix For: 3.3.0, 3.2.2, 3.1.4 > > > The methods > org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule#validateParentQueue > and > org.apache.hadoop.yarn.server.resourcemanager.placement.QueuePlacementRuleUtils#validateQueueMappingUnderParentQueue > are exactly the same. > In these 2 classes, we also have a duplicate method named "extractQueuePath". > We need to extract these to a common method and delete one of these dupes.
[jira] [Commented] (YARN-9966) Code duplication in UserGroupMappingPlacementRule
[ https://issues.apache.org/jira/browse/YARN-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973798#comment-16973798 ] kevin su commented on YARN-9966: [~snemeth] Could you help me review the patch? > Code duplication in UserGroupMappingPlacementRule > - > > Key: YARN-9966 > URL: https://issues.apache.org/jira/browse/YARN-9966 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > > The methods > org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule#validateParentQueue > and > org.apache.hadoop.yarn.server.resourcemanager.placement.QueuePlacementRuleUtils#validateQueueMappingUnderParentQueue > are exactly the same. > In these 2 classes, we also have a duplicate method named "extractQueuePath". > We need to extract these to a common method and delete one of these dupes.
[jira] [Assigned] (YARN-9963) Add getIpAndHost to RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kevin su reassigned YARN-9963: -- Assignee: kevin su > Add getIpAndHost to RuncContainerRuntime > > > Key: YARN-9963 > URL: https://issues.apache.org/jira/browse/YARN-9963 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: kevin su >Priority: Major > > {{RuncContainerRuntime}} does not currently implement this logic, but > {{DockerLinuxContainerRuntime}} does. > See YARN-5430
[jira] [Assigned] (YARN-9966) Code duplication in UserGroupMappingPlacementRule
[ https://issues.apache.org/jira/browse/YARN-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kevin su reassigned YARN-9966: -- Assignee: kevin su > Code duplication in UserGroupMappingPlacementRule > - > > Key: YARN-9966 > URL: https://issues.apache.org/jira/browse/YARN-9966 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > > The methods > org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule#validateParentQueue > and > org.apache.hadoop.yarn.server.resourcemanager.placement.QueuePlacementRuleUtils#validateQueueMappingUnderParentQueue > are exactly the same. > In these 2 classes, we also have a duplicate method named "extractQueuePath". > We need to extract these to a common method and delete one of these dupes.
[jira] [Commented] (YARN-9677) Make FpgaDevice and GpuDevice classes more similar to each other
[ https://issues.apache.org/jira/browse/YARN-9677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970823#comment-16970823 ] kevin su commented on YARN-9677: Thanks [~snemeth] for the commit > Make FpgaDevice and GpuDevice classes more similar to each other > > > Key: YARN-9677 > URL: https://issues.apache.org/jira/browse/YARN-9677 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > Fix For: 3.3.0 > > Attachments: YARN-9677.001.patch > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceAllocator.FpgaDevice > is an inner class of FpgaResourceAllocator. > It is used not only from its parent class but from other classes as > well, so keeping it as an inner class does not really make > sense. > We also have > org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDevice > which is a similar class, but for GPU devices. > What we could do here is to make FpgaDevice a standalone class and harmonize the > packages of these 2 classes, meaning they should be "closer" to each other in > terms of packaging.
[jira] [Commented] (YARN-9677) Make FpgaDevice and GpuDevice classes more similar to each other
[ https://issues.apache.org/jira/browse/YARN-9677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967808#comment-16967808 ] kevin su commented on YARN-9677: [~pbacsko] Thanks for the review > Make FpgaDevice and GpuDevice classes more similar to each other > > > Key: YARN-9677 > URL: https://issues.apache.org/jira/browse/YARN-9677 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > Attachments: YARN-9677.001.patch > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceAllocator.FpgaDevice > is an inner class of FpgaResourceAllocator. > It is used not only from its parent class but from other classes as > well, so keeping it as an inner class does not really make > sense. > We also have > org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDevice > which is a similar class, but for GPU devices. > What we could do here is to make FpgaDevice a standalone class and harmonize the > packages of these 2 classes, meaning they should be "closer" to each other in > terms of packaging.
[jira] [Commented] (YARN-9787) Typo in analysesErrorMsg
[ https://issues.apache.org/jira/browse/YARN-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925092#comment-16925092 ] kevin su commented on YARN-9787: Thanks [~surendrasingh] and [~jojochuang] for the review and commit > Typo in analysesErrorMsg > > > Key: YARN-9787 > URL: https://issues.apache.org/jira/browse/YARN-9787 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: kevin su >Priority: Trivial > Labels: newbie, noob > Fix For: 3.3.0 > > Attachments: YARN-9787.001.patch > > > {code:java} > analysis.append("Please check whether your etc/hadoop/mapred-site.xml " > + "contains the below configuration:\n"); > {code} > I think it should be {{/etc/hadoop/mapred-site.xml}} > https://github.com/apache/hadoop/blob/2064ca015d1584263aac0cc20c60b925a3aff612/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java#L788-L789
[jira] [Commented] (YARN-9787) Typo in analysesErrorMsg
[ https://issues.apache.org/jira/browse/YARN-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920440#comment-16920440 ] kevin su commented on YARN-9787: [~surendrasingh] Could you help me review the patch? > Typo in analysesErrorMsg > > > Key: YARN-9787 > URL: https://issues.apache.org/jira/browse/YARN-9787 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: kevin su >Priority: Trivial > Labels: newbie, noob > Attachments: YARN-9787.001.patch > > > {code:java} > analysis.append("Please check whether your etc/hadoop/mapred-site.xml " > + "contains the below configuration:\n"); > {code} > I think it should be {{/etc/hadoop/mapred-site.xml}} > https://github.com/apache/hadoop/blob/2064ca015d1584263aac0cc20c60b925a3aff612/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java#L788-L789
[jira] [Updated] (YARN-9787) Typo in analysesErrorMsg
[ https://issues.apache.org/jira/browse/YARN-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kevin su updated YARN-9787: --- Attachment: YARN-9787.001.patch > Typo in analysesErrorMsg > > > Key: YARN-9787 > URL: https://issues.apache.org/jira/browse/YARN-9787 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: kevin su >Priority: Trivial > Labels: newbie, noob > Attachments: YARN-9787.001.patch > > > {code:java} > analysis.append("Please check whether your etc/hadoop/mapred-site.xml " > + "contains the below configuration:\n"); > {code} > I think it should be {{/etc/hadoop/mapred-site.xml}} > https://github.com/apache/hadoop/blob/2064ca015d1584263aac0cc20c60b925a3aff612/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java#L788-L789
[jira] [Assigned] (YARN-9787) Typo in analysesErrorMsg
[ https://issues.apache.org/jira/browse/YARN-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kevin su reassigned YARN-9787: -- Assignee: kevin su > Typo in analysesErrorMsg > > > Key: YARN-9787 > URL: https://issues.apache.org/jira/browse/YARN-9787 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: kevin su >Priority: Trivial > Labels: newbie, noob > > {code:java} > analysis.append("Please check whether your etc/hadoop/mapred-site.xml " > + "contains the below configuration:\n"); > {code} > I think it should be {{/etc/hadoop/mapred-site.xml}} > https://github.com/apache/hadoop/blob/2064ca015d1584263aac0cc20c60b925a3aff612/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java#L788-L789
[jira] [Updated] (YARN-9677) Make FpgaDevice and GpuDevice classes more similar to each other
[ https://issues.apache.org/jira/browse/YARN-9677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kevin su updated YARN-9677: --- Attachment: YARN-9677.001.patch > Make FpgaDevice and GpuDevice classes more similar to each other > > > Key: YARN-9677 > URL: https://issues.apache.org/jira/browse/YARN-9677 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > Attachments: YARN-9677.001.patch > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceAllocator.FpgaDevice > is an inner class of FpgaResourceAllocator. > It is used not only from its parent class but from other classes as > well, so keeping it as an inner class does not really make > sense. > We also have > org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDevice > which is a similar class, but for GPU devices. > What we could do here is to make FpgaDevice a standalone class and harmonize the > packages of these 2 classes, meaning they should be "closer" to each other in > terms of packaging.
[jira] [Comment Edited] (YARN-9683) Remove reapDockerContainerNoPid left behind by YARN-9074
[ https://issues.apache.org/jira/browse/YARN-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907467#comment-16907467 ] kevin su edited comment on YARN-9683 at 8/14/19 5:20 PM: - [~eyang] [~adam.antal] Thanks for the review, but it looks like the patch has not been committed yet was (Author: pingsutw): [~eyang] [~adam.antal] Thanks for the review, but it looks like the patch didn't commit yet > Remove reapDockerContainerNoPid left behind by YARN-9074 > > > Key: YARN-9683 > URL: https://issues.apache.org/jira/browse/YARN-9683 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Adam Antal >Assignee: kevin su >Priority: Trivial > Labels: newbie > Fix For: 3.3.0 > > > YARN-9074 has touched the ContainerCleanup.java but created a separate > function instead of using reapDockerContainerNoPid in ContainerCleanup.java. > Having no usages, that private function can be safely removed.
[jira] [Commented] (YARN-9683) Remove reapDockerContainerNoPid left behind by YARN-9074
[ https://issues.apache.org/jira/browse/YARN-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907467#comment-16907467 ] kevin su commented on YARN-9683: [~eyang] [~adam.antal] Thanks for the review, but it looks like the patch didn't commit yet > Remove reapDockerContainerNoPid left behind by YARN-9074 > > > Key: YARN-9683 > URL: https://issues.apache.org/jira/browse/YARN-9683 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Adam Antal >Assignee: kevin su >Priority: Trivial > Labels: newbie > Fix For: 3.3.0 > > > YARN-9074 has touched the ContainerCleanup.java but created a separate > function instead of using reapDockerContainerNoPid in ContainerCleanup.java. > Having no usages, that private function can be safely removed.
[jira] [Assigned] (YARN-8585) Add test class for DefaultAMSProcessor
[ https://issues.apache.org/jira/browse/YARN-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kevin su reassigned YARN-8585: -- Assignee: kevin su > Add test class for DefaultAMSProcessor > -- > > Key: YARN-8585 > URL: https://issues.apache.org/jira/browse/YARN-8585 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie > > Since this class has no test coverage at all, it seems to be a good idea to > test it.
[jira] [Commented] (YARN-9354) TestUtils#createResource calls should be replaced with ResourceTypesTestHelper#newResource
[ https://issues.apache.org/jira/browse/YARN-9354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904383#comment-16904383 ] kevin su commented on YARN-9354: [~sahuja] Is this issue still in progress? If not, I could help with it. > TestUtils#createResource calls should be replaced with > ResourceTypesTestHelper#newResource > -- > > Key: YARN-9354 > URL: https://issues.apache.org/jira/browse/YARN-9354 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Siddharth Ahuja >Priority: Trivial > Labels: newbie, newbie++ > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestUtils#createResource > has a not identical, but very similar, implementation to > org.apache.hadoop.yarn.resourcetypes.ResourceTypesTestHelper#newResource. > Since these 2 methods do essentially the same thing and > ResourceTypesTestHelper is newer and used more, all occurrences of > TestUtils#createResource should be replaced with > ResourceTypesTestHelper#newResource.
[jira] [Assigned] (YARN-9683) Remove reapDockerContainerNoPid left behind by YARN-9074
[ https://issues.apache.org/jira/browse/YARN-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kevin su reassigned YARN-9683: -- Assignee: kevin su > Remove reapDockerContainerNoPid left behind by YARN-9074 > > > Key: YARN-9683 > URL: https://issues.apache.org/jira/browse/YARN-9683 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Adam Antal >Assignee: kevin su >Priority: Trivial > Labels: newbie > > YARN-9074 has touched the ContainerCleanup.java but created a separate > function instead of using reapDockerContainerNoPid in ContainerCleanup.java. > Having no usages, that private function can be safely removed.
[jira] [Commented] (YARN-9678) TestGpuResourceHandler / TestFpgaResourceHandler should be renamed
[ https://issues.apache.org/jira/browse/YARN-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901554#comment-16901554 ] kevin su commented on YARN-9678: [~jojochuang] Thank you so much > TestGpuResourceHandler / TestFpgaResourceHandler should be renamed > -- > > Key: YARN-9678 > URL: https://issues.apache.org/jira/browse/YARN-9678 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > Fix For: 3.3.0 > > Attachments: YARN-9678.001.patch > > > Their respective production classes are GpuResourceHandlerImpl and > FpgaResourceHandlerImpl so we are missing the "Impl" from the testcase > classnames.
[jira] [Assigned] (YARN-9410) Typo in documentation: Using FPGA On YARN
[ https://issues.apache.org/jira/browse/YARN-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kevin su reassigned YARN-9410: -- Assignee: kevin su > Typo in documentation: Using FPGA On YARN > -- > > Key: YARN-9410 > URL: https://issues.apache.org/jira/browse/YARN-9410 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > > fpag.major-device-number should be changed to fpga...
[jira] [Assigned] (YARN-9677) Make FpgaDevice and GpuDevice classes more similar to each other
[ https://issues.apache.org/jira/browse/YARN-9677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kevin su reassigned YARN-9677: -- Assignee: kevin su > Make FpgaDevice and GpuDevice classes more similar to each other > > > Key: YARN-9677 > URL: https://issues.apache.org/jira/browse/YARN-9677 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceAllocator.FpgaDevice > is an inner class of FpgaResourceAllocator. > It is used not only from its parent class but from other classes as > well, so keeping it as an inner class does not really make > sense. > We also have > org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDevice > which is a similar class, but for GPU devices. > What we could do here is to make FpgaDevice a standalone class and harmonize the > packages of these 2 classes, meaning they should be "closer" to each other in > terms of packaging.
[jira] [Commented] (YARN-9678) TestGpuResourceHandler / TestFpgaResourceHandler should be renamed
[ https://issues.apache.org/jira/browse/YARN-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898726#comment-16898726 ] kevin su commented on YARN-9678: [~snemeth] Thank you so much > TestGpuResourceHandler / TestFpgaResourceHandler should be renamed > -- > > Key: YARN-9678 > URL: https://issues.apache.org/jira/browse/YARN-9678 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > > Their respective production classes are GpuResourceHandlerImpl and > FpgaResourceHandlerImpl so we are missing the "Impl" from the testcase > classnames.
[jira] [Commented] (YARN-9678) TestGpuResourceHandler / TestFpgaResourceHandler should be renamed
[ https://issues.apache.org/jira/browse/YARN-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898375#comment-16898375 ] kevin su commented on YARN-9678: [~snemeth] I would like to work on this. Could you assign it to me? > TestGpuResourceHandler / TestFpgaResourceHandler should be renamed > -- > > Key: YARN-9678 > URL: https://issues.apache.org/jira/browse/YARN-9678 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Priority: Major > Labels: newbie, newbie++ > > Their respective production classes are GpuResourceHandlerImpl and > FpgaResourceHandlerImpl so we are missing the "Impl" from the testcase > classnames.
[jira] [Commented] (YARN-2663) Race condition in shared cache CleanerTask during deletion of resource
[ https://issues.apache.org/jira/browse/YARN-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878395#comment-16878395 ]

kevin su commented on YARN-2663:

*removeResourceFromCacheFileSystem* already renames the file after the entry is removed from the SCM cache, so the delete should already be atomic.
Do we also need a lock around these two operations? If yes, I can do it.
{code:java}
private boolean removeResourceFromCacheFileSystem(Path path)
    throws IOException {
  // rename the directory to make the delete atomic
  Path renamedPath = new Path(path.toString() + RENAMED_SUFFIX);
  if (fs.rename(path, renamedPath)) {
    // the directory can be removed safely now
    // log the original path
    LOG.info("Deleting " + path.toString());
    return fs.delete(renamedPath, true);
  }
  // the rename failed, so nothing was deleted
  return false;
}
{code}

> Race condition in shared cache CleanerTask during deletion of resource
> ---
>
> Key: YARN-2663
> URL: https://issues.apache.org/jira/browse/YARN-2663
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chris Trezzo
> Priority: Blocker
>
> In CleanerTask, store.removeResource(key) and
> removeResourceFromCacheFileSystem(path) do not happen together in atomic
> fashion.
> Since resources could be uploaded with different file names, the SCM could
> receive a notification to add a resource to the SCM between the two
> operations. Thus, we have a scenario where the cleaner service deletes the
> entry from the scm, receives a notification from the uploader (adding the
> entry back into the scm) and then deletes the file from HDFS.
> Cleaner code that deletes resource:
> {code}
> if (store.isResourceEvictable(key, resource)) {
>   try {
>     /*
>      * TODO: There is a race condition between store.removeResource(key)
>      * and removeResourceFromCacheFileSystem(path) operations because they
>      * do not happen atomically and resources can be uploaded with
>      * different file names by the node managers.
>      */
>     // remove the resource from scm (checks for appIds as well)
>     if (store.removeResource(key)) {
>       // remove the resource from the file system
>       boolean deleted = removeResourceFromCacheFileSystem(path);
>       if (deleted) {
>         resourceStatus = ResourceStatus.DELETED;
>       } else {
>         LOG.error("Failed to remove path from the file system."
>             + " Skipping this resource: " + path);
>         resourceStatus = ResourceStatus.ERROR;
>       }
>     } else {
>       // we did not delete the resource because it contained application
>       // ids
>       resourceStatus = ResourceStatus.PROCESSED;
>     }
>   } catch (IOException e) {
>     LOG.error("Failed to remove path from the file system."
>         + " Skipping this resource: " + path, e);
>     resourceStatus = ResourceStatus.ERROR;
>   }
> } else {
>   resourceStatus = ResourceStatus.PROCESSED;
> }
> {code}
> Uploader code that uploads resource:
> {code}
> // create the temporary file
> tempPath = new Path(directoryPath, getTemporaryFileName(actualPath));
> if (!uploadFile(actualPath, tempPath)) {
>   LOG.warn("Could not copy the file to the shared cache at " + tempPath);
>   return false;
> }
> // set the permission so that it is readable but not writable
> // TODO should I create the file with the right permission so I save the
> // permission call?
> fs.setPermission(tempPath, FILE_PERMISSION);
> // rename it to the final filename
> Path finalPath = new Path(directoryPath, actualPath.getName());
> if (!fs.rename(tempPath, finalPath)) {
>   LOG.warn("The file already exists under " + finalPath
>       + ". Ignoring this attempt.");
>   deleteTempFile(tempPath);
>   return false;
> }
> // notify the SCM
> if (!notifySharedCacheManager(checksumVal, actualPath.getName())) {
>   // the shared cache manager rejected the upload (as it is likely
>   // uploaded under a different name)
>   // clean up this file and exit
>   fs.delete(finalPath, false);
>   return false;
> }
> {code}
> One solution is to have the UploaderService always rename the resource file
> to the checksum of the resource plus the extension. With this fix we will
> never receive a notify for the resource before the delete from the FS has
> happened because the rename on the node manager will fail. If the node
> manager uploads the file after it is deleted from the FS, we are ok and the
> resource will simply get added back to the scm once a notification is
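
The checksum-based rename proposed in the YARN-2663 description above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the Hadoop implementation: the class ChecksumRenameSketch and the methods checksumFileName and publish are hypothetical names, and the code uses java.nio.file on a local file system as a stand-in for the Hadoop FileSystem API (where fs.rename reports failure by returning false rather than by throwing). The point it shows is that once the final name is derived from the checksum, a second upload of the same resource under a different original name collides on the rename and is rejected before any notification could be sent.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; none of these names exist in Hadoop.
public class ChecksumRenameSketch {

  /** Builds "<checksum><extension>", dropping the uploader's original base name. */
  static String checksumFileName(String checksum, String originalName) {
    int dot = originalName.lastIndexOf('.');
    // names without a dot, or with only a leading dot, get no extension
    String extension = dot > 0 ? originalName.substring(dot) : "";
    return checksum + extension;
  }

  /**
   * Moves tempPath to its checksum-derived final name inside directory.
   * Returns false, leaving the existing file untouched, when that name is taken.
   */
  static boolean publish(Path directory, Path tempPath, String checksum,
      String originalName) throws IOException {
    Path finalPath = directory.resolve(checksumFileName(checksum, originalName));
    try {
      // without REPLACE_EXISTING, Files.move refuses to overwrite the target
      Files.move(tempPath, finalPath);
      return true;
    } catch (FileAlreadyExistsException e) {
      // same checksum already published: drop the temporary copy
      Files.deleteIfExists(tempPath);
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("scm-sketch");
    Path first = Files.createTempFile(dir, "upload", ".tmp");
    Path second = Files.createTempFile(dir, "upload", ".tmp");
    // two uploads of the same content under different original names
    System.out.println(publish(dir, first, "abc123", "job.jar"));
    System.out.println(publish(dir, second, "abc123", "renamed-copy.jar"));
  }
}
```

Whether a lock around store.removeResource(key) and removeResourceFromCacheFileSystem(path) is still needed on top of this is the open question raised in the comment; the sketch does not answer it.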
[jira] [Commented] (YARN-9410) Typo in documentation: Using FPGA On YARN
[ https://issues.apache.org/jira/browse/YARN-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826043#comment-16826043 ] kevin su commented on YARN-9410: Could I work on this issue? I don't have permission to assign it to myself. > Typo in documentation: Using FPGA On YARN > -- > > Key: YARN-9410 > URL: https://issues.apache.org/jira/browse/YARN-9410 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Priority: Major > Labels: newbie, newbie++ > > fpag.major-device-number should be changed to fpga...