[jira] [Assigned] (YARN-11692) Support mixed cgroup v1/v2 controller structure
[ https://issues.apache.org/jira/browse/YARN-11692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Szucs reassigned YARN-11692:
--
Assignee: Peter Szucs

> Support mixed cgroup v1/v2 controller structure
> ---
>
> Key: YARN-11692
> URL: https://issues.apache.org/jira/browse/YARN-11692
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Benjamin Teke
> Assignee: Peter Szucs
> Priority: Major
>
> There were heavy changes on the device side in cgroup v2. To keep supporting
> FPGAs and GPUs short term, mixed structures, where some of the cgroup
> controllers come from v1 while others come from v2, should be supported.
> More info: https://dropbear.xyz/2023/05/23/devices-with-cgroup-v2/

--
This message was sent by Atlassian Jira (v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
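Whether a node is in such a mixed state can be read off the mount table. The sketch below is an illustration only, not YARN's actual detection code (the class and method names are invented): "cgroup" entries in /proc/mounts carry v1 controllers in their mount options, while a "cgroup2" entry marks the unified v2 hierarchy, and on a hybrid system both kinds of entries are present at once.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustration only -- NOT YARN's actual detection logic; the class name is
// invented. Classifies /proc/mounts-style content: "cgroup" entries list v1
// controllers among their mount options, a "cgroup2" entry marks the
// unified v2 hierarchy.
class CgroupMountScan {

    // Returns the v1 controller names found, plus the marker "unified"
    // if a cgroup2 mount is present.
    static Set<String> scan(String procMounts) {
        Set<String> found = new TreeSet<>();
        for (String line : procMounts.split("\n")) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 4) {
                continue;
            }
            if (f[2].equals("cgroup2")) {
                found.add("unified");              // v2 unified hierarchy
            } else if (f[2].equals("cgroup")) {
                // v1 mount: controllers are listed among the mount options
                for (String opt : f[3].split(",")) {
                    switch (opt) {
                        case "cpu": case "cpuset": case "memory": case "devices":
                            found.add(opt);        // a v1 controller
                    }
                }
            }
        }
        return found;
    }
}
```

On a hybrid node this would report both "unified" and, for example, "devices", which is exactly the mixed v1/v2 situation the ticket wants to keep working for device (FPGA/GPU) isolation.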
[jira] [Updated] (YARN-11675) Update MemoryResourceHandler implementation for cgroup v2 support
[ https://issues.apache.org/jira/browse/YARN-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Szucs updated YARN-11675:
---
Description:
cgroup v2 has some changes in various controllers (some changed their functionality, some were removed). This task is about updating MemoryResourceHandler's [implementation|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsMemoryResourceHandlerImpl.java#L47-L46].

h3. Differences in the controls compared with cgroup v1:

h3. Hard limit on memory
The _memory.limit_in_bytes_ control is replaced with _memory.max_.

h3. Soft limit on memory
The _memory.soft_limit_in_bytes_ control is replaced with _memory.low_.

Detailed descriptions of the memory controls can be found in the official [cgroup v2 documentation|https://docs.kernel.org/admin-guide/cgroup-v2.html].

h3. Swappiness
_memory.swappiness_ has been removed from the available cgroup v2 controls. Quoting the [Red Hat documentation|https://access.redhat.com/solutions/103833]:
{quote}Swappiness is a property for the Linux kernel that changes the balance between swapping out runtime memory, as opposed to dropping pages from the system page cache. Swappiness can be set to values between 0 and 100, inclusive. A low value means the kernel will try to avoid swapping as much as possible, where a higher value instead will make the kernel aggressively try to use swap space.{quote}
Referring to [this|https://github.com/opencontainers/runtime-spec/issues/1005] case study, we found that most of the time swappiness didn't work as expected, as it mostly depends on the I/O balance of the system, so it is no longer available in cgroup v2.

was:
cgroup v2 has some changes in various controllers (some changed their functionality, some were removed). This task is about updating MemoryResourceHandler's [implementation|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsMemoryResourceHandlerImpl.java#L47-L46].

Differences in the controls compared with cgroup v1:

h3. *Hard limit on memory*
The _memory.limit_in_bytes_ control is replaced with _memory.max_.

h3. *Soft limit on memory*
The _memory.soft_limit_in_bytes_ control is replaced with _memory.low_.

Detailed descriptions of the memory controls can be found in the official [cgroup v2 documentation|https://docs.kernel.org/admin-guide/cgroup-v2.html].

h3. *_Swappiness_*
_memory.swappiness_ has been removed from the available cgroup v2 controls. Quoting the [Red Hat documentation|https://access.redhat.com/solutions/103833]:
{quote}Swappiness is a property for the Linux kernel that changes the balance between swapping out runtime memory, as opposed to dropping pages from the system page cache. Swappiness can be set to values between 0 and 100, inclusive. A low value means the kernel will try to avoid swapping as much as possible, where a higher value instead will make the kernel aggressively try to use swap space.{quote}
Referring to [this|https://github.com/opencontainers/runtime-spec/issues/1005] case study, we found that most of the time swappiness didn't work as expected, as it mostly depends on the I/O balance of the system, so it is no longer available in cgroup v2.


> Update MemoryResourceHandler implementation for cgroup v2 support
> -
>
> Key: YARN-11675
> URL: https://issues.apache.org/jira/browse/YARN-11675
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Benjamin Teke
> Assignee: Peter Szucs
> Priority: Major
> Labels: pull-request-available
>
> cgroup v2 has some changes in various controllers (some changed their
> functionality, some were removed). This task is about updating
> MemoryResourceHandler's
> [implementation|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsMemoryResourceHandlerImpl.java#L47-L46].
> h3. Differences in the controls compared with cgroup v1:
> h3. Hard limit on memory
> The _memory.limit_in_bytes_ control is replaced with _memory.max_.
> h3. Soft limit on memory
> The _memory.soft_limit_in_bytes_ control is replaced with _memory.low_.
> Detailed descriptions of the memory controls can be found in the official
> [cgroup v2
[jira] [Updated] (YARN-11675) Update MemoryResourceHandler implementation for cgroup v2 support
[ https://issues.apache.org/jira/browse/YARN-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Szucs updated YARN-11675:
---
Description:
cgroup v2 has some changes in various controllers (some changed their functionality, some were removed). This task is about updating MemoryResourceHandler's [implementation|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsMemoryResourceHandlerImpl.java#L47-L46].

Differences in the controls compared with cgroup v1:

h3. *Hard limit on memory*
The _memory.limit_in_bytes_ control is replaced with _memory.max_.

h3. *Soft limit on memory*
The _memory.soft_limit_in_bytes_ control is replaced with _memory.low_.

Detailed descriptions of the memory controls can be found in the official [cgroup v2 documentation|https://docs.kernel.org/admin-guide/cgroup-v2.html].

h3. *_Swappiness_*
_memory.swappiness_ has been removed from the available cgroup v2 controls. Quoting the [Red Hat documentation|https://access.redhat.com/solutions/103833]:
{quote}Swappiness is a property for the Linux kernel that changes the balance between swapping out runtime memory, as opposed to dropping pages from the system page cache. Swappiness can be set to values between 0 and 100, inclusive. A low value means the kernel will try to avoid swapping as much as possible, where a higher value instead will make the kernel aggressively try to use swap space.{quote}
Referring to [this|https://github.com/opencontainers/runtime-spec/issues/1005] case study, we found that most of the time swappiness didn't work as expected, as it mostly depends on the I/O balance of the system, so it is no longer available in cgroup v2.

was:
cgroup v2 has some changes in various controllers (some changed their functionality, some were removed). This task is about checking if MemoryResourceHandler's [implementation|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsMemoryResourceHandlerImpl.java#L47-L46] needs any updates.


> Update MemoryResourceHandler implementation for cgroup v2 support
> -
>
> Key: YARN-11675
> URL: https://issues.apache.org/jira/browse/YARN-11675
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Benjamin Teke
> Assignee: Peter Szucs
> Priority: Major
> Labels: pull-request-available
>
> cgroup v2 has some changes in various controllers (some changed their
> functionality, some were removed). This task is about updating
> MemoryResourceHandler's
> [implementation|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsMemoryResourceHandlerImpl.java#L47-L46].
> Differences in the controls compared with cgroup v1:
> h3. *Hard limit on memory*
> The _memory.limit_in_bytes_ control is replaced with _memory.max_.
> h3. *Soft limit on memory*
> The _memory.soft_limit_in_bytes_ control is replaced with _memory.low_.
> Detailed descriptions of the memory controls can be found in the official
> [cgroup v2 documentation|https://docs.kernel.org/admin-guide/cgroup-v2.html].
> h3. *_Swappiness_*
> _memory.swappiness_ has been removed from the available cgroup v2 controls.
> Quoting the [Red Hat documentation|https://access.redhat.com/solutions/103833]:
> {quote}Swappiness is a property for the Linux kernel that changes the balance
> between swapping out runtime memory, as opposed to dropping pages from the
> system page cache. Swappiness can be set to values between 0 and 100,
> inclusive. A low value means the kernel will try to avoid swapping as much as
> possible, where a higher value instead will make the kernel aggressively try
> to use swap space.{quote}
> Referring to [this|https://github.com/opencontainers/runtime-spec/issues/1005]
> case study, we found that most of the time swappiness didn't work as
> expected, as it mostly depends on the I/O balance of the system, so it is no
> longer available in cgroup v2.
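The control-file renames described in the updates above can be summed up in a small lookup. This is a minimal sketch with an invented class name, not YARN code; it only encodes the mapping stated in the ticket, with memory.swappiness deliberately having no v2 counterpart:

```java
import java.util.Map;
import java.util.Optional;

// Minimal sketch (invented class, not YARN code) of the cgroup v1 -> v2
// memory control-file renames described in the ticket. memory.swappiness
// was removed in v2, so it intentionally maps to Optional.empty().
class MemoryControlV2 {
    private static final Map<String, String> V1_TO_V2 = Map.of(
        "memory.limit_in_bytes", "memory.max",        // hard limit
        "memory.soft_limit_in_bytes", "memory.low");  // soft limit

    static Optional<String> toV2(String v1ControlFile) {
        return Optional.ofNullable(V1_TO_V2.get(v1ControlFile));
    }
}
```

Returning Optional.empty() for memory.swappiness makes the "no v2 equivalent" case explicit at the call site, which is the situation the handler update has to deal with.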
[jira] [Assigned] (YARN-11685) Create a config to enable/disable cgroup v2 functionality
[ https://issues.apache.org/jira/browse/YARN-11685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Szucs reassigned YARN-11685:
--
Assignee: Peter Szucs

> Create a config to enable/disable cgroup v2 functionality
> -
>
> Key: YARN-11685
> URL: https://issues.apache.org/jira/browse/YARN-11685
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Benjamin Teke
> Assignee: Peter Szucs
> Priority: Major
>
> Various OSes mount cgroup v2 differently: some of them mount both the v1
> and v2 structures, others mount a hybrid structure. To avoid initialization
> issues, the cgroup v1/v2 functionality should be set by a config property.
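As a sketch of what such a switch could look like in yarn-site.xml (the property name below is invented for illustration; the ticket itself defines the real one):

```xml
<!-- Hypothetical example only: the property name is made up to illustrate
     the idea of an explicit v2 on/off switch, and is not necessarily the
     name YARN-11685 introduces. -->
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.v2.enabled</name>
  <value>true</value>
</property>
```

An explicit flag sidesteps guessing from the mount table, which is exactly where hybrid-mount systems cause initialization trouble.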
[jira] [Assigned] (YARN-11675) Update MemoryResourceHandler implementation for cgroup v2 support
[ https://issues.apache.org/jira/browse/YARN-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Szucs reassigned YARN-11675:
--
Assignee: Peter Szucs

> Update MemoryResourceHandler implementation for cgroup v2 support
> -
>
> Key: YARN-11675
> URL: https://issues.apache.org/jira/browse/YARN-11675
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Benjamin Teke
> Assignee: Peter Szucs
> Priority: Major
>
> cgroup v2 has some changes in various controllers (some changed their
> functionality, some were removed). This task is about checking if
> MemoryResourceHandler's
> [implementation|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsMemoryResourceHandlerImpl.java#L47-L46]
> needs any updates.
[jira] [Assigned] (YARN-5305) Yarn Application Log Aggregation fails due to NM can not get correct HDFS delegation token III
[ https://issues.apache.org/jira/browse/YARN-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Szucs reassigned YARN-5305:
-
Assignee: Peter Szucs

> Yarn Application Log Aggregation fails due to NM can not get correct HDFS
> delegation token III
> --
>
> Key: YARN-5305
> URL: https://issues.apache.org/jira/browse/YARN-5305
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Xianyin Xin
> Assignee: Peter Szucs
> Priority: Major
>
> Different from YARN-5098 and YARN-5302, this problem happens when the AM
> submits a startContainer request with a new HDFS token (say, tokenB) which
> is not managed by YARN, so two tokens exist in the credentials of the user
> on the NM: one is tokenB, the other is the one renewed on the RM (tokenA).
> If tokenB is selected when connecting to HDFS and tokenB expires, an
> exception happens.
> Supplementary: this problem happens because the AM didn't use the service
> name as the token alias in the credentials, so two tokens for the same
> service can co-exist in one credentials object. TokenSelector can only
> select the first matched token; it doesn't care whether the token is valid
> or not.
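The failure mode above can be modeled in a few lines. The sketch below is a simplified stand-in, not Hadoop's real Credentials/TokenSelector classes: tokens are stored under an arbitrary alias rather than the service name, so two tokens for the same service can coexist, and a naive selector returns the first token whose service matches, valid or not.

```java
// Simplified model of the bug report -- NOT Hadoop's real classes.
// Because the alias is arbitrary, two tokens for the same service can
// coexist, and the first match wins regardless of validity.
class TokenSelection {
    static final class Token {
        final String alias;
        final String service;
        final boolean expired;

        Token(String alias, String service, boolean expired) {
            this.alias = alias;
            this.service = service;
            this.expired = expired;
        }
    }

    // Mirrors "TokenSelector can only select the first matched token".
    static Token selectFirstForService(String service, Token... credentials) {
        for (Token t : credentials) {
            if (t.service.equals(service)) {
                return t;   // no validity check, exactly as described above
            }
        }
        return null;
    }
}
```

If the first matching token happens to be the expired one, it is returned anyway, which is the log-aggregation failure the issue describes.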
[jira] [Created] (YARN-11630) Passing admin Java options to container localizers
Peter Szucs created YARN-11630:
--

Summary: Passing admin Java options to container localizers
Key: YARN-11630
URL: https://issues.apache.org/jira/browse/YARN-11630
Project: Hadoop YARN
Issue Type: Improvement
Components: yarn
Reporter: Peter Szucs
Assignee: Peter Szucs

Currently we can specify Java options for container localizers in the
_"yarn.nodemanager.container-localizer.java.opts"_ parameter. The aim of this
ticket is to create a parameter which we can use to pass admin options as
well. It would work similarly to the admin Java options we can pass for
MapReduce jobs: first we pass the admin options to the container executor,
then the user-defined ones.
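The ordering the ticket proposes, admin options first, then user-defined ones, can be sketched as follows. The helper and its name are invented for illustration; only the existing property name comes from the ticket:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (invented helper, not YARN code) of the proposed
// ordering: admin options are placed before the user-defined ones when
// building the localizer's JVM command line, so that, where the JVM honours
// the last occurrence of a flag, user options can still override admin
// defaults.
class LocalizerOpts {
    static List<String> buildJvmOpts(String adminOpts, String userOpts) {
        List<String> out = new ArrayList<>();
        for (String tok : (adminOpts + " " + userOpts).trim().split("\\s+")) {
            if (!tok.isEmpty()) {
                out.add(tok);       // admin tokens first, then user tokens
            }
        }
        return out;
    }
}
```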
[jira] [Created] (YARN-11545) FS2CS does not convert ACLs when all users are allowed
Peter Szucs created YARN-11545:
--

Summary: FS2CS does not convert ACLs when all users are allowed
Key: YARN-11545
URL: https://issues.apache.org/jira/browse/YARN-11545
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Reporter: Peter Szucs
Assignee: Peter Szucs

Currently we only convert ACLs if users or groups are set. This should be
extended to check if the "allAllowed" flag is set in the AccessControlList,
so that * values are also preserved in the converted ACLs.
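The fix boils down to honouring the allAllowed flag. A minimal sketch (an invented helper, not the actual FS2CS converter code) of the intended behaviour:

```java
// Hypothetical sketch -- not the actual FS2CS converter. When building the
// Capacity Scheduler ACL string, honour the allAllowed flag of the source
// AccessControlList by emitting "*", instead of only looking at the user
// and group lists.
class AclConvert {
    static String toCsAcl(boolean allAllowed, String users, String groups) {
        if (allAllowed) {
            return "*";                       // preserve "everyone allowed"
        }
        return (users + " " + groups).trim(); // "users groups" form otherwise
    }
}
```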
[jira] [Commented] (YARN-11542) NegativeArraySizeException when running MR jobs with large data size
[ https://issues.apache.org/jira/browse/YARN-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747879#comment-17747879 ]

Peter Szucs commented on YARN-11542:

Moving this to the MapReduce project.

> NegativeArraySizeException when running MR jobs with large data size
> 
>
> Key: YARN-11542
> URL: https://issues.apache.org/jira/browse/YARN-11542
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Peter Szucs
> Assignee: Peter Szucs
> Priority: Major
> Labels: pull-request-available
>
> We are using bit shifting to double the byte array in IFile's
> [nextRawValue|https://github.infra.cloudera.com/CDH/hadoop/blob/bef14a39c7616e3b9f437a6fb24fc7a55a676b57/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java#L437]
> method to store the byte values in it. With large datasets it can easily
> happen that we shift the leftmost bit when calculating the size of the
> array, which can lead to a negative number as the array size, causing the
> NegativeArraySizeException.
> It would be safer to expand the backing array by a 1.5x factor, with a
> check not to exceed Integer's max value while doing so.
[jira] [Resolved] (YARN-11542) NegativeArraySizeException when running MR jobs with large data size
[ https://issues.apache.org/jira/browse/YARN-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Szucs resolved YARN-11542.
Resolution: Abandoned

> NegativeArraySizeException when running MR jobs with large data size
> 
>
> Key: YARN-11542
> URL: https://issues.apache.org/jira/browse/YARN-11542
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Peter Szucs
> Assignee: Peter Szucs
> Priority: Major
> Labels: pull-request-available
>
> We are using bit shifting to double the byte array in IFile's
> [nextRawValue|https://github.infra.cloudera.com/CDH/hadoop/blob/bef14a39c7616e3b9f437a6fb24fc7a55a676b57/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java#L437]
> method to store the byte values in it. With large datasets it can easily
> happen that we shift the leftmost bit when calculating the size of the
> array, which can lead to a negative number as the array size, causing the
> NegativeArraySizeException.
> It would be safer to expand the backing array by a 1.5x factor, with a
> check not to exceed Integer's max value while doing so.
[jira] [Created] (YARN-11542) NegativeArraySizeException when running MR jobs with large data size
Peter Szucs created YARN-11542:
--

Summary: NegativeArraySizeException when running MR jobs with large data size
Key: YARN-11542
URL: https://issues.apache.org/jira/browse/YARN-11542
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Reporter: Peter Szucs
Assignee: Peter Szucs

We are using bit shifting to double the byte array in IFile's
[nextRawValue|https://github.infra.cloudera.com/CDH/hadoop/blob/bef14a39c7616e3b9f437a6fb24fc7a55a676b57/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java#L437]
method to store the byte values in it. With large datasets it can easily
happen that we shift the leftmost bit when calculating the size of the array,
which can lead to a negative number as the array size, causing the
NegativeArraySizeException.
It would be safer to expand the backing array by a 1.5x factor, with a check
not to exceed Integer's max value while doing so.
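The overflow and the proposed safer policy can be demonstrated directly. The class below is an invented illustration of the growth strategies discussed in the issue, not the IFile patch itself:

```java
// Invented illustration (not the IFile patch): doubling with "<< 1"
// overflows to a negative size once the leftmost bit gets shifted, which
// is exactly what triggers the NegativeArraySizeException; growing by 1.5x
// with a clamp near Integer.MAX_VALUE stays valid.
class BufferGrowth {
    // Slightly below Integer.MAX_VALUE; some VMs refuse the last few slots.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    static int unsafeDouble(int current) {
        return current << 1;                  // negative for current > 2^30
    }

    static int grow15x(int current) {
        int next = current + (current >> 1);  // 1.5x growth
        if (next < 0 || next > MAX_ARRAY_SIZE) {
            next = MAX_ARRAY_SIZE;            // clamp instead of overflowing
        }
        return next;
    }
}
```

For example, unsafeDouble(1 &lt;&lt; 30) is already negative, while grow15x keeps returning a usable size all the way up to the clamp.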
[jira] [Updated] (YARN-11534) Incorrect exception handling during container recovery
[ https://issues.apache.org/jira/browse/YARN-11534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Szucs updated YARN-11534:
---
Summary: Incorrect exception handling during container recovery (was: Incorrect exception handling in RecoveredContainerLaunch)

> Incorrect exception handling during container recovery
> --
>
> Key: YARN-11534
> URL: https://issues.apache.org/jira/browse/YARN-11534
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Peter Szucs
> Assignee: Peter Szucs
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> When the NM is restarted during a container recovery, it can happen that it
> interrupts the container reacquisition during the LinuxContainerExecutor's
> signalContainer method. In this case we will get the following exception:
> {code:java}
> java.io.InterruptedIOException: java.lang.InterruptedException
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:1011)
> at org.apache.hadoop.util.Shell.run(Shell.java:901)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:177)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:184)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:735)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:887)
> at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:291)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:708)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.InterruptedException
> at java.base/java.lang.Object.wait(Native Method)
> at java.base/java.lang.Object.wait(Object.java:328)
> at java.base/java.lang.ProcessImpl.waitFor(ProcessImpl.java:495)
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:1001)
> ... 15 more{code}
> Later this InterruptedIOException gets caught and wrapped inside a
> PrivilegedOperationException and a ContainerExecutionException.
> In LinuxContainerExecutor's
> [signalContainer|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java#L790]
> method we catch this exception again and throw an IOException from it,
> causing the following stack trace:
> {code:java}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Signal container failed
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:183)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:184)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:735)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:887)
> at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:291)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:708)
> at
[jira] [Updated] (YARN-11534) Incorrect exception handling in RecoveredContainerLaunch
[ https://issues.apache.org/jira/browse/YARN-11534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs updated YARN-11534: --- Description: When NM is restarted during a container recovery, it can happen that it interrupts the container reaquisition during the LinuxContainerExecutor's signalContainer method. In this case we will get the following exception: {code:java} java.io.InterruptedIOException: java.lang.InterruptedException at org.apache.hadoop.util.Shell.runCommand(Shell.java:1011) at org.apache.hadoop.util.Shell.run(Shell.java:901) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:177) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:184) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:735) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:887) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:291) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:708) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: java.lang.InterruptedException at java.base/java.lang.Object.wait(Native Method) at java.base/java.lang.Object.wait(Object.java:328) at java.base/java.lang.ProcessImpl.waitFor(ProcessImpl.java:495) at org.apache.hadoop.util.Shell.runCommand(Shell.java:1001) ... 15 more{code} Later this InterruptedIOException get caught and wrapped inside a PrivilegedOperationException and a ContainerExecutionException. In LinuxContainerExecutor's [signalContainer|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java#L790] method we catch this exception again, and throw an IOException from it, indicating this error message in the stack trace: {code:java} IOException from it, causing the following stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Signal container failed at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:183) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:184) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:735) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:887) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:291) at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:708) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) 2023-06-20 18:24:31,777 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch: Unable to recover container container_e03_1687266197584_0033_01_01 java.io.IOException: Problem signalling container 256974 with NULL; output: null and exitCode: -1{code}
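The flaw described above suggests an obvious direction: before wrapping everything into a generic IOException, walk the cause chain, check for an interruption, and restore the thread's interrupt status. The helper below is an illustrative sketch only; the class and method names are invented for this example and are not the actual Hadoop patch.

```java
import java.io.InterruptedIOException;

// Hypothetical helper: detect an interruption buried in a wrapped cause
// chain (as in the NM recovery stack trace above) and re-assert the
// thread's interrupt flag instead of reporting "Signal container failed".
public class InterruptAwareHandling {

    /** Returns true if any cause in the chain signals an interruption. */
    public static boolean causedByInterruption(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof InterruptedException
                || cur instanceof InterruptedIOException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulate the double wrapping seen in the log:
        // ContainerExecutionException -> PrivilegedOperationException
        // -> InterruptedIOException.
        Exception wrapped = new RuntimeException("Signal container failed",
            new RuntimeException("PrivilegedOperationException",
                new InterruptedIOException("java.lang.InterruptedException")));
        if (causedByInterruption(wrapped)) {
            // Restore the interrupt status so the caller of
            // RecoveredContainerLaunch can tell an NM shutdown apart
            // from a genuine signalling failure.
            Thread.currentThread().interrupt();
        }
        System.out.println(causedByInterruption(wrapped)); // prints "true"
    }
}
```

With a check like this, the interrupted recovery path would no longer surface as a misleading "Problem signalling container" error.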
[jira] [Created] (YARN-11534) Incorrect exception handling in RecoveredContainerLaunch
Peter Szucs created YARN-11534: -- Summary: Incorrect exception handling in RecoveredContainerLaunch Key: YARN-11534 URL: https://issues.apache.org/jira/browse/YARN-11534 Project: Hadoop YARN Issue Type: Bug Components: yarn Reporter: Peter Szucs Assignee: Peter Szucs -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11372) Migrate legacy AQC to flexible AQC
[ https://issues.apache.org/jira/browse/YARN-11372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs updated YARN-11372: --- Description: Currently the codebase of Legacy AQC (with ManagedParentQueue/AutoCreatedLeafQueue) classes lives next to the basic queue classes that are used by the flexible AQC. The scope of this task is to eliminate the former while migrating the functionality of legacy AQC. (was: Currently the codebase of Legacy AQC (with ManagedParentQueue/ManagedLeafQueue) classes live next to the basic queue classes that are used by the flexible AQC. The scope of this task is to eliminate the former while migrating the functionality of legacy AQC.) > Migrate legacy AQC to flexible AQC > -- > > Key: YARN-11372 > URL: https://issues.apache.org/jira/browse/YARN-11372 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Peter Szucs >Priority: Major > > Currently the codebase of Legacy AQC (with > ManagedParentQueue/AutoCreatedLeafQueue) classes lives next to the basic queue > classes that are used by the flexible AQC. The scope of this task is to > eliminate the former while migrating the functionality of legacy AQC.
[jira] [Assigned] (YARN-10921) AbstractCSQueue: Node Labels logic is scattered and iteration logic is repeated all over the place
[ https://issues.apache.org/jira/browse/YARN-10921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs reassigned YARN-10921: -- Assignee: Peter Szucs > AbstractCSQueue: Node Labels logic is scattered and iteration logic is > repeated all over the place > -- > > Key: YARN-10921 > URL: https://issues.apache.org/jira/browse/YARN-10921 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Assignee: Peter Szucs >Priority: Minor > > TODO items: > - Check original Node labels epic / jiras? > - Think about ways to improve repetitive iteration on configuredNodeLabels > - Search for: "String label" in code > Code blocks to handle Node labels: > - AbstractCSQueue#setupQueueConfigs > - AbstractCSQueue#getQueueConfigurations > - AbstractCSQueue#accessibleToPartition > - AbstractCSQueue#getNodeLabelsForQueue > - AbstractCSQueue#updateAbsoluteCapacities > - AbstractCSQueue#updateConfigurableResourceRequirement > - CSQueueUtils#loadCapacitiesByLabelsFromConf > - AutoCreatedLeafQueue
[jira] [Comment Edited] (YARN-10926) Test validation after YARN-10504 and YARN-10506: Check if modified test expectations are correct or not
[ https://issues.apache.org/jira/browse/YARN-10926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17648132#comment-17648132 ] Peter Szucs edited comment on YARN-10926 at 1/3/23 4:17 PM: I checked the mentioned jiras and the tests in the related code changes and didn't see any invalid assertions in them. For YARN-10504 there was a TODO to fix TestAbsoluteResourceWithAutoQueue#testAutoCreateLeafQueueCreation [here|#diff-1ed6d328b2546b3599468f169e823d2d411a3bdb85ea7871a8533cd205e2d311], and it has been fixed in a later PR: [https://github.com/apache/hadoop/pull/3868] As I saw in the last comments of YARN-10504, test issues remained in TestAbsoluteResourceConfiguration.testSimpleMinMaxResourceConfigurartionPerQueue, but they were also fixed in a follow-up commit [here|https://github.com/apache/hadoop/commit/4f008153ef5fca9e1f71ebc7069c502e803ab1e8] was (Author: JIRAUSER297340): I checked the mentioned jiras and the tests in the related code changes and didn't see any invalid assertions in them. 
For YARN-10504 there were a TODO to fix TestAbsoluteResourceWithAutoQueue#testAutoCreateLeafQueueCreation [here|[https://github.com/apache/hadoop/commit/b0eec0909772cf92427957670da5630b1dd11da0#diff-1ed6d328b2546b3599468f169e823d2d411a3bdb85ea7871a8533cd205e2d311],] and it has been fixed in a later PR: [https://github.com/apache/hadoop/pull/3868] As I saw in the last comments of [YARN-10504|https://issues.apache.org/jira/browse/YARN-10504] test issues remained in TestAbsoluteResourceConfiguration.testSimpleMinMaxResourceConfigurartionPerQueue but it was also fixed in a follow-up commit [here|https://github.com/apache/hadoop/commit/4f008153ef5fca9e1f71ebc7069c502e803ab1e8 ] > Test validation after YARN-10504 and YARN-10506: Check if modified test > expectations are correct or not > --- > > Key: YARN-10926 > URL: https://issues.apache.org/jira/browse/YARN-10926 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Assignee: Peter Szucs >Priority: Minor > > YARN-10504 and YARN-10506 modified some test expectations. > The task is to verify if those expectations are correct.
[jira] [Resolved] (YARN-10905) Investigate if AbstractCSQueue#configuredNodeLabels vs. QueueCapacities#getExistingNodeLabels holds the same data
[ https://issues.apache.org/jira/browse/YARN-10905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs resolved YARN-10905. Resolution: Won't Fix > Investigate if AbstractCSQueue#configuredNodeLabels vs. > QueueCapacities#getExistingNodeLabels holds the same data > - > > Key: YARN-10905 > URL: https://issues.apache.org/jira/browse/YARN-10905 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Assignee: Peter Szucs >Priority: Minor > > The task is to investigate whether the field > AbstractCSQueue#configuredNodeLabels holds the same data as > QueueCapacities#getExistingNodeLabels. > Obviously, we don't want double-entry bookkeeping so if the data is the same, > we can remove this or that.
[jira] [Resolved] (YARN-10926) Test validation after YARN-10504 and YARN-10506: Check if modified test expectations are correct or not
[ https://issues.apache.org/jira/browse/YARN-10926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs resolved YARN-10926. Resolution: Won't Fix > Test validation after YARN-10504 and YARN-10506: Check if modified test > expectations are correct or not > --- > > Key: YARN-10926 > URL: https://issues.apache.org/jira/browse/YARN-10926 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Szilard Nemeth >Assignee: Peter Szucs >Priority: Minor > > YARN-10504 and YARN-10506 modified some test expectations. > The task is to verify if those expectations are correct.
[jira] [Reopened] (YARN-11041) Replace all occurences of queuePath with the new QueuePath class - followup
[ https://issues.apache.org/jira/browse/YARN-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs reopened YARN-11041: > Replace all occurences of queuePath with the new QueuePath class - followup > --- > > Key: YARN-11041 > URL: https://issues.apache.org/jira/browse/YARN-11041 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Tibor Kovács >Assignee: Peter Szucs >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > The QueuePath class was introduced in YARN-10897, however, its current > adoption happened only for code changes after this JIRA. We need to adopt it > retrospectively. > > A lot of changes are introduced via ticket YARN-10982. The replacing should > be continued by touching the next comments: > > [...g/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AutoCreatedQueueTemplate.java|https://github.com/apache/hadoop/pull/3660/files/f956918bc154d0e35fce07c5dd8be804eb007acc#diff-fde6885144b59bb06b2c3358780388d958829b13f68aceee7bb6d394bb5e0548] > |[~snemeth] [https://github.com/apache/hadoop/pull/3660#discussion_r765012937] > I think this could be also refactored in a follow-up jira so the string magic > could probably be replaced with some more elegant solution. 
Though, I think > this would be too much in this patch, hence I do suggest the follow-up jira.| > |[~snemeth] [https://github.com/apache/hadoop/pull/3660#discussion_r765013096] > [~bteke] [ |https://github.com/9uapaw] [~gandras] [ > \|https://github.com/9uapaw] Thoughts?| > |[~bteke] [https://github.com/apache/hadoop/pull/3660#discussion_r765110750] > +1, even the QueuePath object could have some kind of support for this.| > |[~gandras] [https://github.com/apache/hadoop/pull/3660#discussion_r765131244] > Agreed, let's handle it in a followup!| > > > > [...he/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java|https://github.com/apache/hadoop/pull/3660/files/f956918bc154d0e35fce07c5dd8be804eb007acc#diff-c4b0c5e70208f1e3cfbd5a86ffa2393e5c996cc8b45605d9d41abcb7e0bd382a] > |[~snemeth] [https://github.com/apache/hadoop/pull/3660#discussion_r765023717] > There are many string operations in this class: > E.g. * getQueuePrefix that works with the full queue path > * getNodeLabelPrefix that also works with the full queue path| > I suggest to create a static class, called "QueuePrefixes" or something like > that and add some static methods there to convert the QueuePath object to > those various queue prefix strings that are ultimately keys in the > Configuration object. > > > > [...he/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java|https://github.com/apache/hadoop/pull/3660/files/f956918bc154d0e35fce07c5dd8be804eb007acc#diff-c4b0c5e70208f1e3cfbd5a86ffa2393e5c996cc8b45605d9d41abcb7e0bd382a] > |[~snemeth] [https://github.com/apache/hadoop/pull/3660#discussion_r765026119] > This seems hacky, just based on the constructor parameter names of QueuePath: > parent, leaf. > The AQC Template prefix is not the leaf, obviously. 
> Could we somehow circumvent this?| > |[~bteke] [https://github.com/apache/hadoop/pull/3660#discussion_r765126207] > Maybe a factory method could be created, which returns a new QueuePath with > the parent set as the original queuePath. I.e > rootQueuePath.createChild(String childName) -> this could return a new > QueuePath object with root.childName path, and rootQueuePath as parent.| > |[~snemeth] [https://github.com/apache/hadoop/pull/3660#discussion_r765039033] > Looking at this getQueues method, I realized almost all the callers are using > some kind of string magic that should be addressed with this patch. > For example, take a look at: > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.MutableCSConfigurationProvider#addQueue > I think getQueues should also receive the QueuePath object instead of > Strings.| > > > > [.../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java|https://github.com/apache/hadoop/pull/3660/files/0c3dd17c936260fc9c386dcabc6368b54b27aa82..39f4ec203377244f840e4593aa02386ff51cc3c4#diff-0adf8192c51cbe4671324f06f7f8cbd48898df0376bbcc516451a3bdb2b48d3b] > |[~bteke] [https://github.com/apache/hadoop/pull/3660#discussion_r765912967] > Nit: Gets the queue path object. > The object of the queue suggests a CSQueue object.| > |[~snemeth] [https://github.com/apache/hadoop/pull/3660#discussion_r765922133] > Will fix the nit upon commit if I'm fine with the whole patch. Thanks for > noticing.| > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
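The factory-method idea raised in the review comments above (a `rootQueuePath.createChild(childName)` call returning a new QueuePath with the parent already set, instead of dot-separated string magic at call sites) can be sketched roughly as follows. This is a simplified illustration only, not the real Hadoop QueuePath class; the `DOT` delimiter constant and method names here are assumptions.

```java
// Simplified sketch of the suggested QueuePath factory method from the
// YARN-11041 review thread; the real Hadoop class differs.
public final class QueuePath {
    private static final String DOT = ".";
    private final String fullPath;

    public QueuePath(String fullPath) {
        this.fullPath = fullPath;
    }

    /** Review suggestion: derive a child path without string magic at call sites. */
    public QueuePath createChild(String childName) {
        return new QueuePath(fullPath + DOT + childName);
    }

    /** Parent portion of the path, or null for the root. */
    public String getParent() {
        int idx = fullPath.lastIndexOf(DOT);
        return idx < 0 ? null : fullPath.substring(0, idx);
    }

    public String getFullPath() {
        return fullPath;
    }
}
```

With this shape, `new QueuePath("root").createChild("a")` yields the path "root.a", and constructs like the AQC template prefix no longer have to masquerade as a "leaf" constructor argument.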
[jira] (YARN-11041) Replace all occurences of queuePath with the new QueuePath class - followup
[ https://issues.apache.org/jira/browse/YARN-11041 ] Peter Szucs deleted comment on YARN-11041: was (Author: JIRAUSER297340): The attached pull request is merged to trunk, I think just the administration is left here, I'll close this ticket.
[jira] [Resolved] (YARN-11041) Replace all occurences of queuePath with the new QueuePath class - followup
[ https://issues.apache.org/jira/browse/YARN-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs resolved YARN-11041. Resolution: Fixed

> Replace all occurences of queuePath with the new QueuePath class - followup
>
> Key: YARN-11041
> URL: https://issues.apache.org/jira/browse/YARN-11041
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Reporter: Tibor Kovács
> Assignee: Peter Szucs
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> The QueuePath class was introduced in YARN-10897; however, it has so far been adopted only in code changes made after that JIRA. We need to adopt it retrospectively.
>
> A lot of the changes were introduced via ticket YARN-10982. The replacement should be continued by addressing the following review comments:
>
> [...g/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AutoCreatedQueueTemplate.java|https://github.com/apache/hadoop/pull/3660/files/f956918bc154d0e35fce07c5dd8be804eb007acc#diff-fde6885144b59bb06b2c3358780388d958829b13f68aceee7bb6d394bb5e0548]
> * [~snemeth] [https://github.com/apache/hadoop/pull/3660#discussion_r765012937] I think this could also be refactored in a follow-up jira so the string magic could be replaced with a more elegant solution. Though I think this would be too much in this patch, hence I suggest the follow-up jira.
> * [~snemeth] [https://github.com/apache/hadoop/pull/3660#discussion_r765013096] [~bteke] [~gandras] Thoughts?
> * [~bteke] [https://github.com/apache/hadoop/pull/3660#discussion_r765110750] +1, even the QueuePath object could have some kind of support for this.
> * [~gandras] [https://github.com/apache/hadoop/pull/3660#discussion_r765131244] Agreed, let's handle it in a followup!
>
> [...he/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java|https://github.com/apache/hadoop/pull/3660/files/f956918bc154d0e35fce07c5dd8be804eb007acc#diff-c4b0c5e70208f1e3cfbd5a86ffa2393e5c996cc8b45605d9d41abcb7e0bd382a]
> * [~snemeth] [https://github.com/apache/hadoop/pull/3660#discussion_r765023717] There are many string operations in this class, e.g. getQueuePrefix and getNodeLabelPrefix, both of which work with the full queue path. I suggest creating a static class called "QueuePrefixes" or something like that and adding some static methods there to convert the QueuePath object to those various queue prefix strings that are ultimately keys in the Configuration object.
> * [~snemeth] [https://github.com/apache/hadoop/pull/3660#discussion_r765026119] This seems hacky, just based on the constructor parameter names of QueuePath: parent, leaf. The AQC Template prefix is not the leaf, obviously. Could we somehow circumvent this?
> * [~bteke] [https://github.com/apache/hadoop/pull/3660#discussion_r765126207] Maybe a factory method could be created which returns a new QueuePath with the parent set to the original queuePath, i.e. rootQueuePath.createChild(String childName) -> this could return a new QueuePath object with the root.childName path and rootQueuePath as its parent.
> * [~snemeth] [https://github.com/apache/hadoop/pull/3660#discussion_r765039033] Looking at this getQueues method, I realized almost all the callers use some kind of string magic that should be addressed with this patch. For example, take a look at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.MutableCSConfigurationProvider#addQueue. I think getQueues should also receive the QueuePath object instead of Strings.
>
> [.../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java|https://github.com/apache/hadoop/pull/3660/files/0c3dd17c936260fc9c386dcabc6368b54b27aa82..39f4ec203377244f840e4593aa02386ff51cc3c4#diff-0adf8192c51cbe4671324f06f7f8cbd48898df0376bbcc516451a3bdb2b48d3b]
> * [~bteke] [https://github.com/apache/hadoop/pull/3660#discussion_r765912967] Nit: "Gets the queue path object." - "the object of the queue" suggests a CSQueue object.
> * [~snemeth] [https://github.com/apache/hadoop/pull/3660#discussion_r765922133] Will fix the nit upon commit if I'm fine with the whole patch. Thanks for noticing.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
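The two refactorings proposed in the review thread above, a createChild factory on QueuePath and a QueuePrefixes helper that maps a QueuePath to Configuration keys, could be sketched roughly as follows. This is a hypothetical simplification for illustration only: the field names, the QueuePrefixes nesting, and the exact prefix strings are assumptions, not the actual Hadoop implementation.

```java
// Hypothetical sketch of the review suggestions; not the real Hadoop classes.
class QueuePath {
    private final String parent; // full path of the parent, or null for the root
    private final String leaf;   // last path component

    public QueuePath(String parent, String leaf) {
        this.parent = parent;
        this.leaf = leaf;
    }

    /** Factory suggested in the review: builds a child path without string magic at call sites. */
    public QueuePath createChild(String childName) {
        return new QueuePath(getFullPath(), childName);
    }

    public String getFullPath() {
        return parent == null ? leaf : parent + "." + leaf;
    }

    /** Static helper class suggested for the prefix strings in CapacitySchedulerConfiguration. */
    public static class QueuePrefixes {
        private static final String PREFIX = "yarn.scheduler.capacity.";

        public static String getQueuePrefix(QueuePath path) {
            return PREFIX + path.getFullPath() + ".";
        }

        public static String getNodeLabelPrefix(QueuePath path, String label) {
            return getQueuePrefix(path) + "accessible-node-labels." + label + ".";
        }
    }
}
```

With such a factory, the AQC template code could derive child paths as rootQueuePath.createChild("childName") instead of concatenating strings, and every Configuration key lookup would go through one place.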
[jira] [Commented] (YARN-11041) Replace all occurences of queuePath with the new QueuePath class - followup
[ https://issues.apache.org/jira/browse/YARN-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653938#comment-17653938 ] Peter Szucs commented on YARN-11041: The attached pull request has been merged to trunk; I think only the administration is left here, so I'll close this ticket.
[jira] [Updated] (YARN-11041) Replace all occurences of queuePath with the new QueuePath class - followup
[ https://issues.apache.org/jira/browse/YARN-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs updated YARN-11041: --- Fix Version/s: 3.4.0
[jira] [Assigned] (YARN-11041) Replace all occurences of queuePath with the new QueuePath class - followup
[ https://issues.apache.org/jira/browse/YARN-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs reassigned YARN-11041: -- Assignee: Peter Szucs
[jira] [Commented] (YARN-10926) Test validation after YARN-10504 and YARN-10506: Check if modified test expectations are correct or not
[ https://issues.apache.org/jira/browse/YARN-10926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17648132#comment-17648132 ] Peter Szucs commented on YARN-10926: I checked the mentioned jiras and the tests in the related code changes and didn't see any invalid assertions in them. For YARN-10504 there was a TODO to fix TestAbsoluteResourceWithAutoQueue#testAutoCreateLeafQueueCreation [here|https://github.com/apache/hadoop/commit/b0eec0909772cf92427957670da5630b1dd11da0#diff-1ed6d328b2546b3599468f169e823d2d411a3bdb85ea7871a8533cd205e2d311], and it has been fixed in a later PR: https://github.com/apache/hadoop/pull/3868 As I saw in the last comments of [YARN-10504|https://issues.apache.org/jira/browse/YARN-10504], test issues remained in TestAbsoluteResourceConfiguration#testSimpleMinMaxResourceConfigurartionPerQueue, but they were also fixed in a follow-up commit [here|https://github.com/apache/hadoop/commit/4f008153ef5fca9e1f71ebc7069c502e803ab1e8]. > Test validation after YARN-10504 and YARN-10506: Check if modified test > expectations are correct or not > --- > > Key: YARN-10926 > URL: https://issues.apache.org/jira/browse/YARN-10926 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Szilard Nemeth > Assignee: Peter Szucs > Priority: Minor > > YARN-10504 and YARN-10506 modified some test expectations. > The task is to verify if those expectations are correct.
[jira] [Commented] (YARN-10905) Investigate if AbstractCSQueue#configuredNodeLabels vs. QueueCapacities#getExistingNodeLabels holds the same data
[ https://issues.apache.org/jira/browse/YARN-10905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17645316#comment-17645316 ] Peter Szucs commented on YARN-10905: As I saw in my investigation, the _configuredNodeLabels_ logic has been extracted to the _NodeLabelsSettings_ class since the ticket was filed. In _AbstractCSQueue_'s _setupQueueConfigs_ method we load the _queueNodeLabelsSettings_ and the _queueCapacities_ every time a queue is refreshed. The process is the following:
* We initialize _queueNodeLabelsSettings_ and read all the node label information (accessible/configured node labels, defaultLabelExpression) from the config. _configuredNodeLabels_ is stored as a set of strings here.
* After this we initialize the _queueCapacities_ map by iterating through _configuredNodeLabels_ and reading the capacity properties of each label for the given queue from the config.
* _QueueCapacities#getExistingNodeLabels_ returns the key set of this map.
Since we iterate through _queueNodeLabelsSettings#configuredNodeLabels_ and create another map from it for the detailed capacities, _configuredNodeLabels_ and _QueueCapacities#getExistingNodeLabels_ should return the same set of labels.
*Conclusion:* _QueueCapacities_ needs _configuredNodeLabels_ for its initialization, so the only thing that could be removed is the _QueueCapacities#getExistingNodeLabels_ method. However, I think it is reasonable for _QueueCapacities_ to expose the key set of its capacities map for code parts that deal only with _QueueCapacities_ (for example _QueueCapacitiesInfo_, or _mergeCapacities_ in _AutoCreatedLeafQueue_, where one capacity map is created from another), so I haven't found a nice way to clean this up yet. > Investigate if AbstractCSQueue#configuredNodeLabels vs. > QueueCapacities#getExistingNodeLabels holds the same data > --- > > Key: YARN-10905 > URL: https://issues.apache.org/jira/browse/YARN-10905 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Szilard Nemeth > Assignee: Peter Szucs > Priority: Minor > > The task is to investigate whether the field AbstractCSQueue#configuredNodeLabels holds the same data as QueueCapacities#getExistingNodeLabels. > Obviously, we don't want double-entry bookkeeping, so if the data is the same, we can remove one or the other.
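The invariant described in the comment above, that the capacities map is populated by iterating the configured labels and therefore its key set cannot diverge from them, can be shown with a minimal sketch. This is a hypothetical stand-in, not the real QueueCapacities or NodeLabelsSettings classes; the names and the Float capacity type are simplifications.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for QueueCapacities; not the actual Hadoop class.
class QueueCapacitiesSketch {
    private final Map<String, Float> capacities = new HashMap<>();

    /** Mirrors the refresh flow: one capacity entry is read per configured label. */
    void loadFrom(Set<String> configuredNodeLabels, Map<String, Float> conf) {
        for (String label : configuredNodeLabels) {
            // read the capacity property for this label, defaulting to 0 if unset
            capacities.put(label, conf.getOrDefault(label, 0f));
        }
    }

    /** Stand-in for QueueCapacities#getExistingNodeLabels: the key set of the map. */
    Set<String> getExistingNodeLabels() {
        return capacities.keySet();
    }
}
```

Because every key in the map comes from the configured-labels set and every configured label is inserted, the two collections always hold the same labels after a refresh, which matches the conclusion of the investigation.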
[jira] [Assigned] (YARN-10926) Test validation after YARN-10504 and YARN-10506: Check if modified test expectations are correct or not
[ https://issues.apache.org/jira/browse/YARN-10926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs reassigned YARN-10926: -- Assignee: Peter Szucs
[jira] [Assigned] (YARN-10905) Investigate if AbstractCSQueue#configuredNodeLabels vs. QueueCapacities#getExistingNodeLabels holds the same data
[ https://issues.apache.org/jira/browse/YARN-10905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs reassigned YARN-10905: -- Assignee: Peter Szucs
[jira] [Assigned] (YARN-10946) AbstractCSQueue: Create separate class for constructing Queue API objects
[ https://issues.apache.org/jira/browse/YARN-10946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs reassigned YARN-10946: -- Assignee: Peter Szucs > AbstractCSQueue: Create separate class for constructing Queue API objects > --- > > Key: YARN-10946 > URL: https://issues.apache.org/jira/browse/YARN-10946 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Szilard Nemeth > Assignee: Peter Szucs > Priority: Minor > > Relevant methods are: > - org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueConfigurations > - org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueInfo > - org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueStatistics
[jira] [Resolved] (YARN-10959) Extract common method of two that check if preemption disabled in CSQueuePreemption
[ https://issues.apache.org/jira/browse/YARN-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs resolved YARN-10959. Resolution: Resolved > Extract common method of two that check if preemption disabled in > CSQueuePreemption > --- > > Key: YARN-10959 > URL: https://issues.apache.org/jira/browse/YARN-10959 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Szilard Nemeth > Assignee: Peter Szucs > Priority: Minor > > This is a follow-up of YARN-10913. > After YARN-10913, we have a class called CSQueuePreemption that has 2 methods > that are very similar to each other: > - isQueueHierarchyPreemptionDisabled > - isIntraQueueHierarchyPreemptionDisabled > The goal is to create one method and use it from those 2, merging the common > logic as much as we can.
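One way such a merge could be sketched is to parameterize a shared hierarchy walk with the per-kind preemption flag. This is a hypothetical simplification: the Queue structure, the inherit-from-parent semantics, and the default are assumptions for illustration, not the real CSQueuePreemption logic (which, as the follow-up comments note, turned out to differ enough that the merge was abandoned).

```java
import java.util.function.Function;

// Hypothetical sketch of merging the two preemption checks; not the real Hadoop code.
class PreemptionCheckSketch {
    static class Queue {
        final Queue parent;
        final Boolean preemptionDisabled;           // null = inherit from parent
        final Boolean intraQueuePreemptionDisabled; // null = inherit from parent

        Queue(Queue parent, Boolean preemptionDisabled, Boolean intraQueuePreemptionDisabled) {
            this.parent = parent;
            this.preemptionDisabled = preemptionDisabled;
            this.intraQueuePreemptionDisabled = intraQueuePreemptionDisabled;
        }
    }

    /** Shared hierarchy walk; the caller supplies which flag to consult. */
    private static boolean isHierarchyDisabled(Queue q, Function<Queue, Boolean> flag) {
        for (Queue cur = q; cur != null; cur = cur.parent) {
            Boolean v = flag.apply(cur);
            if (v != null) {
                return v; // the first explicitly configured level wins
            }
        }
        return false; // default: preemption enabled
    }

    static boolean isQueueHierarchyPreemptionDisabled(Queue q) {
        return isHierarchyDisabled(q, c -> c.preemptionDisabled);
    }

    static boolean isIntraQueueHierarchyPreemptionDisabled(Queue q) {
        return isHierarchyDisabled(q, c -> c.intraQueuePreemptionDisabled);
    }
}
```

The design choice here is that the two public methods keep their names while the duplicated traversal lives in one private helper; this is the shape the ticket proposed before the team concluded the real duplication was too small to justify it.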
[jira] [Comment Edited] (YARN-10959) Extract common method of two that check if preemption disabled in CSQueuePreemption
[ https://issues.apache.org/jira/browse/YARN-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637151#comment-17637151 ] Peter Szucs edited comment on YARN-10959 at 11/22/22 10:32 AM: --- We saw that extracting wouldn't provide us a real benefit here because of the size of the duplication and the difference in the logic, so as per our discussion with [~snemeth] I close this ticket. was (Author: JIRAUSER297340): We saw that extracting wouldn't provide us a real benefit here because of the size of the duplication and the difference in the logic, ** so as per our discussion with [~snemeth] I close this ticket.
[jira] [Commented] (YARN-10959) Extract common method of two that check if preemption disabled in CSQueuePreemption
[ https://issues.apache.org/jira/browse/YARN-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637151#comment-17637151 ] Peter Szucs commented on YARN-10959: We saw that extracting wouldn't provide us a real benefit here because of the size of the duplication and the difference in the logic, so as per our discussion with [~snemeth] I close this ticket.
[jira] [Assigned] (YARN-10959) Extract common method of two that check if preemption disabled in CSQueuePreemption
[ https://issues.apache.org/jira/browse/YARN-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs reassigned YARN-10959: -- Assignee: Peter Szucs
[jira] [Assigned] (YARN-10005) Code improvements in MutableCSConfigurationProvider
[ https://issues.apache.org/jira/browse/YARN-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs reassigned YARN-10005: -- Assignee: Peter Szucs > Code improvements in MutableCSConfigurationProvider > --- > > Key: YARN-10005 > URL: https://issues.apache.org/jira/browse/YARN-10005 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: Szilard Nemeth > Assignee: Peter Szucs > Priority: Minor > > * Important: constructKeyValueConfUpdate and all related methods seem to be a separate responsibility: how to convert an incoming SchedConfUpdateInfo to Configuration changes (a Configuration object) > * Duplicated code block (9 lines) in the init / formatConfigurationInStore methods > * The method getConfStore could be package-private