[jira] [Work logged] (HDFS-16631) Enable dfs.datanode.lockmanager.trace In Test
[ https://issues.apache.org/jira/browse/HDFS-16631?focusedWorklogId=797522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-797522 ]

ASF GitHub Bot logged work on HDFS-16631:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Aug/22 07:17
            Start Date: 03/Aug/22 07:17
    Worklog Time Spent: 10m
      Work Description: slfan1989 closed pull request #4438: HDFS-16631. Enable dfs.datanode.lockmanager.trace In Test.
URL: https://github.com/apache/hadoop/pull/4438

Issue Time Tracking
-------------------

    Worklog Id:     (was: 797522)
    Time Spent: 2h 40m  (was: 2.5h)

> Enable dfs.datanode.lockmanager.trace In Test
> ---------------------------------------------
>
>                 Key: HDFS-16631
>                 URL: https://issues.apache.org/jira/browse/HDFS-16631
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: fanshilun
>            Assignee: fanshilun
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: image-2022-06-18-09-49-28-725.png
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> In Jira HDFS-16600 (Fix deadlock on DataNode side) we discussed the deadlock
> issue, which was a very meaningful discussion. While reading the log I found
> the following:
> {code:java}
> 2022-05-27 07:39:47,890 [Listener at localhost/36941] WARN  datanode.DataSetLockManager (DataSetLockManager.java:lockLeakCheck(261)) - not open lock leak check func.
> {code}
> Looking at the code, I found that there is such a parameter:
> {code:java}
> <property>
>   <name>dfs.datanode.lockmanager.trace</name>
>   <value>false</value>
>   <description>
>     If this is true, after shut down datanode lock Manager will print all leak
>     thread that not release by lock Manager. Only used for test or trace dead lock
>     problem. In produce default set false, because it's have little performance loss.
>   </description>
> </property>
> {code}
> I think this parameter should be enabled in the test environment, so that if
> a DN deadlock occurs, its cause can be located quickly.
> Following the review suggestions, these modifications are made:
> 1. The read and write lock related methods of DataSetLockManager take an
> operation name that clearly indicates the source of the lock, which makes
> them easier to use.
> 2. Monitoring metrics become finer-grained, covering the number of locks,
> the lock time, and early warnings for locks.
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
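
The two modifications listed in the description (operation names on lock calls, plus lock counts, lock time and early warnings) can be pictured with a small standalone sketch. This is not the DataSetLockManager code from the pull request: the class `NamedLockTracker`, its methods, and the nanosecond warning threshold are hypothetical names chosen for illustration, and only `java.util.concurrent` is used.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: tags every lock acquisition with an operation name and keeps simple metrics. */
class NamedLockTracker {
  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
  private final Map<String, AtomicLong> acquireCount = new ConcurrentHashMap<>();
  private final AtomicLong warnings = new AtomicLong();
  private final long warnThresholdNanos;

  NamedLockTracker(long warnThresholdNanos) {
    this.warnThresholdNanos = warnThresholdNanos;
  }

  /** Handle given to the caller; closing it releases the lock and checks the hold time. */
  final class Held implements AutoCloseable {
    private final Lock lock;
    private final String opName;
    private final long startNanos = System.nanoTime();

    Held(Lock lock, String opName) {
      this.lock = lock;
      this.opName = opName;
    }

    @Override
    public void close() {
      long heldNanos = System.nanoTime() - startNanos;
      lock.unlock();
      if (heldNanos > warnThresholdNanos) {
        // Early warning: the named operation kept the lock longer than allowed.
        warnings.incrementAndGet();
        System.err.println("lock held too long by " + opName + ": " + heldNanos + " ns");
      }
    }
  }

  Held readLock(String opName) {
    return acquire(rwLock.readLock(), opName);
  }

  Held writeLock(String opName) {
    return acquire(rwLock.writeLock(), opName);
  }

  private Held acquire(Lock lock, String opName) {
    lock.lock();
    // Count acquisitions per operation name.
    acquireCount.computeIfAbsent(opName, k -> new AtomicLong()).incrementAndGet();
    return new Held(lock, opName);
  }

  long count(String opName) {
    AtomicLong c = acquireCount.get(opName);
    return c == null ? 0 : c.get();
  }

  long warningCount() {
    return warnings.get();
  }

  public static void main(String[] args) {
    NamedLockTracker tracker = new NamedLockTracker(Long.MAX_VALUE);
    try (Held h = tracker.readLock("getVolume")) {
      // critical section under the read lock
    }
    try (Held h = tracker.writeLock("exampleWriteOp")) {
      // critical section under the write lock
    }
    System.out.println(tracker.count("getVolume") + " " + tracker.count("exampleWriteOp"));
  }
}
```

Returning an `AutoCloseable` handle keeps acquire and release paired through try-with-resources, which is one way to make an unreleased lock harder to write by accident. The real patch may structure this differently.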
[jira] [Work logged] (HDFS-16631) Enable dfs.datanode.lockmanager.trace In Test
[ https://issues.apache.org/jira/browse/HDFS-16631?focusedWorklogId=782563=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782563 ]

ASF GitHub Bot logged work on HDFS-16631:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Jun/22 02:40
            Start Date: 18/Jun/22 02:40
    Worklog Time Spent: 10m
      Work Description: hadoop-yetus commented on PR #4438:
URL: https://github.com/apache/hadoop/pull/4438#issuecomment-1159345763

   :confetti_ball: **+1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: | reexec | 0m 51s | | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s | | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
   | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
   | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
   | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: | mvninstall | 47m 25s | | trunk passed |
   | +1 :green_heart: | shadedclient | 69m 45s | | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: | mvninstall | 1m 23s | | the patch passed |
   | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
   | +1 :green_heart: | shadedclient | 22m 31s | | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ |
   | +1 :green_heart: | unit | 1m 26s | | hadoop-hdfs in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 44s | | The patch does not generate ASF License warnings. |
   | | | 99m 33s | | |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4438 |
   | Optional Tests | dupname asflicense unit codespell detsecrets xmllint |
   | uname | Linux f79c1a23757e 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 5dac9ee7e1a8bd62849eba4ec2813f5f8921bb87 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/4/testReport/ |
   | Max. process+thread count | 524 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/4/console |
   | versions | git=2.25.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

   This message was automatically generated.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 782563)
    Time Spent: 2.5h  (was: 2h 20m)
[jira] [Work logged] (HDFS-16631) Enable dfs.datanode.lockmanager.trace In Test
[ https://issues.apache.org/jira/browse/HDFS-16631?focusedWorklogId=782562=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782562 ]

ASF GitHub Bot logged work on HDFS-16631:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Jun/22 02:00
            Start Date: 18/Jun/22 02:00
    Worklog Time Spent: 10m
      Work Description: slfan1989 commented on PR #4438:
URL: https://github.com/apache/hadoop/pull/4438#issuecomment-1159337745

   readLock
   ```
   getVolume(final ExtendedBlock b)
   getStoredBlock(String bpid, long blkid)
   Set deepCopyReplica(String bpid)
   getBlockInputStream(ExtendedBlock b, long seekOffset)
   moveBlockAcrossStorage(ExtendedBlock block, StorageType targetStorageType, String targetStorageId)
   moveBlockAcrossVolumes(ExtendedBlock block, FsVolumeSpi destination)
   ReplicaHandler createRbw(StorageType storageType, String storageId, ExtendedBlock b, boolean allowLazyPersist)
   Map getBlockReports(String bpid)
   public List getFinalizedBlocks(String bpid)
   public boolean contains(final ExtendedBlock block)
   public String getReplicaString(String bpid, long blockId)
   public long getReplicaVisibleLength(final ExtendedBlock block)
   public BlockLocalPathInfo getBlockLocalPathInfo(ExtendedBlock block)
   ```

Issue Time Tracking
-------------------

    Worklog Id:     (was: 782562)
    Time Spent: 2h 20m  (was: 2h 10m)
[jira] [Work logged] (HDFS-16631) Enable dfs.datanode.lockmanager.trace In Test
[ https://issues.apache.org/jira/browse/HDFS-16631?focusedWorklogId=782449=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782449 ]

ASF GitHub Bot logged work on HDFS-16631:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Jun/22 15:55
            Start Date: 17/Jun/22 15:55
    Worklog Time Spent: 10m
      Work Description: hadoop-yetus commented on PR #4438:
URL: https://github.com/apache/hadoop/pull/4438#issuecomment-1159014765

   :confetti_ball: **+1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: | reexec | 0m 53s | | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s | | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
   | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
   | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
   | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: | mvninstall | 39m 28s | | trunk passed |
   | +1 :green_heart: | shadedclient | 61m 49s | | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: | mvninstall | 1m 25s | | the patch passed |
   | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
   | +1 :green_heart: | shadedclient | 21m 58s | | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ |
   | +1 :green_heart: | unit | 1m 27s | | hadoop-hdfs in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 45s | | The patch does not generate ASF License warnings. |
   | | | 91m 14s | | |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4438 |
   | Optional Tests | dupname asflicense unit codespell detsecrets xmllint |
   | uname | Linux 2672a54c0019 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 30bf6edc6e886f5ec0c6bf24e62a0a5bce4e838a |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/3/testReport/ |
   | Max. process+thread count | 550 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/3/console |
   | versions | git=2.25.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

   This message was automatically generated.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 782449)
    Time Spent: 2h 10m  (was: 2h)
[jira] [Work logged] (HDFS-16631) Enable dfs.datanode.lockmanager.trace In Test
[ https://issues.apache.org/jira/browse/HDFS-16631?focusedWorklogId=782434=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782434 ]

ASF GitHub Bot logged work on HDFS-16631:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Jun/22 15:29
            Start Date: 17/Jun/22 15:29
    Worklog Time Spent: 10m
      Work Description: hadoop-yetus commented on PR #4438:
URL: https://github.com/apache/hadoop/pull/4438#issuecomment-1158989376

   :confetti_ball: **+1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: | reexec | 0m 54s | | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 1s | | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
   | +0 :ok: | xmllint | 0m 1s | | xmllint was not available. |
   | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
   | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: | mvninstall | 40m 27s | | trunk passed |
   | +1 :green_heart: | shadedclient | 62m 57s | | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: | mvninstall | 1m 26s | | the patch passed |
   | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
   | +1 :green_heart: | shadedclient | 21m 45s | | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ |
   | +1 :green_heart: | unit | 1m 26s | | hadoop-hdfs in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 44s | | The patch does not generate ASF License warnings. |
   | | | 92m 8s | | |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4438 |
   | Optional Tests | dupname asflicense unit codespell detsecrets xmllint |
   | uname | Linux e09f249c6203 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 438879f0d576fdb1e7f823b592daf3cfa0215d2a |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/2/testReport/ |
   | Max. process+thread count | 522 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/2/console |
   | versions | git=2.25.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

   This message was automatically generated.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 782434)
    Time Spent: 2h  (was: 1h 50m)
[jira] [Work logged] (HDFS-16631) Enable dfs.datanode.lockmanager.trace In Test
[ https://issues.apache.org/jira/browse/HDFS-16631?focusedWorklogId=782386=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782386 ]

ASF GitHub Bot logged work on HDFS-16631:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Jun/22 12:40
            Start Date: 17/Jun/22 12:40
    Worklog Time Spent: 10m
      Work Description: slfan1989 commented on PR #4438:
URL: https://github.com/apache/hadoop/pull/4438#issuecomment-1158832553

   > From my side, I do not think that only enabling lock trace for tests is a good idea, as @MingXiangLi mentioned above. An INFO-level log alone will not help with debugging or testing. IMO, if there are cases we would like to cover that need lock information collected, it is better to add some injection logic. FYI.

   Thank you very much for your suggestion. I will think about how to collect lock information!

Issue Time Tracking
-------------------

    Worklog Id:     (was: 782386)
    Time Spent: 1h 50m  (was: 1h 40m)
[jira] [Work logged] (HDFS-16631) Enable dfs.datanode.lockmanager.trace In Test
[ https://issues.apache.org/jira/browse/HDFS-16631?focusedWorklogId=782350=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782350 ]

ASF GitHub Bot logged work on HDFS-16631:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Jun/22 09:38
            Start Date: 17/Jun/22 09:38
    Worklog Time Spent: 10m
      Work Description: Hexiaoqiao commented on PR #4438:
URL: https://github.com/apache/hadoop/pull/4438#issuecomment-1158695959

   From my side, I do not think that only enabling lock trace for tests is a good idea, as @MingXiangLi mentioned above. An INFO-level log alone will not help with debugging or testing. IMO, if there are cases we would like to cover that need lock information collected, it is better to add some injection logic. FYI.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 782350)
    Time Spent: 1h 40m  (was: 1.5h)
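
The "inject logic" suggested in this comment could take the shape of a swappable hook that production code calls around lock operations and that tests replace with a recording implementation. The sketch below is only an assumption about what such a hook might look like. `LockEventInjector`, `RecordingInjector`, `InjectDemo` and the lock name `blockPoolLock` are hypothetical and do not come from the pull request or from Hadoop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical hook: production code keeps the no-op default, tests swap in a recorder. */
class LockEventInjector {
  private static volatile LockEventInjector instance = new LockEventInjector();

  static LockEventInjector get() {
    return instance;
  }

  static void set(LockEventInjector injector) {
    instance = injector;
  }

  /** Called after a lock is acquired; does nothing by default. */
  void onAcquire(String lockName, String opName) {
  }

  /** Called after a lock is released; does nothing by default. */
  void onRelease(String lockName, String opName) {
  }
}

/** Test-side injector that records every lock event it sees. */
class RecordingInjector extends LockEventInjector {
  final List<String> events = Collections.synchronizedList(new ArrayList<>());

  @Override
  void onAcquire(String lockName, String opName) {
    events.add("acquire:" + lockName + ":" + opName);
  }

  @Override
  void onRelease(String lockName, String opName) {
    events.add("release:" + lockName + ":" + opName);
  }
}

class InjectDemo {
  /** Stand-in for a dataset operation that takes and releases a lock. */
  static void doOperation(String opName) {
    LockEventInjector.get().onAcquire("blockPoolLock", opName);
    try {
      // the work protected by the lock would run here
    } finally {
      LockEventInjector.get().onRelease("blockPoolLock", opName);
    }
  }

  public static void main(String[] args) {
    RecordingInjector recorder = new RecordingInjector();
    LockEventInjector.set(recorder);
    doOperation("getStoredBlock");
    System.out.println(recorder.events);
  }
}
```

With a hook like this, a test can assert on the recorded events directly instead of reading INFO-level log output, which is the limitation the comment points out.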
[jira] [Work logged] (HDFS-16631) Enable dfs.datanode.lockmanager.trace In Test
[ https://issues.apache.org/jira/browse/HDFS-16631?focusedWorklogId=781450=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781450 ]

ASF GitHub Bot logged work on HDFS-16631:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Jun/22 02:43
            Start Date: 15/Jun/22 02:43
    Worklog Time Spent: 10m
      Work Description: slfan1989 commented on PR #4438:
URL: https://github.com/apache/hadoop/pull/4438#issuecomment-1155921035

   @MingXiangLi @ZanderXu Thanks for helping to review the code, can I make the following changes?

   ```
   public void lockLeakCheck() throws Exception {
     if (!openLockTrace) {
       LOG.warn("not open lock leak check func");
       return;
     }
     if (threadCountMap.isEmpty()) {
       LOG.warn("all lock has release");
       return;
     }
     setLastException(new Exception("lock Leak"));
     threadCountMap.forEach((name, trackLog) -> trackLog.showLockMessage());
     // throw exception ?
     throw new Exception("lock Leak");
   }
   ```

Issue Time Tracking
-------------------

    Worklog Id:     (was: 781450)
    Time Spent: 1.5h  (was: 1h 20m)
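
To make the effect of the proposed change concrete, here is a minimal standalone sketch of a leak check that fails loudly instead of only logging. It is not the DataSetLockManager implementation: `LeakCheckSketch`, `LockLeakException`, `recordAcquire` and `recordRelease` are hypothetical names, and an unchecked exception is used here where the snippet above declares `throws Exception`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified tracker; not the real DataSetLockManager. */
class LeakCheckSketch {
  /** Raised when tracing is enabled and some thread still holds a lock at check time. */
  static class LockLeakException extends RuntimeException {
    LockLeakException(String message) {
      super(message);
    }
  }

  private final boolean traceEnabled;
  // thread name -> number of locks currently held by that thread
  private final Map<String, Integer> heldByThread = new ConcurrentHashMap<>();

  LeakCheckSketch(boolean traceEnabled) {
    this.traceEnabled = traceEnabled;
  }

  void recordAcquire(String threadName) {
    if (traceEnabled) {
      heldByThread.merge(threadName, 1, Integer::sum);
    }
  }

  void recordRelease(String threadName) {
    if (traceEnabled) {
      // Returning null from the remapping function removes the entry.
      heldByThread.computeIfPresent(threadName, (k, v) -> v > 1 ? v - 1 : null);
    }
  }

  /** Returns normally when tracing is off or nothing is held; otherwise throws. */
  void lockLeakCheck() {
    if (!traceEnabled || heldByThread.isEmpty()) {
      return;
    }
    throw new LockLeakException("lock leak: " + heldByThread);
  }

  public static void main(String[] args) {
    LeakCheckSketch manager = new LeakCheckSketch(true);
    manager.recordAcquire("worker-1"); // acquired but never released
    try {
      manager.lockLeakCheck();
    } catch (LockLeakException e) {
      System.out.println("a test would fail here: " + e.getMessage());
    }
  }
}
```

Throwing turns a leak into a test failure rather than a log line that is easy to overlook. The trade-off is that every caller of the check must be prepared for it to throw.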
[jira] [Work logged] (HDFS-16631) Enable dfs.datanode.lockmanager.trace In Test
[ https://issues.apache.org/jira/browse/HDFS-16631?focusedWorklogId=781240=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781240 ]

ASF GitHub Bot logged work on HDFS-16631:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Jun/22 14:19
            Start Date: 14/Jun/22 14:19
    Worklog Time Spent: 10m
      Work Description: ZanderXu commented on PR #4438:
URL: https://github.com/apache/hadoop/pull/4438#issuecomment-1155258147

   @slfan1989 It's a good idea. But I personally feel that it would be nice to throw an exception directly in the test environment when there is a lock leak.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 781240)
    Time Spent: 1h 20m  (was: 1h 10m)
[jira] [Work logged] (HDFS-16631) Enable dfs.datanode.lockmanager.trace In Test
[ https://issues.apache.org/jira/browse/HDFS-16631?focusedWorklogId=781067=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781067 ]

ASF GitHub Bot logged work on HDFS-16631:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Jun/22 10:05
            Start Date: 14/Jun/22 10:05
    Worklog Time Spent: 10m
      Work Description: hadoop-yetus commented on PR #4438:
URL: https://github.com/apache/hadoop/pull/4438#issuecomment-1154982248

   :confetti_ball: **+1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: | reexec | 0m 55s | | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s | | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
   | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
   | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
   | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: | mvninstall | 39m 16s | | trunk passed |
   | +1 :green_heart: | shadedclient | 61m 55s | | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: | mvninstall | 1m 23s | | the patch passed |
   | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
   | +1 :green_heart: | shadedclient | 22m 19s | | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ |
   | +1 :green_heart: | unit | 1m 28s | | hadoop-hdfs in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 44s | | The patch does not generate ASF License warnings. |
   | | | 91m 20s | | |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4438 |
   | Optional Tests | dupname asflicense unit codespell detsecrets xmllint |
   | uname | Linux cd4c75a4e6c9 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / a9933a117854461fbc1900ed6a2344e7b6d947f1 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/1/testReport/ |
   | Max. process+thread count | 520 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/1/console |
   | versions | git=2.25.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

   This message was automatically generated.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 781067)
    Time Spent: 1h 10m  (was: 1h)
[jira] [Work logged] (HDFS-16631) Enable dfs.datanode.lockmanager.trace In Test
[ https://issues.apache.org/jira/browse/HDFS-16631?focusedWorklogId=781055=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781055 ]

ASF GitHub Bot logged work on HDFS-16631:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Jun/22 09:24
            Start Date: 14/Jun/22 09:24
    Worklog Time Spent: 10m
      Work Description: slfan1989 commented on PR #4438:
URL: https://github.com/apache/hadoop/pull/4438#issuecomment-1154938140

   > And should we throw an exception? Most users will ignore it if we just print the log.

   If a thread holds a read lock (or a write lock) and then needs to acquire a write lock (or a read lock), would an exception be thrown directly?

Issue Time Tracking
-------------------

    Worklog Id:     (was: 781055)
    Time Spent: 1h  (was: 50m)

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
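
For context on the question in the message above, here is a minimal sketch of how the JDK's own `java.util.concurrent.locks.ReentrantReadWriteLock` behaves when one thread wants both lock modes. It is only an illustration of the JDK class: the class name `LockUpgradeDemo` and its helper methods are hypothetical and are not part of Hadoop or `DataSetLockManager`. With this JDK lock, no exception is thrown in either direction: a read-to-write upgrade simply cannot succeed (a plain `lock()` call would block forever), while a write-to-read downgrade is allowed.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; illustrates JDK behavior only, not Hadoop code.
public class LockUpgradeDemo {

    /** Returns true if a thread holding the read lock can also take the write lock. */
    public static boolean canUpgrade(ReentrantReadWriteLock lock) {
        lock.readLock().lock();
        try {
            // tryLock() returns immediately; writeLock().lock() would block forever here.
            boolean acquired = lock.writeLock().tryLock();
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Returns true if a thread holding the write lock can also take the read lock. */
    public static boolean canDowngrade(ReentrantReadWriteLock lock) {
        lock.writeLock().lock();
        try {
            boolean acquired = lock.readLock().tryLock();
            if (acquired) {
                lock.readLock().unlock();
            }
            return acquired;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        System.out.println("read -> write: " + canUpgrade(lock));   // false
        System.out.println("write -> read: " + canDowngrade(lock)); // true
    }
}
```

Because the upgrade case blocks silently instead of failing, a trace of which thread acquired which lock is what makes this kind of self-deadlock visible.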
[jira] [Work logged] (HDFS-16631) Enable dfs.datanode.lockmanager.trace In Test
[ https://issues.apache.org/jira/browse/HDFS-16631?focusedWorklogId=781054=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781054 ]

ASF GitHub Bot logged work on HDFS-16631:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Jun/22 09:23
            Start Date: 14/Jun/22 09:23
    Worklog Time Spent: 10m
      Work Description: slfan1989 commented on PR #4438:
URL: https://github.com/apache/hadoop/pull/4438#issuecomment-1154936417

   > DataSetLockManager only prints the lock trace when DataNode.shutdown() or dataSetLockManager.lockLeakCheck() is invoked, so I doubt it will work in all UTs.

   Thanks for the suggestion. I think this parameter can cover scenarios like HDFS-16600: DN shutdown will definitely be called in a JUnit test, so if a DN JUnit test fails and you suspect a deadlock, you can check this printed message.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 781054)
    Time Spent: 50m  (was: 40m)
[jira] [Work logged] (HDFS-16631) Enable dfs.datanode.lockmanager.trace In Test
[ https://issues.apache.org/jira/browse/HDFS-16631?focusedWorklogId=781052=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781052 ]

ASF GitHub Bot logged work on HDFS-16631:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Jun/22 09:17
            Start Date: 14/Jun/22 09:17
    Worklog Time Spent: 10m
      Work Description: MingXiangLi commented on PR #4438:
URL: https://github.com/apache/hadoop/pull/4438#issuecomment-1154930047

   And should we throw an exception? Most users will ignore it if we just print the log.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 781052)
    Time Spent: 40m  (was: 0.5h)
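
The trade-off raised in the message above (log a warning versus throw) can be sketched as a small policy helper. This is a hypothetical illustration only: `LeakPolicy`, `report`, and the `failFast` flag are invented names, and nothing here reflects what the pull request actually implements.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of Hadoop.
public class LeakPolicy {

    /**
     * Reports leaked lock holders. With failFast set, a leak raises an
     * exception so a test run cannot pass silently; otherwise it is only logged.
     */
    public static void report(List<String> leakedHolders, boolean failFast) {
        if (leakedHolders.isEmpty()) {
            return; // nothing leaked, nothing to report
        }
        String message = "lock leak detected, still held by: " + leakedHolders;
        if (failFast) {
            throw new IllegalStateException(message);
        }
        System.err.println("WARN " + message);
    }

    public static void main(String[] args) {
        // Log-only mode: prints a warning and continues.
        report(Arrays.asList("DataXceiver-1"), false);
    }
}
```

A fail-fast mode is harder to ignore in unit tests, at the cost of turning a diagnostic into a hard failure.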
[jira] [Work logged] (HDFS-16631) Enable dfs.datanode.lockmanager.trace In Test
[ https://issues.apache.org/jira/browse/HDFS-16631?focusedWorklogId=781051=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781051 ]

ASF GitHub Bot logged work on HDFS-16631:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Jun/22 09:15
            Start Date: 14/Jun/22 09:15
    Worklog Time Spent: 10m
      Work Description: MingXiangLi commented on PR #4438:
URL: https://github.com/apache/hadoop/pull/4438#issuecomment-1154927638

   DataSetLockManager only prints the lock trace when DataNode.shutdown() or dataSetLockManager.lockLeakCheck() is invoked, so I doubt it will work in all UTs.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 781051)
    Time Spent: 0.5h  (was: 20m)
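
To make the point in the message above concrete, here is a minimal sketch of the general lock-leak tracing technique: record where each lock was acquired, forget the record on release, and print whatever remains only when a check method is called. Everything in it is an assumption for illustration. `LeakTrackingLock` is a hypothetical class, it is not reentrant-aware, and it does not reproduce the real `DataSetLockManager` code; only the method name `lockLeakCheck` is borrowed from the thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of lock-leak tracing; not the Hadoop implementation.
public class LeakTrackingLock {
    private final ReentrantLock lock = new ReentrantLock();
    private final boolean traceEnabled;
    // Thread name -> stack trace captured when that thread acquired the lock.
    private final Map<String, Throwable> acquiredAt = new ConcurrentHashMap<>();

    public LeakTrackingLock(boolean traceEnabled) {
        this.traceEnabled = traceEnabled;
    }

    public void lock() {
        lock.lock();
        if (traceEnabled) {
            // Capturing a stack trace on every acquire is the performance cost of tracing.
            acquiredAt.put(Thread.currentThread().getName(), new Throwable("acquired here"));
        }
    }

    public void unlock() {
        if (traceEnabled) {
            acquiredAt.remove(Thread.currentThread().getName());
        }
        lock.unlock();
    }

    /** Prints every holder that never released the lock and returns how many there are. */
    public int lockLeakCheck() {
        if (!traceEnabled) {
            System.out.println("lock leak check is not enabled");
            return 0;
        }
        for (Map.Entry<String, Throwable> entry : acquiredAt.entrySet()) {
            System.out.println("thread " + entry.getKey() + " has not released the lock:");
            entry.getValue().printStackTrace(System.out);
        }
        return acquiredAt.size();
    }

    public static void main(String[] args) {
        LeakTrackingLock tracked = new LeakTrackingLock(true);
        tracked.lock();                 // acquired and never released before the check
        int leaks = tracked.lockLeakCheck();
        System.out.println("leaks found: " + leaks);
        tracked.unlock();
    }
}
```

In a design like this nothing is reported until the check runs, so a test that never reaches shutdown (or never calls the check) produces no trace at all.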
[jira] [Work logged] (HDFS-16631) Enable dfs.datanode.lockmanager.trace In Test
[ https://issues.apache.org/jira/browse/HDFS-16631?focusedWorklogId=781034=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781034 ]

ASF GitHub Bot logged work on HDFS-16631:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Jun/22 08:34
            Start Date: 14/Jun/22 08:34
    Worklog Time Spent: 10m
      Work Description: slfan1989 commented on PR #4438:
URL: https://github.com/apache/hadoop/pull/4438#issuecomment-1154882633

   @Hexiaoqiao @MingXiangLi @ZanderXu please help me review the code.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 781034)
    Time Spent: 20m  (was: 10m)
[jira] [Work logged] (HDFS-16631) Enable dfs.datanode.lockmanager.trace In Test
[ https://issues.apache.org/jira/browse/HDFS-16631?focusedWorklogId=781033=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781033 ]

ASF GitHub Bot logged work on HDFS-16631:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Jun/22 08:32
            Start Date: 14/Jun/22 08:32
    Worklog Time Spent: 10m
      Work Description: slfan1989 opened a new pull request, #4438:
URL: https://github.com/apache/hadoop/pull/4438

   JIRA: HDFS-16631. Enable dfs.datanode.lockmanager.trace In Test.

   In Jira [HDFS-16600](https://issues.apache.org/jira/browse/HDFS-16600) (Fix deadlock on DataNode side), we discussed the deadlock issue, and it was a very meaningful discussion. While reading the log, I found the following:

   ```
   2022-05-27 07:39:47,890 [Listener at localhost/36941] WARN datanode.DataSetLockManager (DataSetLockManager.java:lockLeakCheck(261)) - not open lock leak check func.
   ```

   Looking at the code, I found that there is such a parameter:

   ```
   <property>
     <name>dfs.datanode.lockmanager.trace</name>
     <value>false</value>
     <description>
       If this is true, after shut down datanode lock Manager will print all leak
       thread that not release by lock Manager. Only used for test or trace dead lock
       problem. In produce default set false, because it's have little performance loss.
     </description>
   </property>
   ```

   I think this parameter should be enabled in the test environment, so that if there is a DN deadlock, the cause can be located quickly.

   If my understanding is correct, when this parameter is true and a thread needs both read locks and write locks, the relevant thread information can be printed.

Issue Time Tracking
-------------------

            Worklog Id:     (was: 781033)
    Remaining Estimate: 0h
            Time Spent: 10m