[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
[ https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HDFS-13051: -- Target Version/s: 2.7.8 (was: 2.7.7) > dead lock occurs when rolleditlog rpc call happen and editPendingQ is full > -- > > Key: HDFS-13051 > URL: https://issues.apache.org/jira/browse/HDFS-13051 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.5 >Reporter: zhangwei >Assignee: Daryn Sharp >Priority: Major > Labels: AsyncEditlog, deadlock > Attachments: HDFS-13112.patch, deadlock.patch > > > when doing rolleditlog it acquires fs write lock,then acquire FSEditLogAsync > lock object,and write 3 EDIT(the second one override logEdit method and > return true) > in extremely case,when FSEditLogAsync's logSync is very > slow,editPendingQ(default size 4096)is full,it case IPC thread can not offer > edit object into editPendingQ when doing rolleditlog,it block on editPendingQ > .put method,however it does't release FSEditLogAsync object lock, and > edit.logEdit method in FSEditLogAsync.run thread can never acquire > FSEditLogAsync object lock, it case dead lock > stack trace like below > "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 > tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395) > - locked <0x7fbd3cbae500> (a java.lang.Object) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656) > at java.lang.Thread.run(Thread.java:745) > "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 > nid=0x13fbf waiting for monitor entry [0x7fb32c51a000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443) > - waiting to lock <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177) > at java.lang.Thread.run(Thread.java:745) > "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 > tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1251) > - locked <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.s
[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
[ https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-13051: - Target Version/s: 2.10.0, 2.8.4, 2.7.6, 3.0.2, 2.9.2 (was: 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2) > dead lock occurs when rolleditlog rpc call happen and editPendingQ is full > -- > > Key: HDFS-13051 > URL: https://issues.apache.org/jira/browse/HDFS-13051 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.5 >Reporter: zhangwei >Assignee: Daryn Sharp >Priority: Major > Labels: AsyncEditlog, deadlock > Attachments: HDFS-13112.patch, deadlock.patch > > > when doing rolleditlog it acquires fs write lock,then acquire FSEditLogAsync > lock object,and write 3 EDIT(the second one override logEdit method and > return true) > in extremely case,when FSEditLogAsync's logSync is very > slow,editPendingQ(default size 4096)is full,it case IPC thread can not offer > edit object into editPendingQ when doing rolleditlog,it block on editPendingQ > .put method,however it does't release FSEditLogAsync object lock, and > edit.logEdit method in FSEditLogAsync.run thread can never acquire > FSEditLogAsync object lock, it case dead lock > stack trace like below > "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 > tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395) > - locked <0x7fbd3cbae500> (a java.lang.Object) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656) > at java.lang.Thread.run(Thread.java:745) > "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 > nid=0x13fbf waiting for monitor entry [0x7fb32c51a000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443) > - waiting to lock <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177) > at java.lang.Thread.run(Thread.java:745) > "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 > tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1251) > - locked <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.F
[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
[ https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated HDFS-13051: -- Target Version/s: 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2 (was: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2) > dead lock occurs when rolleditlog rpc call happen and editPendingQ is full > -- > > Key: HDFS-13051 > URL: https://issues.apache.org/jira/browse/HDFS-13051 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.5 >Reporter: zhangwei >Assignee: Daryn Sharp >Priority: Major > Labels: AsyncEditlog, deadlock > Attachments: HDFS-13112.patch, deadlock.patch > > > when doing rolleditlog it acquires fs write lock,then acquire FSEditLogAsync > lock object,and write 3 EDIT(the second one override logEdit method and > return true) > in extremely case,when FSEditLogAsync's logSync is very > slow,editPendingQ(default size 4096)is full,it case IPC thread can not offer > edit object into editPendingQ when doing rolleditlog,it block on editPendingQ > .put method,however it does't release FSEditLogAsync object lock, and > edit.logEdit method in FSEditLogAsync.run thread can never acquire > FSEditLogAsync object lock, it case dead lock > stack trace like below > "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 > tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395) > - locked <0x7fbd3cbae500> (a java.lang.Object) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656) > at java.lang.Thread.run(Thread.java:745) > "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 > nid=0x13fbf waiting for monitor entry [0x7fb32c51a000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443) > - waiting to lock <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177) > at java.lang.Thread.run(Thread.java:745) > "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 > tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1251) > - locked <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.n
[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
[ https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-13051: --- Status: Patch Available (was: Open) Finally got back to this. Basic issue is a synchronized caller (ie. log rolling) must use wait() to relinquish the monitor so the async thread has a chance to drain the full queue. Non-synchronized caller can just put. I had to add a customized semaphore to throttle access to a full queue. While generally useful, the unit tests would not finish due to starvation. Edit were produced so fast that between the time to unpark and re-offer, it almost always lost the race. In practice, it's impossible to generate edits that fast. The other option was add a random sleep between edits in the tests but that provided no where near the stress level to help ensure deadlock cannot occur. > dead lock occurs when rolleditlog rpc call happen and editPendingQ is full > -- > > Key: HDFS-13051 > URL: https://issues.apache.org/jira/browse/HDFS-13051 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.5 >Reporter: zhangwei >Assignee: Daryn Sharp >Priority: Major > Labels: AsyncEditlog, deadlock > Attachments: HDFS-13112.patch, deadlock.patch > > > when doing rolleditlog it acquires fs write lock,then acquire FSEditLogAsync > lock object,and write 3 EDIT(the second one override logEdit method and > return true) > in extremely case,when FSEditLogAsync's logSync is very > slow,editPendingQ(default size 4096)is full,it case IPC thread can not offer > edit object into editPendingQ when doing rolleditlog,it block on editPendingQ > .put method,however it does't release FSEditLogAsync object lock, and > edit.logEdit method in FSEditLogAsync.run thread can never acquire > FSEditLogAsync object lock, it case dead lock > stack trace like below > "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 > tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395) > - locked <0x7fbd3cbae500> (a java.lang.Object) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656) > at java.lang.Thread.run(Thread.java:745) > "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 > nid=0x13fbf waiting for monitor entry [0x7fb32c51a000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443) > - waiting to lock <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177) > at java.lang.Thread.run(Thread.java:745) > "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 > tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at ja
[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
[ https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-13051: --- Attachment: HDFS-13112.patch > dead lock occurs when rolleditlog rpc call happen and editPendingQ is full > -- > > Key: HDFS-13051 > URL: https://issues.apache.org/jira/browse/HDFS-13051 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.5 >Reporter: zhangwei >Assignee: Daryn Sharp >Priority: Major > Labels: AsyncEditlog, deadlock > Attachments: HDFS-13112.patch, deadlock.patch > > > when doing rolleditlog it acquires fs write lock,then acquire FSEditLogAsync > lock object,and write 3 EDIT(the second one override logEdit method and > return true) > in extremely case,when FSEditLogAsync's logSync is very > slow,editPendingQ(default size 4096)is full,it case IPC thread can not offer > edit object into editPendingQ when doing rolleditlog,it block on editPendingQ > .put method,however it does't release FSEditLogAsync object lock, and > edit.logEdit method in FSEditLogAsync.run thread can never acquire > FSEditLogAsync object lock, it case dead lock > stack trace like below > "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 > tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395) > - locked <0x7fbd3cbae500> (a java.lang.Object) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656) > at java.lang.Thread.run(Thread.java:745) > "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 > nid=0x13fbf waiting for monitor entry [0x7fb32c51a000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443) > - waiting to lock <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177) > at java.lang.Thread.run(Thread.java:745) > "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 > tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1251) > - locked <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.namenode.F
[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
[ https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-13051: - Target Version/s: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2 (was: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4, 2.7.6) > dead lock occurs when rolleditlog rpc call happen and editPendingQ is full > -- > > Key: HDFS-13051 > URL: https://issues.apache.org/jira/browse/HDFS-13051 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.5 >Reporter: zhangwei >Assignee: Daryn Sharp >Priority: Major > Labels: AsyncEditlog, deadlock > Attachments: deadlock.patch > > > when doing rolleditlog it acquires fs write lock,then acquire FSEditLogAsync > lock object,and write 3 EDIT(the second one override logEdit method and > return true) > in extremely case,when FSEditLogAsync's logSync is very > slow,editPendingQ(default size 4096)is full,it case IPC thread can not offer > edit object into editPendingQ when doing rolleditlog,it block on editPendingQ > .put method,however it does't release FSEditLogAsync object lock, and > edit.logEdit method in FSEditLogAsync.run thread can never acquire > FSEditLogAsync object lock, it case dead lock > stack trace like below > "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 > tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395) > - locked <0x7fbd3cbae500> (a java.lang.Object) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656) > at java.lang.Thread.run(Thread.java:745) > "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 > nid=0x13fbf waiting for monitor entry [0x7fb32c51a000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443) > - waiting to lock <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177) > at java.lang.Thread.run(Thread.java:745) > "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 > tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1251) > - locked <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.nameno
[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
[ https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangwei updated HDFS-13051: Attachment: deadlock.patch > dead lock occurs when rolleditlog rpc call happen and editPendingQ is full > -- > > Key: HDFS-13051 > URL: https://issues.apache.org/jira/browse/HDFS-13051 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.5 >Reporter: zhangwei >Assignee: Daryn Sharp >Priority: Major > Labels: AsyncEditlog, deadlock > Attachments: deadlock.patch > > > when doing rolleditlog it acquires fs write lock,then acquire FSEditLogAsync > lock object,and write 3 EDIT(the second one override logEdit method and > return true) > in extremely case,when FSEditLogAsync's logSync is very > slow,editPendingQ(default size 4096)is full,it case IPC thread can not offer > edit object into editPendingQ when doing rolleditlog,it block on editPendingQ > .put method,however it does't release FSEditLogAsync object lock, and > edit.logEdit method in FSEditLogAsync.run thread can never acquire > FSEditLogAsync object lock, it case dead lock > stack trace like below > "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 > tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395) > - locked <0x7fbd3cbae500> (a java.lang.Object) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656) > at java.lang.Thread.run(Thread.java:745) > "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 > nid=0x13fbf waiting for monitor entry [0x7fb32c51a000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443) > - waiting to lock <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177) > at java.lang.Thread.run(Thread.java:745) > "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 > tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1251) > - locked <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEdi
[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
[ https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangwei updated HDFS-13051: Attachment: (was: deadlock.patch) > dead lock occurs when rolleditlog rpc call happen and editPendingQ is full > -- > > Key: HDFS-13051 > URL: https://issues.apache.org/jira/browse/HDFS-13051 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.5 >Reporter: zhangwei >Assignee: Daryn Sharp >Priority: Major > Labels: AsyncEditlog, deadlock > > when doing rolleditlog it acquires fs write lock,then acquire FSEditLogAsync > lock object,and write 3 EDIT(the second one override logEdit method and > return true) > in extremely case,when FSEditLogAsync's logSync is very > slow,editPendingQ(default size 4096)is full,it case IPC thread can not offer > edit object into editPendingQ when doing rolleditlog,it block on editPendingQ > .put method,however it does't release FSEditLogAsync object lock, and > edit.logEdit method in FSEditLogAsync.run thread can never acquire > FSEditLogAsync object lock, it case dead lock > stack trace like below > "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 > tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395) > - locked <0x7fbd3cbae500> (a java.lang.Object) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656) > at java.lang.Thread.run(Thread.java:745) > "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 > nid=0x13fbf waiting for monitor entry [0x7fb32c51a000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443) > - waiting to lock <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177) > at java.lang.Thread.run(Thread.java:745) > "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 > tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1251) > - locked <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1190) > - locked <
[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
[ https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangwei updated HDFS-13051: Attachment: deadlock.patch > dead lock occurs when rolleditlog rpc call happen and editPendingQ is full > -- > > Key: HDFS-13051 > URL: https://issues.apache.org/jira/browse/HDFS-13051 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.5 >Reporter: zhangwei >Assignee: Daryn Sharp >Priority: Major > Labels: AsyncEditlog, deadlock > Attachments: deadlock.patch > > > when doing rolleditlog it acquires fs write lock,then acquire FSEditLogAsync > lock object,and write 3 EDIT(the second one override logEdit method and > return true) > in extremely case,when FSEditLogAsync's logSync is very > slow,editPendingQ(default size 4096)is full,it case IPC thread can not offer > edit object into editPendingQ when doing rolleditlog,it block on editPendingQ > .put method,however it does't release FSEditLogAsync object lock, and > edit.logEdit method in FSEditLogAsync.run thread can never acquire > FSEditLogAsync object lock, it case dead lock > stack trace like below > "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 > tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395) > - locked <0x7fbd3cbae500> (a java.lang.Object) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656) > at java.lang.Thread.run(Thread.java:745) > "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 > nid=0x13fbf waiting for monitor entry [0x7fb32c51a000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443) > - waiting to lock <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177) > at java.lang.Thread.run(Thread.java:745) > "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 > tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1251) > - locked <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEdi
[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
[ https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-13051: -- Target Version/s: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4, 2.7.6 (was: 2.8.1, 2.8.2, 2.7.5, 3.0.0) > dead lock occurs when rolleditlog rpc call happen and editPendingQ is full > -- > > Key: HDFS-13051 > URL: https://issues.apache.org/jira/browse/HDFS-13051 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.5 >Reporter: zhangwei >Assignee: Daryn Sharp >Priority: Major > Labels: AsyncEditlog, deadlock > > when doing rolleditlog it acquires fs write lock,then acquire FSEditLogAsync > lock object,and write 3 EDIT(the second one override logEdit method and > return true) > in extremely case,when FSEditLogAsync's logSync is very > slow,editPendingQ(default size 4096)is full,it case IPC thread can not offer > edit object into editPendingQ when doing rolleditlog,it block on editPendingQ > .put method,however it does't release FSEditLogAsync object lock, and > edit.logEdit method in FSEditLogAsync.run thread can never acquire > FSEditLogAsync object lock, it case dead lock > stack trace like below > "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 > tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395) > - locked <0x7fbd3cbae500> (a java.lang.Object) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656) > at java.lang.Thread.run(Thread.java:745) > "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 > nid=0x13fbf waiting for monitor entry [0x7fb32c51a000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443) > - waiting to lock <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177) > at java.lang.Thread.run(Thread.java:745) > "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 > tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1251) > - locked <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.nam
[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
[ https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangwei updated HDFS-13051: Labels: AsyncEditlog deadlock (was: ) > dead lock occurs when rolleditlog rpc call happen and editPendingQ is full > -- > > Key: HDFS-13051 > URL: https://issues.apache.org/jira/browse/HDFS-13051 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.5 >Reporter: zhangwei >Priority: Major > Labels: AsyncEditlog, deadlock > > when doing rolleditlog it acquires fs write lock,then acquire FSEditLogAsync > lock object,and write 3 EDIT(the second one override logEdit method and > return true) > in extremely case,when FSEditLogAsync's logSync is very > slow,editPendingQ(default size 4096)is full,it case IPC thread can not offer > edit object into editPendingQ when doing rolleditlog,it block on editPendingQ > .put method,however it does't release FSEditLogAsync object lock, and > edit.logEdit method in FSEditLogAsync.run thread can never acquire > FSEditLogAsync object lock, it case dead lock > stack trace like below > "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 > tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395) > - locked <0x7fbd3cbae500> (a java.lang.Object) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656) > at java.lang.Thread.run(Thread.java:745) > "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 > nid=0x13fbf waiting for monitor entry [0x7fb32c51a000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443) > - waiting to lock <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177) > at java.lang.Thread.run(Thread.java:745) > "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 > tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1251) > - locked <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1190) > - locked <0x7fbcbc131000> (a > org.apac
[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
[ https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangwei updated HDFS-13051: Description: when doing rolleditlog it acquires fs write lock,then acquire FSEditLogAsync lock object,and write 3 EDIT(the second one override logEdit method and return true) in extremely case,when FSEditLogAsync's logSync is very slow,editPendingQ(default size 4096)is full,it case IPC thread can not offer edit object into editPendingQ when doing rolleditlog,it block on editPendingQ .put method,however it does't release FSEditLogAsync object lock, and edit.logEdit method in FSEditLogAsync.run thread can never acquire FSEditLogAsync object lock, it case dead lock stack trace like below "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x7fbd3cb96f58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635) at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395) - locked <0x7fbd3cbae500> (a java.lang.Object) at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62) at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604) at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54) at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656) at java.lang.Thread.run(Thread.java:745) "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 nid=0x13fbf waiting for monitor entry [0x7fb32c51a000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443) - waiting to lock <*0x7fbcbc131000*> (a org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177) at java.lang.Thread.run(Thread.java:745) "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x7fbd3cb96f58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1251) - locked <*0x7fbcbc131000*> (a org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1190) - locked <0x7fbcbc131000> (a org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1235) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6463) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1062) at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142) at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
[ https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangwei updated HDFS-13051: Target Version/s: 3.0.0, 2.7.5, 2.8.2, 2.8.1 (was: 2.8.1, 2.8.2, 2.7.5, 3.0.0) Description: when doing rolleditlog it acquire fs write lock,then acquire FSEditLogAsync lock object,and write 3 EDIT(the second one override logEdit method and return true) in extremely case,when FSEditLogAsync's logSync is very slow,editPendingQ(default size 4096)is full,it case IPC thread can not offer edit object into editPendingQ when doing rolleditlog,it block on editPendingQ .put method,however it does't release FSEditLogAsync object lock, and edit.logEdit method in FSEditLogAsync.run thread can never acquire FSEditLogAsync object lock, it case dead lock stack trace like below "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x7fbd3cb96f58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635) at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395) - locked <0x7fbd3cbae500> (a java.lang.Object) at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62) at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604) at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54) at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656) at java.lang.Thread.run(Thread.java:745) "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 nid=0x13fbf waiting for monitor entry [0x7fb32c51a000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443) - waiting to lock <0x7fbcbc131000> (a org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177) at java.lang.Thread.run(Thread.java:745) "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x7fbd3cb96f58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1251) - locked <0x7fbcbc131000> (a org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1190) - locked <0x7fbcbc131000> (a org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1235) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6463) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1062) at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142) at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(Na
[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
[ https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangwei updated HDFS-13051: Description: when doing rolleditlog it acquires fs write lock,then acquire FSEditLogAsync lock object,and write 3 EDIT(the second one override logEdit method and return true) in extremely case,when FSEditLogAsync's logSync is very slow,editPendingQ(default size 4096)is full,it case IPC thread can not offer edit object into editPendingQ when doing rolleditlog,it block on editPendingQ .put method,however it does't release FSEditLogAsync object lock, and edit.logEdit method in FSEditLogAsync.run thread can never acquire FSEditLogAsync object lock, it case dead lock stack trace like below "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x7fbd3cb96f58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635) at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395) - locked <0x7fbd3cbae500> (a java.lang.Object) at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62) at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604) at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54) at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656) at java.lang.Thread.run(Thread.java:745) "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 nid=0x13fbf waiting for monitor entry [0x7fb32c51a000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443) - waiting to lock <0x7fbcbc131000> (a org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177) at java.lang.Thread.run(Thread.java:745) "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x7fbd3cb96f58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1251) - locked <0x7fbcbc131000> (a org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1190) - locked <0x7fbcbc131000> (a org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1235) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6463) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1062) at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142) at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$Prot
[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
[ https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangwei updated HDFS-13051: Target Version/s: 3.0.0, 2.7.5, 2.8.2, 2.8.1 (was: 2.8.1, 2.8.2, 2.7.5, 3.0.0) Description: When we apply AsyncEditlog in our prod env, in case, deadlock will occur and crash the namenode. in the case, rollEditlog rpc call happen while editPendingQ in AsyncEditlog is full (was: When we apply AsyncEditlog in our prod env, in extremely case, deadlock will occur and crash the namenode. in the case, rollEditlog rpc call happen while AsyncEditlog asynchronized takes edit object from queue and is going to acquire AsyncEditlog object lock to executed it's logEdit method.) Summary: dead lock occurs when rolleditlog rpc call happen and editPendingQ is full (was: dead lock occurs when using async editlog while rolleditlog rpc call happen) > dead lock occurs when rolleditlog rpc call happen and editPendingQ is full > -- > > Key: HDFS-13051 > URL: https://issues.apache.org/jira/browse/HDFS-13051 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.5 >Reporter: zhangwei >Priority: Major > > When we apply AsyncEditlog in our prod env, in case, deadlock will occur and > crash the namenode. in the case, rollEditlog rpc call happen while > editPendingQ in AsyncEditlog is full -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org