[jira] [Updated] (HDFS-10472) NameNode RPC Reader threads crash, and the cluster hangs.
[ https://issues.apache.org/jira/browse/HDFS-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenFolin updated HDFS-10472:
-----------------------------
    Attachment: HDFS-10472.patch

add catch throwable

> NameNode RPC Reader threads crash, and the cluster hangs.
> ----------------------------------------------------------
>
>                 Key: HDFS-10472
>                 URL: https://issues.apache.org/jira/browse/HDFS-10472
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>    Affects Versions: 2.5.0, 2.6.0, 2.8.0, 2.7.2, 2.6.2, 2.6.4
>            Reporter: ChenFolin
>              Labels: patch
>         Attachments: HDFS-10472.patch
>
> My cluster hung yesterday because the RPC server Reader threads crashed, so all RPC requests timed out, including DataNode heartbeats, etc.
> We can see that the method doRunLoop only catches InterruptedException and IOException:
>
>     while (running) {
>       SelectionKey key = null;
>       try {
>         // consume as many connections as currently queued to avoid
>         // unbridled acceptance of connections that starves the select
>         int size = pendingConnections.size();
>         for (int i = size; i > 0; i--) {
>           Connection conn = pendingConnections.take();
>           conn.channel.register(readSelector, SelectionKey.OP_READ, conn);
>         }
>         readSelector.select();
>         Iterator<SelectionKey> iter = readSelector.selectedKeys().iterator();
>         while (iter.hasNext()) {
>           key = iter.next();
>           iter.remove();
>           if (key.isValid()) {
>             if (key.isReadable()) {
>               doRead(key);
>             }
>           }
>           key = null;
>         }
>       } catch (InterruptedException e) {
>         if (running) { // unexpected -- log it
>           LOG.info(Thread.currentThread().getName() + " unexpectedly interrupted", e);
>         }
>       } catch (IOException ex) {
>         LOG.error("Error in Reader", ex);
>       }
>     }
[jira] [Updated] (HDFS-10472) NameNode RPC Reader threads crash, and the cluster hangs.
[ https://issues.apache.org/jira/browse/HDFS-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenFolin updated HDFS-10472:
-----------------------------
          Labels: patch  (was: )
    Release Note: catch throwable
          Status: Patch Available  (was: Open)

add catch throwable

> NameNode RPC Reader threads crash, and the cluster hangs.
> ----------------------------------------------------------
>
>                 Key: HDFS-10472
>                 URL: https://issues.apache.org/jira/browse/HDFS-10472
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>    Affects Versions: 2.6.4, 2.6.2, 2.7.2, 2.6.0, 2.5.0, 2.8.0
>            Reporter: ChenFolin
>              Labels: patch
>
> My cluster hung yesterday because the RPC server Reader threads crashed, so all RPC requests timed out, including DataNode heartbeats, etc.
> We can see that the method doRunLoop only catches InterruptedException and IOException:
>
>     while (running) {
>       SelectionKey key = null;
>       try {
>         // consume as many connections as currently queued to avoid
>         // unbridled acceptance of connections that starves the select
>         int size = pendingConnections.size();
>         for (int i = size; i > 0; i--) {
>           Connection conn = pendingConnections.take();
>           conn.channel.register(readSelector, SelectionKey.OP_READ, conn);
>         }
>         readSelector.select();
>         Iterator<SelectionKey> iter = readSelector.selectedKeys().iterator();
>         while (iter.hasNext()) {
>           key = iter.next();
>           iter.remove();
>           if (key.isValid()) {
>             if (key.isReadable()) {
>               doRead(key);
>             }
>           }
>           key = null;
>         }
>       } catch (InterruptedException e) {
>         if (running) { // unexpected -- log it
>           LOG.info(Thread.currentThread().getName() + " unexpectedly interrupted", e);
>         }
>       } catch (IOException ex) {
>         LOG.error("Error in Reader", ex);
>       }
>     }
[jira] [Created] (HDFS-10472) NameNode RPC Reader threads crash, and the cluster hangs.
ChenFolin created HDFS-10472:
--------------------------------
             Summary: NameNode RPC Reader threads crash, and the cluster hangs.
                 Key: HDFS-10472
                 URL: https://issues.apache.org/jira/browse/HDFS-10472
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs, namenode
    Affects Versions: 2.6.4, 2.6.2, 2.7.2, 2.6.0, 2.5.0, 2.8.0
            Reporter: ChenFolin

My cluster hung yesterday because the RPC server Reader threads crashed, so all RPC requests timed out, including DataNode heartbeats, etc.

We can see that the method doRunLoop only catches InterruptedException and IOException:

    while (running) {
      SelectionKey key = null;
      try {
        // consume as many connections as currently queued to avoid
        // unbridled acceptance of connections that starves the select
        int size = pendingConnections.size();
        for (int i = size; i > 0; i--) {
          Connection conn = pendingConnections.take();
          conn.channel.register(readSelector, SelectionKey.OP_READ, conn);
        }
        readSelector.select();
        Iterator<SelectionKey> iter = readSelector.selectedKeys().iterator();
        while (iter.hasNext()) {
          key = iter.next();
          iter.remove();
          if (key.isValid()) {
            if (key.isReadable()) {
              doRead(key);
            }
          }
          key = null;
        }
      } catch (InterruptedException e) {
        if (running) { // unexpected -- log it
          LOG.info(Thread.currentThread().getName() + " unexpectedly interrupted", e);
        }
      } catch (IOException ex) {
        LOG.error("Error in Reader", ex);
      }
    }
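[Editor's note] A minimal, self-contained sketch of the fix idea described above ("add catch throwable"): a long-lived reader-style loop that also catches Throwable, so one unexpected error cannot silently kill the thread. The class and queue names below are illustrative stand-ins, not the actual Hadoop ipc.Server code.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ResilientReaderSketch {
  private static final BlockingQueue<Runnable> pending = new LinkedBlockingQueue<>();
  private static volatile boolean running = true;

  public static void main(String[] args) throws Exception {
    Thread reader = new Thread(() -> {
      while (running) {
        try {
          Runnable task = pending.take(); // analogous to pendingConnections.take()
          task.run();                     // analogous to doRead(key)
        } catch (InterruptedException e) {
          if (running) {                  // unexpected -- log it
            System.err.println("unexpectedly interrupted: " + e);
          }
        } catch (Throwable t) {           // the added safety net from the patch idea
          System.err.println("unexpected throwable in Reader, continuing: " + t);
        }
      }
    }, "Reader");
    reader.start();

    pending.put(() -> { throw new IllegalStateException("boom"); }); // would kill an unprotected loop
    pending.put(() -> System.out.println("still alive"));            // still gets processed

    Thread.sleep(200);
    running = false;
    reader.interrupt();
  }
}
{code}
With the extra catch clause, the poisoned first task is logged and the second task still runs; with only the original two catch clauses, the thread would die and the queue would back up, which matches the hang described in the report.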
[jira] [Resolved] (HDFS-10214) Checkpoint cannot be done by the Standby NameNode, because a checkpoint may cause DataNode blockReport/blockReceivedAndDeleted/heartbeat RPC timeouts when the object num > 10000
[ https://issues.apache.org/jira/browse/HDFS-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenFolin resolved HDFS-10214.
------------------------------
       Resolution: Duplicate
    Fix Version/s: 2.7.2

> Checkpoint cannot be done by the Standby NameNode, because a checkpoint may cause
> DataNode blockReport/blockReceivedAndDeleted/heartbeat RPC timeouts when the
> object num > 10000.
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-10214
>                 URL: https://issues.apache.org/jira/browse/HDFS-10214
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: ha, namenode
>    Affects Versions: 2.5.0, 2.6.4
>         Environment: 500 DataNodes.
> 137407265 files and directories, 129614074 blocks = 267021339 total filesystem object(s)
>            Reporter: ChenFolin
>             Fix For: 2.7.2
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current cluster status:
> 137407265 files and directories, 129614074 blocks = 267021339 total filesystem object(s).
> The checkpoint saveNamespace costs more than 5 minutes, so DataNode RPCs time out.
> The Standby NameNode skips the DataNode RPC requests (because of the RPC timeout, the DataNode closes the socket channel).
> There are many corrupt files after a failover.
> So, the checkpoint may need to be done by another component, not the Standby NameNode.
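[Editor's note] The report does not spell out the blocking mechanism, but a plausible reading (an assumption, not stated in the thread) is that saveNamespace holds the namesystem write lock for the whole multi-minute save, so heartbeat and block-report handlers block past the DataNode RPC timeout. A tiny illustrative sketch of that interaction, not Hadoop code:
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckpointLockSketch {
  private static final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();

  public static void main(String[] args) throws Exception {
    Thread checkpointer = new Thread(() -> {
      fsLock.writeLock().lock();   // stand-in for saveNamespace taking the write lock
      try {
        Thread.sleep(3000);        // stand-in for serializing 267M objects (minutes in the report)
      } catch (InterruptedException ignored) {
      } finally {
        fsLock.writeLock().unlock();
      }
    }, "Standby-State-Checkpointer");

    Thread heartbeatHandler = new Thread(() -> {
      fsLock.readLock().lock();    // a heartbeat/blockReport handler needs the same lock
      try {
        System.out.println("heartbeat processed");
      } finally {
        fsLock.readLock().unlock();
      }
    }, "IPC-Handler");

    checkpointer.start();
    Thread.sleep(100);             // let the checkpoint grab the lock first
    heartbeatHandler.start();      // parks here; on a real cluster this outlives the RPC timeout
    heartbeatHandler.join();
  }
}
{code}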
[jira] [Resolved] (HDFS-10322) DomainSocket error leads to more and more DataNode threads waiting
[ https://issues.apache.org/jira/browse/HDFS-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenFolin resolved HDFS-10322.
------------------------------
       Resolution: Fixed
    Fix Version/s: 2.6.4

It is the same as HADOOP-11802 (https://issues.apache.org/jira/browse/HADOOP-11802). I was not sure at first, because I saw a RUNNABLE DomainSocketWatcher thread. Now I know that the RUNNABLE DomainSocketWatcher thread serves webhdfs: if webhdfs is enabled, the DataNode process may contain two DomainSocketWatcher threads. And now I am sure the other thread had died.

> DomainSocket error leads to more and more DataNode threads waiting
> -------------------------------------------------------------------
>
>                 Key: HDFS-10322
>                 URL: https://issues.apache.org/jira/browse/HDFS-10322
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: ChenFolin
>             Fix For: 2.6.4
>
> When short-circuit read is enabled and a DomainSocket broken-pipe error happens, the DataNode will produce more and more waiting threads.
> It is similar to HADOOP-11802, but I do not think they are the same problem, because the DomainSocketWatcher thread is in the RUNNABLE state.
> stack log:
> "DataXceiver for client unix:/var/run/hadoop-hdfs/dn.50010 [Waiting for operation #1]" daemon prio=10 tid=0x0278e000 nid=0x2bc6 waiting on condition [0x7f2d6e4a5000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00061c493500> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>     at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:316)
>     at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:394)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>     at java.lang.Thread.run(Thread.java:745)
> =====DomainSocketWatcher=====
> "Thread-759187" daemon prio=10 tid=0x0219c800 nid=0x8c56 runnable [0x7f2dbe4cb000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
>     at org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
>     at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:474)
>     at java.lang.Thread.run(Thread.java:745)
> =====datanode error log=====
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: datanode-:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_SHM operation src: unix:/var/run/hadoop-hdfs/dn.50010 dst:
> java.net.SocketException: write(2) error: Broken pipe
>     at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
>     at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
>     at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:601)
>     at com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
>     at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
>     at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.sendShmSuccessResponse(DataXceiver.java:371)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:409)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>     at java.lang.Thread.run(Thread.java:745)
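[Editor's note] The stacks above show DataXceiver threads parked inside DomainSocketWatcher.add() while the surviving watcher thread belongs to webhdfs. An illustrative, self-contained sketch of that hand-off pattern (not the Hadoop implementation; all names are made up): callers enqueue work and then Condition.await() until a single watcher thread processes it, so if that watcher thread dies, every subsequent caller waits forever -- the growing pile of WAITING threads reported here.
{code}
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class WatcherStallSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition processed = lock.newCondition();
  private final Queue<Runnable> toAdd = new ArrayDeque<>();

  /** Analogous to DomainSocketWatcher.add(): hand off, then wait. */
  void add(Runnable entry) throws InterruptedException {
    lock.lock();
    try {
      toAdd.add(entry);
      while (!toAdd.isEmpty()) {
        processed.await();     // if the watcher thread is gone, this never returns
      }
    } finally {
      lock.unlock();
    }
  }

  /** Analogous to the watcher loop: drain the queue, signal the waiters. */
  void watcherLoop() {
    while (true) {             // a real watcher would block in poll() between iterations
      lock.lock();
      try {
        Runnable r;
        while ((r = toAdd.poll()) != null) {
          r.run();             // an error thrown here would kill a real watcher loop...
        }
        processed.signalAll(); // ...and then nobody ever signals the waiters again
      } finally {
        lock.unlock();
      }
    }
  }
}
{code}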
[jira] [Commented] (HDFS-10322) DomainSocket error leads to more and more DataNode threads waiting
[ https://issues.apache.org/jira/browse/HDFS-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253284#comment-15253284 ]

ChenFolin commented on HDFS-10322:
----------------------------------

Hello Chris Nauroth, thanks for your reply. I have looked at all of those issues (HADOOP-11333, HADOOP-11604, HADOOP-11648 and HDFS-8429), and none of them is the same as mine.

> DomainSocket error leads to more and more DataNode threads waiting
> -------------------------------------------------------------------
>
>                 Key: HDFS-10322
>                 URL: https://issues.apache.org/jira/browse/HDFS-10322
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: ChenFolin
>
> When short-circuit read is enabled and a DomainSocket broken-pipe error happens, the DataNode will produce more and more waiting threads.
> It is similar to HADOOP-11802, but I do not think they are the same problem, because the DomainSocketWatcher thread is in the RUNNABLE state.
> stack log:
> "DataXceiver for client unix:/var/run/hadoop-hdfs/dn.50010 [Waiting for operation #1]" daemon prio=10 tid=0x0278e000 nid=0x2bc6 waiting on condition [0x7f2d6e4a5000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00061c493500> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>     at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:316)
>     at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:394)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>     at java.lang.Thread.run(Thread.java:745)
> =====DomainSocketWatcher=====
> "Thread-759187" daemon prio=10 tid=0x0219c800 nid=0x8c56 runnable [0x7f2dbe4cb000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
>     at org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
>     at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:474)
>     at java.lang.Thread.run(Thread.java:745)
> =====datanode error log=====
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: datanode-:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_SHM operation src: unix:/var/run/hadoop-hdfs/dn.50010 dst:
> java.net.SocketException: write(2) error: Broken pipe
>     at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
>     at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
>     at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:601)
>     at com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
>     at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
>     at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.sendShmSuccessResponse(DataXceiver.java:371)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:409)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>     at java.lang.Thread.run(Thread.java:745)
[jira] [Commented] (HDFS-10322) DomainSocket error leads to more and more DataNode threads waiting
[ https://issues.apache.org/jira/browse/HDFS-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251390#comment-15251390 ]

ChenFolin commented on HDFS-10322:
----------------------------------

Maybe the call to the native method doPoll0 got stuck.

> DomainSocket error leads to more and more DataNode threads waiting
> -------------------------------------------------------------------
>
>                 Key: HDFS-10322
>                 URL: https://issues.apache.org/jira/browse/HDFS-10322
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: ChenFolin
>
> When short-circuit read is enabled and a DomainSocket broken-pipe error happens, the DataNode will produce more and more waiting threads.
> It is similar to HADOOP-11802, but I do not think they are the same problem, because the DomainSocketWatcher thread is in the RUNNABLE state.
> stack log:
> "DataXceiver for client unix:/var/run/hadoop-hdfs/dn.50010 [Waiting for operation #1]" daemon prio=10 tid=0x0278e000 nid=0x2bc6 waiting on condition [0x7f2d6e4a5000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00061c493500> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>     at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:316)
>     at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:394)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>     at java.lang.Thread.run(Thread.java:745)
> =====DomainSocketWatcher=====
> "Thread-759187" daemon prio=10 tid=0x0219c800 nid=0x8c56 runnable [0x7f2dbe4cb000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
>     at org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
>     at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:474)
>     at java.lang.Thread.run(Thread.java:745)
> =====datanode error log=====
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: datanode-:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_SHM operation src: unix:/var/run/hadoop-hdfs/dn.50010 dst:
> java.net.SocketException: write(2) error: Broken pipe
>     at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
>     at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
>     at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:601)
>     at com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
>     at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
>     at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.sendShmSuccessResponse(DataXceiver.java:371)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:409)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>     at java.lang.Thread.run(Thread.java:745)
[jira] [Created] (HDFS-10322) DomainSocket error leads to more and more DataNode threads waiting
ChenFolin created HDFS-10322:
--------------------------------
             Summary: DomainSocket error leads to more and more DataNode threads waiting
                 Key: HDFS-10322
                 URL: https://issues.apache.org/jira/browse/HDFS-10322
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.5.0
            Reporter: ChenFolin

When short-circuit read is enabled and a DomainSocket broken-pipe error happens, the DataNode will produce more and more waiting threads.
It is similar to HADOOP-11802, but I do not think they are the same problem, because the DomainSocketWatcher thread is in the RUNNABLE state.
stack log:
"DataXceiver for client unix:/var/run/hadoop-hdfs/dn.50010 [Waiting for operation #1]" daemon prio=10 tid=0x0278e000 nid=0x2bc6 waiting on condition [0x7f2d6e4a5000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00061c493500> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:316)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:394)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
    at java.lang.Thread.run(Thread.java:745)
=====DomainSocketWatcher=====
"Thread-759187" daemon prio=10 tid=0x0219c800 nid=0x8c56 runnable [0x7f2dbe4cb000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
    at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:474)
    at java.lang.Thread.run(Thread.java:745)
=====datanode error log=====
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: datanode-:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_SHM operation src: unix:/var/run/hadoop-hdfs/dn.50010 dst:
java.net.SocketException: write(2) error: Broken pipe
    at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
    at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:601)
    at com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
    at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
    at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.sendShmSuccessResponse(DataXceiver.java:371)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:409)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
    at java.lang.Thread.run(Thread.java:745)
[jira] [Created] (HDFS-10229) Long GC causes Checkpoint TransferFsImage to fail, and it does not retry.
ChenFolin created HDFS-10229:
--------------------------------
             Summary: Long GC causes Checkpoint TransferFsImage to fail, and it does not retry.
                 Key: HDFS-10229
                 URL: https://issues.apache.org/jira/browse/HDFS-10229
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.6.2, 2.5.0
            Reporter: ChenFolin

2016-03-28 14:22:52,542 WARN [org.apache.hadoop.util.JvmPauseMonitor$Monitor@25e67bd6] org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 272655ms
GC pool 'ParNew' had collection(s): count=1 time=3377ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=2 time=269508ms
2016-03-28 14:22:52,552 ERROR [Standby State Checkpointer] org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Exception in doCheckpoint
java.io.IOException: Exception during image upload: java.io.IOException: Error writing request body to server
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:221)
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1400(StandbyCheckpointer.java:62)
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.doWork(StandbyCheckpointer.java:353)
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.access$700(StandbyCheckpointer.java:260)
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread$1.run(StandbyCheckpointer.java:280)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:411)
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.run(StandbyCheckpointer.java:276)
Caused by: java.io.IOException: Error writing request body to server
    at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3205)
    at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3188)
    at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:368)
    at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:316)
    at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:290)
    at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:222)
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:207)
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:204)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
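[Editor's note] One way to address the "does not retry" part would be a small bounded-retry wrapper around the image upload. The helper below is hypothetical and sketched by the editor under that assumption; it is not part of the Hadoop API, and TransferFsImage has no such method.
{code}
import java.io.IOException;
import java.util.concurrent.Callable;

final class UploadRetry {
  /** Retries the upload a few times with linear backoff. Assumes maxAttempts >= 1. */
  static <T> T withRetries(Callable<T> upload, int maxAttempts, long backoffMs)
      throws Exception {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return upload.call();
      } catch (IOException e) {   // e.g. "Error writing request body to server"
        last = e;
        System.err.println("image upload attempt " + attempt + " failed: " + e);
        Thread.sleep(backoffMs * attempt);  // back off before the next attempt
      }
    }
    throw last;                   // all attempts failed; surface the last error
  }
}
// usage (hypothetical): UploadRetry.withRetries(() -> uploadImageFromStorage(...), 3, 1000);
{code}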
[jira] [Created] (HDFS-10214) Checkpoint cannot be done by the Standby NameNode, because a checkpoint may cause DataNode blockReport/blockReceivedAndDeleted/heartbeat RPC timeouts when the object num > 100000
ChenFolin created HDFS-10214:
--------------------------------
             Summary: Checkpoint cannot be done by the Standby NameNode, because a checkpoint may cause DataNode blockReport/blockReceivedAndDeleted/heartbeat RPC timeouts when the object num > 100000.
                 Key: HDFS-10214
                 URL: https://issues.apache.org/jira/browse/HDFS-10214
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: ha, namenode
    Affects Versions: 2.6.4, 2.5.0
         Environment: 500 DataNodes.
137407265 files and directories, 129614074 blocks = 267021339 total filesystem object(s)
            Reporter: ChenFolin

The current cluster status:
137407265 files and directories, 129614074 blocks = 267021339 total filesystem object(s).
The checkpoint saveNamespace costs more than 5 minutes, so DataNode RPCs time out.
The Standby NameNode skips the DataNode RPC requests (because of the RPC timeout, the DataNode closes the socket channel).
There are many corrupt files after a failover.
So, the checkpoint may need to be done by another component, not the Standby NameNode.
[jira] [Created] (HDFS-4423) Checkpoint exception causes fatal damage to fsimage.
ChenFolin created HDFS-4423:
-------------------------------
             Summary: Checkpoint exception causes fatal damage to fsimage.
                 Key: HDFS-4423
                 URL: https://issues.apache.org/jira/browse/HDFS-4423
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 1.1.1, 1.0.4
         Environment: CentOS 6.2
            Reporter: ChenFolin
            Priority: Blocker

The affected class is org.apache.hadoop.hdfs.server.namenode.FSImage:
{code}
boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
  ...
  latestNameSD.read();
  needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
  LOG.info("Image file of size " + imageSize + " loaded in "
      + (FSNamesystem.now() - startTime)/1000 + " seconds.");
  // Load latest edits
  if (latestNameCheckpointTime > latestEditsCheckpointTime)
    // the image is already current, discard edits
    needToSave |= true;
  else // latestNameCheckpointTime == latestEditsCheckpointTime
    needToSave |= (loadFSEdits(latestEditsSD, recovery) > 0);
  return needToSave;
}
{code}
In the normal checkpoint flow, latestNameCheckpointTime is equal to latestEditsCheckpointTime, so the "else" branch is executed.
The problem occurs when latestNameCheckpointTime > latestEditsCheckpointTime: the SecondaryNameNode starts a checkpoint, ..., the NameNode executes rollFSImage, and the NameNode shuts down after writing latestNameCheckpointTime but before writing latestEditsCheckpointTime.
On the next NameNode start, because latestNameCheckpointTime > latestEditsCheckpointTime, needToSave is true but loadFSEdits is skipped, so "rootDir"'s nsCount -- the cluster's file count -- is never updated (that update runs inside loadFSEdits via FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota()). Then "saveNamespace" writes the file count to the fsimage with the default value "1".
The next time, loadFSImage will fail.
Maybe this will work:
{code}
boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
  ...
  latestNameSD.read();
  needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
  LOG.info("Image file of size " + imageSize + " loaded in "
      + (FSNamesystem.now() - startTime)/1000 + " seconds.");
  // Load latest edits
  if (latestNameCheckpointTime > latestEditsCheckpointTime) {
    // the image is already current, discard edits
    needToSave |= true;
    FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota();
  } else // latestNameCheckpointTime == latestEditsCheckpointTime
    needToSave |= (loadFSEdits(latestEditsSD, recovery) > 0);
  return needToSave;
}
{code}
[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
[ https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenFolin updated HDFS-4309:
----------------------------
    Attachment: HDFS-4309.patch

Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
----------------------------------------------------------------------------------------

                 Key: HDFS-4309
                 URL: https://issues.apache.org/jira/browse/HDFS-4309
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha, 2.0.2-alpha
            Reporter: MaWenJin
              Labels: patch
         Attachments: HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h

If multiple threads concurrently execute the following method, fs = createFileSystem(uri, conf) can be called more than once, creating multiple DFSClient instances and starting multiple LeaseChecker daemon threads; the shutdown hook may not be able to close them all when the process exits, resulting in a memory leak.
{code}
private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException {
  FileSystem fs = null;
  synchronized (this) {
    fs = map.get(key);
  }
  if (fs != null) {
    return fs;
  }
  fs = createFileSystem(uri, conf);
  synchronized (this) { // refetch the lock again
    FileSystem oldfs = map.get(key);
    if (oldfs != null) { // a file system is created while lock is releasing
      fs.close(); // close the new file system
      return oldfs; // return the old file system
    }
    // now insert the new file system into the map
    if (map.isEmpty() && !clientFinalizer.isAlive()) {
      Runtime.getRuntime().addShutdownHook(clientFinalizer);
    }
    fs.key = key;
    map.put(key, fs);
    if (conf.getBoolean("fs.automatic.close", true)) {
      toAutoClose.add(key);
    }
    return fs;
  }
}
{code}
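[Editor's note] The attached HDFS-4309.patch is not reproduced in this thread, so the following is only one possible fix, sketched under that assumption: compute the cached object atomically per key so concurrent callers can never construct two DFSClient instances, and therefore never start a LeaseChecker thread that is immediately thrown away. The trade-off is that creation is serialized while the lock is held, which the original code deliberately avoided.
{code}
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class FileSystemCacheSketch<K, F> {
  interface Factory<K, F> { F create(K key) throws IOException; }

  private final ConcurrentMap<K, F> map = new ConcurrentHashMap<>();
  private final Factory<K, F> factory;

  FileSystemCacheSketch(Factory<K, F> factory) { this.factory = factory; }

  F get(K key) throws IOException {
    F fs = map.get(key);        // fast path: already cached
    if (fs != null) {
      return fs;
    }
    synchronized (this) {       // hold the lock across creation, unlike getInternal,
      fs = map.get(key);        // so no duplicate instance is ever constructed
      if (fs == null) {
        fs = factory.create(key);
        map.put(key, fs);
      }
      return fs;
    }
  }
}
{code}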
[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
[ https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenFolin updated HDFS-4309:
----------------------------
    Status: Patch Available  (was: Open)

Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
----------------------------------------------------------------------------------------

                 Key: HDFS-4309
                 URL: https://issues.apache.org/jira/browse/HDFS-4309
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.0.2-alpha, 2.0.1-alpha, 0.23.4, 0.23.1, 0.20.205.0
            Reporter: MaWenJin
              Labels: patch
         Attachments: HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h

If multiple threads concurrently execute the following method, fs = createFileSystem(uri, conf) can be called more than once, creating multiple DFSClient instances and starting multiple LeaseChecker daemon threads; the shutdown hook may not be able to close them all when the process exits, resulting in a memory leak.
{code}
private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException {
  FileSystem fs = null;
  synchronized (this) {
    fs = map.get(key);
  }
  if (fs != null) {
    return fs;
  }
  fs = createFileSystem(uri, conf);
  synchronized (this) { // refetch the lock again
    FileSystem oldfs = map.get(key);
    if (oldfs != null) { // a file system is created while lock is releasing
      fs.close(); // close the new file system
      return oldfs; // return the old file system
    }
    // now insert the new file system into the map
    if (map.isEmpty() && !clientFinalizer.isAlive()) {
      Runtime.getRuntime().addShutdownHook(clientFinalizer);
    }
    fs.key = key;
    map.put(key, fs);
    if (conf.getBoolean("fs.automatic.close", true)) {
      toAutoClose.add(key);
    }
    return fs;
  }
}
{code}
[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
[ https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenFolin updated HDFS-4309:
----------------------------
    Affects Version/s:     (was: 2.0.2-alpha)

Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
----------------------------------------------------------------------------------------

                 Key: HDFS-4309
                 URL: https://issues.apache.org/jira/browse/HDFS-4309
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha
            Reporter: MaWenJin
              Labels: patch
         Attachments: HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h

If multiple threads concurrently execute the following method, fs = createFileSystem(uri, conf) can be called more than once, creating multiple DFSClient instances and starting multiple LeaseChecker daemon threads; the shutdown hook may not be able to close them all when the process exits, resulting in a memory leak.
{code}
private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException {
  FileSystem fs = null;
  synchronized (this) {
    fs = map.get(key);
  }
  if (fs != null) {
    return fs;
  }
  fs = createFileSystem(uri, conf);
  synchronized (this) { // refetch the lock again
    FileSystem oldfs = map.get(key);
    if (oldfs != null) { // a file system is created while lock is releasing
      fs.close(); // close the new file system
      return oldfs; // return the old file system
    }
    // now insert the new file system into the map
    if (map.isEmpty() && !clientFinalizer.isAlive()) {
      Runtime.getRuntime().addShutdownHook(clientFinalizer);
    }
    fs.key = key;
    map.put(key, fs);
    if (conf.getBoolean("fs.automatic.close", true)) {
      toAutoClose.add(key);
    }
    return fs;
  }
}
{code}
[jira] [Commented] (HDFS-4309) Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
[ https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532291#comment-13532291 ]

ChenFolin commented on HDFS-4309:
---------------------------------

Hi Aaron T. Myers,
When I execute dev-support/test-patch.sh patch, it reports many errors, such as:
org.apache.hadoop.record.RecordComparator is deprecated.
and the code is:
{code}
@Deprecated
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class RecordComparator extends WritableComparator {
{code}
So dev-support/test-patch.sh patch failed. What should I do about it?

======================================================================
Determining number of patched javac warnings.
======================================================================

mvn clean test -DskipTests -DHadoopPatchProcess -Pnative -Ptest-patch > /tmp/patchJavacWarnings.txt 2>&1

{color:red}-1 overall{color}.
    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
    {color:red}-1 javac{color}. The patch appears to cause the build to fail.

======================================================================
Finished build.
======================================================================

Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
----------------------------------------------------------------------------------------

                 Key: HDFS-4309
                 URL: https://issues.apache.org/jira/browse/HDFS-4309
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha, 2.0.2-alpha
            Reporter: MaWenJin
              Labels: patch
         Attachments: jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h

If multiple threads concurrently execute the following method, fs = createFileSystem(uri, conf) can be called more than once, creating multiple DFSClient instances and starting multiple LeaseChecker daemon threads; the shutdown hook may not be able to close them all when the process exits, resulting in a memory leak.
{code}
private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException {
  FileSystem fs = null;
  synchronized (this) {
    fs = map.get(key);
  }
  if (fs != null) {
    return fs;
  }
  fs = createFileSystem(uri, conf);
  synchronized (this) { // refetch the lock again
    FileSystem oldfs = map.get(key);
    if (oldfs != null) { // a file system is created while lock is releasing
      fs.close(); // close the new file system
      return oldfs; // return the old file system
    }
    // now insert the new file system into the map
    if (map.isEmpty() && !clientFinalizer.isAlive()) {
      Runtime.getRuntime().addShutdownHook(clientFinalizer);
    }
    fs.key = key;
    map.put(key, fs);
    if (conf.getBoolean("fs.automatic.close", true)) {
      toAutoClose.add(key);
    }
    return fs;
  }
}
{code}
[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
[ https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenFolin updated HDFS-4309:
----------------------------
              Labels: patch  (was: )
    Target Version/s: 2.0.2-alpha
              Status: Patch Available  (was: Open)

Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
----------------------------------------------------------------------------------------

                 Key: HDFS-4309
                 URL: https://issues.apache.org/jira/browse/HDFS-4309
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.0.2-alpha, 2.0.1-alpha, 0.23.4, 0.23.1, 0.20.205.0
            Reporter: MaWenJin
              Labels: patch
         Attachments: HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h

If multiple threads concurrently execute the following method, fs = createFileSystem(uri, conf) can be called more than once, creating multiple DFSClient instances and starting multiple LeaseChecker daemon threads; the shutdown hook may not be able to close them all when the process exits, resulting in a memory leak.
{code}
private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException {
  FileSystem fs = null;
  synchronized (this) {
    fs = map.get(key);
  }
  if (fs != null) {
    return fs;
  }
  fs = createFileSystem(uri, conf);
  synchronized (this) { // refetch the lock again
    FileSystem oldfs = map.get(key);
    if (oldfs != null) { // a file system is created while lock is releasing
      fs.close(); // close the new file system
      return oldfs; // return the old file system
    }
    // now insert the new file system into the map
    if (map.isEmpty() && !clientFinalizer.isAlive()) {
      Runtime.getRuntime().addShutdownHook(clientFinalizer);
    }
    fs.key = key;
    map.put(key, fs);
    if (conf.getBoolean("fs.automatic.close", true)) {
      toAutoClose.add(key);
    }
    return fs;
  }
}
{code}
[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
[ https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenFolin updated HDFS-4309:
----------------------------
    Attachment: HDFS-4309.patch

Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
----------------------------------------------------------------------------------------

                 Key: HDFS-4309
                 URL: https://issues.apache.org/jira/browse/HDFS-4309
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha, 2.0.2-alpha
            Reporter: MaWenJin
              Labels: patch
         Attachments: HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h

If multiple threads concurrently execute the following method, fs = createFileSystem(uri, conf) can be called more than once, creating multiple DFSClient instances and starting multiple LeaseChecker daemon threads; the shutdown hook may not be able to close them all when the process exits, resulting in a memory leak.
{code}
private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException {
  FileSystem fs = null;
  synchronized (this) {
    fs = map.get(key);
  }
  if (fs != null) {
    return fs;
  }
  fs = createFileSystem(uri, conf);
  synchronized (this) { // refetch the lock again
    FileSystem oldfs = map.get(key);
    if (oldfs != null) { // a file system is created while lock is releasing
      fs.close(); // close the new file system
      return oldfs; // return the old file system
    }
    // now insert the new file system into the map
    if (map.isEmpty() && !clientFinalizer.isAlive()) {
      Runtime.getRuntime().addShutdownHook(clientFinalizer);
    }
    fs.key = key;
    map.put(key, fs);
    if (conf.getBoolean("fs.automatic.close", true)) {
      toAutoClose.add(key);
    }
    return fs;
  }
}
{code}
[jira] [Commented] (HDFS-4309) Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
[ https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530963#comment-13530963 ]

ChenFolin commented on HDFS-4309:
---------------------------------

HDFS-4309.patch

Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
----------------------------------------------------------------------------------------

                 Key: HDFS-4309
                 URL: https://issues.apache.org/jira/browse/HDFS-4309
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha, 2.0.2-alpha
            Reporter: MaWenJin
              Labels: patch
         Attachments: HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h

If multiple threads concurrently execute the following method, fs = createFileSystem(uri, conf) can be called more than once, creating multiple DFSClient instances and starting multiple LeaseChecker daemon threads; the shutdown hook may not be able to close them all when the process exits, resulting in a memory leak.
{code}
private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException {
  FileSystem fs = null;
  synchronized (this) {
    fs = map.get(key);
  }
  if (fs != null) {
    return fs;
  }
  fs = createFileSystem(uri, conf);
  synchronized (this) { // refetch the lock again
    FileSystem oldfs = map.get(key);
    if (oldfs != null) { // a file system is created while lock is releasing
      fs.close(); // close the new file system
      return oldfs; // return the old file system
    }
    // now insert the new file system into the map
    if (map.isEmpty() && !clientFinalizer.isAlive()) {
      Runtime.getRuntime().addShutdownHook(clientFinalizer);
    }
    fs.key = key;
    map.put(key, fs);
    if (conf.getBoolean("fs.automatic.close", true)) {
      toAutoClose.add(key);
    }
    return fs;
  }
}
{code}
[jira] [Commented] (HDFS-4309) Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
[ https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531910#comment-13531910 ]

ChenFolin commented on HDFS-4309:
---------------------------------

Hi, Aaron T. Myers. Following your suggestion, I regenerated the patch; it may be usable now.

Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
----------------------------------------------------------------------------------------

                 Key: HDFS-4309
                 URL: https://issues.apache.org/jira/browse/HDFS-4309
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha, 2.0.2-alpha
            Reporter: MaWenJin
              Labels: patch
         Attachments: HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h

If multiple threads concurrently execute the following method, fs = createFileSystem(uri, conf) can be called more than once, creating multiple DFSClient instances and starting multiple LeaseChecker daemon threads; the shutdown hook may not be able to close them all when the process exits, resulting in a memory leak.
{code}
private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException {
  FileSystem fs = null;
  synchronized (this) {
    fs = map.get(key);
  }
  if (fs != null) {
    return fs;
  }
  fs = createFileSystem(uri, conf);
  synchronized (this) { // refetch the lock again
    FileSystem oldfs = map.get(key);
    if (oldfs != null) { // a file system is created while lock is releasing
      fs.close(); // close the new file system
      return oldfs; // return the old file system
    }
    // now insert the new file system into the map
    if (map.isEmpty() && !clientFinalizer.isAlive()) {
      Runtime.getRuntime().addShutdownHook(clientFinalizer);
    }
    fs.key = key;
    map.put(key, fs);
    if (conf.getBoolean("fs.automatic.close", true)) {
      toAutoClose.add(key);
    }
    return fs;
  }
}
{code}
[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
[ https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenFolin updated HDFS-4309:
----------------------------
    Status: Patch Available  (was: Open)

renewed patch

Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
----------------------------------------------------------------------------------------

                 Key: HDFS-4309
                 URL: https://issues.apache.org/jira/browse/HDFS-4309
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.0.2-alpha, 2.0.1-alpha, 0.23.4, 0.23.1, 0.20.205.0
            Reporter: MaWenJin
              Labels: patch
         Attachments: HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h

If multiple threads concurrently execute the following method, fs = createFileSystem(uri, conf) can be called more than once, creating multiple DFSClient instances and starting multiple LeaseChecker daemon threads; the shutdown hook may not be able to close them all when the process exits, resulting in a memory leak.
{code}
private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException {
  FileSystem fs = null;
  synchronized (this) {
    fs = map.get(key);
  }
  if (fs != null) {
    return fs;
  }
  fs = createFileSystem(uri, conf);
  synchronized (this) { // refetch the lock again
    FileSystem oldfs = map.get(key);
    if (oldfs != null) { // a file system is created while lock is releasing
      fs.close(); // close the new file system
      return oldfs; // return the old file system
    }
    // now insert the new file system into the map
    if (map.isEmpty() && !clientFinalizer.isAlive()) {
      Runtime.getRuntime().addShutdownHook(clientFinalizer);
    }
    fs.key = key;
    map.put(key, fs);
    if (conf.getBoolean("fs.automatic.close", true)) {
      toAutoClose.add(key);
    }
    return fs;
  }
}
{code}
[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
[ https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenFolin updated HDFS-4309:
----------------------------
    Attachment: HDFS-4309.patch

May be usable

Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
----------------------------------------------------------------------------------------

                 Key: HDFS-4309
                 URL: https://issues.apache.org/jira/browse/HDFS-4309
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha, 2.0.2-alpha
            Reporter: MaWenJin
              Labels: patch
         Attachments: HDFS-4309.patch, HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h

If multiple threads concurrently execute the following method, fs = createFileSystem(uri, conf) can be called more than once, creating multiple DFSClient instances and starting multiple LeaseChecker daemon threads; the shutdown hook may not be able to close them all when the process exits, resulting in a memory leak.
{code}
private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException {
  FileSystem fs = null;
  synchronized (this) {
    fs = map.get(key);
  }
  if (fs != null) {
    return fs;
  }
  fs = createFileSystem(uri, conf);
  synchronized (this) { // refetch the lock again
    FileSystem oldfs = map.get(key);
    if (oldfs != null) { // a file system is created while lock is releasing
      fs.close(); // close the new file system
      return oldfs; // return the old file system
    }
    // now insert the new file system into the map
    if (map.isEmpty() && !clientFinalizer.isAlive()) {
      Runtime.getRuntime().addShutdownHook(clientFinalizer);
    }
    fs.key = key;
    map.put(key, fs);
    if (conf.getBoolean("fs.automatic.close", true)) {
      toAutoClose.add(key);
    }
    return fs;
  }
}
{code}
[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
[ https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenFolin updated HDFS-4309:
----------------------------
    Attachment:     (was: HDFS-4309.patch)

Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
----------------------------------------------------------------------------------------

                 Key: HDFS-4309
                 URL: https://issues.apache.org/jira/browse/HDFS-4309
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha, 2.0.2-alpha
            Reporter: MaWenJin
              Labels: patch
         Attachments: jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h

If multiple threads concurrently execute the following method, fs = createFileSystem(uri, conf) can be called more than once, creating multiple DFSClient instances and starting multiple LeaseChecker daemon threads; the shutdown hook may not be able to close them all when the process exits, resulting in a memory leak.
{code}
private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException {
  FileSystem fs = null;
  synchronized (this) {
    fs = map.get(key);
  }
  if (fs != null) {
    return fs;
  }
  fs = createFileSystem(uri, conf);
  synchronized (this) { // refetch the lock again
    FileSystem oldfs = map.get(key);
    if (oldfs != null) { // a file system is created while lock is releasing
      fs.close(); // close the new file system
      return oldfs; // return the old file system
    }
    // now insert the new file system into the map
    if (map.isEmpty() && !clientFinalizer.isAlive()) {
      Runtime.getRuntime().addShutdownHook(clientFinalizer);
    }
    fs.key = key;
    map.put(key, fs);
    if (conf.getBoolean("fs.automatic.close", true)) {
      toAutoClose.add(key);
    }
    return fs;
  }
}
{code}
[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
[ https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenFolin updated HDFS-4309:
----------------------------
    Attachment:     (was: HDFS-4309.patch)

Multithreaded get through the Cache FileSystem object leads to LeaseChecker memory leak
----------------------------------------------------------------------------------------

                 Key: HDFS-4309
                 URL: https://issues.apache.org/jira/browse/HDFS-4309
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha, 2.0.2-alpha
            Reporter: MaWenJin
              Labels: patch
         Attachments: jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h

If multiple threads concurrently execute the following method, fs = createFileSystem(uri, conf) can be called more than once, creating multiple DFSClient instances and starting multiple LeaseChecker daemon threads; the shutdown hook may not be able to close them all when the process exits, resulting in a memory leak.
{code}
private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException {
  FileSystem fs = null;
  synchronized (this) {
    fs = map.get(key);
  }
  if (fs != null) {
    return fs;
  }
  fs = createFileSystem(uri, conf);
  synchronized (this) { // refetch the lock again
    FileSystem oldfs = map.get(key);
    if (oldfs != null) { // a file system is created while lock is releasing
      fs.close(); // close the new file system
      return oldfs; // return the old file system
    }
    // now insert the new file system into the map
    if (map.isEmpty() && !clientFinalizer.isAlive()) {
      Runtime.getRuntime().addShutdownHook(clientFinalizer);
    }
    fs.key = key;
    map.put(key, fs);
    if (conf.getBoolean("fs.automatic.close", true)) {
      toAutoClose.add(key);
    }
    return fs;
  }
}
{code}