[jira] [Updated] (HDFS-10472) NameNode RPC Reader threads crash, and the cluster hangs.

2016-05-31 Thread ChenFolin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenFolin updated HDFS-10472:
-
Attachment: HDFS-10472.patch

add catch throwable

> NameNode RPC Reader threads crash, and the cluster hangs.
> ---
>
> Key: HDFS-10472
> URL: https://issues.apache.org/jira/browse/HDFS-10472
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Affects Versions: 2.5.0, 2.6.0, 2.8.0, 2.7.2, 2.6.2, 2.6.4
>Reporter: ChenFolin
>  Labels: patch
> Attachments: HDFS-10472.patch
>
>
> My cluster hung yesterday,
> because the RPC server Reader threads crashed. All RPC requests then timed
> out, including DataNode heartbeats.
> We can see that the method doRunLoop only catches InterruptedException and
> IOException:
> while (running) {
>   SelectionKey key = null;
>   try {
> // consume as many connections as currently queued to avoid
> // unbridled acceptance of connections that starves the select
> int size = pendingConnections.size();
> for (int i=size; i>0; i--) {
>   Connection conn = pendingConnections.take();
>   conn.channel.register(readSelector, SelectionKey.OP_READ, conn);
> }
> readSelector.select();
> Iterator<SelectionKey> iter = readSelector.selectedKeys().iterator();
> while (iter.hasNext()) {
>   key = iter.next();
>   iter.remove();
>   if (key.isValid()) {
> if (key.isReadable()) {
>   doRead(key);
> }
>   }
>   key = null;
> }
>   } catch (InterruptedException e) {
> if (running) {  // unexpected -- log it
>   LOG.info(Thread.currentThread().getName() + " unexpectedly 
> interrupted", e);
> }
>   } catch (IOException ex) {
> LOG.error("Error in Reader", ex);
>   } 
> }
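
The attached HDFS-10472.patch is not reproduced in this digest. As a minimal
sketch of the "catch Throwable" idea (an illustration of the approach, not the
patch itself), the catch chain above could be extended so that an unexpected
error can no longer kill the Reader thread:

{code}
while (running) {
  SelectionKey key = null;
  try {
    // ... same pendingConnections / selector loop as above ...
  } catch (InterruptedException e) {
    if (running) {  // unexpected -- log it
      LOG.info(Thread.currentThread().getName() + " unexpectedly interrupted", e);
    }
  } catch (IOException ex) {
    LOG.error("Error in Reader", ex);
  } catch (Throwable t) {
    // Any other throwable (e.g. a RuntimeException) previously escaped
    // doRunLoop, terminated the Reader thread, and left every subsequent RPC
    // to time out -- the cluster-wide hang described above.
    LOG.error("Unexpected throwable in Reader, continuing", t);
  }
}
{code}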






[jira] [Updated] (HDFS-10472) NameNode RPC Reader threads crash, and the cluster hangs.

2016-05-31 Thread ChenFolin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenFolin updated HDFS-10472:
-
  Labels: patch  (was: )
Release Note: catch throwable
  Status: Patch Available  (was: Open)

add catch throwable

> NameNode RPC Reader threads crash, and the cluster hangs.
> ---
>
> Key: HDFS-10472
> URL: https://issues.apache.org/jira/browse/HDFS-10472
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Affects Versions: 2.6.4, 2.6.2, 2.7.2, 2.6.0, 2.5.0, 2.8.0
>Reporter: ChenFolin
>  Labels: patch
>






[jira] [Created] (HDFS-10472) NameNode RPC Reader threads crash, and the cluster hangs.

2016-05-31 Thread ChenFolin (JIRA)
ChenFolin created HDFS-10472:


 Summary: NameNode RPC Reader threads crash, and the cluster hangs.
 Key: HDFS-10472
 URL: https://issues.apache.org/jira/browse/HDFS-10472
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs, namenode
Affects Versions: 2.6.4, 2.6.2, 2.7.2, 2.6.0, 2.5.0, 2.8.0
Reporter: ChenFolin









[jira] [Resolved] (HDFS-10214) Checkpoint Can not be done by StandbyNameNode.Because checkpoint may cause DataNode blockReport.blockReceivedAndDeleted.heartbeat rpc timeout when the object num > 10000

2016-04-27 Thread ChenFolin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenFolin resolved HDFS-10214.
--
   Resolution: Duplicate
Fix Version/s: 2.7.2

> Checkpoint Can not be done by StandbyNameNode.Because checkpoint may cause 
> DataNode blockReport.blockReceivedAndDeleted.heartbeat rpc timeout when the 
> object num > 1.
> --
>
> Key: HDFS-10214
> URL: https://issues.apache.org/jira/browse/HDFS-10214
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: ha, namenode
>Affects Versions: 2.5.0, 2.6.4
> Environment: 500 DataNode.
> 137407265 files and directories, 129614074 blocks = 267021339 total 
> filesystem object(s)
>Reporter: ChenFolin
> Fix For: 2.7.2
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current cluster status:
> 137407265 files and directories, 129614074 blocks = 267021339 total
> filesystem object(s).
> A checkpoint saveNamespace costs more than 5 minutes.
> DataNode RPCs time out.
> The Standby NameNode skips the DataNode RPC requests (because of the RPC
> timeout, the DataNode closes the socket channel).
> There are many corrupt files after a failover.
> So checkpointing might better be done by some other component, not the
> Standby NameNode.





[jira] [Resolved] (HDFS-10322) DomainSocket error leads to more and more DataNode threads waiting

2016-04-26 Thread ChenFolin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenFolin resolved HDFS-10322.
--
   Resolution: Fixed
Fix Version/s: 2.6.4

It is the same as HADOOP-11802
(https://issues.apache.org/jira/browse/HADOOP-11802).
I was not sure at first, because I saw a RUNNABLE DomainSocketWatcher thread.
Now I know that this RUNNABLE DomainSocketWatcher thread serves WebHDFS: when
WebHDFS is enabled, the DataNode process may contain two DomainSocketWatcher
threads. And now I am sure the other thread had exited.

> DomainSocket error leads to more and more DataNode threads waiting
> -
>
> Key: HDFS-10322
> URL: https://issues.apache.org/jira/browse/HDFS-10322
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: ChenFolin
> Fix For: 2.6.4
>
>
> When short-circuit read is enabled and a DomainSocket broken-pipe error
> happens, the DataNode produces more and more waiting threads.
> It is similar to HADOOP-11802, but I do not think they are the same problem,
> because the DomainSocketWatcher thread is in the RUNNABLE state.
> stack log:
> "DataXceiver for client unix:/var/run/hadoop-hdfs/dn.50010 [Waiting for 
> operation #1]" daemon prio=10 tid=0x0278e000 nid=0x2bc6 waiting on 
> condition [0x7f2d6e4a5000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00061c493500> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:316)
>   at 
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:394)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>   at java.lang.Thread.run(Thread.java:745)
> =DomainSocketWatcher
> "Thread-759187" daemon prio=10 tid=0x0219c800 nid=0x8c56 runnable 
> [0x7f2dbe4cb000]
>java.lang.Thread.State: RUNNABLE
>   at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:474)
>   at java.lang.Thread.run(Thread.java:745)
> ===datanode error log
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
> datanode-:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_SHM 
> operation src: unix:/var/run/hadoop-hdfs/dn.50010 dst: 
> java.net.SocketException: write(2) error: Broken pipe
> at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
> at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
> at 
> org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:601)
> at 
> com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
> at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
> at 
> com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.sendShmSuccessResponse(DataXceiver.java:371)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:409)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
> at java.lang.Thread.run(Thread.java:745)
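
For illustration only (this is not Hadoop code; the class and names below --
WatcherQueue, ADD_TIMEOUT_NANOS, processed -- are all hypothetical): the
stack above shows the general pattern of a thread parked on a condition that
a stuck consumer thread will never signal. A timed wait turns that unbounded
hang into a diagnosable failure:

{code}
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of the DataXceiver -> watcher-thread handoff. */
class WatcherQueue<T> {
  private static final long ADD_TIMEOUT_NANOS = TimeUnit.SECONDS.toNanos(30);
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition processed = lock.newCondition();
  private final Queue<T> toAdd = new ArrayDeque<>();

  /**
   * Called by DataXceiver-like threads; a separate watcher thread is
   * expected to drain toAdd and signal processed.
   */
  void add(T entry) throws IOException, InterruptedException {
    lock.lock();
    try {
      toAdd.add(entry);
      long remaining = ADD_TIMEOUT_NANOS;
      // Without the timeout this is exactly the stack above: the caller
      // parks in await() forever once the watcher is stuck in doPoll0.
      while (toAdd.contains(entry)) {
        if (remaining <= 0) {
          toAdd.remove(entry);
          throw new IOException("watcher did not process the socket in time");
        }
        remaining = processed.awaitNanos(remaining);
      }
    } finally {
      lock.unlock();
    }
  }
}
{code}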





[jira] [Commented] (HDFS-10322) DomainSocket error leads to more and more DataNode threads waiting

2016-04-21 Thread ChenFolin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253284#comment-15253284
 ] 

ChenFolin commented on HDFS-10322:
--

Hello Chris Nauroth,
thanks for your reply.
I have looked at all of those bugs (HADOOP-11333, HADOOP-11604, HADOOP-11648
and HDFS-8429), and none of them is the same as my problem.

> DomainSocket error leads to more and more DataNode threads waiting
> -
>
> Key: HDFS-10322
> URL: https://issues.apache.org/jira/browse/HDFS-10322
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: ChenFolin
>





[jira] [Commented] (HDFS-10322) DomainSocket error leads to more and more DataNode threads waiting

2016-04-21 Thread ChenFolin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251390#comment-15251390
 ] 

ChenFolin commented on HDFS-10322:
--

Maybe the call to the native method doPoll0 got stuck.

> DomainSocket error leads to more and more DataNode threads waiting
> -
>
> Key: HDFS-10322
> URL: https://issues.apache.org/jira/browse/HDFS-10322
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: ChenFolin
>





[jira] [Created] (HDFS-10322) DomainSocket error leads to more and more DataNode threads waiting

2016-04-21 Thread ChenFolin (JIRA)
ChenFolin created HDFS-10322:


 Summary: DomainSocket error leads to more and more DataNode threads waiting
 Key: HDFS-10322
 URL: https://issues.apache.org/jira/browse/HDFS-10322
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.5.0
Reporter: ChenFolin







[jira] [Created] (HDFS-10229) Long GC causes Checkpoint TransferFsImage to fail, and there is no retry.

2016-03-29 Thread ChenFolin (JIRA)
ChenFolin created HDFS-10229:


 Summary: Long GC causes Checkpoint TransferFsImage to fail, and there is no retry.
 Key: HDFS-10229
 URL: https://issues.apache.org/jira/browse/HDFS-10229
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.2, 2.5.0
Reporter: ChenFolin


2016-03-28 14:22:52,542 WARN 
[org.apache.hadoop.util.JvmPauseMonitor$Monitor@25e67bd6] 
org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine 
(eg GC): pause of approximately 272655ms
GC pool 'ParNew' had collection(s): count=1 time=3377ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=2 time=269508ms
2016-03-28 14:22:52,552 ERROR [Standby State Checkpointer] 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Exception in 
doCheckpoint
java.io.IOException: Exception during image upload: java.io.IOException: Error 
writing request body to server
at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:221)
at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1400(StandbyCheckpointer.java:62)
at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.doWork(StandbyCheckpointer.java:353)
at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.access$700(StandbyCheckpointer.java:260)
at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread$1.run(StandbyCheckpointer.java:280)
at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:411)
at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.run(StandbyCheckpointer.java:276)
Caused by: java.io.IOException: Error writing request body to server
at 
sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3205)
at 
sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3188)
at 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:368)
at 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:316)
at 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:290)
at 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:222)
at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:207)
at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:204)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
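
The checkpointer gives up after this single failed upload. A hedged sketch of
a bounded retry around the upload (a hypothetical helper, not the actual
StandbyCheckpointer code; only the idea of retrying
TransferFsImage.uploadImageFromStorage is taken from the stack above):

{code}
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: retry the image upload a bounded number of times so
// that one transient failure (e.g. a connection broken by a 272s GC pause)
// does not abort the whole checkpoint.
final class RetryingUpload {
  /** Wraps the real upload call, e.g. TransferFsImage.uploadImageFromStorage. */
  interface Upload { void run() throws IOException; }

  static void uploadWithRetry(Upload upload, int maxAttempts)
      throws IOException, InterruptedException {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {  // maxAttempts >= 1
      try {
        upload.run();
        return;                                // success
      } catch (IOException e) {
        last = e;                              // remember the failure and back off
        TimeUnit.SECONDS.sleep(30L * attempt); // linear backoff between attempts
      }
    }
    throw last;                                // every attempt failed
  }
}
{code}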





[jira] [Created] (HDFS-10214) Checkpoint Can not be done by StandbyNameNode.Because checkpoint may cause DataNode blockReport.blockReceivedAndDeleted.heartbeat rpc timeout when the object num > 100000

2016-03-24 Thread ChenFolin (JIRA)
ChenFolin created HDFS-10214:


 Summary: Checkpoint Can not be done by StandbyNameNode.Because 
checkpoint may cause DataNode blockReport.blockReceivedAndDeleted.heartbeat rpc 
timeout when the object num > 1.
 Key: HDFS-10214
 URL: https://issues.apache.org/jira/browse/HDFS-10214
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: ha, namenode
Affects Versions: 2.6.4, 2.5.0
 Environment: 500 DataNode.

137407265 files and directories, 129614074 blocks = 267021339 total filesystem 
object(s)
Reporter: ChenFolin







[jira] [Created] (HDFS-4423) Checkpoint exception causes fatal damage to fsimage.

2013-01-20 Thread ChenFolin (JIRA)
ChenFolin created HDFS-4423:
---

 Summary: Checkpoint exception causes fatal damage to fsimage.
 Key: HDFS-4423
 URL: https://issues.apache.org/jira/browse/HDFS-4423
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 1.1.1, 1.0.4
 Environment: CentOS 6.2
Reporter: ChenFolin
Priority: Blocker


The affected class is org.apache.hadoop.hdfs.server.namenode.FSImage.
{code}
boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
...
latestNameSD.read();
needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
LOG.info("Image file of size " + imageSize + " loaded in "
+ (FSNamesystem.now() - startTime)/1000 + " seconds.");

// Load latest edits
if (latestNameCheckpointTime > latestEditsCheckpointTime)
  // the image is already current, discard edits
  needToSave |= true;
else // latestNameCheckpointTime == latestEditsCheckpointTime
  needToSave |= (loadFSEdits(latestEditsSD, recovery) > 0);

return needToSave;
  }
{code}
In the normal checkpoint flow, latestNameCheckpointTime is equal to
latestEditsCheckpointTime, so the "else" branch executes.
The problem is the case latestNameCheckpointTime > latestEditsCheckpointTime:
the SecondaryNameNode starts a checkpoint,
...
NameNode: during rollFSImage, the NameNode shuts down after writing
latestNameCheckpointTime and before writing latestEditsCheckpointTime.
On the next NameNode start: because latestNameCheckpointTime >
latestEditsCheckpointTime, needToSave is true, and the NameNode does not
update "rootDir"'s nsCount, which is the cluster's file count (that update
happens in loadFSEdits via
"FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota()"); then
"saveNamespace" writes the file count to the fsimage with the default value
"1".
The next time, loadFSImage will fail.

Maybe this will work:
{code}
boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
...
latestNameSD.read();
needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
LOG.info("Image file of size " + imageSize + " loaded in "
+ (FSNamesystem.now() - startTime)/1000 + " seconds.");

// Load latest edits
if (latestNameCheckpointTime > latestEditsCheckpointTime){
  // the image is already current, discard edits
  needToSave |= true;
  FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota();
}
else // latestNameCheckpointTime == latestEditsCheckpointTime
  needToSave |= (loadFSEdits(latestEditsSD, recovery) > 0);

return needToSave;
  }
{code}




[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem Object to lead LeaseChecker memory leak

2012-12-16 Thread ChenFolin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenFolin updated HDFS-4309:


Attachment: HDFS-4309.patch

 Multithreaded get through the Cache FileSystem Object to lead LeaseChecker 
 memory leak
 --

 Key: HDFS-4309
 URL: https://issues.apache.org/jira/browse/HDFS-4309
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha, 2.0.2-alpha
Reporter: MaWenJin
  Labels: patch
 Attachments: HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h

 If multiple threads concurrently execute the following method, fs =
 createFileSystem(uri, conf) may be called more than once. This creates
 multiple DFSClient instances, each of which starts a LeaseChecker daemon
 thread; the shutdown hook may not be able to close them before the process
 exits, resulting in a memory leak.
 {code}
 private FileSystem getInternal(URI uri, Configuration conf, Key key)
     throws IOException {
   FileSystem fs = null;
   synchronized (this) {
     fs = map.get(key);
   }
   if (fs != null) {
     return fs;
   }
   // race window: the file system is created without holding the lock
   fs = createFileSystem(uri, conf);
   synchronized (this) {  // refetch the lock again
     FileSystem oldfs = map.get(key);
     if (oldfs != null) { // a file system was created while the lock was released
       fs.close();    // close the new file system
       return oldfs;  // return the old file system
     }
     // now insert the new file system into the map
     if (map.isEmpty() && !clientFinalizer.isAlive()) {
       Runtime.getRuntime().addShutdownHook(clientFinalizer);
     }
     fs.key = key;
     map.put(key, fs);
     if (conf.getBoolean("fs.automatic.close", true)) {
       toAutoClose.add(key);
     }
     return fs;
   }
 }
 {code}
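
HDFS-4309.patch itself is not reproduced in this digest. For illustration
(a hypothetical sketch, not the attached patch; it relies on Java 8's
ConcurrentHashMap.computeIfAbsent, which did not exist when this issue was
filed): making "create if absent" atomic per key guarantees that at most one
cached object, and therefore at most one LeaseChecker-style daemon thread, is
ever created per key:

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a race-free cache; AtomicCache is not a Hadoop class.
final class AtomicCache<K, V> {
  private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
  private final Function<K, V> factory;

  AtomicCache(Function<K, V> factory) {
    this.factory = factory;  // e.g. key -> createFileSystem(uri, conf)
  }

  V get(K key) {
    // computeIfAbsent invokes the factory at most once per key, even under
    // concurrency, so the duplicate-DFSClient race above cannot occur.
    return map.computeIfAbsent(key, factory);
  }
}
{code}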



[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem Object to lead LeaseChecker memory leak

2012-12-16 Thread ChenFolin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenFolin updated HDFS-4309:


Status: Patch Available  (was: Open)

 Multithreaded get through the Cache FileSystem Object to lead LeaseChecker 
 memory leak
 --

 Key: HDFS-4309
 URL: https://issues.apache.org/jira/browse/HDFS-4309
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.2-alpha, 2.0.1-alpha, 0.23.4, 0.23.1, 0.20.205.0
Reporter: MaWenJin
  Labels: patch
 Attachments: HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h




[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem Object to lead LeaseChecker memory leak

2012-12-16 Thread ChenFolin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenFolin updated HDFS-4309:


Affects Version/s: (was: 2.0.2-alpha)

 Multithreaded get through the Cache FileSystem Object to lead LeaseChecker 
 memory leak
 --

 Key: HDFS-4309
 URL: https://issues.apache.org/jira/browse/HDFS-4309
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha
Reporter: MaWenJin
  Labels: patch
 Attachments: HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h




[jira] [Commented] (HDFS-4309) Multithreaded get through the Cache FileSystem Object to lead LeaseChecker memory leak

2012-12-14 Thread ChenFolin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532291#comment-13532291
 ] 

ChenFolin commented on HDFS-4309:
-

Hi Aaron T. Myers,
when I execute dev-support/test-patch.sh against the patch, it reports many
errors, such as:
org.apache.hadoop.record.RecordComparator is deprecated.
and the code is:
{code}
@Deprecated
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class RecordComparator extends WritableComparator {
{code}

So the dev-support/test-patch.sh run failed. What should I do now?

==
==
Determining number of patched javac warnings.
==
==


mvn clean test -DskipTests -DHadoopPatchProcess -Pnative -Ptest-patch > /tmp/patchJavacWarnings.txt 2>&1




{color:red}-1 overall{color}.  

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to fail.




==
==
Finished build.
==
==

 Multithreaded get through the Cache FileSystem Object to lead LeaseChecker 
 memory leak
 --

 Key: HDFS-4309
 URL: https://issues.apache.org/jira/browse/HDFS-4309
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha, 2.0.2-alpha
Reporter: MaWenJin
  Labels: patch
 Attachments: jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h




[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem Object to lead LeaseChecker memory leak

2012-12-13 Thread ChenFolin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenFolin updated HDFS-4309:


  Labels: patch  (was: )
Target Version/s: 2.0.2-alpha
  Status: Patch Available  (was: Open)

 Multithreaded get through the Cache FileSystem Object to lead LeaseChecker 
 memory leak
 --

 Key: HDFS-4309
 URL: https://issues.apache.org/jira/browse/HDFS-4309
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.2-alpha, 2.0.1-alpha, 0.23.4, 0.23.1, 0.20.205.0
Reporter: MaWenJin
  Labels: patch
 Attachments: HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h




[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem Object to lead LeaseChecker memory leak

2012-12-13 Thread ChenFolin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenFolin updated HDFS-4309:


Attachment: HDFS-4309.patch

 Multithreaded get through the Cache FileSystem Object to lead LeaseChecker 
 memory leak
 --

 Key: HDFS-4309
 URL: https://issues.apache.org/jira/browse/HDFS-4309
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha, 2.0.2-alpha
Reporter: MaWenJin
  Labels: patch
 Attachments: HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h




[jira] [Commented] (HDFS-4309) Multithreaded get through the Cache FileSystem Object to lead LeaseChecker memory leak

2012-12-13 Thread ChenFolin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530963#comment-13530963
 ] 

ChenFolin commented on HDFS-4309:
-

HDFS-4309.patch

 Multithreaded get through the Cache FileSystem Object to lead LeaseChecker 
 memory leak
 --

 Key: HDFS-4309
 URL: https://issues.apache.org/jira/browse/HDFS-4309
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha, 2.0.2-alpha
Reporter: MaWenJin
  Labels: patch
 Attachments: HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h




[jira] [Commented] (HDFS-4309) Multithreaded get through the Cache FileSystem Object to lead LeaseChecker memory leak

2012-12-13 Thread ChenFolin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13531910#comment-13531910
 ] 

ChenFolin commented on HDFS-4309:
-

Hi Aaron T. Myers. Following your suggestion, I regenerated the patch; I hope
it is usable now.

 Multithreaded get through the Cache FileSystem Object to lead LeaseChecker 
 memory leak
 --

 Key: HDFS-4309
 URL: https://issues.apache.org/jira/browse/HDFS-4309
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha, 2.0.2-alpha
Reporter: MaWenJin
  Labels: patch
 Attachments: HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h




[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem Object to lead LeaseChecker memory leak

2012-12-13 Thread ChenFolin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenFolin updated HDFS-4309:


Status: Patch Available  (was: Open)

renew patch

 Multithreaded get through the Cache FileSystem Object to lead LeaseChecker 
 memory leak
 --

 Key: HDFS-4309
 URL: https://issues.apache.org/jira/browse/HDFS-4309
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.2-alpha, 2.0.1-alpha, 0.23.4, 0.23.1, 0.20.205.0
Reporter: MaWenJin
  Labels: patch
 Attachments: HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h




[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem Object to lead LeaseChecker memory leak

2012-12-13 Thread ChenFolin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenFolin updated HDFS-4309:


Attachment: HDFS-4309.patch

May be usable now.

 Multithreaded get through the Cache FileSystem Object to lead LeaseChecker 
 memory leak
 --

 Key: HDFS-4309
 URL: https://issues.apache.org/jira/browse/HDFS-4309
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha, 2.0.2-alpha
Reporter: MaWenJin
  Labels: patch
 Attachments: HDFS-4309.patch, HDFS-4309.patch, jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h




[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem Object to lead LeaseChecker memory leak

2012-12-13 Thread ChenFolin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenFolin updated HDFS-4309:


Attachment: (was: HDFS-4309.patch)

 Multithreaded get through the Cache FileSystem Object to lead LeaseChecker 
 memory leak
 --

 Key: HDFS-4309
 URL: https://issues.apache.org/jira/browse/HDFS-4309
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha, 2.0.2-alpha
Reporter: MaWenJin
  Labels: patch
 Attachments: jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h




[jira] [Updated] (HDFS-4309) Multithreaded get through the Cache FileSystem Object to lead LeaseChecker memory leak

2012-12-13 Thread ChenFolin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenFolin updated HDFS-4309:


Attachment: (was: HDFS-4309.patch)

 Multithreaded get through the Cache FileSystem Object to lead LeaseChecker 
 memory leak
 --

 Key: HDFS-4309
 URL: https://issues.apache.org/jira/browse/HDFS-4309
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 0.20.205.0, 0.23.1, 0.23.4, 2.0.1-alpha, 2.0.2-alpha
Reporter: MaWenJin
  Labels: patch
 Attachments: jmap2.log

   Original Estimate: 204h
  Remaining Estimate: 204h

