[jira] [Commented] (HDFS-6999) PacketReceiver#readChannelFully is in an infinite loop

2014-09-25 Thread Yang Jiandan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14147574#comment-14147574
 ] 

Yang Jiandan commented on HDFS-6999:


@Jianshi Huang your problem is not the same as ours, although the phenomenon looks
similar (in our case all HBase handlers are blocked). The good news is that we also
ran into this issue and have resolved it. I have now attached a patch file in
HDFS-7145. Details are in https://issues.apache.org/jira/browse/HDFS-7145

 PacketReceiver#readChannelFully is in an infinite loop
 --

 Key: HDFS-6999
 URL: https://issues.apache.org/jira/browse/HDFS-6999
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, hdfs-client
Affects Versions: 2.4.1
Reporter: Yang Jiandan
Priority: Critical

 In our cluster, we found that an HBase handler may never return when it reads an
 HDFS file using RemoteBlockReader2, and the handler thread occupies 100% CPU. We
 found this is because PacketReceiver#readChannelFully is in an infinite loop:
 the following while loop never breaks.
 {code:java}
 while (buf.remaining() > 0) {
   int n = ch.read(buf);
   if (n < 0) {
     throw new IOException("Premature EOF reading from " + ch);
   }
 }
 {code}
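
For illustration only (this is not HDFS source, just a self-contained sketch of the
quoted loop): if the channel's read() keeps returning 0 instead of blocking or
returning -1, buf.remaining() never shrinks and the EOF branch never fires, so the
loop spins and burns a full core.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

public class ZeroReadLoopDemo {
  // A stand-in channel that always reports "no bytes available right now":
  // it never makes progress (n > 0) and never signals EOF (n < 0).
  static final ReadableByteChannel STALLED = new ReadableByteChannel() {
    @Override public int read(ByteBuffer dst) { return 0; }
    @Override public boolean isOpen() { return true; }
    @Override public void close() {}
  };

  // Same shape as the quoted loop from PacketReceiver#readChannelFully.
  static void readChannelFully(ReadableByteChannel ch, ByteBuffer buf) throws IOException {
    while (buf.remaining() > 0) {
      int n = ch.read(buf);
      if (n < 0) {
        throw new IOException("Premature EOF reading from " + ch);
      }
      // n == 0 falls through: the loop retries immediately, at 100% CPU.
    }
  }

  public static void main(String[] args) throws IOException {
    // Intentionally never returns: demonstrates the spin described above.
    readChannelFully(STALLED, ByteBuffer.allocate(64));
  }
}
{code}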



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6999) PacketReceiver#readChannelFully is in an infinite loop

2014-09-25 Thread Jianshi Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14147597#comment-14147597
 ] 

Jianshi Huang commented on HDFS-6999:
-

Looks like it's the same problem; your HBase triggered it and my Spark program
triggered it.

I also checked my jars. It looks like remotePeerFactory.newConnectedPeer does set
the socketTimeout:

{code:java}
  @Override // RemotePeerFactory
  public Peer newConnectedPeer(InetSocketAddress addr) throws IOException {
    Peer peer = null;
    boolean success = false;
    Socket sock = null;
    try {
      sock = socketFactory.createSocket();
      NetUtils.connect(sock, addr,
          getRandomLocalInterfaceAddr(),
          dfsClientConf.socketTimeout);
      peer = TcpPeerServer.peerFromSocketAndKey(sock,
          getDataEncryptionKey());
      success = true;
      return peer;
    } finally {
      if (!success) {
        IOUtils.cleanup(LOG, peer);
        IOUtils.closeSocket(sock);
      }
    }
  }
{code}
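
As a side note on the timeout itself (a general java.net.Socket observation, not a
claim about the HDFS code path above): the timeout passed to connect() only bounds
connection establishment, while blocking reads are only bounded if SO_TIMEOUT is
set as well. A minimal sketch, with made-up host, port, and timeout values:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

public class SocketTimeoutSketch {
  public static void main(String[] args) throws IOException {
    try (Socket sock = new Socket()) {
      // Bounds only the TCP handshake.
      sock.connect(new InetSocketAddress("datanode.example.com", 50010), 60_000);
      // Bounds each subsequent blocking read(); without it a read can block indefinitely.
      sock.setSoTimeout(60_000);
      InputStream in = sock.getInputStream();
      // Throws java.net.SocketTimeoutException if no data arrives within 60s.
      int b = in.read();
      System.out.println("first byte: " + b);
    }
  }
}
{code}

Note, though, that a read timeout only helps when the read actually blocks; in the
spin described in this issue the thread sits at 100% CPU, which suggests read() is
returning 0 immediately, so a timeout would never get a chance to fire.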



Jianshi



[jira] [Commented] (HDFS-6999) PacketReceiver#readChannelFully is in an infinite loop

2014-09-24 Thread Jianshi Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14147443#comment-14147443
 ] 

Jianshi Huang commented on HDFS-6999:
-

I'm having the same issue. And it's reproducible.

My stacktrace looks like this:

Executor task launch worker-3
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:257)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:173)
at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:138)
at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:683)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:739)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:796)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:837)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:599)
at parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:360)
at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:100)
at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:139)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)

I'm running Spark 1.1.0 on HDP 2.1 (HDFS 2.4.0); the task reads a bunch of Parquet
files.

Jianshi



[jira] [Commented] (HDFS-6999) PacketReceiver#readChannelFully is in an infinite loop

2014-09-08 Thread Yang Jiandan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125275#comment-14125275
 ] 

Yang Jiandan commented on HDFS-6999:


We can't reproduce it stably and don't know the exact combination that triggers it
yet. In our configuration dfs.datanode.transferTo.allowed = true, so we suspect
that BlockSender may, for some reason, send only the header of a packet and not its
data part.
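
For anyone who wants to double-check the effective value of that flag, a small
sketch using the Hadoop Configuration API (the explicit addResource call and the
fallback default of true are assumptions; verify against your own hdfs-site.xml
and Hadoop version):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class TransferToCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Make sure the HDFS site file on the classpath is loaded (assumption: it is named hdfs-site.xml).
    conf.addResource("hdfs-site.xml");
    // Key name taken from the comment above; the fallback default of true is an assumption.
    boolean transferToAllowed = conf.getBoolean("dfs.datanode.transferTo.allowed", true);
    System.out.println("dfs.datanode.transferTo.allowed = " + transferToAllowed);
  }
}
{code}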



[jira] [Commented] (HDFS-6999) PacketReceiver#readChannelFully is in an infinite loop

2014-09-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14123678#comment-14123678
 ] 

stack commented on HDFS-6999:
-

Any chance of your having the particular combination that brings on the 
infinite loop [~yangjiandan]?  Can you reproduce at all?  Thanks.



[jira] [Commented] (HDFS-6999) PacketReceiver#readChannelFully is in an infinite loop

2014-09-04 Thread Yang Jiandan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122300#comment-14122300
 ] 

Yang Jiandan commented on HDFS-6999:


The stack is:
regionserver60020-largeCompactions-1409055324582 daemon prio=10 tid=0x01080800 nid=0x2c7c runnable [0x601cb000]
   java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.net.unix.DomainSocket.readByteBufferDirect0(Native Method)
at org.apache.hadoop.net.unix.DomainSocket.access$400(DomainSocket.java:45)
at org.apache.hadoop.net.unix.DomainSocket$DomainChannel.read(DomainSocket.java:628)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:173)
at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:138)
- locked 0x00047c41f7e0 (a org.apache.hadoop.hdfs.RemoteBlockReader2)
at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:682)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:738)
- locked 0x0004aaceca60 (a org.apache.hadoop.hdfs.DFSInputStream)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:795)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:836)
- locked 0x0004aaceca60 (a org.apache.hadoop.hdfs.DFSInputStream)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:563)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1215)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1430)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1312)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:392)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.readNextDataBlock(HFileReaderV2.java:643)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:757)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:136)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:507)
at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:217)
at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:76)
at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:109)
at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1086)
at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1480)
at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:475)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

   Locked ownable synchronizers:
- 0x00049e162b60 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
- 0x0005974a84f0 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
- 0x00065e45cf58 (a java.util.concurrent.ThreadPoolExecutor$Worker)
