[jira] [Commented] (HDFS-6999) PacketReceiver#readChannelFully is in an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147574#comment-14147574 ]

Yang Jiandan commented on HDFS-6999:
------------------------------------

@Jianshi Huang: your problem is not quite the same as ours, although the symptom looks alike; in our case all HBase handlers are blocked. The good news is that we hit the same underlying issue and have resolved it. I have attached a patch file to HDFS-7145; details are in https://issues.apache.org/jira/browse/HDFS-7145

> PacketReceiver#readChannelFully is in an infinite loop
> ------------------------------------------------------
>
>                 Key: HDFS-6999
>                 URL: https://issues.apache.org/jira/browse/HDFS-6999
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs-client
>    Affects Versions: 2.4.1
>            Reporter: Yang Jiandan
>            Priority: Critical
>
> In our cluster, we found that an HBase handler may never return when it reads an HDFS file using RemoteBlockReader2, and the handler thread occupies 100% CPU. We found this is because PacketReceiver#readChannelFully is in an infinite loop: the following while loop never breaks.
> {code:java}
> while (buf.remaining() > 0) {
>   int n = ch.read(buf);
>   if (n < 0) {
>     throw new IOException("Premature EOF reading from " + ch);
>   }
> }
> {code}
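For context, the quoted loop can spin only if ch.read(buf) keeps returning 0: a negative return throws, and a positive return makes progress toward buf.remaining() == 0. A minimal defensive sketch of the idea, with an arbitrary illustrative bound (this is not the fix attached to HDFS-7145):

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

public final class ReadFullyExample {
  /**
   * Defensive variant of readChannelFully. The original loop makes no
   * progress when read() returns 0 (no data, no EOF) and therefore spins
   * at 100% CPU. Here a long run of zero-byte reads is treated as a stuck
   * connection. The bound is arbitrary and purely illustrative.
   */
  static void readChannelFully(ReadableByteChannel ch, ByteBuffer buf)
      throws IOException {
    int zeroReads = 0;
    while (buf.remaining() > 0) {
      int n = ch.read(buf);
      if (n < 0) {
        throw new IOException("Premature EOF reading from " + ch);
      }
      if (n == 0) {
        // A blocking channel should not return 0 here; treat a long run of
        // zero-byte reads as a stalled connection instead of spinning.
        if (++zeroReads > 1000) { // hypothetical bound, not a real tunable
          throw new IOException("Read made no progress on " + ch);
        }
      } else {
        zeroReads = 0;
      }
    }
  }
}
{code}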
[jira] [Commented] (HDFS-6999) PacketReceiver#readChannelFully is in an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147597#comment-14147597 ]

Jianshi Huang commented on HDFS-6999:
--------------------------------------

Looks like it's the same problem: your HBase triggered it and my Spark program triggered it.

I also checked my jars. It looks like remotePeerFactory.newConnectedPeer does set the socketTimeout:

{code:java}
@Override // RemotePeerFactory
public Peer newConnectedPeer(InetSocketAddress addr) throws IOException {
  Peer peer = null;
  boolean success = false;
  Socket sock = null;
  try {
    sock = socketFactory.createSocket();
    NetUtils.connect(sock, addr, getRandomLocalInterfaceAddr(),
        dfsClientConf.socketTimeout);
    peer = TcpPeerServer.peerFromSocketAndKey(sock, getDataEncryptionKey());
    success = true;
    return peer;
  } finally {
    if (!success) {
      IOUtils.cleanup(LOG, peer);
      IOUtils.closeSocket(sock);
    }
  }
}
{code}

Jianshi
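One thing worth noting about the snippet above: dfsClientConf.socketTimeout is passed to NetUtils.connect, so it bounds only the TCP connect. If the established peer carries no read timeout, a read on a silent or dead channel can wait indefinitely. A minimal sketch of also bounding reads, assuming a Peer#setReadTimeout method is available on this branch (an assumption for illustration, not the committed fix):

{code:java}
try {
  sock = socketFactory.createSocket();
  NetUtils.connect(sock, addr, getRandomLocalInterfaceAddr(),
      dfsClientConf.socketTimeout);              // bounds the connect only
  peer = TcpPeerServer.peerFromSocketAndKey(sock, getDataEncryptionKey());
  // Hypothetical addition: give the peer a read timeout too, so a read on
  // a stalled connection fails instead of waiting forever.
  peer.setReadTimeout(dfsClientConf.socketTimeout);
  success = true;
  return peer;
} finally {
  if (!success) {
    IOUtils.cleanup(LOG, peer);
    IOUtils.closeSocket(sock);
  }
}
{code}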
[jira] [Commented] (HDFS-6999) PacketReceiver#readChannelFully is in an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147443#comment-14147443 ]

Jianshi Huang commented on HDFS-6999:
--------------------------------------

I'm having the same issue, and it's reproducible. My stack trace looks like this:

{noformat}
"Executor task launch worker-3"
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:257)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
	at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
	at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:173)
	at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:138)
	at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:683)
	at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:739)
	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:796)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:837)
	at java.io.DataInputStream.readFully(DataInputStream.java:195)
	at java.io.DataInputStream.readFully(DataInputStream.java:169)
	at parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:599)
	at parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:360)
	at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:100)
	at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
	at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:139)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
{noformat}

I'm running Spark 1.1.0 on HDP 2.1 (HDFS 2.4.0); the task reads a bunch of Parquet files.

Jianshi
[jira] [Commented] (HDFS-6999) PacketReceiver#readChannelFully is in an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125275#comment-14125275 ]

Yang Jiandan commented on HDFS-6999:
------------------------------------

We can't reproduce it reliably and don't know the exact trigger combination yet. In our configuration dfs.datanode.transferTo.allowed = true, so we suspect that, for some reason, BlockSender may send only the header of a packet and not the data part.
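If the transferTo path is the culprit, one plausible mechanism is that FileChannel.transferTo() is allowed to transfer fewer bytes than requested, so a sender must loop until the whole region is written. A generic sketch of the pattern (not BlockSender's actual code; names are illustrative):

{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

public final class TransferFullyExample {
  /**
   * FileChannel.transferTo() may move fewer bytes than requested, so the
   * caller must loop. A sender that issues a single call could ship a
   * packet header and only part (or none) of the packet body, leaving the
   * receiver blocked waiting for data that never arrives.
   */
  static void transferFully(FileChannel src, long position, long count,
      WritableByteChannel dst) throws IOException {
    long sent = 0;
    while (sent < count) {
      long n = src.transferTo(position + sent, count - sent, dst);
      if (n <= 0) {
        throw new IOException("transferTo made no progress at offset "
            + (position + sent));
      }
      sent += n;
    }
  }
}
{code}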
[jira] [Commented] (HDFS-6999) PacketReceiver#readChannelFully is in an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123678#comment-14123678 ]

stack commented on HDFS-6999:
------------------------------

Any chance of your having the particular combination that brings on the infinite loop [~yangjiandan]? Can you reproduce at all? Thanks.
[jira] [Commented] (HDFS-6999) PacketReceiver#readChannelFully is in an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122300#comment-14122300 ]

Yang Jiandan commented on HDFS-6999:
------------------------------------

The stack is:

{noformat}
"regionserver60020-largeCompactions-1409055324582" daemon prio=10 tid=0x01080800 nid=0x2c7c runnable [0x601cb000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.net.unix.DomainSocket.readByteBufferDirect0(Native Method)
	at org.apache.hadoop.net.unix.DomainSocket.access$400(DomainSocket.java:45)
	at org.apache.hadoop.net.unix.DomainSocket$DomainChannel.read(DomainSocket.java:628)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
	at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:173)
	at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:138)
	- locked <0x00047c41f7e0> (a org.apache.hadoop.hdfs.RemoteBlockReader2)
	at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:682)
	at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:738)
	- locked <0x0004aaceca60> (a org.apache.hadoop.hdfs.DFSInputStream)
	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:795)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:836)
	- locked <0x0004aaceca60> (a org.apache.hadoop.hdfs.DFSInputStream)
	at java.io.DataInputStream.read(DataInputStream.java:149)
	at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:563)
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1215)
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1430)
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1312)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:392)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.readNextDataBlock(HFileReaderV2.java:643)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:757)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:136)
	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:507)
	at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:217)
	at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:76)
	at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:109)
	at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1086)
	at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1480)
	at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:475)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)

   Locked ownable synchronizers:
	- <0x00049e162b60> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	- <0x0005974a84f0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	- <0x00065e45cf58> (a java.util.concurrent.ThreadPoolExecutor$Worker)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)