[jira] [Commented] (HDFS-12737) Thousands of sockets lingering in TIME_WAIT state due to frequent file open operations

2023-05-11 Thread Dheeren Beborrtha (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721964#comment-17721964
 ] 

Dheeren Beborrtha commented on HDFS-12737:
--

We are observing this issue in an HBase cluster of around 75 RegionServers, 
where the RegionServer logs are littered with the following:
{noformat}
2023-05-09 18:47:46,092 WARN  
[RpcServer.default.FPBQ.Fifo.handler=27,queue=3,port=16020] hdfs.DFSClient: 
Connection failure: Failed to connect to 
hbase1wn41-0.subnetpoc1.vcn12231050.oraclevcn.com/10.1.64.234:1019 for file 
/apps/hbase/data/data/default/usertable2/fe172ff893d8afcf20c008e3765077da/cf/921cfad177b0434a957079cd4506c834
 for block 
BP-1395570538-10.1.21.157-1682117242080:blk_1093623349_19885353:org.apache.hadoop.net.ConnectTimeoutException:
 6 millis timeout while waiting for channel to be ready for connect. ch : 
java.nio.channels.SocketChannel[connection-pending 
remote=hbase1wn41-0.subnetpoc1.vcn12231050.oraclevcn.com/10.1.64.234:1019]
org.apache.hadoop.net.ConnectTimeoutException: 6 millis timeout while 
waiting for channel to be ready for connect. ch : 
java.nio.channels.SocketChannel[connection-pending 
remote=hbase1wn41-0.subnetpoc1.vcn12231050.oraclevcn.com/10.1.64.234:1019]
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:589)
        at 
org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3033)
        at 
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:829)
        at 
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:754)
        at 
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:381)
        at 
org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:755)
        at 
org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1199)
        at 
org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1151)
        at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1511)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1475)
        at 
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:98)
        at 
org.apache.hadoop.hbase.io.util.BlockIOUtils.preadWithExtra(BlockIOUtils.java:233)
        at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1456)
        at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1679)
        at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1490)
        at 
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1308)
        at 
org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:318)
        at 
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:659)
        at 
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:612)
        at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:306)
        at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:214)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:408)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.&lt;init&gt;(StoreScanner.java:253)
        at 
org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:2100)
        at 
org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2091)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:7049)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.&lt;init&gt;(HRegion.java:7029)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:3043)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:3023)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:3005)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2999)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2614)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2538)
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45945)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:384)
        at 
org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131){noformat}
{noformat}
[root@hbase1wn61-0 ~]# netstat -nat | awk '{print $6}' | sort | uniq -c | sort -n
      1 established)
      1 Foreign
      1 SYN_RECV
      2 FIN_WAIT1
      2 
{noformat}
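To pin the TIME_WAIT build-up on the DataNode transfer port (1019 in the logs above), the per-state counts can also be broken down by remote port. A minimal sketch, run here against canned sample lines (the addresses and ports are illustrative) instead of live `ss -tan` output:

```shell
# Sample lines standing in for `ss -tan` output (State Recv-Q Send-Q Local Peer)
# so the pipeline below is self-contained; in production, pipe `ss -tan` instead.
sample='ESTAB 0 0 10.1.64.10:34512 10.1.64.234:1019
TIME-WAIT 0 0 10.1.64.10:34514 10.1.64.234:1019
TIME-WAIT 0 0 10.1.64.10:34518 10.1.64.234:1019
TIME-WAIT 0 0 10.1.64.10:55120 10.1.21.157:8020'

# Count TIME-WAIT sockets grouped by remote (peer) port, highest count first.
printf '%s\n' "$sample" \
  | awk '$1 == "TIME-WAIT" { split($5, a, ":"); count[a[2]]++ }
         END { for (p in count) print count[p], p }' \
  | sort -rn
```

If the top line is dominated by the DataNode transfer port, that corroborates frequent short-lived DFSClient read connections as the source of the lingering sockets.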

[jira] [Commented] (HDFS-3752) BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace at ANN in case of BKJM

2013-10-21 Thread Dheeren Beborrtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801113#comment-13801113
 ] 

Dheeren Beborrtha commented on HDFS-3752:
-

Any ETA on this fix?

 BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace 
 at ANN in case of BKJM
 ---

 Key: HDFS-3752
 URL: https://issues.apache.org/jira/browse/HDFS-3752
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: 2.0.0-alpha
Reporter: Vinay
Assignee: Todd Lipcon

 1. do {{saveNameSpace}} on the ANN node by entering safemode
 2. on another new node, install a standby NN and do BOOTSTRAPSTANDBY
 3. Now the standby NN will not be able to copy the fsimage_txid from the ANN.
 This is because the SNN is not able to find the next txid (txid+1) in shared 
 storage.
 Just after {{saveNameSpace}}, the shared storage will have a new log segment 
 with only the START_LOG_SEGMENT edits op,
 and BookKeeper will not be able to read the last entry from the in-progress ledger.
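For reference, the reproduction steps above map roughly onto the following operational commands; this is a sketch of the standard HDFS HA bootstrap procedure, not commands taken from the report itself:

```shell
# On the active NameNode (ANN): checkpoint the namespace.
# saveNamespace requires safe mode to be on first.
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave

# On the new standby host: pull the latest fsimage from the ANN.
# With BKJM shared edits this is where the failure shows up: the new
# log segment's first entry (txid+1) sits in an in-progress BookKeeper
# ledger that cannot yet be read back.
hdfs namenode -bootstrapStandby
```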



--
This message was sent by Atlassian JIRA
(v6.1#6144)