Hi Ryan,

Thanks for replying so soon.

Whatever it was, it has stopped happening, so I am breathing normally again and it is not so urgent anymore. I need to try to figure out what caused this, though. I get the feeling it is server-resource related - almost as if something were using the HDD or CPU heavily. atop did not show anything unusual, but the one regionserver/datanode was sluggish while I was debugging the problem. It has stopped being sluggish, and it seems too much of a coincidence that it was sluggish at the same time HBase gave those errors. Also, the mention of codec and compression in the logs makes me think it is related to CPU rather than HDD. Syslog and kernel logs also reveal nothing unusual. Any ideas on how to figure out what happened?

Logs in hadoop seem normal. Both datanodes are showing the following:

2010-11-10 00:06:48,510 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36783, bytes: 15480, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:48,621 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36784, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:48,688 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36785, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:48,791 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36786, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:48,940 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36787, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:49,039 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36788, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:49,110 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36789, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:49,204 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36790, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:49,291 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36791, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:49,375 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36792, bytes: 1548, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:49,449 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36793, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:49,555 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36794, bytes: 516, op:

Namenode looks like this:

2010-11-10 00:03:17,947 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 192.168.2.90
2010-11-10 00:05:47,774 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase dst=null perm=null
2010-11-10 00:05:47,775 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/-ROOT- dst=null perm=null
2010-11-10 00:05:47,775 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/.META. dst=null perm=null
2010-11-10 00:05:47,776 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/ChannelUIDTable dst=null perm=null
2010-11-10 00:05:47,777 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/UrlIndex dst=null perm=null
2010-11-10 00:05:47,820 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/UrlIndex-hostCount dst=null perm=null
2010-11-10 00:05:47,820 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/UrlIndex-indexHost dst=null perm=null
2010-11-10 00:05:47,864 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/UrlIndex-indexUrlUID dst=null perm=null
2010-11-10 00:08:17,953 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 192.168.2.90
2010-11-10 00:10:43,052 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase dst=null perm=null
2010-11-10 00:10:43,053 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/-ROOT- dst=null perm=null
2010-11-10 00:10:43,054 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/.META. dst=null perm=null
2010-11-10 00:10:43,054 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/ChannelUIDTable dst=null perm=null
2010-11-10 00:10:43,056 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/UrlIndex dst=null perm=null
2010-11-10 00:10:43,100 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/UrlIndex-hostCount dst=null perm=null
2010-11-10 00:10:43,101 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/UrlIndex-indexHost dst=null perm=null
2010-11-10 00:10:43,143 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/UrlIndex-indexUrlUID dst=null perm=null
2010-11-10 00:13:17,960 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 192.168.2.90

Regards,
Seraph


On 2010/11/10 12:08 AM, "Ryan Rawson" <[email protected]> wrote:

>Looks like you are running into HDFS issues, can you check the datanode logs for errors?
>
>-ryan
>
>On Tue, Nov 9, 2010 at 2:06 PM, Seraph Imalia <[email protected]> wrote:
>> Hi,
>>
>> Some more info: That same Region server just showed the following in the logs too - hope this explains it?
>>
>> Regards,
>> Seraph
>>
>> 649681515:java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/192.168.2.97:50010]
>> at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
>> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
>> at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:1848)
>> at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1922)
>> at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:46)
>> at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:101)
>> at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:88)
>> at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:81)
>> at org.apache.hadoop.io.compress.BlockDecompressorStream.rawReadInt(BlockDecompressorStream.java:121)
>> at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:66)
>> at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
>> at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>> at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>> at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
>> at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1018)
>> at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:966)
>> at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.next(HFile.java:1159)
>> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:58)
>> at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:79)
>> at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:236)
>> at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:106)
>> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1915)
>> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1879)
>> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2500)
>> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2486)
>> at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1733)
>> at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>> at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>>
>> 2010-11-10 00:03:57,903 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=66.45012MB (69678000), Free=341.48737MB (358075472), Max=407.9375MB (427753472), Counts: Blocks=2147, Access=42032, Hit=39143, Miss=2889, Evictions=0, Evicted=0, Ratios: Hit Ratio=93.12666654586792%, Miss Ratio=6.8733349442481995%, Evicted/Run=NaN
>> 2010-11-10 00:04:57,903 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=69.27812MB (72643376), Free=338.65936MB (355110096), Max=407.9375MB (427753472), Counts: Blocks=2192, Access=43926, Hit=40999, Miss=2927, Evictions=0, Evicted=0, Ratios: Hit Ratio=93.33652257919312%, Miss Ratio=6.663479655981064%, Evicted/Run=NaN
>>
>> On 2010/11/09 11:59 PM, "Seraph Imalia" <[email protected]> wrote:
>>
>>>Hi,
>>>
>>>One of our region servers keeps doing the following - it has only just started doing this since 40 minutes ago. Our clients are able to get data from hBase, but after a short while, threads lock up and they start waiting indefinitely for data to be returned. What is wrong? - What do we do? - I am desperate, please help as quick as you can.
>>>
>>>Regards,
>>>Seraph
>>>
>>>2010-11-09 23:49:59,102 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:49:59,159 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:49:59,224 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:49:59,226 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:00,269 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:00,730 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:01,157 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:06,916 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:06,917 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:06,917 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:06,918 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:09,106 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:09,106 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:18,271 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:20,924 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:23,151 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:33,792 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:33,793 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:44,161 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:52,489 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:57,903 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=25.640144MB (26885640), Free=382.29736MB (400867832), Max=407.9375MB (427753472), Counts: Blocks=1493, Access=31181, Hit=28954, Miss=2227, Evictions=0, Evicted=0, Ratios: Hit Ratio=92.85783171653748%, Miss Ratio=7.142169773578644%, Evicted/Run=NaN
>>>2010-11-09 23:50:57,996 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:51:31,922 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:51:31,923 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:51:31,924 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:51:57,903 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=28.028427MB (29389936), Free=379.90906MB (398363536), Max=407.9375MB (427753472), Counts: Blocks=1531, Access=31277, Hit=29008, Miss=2269, Evictions=0, Evicted=0, Ratios: Hit Ratio=92.74546504020691%, Miss Ratio=7.254531979560852%, Evicted/Run=NaN
>>>2010-11-09 23:52:57,903 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=31.233871MB (32751088), Free=376.7036MB (395002384), Max=407.9375MB (427753472), Counts: Blocks=1582, Access=31483, Hit=29168, Miss=2315, Evictions=0, Evicted=0, Ratios: Hit Ratio=92.64682531356812%, Miss Ratio=7.353174686431885%, Evicted/Run=NaN
>>>2010-11-09 23:53:57,903 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=34.532898MB (36210368), Free=373.4046MB (391543104), Max=407.9375MB (427753472), Counts: Blocks=1635, Access=31612, Hit=29246, Miss=2366, Evictions=0, Evicted=0, Ratios: Hit Ratio=92.5154983997345%, Miss Ratio=7.484499365091324%, Evicted/Run=NaN
>>>2010-11-09 23:54:21,831 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:54:57,903 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=37.375MB (39190528), Free=370.5625MB (388562944), Max=407.9375MB (427753472), Counts: Blocks=1681, Access=31761, Hit=29344, Miss=2417, Evictions=0, Evicted=0, Ratios: Hit Ratio=92.39003658294678%, Miss Ratio=7.609961926937103%, Evicted/Run=NaN
>>>2010-11-09 23:55:45,289 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:55:45,289 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:55:48,079 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:55:57,903 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=40.266388MB (42222368), Free=367.6711MB (385531104), Max=407.9375MB (427753472), Counts: Blocks=1728, Access=33834, Hit=31364, Miss=2470, Evictions=0, Evicted=0, Ratios: Hit Ratio=92.69965291023254%, Miss Ratio=7.300348579883575%, Evicted/Run=NaN
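On the CPU-vs-HDD question: one quick first check on a Linux node like this is to compare the deltas of busy CPU ticks against iowait ticks from /proc/stat while the regionserver is sluggish. A minimal sketch (assumes Linux; the 1-second sample window is arbitrary):

```shell
# Rough check: is this node CPU-bound or stuck waiting on disk?
# /proc/stat's aggregate "cpu" line is: user nice system idle iowait irq softirq ...
# We sample twice and compare busy ticks (user+nice+system) vs iowait ticks.
snap() { awk '/^cpu /{print $2+$3+$4, $6}' /proc/stat; }

set -- $(snap); busy1=$1; wait1=$2
sleep 1
set -- $(snap); busy2=$1; wait2=$2

echo "busy_ticks=$((busy2 - busy1)) iowait_ticks=$((wait2 - wait1))"
```

If iowait dominates during the sluggish periods, suspect the disks; if busy ticks dominate, CPU (e.g. decompression) is the better suspect. Where the sysstat package is installed, `iostat -x 1` and `pidstat 1` give the same picture broken down per device and per process.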
