Hi Ryan,

Thanks for replying so quickly.

Whatever it was, it has stopped happening, so I am breathing normally
again and it is not so urgent anymore.  I still need to figure out what
caused it, though.  I get the feeling it is server-resource related -
almost as if something was using the HDD or CPU heavily.  atop did not
show anything unusual, but the one regionserver/datanode was sluggish
while I was debugging the problem.  It has stopped being sluggish, and
it seems too much of a coincidence that it was sluggish at the same time
HBase gave those errors.  Also, the mention of codec and compression in
the logs makes me think it is CPU-related rather than HDD-related.  The
syslog and kernel logs also reveal nothing unusual.  Any ideas on how to
figure out what happened?
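
In the meantime I am going to try replaying the history the monitoring
tools already keep.  If atop is running in its usual logging mode, its
raw files can be replayed for the exact window, and sar (from sysstat)
keeps CPU/disk history too - the paths and the time window below are
guesses based on our setup:

  # Replay atop's raw log around the time HBase threw the errors
  atop -r /var/log/atop/atop_20101109 -b 23:40 -e 00:10

  # CPU and per-disk history from sysstat, if it is enabled
  sar -u -f /var/log/sysstat/sa09
  sar -d -f /var/log/sysstat/sa09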

The Hadoop logs seem normal.  Both datanodes are showing the following:

2010-11-10 00:06:48,510 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36783, bytes: 15480, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:48,621 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36784, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:48,688 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36785, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:48,791 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36786, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:48,940 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36787, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:49,039 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36788, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:49,110 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36789, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:49,204 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36790, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:49,291 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36791, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:49,375 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36792, bytes: 1548, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:49,449 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36793, bytes: 516, op: HDFS_READ, cliID: DFSClient_1620748290, srvID: DS-1090448426-192.168.2.97-50010-1282311128239, blockid: blk_3714134476848125077_129818
2010-11-10 00:06:49,555 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.2.97:50010, dest: /192.168.2.97:36794, bytes: 516, op: 
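
To double-check that nothing worse than INFO is hiding among the
clienttrace noise, I am also grepping the datanode logs for anything
louder - the log path below is just where ours happen to live:

  grep -E 'WARN|ERROR|FATAL|Exception' /usr/local/hadoop/logs/*datanode*.log | tail -n 50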


The namenode log looks like this:

2010-11-10 00:03:17,947 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 192.168.2.90
2010-11-10 00:05:47,774 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase dst=null perm=null
2010-11-10 00:05:47,775 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/-ROOT- dst=null perm=null
2010-11-10 00:05:47,775 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/.META. dst=null perm=null
2010-11-10 00:05:47,776 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/ChannelUIDTable dst=null perm=null
2010-11-10 00:05:47,777 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/UrlIndex dst=null perm=null
2010-11-10 00:05:47,820 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/UrlIndex-hostCount dst=null perm=null
2010-11-10 00:05:47,820 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/UrlIndex-indexHost dst=null perm=null
2010-11-10 00:05:47,864 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/UrlIndex-indexUrlUID dst=null perm=null
2010-11-10 00:08:17,953 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 192.168.2.90
2010-11-10 00:10:43,052 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase dst=null perm=null
2010-11-10 00:10:43,053 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/-ROOT- dst=null perm=null
2010-11-10 00:10:43,054 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/.META. dst=null perm=null
2010-11-10 00:10:43,054 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/ChannelUIDTable dst=null perm=null
2010-11-10 00:10:43,056 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/UrlIndex dst=null perm=null
2010-11-10 00:10:43,100 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/UrlIndex-hostCount dst=null perm=null
2010-11-10 00:10:43,101 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/UrlIndex-indexHost dst=null perm=null
2010-11-10 00:10:43,143 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root ip=/192.168.2.97 cmd=listStatus src=/hbase/UrlIndex-indexUrlUID dst=null perm=null
2010-11-10 00:13:17,960 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 192.168.2.90
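
To correlate, I am also going to pull everything the datanodes logged in
the window of the SocketTimeoutException (roughly 23:49 to 00:07), minus
the clienttrace noise - again, the path is specific to our setup:

  egrep '2010-11-09 23:(49|5[0-9])|2010-11-10 00:0[0-7]' \
      /usr/local/hadoop/logs/*datanode*.log | grep -v clienttrace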


Regards,
Seraph



On 2010/11/10 12:08 AM, "Ryan Rawson" <[email protected]> wrote:

>Looks like you are running into HDFS issues, can you check the
>datanode logs for errors?
>
>-ryan
>
>On Tue, Nov 9, 2010 at 2:06 PM, Seraph Imalia <[email protected]> wrote:
>> Hi,
>>
>> Some more info: That same Region server just showed the following in the
>> logs too - hope this explains it?
>>
>> Regards,
>> Seraph
>>
>> 649681515:java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/192.168.2.97:50010]
>>  at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
>>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
>>  at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:1848)
>>  at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1922)
>>  at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:46)
>>  at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:101)
>>  at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:88)
>>  at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:81)
>>  at org.apache.hadoop.io.compress.BlockDecompressorStream.rawReadInt(BlockDecompressorStream.java:121)
>>  at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:66)
>>  at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
>>  at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>>  at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>>  at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
>>  at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1018)
>>  at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:966)
>>  at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.next(HFile.java:1159)
>>  at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:58)
>>  at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:79)
>>  at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:236)
>>  at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:106)
>>  at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1915)
>>  at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1879)
>>  at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2500)
>>  at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2486)
>>  at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1733)
>>  at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>  at java.lang.reflect.Method.invoke(Method.java:597)
>>  at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>>  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>>
>> 2010-11-10 00:03:57,903 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=66.45012MB (69678000), Free=341.48737MB (358075472), Max=407.9375MB (427753472), Counts: Blocks=2147, Access=42032, Hit=39143, Miss=2889, Evictions=0, Evicted=0, Ratios: Hit Ratio=93.12666654586792%, Miss Ratio=6.8733349442481995%, Evicted/Run=NaN
>> 2010-11-10 00:04:57,903 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=69.27812MB (72643376), Free=338.65936MB (355110096), Max=407.9375MB (427753472), Counts: Blocks=2192, Access=43926, Hit=40999, Miss=2927, Evictions=0, Evicted=0, Ratios: Hit Ratio=93.33652257919312%, Miss Ratio=6.663479655981064%, Evicted/Run=NaN
>>
>>
>>
>> On 2010/11/09 11:59 PM, "Seraph Imalia" <[email protected]> wrote:
>>
>>>Hi,
>>>
>>>One of our region servers keeps doing the following - it only started
>>>doing this 40 minutes ago.  Our clients are able to get data from
>>>hBase, but after a short while, threads lock up and they start waiting
>>>indefinitely for data to be returned.  What is wrong? - What do we
>>>do? - I am desperate, please help as quick as you can.
>>>
>>>Regards,
>>>Seraph
>>>
>>>2010-11-09 23:49:59,102 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:49:59,159 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:49:59,224 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:49:59,226 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:00,269 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:00,730 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:01,157 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:06,916 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:06,917 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:06,917 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:06,918 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:09,106 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:09,106 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:18,271 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:20,924 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:23,151 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:33,792 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:33,793 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:44,161 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:52,489 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:50:57,903 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=25.640144MB (26885640), Free=382.29736MB (400867832), Max=407.9375MB (427753472), Counts: Blocks=1493, Access=31181, Hit=28954, Miss=2227, Evictions=0, Evicted=0, Ratios: Hit Ratio=92.85783171653748%, Miss Ratio=7.142169773578644%, Evicted/Run=NaN
>>>2010-11-09 23:50:57,996 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:51:31,922 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:51:31,923 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:51:31,924 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:51:57,903 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=28.028427MB (29389936), Free=379.90906MB (398363536), Max=407.9375MB (427753472), Counts: Blocks=1531, Access=31277, Hit=29008, Miss=2269, Evictions=0, Evicted=0, Ratios: Hit Ratio=92.74546504020691%, Miss Ratio=7.254531979560852%, Evicted/Run=NaN
>>>2010-11-09 23:52:57,903 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=31.233871MB (32751088), Free=376.7036MB (395002384), Max=407.9375MB (427753472), Counts: Blocks=1582, Access=31483, Hit=29168, Miss=2315, Evictions=0, Evicted=0, Ratios: Hit Ratio=92.64682531356812%, Miss Ratio=7.353174686431885%, Evicted/Run=NaN
>>>2010-11-09 23:53:57,903 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=34.532898MB (36210368), Free=373.4046MB (391543104), Max=407.9375MB (427753472), Counts: Blocks=1635, Access=31612, Hit=29246, Miss=2366, Evictions=0, Evicted=0, Ratios: Hit Ratio=92.5154983997345%, Miss Ratio=7.484499365091324%, Evicted/Run=NaN
>>>2010-11-09 23:54:21,831 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:54:57,903 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=37.375MB (39190528), Free=370.5625MB (388562944), Max=407.9375MB (427753472), Counts: Blocks=1681, Access=31761, Hit=29344, Miss=2417, Evictions=0, Evicted=0, Ratios: Hit Ratio=92.39003658294678%, Miss Ratio=7.609961926937103%, Evicted/Run=NaN
>>>2010-11-09 23:55:45,289 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:55:45,289 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:55:48,079 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>2010-11-09 23:55:57,903 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=40.266388MB (42222368), Free=367.6711MB (385531104), Max=407.9375MB (427753472), Counts: Blocks=1728, Access=33834, Hit=31364, Miss=2470, Evictions=0, Evicted=0, Ratios: Hit Ratio=92.69965291023254%, Miss Ratio=7.300348579883575%, Evicted/Run=NaN
>>>
>>>
>>>
>>
>>
>>
>>


