Hi Otis  thanks for reply,
servers are identical in terms of hardware, jvm.
right now I cannot afford to restart my any machines, it's in the
production environment :D.
I will give a shot for some other clusters some time later.

On Wed, Sep 26, 2012 at 11:50 AM, Otis Gospodnetic <
[email protected]> wrote:

> Hi,
>
> And the servers are identical in terms of hardware, JVM, etc.?
> What does your monitoring tool show is different on problematic
> machines, other than the load?  How's JVM heap and GC looking?  Have a
> look at SPM for HBase (see sig).  You could install the agent/client
> on the problematic servers + a couple non-problematic servers and
> visually compare a bunch of different HBase/JVM/OS metrics.
>
> HTH
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
>
>
> On Tue, Sep 25, 2012 at 11:45 PM, Yusup Ashrap <[email protected]> wrote:
> > Hi all , I am new to hbase. I have a 30+ nodes sized cluster running up
> in
> > the production environment.
> > problem is that several nodes  of cluster suffer  from occassional  high
> > loads( 30-60) .twice or three times a day and it happens every day.
> > It's kinda emergeny situation for me now and I dont have too much time to
> > dive in hbase to figure out what's wrong with my cluster.
> > I hope someone help me  point out  what is or could be  wrong with my
> > cluster or any steps to how to find out the problem , thanks
> >
> >
> > *hbase version*:0.90.2, r
> > *region server metrics*:
> > request=3256.1, readRequest=26313.0, readDataSize=2173227.0,
> > failedReadRequest=0.0, writeRequest=6253.0, writeDataSize=4273037.0,
> > failedWriteRequest=0.0, readResponseTime=29935.0,
> writeResponseTime=9298.0,
> > memStoreHitCount=0.0, memStoreMissCount=16413.0, rpcRequestCount=31877.0,
> > rpcRequestTime=790.0, aliveHandlerNum=200.0, aliveReaderNum=10.0,
> > handlerQueueSize=0.0, regions=338, stores=426, storefiles=354,
> > storefileSize=108349, storefileIndexSize=1116, memstoreSize=1550,
> > compactionQueueSize=0, flushQueueSize=0, usedHeap=10217, maxHeap=15872,
> > blockCacheSize=5122994416, blockCacheFree=1534205200,
> > blockCacheCount=71481, blockCacheHitCount=36328173832,
> > metaBlockCacheHitCount=23066398881, dataBlockCacheHitCount=13261774951,
> > blockCacheMissCount=2133761468, metaBlockCacheMissCount=219363,
> > dataBlockCacheMissCount=2133542105, blockCacheEvictedCount=1634548925,
> > blockCacheHitRatio=94, metaBlockCacheHitRatio=99,
> > dataBlockCacheHitRatio=86, blockCacheHitCachingRatio=95,
> > hdfsBlocksLocalityIndex=86
> >
> > here is my region server( load avg 30+ )'s log:
> >
> > 2012-09-26 11:04:50,185 INFO org.apache.hadoop.hbase.regionserver.Store:
> > Added
> >
> hdfs://hdfs_xxx:9516/hbase/table_a/232d346a258d160bc61618bafe91f047/info/865605993795928725,
> > entries=79472, sequenceid=11417919090, memsize=21.7m, filesize=1.2m
> > 2012-09-26 11:04:50,187 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Finished memstore flush of ~21.7m for region
> >
> table_a,10396613269\x01NLKMLMHFGFOOO\x01NLKMLMHFGFLFM\x01AntispamPunisher\x01011\x01000000,1345016652050.232d346a258d160bc61618bafe91f047.
> > in 752ms, sequenceid=11417919090, compaction requested=true
> > 2012-09-26 11:04:50,187 DEBUG
> > org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
> > requested for
> >
> table_a,10396613269\x01NLKMLMHFGFOOO\x01NLKMLMHFGFLFM\x01AntispamPunisher\x01011\x01000000,1345016652050.232d346a258d160bc61618bafe91f047.
> > because regionserver60020.cacheFlusher; priority=16, compaction queue
> > size=19
> > 2012-09-26 11:05:01,431 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
> > started; Attempting to free 634.93 MB of total=5.27 GB
> > 2012-09-26 11:05:01,505 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
> > completed; freed=634.96 MB, total=4.65 GB, single=1.43 GB, multi=3.79 GB,
> > memory=0 KB
> > 2012-09-26 11:05:01,544 INFO org.apache.hadoop.io.compress.CodecPool: Got
> > brand-new decompressor
> > 2012-09-26 11:05:21,256 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
> > started; Attempting to free 634.92 MB of total=5.27 GB
> > 2012-09-26 11:05:21,333 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
> > completed; freed=634.94 MB, total=4.65 GB, single=1.43 GB, multi=3.79 GB,
> > memory=0 KB
> > 2012-09-26 11:05:40,451 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
> > started; Attempting to free 634.93 MB of total=5.27 GB
> > 2012-09-26 11:05:40,515 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
> > completed; freed=634.99 MB, total=4.65 GB, single=1.42 GB, multi=3.8 GB,
> > memory=0 KB
> > 2012-09-26 11:05:59,978 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
> > started; Attempting to free 634.93 MB of total=5.27 GB
> > 2012-09-26 11:06:00,051 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
> > completed; freed=634.99 MB, total=4.65 GB, single=1.42 GB, multi=3.8 GB,
> > memory=0 KB
> > 2012-09-26 11:06:07,836 INFO org.apache.hadoop.io.compress.CodecPool: Got
> > brand-new decompressor
> > 2012-09-26 11:06:18,443 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
> > started; Attempting to free 634.91 MB of total=5.27 GB
> > 2012-09-26 11:06:18,505 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
> > completed; freed=634.93 MB, total=4.65 GB, single=1.43 GB, multi=3.79 GB,
> > memory=0 KB
> > 2012-09-26 11:06:38,674 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
> > started; Attempting to free 634.89 MB of total=5.27 GB
> > 2012-09-26 11:06:38,739 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
> > completed; freed=634.93 MB, total=4.65 GB, single=1.43 GB, multi=3.79 GB,
> > memory=0 KB
> > 2012-09-26 11:06:50,616 INFO
> > org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using
> > syncFs -- HDFS-200
> > 2012-09-26 11:06:50,630 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLog:
> > Roll
> >
> /hbase/.logs/my226021.cm4,60020,1333724247105/my226021.cm4%3A60020.1348628673307,
> > entries=81412, filesize=63758937. New hlog
> >
> /hbase/.logs/my226021.cm4,60020,1333724247105/my226021.cm4%3A60020.1348628810604
> > 2012-09-26 11:06:50,659 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLog:
> > Too many hlogs: logs=33, maxlogs=32; forcing flush of 45 regions(s):
> > 016138b1500faade07bc92694f222dd0, 064d4407646b317e7cfd4c35efa0ecfd,
> > 07a185ab3fd56c75ee51dadc75911d55, 0842a4e65aea6d65c1d3ce4425ab8ef0,
> > 0cfafa661d1819b4e81a4729c89df010, 1c435cddad229f4bf8283f4642cb5f0d,
> > 293f2e6c4925f65e370015784a1392a9, 363e160b42f4025a58f074260f1161ce,
> > 37e28e04133dfd3a74a6acb7ee5a45f8, 38766d1c77e373fb476fa9854443531c,
> > 3f4a868b7233ec6fe8be88e3c2ca73ba, 45cf01b2f8744b033087281308862859,
> > 468933ece8913bd7e06c2a6cbd9dc5e6, 480cc150655848e55217b3c63c2ea749,
> > 4c4c7beee4eeb8e3d4d1e689aa16ae39, 5045e2289406a2ebe6fda7d15cd64e4d,
> > 519a99c62386004dc6fb22308c6b0ac0, 578a6c0d02b8cc0807e60b452c1e5add,
> > 5c130a1e57b5dc85b40bc7dd10b4aa9b, 6259605302af1b5ef3accd2c34aeecd7,
> > 685370c4349d78fd04efbe071183cc0e, 69080c7e1d14fefccc5e106e955e405d,
> > 6d2d97fcef2886ce2936cbf611648de7, 855e3f6a7eaa27b6a6952a33c06f47c9,
> > 875b26219f934896756b3f80fd7e7a75, 8fd673d4475a0294845a8681253432ff,
> > 9993f31115e8e59450f833771b03a95e, a0cb8dd41fa75f46fc9917c53f7b3b92,
> > a6b0b04c13f22dd20b5fd5c3a0db5270, a8d401355719a12d9f7d2881a4be9fc8,
> > ad6625d0afe0199eb7ba05c07f0a87a3, b845d168b3ab52fe5ae9e2e4366098fc,
> > b9f1139ea5a2e9b512a3e1a173212d70, bec996064a36cc306af18f771965679c,
> > c744a27faf49953ea5aa17708e4aefdd, cab712596aeaa41fc6465696ed4ad74a,
> > cf83ec10220cd516ba55a0b2a5c2f711, d00c2663772b4f453a4c946032de66df,
> > d7e596ea520902f4ed35ce28449a0593, d9058055eab307828c4bb41fab664ab5,
> > e43b26752173becfdffbbc75c70f849e, ecc4925b7791123cc452a39c542cbf9f,
> > efbb1004f1b68c98350fbe38b343a308, f1288916c8b877ccf6cac252e1af0a18,
> > fc5446f5a44ca115ba63a21dca420c64
> > 2012-09-26 11:06:50,659 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Started memstore flush for
> >
> table_a,10904703985\x01NLKMHOJIOHOOO\x01NLKMHOJIOHHIM\x01PriceDispPunisher\x01031\x01000000,1345202033439.016138b1500faade07bc92694f222dd0.,
> > current region memstore size 24.9m
> > 2012-09-26 11:06:50,659 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Finished snapshotting, commencing flushing stores
> > 2012-09-26 11:06:51,389 INFO
> > org.apache.hadoop.hbase.regionserver.StoreFile: Bloom added to HFile
> >
> (hdfs://hdfs_xxx:9516/hbase/table_a/016138b1500faade07bc92694f222dd0/.tmp/4855940649973018735):
> > 13.4k, 5703/11406 (50%)
> > 2012-09-26 11:06:52,055 INFO org.apache.hadoop.hbase.regionserver.Store:
> > Renaming flushed file at
> >
> hdfs://hdfs_xxx:9516/hbase/table_a/016138b1500faade07bc92694f222dd0/.tmp/4855940649973018735
> > to
> >
> hdfs://hdfs_xxx:9516/hbase/table_a/016138b1500faade07bc92694f222dd0/info/6325203002828183467
> > 2012-09-26 11:06:52,070 INFO
> > org.apache.hadoop.hbase.regionserver.StoreFile$Reader: Loaded row bloom
> > filter metadata for
> >
> hdfs://hdfs_xxx:9516/hbase/table_a/016138b1500faade07bc92694f222dd0/info/6325203002828183467
> > 2012-09-26 11:06:52,070 INFO org.apache.hadoop.hbase.regionserver.Store:
> > Added
> >
> hdfs://hdfs_xxx:9516/hbase/table_a/016138b1500faade07bc92694f222dd0/info/6325203002828183467,
> > entries=91248, sequenceid=11417991467, memsize=24.9m, filesize=1.4m
> > 2012-09-26 11:06:52,071 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Finished memstore flush of ~24.9m for region
> >
> table_a,10904703985\x01NLKMHOJIOHOOO\x01NLKMHOJIOHHIM\x01PriceDispPunisher\x01031\x01000000,1345202033439.016138b1500faade07bc92694f222dd0.
> > in 1412ms, sequenceid=11417991467, compaction requested=true
> > 2012-09-26 11:06:52,071 DEBUG
> > org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
> > requested for
> >
> table_a,10904703985\x01NLKMHOJIOHOOO\x01NLKMHOJIOHHIM\x01PriceDispPunisher\x01031\x01000000,1345202033439.016138b1500faade07bc92694f222dd0.
> > because regionserver60020.cacheFlusher; priority=17, compaction queue
> > size=20
> > 2012-09-26 11:06:52,071 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Started memstore flush for
> >
> table_a,14095325123\x01NLKLKJFHIOOOO\x01NLKLKJFHIOHII\x01AntispamPunisher\x01001\x01000000,1345106676203.064d4407646b317e7cfd4c35efa0ecfd.,
> > current region memstore size 18.0m
> > 2012-09-26 11:06:52,071 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Finished snapshotting, commencing flushing stores
> > 2012-09-26 11:06:52,599 INFO
> > org.apache.hadoop.hbase.regionserver.StoreFile: Bloom added to HFile
> >
> (hdfs://hdfs_xxx:9516/hbase/table_a/064d4407646b317e7cfd4c35efa0ecfd/.tmp/3436108128903667477):
> > 9.7k, 4124/8248 (50%)
> > 2012-09-26 11:06:52,622 INFO org.apache.hadoop.hbase.regionserver.Store:
> > Renaming flushed file at
> >
> hdfs://hdfs_xxx:9516/hbase/table_a/064d4407646b317e7cfd4c35efa0ecfd/.tmp/3436108128903667477
> > to
> >
> hdfs://hdfs_xxx:9516/hbase/table_a/064d4407646b317e7cfd4c35efa0ecfd/info/2027379172809216464
> > 2012-09-26 11:06:52,637 INFO
> > org.apache.hadoop.hbase.regionserver.StoreFile$Reader: Loaded row bloom
> > filter metadata for
> >
> hdfs://hdfs_xxx:9516/hbase/table_a/064d4407646b317e7cfd4c35efa0ecfd/info/2027379172809216464
> > 2012-09-26 11:06:52,637 INFO org.apache.hadoop.hbase.regionserver.Store:
> > Added
> >
> hdfs://hdfs_xxx:9516/hbase/table_a/064d4407646b317e7cfd4c35efa0ecfd/info/2027379172809216464,
> > entries=65984, sequenceid=11417992318, memsize=18.0m, filesize=1.0m
>



-- 
*Best Regards*
*===================*
*Yusup Ashrap*
*cell:18611205204*
*--------------------------------------*
*do or don't, the is no try.*
*--------------------------------------*

Reply via email to