Hi Jean,

Yes, I was able to restart the region server, and this is the first time I am seeing this issue. The split regions have also been transitioned to another RS. The problem was that this RS was stuck and did not fully go down. It was also hosting the .META. table, so HBase became unusable. After the restart, things are OK now. I had always thought that a region split happens after a major compaction, but according to the logs, the compaction request comes after the split.
Regards,
Skanda

On Thu, Aug 8, 2013 at 5:25 PM, Jean-Marc Spaggiari <[email protected]> wrote:

> Hi Prasad,
>
> For such an old version it's a bit difficult to give recommendations. Are
> you able to restart your RegionServer, or is it stuck offline because of the
> issue it faced? Also, was this the first time you faced this issue? Looking
> at the stack trace, it seems that the region server tried to open the same
> region twice, at the same time, after a compaction or a split. Has this
> region been transitioned to another server now?
>
> JM
>
> 2013/8/8 Prasad GS <[email protected]>
>
> > Hi Jean,
> >
> > We are planning to move to the latest CDH version in a couple of months,
> > but until then we have to maintain the product with CDH3u5. If possible,
> > can you provide me with some pointers to look into this issue further?
> >
> > Regards,
> > Skanda
> >
> > On Thu, Aug 8, 2013 at 3:57 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> >
> > > Hi Prasad,
> > >
> > > 0.90.6 is a pretty old HBase version, and so CDH3u5 is a pretty old CDH
> > > version...
> > >
> > > Any chance to move to a more recent version?
> > >
> > > JM
> > >
> > > 2013/8/8 Prasad GS <[email protected]>
> > >
> > > > Hi,
> > > >
> > > > We are using Cloudera CDH3u5 distribution of HBase (0.90.6).
> > > > The RS goes down suddenly & from the logs we see the following
> > > > exception in the region server:
> > > >
> > > > 2013-08-07 20:36:58,008 INFO org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 18 file(s), new file=hdfs://192.168.0.29:9000/hbase/UsageHistoryMA/1f50c6795c7753315f1fbc04946753d1/d/3311452476716076182, size=320.2m; total size for store is 320.2m
> > > > 2013-08-07 20:36:58,008 INFO org.apache.hadoop.hbase.regionserver.HRegion: completed compaction on region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1. after 1mins, 51sec
> > > > 2013-08-07 20:36:58,009 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.
> > > > 2013-08-07 20:36:58,010 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.: disabling compactions & flushes
> > > > 2013-08-07 20:36:58,010 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.
> > > > 2013-08-07 20:36:58,010 DEBUG org.apache.hadoop.hbase.regionserver.Store: closed d
> > > > 2013-08-07 20:36:58,010 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.
> > > > 2013-08-07 20:36:58,029 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375900618008.13150e07893adb4eded6d4dc98374e9e.
> > > > 2013-08-07 20:36:58,031 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00 \x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8.
> > > > 2013-08-07 20:36:58,038 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Offlined parent region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1. in META
> > > > 2013-08-07 20:36:58,085 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://192.168.0.29:9000/hbase/UsageHistoryMA/6e9d9b93a9509909ed5c4d9e2bd321a8/d/3311452476716076182.1f50c6795c7753315f1fbc04946753d1, isReference=true, isBulkLoadResult=false, seqid=26966370, majorCompaction=false
> > > > 2013-08-07 20:36:58,087 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00 \x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8.; next sequenceid=26966371
> > > > 2013-08-07 20:36:58,087 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00 \x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8. because Region has references on open; priority=99, compaction queue size=18
> > > > 2013-08-07 20:36:58,092 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added daughter UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00 \x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8. in region .META.,,1, serverInfo=dl360x2807,60020,1374636004119
> > > > 2013-08-07 20:36:58,093 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Running rollback/cleanup of failed split of UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.; Failed dl360x2807,60020,1374636004119-daughterOpener=13150e07893adb4eded6d4dc98374e9e
> > > > java.io.IOException: Failed dl360x2807,60020,1374636004119-daughterOpener=13150e07893adb4eded6d4dc98374e9e
> > > >     at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:307)
> > > >     at org.apache.hadoop.hbase.regionserver.CompactSplitThread.split(CompactSplitThread.java:205)
> > > >     at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:135)
> > > > Caused by: java.util.ConcurrentModificationException
> > > >     at java.util.SubList.checkForComodification(AbstractList.java:752)
> > > >     at java.util.SubList.size(AbstractList.java:625)
> > > >     at java.util.AbstractList.add(AbstractList.java:91)
> > > >     at org.apache.hadoop.hbase.monitoring.TaskMonitor.createStatus(TaskMonitor.java:75)
> > > >     at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:346)
> > > >     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2860)
> > > >     at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:383)
> > > >     at org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:352)
> > > > 2013-08-07 20:36:58,112 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=dl360x2807,60020,1374636004119, load=(requests=91, regions=170, usedHeap=7213, maxHeap=32730): Abort; we got an error after point-of-no-return
> > > > 2013-08-07 20:36:58,113 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: requests=30, regions=170, stores=171, storefiles=167, storefileIndexSize=134, memstoreSize=187, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0, compactionQueueSize=17, flushQueueSize=0, usedHeap=6992, maxHeap=32730, blockCacheSize=3028798008, blockCacheFree=7267346888, blockCacheCount=51548, blockCacheHitCount=55248138, blockCacheMissCount=3593839, blockCacheEvictedCount=0, blockCacheHitRatio=93, blockCacheHitCachingRatio=99
> > > > 2013-08-07 20:36:58,119 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Abort; we got an error after point-of-no-return
> > > > 2013-08-07 20:36:58,119 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver60020.compactor exiting
> > > > 2013-08-07 20:36:59,161 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
> > > >
> > > > Could someone pls let me know as to why the region split failed & why
> > > > the RS went down. According to me, the ConcurrentModificationException
> > > > looks really trivial.
> > > >
> > > > Regards,
> > > > Prasad
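
A note on the ConcurrentModificationException in the quoted trace: the frames SubList.checkForComodification, SubList.size and AbstractList.add show the JDK rejecting an add made through a subList view whose backing list had been structurally changed in the meantime. The Java sketch below reproduces only that JDK behaviour, in a single thread. It is not HBase's TaskMonitor code, and the names SubListDemo, staleViewThrows and copySurvives are made up for illustration. Whether two daughter-opener threads actually raced on the same list in 0.90.6 is an assumption that fits the trace, not something the logs confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class; not part of HBase.
class SubListDemo {

    // Returns true if adding through a stale subList view throws
    // ConcurrentModificationException.
    static boolean staleViewThrows() {
        List<String> tasks = new ArrayList<>(Arrays.asList("a", "b", "c"));
        List<String> view = tasks.subList(0, 2); // a view, not a copy
        tasks.add("d");                          // structural change to the backing list
        try {
            view.add("x");                       // the view detects the change and fails
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Copying the range into its own list detaches it from the backing list,
    // so later changes to the original no longer matter.
    static int copySurvives() {
        List<String> tasks = new ArrayList<>(Arrays.asList("a", "b", "c"));
        List<String> copy = new ArrayList<>(tasks.subList(0, 2));
        tasks.add("d");
        copy.add("x");
        return copy.size();
    }

    public static void main(String[] args) {
        System.out.println("stale view throws: " + staleViewThrows());
        System.out.println("copy size: " + copySurvives());
    }
}
```

The exception itself is simple, as Prasad says. What made it fatal in the log is where it happened: the daughter open failed after the split had already passed its point of no return, so the region server aborted instead of rolling back.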
