RE: Trafodion meta table region in hbase cannot be opened

2016-09-23 Thread Eric Owhadi
Hi Qiao,

For a big table, you should use 'update statistics for table  on every
column sample'.

This uses sampling to compute the statistics.

That does not explain the problem, but I just wanted to let you know.

Regards,
Eric



*From:* 乔彦克 [mailto:qya...@gmail.com]
*Sent:* Friday, September 23, 2016 1:58 AM
*To:* user@trafodion.incubator.apache.org
*Cc:* Liu, Ming (Ming) <ming@esgyn.cn>
*Subject:* Re: Trafodion meta table region in hbase cannot be opened



Hi, Ming,



This time it happened after I ran 'update statistics for table  on
every column', which crashed the HBase regionserver with an
out-of-memory error. The problem appeared after I restarted the regionservers. While
HBase was doing the recovery step, errors like this



"ERROR [RS_OPEN_REGION-hadoop2slave7:60020-2] regionserver.HRegion: Found
decreasing SeqId. PreId=1517
key=TRAFODION._MD_.OBJECTS/836a95deb60624ab3e620e44ae54fe15/1419;
edit=[#edits: 1 = ]

2016-09-23 10:42:33,508 INFO  [RS_OPEN_REGION-hadoop2slave7:60020-2]
transactional.TrxRegionObserver: Trafodion Recovery Region Observer CP:
preWALRestore coprocessor is invoked ... in table TRAFODION._MD_.OBJECTS"



 come up, and each time the table involved is a Trafodion metadata table, TRAFODION._MD_.*

When the *bulkload* leads to this, I think the reason may be that the
load operation from Hive just creates HFiles rather than writing to the WAL.

But for the *update statistics* operation, I am not sure how this happened,
because I am not clear on how update statistics works.
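
As a rough aid for spotting the same symptom in a Region Server log, here is a minimal Python sketch that pulls the table, region, and the two sequence ids out of "Found decreasing SeqId" messages. The regular expression is inferred only from the excerpt quoted above and is an assumption, not an official HBase log format; the function name is made up for illustration.

```python
import re

# Pattern inferred from the single log excerpt in this thread (an assumption).
# \s+ is used between tokens because the mail client wrapped the log line.
DECREASING_SEQID = re.compile(
    r"Found\s+decreasing\s+SeqId\.\s+PreId=(?P<pre_id>\d+)\s+"
    r"key=(?P<table>[^/\s]+)/(?P<region>[0-9a-f]+)/(?P<seq_id>\d+)"
)


def find_decreasing_seqids(log_text):
    """Return one dict per 'Found decreasing SeqId' message found in log_text."""
    hits = []
    for match in DECREASING_SEQID.finditer(log_text):
        hits.append({
            "table": match.group("table"),
            "region": match.group("region"),
            "pre_id": int(match.group("pre_id")),
            "seq_id": int(match.group("seq_id")),
        })
    return hits


# Sample text copied from the excerpt in this message, including the line wraps.
sample = (
    "ERROR [RS_OPEN_REGION-hadoop2slave7:60020-2] regionserver.HRegion: Found\n"
    "decreasing SeqId. PreId=1517\n"
    "key=TRAFODION._MD_.OBJECTS/836a95deb60624ab3e620e44ae54fe15/1419;\n"
)

for hit in find_decreasing_seqids(sample):
    print(hit)
```

Run against a real log, this would list which regions report an edit whose sequence id is lower than the previous one, which may help compare the bulkload case with the update statistics case.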





Qiao



On Fri, Sep 23, 2016 at 10:46 AM, Liu, Ming (Ming) <ming@esgyn.cn> wrote:

Hi, Qiao,

Before this region failed to open, did you do a bulkload from Hive?
I know you have hit the same issue several times before, but with different Java
error stacks, so I want to confirm with you.
Also, you pasted the error stack from sqlci. Is it possible to find the
error stack in the Region Server log?

Last time, it was something like the stack below. Could you find the same issue this
time? So two questions: did you do a bulkload? And could you find the same
error stack?

2016-09-08 16:44:36,327 ERROR [RS_OPEN_REGION-hadoop2slave7:60020-0]
handler.OpenRegionHandler:
Failed open of
region=TRAFODION._MD_.COLUMNS,,1471946223350.b6191867e73d4203d3ac6fad3c860138.,
starting to roll back the global memstore size.
org.apache.hadoop.hbase.DroppedSnapshotException: region:
TRAFODION._MD_.COLUMNS,,1471946223350.b6191867e73d4203d3ac6fad3c860138.
at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2243)
at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1972)
at
org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3826)
at
org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:969)
at
org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:841)
at
org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:814)
at
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5828)
at
org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion.openHRegion(TransactionalRegion.java:101)
at
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5794)
at
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5765)
at
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5721)
at
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5672)
at
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:356)
at
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:126)
at
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.AssertionError: Key \xB9"b*M3c\x00ADMCKID /#1:\x01/1473306352163/Put/vlen=8/seqid=1749 followed
by a smaller key \xB9"b*M3c\x00ADMCKID /#1:\x01/1473306352163/Put/vlen=8/seqid=4003 in cf #1
at
org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:699)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:493)
at
org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:115)
at
org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71)
at
org.apache.hadoop.hbase.regionserver.H

Re: Trafodion meta table region in hbase cannot be opened

2016-09-23 Thread 乔彦克
217)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2197)
> ... 17 more
>
> Ming
> -----Original Message-----
> From: 乔彦克 [mailto:qya...@gmail.com]
> Sent: Friday, September 23, 2016 9:59 AM
> To: d...@trafodion.incubator.apache.org;
> user@trafodion.incubator.apache.org
> Cc: Dave Birdsall <dave.birds...@esgyn.com>
> Subject: Re: Trafodion meta table region in hbase cannot be opened
>
> Thanks for your reply, Dave.
> Scanning "TRAFODION._MD_.VERSIONS" doesn't work because the table's
> region is not online and cannot be assigned.
> I had no choice but to delete all Trafodion tables in HBase. After
> that cleanup, I restarted all the services, and the system works
> again.
> I have gotten stuck on this problem several times because regions failed to
> open. There may be some bugs when loading data into Trafodion from Hive,
> but I am not sure.
>
> Best Regards,
> Qiao
>
> On Thu, Sep 22, 2016 at 11:52 PM, Dave Birdsall <dave.birds...@esgyn.com> wrote:
>
> > Hi Qiao,
> >
> >
> >
> > If you go into the hbase shell, and do the following:
> >
> >
> >
> > scan 'TRAFODION._MD_.VERSIONS'
> >
> >
> >
> > What happens? If HBase is up and working correctly, and if nothing has
> > been corrupted, this should return 3 rows.
> >
> >
> >
> > Dave
> >
> >
> >
> > *From:* 乔彦克 [mailto:qya...@gmail.com]
> > *Sent:* Wednesday, September 21, 2016 8:13 PM
> > *To:* user@trafodion.incubator.apache.org
> > *Cc:* dev <d...@trafodion.incubator.apache.org>
> > *Subject:* Re: Trafodion meta table region in hbase cannot be opened
> >
> >
> >
> > I tried to assign the region by hand in the hbase shell, but the regions
> > still cannot be opened.
> >
> > Below are the errors logged when running a basic query.
> >
> > SQL>get tables;
> >
> >
> >
> > *** ERROR[1398] Error -704 occured while accessing the hbase
> > subsystem. Fix that error and make sure hbase is up and running. Error
> > Details:
> >
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> > attempts=21, exceptions:
> >
> > Thu Sep 22 10:26:02 CST 2016, null, java.net.SocketTimeoutException:
> > callTimeout=60, callDuration=792843: row '' on table
> > 'TRAFODION._MD_.VERSIONS' at
> > region=TRAFODION._MD_.VERSIONS,,1474445084590.0350ce973b1e43e17b2d14ec72b7f867.,
> > hostname=hadoop2slave7,60020,1474442934725, seqNum=2
> >
> > org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:270)
> > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:203)
> > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:57)
> > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
> > org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:294)
> > org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:269)
> > org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:141)
> > org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:136)
> > org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:886)
> > org.apache.hadoop.hbase.client.transactional.TransactionalTable.getScanner(TransactionalTable.java:813)
> > org.apache.hadoop.hbase.client.transactional.RMInterface.getScanner(RMInterface.java:428)
> > org.trafodion.sql.HTableClient.startScan(HTableClient.java:983)
> >
> > .  [2016-09-22 11:09:54]
> >
> >
> >
> > Best regards,
> >
> > Qiao
> >
> >
> >
> > On Thu, Sep 22, 2016 at 11:07 AM, 乔彦克 <qya...@gmail.com> wrote:
> >
> > Hi, dear group,
> >
> > Yesterday I restarted the HBase regionservers (one was down, and I
> > wanted to restart all the regionservers for load balancing).
> > Afterwards, 7 Trafodion table regions could not be opened, and I
> > cannot log in to the Trafodion shell using 'trafci'.
> >
> > Has anyone run into this problem before?
> >
> > How can I recover my Trafodion instance to a normal state?
> >
> > [image: region not open.png]
> >
> >
> >
> > Any reply is appreciated.
> >
> > Thanks,
> >
> > Qiao
> >
>


RE: Trafodion meta table region in hbase cannot be opened

2016-09-22 Thread Liu, Ming (Ming)
Hi, Qiao,

Before this region failed to open, did you do a bulkload from Hive?
I know you have hit the same issue several times before, but with different Java error
stacks, so I want to confirm with you.
Also, you pasted the error stack from sqlci. Is it possible to find the
error stack in the Region Server log?

Last time, it was something like the stack below. Could you find the same issue this time? So
two questions: did you do a bulkload? And could you find the same error stack?

2016-09-08 16:44:36,327 ERROR [RS_OPEN_REGION-hadoop2slave7:60020-0] 
handler.OpenRegionHandler:
Failed open of 
region=TRAFODION._MD_.COLUMNS,,1471946223350.b6191867e73d4203d3ac6fad3c860138.,
starting to roll back the global memstore size.
org.apache.hadoop.hbase.DroppedSnapshotException: region: 
TRAFODION._MD_.COLUMNS,,1471946223350.b6191867e73d4203d3ac6fad3c860138.
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2243)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1972)
at 
org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3826)
at 
org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:969)
at 
org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:841)
at 
org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:814)
at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5828)
at 
org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion.openHRegion(TransactionalRegion.java:101)
at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5794)
at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5765)
at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5721)
at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5672)
at 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:356)
at 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:126)
at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.AssertionError: Key \xB9"b*M3c\x00ADMCKID /#1:\x01/1473306352163/Put/vlen=8/seqid=1749 followed
by a smaller key \xB9"b*M3c\x00ADMCKID /#1:\x01/1473306352163/Put/vlen=8/seqid=4003 in cf #1
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:699)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:493)
at 
org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:115)
at 
org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71)
at 
org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:940)
at 
org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2217)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2197)
... 17 more
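
The assertion above is the flush scanner complaining that cells arrived out of order. As far as I understand HBase's cell ordering, cells with the same row, family, qualifier and timestamp should come out with the higher seqid first, so seqid=1749 followed by seqid=4003 trips the check. The following is a minimal Python model of that rule, offered as an illustration only and not as HBase code: the `Cell` tuple and both function names are made up, the cell type (Put/Delete) is ignored, and the row key is a placeholder rather than the real key from the log.

```python
from functools import cmp_to_key
from typing import NamedTuple


class Cell(NamedTuple):
    """Hypothetical stand-in for an HBase cell key (type field omitted)."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    seqid: int


def compare_cells(a, b):
    """Simplified ordering: row, family, qualifier ascending; then newest
    timestamp first; then highest seqid first."""
    for x, y in ((a.row, b.row), (a.family, b.family), (a.qualifier, b.qualifier)):
        if x != y:
            return -1 if x < y else 1
    if a.timestamp != b.timestamp:
        return -1 if a.timestamp > b.timestamp else 1
    if a.seqid != b.seqid:
        return -1 if a.seqid > b.seqid else 1
    return 0


def first_out_of_order(cells):
    """Return the first adjacent pair that violates the ordering, or None."""
    for prev, cur in zip(cells, cells[1:]):
        if compare_cells(prev, cur) > 0:
            return prev, cur
    return None


# Same family, qualifier, timestamp and seqids as in the stack trace above;
# the row key is a placeholder.
row = b"placeholder-row"
scan = [
    Cell(row, b"#1", b"\x01", 1473306352163, 1749),
    Cell(row, b"#1", b"\x01", 1473306352163, 4003),
]

print(first_out_of_order(scan))
print([c.seqid for c in sorted(scan, key=cmp_to_key(compare_cells))])
```

If this model is right, two versions of the same cell reached the scanner with their sequence ids in the wrong relative order, which would be consistent with edits replayed during recovery carrying lower sequence ids than data already in the store.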

Ming
-----Original Message-----
From: 乔彦克 [mailto:qya...@gmail.com] 
Sent: Friday, September 23, 2016 9:59 AM
To: d...@trafodion.incubator.apache.org; user@trafodion.incubator.apache.org
Cc: Dave Birdsall <dave.birds...@esgyn.com>
Subject: Re: Trafodion meta table region in hbase cannot be opened

Thanks for your reply, Dave.
Scanning "TRAFODION._MD_.VERSIONS" doesn't work because the table's region is
not online and cannot be assigned.
I had no choice but to delete all Trafodion tables in HBase. After that
cleanup, I restarted all the services, and the system works again.
I have gotten stuck on this problem several times because regions failed to open.
There may be some bugs when loading data into Trafodion from Hive, but I am
not sure.

Best Regards,
Qiao

Re: Trafodion meta table region in hbase cannot be opened

2016-09-22 Thread 乔彦克
Thanks for your reply, Dave.
Scanning "TRAFODION._MD_.VERSIONS" doesn't work because the table's
region is not online and cannot be assigned.
I had no choice but to delete all Trafodion tables in HBase. After that
cleanup, I restarted all the services, and the system works again.
I have gotten stuck on this problem several times because regions failed to
open. There may be some bugs when loading data into Trafodion from Hive,
but I am not sure.

Best Regards,
Qiao

On Thu, Sep 22, 2016 at 11:52 PM, Dave Birdsall <dave.birds...@esgyn.com> wrote:

> Hi Qiao,
>
>
>
> If you go into the hbase shell, and do the following:
>
>
>
> scan 'TRAFODION._MD_.VERSIONS'
>
>
>
> What happens? If HBase is up and working correctly, and if nothing has been
> corrupted, this should return 3 rows.
>
>
>
> Dave
>
>
>
> *From:* 乔彦克 [mailto:qya...@gmail.com]
> *Sent:* Wednesday, September 21, 2016 8:13 PM
> *To:* user@trafodion.incubator.apache.org
> *Cc:* dev <d...@trafodion.incubator.apache.org>
> *Subject:* Re: Trafodion meta table region in hbase cannot be opened
>
>
>
> I tried to assign the region by hand in the hbase shell, but the regions
> still cannot be opened.
>
> Below are the errors logged when running a basic query.
>
> SQL>get tables;
>
>
>
> *** ERROR[1398] Error -704 occured while accessing the hbase subsystem. Fix
> that error and make sure hbase is up and running. Error Details:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=21, exceptions:
>
> Thu Sep 22 10:26:02 CST 2016, null, java.net.SocketTimeoutException:
> callTimeout=60, callDuration=792843: row '' on table
> 'TRAFODION._MD_.VERSIONS' at
>
> region=TRAFODION._MD_.VERSIONS,,1474445084590.0350ce973b1e43e17b2d14ec72b7f867.,
> hostname=hadoop2slave7,60020,1474442934725, seqNum=2
>
>
>
>
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:270)
>
>
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:203)
>
>
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:57)
>
>
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:294)
>
>
> org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:269)
>
>
> org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:141)
>
> org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:136)
>
> org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:886)
>
>
> org.apache.hadoop.hbase.client.transactional.TransactionalTable.getScanner(TransactionalTable.java:813)
>
>
> org.apache.hadoop.hbase.client.transactional.RMInterface.getScanner(RMInterface.java:428)
>
> org.trafodion.sql.HTableClient.startScan(HTableClient.java:983)
>
> .  [2016-09-22 11:09:54]
>
>
>
> Best regards,
>
> Qiao
>
>
>
> On Thu, Sep 22, 2016 at 11:07 AM, 乔彦克 <qya...@gmail.com> wrote:
>
> Hi, dear group,
>
> Yesterday I restarted the HBase regionservers (one was down, and I
> wanted to restart all the regionservers for load balancing). Afterwards,
> 7 Trafodion table regions could not be opened, and I cannot log in to the
> Trafodion shell using 'trafci'.
>
> Has anyone run into this problem before?
>
> How can I recover my Trafodion instance to a normal state?
>
> [image: region not open.png]
>
>
>
> Any reply is appreciated.
>
> Thanks,
>
> Qiao
>