Re: About exceptions

Sumit Nigam Wed, 18 Nov 2015 20:59:52 -0800

Hello Ted,
I could finally reproduce one of the issues below:
1. Wed Nov 18 02:27:36 EST 2015,
org.apache.hadoop.hbase.client.RpcRetryingCaller@1a8bbdc9,
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException):
java.io.IOException: org.apache.hadoop.hbase.master.TableNamespaceManager isn't ready to serve
        at org.apache.hadoop.hbase.master.TableNamespaceManager.getNamespaceTable(TableNamespaceManager.java:112)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.list(TableNamespaceManager.java:211)
        at org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:3473)
        at org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:3367)
        at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:43312)



At the same time, the HMaster log shows the following lines:
2015-11-17 22:27:21,607 WARN  [master:ip-172-31-23-41:48470] master.TableNamespaceManager: Timedout waiting for namespace table to be assigned.
2015-11-17 22:27:21,607 INFO  [master:ip-172-31-23-41:48470] master.HMaster: Master has completed initialization
2015-11-17 22:31:21,616 DEBUG [ip-172-31-23-41.us-west-2.compute.internal,48470,1447827964772-BalancerChore] master.HMaster: Not running balancer because 159 region(s) in transition: {d93af1e3d8d460cf2ac980ad60ce3f3d={d93af1e3d8d460cf2ac980ad60ce3f3d state=PENDING_OPEN, ts=1447827986817, server=ip-172-31-23-41.us-west-2.compute.internal,37544,1447827973069}, 83fc50ab0413f4a0e7f71e072ccaa6f5={83fc50ab0413f4a0e7f71e072ccaa6f5 state=PE...
2015-11-17 22:36:21,616 DEBUG [ip-172-31-23-41.us-west-2.compute.internal,48470,1447827964772-BalancerChore] master.HMaster: Not running balancer because 159 region(s) in transition: {d93af1e3d8d460cf2ac980ad60ce3f3d={d93af1e3d8d460cf2ac980ad60ce3f3d state=PENDING_OPEN, ts=1447827986817, server=ip-172-31-23-41.us-west-2.compute.internal,37544,1447827973069}, 83fc50ab0413f4a0e7f71e072ccaa6f5={83fc50ab0413f4a0e7f71e072ccaa6f5 state=PE...

I am not sure what makes it time out. I looked at that code, and it seems it
tries to load all the regions for a given table but times out. I am not sure
whether this points to a ZooKeeper problem, an HDFS problem, or something else.
Would this give any clues?
One more thing of interest: the HBase client (which shows the error) and the
HMaster machine in this particular case are not time-synced. I notice a day's
gap, but I assume that NTP time sync is a requirement only for the HBase master
and region servers, not for their clients.
Thanks,
Sumit
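
Before reading the "day's gap" above as clock skew, it may help to put the quoted timestamps on one UTC timeline. The following is only a minimal sketch using the Java standard library. It assumes that "EST" in the client trace means UTC-5 and that the master log is written in UTC-8; the log does not state its zone, and UTC-8 is a guess that places the log lines shortly after the start code 1447827964772 embedded in the master's server name. The class and method names are made up for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

/** Normalizes the timestamps quoted in this message to UTC so they can be compared. */
final class LogTimes {
    // Assumption: "EST" in the client trace means UTC-5.
    static final ZoneOffset CLIENT_ZONE = ZoneOffset.ofHours(-5);
    // Assumption: the master log is written in UTC-8 (the log itself does not say).
    static final ZoneOffset MASTER_LOG_ZONE = ZoneOffset.ofHours(-8);

    /** Client-side error: "Wed Nov 18 02:27:36 EST 2015". */
    static Instant clientError() {
        return LocalDateTime.of(2015, 11, 18, 2, 27, 36).toInstant(CLIENT_ZONE);
    }

    /** Master log line: "2015-11-17 22:27:21,607 WARN ... Timedout waiting ...". */
    static Instant masterTimeoutLine() {
        return LocalDateTime.of(2015, 11, 17, 22, 27, 21, 607_000_000)
                .toInstant(MASTER_LOG_ZONE);
    }

    /** Start code (epoch milliseconds) from the master's server name. */
    static Instant masterStart() {
        return Instant.ofEpochMilli(1447827964772L);
    }

    public static void main(String[] args) {
        System.out.println("master start   (UTC): " + masterStart());
        System.out.println("master timeout (UTC): " + masterTimeoutLine());
        System.out.println("client error   (UTC): " + clientError());
        System.out.println("client error minus master timeout: "
                + Duration.between(masterTimeoutLine(), clientError()));
    }
}
```

Under those assumptions, the client error falls on the same UTC day as the master's timeout line, roughly an hour after it, so the differing calendar dates alone would not show a full day of skew. If the master log uses a different zone, the comparison changes accordingly.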
      From: Ted Yu <[email protected]>
 To: Sumit Nigam <[email protected]> 
Cc: "[email protected]" <[email protected]>
 Sent: Sunday, November 15, 2015 9:14 PM
 Subject: Re: About exceptions
   
bq. if we increase #retries from our end, is there a chance that it may get 
past the issue?
Without manually fixing the condition, the chance of getting past the issue is
most likely low.
For #2, it is a mystery because the Apache 0.98 master does not have Procedure
V2. Which distro are you using?
For #3, an unclean shutdown could be one of the causes. To assess further, a
log snippet from the master concerning the table would help.
Cheers
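
On the question of adding retries on the application side: the HBase client already retries internally (governed by settings such as hbase.client.retries.number and hbase.client.pause), so an outer retry only helps if the condition clears by itself in the meantime. The following is a minimal sketch of a bounded outer retry, assuming the operation is wrapped in a Callable. It does not use the HBase API; BoundedRetry and its parameters are hypothetical names for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Bounded retry with a fixed pause; rethrows the last IOException when attempts run out. */
final class BoundedRetry {
    static <T> T call(Callable<T> op, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                // Only I/O failures are treated as possibly transient;
                // any other exception propagates immediately.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Hypothetical stand-in for a client call: fails twice, then succeeds.
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new IOException("not ready (simulated)");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

If the failure is persistent, as with regions stuck in transition, the wrapper simply exhausts its attempts and rethrows, which is consistent with the answer above that more retries are unlikely to get past the issue.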


On Sun, Nov 15, 2015 at 2:25 AM, Sumit Nigam <[email protected]> wrote:

Thank you, Ted.
I was unaware of both those issues. The trouble with these exceptions is that
they are intermittent and do not reproduce easily, so let me see if I can
reproduce them with trace logging enabled. For #1, should retrying be attempted?
Or possibly, if we increase #retries from our end, is there a chance that it may
get past the issue? I like the idea of the master having a WAL (HBASE-14190) to
find and fix such inconsistencies.
#2: That trace showed up in an HBase client.
#3: An unclean shutdown is possibly one cause? I do not explicitly enable or
disable tables, so I assume the cause may lie in HBase code? Any advice on how I
can avoid it in the first place?
Thanks,
Sumit
      From: Ted Yu <[email protected]>
 To: Sumit Nigam <[email protected]> 
Cc: "[email protected]" <[email protected]> 
 Sent: Sunday, November 15, 2015 3:34 PM
 Subject: Re: About exceptions
   
Sumit:
For #1, I have seen a similar issue (HBASE-14190, though on an hbase 1.x
release). If you have debug logging enabled, please pastebin the relevant master
log snippet so that we can take a closer look.
For #2, I am a bit confused - I didn't find CreateTableProcedure.java in the
0.98 branch. To my knowledge, CreateTableProcedure is only in the hbase 1
release. Did you see the stack trace in the master log?
For #3, there could be various reasons a table was not enabled. You can trace
the table assignment in the master log and check the log from the hbase:meta
server to see if you can find some clue.
bq. HBase fails only after it exhausts its attempts so retrying may not be
helpful?
Your understanding should be correct.
I want to bring your attention to HBASE-12070, which helps you fix ZK
inconsistencies.
Cheers


On Sun, Nov 15, 2015 at 12:29 AM, Sumit Nigam <[email protected]> wrote:

Hi Ted,
Thanks for your reply. I am using HBase 0.98.14. I have used hbck, but for some
(unknown) reason it has not always resolved the inconsistencies.
I have been able to get around these issues so far by deleting the ZK entries
for the offending table and restarting HBase. But I am not sure what causes them
in the first place, or whether I can avoid them through code. Also, upon getting
these exceptions, is it a good idea to retry the operation? I think
HBase fails only after it exhausts its attempts so retrying may not be helpful?

Here are 3 log snippets:
1. TableNamespaceManager isn't ready to serve:
Fri Nov 13 17:47:19 IST 2015,
org.apache.hadoop.hbase.client.RpcRetryingCaller@44726f67,
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException):
java.io.IOException: org.apache.hadoop.hbase.master.TableNamespaceManager isn't ready to serve
        at org.apache.hadoop.hbase.master.TableNamespaceManager.getNamespaceTable(TableNamespaceManager.java:112)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.list(TableNamespaceManager.java:211)
        at org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:3473)
        at org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:3367)


2. TableExistsException:
Caused by: org.apache.hadoop.hbase.TableExistsException: org.apache.hadoop.hbase.TableExistsException: ldmns:exDocStore
        at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.prepareCreate(CreateTableProcedure.java:300)
        at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:106)
        at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:58)
        ...
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
        at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3403)
        at org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:632)
        at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:523)

3. TableNotEnabledException:
Caused by: org.apache.hadoop.hbase.TableNotEnabledException: ldmns:DataDomain_stage is disabled.
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:1139)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:963)
        at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:74)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:833)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:810)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:842)
        at com.thinkaurelius.titan.diskstorage.hbase.HBaseKeyColumnValueStore.getHelper(HBaseKeyColumnValueStore.java:155)
      From: Ted Yu <[email protected]>
 To: "[email protected]" <[email protected]>; Sumit Nigam 
<[email protected]> 
 Sent: Sunday, November 15, 2015 10:50 AM
 Subject: Re: About exceptions
   
bq. TableNotEnabledException, TableNotFoundException, IOException
Can you show log snippets where these exceptions occurred?
Which release of hbase are you using?
Have you run hbck to repair the inconsistencies?
See http://hbase.apache.org/book.html#hbck.in.depth
Cheers


On Sat, Nov 14, 2015 at 8:42 PM, Sumit Nigam <[email protected]> 
wrote:

Hi,
There are some exceptions which I face intermittently with HBase, and I thought
some help from the experts here could really help me. These are:
TableNotEnabledException
TableNotFoundException
IOException - TableNamespaceManager isn't ready to serve

One of the reasons I can see for these seems to be ZooKeeper and HBase/HDFS
data being out of sync due to an unclean shutdown.
So, my questions are these:
1. Are these exceptions only related to unclean shutdowns?
2. Do I need to explicitly handle them and retry the operation, because they
also seem to indicate a race condition between trying to access a table and
HBase enabling it?
Any help is greatly appreciated.
Thanks,
Sumit
