Hello Ted,
I was finally able to reproduce one of the issues below:
1. Wed Nov 18 02:27:36 EST 2015,
org.apache.hadoop.hbase.client.RpcRetryingCaller@1a8bbdc9,
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException):
java.io.IOException: org.apache.hadoop.hbase.master.TableNamespaceManager isn't ready to serve
    at org.apache.hadoop.hbase.master.TableNamespaceManager.getNamespaceTable(TableNamespaceManager.java:112)
    at org.apache.hadoop.hbase.master.TableNamespaceManager.list(TableNamespaceManager.java:211)
    at org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:3473)
    at org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:3367)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:43312)
At the same time, the HMaster logs show the following lines:
2015-11-17 22:27:21,607 WARN [master:ip-172-31-23-41:48470] master.TableNamespaceManager: Timedout waiting for namespace table to be assigned.
2015-11-17 22:27:21,607 INFO [master:ip-172-31-23-41:48470] master.HMaster: Master has completed initialization
2015-11-17 22:31:21,616 DEBUG [ip-172-31-23-41.us-west-2.compute.internal,48470,1447827964772-BalancerChore] master.HMaster: Not running balancer because 159 region(s) in transition: {d93af1e3d8d460cf2ac980ad60ce3f3d={d93af1e3d8d460cf2ac980ad60ce3f3d state=PENDING_OPEN, ts=1447827986817, server=ip-172-31-23-41.us-west-2.compute.internal,37544,1447827973069}, 83fc50ab0413f4a0e7f71e072ccaa6f5={83fc50ab0413f4a0e7f71e072ccaa6f5 state=PE...
2015-11-17 22:36:21,616 DEBUG [ip-172-31-23-41.us-west-2.compute.internal,48470,1447827964772-BalancerChore] master.HMaster: Not running balancer because 159 region(s) in transition: {d93af1e3d8d460cf2ac980ad60ce3f3d={d93af1e3d8d460cf2ac980ad60ce3f3d state=PENDING_OPEN, ts=1447827986817, server=ip-172-31-23-41.us-west-2.compute.internal,37544,1447827973069}, 83fc50ab0413f4a0e7f71e072ccaa6f5={83fc50ab0413f4a0e7f71e072ccaa6f5 state=PE...
I am not sure what makes it time out. I looked at that code, and it seems it
tries to load all the regions for a given table but times out. I am not sure
whether this points to a ZooKeeper problem, an HDFS problem, or something else.
Does this give any clues?
One more thing of interest is that the HBase client (which shows the error)
and the HMaster machine in this particular case are not time-synced. I notice a
day's gap, but I assume that NTP time sync is required only for the HBase
master and region servers, not for their clients.
Thanks,
Sumit
From: Ted Yu <[email protected]>
To: Sumit Nigam <[email protected]>
Cc: "[email protected]" <[email protected]>
Sent: Sunday, November 15, 2015 9:14 PM
Subject: Re: About exceptions
bq. if we increase #retries from our end, is there a chance that it may get
past the issue?
Most likely the chance of getting past the issue would be low without manually
fixing the condition.
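To illustrate the retry question above, here is a minimal sketch, in plain
Java, of an application-level bounded retry with linear backoff. BoundedRetry,
maxAttempts, and pauseMillis are hypothetical names chosen for illustration and
are not part of the HBase client API; treating only IOException as retryable is
likewise an assumption. Since the HBase client already retries internally
before surfacing the error, a wrapper like this would mostly just delay the
failure while the underlying condition persists.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of HBase): retries an operation a bounded number of times. */
final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Runs op up to maxAttempts times. Only IOException is treated as
     * retryable (an assumption for this sketch); any other exception
     * propagates immediately. The last IOException is rethrown once
     * all attempts are exhausted.
     */
    static <T> T call(Callable<T> op, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Linear backoff: wait longer after each failed attempt.
                    Thread.sleep(pauseMillis * attempt);
                }
            }
        }
        throw last;
    }
}
```

For example, BoundedRetry.call(() -> admin.listNamespaceDescriptors(), 3, 1000L)
would attempt the call up to three times, where admin stands for an existing
HBaseAdmin instance.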
For #2, it is a mystery because the 0.98 master does not have Procedure V2 in
Apache. What distro are you using?
For #3, unclean shutdown could be one of the causes. To assess further, a log
snippet from the master concerning the table would be helpful.
Cheers
On Sun, Nov 15, 2015 at 2:25 AM, Sumit Nigam <[email protected]> wrote:
Thank you Ted.
I was unaware of both those issues. The problem with these exceptions is that
they are intermittent and not easy to reproduce. So, let me see if I can
reproduce them with trace enabled. For #1, should retrying be attempted? Or
possibly, if we increase #retries from our end, is there a chance that it may
get past the issue? I like the idea of the master having a WAL (HBASE-14190) to
find/fix such inconsistencies.
#2 That trace showed up in an HBase client.
#3 Is an unclean shutdown possibly one cause? I do not explicitly enable/disable
tables, so I assume the reasons may be related to HBase code? And any advice
on whether I can somehow avoid it in the first place?
Thanks,
Sumit
From: Ted Yu <[email protected]>
To: Sumit Nigam <[email protected]>
Cc: "[email protected]" <[email protected]>
Sent: Sunday, November 15, 2015 3:34 PM
Subject: Re: About exceptions
Sumit:
For #1, I have seen a similar issue (HBASE-14190, though on an hbase 1.x
release). If you have debug logging enabled, please pastebin the relevant
master log snippet so that we can take a closer look.
For #2, I am a bit confused - I didn't find CreateTableProcedure.java in the
0.98 branch. To my knowledge, CreateTableProcedure is only in the hbase 1
release. Did you see the stack trace in the master log?
For #3, there could be various reasons a table was not enabled. You can trace
the table assignment in the master log and check the log from the hbase:meta
server to see if you can find some clue.
bq. Hbase fails only after it exhausts its attempts so retrying may not be
helpful?
Your understanding should be correct.
I want to draw your attention to HBASE-12070, which helps you fix ZK
inconsistencies.
Cheers
On Sun, Nov 15, 2015 at 12:29 AM, Sumit Nigam <[email protected]> wrote:
Hi Ted,
Thanks for your reply. I am using HBase 0.98.14. I have used hbck, but for some
(unknown) reason it has not always resolved the inconsistencies.
I have been able to get around these issues so far by deleting the ZK entries
for the offending table and restarting HBase. But I am not sure what causes
them in the first place, or whether I can avoid them through code. Also, upon
getting these exceptions, is it a good idea to retry the operation? I think
Hbase fails only after it exhausts its attempts so retrying may not be helpful?
Here are three log snippets:
1. TableNamespaceManager isn't ready to serve:
Fri Nov 13 17:47:19 IST 2015,
org.apache.hadoop.hbase.client.RpcRetryingCaller@44726f67,
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException):
java.io.IOException: org.apache.hadoop.hbase.master.TableNamespaceManager isn't ready to serve
    at org.apache.hadoop.hbase.master.TableNamespaceManager.getNamespaceTable(TableNamespaceManager.java:112)
    at org.apache.hadoop.hbase.master.TableNamespaceManager.list(TableNamespaceManager.java:211)
    at org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:3473)
    at org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:3367)
2. TableExistsException:
Caused by: org.apache.hadoop.hbase.TableExistsException: org.apache.hadoop.hbase.TableExistsException: ldmns:exDocStore
    at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.prepareCreate(CreateTableProcedure.java:300)
    at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:106)
    at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:58)
    ...
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
    at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3403)
    at org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:632)
    at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:523)
3. TableNotEnabledException:
Caused by: org.apache.hadoop.hbase.TableNotEnabledException: ldmns:DataDomain_stage is disabled.
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:1139)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:963)
    at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:74)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:833)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:810)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:842)
    at com.thinkaurelius.titan.diskstorage.hbase.HBaseKeyColumnValueStore.getHelper(HBaseKeyColumnValueStore.java:155)
From: Ted Yu <[email protected]>
To: "[email protected]" <[email protected]>; Sumit Nigam
<[email protected]>
Sent: Sunday, November 15, 2015 10:50 AM
Subject: Re: About exceptions
bq. TableNotEnabledException, TableNotFoundException, IOException
Can you show log snippets where these exceptions occurred? Which release of
hbase are you using?
Have you run hbck to repair the inconsistencies?
See http://hbase.apache.org/book.html#hbck.in.depth
Cheers
On Sat, Nov 14, 2015 at 8:42 PM, Sumit Nigam <[email protected]>
wrote:
Hi,
I intermittently run into some exceptions with HBase and would appreciate help
from the experts here. These are:
TableNotEnabledException
TableNotFoundException
IOException - TableNamespaceManager isn't ready to serve
One cause I can see is ZooKeeper and HBase/HDFS data getting out of sync after
an unclean shutdown.
So, my questions are these:
1. Are these exceptions related only to unclean shutdowns?
2. Do I need to handle them explicitly and retry the operation? They also seem
to indicate a race condition between accessing a table and HBase enabling it.
Any help is greatly appreciated.
Thanks,
Sumit