First off: you're on EMR? What version of HBase are you using? (Maybe
Zach or Stephen can help here too.) Can you figure out which
RegionServer(s) are stuck opening these PENDING_OPEN regions? Can
you get a jstack/thread dump from those RSs?
In terms of how the system is supposed to work: the PENDING_OPEN state
for a Region "R" means: the active Master has asked a RegionServer to
open R. That RS should have an active thread which is trying to open R.
Upon success, the state of R will move from PENDING_OPEN to OPEN.
Otherwise, the Master will try to assign R again.
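As a rough illustration of that flow, here is a toy Python model. It is not
HBase's actual assignment code: the names `Region`, `assign`, and `try_open`
are made up for this sketch, and the real state machine has more states than
the ones shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

PENDING_OPEN = "PENDING_OPEN"
OPEN = "OPEN"


@dataclass
class Region:
    name: str
    state: str = "OFFLINE"          # simplified starting state for the sketch
    server: Optional[str] = None    # RS currently asked to open the region


def assign(region: Region,
           servers: List[str],
           try_open: Callable[[str, Region], bool],
           max_attempts: int = 3) -> bool:
    """Toy version of the Master's loop: ask an RS to open the region,
    move to OPEN on success, otherwise try to assign it again."""
    for attempt in range(max_attempts):
        server = servers[attempt % len(servers)]
        # The Master has asked `server` to open the region.
        region.state, region.server = PENDING_OPEN, server
        if try_open(server, region):
            region.state = OPEN
            return True
        # The open failed, so loop around and assign again.
    return False
```

In this model, a region that stays in PENDING_OPEN forever corresponds to a
`try_open` call that neither succeeds nor fails, which is why the thread dump
from the RS is the interesting piece of evidence.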
In the absence of any custom coprocessors (including Phoenix), this would
mean some subset of RegionServers are in a bad state. Figuring out what
those RS's are trying to do will be the next step in figuring out why
they're stuck like that. It might be obvious from the UI, or you might
have to look at hbase:meta or the master log to figure it out.
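If you end up pulling the region states out by hand, a small helper like the
one below can show which RegionServers hold most of the stuck regions. This is
only a sketch: it assumes you have already extracted (region, state, server)
tuples from the UI, hbase:meta, or the master log by whatever means works for
your version, and the function name and sample hostnames are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def stuck_regions_by_server(rows: Iterable[Tuple[str, str, str]],
                            stuck_state: str = "PENDING_OPEN"
                            ) -> List[Tuple[str, int]]:
    """Count regions in `stuck_state` per server, busiest server first.

    `rows` is an iterable of (region_name, state, server) tuples that the
    caller has already extracted; this function does not talk to HBase.
    """
    counts = Counter(server for _region, state, server in rows
                     if state == stuck_state)
    return counts.most_common()


# Hypothetical sample data, for illustration only.
sample = [
    ("r1", "PENDING_OPEN", "rs1:16020"),
    ("r2", "OPEN", "rs2:16020"),
    ("r3", "PENDING_OPEN", "rs1:16020"),
    ("r4", "PENDING_OPEN", "rs3:16020"),
]
print(stuck_regions_by_server(sample))
# [('rs1:16020', 2), ('rs3:16020', 1)]
```

The servers at the top of that list are the ones worth taking a thread dump
from first.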
One caveat: it's possible that the Master is just not doing the right
thing as described above. If the steps described above don't seem to
match what your system is doing, you might have to look closer at the
Master log. Make sure you have DEBUG logging on to get anything of value
out of the system.
On 9/30/18 1:43 PM, Austin Heyne wrote:
I'm having a strange problem that my usual bag of tricks is having
trouble sorting out. On Friday queries stopped returning for some reason.
You could see them come in and there would be a resource utilization
spike that would fade out after an appropriate amount of time, however,
the query would never actually return. This could be related to our
client code but I wasn't able to dig into it since this was the middle
of the day on a production system. Since this had happened before and
bouncing HBase cleared it up, I proceeded to disable tables and restart
HBase. Upon bringing HBase back up, a few thousand regions are stuck in
PENDING_OPEN state and refuse to move from that state. I've run hbck
-repair a number of times under a few conditions (even the offline
repair), have deleted everything out of /hbase in zookeeper and even
migrated the cluster to new servers (EMR) with no luck. When I spin
HBase up the regions are already at PENDING_OPEN even though the tables
are offline.
Any ideas on what's going on here would be a huge help.
Thanks,
Austin