I'm running HBase 1.4.4 on EMR. While following your suggestions I realized that the master is trying to assign the regions to dead/non-existent region servers. While trying to fix this problem I killed the EMR cluster and started a new one, but it's still trying to assign some regions to region servers from the previous cluster. I tried to manually move one of the regions to a good region server, but I get 'ERROR: No route to host' when I try to close the region.

I've tried nuking the /hbase directory in ZooKeeper, but that didn't seem to help, so I'm not sure where it's getting these references from.

-Austin


On 09/30/2018 02:38 PM, Josh Elser wrote:
First off: You're on EMR? What version of HBase are you using? (Maybe Zach or Stephen can help here too). Can you figure out the RegionServer(s) which are stuck opening these PENDING_OPEN regions? Can you get a jstack/thread-dump from those RS's?

In terms of how the system is supposed to work: the PENDING_OPEN state for a Region "R" means: the active Master has asked a RegionServer to open R. That RS should have an active thread which is trying to open R. Upon success, the state of R will move from PENDING_OPEN to OPEN. Otherwise, the Master will try to assign R again.
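The lifecycle described above can be sketched as a toy model. This is only an illustration of the described behavior, not HBase code: the `assign` function, the `try_open` callback, the `max_attempts` limit, and the round-robin choice of server are all assumptions made for the example.

```python
from enum import Enum


class RegionState(Enum):
    OFFLINE = "OFFLINE"
    PENDING_OPEN = "PENDING_OPEN"
    OPEN = "OPEN"


def assign(region, servers, try_open, max_attempts=3):
    """Toy model of the Master's assignment loop (not HBase code).

    try_open(server, region) is a hypothetical callback that returns True
    when the RegionServer reports that it opened the region.
    """
    if not servers:
        raise ValueError("no RegionServers to assign to")
    state = RegionState.OFFLINE
    for attempt in range(max_attempts):
        # Assumption for the sketch: pick servers round-robin.
        server = servers[attempt % len(servers)]
        # The Master has asked `server` to open the region.
        state = RegionState.PENDING_OPEN
        if try_open(server, region):
            # Success: the region moves from PENDING_OPEN to OPEN.
            return RegionState.OPEN, server
        # Otherwise the Master tries to assign the region again.
    # Every attempt failed, so the region is left in PENDING_OPEN.
    return state, None
```

In this model a region stays in PENDING_OPEN only when no server ever reports a successful open, which is the situation worth investigating on the RegionServer side.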

In the absence of any custom coprocessors (including Phoenix), this would mean some subset of RegionServers are in a bad state. Figuring out what those RS's are trying to do will be the next step in figuring out why they're stuck like that. It might be obvious from the UI, or you might have to look at hbase:meta or the master log to figure it out.
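Once the region assignments have been collected (from the UI, hbase:meta, or the master log), narrowing down the suspect RegionServers is a simple grouping step. A minimal sketch follows, assuming the assignments have already been extracted by hand into (region, state, server) tuples; that tuple layout and the function name are hypothetical, not an HBase export format.

```python
from collections import defaultdict


def pending_open_by_server(assignments, live_servers):
    """Group PENDING_OPEN regions by assigned server, split by liveness.

    assignments: iterable of (region_name, state, server_name) tuples
                 (hypothetical layout, collected by hand).
    live_servers: set of server names currently known to be alive.
    Returns (on_live, on_missing), each a dict of server -> list of regions.
    """
    on_live = defaultdict(list)
    on_missing = defaultdict(list)
    for region, state, server in assignments:
        if state != "PENDING_OPEN":
            continue  # only stuck regions are of interest here
        target = on_live if server in live_servers else on_missing
        target[server].append(region)
    return dict(on_live), dict(on_missing)
```

Servers in the first dict are the ones to take a jstack from; anything in the second dict points at assignments to servers that are no longer there.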

One caveat: it's possible that the Master is just not doing the right thing as described above. If the steps described above don't seem to match what your system is doing, you might have to look closer at the Master log. Make sure you have DEBUG on to get anything of value out of the system.

On 9/30/18 1:43 PM, Austin Heyne wrote:
I'm having a strange problem that my usual bag of tricks is having trouble sorting out. On Friday queries stopped returning for some reason. You could see them come in and there would be a resource utilization spike that would fade out after an appropriate amount of time; however, the query would never actually return. This could be related to our client code, but I wasn't able to dig into it since this was the middle of the day on a production system. Since this had happened before and bouncing HBase cleared it up, I proceeded to disable tables and restart HBase. Upon bringing HBase back up, a few thousand regions are stuck in PENDING_OPEN state and refuse to move from that state. I've run hbck -repair a number of times under a few conditions (even the offline repair), have deleted everything out of /hbase in ZooKeeper, and even migrated the cluster to new servers (EMR) with no luck. When I spin HBase up the regions are already at PENDING_OPEN even though the tables are offline.

Any ideas on what's going on here would be a huge help.

Thanks,
Austin


--
Austin L. Heyne
