The message cited is from OpenRegionHandler#tryTransitionFromOpeningToFailedOpen().

'version 1' means the OpenRegionHandler instance was expecting version 1 in the corresponding znode.

Cheers


On Wed, Apr 16, 2014 at 10:29 PM, Tao Xiao <[email protected]> wrote:

> BTW, the region server reported:
>
> 2014-04-16 11:30:31,890 INFO [RS_OPEN_REGION-b05:60020-0]
> handler.OpenRegionHandler: Opening of region {ENCODED =>
> 6886ac98a71a47dc78a9e0ab5b3f07cd, NAME =>
> 'E_MP_DAY_READ_20140315,,1396363260513.6886ac98a71a47dc78a9e0ab5b3f07cd.',
> STARTKEY => '', ENDKEY => '170000346762_20140315'} failed, transitioning
> from OPENING to FAILED_OPEN in ZK, expecting version 1
>
> Here what does "expecting version 1" indicate?
>
>
> 2014-04-17 13:27 GMT+08:00 Tao Xiao <[email protected]>:
>
> > Take the region
> > E_MP_DAY_READ_20140315,,1396363260513.6886ac98a71a47dc78a9e0ab5b3f07cd
> > for example.
> >
> > I checked the master's log and the region server (b05.jsepc.com) log,
> > and found that in the master log there are just 4 logging lines about
> > that region and the logging time was as early as 2014-04-02.
> >
> > In the region server's log, there are more logging lines about that
> > region, but the logging time is quite recent, say 2014-04-16. It seems
> > that the master has lost control of that region for a long time, but
> > the region server is still managing that region although it cannot
> > open it.
> >
> > The master log is here <http://pastebin.com/6J6v9tSg>, and the region
> > server log is here <http://pastebin.com/fbuu0RpC>.
> >
> >
> > 2014-04-17 9:34 GMT+08:00 Ted Yu <[email protected]>:
> >
> >> You can pick a region which is stuck in transition, find which region
> >> server is hosting it and search region server log on that server.
> >>
> >> By correlating events from master and region server logs, you should see
> >> what is happening.
> >>
> >>
> >> On Wed, Apr 16, 2014 at 6:24 PM, Tao Xiao <[email protected]> wrote:
> >>
> >> > Actually, open that link and then click on the picture, it will zoom
> >> > in and become quite clear.
> >> >
> >> > I checked the HMaster UI just now and I am sure that these regions are
> >> > always in transition, I suppose there would be some exceptions
> >> > happening.
> >> > How to prevent regions from being in transition for a long time ?
> >> >
> >> >
> >> > 2014-04-17 9:00 GMT+08:00 Ted Yu <[email protected]>:
> >> >
> >> > > The picture is not very clear.
> >> > > I don't see E_MP_DAY_READ having regions in transition.
> >> > >
> >> > > Anyway, as long as there is region in transition, balancer would not
> >> > > run.
> >> > >
> >> > > Cheers
> >> > >
> >> > >
> >> > > On Wed, Apr 16, 2014 at 5:52 PM, Tao Xiao <[email protected]> wrote:
> >> > >
> >> > > > Ted,
> >> > > >
> >> > > > I can see some regions of other tables in transition now, but I'm
> >> > > > not sure how long have them been in transition and I will check
> >> > > > the HBase master UI later. Here is the screenshot
> >> > > > <http://picpaste.com/Regions_in_Transition_-_2014-04-17_08-38-qyf5anz8.png>.
> >> > > > From the screenshot, there is a region with state of FAILED_OPEN,
> >> > > > which is in red, and there are 9 regions in transition for more
> >> > > > than 60 seconds.
> >> > > >
> >> > > > Note that the table whose regions all stay in 2 nodes is
> >> > > > E_MP_DAY_READ, while the other tables shown in the screenshot are
> >> > > > named as E_MP_DAY_READ_20140315, E_MP_DAY_READ_20140322,
> >> > > > E_MP_DAY_READ_20140324, and so on.
> >> > > >
> >> > > > Thanks.
> >> > > >
> >> > > >
> >> > > > 2014-04-16 23:10 GMT+08:00 Ted Yu <[email protected]>:
> >> > > >
> >> > > > > bq. found some regions of other tables in transition, not of
> >> > > > > this table.
> >> > > > >
> >> > > > > That can explain why "balancer" command returned false.
> >> > > > > Are those regions stuck in transition ?
> >> > > > >
> >> > > > > Cheers
> >> > > > >
> >> > > > >
> >> > > > > On Tue, Apr 15, 2014 at 10:47 PM, Tao Xiao <[email protected]> wrote:
> >> > > > >
> >> > > > > > The command "balance_switch true" returns true, but the
> >> > > > > > command "balancer" returns false. I checked the HMaster UI
> >> > > > > > and found some regions of other tables in transition, not of
> >> > > > > > this table.
> >> > > > > >
> >> > > > > > This table's name is E_MP_DAY_READ, I did grep it in the
> >> > > > > > master log and found only the following lines:
> >> > > > > >
> >> > > > > > 2014-04-15 15:50:59,925 INFO [MASTER_SERVER_OPERATIONS-b03:60000-1]
> >> > > > > > handler.ServerShutdownHandler: Skip assigning region
> >> > > > > > E_MP_DAY_READ,160001123745_2014-01-25:00:00:00,1395753408476.ba5c8291f8dad37d5b9621b7334c17b0.
> >> > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> >> > > > > > 2014-04-15 15:50:59,926 INFO [MASTER_SERVER_OPERATIONS-b03:60000-1]
> >> > > > > > handler.ServerShutdownHandler: Skip assigning region
> >> > > > > > E_MP_DAY_READ,300007915618_2014-03-13:00:00:00,1395994146202.ec4e397baffd1cc40bdc18ce0ab2f28a.
> >> > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> >> > > > > > 2014-04-15 15:50:59,926 INFO [MASTER_SERVER_OPERATIONS-b03:60000-1]
> >> > > > > > handler.ServerShutdownHandler: Skip assigning region
> >> > > > > > E_MP_DAY_READ,300013608840_2014-02-21:00:00:00,1395749573711.744bab52befec279a7ee97497801e10f.
> >> > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> >> > > > > > 2014-04-15 15:50:59,937 INFO [MASTER_SERVER_OPERATIONS-b03:60000-2]
> >> > > > > > handler.ServerShutdownHandler: Skip assigning region
> >> > > > > > E_MP_DAY_READ,300000497780_2014-01-23:00:00:00,1395746363941.79b831e698053b1005f7a97c9f2a6ddc.
> >> > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> >> > > > > > 2014-04-15 15:50:59,938 INFO [MASTER_SERVER_OPERATIONS-b03:60000-2]
> >> > > > > > handler.ServerShutdownHandler: Skip assigning region
> >> > > > > > E_MP_DAY_READ,300008188567_2014-03-04:00:00:00,1395756104426.eb1806c2dc5833152b6b5e7b5e4a88b8.
> >> > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> >> > > > > > 2014-04-15 15:50:59,940 INFO [MASTER_SERVER_OPERATIONS-b03:60000-2]
> >> > > > > > handler.ServerShutdownHandler: Skip assigning region
> >> > > > > > E_MP_DAY_READ,300016987143_2014-01-21:00:00:00,1395986789897.e4d143865d354bdc2a427c1f00df6ad7.
> >> > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> >> > > > > >
> >> > > > > > so few logging lines about it, looks strange ?
> >> > > > > >
> >> > > > > >
> >> > > > > > BTW, I can spread the regions of this table evenly across the
> >> > > > > > whole cluster after I shutdown the two region servers where
> >> > > > > > the regions of this table resided originally.
> >> > > > > >
> >> > > > > >
> >> > > > > > 2014-04-15 19:47 GMT+08:00 Ted Yu <[email protected]>:
> >> > > > > >
> >> > > > > > > Is load balancer enabled ?
> >> > > > > > >
> >> > > > > > > Can you grep this table in master log and pastebin what you
> >> > > > > > > found ?
> >> > > > > > >
> >> > > > > > > Cheers
> >> > > > > > >
> >> > > > > > > On Apr 15, 2014, at 4:40 AM, Tao Xiao <[email protected]> wrote:
> >> > > > > > >
> >> > > > > > > > I am using HDP 2.0.6, which has 18 nodes(region servers).
> >> > > > > > > > One of my HBase tables has 50 regions but I found that
> >> > > > > > > > the 50 regions all stay in just two nodes, not spread
> >> > > > > > > > evenly in the 18 nodes. I did not pre-create splits so
> >> > > > > > > > this table was gradually split into 50 regions itself.
> >> > > > > > > >
> >> > > > > > > > I'd like to know why all the regions stay in just two
> >> > > > > > > > nodes, not the 18 nodes of the cluster, and how to spread
> >> > > > > > > > the regions evenly across all the region servers. Thanks.
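
A note on "expecting version 1" from the reply at the top of this thread: ZooKeeper keeps a version counter on every znode, and a write can be made conditional on that counter, so it only succeeds if nobody else has modified the znode since the writer last saw it. The handler presumably remembers the version it observed when the region went to OPENING and passes it along when it tries to move the region to FAILED_OPEN. Below is a minimal in-memory sketch of that compare-and-set idea in Python. It is not HBase or ZooKeeper code: VersionedNode, BadVersionError and try_transition are hypothetical names made up for illustration, and the OFFLINE starting state and the way the version reaches 1 are assumptions. Only the OPENING and FAILED_OPEN states and the expected version 1 come from the log line quoted above.

```python
class BadVersionError(Exception):
    """Raised when the caller's expected version does not match the node's."""


class VersionedNode:
    """Hypothetical in-memory stand-in for a znode: data plus a version counter."""

    def __init__(self, data):
        self.data = data
        self.version = 0

    def set_data(self, data, expected_version):
        # Compare-and-set: refuse the write if the node changed since the
        # caller last observed it.
        if expected_version != self.version:
            raise BadVersionError(
                f"expected version {expected_version}, found {self.version}")
        self.data = data
        self.version += 1
        return self.version


def try_transition(node, from_state, to_state, expected_version):
    """Apply from_state -> to_state only if state and version both match."""
    if node.data != from_state:
        return False
    try:
        node.set_data(to_state, expected_version)
    except BadVersionError:
        return False
    return True


# Assumed sequence: the node starts as OFFLINE at version 0, and moving it
# to OPENING bumps the version to 1, which the handler then remembers.
node = VersionedNode("OFFLINE")
seen_version = node.set_data("OPENING", expected_version=0)

# The open fails, so the handler tries OPENING -> FAILED_OPEN with the
# version it remembered.
ok = try_transition(node, "OPENING", "FAILED_OPEN", seen_version)
print(ok, node.data, node.version)
```

If something else had modified the node in between, the remembered version would be stale and try_transition would return False instead of overwriting the newer state.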
