Can you check the master log to see why 'm_data,2fd811c2b1d7476efb16499ccb2b823d' went offline?
Thanks

On Sun, Aug 10, 2014 at 12:13 PM, Thomas Kwan <[email protected]> wrote:

> Hi Ted,
>
> Hbase version is 0.96.0.2.0
>
> Nothing interesting in the hbase log on dn29 and confirmed that region
> server is running on dn29
>
> When I do 'get', i see
>
> hbase(main):001:0> get 'm_data','2fd811c2b1d7476efb16499ccb2b823d'
>
> COLUMN CELL
>
> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
> m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.
> is not online
> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2585)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3952)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2733)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26925)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
> at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
>
> On Sun, Aug 10, 2014 at 10:32 AM, Ted Yu <[email protected]> wrote:
>
> > bq. if I can just rmr stuff under /hbase-unsecure/splitWAL/...
> >
> > Please don't.
> >
> > Have you checked region server log on dn29.manage.com ?
> >
> > What hbase version are you using ?
> >
> > Cheers
> >
> >
> > On Sun, Aug 10, 2014 at 10:27 AM, Thomas Kwan <[email protected]>
> > wrote:
> >
> >> And I have a program that do some read operations and it hangs. And I am
> >> seeing
> >>
> >> 2014-08-10 12:22:05,359 DEBUG [main]
> >> client.HConnectionManager$HConnectionImplementation: Removed all
> >> cached region locations that map to
> >> dn29.manage.com,60020,1407600154728
> >> 2014-08-10 12:22:06,173 DEBUG [main]
> >> client.HConnectionManager$HConnectionImplementation: Removed
> >> dn29.manage.com:60020 as a location of
> >> m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.
> >> for tableName=m_data from cache
> >> 2014-08-10 12:22:07,180 DEBUG [main]
> >> client.HConnectionManager$HConnectionImplementation: Removed
> >> dn29.manage.com:60020 as a location of
> >> m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.
> >> for tableName=m_data from cache
> >> 2014-08-10 12:22:09,193 DEBUG [main]
> >> client.HConnectionManager$HConnectionImplementation: Removed
> >> dn29.manage.com:60020 as a location of
> >> m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.
> >> for tableName=m_data from cache
> >> 2014-08-10 12:22:09,196 DEBUG [main]
> >> client.HConnectionManager$HConnectionImplementation: Removed all
> >> cached region locations that map to
> >> dn29.manage.com,60020,1407600154728
> >> 2014-08-10 12:22:13,208 DEBUG [main]
> >> client.HConnectionManager$HConnectionImplementation: Removed all
> >> cached region locations that map to
> >> dn29.manage.com,60020,1407600154728
> >>
> >> I am seeing the following in the hbase master also
> >>
> >> 2014-08-10 10:22:25,016 INFO
> >> [master02.manage.com,60000,1407690402682.splitLogManagerTimeoutMonitor]
> >> master.SplitLogManager: total tasks = 1 unassigned = 0
> >> tasks={/hbase-unsecure/splitWAL/WALs%2Fdn29.manage.com
> >> %2C60020%2C1407600154728-splitting%2Fdn29.manage.com
> >> %252C60020%252C1407600154728.1407621759364=last_update
> >> = 1407690428226 last_version = 53 cur_worker_name =
> >> dn21.manage.com,60020,1407650188526 status = in_progress incarnation =
> >> 3 resubmits = 3 batch = installed = 1 done = 0 error = 0}
> >>
> >> I wonder if I can just rmr stuff under /hbase-unsecure/splitWAL/...
> >>
> >> thanks
> >> thomas
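
When grepping the master log, it may help to know which pieces of the region name and of the split task name to search for. Below is a rough Python sketch, not anything that ships with HBase. The helper names `parse_region_name` and `decode_split_task` are made up for illustration. It assumes the region name has the layout `<table>,<start key>,<region id>.<encoded name>.` seen in the exception above, and that the task name under splitWAL is the URL-encoded WAL path, with the whitespace introduced by mail wrapping removed.

```python
from urllib.parse import unquote


def parse_region_name(region_name):
    """Split '<table>,<start key>,<region id>.<encoded name>.' into parts.

    Assumption: this layout matches the region name printed in the
    NotServingRegionException quoted in this thread.
    """
    trimmed = region_name.rstrip(".")
    table, rest = trimmed.split(",", 1)
    # The start key may itself contain commas, so split from the right.
    start_key, id_and_encoded = rest.rsplit(",", 1)
    region_id, encoded_name = id_and_encoded.split(".", 1)
    return {
        "table": table,
        "start_key": start_key,
        "region_id": region_id,
        "encoded_name": encoded_name,
    }


def decode_split_task(task_name):
    """Undo one level of URL-encoding on a splitWAL task name.

    Assumption: the result is the WAL path relative to the HBase root
    directory; the file name part stays percent-encoded.
    """
    return unquote(task_name)


if __name__ == "__main__":
    region = parse_region_name(
        "m_data,2fd811c2b1d7476efb16499ccb2b823d,"
        "1406512331699.12c9a609765ad0bbd6468d93368f860a."
    )
    print(region["encoded_name"])

    task = (
        "WALs%2Fdn29.manage.com%2C60020%2C1407600154728-splitting"
        "%2Fdn29.manage.com%252C60020%252C1407600154728.1407621759364"
    )
    print(decode_split_task(task))
```

Searching the master log for the encoded name (the last component of the region name) usually finds the assignment and unassignment messages for that region, and the decoded task path shows which WAL from dn29 is still being split.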
