bq. there was a compaction

That was a request for compaction; it failed with NotServingRegionException
because the region was not online.

bq.  if hbase hbck --repairHoles can fix this kind of thing?

You can try the above command. Note that hbck options take a single dash,
i.e. hbase hbck -repairHoles.

As Qiang said, tracing back to the earlier failure would help determine the
root cause.
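
If it helps, here is a rough sketch for pulling the relevant entries out of the master log when tracing back. It assumes each log entry starts with a timestamp in the same form as the lines quoted below (e.g. 2014-08-09 22:50:51,176); the names TS_RE and lines_before are made up for illustration, and the log path is whatever your install uses.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the quoted logs, e.g. "2014-08-09 22:50:51,176".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}")


def lines_before(log_lines, needle, cutoff):
    """Yield log lines that mention `needle` and are stamped at or before `cutoff`.

    Lines without a leading timestamp (stack-trace frames, wrapped text)
    are skipped, so a match inside a continuation line is not reported.
    """
    for line in log_lines:
        m = TS_RE.match(line)
        if m is None:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if ts <= cutoff and needle in line:
            yield line.rstrip("\n")


# Example use (the path is an assumption, substitute your master log):
#   cutoff = datetime(2014, 8, 9, 22, 50, 51)
#   with open("/path/to/hbase-master.log", errors="replace") as fh:
#       for hit in lines_before(fh, "dn29.manage.com", cutoff):
#           print(hit)
```

This only filters by server name and time; it does not interpret the entries, so you still need to read the hits to see when dn29 was first marked dead.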

Cheers


On Sun, Aug 10, 2014 at 7:21 PM, Thomas Kwan <[email protected]> wrote:

> Ted,
>
> From the master log, there was a compaction around the time.
>
>
> 2014-08-09 22:50:51,176 DEBUG [827019302@qtp-63557232-287]
> client.HBaseAdmin: Trying to compact {ENCODED =>
> 12c9a609765ad0bbd6468d93368f860a, NAME =>
>
> 'm_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.',
> STARTKEY => '2fd811c2b1d7476efb16499ccb2b823d', ENDKEY =>
> '3328d07989225a29067b7b7981150052'}:
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException: Region
>
> m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.
> is not online
>         at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2585)
>         at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3952)
>         at
> org.apache.hadoop.hbase.regionserver.HRegionServer.compactRegion(HRegionServer.java:3750)
>         at
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19803)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
>         at
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
>
>
> Also, hbase hbck shows a lot of errors. In particular, I see
>
> ERROR: Region { meta =>
>
> m_hashes,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.,
> hdfs =>
> hdfs://cluster01/apps/hbase/data/data/default/m_data/12c9a609765ad0bbd6468d93368f860a,
> deployed =>  } not deployed on any region server.
> ...
> ERROR: There is a hole in the region chain between
> 2fd811c2b1d7476efb16499ccb2b823d and 3328d07989225a29067b7b7981150052.
> You need to create a new .regioninfo and region dir in hdfs to plug
> the hole.
>
> Looks like the data is there
>
> [hbase@db03 ~]$ hadoop fs -du
> /apps/hbase/data/data/default/m_data/12c9a609765ad0bbd6468d93368f860a
> 105
> /apps/hbase/data/data/default/m_data/12c9a609765ad0bbd6468d93368f860a/.regioninfo
> 4023827732
>  /apps/hbase/data/data/default/m_data/12c9a609765ad0bbd6468d93368f860a/cf1
> 1773806
> /apps/hbase/data/data/default/m_data/12c9a609765ad0bbd6468d93368f860a/recovered.edits
>
> Wonder if hbase hbck --repairHoles can fix this kind of thing?
>
> thomas
>
> On Sun, Aug 10, 2014 at 5:17 PM, Ted Yu <[email protected]> wrote:
> > bq. it's host dn29.manage.com,60020,1407600154728 is dead but not processed
> > yet
> >
> > Can you look back (from 22:50:51) in the master log to see what happened to
> > dn29?
> >
> > Thanks
> >
> >
> > On Sun, Aug 10, 2014 at 2:51 PM, Thomas Kwan <[email protected]> wrote:
> >
> >> Thanks for your help Ted.
> >>
> >> From the master's log, I see
> >>
> >> 2014-08-09 22:50:51,176 DEBUG [827019302@qtp-63557232-287]
> >> client.HBaseAdmin: Trying to compact {ENCODED =>
> >> 12c9a609765ad0bbd6468d93368f860a, NAME =>
> >>
> >>
> 'm_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.',
> >> STARTKEY => '2fd811c2b1d7476efb16499ccb2b823d', ENDKEY =>
> >> '3328d07989225a29067b7b7981150052'}:
> >> org.apache.hadoop.hbase.NotServingRegionException:
> >> org.apache.hadoop.hbase.NotServingRegionException: Region
> >>
> >>
> m_hashes,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.
> >> is not online
> >>         at
> >>
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2585)
> >>         at
> >>
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3952)
> >>         at
> >>
> org.apache.hadoop.hbase.regionserver.HRegionServer.compactRegion(HRegionServer.java:3750)
> >>         at
> >>
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19803)
> >>         at
> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
> >>         at
> >> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
> >>
> >>         at
> sun.reflect.GeneratedConstructorAccessor27.newInstance(Unknown
> >> Source)
> >>         at
> >>
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> >>         at
> java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> >>         at
> >>
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> >>         at
> >>
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
> >>         at
> >>
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:277)
> >>         at
> >> org.apache.hadoop.hbase.client.HBaseAdmin.compact(HBaseAdmin.java:1647)
> >>         at
> >> org.apache.hadoop.hbase.client.HBaseAdmin.compact(HBaseAdmin.java:1623)
> >>         at
> >> org.apache.hadoop.hbase.client.HBaseAdmin.compact(HBaseAdmin.java:1504)
> >>         at
> >> org.apache.hadoop.hbase.client.HBaseAdmin.compact(HBaseAdmin.java:1491)
> >>         at
> >>
> org.apache.hadoop.hbase.generated.master.table_jsp._jspService(table_jsp.java:111)
> >>         at
> >> org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98)
> >>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
> >>         at
> >> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> >>         at
> >>
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> >>         at
> >>
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
> >>         at
> >>
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >>         at
> >>
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
> >>         at
> >>
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >>         at
> >> org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> >>         at
> >>
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >>         at
> >> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> >>         at
> >>
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >>         at
> >> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> >>         at
> >> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> >>         at
> >> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> >>         at
> >>
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> >>         at
> >> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> >>         at org.mortbay.jetty.Server.handle(Server.java:326)
> >>         at
> >> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> >>         at
> >>
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> >>         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> >>         at
> org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> >>         at
> org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> >>         at
> >>
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> >>         at
> >>
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> >> ...
> >> 2014-08-09 23:11:29,846 INFO  [AM.-pool1-t3] master.AssignmentManager:
> >> Skip assigning {ENCODED => d5887dd2b5897d14a6d2a041fc2ace1f, NAME =>
> >>
> >>
> 'm_data,2f03f0fa374de8af4880ba49401cd441,1406839342141.d5887dd2b5897d14a6d2a041fc2ace1f.',
> >> STARTKEY => '2f03f0fa374de8af4880ba49401cd441', ENDKEY =>
> >> '2fd811c2b1d7476efb16499ccb2b823d'}, we couldn't close it:
> >> {d5887dd2b5897d14a6d2a041fc2ace1f state=FAILED_CLOSE,
> >> ts=1407651089846, server=dn05.manage.com,60020,1407649977124}
> >> ...
> >> 2014-08-10 07:49:17,589 INFO  [RpcServer.handler=237,port=60000]
> >> master.AssignmentManager: Skip assigning
> >>
> >>
> m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.,
> >> it's host dn29.manage.com,60020,1407600154728 is dead but not
> >> processed yet
> >>
> >> I also checked dn29 via the HBase UI running at
> >> http://dn29.manage.com:60030/rs-status; it looks like there are no regions
> >> on dn29.
> >>
> >> thanks
> >> thomas
> >>
> >>
> >> On Sun, Aug 10, 2014 at 12:28 PM, Ted Yu <[email protected]> wrote:
> >> > Can you check the master log to see why
> >> > 'm_data,2fd811c2b1d7476efb16499ccb2b823d' went offline?
> >> >
> >> > Thanks
> >> >
> >> >
> >> > On Sun, Aug 10, 2014 at 12:13 PM, Thomas Kwan <[email protected]> wrote:
> >> >
> >> >> Hi Ted,
> >> >>
> >> >> HBase version is 0.96.0.2.0.
> >> >>
> >> >> There is nothing interesting in the HBase log on dn29, and I confirmed that
> >> >> the region server is running on dn29.
> >> >>
> >> >> When I do a 'get', I see
> >> >>
> >> >> hbase(main):001:0> get 'm_data','2fd811c2b1d7476efb16499ccb2b823d'
> >> >>
> >> >> COLUMN                           CELL
> >> >>
> >> >> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
> >> >>
> >> >>
> >>
> m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.
> >> >> is not online
> >> >> at
> >> >>
> >>
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2585)
> >> >> at
> >> >>
> >>
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3952)
> >> >> at
> >> >>
> >>
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2733)
> >> >> at
> >> >>
> >>
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26925)
> >> >> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
> >> >> at
> >> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
> >> >>
> >> >> On Sun, Aug 10, 2014 at 10:32 AM, Ted Yu <[email protected]> wrote:
> >> >> > bq.  if I can just rmr stuff under /hbase-unsecure/splitWAL/...
> >> >> >
> >> >> > Please don't.
> >> >> >
> >> >> > Have you checked the region server log on dn29.manage.com?
> >> >> >
> >> >> > Which HBase version are you using?
> >> >> >
> >> >> > Cheers
> >> >> >
> >> >> >
> >> >> > On Sun, Aug 10, 2014 at 10:27 AM, Thomas Kwan <[email protected]> wrote:
> >> >> >
> >> >> >> I also have a program that does some read operations, and it hangs.
> >> >> >> I am seeing
> >> >> >>
> >> >> >> 2014-08-10 12:22:05,359 DEBUG [main]
> >> >> >> client.HConnectionManager$HConnectionImplementation: Removed all
> >> >> >> cached region locations that map to
> >> >> >> dn29.manage.com,60020,1407600154728
> >> >> >> 2014-08-10 12:22:06,173 DEBUG [main]
> >> >> >> client.HConnectionManager$HConnectionImplementation: Removed
> >> >> >> dn29.manage.com:60020 as a location of
> >> >> >>
> >> >> >>
> >> >>
> >>
> m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.
> >> >> >> for tableName=m_data from cache
> >> >> >> 2014-08-10 12:22:07,180 DEBUG [main]
> >> >> >> client.HConnectionManager$HConnectionImplementation: Removed
> >> >> >> dn29.manage.com:60020 as a location of
> >> >> >>
> >> >> >>
> >> >>
> >>
> m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.
> >> >> >> for tableName=m_data from cache
> >> >> >> 2014-08-10 12:22:09,193 DEBUG [main]
> >> >> >> client.HConnectionManager$HConnectionImplementation: Removed
> >> >> >> dn29.manage.com:60020 as a location of
> >> >> >>
> >> >> >>
> >> >>
> >>
> m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.
> >> >> >> for tableName=m_data from cache
> >> >> >> 2014-08-10 12:22:09,196 DEBUG [main]
> >> >> >> client.HConnectionManager$HConnectionImplementation: Removed all
> >> >> >> cached region locations that map to
> >> >> >> dn29.manage.com,60020,1407600154728
> >> >> >> 2014-08-10 12:22:13,208 DEBUG [main]
> >> >> >> client.HConnectionManager$HConnectionImplementation: Removed all
> >> >> >> cached region locations that map to
> >> >> >> dn29.manage.com,60020,1407600154728
> >> >> >>
> >> >> >> I am also seeing the following in the HBase master log
> >> >> >>
> >> >> >> 2014-08-10 10:22:25,016 INFO
> >> >> >> [master02.manage.com,60000,1407690402682.splitLogManagerTimeoutMonitor]
> >> >> >> master.SplitLogManager: total tasks = 1 unassigned = 0
> >> >> >> tasks={/hbase-unsecure/splitWAL/WALs%2Fdn29.manage.com%2C60020%2C1407600154728-splitting%2Fdn29.manage.com%252C60020%252C1407600154728.1407621759364=last_update
> >> >> >> = 1407690428226 last_version = 53 cur_worker_name =
> >> >> >> dn21.manage.com,60020,1407650188526 status = in_progress incarnation =
> >> >> >> 3 resubmits = 3 batch = installed = 1 done = 0 error = 0}
> >> >> >>
> >> >> >> I wonder if I can just rmr stuff under /hbase-unsecure/splitWAL/...
> >> >> >>
> >> >> >> thanks
> >> >> >> thomas
> >> >> >>
> >> >>
> >>
>
