I see, so I should jstack the region server hosting the .META. region. I'll do that. The master log pretty much looks like the excerpt below; should I grep for anything specific?
11/02/28 22:52:56 INFO master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region complete
2011-02-28T22:52:57.138+0000: 263241.014: [GC 263241.014: [ParNew: 19136K->2112K(19136K), 0.0033350 secs] 378606K->361596K(641944K) icms_dc=0 , 0.0034330 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2011-02-28T22:53:04.956+0000: 263248.832: [GC 263248.832: [ParNew: 19068K->2112K(19136K), 0.0029690 secs] 378552K->361630K(641944K) icms_dc=0 , 0.0030460 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2011-02-28T22:53:12.664+0000: 263256.540: [GC 263256.541: [ParNew: 19136K->2112K(19136K), 0.0037690 secs] 378654K->361647K(641944K) icms_dc=0 , 0.0038660 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2011-02-28T22:53:20.608+0000: 263264.484: [GC 263264.484: [ParNew: 19136K->1810K(19136K), 0.0032600 secs] 378671K->361346K(641944K) icms_dc=0 , 0.0033470 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2011-02-28T22:53:28.278+0000: 263272.154: [GC 263272.154: [ParNew: 18834K->2112K(19136K), 0.0029870 secs] 378370K->361661K(641944K) icms_dc=0 , 0.0030670 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2011-02-28T22:53:35.919+0000: 263279.795: [GC 263279.795: [ParNew: 19136K->1747K(19136K), 0.0037090 secs] 378685K->361298K(641944K) icms_dc=0 , 0.0037920 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
11/02/28 22:53:36 INFO master.ServerManager: 9 region servers, 0 dead, average load 760.2222222222222

On Mon, Feb 28, 2011 at 2:49 PM, Jean-Daniel Cryans <[email protected]> wrote:
> Are you able to tell if that call in metaScanner is hanging or it's
> making multiple calls to the .META. region?
>
> If the former, then jstack the region server that hosts .META. and see
> where it's blocked.
>
> If the latter, then it means your .META. region is slow? Again, what's
> going on on the RS that hosts .META.?
>
> Finally, what's the master's log like during that time?
>
> J-D
>
> On Mon, Feb 28, 2011 at 2:41 PM, Nanheng Wu <[email protected]> wrote:
>> J-D,
>>
>> Thanks so much for your help so far! I sent disable commands on 4
>> rather small tables and they got stuck for a long time again, so I
>> took a jstack of the master. From what I can tell, all disableTable
>> calls are blocked by a meta scanner thread (sample below). At the
>> moment there were no other requests to the server at all. How should I
>> investigate this further? If it helps, here's some background: I have
>> several datasets, each in a separate table. Our data
>> pipeline produces a new version of each dataset every day and only the
>> latest version should be used. This is how the data is loaded: for
>> each dataset 1) run an MR job that outputs HFiles, 2) call loadTable.rb
>> to create a new table, 3) disable and drop the previous version. As a
>> result some calls to load table and drop table would overlap. Please
>> let me know if something stands out to you as a potential culprit.
>> Thanks!
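For reference, a minimal sketch of what step 3 of the pipeline above (disable and drop the previous version) might look like with the 0.20.x Java client. The class name, argument handling and the "dataset-YYYYMMDD" naming are illustrative assumptions, not the actual loadTable.rb pipeline:

    // Sketch only: disable and drop the previous day's table once the new one is live.
    // Assumes the HBase 0.20.x client API; the table-naming scheme is hypothetical.
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class DropOldVersion {
        public static void main(String[] args) throws Exception {
            String oldTable = args[0];  // e.g. "dataset1-20110227" (hypothetical naming)

            HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());

            if (!admin.tableExists(oldTable)) {
                System.out.println("Table " + oldTable + " does not exist, nothing to drop");
                return;
            }

            // disableTable asks the master to take the table offline and then waits for it;
            // in this thread it is the call that takes a very long time to return.
            if (admin.isTableEnabled(oldTable)) {
                admin.disableTable(oldTable);
            }

            // deleteTable removes the table's entries from .META. and its data on HDFS.
            admin.deleteTable(oldTable);
            System.out.println("Dropped " + oldTable);
        }
    }

The admin.disableTable() call here reaches the HMaster.disableTable handler, which is exactly the method shown blocked behind RegionManager.metaScanner in the jstack below.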
>>
>> BTW, I am running Hadoop 0.20.2 with HBase 0.20.6
>>
>> Thread 47 (IPC Server handler 13 on 60000):
>>   State: BLOCKED
>>   Blocked count: 3801
>>   Waited count: 72719
>>   Blocked on java.lang.Object@75ac522c
>>   Blocked by 27 (RegionManager.metaScanner)
>>   Stack:
>>     org.apache.hadoop.hbase.master.TableOperation.process(TableOperation.java:154)
>>     org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:842)
>>     sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>>     sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     java.lang.reflect.Method.invoke(Method.java:597)
>>     org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>>     org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>>
>> Thread 27 (RegionManager.metaScanner):
>>   State: WAITING
>>   Blocked count: 1526058
>>   Waited count: 1488998
>>   Waiting on org.apache.hadoop.hbase.ipc.HBaseClient$Call@4dd44ab0
>>   Stack:
>>     java.lang.Object.wait(Native Method)
>>     java.lang.Object.wait(Object.java:485)
>>     org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:722)
>>     org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
>>     $Proxy1.get(Unknown Source)
>>     org.apache.hadoop.hbase.master.BaseScanner.checkAssigned(BaseScanner.java:543)
>>     org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:192)
>>     org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:73)
>>     org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
>>     org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:153)
>>     org.apache.hadoop.hbase.Chore.run(Chore.java:68)
>>
>>
>> On Fri, Feb 25, 2011 at 10:23 AM, Jean-Daniel Cryans <[email protected]> wrote:
>>> An hour to disable? That doesn't sound right at all :)
>>>
>>> I would approach this problem like I generally do with HBase issues:
>>> first check the master log for any weirdness regarding my problem (in
>>> this case, grep for the table name).
>>>
>>> Then I would look at the region server log(s) of the nodes that were
>>> hosting regions from that table. You should see the steps taken to
>>> disable the regions (starting to close, flush, region completely
>>> closed).
>>>
>>> If you are able to do it while it's taking a very long time to
>>> disable, try to jstack the process that seems to be hanging.
>>>
>>> Finally, like I said in another thread, there's a bug in 0.20.6 that
>>> almost prevents disabling a table (or re-enabling) if any region
>>> recently split and the parent wasn't cleaned yet from .META., and that
>>> is fixed in 0.90.1.
>>>
>>> J-D
>>>
>>> On Thu, Feb 24, 2011 at 11:37 PM, Nanheng Wu <[email protected]> wrote:
>>>> I think you are right, maybe in the long run I need to re-architect my
>>>> system so that it doesn't need to create new and delete old tables all
>>>> the time. In the short term I am having a really hard time with the
>>>> disable function: I ran a disable command on a very small table
>>>> (probably a dozen MBs in size) with no clients using the cluster at
>>>> all, and it took about 1 hour to complete! The weird thing is that on the
>>>> web UI only the region server carrying the .META. table has non-zero
>>>> requests; all other RSs have 0 requests the entire time. I would think
>>>> they should get some request to flush the memstore at least. I *am*
>>>> using the same RS nodes for some map reduce jobs at the time, and top
>>>> shows the memory usage is almost full on the node hosting the .META. region.
>>>> Would you have some idea of what I should investigate?
>>>> Thanks so much.
>>>
>>
>
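Regarding the 0.20.6 issue J-D mentions above (a parent region left behind by a recent split that hasn't been cleaned out of .META. can make disable/re-enable hang), here is a rough sketch, assuming the 0.20.x client API and catalog layout, of how one might scan .META. and spot leftover split parents for a given table; the class name and argument handling are just for illustration:

    // Sketch only: scan .META. and print regions of a given table that are still
    // marked offline/split, i.e. parents left over from a split.
    // Assumes the HBase 0.20.x client API and the "info:regioninfo" catalog column.
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.hbase.util.Writables;

    public class FindSplitParents {
        public static void main(String[] args) throws Exception {
            String tableName = args[0];
            HTable meta = new HTable(new HBaseConfiguration(), ".META.");
            ResultScanner scanner = meta.getScanner(new Scan());
            try {
                for (Result row : scanner) {
                    byte[] value = row.getValue(HConstants.CATALOG_FAMILY,
                                                HConstants.REGIONINFO_QUALIFIER);
                    if (value == null) continue;
                    HRegionInfo info = Writables.getHRegionInfo(value);
                    if (!Bytes.equals(info.getTableDesc().getName(),
                                      Bytes.toBytes(tableName))) {
                        continue;
                    }
                    // A parent left over from a split shows up as offline and/or split.
                    if (info.isOffline() || info.isSplit()) {
                        System.out.println("Leftover parent region: "
                            + info.getRegionNameAsString());
                    }
                }
            } finally {
                scanner.close();
            }
        }
    }

Seeing offline/split rows for a table that is about to be disabled would be consistent with that 0.20.6 bug rather than with contention on the metaScanner lock alone.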
