That is correct. Also, disabling a table requires scanning and modifying the .META. region, so if that region happens to be slow then the disable operation will be slow too.

J-D
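For reference, the lookup sequence described in the quoted message below can be seen from a plain client Get. This is a minimal sketch against the 0.20-era Java client; the table and row names are hypothetical:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class GetPath {
    public static void main(String[] args) throws Exception {
      HBaseConfiguration conf = new HBaseConfiguration(); // 0.20-era constructor
      // First use may read .META. to locate the table's regions;
      // the locations are then cached client-side.
      HTable table = new HTable(conf, "myTable");
      // With a warm cache this RPCs the hosting region server directly.
      // If the cache is stale, the client re-reads .META. and retries,
      // which is exactly the step that stalls when .META. is slow.
      Result r = table.get(new Get(Bytes.toBytes("someRow")));
      System.out.println("empty? " + r.isEmpty());
    }
  }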
On Tue, Mar 1, 2011 at 10:55 AM, Nanheng Wu <[email protected]> wrote:
> I see, so a Get is only affected for regions not in the client's cache.
> I think that explains why that error seems hard to reproduce. This is
> how I understood a Get works: if the client doesn't have the region in
> its cache it will read the .META. table; if it does, it makes an RPC to
> the RS directly. Each RS also knows which regions it actually hosts, so
> if the region is not there it means the client's cache is stale and the
> client has to read .META. again. Is this correct?
>
> On Tue, Mar 1, 2011 at 10:48 AM, Jean-Daniel Cryans <[email protected]> wrote:
>> Depends if you have to query .META. first to get the location of the
>> region that hosts the row.
>>
>> J-D
>>
>> On Tue, Mar 1, 2011 at 10:45 AM, Nanheng Wu <[email protected]> wrote:
>>> Man, I appreciate so much all the help you've provided so far. I
>>> guess I'll keep digging. Would this meta scan cause Gets or Scans on
>>> user tables to be very slow too? That was actually the problem I
>>> first noticed: from time to time a client Get or Scan would be
>>> extremely slow on my tables.
>>>
>>> On Tue, Mar 1, 2011 at 10:38 AM, Jean-Daniel Cryans <[email protected]> wrote:
>>>> The store file looks fairly new, maybe it was just major compacted?
>>>> You could easily figure that out from the region server's log.
>>>>
>>>> Your problem is definitely .META. region slowness. If scanning 8k
>>>> rows takes 30 minutes, scanning and updating 260 rows at the same
>>>> time probably isn't much faster.
>>>>
>>>> J-D
>>>>
>>>> On Tue, Mar 1, 2011 at 10:34 AM, Nanheng Wu <[email protected]> wrote:
>>>>> J-D,
>>>>>
>>>>> It looks like I don't have that many files:
>>>>>
>>>>> drwxr-xr-x - root supergroup 0 2011-02-28 23:21 /hbase/.META./1028785192
>>>>> -rw-r--r-- 3 root supergroup 932 2011-02-03 19:23 /hbase/.META./1028785192/.regioninfo
>>>>> drwxr-xr-x - root supergroup 0 2011-03-01 17:13 /hbase/.META./1028785192/info
>>>>> -rw-r--r-- 3 root supergroup 11831330 2011-03-01 17:13 /hbase/.META./1028785192/info/6687606720393313750
>>>>> drwxr-xr-x - root supergroup 0 2011-03-01 17:13 /hbase/.META./compaction.dir
>>>>>
>>>>> So you think compaction of the .META. table on the RS could be the
>>>>> problem? While the metaScanner is running I saw virtually no logs
>>>>> on that server, though I did see compactions once the meta scan
>>>>> ended and region closes started to happen. I wonder what the
>>>>> cluster is doing while the meta scan is going.
>>>>>
>>>>> I would definitely upgrade in the near future, it's just that I am
>>>>> afraid upgrading won't guarantee a fix if I don't understand the
>>>>> nature of the problem more clearly. We also have another test
>>>>> cluster, running the same versions of everything on cheaper
>>>>> hardware, and it doesn't have this problem.
>>>>>
>>>>> On Tue, Mar 1, 2011 at 9:52 AM, Jean-Daniel Cryans <[email protected]> wrote:
>>>>>> That long .META. scan might be the culprit, good catch!
>>>>>>
>>>>>> If you run this from the hadoop folder:
>>>>>>
>>>>>> bin/hadoop dfs -lsr /hbase/.META.
>>>>>>
>>>>>> Do you see a ton of files? Let's define "ton" as more than 16
>>>>>> files. If so, and I see you have a very high number of regions,
>>>>>> then you could be hitting this problem:
>>>>>> https://issues.apache.org/jira/browse/HBASE-3499
>>>>>>
>>>>>> In short, every few .META. changes will result in a flush, and it
>>>>>> could be that the region server hosting it isn't able to keep up
>>>>>> with the compactions. It's done this way because append isn't
>>>>>> supported in your Hadoop, so that you won't lose too much data if
>>>>>> the node dies, which is really bad when it happens to .META.
>>>>>>
>>>>>> In your case it might not be so bad to set your .META.'s
>>>>>> MEMSTORE_SIZE back to the default value if you're not writing,
>>>>>> hence not splitting. Running the script attached to that jira
>>>>>> would do it (don't forget to restart HBase after running it).
>>>>>>
>>>>>> Lastly, upgrading to HBase 0.90.1 and a Hadoop that supports
>>>>>> append should be a priority.
>>>>>>
>>>>>> J-D
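For reference, the dfs -lsr check can also be scripted. A minimal sketch with the stock Hadoop FileSystem API, assuming the default /hbase root and the .META. region directory shown in the listing above:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class MetaStoreFiles {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration(); // picks up hdfs-site.xml from the classpath
      FileSystem fs = FileSystem.get(conf);
      // 1028785192 is .META.'s region directory, as in the listing
      // above; "info" is its only column family.
      FileStatus[] files = fs.listStatus(new Path("/hbase/.META./1028785192/info"));
      System.out.println(".META. store files: " + files.length);
      // Per HBASE-3499, a count well above ~16 suggests flush/compaction churn.
    }
  }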
>>>>>> On Tue, Mar 1, 2011 at 9:30 AM, Nanheng Wu <[email protected]> wrote:
>>>>>>> Hi J-D:
>>>>>>>
>>>>>>> I did the scan like you suggested but no splits came up. This
>>>>>>> kind of makes sense to me, since we write HFiles using a MR job
>>>>>>> and then pretty much use HBase read-only, so maybe that's why
>>>>>>> the regions never got split. I think my slow-disable problem is
>>>>>>> related to this RegionManager.metaScanner. I grepped for it in
>>>>>>> the master's log and saw:
>>>>>>>
>>>>>>> 11/03/01 15:30:37 INFO master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.146.14.38:60020, regionname: .META.,,1, startKey: <>}
>>>>>>> 11/03/01 15:55:40 INFO master.BaseScanner: RegionManager.metaScanner scan of 8296 row(s) of meta region {server: 10.146.14.38:60020, regionname: .META.,,1, startKey: <>} complete
>>>>>>> 11/03/01 15:55:40 INFO master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.146.14.38:60020, regionname: .META.,,1, startKey: <>}
>>>>>>> Thread 27 (RegionManager.metaScanner):
>>>>>>> 11/03/01 16:21:04 INFO master.BaseScanner: RegionManager.metaScanner scan of 8296 row(s) of meta region {server: 10.146.14.38:60020, regionname: .META.,,1, startKey: <>} complete
>>>>>>> 11/03/01 16:21:04 INFO master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.146.14.38:60020, regionname: .META.,,1, startKey: <>}
>>>>>>> Thread 27 (RegionManager.metaScanner):
>>>>>>> 11/03/01 16:46:07 INFO master.BaseScanner: RegionManager.metaScanner scan of 8298 row(s) of meta region {server: 10.146.14.38:60020, regionname: .META.,,1, startKey: <>} complete
>>>>>>>
>>>>>>> It looks like the meta scanner is always running, each scan
>>>>>>> taking 20-30 minutes. Here's what I did:
>>>>>>>
>>>>>>> 1) disable table, issued at 16:28:31; according to the log above
>>>>>>> there's a meta scan in progress.
>>>>>>>
>>>>>>> 2) The command hangs, with no output in the .META. RS's log,
>>>>>>> until 11/03/01 16:46:57, when the client got:
>>>>>>>
>>>>>>> 11/03/01 16:46:58 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.146.14.38:60020
>>>>>>> 11/03/01 16:46:58 DEBUG client.HConnectionManager$TableServers: Found ROOT at 10.146.14.38:60020
>>>>>>> 11/03/01 16:46:58 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.146.14.38:60020
>>>>>>>
>>>>>>> This corresponds exactly to when the meta scan completed. The
>>>>>>> .META. RS started to spew some logs about closing regions.
>>>>>>>
>>>>>>> 3) Another meta scan started and the client hung again until it
>>>>>>> finished, and the disable command returned. The whole thing took
>>>>>>> 2688.5930 seconds.
>>>>>>>
>>>>>>> What kind of problem does it look like to you, J-D? What does
>>>>>>> this background task metaScanner do?
>>>>>>>
>>>>>>> Thanks!
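Some quick arithmetic on the log above: 8,296 .META. rows in roughly 25 minutes is about 5.5 rows per second, which is extremely slow for a scan of a single region. And 2688.59 seconds is about 45 minutes, which matches waiting out the remainder of the in-progress meta scan in step 2 plus one full scan in step 3.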
>>>>>>> On Mon, Feb 28, 2011 at 5:17 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>>>>>>> I didn't get most of the information I was looking for... but I
>>>>>>>> do have something to work with. I asked whether the metaScanner
>>>>>>>> was making progress or not, and from that log I think I see it
>>>>>>>> does:
>>>>>>>>
>>>>>>>>> 11/03/01 00:47:30 DEBUG client.HConnectionManager$TableServers: Rowscanned=261, rowsOffline=3
>>>>>>>>> 11/03/01 00:47:30 DEBUG client.HBaseAdmin: Sleep. Waiting for all regions to be disabled from myTable
>>>>>>>>
>>>>>>>> Only 3 rows are offline, whereas we expect all the regions to
>>>>>>>> be offline. I expect you see those lines a lot? If so, one
>>>>>>>> thing I'd like to see is the result of this command:
>>>>>>>>
>>>>>>>> scan '.META.', {STARTROW => "myTable,,", LIMIT => 261}
>>>>>>>>
>>>>>>>> It's going to be big. Then grep the result for the string
>>>>>>>> SPLIT, and please post back here the lines that match.
>>>>>>>>
>>>>>>>> J-D
>>>>>>>>
>>>>>>>> On Mon, Feb 28, 2011 at 5:04 PM, Nanheng Wu <[email protected]> wrote:
>>>>>>>>> I issued two more disable commands and looked at the .META.
>>>>>>>>> RS. The last log entry looks like:
>>>>>>>>>
>>>>>>>>> 11/03/01 00:29:14 INFO regionserver.HLog: Roll /hbase/.logs/<RS_HOSTNAME>,60020,1298935751670/hlog.dat.1298935752660, entries=13527, calcsize=3839046, filesize=2560143. New hlog /hbase/.logs/<RS_HOSTNAME>,60020,1298935751670/hlog.dat.1298939354072
>>>>>>>>>
>>>>>>>>> I issued the command at 00:34:59; the client debug output has
>>>>>>>>> nothing but "11/03/01 00:34:59 DEBUG zookeeper.ClientCnxn: Got
>>>>>>>>> ping response for sessionid: 0x12e6e938e9c000b after 90ms"
>>>>>>>>> until 00:47:19. Then all of a sudden it outputs the following:
>>>>>>>>>
>>>>>>>>> 11/03/01 00:47:28 DEBUG client.HConnectionManager$TableServers: Cache hit for row <> in tableName .META.: location server 10.146.14.38:60020, location region name .META.,,1
>>>>>>>>> 11/03/01 00:47:28 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.146.14.38:60020
>>>>>>>>> 11/03/01 00:47:28 DEBUG client.HConnectionManager$TableServers: Found ROOT at 10.146.14.38:60020
>>>>>>>>> 11/03/01 00:47:28 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.146.14.38:60020
>>>>>>>>> 11/03/01 00:47:30 DEBUG client.HConnectionManager$TableServers: Rowscanned=261, rowsOffline=3
>>>>>>>>> 11/03/01 00:47:30 DEBUG client.HBaseAdmin: Sleep. Waiting for all regions to be disabled from myTable
>>>>>>>>> 11/03/01 00:47:31 DEBUG client.HBaseAdmin: Wake. Waiting for all regions to be disabled from myTable
>>>>>>>>>
>>>>>>>>> Here's the full thread dump of the .META. RS from 00:34:59 to
>>>>>>>>> 00:47:31: http://pastebin.com/Ywwx8afQ
>>>>>>>>>
>>>>>>>>> I hope I compiled some useful information here. I am pretty
>>>>>>>>> lost at this point; I would really appreciate your help!
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> On Mon, Feb 28, 2011 at 2:58 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>>>>>>>>> Ok, so no debug from the disable command? BTW, you should
>>>>>>>>>> output your GC somewhere else, like the .out file or a
>>>>>>>>>> completely separate file.
>>>>>>>>>>
>>>>>>>>>> J-D
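On the GC-logging point: one common way to send GC output to its own file is via HBASE_OPTS in conf/hbase-env.sh. A sketch, assuming a HotSpot JVM; the log path is hypothetical:

  export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/hbase/gc-hbase.log"

Without something like this, GC lines land interleaved with the master log, as seen in the message below.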
>>>>>>>>>> On Mon, Feb 28, 2011 at 2:54 PM, Nanheng Wu <[email protected]> wrote:
>>>>>>>>>>> I see, so I should jstack the .META. RS. I'll do that.
>>>>>>>>>>>
>>>>>>>>>>> The master log pretty much looks like this; should I grep
>>>>>>>>>>> for something specific?
>>>>>>>>>>>
>>>>>>>>>>> 11/02/28 22:52:56 INFO master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region complete
>>>>>>>>>>> 2011-02-28T22:52:57.138+0000: 263241.014: [GC 263241.014: [ParNew: 19136K->2112K(19136K), 0.0033350 secs] 378606K->361596K(641944K) icms_dc=0 , 0.0034330 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
>>>>>>>>>>> 2011-02-28T22:53:04.956+0000: 263248.832: [GC 263248.832: [ParNew: 19068K->2112K(19136K), 0.0029690 secs] 378552K->361630K(641944K) icms_dc=0 , 0.0030460 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
>>>>>>>>>>> 2011-02-28T22:53:12.664+0000: 263256.540: [GC 263256.541: [ParNew: 19136K->2112K(19136K), 0.0037690 secs] 378654K->361647K(641944K) icms_dc=0 , 0.0038660 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
>>>>>>>>>>> 2011-02-28T22:53:20.608+0000: 263264.484: [GC 263264.484: [ParNew: 19136K->1810K(19136K), 0.0032600 secs] 378671K->361346K(641944K) icms_dc=0 , 0.0033470 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
>>>>>>>>>>> 2011-02-28T22:53:28.278+0000: 263272.154: [GC 263272.154: [ParNew: 18834K->2112K(19136K), 0.0029870 secs] 378370K->361661K(641944K) icms_dc=0 , 0.0030670 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
>>>>>>>>>>> 2011-02-28T22:53:35.919+0000: 263279.795: [GC 263279.795: [ParNew: 19136K->1747K(19136K), 0.0037090 secs] 378685K->361298K(641944K) icms_dc=0 , 0.0037920 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
>>>>>>>>>>> 11/02/28 22:53:36 INFO master.ServerManager: 9 region servers, 0 dead, average load 760.2222222222222
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 28, 2011 at 2:49 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>>>>>>>>>>> Are you able to tell if that call in metaScanner is
>>>>>>>>>>>> hanging, or if it's making multiple calls to the .META.
>>>>>>>>>>>> region?
>>>>>>>>>>>>
>>>>>>>>>>>> If the former, then jstack the region server that hosts
>>>>>>>>>>>> .META. and see where it's blocked.
>>>>>>>>>>>>
>>>>>>>>>>>> If the latter, then it means your .META. region is slow.
>>>>>>>>>>>> Again, what's going on on the RS that hosts .META.?
>>>>>>>>>>>>
>>>>>>>>>>>> Finally, what's the master's log like during that time?
>>>>>>>>>>>>
>>>>>>>>>>>> J-D
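For anyone following the jstack advice: with a stock JDK, something along these lines finds the HBase processes and dumps their threads (pids and output path will differ):

  jps | grep -Ei 'hmaster|hregionserver'   # list HBase JVM pids
  jstack <pid> > /tmp/hbase-threads.txt    # dump all thread stacks

The <pid> is a placeholder; run jstack against whichever process appears hung.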
>>>>>>>>>>>> On Mon, Feb 28, 2011 at 2:41 PM, Nanheng Wu <[email protected]> wrote:
>>>>>>>>>>>>> J-D,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks so much for your help so far! I sent disable
>>>>>>>>>>>>> commands on 4 rather small tables and they got stuck for a
>>>>>>>>>>>>> long time again, so I took a jstack of the master. From
>>>>>>>>>>>>> what I can tell, all disableTable calls are blocked by a
>>>>>>>>>>>>> meta scanner thread (sample below). At the moment there
>>>>>>>>>>>>> were no other requests to the server at all. How should I
>>>>>>>>>>>>> investigate this further? If it helps, here's some
>>>>>>>>>>>>> background: I have several datasets, each of them in a
>>>>>>>>>>>>> separate table. Our data pipeline produces a new version
>>>>>>>>>>>>> of each dataset every day, and only the latest version
>>>>>>>>>>>>> should be used. This is how the data is loaded: for each
>>>>>>>>>>>>> dataset, 1) run a MR job that outputs HFiles, 2) call
>>>>>>>>>>>>> loadTable.rb to create a new table, 3) disable and drop
>>>>>>>>>>>>> the previous version. As a result, some calls to load
>>>>>>>>>>>>> table and drop table would overlap. Please let me know if
>>>>>>>>>>>>> something stands out to you as a potential culprit.
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>
>>>>>>>>>>>>> BTW, I am running Hadoop 0.20.2 with HBase 0.20.6.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thread 47 (IPC Server handler 13 on 60000):
>>>>>>>>>>>>> State: BLOCKED
>>>>>>>>>>>>> Blocked count: 3801
>>>>>>>>>>>>> Waited count: 72719
>>>>>>>>>>>>> Blocked on java.lang.Object@75ac522c
>>>>>>>>>>>>> Blocked by 27 (RegionManager.metaScanner)
>>>>>>>>>>>>> Stack:
>>>>>>>>>>>>> org.apache.hadoop.hbase.master.TableOperation.process(TableOperation.java:154)
>>>>>>>>>>>>> org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:842)
>>>>>>>>>>>>> sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>>>>>>>>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>>>>>> java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>>>>>> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>>>>>>>>>>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thread 27 (RegionManager.metaScanner):
>>>>>>>>>>>>> State: WAITING
>>>>>>>>>>>>> Blocked count: 1526058
>>>>>>>>>>>>> Waited count: 1488998
>>>>>>>>>>>>> Waiting on org.apache.hadoop.hbase.ipc.HBaseClient$Call@4dd44ab0
>>>>>>>>>>>>> Stack:
>>>>>>>>>>>>> java.lang.Object.wait(Native Method)
>>>>>>>>>>>>> java.lang.Object.wait(Object.java:485)
>>>>>>>>>>>>> org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:722)
>>>>>>>>>>>>> org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
>>>>>>>>>>>>> $Proxy1.get(Unknown Source)
>>>>>>>>>>>>> org.apache.hadoop.hbase.master.BaseScanner.checkAssigned(BaseScanner.java:543)
>>>>>>>>>>>>> org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:192)
>>>>>>>>>>>>> org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:73)
>>>>>>>>>>>>> org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
>>>>>>>>>>>>> org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:153)
>>>>>>>>>>>>> org.apache.hadoop.hbase.Chore.run(Chore.java:68)
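The rotation in step 3 of that pipeline corresponds to this HBase shell sequence; the versioned table name is hypothetical:

  disable 'myDataset_20110227'   # blocks until every region is closed
  drop 'myDataset_20110227'      # only permitted once the table is disabled

With a slow .META., the disable half is where this pipeline stalls.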
>>>>>>>>>>>>> On Fri, Feb 25, 2011 at 10:23 AM, Jean-Daniel Cryans <[email protected]> wrote:
>>>>>>>>>>>>>> An hour to disable? That doesn't sound right at all :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would approach this problem like I generally do with
>>>>>>>>>>>>>> HBase issues: first check the master log for any
>>>>>>>>>>>>>> weirdness regarding the problem (in this case, grep for
>>>>>>>>>>>>>> the table name).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then I would look at the region server log(s) of the
>>>>>>>>>>>>>> nodes that were hosting regions from that table. You
>>>>>>>>>>>>>> should see the steps taken to disable the regions
>>>>>>>>>>>>>> (starting to close, flush, region completely closed).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you are able to do it while it's taking a very long
>>>>>>>>>>>>>> time to disable, try to jstack the process that seems to
>>>>>>>>>>>>>> be hanging.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Finally, like I said in another thread, there's a bug in
>>>>>>>>>>>>>> 0.20.6 that almost prevents disabling (or re-enabling) a
>>>>>>>>>>>>>> table if any region recently split and the parent wasn't
>>>>>>>>>>>>>> cleaned yet from .META.; that is fixed in 0.90.1.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> J-D
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Feb 24, 2011 at 11:37 PM, Nanheng Wu <[email protected]> wrote:
>>>>>>>>>>>>>>> I think you are right, maybe in the long run I need to
>>>>>>>>>>>>>>> re-architect my system so that it doesn't need to create
>>>>>>>>>>>>>>> new and delete old tables all the time. In the short
>>>>>>>>>>>>>>> term I am having a really hard time with the disable
>>>>>>>>>>>>>>> function: I ran a disable command on a very small table
>>>>>>>>>>>>>>> (probably dozens of MBs in size) with no clients using
>>>>>>>>>>>>>>> the cluster at all, and it took about 1 hour to
>>>>>>>>>>>>>>> complete! The weird thing is that on the web UI only the
>>>>>>>>>>>>>>> region server carrying the .META. table has non-zero
>>>>>>>>>>>>>>> requests; all other RSes have 0 requests the entire
>>>>>>>>>>>>>>> time. I would think they should get some requests, to
>>>>>>>>>>>>>>> flush the memstores at least. I *am* using the same RS
>>>>>>>>>>>>>>> nodes for some MapReduce jobs at the time, and top shows
>>>>>>>>>>>>>>> memory usage is almost full on the .META. RS. Would you
>>>>>>>>>>>>>>> have some idea of what I should investigate?
>>>>>>>>>>>>>>> Thanks so much.
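A concrete form of the greps suggested above, with hypothetical log paths; the exact close/flush messages vary by version:

  grep 'myTable' /var/log/hbase/hbase-*-master-*.log
  grep 'myTable' /var/log/hbase/hbase-*-regionserver-*.log | grep -iE 'clos|flush'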
