Re: One of the regionserver aborted, then the master shut down itself

茅旭峰 Wed, 16 Mar 2011 04:28:39 -0700

Regarding AssignmentManager, it looks like it only holds regions in
transition. We can see lots of region splits and unassignments in the master
log; I guess that is due to our large cells and the endless insertion. Does
this make sense?
I have not dug into the code, but I believe it removes the regions from
AssignmentManager.regions once the transition completes, right?
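
To put rough numbers on the analysis in my earlier mail (quoted below), here
is a back-of-the-envelope sketch in Java. It is only an illustration, not HBase
code: the class and method names are made up, the 3,200 bytes per HRegionInfo
is read off the heap dump, and the 80 bytes per RegionLoad is an assumption
(153,776 bytes of HServerLoad divided by roughly 1,900 regions per server). It
models the worst case, in which every entry in AssignmentManager.regions
retains its own full-size HServerInfo.

```java
// Back-of-the-envelope model, not HBase code. All names are hypothetical.
final class MasterHeapEstimate {
    // Retained size of one HRegionInfo key, as reported in the heap dump.
    static final long REGION_INFO_BYTES = 3_200L;
    // Assumed size of one RegionLoad inside an HServerLoad (a guess).
    static final long REGION_LOAD_BYTES = 80L;

    /** Worst-case bytes retained when every map entry has a private HServerInfo. */
    static long projectedBytes(long regions, long regionServers) {
        // Each HServerInfo carries one RegionLoad per region hosted on that server.
        long regionsPerServer = regions / regionServers;
        long bytesPerEntry = REGION_INFO_BYTES + regionsPerServer * REGION_LOAD_BYTES;
        return regions * bytesPerEntry;
    }

    public static void main(String[] args) {
        for (long regions : new long[] {1_000L, 7_600L, 20_000L}) {
            System.out.printf("%,d regions on 4 servers -> ~%,d bytes%n",
                    regions, projectedBytes(regions, 4));
        }
    }
}
```

For 7,600 regions on four servers this gives roughly 1.18 GB, the same order
of magnitude as the 950 MB retained in the dump. The model overshoots,
presumably because not every entry holds a full-size snapshot. The point is
the shape of the curve: the RegionLoad term grows with the square of the
region count.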


Mao Xu-Feng

On Wed, Mar 16, 2011 at 7:09 PM, 茅旭峰 <[email protected]> wrote:

> Hi J-D,
>
> Thanks for your reply.
>
> You said,
> ==
>
> Just as an example, every value that
> you insert first has to be copied from the socket before it can be
> inserted into the MemStore.  If you are using a big write buffer, that
> means that every insert currently in flight in a region server takes
> double that amount of space.
> ==
>
> How can I control the size of the write buffer? I found a property
> 'hbase.client.write.buffer' in hbase-default.xml; do you mean this one?
> We use the RESTful API to put our cells; hopefully this does not make
> any difference.
>
> As for the memory usage of the master, I did some further investigation
> today. I kept putting cells as before. As I said yesterday, the Java heap
> kept increasing accordingly, and eventually an OOME happened, as I
> expected. I set -Xmx to 1GB to speed up the OOME.
>
> Then I used Eclipse Memory Analyzer to analyze the hprof file. It reports
> that most of the Java heap is occupied by an instance of class
> AssignmentManager.
>
> (For ease of reading, you can copy the result below into whatever editor
> you like; at least that works for me.)
>
> Class Name | Shallow Heap | Retained Heap
> ------------------------------------------------------------------------
> org.apache.hadoop.hbase.master.AssignmentManager @ 0x7f01050d4c98 | 112 | 974,967,592
> |- <class> class org.apache.hadoop.hbase.master.AssignmentManager @ 0x7f013c21ebd0 | 8 | 8
> |- master org.apache.hadoop.hbase.master.HMaster @ 0x7f01050521e0 master-cloud135:60000 Busy Monitor, Thread | 328 | 3,000
> |- regionsInTransition java.util.concurrent.ConcurrentSkipListMap @ 0x7f01050c1000 | 88 | 296
> |- watcher org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher @ 0x7f01051cce68 | 136 | 1,720
> |- timeoutMonitor org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor @ 0x7f01052505a8 cloud135:60000.timeoutMonitor Thread | 208 | 592
> |- zkTable org.apache.hadoop.hbase.zookeeper.ZKTable @ 0x7f01052c0318 | 32 | 400
> |- catalogTracker org.apache.hadoop.hbase.catalog.CatalogTracker @ 0x7f01052c5fd0 | 72 | 376
> |- serverManager org.apache.hadoop.hbase.master.ServerManager @ 0x7f01052f0138 | 80 | 932,000
> |- regionPlans java.util.TreeMap @ 0x7f01052f01d8 | 80 | 104
> |- servers java.util.TreeMap @ 0x7f01052f0228 | 80 | 75,128
> |- regions java.util.TreeMap @ 0x7f01052f0278 | 80 | 950,435,488
> |  |- <class> class java.util.TreeMap @ 0x7f013be45c30 System Class | 16 | 16
> |  |- root java.util.TreeMap$Entry @ 0x7f010542b790 | 64 | 950,435,408
> |  |  |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class | 0 | 0
> |  |  |- left java.util.TreeMap$Entry @ 0x7f01053d34b0 | 64 | 579,650,616
> |  |  |  |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class | 0 | 0
> |  |  |  |- right java.util.TreeMap$Entry @ 0x7f01053d34f0 | 64 | 270,674,784
> |  |  |  |  |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class | 0 | 0
> |  |  |  |  |- left java.util.TreeMap$Entry @ 0x7f01053c7568 | 64 | 162,321,936
> |  |  |  |  |- parent java.util.TreeMap$Entry @ 0x7f01053d34b0 | 64 | 579,650,616
> |  |  |  |  |- right java.util.TreeMap$Entry @ 0x7f01054cbbe8 | 64 | 107,828,656
> |  |  |  |  |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f010f6866c0 | 72 | 154,328
> |  |  |  |  |  |- <class> class org.apache.hadoop.hbase.HServerInfo @ 0x7f013c61e3e0 | 8 | 8
> |  |  |  |  |  |- load org.apache.hadoop.hbase.HServerLoad @ 0x7f010540a548 | 40 | 153,776
> |  |  |  |  |  |- serverName java.lang.String @ 0x7f010540a9a8 cloud138,60020,1300161207678 | 40 | 120
> |  |  |  |  |  |- hostname java.lang.String @ 0x7f010540ab60 cloud138 | 40 | 80
> |  |  |  |  |  |- serverAddress org.apache.hadoop.hbase.HServerAddress @ 0x7f01054c3020 | 32 | 280
> |  |  |  |  |  '- Total: 5 entries | |
> |  |  |  |  |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f010f77bd68 | 88 | 3,200
> |  |  |  |  '- Total: 6 entries | |
> |  |  |  |- parent java.util.TreeMap$Entry @ 0x7f010542b790 | 64 | 950,435,408
> |  |  |  |- left java.util.TreeMap$Entry @ 0x7f0105432b70 | 64 | 307,135,480
> |  |  |  |  |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class | 0 | 0
> |  |  |  |  |- parent java.util.TreeMap$Entry @ 0x7f01053d34b0 | 64 | 579,650,616
> |  |  |  |  |- left java.util.TreeMap$Entry @ 0x7f01054512f8 | 64 | 139,023,720
> |  |  |  |  |- right java.util.TreeMap$Entry @ 0x7f0105681960 | 64 | 167,467,512
> |  |  |  |  |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f0112027ca8 | 88 | 3,200
> |  |  |  |  |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f01123a1188 | 72 | 184,040
> |  |  |  |  '- Total: 6 entries | |
> |  |  |  |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f010804cdc0 | 88 | 3,200
> |  |  |  |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f01080e00b0 | 72 | 220,672
> |  |  |  '- Total: 6 entries | |
> |  |  |- right java.util.TreeMap$Entry @ 0x7f0105426ff0 | 64 | 366,632,232
> |  |  |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f010a1689e8 | 72 | 192,552
> |  |  |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f010ae01598 | 88 | 3,200
> |  |  '- Total: 5 entries | |
> |  '- Total: 2 entries | |
> |- executorService org.apache.hadoop.hbase.executor.ExecutorService @ 0x7f010531ede0 | 40 | 5,792
> '- Total: 12 entries | |
> ------------------------------------------------------------------------
>
> We have over 7600 regions. It looks like AssignmentManager.regions keeps a
> <HRegionInfo,HServerInfo> pair for each region. Moreover, even though we
> have only four region servers in our environment, each
> <HRegionInfo,HServerInfo> pair has its own instance of HServerInfo, which
> takes hundreds of thousands of bytes per instance. It looks like most of
> the memory of an HServerInfo goes to the RegionLoads it contains, one for
> each region. The space requirement is then cM x M for some constant c,
> where M stands for the number of regions. I'm not sure whether my analysis
> is correct, but if it is, we should take this into account when doing
> capacity planning for the master, right?
>
> Thanks again for your patience.
>
> Mao Xu-Feng
>
>
> On Wed, Mar 16, 2011 at 1:41 AM, Jean-Daniel Cryans <[email protected]> wrote:
>
>> Inline.
>>
>> J-D
>>
>> On Tue, Mar 15, 2011 at 8:32 AM, 茅旭峰 <[email protected]> wrote:
>> > Thanks J-D for your reply.
>> >
>> > It looks like HBASE-3617 will be included in 0.92; when will 0.92 be
>> > released?
>>
>> It should be included in the bug fix release 0.90.2, which isn't
>> scheduled at the moment. Historically, HBase never had a tight
>> schedule and releases are made whenever a committer feels like there's
>> enough fixed jiras and gathers enough votes.
>>
>> >
>> > Yes, you're right, we launched tens of threads, putting values of 4MB on
>> > average, endlessly.
>> > Is the region server meant to die because of OOM? I thought it was the
>> > region servers' responsibility to flush memory stores into HDFS, so the
>> > limit when inserting endlessly should be the size of HDFS rather than
>> > the Java heap (we set a 4GB Java heap for the region server).
>>
>> Yes, the RS does control the MemStores. What it doesn't control very
>> well is all the queries that are in flight, plus the heap required to
>> do compactions, plus the data copied when flushing, plus all the other
>> small tidbits all over the place. Just as an example, every value that
>> you insert first has to be copied from the socket before it can be
>> inserted into the MemStore.  If you are using a big write buffer, that
>> means that every insert currently in flight in a region server takes
>> double that amount of space.
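
A rough illustration of the doubling described above, as a sketch with made-up
names rather than HBase code: if each in-flight value is held once in the
buffer read from the socket and once more on its way into the MemStore, the
transient heap use is about twice the in-flight payload. The thread count of
30 is a hypothetical stand-in for "tens of threads"; 4MB is the average value
size mentioned in this thread.

```java
// Illustrative arithmetic only. Class and method names are hypothetical.
final class InFlightHeapEstimate {
    static final long MB = 1024L * 1024L;

    /** Approximate transient heap held by puts that are still in flight. */
    static long inFlightBytes(int concurrentPuts, long valueBytes) {
        // Factor 2: one copy read from the socket, one copy headed for the MemStore.
        return 2L * concurrentPuts * valueBytes;
    }

    public static void main(String[] args) {
        int threads = 30;          // assumed; the thread only says "tens of threads"
        long valueBytes = 4 * MB;  // average value size from the stress test
        System.out.println(inFlightBytes(threads, valueBytes) / MB + " MB in flight");
    }
}
```

Under these assumptions about 240 MB of a 4GB heap is tied up before
MemStores, compactions and flushes are counted, and the figure scales linearly
with both the number of writers and the value size.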
>>
>> Garbage collection also isn't done as soon as the objects aren't used,
>> that wouldn't make sense given how it works, so there's space occupied
>> by dead objects.
>>
>> The jira tracking the handling of OOMEs in HBase is
>> https://issues.apache.org/jira/browse/HBASE-2506
>>
>> >
>> > Today we cleaned up the HDFS and reran the stress tests, I mean
>> > inserting endlessly. With Java memory monitoring tools like jconsole, we
>> > find that the Java heap of the master also keeps increasing; another OOM
>> > is expected, though it has not happened so far.
>> > Is the master meant to die in this situation?
>>
>> I think your monitoring is a bit naive, memory isn't cleaned as soon
>> as it's unused, that's not how the garbage collector works. Your OOME
>> in the master happens after a region server died because it's trying
>> to load too much data into memory.
>>
>> >
>> > Our keys are SHA1 hashed, which should spread them uniformly. But from
>> > the web page (master:60010), we can see most requests are handled by only
>> > one region server, and in the master log there are lots of region
>> > splits; eventually, the regions are spread uniformly among the region
>> > servers. Is this workflow correct?
>>
>> That's how it works. There's always one region in the beginning and
>> then it's split organically. You can create your tables pre-split
>> with this HBaseAdmin method:
>>
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor, byte[][])
>>
>> Or instead of trying to force your data into HBase, you could use the
>> bulk loader: http://hbase.apache.org/bulk-loads.html
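
As a sketch of what the split keys for the pre-splitting suggestion above
could look like, assuming the row keys are raw binary SHA1 digests (hex-encoded
keys would need different boundaries): since the hash is uniform, cutting the
range of the first key byte into equal slices should give regions of similar
size. The class and method names are hypothetical; only the byte[][] shape
comes from the createTable signature linked above.

```java
// Sketch only: computes split keys, does not talk to HBase.
// Class and method names are hypothetical.
final class Sha1SplitKeys {
    /**
     * Returns regionCount - 1 one-byte split keys that cut the 0x00..0xFF
     * range of the first row-key byte into equal slices.
     */
    static byte[][] evenSplits(int regionCount) {
        if (regionCount < 2 || regionCount > 256) {
            throw new IllegalArgumentException("regionCount must be in [2, 256]");
        }
        byte[][] splits = new byte[regionCount - 1][];
        for (int i = 1; i < regionCount; i++) {
            int boundary = i * 256 / regionCount; // 1..255, strictly increasing
            splits[i - 1] = new byte[] {(byte) boundary};
        }
        return splits;
    }

    public static void main(String[] args) {
        // The result would be passed as the byte[][] argument of createTable.
        for (byte[] split : evenSplits(4)) {
            System.out.printf("split at 0x%02X%n", split[0] & 0xFF);
        }
    }
}
```

With four regions the boundaries are 0x40, 0x80 and 0xC0, so each of the four
region servers could start with one region instead of all writes landing on a
single server until the first splits happen.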
>>
>> >
>> > Thanks again for your time, J-D.
>> >
>> > Mao Xu-Feng
>> >
>>
>
>
