Re: One of the regionserver aborted, then the master shut down itself

Ted Yu Wed, 16 Mar 2011 08:04:25 -0700

Thanks for your analysis.
Once a region is offline, it is removed from AssignmentManager.regions.

BTW your cluster needs more machines. 7600 regions over 4 nodes place too
much load on the servers.
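
For a rough sense of scale, the heap analysis quoted below suggests that
every entry in AssignmentManager.regions retains its own HServerInfo, whose
size grows with the number of regions on that server. A minimal Java sketch
of that arithmetic follows. The class and method names are hypothetical, the
80 bytes per RegionLoad is only an assumption (the 153,776-byte HServerLoad
in the dump divided by roughly 1900 regions per server), and regions are
assumed to be spread evenly. It is an estimate under those assumptions, not
a description of what the master actually does.

```java
/** Hypothetical back-of-the-envelope estimate; not part of HBase. */
final class MasterHeapEstimate {

    /**
     * Estimates the bytes retained by a map that keeps one private
     * HServerInfo copy per region, where each copy carries one load
     * record for every region hosted on the same server.
     */
    static long estimateBytes(long regions, long regionServers,
                              long bytesPerRegionLoad, long bytesPerRegionInfo) {
        // Assumption: regions are spread evenly; round up per server.
        long regionsPerServer = (regions + regionServers - 1) / regionServers;
        long bytesPerEntry = bytesPerRegionInfo + regionsPerServer * bytesPerRegionLoad;
        return regions * bytesPerEntry;
    }

    public static void main(String[] args) {
        long servers = 4;               // from the thread
        long bytesPerRegionLoad = 80;   // assumption derived from the dump
        long bytesPerRegionInfo = 3200; // HRegionInfo retained heap in the dump

        for (long regions : new long[] {1900, 3800, 7600, 15200}) {
            long bytes = estimateBytes(regions, servers,
                                       bytesPerRegionLoad, bytesPerRegionInfo);
            System.out.printf("%6d regions -> about %,d bytes%n", regions, bytes);
        }
    }
}
```

With these inputs the estimate for 7600 regions lands a little above 1GB,
the same order of magnitude as the 950,435,488 bytes retained by the regions
map in the dump, and doubling the region count roughly quadruples it.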

On Wed, Mar 16, 2011 at 4:28 AM, 茅旭峰 <[email protected]> wrote:

> Regarding AssignmentManager, it looks like it only holds regions in
> transition. We can see lots of region splits and unassignments in the
> master log. I guess this is due to our large cells and the endless
> insertion. Does this make sense?
> I have not dug into the code, but I believe it removes the regions from
> AssignmentManager.regions once the transition completes, right?
>
> Mao Xu-Feng
>
> On Wed, Mar 16, 2011 at 7:09 PM, 茅旭峰 <[email protected]> wrote:
>
> > Hi J-D,
> >
> > Thanks for your reply.
> >
> > You said,
> > ==
> >
> > Just as an example, every value that
> > you insert first has to be copied from the socket before it can be
> > inserted into the MemStore.  If you are using a big write buffer, that
> > means that every insert currently in flight in a region server takes
> > double that amount of space.
> > ==
> >
> > How can I control the size of the write buffer? I found the property
> > 'hbase.client.write.buffer' in hbase-default.xml; do you mean this one?
> > We use the RESTful API to put our cells; hopefully this does not make
> > any difference.
> >
> > As for the memory usage of the master, I investigated further today.
> > I kept putting cells as before. As I said yesterday, the Java heap kept
> > increasing accordingly, and eventually an OOME happened, as I expected.
> > I set -Xmx to 1GB to speed up the OOME.
> >
> > Then I used Eclipse Memory Analyzer to analyze the hprof file. It shows
> > that most of the Java heap is occupied by an instance of class
> > AssignmentManager.
> >
> > (For ease of reading, you can copy the result below into whatever
> > editor you like; at least that works for me.)
> >
> > Class Name | Shallow Heap | Retained Heap
> > ---------------------------------------------------------------------
> > org.apache.hadoop.hbase.master.AssignmentManager @ 0x7f01050d4c98 | 112 | 974,967,592
> > |- <class> class org.apache.hadoop.hbase.master.AssignmentManager @ 0x7f013c21ebd0 | 8 | 8
> > |- master org.apache.hadoop.hbase.master.HMaster @ 0x7f01050521e0 master-cloud135:60000 Busy Monitor, Thread | 328 | 3,000
> > |- regionsInTransition java.util.concurrent.ConcurrentSkipListMap @ 0x7f01050c1000 | 88 | 296
> > |- watcher org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher @ 0x7f01051cce68 | 136 | 1,720
> > |- timeoutMonitor org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor @ 0x7f01052505a8 cloud135:60000.timeoutMonitor Thread | 208 | 592
> > |- zkTable org.apache.hadoop.hbase.zookeeper.ZKTable @ 0x7f01052c0318 | 32 | 400
> > |- catalogTracker org.apache.hadoop.hbase.catalog.CatalogTracker @ 0x7f01052c5fd0 | 72 | 376
> > |- serverManager org.apache.hadoop.hbase.master.ServerManager @ 0x7f01052f0138 | 80 | 932,000
> > |- regionPlans java.util.TreeMap @ 0x7f01052f01d8 | 80 | 104
> > |- servers java.util.TreeMap @ 0x7f01052f0228 | 80 | 75,128
> > |- regions java.util.TreeMap @ 0x7f01052f0278 | 80 | 950,435,488
> > |  |- <class> class java.util.TreeMap @ 0x7f013be45c30 System Class | 16 | 16
> > |  |- root java.util.TreeMap$Entry @ 0x7f010542b790 | 64 | 950,435,408
> > |  |  |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class | 0 | 0
> > |  |  |- left java.util.TreeMap$Entry @ 0x7f01053d34b0 | 64 | 579,650,616
> > |  |  |  |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class | 0 | 0
> > |  |  |  |- right java.util.TreeMap$Entry @ 0x7f01053d34f0 | 64 | 270,674,784
> > |  |  |  |  |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class | 0 | 0
> > |  |  |  |  |- left java.util.TreeMap$Entry @ 0x7f01053c7568 | 64 | 162,321,936
> > |  |  |  |  |- parent java.util.TreeMap$Entry @ 0x7f01053d34b0 | 64 | 579,650,616
> > |  |  |  |  |- right java.util.TreeMap$Entry @ 0x7f01054cbbe8 | 64 | 107,828,656
> > |  |  |  |  |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f010f6866c0 | 72 | 154,328
> > |  |  |  |  |  |- <class> class org.apache.hadoop.hbase.HServerInfo @ 0x7f013c61e3e0 | 8 | 8
> > |  |  |  |  |  |- load org.apache.hadoop.hbase.HServerLoad @ 0x7f010540a548 | 40 | 153,776
> > |  |  |  |  |  |- serverName java.lang.String @ 0x7f010540a9a8 cloud138,60020,1300161207678 | 40 | 120
> > |  |  |  |  |  |- hostname java.lang.String @ 0x7f010540ab60 cloud138 | 40 | 80
> > |  |  |  |  |  |- serverAddress org.apache.hadoop.hbase.HServerAddress @ 0x7f01054c3020 | 32 | 280
> > |  |  |  |  |  '- Total: 5 entries | |
> > |  |  |  |  |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f010f77bd68 | 88 | 3,200
> > |  |  |  |  '- Total: 6 entries | |
> > |  |  |  |- parent java.util.TreeMap$Entry @ 0x7f010542b790 | 64 | 950,435,408
> > |  |  |  |- left java.util.TreeMap$Entry @ 0x7f0105432b70 | 64 | 307,135,480
> > |  |  |  |  |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class | 0 | 0
> > |  |  |  |  |- parent java.util.TreeMap$Entry @ 0x7f01053d34b0 | 64 | 579,650,616
> > |  |  |  |  |- left java.util.TreeMap$Entry @ 0x7f01054512f8 | 64 | 139,023,720
> > |  |  |  |  |- right java.util.TreeMap$Entry @ 0x7f0105681960 | 64 | 167,467,512
> > |  |  |  |  |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f0112027ca8 | 88 | 3,200
> > |  |  |  |  |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f01123a1188 | 72 | 184,040
> > |  |  |  |  '- Total: 6 entries | |
> > |  |  |  |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f010804cdc0 | 88 | 3,200
> > |  |  |  |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f01080e00b0 | 72 | 220,672
> > |  |  |  '- Total: 6 entries | |
> > |  |  |- right java.util.TreeMap$Entry @ 0x7f0105426ff0 | 64 | 366,632,232
> > |  |  |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f010a1689e8 | 72 | 192,552
> > |  |  |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f010ae01598 | 88 | 3,200
> > |  |  '- Total: 5 entries | |
> > |  '- Total: 2 entries | |
> > |- executorService org.apache.hadoop.hbase.executor.ExecutorService @ 0x7f010531ede0 | 40 | 5,792
> > '- Total: 12 entries | |
> > ---------------------------------------------------------------------
> >
> > We have over 7600 regions. It looks like AssignmentManager.regions keeps
> > a <HRegionInfo,HServerInfo> pair for each region. Moreover, even though
> > we have only four region servers in our environment, each
> > <HRegionInfo,HServerInfo> pair has its own instance of HServerInfo, which
> > takes hundreds of thousands of bytes. It looks like most of the memory of
> > each HServerInfo goes to the RegionLoads it holds, one per region. The
> > space requirement is then cM x M, where M stands for the number of
> > regions (about cM bytes per HServerInfo, times M map entries). I'm not
> > sure my analysis is correct; if it is, we should take this into account
> > when doing capacity planning for the master, right?
> >
> > Thanks again for your patience.
> >
> > Mao Xu-Feng
> >
> >
> > On Wed, Mar 16, 2011 at 1:41 AM, Jean-Daniel Cryans <[email protected]> wrote:
> >
> >> Inline.
> >>
> >> J-D
> >>
> >> On Tue, Mar 15, 2011 at 8:32 AM, 茅旭峰 <[email protected]> wrote:
> >> > Thanks J-D for your reply.
> >> >
> >> > It looks like HBASE-3617 will be included in 0.92. When will 0.92 be
> >> > released?
> >>
> >> It should be included in the bug fix release 0.90.2, which isn't
> >> scheduled at the moment. Historically, HBase never had a tight
> >> schedule and releases are made whenever a committer feels like there's
> >> enough fixed jiras and gathers enough votes.
> >>
> >> >
> >> > Yes, you're right, we launched tens of threads putting values of 4MB
> >> > on average, endlessly.
> >> > Is the region server meant to die because of OOM? I thought it was the
> >> > region servers' responsibility to flush MemStores into HDFS, so the
> >> > limit on endless insertion should be the size of HDFS rather than
> >> > the Java heap (we set a 4GB Java heap for the region server).
> >>
> >> Yes, the RS does control the MemStores. What it doesn't control very
> >> well is all the queries that are in flight, plus the heap required to
> >> do compactions, plus the data copied when flushing, plus all the other
> >> small tidbits all over the place. Just as an example, every value that
> >> you insert first has to be copied from the socket before it can be
> >> inserted into the MemStore.  If you are using a big write buffer, that
> >> means that every insert currently in flight in a region server takes
> >> double that amount of space.
> >>
> >> Garbage collection also isn't done as soon as the objects aren't used,
> >> that wouldn't make sense given how it works, so there's space occupied
> >> by dead objects.
> >>
> >> The jira tracking the handling of OOMEs in HBase is
> >> https://issues.apache.org/jira/browse/HBASE-2506
> >>
> >> >
> >> > Today we cleaned up HDFS and reran the stress tests, I mean inserting
> >> > endlessly.
> >> > With Java memory monitoring tools like jconsole, we find that the Java
> >> > heap of the master also keeps increasing. Another OOM is expected,
> >> > though it has not happened so far.
> >> > Is the master meant to die in this case?
> >>
> >> I think your monitoring is a bit naive, memory isn't cleaned as soon
> >> as it's unused, that's not how the garbage collector works. Your OOME
> >> in the master happens after a region server died because it's trying
> >> to load too much data into memory.
> >>
> >> >
> >> > Our keys are SHA1 hashed, which should spread uniformly. But from the
> >> > web page (master:60010), we can see that most requests are handled by
> >> > only one region server, and in the master log there are lots of region
> >> > splits. Eventually, the regions are spread uniformly among the region
> >> > servers. Is this workflow correct?
> >>
> >> That's how it works. There's always one region in the beginning and
> >> then it's split organically. You can create your tables pre-split
> >> with this HBaseAdmin method:
> >>
> >>
> >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor, byte[][])
> >>
> >> Or instead of trying to force your data into HBase, you could use the
> >> bulk loader: http://hbase.apache.org/bulk-loads.html
> >>
> >> >
> >> > Thanks again for your time, J-D.
> >> >
> >> > Mao Xu-Feng
> >> >
> >>
> >
> >
>
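
J-D's suggestion to pre-split the table could be sketched as follows,
assuming the row keys are the raw 20-byte SHA-1 digests (if they are stored
as hex strings, the boundaries would need to be hex text instead). The class
and method names are hypothetical. The sketch only computes evenly spaced
two-byte split points with the JDK; the call to
HBaseAdmin.createTable(HTableDescriptor, byte[][]) from the link above
appears only as a comment, because it needs the HBase client on the
classpath.

```java
/** Hypothetical helper: evenly spaced split keys for uniformly hashed row keys. */
final class PreSplitKeys {

    /**
     * Returns numRegions - 1 two-byte boundaries that cut the key space
     * [0x0000, 0xFFFF] into numRegions ranges of near-equal width.
     */
    static byte[][] evenSplits(int numRegions) {
        if (numRegions < 2 || numRegions > 65536) {
            throw new IllegalArgumentException("numRegions must be in [2, 65536]");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Boundary i sits at i/numRegions of the 16-bit prefix space.
            int boundary = (int) ((long) i * 65536 / numRegions);
            splits[i - 1] = new byte[] {(byte) (boundary >>> 8), (byte) boundary};
        }
        return splits;
    }

    /** Renders a key as lowercase hex for inspection. */
    static String toHex(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[][] splits = evenSplits(8);
        for (byte[] split : splits) {
            System.out.println(toHex(split));
        }
        // With the HBase client available, the keys would be passed as:
        //   admin.createTable(tableDescriptor, splits);
        // where admin is an HBaseAdmin and tableDescriptor an
        // HTableDescriptor (both hypothetical variables here).
    }
}
```

Because SHA-1 output is close to uniform, evenly spaced boundaries should
give each initial region a similar share of the writes, instead of all
traffic hitting the single region a new table starts with.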
