Thanks for your analysis. Once a region is offline, it is removed from AssignmentManager.regions.
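
As a rough cross-check of the heap dump quoted below, dividing the retained heap of AssignmentManager.regions by the number of regions gives the average cost of one <HRegionInfo,HServerInfo> entry. This is only a sketch: it assumes the region count is exactly 7600 (the thread says "over 7600"), and the class and method names are made up for illustration.

```java
public class MasterHeapEstimate {

    /** Average retained bytes per map entry, given the map's total retained heap. */
    static long averageEntryBytes(long retainedBytes, long regionCount) {
        return retainedBytes / regionCount;
    }

    /** Projected retained heap if every region keeps an entry of the given size. */
    static long projectedBytes(long regionCount, long bytesPerEntry) {
        return regionCount * bytesPerEntry;
    }

    public static void main(String[] args) {
        long retained = 950_435_488L; // retained heap of the "regions" TreeMap in the dump
        long regions = 7600L;         // assumption: the thread only says "over 7600"

        long perEntry = averageEntryBytes(retained, regions);
        System.out.println("average bytes per entry: " + perEntry);

        // Same per-entry cost applied to a hypothetical doubling of the region count.
        System.out.println("projected bytes at 2x regions: "
                + projectedBytes(2 * regions, perEntry));
    }
}
```

The average comes out at about 125 KB per entry, the same order of magnitude as the individual HServerInfo instances in the dump (154,328 to 220,672 bytes). The projection assumes the per-entry size stays constant, which is optimistic if each HServerInfo grows with the number of regions on its server.
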
BTW your cluster needs more machines. 7600 regions over 4 nodes place too
much load on the servers.

On Wed, Mar 16, 2011 at 4:28 AM, 茅旭峰 <[email protected]> wrote:

> Regarding AssignmentManager, it looks like it only holds regions in
> transition. We can see lots of region splits and unassignments in the
> master log. I guess this was due to our large cells and the endless
> insertion. Does this make sense?
> I have not dug into the code, but I do believe it removes the regions
> from AssignmentManager.regions once the transition completes, right?
>
> Mao Xu-Feng
>
> On Wed, Mar 16, 2011 at 7:09 PM, 茅旭峰 <[email protected]> wrote:
>
> > Hi J-D,
> >
> > Thanks for your reply.
> >
> > You said,
> > ==
> > Just as an example, every value that
> > you insert first has to be copied from the socket before it can be
> > inserted into the MemStore. If you are using a big write buffer, that
> > means that every insert currently in flight in a region server takes
> > double that amount of space.
> > ==
> >
> > How can I control the size of the write buffer? I found a property
> > 'hbase.client.write.buffer' in hbase-default.xml, do you mean this one?
> > We use the RESTful API to put our cells; hopefully this would not make
> > any difference.
> >
> > As for the memory usage of the master, I did a further investigation
> > today. I kept putting cells as before. As I said yesterday, the Java
> > heap kept increasing accordingly, and eventually an OOME happened as I
> > expected. I set -Xmx to 1GB to speed up the OOME.
> >
> > Then I used Eclipse Memory Analyzer to analyze the hprof file. It shows
> > that most of the Java heap is occupied by an instance of class
> > AssignmentManager.
> >
> > (For ease of reading, I think you can copy the result part to whatever
> > editor you like, at least it works for me.)
> >
> > Shallow Heap | Retained Heap | Class Name
> > -------------+---------------+------------------------------------------
> >          112 |   974,967,592 | org.apache.hadoop.hbase.master.AssignmentManager @ 0x7f01050d4c98
> >            8 |             8 | |- <class> class org.apache.hadoop.hbase.master.AssignmentManager @ 0x7f013c21ebd0
> >          328 |         3,000 | |- master org.apache.hadoop.hbase.master.HMaster @ 0x7f01050521e0 master-cloud135:60000 Busy Monitor, Thread
> >           88 |           296 | |- regionsInTransition java.util.concurrent.ConcurrentSkipListMap @ 0x7f01050c1000
> >          136 |         1,720 | |- watcher org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher @ 0x7f01051cce68
> >          208 |           592 | |- timeoutMonitor org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor @ 0x7f01052505a8 cloud135:60000.timeoutMonitor Thread
> >           32 |           400 | |- zkTable org.apache.hadoop.hbase.zookeeper.ZKTable @ 0x7f01052c0318
> >           72 |           376 | |- catalogTracker org.apache.hadoop.hbase.catalog.CatalogTracker @ 0x7f01052c5fd0
> >           80 |       932,000 | |- serverManager org.apache.hadoop.hbase.master.ServerManager @ 0x7f01052f0138
> >           80 |           104 | |- regionPlans java.util.TreeMap @ 0x7f01052f01d8
> >           80 |        75,128 | |- servers java.util.TreeMap @ 0x7f01052f0228
> >           80 |   950,435,488 | |- regions java.util.TreeMap @ 0x7f01052f0278
> >           16 |            16 | |  |- <class> class java.util.TreeMap @ 0x7f013be45c30 System Class
> >           64 |   950,435,408 | |  |- root java.util.TreeMap$Entry @ 0x7f010542b790
> >            0 |             0 | |  |  |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class
> >           64 |   579,650,616 | |  |  |- left java.util.TreeMap$Entry @ 0x7f01053d34b0
> >            0 |             0 | |  |  |  |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class
> >           64 |   270,674,784 | |  |  |  |- right java.util.TreeMap$Entry @ 0x7f01053d34f0
> >            0 |             0 | |  |  |  |  |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class
> >           64 |   162,321,936 | |  |  |  |  |- left java.util.TreeMap$Entry @ 0x7f01053c7568
> >           64 |   579,650,616 | |  |  |  |  |- parent java.util.TreeMap$Entry @ 0x7f01053d34b0
> >           64 |   107,828,656 | |  |  |  |  |- right java.util.TreeMap$Entry @ 0x7f01054cbbe8
> >           72 |       154,328 | |  |  |  |  |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f010f6866c0
> >            8 |             8 | |  |  |  |  |  |- <class> class org.apache.hadoop.hbase.HServerInfo @ 0x7f013c61e3e0
> >           40 |       153,776 | |  |  |  |  |  |- load org.apache.hadoop.hbase.HServerLoad @ 0x7f010540a548
> >           40 |           120 | |  |  |  |  |  |- serverName java.lang.String @ 0x7f010540a9a8 cloud138,60020,1300161207678
> >           40 |            80 | |  |  |  |  |  |- hostname java.lang.String @ 0x7f010540ab60 cloud138
> >           32 |           280 | |  |  |  |  |  |- serverAddress org.apache.hadoop.hbase.HServerAddress @ 0x7f01054c3020
> >              |               | |  |  |  |  |  '- Total: 5 entries
> >           88 |         3,200 | |  |  |  |  |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f010f77bd68
> >              |               | |  |  |  |  '- Total: 6 entries
> >           64 |   950,435,408 | |  |  |  |- parent java.util.TreeMap$Entry @ 0x7f010542b790
> >           64 |   307,135,480 | |  |  |  |- left java.util.TreeMap$Entry @ 0x7f0105432b70
> >            0 |             0 | |  |  |  |  |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class
> >           64 |   579,650,616 | |  |  |  |  |- parent java.util.TreeMap$Entry @ 0x7f01053d34b0
> >           64 |   139,023,720 | |  |  |  |  |- left java.util.TreeMap$Entry @ 0x7f01054512f8
> >           64 |   167,467,512 | |  |  |  |  |- right java.util.TreeMap$Entry @ 0x7f0105681960
> >           88 |         3,200 | |  |  |  |  |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f0112027ca8
> >           72 |       184,040 | |  |  |  |  |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f01123a1188
> >              |               | |  |  |  |  '- Total: 6 entries
> >           88 |         3,200 | |  |  |  |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f010804cdc0
> >           72 |       220,672 | |  |  |  |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f01080e00b0
> >              |               | |  |  |  '- Total: 6 entries
> >           64 |   366,632,232 | |  |  |- right java.util.TreeMap$Entry @ 0x7f0105426ff0
> >           72 |       192,552 | |  |  |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f010a1689e8
> >           88 |         3,200 | |  |  |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f010ae01598
> >              |               | |  |  '- Total: 5 entries
> >              |               | |  '- Total: 2 entries
> >           40 |         5,792 | |- executorService org.apache.hadoop.hbase.executor.ExecutorService @ 0x7f010531ede0
> >              |               | '- Total: 12 entries
> >
> > We have over 7600 regions. It looks like AssignmentManager.regions keeps
> > a <HRegionInfo,HServerInfo> pair for each region. Moreover, even though
> > we have only four region servers in our environment, each
> > <HRegionInfo,HServerInfo> pair has its own instance of HServerInfo,
> > which takes hundreds of thousands of bytes per instance. It looks like
> > most of the memory of each HServerInfo is used to hold the RegionLoads
> > for each region. The space requirement is then cM x M, where M stands
> > for the number of regions and c is some constant. I'm not sure whether
> > my analysis is correct, but if so, we should take this into account
> > when doing capacity planning for the master, right?
> >
> > Thanks again for your patience.
> >
> > Mao Xu-Feng
> >
> > On Wed, Mar 16, 2011 at 1:41 AM, Jean-Daniel Cryans <[email protected]> wrote:
> >
> >> Inline.
> >>
> >> J-D
> >>
> >> On Tue, Mar 15, 2011 at 8:32 AM, 茅旭峰 <[email protected]> wrote:
> >> > Thanks J-D for your reply.
> >> >
> >> > It looks like HBASE-3617 will be included in 0.92, then when will
> >> > 0.92 be released?
> >>
> >> It should be included in the bug fix release 0.90.2, which isn't
> >> scheduled at the moment. Historically, HBase never had a tight
> >> schedule and releases are made whenever a committer feels like there's
> >> enough fixed jiras and gathers enough votes.
> >>
> >> >
> >> > Yes, you're right, we launched tens of threads, putting values of
> >> > 4MB on average, endlessly.
> >> > Is the region server meant to die because of OOM? I thought it was
> >> > the region servers' responsibility to flush memory stores into HDFS,
> >> > so the limit when inserting endlessly should be the size of HDFS
> >> > rather than the Java heap (we set a 4GB Java heap for the region
> >> > server).
> >>
> >> Yes, the RS does control the MemStores. What it doesn't control very
> >> well is all the queries that are in flight, plus the heap required to
> >> do compactions, plus the data copied when flushing, plus all the other
> >> small tidbits all over the place. Just as an example, every value that
> >> you insert first has to be copied from the socket before it can be
> >> inserted into the MemStore. If you are using a big write buffer, that
> >> means that every insert currently in flight in a region server takes
> >> double that amount of space.
> >>
> >> Garbage collection also isn't done as soon as the objects aren't used,
> >> that wouldn't make sense given how it works, so there's space occupied
> >> by dead objects.
> >>
> >> The jira tracking the handling of OOMEs in HBase is
> >> https://issues.apache.org/jira/browse/HBASE-2506
> >>
> >> >
> >> > Today, we cleaned up the HDFS and reran the stress tests, I mean
> >> > inserting endlessly.
> >> > With Java memory monitoring tools, like jconsole, we find that the
> >> > Java heap of the master also keeps increasing. Another OOM is
> >> > expected, though it has not happened so far.
> >> > Is the master meant to die in this case?
> >>
> >> I think your monitoring is a bit naive, memory isn't cleaned as soon
> >> as it's unused, that's not how the garbage collector works. Your OOME
> >> in the master happens after a region server died because it's trying
> >> to load too much data into memory.
> >>
> >> >
> >> > Our keys are SHA1 hashed, which should spread uniformly. But from
> >> > the web page (master:60010), we can see most requests are handled by
> >> > only one region server, and in the master log there are lots of
> >> > region splits. Eventually, the regions are spread uniformly among
> >> > the region servers. Is this workflow correct?
> >>
> >> That's how it works. There's always one region in the beginning and
> >> then it's split organically. You can create your tables pre-split
> >> with this HBaseAdmin method:
> >>
> >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor, byte[][])
> >>
> >> Or instead of trying to force your data into HBase, you could use the
> >> bulk loader: http://hbase.apache.org/bulk-loads.html
> >>
> >> >
> >> > Thanks again for your time, J-D.
> >> >
> >> > Mao Xu-Feng
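
To illustrate the pre-splitting advice quoted above, here is a minimal sketch of how split keys for uniformly distributed row keys might be computed. It assumes the row keys are raw SHA-1 digest bytes (not hex strings), so the first byte is spread evenly over 0x00..0xFF; the class and method names are hypothetical, and the snippet deliberately avoids the HBase client so that it runs on its own.

```java
public class PreSplitKeys {

    /**
     * Returns numRegions - 1 single-byte split points that cut the
     * first-byte range 0x00..0xFF into numRegions roughly equal slices.
     */
    static byte[][] splitKeys(int numRegions) {
        if (numRegions < 2 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in [2, 256]");
        }
        byte[][] keys = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            int boundary = (i * 256) / numRegions; // strictly increasing for numRegions <= 256
            keys[i - 1] = new byte[] { (byte) boundary };
        }
        return keys;
    }

    public static void main(String[] args) {
        // Print the boundaries for a 4-region table as two-digit hex values.
        for (byte[] key : splitKeys(4)) {
            System.out.printf("%02x%n", key[0] & 0xff);
        }
        // With the HBase client on the classpath, the result would be passed as the
        // second argument of HBaseAdmin.createTable(HTableDescriptor, byte[][]).
    }
}
```

If the row keys are stored as hex strings instead, the boundaries would need to be the corresponding ASCII prefixes rather than raw bytes. The number of initial regions is a tuning choice that the thread does not specify.
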
