Regarding AssignmentManager, it looks like it only holds regions in transition. We can see lots of region splits and unassignments in the master log. I guess this is due to our large cells and the endless insertion. Does this make sense? I have not dug into the code, but I believe it removes the regions from AssignmentManager.regions once the transition completes, right?
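
As a rough sanity check on the heap dump quoted below, here is a small Java sketch of the arithmetic. It assumes the 950,435,488 bytes retained by AssignmentManager.regions are spread evenly over about 7600 entries, and that each HServerInfo carries one RegionLoad per region hosted on its server. Both are my reading of the dump, not something confirmed in the code, and the class and method names are made up for illustration.

```java
public class RegionsMapEstimate {
    // Figures copied from the heap dump quoted below.
    static final long RETAINED_BYTES = 950_435_488L; // retained heap of AssignmentManager.regions
    static final long REGIONS = 7_600L;              // "over 7600 regions", rounded down
    static final long SERVERS = 4L;                  // four region servers

    // Average retained bytes per <HRegionInfo,HServerInfo> entry.
    static long bytesPerEntry() {
        return RETAINED_BYTES / REGIONS;
    }

    // ASSUMPTION: every HServerInfo holds one RegionLoad per region on its
    // server, and regions are spread evenly over the servers.
    static long bytesPerRegionLoad() {
        return bytesPerEntry() / (REGIONS / SERVERS);
    }

    // Projected size of the map under the same assumption: one HServerInfo
    // per region, each holding (regions / servers) RegionLoads.
    static long projectedBytes(long regions, long servers) {
        return regions * (regions / servers) * bytesPerRegionLoad();
    }

    public static void main(String[] args) {
        System.out.println("bytes per entry:      " + bytesPerEntry());
        System.out.println("bytes per RegionLoad: " + bytesPerRegionLoad());
        System.out.println("projected, 7600:      " + projectedBytes(7_600L, SERVERS));
        System.out.println("projected, 15200:     " + projectedBytes(15_200L, SERVERS));
    }
}
```

If the assumption holds, doubling the number of regions on the same four servers roughly quadruples the size of the map, which is the c x M x M growth described below.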
Mao Xu-Feng

On Wed, Mar 16, 2011 at 7:09 PM, 茅旭峰 <[email protected]> wrote:

> Hi J-D,
>
> Thanks for your reply.
>
> You said,
> ==
>
> Just as an example, every value that
> you insert first has to be copied from the socket before it can be
> inserted into the MemStore. If you are using a big write buffer, that
> means that every insert currently in flight in a region server takes
> double that amount of space.
> ==
>
> How can I control the size of the write buffer? I found a property
> 'hbase.client.write.buffer' in hbase-default.xml; do you mean this one?
> We use the RESTful API to put our cells; hopefully this does not make
> any difference.
>
> As for the memory usage of the master, I did some further investigation
> today. I kept putting cells as before. As I said yesterday, the Java heap
> kept increasing accordingly, and eventually an OOME happened as I
> expected. I set -Xmx to 1GB to speed up the OOME.
>
> Then I used Eclipse Memory Analyzer to analyze the hprof file. It shows
> that most of the Java heap is occupied by an instance of class
> AssignmentManager.
>
> (For ease of reading, I think you can copy the result part into whatever
> editor you like; at least that works for me.)
>
> Class Name | Shallow Heap | Retained Heap
> ---------------------------------------------------------------------------
> org.apache.hadoop.hbase.master.AssignmentManager @ 0x7f01050d4c98 | 112 | 974,967,592
> |- <class> class org.apache.hadoop.hbase.master.AssignmentManager @ 0x7f013c21ebd0 | 8 | 8
> |- master org.apache.hadoop.hbase.master.HMaster @ 0x7f01050521e0 master-cloud135:60000 Busy Monitor, Thread | 328 | 3,000
> |- regionsInTransition java.util.concurrent.ConcurrentSkipListMap @ 0x7f01050c1000 | 88 | 296
> |- watcher org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher @ 0x7f01051cce68 | 136 | 1,720
> |- timeoutMonitor org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor @ 0x7f01052505a8 cloud135:60000.timeoutMonitor Thread | 208 | 592
> |- zkTable org.apache.hadoop.hbase.zookeeper.ZKTable @ 0x7f01052c0318 | 32 | 400
> |- catalogTracker org.apache.hadoop.hbase.catalog.CatalogTracker @ 0x7f01052c5fd0 | 72 | 376
> |- serverManager org.apache.hadoop.hbase.master.ServerManager @ 0x7f01052f0138 | 80 | 932,000
> |- regionPlans java.util.TreeMap @ 0x7f01052f01d8 | 80 | 104
> |- servers java.util.TreeMap @ 0x7f01052f0228 | 80 | 75,128
> |- regions java.util.TreeMap @ 0x7f01052f0278 | 80 | 950,435,488
> | |- <class> class java.util.TreeMap @ 0x7f013be45c30 System Class | 16 | 16
> | |- root java.util.TreeMap$Entry @ 0x7f010542b790 | 64 | 950,435,408
> | | |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class | 0 | 0
> | | |- left java.util.TreeMap$Entry @ 0x7f01053d34b0 | 64 | 579,650,616
> | | | |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class | 0 | 0
> | | | |- right java.util.TreeMap$Entry @ 0x7f01053d34f0 | 64 | 270,674,784
> | | | | |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class | 0 | 0
> | | | | |- left java.util.TreeMap$Entry @ 0x7f01053c7568 | 64 | 162,321,936
> | | | | |- parent java.util.TreeMap$Entry @ 0x7f01053d34b0 | 64 | 579,650,616
> | | | | |- right java.util.TreeMap$Entry @ 0x7f01054cbbe8 | 64 | 107,828,656
> | | | | |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f010f6866c0 | 72 | 154,328
> | | | | | |- <class> class org.apache.hadoop.hbase.HServerInfo @ 0x7f013c61e3e0 | 8 | 8
> | | | | | |- load org.apache.hadoop.hbase.HServerLoad @ 0x7f010540a548 | 40 | 153,776
> | | | | | |- serverName java.lang.String @ 0x7f010540a9a8 cloud138,60020,1300161207678 | 40 | 120
> | | | | | |- hostname java.lang.String @ 0x7f010540ab60 cloud138 | 40 | 80
> | | | | | |- serverAddress org.apache.hadoop.hbase.HServerAddress @ 0x7f01054c3020 | 32 | 280
> | | | | | '- Total: 5 entries | |
> | | | | |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f010f77bd68 | 88 | 3,200
> | | | | '- Total: 6 entries | |
> | | | |- parent java.util.TreeMap$Entry @ 0x7f010542b790 | 64 | 950,435,408
> | | | |- left java.util.TreeMap$Entry @ 0x7f0105432b70 | 64 | 307,135,480
> | | | | |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class | 0 | 0
> | | | | |- parent java.util.TreeMap$Entry @ 0x7f01053d34b0 | 64 | 579,650,616
> | | | | |- left java.util.TreeMap$Entry @ 0x7f01054512f8 | 64 | 139,023,720
> | | | | |- right java.util.TreeMap$Entry @ 0x7f0105681960 | 64 | 167,467,512
> | | | | |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f0112027ca8 | 88 | 3,200
> | | | | |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f01123a1188 | 72 | 184,040
> | | | | '- Total: 6 entries | |
> | | | |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f010804cdc0 | 88 | 3,200
> | | | |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f01080e00b0 | 72 | 220,672
> | | | '- Total: 6 entries | |
> | | |- right java.util.TreeMap$Entry @ 0x7f0105426ff0 | 64 | 366,632,232
> | | |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f010a1689e8 | 72 | 192,552
> | | |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f010ae01598 | 88 | 3,200
> | | '- Total: 5 entries | |
> | '- Total: 2 entries | |
> |- executorService org.apache.hadoop.hbase.executor.ExecutorService @ 0x7f010531ede0 | 40 | 5,792
> '- Total: 12 entries | |
> ---------------------------------------------------------------------------
>
> We have over 7600 regions. It looks like AssignmentManager.regions keeps a
> <HRegionInfo,HServerInfo> pair for each region. Moreover, even though we
> have only four region servers in our environment, each
> <HRegionInfo,HServerInfo> pair has its own instance of HServerInfo, which
> takes hundreds of thousands of bytes per instance. It looks like most of
> the memory of each HServerInfo goes to holding the RegionLoads for each
> region. The space requirement is then c x M x M, where c is a constant and
> M stands for the number of regions. I'm not sure whether my analysis is
> correct, but if it is, we should take this into account when doing
> capacity planning for the master, right?
>
> Thanks again for your patience.
>
> Mao Xu-Feng
>
>
> On Wed, Mar 16, 2011 at 1:41 AM, Jean-Daniel Cryans
> <[email protected]> wrote:
>
>> Inline.
>>
>> J-D
>>
>> On Tue, Mar 15, 2011 at 8:32 AM, 茅旭峰 <[email protected]> wrote:
>> > Thanks J-D for your reply.
>> >
>> > It looks like HBASE-3617 will be included in 0.92, so when will 0.92
>> > be released?
>>
>> It should be included in the bug fix release 0.90.2, which isn't
>> scheduled at the moment. Historically, HBase never had a tight
>> schedule and releases are made whenever a committer feels like there's
>> enough fixed jiras and gathers enough votes.
>>
>> >
>> > Yes, you're right, we launched tens of threads, putting values of 4MB
>> > on average, endlessly.
>> > Is the region server meant to die because of OOM?
>> > I thought it was the region servers' responsibility to flush memory
>> > stores into HDFS, so the limit when inserting endlessly should be the
>> > size of HDFS rather than the Java heap (we set a 4GB Java heap for
>> > the region server).
>>
>> Yes, the RS does control the MemStores. What it doesn't control very
>> well is all the queries that are in flight, plus the heap required to
>> do compactions, plus the data copied when flushing, plus all the other
>> small tidbits all over the place. Just as an example, every value that
>> you insert first has to be copied from the socket before it can be
>> inserted into the MemStore. If you are using a big write buffer, that
>> means that every insert currently in flight in a region server takes
>> double that amount of space.
>>
>> Garbage collection also isn't done as soon as the objects aren't used,
>> that wouldn't make sense given how it works, so there's space occupied
>> by dead objects.
>>
>> The jira tracking the handling of OOMEs in HBase is
>> https://issues.apache.org/jira/browse/HBASE-2506
>>
>> >
>> > Today, we cleaned up HDFS and reran the stress tests, I mean
>> > inserting endlessly.
>> > With Java memory monitoring tools like jconsole, we find that the
>> > Java heap of the master also keeps increasing. Another OOM is
>> > expected, though it has not happened so far.
>> > Is the master meant to die in this case?
>>
>> I think your monitoring is a bit naive, memory isn't cleaned as soon
>> as it's unused, that's not how the garbage collector works. Your OOME
>> in the master happens after a region server died because it's trying
>> to load too much data into memory.
>>
>> >
>> > Our keys are SHA1 hashed, which should spread uniformly.
>> > But from the web page (master:60010), we can see that most requests
>> > are handled by only one region server, and in the master log there
>> > are lots of region splits. Eventually the regions are spread
>> > uniformly among the region servers. Is this workflow correct?
>>
>> That's how it works. There's always one region in the beginning and
>> then it's split organically. You can create your tables pre-split
>> with this HBaseAdmin method:
>>
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor, byte[][])
>>
>> Or instead of trying to force your data into HBase, you could use the
>> bulk loader: http://hbase.apache.org/bulk-loads.html
>>
>> >
>> > Thanks again for your time, J-D.
>> >
>> > Mao Xu-Feng
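
For the pre-split suggestion above, the split keys for uniformly distributed SHA-1 row keys could be computed along these lines. This is only a sketch: it assumes the row keys are the raw 20-byte digests (hex-encoded keys would need different boundaries), and the class and method names are hypothetical. The resulting array is meant to be passed as the byte[][] argument of the HBaseAdmin.createTable(HTableDescriptor, byte[][]) method linked above; that call is not made here so the example stays runnable without an HBase cluster.

```java
import java.math.BigInteger;

public class Sha1SplitKeys {
    // ASSUMPTION: row keys are raw SHA-1 digests, i.e. 20 bytes each.
    static final int KEY_LENGTH = 20;

    // Returns numRegions - 1 boundaries that cut the 160-bit key space
    // into numRegions equally sized ranges.
    static byte[][] uniformSplits(int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        BigInteger keySpace = BigInteger.ONE.shiftLeft(KEY_LENGTH * 8);
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            BigInteger boundary = keySpace
                    .multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(numRegions));
            splits[i - 1] = toFixedWidth(boundary);
        }
        return splits;
    }

    // BigInteger.toByteArray() is big-endian and may be shorter than 20
    // bytes or carry an extra leading sign byte, so copy the low-order
    // bytes into a zero-padded 20-byte array.
    static byte[] toFixedWidth(BigInteger value) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[KEY_LENGTH];
        int copy = Math.min(raw.length, KEY_LENGTH);
        System.arraycopy(raw, raw.length - copy, out, KEY_LENGTH - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        for (byte[] split : uniformSplits(4)) {
            System.out.printf("first byte of split key: 0x%02x%n", split[0] & 0xff);
        }
    }
}
```

With keys spread uniformly by the hash, each pre-created region should then receive a similar share of the writes from the start, instead of all writes landing on the single initial region.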
