Coming up is an enhancement that would make MSLAB even better: HBASE-8163, "MemStoreChunkPool: An improvement for JAVA GC when using MSLAB".
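As a rough illustration of the knobs involved, the sketch below sets the MSLAB properties (and the chunk-pool fraction proposed in HBASE-8163) programmatically. This is only to make the property names visible; in practice they belong in hbase-site.xml on the region servers, and the chunk-pool property name is an assumption based on the patch, so verify it against the version you actually run.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MslabSettings {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();

        // MSLAB is on by default since 0.92.0; shown here only to make the knob explicit.
        conf.setBoolean("hbase.hregion.memstore.mslab.enabled", true);

        // Size of each allocation chunk handed to a memstore (2 MB default).
        conf.setInt("hbase.hregion.memstore.mslab.chunksize", 2 * 1024 * 1024);

        // Allocations larger than this bypass MSLAB and go straight to the heap (256 KB default).
        conf.setInt("hbase.hregion.memstore.mslab.max.allocation", 256 * 1024);

        // Assumed knob from HBASE-8163: fraction of the global memstore size that may be
        // retained as reusable chunks instead of being freed and re-allocated, which is
        // what reduces old-generation churn. Property name taken from the patch.
        conf.setFloat("hbase.hregion.memstore.chunkpool.maxsize", 0.2f);

        System.out.println("MSLAB enabled: "
                + conf.getBoolean("hbase.hregion.memstore.mslab.enabled", false));
    }
}
```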
FYI

On Sat, Mar 23, 2013 at 5:31 PM, Pankaj Gupta <[email protected]> wrote:

> Thanks a lot for the explanation. It's good to know that MSLAB is stable and safe to enable (we don't have it enabled right now; we're using 0.92). This would allow us to allocate memory to HBase more freely. I really enjoyed the depth of explanation from both Enis and J-D. I was indeed mistakenly referring to the HFile as the HLog; fortunately you were still able to understand my question.
>
> Thanks,
> Pankaj

On Mar 21, 2013, at 1:28 PM, Enis Söztutar <[email protected]> wrote:

> I think the page cache is not totally useless, but as long as you can control the GC, you should prefer the block cache. Some of the reasons off the top of my head:
>
> - On a cache hit in the OS page cache, you have to go through the DataNode layer (an RPC if short-circuit reads are disabled), make a kernel jump, and read using libc read(), whereas reading a block from the block cache involves only the HBase process: no process switch and no kernel jump.
> - The read access path is optimized per HFile block. FS page boundaries and HFile block boundaries are not aligned at all.
> - There is very little control over what the page cache does or does not cache based on expected access patterns. For example, we can mark META region blocks, some column families, and HFile index blocks as always cached or cached with high priority. Also, for full table scans, we can explicitly disable block caching so as not to trash the current working set. With the OS page cache you do not have this control.
>
> Enis

On Wed, Mar 20, 2013 at 10:30 AM, Jean-Daniel Cryans <[email protected]> wrote:

> First, MSLAB has been enabled by default since 0.92.0, as it was deemed stable enough. So, unless you are on 0.90, you are already using it.
>
> Also, I'm not sure why you are referencing the HLog in your first paragraph in the context of reading from disk, because the HLogs are rarely read (only on recovery). Maybe you meant HFile?
>
> In any case, your email covers most arguments except for one: checksumming. Retrieving a block from HDFS, even when using short-circuit reads to go directly to the OS instead of passing through the DN, will take quite a bit more time than reading directly from the block cache. This is why, even if you disable block caching on a family, the index and root blocks will still be block cached, as reading those very hot blocks from disk would take way too long.
>
> Regarding your main question (how does the OS buffer help?), I don't have a good answer. It kind of depends on the amount of RAM you have and what your workload is like. As a data point, I've been successfully running with 24GB of heap (50% dedicated to the block cache) with a workload consisting mainly of small writes, short scans, and a typical random read distribution for a website. I can't remember the last time I saw a full GC, and it's been running like this for more than a year.
>
> Hope this somehow helps,
>
> J-D

On Wed, Mar 20, 2013 at 12:34 AM, Pankaj Gupta <[email protected]> wrote:

> Given that HBase has its own cache (block cache and bloom filters) and that all the table data is stored in HDFS, I'm wondering whether HBase benefits from the OS page cache at all. In the setup I'm using, the HBase region servers run on the same boxes as the HDFS DataNodes. In such a scenario, if the underlying HLog files live on the same machine, then having a healthy memory surplus may mean that the DataNode can serve the underlying files from the page cache and thus improve HBase performance. Is this really the case? (I guess the page cache should also help when the HLog file lives on a different machine, but in that case network I/O will probably drown out the speedup gained by not hitting the disk.)
>
> I'm asking because, if the page cache were useful, then not utilizing all the memory on the machine for the region server may not be that bad. The reason one would not want to give all the memory to the region server is the long garbage collection pauses that a large heap may induce. I understand that work has been done to fix the long pauses caused by memory fragmentation in the old generation under the mostly-concurrent garbage collector, by using a slab allocator for the memstore, but that feature is marked experimental and we're not ready to take risks yet. So if the page cache were useful in any way on region servers, we could give the region server process less memory with the understanding that the free memory on the machine is not completely going to waste. Hence my curiosity about the utility of the OS page cache to HBase performance.
>
> Thanks in advance,
> Pankaj
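Enis's point about per-family and per-scan control over the block cache maps directly onto the client and admin APIs of that era. The sketch below, written against the 0.92/0.94-style API, marks a hot family as in-memory, leaves block caching off for a rarely read family, and disables block caching for a one-off full scan; the table and family names are made up purely for illustration.

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Scan;

public class BlockCacheControl {

    // Hypothetical table layout, purely for illustration.
    public static HTableDescriptor describeTable() {
        HTableDescriptor table = new HTableDescriptor("webdata");

        // Hot family: cache its blocks with in-memory (high) priority.
        HColumnDescriptor hot = new HColumnDescriptor("profile");
        hot.setBlockCacheEnabled(true);
        hot.setInMemory(true);

        // Cold family: don't let bulk reads of it churn the block cache.
        // Index and root blocks are still cached, as J-D notes above.
        HColumnDescriptor cold = new HColumnDescriptor("rawlogs");
        cold.setBlockCacheEnabled(false);

        table.addFamily(hot);
        table.addFamily(cold);
        return table;
    }

    // For a one-off full table scan, skip the block cache entirely so the
    // scan does not evict the current working set.
    public static Scan fullScan() {
        Scan scan = new Scan();
        scan.setCacheBlocks(false);
        return scan;
    }
}
```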
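To put J-D's data point (a 24 GB heap with half of it given to the block cache) into configuration terms: the block cache fraction is controlled by hfile.block.cache.size, while the heap itself is set via HBASE_HEAPSIZE in hbase-env.sh. The snippet below only illustrates the knob names; the 24 GB and 50% figures are J-D's example, not a recommendation.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BlockCacheSizing {
    public static void main(String[] args) {
        // The region server heap is set outside this file, e.g. in hbase-env.sh:
        //   export HBASE_HEAPSIZE=24000   # megabytes, roughly the 24 GB in J-D's example
        Configuration conf = HBaseConfiguration.create();

        // Fraction of the region server heap reserved for the block cache
        // (0.5 mirrors the "50% dedicated to the block cache" data point).
        conf.setFloat("hfile.block.cache.size", 0.5f);

        System.out.println("Block cache fraction: "
                + conf.getFloat("hfile.block.cache.size", 0.25f));
    }
}
```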
