> With very large heaps and a GC that can handle them (perhaps the G1 GC),
> another option which might be worth experimenting with is a key-value (KV)
> cache independent of the block cache which could be enabled on a per-table
> basis

Thanks Andy for bringing this up. We had some discussions a while ago about
a row cache (or KV cache):
http://search-hadoop.com/m/XTlxT1xRtYw/hbase+key+value+cache+from%253Aenis&subj=RE+keyvalue+cache
The takeaway was that if you are mostly doing point gets, rather than scans,
this cache might be better.

> 1) [HBASE-7404]: L1/L2 block cache

I knew about the bucket cache, but not that the bucket cache could hold
compressed blocks. Is that the case, or are you suggesting we can add that
to this L2 cache?

> 2) [HBASE-5263] Preserving cached data on compactions through
> cache-on-write

Thanks, this is the same idea. I'll track the ticket.

Enis

On Mon, Mar 25, 2013 at 12:18 PM, Liyin Tang <[email protected]> wrote:

> Hi Enis,
> Good ideas! And the hbase community is driving these 2 items:
> 1) [HBASE-7404]: L1/L2 block cache
> 2) [HBASE-5263] Preserving cached data on compactions through
> cache-on-write
>
> Thanks a lot
> Liyin
> ________________________________________
> From: Enis Söztutar [[email protected]]
> Sent: Monday, March 25, 2013 11:24 AM
> To: hbase-user
> Cc: lars hofhansl
> Subject: Re: Does HBase RegionServer benefit from OS Page Cache
>
> Thanks Liyin for sharing your use cases.
>
> Related to those, I was thinking of two improvements:
> - AFAIK, MySQL keeps the compressed and uncompressed versions of the
> blocks in its block cache, falling back to the compressed one if the
> decompressed one gets evicted. With very large heaps, maybe keeping the
> compressed blocks around in a secondary cache makes sense?
> - A compaction will trash the cache. But maybe we can track the keyvalues
> inside cached blocks for the files in the compaction, and mark the blocks
> of the resulting compacted file which contain previously cached keyvalues
> to be cached after the compaction. I have to research the feasibility of
> this approach.
>
> Enis
>
>
> On Sun, Mar 24, 2013 at 10:15 PM, Liyin Tang <[email protected]> wrote:
>
> > The block cache is for uncompressed data while the OS page cache
> > contains the compressed data. Unless the request pattern is a full-table
> > sequential scan, the block cache is still quite useful.
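[Editor's aside: to make the MySQL-style two-level cache Enis describes
above concrete, here is a minimal sketch. The class and method names are
hypothetical, not HBase code: L1 holds uncompressed blocks with LRU
eviction, and evicted blocks fall back to L2 in compressed form, so a later
hit costs a decompression instead of a disk read.]

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch of a two-level block cache: L1 = uncompressed blocks
// (LRU); blocks evicted from L1 are demoted to L2 in compressed form.
class TwoLevelBlockCache {
    private final Map<String, byte[]> l2 = new LinkedHashMap<>(); // compressed
    private final LinkedHashMap<String, byte[]> l1;               // uncompressed

    TwoLevelBlockCache(final int l1MaxBlocks) {
        // access-order LinkedHashMap gives us LRU eviction for free
        l1 = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> e) {
                if (size() > l1MaxBlocks) {
                    l2.put(e.getKey(), compress(e.getValue())); // demote to L2
                    return true;
                }
                return false;
            }
        };
    }

    void cacheBlock(String key, byte[] block) {
        l1.put(key, block);
    }

    /** Returns the block, promoting from L2 on an L1 miss; null if absent. */
    byte[] getBlock(String key) {
        byte[] block = l1.get(key);
        if (block == null) {
            byte[] compressed = l2.remove(key);
            if (compressed != null) {
                block = decompress(compressed);
                l1.put(key, block); // promote back to L1
            }
        }
        return block;
    }

    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] data) {
        try {
            Inflater inflater = new Inflater();
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) out.write(buf, 0, inflater.inflate(buf));
            inflater.end();
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new RuntimeException(e);
        }
    }
}
```

A real implementation would also bound L2 and account for on-heap vs
off-heap placement, which is what the HBASE-7404 bucket cache work is about.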
> > I think the size of the block cache should be the amount of hot data we
> > want to retain within a compaction cycle, which is quite hard to
> > estimate in some use cases.
> >
> > Thanks a lot
> > Liyin
> > ________________________________________
> > From: lars hofhansl [[email protected]]
> > Sent: Saturday, March 23, 2013 10:20 PM
> > To: [email protected]
> > Subject: Re: Does HBase RegionServer benefit from OS Page Cache
> >
> > Interesting.
> >
> > > 2) The blocks in the block cache will be naturally invalidated quickly
> > > after the compactions.
> >
> > Should one keep the block cache small in order to increase the OS page
> > cache?
> >
> > Does your data suggest we should not use the block cache at all?
> >
> > Thanks.
> >
> > -- Lars
> >
> > ________________________________
> > From: Liyin Tang <[email protected]>
> > To: [email protected]
> > Sent: Saturday, March 23, 2013 9:44 PM
> > Subject: Re: Does HBase RegionServer benefit from OS Page Cache
> >
> > We (Facebook) are closely monitoring the OS page cache hit ratio in our
> > production environments. My experience is that if your data access
> > pattern is very random, the OS page cache won't help you much even
> > though the data locality is very high. On the other hand, if the
> > requests are always against recent data points, then the page cache hit
> > ratio can be much higher.
> >
> > Actually, there are lots of optimizations that could be done in HDFS.
> > For example, we are working on fadvise-ing the 2nd/3rd replica data away
> > from the OS page cache, which could potentially improve your OS page
> > cache capacity by 3X. Also, by taking advantage of tier-based compaction
> > plus fadvise in HDFS, the region server could keep more hot data in the
> > OS page cache based on the read access pattern.
> >
> > Another separate point is that we probably should NOT rely on the
> > memstore/block cache to keep hot data.
> > 1) The more data in the memstore, the more data the region server needs
> > to recover after a server failure, so the tradeoff is recovery time.
> > 2) The blocks in the block cache will be naturally invalidated quickly
> > after the compactions. So the region server probably won't benefit from
> > a large JVM heap at all.
> >
> > Thanks a lot
> > Liyin
> >
> > On Sat, Mar 23, 2013 at 6:13 PM, Ted Yu <[email protected]> wrote:
> >
> > > Coming up is the following enhancement which would make MSLAB even
> > > better:
> > >
> > > HBASE-8163 MemStoreChunkPool: An improvement for JAVA GC when using
> > > MSLAB
> > >
> > > FYI
> > >
> > > On Sat, Mar 23, 2013 at 5:31 PM, Pankaj Gupta <[email protected]>
> > > wrote:
> > >
> > > > Thanks a lot for the explanation. It's good to know that MSLAB is
> > > > stable and safe to enable (we don't have it enabled right now; we're
> > > > using 0.92). This would allow us to more freely allocate memory to
> > > > HBase. I really enjoyed the depth of explanation from both Enis and
> > > > J-D. I was indeed mistakenly referring to the HFile as HLog;
> > > > fortunately you were still able to understand my question.
> > > >
> > > > Thanks,
> > > > Pankaj
> > > >
> > > > On Mar 21, 2013, at 1:28 PM, Enis Söztutar <[email protected]>
> > > > wrote:
> > > >
> > > > > I think the page cache is not totally useless, but as long as you
> > > > > can control the GC, you should prefer the block cache. Some of the
> > > > > reasons off the top of my head:
> > > > > - In case of a cache hit in the OS cache, you have to go through
> > > > > the DN layer (an RPC if short-circuit reads (ssr) are disabled),
> > > > > do a kernel jump, and read using the read() libc call, whereas for
> > > > > reading a block from the block cache, only the HBase process is
> > > > > involved. There is no process switch and no kernel jump.
> > > > > - The read access path is optimized per hfile block.
> > > > > FS page boundaries and hfile block boundaries are not aligned at
> > > > > all.
> > > > > - There is very little control over the page cache to cache or not
> > > > > cache based on expected access patterns. For example, we can mark
> > > > > META region blocks, some column families, and hfile index blocks
> > > > > as always cached or cached with high priority. Also, for full
> > > > > table scans, we can explicitly disable block caching so as not to
> > > > > trash the current working set. With the OS page cache, you do not
> > > > > have this control.
> > > > >
> > > > > Enis
> > > > >
> > > > >
> > > > > On Wed, Mar 20, 2013 at 10:30 AM, Jean-Daniel Cryans
> > > > > <[email protected]> wrote:
> > > > >
> > > > >> First, MSLAB has been enabled by default since 0.92.0, as it was
> > > > >> deemed stable enough. So, unless you are on 0.90, you are already
> > > > >> using it.
> > > > >>
> > > > >> Also, I'm not sure why you are referencing the HLog in your first
> > > > >> paragraph in the context of reading from disk, because the HLogs
> > > > >> are rarely read (only on recovery). Maybe you meant HFile?
> > > > >>
> > > > >> In any case, your email covers most arguments except for one:
> > > > >> checksumming. Retrieving a block from HDFS, even when using short
> > > > >> circuit reads to go directly to the OS instead of passing through
> > > > >> the DN, will take quite a bit more time than reading directly
> > > > >> from the block cache. This is why, even if you disable block
> > > > >> caching on a family, the index and root blocks will still be
> > > > >> block cached, as reading those very hot blocks from disk would
> > > > >> take way too long.
> > > > >>
> > > > >> Regarding your main question (how does the OS buffer help?), I
> > > > >> don't have a good answer. It kind of depends on the amount of RAM
> > > > >> you have and what your workload is like.
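[Editor's aside: Enis's point above about priority-based caching can be
sketched in a few lines. This is a toy illustration with hypothetical names,
not HBase's actual LruBlockCache: blocks carry a priority, and eviction
drains low-priority (e.g. scan) blocks before it touches high-priority ones
(e.g. index or META blocks).]

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy priority-aware block cache (hypothetical, not HBase code): eviction
// removes LOW-priority blocks first, so index/META blocks stay resident.
class PriorityBlockCache {
    enum Priority { LOW, HIGH }

    private final int maxBlocks;
    private final Map<String, byte[]> blocks = new HashMap<>();
    private final Deque<String> lowFifo = new ArrayDeque<>();
    private final Deque<String> highFifo = new ArrayDeque<>();

    PriorityBlockCache(int maxBlocks) { this.maxBlocks = maxBlocks; }

    void cacheBlock(String key, byte[] block, Priority p) {
        blocks.put(key, block);
        (p == Priority.LOW ? lowFifo : highFifo).addLast(key);
        evictIfNeeded();
    }

    byte[] getBlock(String key) { return blocks.get(key); }

    private void evictIfNeeded() {
        while (blocks.size() > maxBlocks) {
            // Evict low-priority blocks first; only then touch high priority.
            String victim = !lowFifo.isEmpty() ? lowFifo.pollFirst()
                                               : highFifo.pollFirst();
            blocks.remove(victim);
        }
    }
}
```

The real LruBlockCache is more elaborate (it sizes single-access,
multi-access, and in-memory partitions), but the eviction-by-priority idea
is the same.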
> > > > >> As a data point, I've been successfully running with 24GB of
> > > > >> heap (50% dedicated to the block cache) with a workload
> > > > >> consisting mainly of small writes, short scans, and a typical
> > > > >> random read distribution for a website. I can't remember the last
> > > > >> time I saw a full GC, and it's been running for more than a year
> > > > >> like this.
> > > > >>
> > > > >> Hope this somehow helps,
> > > > >>
> > > > >> J-D
> > > > >>
> > > > >> On Wed, Mar 20, 2013 at 12:34 AM, Pankaj Gupta
> > > > >> <[email protected]> wrote:
> > > > >>> Given that HBase has its own cache (block cache and bloom
> > > > >>> filters) and that all the table data is stored in HDFS, I'm
> > > > >>> wondering if HBase benefits from the OS page cache at all. In
> > > > >>> the setup I'm using, HBase region servers run on the same boxes
> > > > >>> as the HDFS data nodes. In such a scenario, if the underlying
> > > > >>> HLog files live on the same machine, then having a healthy
> > > > >>> memory surplus may mean that the data node can serve the
> > > > >>> underlying file from page cache, thus improving HBase
> > > > >>> performance. Is this really the case? (I guess the page cache
> > > > >>> should also help in the case where the HLog file lives on a
> > > > >>> different machine, but in that case network I/O will probably
> > > > >>> drown out the speedup achieved by not hitting the disk.)
> > > > >>>
> > > > >>> I'm asking because if the page cache were useful, then an HBase
> > > > >>> setup not utilizing all the memory on the machine for the region
> > > > >>> server may not be that bad. The reason one would not want to use
> > > > >>> all the memory for the region server is the long garbage
> > > > >>> collection pauses that a large heap size may induce.
> > > > >>> I understand that work has been done to fix the long pauses
> > > > >>> caused by memory fragmentation in the old generation under the
> > > > >>> mostly-concurrent garbage collector, by using a slab cache
> > > > >>> allocator for the memstore, but that feature is marked
> > > > >>> experimental and we're not ready to take risks yet. So if the
> > > > >>> page cache were useful in any way on region servers, we could go
> > > > >>> with less memory for the RegionServer process, with the
> > > > >>> understanding that free memory on the machine is not completely
> > > > >>> going to waste. Thus my curiosity about the utility of the OS
> > > > >>> page cache to the performance of HBase.
> > > > >>>
> > > > >>> Thanks in advance,
> > > > >>> Pankaj
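[Editor's aside: Liyin's memstore-size vs recovery-time tradeoff earlier in
the thread lends itself to a back-of-envelope calculation. The WAL replay
throughput below is an illustrative assumption, not a measured number.]

```java
// Back-of-envelope: the more memstore data a region server carries, the
// more WAL data must be replayed after a crash. The replay rate here
// (~50 MB/s) is an assumption for illustration only.
class RecoveryEstimate {
    /** Seconds to replay memstoreBytes of WAL at replayBytesPerSec. */
    static double recoverySeconds(long memstoreBytes, long replayBytesPerSec) {
        return (double) memstoreBytes / replayBytesPerSec;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long replayRate = 50L * 1024 * 1024; // assumed ~50 MB/s WAL replay
        // 4 GB of memstore takes ~82 s to replay; 16 GB takes ~328 s.
        System.out.printf("4 GB  -> %.0f s%n",
                recoverySeconds(4 * gb, replayRate));
        System.out.printf("16 GB -> %.0f s%n",
                recoverySeconds(16 * gb, replayRate));
    }
}
```

So quadrupling the memstore budget quadruples worst-case recovery time,
which is exactly the tradeoff Liyin describes.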
