Hi, Stack

Yes, what I mean is that the working set will not fit in RAM, so we are considering using SSD. As to the access pattern, no, the pictures are not accessed randomly. In a loan business, after the pictures are loaded, some users first read them to check that the information is correct and fulfills regulatory requirements. After that step, other users read the same pictures to assess the risk. So they are not accessed randomly.

And meanwhile, thanks very much for filing the port problem for HBase 1.0.
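[Aside: for anyone hitting the same HBase 1.0 port clash, a rough sketch of the workaround Stack suggests in the quoted mail below. The alternate port is only an illustration, and it assumes HBASE_MASTER_OPTS is picked up from the environment by the launch scripts as usual:

  # Start the master first, moved off the default request port 16020...
  HBASE_MASTER_OPTS="-Dhbase.regionserver.port=16021" bin/hbase-daemon.sh start master
  # ...then start the regionserver on the same node on the default port.
  bin/hbase-daemon.sh start regionserver
]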
------------------ Original Message ------------------
From: "Stack" <[email protected]>
Sent: Wednesday, March 4, 2015, 6:48 AM
To: "Hbase-User" <[email protected]>
Subject: Re: BucketCache Configuration Problem

On Tue, Mar 3, 2015 at 1:02 AM, donhoff_h <[email protected]> wrote:

> Hi, Stack
>
> Still thanks much for your quick reply.
>
> The reason that we don't shrink the heap and allocate the savings to the
> offheap is that we want to cache as many datablocks as possible. The
> memory size is limited; no matter how much we shrink the heap, it cannot
> store that many datablocks. So we want to try "FILE" instead of "offheap".
> And yes, in this situation we are considering using SSD.

You are saying that your working set will not fit in RAM, or a good portion
of your working set? Are the accesses totally random? If so, would it be
better not to cache at all (don't pay the price of cache churn if it is not
doing your serving any good)?

> As to my configuration, I have attached it in this mail. Thanks very much
> for trying my config.

Ok. Will try.

> I did not read the post you recommended. I will read it carefully. Perhaps
> I can make my decision according to this post.

It tries various configs and tries to do +/-. Hopefully it will help (it
references another similar study that may help also).

> By the way, I also asked my colleagues to try HBase 1.0, but we found that
> we could not start the master and regionserver on the same node. (Because
> we are making tests, we deploy HBase on a very small cluster. We hope the
> nodes that run the master and backup-master also run the regionserver.) I
> read the ref-guide again, and it seems that the master and regionserver
> use the same port, 16020. Does that mean there is no way I can deploy the
> master and regionserver on the same node and start them with a simple
> "start-hbase.sh" command?

I see. It looks like the explicit master port has been purged. I think that
is a bug (HBASE-13148). You can set the http/info port, but where the master
listens for requests is hbase.regionserver.port. You probably have to start
the master first, passing a system property for the port to listen on --
-Dhbase.regionserver.port=XXX should work -- and then start the
regionservers.
St.Ack

> Thanks!
>
> ------------------ Original Message ------------------
> From: "Stack" <[email protected]>
> Sent: Tuesday, March 3, 2015, 1:30 PM
> To: "Hbase-User" <[email protected]>
> Subject: Re: BucketCache Configuration Problem
>
> On Mon, Mar 2, 2015 at 6:54 PM, donhoff_h <[email protected]> wrote:
>
> > Hi, Stack
> >
> > Thanks very much for your quick reply.
> >
> > It's OK to tell you my app scenario. I am not sure if it is very
> > extreme. :)
> >
> > Recently we have been considering using HBase to store all the pictures
> > of our bank: for example, the pictures from loan applications and
> > contracts, credit card applications, drafts, etc. Usually the total
> > count of pictures that the business users access daily is not very high,
> > but since each picture takes a lot of space, the total amount of picture
> > data accessed daily is very high. So we are considering using a cache to
> > improve the performance.
>
> Pictures are compressed, right?
>
> > Since the total space accessed daily is very large and the memory may
> > not hold so much, we are considering using "FILE" instead of "offheap"
> > for the BucketCache.
>
> Ok. Is your FILE located on an SSD? If not, FILE is probably a suboptimal
> option. The only reason I could see for favoring FILE is if you want your
> cache to be 'warm' on startup (the FILE can be persisted and reopened
> across restarts).
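[Aside: a minimal hbase-site.xml sketch of the FILE-backed deploy being discussed. The SSD path and size are placeholders, not values from this thread:

  hbase.bucketcache.ioengine=file:/mnt/ssd/hbase.bucketcache
  # in 0.98, a value above 1.0 is read as the cache capacity in megabytes
  hbase.bucketcache.size=10240

Persisting and reopening the file across restarts is what makes this option attractive when a warm cache matters more than raw latency.]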
> > At this situation, if we use the CBC policy, it will lead to the memory
> > caching only the meta blocks and leaving the datablocks stored in
> > "FILE". And it seems the memory is not being fully used, because the
> > total count of pictures accessed daily is not very high and we may not
> > need to cache many meta blocks.
>
> Right. Onheap, we'll only have the indices and blooms. Offheap or in FILE,
> we'll have the data blocks.
>
> The issue is that you believe your java heap is going to waste? If so,
> shrink the JVM heap -- less memory to GC -- and allocate the savings to
> the offheap (or to the OS and let it worry about caching).
>
> > So we want to know if the Raw L1+L2 is better. Maybe it can make full
> > use of the memory and meanwhile cache a lot of datablocks. That's the
> > reason why I tried to set up a Raw L1+L2 configuration.
>
> You've seen this post, I presume:
> https://blogs.apache.org/hbase/entry/comparing_blockcache_deploys
>
> Raw L1+L2 is not tested. Blocks are first cached in L1 and, if evicted,
> they go to L2. Going first into the java heap (L1) and then out to L2
> could make for some ugly churn if blocks are being flipped frequently
> between the two tiers. We used to have a "SlabCache" option with a similar
> policy; all it seemed to do in testing was run slow and generate GC
> garbage, so we removed it (and passed on testing Raw L1+L2).
>
> High-level, it sounds like you cannot cache the dataset in memory and that
> you will have some cache churn over the day. In this case, CBC and a
> shrunken java heap, with the savings given over to offheap or the OS
> cache, would seem to be the way to go. Your fetches will be slower than if
> you could cache it all onheap, but you should have a nice GC profile and
> fairly predictable latency.
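[Aside: a sketch of the shrunken-heap-plus-offheap deploy Stack recommends here. All sizes are made-up placeholders to show the shape; the matching direct-memory setting for 0.98 comes up further down the thread:

  # hbase-env.sh: shrink the on-heap allocation (value in MB)
  export HBASE_HEAPSIZE=4000

  # hbase-site.xml: an offheap L2 under the default CBC policy
  hbase.bucketcache.ioengine=offheap
  hbase.bucketcache.size=8192
]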
> > By the way, I tried your advice to set
> > hbase.bucketcache.combinedcache.enabled=false, but the WebUI of the
> > regionserver said that I did not deploy an L2 cache. You can see it in
> > the attachment. Is that still caused by my HBase version? Is Raw L1+L2
> > only applicable in HBase 1.0?
>
> Send me your config and I'll try it here.
>
> > At last, you said that the refguide still needs to be maintained. I
> > totally understand. :) It is the same in my bank; we also have a lot of
> > docs that need to be updated. And I shall be very glad if my questions
> > can help you find those locations and can help others.
>
> Smile. Thanks for being understanding.
> St.Ack
>
> > Thanks again!
> >
> > ------------------ Original Message ------------------
> > From: "Stack" <[email protected]>
> > Sent: Tuesday, March 3, 2015, 12:13 AM
> > To: "Hbase-User" <[email protected]>
> > Subject: Re: BucketCache Configuration Problem
> >
> > On Mon, Mar 2, 2015 at 6:19 AM, donhoff_h <[email protected]> wrote:
> >
> > > Hi, Stack
> > >
> > > Thanks for your reply, and also thanks very much for your reply to my
> > > previous mail "Questions about BucketCache".
> > >
> > > The hbase.bucketcache.bucket.sizes property takes effect. I did not
> > > notice that in your first mail you had told me the name should be
> > > hbase.bucketcache.bucket.sizes instead of hbase.bucketcache.sizes; I
> > > did not notice this point until your last mail. It's my fault. Thanks
> > > for your patience.
> >
> > No harm. Thanks for your patience and writing the list.
> >
> > > As to the relationship between HBASE_OFFHEAPSIZE and
> > > -XX:MaxDirectMemorySize, I followed your advice and looked in
> > > bin/hbase for the statements that contain HBASE_OFFHEAPSIZE, but I
> > > found that there isn't any statement which contains HBASE_OFFHEAPSIZE.
> > > I also tried "bin/hbase-daemon.sh" and "bin/hbase-config.sh"; they
> > > don't contain HBASE_OFFHEAPSIZE either. So I still don't know their
> > > relationship. My HBase version is 0.98.10. Is HBASE_OFFHEAPSIZE not
> > > used in this version?
> >
> > This is my fault. The above applies to versions beyond 0.98, not your
> > version. Please pass MaxDirectMemorySize inside HBASE_OPTS.
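[Aside: concretely, on 0.98 that might be a single line in hbase-env.sh. The size is a placeholder; it just needs to comfortably exceed hbase.bucketcache.size. On versions beyond 0.98, setting HBASE_OFFHEAPSIZE achieves the same thing:

  export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize=9g"
]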
> > > As to "the pure secondary cache" or "bypass CBC", I mean using the
> > > BucketCache as a strict L2 cache to the L1 LruBlockCache, i.e. the
> > > Raw L1+L2. The point comes from the reference guide of Apache HBase,
> > > which says: "It is possible to deploy an L1+L2 setup where we bypass
> > > the CombinedBlockCache policy and have BucketCache working as a strict
> > > L2 cache to the L1 LruBlockCache. For such a setup, set
> > > CacheConfig.BUCKET_CACHE_COMBINED_KEY to false. In this mode, on
> > > eviction from L1, blocks go to L2. When a block is cached, it is
> > > cached first in L1. When we go to look for a cached block, we look
> > > first in L1 and if none found, then search L2. Let us call this deploy
> > > format, Raw L1+L2."
> > >
> > > I want to configure this kind of cache not because the CBC policy is
> > > not good, but because I am a tech leader in a bank. I need to compare
> > > these two kinds of cache to make a decision for our different kinds of
> > > apps. The reference guide says I can configure
> > > CacheConfig.BUCKET_CACHE_COMBINED_KEY, but is there any way I can
> > > configure it in hbase-site.xml?
> >
> > I see.
> >
> > BUCKET_CACHE_COMBINED_KEY == hbase.bucketcache.combinedcache.enabled.
> > Set it to false in your hbase-site.xml.
> >
> > But let's back up and let me save you some work. Such an 'option' should
> > be struck from the refguide as an exotic permutation that only in the
> > most extreme of cases would perform better than CBC. Do you have a
> > loading where you think this combination would be better suited? If so,
> > would you mind telling us of it?
> >
> > Meantime, we need to work on posting different versions of our doc so
> > others don't have the difficult time you have had above. We've been lazy
> > up to now because the doc was generally applicable, but bucketcache is
> > one of the locations where version matters.
> >
> > Yours,
> > St.Ack
> >
> > > Many Thanks!
> > >
> > > ------------------ Original Message ------------------
> > > From: "Stack" <[email protected]>
> > > Sent: Monday, March 2, 2015, 2:30 PM
> > > To: "Hbase-User" <[email protected]>
> > > Subject: Re: BucketCache Configuration Problem
> > >
> > > On Sun, Mar 1, 2015 at 12:57 AM, donhoff_h <[email protected]> wrote:
> > >
> > > > Hi, experts
> > > >
> > > > I am using HBase 0.98.10 and have three problems with BucketCache
> > > > configuration.
> > > >
> > > > First:
> > > > I read the reference guide of Apache HBase to learn how to configure
> > > > the BucketCache. I find that when using the offheap BucketCache, the
> > > > reference guide says that I should configure HBASE_OFFHEAPSIZE, and
> > > > it also says that I should configure -XX:MaxDirectMemorySize. Since
> > > > these two parameters are both related to DirectMemory, I am confused
> > > > about which one I should configure.
> > >
> > > See in bin/hbase how HBASE_OFFHEAPSIZE gets interpolated as the value
> > > of the -XX:MaxDirectMemorySize passed to java (so set
> > > HBASE_OFFHEAPSIZE). (Will fix the doc so it is more clear.)
> > >
> > > > Second:
> > > > I want to know how to configure the BucketCache as a pure secondary
> > > > cache, by which I mean bypassing the CombinedBlockCache policy. I
> > > > tried to configure it as follows, but when I go to the
> > > > regionserver's webUI, it says "No L2 deployed":
> > > >
> > > > hbase.bucketcache.ioengine=offheap
> > > > hbase.bucketcache.size=200
> > > > hbase.bucketcache.combinedcache.enabled=false
> > >
> > > What do you mean by pure secondary cache? Which block types do you
> > > want in the bucketcache?
> > >
> > > Why bypass CBC? We've been trying to simplify bucketcache deploys.
> > > Part of this streamlining has been removing the myriad options,
> > > because they tend to confuse, and giving users a few simple choices.
> > > Do the options not work for you?
> > >
> > > > Third:
> > > > I made the following configuration to set the bucket sizes. But on
> > > > the regionserver's WebUI, I found that the (4+1)K and (8+1)K sizes
> > > > are used while the (64+1)K size is not used. What's wrong with my
> > > > configuration?
> > > >
> > > > hbase.bucketcache.ioengine=offheap
> > > > hbase.bucketcache.size=200
> > > > hbase.bucketcache.combinedcache.enabled=true
> > > > hbase.bucketcache.sizes=65536
> > > > hfile.block.cache.sizes=65536
> > > >
> > > > I configured both of these because I don't know which one is in use
> > > > now.
> > >
> > > As per previous mail (and HBASE-13125), hfile.block.cache.sizes has no
> > > effect in 0.98.x. Also per our previous mail "Questions about
> > > BucketCache", isn't it hbase.bucketcache.bucket.sizes that you want?
> > >
> > > You have tried the defaults and they do not fit your access pattern?
> > >
> > > Yours,
> > > St.Ack
> > >
> > > > Many Thanks!
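[Aside: closing out that last point, the 0.98 property Stack names takes a comma-separated list of bucket sizes in bytes. A sketch that would include the (64+1)K buckets the WebUI reported as unused; the values are illustrative only:

  hbase.bucketcache.bucket.sizes=5120,9216,17408,33792,66560
]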
