On Tue, Mar 3, 2015 at 6:26 PM, donhoff_h <[email protected]> wrote:

> Hi, Stack
>
> Yes, what I mean is that the working set will not fit in RAM, and so we
> are considering using SSD. As to the access pattern, no, the pictures
> are not accessed randomly. Usually in a loan business, after the
> pictures are loaded, some users will first read them to check the
> correctness of the information and whether it fulfills the requirements
> of regulation. Then, after that step, other users will read the same
> pictures to verify the risk. So they are not accessed randomly.

Ok. Sounds like cache can help. You can cache more if an SSD is in the
mix.
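If you do put the cache on SSD, the config would be shaped something like
the below (the path and the size are placeholders for your setup, not
tested values):

hbase.bucketcache.ioengine=file:/mnt/ssd/hbase/bucketcache.data   (placeholder path)
hbase.bucketcache.size=102400   (example capacity in MB; size to your SSD)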
Yours,
St.Ack

> And meanwhile, thanks very much for submitting the port problem for
> HBase 1.0.
>
> ------------------ Original Message ------------------
> From: "Stack" <[email protected]>
> Sent: Wednesday, March 4, 2015, 6:48 AM
> To: "Hbase-User" <[email protected]>
> Subject: Re: BucketCache Configuration Problem
>
> On Tue, Mar 3, 2015 at 1:02 AM, donhoff_h <[email protected]> wrote:
>
> > Hi, Stack
> >
> > Still, thanks very much for your quick reply.
> >
> > The reason that we don't shrink the heap and allocate the savings to
> > the offheap cache is that we want to cache as many data blocks as
> > possible. The memory size is limited; no matter how much we shrink the
> > heap, it cannot store so many data blocks. So we want to try "FILE"
> > instead of "offheap". And yes, in this situation we are considering
> > using SSD.
>
> You are saying that your working set, or a good portion of it, will not
> fit in RAM? Are the accesses totally random? If so, would it be better
> not to cache at all (don't pay the price of cache churn if it is not
> doing your serving any good)?
>
> > As to my configuration, I have attached it to this mail. Thanks very
> > much for trying my config.
>
> Ok. Will try.
>
> > I did not read the post you recommended. I will read it carefully.
> > Perhaps I can make my decision according to this post.
>
> It tries various configs and does +/- comparisons. Hopefully it will
> help (it references another similar study that may help also).
>
> > By the way, I also asked my colleagues to try HBase 1.0. But we found
> > that we could not start the master and the regionserver on the same
> > node. (Because we are making tests, we deploy HBase on a very small
> > cluster. We hope the nodes that run the master and backup-master also
> > run a regionserver.) I read the ref guide again, and it seems that the
> > master and the regionserver use the same port, 16020. Does that mean
> > there is no way I can deploy the master and a regionserver on the same
> > node and start them with a simple "start-hbase.sh" command?
>
> I see. It looks like the explicit master port has been purged. I think
> that is a bug (HBASE-13148). You can set the http/info port, but where
> the master listens for requests is hbase.regionserver.port. You probably
> have to start the master first, passing a system property for the port
> to listen on -- -Dhbase.regionserver.port=XXX should work -- and then
> start the regionservers.
>
> St.Ack
>
> > Thanks!
> >
> > ------------------ Original Message ------------------
> > From: "Stack" <[email protected]>
> > Sent: Tuesday, March 3, 2015, 1:30 PM
> > To: "Hbase-User" <[email protected]>
> > Subject: Re: BucketCache Configuration Problem
> >
> > On Mon, Mar 2, 2015 at 6:54 PM, donhoff_h <[email protected]> wrote:
> >
> > > Hi, Stack
> > >
> > > Thanks very much for your quick reply.
> > >
> > > It's OK to tell you my app scenario. I am not sure if it is very
> > > extreme. :)
> > >
> > > Recently we have been considering using HBase to store all the
> > > pictures of our bank. For example, the pictures from loan
> > > applications and contracts, credit card applications, drafts, etc.
> > > Usually the total count of pictures that the business users access
> > > daily is not very high. But since each picture takes a lot of space,
> > > the total size of the pictures accessed daily is very large. So we
> > > are considering using a cache to improve performance.
> >
> > Pictures are compressed, right?
> > > Since the total space accessed daily is very large and the memory
> > > may not hold so much, we are considering using "FILE" instead of
> > > "offheap" for the BucketCache.
> >
> > Ok. Is your FILE located on an SSD? If not, FILE is probably a
> > suboptimal option. The only reason I could see for favoring FILE is if
> > you want your cache to be 'warm' on startup (the FILE can be persisted
> > and reopened across restarts).
> >
> > > In this situation, if we use the CBC policy, the memory will only
> > > cache the meta blocks and leave the data blocks stored in "FILE".
> > > And it seems the memory is not fully used, because the total count
> > > of pictures accessed daily is not very high and we may not need to
> > > cache many meta blocks.
> >
> > Right. Onheap, we'll only have the indices and blooms. Offheap or in
> > FILE, we'll have the data blocks.
> >
> > The issue is that you believe your java heap is going to waste? If so,
> > shrink the JVM heap -- less memory to GC -- and allocate the savings
> > to the offheap cache (or to the OS and let it worry about caching).
> >
> > > So we want to know if Raw L1+L2 is better. Maybe it can make full
> > > use of memory and meanwhile cache a lot of data blocks. That's the
> > > reason why I tried to set up a Raw L1+L2 configuration.
> >
> > You've seen this post, I presume:
> > https://blogs.apache.org/hbase/entry/comparing_blockcache_deploys
> >
> > Raw L1+L2 is not tested. Blocks are first cached in L1 and, if
> > evicted, go to L2. Going first into the java heap (L1) and then out to
> > L2 could make for some ugly churn if blocks are being flipped
> > frequently between the two tiers. We used to have a "SlabCache" option
> > with a similar policy; all it seemed to do in testing was run slow and
> > generate GC garbage, so we removed it (and passed on testing Raw
> > L1+L2).
> >
> > High-level, it sounds like you cannot cache the dataset in memory and
> > that you will have some cache churn over the day; in this case, CBC
> > and a shrunken java heap, with the savings given over to offheap or
> > the OS cache, would seem to be the way to go? Your fetches will be
> > slower than if you could cache it all onheap, but you should have a
> > nice GC profile and fairly predictable latency.
> >
> > > By the way, I tried your advice to set
> > > hbase.bucketcache.combinedcache.enabled=false. But the WebUI of the
> > > regionserver said that I did not deploy an L2 cache. You can see it
> > > in the attachment. Is that still caused by my HBase version? Is Raw
> > > L1+L2 only applicable in HBase 1.0?
> >
> > Send me your config and I'll try it here.
> >
> > > At last, you said that the ref guide still needs to be maintained. I
> > > totally understand. :) It is the same in my bank. We also have a lot
> > > of docs that need to be updated. And I shall be very glad if my
> > > questions can help you find those locations and can help others.
> >
> > Smile. Thanks for being understanding.
> > St.Ack
> >
> > > Thanks again!
> > > ------------------ Original Message ------------------
> > > From: "Stack" <[email protected]>
> > > Sent: Tuesday, March 3, 2015, 12:13 AM
> > > To: "Hbase-User" <[email protected]>
> > > Subject: Re: BucketCache Configuration Problem
> > >
> > > On Mon, Mar 2, 2015 at 6:19 AM, donhoff_h <[email protected]> wrote:
> > >
> > > > Hi, Stack
> > > >
> > > > Thanks for your reply, and also thanks very much for your reply to
> > > > my previous mail "Questions about BucketCache".
> > > >
> > > > The hbase.bucketcache.bucket.sizes property takes effect. I did
> > > > not notice that in your first mail you had told me the name should
> > > > be hbase.bucketcache.bucket.sizes instead of
> > > > hbase.bucketcache.sizes; I only noticed this point in your last
> > > > mail. It's my fault. Thanks for your patience.
> > >
> > > No harm. Thanks for your patience and for writing the list.
> > >
> > > > As to the relationship between HBASE_OFFHEAPSIZE and
> > > > -XX:MaxDirectMemorySize, I followed your advice and looked in
> > > > bin/hbase for the statements that contain HBASE_OFFHEAPSIZE, but I
> > > > found that there isn't any statement which contains
> > > > HBASE_OFFHEAPSIZE. I also tried bin/hbase-daemon.sh and
> > > > bin/hbase-config.sh; they don't contain HBASE_OFFHEAPSIZE either.
> > > > So I still don't know their relationship. My HBase version is
> > > > 0.98.10. Is HBASE_OFFHEAPSIZE not used in this version?
> > >
> > > This is my fault. The above applies to versions beyond 0.98, not
> > > your version. Please pass MaxDirectMemorySize inside HBASE_OPTS.
> > >
> > > > As to "the pure secondary cache" or "bypass CBC", I mean using
> > > > BucketCache as a strict L2 cache to the L1 LruBlockCache, i.e. Raw
> > > > L1+L2. The point comes from the reference guide of Apache HBase,
> > > > which says: "It is possible to deploy an L1+L2 setup where we
> > > > bypass the CombinedBlockCache policy and have BucketCache working
> > > > as a strict L2 cache to the L1 LruBlockCache. For such a setup,
> > > > set CacheConfig.BUCKET_CACHE_COMBINED_KEY to false. In this mode,
> > > > on eviction from L1, blocks go to L2. When a block is cached, it
> > > > is cached first in L1. When we go to look for a cached block, we
> > > > look first in L1 and if none found, then search L2. Let us call
> > > > this deploy format, Raw L1+L2."
> > > > I want to configure this kind of cache not because the CBC policy
> > > > is not good, but because I am a tech leader in a bank; I need to
> > > > compare these two kinds of cache to make a decision for our
> > > > different kinds of apps. The reference guide says I can configure
> > > > it via CacheConfig.BUCKET_CACHE_COMBINED_KEY, but is there any way
> > > > I can configure it in hbase-site.xml?
> > >
> > > I see.
> > >
> > > BUCKET_CACHE_COMBINED_KEY == hbase.bucketcache.combinedcache.enabled.
> > > Set it to false in your hbase-site.xml.
> > >
> > > But let's back up and let me save you some work. Such an 'option'
> > > should be struck from the refguide as an exotic permutation that
> > > only in the most extreme of cases would perform better than CBC. Do
> > > you have a loading where you think this combination would be better
> > > suited? If so, would you mind telling us of it?
> > >
> > > Meantime, we need to work on posting different versions of our doc
> > > so others don't have the difficult time you have had above.
> > > We've been lazy up to now because the doc was generally applicable,
> > > but bucketcache is one of the locations where version matters.
> > >
> > > Yours,
> > > St.Ack
> > >
> > > > Many Thanks!
> > > >
> > > > ------------------ Original Message ------------------
> > > > From: "Stack" <[email protected]>
> > > > Sent: Monday, March 2, 2015, 2:30 PM
> > > > To: "Hbase-User" <[email protected]>
> > > > Subject: Re: BucketCache Configuration Problem
> > > >
> > > > On Sun, Mar 1, 2015 at 12:57 AM, donhoff_h <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi, experts
> > > > >
> > > > > I am using HBase 0.98.10 and have three problems with
> > > > > BucketCache configuration.
> > > > >
> > > > > First:
> > > > > I read the reference guide of Apache HBase to learn how to
> > > > > configure BucketCache. I find that when using the offheap
> > > > > BucketCache, the reference guide says that I should configure
> > > > > HBASE_OFFHEAPSIZE; it also says that I should configure
> > > > > -XX:MaxDirectMemorySize. Since these two parameters are both
> > > > > related to direct memory, I am confused: which one should I
> > > > > configure?
> > > >
> > > > See in bin/hbase how HBASE_OFFHEAPSIZE gets interpolated as the
> > > > value of the -XX:MaxDirectMemorySize passed to java (so set
> > > > HBASE_OFFHEAPSIZE). (Will fix the doc so it is more clear.)
> > > >
> > > > > Second:
> > > > > I want to know how to configure the BucketCache as a pure
> > > > > secondary cache, by which I mean bypassing the
> > > > > CombinedBlockCache policy. I tried the following configuration,
> > > > > but when I go to the regionserver's WebUI, I find it says "No L2
> > > > > deployed":
> > > > >
> > > > > hbase.bucketcache.ioengine=offheap
> > > > > hbase.bucketcache.size=200
> > > > > hbase.bucketcache.combinedcache.enabled=false
> > > >
> > > > What do you mean by pure secondary cache? Which block types do you
> > > > want in the bucketcache?
> > > >
> > > > Why bypass CBC? We've been trying to simplify bucketcache deploy.
> > > > Part of this streamlining has been removing the myriad options --
> > > > they tend to confuse -- and giving the user a few simple choices.
> > > > Do the options not work for you?
> > > >
> > > > > Third:
> > > > > I made the following configuration to set the bucket sizes. But
> > > > > from the regionserver's WebUI, I found that the (4+1)K and
> > > > > (8+1)K sizes are used, while the (64+1)K sizes are not used.
> > > > > What's wrong with my configuration?
> > > > >
> > > > > hbase.bucketcache.ioengine=offheap
> > > > > hbase.bucketcache.size=200
> > > > > hbase.bucketcache.combinedcache.enabled=true
> > > > > hbase.bucketcache.sizes=65536
> > > > > hfile.block.cache.sizes=65536
> > > > >
> > > > > I configured both of these because I don't know which one is in
> > > > > use now.
> > > >
> > > > As per previous mail (and HBASE-13125), hfile.block.cache.sizes
> > > > has no effect in 0.98.x. Also per our previous mail "Questions
> > > > about BucketCache", isn't it hbase.bucketcache.bucket.sizes that
> > > > you want?
> > > >
> > > > You have tried the defaults and they do not fit your access
> > > > pattern?
> > > >
> > > > Yours,
> > > > St.Ack
> > > >
> > > > > Many Thanks!
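
P.S. For anyone landing here with the third question above on 0.98: the
property that actually takes effect is hbase.bucketcache.bucket.sizes, a
comma-separated list. A sketch of the shape it would take (the size values
are illustrative only; note the stock buckets are sized (N+1)K -- the
WebUI's "(4+1)K", "(8+1)K", "(64+1)K" -- to leave headroom above the
nominal block size):

hbase.bucketcache.ioengine=offheap
hbase.bucketcache.size=200
hbase.bucketcache.combinedcache.enabled=true
hbase.bucketcache.bucket.sizes=5120,9216,66560   (i.e. (4+1)K, (8+1)K, (64+1)K)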
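
P.P.S. On the HBase 1.0 master/regionserver port collision above, an
untested sketch of the suggested workaround (16021 is an arbitrary free
port, and this assumes your hbase-env.sh appends to HBASE_MASTER_OPTS
rather than overwriting it):

HBASE_MASTER_OPTS="-Dhbase.regionserver.port=16021" bin/hbase-daemon.sh start master
bin/hbase-daemon.sh start regionserver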
