>> - Why is the hbase.rs.cacheblocksonwrite not seeming to work? Does it only work for flushing and not for compaction? I can see from the logs that the file is renamed after being written. Does that have something to do with why cacheblocksonwrite isn't working?
Generally, it is a very bad idea to enable caching on read/write during compaction unless your cache size is larger than your data size (which is not a common case). Cache invalidation during compaction is almost inevitable, given how complex the potential optimizations are in this case. There are some papers on the Internet dedicated specifically to smarter cache invalidation algorithms for LSM-derived storage engines, but engineers, as usual, are much more conservative than academic researchers and are not eager to implement novel (not battle-tested) algorithms. Latency spikes during compaction are normal and unavoidable, at least for HBase, and especially when one deals with S3 or any other cloud storage. S3 read latency can sometimes reach seconds, and the only possible mitigation for such huge latency spikes would be a very smart cache-invalidation-during-compaction algorithm (which does not exist yet).

For your case, I would recommend the following settings (the corresponding hbase-site.xml property names are sketched at the bottom of this mail):

*CACHE_BLOOM_BLOCKS_ON_WRITE_KEY = true*
*CACHE_INDEX_BLOCKS_ON_WRITE_KEY = true*
*CACHE_DATA_BLOCKS_ON_WRITE_KEY = false* (bad idea to set it to true)

PREFETCH_BLOCKS_ON_OPEN should be false as well, unless your table is small and your application does the prefetch once, on startup.

-Vlad

On Fri, Sep 20, 2019 at 12:51 PM Jacob LeBlanc <[email protected]> wrote:

> Hi HBase Community!
>
> I have some questions on block caches around how the prefetch and cacheblocksonwrite settings work.
>
> In our production environments we've been having some performance issues with our HBase deployment (HBase 1.4.9 as part of AWS EMR 5.22, with data backed by S3).
>
> Looking into the issue, we've discovered that when regions of a particular table that are under heavy simultaneous write and read load go through a big compaction, the rpc handler threads will all block while servicing read requests to the region that was compacted. Here are a few relevant lines from a log where you can see the compaction happen. I've included a couple responseTooSlow warnings, but there are many more in the log after this:
>
> 2019-09-16 15:31:10,204 INFO [regionserver/ip-172-20-113-118.us-west-2.compute.internal/172.20.113.118:16020-shortCompactions-1568478085425] regionserver.HRegion: Starting compaction on a in region block_v2,\x07\x84\x8B>b\x00\x00\x14mU0p6,1567528560602.98be887c6f4938e0b492b17c669f3ac7.
> 2019-09-16 15:31:10,204 INFO [regionserver/ip-172-20-113-118.us-west-2.compute.internal/172.20.113.118:16020-shortCompactions-1568478085425] regionserver.HStore: Starting compaction of 8 file(s) in a of block_v2,\x07\x84\x8B>b\x00\x00\x14mU0p6,1567528560602.98be887c6f4938e0b492b17c669f3ac7.
> into tmpdir=s3://cmx-emr-hbase-us-west-2-oregon/hbase/data/default/block_v2/98be887c6f4938e0b492b17c669f3ac7/.tmp, totalSize=3.0 G
> 2019-09-16 15:33:55,572 INFO [regionserver/ip-172-20-113-118.us-west-2.compute.internal/172.20.113.118:16020-shortCompactions-1568478085425] dispatch.DefaultMultipartUploadDispatcher: Completed multipart upload of 24 parts 3144722724 bytes
> 2019-09-16 15:33:56,017 INFO [regionserver/ip-172-20-113-118.us-west-2.compute.internal/172.20.113.118:16020-shortCompactions-1568478085425] s3n2.S3NativeFileSystem2: rename s3://cmx-emr-hbase-us-west-2-oregon/hbase/data/default/block_v2/98be887c6f4938e0b492b17c669f3ac7/.tmp/eede47d55e06454ca72482ce33529669 s3://cmx-emr-hbase-us-west-2-oregon/hbase/data/default/block_v2/98be887c6f4938e0b492b17c669f3ac7/a/eede47d55e06454ca72482ce33529669
> 2019-09-16 15:34:03,328 WARN [RpcServer.default.FPBQ.Fifo.handler=3,queue=3,port=16020] ipc.RpcServer: (responseTooSlow): {"call":"Get(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest)","starttimems":1568648032777,"responsesize":562,"method":"Get","param":"region=block_v2,\\x07\\x84\\x8B>b\\x00\\x00\\x14mU0p6,1567528560602.98be887c6f4938e0b492b17c669f3ac7., row=\\x07\\x84\\x8B>b\\x00\\x00\\x14newAcPMKh/dkK2vGxPO1XI <TRUNCATED>","processingtimems":10551,"client":"172.20.132.45:51168","queuetimems":0,"class":"HRegionServer"}
> 2019-09-16 15:34:03,750 WARN [RpcServer.default.FPBQ.Fifo.handler=35,queue=5,port=16020] ipc.RpcServer: (responseTooSlow): {"call":"Get(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest)","starttimems":1568648032787,"responsesize":565,"method":"Get","param":"region=block_v2,\\x07\\x84\\x8B>b\\x00\\x00\\x14mU0p6,1567528560602.98be887c6f4938e0b492b17c669f3ac7., row=\\x07\\x84\\x8B>b\\x00\\x00\\x14nfet675AvHhY4nnKAV2iqu <TRUNCATED>","processingtimems":10963,"client":"172.20.112.226:52222","queuetimems":0,"class":"HRegionServer"}
>
> Note those log lines are from a "shortCompactions" thread. This also happens with major compactions, but I understand we can better control major compactions by running them manually in off hours if we choose.
>
> When this occurs we see the numCallsInGeneralQueue metric spike up, and some threads in our application that service REST API requests get tied up which causes some 504 gateway timeouts for end users.
>
> Thread dumps from the region server show that the rpc handler threads are blocking on an FSInputStream object (the read method is synchronized). Here is a pastebin of one such dump: https://pastebin.com/Mh0JWx3T
>
> Because we are running in AWS with data backed by S3 and we expect read latencies to be larger, we are hosting large bucket caches on the local disk of the region servers. So our understanding is that after the compaction, the relevant portions of the bucket cache are invalidated, which is causing read requests to have to go to S3, and these are all trying to use the same input stream and block each other, and this continues until eventually the cache is populated enough so that performance returns to normal.
>
> In an effort to mitigate the effects of compaction on the cache, we enabled the hbase.rs.cacheblocksonwrite setting on our region servers. My understanding was that this would be placing the blocks into the bucketcache while the new hfile was being written. However, after enabling this setting we are still seeing the same issue occur.
> Furthermore, we enabled the PREFETCH_BLOCKS_ON_OPEN setting on the column family and when we see this issue occur, one of the threads that is getting blocked from reading is the prefetching thread.
>
> Here are my questions:
> - Why is the hbase.rs.cacheblocksonwrite not seeming to work? Does it only work for flushing and not for compaction? I can see from the logs that the file is renamed after being written. Does that have something to do with why cacheblocksonwrite isn't working?
> - Why are the prefetching threads trying to read the same data? I thought that would only happen when a region is opened and I confirmed from the master and region server logs that wasn't happening. Maybe I have a misunderstanding of how/when prefetching comes into play?
> - Speaking more generally, any other thoughts on how we can avoid this issue? It seems a shame that we have this nicely populated bucketcache that is somewhat necessary with a slower file system (S3), but that the cache suddenly gets invalidated because of compactions happening. I'm wary of turning off compactions altogether during our peak load hours because I don't want updates to be blocked due to too many store files.
>
> Thanks for any help on this!
>
> --Jacob LeBlanc
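P.S. For reference, the constants above map to site-level properties you can set in hbase-site.xml. Below is a minimal sketch of the recommended values; the property names are from CacheConfig in HBase 1.x as I remember them, so please double-check them against your 1.4.9 docs before relying on this:

  <!-- Sketch of the write-time cache settings recommended above (verify property names). -->
  <property>
    <!-- CACHE_BLOOM_BLOCKS_ON_WRITE_KEY: cache bloom blocks as they are written -->
    <name>hfile.block.bloom.cacheonwrite</name>
    <value>true</value>
  </property>
  <property>
    <!-- CACHE_INDEX_BLOCKS_ON_WRITE_KEY: cache index blocks as they are written -->
    <name>hfile.block.index.cacheonwrite</name>
    <value>true</value>
  </property>
  <property>
    <!-- data blocks on write (the hbase.rs.cacheblocksonwrite you asked about): leave off -->
    <name>hbase.rs.cacheblocksonwrite</name>
    <value>false</value>
  </property>
  <property>
    <!-- PREFETCH_BLOCKS_ON_OPEN site default; prefetch can also be set per column family -->
    <name>hbase.rs.prefetchblocksonopen</name>
    <value>false</value>
  </property>

If I recall correctly, prefetch runs if either the site property or the column-family attribute enables it, so since you turned it on at the column-family level you would want to disable it on the family descriptor as well, not just site-wide.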
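And since the setup described above keeps a large BucketCache on the region servers' local disks, here is a rough, purely illustrative sketch of what a file-backed BucketCache configuration typically looks like, just so it is clear which cache the compaction is invalidating (the path and size below are made-up placeholders, not taken from your cluster):

  <!-- Hypothetical file-backed BucketCache; path and size are placeholders. -->
  <property>
    <name>hbase.bucketcache.ioengine</name>
    <value>file:/mnt/hbase/bucketcache.data</value>
  </property>
  <property>
    <!-- BucketCache size in MB when the value is greater than 1 (~96 GB here, illustrative only) -->
    <name>hbase.bucketcache.size</name>
    <value>98304</value>
  </property>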
