Hey Jason,

Looks like you're on approximately the right track. A few notes, though:
- In the RPC implementation (TabletServiceImpl::ClearCache), make sure to
call rpc->RespondSuccess(). Otherwise the RPC will never respond, and the
client will time out (plus you'll leak some memory on the server).
- The authorization should probably be SuperUser.
- In ClearCache, it doesn't seem like you're actually removing the elements
from the LRU list itself, so I think you'd likely crash soon after calling
the RPC. Also, the DCHECK in ClearCache() doesn't apply in the case that
you're triggering it administratively.
- ClearCache needs to lock the cache's mutex.

Aside from the above specific issues, I think it would be good to add some
integration testing, e.g. calling the ClearCache RPC against a tablet server
while a mixed workload is going on, probably with the cache configured to be
small enough that there is cache churn. We'd also want a command-line tool
action like 'kudu tserver clear_cache' to trigger this administratively.

-Todd

On Mon, Apr 17, 2017 at 5:21 AM, Jason Heo <[email protected]> wrote:

> Hi, Todd.
>
> I've temporarily pushed this patch to my repository.
>
> https://github.com/jason-heo/kudu/commit/aff1fe181541671d2dc192ad9cb4ed2172a51826
>
> Could you please check whether I'm on the right track?
>
> It will take more time before I push it to Cloudera's Gerrit, because I
> have yet to test whether my modification works well, and I'm not familiar
> with the contributing process <https://kudu.apache.org/docs/contributing.html>.
>
> Thanks,
>
> Jason
>
> 2017-04-11 12:55 GMT+09:00 Todd Lipcon <[email protected]>:
>
>> Sure. Here's a high-level overview of the approach:
>>
>> - in src/kudu/util/cache.h, you'll need to add a new method like
>> 'ClearCache'. In cache.cc and nvm_cache.cc you'll need to implement the
>> method. You could implement it for the NVM cache to just return
>> Status::NotSupported() if your main concern is the default (DRAM) cache.
>> - in tserver_service.proto, add a new RPC method called 'ClearCache'
>> - in tserver.proto, define its request/response protobufs. They can
>> probably be empty
>> - in tablet_service.h and tablet_service.cc, implement the new method. It
>> can call through to BlockCache::GetInstance()->ClearCache() and then
>> RespondSuccess
>> - in tablet_server-test.cc, add a test case which exercises this path
>>
>> Hope that helps
>>
>> -Todd
>>
>> On Mon, Apr 10, 2017 at 6:14 PM, Jason Heo <[email protected]>
>> wrote:
>>
>>> Great. I would appreciate it if you could guide me on how to contribute
>>> it. Then I'll try in my spare time.
>>>
>>> 2017-04-11 7:46 GMT+09:00 Todd Lipcon <[email protected]>:
>>>
>>>> On Sun, Apr 9, 2017 at 6:38 PM, Jason Heo <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Todd.
>>>>>
>>>>> I hope you had a good weekend.
>>>>>
>>>>> Exactly, I'm testing the latency of cold-cache reads from SATA disks
>>>>> and the performance of different schema designs as well.
>>>>>
>>>>> We're currently using Elasticsearch for an analytics service. ES has a
>>>>> "clear cache API" feature, which makes testing easy for me.
>>>>>
>>>>>
>>>> Makes sense. I don't think it would be particularly difficult to add
>>>> such an API. Any interest in contributing a patch? I'm happy to point you
>>>> in the right direction, if so.
>>>>
>>>> -Todd
>>>>
>>>>
>>>>> 2017-04-08 5:05 GMT+09:00 Todd Lipcon <[email protected]>:
>>>>>
>>>>>> Hey Jason,
>>>>>>
>>>>>> Can I ask what the purpose of the testing is?
>>>>>>
>>>>>> One thing to note is that we're currently leaving a fair bit of
>>>>>> performance on the table for cold-cache reads from spinning disks. So,
>>>>>> if you find that the performance is not satisfactory, it's worth being
>>>>>> aware that we will likely make some significant improvements in this
>>>>>> area in the future.
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/KUDU-1289 has some details.
>>>>>>
>>>>>> -Todd
>>>>>>
>>>>>> On Fri, Apr 7, 2017 at 8:44 AM, Dan Burkert <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Jason,
>>>>>>>
>>>>>>> There is no command to have Kudu evict its block cache, but
>>>>>>> restarting the tablet server process will have that effect. Ideally all
>>>>>>> written data will be flushed before the restart; otherwise
>>>>>>> startup/bootstrap will take a bit longer. Flushing typically happens
>>>>>>> within 60s of the last write. Waiting for flush and compaction is also
>>>>>>> a best practice for read-only benchmarks. I'm not sure if someone else
>>>>>>> on the list has an easier way of determining when a flush happens, but
>>>>>>> I typically look at the 'MemRowSet' memory usage for the tablet on the
>>>>>>> /mem-trackers HTTP endpoint; it should show something minimal like 256B
>>>>>>> if it's fully flushed and empty. You can also see details about how
>>>>>>> much memory is in the block cache on that page, if that interests you.
>>>>>>>
>>>>>>> - Dan
>>>>>>>
>>>>>>> On Thu, Apr 6, 2017 at 11:23 PM, Jason Heo <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi.
>>>>>>>>
>>>>>>>> I'm using Apache Kudu 1.2 on CDH 5.10.
>>>>>>>>
>>>>>>>> Currently, I'm doing a performance test of Kudu.
>>>>>>>>
>>>>>>>> Flushing the OS page cache is easy, but I don't know how to easily
>>>>>>>> flush the block cache (sized by `block_cache_capacity_mb`).
>>>>>>>>
>>>>>>>> I currently execute a SELECT statement over an unrelated table to
>>>>>>>> evict the cached blocks of the table under test.
>>>>>>>>
>>>>>>>> It is cumbersome, so I'd like to know whether there is a command for
>>>>>>>> flushing the block cache (or other Kudu caches I don't know about yet).
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Jason
>>>>>>
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
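The review notes on ClearCache (take the cache's mutex, and remove entries from the LRU list as well as from the lookup table) can be illustrated with a small, self-contained sketch. It assumes a toy string-keyed cache, `SimpleLRUCache`, which is hypothetical and far simpler than Kudu's real block cache in src/kudu/util/cache.cc; it only demonstrates the invariant that both structures are cleared together under one lock, and that a reader still holding a value is unaffected (the reason a "nothing is in use" assertion does not fit an administrative clear).

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, simplified LRU cache for illustration only; it is NOT
// Kudu's cache.h API. Values are held through shared_ptr so that a reader
// still holding an entry keeps it alive after the cache drops it.
class SimpleLRUCache {
 public:
  using Value = std::shared_ptr<const std::string>;

  explicit SimpleLRUCache(std::size_t capacity) : capacity_(capacity) {
    assert(capacity_ > 0 && "capacity must be positive");
  }

  void Insert(const std::string& key, std::string value) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = table_.find(key);
    if (it != table_.end()) {  // replace an existing entry
      lru_.erase(it->second);
      table_.erase(it);
    }
    lru_.emplace_front(key, std::make_shared<std::string>(std::move(value)));
    table_[key] = lru_.begin();
    while (lru_.size() > capacity_) {  // evict least recently used entries
      table_.erase(lru_.back().first);
      lru_.pop_back();
    }
  }

  Value Lookup(const std::string& key) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = table_.find(key);
    if (it == table_.end()) return nullptr;
    lru_.splice(lru_.begin(), lru_, it->second);  // mark most recently used
    return it->second->second;
  }

  // Drops every entry and returns how many were dropped. The mutex is held
  // for the whole operation, and entries are removed from BOTH the hash
  // table and the LRU list, so neither structure is left referring to
  // entries the other no longer has. No assertion is made that entries are
  // unreferenced: an administrative clear may run while readers hold values.
  std::size_t ClearCache() {
    std::lock_guard<std::mutex> lock(mutex_);
    std::size_t dropped = lru_.size();
    table_.clear();
    lru_.clear();
    return dropped;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return lru_.size();
  }

 private:
  using Entry = std::pair<std::string, Value>;

  const std::size_t capacity_;
  mutable std::mutex mutex_;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> table_;
};
```

For example, with a capacity of 2, inserting keys "a" and "b" and then calling ClearCache() returns 2 and leaves size() at 0, while a Value obtained from an earlier Lookup("a") still points at valid data. Clearing only the hash table (and leaving stale LRU nodes behind) is the kind of mismatch that would plausibly cause the crash mentioned in the review.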

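The plumbing outlined in the thread (a ClearCache() method on the cache interface, a NotSupported result for the NVM cache, a SuperUser-only RPC, and a handler that always responds) can be sketched as follows. Every type here (`Status`, `Cache`, `DramCache`, `NvmCache`, `FakeRpcContext`, `HandleClearCache`) is a hypothetical stand-in loosely named after the thread, not Kudu's actual API; in particular, modeling the SuperUser authorization as a boolean flag on the call is an assumption made only to keep the sketch self-contained.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Minimal stand-in for a status type; hypothetical, not kudu::Status.
class Status {
 public:
  static Status OK() { return Status(true, ""); }
  static Status NotSupported(std::string msg) { return Status(false, std::move(msg)); }
  static Status NotAuthorized(std::string msg) { return Status(false, std::move(msg)); }
  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  Status(bool ok, std::string msg) : ok_(ok), msg_(std::move(msg)) {}
  bool ok_;
  std::string msg_;
};

// Hypothetical cache interface gaining the new ClearCache() method.
class Cache {
 public:
  virtual ~Cache() = default;
  virtual Status ClearCache() = 0;
};

// Stand-in for the default (DRAM) cache: only counts how often it was cleared.
class DramCache : public Cache {
 public:
  Status ClearCache() override {
    ++clears_;  // a real cache would lock its mutex and drop every entry here
    return Status::OK();
  }
  int clears() const { return clears_; }

 private:
  int clears_ = 0;
};

// The NVM cache opts out, as suggested in the thread.
class NvmCache : public Cache {
 public:
  Status ClearCache() override {
    return Status::NotSupported("ClearCache is not supported by the NVM cache");
  }
};

// Hypothetical RPC context that records how the call was answered.
struct FakeRpcContext {
  bool caller_is_superuser = false;  // assumption: authz modeled as a flag
  bool responded = false;
  bool success = false;
  std::string error;

  void RespondSuccess() { responded = true; success = true; }
  void RespondFailure(const Status& s) {
    responded = true;
    success = false;
    error = s.message();
  }
};

// Handler sketch: every path answers the RPC exactly once, so the client
// never has to wait for a timeout.
void HandleClearCache(Cache* cache, FakeRpcContext* rpc) {
  assert(!rpc->responded && "an RPC must be answered only once");
  if (!rpc->caller_is_superuser) {
    rpc->RespondFailure(Status::NotAuthorized("ClearCache requires superuser"));
    return;
  }
  Status s = cache->ClearCache();
  if (!s.ok()) {
    rpc->RespondFailure(s);
    return;
  }
  rpc->RespondSuccess();
}
```

Returning right after each RespondFailure() keeps the "answered exactly once" invariant easy to see; dropping the final RespondSuccess() would reproduce the hang-until-timeout bug described in the review.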