Re: How to flush `block_cache_capacity_mb` easily?

Todd Lipcon Mon, 17 Apr 2017 11:30:40 -0700

Hey Jason,

Looks like you're approximately on the right track. A few notes, though:


- In the RPC implementation (TabletServiceImpl::ClearCache) make sure to
call rpc->RespondSuccess(). Otherwise the RPC will never respond, and the
client will time out (plus you'll leak some memory on the server)
- the authorization should probably be SuperUser
- in ClearCache, it doesn't seem like you're actually removing the elements
from the LRU list itself, so I think you'd likely crash soon after calling
the RPC. Also, the DCHECK in ClearCache() doesn't apply in the case that
you're triggering it administratively.
- ClearCache needs to lock the cache's mutex
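
To illustrate the last two points, here is a minimal sketch of a clear
operation that takes the mutex and drops entries from both the lookup table
and the LRU list. It is a toy, not Kudu's cache code: ToyLruCache and all of
its members are hypothetical names, and it ignores entries that readers still
hold references to, which the real change would have to account for.

```cpp
#include <cstddef>
#include <list>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Toy LRU cache for illustration only; every name here is hypothetical.
class ToyLruCache {
 public:
  explicit ToyLruCache(size_t capacity) : capacity_(capacity) {}

  void Insert(const std::string& key, std::string value) {
    std::lock_guard<std::mutex> l(mutex_);
    auto it = table_.find(key);
    if (it != table_.end()) {
      // Replace an existing entry: drop it from the list and the table.
      lru_.erase(it->second);
      table_.erase(it);
    }
    lru_.emplace_front(key, std::move(value));
    table_[key] = lru_.begin();
    while (table_.size() > capacity_) {
      // Evict from the cold end, keeping the list and the table in sync.
      table_.erase(lru_.back().first);
      lru_.pop_back();
    }
  }

  bool Lookup(const std::string& key, std::string* value) {
    std::lock_guard<std::mutex> l(mutex_);
    auto it = table_.find(key);
    if (it == table_.end()) return false;
    // Move the entry to the hot end; list iterators stay valid after splice.
    lru_.splice(lru_.begin(), lru_, it->second);
    *value = it->second->second;
    return true;
  }

  // Administrative clear: hold the mutex and empty BOTH structures. Clearing
  // only the table would leave stale entries on the LRU list, and a later
  // eviction would then try to erase keys that are no longer in the table.
  void ClearCache() {
    std::lock_guard<std::mutex> l(mutex_);
    table_.clear();
    lru_.clear();
  }

  size_t size() {
    std::lock_guard<std::mutex> l(mutex_);
    return table_.size();
  }

 private:
  using Entry = std::pair<std::string, std::string>;

  const size_t capacity_;
  std::mutex mutex_;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> table_;
};
```

The point of the sketch is only the invariant: after ClearCache() returns,
the table and the list are both empty, so inserts and evictions that follow
see a consistent cache.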

Aside from the above specific issues, I think it would be good to add some
integration testing -- e.g. calling the ClearCache RPC against a tablet
server while a mixed workload is going on, probably with the cache
configured to be small enough that there is cache churn. We'd also want to
have a command line tool action like 'kudu tserver clear_cache' to trigger
this administratively.
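
The shape of such a test can be sketched outside of Kudu as follows. This is
only a rough stand-in, assuming a small in-process cache instead of a real
tablet server and RPC: TinyCache and StressClearUnderLoad are hypothetical
names, and the "mixed workload" is just threads doing lookups and inserts
over a key space larger than the cache so that there is constant churn.

```cpp
#include <atomic>
#include <cstddef>
#include <list>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the server's block cache; not Kudu code.
class TinyCache {
 public:
  explicit TinyCache(size_t capacity) : capacity_(capacity) {}

  void Insert(int key) {
    std::lock_guard<std::mutex> l(mu_);
    if (index_.count(key)) return;
    lru_.push_front(key);
    index_[key] = lru_.begin();
    while (index_.size() > capacity_) {
      index_.erase(lru_.back());
      lru_.pop_back();
    }
  }

  bool Lookup(int key) {
    std::lock_guard<std::mutex> l(mu_);
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    lru_.splice(lru_.begin(), lru_, it->second);
    return true;
  }

  void ClearCache() {
    std::lock_guard<std::mutex> l(mu_);
    index_.clear();
    lru_.clear();
  }

  // True if the index and the LRU list agree and capacity is respected.
  bool Consistent() {
    std::lock_guard<std::mutex> l(mu_);
    return index_.size() == lru_.size() && index_.size() <= capacity_;
  }

 private:
  const size_t capacity_;
  std::mutex mu_;
  std::list<int> lru_;
  std::unordered_map<int, std::list<int>::iterator> index_;
};

// Runs num_workers threads doing mixed lookups and inserts while the calling
// thread clears the cache num_clears times. Returns true if the cache looked
// consistent after every clear and at the end.
bool StressClearUnderLoad(int num_workers, int num_clears) {
  TinyCache cache(8);  // deliberately small to force churn
  std::atomic<bool> stop(false);
  std::vector<std::thread> workers;
  for (int w = 0; w < num_workers; ++w) {
    workers.emplace_back([&cache, &stop, w] {
      unsigned x = 12345u + static_cast<unsigned>(w);
      while (!stop.load()) {
        x = x * 1664525u + 1013904223u;  // simple LCG for key selection
        int key = static_cast<int>((x >> 16) % 64);
        if (!cache.Lookup(key)) cache.Insert(key);
      }
    });
  }
  bool ok = true;
  for (int i = 0; i < num_clears; ++i) {
    cache.ClearCache();
    if (!cache.Consistent()) ok = false;
    std::this_thread::yield();
  }
  stop.store(true);
  for (auto& t : workers) t.join();
  return ok && cache.Consistent();
}
```

A real integration test would replace the direct ClearCache() call with the
new RPC and the worker loop with actual scans and writes, and would ideally
run under a thread sanitizer build so that a missing lock shows up as a
reported race rather than an occasional crash.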

-Todd

On Mon, Apr 17, 2017 at 5:21 AM, Jason Heo <[email protected]> wrote:

> Hi, Todd.
>
> I've temporarily pushed this patch to my repository.
>
> https://github.com/jason-heo/kudu/commit/aff1fe181541671d2dc192ad9cb4ed2172a51826
>
> Could you please check whether I'm on the right track?
>
> It will take some more time before I push it to Cloudera's Gerrit, because
> I have yet to test whether my modification works well and I'm not familiar
> with the contributing process <https://kudu.apache.org/docs/contributing.html>.
>
> Thanks,
>
> Jason
>
> 2017-04-11 12:55 GMT+09:00 Todd Lipcon <[email protected]>:
>
>> Sure. Here's a high-level overview of the approach:
>>
>> - in src/kudu/util/cache.h, you'll need to add a new method like
>> 'ClearCache'. In cache.cc and nvm_cache.cc you'll need to implement the
>> method. You could implement it for the NVM cache to just return
>> Status::NotSupported() if your main concern is the default (DRAM) cache.
>> - in tserver_service.proto, add a new RPC method called 'ClearCache'
>> - in tserver.proto, define its request/response protobufs. They can
>> probably be empty
>> - in tablet_service.h, tablet_service.cc implement the new method. It can
>> call through to BlockCache::GetInstance()->ClearCache() and then
>> RespondSuccess
>> - in tablet_server-test.cc add a test case which exercises this path
>>
>> Hope that helps
>>
>> -Todd
>>
>> On Mon, Apr 10, 2017 at 6:14 PM, Jason Heo <[email protected]>
>> wrote:
>>
>>> Great. I would appreciate it if you could guide me on how to contribute
>>> it. Then I'll try in my spare time.
>>>
>>> 2017-04-11 7:46 GMT+09:00 Todd Lipcon <[email protected]>:
>>>
>>>> On Sun, Apr 9, 2017 at 6:38 PM, Jason Heo <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Todd.
>>>>>
>>>>> I hope you had a good weekend.
>>>>>
>>>>> Exactly, I'm testing the latency of cold-cache reads from SATA disks
>>>>> and the performance of different schema designs as well.
>>>>>
>>>>> We are currently using Elasticsearch for an analytics service. ES has a
>>>>> "clear cache API" feature, which makes it easy for me to test.
>>>>>
>>>>>
>>>> Makes sense. I don't think it would be particularly difficult to add
>>>> such an API. Any interest in contributing a patch? I'm happy to point you
>>>> in the right direction, if so.
>>>>
>>>> -Todd
>>>>
>>>>
>>>>> 2017-04-08 5:05 GMT+09:00 Todd Lipcon <[email protected]>:
>>>>>
>>>>>> Hey Jason,
>>>>>>
>>>>>> Can I ask what the purpose of the testing is?
>>>>>>
>>>>>> One thing to note is that we're currently leaving a fair bit of
>>>>>> performance on the table for cold-cache reads from spinning disks. So, if
>>>>>> you find that the performance is not satisfactory, it's worth being aware
>>>>>> that we will likely make some significant improvements in this area in
>>>>>> the future.
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/KUDU-1289 has some details.
>>>>>>
>>>>>> -Todd
>>>>>>
>>>>>> On Fri, Apr 7, 2017 at 8:44 AM, Dan Burkert <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Jason,
>>>>>>>
>>>>>>> There is no command to have Kudu evict its block cache, but
>>>>>>> restarting the tablet server process will have that effect.  Ideally all
>>>>>>> written data will be flushed before the restart, otherwise
>>>>>>> startup/bootstrap will take a bit longer. Flushing typically happens
>>>>>>> within 60s of the last write.  Waiting for flush and compaction is also a
>>>>>>> best-practice for read-only benchmarks.  I'm not sure if someone else on
>>>>>>> the list has an easier way of determining when a flush happens, but I
>>>>>>> typically look at the 'MemRowSet' memory usage for the tablet on the
>>>>>>> /mem-trackers HTTP endpoint; it should show something minimal like 256B
>>>>>>> if it's fully flushed and empty.  You can also see details about how much
>>>>>>> memory is in the block cache on that page, if that interests you.
>>>>>>>
>>>>>>> - Dan
>>>>>>>
>>>>>>> On Thu, Apr 6, 2017 at 11:23 PM, Jason Heo <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi.
>>>>>>>>
>>>>>>>> I'm using Apache Kudu 1.2 on CDH 5.10.
>>>>>>>>
>>>>>>>> Currently, I'm doing a performance test of Kudu.
>>>>>>>>
>>>>>>>> Flushing the OS page cache is easy, but I don't know how to flush
>>>>>>>> `block_cache_capacity_mb` easily.
>>>>>>>>
>>>>>>>> I currently execute a SELECT statement over an unrelated table to
>>>>>>>> evict the cached blocks of the table under test.
>>>>>>>>
>>>>>>>> It is cumbersome, so I'd like to know whether there is a command for
>>>>>>>> flushing block caches (or other Kudu caches which I don't know about yet).
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Jason
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>>
>>>
>>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
