[ https://issues.apache.org/jira/browse/KAFKA-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793994#comment-16793994 ]

Sophie Blee-Goldman edited comment on KAFKA-8027 at 3/15/19 10:28 PM:
----------------------------------------------------------------------

Hi [~prashantideal], I have been looking into this and have two PRs aimed at
improving the performance of segmented stores with caching enabled. Would you be
able to test either or both of them and let me know whether they improve things
at all? You can find the first PR
[here|https://github.com/apache/kafka/pull/6433] and the second one
[here|https://github.com/apache/kafka/pull/6448].

Keep in mind that these are only improvements to the caching layer and are
unlikely to make fetches faster overall than withCachingDisabled, since, as you
point out, range queries must search the underlying RocksDBStore anyway. If you
don't need caching for other reasons (e.g. reducing downstream traffic or writes
to RocksDB) and can afford to turn it off, I recommend doing so.
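To illustrate why caching cannot make a range fetch cheaper than the uncached
path, here is a minimal sketch under a simplified two-layer model: a sorted
in-memory cache of recent writes sitting on top of a sorted persistent store,
both keyed by (key, window start). The class and method names
(LayeredWindowStore, put, flush, fetch) are hypothetical, and this is not
Kafka's actual CachingWindowStore code. It shows only the general idea: because
the cache holds just the unflushed writes, every fetch(key, from, to) has to
range-scan both layers and merge them, with cached values winning for the same
window.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of a cache layered over a store; NOT Kafka's CachingWindowStore.
public class LayeredWindowStore {
    // Composite key ordered first by record key, then by window start time.
    record WindowKey(String key, long timestamp) implements Comparable<WindowKey> {
        public int compareTo(WindowKey o) {
            int c = key.compareTo(o.key);
            return c != 0 ? c : Long.compare(timestamp, o.timestamp);
        }
    }

    // Stand-in for the persistent (RocksDB) layer.
    final TreeMap<WindowKey, Long> store = new TreeMap<>();
    // Stand-in for recent writes held in the cache and not yet flushed.
    final TreeMap<WindowKey, Long> cache = new TreeMap<>();

    void put(String key, long windowStart, long value) {
        cache.put(new WindowKey(key, windowStart), value); // writes land in the cache first
    }

    void flush() {
        store.putAll(cache); // on commit/eviction, cached writes move to the store
        cache.clear();
    }

    // Range fetch: the cache alone cannot answer it, so both layers are scanned.
    // The merge is materialized here for simplicity; a real store would merge iterators lazily.
    List<Long> fetch(String key, long from, long to) {
        WindowKey lo = new WindowKey(key, from);
        WindowKey hi = new WindowKey(key, to);
        TreeMap<WindowKey, Long> merged = new TreeMap<>(store.subMap(lo, true, hi, true));
        merged.putAll(cache.subMap(lo, true, hi, true)); // cached value wins for the same window
        return new ArrayList<>(merged.values());
    }

    public static void main(String[] args) {
        LayeredWindowStore s = new LayeredWindowStore();
        s.put("user-1", 0L, 1L);
        s.put("user-1", 60_000L, 2L);
        s.flush();
        s.put("user-1", 60_000L, 5L);  // newer value for an already-flushed window
        s.put("user-1", 120_000L, 3L);
        s.put("user-2", 60_000L, 9L);
        System.out.println(s.fetch("user-1", 0L, 120_000L));
    }
}
```

In this model, turning caching off amounts to answering fetch with only the
store.subMap(...) scan, so reads can only get cheaper; what the cache buys is
batching of writes and downstream updates, which matches the trade-off
described in the comment above.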


was (Author: ableegoldman):
Hi [~prashantideal], I have been looking into this and have two PRs aimed at
improving the performance of segmented stores with caching enabled. Would you be
able to test either or both of them and let me know whether they improve things
at all? You can find the first PR
[here|https://github.com/apache/kafka/pull/6433] and the second one
[here|https://github.com/apache/kafka/pull/6448].

Keep in mind that these are only improvements to the caching layer and are
unlikely to result in better overall performance than withCachingDisabled,
since, as you point out, range queries must search the underlying RocksDBStore
anyway. If you don't need caching for other reasons (e.g. reducing downstream
traffic) and can afford to turn it off, I recommend doing so.

> Gradual decline in performance of CachingWindowStore provider when number of 
> keys grow
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8027
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8027
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.1.0
>            Reporter: Prashant
>            Priority: Major
>              Labels: interactivequ, kafka-streams
>
> We observed this during a performance test of our stream application, which
> tracks users' activity and provides a REST interface to query the window state
> store. We used the default configuration of Materialized, i.e.
> withCachingEnabled, for storing user behaviour stats in a window state store
> (CompositeWindowStore with CachingWindowStore as the underlying store, which
> internally uses RocksDBStore for persistence).
> While querying the window store with store.fetch(key, long, long), it
> internally tries to fetch the range from ThreadCache, which uses a byte
> iterator to search for a key in the cache; on a cache miss it goes to
> RocksDBStore for the result. So when the number of keys in the cache becomes
> large, this ThreadCache search (a range iterator over all keys) starts taking
> time, which degrades WindowStore query performance.
>  
> Workaround: If we disable the cache with the switch on the Materialized
> instance, i.e. withCachingDisabled, the key search is delegated directly to
> RocksDBStore, which is much faster and completes a search in microseconds,
> versus milliseconds with CachingWindowStore.
>  
> Stats: With more than 0.5M unique users, random search for a key (i.e. UserId):
>
> withCachingEnabled:  40 ms < t < 80 ms (upper bound increases as unique users
> grow)
> withCachingDisabled: t < 1 ms (almost constant time)
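These numbers fit a lookup whose cost tracks the total number of cached entries
rather than the size of the requested range. The sketch below is a hypothetical
cost model, not Kafka's ThreadCache code and not a benchmark: the class
ScanCostModel, the zero-padded "userId/windowStart" key format and the three
windows per user are all assumptions made for illustration. It counts the
entries visited by a scan over every cached key versus a bounded seek-and-stop
range lookup on a sorted map.

```java
import java.util.TreeMap;

// Hypothetical cost model: counts visited entries instead of timing anything.
public class ScanCostModel {
    // Composite key "userId/windowStart", zero-padded so string order equals numeric order.
    static String key(int user, long windowStart) {
        return String.format("%08d/%013d", user, windowStart);
    }

    static TreeMap<String, Long> build(int users, int windowsPerUser) {
        TreeMap<String, Long> cache = new TreeMap<>();
        for (int u = 0; u < users; u++) {
            for (int w = 0; w < windowsPerUser; w++) {
                cache.put(key(u, w * 60_000L), 1L);
            }
        }
        return cache;
    }

    // Pattern the report describes: walk every cached entry and keep the matches.
    // Returns {entries visited, matching entries}.
    static int[] fullScan(TreeMap<String, Long> cache, int user, long from, long to) {
        String lo = key(user, from);
        String hi = key(user, to);
        int visited = 0;
        int matches = 0;
        for (String k : cache.keySet()) {
            visited++;
            if (k.compareTo(lo) >= 0 && k.compareTo(hi) <= 0) {
                matches++;
            }
        }
        return new int[] {visited, matches};
    }

    // Bounded alternative: seek to the start of the range and stop at its end.
    // Only entries inside the range are visited (the O(log n) seek is not counted).
    static int[] boundedScan(TreeMap<String, Long> cache, int user, long from, long to) {
        int n = cache.subMap(key(user, from), true, key(user, to), true).size();
        return new int[] {n, n};
    }

    public static void main(String[] args) {
        for (int users : new int[] {1_000, 10_000, 100_000}) {
            TreeMap<String, Long> cache = build(users, 3);
            int[] full = fullScan(cache, users / 2, 0L, 120_000L);
            int[] bounded = boundedScan(cache, users / 2, 0L, 120_000L);
            System.out.printf("users=%d full-scan visits=%d bounded visits=%d matches=%d%n",
                    users, full[0], bounded[0], full[1]);
        }
    }
}
```

In this model the full-scan count is 3 x users (3000, 30000, 300000) while the
bounded count stays at 3, which mirrors the growing upper bound reported with
caching enabled versus the near-constant time with caching disabled. It says
nothing about absolute latencies.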



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
