[jira] [Commented] (KAFKA-3777) Extract the LRU cache out of RocksDBStore

2016-06-03 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314561#comment-15314561
 ] 

Guozhang Wang commented on KAFKA-3777:
--

Adding to what Jay mentioned, we are already disabling RocksDB's write-ahead 
log (WAL) in Kafka Streams.

> Extract the LRU cache out of RocksDBStore
> -
>
> Key: KAFKA-3777
> URL: https://issues.apache.org/jira/browse/KAFKA-3777
> Project: Kafka
> Issue Type: Sub-task
> Components: streams
> Affects Versions: 0.10.1.0
> Reporter: Eno Thereska
> Fix For: 0.10.1.0
>
>
> The LRU cache is currently inside the RocksDBStore class. As part of
> KAFKA-3776 it needs to move out of RocksDBStore and become a separate
> component used in:
> 1. KGroupedStream.aggregate() / reduce(),
> 2. KStream.aggregateByKey() / reduceByKey(),
> 3. KTable.to() (this will be done in KAFKA-3779).
> All of the above operators can have a cache on top to deduplicate updates
> to the materialized state store in RocksDB.
> The scope of this JIRA is to extract the cache out of RocksDBStore and keep
> it for items 1) and 2) above; it should be done together with / after
> KAFKA-3780.
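
To illustrate what a cache extracted into a separate component might look like, here is a minimal sketch built on java.util.LinkedHashMap in access order. The names LruCache, LruCacheDemo and the onEvict callback are hypothetical and are not the Kafka Streams classes; the callback only stands in for "write the evicted entry to the underlying store".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical standalone LRU cache; an illustration, not the Kafka Streams implementation. */
class LruCache<K, V> {
    private final LinkedHashMap<K, V> entries;

    LruCache(final int maxEntries, final BiConsumer<K, V> onEvict) {
        // accessOrder = true keeps the least recently used entry at the head of the map.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                boolean evict = size() > maxEntries;
                if (evict) {
                    // Hand the evicted entry to the caller, e.g. to persist it in a store.
                    onEvict.accept(eldest.getKey(), eldest.getValue());
                }
                return evict;
            }
        };
    }

    V get(K key) {
        return entries.get(key); // counts as an access and refreshes recency
    }

    void put(K key, V value) {
        entries.put(key, value);
    }

    int size() {
        return entries.size();
    }
}

class LruCacheDemo {
    public static void main(String[] args) {
        List<String> evicted = new ArrayList<>();
        LruCache<String, Integer> cache =
            new LruCache<>(2, (k, v) -> evicted.add(k + "=" + v));
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // capacity exceeded, so "b" is evicted
        System.out.println(evicted);
    }
}
```

Because the cache only knows about the callback, the same component could sit in front of any of the operators listed above without depending on RocksDBStore.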



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3777) Extract the LRU cache out of RocksDBStore

2016-06-02 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313382#comment-15313382
 ] 

Jay Kreps commented on KAFKA-3777:
--

Probably everyone already knows this, but I just want to capture a couple of 
the original goals of that cache:
1. Avoid serialization and deserialization on reads AND writes to/from 
RocksDB
2. Avoid the writes to Kafka and RocksDB for duplicate updates
3. Allow larger batch writes to RocksDB, which seemed to significantly cut 
down on overhead
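
Goals 2 and 3 can be sketched roughly as follows, assuming a simple in-memory buffer that keeps only the latest value per key and hands all dirty entries downstream as one batch. DedupWriteBuffer and its methods are hypothetical names for illustration; the list of flushed batches merely stands in for a batched write to the store and changelog.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical buffer that coalesces updates per key and flushes them as one batch. */
class DedupWriteBuffer {
    // Latest value per key; a later put for the same key overwrites the earlier one.
    private final Map<String, String> dirty = new LinkedHashMap<>();
    // Stand-in for the downstream store: each element is one batched write.
    private final List<Map<String, String>> flushedBatches = new ArrayList<>();

    void put(String key, String value) {
        dirty.put(key, value);
    }

    void flush() {
        if (dirty.isEmpty()) {
            return; // nothing to write
        }
        flushedBatches.add(new LinkedHashMap<>(dirty)); // one batch for all dirty keys
        dirty.clear();
    }

    List<Map<String, String>> flushedBatches() {
        return flushedBatches;
    }
}

class DedupWriteBufferDemo {
    public static void main(String[] args) {
        DedupWriteBuffer buffer = new DedupWriteBuffer();
        buffer.put("k1", "v1");
        buffer.put("k1", "v2"); // overwrites the pending v1
        buffer.put("k2", "v1");
        buffer.put("k1", "v3"); // overwrites the pending v2
        buffer.put("k2", "v2");
        buffer.flush();
        // Five updates collapse into a single batch holding two entries.
        System.out.println(buffer.flushedBatches());
    }
}
```

How much this saves depends on how often keys repeat between flushes, which matches the point that the benefit depends on the cacheability of the stream.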

A long time ago in Samza we benchmarked each of these. At that time (3) had 
a big across-the-board impact, while (1) and (2) really depended on the 
cacheability of the read and write streams.

I also think that this was done originally with LevelDB and then an early 
version of RocksDB, and many things may have changed since then. In 
particular, (a) I am not sure the batch writes matter any more, and (b) I 
think they added a way to disable logging entirely, which should be a big win 
as long as we handle recovery from Kafka in cases of unclean shutdown.



