Re: Store flushing on commit.interval.ms from KIP-63 introduces aggregation latency

2016-10-12 Thread Greg Fodor
Ah, thanks so much for the insights -- we should be in a position to profile the new library against real data in the next week or so, so I'll let you know how it goes.

Re: Store flushing on commit.interval.ms from KIP-63 introduces aggregation latency

2016-10-11 Thread Guozhang Wang
Hello Greg, I can share some context of KIP-63 here: 1. Like Eno mentioned, we believe RocksDB's own mem-table is already optimizing a large portion of IO access for its write performance, and adding an extra caching layer on top of that was mainly for saving ser-de costs (note that you still

Re: Store flushing on commit.interval.ms from KIP-63 introduces aggregation latency

2016-10-11 Thread Greg Fodor
Thanks Eno -- my understanding is that cache is already enabled to be 100MB per rocksdb so it should be on already, but I'll check. I was wondering if you could shed some light on the changes between 0.10.0 and 0.10.1 -- in 0.10.0 there was an intermediate cache within RocksDbStore -- presumably

Re: Store flushing on commit.interval.ms from KIP-63 introduces aggregation latency

2016-10-11 Thread Eno Thereska
Hi Greg, An alternative would be to set up RocksDB's cache, while keeping the streams cache to 0. That might give you what you need, especially if you can work with RocksDb and don't need to change the store. For example, here is how to set the Block Cache size to 100MB and the Write Buffer
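For reference, a minimal sketch of what Eno's suggestion could look like with the `RocksDBConfigSetter` hook available in 0.10.1. The class name and the 32MB write buffer size are illustrative, not from the original message; verify the exact RocksDB option names against the rocksdbjni version bundled with your Kafka Streams release.

```java
import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

// Illustrative sketch: tune RocksDB's own caches directly while keeping
// the Kafka Streams record cache at 0 (i.e. old forwarding behaviour).
public class CustomRocksDBConfig implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        // 100MB block cache, as in Eno's example.
        BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCacheSize(100 * 1024 * 1024L);
        options.setTableFormatConfig(tableConfig);
        // Write buffer (mem-table) size; 32MB is a placeholder value.
        options.setWriteBufferSize(32 * 1024 * 1024L);
    }
}
```

The setter would then be registered on the streams configuration, e.g. `streamsConfiguration.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, CustomRocksDBConfig.class);` (assuming the 0.10.1 config key).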

Re: Store flushing on commit.interval.ms from KIP-63 introduces aggregation latency

2016-10-10 Thread Greg Fodor
Hey Eno, thanks for the suggestion -- understood that my patch is not something that could be accepted given the API change; I posted it to help make the discussion concrete and because I needed a workaround. (Likely we'll maintain this patch internally so we can move forward with the new version,

Re: Store flushing on commit.interval.ms from KIP-63 introduces aggregation latency

2016-10-10 Thread Eno Thereska
Hi Greg, Thanks for trying 0.10.1. The best option you have for your specific app is to simply turn off caching by setting the cache size to 0. That should give you the old behaviour: streamsConfiguration.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0L); Your PR is an alternative, but
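Eno's one-liner in context, as a configuration fragment; the application id and bootstrap server values are illustrative placeholders, not from the original message:

```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

// Sketch: restore pre-KIP-63 behaviour by disabling the streams record cache.
Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");          // placeholder
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
// 0 bytes of buffering: aggregation updates are forwarded downstream
// immediately instead of being held until commit.interval.ms.
streamsConfiguration.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0L);
```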

Re: Store flushing on commit.interval.ms from KIP-63 introduces aggregation latency

2016-10-09 Thread Greg Fodor
JIRA opened here: https://issues.apache.org/jira/browse/KAFKA-4281

Re: Store flushing on commit.interval.ms from KIP-63 introduces aggregation latency

2016-10-09 Thread Greg Fodor
I went ahead and did some more testing, and it feels to me one option for resolving this issue is having a method on KGroupedStream which can be used to configure if the operations on it (reduce/aggregate) will forward immediately or not. I did a quick patch and was able to determine that if the

Store flushing on commit.interval.ms from KIP-63 introduces aggregation latency

2016-10-09 Thread Greg Fodor
I'm taking 0.10.1 for a spin on our existing Kafka Streams jobs and I'm hitting what seems to be a serious issue (at least, for us) with the changes brought about in KIP-63. In our job, we have a number of steps in the topology where we perform a repartition and aggregation on topics that require