Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-09-07 Thread Bruno Cadonna
Hi all, FYI: I removed RocksDB metric estimate-oldest-key-time [1] from this KIP since I was not able to access the corresponding property on RocksDB. I regard this as a minor change since this property is only available for FIFO compaction with compaction_options_fifo.allow_compaction = false

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-07-27 Thread Guozhang Wang
Thanks Bruno, That is clear. I think my misunderstanding is that I thought compaction is done by loading sst files into block cache (not OS), do the merge-sort and then write back to sst. But on a second thought I agree that it is not a good use case fitted for caching anyways. Using non-cached OS

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-07-27 Thread Bruno Cadonna
Hi Guozhang, Do you mean compression or compaction? Regarding compression, I agree with you except for the merge-sorting part. The docs describing what is stored in the block cache can be found under https://github.com/facebook/rocksdb/wiki/Block-Cache. Regarding compaction, my statement in

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-07-24 Thread Guozhang Wang
Ack, thanks for the clarification folks! Yeah I agree from JVM's point all rocksDB memory are off-heap basically (which makes operations harder, sigh..) Regarding the block cache, my understanding is that by default compressed blocks are in the OS page cache, uncompressed blocks are in the RocksDB

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-07-24 Thread Bruno Cadonna
Hi Guozhang and Sophie, 1) My understanding is also that the memtables are off-heap (as almost every data structure in RocksDB). According to the docs, if after a write the size of the memtable exceeds option write_buffer_size the memtable is flushed. I would not call it hard bounded since i

Re: Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-07-23 Thread Sophie Blee-Goldman
Guozhang, Just to clarify, the "heap" for all these objects is actually the C++ heap, not the JVM heap. So in the words of a Java application these would all be considered "off-heap", right? (Of course there are some pointers in the Java heap to the off-heap objects but that size is trivial comp

Re: Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-07-23 Thread Guozhang Wang
Thanks Bruno, they made sense to me. Regarding the last comment, my main reasoning is that it's better to explain to users the rocksDB memory usage and link the metrics to different categories. Just to kick off this (and also asking for correction of my own understanding :) here's what I read fro

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-07-22 Thread Bruno Cadonna
Thank you John for the proposal. Indeed I did also not like to extend the RocksDBConfigSetter interface, but couldn't find a way around it. I will remove the interface extension from the KIP and try out your proposal. I need to look into the details but after a first glance, I think we need

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-07-22 Thread Bruno Cadonna
Hi Guozhang, Thank you for your feedback! I answered inline. Best, Bruno On 21.07.20 00:39, Guozhang Wang wrote: Hello Bruno, Thanks for the updated KIP. I made a pass and here are some comments: 1) What's the motivation of keeping it as INFO while KIP-471 metrics are defined in DEBUG?

Fwd: Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-07-22 Thread Bruno Cadonna
Hi Guozhang, Thank you for your feedback! I answered inline. Best, Bruno On 21.07.20 00:39, Guozhang Wang wrote: Hello Bruno, Thanks for the updated KIP. I made a pass and here are some comments: 1) What's the motivation of keeping it as INFO while KIP-471 metrics are defined in DEBUG?

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-07-21 Thread John Roesler
Thanks for the update, Bruno! In addition to Guozhang's feedback, I'm a little concerned about the change to the RocksDBConfigSetter. If I understand the proposal, people would have to separately supply their Cache to the Options parameter in setConfig() and also save it in a field so it can be re

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-07-20 Thread Guozhang Wang
Hello Bruno, Thanks for the updated KIP. I made a pass and here are some comments: 1) What's the motivation of keeping it as INFO while KIP-471 metrics are defined in DEBUG? 2) Some namings are a bit inconsistent with others and with KIP-471, for example: 2.a) KIP-471 uses "memtable" while in t

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-07-20 Thread Bruno Cadonna
Hi, During the implementation of this KIP and after some discussion about RocksDB metrics, I decided to make some major modifications to this KIP and kick off discussion again. https://cwiki.apache.org/confluence/display/KAFKA/KIP-607%3A+Add+Metrics+to+Kafka+Streams+to+Report+Properties+of+Ro

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-05-15 Thread Bill Bejeck
Thanks for the KIP, Bruno. Having sensible, easy to access RocksDB memory reporting will be a welcomed addition. FWIW I also agree to have the metrics reported on a store level. I'm glad you changed the KIP to that effect. -Bill On Wed, May 13, 2020 at 6:24 PM Guozhang Wang wrote: > Hi Bruno

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-05-13 Thread Guozhang Wang
Hi Bruno, Sounds good to me. I think I'm just a bit more curious to see its impact on performance: as long as we have one INFO level rocksDB metrics, then we'd have to turn on the scheduled rocksdb metrics recorder whereas previously, we can decide to not turn on the recorder at all if all are se

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-05-13 Thread Sophie Blee-Goldman
Thanks Bruno! I took a look at the revised KIP and it looks good to me. Sophie On Wed, May 13, 2020 at 6:59 AM Bruno Cadonna wrote: > Hi John, > > Thank you for the feedback! > > I agree and I will change the KIP as I stated in my previous e-mail to > Guozhang. > > Best, > Bruno > > On Tue, May

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-05-13 Thread Bruno Cadonna
Hi John, Thank you for the feedback! I agree and I will change the KIP as I stated in my previous e-mail to Guozhang. Best, Bruno On Tue, May 12, 2020 at 3:07 AM John Roesler wrote: > > Thanks, all. > > If you don’t mind, I’ll pitch in a few cents’ worth. > > In my life I’ve generally found mo

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-05-13 Thread Bruno Cadonna
Hi Guozhang, The recording level is set on sensor level. At the moment, I do not see any reason why we could not set store-level metrics to INFO. So I would propose putting the metrics on store-level and setting the recording level to INFO in the KIP. If it does not work due to technical issues, w

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-05-13 Thread Bruno Cadonna
Hi Sophie, Thank you for your feedback! Please find my comments inline. On Mon, May 11, 2020 at 9:29 PM Sophie Blee-Goldman wrote: > > Hey Bruno, > > Thanks for the KIP! I have one high-level concern, which is that we should > consider > reporting these metrics on the per-store level rather tha

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-05-11 Thread John Roesler
Thanks, all. If you don’t mind, I’ll pitch in a few cents’ worth. In my life I’ve generally found more granular metrics to be more useful, as long as there’s a sane way to roll them up. It does seem nice to see it on the per-store level. For roll-up purposes, the task and thread tags should be

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-05-11 Thread Guozhang Wang
Hello Sophie / Bruno, I've also thought about the leveling question, and one motivation I had for setting it in instance-level is that we want to expose it in INFO level: today our report leveling is not very finer grained --- which I think is sth. worth itself --- such that one have to either tur

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-05-11 Thread Sophie Blee-Goldman
Hey Bruno, Thanks for the KIP! I have one high-level concern, which is that we should consider reporting these metrics on the per-store level rather than instance-wide. I know I was the one who first proposed making it instance-wide, so bear with me: While I would still argue that the instance-wi

[DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

2020-05-06 Thread Bruno Cadonna
Hi all, I'd like to discuss KIP-607 that aims to add RocksDB memory usage metrics to Kafka Streams. https://cwiki.apache.org/confluence/display/KAFKA/KIP-607%3A+Add+Metrics+to+Record+the+Memory+Used+by+RocksDB+to+Kafka+Streams Best, Bruno