Re: Are RocksDBWindowStore windows hopping or sliding?

2020-02-26 Thread Matthias J. Sax
What you call a "sliding window" is called a "hopping window" in Kafka Streams. And yes, you can use a windowed store for this case: in fact, a non-overlapping tumbling window is just a special case of a hopping window with advance == window-size. In Kafka Streams we have a single implementation for
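
For concreteness, a minimal sketch of the two window definitions with the Kafka Streams Java API (the durations are illustrative); tumbling is just hopping with the advance left at its default, i.e. equal to the window size:

    import java.time.Duration;
    import org.apache.kafka.streams.kstream.TimeWindows;

    // Hopping window: size 100 ms, advance 50 ms -- consecutive windows overlap.
    TimeWindows hopping = TimeWindows.of(Duration.ofMillis(100))
                                     .advanceBy(Duration.ofMillis(50));

    // Tumbling window: no advanceBy() call, so advance == window size (no overlap).
    TimeWindows tumbling = TimeWindows.of(Duration.ofMillis(100));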

Re: when to expand cluster

2020-02-26 Thread Peter Bukowinski
The effect for producers isn’t very significant once your topic partition count exceeds your broker count. For consumers — especially if you are using consumer groups — the more partitions you have, the more consumer instances you can have in a single consumer group. (The maximum number of

Re: KStream.groupByKey().aggregate() is always emitting record, even if nothing changed

2020-02-26 Thread Sachin Mittal
Hi, Yes, using filter with transformValues would also work. I have a question out of curiosity: which one would be more efficient? stream.transform(/*return null for records that don't need to be forwarded downstream*/) or stream.transformValues(/*return null for values that don't need to be
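
For reference, a sketch of the transformValues-plus-filter variant discussed here (shouldForward is an illustrative stand-in); note that transformValues cannot drop a record by itself, so the null values have to be filtered out afterwards:

    KStream<String, String> result = stream
        .transformValues(() -> new ValueTransformer<String, String>() {
            @Override
            public void init(final ProcessorContext context) {}

            @Override
            public String transform(final String value) {
                // Return null for values that should not go downstream.
                return shouldForward(value) ? value : null;
            }

            @Override
            public void close() {}
        })
        .filter((key, value) -> value != null);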

PCF to kafka message sending

2020-02-26 Thread Sunil CHAUDHARI
Hi, We have one case where we want to send messages from PCF to Kafka endpoints. Is it possible? How? Regards, Sunil.
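
On the Kafka side there is nothing PCF-specific: as long as the app can reach the brokers over the network, a plain client works. A minimal Java producer sketch (broker address, topic, key, and value are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    Properties props = new Properties();
    props.put("bootstrap.servers", "your-broker:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    // try-with-resources flushes and closes the producer on exit.
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
        producer.send(new ProducerRecord<>("your-topic", "some-key", "some-value"));
    }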

Re: when to expand cluster

2020-02-26 Thread 张祥
Thanks. What effect does it have on consumers and producers, performance-wise, when the number of partitions is greater than the number of brokers, which means at least one broker serves two partitions of one topic? On Wed, Feb 26, 2020 at 11:02 PM, Peter Bukowinski wrote: > Disk usage is one reason to expand. Another

Are RocksDBWindowStore windows hopping or sliding?

2020-02-26 Thread Sachin Mittal
Hi, So far what I have understood is that when we create a RocksDB window store, we specify a window size and a retention period. Windows are created from epoch time based on the size; say the size is 100, then the windows are: [0, 100), [100, 200), [200, 300) ... Windows are retained based on retention
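
The alignment described above comes down to one line; for a record timestamp ts and window size `size` (both in milliseconds), the enclosing epoch-aligned window is (variable names are illustrative):

    // Windows are aligned to the epoch: [0, size), [size, 2*size), ...
    long start = ts - (ts % size);   // start of the window containing ts
    long end   = start + size;       // exclusive end: [start, end)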

Re: [ANNOUNCE] New committer: Konstantine Karantasis

2020-02-26 Thread Manikumar
Congrats Konstantine! On Thu, Feb 27, 2020 at 7:46 AM Matthias J. Sax wrote: > Congrats! > > On 2/27/20 2:21 AM, Jeremy Custenborder wrote: > > Congrats Konstantine! > > > > On Wed, Feb 26, 2020 at 2:39 PM Bill Bejeck > > wrote: > >> > >>

Re: [ANNOUNCE] New committer: Konstantine Karantasis

2020-02-26 Thread Matthias J. Sax
Congrats! On 2/27/20 2:21 AM, Jeremy Custenborder wrote: > Congrats Konstantine! > > On Wed, Feb 26, 2020 at 2:39 PM Bill Bejeck > wrote: >> >> Congratulations Konstantine! Well deserved. >> >> -Bill >> >> On Wed, Feb 26, 2020 at 5:37 PM Jason

Re: [ANNOUNCE] New committer: Konstantine Karantasis

2020-02-26 Thread Jeremy Custenborder
Congrats Konstantine! On Wed, Feb 26, 2020 at 2:39 PM Bill Bejeck wrote: > > Congratulations Konstantine! Well deserved. > > -Bill > > On Wed, Feb 26, 2020 at 5:37 PM Jason Gustafson wrote: > > > The PMC for Apache Kafka has invited Konstantine Karantasis as a committer > > and we > > are

Re: Use a single consumer or create consumer per topic

2020-02-26 Thread Mark Zang
I am thinking of the consumer count, not consumer groups. What are the pros and cons of one consumer per topic vs. one consumer for 50 topics? On Thu, Feb 27, 2020 at 12:48 AM, Ryanne Dolan wrote: > On an older cluster like that, rebalances will stop-the-world and kill your > throughput. Much better to have a

Re: [ANNOUNCE] New committer: Konstantine Karantasis

2020-02-26 Thread Guozhang Wang
Congrats Konstantine! Guozhang On Wed, Feb 26, 2020 at 3:09 PM John Roesler wrote: > Congrats, Konstantine! Awesome news. > -John > > On Wed, Feb 26, 2020, at 16:39, Bill Bejeck wrote: > > Congratulations Konstantine! Well deserved. > > > > -Bill > > > > On Wed, Feb 26, 2020 at 5:37 PM Jason

Re: [ANNOUNCE] New committer: Konstantine Karantasis

2020-02-26 Thread John Roesler
Congrats, Konstantine! Awesome news. -John On Wed, Feb 26, 2020, at 16:39, Bill Bejeck wrote: > Congratulations Konstantine! Well deserved. > > -Bill > > On Wed, Feb 26, 2020 at 5:37 PM Jason Gustafson wrote: > > > The PMC for Apache Kafka has invited Konstantine Karantasis as a committer > >

Re: [ANNOUNCE] New committer: Konstantine Karantasis

2020-02-26 Thread Bill Bejeck
Congratulations Konstantine! Well deserved. -Bill On Wed, Feb 26, 2020 at 5:37 PM Jason Gustafson wrote: > The PMC for Apache Kafka has invited Konstantine Karantasis as a committer > and we > are pleased to announce that he has accepted! > > Konstantine has contributed 56 patches and helped

[ANNOUNCE] New committer: Konstantine Karantasis

2020-02-26 Thread Jason Gustafson
The PMC for Apache Kafka has invited Konstantine Karantasis as a committer and we are pleased to announce that he has accepted! Konstantine has contributed 56 patches and helped to review even more. His recent work includes a major overhaul of the Connect task management system in order to

Securing Kafka with zookeeper 3.5.5+ and mTLS

2020-02-26 Thread Dima Brodsky
Hi, I was just wondering if the following article: https://docs.confluent.io/current/kafka/incremental-security-upgrade.html is still valid when using Zookeeper 3.5.5 with mTLS rather than Kerberos? If it is still valid, what principal is used for the ACL? Thanks! ttyl Dima

Re: Metrics for Topic/Partition size

2020-02-26 Thread Richard Rossel
awesome, thanks Gabriele On Wed, Feb 26, 2020 at 1:24 PM Gabriele Paggi wrote: > > Hi Richard, > > Yes, it's the size in bytes for all log segments for a given > topic/partition on a given broker, without the index files: > > [gpaggi@kafkalog001 ~]$ kafka-log-dirs.sh --bootstrap-server >

Re: Metrics for Topic/Partition size

2020-02-26 Thread Gabriele Paggi
Hi Richard, Yes, it's the size in bytes for all log segments for a given topic/partition on a given broker, without the index files: [gpaggi@kafkalog001 ~]$ kafka-log-dirs.sh --bootstrap-server $(hostname -f):9092 --describe --broker-list 1 --topic-list access_logs | tail -n+3 | jq

Re: Error handling guarantees in Kafka Streams

2020-02-26 Thread Bruno Cadonna
Hi Magnus, with exactly-once, the producer commits the consumer offsets. Thus, if the producer is not able to successfully commit a transaction, no consumer offsets will be successfully committed either. Best, Bruno On Wed, Feb 26, 2020 at 1:51 PM Reftel, Magnus wrote: > > Hi, > > From my
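
In terms of the plain producer API, that coupling looks roughly like this (a sketch; outputRecord, offsetsToCommit, and groupId are placeholders for the produced record, the consumed offsets, and the consumer group id):

    producer.beginTransaction();
    try {
        producer.send(outputRecord);
        // The offsets ride inside the transaction: they commit or abort
        // together with the produced records.
        producer.sendOffsetsToTransaction(offsetsToCommit, groupId);
        producer.commitTransaction();
    } catch (KafkaException e) {
        // On abort, neither the records nor the offsets are committed.
        producer.abortTransaction();
        throw e;
    }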

Re: Use a single consumer or create consumer per topic

2020-02-26 Thread Ryanne Dolan
On an older cluster like that, rebalances will stop-the-world and kill your throughput. Much better to have a bunch of consumer groups, one per topic, so they can rebalance independently. On Wed, Feb 26, 2020, 1:05 AM Mark Zang wrote: > Hi, > > I have a 20 brokers kafka cluster and there are
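
A sketch of the layout Ryanne describes: one consumer, with its own group.id, per topic, so each group rebalances independently of the others (property values are illustrative, and each consumer would be polled from its own thread since KafkaConsumer is not thread-safe):

    for (String topic : topics) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");
        props.put("group.id", "group-" + topic);   // a dedicated group per topic
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList(topic));
        // hand this consumer to its own polling thread
    }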

Re: Use a single consumer or create consumer per topic

2020-02-26 Thread Boyang Chen
Hey Mark, you could use a consumer group (check the consumer #subscribe API) to consume from 50 topics in a dynamic fashion, as long as the data processing function is the same for all the records. A consumer group can provide basic guarantees for balancing the number of partitions for each
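
The subscribe API Boyang mentions takes a whole collection of topics, or a pattern, in one call (topic names are illustrative; the two calls below are alternatives, as a later subscribe replaces the earlier one):

    // A single consumer group spanning many topics:
    consumer.subscribe(Arrays.asList("topic-1", "topic-2" /* , ... "topic-50" */));

    // Or subscribe by pattern, picking up matching topics as they are created:
    consumer.subscribe(Pattern.compile("topic-.*"));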

Re: when to expand cluster

2020-02-26 Thread Peter Bukowinski
Disk usage is one reason to expand. Another reason is if you need more ingest or output throughput for your topic data. If your producers aren’t able to send data to Kafka fast enough or your consumers are lagging, you might benefit from more brokers and more partitions. -- Peter > On Feb 26,

Connect: store Statuses, Configs and Offsets in a separate Kafka cluster

2020-02-26 Thread Dimitar Petrov
Hi, Can Kafka Connect be configured to store Statuses, Configs and Offsets in a separate Kafka cluster, different from the one which acts as a source/sink? Thanks
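
For what it's worth: a worker keeps its internal topics on the cluster named by its own bootstrap.servers, and since KIP-458 (Kafka 2.3) a connector's producer/consumer can be pointed at a different cluster via per-connector overrides, so a split along those lines should be possible. A hedged worker-config sketch, assuming distributed mode (hostnames are placeholders):

    # Worker config -- statuses, configs and offsets live on this cluster:
    bootstrap.servers=internal-cluster:9092
    config.storage.topic=connect-configs
    offset.storage.topic=connect-offsets
    status.storage.topic=connect-status
    # Permit per-connector client overrides (KIP-458, Kafka 2.3+):
    connector.client.config.override.policy=All

    # Then, in an individual connector's config, point its clients elsewhere:
    # producer.override.bootstrap.servers=data-cluster:9092
    # consumer.override.bootstrap.servers=data-cluster:9092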

Re: How to write data from kafka to CSV file on a Mac

2020-02-26 Thread Richard Rossel
By default the output goes to standard output, but you can redirect that to a file: kafkacat -b your.broker.com:yourPORT -t yourtopic -c <max-messages> > /your/full/path/file.csv (where <max-messages> is the number of messages to consume before exiting) On Wed, Feb 26, 2020 at 4:45 AM Doaa K. Amin wrote: > > Hi Richard, > Thanks for answering. Just one more thing: do I

Re: Metrics for Topic/Partition size

2020-02-26 Thread Richard Rossel
Thanks Gabriele, it turned out I didn't have that pattern deployed (facepalm). After deploying it, it worked right away. Now I'm struggling with understanding the size metric. Do you know if it's reporting the size (in bytes) of all segments for that broker/topic/partition? I'm trying to compare

Error handling guarantees in Kafka Streams

2020-02-26 Thread Reftel, Magnus
Hi, From my understanding, it is guaranteed that when a Kafka Streams application running with the exactly_once processing guarantee receives a record, it will either finish processing the record (including flushing any records generated as a direct result of processing the message and

Re: Metrics for Topic/Partition size

2020-02-26 Thread Gabriele Paggi
Hi Richard, The bean path is: kafka.log:name=Size,partition=<partition>,topic=<topic>,type=Log I don't have a jmx_exporter at hand to test it at the moment, but I don't see anything obviously wrong in your config, other than type: GAUGE missing. Did you try browsing the beans with jmxterm before configuring the
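
For reference, a jmx_exporter rule for that bean might look like the following sketch (the metric name and label names are illustrative; the Log mbean exposes its size through the Value attribute), including the type: GAUGE mentioned above:

    rules:
      - pattern: kafka.log<type=Log, name=Size, topic=(.+), partition=(.+)><>Value
        name: kafka_log_partition_size_bytes
        labels:
          topic: "$1"
          partition: "$2"
        type: GAUGE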

Re: How to write data from kafka to CSV file on a Mac

2020-02-26 Thread Doaa K. Amin
Hi Richard, Thanks for answering. Just one more thing: do I add to the command that you've written the path and name of the CSV file that I want to write the data to? Please advise. Thanks, Doaa. On Tue, Feb 25, 2020 at 6:20 PM, Richard Rossel wrote: you

when to expand cluster

2020-02-26 Thread 张祥
The documentation describes how to expand a cluster: https://kafka.apache.org/20/documentation.html#basic_ops_cluster_expansion. But I am wondering what the criteria for expanding are. I can only think of a disk usage threshold. For example, suppose usage on several disks exceeds 80%. Is this correct and