Re: Streams/RocksDB: Why Universal Compaction?

2023-07-27 Thread Guozhang Wang
copious amounts of custom code, perhaps (no promises) my team could > put something together that's more suitable for a general Streams audience > rather than just our own internal usage. > > Cheers, > Colt McNealy > > *Founder, LittleHorse.dev* > > > On Sun, Jul 23,

Re: Streams/RocksDB: Why Universal Compaction?

2023-07-23 Thread Guozhang Wang
Yeah I can shed some light here: I used Universal originally since at the beginning of the Kafka Streams journey there were user reports complaining about its storage amplification. But soon enough (around 2019) I realized that, as an OOTB config, level compaction may be preferable. I had a

Re: is exactly-once supported with kafka streams application with external state store like redis

2023-04-03 Thread Guozhang Wang
Hello Pushkar, Unfortunately it is not supported at the moment, see https://issues.apache.org/jira/browse/KAFKA-12475 for some details. Currently the community is working on KAFKA-12549 to support transactional state stores, at that time customized remote stores would be able to provide

Re: [ANNOUNCE] New committer: Stanislav Kozlovski

2023-01-17 Thread Guozhang Wang
Congratulations Stan! Guozhang On Tue, Jan 17, 2023 at 3:24 PM Matthias J. Sax wrote: > Congrats! > > On 1/17/23 1:26 PM, Ron Dagostino wrote: > > Congratulations, Stan! > > > > Ron > > > >> On Jan 17, 2023, at 12:29 PM, Mickael Maison > wrote: > >> > >> Congratulations Stanislav! > >> > >>>

[ANNOUNCE] New Kafka PMC Member: Bruno Cadonna

2022-11-01 Thread Guozhang Wang
to announce that Bruno has agreed to join the Kafka PMC. Congratulations, Bruno! -- Guozhang Wang, on behalf of Apache Kafka PMC

Re: Add to Kafka contributor list

2022-09-05 Thread Guozhang Wang
Hello Vinay, Thanks for your interest in contributing, I've added you to the list. Guozhang On Sun, Sep 4, 2022 at 10:57 PM vinay kumar wrote: > Hi, > > Could you add me to Kafka contributor list please? > > > I want to start contributing to Kafka > > Jira user : vvakati > > Thanks, > Vinay >

Re: Add to Contributors list of Kafka

2022-09-05 Thread Guozhang Wang
Hello Nandini, Thanks for your interest, I've added you to the contributors list. Guozhang On Mon, Sep 5, 2022 at 8:16 AM Nandini Anagondi wrote: > Hi, > > Can you add Nandini Anagondi to the contributor list. > > Thanks, > Nandini A. > -- -- Guozhang

Re: leftjoin not working as expected.

2022-08-09 Thread Guozhang Wang
Hello Chad, Here are a few thoughts off the top of my head: for left joins, we would keep those received records from the left side that have NOT found a match on the right side in a separate temporary store (this is only recently improved, but since you're already on 3.2.1 it's the case indeed). When

[ANNOUNCE] New Kafka PMC Member: A. Sophie Blee-Goldman

2022-08-01 Thread Guozhang Wang
this year. It is my pleasure to announce that Sophie agreed to join the Kafka PMC. Congratulations, Sophie! -- Guozhang Wang, on behalf of Apache Kafka PMC

Re: Consume data-skewed partitions using Kafka-streams causes consumer load balancing issue.

2022-07-13 Thread Guozhang Wang
Hello Ankit, Kafka Streams's rebalance protocol is trying to balance workloads based on the num.partitions (more specifically, the num.tasks which is derived from the input partitions) but not on the num.messages or num.bytes, so they would not be able to handle data-skewness across partitions

Re: unsubscribed from all topics when adding a KTable

2022-06-07 Thread Guozhang Wang
Hello Meir, From the code snippet I cannot find where you added a KTable; it seems you created a KStream from the source topic and aggregated the stream into a KTable. Could you show me the code difference between "adding a KTable" vs. "adding a KStream"? Anyway, the log line should only

Re: What role plays transactional.id after KIP-447?

2022-06-02 Thread Guozhang Wang
he general KafkaException. > I think this could happen in both sendOffsetsToTransaction and > commitTransaction right? > > Thanks. > > El mar, 31 may 2022 a las 14:49, Guozhang Wang () > escribió: > > > The CommitFailedException should be expected, since the fencing happens >

Re: What role plays transactional.id after KIP-447?

2022-05-31 Thread Guozhang Wang
fluent.io/blog/simplified-robust-exactly-one-semantics-in-kafka-2-5/#client-api-simplification > , > but I was actually expecting a ProducerFencedException instead. Does that > exception only correspond to fencing done by transactional.id? > > Thanks > > El mar, 24 may 2022 a las

Re: What role plays transactional.id after KIP-447?

2022-05-24 Thread Guozhang Wang
his not a > possible scenario, but I can't say what it is. > > Sorry if the question is too confusing. > > > > > > > El mar, 24 may 2022 a las 12:49, Guozhang Wang () > escribió: > > > Hi Gabriel, > > > > What I meant is that with KIP-447, th

Re: What role plays transactional.id after KIP-447?

2022-05-24 Thread Guozhang Wang
per consumer, so the relation is 1 consumer/topic => 1 producer > and the transactional id is set as --. > Do you see any problem with this configuration? > > Thanks again. > > El sáb, 21 may 2022 a las 16:37, Guozhang Wang () > escribió: > > > Hello Gabrie

Re: Request to include me to contributors list

2022-05-23 Thread Guozhang Wang
Thanks for your interest! Please let me know what's your id? Guozhang On Sun, May 22, 2022 at 6:05 AM Kumud Kumar Srivatsava Tirupati < kumudkumartirup...@gmail.com> wrote: > Hi team, > I am willing to work on > https://issues.apache.org/jira/browse/KAFKA-13926 please > add me to the

Re: What role plays transactional.id after KIP-447?

2022-05-21 Thread Guozhang Wang
Hello Gabriel, What you're asking is a very fair question :) In fact, for Streams where the partition-assignment to producer-consumer pairs are purely flexible, we think the new EOS would not have hard requirement on transactional.id: https://issues.apache.org/jira/browse/KAFKA-9453 I you

Re: docs fixes

2022-05-18 Thread Guozhang Wang
Thanks for reporting this. Are you interested in submitting a PR to fix it? Guozhang On Wed, May 18, 2022 at 7:26 AM Владимир Савостин wrote: > In section https://kafka.apache.org/quickstart#quickstart_kafkaconnect > you should change plugin path from lib/connect-file-3.2.0.ja to >

Re: Kafka Stretch Clusters

2022-05-12 Thread Guozhang Wang
random selection, then manually optimizing the replica > list to reduce producer hops probably isn't worth trying, as we'd get the > producer hops during normal broker maintenance. > > Thank you! > > > > > > > > On Mon, May 9, 2022 at 6:00 PM Guozhang Wa

Re: Kafka Stretch Clusters

2022-05-09 Thread Guozhang Wang
Hello Andrew. Just to answer your questions first, yes that's correct in your described settings that three round-trips between DCs would incur, but since the replica fetches can be done in parallel, the latency is not a sum of all the round-trips. But if you stay with 2 DCs only, the number of

Re: Add to Kafka contributor list

2022-05-05 Thread Guozhang Wang
Thanks Chern for your interest in contributing! I've added you to the contributors list. Guozhang On Thu, May 5, 2022 at 3:04 PM Chern Yih Cheah wrote: > Hi, > > Could you add me to Kafka contributor list? I would like to work on > https://issues.apache.org/jira/browse/KAFKA-13879 > > Thank

Re: [VOTE] 3.1.1 RC0

2022-04-12 Thread Guozhang Wang
Thanks Tom. I took a quick pass on the release notes, javadocs, downloaded the rc (zip only) and compiled with no issues. +1 Guozhang On Fri, Apr 8, 2022 at 9:18 AM Tom Bentley wrote: > Hello Kafka users, developers and client-developers, > > This is the first candidate for release of Apache

Re: Transactions and `endOffsets` Java client consumer method

2022-03-22 Thread Guozhang Wang
> > > by the client. > > > > > > What I was looking for when calling `endOffsets` was the last offset > that > > > would be surfaced via the `poll` method rather than where the low level > > > client should read up to. If this is at all possible, it seem

Re: Transactions and `endOffsets` Java client consumer method

2022-03-21 Thread Guozhang Wang
uto.offset.reset -> earliest, isolation.level -> read_committed, > > group.instance.id -> , group.id -> > > ephemeral-6c5f8338-3d94-4296-b64a-d3a5a331117e, bootstrap.servers -> > > PLAINTEXT://localhost:58300, enable.auto.commit -> false > > > > Th

Re: Transactions and `endOffsets` Java client consumer method

2022-03-19 Thread Guozhang Wang
Hi Chris, The broker does take the isolation level in the ListOffset API (which is used for the endOffsets call) into consideration: val lastFetchableOffset = isolationLevel match { case Some(IsolationLevel.READ_COMMITTED) => localLog.lastStableOffset case

Re: Kafka streams and user authentication

2022-02-25 Thread Guozhang Wang
KStream plain(StreamsBuilder builder) { > KStream stream = builder.stream( "A" ); > stream.map( ... ).to( "B" ); > return stream; > } > > Thanks > Alessandro > > > -Original Message- > From: Guozhang Wang > S

Re: Kafka streams uneven task allocation

2022-02-23 Thread Guozhang Wang
> & 2 > > respectively. I have configured standby to be 1 which means there will be > > one more copy of the state store and warm up by default is 2. What's the > > difference, will there be 2 copies now? > > > > Thanks > > > > On Fri, Feb 4, 2022 at 1:13

Re: Kafka streams and user authentication

2022-02-23 Thread Guozhang Wang
Hello Alessandro, Could you elaborate a bit more on what authN mechanisms you are using, and what exactly you mean by `account`? Guozhang On Wed, Feb 23, 2022 at 5:10 AM Alessandro Ernesto Mascherpa < alessandro.masche...@piksel.com> wrote: > Hi All, > I'm facing a problem with user

[ANNOUNCE] New committer: Luke Chen

2022-02-09 Thread Guozhang Wang
The PMC for Apache Kafka has invited Luke Chen (showuon) as a committer and we are pleased to announce that he has accepted! Luke has been actively contributing to Kafka since early 2020. He has made more than 120 commits on various components of Kafka, with notable contributions to the rebalance

Re: What versions are supported by community?

2022-02-05 Thread Guozhang Wang
Hi, There's a wiki page regarding Kafka's release policy: https://cwiki.apache.org/confluence/display/KAFKA/Time+Based+Release+Plan And regarding the EOL: "Given 3 releases a year and the fact that no one upgrades three times a year, we propose making sure (by testing!) that rolling upgrade can

Re: Error triming topology

2022-02-03 Thread Guozhang Wang
Hello Murilo, Could you elaborate a bit more on how you "trimmed" the topology? For example: 1) Did you change the code so that only 6 sub-topologies will be built, and their sub-topology ids stays the same? i.e. you just trimmed the last 3 sub-topologies with id 6,7,8? 2) Did you delete the

Re: Kafka streams uneven task allocation

2022-02-03 Thread Guozhang Wang
Hello Navneeth, Could you describe how you ended up with more than one partition assigned to the same thread after certain rebalance(s)? Do you override any default config values such as instance.id (for static consumer members), etc? Also I'd suggest upgrading to a newer version --- we just

Re: Kafka Streams - Stream threads processing two input topics

2022-01-10 Thread Guozhang Wang
Hi Miguel, I suspect it's due to the timestamps in your topic A, which are earlier than topic B. Note that Kafka Streams tries to synchronize joining topics by processing records with smaller timestamps, and hence if topic A's messages have smaller timestamps, they will be selected over the

Re: How to properly use a clean a TimestampedKeyValueStore

2022-01-10 Thread Guozhang Wang
tps://gist.github.com/magg/576bf3381c9c0501b9761b54e9d86375 > > Thanks > - Miguel > > > > > > On Tue, Jan 4, 2022 at 5:21 PM Guozhang Wang wrote: > > > Hi Miguel, > > > > How is your kvStore being constructed? Could you paste the snippet of the >

Re: How to properly use a clean a TimestampedKeyValueStore

2022-01-04 Thread Guozhang Wang
Hi Miguel, How is your kvStore being constructed? Could you paste the snippet of the related construction code, as well as the related iterating / deletion code here? On Tue, Jan 4, 2022 at 2:25 PM Matthias J. Sax wrote: > Not 100% sure. From what you describe it should work as expected. > >

Re: Kafka Streams threads sometimes fail transaction and are fenced after broker restart

2021-12-21 Thread Guozhang Wang
Hello Pieter, Thanks for bringing this to the community's attention. After reading your description I suspect you're hitting this issue: https://cwiki.apache.org/confluence/display/KAFKA/KIP-588%3A+Allow+producers+to+recover+gracefully+from+transaction+timeouts Basically today we did not try to

Re: Hi Team,

2021-11-17 Thread Guozhang Wang
Hello, Several things need to be checked here: 1) Is your group id correct? Does it exist, and is its version sufficiently new that it stores metadata on Kafka rather than on ZK? 2) Is the bootstrap server list correct? Is that IP reachable? On Wed, Nov 17, 2021 at 12:24 AM

Re: Producer Timeout issue in kafka streams task

2021-11-01 Thread Guozhang Wang
Hello Pushkar, I'm assuming you have the same Kafka version (2.5.1) at the Streams client side here: in those old versions, Kafka Streams relies on the embedded Producer clients to handle timeouts, which requires users to correctly configure such values. In newer versions (2.8+) we have made

Re: Stream to KTable internals

2021-11-01 Thread Guozhang Wang
Hello Chad, From your earlier comment, you mentioned "In my scenario the records were written to the KTable topic before the record was written to the KStream topic." So I think Matthias and others have excluded this possibility while trying to help investigate. If only the matching records

Re: Neverending KafkaStreams rebalance

2021-10-20 Thread Guozhang Wang
, we still had rebalances for an extra 2h, with messages saying: > Finished unstable assignment of tasks > For some reason, tasks seemed to keep migrating for 2h even after the > polling stopped timing out and the message backlog had been processed. > But now the service is stable again. >

Re: Neverending KafkaStreams rebalance

2021-10-13 Thread Guozhang Wang
Hi Murilo, which version of Kafka Streams are you using? Could you try the latest 3.0.0 release if it is not yet on that version? On Wed, Oct 13, 2021 at 8:02 AM Murilo Tavares wrote: > Hi > I have a large, stateful, KafkaStreams application that is on a never > ending rebalance loop. > I can

Re: [ANNOUNCE] Apache Kafka 3.0.0

2021-09-22 Thread Guozhang Wang
avid > Mao, David Osvath, Davor Poldrugo, Dejan Stojadinović, Dhruvil Shah, Diego > Erdody, Dong Lin, Dongjoon Hyun, Dániel Urbán, Edoardo Comar, Edwin Hobor, > Eric Beaudet, Ewen Cheslack-Postava, Gardner Vickers, Gasparina Damien, > Geordie, Greg Harris, Gunnar Morling, Guozhang Wang, G

Re: Subscribe for mailing list

2021-09-16 Thread Guozhang Wang
Hello Bhavik, You need to send to a separate mailing list to subscribe: https://kafka.apache.org/contact On Thu, Sep 16, 2021 at 8:48 AM Bhavik Ambani wrote: > Subscribe for mailing list > -- -- Guozhang

Re: High disk read with Kafka streams

2021-08-17 Thread Guozhang Wang
Hello Magnat, Thanks for reporting your observations. I have some questions: 1) Are your global state stores also in-memory or persisted on disks? 2) Are your Kafka and KStreams colocated? Guozhang On Tue, Aug 10, 2021 at 6:10 AM mangat rai wrote: > Hey All, > > We are using the low level

Re: DescribeTopics could return deleted topic

2021-08-17 Thread Guozhang Wang
Since DescribeTopics is just handled via a Metadata request behind the scene, it is possible that if the request is sent to some brokers with stale metadata (not yet received the metadata update request). But I would not expect it to "always" return the deleted topic partition, unless the broker

Re: NullPointerException after upgrading from 2.5.1 to 2.6.1 in my stream app

2021-06-27 Thread Guozhang Wang
On Wed, Feb 24, 2021 at 12:48 PM Nitay Kufert wrote: > > > I guess it's possible but very unlikely because it works perfectly with > > all the previous versions and the current one? (2.5.1) > > Why did a change in the version introduce NULLS there? > > > > On Tue

Re: Add me to contributors list of Apache Kafka

2021-06-27 Thread Guozhang Wang
Hello Alan, I've added you to the contributors list and also assigned the ticket to you. Cheers, Guozhang On Fri, Jun 25, 2021 at 5:19 PM Alan Artigao wrote: > Hi there! > > I'm working on KAFKA-12995 > and I need to be in > the

Re: KStreams and multiple instance

2021-05-11 Thread Guozhang Wang
Hello Pietro, 1) If you are using the Streams DSL with an aggregation, it would repartition the input streams by the aggregation field for data parallelism, and hence multiple instances would be able to do the aggregation in parallel and independently with correct results. 2) Short answer is

Re: Kafka Stream: State replication seems unpredictable.

2021-05-03 Thread Guozhang Wang
o the changelog topic, > right ? I am trying to understand the cases under which the changelog topic > can ever be disabled. > > Thanks > Mohan > > > On 4/21/21, 10:22 PM, "Guozhang Wang" wrote: > > Hello Mangat, > > I think using

Re: Exceptions in kafka streams

2021-05-03 Thread Guozhang Wang
Hello Dinesh, Seems you're still using version 1.0 of Kafka Streams with EOS enabled. Could you try to upgrade to a newer version (2.6+) and see if this issue goes away? On Mon, May 3, 2021 at 8:55 AM Dinesh Raj wrote: > Hi, > > I am getting too many exceptions whenever the kafka streams

Re: Kafka Stream: State replication seems unpredictable.

2021-04-21 Thread Guozhang Wang
ility but keep things consistent. > 2. Use exactly-once semantics. However, we might need to redesign our apps. > It's a bigger change. > > Regards, > Mangat > > On Tue, Apr 20, 2021 at 6:50 AM Guozhang Wang wrote: > > > Hello Mangat, > > > > What you've encoun

Re: Kafka Stream: State replication seems unpredictable.

2021-04-19 Thread Guozhang Wang
sing until the state is completely transferred from Thread-2. > 2. If yes, how can we disable it without downgrading the client? > > Since we have a very low scale and no real-time computing requirement, we > will be happy to sacrifice the availability to have consistency. > >

Re: [ANNOUNCE] Apache Kafka 2.8.0

2021-04-19 Thread Guozhang Wang
Blee-Goldman, > Attila Sasvari, Benoit Maggi, bertber, bill, Bill Bejeck, > Bob Barrett, Boyang Chen, Brajesh Kumar, Bruno Cadonna, > Cheng Tan, Chia-Ping Tsai, Chris Egerton, CHUN-HAO TANG, > Colin Patrick McCabe, Colin P. Mccabe, Cyrus Vafadari, David > Arthur, David Jacot, David Mao,

Re: Kafka Stream: State replication seems unpredictable.

2021-04-16 Thread Guozhang Wang
issue, our CPU was highly throttled by Kubernetes. We > gave more resources, also decreased the fetch-size so we have more I/O to > Cpu time ratio than before, and then there was no timeout issue, hence no > reassignment and hence no corrupted state. > > I really appreciate your

Re: Kafka Stream: State replication seems unpredictable.

2021-04-14 Thread Guozhang Wang
nd don't commit ourselves. > > Regards, > Mangat > > > > > On Thu, Apr 8, 2021 at 5:37 PM Guozhang Wang wrote: > > > Hi Mangat, > > > > Please see my replies inline below. > > > > On Thu, Apr 8, 2021 at 5:34 AM mangat rai wrote: > > > >

Re: Yet Another Repartitioning Question About Kafka Streams

2021-04-14 Thread Guozhang Wang
> > > > > On Mon, Apr 5, 2021 at 6:34 PM Guozhang Wang wrote: > > > Hello Gareth, > > > > 1) For this scenario, its state should be reusable and we do not need to > > read from scratch from Kafka to rebuild. > > > > 2) "Warmup replicas"

Re: Kafka Stream: State replication seems unpredictable.

2021-04-08 Thread Guozhang Wang
Hi Mangat, Please see my replies inline below. On Thu, Apr 8, 2021 at 5:34 AM mangat rai wrote: > @Guozhang Wang > > Thanks for the reply. Indeed I am finding it difficult to explain this > state. I checked the code many times. There can be a bug but I fail to see > it. Th

[ANNOUNCE] New Committer: Bruno Cadonna

2021-04-07 Thread Guozhang Wang
Hello all, I'm happy to announce that Bruno Cadonna has accepted his invitation to become an Apache Kafka committer. Bruno has been contributing to Kafka since Jan. 2019 and has made 99 commits and more than 80 PR reviews so far: https://github.com/apache/kafka/commits?author=cadonna He worked

Re: Kafka Stream: State replication seems unpredictable.

2021-04-07 Thread Guozhang Wang
Hello Mangat, With at-least-once, although some records may be processed multiple times, their processing order should not be violated, so what you observed is not expected. What caught my eye is this section in your output changelogs (highlighted): Key1, V1 Key1, null Key1, V1 Key1, null

Re: Kafka Streams application startup issues

2021-04-05 Thread Guozhang Wang
Hello Mikko, The issue that you saw restoration repeating and never completed is a bit weird, and without further logs I cannot tell exactly what's the root cause. At the same time, if your common scenario is upgrade, maybe you can consider using static members to avoid unnecessary rebalances

Re: Yet Another Repartitioning Question About Kafka Streams

2021-04-05 Thread Guozhang Wang
Hello Gareth, 1) For this scenario, its state should be reusable and we do not need to read from scratch from Kafka to rebuild. 2) "Warmup replicas" is just a special standby replica that is temporary, note that if there's no partition migration needed at the moment, the num.warmup.replicas is

Re: Kafka Streams Processor API state stores not restored via changelog topics

2021-03-30 Thread Guozhang Wang
Desai​ | Senior Software Developer | *ude...@itrsgroup.com* > > *www.itrsgroup.com* <https://www.itrsgroup.com/> > <https://www.itrsgroup.com/> > > *From: *Guozhang Wang > *Date: *Tuesday, March 30, 2021 at 2:10 PM > *To: *Users > *Cc: *Bart Lilje > *Subject: *Re: Ka

Re: Kafka Streams Processor API state stores not restored via changelog topics

2021-03-30 Thread Guozhang Wang
> > > Best, > > Upesh > > > Upesh Desai | Senior Software Developer | *ude...@itrsgroup.com* > > *www.itrsgroup.com* <https://www.itrsgroup.com/> > <https://www.itrsgroup.com/> > > *From: *Guozhang Wang > *Date: *Tuesday, March 30, 2021 at 1:0

Re: Kafka Streams Processor API state stores not restored via changelog topics

2021-03-30 Thread Guozhang Wang
> > *From: *Guozhang Wang > *Date: *Thursday, March 25, 2021 at 6:31 PM > *To: *Users > *Subject: *Re: Kafka Streams Processor API state stores not restored via > changelog topics > > BTW, yes that indicates the record in the changelog was already truncated > (logically). But s

Re: Kafka Streams Processor API state stores not restored via changelog topics

2021-03-25 Thread Guozhang Wang
shutdown? On Thu, Mar 25, 2021 at 4:22 PM Guozhang Wang wrote: > That's indeed weird. > > Have you tried to run Kafka brokers with 2.6 while Kafka Streams client > with 2.7? > > On Thu, Mar 25, 2021 at 2:34 PM Upesh Desai wrote: > >> Hello Guozhang, >> >&

Re: Kafka Streams Processor API state stores not restored via changelog topics

2021-03-25 Thread Guozhang Wang
log is just set to compact. We also confirmed > that this does not happen when we run everything on Kafka version 2.6. > > > > Thanks, > > Upesh > > > Upesh Desai​ | Senior Software Developer | *ude...@itrsgroup.com* > > *www.itrsgroup.com* <https://www.it

Re: Kafka Streams Processor API state stores not restored via changelog topics

2021-03-25 Thread Guozhang Wang
> Upesh Desai | Senior Software Developer | *ude...@itrsgroup.com* > > *www.itrsgroup.com* <https://www.itrsgroup.com/> > <https://www.itrsgroup.com/> > > *From: *Guozhang Wang > *Date: *Wednesday, March 24, 2021 at 1:37 PM > *To: *Users > *Cc: *Bart Lilje > *S

Re: Kafka Streams Processor API state stores not restored via changelog topics

2021-03-24 Thread Guozhang Wang
Hello Upesh, Thanks for the detailed report. I looked through the code and tried to reproduce the issue, but so far have not succeeded. I think I may need some further information from you to help my further investigation. 1) The `bufferedLimitIndex == 0` itself does not necessarily mean there's

Re: Right Use Case For Kafka Streams?

2021-03-17 Thread Guozhang Wang
Hello Gareth, A common practice for rolling up aggregations with Kafka Streams is to do the finest granularity at processor (5 days in your case), and to coarse-grained rolling up upon query serving through the interactive query API -- i.e. whenever a query is issued for a 30 day aggregate you do

Re: Redis as state store

2021-03-16 Thread Guozhang Wang
Thanks! On Mon, Mar 15, 2021 at 8:38 PM Sophie Blee-Goldman wrote: > Yep, that fell off my radar. Here we go: > https://issues.apache.org/jira/browse/KAFKA-12475 > > On Mon, Mar 15, 2021 at 8:09 PM Guozhang Wang wrote: > > > Hey Sophie, > > > > Maybe

Re: [VOTE] 2.6.2 RC0

2021-03-15 Thread Guozhang Wang
Hi Sophie, I've reviewed the javadocs / release notes / documentations, and they LGTM. +1. Guozhang On Fri, Mar 12, 2021 at 10:48 AM Sophie Blee-Goldman wrote: > Hello Kafka users, developers and client-developers, > > This is the first candidate for release of Apache Kafka 2.6.2. > >

Re: Redis as state store

2021-03-15 Thread Guozhang Wang
> > > >>> > > > > >>>> " Another issue with 3rd party state stores could be violation > of > > > > >>>> exactly-once guarantee provided by kafka streams in the event > of a > > > >

Re: Redis as state store

2021-03-12 Thread Guozhang Wang
Hello Mohan, I think what you had in mind works with Redis, since it is a remote state store engine, it does not have the co-partitioning requirements as local state stores. One thing you'd need to tune KS though is that with remote stores, the processing latency may be larger, and since Kafka

Re: Abort transaction semantics

2021-03-11 Thread Guozhang Wang
g where is the place in the transaction schema where consumer > offsets are committed to kafka. > > Thank you > > Peter > > On Fri, Feb 19, 2021 at 9:23 PM Guozhang Wang wrote: > > > Hello Peter, > > > > Note that when you upgrade from 2.4 to later versions

Re: Window Store

2021-02-23 Thread Guozhang Wang
ozhang. > > I don't see the remove method in window stores. Am I missing something? It > would be very nice to implement the optimization you had mentioned. > > Thanks > > On Tue, Feb 23, 2021 at 11:11 AM Guozhang Wang wrote: > > > I see. In that case I think your

Re: NullPointerException after upgrading from 2.5.1 to 2.6.1 in my stream app

2021-02-23 Thread Guozhang Wang
> more generic): > > > inputStream.flatMapValues(_.split).peek((k, v) => {val _ = $k -> > > ${v.printForDebug}")}) # return type KStream[Windowed[String], > > SingleInputMessage] > > > On Fri, Jan 29, 2021 at 9:01 AM Guozhang W

Re: Window Store

2021-02-23 Thread Guozhang Wang
> User1 (Delete event - Since the user is inactive for a 10 min > period) > > Thanks > > On Fri, Feb 19, 2021 at 12:19 PM Guozhang Wang wrote: > > > Hello Navneeth, > > > > I would agree with Liam that a session store seems a good fit for your > > case. But note

Re: Abort transaction semantics

2021-02-19 Thread Guozhang Wang
Hello Peter, Note that when you upgrade from 2.4 to later versions in Kafka, your error handling could be modified and simplified a bit as well. You can read the example code in KIP-691 as a reference:

Re: Window Store

2021-02-19 Thread Guozhang Wang
Hello Navneeth, I would agree with Liam that a session store seems a good fit for your case. But note that session stores would not expire a session themselves and it is still the processor node's job to find those already expired sessions and emit results / delete. You can take a look at the

Re: Kafka 2.7.0 processor API and streams-test-utils changes

2021-02-05 Thread Guozhang Wang
Hello Upesh, Thanks for the report! I think this is overlooked to update the documentation with the new 2.7.0 release. Could you file a JIRA (or even better, provide a PR with the JIRA :) to update the docs? Guozhang On Thu, Feb 4, 2021 at 1:03 PM Upesh Desai wrote: > Hello, > > > > I

Re: NullPointerException after upgrading from 2.5.1 to 2.6.1 in my stream app

2021-01-28 Thread Guozhang Wang
Could you share your code around > com.app.consumer.Utils$.$anonfun$buildCountersStream$1(ServiceUtils.scala:91) That seems to be where NPE is thrown. On Wed, Jan 13, 2021 at 5:46 AM Nitay Kufert wrote: > Hey, > *Without any code change*, just by bumping the kafka version from 2.5.1 to >

Re: Are Kafka and Kafka Streams right tools for my case?

2021-01-19 Thread Guozhang Wang
Hello, I have observed several use cases similar to yours that are using Kafka / Kafka Streams in production. That being said, your concerns are valid: * Big messages: 5MB is indeed large, but not extremely big for Kafka. If it is a single message of hundreds of MBs or over a GB then it's a bit

Re: [ANNOUNCE] Apache Kafka 2.7.0

2020-12-21 Thread Guozhang Wang
117 contributors to this release! > > > > > > A. Sophie Blee-Goldman, Aakash Shah, Adam Bellemare, Adem Efe Gencer, > > > albert02lowis, Alex Diachenko, Andras Katona, Andre Araujo, Andrew > Choi, > > > Andrew Egelhofer, Andy Coates, Ankit Kumar, Anna Povzner, A

Re: Handling "uneven" network partitions

2020-12-14 Thread Guozhang Wang
Hello Stig, I think there's an ordering of the events here, e.g.: T0: network partition happens. T1: record-1 received at the leader, at this time the ISR is still 3. Leader will accept this record and wait for it to be replicated. T100: after some elapsed time the leader decides to kick other

Re: [VOTE] 2.7.0 RC5

2020-12-14 Thread Guozhang Wang
I checked the docs and ran unit tests, no red flags found. +1. On Fri, Dec 11, 2020 at 5:45 AM Bill Bejeck wrote: > Updated with link to successful Jenkins build. > > * Successful Jenkins builds for the 2.7 branch: > Unit/integration tests: > >

Re: Kafka Streams - Source Partition Assignment Issue

2020-12-14 Thread Guozhang Wang
Kafka Streams should evenly distribute the partitions, but there are some issues in old versions of Kafka that you may be observing. To verify whether it is a transient issue or a permanent one, I'd suggest you try: 1) bounce the instances that have no partitions assigned (not bounce them all

Re: Kafka Streams Optimizations

2020-12-14 Thread Guozhang Wang
Hello Navneeth, Please find answers to your questions below. On Sun, Dec 6, 2020 at 10:33 PM Navneeth Krishnan wrote: > Hi All, > > I have been working on moving an application to kafka streams and I have > the following questions. > > 1. We were planning to use an EFS mount to share rocksdb

Re: GlobalKTable restoration - Unexplained performance penalty

2020-12-01 Thread Guozhang Wang
ggered for my app (and my app > actually restarted, meaning i get it as part of my app bootstrap logging) > > On Tue, Nov 24, 2020 at 12:03 AM Guozhang Wang wrote: > > > Hello Nitay, > > > > Thanks for letting us know about your observations. When you restart the > >

Re: Where can I check the life cycle of each Kafka version?

2020-11-28 Thread Guozhang Wang
Hello Joson, The EOL policy of Kafka is 3 releases, which is about a year: https://cwiki.apache.org/confluence/display/KAFKA/Time+Based+Release+Plan And you can infer the life cycle of each release from their publish date here: https://kafka.apache.org/downloads Please use English to engage the

Re: GlobalKTable restoration - Unexplained performance penalty

2020-11-23 Thread Guozhang Wang
Hello Nitay, Thanks for letting us know about your observations. When you restart the application from empty local state Kafka Streams will try to restore all the records up to the current log end for those global KTables, and during this period of time there should be no processing. Do you mind

Re: High latency for Kafka transactions

2020-11-13 Thread Guozhang Wang
Hello John, It's a bit hard to reason what's your total producing traffic to the Kafka brokers, e.g. how frequently do you send a `whole group of messages`? If it is every 10ms e.g. then your traffic would be 100K per sec, which is pretty large enough with transactions. Guozhang On Fri, Nov

Re: Partitioning per team

2020-10-30 Thread Guozhang Wang
Hello Jan, One alternative approach you can consider is to use combo as the key, hence it achieves the small aggregation, while customizing your partitioner for the repartition topic such that keys with the same prefix always go to the same partition. Then when cleaning up data, similarly

Re: Kafka Streams : Rack Aware deployment in EC2

2020-10-20 Thread Guozhang Wang
Hello Tony, I think you already know the consumer-client side fetch-from-follower feature: https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica Say your Kafka deployment is cross-AZs, and your Streams is also deployed cross-AZ, you can, by

[ANNOUNCE] New committer: Chia-Ping Tsai

2020-10-19 Thread Guozhang Wang
Hello all, I'm happy to announce that Chia-Ping Tsai has accepted his invitation to become an Apache Kafka committer. Chia-Ping has been contributing to Kafka since March 2018 and has made 74 commits: https://github.com/apache/kafka/commits?author=chia7712 He's also authored several major

Re: [ANNOUNCE] New committer: David Jacot

2020-10-16 Thread Guozhang Wang
Congrats David! Guozhang On Fri, Oct 16, 2020 at 10:23 AM Raymond Ng wrote: > Congrats David! > > Cheers, > Ray > > On Fri, Oct 16, 2020 at 10:08 AM Rajini Sivaram > wrote: > > > Congratulations, David! > > > > Regards, > > > > Rajini > > > > On Fri, Oct 16, 2020 at 5:45 PM Matthias J. Sax

Re: Contributor list registration request

2020-10-11 Thread Guozhang Wang
Hello Sheikh, Your account username has been added to the contributor list. Cheers. Guozhang On Sun, Oct 11, 2020 at 11:22 AM Sheikh Araf wrote: > Hi, > > Please add me to the contributor list. My JIRA account username is > “arafsheikh”. > > Thanks in advance! > > -- -- Guozhang

Re: Is this a valid use case for reading local store ?

2020-10-01 Thread Guozhang Wang
> to App2 (through kafka topic, not shown above). Conceptually it looks same > as your use case. What do people do if a kafka streams application (App1) > has to offer REST interface also ? > > -thanks > Mohan > > On 9/30/20, 5:01 PM, "Guozhang Wang" wrote: > >

Re: Sign up email for apache kafka notifications

2020-09-30 Thread Guozhang Wang
Hello Josh, Please help yourself subscribing to the mailing list of Kafka: https://kafka.apache.org/contact On Wed, Sep 30, 2020 at 9:36 AM wrote: > Hello, > > This is an email to sign up for the notifications for apache kafka. > Please advise if there are any further actions that I will need

Re: Is this a valid use case for reading local store ?

2020-09-30 Thread Guozhang Wang
Hello Mohan, If I understand correctly, your async event trigger process runs out of the streams application, that reads the state stores of app2 through the interactive query interface, right? This is actually a pretty common use case pattern for IQ :) Guozhang On Wed, Sep 30, 2020 at 1:22 PM

Re: Newly added topic or partitions are not assigned to running consumer groups using static membership

2020-09-27 Thread Guozhang Wang
That seems to be a bug indeed, I will reply on the ticket. Guozhang On Thu, Sep 24, 2020 at 8:03 PM Fu, Tony wrote: > Is anyone seeing this problem as well ( > https://issues.apache.org/jira/browse/KAFKA-10513)? I think it also > happens when new topics created within the subscription
