[DISCUSS] KIP-267: Add Processor Unit Test Support to Kafka Streams Test Utils

2018-03-07 Thread John Roesler
Dear Kafka community, I am proposing KIP-267 to augment the public Streams test utils API. The goal is to simplify testing of Kafka Streams applications. Please find details in the wiki:https://cwiki.apache.org/confluence/display/KAFKA/KIP-267%3A+Add+Processor+Unit+Test+Support+to+Kafka+Streams+T

Re: [DISCUSS] KIP-267: Add Processor Unit Test Support to Kafka Streams Test Utils

2018-03-07 Thread John Roesler
Thanks Ted, Sure thing; I updated the example code in the KIP with a little snippet. -John On Wed, Mar 7, 2018 at 7:18 PM, Ted Yu wrote: > Looks good. > > See if you can add punctuator into the sample code. > > On Wed, Mar 7, 2018 at 7:10 PM, John Roesler wrote: > > &

Re: [DISCUSS] KIP-267: Add Processor Unit Test Support to Kafka Streams Test Utils

2018-03-07 Thread John Roesler
On Wed, Mar 7, 2018 at 8:03 PM, John Roesler wrote: > Thanks Ted, > > Sure thing; I updated the example code in the KIP with a little snippet. > > -John > > On Wed, Mar 7, 2018 at 7:18 PM, Ted Yu wrote: > >> Looks good. >> >> See if you can add punctuat

Re: [DISCUSS] KIP-267: Add Processor Unit Test Support to Kafka Streams Test Utils

2018-03-08 Thread John Roesler
me punctuators are > indeed registered, and if people want full auto punctuation testing they > have to go with TopologyTestDriver. > > > > Guozhang > > > On Wed, Mar 7, 2018 at 8:04 PM, John Roesler wrote: > > > On Wed, Mar 7, 2018 at 8:03 PM, John Roesler wrote:

Re: [DISCUSS] KIP-267: Add Processor Unit Test Support to Kafka Streams Test Utils

2018-03-08 Thread John Roesler
o a third module that depends on both streams and test-utils. Yuck! Thanks, -John On Thu, Mar 8, 2018 at 3:16 PM, John Roesler wrote: > Thanks for the review, Guozhang, > > In response: > 1. I missed that! I'll look into it and update the KIP. > > 2. I was planning to us

Re: [DISCUSS] KIP-267: Add Processor Unit Test Support to Kafka Streams Test Utils

2018-03-08 Thread John Roesler
t childIndex)` and `forward(K key, > V value, String childName)` -- should we also throw > UnsupportedOperationException similar to `schedule(long)` if KIP-251 is > accepted? > > > -Matthias > > On 3/8/18 3:16 PM, John Roesler wrote: > > Thanks for the review, Guozhang, > > > > In re

Re: [DISCUSS] KIP-267: Add Processor Unit Test Support to Kafka Streams Test Utils

2018-03-08 Thread John Roesler
a solution. Thanks, -John On Thu, Mar 8, 2018 at 3:39 PM, Matthias J. Sax wrote: > Isn't MockProcessorContext in o.a.k.test part of the unit-test package > but not the main package? > > This should resolve the dependency issue. > > -Matthias > > On 3/8/18 3:32 PM

Re: Delayed processing

2018-03-09 Thread John Roesler
Hi Wim, One off-the-cuff idea is that you maybe don't need to actually delay anonymizing the data. Instead, you can just create a separate pathway that immediately anonymizes the data. Something like this: (raw-input topic, GDPR retention period set) |\->[streams apps that needs non-anonymized d

Re: [DISCUSS] KIP-267: Add Processor Unit Test Support to Kafka Streams Test Utils

2018-03-09 Thread John Roesler
t I think it is a good cost to pay, > plus once we start publishing test-util artifacts for other projects like > client and connect, we may face the same issue and need to do this > refactoring as well. > > > > Guozhang > > > > > On Fri, Mar 9, 2018 at 9:54 AM, Joh

Re: Workaround for KTable/KTable join followed by groupBy and aggregate/count can result in duplicated results KAFKA-4609?

2018-04-30 Thread John Roesler
Hello Artur, Apologies in advance if I say something incorrect, as I'm still a little new to this project. If I followed your example, then I think the scenario is that you're joining "claims" and "payments", grouping by "claimNumber", and then building a list for each "claimNumber" of all the cl

Re: Workaround for KTable/KTable join followed by groupBy and aggregate/count can result in duplicated results KAFKA-4609?

2018-05-01 Thread John Roesler
cal messages, like > this > > {"claimList":{"lst":[{"claimnumber":"3_0","claimtime":"708.521153490306"," > claimreporttime":"55948.33110985625","claimcounter":" > 0"}]},"paymentList&qu

Re: Kafka Streams Produced Wrong (duplicated) Results with Simple Windowed Aggregation Case

2018-06-04 Thread John Roesler
Hi EC, Thanks for the very clear report and question. Like Guozhang said this is expected (but not ideal) behavior. For an immediate work-around, you can try materializing the KTable and setting the commit interval and cache size as discussed here ( https://www.confluent.io/blog/watermarks-tables

Re: Some Total and Rate metrics are not consistent

2018-06-21 Thread John Roesler
Hi Sam, This sounds like a condition I fixed in https://github.com/apache/kafka/commit/ed51b2cdf5bdac210a6904bead1a2ca6e8411406#diff-8b364ed2d0abd8e8ae21f5d322db6564R221 . I realized that the prior code creates a new Meter, which uses a Total metric instead of a Count. But that would total all the

Re: Kafka Streams - Expiring Records By Process Time

2018-06-21 Thread John Roesler
Hi Sicheng, I'm also curious about the details. Let's say you are doing a simple count aggregation with 24-hour windows. You got three events with key "A" on 2017-06-21, one year ago, so the windowed key (A,2017-06-21) has a value of 3. Fast-forward a year later. We get one late event, also for

[DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-26 Thread John Roesler
Hello devs and users, Please take some time to consider this proposal for Kafka Streams: KIP-328: Ability to suppress updates for KTables link: https://cwiki.apache.org/confluence/x/sQU0BQ The basic idea is to provide: * more usable control over update rate (vs the current state store caches) *

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-27 Thread John Roesler
ta > structures by supporting `of` ? > > Suppression.of(Duration.ofMinutes(10)) > > > Cheers > > > > On Tue, Jun 26, 2018 at 1:11 PM, John Roesler wrote: > > > Hello devs and users, > > > > Please take some time to consider this proposal for K

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-27 Thread John Roesler
Hello again all, I realized today that I neglected to include metrics in the proposal. I have added them just now. Thanks, -John On Tue, Jun 26, 2018 at 3:11 PM John Roesler wrote: > Hello devs and users, > > Please take some time to consider this proposal for Kafka Streams: &g

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-27 Thread John Roesler
ve as new example is semantically the same as what I > suggested. > > So it is good by me. > > > > Thanks > > > > On Wed, Jun 27, 2018 at 7:31 AM, John Roesler wrote: > > > >> Thanks for taking look, Ted, > >> > >> I agree this

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-06-27 Thread John Roesler
ime. And in the latter case, > it will drop it on the floor and also record it in "too-late-records" > metrics. And also this emit policy would not need any buffering, since the > original store's cache contains the record context already need for > flushing downstream. >

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-02 Thread John Roesler
owed stores make sense in practice? > 2. Does "suppressLateEvents" with any parameter Y for non-windowed stores > make sense in practice? > > > > Guozhang > > > On Fri, Jun 29, 2018 at 2:26 PM, Bill Bejeck wrote: > > > Thanks for the explanation

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-02 Thread John Roesler
n" in Suppress instead. > > So I'd summarize my thoughts in the following questions: > > 1. Does "suppressLateEvents" with parameter Y != X (window retention time) > for windowed stores make sense in practice? > 2. Does "suppressLateEvents" wi

Re: [DISCUSS] KIP-328: Ability to suppress updates for KTables

2018-07-02 Thread John Roesler
both expressed doubt that there are practical use cases for it outside of final-results. -John On Mon, Jul 2, 2018 at 12:27 PM John Roesler wrote: > Hi again, Guozhang ;) Here's the second part of my response... > > It seems like your main concern is: "if I'm a user who w

Re: Kafka-streams calling subtractor with null aggregator value in KGroupedTable.reduce() and other weirdness

2018-07-12 Thread John Roesler
Hi Vasily, Thanks for the email. To answer your question: you should reset the application basically any time you change the topology. Some transitions are safe, but others will result in data loss or corruption. Rather than try to reason about which is which, it's much safer just to either reset

Re: Kafka-streams calling subtractor with null aggregator value in KGroupedTable.reduce() and other weirdness

2018-07-13 Thread John Roesler
table aggregation) > 2. Start without table aggregation (no app reset) > 3. Start with table aggregation (no app reset) > > Bellow is an interpretation of the adder/subtractor logs for a given > key/window in the chronological order > > SUB: newValue=(key2, 732, 10:50:40) aggValue=nu

Re: Kafka-streams calling subtractor with null aggregator value in KGroupedTable.reduce() and other weirdness

2018-07-13 Thread John Roesler
s it to reducer. I feel > like I am missing something here. Could you please clarify this? > > Can you please point me to a place in kafka-streams sources where a > Change of newValue/oldValue is produced, so I could take a look? I > found KTableReduce implementation, but can'

Re: Kafka Streams - Merge vs. Join

2018-08-09 Thread John Roesler
a new stream by fusing together records from the two inputs by key For example: input-1: (A, 1) (B, 2) input-2: (A, 500) (C, 60) join( (l,r) -> new KeyValue(l, r) ):// simplified API (A, (1, 500) ) (B, (2, null) ) (C, (null, 60) ) merge: (A, 1) (A, 500) (B, 2) (C, 60) Does that make sens

Re: Controlled topics creation in Kafka cluster, how does that affect a Kafka Streams App that uses join?

2018-09-05 Thread John Roesler
Hi Meeiling, It sounds like "formal process" is more like "fill out a form beforehand" than just "access controlled". In that case, although it's somewhat of an internal detail, I think the easiest thing would be to figure out what the name of the internal topics will be and request their creatio

Re: Best way to manage topics using Java API

2018-09-05 Thread John Roesler
Hi Robin, The AdminClient is what you want. "Evolving" is just a heads-up that the API is relatively new and hasn't stood the test of time, so you shouldn't be *too* surprised to see it change in the future. That said, even APIs marked "Evolving" need to go through the KIP process to be changed,

Re: Timing state changes?

2018-09-06 Thread John Roesler
Hi Tim, >From your spec, I think that Kafka Streams has several ways to support either scenario. Caveat: this is off the cuff, so I might have missed something. For your context, I'll give you some thoughts on several ways you could do it, with various tradeoffs. Scenario 1 (every widget reports

Re: SAM Scala aggregate

2018-09-10 Thread John Roesler
In addition to the other suggestions, if you're having too much trouble with the interface, you can always fall back to creating anonymous Initializer/Aggregator instances the way you would if programming in Java. This way, you wouldn't need SAM conversion at all (all it's doing is turning your fun

Re: Timing state changes?

2018-09-12 Thread John Roesler
e of your state store, you may find that recovery from the changelog is on the slow side. For this reason, it's probably a good idea to use stateful sets with K8s if possible. Does this help? Thanks, -John On Wed, Sep 12, 2018 at 7:50 AM Tim Ward wrote: > From: John Roesler > >

Re: KStreams / KSQL processes that span multiple clusters

2018-09-12 Thread John Roesler
Hi Elliot, This is not currently supported, but I, for one, think it would be awesome. It's something I have considered tackling in the future. Feel free to create a Jira ticket asking for it (but please take a minute to search for preexisting tickets). Offhand, my #1 concern would be how it wor

Re: SAM Scala aggregate

2018-09-12 Thread John Roesler
reams/2.0.0 > > > > Kind regards, > > > > Liam Clarke > > > >> On Tue, Sep 11, 2018 at 1:43 PM, Michael Eugene > wrote: > >> > >> Well to bring up that kafka to 2.0, do I just need for sbt kafka clients > >> and kafka streams

Re: Best way for reading all messages and close

2018-09-14 Thread John Roesler
Specifically, you can monitor the "records-lag-max" ( https://docs.confluent.io/current/kafka/monitoring.html#fetch-metrics) metric. (or the more granular one per partition). Once this metric goes to 0, you know that you've caught up with the tail of the log. Hope this helps, -John On Fri, Sep 1

Re: Best way for reading all messages and close

2018-09-17 Thread John Roesler
ugh per > topic). > What do you think? > > El sáb., 15 sept. 2018 a las 0:30, John Roesler () > escribió: > > > Specifically, you can monitor the "records-lag-max" ( > > https://docs.confluent.io/current/kafka/monitoring.html#fetch-metrics) > > metric.

Re: Low level kafka consumer API to KafkaStreams App.

2018-09-17 Thread John Roesler
Hey Praveen, I also suspect that you can get away with far fewer threads. Here's the general starting point I recommend: * start with just a little over 1 thread per hardware thread (accounting for cores and hyperthreading). For example, on my machine, I have 4 cores with 2 threads of execution e

Re: [KafkaStreams 1.1.1] partition assignment broken?

2018-10-08 Thread John Roesler
Hi Bart, This sounds a bit surprising. Is there any chance you can zip up some logs so we can see the assignment protocol on the nodes? Thanks, -John On Mon, Oct 8, 2018 at 4:32 AM Bart Vercammen wrote: > Hi, > > I recently moved some KafkaStreams applications from v0.10.2.1 to v1.1.1 > and no

Re: [KafkaStreams 1.1.1] partition assignment broken?

2018-10-08 Thread John Roesler
. > I'll post the results when I have more info. > > Greets, > Bart > > On Mon, Oct 8, 2018 at 7:36 PM John Roesler wrote: > > > Hi Bart, > > > > This sounds a bit surprising. Is there any chance you can zip up some > logs > > so we can see the a

Re: Converting a Stream to a Table - groupBy/reduce vs. stream.to/builder.table

2018-10-26 Thread John Roesler
Hi Patrik, Just to drop one observation in... Streaming to a topic and then consuming it as a table does create overhead, but so does reducing a stream to a table, and I think it's actually the same in either case. They both require a store to collect the table state, and in both cases, the store

Re: Converting a Stream to a Table - groupBy/reduce vs. stream.to/builder.table

2018-10-27 Thread John Roesler
ake sense? > > Having said this: you _can_ write a KStream into a topic an read it back > as KTable. But it's semantically questionable to do so, IMHO. Maybe it > makes sense for your specific application, but in general I don't think > it does make sense. > > > -

Re: Converting a Stream to a Table - groupBy/reduce vs. stream.to/builder.table

2018-11-20 Thread John Roesler
> but if the KTable is maintained inside a streams topology, does it have > to > > read back everything it sends to the broker or can it keep the table > > internally? I hope it is understandable what I mean, otherwise I can try > > the explain it more clearly. > > > &

Re: Kafka Streams 2.1.0, 3rd time data lose investigation

2019-01-03 Thread John Roesler
Hi Nitay, I'm sorry to hear of these troubles; it sounds frustrating. No worries about spamming the list, but it does sound like this might be worth tracking as a bug report in Jira. Obviously, we do not expect to lose data when instances come and go, regardless of the frequency, and we do have t

Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-01-07 Thread John Roesler
Hi Peter, Sorry, I just now have seen this thread. You asked if this behavior is unexpected, and the answer is yes. Suppress.untilWindowCloses is intended to emit only the final result, regardless of restarts. You also asked how the suppression buffer can resume after a restart, since it's not p

Re: Kafka Streams 2.1.0, 3rd time data lose investigation

2019-01-07 Thread John Roesler
y (I had thousands of keys who lost their value - but > many many more that didn't lose their value). > (I am monitoring those changes using datadog, so I know which keys are > affected and I can investigate them) > > Let me know if you need some more details or if you want me t

Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-01-10 Thread John Roesler
Hi Peter, Regarding retention, I was not referring to log retention, but to the window store retention. Since a new window is created every second (for example), there are in principle an unbounded number of windows (the longer the application runs, the more windows there are, with no end). Howeve

Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-01-14 Thread John Roesler
ng how WindowStore works. I have some > more questions... > > On 1/10/19 5:33 PM, John Roesler wrote: > > Hi Peter, > > > > Regarding retention, I was not referring to log retention, but to the > > window store retention. > > Since a new window is created every second

Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-01-22 Thread John Roesler
e with EOS enabled. Thanks for your help, -John On Mon, Jan 14, 2019 at 1:20 PM John Roesler wrote: > Hi Peter, > > I see your train of thought, but the actual implementation of the > window store is structured differently from your mental model. > Unlike Key/Value stores, we k

Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-01-24 Thread John Roesler
Thanks, -John On Wed, Jan 23, 2019 at 5:11 AM Peter Levart wrote: > Hi John, > > Sorry I haven't had time to prepare the minimal reproducer yet. I still > have plans to do it though... > > On 1/22/19 8:02 PM, John Roesler wrote: > > Hi Peter, > > > >

Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-01-25 Thread John Roesler
to reinstate the demo yet, but I have been re-reading > the following scenario of yours.... > > On 1/24/19 11:48 PM, Peter Levart wrote: > > Hi John, > > > > On 1/24/19 3:18 PM, John Roesler wrote: > > > >> > >> The reason is that, upon rest

Re: DSL - Deliver through a table and then to a stream?

2019-02-15 Thread John Roesler
Hi Trey, I think there is a ticket open requesting to be able to re-use the source topic, so I don't think it's an intentional restriction, just a consequence of the way the code is structured at the moment. Is it sufficient to send the update to "calls" and "answered-calls" at the same time? You

Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-02-26 Thread John Roesler
put data from aborted (aka rolled-back) transactions, and you > miss the intended "exactly once" guarantee. > > Thanks, > -John > > On Fri, Jan 25, 2019 at 1:51 AM Peter Levart > wrote: > >> Hi John, >> >> Haven't been able to reinstate th

Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-03-04 Thread John Roesler
angelog > > XXX-KTABLE-SUPPRESS-STATE-STORE-11-changelog-10 with EOS turned > on. *Reinitializing > > the task and restore its state from the beginning.* > > > > ... > > > > > > I was hoping for this to be fixed, but is not the case, at least for my

Re: KTable.suppress(Suppressed.untilWindowCloses) does not suppress some non-final results when the kafka streams process is restarted

2019-03-05 Thread John Roesler
to reproduce the problem, and also exactly the behavior you observe? Thanks, -John On Mon, Mar 4, 2019 at 10:41 PM John Roesler wrote: > Hi Jonathan, > > Sorry to hear that the feature is causing you trouble as well, and that > the 2.2 release candidate didn't seem to fix it. &g

Re: KafkaStreams backoff for non-existing topic

2019-03-25 Thread John Roesler
Hi, Murlio, I found https://issues.apache.org/jira/browse/KAFKA-7970, which sounds like the answer is currently "yes". Unfortunately, it is still tricky to handle this case, although the situation may improve soon. In the mean time, you can try to work around it with the StateListener. When Strea

Re: How to be notified by Kafka stream during partitions rebalancing

2019-04-09 Thread John Roesler
Hi Pierre, If you're using a Processor (or Transformer), you might be able to use the `close` method for this purpose. Streams invokes `close` on the Processor when it suspends the task at the start of the rebalance, when the partitions are revoked. (It invokes `init` once the rebalance is complet

Re: [Streams] TimeWindows ignores gracePeriodMs in windowsFor(timestamp)

2019-04-30 Thread John Roesler
Hey, Jose, This is an interesting thought that I hadn't considered before. I think (tentatively) that windowsFor should *not* take the grace period into account. What I'm thinking is that the method is supposed to return "all windows that contain the provided timestamp" . When we keep window1 op

Re: [Streams] TimeWindows ignores gracePeriodMs in windowsFor(timestamp)

2019-04-30 Thread John Roesler
Hi Ashok, I think some people may be able to give you advice, but please start a new thread instead of replying to an existing message. This just helps keep all the messages organized. Thanks! -John On Thu, Apr 25, 2019 at 6:12 AM ASHOK MACHERLA wrote: > Hii, > > what I asking > > I want to kn

Re: Changing tumbling windows inclusion

2019-05-07 Thread John Roesler
Hi Alessandro, Interesting. I agree, messing with the record timestamp to achieve your goal sounds too messy. It should be pretty easy to plug in your own implementation of Windows, instead of using the built-in TimeWindows, if you want slightly different windowing behavior. Does that work for y

Re: Customers getting duplicate emails

2019-05-13 Thread John Roesler
Hi Ashok, In general, what Ryanne said is correct. For example, if you send an email, and the send times out (but the email actually did get sent), your app cannot distinguish this from a failure in which the send times out before the email makes it out. Therefore, your only option would be to ret

Re: Can kafka internal state be purged ?

2019-06-20 Thread John Roesler
Hi! In addition to setting the grace period to zero (or some small number), you should also consider the delays introduced by record caches upstream of the suppression. If you're closely watching the timing of records going into and coming out of the topology, this might also spoil your expectatio

Re: Kafka streams dropping events in join when reading from earliest message on topic

2019-06-20 Thread John Roesler
Hi! You might also want to set MAX_TASK_IDLE_MS_CONFIG = "max.task.idle.ms" to a non-zero value. This will instruct Streams to wait the configured amount of time to buffer incoming events on all topics before choosing any records to process. In turn, this should cause records to be processed in ro

Re: Can kafka internal state be purged ?

2019-06-21 Thread John Roesler
the windows depends only on the event time and not on the key > > Are these two statements correct ? > > Thanks > Mohan > > On 6/20/19, 9:17 AM, "John Roesler" wrote: > > Hi! > > In addition to setting the grace period to zero (or some small

Re: Can kafka internal state be purged ?

2019-06-21 Thread John Roesler
twist. Thanks for the correction ( I will go back and confirm > this. > > -mohan > > > On 6/21/19, 12:40 PM, "John Roesler" wrote: > > Sure, the record cache attempts to save downstream operators from > unnecessary updates by also buffering for

Re: Can kafka internal state be purged ?

2019-06-24 Thread John Roesler
the state reconciled > across all the application instances ? Is there a designated instance for a > particular key ? > > In my case, there was only one instance processing records from all > partitions and it is kind of odd that windows did not expire even though I > understand

Re: replace usage of TimeWindows.until() from Kafka Streams 2.2

2019-06-24 Thread John Roesler
Hey Sendoh, I think you just overlooked the javadoc in your search, which says: > @deprecated since 2.1. Use {@link Materialized#withRetention(Duration)} or > directly configure the retention in a store supplier and use {@link > Materialized#as(WindowBytesStoreSupplier)}. Sorry for the confusi

Re: Can kafka internal state be purged ?

2019-06-26 Thread John Roesler
> ? It is possible that there is new stream of messages coming in but post-map > operation, the partitions in the repartitioned topic does not see the same > thing. > > Thanks > Mohan > > On 6/24/19, 7:49 AM, "John Roesler" wrote: > > Hey, this

Re: Can kafka internal state be purged ?

2019-06-28 Thread John Roesler
tch the "key" of the pending windows. Then it would be > flushed. > > In practice, it may not be a problem always. But then the real time nature of > the problem might require us that there is not a huge delay between the > processing of the event and the flush. How does o

Re: Kafka streams (2.1.1) - org.rocksdb.RocksDBException:Too many open files

2019-06-28 Thread John Roesler
Hey all, If you want to figure it out theoretically, if you print out the topology description, you'll have some number of state stores listed in there. The number of Rocks instances should just be (#global_state_stores + sum(#partitions_of_topic_per_local_state_store)) . The number of stream thre

Re: Message reprocessing logic

2019-07-09 Thread John Roesler
Hi Alessandro, Sorry if I'm missing some of the context, but could you just keep retrying the API call inside a loop? This would block any other processing by the same thread, but it would allow Streams to stay up in the face of transient failures. Otherwise, I'm afraid that throwing an exception

Re: Reducing streams startup bandwidth usage

2019-12-02 Thread John Roesler
Hi Alessandro, I'm sorry to hear that. The restore process only takes one factor into account: the current offset position of the changelog topic is stored in a local file alongside the state stores. On startup, the app checks if the recorded position lags the latest offset in the changelog. I

Re: KafkaStreams internal producer order guarantee

2019-12-03 Thread John Roesler
Hi Murilo, For this case, you don’t have to worry. Kafka Streams provides the guarantee you want by default. Let us know if you want/need more information! Cheers, John On Tue, Dec 3, 2019, at 08:59, Murilo Tavares wrote: > Hi Mathias > Thank you for your feedback. > I'm still a bit confused

Re: Ordering of messages in the same kafka streams sub-topology with multiple sinks for the same topic

2019-12-03 Thread John Roesler
Hi Vasily, Probably in this case, with the constraints you’re providing, the first branch would output first, but I wouldn’t depend on it. Any small change in your program could mess this up, and also any change in Streams could alter the exact execution order also. The right way to think abo

Re: Reducing streams startup bandwidth usage

2019-12-03 Thread John Roesler
-store > with 18 records, ending offset is 3024261, next starting position is 3024262 > > > why it first says it didn't find the checkpoint and then it does find it? > It seems it loaded about 2.7M records (sum of offset difference in the > "restorting partition "

Re: Reducing streams startup bandwidth usage

2019-12-03 Thread John Roesler
ld have only the last value for each key. > > Hope that makes sense > > Thanks again > > -- > Alessandro Tagliapietra > > > On Tue, Dec 3, 2019 at 3:04 PM John Roesler wrote: > > > Hi Alessandro, > > > > To take a stab at your question, maybe it first

Re: Reducing streams startup bandwidth usage

2019-12-03 Thread John Roesler
luent cloud and I had to create > it manually. > Regarding the change in the recovery behavior, with compact cleanup policy > shouldn't the changelog only keep the last value? That would make the > recovery faster and "cheaper" as it would only need to read a single value &g

Re: Reducing streams startup bandwidth usage

2019-12-04 Thread John Roesler
faster now > having only one key per key instead of one key per window per key. > > Thanks a lot for helping! I'm now going to setup a prometheus-jmx > monitoring so we can keep better track of what's going on :) > > -- > Alessandro Tagliapietra > > >

Re: How to set concrete names for state stores and internal topics backed by these

2019-12-06 Thread John Roesler
Hi Sachin, The way that Java infers generic arguments makes that case particularly obnoxious. By the way, the problem you're facing is specifically addressed by these relatively new features: * https://cwiki.apache.org/confluence/display/KAFKA/KIP-307%3A+Allow+to+define+custom+processor+names+

Re: Kafka Streams Topology describe format

2019-12-06 Thread John Roesler
Hi again, Sachin, I highly recommend this tool for helping to understand the topology description: https://zz85.github.io/kafka-streams-viz/ I think your interpretation of the format is pretty much spot-on. Hope this helps, -John On Fri, Dec 6, 2019, at 12:21, Sachin Mittal wrote: > Hi, > I am

Re: What timestamp is used by streams when doing windowed joins

2019-12-06 Thread John Roesler
Hi Sachin, I'd need more information to speculate about why your records are missing, but it sounds like you're suspecting something to do with the records' timestamps, so I'll just focus on answering your questions. Streams always uses the same timestamp for all operations, which is the times

Re: Reducing streams startup bandwidth usage

2019-12-07 Thread John Roesler
ys the same while commit > > latency increases. > > > > Now, I've these questions: > > - why isn't the aggregate/suppress store changelog topic throughput the > > same as the LastValueStore? Shouldn't every time it aggregates send a > > record

Re: Reducing streams startup bandwidth usage

2019-12-07 Thread John Roesler
> > Thank you so much for your help > > -- > Alessandro Tagliapietra > > > On Sat, Dec 7, 2019 at 9:14 AM John Roesler wrote: > > > Ah, yes. Glad you figured it out! > > > > Caching does not reduce EOS guarantees at all. I highly recommend using > > i

Re: How to set concrete names for state stores and internal topics backed by these

2019-12-07 Thread John Roesler
aterialized. byte[]>>as("store-name").withKeySerde(Serde).withValueSerde(Serde))) > > Note that I have custom class for Key and Value. > > Thanks > Sachin > > > > On Fri, Dec 6, 2019 at 11:02 PM John Roesler wrote: > > > Hi Sachin, > > >

Re: What timestamp is used by streams when doing windowed joins

2019-12-08 Thread John Roesler
the two records it checks if their LogAppendTime are within 5 > minutes then they would get joined. > Please let me know if I got this part right? > > Thanks > Sachin > > > > > On Sat, Dec 7, 2019 at 10:43 AM John Roesler wrote: > > > Hi Sachin, > >

Re: Kafka trunk vs master branch

2019-12-25 Thread John Roesler
Hi Sachin, Trunk is the basis for development. I’m not sure what master is for, if anything. I’ve never used it for anything or even checked it out. The numbered release branches are used to develop patch releases. Releases are created from trunk, PRs should be made against trunk, etc. Thank

Re: designing a streaming task for count and event time difference

2020-01-05 Thread John Roesler
Hey Chris, Yeah, I think what you’re really looking for is data-driven windowing, which we haven’t implemented yet. In lieu of that, you’ll want to build on top of session windows. What you can do is define an aggregate object similar to what Sachin proposed. After the aggregation, you can ju

Re: [ANNOUNCE] New Kafka PMC Members: Colin, Vahid and Manikumar

2020-01-14 Thread John Roesler
Congrats, Colin, Vahid, and Manikumar! A great accomplishment, reflecting your great work. -John On Tue, Jan 14, 2020, at 11:33, Bill Bejeck wrote: > Congrats Colin, Vahid and Manikumar! Well deserved. > > -Bill > > On Tue, Jan 14, 2020 at 12:30 PM Gwen Shapira wrote: > > > Hi everyone, > >

Re: KTable Suppress not working

2020-01-17 Thread John Roesler
Hi Sushrut, That's frustrating... I haven't seen that before, but looking at the error in combination with what you say happens without suppress makes me think there's a large volume of data involved here. Probably, the problem isn't specific to suppression, but it's just that the interactions on

Re: KTable Suppress not working

2020-01-17 Thread John Roesler
wanted to directly solve the given error instead of trying something different. Thanks, -John On Fri, Jan 17, 2020, at 23:33, John Roesler wrote: > Hi Sushrut, > > That's frustrating... I haven't seen that before, but looking at the error > in combination with what you say hap

Re: KTable Suppress not working

2020-01-19 Thread John Roesler
data is flushed? > > Thanks, > Sushrut > > > > > > > On Sat, Jan 18, 2020 at 12:00 PM Sushrut Shivaswamy < > sushrut.shivasw...@gmail.com> wrote: > > > Thanks John, > > I'll try increasing the "CACHE_MAX_BYTES_BUFFERING_CONF

Re: Does Merging two kafka-streams preserve co-partitioning

2020-01-20 Thread John Roesler
Hi Yair, You should be fine! Merging does preserve copartitioning. Also processing on that partition is single-threaded, so you don’t have to worry about races on the same key in your transformer. Actually, you might want to use transformValues to inform Streams that you haven’t changed the

Re: stop

2020-01-22 Thread John Roesler
Hey Sowjanya, That won't work. The "welcome" email you got when you signed up for the mailing list has instructions for unsubscribing: > To remove your address from the list, send a message to: > Cheers, -John On Wed, Jan 22, 2020, at 10:12, Sowjanya Karangula wrote: > stop >

Re: Resource based kafka assignor

2020-01-31 Thread John Roesler
Hi Srinivas, Your approach sounds fine, as long as you don’t need the view of the assignment to be strictly consistent. As a roughy approximation, it could work. On the other hand, if you’re writing a custom assignor, you could consider using the SubscriptionInfo field of the joinGroup request

Re: "Skipping record for expired segment" in InMemoryWindowStore

2020-02-10 Thread John Roesler
Hi, I’m sorry for the trouble. It looks like it was a mistake during https://github.com/apache/kafka/pull/6521 Specifically, while addressing code review comments to change a bunch of other logs from debugs to warnings, that one seems to have been included by accident: https://github.com/apach

Re: "Skipping record for expired segment" in InMemoryWindowStore

2020-02-10 Thread John Roesler
;s info or debug level > though, or > how likely it is that anyone really pays attention to it? > > On Mon, Feb 10, 2020 at 9:53 AM John Roesler wrote: > > > Hi, > > > > I’m sorry for the trouble. It looks like it was a mistake during > > > > http

Re: "Skipping record for expired segment" in InMemoryWindowStore

2020-02-11 Thread John Roesler
next step, I would like to list/inspect records and their timestamps > from given partition of the changelog topic via a command line tool (or in > some other way) - to confirm if they are really stored this way. If you > have a tip on how to do it, please let me know. > > That is al

Re: Using Kafka AdminUtils

2020-02-16 Thread John Roesler
Hi Victoria, I’ve used the AdminClient for this kind of thing before. It’s the official java client for administrative actions like creating topics. You can create topics with any partition count, replication, or any other config. I hope this helps, John On Sat, Feb 15, 2020, at 22:41, Victor

Re: Using Kafka AdminUtils

2020-02-16 Thread John Roesler
c functionality (like > createTopics) remains pretty stable. > Is it considered stable enough for production? > > Thanks, > Victoria > > > On 16/02/2020, 20:15, "John Roesler" wrote: > > Hi Victoria, > > I’ve used the AdminClient for this kind

Re: KTable in Compact Topic takes too long to be updated

2020-02-19 Thread John Roesler
Hi Renato, Can you describe a little more about the nature of the join+aggregation logic? It sounds a little like the KTable represents the result of aggregating messages from the KStream? If that's the case, the operation you probably wanted was like: > KStream.groupBy().aggregate() which prod

Re: KTable in Compact Topic takes too long to be updated

2020-02-20 Thread John Roesler
a Streams joins the item2 to > Order in the streams, but the order is not updated yet. So we add the > item2 to order and instead of having: > Order(item1, item2) > > we have > > Order(item2) > I hope I made more clear our scenario. > Regards, > > Renato

Re: [ANNOUNCE] New committer: Konstantine Karantasis

2020-02-26 Thread John Roesler
Congrats, Konstantine! Awesome news. -John On Wed, Feb 26, 2020, at 16:39, Bill Bejeck wrote: > Congratulations Konstantine! Well deserved. > > -Bill > > On Wed, Feb 26, 2020 at 5:37 PM Jason Gustafson wrote: > > > The PMC for Apache Kafka has invited Konstantine Karantasis as a committer > >

  1   2   >