Re: At Least Once semantics for Kafka Streams

2017-02-05 Thread Mahendra Kariya
tps://kafka.apache.org/documentation/#consumerconfigs> apply to Kafka streams as well? On Fri, Feb 3, 2017 at 9:43 PM, Mahendra Kariya <mahendra.kar...@go-jek.com> wrote: > Ah OK! Thanks a lot for this clarification. > > it will only commit the offsets if the value of COMMIT_INTE

Re: At Least Once semantics for Kafka Streams

2017-02-06 Thread Mahendra Kariya
eams we always set enable.auto.commit to false, > and manage commits using the other commit parameter. That way streams has > more control on when offsets are committed. > > Eno > > On 6 Feb 2017, at 05:39, Mahendra Kariya <mahendra.kar...@go-jek.com> > wrote: > > > >
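The setup Eno describes can be sketched as plain properties. The key names (`enable.auto.commit`, `commit.interval.ms`) are the real config strings; the surrounding class and method are illustrative only:

```java
import java.util.Properties;

public class StreamsCommitConfig {
    // Sketch of the commit-related settings as Streams configures them:
    // auto-commit is pinned to false on the internal consumer, and Streams
    // commits offsets itself based on its own commit.interval.ms setting.
    static Properties effectiveConsumerProps(long commitIntervalMs) {
        Properties props = new Properties();
        props.put("enable.auto.commit", "false"); // Streams always overrides this
        props.put("commit.interval.ms", Long.toString(commitIntervalMs)); // Streams-level config
        return props;
    }

    public static void main(String[] args) {
        Properties p = effectiveConsumerProps(30000L);
        System.out.println(p.getProperty("enable.auto.commit")); // prints "false"
    }
}
```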

Re: At Least Once semantics for Kafka Streams

2017-02-06 Thread Mahendra Kariya
Ah OK! Thanks! On Mon, Feb 6, 2017, 3:09 PM Eno Thereska <eno.there...@gmail.com> wrote: > Oh, by "other" I meant the original one you started discussing: > COMMIT_INTERVAL_MS_CONFIG. > > Eno > > On 6 Feb 2017, at 09:28, Mahendra Kariya <mahendra.kar...@go

Re: At Least Once semantics for Kafka Streams

2017-02-03 Thread Mahendra Kariya
; failure happens, just retry from last committed offset (and because some > output date might already be written which cannot be undone you might > get duplicates). > > -Matthias > > On 1/29/17 8:13 PM, Mahendra Kariya wrote: > > Hey All, > > > > I am new to Kaf

Re: At Least Once semantics for Kafka Streams

2017-02-03 Thread Mahendra Kariya
Ah OK! Thanks a lot for this clarification. it will only commit the offsets if the value of COMMIT_INTERVAL_MS_CONFIG > has > passed. >
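The timing rule being clarified here — commit only once the interval has elapsed, not on every record — boils down to a simple elapsed-time check. A minimal stand-alone sketch (names are illustrative, not Streams internals):

```java
public class CommitTimer {
    // Returns true once enough time has passed since the last commit.
    // Mirrors the behaviour discussed above: offsets are committed only
    // after commit.interval.ms has elapsed, not on every processed record.
    static boolean commitDue(long lastCommitMs, long nowMs, long commitIntervalMs) {
        return nowMs - lastCommitMs >= commitIntervalMs;
    }

    public static void main(String[] args) {
        System.out.println(commitDue(0L, 29_999L, 30_000L)); // false: interval not yet elapsed
        System.out.println(commitDue(0L, 30_000L, 30_000L)); // true: commit now
    }
}
```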

At Least Once semantics for Kafka Streams

2017-01-29 Thread Mahendra Kariya
Hey All, I am new to Kafka Streams. From the documentation, it is pretty clear that Streams supports at-least-once semantics. But I couldn't find details about how this is supported. I am interested in knowing


Re: Re: KIP-122: Add a tool to Reset Consumer Group Offsets

2017-02-23 Thread Mahendra Kariya
+1 for such a tool. It would be of great help in a lot of use cases. On Thu, Feb 23, 2017 at 11:44 PM, Matthias J. Sax wrote: > \cc from dev > > > Forwarded Message > Subject: Re: KIP-122: Add a tool to Reset Consumer Group Offsets > Date: Thu, 23 Feb

Re: JMX metrics for replica lag time

2017-02-22 Thread Mahendra Kariya
Just wondering, for what particular Kafka version is this applicable? On Thu, Feb 23, 2017 at 2:38 AM, Guozhang Wang wrote: > Hmm that is a very good question. It seems to me that we did not add the > corresponding metrics for it when we changed the mechanism. And your >

Re: Consumer / Streams causes deletes in __consumer_offsets?

2017-02-22 Thread Mahendra Kariya
Hi Guozhang, On Thu, Feb 23, 2017 at 2:48 AM, Guozhang Wang wrote: > With that even if you do > not have any data processed the commit operation will be triggered after > that configured period of time. > The above statement is confusing. As per this thread

Re: Consumer / Streams causes deletes in __consumer_offsets?

2017-02-22 Thread Mahendra Kariya
only happen after any records have been completely > processed in the topology, and that also means that the actual commit > internal may be a bit longer than the configured value in practice. > > > Guozhang > > > > On Wed, Feb 22, 2017 at 8:15 PM, Mahendra Kariya &l

Re: Kafka consumer offset location

2017-02-09 Thread Mahendra Kariya
You can use the seekToBeginning method of KafkaConsumer. https://kafka.apache.org/0100/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#seekToBeginning(java.util.Collection) On Thu, Feb 9, 2017 at 7:56 PM, Igor Kuzmenko wrote: > Hello, I'm using new consumer to
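Rewinding with `seekToBeginning` amounts to resetting each listed partition's position to its start offset; the real call is `KafkaConsumer.seekToBeginning(Collection<TopicPartition>)` and needs a live broker. A broker-free stand-in of just that logic:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class SeekDemo {
    // positions: partition id -> next offset to read.
    // Mimics consumer.seekToBeginning(partitions): every listed partition
    // goes back to offset 0 (the "beginning", assuming no log truncation).
    static void seekToBeginning(Map<Integer, Long> positions, Set<Integer> partitions) {
        for (Integer p : partitions) {
            positions.put(p, 0L);
        }
    }

    public static void main(String[] args) {
        Map<Integer, Long> positions = new HashMap<>();
        positions.put(0, 42L);
        positions.put(1, 7L);
        seekToBeginning(positions, positions.keySet());
        System.out.println(positions); // both partitions back at offset 0
    }
}
```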

Stream topology with multiple Kafka clusters

2017-02-27 Thread Mahendra Kariya
Hi, I have a couple of questions regarding Kafka Streams. 1. Can we merge two streams from two different Kafka clusters? 2. Can my sink topic be in a Kafka cluster different from the source topic? Thanks!
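For context on both questions: in the 0.10.x API a Streams application is configured with exactly one `bootstrap.servers` value, so all of its source, intermediate, and sink topics must live on that single cluster; cross-cluster topologies need an external replication step (e.g. MirrorMaker) to copy topics over first. A config sketch (values are illustrative):

```java
import java.util.Properties;

public class SingleClusterConfig {
    // A Kafka Streams app binds to one cluster via a single
    // "bootstrap.servers" entry; there is no per-topic cluster setting.
    static Properties streamsProps(String appId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", appId);
        props.put("bootstrap.servers", bootstrapServers); // one cluster only
        return props;
    }

    public static void main(String[] args) {
        Properties p = streamsProps("my-app", "broker-01:9092");
        System.out.println(p.getProperty("bootstrap.servers"));
    }
}
```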

Re: Stream topology with multiple Kafka clusters

2017-02-27 Thread Mahendra Kariya
J;context-place=forum/confluent-platform > > Thanks > Eno > > On 27 Feb 2017, at 13:37, Mahendra Kariya <mahendra.kar...@go-jek.com> > wrote: > > > > Hi, > > > > I have a couple of questions regarding Kafka streams. > > > > 1. Can we merge

Kafka consumer offset info lost

2017-01-12 Thread Mahendra Kariya
Hey All, We have a Kafka cluster hosted on Google Cloud. There was a network issue on the cloud and suddenly the offset for a particular consumer group got reset to earliest, and all of a sudden the lag was in the millions. We aren't able to figure out what went wrong. Has anybody faced the

Re: Kafka consumer offset info lost

2017-01-12 Thread Mahendra Kariya
offset. When your consumer reconnects Kafka no longer has the offset > so it will reprocess from earliest. > > Michael > > > On 12 Jan 2017, at 11:13, Mahendra Kariya <mahendra.kar...@go-jek.com> > wrote: > > > > Hey All, > > > > We have a Kafka c

Re: Capacity planning for Kafka Streams

2017-03-22 Thread Mahendra Kariya
> On Wed, 22 Mar 2017 at 12:16 Mahendra Kariya <mahendra.kar...@go-jek.com> > wrote: > > > To test Kafka streams on 0.10.2.0, we setup a new Kafka cluster with the > > latest version and used mirror maker to replicate the data from the > > 0.10.0.0 Kafka cluster. We p

Re: Capacity planning for Kafka Streams

2017-03-17 Thread Mahendra Kariya
> > > Guozhang > > > On Mon, Mar 13, 2017 at 5:12 AM, Mahendra Kariya < > mahendra.kar...@go-jek.com > > wrote: > > > We are planning to migrate to the newer version of Kafka. But that's a > few > > weeks away. > > > > We will try set

Re: Kafka Streams: lockException

2017-03-20 Thread Mahendra Kariya
Hey Guozhang, Thanks a lot for these insights. We are facing the exact same problem as Tianji. Our commit frequency is also quite high. We flush around 16K messages per minute to Kafka at the end of the topology. Another issue that we are facing is that RocksDB is not deleting old data.

Re: Kafka Streams: lockException

2017-03-20 Thread Mahendra Kariya
<https://github.com/facebook/rocksdb/wiki/basic-operations#purging-wal-files>. But I am not sure how to set these configs. Any help would be really appreciated. Just for reference, our Kafka brokers are on v0.10.0.1 and RocksDB version is 4.8.0. On Mon, Mar 20, 2017 at 12:29 PM, Mahendra

Re: Consumers not rebalancing after replication

2017-03-21 Thread Mahendra Kariya
, Mahendra Kariya <mahendra.kar...@go-jek.com > wrote: > Hey All, > > We have six consumers in a consumer group. At times, some of the > partitions are under replicated for a while (maybe 2 mins). During this > time, the consumers subscribed to such partitions stop gett

Consumers not rebalancing after replication

2017-03-21 Thread Mahendra Kariya
Hey All, We have six consumers in a consumer group. At times, some of the partitions are under replicated for a while (maybe 2 mins). During this time, the consumers subscribed to such partitions stop getting data from Kafka and they become inactive for a while. But when the partitions are fully

Re: Capacity planning for Kafka Streams

2017-03-22 Thread Mahendra Kariya
18, 2017 at 5:58 AM, Mahendra Kariya <mahendra.kar...@go-jek.com > wrote: > Thanks for the heads up Guozhang! > > The problem is our brokers are on 0.10.0.x. So we will have to upgrade > them. > > On Sat, Mar 18, 2017 at 12:30 AM, Guozhang Wang <wangg...@gmail.co

Re: auto.offset.reset for Kafka streams 0.10.2.0

2017-04-11 Thread Mahendra Kariya
Streams is "earliest" > > > > cf. > > https://github.com/apache/kafka/blob/0.10.2.0/streams/ > > src/main/java/org/apache/kafka/streams/StreamsConfig.java#L405 > > > > > > -Matthias > > > > On 4/10/17 9:41 PM, Mahendra Kariya wrote: &g

Re: auto.offset.reset for Kafka streams 0.10.2.0

2017-04-10 Thread Mahendra Kariya
herwise it > starts from the offset. > > > On Tue, Apr 11, 2017 at 8:51 AM, Mahendra Kariya < > mahendra.kar...@go-jek.com > > wrote: > > > Hey All, > > > > Is the auto offset reset set to "earliest" by default in Kafka streams > >

auto.offset.reset for Kafka streams 0.10.2.0

2017-04-10 Thread Mahendra Kariya
Hey All, Is the auto offset reset set to "earliest" by default in Kafka Streams 0.10.2.0? I thought the default was "latest". I started a new Kafka Streams application with a fresh application id and it started consuming messages from the beginning.
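As the replies in this thread confirm (see the linked `StreamsConfig.java`), Streams overrides the consumer default and sets `auto.offset.reset` to "earliest", which is why a fresh `application.id` starts from the beginning. To get the plain-consumer behaviour back, override it explicitly; a sketch (the key name is the real config string, the class is illustrative):

```java
import java.util.Properties;

public class OffsetResetConfig {
    // Kafka Streams 0.10.2.0 defaults its consumer's auto.offset.reset to
    // "earliest" (the standalone consumer defaults to "latest").
    // This helper overlays an explicit "latest" on top of a base config.
    static Properties withLatestReset(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        props.put("auto.offset.reset", "latest");
        return props;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.put("application.id", "my-app");
        System.out.println(withLatestReset(base).getProperty("auto.offset.reset")); // latest
    }
}
```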

Re: Kafka streams 0.10.2 Producer throwing exception eventually causing streams shutdown

2017-04-17 Thread Mahendra Kariya
Are the bug fix releases published to Maven central repo? On Sat, Apr 1, 2017 at 12:26 PM, Eno Thereska wrote: > Hi Sachin, > > In the bug fix release for 0.10.2 (and in trunk) we have now set > max.poll.interval to infinite since from our experience with streams this >

Re: Kafka streams 0.10.2 Producer throwing exception eventually causing streams shutdown

2017-04-17 Thread Mahendra Kariya
17 Apr 2017, at 13:25, Mahendra Kariya <mahendra.kar...@go-jek.com> > wrote: > > > > Are the bug fix releases published to Maven central repo? > > > > On Sat, Apr 1, 2017 at 12:26 PM, Eno Thereska <eno.there...@gmail.com> > > wrote: > > > >> Hi S

Re: Kafka streams 0.10.2 Producer throwing exception eventually causing streams shutdown

2017-04-18 Thread Mahendra Kariya
> > I see the java.lang.NoSuchMethodError: org.apache.kafka.clients... error. > Looks like some jars aren't in the classpath? > > Eno > > > On 18 Apr 2017, at 12:46, Mahendra Kariya <mahendra.kar...@go-jek.com> > wrote: > > > > Hey Eno, > > > &g

Re: how can I contribute to this project?

2017-04-19 Thread Mahendra Kariya
Hi James, This page has all the information you are looking for. https://kafka.apache.org/contributing On Thu, Apr 20, 2017 at 9:32 AM, James Chain wrote: > Hi > Because I love this project, so I want to take part of it. But I'm brand > new to opensource project. > >

Re: Capacity planning for Kafka Streams

2017-03-13 Thread Mahendra Kariya
larger socket size too. > > Thanks > Eno > > > On Mar 13, 2017, at 9:21 AM, Mahendra Kariya <mahendra.kar...@go-jek.com> > wrote: > > > > Hi Eno, > > > > Please find my answers inline. > > > > > > We are in the process of documenti

Capacity planning for Kafka Streams

2017-03-12 Thread Mahendra Kariya
Hey All, Are there some guidelines / documentation around capacity planning for Kafka streams? We have a Streams application which consumes messages from a topic with 400 partitions. At peak time, there are around 20K messages coming into that topic per second. The Streams app consumes these

Re: Capacity planning for Kafka Streams

2017-03-13 Thread Mahendra Kariya
Hi Eno, Please find my answers inline. We are in the process of documenting capacity planning for streams, stay > tuned. > This would be great! Looking forward to it. Could you send some more info on your problem? What Kafka version are you > using? > We are using Kafka 0.10.0.0. > Are the

Re: Kafka streams 0.10.2 Producer throwing exception eventually causing streams shutdown

2017-04-17 Thread Mahendra Kariya
Thanks! On Tue, Apr 18, 2017, 12:26 AM Eno Thereska <eno.there...@gmail.com> wrote: > The RC candidate build is here: > http://home.apache.org/~gwenshap/kafka-0.10.2.1-rc1/ < > http://home.apache.org/~gwenshap/kafka-0.10.2.1-rc1/> > > Eno > > On 17 Ap

Re: Kafka streams 0.10.2 Producer throwing exception eventually causing streams shutdown

2017-04-18 Thread Mahendra Kariya
ractConfig. getConfiguredInstances(AbstractConfig.java:220) at org.apache.kafka.clients.consumer.KafkaConsumer.( KafkaConsumer.java:673) ... 6 more On Tue, Apr 18, 2017 at 5:47 AM, Mahendra Kariya <mahendra.kar...@go-jek.com > wrote: > Thanks! > > On Tue, Apr 18, 2017, 12:26 A

Debugging Kafka Streams Windowing

2017-04-27 Thread Mahendra Kariya
Hey All, We have a Kafka Streams application which ingests from a topic to which more than 15K messages are generated per second. The app filters a few of them, counts the number of unique filtered messages (based on one particular field) within a 1 min time window, and dumps it back to Kafka.
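The app described here — unique counts per field within 1-minute tumbling windows — can be modeled without Kafka at all. The real app would use a Streams windowed aggregation; this sketch only shows the bucketing-and-dedup logic (all names and numbers are illustrative):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WindowedUniqueCount {
    // Assign each record to a tumbling window by its event timestamp, then
    // count unique keys (the "one particular field") per window.
    static Map<Long, Integer> countUnique(long[] timestampsMs, String[] keys, long windowMs) {
        Map<Long, Set<String>> windows = new HashMap<>();
        for (int i = 0; i < timestampsMs.length; i++) {
            long windowStart = (timestampsMs[i] / windowMs) * windowMs; // tumbling bucket
            windows.computeIfAbsent(windowStart, w -> new HashSet<>()).add(keys[i]);
        }
        Map<Long, Integer> counts = new HashMap<>();
        windows.forEach((w, uniques) -> counts.put(w, uniques.size()));
        return counts;
    }

    public static void main(String[] args) {
        long[] ts = {0L, 30_000L, 61_000L};
        String[] keys = {"a", "a", "b"};
        // One unique key in each of the two 1-minute windows.
        System.out.println(countUnique(ts, keys, 60_000L));
    }
}
```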

Re: Debugging Kafka Streams Windowing

2017-04-27 Thread Mahendra Kariya
g/jira/browse/KAFKA-5055 < > https://issues.apache.org/jira/browse/KAFKA-5055>. Could you let us know > if you use any special TimeStampExtractor class, or if it is the default? > > Thanks > Eno > > On 27 Apr 2017, at 13:46, Mahendra Kariya <mahendra.kar...@go-jek.com>

Re: Debugging Kafka Streams Windowing

2017-04-27 Thread Mahendra Kariya
> Streams skips records with timestamp -1 > > The metric you mentioned, reports the number of skipped record. > > Are you sure that `getEventTimestamp()` never returns -1 ? > > > > -Matthias > > On 4/27/17 10:33 AM, Mahendra Kariya wrote: > > Hey Eno, >
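The rule Matthias states — records whose extracted timestamp is -1 are skipped, and the skipped-records metric counts them — can be sketched as a simple filter (this models the behaviour only; it is not Streams internals):

```java
import java.util.ArrayList;
import java.util.List;

public class TimestampSkipDemo {
    // Given the timestamps a TimestampExtractor produced, keep only records
    // with a usable timestamp. A value of -1 means "no timestamp could be
    // derived", and Streams drops such records before windowing.
    static List<Long> keepValid(List<Long> extractedTimestamps) {
        List<Long> kept = new ArrayList<>();
        for (Long ts : extractedTimestamps) {
            if (ts != -1L) { // -1 -> skipped (reported by the skipped-records metric)
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> kept = keepValid(List.of(100L, -1L, 200L));
        System.out.println(kept.size() + " kept"); // prints "2 kept"
    }
}
```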

Re: Debugging Kafka Streams Windowing

2017-04-27 Thread Mahendra Kariya
> Can you somehow verify your output? Do you mean the Kafka streams output? In the Kafka Streams output, we do see some missing values. I have attached the Kafka Streams output (for a few hours) in the very first email of this thread for reference. Let me also summarise what we have done so

Re: Debugging Kafka Streams Windowing

2017-04-28 Thread Mahendra Kariya
dow while the count > is increasing, you should actually see multiple records per window. > > Your code is like this: > > stream.filter().groupByKey().count(TimeWindow.of(6)).to(); > > Or do you have something more complex? > > > -Matthias > > > On 4/27/1

Re: Debugging Kafka Streams Windowing

2017-05-03 Thread Mahendra Kariya
data. I don't have a good fix yet, I set the retention to > something massive which I think is getting me other problems. > > Maybe that helps? > > On Tue, May 2, 2017 at 6:27 AM, Mahendra Kariya < > mahendra.kar...@go-jek.com> > wrote: > > > Hi Matthias,

Re: Debugging Kafka Streams Windowing

2017-05-03 Thread Mahendra Kariya
would help to understand in what part of the program the data gets > lost. > > > -Matthias > > > On 5/2/17 11:09 PM, Mahendra Kariya wrote: > > Hi Garrett, > > > > Thanks for these insights. But we are not consuming old data. We want the > > Streams app

Re: Debugging Kafka Streams Windowing

2017-05-03 Thread Mahendra Kariya
Another question that I have is, is there a way for us to detect how many messages have come out of order? And if possible, what is the delay? On Thu, May 4, 2017 at 6:17 AM, Mahendra Kariya <mahendra.kar...@go-jek.com> wrote: > Hi Matthias, > > Sure we will look into this. In the me
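There is no built-in out-of-order metric in 0.10.x as far as I know, but this can be measured inside the topology (e.g. in a `peek`/`transform` step) by tracking a timestamp high-water mark. A minimal sketch of that bookkeeping (names are illustrative):

```java
public class OutOfOrderTracker {
    long maxTimestampSeen = Long.MIN_VALUE;
    long outOfOrderCount = 0;
    long maxDelayMs = 0;

    // Feed each record's event timestamp in arrival order. A record is
    // out of order if it is older than something already seen; its delay
    // is how far behind the high-water mark it arrived.
    void observe(long timestampMs) {
        if (timestampMs < maxTimestampSeen) {
            outOfOrderCount++;
            maxDelayMs = Math.max(maxDelayMs, maxTimestampSeen - timestampMs);
        } else {
            maxTimestampSeen = timestampMs;
        }
    }

    public static void main(String[] args) {
        OutOfOrderTracker t = new OutOfOrderTracker();
        for (long ts : new long[]{100L, 200L, 150L, 300L}) {
            t.observe(ts);
        }
        // 150 arrived after 200: one out-of-order record, 50 ms behind.
        System.out.println(t.outOfOrderCount + " out of order, max delay " + t.maxDelayMs + " ms");
    }
}
```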

Re: Order of punctuate() and process() in a stream processor

2017-05-14 Thread Mahendra Kariya
We use Kafka Streams for quite a few aggregation tasks. For instance, counting the number of messages with a particular status in a 1-minute time window. We have noticed that whenever we restart a stream, we see a sudden spike in the aggregated numbers. After a few minutes, things are back to
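The restart spike is consistent with at-least-once delivery: records processed after the last committed offset are replayed on restart, so a window that already absorbed them counts them twice. A minimal arithmetic sketch of the effect (hypothetical numbers):

```java
public class RestartSpikeDemo {
    // produced: records in the window; lastCommitted: offsets durably
    // committed before the crash. Everything past the commit point is
    // processed once before the crash and again after the restart.
    static int countedAfterRestart(int produced, int lastCommitted) {
        int replayed = produced - lastCommitted; // reprocessed records
        return produced + replayed;              // double-counted tail => spike
    }

    public static void main(String[] args) {
        // 5 distinct records, commit covered only the first 3:
        System.out.println(countedAfterRestart(5, 3)); // prints 7, not 5
    }
}
```

Once the replayed tail is exhausted, new windows contain only once-processed records, which matches the numbers settling back down after a few minutes.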

Re: Debugging Kafka Streams Windowing

2017-05-11 Thread Mahendra Kariya
o.a.k.c.c.i.AbstractCoordinator - Discovered coordinator broker-05:6667 (id: 2147483642 rack: null) for group grp_id. On Tue, May 9, 2017 at 3:40 AM, Matthias J. Sax <matth...@confluent.io> wrote: > Great! Glad 0.10.2.1 fixes it for you! > > -Matthias > > On 5/7/17 8:57 PM,

Re: Debugging Kafka Streams Windowing

2017-05-16 Thread Mahendra Kariya
fails, another broker can take over so that any consumer > groups that is corresponding to that offset topic partition won't be > blocked. > > > Guozhang > > > > On Mon, May 15, 2017 at 7:33 PM, Mahendra Kariya < > mahendra.kar...@go-jek.com > > wrote: &g

Re: Debugging Kafka Streams Windowing

2017-05-13 Thread Mahendra Kariya
google.com/forum/#!topic/confluent-platform/A14dkPlDlv4 > > Do you still see missing data? > > > -Matthias > > > On 5/11/17 2:39 AM, Mahendra Kariya wrote: > > Hi Matthias, > > > > We faced the issue again. The logs are below. > > > > 16:13:16.527 [Stre

Re: Debugging Kafka Streams Windowing

2017-05-15 Thread Mahendra Kariya
this exception. On Mon, May 15, 2017 at 11:28 PM, Guozhang Wang <wangg...@gmail.com> wrote: > I'm wondering if it is possibly due to KAFKA-5167? In that case, the "other > thread" will keep retrying on grabbing the lock. > > Guozhang > > > On Sat, May

Re: Debugging Kafka Streams Windowing

2017-06-08 Thread Mahendra Kariya
Mahendra, > > Did increasing those two properties do the trick? I am running into this > exact issue testing streams out on a single Kafka instance. Yet I can > manually start a consumer and read the topics fine while its busy doing > this dead stuffs. > > On Tue, M

Re: Debugging Kafka Streams Windowing

2017-05-22 Thread Mahendra Kariya
On 22 May 2017 at 16:09, Guozhang Wang wrote: > For > that issue I'd suspect that there is a network issue, or maybe the network > is just saturated already and the heartbeat request / response were not > exchanged in time between the consumer and the broker, or the sockets

Re: Debugging Kafka Streams Windowing

2017-05-07 Thread Mahendra Kariya
book keeping to > compute if out-of-order and what the delay was. > > > >>> - same for .mapValues() > >>> > >> > >> I am not sure how to check this. > > The same way as you do for filter()? > > > -Matthias > > > On 5/4/17 10:29 AM,

Re: Debugging Kafka Streams Windowing

2017-04-29 Thread Mahendra Kariya
we can get some insight from those. > > > > > > -Matthias > > > > On 4/27/17 11:41 PM, Mahendra Kariya wrote: > >> Oh good point! > >> > >> The reason why there is only one row corresponding to each time window > is > >> because it only

Re: Debugging Kafka Streams Windowing

2017-05-02 Thread Mahendra Kariya
is present is 1493694300. That's around 9 minutes of data missing. And this is just one instance. There are a lot of such instances in this file. On Sun, Apr 30, 2017 at 11:23 AM, Mahendra Kariya < mahendra.kar...@go-jek.com> wrote: > Thanks for the update Matthias! And sorry for the delayed