Re: [ANNOUNCE] New Committer: Bill Bejeck

2019-02-13 Thread Damian Guy
Congratulations Bill!

On Wed, 13 Feb 2019 at 16:51, Satish Duggana 
wrote:

> Congratulations Bill!
>
> On Thu, Feb 14, 2019 at 6:41 AM Marcelo Barbosa
>  wrote:
> >
> > Wow! Congrats Bill!
> > Cheers,
> > Barbosa
> > Em quarta-feira, 13 de fevereiro de 2019 23:03:54 BRST, Guozhang
> Wang  escreveu:
> >
> >  Hello all,
> >
> > The PMC of Apache Kafka is happy to announce that we've added Bill Bejeck
> > as our newest project committer.
> >
> > Bill has been active in the Kafka community since 2015. He has made
> > significant contributions to the Kafka Streams project with more than 100
> > PRs and 4 authored KIPs, including the streams topology optimization
> > framework. Bill's also very keen on tightening Kafka's unit test / system
> > test coverage, which is of great value to our project codebase.
> >
> > In addition, Bill has been very active in evangelizing Kafka for stream
> > processing in the community. He has given several Kafka meetup talks in
> the
> > past year, including a presentation at Kafka Summit SF. He's also
> authored
> > a book about Kafka Streams (
> > https://www.manning.com/books/kafka-streams-in-action), as well as various
> > posts in public venues like DZone and on his personal blog (
> > http://codingjunkie.net/).
> >
> > We really appreciate the contributions and are looking forward to seeing
> > more from him. Congratulations, Bill!
> >
> >
> > Guozhang, on behalf of the Apache Kafka PMC
> >
>


Re: No referential transparency with transform() ?

2018-09-24 Thread Damian Guy
The `TransformerSupplier` should always return a `new YourTransformer(...)`,
i.e., a fresh instance on every call, as there will be one transformer for
each task and the tasks are potentially processed on multiple threads.
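
To make the difference concrete, here is a minimal sketch (`MyTransformer` is a
stand-in for any `Transformer` implementation, e.g. the DeduplicationTransformer
from your snippets below):

```
// Wrong: one shared instance ends up being used by every task, potentially
// from multiple threads at once.
MyTransformer sharedInstance = new MyTransformer();
stream.transform(() -> sharedInstance);

// Right: the supplier hands out a fresh instance per call, so each task gets
// its own transformer (and its own ProcessorContext via init()).
stream.transform(() -> new MyTransformer());
```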

On Mon, 24 Sep 2018 at 16:07 Stéphane. D.  wrote:

> Hi,
>
> We just stumbled upon an issue with KStream.transform() where we had a
> runtime error with this code:
>
> ```
> DeduplicationTransformer transformer = new
> DeduplicationTransformer<>(...);
> stream.transform(() -> transformer, ...)
> ```
>
> The error is:
> Failed to process stream task 0_0 due to the following error:
> java.lang.IllegalStateException: This should not happen as timestamp()
> should only be called while a record is processed
>
> Whereas simply inlining the creation of the Transformer works:
>
> ```
> stream.transform(() -> new DeduplicationTransformer<>(...), ...)
> ```
>
> Is this behavior expected?
>
>
> I guess that's why transform() takes a wrapper, to construct it when needed?
>
> Why does this happen? Is there some kind of global reference used
> internally (only constructed during execution)?
>
>
> Thanks,
>
> Stéphane
>


Re: A question about Kafka Stream API

2018-08-01 Thread Damian Guy
The count is stored in RocksDB which is persisted to disk. It is not
in-memory unless you specifically use an InMemoryStore.
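
For example, a sketch against the Streams DSL (assuming a `KStream<String, String> words`
where each record value is a single word, String serdes as the configured defaults, and
the usual imports from org.apache.kafka.streams.kstream and org.apache.kafka.streams.state):

```
// Default: the count is kept in a fault-tolerant RocksDB store on disk,
// backed by a changelog topic, so it is not bounded by the JVM heap.
KTable<String, Long> counts = words
        .groupBy((key, word) -> word)
        .count(Materialized.as("counts-store"));

// Only if you explicitly plug in an in-memory store does the state live on the heap:
KTable<String, Long> inMemoryCounts = words
        .groupBy((key, word) -> word)
        .count(Materialized.as(Stores.inMemoryKeyValueStore("counts-in-memory")));
```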

On Wed, 1 Aug 2018 at 12:53 Kyle.Hu  wrote:

> Hi, bosses:
>   I have read the word count demo of the Kafka Streams API. It is cool that
> Kafka Streams keeps the state, and I have a question about it: whether it
> would cause memory problems when the keys accumulate a lot?


Re: [ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread Damian Guy
> Apache Kafka is a distributed streaming platform with four core APIs:
>
>
>
> ** The Producer API allows an application to publish a stream of records to
>
> one or more Kafka topics.
>
>
>
> ** The Consumer API allows an application to subscribe to one or more
>
> topics and process the stream of records produced to them.
>
>
>
> ** The Streams API allows an application to act as a stream processor,
>
> consuming an input stream from one or more topics and producing an
>
> output stream to one or more output topics, effectively transforming the
>
> input streams to output streams.
>
>
>
> ** The Connector API allows building and running reusable producers or
>
> consumers that connect Kafka topics to existing applications or data
>
> systems. For example, a connector to a relational database might
>
> capture every change to a table.
>
>
>
>
>
> With these APIs, Kafka can be used for two broad classes of application:
>
>
>
> ** Building real-time streaming data pipelines that reliably get data
>
> between systems or applications.
>
>
>
> ** Building real-time streaming applications that transform or react
>
> to the streams of data.
>
>
>
>
>
>
>
> Apache Kafka is in use at large and small companies worldwide, including
>
> Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
>
> Target, The New York Times, Uber, Yelp, and Zalando, among others.
>
>
>
>
>
>
>
> A big thank you for the following 131 contributors to this release!
>
>
>
> Adem Efe Gencer, Alex D, Alex Dunayevsky, Allen Wang, Andras Beni,
>
> Andy Bryant, Andy Coates, Anna Povzner, Arjun Satish, asutosh936,
>
> Attila Sasvari, bartdevylder, Benedict Jin, Bill Bejeck, Blake Miller,
>
> Boyang Chen, cburroughs, Chia-Ping Tsai, Chris Egerton, Colin P. Mccabe,
>
> Colin Patrick McCabe, ConcurrencyPractitioner, Damian Guy, dan norwood,
>
> Daniel Shuy, Daniel Wojda, Dark, David Glasser, Debasish Ghosh, Detharon,
>
> Dhruvil Shah, Dmitry Minkovsky, Dong Lin, Edoardo Comar, emmanuel Harel,
>
> Eugene Sevastyanov, Ewen Cheslack-Postava, Fedor Bobin, fedosov-alexander,
>
> Filipe Agapito, Florian Hussonnois, fredfp, Gilles Degols, gitlw, Gitomain,
>
> Guangxian, Gunju Ko, Gunnar Morling, Guozhang Wang, hmcl, huxi, huxihx,
>
> Igor Kostiakov, Ismael Juma, Jacek Laskowski, Jagadesh Adireddi,
>
> Jarek Rudzinski, Jason Gustafson, Jeff Klukas, Jeremy Custenborder,
>
> Jiangjie (Becket) Qin, Jiangjie Qin, JieFang.He, Jimin Hsieh, Joan Goyeau,
>
> Joel Hamill, John Roesler, Jon Lee, Jorge Quilcate Otoya, Jun Rao,
>
> Kamal C, khairy, Koen De Groote, Konstantine Karantasis, Lee Dongjin,
>
> Liju John, Liquan Pei, lisa2lisa, Lucas Wang, Magesh Nandakumar,
>
> Magnus Edenhill, Magnus Reftel, Manikumar Reddy, Manikumar Reddy O,
>
> manjuapu, Mats Julian Olsen, Matthias J. Sax, Max Zheng, maytals,
>
> Michael Arndt, Michael G. Noll, Mickael Maison, nafshartous, Nick Travers,
>
> nixsticks, Paolo Patierno, parafiend, Patrik Erdes, Radai Rosenblatt,
>
> Rajini Sivaram, Randall Hauch, ro7m, Robert Yokota, Roman Khlebnov,
>
> Ron Dagostino, Sandor Murakozi, Sasaki Toru, Sean Glover,
>
> Sebastian Bauersfeld, Siva Santhalingam, Stanislav Kozlovski, Stephane
> Maarek,
>
> Stuart Perks, Surabhi Dixit, Sönke Liebau, taekyung, tedyu, Thomas Leplus,
>
> UVN, Vahid Hashemian, Valentino Proietti, Viktor Somogyi, Vitaly Pushkar,
>
> Wladimir Schmidt, wushujames, Xavier Léauté, xin, yaphet,
>
> Yaswanth Kumar, ying-zheng, Yu
>
>
>
>
>
>
>
> We welcome your help and feedback. For more information on how to
>
> report problems, and to get involved, visit the project website at
>
> https://kafka.apache.org/
>
>
>
>
>
> Thank you!
>
>
>
>
>
> Regards,
>
>
>
> Rajini
>


Re: Possible bug? Duplicates when searching kafka stream state store with caching

2018-07-03 Thread Damian Guy
Hi,

When you create your window store, do you have `retainDuplicates` set to
`true`? I.e., assuming you use `Stores.persistentWindowStore(...)`, is the
last param `true`?
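
For reference, this is roughly where that flag sits in the 1.1.x `Stores` API -- a
sketch from memory, so please double-check the parameter order against the javadoc
for your version:

```
// The last boolean is retainDuplicates. With `true`, fetch()/fetchAll() can
// return every value that was put for a key in a window rather than only the
// latest one.
WindowBytesStoreSupplier supplier = Stores.persistentWindowStore(
        "duplicate-store",
        TimeUnit.HOURS.toMillis(1),    // retention period
        3,                             // number of segments
        TimeUnit.MINUTES.toMillis(5),  // window size
        false);                        // retainDuplicates
```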

Thanks,
Damian

On Mon, 2 Jul 2018 at 17:29 Christian Henry 
wrote:

> We're using the latest Kafka (1.1.0). I'd like to note that when we
> encounter duplicates, the window is the same as well.
>
> My original code was a bit simplified -- we also insert into the store if
> iterator.hasNext() as well, before returning null. We're using a window
> store because we have a punctuator that runs every few minutes to count
> GUIDs with similar metadata, and reports that in a healthcheck. Since our
> healthcheck window is less than the retention period of the store
> (retention period might be 1 hour, healthcheck window is ~5 min), the
> window store seemed like a good way to efficiently query all of the most
> recent data. Note that since the healthcheck punctuator needs to aggregate
> on all the recent values, it has to do a *fetchAll(start, end) *which is
> how these duplicates are affecting us.
>
> On Fri, Jun 29, 2018 at 7:32 PM, Guozhang Wang  wrote:
>
> > Hello Christian,
> >
> > Since you are calling fetch(key, start, end) I'm assuming that
> > duplicateStore
> > is a WindowedStore. With a windowed store, it is possible that a single
> key
> > can fall into multiple windows, and hence be returned from the
> > WindowStoreIterator,
> > note its type is , V>
> >
> > So I'd first want to know
> >
> > 1) which Kafka version are you using.
> > 2) why you'd need a window store, and if yes, could you consider using
> the
> > single point fetch (added in KAFKA-6560) other than the range query
> (which
> > is more expensive as well).
> >
> >
> >
> > Guozhang
> >
> >
> > On Fri, Jun 29, 2018 at 11:38 AM, Christian Henry <
> > christian.henr...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I'll first describe a simplified view of relevant parts of our setup
> > (which
> > > should be enough to repro), describe the behavior we're seeing, and
> then
> > > note some information I've come across after digging in a bit.
> > >
> > > We have a kafka stream application, and one of our transform steps
> keeps
> > a
> > > state store to filter out messages with a previously seen GUID. That
> is,
> > > our transform looks like:
> > >
> > > public KeyValue transform(byte[] key, String guid) {
> > > try (WindowStoreIterator iterator =
> > > duplicateStore.fetch(correlationId, start, now)) {
> > > if (iterator.hasNext()) {
> > > return null;
> > > } else {
> > > duplicateStore.put(correlationId, some metadata);
> > > return new KeyValue<>(key, message);
> > > }
> > > }}
> > >
> > > where the duplicateStore is a persistent windowed store with caching
> > > enabled.
> > >
> > > I was debugging some tests and found that sometimes when calling
> > > *all()* or *fetchAll()
> > > *on the duplicate store and stepping through the iterator, it would
> > return
> > > the same guid more than once, even if it was only inserted into the
> store
> > > once. More specifically, if I had the following guids sent to the
> stream:
> > > [1, 2, ... 9] (for 9 values total), sometimes it would
> return
> > > 10 values, with one (or more) of the values being returned twice by the
> > > iterator. However, this would not show up with a *fetch(guid)* on that
> > > specific guid. For instance, if 1 was being returned twice by
> > > *fetchAll()*, calling *duplicateStore.fetch("1", start, end)* will
> > > still return an iterator with size of 1.
> > >
> > > I dug into this a bit more by setting a breakpoint in
> > > *SegmentedCacheFunction#compareSegmentedKeys(cacheKey,
> > > storeKey)* and watching the two input values as I looped through the
> > > iterator using "*while(iterator.hasNext()) { print(iterator.next())
> }*".
> > In
> > > one test, the duplicate value was 6, and saw the following behavior
> > > (trimming off the segment values from the byte input):
> > > -- compareSegmentedKeys(cacheKey = 6, storeKey = 2)
> > > -- next() returns 6
> > > and
> > > -- compareSegmentedKeys(cacheKey = 7, storeKey = 6)
> > > -- next() returns 6
> > > Besides those, the input values are the same and the output is as
> > expected.
> > > Additionally, a coworker noted that the number of duplicates always
> > matches
> > > the number of times *Long.compare(cacheSegmentId, storeSegmentId)
> > *returns
> > > a non-zero value, indicating that duplicates are likely arising due to
> > the
> > > segment comparison.
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: Can anyone help me to send messages in their original order?

2018-05-26 Thread Damian Guy
Hi Raymond,
If you want all messages delivered in order then you should create the
topic with 1 partition. If you want ordering guarantees for messages with
the same key, then you need to produce the messages with a key.

Using the console producer you can do that by adding
--property "parse.key=true"
--property "key.separator=,"
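
For example, using the broker and topic from your original message, the full
producer command would be something like the following; each input line is then
split at the first ',' into key and value, so `1, abc` is produced with key `1`:

```
./kafka-console-producer.sh --broker-list sandbox.hortonworks.com:6667 \
    --topic kafka-topic2 \
    --property "parse.key=true" \
    --property "key.separator=,"
```

On the consumer side you can add --property print.key=true to verify which key
each record was produced with.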

Regards,
Damian

On Sat, 26 May 2018 at 21:32, Raymond Xie  wrote:

> Thank you so much Hans for your enlightening, it is definitely greatly
> helpful to me as a new starter.
>
> So for my case, what is the right options I should put together to run the
> commands for producer and consumer respectively?
>
> Thanks.
>
>
> **
> *Sincerely yours,*
>
>
> *Raymond*
>
> On Sat, May 26, 2018 at 4:26 PM, Hans Jespersen  wrote:
>
> > There are two concepts in Kafka that are not always familiar to people
> who
> > have used other pub/sub systems.
> >
> > 1) partitions:
> >
> > Kafka topics are partitioned which means a single topic is sharded into
> > multiple pieces that are distributed across multiple brokers in the
> cluster
> > for parallel processing.
> >
> > Order is guaranteed per partition (not per topic).
> >
> > You can think of each kafka topic partition like an exclusive queue in
> > traditional messaging systems, and order is not guaranteed when the data is
> > spread out across multiple queues in traditional messaging either.
> >
> > 2) keys
> >
> > Kafka messages have keys in addition to the value (i.e. the body) and the header.
> > When messages are published with the same key they will all be sent in
> > order to the same partition.
> >
> > If messages are published with a “null” key then they will be spread out
> > round robin across all partitions (which is what you have done).
> >
> >
> > Conclusion
> >
> > You will see ordered delivery if your either use a key when you publish
> or
> > create a topic with one partition.
> >
> >
> > -hans
> >
> > On May 26, 2018, at 7:59 AM, Raymond Xie  wrote:
> >
> > Thanks. By default, can you explain why I received the messages in the
> > wrong order? Note there are only 9 lines from 1 to 9, but on the consumer
> > side their original order becomes messed up.
> >
> > ~~~sent from my cell phone, sorry if there is any typo
> >
> > Hans Jespersen  于 2018年5月26日周六 上午12:16写道:
> >
> >> If you create a topic with one partition they will be in order.
> >>
> >> Alternatively if you publish with the same key for every message they
> >> will be in the same order even if your topic has more than 1 partition.
> >>
> >> Either way above will work for Kafka.
> >>
> >> -hans
> >>
> >> > On May 25, 2018, at 8:56 PM, Raymond Xie 
> wrote:
> >> >
> >> > Hello,
> >> >
> >> > I just started learning Kafka and have the environment setup on my
> >> > hortonworks sandbox at home vmware.
> >> >
> >> > test.csv is what I want the producer to send out:
> >> >
> >> > more test1.csv | ./kafka-console-producer.sh --broker-list
> >> > sandbox.hortonworks.com:6667 --topic kafka-topic2
> >> >
> >> > 1, abc
> >> > 2, def
> >> > ...
> >> > 8, vwx
> >> > 9, zzz
> >> >
> >> > What I received are all the content of test.csv, however, not in their
> >> > original order;
> >> >
> >> > kafka-console-consumer.sh --zookeeper 192.168.112.129:2181 --topic
> >> > kafka-topic2
> >> >
> >> > 2, def
> >> > 1, abc
> >> > ...
> >> > 9, zzz
> >> > 8, vwx
> >> >
> >> >
> >> > I read from google that partition could be the feasible solution,
> >> however,
> >> > my questions are:
> >> >
> >> > 1. for small files like this one, shall I really do the partitioning?
> >> how
> >> > small a partition would be acceptable to ensure the sequence?
> >> > 2. for big files, each partition could still contain multiple lines,
> >> how to
> >> > ensure all the lines in each partition won't get messed up on consumer
> >> side?
> >> >
> >> >
> >> > I also want to know what is the best practice to process large volume
> of
> >> > data through kafka? There should be better way other than console
> >> command.
> >> >
> >> > Thank you very much.
> >> >
> >> >
> >> >
> >> > **
> >> > *Sincerely yours,*
> >> >
> >> >
> >> > *Raymond*
> >>
> >
>


Re: streams windowing question

2018-05-21 Thread Damian Guy
Hi Peter,

It depends on how you specify the JoinWindow. But using `JoinWindows.of(10
seconds)` would mean that a record will join with any other record with a
matching key that arrived between 10 seconds before it and 10 seconds after
it.

So your example is correct. You would need to have a JoinWindow large
enough to allow for the expected difference in arrival time.
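
In code that is roughly the following (`TradeAndRisk` is just a placeholder for
whatever your ValueJoiner produces):

```
// A trade joins with any risk that has the same key and a timestamp within
// 10 seconds either side of the trade's timestamp (and vice versa).
KStream<String, TradeAndRisk> joined = trades.join(
        risk,
        (t, r) -> new TradeAndRisk(t, r),
        JoinWindows.of(TimeUnit.SECONDS.toMillis(10)));
```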

Thanks,
Damian

On Sat, 19 May 2018 at 19:50 Peter Kleinmann <nnamni...@gmail.com> wrote:

> Hi Damian,
>
> thank you for the informative reply. I think this answers 95% of my
> questions (or maybe 100% and I missed the explanation).
>
> what is still unresolved is how to handle trades and risks that arrive far
> apart.
>
> Suppose we have
>
> timeToAllowAJoin = 10  seconds
>
> and we have
>
> Time | Trade | Risk
> 0s  --
>
> 1s Trade(t1, v1)
> 4s Trade(t2, v1)
> 5s Trade(t3, v1)
> 8s Risk(t2, v1)
> 10s --
> 14s    Risk(t1, v1)
> 20s --
> 27s    Risk(t4, v1)
> 30s --
> 37s    Risk(t3, v1)
> 40s --
> 47s    Trade(t4, v1)
> 50s --
>
>
> I think
> trades.join(risk, valueJoiner, JoinWindows.of(timeToAllowAJoin));
>
> will join
> Risk(t2,v1) -> Trade(t2,v1)
> for window 0-10s efficiently
>
> but I don't think I get the other joins, even running
> trades.join(risk, valueJoiner, JoinWindows.of(timeToAllowAJoin));
> for windows
> 10s - 20s
> 20s - 30s
> 40s - 40s
> If this is correct, then is there another common way to handle a scenario
> like the one above?
>
> thanks in advance,
>
> Peter
>
>
>
>
>
>
>
> On Fri, May 18, 2018 at 6:27 PM, Damian Guy <damian@gmail.com> wrote:
>
>> Hi,
>>
>> In order to join the two streams they need to have the same key and the
>> same number of partitions in each topic. If they don't have the same key
>> you can force a repartition by using:
>>
>> `stream.selectKey(KeyValueMapper)`
>>
>> if the number of partitions is also different you could do:
>> `stream.selectKey(KeyValueMapper).through("your-new-topic")`
>>
>> You would need to create "your-new-topic" in advance with the correct
>> number of partitions.
>>
>> Now assuming that we have the same key and the same number of partitions,
>> the join is something like:
>>
>> `trades.join(risk, valueJoiner, JoinWindows.of(timeToAllowAJoin));`
>>
>> Because the trade and risk have the same key when a trade or risk event
>> arrives you will only join against the corresponding event (within the
>> time
>> window specified in the join). For example:
>>
>> Trade <t1, v1>
>> Trade <t2, v1>
>> Risk <t1, v1>  -> join(Trade <t1,v1>)
>> Risk<t2, v1> -> join(Trade <t2, v1>)
>>
>> Note: if multiple events for the same key arrive within the same
>> JoinWindow
>> you will get multiple outputs. However, you could avoid this from going
>> downstream by using `transformValues(..)` after the join. You would attach
>> a StateStore to the `transformValues`, i.e., by first creating the store
>> and then passing in the store name as a param to the method. Then when a
>> join result for a given key arrives, your transformer would first check in
>> the store if there was already a result, if there isn't a result update
>> the
>> store and send the result downstream. If there is a result you drop it.
>>
>> Regards,
>> Damian
>>
>>
>>
>>  -
>>
>> On Fri, 18 May 2018 at 22:57 Peter Kleinmann <nnamni...@gmail.com> wrote:
>>
>> > Dear community, sorry in advance for what will be a newbie question:
>> >
>> >
>> > suppose I have two topics
>> > trades
>> > risks
>> >
>> > and I want to join a trade in the trades topic to a risk message in the
>> > risks topic by fields tradeId, and version, which exist in both trade
>> and
>> > risk messages.
>> >
>> > Seems I can naturally create streams on top of each topic, but here is
>> the
>> > question:
>> >
>> > Suppose in one period between time boundary b0 and b1 trades t1 and t2
>> > arrive, and risk r1 matching t1 arrives.
>> >
>> > In the next period, risk r2 arrives matching t2.
>> >
>> > a) How do I join r2 to t2?
>> >
>> > b) How do I not reprocess t1 and r1?
>> >
>> > I'm going to have between 2 million and 25 million trades and risks a
>> day,
>> > so once a trade and risk has been matched, I dont want to handle them
>> > again.
>> >
>> > Do I need to sink the kafka topics to something like postgres, and have
>> a
>> > umatched trades table
>> > unmatched risks table
>> > matched table
>> >
>> > Many Many Thanks in Advance!!!
>> >
>>
>
>


Re: streams windowing question

2018-05-18 Thread Damian Guy
Hi,

In order to join the two streams they need to have the same key and the
same number of partitions in each topic. If they don't have the same key
you can force a repartition by using:

`stream.selectKey(KeyValueMapper)`

if the number of partitions is also different you could do:
`stream.selectKey(KeyValueMapper).through("your-new-topic")`

You would need to create "your-new-topic" in advance with the correct
number of partitions.

Now assuming that we have the same key and the same number of partitions,
the join is something like:

`trades.join(risk, valueJoiner, JoinWindows.of(timeToAllowAJoin));`

Because the trade and risk have the same key when a trade or risk event
arrives you will only join against the corresponding event (within the time
window specified in the join). For example:

Trade <t1, v1>
Trade <t2, v1>
Risk <t1, v1>  -> join(Trade <t1, v1>)
Risk <t2, v1>  -> join(Trade <t2, v1>)

Note: if multiple events for the same key arrive within the same JoinWindow
you will get multiple outputs. However, you could stop these going
downstream by using `transformValues(..)` after the join. You would attach
a StateStore to the `transformValues`, i.e., by first creating the store
and then passing the store name as a param to the method. Then, when a
join result for a given key arrives, your transformer would first check in
the store whether there is already a result; if there isn't, update the
store and send the result downstream. If there is, you drop it.
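
To make the de-duplication step concrete, here is a rough sketch. I'm using
`transform(..)` rather than `transformValues(..)` so a duplicate join result can be
dropped by simply returning null. The store name, the String key type and the
`TradeAndRisk` value type are placeholders for whatever your join produces;
`builder` is your StreamsBuilder and `joined` is the stream coming out of the join
above:

```
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

// 1. Register a store with the builder so it can be attached to the transformer.
StoreBuilder<KeyValueStore<String, Long>> dedupStoreBuilder =
        Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("join-dedup-store"),
                Serdes.String(),
                Serdes.Long());
builder.addStateStore(dedupStoreBuilder);

// 2. After the join, only forward the first result seen for each key.
KStream<String, TradeAndRisk> deduped = joined.transform(
        () -> new Transformer<String, TradeAndRisk, KeyValue<String, TradeAndRisk>>() {
            private KeyValueStore<String, Long> store;

            @Override
            public void init(final ProcessorContext context) {
                store = (KeyValueStore<String, Long>) context.getStateStore("join-dedup-store");
            }

            @Override
            public KeyValue<String, TradeAndRisk> transform(final String key, final TradeAndRisk value) {
                if (store.get(key) != null) {
                    return null;                            // already emitted a result for this key: drop
                }
                store.put(key, System.currentTimeMillis()); // remember that we've seen this key
                return KeyValue.pair(key, value);
            }

            // punctuate was still part of the Transformer interface in 1.x; it is a no-op here
            public KeyValue<String, TradeAndRisk> punctuate(final long timestamp) {
                return null;
            }

            @Override
            public void close() { }
        },
        "join-dedup-store");
```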

Regards,
Damian



 -

On Fri, 18 May 2018 at 22:57 Peter Kleinmann  wrote:

> Dear community, sorry in advance for what will be a newbie question:
>
>
> suppose I have two topics
> trades
> risks
>
> and I want to join a trade in the trades topic to a risk message in the
> risks topic by fields tradeId, and version, which exist in both trade and
> risk messages.
>
> Seems I can naturally create streams on top of each topic, but here is the
> question:
>
> Suppose in one period between time boundary b0 and b1 trades t1 and t2
> arrive, and risk r1 matching t1 arrives.
>
> In the next period, risk r2 arrives matching t2.
>
> a) How do I join r2 to t2?
>
> b) How do I not reprocess t1 and r1?
>
> I'm going to have between 2 million and 25 million trades and risks a day,
> so once a trade and risk has been matched, I dont want to handle them
> again.
>
> Do I need to sink the kafka topics to something like postgres, and have a
> umatched trades table
> unmatched risks table
> matched table
>
> Many Many Thanks in Advance!!!
>


Re: ClassCastException in KStreams job for SessionWindow aggregation

2018-05-02 Thread Damian Guy
Hi,

I think it **might** be related to this:

    final Serializer<HttpSession> httpSessionSerializer = new JsonPOJOSerializer<>();
    serdeProps.put("JsonPOJOClass", Http.class);
    httpSessionSerializer.configure(serdeProps, false);

    final Deserializer<HttpSession> httpSessionDeserializer = new JsonPOJODeserializer<>();
    serdeProps.put("JsonPOJOClass", Http.class);
    httpSessionDeserializer.configure(serdeProps, false);

Shouldn't the class be HttpSession.class?
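
i.e. a minimal sketch of the corrected configuration, keeping your
JsonPOJOSerializer/JsonPOJODeserializer classes:

```
final Serializer<HttpSession> httpSessionSerializer = new JsonPOJOSerializer<>();
serdeProps.put("JsonPOJOClass", HttpSession.class);   // was Http.class
httpSessionSerializer.configure(serdeProps, false);

final Deserializer<HttpSession> httpSessionDeserializer = new JsonPOJODeserializer<>();
serdeProps.put("JsonPOJOClass", HttpSession.class);   // was Http.class
httpSessionDeserializer.configure(serdeProps, false);

final Serde<HttpSession> httpSessionSerde =
        Serdes.serdeFrom(httpSessionSerializer, httpSessionDeserializer);
```

Otherwise the session store's deserializer is told to produce Http objects where
the aggregate expects an HttpSession, which could explain the ClassCastException.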

On Wed, 2 May 2018 at 16:12 Conrad Crampton 
wrote:

> I'm trying to window over http logs and create an HttpSession i.e. a list
> of http requests (and some other properties). However when in my aggregate
> Merger part (I think) I'm getting a classcastexception I think in when my
> sessions are being merged and cannot for the life of me work out why.
> The exception is at the bottom and I think the relevant code is here.
> Can anyone give a suggestion as to why Http is trying to be cast to
> HttpSession?
> Thanks
>
>
> final Serializer<Http> httpSerializer = new JsonPOJOSerializer<>();
> serdeProps.put("JsonPOJOClass", Http.class);
> httpSerializer.configure(serdeProps, false);
>
> final Deserializer<Http> httpDeserializer = new JsonPOJODeserializer<>();
> serdeProps.put("JsonPOJOClass", Http.class);
> httpDeserializer.configure(serdeProps, false);
>
> final Serde<Http> httpSerde = Serdes.serdeFrom(httpSerializer, httpDeserializer);
>
> final Serializer<HttpSession> httpSessionSerializer = new JsonPOJOSerializer<>();
> serdeProps.put("JsonPOJOClass", Http.class);
> httpSessionSerializer.configure(serdeProps, false);
>
> final Deserializer<HttpSession> httpSessionDeserializer = new JsonPOJODeserializer<>();
> serdeProps.put("JsonPOJOClass", Http.class);
> httpSessionDeserializer.configure(serdeProps, false);
>
> final Serde<HttpSession> httpSessionSerde =
>         Serdes.serdeFrom(httpSessionSerializer, httpSessionDeserializer);
>
> StreamsBuilder builder = new StreamsBuilder();
>
> KStream<String, HttpSession> httpStream = null;
> try {
>     httpStream = builder.stream(
>             config.getString(ConfigConstants.HTTP_TOPIC_KEY),
>             Consumed.with(Serdes.String(), httpSerde))
>         .selectKey((s, http) -> http.getClient() + http.getSourceIp() + http.getUseragent())
>         .groupByKey(Serialized.with(Serdes.String(), httpSerde))
>         // window by session
>         .windowedBy(SessionWindows.with(TimeUnit.MINUTES.toMillis(10)))
>         .aggregate(
>             new Initializer<HttpSession>() {
>                 @Override
>                 public HttpSession apply() {
>                     return new HttpSession();
>                 }
>             },
>             new Aggregator<String, Http, HttpSession>() {
>                 @Override
>                 public HttpSession apply(String s, Http http, HttpSession session) {
>                     return session.addRequest(http);
>                 }
>             },
>             new Merger<String, HttpSession>() {
>                 @Override
>                 public HttpSession apply(String s, HttpSession session, HttpSession v1) {
>                     log.debug("merging key {}, session {} with other {}", s, session, v1);
>                     return session.merge(v1);
>                 }
>             },
>             Materialized.<String, HttpSession, SessionStore<Bytes, byte[]>>as(
>                     config.getString(StreamsConfig.APPLICATION_ID_CONFIG) + "-session-store")
>                 .withKeySerde(Serdes.String())
>                 .withValueSerde(httpSessionSerde)
>         ).toStream((stringWindowed, session) -> (stringWindowed.key()));
> } catch (Exception e) {
>     e.printStackTrace();
> }
>
> httpStream
> .filter((key, message) -> message != null)
> .filter((key, message) -> message.getClient() != null)
> .filter((key, message) ->
> httpClients.stream().anyMatch(message.getClient()::equals))
> .foreach((key, message) -> {
> log.info("message {}", message);
> });
>
> final KafkaStreams streams = new KafkaStreams(builder.build(),
> props);
> streams.start();
>
> java.lang.ClassCastException: com.secdata.gi.graph.model.Http cannot be
> cast to com.secdata.gi.graph.model.HttpSession
> at com.secdata.gi.graph.Process$$Lambda$45/1474607212.apply(Unknown Source)
> at
>
> org.apache.kafka.streams.kstream.internals.KStreamImpl$2.apply(KStreamImpl.java:157)
> at
>
> 

Re: Subject: [VOTE] 1.1.0 RC3

2018-03-21 Thread Damian Guy
Hi,

Closing this vote thread as there will be another RC.

Thanks,
Damian

On Mon, 19 Mar 2018 at 23:47 Ismael Juma <ism...@juma.me.uk> wrote:

> Vahid,
>
> The Java 9 Connect issue is similar to the one being fixed for Trogdor in
> the following PR:
>
> https://github.com/apache/kafka/pull/4725
>
> We need to do something similar for Connect.
>
> Ismael
>
> On Fri, Mar 16, 2018 at 3:10 PM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
>
> > Hi Damian,
> >
> > Thanks for running the release.
> >
> > I tried building from source and running the quick start on Linux &
> > Windows with both Java 8 & 9.
> > Here's the result:
> >
> > +-----------------+---------+---------+
> > |                 |  Linux  | Windows |
> > +                 +---------+---------+
> > |                 | J8 | J9 | J8 | J9 |
> > +-----------------+----+----+----+----+
> > |  Build          |  + |  + |  + |  + |
> > +-----------------+----+----+----+----+
> > |  Single broker  |  + |  + |  + |  + |
> > | produce/consume |    |    |    |    |
> > +-----------------+----+----+----+----+
> > | Connect         |  + |  ? |  - |  - |
> > +-----------------+----+----+----+----+
> > | Streams         |  + |  + |  + |  + |
> > +-----------------+----+----+----+----+
> >
> > ?: Connect quickstart on Linux with Java 9 runs but the connect tool
> > throws a bunch of exceptions (https://www.codepile.net/pile/yVg8XJB8)
> > -: Connect quickstart on Windows fails (Java 8:
> > https://www.codepile.net/pile/xJGra6BP, Java 9:
> > https://www.codepile.net/pile/oREYeORK)
> >
> > Given that Windows is not an officially supported platform, and the
> > exceptions with Linux/Java 9 are not breaking the functionality, my vote
> > is a +1 (non-binding).
> >
> > Thanks.
> > --Vahid
> >
> >
> >
> >
> > From:   Damian Guy <damian@gmail.com>
> > To: d...@kafka.apache.org, users@kafka.apache.org,
> > kafka-clie...@googlegroups.com
> > Date:   03/15/2018 07:55 AM
> > Subject:Subject: [VOTE] 1.1.0 RC3
> >
> >
> >
> > Hello Kafka users, developers and client-developers,
> >
> > This is the fourth candidate for release of Apache Kafka 1.1.0.
> >
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.
> > apache.org_confluence_pages_viewpage.action-3FpageId-
> > 3D75957546=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=Q_
> >
> itwloTQj3_xUKl7Nzswo6KE4Nj-kjJc7uSVcviKUc=Qn2GySTKcOV5MFr3WDl63BDv7pTd2s
> > gX46mjZPws01U=cKgJtQXXRauZ3HSAoSbsC9SLVTAkO-pbLdPrOCBuJzE=
> >
> >
> > A few highlights:
> >
> > * Significant Controller improvements (much faster and session expiration
> > edge cases fixed)
> > * Data balancing across log directories (JBOD)
> > * More efficient replication when the number of partitions is large
> > * Dynamic Broker Configs
> > * Delegation tokens (KIP-48)
> > * Kafka Streams API improvements (KIP-205 / 210 / 220 / 224 / 239)
> >
> >
> > Release notes for the 1.1.0 release:
> > https://urldefense.proofpoint.com/v2/url?u=http-3A__home.
> > apache.org_-7Edamianguy_kafka-2D1.1.0-2Drc3_RELEASE-5FNOTES.
> > html=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=Q_itwloTQj3_xUKl7Nzswo6KE4Nj-
> > kjJc7uSVcviKUc=Qn2GySTKcOV5MFr3WDl63BDv7pTd2sgX46mjZPws01U=
> > 26FgbzRhKImhxyEkB4KzDPG-l8W_Y99xU6LykOAgpFI=
> >
> >
> > *** Please download, test and vote by Monday, March 19, 9am PDT
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://urldefense.proofpoint.com/v2/url?u=http-3A__kafka.
> > apache.org_KEYS=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=Q_
> >
> itwloTQj3_xUKl7Nzswo6KE4Nj-kjJc7uSVcviKUc=Qn2GySTKcOV5MFr3WDl63BDv7pTd2s
> > gX46mjZPws01U=xlnrfgxVFMRCKk8pTOhujyC-Um4ogtsxK6Xwks6mc3U=
> >
> >
> > * Release artifacts to be voted upon (source and binary):
> > https://urldefense.proofpoint.com/v2/url?u=http-3A__home.
> > apache.org_-7Edamianguy_kafka-2D1.1.0-2Drc3_=DwIBaQ=jf_
> > iaSHvJObTbx-siA1ZOg=Q_itwloTQj3_xUKl7Nzswo6KE4Nj-kjJc7uSVcviKUc=
> > Qn2GySTKcOV5MFr3WDl63BDv7pTd2sgX46mjZPws01U=
> > ulHUeYnWIp28Gsn4VV1NK3FrGV4Jn5rUpuU6tvgekME=
> >
> >
> > * Maven artifacts to be voted upon:
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__
> > repository.apache.org_content_groups_staging_=DwIBaQ=jf_
> > iaSHvJObTbx-siA1ZOg=Q_itwloTQj3_xUKl7Nzswo6KE4Nj-kjJc7uSVcviKUc=
> > Qn2GySTKcOV5MFr3WDl63BDv7pTd2sgX46mjZPws01U=G9o4hXVXF0bjL_
> > a3Wocod9GUEfy9WBBgoGa2u

Re: Subject: [VOTE] 1.1.0 RC3

2018-03-18 Thread Damian Guy
We have a green system test build -
https://jenkins.confluent.io/job/system-test-kafka/job/1.1/43/
<https://jenkins.confluent.io/job/system-test-kafka/job/1.1/>

Thanks,
Damian

On Fri, 16 Mar 2018 at 22:10 Vahid S Hashemian <vahidhashem...@us.ibm.com>
wrote:

> Hi Damian,
>
> Thanks for running the release.
>
> I tried building from source and running the quick start on Linux &
> Windows with both Java 8 & 9.
> Here's the result:
>
> +-----------------+---------+---------+
> |                 |  Linux  | Windows |
> +                 +---------+---------+
> |                 | J8 | J9 | J8 | J9 |
> +-----------------+----+----+----+----+
> |  Build          |  + |  + |  + |  + |
> +-----------------+----+----+----+----+
> |  Single broker  |  + |  + |  + |  + |
> | produce/consume |    |    |    |    |
> +-----------------+----+----+----+----+
> | Connect         |  + |  ? |  - |  - |
> +-----------------+----+----+----+----+
> | Streams         |  + |  + |  + |  + |
> +-----------------+----+----+----+----+
>
> ?: Connect quickstart on Linux with Java 9 runs but the connect tool
> throws a bunch of exceptions (https://www.codepile.net/pile/yVg8XJB8)
> -: Connect quickstart on Windows fails (Java 8:
> https://www.codepile.net/pile/xJGra6BP, Java 9:
> https://www.codepile.net/pile/oREYeORK)
>
> Given that Windows is not an officially supported platform, and the
> exceptions with Linux/Java 9 are not breaking the functionality, my vote
> is a +1 (non-binding).
>
> Thanks.
> --Vahid
>
>
>
>
> From:   Damian Guy <damian@gmail.com>
> To: d...@kafka.apache.org, users@kafka.apache.org,
> kafka-clie...@googlegroups.com
> Date:   03/15/2018 07:55 AM
> Subject:Subject: [VOTE] 1.1.0 RC3
>
>
>
> Hello Kafka users, developers and client-developers,
>
> This is the fourth candidate for release of Apache Kafka 1.1.0.
>
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_pages_viewpage.action-3FpageId-3D75957546=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=Q_itwloTQj3_xUKl7Nzswo6KE4Nj-kjJc7uSVcviKUc=Qn2GySTKcOV5MFr3WDl63BDv7pTd2sgX46mjZPws01U=cKgJtQXXRauZ3HSAoSbsC9SLVTAkO-pbLdPrOCBuJzE=
>
>
> A few highlights:
>
> * Significant Controller improvements (much faster and session expiration
> edge cases fixed)
> * Data balancing across log directories (JBOD)
> * More efficient replication when the number of partitions is large
> * Dynamic Broker Configs
> * Delegation tokens (KIP-48)
> * Kafka Streams API improvements (KIP-205 / 210 / 220 / 224 / 239)
>
>
> Release notes for the 1.1.0 release:
>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__home.apache.org_-7Edamianguy_kafka-2D1.1.0-2Drc3_RELEASE-5FNOTES.html=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=Q_itwloTQj3_xUKl7Nzswo6KE4Nj-kjJc7uSVcviKUc=Qn2GySTKcOV5MFr3WDl63BDv7pTd2sgX46mjZPws01U=26FgbzRhKImhxyEkB4KzDPG-l8W_Y99xU6LykOAgpFI=
>
>
> *** Please download, test and vote by Monday, March 19, 9am PDT
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__kafka.apache.org_KEYS=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=Q_itwloTQj3_xUKl7Nzswo6KE4Nj-kjJc7uSVcviKUc=Qn2GySTKcOV5MFr3WDl63BDv7pTd2sgX46mjZPws01U=xlnrfgxVFMRCKk8pTOhujyC-Um4ogtsxK6Xwks6mc3U=
>
>
> * Release artifacts to be voted upon (source and binary):
>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__home.apache.org_-7Edamianguy_kafka-2D1.1.0-2Drc3_=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=Q_itwloTQj3_xUKl7Nzswo6KE4Nj-kjJc7uSVcviKUc=Qn2GySTKcOV5MFr3WDl63BDv7pTd2sgX46mjZPws01U=ulHUeYnWIp28Gsn4VV1NK3FrGV4Jn5rUpuU6tvgekME=
>
>
> * Maven artifacts to be voted upon:
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__repository.apache.org_content_groups_staging_=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=Q_itwloTQj3_xUKl7Nzswo6KE4Nj-kjJc7uSVcviKUc=Qn2GySTKcOV5MFr3WDl63BDv7pTd2sgX46mjZPws01U=G9o4hXVXF0bjL_a3Wocod9GUEfy9WBBgoGa2u6yFKQw=
>
>
> * Javadoc:
>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__home.apache.org_-7Edamianguy_kafka-2D1.1.0-2Drc3_javadoc_=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=Q_itwloTQj3_xUKl7Nzswo6KE4Nj-kjJc7uSVcviKUc=Qn2GySTKcOV5MFr3WDl63BDv7pTd2sgX46mjZPws01U=2auaI4IIJhEORGYm1Kdpxt5TDHh0PzSvtK77lC3SJVY=
>
>
> * Tag to be voted upon (off 1.1 branch) is the 1.1.0 tag:
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_kafka_tree_1.1.0-2Drc3=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=Q_itwloTQj3_xUKl7Nzswo6KE4Nj-kjJc7uSVcviKUc=Qn2GySTKcOV5MFr3WDl63BDv7pTd2sgX46mjZPws01U=h7G8XPD8vAWl_gqySi2Iocag5NnP32IT_PyirPC3Lss=
>
>
>
> * Documentation:
>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__kafka.apache.org_11_documentation.html=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=Q_itwloTQj3_xUKl7N

Subject: [VOTE] 1.1.0 RC3

2018-03-15 Thread Damian Guy
Hello Kafka users, developers and client-developers,

This is the fourth candidate for release of Apache Kafka 1.1.0.

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75957546

A few highlights:

* Significant Controller improvements (much faster and session expiration
edge cases fixed)
* Data balancing across log directories (JBOD)
* More efficient replication when the number of partitions is large
* Dynamic Broker Configs
* Delegation tokens (KIP-48)
* Kafka Streams API improvements (KIP-205 / 210 / 220 / 224 / 239)


Release notes for the 1.1.0 release:
http://home.apache.org/~damianguy/kafka-1.1.0-rc3/RELEASE_NOTES.html

*** Please download, test and vote by Monday, March 19, 9am PDT

Kafka's KEYS file containing PGP keys we use to sign the release:
http://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
http://home.apache.org/~damianguy/kafka-1.1.0-rc3/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/

* Javadoc:
http://home.apache.org/~damianguy/kafka-1.1.0-rc3/javadoc/

* Tag to be voted upon (off 1.1 branch) is the 1.1.0 tag:
https://github.com/apache/kafka/tree/1.1.0-rc3


* Documentation:
http://kafka.apache.org/11/documentation.html


* Protocol:
http://kafka.apache.org/11/protocol.html



* Successful Jenkins builds for the 1.1 branch:
Unit/integration tests: https://builds.apache.org/job/kafka-1.1-jdk7/82/

Note: I'll update with passing system tests link once successful. If not
then we can do another RC

/**

Thanks,
Damian


Re: [VOTE] 1.1.0 RC2

2018-03-14 Thread Damian Guy
Thanks for pointing out Satish. Links updated:



Hello Kafka users, developers and client-developers,

This is the third candidate for release of Apache Kafka 1.1.0.

This is a minor version release of Apache Kafka. It includes 29 new KIPs.
Please see the release plan for more details:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75957546

A few highlights:

* Significant Controller improvements (much faster and session expiration
edge cases fixed)
* Data balancing across log directories (JBOD)
* More efficient replication when the number of partitions is large
* Dynamic Broker Configs
* Delegation tokens (KIP-48)
* Kafka Streams API improvements (KIP-205 / 210 / 220 / 224 / 239)

Release notes for the 1.1.0 release:
http://home.apache.org/~damianguy/kafka-1.1.0-rc2/RELEASE_NOTES.html

*** Please download, test and vote by Friday, March 16, 1pm PDT>

Kafka's KEYS file containing PGP keys we use to sign the release:
http://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
http://home.apache.org/~damianguy/kafka-1.1.0-rc2/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/

* Javadoc:
http://home.apache.org/~damianguy/kafka-1.1.0-rc2/javadoc/

* Tag to be voted upon (off 1.1 branch) is the 1.1.0 tag:
https://github.com/apache/kafka/tree/1.1.0-rc2


* Documentation:
http://kafka.apache.org/11/documentation.html
<http://kafka.apache.org/1/documentation.html>

* Protocol:
http://kafka.apache.org/11/protocol.html
<http://kafka.apache.org/1/protocol.html>

* Successful Jenkins builds for the 1.1 branch:
Unit/integration tests: https://builds.apache.org/job/kafka-1.1-jdk7/78
System tests: https://jenkins.confluent.io/job/system-test-kafka/job/1.1/38/

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75957546

On Wed, 14 Mar 2018 at 04:41 Satish Duggana <satish.dugg...@gmail.com>
wrote:

> Hi Damian,
> Thanks for starting vote thread for 1.1.0 release.
>
> There may be a typo on the tag to be voted upon for this release candidate.
> I guess it should be https://github.com/apache/kafka/tree/1.1.0-rc2
> instead
> of https://github.com/apache/kafka/tree/1.1.0-rc.
>
> On Wed, Mar 14, 2018 at 8:27 AM, Satish Duggana <satish.dugg...@gmail.com>
> wrote:
>
> > Hi Damian,
> > Given release plan link in earlier mail is about 1.0 release. You may
> want
> > to replace that with 1.1.0 release plan link[1].
> >
> > 1 - https://cwiki.apache.org/confluence/pages/viewpage.
> > action?pageId=75957546
> >
> > Thanks,
> > Satish.
> >
> > On Wed, Mar 14, 2018 at 12:47 AM, Damian Guy <damian@gmail.com>
> wrote:
> >
> >> Hello Kafka users, developers and client-developers,
> >>
> >> This is the third candidate for release of Apache Kafka 1.1.0.
> >>
> >> This is minor version release of Apache Kakfa. It Includes 29 new KIPs.
> >> Please see the release plan for more details:
> >>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71764913
> >>
> >> A few highlights:
> >>
> >> * Significant Controller improvements (much faster and session
> expiration
> >> edge cases fixed)
> >> * Data balancing across log directories (JBOD)
> >> * More efficient replication when the number of partitions is large
> >> * Dynamic Broker Configs
> >> * Delegation tokens (KIP-48)
> >> * Kafka Streams API improvements (KIP-205 / 210 / 220 / 224 / 239)
> >>
> >> Release notes for the 1.1.0 release:
> >> http://home.apache.org/~damianguy/kafka-1.1.0-rc2/RELEASE_NOTES.html
> >>
> >> *** Please download, test and vote by Friday, March 16, 1pm PDT>
> >>
> >> Kafka's KEYS file containing PGP keys we use to sign the release:
> >> http://kafka.apache.org/KEYS
> >>
> >> * Release artifacts to be voted upon (source and binary):
> >> http://home.apache.org/~damianguy/kafka-1.1.0-rc2/
> >>
> >> * Maven artifacts to be voted upon:
> >> https://repository.apache.org/content/groups/staging/
> >>
> >> * Javadoc:
> >> http://home.apache.org/~damianguy/kafka-1.1.0-rc2/javadoc/
> >>
> >> * Tag to be voted upon (off 1.1 branch) is the 1.1.0 tag:
> >> https://github.com/apache/kafka/tree/1.1.0-rc
> >> <https://github.com/apache/kafka/tree/1.1.0-rc1>2
> >>
> >>
> >> * Documentation:
> >> http://kafka.apache.org/11/documentation.html
> >> <http://kafka.apache.org/1/documentation.html>
> >>
> >> * Protocol:
> >> http://kafka.apache.org/11/protocol.html
> >> <http://kafka.apache.org/1/protocol.html>
> >>
> >> * Successful Jenkins builds for the 1.1 branch:
> >> Unit/integration tests: https://builds.apache.org/job/kafka-1.1-jdk7/78
> >> System tests: https://jenkins.confluent.io/j
> >> ob/system-test-kafka/job/1.1/38/
> >>
> >> /**
> >>
> >> Thanks,
> >> Damian
> >>
> >>
> >> *
> >>
> >
> >
>


[VOTE] 1.1.0 RC2

2018-03-13 Thread Damian Guy
Hello Kafka users, developers and client-developers,

This is the third candidate for release of Apache Kafka 1.1.0.

This is a minor version release of Apache Kafka. It includes 29 new KIPs.
Please see the release plan for more details:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71764913

A few highlights:

* Significant Controller improvements (much faster and session expiration
edge cases fixed)
* Data balancing across log directories (JBOD)
* More efficient replication when the number of partitions is large
* Dynamic Broker Configs
* Delegation tokens (KIP-48)
* Kafka Streams API improvements (KIP-205 / 210 / 220 / 224 / 239)

Release notes for the 1.1.0 release:
http://home.apache.org/~damianguy/kafka-1.1.0-rc2/RELEASE_NOTES.html

*** Please download, test and vote by Friday, March 16, 1pm PDT>

Kafka's KEYS file containing PGP keys we use to sign the release:
http://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
http://home.apache.org/~damianguy/kafka-1.1.0-rc2/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/

* Javadoc:
http://home.apache.org/~damianguy/kafka-1.1.0-rc2/javadoc/

* Tag to be voted upon (off 1.1 branch) is the 1.1.0 tag:
https://github.com/apache/kafka/tree/1.1.0-rc
2


* Documentation:
http://kafka.apache.org/11/documentation.html


* Protocol:
http://kafka.apache.org/11/protocol.html


* Successful Jenkins builds for the 1.1 branch:
Unit/integration tests: https://builds.apache.org/job/kafka-1.1-jdk7/78
System tests: https://jenkins.confluent.io/job/system-test-kafka/job/1.1/38/

/**

Thanks,
Damian


*


Re: [VOTE] 1.1.0 RC1

2018-03-09 Thread Damian Guy
Hi Jeff,

Thanks, we will look into this.

Regards,
Damian

On Thu, 8 Mar 2018 at 18:27 Jeff Chao <jc...@salesforce.com> wrote:

> Hello,
>
> We at Heroku have run 1.1.0 RC1 through our normal performance and
> regression test suite and have found performance to be comparable to 1.0.0.
>
> That said, we're however -1 (non-binding) since this release includes
> Zookeeper 3.4.11 <https://issues.apache.org/jira/browse/KAFKA-6390> which
> is affected by the critical regression ZOOKEEPER-2960
> <https://issues.apache.org/jira/browse/ZOOKEEPER-2960>. As 3.4.12 isn't
> released yet, it might be better to have 3.4.10 included instead.
>
> Jeff
> Heroku
>
>
> On Tue, Mar 6, 2018 at 1:19 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > +1
> >
> > Checked signature
> > Ran test suite - apart from flaky testMetricsLeak, other tests passed.
> >
> > On Tue, Mar 6, 2018 at 2:45 AM, Damian Guy <damian@gmail.com> wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the second candidate for release of Apache Kafka 1.1.0.
> > >
> > > This is minor version release of Apache Kakfa. It Includes 29 new KIPs.
> > > Please see the release plan for more details:
> > >
> > > https://cwiki.apache.org/confluence/pages/viewpage.
> > action?pageId=71764913
> > >
> > > A few highlights:
> > >
> > > * Significant Controller improvements (much faster and session
> expiration
> > > edge cases fixed)
> > > * Data balancing across log directories (JBOD)
> > > * More efficient replication when the number of partitions is large
> > > * Dynamic Broker Configs
> > > * Delegation tokens (KIP-48)
> > > * Kafka Streams API improvements (KIP-205 / 210 / 220 / 224 / 239)
> > >
> > > Release notes for the 1.1.0 release:
> > > http://home.apache.org/~damianguy/kafka-1.1.0-rc1/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote by Friday, March 9th, 5pm PT
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > http://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > http://home.apache.org/~damianguy/kafka-1.1.0-rc1/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/
> > >
> > > * Javadoc:
> > > http://home.apache.org/~damianguy/kafka-1.1.0-rc1/javadoc/
> > >
> > > * Tag to be voted upon (off 1.1 branch) is the 1.1.0 tag:
> > > https://github.com/apache/kafka/tree/1.1.0-rc1
> > >
> > >
> > > * Documentation:
> > > http://kafka.apache.org/11/documentation.html
> > >
> > > * Protocol:
> > > http://kafka.apache.org/11/protocol.html
> > >
> > > * Successful Jenkins builds for the 1.1 branch:
> > > Unit/integration tests:
> https://builds.apache.org/job/kafka-1.1-jdk7/68
> > > System tests: https://jenkins.confluent.io/
> > job/system-test-kafka/job/1.1/
> > > 30/
> > >
> > > /**
> > >
> > > Thanks,
> > > Damian Guy
> > >
> >
>


Re: Kafka Streams - "state store may have migrated to another instance"

2018-03-07 Thread Damian Guy
If you run multiple instances of your app, the state store you are trying to
access may not be available on the instance you are querying from, i.e., it
may be hosted on another instance. If streams is in the RUNNING state, this
would seem to be the issue.
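
One way to confirm that, assuming a store named "my-store" with <String, Long>
key/value types (adjust to your own store and types):

```
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;
import org.apache.kafka.streams.state.StreamsMetadata;

// Which instances currently host partitions of the store?
// host/port come from each instance's application.server config.
for (final StreamsMetadata metadata : streams.allMetadataForStore("my-store")) {
    System.out.printf("my-store partitions hosted on %s:%d%n",
            metadata.host(), metadata.port());
}

// streams.store(..) only ever returns the partitions hosted by *this*
// instance; if all of the store's partitions live on another instance the
// call keeps throwing InvalidStateStoreException even though the
// application is in the RUNNING state.
final ReadOnlyKeyValueStore<String, Long> local =
        streams.store("my-store", QueryableStoreTypes.keyValueStore());
```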

On Wed, 7 Mar 2018 at 15:56 detharon <detha...@gmail.com> wrote:

> I'm afraid that's not what I'm looking for, as I'm just trying to retrieve
> the local data, from inside my application (but from outside the stream
> topology), and in some cases it becomes impossible. That is, the stream
> changes its state from "rebalancing" to "running", but the store is remains
> inaccessible.
>
> I don't want to have access to state stores located on other instances.
>
> In pseudocode:
> streams.start(); <- here's where the stream starts
> loopOverLocalData(streams) <- periodical, asynchronous calls to
> streams.store, which, if the problem occurs, always result in exceptions
> being thrown. I expected to retrieve the local data this way.
>
> On 7 March 2018 at 16:20, Damian Guy <damian@gmail.com> wrote:
>
> > If you have multiple streams instances then the store might only be
> > available on one of the instances. Using `KafkaStreams.store(..)` will
> only
> > locate stores that are currently accessible by that instance. If you need
> > to be able to locate stores on other instances, then you should probably
> > have a read of:
> > https://kafka.apache.org/10/documentation/streams/
> > developer-guide/interactive-queries.html#querying-remote-
> > state-stores-for-the-entire-app
> >
> > There is also a convenient 3rd party lib that can help you with this:
> > https://github.com/lightbend/kafka-streams-query
> >
> > On Wed, 7 Mar 2018 at 14:07 detharon <detha...@gmail.com> wrote:
> >
> > > Hello, I'm experiencing issues accessing the state stores outside the
> > Kafka
> > > stream. My application queries the state store every n seconds using
> the
> > > .all() method to retrieve all key value pairs. I know that the state
> > store
> > > might not be available, so I guard against the
> InvalidStateStoreException
> > > and in case it's caught, I simply invoke the .store method on my
> stream,
> > in
> > > order to get a new store. The problem is, that for some reason the
> store
> > > never becomes available.
> > >
> > > Some facts:
> > > - During that time stream processing works correctly, and it
> successfully
> > > puts and gets data to and from the store.
> > > - Stream is in "running" state. I've started logging that because under
> > > normal circumstances this exception is being thrown when the stream is
> in
> > > "rebalancing" phase, but after a while it's gone, as expected, so I can
> > > distinguish between these two cases.
> > > - It only happens when I run my app in multiple instances.
> > >
> > > I've set the log level to debug, but I can't see anything suspicious in
> > the
> > > logs, but maybe there's something particular I should pay attention to?
> > > I access the store from inside an Akka actor, to which I pass the
> > reference
> > > to KafkaStreams object, if that matters.
> > >
> > > I ran out of ideas what might have caused that behavior, so any help
> will
> > > be greatly appreciated.
> > >
> >
>


Re: Kafka Streams - "state store may have migrated to another instance"

2018-03-07 Thread Damian Guy
If you have multiple streams instances then the store might only be
available on one of the instances. Using `KafkaStreams.store(..)` will only
locate stores that are currently accessible by that instance. If you need
to be able to locate stores on other instances, then you should probably
have a read of:
https://kafka.apache.org/10/documentation/streams/developer-guide/interactive-queries.html#querying-remote-state-stores-for-the-entire-app

There is also a convenient 3rd party lib that can help you with this:
https://github.com/lightbend/kafka-streams-query
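
As a rough sketch of the remote-lookup pattern described on that page (the store
name, the String key and the `thisHost` HostInfo are placeholders, and every
instance needs the `application.server` config set for the metadata to be
meaningful):

```
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;
import org.apache.kafka.streams.state.StreamsMetadata;

final StreamsMetadata metadata =
        streams.metadataForKey("my-store", "some-key", Serdes.String().serializer());

if (metadata.hostInfo().equals(thisHost)) {
    // the key is held locally, so query the store directly
    final ReadOnlyKeyValueStore<String, Long> store =
            streams.store("my-store", QueryableStoreTypes.keyValueStore());
    final Long value = store.get("some-key");
} else {
    // the key lives on another instance: forward the request to
    // metadata.host():metadata.port() over whatever RPC layer you expose
}
```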

On Wed, 7 Mar 2018 at 14:07 detharon  wrote:

> Hello, I'm experiencing issues accessing the state stores outside the Kafka
> stream. My application queries the state store every n seconds using the
> .all() method to retrieve all key value pairs. I know that the state store
> might not be available, so I guard against the InvalidStateStoreException
> and in case it's caught, I simply invoke the .store method on my stream, in
> order to get a new store. The problem is, that for some reason the store
> never becomes available.
>
> Some facts:
> - During that time stream processing works correctly, and it successfully
> puts and gets data to and from the store.
> - Stream is in "running" state. I've started logging that because under
> normal circumstances this exception is being thrown when the stream is in
> "rebalancing" phase, but after a while it's gone, as expected, so I can
> distinguish between these two cases.
> - It only happens when I run my app in multiple instances.
>
> I've set the log level to debug, but I can't see anything suspicious in the
> logs, but maybe there's something particular I should pay attention to?
> I access the store from inside an Akka actor, to which I pass the reference
> to KafkaStreams object, if that matters.
>
> I ran out of ideas what might have caused that behavior, so any help will
> be greatly appreciated.
>


[VOTE] 1.1.0 RC1

2018-03-06 Thread Damian Guy
Hello Kafka users, developers and client-developers,

This is the second candidate for release of Apache Kafka 1.1.0.

This is a minor version release of Apache Kafka. It includes 29 new KIPs.
Please see the release plan for more details:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71764913

A few highlights:

* Significant Controller improvements (much faster and session expiration
edge cases fixed)
* Data balancing across log directories (JBOD)
* More efficient replication when the number of partitions is large
* Dynamic Broker Configs
* Delegation tokens (KIP-48)
* Kafka Streams API improvements (KIP-205 / 210 / 220 / 224 / 239)

Release notes for the 1.1.0 release:
http://home.apache.org/~damianguy/kafka-1.1.0-rc1/RELEASE_NOTES.html

*** Please download, test and vote by Friday, March 9th, 5pm PT

Kafka's KEYS file containing PGP keys we use to sign the release:
http://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
http://home.apache.org/~damianguy/kafka-1.1.0-rc1/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/

* Javadoc:
http://home.apache.org/~damianguy/kafka-1.1.0-rc1/javadoc/

* Tag to be voted upon (off 1.1 branch) is the 1.1.0 tag:
https://github.com/apache/kafka/tree/1.1.0-rc1


* Documentation:
http://kafka.apache.org/11/documentation.html

* Protocol:
http://kafka.apache.org/11/protocol.html

* Successful Jenkins builds for the 1.1 branch:
Unit/integration tests: https://builds.apache.org/job/kafka-1.1-jdk7/68
System tests: https://jenkins.confluent.io/job/system-test-kafka/job/1.1/30/

/**

Thanks,
Damian Guy


Re: [kafka-clients] Re: [VOTE] 1.1.0 RC0

2018-03-02 Thread Damian Guy
Thanks Jun

On Fri, 2 Mar 2018 at 02:25 Jun Rao <j...@confluent.io> wrote:

> KAFKA-6111 is now merged to 1.1 branch.
>
> Thanks,
>
> Jun
>
> On Thu, Mar 1, 2018 at 2:50 PM, Jun Rao <j...@confluent.io> wrote:
>
>> Hi, Damian,
>>
>> It would also be useful to include KAFKA-6111, which prevents 
>> deleteLogDirEventNotifications
>> path to be deleted correctly from Zookeeper. The patch should be committed
>> later today.
>>
>> Thanks,
>>
>> Jun
>>
>> On Thu, Mar 1, 2018 at 1:47 PM, Damian Guy <damian@gmail.com> wrote:
>>
>>> Thanks Jason. Assuming the system tests pass i'll cut RC1 tomorrow.
>>>
>>> Thanks,
>>> Damian
>>>
>>> On Thu, 1 Mar 2018 at 19:10 Jason Gustafson <ja...@confluent.io> wrote:
>>>
>>>> The fix has been merged to 1.1.
>>>>
>>>> Thanks,
>>>> Jason
>>>>
>>>> On Wed, Feb 28, 2018 at 11:35 AM, Damian Guy <damian@gmail.com>
>>>> wrote:
>>>>
>>>> > Hi Jason,
>>>> >
>>>> > Ok - thanks. Let me know how you get on.
>>>> >
>>>> > Cheers,
>>>> > Damian
>>>> >
>>>> > On Wed, 28 Feb 2018 at 19:23 Jason Gustafson <ja...@confluent.io>
>>>> wrote:
>>>> >
>>>> > > Hey Damian,
>>>> > >
>>>> > > I think we should consider
>>>> > > https://issues.apache.org/jira/browse/KAFKA-6593
>>>> > > for the release. I have a patch available, but still working on
>>>> > validating
>>>> > > both the bug and the fix.
>>>> > >
>>>> > > -Jason
>>>> > >
>>>> > > On Wed, Feb 28, 2018 at 9:34 AM, Matthias J. Sax <
>>>> matth...@confluent.io>
>>>> > > wrote:
>>>> > >
>>>> > > > No. Both will be released.
>>>> > > >
>>>> > > > -Matthias
>>>> > > >
>>>> > > > On 2/28/18 6:32 AM, Marina Popova wrote:
>>>> > > > > Sorry, maybe a stupid question, but:
>>>> > > > >  I see that Kafka 1.0.1 RC2 is still not released, but now
>>>> 1.1.0 RC0
>>>> > is
>>>> > > > coming up...
>>>> > > > > Does it mean 1.0.1 will be abandoned and we should be looking
>>>> forward
>>>> > > to
>>>> > > > 1.1.0 instead?
>>>> > > > >
>>>> > > > > thanks!
>>>> > > > >
>>>> > > > > ​Sent with ProtonMail Secure Email.​
>>>> > > > >
>>>> > > > > ‐‐‐ Original Message ‐‐‐
>>>> > > > >
>>>> > > > > On February 26, 2018 6:28 PM, Vahid S Hashemian <
>>>> > > > vahidhashem...@us.ibm.com> wrote:
>>>> > > > >
>>>> > > > >> +1 (non-binding)
>>>> > > > >>
>>>> > > > >> Built the source and ran quickstart (including streams)
>>>> successfully
>>>> > > on
>>>> > > > >>
>>>> > > > >> Ubuntu (with both Java 8 and Java 9).
>>>> > > > >>
>>>> > > > >> I understand the Windows platform is not officially supported,
>>>> but I
>>>> > > ran
>>>> > > > >>
>>>> > > > >> the same on Windows 10, and except for Step 7 (Connect)
>>>> everything
>>>> > > else
>>>> > > > >>
>>>> > > > >> worked fine.
>>>> > > > >>
>>>> > > > >> There are a number of warning and errors (including
>>>> > > > >>
>>>> > > > >> java.lang.ClassNotFoundException). Here's the final error
>>>> message:
>>>> > > > >>
>>>> > > > >>> bin\\windows\\connect-standalone.bat
>>>> config\\connect-standalone.
>>>> > > > properties
>>>> > > > >>
>>>> > > > >> config\\connect-file-source.properties
>>>> config\\connect-file-sink.
>>>> > > > pro

Re: [VOTE] 1.1.0 RC0

2018-03-01 Thread Damian Guy
Thanks Jason. Assuming the system tests pass i'll cut RC1 tomorrow.

Thanks,
Damian

On Thu, 1 Mar 2018 at 19:10 Jason Gustafson <ja...@confluent.io> wrote:

> The fix has been merged to 1.1.
>
> Thanks,
> Jason
>
> On Wed, Feb 28, 2018 at 11:35 AM, Damian Guy <damian@gmail.com> wrote:
>
> > Hi Jason,
> >
> > Ok - thanks. Let me know how you get on.
> >
> > Cheers,
> > Damian
> >
> > On Wed, 28 Feb 2018 at 19:23 Jason Gustafson <ja...@confluent.io> wrote:
> >
> > > Hey Damian,
> > >
> > > I think we should consider
> > > https://issues.apache.org/jira/browse/KAFKA-6593
> > > for the release. I have a patch available, but still working on
> > validating
> > > both the bug and the fix.
> > >
> > > -Jason
> > >
> > > On Wed, Feb 28, 2018 at 9:34 AM, Matthias J. Sax <
> matth...@confluent.io>
> > > wrote:
> > >
> > > > No. Both will be released.
> > > >
> > > > -Matthias
> > > >
> > > > On 2/28/18 6:32 AM, Marina Popova wrote:
> > > > > Sorry, maybe a stupid question, but:
> > > > >  I see that Kafka 1.0.1 RC2 is still not released, but now 1.1.0
> RC0
> > is
> > > > coming up...
> > > > > Does it mean 1.0.1 will be abandoned and we should be looking
> forward
> > > to
> > > > 1.1.0 instead?
> > > > >
> > > > > thanks!
> > > > >
> > > > > ​Sent with ProtonMail Secure Email.​
> > > > >
> > > > > ‐‐‐ Original Message ‐‐‐
> > > > >
> > > > > On February 26, 2018 6:28 PM, Vahid S Hashemian <
> > > > vahidhashem...@us.ibm.com> wrote:
> > > > >
> > > > >> +1 (non-binding)
> > > > >>
> > > > >> Built the source and ran quickstart (including streams)
> successfully
> > > on
> > > > >>
> > > > >> Ubuntu (with both Java 8 and Java 9).
> > > > >>
> > > > >> I understand the Windows platform is not officially supported,
> but I
> > > ran
> > > > >>
> > > > >> the same on Windows 10, and except for Step 7 (Connect) everything
> > > else
> > > > >>
> > > > >> worked fine.
> > > > >>
> > > > >> There are a number of warning and errors (including
> > > > >>
> > > > >> java.lang.ClassNotFoundException). Here's the final error message:
> > > > >>
> > > > >>> bin\\windows\\connect-standalone.bat config\\connect-standalone.
> > > > properties
> > > > >>
> > > > >> config\\connect-file-source.properties config\\connect-file-sink.
> > > > properties
> > > > >>
> > > > >> ...
> > > > >>
> > > > >> \[2018-02-26 14:55:56,529\] ERROR Stopping after connector error
> > > > >>
> > > > >> (org.apache.kafka.connect.cli.ConnectStandalone)
> > > > >>
> > > > >> java.lang.NoClassDefFoundError:
> > > > >>
> > > > >> org/apache/kafka/connect/transforms/util/RegexValidator
> > > > >>
> > > > >> at
> > > > >>
> > > > >> org.apache.kafka.connect.runtime.SinkConnectorConfig.<
> > > > clinit>(SinkConnectorConfig.java:46)
> > > > >>
> > > > >> at
> > > > >>
> > > > >>
> > > > >> org.apache.kafka.connect.runtime.AbstractHerder.
> > > > validateConnectorConfig(AbstractHerder.java:263)
> > > > >>
> > > > >> at
> > > > >>
> > > > >> org.apache.kafka.connect.runtime.standalone.StandaloneHerder.
> > > > putConnectorConfig(StandaloneHerder.java:164)
> > > > >>
> > > > >> at
> > > > >>
> > > > >> org.apache.kafka.connect.cli.ConnectStandalone.main(
> > > > ConnectStandalone.java:107)
> > > > >>
> > > > >> Caused by: java.lang.ClassNotFoundException:
> > > > >>
> > > > >> org.apache.kafka.connect.transforms.util.RegexValidator
> > > > >>
> > > > >> at
> > > > >>
> > >

Re: [VOTE] 1.1.0 RC0

2018-02-28 Thread Damian Guy
Hi Jason,

Ok - thanks. Let me know how you get on.

Cheers,
Damian

On Wed, 28 Feb 2018 at 19:23 Jason Gustafson <ja...@confluent.io> wrote:

> Hey Damian,
>
> I think we should consider
> https://issues.apache.org/jira/browse/KAFKA-6593
> for the release. I have a patch available, but still working on validating
> both the bug and the fix.
>
> -Jason
>
> On Wed, Feb 28, 2018 at 9:34 AM, Matthias J. Sax <matth...@confluent.io>
> wrote:
>
> > No. Both will be released.
> >
> > -Matthias
> >
> > On 2/28/18 6:32 AM, Marina Popova wrote:
> > > Sorry, maybe a stupid question, but:
> > >  I see that Kafka 1.0.1 RC2 is still not released, but now 1.1.0 RC0 is
> > coming up...
> > > Does it mean 1.0.1 will be abandoned and we should be looking forward
> to
> > 1.1.0 instead?
> > >
> > > thanks!
> > >
> > > ​Sent with ProtonMail Secure Email.​
> > >
> > > ‐‐‐ Original Message ‐‐‐
> > >
> > > On February 26, 2018 6:28 PM, Vahid S Hashemian <
> > vahidhashem...@us.ibm.com> wrote:
> > >
> > >> +1 (non-binding)
> > >>
> > >> Built the source and ran quickstart (including streams) successfully
> on
> > >>
> > >> Ubuntu (with both Java 8 and Java 9).
> > >>
> > >> I understand the Windows platform is not officially supported, but I
> ran
> > >>
> > >> the same on Windows 10, and except for Step 7 (Connect) everything
> else
> > >>
> > >> worked fine.
> > >>
> > >> There are a number of warning and errors (including
> > >>
> > >> java.lang.ClassNotFoundException). Here's the final error message:
> > >>
> > >>> bin\\windows\\connect-standalone.bat config\\connect-standalone.
> > properties
> > >>
> > >> config\\connect-file-source.properties config\\connect-file-sink.
> > properties
> > >>
> > >> ...
> > >>
> > >> \[2018-02-26 14:55:56,529\] ERROR Stopping after connector error
> > >>
> > >> (org.apache.kafka.connect.cli.ConnectStandalone)
> > >>
> > >> java.lang.NoClassDefFoundError:
> > >>
> > >> org/apache/kafka/connect/transforms/util/RegexValidator
> > >>
> > >> at
> > >>
> > >> org.apache.kafka.connect.runtime.SinkConnectorConfig.<
> > clinit>(SinkConnectorConfig.java:46)
> > >>
> > >> at
> > >>
> > >>
> > >> org.apache.kafka.connect.runtime.AbstractHerder.
> > validateConnectorConfig(AbstractHerder.java:263)
> > >>
> > >> at
> > >>
> > >> org.apache.kafka.connect.runtime.standalone.StandaloneHerder.
> > putConnectorConfig(StandaloneHerder.java:164)
> > >>
> > >> at
> > >>
> > >> org.apache.kafka.connect.cli.ConnectStandalone.main(
> > ConnectStandalone.java:107)
> > >>
> > >> Caused by: java.lang.ClassNotFoundException:
> > >>
> > >> org.apache.kafka.connect.transforms.util.RegexValidator
> > >>
> > >> at
> > >>
> > >> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(
> > BuiltinClassLoader.java:582)
> > >>
> > >> at
> > >>
> > >> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.
> > loadClass(ClassLoaders.java:185)
> > >>
> > >> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:496)
> > >>
> > >> ... 4 more
> > >>
> > >> Thanks for running the release.
> > >>
> > >> --Vahid
> > >>
> > >> From: Damian Guy damian@gmail.com
> > >>
> > >> To: d...@kafka.apache.org, users@kafka.apache.org,
> > >>
> > >> kafka-clie...@googlegroups.com
> > >>
> > >> Date: 02/24/2018 08:16 AM
> > >>
> > >> Subject: \[VOTE\] 1.1.0 RC0
> > >>
> > >> Hello Kafka users, developers and client-developers,
> > >>
> > >> This is the first candidate for release of Apache Kafka 1.1.0.
> > >>
> > >> This is minor version release of Apache Kakfa. It Includes 29 new
> KIPs.
> > >>
> > >> Please see the release plan for more details:
> > >>
> > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.
>

[VOTE] 1.1.0 RC0

2018-02-24 Thread Damian Guy
Hello Kafka users, developers and client-developers,

This is the first candidate for release of Apache Kafka 1.1.0.

This is a minor version release of Apache Kafka. It includes 29 new KIPs.
Please see the release plan for more details:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71764913

A few highlights:

* Significant Controller improvements (much faster and session expiration
edge cases fixed)
* Data balancing across log directories (JBOD)
* More efficient replication when the number of partitions is large
* Dynamic Broker Configs
* Delegation tokens (KIP-48)
* Kafka Streams API improvements (KIP-205 / 210 / 220 / 224 / 239)

Release notes for the 1.1.0 release:
http://home.apache.org/~damianguy/kafka-1.1.0-rc0/RELEASE_NOTES.html

*** Please download, test and vote by Wednesday, February 28th, 5pm PT

Kafka's KEYS file containing PGP keys we use to sign the release:
http://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
http://home.apache.org/~damianguy/kafka-1.1.0-rc0/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/

* Javadoc:
http://home.apache.org/~damianguy/kafka-1.1.0-rc0/javadoc/

* Tag to be voted upon (off 1.1 branch) is the 1.1.0 tag:
https://github.com/apache/kafka/tree/1.1.0-rc0


* Documentation:
http://kafka.apache.org/11/documentation.html

* Protocol:
http://kafka.apache.org/11/protocol.html

* Successful Jenkins builds for the 1.1 branch:
Unit/integration tests: https://builds.apache.org/job/kafka-1.1-jdk7/63/
System tests: https://jenkins.confluent.io/job/system-test-kafka/job/1.1/21/


Finally, just want to say thanks to the people that have helped me get the
System Tests green this week. In particular:

Konstantine Karantasis
Randall Hauch
Colin Mcabe


/**

Thanks,
Damian


Re: [VOTE] 1.0.1 RC1

2018-02-13 Thread Damian Guy
+1

Ran tests, verified streams quickstart works

On Tue, 13 Feb 2018 at 17:52 Damian Guy <damian@gmail.com> wrote:

> Thanks Ewen - i had the staging repo set up as profile that i forgot to
> add to my maven command. All good.
>
> On Tue, 13 Feb 2018 at 17:41 Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
>> Damian,
>>
>> Which quickstart are you referring to? The streams quickstart only
>> executes
>> pre-built stuff afaict.
>>
>> In any case, if you're building a maven streams project, did you modify it
>> to point to the staging repository at
>> https://repository.apache.org/content/groups/staging/ in addition to the
>> default repos? During rc it wouldn't fetch from maven central since it
>> hasn't been published there yet.
>>
>> If that is configured, more compete maven output would be helpful to track
>> down where it is failing to resolve the necessary archetype.
>>
>> -Ewen
>>
>> On Tue, Feb 13, 2018 at 3:03 AM, Damian Guy <damian@gmail.com> wrote:
>>
>> > Hi Ewen,
>> >
>> > I'm trying to run the streams quickstart and I'm getting:
>> > [ERROR] Failed to execute goal
>> > org.apache.maven.plugins:maven-archetype-plugin:3.0.1:generate
>> > (default-cli) on project standalone-pom: The desired archetype does not
>> > exist (org.apache.kafka:streams-quickstart-java:1.0.1)
>> >
>> > Something i'm missing?
>> >
>> > Thanks,
>> > Damian
>> >
>> > On Tue, 13 Feb 2018 at 10:16 Manikumar <manikumar.re...@gmail.com>
>> wrote:
>> >
>> > > +1 (non-binding)
>> > >
>> > > ran quick-start, unit tests on the src.
>> > >
>> > >
>> > >
>> > > On Tue, Feb 13, 2018 at 5:31 AM, Ewen Cheslack-Postava <
>> > e...@confluent.io>
>> > > wrote:
>> > >
>> > > > Thanks for the heads up, I forgot to drop the old ones, I've done
>> that
>> > > and
>> > > > rc1 artifacts should be showing up now.
>> > > >
>> > > > -Ewen
>> > > >
>> > > >
>> > > > On Mon, Feb 12, 2018 at 12:57 PM, Ted Yu <yuzhih...@gmail.com>
>> wrote:
>> > > >
>> > > > > +1
>> > > > >
>> > > > > Ran test suite which passed.
>> > > > >
>> > > > > BTW it seems the staging repo hasn't been updated yet:
>> > > > >
>> > > > > https://repository.apache.org/content/groups/staging/org/
>> > > > > apache/kafka/kafka-clients/
>> > > > >
>> > > > > On Mon, Feb 12, 2018 at 10:16 AM, Ewen Cheslack-Postava <
>> > > > e...@confluent.io
>> > > > > >
>> > > > > wrote:
>> > > > >
>> > > > > > And of course I'm +1 since I've already done normal release
>> > > validation
>> > > > > > before posting this.
>> > > > > >
>> > > > > > -Ewen
>> > > > > >
>> > > > > > On Mon, Feb 12, 2018 at 10:15 AM, Ewen Cheslack-Postava <
>> > > > > e...@confluent.io
>> > > > > > >
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hello Kafka users, developers and client-developers,
>> > > > > > >
>> > > > > > > This is the second candidate for release of Apache Kafka
>> 1.0.1.
>> > > > > > >
>> > > > > > > This is a bugfix release for the 1.0 branch that was first
>> > released
>> > > > > with
>> > > > > > > 1.0.0 about 3 months ago. We've fixed 49 significant issues
>> since
>> > > > that
>> > > > > > > release. Most of these are non-critical, but in aggregate
>> these
>> > > fixes
>> > > > > > will
>> > > > > > > have significant impact. A few of the more significant fixes
>> > > include:
>> > > > > > >
>> > > > > > > * KAFKA-6277: Make loadClass thread-safe for class loaders of
>> > > Connect
>> > > > > > > plugins
>> > > > > > > * KAFKA-6185: Selector memory leak with high likelihood of
>> OOM in
>> > > > case
> > > > >

Re: [VOTE] 1.0.1 RC1

2018-02-13 Thread Damian Guy
Thanks Ewen - I had the staging repo set up as a profile that I forgot to add
to my maven command. All good.

On Tue, 13 Feb 2018 at 17:41 Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> Damian,
>
> Which quickstart are you referring to? The streams quickstart only executes
> pre-built stuff afaict.
>
> In any case, if you're building a maven streams project, did you modify it
> to point to the staging repository at
> https://repository.apache.org/content/groups/staging/ in addition to the
> default repos? During rc it wouldn't fetch from maven central since it
> hasn't been published there yet.
>
> If that is configured, more compete maven output would be helpful to track
> down where it is failing to resolve the necessary archetype.
>
> -Ewen
>
> On Tue, Feb 13, 2018 at 3:03 AM, Damian Guy <damian@gmail.com> wrote:
>
> > Hi Ewen,
> >
> > I'm trying to run the streams quickstart and I'm getting:
> > [ERROR] Failed to execute goal
> > org.apache.maven.plugins:maven-archetype-plugin:3.0.1:generate
> > (default-cli) on project standalone-pom: The desired archetype does not
> > exist (org.apache.kafka:streams-quickstart-java:1.0.1)
> >
> > Something i'm missing?
> >
> > Thanks,
> > Damian
> >
> > On Tue, 13 Feb 2018 at 10:16 Manikumar <manikumar.re...@gmail.com>
> wrote:
> >
> > > +1 (non-binding)
> > >
> > > ran quick-start, unit tests on the src.
> > >
> > >
> > >
> > > On Tue, Feb 13, 2018 at 5:31 AM, Ewen Cheslack-Postava <
> > e...@confluent.io>
> > > wrote:
> > >
> > > > Thanks for the heads up, I forgot to drop the old ones, I've done
> that
> > > and
> > > > rc1 artifacts should be showing up now.
> > > >
> > > > -Ewen
> > > >
> > > >
> > > > On Mon, Feb 12, 2018 at 12:57 PM, Ted Yu <yuzhih...@gmail.com>
> wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Ran test suite which passed.
> > > > >
> > > > > BTW it seems the staging repo hasn't been updated yet:
> > > > >
> > > > > https://repository.apache.org/content/groups/staging/org/
> > > > > apache/kafka/kafka-clients/
> > > > >
> > > > > On Mon, Feb 12, 2018 at 10:16 AM, Ewen Cheslack-Postava <
> > > > e...@confluent.io
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > And of course I'm +1 since I've already done normal release
> > > validation
> > > > > > before posting this.
> > > > > >
> > > > > > -Ewen
> > > > > >
> > > > > > On Mon, Feb 12, 2018 at 10:15 AM, Ewen Cheslack-Postava <
> > > > > e...@confluent.io
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hello Kafka users, developers and client-developers,
> > > > > > >
> > > > > > > This is the second candidate for release of Apache Kafka 1.0.1.
> > > > > > >
> > > > > > > This is a bugfix release for the 1.0 branch that was first
> > released
> > > > > with
> > > > > > > 1.0.0 about 3 months ago. We've fixed 49 significant issues
> since
> > > > that
> > > > > > > release. Most of these are non-critical, but in aggregate these
> > > fixes
> > > > > > will
> > > > > > > have significant impact. A few of the more significant fixes
> > > include:
> > > > > > >
> > > > > > > * KAFKA-6277: Make loadClass thread-safe for class loaders of
> > > Connect
> > > > > > > plugins
> > > > > > > * KAFKA-6185: Selector memory leak with high likelihood of OOM
> in
> > > > case
> > > > > of
> > > > > > > down conversion
> > > > > > > * KAFKA-6269: KTable state restore fails after rebalance
> > > > > > > * KAFKA-6190: GlobalKTable never finishes restoring when
> > consuming
> > > > > > > transactional messages
> > > > > > > * KAFKA-6529: Stop file descriptor leak when client disconnects
> > > with
> > > > > > > staged receives
> > > > > > >
> > > > > > > Release notes for the 1.0.1 release:
> > > > > > > http://home.apache.org/~ewencp/kafka-1.0.1-rc1/
> > RELEASE_NOTES.html
> > > > > > >
> > > > > > > *** Please download, test and vote by Thursday, Feb 15, 5pm PT
> > ***
> > > > > > >
> > > > > > > Kafka's KEYS file containing PGP keys we use to sign the
> release:
> > > > > > > http://kafka.apache.org/KEYS
> > > > > > >
> > > > > > > * Release artifacts to be voted upon (source and binary):
> > > > > > > http://home.apache.org/~ewencp/kafka-1.0.1-rc1/
> > > > > > >
> > > > > > > * Maven artifacts to be voted upon:
> > > > > > > https://repository.apache.org/content/groups/staging/
> > > > > > >
> > > > > > > * Javadoc:
> > > > > > > http://home.apache.org/~ewencp/kafka-1.0.1-rc1/javadoc/
> > > > > > >
> > > > > > > * Tag to be voted upon (off 1.0 branch) is the 1.0.1 tag:
> > > > > > > https://github.com/apache/kafka/tree/1.0.1-rc1
> > > > > > >
> > > > > > > * Documentation:
> > > > > > > http://kafka.apache.org/10/documentation.html
> > > > > > >
> > > > > > > * Protocol:
> > > > > > > http://kafka.apache.org/10/protocol.html
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Ewen Cheslack-Postava
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] 1.0.1 RC1

2018-02-13 Thread Damian Guy
Hi Ewen,

I'm trying to run the streams quickstart and I'm getting:
[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-archetype-plugin:3.0.1:generate
(default-cli) on project standalone-pom: The desired archetype does not
exist (org.apache.kafka:streams-quickstart-java:1.0.1)

Something I'm missing?

Thanks,
Damian

On Tue, 13 Feb 2018 at 10:16 Manikumar  wrote:

> +1 (non-binding)
>
> ran quick-start, unit tests on the src.
>
>
>
> On Tue, Feb 13, 2018 at 5:31 AM, Ewen Cheslack-Postava 
> wrote:
>
> > Thanks for the heads up, I forgot to drop the old ones, I've done that
> and
> > rc1 artifacts should be showing up now.
> >
> > -Ewen
> >
> >
> > On Mon, Feb 12, 2018 at 12:57 PM, Ted Yu  wrote:
> >
> > > +1
> > >
> > > Ran test suite which passed.
> > >
> > > BTW it seems the staging repo hasn't been updated yet:
> > >
> > > https://repository.apache.org/content/groups/staging/org/
> > > apache/kafka/kafka-clients/
> > >
> > > On Mon, Feb 12, 2018 at 10:16 AM, Ewen Cheslack-Postava <
> > e...@confluent.io
> > > >
> > > wrote:
> > >
> > > > And of course I'm +1 since I've already done normal release
> validation
> > > > before posting this.
> > > >
> > > > -Ewen
> > > >
> > > > On Mon, Feb 12, 2018 at 10:15 AM, Ewen Cheslack-Postava <
> > > e...@confluent.io
> > > > >
> > > > wrote:
> > > >
> > > > > Hello Kafka users, developers and client-developers,
> > > > >
> > > > > This is the second candidate for release of Apache Kafka 1.0.1.
> > > > >
> > > > > This is a bugfix release for the 1.0 branch that was first released
> > > with
> > > > > 1.0.0 about 3 months ago. We've fixed 49 significant issues since
> > that
> > > > > release. Most of these are non-critical, but in aggregate these
> fixes
> > > > will
> > > > > have significant impact. A few of the more significant fixes
> include:
> > > > >
> > > > > * KAFKA-6277: Make loadClass thread-safe for class loaders of
> Connect
> > > > > plugins
> > > > > * KAFKA-6185: Selector memory leak with high likelihood of OOM in
> > case
> > > of
> > > > > down conversion
> > > > > * KAFKA-6269: KTable state restore fails after rebalance
> > > > > * KAFKA-6190: GlobalKTable never finishes restoring when consuming
> > > > > transactional messages
> > > > > * KAFKA-6529: Stop file descriptor leak when client disconnects
> with
> > > > > staged receives
> > > > >
> > > > > Release notes for the 1.0.1 release:
> > > > > http://home.apache.org/~ewencp/kafka-1.0.1-rc1/RELEASE_NOTES.html
> > > > >
> > > > > *** Please download, test and vote by Thursday, Feb 15, 5pm PT ***
> > > > >
> > > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > > http://kafka.apache.org/KEYS
> > > > >
> > > > > * Release artifacts to be voted upon (source and binary):
> > > > > http://home.apache.org/~ewencp/kafka-1.0.1-rc1/
> > > > >
> > > > > * Maven artifacts to be voted upon:
> > > > > https://repository.apache.org/content/groups/staging/
> > > > >
> > > > > * Javadoc:
> > > > > http://home.apache.org/~ewencp/kafka-1.0.1-rc1/javadoc/
> > > > >
> > > > > * Tag to be voted upon (off 1.0 branch) is the 1.0.1 tag:
> > > > > https://github.com/apache/kafka/tree/1.0.1-rc1
> > > > >
> > > > > * Documentation:
> > > > > http://kafka.apache.org/10/documentation.html
> > > > >
> > > > > * Protocol:
> > > > > http://kafka.apache.org/10/protocol.html
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Ewen Cheslack-Postava
> > > > >
> > > >
> > >
> >
>


Re: question on serialization ..

2018-02-13 Thread Damian Guy
There is an overload `leftJoin(KTable, ValueJoiner, Joined)`.

`Joined` is where you specify the key Serde and the Serde for the KTable's
value. We don't need a Serde for the stream's value at this point, as it has
already been deserialized.
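
For example, something like this (just a sketch against the 1.0.0 API, with
made-up topic names and value types):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Consumed;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.*;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, Long> clicks =
    builder.stream("user-clicks", Consumed.with(Serdes.String(), Serdes.Long()));
KTable<String, String> regions =
    builder.table("user-regions", Consumed.with(Serdes.String(), Serdes.String()));

KStream<String, String> enriched = clicks.leftJoin(
    regions,
    (clickCount, region) -> region + "=" + clickCount,   // ValueJoiner
    Joined.with(
        Serdes.String(),    // key serde
        Serdes.Long(),      // stream value serde (may be null, already deserialized)
        Serdes.String()));  // KTable value serde

Any Serde for the join result is only needed by whatever downstream operation
serializes it (e.g. `to()` or `groupByKey()`).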

HTH,
Damian

On Tue, 13 Feb 2018 at 05:39 Debasish Ghosh 
wrote:

> Regarding “has an according overload” I agree. But some operators like
> reduce and leftJoin use the serdes implicitly and from the config. So if
> the developer is not careful enough to have the default serdes correct then
> it results in runtime error.
>
> Also one more confusion on my part is that in config we can give one serde
> for key and value. What happens if I have 2 leftJoin in my transformation
> that needs different serdes from config. There is no overload for leftJoin
> that allows me to provide a serde. Or am I missing something ?
>
> regards.
>
> On Tue, 13 Feb 2018 at 12:14 AM, Matthias J. Sax 
> wrote:
>
> > Each operator that needs to use a Serde, has a an according overload
> > method that allows you to overwrite the Serde. If you don't overwrite
> > it, the operator uses the Serde from the config.
> >
> > > If one gets the default
> > >> serializer wrong then she gets run time errors in serialization /
> > >> de-serialization (ClassCastException etc.)
> >
> > Default Serde are helpful if you use a generic format like Avro
> > thoughout the whole topology. If you have many different types, it might
> > be better to set default Serdes to `null` and set the Serde for each
> > operator individually.
> >
> >
> > -Matthias
> >
> > On 2/12/18 2:16 AM, Debasish Ghosh wrote:
> > > Thanks a lot for the clear answer.
> > >
> > > One of the concerns that I have is that it's not always obvious when
> the
> > > default serializers are used. e.g. it looks like KGroupedStream#reduce
> > also
> > > uses the default serializer under the hood. If one gets the default
> > > serializer wrong then she gets run time errors in serialization /
> > > de-serialization (ClassCastException etc.), which are quite hard to
> track
> > > down.
> > >
> > > On Mon, Feb 12, 2018 at 4:52 AM, Matthias J. Sax <
> matth...@confluent.io>
> > > wrote:
> > >
> > >> For stream-table-join, only the table is (de)serialized, the
> stream-side
> > >> in only piped through and does lookups into the table.
> > >>
> > >> And when reading the stream
> > >> (https://github.com/confluentinc/kafka-streams-
> > >> examples/blob/3.3.x/src/test/scala/io/confluent/examples/streams/
> > >> StreamToTableJoinScalaIntegrationTest.scala#L129)
> > >> the Serdes from the config are overwritten by parameters passed into
> > >> `#stream()`
> > >>
> > >> The default Serdes are used when reading/writing from/to a topic/store
> > >> (including repartition or changelog) and if the operator does not
> > >> overwrite the default Serdes via passed-in parameters.
> > >>
> > >>
> > >> -Matthias
> > >>
> > >> On 2/10/18 10:34 PM, Debasish Ghosh wrote:
> > >>> The inputs to the leftJoin are the stream with [String, Long] and the
> > >> table
> > >>> with [String, String]. Is the default serializer (I mean from the
> > config)
> > >>> used for [String, String] ? Then how does the [String, Long]
> > >> serialization
> > >>> work ?
> > >>>
> > >>> I guess the basic issue that I am trying to understand is how the
> > default
> > >>> serialisers (stringSerde, stringSerde) registered in config used for
> > >>> serialising the inputs of leftJoin ..
> > >>>
> > >>> regards.
> > >>>
> > >>> On Sun, 11 Feb 2018 at 8:53 AM, Matthias J. Sax <
> matth...@confluent.io
> > >
> > >>> wrote:
> > >>>
> >  userClicksJoinRegion is never serialized...
> > 
> >  It the result of the join and the join only (de)serializes its input
> > in
> >  the internal stores.
> > 
> >  The output it forwarded in-memory to a consecutive map and return
> >  `clicksByRegion` that is [String,Long].
> > 
> > 
> >  -Matthias
> > 
> >  On 2/10/18 1:17 PM, Ted Yu wrote:
> > > Please read the javadoc:
> > >
> >  https://github.com/apache/kafka/blob/trunk/streams/src/
> > >> main/java/org/apache/kafka/streams/Consumed.java
> > >
> > > and correlate with the sample code.
> > >
> > > Thanks
> > >
> > > On Sat, Feb 10, 2018 at 1:10 PM, Debasish Ghosh <
> >  ghosh.debas...@gmail.com>
> > > wrote:
> > >
> > >> Looking at
> > >> https://github.com/confluentinc/kafka-streams-
> > >> examples/blob/3.3.x/src/test/scala/io/confluent/examples/streams/
> > >> StreamToTableJoinScalaIntegrationTest.scala#L148,
> > >> it seems that the leftJoin generates a KStream[String, (String,
> > >> Long)],
> > >> which means the value is a tuple of (String, Long) .. I am not
> able
> > to
> >  get
> > >> how this will serialize/de-serialize with the default serializers
> > >> which
> >  are
> > >> both stringSerde for keys and values.
> > >>
> > 

Re: Kafka Stream tuning.

2018-02-13 Thread Damian Guy
Hi Brilly,

My initial guess is that it is the overhead of committing. Commit is
synchronous and you have the commit interval set to 50ms. Perhaps try
increasing it.
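
For example (just a sketch - 30000 ms is the default commit interval):

config.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30000); // commit every 30s instead of every 50ms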

Thanks,
Damian

On Tue, 13 Feb 2018 at 07:49 TSANG, Brilly 
wrote:

> Hi kafka users,
>
> I created a filtering stream with the Processor API;  input topic that
> have input rate at ~5 records per millisecond.  The filtering function on
> average takes 0.05milliseconds to complete which in ideal case would
> translate to (1/0.05)  20 records per millisecond.  However, when I
> benchmark the whole process, the streams is only processing 0.05 record per
> milliseconds.
>
> Anyone have any idea on how to tune the steaming system to be faster as
> 0.05 record is very far away from the theoretical max of 20?  The results
> above are per partition based where I have 16 partition for the input topic
> and all partitions have similar throughput.
>
> I've only set the streams to have the following config:
> Properties config = new Properties();
> config.put(StreamsConfig.APPLICATION_ID_CONFIG, appId);
> config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
> config.put(StreamsConfig.STATE_DIR_CONFIG, stateDir);
> config.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 50);
>
> I'm not defining TimeExtractor so the default one is used.
>
> Thanks for any help in advance.
>
> Regards,
> Brilly
>
> 
>
>
> 
> DISCLAIMER:
> This email and any attachment(s) are intended solely for the person(s)
> named above, and are or may contain information of a proprietary or
> confidential nature. If you are not the intended recipient(s), you should
> delete this message immediately. Any use, disclosure or distribution of
> this message without our prior consent is strictly prohibited.
> This message may be subject to errors, incomplete or late delivery,
> interruption, interception, modification, or may contain viruses. Neither
> Daiwa Capital Markets Hong Kong Limited, its subsidiaries, affiliates nor
> their officers or employees represent or warrant the accuracy or
> completeness, nor accept any responsibility or liability whatsoever for any
> use of or reliance upon, this email or any of the contents hereof. The
> contents of this message are for information purposes only, and subject to
> change without notice.
> This message is not and is not intended to be an offer or solicitation to
> buy or sell any securities or financial products, nor does any
> recommendation, opinion or advice necessarily reflect those of Daiwa
> Capital Markets Hong Kong Limited, its subsidiaries or affiliates.
>
> 
>


Re: One type of event per topic?

2018-01-18 Thread Damian Guy
This might be a good read for you:
https://www.confluent.io/blog/put-several-event-types-kafka-topic/

On Thu, 18 Jan 2018 at 20:57 Maria Pilar  wrote:

> Hi everyone,
>
> I´m working in the configuration of the topics for the integration between
> one API and Data platform system. We have created topic for each entity
> that they would need to integrate in to the datawarehouse.
>
>
> My question and I hope you can help me is, each entity will have diferent
> type of events, for example to create and entity or update entity, and I´m
> not sure if create one event per topic? or perhaps shared differents events
> per topic, but I think that this option will have more complexity
>
> Thanks a lot
>


Re: [VOTE] KIP-247: Add public test utils for Kafka Streams

2018-01-18 Thread Damian Guy
+1

On Thu, 18 Jan 2018 at 15:14 Bill Bejeck  wrote:

> Thanks for the KIP.
>
> +1
>
> -Bill
>
> On Wed, Jan 17, 2018 at 9:09 PM, Matthias J. Sax 
> wrote:
>
> > Hi,
> >
> > I would like to start the vote for KIP-247:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 247%3A+Add+public+test+utils+for+Kafka+Streams
> >
> >
> > -Matthias
> >
> >
>


Re: [ANNOUNCE] New Kafka PMC Member: Rajini Sivaram

2018-01-18 Thread Damian Guy
Congratulations Rajini!

On Thu, 18 Jan 2018 at 00:57 Hu Xi  wrote:

> Congratulations, Rajini Sivaram.  Very well deserved!
>
>
> 
> From: Konstantine Karantasis 
> Sent: 18 January 2018 6:23
> To: d...@kafka.apache.org
> Cc: users@kafka.apache.org
> Subject: Re: [ANNOUNCE] New Kafka PMC Member: Rajini Sivaram
>
> Congrats Rajini!
>
> -Konstantine
>
> On Wed, Jan 17, 2018 at 2:18 PM, Becket Qin  wrote:
>
> > Congratulations, Rajini!
> >
> > On Wed, Jan 17, 2018 at 1:52 PM, Ismael Juma  wrote:
> >
> > > Congratulations Rajini!
> > >
> > > On 17 Jan 2018 10:49 am, "Gwen Shapira"  wrote:
> > >
> > > Dear Kafka Developers, Users and Fans,
> > >
> > > Rajini Sivaram became a committer in April 2017.  Since then, she
> > remained
> > > active in the community and contributed major patches, reviews and KIP
> > > discussions. I am glad to announce that Rajini is now a member of the
> > > Apache Kafka PMC.
> > >
> > > Congratulations, Rajini and looking forward to your future
> contributions.
> > >
> > > Gwen, on behalf of Apache Kafka PMC
> > >
> >
>


Re: [ANNOUNCE] New committer: Matthias J. Sax

2018-01-12 Thread Damian Guy
Can't think of anyone more deserving! Congratulations Matthias!
On Sat, 13 Jan 2018 at 00:17, Ismael Juma  wrote:

> Congratulations Matthias!
>
> On 12 Jan 2018 10:59 pm, "Guozhang Wang"  wrote:
>
> > Hello everyone,
> >
> > The PMC of Apache Kafka is pleased to announce Matthias J. Sax as our
> > newest Kafka committer.
> >
> > Matthias has made tremendous contributions to Kafka Streams API since
> early
> > 2016. His footprint has been all over the places in Streams: in the past
> > two years he has been the main driver on improving the join semantics
> > inside Streams DSL, summarizing all their shortcomings and bridging the
> > gaps; he has also been largely working on the exactly-once semantics of
> > Streams by leveraging on the transaction messaging feature in 0.11.0. In
> > addition, Matthias have been very active in community activity that goes
> > beyond mailing list: he's getting the close to 1000 up votes and 100
> > helpful flags on SO for answering almost all questions about Kafka
> Streams.
> >
> > Thank you for your contribution and welcome to Apache Kafka, Matthias!
> >
> >
> >
> > Guozhang, on behalf of the Apache Kafka PMC
> >
>


Re: Broker won't exit...

2018-01-10 Thread Damian Guy
Did you stop the broker before stopping ZooKeeper?

On Wed, 10 Jan 2018 at 10:38 Ted Yu  wrote:

> I think that is the default signal.
> From the script:
>
> SIGNAL=${SIGNAL:-TERM}
>
> FYI
>
> On Wed, Jan 10, 2018 at 2:35 AM, Sam Pegler <
> sam.peg...@infectiousmedia.com>
> wrote:
>
> > Have you tried a normal kill (sigterm) against the java process?
> >
> > __
> >
> > Sam Pegler
> >
> > PRODUCTION ENGINEER
> >
> > T. +44(0) 07 562 867 486
> >
> > 
> > 3-7 Herbal Hill / London / EC1R 5EJ
> > www.infectiousmedia.com
> >
> > This email and any attachments are confidential and may also be
> privileged.
> > If you
> > are not the intended recipient, please notify the sender immediately, and
> > do not
> > disclose the contents to another person, use it for any purpose, or
> store,
> > or copy
> > the information in any medium. Please also destroy and delete the message
> > from
> > your computer.
> >
> >
> > On 9 January 2018 at 22:44, Skip Montanaro 
> > wrote:
> >
> > > I only discovered the kafka-server-stop.sh script a couple days ago. I
> > > can't seem to make it do its thing (the corresponding zookeeper stop
> > > script seems to work just fine). All consumers have been stopped. Lsof
> > > still shows the Kafka broker process listening on its port. The last
> > > connection left the CLOSE_WAIT state several minutes ago. Gstack shows
> > > 169 threads, most in pthread_cond_wait(), a handful in other wait-like
> > > functions (sem_wait, pthread_join, pthread_cond_timedwait, poll,
> > > epoll_wait). I'm running 2.11-1.0.0 on a Red Hat 6 server.
> > >
> > > What does it take to get a broker to exit (short of kill -9)?
> > >
> > > Thx,
> > >
> > > Skip Montanaro
> > >
> >
>


Re: Kafka Streams | Impact on rocksdb stores by Rebalancing

2018-01-09 Thread Damian Guy
Hi,

Yes, partition assignment is aware of the standby replicas. It will try to
assign tasks to the nodes that already have the state for a task, but it will
also try to keep the assignment balanced. So the assignment will be more like
your second assignment. If you are interested you can have a look at:
https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignorTest.java


On Tue, 9 Jan 2018 at 11:44 Sameer Kumar <sam.kum.w...@gmail.com> wrote:

> Hi Damian,
>
> Thanks for your reply. I have some further ques.
>
> Would the partition assignment be aware of the standby replicas. What would
> be the preference for task distribution: load balancing or stand by
> replicas.
>
> For e.g
>
> N1
> assigned partitions: 1,2
> standby partitions: 5,6
>
> N2
> assigned partitions: 3,4
> standby partitions: 1,2
>
> N3
> assigned partitions: 5,6
> standby partitions: 3,4
>
> After N1 goes down, what would be the state of the cluster
>
> N2
> assigned partitions: 3,4,1,2
> standby partitions: 5,6
>
> N3
> assigned partitions: 5,6
> standby partitions: 3,4,1,2
>
> Or
>
> N2
> assigned partitions: 3,4,1
> standby partitions: 2,5,6
>
> N3
> assigned partitions: 5,6,2
> standby partitions: 1,3,4
>
> -Sameer.
>
> On Tue, Jan 9, 2018 at 2:27 PM, Damian Guy <damian@gmail.com> wrote:
>
> > On Tue, 9 Jan 2018 at 07:42 Sameer Kumar <sam.kum.w...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I would like to understand how does rebalance affect state stores
> > > migration. If I have a cluster of 3 nodes, and 1 goes down, the
> > partitions
> > > for node3 gets assigned to node1 and node2, does the rocksdb on
> > node1/node2
> > > also starts updating its store from changelog topic.
> > >
> > >
> > Yes the stores will be migrated to node1 and node2 and they will be
> > restored from the changelog topic
> >
> >
> > > If yes, then what impact would this migration process have on querying.
> > >
> >
> > You can't query the stores until they have all been restored and the
> > rebalance ends.
> >
> > >
> > > Also, if the state store restoration process takes time, how to make
> sure
> > > another rebalance doesn''t happen.
> > >
> > >
> > If you don't lose any more nodes then another rebalance won't happen. If
> > node1 comes back online, then there will be another rebalance, however
> the
> > time taken shouldn't be as long as it will already have most of the state
> > locally, so it only needs to catch up with the remainder of the
> changelog.
> > Additionally, you should run with standby tasks. They are updated in the
> > background and will mean that in the event of failure the other nodes
> > should already have most of the state locally, so the restoration process
> > won't take so long
> >
> >
> > > -Sameer.
> > >
> >
>


Re: Less poll interval on StoreChangelogReader

2018-01-09 Thread Damian Guy
State Store restoration is done on the same thread as processing. It is
actually interleaved with processing, so we keep the poll time small so
that if there is no data immediately available we can continue to process
data from other running tasks.

On Tue, 9 Jan 2018 at 08:03 Sameer Kumar  wrote:

> In StoreChangelogReader.restore, we have a very short poll interval of 10
> ms. Any specfic reasons for the same.
>
> -Sameer.
>


Re: Kafka Streams | Impact on rocksdb stores by Rebalancing

2018-01-09 Thread Damian Guy
On Tue, 9 Jan 2018 at 07:42 Sameer Kumar  wrote:

> Hi,
>
> I would like to understand how does rebalance affect state stores
> migration. If I have a cluster of 3 nodes, and 1 goes down, the partitions
> for node3 gets assigned to node1 and node2, does the rocksdb on node1/node2
> also starts updating its store from changelog topic.
>
>
Yes the stores will be migrated to node1 and node2 and they will be
restored from the changelog topic


> If yes, then what impact would this migration process have on querying.
>

You can't query the stores until they have all been restored and the
rebalance ends.

>
> Also, if the state store restoration process takes time, how to make sure
> another rebalance doesn''t happen.
>
>
If you don't lose any more nodes then another rebalance won't happen. If
node1 comes back online, then there will be another rebalance, however the
time taken shouldn't be as long as it will already have most of the state
locally, so it only needs to catch up with the remainder of the changelog.
Additionally, you should run with standby tasks. They are updated in the
background and will mean that in the event of failure the other nodes
should already have most of the state locally, so the restoration process
won't take so long
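
Standby tasks are off by default; you enable them with num.standby.replicas,
e.g. (just a sketch):

props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1); // keep a copy of each task's state on another instance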


> -Sameer.
>


Re: Topic segments being deleted unexpectedly

2017-12-15 Thread Damian Guy
I believe that just controls when the segment gets deleted from disk. It is
removed from memory before that. So I don't believe that will help.

On Fri, 15 Dec 2017 at 13:54 Wim Van Leuven <wim.vanleu...@highestpoint.biz>
wrote:

> So, in our setup, to provide the historic data on the platform, we would
> have to define all topics with a retention period of the business time we
> want to keep the data. However, on the intermediate topics, we would only
> require the data to be there as long as necessary to be able to process the
> data.
>
> Could we achieve this result by increasing the log.segment.delete.delay.ms
> to e.g. 1d? Would this give us a timeframe of a day to process the data on
> the intermediary topics? Or is this just wishful thinking?
>
> Thanks again!
> -wim
>
> On Fri, 15 Dec 2017 at 14:23 Wim Van Leuven <
> wim.vanleu...@highestpoint.biz>
> wrote:
>
> > Is it really? I checked some records on kafka topics using commandline
> > consumers to print key and timestamps and timestamps was logged as
> > CreateTime:1513332523181
> >
> > But that would explain the issue. I'll adjust the retention on the topic
> > and rerun.
> >
> > Thank you already for the insights!
> > -wim
> >
> > On Fri, 15 Dec 2017 at 14:08 Damian Guy <damian@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> It is likely due to the timestamps you are extracting and using as the
> >> record timestamp. Kafka uses the record timestamps for retention. I
> >> suspect
> >> this is causing your segments to roll and be deleted.
> >>
> >> Thanks,
> >> Damian
> >>
> >> On Fri, 15 Dec 2017 at 11:49 Wim Van Leuven <
> >> wim.vanleu...@highestpoint.biz>
> >> wrote:
> >>
> >> > Hello all,
> >> >
> >> > We are running some Kafka Streams processing apps over Confluent OS
> >> > (v3.2.0) and I'm seeing unexpected but 'consitent' behaviour regarding
> >> > segment and index deletion.
> >> >
> >> > So, we have a topic 'input' that contains about 30M records to
> ingest. A
> >> > 1st processor transforms and pipes the data onto a second,
> intermediate
> >> > topic. A 2nd processor picks up the records, treats them and sends
> them
> >> > out.
> >> >
> >> > On our test environment the intermediate topic was set up with a
> >> retention
> >> > of 1 hour because we don't need to keep the data, only while
> processing.
> >> >
> >> > On a test run we saw the 2nd processor exit with exceptions that it
> >> > couldn't read offsets. We do not automatically reset because it should
> >> not
> >> > happen.
> >> >
> >> > org.apache.kafka.streams.errors.StreamsException: No valid committed
> >> offset
> >> > found for input topic cdr-raw-arch (partition 1) and no valid reset
> >> policy
> >> > configured. You need to set configuration parameter
> "auto.offset.reset"
> >> or
> >> > specify a topic specific reset policy via
> >> > KStreamBuilder#stream(StreamsConfig.AutoOffsetReset offsetReset, ...)
> or
> >> > KStreamBuilder#table(StreamsConfig.AutoOffsetReset offsetReset, ...)
> >> >
> >> > As we thought that it's the topic data expiring (processing takes
> longer
> >> > than 1 hour) we changed the topic to retain the data for 1 day.
> >> >
> >> > On rerun, we however saw exactly the same behaviour. That's why I'm
> >> saying
> >> > 'consistent behaviour' above.
> >> >
> >> > In the server logs, we see that kafka is rolling segments but
> >> immediately
> >> > scheduling them for deletion.
> >> >
> >> > [2017-12-15 11:01:46,992] INFO Rolled new log segment for
> >> > 'cdr-raw-arch-1' in 1 ms. (kafka.log.Log)
> >> > [2017-12-15 11:01:46,993] INFO Scheduling log segment 7330185 for log
> >> > cdr-raw-arch-1 for deletion. (kafka.log.Log)
> >> > [2017-12-15 11:01:46,995] INFO Rolled new log segment for
> >> > 'cdr-raw-arch-0' in 2 ms. (kafka.log.Log)
> >> > [2017-12-15 11:01:46,995] INFO Scheduling log segment 7335872 for log
> >> > cdr-raw-arch-0 for deletion. (kafka.log.Log)
> >> > [2017-12-15 11:02:46,995] INFO Deleting segment 7330185 from log
> >> > cdr-raw-arch-1. (kafka.log.Log)
> >> > [2017-12-15 11:02:46,996] INFO Deleting segment 7335872 from log
> >>

Re: Topic segments being deleted unexpectedly

2017-12-15 Thread Damian Guy
Hi,

It is likely due to the timestamps you are extracting and using as the
record timestamp. Kafka uses the record timestamps for retention. I suspect
this is causing your segments to roll and be deleted.
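
If you want to check what timestamp Kafka has actually stored for each record
(and therefore what retention is judged against), the console consumer can
print it, e.g. (illustration only, substitute your broker address):

kafka-console-consumer --bootstrap-server <broker> --topic cdr-raw-arch \
  --from-beginning --property print.timestamp=true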

Thanks,
Damian

On Fri, 15 Dec 2017 at 11:49 Wim Van Leuven 
wrote:

> Hello all,
>
> We are running some Kafka Streams processing apps over Confluent OS
> (v3.2.0) and I'm seeing unexpected but 'consitent' behaviour regarding
> segment and index deletion.
>
> So, we have a topic 'input' that contains about 30M records to ingest. A
> 1st processor transforms and pipes the data onto a second, intermediate
> topic. A 2nd processor picks up the records, treats them and sends them
> out.
>
> On our test environment the intermediate topic was set up with a retention
> of 1 hour because we don't need to keep the data, only while processing.
>
> On a test run we saw the 2nd processor exit with exceptions that it
> couldn't read offsets. We do not automatically reset because it should not
> happen.
>
> org.apache.kafka.streams.errors.StreamsException: No valid committed offset
> found for input topic cdr-raw-arch (partition 1) and no valid reset policy
> configured. You need to set configuration parameter "auto.offset.reset" or
> specify a topic specific reset policy via
> KStreamBuilder#stream(StreamsConfig.AutoOffsetReset offsetReset, ...) or
> KStreamBuilder#table(StreamsConfig.AutoOffsetReset offsetReset, ...)
>
> As we thought that it's the topic data expiring (processing takes longer
> than 1 hour) we changed the topic to retain the data for 1 day.
>
> On rerun, we however saw exactly the same behaviour. That's why I'm saying
> 'consistent behaviour' above.
>
> In the server logs, we see that kafka is rolling segments but immediately
> scheduling them for deletion.
>
> [2017-12-15 11:01:46,992] INFO Rolled new log segment for
> 'cdr-raw-arch-1' in 1 ms. (kafka.log.Log)
> [2017-12-15 11:01:46,993] INFO Scheduling log segment 7330185 for log
> cdr-raw-arch-1 for deletion. (kafka.log.Log)
> [2017-12-15 11:01:46,995] INFO Rolled new log segment for
> 'cdr-raw-arch-0' in 2 ms. (kafka.log.Log)
> [2017-12-15 11:01:46,995] INFO Scheduling log segment 7335872 for log
> cdr-raw-arch-0 for deletion. (kafka.log.Log)
> [2017-12-15 11:02:46,995] INFO Deleting segment 7330185 from log
> cdr-raw-arch-1. (kafka.log.Log)
> [2017-12-15 11:02:46,996] INFO Deleting segment 7335872 from log
> cdr-raw-arch-0. (kafka.log.Log)
> [2017-12-15 11:02:47,170] INFO Deleting index
> /data/4/kafka/cdr-raw-arch-1/07330185.index.deleted
> (kafka.log.OffsetIndex)
> [2017-12-15 11:02:47,171] INFO Deleting index
> /data/4/kafka/cdr-raw-arch-1/07330185.timeindex.deleted
> (kafka.log.TimeIndex)
> [2017-12-15 11:02:47,172] INFO Deleting index
> /data/3/kafka/cdr-raw-arch-0/07335872.index.deleted
> (kafka.log.OffsetIndex)
> [2017-12-15 11:02:47,173] INFO Deleting index
> /data/3/kafka/cdr-raw-arch-0/07335872.timeindex.deleted
> (kafka.log.TimeIndex)
>
>
> However, I do not understand the behaviour: Why is kafka deleting the data
> on the intermediary topic before it got processed? Almost immediately even?
>
> We do use timestamp extractors to pull business time from the records. Is
> that taken into account for retention time? Or is retention only based on
> times of the files on disk?
>
> Thank you to shed any light on this problem!
>
> Kind regards!
> -wim
>


Re: Joins in Kafka Streams and partitioning of the topics

2017-11-22 Thread Damian Guy
Hi Artur,

Kafka Streams 0.10.0.0 is quite old and a lot has changed and been fixed
since then. If possible I'd recommend upgrading to at least 0.11.0.2 or 1.0.
For joins you need to ensure that the topics have the same number of
partitions (which they do) and that they are keyed the same.

Thanks,
Damian

On Wed, 22 Nov 2017 at 11:02 Artur Mrozowski  wrote:

> Hi,
> I am joining 4 different topic with 4 partitions each using 0.10.0.0
> version of Kafka Streams.  The joins are KTable to KTable. Is there
> anything I should be aware of considering partitions or version of Kafka
> Streams? In other words should I be expecting consistent results or do I
> need to for example use Global tables.
>
> I'd like to run that application on Kubernetes later on. Should I think of
> anything or do different instances of the same Kafka Streams application
> take care of management of the state?
>
> Grateful for any thoughts or a piece of advice
>
> Best Regards
> /Artur
>


Re: How to set result value Serdes Class in Kafka stream join

2017-11-16 Thread Damian Guy
Hi,

You don't need to set a Serde for the join result until you perform another
operation that requires serialization, e.g. if you follow the join with a
`to()`, `groupBy()`, etc., you would pass the Serde to that operation.
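
For example (a sketch using the 0.11-style overloads from your snippet, where
the joiner returns a String and "joined-output" is a made-up topic name):

joined.to(Serdes.String(), Serdes.String(), "joined-output"); // key serde, result value serde, topic
// or, with the 1.0 API:
joined.to("joined-output", Produced.with(Serdes.String(), Serdes.String()));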

Thanks,
Damian

On Thu, 16 Nov 2017 at 10:53 sy.pan  wrote:

> Hi, all:
>
> Recently I have read kafka streams join document(
> https://docs.confluent.io/current/streams/developer-guide.html#kafka-streams-dsl
> <
> https://docs.confluent.io/current/streams/developer-guide.html#kafka-streams-dsl>).
> The sample code is pasted below:
>
> import java.util.concurrent.TimeUnit;
> KStream left = ...;
> KStream right = ...;
> // Java 7 example
> KStream joined = left.join(right,
> new ValueJoiner() {
>   @Override
>   public String apply(Long leftValue, Double rightValue) {
> return "left=" + leftValue + ", right=" + rightValue;
>   }
> },
> JoinWindows.of(TimeUnit.MINUTES.toMillis(5)),
> Serdes.String(), /* key */
> Serdes.Long(),   /* left value */
> Serdes.Double()  /* right value */
>   );
>
> so the question is :
>
> 1) which parameter is used for setting result value(returned by
> ValueJoiner) Serdes class ?
> the sample code only set key , left value and right value Serdes class.
>
> 2) if ValueJoiner return customer value type,  how to set the result value
> Serdes class ?
>
>
>
>


Re: Problem with KGroupedStream.count in 1.0.0

2017-11-15 Thread Damian Guy
Yes, right, that would be because the default serializer is set to bytes.
Sorry, I should have spotted that. Your Materialized should look something
like:

Materialized.as[String, java.lang.Long, KeyValueStore[Bytes,
Array[Byte]]](ACCESS_COUNT_PER_HOST_STORE)
   .withKeySerde(Serdes.String())

Thanks,
Damian


On Wed, 15 Nov 2017 at 10:51 Debasish Ghosh <ghosh.debas...@gmail.com>
wrote:

> It's not working fine .. I get the following exception during runtime ..
>
> Exception in thread
>> "kstream-weblog-processing-c37a3bc1-31cc-4ccc-8427-d51314802f64-StreamThread-1"
>> java.lang.ClassCastException: java.lang.String cannot be cast to [B
>> at
>> org.apache.kafka.common.serialization.ByteArraySerializer.serialize(ByteArraySerializer.java:21)
>> at org.apache.kafka.streams.state.StateSerdes.rawKey(StateSerdes.java:168)
>> at
>> org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore$1.innerKey(MeteredKeyValueBytesStore.java:60)
>> at
>> org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore$1.innerKey(MeteredKeyValueBytesStore.java:57)
>> at
>> org.apache.kafka.streams.state.internals.InnerMeteredKeyValueStore.get(InnerMeteredKeyValueStore.java:184)
>> at
>> org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore.get(MeteredKeyValueBytesStore.java:116)
>> at
>> org.apache.kafka.streams.kstream.internals.KStreamAggregate$KStreamAggregateProcessor.process(KStreamAggregate.java:70)
>> at
>> org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:46)
>> at
>> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:208)
>> at
>> org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:124)
>> at
>> org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:85)
>> at
>> org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:80)
>> at
>> org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:216)
>> at
>> org.apache.kafka.streams.processor.internals.AssignedTasks.process(AssignedTasks.java:403)
>> at
>> org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:317)
>> at
>> org.apache.kafka.streams.processor.internals.StreamThread.processAndMaybeCommit(StreamThread.java:942)
>> at
>> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:822)
>> at
>> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774)
>> at
>> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744)
>
>
> Only when I change the key of the first stream to Array[Byte], things
> work ok .. like this ..
>
> val hosts: KStream[Array[Byte], Array[Byte]] = logRecords.mapValues(record
> => record.host.getBytes("UTF-8"))
>
> regards.
>
> On Wed, Nov 15, 2017 at 4:07 PM, Damian Guy <damian@gmail.com> wrote:
>
>> Hi,
>>
>> That shouldn't be a problem, the inner most store is of type
>> `KeyValueStore<Bytes, byte[]>`, however the outer store will be
>> `KeyValueStore<String, Long>`.
>> It should work fine.
>>
>> Thanks,
>> Damian
>>
>> On Wed, 15 Nov 2017 at 08:37 Debasish Ghosh <ghosh.debas...@gmail.com>
>> wrote:
>>
>>> Hello -
>>>
>>> In my Kafka Streams 0.11 application I have the following transformation
>>> ..
>>>
>>> val hosts: KStream[Array[Byte], String] = logRecords.mapValues(record
>>> => record.host)
>>>
>>> // we are changing the key here so that we can do a groupByKey later
>>> val hostPairs: KStream[String, String] = hosts.map ((_, value) => new
>>> KeyValue(value, value))
>>>
>>> // keys have changed - hence need new serdes
>>> val groupedStream: KGroupedStream[String, String] =
>>> hostPairs.groupByKey(stringSerde,
>>> stringSerde)
>>>
>>> val counts: KTable[String, java.lang.Long] =
>>> groupedStream.count(ACCESS_COUNT_PER_HOST_STORE)
>>>
>>> Now in 1.0.0, this variant of count on KGroupedStream has been deprecated
>>> and the one that is introduced takes only KeyValueStore of Array[Byte] ..
>>>
>>> KTable<K,java.lang.Long>
>>> >
>>> count(Materialized<K,java.lang.Long,KeyValueStore<org.apache.kafka.common.utils.Bytes,byte[]>>
>>> > materialized)
>>> > Count the number of records in this str

Re: Kafka Streams CoGroup

2017-11-13 Thread Damian Guy
Hi,

This KIP didn't make it into 1.0, so it can't be done at the moment.

Thanks,
Damian

On Mon, 13 Nov 2017 at 14:00 Artur Mrozowski  wrote:

> Hi,
> I wonder if anyone could shed some light on how to implement CoGroup in
> Kafka Streams in currrent version 1.0, as mentioned in this blog post
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-150+-+Kafka-Streams+Cogroup
> .
>
> I am new to Kafka and would appreciate if anyone could provide an example
> on how this can be done in little more detail.
>
> Thanks
>
> /Artur
>


Re: kafka streams with multiple threads and state store

2017-11-10 Thread Damian Guy
Hi Ranjit, it sounds like you might want to use a global table for this.
You can use StreamsBuilder#globalTable(String, Materialized) to create the
global table. You could do something like:

KeyValueBytesStoreSupplier supplier =
    Stores.inMemoryKeyValueStore("global-store");
Materialized<String, String, KeyValueStore<Bytes, byte[]>> materialized =
    Materialized.as(supplier);
builder.globalTable("topic",
    materialized.withKeySerde(Serdes.String()).withValueSerde(Serdes.String()));
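
Each instance then has a full local copy of the topic in that store; you can
read it with interactive queries, e.g. (sketch, assuming a running
KafkaStreams instance called streams):

ReadOnlyKeyValueStore<String, String> store =
    streams.store("global-store", QueryableStoreTypes.<String, String>keyValueStore());
String value = store.get("some-key");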


On Fri, 10 Nov 2017 at 09:24 Ranjit Kumar  wrote:

> Hi Guozhang,
>
> Thanks for the information.
>
> My requirement is some thing like this.
>
> 1. i want to read the data from one topic (which is continuously feeding),
> so i though of using the kafka streams with threads
> 2. want to store the data in one in memory data base (not the local data
> store per thread)
>
> If i have to write my own Statestore logic with handling of synchronization
> is it equal to having my own global data structure in all threads ?
>
> Any performance impact will be their with our own sync ? Can you pelase
> share if you have any sample programs or links describing on this .
>
> Thanks & Regards,
> Ranjit
>
> On Fri, Nov 10, 2017 at 4:38 AM, Guozhang Wang  wrote:
>
> > Ranjit,
> >
> > Note that the "testStore" instance you are passing is a
> StateStoreSupplier
> > which will generate a new StateStore instance for each thread's task.
> >
> > If you really want to have all the thread's share the same state store
> you
> > should implement your own StateStoreSupplier that only return the same
> > StateStore instance in its "get()" call; however, keep in mind that in
> this
> > case this state store could be concurrently accessed by multi-threads
> which
> > is not protected by the library itself (by default single-thread access
> is
> > guaranteed on the state stores).
> >
> >
> > Guozhang
> >
> > On Thu, Nov 9, 2017 at 2:51 AM, Ranjit Kumar 
> wrote:
> >
> > > Hi All,
> > >
> > > I want to use one state store in all my kafka stream threads in my
> > > application, how can i do it.
> > >
> > > 1. i created one topic (name: test2) with 3 partitions .
> > > 2. wrote kafka stream with num.stream.threads = 3 in java code
> > > 3. using state store (name: count2) in my application.
> > >
> > > But state store (count2) is acting like local to thread, but it should
> be
> > > unique to entire application and the same value to be reflected every
> > where
> > > how can i do it ?
> > >
> > > Do i need to take care any synch also ?
> > >
> > > Code:
> > > 
> > > package com.javatpoint;
> > > import org.apache.kafka.common.serialization.Serdes;
> > > import org.apache.kafka.streams.KafkaStreams;
> > > import org.apache.kafka.streams.StreamsConfig;
> > > import org.apache.kafka.streams.processor.Processor;
> > > import org.apache.kafka.streams.processor.ProcessorContext;
> > > import org.apache.kafka.streams.processor.StateStoreSupplier;
> > > import org.apache.kafka.streams.processor.TopologyBuilder;
> > > import org.apache.kafka.streams.state.Stores;
> > >
> > > import org.apache.kafka.streams.kstream.KStreamBuilder;
> > > import org.apache.kafka.streams.processor.StateStoreSupplier;
> > > import org.apache.kafka.streams.state.KeyValueStore;
> > >
> > > import java.util.Properties;
> > > import java.lang.*;
> > >
> > > /**
> > >  * Hello world!
> > >  *
> > >  */
> > > public class App
> > > {
> > > public static void main( String[] args )
> > > {
> > > /*StateStoreSupplier testStore = Stores.create("count2")
> > > .withKeys(Serdes.String())
> > > .withValues(Serdes.Long())
> > > .persistent()
> > > .build();*/
> > > StateStoreSupplier testStore = Stores.create("count2")
> > > .withStringKeys()
> > > .withLongValues()
> > > .persistent()
> > > .build();
> > >
> > > //TopologyBuilder builder = new TopologyBuilder();
> > > final KStreamBuilder builder = new KStreamBuilder();
> > >
> > > builder.addSource("source", "test2").addProcessor("process",
> > > TestProcessor::new, "source").addStateStore(testStore, "process");
> > >
> > > Properties props = new Properties();
> > > props.put(StreamsConfig.APPLICATION_ID_CONFIG, "app1");
> > > props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG,
> > > "localhost:9092");
> > > props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
> > > Serdes.String().getClass());
> > > props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
> > > Serdes.String().getClass());
> > > //props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
> > > Serdes.ByteArray().getClass().getName());
> > > //props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
> > > Serdes.ByteArray().getClass().getName());
> > >
> > > props.put("auto.offset.reset", "latest");
> > > 

Re: WordCount Example using GlobalKStore

2017-11-01 Thread Damian Guy
Count will always use a StateStore, but you can use an in-memory store if
you don't want a persistent one. You can do this by using the overloaded
`count(StateStoreSupplier)` method and creating the supplier with
`Stores.create(name)...inMemory().build()`.

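For example, something like the following should work (just a sketch against
the 0.10.2/0.11 DSL; it assumes `textLines` is a KStream<String, String> of
words and that the usual Streams and Serdes imports are in place):

KTable<String, Long> counts = textLines
        .groupBy((key, word) -> word)
        .count(Stores.create("counts-store")   // in-memory, so no RocksDB on disk
                     .withKeys(Serdes.String())
                     .withValues(Serdes.Long())
                     .inMemory()
                     .build());
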
On Wed, 1 Nov 2017 at 11:22 pravin kumar  wrote:

> i have created 3 inputtopics with 10 partitions each and output Topic with
> 10 partitions
>
> I did wordcount example  and stored it in GlobalKTable.
> I initially stored the counted value in a local state store and then copied
> it to the global state store.
>
> i have atteated the code here:
> https://gist.github.com/Pk007790/d46236b1b5c394301f27b96891a94584
>
> and i have supplied the inputs to the producers like this
> :https://gist.github.com/Pk007790/ba934b7bcea42b8b05f4816de3cb84a0
>
> my ques is:how to store the processed information in GlobalStateStore
> without localStateStore
>


Re: regarding number of Stream Tasks

2017-10-31 Thread Damian Guy
Hi, when `map` is followed by `groupByKey` it causes a repartitioning of the
data, so you will have your 5 tasks processing the input partitions and 5
tasks processing the partitions of the repartition topic.

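Purely for illustration (this is not your topology), a shape like the
following produces the second set of tasks:

builder.stream("input-topic")                            // 5 partitions -> tasks 0_0 .. 0_4
       .map((key, value) -> KeyValue.pair(value, value)) // the key changes, so the data must be repartitioned
       .groupByKey()
       .count("counts");                                 // runs on the repartition topic -> tasks 1_0 .. 1_4
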
On Tue, 31 Oct 2017 at 10:56 pravin kumar <pk007...@gmail.com> wrote:

>  I have created a stream with topic contains 5 partitions and expected to
> create 5 stream tasks ,i got 10 tasks as
> 0_0  0_1  0_2  0_3  0_4  1_0  1_1  1_2  1_3  1_4
>
>
> im doing wordcount in this example,
>
> here is my topology in this link: 1.
> https://gist.github.com/Pk007790/72b0718f26e6963246e83da992b3e725
> 2.https://gist.github.com/Pk007790/a05226007ca90cdd36c362d09d19bda6.
>
> On Tue, Oct 24, 2017 at 3:29 PM, Damian Guy <damian@gmail.com> wrote:
>
> > It would depend on what your topology looks like, which you haven't shown
> > here. There may be internal topics generated due to repartitioning, which
> > would cause the extra tasks.
> > If you provide the topology we would be able to tell you.
> > Thanks,
> > Damian
> >
> > On Tue, 24 Oct 2017 at 10:14 pravin kumar <pk007...@gmail.com> wrote:
> >
> > > I have created a stream with topic contains 5 partitions and expected
> to
> > > create 5 stream tasks ,i got 10 tasks as
> > > 0_0  0_1  0_2  0_3  0_4  1_0  1_1  1_2  1_3  1_4
> > >
> > >
> > > my doubt is:im expected to have 5 tasks how it produced 10 tasks
> > >
> > > here are some logs:
> > > [2017-10-24 10:27:35,284] INFO Kafka commitId :
> > > cb8625948210849f (org.apache.kafka.common.utils.AppInfoParser)
> > > [2017-10-24 10:27:35,284] DEBUG Kafka consumer created
> > > (org.apache.kafka.clients.consumer.KafkaConsumer)
> > > [2017-10-24 10:27:35,304] INFO stream-thread
> > >
> > > [SingleConsumerMultiConsumerUsingStreamx4-4dc5b303-62b4-4898-
> > 9d8f-a1a9a8adfb7d-StreamThread-1]
> > > State transition from CREATED to RUNNING.
> > > (org.apache.kafka.streams.processor.internals.StreamThread)
> > > [2017-10-24 10:27:35,306] DEBUG stream-client
> > >
> > > [SingleConsumerMultiConsumerUsingStreamx4-4dc5b303-62b4-4898-
> > 9d8f-a1a9a8adfb7d]
> > > Removing local Kafka Streams application data in
> > >
> > > /home/admin/Documents/kafka_2.11-0.10.2.1/kafka-streams/
> > SingleConsumerMultiConsumerUsingStreamx4
> > > for application SingleConsumerMultiConsumerUsingStreamx4.
> > > (org.apache.kafka.streams.KafkaStreams)
> > > [2017-10-24 10:27:35,311] DEBUG stream-thread [cleanup] Acquired state
> > dir
> > > lock for task 0_0
> > > (org.apache.kafka.streams.processor.internals.StateDirectory)
> > > [2017-10-24 10:27:35,311] INFO stream-thread [cleanup] Deleting
> obsolete
> > > state directory 0_0 for task 0_0 as cleanup delay of 0 ms has passed
> > > (org.apache.kafka.streams.processor.internals.StateDirectory)
> > > [2017-10-24 10:27:35,322] DEBUG stream-thread [cleanup] Released state
> > dir
> > > lock for task 0_0
> > > (org.apache.kafka.streams.processor.internals.StateDirectory)
> > > [2017-10-24 10:27:35,322] DEBUG stream-thread [cleanup] Acquired state
> > dir
> > > lock for task 1_0
> > > (org.apache.kafka.streams.processor.internals.StateDirectory)
> > > [2017-10-24 10:27:35,322] INFO stream-thread [cleanup] Deleting
> obsolete
> > > state directory 1_0 for task 1_0 as cleanup delay of 0 ms has passed
> > > (org.apache.kafka.streams.processor.internals.StateDirectory)
> > > [2017-10-24 10:27:35,395] DEBUG stream-thread [cleanup] Released state
> > dir
> > > lock for task 1_0
> > > (org.apache.kafka.streams.processor.internals.StateDirectory)
> > > [2017-10-24 10:27:35,395] DEBUG stream-thread [cleanup] Acquired state
> > dir
> > > lock for task 0_1
> > > (org.apache.kafka.streams.processor.internals.StateDirectory)
> > > [2017-10-24 10:27:35,395] INFO stream-thread [cleanup] Deleting
> obsolete
> > > state directory 0_1 for task 0_1 as cleanup delay of 0 ms has passed
> > > (org.apache.kafka.streams.processor.internals.StateDirectory)
> > > [2017-10-24 10:27:35,395] DEBUG stream-thread [cleanup] Released state
> > dir
> > > lock for task 0_1
> > > (org.apache.kafka.streams.processor.internals.StateDirectory)
> > > [2017-10-24 10:27:35,395] DEBUG stream-thread [cleanup] Acquired state
> > dir
> > > lock for task 1_1
> > > (org.apache.kafka.streams.processor.internals.StateDirectory)
> > > [2017-10-24 10:

Re: Streams changelog topic retention is high

2017-10-30 Thread Damian Guy
The retention for the joins is as specified above. With until set to 240
minutes and StreamsConfig.WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG
set to 1 day (the default) this works out to 100,800,000 ms.

For plain key value stores, there should be no retention period as the
topics are compacted only.
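
The until(...) value is set on the join window itself, e.g. (a sketch only;
the streams and the joiner are placeholders):

stream1.join(stream2, joiner,
        JoinWindows.of(TimeUnit.MINUTES.toMillis(5))         // join window size (example value)
                   .until(TimeUnit.MINUTES.toMillis(240)));  // window retention: 14,400,000 ms

The changelog retention is that value plus
StreamsConfig.WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG
(default 1 day = 86,400,000 ms).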

On Mon, 30 Oct 2017 at 15:48 Sameer Kumar <sam.kum.w...@gmail.com> wrote:

> Actually I am using Key Value store, I do use join as part of my DAG(until
> for the same has been set at 240 mins). The sink processor is key-value, is
> there any option to control it.
>
> -Sameer.
>
> On Mon, Oct 30, 2017 at 6:33 PM, Damian Guy <damian@gmail.com> wrote:
>
> > The topics in question are both changelogs for window stores. The
> retention
> > period for them is calculated as the Window retention period, which is
> the
> > value that is passed to `JoinWindows.until(...)` (default is 1 day) plus
> > the value of the config
> > StreamsConfig.WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG (
> > default is 1 day)
> >
> >
> >
> > On Mon, 30 Oct 2017 at 12:49 Sameer Kumar <sam.kum.w...@gmail.com>
> wrote:
> >
> > > Hi,
> > >
> > > I have configured my settings to be the following:-
> > >
> > > log.retention.hours=3
> > > delete.topic.enable=true
> > > delete.retention.ms=1080
> > > min.cleanable.dirty.ratio=0.20
> > > segment.ms=18
> > >
> > > Howsoever, the changelog topic created as part of stream has the
> > > rentention.ms to be 10080, the source topic has it to be 3 hours.
> > >
> > > [root@dmpkafka6591 kafka-11-single]# bin/kafka-topics.sh --describe
> > > --zookeeper 172.29.65.91:2181 --topic
> > > c-7-aq7-KSTREAM-JOINOTHER-19-store-changelog
> > > Topic:c-7-aq7-KSTREAM-JOINOTHER-19-store-changelog
> > > PartitionCount:60   ReplicationFactor:1 Configs:retention.ms=
> > > 10080,cleanup.policy=delete,compact
> > >
> > > Can someone please explain this behavior.
> > >
> > > -Sameer.
> > >
> >
>


Re: Streams changelog topic retention is high

2017-10-30 Thread Damian Guy
The topics in question are both changelogs for window stores. The retention
period for them is calculated as the Window retention period, which is the
value that is passed to `JoinWindows.until(...)` (default is 1 day) plus
the value of the config
StreamsConfig.WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG (
default is 1 day)



On Mon, 30 Oct 2017 at 12:49 Sameer Kumar  wrote:

> Hi,
>
> I have configured my settings to be the following:-
>
> log.retention.hours=3
> delete.topic.enable=true
> delete.retention.ms=1080
> min.cleanable.dirty.ratio=0.20
> segment.ms=18
>
> Howsoever, the changelog topic created as part of stream has the
> rentention.ms to be 10080, the source topic has it to be 3 hours.
>
> [root@dmpkafka6591 kafka-11-single]# bin/kafka-topics.sh --describe
> --zookeeper 172.29.65.91:2181 --topic
> c-7-aq7-KSTREAM-JOINOTHER-19-store-changelog
> Topic:c-7-aq7-KSTREAM-JOINOTHER-19-store-changelog
> PartitionCount:60   ReplicationFactor:1 Configs:retention.ms=
> 10080,cleanup.policy=delete,compact
>
> Can someone please explain this behavior.
>
> -Sameer.
>


Re: regarding number of Stream Tasks

2017-10-24 Thread Damian Guy
It would depend on what your topology looks like, which you haven't shown
here. There may be internal topics generated due to repartitioning, which
would cause the extra tasks.
If you provide the topology we would be able to tell you.
Thanks,
Damian

On Tue, 24 Oct 2017 at 10:14 pravin kumar  wrote:

> I have created a stream with topic contains 5 partitions and expected to
> create 5 stream tasks ,i got 10 tasks as
> 0_0  0_1  0_2  0_3  0_4  1_0  1_1  1_2  1_3  1_4
>
>
> my doubt is:im expected to have 5 tasks how it produced 10 tasks
>
> here are some logs:
> [2017-10-24 10:27:35,284] INFO Kafka commitId :
> cb8625948210849f (org.apache.kafka.common.utils.AppInfoParser)
> [2017-10-24 10:27:35,284] DEBUG Kafka consumer created
> (org.apache.kafka.clients.consumer.KafkaConsumer)
> [2017-10-24 10:27:35,304] INFO stream-thread
>
> [SingleConsumerMultiConsumerUsingStreamx4-4dc5b303-62b4-4898-9d8f-a1a9a8adfb7d-StreamThread-1]
> State transition from CREATED to RUNNING.
> (org.apache.kafka.streams.processor.internals.StreamThread)
> [2017-10-24 10:27:35,306] DEBUG stream-client
>
> [SingleConsumerMultiConsumerUsingStreamx4-4dc5b303-62b4-4898-9d8f-a1a9a8adfb7d]
> Removing local Kafka Streams application data in
>
> /home/admin/Documents/kafka_2.11-0.10.2.1/kafka-streams/SingleConsumerMultiConsumerUsingStreamx4
> for application SingleConsumerMultiConsumerUsingStreamx4.
> (org.apache.kafka.streams.KafkaStreams)
> [2017-10-24 10:27:35,311] DEBUG stream-thread [cleanup] Acquired state dir
> lock for task 0_0
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,311] INFO stream-thread [cleanup] Deleting obsolete
> state directory 0_0 for task 0_0 as cleanup delay of 0 ms has passed
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,322] DEBUG stream-thread [cleanup] Released state dir
> lock for task 0_0
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,322] DEBUG stream-thread [cleanup] Acquired state dir
> lock for task 1_0
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,322] INFO stream-thread [cleanup] Deleting obsolete
> state directory 1_0 for task 1_0 as cleanup delay of 0 ms has passed
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,395] DEBUG stream-thread [cleanup] Released state dir
> lock for task 1_0
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,395] DEBUG stream-thread [cleanup] Acquired state dir
> lock for task 0_1
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,395] INFO stream-thread [cleanup] Deleting obsolete
> state directory 0_1 for task 0_1 as cleanup delay of 0 ms has passed
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,395] DEBUG stream-thread [cleanup] Released state dir
> lock for task 0_1
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,395] DEBUG stream-thread [cleanup] Acquired state dir
> lock for task 1_1
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,395] INFO stream-thread [cleanup] Deleting obsolete
> state directory 1_1 for task 1_1 as cleanup delay of 0 ms has passed
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,396] DEBUG stream-thread [cleanup] Released state dir
> lock for task 1_1
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,396] DEBUG stream-thread [cleanup] Acquired state dir
> lock for task 0_2
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,396] INFO stream-thread [cleanup] Deleting obsolete
> state directory 0_2 for task 0_2 as cleanup delay of 0 ms has passed
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,397] DEBUG stream-thread [cleanup] Released state dir
> lock for task 0_2
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,397] DEBUG stream-thread [cleanup] Acquired state dir
> lock for task 1_2
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,397] INFO stream-thread [cleanup] Deleting obsolete
> state directory 1_2 for task 1_2 as cleanup delay of 0 ms has passed
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,398] DEBUG stream-thread [cleanup] Released state dir
> lock for task 1_2
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,398] DEBUG stream-thread [cleanup] Acquired state dir
> lock for task 0_3
> (org.apache.kafka.streams.processor.internals.StateDirectory)
> [2017-10-24 10:27:35,398] INFO stream-thread [cleanup] Deleting obsolete
> state directory 0_3 for task 0_3 as cleanup delay of 0 ms has passed
> 

Re: Kafka Streams : Problem with Global State Restoration

2017-10-18 Thread Damian Guy
Hi Tony,

The issue is that the GlobalStore doesn't use the Processor when restoring
the state. It just reads the raw records from the underlying topic. You
could work around this by doing the processing and writing to another
topic. Then use the other topic as the source for your global-store.
It is probably worth raising a JIRA for this, too.
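
Roughly, the idea is something like this (only a sketch; the topic names,
transform(...) and the store-updating processor are placeholders rather than
your actual code):

// a plain stream applies the same transformation and writes to a second topic
builder.stream("dictionary-words-topic")
       .mapValues(value -> transform(value))   // same logic your processor applies today
       .to("dictionary-words-processed");

// the global store is then sourced from the already-processed topic, so a
// restore simply replays records that have been through the transformation
builder.addGlobalStore(globalStore, "dictionary-words-source",
        Serdes.String().deserializer(), Serdes.String().deserializer(),
        "dictionary-words-processed", "dictionary-words-processor",
        DictionaryWordsStoreUpdater::new);   // placeholder processor that just puts records into the store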

Thanks,
Damian

On Wed, 18 Oct 2017 at 17:01 Tony John  wrote:

> Hello All,
>
> I have been trying to create an application on top of Kafka Streams. I am
> newbie to Kafka & Kakfa streams. So please excuse if I my understanding are
> wrong.
>
> I got the application running fine on a single instance ec2 instance in
> AWS. Now I am looking at scaling and ran in to some issues. The application
> has a global state store and couple of other local one's backed by RocksDB.
> It uses the processor API's and the stream is built using the
> TopologyBuilder. The global state store is fed by a topic which send a key
> value pair (both are protobuf objects) and connected to a processor which
> then transforms the value by applying some logic, finally stores the key
> and the modified data to the store. Similarly the local stores are
> connected via processors which are fed by different topics. Now the issue
> is that when I launch a new instance of the app, task re-allocation and
> state restoration happens, and the stores get replicated on to the new
> instance. But the global store which is replicated on to the new instance
> has some other data (I guess thats the raw data) as opposed to the
> processed data.
>
> *Application Topology*
>
> *Global Store*
>
> Source Topic (Partition Count = 1, Replication Factor = 2, Compacted =
> false) -> GlobalStoreProcessor (Persistent, Caching enabled, logging
> disabled) -> Global Store
>
> *Local Store*
>
> Source Topic (Partition Count = 16, Replication Factor = 2, Compacted =
> true)
>
>  -> LocalStoreProcessor (
> Persistent, Caching enabled, Logging enabled
>
> ) -> Local state stores on different partitions
>
> *Sample Code (Written in Kotlin)*
>
> val streams: KafkaStreams
> init {
> val builder = KStreamBuilder().apply {
>
> val globalStore = Stores.create(Config.DICTIONARY)
> .withKeys(Serdes.String())
> .withValues(Serdes.String())
> .persistent()
> .enableCaching()
> .disableLogging()
> .build() as
> StateStoreSupplier>
>
> addGlobalStore(globalStore, "dictionary-words-source",
> Serdes.String().deserializer(), Serdes.String().deserializer(),
> Config.DICTIONARY_WORDS_TOPIC,
> "dictionary-words-processor", DictionaryWordsProcessor.Supplier)
>
>
> addSource("query-source", Serdes.String().deserializer(),
> Serdes.String().deserializer(), Config.QUERIES_TOPIC)
> addProcessor("query-processor", QueryProcessor.Supplier,
> "query-source")
>
> }
>
> val config =
> StreamsConfig(mapOf(StreamsConfig.APPLICATION_ID_CONFIG to
> Config.APPLICATION_ID,
> StreamsConfig.BOOTSTRAP_SERVERS_CONFIG to Config.KAFKA_SERVERS,
> StreamsConfig.STATE_DIR_CONFIG to Config.STATE_STORE_DIR
> ))
> streams = KafkaStreams(builder, config)
>
> Runtime.getRuntime().addShutdownHook(Thread {
> println("Shutting down Kafka Streams...")
> streams.close()
> println("Shut down successfully")
> })
> }
>
> fun run() {
> Utils.createTopic(Config.DICTIONARY_WORDS_TOPIC, 1,
> Config.REPLICATION_FACTOR, true)
> Utils.createTopic(Config.QUERIES_TOPIC, Config.PARTITION_COUNT,
> Config.REPLICATION_FACTOR, false)
> streams.start()
> }
>
>
> *Environment Details:* 1 ZooKeeper, 2 Brokers, and 1/2 application
> instances.
>
>
> So just wanted to know the process of state store restoration while scaling
> up and down. How does the streams manage to restore the data? I was
> expecting when the new instance gets launched, the data flows through the
> same processor so that it gets modified using the same logic which is
> applied when it was stored in instance 1. Could you please help me
> understand this little better. Please let me know if there is anyway to get
> the restoration process to route the data via the same processor.
>
>
> Thanks,
> Tony
>


Re: KTable Tombstone and expiry of records in Session Window

2017-10-18 Thread Damian Guy
Hi Ahmad,


>1. Given SessionTime can continue to expand the window that is
>considered part of the same session, i.e., it's based on data arriving
> for
>that key. What happens with retention time?


As the session expands the data for the session will continue to be
retained as it is still active. The session window uses the endTime of the
window to determine its retention policy.


> I've seen online definitions
>that seem to define the expiry of records due to retention as as
> StreamTime
>- Retention time. Is this correct and does it always hold true even if
> the
>Session continues to expand due to recent activity for a key? The gist
> of
>the question here: Is retention time/expiry calculation impacted by or
> take
>into consideration session window expansions?
>

Yes


>2. In the scenario described above with the KTable.toStream I am getting
>Tombstone records; i.e., records with a Key and Null value. Are these
> to be
>expected? (My assumption is Yes). Are these a result of "expiry" based
> on
>retention period?
>

The tombstones are expected, but they are not because the session has expired.
The tombstones are sent when sessions merge to form a larger session, so they
indicate that the previous sessions are no longer valid. For example if you have
2 sessions and you have an inactivity gap of 5:

key=1 start=0 end=0
key=1 start=6 end=6

and then you get another record for the same key at time 3 then the 2
sessions above would be merged into:

key=1 start=0 end=6

and we'd send tombstones
key=1 start=0 end=0
key=1 start=6 end=6


>3. Can I rely on these "Tombstone" records to indicate expiry from the
>session store?
>
>
We don't send tombstones when the sessions have expired. It is the same as
for any other types of windows. We just send the updates to the windows as
they arrive.


>
> This ultimately boils down to understanding Windows better but also towards
> trying to establish a proxy for indicating when a window expires as Kafka
> Streams doesn't seem to support this yet. With that said, any plans on
> supporting an indicator that tells downstream nodes that a message in a
> Window has expired, even if this is done in batch as it seems expiry is
> actually on the rocks-db segment level assuming default state stores.
>
>
A window doesn't really ever expire; it is just retained for a period of
time to allow for late-arriving data. When the RocksDB segment is
dropped, that means the retention time has passed.

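For session windows the retention knob is the same until(...) as for other
windows, e.g. (a sketch only, assuming `grouped` is a KGroupedStream):

KTable<Windowed<String>, Long> sessionCounts = grouped.count(
        SessionWindows.with(TimeUnit.MINUTES.toMillis(5))    // inactivity gap
                      .until(TimeUnit.HOURS.toMillis(1)),    // how long closed sessions are retained for late data
        "session-counts");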

> Thanks!
>
> --
> Ahmad Alkilani
>


Re: Kafka Streams 0.11.0.1 Rebalancing Delay

2017-10-18 Thread Damian Guy
> > > > > >   }
> > > > > }
> > > > >
> > > > > Again, thanks for your help,
> > > > > Johan
> > > > >
> > > > > On Tue, Oct 17, 2017 at 11:57 AM Guozhang Wang <wangg...@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > Hello Johan,
> > > > > >
> > > > > > Could you list the following information in addition to your
> > > topology?
> > > > > >
> > > > > > 1. Your config settings if you have any overwrites, especially
> > > > > consumer's "
> > > > > > poll.ms" and "max.poll.records", and "num.threads".
> > > > > >
> > > > > > 2. Your expected incoming data rate (messages per sec) at normal
> > > > > processing
> > > > > > phase.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Guozhang
> > > > > >
> > > > > >
> > > > > > On Tue, Oct 17, 2017 at 9:12 AM, Johan Genberg <
> > > > johan.genb...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Thank you for responding so quickly. This is the topology. I've
> > > > > > simplified
> > > > > > > it a bit, but these are the steps it goes through, not sure if
> > that
> > > > is
> > > > > > > helpful. I'll try to get some logs too in a bit.
> > > > > > >
> > > > > > > *KStream<Integer, Event> eventStream = builder.stream(*
> > > > > > > *  topic.keySerde(),*
> > > > > > > *  topic.valueSerde(),*
> > > > > > > *  topic.name <http://topic.name>());*
> > > > > > >
> > > > > > > *KStream<Integer, Event> enriched =*
> > > > > > > *eventStream*
> > > > > > > *.map(...)*
> > > > > > > *.transformValues(MyStatefulProcessor::new, "store1")*
> > > > > > > *.mapValues(new MyStatelessProcessor());*
> > > > > > > *.through("topic")*
> > > > > > > *.map(new MyStatelessProcessor2());*
> > > > > > > *.through("topic2")*
> > > > > > > *.transformValues(MyStatefulProcessor2::new, "store2")*
> > > > > > > *.through("topic3")*
> > > > > > > *.map(new MyStatelessProcessor3());*
> > > > > > > *.through("topic4");*
> > > > > > >
> > > > > > > Store 1:
> > > > > > >
> > > > > > > *Stores.create("store1")*
> > > > > > > *.withKeys(Serdes.String())*
> > > > > > > *.withValues(Serdes.Long())*
> > > > > > > *.inMemory()*
> > > > > > > *.build();*
> > > > > > >
> > > > > > > Store 2:
> > > > > > >
> > > > > > > *Stores.create("store2")*
> > > > > > > *.withKeys(Serdes.String())*
> > > > > > > *.withValues(valueSerde) // a small json object, 4-5
> > > properties*
> > > > > > > *.persistent()*
> > > > > > > *.build();*
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Johan
> > > > > > >
> > > > > > > On Tue, Oct 17, 2017 at 2:51 AM Damian Guy <
> damian@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Johan,
> > > > > > > >
> > > > > > > > Do you have any logs? The state store restoration changed
> > > > > significantly
> > > > > > > in
> > > > > > > > 0.11.0.1. If you could get some logs at trace level, that
> would
> > > be
> > > > > > > useful.
> > > > > > > > Also if you could provide your topology (removing anything
> > > > > > > > proprietary/sensitive).
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Damian
> > > > > > > >
> > > > > > > > On Tue, 17 Oct 2017 at 05:55 Johan Genberg <
> > > > johan.genb...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I'm upgrading a kafka streams application from 0.10.2.1 to
> > > > > 0.11.0.1,
> > > > > > > > > running against a kafka cluster with version 0.10.2.1. The
> > > > > > application
> > > > > > > > uses
> > > > > > > > > a couple of state stores.
> > > > > > > > >
> > > > > > > > > When stopping/starting the application prior to the upgrade
> > > (with
> > > > > > > > 0.10.2.1
> > > > > > > > > client) on 2 instances, it was up and running in less than
> > 30s
> > > > to a
> > > > > > > > minute
> > > > > > > > > on all nodes. However, after client was upgraded to
> 0.11.0.1,
> > > > when
> > > > > > the
> > > > > > > > > instances started (during some load), it took about 6
> minutes
> > > for
> > > > > one
> > > > > > > of
> > > > > > > > > the nodes to reach "RUNNING" state, and the second one
> didn't
> > > get
> > > > > > > there.
> > > > > > > > > After 10 minutes I had to roll back.
> > > > > > > > >
> > > > > > > > > Is it expected that this initial rebalancing takes a little
> > > > longer
> > > > > > with
> > > > > > > > > 0.11.0.1, and is there a way to configure or tweak the
> > client,
> > > or
> > > > > > > > otherwise
> > > > > > > > > optimize this to make this go faster?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Johan
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: KIP-99 streams global ktable - slowly changing dimension type 2 supported?

2017-10-17 Thread Damian Guy
Hi Chris,

You can only join on the key of the table, so I don't think this would work
as is. Also, the global table is updated in a different thread and there is
no guarantee that it would have been updated before the purchase.

Perhaps you could do it by making the key of the product table versioned?
And then the purchase references the versioned key? You would have the
version history and be able to join with the appropriate product version,
however there is still the possibility that the data in the global table
wasn't updated before the purchase.
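
Something along these lines (purely a sketch; the Product/Purchase serdes and
the productId/version accessors are assumptions about your data, and enrich()
is a placeholder for whatever you want the join to produce):

GlobalKTable<String, Product> products = builder.globalTable(
        Serdes.String(), productSerde, "products-by-id-and-version", "products-store");

builder.stream(Serdes.String(), purchaseSerde, "purchases")
       .join(products,
             // map each purchase to the versioned product key, e.g. "prod-42:v3"
             (purchaseKey, purchase) -> purchase.getProductId() + ":" + purchase.getVersion(),
             (purchase, product) -> enrich(purchase, product));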

Thanks,
Damian

On Tue, 17 Oct 2017 at 12:59 chris snow  wrote:

> Thanks, Guozhang.
>
> I've been thinking about the following approach: https://imgur.com/a/pP92Z
>
> Does this approach make sense?
>
> A key consideration will be that the product dimension table updates are
> processed and added to kafka before the corresponding purchase transaction
> record is processed.
>
>
>
> On 17 October 2017 at 02:15, Guozhang Wang  wrote:
>
> > Hello Chris,
> >
> > The global table described in KIP-99 will keep the most recent snapshot
> of
> > the table when applying updates to the table, i.e. it is like type 1:
> > overwrite. So when a table or stream is joined with the global table, it
> is
> > always joined with the most recent values of the global table.
> >
> > However, note that in Kafka Streams api, joining streams are synchronized
> > based on their incoming record's timestamps (i.e. the library will choose
> > which records to process next, either from the global dimension table's
> > changelog, or from the fact table's changelog, based on their stream time
> > in the best effort), so if you have an updated value on the fact table,
> > that update's timestamp will be aligned with the the current updates on
> the
> > global table as well.
> >
> >
> > Guozhang
> >
> >
> > On Mon, Oct 16, 2017 at 12:51 PM, chris snow 
> wrote:
> >
> > > The streams global ktable wiki page [1] describes a data warehouse syle
> > > operation whereby dimension tables are joined to fact tables.
> > >
> > > I’m interested in whether this approach works for type 2 slowly
> changing
> > > dimensions [2]?  In type 2 scd the dimension record history is
> preserved
> > > and the fact table record is joined to the appropriate version of the
> > > dimension table record.
> > >
> > > —
> > > [1]
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 99%3A+Add+Global+Tables+to+Kafka+Streams
> > > [2] https://en.m.wikipedia.org/wiki/Slowly_changing_dimension
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: Kafka Streams 0.11.0.1 Rebalancing Delay

2017-10-17 Thread Damian Guy
Hi Johan,

Do you have any logs? The state store restoration changed significantly in
0.11.0.1. If you could get some logs at trace level, that would be useful.
Also if you could provide your topology (removing anything
proprietary/sensitive).

Thanks,
Damian

On Tue, 17 Oct 2017 at 05:55 Johan Genberg  wrote:

> Hi,
>
> I'm upgrading a kafka streams application from 0.10.2.1 to 0.11.0.1,
> running against a kafka cluster with version 0.10.2.1. The application uses
> a couple of state stores.
>
> When stopping/starting the application prior to the upgrade (with 0.10.2.1
> client) on 2 instances, it was up and running in less than 30s to a minute
> on all nodes. However, after client was upgraded to 0.11.0.1, when the
> instances started (during some load), it took about 6 minutes for one of
> the nodes to reach "RUNNING" state, and the second one didn't get there.
> After 10 minutes I had to roll back.
>
> Is it expected that this initial rebalancing takes a little longer with
> 0.11.0.1, and is there a way to configure or tweak the client, or otherwise
> optimize this to make this go faster?
>
> Thanks,
> Johan
>


Re: Kafka Streams Transformer: context.forward() from different thread

2017-10-10 Thread Damian Guy
Hi,
No, context.forward() always needs to be called from the StreamThread. If
you call it from another thread the behaviour is undefined and in most
cases will be incorrect, likely resulting in an exception.
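
If you need the async results downstream, one option is to hand them back to
the StreamThread and forward them from punctuate(), which is invoked on the
StreamThread. A rough sketch (startAsyncWork(...) is a placeholder for
whatever kicks off your asynchronous processing):

class AsyncTransformer implements Transformer<String, String, KeyValue<String, String>> {
    private final Queue<KeyValue<String, String>> completed = new ConcurrentLinkedQueue<>();
    private ProcessorContext context;

    public void init(ProcessorContext context) {
        this.context = context;
        context.schedule(1000);  // punctuate roughly every second of stream time
    }

    public KeyValue<String, String> transform(String key, String value) {
        // placeholder for your async call; the callback runs on some other thread
        startAsyncWork(key, value, completed::add);
        return null;
    }

    public KeyValue<String, String> punctuate(long timestamp) {
        KeyValue<String, String> result;
        while ((result = completed.poll()) != null) {
            context.forward(result.key, result.value);  // safe: this runs on the StreamThread
        }
        return null;
    }

    public void close() {}
}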

On Tue, 10 Oct 2017 at 09:04 Murad Mamedov  wrote:

> Hi, here is the question:
>
> Transformer's transform() implementation starts some processing
> asynchronously, i.e. transform() implementation returns null. Then once
> asynchronous processing is complete in another thread, is it correct to
> call context.forward() from that thread?
>
> Thanks in advance
>


Re: Serve interactive queries from standby replicas

2017-10-06 Thread Damian Guy
Hi,

No, that isn't supported.

Thanks,
Damian

On Fri, 6 Oct 2017 at 04:18 Stas Chizhov  wrote:

> Hi
>
> Is there a way to serve read read requests from standby replicas?
> StreamsMeatadata does not seem to provide standby end points as far as I
> can see.
>
> Thank you,
> Stas
>


Re: Kafka Streams Avro SerDe version/id caching

2017-10-03 Thread Damian Guy
If you are using the Confluent Schema Registry then they will be cached by
the SchemaRegistryClient.

Thanks,
Damian

On Tue, 3 Oct 2017 at 09:00 Ted Yu  wrote:

> I did a quick search in the code base - there doesn't seem to be caching as
> you described.
>
> On Tue, Oct 3, 2017 at 6:36 AM, Kristopher Kane 
> wrote:
>
> > If using a Byte SerDe and schema registry in the consumer configs of a
> > Kafka streams application, does it cache the Avro schemas by ID and
> version
> > after fetching from the registry once?
> >
> > Thanks,
> >
> > Kris
> >
>


Re: out of order sequence number in exactly once streams

2017-09-29 Thread Damian Guy
You can set ProducerConfig.RETRIES_CONFIG in your StreamsConfig, i.e,

Properties props = new Properties();
props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
...

On Fri, 29 Sep 2017 at 13:17 Sameer Kumar  wrote:

> I guess once stream app are enabled exactly-once, producer idempotence get
> enabled by default and so do the retries. I guess producer retries are
> managed internally and not exposed through streamconfig.
>
> https://kafka.apache.org/0110/documentation/#streamsconfigs
>
> -Sameer.
>
> On Thu, Sep 28, 2017 at 12:12 AM, Matthias J. Sax 
> wrote:
>
> > An OutOfOrderSequenceException should only occur if a idempotent
> > producer gets out of sync with the broker. If you set
> > `enable.idempotence = true` on your producer, you might want to set
> > `retries = Integer.MAX_VALUE`.
> >
> > -Matthias
> >
> > On 9/26/17 11:30 PM, Sameer Kumar wrote:
> > > Hi,
> > >
> > > I again received this exception while running my streams app. I am
> using
> > > Kafka 11.0.1. After restarting my app, this error got fixed.
> > >
> > > I guess this might be due to bad network. Any pointers. Any config
> > > wherein I can configure it for retries.
> > >
> > > Exception trace is attached.
> > >
> > > Regards,
> > > -Sameer.
> >
> >
>


Re: kaka-streams 0.11.0.1 rocksdb bug?

2017-09-26 Thread Damian Guy
It looks like only one of the restoring tasks ever transitions to running,
but it is impossible to tell why from the logs. My guess is there is a bug
in there somewhere.

Interestingly, I only see this log line once:
"2017-09-22 14:08:09 DEBUG StoreChangelogReader:152 - stream-thread
[argyle-streams-fp-StreamThread-21] Starting restoring state stores from
changelog topics []"

But there should be one for all the tasks that are restoring.

Can you please also provide your topology?

Thanks,
Damian

On Mon, 25 Sep 2017 at 19:44 Ara Ebrahimi <ara.ebrah...@argyledata.com>
wrote:

> Please find attached the entire log. Hope it helps.
>
> Ara.
>
>
> On Sep 25, 2017, at 7:59 AM, Damian Guy <damian@gmail.com> wrote:
>
> Hi, is that the complete log? It looks like there might be 2 tasks that are
> still restoring:
> 2017-09-22 14:08:09 DEBUG AssignedTasks:90 - stream-thread
> [argyle-streams-fp-StreamThread-6] transitioning stream task 1_18 to
> restoring
> 2017-09-22 14:08:09 DEBUG AssignedTasks:90 - stream-thread
> [argyle-streams-fp-StreamThread-23] transitioning stream task 1_14 to
> restoring
>
> I don't see them transitioning to running.
>
> On Fri, 22 Sep 2017 at 22:21 Ara Ebrahimi <ara.ebrah...@argyledata.com>
> wrote:
>
> I enabled TRACE level logging for kafka streams package and all I see is
> things like this:
>
> 2017-09-22 14:15:18 INFO  StreamThread:686 - stream-thread
> [argyle-streams-fp-StreamThread-32] Committed all active tasks [0_30] and
> standby tasks [] in 0ms
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-32] completed partitions []
> 2017-09-22 14:15:18 INFO  StreamThread:686 - stream-thread
> [argyle-streams-fp-StreamThread-18] Committed all active tasks [0_4] and
> standby tasks [] in 0ms
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-18] completed partitions []
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-16] completed partitions []
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-13] completed partitions []
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-14] completed partitions []
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-9] completed partitions []
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-5] completed partitions []
> 2017-09-22 14:15:18 DEBUG StreamTask:259 - task [0_29] Committing
> 2017-09-22 14:15:18 DEBUG RecordCollectorImpl:142 - task [0_29] Flushing
> producer
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-13] completed partitions []
> 2017-09-22 14:15:18 DEBUG StreamTask:259 - task [0_1] Committing
> 2017-09-22 14:15:18 DEBUG RecordCollectorImpl:142 - task [0_1] Flushing
> producer
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-5] completed partitions []
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-9] completed partitions []
> 2017-09-22 14:15:18 INFO  StreamThread:686 - stream-thread
> [argyle-streams-fp-StreamThread-14] Committed all active tasks [0_29] and
> standby tasks [] in 0ms
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-14] completed partitions []
> 2017-09-22 14:15:19 INFO  StreamThread:686 - stream-thread
> [argyle-streams-fp-StreamThread-16] Committed all active tasks [0_1] and
> standby tasks [] in 0ms
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-16] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-6] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-17] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-23] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-2] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-15] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-30] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-s

Re: kaka-streams 0.11.0.1 rocksdb bug?

2017-09-25 Thread Damian Guy
Hi, is that the complete log? It looks like there might be 2 tasks that are
still restoring:
2017-09-22 14:08:09 DEBUG AssignedTasks:90 - stream-thread
[argyle-streams-fp-StreamThread-6] transitioning stream task 1_18 to
restoring
2017-09-22 14:08:09 DEBUG AssignedTasks:90 - stream-thread
[argyle-streams-fp-StreamThread-23] transitioning stream task 1_14 to
restoring

I don't see them transitioning to running.

On Fri, 22 Sep 2017 at 22:21 Ara Ebrahimi 
wrote:

> I enabled TRACE level logging for kafka streams package and all I see is
> things like this:
>
> 2017-09-22 14:15:18 INFO  StreamThread:686 - stream-thread
> [argyle-streams-fp-StreamThread-32] Committed all active tasks [0_30] and
> standby tasks [] in 0ms
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-32] completed partitions []
> 2017-09-22 14:15:18 INFO  StreamThread:686 - stream-thread
> [argyle-streams-fp-StreamThread-18] Committed all active tasks [0_4] and
> standby tasks [] in 0ms
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-18] completed partitions []
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-16] completed partitions []
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-13] completed partitions []
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-14] completed partitions []
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-9] completed partitions []
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-5] completed partitions []
> 2017-09-22 14:15:18 DEBUG StreamTask:259 - task [0_29] Committing
> 2017-09-22 14:15:18 DEBUG RecordCollectorImpl:142 - task [0_29] Flushing
> producer
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-13] completed partitions []
> 2017-09-22 14:15:18 DEBUG StreamTask:259 - task [0_1] Committing
> 2017-09-22 14:15:18 DEBUG RecordCollectorImpl:142 - task [0_1] Flushing
> producer
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-5] completed partitions []
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-9] completed partitions []
> 2017-09-22 14:15:18 INFO  StreamThread:686 - stream-thread
> [argyle-streams-fp-StreamThread-14] Committed all active tasks [0_29] and
> standby tasks [] in 0ms
> 2017-09-22 14:15:18 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-14] completed partitions []
> 2017-09-22 14:15:19 INFO  StreamThread:686 - stream-thread
> [argyle-streams-fp-StreamThread-16] Committed all active tasks [0_1] and
> standby tasks [] in 0ms
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-16] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-6] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-17] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-23] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-2] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-15] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-30] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-22] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-27] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-33] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-3] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-19] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-10] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-4] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 - stream-thread
> [argyle-streams-fp-StreamThread-8] completed partitions []
> 2017-09-22 14:15:19 DEBUG StoreChangelogReader:194 

Re: Kafka Streams application Unable to Horizontally scale and the application on other instances refusing to start.

2017-09-15 Thread Damian Guy
Grepping for StreamThread would be useful, though some logs will be over
multiple lines.
We need to see which partitions have been assigned to the other
instances/threads; it will look something like:

2017-09-14 10:04:22 INFO  StreamThread:160 - stream-thread
[myKafka-kafkareplica101Sept08-cb392e38-1e78-4ab6-9143-eb6bc6ec8219-StreamThread-1]
at state PARTITIONS_REVOKED: new partitions [MYTOPIC05SEPT-24,
MYTOPIC05SEPT-0] assigned at the end of consumer rebalance.
assigned active tasks: [0_0, 0_24]
assigned standby tasks: []
current suspended active tasks: [0_0, 0_1, 0_2, 0_3, 0_4, 0_5,
0_6, 0_7, 0_8, 0_9, 0_10, 0_11, 0_12, 0_13, 0_14, 0_15, 0_16, 0_17,
0_18, 0_19, 0_20, 0_21, 0_22, 0_23, 0_24, 0_25, 0_26, 0_27, 0_28,
0_29, 0_30, 0_31, 0_32, 0_33, 0_34, 0_35]
current suspended standby tasks: []
previous active tasks: [0_0, 0_1, 0_2, 0_3, 0_4, 0_5, 0_6,
0_7, 0_8, 0_9, 0_10, 0_11, 0_12, 0_13, 0_14, 0_15, 0_16, 0_17, 0_18,
0_19, 0_20, 0_21, 0_22, 0_23, 0_24, 0_25, 0_26, 0_27, 0_28, 0_29,
0_30, 0_31, 0_32, 0_33, 0_34, 0_35]


I also noticed that one of your instances looks like it is configured with
12 threads, while the others have 4 - is that correct?

On Fri, 15 Sep 2017 at 10:27 dev loper <spark...@gmail.com> wrote:

> Hi Damian,
>
> I do have the logs for the other application. But its kind of huge since it
> is continuously processing . Do you want me to grep anything specific and
> share it with you ?
>
> Thanks
> Dev
>
> On Fri, Sep 15, 2017 at 2:31 PM, Damian Guy <damian@gmail.com> wrote:
>
> > Hi,
> >
> > Do you have the logs for the other instance?
> >
> > Thanks,
> > Damian
> >
> > On Fri, 15 Sep 2017 at 07:19 dev loper <spark...@gmail.com> wrote:
> >
> > > Dear Kafka Users,
> > >
> > > I am fairly new to Kafka Streams . I have deployed two instances of
> Kafka
> > > 0.11 brokers on AWS M3.Xlarge insatnces. I have created a topic with 36
> > > partitions .and speperate application writes to this topic and it
> > produces
> > > records at the rate of 1 messages per second. I have threes
> instances
> > > of AWS  M4.xlarge instance  where my Kafka streams application is
> running
> > > which consumes these messages produced by the other application. The
> > > application  starts up fine working fine and its processing messages on
> > the
> > > first instance,  but when I start the same application on other
> instances
> > > it is not starting even though the process is alive it is not
> processing
> > > messages.Also I could see the other instances takes a long time to
> start
> > .
> > >
> > > Apart from first instance,  other instances I could see the consumer
> > > getting added and removed repeatedly and I couldn't see any message
> > > processing at all . I have attached the detailed logs where this
> behavior
> > > is observed.
> > >
> > > Consumer is getting started with below log in these instances and
> getting
> > > stopped with below log (* detailed logs attached *)
> > >
> > > INFO  | 21:59:30 | consumer.ConsumerConfig (AbstractConfig.java:223) -
> > > ConsumerConfig values:
> > > auto.commit.interval.ms = 5000
> > > auto.offset.reset = latest
> > > bootstrap.servers = [l-mykafkainstancekafka5101:9092,
> > > l-mykafkainstancekafka5102:9092]
> > > check.crcs = true
> > > client.id =
> > > connections.max.idle.ms = 54
> > > enable.auto.commit = false
> > > exclude.internal.topics = true
> > > fetch.max.bytes = 52428800
> > > fetch.max.wait.ms = 500
> > > fetch.min.bytes = 1
> > > group.id = myKafka-kafkareplica101Sept08
> > > heartbeat.interval.ms = 3000
> > > interceptor.classes = null
> > > internal.leave.group.on.close = true
> > > isolation.level = read_uncommitted
> > > key.deserializer = class mx.july.jmx.proximity.kafka.KafkaKryoCodec
> > > max.partition.fetch.bytes = 1048576
> > > max.poll.interval.ms = 30
> > > max.poll.records = 500
> > > metadata.max.age.ms = 30
> > > metric.reporters = []
> > > metrics.num.samples = 2
> > > metrics.recording.level = INFO
> > > metrics.sample.window.ms = 3
> > > partition.assignment.strategy = [class
> > > org.apache.kafka.clients.consumer.RangeAssignor]
> > > receive.buffer.bytes = 65536
> > > reconnect.backoff.max.ms = 1000
> > >   

Re: Kafka Streams application Unable to Horizontally scale and the application on other instances refusing to start.

2017-09-15 Thread Damian Guy
Hi,

Do you have the logs for the other instance?

Thanks,
Damian

On Fri, 15 Sep 2017 at 07:19 dev loper  wrote:

> Dear Kafka Users,
>
> I am fairly new to Kafka Streams. I have deployed two instances of Kafka
> 0.11 brokers on AWS M3.Xlarge instances. I have created a topic with 36
> partitions, and a separate application writes to this topic, producing
> records at the rate of 1 messages per second. I have three instances
> of AWS M4.xlarge where my Kafka Streams application runs and consumes
> the messages produced by the other application. The application starts
> up fine and processes messages on the first instance, but when I start
> the same application on other instances it does not start; even though
> the process is alive it is not processing messages. Also I could see
> the other instances take a long time to start.
>
> Apart from first instance,  other instances I could see the consumer
> getting added and removed repeatedly and I couldn't see any message
> processing at all . I have attached the detailed logs where this behavior
> is observed.
>
> Consumer is getting started with below log in these instances and getting
> stopped with below log (* detailed logs attached *)
>
> INFO  | 21:59:30 | consumer.ConsumerConfig (AbstractConfig.java:223) -
> ConsumerConfig values:
> auto.commit.interval.ms = 5000
> auto.offset.reset = latest
> bootstrap.servers = [l-mykafkainstancekafka5101:9092,
> l-mykafkainstancekafka5102:9092]
> check.crcs = true
> client.id =
> connections.max.idle.ms = 54
> enable.auto.commit = false
> exclude.internal.topics = true
> fetch.max.bytes = 52428800
> fetch.max.wait.ms = 500
> fetch.min.bytes = 1
> group.id = myKafka-kafkareplica101Sept08
> heartbeat.interval.ms = 3000
> interceptor.classes = null
> internal.leave.group.on.close = true
> isolation.level = read_uncommitted
> key.deserializer = class mx.july.jmx.proximity.kafka.KafkaKryoCodec
> max.partition.fetch.bytes = 1048576
> max.poll.interval.ms = 30
> max.poll.records = 500
> metadata.max.age.ms = 30
> metric.reporters = []
> metrics.num.samples = 2
> metrics.recording.level = INFO
> metrics.sample.window.ms = 3
> partition.assignment.strategy = [class
> org.apache.kafka.clients.consumer.RangeAssignor]
> receive.buffer.bytes = 65536
> reconnect.backoff.max.ms = 1000
> reconnect.backoff.ms = 50
> request.timeout.ms = 305000
> retry.backoff.ms = 100
> sasl.jaas.config = null
> sasl.kerberos.kinit.cmd = /usr/bin/kinit
> sasl.kerberos.min.time.before.relogin = 6
> sasl.kerberos.service.name = null
> sasl.kerberos.ticket.renew.jitter = 0.05
> sasl.kerberos.ticket.renew.window.factor = 0.8
> sasl.mechanism = GSSAPI
> security.protocol = PLAINTEXT
> send.buffer.bytes = 131072
> session.timeout.ms = 1
> ssl.cipher.suites = null
> ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
> ssl.endpoint.identification.algorithm = null
> ssl.key.password = null
> ssl.keymanager.algorithm = SunX509
> ssl.keystore.location = null
> ssl.keystore.password = null
> ssl.keystore.type = JKS
> ssl.protocol = TLS
> ssl.provider = null
> ssl.secure.random.implementation = null
> ssl.trustmanager.algorithm = PKIX
> ssl.truststore.location = null
> ssl.truststore.password = null
> ssl.truststore.type = JKS
> value.deserializer = class my.dev.MessageUpdateCodec
>
>
> DEBUG | 21:59:30 | consumer.KafkaConsumer (KafkaConsumer.java:1617) - The
> Kafka consumer has closed. and the whole process repeats.
>
>
>
> Below you can find my startup code for kafkastreams and the parameters
> which I have configured for starting the kafkastreams application .
>
> private static Properties settings = new Properties();
> settings.put(StreamsConfig.APPLICATION_ID_CONFIG,
> "mykafkastreamsapplication");
> settings.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,"latest");
> settings.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG,"1");
> settings.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG,"3");
>
> settings.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG,Integer.MAX_VALUE);
> settings.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "1");
>
> settings.put(ConsumerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG,"6");
>
> KStreamBuilder builder = new KStreamBuilder();
> KafkaStreams streams = new KafkaStreams(builder, settings);
> builder.addSource(.
>  .addProcessor  .
>  .addProcessor  
>
>
> .addStateStore(...).persistent().build(),"myprocessor")
>  .addSink ..
>  . addSink ..
>   streams.start();
>
> and I am using a Simple  processor to process my logic ..
>
> public 

[ANNOUNCE] Apache Kafka 0.11.0.1 Released

2017-09-13 Thread Damian Guy
The Apache Kafka community is pleased to announce the release for Apache
Kafka 0.11.0.1. This is a bug fix release that fixes 51 issues in 0.11.0.0.

All of the changes in this release can be found in the release notes:
https://archive.apache.org/dist/kafka/0.11.0.1/RELEASE_NOTES.html

Apache Kafka is a distributed streaming platform with four core APIs:

** The Producer API allows an application to publish a stream of records to
one or more Kafka topics.

** The Consumer API allows an application to subscribe to one or more
topics and process the stream of records produced to them.

** The Streams API allows an application to act as a stream processor,
consuming an input stream from one or more topics and producing an output
stream to one or more output topics, effectively transforming the input
streams to output streams.

** The Connector API allows building and running reusable producers or
consumers that connect Kafka topics to existing applications or data
systems. For example, a connector to a relational database might capture
every change to a table.


With these APIs, Kafka can be used for two broad classes of application:

** Building real-time streaming data pipelines that reliably get data
between systems or applications.

** Building real-time streaming applications that transform or react to the
streams of data.


You can download the source release from
https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.1/kafka-0.11.0.1-src.tgz

and binary releases from
https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.1/kafka_2.11-0.11.0.1.tgz
https://www.apache.org/dyn/closer.cgi?path=/kafka/0.11.0.1/kafka_2.12-0.11.0.1.tgz


A big thank you for the following 33 contributors to this release!

Apurva Mehta, Bill Bejeck, Colin P. Mccabe, Damian Guy, Derrick Or, Dong
Lin, dongeforever, Eno Thereska, Ewen Cheslack-Postava, Gregor Uhlenheuer,
Guozhang Wang, Hooman Broujerdi, huxihx, Ismael Juma, Jan Burkhardt, Jason
Gustafson, Jeff Klukas, Jiangjie Qin, Joel Dice, Konstantine Karantasis,
Manikumar Reddy, Matthias J. Sax, Max Zheng, Paolo Patierno, ppatierno,
radzish, Rajini Sivaram, Randall Hauch, Robin Moffatt, Stephane Roset,
umesh chaudhary, Vahid Hashemian, Xavier Léauté

We welcome your help and feedback. For more information on how to
report problems, and to get involved, visit the project website at
http://kafka.apache.org/


Thanks,
Damian


Re: [VOTE] 0.11.0.1 RC0

2017-09-11 Thread Damian Guy
However, it looks like this section:

 tree streams.examples
streams-quickstart
├── pom.xml
└── src
└── main
├── java
│   └── myapps
│   ├── LineSplit.java
│   ├── Pipe.java
│   └── WordCount.java
└── resources
└── log4j.properties

Doesn't render properly - at least for me.

On Mon, 11 Sep 2017 at 09:08 Damian Guy <damian@gmail.com> wrote:

> Hi Guozhang, from what i'm looking at the {{fullDotVersion}} is replaced
> with 0.11.0.1
>
> On Sat, 9 Sep 2017 at 00:55 Guozhang Wang <wangg...@gmail.com> wrote:
>
>> Verified the quickstart and streams tutorial, +1.
>>
>> One minor comment on the web docs of streams tutorial, that when I edited
>> the mvn archetype command I used the template values as:
>>
>> ```
>>
>> 
>> mvn archetype:generate \
>> -DarchetypeGroupId=org.apache.kafka \
>> -DarchetypeArtifactId=streams-quickstart-java \
>> -DarchetypeVersion={{fullDotVersion}} \
>> -DgroupId=streams.examples \
>> -DartifactId=streams.examples \
>> -Dversion=0.1 \
>> -Dpackage=myapps
>> 
>>
>> ```
>>
>> However the {{fullDotVersion}} does auto-render under , at least
>> on my local apache server. Is that also the case on your end Damian?
>>
>>
>>
>> Guozhang
>>
>>
>>
>> On Thu, Sep 7, 2017 at 2:20 AM, Magnus Edenhill <mag...@edenhill.se>
>> wrote:
>>
>> > +1 (non-binding)
>> >
>> > Verified with librdkafka regression test suite
>> >
>> > 2017-09-06 11:52 GMT+02:00 Damian Guy <damian@gmail.com>:
>> >
>> > > Resending as i wasn't part of the kafka-clients mailing list
>> > >
>> > > On Tue, 5 Sep 2017 at 21:34 Damian Guy <damian@gmail.com> wrote:
>> > >
>> > > > Hello Kafka users, developers and client-developers,
>> > > >
>> > > > This is the first candidate for release of Apache Kafka 0.11.0.1.
>> > > >
>> > > > This is a bug fix release and it includes fixes and improvements
>> from
>> > 49
>> > > > JIRAs (including a few critical bugs).
>> > > >
>> > > > Release notes for the 0.11.0.1 release:
>> > > > http://home.apache.org/~damianguy/kafka-0.11.0.1-rc0/
>> > RELEASE_NOTES.html
>> > > >
>> > > > *** Please download, test and vote by Saturday, September 9, 9am PT
>> > > >
>> > > > Kafka's KEYS file containing PGP keys we use to sign the release:
>> > > > http://kafka.apache.org/KEYS
>> > > >
>> > > > * Release artifacts to be voted upon (source and binary):
>> > > > http://home.apache.org/~damianguy/kafka-0.11.0.1-rc0/
>> > > >
>> > > > * Maven artifacts to be voted upon:
>> > > > https://repository.apache.org/content/groups/staging/
>> > > >
>> > > > * Javadoc:
>> > > > http://home.apache.org/~damianguy/kafka-0.11.0.1-rc0/javadoc/
>> > > >
>> > > > * Tag to be voted upon (off 0.11.0 branch) is the 0.11.0.1 tag:
>> > > >
>> > > > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
>> > > a8aa61266aedcf62e45b3595a2cf68c819ca1a6c
>> > > >
>> > > >
>> > > > * Documentation:
>> > > > Note the documentation can't be pushed live due to changes that will
>> > not
>> > > > go live until the release. You can manually verify by downloading
>> > > > http://home.apache.org/~damianguy/kafka-0.11.0.1-rc0/
>> > > kafka_2.11-0.11.0.1-site-docs.tgz
>> > > >
>> > > >
>> > > > * Protocol:
>> > > > http://kafka.apache.org/0110/protocol.html
>> > > >
>> > > > * Successful Jenkins builds for the 0.11.0 branch:
>> > > > Unit/integration tests:
>> > > > https://builds.apache.org/job/kafka-0.11.0-jdk7/298
>> > > >
>> > > > System tests:
>> > > > http://confluent-kafka-0-11-0-system-test-results.s3-us-
>> > > west-2.amazonaws.com/2017-09-05--001.1504612096--apache--0.
>> > > 11.0--7b6e5f9/report.html
>> > > >
>> > > > /**
>> > > >
>> > > > Thanks,
>> > > > Damian
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> -- Guozhang
>>
>


Re: [VOTE] 0.11.0.1 RC0

2017-09-11 Thread Damian Guy
Hi Guozhang, from what I'm looking at, the {{fullDotVersion}} is replaced
with 0.11.0.1.

On Sat, 9 Sep 2017 at 00:55 Guozhang Wang <wangg...@gmail.com> wrote:

> Verified the quickstart and streams tutorial, +1.
>
> One minor comment on the web docs of streams tutorial, that when I edited
> the mvn archetype command I used the template values as:
>
> ```
>
> 
> mvn archetype:generate \
> -DarchetypeGroupId=org.apache.kafka \
> -DarchetypeArtifactId=streams-quickstart-java \
> -DarchetypeVersion={{fullDotVersion}} \
> -DgroupId=streams.examples \
> -DartifactId=streams.examples \
> -Dversion=0.1 \
> -Dpackage=myapps
> 
>
> ```
>
> However the {{fullDotVersion}} does auto-render under , at least
> on my local apache server. Is that also the case on your end Damian?
>
>
>
> Guozhang
>
>
>
> On Thu, Sep 7, 2017 at 2:20 AM, Magnus Edenhill <mag...@edenhill.se>
> wrote:
>
> > +1 (non-binding)
> >
> > Verified with librdkafka regression test suite
> >
> > 2017-09-06 11:52 GMT+02:00 Damian Guy <damian@gmail.com>:
> >
> > > Resending as i wasn't part of the kafka-clients mailing list
> > >
> > > On Tue, 5 Sep 2017 at 21:34 Damian Guy <damian@gmail.com> wrote:
> > >
> > > > Hello Kafka users, developers and client-developers,
> > > >
> > > > This is the first candidate for release of Apache Kafka 0.11.0.1.
> > > >
> > > > This is a bug fix release and it includes fixes and improvements from
> > 49
> > > > JIRAs (including a few critical bugs).
> > > >
> > > > Release notes for the 0.11.0.1 release:
> > > > http://home.apache.org/~damianguy/kafka-0.11.0.1-rc0/
> > RELEASE_NOTES.html
> > > >
> > > > *** Please download, test and vote by Saturday, September 9, 9am PT
> > > >
> > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > http://kafka.apache.org/KEYS
> > > >
> > > > * Release artifacts to be voted upon (source and binary):
> > > > http://home.apache.org/~damianguy/kafka-0.11.0.1-rc0/
> > > >
> > > > * Maven artifacts to be voted upon:
> > > > https://repository.apache.org/content/groups/staging/
> > > >
> > > > * Javadoc:
> > > > http://home.apache.org/~damianguy/kafka-0.11.0.1-rc0/javadoc/
> > > >
> > > > * Tag to be voted upon (off 0.11.0 branch) is the 0.11.0.1 tag:
> > > >
> > > > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> > > a8aa61266aedcf62e45b3595a2cf68c819ca1a6c
> > > >
> > > >
> > > > * Documentation:
> > > > Note the documentation can't be pushed live due to changes that will
> > not
> > > > go live until the release. You can manually verify by downloading
> > > > http://home.apache.org/~damianguy/kafka-0.11.0.1-rc0/
> > > kafka_2.11-0.11.0.1-site-docs.tgz
> > > >
> > > >
> > > > * Protocol:
> > > > http://kafka.apache.org/0110/protocol.html
> > > >
> > > > * Successful Jenkins builds for the 0.11.0 branch:
> > > > Unit/integration tests:
> > > > https://builds.apache.org/job/kafka-0.11.0-jdk7/298
> > > >
> > > > System tests:
> > > > http://confluent-kafka-0-11-0-system-test-results.s3-us-
> > > west-2.amazonaws.com/2017-09-05--001.1504612096--apache--0.
> > > 11.0--7b6e5f9/report.html
> > > >
> > > > /**
> > > >
> > > > Thanks,
> > > > Damian
> > > >
> > >
> >
>
>
>
> --
> -- Guozhang
>


Re: KTable-KTable Join Semantics on NULL Key

2017-09-08 Thread Damian Guy
The table shows what happens when you get a null value for a key.

On Fri, 8 Sep 2017 at 12:31 Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi Kafka Users,
>
> KTable-KTable Join Semantics is explained in detail [here][1]. But
> it's not clear what happens when the input record has a null value: sometimes
> output records are generated and in some cases they are not.
>
> It will be helpful, if someone explain on how the output records are
> generated for all the 3 types of joins on receiving a record with NULL
> value.
>
> [1]: https://cwiki.apache.org/confluence/display/KAFKA/
> Kafka+Streams+Join+Semantics#KafkaStreamsJoinSemantics-KTable-KTableJoin
>
> -- Kamal
>


[VOTE] 0.11.0.1 RC0

2017-09-05 Thread Damian Guy
Hello Kafka users, developers and client-developers,

This is the first candidate for release of Apache Kafka 0.11.0.1.

This is a bug fix release and it includes fixes and improvements from 49
JIRAs (including a few critical bugs).

Release notes for the 0.11.0.1 release:
http://home.apache.org/~damianguy/kafka-0.11.0.1-rc0/RELEASE_NOTES.html

*** Please download, test and vote by Saturday, September 9, 9am PT

Kafka's KEYS file containing PGP keys we use to sign the release:
http://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
http://home.apache.org/~damianguy/kafka-0.11.0.1-rc0/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/

* Javadoc:
http://home.apache.org/~damianguy/kafka-0.11.0.1-rc0/javadoc/

* Tag to be voted upon (off 0.11.0 branch) is the 0.11.0.1 tag:
https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=a8aa61266aedcf62e45b3595a2cf68c819ca1a6c


* Documentation:
Note the documentation can't be pushed live due to changes that will not go
live until the release. You can manually verify by downloading
http://home.apache.org/~damianguy/kafka-0.11.0.1-rc0/kafka_2.11-0.11.0.1-site-docs.tgz


* Protocol:
http://kafka.apache.org/0110/protocol.html

* Successful Jenkins builds for the 0.11.0 branch:
Unit/integration tests: https://builds.apache.org/job/kafka-0.11.0-jdk7/298

System tests:
http://confluent-kafka-0-11-0-system-test-results.s3-us-west-2.amazonaws.com/2017-09-05--001.1504612096--apache--0.11.0--7b6e5f9/report.html

/**

Thanks,
Damian


Re: Potential Bug | GlobalStateManager checkpoint

2017-09-04 Thread Damian Guy
Thanks Sameer, yes this looks like a bug. Can you file a JIRA?

On Mon, 4 Sep 2017 at 12:23 Sameer Kumar  wrote:

> Hi,
>
> I am using InMemoryStore along with GlobalKTable. I came to realize that I
> was losing data once I restarted my stream application while it was
> consuming data from a Kafka topic, since it would always start from the last
> saved checkpoint. That works fine with RocksDB, it being a persistent store;
> an in-memory store should instead consume from the beginning.
>
> Debugging it further, I looked at the code for GlobalStateManagerImpl(this
> one works for GlobalKTable) and was comparing the same with
> ProcessorStateManagerImpl(this one works for KTable).
>
> In ProcessorStateManagerImpl.checkpoint there is a check that the state
> store is persistent before the checkpoint is written; the same check is not
> there in the GlobalStateManagerImpl.checkpoint method. Do you think the
> same check needs to be added to GlobalStateManagerImpl?
>
>   public void checkpoint(final Map<TopicPartition, Long> ackedOffsets) {
> log.trace("{} Writing checkpoint: {}", logPrefix, ackedOffsets);
> checkpointedOffsets.putAll(changelogReader.restoredOffsets());
> for (final Map.Entry<String, StateStore> entry : stores.entrySet()) {
> final String storeName = entry.getKey();
> // only checkpoint the offset to the offsets file if
> // it is persistent AND changelog enabled
> *if (entry.getValue().persistent() &&
> storeToChangelogTopic.containsKey(storeName)) {*
> final String changelogTopic = storeToChangelogTopic.get(
> storeName);
> final TopicPartition topicPartition = new
> TopicPartition(changelogTopic, getPartition(storeName));
> if (ackedOffsets.containsKey(topicPartition)) {
> // store the last offset + 1 (the log position after
> restoration)
> checkpointedOffsets.put(topicPartition,
> ackedOffsets.get(topicPartition) + 1);
> } else if (restoredOffsets.containsKey(topicPartition)) {
> checkpointedOffsets.put(topicPartition,
> restoredOffsets.get(topicPartition));
> }
> }
> }
> // write the checkpoint file before closing, to indicate clean
> shutdown
> try {
> if (checkpoint == null) {
> checkpoint = new OffsetCheckpoint(new File(baseDir,
> CHECKPOINT_FILE_NAME));
> }
> checkpoint.write(checkpointedOffsets);
> } catch (final IOException e) {
> log.warn("Failed to write checkpoint file to {}:", new
> File(baseDir, CHECKPOINT_FILE_NAME), e);
> }
> }
>
> Regards,
> -Sameer.
>


Re: Kafka streams application (v 0.10.0.1) stuck at close

2017-08-23 Thread Damian Guy
Hi,

If you can, then I'd recommend upgrading to a newer version. As you said,
many bugs have been fixed since 0.10.0.1.

On Wed, 23 Aug 2017 at 05:08 Balaprassanna Ilangovan <
balaprassanna1...@gmail.com> wrote:

> Hi,
>
> I have the following three question regarding Apache Kafka streams.
>
> 1. I am in a position to use v 0.10.0.1 of Apache Kafka though so many bugs
> related to streams are fixed in the later versions. My application consists
> of low level processors that run for more than an hour for larger files
> (video transcoding). So, we use a session timeout and request timeout of 2
> hrs. streams.close() is stuck for a long time even the processors are idle.
> Is there a reason? Is there a work around for this version?
>
>
There were some bugs to do with streams.close() in earlier versions that
may cause deadlocks. This may be the issue:
https://issues.apache.org/jira/browse/KAFKA-4366


> 2. Also, what does processorContext.commit() do exactly? Does it save the
> position of application in a topology or commit consumed message offset in
> the partition? Though commits are handled automatically by streams, should
> context.commit() be called at the end of each processor in a topology?
>
>
context.commit() just tells the Streams library to commit that task the next
time it goes through the poll loop. You don't need to call it unless you
specifically want to commit after you have processed one or more records;
otherwise commits are handled automatically via the commit.interval.ms config.
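
For illustration, here is a minimal sketch (not from the Kafka codebase or
your application; the class name and the forwarding logic are made up) of a
low-level Processor that requests a commit after each record:

    import org.apache.kafka.streams.processor.Processor;
    import org.apache.kafka.streams.processor.ProcessorContext;

    public class ExplicitCommitProcessor implements Processor<String, String> {
        private ProcessorContext context;

        @Override
        public void init(final ProcessorContext context) {
            this.context = context;
        }

        @Override
        public void process(final String key, final String value) {
            // ... the actual per-record work goes here, e.g. context.forward(key, value) ...
            context.commit(); // only requests a commit; it happens on the next poll loop iteration
        }

        @Override
        public void punctuate(final long timestamp) { }

        @Override
        public void close() { }
    }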


> 3. In a topology, if two processor completes successfully and the
> application goes down during third processor execution, does it start from
> first when the application comes back?
>

Each task will start from the last committed position. So if that was all
in a single thread, then it will start from the beginning again.


> --
> With Regards,
> Bala Prassanna I,
> 8124831208
>


Re: Global KTable value is null in Kafka Stream left join

2017-08-18 Thread Damian Guy
Hi,

If the userData value is null then that would usually mean that there
wasn't a record with the provided key in the global table. So you should
probably check if you have the expected data in the global table and also
check that your KeyMapper is returning the correct key.
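
As a hedged illustration (not your actual code; DataLog, UserData and
getUserId() are stand-ins), the second argument to leftJoin produces the
lookup key, and the joiner's second parameter is null whenever that key is
absent from the global table:

    import org.apache.kafka.streams.kstream.GlobalKTable;
    import org.apache.kafka.streams.kstream.KStream;

    static KStream<String, String> enrich(final KStream<String, DataLog> events,
                                          final GlobalKTable<String, UserData> users) {
        return events.leftJoin(
                users,
                (eventId, log) -> log.getUserId(),   // must match the GlobalKTable's key exactly
                (log, user) -> log + " / " + user);  // user == null when no matching key was found
    }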

Thanks,
Damian



On Fri, 18 Aug 2017 at 12:13 Duy Truong  wrote:

> Hi everyone,
>
> When using left join, I can't get the value of Global KTable record in
> ValueJoiner parameter (3rd parameter). Here is my code:
>
> val userTable: GlobalKTable[String, UserData] =
> builder.globalTable(Serdes.String(), userDataSede, userTopic,
> userDataStore)
>
> val jvnStream: KStream[String, JVNModel] = sourceStream.leftJoin(userTable,
>   (eventId: String, dataLog: DataLog) => {
> dataLog.rawData.userId
>   },
>   (dataLog, userData: UserData) => {
> // userData is null.
>
>   })
>
> What I have to do to resolve this issue?
>
> Thanks
> --
> *Duy Truong*
>


Re: Continue to consume messages when exception occurs in Kafka Stream

2017-08-18 Thread Damian Guy
Duy, if it is in your logic then you need to handle the exception yourself.
If you don't, it will bubble out and kill the thread.
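
For example, a hedged sketch (the method name and the toUpperCase stand-in are
illustrative, not your code) of catching the exception inside the operator so
the bad record is skipped rather than killing the stream thread:

    import java.util.Collections;
    import org.apache.kafka.streams.kstream.KStream;

    static KStream<String, String> safely(final KStream<String, String> input) {
        return input.flatMapValues(value -> {
            try {
                return Collections.singletonList(value.toUpperCase()); // stand-in for the real logic
            } catch (final RuntimeException e) {
                // log and drop the record (or forward it to a dead-letter topic) instead of rethrowing
                return Collections.emptyList();
            }
        });
    }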

On Fri, 18 Aug 2017 at 10:27 Duy Truong  wrote:

> Hi Eno,
>
> Sorry for late reply, it's not a deserialization exception, it's a pattern
> matching exception in my logic.
>
> val jvnStream: KStream[String, JVNModel] = sourceStream.leftJoin(userTable,
>   (eventId: String, datatup: (DataLog, Option[CrawlData])) => {
> datatup._1.rawData.userId
>   },
>   (tuple, fbData: FacebookData) => {
> val (dmpData, Some(crawData)) = tuple // exception here
>
> // something here
>
>   })
>
> Thanks
>
>
> On Thu, Aug 17, 2017 at 11:11 PM, Duy Truong 
> wrote:
>
> > Hi everyone,
> >
> > My kafka stream app has an exception (my business exception), and then it
> > doesn't consume messages anymore. Is there any way to make my app
> continues
> > consume messages when the exception occurs?
> >
> > Thanks
> >
> > --
> > *Duy Truong*
> >
>
>
>
> --
> *Duy Truong*
>


Re: RocksDB error

2017-08-16 Thread Damian Guy
I see. It is the same issue, though. The problem is that Long.MAX_VALUE is
actually too large: it causes an overflow, so the cleanup task will still run,
i.e., in this bit of code:

    if (now > lastCleanMs + cleanTimeMs) {
        stateDirectory.cleanRemovedTasks(cleanTimeMs);
        lastCleanMs = now;
    }

So, you will need to set it to a value large enough to effectively disable it,
but not Long.MAX_VALUE (sorry)
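
To make the overflow concrete, here is a small illustration plus a sketch of
the workaround (the 3650-day figure is just an arbitrary "large but safe"
example, not a recommended value):

    import java.util.Properties;
    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.streams.StreamsConfig;

    public class CleanupDelayExample {
        public static void main(final String[] args) {
            final long now = System.currentTimeMillis();
            final long lastCleanMs = now;                    // cleaner initialised recently
            final long cleanTimeMs = Long.MAX_VALUE;         // "disabled" via state.cleanup.delay.ms
            final long deadline = lastCleanMs + cleanTimeMs; // overflows to a large negative number
            System.out.println(now > deadline);              // prints true -> cleanup would still run

            // Workaround: a delay that is huge but cannot overflow when added to the clock.
            final Properties props = new Properties();
            props.put(StreamsConfig.STATE_CLEANUP_DELAY_MS_CONFIG, TimeUnit.DAYS.toMillis(3650));
        }
    }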

On Wed, 16 Aug 2017 at 10:21 Sameer Kumar <sam.kum.w...@gmail.com> wrote:

> I have already set this configuration. This info is there in logs as well.
>
> state.cleanup.delay.ms = 9223372036854775807
>
> -Sameer.
>
> On Wed, Aug 16, 2017 at 1:56 PM, Damian Guy <damian@gmail.com> wrote:
>
> > I believe it is related to a bug in the state directory cleanup. This has
> > been fixed on trunk and also on the 0.11 branch (will be part of 0.11.0.1
> > that will hopefully be released soon). The fix is in this JIRA:
> > https://issues.apache.org/jira/browse/KAFKA-5562
> >
> > To work around it you should set
> > StreamsConfig.STATE_CLEANUP_DELAY_MS_CONFIG to Long.MAX_VALUE, i.e.,
> > disable it.
> >
> > Thanks,
> > Damian
> >
> >
> > On Wed, 16 Aug 2017 at 07:39 Sameer Kumar <sam.kum.w...@gmail.com>
> wrote:
> >
> > > Hi Damian,
> > >
> > > Please find the relevant logs as requested. Let me know if you need any
> > > more info.
> > >
> > > -Sameer.
> > >
> > > On Tue, Aug 15, 2017 at 9:33 PM, Damian Guy <damian@gmail.com>
> > wrote:
> > >
> > >> Sameer, the log you attached doesn't contain the logs *before* the
> > >
> > >
> > >> exception happened.
> > >>
> > >> On Tue, 15 Aug 2017 at 06:13 Sameer Kumar <sam.kum.w...@gmail.com>
> > wrote:
> > >>
> > >> > I have added a attachement containing complete trace in my initial
> > mail.
> > >> >
> > >> > On Mon, Aug 14, 2017 at 9:47 PM, Damian Guy <damian@gmail.com>
> > >> wrote:
> > >> >
> > >> > > Do you have the logs leading up to the exception?
> > >> > >
> > >> > > On Mon, 14 Aug 2017 at 06:52 Sameer Kumar <sam.kum.w...@gmail.com
> >
> > >> > wrote:
> > >> > >
> > >> > > > Exception while doing the join, cant decipher more on this. Has
> > >> anyone
> > >> > > > faced it. complete exception trace attached.
> > >> > > >
> > >> > > > 2017-08-14 11:15:55 ERROR ConsumerCoordinator:269 - User
> provided
> > >> > > listener
> > >> > > > org.apache.kafka.streams.processor.internals.
> > >> > > StreamThread$RebalanceListener
> > >> > > > for group c-7-a34 failed on partition assignment
> > >> > > > org.apache.kafka.streams.errors.ProcessorStateException: Error
> > >> opening
> > >> > > > store KSTREAM-JOINTHIS-18-store-201708140520 at location
> > >> > > > /data/streampoc/c-7-a34/0_4/KSTREAM-JOINTHIS-18-
> > >> > > store/KSTREAM-JOINTHIS-18-store-201708140520
> > >> > > > at
> > >> > > > org.apache.kafka.streams.state.internals.RocksDBStore.
> > >> > > openDB(RocksDBStore.java:198)
> > >> > > > at
> > >> > > > org.apache.kafka.streams.state.internals.RocksDBStore.
> > >> > > openDB(RocksDBStore.java:165)
> > >> > > > at
> > >> > > >
> > >> > org.apache.kafka.streams.state.internals.Segment.
> > openDB(Segment.java:40)
> > >> > > > at
> > >> > > > org.apache.kafka.streams.state.internals.Segments.
> > >> > > getOrCreateSegment(Segments.java:86)
> > >> > > > at
> > >> > > >
> > >> > org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStore.
> > put(
> > >> > > RocksDBSegmentedBytesStore.java:81)
> > >> > > > at
> > >> > > >
> > >> org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStore$1.
> > >> > > restore(RocksDBSegmentedBytesStore.java:113)
> > >> > > > at
> > >> > > > org.apache.kafka.streams.proce

Re: RocksDB error

2017-08-16 Thread Damian Guy
I believe it is related to a bug in the state directory cleanup. This has
been fixed on trunk and also on the 0.11 branch (will be part of 0.11.0.1
that will hopefully be released soon). The fix is in this JIRA:
https://issues.apache.org/jira/browse/KAFKA-5562

To work around it you should set
StreamsConfig.STATE_CLEANUP_DELAY_MS_CONFIG to Long.MAX_VALUE, i.e.,
disable it.

Thanks,
Damian


On Wed, 16 Aug 2017 at 07:39 Sameer Kumar <sam.kum.w...@gmail.com> wrote:

> Hi Damian,
>
> Please find the relevant logs as requested. Let me know if you need any
> more info.
>
> -Sameer.
>
> On Tue, Aug 15, 2017 at 9:33 PM, Damian Guy <damian@gmail.com> wrote:
>
>> Sameer, the log you attached doesn't contain the logs *before* the
>
>
>> exception happened.
>>
>> On Tue, 15 Aug 2017 at 06:13 Sameer Kumar <sam.kum.w...@gmail.com> wrote:
>>
>> > I have added a attachement containing complete trace in my initial mail.
>> >
>> > On Mon, Aug 14, 2017 at 9:47 PM, Damian Guy <damian@gmail.com>
>> wrote:
>> >
>> > > Do you have the logs leading up to the exception?
>> > >
>> > > On Mon, 14 Aug 2017 at 06:52 Sameer Kumar <sam.kum.w...@gmail.com>
>> > wrote:
>> > >
>> > > > Exception while doing the join, cant decipher more on this. Has
>> anyone
>> > > > faced it. complete exception trace attached.
>> > > >
>> > > > 2017-08-14 11:15:55 ERROR ConsumerCoordinator:269 - User provided
>> > > listener
>> > > > org.apache.kafka.streams.processor.internals.
>> > > StreamThread$RebalanceListener
>> > > > for group c-7-a34 failed on partition assignment
>> > > > org.apache.kafka.streams.errors.ProcessorStateException: Error
>> opening
>> > > > store KSTREAM-JOINTHIS-18-store-201708140520 at location
>> > > > /data/streampoc/c-7-a34/0_4/KSTREAM-JOINTHIS-18-
>> > > store/KSTREAM-JOINTHIS-18-store-201708140520
>> > > > at
>> > > > org.apache.kafka.streams.state.internals.RocksDBStore.
>> > > openDB(RocksDBStore.java:198)
>> > > > at
>> > > > org.apache.kafka.streams.state.internals.RocksDBStore.
>> > > openDB(RocksDBStore.java:165)
>> > > > at
>> > > >
>> > org.apache.kafka.streams.state.internals.Segment.openDB(Segment.java:40)
>> > > > at
>> > > > org.apache.kafka.streams.state.internals.Segments.
>> > > getOrCreateSegment(Segments.java:86)
>> > > > at
>> > > >
>> > org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStore.put(
>> > > RocksDBSegmentedBytesStore.java:81)
>> > > > at
>> > > >
>> org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStore$1.
>> > > restore(RocksDBSegmentedBytesStore.java:113)
>> > > > at
>> > > > org.apache.kafka.streams.processor.internals.StateRestorer.restore(
>> > > StateRestorer.java:55)
>> > > > at
>> > > > org.apache.kafka.streams.processor.internals.StoreChangelogReader.
>> > > processNext(StoreChangelogReader.java:216)
>> > > > at
>> > > > org.apache.kafka.streams.processor.internals.StoreChangelogReader.
>> > > restorePartition(StoreChangelogReader.java:186)
>> > > > at
>> > > > org.apache.kafka.streams.processor.internals.
>> > > StoreChangelogReader.restore(StoreChangelogReader.java:151)
>> > > > at
>> > > > org.apache.kafka.streams.processor.internals.StreamThread$
>> > > RebalanceListener.onPartitionsAssigned(StreamThread.java:184)
>> > > > at
>> > > > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.
>> > > onJoinComplete(ConsumerCoordinator.java:265)
>> > > > at
>> > > > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
>> > > joinGroupIfNeeded(AbstractCoordinator.java:363)
>> > > > at
>> > > > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
>> > > ensureActiveGroup(AbstractCoordinator.java:310)
>> > > > at
>> > > >
>> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(
>> > > ConsumerCoordinator.java:297)
>> > > > at
>> > > > org.apache.kafka.clients.consumer.KafkaConsumer.
>> > > pollOnce(KafkaConsumer.java:1078)
>> > > > at
>> > > > org.apache.kafka.clients.consumer.KafkaConsumer.poll(
>> > > KafkaConsumer.java:1043)
>> > > > at
>> > > >
>> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(
>> > > StreamThread.java:582)
>> > > > at
>> > > > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(
>> > > StreamThread.java:553)
>> > > > at
>> > > > org.apache.kafka.streams.processor.internals.
>> > > StreamThread.run(StreamThread.java:527)
>> > > > Caused by: org.rocksdb.RocksDBException:
>> > > > at org.rocksdb.RocksDB.open(Native Method)
>> > > > at org.rocksdb.RocksDB.open(RocksDB.java:231)
>> > > > at
>> > > > org.apache.kafka.streams.state.internals.RocksDBStore.
>> > > openDB(RocksDBStore.java:191)
>> > > > ... 19 more
>> > > >
>> > > > -sameer.
>> > > >
>> > >
>> >
>>
>


Re: RocksDB error

2017-08-15 Thread Damian Guy
Sameer, the log you attached doesn't contain the logs *before* the
exception happened.

On Tue, 15 Aug 2017 at 06:13 Sameer Kumar <sam.kum.w...@gmail.com> wrote:

> I have added an attachment containing the complete trace in my initial mail.
>
> On Mon, Aug 14, 2017 at 9:47 PM, Damian Guy <damian@gmail.com> wrote:
>
> > Do you have the logs leading up to the exception?
> >
> > On Mon, 14 Aug 2017 at 06:52 Sameer Kumar <sam.kum.w...@gmail.com>
> wrote:
> >
> > > Exception while doing the join, cant decipher more on this. Has anyone
> > > faced it. complete exception trace attached.
> > >
> > > 2017-08-14 11:15:55 ERROR ConsumerCoordinator:269 - User provided
> > listener
> > > org.apache.kafka.streams.processor.internals.
> > StreamThread$RebalanceListener
> > > for group c-7-a34 failed on partition assignment
> > > org.apache.kafka.streams.errors.ProcessorStateException: Error opening
> > > store KSTREAM-JOINTHIS-18-store-201708140520 at location
> > > /data/streampoc/c-7-a34/0_4/KSTREAM-JOINTHIS-18-
> > store/KSTREAM-JOINTHIS-18-store-201708140520
> > > at
> > > org.apache.kafka.streams.state.internals.RocksDBStore.
> > openDB(RocksDBStore.java:198)
> > > at
> > > org.apache.kafka.streams.state.internals.RocksDBStore.
> > openDB(RocksDBStore.java:165)
> > > at
> > >
> org.apache.kafka.streams.state.internals.Segment.openDB(Segment.java:40)
> > > at
> > > org.apache.kafka.streams.state.internals.Segments.
> > getOrCreateSegment(Segments.java:86)
> > > at
> > >
> org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStore.put(
> > RocksDBSegmentedBytesStore.java:81)
> > > at
> > > org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStore$1.
> > restore(RocksDBSegmentedBytesStore.java:113)
> > > at
> > > org.apache.kafka.streams.processor.internals.StateRestorer.restore(
> > StateRestorer.java:55)
> > > at
> > > org.apache.kafka.streams.processor.internals.StoreChangelogReader.
> > processNext(StoreChangelogReader.java:216)
> > > at
> > > org.apache.kafka.streams.processor.internals.StoreChangelogReader.
> > restorePartition(StoreChangelogReader.java:186)
> > > at
> > > org.apache.kafka.streams.processor.internals.
> > StoreChangelogReader.restore(StoreChangelogReader.java:151)
> > > at
> > > org.apache.kafka.streams.processor.internals.StreamThread$
> > RebalanceListener.onPartitionsAssigned(StreamThread.java:184)
> > > at
> > > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.
> > onJoinComplete(ConsumerCoordinator.java:265)
> > > at
> > > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
> > joinGroupIfNeeded(AbstractCoordinator.java:363)
> > > at
> > > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
> > ensureActiveGroup(AbstractCoordinator.java:310)
> > > at
> > > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(
> > ConsumerCoordinator.java:297)
> > > at
> > > org.apache.kafka.clients.consumer.KafkaConsumer.
> > pollOnce(KafkaConsumer.java:1078)
> > > at
> > > org.apache.kafka.clients.consumer.KafkaConsumer.poll(
> > KafkaConsumer.java:1043)
> > > at
> > > org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(
> > StreamThread.java:582)
> > > at
> > > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(
> > StreamThread.java:553)
> > > at
> > > org.apache.kafka.streams.processor.internals.
> > StreamThread.run(StreamThread.java:527)
> > > Caused by: org.rocksdb.RocksDBException:
> > > at org.rocksdb.RocksDB.open(Native Method)
> > > at org.rocksdb.RocksDB.open(RocksDB.java:231)
> > > at
> > > org.apache.kafka.streams.state.internals.RocksDBStore.
> > openDB(RocksDBStore.java:191)
> > > ... 19 more
> > >
> > > -sameer.
> > >
> >
>


Re: RocksDB error

2017-08-14 Thread Damian Guy
Do you have the logs leading up to the exception?

On Mon, 14 Aug 2017 at 06:52 Sameer Kumar  wrote:

> Exception while doing the join, can't decipher more on this. Has anyone
> faced it? Complete exception trace attached.
>
> 2017-08-14 11:15:55 ERROR ConsumerCoordinator:269 - User provided listener
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener
> for group c-7-a34 failed on partition assignment
> org.apache.kafka.streams.errors.ProcessorStateException: Error opening
> store KSTREAM-JOINTHIS-18-store-201708140520 at location
> /data/streampoc/c-7-a34/0_4/KSTREAM-JOINTHIS-18-store/KSTREAM-JOINTHIS-18-store-201708140520
> at
> org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:198)
> at
> org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:165)
> at
> org.apache.kafka.streams.state.internals.Segment.openDB(Segment.java:40)
> at
> org.apache.kafka.streams.state.internals.Segments.getOrCreateSegment(Segments.java:86)
> at
> org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStore.put(RocksDBSegmentedBytesStore.java:81)
> at
> org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStore$1.restore(RocksDBSegmentedBytesStore.java:113)
> at
> org.apache.kafka.streams.processor.internals.StateRestorer.restore(StateRestorer.java:55)
> at
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.processNext(StoreChangelogReader.java:216)
> at
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.restorePartition(StoreChangelogReader.java:186)
> at
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:151)
> at
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:184)
> at
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265)
> at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:363)
> at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
> at
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
> at
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
> at
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:582)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)
> Caused by: org.rocksdb.RocksDBException:
> at org.rocksdb.RocksDB.open(Native Method)
> at org.rocksdb.RocksDB.open(RocksDB.java:231)
> at
> org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:191)
> ... 19 more
>
> -sameer.
>


Re: Kafka Streams Job | DirectoryNotEmptyException

2017-08-09 Thread Damian Guy
The issue was fixed by this:
https://issues.apache.org/jira/browse/KAFKA-5562
It is on trunk, but will likely be backported to 0.11.

On Wed, 9 Aug 2017 at 10:57 Damian Guy <damian@gmail.com> wrote:

> Hi,
>
> This is a bug in 0.11. You can work around it by setting
> StreamsConfig.STATE_CLEANUP_DELAY_MS_CONFIG to Long.MAX_VALUE
>
> Also, if you have logs it would be easier to either attach them or put
> them in a gist. It is a bit hard to read in an email.
>
> Thanks,
> Damian
>
> On Wed, 9 Aug 2017 at 10:10 Sameer Kumar <sam.kum.w...@gmail.com> wrote:
>
>> Hi All,
>>
>> I wrote a Kafka Streams job that went running for 3-4 hrs, after which it
>> started throwing these errors.Not sure why we got these errors.
>>
>> I am using Kafka 11.0 both on broker side as well as on consumer side.
>>
>> *Machine1*
>>
>> 2017-08-08 17:25:35 INFO  StreamThread:193 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-7]
>> partition
>> assignment took 1003 ms.
>> current active tasks: [0_24, 0_9]
>> current standby tasks: []
>> 2017-08-08 17:25:35 INFO  StoreChangelogReader:121 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-6] Starting
>> restoring state stores from changelog topics []
>> 2017-08-08 17:25:35 INFO  StreamThread:1345 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-6] Adding
>> assigned standby tasks {}
>> 2017-08-08 17:25:35 INFO  StreamThread:980 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-6] State
>> transition from ASSIGNING_PARTITIONS to RUNNING.
>> 2017-08-08 17:25:35 INFO  StreamThread:193 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-6]
>> partition
>> assignment took 1011 ms.
>> current active tasks: [0_22, 0_7]
>> current standby tasks: []
>> 2017-08-08 17:25:35 INFO  StoreChangelogReader:121 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-4] Starting
>> restoring state stores from changelog topics []
>> 2017-08-08 17:25:35 INFO  StreamThread:1345 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-4] Adding
>> assigned standby tasks {}
>> 2017-08-08 17:25:35 INFO  StreamThread:980 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-4] State
>> transition from ASSIGNING_PARTITIONS to RUNNING.
>> 2017-08-08 17:25:35 INFO  StreamThread:193 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-4]
>> partition
>> assignment took 1015 ms.
>> current active tasks: [0_19, 0_4]
>> current standby tasks: []
>> 2017-08-08 17:25:35 INFO  StoreChangelogReader:121 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-11]
>> Starting
>> restoring state stores from changelog topics []
>> 2017-08-08 17:25:35 INFO  StreamThread:1345 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-11] Adding
>> assigned standby tasks {}
>> 2017-08-08 17:25:35 INFO  StreamThread:980 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-11] State
>> transition from ASSIGNING_PARTITIONS to RUNNING.
>> 2017-08-08 17:25:35 INFO  StreamThread:193 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-11]
>> partition assignment took 1043 ms.
>> current active tasks: [0_20, 0_5]
>> current standby tasks: []
>> 2017-08-08 17:25:35 INFO  StoreChangelogReader:121 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-10]
>> Starting
>> restoring state stores from changelog topics []
>> 2017-08-08 17:25:35 INFO  StreamThread:1345 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-10] Adding
>> assigned standby tasks {}
>> 2017-08-08 17:25:35 INFO  StreamThread:980 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-10] State
>> transition from ASSIGNING_PARTITIONS to RUNNING.
>> 2017-08-08 17:25:35 INFO  StreamThread:193 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-10]
>> partition assignment took 1044 ms.
>> current active tasks: [0_26, 0_11]
>> current standby tasks: []
>> 2017-08-08 17:25:35 INFO  StoreChangelogReader:121 - stream-thread
>> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-13]
>> Starting
>> restoring state stores from changelog topics []
>> 2017-08-08 1

Re: Kafka Streams Job | DirectoryNotEmptyException

2017-08-09 Thread Damian Guy
Hi,

This is a bug in 0.11. You can work around it by setting
StreamsConfig.STATE_CLEANUP_DELAY_MS_CONFIG to Long.MAX_VALUE

Also, if you have logs it would be easier to either attach them or put them
in a gist. It is a bit hard to read in an email.

Thanks,
Damian

On Wed, 9 Aug 2017 at 10:10 Sameer Kumar  wrote:

> Hi All,
>
> I wrote a Kafka Streams job that went running for 3-4 hrs, after which it
> started throwing these errors.Not sure why we got these errors.
>
> I am using Kafka 11.0 both on broker side as well as on consumer side.
>
> *Machine1*
>
> 2017-08-08 17:25:35 INFO  StreamThread:193 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-7] partition
> assignment took 1003 ms.
> current active tasks: [0_24, 0_9]
> current standby tasks: []
> 2017-08-08 17:25:35 INFO  StoreChangelogReader:121 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-6] Starting
> restoring state stores from changelog topics []
> 2017-08-08 17:25:35 INFO  StreamThread:1345 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-6] Adding
> assigned standby tasks {}
> 2017-08-08 17:25:35 INFO  StreamThread:980 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-6] State
> transition from ASSIGNING_PARTITIONS to RUNNING.
> 2017-08-08 17:25:35 INFO  StreamThread:193 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-6] partition
> assignment took 1011 ms.
> current active tasks: [0_22, 0_7]
> current standby tasks: []
> 2017-08-08 17:25:35 INFO  StoreChangelogReader:121 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-4] Starting
> restoring state stores from changelog topics []
> 2017-08-08 17:25:35 INFO  StreamThread:1345 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-4] Adding
> assigned standby tasks {}
> 2017-08-08 17:25:35 INFO  StreamThread:980 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-4] State
> transition from ASSIGNING_PARTITIONS to RUNNING.
> 2017-08-08 17:25:35 INFO  StreamThread:193 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-4] partition
> assignment took 1015 ms.
> current active tasks: [0_19, 0_4]
> current standby tasks: []
> 2017-08-08 17:25:35 INFO  StoreChangelogReader:121 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-11] Starting
> restoring state stores from changelog topics []
> 2017-08-08 17:25:35 INFO  StreamThread:1345 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-11] Adding
> assigned standby tasks {}
> 2017-08-08 17:25:35 INFO  StreamThread:980 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-11] State
> transition from ASSIGNING_PARTITIONS to RUNNING.
> 2017-08-08 17:25:35 INFO  StreamThread:193 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-11]
> partition assignment took 1043 ms.
> current active tasks: [0_20, 0_5]
> current standby tasks: []
> 2017-08-08 17:25:35 INFO  StoreChangelogReader:121 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-10] Starting
> restoring state stores from changelog topics []
> 2017-08-08 17:25:35 INFO  StreamThread:1345 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-10] Adding
> assigned standby tasks {}
> 2017-08-08 17:25:35 INFO  StreamThread:980 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-10] State
> transition from ASSIGNING_PARTITIONS to RUNNING.
> 2017-08-08 17:25:35 INFO  StreamThread:193 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-10]
> partition assignment took 1044 ms.
> current active tasks: [0_26, 0_11]
> current standby tasks: []
> 2017-08-08 17:25:35 INFO  StoreChangelogReader:121 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-13] Starting
> restoring state stores from changelog topics []
> 2017-08-08 17:25:35 INFO  StreamThread:1345 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-13] Adding
> assigned standby tasks {}
> 2017-08-08 17:25:35 INFO  StreamThread:980 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-13] State
> transition from ASSIGNING_PARTITIONS to RUNNING.
> 2017-08-08 17:25:35 INFO  StreamThread:193 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-13]
> partition assignment took 1053 ms.
> current active tasks: [0_25, 0_10]
> current standby tasks: []
> 2017-08-08 17:25:35 INFO  StoreChangelogReader:121 - stream-thread
> [LICSp-7-a20-f9405688-4aba-4971-9f5e-b029756dcdce-StreamThread-12] Starting
> restoring state stores from changelog topics []
> 2017-08-08 17:25:35 INFO  StreamThread:1345 - stream-thread
> 

Re: [kafka streams] 'null' values in state stores

2017-08-08 Thread Damian Guy
The change logger is not used during restoration of the in-memory store.
Restoration is handled here:
https://github.com/apache/kafka/blob/0.11.0/streams/src/main/java/org/apache/kafka/streams/state/internals/InMemoryKeyValueStore.java#L79

But even then it just puts `null` into the store when it should be deleting the key.
Feel free to raise a JIRA.
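
A hedged sketch of the restore behaviour you would want instead (the Map
stands in for the store's backing map; names are illustrative, this is not the
actual InMemoryKeyValueStore code):

    import java.util.Map;

    static <K, V> void restoreRecord(final Map<K, V> store, final K key, final V value) {
        if (value == null) {
            store.remove(key);   // a changelog tombstone means delete, just as compaction would
        } else {
            store.put(key, value);
        }
    }
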
Thanks,
Damian

On Tue, 8 Aug 2017 at 12:09 Bart Vercammen <b...@cloutrix.com> wrote:

> That's RocksDB .. I'm using in-memory stores ...
> here:
>
> https://github.com/apache/kafka/blob/0.11.0/streams/src/main/java/org/apache/kafka/streams/state/internals/ChangeLoggingKeyValueBytesStore.java#L56
> the 'null' is not checked ...
>
> On Tue, Aug 8, 2017 at 12:52 PM, Damian Guy <damian@gmail.com> wrote:
>
> > Hi,
> > The null values are treated as deletes when they are written to the
> store.
> > You can see here:
> > https://github.com/apache/kafka/blob/0.11.0/streams/src/
> > main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java#L261
> >
> > On Tue, 8 Aug 2017 at 11:22 Bart Vercammen <b...@cloutrix.com> wrote:
> >
> > > Hi,
> > >
> > > I noticed the following:
> > > When a kafka streams application starts, it will restore its state in
> its
> > > state-stores (from the log-compacted kafka topic).  All good so far,
> but
> > I
> > > noticed that the 'deleted' entries are actually read in into the store
> as
> > > 'key' with value:`null`
> > >
> > > Is this expected behaviour?  I would assume that 'null' values are
> > ignored
> > > when restoring the state as this is exactly how the entries are deleted
> > on
> > > the log-compacted kafka-topic.
> > >
> > > When the compaction has run on the kafka topic, all is fine, but when
> the
> > > segment is not compacted yet, these null values are read in.
> > >
> > > Greets,
> > > Bart
> > >
> >
>
>
>
> --
> Mvg,
> Bart Vercammen
>
>
> clouTrix BVBA
> +32 486 69 17 68
> i...@cloutrix.com
>


Re: [kafka streams] 'null' values in state stores

2017-08-08 Thread Damian Guy
Hi,
The null values are treated as deletes when they are written to the store.
You can see here:
https://github.com/apache/kafka/blob/0.11.0/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java#L261

On Tue, 8 Aug 2017 at 11:22 Bart Vercammen  wrote:

> Hi,
>
> I noticed the following:
> When a kafka streams application starts, it will restore its state in its
> state-stores (from the log-compacted kafka topic).  All good so far, but I
> noticed that the 'deleted' entries are actually read into the store as
> 'key' with value `null`.
>
> Is this expected behaviour?  I would assume that 'null' values are ignored
> when restoring the state as this is exactly how the entries are deleted on
> the log-compacted kafka-topic.
>
> When the compaction has run on the kafka topic, all is fine, but when the
> segment is not compacted yet, these null values are read in.
>
> Greets,
> Bart
>


Re: Kafka streams regex match

2017-08-08 Thread Damian Guy
Hi Shekar, that warning is expected during rebalances and should generally
resolve itself.
How many threads/app instances are you running?
It is impossible to tell what is happening without the full logs.

Thanks,
Damian

On Mon, 7 Aug 2017 at 22:46 Shekar Tippur <ctip...@gmail.com> wrote:

> Damien,
>
> Thanks for pointing out the error. I had tried a different version of
> initializing the store.
>
> Now that I am able to compile, I started to get the below error. I looked
> up other suggestions for the same error and followed up to upgrade Kafka to
> 0.11.0.0 version. I still get this error :/
>
> [2017-08-07 14:40:41,264] WARN stream-thread
> [streams-pipe-b67a7ffa-5535-4311-8886-ad6362617dc5-StreamThread-1] Could
> not create task 0_0. Will retry:
> (org.apache.kafka.streams.processor.internals.StreamThread)
>
> org.apache.kafka.streams.errors.LockException: task [0_0] Failed to lock
> the state directory for task 0_0
>
> at
>
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.(ProcessorStateManager.java:99)
>
> at
>
> org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:80)
>
> at
>
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:111)
>
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:1234)
>
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:294)
>
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:254)
>
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:1313)
>
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.access$1100(StreamThread.java:73)
>
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:183)
>
> at
>
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265)
>
> at
>
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:363)
>
> at
>
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
>
> at
>
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
>
> at
>
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
>
> at
>
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
>
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:582)
>
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
>
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)
>
> On Fri, Aug 4, 2017 at 4:16 PM, Shekar Tippur <ctip...@gmail.com> wrote:
>
> > Damian,
> >
> > I am getting a syntax error. I have responded on gist.
> > Appreciate any inputs.
> >
> > - Shekar
> >
> > On Sat, Jul 29, 2017 at 1:57 AM, Damian Guy <damian@gmail.com>
> wrote:
> >
> >> Hi,
> >>
> >> I left a comment on your gist.
> >>
> >> Thanks,
> >> Damian
> >>
> >> On Fri, 28 Jul 2017 at 21:50 Shekar Tippur <ctip...@gmail.com> wrote:
> >>
> >> > Damien,
> >> >
> >> > Here is a public gist:
> >> > https://gist.github.com/ctippur/9f0900b1719793d0c67f5bb143d16ec8
> >> >
> >> > - Shekar
> >> >
> >> > On Fri, Jul 28, 2017 at 11:45 AM, Damian Guy <damian@gmail.com>
> >> wrote:
> >> >
> >> > > It might be easier if you make a github gist with your code. It is
> >> quite
> >> > > difficult to see what is happening in an email.
> >> > >
> >> > > Cheers,
> >> > > Damian
> >> > > On Fri, 28 Jul 2017 at 19:22, Shekar Tippur <ctip...@gmail.com>
> >> wrote:
> >> > >
> >> > > > Thanks a lot Damien.
> >> > > > I am able to get to see if the join worked (using foreach). I
> tried
> >> to
> >> > > add
> >> > > > the logic to query the store after starting the streams:
> >> > > > Looks like the code is not getting there. Here is the modified
> code:
> >> > > >

Re: [DISCUSS] Streams DSL/StateStore Refactoring

2017-08-02 Thread Damian Guy
Hi Jan,

Thanks for taking the time to put this together, appreciated. For the
benefit of others would you mind explaining a bit about your motivation?

Cheers,
Damian

On Wed, 2 Aug 2017 at 01:40 Jan Filipiak <jan.filip...@trivago.com> wrote:

> Hi all,
>
> after some further discussions, the best thing to show my Idea of how it
> should evolve would be a bigger mock/interface description.
> The goal is to reduce the store maintaining processors to only the
> Aggregators + and KTableSource. While having KTableSource optionally
> materialized.
>
> Introducing KTable:copy() will allow users to maintain state twice if
> they really want to. KStream::join*() wasn't touched. I never personally
> used that so I didn't feel
> comfortable enough touching it. Currently still making up my mind. None
> of the suggestions made it queryable so far. Guozhang's 'Buffered' idea
> seems ideal here.
>
> please have a look. Looking forward for your opinions.
>
> Best Jan
>
>
>
> On 21.06.2017 17:24, Eno Thereska wrote:
> > (cc’ing user-list too)
> >
> > Given that we already have StateStoreSuppliers that are configurable
> using the fluent-like API, probably it’s worth discussing the other
> examples with joins and serdes first since those have many overloads and
> are in need of some TLC.
> >
> > So following your example, I guess you’d have something like:
> > .join()
> > .withKeySerdes(…)
> > .withValueSerdes(…)
> > .withJoinType(“outer”)
> >
> > etc?
> >
> > I like the approach since it still remains declarative and it’d reduce
> the number of overloads by quite a bit.
> >
> > Eno
> >
> >> On Jun 21, 2017, at 3:37 PM, Damian Guy <damian@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I'd like to get a discussion going around some of the API choices we've
> >> made in the DSL. In particular those that relate to stateful operations
> >> (though this could expand).
> >> As it stands we lean heavily on overloaded methods in the API, i.e.,
> >> there are 9 overloads for KGroupedStream.count(..)! It is becoming noisy
> >> and I feel it is only going to get worse as we add more optional params. In
> >> particular we've had some requests to be able to turn caching off, or
> >> change log configs,  on a per operator basis (note this can be done now
> if
> >> you pass in a StateStoreSupplier, but this can be a bit cumbersome).
> >>
> >> So this is a bit of an open question. How can we change the DSL
> overloads
> >> so that it flows, is simple to use and understand, and is easily
> extended
> >> in the future?
> >>
> >> One option would be to use a fluent API approach for providing the
> optional
> >> params, so something like this:
> >>
> >> groupedStream.count()
> >>.withStoreName("name")
> >>.withCachingEnabled(false)
> >>.withLoggingEnabled(config)
> >>.table()
> >>
> >>
> >>
> >> Another option would be to provide a Builder to the count method, so it
> >> would look something like this:
> >> groupedStream.count(new
> >> CountBuilder("storeName").withCachingEnabled(false).build())
> >>
> >> Another option is to say: Hey we don't need this, what are you on about!
> >>
> >> The above has focussed on state store related overloads, but the same
> ideas
> >> could  be applied to joins etc, where we presently have many join
> methods
> >> and many overloads.
> >>
> >> Anyway, i look forward to hearing your opinions.
> >>
> >> Thanks,
> >> Damian
>
>


Re: Kafka Streams Application crashing on Rebalance

2017-08-01 Thread Damian Guy
Hi, yes, the issue is in 0.10.2 also.

On Tue, 1 Aug 2017 at 17:37 Eric Lalonde <e...@autonomic.ai> wrote:

>
> > On Aug 1, 2017, at 8:00 AM, Damian Guy <damian@gmail.com> wrote:
> >
> > It is a bug in 0.10.2 or lower. It has been fixed in 0.11 by
> > https://issues.apache.org/jira/browse/KAFKA-4494
>
> Hi Damien, the Affects Version is set to 0.10.1.0 in KAFKA-4494. Is the
> issue in 0.10.2.0 as well?


Re: Kafka Streams Application crashing on Rebalance

2017-08-01 Thread Damian Guy
It is a bug in 0.10.2 or lower. It has been fixed in 0.11 by
https://issues.apache.org/jira/browse/KAFKA-4494

On Tue, 1 Aug 2017 at 15:40 Marcus Clendenin  wrote:

> Hi All,
>
>
>
> I have a kafka streams application that is doing a join between a KTable
> and a KStream and it seems that after it starts loading the KTable if I
> either restart the application or start a new jar with the same
> application-id it starts failing. It looks like when it tries to rejoin the
> application-id and do a rebalance of the partitions it throws an error
> regarding a null value coming from RocksDB. Any thoughts on where this is
> coming from? I am running this inside of a docker container if that affects
> anything but the RocksDB folder is mounted as a volume on the host machine.
>
>
>
>
>
> Stacktrace:
>
>
>
> 2017-08-01 13:31:50,309 trackingId=X thread=[StreamThread-1] logType=INFO
>
> module=kafka.streams
> o.a.kafka.streams.processor.internals.StreamThread  - stream-thread
> [StreamThread-1] Starting
>
> 2017-08-01 13:31:50,379 trackingId=X thread=[StreamThread-1] logType=INFO
>
> module=kafka.streams
> o.a.k.c.consumer.internals.AbstractCoordinator  - Discovered coordinator
> .com:9092 (id: 2147483535 rack: null) for group
> test-application-id.
>
> 2017-08-01 13:31:50,386 trackingId=X thread=[StreamThread-1] logType=INFO
>
> module=kafka.streams
> o.a.k.c.consumer.internals.ConsumerCoordinator  - Revoking previously
> assigned partitions [] for group test-application-id
>
> 2017-08-01 13:31:50,386 trackingId=X thread=[StreamThread-1] logType=INFO
>
> module=kafka.streams
> o.a.kafka.streams.processor.internals.StreamThread  - stream-thread
> [StreamThread-1] at state RUNNING: partitions [] revoked at the beginning
> of consumer rebalance.
>
> 2017-08-01 13:31:50,387 trackingId=X thread=[StreamThread-1] logType=INFO
>
> module=kafka.streams
> o.a.kafka.streams.processor.internals.StreamThread  - stream-thread
> [StreamThread-1] State transition from RUNNING to PARTITIONS_REVOKED.
>
> 2017-08-01 13:31:50,387 trackingId=X thread=[StreamThread-1] logType=INFO
>
> module=kafka.streams org.apache.kafka.streams.KafkaStreams
> - stream-client [test-application-id-67f96d6e-d1fd-4f31-8ec4-45e82a9cf01c]
> State transition from RUNNING to REBALANCING.
>
> 2017-08-01 13:31:50,388 trackingId=X thread=[StreamThread-1] logType=INFO
>
> module=kafka.streams
> o.a.kafka.streams.processor.internals.StreamThread  - stream-thread
> [StreamThread-1] Updating suspended tasks to contain active tasks []
>
> 2017-08-01 13:31:50,388 trackingId=X thread=[StreamThread-1] logType=INFO
>
> module=kafka.streams
> o.a.kafka.streams.processor.internals.StreamThread  - stream-thread
> [StreamThread-1] Removing all active tasks []
>
> 2017-08-01 13:31:50,388 trackingId=X thread=[StreamThread-1] logType=INFO
>
> module=kafka.streams
> o.a.kafka.streams.processor.internals.StreamThread  - stream-thread
> [StreamThread-1] Removing all standby tasks []
>
> 2017-08-01 13:31:50,389 trackingId=X thread=[StreamThread-1] logType=INFO
>
> module=kafka.streams
> o.a.k.c.consumer.internals.AbstractCoordinator  - (Re-)joining group
> test-application-id
>
> 2017-08-01 13:31:50,416 trackingId=X thread=[StreamThread-1] logType=INFO
>
> module=kafka.streams
> o.a.k.s.p.internals.StreamPartitionAssignor  - stream-thread
> [StreamThread-1] Constructed client metadata
> {67f96d6e-d1fd-4f31-8ec4-45e82a9cf01c=ClientMetadata{hostInfo=null,
>
> consumers=[test-application-id-67f96d6e-d1fd-4f31-8ec4-45e82a9cf01c-StreamThread-1-consumer-f6ed6af8-0aee-4d2e-92a9-00955f7b3441],
> state=[activeTasks: ([]) assignedTasks: ([]) prevActiveTasks: ([])
> prevAssignedTasks: ([]) capacity: 1.0 cost: 0.0]}} from the member
> subscriptions.
>
> 2017-08-01 13:31:50,417 trackingId=X thread=[StreamThread-1] logType=INFO
>
> module=kafka.streams
> o.a.k.s.p.internals.StreamPartitionAssignor  - stream-thread
> [StreamThread-1] Completed validating internal topics in partition assignor
>
> 2017-08-01 13:31:50,417 trackingId=X thread=[StreamThread-1] logType=INFO
>
> module=kafka.streams
> o.a.k.s.p.internals.StreamPartitionAssignor  - stream-thread
> [StreamThread-1] Completed validating internal topics in partition assignor
>
> 2017-08-01 13:31:50,419 trackingId=X thread=[StreamThread-1] logType=INFO
>
> module=kafka.streams
> o.a.k.s.p.internals.StreamPartitionAssignor  - stream-thread
> [StreamThread-1] Assigned tasks to clients as
> {67f96d6e-d1fd-4f31-8ec4-45e82a9cf01c=[activeTasks: ([0_0, 0_1, 0_2, 0_3,
> 0_4, 0_5]) assignedTasks: ([0_0, 0_1, 0_2, 0_3, 0_4, 0_5]) prevActiveTasks:
> ([]) prevAssignedTasks: ([]) capacity: 1.0 cost: 3.0]}.
>
> 2017-08-01 13:31:50,429 trackingId=X thread=[StreamThread-1] logType=INFO
>
> 

Re: Monitor all stream consumers for lag

2017-08-01 Thread Damian Guy
Hi Garrett,

The global state store doesn't use consumer groups and doesn't commit
offsets. The offsets are checkpointed to local disk, so they won't show up
with the ConsumerGroupCommand.

That said it would be useful to see the lag, so maybe raise a JIRA for it?

Thanks,
Damian

On Tue, 1 Aug 2017 at 15:06 Garrett Barton  wrote:

> I have a simple stream setup which reads a source topic and forks to an
> aggregation with its own statestore, and a flatmap().to("topic1") and that
> topic is read in to a global state store.
>
> I use ConsumerGroupCommand to query for the lag of each consumer on the
> topics.
>
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala
>
> It seems like ConsumerGroupCommand only shows some consumers, but not all.
> I can see the consumer for the original source topic, but I don't see one
> for 'topic1', yet the globalstatestore is populated.
>
> How can I see the lag of the globalstatestore consumer?
>


Re: Kafka streams store migration - best practices

2017-08-01 Thread Damian Guy
No, you don't need to set a listener. I was just mentioning it as an option if
you want to know when the metadata needs refreshing.

On Tue, 1 Aug 2017 at 13:25 Debasish Ghosh <ghosh.debas...@gmail.com> wrote:

> Regarding the last point, do I need to set up the listener ?
>
> All I want is to do a query from the store. For that I need to invoke 
> streams.store()
> first, which can potentially throw an InvalidStateStoreException during
> rebalancing / migration of stores. If I call streams.store() with retries
> till the rebalancing is done or I exceed some max retry count, then I think
> I should be good.
>
> Or am I missing something ?
>
> regards.
>
> On Tue, Aug 1, 2017 at 1:10 PM, Damian Guy <damian@gmail.com> wrote:
>
>> Hi,
>>
>> On Tue, 1 Aug 2017 at 08:34 Debasish Ghosh <ghosh.debas...@gmail.com>
>> wrote:
>>
>>> Hi -
>>>
>>> I have a Kafka Streams application that needs to run on multiple
>>> instances.
>>> It fetches metadata from all local stores and has an http query layer for
>>> interactive queries. In some cases when I have new instances deployed,
>>> store migration takes place making the current metadata invalid. Here are
>>> my questions regarding some of the best practices to be followed to
>>> handle
>>> this issue of store migration -
>>>
>>>- When the migration is in process, a query for the metadata may
>>> result
>>>in InvalidStateStoreException - is it a good practice to always have a
>>>retry semantics based query for the metadata ?
>>>
>>
>> Yes. Whenever the application is rebalancing the stores will be
>> unavailable, so retrying is the right thing to do.
>>
>>
>>>- Should I check KafkaStreams.state() and only assume that I have got
>>>the correct metadata when the state() call returns Running. If it
>>>returns Rebalancing, then I should re-query. Is this correct approach
>>> ?
>>>
>>
>> Correct again! If the state is rebalancing, then the metadata (for some
>> stores at least) is going to change, so you should get it again. You can
>> set a StateListener on the KafkaStreams instance to listen to these events.
>>
>>
>>>
>>> regards.
>>>
>>> --
>>> Debasish Ghosh
>>> http://manning.com/ghosh2
>>> http://manning.com/ghosh
>>>
>>> Twttr: @debasishg
>>> Blog: http://debasishg.blogspot.com
>>> Code: http://github.com/debasishg
>>>
>>
>
>
> --
> Debasish Ghosh
> http://manning.com/ghosh2
> http://manning.com/ghosh
>
> Twttr: @debasishg
> Blog: http://debasishg.blogspot.com
> Code: http://github.com/debasishg
>


Re: Kafka streams store migration - best practices

2017-08-01 Thread Damian Guy
Hi,

On Tue, 1 Aug 2017 at 08:34 Debasish Ghosh  wrote:

> Hi -
>
> I have a Kafka Streams application that needs to run on multiple instances.
> It fetches metadata from all local stores and has an http query layer for
> interactive queries. In some cases when I have new instances deployed,
> store migration takes place making the current metadata invalid. Here are
> my questions regarding some of the best practices to be followed to handle
> this issue of store migration -
>
>- When the migration is in process, a query for the metadata may result
>in InvalidStateStoreException - is it a good practice to always have a
>retry semantics based query for the metadata ?
>

Yes. Whenever the application is rebalancing the stores will be
unavailable, so retrying is the right thing to do.


>- Should I check KafkaStreams.state() and only assume that I have got
>the correct metadata when the state() call returns Running. If it
>returns Rebalancing, then I should re-query. Is this correct approach ?
>

Correct again! If the state is rebalancing, then the metadata (for some
stores at least) is going to change, so you should get it again. You can
set a StateListener on the KafkaStreams instance to listen to these events.
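
A minimal sketch of that retry-plus-state-check pattern (the store name, the
key/value types and the retry policy are placeholders):

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.errors.InvalidStateStoreException;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class StoreRetry {
    // Retry streams.store() while the instance is rebalancing / stores are migrating.
    public static ReadOnlyKeyValueStore<String, Long> waitForStore(
            final KafkaStreams streams, final String storeName,
            final int maxRetries, final long backoffMs) throws InterruptedException {
        for (int i = 0; i < maxRetries; i++) {
            if (streams.state() == KafkaStreams.State.RUNNING) {
                try {
                    return streams.store(storeName, QueryableStoreTypes.<String, Long>keyValueStore());
                } catch (final InvalidStateStoreException e) {
                    // store not queryable yet (e.g. migration in progress) - fall through and retry
                }
            }
            Thread.sleep(backoffMs);
        }
        throw new InvalidStateStoreException(storeName + " was not queryable after " + maxRetries + " attempts");
    }
}
```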


>
> regards.
>
> --
> Debasish Ghosh
> http://manning.com/ghosh2
> http://manning.com/ghosh
>
> Twttr: @debasishg
> Blog: http://debasishg.blogspot.com
> Code: http://github.com/debasishg
>


Re: Kafka streams regex match

2017-07-29 Thread Damian Guy
Hi,

I left a comment on your gist.

Thanks,
Damian

On Fri, 28 Jul 2017 at 21:50 Shekar Tippur <ctip...@gmail.com> wrote:

> Damien,
>
> Here is a public gist:
> https://gist.github.com/ctippur/9f0900b1719793d0c67f5bb143d16ec8
>
> - Shekar
>
> On Fri, Jul 28, 2017 at 11:45 AM, Damian Guy <damian@gmail.com> wrote:
>
> > It might be easier if you make a github gist with your code. It is quite
> > difficult to see what is happening in an email.
> >
> > Cheers,
> > Damian
> > On Fri, 28 Jul 2017 at 19:22, Shekar Tippur <ctip...@gmail.com> wrote:
> >
> > > Thanks a lot Damien.
> > > I am able to get to see if the join worked (using foreach). I tried to
> > add
> > > the logic to query the store after starting the streams:
> > > Looks like the code is not getting there. Here is the modified code:
> > >
> > > KafkaStreams streams = new KafkaStreams(builder, props);
> > >
> > > streams.start();
> > >
> > >
> > > parser.foreach(new ForeachAction<String, JsonNode>() {
> > > @Override
> > > public void apply(String key, JsonNode value) {
> > > System.out.println(key + ": " + value);
> > > if (value == null){
> > > System.out.println("null match");
> > > ReadOnlyKeyValueStore<String, Long> keyValueStore =
> > > null;
> > > try {
> > > keyValueStore =
> > > IntegrationTestUtils.waitUntilStoreIsQueryable("local-store",
> > > QueryableStoreTypes.keyValueStore(), streams);
> > > } catch (InterruptedException e) {
> > > e.printStackTrace();
> > > }
> > >
> > > KeyValueIterator  kviterator =
> > > keyValueStore.range("test_nod","test_node");
> > > }
> > > }
> > > });
> > >
> > >
> > > On Fri, Jul 28, 2017 at 12:52 AM, Damian Guy <damian@gmail.com>
> > wrote:
> > >
> > > > Hi,
> > > > The store won't be queryable until after you have called
> > streams.start().
> > > > No stores have been created until the application is up and running
> and
> > > > they are dependent on the underlying partitions.
> > > >
> > > > To check that a stateful operation has produced a result you would
> > > normally
> > > > add another operation after the join, i.e.,
> > > > stream.join(other,...).foreach(..) or stream.join(other,...).to("
> > topic")
> > > >
> > > > Thanks,
> > > > Damian
> > > >
> > > > On Thu, 27 Jul 2017 at 22:52 Shekar Tippur <ctip...@gmail.com>
> wrote:
> > > >
> > > > > One more thing.. How do we check if the stateful join operation
> > > resulted
> > > > in
> > > > > a kstream of some value in it (size of kstream)? How do we check
> the
> > > > > content of a kstream?
> > > > >
> > > > > - S
> > > > >
> > > > > On Thu, Jul 27, 2017 at 2:06 PM, Shekar Tippur <ctip...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Damien,
> > > > > >
> > > > > > Thanks a lot for pointing out.
> > > > > >
> > > > > > I got a little further. I am kind of stuck with the sequencing.
> > > Couple
> > > > of
> > > > > > issues:
> > > > > > 1. I cannot initialise KafkaStreams before the parser.to().
> > > > > > 2. Do I need to create a new KafkaStreams object when I create a
> > > > > > KeyValueStore?
> > > > > > 3. How do I initialize KeyValueIterator with <String, JsonNode> I
> > > seem
> > > > to
> > > > > > get a error when I try:
> > > > > > *KeyValueIterator <String,JsonNode> kviterator
> > > > > > = keyValueStore.range("test_nod","test_node");*
> > > > > >
> > > > > > /// START CODE /
> > > > > > //parser is a kstream as a result of join
> > > > > > if (parser.toString().matches("null")){
> > > > > >
> > > > > > ReadOnlyKeyValueStore<String, Long> keyValueStore =
> > > > > >  

Re: Kafka streams regex match

2017-07-28 Thread Damian Guy
It might be easier if you make a github gist with your code. It is quite
difficult to see what is happening in an email.

Cheers,
Damian
On Fri, 28 Jul 2017 at 19:22, Shekar Tippur <ctip...@gmail.com> wrote:

> Thanks a lot Damien.
> I am able to get to see if the join worked (using foreach). I tried to add
> the logic to query the store after starting the streams:
> Looks like the code is not getting there. Here is the modified code:
>
> KafkaStreams streams = new KafkaStreams(builder, props);
>
> streams.start();
>
>
> parser.foreach(new ForeachAction<String, JsonNode>() {
> @Override
> public void apply(String key, JsonNode value) {
> System.out.println(key + ": " + value);
> if (value == null){
> System.out.println("null match");
> ReadOnlyKeyValueStore<String, Long> keyValueStore =
> null;
> try {
> keyValueStore =
> IntegrationTestUtils.waitUntilStoreIsQueryable("local-store",
> QueryableStoreTypes.keyValueStore(), streams);
> } catch (InterruptedException e) {
> e.printStackTrace();
> }
>
> KeyValueIterator  kviterator =
> keyValueStore.range("test_nod","test_node");
> }
> }
> });
>
>
> On Fri, Jul 28, 2017 at 12:52 AM, Damian Guy <damian@gmail.com> wrote:
>
> > Hi,
> > The store won't be queryable until after you have called streams.start().
> > No stores have been created until the application is up and running and
> > they are dependent on the underlying partitions.
> >
> > To check that a stateful operation has produced a result you would
> normally
> > add another operation after the join, i.e.,
> > stream.join(other,...).foreach(..) or stream.join(other,...).to("topic")
> >
> > Thanks,
> > Damian
> >
> > On Thu, 27 Jul 2017 at 22:52 Shekar Tippur <ctip...@gmail.com> wrote:
> >
> > > One more thing.. How do we check if the stateful join operation
> resulted
> > in
> > > a kstream of some value in it (size of kstream)? How do we check the
> > > content of a kstream?
> > >
> > > - S
> > >
> > > On Thu, Jul 27, 2017 at 2:06 PM, Shekar Tippur <ctip...@gmail.com>
> > wrote:
> > >
> > > > Damien,
> > > >
> > > > Thanks a lot for pointing out.
> > > >
> > > > I got a little further. I am kind of stuck with the sequencing.
> Couple
> > of
> > > > issues:
> > > > 1. I cannot initialise KafkaStreams before the parser.to().
> > > > 2. Do I need to create a new KafkaStreams object when I create a
> > > > KeyValueStore?
> > > > 3. How do I initialize KeyValueIterator with <String, JsonNode> I
> seem
> > to
> > > > get a error when I try:
> > > > *KeyValueIterator <String,JsonNode> kviterator
> > > > = keyValueStore.range("test_nod","test_node");*
> > > >
> > > > /// START CODE /
> > > > //parser is a kstream as a result of join
> > > > if (parser.toString().matches("null")){
> > > >
> > > > ReadOnlyKeyValueStore<String, Long> keyValueStore =
> > > > null;
> > > > KafkaStreams newstreams = new KafkaStreams(builder, props);
> > > >     try {
> > > > keyValueStore =
> > > IntegrationTestUtils.waitUntilStoreIsQueryable("local-store",
> > > > QueryableStoreTypes.keyValueStore(), newstreams);
> > > > } catch (InterruptedException e) {
> > > > e.printStackTrace();
> > > > }
> > > > *KeyValueIterator kviterator
> > > > = keyValueStore.range("test_nod","test_node");*
> > > > }else {
> > > >
> > > > *    parser.to <http://parser.to>(stringSerde, jsonSerde,
> "parser");*}
> > > >
> > > > *KafkaStreams streams = new KafkaStreams(builder, props);*
> > > > streams.start();
> > > >
> > > > /// END CODE /
> > > >
> > > > - S
> > > >
> > > >
> > > >
> > > > On Thu, Jul 27, 2017 at 10:05 AM, Damian Guy <damian@gmail.com>
> > > wrote:
> > > > >
> > > > > It is part of the ReadOnlyKeyValueStore interface:
> > > > &

Re: Kafka Streams state store issue on cluster

2017-07-28 Thread Damian Guy
Hmm, I'm not sure that is going to work, as both nodes will have the same
setting for StreamsConfig.APPLICATION_SERVER_CONFIG, i.e., 0.0.0.0:7070
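
A sketch of deriving a per-instance value instead — the HOST environment
variable injected by the scheduler (e.g. Marathon) is an assumption, and the
broker address and port are placeholders:

```java
import java.net.InetAddress;
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class AppServerConfig {
    public static Properties streamsProps() throws Exception {
        // application.server must advertise an address the *other* instances can reach,
        // so derive it per host instead of using the bind-all address 0.0.0.0.
        final String envHost = System.getenv("HOST"); // assumption: injected by the scheduler
        final String host = envHost != null ? envHost : InetAddress.getLocalHost().getHostAddress();

        final Properties settings = new Properties();
        settings.put(StreamsConfig.APPLICATION_ID_CONFIG, "kstream-weblog-processing");
        settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
        settings.put(StreamsConfig.APPLICATION_SERVER_CONFIG, host + ":7070");
        return settings;
    }
}
```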

On Fri, 28 Jul 2017 at 16:02 Debasish Ghosh <ghosh.debas...@gmail.com>
wrote:

> The log file is a huge one. I can send it to you though. Before that let
> me confirm one point ..
>
> I set the APPLICATION_SERVER_CONFIG to
> s"${config.httpInterface}:${config.httpPort}". In my case the
> httpInterface is "0.0.0.0" and the port is set to 7070. Since the two
> instances start on different nodes, this should be ok - right ?
>
> regards.
>
> On Fri, Jul 28, 2017 at 8:18 PM, Damian Guy <damian@gmail.com> wrote:
>
>> Do you have any logs that might help to work out what is going wrong?
>>
>> On Fri, 28 Jul 2017 at 14:16 Damian Guy <damian@gmail.com> wrote:
>>
>>> The config looks ok to me
>>>
>>> On Fri, 28 Jul 2017 at 13:24 Debasish Ghosh <ghosh.debas...@gmail.com>
>>> wrote:
>>>
>>>> I am setting APPLICATION_SERVER_CONFIG, which is possibly what u r
>>>> referring to. Just now I noticed that I may also need to set
>>>> REPLICATION_FACTOR_CONFIG, which needs to be set to 2 (default is 1).
>>>> Anything else that I may be missing ?
>>>>
>>>>
>>>> regards.
>>>>
>>>> On Fri, Jul 28, 2017 at 5:46 PM, Debasish Ghosh <
>>>> ghosh.debas...@gmail.com>
>>>> wrote:
>>>>
>>>> > Hi Damien -
>>>> >
>>>> > I am not sure I understand what u mean .. I have the following set in
>>>> the
>>>> > application .. Do I need to set anything else at the host level ?
>>>> > Environment variable ?
>>>> >
>>>> > val streamingConfig = {
>>>> >   val settings = new Properties
>>>> >   settings.put(StreamsConfig.APPLICATION_ID_CONFIG,
>>>> > "kstream-weblog-processing")
>>>> >   settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG,
>>>> config.brokers)
>>>> >
>>>> >   config.schemaRegistryUrl.foreach{ url =>
>>>> >
>>>>  settings.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,
>>>> > url)
>>>> >   }
>>>> >
>>>> >   settings.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
>>>> > Serdes.ByteArray.getClass.getName)
>>>> >   settings.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
>>>> > Serdes.String.getClass.getName)
>>>> >
>>>> >   // setting offset reset to earliest so that we can re-run the
>>>> demo
>>>> > code with the same pre-loaded data
>>>> >   // Note: To re-run the demo, you need to use the offset reset
>>>> tool:
>>>> >   // https://cwiki.apache.org/confluence/display/KAFKA/
>>>> > Kafka+Streams+Application+Reset+Tool
>>>> >   settings.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,
>>>> "earliest")
>>>> >
>>>> >   // need this for query service
>>>> >   settings.put(StreamsConfig.APPLICATION_SERVER_CONFIG,
>>>> > s"${config.httpInterface}:${config.httpPort}")
>>>> >
>>>> >   // default is /tmp/kafka-streams
>>>> >   settings.put(StreamsConfig.STATE_DIR_CONFIG,
>>>> config.stateStoreDir)
>>>> >
>>>> >   // Set the commit interval to 500ms so that any changes are
>>>> flushed
>>>> > frequently and the summary
>>>> >   // data are updated with low latency.
>>>> >   settings.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, "500")
>>>> >
>>>> >   settings
>>>> > }
>>>> >
>>>> > Please explain a bit ..
>>>> >
>>>> > regards.
>>>> >
>>>> >
>>>> > On Fri, Jul 28, 2017 at 5:36 PM, Damian Guy <damian@gmail.com>
>>>> wrote:
>>>> >
>>>> >> Hi,
>>>> >>
>>>> >> Do you have the application.server property set appropriately for
>>>> both
>>>> >> hosts?
>>>> >>
>>>> >> The second stack trace is this bug:
>>>> >> https://issues.apache.org/jira/browse/KAFKA-5556
>>>> &

Re: Kafka Streams state store issue on cluster

2017-07-28 Thread Damian Guy
Do you have any logs that might help to work out what is going wrong?

On Fri, 28 Jul 2017 at 14:16 Damian Guy <damian@gmail.com> wrote:

> The config looks ok to me
>
> On Fri, 28 Jul 2017 at 13:24 Debasish Ghosh <ghosh.debas...@gmail.com>
> wrote:
>
>> I am setting APPLICATION_SERVER_CONFIG, which is possibly what u r
>> referring to. Just now I noticed that I may also need to set
>> REPLICATION_FACTOR_CONFIG, which needs to be set to 2 (default is 1).
>> Anything else that I may be missing ?
>>
>>
>> regards.
>>
>> On Fri, Jul 28, 2017 at 5:46 PM, Debasish Ghosh <ghosh.debas...@gmail.com
>> >
>> wrote:
>>
>> > Hi Damien -
>> >
>> > I am not sure I understand what u mean .. I have the following set in
>> the
>> > application .. Do I need to set anything else at the host level ?
>> > Environment variable ?
>> >
>> > val streamingConfig = {
>> >   val settings = new Properties
>> >   settings.put(StreamsConfig.APPLICATION_ID_CONFIG,
>> > "kstream-weblog-processing")
>> >   settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG,
>> config.brokers)
>> >
>> >   config.schemaRegistryUrl.foreach{ url =>
>> >
>>  settings.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,
>> > url)
>> >   }
>> >
>> >   settings.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
>> > Serdes.ByteArray.getClass.getName)
>> >   settings.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
>> > Serdes.String.getClass.getName)
>> >
>> >   // setting offset reset to earliest so that we can re-run the demo
>> > code with the same pre-loaded data
>> >   // Note: To re-run the demo, you need to use the offset reset
>> tool:
>> >   // https://cwiki.apache.org/confluence/display/KAFKA/
>> > Kafka+Streams+Application+Reset+Tool
>> >   settings.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
>> >
>> >   // need this for query service
>> >   settings.put(StreamsConfig.APPLICATION_SERVER_CONFIG,
>> > s"${config.httpInterface}:${config.httpPort}")
>> >
>> >   // default is /tmp/kafka-streams
>> >   settings.put(StreamsConfig.STATE_DIR_CONFIG, config.stateStoreDir)
>> >
>> >   // Set the commit interval to 500ms so that any changes are
>> flushed
>> > frequently and the summary
>> >   // data are updated with low latency.
>> >   settings.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, "500")
>> >
>> >   settings
>> > }
>> >
>> > Please explain a bit ..
>> >
>> > regards.
>> >
>> >
>> > On Fri, Jul 28, 2017 at 5:36 PM, Damian Guy <damian@gmail.com>
>> wrote:
>> >
>> >> Hi,
>> >>
>> >> Do you have the application.server property set appropriately for both
>> >> hosts?
>> >>
>> >> The second stack trace is this bug:
>> >> https://issues.apache.org/jira/browse/KAFKA-5556
>> >>
>> >> On Fri, 28 Jul 2017 at 12:55 Debasish Ghosh <ghosh.debas...@gmail.com>
>> >> wrote:
>> >>
>> >> > Hi -
>> >> >
>> >> > In my Kafka Streams application, I have a state store resulting from
>> a
>> >> > stateful streaming topology. The environment is
>> >> >
>> >> >- Kafka 0.10.2.1
>> >> >- It runs on a DC/OS cluster
>> >> >- I am running Confluent-Kafka 3.2.2 on the cluster
>> >> >- Each topic that I have has 2 partitions with replication factor
>> = 2
>> >> >- The application also has an associated http service that does
>> >> >interactive queries on the state store
>> >> >
>> >> > The application runs fine when I invoke a single instance. I can use
>> the
>> >> > http endpoints to do queries and everything looks good. Problems
>> surface
>> >> > when I try to spawn another instance of the application. I use the
>> same
>> >> > APPLICATION_ID_CONFIG and the instance starts on a different node of
>> the
>> >> > cluster. The data consumption part works fine as the new instance
>> also
>> >> > starts consuming from the same topic as the first one. But when I try
>> >> 

Re: Kafka Streams state store issue on cluster

2017-07-28 Thread Damian Guy
The config looks ok to me

On Fri, 28 Jul 2017 at 13:24 Debasish Ghosh <ghosh.debas...@gmail.com>
wrote:

> I am setting APPLICATION_SERVER_CONFIG, which is possibly what u r
> referring to. Just now I noticed that I may also need to set
> REPLICATION_FACTOR_CONFIG, which needs to be set to 2 (default is 1).
> Anything else that I may be missing ?
>
>
> regards.
>
> On Fri, Jul 28, 2017 at 5:46 PM, Debasish Ghosh <ghosh.debas...@gmail.com>
> wrote:
>
> > Hi Damien -
> >
> > I am not sure I understand what u mean .. I have the following set in the
> > application .. Do I need to set anything else at the host level ?
> > Environment variable ?
> >
> > val streamingConfig = {
> >   val settings = new Properties
> >   settings.put(StreamsConfig.APPLICATION_ID_CONFIG,
> > "kstream-weblog-processing")
> >   settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG,
> config.brokers)
> >
> >   config.schemaRegistryUrl.foreach{ url =>
> >
>  settings.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,
> > url)
> >   }
> >
> >   settings.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
> > Serdes.ByteArray.getClass.getName)
> >   settings.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
> > Serdes.String.getClass.getName)
> >
> >   // setting offset reset to earliest so that we can re-run the demo
> > code with the same pre-loaded data
> >   // Note: To re-run the demo, you need to use the offset reset tool:
> >   // https://cwiki.apache.org/confluence/display/KAFKA/
> > Kafka+Streams+Application+Reset+Tool
> >   settings.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
> >
> >   // need this for query service
> >   settings.put(StreamsConfig.APPLICATION_SERVER_CONFIG,
> > s"${config.httpInterface}:${config.httpPort}")
> >
> >   // default is /tmp/kafka-streams
> >   settings.put(StreamsConfig.STATE_DIR_CONFIG, config.stateStoreDir)
> >
> >   // Set the commit interval to 500ms so that any changes are flushed
> > frequently and the summary
> >   // data are updated with low latency.
> >   settings.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, "500")
> >
> >   settings
> > }
> >
> > Please explain a bit ..
> >
> > regards.
> >
> >
> > On Fri, Jul 28, 2017 at 5:36 PM, Damian Guy <damian@gmail.com>
> wrote:
> >
> >> Hi,
> >>
> >> Do you have the application.server property set appropriately for both
> >> hosts?
> >>
> >> The second stack trace is this bug:
> >> https://issues.apache.org/jira/browse/KAFKA-5556
> >>
> >> On Fri, 28 Jul 2017 at 12:55 Debasish Ghosh <ghosh.debas...@gmail.com>
> >> wrote:
> >>
> >> > Hi -
> >> >
> >> > In my Kafka Streams application, I have a state store resulting from a
> >> > stateful streaming topology. The environment is
> >> >
> >> >- Kafka 0.10.2.1
> >> >- It runs on a DC/OS cluster
> >> >- I am running Confluent-Kafka 3.2.2 on the cluster
> >> >- Each topic that I have has 2 partitions with replication factor
> = 2
> >> >- The application also has an associated http service that does
> >> >interactive queries on the state store
> >> >
> >> > The application runs fine when I invoke a single instance. I can use
> the
> >> > http endpoints to do queries and everything looks good. Problems
> surface
> >> > when I try to spawn another instance of the application. I use the
> same
> >> > APPLICATION_ID_CONFIG and the instance starts on a different node of
> the
> >> > cluster. The data consumption part works fine as the new instance also
> >> > starts consuming from the same topic as the first one. But when I try
> >> the
> >> > http query, the metadata fetch fails ..
> >> >
> >> > I have some code snippet like this as part of the query that tries to
> >> fetch
> >> > the metadata so that I can locate the host to query on ..
> >> >
> >> > metadataService.streamsMetadataForStoreAndKey(store, hostKey,
> >> > stringSerializer) match {
> >> >   case Success(host) => {
> >> > // hostKey is on another instance. call the other instance to
> >> fetch
> >> > the data.
> >> >  

Re: Kafka Streams state store issue on cluster

2017-07-28 Thread Damian Guy
Hi,

Do you have the application.server property set appropriately for both
hosts?

The second stack trace is this bug:
https://issues.apache.org/jira/browse/KAFKA-5556

On Fri, 28 Jul 2017 at 12:55 Debasish Ghosh 
wrote:

> Hi -
>
> In my Kafka Streams application, I have a state store resulting from a
> stateful streaming topology. The environment is
>
>- Kafka 0.10.2.1
>- It runs on a DC/OS cluster
>- I am running Confluent-Kafka 3.2.2 on the cluster
>- Each topic that I have has 2 partitions with replication factor = 2
>- The application also has an associated http service that does
>interactive queries on the state store
>
> The application runs fine when I invoke a single instance. I can use the
> http endpoints to do queries and everything looks good. Problems surface
> when I try to spawn another instance of the application. I use the same
> APPLICATION_ID_CONFIG and the instance starts on a different node of the
> cluster. The data consumption part works fine as the new instance also
> starts consuming from the same topic as the first one. But when I try the
> http query, the metadata fetch fails ..
>
> I have some code snippet like this as part of the query that tries to fetch
> the metadata so that I can locate the host to query on ..
>
> metadataService.streamsMetadataForStoreAndKey(store, hostKey,
> stringSerializer) match {
>   case Success(host) => {
> // hostKey is on another instance. call the other instance to fetch
> the data.
> if (!thisHost(host)) {
>   logger.warn(s"Key $hostKey is on another instance not on $host -
> requerying ..")
>   httpRequester.queryFromHost[Long](host, path)
> } else {
>   // hostKey is on this instance
>   localStateStoreQuery.queryStateStore(streams, store, hostKey)
> }
>   }
>   case Failure(ex) => Future.failed(ex)
> }
>
> and the metadataService.streamsMetadataForStoreAndKey has the following
> call ..
>
> streams.metadataForKey(store, key, serializer) match {
>   case null => throw new IllegalArgumentException(s"Metadata for key
> $key not found in $store")
>   case metadata => new HostStoreInfo(metadata.host, metadata.port,
> metadata.stateStoreNames.asScala.toSet)
> }
>
> When I start the second instance, streams.metadataForKey returns null for
> any key I pass .. Here's the relevant stack trace ..
>
> java.lang.IllegalArgumentException: Metadata for key mtc.clark.net not
> found in access-count-per-host
> at
>
> com.xx.fdp.sample.kstream.services.MetadataService.$anonfun$streamsMetadataForStoreAndKey$1(MetadataService.scala:51)
> at scala.util.Try$.apply(Try.scala:209)
> at
>
> com.xx.fdp.sample.kstream.services.MetadataService.streamsMetadataForStoreAndKey(MetadataService.scala:46)
> at
>
> com.xx.fdp.sample.kstream.http.KeyValueFetcher.fetchSummaryInfo(KeyValueFetcher.scala:36)
> at
>
> com.xx.fdp.sample.kstream.http.KeyValueFetcher.fetchAccessCountSummary(KeyValueFetcher.scala:29)
> at
>
> com.xx.fdp.sample.kstream.http.WeblogDSLHttpService.$anonfun$routes$10(WeblogDSLHttpService.scala:41)
> at
>
> akka.http.scaladsl.server.directives.RouteDirectives.$anonfun$complete$1(RouteDirectives.scala:47)
> at
>
> akka.http.scaladsl.server.StandardRoute$$anon$1.apply(StandardRoute.scala:19)
> at
>
> akka.http.scaladsl.server.StandardRoute$$anon$1.apply(StandardRoute.scala:19)
> ...
>
> and following this exception I get another one which looks like an internal
> exception that stops the application ..
>
> 09:57:33.731 TKD [StreamThread-1] ERROR o.a.k.s.p.internals.StreamThread -
> stream-thread [StreamThread-1] Streams application error during processing:
> java.lang.IllegalStateException: Attempt to retrieve exception from future
> which hasn't failed
> at
>
> org.apache.kafka.clients.consumer.internals.RequestFuture.exception(RequestFuture.java:99)
> at
>
> org.apache.kafka.clients.consumer.internals.RequestFuture.isRetriable(RequestFuture.java:89)
> at
>
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:590)
> at
>
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1124)
> at
>
> org.apache.kafka.streams.processor.internals.StreamTask.commitOffsets(StreamTask.java:296)
> at
>
> org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:79)
> at
>
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188)
> at
>
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:280)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:807)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:794)
> 

Re: RocksDB Error on partition assignment

2017-07-28 Thread Damian Guy
It is due to a bug. You should set
StreamsConfig.STATE_CLEANUP_DELAY_MS_CONFIG to Long.MAX_VALUE - i.e.,
disabling it.
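
A minimal sketch of that workaround (application id and broker address are
placeholders):

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class DisableStateCleanup {
    public static Properties props() {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");         // placeholder id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder brokers
        // push state.cleanup.delay.ms out so the background cleaner effectively never runs
        props.put(StreamsConfig.STATE_CLEANUP_DELAY_MS_CONFIG, Long.MAX_VALUE);
        return props;
    }
}
```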

On Fri, 28 Jul 2017 at 10:38 Sameer Kumar  wrote:

> Hi,
>
> I am facing this error, no clue why this occurred. No other exception in
> stacktrace was found.
>
> Only thing different I did was I ran kafka streams jar on machine2 a couple
> of mins after i ran it on machine1.
>
> Please search for this string in the log below:-
> org.apache.kafka.streams.processor.internals.StreamThread$1 for group
> LICSp-4-25k failed on partition assignment
>
>
> 2017-07-28 14:55:51 INFO  StateDirectory:213 - Deleting obsolete state
> directory 2_43 for task 2_43
> 2017-07-28 14:55:51 INFO  StateDirectory:213 - Deleting obsolete state
> directory 1_29 for task 1_29
> 2017-07-28 14:55:52 INFO  StateDirectory:213 - Deleting obsolete state
> directory 2_22 for task 2_22
> 2017-07-28 14:55:52 INFO  StateDirectory:213 - Deleting obsolete state
> directory 0_9 for task 0_9
> 2017-07-28 14:55:52 INFO  StateDirectory:213 - Deleting obsolete state
> directory 0_49 for task 0_49
> 2017-07-28 14:55:52 INFO  StateDirectory:213 - Deleting obsolete state
> directory 2_27 for task 2_27
> 2017-07-28 14:55:52 INFO  StateDirectory:213 - Deleting obsolete state
> directory 2_32 for task 2_32
> 2017-07-28 14:55:52 INFO  StreamThread:767 - stream-thread [StreamThread-7]
> Committing all tasks because the commit interval 5000ms has elapsed
> 2017-07-28 14:55:52 INFO  StreamThread:805 - stream-thread [StreamThread-7]
> Committing task StreamTask 0_1
> 2017-07-28 14:55:52 ERROR StreamThread:813 - stream-thread [StreamThread-2]
> Failed to commit StreamTask 1_35 state:
> org.apache.kafka.streams.errors.ProcessorStateException: task [1_35] Failed
> to flush state store lic3-deb-ci-25k
> at
>
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:337)
> at
>
> org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:72)
> at
>
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188)
> at
>
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:280)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:807)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:794)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:769)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:647)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361)
> Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error
> while executing flush from store lic3-deb-ci-25k-201707280900
> at
>
> org.apache.kafka.streams.state.internals.RocksDBStore.flushInternal(RocksDBStore.java:354)
> at
>
> org.apache.kafka.streams.state.internals.RocksDBStore.flush(RocksDBStore.java:345)
> at
> org.apache.kafka.streams.state.internals.Segments.flush(Segments.java:138)
> at
>
> org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStore.flush(RocksDBSegmentedBytesStore.java:117)
> at
>
> org.apache.kafka.streams.state.internals.WrappedStateStore$AbstractWrappedStateStore.flush(WrappedStateStore.java:80)
> at
>
> org.apache.kafka.streams.state.internals.MeteredSegmentedBytesStore.flush(MeteredSegmentedBytesStore.java:111)
> at
>
> org.apache.kafka.streams.state.internals.RocksDBWindowStore.flush(RocksDBWindowStore.java:92)
> at
>
> org.apache.kafka.streams.state.internals.CachingWindowStore.flush(CachingWindowStore.java:120)
> at
>
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:335)
> ... 8 more
> Caused by: org.rocksdb.RocksDBException: s
> at org.rocksdb.RocksDB.flush(Native Method)
> at org.rocksdb.RocksDB.flush(RocksDB.java:1642)
> at
>
> org.apache.kafka.streams.state.internals.RocksDBStore.flushInternal(RocksDBStore.java:352)
> ... 16 more
> 2017-07-28 14:55:52 INFO  StreamThread:767 - stream-thread
> [StreamThread-12] Committing all tasks because the commit interval 5000ms
> has elapsed
> 2017-07-28 14:55:52 INFO  StreamThread:390 - stream-thread [StreamThread-2]
> Shutting down
> 2017-07-28 14:55:52 INFO  StreamThread:805 - stream-thread
> [StreamThread-12] Committing task StreamTask 1_32
> 2017-07-28 14:55:52 INFO  StreamThread:1075 - stream-thread
> [StreamThread-2] Closing task 0_0
> 2017-07-28 14:55:53 INFO  StreamThread:767 - stream-thread
> [StreamThread-15] Committing all tasks because the commit interval 5000ms
> has elapsed
> 2017-07-28 14:55:53 INFO  StreamThread:805 - stream-thread
> [StreamThread-15] Committing task StreamTask 0_32
> 2017-07-28 14:55:53 INFO  StreamThread:767 - stream-thread [StreamThread-5]
> Committing all tasks because the commit interval 5000ms has elapsed
> 2017-07-28 14:55:53 INFO  

Re: Kafka streams regex match

2017-07-28 Thread Damian Guy
Hi,
The store won't be queryable until after you have called streams.start().
No stores have been created until the application is up and running and
they are dependent on the underlying partitions.

To check that a stateful operation has produced a result you would normally
add another operation after the join, i.e.,
stream.join(other,...).foreach(..) or stream.join(other,...).to("topic")

Thanks,
Damian
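
For example, a sketch against the 0.10.2-era DSL (topic names, the store name
and the String serdes are placeholders) that inspects the join output with
foreach and also writes it out with to():

```java
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;

public class JoinCheckExample {
    public static KStreamBuilder topology() {
        final Serde<String> strings = Serdes.String();
        final KStreamBuilder builder = new KStreamBuilder();

        final KStream<String, String> events = builder.stream(strings, strings, "events");          // placeholder topic
        final KTable<String, String> lookup = builder.table(strings, strings, "lookup", "local-store");

        final KStream<String, String> joined =
                events.leftJoin(lookup, (event, ref) -> event + " | " + (ref == null ? "no match" : ref));

        joined.foreach((key, value) -> System.out.println(key + " -> " + value)); // inspect join results
        joined.to(strings, strings, "parser");                                    // and/or write them out
        return builder;
    }
}
```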

On Thu, 27 Jul 2017 at 22:52 Shekar Tippur <ctip...@gmail.com> wrote:

> One more thing.. How do we check if the stateful join operation resulted in
> a kstream of some value in it (size of kstream)? How do we check the
> content of a kstream?
>
> - S
>
> On Thu, Jul 27, 2017 at 2:06 PM, Shekar Tippur <ctip...@gmail.com> wrote:
>
> > Damien,
> >
> > Thanks a lot for pointing out.
> >
> > I got a little further. I am kind of stuck with the sequencing. Couple of
> > issues:
> > 1. I cannot initialise KafkaStreams before the parser.to().
> > 2. Do I need to create a new KafkaStreams object when I create a
> > KeyValueStore?
> > 3. How do I initialize KeyValueIterator with <String, JsonNode> I seem to
> > get a error when I try:
> > *KeyValueIterator <String,JsonNode> kviterator
> > = keyValueStore.range("test_nod","test_node");*
> >
> > /// START CODE /
> > //parser is a kstream as a result of join
> > if (parser.toString().matches("null")){
> >
> > ReadOnlyKeyValueStore<String, Long> keyValueStore =
> > null;
> > KafkaStreams newstreams = new KafkaStreams(builder, props);
> > try {
> > keyValueStore =
> IntegrationTestUtils.waitUntilStoreIsQueryable("local-store",
> > QueryableStoreTypes.keyValueStore(), newstreams);
> > } catch (InterruptedException e) {
> > e.printStackTrace();
> > }
> > *KeyValueIterator kviterator
> > = keyValueStore.range("test_nod","test_node");*
> > }else {
> >
> > *parser.to <http://parser.to>(stringSerde, jsonSerde, "parser");*}
> >
> > *KafkaStreams streams = new KafkaStreams(builder, props);*
> > streams.start();
> >
> > /// END CODE /
> >
> > - S
> >
> >
> >
> > On Thu, Jul 27, 2017 at 10:05 AM, Damian Guy <damian@gmail.com>
> wrote:
> > >
> > > It is part of the ReadOnlyKeyValueStore interface:
> > >
> > > https://github.com/apache/kafka/blob/trunk/streams/src/
> > main/java/org/apache/kafka/streams/state/ReadOnlyKeyValueStore.java
> > >
> > > On Thu, 27 Jul 2017 at 17:17 Shekar Tippur <ctip...@gmail.com> wrote:
> > >
> > > > That's cool. This feature is a part of rocksdb object and not ktable?
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On Jul 27, 2017, at 07:57, Damian Guy <damian@gmail.com>
> wrote:
> > > > >
> > > > > Yes they can be strings,
> > > > >
> > > > > so you could do something like:
> > > > > store.range("test_host", "test_hosu");
> > > > >
> > > > > This would return an iterator containing all of the values
> > (inclusive)
> > > > from
> > > > > "test_host" -> "test_hosu".
> > > > >
> > > > >> On Thu, 27 Jul 2017 at 14:48 Shekar Tippur <ctip...@gmail.com>
> > wrote:
> > > > >>
> > > > >> Can you please point me to an example? Can from and to be a
> string?
> > > > >>
> > > > >> Sent from my iPhone
> > > > >>
> > > > >>> On Jul 27, 2017, at 04:04, Damian Guy <damian@gmail.com>
> > wrote:
> > > > >>>
> > > > >>> Hi,
> > > > >>>
> > > > >>> You can't use a regex, but you could use a range query.
> > > > >>> i.e, keyValueStore.range(from, to)
> > > > >>>
> > > > >>> Thanks,
> > > > >>> Damian
> > > > >>>
> > > > >>>> On Wed, 26 Jul 2017 at 22:34 Shekar Tippur <ctip...@gmail.com>
> > wrote:
> > > > >>>>
> > > > >>>> Hello,
> > > > >>>>
> > > > >>>> I am able to get the kstream to ktable join work. I have some
> use
> > > > cases
> > > > >>>> where the key is not always a exact match.
> > > > >>>> I was wondering if there is a way to lookup keys based on regex.
> > > > >>>>
> > > > >>>> For example,
> > > > >>>> I have these entries for a ktable:
> > > > >>>> test_host1,{ "source": "test_host", "UL1": "test1_l1" }
> > > > >>>>
> > > > >>>> test_host2,{ "source": "test_host2", "UL1": "test2_l2" }
> > > > >>>>
> > > > >>>> test_host3,{ "source": "test_host3", "UL1": "test3_l3" }
> > > > >>>>
> > > > >>>> blah,{ "source": "blah_host", "UL1": "blah_l3" }
> > > > >>>>
> > > > >>>> and this for a kstream:
> > > > >>>>
> > > > >>>> test_host,{ "source": "test_host", "custom": { "test ": {
> > > > >> "creation_time ":
> > > > >>>> "1234 " } } }
> > > > >>>>
> > > > >>>> In this case, if the exact match does not work, I would like to
> > lookup
> > > > >>>> ktable for all entries that contains "test_host*" in it and have
> > > > >>>> application logic to determine what would be the best fit.
> > > > >>>>
> > > > >>>> Appreciate input.
> > > > >>>>
> > > > >>>> - Shekar
> > > > >>>>
> > > > >>
> > > >
> >
>


Re: Kafka streams regex match

2017-07-27 Thread Damian Guy
It is part of the ReadOnlyKeyValueStore interface:

https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/ReadOnlyKeyValueStore.java

On Thu, 27 Jul 2017 at 17:17 Shekar Tippur <ctip...@gmail.com> wrote:

> That's cool. This feature is a part of rocksdb object and not ktable?
>
> Sent from my iPhone
>
> > On Jul 27, 2017, at 07:57, Damian Guy <damian@gmail.com> wrote:
> >
> > Yes they can be strings,
> >
> > so you could do something like:
> > store.range("test_host", "test_hosu");
> >
> > This would return an iterator containing all of the values (inclusive)
> from
> > "test_host" -> "test_hosu".
> >
> >> On Thu, 27 Jul 2017 at 14:48 Shekar Tippur <ctip...@gmail.com> wrote:
> >>
> >> Can you please point me to an example? Can from and to be a string?
> >>
> >> Sent from my iPhone
> >>
> >>> On Jul 27, 2017, at 04:04, Damian Guy <damian@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> You can't use a regex, but you could use a range query.
> >>> i.e, keyValueStore.range(from, to)
> >>>
> >>> Thanks,
> >>> Damian
> >>>
> >>>> On Wed, 26 Jul 2017 at 22:34 Shekar Tippur <ctip...@gmail.com> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> I am able to get the kstream to ktable join work. I have some use
> cases
> >>>> where the key is not always a exact match.
> >>>> I was wondering if there is a way to lookup keys based on regex.
> >>>>
> >>>> For example,
> >>>> I have these entries for a ktable:
> >>>> test_host1,{ "source": "test_host", "UL1": "test1_l1" }
> >>>>
> >>>> test_host2,{ "source": "test_host2", "UL1": "test2_l2" }
> >>>>
> >>>> test_host3,{ "source": "test_host3", "UL1": "test3_l3" }
> >>>>
> >>>> blah,{ "source": "blah_host", "UL1": "blah_l3" }
> >>>>
> >>>> and this for a kstream:
> >>>>
> >>>> test_host,{ "source": "test_host", "custom": { "test ": {
> >> "creation_time ":
> >>>> "1234 " } } }
> >>>>
> >>>> In this case, if the exact match does not work, I would like to lookup
> >>>> ktable for all entries that contains "test_host*" in it and have
> >>>> application logic to determine what would be the best fit.
> >>>>
> >>>> Appreciate input.
> >>>>
> >>>> - Shekar
> >>>>
> >>
>


Re: Kafka streams regex match

2017-07-27 Thread Damian Guy
Hi,

You can't use a regex, but you could use a range query.
i.e., keyValueStore.range(from, to)

Thanks,
Damian
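
A small sketch of that range lookup against a queryable key-value store (the
running KafkaStreams instance, the "local-store" name and the JsonNode value
type are taken from this thread; everything else is a placeholder):

```java
import com.fasterxml.jackson.databind.JsonNode;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class RangeLookupExample {
    // Assumes a started KafkaStreams instance with a key-value store named "local-store".
    public static void lookupPrefix(final KafkaStreams streams) {
        final ReadOnlyKeyValueStore<String, JsonNode> store =
                streams.store("local-store", QueryableStoreTypes.<String, JsonNode>keyValueStore());

        // With the default String serde the RocksDB store orders keys byte-lexicographically,
        // so every key starting with "test_host" falls between "test_host" and "test_hosu".
        try (final KeyValueIterator<String, JsonNode> range = store.range("test_host", "test_hosu")) {
            while (range.hasNext()) {
                final KeyValue<String, JsonNode> entry = range.next();
                System.out.println(entry.key + " -> " + entry.value); // application logic picks the best fit
            }
        }
    }
}
```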

On Wed, 26 Jul 2017 at 22:34 Shekar Tippur  wrote:

> Hello,
>
> I am able to get the kstream to ktable join work. I have some use cases
> where the key is not always a exact match.
> I was wondering if there is a way to lookup keys based on regex.
>
> For example,
> I have these entries for a ktable:
> test_host1,{ "source": "test_host", "UL1": "test1_l1" }
>
> test_host2,{ "source": "test_host2", "UL1": "test2_l2" }
>
> test_host3,{ "source": "test_host3", "UL1": "test3_l3" }
>
> blah,{ "source": "blah_host", "UL1": "blah_l3" }
>
> and this for a kstream:
>
> test_host,{ "source": "test_host", "custom": { "test ": { "creation_time ":
> "1234 " } } }
>
> In this case, if the exact match does not work, I would like to lookup
> ktable for all entries that contains "test_host*" in it and have
> application logic to determine what would be the best fit.
>
> Appreciate input.
>
> - Shekar
>


Re: handling exceptions in a Kafka Streams application ..

2017-07-27 Thread Damian Guy
On Wed, 26 Jul 2017 at 15:53 Debasish Ghosh <ghosh.debas...@gmail.com>
wrote:

> One of the brokers died. The good thing is that it's not a production
> cluster, it's just a demo cluster. I have no replicas. But I can knock off
> the current Kafka instance and have a new one.
>
>
That explains it.


> Just for my understanding, if I don't have a replica, how should such
> situations be handled ? And if I have replicas, is there any documentation
> that discusses how the leader for the partition will be decided in such
> situations, so that I can take care of things when I move to production.
>
>
If you don't have any replicas, and the broker with that partition goes
offline, then you won't be able to access that partition until the broker
comes back online. There are some docs on replication here:
https://kafka.apache.org/documentation/#replication

Thanks,
Damian
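
For the internal changelog/repartition topics that Streams itself creates, the
replication factor can also be raised up front so a single broker outage does
not take a partition offline — a minimal sketch (values are placeholders and
assume at least three brokers):

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class ReplicatedInternalTopics {
    public static Properties props() {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "kstream-log-count"); // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");    // placeholder
        // internal changelog/repartition topics get replicas, so losing one broker
        // no longer makes their partitions unavailable
        props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3);
        return props;
    }
}
```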


> regards.
>
> On Wed, Jul 26, 2017 at 7:51 PM, Damian Guy <damian@gmail.com> wrote:
>
>> Hi,
>>
>> It looks to me that there is currently no leader for the partition, i.e.,
>> leader -1. Also there are no replicas? Something up with your brokers?
>>
>> Thanks,
>> Damian
>>
>> On Wed, 26 Jul 2017 at 12:34 Debasish Ghosh <ghosh.debas...@gmail.com>
>> wrote:
>>
>>> Hi Damian -
>>>
>>> Yes, it exists .. It's actually a change log topic corresponding to the
>>> state store log-count
>>>
>>> $ dcos confluent-kafka topic describe
>>> kstream-log-count-log-counts-changelog
>>> {
>>>   "partitions": [
>>> {
>>>   "0": {
>>> "leader": -1,
>>> "controller_epoch": 3,
>>> "isr": [],
>>> "leader_epoch": 3,
>>> "version": 1
>>>   }
>>> }
>>>   ]
>>> }
>>>
>>> Also 1 point to note is that when Mesos restarts the process it starts
>>> in a
>>> different node. So the local state store will not exist there. But I
>>> expect
>>> Kafka will create it from the corresponding backed up topic. Hence the
>>> exception looks a bit confusing to me.
>>>
>>> Thoughts ?
>>>
>>> regards.
>>>
>>> On Wed, Jul 26, 2017 at 3:43 PM, Damian Guy <damian@gmail.com>
>>> wrote:
>>>
>>> > The exception indicates that streams was unable to find that
>>> > topic-partition on the kafka brokers. Can you verify that it exists?
>>> > Also, i'm assuming you are on 0.10.2.x?
>>> >
>>> > On Wed, 26 Jul 2017 at 10:54 Debasish Ghosh <ghosh.debas...@gmail.com>
>>> > wrote:
>>> >
>>> > > Thanks Damien .. this worked. But now after the application
>>> restarts, I
>>> > > see the following exception ..
>>> > >
>>> > > 09:41:26.516 TKD [StreamThread-1] ERROR
>>> > >> c.l.fdp.sample.kstream.WeblogDriver$ - Stream terminated because of
>>> > >> uncaught exception .. Shutting down app
>>> > >> org.apache.kafka.streams.errors.StreamsException: stream-thread
>>> > >> [StreamThread-1] Failed to rebalance
>>> > >> at
>>> > >> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(
>>> > StreamThread.java:598)
>>> > >> at
>>> > >> org.apache.kafka.streams.processor.internals.
>>> > StreamThread.run(StreamThread.java:361)
>>> > >> Caused by: org.apache.kafka.streams.errors.StreamsException: task
>>> [0_0]
>>> > >> Store log-counts's change log
>>> (kstream-log-count-log-counts-changelog)
>>> > does
>>> > >> not contain partition 0
>>> > >> at
>>> > >> org.apache.kafka.streams.processor.internals.ProcessorStateManager.
>>> > register(ProcessorStateManager.java:188)
>>> > >> at
>>> > >>
>>> org.apache.kafka.streams.processor.internals.AbstractProcessorContext.
>>> > register(AbstractProcessorContext.java:99)
>>> > >
>>> > >
>>> > > I found this thread ..
>>> > > https://stackoverflow.com/questions/42329387/failed-to-
>>> > rebalance-error-in-kafka-streams-with-more-than-one-topic-partition
>>> > > but unlike this use case I don't make any change in the partition of
>>> any
>>> &g

Re: handling exceptions in a Kafka Streams application ..

2017-07-26 Thread Damian Guy
Hi,

It looks to me that there is currently no leader for the partition, i.e.,
leader -1. Also there are no replicas? Something up with your brokers?

Thanks,
Damian

On Wed, 26 Jul 2017 at 12:34 Debasish Ghosh <ghosh.debas...@gmail.com>
wrote:

> Hi Damian -
>
> Yes, it exists .. It's actually a change log topic corresponding to the
> state store log-count
>
> $ dcos confluent-kafka topic describe
> kstream-log-count-log-counts-changelog
> {
>   "partitions": [
> {
>   "0": {
> "leader": -1,
> "controller_epoch": 3,
> "isr": [],
> "leader_epoch": 3,
> "version": 1
>   }
> }
>   ]
> }
>
> Also 1 point to note is that when Mesos restarts the process it starts in a
> different node. So the local state store will not exist there. But I expect
> Kafka will create it from the corresponding backed up topic. Hence the
> exception looks a bit confusing to me.
>
> Thoughts ?
>
> regards.
>
> On Wed, Jul 26, 2017 at 3:43 PM, Damian Guy <damian@gmail.com> wrote:
>
> > The exception indicates that streams was unable to find that
> > topic-partition on the kafka brokers. Can you verify that it exists?
> > Also, i'm assuming you are on 0.10.2.x?
> >
> > On Wed, 26 Jul 2017 at 10:54 Debasish Ghosh <ghosh.debas...@gmail.com>
> > wrote:
> >
> > > Thanks Damien .. this worked. But now after the application restarts, I
> > > see the following exception ..
> > >
> > > 09:41:26.516 TKD [StreamThread-1] ERROR
> > >> c.l.fdp.sample.kstream.WeblogDriver$ - Stream terminated because of
> > >> uncaught exception .. Shutting down app
> > >> org.apache.kafka.streams.errors.StreamsException: stream-thread
> > >> [StreamThread-1] Failed to rebalance
> > >> at
> > >> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(
> > StreamThread.java:598)
> > >> at
> > >> org.apache.kafka.streams.processor.internals.
> > StreamThread.run(StreamThread.java:361)
> > >> Caused by: org.apache.kafka.streams.errors.StreamsException: task
> [0_0]
> > >> Store log-counts's change log (kstream-log-count-log-counts-changelog)
> > does
> > >> not contain partition 0
> > >> at
> > >> org.apache.kafka.streams.processor.internals.ProcessorStateManager.
> > register(ProcessorStateManager.java:188)
> > >> at
> > >> org.apache.kafka.streams.processor.internals.AbstractProcessorContext.
> > register(AbstractProcessorContext.java:99)
> > >
> > >
> > > I found this thread ..
> > > https://stackoverflow.com/questions/42329387/failed-to-
> > rebalance-error-in-kafka-streams-with-more-than-one-topic-partition
> > > but unlike this use case I don't make any change in the partition of
> any
> > > topic in between the restarts. BTW my application uses stateful
> streaming
> > > and hence Kafka creates any internal topics. Not sure if it's related
> to
> > > this exception though. But the store name mentioned in the exception
> > > (log-count) is one for stateful streaming.
> > >
> > > regards.
> > >
> > > On Wed, Jul 26, 2017 at 2:20 PM, Damian Guy <damian@gmail.com>
> > wrote:
> > >
> > >> Hi Debasish,
> > >>
> > >> It might be that it is blocked in `streams.close()`
> > >> You might want to to try the overload that has a long and TimeUnit as
> > >> params, i.e., `streams.close(1, TimeUnit.MINUTES)`
> > >>
> > >> Thanks,
> > >> Damian
> > >>
> > >> On Wed, 26 Jul 2017 at 09:11 Debasish Ghosh <ghosh.debas...@gmail.com
> >
> > >> wrote:
> > >>
> > >>> Hi -
> > >>>
> > >>> I have a Kafka streams application deployed on a Mesos DC/OS cluster.
> > >>> While
> > >>> the application was running, Kafka suddenly reported to be unhealthy
> > and
> > >>> the application got an exception ..
> > >>>
> > >>> 07:45:16.606 TKD [StreamThread-1] ERROR
> > >>> c.l.f.s.kstream.WeblogProcessing$ -
> > >>> > Stream terminated because of uncaught exception .. Shutting down
> app
> > >>> > org.apache.kafka.streams.errors.StreamsException: task [1_0]
> > exception
> > >>> > caught when pro

Re: Key Value State Store value retention

2017-07-26 Thread Damian Guy
This might help: https://kafka.apache.org/documentation/#compaction

On Wed, 26 Jul 2017 at 12:37 Sameer Kumar <sam.kum.w...@gmail.com> wrote:

> Damian,
>
> Does this mean data is retained for infinite time limited only by disk
> space.
>
> -Sameer.
>
> On Wed, Jul 26, 2017 at 3:53 PM, Sameer Kumar <sam.kum.w...@gmail.com>
> wrote:
>
> > got it. Thanks.
> >
> > On Wed, Jul 26, 2017 at 3:24 PM, Damian Guy <damian@gmail.com>
> wrote:
> >
> >> The changelog is one created by kafka streams, then it is a compacted
> >> topic
> >> and the retention period is irrelevant. If it is one you have created
> >> yourself and isn't compacted, then the data will be retained in the
> topic
> >> for as long as the retention period.
> >> If you use a non-compacted topic and the kafka-streams instance crashes
> >> then that data may be lost from the state store as it will use the topic
> >> to
> >> restore its state.
> >>
> >> On Wed, 26 Jul 2017 at 10:24 Sameer Kumar <sam.kum.w...@gmail.com>
> wrote:
> >>
> >> > ok. Thanks.
> >> >
> >> > Actually, I had this confusion. Changelog like every Kafka topic would
> >> have
> >> > its retention period, lets say 2 days. and if the value on day1 for
> >> key1 =
> >> > 4 and data for key1 doesnt come for next 3 days. Would it still retail
> >> the
> >> > same value(key1=4) on day4.
> >> >
> >> > -Sameer.
> >> >
> >> > On Wed, Jul 26, 2017 at 2:22 PM, Damian Guy <damian@gmail.com>
> >> wrote:
> >> >
> >> > > Sameer,
> >> > >
> >> > > For a KeyValue store the changelog topic is a compacted topic so
> >> there is
> >> > > no retention period. You will always retain the latest value for a
> >> key.
> >> > >
> >> > > Thanks,
> >> > > Damian
> >> > >
> >> > > On Wed, 26 Jul 2017 at 08:36 Sameer Kumar <sam.kum.w...@gmail.com>
> >> > wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > Retention period for state stores are clear(default, otherwise
> >> > specified
> >> > > by
> >> > > > TimeWindows.until). Intrigued to know the retention period for key
> >> > > values.
> >> > > >
> >> > > > The use case is something like I am reading from a windowed store,
> >> and
> >> > > > using plain reduce() with out any time windows. Would the values
> be
> >> > > > retained foreever.
> >> > > >
> >> > > > -Sameer.
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>


Re: handling exceptions in a Kafka Streams application ..

2017-07-26 Thread Damian Guy
The exception indicates that streams was unable to find that
topic-partition on the kafka brokers. Can you verify that it exists?
Also, I'm assuming you are on 0.10.2.x?

On Wed, 26 Jul 2017 at 10:54 Debasish Ghosh <ghosh.debas...@gmail.com>
wrote:

> Thanks Damien .. this worked. But now after the application restarts, I
> see the following exception ..
>
> 09:41:26.516 TKD [StreamThread-1] ERROR
>> c.l.fdp.sample.kstream.WeblogDriver$ - Stream terminated because of
>> uncaught exception .. Shutting down app
>> org.apache.kafka.streams.errors.StreamsException: stream-thread
>> [StreamThread-1] Failed to rebalance
>> at
>> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:598)
>> at
>> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361)
>> Caused by: org.apache.kafka.streams.errors.StreamsException: task [0_0]
>> Store log-counts's change log (kstream-log-count-log-counts-changelog) does
>> not contain partition 0
>> at
>> org.apache.kafka.streams.processor.internals.ProcessorStateManager.register(ProcessorStateManager.java:188)
>> at
>> org.apache.kafka.streams.processor.internals.AbstractProcessorContext.register(AbstractProcessorContext.java:99)
>
>
> I found this thread ..
> https://stackoverflow.com/questions/42329387/failed-to-rebalance-error-in-kafka-streams-with-more-than-one-topic-partition
> but unlike this use case I don't make any change in the partition of any
> topic in between the restarts. BTW my application uses stateful streaming
> and hence Kafka creates any internal topics. Not sure if it's related to
> this exception though. But the store name mentioned in the exception
> (log-count) is one for stateful streaming.
>
> regards.
>
> On Wed, Jul 26, 2017 at 2:20 PM, Damian Guy <damian@gmail.com> wrote:
>
>> Hi Debasish,
>>
>> It might be that it is blocked in `streams.close()`
>> You might want to to try the overload that has a long and TimeUnit as
>> params, i.e., `streams.close(1, TimeUnit.MINUTES)`
>>
>> Thanks,
>> Damian
>>
>> On Wed, 26 Jul 2017 at 09:11 Debasish Ghosh <ghosh.debas...@gmail.com>
>> wrote:
>>
>>> Hi -
>>>
>>> I have a Kafka streams application deployed on a Mesos DC/OS cluster.
>>> While
>>> the application was running, Kafka suddenly reported to be unhealthy and
>>> the application got an exception ..
>>>
>>> 07:45:16.606 TKD [StreamThread-1] ERROR
>>> c.l.f.s.kstream.WeblogProcessing$ -
>>> > Stream terminated because of uncaught exception .. Shutting down app
>>> > org.apache.kafka.streams.errors.StreamsException: task [1_0] exception
>>> > caught when producing
>>> > at
>>> >
>>> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.checkForException(RecordCollectorImpl.java:121)
>>> > at
>>> >
>>> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.flush(RecordCollectorImpl.java:129)
>>> > at
>>> >
>>> org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:76)
>>> > at
>>> >
>>> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188)
>>> > at
>>> >
>>> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:280)
>>> > at
>>> >
>>> org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:807)
>>> > at
>>> >
>>> org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:794)
>>> > at
>>> >
>>> org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:769)
>>> > at
>>> >
>>> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:647)
>>> > at
>>> >
>>> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361)
>>> > Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring
>>> 205
>>> > record(s) for
>>> > kstream-log-processing-windowed-access-count-per-host-repartition-0:
>>> 30020
>>> > ms has passed since last attempt plus backoff time
>>> > 07:45:16.606 TKD [StreamThread-1] ERROR
>>> c.l.f.s.kstream.WeblogProcessing$
>>> > - Stoppi

Re: Key Value State Store value retention

2017-07-26 Thread Damian Guy
If the changelog is one created by Kafka Streams, then it is a compacted topic
and the retention period is irrelevant. If it is one you have created
yourself and isn't compacted, then the data will be retained in the topic
for as long as the retention period.
If you use a non-compacted topic and the kafka-streams instance crashes
then that data may be lost from the state store as it will use the topic to
restore its state.

On Wed, 26 Jul 2017 at 10:24 Sameer Kumar <sam.kum.w...@gmail.com> wrote:

> ok. Thanks.
>
> Actually, I had this confusion. Changelog like every Kafka topic would have
> its retention period, lets say 2 days. and if the value on day1 for key1 =
> 4 and data for key1 doesn't come for the next 3 days. Would it still retain the
> same value(key1=4) on day4.
>
> -Sameer.
>
> On Wed, Jul 26, 2017 at 2:22 PM, Damian Guy <damian@gmail.com> wrote:
>
> > Sameer,
> >
> > For a KeyValue store the changelog topic is a compacted topic so there is
> > no retention period. You will always retain the latest value for a key.
> >
> > Thanks,
> > Damian
> >
> > On Wed, 26 Jul 2017 at 08:36 Sameer Kumar <sam.kum.w...@gmail.com>
> wrote:
> >
> > > Hi,
> > >
> > > Retention period for state stores are clear(default, otherwise
> specified
> > by
> > > TimeWindows.until). Intrigued to know the retention period for key
> > values.
> > >
> > > The use case is something like I am reading from a windowed store, and
> > > using plain reduce() with out any time windows. Would the values be
> > > retained foreever.
> > >
> > > -Sameer.
> > >
> >
>


Re: Kafka Streams 0.10.2.1 client crash - .checkpoint.tmp (No such file or directory)

2017-07-26 Thread Damian Guy
Hi,

Sorry, yes this is a bug to do with file locking and the clean-up thread.
For now the workaround is to configure
StreamsConfig.STATE_CLEANUP_DELAY_MS_CONFIG to a very large value, i.e.,
Long.MAX_VALUE. So it is effectively disabled.

There are a couple of related JIRAs
https://issues.apache.org/jira/browse/KAFKA-5562 and
https://issues.apache.org/jira/browse/KAFKA-4890

Thanks,
Damian

On Wed, 26 Jul 2017 at 06:44 Eric Lalonde <e...@autonomic.ai> wrote:

> Hello, I am able to reproduce this. It occurs during rebalancing when the
> service is restarted. kafka-clients and kafka-streams are both at version
> 0.10.2.1. 3 instances of the service, 4 threads per instance, 100
> partitions.
>
> log excerpt:
>
>  Wed Jul 26 05:32:07 UTC 2017
>  Streams state: REBALANCING
>  Num Stream Threads: 4
>
>  2017-07-26 05:32:20.497 ERROR 7 --- [ StreamThread-1]
> o.a.k.s.p.internals.StreamThread : stream-thread [StreamThread-1]
> Failed to remove suspended task 2_68
>
>  org.apache.kafka.streams.errors.ProcessorStateException: Error while
> closing the state manager
>  at
> org.apache.kafka.streams.processor.internals.AbstractTask.closeStateManager(AbstractTask.java:133)
> ~[kafka-streams-0.10.2.1.jar!/:na]
>  at
> org.apache.kafka.streams.processor.internals.StreamThread.closeNonAssignedSuspendedTasks(StreamThread.java:898)
> [kafka-streams-0.10.2.1.jar!/:na]
>  at
> org.apache.kafka.streams.processor.internals.StreamThread.access$500(StreamThread.java:69)
> [kafka-streams-0.10.2.1.jar!/:na]
>  at
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:233)
> [kafka-streams-0.10.2.1.jar!/:na]
>  at
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:259)
> [kafka-clients-0.10.2.1.jar!/:na]
>  at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:352)
> [kafka-clients-0.10.2.1.jar!/:na]
>  at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
> [kafka-clients-0.10.2.1.jar!/:na]
>  at
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:290)
> [kafka-clients-0.10.2.1.jar!/:na]
>  at
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1029)
> [kafka-clients-0.10.2.1.jar!/:na]
>  at
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
> [kafka-clients-0.10.2.1.jar!/:na]
>  at
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:592)
> [kafka-streams-0.10.2.1.jar!/:na]
>  at
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361)
> [kafka-streams-0.10.2.1.jar!/:na]
>  Caused by: java.io.FileNotFoundException:
> /home/kafka-streams/data/2_68/.checkpoint.tmp (No such file or directory)
>  at java.io.FileOutputStream.open0(Native Method) ~[na:1.8.0_102]
>  at java.io.FileOutputStream.open(FileOutputStream.java:270)
> ~[na:1.8.0_102]
>  at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> ~[na:1.8.0_102]
>  at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
> ~[na:1.8.0_102]
>  at
> org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:71)
> ~[kafka-streams-0.10.2.1.jar!/:na]
>  at
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:386)
> ~[kafka-streams-0.10.2.1.jar!/:na]
>  at
> org.apache.kafka.streams.processor.internals.AbstractTask.closeStateManager(AbstractTask.java:131)
> ~[kafka-streams-0.10.2.1.jar!/:na]
>  ... 11 common frames omitted
>
>
> Interestingly, the directory 2_68 does not appear to exist on the instance
> on which this exception was thrown:
>
> $ ls /home/kafka-streams/data/
>
> 0_10  0_13  0_18  0_23  0_27  0_3   0_33  0_35  0_39  0_43  0_46  0_50
> 0_52  0_62  0_66  0_7   0_76  0_81  0_83  0_87  0_89  0_97  1_0  1_11  1_2
>  1_26  1_31  1_35  1_39  1_41  1_5   1_54  1_56  1_63  1_65  1_69  1_73
> 1_77  1_79  1_88  1_92  1_94  1_96  2_0  2_12  2_16  2_2   2_24  2_33
> 2_40  2_46  2_54  2_60  2_67  2_71  2_76  2_80  2_86  2_91 0_11  0_14
> 0_21  0_24   0_29  0_30  0_34  0_37  0_42  0_44  0_5   0_51  0_57  0_65
> 0_67  0_71  0_80  0_82  0_84  0_88  0_91  0_99  1_1  1_12  1_20  1_30
> 1_34  1_37  1_4   1_45  1_53  1_55  1_62  1_64  1_68  1_70  1_75  1_78
> 1_80  1_89  1_93  1_95  1_99  2_1  2_13  2_19  2_21  2_3   2_34  2_44
> 2_53  2_6   2_65  2_69  2_73  2_78  2_81  2_90  2_99
>
> > On Jul 6, 2017, at 7:50 AM, Ian Duffy <i...@ianduffy.ie> wrote:
> >
> > Hi Damian,
> >
> > Sorry for the delayed reply, have been out of offic
