Re: [DISCUSS] KIP-247: Add public test utils for Kafka Streams

2018-01-16 Thread Jeff Klukas
From what I can tell, global state stores are managed separately from other
state stores and are accessed via different methods.

Do the proposed methods on TopologyTestDriver (such as getStateStore) cover
global stores? If not, can we add an interface for accessing and testing
global stores in the scope of this KIP?
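
For context, a minimal sketch of how I'd expect the proposed driver to be
exercised, based on the KIP draft and the WIP PR (class and method names may
still change before release; the store name and trivial topology below are
invented for illustration):

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.serialization.StringSerializer;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.TopologyTestDriver;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.test.ConsumerRecordFactory;

    public class TopologyTestDriverSketch {
        public static void main(String[] args) {
            // Materialize the latest value per key into a store named "my-store".
            StreamsBuilder builder = new StreamsBuilder();
            builder.table("input-topic", Materialized.as("my-store"));

            Properties config = new Properties();
            config.put(StreamsConfig.APPLICATION_ID_CONFIG, "kip-247-sketch");
            config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");
            config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            // No brokers involved; records are piped through synchronously.
            TopologyTestDriver driver = new TopologyTestDriver(builder.build(), config);
            ConsumerRecordFactory<String, String> factory = new ConsumerRecordFactory<>(
                    "input-topic", new StringSerializer(), new StringSerializer());
            driver.pipeInput(factory.create("input-topic", "k", "v"));

            // The accessor in question; whether it also covers global stores
            // is exactly the open question above.
            KeyValueStore<String, String> store = driver.getKeyValueStore("my-store");
            System.out.println(store.get("k")); // prints "v"
            driver.close();
        }
    }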

On Thu, Jan 11, 2018 at 9:06 PM, Matthias J. Sax 
wrote:

> Dear Kafka community,
>
> I want to propose KIP-247 to add public test utils to the Streams API.
> The goal is to simplify testing of Kafka Streams applications.
>
> Please find details in the wiki:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-247%3A+Add+public+test+utils+for+Kafka+Streams
>
> This is an initial KIP, and we hope to add more utility functions later.
> Thus, this KIP is not comprehensive but a first step. Of course, we can
> enrich this initial KIP if we think it falls too short. But we should
> not aim to be comprehensive to keep the scope manageable.
>
> In fact, I think we should add some more helpers to simplify result
> verification. I will update the KIP with this asap. Just wanted to start
> the discussion early on.
>
> An initial WIP PR can be found here:
> https://github.com/apache/kafka/pull/4402
>
> I also included the user-list (please hit "reply-all" to include both
> lists in this KIP discussion).
>
> Thanks a lot.
>
>
> -Matthias
>
>
>


Re: [VOTE] KIP-215: Add topic regex support for Connect sinks

2017-11-15 Thread Jeff Klukas
I've made all the changes on the KIP-215 and general KIP page to mark this
as approved. Just waiting on approval and merging of the PR at this point.

On Thu, Nov 9, 2017 at 9:36 AM, Randall Hauch <rha...@gmail.com> wrote:

> Jeff,
>
> This KIP does pass with 3 binding +1s and no other binding votes, and with
> more than 72 hours for voting. Do you want to update the KIP-215 and KIP
> list pages accordingly? We can also merge the PR.
>
> Thanks, and very nice work!
>
> Randall
>
> On Fri, Nov 3, 2017 at 8:12 PM, Stephane Maarek <
> steph...@simplemachines.com.au> wrote:
>
> > +1! The S3 or HDFS connector will now be super powerful!
> >
> > On 4 Nov. 2017 11:27 am, "Konstantine Karantasis" <
> > konstant...@confluent.io>
> > wrote:
> >
> > > Nice addition!
> > >
> > > +1 (non-binding)
> > >
> > > Konstantine
> > >
> > > On Fri, Nov 3, 2017 at 4:52 PM, Jeff Klukas <jklu...@simple.com> wrote:
> > >
> > > > So sorry for skirting the process there. I wasn't aware of the 72 hour
> > > > window and I don't see that mentioned in
> > > > https://cwiki.apache.org/confluence/display/KAFKA/Bylaws#Bylaws-Voting
> > > >
> > > > Should I feel free to update that wiki page with a note about the
> > > > window?
> > > >
> > > > On Fri, Nov 3, 2017 at 7:49 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> > > >
> > > > > Jeff,
> > > > >
> > > > > Just FYI re: process, I think you're pretty much definitely in the
> > > > > clear here since this one is a straightforward design I doubt anybody
> > > > > would object to, but voting normally stays open 72h to ensure
> > > > > everyone has a chance to weigh in.
> > > > >
> > > > > Again thanks for the KIP and we can move any final discussion over to
> > > > > the PR!
> > > > >
> > > > > -Ewen
> > > > >
> > > > > On Fri, Nov 3, 2017 at 4:43 PM, Jeff Klukas <jklu...@simple.com> wrote:
> > > > >
> > > > > > Looks like we've achieved lazy majority, so I'll move the KIP to
> > > > > > approved.
> > > > > >
> > > > > > Thanks all for looking this over.
> > > > > >
> > > > > > On Fri, Nov 3, 2017 at 7:31 PM, Jason Gustafson <ja...@confluent.io> wrote:
> > > > > >
> > > > > > > +1. Thanks for the KIP!
> > > > > > >
> > > > > > > On Fri, Nov 3, 2017 at 2:15 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > > >
> > > > > > > > +1 binding
> > > > > > > >
> > > > > > > > On Fri, Nov 3, 2017 at 1:25 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> > > > > > > >
> > > > > > > > > +1 binding
> > > > > > > > >
> > > > > > > > > Thanks Jeff!
> > > > > > > > >
> > > > > > > > > On Wed, Nov 1, 2017 at 5:21 PM, Randall Hauch <rha...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > +1 (non-binding)
> > > > > > > > > >
> > > > > > > > > > Thanks for pushing this through. Great work!
> > > > > > > > > >
> > > > > > > > > > Randall Hauch
> > > > > > > > > >
> > > > > > > > > > On Wed, Nov 1, 2017 at 9:40 AM, Jeff Klukas <jklu...@simple.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > I haven't heard any additional concerns over the
> > > > > > > > > > > proposal, so I'd like to get the voting process started
> > > > > > > > > > > for:
> > > > > > > > > > >
> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-215%3A+Add+topic+regex+support+for+Connect+sinks
> > > > > > > > > > >
> > > > > > > > > > > It adds a topics.regex option for Kafka Connect sinks as
> > > > > > > > > > > an alternative to the existing required topics option.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > -- Guozhang
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] KIP-215: Add topic regex support for Connect sinks

2017-11-03 Thread Jeff Klukas
So sorry for skirting the process there. I wasn't aware of the 72 hour
window and I don't see that mentioned in
https://cwiki.apache.org/confluence/display/KAFKA/Bylaws#Bylaws-Voting

Should I feel free to update that wiki page with a note about the window?

On Fri, Nov 3, 2017 at 7:49 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> Jeff,
>
> Just FYI re: process, I think you're pretty much definitely in the clear
> here since this one is a straightforward design I doubt anybody would
> object to, but voting normally stays open 72h to ensure everyone has a
> chance to weigh in.
>
> Again thanks for the KIP and we can move any final discussion over to the
> PR!
>
> -Ewen
>
> On Fri, Nov 3, 2017 at 4:43 PM, Jeff Klukas <jklu...@simple.com> wrote:
>
> > Looks like we've achieved lazy majority, so I'll move the KIP to
> > approved.
> >
> > Thanks all for looking this over.
> >
> > On Fri, Nov 3, 2017 at 7:31 PM, Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > +1. Thanks for the KIP!
> > >
> > > On Fri, Nov 3, 2017 at 2:15 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > >
> > > > +1 binding
> > > >
> > > > On Fri, Nov 3, 2017 at 1:25 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> > > >
> > > > > +1 binding
> > > > >
> > > > > Thanks Jeff!
> > > > >
> > > > > On Wed, Nov 1, 2017 at 5:21 PM, Randall Hauch <rha...@gmail.com> wrote:
> > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > Thanks for pushing this through. Great work!
> > > > > >
> > > > > > Randall Hauch
> > > > > >
> > > > > > On Wed, Nov 1, 2017 at 9:40 AM, Jeff Klukas <jklu...@simple.com> wrote:
> > > > > >
> > > > > > > I haven't heard any additional concerns over the proposal, so
> > > > > > > I'd like to get the voting process started for:
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-215%3A+Add+topic+regex+support+for+Connect+sinks
> > > > > > >
> > > > > > > It adds a topics.regex option for Kafka Connect sinks as an
> > > > > > > alternative to the existing required topics option.
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
>


Re: [VOTE] KIP-215: Add topic regex support for Connect sinks

2017-11-03 Thread Jeff Klukas
Looks like we've achieved lazy majority, so I'll move the KIP to approved.

Thanks all for looking this over.

On Fri, Nov 3, 2017 at 7:31 PM, Jason Gustafson <ja...@confluent.io> wrote:

> +1. Thanks for the KIP!
>
> On Fri, Nov 3, 2017 at 2:15 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > +1 binding
> >
> > On Fri, Nov 3, 2017 at 1:25 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> >
> > > +1 binding
> > >
> > > Thanks Jeff!
> > >
> > > On Wed, Nov 1, 2017 at 5:21 PM, Randall Hauch <rha...@gmail.com> wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Thanks for pushing this through. Great work!
> > > >
> > > > Randall Hauch
> > > >
> > > > On Wed, Nov 1, 2017 at 9:40 AM, Jeff Klukas <jklu...@simple.com> wrote:
> > > >
> > > > > I haven't heard any additional concerns over the proposal, so I'd
> > > > > like to get the voting process started for:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-215%3A+Add+topic+regex+support+for+Connect+sinks
> > > > >
> > > > > It adds a topics.regex option for Kafka Connect sinks as an
> > > > > alternative to the existing required topics option.
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>


[VOTE] KIP-215: Add topic regex support for Connect sinks

2017-11-01 Thread Jeff Klukas
I haven't heard any additional concerns over the proposal, so I'd like to
get the voting process started for:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-215%3A+Add+topic+regex+support+for+Connect+sinks

It adds a topics.regex option for Kafka Connect sinks as an alternative to
the existing required topics option.
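
For illustration, a minimal sink configuration using the new option might
look like the following ('connector.class' here is a placeholder; any sink
connector would work the same way, and 'topics' and 'topics.regex' are
mutually exclusive):

    name=example-sink
    connector.class=com.example.ExampleSinkConnector
    tasks.max=1
    # Consume every topic matching this java.util.regex.Pattern, rather than
    # naming topics explicitly via the 'topics' list.
    topics.regex=metrics-.*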


Re: [DISCUSS] KIP-215: Add topic regex support for Connect sinks

2017-10-31 Thread Jeff Klukas
I responded to Ewen's suggestions in the PR and went back to using
ConfigException.

If I don't hear any other concerns today, I'll start a [VOTE] thread for
the KIP.

On Mon, Oct 30, 2017 at 9:29 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> I took a quick pass at the PR, looks good so far. ConfigException would
> still be fine in the case you're highlighting as it's inside the framework
> anyway and we'd expect a ConfigException from configure() if connectors try
> to use their ConfigDef to parse an invalid config. But here I don't feel
> strongly about which to use since the error message is clear anyway and
> will just end up in logs / the REST API for the user to sort out.
>
> -Ewen
>
> On Fri, Oct 27, 2017 at 6:39 PM, Jeff Klukas <jklu...@simple.com> wrote:
>
> > I've updated the KIP to use the topics.regex name and opened a WIP PR
> > with an implementation that shows some additional complexity in how the
> > configuration option gets passed through, affecting various public
> > function signatures.
> >
> > I would appreciate any eyes on that for feedback on whether more design
> > discussion needs to happen in the KIP.
> >
> > https://github.com/apache/kafka/pull/4151
> >
> > On Fri, Oct 27, 2017 at 7:50 AM, Jeff Klukas <jklu...@simple.com> wrote:
> >
> > > I added a note in the KIP about ConfigException being thrown. I also
> > > changed the proposed default for the new config to empty string rather
> > > than null.
> > >
> > > Absent a clear definition of what "common" regex syntax is, it seems an
> > > undue burden to ask the user to guess at what Pattern features are safe.
> > > If we do end up implementing a different regex style, I think it will be
> > > necessary to still support the Java Pattern style long-term as an
> > > option. If we want to use a different regex style as default down the
> > > road, we could require "power users" of Java Pattern to enable an
> > > additional config option to maintain compatibility.
> > >
> > > One additional change I might make to the KIP is that 'topics.regex'
> > > might be a better choice for config name than 'topics.pattern'. That
> > > would be in keeping with RegexRouter that has a 'regex' configuration
> > > option rather than 'pattern'.
> > >
> > > On Thu, Oct 26, 2017 at 11:00 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> > >
> > >> It's fine to be more detailed, but ConfigException is already implied
> > >> for all other config issues as well.
> > >>
> > >> Default could be either null or just empty string. re: alternatives,
> > >> if you wanted to be slightly more detailed (though still a bit vague)
> > >> re: supported syntax, you could just say that while Pattern is used, we
> > >> only guarantee support for common regular expression syntax. Not sure
> > >> if there's a good way of defining what "common" syntax is.
> > >>
> > >> Otherwise LGTM, and thanks for helping fill in a longstanding gap!
> > >>
> > >> -Ewen
> > >>
> > >> On Thu, Oct 26, 2017 at 7:56 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >>
> > >> > bq. Users may specify only one of 'topics' or 'topics.pattern'.
> > >> >
> > >> > Can you fill in which exception would be thrown if both of them are
> > >> > specified?
> > >> >
> > >> > Cheers
> > >> >
> > >> > > On Thu, Oct 26, 2017 at 6:27 PM, Jeff Klukas <j...@klukas.net> wrote:
> > >> >
> > >> > > Looking for feedback on
> > >> > >
> > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-215%3A+Add+topic+regex+support+for+Connect+sinks
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>


Re: [DISCUSS] KIP-215: Add topic regex support for Connect sinks

2017-10-27 Thread Jeff Klukas
I've updated the KIP to use the topics.regex name and opened a WIP PR with
an implementation that shows some additional complexity in how the
configuration option gets passed through, affecting various public function
signatures.

I would appreciate any eyes on that for feedback on whether more design
discussion needs to happen in the KIP.

https://github.com/apache/kafka/pull/4151

On Fri, Oct 27, 2017 at 7:50 AM, Jeff Klukas <jklu...@simple.com> wrote:

> I added a note in the KIP about ConfigException being thrown. I also
> changed the proposed default for the new config to empty string rather than
> null.
>
> Absent a clear definition of what "common" regex syntax is, it seems an
> undue burden to ask the user to guess at what Pattern features are safe. If
> we do end up implementing a different regex style, I think it will be
> necessary to still support the Java Pattern style long-term as an option.
> If we want to use a different regex style as default down the road, we
> could require "power users" of Java Pattern to enable an additional config
> option to maintain compatibility.
>
> One additional change I might make to the KIP is that 'topics.regex' might
> be a better choice for config name than 'topics.pattern'. That would be in
> keeping with RegexRouter that has a 'regex' configuration option rather
> than 'pattern'.
>
> On Thu, Oct 26, 2017 at 11:00 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
>
>> It's fine to be more detailed, but ConfigException is already implied for
>> all other config issues as well.
>>
>> Default could be either null or just empty string. re: alternatives, if
>> you
>> wanted to be slightly more detailed (though still a bit vague) re:
>> supported syntax, you could just say that while Pattern is used, we only
>> guarantee support for common regular expression syntax. Not sure if
>> there's
>> a good way of defining what "common" syntax is.
>>
>> Otherwise LGTM, and thanks for helping fill in a longstanding gap!
>>
>> -Ewen
>>
>> On Thu, Oct 26, 2017 at 7:56 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> > bq. Users may specify only one of 'topics' or 'topics.pattern'.
>> >
>> > Can you fill in which exception would be thrown if both of them are
>> > specified?
>> >
>> > Cheers
>> >
>> > On Thu, Oct 26, 2017 at 6:27 PM, Jeff Klukas <j...@klukas.net> wrote:
>> >
>> > > Looking for feedback on
>> > >
>> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-215%3A+Add+topic+regex+support+for+Connect+sinks
>> > >
>> >
>>
>
>


Re: [DISCUSS] KIP-215: Add topic regex support for Connect sinks

2017-10-27 Thread Jeff Klukas
I added a note in the KIP about ConfigException being thrown. I also
changed the proposed default for the new config to empty string rather than
null.

Absent a clear definition of what "common" regex syntax is, it seems an
undue burden to ask the user to guess at what Pattern features are safe. If
we do end up implementing a different regex style, I think it will be
necessary to still support the Java Pattern style long-term as an option.
If we want to use a different regex style as default down the road, we
could require "power users" of Java Pattern to enable an additional config
option to maintain compatibility.

One additional change I might make to the KIP is that 'topics.regex' might
be a better choice for config name than 'topics.pattern'. That would be in
keeping with RegexRouter that has a 'regex' configuration option rather
than 'pattern'.
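
To make the intended validation concrete, a sketch of the mutual-exclusivity
check (the class and field names are illustrative, not necessarily what the
PR uses):

    import java.util.Map;
    import org.apache.kafka.common.config.ConfigException;

    class SinkConfigValidationSketch {
        // Reject configs that set both options, and require at least one.
        static void validate(Map<String, String> props) {
            String topics = props.getOrDefault("topics", "");
            String topicsRegex = props.getOrDefault("topics.regex", "");
            if (!topics.isEmpty() && !topicsRegex.isEmpty())
                throw new ConfigException(
                        "Only one of 'topics' or 'topics.regex' may be specified.");
            if (topics.isEmpty() && topicsRegex.isEmpty())
                throw new ConfigException(
                        "Either 'topics' or 'topics.regex' must be specified.");
        }
    }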

On Thu, Oct 26, 2017 at 11:00 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> It's fine to be more detailed, but ConfigException is already implied for
> all other config issues as well.
>
> Default could be either null or just empty string. re: alternatives, if you
> wanted to be slightly more detailed (though still a bit vague) re:
> supported syntax, you could just say that while Pattern is used, we only
> guarantee support for common regular expression syntax. Not sure if there's
> a good way of defining what "common" syntax is.
>
> Otherwise LGTM, and thanks for helping fill in a longstanding gap!
>
> -Ewen
>
> On Thu, Oct 26, 2017 at 7:56 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > bq. Users may specify only one of 'topics' or 'topics.pattern'.
> >
> > Can you fill in which exception would be thrown if both of them are
> > specified?
> >
> > Cheers
> >
> > On Thu, Oct 26, 2017 at 6:27 PM, Jeff Klukas <j...@klukas.net> wrote:
> >
> > > Looking for feedback on
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-215%3A+Add+topic+regex+support+for+Connect+sinks
> > >
> >
>


[DISCUSS] KIP-215: Add topic regex support for Connect sinks

2017-10-26 Thread Jeff Klukas
Looking for feedback on

https://cwiki.apache.org/confluence/display/KAFKA/KIP-215%3A+Add+topic+regex+support+for+Connect+sinks


[jira] [Updated] (KAFKA-4932) Add UUID Serde

2017-03-22 Thread Jeff Klukas (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Klukas updated KAFKA-4932:
---
Description: 
I propose adding serializers and deserializers for the java.util.UUID class.

I have many use cases where I want to set the key of a Kafka message to be a 
UUID. Currently, I need to turn UUIDs into strings or byte arrays and use their 
associated Serdes, but it would be more convenient to serialize and deserialize 
UUIDs directly.

I'd propose that the serializer and deserializer use the 36-byte string 
representation, calling UUID.toString and UUID.fromString. We would also wrap 
these in a Serde and modify the streams Serdes class to include this in the 
list of supported types.

Optionally, we could have the deserializer support a 16-byte representation and 
it would check the size of the input byte array to determine whether it's a 
binary or string representation of the UUID. It's not well defined whether the 
most significant bits or least significant go first, so this deserializer would 
have to support only one or the other.

Similarly, if the deserializer supported a 16-byte representation, there could 
be two variants of the serializer, a UUIDStringSerializer and a 
UUIDBytesSerializer.

I would be willing to write this PR, but am looking for feedback about whether 
there are significant concerns here around ambiguity of what the byte 
representation of a UUID should be, or if there's desire to keep the list of 
built-in Serdes minimal such that a PR would be unlikely to be accepted.

  was:
I propose adding serializers and deserializers for the java.util.UUID class.

I have many use cases where I want to set the key of a Kafka message to be a 
UUID. Currently, I need to turn UUIDs into strings or byte arrays and use the 
associated Serdes, but it would be more convenient to serialize and deserialize 
UUIDs directly.

I'd propose that the serializer and deserializer use the 36-byte string 
representation, calling UUID.toString and UUID.fromString

Optionally, we could also have the deserializer support a 16-byte representation 
and it would check the size of the input byte array to determine whether it's a 
binary or string representation of the UUID. It's not well defined whether the 
most significant bits or least significant go first, so this deserializer would 
have to support only one or the other.

Optionally, there could be two variants of the serializer, a 
UUIDStringSerializer and a UUIDBytesSerializer.

We would also wrap these in a Serde and modify the Serdes class to include this 
in the list of supported types.

I would be willing to write this PR, but am looking for feedback about whether 
there are significant concerns here around ambiguity of what the byte 
representation of a UUID should be, or if there's desire to keep the list of 
built-in Serdes minimal such that a PR would be unlikely to be accepted.


> Add UUID Serde
> --
>
> Key: KAFKA-4932
> URL: https://issues.apache.org/jira/browse/KAFKA-4932
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, streams
>    Reporter: Jeff Klukas
>Priority: Minor
>
> I propose adding serializers and deserializers for the java.util.UUID class.
> I have many use cases where I want to set the key of a Kafka message to be a 
> UUID. Currently, I need to turn UUIDs into strings or byte arrays and use 
> their associated Serdes, but it would be more convenient to serialize and 
> deserialize UUIDs directly.
> I'd propose that the serializer and deserializer use the 36-byte string 
> representation, calling UUID.toString and UUID.fromString. We would also wrap 
> these in a Serde and modify the streams Serdes class to include this in the 
> list of supported types.
> Optionally, we could have the deserializer support a 16-byte representation 
> and it would check the size of the input byte array to determine whether it's 
> a binary or string representation of the UUID. It's not well defined whether 
> the most significant bits or least significant go first, so this deserializer 
> would have to support only one or the other.
> Similarly, if the deserializer supported a 16-byte representation, there could 
> be two variants of the serializer, a UUIDStringSerializer and a 
> UUIDBytesSerializer.
> I would be willing to write this PR, but am looking for feedback about 
> whether there are significant concerns here around ambiguity of what the byte 
> representation of a UUID should be, or if there's desire to keep the list of 
> built-in Serdes minimal such that a PR would be unlikely to be accepted.
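
To make the proposal concrete, a sketch of the string-based pair described
above (class names are illustrative; a Serde wrapper and registration in the
streams Serdes class would accompany these):

    import java.nio.charset.StandardCharsets;
    import java.util.Map;
    import java.util.UUID;
    import org.apache.kafka.common.serialization.Deserializer;
    import org.apache.kafka.common.serialization.Serializer;

    public class UUIDStringSerializer implements Serializer<UUID> {
        @Override public void configure(Map<String, ?> configs, boolean isKey) {}
        @Override public byte[] serialize(String topic, UUID data) {
            // 36-character canonical form, e.g. "123e4567-e89b-12d3-a456-426655440000"
            return data == null ? null : data.toString().getBytes(StandardCharsets.UTF_8);
        }
        @Override public void close() {}
    }

    class UUIDStringDeserializer implements Deserializer<UUID> {
        @Override public void configure(Map<String, ?> configs, boolean isKey) {}
        @Override public UUID deserialize(String topic, byte[] data) {
            return data == null ? null : UUID.fromString(new String(data, StandardCharsets.UTF_8));
        }
        @Override public void close() {}
    }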





[jira] [Updated] (KAFKA-4932) Add UUID Serde

2017-03-22 Thread Jeff Klukas (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Klukas updated KAFKA-4932:
---
Summary: Add UUID Serde  (was: Add UUID Serdes)

> Add UUID Serde
> --
>
> Key: KAFKA-4932
> URL: https://issues.apache.org/jira/browse/KAFKA-4932
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, streams
>    Reporter: Jeff Klukas
>Priority: Minor
>
> I propose adding serializers and deserializers for the java.util.UUID class.
> I have many use cases where I want to set the key of a Kafka message to be a 
> UUID. Currently, I need to turn UUIDs into strings or byte arrays and use the 
> associated Serdes, but it would be more convenient to serialize and 
> deserialize UUIDs directly.
> I'd propose that the serializer and deserializer use the 36-byte string 
> representation, calling UUID.toString and UUID.fromString
> Optionally, we could also have the deserializer support a 16-byte 
> representation and it would check the size of the input byte array to determine 
> whether it's a binary or string representation of the UUID. It's not well 
> defined whether the most significant bits or least significant go first, so 
> this deserializer would have to support only one or the other.
> Optionally, there could be two variants of the serializer, a 
> UUIDStringSerializer and a UUIDBytesSerializer.
> We would also wrap these in a Serde and modify the Serdes class to include 
> this in the list of supported types.
> I would be willing to write this PR, but am looking for feedback about 
> whether there are significant concerns here around ambiguity of what the byte 
> representation of a UUID should be, or if there's desire to keep the list of 
> built-in Serdes minimal such that a PR would be unlikely to be accepted.





[jira] [Created] (KAFKA-4932) Add UUID Serdes

2017-03-22 Thread Jeff Klukas (JIRA)
Jeff Klukas created KAFKA-4932:
--

 Summary: Add UUID Serdes
 Key: KAFKA-4932
 URL: https://issues.apache.org/jira/browse/KAFKA-4932
 Project: Kafka
  Issue Type: Improvement
  Components: clients, streams
Reporter: Jeff Klukas
Priority: Minor


I propose adding serializers and deserializers for the java.util.UUID class.

I have many use cases where I want to set the key of a Kafka message to be a 
UUID. Currently, I need to turn UUIDs into strings or byte arrays and use the 
associated Serdes, but it would be more convenient to serialize and deserialize 
UUIDs directly.

I'd propose that the serializer and deserializer use the 36-byte string 
representation, calling UUID.toString and UUID.fromString

Optionally, we could also have the deserializer support a 16-byte representation 
and it would check the size of the input byte array to determine whether it's a 
binary or string representation of the UUID. It's not well defined whether the 
most significant bits or least significant go first, so this deserializer would 
have to support only one or the other.

Optionally, there could be two variants of the serializer, a 
UUIDStringSerializer and a UUIDBytesSerializer.

We would also wrap these in a Serde and modify the Serdes class to include this 
in the list of supported types.

I would be willing to write this PR, but am looking for feedback about whether 
there are significant concerns here around ambiguity of what the byte 
representation of a UUID should be, or if there's desire to keep the list of 
built-in Serdes minimal such that a PR would be unlikely to be accepted.





[jira] [Resolved] (KAFKA-4257) Inconsistencies in 0.10.1 upgrade docs

2016-10-07 Thread Jeff Klukas (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Klukas resolved KAFKA-4257.

Resolution: Fixed

Ismael's PR addresses these questions.

> Inconsistencies in 0.10.1 upgrade docs 
> ---
>
> Key: KAFKA-4257
> URL: https://issues.apache.org/jira/browse/KAFKA-4257
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.10.1.0
>    Reporter: Jeff Klukas
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> There are several inconsistencies in the 0.10.1.0 upgrade docs that make it 
> difficult to determine what client versions are compatible with what broker 
> versions.
> The initial heading in the upgrade docs is "Upgrading from 0.10.0.X to 
> 0.10.1.0", but it includes clauses about versions prior to 0.10. Is the 
> intention for these instructions to be valid for upgrading from brokers as 
> far back as 0.8? Should this section simply be called "Upgrading to 0.10.1.0"?
> I cannot tell from the docs whether I can upgrade to 0.10.1.0 clients on top 
> of 0.10.0.X brokers. In particular, step 5 of the upgrade instructions 
> mentions "Once all consumers have been upgraded to 0.10.0". Should that read 
> 0.10.1, or is the intention here truly that clients on 0.9.X or below need to 
> be at version 0.10.0.0 at a minimum?





[jira] [Commented] (KAFKA-4257) Inconsistencies in 0.10.1 upgrade docs

2016-10-05 Thread Jeff Klukas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15548729#comment-15548729
 ] 

Jeff Klukas commented on KAFKA-4257:


The pull request definitely helps. Thanks for jumping in to make those changes. 
I left a comment on the PR.

> Inconsistencies in 0.10.1 upgrade docs 
> ---
>
> Key: KAFKA-4257
> URL: https://issues.apache.org/jira/browse/KAFKA-4257
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.10.1.0
>    Reporter: Jeff Klukas
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> There are several inconsistencies in the 0.10.1.0 upgrade docs that make it 
> difficult to determine what client versions are compatible with what broker 
> versions.
> The initial heading in the upgrade docs is "Upgrading from 0.10.0.X to 
> 0.10.1.0", but it includes clauses about versions prior to 0.10. Is the 
> intention for these instructions to be valid for upgrading from brokers as 
> far back as 0.8? Should this section simply be called "Upgrading to 0.10.1.0"?
> I cannot tell from the docs whether I can upgrade to 0.10.1.0 clients on top 
> of 0.10.0.X brokers. In particular, step 5 of the upgrade instructions 
> mentions "Once all consumers have been upgraded to 0.10.0". Should that read 
> 0.10.1, or is the intention here truly that clients on 0.9.X or below need to 
> be at version 0.10.0.0 at a minimum?





[jira] [Created] (KAFKA-4257) Inconsistencies in 0.10.1 upgrade docs

2016-10-05 Thread Jeff Klukas (JIRA)
Jeff Klukas created KAFKA-4257:
--

 Summary: Inconsistencies in 0.10.1 upgrade docs 
 Key: KAFKA-4257
 URL: https://issues.apache.org/jira/browse/KAFKA-4257
 Project: Kafka
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.10.1.0
Reporter: Jeff Klukas
Priority: Minor
 Fix For: 0.10.1.0


There are several inconsistencies in the 0.10.1.0 upgrade docs that make it 
difficult to determine what client versions are compatible with what broker 
versions.

The initial heading in the upgrade docs is "Upgrading from 0.10.0.X to 
0.10.1.0", but it includes clauses about versions prior to 0.10. Is the 
intention for these instructions to be valid for upgrading from brokers as far 
back as 0.8? Should this section simply be called "Upgrading to 0.10.1.0"?

I cannot tell from the docs whether I can upgrade to 0.10.1.0 clients on top of 
0.10.0.X brokers. In particular, step 5 of the upgrade instructions mentions 
"Once all consumers have been upgraded to 0.10.0". Should that read 0.10.1, or 
is the intention here truly that clients on 0.9.X or below need to be at version 
0.10.0.0 at a minimum?





Re: [DISCUSS] Java 8 as a minimum requirement

2016-06-17 Thread Jeff Klukas
On Thu, Jun 16, 2016 at 5:20 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> On Thu, Jun 16, 2016 at 11:13 PM, Stephen Boesch <java...@gmail.com>
> wrote:
>
> > @Jeff Klukas What is the concern about Scala 2.11 vs 2.12? 2.11 runs on
> > both Java 7 and Java 8.
> >
>
> Scala 2.10.5 and 2.10.6 also support Java 8 for what it's worth.
>

I was under the impression that Scala 2.12 would be the first version
compatible with Java 8 bytecode, but it looks like that was a misunderstanding
on my part.

+1


Re: [DISCUSS] Java 8 as a minimum requirement

2016-06-16 Thread Jeff Klukas
Would the move to Java 8 be for all modules? I'd have some concern about
removing Java 7 compatibility for kafka-clients and for Kafka Streams
(though less so since it's still so new). I don't know how hard it will be
to transition a Scala 2.11 application to Scala 2.12. Are we comfortable
with the idea of applications stuck on Scala 2.11 or otherwise unable to
update to Java 8 not having access to new client releases?

On Thu, Jun 16, 2016 at 5:05 PM, Philippe Derome  wrote:

> I strongly support the motion, having had difficulty running (Apache Kafka
> as opposed to Confluent) Streams examples with JDK 8 today.
> On 16 Jun 2016 4:46 p.m., "Ismael Juma"  wrote:
>
> > Hi all,
> >
> > I would like to start a discussion on making Java 8 a minimum requirement
> > for Kafka's next feature release (let's say Kafka 0.10.1.0 for now). This
> > is the first discussion on the topic so the idea is to understand how
> > people feel about it. If people feel it's too soon, then we can pick up
> the
> > conversation again after Kafka 0.10.1.0. If the feedback is mostly
> > positive, I will start a vote thread.
> >
> > Let's start with some dates. Java 7 hasn't received public updates since
> > April 2015[1], Java 8 was released in March 2014[2] and Java 9 is
> scheduled
> > to be released in March 2017[3].
> >
> > The first argument for dropping support for Java 7 is that the last
> public
> > release by Oracle contains a large number of known security
> > vulnerabilities. The effectiveness of Kafka's security features is
> reduced
> > if the underlying runtime is not itself secure.
> >
> > The second argument for moving to Java 8 is that it adds a number of
> > compelling features:
> >
> > * Lambda expressions and method references (particularly useful for the
> > Kafka Streams DSL)
> > * Default methods (very useful for maintaining compatibility when adding
> > methods to interfaces)
> > * java.util.stream (helpful for making collection transformations more
> > concise)
> > * Lots of improvements to java.util.concurrent (CompletableFuture,
> > DoubleAdder, DoubleAccumulator, StampedLock, LongAdder, LongAccumulator)
> > * Other nice things: SplittableRandom, Optional (and many others I have
> not
> > mentioned)
> >
> > The third argument is that it will simplify our testing matrix, we won't
> > have to test with Java 7 any longer (this is particularly useful for
> system
> > tests that take hours to run). It will also make it easier to support
> Scala
> > 2.12, which requires Java 8.
> >
> > The fourth argument is that many other open-source projects have taken
> the
> > leap already. Examples are Cassandra[4], Lucene[5], Akka[6], Hadoop 3[7],
> > Jetty[8], Eclipse[9], IntelliJ[10] and many others[11]. Even Android will
> > support Java 8 in the next version (although it will take a while before
> > most phones will use that version sadly). This reduces (but does not
> > eliminate) the chance that we would be the first project that would
> cause a
> > user to consider a Java upgrade.
> >
> > The main argument for not making the change is that a reasonable number
> of
> > users may still be using Java 7 by the time Kafka 0.10.1.0 is released.
> > More specifically, we care about the subset who would be able to upgrade
> to
> > Kafka 0.10.1.0, but would not be able to upgrade the Java version. It
> would
> > be great if we could quantify this in some way.
> >
> > What do you think?
> >
> > Ismael
> >
> > [1] https://java.com/en/download/faq/java_7.xml
> > [2] https://blogs.oracle.com/thejavatutorials/entry/jdk_8_is_released
> > [3] http://openjdk.java.net/projects/jdk9/
> > [4] https://github.com/apache/cassandra/blob/trunk/README.asc
> > [5] https://lucene.apache.org/#highlights-of-this-lucene-release-include
> > [6] http://akka.io/news/2015/09/30/akka-2.4.0-released.html
> > [7] https://issues.apache.org/jira/browse/HADOOP-11858
> > [8] https://webtide.com/jetty-9-3-features/
> > [9] http://markmail.org/message/l7s276y3xkga2eqf
> > [10]
> >
> >
> https://intellij-support.jetbrains.com/hc/en-us/articles/206544879-Selecting-the-JDK-version-the-IDE-will-run-under
> > [11] http://markmail.org/message/l7s276y3xkga2eqf
> >
>


[jira] [Updated] (KAFKA-3753) Add approximateNumEntries() to the StateStore interface for metrics reporting

2016-06-10 Thread Jeff Klukas (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Klukas updated KAFKA-3753:
---
Summary: Add approximateNumEntries() to the StateStore interface for 
metrics reporting  (was: Add size() to the StateStore interface for metrics 
reporting)

> Add approximateNumEntries() to the StateStore interface for metrics reporting
> -
>
> Key: KAFKA-3753
> URL: https://issues.apache.org/jira/browse/KAFKA-3753
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>    Reporter: Jeff Klukas
>Assignee: Guozhang Wang
>Priority: Minor
>  Labels: api
> Fix For: 0.10.1.0
>
>
> As a developer building a Kafka Streams application, I'd like to have 
> visibility into what's happening with my state stores. How can I know if a 
> particular store is growing large? How can I know if a particular store is 
> frequently needing to hit disk?
> I'm interested to know if there are existing mechanisms for extracting this 
> information or if other people have thoughts on how we might approach this.
> I can't think of a way to provide metrics generically, so each state store 
> implementation would likely need to handle this separately. Given that the 
> default RocksDBStore will likely be the most-used, it would be a first target 
> for adding metrics.
> I'd be interested in knowing the total number of entries in the store, the 
> total size on disk and in memory, rates of gets and puts, and hit/miss ratio 
> for the MemoryLRUCache. Some of these numbers are likely calculable through 
> the RocksDB API, others may simply not be accessible.
> Would there be value to the wider community in having state stores register 
> metrics?





[jira] [Commented] (KAFKA-3801) Provide static serialize() and deserialize() for use as method references

2016-06-10 Thread Jeff Klukas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325270#comment-15325270
 ] 

Jeff Klukas commented on KAFKA-3801:


I like the concise code you get from a static method reference: 
{{mapValues(LongDeserializer::deserialize)}} as opposed to {{mapValues(bytes -> 
Serdes.Long().deserializer().deserialize(bytes))}}, but you're correct that 
it's possible to use {{Serdes}} here inline.

I'm fine to close this in deference to a more comprehensive solution down the 
line.

> Provide static serialize() and deserialize() for use as method references
> -
>
> Key: KAFKA-3801
> URL: https://issues.apache.org/jira/browse/KAFKA-3801
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, streams
>    Reporter: Jeff Klukas
>Assignee: Guozhang Wang
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> While most calls to {{Serializer.serialize}} and {{Deserializer.deserialize}} 
> are abstracted away in Kafka Streams through the use of `Serdes` classes, 
> there are some instances where developers may want to call them directly. The 
> serializers and deserializers for simple types don't require any 
> configuration and could be static, but currently it's necessary to create an 
> instance to use those methods.
> I'd propose moving serialization logic into a {{static public byte[] 
> serialize(? data)}} method and deserialization logic into a {{static public ? 
> deserialize(byte[] data)}} method. The existing instance methods would simply 
> call the static versions.
> See a full example for LongSerializer and LongDeserializer here:
> https://github.com/apache/kafka/compare/trunk...jklukas:static-serde-methods?expand=1
> In Java 8, these static methods then become available for method references 
> in code like {{kstream.mapValues(LongDeserializer::deserialize)}} without the 
> user needing to create an instance of {{LongDeserializer}}.





[jira] [Commented] (KAFKA-3801) Provide static serialize() and deserialize() for use as method references

2016-06-09 Thread Jeff Klukas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323718#comment-15323718
 ] 

Jeff Klukas commented on KAFKA-3801:


Let me give more context for the example. We have an application that produces 
JSON messages to a Kafka topic interleaved with occasional checkpoint messages 
that are of {{Long}} type.

If I want to create a KStream of just the checkpoint messages, I need to filter 
out the JSON messages before deserializing. Here's what it looks like:

{{KStream<Long, Long> checkpointStream = builder.stream(Serdes.Long(), Serdes.ByteArray(), inputTopicName)}}
{{.filter((key, bytes) -> bytes.length == 8).mapValues(LongDeserializer::deserialize)}}

I need to use ByteArraySerde when calling {{stream}}, then I do the 
deserialization in a {{mapValues}} invocation after filtering out messages of 
the wrong type.

Another option would be to materialize the stream to a topic after the filter 
and then call {{builder.stream(Serdes.Long(), Serdes.Long(), newTopicName)}}, 
but I'd like to avoid unnecessary materialization.

So in the current scheme, I need to create an instance of {{LongDeserializer}} 
separately so that I can then call its {{deserialize}} method in {{mapValues}}.

This situation probably won't occur frequently, so I understand if it's decided 
not to bother considering this change.

> Provide static serialize() and deserialize() for use as method references
> -
>
> Key: KAFKA-3801
> URL: https://issues.apache.org/jira/browse/KAFKA-3801
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, streams
>Reporter: Jeff Klukas
>Assignee: Guozhang Wang
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> While most calls to {{Serializer.serialize}} and {{Deserializer.deserialize}} 
> are abstracted away in Kafka Streams through the use of `Serdes` classes, 
> there are some instances where developers may want to call them directly. The 
> serializers and deserializers for simple types don't require any 
> configuration and could be static, but currently it's necessary to create an 
> instance to use those methods.
> I'd propose moving serialization logic into a {{static public byte[] 
> serialize(? data)}} method and deserialization logic into a {{static public ? 
> deserialize(byte[] data)}} method. The existing instance methods would simply 
> call the static versions.
> See a full example for LongSerializer and LongDeserializer here:
> https://github.com/apache/kafka/compare/trunk...jklukas:static-serde-methods?expand=1
> In Java 8, these static methods then become available for method references 
> in code like {{kstream.mapValues(LongDeserializer::deserialize)}} without the 
> user needing to create an instance of {{LongDeserializer}}.





[jira] [Commented] (KAFKA-3753) Add size() to the StateStore interface for metrics reporting

2016-06-09 Thread Jeff Klukas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323342#comment-15323342
 ] 

Jeff Klukas commented on KAFKA-3753:


I agree that {{size}} is a bit ambiguous. I'd expect many folks would recognize 
the analogy to Java's {{Map.size}}, but there's no reason we need to stick with 
that.

RocksDB has nice consistency in their naming of properties, where {{size}} 
refers to bytes and {{num-entries}} refers to a count.

The PR discussion also brings up the point that since we can only get an 
estimated count from RocksDB, the name of this method should indicate that the 
result is not necessarily exact.

I'd be happy to see this method called {{estimatedCount}}, 
{{estimatedNumEntries}}, {{approximateCount}}, or some other variant.

> Add size() to the StateStore interface for metrics reporting
> 
>
> Key: KAFKA-3753
> URL: https://issues.apache.org/jira/browse/KAFKA-3753
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>    Reporter: Jeff Klukas
>Assignee: Guozhang Wang
>Priority: Minor
>  Labels: api
> Fix For: 0.10.1.0
>
>
> As a developer building a Kafka Streams application, I'd like to have 
> visibility into what's happening with my state stores. How can I know if a 
> particular store is growing large? How can I know if a particular store is 
> frequently needing to hit disk?
> I'm interested to know if there are existing mechanisms for extracting this 
> information or if other people have thoughts on how we might approach this.
> I can't think of a way to provide metrics generically, so each state store 
> implementation would likely need to handle this separately. Given that the 
> default RocksDBStore will likely be the most-used, it would be a first target 
> for adding metrics.
> I'd be interested in knowing the total number of entries in the store, the 
> total size on disk and in memory, rates of gets and puts, and hit/miss ratio 
> for the MemoryLRUCache. Some of these numbers are likely calculable through 
> the RocksDB API, others may simply not be accessible.
> Would there be value to the wider community in having state stores register 
> metrics?





[jira] [Created] (KAFKA-3817) KTableRepartitionMap should handle null inputs

2016-06-09 Thread Jeff Klukas (JIRA)
Jeff Klukas created KAFKA-3817:
--

 Summary: KTableRepartitionMap should handle null inputs
 Key: KAFKA-3817
 URL: https://issues.apache.org/jira/browse/KAFKA-3817
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.0.0
Reporter: Jeff Klukas
Assignee: Guozhang Wang
 Fix For: 0.10.0.1


When calling {{KTable.groupBy}} on the result of a KTable-KTable join, NPEs are 
raised:

{{org.apache.kafka.streams.kstream.internals.KTableRepartitionMap$KTableMapProcessor.process(KTableRepartitionMap.java:88)}}

The root cause is that the join is expected to emit null values when no match 
is found, but KTableRepartitionMap is not set up to handle this case.

On the users email list, [~guozhang] described a plan of action:

I think this is actually a bug in KTableRepartitionMap
that it actually should expect null grouped keys; this would be a
straight-forward fix for this operator, but I can make a pass over all the
repartition operators just to make sure they are all gracefully handling
null keys.





[jira] [Commented] (KAFKA-3753) Metrics for StateStores

2016-06-09 Thread Jeff Klukas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322683#comment-15322683
 ] 

Jeff Klukas commented on KAFKA-3753:


Since there's refactoring going on right now with the metrics interface for 
streams (https://issues.apache.org/jira/browse/KAFKA-3715), I think we should 
defer actually adding size metrics to a separate issue.

The PR I attached here adds the size() method so that it can be used for a 
metric in the future.

> Metrics for StateStores
> ---
>
> Key: KAFKA-3753
> URL: https://issues.apache.org/jira/browse/KAFKA-3753
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>    Reporter: Jeff Klukas
>Assignee: Guozhang Wang
>Priority: Minor
>  Labels: api
> Fix For: 0.10.1.0
>
>
> As a developer building a Kafka Streams application, I'd like to have 
> visibility into what's happening with my state stores. How can I know if a 
> particular store is growing large? How can I know if a particular store is 
> frequently needing to hit disk?
> I'm interested to know if there are existing mechanisms for extracting this 
> information or if other people have thoughts on how we might approach this.
> I can't think of a way to provide metrics generically, so each state store 
> implementation would likely need to handle this separately. Given that the 
> default RocksDBStore will likely be the most-used, it would be a first target 
> for adding metrics.
> I'd be interested in knowing the total number of entries in the store, the 
> total size on disk and in memory, rates of gets and puts, and hit/miss ratio 
> for the MemoryLRUCache. Some of these numbers are likely calculable through 
> the RocksDB API, others may simply not be accessible.
> Would there be value to the wider community in having state stores register 
> metrics?





[jira] [Created] (KAFKA-3801) Provide static serialize() and deserialize() for use as method references

2016-06-07 Thread Jeff Klukas (JIRA)
Jeff Klukas created KAFKA-3801:
--

 Summary: Provide static serialize() and deserialize() for use as 
method references
 Key: KAFKA-3801
 URL: https://issues.apache.org/jira/browse/KAFKA-3801
 Project: Kafka
  Issue Type: Improvement
  Components: clients, streams
Reporter: Jeff Klukas
Assignee: Guozhang Wang
Priority: Minor
 Fix For: 0.10.1.0


While most calls to {{Serializer.serialize}} and {{Deserializer.deserialize}} 
are abstracted away in Kafka Streams through the use of `Serdes` classes, there 
are some instances where developers may want to call them directly. The 
serializers and deserializers for simple types don't require any configuration 
and could be static, but currently it's necessary to create an instance to use 
those methods.

I'd propose moving serialization logic into a {{static public byte[] 
serialize(? data)}} method and deserialization logic into a {{static public ? 
deserialize(byte[] data)}} method. The existing instance methods would simply 
call the static versions.

See a full example for LongSerializer and LongDeserializer here:

https://github.com/apache/kafka/compare/trunk...jklukas:static-serde-methods?expand=1

In Java 8, these static methods then become available for method references in 
code like {{kstream.mapValues(LongDeserializer::deserialize)}} without the user 
needing to create an instance of {{LongDeserializer}}.
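
A condensed sketch of the shape of the change for the Long case (the linked
branch has the full version; details here are illustrative):

    import java.util.Map;
    import org.apache.kafka.common.errors.SerializationException;
    import org.apache.kafka.common.serialization.Deserializer;

    public class LongDeserializer implements Deserializer<Long> {
        // The static method carries the logic and can be used as a method
        // reference, e.g. kstream.mapValues(LongDeserializer::deserialize).
        public static Long deserialize(byte[] data) {
            if (data == null)
                return null;
            if (data.length != 8)
                throw new SerializationException("Size of data received by LongDeserializer is not 8");
            long value = 0;
            for (byte b : data) {
                value <<= 8;
                value |= b & 0xFF;
            }
            return value;
        }

        @Override public void configure(Map<String, ?> configs, boolean isKey) {}

        // The existing instance method simply delegates to the static version.
        @Override public Long deserialize(String topic, byte[] data) {
            return deserialize(data);
        }

        @Override public void close() {}
    }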





[jira] [Commented] (KAFKA-3711) Allow configuration of MetricsReporter subclasses

2016-06-07 Thread Jeff Klukas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15318768#comment-15318768
 ] 

Jeff Klukas commented on KAFKA-3711:


I submitted a PR above to call {{originals()}} when getting configured 
instances. Documentation changes can be a separate issue.
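
As a sketch of the pattern this unblocks (the reporter and property names
here are invented, echoing the kafka-graphite style; compiles against a
recent clients release):

    import java.util.List;
    import java.util.Map;
    import org.apache.kafka.common.metrics.KafkaMetric;
    import org.apache.kafka.common.metrics.MetricsReporter;

    public class PrefixLoggingReporter implements MetricsReporter {
        private String prefix = "kafka";

        // With originals() passed through, an extra client property such as
        // "my.reporter.prefix" reaches configure() here.
        @Override public void configure(Map<String, ?> configs) {
            Object p = configs.get("my.reporter.prefix");
            if (p != null)
                prefix = p.toString();
        }

        @Override public void init(List<KafkaMetric> metrics) {}
        @Override public void metricChange(KafkaMetric metric) {
            System.out.println(prefix + "." + metric.metricName().name());
        }
        @Override public void metricRemoval(KafkaMetric metric) {}
        @Override public void close() {}
    }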

> Allow configuration of MetricsReporter subclasses
> -
>
> Key: KAFKA-3711
> URL: https://issues.apache.org/jira/browse/KAFKA-3711
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, streams
>    Reporter: Jeff Klukas
>Assignee: Guozhang Wang
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> The current interface for attaching metrics reporters to clients allows only 
> defining a list of class names, but provides no means for configuring those 
> reporters.
> There is at least one existing project 
> (https://github.com/apakulov/kafka-graphite) that solves this problem by 
> passing additional properties into the client, which then get passed on to 
> the reporter. This seems to work quite well, but it generates warnings like 
> {{The configuration kafka.graphite.metrics.prefix = foo was supplied but 
> isn't a known config.}}
> Should passing arbitrary additional parameters like this be officially 
> supported as the way to configure metrics reporters? Should these warnings 
> about unrecognized parameters be removed?
> Perhaps there should be some mechanism for registering additional 
> configuration parameters for clients to expect?





[jira] [Commented] (KAFKA-3711) Allow configuration of MetricsReporter subclasses

2016-06-07 Thread Jeff Klukas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15318546#comment-15318546
 ] 

Jeff Klukas commented on KAFKA-3711:


I would love to see this fixed as well. I'd also love to see some more 
documentation about Kafka's config framework and how various interfaces like 
{{MetricsReporter}} are intended to be used. Specifically, I'd like to see the 
developer docs describe that it's intended that user-provided classes can 
define configuration options and provide some advice about how to name those 
options.

I'd be interested in contributing docs and cleaning up the config code if it's 
clear what needs to be done. Am I correct in understanding that the code change 
needed here is to ensure that {{originals()}} is called rather than 
{{this.originals}} everywhere that we're passing configs on in 
{{AbstractConfig}}?

> Allow configuration of MetricsReporter subclasses
> -
>
> Key: KAFKA-3711
> URL: https://issues.apache.org/jira/browse/KAFKA-3711
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, streams
>    Reporter: Jeff Klukas
>Assignee: Guozhang Wang
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> The current interface for attaching metrics reporters to clients allows only 
> defining a list of class names, but provides no means for configuring those 
> reporters.
> There is at least one existing project 
> (https://github.com/apakulov/kafka-graphite) that solves this problem by 
> passing additional properties into the client, which then get passed on to 
> the reporter. This seems to work quite well, but it generates warnings like 
> {{The configuration kafka.graphite.metrics.prefix = foo was supplied but 
> isn't a known config.}}
> Should passing arbitrary additional parameters like this be officially 
> supported as the way to configure metrics reporters? Should these warnings 
> about unrecognized parameters be removed?
> Perhaps there should be some mechanism for registering additional 
> configuration parameters for clients to expect?





[jira] [Commented] (KAFKA-3753) Metrics for StateStores

2016-06-07 Thread Jeff Klukas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15318521#comment-15318521
 ] 

Jeff Klukas commented on KAFKA-3753:


It's good to know about MeteredKeyValueStore. And holding off on cache-related 
metrics for the redesign sounds reasonable.

It looks like the KeyValueStore interface does not provide any method related 
to number of entries (besides calling `all()` and iterating over all values, 
which doesn't seem reasonable), so there's no way for MeteredKeyValueStore to 
access the number of entries in the wrapped store. Could we consider adding a 
`size` method to the KeyValueStore interface?
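
For the RocksDB case, a sketch of what such a method could look like
(assuming the store keeps a handle to the underlying org.rocksdb.RocksDB;
RocksDB only exposes an estimate, via a named property, which is part of why
the naming question comes up elsewhere in this thread):

    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    class RocksDBEntryCountSketch {
        private final RocksDB db;

        RocksDBEntryCountSketch(RocksDB db) { this.db = db; }

        // Cheap, but approximate: RocksDB does not track an exact key count.
        long approximateNumEntries() {
            try {
                return db.getLongProperty("rocksdb.estimate-num-keys");
            } catch (RocksDBException e) {
                throw new RuntimeException("Error fetching estimated key count", e);
            }
        }
    }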

> Metrics for StateStores
> ---
>
> Key: KAFKA-3753
> URL: https://issues.apache.org/jira/browse/KAFKA-3753
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>    Reporter: Jeff Klukas
>Assignee: Guozhang Wang
>Priority: Minor
>  Labels: api
> Fix For: 0.10.1.0
>
>
> As a developer building a Kafka Streams application, I'd like to have 
> visibility into what's happening with my state stores. How can I know if a 
> particular store is growing large? How can I know if a particular store is 
> frequently needing to hit disk?
> I'm interested to know if there are existing mechanisms for extracting this 
> information or if other people have thoughts on how we might approach this.
> I can't think of a way to provide metrics generically, so each state store 
> implementation would likely need to handle this separately. Given that the 
> default RocksDBStore will likely be the most-used, it would be a first target 
> for adding metrics.
> I'd be interested in knowing the total number of entries in the store, the 
> total size on disk and in memory, rates of gets and puts, and hit/miss ratio 
> for the MemoryLRUCache. Some of these numbers are likely calculable through 
> the RocksDB API, others may simply not be accessible.
> Would there be value to the wider community in having state stores register 
> metrics?





[jira] [Created] (KAFKA-3753) Metrics for StateStores

2016-05-25 Thread Jeff Klukas (JIRA)
Jeff Klukas created KAFKA-3753:
--

 Summary: Metrics for StateStores
 Key: KAFKA-3753
 URL: https://issues.apache.org/jira/browse/KAFKA-3753
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Jeff Klukas
Assignee: Guozhang Wang
Priority: Minor
 Fix For: 0.10.1.0


As a developer building a Kafka Streams application, I'd like to have 
visibility into what's happening with my state stores. How can I know if a 
particular store is growing large? How can I know if a particular store 
frequently needs to hit disk?

I'm interested to know if there are existing mechanisms for extracting this 
information or if other people have thoughts on how we might approach this.

I can't think of a way to provide metrics generically, so each state store 
implementation would likely need to handle this separately. Given that the 
default RocksDBStore will likely be the most-used, it would be a first target 
for adding metrics.

I'd be interested in knowing the total number of entries in the store, the 
total size on disk and in memory, rates of gets and puts, and hit/miss ratio 
for the MemoryLRUCache. Some of these numbers are likely calculable through the 
RocksDB API; others may simply not be accessible.
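As a point of reference, RocksDB does expose some of these numbers as
properties; a minimal sketch of reading them through the RocksDB Java API (the
store path is invented for the example):

{code:java}
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class RocksDbStatsExample {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        Options options = new Options().setCreateIfMissing(true);
        RocksDB db = RocksDB.open(options, "/tmp/example-store");
        // Estimated number of keys; cheap to read but not exact.
        System.out.println(db.getProperty("rocksdb.estimate-num-keys"));
        // Approximate memory currently held by the memtables.
        System.out.println(db.getProperty("rocksdb.cur-size-all-mem-tables"));
        db.close();
        options.close();
    }
}
{code}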

Would there be value to the wider community in having state stores register 
metrics?





[jira] [Updated] (KAFKA-3714) Allow users greater access to register custom streams metrics

2016-05-16 Thread Jeff Klukas (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Klukas updated KAFKA-3714:
---
Issue Type: Improvement  (was: Bug)

> Allow users greater access to register custom streams metrics
> -
>
> Key: KAFKA-3714
> URL: https://issues.apache.org/jira/browse/KAFKA-3714
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>  Reporter: Jeff Klukas
>  Assignee: Guozhang Wang
>  Priority: Minor
> Fix For: 0.10.1.0
>
>
> Copying in some discussion that originally appeared in 
> https://github.com/apache/kafka/pull/1362#issuecomment-219064302
> Kafka Streams is largely a higher-level abstraction on top of producers and 
> consumers, and it seems sensible to match the KafkaStreams interface to that 
> of KafkaProducer and KafkaConsumer where possible. For producers and 
> consumers, the metric registry is internal and metrics are only exposed as an 
> unmodifiable map. This allows users to access client metric values for use in 
> application health checks, etc., but doesn't allow them to register new 
> metrics.
> That approach seems reasonable if we assume that a user interested in 
> defining custom metrics is already going to be using a separate metrics 
> library. In such a case, users will likely find it easier to define metrics 
> using whatever library they're familiar with rather than learning the API for 
> Kafka's Metrics class. Is this a reasonable assumption?
> If we want to expose the Metrics instance so that users can define arbitrary 
> metrics, I'd argue that there's a need for documentation updates. In 
> particular, I find the notion of metric tags confusing. Tags can be defined 
> in a MetricConfig when the Metrics instance is constructed, 
> StreamsMetricsImpl is maintaining its own set of tags, and users can set tag 
> overrides.
> If a user were to get access to the Metrics instance, they would be missing 
> the tags defined in StreamsMetricsImpl. I'm imagining that users would want 
> their custom metrics to sit alongside the predefined metrics with the same 
> tags, and users shouldn't be expected to manage those additional tags 
> themselves.
> So, why are we allowing users to define their own metrics via the 
> StreamsMetrics interface in the first place? Is it that we'd like to be able 
> to provide a built-in latency metric, but the definition depends on the 
> details of the use case so there's no generic solution? That would be 
> sufficient motivation for this special case of addLatencySensor. If we want 
> to continue down that path and give users access to define a wider range of 
> custom metrics, I'd prefer to extend the StreamsMetrics interface so that 
> users can call methods on that object, automatically getting the tags 
> appropriate for that instance rather than interacting with the raw Metrics 
> instance.
> ---
> Guozhang had the following comments:
> 1) For the producer/consumer cases, all internal metrics are provided and 
> abstracted from users; they just need to read the documentation to poll 
> whichever provided metrics they are interested in. If they want to define 
> more metrics, those are likely to live outside the clients themselves and 
> they can use whatever methods they like, so Metrics does not need to be 
> exposed to users.
> 2) For streams, things are a bit different: users define the computational 
> logic, which becomes part of the "Streams Client" processing and may be of 
> interest for users to monitor themselves; think of a customized processor 
> that sends an email to some address based on a condition, where users want 
> to monitor the average rate of emails sent. Hence it is worth considering 
> whether or not they should be able to access the Metrics instance to define 
> their own metrics alongside the pre-defined metrics provided by the library.
> 3) Now, since the Metrics class was not previously designed for public 
> usage, it is not very user-friendly for defining sensors, especially 
> regarding the semantic differences between name / scope / tags. 
> StreamsMetrics tries to hide some of this semantic confusion from users, but 
> it still exposes tags and hence is not perfect in doing so. We need to think 
> of a better approach so that: 1) user-defined metrics will be "aligned" 
> (i.e. with the same name prefix within a single application, with similar 
> scope hierarchy definition, etc.) with library-provided metrics, and 2) 
> there are natural APIs to do so.
> I do not have concrete ideas about 3) above off the top of my head; comments 
> are more than welcome.

[jira] [Commented] (KAFKA-3715) Higher granularity streams metrics

2016-05-16 Thread Jeff Klukas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284706#comment-15284706
 ] 

Jeff Klukas commented on KAFKA-3715:


It would be interesting to work on this, but I likely won't have time in the 
near future, so others should feel free to implement these ideas.

> Higher granularity streams metrics 
> ---
>
> Key: KAFKA-3715
> URL: https://issues.apache.org/jira/browse/KAFKA-3715
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>  Reporter: Jeff Klukas
>  Assignee: Guozhang Wang
>  Priority: Minor
> Fix For: 0.10.1.0
>
>
> Originally proposed by [~guozhang] in 
> https://github.com/apache/kafka/pull/1362#issuecomment-218326690
> We can consider adding metrics for process / punctuate / commit rate at the 
> granularity of each processor node in addition to the global rate mentioned 
> above. This is very helpful in debugging.
> We can consider adding rate / cumulative total metrics for context.forward 
> indicating how many records were forwarded downstream from this processor 
> node as well. This is helpful in debugging.
> We can consider adding metrics for each stream partition's timestamp. This is 
> helpful in debugging.
> Besides the latency metrics, we can also add throughput metrics in terms of 
> source records consumed.





[jira] [Created] (KAFKA-3715) Higher granularity streams metrics

2016-05-16 Thread Jeff Klukas (JIRA)
Jeff Klukas created KAFKA-3715:
--

 Summary: Higher granularity streams metrics 
 Key: KAFKA-3715
 URL: https://issues.apache.org/jira/browse/KAFKA-3715
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Jeff Klukas
Assignee: Guozhang Wang
Priority: Minor
 Fix For: 0.10.1.0


Originally proposed by [~guozhang] in 
https://github.com/apache/kafka/pull/1362#issuecomment-218326690

We can consider adding metrics for process / punctuate / commit rate at the 
granularity of each processor node in addition to the global rate mentioned 
above. This is very helpful in debugging.

We can consider adding rate / cumulative total metrics for context.forward 
indicating how many records were forwarded downstream from this processor node 
as well. This is helpful in debugging.

We can consider adding metrics for each stream partition's timestamp. This is 
helpful in debugging.

Besides the latency metrics, we can also add throughput metrics in terms of 
source records consumed.
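To make the per-node idea concrete, a sketch of what instrumentation inside a
single processor node could look like using the StreamsMetrics handle on
ProcessorContext; the scope, entity, and operation names are invented:

{code:java}
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

public class InstrumentedProcessor implements Processor<String, String> {
    private ProcessorContext context;
    private Sensor processLatency;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
        // Names here are invented; addLatencySensor is the existing hook.
        this.processLatency = context.metrics()
                .addLatencySensor("processor-node", "my-node", "process");
    }

    @Override
    public void process(String key, String value) {
        final long startNs = System.nanoTime();
        // ... per-record work would go here ...
        context.forward(key, value);
        context.metrics().recordLatency(processLatency, startNs, System.nanoTime());
    }

    @Override
    public void punctuate(long timestamp) {}

    @Override
    public void close() {}
}
{code}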







[jira] [Created] (KAFKA-3714) Allow users greater access to register custom streams metrics

2016-05-16 Thread Jeff Klukas (JIRA)
Jeff Klukas created KAFKA-3714:
--

 Summary: Allow users greater access to register custom streams 
metrics
 Key: KAFKA-3714
 URL: https://issues.apache.org/jira/browse/KAFKA-3714
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Jeff Klukas
Assignee: Guozhang Wang
Priority: Minor
 Fix For: 0.10.1.0


Copying in some discussion that originally appeared in 
https://github.com/apache/kafka/pull/1362#issuecomment-219064302

Kafka Streams is largely a higher-level abstraction on top of producers and 
consumers, and it seems sensible to match the KafkaStreams interface to that of 
KafkaProducer and KafkaConsumer where possible. For producers and consumers, 
the metric registry is internal and metrics are only exposed as an unmodifiable 
map. This allows users to access client metric values for use in application 
health checks, etc., but doesn't allow them to register new metrics.

That approach seems reasonable if we assume that a user interested in defining 
custom metrics is already going to be using a separate metrics library. In such 
a case, users will likely find it easier to define metrics using whatever 
library they're familiar with rather than learning the API for Kafka's Metrics 
class. Is this a reasonable assumption?

If we want to expose the Metrics instance so that users can define arbitrary 
metrics, I'd argue that there's a need for documentation updates. In particular, 
I find the notion of metric tags confusing. Tags can be defined in a 
MetricConfig when the Metrics instance is constructed, StreamsMetricsImpl is 
maintaining its own set of tags, and users can set tag overrides.

If a user were to get access to the Metrics instance, they would be missing the 
tags defined in StreamsMetricsImpl. I'm imagining that users would want their 
custom metrics to sit alongside the predefined metrics with the same tags, and 
users shouldn't be expected to manage those additional tags themselves.

So, why are we allowing users to define their own metrics via the 
StreamsMetrics interface in the first place? Is it that we'd like to be able to 
provide a built-in latency metric, but the definition depends on the details of 
the use case so there's no generic solution? That would be sufficient 
motivation for this special case of addLatencySensor. If we want to continue 
down that path and give users access to define a wider range of custom metrics, 
I'd prefer to extend the StreamsMetrics interface so that users can call 
methods on that object, automatically getting the tags appropriate for that 
instance rather than interacting with the raw Metrics instance.
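As a strawman for that extension, something along these lines; this is purely
hypothetical, and none of the generalized methods exist today:

{code:java}
import org.apache.kafka.common.metrics.Sensor;

// Hypothetical sketch of an extended StreamsMetrics; the real interface
// currently only defines the latency-sensor special case.
public interface ExtendedStreamsMetrics {

    // Existing special case:
    Sensor addLatencySensor(String scopeName, String entityName,
                            String operationName, String... tags);

    // Possible generalizations that would automatically inherit the
    // instance's tags, keeping user metrics aligned with library metrics:
    Sensor addRateSensor(String scopeName, String entityName,
                         String operationName, String... tags);

    Sensor addCountSensor(String scopeName, String entityName,
                          String operationName, String... tags);
}
{code}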

---

Guozhang had the following comments:

1) For the producer/consumer cases, all internal metrics are provided and 
abstracted from users; they just need to read the documentation to poll 
whichever provided metrics they are interested in. If they want to define more 
metrics, those are likely to live outside the clients themselves and they can 
use whatever methods they like, so Metrics does not need to be exposed to users.

2) For streams, things are a bit different: users define the computational 
logic, which becomes part of the "Streams Client" processing and may be of 
interest for users to monitor themselves; think of a customized processor that 
sends an email to some address based on a condition, where users want to 
monitor the average rate of emails sent. Hence it is worth considering whether 
or not they should be able to access the Metrics instance to define their own 
metrics alongside the pre-defined metrics provided by the library.

3) Now, since the Metrics class was not previously designed for public usage, 
it is not very user-friendly for defining sensors, especially regarding the 
semantic differences between name / scope / tags. StreamsMetrics tries to hide 
some of this semantic confusion from users, but it still exposes tags and 
hence is not perfect in doing so. We need to think of a better approach so 
that: 1) user-defined metrics will be "aligned" (i.e. with the same name 
prefix within a single application, with similar scope hierarchy definition, 
etc.) with library-provided metrics, and 2) there are natural APIs to do so.

I do not have concrete ideas about 3) above off the top of my head; comments 
are more than welcome.

---

I'm not sure that I agree that 1) and 2) are truly different situations. A user 
might choose to send email messages within a bare consumer rather than a 
streams application, and still want to maintain a metric of sent emails. In 
this bare consumer case, we'd expect the user to define that email-sent metric 
outside of Kafka's metrics machinery.
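For comparison, a sketch of what defining such an email-sent metric against
Kafka's raw Metrics class involves today; the group and metric names are
invented:

{code:java}
import java.util.Collections;

import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Rate;

public class EmailMetricExample {
    public static void main(String[] args) {
        try (Metrics metrics = new Metrics()) {
            Sensor emailsSent = metrics.sensor("emails-sent");
            emailsSent.add(
                    new MetricName("email-rate", "custom-metrics",
                            "rate of alert emails sent",
                            Collections.<String, String>emptyMap()),
                    new Rate());
            // Record one event each time an email goes out:
            emailsSent.record();
        }
    }
}
{code}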





[jira] [Created] (KAFKA-3711) Allow configuration of MetricsReporter subclasses

2016-05-13 Thread Jeff Klukas (JIRA)
Jeff Klukas created KAFKA-3711:
--

 Summary: Allow configuration of MetricsReporter subclasses
 Key: KAFKA-3711
 URL: https://issues.apache.org/jira/browse/KAFKA-3711
 Project: Kafka
  Issue Type: Improvement
  Components: clients, streams
Reporter: Jeff Klukas
Assignee: Guozhang Wang
Priority: Minor
 Fix For: 0.10.1.0


The current interface for attaching metrics reporters to clients allows only 
defining a list of class names, but provides no means for configuring those 
reporters.

There is at least one existing project 
(https://github.com/apakulov/kafka-graphite) that solves this problem by 
passing additional properties into the client, which then get passed on to the 
reporter. This seems to work quite well, but it generates warnings like {{The 
configuration kafka.graphite.metrics.prefix = foo was supplied but isn't a 
known config.}}

Should passing arbitrary additional parameters like this be officially 
supported as the way to configure metrics reporters? Should these warnings 
about unrecognized parameters be removed?

Perhaps there should be some mechanism for registering additional configuration 
parameters for clients to expect?
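On the reporter side, this pattern relies on the configure() hook that every 
MetricsReporter already inherits; a sketch, with the class name and prefix key 
invented:

{code:java}
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.metrics.KafkaMetric;
import org.apache.kafka.common.metrics.MetricsReporter;

public class ExampleGraphiteReporter implements MetricsReporter {
    private String prefix;

    @Override
    public void configure(Map<String, ?> configs) {
        // Unrecognized client configs are handed through to reporters here,
        // which is what currently triggers the "isn't a known config" warning.
        Object value = configs.get("kafka.graphite.metrics.prefix");
        this.prefix = value == null ? "kafka" : value.toString();
    }

    @Override
    public void init(List<KafkaMetric> metrics) {}

    @Override
    public void metricChange(KafkaMetric metric) {}

    @Override
    public void metricRemoval(KafkaMetric metric) {}

    @Override
    public void close() {}
}
{code}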





[jira] [Created] (KAFKA-3701) Expose KafkaStreams metrics in public API

2016-05-11 Thread Jeff Klukas (JIRA)
Jeff Klukas created KAFKA-3701:
--

 Summary: Expose KafkaStreams metrics in public API
 Key: KAFKA-3701
 URL: https://issues.apache.org/jira/browse/KAFKA-3701
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Jeff Klukas
Assignee: Guozhang Wang
Priority: Minor
 Fix For: 0.10.1.0


The Kafka clients expose their metrics registries through a `metrics` method 
presenting an unmodifiable collection, but `KafkaStreams` does not expose its 
registry. Currently, applications can access a StreamsMetrics instance via the 
ProcessorContext within a Processor, but this limits flexibility.

Having read-only access to a KafkaStreams.metrics() method would allow a 
developer to define a health check for their application based on the metrics 
that KafkaStreams is collecting. Or a developer might want to define a metric 
in some other framework based on KafkaStreams' metrics.

I am imagining that an application would build and register KafkaStreams-based 
health checks after building a KafkaStreams instance but before calling the 
start() method. Are metrics added to the registry at the time a KafkaStreams 
instance is constructed, or only after calling the start() method? If metrics 
are registered only after application startup, then this approach may not be 
sufficient.
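For illustration, the kind of health check this would enable, assuming a
metrics() accessor shaped like the producer/consumer one; the specific check
is invented:

{code:java}
import java.util.Map;

import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class StreamsHealthCheck {
    // Assumes a KafkaStreams.metrics() accessor mirroring
    // KafkaProducer.metrics() / KafkaConsumer.metrics().
    public static boolean isHealthy(Map<MetricName, ? extends Metric> metrics) {
        for (Map.Entry<MetricName, ? extends Metric> entry : metrics.entrySet()) {
            // Illustrative check: flag the application unhealthy if any
            // rate-style metric has dropped to zero.
            if (entry.getKey().name().endsWith("-rate")
                    && entry.getValue().value() == 0.0) {
                return false;
            }
        }
        return true;
    }
}
{code}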






[jira] [Commented] (KAFKA-3625) Move kafka-streams test fixtures into a published package

2016-04-26 Thread Jeff Klukas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258338#comment-15258338
 ] 

Jeff Klukas commented on KAFKA-3625:


I would be willing to submit this patch if folks think it has merit and there's 
agreement as to how the package should be named.

> Move kafka-streams test fixtures into a published package
> -
>
> Key: KAFKA-3625
> URL: https://issues.apache.org/jira/browse/KAFKA-3625
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>  Reporter: Jeff Klukas
>  Assignee: Guozhang Wang
>  Priority: Minor
> Fix For: 0.10.0.0
>
>
> The KStreamTestDriver and related fixtures defined in 
> streams/src/test/java/org/apache/kafka/test would be useful to developers 
> building applications on top of Kafka Streams, but they are not currently 
> exposed in a package.
> I propose moving this directory to live under streams/fixtures/src/main and 
> creating a new 'streams:fixtures' project in the gradle configuration to 
> publish these as a separate package.





[jira] [Created] (KAFKA-3625) Move kafka-streams test fixtures into a published package

2016-04-26 Thread Jeff Klukas (JIRA)
Jeff Klukas created KAFKA-3625:
--

 Summary: Move kafka-streams test fixtures into a published package
 Key: KAFKA-3625
 URL: https://issues.apache.org/jira/browse/KAFKA-3625
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Jeff Klukas
Assignee: Guozhang Wang
Priority: Minor
 Fix For: 0.10.0.0


The KStreamTestDriver and related fixtures defined in 
streams/src/test/java/org/apache/kafka/test would be useful to developers 
building applications on top of Kafka Streams, but they are not currently 
exposed in a package.

I propose moving this directory to live under streams/fixtures/src/main and 
creating a new 'streams:fixtures' project in the gradle configuration to 
publish these as a separate package.
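For a sense of downstream usage, a sketch of a unit test against such a
published package; the driver's constructor and process() method are
assumptions based on the current internal class:

{code:java}
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.test.KStreamTestDriver;

public class WordFilterTest {
    public void shouldDropShortWords() {
        KStreamBuilder builder = new KStreamBuilder();
        builder.stream(Serdes.String(), Serdes.String(), "input")
                .filter((key, word) -> word.length() > 3)
                .to(Serdes.String(), Serdes.String(), "output");

        // Assumed fixture API: wrap the topology and pipe records through it
        // without a running broker.
        KStreamTestDriver driver = new KStreamTestDriver(builder);
        driver.process("input", "k", "streaming");
        driver.close();
    }
}
{code}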


