Re: [DISCUSS] KIP-247: Add public test utils for Kafka Streams
From what I can tell, global state stores are managed separately from other state stores and are accessed via different methods. Do the proposed methods on TopologyTestDriver (such as getStateStore) cover global stores? If not, can we add an interface for accessing and testing global stores in the scope of this KIP? On Thu, Jan 11, 2018 at 9:06 PM, Matthias J. Sax wrote: > Dear Kafka community, > > I want to propose KIP-247 to add public test utils to the Streams API. > The goal is to simplify testing of Kafka Streams applications. > > Please find details in the wiki: > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > 247%3A+Add+public+test+utils+for+Kafka+Streams > > This is an initial KIP, and we hope to add more utility functions later. > Thus, this KIP is not comprehensive but a first step. Of course, we can > enrich this initial KIP if we think it falls too short. But we should > not aim to be comprehensive to keep the scope manageable. > > In fact, I think we should add some more helpers to simplify result > verification. I will update the KIP with this asap. Just wanted to start > the discussion early on. > > An initial WIP PR can be found here: > https://github.com/apache/kafka/pull/4402 > > I also included the user-list (please hit "reply-all" to include both > lists in this KIP discussion). > > Thanks a lot. > > > -Matthias
Re: [VOTE] KIP-215: Add topic regex support for Connect sinks
I've made all the changes on the KIP-215 and general KIP page to mark this as approved. Just waiting on approval and merging of the PR at this point. On Thu, Nov 9, 2017 at 9:36 AM, Randall Hauch <rha...@gmail.com> wrote: > Jeff, > > This KIP does pass with 3 binding +1s and no other binding votes, and with > more than 72 hour for voting. Do you want to update the KIP-215 and KIP > list pages accordingly? We can also merge the PR. > > Thanks, and very nice work! > > Randall > > On Fri, Nov 3, 2017 at 8:12 PM, Stephane Maarek < > steph...@simplemachines.com.au> wrote: > > > +1! The S3 or hdfs connector will now be super powerful ! > > > > On 4 Nov. 2017 11:27 am, "Konstantine Karantasis" < > > konstant...@confluent.io> > > wrote: > > > > > Nice addition! > > > > > > +1 (non-binding) > > > > > > Konstantine > > > > > > On Fri, Nov 3, 2017 at 4:52 PM, Jeff Klukas <jklu...@simple.com> > wrote: > > > > > > > So sorry for skirting the process there. I wasn't aware of the 72 > hour > > > > window and I don't see that mentioned in in > > > > https://cwiki.apache.org/confluence/display/KAFKA/ > Bylaws#Bylaws-Voting > > > > > > > > Should I feel free to update that wiki page with a note about the > > window? > > > > > > > > On Fri, Nov 3, 2017 at 7:49 PM, Ewen Cheslack-Postava < > > e...@confluent.io > > > > > > > > wrote: > > > > > > > > > Jeff, > > > > > > > > > > Just FYI re: process, I think you're pretty much definitely in the > > > clear > > > > > hear since this one is a straightforward design I doubt anybody > would > > > > > object to, but voting normally stays open 72h to ensure everyone > has > > a > > > > > chance to weigh in. > > > > > > > > > > Again thanks for the KIP and we can move any final discussion over > to > > > the > > > > > PR! 
> > > > > > > > > > -Ewen > > > > > > > > > > On Fri, Nov 3, 2017 at 4:43 PM, Jeff Klukas <jklu...@simple.com> > > > wrote: > > > > > > > > > > > Looks like we've achieved lazy majority, so I'll move the KIP to > > > > > approved. > > > > > > > > > > > > Thanks all for looking this over. > > > > > > > > > > > > On Fri, Nov 3, 2017 at 7:31 PM, Jason Gustafson < > > ja...@confluent.io> > > > > > > wrote: > > > > > > > > > > > > > +1. Thanks for the KIP! > > > > > > > > > > > > > > On Fri, Nov 3, 2017 at 2:15 PM, Guozhang Wang < > > wangg...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > > > +1 binding > > > > > > > > > > > > > > > > On Fri, Nov 3, 2017 at 1:25 PM, Ewen Cheslack-Postava < > > > > > > e...@confluent.io > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > +1 binding > > > > > > > > > > > > > > > > > > Thanks Jeff! > > > > > > > > > > > > > > > > > > On Wed, Nov 1, 2017 at 5:21 PM, Randall Hauch < > > > rha...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > +1 (non-binding) > > > > > > > > > > > > > > > > > > > > Thanks for pushing this through. Great work! > > > > > > > > > > > > > > > > > > > > Randall Hauch > > > > > > > > > > > > > > > > > > > > On Wed, Nov 1, 2017 at 9:40 AM, Jeff Klukas < > > > > jklu...@simple.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > I haven't heard any additional concerns over the > > proposal, > > > so > > > > > I'd > > > > > > > > like > > > > > > > > > to > > > > > > > > > > > get the voting process started for: > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > > > > > > > > > 215%3A+Add+topic+regex+support+for+Connect+sinks > > > > > > > > > > > > > > > > > > > > > > It adds a topics.regex option for Kafka Connect sinks > as > > an > > > > > > > > alternative > > > > > > > > > > to > > > > > > > > > > > the existing required topics option. 
Re: [VOTE] KIP-215: Add topic regex support for Connect sinks
So sorry for skirting the process there. I wasn't aware of the 72-hour window and I don't see that mentioned in https://cwiki.apache.org/confluence/display/KAFKA/Bylaws#Bylaws-Voting Should I feel free to update that wiki page with a note about the window? On Fri, Nov 3, 2017 at 7:49 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote: > Jeff, > > Just FYI re: process, I think you're pretty much definitely in the clear > hear since this one is a straightforward design I doubt anybody would > object to, but voting normally stays open 72h to ensure everyone has a > chance to weigh in. > > Again thanks for the KIP and we can move any final discussion over to the > PR! > > -Ewen > > On Fri, Nov 3, 2017 at 4:43 PM, Jeff Klukas <jklu...@simple.com> wrote: > > > Looks like we've achieved lazy majority, so I'll move the KIP to > approved. > > > > Thanks all for looking this over. > > > > On Fri, Nov 3, 2017 at 7:31 PM, Jason Gustafson <ja...@confluent.io> > > wrote: > > > > > +1. Thanks for the KIP! > > > > > > On Fri, Nov 3, 2017 at 2:15 PM, Guozhang Wang <wangg...@gmail.com> > > wrote: > > > > > > > +1 binding > > > > > > > > On Fri, Nov 3, 2017 at 1:25 PM, Ewen Cheslack-Postava < > > e...@confluent.io > > > > > > > > wrote: > > > > > > > > > +1 binding > > > > > > > > > > Thanks Jeff! > > > > > > > > > > On Wed, Nov 1, 2017 at 5:21 PM, Randall Hauch <rha...@gmail.com> > > > wrote: > > > > > > > > > > > +1 (non-binding) > > > > > > > > > > > > Thanks for pushing this through. Great work!
> > > > > > > > > > > > Randall Hauch > > > > > > > > > > > > On Wed, Nov 1, 2017 at 9:40 AM, Jeff Klukas <jklu...@simple.com> > > > > wrote: > > > > > > > > > > > > > I haven't heard any additional concerns over the proposal, so > I'd > > > > like > > > > > to > > > > > > > get the voting process started for: > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > > > > > 215%3A+Add+topic+regex+support+for+Connect+sinks > > > > > > > > > > > > > > It adds a topics.regex option for Kafka Connect sinks as an > > > > alternative > > > > > > to > > > > > > > the existing required topics option. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > -- Guozhang > > > > > > > > > >
Re: [VOTE] KIP-215: Add topic regex support for Connect sinks
Looks like we've achieved lazy majority, so I'll move the KIP to approved. Thanks all for looking this over. On Fri, Nov 3, 2017 at 7:31 PM, Jason Gustafson <ja...@confluent.io> wrote: > +1. Thanks for the KIP! > > On Fri, Nov 3, 2017 at 2:15 PM, Guozhang Wang <wangg...@gmail.com> wrote: > > > +1 binding > > > > On Fri, Nov 3, 2017 at 1:25 PM, Ewen Cheslack-Postava <e...@confluent.io > > > > wrote: > > > > > +1 binding > > > > > > Thanks Jeff! > > > > > > On Wed, Nov 1, 2017 at 5:21 PM, Randall Hauch <rha...@gmail.com> > wrote: > > > > > > > +1 (non-binding) > > > > > > > > Thanks for pushing this through. Great work! > > > > > > > > Randall Hauch > > > > > > > > On Wed, Nov 1, 2017 at 9:40 AM, Jeff Klukas <jklu...@simple.com> > > wrote: > > > > > > > > > I haven't heard any additional concerns over the proposal, so I'd > > like > > > to > > > > > get the voting process started for: > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > > > 215%3A+Add+topic+regex+support+for+Connect+sinks > > > > > > > > > > It adds a topics.regex option for Kafka Connect sinks as an > > alternative > > > > to > > > > > the existing required topics option. > > > > > > > > > > > > > > > > > > > > -- > > -- Guozhang > > >
[VOTE] KIP-215: Add topic regex support for Connect sinks
I haven't heard any additional concerns over the proposal, so I'd like to get the voting process started for: https://cwiki.apache.org/confluence/display/KAFKA/KIP-215%3A+Add+topic+regex+support+for+Connect+sinks It adds a topics.regex option for Kafka Connect sinks as an alternative to the existing required topics option.
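For readers unfamiliar with how such a setting behaves, the matching semantics can be sketched with plain java.util.regex — the class name, regex value, and topic names below are hypothetical illustrations, not from the Connect codebase:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TopicRegexSketch {
    // Hypothetical topics.regex value; per the KIP it is interpreted as a Java Pattern
    static final Pattern TOPICS_REGEX = Pattern.compile("metrics-.*");

    // A sink subscribing by regex would keep only topics whose full name matches
    static List<String> matchingTopics(List<String> allTopics) {
        return allTopics.stream()
                .filter(t -> TOPICS_REGEX.matcher(t).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> cluster = Arrays.asList("metrics-cpu", "metrics-disk", "app-logs");
        System.out.println(matchingTopics(cluster));
    }
}
```

The practical difference from the fixed topics list is that a pattern subscription can also pick up matching topics created after the connector starts.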
Re: [DISCUSS] KIP-215: Add topic regex support for Connect sinks
I responded to Ewen's suggestions in the PR and went back to using ConfigException. If I don't hear any other concerns today, I'll start a [VOTE] thread for the KIP. On Mon, Oct 30, 2017 at 9:29 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote: > I took a quick pass at the PR, looks good so far. ConfigException would > still be fine in the case you're highlighting as it's inside the framework > anyway and we'd expect a ConfigException from configure() if connectors try > to use their ConfigDef to parse an invalid config. But here I don't feel > strongly about which to use since the error message is clear anyway and > will just end up in logs / the REST API for the user to sort out. > > -Ewen > > On Fri, Oct 27, 2017 at 6:39 PM, Jeff Klukas <jklu...@simple.com> wrote: > > > I've updated the KIP to use the topics.regex name and opened a WIP PR > with > > an implementation that shows some additional complexity in how the > > configuration option gets passed through, affecting various public > function > > signatures. > > > > I would appreciate any eyes on that for feedback on whether more design > > discussion needs to happen in the KIP. > > > > https://github.com/apache/kafka/pull/4151 > > > > On Fri, Oct 27, 2017 at 7:50 AM, Jeff Klukas <jklu...@simple.com> wrote: > > > > > I added a note in the KIP about ConfigException being thrown. I also > > > changed the proposed default for the new config to empty string rather > > than > > > null. > > > > > > Absent a clear definition of what "common" regex syntax is, it seems an > > > undue burden to ask the user to guess at what Pattern features are > safe. > > If > > > we do end up implementing a different regex style, I think it will be > > > necessary to still support the Java Pattern style long-term as an > option. > > > If we want to use a different regex style as default down the road, we > > > could require "power users" of Java Pattern to enable an additional > > config > > > option to maintain compatibility. 
> > > > > > One additional change I might make to the KIP is that 'topics.regex' > > might > > > be a better choice for config name than 'topics.pattern'. That would be > > in > > > keeping with RegexRouter that has a 'regex' configuration option rather > > > than 'pattern'. > > > > > > On Thu, Oct 26, 2017 at 11:00 PM, Ewen Cheslack-Postava < > > e...@confluent.io > > > > wrote: > > > > > >> It's fine to be more detailed, but ConfigException is already implied > > for > > >> all other config issues as well. > > >> > > >> Default could be either null or just empty string. re: alternatives, > if > > >> you > > >> wanted to be slightly more detailed (though still a bit vague) re: > > >> supported syntax, you could just say that while Pattern is used, we > only > > >> guarantee support for common regular expression syntax. Not sure if > > >> there's > > >> a good way of defining what "common" syntax is. > > >> > > >> Otherwise LGTM, and thanks for helping fill in a longstanding gap! > > >> > > >> -Ewen > > >> > > >> On Thu, Oct 26, 2017 at 7:56 PM, Ted Yu <yuzhih...@gmail.com> wrote: > > >> > > >> > bq. Users may specify only one of 'topics' or 'topics.pattern'. > > >> > > > >> > Can you fill in which exception would be thrown if both of them are > > >> > specified > > >> > ? > > >> > > > >> > Cheers > > >> > > > >> > On Thu, Oct 26, 2017 at 6:27 PM, Jeff Klukas <j...@klukas.net> > wrote: > > >> > > > >> > > Looking for feedback on > > >> > > > > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > >> > > 215%3A+Add+topic+regex+support+for+Connect+sinks > > >> > > > > >> > > > >> > > > > > > > > >
Re: [DISCUSS] KIP-215: Add topic regex support for Connect sinks
I've updated the KIP to use the topics.regex name and opened a WIP PR with an implementation that shows some additional complexity in how the configuration option gets passed through, affecting various public function signatures. I would appreciate any eyes on that for feedback on whether more design discussion needs to happen in the KIP. https://github.com/apache/kafka/pull/4151 On Fri, Oct 27, 2017 at 7:50 AM, Jeff Klukas <jklu...@simple.com> wrote: > I added a note in the KIP about ConfigException being thrown. I also > changed the proposed default for the new config to empty string rather than > null. > > Absent a clear definition of what "common" regex syntax is, it seems an > undue burden to ask the user to guess at what Pattern features are safe. If > we do end up implementing a different regex style, I think it will be > necessary to still support the Java Pattern style long-term as an option. > If we want to use a different regex style as default down the road, we > could require "power users" of Java Pattern to enable an additional config > option to maintain compatibility. > > One additional change I might make to the KIP is that 'topics.regex' might > be a better choice for config name than 'topics.pattern'. That would be in > keeping with RegexRouter that has a 'regex' configuration option rather > than 'pattern'. > > On Thu, Oct 26, 2017 at 11:00 PM, Ewen Cheslack-Postava <e...@confluent.io > > wrote: > >> It's fine to be more detailed, but ConfigException is already implied for >> all other config issues as well. >> >> Default could be either null or just empty string. re: alternatives, if >> you >> wanted to be slightly more detailed (though still a bit vague) re: >> supported syntax, you could just say that while Pattern is used, we only >> guarantee support for common regular expression syntax. Not sure if >> there's >> a good way of defining what "common" syntax is. >> >> Otherwise LGTM, and thanks for helping fill in a longstanding gap! 
>> >> -Ewen >> >> On Thu, Oct 26, 2017 at 7:56 PM, Ted Yu <yuzhih...@gmail.com> wrote: >> >> > bq. Users may specify only one of 'topics' or 'topics.pattern'. >> > >> > Can you fill in which exception would be thrown if both of them are >> > specified >> > ? >> > >> > Cheers >> > >> > On Thu, Oct 26, 2017 at 6:27 PM, Jeff Klukas <j...@klukas.net> wrote: >> > >> > > Looking for feedback on >> > > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- >> > > 215%3A+Add+topic+regex+support+for+Connect+sinks >> > > >> > >> > >
Re: [DISCUSS] KIP-215: Add topic regex support for Connect sinks
I added a note in the KIP about ConfigException being thrown. I also changed the proposed default for the new config to empty string rather than null. Absent a clear definition of what "common" regex syntax is, it seems an undue burden to ask the user to guess at what Pattern features are safe. If we do end up implementing a different regex style, I think it will be necessary to still support the Java Pattern style long-term as an option. If we want to use a different regex style as default down the road, we could require "power users" of Java Pattern to enable an additional config option to maintain compatibility. One additional change I might make to the KIP is that 'topics.regex' might be a better choice for config name than 'topics.pattern'. That would be in keeping with RegexRouter that has a 'regex' configuration option rather than 'pattern'. On Thu, Oct 26, 2017 at 11:00 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote: > It's fine to be more detailed, but ConfigException is already implied for > all other config issues as well. > > Default could be either null or just empty string. re: alternatives, if you > wanted to be slightly more detailed (though still a bit vague) re: > supported syntax, you could just say that while Pattern is used, we only > guarantee support for common regular expression syntax. Not sure if there's > a good way of defining what "common" syntax is. > > Otherwise LGTM, and thanks for helping fill in a longstanding gap! > > -Ewen > > On Thu, Oct 26, 2017 at 7:56 PM, Ted Yu <yuzhih...@gmail.com> wrote: > > > bq. Users may specify only one of 'topics' or 'topics.pattern'. > > > > Can you fill in which exception would be thrown if both of them are > > specified > > ? > > > > Cheers > > > > On Thu, Oct 26, 2017 at 6:27 PM, Jeff Klukas <j...@klukas.net> wrote: > > > > > Looking for feedback on > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > 215%3A+Add+topic+regex+support+for+Connect+sinks > > > > > >
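The validation being discussed — rejecting configurations that set both options, with empty string as the proposed default — can be sketched as follows. IllegalArgumentException stands in for Kafka's ConfigException here, and the method name and messages are illustrative only:

```java
public class SinkTopicsValidation {
    // Sketch of the mutual-exclusion check; IllegalArgumentException stands in
    // for org.apache.kafka.common.config.ConfigException in this stdlib-only sketch.
    static void validate(String topics, String topicsRegex) {
        boolean hasTopics = topics != null && !topics.isEmpty();
        boolean hasRegex = topicsRegex != null && !topicsRegex.isEmpty();
        if (hasTopics && hasRegex) {
            throw new IllegalArgumentException(
                    "Only one of 'topics' or 'topics.regex' may be specified");
        }
        if (!hasTopics && !hasRegex) {
            throw new IllegalArgumentException(
                    "One of 'topics' or 'topics.regex' must be specified");
        }
    }

    public static void main(String[] args) {
        validate("orders,refunds", "");  // ok: only 'topics' set ("" is the proposed default)
        validate("", "orders-.*");       // ok: only 'topics.regex' set
        try {
            validate("orders", "orders-.*");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```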
[DISCUSS] KIP-215: Add topic regex support for Connect sinks
Looking for feedback on https://cwiki.apache.org/confluence/display/KAFKA/KIP-215%3A+Add+topic+regex+support+for+Connect+sinks
[jira] [Updated] (KAFKA-4932) Add UUID Serde
[ https://issues.apache.org/jira/browse/KAFKA-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Klukas updated KAFKA-4932: --- Description: I propose adding serializers and deserializers for the java.util.UUID class. I have many use cases where I want to set the key of a Kafka message to be a UUID. Currently, I need to turn UUIDs into strings or byte arrays and use their associated Serdes, but it would be more convenient to serialize and deserialize UUIDs directly. I'd propose that the serializer and deserializer use the 36-byte string representation, calling UUID.toString and UUID.fromString. We would also wrap these in a Serde and modify the streams Serdes class to include this in the list of supported types. Optionally, we could have the deserializer support a 16-byte representation and it would check the size of the input byte array to determine whether it's a binary or string representation of the UUID. It's not well defined whether the most significant bits or least significant go first, so this deserializer would have to support only one or the other. Similarly, if the deserializer supported a 16-byte representation, there could be two variants of the serializer, a UUIDStringSerializer and a UUIDBytesSerializer. I would be willing to write this PR, but am looking for feedback about whether there are significant concerns here around ambiguity of what the byte representation of a UUID should be, or if there's desire to keep the list of built-in Serdes minimal such that a PR would be unlikely to be accepted. was: I propose adding serializers and deserializers for the java.util.UUID class. I have many use cases where I want to set the key of a Kafka message to be a UUID. Currently, I need turn UUIDs into strings or byte arrays and use the associated Serdes, but it would be more convenient to serialize and deserialize UUIDs directly. 
I'd propose that the serializer and deserializer use the 36-byte string representation, calling UUID.toString and UUID.fromString Optionally, we could also has the deserializer support a 16-byte representation and it would check size of the input byte array to determine whether it's a binary or string representation of the UUID. It's not well defined whether the most significant bits or least significant go first, so this deserializer would have to support only one or the other. Optionally, there could be two variants of the serializer, a UUIDStringSerializer and a UUIDBytesSerializer. We would also wrap these in a Serde and modify the Serdes class to include this in the list of supported types. I would be willing to write this PR, but am looking for feedback about whether there are significant concerns here around ambiguity of what the byte representation of a UUID should be, or if there's desire to keep to list of built-in Serdes minimal such that a PR would be unlikely to be accepted. > Add UUID Serde > -- > > Key: KAFKA-4932 > URL: https://issues.apache.org/jira/browse/KAFKA-4932 > Project: Kafka > Issue Type: Improvement > Components: clients, streams > Reporter: Jeff Klukas >Priority: Minor > > I propose adding serializers and deserializers for the java.util.UUID class. > I have many use cases where I want to set the key of a Kafka message to be a > UUID. Currently, I need to turn UUIDs into strings or byte arrays and use > their associated Serdes, but it would be more convenient to serialize and > deserialize UUIDs directly. > I'd propose that the serializer and deserializer use the 36-byte string > representation, calling UUID.toString and UUID.fromString. We would also wrap > these in a Serde and modify the streams Serdes class to include this in the > list of supported types. 
> Optionally, we could have the deserializer support a 16-byte representation > and it would check the size of the input byte array to determine whether it's > a binary or string representation of the UUID. It's not well defined whether > the most significant bits or least significant go first, so this deserializer > would have to support only one or the other. > Similarly, if the deserializer supported a 16-byte representation, there could > be two variants of the serializer, a UUIDStringSerializer and a > UUIDBytesSerializer. > I would be willing to write this PR, but am looking for feedback about > whether there are significant concerns here around ambiguity of what the byte > representation of a UUID should be, or if there's desire to keep the list of > built-in Serdes minimal such that a PR would be unlikely to be accepted. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KAFKA-4932) Add UUID Serde
[ https://issues.apache.org/jira/browse/KAFKA-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Klukas updated KAFKA-4932: --- Summary: Add UUID Serde (was: Add UUID Serdes) > Add UUID Serde > -- > > Key: KAFKA-4932 > URL: https://issues.apache.org/jira/browse/KAFKA-4932 > Project: Kafka > Issue Type: Improvement > Components: clients, streams > Reporter: Jeff Klukas >Priority: Minor > > I propose adding serializers and deserializers for the java.util.UUID class. > I have many use cases where I want to set the key of a Kafka message to be a > UUID. Currently, I need turn UUIDs into strings or byte arrays and use the > associated Serdes, but it would be more convenient to serialize and > deserialize UUIDs directly. > I'd propose that the serializer and deserializer use the 36-byte string > representation, calling UUID.toString and UUID.fromString > Optionally, we could also has the deserializer support a 16-byte > representation and it would check size of the input byte array to determine > whether it's a binary or string representation of the UUID. It's not well > defined whether the most significant bits or least significant go first, so > this deserializer would have to support only one or the other. > Optionally, there could be two variants of the serializer, a > UUIDStringSerializer and a UUIDBytesSerializer. > We would also wrap these in a Serde and modify the Serdes class to include > this in the list of supported types. > I would be willing to write this PR, but am looking for feedback about > whether there are significant concerns here around ambiguity of what the byte > representation of a UUID should be, or if there's desire to keep to list of > built-in Serdes minimal such that a PR would be unlikely to be accepted. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KAFKA-4932) Add UUID Serdes
Jeff Klukas created KAFKA-4932: -- Summary: Add UUID Serdes Key: KAFKA-4932 URL: https://issues.apache.org/jira/browse/KAFKA-4932 Project: Kafka Issue Type: Improvement Components: clients, streams Reporter: Jeff Klukas Priority: Minor I propose adding serializers and deserializers for the java.util.UUID class. I have many use cases where I want to set the key of a Kafka message to be a UUID. Currently, I need to turn UUIDs into strings or byte arrays and use the associated Serdes, but it would be more convenient to serialize and deserialize UUIDs directly. I'd propose that the serializer and deserializer use the 36-byte string representation, calling UUID.toString and UUID.fromString. Optionally, we could also have the deserializer support a 16-byte representation, and it would check the size of the input byte array to determine whether it's a binary or string representation of the UUID. It's not well defined whether the most significant bits or least significant go first, so this deserializer would have to support only one or the other. Optionally, there could be two variants of the serializer, a UUIDStringSerializer and a UUIDBytesSerializer. We would also wrap these in a Serde and modify the Serdes class to include this in the list of supported types. I would be willing to write this PR, but am looking for feedback about whether there are significant concerns here around ambiguity of what the byte representation of a UUID should be, or if there's desire to keep the list of built-in Serdes minimal such that a PR would be unlikely to be accepted. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
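The proposal above can be sketched with the JDK alone. The class and method names here are hypothetical, and the binary form fixes most-significant-bits-first, which is just one of the two orderings the description notes are possible:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class UuidSerdeSketch {
    // 36-byte string form via UUID.toString / UUID.fromString, as proposed
    static byte[] serializeString(UUID uuid) {
        return uuid.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Optional 16-byte binary form; since the bit ordering is not well defined,
    // this sketch fixes one choice: most significant bits first
    static byte[] serializeBinary(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // A deserializer could dispatch on input length: 16 bytes = binary, else string
    static UUID deserialize(byte[] data) {
        if (data.length == 16) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            return new UUID(buf.getLong(), buf.getLong());
        }
        return UUID.fromString(new String(data, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        UUID u = UUID.randomUUID();
        // Round-trip through both representations
        System.out.println(deserialize(serializeString(u)).equals(u));
        System.out.println(deserialize(serializeBinary(u)).equals(u));
    }
}
```

The length-dispatch trick works because the string form is always 36 bytes, so the two representations can never be confused; the ambiguity the description worries about is only in which half of the UUID comes first in the binary form.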
[jira] [Resolved] (KAFKA-4257) Inconsistencies in 0.10.1 upgrade docs
[ https://issues.apache.org/jira/browse/KAFKA-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Klukas resolved KAFKA-4257. Resolution: Fixed. Ismael's PR addresses these questions. > Inconsistencies in 0.10.1 upgrade docs > --- > > Key: KAFKA-4257 > URL: https://issues.apache.org/jira/browse/KAFKA-4257 > Project: Kafka > Issue Type: Bug > Components: documentation >Affects Versions: 0.10.1.0 > Reporter: Jeff Klukas >Priority: Minor > Fix For: 0.10.1.0 > > > There are several inconsistencies in the 0.10.1.0 upgrade docs that make it > difficult to determine what client versions are compatible with what broker > versions. > The initial heading in the upgrade docs is "Upgrading from 0.10.0.X to > 0.10.1.0", but it includes clauses about versions prior to 0.10. Is the > intention for these instructions to be valid for upgrading from brokers as > far back as 0.8? Should this section simply be called "Upgrading to 0.10.1.0"? > I cannot tell from the docs whether I can upgrade to 0.10.1.0 clients on top > of 0.10.0.X brokers. In particular, step 5 of the upgrade instructions > mentions "Once all consumers have been upgraded to 0.10.0". Should that read > 0.10.1, or is the intention here truly that clients on 9.X or below need to > be at version 0.10.0.0 at a minimum? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4257) Inconsistencies in 0.10.1 upgrade docs
[ https://issues.apache.org/jira/browse/KAFKA-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548729#comment-15548729 ] Jeff Klukas commented on KAFKA-4257: The pull request definitely helps. Thanks for jumping in to make those changes. I left a comment on the PR. > Inconsistencies in 0.10.1 upgrade docs > --- > > Key: KAFKA-4257 > URL: https://issues.apache.org/jira/browse/KAFKA-4257 > Project: Kafka > Issue Type: Bug > Components: documentation >Affects Versions: 0.10.1.0 > Reporter: Jeff Klukas >Priority: Minor > Fix For: 0.10.1.0 > > > There are several inconsistencies in the 0.10.1.0 upgrade docs that make it > difficult to determine what client versions are compatible with what broker > versions. > The initial heading in the upgrade docs is "Upgrading from 0.10.0.X to > 0.10.1.0", but it includes clauses about versions prior to 0.10. Is the > intention for these instructions to be valid for upgrading from brokers as > far back as 0.8? Should this section simply be called "Upgrading to 0.10.1.0"? > I cannot tell from the docs whether I can upgrade to 0.10.1.0 clients on top > of 0.10.0.X brokers. In particular, step 5 of the upgrade instructions > mentions "Once all consumers have been upgraded to 0.10.0". Should that read > 0.10.1, or is the intention here truly that clients on 9.X or below need to > be at version 0.10.0.0 at a minimum? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-4257) Inconsistencies in 0.10.1 upgrade docs
Jeff Klukas created KAFKA-4257: -- Summary: Inconsistencies in 0.10.1 upgrade docs Key: KAFKA-4257 URL: https://issues.apache.org/jira/browse/KAFKA-4257 Project: Kafka Issue Type: Bug Components: documentation Affects Versions: 0.10.1.0 Reporter: Jeff Klukas Priority: Minor Fix For: 0.10.1.0 There are several inconsistencies in the 0.10.1.0 upgrade docs that make it difficult to determine what client versions are compatible with what broker versions. The initial heading in the upgrade docs is "Upgrading from 0.10.0.X to 0.10.1.0", but it includes clauses about versions prior to 0.10. Is the intention for these instructions to be valid for upgrading from brokers as far back as 0.8? Should this section simply be called "Upgrading to 0.10.1.0"? I cannot tell from the docs whether I can upgrade to 0.10.1.0 clients on top of 0.10.0.X brokers. In particular, step 5 of the upgrade instructions mentions "Once all consumers have been upgraded to 0.10.0". Should that read 0.10.1, or is the intention here truly that clients on 0.9.X or below need to be at version 0.10.0.0 at a minimum? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [DISCUSS] Java 8 as a minimum requirement
On Thu, Jun 16, 2016 at 5:20 PM, Ismael Juma <ism...@juma.me.uk> wrote: > On Thu, Jun 16, 2016 at 11:13 PM, Stephen Boesch <java...@gmail.com> > wrote: > > > @Jeff Klukas What is the concern about scala 2.11 vs 2.12? 2.11 runs on > > both java7 and java8 > > > > Scala 2.10.5 and 2.10.6 also support Java 8 for what it's worth. > I was under the impression that Scala 2.12 would be the first version compatible with Java 8 bytecode, but looks like that was a misunderstanding on my part. +1
Re: [DISCUSS] Java 8 as a minimum requirement
Would the move to Java 8 be for all modules? I'd have some concern about removing Java 7 compatibility for kafka-clients and for kafka streams (though less so since it's still so new). I don't know how hard it will be to transition a Scala 2.11 application to Scala 2.12. Are we comfortable with the idea of applications stuck on Scala 2.11 or otherwise unable to update to Java 8 not having access to new client releases? On Thu, Jun 16, 2016 at 5:05 PM, Philippe Derome wrote: > I strongly support motion having difficulty running (Apache Kafka as > opposed to Confluent) Stream examples with JDK 8 today. > On 16 Jun 2016 4:46 p.m., "Ismael Juma" wrote: > > > Hi all, > > > > I would like to start a discussion on making Java 8 a minimum requirement > > for Kafka's next feature release (let's say Kafka 0.10.1.0 for now). This > > is the first discussion on the topic so the idea is to understand how > > people feel about it. If people feel it's too soon, then we can pick up > the > > conversation again after Kafka 0.10.1.0. If the feedback is mostly > > positive, I will start a vote thread. > > > > Let's start with some dates. Java 7 hasn't received public updates since > > April 2015[1], Java 8 was released in March 2014[2] and Java 9 is > scheduled > > to be released in March 2017[3]. > > > > The first argument for dropping support for Java 7 is that the last > public > > release by Oracle contains a large number of known security > > vulnerabilities. The effectiveness of Kafka's security features is > reduced > > if the underlying runtime is not itself secure. 
> > > > The second argument for moving to Java 8 is that it adds a number of > > compelling features: > > > > * Lambda expressions and method references (particularly useful for the > > Kafka Streams DSL) > > * Default methods (very useful for maintaining compatibility when adding > > methods to interfaces) > > * java.util.stream (helpful for making collection transformations more > > concise) > > * Lots of improvements to java.util.concurrent (CompletableFuture, > > DoubleAdder, DoubleAccumulator, StampedLock, LongAdder, LongAccumulator) > > * Other nice things: SplittableRandom, Optional (and many others I have > not > > mentioned) > > > > The third argument is that it will simplify our testing matrix, we won't > > have to test with Java 7 any longer (this is particularly useful for > system > > tests that take hours to run). It will also make it easier to support > Scala > > 2.12, which requires Java 8. > > > > The fourth argument is that many other open-source projects have taken > the > > leap already. Examples are Cassandra[4], Lucene[5], Akka[6], Hadoop 3[7], > > Jetty[8], Eclipse[9], IntelliJ[10] and many others[11]. Even Android will > > support Java 8 in the next version (although it will take a while before > > most phones will use that version sadly). This reduces (but does not > > eliminate) the chance that we would be the first project that would > cause a > > user to consider a Java upgrade. > > > > The main argument for not making the change is that a reasonable number > of > > users may still be using Java 7 by the time Kafka 0.10.1.0 is released. > > More specifically, we care about the subset who would be able to upgrade > to > > Kafka 0.10.1.0, but would not be able to upgrade the Java version. It > would > > be great if we could quantify this in some way. > > > > What do you think? 
> > > > Ismael > > > > [1] https://java.com/en/download/faq/java_7.xml > > [2] https://blogs.oracle.com/thejavatutorials/entry/jdk_8_is_released > > [3] http://openjdk.java.net/projects/jdk9/ > > [4] https://github.com/apache/cassandra/blob/trunk/README.asc > > [5] https://lucene.apache.org/#highlights-of-this-lucene-release-include > > [6] http://akka.io/news/2015/09/30/akka-2.4.0-released.html > > [7] https://issues.apache.org/jira/browse/HADOOP-11858 > > [8] https://webtide.com/jetty-9-3-features/ > > [9] http://markmail.org/message/l7s276y3xkga2eqf > > [10] > > > > > https://intellij-support.jetbrains.com/hc/en-us/articles/206544879-Selecting-the-JDK-version-the-IDE-will-run-under > > [11] http://markmail.org/message/l7s276y3xkga2eqf > > >
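The Java 8 features listed in the email above can be illustrated with plain JDK code. This is a standalone sketch (not Kafka code) showing lambdas, method references, java.util.stream, Optional, and LongAdder:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;

public class Java8Features {
    public static void main(String[] args) {
        // Lambdas, method references, and java.util.stream for concise transforms
        List<String> topics = Arrays.asList("clicks", "views", "clicks");
        List<String> upper = topics.stream()
                .distinct()
                .map(String::toUpperCase)   // method reference
                .collect(Collectors.toList());
        System.out.println(upper); // → [CLICKS, VIEWS]

        // Optional replaces null-check boilerplate
        Optional<String> first = topics.stream()
                .filter(t -> t.startsWith("v"))
                .findFirst();
        System.out.println(first.orElse("none")); // → views

        // LongAdder, one of the java.util.concurrent additions mentioned above
        LongAdder counter = new LongAdder();
        topics.forEach(t -> counter.increment());
        System.out.println(counter.sum()); // → 3
    }
}
```

Method references like `String::toUpperCase` are the same mechanism that makes the Kafka Streams DSL more concise under Java 8.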
[jira] [Updated] (KAFKA-3753) Add approximateNumEntries() to the StateStore interface for metrics reporting
[ https://issues.apache.org/jira/browse/KAFKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Klukas updated KAFKA-3753: --- Summary: Add approximateNumEntries() to the StateStore interface for metrics reporting (was: Add size() to the StateStore interface for metrics reporting) > Add approximateNumEntries() to the StateStore interface for metrics reporting > - > > Key: KAFKA-3753 > URL: https://issues.apache.org/jira/browse/KAFKA-3753 > Project: Kafka > Issue Type: Improvement > Components: streams > Reporter: Jeff Klukas >Assignee: Guozhang Wang >Priority: Minor > Labels: api > Fix For: 0.10.1.0 > > > As a developer building a Kafka Streams application, I'd like to have > visibility into what's happening with my state stores. How can I know if a > particular store is growing large? How can I know if a particular store is > frequently needing to hit disk? > I'm interested to know if there are existing mechanisms for extracting this > information or if other people have thoughts on how we might approach this. > I can't think of a way to provide metrics generically, so each state store > implementation would likely need to handle this separately. Given that the > default RocksDBStore will likely be the most-used, it would be a first target > for adding metrics. > I'd be interested in knowing the total number of entries in the store, the > total size on disk and in memory, rates of gets and puts, and hit/miss ratio > for the MemoryLRUCache. Some of these numbers are likely calculable through > the RocksDB API, others may simply not be accessible. > Would there be value to the wider community in having state stores register > metrics? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
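The rename above can be sketched with a hypothetical in-memory store (not the actual StateStore interface); the method name signals that backends like RocksDB can only estimate the count, even though this in-memory version happens to be exact:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical store class for illustration; not the real Kafka Streams API.
public class InMemoryStore<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();

    public void put(K key, V value) { map.put(key, value); }
    public V get(K key) { return map.get(key); }

    // "approximate" in the name warns callers the value may be an estimate;
    // here it is exact, but a RocksDB-backed store would return an estimate.
    public long approximateNumEntries() { return map.size(); }
}
```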
[jira] [Commented] (KAFKA-3801) Provide static serialize() and deserialize() for use as method references
[ https://issues.apache.org/jira/browse/KAFKA-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325270#comment-15325270 ] Jeff Klukas commented on KAFKA-3801: I like concise code that you get from a static method reference: {{mapValues(LongDeserializer::deserialize)}} as opposed to {{mapValues(bytes -> Serdes.Long().deserializer().deserialize(bytes))}}, but you're correct that it's possible to use {{Serdes}} here inline. I'm fine to close this in deference to a more comprehensive solution down the line. > Provide static serialize() and deserialize() for use as method references > - > > Key: KAFKA-3801 > URL: https://issues.apache.org/jira/browse/KAFKA-3801 > Project: Kafka > Issue Type: Improvement > Components: clients, streams > Reporter: Jeff Klukas >Assignee: Guozhang Wang >Priority: Minor > Fix For: 0.10.1.0 > > > While most calls to {{Serializer.serialize}} and {{Deserializer.deserialize}} > are abstracted away in Kafka Streams through the use of `Serdes` classes, > there are some instances where developers may want to call them directly. The > serializers and deserializers for simple types don't require any > configuration and could be static, but currently it's necessary to create an > instance to use those methods. > I'd propose moving serialization logic into a {{static public byte[] > serialize(? data)}} method and deserialization logic into a {{static public ? > deserialize(byte[] data)}} method. The existing instance methods would simply > call the static versions. > See a full example for LongSerializer and LongDeserializer here: > https://github.com/apache/kafka/compare/trunk...jklukas:static-serde-methods?expand=1 > In Java 8, these static methods then become available for method references > in code like {{kstream.mapValues(LongDeserializer::deserialize)}} without the > user needing to create an instance of {{LongDeserializer}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3801) Provide static serialize() and deserialize() for use as method references
[ https://issues.apache.org/jira/browse/KAFKA-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323718#comment-15323718 ] Jeff Klukas commented on KAFKA-3801: Let me give more context for the example. We have an application that produces JSON messages to a Kafka topic interleaved with occasional checkpoint messages that are of {{Long}} type. If I want to create a KStream of just the checkpoint messages, I need to filter out the JSON messages before deserializing. Here's what it looks like: {{KStream<Long, Long> checkpointStream = builder.stream(Serdes.Long(), Serdes.ByteArray(), inputTopicName)}} {{.filter((key, bytes) -> bytes.length == 8).mapValues(LongDeserializer::deserialize)}} I need to use ByteArraySerde when calling {{stream}}, then I do the deserialization in a {{mapValues}} invocation after filtering out messages of the wrong type. Another option would be to materialize the stream to a topic after the filter and then call {{builder.stream(Serdes.Long(), Serdes.Long(), newTopicName)}}, but I'd like to avoid unnecessary materialization. So in the current scheme, I need to create an instance of {{LongDeserializer}} separately so that I can then call its {{deserialize}} method in {{mapValues}}. This situation probably won't occur frequently, so I understand if it's decided not to bother considering this change. > Provide static serialize() and deserialize() for use as method references > - > > Key: KAFKA-3801 > URL: https://issues.apache.org/jira/browse/KAFKA-3801 > Project: Kafka > Issue Type: Improvement > Components: clients, streams >Reporter: Jeff Klukas >Assignee: Guozhang Wang >Priority: Minor > Fix For: 0.10.1.0 > > > While most calls to {{Serializer.serialize}} and {{Deserializer.deserialize}} > are abstracted away in Kafka Streams through the use of `Serdes` classes, > there are some instances where developers may want to call them directly. 
The > serializers and deserializers for simple types don't require any > configuration and could be static, but currently it's necessary to create an > instance to use those methods. > I'd propose moving serialization logic into a {{static public byte[] > serialize(? data)}} method and deserialization logic into a {{static public ? > deserialize(byte[] data)}} method. The existing instance methods would simply > call the static versions. > See a full example for LongSerializer and LongDeserializer here: > https://github.com/apache/kafka/compare/trunk...jklukas:static-serde-methods?expand=1 > In Java 8, these static methods then become available for method references > in code like {{kstream.mapValues(LongDeserializer::deserialize)}} without the > user needing to create an instance of {{LongDeserializer}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
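The static-method proposal above can be sketched in plain JDK code. This standalone class is an illustration, not the real LongSerializer/LongDeserializer; the big-endian 8-byte layout is assumed to match Kafka's:

```java
import java.nio.ByteBuffer;

// Illustrative sketch of static serialize()/deserialize() methods that can be
// used as method references without constructing a (de)serializer instance.
public class LongSerde {
    public static byte[] serialize(Long data) {
        return ByteBuffer.allocate(8).putLong(data).array(); // big-endian by default
    }

    public static Long deserialize(byte[] data) {
        return ByteBuffer.wrap(data).getLong();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(42L);
        System.out.println(bytes.length);        // → 8
        System.out.println(deserialize(bytes));  // → 42
    }
}
```

With static methods like these, `mapValues(LongSerde::deserialize)` works as a method reference, which is the usage the ticket has in mind.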
[jira] [Commented] (KAFKA-3753) Add size() to the StateStore interface for metrics reporting
[ https://issues.apache.org/jira/browse/KAFKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323342#comment-15323342 ] Jeff Klukas commented on KAFKA-3753: I agree that {{size}} is a bit ambiguous. I'd expect many folks would recognize the analogy to Java's {{Map.size}}, but there's no reason we need to stick with that. RocksDB has nice consistency in their naming of properties, where {{size}} refers to bytes and {{num-entries}} refers to a count. The PR discussion also brings up the point that since we can only get an estimated count from RocksDB, the name of this method should indicate that the result is not necessarily exact. I'd be happy to see this method called {{estimatedCount}}, {{estimatedNumEntries}}, {{approximateCount}}, or some other variant. > Add size() to the StateStore interface for metrics reporting > > > Key: KAFKA-3753 > URL: https://issues.apache.org/jira/browse/KAFKA-3753 > Project: Kafka > Issue Type: Improvement > Components: streams > Reporter: Jeff Klukas >Assignee: Guozhang Wang >Priority: Minor > Labels: api > Fix For: 0.10.1.0 > > > As a developer building a Kafka Streams application, I'd like to have > visibility into what's happening with my state stores. How can I know if a > particular store is growing large? How can I know if a particular store is > frequently needing to hit disk? > I'm interested to know if there are existing mechanisms for extracting this > information or if other people have thoughts on how we might approach this. > I can't think of a way to provide metrics generically, so each state store > implementation would likely need to handle this separately. Given that the > default RocksDBStore will likely be the most-used, it would be a first target > for adding metrics. > I'd be interested in knowing the total number of entries in the store, the > total size on disk and in memory, rates of gets and puts, and hit/miss ratio > for the MemoryLRUCache. 
Some of these numbers are likely calculable through > the RocksDB API, others may simply not be accessible. > Would there be value to the wider community in having state stores register > metrics? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3817) KTableRepartitionMap should handle null inputs
Jeff Klukas created KAFKA-3817: -- Summary: KTableRepartitionMap should handle null inputs Key: KAFKA-3817 URL: https://issues.apache.org/jira/browse/KAFKA-3817 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 0.10.0.0 Reporter: Jeff Klukas Assignee: Guozhang Wang Fix For: 0.10.0.1 When calling {{KTable.groupBy}} on the result of a KTable-KTable join, NPEs are raised: {{org.apache.kafka.streams.kstream.internals.KTableRepartitionMap$KTableMapProcessor.process(KTableRepartitionMap.java:88)}} The root cause is that the join is expected to emit null values when no match is found, but KTableRepartitionMap is not set up to handle this case. On the users email list, [~guozhang] described a plan of action: I think this is actually a bug in KTableRepartitionMap that it actually should expect null grouped keys; this would be a straight-forward fix for this operator, but I can make a pass over all the repartition operators just to make sure they are all gracefully handling null keys. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
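The fix direction described above can be illustrated with a null-tolerant wrapper. This is a hypothetical helper using java.util.function rather than the actual Streams mapper types:

```java
import java.util.function.Function;

public class NullSafeMap {
    // Sketch: a repartition-style mapper wrapped so that the null values a
    // KTable-KTable join emits on non-matches pass through instead of
    // triggering a NullPointerException inside the mapper.
    public static <V, R> Function<V, R> nullTolerant(Function<V, R> mapper) {
        return value -> value == null ? null : mapper.apply(value);
    }

    public static void main(String[] args) {
        Function<String, Integer> len = nullTolerant(String::length);
        System.out.println(len.apply("abc")); // → 3
        System.out.println(len.apply(null));  // → null, no NPE
    }
}
```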
[jira] [Commented] (KAFKA-3753) Metrics for StateStores
[ https://issues.apache.org/jira/browse/KAFKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322683#comment-15322683 ] Jeff Klukas commented on KAFKA-3753: Since there's refactoring going on right now with the metrics interface for streams (https://issues.apache.org/jira/browse/KAFKA-3715), I think we should delay actually adding size metrics to a different issue. The PR I attached here adds the size() method so that it can be used for a metric in the future. > Metrics for StateStores > --- > > Key: KAFKA-3753 > URL: https://issues.apache.org/jira/browse/KAFKA-3753 > Project: Kafka > Issue Type: Improvement > Components: streams > Reporter: Jeff Klukas >Assignee: Guozhang Wang >Priority: Minor > Labels: api > Fix For: 0.10.1.0 > > > As a developer building a Kafka Streams application, I'd like to have > visibility into what's happening with my state stores. How can I know if a > particular store is growing large? How can I know if a particular store is > frequently needing to hit disk? > I'm interested to know if there are existing mechanisms for extracting this > information or if other people have thoughts on how we might approach this. > I can't think of a way to provide metrics generically, so each state store > implementation would likely need to handle this separately. Given that the > default RocksDBStore will likely be the most-used, it would be a first target > for adding metrics. > I'd be interested in knowing the total number of entries in the store, the > total size on disk and in memory, rates of gets and puts, and hit/miss ratio > for the MemoryLRUCache. Some of these numbers are likely calculable through > the RocksDB API, others may simply not be accessible. > Would there be value to the wider community in having state stores register > metrics? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3801) Provide static serialize() and deserialize() for use as method references
Jeff Klukas created KAFKA-3801: -- Summary: Provide static serialize() and deserialize() for use as method references Key: KAFKA-3801 URL: https://issues.apache.org/jira/browse/KAFKA-3801 Project: Kafka Issue Type: Improvement Components: clients, streams Reporter: Jeff Klukas Assignee: Guozhang Wang Priority: Minor Fix For: 0.10.1.0 While most calls to {{Serializer.serialize}} and {{Deserializer.deserialize}} are abstracted away in Kafka Streams through the use of `Serdes` classes, there are some instances where developers may want to call them directly. The serializers and deserializers for simple types don't require any configuration and could be static, but currently it's necessary to create an instance to use those methods. I'd propose moving serialization logic into a {{static public byte[] serialize(? data)}} method and deserialization logic into a {{static public ? deserialize(byte[] data)}} method. The existing instance methods would simply call the static versions. See a full example for LongSerializer and LongDeserializer here: https://github.com/apache/kafka/compare/trunk...jklukas:static-serde-methods?expand=1 In Java 8, these static methods then become available for method references in code like {{kstream.mapValues(LongDeserializer::deserialize)}} without the user needing to create an instance of {{LongDeserializer}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3711) Allow configuration of MetricsReporter subclasses
[ https://issues.apache.org/jira/browse/KAFKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318768#comment-15318768 ] Jeff Klukas commented on KAFKA-3711: I submitted a PR above to call {{originals()}} when getting configured instances. Documentation changes can be a separate issue. > Allow configuration of MetricsReporter subclasses > - > > Key: KAFKA-3711 > URL: https://issues.apache.org/jira/browse/KAFKA-3711 > Project: Kafka > Issue Type: Improvement > Components: clients, streams > Reporter: Jeff Klukas >Assignee: Guozhang Wang >Priority: Minor > Fix For: 0.10.1.0 > > > The current interface for attaching metrics reporters to clients allows only > defining a list of class names, but provides no means for configuring those > reporters. > There is at least one existing project > (https://github.com/apakulov/kafka-graphite) that solves this problem by > passing additional properties into the client, which then get passed on to > the reporter. This seems to work quite well, but it generates warnings like > {{The configuration kafka.graphite.metrics.prefix = foo was supplied but > isn't a known config.}} > Should passing arbitrary additional parameters like this be officially > supported as the way to configure metrics reporters? Should these warnings > about unrecognized parameters be removed? > Perhaps there should be some mechanism for registering additional > configuration parameters for clients to expect? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3711) Allow configuration of MetricsReporter subclasses
[ https://issues.apache.org/jira/browse/KAFKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318546#comment-15318546 ] Jeff Klukas commented on KAFKA-3711: I would love to see this fixed as well. I'd also love to see some more documentation about Kafka's config framework and how various interfaces like {{MetricsReporter}} are intended to be used. Specifically, I'd like to see the developer docs describe that it's intended that user-provided classes can define configuration options and provide some advice about how to name those options. I'd be interested in contributing docs and cleaning up the config code if it's clear what needs to be done. Am I correct in understanding that the code change needed here is to ensure that {{originals()}} is called rather than {{this.originals}} everywhere that we're passing configs on in {{AbstractConfig}}? > Allow configuration of MetricsReporter subclasses > - > > Key: KAFKA-3711 > URL: https://issues.apache.org/jira/browse/KAFKA-3711 > Project: Kafka > Issue Type: Improvement > Components: clients, streams > Reporter: Jeff Klukas >Assignee: Guozhang Wang >Priority: Minor > Fix For: 0.10.1.0 > > > The current interface for attaching metrics reporters to clients allows only > defining a list of class names, but provides no means for configuring those > reporters. > There is at least one existing project > (https://github.com/apakulov/kafka-graphite) that solves this problem by > passing additional properties into the client, which then get passed on to > the reporter. This seems to work quite well, but it generates warnings like > {{The configuration kafka.graphite.metrics.prefix = foo was supplied but > isn't a known config.}} > Should passing arbitrary additional parameters like this be officially > supported as the way to configure metrics reporters? Should these warnings > about unrecognized parameters be removed? 
> Perhaps there should be some mechanism for registering additional > configuration parameters for clients to expect? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
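One way a reporter could consume its settings once {{originals()}} is passed through is to pull out keys under its own prefix. This helper is hypothetical (the "kafka.graphite." prefix follows the example project cited above):

```java
import java.util.HashMap;
import java.util.Map;

public class ReporterConfig {
    // Sketch: extract reporter-specific settings from the client's originals()
    // map, stripping the prefix so the reporter sees its own key names.
    public static Map<String, Object> withPrefix(Map<String, ?> originals, String prefix) {
        Map<String, Object> out = new HashMap<>();
        for (Map.Entry<String, ?> e : originals.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                out.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return out;
    }
}
```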
[jira] [Commented] (KAFKA-3753) Metrics for StateStores
[ https://issues.apache.org/jira/browse/KAFKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318521#comment-15318521 ] Jeff Klukas commented on KAFKA-3753: It's good to know about MeteredKeyValueStore. And holding off on cache-related metrics for the redesign sounds reasonable. It looks like the KeyValueStore interface does not provide any method related to number of entries (besides calling `all()` and iterating over all values, which doesn't seem reasonable), so there's no way for MeteredKeyValueStore to access the number of entries in the wrapped store. Could we consider adding a `size` method to the KeyValueStore interface? > Metrics for StateStores > --- > > Key: KAFKA-3753 > URL: https://issues.apache.org/jira/browse/KAFKA-3753 > Project: Kafka > Issue Type: Improvement > Components: streams > Reporter: Jeff Klukas >Assignee: Guozhang Wang >Priority: Minor > Labels: api > Fix For: 0.10.1.0 > > > As a developer building a Kafka Streams application, I'd like to have > visibility into what's happening with my state stores. How can I know if a > particular store is growing large? How can I know if a particular store is > frequently needing to hit disk? > I'm interested to know if there are existing mechanisms for extracting this > information or if other people have thoughts on how we might approach this. > I can't think of a way to provide metrics generically, so each state store > implementation would likely need to handle this separately. Given that the > default RocksDBStore will likely be the most-used, it would be a first target > for adding metrics. > I'd be interested in knowing the total number of entries in the store, the > total size on disk and in memory, rates of gets and puts, and hit/miss ratio > for the MemoryLRUCache. Some of these numbers are likely calculable through > the RocksDB API, others may simply not be accessible. > Would there be value to the wider community in having state stores register > metrics? 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3753) Metrics for StateStores
Jeff Klukas created KAFKA-3753: -- Summary: Metrics for StateStores Key: KAFKA-3753 URL: https://issues.apache.org/jira/browse/KAFKA-3753 Project: Kafka Issue Type: Improvement Components: streams Reporter: Jeff Klukas Assignee: Guozhang Wang Priority: Minor Fix For: 0.10.1.0 As a developer building a Kafka Streams application, I'd like to have visibility into what's happening with my state stores. How can I know if a particular store is growing large? How can I know if a particular store is frequently needing to hit disk? I'm interested to know if there are existing mechanisms for extracting this information or if other people have thoughts on how we might approach this. I can't think of a way to provide metrics generically, so each state store implementation would likely need to handle this separately. Given that the default RocksDBStore will likely be the most-used, it would be a first target for adding metrics. I'd be interested in knowing the total number of entries in the store, the total size on disk and in memory, rates of gets and puts, and hit/miss ratio for the MemoryLRUCache. Some of these numbers are likely calculable through the RocksDB API, others may simply not be accessible. Would there be value to the wider community in having state stores register metrics? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3714) Allow users greater access to register custom streams metrics
[ https://issues.apache.org/jira/browse/KAFKA-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Klukas updated KAFKA-3714: --- Issue Type: Improvement (was: Bug) > Allow users greater access to register custom streams metrics > - > > Key: KAFKA-3714 > URL: https://issues.apache.org/jira/browse/KAFKA-3714 > Project: Kafka > Issue Type: Improvement > Components: streams > Reporter: Jeff Klukas >Assignee: Guozhang Wang >Priority: Minor > Fix For: 0.10.1.0 > > > Copying in some discussion that originally appeared in > https://github.com/apache/kafka/pull/1362#issuecomment-219064302 > Kafka Streams is largely a higher-level abstraction on top of producers and > consumers, and it seems sensible to match the KafkaStreams interface to that > of KafkaProducer and KafkaConsumer where possible. For producers and > consumers, the metric registry is internal and metrics are only exposed as an > unmodifiable map. This allows users to access client metric values for use in > application health checks, etc., but doesn't allow them to register new > metrics. > That approach seems reasonable if we assume that a user interested in > defining custom metrics is already going to be using a separate metrics > library. In such a case, users will likely find it easier to define metrics > using whatever library they're familiar with rather than learning the API for > Kafka's Metrics class. Is this a reasonable assumption? > If we want to expose the Metrics instance so that users can define arbitrary > metrics, I'd argue that there's need for documentation updates. In > particular, I find the notion of metric tags confusing. Tags can be defined > in a MetricConfig when the Metrics instance is constructed, > StreamsMetricsImpl is maintaining its own set of tags, and users can set tag > overrides. > If a user were to get access to the Metrics instance, they would be missing > the tags defined in StreamsMetricsImpl. 
I'm imagining that users would want > their custom metrics to sit alongside the predefined metrics with the same > tags, and users shouldn't be expected to manage those additional tags > themselves. > So, why are we allowing users to define their own metrics via the > StreamsMetrics interface in the first place? Is it that we'd like to be able > to provide a built-in latency metric, but the definition depends on the > details of the use case so there's no generic solution? That would be > sufficient motivation for this special case of addLatencySensor. If we want > to continue down that path and give users access to define a wider range of > custom metrics, I'd prefer to extend the StreamsMetrics interface so that > users can call methods on that object, automatically getting the tags > appropriate for that instance rather than interacting with the raw Metrics > instance. > --- > Guozhang had the following comments: > 1) For the producer/consumer cases, all internal metrics are provided and > abstracted from users, and they just need to read the documentation to poll > whatever provided metrics that they are interested; and if they want to > define more metrics, they are likely to be outside the clients themselves and > they can use whatever methods they like, so Metrics do not need to be exposed > to users. > 2) For streams, things are a bit different: users define the computational > logic, which becomes part of the "Streams Client" processing and may be of > interests to be monitored by user themselves; think of a customized processor > that sends an email to some address based on a condition, and users want to > monitor the average rate of emails sent. Hence it is worth considering > whether or not they should be able to access the Metrics instance to define > their own along side the pre-defined metrics provided by the library. 
> 3) Now, since the Metrics class was not previously designed for public usage, > it is not designed to be very user-friendly for defining sensors, especially > the semantics differences between name / scope / tags. StreamsMetrics tries > to hide some of these semantics confusion from users, but it still expose > tags and hence is not perfect in doing so. We need to think of a better > approach so that: 1) user defined metrics will be "aligned" (i.e. with the > same name prefix within a single application, with similar scope hierarchy > definition, etc) with library provided metrics, 2) natural APIs to do so. > I do not have concrete ideas about 3) above on top of my head, comments > are more than welcomed.
[jira] [Commented] (KAFKA-3715) Higher granularity streams metrics
[ https://issues.apache.org/jira/browse/KAFKA-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284706#comment-15284706 ] Jeff Klukas commented on KAFKA-3715: It would be interesting to work on this, but I won't likely have time in the near future, so others should feel free to implement these ideas. > Higher granularity streams metrics > --- > > Key: KAFKA-3715 > URL: https://issues.apache.org/jira/browse/KAFKA-3715 > Project: Kafka > Issue Type: Improvement > Components: streams > Reporter: Jeff Klukas >Assignee: Guozhang Wang >Priority: Minor > Fix For: 0.10.1.0 > > > Originally proposed by [~guozhang] in > https://github.com/apache/kafka/pull/1362#issuecomment-218326690 > We can consider adding metrics for process / punctuate / commit rate at the > granularity of each processor node in addition to the global rate mentioned > above. This is very helpful in debugging. > We can consider adding rate / total cumulated metrics for context.forward > indicating how many records were forwarded downstream from this processor > node as well. This is helpful in debugging. > We can consider adding metrics for each stream partition's timestamp. This is > helpful in debugging. > Besides the latency metrics, we can also add throughput latency in terms of > source records consumed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3715) Higher granularity streams metrics
Jeff Klukas created KAFKA-3715: -- Summary: Higher granularity streams metrics Key: KAFKA-3715 URL: https://issues.apache.org/jira/browse/KAFKA-3715 Project: Kafka Issue Type: Improvement Components: streams Reporter: Jeff Klukas Assignee: Guozhang Wang Priority: Minor Fix For: 0.10.1.0 Originally proposed by [~guozhang] in https://github.com/apache/kafka/pull/1362#issuecomment-218326690 We can consider adding metrics for process / punctuate / commit rate at the granularity of each processor node in addition to the global rate mentioned above. This is very helpful in debugging. We can consider adding rate / total cumulated metrics for context.forward indicating how many records were forwarded downstream from this processor node as well. This is helpful in debugging. We can consider adding metrics for each stream partition's timestamp. This is helpful in debugging. Besides the latency metrics, we can also add throughput latency in terms of source records consumed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3714) Allow users greater access to register custom streams metrics
Jeff Klukas created KAFKA-3714: -- Summary: Allow users greater access to register custom streams metrics Key: KAFKA-3714 URL: https://issues.apache.org/jira/browse/KAFKA-3714 Project: Kafka Issue Type: Bug Components: streams Reporter: Jeff Klukas Assignee: Guozhang Wang Priority: Minor Fix For: 0.10.1.0 Copying in some discussion that originally appeared in https://github.com/apache/kafka/pull/1362#issuecomment-219064302 Kafka Streams is largely a higher-level abstraction on top of producers and consumers, and it seems sensible to match the KafkaStreams interface to that of KafkaProducer and KafkaConsumer where possible. For producers and consumers, the metric registry is internal and metrics are only exposed as an unmodifiable map. This allows users to access client metric values for use in application health checks, etc., but doesn't allow them to register new metrics. That approach seems reasonable if we assume that a user interested in defining custom metrics is already going to be using a separate metrics library. In such a case, users will likely find it easier to define metrics using whatever library they're familiar with rather than learning the API for Kafka's Metrics class. Is this a reasonable assumption? If we want to expose the Metrics instance so that users can define arbitrary metrics, I'd argue that there's need for documentation updates. In particular, I find the notion of metric tags confusing. Tags can be defined in a MetricConfig when the Metrics instance is constructed, StreamsMetricsImpl is maintaining its own set of tags, and users can set tag overrides. If a user were to get access to the Metrics instance, they would be missing the tags defined in StreamsMetricsImpl. I'm imagining that users would want their custom metrics to sit alongside the predefined metrics with the same tags, and users shouldn't be expected to manage those additional tags themselves. 
So, why are we allowing users to define their own metrics via the StreamsMetrics interface in the first place? Is it that we'd like to be able to provide a built-in latency metric, but the definition depends on the details of the use case, so there's no generic solution? That would be sufficient motivation for this special case of addLatencySensor. If we want to continue down that path and give users access to define a wider range of custom metrics, I'd prefer to extend the StreamsMetrics interface so that users can call methods on that object, automatically getting the tags appropriate for that instance, rather than interacting with the raw Metrics instance.

---

Guozhang had the following comments:

1) For the producer/consumer cases, all internal metrics are provided and abstracted from users; they just need to read the documentation to poll whichever provided metrics they are interested in. If they want to define more metrics, those metrics are likely to live outside the clients themselves, and users can use whatever methods they like, so Metrics does not need to be exposed to users.

2) For streams, things are a bit different: users define the computational logic, which becomes part of the "Streams Client" processing and may be of interest for users to monitor themselves. Think of a customized processor that sends an email to some address based on a condition, where users want to monitor the average rate of emails sent. Hence it is worth considering whether or not they should be able to access the Metrics instance to define their own metrics alongside the pre-defined metrics provided by the library.

3) Now, since the Metrics class was not previously designed for public usage, it is not very user-friendly for defining sensors, especially given the semantic differences between name / scope / tags. StreamsMetrics tries to hide some of this confusion from users, but it still exposes tags and hence is not perfect in doing so.
We need to think of a better approach so that: 1) user-defined metrics are "aligned" with library-provided metrics (i.e., with the same name prefix within a single application, with a similar scope hierarchy definition, etc.), and 2) the APIs for doing so are natural. I do not have concrete ideas about 3) above off the top of my head; comments are more than welcome.

---

I'm not sure that I agree that 1) and 2) are truly different situations. A user might choose to send email messages within a bare consumer rather than a streams application, and still want to maintain a metric of sent emails. In this bare-consumer case, we'd expect the user to define that email-sent metric outside of Kafka's metrics machinery.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
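The tag problem discussed above can be made concrete with a small sketch: if the library merged its instance-level tags into every user-defined metric (letting explicit user tags win, matching the "tag overrides" notion), custom metrics would line up with the predefined ones without users managing tags themselves. The `TagMerger` class and its names here are purely illustrative, not Kafka APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: combines instance-level tags (as StreamsMetricsImpl
// maintains internally) with tags supplied alongside a user-defined metric.
class TagMerger {
    private final Map<String, String> instanceTags;

    TagMerger(Map<String, String> instanceTags) {
        this.instanceTags = instanceTags;
    }

    // Start from the instance tags, then apply user-supplied tags on top,
    // so explicit user values act as overrides.
    Map<String, String> mergedTags(Map<String, String> userTags) {
        Map<String, String> merged = new HashMap<>(instanceTags);
        merged.putAll(userTags);
        return merged;
    }
}
```

With this shape, a user registering a custom "emails-sent" sensor would automatically inherit the same client/task tags as the library's own metrics.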
[jira] [Created] (KAFKA-3711) Allow configuration of MetricsReporter subclasses
Jeff Klukas created KAFKA-3711:
-----------------------------------

Summary: Allow configuration of MetricsReporter subclasses
Key: KAFKA-3711
URL: https://issues.apache.org/jira/browse/KAFKA-3711
Project: Kafka
Issue Type: Improvement
Components: clients, streams
Reporter: Jeff Klukas
Assignee: Guozhang Wang
Priority: Minor
Fix For: 0.10.1.0

The current interface for attaching metrics reporters to clients allows only defining a list of class names, but provides no means for configuring those reporters. There is at least one existing project (https://github.com/apakulov/kafka-graphite) that solves this problem by passing additional properties into the client, which then get passed on to the reporter. This seems to work quite well, but it generates warnings like {{The configuration kafka.graphite.metrics.prefix = foo was supplied but isn't a known config.}}

Should passing arbitrary additional parameters like this be officially supported as the way to configure metrics reporters? Should these warnings about unrecognized parameters be removed? Perhaps there should be some mechanism for registering additional configuration parameters for clients to expect?
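The pattern described above can be sketched as follows: a reporter's configure step receives the full client config map and picks out its own settings by key prefix. This is a hedged illustration of the approach, not the kafka-graphite implementation; the class and method names are invented, and only the idea of a `configure(Map)`-style entry point mirrors the real MetricsReporter interface.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: extract a reporter's own settings from the full
// client config by prefix, stripping the prefix from the returned keys.
class PrefixedReporterConfig {
    static Map<String, Object> extract(Map<String, ?> configs, String prefix) {
        Map<String, Object> own = new HashMap<>();
        for (Map.Entry<String, ?> e : configs.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                own.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return own;
    }
}
```

Because the client still sees the prefixed keys as unrecognized, it logs the "supplied but isn't a known config" warning quoted above; an official mechanism for registering reporter-specific keys would avoid that.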
[jira] [Created] (KAFKA-3701) Expose KafkaStreams metrics in public API
Jeff Klukas created KAFKA-3701:
-----------------------------------

Summary: Expose KafkaStreams metrics in public API
Key: KAFKA-3701
URL: https://issues.apache.org/jira/browse/KAFKA-3701
Project: Kafka
Issue Type: Improvement
Components: streams
Reporter: Jeff Klukas
Assignee: Guozhang Wang
Priority: Minor
Fix For: 0.10.1.0

The Kafka clients expose their metrics registries through a `metrics` method presenting an unmodifiable collection, but `KafkaStreams` does not expose its registry. Currently, applications can access a StreamsMetrics instance via the ProcessorContext within a Processor, but this limits flexibility.

Having read-only access to a KafkaStreams.metrics() method would allow a developer to define a health check for their application based on the metrics that KafkaStreams is collecting. Or a developer might want to define a metric in some other framework based on KafkaStreams' metrics.

I am imagining that an application would build and register KafkaStreams-based health checks after building a KafkaStreams instance but before calling the start() method. Are metrics added to the registry at the time a KafkaStreams instance is constructed, or only after calling the start() method? If metrics are registered only after application startup, then this approach may not be sufficient.
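The health-check idea in this ticket can be sketched with plain Java: given read-only access to current metric values (a stand-in here for the proposed KafkaStreams.metrics(); real Kafka metrics are keyed by MetricName objects, not plain strings), an application-level check just compares a value against a threshold. Everything below is an assumed shape, not a Kafka API.

```java
import java.util.Map;
import java.util.function.Supplier;

// Illustrative health check over a read-only view of metric values,
// as a KafkaStreams.metrics()-style method might provide.
class StreamsHealthCheck {
    private final Supplier<Map<String, Double>> metricValues;

    StreamsHealthCheck(Supplier<Map<String, Double>> metricValues) {
        this.metricValues = metricValues;
    }

    // Healthy if the named metric exists and stays at or below the bound.
    boolean healthy(String metricName, double maxValue) {
        Double v = metricValues.get().get(metricName);
        return v != null && v <= maxValue;
    }
}
```

The supplier indirection matters for the registration-timing question raised above: the check re-reads the map on every call, so it keeps working even if metrics only appear in the registry after start().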
[jira] [Commented] (KAFKA-3625) Move kafka-streams test fixtures into a published package
[ https://issues.apache.org/jira/browse/KAFKA-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258338#comment-15258338 ]

Jeff Klukas commented on KAFKA-3625:
------------------------------------

I would be willing to submit this patch if folks think it has merit and there's agreement as to how the package should be named.

> Move kafka-streams test fixtures into a published package
> ---------------------------------------------------------
>
> Key: KAFKA-3625
> URL: https://issues.apache.org/jira/browse/KAFKA-3625
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Jeff Klukas
> Assignee: Guozhang Wang
> Priority: Minor
> Fix For: 0.10.0.0
>
> The KStreamTestDriver and related fixtures defined in streams/src/test/java/org/apache/kafka/test would be useful to developers building applications on top of Kafka Streams, but they are not currently exposed in a package.
> I propose moving this directory to live under streams/fixtures/src/main and creating a new 'streams:fixtures' project in the gradle configuration to publish these as a separate package.
[jira] [Created] (KAFKA-3625) Move kafka-streams test fixtures into a published package
Jeff Klukas created KAFKA-3625:
-----------------------------------

Summary: Move kafka-streams test fixtures into a published package
Key: KAFKA-3625
URL: https://issues.apache.org/jira/browse/KAFKA-3625
Project: Kafka
Issue Type: Improvement
Components: streams
Reporter: Jeff Klukas
Assignee: Guozhang Wang
Priority: Minor
Fix For: 0.10.0.0

The KStreamTestDriver and related fixtures defined in streams/src/test/java/org/apache/kafka/test would be useful to developers building applications on top of Kafka Streams, but they are not currently exposed in a package.

I propose moving this directory to live under streams/fixtures/src/main and creating a new 'streams:fixtures' project in the gradle configuration to publish these as a separate package.