Jenkins build is back to normal : kafka-trunk-jdk8 #2514

2018-03-29 Thread Apache Jenkins Server




Re: [VOTE] KIP-272: Add API version tag to broker's RequestsPerSec metric

2018-03-29 Thread Jun Rao
Hi, Allen,

Thanks for the KIP.

It seems the main motivation of the KIP is to estimate the ratio of the
clients doing down conversion. I am wondering if we can do that by relying
on the metrics we added in 1.0.0 (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-188+-+Add+new+metrics+to+support+health+checks#KIP-188-Addnewmetricstosupporthealthchecks-Messageconversionrateandtime).
This metric reports the message rate of down conversion. By comparing this
to the message rate of all consumers, you can roughly estimate the ratio of
consumers still needing down conversion. Does that cover the main thing
that you want from this KIP?

Jun
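
The estimate Jun describes is just a ratio of two reported rates. A rough sketch (the MBean name in the comments comes from KIP-188; the arithmetic is modeled with sample numbers rather than read from a live broker over JMX):

```java
// Rough sketch of comparing the down-conversion message rate to the overall
// consume rate, as Jun suggests. The MBean name below is from KIP-188; here
// the arithmetic is modeled locally instead of being read over JMX.
public class DownConversionRatio {

    // conversionsPerSec: e.g. the OneMinuteRate of
    //   kafka.server:type=BrokerTopicMetrics,name=FetchMessageConversionsPerSec
    // messagesOutPerSec: the overall consumer message rate, for comparison
    static double ratio(double conversionsPerSec, double messagesOutPerSec) {
        if (messagesOutPerSec <= 0) {
            return 0.0;
        }
        return conversionsPerSec / messagesOutPerSec;
    }

    public static void main(String[] args) {
        // If 150 msg/s are down-converted out of 1000 msg/s consumed,
        // roughly 15% of consumption still needs down-conversion.
        System.out.printf("down-conversion ratio: %.2f%n", ratio(150.0, 1000.0));
    }
}
```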



On Wed, Mar 28, 2018 at 9:55 AM, Allen Wang  wrote:

> Hi All,
>
> I would like to start voting for KIP-272:  Add API version tag to broker's
> RequestsPerSec metric.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 272%3A+Add+API+version+tag+to+broker%27s+RequestsPerSec+metric
>
> Thanks,
> Allen
>


Jenkins build is back to normal : kafka-trunk-jdk9 #518

2018-03-29 Thread Apache Jenkins Server




Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-03-29 Thread Guozhang Wang
> One thing you mention is the notion of setting alerts on coarser metrics
being easier than finer ones. All the metric alerting systems I have used
make it equally easy to alert on metrics by-tag or over tags. So my
experience doesn't say that this is a use case. Were you thinking of an
alerting system that makes such a pre-aggregation valuable?

For the commonly used JMX reporter, tags are encoded directly as part of
the object name, and if users want to monitor them they need to know these
values beforehand. That is also why I think we do want to list all the
possible values of the reason tags in the KIP.

> In my email in response to Matthias, I gave an example of the kind of
scenario that would lead me as an operator to run with DEBUG on all the
time, since I wouldn't be sure, having seen a skipped record once, that it
would ever happen again. The solution is to capture all the available
information about the reason and location of skips all the time.

That is a good point. I think we can either expose all levels of metrics
by default, or only expose the lowest-level metrics and get rid of the other
levels to let users do roll-ups themselves (which would be a much larger
scope for discussion), or we can encourage users not to depend purely on
metrics for such troubleshooting: that is to say, users would only be alerted
based on metrics, and we could log an info / warn log4j entry each time we are
about to skip a record, at all the skip sites, so that upon being notified
users can look into the logs to find the details on where / when it
happens. WDYT?
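
A hypothetical sketch of that last pattern (the class and method names here are invented, and java.util.logging stands in for log4j to stay self-contained): bump the counter that backs the alerting metric, and emit a WARN entry carrying the detail at every skip site.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

// Hypothetical helper: one call site per skip, pairing a coarse metric bump
// (for alerting) with a detailed log line (for after-the-fact diagnosis).
public class SkipRecorder {
    private static final Logger LOG = Logger.getLogger(SkipRecorder.class.getName());

    // Stands in for a metrics Sensor; an alert fires when this goes non-zero.
    private final AtomicLong skipped = new AtomicLong();

    void recordSkip(String reason, String topic, int partition, long offset) {
        skipped.incrementAndGet();
        // The log line carries the detail the coarse metric cannot.
        LOG.warning(String.format("Skipping record (reason=%s) at %s-%d offset %d",
                reason, topic, partition, offset));
    }

    long total() {
        return skipped.get();
    }
}
```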


Guozhang



On Thu, Mar 29, 2018 at 3:57 PM, John Roesler  wrote:

> Hey Guozhang,
>
> Thanks for the review.
>
> 1.
> Matthias raised the same question about the "reason" tag values. I can list
> all possible values of the "reason" tag, but I'm thinking this level of
> detail may not be KIP-worthy, maybe the code and documentation review would
> be sufficient. If you all disagree and would like it included in the KIP, I
> can certainly do that.
>
> If we do provide roll-up metrics, I agree with the pattern of keeping the
> same name but eliminating the tags for the dimensions that were rolled-up.
>
> 2.
> I'm not too sure that implementation efficiency really becomes a factor in
> choosing whether to (by default) update one coarse metric at the thread
> level or one granular metric at the processor-node level, since it's just
> one metric being updated either way. I do agree that if we were to update
> the granular metrics and multiple roll-ups, then we should consider the
> efficiency.
>
> I agree it's probably not necessary to surface the metrics for all nodes
> regardless of whether they can or do skip records. Perhaps we can lazily
> register the metrics.
>
> In my email in response to Matthias, I gave an example of the kind of
> scenario that would lead me as an operator to run with DEBUG on all the
> time, since I wouldn't be sure, having seen a skipped record once, that it
> would ever happen again. The solution is to capture all the available
> information about the reason and location of skips all the time.
>
>
>
> One thing you mention is the notion of setting alerts on coarser metrics
> being easier than finer ones. All the metric alerting systems I have used
> make it equally easy to alert on metrics by-tag or over tags. So my
> experience doesn't say that this is a use case. Were you thinking of an
> alerting system that makes such a pre-aggregation valuable?
>
> Thanks again,
> -John
>
>
> On Thu, Mar 29, 2018 at 5:24 PM, Guozhang Wang  wrote:
>
> > Hello John,
> >
> > Thanks for the KIP. Some comments:
> >
> > 1. Could you list all the possible values of the "reason" tag? In the
> JIRA
> > ticket I left some potential reasons but I'm not clear if you're going to
> > categorize each of them as a separate reason, or is there any additional
> > ones you have in mind.
> >
> > Also I'm wondering if we should add another metric that does not have the
> > reason tag but aggregates among all possible reasons? This is for users to
> > easily set their alerting notifications (otherwise they have to write one
> > notification rule per reason) in their monitoring systems.
> >
> > 2. Note that the processor-node metrics is actually "per-thread, per-task,
> > per-processor-node", and today we only set the per-thread metrics as INFO
> > while leaving the lower two layers as DEBUG. I agree with your argument
> > that we are missing the per-client roll-up metrics today, but I'm not
> > convinced that the right way to approach it would be
> > "just-providing-the-lowest-level metrics only".
> >
> > Note the recording implementation of these three levels is different
> > internally today: we did not just do the rolling up to generate the
> > higher-level metrics from the lower-level ones; we record them
> > separately, which means that, if we turn on multiple levels of metrics,
> > we may duplicate 

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-03-29 Thread John Roesler
Hey Guozhang,

Thanks for the review.

1.
Matthias raised the same question about the "reason" tag values. I can list
all possible values of the "reason" tag, but I'm thinking this level of
detail may not be KIP-worthy, maybe the code and documentation review would
be sufficient. If you all disagree and would like it included in the KIP, I
can certainly do that.

If we do provide roll-up metrics, I agree with the pattern of keeping the
same name but eliminating the tags for the dimensions that were rolled-up.

2.
I'm not too sure that implementation efficiency really becomes a factor in
choosing whether to (by default) update one coarse metric at the thread
level or one granular metric at the processor-node level, since it's just
one metric being updated either way. I do agree that if we were to update
the granular metrics and multiple roll-ups, then we should consider the
efficiency.

I agree it's probably not necessary to surface the metrics for all nodes
regardless of whether they can or do skip records. Perhaps we can lazily
register the metrics.

In my email in response to Matthias, I gave an example of the kind of
scenario that would lead me as an operator to run with DEBUG on all the
time, since I wouldn't be sure, having seen a skipped record once, that it
would ever happen again. The solution is to capture all the available
information about the reason and location of skips all the time.



One thing you mention is the notion of setting alerts on coarser metrics
being easier than finer ones. All the metric alerting systems I have used
make it equally easy to alert on metrics by-tag or over tags. So my
experience doesn't say that this is a use case. Were you thinking of an
alerting system that makes such a pre-aggregation valuable?

Thanks again,
-John


On Thu, Mar 29, 2018 at 5:24 PM, Guozhang Wang  wrote:

> Hello John,
>
> Thanks for the KIP. Some comments:
>
> 1. Could you list all the possible values of the "reason" tag? In the JIRA
> ticket I left some potential reasons but I'm not clear if you're going to
> categorize each of them as a separate reason, or is there any additional
> ones you have in mind.
>
> Also I'm wondering if we should add another metric that does not have the
> reason tag but aggregates among all possible reasons? This is for users to
> easily set their alerting notifications (otherwise they have to write one
> notification rule per reason) in their monitoring systems.
>
> 2. Note that the processor-node metrics is actually "per-thread, per-task,
> per-processor-node", and today we only set the per-thread metrics as INFO
> while leaving the lower two layers as DEBUG. I agree with your argument
> that we are missing the per-client roll-up metrics today, but I'm not
> convinced that the right way to approach it would be
> "just-providing-the-lowest-level metrics only".
>
> Note the recording implementation of these three levels is different
> internally today: we did not just do the rolling up to generate the
> higher-level metrics from the lower-level ones; we record them
> separately, which means that, if we turn on multiple levels of metrics, we
> may duplicate collecting some metrics. One can argue that this is not the best
> way to represent multi-level metrics collecting and reporting, but by only
> enabling thread-level metrics as INFO today, that implementation could be
> more efficient than only collecting the metrics at the lowest level and
> then doing the roll-up calculations outside of the metrics classes.
>
> Plus, today not all processor-nodes may possibly skip records; AFAIK we
> will only skip records at the source, sink, window and aggregation
> processor nodes, so adding a metric per processor looks like overkill to
> me as well. On the other hand, from the user's perspective the "reason" tag
> may be sufficient for them to narrow down where inside the topology records
> are being dropped on the floor. So I think the "per-thread, per-task"
> level metrics should be sufficient for them to troubleshoot in DEBUG mode,
> and we can add another "per-thread" level metric as INFO which is turned
> on by default. So under normal execution users still only need INFO-level
> metrics for alerting (e.g. set alerts on all skipped-records metrics being
> non-zero), and then upon troubleshooting they can turn on DEBUG metrics to
> look into which task is actually causing the skipped records.
>
>
> Guozhang
>
>
>
>
> On Thu, Mar 29, 2018 at 2:03 PM, Matthias J. Sax 
> wrote:
>
> > Thanks for the KIP John.
> >
> > Reading the material on the related Jiras, I am wondering what `reason`
> > tags you want to introduce? Can you elaborate? The KIP should list those
> > IMHO.
> >
> > About the fine grained metrics vs the roll-up: you say that
> >
> > > the coarse metric aggregates across two dimensions simultaneously
> >
> > Can you elaborate why this is an issue? I am not convinced atm that we
> > should put the fine grained 

Re: [DISCUSS] KIP-255: OAuth Authentication via SASL/OAUTHBEARER

2018-03-29 Thread Rajini Sivaram
Hi Ron,

Thanks for the updates. I had a quick look and it is looking good.

I have updated KIP-86 and the associated PR with a new config,
sasl.login.callback.handler.class,
that matches what you are using in this KIP.

On Thu, Mar 29, 2018 at 6:27 AM, Ron Dagostino  wrote:

> Hi Rajini.  I have adjusted the KIP to use callbacks and callback handlers
> throughout.  I also clarified that production implementations of the
> retrieval and validation callback handlers will require the use of an open
> source JWT library, and the unsecured implementations are as far as
> SASL/OAUTHBEARER will go out-of-the-box. Your suggestions, plus this
> clarification, have allowed much of the code to move into the ".internal"
> java package; the public-facing API now consists of just 8 Java classes, 1
> Java interface, and a set of configuration requirements.  I also added a
> section outlining those configuration requirements since they are extensive
> (not onerously so -- just not something one can easily remember).
>
> Ron
>
> On Tue, Mar 13, 2018 at 11:44 AM, Rajini Sivaram 
> wrote:
>
> > Hi Ron,
> >
> > Thanks for the response. All sound good, I think the only outstanding
> > question is around callbacks vs classes provided through the login
> context.
> > As you have pointed out, there are advantages of both approaches. Even
> > though my preference is for callbacks, it is not a blocker since the
> > current approach works fine too. I will make the case for callbacks
> anyway,
> > using OAuthTokenValidator as an example:
> >
> >
> >- As you mentioned, the main advantage of using callbacks is
> >consistency. It is the standard plug-in mechanism for SASL
> > implementations
> >in Java and keeps code consistent with built-in mechanisms like
> > Kerberos as
> >well as our own implementations like PLAIN and SCRAM.
> >- With the current approach, there are two classes OAuthTokenValidator
> >and a default implementation OAuthBearerUnsecuredJwtValidator. I was
> >thinking that we would have a public callback class
> > OAuthTokenValidatorCallback
> >instead and a default callback handler
> >OAuthBearerUnsecuredJwtValidatorCallbackHandler. So it would be two
> >classes either way?
> >- JAAS config is very opaque; we don't log it because it could contain
> >passwords. Your option substitution classes could help here, but it has
> >generally made it difficult to diagnose failures in the past. Callback
> >handlers, on the other hand, are logged as part of the broker configs
> >and can be easily made dynamically updatable.
> >- In the current implementation, an instance of OAuthTokenValidator
> >is created and configured for every SaslServer, i.e. every connection. We
> >create one server callback handler instance per mechanism and cache it.
> >This is useful if we need to make an external connection or load trust
> >stores etc.
> >
> > For token retriever, I think either approach is fine, since it is tied in
> > with login anyway and would benefit from login manager cache either way.
> >
> > Regards,
> >
> > Rajini
> >
> > On Sat, Mar 10, 2018 at 4:19 AM, Ron Dagostino 
> wrote:
> >
> > > Hi Rajini.  Thanks for the great feedback.  See below for my
> > > thoughts/conclusions.  I haven't implemented any of it yet or changed
> the
> > > KIP, but I will start to work on the areas where we are in agreement
> > > immediately, and I will await your feedback on the areas where an
> > > additional iteration is needed to arrive at a conclusion.
> > >
> > > Regarding (1), yes, we can and should eliminate some public API.  See
> > > below.
> > >
> > > Regarding (2), I will change the exception hierarchy so that it is
> > > unchecked.
> > >
> > > Regarding (3) and (4), yes, I agree, the expiring/refresh code can and
> > > should be simplified.  The name of the Login class (I called it
> > > ExpiringCredentialRefreshingLogin) must be part of the public API
> > because
> > > it is the class that must be set via the oauthbearer.sasl.login.class
> > > property.  Its underlying implementation doesn't have to be public, but
> > the
> > > fully-qualified name has to be well-known and fixed so that it can be
> > > associated with that configuration property.  As you point out, we are
> > not
> > > unifying the refresh logic for OAUTHBEARER and GSSAPI, though it could
> be
> > > undertaken at some point in the future; the name "
> > > ExpiringCredentialRefreshingLogin" should probably be used if/when
> that
> > > unification happens.  In the meantime, the class that we expose should
> > > probably be called "OAuthBearerLogin", and it's fully-qualified name
> and
> > > the fact that it recognizes several refresh-related property names in
> the
> > > config, with certain min/max/default values, are the only things that
> > > should be public.  I also agree from (4) that we can stipulate that
> > > 

Re: [DISCUSS] KIP-86: Configurable SASL callback handlers

2018-03-29 Thread Rajini Sivaram
To support extensibility for SASL/OAuth as described in KIP-255,
I have added an extra config, sasl.login.callback.handler.class. This
implements the same interface as the other callback handlers. Default
behaviour will be unchanged.

Please let me know if you have any concerns.

Thank you,

Rajini
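
For reference, a client configuration using the new property might look like the following (the property names are real per this thread; the handler class name is purely hypothetical):

```properties
# hypothetical example; com.example.MyOAuthLoginCallbackHandler is not a real class
security.protocol=SASL_SSL
sasl.mechanism=OAUTHBEARER
sasl.login.callback.handler.class=com.example.MyOAuthLoginCallbackHandler
```

When the property is not set, default behaviour is unchanged, as noted above.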

On Mon, Jan 29, 2018 at 7:07 PM, Rajini Sivaram 
wrote:

> Hi all,
>
> To simplify dynamic update of SASL configs in future (e.g add a new SASL
> mechanism with a new callback handler or Login), I have separated out the
> broker-side configs with a mechanism prefix in the property name (similar
> to listener prefix) instead of including all the classes together as a map.
> This will also make it easier to configure new Login classes when new
> mechanisms are introduced alongside other mechanisms on a single listener.
>
> Please let me know if you have any concerns.
>
> Thank you,
>
> Rajini
>
>
> On Wed, Jan 17, 2018 at 6:54 AM, Rajini Sivaram 
> wrote:
>
>> Hi all,
>>
>> I have made some updates to this KIP to simplify addition of new SASL
>> mechanisms:
>>
>>1. The Login interface has been made configurable as well (we have
>>had this interface for quite some time and it has been stable).
>>2.  The callback handler properties for client-side and server-side
>>have been separated out since we would never use the same classes for 
>> both.
>>
>> Any feedback or suggestions are welcome.
>>
>> Thank you,
>>
>> Rajini
>>
>>
>> On Mon, Apr 3, 2017 at 12:55 PM, Rajini Sivaram 
>> wrote:
>>
>>> If there are no other concerns or suggestions on this KIP, I will start
>>> vote later this week.
>>>
>>> Thank you...
>>>
>>> Regards,
>>>
>>> Rajini
>>>
>>> On Thu, Mar 30, 2017 at 9:42 PM, Rajini Sivaram >> > wrote:
>>>
 I have made a minor change to the callback handler interface to pass in
 the JAAS configuration entries in *configure,* to work with the
 multiple listener configuration introduced in KIP-103. I have also renamed
 the interface to AuthenticateCallbackHandler instead of AuthCallbackHandler
 to avoid confusion with authorization.

 I have rebased and updated the PR (https://github.com/apache/kaf
 ka/pull/2022) as well. Please let me know if there are any other
 comments or suggestions to move this forward.

 Thank you...

 Regards,

 Rajini


 On Thu, Dec 15, 2016 at 3:11 PM, Rajini Sivaram 
 wrote:

> Ismael,
>
> The reason for choosing CallbackHandler interface as the configurable
> interface is flexibility. As you say, we could instead define a simpler
> PlainCredentialProvider and ScramCredentialProvider. But that would tie
> users to Kafka's SaslServer implementation for PLAIN and SCRAM.
> SaslServer/SaslClient implementations are already pluggable using
> standard
> Java security provider mechanism. Callback handlers are the
> configuration
> mechanism for SaslServer/SaslClient. By making the handlers
> configurable,
> SASL becomes fully configurable for mechanisms supported by Kafka as
> well
> as custom mechanisms. From the 'Scenarios' section in the KIP, a
> simpler
> PLAIN/SCRAM interface satisfies the first two, but configurable
> callback
> handlers enables all five. I agree that most users don't require this
> level
> of flexibility, but we have had discussions about custom mechanisms in
> the
> past for integration with existing authentication servers. So I think
> it is
> a feature worth supporting.
>
> On Thu, Dec 15, 2016 at 2:21 PM, Ismael Juma 
> wrote:
>
> > Thanks Rajini, your answers make sense to me. One more general
> point: we
> > are following the JAAS callback architecture and exposing that to
> the user
> > where the user has to write code like:
> >
> > @Override
> > public void handle(Callback[] callbacks) throws IOException,
> > UnsupportedCallbackException {
> > String username = null;
> > for (Callback callback: callbacks) {
> > if (callback instanceof NameCallback)
> > username = ((NameCallback)
> callback).getDefaultName();
> > else if (callback instanceof PlainAuthenticateCallback) {
> > PlainAuthenticateCallback plainCallback =
> > (PlainAuthenticateCallback) callback;
> > boolean authenticated = authenticate(username,
> > plainCallback.password());
> > plainCallback.authenticated(authenticated);
> > } else
> > throw new UnsupportedCallbackException(callback);
> > }
> > }
> >
> > protected boolean 

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-03-29 Thread John Roesler
Hi Matthias,

Thanks for the review.

Regarding the "reason" tags, I listed "deserialization-error" and
"negative-timestamp" to capture the two skips we already account for. More
generally, I would like to establish a pattern in which we could add new
values for the "reason" tags without needing a KIP to do so.

For example, pulling from Guozhang's list in the jira, if we discovered the
"null key in aggregation" skip after merging the work for this KIP, instead
of creating a new KIP to add a new value for the reason tag, we'd just
create a PR like "MINOR: add null key in aggregation to skipped-records
metric". In that PR, we would just record the new metric with the same
metric name, but something like "null-key-in-aggregation" as the value.
That's not to say we wouldn't document the full list of tag values, just
that we don't need the KIP process to propose and ratify the tag name.

If you agree with this proposal, then by extension we also do not need to
agree on the full list of "reason"s during this KIP discussion and can
instead deal with that during code review.
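
As an illustration of why no KIP would be needed, a new reason only adds a tag value, never a new metric name. Modeled here with a plain map standing in for Kafka's Metrics/Sensor machinery (the class and method names are invented for the sketch):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: one metric name ("skipped-records"), many "reason" tag
// values. Adding a newly discovered skip reason is just a new map key, which
// is why it would need only a minor PR rather than a KIP.
public class SkippedRecordsMetric {
    private final Map<String, Long> byReason = new ConcurrentHashMap<>();

    void record(String reason) {
        byReason.merge(reason, 1L, Long::sum);
    }

    long value(String reason) {
        return byReason.getOrDefault(reason, 0L);
    }
}
```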


Next:

> About the fine grained metrics vs the roll-up: you say that
>
> > the coarse metric aggregates across two dimensions simultaneously
>
> Can you elaborate why this is an issue? I am not convinced atm that we
> should put the fine grained metrics into INFO level and remove the
> roll-up at thread level.
>

Sure. The issue is the assumption that our library's users would prefer to
see a skip counter with no context by default and then optionally see the
information about the reason and the location in the topology of the skip.
I'm not sure I share this assumption. For example, if I run my Streams app
happily for a long time, and then see the skip counter go to 1, I can't be
sure that turning on DEBUG after the fact will be sufficient to help me
figure out what happened. I would have to try to re-wind my application to
re-process the same data with DEBUG on to investigate the issue, which
could be complex indeed. As a result, I would personally just run with
DEBUG on all the time.

I guess what I'm saying is that, to me, the reason why my stream processing
framework drops some data is important information that we should expose by
default. Given access to this data, I don't see why we would also provide a
partial roll-up of it at the same level of verbosity.


On whether we provide an optional top-level roll-up, the thought did cross
my mind that we could register a gauge that recursively aggregates all the
granular skip metrics. The gauge wouldn't introduce any contention, since
it wouldn't do anything until it's invoked. But we still couldn't aggregate
above the level of the process, so I figure it's better to avoid the
complexity and let users report and aggregate metrics on their own.
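
The gauge idea above could look roughly like this (a sketch with invented names, not Streams code): the roll-up does its summing only when the gauge is read, so the record path never touches it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

public class RollupGauge {

    // A gauge that sums the granular per-node skip counters lazily:
    // nothing is computed until the gauge is actually invoked, so it adds
    // no contention to the hot path that increments the counters.
    static LongSupplier totalSkipGauge(List<LongAdder> perNodeSkips) {
        return () -> perNodeSkips.stream().mapToLong(LongAdder::sum).sum();
    }

    public static void main(String[] args) {
        LongAdder sourceSkips = new LongAdder();
        LongAdder aggSkips = new LongAdder();
        LongSupplier total = totalSkipGauge(Arrays.asList(sourceSkips, aggSkips));

        sourceSkips.increment();
        aggSkips.add(2);
        System.out.println("total skips: " + total.getAsLong()); // prints "total skips: 3"
    }
}
```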

I'll update the KIP with my thoughts as you suggest.

Thanks again,
-John


On Thu, Mar 29, 2018 at 4:03 PM, Matthias J. Sax 
wrote:

> Thanks for the KIP John.
>
> Reading the material on the related Jiras, I am wondering what `reason`
> tags you want to introduce? Can you elaborate? The KIP should list those
> IMHO.
>
> About the fine grained metrics vs the roll-up: you say that
>
> > the coarse metric aggregates across two dimensions simultaneously
>
> Can you elaborate why this is an issue? I am not convinced atm that we
> should put the fine grained metrics into INFO level and remove the
> roll-up at thread level.
>
> > Given that they have to do this sum to get a usable top-level view
>
> This is a fair concern, but I don't share the conclusion. Offering a
> built-in `KafkaStreams` "client" roll-up out of the box might be a
> better solution. In the past we did not offer this due to performance
> concerns, but we could allow an "opt-in" mechanism. If you disagree, can
> you provide some reasoning and add them to the "Rejected alternatives"
> section.
>
> To rephrase: I understand the issue about missing top-level view, but
> instead of going more fine grained, we should consider to add this
> top-level view and add/keep the fine grained metrics at DEBUG level only
>
> I am +1 to add TopologyTestDriver#metrics() and to remove old metrics
> directly as you suggested.
>
>
> -Matthias
>
>
>
> On 3/28/18 6:42 PM, Ted Yu wrote:
> > Looks good to me.
> >
> > On Wed, Mar 28, 2018 at 3:11 PM, John Roesler  wrote:
> >
> >> Hello all,
> >>
> >> I am proposing KIP-274 to improve the metrics around skipped records in
> >> Streams.
> >>
> >> Please find the details here:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> 274%3A+Kafka+Streams+Skipped+Records+Metrics
> >>
> >> Please let me know what you think!
> >>
> >> Thanks,
> >> -John
> >>
> >
>
>


Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-03-29 Thread Guozhang Wang
Hello John,

Thanks for the KIP. Some comments:

1. Could you list all the possible values of the "reason" tag? In the JIRA
ticket I left some potential reasons but I'm not clear if you're going to
categorize each of them as a separate reason, or is there any additional
ones you have in mind.

Also I'm wondering if we should add another metric that does not have the
reason tag but aggregates among all possible reasons? This is for users to
easily set their alerting notifications (otherwise they have to write one
notification rule per reason) in their monitoring systems.

2. Note that the processor-node metrics is actually "per-thread, per-task,
per-processor-node", and today we only set the per-thread metrics as INFO
while leaving the lower two layers as DEBUG. I agree with your argument
that we are missing the per-client roll-up metrics today, but I'm not
convinced that the right way to approach it would be
"just-providing-the-lowest-level metrics only".

Note the recording implementation of these three levels is different
internally today: we did not just do the rolling up to generate the
higher-level metrics from the lower-level ones; we record them
separately, which means that, if we turn on multiple levels of metrics, we
may duplicate collecting some metrics. One can argue that this is not the best
way to represent multi-level metrics collecting and reporting, but by only
enabling thread-level metrics as INFO today, that implementation could be
more efficient than only collecting the metrics at the lowest level and
then doing the roll-up calculations outside of the metrics classes.

Plus, today not all processor-nodes may possibly skip records; AFAIK we
will only skip records at the source, sink, window and aggregation
processor nodes, so adding a metric per processor looks like overkill to
me as well. On the other hand, from the user's perspective the "reason" tag
may be sufficient for them to narrow down where inside the topology records
are being dropped on the floor. So I think the "per-thread, per-task"
level metrics should be sufficient for them to troubleshoot in DEBUG mode,
and we can add another "per-thread" level metric as INFO which is turned
on by default. So under normal execution users still only need INFO-level
metrics for alerting (e.g. set alerts on all skipped-records metrics being
non-zero), and then upon troubleshooting they can turn on DEBUG metrics to
look into which task is actually causing the skipped records.


Guozhang




On Thu, Mar 29, 2018 at 2:03 PM, Matthias J. Sax 
wrote:

> Thanks for the KIP John.
>
> Reading the material on the related Jiras, I am wondering what `reason`
> tags you want to introduce? Can you elaborate? The KIP should list those
> IMHO.
>
> About the fine grained metrics vs the roll-up: you say that
>
> > the coarse metric aggregates across two dimensions simultaneously
>
> Can you elaborate why this is an issue? I am not convinced atm that we
> should put the fine grained metrics into INFO level and remove the
> roll-up at thread level.
>
> > Given that they have to do this sum to get a usable top-level view
>
> This is a fair concern, but I don't share the conclusion. Offering a
> built-in `KafkaStreams` "client" roll-up out of the box might be a
> better solution. In the past we did not offer this due to performance
> concerns, but we could allow an "opt-in" mechanism. If you disagree, can
> you provide some reasoning and add them to the "Rejected alternatives"
> section.
>
> To rephrase: I understand the issue about missing top-level view, but
> instead of going more fine grained, we should consider to add this
> top-level view and add/keep the fine grained metrics at DEBUG level only
>
> I am +1 to add TopologyTestDriver#metrics() and to remove old metrics
> directly as you suggested.
>
>
> -Matthias
>
>
>
> On 3/28/18 6:42 PM, Ted Yu wrote:
> > Looks good to me.
> >
> > On Wed, Mar 28, 2018 at 3:11 PM, John Roesler  wrote:
> >
> >> Hello all,
> >>
> >> I am proposing KIP-274 to improve the metrics around skipped records in
> >> Streams.
> >>
> >> Please find the details here:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> 274%3A+Kafka+Streams+Skipped+Records+Metrics
> >>
> >> Please let me know what you think!
> >>
> >> Thanks,
> >> -John
> >>
> >
>
>


-- 
-- Guozhang


Re: [DISCUSS] KIP-277 - Fine Grained ACL for CreateTopics API

2018-03-29 Thread Matt Farmer
Thanks for this KIP. I can think of some ways we would apply this.
I, too, am ~ on the compatibility story, though I'm not sure
which way I'd prefer we go at this moment.

On Thu, Mar 29, 2018 at 4:36 PM, Ismael Juma  wrote:

> Thanks for the KIP. I think this is going in the right direction, but we
> need a better compatibility story. Also, it's worth considering whether we
> want to tackle better wildcard support at the same time.
>
> Ismael
>
> On Thu, Mar 29, 2018 at 6:51 AM, Edoardo Comar  wrote:
>
> > Hi all,
> >
> > We have submitted KIP-277 to give users permission to manage the
> lifecycle
> > of a defined set of topics;
> > the current ACL checks are for permission to create *any* topic and on
> > delete for permission against the *named* topics.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 277+-+Fine+Grained+ACL+for+CreateTopics+API
> >
> > Feedback and suggestions are welcome, thanks.
> >
> > Edo & Mickael
> > --
> >
> > Edoardo Comar
> >
> > IBM Message Hub
> >
> > IBM UK Ltd, Hursley Park, SO21 2JN
> > Unless stated otherwise above:
> > IBM United Kingdom Limited - Registered in England and Wales with number
> > 741598.
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
> 3AU
> >
>


Re: [DISCUSS] KIP-277 - Fine Grained ACL for CreateTopics API

2018-03-29 Thread Edoardo Comar
Thanks Ismael,

in the 'rejected' section we describe an approach that would allow for
compatibility, but we'd keep the authz checks simple rather than establish
a precedent of a logical OR between different checks (cluster or topic(t)).

What would be your ideal story for compatibility ?

I prefer to think of wildcard support as a separate concern.
And keep this KIP as simple as it is.

Wildcard support could come from a more sophisticated implementation of 
the Authorizer;
one that interprets the Resource name associated with ACLs in
kafka.security.auth.Authorizer.addAcls(Set[Acl], Resource)
as a regex to be matched against the actual Resource name in
kafka.security.auth.Authorizer.authorize(RequestChannel.Session, 
Operation, Resource)
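To make that interpretation concrete, here is a rough, self-contained sketch (a simplified model; NOT the real kafka.security.auth interfaces, whose signatures are quoted above):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Simplified model of the idea above: ACLs are stored against a
// resource-name regex, and authorize() matches the concrete resource name
// against each stored pattern. Purely an illustration of the approach.
class RegexAclSketch {
    private final Map<Pattern, Set<String>> acls = new HashMap<>();

    // Analogous to Authorizer.addAcls(Set[Acl], Resource): the resource
    // name is interpreted as a regex rather than a literal name.
    void addAcl(String resourceNameRegex, String operation) {
        acls.computeIfAbsent(Pattern.compile(resourceNameRegex),
                p -> new HashSet<>()).add(operation);
    }

    // Analogous to Authorizer.authorize(session, operation, resource):
    // allow if any stored pattern fully matches the concrete resource name
    // and grants the requested operation.
    boolean authorize(String operation, String resourceName) {
        return acls.entrySet().stream()
                .anyMatch(e -> e.getKey().matcher(resourceName).matches()
                        && e.getValue().contains(operation));
    }
}
```

With this, granting "Create" on `payments-.*` would authorize creating `payments-eu` but not `orders-eu`, which is the prefix use case discussed in this thread.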

Thoughts?

cheers
Edo
--

Edoardo Comar

IBM Message Hub

IBM UK Ltd, Hursley Park, SO21 2JN



From:   Ismael Juma 
To: dev 
Date:   29/03/2018 21:37
Subject:Re: [DISCUSS] KIP-277 - Fine Grained ACL for CreateTopics 
API
Sent by:isma...@gmail.com



Thanks for the KIP. I think this is going in the right direction, but we
need a better compatibility story. Also, it's worth considering whether we
want to tackle better wildcard support at the same time.

Ismael

On Thu, Mar 29, 2018 at 6:51 AM, Edoardo Comar  wrote:

> Hi all,
>
> We have submitted KIP-277 to give users permission to manage the 
lifecycle
> of a defined set of topics;
> the current ACL checks are for permission to create *any* topic and on
> delete for permission against the *named* topics.
>
> 
https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_KAFKA_KIP-2D=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=EzRhmSah4IHsUZVekRUIINhltZK7U0OaeRo7hgW4_tQ=q1hztQq7OjhpHNkewaZcug47eAE6Rc0HTDHwzte4zyg=Pyt5ScR9iFsNT5o7O33csRafz0nbZEMckyHZgHWGp5w=

> 277+-+Fine+Grained+ACL+for+CreateTopics+API
>
> Feedback and suggestions are welcome, thanks.
>
> Edo & Mickael
> --
>
> Edoardo Comar
>
> IBM Message Hub
>
> IBM UK Ltd, Hursley Park, SO21 2JN
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 
3AU
>



Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


Re: [DISCUSS] KIP-277 - Fine Grained ACL for CreateTopics API

2018-03-29 Thread Edoardo Comar
Thanks for your comments Stephane.

I too would like to get 'wildcard' support. 
I'd like to associate an acl to a regex instead of a specific resource 
name (or everything),
and then the authz check is then performed against the actual resource 
name.

This could be attainable with a more sophisticated implementation of 
Authorizer.

One immediate use case is to allow an authenticated user to manage 
topics/groups and txns that have a given prefix.
This would be the case for Streams applications too, wouldn't it?

Our KIP is simply about giving the ability for a user to clean up after 
himself :-)
It is bad practice to be able to create resources but not to delete them, 
and the only current alternative is to give a user the ability to create 
and delete any topic,
and that authority may be too broad in some organizations.

cheers
Edo
--

Edoardo Comar

IBM Message Hub

IBM UK Ltd, Hursley Park, SO21 2JN



From:   Stephane Maarek 
To: dev@kafka.apache.org
Date:   29/03/2018 18:11
Subject:Re: [DISCUSS] KIP-277 - Fine Grained ACL for CreateTopics 
API



Not against, but this needs to support regexes, to cover Kafka Streams
applications that create many topics with complex names

On Thu., 29 Mar. 2018, 7:21 pm Edoardo Comar,  wrote:

> Hi all,
>
> We have submitted KIP-277 to give users permission to manage the 
lifecycle
> of a defined set of topics;
> the current ACL checks are for permission to create *any* topic and on
> delete for permission against the *named* topics.
>
>
> 
https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_KAFKA_KIP-2D277-2B-2D-2BFine-2BGrained-2BACL-2Bfor-2BCreateTopics-2BAPI=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=EzRhmSah4IHsUZVekRUIINhltZK7U0OaeRo7hgW4_tQ=uZGGpiYQPMpPZ2QpZfv5GWdjwWiTIu7Oox8zoBEo-3E=y8kJf6lUAsyU6SVgaXy39LCL0JJ35aqg793SxC88PaQ=

>
> Feedback and suggestions are welcome, thanks.
>
> Edo & Mickael
> --
>
> Edoardo Comar
>
> IBM Message Hub
>
> IBM UK Ltd, Hursley Park, SO21 2JN
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 
3AU
>



Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


[jira] [Resolved] (KAFKA-6612) Added logic to prevent increasing partition counts during topic deletion

2018-03-29 Thread Lucas Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucas Wang resolved KAFKA-6612.
---
Resolution: Fixed

> Added logic to prevent increasing partition counts during topic deletion
> 
>
> Key: KAFKA-6612
> URL: https://issues.apache.org/jira/browse/KAFKA-6612
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Lucas Wang
>Assignee: Lucas Wang
>Priority: Major
>
> Problem: trying to increase the partition count of a topic while the topic 
> deletion is in progress can cause the topic to be never deleted.
> In the current code base, if a topic deletion is still in progress and the 
> partition count is increased,
> the new partition and its replica assignment will be created on ZooKeeper as 
> data of the path /brokers/topics/<topic>.
> Upon detecting the change, the controller sees the topic is being deleted, 
> and therefore ignores the partition change. Therefore the zk path 
> /brokers/topics/<topic>/partitions/<partition> will NOT be created.
> If a controller switch happens next, the added partition will be detected by 
> the new controller and stored in the 
> controllerContext.partitionReplicaAssignment. The new controller then tries 
> to delete the topic by first transitioning its replicas to OfflineReplica. 
> However the transition to OfflineReplica state will NOT succeed since there 
> is no leader for the partition. Since the only state change path for a 
> replica to be successfully deleted is OfflineReplica -> 
> ReplicaDeletionStarted -> ReplicaDeletionSuccessful, not being able to enter 
> the OfflineReplica state means the replica can never be successfully deleted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-253: Support in-order message delivery with partition expansion

2018-03-29 Thread John Roesler
Hey Dong,

Congrats on becoming a committer!!!

Since I just sent a novel-length email, I'll try and keep this one brief ;)

Regarding producer coordination, I'll grant that in that case, producers
may coordinate among themselves to produce into the same topic or to
produce co-partitioned topics. Nothing in KStreams or the Kafka ecosystem
in general requires such coordination for correctness or in fact for any
optional features, though, so I would not say that we require producer
coordination of partition logic. If producers currently coordinate, it's
completely optional and their own choice.

Regarding the portability of partition algorithms, my observation is that
systems requiring independent implementations of the same algorithm with
100% correctness are a large source of risk and also a burden on those who
have to maintain them. If people could flawlessly implement algorithms in
actual software, the world would be a wonderful place indeed! For a system
as important and widespread as Kafka, I would recommend limiting such
requirements as aggressively as possible.
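To make the coordination burden concrete, here is a minimal stand-in for the partition function two producers would both have to reproduce exactly (a plain String hash for illustration; Kafka's actual default hashes the serialized key bytes with murmur2):

```java
// Simplified stand-in for a shared partition function. Kafka's real
// default partitioner uses murmur2 over the key bytes; a plain String
// hash is used here only to show the contract: every producer, in every
// language, must compute the exact same mapping, or records with the same
// key land in different partitions.
class PartitionerSketch {
    static int partition(String key, int numPartitions) {
        // Mask the sign bit so the modulo result is non-negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```

Any second implementation that deviates even slightly from this function would silently break co-partitioning, which is the risk described above.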

I'd agree that we can always revisit decisions like allowing arbitrary
partition functions, but of course, we shouldn't do that in a vacuum. That
feels like the kind of thing we'd need to proactively seek guidance from
the users list about. I do think that the general approach of saying that
"if you use a custom partitioner, you cannot do partition expansion" is
very reasonable (but I don't think we need to go that far with the current
proposal). It's similar to my statement in my email to Jun that in
principle KStreams doesn't *need* backfill, we only need it if we want to
employ partition expansion.

I reckon that the main motivation for backfill is to support KStreams use
cases and also any other use cases involving stateful consumers.

Thanks for your response, and congrats again!
-John


On Wed, Mar 28, 2018 at 1:34 AM, Dong Lin  wrote:

> Hey John,
>
> Great! Thanks for all the comment. It seems that we agree that the current
> KIP is in good shape for core Kafka. IMO, what we have been discussing in
> the recent email exchanges is mostly about the second step, i.e. how to
> address problem for the stream use-case (or stateful processing in
> general).
>
> I will comment inline.
>
>
>
>
> On Tue, Mar 27, 2018 at 4:38 PM, John Roesler  wrote:
>
> > Thanks for the response, Dong.
> >
> > Here are my answers to your questions:
> >
> > - "Asking producers and consumers, or even two different producers, to
> > > share code like the partition function is a pretty huge ask. What if
> they
> > > are using different languages?". It seems that today we already require
> > > different producer's to use the same hash function -- otherwise
> messages
> > > with the same key will go to different partitions of the same topic
> which
> > > may cause problem for downstream consumption. So not sure if it adds
> any
> > > more constraint by assuming consumers know the hash function of
> producer.
> > > Could you explain more why user would want to use a cusmtom partition
> > > function? Maybe we can check if this is something that can be supported
> > in
> > > the default Kafka hash function. Also, can you explain more why it is
> > > difficuilt to implement the same hash function in different languages?
> >
> >
> > Sorry, I meant two different producers as in producers to two different
> > topics. This was in response to the suggestion that we already require
> > coordination among producers to different topics in order to achieve
> > co-partitioning. I was saying that we do not (and should not).
>
>
> It is probably common for producers of different team to produce message to
> the same topic. In order to ensure that messages with the same key go to
> same partition, we need producers of different team to share the same
> partition algorithm, which by definition requires coordination among
> producers of different teams in an organization. Even for producers of
> different topics, it may be common to require producers to use the same
> partition algorithm in order to join two topics for stream processing. Does
> this make it reasonable to say we already require coordination across
> producers?
>
>
> > By design, consumers are currently ignorant of the partitioning scheme.
> It
> > suffices to trust that the producer has partitioned the topic by key, if
> > they claim to have done so. If you don't trust that, or even if you just
> > need some other partitioning scheme, then you must re-partition it
> > yourself. Nothing we're discussing can or should change that. The value
> of
> > backfill is that it preserves the ability for consumers to avoid
> > re-partitioning before consuming, in the case where they don't need to
> > today.
>
>
> > Regarding shared "hash functions", note that it's a bit inaccurate to
> talk
> > about the "hash function" of the producer. Properly speaking, the
> producer
> > has 

[jira] [Resolved] (KAFKA-6395) KIP: Add Record Header support to Kafka Streams

2018-03-29 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-6395.
--
Resolution: Duplicate

> KIP: Add Record Header support to Kafka Streams 
> 
>
> Key: KAFKA-6395
> URL: https://issues.apache.org/jira/browse/KAFKA-6395
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Jorge Quilcate
>Assignee: Jorge Quilcate
>Priority: Major
>  Labels: kip
>
> KIP documentation: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-244%3A+Add+Record+Header+support+to+Kafka+Streams



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-253: Support in-order message delivery with partition expansion

2018-03-29 Thread John Roesler
Hi Jun,

Thanks for the response. I'm very new to this project, but I will share my
perspective. I'm going to say a bunch of stuff that I know you know
already, but just so we're on the same page...

This may also be a good time to get feedback from the other KStreams folks.

Using KStreams as a reference implementation for how stream processing
frameworks may interact with Kafka, I think it's important to eschew
knowledge about how KStreams currently handles internal communication,
making state durable, etc. Both because these details may change, and
because they won't be shared with other stream processors.

=
Background

We are looking at a picture like this:

      input   input   input
         \      |      /
        +---------------+
  +-----+  Consumer(s)  +-----+
  |     +---------------+     |
  |                           |
  |   KStreams Application    |
  |                           |
  |     +---------------+     |
  +-----+  Producer(s)  +-----+
        +---------------+
             /      \
        output    output

The inputs and outputs are Kafka topics (and therefore have 1 or more
partitions). We'd have at least 1 input and 0 or more outputs. The
Consumers and Producers are both the official KafkaConsumer and
KafkaProducer.

In general, we'll assume that the input topics are provided by actors over
which we have no control, although we may as well assume they are friendly
and amenable to requests, and also that their promises are trustworthy.
This is important because we must depend on them to uphold some promises:
* That they tell us the schema of the data they publish, and abide by that
schema. Without this, the inputs are essentially garbage.
* That they tell us some defining characteristics of the topics (more on
this in a sec.) and again strictly abide by that promise.

What are the topic characteristics we care about?
1. The name (or name pattern)
2. How the messages are keyed (if at all)
3. Whether the message timestamps are meaningful, and if so, what their
meaning is
4. Assuming the records have identity, whether the partitions partition the
records' identity space
5. Whether the topic completely contains the data set
6. Whether the messages in the topic are ordered

#1 is obvious: without this information, we cannot access the data at all.

For #2, #3, #4, and #6, we may or may not need this information, depending
on the logic of the application. For example, a trivial application that
simply counts all events it sees doesn't care about #2, #3, #4, or #6. But
an application that groups by some attribute can take advantage of #2 and
#4 if the topic data is already keyed and partitioned over that attribute.
Likewise, if the application includes some temporal semantics on a temporal
dimension that is already captured in #3, it may take advantage of that
fact.

Note that #2, #3, #4, and #6 are all optional. If they are not promised, we
can do extra work inside the application to accomplish what we need.
However, if they are promised (and if we depend on that promise), it is
essential that the topic providers uphold those promises, as we may not be
in a position to verify them.

Note also that if they make a promise, but it doesn't happen to line up
with our needs (data is keyed by attr1, but we need it by attr2, or
timestamp is produce-time, but we need it by event-time, etc.), then we
will have to go ahead and do that extra work internally anyway. This also
captures the situation in which two inputs are produced by different
providers, one of which meets our needs, and the other does not. The fact
that we can cope with this situation is the basis for my statement that we
do not require coordination among producers.

(Key Point A): In terms of optimization, #4 and #6 are the most valuable.
If these characteristics happen to line up with our needs, then KStreams
can be incredibly efficient in both time and computational resources.

#5 is similar to knowing the schema in that it tells us whether the
computation we want to do is possible or not. For example, suppose we have
a topic of "users", and we want to construct a table for querying. If the
user topic doesn't completely contain the dataset, we cannot construct the
table. Note that it doesn't matter whether the topic is compacted or not.
If the topic is complete, I can consume it starting at "earliest" and build
my table. If it is not complete, I can do other computations on it. In both
cases, it may or may not be compacted; it just doesn't matter.

On the output side, the roles are reversed. We provide (or not) exactly the
same set of guarantees to consumers of our outputs, and we likewise must
abide by the promises we make.


=
Partition Expansion

With this formation in place, let's talk about partition expansion.

Why do we have partitions in the first place? (let me know if I miss
something here)
* For logical data streams that are themselves partitionable, it allows
producers to operate 

Re: [DISCUSS] KIP-274: Kafka Streams Skipped Records Metrics

2018-03-29 Thread Matthias J. Sax
Thanks for the KIP John.

Reading the material on the related Jiras, I am wondering what `reason`
tags you want to introduce? Can you elaborate? The KIP should list those
IMHO.

About the fine grained metrics vs the roll-up: you say that

> the coarse metric aggregates across two dimensions simultaneously

Can you elaborate why this is an issue? I am not convinced atm that we
should put the fine grained metrics into INFO level and remove the
roll-up at thread level.

> Given that they have to do this sum to get a usable top-level view

This is a fair concern, but I don't share the conclusion. Offering a
built-in `KafkaStreams` "client" roll-up out of the box might be a
better solution. In the past we did not offer this due to performance
concerns, but we could allow an "opt-in" mechanism. If you disagree, can
you provide some reasoning and add them to the "Rejected alternatives"
section.

To rephrase: I understand the issue about the missing top-level view, but
instead of going more fine grained, we should consider adding this
top-level view and keeping the fine-grained metrics at DEBUG level only.

I am +1 to add TopologyTestDriver#metrics() and to remove old metrics
directly as you suggested.
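As an aside on how tag granularity shows up operationally: with the JMX reporter, each tag combination becomes its own MBean name, and a roll-up is just a wildcard ObjectName query. A sketch (the metric group and tag names below are illustrative, not the names the KIP will finally use):

```java
import javax.management.ObjectName;

// Each (client-id, reason) combination surfaces as its own MBean; an
// alerting system can roll them up with a property-value wildcard.
// Group/tag names here are assumptions for illustration only.
class SkippedMetricNameSketch {
    static ObjectName skippedRecords(String clientId, String reason) throws Exception {
        return new ObjectName(String.format(
                "kafka.streams:type=stream-task-metrics,client-id=%s,reason=%s",
                clientId, reason));
    }

    // Pattern matching every 'reason' value for one client, i.e. the
    // coarse "all skipped records" view.
    static ObjectName allSkipped(String clientId) throws Exception {
        return new ObjectName(String.format(
                "kafka.streams:type=stream-task-metrics,client-id=%s,reason=*",
                clientId));
    }
}
```

Note this is also why the `reason` values need to be documented: a JMX consumer has to know them (or use the wildcard query) to find the MBeans at all.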


-Matthias



On 3/28/18 6:42 PM, Ted Yu wrote:
> Looks good to me.
> 
> On Wed, Mar 28, 2018 at 3:11 PM, John Roesler  wrote:
> 
>> Hello all,
>>
>> I am proposing KIP-274 to improve the metrics around skipped records in
>> Streams.
>>
>> Please find the details here:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 274%3A+Kafka+Streams+Skipped+Records+Metrics
>>
>> Please let me know what you think!
>>
>> Thanks,
>> -John
>>
> 





Re: [DISCUSS] KIP-277 - Fine Grained ACL for CreateTopics API

2018-03-29 Thread Ismael Juma
Thanks for the KIP. I think this is going in the right direction, but we
need a better compatibility story. Also, it's worth considering whether we
want to tackle better wildcard support at the same time.

Ismael

On Thu, Mar 29, 2018 at 6:51 AM, Edoardo Comar  wrote:

> Hi all,
>
> We have submitted KIP-277 to give users permission to manage the lifecycle
> of a defined set of topics;
> the current ACL checks are for permission to create *any* topic and on
> delete for permission against the *named* topics.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 277+-+Fine+Grained+ACL+for+CreateTopics+API
>
> Feedback and suggestions are welcome, thanks.
>
> Edo & Mickael
> --
>
> Edoardo Comar
>
> IBM Message Hub
>
> IBM UK Ltd, Hursley Park, SO21 2JN
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>


Re: [DISCUSS] KIP-277 - Fine Grained ACL for CreateTopics API

2018-03-29 Thread Stephane Maarek
Not against, but this needs to support regexes, to cover Kafka Streams
applications that create many topics with complex names

On Thu., 29 Mar. 2018, 7:21 pm Edoardo Comar,  wrote:

> Hi all,
>
> We have submitted KIP-277 to give users permission to manage the lifecycle
> of a defined set of topics;
> the current ACL checks are for permission to create *any* topic and on
> delete for permission against the *named* topics.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-277+-+Fine+Grained+ACL+for+CreateTopics+API
>
> Feedback and suggestions are welcome, thanks.
>
> Edo & Mickael
> --
>
> Edoardo Comar
>
> IBM Message Hub
>
> IBM UK Ltd, Hursley Park, SO21 2JN
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>


Re: [ANNOUNCE] Apache Kafka 1.1.0 Released

2018-03-29 Thread Dong Lin
Great news! Thanks Damian and Rajini for running this release!

On Thu, Mar 29, 2018 at 8:47 AM, James Cheng  wrote:

> Thanks Damian and Rajini for running the release! Congrats and good job
> everyone!
>
> -James
>
> Sent from my iPhone
>
> > On Mar 29, 2018, at 2:27 AM, Rajini Sivaram  wrote:
> >
> > The Apache Kafka community is pleased to announce the release for
> >
> > Apache Kafka 1.1.0.
> >
> >
> > Kafka 1.1.0 includes a number of significant new features.
> >
> > Here is a summary of some notable changes:
> >
> >
> > ** Kafka 1.1.0 includes significant improvements to the Kafka Controller
> >
> >   that speed up controlled shutdown. ZooKeeper session expiration edge
> > cases
> >
> >   have also been fixed as part of this effort.
> >
> >
> > ** Controller improvements also enable more partitions to be supported
> on a
> >
> >   single cluster. KIP-227 introduced incremental fetch requests,
> providing
> >
> >   more efficient replication when the number of partitions is large.
> >
> >
> > ** KIP-113 added support for replica movement between log directories to
> >
> >   enable data balancing with JBOD.
> >
> >
> > ** Some of the broker configuration options like SSL keystores can now be
> >
> >   updated dynamically without restarting the broker. See KIP-226 for
> > details
> >
> >   and the full list of dynamic configs.
> >
> >
> > ** Delegation token based authentication (KIP-48) has been added to Kafka
> >
> >   brokers to support large number of clients without overloading Kerberos
> >
> >   KDCs or other authentication servers.
> >
> >
> > ** Several new features have been added to Kafka Connect, including
> header
> >
> >   support (KIP-145), SSL and Kafka cluster identifiers in the Connect
> REST
> >
> >   interface (KIP-208 and KIP-238), validation of connector names
> (KIP-212)
> >
> >   and support for topic regex in sink connectors (KIP-215). Additionally,
> >
> >   the default maximum heap size for Connect workers was increased to 2GB.
> >
> >
> > ** Several improvements have been added to the Kafka Streams API,
> including
> >
> >   reducing repartition topic partitions footprint, customizable error
> >
> >   handling for produce failures and enhanced resilience to broker
> >
> >   unavailability.  See KIPs 205, 210, 220, 224 and 239 for details.
> >
> >
> > All of the changes in this release can be found in the release notes:
> >
> >
> >
> > https://dist.apache.org/repos/dist/release/kafka/1.1.0/
> RELEASE_NOTES.html
> >
> >
> >
> >
> > You can download the source release from:
> >
> >
> >
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/
> kafka-1.1.0-src.tgz
> >
> >
> >
> > and binary releases from:
> >
> >
> >
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/
> kafka_2.11-1.1.0.tgz
> >
> > (Scala 2.11)
> >
> >
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/
> kafka_2.12-1.1.0.tgz
> >
> > (Scala 2.12)
> >
> >
> > 
> --
> >
> >
> >
> > Apache Kafka is a distributed streaming platform with four core APIs:
> >
> >
> >
> > ** The Producer API allows an application to publish a stream of records to
> >
> > one or more Kafka topics.
> >
> >
> >
> > ** The Consumer API allows an application to subscribe to one or more
> >
> > topics and process the stream of records produced to them.
> >
> >
> >
> > ** The Streams API allows an application to act as a stream processor,
> >
> > consuming an input stream from one or more topics and producing an output
> >
> > stream to one or more output topics, effectively transforming the input
> >
> > streams to output streams.
> >
> >
> >
> > ** The Connector API allows building and running reusable producers or
> >
> > consumers that connect Kafka topics to existing applications or data
> >
> > systems. For example, a connector to a relational database might capture
> >
> > every change to a table.
> >
> >
> >
> >
> > With these APIs, Kafka can be used for two broad classes of application:
> >
> > ** Building real-time streaming data pipelines that reliably get data
> >
> > between systems or applications.
> >
> >
> >
> > ** Building real-time streaming applications that transform or react to
> the
> >
> > streams of data.
> >
> >
> >
> >
> > Apache Kafka is in use at large and small companies worldwide, including
> >
> > Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
> >
> > Target, The New York Times, Uber, Yelp, and Zalando, among others.
> >
> >
> >
> >
> > A big thank you for the following 120 contributors to this release!
> >
> >
> > Adem Efe Gencer, Alex Good, Andras Beni, Andy Bryant, Antony Stubbs,
> >
> > Apurva Mehta, Arjun Satish, bartdevylder, Bill Bejeck, Charly Molter,
> >
> > Chris Egerton, Clemens Valiente, cmolter, Colin P. Mccabe,
> >
> > Colin Patrick McCabe, ConcurrencyPractitioner, Damian Guy, dan norwood,
> >
> > Daniel Wojda, 

Re: Expiring record(s) for topic: 30011 ms has passed since last append

2018-03-29 Thread cosmin . ciobanu12
We're consistently seeing this on Kafka Streams 1.0.0, even when not under 
load, exclusively for state change log topics. Tried changing 
`request.timeout.ms` to 60s but it doesn't help. The exception is thrown 
continuously for what looks to be every attempt to update any of the state 
change logs.

We've tried running Kafka Streams on a 2 to 20 node cluster, connecting to 
a remote broker cluster. The topology is quite simple (7 processors, 2 state 
stores). The brokers have been handling significant loads from other sources 
(librdkafka & Java producers) without any issues, so I doubt there is any 
problem there. 

Another interesting fact we see after these timeout exceptions is that calling 
`.close(5, TimeUnit.SECONDS)` on the application (catching the 
`TimeoutException` via a `GlobalExceptionHandler`), makes the JVM process hang 
every time, and we need to manually kill it.

Also tried updating to 1.1.0 to check if the new config param `retries` set to 
1 makes any difference, and it doesn't. `default.production.exception.handler` 
does trigger for this situation but doesn't offer anything different from the 
`GlobalExceptionHandler`.  
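For reference, the settings we've been experimenting with are passed to KafkaStreams as properties; a minimal sketch (hostnames and the handler class name are placeholders; the handler would have to implement org.apache.kafka.streams.errors.ProductionExceptionHandler, available from 1.1.0):

```java
import java.util.Properties;

// Sketch of the configuration discussed above. All values here are
// example placeholders, not a recommendation.
class TimeoutTuningSketch {
    static Properties streamsProps() {
        Properties props = new Properties();
        props.put("application.id", "app");                // example id
        props.put("bootstrap.servers", "broker:9092");     // example host
        props.put("request.timeout.ms", "60000");          // raised from the 30s default
        props.put("retries", "10");                        // StreamsConfig param new in 1.1.0
        // Hypothetical handler class, shown only to illustrate the key:
        props.put("default.production.exception.handler",
                "com.example.MyProductionExceptionHandler");
        return props;
    }
}
```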

Something that comes to mind is that the input and output topics are long lived 
(i.e. the broker cluster along with the ZK quorum) while the Kafka Streams 
cluster is an ephemeral cluster which we take down and up in minutes (and we do 
this quite often, for config changes). When recreating this cluster the hosts 
are changed completely (the apps run in Docker containers in an AWS autoscaling 
group). Is there any long-lived relation between Kafka Streams state change log 
partitions and the Kafka Streams process hosts that breaks when recreating the 
Kafka Streams nodes? The first time we create the Kafka Streams cluster on a 
new broker cluster, things seem to run relatively well. 

Here's an example stack trace:
2018-03-29 13:15:15,614 [kafka-producer-network-thread | 
app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1-producer] ERROR 
org.apache.kafka.streams.processor.internals.RecordCollectorImpl - task [0_19] 
Error sending record (key ... value [...] timestamp 1522327436038) to topic 
app-sessionStateStore-changelog due to {}; No more records will be sent and no 
more offsets will be recorded for this task.
org.apache.kafka.common.errors.TimeoutException: Expiring 23 record(s) for 
app-sessionStateStore-changelog-19: 30002 ms has passed since last append
2018-03-29 13:15:19,470 
[app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1] ERROR 
org.apache.kafka.streams.processor.internals.AssignedTasks - stream-thread 
[app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1] Failed to commit 
stream task 0_19 due to the following error:
org.apache.kafka.streams.errors.StreamsException: task [0_19] Abort sending 
since an error caught with a previous record (key ... value ... timestamp 
1522327436038) to topic app-sessionStateStore-changelog due to 
org.apache.kafka.common.errors.TimeoutException: Expiring 23 record(s) for 
app-sessionStateStore-changelog-19: 30002 ms has passed since last append.
at 
org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:118)
at 
org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:204)
at 
org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:187)
at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:627)
at 
org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:287)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:238)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 23 
record(s) for app-sessionStateStore-changelog-19: 30002 ms has passed since 
last append
2018-03-29 13:15:19,471 
[app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1] INFO  
org.apache.kafka.streams.processor.internals.StreamThread - stream-thread 
[app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1] State transition from 
PARTITIONS_ASSIGNED to PENDING_SHUTDOWN
2018-03-29 13:15:19,471 
[app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1] INFO  
org.apache.kafka.streams.processor.internals.StreamThread - stream-thread 
[app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1] Shutting down
2018-03-29 13:15:19,885 
[app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1] INFO  
org.apache.kafka.clients.producer.KafkaProducer - [Producer 
clientId=app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1-producer] 
Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
2018-03-29 13:15:19,896 
[app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1] INFO  
org.apache.kafka.streams.processor.internals.StreamThread - 

Persistent TimeoutException when updating Kafka Streams state change logs

2018-03-29 Thread cosmin . ciobanu12
We're consistently seeing TimeoutExceptions on Kafka Streams 1.0.0, even when 
not under load, exclusively for state change log topic updates. Tried changing 
`request.timeout.ms` to 60s but it doesn't help. The exception is thrown 
continuously for what looks to be every attempt to update any of the state 
change logs.

We've tried running Kafka Streams on a 2 to 20 node cluster, connecting to 
a remote broker cluster. The topology is quite simple (7 processors, 2 state 
stores). The brokers have been handling significant loads from other sources 
(librdkafka & Java producers) without any issues, so I doubt there is any 
problem there. 

Another interesting fact we see (likely unrelated) after these timeout 
exceptions is that calling `.close(5, TimeUnit.SECONDS)` on the application 
(catching the `TimeoutException` via a `GlobalExceptionHandler`), makes the JVM 
process hang every time, and we need to manually kill it.

Also tried updating to 1.1.0 to check if the new config param `retries` set to 
1 makes any difference, and it doesn't. `default.production.exception.handler` 
does trigger for this situation but doesn't offer anything different from the 
`GlobalExceptionHandler`.  

One possible source of the problem that comes to mind is that the input and 
output topics are long lived (as are the broker cluster and the ZK quorum), 
while the Kafka Streams cluster is ephemeral: we take it down and bring it 
back up in minutes (and we do this quite often, for config changes). When 
recreating this cluster the hosts change completely (the apps run in Docker 
containers in an AWS autoscaling group). Is there any long-lived relation 
between the Kafka Streams state change logs and the Kafka Streams process 
hosts that breaks when the Kafka Streams nodes are recreated? The first time 
we create the Kafka Streams cluster on a new broker cluster, things seem to 
run relatively well. 

Here's an example stack trace:
2018-03-29 13:15:15,614 [kafka-producer-network-thread | 
app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1-producer] ERROR 
org.apache.kafka.streams.processor.internals.RecordCollectorImpl - task [0_19] 
Error sending record (key ... value [...] timestamp 1522327436038) to topic 
app-sessionStateStore-changelog due to {}; No more records will be sent and no 
more offsets will be recorded for this task.
org.apache.kafka.common.errors.TimeoutException: Expiring 23 record(s) for 
app-sessionStateStore-changelog-19: 30002 ms has passed since last append
2018-03-29 13:15:19,470 
[app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1] ERROR 
org.apache.kafka.streams.processor.internals.AssignedTasks - stream-thread 
[app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1] Failed to commit 
stream task 0_19 due to the following error:
org.apache.kafka.streams.errors.StreamsException: task [0_19] Abort sending 
since an error caught with a previous record (key ... value ... timestamp 
1522327436038) to topic app-sessionStateStore-changelog due to 
org.apache.kafka.common.errors.TimeoutException: Expiring 23 record(s) for 
app-sessionStateStore-changelog-19: 30002 ms has passed since last append.
at 
org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:118)
at 
org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:204)
at 
org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:187)
at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:627)
at 
org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:287)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:238)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 23 
record(s) for app-sessionStateStore-changelog-19: 30002 ms has passed since 
last append
2018-03-29 13:15:19,471 
[app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1] INFO  
org.apache.kafka.streams.processor.internals.StreamThread - stream-thread 
[app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1] State transition from 
PARTITIONS_ASSIGNED to PENDING_SHUTDOWN
2018-03-29 13:15:19,471 
[app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1] INFO  
org.apache.kafka.streams.processor.internals.StreamThread - stream-thread 
[app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1] Shutting down
2018-03-29 13:15:19,885 
[app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1] INFO  
org.apache.kafka.clients.producer.KafkaProducer - [Producer 
clientId=app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1-producer] 
Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
2018-03-29 13:15:19,896 
[app-ef44224d-ab04-47ea-b2f3-2b74bac22133-StreamThread-1] INFO  

Re: [VOTE] KIP-249: Add Delegation Token Operations to Kafka Admin Client

2018-03-29 Thread Gwen Shapira
+1

Thank you and sorry for missing it the first time around.

On Thu, Mar 29, 2018 at 3:05 AM, Manikumar 
wrote:

> I'm bumping this up to get some attention.
>
>
> On Wed, Jan 24, 2018 at 3:36 PM, Satish Duggana 
> wrote:
>
> >  +1, thanks for the KIP.
> >
> > ~Satish.
> >
> > On Wed, Jan 24, 2018 at 5:09 AM, Jun Rao  wrote:
> >
> > > Hi, Mani,
> > >
> > > Thanks for the KIP. +1
> > >
> > > Jun
> > >
> > > On Sun, Jan 21, 2018 at 7:44 AM, Manikumar 
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I would like to start a vote on KIP-249 which would add delegation
> > token
> > > > operations
> > > > to Java Admin Client.
> > > >
> > > > We have merged DelegationToken API PR recently. We want to include
> > admin
> > > > client changes in the upcoming release. This will make the feature
> > > > complete.
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 249%3A+Add+Delegation+Token+Operations+to+KafkaAdminClient
> > > >
> > > > Thanks,
> > > >
> > >
> >
>



-- 
*Gwen Shapira*
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter  | blog



Re: [DISCUSS] KIP-273 Kafka to support using ETCD beside Zookeeper

2018-03-29 Thread Gwen Shapira
Few other concerns that were raised in the previous discussion were around
the challenges both to maintainers and users in making this API pluggable
and how does making the interface pluggable aligns with future goals for
the project. At the time this was difficult to discuss because there wasn't
a concrete proposal. I want to discuss these points in the context of this
specific proposal:

1. Problem: Pluggable APIs mean larger surface testing area and multiple
implementations to cover.
In this case: At the time, the Kafka project didn't have much
experience with pluggable APIs and components, so the concerns were very
valid. Right now Kafka has multiple pluggable components - Connectors,
converters, transformations, authentication protocols, authorization
database, coordination protocol, serializers, etc. I think that as a
community we've gotten better at testing the interface, testing the very few
implementations that are included in Apache Kafka itself, and allowing the
community to innovate and validate outside of the Kafka project. I don't
recall major issues either from lack of testing or from a usability
perspective.

2. Problem: Users don't want to choose a consensus implementation, they
just don't want ZK.
In this case: I agree that users don't actually want to spend time
choosing consensus implementation and a simpler deployment model would
serve them better. IMO, if Apache Kafka ships with our well-tested ZK
implementation, 99% of the users will choose to use that (a vast majority
uses our less-than-amazing authorization plugin), and the few that really
need something else for whatever reason, will be able to get what they
need. As Jake said, we need to face the fact that development trajectory of
ZK isn't amazing at the moment, that it is lacking features our users need
(SSL) and it will be good to allow the user community to explore
alternatives.

3. Problem: Why go to the effort of refactoring if we know we want to get
rid of ZK.
In this case: This change isn't huge, it doesn't rewrite large portions
of Kafka and it does not make the future direction any more difficult. If
in 2 weeks or 2 months or 2 years we'll have a ZK-less solution, applying it
on Kafka with this KIP isn't any more challenging than applying it to Kafka
without this KIP. It is a step in an orthogonal direction, but not opposite
direction. I think that letting the perfect become the enemy of the good is
a repeated failure mode in this community. Can we discuss whether this
proposal is good even if there is a more complex proposal that may be
better? As long as they don't conflict?

Gwen

On Thu, Mar 29, 2018 at 8:31 AM, Molnár Bálint 
wrote:

> Thanks, for the feedback.
>
> Developing an internal consensus service inside Kafka would require a team
> dedicated to this task.
> We second what Flavio said in
> https://lists.apache.org/thread.html/24ae56e073104c4531cf64f7a1f1c0
> a84f895d139d334c88e9fe6028@1449008733@%3Cdev.kafka.apache.org%3E
> that
> getting an implementation which really works and is maintainable is
> difficult.
> We think that Kafka being able to use another widely used consensus system
> beside Zookeeper is a safer and more workable solution.
> It will be easier for users to use a consensus system with Kafka that they
> are already familiar with.
>
>
> The implementation found here:
> https://github.com/banzaicloud/apache-kafka-on-
> k8s/tree/kafka-on-etcd/core/src/main/scala/kafka/etcd
> is a first version of enabling Etcd in Kafka.
> This implementation hooks Etcd in with a slight change to the existing
> interfaces. While it works, it's far from complete.
> Ideally existing interfaces should be reworked to abstract out the used
> consensus system.
> We opened this KIP to start a discussion, let the community have a look
> at the initial implementation, and receive feedback on whether this
> initiative is viable.
>
> Balint
>
> 2018-03-29 11:23 GMT+02:00 Jakub Scholz :
>
> > I can understand the concerns about the plugability of Zookeeper/Etcd. It
> > would not be good for Kafka community if it splits into several groups
> > using different implementations.
> >
> > On the other hand, Zookeeper development seems to be a bit stalled. So
> > maybe there should be at least a discussion whether it makes sense to
> > replace Zookeeper with something like Etcd or not.
> >
> > JAkub
> >
> > On Wed, Mar 28, 2018 at 6:18 PM, Molnár Bálint 
> > wrote:
> >
> > > Hi all,
> > >
> > > I have created KIP-273: Kafka to support using ETCD beside Zookeeper
> > >
> > > Here is the link to the KIP:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 273+-+Kafka+to+support+using+ETCD+beside+Zookeeper
> > >
> > > Looking forward to the discussion.
> > >
> > > Thanks,
> > > Balint
> > >
> >
>



-- 
*Gwen Shapira*
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter  | blog

Re: [ANNOUNCE] Apache Kafka 1.1.0 Released

2018-03-29 Thread James Cheng
Thanks Damian and Rajini for running the release! Congrats and good job 
everyone!

-James

Sent from my iPhone

> On Mar 29, 2018, at 2:27 AM, Rajini Sivaram  wrote:
> 
> The Apache Kafka community is pleased to announce the release for
> 
> Apache Kafka 1.1.0.
> 
> 
> Kafka 1.1.0 includes a number of significant new features.
> 
> Here is a summary of some notable changes:
> 
> 
> ** Kafka 1.1.0 includes significant improvements to the Kafka Controller
> 
>   that speed up controlled shutdown. ZooKeeper session expiration edge
> cases
> 
>   have also been fixed as part of this effort.
> 
> 
> ** Controller improvements also enable more partitions to be supported on a
> 
>   single cluster. KIP-227 introduced incremental fetch requests, providing
> 
>   more efficient replication when the number of partitions is large.
> 
> 
> ** KIP-113 added support for replica movement between log directories to
> 
>   enable data balancing with JBOD.
> 
> 
> ** Some of the broker configuration options like SSL keystores can now be
> 
>   updated dynamically without restarting the broker. See KIP-226 for
> details
> 
>   and the full list of dynamic configs.
> 
> 
> ** Delegation token based authentication (KIP-48) has been added to Kafka
> 
>   brokers to support large number of clients without overloading Kerberos
> 
>   KDCs or other authentication servers.
> 
> 
> ** Several new features have been added to Kafka Connect, including header
> 
>   support (KIP-145), SSL and Kafka cluster identifiers in the Connect REST
> 
>   interface (KIP-208 and KIP-238), validation of connector names (KIP-212)
> 
>   and support for topic regex in sink connectors (KIP-215). Additionally,
> 
>   the default maximum heap size for Connect workers was increased to 2GB.
> 
> 
> ** Several improvements have been added to the Kafka Streams API, including
> 
>   reducing repartition topic partitions footprint, customizable error
> 
>   handling for produce failures and enhanced resilience to broker
> 
>   unavailability.  See KIPs 205, 210, 220, 224 and 239 for details.
> 
> 
> All of the changes in this release can be found in the release notes:
> 
> 
> 
> https://dist.apache.org/repos/dist/release/kafka/1.1.0/RELEASE_NOTES.html
> 
> 
> 
> 
> You can download the source release from:
> 
> 
> 
> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/kafka-1.1.0-src.tgz
> 
> 
> 
> and binary releases from:
> 
> 
> 
> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/kafka_2.11-1.1.0.tgz
> 
> (Scala 2.11)
> 
> 
> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/kafka_2.12-1.1.0.tgz
> 
> (Scala 2.12)
> 
> 
> --
> 
> 
> 
> Apache Kafka is a distributed streaming platform with four core APIs:
> 
> 
> 
> ** The Producer API allows an application to publish a stream of records to
> 
> one or more Kafka topics.
> 
> 
> 
> ** The Consumer API allows an application to subscribe to one or more
> 
> topics and process the stream of records produced to them.
> 
> 
> 
> ** The Streams API allows an application to act as a stream processor,
> 
> consuming an input stream from one or more topics and producing an output
> 
> stream to one or more output topics, effectively transforming the input
> 
> streams to output streams.
> 
> 
> 
> ** The Connector API allows building and running reusable producers or
> 
> consumers that connect Kafka topics to existing applications or data
> 
> systems. For example, a connector to a relational database might capture
> 
> every change to a table.
> 
> 
> 
> 
> With these APIs, Kafka can be used for two broad classes of application:
> 
> ** Building real-time streaming data pipelines that reliably get data
> 
> between systems or applications.
> 
> 
> 
> ** Building real-time streaming applications that transform or react to the
> 
> streams of data.
> 
> 
> 
> 
> Apache Kafka is in use at large and small companies worldwide, including
> 
> Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
> 
> Target, The New York Times, Uber, Yelp, and Zalando, among others.
> 
> 
> 
> 
> A big thank you for the following 120 contributors to this release!
> 
> 
> Adem Efe Gencer, Alex Good, Andras Beni, Andy Bryant, Antony Stubbs,
> 
> Apurva Mehta, Arjun Satish, bartdevylder, Bill Bejeck, Charly Molter,
> 
> Chris Egerton, Clemens Valiente, cmolter, Colin P. Mccabe,
> 
> Colin Patrick McCabe, ConcurrencyPractitioner, Damian Guy, dan norwood,
> 
> Daniel Wojda, Derrick Or, Dmitry Minkovsky, Dong Lin, Edoardo Comar,
> 
> ekenny, Elyahou, Eugene Sevastyanov, Ewen Cheslack-Postava, Filipe Agapito,
> 
> fredfp, Gavrie Philipson, Gunnar Morling, Guozhang Wang, hmcl, Hugo Louro,
> 
> huxi, huxihx, Igor Kostiakov, Ismael Juma, Ivan Babrou, Jacek Laskowski,
> 
> Jakub Scholz, Jason Gustafson, Jeff Klukas, Jeff Widman, Jeremy
> Custenborder,
> 
> Jeyhun Karimov, Jiangjie (Becket) Qin, 

Re: [ANNOUNCE] Apache Kafka 1.1.0 Released

2018-03-29 Thread Ismael Juma
Thanks to Damian and Rajini for running the release and thanks to everyone
who helped make it happen!

Ismael

On Thu, Mar 29, 2018 at 2:27 AM, Rajini Sivaram  wrote:

> The Apache Kafka community is pleased to announce the release for
>
> Apache Kafka 1.1.0.
>
>
> Kafka 1.1.0 includes a number of significant new features.
>
> Here is a summary of some notable changes:
>
>
> ** Kafka 1.1.0 includes significant improvements to the Kafka Controller
>
>that speed up controlled shutdown. ZooKeeper session expiration edge
> cases
>
>have also been fixed as part of this effort.
>
>
> ** Controller improvements also enable more partitions to be supported on a
>
>single cluster. KIP-227 introduced incremental fetch requests, providing
>
>more efficient replication when the number of partitions is large.
>
>
> ** KIP-113 added support for replica movement between log directories to
>
>enable data balancing with JBOD.
>
>
> ** Some of the broker configuration options like SSL keystores can now be
>
>updated dynamically without restarting the broker. See KIP-226 for
> details
>
>and the full list of dynamic configs.
>
>
> ** Delegation token based authentication (KIP-48) has been added to Kafka
>
>brokers to support large number of clients without overloading Kerberos
>
>KDCs or other authentication servers.
>
>
> ** Several new features have been added to Kafka Connect, including header
>
>support (KIP-145), SSL and Kafka cluster identifiers in the Connect REST
>
>interface (KIP-208 and KIP-238), validation of connector names (KIP-212)
>
>and support for topic regex in sink connectors (KIP-215). Additionally,
>
>the default maximum heap size for Connect workers was increased to 2GB.
>
>
> ** Several improvements have been added to the Kafka Streams API, including
>
>reducing repartition topic partitions footprint, customizable error
>
>handling for produce failures and enhanced resilience to broker
>
>unavailability.  See KIPs 205, 210, 220, 224 and 239 for details.
>
>
> All of the changes in this release can be found in the release notes:
>
>
>
> https://dist.apache.org/repos/dist/release/kafka/1.1.0/RELEASE_NOTES.html
>
>
>
>
> You can download the source release from:
>
>
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/
> kafka-1.1.0-src.tgz
>
>
>
> and binary releases from:
>
>
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/
> kafka_2.11-1.1.0.tgz
>
> (Scala 2.11)
>
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/
> kafka_2.12-1.1.0.tgz
>
> (Scala 2.12)
>
>
> 
> --
>
>
>
> Apache Kafka is a distributed streaming platform with four core APIs:
>
>
>
> ** The Producer API allows an application to publish a stream of records to
>
> one or more Kafka topics.
>
>
>
> ** The Consumer API allows an application to subscribe to one or more
>
> topics and process the stream of records produced to them.
>
>
>
> ** The Streams API allows an application to act as a stream processor,
>
> consuming an input stream from one or more topics and producing an output
>
> stream to one or more output topics, effectively transforming the input
>
> streams to output streams.
>
>
>
> ** The Connector API allows building and running reusable producers or
>
> consumers that connect Kafka topics to existing applications or data
>
> systems. For example, a connector to a relational database might capture
>
> every change to a table.
>
>
>
>
> With these APIs, Kafka can be used for two broad classes of application:
>
> ** Building real-time streaming data pipelines that reliably get data
>
> between systems or applications.
>
>
>
> ** Building real-time streaming applications that transform or react to the
>
> streams of data.
>
>
>
>
> Apache Kafka is in use at large and small companies worldwide, including
>
> Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
>
> Target, The New York Times, Uber, Yelp, and Zalando, among others.
>
>
>
>
> A big thank you for the following 120 contributors to this release!
>
>
> Adem Efe Gencer, Alex Good, Andras Beni, Andy Bryant, Antony Stubbs,
>
> Apurva Mehta, Arjun Satish, bartdevylder, Bill Bejeck, Charly Molter,
>
> Chris Egerton, Clemens Valiente, cmolter, Colin P. Mccabe,
>
> Colin Patrick McCabe, ConcurrencyPractitioner, Damian Guy, dan norwood,
>
> Daniel Wojda, Derrick Or, Dmitry Minkovsky, Dong Lin, Edoardo Comar,
>
> ekenny, Elyahou, Eugene Sevastyanov, Ewen Cheslack-Postava, Filipe Agapito,
>
> fredfp, Gavrie Philipson, Gunnar Morling, Guozhang Wang, hmcl, Hugo Louro,
>
> huxi, huxihx, Igor Kostiakov, Ismael Juma, Ivan Babrou, Jacek Laskowski,
>
> Jakub Scholz, Jason Gustafson, Jeff Klukas, Jeff Widman, Jeremy
> Custenborder,
>
> Jeyhun Karimov, Jiangjie (Becket) Qin, Jiangjie Qin, Jimin Hsieh, Joel
> Hamill,
>
> John Roesler, Jorge Quilcate Otoya, Jun 

Re: [DISCUSS] KIP-273 Kafka to support using ETCD beside Zookeeper

2018-03-29 Thread Molnár Bálint
Thanks, for the feedback.

Developing an internal consensus service inside Kafka would require a team
dedicated to this task.
We second what Flavio said in
https://lists.apache.org/thread.html/24ae56e073104c4531cf64f7a1f1c0a84f895d139d334c88e9fe6028@1449008733@%3Cdev.kafka.apache.org%3E
that
getting an implementation which really works and is maintainable is
difficult.
We think that Kafka being able to use another widely used consensus system
beside Zookeeper is a safer and more workable solution.
It will be easier for users to use a consensus system with Kafka that they
are already familiar with.


The implementation found here:
https://github.com/banzaicloud/apache-kafka-on-k8s/tree/kafka-on-etcd/core/src/main/scala/kafka/etcd
is a first version of enabling Etcd in Kafka.
This implementation hooks Etcd in with a slight change to the existing
interfaces. While it works, it's far from complete.
Ideally existing interfaces should be reworked to abstract out the used
consensus system.
We opened this KIP to start a discussion, let the community have a look
at the initial implementation, and receive feedback on whether this
initiative is viable.

Balint

2018-03-29 11:23 GMT+02:00 Jakub Scholz :

> I can understand the concerns about the plugability of Zookeeper/Etcd. It
> would not be good for Kafka community if it splits into several groups
> using different implementations.
>
> On the other hand, Zookeeper development seems to be a bit stalled. So
> maybe there should be at least a discussion whether it makes sense to
> replace Zookeeper with something like Etcd or not.
>
> JAkub
>
> On Wed, Mar 28, 2018 at 6:18 PM, Molnár Bálint 
> wrote:
>
> > Hi all,
> >
> > I have created KIP-273: Kafka to support using ETCD beside Zookeeper
> >
> > Here is the link to the KIP:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 273+-+Kafka+to+support+using+ETCD+beside+Zookeeper
> >
> > Looking forward to the discussion.
> >
> > Thanks,
> > Balint
> >
>


[DISCUSS] KIP-277 - Fine Grained ACL for CreateTopics API

2018-03-29 Thread Edoardo Comar
Hi all,

We have submitted KIP-277 to give users permission to manage the lifecycle 
of a defined set of topics. Currently the CreateTopics ACL check is for 
permission to create *any* topic, whereas DeleteTopics already checks 
permission against the *named* topics.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-277+-+Fine+Grained+ACL+for+CreateTopics+API
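
For illustration, this is roughly what the asymmetry looks like today with the 
stock `kafka-acls.sh` tool (the ZooKeeper address, principal, and topic names 
below are placeholders):

```shell
# Create is only checkable at the cluster level today -- "alice" may create ANY topic:
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 \
  --add --allow-principal User:alice --operation Create --cluster

# Delete, by contrast, is checked against the named topic:
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 \
  --add --allow-principal User:alice --operation Delete --topic payments
```

The proposal is to also check Create against the named topic resource, closing 
that gap.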

Feedback and suggestions are welcome, thanks.

Edo & Mickael
--

Edoardo Comar

IBM Message Hub

IBM UK Ltd, Hursley Park, SO21 2JN
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


Re: Re: [ANNOUNCE] New Committer: Dong Lin

2018-03-29 Thread Viktor Somogyi
Congrats Dong! :)

On Thu, Mar 29, 2018 at 2:12 PM, Satish Duggana 
wrote:

> Congratulations Dong!
>
>
> On Thu, Mar 29, 2018 at 5:12 PM, Sandor Murakozi 
> wrote:
>
> > Congrats, Dong!
> >
> >
> > On Thu, Mar 29, 2018 at 2:15 AM, Dong Lin  wrote:
> >
> > > Thanks everyone!!
> > >
> > > It is my great pleasure to be part of the Apache Kafka community and
> help
> > > make Apache Kafka more useful to its users. I am super excited to be a
> > > Kafka committer and I am hoping to contribute more to its design,
> > > implementation and review etc in the future.
> > >
> > > Thanks!
> > > Dong
> > >
> > > On Wed, Mar 28, 2018 at 4:04 PM, Hu Xi  wrote:
> > >
> > > > Congrats, Dong Lin!
> > > >
> > > >
> > > > 
> > > > From: Matthias J. Sax 
> > > > Sent: March 29, 2018 6:37
> > > > To: us...@kafka.apache.org; dev@kafka.apache.org
> > > > Subject: Re: [ANNOUNCE] New Committer: Dong Lin
> > > >
> > > > Congrats!
> > > >
> > > > On 3/28/18 1:16 PM, James Cheng wrote:
> > > > > Congrats, Dong!
> > > > >
> > > > > -James
> > > > >
> > > > >> On Mar 28, 2018, at 10:58 AM, Becket Qin 
> > > wrote:
> > > > >>
> > > > >> Hello everyone,
> > > > >>
> > > > >> The PMC of Apache Kafka is pleased to announce that Dong Lin has
> > > > accepted
> > > > >> our invitation to be a new Kafka committer.
> > > > >>
> > > > >> Dong started working on Kafka about four years ago, since which he
> > has
> > > > >> contributed numerous features and patches. His work on Kafka core
> > has
> > > > been
> > > > >> consistent and important. Among his contributions, most
> noticeably,
> > > Dong
> > > > >> developed JBOD (KIP-112, KIP-113) to handle disk failures and to
> > > reduce
> > > > >> overall cost, added deleteDataBefore() API (KIP-107) to allow users
> > > > >> to actively remove old messages. Dong has also been active in the
> > > > community,
> > > > >> participating in KIP discussions and doing code reviews.
> > > > >>
> > > > >> Congratulations and looking forward to your future contribution,
> > Dong!
> > > > >>
> > > > >> Jiangjie (Becket) Qin, on behalf of Apache Kafka PMC
> > > > >
> > > >
> > > >
> > >
> >
>


Re: [VOTE] KIP-249: Add Delegation Token Operations to Kafka Admin Client

2018-03-29 Thread Viktor Somogyi
+1 (non-binding)
Thanks for the KIP, Manikumar

On Thu, Mar 29, 2018 at 12:05 PM, Manikumar 
wrote:

> I'm bumping this up to get some attention.
>
>
> On Wed, Jan 24, 2018 at 3:36 PM, Satish Duggana 
> wrote:
>
> >  +1, thanks for the KIP.
> >
> > ~Satish.
> >
> > On Wed, Jan 24, 2018 at 5:09 AM, Jun Rao  wrote:
> >
> > > Hi, Mani,
> > >
> > > Thanks for the KIP. +1
> > >
> > > Jun
> > >
> > > On Sun, Jan 21, 2018 at 7:44 AM, Manikumar 
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I would like to start a vote on KIP-249 which would add delegation
> > token
> > > > operations
> > > > to Java Admin Client.
> > > >
> > > > We have merged DelegationToken API PR recently. We want to include
> > admin
> > > > client changes in the upcoming release. This will make the feature
> > > > complete.
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 249%3A+Add+Delegation+Token+Operations+to+KafkaAdminClient
> > > >
> > > > Thanks,
> > > >
> > >
> >
>


[jira] [Created] (KAFKA-6727) org.apache.kafka.clients.admin.Config has broken equals and hashCode method.

2018-03-29 Thread Andy Coates (JIRA)
Andy Coates created KAFKA-6727:
--

 Summary: org.apache.kafka.clients.admin.Config has broken equals 
and hashCode method.
 Key: KAFKA-6727
 URL: https://issues.apache.org/jira/browse/KAFKA-6727
 Project: Kafka
  Issue Type: Improvement
  Components: clients, tools
Affects Versions: 1.1.0
Reporter: Andy Coates
Assignee: Andy Coates


`Config` makes use of `Collections.unmodifiableCollection` to wrap the supplied 
entries to make it immutable. Unfortunately, this breaks `hashCode` and 
`equals`.

From the Java docs:

> The returned collection does _not_ pass the hashCode and equals operations 
>through to the backing collection, but relies on {{Object}}'s {{equals}} and 
>{{hashCode}} methods.

See: 
https://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#unmodifiableCollection(java.util.Collection)
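
A minimal demonstration of the JDK behavior in question (the class and 
variable names here are ours, for illustration only):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class UnmodifiableEqualsDemo {
    public static void main(String[] args) {
        List<String> a = Arrays.asList("x", "y");
        List<String> b = Arrays.asList("x", "y");
        System.out.println(a.equals(b)); // true: lists compare element-wise

        // The unmodifiableCollection wrapper does NOT delegate equals/hashCode,
        // so two wrappers around equal lists compare by Object identity:
        Collection<String> ua = Collections.unmodifiableCollection(a);
        Collection<String> ub = Collections.unmodifiableCollection(b);
        System.out.println(ua.equals(ub)); // false
        System.out.println(ua.equals(ua)); // true, only because same instance
    }
}
```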

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] KIP-272: Add API version tag to broker's RequestsPerSec metric

2018-03-29 Thread Manikumar
+1 (non-binding)

On Thu, Mar 29, 2018 at 6:16 PM, Thomas Crayford 
wrote:

> +1 (non-binding)
>
> On Wed, Mar 28, 2018 at 9:15 PM, Ted Yu  wrote:
>
> > +1
> >
> > On Wed, Mar 28, 2018 at 12:05 PM, Mickael Maison <
> mickael.mai...@gmail.com
> > >
> > wrote:
> >
> > > +1 (non binding)
> > > Thanks for the KIP
> > >
> > > On Wed, Mar 28, 2018 at 6:25 PM, Gwen Shapira 
> wrote:
> > > > +1 (binding)
> > > >
> > > > On Wed, Mar 28, 2018 at 9:55 AM, Allen Wang 
> > > wrote:
> > > >
> > > >> Hi All,
> > > >>
> > > >> I would like to start voting for KIP-272:  Add API version tag to
> > > broker's
> > > >> RequestsPerSec metric.
> > > >>
> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > >> 272%3A+Add+API+version+tag+to+broker%27s+RequestsPerSec+metric
> > > >>
> > > >> Thanks,
> > > >> Allen
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > *Gwen Shapira*
> > > > Product Manager | Confluent
> > > > 650.450.2760 | @gwenshap
> > > > Follow us: Twitter  | blog
> > > > 
> > >
> >
>


Re: [VOTE] KIP-272: Add API version tag to broker's RequestsPerSec metric

2018-03-29 Thread Thomas Crayford
+1 (non-binding)

On Wed, Mar 28, 2018 at 9:15 PM, Ted Yu  wrote:

> +1
>
> On Wed, Mar 28, 2018 at 12:05 PM, Mickael Maison  >
> wrote:
>
> > +1 (non binding)
> > Thanks for the KIP
> >
> > On Wed, Mar 28, 2018 at 6:25 PM, Gwen Shapira  wrote:
> > > +1 (binding)
> > >
> > > On Wed, Mar 28, 2018 at 9:55 AM, Allen Wang 
> > wrote:
> > >
> > >> Hi All,
> > >>
> > >> I would like to start voting for KIP-272:  Add API version tag to
> > broker's
> > >> RequestsPerSec metric.
> > >>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> 272%3A+Add+API+version+tag+to+broker%27s+RequestsPerSec+metric
> > >>
> > >> Thanks,
> > >> Allen
> > >>
> > >
> > >
> > >
> > > --
> > > *Gwen Shapira*
> > > Product Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter  | blog
> > > 
> >
>


[jira] [Created] (KAFKA-6726) KIP-277 - Fine Grained ACL for CreateTopics A

2018-03-29 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-6726:


 Summary: KIP-277 - Fine Grained ACL for CreateTopics A
 Key: KAFKA-6726
 URL: https://issues.apache.org/jira/browse/KAFKA-6726
 Project: Kafka
  Issue Type: Improvement
  Components: core, tools
Reporter: Edoardo Comar
Assignee: Edoardo Comar


issue to track implementation of 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-277+-+Fine+Grained+ACL+for+CreateTopics+API



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-253: Support in-order message delivery with partition expansion

2018-03-29 Thread Jan Filipiak

Hi Dong.

I do disagree here. I think we are fooling ourselves!

This KIP lays the foundation for the wrong path and
I am heavily against it. We should have stateful producers in mind at this
very moment.

One can clearly see that you are on a wrong path when you talk about taking
away the custom partitioner for this use case!

This is so wrong!

Please reiterate my idea, try to understand it, or ask me what is not clear,
but don't continue with this assumption.

best Jan



On 28.03.2018 08:34, Dong Lin wrote:

Hey John,

Great! Thanks for all the comment. It seems that we agree that the current
KIP is in good shape for core Kafka. IMO, what we have been discussing in
the recent email exchanges is mostly about the second step, i.e. how to
address problem for the stream use-case (or stateful processing in general).

I will comment inline.




On Tue, Mar 27, 2018 at 4:38 PM, John Roesler  wrote:


Thanks for the response, Dong.

Here are my answers to your questions:

- "Asking producers and consumers, or even two different producers, to
share code like the partition function is a pretty huge ask. What if they
are using different languages?". It seems that today we already require
different producers to use the same hash function -- otherwise messages
with the same key will go to different partitions of the same topic, which
may cause problems for downstream consumption. So I am not sure it adds any
more constraint to assume consumers know the producer's hash function.
Could you explain more why a user would want to use a custom partition
function? Maybe we can check if this is something that can be supported in
the default Kafka hash function. Also, can you explain more why it is
difficult to implement the same hash function in different languages?


Sorry, I meant two different producers as in producers to two different
topics. This was in response to the suggestion that we already require
coordination among producers to different topics in order to achieve
co-partitioning. I was saying that we do not (and should not).


It is probably common for producers from different teams to produce messages
to the same topic. In order to ensure that messages with the same key go to
the same partition, we need producers from different teams to share the same
partition algorithm, which by definition requires coordination among
producers of different teams in an organization. Even for producers to
different topics, it may be common to require the same partition algorithm in
order to join two topics for stream processing. Does this make it reasonable
to say we already require coordination across producers?
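The coordination point above can be made concrete with a small sketch. This is not Kafka's actual partitioner — the real DefaultPartitioner applies murmur2 to the serialized key bytes — so the class name and the simple polynomial hash below are illustrative assumptions; the point stands either way: every producer to the topic must use this exact function, byte for byte, or records with the same key land on different partitions.

```java
import java.nio.charset.StandardCharsets;

public class KeyPartitioner {

    // Illustrative stand-in for a producer's partition function. Kafka's real
    // DefaultPartitioner hashes the serialized key bytes with murmur2; a
    // simple polynomial hash is used here only for demonstration.
    static int partitionFor(String key, int numPartitions) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = 0;
        for (byte b : bytes) {
            hash = 31 * hash + b;
        }
        // Clear the sign bit before taking the modulus so the result is non-negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always maps to the same partition for a fixed count...
        System.out.println(partitionFor("user-42", 6));
        System.out.println(partitionFor("user-42", 6));
        // ...but changing numPartitions generally moves keys, which is why
        // partition expansion breaks per-key ordering without a backfill.
        System.out.println(partitionFor("user-42", 7));
    }
}
```

Porting this to another language just means reproducing the same byte-level hash against a written spec; the difficulty raised above is that an arbitrary custom partition function has no such spec to port.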



By design, consumers are currently ignorant of the partitioning scheme. It
suffices to trust that the producer has partitioned the topic by key, if
they claim to have done so. If you don't trust that, or even if you just
need some other partitioning scheme, then you must re-partition it
yourself. Nothing we're discussing can or should change that. The value of
backfill is that it preserves the ability for consumers to avoid
re-partitioning before consuming, in the case where they don't need to
today.



Regarding shared "hash functions", note that it's a bit inaccurate to talk
about the "hash function" of the producer. Properly speaking, the producer
has only a "partition function". We do not know that it is a hash. The
producer can use any method at their disposal to assign a partition to a
record. The partition function obviously may be written in any programming
language, so in general it's not something that can be shared around
without a formal spec or the ability to execute arbitrary executables in
arbitrary runtime environments.


Yeah it is probably better to say partition algorithm. I guess it should
not be difficult to implement the same partition algorithm in different
languages, right? Yes we would need a formal specification of the default
partition algorithm in the producer. I think that can be documented as part
of the producer interface.



Why would a producer want a custom partition function? I don't know... why
did we design the interface so that our users can provide one? In general,
such systems provide custom partitioners because some data sets may be
unbalanced under the default or because they can provide some interesting
functionality built on top of the partitioning scheme, etc. Having provided
this ability, I don't know why we would remove it.


Yeah it is reasonable to assume that there was a reason to support a custom
partition function in the producer. On the other hand, it may also be
reasonable to revisit this interface and discuss whether we actually need to
support a custom partition function. If we don't have a good reason, we can
choose not to support it in this KIP in a backward-compatible manner, i.e.
users can still use a custom partition function but they would not get the
benefit of in-order delivery when there is partition
expansion. What 

Re: [VOTE] #2 KIP-248: Create New ConfigCommand That Uses The New AdminClient

2018-03-29 Thread zhenya Sun


+1 (non-binding)


zhenya Sun
Email: toke...@126.com

On 03/29/2018 19:40, Sandor Murakozi wrote:
+1 (non-binding)

Thanks for the KIP, Viktor

On Wed, Mar 21, 2018 at 5:41 PM, Viktor Somogyi 
wrote:

> Hi Everyone,
>
> I've started a vote on KIP-248
>  ConfigCommand+That+Uses+The+New+AdminClient#KIP-248-
> CreateNewConfigCommandThatUsesTheNewAdminClient-DescribeQuotas>
> a few weeks ago but at the time I got a couple more comments and it was
> very close to 1.1 feature freeze, people were occupied with that, so I
> wanted to restart the vote on this.
>
>
> *Summary of the KIP*
> For those who don't have context I thought I'd summarize it in a few
> sentences.
> *Problem & Motivation: *The basic problem that the KIP tries to solve is
> that kafka-configs.sh (which in turn uses the ConfigCommand class) uses a
> direct zookeeper connection. This is not desirable as getting around the
> broker opens up security issues and prevents the tool from being used in
> deployments where only the brokers are exposed to clients. Also a somewhat
> smaller motivation is to rewrite the tool in java as part of the tools
> component so we can get rid of requiring the core module on the classpath
> for the kafka-configs tool.
> *Solution:*
> - I've designed two new protocols: DescribeQuotas and AlterQuotas.
> - Also redesigned the output format of the command line tool so it provides
> a nicer result.
> - kafka-configs.[sh/bat] will use a new java based ConfigCommand that is
> placed in tools.
>
>
> I'd be happy to receive any votes or feedback on this.
>
> Regards,
> Viktor
>


Re: [VOTE] #2 KIP-248: Create New ConfigCommand That Uses The New AdminClient

2018-03-29 Thread Sandor Murakozi
+1 (non-binding)

Thanks for the KIP, Viktor



Re: [ANNOUNCE] Apache Kafka 1.1.0 Released

2018-03-29 Thread Edoardo Comar
Great to hear! thanks for driving the release process.
--

Edoardo Comar

IBM Message Hub

IBM UK Ltd, Hursley Park, SO21 2JN



From:   Mickael Maison 
To: Users 
Cc: kafka-clients , dev 

Date:   29/03/2018 10:46
Subject: Re: [ANNOUNCE] Apache Kafka 1.1.0 Released



Great news, thanks Damian and Rajini for running this release!

On Thu, Mar 29, 2018 at 10:33 AM, Rajini Sivaram
 wrote:
> Resending to kafka-clients group:
>
> -- Forwarded message --
> From: Rajini Sivaram 
> Date: Thu, Mar 29, 2018 at 10:27 AM
> Subject: [ANNOUNCE] Apache Kafka 1.1.0 Released
> To: annou...@apache.org, Users , dev <
> dev@kafka.apache.org>, kafka-clients 
>
>
> The Apache Kafka community is pleased to announce the release for
>
> Apache Kafka 1.1.0.
>
>
> Kafka 1.1.0 includes a number of significant new features.
>
> Here is a summary of some notable changes:
>
>
> ** Kafka 1.1.0 includes significant improvements to the Kafka Controller
>
>that speed up controlled shutdown. ZooKeeper session expiration edge
> cases
>
>have also been fixed as part of this effort.
>
>
> ** Controller improvements also enable more partitions to be supported 
on a
>
>single cluster. KIP-227 introduced incremental fetch requests, 
providing
>
>more efficient replication when the number of partitions is large.
>
>
> ** KIP-113 added support for replica movement between log directories to
>
>enable data balancing with JBOD.
>
>
> ** Some of the broker configuration options like SSL keystores can now 
be
>
>updated dynamically without restarting the broker. See KIP-226 for
> details
>
>and the full list of dynamic configs.
>
>
> ** Delegation token based authentication (KIP-48) has been added to 
Kafka
>
>brokers to support large number of clients without overloading 
Kerberos
>
>KDCs or other authentication servers.
>
>
> ** Several new features have been added to Kafka Connect, including 
header
>
>support (KIP-145), SSL and Kafka cluster identifiers in the Connect 
REST
>
>interface (KIP-208 and KIP-238), validation of connector names 
(KIP-212)
>
>and support for topic regex in sink connectors (KIP-215). 
Additionally,
>
>the default maximum heap size for Connect workers was increased to 
2GB.
>
>
> ** Several improvements have been added to the Kafka Streams API, 
including
>
>reducing repartition topic partitions footprint, customizable error
>
>handling for produce failures and enhanced resilience to broker
>
>unavailability.  See KIPs 205, 210, 220, 224 and 239 for details.
>
>
> All of the changes in this release can be found in the release notes:
>
>
>
> https://dist.apache.org/repos/dist/release/kafka/1.1.0/RELEASE_NOTES.html
>
>
>
>
> You can download the source release from:
>
>
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/kafka-1.1.0-src.tgz
>
>
>
> and binary releases from:
>
>
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/kafka_2.11-1.1.0.tgz
>
> (Scala 2.11)
>
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/kafka_2.12-1.1.0.tgz
>
> (Scala 2.12)
>
>
> 
> --
>
>
>
> Apache Kafka is a distributed streaming platform with four core APIs:
>
>
>
> ** The Producer API allows an application to publish a stream of records to
>
> one or more Kafka topics.
>
>
>
> ** The Consumer API allows an application to subscribe to one or more
>
> topics and process the stream of records produced to them.
>
>
>
> ** The Streams API allows an application to act as a stream processor,
>
> consuming an input stream from one or more topics and producing an 
output
>
> stream to one or more output topics, effectively transforming the input
>
> streams to output streams.
>
>
>
> ** The Connector API allows building and running reusable producers or
>
> 

Re: [VOTE] KIP-249: Add Delegation Token Operations to Kafka Admin Client

2018-03-29 Thread Manikumar
I'm bumping this up to get some attention.


On Wed, Jan 24, 2018 at 3:36 PM, Satish Duggana 
wrote:

>  +1, thanks for the KIP.
>
> ~Satish.
>
> On Wed, Jan 24, 2018 at 5:09 AM, Jun Rao  wrote:
>
> > Hi, Mani,
> >
> > Thanks for the KIP. +1
> >
> > Jun
> >
> > On Sun, Jan 21, 2018 at 7:44 AM, Manikumar 
> > wrote:
> >
> > > Hi All,
> > >
> > > I would like to start a vote on KIP-249 which would add delegation
> token
> > > operations
> > > to Java Admin Client.
> > >
> > > We have merged DelegationToken API PR recently. We want to include
> admin
> > > client changes in the upcoming release. This will make the feature
> > > complete.
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 249%3A+Add+Delegation+Token+Operations+to+KafkaAdminClient
> > >
> > > Thanks,
> > >
> >
>


Re: [VOTE] KIP-257 - Configurable Quota Management

2018-03-29 Thread Rajini Sivaram
The vote has passed with three binding votes (Jun, Gwen, Rajini) and five
non-binding votes (Ted, Edo, Mickael, Viktor, Kaushik). Many thanks to all
of you for the feedback and votes.

I will update the KIP page and get the PR ready for review.

Many thanks,

Rajini


On Wed, Mar 28, 2018 at 6:42 PM, Gwen Shapira  wrote:

> +1
>
> On Thu, Mar 22, 2018 at 2:56 PM, Rajini Sivaram 
> wrote:
>
> > Hi all,
> >
> > I would like to start vote on KIP-257 to enable customisation of client
> > quota computation:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 257+-+Configurable+Quota+Management
> >
> > The KIP proposes to make quota management pluggable to enable group-based
> > and partition-based quotas for clients.
> >
> >
> > Thanks,
> >
> >
> > Rajini
> >
>
>
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter  | blog
> 
>


Re: [ANNOUNCE] Apache Kafka 1.1.0 Released

2018-03-29 Thread Manikumar
Thanks for running the release.

On Thu, Mar 29, 2018 at 3:21 PM, zhenya Sun  wrote:

> good !!
>
> > On March 29, 2018, at 5:45 PM, Mickael Maison  wrote:
> >
> > Great news, thanks Damian and Rajini for running this release!
> >

Re: [ANNOUNCE] Apache Kafka 1.1.0 Released

2018-03-29 Thread zhenya Sun
good !!

> On March 29, 2018, at 5:45 PM, Mickael Maison  wrote:
> 
> Great news, thanks Damian and Rajini for running this release!
> 

Re: [ANNOUNCE] Apache Kafka 1.1.0 Released

2018-03-29 Thread Mickael Maison
Great news, thanks Damian and Rajini for running this release!


Fwd: [ANNOUNCE] Apache Kafka 1.1.0 Released

2018-03-29 Thread Rajini Sivaram
Resending to kafka-clients group:

-- Forwarded message --
From: Rajini Sivaram 
Date: Thu, Mar 29, 2018 at 10:27 AM
Subject: [ANNOUNCE] Apache Kafka 1.1.0 Released
To: annou...@apache.org, Users , dev <
dev@kafka.apache.org>, kafka-clients 


The Apache Kafka community is pleased to announce the release for

Apache Kafka 1.1.0.


Kafka 1.1.0 includes a number of significant new features.

Here is a summary of some notable changes:


** Kafka 1.1.0 includes significant improvements to the Kafka Controller

   that speed up controlled shutdown. ZooKeeper session expiration edge
cases

   have also been fixed as part of this effort.


** Controller improvements also enable more partitions to be supported on a

   single cluster. KIP-227 introduced incremental fetch requests, providing

   more efficient replication when the number of partitions is large.


** KIP-113 added support for replica movement between log directories to

   enable data balancing with JBOD.


** Some of the broker configuration options like SSL keystores can now be

   updated dynamically without restarting the broker. See KIP-226 for
details

   and the full list of dynamic configs.


** Delegation token based authentication (KIP-48) has been added to Kafka

   brokers to support large numbers of clients without overloading Kerberos

   KDCs or other authentication servers.


** Several new features have been added to Kafka Connect, including header

   support (KIP-145), SSL and Kafka cluster identifiers in the Connect REST

   interface (KIP-208 and KIP-238), validation of connector names (KIP-212)

   and support for topic regex in sink connectors (KIP-215). Additionally,

   the default maximum heap size for Connect workers was increased to 2GB.


** Several improvements have been added to the Kafka Streams API, including

   reducing repartition topic partitions footprint, customizable error

   handling for produce failures and enhanced resilience to broker

   unavailability.  See KIPs 205, 210, 220, 224 and 239 for details.


All of the changes in this release can be found in the release notes:



https://dist.apache.org/repos/dist/release/kafka/1.1.0/RELEASE_NOTES.html




You can download the source release from:



https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/kafka-1.1.0-src.tgz



and binary releases from:



https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/kafka_2.11-1.1.0.tgz

(Scala 2.11)


https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/kafka_2.12-1.1.0.tgz

(Scala 2.12)



--



Apache Kafka is a distributed streaming platform with four core APIs:



** The Producer API allows an application to publish a stream of records to

one or more Kafka topics.



** The Consumer API allows an application to subscribe to one or more

topics and process the stream of records produced to them.



** The Streams API allows an application to act as a stream processor,

consuming an input stream from one or more topics and producing an output

stream to one or more output topics, effectively transforming the input

streams to output streams.



** The Connector API allows building and running reusable producers or

consumers that connect Kafka topics to existing applications or data

systems. For example, a connector to a relational database might capture

every change to a table.




With these APIs, Kafka can be used for two broad classes of application:

** Building real-time streaming data pipelines that reliably get data

between systems or applications.



** Building real-time streaming applications that transform or react to the

streams of data.




Apache Kafka is in use at large and small companies worldwide, including

Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,

Target, The New York Times, Uber, Yelp, and Zalando, among others.




A big thank you to the following 120 contributors to this release!


Adem Efe Gencer, Alex Good, Andras Beni, Andy Bryant, Antony Stubbs,

Apurva Mehta, Arjun Satish, bartdevylder, Bill Bejeck, Charly Molter,

Chris Egerton, Clemens Valiente, cmolter, Colin P. Mccabe,

Colin Patrick McCabe, ConcurrencyPractitioner, Damian Guy, dan norwood,

Daniel Wojda, Derrick Or, Dmitry Minkovsky, Dong Lin, Edoardo Comar,

ekenny, Elyahou, Eugene Sevastyanov, Ewen Cheslack-Postava, Filipe Agapito,

fredfp, Gavrie Philipson, Gunnar Morling, Guozhang Wang, hmcl, Hugo Louro,

huxi, huxihx, Igor Kostiakov, Ismael Juma, Ivan Babrou, Jacek Laskowski,

Jakub Scholz, Jason Gustafson, Jeff Klukas, Jeff Widman, Jeremy
Custenborder,

Jeyhun Karimov, Jiangjie (Becket) Qin, Jiangjie Qin, Jimin Hsieh, Joel
Hamill,

John Roesler, Jorge Quilcate Otoya, Jun Rao, Kamal C, Kamil Szymański,

Koen De Groote, Konstantine Karantasis, lisa2lisa, Logan Buckley,

Magnus Edenhill, Magnus 

[ANNOUNCE] Apache Kafka 1.1.0 Released

2018-03-29 Thread Rajini Sivaram
The Apache Kafka community is pleased to announce the release for

Apache Kafka 1.1.0.


Kafka 1.1.0 includes a number of significant new features.

Here is a summary of some notable changes:


** Kafka 1.1.0 includes significant improvements to the Kafka Controller

   that speed up controlled shutdown. ZooKeeper session expiration edge
cases

   have also been fixed as part of this effort.


** Controller improvements also enable more partitions to be supported on a

   single cluster. KIP-227 introduced incremental fetch requests, providing

   more efficient replication when the number of partitions is large.


** KIP-113 added support for replica movement between log directories to

   enable data balancing with JBOD.


** Some of the broker configuration options like SSL keystores can now be

   updated dynamically without restarting the broker. See KIP-226 for
details

   and the full list of dynamic configs.
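
For example, a dynamically updatable broker config such as an SSL keystore
could be changed on a running broker roughly like this (a sketch following
the KIP-226 tooling; the broker id, bootstrap address, and keystore path
below are placeholders, not values from this announcement):

```shell
# Update a dynamic broker config on broker 0 without a restart.
# localhost:9092 and the keystore path are placeholders; adjust
# them for your cluster before running.
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-name 0 \
  --alter --add-config ssl.keystore.location=/path/to/new/keystore.jks
```

The same tool with --describe lists the dynamic config values currently in
effect for a broker.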


** Delegation token based authentication (KIP-48) has been added to Kafka

   brokers to support large number of clients without overloading Kerberos

   KDCs or other authentication servers.


** Several new features have been added to Kafka Connect, including header

   support (KIP-145), SSL and Kafka cluster identifiers in the Connect REST

   interface (KIP-208 and KIP-238), validation of connector names (KIP-212)

   and support for topic regex in sink connectors (KIP-215). Additionally,

   the default maximum heap size for Connect workers was increased to 2GB.


** Several improvements have been added to the Kafka Streams API, including

   reducing repartition topic partitions footprint, customizable error

   handling for produce failures and enhanced resilience to broker

   unavailability.  See KIPs 205, 210, 220, 224 and 239 for details.


All of the changes in this release can be found in the release notes:



https://dist.apache.org/repos/dist/release/kafka/1.1.0/RELEASE_NOTES.html




You can download the source release from:



https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/kafka-1.1.0-src.tgz



and binary releases from:



https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/kafka_2.11-1.1.0.tgz

(Scala 2.11)


https://www.apache.org/dyn/closer.cgi?path=/kafka/1.1.0/kafka_2.12-1.1.0.tgz

(Scala 2.12)


--



Apache Kafka is a distributed streaming platform with four core APIs:



** The Producer API allows an application to publish a stream of records to

one or more Kafka topics.



** The Consumer API allows an application to subscribe to one or more

topics and process the stream of records produced to them.



** The Streams API allows an application to act as a stream processor,

consuming an input stream from one or more topics and producing an output

stream to one or more output topics, effectively transforming the input

streams to output streams.



** The Connector API allows building and running reusable producers or

consumers that connect Kafka topics to existing applications or data

systems. For example, a connector to a relational database might capture

every change to a table.




With these APIs, Kafka can be used for two broad classes of application:

** Building real-time streaming data pipelines that reliably get data

between systems or applications.



** Building real-time streaming applications that transform or react to the

streams of data.




Apache Kafka is in use at large and small companies worldwide, including

Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,

Target, The New York Times, Uber, Yelp, and Zalando, among others.




A big thank you to the following 120 contributors to this release!


Adem Efe Gencer, Alex Good, Andras Beni, Andy Bryant, Antony Stubbs,

Apurva Mehta, Arjun Satish, bartdevylder, Bill Bejeck, Charly Molter,

Chris Egerton, Clemens Valiente, cmolter, Colin P. Mccabe,

Colin Patrick McCabe, ConcurrencyPractitioner, Damian Guy, dan norwood,

Daniel Wojda, Derrick Or, Dmitry Minkovsky, Dong Lin, Edoardo Comar,

ekenny, Elyahou, Eugene Sevastyanov, Ewen Cheslack-Postava, Filipe Agapito,

fredfp, Gavrie Philipson, Gunnar Morling, Guozhang Wang, hmcl, Hugo Louro,

huxi, huxihx, Igor Kostiakov, Ismael Juma, Ivan Babrou, Jacek Laskowski,

Jakub Scholz, Jason Gustafson, Jeff Klukas, Jeff Widman, Jeremy
Custenborder,

Jeyhun Karimov, Jiangjie (Becket) Qin, Jiangjie Qin, Jimin Hsieh, Joel
Hamill,

John Roesler, Jorge Quilcate Otoya, Jun Rao, Kamal C, Kamil Szymański,

Koen De Groote, Konstantine Karantasis, lisa2lisa, Logan Buckley,

Magnus Edenhill, Magnus Reftel, Manikumar Reddy, Manikumar Reddy O,
manjuapu,

Manjula K, Mats Julian Olsen, Matt Farmer, Matthias J. Sax,

Matthias Wessendorf, Max Zheng, Maytee Chinavanichkit, Mickael Maison,
Mikkin,

mulvenna, Narendra kumar, Nick Chiu, Onur Karaman, Panuwat Anawatmongkhon,

Paolo Patierno, parafiend, ppatierno, Prasanna Gautam, Radai 

Re: [DISCUSS] KIP-273 Kafka to support using ETCD beside Zookeeper

2018-03-29 Thread Jakub Scholz
I can understand the concerns about the pluggability of Zookeeper/Etcd. It
would not be good for the Kafka community if it split into several groups
using different implementations.

On the other hand, Zookeeper development seems to be a bit stalled, so
maybe there should at least be a discussion about whether it makes sense to
replace Zookeeper with something like Etcd.

Jakub

On Wed, Mar 28, 2018 at 6:18 PM, Molnár Bálint 
wrote:

> Hi all,
>
> I have created KIP-273: Kafka to support using ETCD beside Zookeeper
>
> Here is the link to the KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 273+-+Kafka+to+support+using+ETCD+beside+Zookeeper
>
> Looking forward to the discussion.
>
> Thanks,
> Balint
>


Re: [ANNOUNCE] New Committer: Dong Lin

2018-03-29 Thread Attila Sasvári
Congratulations Dong. Keep up the good work!



Becket Qin  ezt írta (időpont: 2018. márc. 28., Sze
19:58):

> Hello everyone,
>
> The PMC of Apache Kafka is pleased to announce that Dong Lin has accepted
> our invitation to be a new Kafka committer.
>
> Dong started working on Kafka about four years ago, since which he has
> contributed numerous features and patches. His work on Kafka core has been
> consistent and important. Among his contributions, most noticeably, Dong
> developed JBOD (KIP-112, KIP-113) to handle disk failures and to reduce
> overall cost, added deleteDataBefore() API (KIP-107) to allow users
> actively remove old messages. Dong has also been active in the community,
> participating in KIP discussions and doing code reviews.
>
> Congratulations and looking forward to your future contribution, Dong!
>
> Jiangjie (Becket) Qin, on behalf of Apache Kafka PMC
>


Re: Re: [ANNOUNCE] New Committer: Dong Lin

2018-03-29 Thread Edoardo Comar
congratulations Dong!
--

Edoardo Comar

IBM Message Hub

IBM UK Ltd, Hursley Park, SO21 2JN



From:   Hu Xi 
To: "us...@kafka.apache.org" , 
"dev@kafka.apache.org" 
Date:   29/03/2018 00:04
Subject: Re: [ANNOUNCE] New Committer: Dong Lin



Congrats, Dong Lin!



From: Matthias J. Sax 
Sent: March 29, 2018 6:37
To: us...@kafka.apache.org; dev@kafka.apache.org
Subject: Re: [ANNOUNCE] New Committer: Dong Lin

Congrats!

On 3/28/18 1:16 PM, James Cheng wrote:
> Congrats, Dong!
>
> -James
>
>> On Mar 28, 2018, at 10:58 AM, Becket Qin  wrote:
>>
>> Hello everyone,
>>
>> The PMC of Apache Kafka is pleased to announce that Dong Lin has 
accepted
>> our invitation to be a new Kafka committer.
>>
>> Dong started working on Kafka about four years ago, since which he has
>> contributed numerous features and patches. His work on Kafka core has 
been
>> consistent and important. Among his contributions, most noticeably, 
Dong
>> developed JBOD (KIP-112, KIP-113) to handle disk failures and to reduce
>> overall cost, added deleteDataBefore() API (KIP-107) to allow users
>> actively remove old messages. Dong has also been active in the 
community,
>> participating in KIP discussions and doing code reviews.
>>
>> Congratulations and looking forward to your future contribution, Dong!
>>
>> Jiangjie (Becket) Qin, on behalf of Apache Kafka PMC
>




Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU



Build failed in Jenkins: kafka-trunk-jdk8 #2513

2018-03-29 Thread Apache Jenkins Server
See 


Changes:

[wangguoz] HOTFIX: ignoring tests using old versions of Streams until KIP-268 is

--
[...truncated 3.53 MB...]

kafka.zookeeper.ZooKeeperClientTest > testZNodeChildChangeHandlerForChildChange 
STARTED

kafka.zookeeper.ZooKeeperClientTest > testZNodeChildChangeHandlerForChildChange 
PASSED

kafka.zookeeper.ZooKeeperClientTest > testGetChildrenExistingZNodeWithChildren 
STARTED

kafka.zookeeper.ZooKeeperClientTest > testGetChildrenExistingZNodeWithChildren 
PASSED

kafka.zookeeper.ZooKeeperClientTest > testSetDataExistingZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testSetDataExistingZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testMixedPipeline STARTED

kafka.zookeeper.ZooKeeperClientTest > testMixedPipeline PASSED

kafka.zookeeper.ZooKeeperClientTest > testGetDataExistingZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testGetDataExistingZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testDeleteExistingZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testDeleteExistingZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testSessionExpiry STARTED

kafka.zookeeper.ZooKeeperClientTest > testSessionExpiry PASSED

kafka.zookeeper.ZooKeeperClientTest > testSetDataNonExistentZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testSetDataNonExistentZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testDeleteNonExistentZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testDeleteNonExistentZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testExistsExistingZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testExistsExistingZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testZooKeeperStateChangeRateMetrics 
STARTED

kafka.zookeeper.ZooKeeperClientTest > testZooKeeperStateChangeRateMetrics PASSED

kafka.zookeeper.ZooKeeperClientTest > testZNodeChangeHandlerForDeletion STARTED

kafka.zookeeper.ZooKeeperClientTest > testZNodeChangeHandlerForDeletion PASSED

kafka.zookeeper.ZooKeeperClientTest > testGetAclNonExistentZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testGetAclNonExistentZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testStateChangeHandlerForAuthFailure 
STARTED

kafka.zookeeper.ZooKeeperClientTest > testStateChangeHandlerForAuthFailure 
PASSED

kafka.network.SocketServerTest > testGracefulClose STARTED

kafka.network.SocketServerTest > testGracefulClose PASSED

kafka.network.SocketServerTest > controlThrowable STARTED

kafka.network.SocketServerTest > controlThrowable PASSED

kafka.network.SocketServerTest > testRequestMetricsAfterStop STARTED

kafka.network.SocketServerTest > testRequestMetricsAfterStop PASSED

kafka.network.SocketServerTest > testConnectionIdReuse STARTED

kafka.network.SocketServerTest > testConnectionIdReuse PASSED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
STARTED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
PASSED

kafka.network.SocketServerTest > testProcessorMetricsTags STARTED

kafka.network.SocketServerTest > testProcessorMetricsTags PASSED

kafka.network.SocketServerTest > testMaxConnectionsPerIp STARTED

kafka.network.SocketServerTest > testMaxConnectionsPerIp PASSED

kafka.network.SocketServerTest > testConnectionId STARTED

kafka.network.SocketServerTest > testConnectionId PASSED

kafka.network.SocketServerTest > 
testBrokerSendAfterChannelClosedUpdatesRequestMetrics STARTED

kafka.network.SocketServerTest > 
testBrokerSendAfterChannelClosedUpdatesRequestMetrics PASSED

kafka.network.SocketServerTest > testNoOpAction STARTED

kafka.network.SocketServerTest > testNoOpAction PASSED

kafka.network.SocketServerTest > simpleRequest STARTED

kafka.network.SocketServerTest > simpleRequest PASSED

kafka.network.SocketServerTest > closingChannelException STARTED

kafka.network.SocketServerTest > closingChannelException PASSED

kafka.network.SocketServerTest > testIdleConnection STARTED

kafka.network.SocketServerTest > testIdleConnection PASSED

kafka.network.SocketServerTest > 
testClientDisconnectionWithStagedReceivesFullyProcessed STARTED

kafka.network.SocketServerTest > 
testClientDisconnectionWithStagedReceivesFullyProcessed PASSED

kafka.network.SocketServerTest > testMetricCollectionAfterShutdown STARTED

kafka.network.SocketServerTest > testMetricCollectionAfterShutdown PASSED

kafka.network.SocketServerTest > testSessionPrincipal STARTED

kafka.network.SocketServerTest > testSessionPrincipal PASSED

kafka.network.SocketServerTest > configureNewConnectionException STARTED

kafka.network.SocketServerTest > configureNewConnectionException PASSED

kafka.network.SocketServerTest > testMaxConnectionsPerIpOverrides STARTED

kafka.network.SocketServerTest > testMaxConnectionsPerIpOverrides PASSED

kafka.network.SocketServerTest > processNewResponseException STARTED

kafka.network.SocketServerTest >