Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-10-29 Thread Boyang Chen
Btw, I updated KIP-345 based on my understanding. Feel free to take another
look:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-345%3A+Introduce+static+membership+protocol+to+reduce+consumer+rebalances


From: Boyang Chen 
Sent: Monday, October 29, 2018 12:34 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by 
specifying member id

Thanks everyone for the input on this thread! (Sorry it's been a while) I feel 
that we are very close to the final solution.


Hey Jason and Mike, I have two quick questions on the new features here:

  1.  So our proposal is that, unless we add a new static member into the group
(scale up), we will not trigger a rebalance until the "registration timeout"
expires (i.e. the member has been offline for too long)? What about the leader's
rejoin request? I think we should still trigger a rebalance when that happens,
since the consumer group may have new topics to consume.
  2.  I'm not very clear on the scale up scenario in static membership here.
Should we fall back to dynamic membership while adding/removing hosts (by
setting member.name = null), or do we still want to add instances with
`member.name` so that we eventually expand/shrink the static membership? I
personally feel the easier solution is to spin up new members and wait for
either the same "registration timeout" or a "scale up timeout" to expire before
starting the rebalance (see the rough config sketch below). What do you think?

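For concreteness, here is a minimal sketch of what a static member's client
configuration could look like, assuming the proposed `member.name` config from
this KIP (the config name and its exact semantics are still under discussion,
so treat this purely as an illustration):

import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class StaticMemberSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-app-group");
        // Proposed static member identifier from this KIP; hypothetical until the KIP lands.
        props.put("member.name", "instance-1");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // The broker side decides how member.name affects rebalance behavior; the client
        // otherwise subscribes and polls as usual.
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.close();
    }
}
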
Meanwhile, I will go ahead and update the KIP with our newly discussed items
and details. Really excited to see the design becoming more solid.

Best,
Boyang


From: Jason Gustafson 
Sent: Saturday, August 25, 2018 6:04 AM
To: dev
Subject: Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by 
specifying member id

Hey Mike,

Yeah, that's a good point. A long "registration timeout" may not be a great
idea. Perhaps in practice you'd set it long enough to be able to detect a
failure and provision a new instance. Maybe on the order of 10 minutes is
more reasonable.

In any case, it's probably a good idea to have an administrative way to
force deregistration. One option is to extend the DeleteGroups API with a
list of member names.

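To make that concrete, here is a rough sketch using today's AdminClient. The
group-level delete below already exists; the member-level variant discussed
above would be a new extension, so it is only referenced in a comment as a
hypothetical:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class ForceDeregisterSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Existing API: deletes an entire (empty) consumer group.
            admin.deleteConsumerGroups(Collections.singletonList("my-app-group")).all().get();
            // The proposal is to extend DeleteGroups (or add a sibling API) to accept a list
            // of member names so a single static member can be force-deregistered, e.g.
            // something like admin.deleteGroupMembers("my-app-group", memberNames), which is hypothetical.
        }
    }
}
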
-Jason



On Fri, Aug 24, 2018 at 2:21 PM, Mike Freyberger 
wrote:

> Jason,
>
> Regarding step 4 in your proposal which suggests beginning a long timer
> (30 minutes) when a static member leaves the group, would there also be the
> ability for an admin to force a static membership expiration?
>
> I’m thinking that during particular types of outages or upgrades users
> would want to forcefully remove a static member from the group.
>
> So the user would shut the consumer down normally, which wouldn’t trigger
> a rebalance. Then the user could use an admin CLI tool to force remove that
> consumer from the group, so the TopicPartitions that were previously owned
> by that consumer can be released.
>
> At a high level, we need consumer groups to gracefully handle intermittent
> failures and permanent failures. Currently, the consumer group protocol
> handles permanent failures well, but does not handle intermittent failures
> well (it creates unnecessary rebalances). I want to make sure the overall
> solution here handles both intermittent failures and permanent failures,
> rather than sacrificing support for permanent failures in order to provide
> support for intermittent failures.
>
> Mike
>
> Sent from my iPhone
>
> > On Aug 24, 2018, at 3:03 PM, Jason Gustafson  wrote:
> >
> > Hey Guozhang,
> >
> > Responses below:
> >
> > Originally I was trying to kill more birds with one stone with KIP-345,
> >> e.g. to fix the multi-rebalance issue on starting up / shutting down a
> >> multi-instance client (mentioned as case 1)/2) in my early email), and
> >> hence proposing to have a pure static-membership protocol. But thinking
> >> twice about it I now feel it may be too ambitious and worth fixing in
> >> another KIP.
> >
> >
> > I was considering an extension to support pre-initialization of the
> static
> > members of the group, but I agree we should probably leave this problem
> for
> > future work.
> >
> > 1. How is this longish static member expiration timeout defined? Is it via a
> >> broker, hence global config, or via a client config which can be
> >> communicated to broker via JoinGroupRequest?
> >
> >
> > I am not too sure. I tend to lean toward server-side configs because 

Re: [DISCUSS] KIP-370: Remove Orphan Partitions

2018-10-29 Thread xiongqi wu
Thanks Dong.

I have updated the KIP.
Instead of using a configuration to specify the timeout, I switched it to use an
internal timer. Users don't need a new configuration to use this feature.

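Just to illustrate the internal-timer idea (this is not the actual patch; the
names and the delay value below are made up):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class OrphanPartitionCleanupSketch {
    // Fixed internal delay instead of a user-facing config (value is illustrative only).
    private static final long CLEANUP_DELAY_MS = TimeUnit.HOURS.toMillis(1);

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    // Called once when the broker starts up.
    public void scheduleOneTimeCleanup() {
        scheduler.schedule(this::removeOrphanPartitions, CLEANUP_DELAY_MS, TimeUnit.MILLISECONDS);
    }

    private void removeOrphanPartitions() {
        // Hypothetical: delete partition directories on disk that no LeaderAndIsr request
        // has claimed since startup (the "orphan" partitions in this KIP).
    }
}
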
Xiongqi (Wesley) Wu


On Mon, Oct 29, 2018 at 4:40 PM xiongqi wu  wrote:

> Dong,
>
> Thanks for the comments.
>
> 1) With KIP-380, in theory we don't need the timeout phase.
> However, once orphan partitions are removed, they cannot be recovered.
> The question is whether we should rely on the fact that the first leaderandISR
> always contains correct information.
>
> For retention-enabled topics, the deletion phase (step 3 in this KIP) will
> protect against deletion of new segments.
> For log-compacted topics, since log segments can be relatively old, the delete
> phase might delete useful segments if by any chance the first leaderandISR is
> incorrect.
>
> Here is the difference with/without the timeout phase:
> Solution 1: without the timeout phase, we rely on the first leaderandISR and
> accept that if the first leaderandISR is incorrect, we might lose data. We
> don't protect against bugs.
> Solution 2: with the timeout phase, we rely on the fact that, during the
> timeout period, there is at least one valid leaderandISR for any given
> partition hosted by the broker. This comes with the complexity of adding a
> timeout configuration.
>
> Solution 2 is the safer option, at the cost of a timeout configuration.
> *What is your opinion on these two solutions?*
>
>
> For your second comment:
>
> I will change the metric description. Thanks for pointing out the right
> metric format.
>
>
> Xiongqi (Wesley) Wu
>
>
> On Sun, Oct 28, 2018 at 9:39 PM Dong Lin  wrote:
>
>> Hey Xiongqi,
>>
>> Thanks for the KIP. Here are some comments:
>>
>> 1) The KIP provides two motivations for the timeout/correction phase. One
>> motivation is to handle outdated requests. Would this still be an issue
>> after KIP-380? The second motivation seems to be mainly for performance
>> optimization when there is reassignment. In general we expect data
>> movement
>> when we reassign partitions to new brokers. So this is probably not a
>> strong reason for adding a new config.
>>
>> 2) The KIP says "Adding metrics to keep track of the number of orphan
>> partitions and the size of these orphan partitions". Can you add the
>> specification of these new metrics? Here are example doc in
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics
>> .
>>
>> Thanks,
>> Dong
>>
>> On Thu, Sep 20, 2018 at 5:40 PM xiongqi wu  wrote:
>>
>> > Colin,
>> >
>> > Thanks for the comment.
>> > 1)
>> > auto.orphan.partition.removal.delay.ms refers to timeout since the
>> first
>> > leader and ISR request was received.  The idea is we want to wait enough
>> > time to receive up-to-date leaderandISR requests and any old or new
>> > partition reassignment requests.
>> >
>> > 2)
>> > Is there any logic to remove the partition folders on disk?  I can only
>> > find references to removing older log segments, but not the folder, in
>> the
>> > KIP.
>> > ==> yes, the plan is to remove partition folders as well.
>> >
>> > I will update the KIP to make it more clear.
>> >
>> >
>> > Xiongqi (Wesley) Wu
>> >
>> >
>> > On Thu, Sep 20, 2018 at 5:02 PM Colin McCabe 
>> wrote:
>> >
>> > > Hi Xiongqi,
>> > >
>> > > Thanks for the KIP.
>> > >
>> > > Can you be a bit more clear what the timeout
>> > > auto.orphan.partition.removal.delay.ms refers to?  Is the timeout
>> > > measured since the partition was supposed to be on the broker?  Or is
>> the
>> > > timeout measured since the broker started up?
>> > >
>> > > Is there any logic to remove the partition folders on disk?  I can
>> only
>> > > find references to removing older log segments, but not the folder, in
>> > the
>> > > KIP.
>> > >
>> > > best,
>> > > Colin
>> > >
>> > > On Wed, Sep 19, 2018, at 10:53, xiongqi wu wrote:
>> > > > Any comments?
>> > > >
>> > > > Xiongqi (Wesley) Wu
>> > > >
>> > > >
>> > > > On Mon, Sep 10, 2018 at 3:04 PM xiongqi wu 
>> > wrote:
>> > > >
>> > > > > Here is the implementation for the KIP 370.
>> > > > >
>> > > > >
>> > > > >
>> > >
>> >
>> https://github.com/xiowu0/kafka/commit/f1bd3085639f41a7af02567550a8e3018cfac3e9
>> > > > >
>> > > > >
>> > > > > The purpose is to do one time cleanup (after a configured delay)
>> of
>> > > orphan
>> > > > > partitions when a broker starts up.
>> > > > >
>> > > > >
>> > > > > Xiongqi (Wesley) Wu
>> > > > >
>> > > > >
>> > > > > On Wed, Sep 5, 2018 at 10:51 AM xiongqi wu 
>> > > wrote:
>> > > > >
>> > > > >>
>> > > > >> This KIP enables broker to remove orphan partitions
>> automatically.
>> > > > >>
>> > > > >>
>> > > > >>
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-370%3A+Remove+Orphan+Partitions
>> > > > >>
>> > > > >>
>> > > > >> Xiongqi (Wesley) Wu
>> > > > >>
>> > > > >
>> > >
>> >
>>
>


Re: [VOTE] KIP-380: Detect outdated control requests and bounced brokers using broker generation

2018-10-29 Thread Jun Rao
Hi, Patrick,

Thanks for the updated KIP. +1

Jun

On Wed, Oct 24, 2018 at 4:52 PM, Patrick Huang  wrote:

> Hi Jun,
>
> Sure. I already updated the KIP. Thanks!
>
> Best,
> Zhanxiang (Patrick) Huang
>
> --
> *From:* Jun Rao 
> *Sent:* Wednesday, October 24, 2018 14:17
> *To:* dev
> *Subject:* Re: [VOTE] KIP-380: Detect outdated control requests and
> bounced brokers using broker generation
>
> Hi, Patrick,
>
> Could you update the KIP with the changes to ControlledShutdownRequest
> based on the discussion thread?
>
> Thanks,
>
> Jun
>
>
> On Sun, Oct 21, 2018 at 2:25 PM, Mickael Maison 
> wrote:
>
> > +1( non-binding)
> > Thanks for the KIP!
> >
> > On Sun, Oct 21, 2018, 03:31 Harsha Chintalapani  wrote:
> >
> > > +1(binding). LGTM.
> > > -Harsha
> > > On Oct 20, 2018, 4:49 PM -0700, Dong Lin , wrote:
> > > > Thanks much for the KIP Patrick. Looks pretty good.
> > > >
> > > > +1 (binding)
> > > >
> > > > On Fri, Oct 19, 2018 at 10:17 AM Patrick Huang 
> > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I would like to call for a vote on KIP-380:
> > > > >
> > > > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-380%3A+Detect+outdated+control+requests+and+bounced+brokers+using+broker+generation
> > > > >
> > > > > Here is the discussion thread:
> > > > >
> > > > > https://lists.apache.org/thread.html/2497114df64993342eaf9c78c0f14bf8c1795bc3305f13b03dd39afd@%3Cdev.kafka.apache.org%3E
> > > > >
> > > > > Note: Normalizing the schema is a good-to-have optimization because
> > > > > the memory footprint for the control requests hinders the controller
> > > > > from scaling up if we have many topics with large partition counts.
> > > > >
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Zhanxiang (Patrick) Huang
> > > > >
> > >
> >
>


Re: [DISCUSS] KIP-370: Remove Orphan Partitions

2018-10-29 Thread xiongqi wu
Dong,

Thanks for the comments.

1) With KIP-380, in theory we don't need the timeout phase.
However, once orphan partitions are removed, they cannot be recovered.
The question is whether we should rely on the fact that the first leaderandISR
always contains correct information.

For retention-enabled topics, the deletion phase (step 3 in this KIP) will
protect against deletion of new segments.
For log-compacted topics, since log segments can be relatively old, the delete
phase might delete useful segments if by any chance the first leaderandISR is
incorrect.

Here is the difference with/without the timeout phase:
Solution 1: without the timeout phase, we rely on the first leaderandISR and
accept that if the first leaderandISR is incorrect, we might lose data. We
don't protect against bugs.
Solution 2: with the timeout phase, we rely on the fact that, during the
timeout period, there is at least one valid leaderandISR for any given
partition hosted by the broker. This comes with the complexity of adding a
timeout configuration.

Solution 2 is the safer option, at the cost of a timeout configuration.
*What is your opinion on these two solutions?*


For your second comment:

I will change the metric description. Thanks for pointing out the right
metric format.


Xiongqi (Wesley) Wu


On Sun, Oct 28, 2018 at 9:39 PM Dong Lin  wrote:

> Hey Xiongqi,
>
> Thanks for the KIP. Here are some comments:
>
> 1) The KIP provides two motivations for the timeout/correction phase. One
> motivation is to handle outdated requests. Would this still be an issue
> after KIP-380? The second motivation seems to be mainly for performance
> optimization when there is reassignment. In general we expect data movement
> when we reassign partitions to new brokers. So this is probably not a
> strong reason for adding a new config.
>
> 2) The KIP says "Adding metrics to keep track of the number of orphan
> partitions and the size of these orphan partitions". Can you add the
> specification of these new metrics? Here are example doc in
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics
> .
>
> Thanks,
> Dong
>
> On Thu, Sep 20, 2018 at 5:40 PM xiongqi wu  wrote:
>
> > Colin,
> >
> > Thanks for the comment.
> > 1)
> > auto.orphan.partition.removal.delay.ms refers to timeout since the first
> > leader and ISR request was received.  The idea is we want to wait enough
> > time to receive up-to-date leaderandISR requests and any old or new
> > partition reassignment requests.
> >
> > 2)
> > Is there any logic to remove the partition folders on disk?  I can only
> > find references to removing older log segments, but not the folder, in
> the
> > KIP.
> > ==> yes, the plan is to remove partition folders as well.
> >
> > I will update the KIP to make it more clear.
> >
> >
> > Xiongqi (Wesley) Wu
> >
> >
> > On Thu, Sep 20, 2018 at 5:02 PM Colin McCabe  wrote:
> >
> > > Hi Xiongqi,
> > >
> > > Thanks for the KIP.
> > >
> > > Can you be a bit more clear what the timeout
> > > auto.orphan.partition.removal.delay.ms refers to?  Is the timeout
> > > measured since the partition was supposed to be on the broker?  Or is
> the
> > > timeout measured since the broker started up?
> > >
> > > Is there any logic to remove the partition folders on disk?  I can only
> > > find references to removing older log segments, but not the folder, in
> > the
> > > KIP.
> > >
> > > best,
> > > Colin
> > >
> > > On Wed, Sep 19, 2018, at 10:53, xiongqi wu wrote:
> > > > Any comments?
> > > >
> > > > Xiongqi (Wesley) Wu
> > > >
> > > >
> > > > On Mon, Sep 10, 2018 at 3:04 PM xiongqi wu 
> > wrote:
> > > >
> > > > > Here is the implementation for the KIP 370.
> > > > >
> > > > >
> > > > >
> > >
> >
> https://github.com/xiowu0/kafka/commit/f1bd3085639f41a7af02567550a8e3018cfac3e9
> > > > >
> > > > >
> > > > > The purpose is to do one time cleanup (after a configured delay) of
> > > orphan
> > > > > partitions when a broker starts up.
> > > > >
> > > > >
> > > > > Xiongqi (Wesley) Wu
> > > > >
> > > > >
> > > > > On Wed, Sep 5, 2018 at 10:51 AM xiongqi wu 
> > > wrote:
> > > > >
> > > > >>
> > > > >> This KIP enables broker to remove orphan partitions automatically.
> > > > >>
> > > > >>
> > > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-370%3A+Remove+Orphan+Partitions
> > > > >>
> > > > >>
> > > > >> Xiongqi (Wesley) Wu
> > > > >>
> > > > >
> > >
> >
>


[jira] [Created] (KAFKA-7568) Return leader epoch in ListOffsets responses

2018-10-29 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-7568:
--

 Summary: Return leader epoch in ListOffsets responses
 Key: KAFKA-7568
 URL: https://issues.apache.org/jira/browse/KAFKA-7568
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson
Assignee: Jason Gustafson


This is part of KIP-320. The changes to the API have already been made, but
currently we return an unknown epoch. We need to update the logic to search for
the epoch corresponding to a fetched offset in the leader epoch cache.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7567) Clean up internal metadata usage for consistency and extensibility

2018-10-29 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-7567:
--

 Summary: Clean up internal metadata usage for consistency and 
extensibility
 Key: KAFKA-7567
 URL: https://issues.apache.org/jira/browse/KAFKA-7567
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson
Assignee: Jason Gustafson


This refactor has two objectives to improve metadata handling logic and testing:

1. We want to reduce dependence on the public object `Cluster` for internal 
metadata propagation since it is not easy to evolve. As an example, we need to 
propagate leader epochs from the metadata response to `Metadata`, but it is not 
straightforward to do this without exposing it in `PartitionInfo` since that is 
what `Cluster` uses internally. By doing this change, we are able to remove 
some redundant `Cluster` building logic. 

2. We want to make the metadata handling in `MockClient` simpler and more
consistent. Currently we have a mix of metadata update mechanisms which are
internally inconsistent with each other and also do not match the
implementation in `NetworkClient`.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-354 Time-based log compaction policy

2018-10-29 Thread xiongqi wu
Hi Dong,
I have updated the KIP to address your comments.
One correction to the previous email:
after an offline discussion with Dong, we decided to use MAX_LONG as the default
value for max.compaction.lag.ms.


Xiongqi (Wesley) Wu


On Mon, Oct 29, 2018 at 12:15 PM xiongqi wu  wrote:

> Hi Dong,
>
> Thank you for your comment.  See my inline comments.
> I will update the KIP shortly.
>
> Xiongqi (Wesley) Wu
>
>
> On Sun, Oct 28, 2018 at 9:17 PM Dong Lin  wrote:
>
>> Hey Xiongqi,
>>
>> Sorry for late reply. I have some comments below:
>>
>> 1) As discussed earlier in the email list, if the topic is configured with
>> both deletion and compaction, in some cases messages produced a long time
>> ago can not be deleted based on time. This is a valid use-case because we
>> actually have topic which is configured with both deletion and compaction
>> policy. And we should enforce the semantics for both policy. Solution A
>> sounds good. We do not need interface change (e.g. extra config) to
>> enforce
>> solution A. All we need is to update implementation so that when broker
>> compacts a topic, if the message has timestamp (which is the common case),
>> messages that are too old (based on the time-based retention config) will
>> be discarded. Since this is a valid issue and it is also related to the
>> guarantee of when a message can be deleted, can we include the solution of
>> this problem in the KIP?
>>
> ==  This makes sense.  We can use a similar approach to increase the log
> start offset.
>
>>
>> 2) It is probably OK to assume that all messages have timestamp. The
>> per-message timestamp was introduced into Kafka 0.10.0 with KIP-31 and
>> KIP-32 as of Feb 2016. Kafka 0.10.0 or earlier versions are no longer
>> supported. Also, since the use-case for this feature is primarily for
>> GDPR,
>> we can assume that client library has already been upgraded to support
>> SSL,
>> which feature is added after KIP-31 and KIP-32.
>>
>>  =>  Ok. We can use message timestamp to delete expired records
> if both compaction and retention are enabled.
>
>
> 3) In Proposed Change section 2.a, it is said that segment.largestTimestamp
>> - maxSegmentMs can be used to determine the timestamp of the earliest
>> message. Would it be simpler to just use the create time of the file to
>> determine the time?
>>
> >  Linux/Java doesn't provide an API for file creation time because
> some filesystem types don't provide it.
>
>
> 4) The KIP suggests using must-clean-ratio to select the partition to be
>> compacted. Unlike dirty ratio which is mostly for performance, the logs
>> whose "must-clean-ratio" is non-zero must be compacted immediately for
>> correctness reasons (and for GDPR). And if this cannot be achieved because
>> e.g. broker compaction throughput is too low, investigation will be
>> needed.
>> So it seems simpler to first compact logs which have a segment whose earliest
>> timestamp is earlier than now - max.compaction.lag.ms, instead of defining
>> must-clean-ratio and sorting logs based on this value.
>>
>>
> ==>  Good suggestion. This can simplify the implementation quite a bit
> if we are not too concerned about compaction of a GDPR-required partition
> queued behind some large partition.  The actual compaction completion time
> is not guaranteed anyway.
>
>
>> 5) The KIP says max.compaction.lag.ms is 0 by default and it is also
>> suggested that 0 means disable. Should we set this value to MAX_LONG by
>> default to effectively disable the feature added in this KIP?
>>
>> > I would rather use 0 so the corresponding code path will not be
> exercised.  By using MAX_LONG, we would theoretically go through related
> code to find out whether the partition is required to be compacted to
> satisfy MAX_LONG.
>
> 6) It is probably cleaner and readable not to include in Public Interface
>> section those configs whose meaning is not changed.
>>
>> > I will clean that up.
>
> 7) The goal of this KIP is to ensure that log segment whose earliest
>> message is earlier than a given threshold will be compacted. This goal may
>> not be achieved if the compaction throughput cannot catch up with the total
>> bytes-in-rate for the compacted topics on the broker. Thus we need an easy
>> way to tell operator whether this goal is achieved. If we don't already
>> have such metric, maybe we can include metrics to show 1) the total number
>> of log segments (or logs) which needs to be immediately compacted as
>> determined by max.compaction.lag; and 2) the maximum value of now -
>> earliest_time_stamp_of_segment among all segments that needs to be
>> compacted.
>>
>> ===> good suggestion.  I will update KIP for these metrics.
>
> 8) The Performance Impact section suggests that users use the existing metrics to
>> monitor the performance impact of this KIP. It is useful to list the meaning of
>> each JMX metric that we want users to monitor, and possibly explain how to
>> interpret the value of these metrics to determine whether 

Build failed in Jenkins: kafka-trunk-jdk11 #70

2018-10-29 Thread Apache Jenkins Server
See 


Changes:

[colin] KAFKA-7515: Trogdor - Add Consumer Group Benchmark Specification (#5810)

--
[...truncated 2.37 MB...]
org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualWithNullForCompareKeyValueWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualWithNullForCompareKeyValueWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairs STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairs PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKeyAndDefaultTimestamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKeyAndDefaultTimestamp 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithDefaultTimestamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithDefaultTimestamp 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKey STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKey PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithOtherTopicNameAndTimestampWithTimetamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithOtherTopicNameAndTimestampWithTimetamp 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullHeaders STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullHeaders PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithDefaultTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithDefaultTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicName STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKey STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKey PASSED

[jira] [Created] (KAFKA-7566) Add sidecar job to leader (or a random single follower) only

2018-10-29 Thread Boyang Chen (JIRA)
Boyang Chen created KAFKA-7566:
--

 Summary: Add sidecar job to leader (or a random single follower) 
only
 Key: KAFKA-7566
 URL: https://issues.apache.org/jira/browse/KAFKA-7566
 Project: Kafka
  Issue Type: Improvement
Reporter: Boyang Chen


Hey there,

recently we needed to add an archive job to a streaming application. The caveat
is that we need to make sure only one instance is doing this task, to avoid a
potential race condition, and we also don't want to schedule it as a regular
stream task, since that would block normal streaming operation.

Although we could do this with a ZK lease, I'm raising the case here since
this could be a potential use case for streaming jobs as well. For example,
there are some `leader specific` operations we could schedule in the DSL
instead of in an ad hoc manner.

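For reference, here is a minimal sketch of the ZK-lease approach mentioned above
using Apache Curator's LeaderLatch (just one possible implementation, not
something Kafka or Kafka Streams provides today; runArchiveJob() is a
placeholder for the application's own logic):
{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ArchiveJobLeaderElection {
    public static void main(String[] args) throws Exception {
        CuratorFramework zk = CuratorFrameworkFactory.newClient("zk1:2181",
            new ExponentialBackoffRetry(1000, 3));
        zk.start();

        LeaderLatch latch = new LeaderLatch(zk, "/my-app/archive-job-leader");
        latch.start();
        latch.await();   // blocks until this instance becomes the leader

        // Only the instance that holds the latch runs the archive/sidecar job.
        runArchiveJob();
    }

    private static void runArchiveJob() {
        // application-specific archival work goes here
    }
}
{code}
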
Let me know if you think this makes sense to you, thank you!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-354 Time-based log compaction policy

2018-10-29 Thread xiongqi wu
Hi Dong,

Thank you for your comment.  See my inline comments.
I will update the KIP shortly.

Xiongqi (Wesley) Wu


On Sun, Oct 28, 2018 at 9:17 PM Dong Lin  wrote:

> Hey Xiongqi,
>
> Sorry for late reply. I have some comments below:
>
> 1) As discussed earlier in the email list, if the topic is configured with
> both deletion and compaction, in some cases messages produced a long time
> ago can not be deleted based on time. This is a valid use-case because we
> actually have topic which is configured with both deletion and compaction
> policy. And we should enforce the semantics for both policy. Solution A
> sounds good. We do not need interface change (e.g. extra config) to enforce
> solution A. All we need is to update implementation so that when broker
> compacts a topic, if the message has timestamp (which is the common case),
> messages that are too old (based on the time-based retention config) will
> be discarded. Since this is a valid issue and it is also related to the
> guarantee of when a message can be deleted, can we include the solution of
> this problem in the KIP?
>
==  This makes sense.  We can use a similar approach to increase the log
start offset.

>
> 2) It is probably OK to assume that all messages have timestamp. The
> per-message timestamp was introduced into Kafka 0.10.0 with KIP-31 and
> KIP-32 as of Feb 2016. Kafka 0.10.0 or earlier versions are no longer
> supported. Also, since the use-case for this feature is primarily for GDPR,
> we can assume that client library has already been upgraded to support SSL,
> which feature is added after KIP-31 and KIP-32.
>
>  =>  Ok. We can use message timestamp to delete expired records if
both compaction and retention are enabled.


3) In Proposed Change section 2.a, it is said that segment.largestTimestamp
> - maxSegmentMs can be used to determine the timestamp of the earliest
> message. Would it be simpler to just use the create time of the file to
> determine the time?
>
> >  Linux/Java doesn't provide an API for file creation time because
some filesystem types don't provide it.


> 4) The KIP suggests using must-clean-ratio to select the partition to be
> compacted. Unlike dirty ratio which is mostly for performance, the logs
> whose "must-clean-ratio" is non-zero must be compacted immediately for
> correctness reasons (and for GDPR). And if this cannot be achieved because
> e.g. broker compaction throughput is too low, investigation will be needed.
> So it seems simpler to first compact logs which have a segment whose earliest
> timestamp is earlier than now - max.compaction.lag.ms, instead of defining
> must-clean-ratio and sorting logs based on this value.
>
>
==>  Good suggestion. This can simplify the implementation quite a bit if
we are not too concerned about compaction of a GDPR-required partition queued
behind some large partition.  The actual compaction completion time is not
guaranteed anyway.

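To make the rule in 4) concrete, the selection check would look roughly like the
following (a sketch only, with hypothetical helper parameters, not the actual
patch):

// Sketch only: decide whether a log must be compacted now under the proposed
// max.compaction.lag.ms. The parameters are hypothetical helpers.
public class MustCompactSketch {
    static boolean mustCompactNow(long earliestSegmentTimestampMs,
                                  long maxCompactionLagMs,
                                  long nowMs) {
        // A "disabled" value (0 here, or MAX_LONG depending on the final default) skips the check.
        if (maxCompactionLagMs <= 0) {
            return false;
        }
        // earliestSegmentTimestampMs is an estimate of the earliest message time,
        // e.g. segment.largestTimestamp - maxSegmentMs as discussed in 3).
        return earliestSegmentTimestampMs <= nowMs - maxCompactionLagMs;
    }
}
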

> 5) The KIP says max.compaction.lag.ms is 0 by default and it is also
> suggested that 0 means disable. Should we set this value to MAX_LONG by
> default to effectively disable the feature added in this KIP?
>
> > I would rather use 0 so the corresponding code path will not be
exercised.  By using MAX_LONG, we would theoretically go through related
code to find out whether the partition is required to be compacted to
satisfy MAX_LONG.

6) It is probably cleaner and readable not to include in Public Interface
> section those configs whose meaning is not changed.
>
> > I will clean that up.

7) The goal of this KIP is to ensure that log segment whose earliest
> message is earlier than a given threshold will be compacted. This goal may
> not be achieved if the compaction throughput cannot catch up with the total
> bytes-in-rate for the compacted topics on the broker. Thus we need an easy
> way to tell operator whether this goal is achieved. If we don't already
> have such metric, maybe we can include metrics to show 1) the total number
> of log segments (or logs) which needs to be immediately compacted as
> determined by max.compaction.lag; and 2) the maximum value of now -
> earliest_time_stamp_of_segment among all segments that needs to be
> compacted.
>
> ===> good suggestion.  I will update KIP for these metrics.

8) The Performance Impact section suggests that users use the existing metrics to
> monitor the performance impact of this KIP. It is useful to list the meaning of
> each JMX metric that we want users to monitor, and possibly explain how to
> interpret the value of these metrics to determine whether there is a
> performance issue.
>
> =>  I will update the KIP.

> Thanks,
> Dong
>
> On Tue, Oct 16, 2018 at 10:53 AM xiongqi wu  wrote:
>
> > Mayuresh,
> >
> > Thanks for the comments.
> > The requirement is that we need to pick up segments that are older than
> > maxCompactionLagMs for compaction.
> > maxCompactionLagMs is an upper-bound, which implies that picking up
> > segments for compaction 

[jira] [Created] (KAFKA-7565) NPE in KafkaConsumer

2018-10-29 Thread Alexey Vakhrenev (JIRA)
Alexey Vakhrenev created KAFKA-7565:
---

 Summary: NPE in KafkaConsumer
 Key: KAFKA-7565
 URL: https://issues.apache.org/jira/browse/KAFKA-7565
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 1.1.1
Reporter: Alexey Vakhrenev


The stacktrace is
{noformat}
java.lang.NullPointerException
at 
org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:221)
at 
org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:202)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:563)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:390)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:244)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1171)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115)
{noformat}

We couldn't find a minimal reproducer, but it happens quite often in our system.
We use the {{pause()}} and {{wakeup()}} methods quite extensively; maybe that is
somehow related.

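For context, the usage pattern looks roughly like the sketch below (illustrative
only, not a confirmed reproducer; the configs, topic name and record handling
are placeholders for what our application actually does):
{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

public class PauseWakeupPattern {
    public static void main(String[] args) {
        Properties props = new Properties();   // placeholder: real consumer configs go here
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"));

        // wakeup() is the one consumer method that is safe to call from another thread;
        // a shutdown/controller thread calls consumer.wakeup() to interrupt a blocking poll().
        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                // Pause the assigned partitions while processing, then resume.
                consumer.pause(consumer.assignment());
                records.forEach(r -> { /* placeholder for record handling */ });
                consumer.resume(consumer.assignment());
            }
        } catch (WakeupException e) {
            // expected when another thread called consumer.wakeup()
        } finally {
            consumer.close();
        }
    }
}
{code}
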


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] 2.0.1 RC0

2018-10-29 Thread Manikumar
Hi Eno,

This looks like an existing issue occurring only on source artifacts.  We
are able to generate the aggregated docs on a cloned repo.
I am getting a similar error on the previous release and the 2.1.0 RC0 src
artifacts; it may be related to Gradle task ordering.
I will look into it and try to fix it on trunk.

Similar issue reported here:
https://jira.apache.org/jira/browse/KAFKA-6500

Thanks,


On Mon, Oct 29, 2018 at 5:28 PM Eno Thereska  wrote:

> Thanks. Tested basic building and running of unit and integration tests.
> They work.
> Tested docs. The following fails. Is it a known issue?
>
> "
> ./gradlew aggregatedJavadoc
> with info:
> > Configure project :
> Building project 'core' with Scala version 2.11.12
> Building project 'streams-scala' with Scala version 2.11.12
>
> > Task :aggregatedJavadoc FAILED
>
> FAILURE: Build failed with an exception.
>
> * What went wrong:
> A problem was found with the configuration of task ':aggregatedJavadoc'.
> > No value has been specified for property 'outputDirectory'.
>
> * Try:
> Run with --stacktrace option to get the stack trace. Run with --info or
> --debug option to get more log output. Run with --scan to get full
> insights.
>
> * Get more help at https://help.gradle.org
>
> BUILD FAILED in 3s
> "
> Eno
>
> On Fri, Oct 26, 2018 at 3:29 AM Manikumar 
> wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the first candidate for release of Apache Kafka 2.0.1.
> >
> > This is a bug fix release closing 49 tickets:
> > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+2.0.1
> >
> > Release notes for the 2.0.1 release:
> > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by  Tuesday, October 30, end of day
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > http://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/
> >
> > * Javadoc:
> > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/javadoc/
> >
> > * Tag to be voted upon (off 2.0 branch) is the 2.0.1 tag:
> > https://github.com/apache/kafka/releases/tag/2.0.1-rc0
> >
> > * Documentation:
> > http://kafka.apache.org/20/documentation.html
> >
> > * Protocol:
> > http://kafka.apache.org/20/protocol.html
> >
> > * Successful Jenkins builds for the 2.0 branch:
> > Unit/integration tests:
> https://builds.apache.org/job/kafka-2.0-jdk8/177/
> >
> > /**
> >
> > Thanks,
> > Manikumar
> >
>


Re: [VOTE] 2.0.1 RC0

2018-10-29 Thread Eno Thereska
Thanks. Tested basic building and running of unit and integration tests.
They work.
Tested docs. The following fails. Is it a known issue?

"
./gradlew aggregatedJavadoc
with info:
> Configure project :
Building project 'core' with Scala version 2.11.12
Building project 'streams-scala' with Scala version 2.11.12

> Task :aggregatedJavadoc FAILED

FAILURE: Build failed with an exception.

* What went wrong:
A problem was found with the configuration of task ':aggregatedJavadoc'.
> No value has been specified for property 'outputDirectory'.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or
--debug option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

BUILD FAILED in 3s
"
Eno

On Fri, Oct 26, 2018 at 3:29 AM Manikumar  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for release of Apache Kafka 2.0.1.
>
> This is a bug fix release closing 49 tickets:
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+2.0.1
>
> Release notes for the 2.0.1 release:
> http://home.apache.org/~manikumar/kafka-2.0.1-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by  Tuesday, October 30, end of day
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~manikumar/kafka-2.0.1-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~manikumar/kafka-2.0.1-rc0/javadoc/
>
> * Tag to be voted upon (off 2.0 branch) is the 2.0.1 tag:
> https://github.com/apache/kafka/releases/tag/2.0.1-rc0
>
> * Documentation:
> http://kafka.apache.org/20/documentation.html
>
> * Protocol:
> http://kafka.apache.org/20/protocol.html
>
> * Successful Jenkins builds for the 2.0 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-2.0-jdk8/177/
>
> /**
>
> Thanks,
> Manikumar
>


[jira] [Created] (KAFKA-7564) Trogdor - Expose single task details from Trogdor Coordinator

2018-10-29 Thread Stanislav Kozlovski (JIRA)
Stanislav Kozlovski created KAFKA-7564:
--

 Summary: Trogdor - Expose single task details from Trogdor 
Coordinator
 Key: KAFKA-7564
 URL: https://issues.apache.org/jira/browse/KAFKA-7564
 Project: Kafka
  Issue Type: Improvement
Reporter: Stanislav Kozlovski
Assignee: Stanislav Kozlovski


The only way to currently get the results from tasks run in Trogdor is by
listing all of them via the "--show-tasks" CLI command:
{code:java}
./bin/trogdor.sh client --show-tasks localhost:8889
Got coordinator tasks: {
  "tasks": {
    "produce_bench_20462": {
      "state": "DONE",
      "spec": {
        "class": "org.apache.kafka.trogdor.workload.ProduceBenchSpec",
        "startMs": 0,
        "durationMs": 1000,
        "producerNode": "node0",
        "bootstrapServers": "localhost:9092",
        "targetMessagesPerSec": 10,
        "maxMessages": 100,
        "keyGenerator": { "type": "sequential", "size": 4, "startOffset": 0 },
        "valueGenerator": { "type": "constant", "size": 512, "value": "AAA=" },
        "totalTopics": 10,
        "activeTopics": 5,
        "topicPrefix": "foo",
        "replicationFactor": 1,
        "classLoader": {},
        "numPartitions": 1
      },
      "startedMs": 1523552769850,
      "doneMs": 1523552780878,
      "cancelled": false,
      "status": {
        "totalSent": 500,
        "averageLatencyMs": 4.972,
        "p50LatencyMs": 4,
        "p95LatencyMs": 6,
        "p99LatencyMs": 12
      }
    }
  }
}
{code}
This can prove inefficient and annoying if the Trogdor Coordinator is 
long-running and we only want to get the results from a specific task.
The current REST endpoint ("/tasks") for listing tasks enables filtering 
through StartTimeMs/EndTimeMs and supplying specific TaskIDs, but it would be 
cleaner if we had a specific endpoint for fetching a single task. That endpoint 
would also return a 404 in the case where no task was found instead of an empty 
response as the /tasks endpoint would.


I propose we expose a new "/tasks/:id" endpoint and a new CLI command
"--show-task TASK_ID" (see the illustrative usage below).

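For illustration, usage of the proposed additions could look something like this
(both the flag and the path are only proposals at this point, so the exact
syntax may change):
{code}
# Proposed CLI command (hypothetical until implemented):
./bin/trogdor.sh client --show-task produce_bench_20462 localhost:8889

# Proposed REST endpoint (hypothetical until implemented):
GET /tasks/produce_bench_20462   => returns the task JSON, or 404 if no such task exists
{code}
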


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : kafka-trunk-jdk8 #3171

2018-10-29 Thread Apache Jenkins Server
See 




Re: [VOTE] 2.1.0 RC0

2018-10-29 Thread Magnus Edenhill
+1 (non-binding)

passes librdkafka integration test suite

Den fre 26 okt. 2018 kl 15:58 skrev Manikumar :

> minor observation: config sections are empty in the documentation page.
> http://kafka.apache.org/21/documentation.html#producerconfigs
>
> On Wed, Oct 24, 2018 at 10:49 PM Ted Yu  wrote:
>
> > +1
> >
> > InternalTopicIntegrationTest failed during test suite run but passed with
> > rerun.
> >
> > On Wed, Oct 24, 2018 at 3:48 AM Andras Beni wrote:
> >
> > > +1 (non-binding)
> > >
> > > Verified signatures and checksums of release artifacts
> > > Performed quickstart steps on rc artifacts (both scala 2.11 and 2.12)
> and
> > > one built from tag 2.1.0-rc0
> > >
> > > Andras
> > >
> > > On Wed, Oct 24, 2018 at 10:17 AM Dong Lin  wrote:
> > >
> > > > Hello Kafka users, developers and client-developers,
> > > >
> > > > This is the first candidate for feature release of Apache Kafka
> 2.1.0.
> > > >
> > > > This is a major version release of Apache Kafka. It includes 28 new
> > KIPs
> > > > and
> > > >
> > > > critical bug fixes. Please see the Kafka 2.1.0 release plan for more
> > > > details:
> > > >
> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
> > > >
> > > > Here are a few notable highlights:
> > > >
> > > > - Java 11 support
> > > > - Support for Zstandard, which achieves compression comparable to
> gzip
> > > with
> > > > higher compression and especially decompression speeds(KIP-110)
> > > > - Avoid expiring committed offsets for active consumer group
> (KIP-211)
> > > > - Provide Intuitive User Timeouts in The Producer (KIP-91)
> > > > - Kafka's replication protocol now supports improved fencing of
> > zombies.
> > > > Previously, under certain rare conditions, if a broker became
> > partitioned
> > > > from Zookeeper but not the rest of the cluster, then the logs of
> > > replicated
> > > > partitions could diverge and cause data loss in the worst case
> > (KIP-320)
> > > > - Streams API improvements (KIP-319, KIP-321, KIP-330, KIP-353,
> > KIP-356)
> > > > - Admin script and admin client API improvements to simplify admin
> > > > operation (KIP-231, KIP-308, KIP-322, KIP-324, KIP-338, KIP-340)
> > > > - DNS handling improvements (KIP-235, KIP-302)
> > > >
> > > > Release notes for the 2.1.0 release:
> > > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/RELEASE_NOTES.html
> > > >
> > > > *** Please download, test and vote ***
> > > >
> > > > * Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > http://kafka.apache.org/KEYS
> > > >
> > > > * Release artifacts to be voted upon (source and binary):
> > > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/
> > > >
> > > > * Maven artifacts to be voted upon:
> > > > https://repository.apache.org/content/groups/staging/
> > > >
> > > > * Javadoc:
> > > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/javadoc/
> > > >
> > > > * Tag to be voted upon (off 2.1 branch) is the 2.1.0-rc0 tag:
> > > > https://github.com/apache/kafka/tree/2.1.0-rc0
> > > >
> > > > * Documentation:
> > > > http://kafka.apache.org/21/documentation.html
> > > >
> > > > * Protocol:
> > > > http://kafka.apache.org/21/protocol.html
> > > >
> > > > * Successful Jenkins builds for the 2.1 branch:
> > > > Unit/integration tests: https://builds.apache.org/job/kafka-2.1-jdk8/38/
> > > >
> > > > Please test and verify the release artifacts and submit a vote for
> this
> > > RC,
> > > > or report any issues so we can fix them and get a new RC out ASAP.
> > > Although
> > > > this release vote requires PMC votes to pass, testing, votes, and bug
> > > > reports are valuable and appreciated from everyone.
> > > >
> > > > Cheers,
> > > > Dong
> > > >
> > >
> >
>


Jenkins build is back to normal : kafka-trunk-jdk11 #68

2018-10-29 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-7563) Single broker sends incorrect metadata for topic partitions

2018-10-29 Thread Martin Kamp Jensen (JIRA)
Martin Kamp Jensen created KAFKA-7563:
-

 Summary: Single broker sends incorrect metadata for topic 
partitions
 Key: KAFKA-7563
 URL: https://issues.apache.org/jira/browse/KAFKA-7563
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Martin Kamp Jensen
 Attachments: kafka.log, zookeeper.log

When starting our Kafka Streams application in a test setup with just one Kafka 
broker we are seeing the following error roughly 1 out of 15 runs:

{{StreamsException: Existing internal topic 
alarm-message-streams-alarm-from-unknown-asset-changelog has invalid 
partitions: expected: 32; actual: 25. Use 'kafka.tools.StreamsResetter' tool to 
clean up invalid topics before processing.}}

(Note: It is not always the same topic that causes the error.)

When we see the error above, the actual number of partitions varies (expected is 
32, actual is above 0 and below 32).

Before each test run the Kafka broker is started without data (using 
[https://hub.docker.com/r/wurstmeister/kafka/]).

We have never seen this happen in non-test where we are running with 6 Kafka 
brokers. However, we are running a significantly higher number of test runs 
than deploys to non-test.

After some investigation (including using AdminClient to describe the topics 
when the Kafka Streams application got the StreamsException and confirming that 
AdminClient also reports that a topic has the wrong number of partitions!) we 
implemented the following workaround: When the Kafka Streams application fails 
with the exception, we stop the application, stop the Kafka broker, start the 
Kafka broker, and finally start the application. Then the exception is not 
thrown. Of course this does not explain or fix the real issue at hand but it is 
still important because we all hate flaky tests.

Kafka and ZooKeeper log files from a run where the exception above occurred and 
where applying the workaround described above enabled us to continue without 
the exception are attached.

This issue was created by request of Matthias J. Sax at 
https://stackoverflow.com/questions/52943653/existing-internal-topic-has-invalid-partitions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7562) One MirrorMaker synchronization problem in 0.11.0.2

2018-10-29 Thread salasming (JIRA)
salasming created KAFKA-7562:


 Summary: One MirrorMaker synchronization problem in 0.11.0.2
 Key: KAFKA-7562
 URL: https://issues.apache.org/jira/browse/KAFKA-7562
 Project: Kafka
  Issue Type: Bug
  Components: mirrormaker
Affects Versions: 0.11.0.2
 Environment: Kafka 0.11.0.2
Reporter: salasming


In the process of using MirrorMaker, I found that if "auto.create.topics.enable"
was set to false in the MirrorMaker configuration file "server.properties", the
data synchronization would stop after a few minutes.
But if "auto.create.topics.enable" was set to true, everything was normal.
My Kafka version is 0.11.0.2.

Does anyone know why?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)