[jira] [Resolved] (KAFKA-12407) Document omitted Controller Health Metrics

2021-03-04 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-12407.
--
Resolution: Fixed

> Document omitted Controller Health Metrics
> --
>
> Key: KAFKA-12407
> URL: https://issues.apache.org/jira/browse/KAFKA-12407
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Dongjin Lee
>Assignee: Dongjin Lee
>Priority: Major
>
> [KIP-237|https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics]
>  introduced 3 controller health metrics, listed below, but none of them
> are documented.
>  * kafka.controller:type=ControllerEventManager,name=EventQueueSize
>  * kafka.controller:type=ControllerEventManager,name=EventQueueTimeMs
>  * 
> kafka.controller:type=ControllerChannelManager,name=RequestRateAndQueueTimeMs,brokerId=\{broker-id}
>  
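
For reference, here is a minimal sketch (not part of the JIRA) of reading one of these MBeans over JMX. It assumes a broker started with JMX remoting enabled on port 9999, which is a hypothetical setup:

{noformat}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ControllerMetricsProbe {
    public static void main(String[] args) throws Exception {
        // Hypothetical setup: broker JVM started with com.sun.management.jmxremote.port=9999.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            ObjectName queueSize = new ObjectName(
                    "kafka.controller:type=ControllerEventManager,name=EventQueueSize");
            // Kafka's gauge metrics typically expose their current value under the "Value" attribute.
            System.out.println("EventQueueSize = " + mbeans.getAttribute(queueSize, "Value"));
        } finally {
            connector.close();
        }
    }
}
{noformat}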



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] New Kafka PMC Members: Colin, Vahid and Manikumar

2020-01-16 Thread Dong Lin
Congratulations everyone!

On Tue, Jan 14, 2020 at 9:30 AM Gwen Shapira  wrote:

> Hi everyone,
>
> I'm happy to announce that Colin McCabe, Vahid Hashemian and Manikumar
> Reddy are now members of Apache Kafka PMC.
>
> Colin and Manikumar became committers on Sept 2018 and Vahid on Jan
> 2019. They all contributed many patches, code reviews and participated
> in many KIP discussions. We appreciate their contributions and are
> looking forward to many more to come.
>
> Congrats Colin, Vahid and Manikumar!
>
> Gwen, on behalf of Apache Kafka PMC
>


Re: [ANNOUNCE] New committer: John Roesler

2019-11-12 Thread Dong Lin
Congratulations John!

On Tue, Nov 12, 2019 at 1:56 PM Guozhang Wang  wrote:

> Hi Everyone,
>
> The PMC of Apache Kafka is pleased to announce a new Kafka committer, John
> Roesler.
>
> John has been contributing to Apache Kafka since early 2018. His main
> contributions are primarily around Kafka Streams, but have also included
> improving our test coverage beyond Streams as well. Besides his own code
> contributions, John has also actively participated on community discussions
> and reviews including several other contributors' big proposals like
> foreign-key join in Streams (KIP-213). He has also been writing, presenting
> and evangelizing Apache Kafka in many venues.
>
> Congratulations, John! And look forward to more collaborations with you on
> Apache Kafka.
>
>
> Guozhang, on behalf of the Apache Kafka PMC
>


[jira] [Resolved] (KAFKA-9016) Warn when log dir stopped serving replicas

2019-11-08 Thread Dong Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-9016.
-
Resolution: Fixed

> Warn when log dir stopped serving replicas
> --
>
> Key: KAFKA-9016
> URL: https://issues.apache.org/jira/browse/KAFKA-9016
> Project: Kafka
>  Issue Type: Improvement
>  Components: log, log cleaner
>Reporter: Viktor Somogyi-Vass
>Assignee: kumar uttpal
>Priority: Major
>  Labels: easy, newbie
>
> Kafka should warn when a log directory stops serving replicas, as this is usually
> due to an error and is much more visible at warn level.
> {noformat}
> 2019-09-19 12:39:54,194 ERROR kafka.server.LogDirFailureChannel: Error while 
> writing to checkpoint file /kafka/data/diskX/replication-offset-checkpoint
> java.io.SyncFailedException: sync failed
> ..
> 2019-09-19 12:39:54,197 INFO kafka.server.ReplicaManager: [ReplicaManager 
> broker=638] Stopping serving replicas in dir /kafka/data/diskX
> ..
> 2019-09-19 12:39:54,205 INFO kafka.server.ReplicaFetcherManager: 
> [ReplicaFetcherManager on broker 638] Removed fetcher for partitions 
> Set(test1-0, test2-2, test-0, test2-2, test4-14, test4-0, test2-6)
> 2019-09-19 12:39:54,206 INFO kafka.server.ReplicaAlterLogDirsManager: 
> [ReplicaAlterLogDirsManager on broker 638] Removed fetcher for partitions 
> Set(test1-0, test2-2, test-0, test3-2, test4-14, test4-0, test2-6)
> 2019-09-19 12:39:54,222 INFO kafka.server.ReplicaManager: [ReplicaManager 
> broker=638] Broker 638 stopped fetcher for partitions 
> test1-0,test2-2,test-0,test3-2,test4-14,test4-0,test2-6 and stopped moving 
> logs for partitions  because they are in the failed log directory 
> /kafka/data/diskX.
> {noformat}
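
A minimal sketch of the requested change (illustrative only; the real code lives in Kafka's Scala ReplicaManager, shown here in Java with slf4j for brevity):

{noformat}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LogDirFailureLogging {
    private static final Logger LOG = LoggerFactory.getLogger(LogDirFailureLogging.class);

    // The ask is simply to emit this event at WARN rather than INFO so operators notice it.
    void onLogDirFailure(String dir) {
        LOG.warn("Stopping serving replicas in dir {}", dir);
    }
}
{noformat}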



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] New committer: Mickael Maison

2019-11-07 Thread Dong Lin
Congratulations Mickael!

On Thu, Nov 7, 2019 at 1:38 PM Jun Rao  wrote:

> Hi, Everyone,
>
> The PMC of Apache Kafka is pleased to announce a new Kafka committer
> Mickael
> Maison.
>
> Mickael has been contributing to Kafka since 2016. He proposed and
> implemented multiple KIPs. He has also been promoting Kafka through blogs
> and public talks.
>
> Congratulations, Mickael!
>
> Thanks,
>
> Jun (on behalf of the Apache Kafka PMC)
>


Re: [ANNOUNCE] New Kafka PMC member: Matthias J. Sax

2019-04-18 Thread Dong Lin
Congratulations Matthias!

Very well deserved!

On Thu, Apr 18, 2019 at 2:35 PM Guozhang Wang  wrote:

> Hello Everyone,
>
> I'm glad to announce that Matthias J. Sax is now a member of Kafka PMC.
>
> Matthias has been a committer since Jan. 2018, and since then he continued
> to be active in the community and made significant contributions to the
> project.
>
>
> Congratulations to Matthias!
>
> -- Guozhang
>


Re: [ANNOUNCE] New Kafka PMC member: Sriharsh Chintalapan

2019-04-18 Thread Dong Lin
Congratulations Sriharsh!

On Thu, Apr 18, 2019 at 11:46 AM Jun Rao  wrote:

> Hi, Everyone,
>
> Sriharsh Chintalapan has been active in the Kafka community since he became
> a Kafka committer in 2015. I am glad to announce that Harsh is now a member
> of Kafka PMC.
>
> Congratulations, Harsh!
>
> Jun
>


Re: [VOTE] KIP-427: Add AtMinIsr topic partition category (new metric & TopicCommand option)

2019-03-05 Thread Dong Lin
Hey Kevin,

Thanks for the KIP!

+1 (binding)

Thanks,
Dong

On Tue, Mar 5, 2019 at 9:38 AM Kevin Lu  wrote:

> Hi All,
>
> I would like to start the vote thread for KIP-427: Add AtMinIsr topic
> partition category (new metric & TopicCommand option).
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103089398
>
> Thanks!
>
> Regards,
> Kevin
>


Re: [DISCUSS] KIP-427: Add AtMinIsr topic partition category (new metric & TopicCommand option)

2019-02-28 Thread Dong Lin
 as the
> UnderReplicated metric, but better in some scenarios such as the ones
> listed above.
>
> Regards,
> Kevin
>
> On Thu, Feb 28, 2019 at 10:21 AM Harsha  wrote:
>
> > Hi Dong,
> >  I think AtMinIsr is still valuable to indicate cluster is at
> > a critical state and something needs to be done asap to restore.
> > To your example
> > " let's say min_isr = 1 and replica_set_size = 3, it is
> > > still possible that planned maintenance (e.g. one broker restart +
> > > partition reassignment) can cause isr size drop to 1. Since AtMinIsr
> can
> > > also cause fault positive (i.e. the fact that AtMinIsr > 0 does not
> > > necessarily need attention from user), "
> >
> > One broker restart shouldn't cause ISR to drop to 1 from 3 unless 2
> > partitions are co-located on the same broker.
> > This is still a valuable indicator to the admins that the partition
> > assignment needs to be moved.
> >
> > In our case, we run 4 replicas for critical topics with min.isr = 2 .
> URPs
> > are not really good indicator to take immediate action if one of the
> > replicas is down. If 2 replicas are down and we are at 2 alive replicas
> > this is stop everything to restore the cluster to a good state.
> >
> > Thanks,
> > Harsha
> >
> >
> >
> >
> >
> >
> > On Wed, Feb 27, 2019, at 11:17 PM, Dong Lin wrote:
> > > Hey Kevin,
> > >
> > > Thanks for the update.
> > >
> > > The KIP suggests that AtMinIsr is better than UnderReplicatedPartition
> as
> > > indicator for alerting. However, in most case where min_isr =
> > > replica_set_size - 1, these two metrics are exactly the same, where
> > planned
> > > maintenance can easily cause positive AtMinIsr value. In the other
> > > scenario, for example let's say min_isr = 1 and replica_set_size = 3,
> it
> > is
> > > still possible that planned maintenance (e.g. one broker restart +
> > > partition reassignment) can cause isr size drop to 1. Since AtMinIsr
> can
> > > also cause fault positive (i.e. the fact that AtMinIsr > 0 does not
> > > necessarily need attention from user), I am not sure it is worth to add
> > > this metric.
> > >
> > > In the Usage section, it is mentioned that user needs to manually check
> > > whether there is ongoing maintenance after AtMinIsr is triggered. Could
> > you
> > > explain how is this different from the current way where we use
> > > UnderReplicatedPartition to trigger alert? More specifically, can we
> just
> > > replace AtMinIsr with UnderReplicatedPartition in the Usage section?
> > >
> > > Thanks,
> > > Dong
> > >
> > >
> > > On Tue, Feb 26, 2019 at 6:49 PM Kevin Lu 
> wrote:
> > >
> > > > Hi Dong!
> > > >
> > > > Thanks for the feedback!
> > > >
> > > > You bring up a good point in that the AtMinIsr metric cannot be used
> to
> > > > identify failure in the mentioned scenarios. I admit the motivation
> > section
> > > > placed too much emphasis on "identifying failure".
> > > >
> > > > I have modified the KIP to reflect the implementation as the AtMinIsr
> > > > metric is intended to serve as a warning as one more failure to a
> > partition
> > > > AtMinIsr will cause producers with acks=ALL configured to fail. It
> has
> > an
> > > > additional benefit when minIsr=1 as it will warn us that the entire
> > > > partition is at risk of going offline, but that is more of a side
> > effect
> > > > that only applies in that scenario (minIsr=1).
> > > >
> > > > Regards,
> > > > Kevin
> > > >
> > > > On Tue, Feb 26, 2019 at 5:11 PM Dong Lin 
> wrote:
> > > >
> > > > > Hey Kevin,
> > > > >
> > > > > Thanks for the proposal!
> > > > >
> > > > > It seems that the proposed implementation does not match the
> > motivation.
> > > > > The motivation suggests that the operator wants to tell the planned
> > > > > maintenance (e.g. broker restart) from unplanned failure (e.g.
> > network
> > > > > failure). But the use of the metric AtMinIsr does not really
> > > > differentiate
> > > > > between these causes of the reduced number of ISR. For example, an
> > > > > unplann

Re: [DISCUSS] KIP-427: Add AtMinIsr topic partition category (new metric & TopicCommand option)

2019-02-27 Thread Dong Lin
Hey Kevin,

Thanks for the update.

The KIP suggests that AtMinIsr is better than UnderReplicatedPartition as an
indicator for alerting. However, in most cases where min_isr =
replica_set_size - 1, these two metrics are exactly the same, and planned
maintenance can easily cause a positive AtMinIsr value. In the other
scenario, for example where min_isr = 1 and replica_set_size = 3, it is
still possible for planned maintenance (e.g. one broker restart +
partition reassignment) to cause the ISR size to drop to 1. Since AtMinIsr can
also cause false positives (i.e. the fact that AtMinIsr > 0 does not
necessarily need attention from the user), I am not sure it is worth adding
this metric.
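
To make the comparison concrete, here is a minimal sketch (assuming AtMinIsr simply means the ISR size has dropped to min.insync.replicas, per the KIP draft) of how the two indicators behave in the scenarios above:

public class IsrIndicatorExample {
    static boolean underReplicated(int replicationFactor, int isrSize) {
        return isrSize < replicationFactor;
    }

    static boolean atMinIsr(int isrSize, int minInsyncReplicas) {
        return isrSize == minInsyncReplicas;
    }

    public static void main(String[] args) {
        // min_isr = replica_set_size - 1 (e.g. 2 out of 3): losing one replica trips both indicators.
        System.out.println(underReplicated(3, 2) + " " + atMinIsr(2, 2)); // true true
        // min_isr = 1, replica_set_size = 3: losing one replica is under-replicated but not yet AtMinIsr.
        System.out.println(underReplicated(3, 2) + " " + atMinIsr(2, 1)); // true false
    }
}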

In the Usage section, it is mentioned that the user needs to manually check
whether there is ongoing maintenance after AtMinIsr is triggered. Could you
explain how this is different from the current approach where we use
UnderReplicatedPartition to trigger alerts? More specifically, can we just
replace AtMinIsr with UnderReplicatedPartition in the Usage section?

Thanks,
Dong


On Tue, Feb 26, 2019 at 6:49 PM Kevin Lu  wrote:

> Hi Dong!
>
> Thanks for the feedback!
>
> You bring up a good point in that the AtMinIsr metric cannot be used to
> identify failure in the mentioned scenarios. I admit the motivation section
> placed too much emphasis on "identifying failure".
>
> I have modified the KIP to reflect the implementation as the AtMinIsr
> metric is intended to serve as a warning as one more failure to a partition
> AtMinIsr will cause producers with acks=ALL configured to fail. It has an
> additional benefit when minIsr=1 as it will warn us that the entire
> partition is at risk of going offline, but that is more of a side effect
> that only applies in that scenario (minIsr=1).
>
> Regards,
> Kevin
>
> On Tue, Feb 26, 2019 at 5:11 PM Dong Lin  wrote:
>
> > Hey Kevin,
> >
> > Thanks for the proposal!
> >
> > It seems that the proposed implementation does not match the motivation.
> > The motivation suggests that the operator wants to tell the planned
> > maintenance (e.g. broker restart) from unplanned failure (e.g. network
> > failure). But the use of the metric AtMinIsr does not really
> differentiate
> > between these causes of the reduced number of ISR. For example, an
> > unplanned failure can cause ISR to drop from 3 to 2 but it can still be
> > higher than the minIsr (say 1). And a planned maintenance can cause ISR
> to
> > drop from 3 to 2, which trigger the AtMinIsr metric if minIsr=2. Can you
> > update the design doc to fix or explain this issue?
> >
> > Thanks,
> > Dong
> >
> > On Tue, Feb 12, 2019 at 9:02 AM Kevin Lu  wrote:
> >
> > > Hi All,
> > >
> > > Getting the discussion thread started for KIP-427 in case anyone is
> free
> > > right now.
> > >
> > > I’d like to propose a new category of topic partitions *AtMinIsr* which
> > are
> > > partitions that only have the minimum number of in sync replicas left
> in
> > > the ISR set (as configured by min.insync.replicas).
> > >
> > > This would add two new metrics *ReplicaManager.AtMinIsrPartitionCount
> *&
> > > *Partition.AtMinIsr*, and a new TopicCommand option*
> > > --at-min-isr-partitions* to help in monitoring and alerting.
> > >
> > > KIP link: KIP-427: Add AtMinIsr topic partition category (new metric &
> > > TopicCommand option)
> > > <
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103089398
> > > >
> > >
> > > Please take a look and let me know what you think.
> > >
> > > Regards,
> > > Kevin
> > >
> >
>


Re: [DISCUSS] KIP-427: Add AtMinIsr topic partition category (new metric & TopicCommand option)

2019-02-26 Thread Dong Lin
Hey Kevin,

Thanks for the proposal!

It seems that the proposed implementation does not match the motivation.
The motivation suggests that the operator wants to distinguish planned
maintenance (e.g. broker restart) from unplanned failures (e.g. network
failure). But the metric AtMinIsr does not really differentiate
between these causes of a reduced ISR size. For example, an
unplanned failure can cause the ISR to drop from 3 to 2 while still staying
above the minIsr (say 1). And planned maintenance can also cause the ISR to
drop from 3 to 2, which triggers the AtMinIsr metric if minIsr=2. Can you
update the design doc to fix or explain this issue?

Thanks,
Dong

On Tue, Feb 12, 2019 at 9:02 AM Kevin Lu  wrote:

> Hi All,
>
> Getting the discussion thread started for KIP-427 in case anyone is free
> right now.
>
> I’d like to propose a new category of topic partitions *AtMinIsr* which are
> partitions that only have the minimum number of in sync replicas left in
> the ISR set (as configured by min.insync.replicas).
>
> This would add two new metrics *ReplicaManager.AtMinIsrPartitionCount *&
> *Partition.AtMinIsr*, and a new TopicCommand option*
> --at-min-isr-partitions* to help in monitoring and alerting.
>
> KIP link: KIP-427: Add AtMinIsr topic partition category (new metric &
> TopicCommand option)
> <
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103089398
> >
>
> Please take a look and let me know what you think.
>
> Regards,
> Kevin
>


Re: [ANNOUNCE] New Committer: Randall Hauch

2019-02-15 Thread Dong Lin
Congratulations Randall!!

On Thu, Feb 14, 2019 at 6:17 PM Guozhang Wang  wrote:

> Hello all,
>
> The PMC of Apache Kafka is happy to announce another new committer joining
> the project today: we have invited Randall Hauch as a project committer and
> he has accepted.
>
> Randall has been participating in the Kafka community for the past 3 years,
> and is well known as the founder of the Debezium project, a popular project
> for database change-capture streams using Kafka (https://debezium.io).
> More
> recently he has become the main person keeping Kafka Connect moving
> forward, participated in nearly all KIP discussions and QAs on the mailing
> list. He's authored 6 KIPs and authored 50 pull requests and conducted over
> a hundred reviews around Kafka Connect, and has also been evangelizing
> Kafka Connect at several Kafka Summit venues.
>
>
> Thank you very much for your contributions to the Connect community Randall
> ! And looking forward to many more :)
>
>
> Guozhang, on behalf of the Apache Kafka PMC
>


Re: [ANNOUNCE] New Committer: Bill Bejeck

2019-02-15 Thread Dong Lin
Congratulations Bill!!

On Wed, Feb 13, 2019 at 5:03 PM Guozhang Wang  wrote:

> Hello all,
>
> The PMC of Apache Kafka is happy to announce that we've added Bill Bejeck
> as our newest project committer.
>
> Bill has been active in the Kafka community since 2015. He has made
> significant contributions to the Kafka Streams project with more than 100
> PRs and 4 authored KIPs, including the streams topology optimization
> framework. Bill's also very keen on tightening Kafka's unit test / system
> tests coverage, which is a great value to our project codebase.
>
> In addition, Bill has been very active in evangelizing Kafka for stream
> processing in the community. He has given several Kafka meetup talks in the
> past year, including a presentation at Kafka Summit SF. He's also authored
> a book about Kafka Streams (
> https://www.manning.com/books/kafka-streams-in-action), as well as various
> of posts in public venues like DZone as well as his personal blog (
> http://codingjunkie.net/).
>
> We really appreciate the contributions and are looking forward to see more
> from him. Congratulations, Bill !
>
>
> Guozhang, on behalf of the Apache Kafka PMC
>


Re: [ANNOUNCE] New Committer: Vahid Hashemian

2019-01-15 Thread Dong Lin
Congratulations Vahid!

On Tue, Jan 15, 2019 at 2:45 PM Jason Gustafson  wrote:

> Hi All,
>
> The PMC for Apache Kafka has invited Vahid Hashemian as a project
> committer and
> we are
> pleased to announce that he has accepted!
>
> Vahid has made numerous contributions to the Kafka community over the past
> few years. He has authored 13 KIPs with core improvements to the consumer
> and the tooling around it. He has also contributed nearly 100 patches
> affecting all parts of the codebase. Additionally, Vahid puts a lot of
> effort into community engagement, helping others on the mail lists and
> sharing his experience at conferences and meetups.
>
> We appreciate the contributions and we are looking forward to more.
> Congrats Vahid!
>
> Jason, on behalf of the Apache Kafka PMC
>


Re: [DISCUSS] KIP-382: MirrorMaker 2.0

2019-01-13 Thread Dong Lin
Hey Ryanne,

Sorry I am late here. Thanks much for all the work! After reading through
the latest KIP and all the previous discussion, I have some questions below:

1. Currently, if a topic is created with "." in the topic name, would it
cause correctness issues for this KIP? For example, will the consumer read from
a topic that is not intended, and will an API such
as RemoteClusterUtils.upstreamClusters(...) return the wrong string as the
upstream cluster? If there are correctness issues which reduce the
usability, we probably should note them in the Java doc and fix these issues
before considering the feature ready and safe to use. Some future work,
such as "propose a special separator character", has been suggested in the
email thread. Could we document this future work in the KIP if it is
necessary for MM2 to be used reliably?

2. Suppose topic_1 is replicated from cluster A to cluster B, and we have a
producer/consumer pair that produces to and consumes from topic_1 in
cluster A. Let's say cluster A crashes and we need to migrate this pair
to cluster B. According to the discussion in the email thread, the
producer will produce to topic_1 in cluster B whereas the consumer will
consume from A.topic_1 in cluster B. It means that the messages produced by
this producer will no longer be consumed by this consumer. Would this be
an issue?

3. Given that the "mechanism to migrate producers or consumers between
mirrored clusters" is listed as the motivation of this KIP, do we need to
specify in the KIP the migration procedure which we have discussed in depth
in the email thread? For example, the user needs to additionally
call offsetsForTimes(...) after getting the timestamp from the checkpoint
for each partition (a rough sketch follows below).
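
As a rough sketch of that migration step (hypothetical helper; the KIP's final API for obtaining the checkpointed timestamp may differ):

import java.util.Collections;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public final class MigrateConsumer {
    // checkpointTimestamp stands in for whatever value the checkpoint topic / RemoteClusterUtils provides.
    public static void seekToCheckpoint(KafkaConsumer<byte[], byte[]> consumer,
                                        TopicPartition partition,
                                        long checkpointTimestamp) {
        consumer.assign(Collections.singleton(partition));
        Map<TopicPartition, OffsetAndTimestamp> offsets =
                consumer.offsetsForTimes(Collections.singletonMap(partition, checkpointTimestamp));
        OffsetAndTimestamp found = offsets.get(partition);
        if (found != null) {
            consumer.seek(partition, found.offset());   // resume close to where the source cluster left off
        } else {
            consumer.seekToEnd(Collections.singleton(partition)); // no message at or after the timestamp
        }
    }
}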

4. A lot of class names (e.g. MirrorCheckpointConnector) are included in the
Public Interface section. I am not sure it is useful to specify the class
names without specifying the class APIs and their usage. My understanding is
that the Public Interface section should include 1) the minimum amount of
information that a user needs in order to use the feature provided by the
KIP; and 2) anything whose change incurs compatibility concerns and thus KIP
discussion. So I am wondering whether we should include in the Public
Interface section 1) the class API for those classes listed in the section
(or remove them from the KIP if we have not decided the class API); 2) the usage
of the new scripts (e.g. connect-mirror-maker.sh) with their arguments (e.g.
this

example); and 3) the schema of the checkpoint topic and heartbeat topic, which
is currently in the Proposed Change section.

5. It is said in the motivation section that the design includes new
metrics such as end-to-end replication latency across multiple data
centers/clusters. Usually we treat metrics as a public interface. Could we
also specify these metrics (e.g. definition, type) in the Public Interface
section, similar to this

example.

6. The KIP says that RemoteClusterUtils is "subject to change". Are we
going to have a future KIP to discuss and finalize these APIs before
committing any code that implements the API? If so, it may be cleaner to
specify this or even remove this class from the Public Interface section,
and specify the future KIP in this KIP.

7. If we are going to determine the API for RemoteClusterUtils in this KIP,
then I have one comment regarding the
RemoteClusterUtils.translateOffsets(...). If I understand the discussion in
the email thread correctly, this API reads the timestamp from the
checkpoint topic and returns it to the caller. If so, it seems more
intuitive to call this e.g. checkpointTimestamp(...). And the doc probably
should say "Find the timestamp of the last committed message in the source
cluster...". Otherwise, could you briefly explain in the KIP how this API
is implemented?

8. It is said in the email discussion that targetClusterAlias is needed in
order to migrate a consumer from the remote topic back to the source topic
(i.e. fallback). When a consumer is migrated from A.topic_1 in cluster B to
topic_1 in cluster A, how does the consumer determine the start offset for
topic_1 in cluster A?

9. Does RemoteClusterUtils.upstreamClusters(...) return clusters that are no
longer upstream clusters? If yes, it seems to reduce the usability of
the API if the user only wants to know the current upstream cluster list. If
no, could you explain the semantics of the API (e.g. when an
upstream cluster is considered outdated) and how this API is implemented to
exclude outdated clusters? It may be useful to briefly explain the
implementation in the proposed change section for those APIs that are not
straightforward.

10. It is mentioned 

[jira] [Resolved] (KAFKA-5335) Controller should batch updatePartitionReassignmentData() operation

2018-12-13 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-5335.
-
Resolution: Won't Do

> Controller should batch updatePartitionReassignmentData() operation
> ---
>
> Key: KAFKA-5335
> URL: https://issues.apache.org/jira/browse/KAFKA-5335
> Project: Kafka
>  Issue Type: Bug
>        Reporter: Dong Lin
>    Assignee: Dong Lin
>Priority: Major
>
> Currently the controller will update partition reassignment data every time a 
> partition in the reassignment is completed. It means that if a user specifies a 
> huge reassignment znode of size 1 MB to move 10K partitions, the controller will 
> need to write roughly 0.5 MB * 10K = 5 GB of data to zookeeper in order to 
> complete this reassignment. This is because the controller needs to write the 
> remaining partitions to the znode every time a partition is completely moved.
> This is problematic because such a huge reassignment may greatly slow down 
> the Kafka controller. Note that partition reassignment doesn't necessarily cause 
> data movement between brokers because we may use it only to reorder the 
> replica list of partitions to evenly distribute preferred leaders.
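
As a back-of-the-envelope check of the 5 GB figure (an illustration, not part of the JIRA; it assumes the znode shrinks roughly linearly as partitions complete, so about half of the original 1 MB is rewritten on average):

{noformat}
public class ReassignmentWriteCost {
    public static void main(String[] args) {
        int partitions = 10_000;          // partitions being reassigned
        double averageWriteMb = 0.5;      // ~half of the 1 MB znode rewritten per completed partition
        double totalGb = partitions * averageWriteMb / 1024;
        System.out.printf("total rewritten: ~%.1f GB%n", totalGb); // ~4.9 GB
    }
}
{noformat}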



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-382: MirrorMaker 2.0

2018-12-10 Thread Dong Lin
Hey Ryanne,

Thanks much for the KIP!

Though I don't have time to review this KIP in detail at this stage, I
think this KIP will be very useful to Apache Kafka users (particularly
global enterprise users) who need geo replication capability. Currently
Kafka users have to set up and manage MM clusters for geo replication, which
has many problems as described in the motivation section of this KIP. The
fact that many companies have built in-house projects to simplify Kafka geo
replication management shows the need for better built-in geo replication
support in Apache Kafka. It will require a lot of design discussion and
work in the community. It will be really great to see progress on this KIP.

Thanks!
Dong

On Mon, Oct 15, 2018 at 9:17 AM Ryanne Dolan  wrote:

> Hey y'all!
>
> Please take a look at KIP-382:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
>
> Thanks for your feedback and support.
>
> Ryanne
>


Re: [VOTE] KIP-354 Time-based log compaction policy

2018-12-05 Thread Dong Lin
Thanks for the update. +1 (binding)

On Wed, Dec 5, 2018 at 8:19 AM Colin McCabe  wrote:

> Thanks, Xiongqi Wu.  +1 (binding)
>
> regards,
> Colin
>
>
> On Tue, Dec 4, 2018, at 20:58, xiongqi (wesley) wu wrote:
> > Colin,
> >
> > Thanks for comments.
> > Out of ordered message timestamp is a very good point.
> > We can combine max.compaction.lag.ms  with
> > log.message.timestamp.difference.max.ms to achieve what we want in an
> > environment that message timestamp can be shifted a lot.
> >
> > There are similar discussions regarding log.retention.ms and
> > log.message.timestamp.difference.max.ms  in KAFKA-4340.
> >
> > I agree that we can always use first message timestamp not the
> maxTimestamp
> > of the previous log segment to make it simple.
> >
> > I have updated the KIP.
> >
> > Xiongqi (wesley) Wu
> >
> >
> > On Tue, Dec 4, 2018 at 5:13 PM Colin McCabe  wrote:
> >
> > > Hi Xiongqi,
> > >
> > > Thinking about this a little bit more, it seems like we don't have any
> > > guarantees just by looking at the timestamp of the first message in a
> log
> > > segment.  Similarly, we don't have any guarantees just by looking at
> the
> > > maxTimestamp of the previous log segment.  Old data could appear
> anywhere--
> > > you could put data that was years old in the middle of a segment from
> 2018.
> > >
> > > However, if log.message.timestamp.difference.max.ms is set, then we
> can
> > > make some actual guarantees that old data gets purged-- which is what
> the
> > > GDPR requires, of course.
> > >
> > > Overall, maybe we can make KIP-354 simpler by just always looking at
> the
> > > timestamp of the first log message.  I don't think looking at the
> > > maxTimestamp of the previous segment is any more accurate.  Aside from
> > > that, looks good, since we can get what we need with the combination of
> > > this and log.message.timestamp.difference.max.ms.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Mon, Nov 26, 2018, at 13:10, xiongqi wu wrote:
> > > > Thanks for binding and non-binding votes.
> > > > Can I get one more binding vote?
> > > >
> > > > Thanks in advance!
> > > >
> > > > Xiongqi (Wesley) Wu
> > > >
> > > >
> > > > On Wed, Nov 14, 2018 at 7:29 PM Matt Farmer  wrote:
> > > >
> > > > > I'm a +1 (non-binding) — This looks like it would have saved us a
> lot
> > > of
> > > > > pain in an issue we had to debug recently. I can't go into
> details, but
> > > > > figuring out how to achieve this effect gave me quite a headache.
> :)
> > > > >
> > > > > On Mon, Nov 12, 2018 at 1:00 PM xiongqi wu 
> > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Can I have one more vote on this KIP?
> > > > > > Any comment is appreciated.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-354%3A+Add+a+Maximum+Log+Compaction+Lag
> > > > > >
> > > > > >
> > > > > > Xiongqi (Wesley) Wu
> > > > > >
> > > > > >
> > > > > > On Fri, Nov 9, 2018 at 7:56 PM xiongqi wu 
> > > wrote:
> > > > > >
> > > > > > > Thanks Dong.
> > > > > > > I have updated the KIP.
> > > > > > >
> > > > > > > Xiongqi (Wesley) Wu
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Nov 9, 2018 at 5:31 PM Dong Lin 
> > > wrote:
> > > > > > >
> > > > > > >> Thanks for the KIP Xiongqi. LGTM. +1 (binding)
> > > > > > >>
> > > > > > >> One minor comment: it may be a bit better to clarify in the
> public
> > > > > > >> interface section that the value of the newly added metric is
> > > > > determined
> > > > > > >> based by applying that formula across all compactable
> segments.
> > > For
> > > > > > >> example:
> > > > > > >>
> > > > > > >> The maximum value of Math.max(now -
> > > > &g

Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-11-23 Thread Dong Lin
fy
> > > the situation with as less possible timeout configs as possible. Here
> is
> > a
> > > concrete suggestion I'd like propose:
> > >
> > > 1.a) Instead of introducing a registration_timeout in addition to the
> > > session_timeout for static members, we can just reuse the
> session_timeout
> > > and ask users to set it to a larger value when they are upgrading a
> > dynamic
> > > client to a static client by setting the "member.name" at the same
> time.
> > > By
> > > default, the broker-side min.session.timeout is 6 seconds and
> > > max.session.timeout is 5 minutes, which seems reasonable to me (we can
> of
> > > course modify this broker config to enlarge the valid interval if we
> want
> > > in practice). And then we should also consider removing the condition
> for
> > > marking a client as failed if the rebalance timeout has reached while
> the
> > > JoinGroup was not received, so that the semantics of session_timeout
> and
> > > rebalance_timeout are totally separated: the former is only used to
> > > determine if a consumer member of the group should be marked as failed
> > and
> > > kicked out of the group, and the latter is only used to determine the
> > > longest time coordinator should wait for PREPARE_REBALANCE phase. In
> > other
> > > words if a member did not send the JoinGroup in time of the
> > > rebalance_timeout, we still include it in the new generation of the
> group
> > > and use its old subscription info to send to leader for assignment.
> Later
> > > if the member came back with HeartBeat request, we can still follow the
> > > normal path to bring it to the latest generation while checking that
> its
> > > sent JoinGroup request contains the same subscription info as we used
> to
> > > assign the partitions previously (which should be likely the case in
> > > practice). In addition, we should let static members to not send the
> > > LeaveGroup request when it is gracefully shutdown, so that a static
> > member
> > > can only be leaving the group if its session has timed out, OR it has
> > been
> > > indicated to not exist in the group any more (details below).
> > >
> > > 1.b) We have a parallel discussion about Incremental Cooperative
> > > Rebalancing, in which we will encode the "when to rebalance" logic at
> the
> > > application level, instead of at the protocol level. By doing this we
> can
> > > also enable a few other optimizations, e.g. at the Streams level to
> first
> > > build up the state store as standby tasks and then trigger a second
> > > rebalance to actually migrate the active tasks while keeping the actual
> > > rebalance latency and hence unavailability window to be small (
> > >
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-6145
> > ).
> > > I'd propose we align
> > > KIP-345 along with this idea, and hence do not add the
> expansion_timeout
> > as
> > > part of the protocol layer, but only do that at the application's
> > > coordinator / assignor layer (Connect, Streams, etc). We can still,
> > > deprecate the "*group.initial.rebalance.delay.ms
> > > <
> > >
> >
> http://group.initial.rebalance.delay.ms
> > >*"
> > > though as part of this KIP
> > > since we have discussed about its limit and think it is actually not a
> > very
> > > good design and could be replaced with client-side logic above.
> > >
> > >
> > > 2. I'd like to see your thoughts on the upgrade path for this KIP. More
> > > specifically, let's say after we have upgraded broker version to be
> able
> > to
> > > recognize the new versions of JoinGroup request and the admin requests,
> > how
> > > should we upgrade the clients and enable static groups? On top of my
> head
> > > if we do a rolling bounce in which we set the member.name config as
> well
> > > as
> > > optionally increase the session.timeout config when we boun

Re: [ANNOUNCE] Apache Kafka 2.1.0

2018-11-22 Thread Dong Lin
Thanks everyone for the contribution and for verifying the release! I am
really happy to contribute to the Kafka community as well.

Hey Craig,

Thanks much for double checking the URL. You are right and
https://www.apache.org/dev/release-signing.html#keys-policy also suggests
to use https://www.apache.org/dist/kafka/KEYS as the key link. It is updated
now.

Hey Mickael,

Thanks much for catching this. I misunderstood one step in the release
process. The issue is fixed now. I also lightly tuned the release process
wiki.

Have a great Thanksgiving holiday everyone!

Cheers,
Dong

On Thu, Nov 22, 2018 at 6:15 AM Craig Russell  wrote:

> Hi Kafka,
>
> Just a note that your download page has a link to the KEYS file at
> https://kafka.apache.org/KEYS
>
> The KEYS link should be https://www.apache.org/dist/kafka/KEYS for future
> announcements.
>
> Regards,
>
> Craig
>
> > On Nov 21, 2018, at 10:09 AM, Dong Lin  wrote:
> >
> > The Apache Kafka community is pleased to announce the release for Apache
> Kafka 2.1.0
> >
> >
> > This is a major release and includes significant features from 28 KIPs.
> It contains fixes and improvements from 179 JIRSs, including a few critical
> bug fixes. Here is a summary of some notable changes
> >
> > ** Java 11 support
> > ** Support for Zstandard, which achieves compression comparable to gzip
> with higher compression and especially decompression speeds(KIP-110)
> > ** Avoid expiring committed offsets for active consumer group (KIP-211)
> > ** Provide Intuitive User Timeouts in The Producer (KIP-91)
> > ** Kafka's replication protocol now supports improved fencing of
> zombies. Previously, under certain rare conditions, if a broker became
> partitioned from Zookeeper but not the rest of the cluster, then the logs
> of replicated partitions could diverge and cause data loss in the worst
> case (KIP-320)
> > ** Streams API improvements (KIP-319, KIP-321, KIP-330, KIP-353, KIP-356)
> > ** Admin script and admin client API improvements to simplify admin
> operation (KIP-231, KIP-308, KIP-322, KIP-324, KIP-338, KIP-340)
> > ** DNS handling improvements (KIP-235, KIP-302)
> >
> >
> > All of the changes in this release can be found in the release notes:
> > https://www.apache.org/dist/kafka/2.1.0/RELEASE_NOTES.html <
> https://www.apache.org/dist/kafka/2.1.0/RELEASE_NOTES.html>
> >
> >
> > You can download the source and binary release (Scala ) from:
> > https://kafka.apache.org/downloads#2.1.0 <
> https://kafka.apache.org/downloads#2.1.0>
> >
> >
> ---
> >
> >
> > Apache Kafka is a distributed streaming platform with four core APIs:
> >
> >
> > ** The Producer API allows an application to publish a stream records to
> > one or more Kafka topics.
> >
> > ** The Consumer API allows an application to subscribe to one or more
> > topics and process the stream of records produced to them.
> >
> > ** The Streams API allows an application to act as a stream processor,
> > consuming an input stream from one or more topics and producing an
> > output stream to one or more output topics, effectively transforming the
> > input streams to output streams.
> >
> > ** The Connector API allows building and running reusable producers or
> > consumers that connect Kafka topics to existing applications or data
> > systems. For example, a connector to a relational database might
> > capture every change to a table.
> >
> >
> > With these APIs, Kafka can be used for two broad classes of application:
> >
> > ** Building real-time streaming data pipelines that reliably get data
> > between systems or applications.
> >
> > ** Building real-time streaming applications that transform or react
> > to the streams of data.
> >
> >
> > Apache Kafka is in use at large and small companies worldwide, including
> > Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
> > Target, The New York Times, Uber, Yelp, and Zalando, among others.
> >
> > A big thank you for the following 100 contributors to this release!
> >
> > Ahmed Al Mehdi, Aleksei Izmalkin, Alex Dunayevsky, Amit Sela, Andras
> Katona, Andy Coates, Anna Povzner, Arjun Satish, Attila Sasvari, Aviem Zur,
> Bibin Sebastian, Bill Bejeck, Bob Barrett, Brandon Kirchner, Bridger
> Howell, Chia-Ping Tsai, Colin Hicks, Colin Patrick McCabe, Dhruvil Shah,
> Dong Lin, Edoardo Comar, Eugen Feller, Ewen Cheslack-Postava, Filipe
> Agapito, Flavien Raynaud, Gantigmaa Selenge, Gardner Vi

[ANNOUNCE] Apache Kafka 2.1.0

2018-11-21 Thread Dong Lin
The Apache Kafka community is pleased to announce the release for Apache
Kafka 2.1.0


This is a major release and includes significant features from 28 KIPs. It
contains fixes and improvements from 179 JIRAs, including a few critical
bug fixes. Here is a summary of some notable changes:

** Java 11 support
** Support for Zstandard, which achieves compression comparable to gzip
with higher compression and especially decompression speeds (KIP-110)
** Avoid expiring committed offsets for active consumer group (KIP-211)
** Provide Intuitive User Timeouts in The Producer (KIP-91)
** Kafka's replication protocol now supports improved fencing of zombies.
Previously, under certain rare conditions, if a broker became partitioned
from Zookeeper but not the rest of the cluster, then the logs of replicated
partitions could diverge and cause data loss in the worst case (KIP-320)
** Streams API improvements (KIP-319, KIP-321, KIP-330, KIP-353, KIP-356)
** Admin script and admin client API improvements to simplify admin
operation (KIP-231, KIP-308, KIP-322, KIP-324, KIP-338, KIP-340)
** DNS handling improvements (KIP-235, KIP-302)


All of the changes in this release can be found in the release notes:
https://www.apache.org/dist/kafka/2.1.0/RELEASE_NOTES.html


You can download the source and binary release (Scala ) from:
https://kafka.apache.org/downloads#2.1.0

---


Apache Kafka is a distributed streaming platform with four core APIs:


** The Producer API allows an application to publish a stream of records to
one or more Kafka topics.

** The Consumer API allows an application to subscribe to one or more
topics and process the stream of records produced to them.

** The Streams API allows an application to act as a stream processor,
consuming an input stream from one or more topics and producing an
output stream to one or more output topics, effectively transforming the
input streams to output streams.

** The Connector API allows building and running reusable producers or
consumers that connect Kafka topics to existing applications or data
systems. For example, a connector to a relational database might
capture every change to a table.
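
As a minimal illustration of the Producer API (a sketch only; the broker address and topic name below are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerQuickstart {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value")); // placeholder topic
        }
    }
}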


With these APIs, Kafka can be used for two broad classes of application:

** Building real-time streaming data pipelines that reliably get data
between systems or applications.

** Building real-time streaming applications that transform or react
to the streams of data.


Apache Kafka is in use at large and small companies worldwide, including
Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
Target, The New York Times, Uber, Yelp, and Zalando, among others.

A big thank you to the following 100 contributors to this release!

Ahmed Al Mehdi, Aleksei Izmalkin, Alex Dunayevsky, Amit Sela, Andras
Katona, Andy Coates, Anna Povzner, Arjun Satish, Attila Sasvari, Aviem Zur,
Bibin Sebastian, Bill Bejeck, Bob Barrett, Brandon Kirchner, Bridger
Howell, Chia-Ping Tsai, Colin Hicks, Colin Patrick McCabe, Dhruvil Shah,
Dong Lin, Edoardo Comar, Eugen Feller, Ewen Cheslack-Postava, Filipe
Agapito, Flavien Raynaud, Gantigmaa Selenge, Gardner Vickers, Gitomain,
Gunnar Morling, Guozhang Wang, hashangayasri, huxi, huxihx, Ismael Juma,
Jagadesh Adireddi, Jason Gustafson, Jim Galasyn, Jimin Hsieh, Jimmy Casey,
Joan Goyeau, John Roesler, Jon Lee, jonathanskrzypek, Jun Rao, Kamal
Chandraprakash, Kevin Lafferty, Kevin Lu, Koen De Groote, Konstantine
Karantasis, lambdaliu, Lee Dongjin, Lincong Li, Liquan Pei, lucapette,
Lucas Wang, Maciej Bryński, Magesh Nandakumar, Manikumar Reddy, Manikumar
Reddy O, Mario Molina, Marko Stanković, Matthias J. Sax, Matthias
Wessendorf, Max Zheng, Mayank Tankhiwale, mgharat, Michal Dziemianko,
Michał Borowiecki, Mickael Maison, Mutasem Aldmour, Nikolay, nixsticks,
nprad, okumin, Radai Rosenblatt, radai-rosenblatt, Rajini Sivaram, Randall
Hauch, Robert Yokota, Rohan, Ron Dagostino, Sam Lendle, Sandor Murakozi,
Simon Clark, Stanislav Kozlovski, Stephane Maarek, Sébastien Launay, Sönke
Liebau, Ted Yu, uncleGen, Vahid Hashemian, Viktor Somogyi, wangshao,
xinzhg, Xiongqi Wesley Wu, Xiongqi Wu, ying-zheng, Yishun Guan, Yu Yang,
Zhanxiang (Patrick) Huang

We welcome your help and feedback. For more information on how to
report problems, and to get involved, visit the project website at
https://kafka.apache.org/

Thank you!

Regards,
Dong


[RESULTS] [VOTE] Release Kafka version 2.1.0

2018-11-20 Thread Dong Lin
This vote passes with 10 +1 votes (3 binding) and no 0 or -1 votes.

+1 votes

PMC Members:

* Guozhang Wang
* Jason Gustafson
* Dong Lin


Committers:
* Vahid Hashemian
* ManiKumar Reddy


Community:
* Jonathan Santilli
* Eno Thereska
* Andras Beni

* Jakub Scholz

* Satish Duggana


0 votes
* No votes


-1 votes
* No votes

Vote thread: https://markmail.org/message/qzg3xhduj3otodkr

I will continue with the release process and send announcement email.

Cheers,

Dong


Re: [VOTE] 2.1.0 RC1

2018-11-20 Thread Dong Lin
Thanks everyone for your test and vote!

We have removed those unnecessary files (as Ismael mentioned) from the
staging repository and this release has collected enough votes. I will
continue with the release process.

On Tue, Nov 20, 2018 at 12:08 AM Jason Gustafson  wrote:

> +1
>
> I verified the release and upgrade notes. I also went through the basic
> quickstart.
>
> Great job running the release, Dong! Thanks for all the effort.
>
> -Jason
>
> On Mon, Nov 19, 2018 at 10:50 AM, Dong Lin  wrote:
>
> > Hey Ismael,
> >
> > I checked out a clean copy of Kafka and reuploaded artifacts for
> 2.1.0-rc1
> > without code change. There are still those new files in
> > https://repository.apache.org/content/groups/staging/org/
> > apache/kafka/kafka_2.12/2.1.0.
> > I compared 2.0 and 2.1 branch but did not find any suspicious change in
> > release.py and build.gradle.
> >
> > Since doing a new release could not address this right away and there is
> no
> > known impact on user due to these redundant files, I am inclined to still
> > release 2.1.0-rc1 so that user can start to use the new features soon.
> What
> > do you think?
> >
> > Thanks,
> > Dong
> >
> >
> > On Mon, Nov 19, 2018 at 2:16 AM Dong Lin  wrote:
> >
> > > Hey Ismael,
> > >
> > > Thanks much for catching this! Sorry I didn't catch this issue before.
> > >
> > > These files were uploaded by release.py scrip in the repo.
> > kafka_2.12-2.1.
> > > 0.mapping contains the following string and the other files are the
> > > signature and hash of the file kafka_2.12-2.1.0.mapping:
> > >
> > >
> > > /home/dolin/research/kafka/.release_work_dir/kafka/core/
> > build/libs/kafka_2.12-2.1.0.jar
> > >
> > >
> /home/dolin/research/kafka/.release_work_dir/kafka/core/build/tmp/scala/
> > compilerAnalysis/compileScala.analysis
> > >
> > > It is weird to have these files.. Let me generate another release
> > > candidate and try to fix this issue.
> > >
> > > Thanks,
> > > Dong
> > >
> >
>


Re: [VOTE] 2.1.0 RC1

2018-11-19 Thread Dong Lin
Hey Ismael,

I checked out a clean copy of Kafka and reuploaded the artifacts for 2.1.0-rc1
without code changes. There are still those new files in
https://repository.apache.org/content/groups/staging/org/apache/kafka/kafka_2.12/2.1.0.
I compared the 2.0 and 2.1 branches but did not find any suspicious change in
release.py and build.gradle.

Since doing a new release could not address this right away and there is no
known impact on users from these redundant files, I am inclined to still
release 2.1.0-rc1 so that users can start to use the new features soon. What
do you think?

Thanks,
Dong


On Mon, Nov 19, 2018 at 2:16 AM Dong Lin  wrote:

> Hey Ismael,
>
> Thanks much for catching this! Sorry I didn't catch this issue before.
>
> These files were uploaded by release.py scrip in the repo. kafka_2.12-2.1.
> 0.mapping contains the following string and the other files are the
> signature and hash of the file kafka_2.12-2.1.0.mapping:
>
>
> /home/dolin/research/kafka/.release_work_dir/kafka/core/build/libs/kafka_2.12-2.1.0.jar
>
> /home/dolin/research/kafka/.release_work_dir/kafka/core/build/tmp/scala/compilerAnalysis/compileScala.analysis
>
> It is weird to have these files.. Let me generate another release
> candidate and try to fix this issue.
>
> Thanks,
> Dong
>


Re: [VOTE] 2.1.0 RC1

2018-11-19 Thread Dong Lin
Hey Ismael,

Thanks much for catching this! Sorry I didn't catch this issue before.

These files were uploaded by the release.py script in the repo.
kafka_2.12-2.1.0.mapping
contains the following paths, and the other files are the signature and
hash of the file kafka_2.12-2.1.0.mapping:

/home/dolin/research/kafka/.release_work_dir/kafka/core/build/libs/kafka_2.12-2.1.0.jar
/home/dolin/research/kafka/.release_work_dir/kafka/core/build/tmp/scala/compilerAnalysis/compileScala.analysis

It is weird to have these files. Let me generate another release candidate
and try to fix this issue.

Thanks,
Dong


Re: [DISCUSS] KIP-388 Add observer interface to record request and response

2018-11-18 Thread Dong Lin
the observer. Users cannot implement RequestAdapter/ResponseAdapter
>> and they are just internal helper classes to extract pre-defined info for
>> the observer. This does decouple the internal classes
>> (AbstractResponse/RequestChannel.request) but my concerns are:
>>
>>1.  Pre-defining the information that can be exposed to the observer
>>is good for hiding the internal classes but may limit the use cases of the
>>observer. Also, it makes it harder to evolve the RequestInfo/ResponseInfo
>>if there is a need for the observer to access more information in the
>>future because we will need to change the implementation of
>>RequestAdapter/ResponseAdaptor and probably break the existing observers.
>>In this case, I think Dong's suggestion is the right way to go.
>>2.  If we know (for sure) that the information that can be exposed to
>>the observer will never change, it is possible to pre-define them but I
>>think you need to explain more about why this is the case. Assume we 
>> choose
>>to pre-define the info,  we don't need the RequestInfo/ResponseInfo to
>>interface and then implement the adaptors because there will be only one
>>implementation for the interfaces provided internally to extract the info
>>from the requests. We also don't need to separate RequestInfo and
>>ResponseInfo. What we need is just one class (e.g. ObservedInfo) with
>>getters for the pre-defined info and pass it to the observer (e.g. use
>> public void record(ObserverdInfo observedInfo)). Also,
>>getProduceToTopicSizeInBytes/getProduceToTopicRecordCount/
>>
>> getFetchFromTopicPartitionSizeInBytes/getFetchFromTopicPartitionRecordCount
>>are very specific to one type of request and we should make the
>>pre-define info more generic if we choose to go for this route.
>>
>>
>>
>> Best,
>> Zhanxiang (Patrick) Huang
>>
>> --
>> *From:* Dong Lin 
>> *Sent:* Saturday, November 17, 2018 18:37
>> *To:* dev
>> *Subject:* Re: [DISCUSS] KIP-388 Add observer interface to record
>> request and response
>>
>> Hey Lincong,
>>
>> The main concern for the original version of the KIP is that, if we expose
>> internal classes such as RequestChannel.Request and AbstractResponse, it
>> will be difficult to make change to those internal classes in the future.
>>
>> In the latest version of the KIP, it seems that user is going to provide
>> implementation of classes such as RequestAdapter, ResponseAdapter, which
>> again interact with internal classes RequestChannel.Request and
>> AbstractResponse directly. So it seems that we still have the same issue
>> of
>> exposing the same internal classes to the user plugin implementation?
>>
>> Here is some high level idea of how to make this work without exposing
>> internal classes. Basically we want to have the same information as
>> currently contained in AbstractRequest, AbstractResponse and
>> RequestHeader.
>> These classes can be replaced with the following fields. The
>> transformation
>> between ByteBuffer and AbstractRequest/AbstractResponse/RequestHeader is
>> totally specified by the schema of each request/response which is already
>> public interface in Kafka. So this approach will not expose internal
>> classes.
>>
>> AbstractRequest -> int apiKeyId, short apiVersion, ByteBuffer buffer
>> AbstractResponse -> int apiKeyId, short apiVersion, ByteBuffer buffer
>> RequestHeader -> short requsetHeaderSchemaVersion, ByteBuffer buffer
>>
>> And user are also free to compile their plugin together with the Kafka
>> source code so that they can just do the following without re-writing the
>> serialization/deserialization logic:
>>
>> apiKey.parseRequest(apiVersion, buffer)
>> Struct struct = apiKey.parseRequest(apiVersion, buffer);
>> AbstractRequest requset = AbstractRequest.parseRequest(apiKey, apiVersion,
>> struct);
>>
>> Currently I am not sure whether it is going to be expensive to do
>> conversion between Struct and ByteBuffer. My understanding is that this is
>> not relatively expensive because the main CPU overhead in broker appears
>> to
>> be e.g. decompression, re-compression, SSL. But if it is expensive, it may
>> be reasonable to replace ByteBuffer with Struct, where Struct already
>> contain fields such as Schema. I have not carefully considered this
>> directly. The final solution probably needs to explicitly specify whether
>> we want to keep binary compatibility an

Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-11-17 Thread Dong Lin
Hey Boyang,

Thanks for the proposal! This is very useful. I have some comments below:

1) The motivation currently explicitly states that the goal is to improve
performance for heavy-state applications. It seems that the motivation can
be made stronger with the following use-case. Currently, for a MirrorMaker cluster
with e.g. 100 MirrorMaker processes, it will take a long time to rolling
bounce the entire MirrorMaker cluster. Each MirrorMaker process restart
will trigger a rebalance which currently pauses the consumption of all
partitions of the MirrorMaker cluster. With the change stated in this
patch, as long as a MirrorMaker process can restart within the specified timeout
(e.g. 2 minutes), then we only need a constant number of rebalances (e.g. for
leader restart) for the entire rolling bounce, which will significantly
improve the availability of the MirrorMaker pipeline. In my opinion, the
main benefit of the KIP is to avoid unnecessary rebalances if the consumer
process can be restarted soon, which helps performance even if the
overhead of state shuffling for a given process is small.

2) In order to simplify reading the KIP, can you follow the writeup style
of other KIPs (e.g. KIP-98) and list the interface changes, such as new
configs (e.g. registration timeout), new requests/responses, new AdminClient
APIs and new error codes (e.g. DUPLICATE_STATIC_MEMBER)? Currently some of
these are specified in the Proposed Change section, which makes it a bit
inconvenient to understand the new interface that will be exposed to the user.
The explanation of the current two-phase rebalance protocol can probably be
moved out of the Public Interface section.

3) There are currently two versions of JoinGroupRequest in the KIP and only
one of them has the field memberId. This seems confusing.

4) It is mentioned in the KIP that "An admin API to force rebalance could
be helpful here, but we will make a call once we finished the major
implementation". So this seems to be still an open question in the current
design. We probably want to agree on this before voting for the KIP.

5) The KIP currently adds a new config MEMBER_NAME for the consumer. Can you
specify the name of the config key and the default config value? Possible
default values include an empty string or null (similar to transaction.id in
producer config).
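
As a hypothetical illustration only ("member.name" is the config key sketched in the KIP discussion; the final key name and default value are exactly what this point asks to be pinned down):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class StaticMemberConfigExample {
    public static Properties staticMemberProps() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "mirror-maker-group");    // placeholder
        // Stable identity that survives restarts; key name per the KIP draft, not final.
        props.put("member.name", "mm-instance-7");
        return props;
    }
}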

6) Regarding the use of the topic "static_member_map" to persist the member
name map: currently, if the consumer coordinator broker goes offline, a rebalance
is triggered and consumers will try to connect to the new coordinator. If
these consumers can connect to the new coordinator within
max.poll.interval.ms, which by default is 5 minutes, then given that the broker can
use a deterministic algorithm to determine the partition -> member_name
mapping, each consumer should get assigned the same set of partitions
without requiring state shuffling. So it is not clear whether we have a
strong use-case for this new logic. Can you help clarify the
benefit of using the topic "static_member_map" to persist the member name map?

7) Regarding the introduction of the expansionTimeoutMs config, it is
mentioned that "we are using expansion timeout to replace rebalance
timeout, which is configured by max.poll.intervals from client side, and
using registration timeout to replace session timeout". Currently the
default max.poll.interval.ms is configured to be 5 minutes and there will
be only one rebalance if all new consumers can join within 5 minutes. So it
is not clear whether we have a strong use-case for this new config. Can you
explain the benefit of introducing this new config?

8) It is mentioned that "To distinguish between previous version of
protocol, we will also increase the join group request version to v4 when
MEMBER_NAME is set" and "If the broker version is not the latest (< v4),
the join group request shall be downgraded to v3 without setting the member
Id". It is probably simpler to just say that this feature is enabled if
JoinGroupRequest V4 is supported on both client and broker and MEMBER_NAME
is configured with a non-empty string.

9) It is mentioned that broker may return NO_STATIC_MEMBER_INFO_SET error
in OffsetCommitResponse for "commit requests under static membership". Can
you clarify how the broker determines whether a commit request is under
static membership?

Thanks,
Dong


Re: [DISCUSS] KIP-388 Add observer interface to record request and response

2018-11-17 Thread Dong Lin
Hey Lincong,

The main concern for the original version of the KIP is that, if we expose
internal classes such as RequestChannel.Request and AbstractResponse, it
will be difficult to make changes to those internal classes in the future.

In the latest version of the KIP, it seems that the user is going to provide
implementations of classes such as RequestAdapter and ResponseAdapter, which
again interact directly with the internal classes RequestChannel.Request and
AbstractResponse. So it seems that we still have the same issue of
exposing these internal classes to the user plugin implementation?

Here is a high-level idea of how to make this work without exposing
internal classes. Basically we want to have the same information as is
currently contained in AbstractRequest, AbstractResponse and RequestHeader.
These classes can be replaced with the following fields. The transformation
between ByteBuffer and AbstractRequest/AbstractResponse/RequestHeader is
fully specified by the schema of each request/response, which is already a
public interface in Kafka. So this approach will not expose internal
classes.

AbstractRequest -> int apiKeyId, short apiVersion, ByteBuffer buffer
AbstractResponse -> int apiKeyId, short apiVersion, ByteBuffer buffer
RequestHeader -> short requestHeaderSchemaVersion, ByteBuffer buffer
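
To illustrate, a rough sketch of what the observer could then look like (the
interface and method names below are made up for this sketch, they are not part
of the KIP):

    import java.nio.ByteBuffer;

    public interface RequestResponseObserver {
        // Only primitive ids/versions and the wire-format bytes are exposed,
        // so no broker-internal classes leak into the plugin API.
        void onRequest(int apiKeyId, short apiVersion,
                       short requestHeaderSchemaVersion, ByteBuffer requestBuffer);

        void onResponse(int apiKeyId, short apiVersion, ByteBuffer responseBuffer);
    }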

And users are also free to compile their plugin together with the Kafka
source code so that they can just do the following without re-writing the
serialization/deserialization logic:

Struct struct = apiKey.parseRequest(apiVersion, buffer);
AbstractRequest request = AbstractRequest.parseRequest(apiKey, apiVersion, struct);

Currently I am not sure whether it is going to be expensive to do the
conversion between Struct and ByteBuffer. My understanding is that this is
not particularly expensive because the main CPU overhead in the broker appears to
be e.g. decompression, re-compression and SSL. But if it is expensive, it may
be reasonable to replace ByteBuffer with Struct, since Struct already
contains fields such as Schema. I have not yet considered this part
carefully. The final solution probably needs to explicitly specify whether
we want to keep binary compatibility and source compatibility. Hopefully
this is a good direction to move this KIP forward.

Thanks,
Dong

On Wed, Nov 14, 2018 at 10:41 AM Lincong Li  wrote:

> Hi Wesley,
>
> Thank you very much for your feedback. The concern on memory pressure is
> definitely valid. However it should be the user's job to keep this concern
> in mind and implement the observer in the most reasonable way for their use
> case. In other words, implement it at their own risks.
>
> The alternative approach to mitigate this concern is to implement adapters
> in a way that they extract all information required by all their getters at
> initialization, when a reference to the request/response is given, so that
> references to the request/response can be garbage collected. However, this
> proactive initialization might be wasteful. For example, methods such as "
> public Map getFetchFromTopicPartitionSizeInBytes()"
> take non-constant time.
>
> Best regards,
> Lincong Li
>
> On Tue, Nov 13, 2018 at 5:56 PM xiongqi wu  wrote:
>
> > Lincong,
> >
> > Thanks for the KIP.
> > I have a question about the lifecycle of request and response.
> > With the current (requestAdapter, responseAdapter) implementation,
> > the observer can potentially extend the lifecycle of request and response
> > through adapter.
> > Anyone can implement own observer, and some observers may want to do
> async
> > process or batched processing.
> >
> > Could you clarify how could we make sure this do not increase the memory
> > pressure on potentially holding large request/response object?
> >
> >
> >
> > Xiongqi (Wesley) Wu
> >
> >
> > On Mon, Nov 12, 2018 at 10:23 PM Lincong Li 
> > wrote:
> >
> > > Thanks Mayuresh, Ismael and Colin for your feedback!
> > >
> > > I updated the KIP basing on your feedback. The change is basically that
> > two
> > > interfaces are introduced to prevent the internal classes from being
> > > exposed. These two interfaces contain getters that allow user to
> extract
> > > information from request and response in their own implementation(s) of
> > the
> > > observer interface and they would not constraint future implementation
> > > changes in neither RequestChannel.Request nor AbstractResponse. There
> > could
> > > be more getters defined in these two interfaces. The implementation of
> > > these two interfaces will be provided as part of the KIP.
> > >
> > > I also expanded on "Add code to the broker (in KafkaApis) to allow
> Kafka
> > > servers to invoke any
> > > observers defined. More specifically, change KafkaApis code to invoke
> all
> > > defined observers, in the order in which they were defined, for every
> > > request-response pair" by providing a sample code block which shows how
> > > these interfaces are used in the KafkaApis class.
> > >
> > > Let 

Re: [VOTE] 2.1.0 RC1

2018-11-16 Thread Dong Lin
Hey Eno, Vahid and everyone,

Thanks for reporting the test error!
https://builds.apache.org/job/kafka-2.1-jdk8/ shows the list of recent unit
test runs. 7 out of 10 recent runs have passed all tests. Each of the other
three runs shows one unique flaky test failure.

I have opened umbrella JIRA https://issues.apache.org/jira/browse/KAFKA-7645
to track these flaky tests. There are currently 7 flaky tests reported in
either https://builds.apache.org/job/kafka-2.1-jdk8/ or the voting
threads. Among these 7 flaky tests, 3 failed due to issues in the test
logic, and 3 are SSL-related, with similar failures in the 2.0 branch, which has
been running well. So these 6 tests should not be blocking issues for the 2.1.0
release.

Regarding the other test failure,
LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic
(KAFKA-7647 <https://issues.apache.org/jira/browse/KAFKA-7647>): I did due
diligence to understand the failure but could not find any bug. Since this
test failure happens rarely and I could not find any issue by looking at
the stacktrace, I am again inclined not to consider this a blocking issue.
We can discuss more if there is a different opinion on the mailing list.

Thanks,
Dong


On Thu, Nov 15, 2018 at 10:28 PM Guozhang Wang  wrote:

> +1 (binding).
>
> I've verified the signature, and ran quickstart / unit test with scala 2.12
> binary.
>
> On my local laptop the unit test did not fail though on Jenkins it seems
> indeed flaky.
>
> Guozhang
>
> On Fri, Nov 9, 2018 at 3:33 PM Dong Lin  wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the second candidate for feature release of Apache Kafka 2.1.0.
> >
> > This is a major version release of Apache Kafka. It includes 28 new KIPs
> > and
> >
> > critical bug fixes. Please see the Kafka 2.1.0 release plan for more
> > details:
> >
> > *
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044*
> > <
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
> > >
> >
> > Here are a few notable highlights:
> >
> > - Java 11 support
> > - Support for Zstandard, which achieves compression comparable to gzip
> with
> > higher compression and especially decompression speeds(KIP-110)
> > - Avoid expiring committed offsets for active consumer group (KIP-211)
> > - Provide Intuitive User Timeouts in The Producer (KIP-91)
> > - Kafka's replication protocol now supports improved fencing of zombies.
> > Previously, under certain rare conditions, if a broker became partitioned
> > from Zookeeper but not the rest of the cluster, then the logs of
> replicated
> > partitions could diverge and cause data loss in the worst case (KIP-320)
> > - Streams API improvements (KIP-319, KIP-321, KIP-330, KIP-353, KIP-356)
> > - Admin script and admin client API improvements to simplify admin
> > operation (KIP-231, KIP-308, KIP-322, KIP-324, KIP-338, KIP-340)
> > - DNS handling improvements (KIP-235, KIP-302)
> >
> > Release notes for the 2.1.0 release:
> > http://home.apache.org/~lindong/kafka-2.1.0-rc0/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by Thursday, Nov 15, 12 pm PT ***
> >
> > * Kafka's KEYS file containing PGP keys we use to sign the release:
> > http://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > http://home.apache.org/~lindong/kafka-2.1.0-rc1/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/
> >
> > * Javadoc:
> > http://home.apache.org/~lindong/kafka-2.1.0-rc1/javadoc/
> >
> > * Tag to be voted upon (off 2.1 branch) is the 2.1.0-rc1 tag:
> > https://github.com/apache/kafka/tree/2.1.0-rc1
> >
> > * Documentation:
> > *http://kafka.apache.org/21/documentation.html*
> > <http://kafka.apache.org/21/documentation.html>
> >
> > * Protocol:
> > http://kafka.apache.org/21/protocol.html
> >
> > * Successful Jenkins builds for the 2.1 branch:
> > Unit/integration tests: *
> https://builds.apache.org/job/kafka-2.1-jdk8/50/
> > <https://builds.apache.org/job/kafka-2.1-jdk8/50/>*
> >
> > Please test and verify the release artifacts and submit a vote for this
> RC,
> > or report any issues so we can fix them and get a new RC out ASAP.
> Although
> > this release vote requires PMC votes to pass, testing, votes, and bug
> > reports are valuable and appreciated from everyone.
> >
> > Cheers,
> > Dong
> >
>
>
> --
> -- Guozhang
>


[jira] [Created] (KAFKA-7651) Flaky test SaslSslAdminClientIntegrationTest.testMinimumRequestTimeouts

2018-11-16 Thread Dong Lin (JIRA)
Dong Lin created KAFKA-7651:
---

 Summary: Flaky test 
SaslSslAdminClientIntegrationTest.testMinimumRequestTimeouts
 Key: KAFKA-7651
 URL: https://issues.apache.org/jira/browse/KAFKA-7651
 Project: Kafka
  Issue Type: Task
Reporter: Dong Lin


Here is stacktrace from 
https://builds.apache.org/job/kafka-2.1-jdk8/51/testReport/junit/kafka.api/SaslSslAdminClientIntegrationTest/testMinimumRequestTimeouts/

{code}
Error Message
java.lang.AssertionError: Expected an exception of type 
org.apache.kafka.common.errors.TimeoutException; got type 
org.apache.kafka.common.errors.SslAuthenticationException
Stacktrace
java.lang.AssertionError: Expected an exception of type 
org.apache.kafka.common.errors.TimeoutException; got type 
org.apache.kafka.common.errors.SslAuthenticationException
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
kafka.utils.TestUtils$.assertFutureExceptionTypeEquals(TestUtils.scala:1404)
at 
kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1080)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7649) Flaky test SslEndToEndAuthorizationTest.testNoProduceWithDescribeAcl

2018-11-16 Thread Dong Lin (JIRA)
Dong Lin created KAFKA-7649:
---

 Summary: Flaky test 
SslEndToEndAuthorizationTest.testNoProduceWithDescribeAcl
 Key: KAFKA-7649
 URL: https://issues.apache.org/jira/browse/KAFKA-7649
 Project: Kafka
  Issue Type: Task
Reporter: Dong Lin


Observed in 
https://builds.apache.org/job/kafka-2.1-jdk8/49/testReport/junit/kafka.api/SslEndToEndAuthorizationTest/testNoProduceWithDescribeAcl/

{code}
Error Message
java.lang.SecurityException: zookeeper.set.acl is true, but the verification of 
the JAAS login file failed.
Stacktrace
java.lang.SecurityException: zookeeper.set.acl is true, but the verification of 
the JAAS login file failed.
at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:361)
at kafka.server.KafkaServer.startup(KafkaServer.scala:202)
at kafka.utils.TestUtils$.createServer(TestUtils.scala:135)
at 
kafka.integration.KafkaServerTestHarness$$anonfun$setUp$1.apply(KafkaServerTestHarness.scala:101)
at 
kafka.integration.KafkaServerTestHarness$$anonfun$setUp$1.apply(KafkaServerTestHarness.scala:100)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at 
kafka.integration.KafkaServerTestHarness.setUp(KafkaServerTestHarness.scala:100)
at 
kafka.api.IntegrationTestHarness.doSetup(IntegrationTestHarness.scala:81)
at 
kafka.api.IntegrationTestHarness.setUp(IntegrationTestHarness.scala:73)
at 
kafka.api.EndToEndAuthorizationTest.setUp(EndToEndAuthorizationTest.scala:180)
at 
kafka.api.SslEndToEndAuthorizationTest.setUp(SslEndToEndAuthorizationTest.scala:72)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source

[jira] [Created] (KAFKA-7648) Flaky test DeleteTopicsRequestTest.testValidDeleteTopicRequests

2018-11-16 Thread Dong Lin (JIRA)
Dong Lin created KAFKA-7648:
---

 Summary: Flaky test 
DeleteTopicsRequestTest.testValidDeleteTopicRequests
 Key: KAFKA-7648
 URL: https://issues.apache.org/jira/browse/KAFKA-7648
 Project: Kafka
  Issue Type: Task
Reporter: Dong Lin


Observed in 
[https://builds.apache.org/job/kafka-2.1-jdk8/52/testReport/junit/kafka.server/DeleteTopicsRequestTest/testValidDeleteTopicRequests/]

 
h3. Error Message
org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already 
exists.
h3. Stacktrace
org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already 
exists.
h3. Standard Output
[2018-11-07 17:53:10,812] ERROR [ReplicaFetcher replicaId=2, leaderId=0, 
fetcherId=0] Error for partition topic-3-3 at offset 0 
(kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2018-11-07 17:53:10,812] ERROR 
[ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition 
topic-3-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2018-11-07 17:53:14,805] WARN Client 
session timed out, have not heard from server in 4000ms for sessionid 
0x10051eebf480003 (org.apache.zookeeper.ClientCnxn:1112) [2018-11-07 
17:53:14,806] WARN Unable to read additional data from client sessionid 
0x10051eebf480003, likely client has closed socket 
(org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:14,807] WARN 
Client session timed out, have not heard from server in 4002ms for sessionid 
0x10051eebf480002 (org.apache.zookeeper.ClientCnxn:1112) [2018-11-07 
17:53:14,807] WARN Unable to read additional data from client sessionid 
0x10051eebf480002, likely client has closed socket 
(org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:14,823] WARN 
Client session timed out, have not heard from server in 4002ms for sessionid 
0x10051eebf480001 (org.apache.zookeeper.ClientCnxn:1112) [2018-11-07 
17:53:14,824] WARN Unable to read additional data from client sessionid 
0x10051eebf480001, likely client has closed socket 
(org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:15,423] WARN 
Client session timed out, have not heard from server in 4002ms for sessionid 
0x10051eebf48 (org.apache.zookeeper.ClientCnxn:1112) [2018-11-07 
17:53:15,423] WARN Unable to read additional data from client sessionid 
0x10051eebf48, likely client has closed socket 
(org.apache.zookeeper.server.NIOServerCnxn:376) [2018-11-07 17:53:15,879] WARN 
fsync-ing the write ahead log in SyncThread:0 took 4456ms which will adversely 
effect operation latency. See the ZooKeeper troubleshooting guide 
(org.apache.zookeeper.server.persistence.FileTxnLog:338) [2018-11-07 
17:53:16,831] ERROR [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Error 
for partition topic-4-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2018-11-07 17:53:23,087] ERROR 
[ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition 
invalid-timeout-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2018-11-07 17:53:23,088] ERROR 
[ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Error for partition 
invalid-timeout-3 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. [2018-11-07 17:53:23,137] ERROR 
[ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Error for partition 
invalid-timeout-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) 
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
 
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7647) Flaky test LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic

2018-11-16 Thread Dong Lin (JIRA)
Dong Lin created KAFKA-7647:
---

 Summary: Flaky test 
LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic
 Key: KAFKA-7647
 URL: https://issues.apache.org/jira/browse/KAFKA-7647
 Project: Kafka
  Issue Type: Task
Reporter: Dong Lin


kafka.log.LogCleanerParameterizedIntegrationTest >
testCleansCombinedCompactAndDeleteTopic[3] FAILED
    java.lang.AssertionError: Contents of the map shouldn't change
expected:<Map(0 -> (340,340), 5 -> (345,345), 10 -> (350,350), 14 ->
(354,354), 1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353),
2 -> (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 ->
(343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 8 ->
(348,348), 19 -> (359,359), 4 -> (344,344), 15 -> (355,355))> but
was:<Map(0 -> (340,340), 5 -> (345,345), 10 -> (350,350), 14 -> (354,354),
1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353), 2 ->
(342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 ->
(343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 99 ->
(299,299), 8 -> (348,348), 19 -> (359,359), 4 -> (344,344), 15 ->
(355,355))>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:834)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at
kafka.log.LogCleanerParameterizedIntegrationTest.testCleansCombinedCompactAndDeleteTopic(LogCleanerParameterizedIntegrationTest.scala:129)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7646) Flaky test SaslOAuthBearerSslEndToEndAuthorizationTest.testNoConsumeWithDescribeAclViaSubscribe

2018-11-16 Thread Dong Lin (JIRA)
Dong Lin created KAFKA-7646:
---

 Summary: Flaky test 
SaslOAuthBearerSslEndToEndAuthorizationTest.testNoConsumeWithDescribeAclViaSubscribe
 Key: KAFKA-7646
 URL: https://issues.apache.org/jira/browse/KAFKA-7646
 Project: Kafka
  Issue Type: Task
Reporter: Dong Lin


This test is reported to fail by [~enothereska] as part of 2.1.0 RC0 release 
certification.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7645) Fix flaky unit test for 2.1 branch

2018-11-16 Thread Dong Lin (JIRA)
Dong Lin created KAFKA-7645:
---

 Summary: Fix flaky unit test for 2.1 branch
 Key: KAFKA-7645
 URL: https://issues.apache.org/jira/browse/KAFKA-7645
 Project: Kafka
  Issue Type: Task
Reporter: Dong Lin
Assignee: Dong Lin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-380: Detect outdated control requests and bounced brokers using broker generation

2018-11-12 Thread Dong Lin
> > > partition states map, its highwater mark will be lost after the
> > > > highwater
> > > > > > mark checkpoint thread overwrites the file. (Related codes:
> > > > > >
> > https://github.com/apache/kafka/blob/ed3bd79633ae227ad995dafc3d9f38
> > > > > > 4a5534d4e9/core/src/main/scala/kafka/server/
> > > > ReplicaManager.scala#L1091)
> > > > > > [https://avatars3.githubusercontent.com/u/47359?s=400=4]<
> > > > > >
> > https://github.com/apache/kafka/blob/ed3bd79633ae227ad995dafc3d9f38
> > > > > > 4a5534d4e9/core/src/main/scala/kafka/server/
> > > > ReplicaManager.scala#L1091>
> > > > > >
> > > > > > apache/kafka<https://github.com/apache/kafka/blob/
> > > > > >
> > > > > ed3bd79633ae227ad995dafc3d9f384a5534d4e9/core/src/main/
> > > > scala/kafka/server/
> > > > > > ReplicaManager.scala#L1091>
> > > > > > Mirror of Apache Kafka. Contribute to apache/kafka development by
> > > > > creating
> > > > > > an account on GitHub.
> > > > > > github.com
> > > > > >
> > > > > >
> > > > > > In your example, assume the first LeaderAndIsrRequest broker A
> > > receives
> > > > > is
> > > > > > the one initiated in the controlled shutdown logic in Controller
> to
> > > > move
> > > > > > leadership away from broker A. This LeaderAndIsrRequest only
> > contains
> > > > > > partitions that broker A leads, not all the partitions that
> broker
> > A
> > > > > hosts
> > > > > > (i.e. no follower partitions), so the highwater mark for the
> > follower
> > > > > > partitions will be lost. Also, the first LeaderAndIsrRequst
> broker
> > A
> > > > > > receives may not necessarily be the one initiated in controlled
> > > > shutdown
> > > > > > logic (e.g. there can be an ongoing preferred leader election),
> > > > although
> > > > > I
> > > > > > think this may not be very common.
> > > > > >
> > > > > > Here the controller will start processing the BrokerChange event
> > > (that
> > > > > says
> > > > > > that broker A shutdown) after the broker has come back up and
> > > > > re-registered
> > > > > > himself in ZK?
> > > > > > How will the Controller miss the restart, won't he subsequently
> > > receive
> > > > > > another ZK event saying that broker A has come back up?
> > > > > > Controller will not miss the BrokerChange event and actually
> there
> > > will
> > > > > be
> > > > > > two BrokerChange events fired in this case (one for broker
> > > > deregistration
> > > > > > in zk and one for registration). However, when processing the
> > > > > > BrokerChangeEvent, controller needs to do a read from zookeeper
> to
> > > get
> > > > > back
> > > > > > the current brokers in the cluster and if the bounced broker
> > already
> > > > > joined
> > > > > > the cluster by this time, controller will not know this broker
> has
> > > been
> > > > > > bounced because it sees no diff between zk and its in-memory
> cache.
> > > So
> > > > > > basically both of the BrokerChange event processing become no-op.
> > > > > >
> > > > > >
> > > > > > Hope that I answer your questions. Feel free to follow up if I am
> > > > missing
> > > > > > something.
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Zhanxiang (Patrick) Huang
> > > > > >
> > > > > > 
> > > > > > From: Stanislav Kozlovski 
> > > > > > Sent: Wednesday, October 10, 2018 7:22
> > > > > > To: dev@kafka.apache.org
> > > > > > Subject: Re: [DISCUSS] KIP-380: Detect outdated control requests
> > and
> > > > > > bounced brokers using broker generation
> > > > > >
> > > > > > Hi Patrick,
> > > > > >
> > > > > > Thanks for t

Re: [VOTE] KIP-354 Time-based log compaction policy

2018-11-09 Thread Dong Lin
Thanks for the KIP Xiongqi. LGTM. +1 (binding)

One minor comment: it may be a bit better to clarify in the public
interface section that the value of the newly added metric is determined
by applying that formula across all compactable segments. For example:

The maximum value of Math.max(now -
earliest_timestamp_in_ms_of_uncompacted_segment - max.compaction.lag.ms,
0)/1000 across all compactable partitions, where the max.compaction.lag.ms
can be overridden on a per-topic basis.
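
To make the computation concrete, here is a small sketch of how the metric value
could be derived (the CompactablePartition type and its fields below are
illustrative, not actual LogCleaner code):

    // Report, in seconds, how far past max.compaction.lag.ms the oldest
    // uncompacted record is, maximized across all compactable partitions.
    long maxCompactionDelaySecs = 0L;
    long now = System.currentTimeMillis();
    for (CompactablePartition p : compactablePartitions) {   // hypothetical helper type
        long delayMs = Math.max(now - p.earliestUncompactedTimestampMs - p.maxCompactionLagMs, 0L);
        maxCompactionDelaySecs = Math.max(maxCompactionDelaySecs, delayMs / 1000);
    }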



On Fri, Nov 9, 2018 at 5:16 PM xiongqi wu  wrote:

> Thanks Joel.
> Tracking the delay at second granularity makes sense
> I have updated KIP.
>
> Xiongqi (Wesley) Wu
>
>
> On Fri, Nov 9, 2018 at 5:05 PM Joel Koshy  wrote:
>
> > +1 with one suggestion on the proposed metric. You should probably
> include
> > the unit. So for e.g., max-compaction-delay-secs.
> >
> > Joel
> >
> > On Tue, Nov 6, 2018 at 5:30 PM xiongqi wu  wrote:
> >
> > > bump
> > > Xiongqi (Wesley) Wu
> > >
> > >
> > > On Thu, Sep 27, 2018 at 4:20 PM xiongqi wu 
> wrote:
> > >
> > > >
> > > > Thanks Eno, Brett, Dong, Guozhang, Colin,  and Xiaohe for feedback.
> > > > Can I have more feedback or VOTE on this KIP?
> > > >
> > > >
> > > > Xiongqi (Wesley) Wu
> > > >
> > > >
> > > > On Wed, Sep 19, 2018 at 10:52 AM xiongqi wu 
> > wrote:
> > > >
> > > >> Any other votes or comments?
> > > >>
> > > >> Xiongqi (Wesley) Wu
> > > >>
> > > >>
> > > >> On Tue, Sep 11, 2018 at 11:45 AM xiongqi wu 
> > > wrote:
> > > >>
> > > >>> Yes, more votes and code review.
> > > >>>
> > > >>> Xiongqi (Wesley) Wu
> > > >>>
> > > >>>
> > > >>> On Mon, Sep 10, 2018 at 11:37 PM Brett Rann
> >  > > >
> > > >>> wrote:
> > > >>>
> > >  +1 (non binding) from on 0 then, and on the KIP.
> > > 
> > >  Where do we go from here? More votes?
> > > 
> > >  On Tue, Sep 11, 2018 at 5:34 AM Colin McCabe 
> > >  wrote:
> > > 
> > >  > On Mon, Sep 10, 2018, at 11:44, xiongqi wu wrote:
> > >  > > Thank you for comments. I will use '0' for now.
> > >  > >
> > >  > > If we create topics through admin client in the future, we
> might
> > >  perform
> > >  > > some useful checks. (but the assumption is all brokers in the
> > same
> > >  > cluster
> > >  > > have the same default configurations value, otherwise,it might
> > >  still be
> > >  > > tricky to do such cross validation check.)
> > >  >
> > >  > This isn't something that we might do in the future-- this is
> > >  something we
> > >  > are doing now. We already have Create Topic policies which are
> > >  enforced by
> > >  > the broker. Check KIP-108 and KIP-170 for details. This is one
> of
> > > the
> > >  > motivations for getting rid of direct ZK access-- making sure
> that
> > >  these
> > >  > policies are applied.
> > >  >
> > >  > I agree that having different configurations on different
> brokers
> > > can
> > >  be
> > >  > confusing and frustrating . That's why more configurations are
> > being
> > >  made
> > >  > dynamic using KIP-226. Dynamic configurations are stored
> centrally
> > > in
> > >  ZK,
> > >  > so they are the same on all brokers (modulo propagation delays).
> > In
> > >  any
> > >  > case, this is a general issue, not specific to "create topics".
> > >  >
> > >  > cheers,
> > >  > Colin
> > >  >
> > >  >
> > >  > >
> > >  > >
> > >  > > Xiongqi (Wesley) Wu
> > >  > >
> > >  > >
> > >  > > On Mon, Sep 10, 2018 at 11:15 AM Colin McCabe <
> > cmcc...@apache.org
> > > >
> > >  > wrote:
> > >  > >
> > >  > > > I don't have a strong opinion. But I think we should
> probably
> > be
> > >  > > > consistent with how segment.ms works, and just use 0.
> > >  > > >
> > >  > > > best,
> > >  > > > Colin
> > >  > > >
> > >  > > >
> > >  > > > On Wed, Sep 5, 2018, at 21:19, Brett Rann wrote:
> > >  > > > > OK thanks for that clarification. I see why you're
> > > uncomfortable
> > >  > with 0
> > >  > > > now.
> > >  > > > >
> > >  > > > > I'm not really fussed. I just prefer consistency in
> > >  configuration
> > >  > > > options.
> > >  > > > >
> > >  > > > > Personally I lean towards treating 0 and 1 similarly in
> that
> > >  > scenario,
> > >  > > > > because it favours the person thinking about setting the
> > >  > configurations,
> > >  > > > > and a person doesn't care about a 1ms edge case especially
> > > when
> > >  the
> > >  > > > context
> > >  > > > > is the true minimum is tied to the log cleaner cadence.
> > >  > > > >
> > >  > > > > Introducing 0 to mean "disabled" because there is some
> > >  uniquness in
> > >  > > > > segment.ms not being able to be set to 0, reduces
> > > configuration
> > >  > > > consistency
> > >  > > > > in favour of capturing a MS gap in an edge case that
> nobody
> > >  

Re: [VOTE] 2.1.0 RC0

2018-11-09 Thread Dong Lin
Hey Mani,

Thanks again for catching this. The issue has been fixed and the configs
are properly displayed in
http://kafka.apache.org/20/documentation.html#producerconfigs now.

Thanks,
Dong

On Fri, Nov 9, 2018 at 3:53 PM Dong Lin  wrote:

> Hey Mani,
>
> Thanks for catching these! Since kafka-site (
> https://github.com/apache/kafka-site) can be updated separately of the
> kafka release, I will file ticket and fix these later.
>
> Thanks,
> Dong
>
> On Fri, Oct 26, 2018 at 6:58 AM Manikumar 
> wrote:
>
>> minor observation: config sections are empty in the documentation page.
>> http://kafka.apache.org/21/documentation.html#producerconfigs
>>
>> On Wed, Oct 24, 2018 at 10:49 PM Ted Yu  wrote:
>>
>> > +1
>> >
>> > InternalTopicIntegrationTest failed during test suite run but passed
>> with
>> > rerun.
>> >
>> > On Wed, Oct 24, 2018 at 3:48 AM Andras Beni > > .invalid>
>> > wrote:
>> >
>> > > +1 (non-binding)
>> > >
>> > > Verified signatures and checksums of release artifacts
>> > > Performed quickstart steps on rc artifacts (both scala 2.11 and 2.12)
>> and
>> > > one built from tag 2.1.0-rc0
>> > >
>> > > Andras
>> > >
>> > > On Wed, Oct 24, 2018 at 10:17 AM Dong Lin 
>> wrote:
>> > >
>> > > > Hello Kafka users, developers and client-developers,
>> > > >
>> > > > This is the first candidate for feature release of Apache Kafka
>> 2.1.0.
>> > > >
>> > > > This is a major version release of Apache Kafka. It includes 28 new
>> > KIPs
>> > > > and
>> > > >
>> > > > critical bug fixes. Please see the Kafka 2.1.0 release plan for more
>> > > > details:
>> > > >
>> > > > *
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044*
>> > > > <
>> > >
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
>> > > > >
>> > > >
>> > > > Here are a few notable highlights:
>> > > >
>> > > > - Java 11 support
>> > > > - Support for Zstandard, which achieves compression comparable to
>> gzip
>> > > with
>> > > > higher compression and especially decompression speeds(KIP-110)
>> > > > - Avoid expiring committed offsets for active consumer group
>> (KIP-211)
>> > > > - Provide Intuitive User Timeouts in The Producer (KIP-91)
>> > > > - Kafka's replication protocol now supports improved fencing of
>> > zombies.
>> > > > Previously, under certain rare conditions, if a broker became
>> > partitioned
>> > > > from Zookeeper but not the rest of the cluster, then the logs of
>> > > replicated
>> > > > partitions could diverge and cause data loss in the worst case
>> > (KIP-320)
>> > > > - Streams API improvements (KIP-319, KIP-321, KIP-330, KIP-353,
>> > KIP-356)
>> > > > - Admin script and admin client API improvements to simplify admin
>> > > > operation (KIP-231, KIP-308, KIP-322, KIP-324, KIP-338, KIP-340)
>> > > > - DNS handling improvements (KIP-235, KIP-302)
>> > > >
>> > > > Release notes for the 2.1.0 release:
>> > > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/RELEASE_NOTES.html
>> > > >
>> > > > *** Please download, test and vote ***
>> > > >
>> > > > * Kafka's KEYS file containing PGP keys we use to sign the release:
>> > > > http://kafka.apache.org/KEYS
>> > > >
>> > > > * Release artifacts to be voted upon (source and binary):
>> > > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/
>> > > >
>> > > > * Maven artifacts to be voted upon:
>> > > > https://repository.apache.org/content/groups/staging/
>> > > >
>> > > > * Javadoc:
>> > > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/javadoc/
>> > > >
>> > > > * Tag to be voted upon (off 2.1 branch) is the 2.1.0-rc0 tag:
>> > > > https://github.com/apache/kafka/tree/2.1.0-rc0
>> > > >
>> > > > * Documentation:
>> > > > *http://kafka.apache.org/21/documentation.html*
>> > > > <http://kafka.apache.org/21/documentation.html>
>> > > >
>> > > > * Protocol:
>> > > > http://kafka.apache.org/21/protocol.html
>> > > >
>> > > > * Successful Jenkins builds for the 2.1 branch:
>> > > > Unit/integration tests: *
>> > > https://builds.apache.org/job/kafka-2.1-jdk8/38/
>> > > > <https://builds.apache.org/job/kafka-2.1-jdk8/38/>*
>> > > >
>> > > > Please test and verify the release artifacts and submit a vote for
>> this
>> > > RC,
>> > > > or report any issues so we can fix them and get a new RC out ASAP.
>> > > Although
>> > > > this release vote requires PMC votes to pass, testing, votes, and
>> bug
>> > > > reports are valuable and appreciated from everyone.
>> > > >
>> > > > Cheers,
>> > > > Dong
>> > > >
>> > >
>> >
>>
>


Re: [VOTE] 2.1.0 RC0

2018-11-09 Thread Dong Lin
Hey Mani,

Thanks for catching these! Since kafka-site (
https://github.com/apache/kafka-site) can be updated separately of the
kafka release, I will file ticket and fix these later.

Thanks,
Dong

On Fri, Oct 26, 2018 at 6:58 AM Manikumar  wrote:

> minor observation: config sections are empty in the documentation page.
> http://kafka.apache.org/21/documentation.html#producerconfigs
>
> On Wed, Oct 24, 2018 at 10:49 PM Ted Yu  wrote:
>
> > +1
> >
> > InternalTopicIntegrationTest failed during test suite run but passed with
> > rerun.
> >
> > On Wed, Oct 24, 2018 at 3:48 AM Andras Beni  > .invalid>
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > Verified signatures and checksums of release artifacts
> > > Performed quickstart steps on rc artifacts (both scala 2.11 and 2.12)
> and
> > > one built from tag 2.1.0-rc0
> > >
> > > Andras
> > >
> > > On Wed, Oct 24, 2018 at 10:17 AM Dong Lin  wrote:
> > >
> > > > Hello Kafka users, developers and client-developers,
> > > >
> > > > This is the first candidate for feature release of Apache Kafka
> 2.1.0.
> > > >
> > > > This is a major version release of Apache Kafka. It includes 28 new
> > KIPs
> > > > and
> > > >
> > > > critical bug fixes. Please see the Kafka 2.1.0 release plan for more
> > > > details:
> > > >
> > > > *
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044*
> > > > <
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
> > > > >
> > > >
> > > > Here are a few notable highlights:
> > > >
> > > > - Java 11 support
> > > > - Support for Zstandard, which achieves compression comparable to
> gzip
> > > with
> > > > higher compression and especially decompression speeds(KIP-110)
> > > > - Avoid expiring committed offsets for active consumer group
> (KIP-211)
> > > > - Provide Intuitive User Timeouts in The Producer (KIP-91)
> > > > - Kafka's replication protocol now supports improved fencing of
> > zombies.
> > > > Previously, under certain rare conditions, if a broker became
> > partitioned
> > > > from Zookeeper but not the rest of the cluster, then the logs of
> > > replicated
> > > > partitions could diverge and cause data loss in the worst case
> > (KIP-320)
> > > > - Streams API improvements (KIP-319, KIP-321, KIP-330, KIP-353,
> > KIP-356)
> > > > - Admin script and admin client API improvements to simplify admin
> > > > operation (KIP-231, KIP-308, KIP-322, KIP-324, KIP-338, KIP-340)
> > > > - DNS handling improvements (KIP-235, KIP-302)
> > > >
> > > > Release notes for the 2.1.0 release:
> > > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/RELEASE_NOTES.html
> > > >
> > > > *** Please download, test and vote ***
> > > >
> > > > * Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > http://kafka.apache.org/KEYS
> > > >
> > > > * Release artifacts to be voted upon (source and binary):
> > > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/
> > > >
> > > > * Maven artifacts to be voted upon:
> > > > https://repository.apache.org/content/groups/staging/
> > > >
> > > > * Javadoc:
> > > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/javadoc/
> > > >
> > > > * Tag to be voted upon (off 2.1 branch) is the 2.1.0-rc0 tag:
> > > > https://github.com/apache/kafka/tree/2.1.0-rc0
> > > >
> > > > * Documentation:
> > > > *http://kafka.apache.org/21/documentation.html*
> > > > <http://kafka.apache.org/21/documentation.html>
> > > >
> > > > * Protocol:
> > > > http://kafka.apache.org/21/protocol.html
> > > >
> > > > * Successful Jenkins builds for the 2.1 branch:
> > > > Unit/integration tests: *
> > > https://builds.apache.org/job/kafka-2.1-jdk8/38/
> > > > <https://builds.apache.org/job/kafka-2.1-jdk8/38/>*
> > > >
> > > > Please test and verify the release artifacts and submit a vote for
> this
> > > RC,
> > > > or report any issues so we can fix them and get a new RC out ASAP.
> > > Although
> > > > this release vote requires PMC votes to pass, testing, votes, and bug
> > > > reports are valuable and appreciated from everyone.
> > > >
> > > > Cheers,
> > > > Dong
> > > >
> > >
> >
>


Re: [VOTE] 2.1.0 RC0

2018-11-09 Thread Dong Lin
Hey Eno,

Thanks for the test! Yes, https://builds.apache.org/job/kafka-2.1-jdk8 also
shows flaky tests. They do not block the release. I will create tickets for
them and we will fix them one by one later.

Thanks,
Dong

On Tue, Oct 30, 2018 at 6:22 AM Eno Thereska  wrote:

> 2 tests failed for me and 4 were skipped when doing ./gradlew test:
>
> Failed tests:
> DeleteTopicTest. testAddPartitionDuringDeleteTopic
> SaslOAuthBearerSslEndToEndAuthorizationTest.
> testNoConsumeWithDescribeAclViaSubscribe
>
> Ignored tests:
> ConsumerBounceTest. testConsumptionWithBrokerFailures
> UncleanLeaderElectionTest. testCleanLeaderElectionDisabledByTopicOverride
> UncleanLeaderElectionTest. testUncleanLeaderElectionDisabled
> DynamicBrokerReconfigurationTest. testAddRemoveSslListener
>
> Thanks
> Eno
>
> On Mon, Oct 29, 2018 at 8:49 AM Magnus Edenhill 
> wrote:
>
> > +1 (non-binding)
> >
> > passes librdkafka integration test suite
> >
> > Den fre 26 okt. 2018 kl 15:58 skrev Manikumar  >:
> >
> > > minor observation: config sections are empty in the documentation page.
> > > http://kafka.apache.org/21/documentation.html#producerconfigs
> > >
> > > On Wed, Oct 24, 2018 at 10:49 PM Ted Yu  wrote:
> > >
> > > > +1
> > > >
> > > > InternalTopicIntegrationTest failed during test suite run but passed
> > with
> > > > rerun.
> > > >
> > > > On Wed, Oct 24, 2018 at 3:48 AM Andras Beni  > > > .invalid>
> > > > wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > Verified signatures and checksums of release artifacts
> > > > > Performed quickstart steps on rc artifacts (both scala 2.11 and
> 2.12)
> > > and
> > > > > one built from tag 2.1.0-rc0
> > > > >
> > > > > Andras
> > > > >
> > > > > On Wed, Oct 24, 2018 at 10:17 AM Dong Lin 
> > wrote:
> > > > >
> > > > > > Hello Kafka users, developers and client-developers,
> > > > > >
> > > > > > This is the first candidate for feature release of Apache Kafka
> > > 2.1.0.
> > > > > >
> > > > > > This is a major version release of Apache Kafka. It includes 28
> new
> > > > KIPs
> > > > > > and
> > > > > >
> > > > > > critical bug fixes. Please see the Kafka 2.1.0 release plan for
> > more
> > > > > > details:
> > > > > >
> > > > > > *
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044*
> > > > > > <
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
> > > > > > >
> > > > > >
> > > > > > Here are a few notable highlights:
> > > > > >
> > > > > > - Java 11 support
> > > > > > - Support for Zstandard, which achieves compression comparable to
> > > gzip
> > > > > with
> > > > > > higher compression and especially decompression speeds(KIP-110)
> > > > > > - Avoid expiring committed offsets for active consumer group
> > > (KIP-211)
> > > > > > - Provide Intuitive User Timeouts in The Producer (KIP-91)
> > > > > > - Kafka's replication protocol now supports improved fencing of
> > > > zombies.
> > > > > > Previously, under certain rare conditions, if a broker became
> > > > partitioned
> > > > > > from Zookeeper but not the rest of the cluster, then the logs of
> > > > > replicated
> > > > > > partitions could diverge and cause data loss in the worst case
> > > > (KIP-320)
> > > > > > - Streams API improvements (KIP-319, KIP-321, KIP-330, KIP-353,
> > > > KIP-356)
> > > > > > - Admin script and admin client API improvements to simplify
> admin
> > > > > > operation (KIP-231, KIP-308, KIP-322, KIP-324, KIP-338, KIP-340)
> > > > > > - DNS handling improvements (KIP-235, KIP-302)
> > > > > >
> > > > > > Release notes for the 2.1.0 release:
> > > > > >
> http://home.apache.org/~lindong/kafka-2.1.0-rc0/RELEASE_NOTES.html
> > > > &g

Re: [VOTE] 2.1.0 RC0

2018-11-09 Thread Dong Lin
Hey everyone,

Thanks for your help with testing the release! It has taken us some time to
validate the system tests for 2.1.0, fix flaky tests and discuss/fix the upgrade
notes. All these issues are resolved now.

I have started another email thread to vote for 2.1.0 RC1. Please help test
and vote for RC1 when you get time. Hopefully we can officially release
Apache Kafka 2.1.0 next week.

Cheers,
Dong

On Tue, Nov 6, 2018 at 11:28 AM Dong Lin  wrote:

> Hey Satish,
>
> Yes! We will have another RC to include e.g.
> https://github.com/apache/kafka/pull/5857.
>
> Thanks,
> Dong
>
> On Mon, Nov 5, 2018 at 8:14 PM Satish Duggana 
> wrote:
>
>> Hi Dong,
>> Is there a RC1 planned with configs documentation fixes and
>> https://github.com/apache/kafka/pull/5857 ?
>>
>> Thanks,
>> Satish.
>> On Thu, Nov 1, 2018 at 4:05 PM Jakub Scholz  wrote:
>> >
>> > +1 (non-binding) ... I used the staged binaries and checked it with
>> > different clients.
>> >
>> > On Wed, Oct 24, 2018 at 10:17 AM Dong Lin  wrote:
>> >
>> > > Hello Kafka users, developers and client-developers,
>> > >
>> > > This is the first candidate for feature release of Apache Kafka 2.1.0.
>> > >
>> > > This is a major version release of Apache Kafka. It includes 28 new
>> KIPs
>> > > and
>> > >
>> > > critical bug fixes. Please see the Kafka 2.1.0 release plan for more
>> > > details:
>> > >
>> > > *
>> > >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044*
>> > > <
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
>> > > >
>> > >
>> > > Here are a few notable highlights:
>> > >
>> > > - Java 11 support
>> > > - Support for Zstandard, which achieves compression comparable to
>> gzip with
>> > > higher compression and especially decompression speeds(KIP-110)
>> > > - Avoid expiring committed offsets for active consumer group (KIP-211)
>> > > - Provide Intuitive User Timeouts in The Producer (KIP-91)
>> > > - Kafka's replication protocol now supports improved fencing of
>> zombies.
>> > > Previously, under certain rare conditions, if a broker became
>> partitioned
>> > > from Zookeeper but not the rest of the cluster, then the logs of
>> replicated
>> > > partitions could diverge and cause data loss in the worst case
>> (KIP-320)
>> > > - Streams API improvements (KIP-319, KIP-321, KIP-330, KIP-353,
>> KIP-356)
>> > > - Admin script and admin client API improvements to simplify admin
>> > > operation (KIP-231, KIP-308, KIP-322, KIP-324, KIP-338, KIP-340)
>> > > - DNS handling improvements (KIP-235, KIP-302)
>> > >
>> > > Release notes for the 2.1.0 release:
>> > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/RELEASE_NOTES.html
>> > >
>> > > *** Please download, test and vote ***
>> > >
>> > > * Kafka's KEYS file containing PGP keys we use to sign the release:
>> > > http://kafka.apache.org/KEYS
>> > >
>> > > * Release artifacts to be voted upon (source and binary):
>> > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/
>> > >
>> > > * Maven artifacts to be voted upon:
>> > > https://repository.apache.org/content/groups/staging/
>> > >
>> > > * Javadoc:
>> > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/javadoc/
>> > >
>> > > * Tag to be voted upon (off 2.1 branch) is the 2.1.0-rc0 tag:
>> > > https://github.com/apache/kafka/tree/2.1.0-rc0
>> > >
>> > > * Documentation:
>> > > *http://kafka.apache.org/21/documentation.html*
>> > > <http://kafka.apache.org/21/documentation.html>
>> > >
>> > > * Protocol:
>> > > http://kafka.apache.org/21/protocol.html
>> > >
>> > > * Successful Jenkins builds for the 2.1 branch:
>> > > Unit/integration tests: *
>> https://builds.apache.org/job/kafka-2.1-jdk8/38/
>> > > <https://builds.apache.org/job/kafka-2.1-jdk8/38/>*
>> > >
>> > > Please test and verify the release artifacts and submit a vote for
>> this RC,
>> > > or report any issues so we can fix them and get a new RC out ASAP.
>> Although
>> > > this release vote requires PMC votes to pass, testing, votes, and bug
>> > > reports are valuable and appreciated from everyone.
>> > >
>> > > Cheers,
>> > > Dong
>> > >
>>
>


[VOTE] 2.1.0 RC1

2018-11-09 Thread Dong Lin
Hello Kafka users, developers and client-developers,

This is the second candidate for feature release of Apache Kafka 2.1.0.

This is a major version release of Apache Kafka. It includes 28 new KIPs and
critical bug fixes. Please see the Kafka 2.1.0 release plan for more details:

*https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044*


Here are a few notable highlights:

- Java 11 support
- Support for Zstandard, which achieves compression comparable to gzip with
higher compression and especially decompression speeds(KIP-110)
- Avoid expiring committed offsets for active consumer group (KIP-211)
- Provide Intuitive User Timeouts in The Producer (KIP-91)
- Kafka's replication protocol now supports improved fencing of zombies.
Previously, under certain rare conditions, if a broker became partitioned
from Zookeeper but not the rest of the cluster, then the logs of replicated
partitions could diverge and cause data loss in the worst case (KIP-320)
- Streams API improvements (KIP-319, KIP-321, KIP-330, KIP-353, KIP-356)
- Admin script and admin client API improvements to simplify admin
operation (KIP-231, KIP-308, KIP-322, KIP-324, KIP-338, KIP-340)
- DNS handling improvements (KIP-235, KIP-302)

Release notes for the 2.1.0 release:
http://home.apache.org/~lindong/kafka-2.1.0-rc1/RELEASE_NOTES.html

*** Please download, test and vote by Thursday, Nov 15, 12 pm PT ***

* Kafka's KEYS file containing PGP keys we use to sign the release:
http://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
http://home.apache.org/~lindong/kafka-2.1.0-rc1/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/

* Javadoc:
http://home.apache.org/~lindong/kafka-2.1.0-rc1/javadoc/

* Tag to be voted upon (off 2.1 branch) is the 2.1.0-rc1 tag:
https://github.com/apache/kafka/tree/2.1.0-rc1

* Documentation:
*http://kafka.apache.org/21/documentation.html*


* Protocol:
http://kafka.apache.org/21/protocol.html

* Successful Jenkins builds for the 2.1 branch:
Unit/integration tests: *https://builds.apache.org/job/kafka-2.1-jdk8/50/
*

Please test and verify the release artifacts and submit a vote for this RC,
or report any issues so we can fix them and get a new RC out ASAP. Although
this release vote requires PMC votes to pass, testing, votes, and bug
reports are valuable and appreciated from everyone.

Cheers,
Dong


Re: [VOTE] KIP-386: Standardize on Min/Avg/Max metrics' default values

2018-11-08 Thread Dong Lin
Thanks for the KIP Stanislav. +1 (binding)

On Thu, Nov 8, 2018 at 2:40 PM Jun Rao  wrote:

> Hi, Stanislav,
>
> Thanks for the KIP. +1. I guess this only covers the Kafka metrics (not the
> Yammer metrics). It would be useful to make this clear.
>
> Jun
>
> On Tue, Nov 6, 2018 at 1:00 AM, Stanislav Kozlovski <
> stanis...@confluent.io>
> wrote:
>
> > Hey everybody,
> >
> > I'm starting a vote thread on KIP-386: Standardize on Min/Avg/Max
> metrics'
> > default values
> > <
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652345
> > >
> > In short, after the discussion thread
> >  > 4b9014fbb50f663bf14e5aec67@%3Cdev.kafka.apache.org%3E>,
> > we decided to have all min/avg/max metrics output `NaN` as a default
> value.
> >
> > --
> > Best,
> > Stanislav
> >
>


[jira] [Resolved] (KAFKA-6262) KIP-232: Detect outdated metadata by adding ControllerMetadataEpoch field

2018-11-07 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-6262.
-
Resolution: Duplicate

The design in this Jira has been moved to KIP-320.

> KIP-232: Detect outdated metadata by adding ControllerMetadataEpoch field
> -
>
> Key: KAFKA-6262
> URL: https://issues.apache.org/jira/browse/KAFKA-6262
> Project: Kafka
>  Issue Type: Bug
>        Reporter: Dong Lin
>    Assignee: Dong Lin
>Priority: Major
>
> Currently the following sequence of events may happen that causes the consumer to 
> rewind back to the earliest offset even if there is no log truncation in 
> Kafka. This can be a problem for MM by forcing MM to lag behind significantly 
> and duplicate a large amount of data.
> - Say there are three brokers 1,2,3 for a given partition P. Broker 1 is the 
> leader. Initially they are all in ISR. HW and LEO are both 10.
> - SRE does controlled shutdown for broker 1. Controller sends 
> LeaderAndIsrRequest to all three brokers so that leader = broker 2 and 
> isr_set = [broker 2, broker 3].
> - Broker 2 and 3 receive and process LeaderAndIsrRequest almost 
> instantaneously. Now broker 2 and broker 3 can accept ProduceRequest and 
> FetchRequest for the partition P. 
> However, broker 1 has not processed this LeaderAndIsrRequest due to backlog 
> in its request queue. So broker 1 still think it is leader for the partition 
> P.
> - Because there is leadership movement, a consumer receives 
> NotLeaderForPartitionException, which triggers this consumer to send 
> MetadataRequest to a randomly selected broker, say broker 2. Broker 2 tells 
> consumer that itself is the leader for partition P. Consumer fetches data of 
> partition P from broker 2. The latest data has offset 20.
> - Later this consumer receives NotLeaderForPartitionException for another 
> partition. It sends MetadataRequest to a randomly selected broker again. This 
> time it sends MetadataRequest to broker 1, which tells the consumer that 
> itself is the leader for partition P.
> - This consumer issues FetchRequest for the partition P at offset 21. Broker 
> 1 returns OffsetOutOfRangeExeption because it thinks the LogEndOffset for 
> this partition is 10.
> There are two possible solutions for this problem. The long term solution is 
> probably to include version in the MetadataResponse so that consumer knows 
> whether the metadata is outdated. This requires a KIP.
> The short term solution, which should solve the problem in most cases, is to 
> let consumer keep fetching metadata from the same (initially randomly picked) 
> broker until the connection to this broker is disconnected. The metadata 
> version will not go back in time if consumer keeps fetching metadata from the 
> same broker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7603) Producer should negotiate message format version with broker

2018-11-07 Thread Dong Lin (JIRA)
Dong Lin created KAFKA-7603:
---

 Summary: Producer should negotiate message format version with 
broker
 Key: KAFKA-7603
 URL: https://issues.apache.org/jira/browse/KAFKA-7603
 Project: Kafka
  Issue Type: Improvement
Reporter: Dong Lin


Currently the producer will always send records with the highest magic format 
version that is supported by both the producer and broker library, regardless of 
the log.message.format.version config in the broker.

This causes unnecessary message down-conversion overhead if 
log.message.format.version has not been upgraded while the producer/broker library 
has been upgraded. It is preferred for the producer to produce messages with a format 
version no higher than the log.message.format.version configured in the broker.
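
A rough sketch of the intended behavior (illustrative only, not existing producer 
code; how the broker's format version reaches the producer is left open and the 
brokerLogFormatMagic value below is hypothetical):

{code}
// Today the producer effectively uses the highest magic supported by both the
// client library and the broker's ApiVersions. The idea is to additionally cap
// it by the broker's configured log.message.format.version.
byte clientMagic = RecordBatch.CURRENT_MAGIC_VALUE;
byte brokerLogFormatMagic = 1;   // hypothetical negotiated value, e.g. broker still on an older format
byte magicToUse = (byte) Math.min(clientMagic, brokerLogFormatMagic);
{code}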



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-354 Time-based log compaction policy

2018-11-06 Thread Dong Lin
Hey Xiongqi,

Thanks for the update. A few more comments below

1) According to the definition of
kafka.log:type=LogCleaner,name=max-compaction-delay, it seems that the
metric value will be a large negative number if max.compaction.lag.ms is
MAX_LONG. Would this be a problem? Also, it seems weird that the value of
the metric is defined w.r.t. how often the log cleaner is run.

2) Not sure if we need the metric num-logs-compacted-by-max-compaction-lag
in addition to max-compaction-delay. It seems that the operator can just use
max-compaction-delay to determine whether the max.compaction.lag is
properly enforced in a quantitative manner. Also, the metric name
`num-logs-compacted-by-max-compaction-lag` is inconsistent with its
intended meaning, i.e. the number of logs that need to be compacted due to
max.compaction.lag but have not yet been compacted. So it is probably simpler
to just remove this metric.

3) The KIP currently says that "a message record has a guaranteed
upper-bound in time to become mandatory for compaction". The word
"guarantee" may be misleading because the message may still not be
compacted within max.compaction.lag after its creation. Could you clarify
the exact semantics of the max.compaction.lag.ms in the Public Interface
section?

4) The KIP's proposed change will estimate the earliest message timestamp for
un-compacted log segments. Can you explain how the broker determines whether a
segment has been compacted after the broker is restarted?

5) 2.b in the Proposed Change section provides two ways to get the timestamp. To
make the KIP easier to read for future reference, could we just mention the
method that we plan to use and move the other solution to the rejected
alternative section?

6) Based on the discussion (i.e. point 2 in the previous email), it is said
that we can assume all messages have timestamp and the feature added in
this KIP can be skipped for those messages which do not have timestamp. So
do we still need to use "segment.largestTimestamp - maxSegmentMs" in
Proposed Change section 2.a?

7) Based on the discussion (i.e. point 8 in the previous email), if this
KIP requires the user to monitor certain existing metrics for the performance
impact added in this KIP, can we list the metrics in the KIP for the user's
convenience?


Thanks,
Dong

On Mon, Oct 29, 2018 at 3:16 PM xiongqi wu  wrote:

> Hi Dong,
> I have updated the KIP to address your comments.
> One correction to previous Email:
> after offline discussion with Dong, we decided to use MAX_LONG as the default
> value for max.compaction.lag.ms.
>
>
> Xiongqi (Wesley) Wu
>
>
> On Mon, Oct 29, 2018 at 12:15 PM xiongqi wu  wrote:
>
> > Hi Dong,
> >
> > Thank you for your comment.  See my inline comments.
> > I will update the KIP shortly.
> >
> > Xiongqi (Wesley) Wu
> >
> >
> > On Sun, Oct 28, 2018 at 9:17 PM Dong Lin  wrote:
> >
> >> Hey Xiongqi,
> >>
> >> Sorry for late reply. I have some comments below:
> >>
> >> 1) As discussed earlier in the email list, if the topic is configured
> with
> >> both deletion and compaction, in some cases messages produced a long
> time
> >> ago can not be deleted based on time. This is a valid use-case because
> we
> >> actually have topic which is configured with both deletion and
> compaction
> >> policy. And we should enforce the semantics for both policy. Solution A
> >> sounds good. We do not need interface change (e.g. extra config) to
> >> enforce
> >> solution A. All we need is to update implementation so that when broker
> >> compacts a topic, if the message has timestamp (which is the common
> case),
> >> messages that are too old (based on the time-based retention config)
> will
> >> be discarded. Since this is a valid issue and it is also related to the
> >> guarantee of when a message can be deleted, can we include the solution
> of
> >> this problem in the KIP?
> >>
> > ==  This makes sense.  We can use similar approach to increase the
> log
> > start offset.
> >
> >>
> >> 2) It is probably OK to assume that all messages have timestamp. The
> >> per-message timestamp was introduced into Kafka 0.10.0 with KIP-31 and
> >> KIP-32 as of Feb 2016. Kafka 0.10.0 or earlier versions are no longer
> >> supported. Also, since the use-case for this feature is primarily for
> >> GDPR,
> >> we can assume that client library has already been upgraded to support
> >> SSL,
> >> which feature is added after KIP-31 and KIP-32.
> >>
> >>  =>  Ok. We can use message timestamp to delete expired records
> > if both compaction and retention are enabled.
> >
> >
> > 3) In Pro

[jira] [Resolved] (KAFKA-7313) StopReplicaRequest should attempt to remove future replica for the partition only if future replica exists

2018-11-06 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-7313.
-
Resolution: Fixed

> StopReplicaRequest should attempt to remove future replica for the partition 
> only if future replica exists
> --
>
> Key: KAFKA-7313
> URL: https://issues.apache.org/jira/browse/KAFKA-7313
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Dong Lin
>Assignee: Dong Lin
>Priority: Major
> Fix For: 2.0.1, 2.1.0
>
>
> This patch fixes two issues:
> 1) Currently, if a broker receives two StopReplicaRequests with delete=true for the 
> same offline replica, the first StopReplicaRequest will return 
> KafkaStorageException and the second StopReplicaRequest will return 
> ReplicaNotAvailableException. This is because the first StopReplicaRequest 
> removes the mapping (tp -> ReplicaManager.OfflinePartition) from 
> ReplicaManager.allPartitions before returning KafkaStorageException, so the 
> second StopReplicaRequest will not find this partition as offline.
> This behavior is inconsistent. And since the replica is already 
> offline and the broker will not be able to delete files for this replica, the 
> StopReplicaRequest should fail without making any change and the broker should 
> still remember that this replica is offline. 
> 2) Currently, if a broker receives a StopReplicaRequest with delete=true, the 
> broker will attempt to remove the future replica for the partition, which will 
> cause KafkaStorageException in the StopReplicaResponse if this partition does 
> not have a future replica. It is problematic to always return 
> KafkaStorageException in the response if the future replica does not exist.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema

2018-11-06 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin reopened KAFKA-7481:
-

> Consider options for safer upgrade of offset commit value schema
> 
>
> Key: KAFKA-7481
> URL: https://issues.apache.org/jira/browse/KAFKA-7481
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 2.1.0
>
>
> KIP-211 and KIP-320 add new versions of the offset commit value schema. The 
> use of the new schema version is controlled by the 
> `inter.broker.protocol.version` configuration.  Once the new inter-broker 
> version is in use, it is not possible to downgrade since the older brokers 
> will not be able to parse the new schema. 
> The options at the moment are the following:
> 1. Do nothing. Users can try the new version and keep 
> `inter.broker.protocol.version` locked to the old release. Downgrade will 
> still be possible, but users will not be able to test new capabilities which 
> depend on inter-broker protocol changes.
> 2. Instead of using `inter.broker.protocol.version`, we could use 
> `message.format.version`. This would basically extend the use of this config 
> to apply to all persistent formats. The advantage is that it allows users to 
> upgrade the broker and begin using the new inter-broker protocol while still 
> allowing downgrade. But features which depend on the persistent format could 
> not be tested.
> Any other options?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema

2018-11-06 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-7481.
-
Resolution: Fixed

> Consider options for safer upgrade of offset commit value schema
> 
>
> Key: KAFKA-7481
> URL: https://issues.apache.org/jira/browse/KAFKA-7481
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 2.1.0
>
>
> KIP-211 and KIP-320 add new versions of the offset commit value schema. The 
> use of the new schema version is controlled by the 
> `inter.broker.protocol.version` configuration.  Once the new inter-broker 
> version is in use, it is not possible to downgrade since the older brokers 
> will not be able to parse the new schema. 
> The options at the moment are the following:
> 1. Do nothing. Users can try the new version and keep 
> `inter.broker.protocol.version` locked to the old release. Downgrade will 
> still be possible, but users will not be able to test new capabilities which 
> depend on inter-broker protocol changes.
> 2. Instead of using `inter.broker.protocol.version`, we could use 
> `message.format.version`. This would basically extend the use of this config 
> to apply to all persistent formats. The advantage is that it allows users to 
> upgrade the broker and begin using the new inter-broker protocol while still 
> allowing downgrade. But features which depend on the persistent format could 
> not be tested.
> Any other options?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7559) ConnectStandaloneFileTest system tests do not pass

2018-11-06 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-7559.
-
Resolution: Fixed

> ConnectStandaloneFileTest system tests do not pass
> --
>
> Key: KAFKA-7559
> URL: https://issues.apache.org/jira/browse/KAFKA-7559
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Stanislav Kozlovski
>Assignee: Randall Hauch
>Priority: Major
> Fix For: 2.0.1, 2.1.0
>
>
> Both tests `test_skip_and_log_to_dlq` and `test_file_source_and_sink` under 
> `kafkatest.tests.connect.connect_test.ConnectStandaloneFileTest` fail with 
> error messages similar to:
> "TimeoutError: Kafka Connect failed to start on node: ducker@ducker04 in 
> condition mode: LISTEN"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] 2.1.0 RC0

2018-11-06 Thread Dong Lin
Hey Satish,

Yes! We will have another RC to include e.g.
https://github.com/apache/kafka/pull/5857.

Thanks,
Dong

On Mon, Nov 5, 2018 at 8:14 PM Satish Duggana 
wrote:

> Hi Dong,
> Is there a RC1 planned with configs documentation fixes and
> https://github.com/apache/kafka/pull/5857 ?
>
> Thanks,
> Satish.
> On Thu, Nov 1, 2018 at 4:05 PM Jakub Scholz  wrote:
> >
> > +1 (non-binding) ... I used the staged binaries and checked it with
> > different clients.
> >
> > On Wed, Oct 24, 2018 at 10:17 AM Dong Lin  wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the first candidate for feature release of Apache Kafka 2.1.0.
> > >
> > > This is a major version release of Apache Kafka. It includes 28 new
> KIPs
> > > and
> > >
> > > critical bug fixes. Please see the Kafka 2.1.0 release plan for more
> > > details:
> > >
> > > *
> > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044*
> > > <
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
> > > >
> > >
> > > Here are a few notable highlights:
> > >
> > > - Java 11 support
> > > - Support for Zstandard, which achieves compression comparable to gzip
> with
> > > higher compression and especially decompression speeds(KIP-110)
> > > - Avoid expiring committed offsets for active consumer group (KIP-211)
> > > - Provide Intuitive User Timeouts in The Producer (KIP-91)
> > > - Kafka's replication protocol now supports improved fencing of
> zombies.
> > > Previously, under certain rare conditions, if a broker became
> partitioned
> > > from Zookeeper but not the rest of the cluster, then the logs of
> replicated
> > > partitions could diverge and cause data loss in the worst case
> (KIP-320)
> > > - Streams API improvements (KIP-319, KIP-321, KIP-330, KIP-353,
> KIP-356)
> > > - Admin script and admin client API improvements to simplify admin
> > > operation (KIP-231, KIP-308, KIP-322, KIP-324, KIP-338, KIP-340)
> > > - DNS handling improvements (KIP-235, KIP-302)
> > >
> > > Release notes for the 2.1.0 release:
> > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote ***
> > >
> > > * Kafka's KEYS file containing PGP keys we use to sign the release:
> > > http://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/
> > >
> > > * Javadoc:
> > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/javadoc/
> > >
> > > * Tag to be voted upon (off 2.1 branch) is the 2.1.0-rc0 tag:
> > > https://github.com/apache/kafka/tree/2.1.0-rc0
> > >
> > > * Documentation:
> > > *http://kafka.apache.org/21/documentation.html*
> > > <http://kafka.apache.org/21/documentation.html>
> > >
> > > * Protocol:
> > > http://kafka.apache.org/21/protocol.html
> > >
> > > * Successful Jenkins builds for the 2.1 branch:
> > > Unit/integration tests: *
> https://builds.apache.org/job/kafka-2.1-jdk8/38/
> > > <https://builds.apache.org/job/kafka-2.1-jdk8/38/>*
> > >
> > > Please test and verify the release artifacts and submit a vote for
> this RC,
> > > or report any issues so we can fix them and get a new RC out ASAP.
> Although
> > > this release vote requires PMC votes to pass, testing, votes, and bug
> > > reports are valuable and appreciated from everyone.
> > >
> > > Cheers,
> > > Dong
> > >
>


Re: [DISCUSS] KIP-370: Remove Orphan Partitions

2018-10-28 Thread Dong Lin
Hey Xiongqi,

Thanks for the KIP. Here are some comments:

1) The KIP provides two motivations for the timeout/correction phase. One
motivation is to handle outdated requests. Would this still be an issue
after KIP-380? The second motivation seems to be mainly for performance
optimization when there is reassignment. In general we expect data movement
when we reassign partitions to new brokers. So this is probably not a
strong reason for adding a new config.

2) The KIP says "Adding metrics to keep track of the number of orphan
partitions and the size of these orphan partitions". Can you add the
specification of these new metrics? Here is an example doc at
https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics
.

Thanks,
Dong

On Thu, Sep 20, 2018 at 5:40 PM xiongqi wu  wrote:

> Colin,
>
> Thanks for the comment.
> 1)
> auto.orphan.partition.removal.delay.ms refers to the timeout since the first
> leader and ISR request was received. The idea is that we want to wait enough
> time to receive the up-to-date LeaderAndIsr request and any old or new
> partition reassignment requests.
>
> 2)
> Is there any logic to remove the partition folders on disk?  I can only
> find references to removing older log segments, but not the folder, in the
> KIP.
> ==> yes, the plan is to remove partition folders as well.
>
> I will update the KIP to make it more clear.
>
>
> Xiongqi (Wesley) Wu
>
>
> On Thu, Sep 20, 2018 at 5:02 PM Colin McCabe  wrote:
>
> > Hi Xiongqi,
> >
> > Thanks for the KIP.
> >
> > Can you be a bit more clear what the timeout
> > auto.orphan.partition.removal.delay.ms refers to?  Is the timeout
> > measured since the partition was supposed to be on the broker?  Or is the
> > timeout measured since the broker started up?
> >
> > Is there any logic to remove the partition folders on disk?  I can only
> > find references to removing older log segments, but not the folder, in
> the
> > KIP.
> >
> > best,
> > Colin
> >
> > On Wed, Sep 19, 2018, at 10:53, xiongqi wu wrote:
> > > Any comments?
> > >
> > > Xiongqi (Wesley) Wu
> > >
> > >
> > > On Mon, Sep 10, 2018 at 3:04 PM xiongqi wu 
> wrote:
> > >
> > > > Here is the implementation for the KIP 370.
> > > >
> > > >
> > > >
> >
> https://github.com/xiowu0/kafka/commit/f1bd3085639f41a7af02567550a8e3018cfac3e9
> > > >
> > > >
> > > > The purpose is to do one time cleanup (after a configured delay) of
> > orphan
> > > > partitions when a broker starts up.
> > > >
> > > >
> > > > Xiongqi (Wesley) Wu
> > > >
> > > >
> > > > On Wed, Sep 5, 2018 at 10:51 AM xiongqi wu 
> > wrote:
> > > >
> > > >>
> > > >> This KIP enables broker to remove orphan partitions automatically.
> > > >>
> > > >>
> > > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-370%3A+Remove+Orphan+Partitions
> > > >>
> > > >>
> > > >> Xiongqi (Wesley) Wu
> > > >>
> > > >
> >
>


Re: [VOTE] KIP-380: Detect outdated control requests and bounced brokers using broker generation

2018-10-28 Thread Dong Lin
Thanks for the updated KIP.

+1 (binding)

On Wed, Oct 24, 2018 at 4:52 PM Patrick Huang  wrote:

> Hi Jun,
>
> Sure. I already updated the KIP. Thanks!
>
> Best,
> Zhanxiang (Patrick) Huang
>
> 
> From: Jun Rao 
> Sent: Wednesday, October 24, 2018 14:17
> To: dev
> Subject: Re: [VOTE] KIP-380: Detect outdated control requests and bounced
> brokers using broker generation
>
> Hi, Patrick,
>
> Could you update the KIP with the changes to ControlledShutdownRequest
> based on the discussion thread?
>
> Thanks,
>
> Jun
>
>
> On Sun, Oct 21, 2018 at 2:25 PM, Mickael Maison 
> wrote:
>
> > +1( non-binding)
> > Thanks for the KIP!
> >
> > On Sun, Oct 21, 2018, 03:31 Harsha Chintalapani  wrote:
> >
> > > +1(binding). LGTM.
> > > -Harsha
> > > On Oct 20, 2018, 4:49 PM -0700, Dong Lin , wrote:
> > > > Thanks much for the KIP Patrick. Looks pretty good.
> > > >
> > > > +1 (binding)
> > > >
> > > > On Fri, Oct 19, 2018 at 10:17 AM Patrick Huang 
> > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I would like to call for a vote on KIP-380:
> > > > >
> > > > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 380%3A+Detect+outdated+control+requests+and+bounced+brokers+using+broker+
> > generation
> > > > >
> > > > > Here is the discussion thread:
> > > > >
> > > > >
> > > https://lists.apache.org/thread.html/2497114df64993342eaf9c78c0f14b
> > f8c1795bc3305f13b03dd39afd@%3Cdev.kafka.apache.org%3E
> > > > > KIP-380
> > > > > <
> > > https://lists.apache.org/thread.html/2497114df64993342eaf9c78c0f14b
> > f8c1795bc3305f13b03dd39afd@%3Cdev.kafka.apache.org%3EKIP-380
> > > >:
> > > > > Detect outdated control requests and bounced ...<
> > > > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 380%3A+Detect+outdated+control+requests+and+bounced+brokers+using+broker+
> > generation
> > > > > >
> > > > > Note: Normalizing the schema is a good-to-have optimization because
> > the
> > > > > memory footprint for the control requests hinders the controller
> from
> > > > > scaling up if we have many topics with large partition counts.
> > > > > cwiki.apache.org
> > > > >
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Zhanxiang (Patrick) Huang
> > > > >
> > >
> >
>


Re: [DISCUSS] KIP-354 Time-based log compaction policy

2018-10-28 Thread Dong Lin
mum time.
> > > > > > >>
> > > > > > >> One popular reason that maximum time for deletion is desirable
> > > right
> > > > > now
> > > > > > >> is
> > > > > > >> GDPR with
> > > > > > >> PII. But we're not proposing any GDPR awareness in kafka, just
> > > being
> > > > > > able
> > > > > > >> to guarantee
> > > > > > >> a max time where a tombstoned key will be removed from the
> > > compacted
> > > > > > >> topic.
> > > > > > >>
> > > > > > >> on 2)
> > > > > > >> huh, i thought it kept track of the first dirty segment and
> > didn't
> > > > > > >> recompact older "clean" ones.
> > > > > > >> But I didn't look at code or test for that.
> > > > > > >>
> > > > > > >> On Fri, Aug 17, 2018 at 10:57 AM xiongqi wu <
> > xiongq...@gmail.com>
> > > > > > wrote:
> > > > > > >>
> > > > > > >> > 1, Owner of data (in this sense, kafka is the not the owner
> of
> > > data)
> > > > > > >> > should keep track of lifecycle of the data in some external
> > > > > > storage/DB.
> > > > > > >> > The owner determines when to delete the data and send the
> > delete
> > > > > > >> request to
> > > > > > >> > kafka. Kafka doesn't know about the content of data but to
> > > provide a
> > > > > > >> mean
> > > > > > >> > for deletion.
> > > > > > >> >
> > > > > > >> > 2 , each time compaction runs, it will start from first
> > > segments (no
> > > > > > >> > matter if it is compacted or not). The time estimation here
> is
> > > only
> > > > > > used
> > > > > > >> > to determine whether we should run compaction on this log
> > > partition.
> > > > > > So
> > > > > > >> we
> > > > > > >> > only need to estimate uncompacted segments.
> > > > > > >> >
> > > > > > >> > On Thu, Aug 16, 2018 at 5:35 PM, Dong Lin <
> > lindon...@gmail.com>
> > > > > > wrote:
> > > > > > >> >
> > > > > > >> > > Hey Xiongqi,
> > > > > > >> > >
> > > > > > >> > > Thanks for the update. I have two questions for the latest
> > > KIP.
> > > > > > >> > >
> > > > > > >> > > 1) The motivation section says that one use case is to
> > delete
> > > PII
> > > > > > >> > (Personal
> > > > > > >> > > Identifiable information) data within 7 days while keeping
> > > non-PII
> > > > > > >> > > indefinitely in compacted format. I suppose the use-case
> > > depends
> > > > > on
> > > > > > >> the
> > > > > > >> > > application to determine when to delete those PII data.
> > Could
> > > you
> > > > > > >> explain
> > > > > > >> > > how can application reliably determine the set of keys
> that
> > > should
> > > > > > be
> > > > > > >> > > deleted? Is application required to always messages from
> the
> > > topic
> > > > > > >> after
> > > > > > >> > > every restart and determine the keys to be deleted by
> > looking
> > > at
> > > > > > >> message
> > > > > > >> > > timestamp, or is application supposed to persist the key->
> > > > > timstamp
> > > > > > >> > > information in a separate persistent storage system?
> > > > > > >> > >
> > > > > > >> > > 2) It is mentioned in the KIP that "we only need to
> estimate
> > > > > > earliest
> > > > > > >> > > message timestamp for un-compacted log segments because
> the

[VOTE] 2.1.0 RC0

2018-10-24 Thread Dong Lin
Hello Kafka users, developers and client-developers,

This is the first candidate for feature release of Apache Kafka 2.1.0.

This is a major version release of Apache Kafka. It includes 28 new KIPs and
critical bug fixes. Please see the Kafka 2.1.0 release plan for more
details:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044


Here are a few notable highlights:

- Java 11 support
- Support for Zstandard, which achieves compression comparable to gzip with
higher compression and especially decompression speeds(KIP-110)
- Avoid expiring committed offsets for active consumer group (KIP-211)
- Provide Intuitive User Timeouts in The Producer (KIP-91)
- Kafka's replication protocol now supports improved fencing of zombies.
Previously, under certain rare conditions, if a broker became partitioned
from Zookeeper but not the rest of the cluster, then the logs of replicated
partitions could diverge and cause data loss in the worst case (KIP-320)
- Streams API improvements (KIP-319, KIP-321, KIP-330, KIP-353, KIP-356)
- Admin script and admin client API improvements to simplify admin
operation (KIP-231, KIP-308, KIP-322, KIP-324, KIP-338, KIP-340)
- DNS handling improvements (KIP-235, KIP-302)

Release notes for the 2.1.0 release:
http://home.apache.org/~lindong/kafka-2.1.0-rc0/RELEASE_NOTES.html

*** Please download, test and vote ***

* Kafka's KEYS file containing PGP keys we use to sign the release:
http://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
http://home.apache.org/~lindong/kafka-2.1.0-rc0/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/

* Javadoc:
http://home.apache.org/~lindong/kafka-2.1.0-rc0/javadoc/

* Tag to be voted upon (off 2.1 branch) is the 2.1.0-rc0 tag:
https://github.com/apache/kafka/tree/2.1.0-rc0

* Documentation:
http://kafka.apache.org/21/documentation.html


* Protocol:
http://kafka.apache.org/21/protocol.html

* Successful Jenkins builds for the 2.1 branch:
Unit/integration tests: https://builds.apache.org/job/kafka-2.1-jdk8/38/

Please test and verify the release artifacts and submit a vote for this RC,
or report any issues so we can fix them and get a new RC out ASAP. Although
this release vote requires PMC votes to pass, testing, votes, and bug
reports are valuable and appreciated from everyone.

Cheers,
Dong


Re: [VOTE] KIP-380: Detect outdated control requests and bounced brokers using broker generation

2018-10-20 Thread Dong Lin
Thanks much for the KIP Patrick. Looks pretty good.

+1 (binding)

On Fri, Oct 19, 2018 at 10:17 AM Patrick Huang  wrote:

> Hi All,
>
> I would like to call for a vote on KIP-380:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-380%3A+Detect+outdated+control+requests+and+bounced+brokers+using+broker+generation
>
> Here is the discussion thread:
>
> https://lists.apache.org/thread.html/2497114df64993342eaf9c78c0f14bf8c1795bc3305f13b03dd39afd@%3Cdev.kafka.apache.org%3E
> Note: Normalizing the schema is a good-to-have optimization because the
> memory footprint for the control requests hinders the controller from
> scaling up if we have many topics with large partition counts.
>
>
>
> Thanks,
> Zhanxiang (Patrick) Huang
>


[jira] [Resolved] (KAFKA-7464) Fail to shutdown ReplicaManager during broker cleaned shutdown

2018-10-18 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-7464.
-
Resolution: Fixed

> Fail to shutdown ReplicaManager during broker cleaned shutdown
> --
>
> Key: KAFKA-7464
> URL: https://issues.apache.org/jira/browse/KAFKA-7464
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Zhanxiang (Patrick) Huang
>Assignee: Zhanxiang (Patrick) Huang
>Priority: Critical
> Fix For: 2.1.0
>
>
> In 2.0 deployment, we saw the following log when shutting down the 
> ReplicaManager in broker cleaned shutdown:
> {noformat}
> 2018/09/27 08:22:18.699 WARN [CoreUtils$] [Thread-1] [kafka-server] [] null
> java.lang.IllegalArgumentException: null
> at java.nio.Buffer.position(Buffer.java:244) ~[?:1.8.0_121]
> at sun.nio.ch.IOUtil.write(IOUtil.java:68) ~[?:1.8.0_121]
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) 
> ~[?:1.8.0_121]
> at 
> org.apache.kafka.common.network.SslTransportLayer.flush(SslTransportLayer.java:214)
>  ~[kafka-clients-2.0.0.22.jar:?]
> at 
> org.apache.kafka.common.network.SslTransportLayer.close(SslTransportLayer.java:164)
>  ~[kafka-clients-2.0.0.22.jar:?]
> at org.apache.kafka.common.utils.Utils.closeAll(Utils.java:806) 
> ~[kafka-clients-2.0.0.22.jar:?]
> at 
> org.apache.kafka.common.network.KafkaChannel.close(KafkaChannel.java:107) 
> ~[kafka-clients-2.0.0.22.jar:?]
> at 
> org.apache.kafka.common.network.Selector.doClose(Selector.java:751) 
> ~[kafka-clients-2.0.0.22.jar:?]
> at org.apache.kafka.common.network.Selector.close(Selector.java:739) 
> ~[kafka-clients-2.0.0.22.jar:?]
> at org.apache.kafka.common.network.Selector.close(Selector.java:701) 
> ~[kafka-clients-2.0.0.22.jar:?]
> at org.apache.kafka.common.network.Selector.close(Selector.java:315) 
> ~[kafka-clients-2.0.0.22.jar:?]
> at 
> org.apache.kafka.clients.NetworkClient.close(NetworkClient.java:595) 
> ~[kafka-clients-2.0.0.22.jar:?]
> at 
> kafka.server.ReplicaFetcherBlockingSend.close(ReplicaFetcherBlockingSend.scala:107)
>  ~[kafka_2.11-2.0.0.22.jar:?]
> at 
> kafka.server.ReplicaFetcherThread.initiateShutdown(ReplicaFetcherThread.scala:108)
>  ~[kafka_2.11-2.0.0.22.jar:?]
> at 
> kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:183)
>  ~[kafka_2.11-2.0.0.22.jar:?]
> at 
> kafka.server.AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:182)
>  ~[kafka_2.11-2.0.0.22.jar:?]
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
>  ~[scala-library-2.11.12.jar:?]
> at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) 
> ~[scala-library-2.11.12.jar:?]
> at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) 
> ~[scala-library-2.11.12.jar:?]
> at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236) 
> ~[scala-library-2.11.12.jar:?]
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) 
> ~[scala-library-2.11.12.jar:?]
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:130) 
> ~[scala-library-2.11.12.jar:?]
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
>  ~[scala-library-2.11.12.jar:?]
> at 
> kafka.server.AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:182)
>  ~[kafka_2.11-2.0.0.22.jar:?]
> at 
> kafka.server.ReplicaFetcherManager.shutdown(ReplicaFetcherManager.scala:37) 
> ~[kafka_2.11-2.0.0.22.jar:?]
> at kafka.server.ReplicaManager.shutdown(ReplicaManager.scala:1471) 
> ~[kafka_2.11-2.0.0.22.jar:?]
> at 
> kafka.server.KafkaServer$$anonfun$shutdown$12.apply$mcV$sp(KafkaServer.scala:616)
>  ~[kafka_2.11-2.0.0.22.jar:?]
> at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:86) 
> ~[kafka_2.11-2.0.0.22.jar:?]
> at kafka.server.KafkaServer.shutdown(KafkaServer.scala:616) 
> ~[kafka_2.11-2.0.0.22.jar:?]
> {noformat}
> After that, we noticed that some of the replica fetcher thread fail to 
> shutdown:
> {noformat}
> 2018/09/27 08:22:46.176 ERROR [LogDirFailureChannel] 
> [ReplicaFetcherThread-26-13085] [kafka-server] [] Error while rolling log 
> segment for 

Re: [DISCUSS] Apache Kafka 2.1.0 Release Plan

2018-10-16 Thread Dong Lin
Hey everyone,

Thanks for all the contribution! Just a kind reminder that the code is now
frozen for 2.1.0 release.

Thanks,
Dong

On Mon, Oct 1, 2018 at 4:31 PM Dong Lin  wrote:

> Hey everyone,
>
> Hope things are going well!
>
> Just a kind reminder that the feature freeze time is end of day today.
> Major features (e.g. KIP implementation) need to be merged and minor
> features need to be have PR ready. After today I will move pending JIRAs to
> the next release as appropriate.
>
> Cheers,
> Dong
>
>
>
> On Wed, 26 Sep 2018 at 1:06 AM Dong Lin  wrote:
>
>> Hey Edoardo,
>>
>> Certainly, let's add it to this release since it has passed vote. KIP-81
>> was previously marked as WIP for 2.0.0 release. I updated to be WIP for
>> 2.1.0 release in
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>>  and
>> added it to
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
>> .
>>
>> Thanks,
>> Dong
>>
>> On Tue, Sep 25, 2018 at 1:52 AM Edoardo Comar  wrote:
>>
>>> Hi Dong
>>> many thanks for driving the release!
>>>
>>> KIP-81 previously voted as adopted has a ready-to-review JIRA and PR.
>>> Shall we just amend the wiki ?
>>> --
>>> Edoardo Comar
>>> IBM Event Streams
>>> IBM UK Ltd, Hursley Park, SO21 2JN
>>>
>>>
>>>
>>>
>>> From:Dong Lin 
>>> To:dev , Users ,
>>> kafka-clients 
>>> Date:25/09/2018 08:17
>>> Subject:Re: [DISCUSS] Apache Kafka 2.1.0 Release Plan
>>> --
>>>
>>>
>>>
>>> Hey everyone,
>>>
>>> According to the previously discussed schedule, I have updated
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
>>> to include all the KIPs that have passed vote at this moment to be
>>> included
>>> in Apache Kafka 2.1.0 release. Other KIPs that have not passed vote will
>>> need to be included in the next feature release.
>>>
>>> Just a reminder that the feature freeze data is Oct 1, 2018. In order to
>>> be
>>> included in the release, major features need to be merged and minor
>>> features need to be have PR ready. Any feature not in this state will be
>>> automatically moved to the next release after Oct 1.
>>>
>>> Regards,
>>> Dong
>>>
>>> On Tue, Sep 25, 2018 at 12:02 AM Dong Lin  wrote:
>>>
>>> > cc us...@kafka.apache.org and kafka-clie...@googlegroups.com
>>> >
>>> > On Sun, Sep 9, 2018 at 5:31 PM Dong Lin  wrote:
>>> >
>>> >> Hi all,
>>> >>
>>> >> I would like to be the release manager for our next time-based feature
>>> >> release 2.1.0.
>>> >>
>>> >> The recent Kafka release history can be found at
>>> >> https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan
>>> .
>>> >> The release plan (with open issues and planned KIPs) for 2.1.0 can be
>>> found
>>> >> at
>>> >>
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
>>> >> .
>>> >>
>>> >> Here are the dates we have planned for Apache Kafka 2.1.0 release:
>>> >>
>>> >> 1) KIP Freeze: Sep 24, 2018.
>>> >> A KIP must be accepted by this date in order to be considered for this
>>> >> release)
>>> >>
>>> >> 2) Feature Freeze: Oct 1, 2018
>>> >> Major features merged & working on stabilization, minor features have
>>> PR,
>>> >> release branch cut; anything not in this state will be automatically
>>> moved
>>> >> to the next release in JIRA.
>>> >>
>>> >> 3) Code Freeze: Oct 15, 2018 (Tentatively)
>>> >>
>>> >> The KIP and feature freeze date is about 3-4 weeks from now. Please
>>> plan
>>> >> accordingly for the features you want push into Apache Kafka 2.1.0
>>> release.
>>> >>
>>> >>
>>> >> Cheers,
>>> >> Dong
>>> >>
>>> >
>>>
>>>
>>>
>>> Unless stated otherwise above:
>>> IBM United Kingdom Limited - Registered in England and Wales with number
>>> 741598.
>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
>>> 3AU
>>>
>>


Re: [ANNOUNCE] New Committer: Manikumar Reddy

2018-10-11 Thread Dong Lin
Congratulations Manikumar!!

On Thu, Oct 11, 2018 at 10:39 AM Jason Gustafson  wrote:

> Hi all,
>
> The PMC for Apache Kafka has invited Manikumar Reddy as a committer and we
> are
> pleased to announce that he has accepted!
>
> Manikumar has contributed 134 commits including significant work to add
> support for delegation tokens in Kafka:
>
> KIP-48:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka
> KIP-249
> 
> :
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-249%3A+Add+Delegation+Token+Operations+to+KafkaAdminClient
>
> He has broad experience working with many of the core components in Kafka
> and he has reviewed over 80 PRs. He has also made huge progress addressing
> some of our technical debt.
>
> We appreciate the contributions and we are looking forward to more.
> Congrats Manikumar!
>
> Jason, on behalf of the Apache Kafka PMC
>


Re: [DISCUSS] KIP-380: Detect outdated control requests and bounced brokers using broker generation

2018-10-10 Thread Dong Lin
Hey Patrick,

Thanks much for the KIP. The KIP is very well written.

LGTM.  +1 (binding)

Thanks,
Dong


On Tue, Oct 9, 2018 at 11:46 PM Patrick Huang  wrote:

> Hi All,
>
> Please find the below KIP which proposes the concept of broker generation
> to resolve issues caused by controller missing broker state changes and
> broker processing outdated control requests.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-380%3A+Detect+outdated+control+requests+and+bounced+brokers+using+broker+generation
>
> All comments are appreciated.
>
> Best,
> Zhanxiang (Patrick) Huang
>


Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

2018-10-09 Thread Dong Lin
Hey Lucas,

Thanks for the KIP. Looks good overall. +1

I have two trivial comments which may be a bit useful to the reader.

- Can we include the default value for the new config in the Public Interface
section? Typically the default value of a new config is an important part
of the public interface and we usually specify it in the KIP's public interface
section.
- Can we change "whose default capacity is 20" to  "whose capacity is 20"
in the section "How are controller requests handled over the dedicated
connections"? The use of word "default" seems to suggest that this is
configurable.

Thanks,
Dong

On Mon, Jun 18, 2018 at 1:04 PM Lucas Wang  wrote:

> Hi All,
>
> I've addressed a couple of comments in the discussion thread for KIP-291,
> and
> got no objections after making the changes. Therefore I would like to start
> the voting thread.
>
> KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Have+separate+queues+for+control+requests+and+data+requests
>
> Thanks for your time!
> Lucas
>


[jira] [Created] (KAFKA-7484) Fix test SuppressionDurabilityIntegrationTest.shouldRecoverBufferAfterShutdown()

2018-10-05 Thread Dong Lin (JIRA)
Dong Lin created KAFKA-7484:
---

 Summary: Fix test 
SuppressionDurabilityIntegrationTest.shouldRecoverBufferAfterShutdown()
 Key: KAFKA-7484
 URL: https://issues.apache.org/jira/browse/KAFKA-7484
 Project: Kafka
  Issue Type: Improvement
Reporter: Dong Lin


The test 
SuppressionDurabilityIntegrationTest.shouldRecoverBufferAfterShutdown() fails 
in the 2.1.0 branch Jenkins job. See 
[https://builds.apache.org/job/kafka-2.1-jdk8/1/testReport/junit/org.apache.kafka.streams.integration/SuppressionDurabilityIntegrationTest/shouldRecoverBufferAfterShutdown_1__eosEnabled_true_/.|https://builds.apache.org/job/kafka-2.1-jdk8/1/testReport/junit/org.apache.kafka.streams.integration/SuppressionDurabilityIntegrationTest/shouldRecoverBufferAfterShutdown_1__eosEnabled_true_/]

Here is the stack trace: 

java.lang.AssertionError: Condition not met within timeout 3. Did not 
receive all 3 records from topic output-raw-shouldRecoverBufferAfterShutdown at 
org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:278) at 
org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinRecordsReceived(IntegrationTestUtils.java:462)
 at 
org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinRecordsReceived(IntegrationTestUtils.java:343)
 at 
org.apache.kafka.streams.integration.utils.IntegrationTestUtils.verifyKeyValueTimestamps(IntegrationTestUtils.java:543)
 at 
org.apache.kafka.streams.integration.SuppressionDurabilityIntegrationTest.verifyOutput(SuppressionDurabilityIntegrationTest.java:239)
 at 
org.apache.kafka.streams.integration.SuppressionDurabilityIntegrationTest.shouldRecoverBufferAfterShutdown(SuppressionDurabilityIntegrationTest.java:206)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at 
org.junit.runners.ParentRunner.run(ParentRunner.java:363) at 
org.junit.runners.Suite.runChild(Suite.java:128) at 
org.junit.runners.Suite.runChild(Suite.java:27) at 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at 
org.junit.rules.RunRules.evaluate(RunRules.java:20) at 
org.junit.runners.ParentRunner.run(ParentRunner.java:363) at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


New release branch 2.1.0

2018-10-04 Thread Dong Lin
Hello Kafka developers and users,

As promised in the release plan
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044,
we now have a release branch for 2.1.0 release. Trunk will soon be bumped
to 2.2.0-SNAPSHOT.

I'll be going over the JIRAs to move every non-blocker from this release to
the next release.

>From this point, most changes should go to trunk. Blockers (existing and
new that we discover while testing the release) will be double-committed.
Please discuss with your reviewer whether your PR should go to trunk or to
trunk+release so they can merge accordingly.

Please help us test the release!

Thanks!
Dong


[jira] [Resolved] (KAFKA-7441) Allow LogCleanerManager.resumeCleaning() to be used concurrently

2018-10-04 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-7441.
-
Resolution: Fixed

> Allow LogCleanerManager.resumeCleaning() to be used concurrently
> 
>
> Key: KAFKA-7441
> URL: https://issues.apache.org/jira/browse/KAFKA-7441
> Project: Kafka
>  Issue Type: Improvement
>Reporter: xiongqi wu
>Assignee: xiongqi wu
>Priority: Blocker
> Fix For: 2.1.0
>
>
> LogCleanerManager provides APIs abortAndPauseCleaning(TopicPartition) and 
> resumeCleaning(Iterable[TopicPartition]). The abortAndPauseCleaning(...) will 
> do nothing if the partition is already in paused state. And 
> resumeCleaning(..) will always clear the state for the partition if the 
> partition is in paused state. Also, resumeCleaning(...) will throw 
> IllegalStateException if the partition does not have any state (e.g. its 
> state is cleared).
>  
> This will cause a problem in the following scenario:
> 1) Background thread invokes LogManager.cleanupLogs() which in turn does  
> abortAndPauseCleaning(...) for a given partition. Now this partition is in 
> paused state.
> 2) User requests deletion for this partition. Controller sends 
> StopReplicaRequest with delete=true for this partition. RequestHandlerThread 
> calls abortAndPauseCleaning(...) followed by resumeCleaning(...) for the same 
> partition. Now there is no state for this partition.
> 3) Background thread invokes resumeCleaning(...) as part of 
> LogManager.cleanupLogs(). Because there is no state for this partition, it 
> causes IllegalStateException.
>  
> This issue can also happen before KAFKA-7322 if unclean leader election 
> triggers log truncation for a partition at the same time that the partition 
> is deleted upon user request. But unclean leader election is very rare. The 
> fix made in https://issues.apache.org/jira/browse/KAFKA-7322 makes this issue 
> much more frequent.
> The solution is to record the number of pauses.
>  
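
For illustration, a minimal sketch of the "record the number of pauses" idea (plain Java, not Kafka's actual LogCleanerManager code; the names are invented for the example): abortAndPauseCleaning() increments a per-partition counter and resumeCleaning() decrements it, so concurrent pause/resume pairs no longer clear each other's state.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only.
public final class PauseCounter<P> {
    private final Map<P, Integer> pauseCounts = new HashMap<>();

    /** Pause cleaning for a partition; nested pauses stack up. */
    public synchronized void abortAndPauseCleaning(P partition) {
        pauseCounts.merge(partition, 1, Integer::sum);
    }

    /** Resume cleaning; the paused state is only cleared once every pause has been resumed. */
    public synchronized void resumeCleaning(P partition) {
        Integer count = pauseCounts.get(partition);
        if (count == null)
            throw new IllegalStateException("Partition " + partition + " is not paused");
        if (count == 1)
            pauseCounts.remove(partition);
        else
            pauseCounts.put(partition, count - 1);
    }
}
{code}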



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7096) Consumer should drop the data for unassigned topic partitions

2018-10-03 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-7096.
-
Resolution: Fixed

> Consumer should drop the data for unassigned topic partitions
> -
>
> Key: KAFKA-7096
> URL: https://issues.apache.org/jira/browse/KAFKA-7096
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Mayuresh Gharat
>Assignee: Mayuresh Gharat
>Priority: Major
> Fix For: 2.2.0
>
>
> Currently, if a client has assigned topics T1, T2, T3 and calls poll(), the 
> poll might fetch data for partitions for all 3 topics T1, T2, T3. Now if the 
> client unassigns some topics (for example T3) and calls poll() we still hold 
> the data (for T3) in the completedFetches queue until we actually reach the 
> buffered data for the unassigned Topics (T3 in our example) on subsequent 
> poll() calls, at which point we drop that data. This process of holding the 
> data is unnecessary.
> When a client creates a topic, it takes time for the broker to fetch ACLs for 
> the topic. But during this time, the client will issue a FetchRequest for the 
> topic and will get a response for the partitions of this topic. The response 
> consists of a TopicAuthorizationException for each of the partitions. This 
> response for each partition is wrapped with a completedFetch and added to the 
> completedFetches queue. Now when the client calls the next poll() it sees the 
> TopicAuthorizationException from the first buffered CompletedFetch. At this 
> point the client chooses to sleep for 1.5 min as a backoff (as per the 
> design), hoping that the Broker fetches the ACL from ACL store in the 
> meantime. Actually the Broker has already fetched the ACL by this time. When 
> the client calls poll() after the sleep, it again sees the 
> TopicAuthorizationException from the second completedFetch and it sleeps 
> again. So it takes (1.5 * 60 * partitions) seconds before the client can see 
> any data. With this patch, when the client sees the first 
> TopicAuthorizationException, it can call assign(emptySet), which will get rid 
> of the buffered completedFetches (those with TopicAuthorizationException) and 
> it can again call assign(TopicPartitions) before calling poll(). With this 
> patch we found that client was able to get the records as soon as the Broker 
> fetched the ACLs from ACL store.
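
For illustration, a hedged sketch of the workaround described above (the helper name, topic set and timeouts are placeholders, not the actual patch): on a TopicAuthorizationException the assignment is cleared, which drops the buffered completedFetches, and then restored before polling again.

{code:java}
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.TopicAuthorizationException;

import java.time.Duration;
import java.util.Collections;
import java.util.Set;

public class ReassignOnAuthError {
    static ConsumerRecords<String, String> pollDroppingStaleFetches(
            KafkaConsumer<String, String> consumer, Set<TopicPartition> partitions) {
        try {
            return consumer.poll(Duration.ofSeconds(1));
        } catch (TopicAuthorizationException e) {
            // Clearing the assignment discards buffered fetches that carry the stale error.
            consumer.assign(Collections.<TopicPartition>emptySet());
            consumer.assign(partitions); // re-assign and poll again
            return consumer.poll(Duration.ofSeconds(1));
        }
    }
}
{code}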



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-6045) All access to log should fail if log is closed

2018-10-03 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-6045.
-
Resolution: Won't Fix

> All access to log should fail if log is closed
> --
>
> Key: KAFKA-6045
> URL: https://issues.apache.org/jira/browse/KAFKA-6045
> Project: Kafka
>  Issue Type: Bug
>        Reporter: Dong Lin
>Priority: Major
> Fix For: 2.1.0
>
>
> After log.close() or log.closeHandlers() is called for a given log, all uses 
> of the Log's API should fail with a proper exception. For example, 
> log.appendAsLeader() should throw KafkaStorageException. APIs such as 
> Log.activeProducersWithLastSequence() should also fail but not necessarily 
> with KafkaStorageException, since the KafkaStorageException indicates failure 
> to access disk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-5950) AdminClient should retry based on returned error codes

2018-10-03 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-5950.
-
Resolution: Fixed

Per comment in the PR, it appears that the issue has been addressed in 
KAFKA-6299 with PR [https://github.com/apache/kafka/pull/4295.] Please feel 
free to re-open if this is still an issue.

> AdminClient should retry based on returned error codes
> --
>
> Key: KAFKA-5950
> URL: https://issues.apache.org/jira/browse/KAFKA-5950
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Andrey Dyachkov
>Priority: Major
>  Labels: needs-kip
> Fix For: 2.2.0
>
>
> The AdminClient only retries if the request fails with a retriable error. If 
> a response is returned, then a retry is never attempted. This is inconsistent 
> with other clients that check the error codes in the response and retry for 
> each retriable error code.
> We should consider if it makes sense to adopt this behaviour in the 
> AdminClient so that users don't have to do it themselves.
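
For illustration, a hedged sketch of what response-level retry looks like when done by the caller today (the helper and backoff are invented for the example; this is not the proposed AdminClient behaviour):

{code:java}
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.errors.RetriableException;

import java.util.Collections;
import java.util.concurrent.ExecutionException;

public class RetryOnResponseError {
    // Retry topic creation when the per-topic error in the response is retriable.
    static void createWithRetries(AdminClient admin, NewTopic topic, int maxAttempts)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                admin.createTopics(Collections.singleton(topic)).all().get();
                return;
            } catch (ExecutionException e) {
                if (e.getCause() instanceof RetriableException && attempt < maxAttempts)
                    Thread.sleep(100L * attempt); // simple linear backoff, then try again
                else
                    throw e;
            }
        }
    }
}
{code}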



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-5857) Excessive heap usage on controller node during reassignment

2018-10-03 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-5857.
-
Resolution: Cannot Reproduce

In general we need a heap dump to investigate the issue. Since we don't have a 
heap dump from when the issue happened and there is a significant improvement in 
the heap size used by the controller in 1.1.0 per Ismael's comment, it seems 
reasonable to just close this JIRA. We can re-open it if the same issue still 
occurs with Kafka 1.1.0 or later and we have a heap dump.

> Excessive heap usage on controller node during reassignment
> ---
>
> Key: KAFKA-5857
> URL: https://issues.apache.org/jira/browse/KAFKA-5857
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.11.0.0
> Environment: CentOs 7, Java 1.8
>Reporter: Raoufeh Hashemian
>Priority: Major
>  Labels: reliability
> Fix For: 2.1.0
>
> Attachments: CPU.png, disk_write_x.png, memory.png, 
> reassignment_plan.txt
>
>
> I was trying to expand our kafka cluster of 6 broker nodes to 12 broker 
> nodes. 
> Before expansion, we had a single topic with 960 partitions and a replication 
> factor of 3. So each node had 480 partitions. The size of data in each node 
> was 3TB . 
> To do the expansion, I submitted a partition reassignment plan (see attached 
> file for the current/new assignments). The plan was optimized to minimize 
> data movement and be rack aware. 
> When I submitted the plan, it took approximately 3 hours for moving data from 
> old to new nodes to complete. After that, it started deleting source 
> partitions (I say this based on the number of file descriptors) and 
> rebalancing leaders, which was not successful. Meanwhile, the heap usage 
> in the controller node started to go up with a large slope (along with long 
> GC times) and it took 5 hours for the controller to go out of memory and 
> another controller started to have the same behaviour for another 4 hours. At 
> this time the zookeeper ran out of disk and the service stopped.
> To recover from this condition:
> 1) Removed zk logs to free up disk and restarted all 3 zk nodes
> 2) Deleted /kafka/admin/reassign_partitions node from zk
> 3) Had to do unclean restarts of the Kafka service on the OOM controller nodes, 
> which took 3 hours to complete. After this stage there were still 676 
> under-replicated partitions.
> 4) Do a clean restart on all 12 broker nodes.
> After step 4 , number of under replicated nodes went to 0.
> So I was wondering if this memory footprint from the controller is expected for 
> 1k partitions? Did we do something wrong or is it a bug?
> Attached are some resource usage graphs during this 30-hour event and the 
> reassignment plan. I'll try to add log files as well



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-5018) LogCleaner tests to verify behaviour of message format v2

2018-10-02 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-5018.
-
Resolution: Won't Do

> LogCleaner tests to verify behaviour of message format v2
> -
>
> Key: KAFKA-5018
> URL: https://issues.apache.org/jira/browse/KAFKA-5018
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Ismael Juma
>Priority: Major
> Fix For: 2.1.0
>
>
> It would be good to add LogCleaner tests to verify the behaviour of fields 
> like baseOffset after compaction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (KAFKA-5018) LogCleaner tests to verify behaviour of message format v2

2018-10-02 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin reopened KAFKA-5018:
-

> LogCleaner tests to verify behaviour of message format v2
> -
>
> Key: KAFKA-5018
> URL: https://issues.apache.org/jira/browse/KAFKA-5018
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Ismael Juma
>Priority: Major
> Fix For: 2.2.0
>
>
> It would be good to add LogCleaner tests to verify the behaviour of fields 
> like baseOffset after compaction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-6761) Reduce Kafka Streams Footprint

2018-10-02 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-6761.
-
Resolution: Fixed

> Reduce Kafka Streams Footprint
> --
>
> Key: KAFKA-6761
> URL: https://issues.apache.org/jira/browse/KAFKA-6761
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bill Bejeck
>Assignee: Bill Bejeck
>Priority: Major
> Fix For: 2.1.0
>
>
> The persistent storage footprint of a Kafka Streams application contains the 
> following aspects:
>  # The internal topics created on the Kafka cluster side.
>  # The materialized state stores on the Kafka Streams application instances 
> side.
> There have been some questions about reducing these footprints, especially 
> since many of them are not necessary. For example, there are redundant 
> internal topics, as well as unnecessary state stores that takes up space but 
> also affect performance. When people are pushing Streams to production with 
> high traffic, this issue would be more common and severe. Reducing the 
> footprint of Streams have clear benefits for reducing resource utilization of 
> Kafka Streams applications, and also not creating pressure on broker's 
> capacities.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-6438) NSEE while concurrently creating and deleting a topic

2018-10-02 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-6438.
-
Resolution: Won't Fix

> NSEE while concurrently creating and deleting a topic
> -
>
> Key: KAFKA-6438
> URL: https://issues.apache.org/jira/browse/KAFKA-6438
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.0.0
> Environment: kafka_2.11-1.0.0.jar
> OpenJDK Runtime Environment (build 1.8.0_102-b14), OpenJDK 64-Bit Server VM 
> (build 25.102-b14, mixed mode)
> CentOS Linux release 7.3.1611 (Core)
>Reporter: Adam Kotwasinski
>Priority: Major
>  Labels: reliability
>
> It appears that deleting a topic and creating it at the same time can cause 
> NSEE, which later results in a forced controller shutdown.
> Most probably topics are being created because consumers/producers are still 
> active (yes, this means the deletion is happening blindly).
> The main problem here (for me) is the controller switch; the data loss and 
> the following unclean election are acceptable (as we admit to deleting blindly).
> Environment description:
> 20 kafka brokers
> 80k partitions (20k topics 4partitions each)
> 3 node ZK
> Incident:
> {code:java}
> [2018-01-09 11:19:05,912] INFO [Topic Deletion Manager 6], Partition deletion 
> callback for mytopic-2,mytopic-0,mytopic-1,mytopic-3 
> (kafka.controller.TopicDeletionManager)
> [2018-01-09 11:19:06,237] INFO [Controller id=6] New leader and ISR for 
> partition mytopic-0 is {"leader":-1,"leader_epoch":1,"isr":[]} 
> (kafka.controller.KafkaController)
> [2018-01-09 11:19:06,412] INFO [Topic Deletion Manager 6], Deletion for 
> replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of 
> topic mytopic in progress (kafka.controller.TopicDeletionManager)
> [2018-01-09 11:19:07,218] INFO [Topic Deletion Manager 6], Deletion for 
> replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of 
> topic mytopic in progress (kafka.controller.TopicDeletionManager)
> [2018-01-09 11:19:07,304] INFO [Topic Deletion Manager 6], Deletion for 
> replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of 
> topic mytopic in progress (kafka.controller.TopicDeletionManager)
> [2018-01-09 11:19:07,383] INFO [Topic Deletion Manager 6], Deletion for 
> replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of 
> topic mytopic in progress (kafka.controller.TopicDeletionManager)
> [2018-01-09 11:19:07,510] INFO [Topic Deletion Manager 6], Deletion for 
> replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of 
> topic mytopic in progress (kafka.controller.TopicDeletionManager)
> [2018-01-09 11:19:07,661] INFO [Topic Deletion Manager 6], Deletion for 
> replicas 12,9,10,11 for partition mytopic-3,mytopic-0,mytopic-1,mytopic-2 of 
> topic mytopic in progress (kafka.controller.TopicDeletionManager)
> [2018-01-09 11:19:07,728] INFO [Topic Deletion Manager 6], Deletion for 
> replicas 9,10,11 for partition mytopic-0,mytopic-1,mytopic-2 of topic mytopic 
> in progress (kafka.controller.TopicDeletionManager)
> [2018-01-09 11:19:07,924] INFO [PartitionStateMachine controllerId=6] 
> Invoking state change to OfflinePartition for partitions 
> mytopic-2,mytopic-0,mytopic-1,mytopic-3 
> (kafka.controller.PartitionStateMachine)
> [2018-01-09 11:19:07,924] INFO [PartitionStateMachine controllerId=6] 
> Invoking state change to NonExistentPartition for partitions 
> mytopic-2,mytopic-0,mytopic-1,mytopic-3 
> (kafka.controller.PartitionStateMachine)
> [2018-01-09 11:19:08,592] INFO [Controller id=6] New topics: [Set(mytopic, 
> other, other2)], deleted topics: [Set()], new partition replica assignment 
> [Map(other-0 -> Vector(8), mytopic-2 -> Vector(6), mytopic-0 -> Vector(4), 
> other-2 -> Vector(10), mytopic-1 -> Vector(5), mytopic-3 -> Vector(7), 
> other-1 -> Vector(9), other-3 -> Vector(11))] 
> (kafka.controller.KafkaController)
> [2018-01-09 11:19:08,593] INFO [Controller id=6] New topic creation callback 
> for other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 
> (kafka.controller.KafkaController)
> [2018-01-09 11:19:08,596] INFO [Controller id=6] New partition creation 
> callback for 
> other-0,mytopic-2,mytopic-0,other-2,mytopic-1,mytopic-3,other-1,other-3 
> (kafka.controller.KafkaController)
> [2018-01-09 11:19:08,596] INFO [PartitionStateMachine controllerId=6] 
> Invoking state change to NewPartition for partitions 
> other-0,m

[jira] [Resolved] (KAFKA-6415) KafkaLog4jAppender deadlocks when logging from producer network thread

2018-10-02 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-6415.
-
Resolution: Fixed

> KafkaLog4jAppender deadlocks when logging from producer network thread
> --
>
> Key: KAFKA-6415
> URL: https://issues.apache.org/jira/browse/KAFKA-6415
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 2.1.0
>
>
> When a log entry is appended to a Kafka topic using KafkaLog4jAppender, the 
> producer.send operation may block waiting for metadata. This can result in 
> deadlocks in a couple of scenarios if a log entry from the producer network 
> thread is also at a log level that results in the entry being appended to a 
> Kafka topic.
> 1. Producer's network thread will attempt to send data to a Kafka topic and 
> this is unsafe since producer.send may block waiting for metadata, causing a 
> deadlock since the thread will not process the metadata request/response.
> 2. KafkaLog4jAppender#append is invoked while holding the lock of the logger. 
> So the thread waiting for metadata in the initial send will be holding the 
> logger lock. If the producer network thread has a log entry that needs to be 
> appended, it will attempt to acquire the logger lock and deadlock.
> This was probably the case right from the beginning when KafkaLog4jAppender 
> was introduced, but did not cause any issues so far since there were only 
> debug log entries in that path which were not logged to a Kafka topic by any 
> of the tests. A recent info level log entry introduced by the commit 
> https://github.com/apache/kafka/commit/a3aea3cf4dbedb293f2d7859e0298bebc8e2185f
>  is causing system test failures in log4j_appender_test.py due to the 
> deadlock.
> The asynchronous append case can be fixed by moving all send operations to a 
> separate thread. But KafkaLog4jAppender also has a syncSend option which 
> blocks append while holding the logger lock until the send completes. Not 
> sure how this can be fixed if we want to support log appends from the 
> producer network thread.
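
For illustration, here is a minimal, hypothetical Java sketch (class and names are not from the actual KafkaLog4jAppender) of the decoupling idea above: append() only enqueues the entry, and a dedicated background thread performs the potentially blocking producer.send(), so no send ever runs on, or waits for, the producer network thread while holding the logger lock.

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical sketch: decouple append() from the blocking send path.
public class AsyncLogSender {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final KafkaProducer<String, String> producer;
    private final String topic;

    public AsyncLogSender(KafkaProducer<String, String> producer, String topic) {
        this.producer = producer;
        this.topic = topic;
        Thread sender = new Thread(this::drain, "log4j-kafka-sender");
        sender.setDaemon(true);
        sender.start();
    }

    // Called by the appender: never blocks on metadata and never invokes
    // producer.send() on the caller's (possibly network) thread.
    public void append(String message) {
        queue.offer(message);
    }

    private void drain() {
        try {
            while (true) {
                String message = queue.take();
                producer.send(new ProducerRecord<>(topic, message));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
{code}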



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7406) Naming Join and Grouping Repartition Topics

2018-10-02 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-7406.
-
Resolution: Fixed

> Naming Join and Grouping Repartition Topics
> ---
>
> Key: KAFKA-7406
> URL: https://issues.apache.org/jira/browse/KAFKA-7406
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bill Bejeck
>Assignee: Bill Bejeck
>Priority: Major
>  Labels: needs-kip
> Fix For: 2.1.0
>
>
> To help make Streams compatible with topology changes, we will need to give 
> users the ability to name some operators so that, after adjusting the topology, a 
> rolling upgrade is possible.
> This Jira is the first in this effort to allow for giving operators 
> deterministic names.
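
As a hedged illustration of the kind of API this work leads to (assuming the Grouped.as(...) naming hook added to the Streams DSL for grouping repartition topics; topic and class names below are made up), naming a grouping repartition topic could look like the following sketch:

{code:java}
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

// Sketch: a stable, user-chosen name for the repartition topic means that
// unrelated topology changes do not shift the auto-generated name and break
// a rolling upgrade.
public class NamedRepartitionExample {
    public static void buildTopology(StreamsBuilder builder) {
        KStream<String, String> events = builder.stream("events");

        KTable<String, Long> counts = events
            .selectKey((key, value) -> value)            // forces a repartition downstream
            .groupByKey(Grouped.as("events-by-value"))   // names the repartition topic
            .count();

        counts.toStream().to("event-counts");
    }
}
{code}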



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Apache Kafka 2.1.0 Release Plan

2018-10-01 Thread Dong Lin
Hey everyone,

Hope things are going well!

Just a kind reminder that the feature freeze time is end of day today.
Major features (e.g. KIP implementations) need to be merged and minor
features need to have PRs ready. After today I will move pending JIRAs to
the next release as appropriate.

Cheers,
Dong



On Wed, 26 Sep 2018 at 1:06 AM Dong Lin  wrote:

> Hey Edoardo,
>
> Certainly, let's add it to this release since it has passed vote. KIP-81
> was previously marked as WIP for 2.0.0 release. I updated to be WIP for
> 2.1.0 release in
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals 
> and
> added it to
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044.
>
> Thanks,
> Dong
>
> On Tue, Sep 25, 2018 at 1:52 AM Edoardo Comar  wrote:
>
>> Hi Dong
>> many thanks for driving the release!
>>
>> KIP-81 previously voted as adopted has a ready-to-review JIRA and PR.
>> Shall we just amend the wiki ?
>> --
>> Edoardo Comar
>> IBM Event Streams
>> IBM UK Ltd, Hursley Park, SO21 2JN
>>
>>
>>
>>
>> From:Dong Lin 
>> To:dev , Users ,
>> kafka-clients 
>> Date:25/09/2018 08:17
>> Subject:Re: [DISCUSS] Apache Kafka 2.1.0 Release Plan
>> --
>>
>>
>>
>> Hey everyone,
>>
>> According to the previously discussed schedule, I have updated
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
>> to include all the KIPs that have passed vote at this moment to be
>> included
>> in Apache Kafka 2.1.0 release. Other KIPs that have not passed vote will
>> need to be included in the next feature release.
>>
>> Just a reminder that the feature freeze data is Oct 1, 2018. In order to
>> be
>> included in the release, major features need to be merged and minor
>> features need to be have PR ready. Any feature not in this state will be
>> automatically moved to the next release after Oct 1.
>>
>> Regards,
>> Dong
>>
>> On Tue, Sep 25, 2018 at 12:02 AM Dong Lin  wrote:
>>
>> > cc us...@kafka.apache.org and kafka-clie...@googlegroups.com
>> >
>> > On Sun, Sep 9, 2018 at 5:31 PM Dong Lin  wrote:
>> >
>> >> Hi all,
>> >>
>> >> I would like to be the release manager for our next time-based feature
>> >> release 2.1.0.
>> >>
>> >> The recent Kafka release history can be found at
>> >> https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan.
>> >> The release plan (with open issues and planned KIPs) for 2.1.0 can be
>> found
>> >> at
>> >>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
>> >> .
>> >>
>> >> Here are the dates we have planned for Apache Kafka 2.1.0 release:
>> >>
>> >> 1) KIP Freeze: Sep 24, 2018.
>> >> A KIP must be accepted by this date in order to be considered for this
>> >> release)
>> >>
>> >> 2) Feature Freeze: Oct 1, 2018
>> >> Major features merged & working on stabilization, minor features have
>> PR,
>> >> release branch cut; anything not in this state will be automatically
>> moved
>> >> to the next release in JIRA.
>> >>
>> >> 3) Code Freeze: Oct 15, 2018 (Tentatively)
>> >>
>> >> The KIP and feature freeze date is about 3-4 weeks from now. Please
>> plan
>> >> accordingly for the features you want push into Apache Kafka 2.1.0
>> release.
>> >>
>> >>
>> >> Cheers,
>> >> Dong
>> >>
>> >
>>
>>
>>
>> Unless stated otherwise above:
>> IBM United Kingdom Limited - Registered in England and Wales with number
>> 741598.
>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>>
>


[jira] [Reopened] (KAFKA-4218) Enable access to key in ValueTransformer

2018-10-01 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin reopened KAFKA-4218:
-

Thanks for working on this [~jeyhunkarimov]!

For KIP-149, currently there are 4 PRs:

- [https://github.com/apache/kafka/pull/3570]

- [https://github.com/apache/kafka/pull/3599]

- [https://github.com/apache/kafka/pull/3600]

- [https://github.com/apache/kafka/pull/3601]

And 3 JIRAs:

- KAFKA-4218

- KAFKA-3745

- KAFKA-4726

Can we create an umbrella Jira for KIP-149 to include these PRs and JIRAs, so 
that we can easily reference and track the entire progress for KIP-149 in e.g. 
Apache Kafka 2.1.0 release 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044?

 

> Enable access to key in ValueTransformer
> 
>
> Key: KAFKA-4218
> URL: https://issues.apache.org/jira/browse/KAFKA-4218
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.1
>Reporter: Elias Levy
>Assignee: Jeyhun Karimov
>Priority: Major
>  Labels: api, kip
> Fix For: 1.1.0
>
>
> While transforming values via {{KStream.transformValues}} and 
> {{ValueTransformer}}, the key associated with the value may be needed, even 
> if it is not changed.  For instance, it may be used to access stores.  
> As of now, the key is not available within these methods and interfaces, 
> leading to the use of {{KStream.transform}} and {{Transformer}}, and the 
> unnecessary creation of new {{KeyValue}} objects.
> KIP-149: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-149%3A+Enabling+key+access+in+ValueTransformer%2C+ValueMapper%2C+and+ValueJoiner
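
For illustration, a minimal sketch of the read-only key access that KIP-149 introduces via ValueTransformerWithKey (the class name and the enrichment logic below are made up):

{code:java}
import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
import org.apache.kafka.streams.processor.ProcessorContext;

// Sketch: enrich a value using its (read-only) key without falling back to
// transform() and allocating new KeyValue objects.
public class EnrichWithKey implements ValueTransformerWithKey<String, String, String> {

    @Override
    public void init(ProcessorContext context) {
        // state stores could be obtained here via context.getStateStore(...)
    }

    @Override
    public String transform(String readOnlyKey, String value) {
        return readOnlyKey + ":" + value;   // the key is readable but never changed
    }

    @Override
    public void close() { }
}

// usage sketch, assuming 'input' is a KStream<String, String>:
// KStream<String, String> enriched = input.transformValues(EnrichWithKey::new);
{code}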



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] KIP-232: Detect outdated metadata using leaderEpoch and partitionEpoch

2018-09-30 Thread Dong Lin
Hey Matthias,

Thanks for checking back on the status. The KIP has been marked as
replaced by KIP-320 in the KIP list wiki page and the status has been
updated in the discussion and voting email thread.

Thanks,
Dong

On Sun, 30 Sep 2018 at 11:51 AM Matthias J. Sax 
wrote:

> It seems that KIP-320 was accepted. Thus, I am wondering what the status
> of this KIP is?
>
> -Matthias
>
> On 7/11/18 10:59 AM, Dong Lin wrote:
> > Hey Jun,
> >
> > Certainly. We can discuss later after KIP-320 settles.
> >
> > Thanks!
> > Dong
> >
> >
> > On Wed, Jul 11, 2018 at 8:54 AM, Jun Rao  wrote:
> >
> >> Hi, Dong,
> >>
> >> Sorry for the late response. Since KIP-320 is covering some of the
> similar
> >> problems described in this KIP, perhaps we can wait until KIP-320
> settles
> >> and see what's still left uncovered in this KIP.
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >> On Mon, Jun 4, 2018 at 7:03 PM, Dong Lin  wrote:
> >>
> >>> Hey Jun,
> >>>
> >>> It seems that we have made considerable progress on the discussion of
> >>> KIP-253 since February. Do you think we should continue the discussion
> >>> there, or can we continue the voting for this KIP? I am happy to submit
> >> the
> >>> PR and move forward the progress for this KIP.
> >>>
> >>> Thanks!
> >>> Dong
> >>>
> >>>
> >>> On Wed, Feb 7, 2018 at 11:42 PM, Dong Lin  wrote:
> >>>
> >>>> Hey Jun,
> >>>>
> >>>> Sure, I will come up with a KIP this week. I think there is a way to
> >>> allow
> >>>> partition expansion to arbitrary number without introducing new
> >> concepts
> >>>> such as read-only partition or repartition epoch.
> >>>>
> >>>> Thanks,
> >>>> Dong
> >>>>
> >>>> On Wed, Feb 7, 2018 at 5:28 PM, Jun Rao  wrote:
> >>>>
> >>>>> Hi, Dong,
> >>>>>
> >>>>> Thanks for the reply. The general idea that you had for adding
> >>> partitions
> >>>>> is similar to what we had in mind. It would be useful to make this
> >> more
> >>>>> general, allowing adding an arbitrary number of partitions (instead
> of
> >>>>> just
> >>>>> doubling) and potentially removing partitions as well. The following
> >> is
> >>>>> the
> >>>>> high level idea from the discussion with Colin, Jason and Ismael.
> >>>>>
> >>>>> * To change the number of partitions from X to Y in a topic, the
> >>>>> controller
> >>>>> marks all existing X partitions as read-only and creates Y new
> >>> partitions.
> >>>>> The new partitions are writable and are tagged with a higher
> >> repartition
> >>>>> epoch (RE).
> >>>>>
> >>>>> * The controller propagates the new metadata to every broker. Once
> the
> >>>>> leader of a partition is marked as read-only, it rejects the produce
> >>>>> requests on this partition. The producer will then refresh the
> >> metadata
> >>>>> and
> >>>>> start publishing to the new writable partitions.
> >>>>>
> >>>>> * The consumers will then be consuming messages in RE order. The
> >>> consumer
> >>>>> coordinator will only assign partitions in the same RE to consumers.
> >>> Only
> >>>>> after all messages in an RE are consumed, will partitions in a higher
> >> RE
> >>>>> be
> >>>>> assigned to consumers.
> >>>>>
> >>>>> As Colin mentioned, if we do the above, we could potentially (1) use
> a
> >>>>> globally unique partition id, or (2) use a globally unique topic id
> to
> >>>>> distinguish recreated partitions due to topic deletion.
> >>>>>
> >>>>> So, perhaps we can sketch out the re-partitioning KIP a bit more and
> >> see
> >>>>> if
> >>>>> there is any overlap with KIP-232. Would you be interested in doing
> >>> that?
> >>>>> If not, we can do that next week.
> >>>>>
> >>>>> Jun
> >>>>>
> >>>>>
> >>&g

[jira] [Created] (KAFKA-7441) Allow LogCleanerManager.resumeCleaning() to be used concurrently

2018-09-25 Thread Dong Lin (JIRA)
Dong Lin created KAFKA-7441:
---

 Summary: Allow LogCleanerManager.resumeCleaning() to be used 
concurrently
 Key: KAFKA-7441
 URL: https://issues.apache.org/jira/browse/KAFKA-7441
 Project: Kafka
  Issue Type: Improvement
Reporter: Dong Lin
Assignee: xiongqi wu


LogCleanerManager provides the APIs abortAndPauseCleaning(TopicPartition) and 
resumeCleaning(Iterable[TopicPartition]). abortAndPauseCleaning(...) does nothing 
if the partition is already in the paused state, while resumeCleaning(...) always 
clears the state for a partition that is in the paused state. Also, 
resumeCleaning(...) throws an IllegalStateException if the partition does not 
have any state (e.g. its state has been cleared).

 

This will cause problem in the following scenario:

1) Background thread invokes LogManager.cleanupLogs() which in turn does  
abortAndPauseCleaning(...) for a given partition. Now this partition is in 
paused state.

2) User requests deletion for this partition. Controller sends 
StopReplicaRequest with delete=true for this partition. RequestHandlerThread 
calls abortAndPauseCleaning(...) followed by resumeCleaning(...) for the same 
partition. Now there is no state for this partition.

3) Background thread invokes resumeCleaning(...) as part of 
LogManager.cleanupLogs(). Because there is no state for this partition, it 
causes IllegalStateException.

 

This issue can also happen before KAFKA-7322 if unclean leader election 
triggers log truncation for a partition at the same time that the partition is 
deleted upon user request. But unclean leader election is very rare. The fix 
made in https://issues.apache.org/jira/browse/KAFKA-7322 makes this issue much 
more frequent.

The solution is to record the number of pauses.
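
A minimal, hypothetical Java sketch of the pause-counting idea (not the actual LogCleanerManager code): each abortAndPauseCleaning() increments a per-partition counter, and resumeCleaning() only clears the paused state once the counter drops to zero.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of pause counting: a partition stays paused until every
// caller that paused it has resumed it.
public class PauseTracker<P> {
    private final Map<P, Integer> pauseCounts = new HashMap<>();

    public synchronized void abortAndPauseCleaning(P partition) {
        pauseCounts.merge(partition, 1, Integer::sum);
    }

    public synchronized void resumeCleaning(P partition) {
        Integer count = pauseCounts.get(partition);
        if (count == null)
            throw new IllegalStateException("Partition " + partition + " is not paused");
        if (count == 1)
            pauseCounts.remove(partition);        // last pauser resumed; cleaning may proceed
        else
            pauseCounts.put(partition, count - 1);
    }

    public synchronized boolean isPaused(P partition) {
        return pauseCounts.containsKey(partition);
    }
}
{code}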

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-232: Detect outdated metadata by adding ControllerMetadataEpoch field

2018-09-25 Thread Dong Lin
For the record, this KIP is closed as its design has been merged into
KIP-320. See
https://cwiki.apache.org/confluence/display/KAFKA/KIP-320%3A+Allow+fetchers+to+detect+and+handle+log+truncation
.

On Wed, Jan 31, 2018 at 12:16 AM Dong Lin  wrote:

> Hey Jun, Jason,
>
> Thanks for all the comments. Could you see if you can give +1 for the KIP?
> I am open to make further improvements for the KIP.
>
> Thanks,
> Dong
>
> On Tue, Jan 23, 2018 at 3:44 PM, Dong Lin  wrote:
>
>> Hey Jun, Jason,
>>
>> Thanks much for all the review! I will open the voting thread.
>>
>> Regards,
>> Dong
>>
>> On Tue, Jan 23, 2018 at 3:37 PM, Jun Rao  wrote:
>>
>>> Hi, Dong,
>>>
>>> The current KIP looks good to me.
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>> On Tue, Jan 23, 2018 at 12:29 PM, Dong Lin  wrote:
>>>
>>> > Hey Jun,
>>> >
>>> > Do you think the current KIP looks OK? I am wondering if we can open
>>> the
>>> > voting thread.
>>> >
>>> > Thanks!
>>> > Dong
>>> >
>>> > On Fri, Jan 19, 2018 at 3:08 PM, Dong Lin  wrote:
>>> >
>>> > > Hey Jun,
>>> > >
>>> > > I think we can probably have a static method in Util class to decode
>>> the
>>> > > byte[]. Both KafkaConsumer implementation and the user application
>>> will
>>> > be
>>> > > able to decode the byte array and log its content for debug purpose.
>>> So
>>> > it
>>> > > seems that we can still print the information we want. It is just not
>>> > > explicitly exposed in the consumer interface. Would this address the
>>> > > problem here?
>>> > >
>>> > > Yeah we can include OffsetEpoch in AdminClient. This can be added in
>>> > > KIP-222? Is there something you would like me to add in this KIP?
>>> > >
>>> > > Thanks!
>>> > > Dong
>>> > >
>>> > > On Fri, Jan 19, 2018 at 3:00 PM, Jun Rao  wrote:
>>> > >
>>> > >> Hi, Dong,
>>> > >>
>>> > >> The issue with using just byte[] for OffsetEpoch is that it won't be
>>> > >> printable, which makes debugging harder.
>>> > >>
>>> > >> Also, KIP-222 proposes a listGroupOffset() method in AdminClient. If
>>> > that
>>> > >> gets adopted before this KIP, we probably want to include
>>> OffsetEpoch in
>>> > >> the AdminClient too.
>>> > >>
>>> > >> Thanks,
>>> > >>
>>> > >> Jun
>>> > >>
>>> > >>
>>> > >> On Thu, Jan 18, 2018 at 6:30 PM, Dong Lin 
>>> wrote:
>>> > >>
>>> > >> > Hey Jun,
>>> > >> >
>>> > >> > I agree. I have updated the KIP to remove the class OffetEpoch and
>>> > >> replace
>>> > >> > OffsetEpoch with byte[] in APIs that use it. Can you see if it
>>> looks
>>> > >> good?
>>> > >> >
>>> > >> > Thanks!
>>> > >> > Dong
>>> > >> >
>>> > >> > On Thu, Jan 18, 2018 at 6:07 PM, Jun Rao 
>>> wrote:
>>> > >> >
>>> > >> > > Hi, Dong,
>>> > >> > >
>>> > >> > > Thanks for the updated KIP. It looks good to me now. The only
>>> thing
>>> > is
>>> > >> > > for OffsetEpoch.
>>> > >> > > If we expose the individual fields in the class, we probably
>>> don't
>>> > >> need
>>> > >> > the
>>> > >> > > encode/decode methods. If we want to hide the details of
>>> > OffsetEpoch,
>>> > >> we
>>> > >> > > probably don't want expose the individual fields.
>>> > >> > >
>>> > >> > > Jun
>>> > >> > >
>>> > >> > > On Wed, Jan 17, 2018 at 10:10 AM, Dong Lin >> >
>>> > >> wrote:
>>> > >> > >
>>> > >> > > > Thinking about point 61 more, I realize that the async
>>> zookeeper
>>> > >> read
>>>

Re: [VOTE] KIP-232: Detect outdated metadata using leaderEpoch and partitionEpoch

2018-09-25 Thread Dong Lin
For the record, this KIP is closed as its design has been merged into
KIP-320. See
https://cwiki.apache.org/confluence/display/KAFKA/KIP-320%3A+Allow+fetchers+to+detect+and+handle+log+truncation
.


Re: [DISCUSS] Apache Kafka 2.1.0 Release Plan

2018-09-25 Thread Dong Lin
Hey Edoardo,

Certainly, let's add it to this release since it has passed the vote. KIP-81
was previously marked as WIP for the 2.0.0 release. I updated it to be WIP for
the 2.1.0 release in
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
and
added it to
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044.

Thanks,
Dong

On Tue, Sep 25, 2018 at 1:52 AM Edoardo Comar  wrote:

> Hi Dong
> many thanks for driving the release!
>
> KIP-81 previously voted as adopted has a ready-to-review JIRA and PR.
> Shall we just amend the wiki ?
> --
> Edoardo Comar
> IBM Event Streams
> IBM UK Ltd, Hursley Park, SO21 2JN
>
>
>
>
> From:Dong Lin 
> To:dev , Users ,
> kafka-clients 
> Date:25/09/2018 08:17
> Subject:Re: [DISCUSS] Apache Kafka 2.1.0 Release Plan
> --
>
>
>
> Hey everyone,
>
> According to the previously discussed schedule, I have updated
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
> to include all the KIPs that have passed vote at this moment to be included
> in Apache Kafka 2.1.0 release. Other KIPs that have not passed vote will
> need to be included in the next feature release.
>
> Just a reminder that the feature freeze data is Oct 1, 2018. In order to be
> included in the release, major features need to be merged and minor
> features need to be have PR ready. Any feature not in this state will be
> automatically moved to the next release after Oct 1.
>
> Regards,
> Dong
>
> On Tue, Sep 25, 2018 at 12:02 AM Dong Lin  wrote:
>
> > cc us...@kafka.apache.org and kafka-clie...@googlegroups.com
> >
> > On Sun, Sep 9, 2018 at 5:31 PM Dong Lin  wrote:
> >
> >> Hi all,
> >>
> >> I would like to be the release manager for our next time-based feature
> >> release 2.1.0.
> >>
> >> The recent Kafka release history can be found at
> >> https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan.
> >> The release plan (with open issues and planned KIPs) for 2.1.0 can be
> found
> >> at
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
> >> .
> >>
> >> Here are the dates we have planned for Apache Kafka 2.1.0 release:
> >>
> >> 1) KIP Freeze: Sep 24, 2018.
> >> A KIP must be accepted by this date in order to be considered for this
> >> release)
> >>
> >> 2) Feature Freeze: Oct 1, 2018
> >> Major features merged & working on stabilization, minor features have
> PR,
> >> release branch cut; anything not in this state will be automatically
> moved
> >> to the next release in JIRA.
> >>
> >> 3) Code Freeze: Oct 15, 2018 (Tentatively)
> >>
> >> The KIP and feature freeze date is about 3-4 weeks from now. Please plan
> >> accordingly for the features you want push into Apache Kafka 2.1.0
> release.
> >>
> >>
> >> Cheers,
> >> Dong
> >>
> >
>
>
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>


Re: [ANNOUNCE] New committer: Colin McCabe

2018-09-25 Thread Dong Lin
Congratulations Colin!! Very well deserved!

On Tue, Sep 25, 2018 at 1:39 AM Ismael Juma  wrote:

> Hi all,
>
> The PMC for Apache Kafka has invited Colin McCabe as a committer and we are
> pleased to announce that he has accepted!
>
> Colin has contributed 101 commits and 8 KIPs including significant
> improvements to replication, clients, code quality and testing. A few
> highlights were KIP-97 (Improved Clients Compatibility Policy), KIP-117
> (AdminClient), KIP-227 (Incremental FetchRequests to Increase Partition
> Scalability), the introduction of findBugs and adding Trogdor (fault
> injection and benchmarking tool).
>
> In addition, Colin has reviewed 38 pull requests and participated in more
> than 50 KIP discussions.
>
> Thank you for your contributions Colin! Looking forward to many more. :)
>
> Ismael, for the Apache Kafka PMC
>


Re: [DISCUSS] Apache Kafka 2.1.0 Release Plan

2018-09-25 Thread Dong Lin
Hey everyone,

According to the previously discussed schedule, I have updated
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
to include all the KIPs that have passed vote at this moment to be included
in Apache Kafka 2.1.0 release. Other KIPs that have not passed vote will
need to be included in the next feature release.

Just a reminder that the feature freeze date is Oct 1, 2018. In order to be
included in the release, major features need to be merged and minor
features need to have PRs ready. Any feature not in this state will be
automatically moved to the next release after Oct 1.

Regards,
Dong

On Tue, Sep 25, 2018 at 12:02 AM Dong Lin  wrote:

> cc us...@kafka.apache.org and kafka-clie...@googlegroups.com
>
> On Sun, Sep 9, 2018 at 5:31 PM Dong Lin  wrote:
>
>> Hi all,
>>
>> I would like to be the release manager for our next time-based feature
>> release 2.1.0.
>>
>> The recent Kafka release history can be found at
>> https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan.
>> The release plan (with open issues and planned KIPs) for 2.1.0 can be found
>> at
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
>> .
>>
>> Here are the dates we have planned for Apache Kafka 2.1.0 release:
>>
>> 1) KIP Freeze: Sep 24, 2018.
>> A KIP must be accepted by this date in order to be considered for this
>> release)
>>
>> 2) Feature Freeze: Oct 1, 2018
>> Major features merged & working on stabilization, minor features have PR,
>> release branch cut; anything not in this state will be automatically moved
>> to the next release in JIRA.
>>
>> 3) Code Freeze: Oct 15, 2018 (Tentatively)
>>
>> The KIP and feature freeze date is about 3-4 weeks from now. Please plan
>> accordingly for the features you want push into Apache Kafka 2.1.0 release.
>>
>>
>> Cheers,
>> Dong
>>
>


Re: [DISCUSS] Apache Kafka 2.1.0 Release Plan

2018-09-25 Thread Dong Lin
cc us...@kafka.apache.org and kafka-clie...@googlegroups.com

On Sun, Sep 9, 2018 at 5:31 PM Dong Lin  wrote:

> Hi all,
>
> I would like to be the release manager for our next time-based feature
> release 2.1.0.
>
> The recent Kafka release history can be found at
> https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan.
> The release plan (with open issues and planned KIPs) for 2.1.0 can be found
> at
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044.
>
> Here are the dates we have planned for Apache Kafka 2.1.0 release:
>
> 1) KIP Freeze: Sep 24, 2018.
> A KIP must be accepted by this date in order to be considered for this
> release)
>
> 2) Feature Freeze: Oct 1, 2018
> Major features merged & working on stabilization, minor features have PR,
> release branch cut; anything not in this state will be automatically moved
> to the next release in JIRA.
>
> 3) Code Freeze: Oct 15, 2018 (Tentatively)
>
> The KIP and feature freeze date is about 3-4 weeks from now. Please plan
> accordingly for the features you want push into Apache Kafka 2.1.0 release.
>
>
> Cheers,
> Dong
>


[jira] [Resolved] (KAFKA-7332) Improve error message when trying to produce message without key for compacted topic

2018-09-18 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-7332.
-
Resolution: Fixed

> Improve error message when trying to produce message without key for 
> compacted topic
> 
>
> Key: KAFKA-7332
> URL: https://issues.apache.org/jira/browse/KAFKA-7332
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 1.1.0
>Reporter: Patrik Kleindl
>Priority: Trivial
> Fix For: 2.1.0
>
>
> Goal:
> Return a specific error message, e.g. "Message without a key is not valid 
> for a compacted topic", when trying to produce such a message, instead of a 
> generic CorruptRecordException.
>  
> > Yesterday we had the following exception:
> > 
> > Exception thrown when sending a message with key='null' and payload='...'
> > to topic sometopic:: org.apache.kafka.common.errors.CorruptRecordException:
> > This message has failed its CRC checksum, exceeds the valid size, or is
> > otherwise corrupt.
> > 
> > The cause was identified with the help of
> > 
> >[https://stackoverflow.com/questions/49098274/kafka-stream-get-corruptrecordexception]
> > 
> > Is it possible / would it makes sense to open an issue to improve the error
> > message for this case?
> > A simple "Message without a key is not valid for a compacted topic" would
> > suffice and point a user  in the right direction.
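
For illustration, a minimal Java sketch of the scenario that triggers this error, assuming "compacted-topic" is configured with cleanup.policy=compact (broker address and topic name are made up):

{code:java}
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch: the record below carries no key, which the broker rejects for a
// compacted topic; historically the client surfaced a CorruptRecordException.
public class NullKeyToCompactedTopic {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("compacted-topic", "value-without-key")).get();
        } catch (ExecutionException e) {
            System.err.println("Send failed: " + e.getCause());
        }
    }
}
{code}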



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7322) Fix race condition between log cleaner thread and log retention thread when topic cleanup policy is updated

2018-09-18 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-7322.
-
Resolution: Fixed

> Fix race condition between log cleaner thread and log retention thread when 
> topic cleanup policy is updated
> ---
>
> Key: KAFKA-7322
> URL: https://issues.apache.org/jira/browse/KAFKA-7322
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: xiongqi wu
>Assignee: xiongqi wu
>Priority: Major
> Fix For: 2.1.0
>
>
> The deletion thread will grab the log.lock when it tries to rename a log 
> segment and schedule it for actual deletion.
> The compaction thread only grabs the log.lock when it tries to replace the 
> original segments with the cleaned segment. The compaction thread doesn't 
> grab the lock when it reads records from the original segments to build the 
> offset map and new segments. As a result, if both the deletion and compaction 
> threads work on the same log partition, we have a race condition.
> This race happens when the topic cleanup policy is updated on the fly.  
> One case to hit this race condition:
> 1: topic clean up policy is "compact" initially 
> 2: log cleaner (compaction) thread picks up the partition for compaction and 
> still in progress
> 3: the topic clean up policy has been updated to "deletion"
> 4: retention thread pick up the topic partition and delete some old segments.
> 5: log cleaner thread reads from the deleted log and raise an IO exception. 
>  
> The proposed solution is to use the "inprogress" map that the cleaner manager 
> maintains to protect against such a race.
>  
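
A hedged sketch of the guarding idea (not the actual cleaner-manager code; class and method names are made up): both the retention path and the compaction path claim a partition in a shared in-progress set before touching its segments, so they never operate on the same partition concurrently.

{code:java}
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the "in progress" guard.
public class CleaningGuard<P> {
    private final Set<P> inProgress = new HashSet<>();

    public synchronized boolean tryStart(P partition) {
        return inProgress.add(partition);   // false if another thread already owns it
    }

    public synchronized void finish(P partition) {
        inProgress.remove(partition);
    }
}

// usage sketch (deleteOldSegments is a made-up placeholder):
// if (guard.tryStart(tp)) {
//     try { deleteOldSegments(tp); } finally { guard.finish(tp); }
// }
{code}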



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] KIP-110: Add Codec for ZStandard Compression

2018-09-17 Thread Dong Lin
Hey Dongjin,

The KIP passes the vote after having 3 binding +1 votes and more binding +1
votes than -1 votes, so it is not necessary to collect more votes for this KIP.

Thanks,
Dong

On Mon, Sep 17, 2018 at 5:37 PM, Dongjin Lee  wrote:

> @Ismael
>
> +3 (binding), +3 (non-binding) until now. Do we need more vote?
>
> Thanks,
> Dongjin
>
> On Sat, Sep 15, 2018 at 4:03 AM Priyank Shah 
> wrote:
>
> > +1(non-binding)
> >
> > On 9/13/18, 1:44 AM, "Mickael Maison"  wrote:
> >
> > +1 (non binding)
> > Thanks for your perseverance, it's great to see this KIP finally
> > reaching completion!
> > On Thu, Sep 13, 2018 at 4:58 AM Harsha  wrote:
> > >
> > > +1 (binding).
> > >
> > > Thanks,
> > > Harsha
> > >
> > > On Wed, Sep 12, 2018, at 4:56 PM, Jason Gustafson wrote:
> > > > Great contribution! +1
> > > >
> > > > On Wed, Sep 12, 2018 at 10:20 AM, Manikumar <
> > manikumar.re...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1 (non-binding).
> > > > >
> > > > > Thanks for the KIP.
> > > > >
> > > > > On Wed, Sep 12, 2018 at 10:44 PM Ismael Juma <
> ism...@juma.me.uk>
> > wrote:
> > > > >
> > > > > > Thanks for the KIP, +1 (binding).
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Wed, Sep 12, 2018 at 10:02 AM Dongjin Lee <
> > dong...@apache.org> wrote:
> > > > > >
> > > > > > > Hello, I would like to start a VOTE on KIP-110: Add Codec
> > for ZStandard
> > > > > > > Compression.
> > > > > > >
> > > > > > > The KIP:
> > > > > > >
> > > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 110%3A+Add+Codec+for+ZStandard+Compression
> > > > > > > Discussion thread:
> > > > > > >
> > https://www.mail-archive.com/dev@kafka.apache.org/msg88673.html
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Dongjin
> > > > > > >
> > > > > > > --
> > > > > > > *Dongjin Lee*
> > > > > > >
> > > > > > > *A hitchhiker in the mathematical world.*
> > > > > > >
> > > > > > > *github:  github.com/dongjinleekr
> > > > > > > linkedin:
> > > > > > kr.linkedin.com/in/dongjinleekr
> > > > > > > slideshare:
> > > > > > > www.slideshare.net/dongjinleekr
> > > > > > > *
> > > > > > >
> > > > > >
> > > > >
> >
> >
> >
> >
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
> *github:  github.com/dongjinleekr
> linkedin: kr.linkedin.com/in/dongjinleekr
> slideshare:
> www.slideshare.net/dongjinleekr
> *
>


[jira] [Resolved] (KAFKA-5690) kafka-acls command should be able to list per principal

2018-09-17 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-5690.
-
Resolution: Fixed

> kafka-acls command should be able to list per principal
> ---
>
> Key: KAFKA-5690
> URL: https://issues.apache.org/jira/browse/KAFKA-5690
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.11.0.0
>Reporter: Koelli Mungee
>Assignee: Manikumar
>Priority: Major
> Fix For: 2.1.0
>
>
> Currently the `kafka-acls` command has a `--list` option that can list ACLs per 
> resource, i.e. --topic <topic>, --group <group>, or --cluster. In order 
> to look at the ACLs for a particular principal, the user needs to iterate 
> through the entire list to figure out what privileges a particular principal 
> has been granted. An option to list the ACLs per principal would simplify this 
> process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Apache Kafka 2.1.0 Release Plan

2018-09-17 Thread Dong Lin
Hey everyone,

Thanks for your support!

Just a reminder that your new KIP needs to pass the vote by the end of next Monday
(Sep 24) to be included in the Apache Kafka 2.1.0 release.

Cheers,
Dong

On Thu, Sep 13, 2018 at 10:38 PM, Dongjin Lee  wrote:

> Great. Thanks!
>
> - Dongjin
>
> On Mon, Sep 10, 2018 at 11:48 AM Matthias J. Sax 
> wrote:
>
> > Thanks a lot! You are on a run after 1.1.1 release.
> >
> > I see something coming up for myself in 4 month. :)
> >
> > On 9/9/18 6:15 PM, Guozhang Wang wrote:
> > > Dong, thanks for driving the release!
> > >
> > > On Sun, Sep 9, 2018 at 5:57 PM, Ismael Juma  wrote:
> > >
> > >> Thanks for volunteering Dong!
> > >>
> > >> Ismael
> > >>
> > >> On Sun, 9 Sep 2018, 17:32 Dong Lin,  wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> I would like to be the release manager for our next time-based
> feature
> > >>> release 2.1.0.
> > >>>
> > >>> The recent Kafka release history can be found at
> > >>> https://cwiki.apache.org/confluence/display/KAFKA/Future+
> release+plan.
> > >> The
> > >>> release plan (with open issues and planned KIPs) for 2.1.0 can be
> found
> > >> at
> > >>> https://cwiki.apache.org/confluence/pages/viewpage.
> > >> action?pageId=91554044.
> > >>>
> > >>> Here are the dates we have planned for Apache Kafka 2.1.0 release:
> > >>>
> > >>> 1) KIP Freeze: Sep 24, 2018.
> > >>> A KIP must be accepted by this date in order to be considered for
> this
> > >>> release)
> > >>>
> > >>> 2) Feature Freeze: Oct 1, 2018
> > >>> Major features merged & working on stabilization, minor features have
> > PR,
> > >>> release branch cut; anything not in this state will be automatically
> > >> moved
> > >>> to the next release in JIRA.
> > >>>
> > >>> 3) Code Freeze: Oct 15, 2018 (Tentatively)
> > >>>
> > >>> The KIP and feature freeze date is about 3-4 weeks from now. Please
> > plan
> > >>> accordingly for the features you want push into Apache Kafka 2.1.0
> > >> release.
> > >>>
> > >>>
> > >>> Cheers,
> > >>> Dong
> > >>>
> > >>
> > >
> > >
> > >
> >
> >
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> www.slideshare.net/dongjinleekr
> <http://www.slideshare.net/dongjinleekr>*
>


Re: [VOTE] KIP-357: Add support to list ACLs per principal

2018-09-16 Thread Dong Lin
Thanks for the KIP, Manikumar. The feature makes sense.

+1 (binding)

On Mon, Sep 10, 2018 at 7:07 AM, Manikumar 
wrote:

> Bump!
>
> On Tue, Sep 4, 2018 at 11:58 PM Adam Bellemare 
> wrote:
>
> > +1 (non binding) - would really like to see this one.
> >
> > On Mon, Sep 3, 2018 at 12:18 PM, Mickael Maison <
> mickael.mai...@gmail.com>
> > wrote:
> >
> > > +1 (non binding)
> > > On Mon, Sep 3, 2018 at 3:14 PM Manikumar 
> > > wrote:
> > > >
> > > > bump up!  waiting for  2 more binding votes!
> > > >
> > > > On Tue, Aug 28, 2018 at 7:36 AM Satish Duggana <
> > satish.dugg...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > On Tue, Aug 28, 2018 at 2:59 AM, Harsha  wrote:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > -Harsha
> > > > > >
> > > > > > On Mon, Aug 27, 2018, at 12:46 PM, Jakub Scholz wrote:
> > > > > > > +1 (non-binding)
> > > > > > >
> > > > > > > On Mon, Aug 27, 2018 at 6:24 PM Manikumar <
> > > manikumar.re...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > I would like to start voting on KIP-357 which allows to list
> > > ACLs per
> > > > > > > > principal using AclCommand (kafka-acls.sh)
> > > > > > > >
> > > > > > > > KIP:
> > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 357%3A++Add+support+to+list+ACLs+per+principal
> > > > > > > >
> > > > > > > > Discussion Thread:
> > > > > > > >
> > > > > > > > https://lists.apache.org/thread.html/
> > > dc7f6005845a372a0a48a40872a32d
> > > > > > 9ece03807a4fb1bb89d3645afb@%3Cdev.kafka.apache.org%3E
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Manikumar
> > > > > > > >
> > > > > >
> > > > >
> > >
> >
>


[jira] [Reopened] (KAFKA-5690) kafka-acls command should be able to list per principal

2018-09-16 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin reopened KAFKA-5690:
-

> kafka-acls command should be able to list per principal
> ---
>
> Key: KAFKA-5690
> URL: https://issues.apache.org/jira/browse/KAFKA-5690
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.11.0.0
>Reporter: Koelli Mungee
>Assignee: Manikumar
>Priority: Major
> Fix For: 2.1.0
>
>
> Currently the `kafka-acls` command has a `--list` option that can list ACLs per 
> resource, i.e. --topic <topic>, --group <group>, or --cluster. In order 
> to look at the ACLs for a particular principal, the user needs to iterate 
> through the entire list to figure out what privileges a particular principal 
> has been granted. An option to list the ACLs per principal would simplify this 
> process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-5690) kafka-acls command should be able to list per principal

2018-09-16 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-5690.
-
Resolution: Fixed

> kafka-acls command should be able to list per principal
> ---
>
> Key: KAFKA-5690
> URL: https://issues.apache.org/jira/browse/KAFKA-5690
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.11.0.0
>Reporter: Koelli Mungee
>Assignee: Manikumar
>Priority: Major
> Fix For: 2.1.0
>
>
> Currently the `kafka-acls` command has a `--list` option that can list ACLs per 
> resource, i.e. --topic <topic>, --group <group>, or --cluster. In order 
> to look at the ACLs for a particular principal, the user needs to iterate 
> through the entire list to figure out what privileges a particular principal 
> has been granted. An option to list the ACLs per principal would simplify this 
> process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[DISCUSS] Apache Kafka 2.1.0 Release Plan

2018-09-09 Thread Dong Lin
Hi all,

I would like to be the release manager for our next time-based feature
release 2.1.0.

The recent Kafka release history can be found at
https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan. The
release plan (with open issues and planned KIPs) for 2.1.0 can be found at
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044.

Here are the dates we have planned for Apache Kafka 2.1.0 release:

1) KIP Freeze: Sep 24, 2018.
A KIP must be accepted by this date in order to be considered for this
release.

2) Feature Freeze: Oct 1, 2018
Major features merged & working on stabilization, minor features have PR,
release branch cut; anything not in this state will be automatically moved
to the next release in JIRA.

3) Code Freeze: Oct 15, 2018 (Tentatively)

The KIP and feature freeze date is about 3-4 weeks from now. Please plan
accordingly for the features you want to push into the Apache Kafka 2.1.0 release.


Cheers,
Dong


[jira] [Resolved] (KAFKA-7333) Protocol changes for KIP-320

2018-09-09 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-7333.
-
Resolution: Fixed

> Protocol changes for KIP-320
> 
>
> Key: KAFKA-7333
> URL: https://issues.apache.org/jira/browse/KAFKA-7333
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.1.0
>
>
> Implement protocol changes for 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-320%3A+Allow+fetchers+to+detect+and+handle+log+truncation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-6082) consider fencing zookeeper updates with controller epoch zkVersion

2018-09-07 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-6082.
-
Resolution: Fixed

> consider fencing zookeeper updates with controller epoch zkVersion
> --
>
> Key: KAFKA-6082
> URL: https://issues.apache.org/jira/browse/KAFKA-6082
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Onur Karaman
>Assignee: Zhanxiang (Patrick) Huang
>Priority: Major
> Fix For: 2.1.0
>
>
>  
> Kafka controller may fail to function properly (even after repeated 
> controller movement) due to the following sequence of events:
>  - User requests topic deletion
>  - Controller A deletes the partition znode
>  - Controller B becomes controller and reads the topic znode
>  - Controller A deletes the topic znode and removes the topic from the topic 
> deletion znode
>  - Controller B reads the partition znode and topic deletion znode
>  - According to controller B's context, the topic znode exists, the topic is 
> not listed for deletion, and some partition is not found for the given topic. 
> Then controller B will create topic znode with empty data (i.e. partition 
> assignment) and create the partition znodes.
>  - All controllers after controller B will fail because there is no data in 
> the topic znode.
> The long term solution is to have a way to prevent an old controller from 
> writing to zookeeper if it is not the active controller. One idea is to use 
> the zookeeper multi API (see 
> [https://zookeeper.apache.org/doc/r3.4.3/api/org/apache/zookeeper/ZooKeeper.html#multi(java.lang.Iterable))]
> such that the controller only writes to zookeeper if the zk version of the 
> controller epoch znode has not been changed.
> The short term solution is to let the controller read the topic deletion znode 
> first. If the topic is still listed in the topic deletion znode, then the new 
> controller will properly handle partition states of this topic without 
> creating partition znodes for this topic. And if the topic is not listed in 
> the topic deletion znode, then both the topic znode and the partition znodes 
> of this topic should have been deleted by the time the new controller tries 
> to read them.
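
To make the multi-op fencing idea concrete, here is a hedged Java sketch using the ZooKeeper client API (the wrapper class and method name are illustrative, not the Kafka implementation): the data write is bundled with a version check on the controller-epoch znode, so it fails atomically if another controller has bumped the epoch since this controller last read it.

{code:java}
import java.util.Arrays;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooKeeper;

// Sketch: only write if the controller-epoch znode still has the zkVersion
// this controller observed when it became active.
public class FencedZkWrite {
    public static void writeIfStillController(ZooKeeper zk,
                                              int expectedEpochZkVersion,
                                              String path,
                                              byte[] data)
            throws KeeperException, InterruptedException {
        zk.multi(Arrays.asList(
            Op.check("/controller_epoch", expectedEpochZkVersion), // fails if the epoch changed
            Op.setData(path, data, -1)                             // -1 matches any data version
        ));
    }
}
{code}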



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7211) MM should handle timeouts in commitSync

2018-09-04 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-7211.
-
Resolution: Fixed

> MM should handle timeouts in commitSync
> ---
>
> Key: KAFKA-7211
> URL: https://issues.apache.org/jira/browse/KAFKA-7211
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: huxihx
>Priority: Major
> Fix For: 2.1.0
>
>
> Now that we have KIP-266, the user can override `default.api.timeout.ms` for 
> the consumer so that commitSync does not block indefinitely. MM needs to be 
> updated to handle TimeoutException. We may also need some logic to handle 
> deleted topics. If MM attempts to commit an offset for a deleted topic, the 
> call will time out and we should probably check if the topic exists and remove 
> the offset if it doesn't.
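
For illustration, a minimal Java sketch of the consumer-side handling described above (configuration values and the logging are made up): default.api.timeout.ms bounds commitSync(), and the resulting TimeoutException is caught instead of blocking the mirroring thread indefinitely.

{code:java}
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.TimeoutException;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

// Sketch: bound commitSync() with default.api.timeout.ms and handle the timeout.
public class BoundedCommit {
    public static KafkaConsumer<byte[], byte[]> newConsumer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "mirror-maker-group");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());
        props.put("default.api.timeout.ms", "30000");   // commitSync gives up after 30s
        return new KafkaConsumer<>(props);
    }

    public static void commit(KafkaConsumer<byte[], byte[]> consumer,
                              Map<TopicPartition, OffsetAndMetadata> offsets) {
        try {
            consumer.commitSync(offsets);
        } catch (TimeoutException e) {
            // e.g. the topic may have been deleted; verify existence and drop its offsets
            System.err.println("Offset commit timed out: " + e.getMessage());
        }
    }
}
{code}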



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] KIP-359: Verify leader epoch in produce requests

2018-08-30 Thread Dong Lin
Thanks for the KIP!

+1 (binding)

On Thu, Aug 30, 2018 at 3:56 PM, Jason Gustafson  wrote:

> Hi All,
>
> I'd like to start the vote on KIP-359: https://cwiki.apache.org/
> confluence/display/KAFKA/KIP-359%3A+Verify+leader+epoch+in+
> produce+requests.
> Thanks in advance for reviewing.
>
> -Jason
>


[jira] [Reopened] (KAFKA-7295) Fix RequestHandlerAvgIdlePercent metric calculation

2018-08-29 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin reopened KAFKA-7295:
-

> Fix RequestHandlerAvgIdlePercent metric calculation
> ---
>
> Key: KAFKA-7295
> URL: https://issues.apache.org/jira/browse/KAFKA-7295
> Project: Kafka
>  Issue Type: Improvement
>        Reporter: Dong Lin
>    Assignee: Dong Lin
>Priority: Major
>
> Currently the RequestHandlerAvgIdlePercent metric may be larger than 1 due to 
> the way it is calculated. This is counter-intuitive since by definition it is 
> supposed to be a percentage metric and its value should be in range [0, 1]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7295) Fix RequestHandlerAvgIdlePercent metric calculation

2018-08-29 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-7295.
-
Resolution: Fixed

The issue is mitigated in https://issues.apache.org/jira/browse/KAFKA-7354

> Fix RequestHandlerAvgIdlePercent metric calculation
> ---
>
> Key: KAFKA-7295
> URL: https://issues.apache.org/jira/browse/KAFKA-7295
> Project: Kafka
>  Issue Type: Improvement
>        Reporter: Dong Lin
>    Assignee: Dong Lin
>Priority: Major
>
> Currently the RequestHandlerAvgIdlePercent metric may be larger than 1 due to 
> the way it is calculated. This is counter-intuitive since by definition it is 
> supposed to be a percentage metric and its value should be in range [0, 1]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7354) Fix IdlePercent and NetworkProcessorAvgIdlePercent metric calculation

2018-08-29 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-7354.
-
Resolution: Fixed

> Fix IdlePercent and NetworkProcessorAvgIdlePercent metric calculation
> -
>
> Key: KAFKA-7354
> URL: https://issues.apache.org/jira/browse/KAFKA-7354
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0
>Reporter: huxihx
>Assignee: huxihx
>Priority: Major
> Fix For: 1.1.2, 2.0.1, 2.1.0
>
>
> Currently, MBean 
> `kafka.network:type=Processor,name=IdlePercent,networkProcessor=*` and 
> `kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent` could be 
> greater than 1. However, these two values represent a percentage which should 
> not exceed 1.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

