Build failed in Jenkins: kafka-trunk-jdk10 #439

2018-08-27 Thread Apache Jenkins Server
See 


Changes:

[wangguoz] MINOR: Correct folder for package object scala (#5573)

--
[...truncated 1.53 MB...]

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldAppendNewMetadataToLogOnAddPartitionsWhenPartitionsAdded PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithInvalidRequestOnEndTxnWhenTransactionalIdIsNull STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithInvalidRequestOnEndTxnWhenTransactionalIdIsNull PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithCoordinatorLoadInProgressOnEndTxnWhenCoordinatorIsLoading 
STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithCoordinatorLoadInProgressOnEndTxnWhenCoordinatorIsLoading 
PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithSuccessOnAddPartitionsWhenStateIsCompleteAbort STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithSuccessOnAddPartitionsWhenStateIsCompleteAbort PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldWaitForCommitToCompleteOnHandleInitPidAndExistingTransactionInPrepareAbortState
 STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldWaitForCommitToCompleteOnHandleInitPidAndExistingTransactionInPrepareAbortState
 PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithErrorsNoneOnAddPartitionWhenNoErrorsAndPartitionsTheSame 
STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithErrorsNoneOnAddPartitionWhenNoErrorsAndPartitionsTheSame PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithInvalidRequestOnEndTxnWhenTransactionalIdIsEmpty STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithInvalidRequestOnEndTxnWhenTransactionalIdIsEmpty PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithInvalidRequestAddPartitionsToTransactionWhenTransactionalIdIsEmpty
 STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithInvalidRequestAddPartitionsToTransactionWhenTransactionalIdIsEmpty
 PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldAppendPrepareAbortToLogOnEndTxnWhenStatusIsOngoingAndResultIsAbort STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldAppendPrepareAbortToLogOnEndTxnWhenStatusIsOngoingAndResultIsAbort PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldAcceptInitPidAndReturnNextPidWhenTransactionalIdIsNull STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldAcceptInitPidAndReturnNextPidWhenTransactionalIdIsNull PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRemoveTransactionsForPartitionOnEmigration STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRemoveTransactionsForPartitionOnEmigration PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldWaitForCommitToCompleteOnHandleInitPidAndExistingTransactionInPrepareCommitState
 STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldWaitForCommitToCompleteOnHandleInitPidAndExistingTransactionInPrepareCommitState
 PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldAbortExpiredTransactionsInOngoingStateAndBumpEpoch STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldAbortExpiredTransactionsInOngoingStateAndBumpEpoch PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldReturnInvalidTxnRequestOnEndTxnRequestWhenStatusIsCompleteCommitAndResultIsNotCommit
 STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldReturnInvalidTxnRequestOnEndTxnRequestWhenStatusIsCompleteCommitAndResultIsNotCommit
 PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldReturnOkOnEndTxnWhenStatusIsCompleteCommitAndResultIsCommit STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldReturnOkOnEndTxnWhenStatusIsCompleteCommitAndResultIsCommit PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithConcurrentTransactionsOnAddPartitionsWhenStateIsPrepareCommit 
STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithConcurrentTransactionsOnAddPartitionsWhenStateIsPrepareCommit 
PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldIncrementEpochAndUpdateMetadataOnHandleInitPidWhenExistingCompleteTransaction
 STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldIncrementEpochAndUpdateMetadataOnHandleInitPidWhenExistingCompleteTransaction
 PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 

Jenkins build is back to normal : kafka-trunk-jdk8 #2928

2018-08-27 Thread Apache Jenkins Server
See 




Re: [VOTE] KIP-357: Add support to list ACLs per principal

2018-08-27 Thread Satish Duggana
+1 (non-binding)

On Tue, Aug 28, 2018 at 2:59 AM, Harsha  wrote:

> +1 (binding)
>
> -Harsha
>
> On Mon, Aug 27, 2018, at 12:46 PM, Jakub Scholz wrote:
> > +1 (non-binding)
> >
> > On Mon, Aug 27, 2018 at 6:24 PM Manikumar 
> wrote:
> >
> > > Hi All,
> > >
> > > I would like to start voting on KIP-357 which allows listing ACLs per
> > > principal using AclCommand (kafka-acls.sh)
> > >
> > > KIP:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 357%3A++Add+support+to+list+ACLs+per+principal
> > >
> > > Discussion Thread:
> > >
> > > https://lists.apache.org/thread.html/dc7f6005845a372a0a48a40872a32d
> 9ece03807a4fb1bb89d3645afb@%3Cdev.kafka.apache.org%3E
> > >
> > > Thanks,
> > > Manikumar
> > >
>


Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

2018-08-27 Thread Lucas Wang
Thanks for the comments, Jun.

1. I think the answer should be no, since the "inter.broker.listener.name"
listener is also used
for replication traffic, and merging these two types of requests into a
single-threaded tunnel
would defeat the purpose of this KIP and also hurt replication throughput.
So I think that means
we should validate that when the new config is set, it is different
from "inter.broker.listener.name"
or "security.inter.broker.protocol", whichever is set (a sketch follows after
this message).

2. Normally all broker configs in a given cluster are changed at the same
time. If there is a typo in the
controller.listener.name and it's not available in the endpoints list, we
could catch it, give an error
and block restart of the first broker in the cluster. With that, we could
keep the current behavior
in the KIP write-up that falls back to "inter.broker.listener.name" when the
"controller.listener.name"
is not found during the migration phase of this KIP. Thoughts?

3. That makes sense, and I've changed it.

Thanks,
Lucas
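
For illustration, here is a minimal self-contained sketch of the validation
described in point 1 above. The config names follow this discussion, but the
validation hook itself is a simplified placeholder, not actual broker code:

import java.util.Map;
import java.util.Objects;

public class ControllerListenerValidationSketch {
    // Reject startup when the controller listener would share the
    // inter-broker (replication) channel.
    static void validate(Map<String, String> configs) {
        String controllerListener = configs.get("controller.listener.name");
        String interBrokerListener = configs.getOrDefault(
                "inter.broker.listener.name",
                configs.get("security.inter.broker.protocol"));
        if (controllerListener != null
                && Objects.equals(controllerListener, interBrokerListener)) {
            throw new IllegalArgumentException(
                    "controller.listener.name must differ from the listener used "
                            + "for inter-broker traffic");
        }
    }
}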

On Thu, Aug 23, 2018 at 3:46 PM Jun Rao  wrote:

> Hi, Lucas,
>
> Sorry for the delay. The new proposal looks good to me overall. A few minor
> comments below.
>
> 1. It's possible that listener.name.for.controller is set, but set to the
> same value as inter.broker.listener.name. In that case, should we have a
> single network thread and the request handling thread for that listener?
>
> 2. Currently, the controller always picks the listener specified by
> inter.broker.listener.name even if the listener name is not present in the
> receiving broker. This KIP proposes a slightly different approach for
> picking listener.name.for.controller only when the receiving end has the
> listener and switches listener.name.for.controller otherwise. There are
> some tradeoffs between the two approaches. To change the inter broker
> listener, the former requires 2 steps: (1) adding the new listener to
> listener list in every broker and (2) changing
> listener.name.for.controller.
> The latter can do both changes in 1 step. On the other hand, if
> listener.name.for.controller
> is mis-configured, the former will report an error and the latter will hide
> it (so the user may not know the misconfiguration). It seems that we should
> pick one approach to handle both listener.name.for.controller and
> inter.broker.listener.name consistently. To me, the former seems slightly
> better.
>
> 3. To be consistent with the existing naming, should
> listener.name.for.controller
> be controller.listener.name?
>
> Thanks,
>
> Jun
>
>
> On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang  wrote:
>
> > Hi Jun and Joel,
> >
> > The KIP writeup has changed by quite a bit since your +1.
> > Can you please take another review? Thanks a lot for your time!
> >
> > Lucas
> >
> > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy 
> wrote:
> >
> > > +1 on the KIP.
> > >
> > > (I'm not sure we actually need to introduce the condition
> variables
> > > for the concern that Jun raised, but it's an implementation detail that
> > we
> > > can defer to a discussion in the PR.)
> > >
> > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang 
> > > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > I agree by using the conditional variables, there is no need to add
> > such
> > > a
> > > > new config.
> > > > Also thanks for approving this KIP.
> > > >
> > > > Lucas
> > > >
> > >
> >
>


Re: 答复: [VOTE] KIP-347: Enable batching in FindCoordinatorRequest

2018-08-27 Thread Yishun Guan
Hi Ted, you are right! I fixed it to List.

I would prefer the first solution because all the logic will be added
in NetworkClient.java, and changing the current AbstractRequest class
seems a little unnecessary.

Thanks
Yishun
On Mon, Aug 27, 2018 at 4:57 PM Ted Yu  wrote:
>
> Looking at the code for solution #1:
>  } else if (builder.build(version)
> 
> instanceof List){
>
> wouldn't AbstractRequest be gone due to type erasure ?
>
> Which solution do you favor ?
>
> Cheers
>
> On Mon, Aug 27, 2018 at 4:20 PM Yishun Guan  wrote:
>
> > Sorry for the delay, I have been struggling to come up with a nice
> > solution:
> >
> > I have updated the KIP:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-347%3A++Enable+batching+in+FindCoordinatorRequest
> >
> > In summary, to solve the question Guozhang raised:
> >
> > "One tricky question is, how do we know if a higher version API has a
> > batching optimization...
> >
> > a) One solution is to let the request's builder.build() return either
> > ONE request or a LIST of requests. This is backward compatible. We can
> > have a list of one single element.
> >
> > b) An alternative solution is to add extra fields in
> > AbstractRequest.java including Boolean isBatchingEnable() and
> > List buildFromBatch(). This way will decouple the two
> > different build functions.
> >
> > Then we update the send logic in doSend() correspondingly."
> >
> >
> > You can read about these solutions in more details in this KIP.
> >
> > Thanks,
> > Yishun
> >
> > On Fri, Aug 17, 2018 at 12:12 PM Yishun Guan  wrote:
> > >
> > > Thanks for the clarification. I will address this in my KIP.
> > >
> > > On Fri, Aug 17, 2018, 12:06 PM Guozhang Wang  wrote:
> > >>
> > >> Today we do have logic for auto down-conversion, but it is assuming a
> > one-to-one mapping. The actual logic is here:
> > >>
> > >>
> > https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L775
> > >>
> > >> As you can see, NetworkClient maintains an "apiVersions" map that keeps
> > which node supports which version. And then when sending the request to the
> > node it will use this map to form the supported version of the request.
> > >>
> > >> But the current logic does not consider that we may need multiple lower
> > versioned requests to substitute for a single higher versioned request, and
> > that would be the logic your PR needs to address.
> > >>
> > >>
> > >> Guozhang
> > >>
> > >> On Fri, Aug 17, 2018 at 11:59 AM, Yishun Guan 
> > wrote:
> > >>>
> > >>> @Guozhang Wang One thing that I remain confused about (and forgive me
> > if you have explained this to me before), is that if we don't have any
> > transformation helpers (between versions) implemented before, how do we
> > deal with other incompatibility issues when we try to update requests and
> > bump up their versions? Or we never have this problem until this version
> > update and now that's why we need to implement a converter from V3 to V2?
> > >>>
> > >>> On Fri, Aug 17, 2018 at 10:18 AM Guozhang Wang 
> > wrote:
> > 
> >  Yishun, some more comments:
> > 
> >  1. "All the coordinator ids " + "for this request": it should be "all
> > the
> >  requested group ids looking for their coordinators" right?
> > 
> >  2. I just thought about this a bit more, regarding "*Compatibility
> > issues
> >  between old and new versions need to be considered, we should think
> > about
> >  how to convert requests from a newer version to an old version."*
> > 
> > 
> >  One thing I realized is that for FindCoordinatorRequest, today both
> >  consumer / admin client would need it. I.e. to complete the KIP for
> >  compatibility, you'll have to implement this function along with this
> > KIP,
> >  since otherwise a consumer talking to an old broker will fail to do so.
> > 
> >  So I'd suggest you update the `Compatibility` section with a detailed
> >  proposal on how to let new versioned clients to talk to old versioned
> >  brokers. We've talked about some high-level implementation guidelines
> > in
> >  the DISCUSS thread, which you can try it out and see if it works:
> > i.e. by
> >  starting a Kafka broker with version 2.0, and then starting a consumer
> >  client with trunk (it will have a new version), and the added logic
> > should
> >  make sure the consumer still proceeds normally with the compatibility
> > logic
> >  that we are going to add.
> > 
> > 
> >  Guozhang
> > 
> >  On Thu, Aug 16, 2018 at 5:46 PM, Hu Xi  wrote:
> > 
> >  > +1 (non-binding)
> >  >
> >  > 
> >  > 发件人: Yishun Guan 
> >  > 发送时间: 2018年8月17日 8:14
> >  > 收件人: dev@kafka.apache.org
> >  > 主题: [VOTE] KIP-347: Enable batching in FindCoordinatorRequest
> >  >
> >  > Hi all,
> >  >
> >  > I 

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

2018-08-27 Thread Lucas Wang
Thanks for the comments, Joel.

I addressed all but the last one, where Jun also shared a comment in the
Vote thread to
change it to "controller.listener.name". I actually feel CONTROLLER is
better since it's a well defined
concept in Kafka, while it's easier to confuse people with CONTROL since
in the code we refer to some request used for transactional producing as a
CONTROL batch.

Thanks,
Lucas


[jira] [Resolved] (KAFKA-7347) Wrong error code returned for OffsetsForLeaderEpoch from non-replica

2018-08-27 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-7347.

   Resolution: Fixed
Fix Version/s: 2.1.0

> Wrong error code returned for OffsetsForLeaderEpoch from non-replica
> 
>
> Key: KAFKA-7347
> URL: https://issues.apache.org/jira/browse/KAFKA-7347
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.1.0
>
>
> We should return NOT_LEADER_FOR_PARTITION from OffsetsForLeaderEpoch requests 
> to non-replicas instead of UNKNOWN_TOPIC_OR_PARTITION.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] KIP-365: Materialized, Serialized, Joined, Consumed and Produced with implicit Serde

2018-08-27 Thread Dongjin Lee
+1 (non-binding)

On Tue, Aug 28, 2018 at 8:53 AM Bill Bejeck  wrote:

> +1
>
> -Bill
>
> On Mon, Aug 27, 2018 at 3:24 PM Ted Yu  wrote:
>
> > +1
> >
> > On Mon, Aug 27, 2018 at 12:18 PM John Roesler  wrote:
> >
> > > +1 (non-binding)
> > >
> > > On Sat, Aug 25, 2018 at 1:16 PM Joan Goyeau  wrote:
> > >
> > > > Hi,
> > > >
> > > > We want to make sure that we always have a serde for all
> Materialized,
> > > > Serialized, Joined, Consumed and Produced.
> > > > For that we can make use of the implicit parameters in Scala.
> > > >
> > > > KIP:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-365%3A+Materialized%2C+Serialized%2C+Joined%2C+Consumed+and+Produced+with+implicit+Serde
> > > >
> > > > Github PR: https://github.com/apache/kafka/pull/5551
> > > >
> > > > Please make your votes.
> > > > Thanks
> > > >
> > >
> >
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
> *github:  github.com/dongjinleekr
> linkedin: kr.linkedin.com/in/dongjinleekr
> slideshare: 
> www.slideshare.net/dongjinleekr
> *
>


Re: 答复: [VOTE] KIP-347: Enable batching in FindCoordinatorRequest

2018-08-27 Thread Ted Yu
Looking at the code for solution #1:
 } else if (builder.build(version)

instanceof List){

wouldn't AbstractRequest be gone due to type erasure ?

Which solution do you favor ?

Cheers
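
For anyone following along, a small self-contained illustration of the
type-erasure point; AbstractRequestStub stands in for Kafka's AbstractRequest
here, this is not the actual client code:

import java.util.Collections;
import java.util.List;

public class ErasureSketch {
    static class AbstractRequestStub { }

    public static void main(String[] args) {
        Object built = Collections.singletonList(new AbstractRequestStub());

        // The raw List type survives erasure, so this check compiles and passes.
        System.out.println(built instanceof List);

        // `built instanceof List<AbstractRequestStub>` would not compile:
        // the generic type argument is erased at runtime, so the element
        // type has to be checked per element instead.
        if (built instanceof List) {
            boolean allRequests = ((List<?>) built).stream()
                    .allMatch(e -> e instanceof AbstractRequestStub);
            System.out.println(allRequests);
        }
    }
}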

On Mon, Aug 27, 2018 at 4:20 PM Yishun Guan  wrote:

> Sorry for the delay, I have been struggling to come up with a nice
> solution:
>
> I have updated the KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-347%3A++Enable+batching+in+FindCoordinatorRequest
>
> In summary, to solve the question Guozhang raised:
>
> "One tricky question is, how do we know if a higher version API has a
> batching optimization...
>
> a) One solution is to let the request's builder.build() return either
> ONE request or a LIST of requests. This is backward compatible. We can
> have a list of one single element.
>
> b) An alternative solution is to add extra fields in
> AbstractRequest.java including Boolean isBatchingEnable() and
> List buildFromBatch(). This way will decouple the two
> different build functions.
>
> Then we update the send logic in doSend() correspondingly."
>
>
> You can read about these solutions in more details in this KIP.
>
> Thanks,
> Yishun
>
> On Fri, Aug 17, 2018 at 12:12 PM Yishun Guan  wrote:
> >
> > Thanks for the clarification. I will address this in my KIP.
> >
> > On Fri, Aug 17, 2018, 12:06 PM Guozhang Wang  wrote:
> >>
> >> Today we do have logic for auto down-conversion, but it is assuming a
> one-to-one mapping. The actual logic is here:
> >>
> >>
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L775
> >>
> >> As you can see, NetworkClient maintains an "apiVersions" map that keeps
> which node supports which version. And then when sending the request to the
> node it will use this map to form the supported version of the request.
> >>
> >> But the current logic does not consider that we may need multiple lower
> versioned requests to substitute for a single higher versioned request, and
> that would be the logic your PR needs to address.
> >>
> >>
> >> Guozhang
> >>
> >> On Fri, Aug 17, 2018 at 11:59 AM, Yishun Guan 
> wrote:
> >>>
> >>> @Guozhang Wang One thing that I remain confused about (and forgive me
> if you have explained this to me before), is that if we don't have any
> transformation helpers (between versions) implemented before, how do we
> deal with other incompatibility issues when we try to update requests and
> bump up their versions? Or we never have this problem until this version
> update and now that's why we need to implement a converter from V3 to V2?
> >>>
> >>> On Fri, Aug 17, 2018 at 10:18 AM Guozhang Wang 
> wrote:
> 
>  Yishun, some more comments:
> 
>  1. "All the coordinator ids " + "for this request": it should be "all
> the
>  requested group ids looking for their coordinators" right?
> 
>  2. I just thought about this a bit more, regarding "*Compatibility
> issues
>  between old and new versions need to be considered, we should think
> about
>  how to convert requests from a newer version to an old version."*
> 
> 
>  One thing I realized is that for FindCoordinatorRequest, today both
>  consumer / admin client would need it. I.e. to complete the KIP for
>  compatibility, you'll have to implement this function along with this
> KIP,
>  since otherwise a consumer talking to an old broker will fail to do so.
> 
>  So I'd suggest you update the `Compatibility` section with a detailed
>  proposal on how to let new versioned clients to talk to old versioned
>  brokers. We've talked about some high-level implementation guidelines
> in
>  the DISCUSS thread, which you can try it out and see if it works:
> i.e. by
>  starting a Kafka broker with version 2.0, and then starting a consumer
>  client with trunk (it will have a new version), and the added logic
> should
>  make sure the consumer still proceeds normally with the compatibility
> logic
>  that we are going to add.
> 
> 
>  Guozhang
> 
>  On Thu, Aug 16, 2018 at 5:46 PM, Hu Xi  wrote:
> 
>  > +1 (non-binding)
>  >
>  > 
>  > 发件人: Yishun Guan 
>  > 发送时间: 2018年8月17日 8:14
>  > 收件人: dev@kafka.apache.org
>  > 主题: [VOTE] KIP-347: Enable batching in FindCoordinatorRequest
>  >
>  > Hi all,
>  >
>  > I want to start a vote on this KIP:
>  > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>  > 347%3A++Enable+batching+in+FindCoordinatorRequest
>  >
>  > Here is the discussion thread:
>  > https://lists.apache.org/thread.html/fd727cc7d5b0956d64255c35d5ed46
>  > 403c3495a7052ba8ffbc55e084@%3Cdev.kafka.apache.org%3E
>  >
>  > Thanks everyone for your input!
>  >
>  > Best,
>  > Yishun
>  >
> 
> 
> 
>  --
>  -- Guozhang
> >>
> >>
> >>
> >>
> >> --

Re: [VOTE] KIP-365: Materialized, Serialized, Joined, Consumed and Produced with implicit Serde

2018-08-27 Thread Bill Bejeck
+1

-Bill

On Mon, Aug 27, 2018 at 3:24 PM Ted Yu  wrote:

> +1
>
> On Mon, Aug 27, 2018 at 12:18 PM John Roesler  wrote:
>
> > +1 (non-binding)
> >
> > On Sat, Aug 25, 2018 at 1:16 PM Joan Goyeau  wrote:
> >
> > > Hi,
> > >
> > > We want to make sure that we always have a serde for all Materialized,
> > > Serialized, Joined, Consumed and Produced.
> > > For that we can make use of the implicit parameters in Scala.
> > >
> > > KIP:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-365%3A+Materialized%2C+Serialized%2C+Joined%2C+Consumed+and+Produced+with+implicit+Serde
> > >
> > > Github PR: https://github.com/apache/kafka/pull/5551
> > >
> > > Please make your votes.
> > > Thanks
> > >
> >
>


[jira] [Created] (KAFKA-7350) Improve or remove the PreferredReplicaImbalanceCount metric

2018-08-27 Thread Lucas Wang (JIRA)
Lucas Wang created KAFKA-7350:
-

 Summary: Improve or remove the PreferredReplicaImbalanceCount 
metric
 Key: KAFKA-7350
 URL: https://issues.apache.org/jira/browse/KAFKA-7350
 Project: Kafka
  Issue Type: Improvement
Reporter: Lucas Wang
Assignee: Lucas Wang


In KAFKA-6753, we identified that in the ControllerEventManager, updating two 
metrics after processing every controller event ends up consuming too much CPU. 
The first metric OfflinePartitionCount is resolved in KAFKA-6753, and this 
ticket is for tracking progress on the 2nd metric 
PreferredReplicaImbalanceCount. 

The options we have for this metric include:
1. Remove this metric, given that, if necessary, its value can be 
derived by getting the metadata of all topics in the cluster (see the sketch below).
2. Piggyback the update of the metric every time the auto leader balancer runs. 
The benefit is keeping this metric. However, the downside is that the metric 
may then become stale or incorrect depending on when it is checked.
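
For option 1, a rough sketch of how the value could be derived externally with
the AdminClient. The bootstrap address is a placeholder, and it assumes the
usual convention that the preferred leader is the first replica in the
assignment:

{code}
import java.util.Map;
import java.util.Properties;
import java.util.Set;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class PreferredReplicaImbalanceSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            Set<String> topics = admin.listTopics().names().get();
            Map<String, TopicDescription> descriptions =
                    admin.describeTopics(topics).all().get();
            // Count partitions whose current leader is not the first replica.
            long imbalanced = descriptions.values().stream()
                    .flatMap(d -> d.partitions().stream())
                    .filter(p -> !p.replicas().isEmpty() && p.leader() != null
                            && p.leader().id() != p.replicas().get(0).id())
                    .count();
            System.out.println("preferred replica imbalance count: " + imbalanced);
        }
    }
}
{code}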



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-6753) Speed up event processing on the controller

2018-08-27 Thread Lucas Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucas Wang resolved KAFKA-6753.
---
Resolution: Fixed

> Speed up event processing on the controller 
> 
>
> Key: KAFKA-6753
> URL: https://issues.apache.org/jira/browse/KAFKA-6753
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Lucas Wang
>Assignee: Lucas Wang
>Priority: Minor
> Fix For: 2.1.0
>
> Attachments: Screen Shot 2018-04-04 at 7.08.55 PM.png
>
>
> The existing controller code updates metrics after processing every event. 
> This can slow down event processing on the controller tremendously. In one 
> profiling we see that updating metrics takes nearly 100% of the CPU for the 
> controller event processing thread. Specifically the slowness can be 
> attributed to two factors:
> 1. Each invocation to update the metrics is expensive. Specifically trying to 
> calculate the offline partitions count requires iterating through all the 
> partitions in the cluster to check if the partition is offline; and 
> calculating the preferred replica imbalance count requires iterating through 
> all the partitions in the cluster to check if a partition has a leader other 
> than the preferred leader. In a large cluster, the number of partitions can 
> be quite large, all seen by the controller. Even if the time spent to check a 
> single partition is small, the accumulation effect of so many partitions in 
> the cluster can make the invocation to update metrics quite expensive. One 
> might argue that maybe the logic for processing each single partition is not 
> optimized; we checked the CPU percentage of leaf nodes in the profiling 
> result, and found that inside the loops of collection objects, e.g. the set 
> of all partitions, no single function dominates the processing. Hence the 
> large number of the partitions in a cluster is the main contributor to the 
> slowness of one invocation to update the metrics.
> 2. The invocation to update metrics is called many times when there is a high 
> number of events to be processed by the controller, one invocation after 
> processing each event.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: 答复: [VOTE] KIP-347: Enable batching in FindCoordinatorRequest

2018-08-27 Thread Yishun Guan
Sorry for the delay, I have been struggling to come up with a nice solution:

I have updated the KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-347%3A++Enable+batching+in+FindCoordinatorRequest

In summary, to solve the question Guozhang raised:

"One tricky question is, how do we know if a higher version API has a
batching optimization...

a) One solution is to let the request's builder.build() return either
ONE request or a LIST of requests. This is backward compatible. We can
have a list of one single element.

b) An alternative solution is to add extra fields in
AbstractRequest.java including Boolean isBatchingEnable() and
List buildFromBatch(). This way will decouple the two
different build functions.

Then we update the send logic in doSend() correspondingly."


You can read about these solutions in more details in this KIP.

Thanks,
Yishun
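
To make option (b) a bit more concrete, here is a self-contained sketch. The
isBatchingEnable()/buildFromBatch() names come from the proposal above; the
surrounding types and the send path are simplified stand-ins, not the real
NetworkClient or AbstractRequest code:

import java.util.Collections;
import java.util.List;

public class BatchingBuilderSketch {

    // Simplified stand-in for a request builder.
    abstract static class Builder<T> {
        abstract T build(short version);

        // Hypothetical: whether this builder can split a batched request.
        boolean isBatchingEnable() {
            return false;
        }

        // Hypothetical: build several single-key requests from one batched one.
        List<T> buildFromBatch(short version) {
            return Collections.singletonList(build(version));
        }
    }

    // Simplified stand-in for the doSend() branch: if the destination broker
    // only supports an older, non-batching version, fan the request out.
    static <T> void send(Builder<T> builder, short nodeVersion, boolean nodeSupportsBatching) {
        if (builder.isBatchingEnable() && !nodeSupportsBatching) {
            for (T request : builder.buildFromBatch(nodeVersion)) {
                dispatch(request);
            }
        } else {
            dispatch(builder.build(nodeVersion));
        }
    }

    static <T> void dispatch(T request) {
        System.out.println("sending " + request);
    }
}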

On Fri, Aug 17, 2018 at 12:12 PM Yishun Guan  wrote:
>
> Thanks for the clarification. I will address this in my KIP.
>
> On Fri, Aug 17, 2018, 12:06 PM Guozhang Wang  wrote:
>>
>> Today we do have logic for auto down-conversion, but it is assuming a 
>> one-to-one mapping. The actual logic is here:
>>
>> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L775
>>
>> As you can see, NetworkClient maintains an "apiVersions" map that keeps which 
>> node supports which version. And then when sending the request to the node 
>> it will use this map to form the supported version of the request.
>>
>> But the current logic does not consider that we may need multiple lower versioned 
>> requests to substitute for a single higher versioned request, and that would be 
>> the logic your PR needs to address.
>>
>>
>> Guozhang
>>
>> On Fri, Aug 17, 2018 at 11:59 AM, Yishun Guan  wrote:
>>>
>>> @Guozhang Wang One thing that I remain confused about (and forgive me if 
>>> you have explained this to me before), is that if we don't have any 
>>> transformation helpers (between versions) implemented before, how do we 
>>> deal with other incompatibility issues when we try to update requests and 
>>> bump up their versions? Or we never have this problem until this version 
>>> update and now that's why we need to implement a converter from V3 to V2?
>>>
>>> On Fri, Aug 17, 2018 at 10:18 AM Guozhang Wang  wrote:

 Yishun, some more comments:

 1. "All the coordinator ids " + "for this request": it should be "all the
 requested group ids looking for their coordinators" right?

 2. I just thought about this a bit more, regarding "*Compatibility issues
 between old and new versions need to be considered, we should think about
 how to convert requests from a newer version to an old version."*


 One thing I realized is that for FindCoordinatorRequest, today both
 consumer / admin client would need it. I.e. to complete the KIP for
 compatibility, you'll have to implement this function along with this KIP,
 since otherwise a consumer talking to an old broker will fail to do so.

 So I'd suggest you update the `Compatibility` section with a detailed
 proposal on how to let new versioned clients to talk to old versioned
 brokers. We've talked about some high-level implementation guidelines in
 the DISCUSS thread, which you can try it out and see if it works: i.e. by
 starting a Kafka broker with version 2.0, and then starting a consumer
 client with trunk (it will have a new version), and the added logic should
 make sure the consumer still proceeds normally with the compatibility logic
 that we are going to add.


 Guozhang

 On Thu, Aug 16, 2018 at 5:46 PM, Hu Xi  wrote:

 > +1 (non-binding)
 >
 > 
 > 发件人: Yishun Guan 
 > 发送时间: 2018年8月17日 8:14
 > 收件人: dev@kafka.apache.org
 > 主题: [VOTE] KIP-347: Enable batching in FindCoordinatorRequest
 >
 > Hi all,
 >
 > I want to start a vote on this KIP:
 > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
 > 347%3A++Enable+batching+in+FindCoordinatorRequest
 >
 > Here is the discussion thread:
 > https://lists.apache.org/thread.html/fd727cc7d5b0956d64255c35d5ed46
 > 403c3495a7052ba8ffbc55e084@%3Cdev.kafka.apache.org%3E
 >
 > Thanks everyone for your input!
 >
 > Best,
 > Yishun
 >



 --
 -- Guozhang
>>
>>
>>
>>
>> --
>> -- Guozhang


[jira] [Created] (KAFKA-7349) Long Disk Writes cause Zookeeper Disconnects

2018-08-27 Thread Adam Kafka (JIRA)
Adam Kafka created KAFKA-7349:
-

 Summary: Long Disk Writes cause Zookeeper Disconnects
 Key: KAFKA-7349
 URL: https://issues.apache.org/jira/browse/KAFKA-7349
 Project: Kafka
  Issue Type: Bug
  Components: zkclient
Affects Versions: 0.11.0.1
Reporter: Adam Kafka
 Attachments: SpikeInWriteTime.png

We run our Kafka cluster on a cloud service provider. As a consequence, we 
notice a large write-time tail latency that is out of our control. Some writes 
take on the order of seconds. We have noticed that often these long write times 
are correlated with subsequent Zookeeper disconnects from the brokers. It 
appears that during the long write, the Zookeeper heartbeat thread does 
not get scheduled CPU time, resulting in a long gap between heartbeats sent. After 
the write, the ZK thread does get scheduled CPU time, but it detects that it 
has not received a heartbeat from Zookeeper in a while, so it drops its 
connection and then rejoins the cluster.

Note that the timeout reported is inconsistent with the timeout as set by the 
configuration ({{zookeeper.session.timeout.ms}} = default of 6 seconds). We 
have seen a range of values reported here, including 5950ms (less than 
threshold), 12032ms (double the threshold), 25999ms (much larger than the 
threshold).

We noticed that during a service degradation of the storage service of our 
cloud provider, these Zookeeper disconnects increased drastically in frequency. 

We are hoping there is a way to decouple these components. Do you agree with 
our diagnosis that the ZK disconnects are occurring due to thread contention 
caused by long disk writes? Perhaps the ZK thread could be scheduled at a 
higher priority? Do you have any suggestions for how to avoid the ZK 
disconnects?

Here is an example of one of these events:
Logs on the Broker:
{code}
[2018-08-25 04:10:19,695] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:21,697] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:23,700] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:25,701] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:27,702] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:29,704] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:31,707] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:33,709] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:35,712] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:37,714] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:39,716] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:41,719] DEBUG Got ping response for sessionid: 
0x36202ab4337002c after 1ms (org.apache.zookeeper.ClientCnxn)
...
[2018-08-25 04:10:53,752] WARN Client session timed out, have not heard from 
server in 12032ms for sessionid 0x36202ab4337002c 
(org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:53,754] INFO Client session timed out, have not heard from 
server in 12032ms for sessionid 0x36202ab4337002c, closing socket connection 
and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2018-08-25 04:10:53,920] INFO zookeeper state changed (Disconnected) 
(org.I0Itec.zkclient.ZkClient)
[2018-08-25 04:10:53,920] INFO Waiting for keeper state SyncConnected 
(org.I0Itec.zkclient.ZkClient)
...
{code}

GC logs during the same time (demonstrating this is not just a long GC):
{code}
2018-08-25T04:10:36.434+: 35150.779: [GC (Allocation Failure)  
3074119K->2529089K(6223360K), 0.0137342 secs]
2018-08-25T04:10:37.367+: 35151.713: [GC (Allocation Failure)  
3074433K->2528524K(6223360K), 0.0127938 secs]
2018-08-25T04:10:38.274+: 35152.620: [GC (Allocation Failure)  
3073868K->2528357K(6223360K), 0.0131040 secs]
2018-08-25T04:10:39.220+: 35153.566: [GC (Allocation Failure)  
3073701K->2528885K(6223360K), 0.0133247 secs]
2018-08-25T04:10:40.175+: 35154.520: [GC (Allocation Failure)  
3074229K->2528639K(6223360K), 0.0127870 secs]
2018-08-25T04:10:41.084+: 35155.429: [GC (Allocation Failure)  
3073983K->2530769K(6223360K), 0.0135058 secs]
2018-08-25T04:10:42.012+: 35156.358: [GC (Allocation Failure)  
3076113K->2531772K(6223360K), 0.0165919 secs]
2018-08-25T04:10:53.737+: 

Re: [VOTE] KIP-357: Add support to list ACLs per principal

2018-08-27 Thread Harsha
+1 (binding)

-Harsha

On Mon, Aug 27, 2018, at 12:46 PM, Jakub Scholz wrote:
> +1 (non-binding)
> 
> On Mon, Aug 27, 2018 at 6:24 PM Manikumar  wrote:
> 
> > Hi All,
> >
> > I would like to start voting on KIP-357 which allows listing ACLs per
> > principal using AclCommand (kafka-acls.sh)
> >
> > KIP:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-357%3A++Add+support+to+list+ACLs+per+principal
> >
> > Discussion Thread:
> >
> > https://lists.apache.org/thread.html/dc7f6005845a372a0a48a40872a32d9ece03807a4fb1bb89d3645afb@%3Cdev.kafka.apache.org%3E
> >
> > Thanks,
> > Manikumar
> >


Re: [VOTE] KIP-363: Make FunctionConversions private

2018-08-27 Thread Joan Goyeau
John, no this is for internal use only.
In fact I expect this object to go away with the drop of Scala 2.11 since in
Scala 2.12 we have support for SAM.

Thanks

On Mon, 27 Aug 2018 at 15:41 John Roesler  wrote:

> Hey Joan,
>
> I was thinking more about this... Do any of the conversions in
> FunctionConversions convert to types that are used in the public Scala
> interface?
>
> If you've already checked, then carry on.
>
> Otherwise, we should leave public any that might be in use.
>
> Thanks,
> -John
>
> On Sat, Aug 25, 2018 at 12:19 PM Joan Goyeau  wrote:
>
> > Thanks Attila, it's done.
> >
> > On Sat, 25 Aug 2018 at 02:57 Ted Yu  wrote:
> >
> > > +1
> > >
> > > On Fri, Aug 24, 2018 at 5:17 PM Attila Sasvári 
> > > wrote:
> > >
> > > > Hi there,
> > > >
> > > > There is a conflicting KIP with the same number, see
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-363%3A+Allow+performance+tools+to+print+final+results+to+output+file
> > > >
> > > > Its discussion was started earlier, on August 23
> > > > https://www.mail-archive.com/dev@kafka.apache.org/msg91132.html and
> > KIP
> > > > page already includes it:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > > >
> > > > Please update KIP number to resolve the conflict.
> > > >
> > > > Apart from this, +1 (non-binding) and thanks for the KIP!
> > > >
> > > > Regards,
> > > > - Attila
> > > >
> > > >
> > > > Guozhang Wang  (időpont: 2018. aug. 24., P,
> 20:26)
> > > ezt
> > > > írta:
> > > >
> > > > > +1 from me (binding).
> > > > >
> > > > > On Fri, Aug 24, 2018 at 11:24 AM, Joan Goyeau 
> > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > As pointed out in this comment #5539 (comment)
> > > > > >  >
> > > > "This
> > > > > > class was already defaulted to public visibility, and we can't
> > > retract
> > > > it
> > > > > > now, without a KIP.", the object FunctionConversions is only of
> > > > internal
> > > > > > use and therefore should be private to the lib only so that we
> can
> > do
> > > > > > changes without going through KIP like this one.
> > > > > >
> > > > > > Please make your vote.
> > > > > >
> > > > > > On Fri, 24 Aug 2018 at 19:14 John Roesler 
> > wrote:
> > > > > >
> > > > > > > I'm also in favor of this. I don't think it's controversial
> > either.
> > > > > > Should
> > > > > > > we just move to a vote?
> > > > > > >
> > > > > > > On Thu, Aug 23, 2018 at 7:01 PM Guozhang Wang <
> > wangg...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > +1.
> > > > > > > >
> > > > > > > > On Thu, Aug 23, 2018 at 12:47 PM, Ted Yu <
> yuzhih...@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > +1
> > > > > > > > >
> > > > > > > > > In the Motivation section, you can quote the comment from
> > pull
> > > > > > request
> > > > > > > so
> > > > > > > > > that reader doesn't have to click through.
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > >
> > > > > > > > > On Thu, Aug 23, 2018 at 12:13 PM Joan Goyeau <
> > j...@goyeau.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > As pointed out in this comment #5539 (comment)
> > > > > > > > > > <
> > > > https://github.com/apache/kafka/pull/5539#discussion_r212380648
> > > > > >
> > > > > > > the
> > > > > > > > > > object FunctionConversions is only of internal use and
> > > > therefore
> > > > > > > should
> > > > > > > > > be
> > > > > > > > > > private to the lib only so that we can do changes without
> > > going
> > > > > > > through
> > > > > > > > > KIP
> > > > > > > > > > like this one.
> > > > > > > > > >
> > > > > > > > > > KIP:
> > > > > > > > > >
> > > > > > > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-363%3A+Make+
> > > > > > > > > FunctionConversions+private
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > -- Guozhang
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] KIP-363: Make FunctionConversions private

2018-08-27 Thread Bill Bejeck
+1

-Bill

On Mon, Aug 27, 2018 at 10:41 AM John Roesler  wrote:

> Hey Joan,
>
> I was thinking more about this... Do any of the conversions in
> FunctionConversions convert to types that are used in the public Scala
> interface?
>
> If you've already checked, then carry on.
>
> Otherwise, we should leave public any that might be in use.
>
> Thanks,
> -John
>
> On Sat, Aug 25, 2018 at 12:19 PM Joan Goyeau  wrote:
>
> > Thanks Attila, it's done.
> >
> > On Sat, 25 Aug 2018 at 02:57 Ted Yu  wrote:
> >
> > > +1
> > >
> > > On Fri, Aug 24, 2018 at 5:17 PM Attila Sasvári 
> > > wrote:
> > >
> > > > Hi there,
> > > >
> > > > There is a conflicting KIP with the same number, see
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-363%3A+Allow+performance+tools+to+print+final+results+to+output+file
> > > >
> > > > Its discussion was started earlier, on August 23
> > > > https://www.mail-archive.com/dev@kafka.apache.org/msg91132.html and
> > KIP
> > > > page already includes it:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > > >
> > > > Please update KIP number to resolve the conflict.
> > > >
> > > > Apart from this, +1 (non-binding) and thanks for the KIP!
> > > >
> > > > Regards,
> > > > - Attila
> > > >
> > > >
> > > > Guozhang Wang  (időpont: 2018. aug. 24., P,
> 20:26)
> > > ezt
> > > > írta:
> > > >
> > > > > +1 from me (binding).
> > > > >
> > > > > On Fri, Aug 24, 2018 at 11:24 AM, Joan Goyeau 
> > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > As pointed out in this comment #5539 (comment)
> > > > > >  >
> > > > "This
> > > > > > class was already defaulted to public visibility, and we can't
> > > retract
> > > > it
> > > > > > now, without a KIP.", the object FunctionConversions is only of
> > > > internal
> > > > > > use and therefore should be private to the lib only so that we
> can
> > do
> > > > > > changes without going through KIP like this one.
> > > > > >
> > > > > > Please make your vote.
> > > > > >
> > > > > > On Fri, 24 Aug 2018 at 19:14 John Roesler 
> > wrote:
> > > > > >
> > > > > > > I'm also in favor of this. I don't think it's controversial
> > either.
> > > > > > Should
> > > > > > > we just move to a vote?
> > > > > > >
> > > > > > > On Thu, Aug 23, 2018 at 7:01 PM Guozhang Wang <
> > wangg...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > +1.
> > > > > > > >
> > > > > > > > On Thu, Aug 23, 2018 at 12:47 PM, Ted Yu <
> yuzhih...@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > +1
> > > > > > > > >
> > > > > > > > > In the Motivation section, you can quote the comment from
> > pull
> > > > > > request
> > > > > > > so
> > > > > > > > > that reader doesn't have to click through.
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > >
> > > > > > > > > On Thu, Aug 23, 2018 at 12:13 PM Joan Goyeau <
> > j...@goyeau.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > As pointed out in this comment #5539 (comment)
> > > > > > > > > > <
> > > > https://github.com/apache/kafka/pull/5539#discussion_r212380648
> > > > > >
> > > > > > > the
> > > > > > > > > > object FunctionConversions is only of internal use and
> > > > therefore
> > > > > > > should
> > > > > > > > > be
> > > > > > > > > > private to the lib only so that we can do changes without
> > > going
> > > > > > > through
> > > > > > > > > KIP
> > > > > > > > > > like this one.
> > > > > > > > > >
> > > > > > > > > > KIP:
> > > > > > > > > >
> > > > > > > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-363%3A+Make+
> > > > > > > > > FunctionConversions+private
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > -- Guozhang
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-354 Time-based log compaction policy

2018-08-27 Thread xiongqi wu
Hi All,

Do you have any additional comments on this KIP?


On Thu, Aug 16, 2018 at 9:17 PM, xiongqi wu  wrote:

> on 2)
> The offsetmap is built starting from the dirty segment.
> The compaction starts from the beginning of the log partition.  That's how
> it ensures the deletion of tombstone keys.
> I will double check tomorrow.
>
> Xiongqi (Wesley) Wu
>
>
> On Thu, Aug 16, 2018 at 6:46 PM Brett Rann 
> wrote:
>
>> To just clarify a bit on 1.  whether there's an external storage/DB isn't
>> relevant here.
>> Compacted topics allow a tombstone record to be sent (a null value for a
>> key) which
>> currently will result in old values for that key being deleted if some
>> conditions are met.
>> There are existing controls to make sure the old values will stay around
>> for a minimum
>> time at least, but no dedicated control to ensure the tombstone will
>> delete
>> within a
>> maximum time.
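
As a concrete (if simplified) illustration of the tombstone mechanism described
above, a producer deletes a key from a compacted topic by sending a null value.
The topic, key and bootstrap address below are placeholders:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TombstoneSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A null value is a tombstone: compaction will eventually remove
            // earlier values for this key once the configured conditions are met.
            producer.send(new ProducerRecord<>("user-profiles", "user-123", null));
        }
    }
}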
>>
>> One popular reason that maximum time for deletion is desirable right now
>> is
>> GDPR with
>> PII. But we're not proposing any GDPR awareness in kafka, just being able
>> to guarantee
>> a max time where a tombstoned key will be removed from the compacted
>> topic.
>>
>> on 2)
>> huh, i thought it kept track of the first dirty segment and didn't
>> recompact older "clean" ones.
>> But I didn't look at code or test for that.
>>
>> On Fri, Aug 17, 2018 at 10:57 AM xiongqi wu  wrote:
>>
>> > 1, Owner of data (in this sense, kafka is not the owner of data)
>> > should keep track of lifecycle of the data in some external storage/DB.
>> > The owner determines when to delete the data and send the delete
>> request to
>> > kafka. Kafka doesn't know about the content of the data but only provides
>> > a means for deletion.
>> >
>> > 2, each time compaction runs, it will start from the first segments (no
>> > matter whether they are compacted or not). The time estimation here is only
>> > used to determine whether we should run compaction on this log partition. So
>> > we only need to estimate the uncompacted segments.
>> >
>> > On Thu, Aug 16, 2018 at 5:35 PM, Dong Lin  wrote:
>> >
>> > > Hey Xiongqi,
>> > >
>> > > Thanks for the update. I have two questions for the latest KIP.
>> > >
>> > > 1) The motivation section says that one use case is to delete PII
>> > (Personal
>> > > Identifiable information) data within 7 days while keeping non-PII
>> > > indefinitely in compacted format. I suppose the use-case depends on
>> the
>> > > application to determine when to delete those PII data. Could you
>> explain
>> > > how an application can reliably determine the set of keys that should be
>> > > deleted? Is the application required to always read messages from the topic
>> after
>> > > every restart and determine the keys to be deleted by looking at
>> message
>> > > timestamp, or is the application supposed to persist the key-> timestamp
>> > > information in a separate persistent storage system?
>> > >
>> > > 2) It is mentioned in the KIP that "we only need to estimate earliest
>> > > message timestamp for un-compacted log segments because the deletion
>> > > requests that belong to compacted segments have already been
>> processed".
>> > > Not sure if it is correct. If a segment is compacted before user sends
>> > > message to delete a key in this segment, it seems that we still need
>> to
>> > > ensure that the segment will be compacted again within the given time
>> > after
>> > > the deletion is requested, right?
>> > >
>> > > Thanks,
>> > > Dong
>> > >
>> > > On Thu, Aug 16, 2018 at 10:27 AM, xiongqi wu 
>> > wrote:
>> > >
>> > > > Hi Xiaohe,
>> > > >
>> > > > Quick note:
>> > > > 1) Use minimum of segment.ms and max.compaction.lag.ms
>> > > > > > >
>> > > >
>> > > > 2) I am not sure if I get your second question. first, we have
>> jitter
>> > > when
>> > > > we roll the active segment. second, on each compaction, we compact
>> upto
>> > > > the offsetmap could allow. Those will not lead to perfect compaction
>> > > storm
>> > > > overtime. In addition, I expect we are setting
>> max.compaction.lag.ms
>> > on
>> > > > the order of days.
>> > > >
>> > > > 3) I don't have access to the confluent community slack for now. I
>> am
>> > > > reachable via the google handle out.
>> > > > To avoid the double effort, here is my plan:
>> > > > a) Collect more feedback and feature requriement on the KIP.
>> > > > b) Wait unitl this KIP is approved.
>> > > > c) I will address any additional requirements in the implementation.
>> > (My
>> > > > current implementation only complies to whatever described in the
>> KIP
>> > > now)
>> > > > d) I can share the code with the you and community see you want to
>> add
>> > > > anything.
>> > > > e) submission through committee
>> > > >
>> > > >
>> > > > On Wed, Aug 15, 2018 at 11:42 PM, XIAOHE DONG <
>> dannyriv...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Hi Xiongqi
>> > > > >
>> > > > > Thanks for thinking about implementing this as well. :)
>> > > > >
>> > > > > I was 

Re: [VOTE] KIP-357: Add support to list ACLs per principal

2018-08-27 Thread Jakub Scholz
+1 (non-binding)

On Mon, Aug 27, 2018 at 6:24 PM Manikumar  wrote:

> Hi All,
>
> I would like to start voting on KIP-357 which allows listing ACLs per
> principal using AclCommand (kafka-acls.sh)
>
> KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-357%3A++Add+support+to+list+ACLs+per+principal
>
> Discussion Thread:
>
> https://lists.apache.org/thread.html/dc7f6005845a372a0a48a40872a32d9ece03807a4fb1bb89d3645afb@%3Cdev.kafka.apache.org%3E
>
> Thanks,
> Manikumar
>


Re: [VOTE] KIP-365: Materialized, Serialized, Joined, Consumed and Produced with implicit Serde

2018-08-27 Thread Ted Yu
+1

On Mon, Aug 27, 2018 at 12:18 PM John Roesler  wrote:

> +1 (non-binding)
>
> On Sat, Aug 25, 2018 at 1:16 PM Joan Goyeau  wrote:
>
> > Hi,
> >
> > We want to make sure that we always have a serde for all Materialized,
> > Serialized, Joined, Consumed and Produced.
> > For that we can make use of the implicit parameters in Scala.
> >
> > KIP:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-365%3A+Materialized%2C+Serialized%2C+Joined%2C+Consumed+and+Produced+with+implicit+Serde
> >
> > Github PR: https://github.com/apache/kafka/pull/5551
> >
> > Please make your votes.
> > Thanks
> >
>


Re: KIP-213 - Scalable/Usable Foreign-Key KTable joins - Rebooted.

2018-08-27 Thread Guozhang Wang
I like John's idea as well: for this KIP specifically, since we do not expect
any other consumers to read the repartition topics externally, we can add a
short prefix to the header keys to be safe, while keeping the additional cost
(note that header fields are per-record, so any additional byte is per-record
as well) low.


Guozhang
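
To make the prefixing idea concrete, here is a small sketch using the public
header classes. The prefixes and the helper itself are illustrative only, not
part of the KIP:

import java.nio.charset.StandardCharsets;

import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.header.internals.RecordHeader;
import org.apache.kafka.common.header.internals.RecordHeaders;

public class HeaderNamespacingSketch {
    static final String EXTERNAL_PREFIX = "external.";
    static final String INTERNAL_PREFIX = "internal.";

    // On the way into an internal/repartition topic: move user headers
    // under their own namespace so framework headers cannot collide.
    static Headers namespaceUserHeaders(Headers userHeaders) {
        Headers namespaced = new RecordHeaders();
        for (Header h : userHeaders) {
            namespaced.add(new RecordHeader(EXTERNAL_PREFIX + h.key(), h.value()));
        }
        return namespaced;
    }

    // Framework metadata (e.g. an offset used for resolving ordering) lives
    // under the internal namespace.
    static void addInternalHeader(Headers headers, String key, String value) {
        headers.add(new RecordHeader(INTERNAL_PREFIX + key,
                value.getBytes(StandardCharsets.UTF_8)));
    }

    // Before emitting to an external topic via to()/through(): keep only the
    // user headers and restore their original keys.
    static Headers denamespace(Headers headers) {
        Headers external = new RecordHeaders();
        for (Header h : headers) {
            if (h.key().startsWith(EXTERNAL_PREFIX)) {
                external.add(new RecordHeader(
                        h.key().substring(EXTERNAL_PREFIX.length()), h.value()));
            }
        }
        return external;
    }
}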

On Tue, Aug 21, 2018 at 11:58 AM, Adam Bellemare 
wrote:

> Hi John
>
> That is an excellent idea. The header usage I propose would be limited
> entirely to internal topics, and this could very well be the solution to
> potential conflicts. If we do not officially reserve a prefix "__" then I
> think this would be the safest idea, as it would entirely avoid any
> accidents (perhaps if a company is using its own "__" prefix for other
> reasons).
>
> Thanks
>
> Adam
>
>
> On Tue, Aug 21, 2018 at 2:24 PM, John Roesler  wrote:
>
> > Just a quick thought regarding headers:
> > > I think there is no absolute-safe ways to avoid conflicts, but we can
> > still
> > > consider using some name patterns to reduce the likelihood as much as
> > > possible.. e.g. consider sth. like the internal topics naming: e.g.
> > > "__internal_[name]"?
> >
> > I think there is a safe way to avoid conflicts, since these headers are
> > only needed in internal topics (I think):
> > For internal and changelog topics, we can namespace all headers:
> > * user-defined headers are namespaced as "external." + headerKey
> > * internal headers are namespaced as "internal." + headerKey
> >
> > This is a lot of characters, so we could use a sigil instead (e.g., "_"
> for
> > internal, "~" for external)
> >
> > We simply apply the namespacing when we read user headers from external
> > topics into the topology and then de-namespace them before we emit them
> to
> > an external topic (via "to" or "through").
> > Now, it is not possible to collide with user-defined headers.
> >
> > That said, I'd also be fine with just reserving "__" as a header prefix
> and
> > not worrying about collisions.
> >
> > Thanks for the KIP,
> > -John
> >
> > On Tue, Aug 21, 2018 at 9:48 AM Jan Filipiak 
> > wrote:
> >
> > > Still havent completly grabbed it.
> > > sorry will read more
> > >
> > > On 17.08.2018 21:23, Jan Filipiak wrote:
> > > > Cool stuff.
> > > >
> > > > I made some random remarks. Did not touch the core of the algorithm
> > yet.
> > > >
> > > > Will do Monday 100%
> > > >
> > > > I don't see Interactive Queries :) like that!
> > > >
> > > >
> > > >
> > > >
> > > > On 17.08.2018 20:28, Adam Bellemare wrote:
> > > >> I have submitted a PR with my code against trunk:
> > > >> https://github.com/apache/kafka/pull/5527
> > > >>
> > > >> Do I continue on this thread or do we begin a new one for
> discussion?
> > > >>
> > > >> On Thu, Aug 16, 2018 at 1:40 AM, Jan Filipiak <
> > jan.filip...@trivago.com
> > > >
> > > >> wrote:
> > > >>
> > > >>> even before message headers, the option for me always existed to
> > > >>> just wrap
> > > >>> the messages into my own custom envelope.
> > > >>> So I of course thought this through. One sentence in your last
> email
> > > >>> triggered all the thought process I put in the back then
> > > >>> again to design it in the, what i think is the "kafka-way". It
> ended
> > up
> > > >>> ranting a little about what happened in the past.
> > > >>>
> > > >>> I see plenty of colleagues of mine falling into traps in the API,
> > > >>> that I
> > > >>> did warn about in the 1.0 DSL rewrite. I have the same
> > > >>> feeling again. So I hope it gives you some insights into my thought
> > > >>> process. I am aware that since i never ported 213 to higher
> > > >>> streams version, I don't really have a stake here and initially I
> > > >>> didn't
> > > >>> feel like actually sending it. But maybe you can pull
> > > >>> something good from it.
> > > >>>
> > > >>>   Best jan
> > > >>>
> > > >>>
> > > >>>
> > > >>> On 15.08.2018 04:44, Adam Bellemare wrote:
> > > >>>
> > >  @Jan
> > >  Thanks Jan. I take it you mean "key-widening" somehow includes
> > >  information
> > >  about which record is processed first? I understand about a
> > >  CombinedKey
> > >  with both the Foreign and Primary key, but I don't see how you
> track
> > >  ordering metadata in there unless you actually included a metadata
> > >  field
> > >  in
> > >  the key type as well.
> > > 
> > >  @Guozhang
> > >  As Jan mentioned earlier, is Record Headers mean to strictly be
> > >  used in
> > >  just the user-space? It seems that it is possible that a collision
> > >  on the
> > >  (key,value) tuple I wish to add to it could occur. For instance,
> if
> > I
> > >  wanted to add a ("foreignKeyOffset",10) to the Headers but the
> user
> > >  already
> > >  specified their own header with the same key name, then it appears
> > >  there
> > >  would be a collision. (This is one of the issues I brought up in
> > >  the KIP).
> > > 
> > >  

Re: [VOTE] KIP-365: Materialized, Serialized, Joined, Consumed and Produced with implicit Serde

2018-08-27 Thread John Roesler
+1 (non-binding)

On Sat, Aug 25, 2018 at 1:16 PM Joan Goyeau  wrote:

> Hi,
>
> We want to make sure that we always have a serde for all Materialized,
> Serialized, Joined, Consumed and Produced.
> For that we can make use of the implicit parameters in Scala.
>
> KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-365%3A+Materialized%2C+Serialized%2C+Joined%2C+Consumed+and+Produced+with+implicit+Serde
>
> Github PR: https://github.com/apache/kafka/pull/5551
>
> Please make your votes.
> Thanks
>


[jira] [Resolved] (KAFKA-4065) Missing Property in ProducerConfig.java - KafkaProducer API 0.9.0.0

2018-08-27 Thread Manikumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-4065.
--
Resolution: Auto Closed

Closing inactive issue.  It is highly unlikely to add this feature. 

> Missing Property in ProducerConfig.java - KafkaProducer API 0.9.0.0
> ---
>
> Key: KAFKA-4065
> URL: https://issues.apache.org/jira/browse/KAFKA-4065
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Reporter: manzar
>Priority: Major
>
> 1) The "compressed.topics" property is missing in ProducerConfig.java in the 
> KafkaProducer API 0.9.0.0; because of that, we can't enable compression for 
> specific topics.
> 2) The "compression.type" property is present in ProducerConfig.java, whereas 
> the official documentation calls it "compression.codec".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-3369) Add unit-tests to the Metadata Cache

2018-08-27 Thread Manikumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-3369.
--
Resolution: Fixed

Closing as per above comment.

> Add unit-tests to the Metadata Cache
> 
>
> Key: KAFKA-3369
> URL: https://issues.apache.org/jira/browse/KAFKA-3369
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Reporter: Gwen Shapira
>Priority: Major
>
> There are no tests that validate the Metadata cache in the brokers directly.
> Since this is critical code, it would be nice to add some validation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] KIP-357: Add support to list ACLs per principal

2018-08-27 Thread Ted Yu
+1

On Mon, Aug 27, 2018 at 11:42 AM Priyank Shah  wrote:

> +1 (Non-binding)
>
> On 8/27/18, 9:24 AM, "Manikumar"  wrote:
>
> Hi All,
>
> I would like to start voting on KIP-357, which allows listing ACLs per
> principal using AclCommand (kafka-acls.sh).
>
> KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-357%3A++Add+support+to+list+ACLs+per+principal
>
> Discussion Thread:
>
> https://lists.apache.org/thread.html/dc7f6005845a372a0a48a40872a32d9ece03807a4fb1bb89d3645afb@%3Cdev.kafka.apache.org%3E
>
> Thanks,
> Manikumar
>
>
>


Re: [VOTE] KIP-357: Add support to list ACLs per principal

2018-08-27 Thread Priyank Shah
+1 (Non-binding)

On 8/27/18, 9:24 AM, "Manikumar"  wrote:

Hi All,

I would like to start voting on KIP-357, which allows listing ACLs per
principal using AclCommand (kafka-acls.sh).

KIP:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-357%3A++Add+support+to+list+ACLs+per+principal

Discussion Thread:

https://lists.apache.org/thread.html/dc7f6005845a372a0a48a40872a32d9ece03807a4fb1bb89d3645afb@%3Cdev.kafka.apache.org%3E

Thanks,
Manikumar




[VOTE] KIP-357: Add support to list ACLs per principal

2018-08-27 Thread Manikumar
Hi All,

I would like to start voting on KIP-357, which allows listing ACLs per
principal using AclCommand (kafka-acls.sh).

KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-357%3A++Add+support+to+list+ACLs+per+principal

Discussion Thread:
https://lists.apache.org/thread.html/dc7f6005845a372a0a48a40872a32d9ece03807a4fb1bb89d3645afb@%3Cdev.kafka.apache.org%3E

Thanks,
Manikumar
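The KIP itself targets the AclCommand CLI, but for readers who want the same principal-scoped lookup programmatically, here is a minimal sketch against the AdminClient ACL APIs (the class name, principal, and bootstrap address below are made up for illustration):

import java.util.Collection;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.acl.AccessControlEntryFilter;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclBindingFilter;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.ResourcePatternFilter;

public class ListAclsForPrincipalSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Match any resource, but only ACL entries for the given principal.
            AclBindingFilter filter = new AclBindingFilter(
                    ResourcePatternFilter.ANY,
                    new AccessControlEntryFilter("User:alice", null,
                            AclOperation.ANY, AclPermissionType.ANY));

            Collection<AclBinding> acls = admin.describeAcls(filter).values().get();
            acls.forEach(System.out::println);
        }
    }
}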


Re: [EXTERNAL] [DISCUSS] KIP-158: Kafka Connect should allow source connectors to set topic-specific settings for new topics

2018-08-27 Thread McCaig, Rhys
Randall,

This KIP looks great to me. As for _updating_ topic configs - it’s a nice-to-have 
but certainly something that I could live without in order to get this KIP 
implemented. (It’s not something I would use in my current setup, but I can see 
some cases where it could be part of the workflow for mirrored topics.)
If it were to be included, I’d be happier to see it hidden behind a config flag 
(if the topic already exists, there could be an option to WARN/FAIL or change the 
topic, where the default would be WARN?).

Cheers,
Rhys
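For context, a minimal sketch of the kind of topic creation with explicit topic-specific settings that the KIP would let source connectors drive, written against the plain AdminClient API (the class name, topic name, partition/replication counts, and config values below are made up for illustration):

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicCreationSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic-specific settings the connector wants for a new topic.
            NewTopic topic = new NewTopic("my-source-topic", 3, (short) 3)
                    .configs(Collections.singletonMap("cleanup.policy", "compact"));

            // createTopics() fails if the topic already exists; whether to WARN,
            // FAIL, or try to alter the existing topic is exactly the open
            // question in this thread.
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}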

> On Aug 21, 2018, at 10:58 PM, Randall Hauch  wrote:
> 
> Okay, after much delay let's try this again for AK 2.1. Has anyone found
> any concerns? Stephane suggested that we allow updating topic
> configurations (everything but partition count). I'm unconvinced that it's
> worth the additional complexity in the implementation and the documentation
> to explain the behavior. Changing several of the topic-specific
> configurations has a significant impact on broker behavior / functionality,
> so IMO we need to proceed more cautiously.
> 
> Stephane, do you have a particular use case in mind for updating topic
> configurations on an existing topic?
> 
> Randall
> 
> 
> On Fri, Jan 26, 2018 at 4:20 PM Randall Hauch  wrote:
> 
>> The KIP deadline for 1.1 has already passed, but I'd like to restart this
>> discussion so that we make the next release. I've not yet addressed the
>> previous comment about *existing* topics, but I'll try to do that over the
>> next few weeks. Any other comments/suggestions/questions?
>> 
>> Best regards,
>> 
>> Randall
>> 
>> On Thu, Oct 5, 2017 at 12:13 AM, Randall Hauch  wrote:
>> 
>>> Oops. Yes, I meant “replication factor”.
>>> 
 On Oct 4, 2017, at 7:18 PM, Ted Yu  wrote:
 
 Randall:
 bq. AdminClient currently allows changing the replication factory.
 
 By 'replication factory' did you mean 'replication factor' ?
 
 Cheers
 
> On Wed, Oct 4, 2017 at 9:58 AM, Randall Hauch 
>>> wrote:
> 
> Currently the KIP's scope is only topics that don't yet exist, and we have
> to be cognizant of race conditions between tasks with the same connector. I
> think it is worthwhile to consider whether the KIP's scope should expand to
> also address *existing* partitions, though it may not be appropriate to
> have as much control when changing the topic settings for an existing
> topic. For example, changing the number of partitions (which the KIP
> considers a "topic-specific setting" even though technically it is not)
> shouldn't be done blindly due to the partitioning impacts, and IIRC you
> can't reduce them (which we could verify before applying). Also, I don't
> think the AdminClient currently allows changing the replication factory. I
> think changing the topic configs is less problematic both from what makes
> sense for connectors to verify/change and from what the AdminClient
> supports.
> 
> Even if we decide that it's not appropriate to change the settings on an
> existing topic, I do think it's advantageous to at least notify the
> connector (or task) prior to the first record sent to a given topic so that
> the connector can fail or issue a warning if it doesn't meet its
> requirements.
> 
> Best regards,
> 
> Randall
> 
> On Wed, Oct 4, 2017 at 12:52 AM, Stephane Maarek <
> steph...@simplemachines.com.au> wrote:
> 
>> Hi Randall,
>> 
>> Thanks for the KIP. I like it
>> What happens when the target topic is already created but the configs do
>> not match?
>> i.e. wrong RF, num partitions, or missing / additional configs? Will you
>> attempt to apply the necessary changes or throw an error?
>> 
>> Thanks!
>> Stephane
>> 
>> 
>> On 24/5/17, 5:59 am, "Mathieu Fenniak" >>> 
>> wrote:
>> 
>>   Ah, yes, I see you highlighted a part that should've made this clear
>>   to me on the first read. :-)  Much clearer now!
>> 
>>   By the way, enjoyed your Debezium talk in NYC.
>> 
>>   Looking forward to this Kafka Connect change; it will allow me to
>>   remove a post-deployment tool that I hacked together for the purpose
>>   of ensuring auto-created topics have the right config.
>> 
>>   Mathieu
>> 
>> 
>>   On Tue, May 23, 2017 at 11:38 AM, Randall Hauch 
>> wrote:
>>> Thanks for the quick feedback, Mathieu. Yes, the first configuration rule
>>> whose regex matches will be applied, and no other rules will be used. I've
>>> updated the KIP to try to make this more clear, but let me know if it's
>>> still not clear.
>>> 
>>> Best regards,
>>> 
>>> Randall
>>> 
>>> On Tue, May 23, 2017 at 10:07 AM, Mathieu Fenniak <
>>> mathieu.fenn...@replicon.com> wrote:
>>> 
 Hi Randall,
 

Re: Follow-up on KAFKA-7286

2018-08-27 Thread John Roesler
Hi Flavien,

As far as I'm concerned, it's perfectly appropriate to keep pinging
periodically if no one responds to your ticket. I'm sorry no one commented
on your ticket or PR!

I have to admit that I don't know much about the coordinator, but I do have
one question; I'll leave a comment on the PR.

Thanks for the contribution!
-John

On Mon, Aug 27, 2018 at 3:54 AM Flavien Raynaud 
wrote:

> Hi there,
>
>
> Not sure if this is expected behaviour (according to the contributing guide,
> it seems to be), let me know if not.
>
>
>
> I opened an issue some days ago (KAFKA-7286) along with a related GitHub PR.
> However, it looks like nobody had the chance to look at it yet. Just dropping
> an email here to ask if anyone would have the bandwidth to do so? Or let me
> know about the expected procedure to follow.
>
>
> Cheers,
>
> Flavien
>


Re: [VOTE] KIP-336: Consolidate ExtendedSerializer/Serializer and ExtendedDeserializer/Deserializer

2018-08-27 Thread Ismael Juma
Thanks Viktor. I think it would be good to verify that existing
ExtendedSerializer implementations work without recompiling. This could be
done as a manual test. If you agree, I suggest adding it to the testing
plan section.

Ismael

On Mon, Aug 27, 2018 at 7:57 AM Viktor Somogyi-Vass 
wrote:

> Thanks guys, I've updated my KIP with this info (so to keep solution #1).
> If you find it good enough, please vote as well or let me know if you think
> something is missing.
>
> On Sat, Aug 25, 2018 at 1:14 AM Ismael Juma  wrote:
>
> > I'm OK with 1 too. It makes me a bit sad that we don't have a path for
> > removing the method without headers, but it seems like the simplest and
> > least confusing option (I am assuming that headers are not needed in the
> > serializers in the common case).
> >
> > Ismael
> >
> > On Fri, Aug 24, 2018 at 2:42 PM Jason Gustafson 
> > wrote:
> >
> > > Hey Viktor,
> > >
> > > Good summary. I agree that option 1) seems like the simplest choice and,
> > > as you note, we can always add the default implementation later. I'll
> > > leave Ismael to make a case for the circular forwarding approach ;)
> > >
> > > -Jason
> > >
> > > On Fri, Aug 24, 2018 at 3:02 AM, Viktor Somogyi-Vass <
> > > viktorsomo...@gmail.com> wrote:
> > >
> > > > I think in the first draft I didn't provide an implementation for them
> > > > as it seemed very simple and straightforward. I looked up a couple of
> > > > implementations of the ExtendedSerializers on github and the general
> > > > behavior seems to be that they delegate to the 2 argument (headerless)
> > > > method:
> > > >
> > > > https://github.com/khoitnm/practice-spring-kafka-grpc/blob/
> > > > a6fc9b3395762c4889807baedd822f4653d5dcdd/kafka-common/src/
> > > > main/java/org/tnmk/common/kafka/serialization/protobuf/
> > > > ProtobufSerializer.java
> > > >
> https://github.com/hong-zhu/nxgen/blob/5cf1427d4e1a8f5c7fab47955af32a
> > > > 0d4f4873af/nxgen-kafka-client/src/main/java/nxgen/kafka/
> > > > client/event/serdes/EventSerializer.java
> > > > https://github.com/jerry-jx/spring-kafka/blob/
> > > > ac323ec5b8b9a0ca975db2f7322ff6878fce481a/spring-kafka/src/
> > > >
> > >
> >
> main/java/org/springframework/kafka/support/serializer/JsonSerializer.java
> > > > https://github.com/enzobonggio/nonblocking-kafka/blob/
> > > > bc1a379b2d9593b46cf9604063bc5b38e2785d19/src/main/java/com/
> > > > example/kafka/producer/CustomJsonSerializer.java
> > > >
> > > > Of course 4 examples are not representative, but they show that these
> > > > users usually delegate to the "headerless" (2 argument) method. I've
> > > > tried to look it up on other code search sites but haven't had much
> > > > luck so far.
> > > > Given these examples and the way they implement them I'd say it's
> more
> > > > common to delegate to the headerless method, that's why I think it's
> a
> > > good
> > > > approach for us too. Now having a default implementation for that is
> > > again
> > > > a good question. I think current use cases wouldn't change in either
> > case
> > > > (unless we deprecate the headerless one).
> > > > For the new use cases it depends what do we want to propagate going
> > > > forward. Do we want only one method to exist or two? As Ismael
> > > highlighted
> > > > it might be confusing if we have 2 methods, both with default
> > > > implementation and in this case we want to push the 3 argument one
> for
> > > > users.
> > > >
> > > > So I see three possible ways:
> > > > 1.) Don't provide a default implementation for the headerless method.
> > > This
> > > > supports the current implementations and encourages the delegation
> > style
> > > in
> > > > future implementations. This might be the simplest option.
> > > > 2.) Provide a default implementation for the headerless method. This
> > > would
> > > > be a bit confusing, so we'd likely push the use of the 3 parameter
> > method
> > > > and deprecate the headerless. This would however further litter the
> > code
> > > > base with deprecation warnings as we're using the headerless method
> in
> > a
> > > > lot of places (think of the current serializers/deserializers). So in
> > > this
> > > > case we would want to clean up the code base a little where we can
> and
> > > may
> > > > remove the headerless method entirely in Kafka 3. But they would hang
> > > > around until that point. I think in this case the implementation for
> > the
> > > > headerless is a detail question as that is deprecated so we don't
> > expect
> > > > new implementations to use that method.
> > > > If we decide to move this way, we have explored two options so far:
> > > > returning null / empty array or throwing exceptions. (And I honestly
> > > > started to like the latter as calling that with no real
> implementation
> > is
> > > > really a programming error.)
> > > > 3.) We can do it in multiple steps. In the first step we do 1 and
> later
> > > 2.
> > > > I think it would also make sense as the Kafka code base heavily uses
> 

Re: [VOTE] KIP-336: Consolidate ExtendedSerializer/Serializer and ExtendedDeserializer/Deserializer

2018-08-27 Thread Viktor Somogyi-Vass
Thanks guys, I've updated my KIP with this info (so to keep solution #1).
If you find it good enough, please vote as well or let me know if you think
something is missing.

On Sat, Aug 25, 2018 at 1:14 AM Ismael Juma  wrote:

> I'm OK with 1 too. It makes me a bit sad that we don't have a path for
> removing the method without headers, but it seems like the simplest and
> least confusing option (I am assuming that headers are not needed in the
> serializers in the common case).
>
> Ismael
>
> On Fri, Aug 24, 2018 at 2:42 PM Jason Gustafson 
> wrote:
>
> > Hey Viktor,
> >
> > Good summary. I agree that option 1) seems like the simplest choice and,
> > as you note, we can always add the default implementation later. I'll leave
> > Ismael to make a case for the circular forwarding approach ;)
> >
> > -Jason
> >
> > On Fri, Aug 24, 2018 at 3:02 AM, Viktor Somogyi-Vass <
> > viktorsomo...@gmail.com> wrote:
> >
> > > I think in the first draft I didn't provide an implementation for them
> as
> > > it seemed very simple and straightforward. I looked up a couple of
> > > implementations of the ExtendedSerializers on github and the general
> > > behavior seems to be that they delegate to the 2 argument (headerless)
> > > method:
> > >
> > > https://github.com/khoitnm/practice-spring-kafka-grpc/blob/
> > > a6fc9b3395762c4889807baedd822f4653d5dcdd/kafka-common/src/
> > > main/java/org/tnmk/common/kafka/serialization/protobuf/
> > > ProtobufSerializer.java
> > > https://github.com/hong-zhu/nxgen/blob/5cf1427d4e1a8f5c7fab47955af32a
> > > 0d4f4873af/nxgen-kafka-client/src/main/java/nxgen/kafka/
> > > client/event/serdes/EventSerializer.java
> > > https://github.com/jerry-jx/spring-kafka/blob/
> > > ac323ec5b8b9a0ca975db2f7322ff6878fce481a/spring-kafka/src/
> > >
> >
> main/java/org/springframework/kafka/support/serializer/JsonSerializer.java
> > > https://github.com/enzobonggio/nonblocking-kafka/blob/
> > > bc1a379b2d9593b46cf9604063bc5b38e2785d19/src/main/java/com/
> > > example/kafka/producer/CustomJsonSerializer.java
> > >
> > > Of course 4 examples are not representative, but they show that these users
> > > usually delegate to the "headerless" (2 argument) method. I've tried to
> > > look it up on other code search sites but haven't had much luck so far.
> > > Given these examples and the way they implement them I'd say it's more
> > > common to delegate to the headerless method, that's why I think it's a
> > good
> > > approach for us too. Now having a default implementation for that is
> > again
> > > a good question. I think current use cases wouldn't change in either
> case
> > > (unless we deprecate the headerless one).
> > > For the new use cases it depends what do we want to propagate going
> > > forward. Do we want only one method to exist or two? As Ismael
> > highlighted
> > > it might be confusing if we have 2 methods, both with default
> > > implementation and in this case we want to push the 3 argument one for
> > > users.
> > >
> > > So I see three possible ways:
> > > 1.) Don't provide a default implementation for the headerless method.
> > This
> > > supports the current implementations and encourages the delegation
> style
> > in
> > > future implementations. This might be the simplest option.
> > > 2.) Provide a default implementation for the headerless method. This
> > would
> > > be a bit confusing, so we'd likely push the use of the 3 parameter
> method
> > > and deprecate the headerless. This would however further litter the
> code
> > > base with deprecation warnings as we're using the headerless method in
> a
> > > lot of places (think of the current serializers/deserializers). So in
> > this
> > > case we would want to clean up the code base a little where we can and
> > may
> > > remove the headerless method entirely in Kafka 3. But they would hang
> > > around until that point. I think in this case the implementation for
> the
> > > headerless is a detail question as that is deprecated so we don't
> expect
> > > new implementations to use that method.
> > > If we decide to move this way, we have explored two options so far:
> > > returning null / empty array or throwing exceptions. (And I honestly
> > > started to like the latter as calling that with no real implementation
> is
> > > really a programming error.)
> > > 3.) We can do it in multiple steps. In the first step we do 1 and later
> > 2.
> > > I think it would also make sense as the Kafka code base heavily uses
> the
> > > headerless method still (think of the existing
> serializers/deserializers)
> > > and it would give us time to eliminate/change those use cases.
> > >
> > > Cheers,
> > > Viktor
> > >
> > > On Thu, Aug 23, 2018 at 11:55 PM Jason Gustafson 
> > > wrote:
> > >
> > > > To clarify, what I am suggesting is to only remove the default
> > > > implementation for these methods. So users would be required to
> > implement
> > > > serialize(topic, data) and deserialize(topic, data).
> > > >
> > > > -Jason
> > > >
> > > > On 
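To make the delegation style described above concrete, here is a minimal sketch of an ExtendedSerializer written the way the linked examples are, against the pre-KIP-336 interfaces (the class itself is made up for illustration): the three-argument, header-aware serialize() simply forwards to the two-argument, headerless one.

import java.nio.charset.StandardCharsets;
import java.util.Map;

import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.serialization.ExtendedSerializer;

public class StringishSerializer implements ExtendedSerializer<String> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // no configuration needed for this sketch
    }

    @Override
    public byte[] serialize(String topic, String data) {
        return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public byte[] serialize(String topic, Headers headers, String data) {
        // delegate to the two-argument ("headerless") method, ignoring headers
        return serialize(topic, data);
    }

    @Override
    public void close() {
    }
}

Under option 1 as discussed in the thread, implementations would keep providing the two-argument method and would get this forwarding behavior from a default method instead of writing it by hand.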

Re: [VOTE] KIP-363: Make FunctionConversions private

2018-08-27 Thread John Roesler
Hey Joan,

I was thinking more about this... Do any of the conversions in
FunctionConversions convert to types that are used in the public Scala
interface?

If you've already checked, then carry on.

Otherwise, we should leave public any that might be in use.

Thanks,
-John

On Sat, Aug 25, 2018 at 12:19 PM Joan Goyeau  wrote:

> Thanks Attila, it's done.
>
> On Sat, 25 Aug 2018 at 02:57 Ted Yu  wrote:
>
> > +1
> >
> > On Fri, Aug 24, 2018 at 5:17 PM Attila Sasvári 
> > wrote:
> >
> > > Hi there,
> > >
> > > There is a conflicting KIP with the same number, see
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-363%3A+Allow+performance+tools+to+print+final+results+to+output+file
> > >
> > > Its discussion was started earlier, on August 23
> > > https://www.mail-archive.com/dev@kafka.apache.org/msg91132.html and
> KIP
> > > page already includes it:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > >
> > > Please update KIP number to resolve the conflict.
> > >
> > > Apart from this, +1 (non-binding) and thanks for the KIP!
> > >
> > > Regards,
> > > - Attila
> > >
> > >
> > > Guozhang Wang  wrote (on Fri, Aug 24, 2018 at 20:26):
> > >
> > > > +1 from me (binding).
> > > >
> > > > On Fri, Aug 24, 2018 at 11:24 AM, Joan Goyeau 
> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > As pointed out in this comment #5539 (comment)
> > > > > "This class was already defaulted to public visibility, and we can't
> > > > > retract it now, without a KIP.", the object FunctionConversions is only
> > > > > of internal use and therefore should be private to the lib only so that
> > > > > we can do changes without going through a KIP like this one.
> > > > >
> > > > > Please cast your vote.
> > > > >
> > > > > On Fri, 24 Aug 2018 at 19:14 John Roesler 
> wrote:
> > > > >
> > > > > > I'm also in favor of this. I don't think it's controversial
> either.
> > > > > Should
> > > > > > we just move to a vote?
> > > > > >
> > > > > > On Thu, Aug 23, 2018 at 7:01 PM Guozhang Wang <
> wangg...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > +1.
> > > > > > >
> > > > > > > On Thu, Aug 23, 2018 at 12:47 PM, Ted Yu 
> > > > wrote:
> > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > > > In the Motivation section, you can quote the comment from the
> > > > > > > > pull request so that the reader doesn't have to click through.
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > > On Thu, Aug 23, 2018 at 12:13 PM Joan Goyeau <
> j...@goyeau.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > As pointed out in this comment #5539 (comment)
> > > > > > > > > <https://github.com/apache/kafka/pull/5539#discussion_r212380648>
> > > > > > > > > the object FunctionConversions is only of internal use and
> > > > > > > > > therefore should be private to the lib only so that we can do
> > > > > > > > > changes without going through a KIP like this one.
> > > > > > > > >
> > > > > > > > > KIP:
> > > > > > > > >
> > > > > > > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-363%3A+Make+
> > > > > > > > FunctionConversions+private
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -- Guozhang
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
>


Follow-up on KAFKA-7286

2018-08-27 Thread Flavien Raynaud
Hi there,


Not sure if this is expected behaviour (according to the contributing guide, it
seems to be), let me know if not.



I opened an issue some days ago (KAFKA-7286) along with a related GitHub PR.
However, it looks like nobody had the chance to look at it yet. Just dropping
an email here to ask if anyone would have the bandwidth to do so? Or let me
know about the expected procedure to follow.


Cheers,

Flavien


Re: [DISCUSS] KIP-358: Migrate Streams API to Duration instead of long ms times

2018-08-27 Thread Nikolay Izhikov
Hello, Matthias, John.

Thanks in advance.

> I wanted to let you know that we have dropped the `grace(long)` method from 
> the Windows interface

`grace(long)` removed from the KIP.

> It seems like, if we want to use long millis internally, then we just need to 
> leave Windows alone.

`Windows` removed from proposed API changes.

> In SessionWindows, inactivityGap is Streams-facing.

`inactivityGap` removed from proposed API changes.

> it seems the KIP does not mention `Punctuator#punctuate(long)` -- should we
> add it?

Actually, I think we shouldn't do it.

1. If I understand correctly, a user callback may be called every millisecond, 
and many callbacks can be instantiated.
Do we want to wrap every `long timestamp` into an Instant in that case?

2. If we introduce a new method `Punctuator.punctuate(Instant timestamp)`, 
we should either break backward compatibility with the new interface method or 
provide a default implementation:

public interface Punctuator {
    void punctuate(long timestamp);

    default void punctuate(Instant timestamp) {
        punctuate(timestamp.toEpochMilli());
    }
}

This doesn't seem right to me.
What do you think?

> I think it's best if the KIP gets updated with a proposal on how to handle 
> "dual use" parts. 
> It's easier to discuss if it's written down IMHO.

My proposal (copied from the "Proposed Changes" section of the KIP):

For the methods that are used both internally and as part of the public API, the 
proposal is:

1. In this scope, keep existing methods as is. 
   Try to reduce the visibility of these methods in follow-up tickets.
2. Introduce finer-typed methods with Instant and Duration.
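A small, purely illustrative sketch of what item 2 could look like for a single method; the interface and method names below are hypothetical, not actual Streams signatures: the existing millis-based method is kept, and a Duration-typed overload is layered on top of it.

import java.time.Duration;

// Illustrative only -- not an actual Kafka Streams interface.
public interface WindowSpecSketch {

    // existing millis-based method, kept as-is for internal callers
    WindowSpecSketch grace(long afterWindowEndMs);

    // finer-typed method added for the public API, delegating to the millis form
    default WindowSpecSketch grace(Duration afterWindowEnd) {
        return grace(afterWindowEnd.toMillis());
    }
}

Internal callers keep using the long variant, while the overload gives API users the finer type without introducing a second code path.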

On Fri, 24/08/2018 at 10:36 -0700, Matthias J. Sax wrote:
> It's tricky... :)
> 
> Some APIs have "dual use" as I mentioned in my first reply. I agree that
> it would be good to avoid abstract class and use interfaces if possible.
> As long as the change is source code compatible, it should be fine IMHO
> -- we need to document binary incompatibility of course.
> 
> I think it's best if the KIP gets updated with a proposal on how to
> handle "dual use" parts. It's easier to discuss if it's written down IMHO.
> 
> For `ProcessorContext#schedule()`, you are right John: it seems fine
> to use `Duration`, as it won't be called often (usually only within
> `Processor#init()`) -- I mixed it up with `Punctuator#punctuate(long)`.
> However, thinking about this twice, we might even want to update both
> methods. Punctuation callbacks don't happen every millisecond and thus
> the overhead to use `Instant` should not be a problem.
> 
> @Nikolay: it seems the KIP does not mention `Punctuator#punctuate(long)`
> -- should we add it?
> 
> 
> -Matthias
> 
> 
> On 8/24/18 10:11 AM, John Roesler wrote:
> > Quick afterthought: I guess that `Window` is exposed to the API via
> > `Windowed` keys. I think it would be fine to not deprecate the `long` start
> > and end, but add `Instant` variants for people preferring that interface.
> > 
> > On Fri, Aug 24, 2018 at 11:10 AM John Roesler  wrote:
> > 
> > > Hey Matthias,
> > > 
> > > Thanks for pointing that out. I agree that we only really need to change
> > > methods that are API-facing, and we probably want to avoid using
> > > Duration/Instant for Streams-facing members.
> > > 
> > > Like I said in my last email, I think the whole Windows interface is
> > > Streams-facing, and the builders we provide are otherwise API-facing.
> > > Likewise, `Window` is Streams-facing, so start and end should not use
> > > Duration. In SessionWindows, inactivityGap is Streams-facing.
> > > 
> > > I actually think that ProcessorContext#schedule() is API-facing, so it
> > > should use Duration. The rationale is that streams processing doesn't call
> > > this method, only implementers of Processor do. Does that seem right?
> > > 
> > > Also, it seems like  ReadOnlyWindowStore#fetch() (2x) and #fetchAll() are
> > > API-facing (for IQ). When we call fetch() during processing, it's actually
> > > `WindowStore#fetch()`. Maybe we should move "WindowStoreIterator<V>
> > > fetch(K key, long timeFrom, long timeTo)" to the WindowStore interface and make
> > > all the ReadOnlyWindowStore methods take Durations. And likewise with the
> > > SessionStore interfaces.
> > > 
> > > What do you think?
> > > 
> > > Thanks,
> > > -John
> > > 
> > > 
> > > 
> > > 
> > > On Fri, Aug 24, 2018 at 10:51 AM John Roesler  wrote:
> > > 
> > > > Hi Nikolay,
> > > > 
> > > > First: I wanted to let you know that we have dropped the `grace(long)`
> > > > method from the Windows interface, but we do still need to transition 
> > > > the
> > > > same method on TimeWindows and JoinWindows (
> > > > https://github.com/apache/kafka/pull/5536)
> > > > 
> > > > I have also been thinking it would be nice to replace `Windows` with an
> > > > interface, but for different reasons. I think we can even do it without
> > > > breaking source compatibility (but it would break binary compatibility):
> > > > create a new