Re: [kafka-clients] Re: [VOTE] 2.3.1 RC0

2019-10-04 Thread Matt Farmer
Do we have an ETA on when y'all think 2.3.1 will land?

On Sat, Sep 28, 2019 at 1:55 PM Matthias J. Sax wrote:

> There was a recent report about vulnerabilities of some dependent
> libraries: https://issues.apache.org/jira/browse/KAFKA-8952
>
> I think we should fix this for 2.3.1.
>
> Furthermore, we identified the root cause of
> https://issues.apache.org/jira/browse/KAFKA-8649 -- it seems to be a
> critical issue because it affects upgrading of Kafka Streams
> applications. We plan to do a PR asap and hope we can include it in 2.3.1.
>
>
> -Matthias
>
> On 9/25/19 11:57 AM, David Arthur wrote:
> > Thanks, Jason. I agree we should include this. I'll produce RC1 once
> > this patch is available.
> >
> > -David
> >
> > On Tue, Sep 24, 2019 at 6:02 PM Jason Gustafson wrote:
> >
> > Hi David,
> >
> > Thanks for running the release. I think we should consider getting
> > this bug
> > fixed: https://issues.apache.org/jira/browse/KAFKA-8896. The impact
> > of this
> > bug is that consumer groups cannot commit offsets or rebalance. The
> > patch
> > should be ready shortly.
> >
> > Thanks,
> > Jason
> >
> >
> >
> > On Fri, Sep 13, 2019 at 3:53 PM David Arthur wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > >
> > > This is the first candidate for release of Apache Kafka 2.3.1 which
> > > includes many bug fixes for Apache Kafka 2.3.
> > >
> > >
> > > Release notes for the 2.3.1 release:
> > >
> > >
> >
> https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/RELEASE_NOTES.html
> > >
> > >
> > > *** Please download, test and vote by Wednesday, September 18, 9am
> PT
> > >
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > >
> > > https://kafka.apache.org/KEYS
> > >
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > >
> > > https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/
> > >
> > >
> > > * Maven artifacts to be voted upon:
> > >
> > >
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >
> > >
> > > * Javadoc:
> > >
> > > https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/javadoc/
> > >
> > >
> > > * Tag to be voted upon (off 2.3 branch) is the 2.3.1 tag:
> > >
> > > https://github.com/apache/kafka/releases/tag/2.3.1-rc0
> > >
> > >
> > > * Documentation:
> > >
> > > https://kafka.apache.org/23/documentation.html
> > >
> > >
> > > * Protocol:
> > >
> > > https://kafka.apache.org/23/protocol.html
> > >
> > >
> > > * Successful Jenkins builds for the 2.3 branch:
> > >
> > > Unit/integration tests:
> https://builds.apache.org/job/kafka-2.3-jdk8/
> > >
> > > System tests:
> > > https://jenkins.confluent.io/job/system-test-kafka/job/2.3/119
> > >
> > >
> > >
> > > We have yet to get a successful unit/integration job run due to
> > some flaky
> > > failures. I will send out a follow-up email once we have a passing
> > build.
> > >
> > >
> > > Thanks!
> > >
> > > David
> > >
> >
> >
> >
> > --
> > David Arthur
> >
>
>


Issues with partition assignments after Kafka Connect 2.3 Upgrade

2019-09-17 Thread Matt Farmer
Hello all,

After an upgrade to Kafka Connect 2.3.0, we've started seeing an abundance
of Exceptions pertaining to Kafka Connect Workers and their partition
assignments. The exception is:

java.lang.IllegalStateException: No current assignment for partition
[redacted]
at
org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:323)
at
org.apache.kafka.clients.consumer.internals.SubscriptionState.seekUnvalidated(SubscriptionState.java:340)
at
org.apache.kafka.clients.consumer.KafkaConsumer.seek(KafkaConsumer.java:1550)
at
org.apache.kafka.connect.runtime.WorkerSinkTask.rewind(WorkerSinkTask.java:574)
at
org.apache.kafka.connect.runtime.WorkerSinkTask.access$1200(WorkerSinkTask.java:67)
at
org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance.onPartitionsAssigned(WorkerSinkTask.java:653)
at
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:285)
at
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:424)
at
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:358)
at
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:353)
at
org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251)
at
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216)
at
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201)
at
org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:443)
at
org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:316)
at
org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
at
org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)

We have multiple workers that will fail with this. Most of them appear to
be running the WePay BigQuery Sink connector. Has anyone else been having
problems with this? Any idea as to what could be going on and causing us to
have no partitions in the onPartitionsAssigned handler?

Thanks,
Matt Farmer
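
For anyone debugging the same stack trace: `KafkaConsumer.seek()` throws this
`IllegalStateException` whenever it is asked to seek a partition that is not in the
consumer's current assignment, which is what `WorkerSinkTask.rewind()` is doing here
inside the rebalance callback. A minimal, self-contained sketch that reproduces the
same error outside of Connect (the broker address and topic name are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class SeekWithoutAssignment {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // The consumer has no assignment yet, so this throws:
            // IllegalStateException: No current assignment for partition demo-topic-0
            consumer.seek(new TopicPartition("demo-topic", 0), 42L);
        }
    }
}
```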


Re: Kafka official docker image

2019-01-23 Thread Matt Farmer
Would it be possible to get official images that aren't Confluent branded
and only include the things that are maintained under the scope of the
Apache PMC? IIRC, the Confluent images typically include extras that are
maintained solely by Confluent (e.g., code to work with the
Confluent Schema Registry). This is fine for most cases, but some
organizations care a lot about a clean delineation of intellectual property
in upstream images. Having images that include only what comes from the
apache/kafka repo would have real value to those organizations.

On Tue, Jan 15, 2019 at 6:19 PM M. Manna  wrote:

> There should be a Confluent enterprise image - it should work without Control
> Center etc., and the basic open source items should stay the same.
>
> Could you kindly check ?
>
> On Tue, 15 Jan 2019 at 23:06, Олег Иванов  wrote:
>
> > Hi,
> >
> > Could you please create an official Docker image of Kafka? There are a
> > lot of custom images on Docker Hub, but our company's security policy
> > allows only official images.
> >
> > Thanks!
> >
>


Re: [DISCUSS] KIP-399: Extend ProductionExceptionHandler to cover serialization exceptions

2018-12-11 Thread Matt Farmer
Sorry for the delay in reply here. My personal life is a bit interesting at
the moment so keeping up with email seems more and more impossible. :)

That all makes sense to me! Thanks for entertaining the questions!

On Thu, Dec 6, 2018 at 1:01 PM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Matt,
> I agree with Matthias on not to altering the serializer as it's used by
> multiple components.
>
> Matthias,
>
>  - the proposed method accepts a `ProducerRecord` -- it might be good to
> explain why this cannot be done in a type safe way (ie, missing generics)
>
> To accept different types of records from multiple topologies, I have to
> define the ProducerRecord without generics.
>
> - `AlwaysProductionExceptionHandler` ->
> `AlwaysContinueProductionExceptionHandler`
>
> Fixed the typo in the KIP.
>
>  - `DefaultProductionExceptionHandler` is not mentioned
>
> The `handleSerializationException` method in the
> `ProductionExceptionHandler` interface will have a default implementation
> that returns FAIL.
> This is done to avoid any changes to existing user implementations, so I didn't
> mention the `DefaultProductionExceptionHandler` class. Updated the KIP.
>
> - Why do you distinguish between `ClassCastException` and "any other
> unchecked exception"? The second case seems to include the first one.
>
> In SinkNode.java#93
> <
> https://github.com/apache/kafka/blob/87cc31c4e7ea36e7e832a1d02d71480a91a75293/streams/src/main/java/org/apache/kafka/streams/processor/internals/SinkNode.java#L93
> >
> on hitting `ClassCastException`, we halt the streams as it's a fatal error.
> To keep the original behavior, I have to distinguish the exceptions.
>
>
> On Thu, Dec 6, 2018 at 10:44 PM Matthias J. Sax 
> wrote:
>
> > Well, that's exactly the point. The serializer should not be altered
> > IMHO because this would have an impact on other components. Also,
> > applications that use KafkaProducer directly can catch any
> > serialization exception and react to it. Hence, I don't see a
> > reason to change the serializer interface.
> >
> > Instead, it seems better to solve this issue in Streams by allowing to
> > skip over a record for this case.
> >
> > Some more comments on the KIP:
> >
> >  - the proposed method accepts a `ProducerRecord` -- it might be good to
> > explain why this cannot be done in a type safe way (ie, missing generics)
> >
> >  - `AlwaysProductionExceptionHandler` ->
> > `AlwaysContinueProductionExceptionHandler`
> >
> >  - `DefaultProductionExceptionHandler` is not mentioned
> >
> >  - Why do you distinguish between `ClassCastException` and "any other
> > unchecked exception"? The second case seems to include the first one.
> >
> >
> >
> > -Matthias
> >
> > On 12/6/18 8:35 AM, Matt Farmer wrote:
> > > Ah, good point.
> > >
> > > Should we consider altering the serializer interface to permit not
> > sending
> > > the record?
> > >
> > > On Wed, Dec 5, 2018 at 9:23 PM Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > >> Matt,
> > >>
> > >> That's a good point. If these cases are handled in the serializer,
> > then
> > >> one cannot continue the stream processing by skipping the record.
> > >> To continue, you may have to send an empty serialized key/value (new
> > >> byte[0]) downstream on hitting the error, which may cause unintended
> > >> results.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Wed, Dec 5, 2018 at 8:41 PM Matt Farmer  wrote:
> > >>
> > >>> Hi there,
> > >>>
> > >>> Thanks for this KIP.
> > >>>
> > >>> What’s the thinking behind doing this in ProductionExceptionHandler
> > >> versus
> > >>> handling these cases in your serializer implementation?
> > >>>
> > >>> On Mon, Dec 3, 2018 at 1:09 AM Kamal Chandraprakash <
> > >>> kamal.chandraprak...@gmail.com> wrote:
> > >>>
> > >>>> Hello dev,
> > >>>>
> > >>>>   I hope to initiate the discussion for KIP-399: Extend
> > >>>> ProductionExceptionHandler to cover serialization exceptions.
> > >>>>
> > >>>> KIP: <
> > >>>>
> > >>>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-399%3A+Extend+ProductionExceptionHandler+to+cover+serialization+exceptions
> > >>>>>
> > >>>> JIRA: https://issues.apache.org/jira/browse/KAFKA-7499
> > >>>>
> > >>>> All feedback will be highly appreciated.
> > >>>>
> > >>>> Thanks,
> > >>>> Kamal Chandraprakash
> > >>>>
> > >>>
> > >>
> > >
> >
> >
>
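
For readers following the KIP, here is a rough sketch of a handler built against the
existing `ProductionExceptionHandler` interface (and its CONTINUE/FAIL response enum from
KIP-210) plus the serialization hook proposed above. The `handleSerializationException`
signature follows the discussion in this thread and may differ from whatever finally ships,
so treat it as illustrative only; such a handler would be registered through the existing
`default.production.exception.handler` Streams config.

```java
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;

/**
 * Sketch: keep the default FAIL behavior for broker-side send errors, but skip
 * records that fail to serialize instead of killing the stream thread.
 */
public class SkipSerializationErrorsHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(ProducerRecord<byte[], byte[]> record,
                                                     Exception exception) {
        // Existing hook: broker/send failures still stop the application.
        return ProductionExceptionHandlerResponse.FAIL;
    }

    // Proposed in KIP-399 (raw ProducerRecord, per the thread above): called when
    // key/value serialization fails.
    public ProductionExceptionHandlerResponse handleSerializationException(ProducerRecord record,
                                                                           Exception exception) {
        // Log and drop the offending record, then keep processing.
        return ProductionExceptionHandlerResponse.CONTINUE;
    }

    @Override
    public void configure(Map<String, ?> configs) { }
}
```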


Re: [DISCUSS] KIP-399: Extend ProductionExceptionHandler to cover serialization exceptions

2018-12-06 Thread Matt Farmer
Ah, good point.

Should we consider altering the serializer interface to permit not sending
the record?

On Wed, Dec 5, 2018 at 9:23 PM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Matt,
>
> That's a good point. If these cases are handled in the serializer, then
> one cannot continue the stream processing by skipping the record.
> To continue, you may have to send an empty serialized key/value (new
> byte[0]) downstream on hitting the error, which may cause unintended
> results.
>
>
>
>
>
> On Wed, Dec 5, 2018 at 8:41 PM Matt Farmer  wrote:
>
> > Hi there,
> >
> > Thanks for this KIP.
> >
> > What’s the thinking behind doing this in ProductionExceptionHandler
> versus
> > handling these cases in your serializer implementation?
> >
> > On Mon, Dec 3, 2018 at 1:09 AM Kamal Chandraprakash <
> > kamal.chandraprak...@gmail.com> wrote:
> >
> > > Hello dev,
> > >
> > >   I hope to initiate the discussion for KIP-399: Extend
> > > ProductionExceptionHandler to cover serialization exceptions.
> > >
> > > KIP: <
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-399%3A+Extend+ProductionExceptionHandler+to+cover+serialization+exceptions
> > > >
> > > JIRA: https://issues.apache.org/jira/browse/KAFKA-7499
> > >
> > > All feedback will be highly appreciated.
> > >
> > > Thanks,
> > > Kamal Chandraprakash
> > >
> >
>


Re: [DISCUSS] KIP-399: Extend ProductionExceptionHandler to cover serialization exceptions

2018-12-05 Thread Matt Farmer
Hi there,

Thanks for this KIP.

What’s the thinking behind doing this in ProductionExceptionHandler versus
handling these cases in your serializer implementation?

On Mon, Dec 3, 2018 at 1:09 AM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hello dev,
>
>   I hope to initiate the discussion for KIP-399: Extend
> ProductionExceptionHandler to cover serialization exceptions.
>
> KIP: <
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-399%3A+Extend+ProductionExceptionHandler+to+cover+serialization+exceptions
> >
> JIRA: https://issues.apache.org/jira/browse/KAFKA-7499
>
> All feedback will be highly appreciated.
>
> Thanks,
> Kamal Chandraprakash
>


Re: [DISCUSS] KIP-387: Fair Message Consumption Across Partitions in KafkaConsumer

2018-11-19 Thread Matt Farmer
Hi there,

Thanks for the KIP.

We’ve run into issues with this at Mailchimp, so something to address this
consumption behavior would save us from having to ensure we’re always running
enough consumers that each consumer has only one partition (which is our
usual MO).

I wonder though if it would be simpler and more powerful to define the
maximum number of records the consumer should pull from one partition
before pulling some records from another?

So if you set max.poll.records to 500 and then some new setting,
max.poll.records.per.partition, to 100 then the Consumer would switch what
partition it reads from every 100 records - looping back around to the
first partition that had records if there aren’t 5 or more partitions with
records.

What do you think?
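
To make the suggestion concrete, the consumer settings might look roughly like this.
Note that max.poll.records.per.partition is purely hypothetical (it exists only in this
proposal), while max.poll.records is the existing setting:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class FairConsumptionConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "fair-consumer");
        // Existing setting: cap on records returned by a single poll().
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "500");
        // Hypothetical setting from the suggestion above: after 100 records from one
        // partition, move on to the next partition that has buffered records.
        props.put("max.poll.records.per.partition", "100");
        return props;
    }
}
```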

On Mon, Nov 19, 2018 at 9:11 AM ChienHsing Wu  wrote:

> Hi, could anyone please review this KIP?
>
> Thanks, ChienHsing
>
> From: ChienHsing Wu
> Sent: Friday, November 09, 2018 1:10 PM
> To: dev@kafka.apache.org
> Subject: RE: [DISCUSS] KIP-387: Fair Message Consumption Across Partitions
> in KafkaConsumer
>
> Just to check: Will anyone review this? It's been silent for a week...
> Thanks, ChienHsing
>
> From: ChienHsing Wu
> Sent: Monday, November 05, 2018 4:18 PM
> To: dev@kafka.apache.org
> Subject: [DISCUSS] KIP-387: Fair Message Consumption Across Partitions in
> KafkaConsumer
>
> Hi I just put together the KIP page as requested. This email is to start
> the discussion thread.
>
> KIP: KIP-387: Fair Message Consumption Across Partitions in KafkaConsumer<
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-387%3A+Fair+Message+Consumption+Across+Partitions+in+KafkaConsumer
> >
> Pull Request: https://github.com/apache/kafka/pull/5838
> Jira: https://issues.apache.org/jira/browse/KAFKA-3932
>
> Thanks, CH
>


Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

2018-11-19 Thread Matt Farmer
Thanks for the KIP.

Will this cap be a global cap across the entire cluster or per broker?

Either way the default value seems a bit high to me, but that could just be
from my own usage patterns. I’d have probably started with 500 or 1k but
could be easily convinced that’s wrong.

Thanks,
Matt

On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen  wrote:

> Hey folks,
>
>
> I would like to start a discussion on KIP-389:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
>
>
> This is a pretty simple change to cap the consumer group size for broker
> stability. Give me your valuable feedback when you get time.
>
>
> Thank you!
>


Re: [VOTE] KIP-354 Time-based log compaction policy

2018-11-14 Thread Matt Farmer
I'm a +1 (non-binding) — This looks like it would have saved us a lot of
pain in an issue we had to debug recently. I can't go into details, but
figuring out how to achieve this effect gave me quite a headache. :)
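
For reference, the lag metric discussed later in this thread (see Dong Lin's comment
quoted below) boils down to a per-partition calculation along these lines. This is just a
paraphrase of the KIP's formula, not the broker's actual implementation:

```java
// Paraphrase of the metric formula quoted below: the delay (in seconds) by which
// the oldest uncompacted segment has exceeded max.compaction.lag.ms, floored at 0.
public final class MaxCompactionDelay {
    static long compactionDelaySecs(long nowMs,
                                    long earliestUncompactedSegmentTimestampMs,
                                    long maxCompactionLagMs) {
        return Math.max(nowMs - earliestUncompactedSegmentTimestampMs - maxCompactionLagMs, 0) / 1000;
    }
    // The reported metric is the maximum of this value across all compactable partitions.
}
```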

On Mon, Nov 12, 2018 at 1:00 PM xiongqi wu  wrote:

> Hi all,
>
> Can I have one more vote on this KIP?
> Any comment is appreciated.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-354%3A+Add+a+Maximum+Log+Compaction+Lag
>
>
> Xiongqi (Wesley) Wu
>
>
> On Fri, Nov 9, 2018 at 7:56 PM xiongqi wu  wrote:
>
> > Thanks Dong.
> > I have updated the KIP.
> >
> > Xiongqi (Wesley) Wu
> >
> >
> > On Fri, Nov 9, 2018 at 5:31 PM Dong Lin  wrote:
> >
> >> Thanks for the KIP Xiongqi. LGTM. +1 (binding)
> >>
> >> One minor comment: it may be a bit better to clarify in the public
> >> interface section that the value of the newly added metric is determined
> >> by applying that formula across all compactable segments. For
> >> example:
> >>
> >> The maximum value of Math.max(now -
> >> earliest_timestamp_in_ms_of_uncompacted_segment - max.compaction.lag.ms, 0)/1000
> >> across all compactable partitions, where max.compaction.lag.ms
> >> can be overridden on a per-topic basis.
> >>
> >>
> >>
> >> On Fri, Nov 9, 2018 at 5:16 PM xiongqi wu  wrote:
> >>
> >> > Thanks Joel.
> >> > Tracking the delay at second granularity makes sense
> >> > I have updated KIP.
> >> >
> >> > Xiongqi (Wesley) Wu
> >> >
> >> >
> >> > On Fri, Nov 9, 2018 at 5:05 PM Joel Koshy 
> wrote:
> >> >
> >> > > +1 with one suggestion on the proposed metric. You should probably
> >> > > include the unit, e.g., max-compaction-delay-secs.
> >> > >
> >> > > Joel
> >> > >
> >> > > On Tue, Nov 6, 2018 at 5:30 PM xiongqi wu 
> >> wrote:
> >> > >
> >> > > > bump
> >> > > > Xiongqi (Wesley) Wu
> >> > > >
> >> > > >
> >> > > > On Thu, Sep 27, 2018 at 4:20 PM xiongqi wu 
> >> > wrote:
> >> > > >
> >> > > > >
> >> > > > > Thanks Eno, Brett, Dong, Guozhang, Colin,  and Xiaohe for
> >> feedback.
> >> > > > > Can I have more feedback or VOTE on this KIP?
> >> > > > >
> >> > > > >
> >> > > > > Xiongqi (Wesley) Wu
> >> > > > >
> >> > > > >
> >> > > > > On Wed, Sep 19, 2018 at 10:52 AM xiongqi wu <
> xiongq...@gmail.com>
> >> > > wrote:
> >> > > > >
> >> > > > >> Any other votes or comments?
> >> > > > >>
> >> > > > >> Xiongqi (Wesley) Wu
> >> > > > >>
> >> > > > >>
> >> > > > >> On Tue, Sep 11, 2018 at 11:45 AM xiongqi wu <
> xiongq...@gmail.com
> >> >
> >> > > > wrote:
> >> > > > >>
> >> > > > >>> Yes, more votes and code review.
> >> > > > >>>
> >> > > > >>> Xiongqi (Wesley) Wu
> >> > > > >>>
> >> > > > >>>
> >> > > > >>> On Mon, Sep 10, 2018 at 11:37 PM Brett Rann
> >> > >  >> > > > >
> >> > > > >>> wrote:
> >> > > > >>>
> >> > > >  +1 (non binding) from on 0 then, and on the KIP.
> >> > > > 
> >> > > >  Where do we go from here? More votes?
> >> > > > 
> >> > > >  On Tue, Sep 11, 2018 at 5:34 AM Colin McCabe <
> >> cmcc...@apache.org>
> >> > > >  wrote:
> >> > > > 
> >> > > >  > On Mon, Sep 10, 2018, at 11:44, xiongqi wu wrote:
> >> > > >  > > Thank you for comments. I will use '0' for now.
> >> > > >  > >
> >> > > >  > > If we create topics through admin client in the future,
> we
> >> > might
> >> > > >  perform
> >> > > >  > > some useful checks. (but the assumption is all brokers in
> >> the
> >> > > same
> >> > > >  > cluster
> >> > > >  > > have the same default configurations value, otherwise,it
> >> might
> >> > > >  still be
> >> > > >  > > tricky to do such cross validation check.)
> >> > > >  >
> >> > > >  > This isn't something that we might do in the future-- this
> is
> >> > > >  something we
> >> > > >  > are doing now. We already have Create Topic policies which
> >> are
> >> > > >  enforced by
> >> > > >  > the broker. Check KIP-108 and KIP-170 for details. This is
> >> one
> >> > of
> >> > > > the
> >> > > >  > motivations for getting rid of direct ZK access-- making
> sure
> >> > that
> >> > > >  these
> >> > > >  > policies are applied.
> >> > > >  >
> >> > > >  > I agree that having different configurations on different
> >> > brokers
> >> > > > can
> >> > > >  be
> >> > > >  > confusing and frustrating . That's why more configurations
> >> are
> >> > > being
> >> > > >  made
> >> > > >  > dynamic using KIP-226. Dynamic configurations are stored
> >> > centrally
> >> > > > in
> >> > > >  ZK,
> >> > > >  > so they are the same on all brokers (modulo propagation
> >> delays).
> >> > > In
> >> > > >  any
> >> > > >  > case, this is a general issue, not specific to "create
> >> topics".
> >> > > >  >
> >> > > >  > cheers,
> >> > > >  > Colin
> >> > > >  >
> >> > > >  >
> >> > > >  > >
> >> > > >  > >
> >> > > >  > > Xiongqi (Wesley) Wu
> >> > > >  > >
> >> > > >  > >
> >> > > >  > > On Mon, Sep 10, 2018 at 11:15 AM 

Introducing Kafka Hawk: Monitoring Consumer Group Commit Frequency

2018-11-10 Thread Matt Farmer
Hey everyone,

I wanted to share a small tool I developed last weekend named Kafka Hawk.

Kafka Hawk monitors the __consumer_offsets topic in Kafka and reports on
the number of commits it sees from each consumer group and topic. It can
also optionally report information on the deltas between offset commits for
a group and topic. It exports this information to a Prometheus (
https://prometheus.io/) endpoint.

You can find the project on GitHub here:
https://github.com/farmdawgnation/kafka-hawk

The easiest way to deploy it is to use the Docker image I've deployed to
Docker Hub. It's located at farmdawgnation/kafka-hawk

If you prefer, you can also download the JAR file directly from the
releases page on GitHub. You will need Java 11 to run the application, but
otherwise it's a standard application.

Please let me know if you have any questions! Contributions and bug reports
are welcome!

Cheers,
Matt


Re: [ANNOUNCE] New Kafka PMC member: Dong Lin

2018-09-16 Thread Matt Farmer
Congrats Dong!

On Sat, Sep 15, 2018 at 4:40 PM Bill Bejeck  wrote:

> Congrats!
>
> On Sat, Sep 15, 2018 at 1:27 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Congratulations!!
> >
> > On Sat, Sep 15, 2018 at 15:18, Dongjin Lee wrote:
> >
> > > Congratulations!
> > >
> > > Best,
> > > Dongjin
> > >
> > > On Sat, Sep 15, 2018 at 3:00 PM Colin McCabe 
> wrote:
> > >
> > > > Congratulations, Dong Lin!
> > > >
> > > > best,
> > > > Colin
> > > >
> > > > On Wed, Aug 22, 2018, at 05:26, Satish Duggana wrote:
> > > > > Congrats Dong Lin!
> > > > >
> > > > > On Wed, Aug 22, 2018 at 10:08 AM, Abhimanyu Nagrath <
> > > > > abhimanyunagr...@gmail.com> wrote:
> > > > >
> > > > > > Congratulations, Dong!
> > > > > >
> > > > > > On Wed, Aug 22, 2018 at 6:20 AM Dhruvil Shah <
> dhru...@confluent.io
> > >
> > > > wrote:
> > > > > >
> > > > > > > Congratulations, Dong!
> > > > > > >
> > > > > > > On Tue, Aug 21, 2018 at 4:38 PM Jason Gustafson <
> > > ja...@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Congrats!
> > > > > > > >
> > > > > > > > On Tue, Aug 21, 2018 at 10:03 AM, Ray Chiang <
> > rchi...@apache.org
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Congrats Dong!
> > > > > > > > >
> > > > > > > > > -Ray
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 8/21/18 9:33 AM, Becket Qin wrote:
> > > > > > > > >
> > > > > > > > >> Congrats, Dong!
> > > > > > > > >>
> > > > > > > > >> On Aug 21, 2018, at 11:03 PM, Eno Thereska <
> > > > eno.there...@gmail.com>
> > > > > > > > >>> wrote:
> > > > > > > > >>>
> > > > > > > > >>> Congrats Dong!
> > > > > > > > >>>
> > > > > > > > >>> Eno
> > > > > > > > >>>
> > > > > > > > >>> On Tue, Aug 21, 2018 at 7:05 AM, Ted Yu <
> > yuzhih...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > >>>
> > > > > > > > >>> Congratulation Dong!
> > > > > > > > 
> > > > > > > >  On Tue, Aug 21, 2018 at 1:59 AM Viktor Somogyi-Vass <
> > > > > > > >  viktorsomo...@gmail.com>
> > > > > > > >  wrote:
> > > > > > > > 
> > > > > > > >  Congrats Dong! :)
> > > > > > > > >
> > > > > > > > > On Tue, Aug 21, 2018 at 10:09 AM James Cheng <
> > > > > > wushuja...@gmail.com
> > > > > > > >
> > > > > > > > >
> > > > > > > >  wrote:
> > > > > > > > 
> > > > > > > > > Congrats Dong!
> > > > > > > > >>
> > > > > > > > >> -James
> > > > > > > > >>
> > > > > > > > >> On Aug 20, 2018, at 3:54 AM, Ismael Juma <
> > > ism...@juma.me.uk
> > > > >
> > > > > > > wrote:
> > > > > > > > >>>
> > > > > > > > >>> Hi everyone,
> > > > > > > > >>>
> > > > > > > > >>> Dong Lin became a committer in March 2018. Since
> then,
> > he
> > > > has
> > > > > > > > >>>
> > > > > > > > >> remained
> > > > > > > > 
> > > > > > > > > active in the community and contributed a number of
> > > patches,
> > > > > > > reviewed
> > > > > > > > >>> several pull requests and participated in numerous
> KIP
> > > > > > > > discussions. I
> > > > > > > > >>>
> > > > > > > > >> am
> > > > > > > > >
> > > > > > > > >> happy to announce that Dong is now a member of the
> > > > > > > > >>> Apache Kafka PMC.
> > > > > > > > >>>
> > > > > > > > >>> Congratulation Dong! Looking forward to your future
> > > > > > > contributions.
> > > > > > > > >>>
> > > > > > > > >>> Ismael, on behalf of the Apache Kafka PMC
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > >
> > > *github:  github.com/dongjinleekr
> > > linkedin:
> > kr.linkedin.com/in/dongjinleekr
> > > slideshare:
> > > www.slideshare.net/dongjinleekr
> > > *
> > >
> >
>


Re: [VOTE] KIP-275 - Indicate "isClosing" in the SinkTaskContext

2018-08-28 Thread Matt Farmer
Given that voting and discussion have stalled out it seems like this is a
thing that folks aren't particularly interested in. I'll be moving the KIP
status to abandoned unless I hear an objection in the next day or so. :)

On Thu, May 31, 2018 at 12:39 PM Matt Farmer  wrote:

> Bumping this again as it's been languishing for a few weeks. Would love to
> get further feedback (or know for sure that this won't happen).
>
> On Mon, May 14, 2018 at 3:48 PM, Matt Farmer  wrote:
>
>> Bumping this thread.
>>
>> For anyone who needs a refresher the discussion thread is here:
>> http://mail-archives.apache.org/mod_mbox/kafka-dev/201803.mbox/%3CCAM5dya9x---9M3uEf_wrJL5dw%2B6HLV4%3D5PfKKSTPE1vOHEWC_g%40mail.gmail.com%3E
>>
>> And there's a work in progress PR open here:
>> https://github.com/apache/kafka/pull/5002
>>
>> Thanks!
>>
>> On Wed, Apr 25, 2018 at 1:04 PM, Matt Farmer  wrote:
>>
>>> Bump!
>>>
>>> We're currently at 1 non-binding +1.
>>>
>>> Still soliciting votes here. =)
>>>
>>> On Wed, Apr 18, 2018 at 3:41 PM, Ted Yu  wrote:
>>>
>>>> +1
>>>>
>>>> On Wed, Apr 18, 2018 at 12:40 PM, Matt Farmer  wrote:
>>>>
>>>> > Good afternoon/evening/morning all:
>>>> >
>>>> > I'd like to start voting on KIP-275: Indicate "isClosing" in the
>>>> > SinkTaskContext
>>>> >
>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75977607
>>>> >
>>>> > I'm going to start preparing the patch we've been using internally
>>>> for PR
>>>> > and get it up for review later this week.
>>>> >
>>>> > Thanks!
>>>> > Matt
>>>> >
>>>>
>>>
>>>
>>
>


Re: [DISCUSS] KIP-349 Priorities for Source Topics

2018-08-12 Thread Matt Farmer
Ah, sorry, yes it does.

On Sun, Aug 12, 2018 at 2:58 PM  wrote:

> Does this clarify ?
> --
>   Nick
>
> On Aug 9, 2018, at 7:44 PM, n...@afshartous.com wrote:
>
> Since there are questions I changed the heading from VOTE to DISCUSS
>
> On Aug 8, 2018, at 9:09 PM, Matt Farmer  wrote:
>
> Is it worth spelling out explicitly what the behavior is when two topics
> have the same priority? I'm a bit fuzzy on how we choose what topics to
> consume from right now, if I'm being honest, so it could be useful to
> outline the current behavior in the background and to spell out how that
> would change (or if it would change) when two topics are given the same
> priority.
>
>
> I added an additional note in the KIP’s Compatibility section to clarify
> that current behavior would not change in order to preserve backwards
> compatibility.
>
> Also, how does this play with max.poll.records? Does the consumer read from
> all the topics in priority order until we've hit the number of records or
> the poll timeout? Or does it immediately return the high priority records
> without pulling low priority records?
>
>
> My own interpretation would be to read from all the topics in priority
> order as the consumer is subscribed to multiple topics.
> --
>   Nick
>
>
>
>
>
>
>


Re: [VOTE] KIP-349 Priorities for Source Topics

2018-08-12 Thread Matt Farmer
In thinking on it, another solution for this is another consumer external
to the stream - but then we run into timing issues and complexity with
using a state store as the storage of record. :/
On Sun, Aug 12, 2018 at 9:34 AM Matt Farmer  wrote:

> The work-queue use case is mostly how I see this being used, yes.
>
> In the most generic sense I can see its use in a situation where the
> business dictates
> that we have to guarantee quality of service for some set of low number of
> messages while
> there's some high number of messages being processed from a different
> topic.
>
> For our pipelines at work, this would actually make it possible for us to
> define a command
> and control topic for some of our streams applications. We occasionally
> have to change the
> behavior of our streams in reaction to system issues or, occasionally,
> user's abusing the
> system and creating a bunch of garbage data that we'd like to skip over.
> Today, we have
> to either define that behavior in a config setting and restart the
> application (which is what
> we currently do) or implement some sort of API external to streams that
> the stream pulls
> state from.
>
> With a CnC topic, we could interleave these into the normal stream
> processing flow and
> instruct it to alter a state store, for example, with the criteria for
> records to be dropped
> without introducing other libraries or having to manually synchronize with
> external state.
>
> On Wed, Aug 8, 2018 at 10:11 PM Gwen Shapira  wrote:
>
>> Can you guys spell it out for me? I just don't really see when I want to
>> subscribe to two topics but not get events from both at the same time.
>> Is this a work-queue type pattern?
>>
>> On Wed, Aug 8, 2018 at 6:10 PM, Matt Farmer  wrote:
>>
>> > Oh, almost forgot, thanks for the KIP - I can see this being a very
>> useful
>> > addition. :)
>> >
>> > On Wed, Aug 8, 2018 at 9:09 PM Matt Farmer  wrote:
>> >
>> > > Is it worth spelling out explicitly what the behavior is when two
>> topics
>> > > have the same priority? I'm a bit fuzzy on how we choose what topics
>> to
>> > > consume from right now, if I'm being honest, so it could be useful to
>> > > outline the current behavior in the background and to spell out how
>> that
>> > > would change (or if it would change) when two topics are given the
>> same
>> > > priority.
>> > >
>> > > Also, how does this play with max.poll.records? Does the consumer read
>> > > from all the topics in priority order until we've hit the number of
>> > records
>> > > or the poll timeout? Or does it immediately return the high priority
>> > > records without pulling low priority records?
>> > >
>> > > On Wed, Aug 8, 2018 at 8:39 PM  wrote:
>> > >
>> > >>
>> > >> Hi All,
>> > >>
>> > >> Calling for a vote on KIP-349
>> > >>
>> > >>
>> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > 349%3A+Priorities+for+Source+Topics
>> > >> <
>> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > 349:+Priorities+for+Source+Topics
>> > >> >
>> > >>
>> > >> Cheers,
>> > >> --
>> > >>   Nick
>> > >>
>> > >>
>> > >>
>> > >>
>> >
>>
>>
>>
>> --
>> *Gwen Shapira*
>> Product Manager | Confluent
>> 650.450.2760 | @gwenshap
>> Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
>> <http://www.confluent.io/blog>
>>
>


Re: [VOTE] KIP-349 Priorities for Source Topics

2018-08-12 Thread Matt Farmer
The work-queue use case is mostly how I see this being used, yes.

In the most generic sense, I can see its use in a situation where the
business dictates
that we have to guarantee quality of service for a small number of
messages while
a high volume of messages is being processed from a different topic.

For our pipelines at work, this would actually make it possible for us to
define a command
and control topic for some of our streams applications. We occasionally
have to change the
behavior of our streams in reaction to system issues or to users abusing the
system and creating a bunch of garbage data that we'd like to skip over.
Today, we have
to either define that behavior in a config setting and restart the
application (which is what
we currently do) or implement some sort of API external to streams that the
stream pulls
state from.

With a CnC topic, we could interleave these into the normal stream
processing flow and
instruct it to alter a state store, for example, with the criteria for
records to be dropped
without introducing other libraries or having to manually synchronize with
external state.
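
A rough sketch of what that command-and-control pattern could look like in the Streams DSL.
The topic names and the drop-by-key criterion are made up for illustration; the point is
only that control messages land in a table that the main stream consults on every record:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class CommandAndControlTopology {
    // Topic names are placeholders for illustration.
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Control messages: key = record key to suppress, value = reason/flag.
        GlobalKTable<String, String> controls = builder.globalTable(
                "control-topic", Consumed.with(Serdes.String(), Serdes.String()));

        KStream<String, String> events = builder.stream(
                "events-topic", Consumed.with(Serdes.String(), Serdes.String()));

        events
                // Look up each event's key in the control table; null means "no directive".
                .leftJoin(controls,
                        (eventKey, eventValue) -> eventKey,
                        (eventValue, directive) -> directive == null ? eventValue : null)
                // Drop events that a control message told us to skip.
                .filter((key, value) -> value != null)
                .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));

        return builder.build();
    }
}
```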

On Wed, Aug 8, 2018 at 10:11 PM Gwen Shapira  wrote:

> Can you guys spell it out for me? I just don't really see when I want to
> subscribe to two topics but not get events from both at the same time.
> Is this a work-queue type pattern?
>
> On Wed, Aug 8, 2018 at 6:10 PM, Matt Farmer  wrote:
>
> > Oh, almost forgot, thanks for the KIP - I can see this being a very
> useful
> > addition. :)
> >
> > On Wed, Aug 8, 2018 at 9:09 PM Matt Farmer  wrote:
> >
> > > Is it worth spelling out explicitly what the behavior is when two
> topics
> > > have the same priority? I'm a bit fuzzy on how we choose what topics to
> > > consume from right now, if I'm being honest, so it could be useful to
> > > outline the current behavior in the background and to spell out how
> that
> > > would change (or if it would change) when two topics are given the same
> > > priority.
> > >
> > > Also, how does this play with max.poll.records? Does the consumer read
> > > from all the topics in priority order until we've hit the number of
> > records
> > > or the poll timeout? Or does it immediately return the high priority
> > > records without pulling low priority records?
> > >
> > > On Wed, Aug 8, 2018 at 8:39 PM  wrote:
> > >
> > >>
> > >> Hi All,
> > >>
> > >> Calling for a vote on KIP-349
> > >>
> > >>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 349%3A+Priorities+for+Source+Topics
> > >> <
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 349:+Priorities+for+Source+Topics
> > >> >
> > >>
> > >> Cheers,
> > >> --
> > >>   Nick
> > >>
> > >>
> > >>
> > >>
> >
>
>
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> <http://www.confluent.io/blog>
>


Re: [VOTE] KIP-349 Priorities for Source Topics

2018-08-08 Thread Matt Farmer
Oh, almost forgot, thanks for the KIP - I can see this being a very useful
addition. :)

On Wed, Aug 8, 2018 at 9:09 PM Matt Farmer  wrote:

> Is it worth spelling out explicitly what the behavior is when two topics
> have the same priority? I'm a bit fuzzy on how we choose what topics to
> consume from right now, if I'm being honest, so it could be useful to
> outline the current behavior in the background and to spell out how that
> would change (or if it would change) when two topics are given the same
> priority.
>
> Also, how does this play with max.poll.records? Does the consumer read
> from all the topics in priority order until we've hit the number of records
> or the poll timeout? Or does it immediately return the high priority
> records without pulling low priority records?
>
> On Wed, Aug 8, 2018 at 8:39 PM  wrote:
>
>>
>> Hi All,
>>
>> Calling for a vote on KIP-349
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
>> <
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349:+Priorities+for+Source+Topics
>> >
>>
>> Cheers,
>> --
>>   Nick
>>
>>
>>
>>


Re: [VOTE] KIP-349 Priorities for Source Topics

2018-08-08 Thread Matt Farmer
Is it worth spelling out explicitly what the behavior is when two topics
have the same priority? I'm a bit fuzzy on how we choose what topics to
consume from right now, if I'm being honest, so it could be useful to
outline the current behavior in the background and to spell out how that
would change (or if it would change) when two topics are given the same
priority.

Also, how does this play with max.poll.records? Does the consumer read from
all the topics in priority order until we've hit the number of records or
the poll timeout? Or does it immediately return the high priority records
without pulling low priority records?

On Wed, Aug 8, 2018 at 8:39 PM  wrote:

>
> Hi All,
>
> Calling for a vote on KIP-349
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-349:+Priorities+for+Source+Topics
> >
>
> Cheers,
> --
>   Nick
>
>
>
>


Re: [VOTE] KIP-275 - Indicate "isClosing" in the SinkTaskContext

2018-05-31 Thread Matt Farmer
Bumping this again as it's been languishing for a few weeks. Would love to
get further feedback (or know for sure that this won't happen).

On Mon, May 14, 2018 at 3:48 PM, Matt Farmer  wrote:

> Bumping this thread.
>
> For anyone who needs a refresher the discussion thread is here:
> http://mail-archives.apache.org/mod_mbox/kafka-dev/
> 201803.mbox/%3CCAM5dya9x---9M3uEf_wrJL5dw%2B6HLV4%
> 3D5PfKKSTPE1vOHEWC_g%40mail.gmail.com%3E
>
> And there's a work in progress PR open here: https://github.com/
> apache/kafka/pull/5002
>
> Thanks!
>
> On Wed, Apr 25, 2018 at 1:04 PM, Matt Farmer  wrote:
>
>> Bump!
>>
>> We're currently at 1 non-binding +1.
>>
>> Still soliciting votes here. =)
>>
>> On Wed, Apr 18, 2018 at 3:41 PM, Ted Yu  wrote:
>>
>>> +1
>>>
>>> On Wed, Apr 18, 2018 at 12:40 PM, Matt Farmer  wrote:
>>>
>>> > Good afternoon/evening/morning all:
>>> >
>>> > I'd like to start voting on KIP-275: Indicate "isClosing" in the
>>> > SinkTaskContext
>>> > https://cwiki.apache.org/confluence/pages/viewpage.action?pa
>>> geId=75977607
>>> >
>>> > I'm going to start preparing the patch we've been using internally for
>>> PR
>>> > and get it up for review later this week.
>>> >
>>> > Thanks!
>>> > Matt
>>> >
>>>
>>
>>
>


Re: [VOTE] KIP-277 - Fine Grained ACL for CreateTopics API

2018-05-16 Thread Matt Farmer
+1 (non-binding)

On Tue, May 15, 2018 at 4:26 AM, Edoardo Comar  wrote:

> Hi,
> bumping the thread as the current vote count for this KIP is
> 2 binding +1
> 5 non-binding +1
>
> thanks, Edo
>
> On 8 May 2018 at 16:14, Edoardo Comar  wrote:
> > Hi,
> > bumping the thread as the current vote count for this KIP is
> > 2 binding +1
> > 5 non-binding +1
> >
> > so still missing a binding vote please
> >
> > thanks,
> > Edo
> >
> >
> > On 30 April 2018 at 12:49, Manikumar  wrote:
> >>
> >> +1 (non-binding)
> >>
> >> Thanks
> >>
> >> On Thu, Apr 26, 2018 at 9:59 PM, Colin McCabe 
> wrote:
> >>
> >> > +1 (non-binding)
> >> >
> >> > best,
> >> > Colin
> >> >
> >> >
> >> > On Wed, Apr 25, 2018, at 02:45, Edoardo Comar wrote:
> >> > > Hi,
> >> > >
> >> > > The discuss thread on KIP-277 (
> >> > > https://www.mail-archive.com/dev@kafka.apache.org/msg86540.html )
> >> > > seems to have been fruitful and concerns have been addressed, please
> >> > allow
> >> > > me start a vote on it:
> >> > >
> >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> > 277+-+Fine+Grained+ACL+for+CreateTopics+API
> >> > >
> >> > > I will update the small PR to the latest KIP semantics if the vote
> >> > passes
> >> > > (as I hope :-).
> >> > >
> >> > > cheers
> >> > > Edo
> >> > > --
> >> > >
> >> > > Edoardo Comar
> >> > >
> >> > > IBM Message Hub
> >> > >
> >> > > IBM UK Ltd, Hursley Park, SO21 2JN
> >> > > Unless stated otherwise above:
> >> > > IBM United Kingdom Limited - Registered in England and Wales with
> >> > > number
> >> > > 741598.
> >> > > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire
> PO6
> >> > 3AU
> >> >
> >
> >
> >
> >
> > --
> > "When the people fear their government, there is tyranny; when the
> > government fears the people, there is liberty." [Thomas Jefferson]
> >
>
>
>
> --
> "When the people fear their government, there is tyranny; when the
> government fears the people, there is liberty." [Thomas Jefferson]
>


Re: [DISCUSS] KIP-298: Error Handling in Connect

2018-05-16 Thread Matt Farmer
Hey Arjun,

I like deadletterqueue in all lower case, so I'm +1 on that.

Yes, in the case we were seeing there were external system failures.
We had issues connecting to S3. The connector does include
some retry functionality; however, setting these values sufficiently
high seemed to cause us to hit timeouts and fail the entire
task anyway. (I think I was using something like 100 retries
during the brief test of this behavior?)

Yeah, totally understand that there could be unintended consequences
from this. I guess the use case I'm trying to optimize for is to give
folks some bubblegum to keep a high volume system limping
along until the software engineers get time to address it. So I'm
imagining the situation that I'm paged on a Saturday night because of
an intermittent network issue. With a config flag like this I could push
a config change to cause Connect to treat that as retriable and allow
me to wait until the following Monday to make changes to the code.
That may not be a sensible concern for Kafka writ large, but Connect
is a bit weird when compared with Streams or the Clients. It's almost
more of a piece of infrastructure than a library, and I generally like
infrastructure to have escape hatches like that. Just my 0.02 though. :)

Thanks,
Matt
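
For context on what treating an exception as retriable buys you today: a sink task can
already wrap a transient failure in Connect's RetriableException, and the framework will
redeliver the same batch instead of failing the task. A sketch of that pattern, where the
external write path and its IOException stand in for whatever system the connector talks to:

```java
import java.io.IOException;
import java.util.Collection;
import org.apache.kafka.connect.errors.RetriableException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

/**
 * Sketch of a sink task that marks intermittent external-system failures as retriable
 * so the framework redelivers the same batch instead of failing the task.
 */
public abstract class RetryingSinkTask extends SinkTask {

    @Override
    public void put(Collection<SinkRecord> records) {
        try {
            writeToExternalSystem(records); // placeholder for the connector's real write path
        } catch (IOException e) {
            // Signal Connect to retry this batch rather than kill the task.
            throw new RetriableException("Transient failure writing batch, will retry", e);
        }
    }

    protected abstract void writeToExternalSystem(Collection<SinkRecord> records) throws IOException;
}
```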

On Tue, May 15, 2018 at 8:46 PM, Arjun Satish <arjun.sat...@gmail.com>
wrote:

> Matt,
>
> Thanks so much for your comments. Really appreciate it!
>
> 1. Good point about the acronym. I can use deadletterqueue instead of dlq
> (using all lowercase to be consistent with the other configs in Kafka).
> What do you think?
>
> 2. Could you please tell us what errors caused these tasks to fail? Were
> they because of external system failures? And if so, could they be
> implemented in the Connector itself? Or using retries with backoffs?
>
> 3. I like this idea. But did not include it here since it might be a
> stretch. One thing to note is that ConnectExceptions can be thrown from a
> variety of places in a connector. I think it should be OK for the Connector
> to throw RetriableException or something that extends it for the operation
> to be retried. By changing this behavior, a lot of existing connectors
> would have to be updated so that they don't rewrite messages into this
> sink. For example, a sink connector might write some data into the external
> system partially, and then fail with a ConnectException. Since the
> framework has no way of knowing what was written and what was not, a retry
> here might cause the same data to written again into the sink.
>
> Best,
>
>
> On Mon, May 14, 2018 at 12:46 PM, Matt Farmer <m...@frmr.me> wrote:
>
> > Hi Arjun,
> >
> > I'm following this very closely as better error handling in Connect is a
> > high priority
> > for MailChimp's Data Systems team.
> >
> > A few thoughts (in no particular order):
> >
> > For the dead letter queue configuration, could we use deadLetterQueue
> > instead of
> > dlq? Acronyms are notoriously hard to keep straight in everyone's head
> and
> > unless
> > there's a compelling reason it would be nice to use the characters and be
> > explicit.
> >
> > Have you considered any behavior that would periodically attempt to
> restart
> > failed
> > tasks after a certain amount of time? To get around our issues internally
> > we've
> > deployed a tool that monitors for failed tasks and restarts the task by
> > hitting the
> > REST API after the failure. Such a config would allow us to get rid of
> this
> > tool.
> >
> > Have you considered a config setting to allow-list additional classes as
> > retryable? In the situation we ran into, we were getting
> ConnectExceptions
> > that
> > were intermittent due to an unrelated service. With such a setting we
> could
> > have
> > deployed a config that temporarily whitelisted that Exception as
> > retry-worthy
> > and continued attempting to make progress while the other team worked
> > on mitigating the problem.
> >
> > Thanks for the KIP!
> >
> > On Wed, May 9, 2018 at 2:59 AM, Arjun Satish <arjun.sat...@gmail.com>
> > wrote:
> >
> > > All,
> > >
> > > I'd like to start a discussion on adding ways to handle and report
> record
> > > processing errors in Connect. Please find a KIP here:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 298%3A+Error+Handling+in+Connect
> > >
> > > Any feedback will be highly appreciated.
> > >
> > > Thanks very much,
> > > Arjun
> > >
> >
>


Re: [VOTE] KIP-275 - Indicate "isClosing" in the SinkTaskContext

2018-05-14 Thread Matt Farmer
Bumping this thread.

For anyone who needs a refresher the discussion thread is here:
http://mail-archives.apache.org/mod_mbox/kafka-dev/201803.mbox/%3CCAM5dya9x---9M3uEf_wrJL5dw%2B6HLV4%3D5PfKKSTPE1vOHEWC_g%40mail.gmail.com%3E

And there's a work in progress PR open here:
https://github.com/apache/kafka/pull/5002

Thanks!

On Wed, Apr 25, 2018 at 1:04 PM, Matt Farmer <m...@frmr.me> wrote:

> Bump!
>
> We're currently at 1 non-binding +1.
>
> Still soliciting votes here. =)
>
> On Wed, Apr 18, 2018 at 3:41 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> +1
>>
>> On Wed, Apr 18, 2018 at 12:40 PM, Matt Farmer <m...@frmr.me> wrote:
>>
>> > Good afternoon/evening/morning all:
>> >
>> > I'd like to start voting on KIP-275: Indicate "isClosing" in the
>> > SinkTaskContext
>> > https://cwiki.apache.org/confluence/pages/viewpage.action?
>> pageId=75977607
>> >
>> > I'm going to start preparing the patch we've been using internally for
>> PR
>> > and get it up for review later this week.
>> >
>> > Thanks!
>> > Matt
>> >
>>
>
>


Re: [DISCUSS] KIP-298: Error Handling in Connect

2018-05-14 Thread Matt Farmer
Hi Arjun,

I'm following this very closely as better error handling in Connect is a
high priority
for MailChimp's Data Systems team.

A few thoughts (in no particular order):

For the dead letter queue configuration, could we use deadLetterQueue
instead of
dlq? Acronyms are notoriously hard to keep straight in everyone's head and
unless
there's a compelling reason it would be nice to use the characters and be
explicit.

Have you considered any behavior that would periodically attempt to restart
failed
tasks after a certain amount of time? To get around our issues internally
we've
deployed a tool that monitors for failed tasks and restarts the task by
hitting the
REST API after the failure. Such a config would allow us to get rid of this
tool.
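
For reference, the watchdog tool mentioned above is roughly this shape: poll the Connect
REST API's status endpoint and hit the per-task restart endpoint when a task reports
FAILED. The worker URL, connector name, task id, and naive string match are placeholders;
a real tool would parse the status JSON properly:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Minimal sketch of a "restart failed tasks" watchdog against the Connect REST API. */
public class TaskRestarter {
    private static final String CONNECT = "http://localhost:8083"; // placeholder worker URL

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String connector = "my-sink-connector"; // placeholder

        // GET /connectors/{name}/status reports connector and task states.
        HttpResponse<String> status = client.send(
                HttpRequest.newBuilder(URI.create(CONNECT + "/connectors/" + connector + "/status"))
                        .GET().build(),
                HttpResponse.BodyHandlers.ofString());

        if (status.body().contains("\"state\":\"FAILED\"")) {
            // POST /connectors/{name}/tasks/{id}/restart restarts a single task (task 0 here).
            client.send(
                    HttpRequest.newBuilder(URI.create(CONNECT + "/connectors/" + connector + "/tasks/0/restart"))
                            .POST(HttpRequest.BodyPublishers.noBody()).build(),
                    HttpResponse.BodyHandlers.ofString());
        }
    }
}
```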

Have you considered a config setting to allow-list additional classes as
retryable? In the situation we ran into, we were getting ConnectExceptions
that
were intermittent due to an unrelated service. With such a setting we could
have
deployed a config that temporarily whitelisted that Exception as
retry-worthy
and continued attempting to make progress while the other team worked
on mitigating the problem.

Thanks for the KIP!

On Wed, May 9, 2018 at 2:59 AM, Arjun Satish  wrote:

> All,
>
> I'd like to start a discussion on adding ways to handle and report record
> processing errors in Connect. Please find a KIP here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 298%3A+Error+Handling+in+Connect
>
> Any feedback will be highly appreciated.
>
> Thanks very much,
> Arjun
>


Re: [DISCUSS] KIP-275 - Indicate "isClosing" in the SinkTaskContext

2018-05-10 Thread Matt Farmer
Given that the conversation has lingered for a bit, I've gone ahead and
opened up a PR with the initial implementation. Let me know your thoughts!

https://github.com/apache/kafka/pull/5002

Also, voting is open - so if you like this idea please send me some binding
+1's before May 22nd so we can get this in Kafka 2.0 :)

On Tue, Apr 17, 2018 at 7:11 PM, Matt Farmer <m...@frmr.me> wrote:

> Hello all, I've updated a KIP again to add a few sentences about the
> general problem we were facing in the motivation section. Please let me
> know if there is any further feedback.
>
> On Tue, Apr 3, 2018 at 1:46 PM, Matt Farmer <m...@frmr.me> wrote:
>
>> Hey Randall,
>>
>> Devil's advocate sparring is always a fun game so I'm down. =)
>>
>> Rebalance caused by connectivity failure is the case that caused us to
>> notice the issue. But the issue
>> is really more about giving connectors the tools to facilitate rebalances
>> or a Kafka connect shutdown
>> cleanly. Perhaps that wasn't clear in the KIP.
>>
>> In our case timeouts were *not* uniformly affecting tasks. But every
>> time a timeout occurred in one task,
>> all tasks lost whatever forward progress they had made. So, yes, in the
>> specific case of timeouts a
>> backoff jitter in the connector is a solution for that particular issue.
>> However, this KIP *also* gives connectors
>> enough information to behave intelligently during any kind of rebalance
>> or shutdown because they're able
>> to discover that preCommit is being invoked for that specific reason. (As
>> opposed to being invoked
>> during normal operation.)
>>
>> On Tue, Apr 3, 2018 at 12:36 PM, Randall Hauch <rha...@gmail.com> wrote:
>>
>>> Matt,
>>>
>>> Let me play devil's advocate. Do we need this additional complexity? The
>>> motivation section talked about needing to deal with task failures due to
>>> connectivity problems. Generally speaking, isn't it possible that if one
>>> task has connectivity problems with either the broker or the external
>>> system that other tasks would as well? And in the particular case of S3,
>>> is
>>> it possible to try and prevent the task shutdown in the first place,
>>> perhaps by improving how the S3 connector retries? (We did this in the
>>> Elasticsearch connector using backoff with jitter; see
>>> https://github.com/confluentinc/kafka-connect-elasticsearch/pull/116 for
>>> details.)
>>>
>>> Best regards,
>>>
>>> Randall
>>>
>>> On Tue, Apr 3, 2018 at 8:39 AM, Matt Farmer <m...@frmr.me> wrote:
>>>
>>> > I have made the requested updates to the KIP! :)
>>> >
>>> > On Mon, Apr 2, 2018 at 11:02 AM, Matt Farmer <m...@frmr.me> wrote:
>>> >
>>> > > Ugh
>>> > >
>>> > > * I can update
>>> > >
>>> > > I need more caffeine...
>>> > >
>>> > > On Mon, Apr 2, 2018 at 11:01 AM, Matt Farmer <m...@frmr.me> wrote:
>>> > >
>>> > >> I'm can update the rejected alternatives section as you describe.
>>> > >>
>>> > >> Also, adding a paragraph to the preCommit javadoc also seems like a
>>> > >> Very Very Good Idea™ so I'll make that update to the KIP as well.
>>> > >>
>>> > >> On Mon, Apr 2, 2018 at 10:48 AM, Randall Hauch <rha...@gmail.com>
>>> > wrote:
>>> > >>
>>> > >>> Thanks for the KIP proposal, Matt.
>>> > >>>
>>> > >>> You mention in the "Rejected Alternatives" section that you
>>> considered
>>> > >>> changing the signature of the `preCommit` method but rejected it
>>> > because
>>> > >>> it
>>> > >>> would break backward compatibility. Strictly speaking, it is
>>> possible
>>> > to
>>> > >>> do
>>> > >>> this without breaking compatibility by introducing a new
>>> `preCommit`
>>> > >>> method, deprecating the old one, and having the new implementation
>>> call
>>> > >>> the
>>> > >>> old one. Such an approach would be complicated, and I'm not sure it
>>> > adds
>>> > >>> any value. In fact, one of the benefits of having a context object
>>> is
>>> > >>> that
>>> > >>> we can add methods like the

Re: [VOTE] KIP-275 - Indicate "isClosing" in the SinkTaskContext

2018-04-25 Thread Matt Farmer
Bump!

We're currently at 1 non-binding +1.

Still soliciting votes here. =)

On Wed, Apr 18, 2018 at 3:41 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> +1
>
> On Wed, Apr 18, 2018 at 12:40 PM, Matt Farmer <m...@frmr.me> wrote:
>
> > Good afternoon/evening/morning all:
> >
> > I'd like to start voting on KIP-275: Indicate "isClosing" in the
> > SinkTaskContext
> > https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=75977607
> >
> > I'm going to start preparing the patch we've been using internally for PR
> > and get it up for review later this week.
> >
> > Thanks!
> > Matt
> >
>


Re: [VOTE] Kafka 2.0.0 in June 2018

2018-04-19 Thread Matt Farmer
+1 (non-binding). TY!

On Thu, Apr 19, 2018 at 11:56 AM, tao xiao  wrote:

> +1 non-binding. thx Ismael
>
> On Thu, 19 Apr 2018 at 23:14 Vahid S Hashemian 
> wrote:
>
> > +1 (non-binding).
> >
> > Thanks Ismael.
> >
> > --Vahid
> >
> >
> >
> > From:   Jorge Esteban Quilcate Otoya 
> > To: dev@kafka.apache.org
> > Date:   04/19/2018 07:32 AM
> > Subject:Re: [VOTE] Kafka 2.0.0 in June 2018
> >
> >
> >
> > +1 (non binding), thanks Ismael!
> >
> > On Thu, Apr 19, 2018 at 13:01, Manikumar wrote:
> >
> > > +1 (non-binding).
> > >
> > > Thanks.
> > >
> > > On Thu, Apr 19, 2018 at 3:07 PM, Stephane Maarek <
> > > steph...@simplemachines.com.au> wrote:
> > >
> > > > +1 (non binding). Thanks Ismael!
> > > >
> > > > On Thu., 19 Apr. 2018, 2:47 pm Gwen Shapira, 
> > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Wed, Apr 18, 2018 at 11:35 AM, Ismael Juma 
> > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I started a discussion last year about bumping the version of the
> > > June
> > > > > 2018
> > > > > > release to 2.0.0[1]. To reiterate the reasons in the original
> > post:
> > > > > >
> > > > > > 1. Adopt KIP-118 (Drop Support for Java 7), which requires a
> major
> > > > > version
> > > > > > bump due to semantic versioning.
> > > > > >
> > > > > > 2. Take the chance to remove deprecated code that was deprecated
> > > prior
> > > > to
> > > > > > 1.0.0, but not removed in 1.0.0 (e.g. old Scala clients) so that
> > we
> > > can
> > > > > > move faster.
> > > > > >
> > > > > > One concern that was raised is that we still do not have a
> rolling
> > > > > upgrade
> > > > > > path for the old ZK-based consumers. Since the Scala clients
> > haven't
> > > > been
> > > > > > updated in a long time (they don't support security or the latest
> > > > message
> > > > > > format), users who need them can continue to use 1.1.0 with no
> > loss
> > > of
> > > > > > functionality.
> > > > > >
> > > > > > Since it's already mid-April and people seemed receptive during
> > the
> > > > > > discussion last year, I'm going straight to a vote, but we can
> > > discuss
> > > > > more
> > > > > > if needed (of course).
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > > https://lists.apache.org/thread.html/dd9d3e31d7e9590c1f727ef5560c93
> > > > > > 3281bad0de3134469b7b3c4257@%3Cdev.kafka.apache.org%3E
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > *Gwen Shapira*
> > > > > Product Manager | Confluent
> > > > > 650.450.2760 | @gwenshap
> > > > > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> > > > > <http://www.confluent.io/blog>
> > > > >
> > > >
> > >
> >
> >
> >
> >
> >
>


[VOTE] KIP-275 - Indicate "isClosing" in the SinkTaskContext

2018-04-18 Thread Matt Farmer
Good afternoon/evening/morning all:

I'd like to start voting on KIP-275: Indicate "isClosing" in the
SinkTaskContext
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75977607

I'm going to start preparing the patch we've been using internally for PR
and get it up for review later this week.

Thanks!
Matt


Re: [DISCUSS] KIP-275 - Indicate "isClosing" in the SinkTaskContext

2018-04-17 Thread Matt Farmer
Hello all, I've updated the KIP again to add a few sentences about the
general problem we were facing in the motivation section. Please let me
know if there is any further feedback.

On Tue, Apr 3, 2018 at 1:46 PM, Matt Farmer <m...@frmr.me> wrote:

> Hey Randall,
>
> Devil's advocate sparring is always a fun game so I'm down. =)
>
> Rebalance caused by connectivity failure is the case that caused us to
> notice the issue. But the issue
> is really more about giving connectors the tools to facilitate rebalances
> or a Kafka connect shutdown
> cleanly. Perhaps that wasn't clear in the KIP.
>
> In our case timeouts were *not* uniformly affecting tasks. But every time
> a timeout occurred in one task,
> all tasks lost whatever forward progress they had made. So, yes, in the
> specific case of timeouts a
> backoff jitter in the connector is a solution for that particular issue.
> However, this KIP *also* gives connectors
> enough information to behave intelligently during any kind of rebalance or
> shutdown because they're able
> to discover that preCommit is being invoked for that specific reason. (As
> opposed to being invoked
> during normal operation.)
>
> On Tue, Apr 3, 2018 at 12:36 PM, Randall Hauch <rha...@gmail.com> wrote:
>
>> Matt,
>>
>> Let me play devil's advocate. Do we need this additional complexity? The
>> motivation section talked about needing to deal with task failures due to
>> connectivity problems. Generally speaking, isn't it possible that if one
>> task has connectivity problems with either the broker or the external
>> system that other tasks would as well? And in the particular case of S3,
>> is
>> it possible to try and prevent the task shutdown in the first place,
>> perhaps by improving how the S3 connector retries? (We did this in the
>> Elasticsearch connector using backoff with jitter; see
>> https://github.com/confluentinc/kafka-connect-elasticsearch/pull/116 for
>> details.)
>>
>> Best regards,
>>
>> Randall
>>
>> On Tue, Apr 3, 2018 at 8:39 AM, Matt Farmer <m...@frmr.me> wrote:
>>
>> > I have made the requested updates to the KIP! :)
>> >
>> > On Mon, Apr 2, 2018 at 11:02 AM, Matt Farmer <m...@frmr.me> wrote:
>> >
>> > > Ugh
>> > >
>> > > * I can update
>> > >
>> > > I need more caffeine...
>> > >
>> > > On Mon, Apr 2, 2018 at 11:01 AM, Matt Farmer <m...@frmr.me> wrote:
>> > >
>> > >> I'm can update the rejected alternatives section as you describe.
>> > >>
>> > >> Also, adding a paragraph to the preCommit javadoc also seems like a
>> > >> Very Very Good Idea™ so I'll make that update to the KIP as well.
>> > >>
>> > >> On Mon, Apr 2, 2018 at 10:48 AM, Randall Hauch <rha...@gmail.com>
>> > wrote:
>> > >>
>> > >>> Thanks for the KIP proposal, Matt.
>> > >>>
>> > >>> You mention in the "Rejected Alternatives" section that you
>> considered
>> > >>> changing the signature of the `preCommit` method but rejected it
>> > because
>> > >>> it
>> > >>> would break backward compatibility. Strictly speaking, it is
>> possible
>> > to
>> > >>> do
>> > >>> this without breaking compatibility by introducing a new `preCommit`
>> > >>> method, deprecating the old one, and having the new implementation
>> call
>> > >>> the
>> > >>> old one. Such an approach would be complicated, and I'm not sure it
>> > adds
>> > >>> any value. In fact, one of the benefits of having a context object
>> is
>> > >>> that
>> > >>> we can add methods like the one you're proposing without causing any
>> > >>> compatibility issues. Anyway, it probably is worth updating this
>> > rejected
>> > >>> alternative to be a bit more precise.
>> > >>>
>> > >>> Otherwise, I think this is a good approach, though I'd request that
>> we
>> > >>> update the `preCommit` JavaDoc to add a paragraph that explains this
>> > >>> scenario. Thoughts?
>> > >>>
>> > >>> Randall
>> > >>>
>> > >>> On Wed, Mar 28, 2018 at 9:29 PM, Ted Yu <yuzhih...@gmail.com>
>> wrote:
>> > >>>
>> > >>> > I looked at WorkerSinkTask and

Re: [DISCUSS] KIP-275 - Indicate "isClosing" in the SinkTaskContext

2018-04-03 Thread Matt Farmer
Hey Randall,

Devil's advocate sparring is always a fun game so I'm down. =)

Rebalance caused by connectivity failure is the case that caused us to
notice the issue. But the issue
is really more about giving connectors the tools to facilitate rebalances
or a Kafka Connect shutdown
cleanly. Perhaps that wasn't clear in the KIP.

In our case timeouts were *not* uniformly affecting tasks. But every time a
timeout occurred in one task,
all tasks lost whatever forward progress they had made. So, yes, in the
specific case of timeouts a
backoff jitter in the connector is a solution for that particular issue.
However, this KIP *also* gives connectors
enough information to behave intelligently during any kind of rebalance or
shutdown because they're able
to discover that preCommit is being invoked for that specific reason. (As
opposed to being invoked
during normal operation.)

On Tue, Apr 3, 2018 at 12:36 PM, Randall Hauch <rha...@gmail.com> wrote:

> Matt,
>
> Let me play devil's advocate. Do we need this additional complexity? The
> motivation section talked about needing to deal with task failures due to
> connectivity problems. Generally speaking, isn't it possible that if one
> task has connectivity problems with either the broker or the external
> system that other tasks would as well? And in the particular case of S3, is
> it possible to try and prevent the task shutdown in the first place,
> perhaps by improving how the S3 connector retries? (We did this in the
> Elasticsearch connector using backoff with jitter; see
> https://github.com/confluentinc/kafka-connect-elasticsearch/pull/116 for
> details.)
>
> Best regards,
>
> Randall
>
> On Tue, Apr 3, 2018 at 8:39 AM, Matt Farmer <m...@frmr.me> wrote:
>
> > I have made the requested updates to the KIP! :)
> >
> > On Mon, Apr 2, 2018 at 11:02 AM, Matt Farmer <m...@frmr.me> wrote:
> >
> > > Ugh
> > >
> > > * I can update
> > >
> > > I need more caffeine...
> > >
> > > On Mon, Apr 2, 2018 at 11:01 AM, Matt Farmer <m...@frmr.me> wrote:
> > >
> > >> I'm can update the rejected alternatives section as you describe.
> > >>
> > >> Also, adding a paragraph to the preCommit javadoc also seems like a
> > >> Very Very Good Idea™ so I'll make that update to the KIP as well.
> > >>
> > >> On Mon, Apr 2, 2018 at 10:48 AM, Randall Hauch <rha...@gmail.com>
> > wrote:
> > >>
> > >>> Thanks for the KIP proposal, Matt.
> > >>>
> > >>> You mention in the "Rejected Alternatives" section that you
> considered
> > >>> changing the signature of the `preCommit` method but rejected it
> > because
> > >>> it
> > >>> would break backward compatibility. Strictly speaking, it is possible
> > to
> > >>> do
> > >>> this without breaking compatibility by introducing a new `preCommit`
> > >>> method, deprecating the old one, and having the new implementation
> call
> > >>> the
> > >>> old one. Such an approach would be complicated, and I'm not sure it
> > adds
> > >>> any value. In fact, one of the benefits of having a context object is
> > >>> that
> > >>> we can add methods like the one you're proposing without causing any
> > >>> compatibility issues. Anyway, it probably is worth updating this
> > rejected
> > >>> alternative to be a bit more precise.
> > >>>
> > >>> Otherwise, I think this is a good approach, though I'd request that
> we
> > >>> update the `preCommit` JavaDoc to add a paragraph that explains this
> > >>> scenario. Thoughts?
> > >>>
> > >>> Randall
> > >>>
> > >>> On Wed, Mar 28, 2018 at 9:29 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >>>
> > >>> > I looked at WorkerSinkTask and it seems using a boolean for KIP-275
> > >>> should
> > >>> > suffice for now.
> > >>> >
> > >>> > Thanks
> > >>> >
> > >>> > On Wed, Mar 28, 2018 at 7:20 PM, Matt Farmer <m...@frmr.me> wrote:
> > >>> >
> > >>> > > Hey Ted,
> > >>> > >
> > >>> > > I have not, actually!
> > >>> > >
> > >>> > > Do you think that we're likely to add multiple states here soon?
> > >>> > >
> > >>> > > My 

Re: [DISCUSS] KIP-275 - Indicate "isClosing" in the SinkTaskContext

2018-04-03 Thread Matt Farmer
I have made the requested updates to the KIP! :)

On Mon, Apr 2, 2018 at 11:02 AM, Matt Farmer <m...@frmr.me> wrote:

> Ugh
>
> * I can update
>
> I need more caffeine...
>
> On Mon, Apr 2, 2018 at 11:01 AM, Matt Farmer <m...@frmr.me> wrote:
>
>> I'm can update the rejected alternatives section as you describe.
>>
>> Also, adding a paragraph to the preCommit javadoc also seems like a
>> Very Very Good Idea™ so I'll make that update to the KIP as well.
>>
>> On Mon, Apr 2, 2018 at 10:48 AM, Randall Hauch <rha...@gmail.com> wrote:
>>
>>> Thanks for the KIP proposal, Matt.
>>>
>>> You mention in the "Rejected Alternatives" section that you considered
>>> changing the signature of the `preCommit` method but rejected it because
>>> it
>>> would break backward compatibility. Strictly speaking, it is possible to
>>> do
>>> this without breaking compatibility by introducing a new `preCommit`
>>> method, deprecating the old one, and having the new implementation call
>>> the
>>> old one. Such an approach would be complicated, and I'm not sure it adds
>>> any value. In fact, one of the benefits of having a context object is
>>> that
>>> we can add methods like the one you're proposing without causing any
>>> compatibility issues. Anyway, it probably is worth updating this rejected
>>> alternative to be a bit more precise.
>>>
>>> Otherwise, I think this is a good approach, though I'd request that we
>>> update the `preCommit` JavaDoc to add a paragraph that explains this
>>> scenario. Thoughts?
>>>
>>> Randall
>>>
>>> On Wed, Mar 28, 2018 at 9:29 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>> > I looked at WorkerSinkTask and it seems using a boolean for KIP-275
>>> should
>>> > suffice for now.
>>> >
>>> > Thanks
>>> >
>>> > On Wed, Mar 28, 2018 at 7:20 PM, Matt Farmer <m...@frmr.me> wrote:
>>> >
>>> > > Hey Ted,
>>> > >
>>> > > I have not, actually!
>>> > >
>>> > > Do you think that we're likely to add multiple states here soon?
>>> > >
>>> > > My instinct is to keep it simple until there are multiple states
>>> that we
>>> > > would want
>>> > > to consider. I really like the simplicity of just getting a boolean
>>> and
>>> > the
>>> > > implementation of WorkerSinkTask already passes around a boolean to
>>> > > indicate this is happening internally. We're really just shuttling
>>> that
>>> > > value into
>>> > > the context at the correct moments.
>>> > >
>>> > > Once we have multiple states, we could choose to provide a more
>>> > > appropriately
>>> > > named method (e.g. getState?) and reimplement isClosing by checking
>>> that
>>> > > enum
>>> > > without breaking compatibility.
>>> > >
>>> > > However, if we think multiple states here are imminent for some
>>> reason, I
>>> > > would
>>> > > be pretty easy to convince adding that would be worth the extra
>>> > complexity!
>>> > > :)
>>> > >
>>> > > Matt
>>> > >
>>> > > —
>>> > > Matt Farmer | Blog <http://farmdawgnation.com/> | Twitter
>>> > > <http://twitter.com/farmdawgnation>
>>> > > GPG: CD57 2E26 F60C 0A61 E6D8  FC72 4493 8917 D667 4D07
>>> > >
>>> > > On Wed, Mar 28, 2018 at 10:02 PM, Ted Yu <yuzhih...@gmail.com>
>>> wrote:
>>> > >
>>> > > > The enhancement gives SinkTaskContext state information.
>>> > > >
>>> > > > Have you thought of exposing the state retrieval as an enum
>>> (initially
>>> > > with
>>> > > > two values) ?
>>> > > >
>>> > > > Thanks
>>> > > >
>>> > > > On Wed, Mar 28, 2018 at 6:55 PM, Matt Farmer <m...@frmr.me> wrote:
>>> > > >
>>> > > > > Hello all,
>>> > > > >
>>> > > > > I am proposing KIP-275 to improve Connect's SinkTaskContext so
>>> that
>>> > > Sinks
>>> > > > > can be informed
>>> > > > > in their preCommit hook if the hook is being invoked as a part
>>> of a
>>> > > > > rebalance or Connect
>>> > > > > shutdown.
>>> > > > >
>>> > > > > The KIP is here:
>>> > > > > https://cwiki.apache.org/confluence/pages/viewpage.
>>> > > > action?pageId=75977607
>>> > > > >
>>> > > > > Please let me know what feedback y'all have. Thanks!
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>


Re: Seeking Feedback on Kafka Connect Issues

2018-04-02 Thread Matt Farmer
Howdy,

I'll keep an eye on that ticket. KIP-275 came out of some work we've done
internally on our
private forks of Kafka and the Confluent Cloud Storage connector.
Essentially, with that extra
API we've tweaked the S3 connector to check the value of isClosing in
preCommit and immediately
attempt to commit files to S3 regardless of whether or not we've reached a
size or time limit.
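
For illustration, here is a minimal sketch of that preCommit shape, assuming
SinkTaskContext gains the isClosing() method proposed in KIP-275; the class
name and the flushBuffersNow() helper below are hypothetical:

    import java.util.Map;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.connect.sink.SinkTask;

    public abstract class EagerFlushSinkTask extends SinkTask {
        @Override
        public Map<TopicPartition, OffsetAndMetadata> preCommit(
                Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
            if (context.isClosing()) {
                // The framework is closing these partitions (rebalance or shutdown),
                // so flush buffered records now instead of waiting for a size or time
                // threshold to be reached.
                flushBuffersNow();
                return currentOffsets;
            }
            return super.preCommit(currentOffsets);
        }

        // Hypothetical helper: upload whatever is currently buffered (e.g. to S3).
        protected abstract void flushBuffersNow();
    }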

We've been using this internally for a few days and it's been working well
for our needs. Whenever
it gets approved and merged I'll be able to open PRs against the Confluent
repos for the changes
we made pretty quickly.

We are, however, still interested in some better error handling for
Connect. I think in the interim
we're going to have to build a sidecar service that monitors the Connect
API for failed tasks
and restarts them for us. :(

Happy to provide whatever help I can toward making that sidecar service
unnecessary.
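
In case it helps, a rough sketch of what such a sidecar could look like,
polling the standard Connect REST endpoints (GET /connectors,
GET /connectors/{name}/status, POST /connectors/{name}/tasks/{id}/restart);
the hard-coded URL, polling interval, naive JSON handling, and task id below
are placeholders, not a real implementation:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class ConnectTaskBabysitter {
        private static final String CONNECT_URL = "http://localhost:8083";
        private static final HttpClient HTTP = HttpClient.newHttpClient();

        public static void main(String[] args) throws Exception {
            while (true) {
                // GET /connectors returns a JSON array of connector names; this strips
                // the brackets/quotes naively instead of using a JSON parser.
                String[] connectors =
                        get("/connectors").replaceAll("[\\[\\]\"\\s]", "").split(",");
                for (String name : connectors) {
                    if (name.isEmpty()) continue;
                    String status = get("/connectors/" + name + "/status");
                    // Naive check; a real implementation would parse the JSON and read
                    // each task's id and state individually.
                    if (status.contains("\"state\":\"FAILED\"")) {
                        post("/connectors/" + name + "/tasks/0/restart"); // task 0 only, for illustration
                    }
                }
                Thread.sleep(30_000);
            }
        }

        private static String get(String path) throws Exception {
            HttpRequest req = HttpRequest.newBuilder(URI.create(CONNECT_URL + path)).GET().build();
            return HTTP.send(req, HttpResponse.BodyHandlers.ofString()).body();
        }

        private static void post(String path) throws Exception {
            HttpRequest req = HttpRequest.newBuilder(URI.create(CONNECT_URL + path))
                    .POST(HttpRequest.BodyPublishers.noBody()).build();
            HTTP.send(req, HttpResponse.BodyHandlers.ofString());
        }
    }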

On Mon, Apr 2, 2018 at 11:18 AM, Randall Hauch <rha...@gmail.com> wrote:

> Yes, Confluent would be interested in improvements to the S3 connector.
> Feel free to create an issue/PR in
> https://github.com/confluentinc/kafka-connect-storage-cloud/.
>
> I just created https://issues.apache.org/jira/browse/KAFKA-6738 to deal
> with the bad data handling issue, and we can use that to track all of the
> comments, discussions, and work. I know that Konstantine K has already
> thought a fair amount about this, and so I've assigned it (at least
> initially) to him. This is something we'd like to get into the next AK
> release (2.0?), but would certainly appreciate any help from you or any
> other members of the community. If you're willing to help, I'd ask that you
> please coordinate with him on
> https://issues.apache.org/jira/browse/KAFKA-6738.
>
> As a side note, the KIP freeze for each release is often a good month
> before the planned release, and feature freeze usually only a week after
> that. This means that KIP that fails to be approved before this deadline
> will be pushed to the next release - all the more reason to work on the KIP
> and the implementation well before deadline.
>
> Randall
>
> On Tue, Mar 20, 2018 at 9:49 AM, Matt Farmer <m...@frmr.me> wrote:
>
> > Hi Ewen,
> >
> > Thanks for the thoughtful response. I’m happy to take some time to write
> > up a KIP and do some implementation work here.
> > I did KIP-210 previously, so I’ve been through the process before. We
> also
> > have some organizational interest for improving
> > Kafka Connect. Our concern internally is that we don’t want to wait on
> the
> > KIP cycle to fully complete before rolling out
> > something. It was many months for KIP-210 to go from draft to merge.
> >
> > It might be sufficient for us in the interim to:
> >
> > (1) Improve the S3 connector using close/open to be smarter about what
> > multipart uploads it cancels during rebalance.
> >
> > (2) Implement a sidecar service that monitors the connect API and restart
> > tasks that fail
> >
> > … and in parallel work on the KIPs required to provide a less patchwork
> > solution in the framework itself.
> >
> > Is such a contribution to the S3 connector something that Confluent would
> > be open to?
> >
> > > On Mar 19, 2018, at 10:00 PM, Ewen Cheslack-Postava <e...@confluent.io
> >
> > wrote:
> > >
> > > Responses inline.
> > >
> > > On Mon, Mar 19, 2018 at 3:02 PM, Matt Farmer <m...@frmr.me> wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> We’ve been experimenting recently with some limited use of Kafka
> Connect
> > >> and are hoping to expand to wider use cases soon. However, we had some
> > >> internal issues that gave us a well-timed preview of error handling
> > >> behavior in Kafka Connect. I think the fixes for this will require at
> > least
> > >> three different KIPs, but I want to share some thoughts to get the
> > initial
> > >> reaction from folks in the dev community. If these ideas seem
> > reasonable, I
> > >> can go ahead and create the required KIPs.
> > >>
> > >> Here are the three things specifically we ran into…
> > >>
> > >> ---
> > >>
> > >> (1) Kafka Connect only retries tasks when certain exceptions are
> thrown
> > >> Currently, Kafka Connect only retries tasks when certain exceptions
> are
> > >> thrown - I believe the logic checks to see if the exception is
> > specifically
> > >> marked as “retryable” and if not, fails. We’d like to bypass this
> > be

Re: [DISCUSS] KIP-275 - Indicate "isClosing" in the SinkTaskContext

2018-04-02 Thread Matt Farmer
Ugh

* I can update

I need more caffeine...

On Mon, Apr 2, 2018 at 11:01 AM, Matt Farmer <m...@frmr.me> wrote:

> I'm can update the rejected alternatives section as you describe.
>
> Also, adding a paragraph to the preCommit javadoc also seems like a
> Very Very Good Idea™ so I'll make that update to the KIP as well.
>
> On Mon, Apr 2, 2018 at 10:48 AM, Randall Hauch <rha...@gmail.com> wrote:
>
>> Thanks for the KIP proposal, Matt.
>>
>> You mention in the "Rejected Alternatives" section that you considered
>> changing the signature of the `preCommit` method but rejected it because
>> it
>> would break backward compatibility. Strictly speaking, it is possible to
>> do
>> this without breaking compatibility by introducing a new `preCommit`
>> method, deprecating the old one, and having the new implementation call
>> the
>> old one. Such an approach would be complicated, and I'm not sure it adds
>> any value. In fact, one of the benefits of having a context object is that
>> we can add methods like the one you're proposing without causing any
>> compatibility issues. Anyway, it probably is worth updating this rejected
>> alternative to be a bit more precise.
>>
>> Otherwise, I think this is a good approach, though I'd request that we
>> update the `preCommit` JavaDoc to add a paragraph that explains this
>> scenario. Thoughts?
>>
>> Randall
>>
>> On Wed, Mar 28, 2018 at 9:29 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> > I looked at WorkerSinkTask and it seems using a boolean for KIP-275
>> should
>> > suffice for now.
>> >
>> > Thanks
>> >
>> > On Wed, Mar 28, 2018 at 7:20 PM, Matt Farmer <m...@frmr.me> wrote:
>> >
>> > > Hey Ted,
>> > >
>> > > I have not, actually!
>> > >
>> > > Do you think that we're likely to add multiple states here soon?
>> > >
>> > > My instinct is to keep it simple until there are multiple states that
>> we
>> > > would want
>> > > to consider. I really like the simplicity of just getting a boolean
>> and
>> > the
>> > > implementation of WorkerSinkTask already passes around a boolean to
>> > > indicate this is happening internally. We're really just shuttling
>> that
>> > > value into
>> > > the context at the correct moments.
>> > >
>> > > Once we have multiple states, we could choose to provide a more
>> > > appropriately
>> > > named method (e.g. getState?) and reimplement isClosing by checking
>> that
>> > > enum
>> > > without breaking compatibility.
>> > >
>> > > However, if we think multiple states here are imminent for some
>> reason, I
>> > > would
>> > > be pretty easy to convince adding that would be worth the extra
>> > complexity!
>> > > :)
>> > >
>> > > Matt
>> > >
>> > > —
>> > > Matt Farmer | Blog <http://farmdawgnation.com/> | Twitter
>> > > <http://twitter.com/farmdawgnation>
>> > > GPG: CD57 2E26 F60C 0A61 E6D8  FC72 4493 8917 D667 4D07
>> > >
>> > > On Wed, Mar 28, 2018 at 10:02 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>> > >
>> > > > The enhancement gives SinkTaskContext state information.
>> > > >
>> > > > Have you thought of exposing the state retrieval as an enum
>> (initially
>> > > with
>> > > > two values) ?
>> > > >
>> > > > Thanks
>> > > >
>> > > > On Wed, Mar 28, 2018 at 6:55 PM, Matt Farmer <m...@frmr.me> wrote:
>> > > >
>> > > > > Hello all,
>> > > > >
>> > > > > I am proposing KIP-275 to improve Connect's SinkTaskContext so
>> that
>> > > Sinks
>> > > > > can be informed
>> > > > > in their preCommit hook if the hook is being invoked as a part of
>> a
>> > > > > rebalance or Connect
>> > > > > shutdown.
>> > > > >
>> > > > > The KIP is here:
>> > > > > https://cwiki.apache.org/confluence/pages/viewpage.
>> > > > action?pageId=75977607
>> > > > >
>> > > > > Please let me know what feedback y'all have. Thanks!
>> > > > >
>> > > >
>> > >
>> >
>>
>
>


Re: [DISCUSS] KIP-275 - Indicate "isClosing" in the SinkTaskContext

2018-04-02 Thread Matt Farmer
I'm can update the rejected alternatives section as you describe.

Also, adding a paragraph to the preCommit javadoc also seems like a
Very Very Good Idea™ so I'll make that update to the KIP as well.

On Mon, Apr 2, 2018 at 10:48 AM, Randall Hauch <rha...@gmail.com> wrote:

> Thanks for the KIP proposal, Matt.
>
> You mention in the "Rejected Alternatives" section that you considered
> changing the signature of the `preCommit` method but rejected it because it
> would break backward compatibility. Strictly speaking, it is possible to do
> this without breaking compatibility by introducing a new `preCommit`
> method, deprecating the old one, and having the new implementation call the
> old one. Such an approach would be complicated, and I'm not sure it adds
> any value. In fact, one of the benefits of having a context object is that
> we can add methods like the one you're proposing without causing any
> compatibility issues. Anyway, it probably is worth updating this rejected
> alternative to be a bit more precise.
>
> Otherwise, I think this is a good approach, though I'd request that we
> update the `preCommit` JavaDoc to add a paragraph that explains this
> scenario. Thoughts?
>
> Randall
>
> On Wed, Mar 28, 2018 at 9:29 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > I looked at WorkerSinkTask and it seems using a boolean for KIP-275
> should
> > suffice for now.
> >
> > Thanks
> >
> > On Wed, Mar 28, 2018 at 7:20 PM, Matt Farmer <m...@frmr.me> wrote:
> >
> > > Hey Ted,
> > >
> > > I have not, actually!
> > >
> > > Do you think that we're likely to add multiple states here soon?
> > >
> > > My instinct is to keep it simple until there are multiple states that
> we
> > > would want
> > > to consider. I really like the simplicity of just getting a boolean and
> > the
> > > implementation of WorkerSinkTask already passes around a boolean to
> > > indicate this is happening internally. We're really just shuttling that
> > > value into
> > > the context at the correct moments.
> > >
> > > Once we have multiple states, we could choose to provide a more
> > > appropriately
> > > named method (e.g. getState?) and reimplement isClosing by checking
> that
> > > enum
> > > without breaking compatibility.
> > >
> > > However, if we think multiple states here are imminent for some
> reason, I
> > > would
> > > be pretty easy to convince adding that would be worth the extra
> > complexity!
> > > :)
> > >
> > > Matt
> > >
> > > —
> > > Matt Farmer | Blog <http://farmdawgnation.com/> | Twitter
> > > <http://twitter.com/farmdawgnation>
> > > GPG: CD57 2E26 F60C 0A61 E6D8  FC72 4493 8917 D667 4D07
> > >
> > > On Wed, Mar 28, 2018 at 10:02 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > The enhancement gives SinkTaskContext state information.
> > > >
> > > > Have you thought of exposing the state retrieval as an enum
> (initially
> > > with
> > > > two values) ?
> > > >
> > > > Thanks
> > > >
> > > > On Wed, Mar 28, 2018 at 6:55 PM, Matt Farmer <m...@frmr.me> wrote:
> > > >
> > > > > Hello all,
> > > > >
> > > > > I am proposing KIP-275 to improve Connect's SinkTaskContext so that
> > > Sinks
> > > > > can be informed
> > > > > in their preCommit hook if the hook is being invoked as a part of a
> > > > > rebalance or Connect
> > > > > shutdown.
> > > > >
> > > > > The KIP is here:
> > > > > https://cwiki.apache.org/confluence/pages/viewpage.
> > > > action?pageId=75977607
> > > > >
> > > > > Please let me know what feedback y'all have. Thanks!
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-277 - Fine Grained ACL for CreateTopics API

2018-03-29 Thread Matt Farmer
Thanks for this KIP. I can think of some ways we would apply this.
I, too, have some reservations about the compatibility story, though I'm not
sure which way I'd prefer we go at this moment.

On Thu, Mar 29, 2018 at 4:36 PM, Ismael Juma  wrote:

> Thanks for the KIP. I think this is going in the right direction, but we
> need a better compatibility story. Also, it's worth considering whether we
> want to tackle better wildcard support at the same time.
>
> Ismael
>
> On Thu, Mar 29, 2018 at 6:51 AM, Edoardo Comar  wrote:
>
> > Hi all,
> >
> > We have submitted KIP-277 to give users permission to manage the
> lifecycle
> > of a defined set of topics;
> > the current ACL checks are for permission to create *any* topic and on
> > delete for permission against the *named* topics.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 277+-+Fine+Grained+ACL+for+CreateTopics+API
> >
> > Feedback and suggestions are welcome, thanks.
> >
> > Edo & Mickael
> > --
> >
> > Edoardo Comar
> >
> > IBM Message Hub
> >
> > IBM UK Ltd, Hursley Park, SO21 2JN
> > Unless stated otherwise above:
> > IBM United Kingdom Limited - Registered in England and Wales with number
> > 741598.
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
> 3AU
> >
>


Re: [DISCUSS] KIP-275 - Indicate "isClosing" in the SinkTaskContext

2018-03-28 Thread Matt Farmer
Hey Ted,

I have not, actually!

Do you think that we're likely to add multiple states here soon?

My instinct is to keep it simple until there are multiple states that we
would want
to consider. I really like the simplicity of just getting a boolean and the
implementation of WorkerSinkTask already passes around a boolean to
indicate this is happening internally. We're really just shuttling that
value into
the context at the correct moments.

Once we have multiple states, we could choose to provide a more
appropriately
named method (e.g. getState?) and reimplement isClosing by checking that
enum
without breaking compatibility.

However, if we think multiple states here are imminent for some reason, I
would be pretty easy to convince that adding them would be worth the extra
complexity!
:)

Matt

—
Matt Farmer | Blog <http://farmdawgnation.com/> | Twitter
<http://twitter.com/farmdawgnation>
GPG: CD57 2E26 F60C 0A61 E6D8  FC72 4493 8917 D667 4D07

On Wed, Mar 28, 2018 at 10:02 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> The enhancement gives SinkTaskContext state information.
>
> Have you thought of exposing the state retrieval as an enum (initially with
> two values) ?
>
> Thanks
>
> On Wed, Mar 28, 2018 at 6:55 PM, Matt Farmer <m...@frmr.me> wrote:
>
> > Hello all,
> >
> > I am proposing KIP-275 to improve Connect's SinkTaskContext so that Sinks
> > can be informed
> > in their preCommit hook if the hook is being invoked as a part of a
> > rebalance or Connect
> > shutdown.
> >
> > The KIP is here:
> > https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=75977607
> >
> > Please let me know what feedback y'all have. Thanks!
> >
>


[jira] [Created] (KAFKA-6725) Indicate "isClosing" in the SinkTaskContext

2018-03-28 Thread Matt Farmer (JIRA)
Matt Farmer created KAFKA-6725:
--

 Summary: Indicate "isClosing" in the SinkTaskContext
 Key: KAFKA-6725
 URL: https://issues.apache.org/jira/browse/KAFKA-6725
 Project: Kafka
  Issue Type: New Feature
Reporter: Matt Farmer


Addition of the isClosing method to SinkTaskContext per this KIP.

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75977607



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[DISCUSS] KIP-275 - Indicate "isClosing" in the SinkTaskContext

2018-03-28 Thread Matt Farmer
Hello all,

I am proposing KIP-275 to improve Connect's SinkTaskContext so that Sinks
can be informed
in their preCommit hook if the hook is being invoked as a part of a
rebalance or Connect
shutdown.

The KIP is here:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75977607

Please let me know what feedback y'all have. Thanks!


Re: Seeking Feedback on Kafka Connect Issues

2018-03-20 Thread Matt Farmer
Hi Ewen,

Thanks for the thoughtful response. I’m happy to take some time to write up a 
KIP and do some implementation work here.
I did KIP-210 previously, so I’ve been through the process before. We also have 
some organizational interest for improving
Kafka Connect. Our concern internally is that we don’t want to wait on the KIP 
cycle to fully complete before rolling out
something. It was many months for KIP-210 to go from draft to merge.

It might be sufficient for us in the interim to:

(1) Improve the S3 connector using close/open to be smarter about what 
multipart uploads it cancels during rebalance.

(2) Implement a sidecar service that monitors the connect API and restart tasks 
that fail

… and in parallel work on the KIPs required to provide a less patchwork 
solution in the framework itself.

Is such a contribution to the S3 connector something that Confluent would be 
open to?

> On Mar 19, 2018, at 10:00 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> 
> Responses inline.
> 
> On Mon, Mar 19, 2018 at 3:02 PM, Matt Farmer <m...@frmr.me> wrote:
> 
>> Hi everyone,
>> 
>> We’ve been experimenting recently with some limited use of Kafka Connect
>> and are hoping to expand to wider use cases soon. However, we had some
>> internal issues that gave us a well-timed preview of error handling
>> behavior in Kafka Connect. I think the fixes for this will require at least
>> three different KIPs, but I want to share some thoughts to get the initial
>> reaction from folks in the dev community. If these ideas seem reasonable, I
>> can go ahead and create the required KIPs.
>> 
>> Here are the three things specifically we ran into…
>> 
>> ---
>> 
>> (1) Kafka Connect only retries tasks when certain exceptions are thrown
>> Currently, Kafka Connect only retries tasks when certain exceptions are
>> thrown - I believe the logic checks to see if the exception is specifically
>> marked as “retryable” and if not, fails. We’d like to bypass this behavior
>> and implement a configurable exponential backoff for tasks regardless of
>> the failure reason. This is probably two changes: one to implement
>> exponential backoff retries for tasks if they don’t already exist and a
>> chance to implement a RetryPolicy interface that evaluates the Exception to
>> determine whether or not to retry.
>> 
> 
> This has definitely come up before. The likely "fix" for this is to provide
> general "bad data handling" options within the framework itself. The
> obvious set would be
> 
> 1. fail fast, which is what we do today (assuming connector actually fails
> and doesn't eat errors)
> 2. retry (possibly with configs to limit)
> 3. drop data and move on
> 4. dead letter queue
> 
> This needs to be addressed in a way that handles errors from:
> 
> 1. The connector itself (e.g. connectivity issues to the other system)
> 2. Converters/serializers (bad data, unexpected format, etc)
> 3. SMTs
> 4. Ideally the fmwk as well (though I don't think we have any known bugs
> where this would be a problem, and we'd be inclined to just fix them
> anyway).
> 
> I think we understand the space of problems and how to address them pretty
> well already, this issue is really just a matter of someone finding the
> time to KIP, implement, and review/implement. (And that review/commit one
> realistically means we need multiple people's time). Happy to guide anyone
> interested on next steps. If not addressed by general community, Confluent
> will get to this at some point, but I couldn't say when that would be --
> Randall might know better than I would.
> 
> 
>> (2) Kafka Connect doesn’t permit Connectors to smartly reposition after
>> rebalance
>> We’re using the S3 connector to dump files with a large number of records
>> into an S3 bucket. About 100,000 records per file. Unfortunately, every
>> time a task fails, the consumer rebalance causes all partitions to get
>> re-shuffled amongst the various partitions. To compensate for this, the
>> connector gets stopped and started from what I can tell from the logs? And
>> then picks up from the last consumer position that was committed to the
>> brokers.
>> 
>> This doesn’t work great if you’re batching things into large numbers for
>> archival.
>> 
>> For the S3 connector, for example: Let’s say I have two partitions and the
>> connector has two tasks to process each of those. Task 0 is at 5,000
>> records read from the last commit and Task 1 is at 70,000 records read from
>> the last commit. Then, boom, something goes wrong with Task 0 and it falls
>> over. This tri

Seeking Feedback on Kafka Connect Issues

2018-03-19 Thread Matt Farmer
Hi everyone,

We’ve been experimenting recently with some limited use of Kafka Connect and 
are hoping to expand to wider use cases soon. However, we had some internal 
issues that gave us a well-timed preview of error handling behavior in Kafka 
Connect. I think the fixes for this will require at least three different KIPs, 
but I want to share some thoughts to get the initial reaction from folks in the 
dev community. If these ideas seem reasonable, I can go ahead and create the 
required KIPs.

Here are the three things specifically we ran into…

---

(1) Kafka Connect only retries tasks when certain exceptions are thrown
Currently, Kafka Connect only retries tasks when certain exceptions are thrown 
- I believe the logic checks to see if the exception is specifically marked as 
“retryable” and if not, fails. We’d like to bypass this behavior and implement 
a configurable exponential backoff for tasks regardless of the failure reason. 
This is probably two changes: one to implement exponential backoff retries for 
tasks if they don’t already exist and a change to implement a RetryPolicy 
interface that evaluates the Exception to determine whether or not to retry.
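
To make the idea concrete, here is a sketch of what a configurable backoff
plus such a hook might look like; RetryPolicy is the interface proposed above,
not an existing Connect API, and the parameters and names are illustrative:

    import java.util.concurrent.Callable;
    import java.util.concurrent.ThreadLocalRandom;

    public class Retries {

        // Proposed hook: decide, per exception and attempt count, whether to keep retrying.
        public interface RetryPolicy {
            boolean shouldRetry(Exception exception, int attempt);
        }

        // Run `work`, retrying with capped exponential backoff and full jitter until the
        // policy says to stop, at which point the last exception is rethrown.
        public static <T> T withExponentialBackoff(Callable<T> work, RetryPolicy policy,
                                                   long baseMs, long maxMs) throws Exception {
            int attempt = 0;
            while (true) {
                try {
                    return work.call();
                } catch (Exception e) {
                    attempt++;
                    if (!policy.shouldRetry(e, attempt)) {
                        throw e;
                    }
                    long cap = Math.min(maxMs, baseMs * (1L << Math.min(attempt, 20)));
                    Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1));
                }
            }
        }
    }

A worker could then wrap a task iteration in something like
withExponentialBackoff(() -> runTaskIteration(), (e, n) -> n < 10, 100, 60_000),
where runTaskIteration() is a hypothetical stand-in for the task's work loop.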

(2) Kafka Connect doesn’t permit Connectors to smartly reposition after 
rebalance
We’re using the S3 connector to dump files with a large number of records into 
an S3 bucket. About 100,000 records per file. Unfortunately, every time a task 
fails, the consumer rebalance causes all partitions to get re-shuffled amongst 
the various tasks. To compensate for this, the connector gets stopped and 
started (from what I can tell from the logs) and then picks up from the last 
consumer position that was committed to the brokers.

This doesn’t work great if you’re batching things into large numbers for 
archival.

For the S3 connector, for example: Let’s say I have two partitions and the 
connector has two tasks to process each of those. Task 0 is at 5,000 records 
read from the last commit and Task 1 is at 70,000 records read from the last 
commit. Then, boom, something goes wrong with Task 0 and it falls over. This 
triggers a rebalance and Task 1 has to take over the workload. Task 1 will, at 
this point, discard the 70,000 records in its buffer and start from the last 
commit point. This failure mode is brutal for the archival system we’re 
building.

There are two solutions that I can think of to this:

(A) Provide an interface for connectors to define their own rebalance listener. 
This listener could compare the newly assigned list of partitions with a 
previously assigned list. For all partitions that this connector was already 
working on prior to the rebalance, it could manually seek to the last position 
it locally processed before resuming. So, in the scenario above Task 1 could 
keep an accounting file locally and seek over the first 70,000 records without 
reprocessing them. It would then wait until after it confirms the S3 upload to 
commit those offsets back to Kafka. This ensures that if the machine running 
Task 1 dies a new consumer can take its place, but we’ll still benefit from a 
local cache if one is present.
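
A rough sketch of what such a listener might do, expressed here with the
existing SinkTask.open(...) hook and SinkTaskContext.offset(...); the local
accounting map is the hypothetical piece (it would be backed by the local
file described above):

    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.connect.sink.SinkTask;

    public abstract class LocallyCheckpointedSinkTask extends SinkTask {
        // Hypothetical local accounting: last offset this task fully processed per partition.
        private final Map<TopicPartition, Long> lastLocallyProcessed = new HashMap<>();

        @Override
        public void open(Collection<TopicPartition> partitions) {
            Map<TopicPartition, Long> seekTo = new HashMap<>();
            for (TopicPartition tp : partitions) {
                Long local = lastLocallyProcessed.get(tp);
                if (local != null) {
                    // We already handled records up to `local` before the rebalance, so resume
                    // from the next offset rather than the last offset committed to Kafka.
                    seekTo.put(tp, local + 1);
                }
            }
            if (!seekTo.isEmpty()) {
                context.offset(seekTo);
            }
        }
    }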

(B) Have connect manually round robin partitions on a topic to tasks and never 
rebalance them automatically. If this were combined with better task retry 
semantics, I think this solution would be simpler.

(3) As far as I can tell, JMX metrics aren’t reporting the number of active 
tasks
This one is arguably the simplest issue to resolve, but we’d like to alert if 
the number of active tasks isn’t what we expect it to be so that we can have a 
human investigate.

---

I would love thoughts on all of the above from anyone on this list.

Thanks,

Matt Farmer

Re: [DISCUSS] KIP-242: Mask password fields in Kafka Connect REST response

2018-03-19 Thread Matt Farmer
What’s the status of this? This is a pretty hard blocker for us to meet 
requirements internally to deploy connect in a distributed fashion.

@Ewen - Regarding the concern of accessing information securely - has there 
been any consideration of adding authentication to the connect api?

> On Jan 17, 2018, at 3:55 PM, Randall Hauch  wrote:
> 
> Vincent,
> 
> Can the KIP more explicitly say that this is opt-in, and that by default
> nothing will change?
> 
> Randall
> 
> On Tue, Jan 16, 2018 at 11:18 PM, Ewen Cheslack-Postava 
> wrote:
> 
>> Vincent,
>> 
>> I think with the addition of a configuration to control this for
>> compatibility, people would generally be ok with it. If you want to start a
>> VOTE thread, the KIP deadline is coming up and the PR looks pretty small. I
>> will take a pass at reviewing the PR so we'll be ready to merge if we can
>> get the KIP voted through.
>> 
>> Thanks,
>> Ewen
>> 
>> On Fri, Jan 12, 2018 at 10:18 AM, Vincent Meng  wrote:
>> 
>>> @Ted: The issue is kinda hard to reproduce. It's just something we
>> observe
>>> over time.
>>> 
>>> @Ewen: I agree. Opt-in seems to be a good solution to me. To your
>> question,
>>> if there is no ConfDef that defines which fields are Passwords we can
>> just
>>> return the config as is.
>>> 
>>> There is a PR for this KIP already. Comments/Discussions are welcome.
>>> https://github.com/apache/kafka/pull/4269
>>> 
>>> On Tue, Jan 2, 2018 at 8:52 PM, Ewen Cheslack-Postava >> 
>>> wrote:
>>> 
 Vincent,
 
 Thanks for the KIP. This is definitely an issue we know is a problem
>> for
 some users.
 
 I think the major problem with the KIP as-is is that it makes it
>>> impossible
 to get the original value back out of the API. This KIP probably ties
>> in
 significantly with ideas for securing the REST API (SSL) and adding
>> ACLs
>>> to
 it. Both are things we know people want, but haven't happened yet.
>>> However,
 it also interacts with other approaches to adding those features, e.g.
 layering proxies on top of the existing API (e.g. nginx, apache, etc).
>>> Just
 doing a blanket replacement of password values with a constant would
>>> likely
 break things for people who secure things via a proxy (and may just not
 allow reads of configs unless the user is authorized for the particular
 connector). These are the types of concerns we like to think through in
>>> the
 compatibility section. One option to get the masking functionality in
 without depending on a bunch of other security improvements might be to
 make this configurable so users that need this (and can forgo seeing a
 valid config via the API) can opt-in.
 
 Regarding your individual points:
 
 * I don't think the particular value for the masked content matters
>> much.
 Any constant indicating a password field is good. Your value seems fine
>>> to
 me.
 * I don't think ConnectorInfo has enough info on its own to do proper
 masking. In fact, I think you need to parse the config enough to get
>> the
 Connector-specific ConfigDef out in order to determine which fields are
 Password fields. I would probably try to push this to be as central as
 possible, maybe adding a method to AbstractHerder that can get configs
>>> with
 a boolean indicating whether they need to have sensitive fields
>> removed.
 That method could deal with parsing the config to get the right
>>> connector,
 getting the connector config, and then sanitizing any configs that are
 sensitive. We could have this in one location, then have the relevant
>>> REST
 APIs just use the right flag to determine if they get sanitized or
 unsanitized data.
 
 That second point raises another interesting point -- what happens if
>> the
 connector configuration references a connector which the worker serving
>>> the
 REST request *does not know about*? In that case, there will be no
 corresponding ConfigDef that defines which fields are Passwords and
>> need
>>> to
 be sensitized. Does it return an error? Or just return the config as
>> is?
 
 -Ewen
 
 On Thu, Dec 28, 2017 at 3:34 AM, Ted Yu  wrote:
 
> For the last point you raised, can you come up with a unit test that
 shows
> what you observed ?
> 
> Cheers
> 
> On Mon, Dec 18, 2017 at 11:14 AM, Vincent Meng 
>> wrote:
> 
>> Hi all,
>> 
>> I've created KIP-242, a proposal to secure credentials in kafka
>>> connect
>> rest endpoint.
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 242%3A+Mask+password+in+Kafka+Connect+Rest+API+response
>> 
>> Here are something I'd like to discuss:
>> 
>>   - The "masked" value is set to "*" (9 stars) currently.
>>> It's
> an
>>   arbitrary 

Re: [VOTE] KIP-212: Enforce set of legal characters for connector names

2018-01-23 Thread Matt Farmer
+1 from me (non-binding) =)

> On Jan 22, 2018, at 7:35 PM, Sönke Liebau 
>  wrote:
> 
> All,
> 
> this KIP has been discussed for quite some time now and I believe we
> addressed all major concerns in the current revision, so I'd like to
> start a vote.
> 
> KIP can be found here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-212%3A+Enforce+set+of+legal+characters+for+connector+names
> 
> Let me know what you think.
> 
> Kind regards,
> Sönke



Re: Kafka compacted topic question.

2018-01-19 Thread Matt Farmer
Yeah, and I thought I answered your question? I think the compaction happens 
when new segments are created.

Sorry if I’m still misunderstanding.

> On Jan 19, 2018, at 3:55 PM, Rahul Bhattacharjee <rahul.rec@gmail.com> 
> wrote:
> 
> Thanks Matt for the response .I was asking about the log compaction
> <https://kafka.apache.org/documentation/#compaction> of kafka topics.
> 
> On Fri, Jan 19, 2018 at 12:36 PM, Matt Farmer <m...@frmr.me> wrote:
> 
>> Someone will need to correct me if I’m wrong, but my understanding is that
>> a topic log on disk is divided into segments. Compaction will occur when a
>> segment “rolls off” - so when a new active segment is created and the
>> previous segment becomes inactive.
>> 
>> Segments can be bounded by size and time in topic and broker configuration
>> to get the effect that you want.
>> 
>>> On Jan 19, 2018, at 2:10 PM, Rahul Bhattacharjee <
>> rahul.rec@gmail.com> wrote:
>>> 
>>> Let's say we have a compacted topic (log.cleanup.policy=compact) where
>> lot
>>> of updates happen for relatively small set of keys.
>>> My question is when does the compaction happen.
>>> 
>>> In memtable , when a new update comes for an already existing key in
>>> memtable , the value is simple replaced.
>>> or,
>>> all the updates are associated with a offset , later the memtable is
>>> spilled to disk and the deletion happens during compaction phase.
>>> 
>>> thanks,
>>> Rahul
>> 
>> 



Re: Kafka compacted topic question.

2018-01-19 Thread Matt Farmer
Someone will need to correct me if I’m wrong, but my understanding is that a 
topic log on disk is divided into segments. Compaction will occur when a 
segment “rolls off” - so when a new active segment is created and the previous 
segment becomes inactive.

Segments can be bounded by size and time in topic and broker configuration to 
get the effect that you want.
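
As a concrete sketch of those settings, something like the following creates a
compacted topic whose active segment rolls on a time or size bound; the topic
name, partition/replication counts, and thresholds are placeholders:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.config.TopicConfig;

    public class CreateCompactedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                Map<String, String> configs = new HashMap<>();
                configs.put(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT);
                configs.put(TopicConfig.SEGMENT_MS_CONFIG, "600000");      // roll a new segment at least every 10 minutes
                configs.put(TopicConfig.SEGMENT_BYTES_CONFIG, "67108864"); // ...or once the active segment reaches 64 MB
                NewTopic topic = new NewTopic("compacted-example", 3, (short) 3).configs(configs);
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }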

> On Jan 19, 2018, at 2:10 PM, Rahul Bhattacharjee  
> wrote:
> 
> Let's say we have a compacted topic (log.cleanup.policy=compact) where lot
> of updates happen for relatively small set of keys.
> My question is when does the compaction happen.
> 
> In memtable , when a new update comes for an already existing key in
> memtable , the value is simple replaced.
> or,
> all the updates are associated with a offset , later the memtable is
> spilled to disk and the deletion happens during compaction phase.
> 
> thanks,
> Rahul



Re: How to always consume from latest offset in kafka-streams

2018-01-19 Thread Matt Farmer
That config setting will only take effect if there are no offsets already 
stored for the group in the consumer offsets topic.

Something I’ve done in the past is to make the application.id config setting 
have a random string component to it. So have “my-app-name-[randomchars]” or 
some such. This ensures that there are never pre-existing offsets. Would that 
be an acceptable method for you?
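
Roughly, the configuration fragment would look like this (the app name and
bootstrap servers are placeholders, and note that a fresh application.id also
means a fresh consumer group and fresh internal topic names on every start):

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG,
            "my-app-name-" + UUID.randomUUID().toString().substring(0, 8));
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");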

Matt

> On Jan 19, 2018, at 12:22 PM, Saloni Vithalani  
> wrote:
> 
> Our requirement is such that if a kafka-stream app is consuming a
> partition, it should start it's consumption from latest offset of that
> partition.
> 
> This seems like do-able using
> 
> streamsConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest")
> 
> Now, let's say using above configuration, the kafka-stream app started
> consuming data from latest offset for a partition. And after some time, the
> app crashes. When the app comes back live, we want it to consume data from
> the latest offset of that partition, instead of the where it left last
> reading.
> 
> But I can't find anything that can help achieve it using kafka-streams api.
> 
> P.S. We are using kafka-1.0.0.
> 
> Saloni Vithalani
> Developer
> Email salo...@thoughtworks.com
> Telephone +91 8552889571 <8552889571>
> [image: ThoughtWorks]
> 



Re: [VOTE] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-12-12 Thread Matt Farmer
Thank you, Gwen! =)

I think that puts us over the finish line. Unless I hear any objections in
the next 12(ish) hours I’ll move the KIP to accepted on the wiki.



On December 12, 2017 at 7:50:23 PM, Gwen Shapira (g...@confluent.io) wrote:

+1 (binding) - looks awesome.

On Tue, Dec 12, 2017 at 10:42 AM Matt Farmer <m...@frmr.me> wrote:

> Current tally here is 2 binding +1s, 4 non-binding +1s.
>
> The remaining remarks on the PR seem to mostly be nits, so I feel like
> we’ve converged a bit. If a committer could take a look and either leave
me
> some feedback on the discussion thread or give me a +1, I’d really
> appreciate it. :)
>
> Thanks!
>
>
> On December 6, 2017 at 2:07:08 PM, Matthias J. Sax (matth...@confluent.io)

> wrote:
>
> +1
>
>
>
> On 12/6/17 7:54 AM, Bill Bejeck wrote:
> > +1
> >
> > On Wed, Dec 6, 2017 at 9:54 AM, Matt Farmer <m...@frmr.me> wrote:
> >
> >> Bumping this thread so it’s visible given that the conversation on
> KIP-210
> >> has converged again.
> >>
> >> Current tally is 2 binding +1s, and 2 non-binding +1s.
> >>
> >> On November 8, 2017 at 12:26:32 PM, Damian Guy (damian@gmail.com)
> >> wrote:
> >>
> >> +1 (binding)
> >>
> >> On Sat, 4 Nov 2017 at 16:50 Matthias J. Sax <matth...@confluent.io>
> wrote:
> >>
> >>> Yes. A KIP needs 3 binding "+1" to be accepted.
> >>>
> >>> You can still work on the PR and get it ready to get merged -- I am
> >>> quite confident that this KIP will be accepted :)
> >>>
> >>>
> >>> -Matthias
> >>>
> >>> On 11/4/17 3:56 PM, Matt Farmer wrote:
> >>>> Bump! I believe I need two more binding +1's to proceed?
> >>>>
> >>>> On Thu, Nov 2, 2017 at 11:49 AM Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>
> >>>>> +1
> >>>>>
> >>>>> On Wed, Nov 1, 2017 at 4:50 PM, Guozhang Wang <wangg...@gmail.com>
> >>> wrote:
> >>>>>
> >>>>>> +1 (binding) from me. Thanks!
> >>>>>>
> >>>>>> On Wed, Nov 1, 2017 at 4:50 PM, Guozhang Wang <wangg...@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>>> The vote should stay open for at least 72 hours. The bylaws can
be
> >>>>> found
> >>>>>>> here https://cwiki.apache.org/confluence/display/KAFKA/Bylaws
> >>>>>>>
> >>>>>>> On Wed, Nov 1, 2017 at 8:09 AM, Matt Farmer <m...@frmr.me> wrote:
> >>>>>>>
> >>>>>>>> Hello all,
> >>>>>>>>
> >>>>>>>> It seems like discussion around KIP-210 has gone to a lull. I've
> >> got
> >>>>>> some
> >>>>>>>> candidate work underway for it already, so I'd like to go ahead
> and
> >>>>> call
> >>>>>>>> it
> >>>>>>>> to a vote.
> >>>>>>>>
> >>>>>>>> For reference, the KIP can be found here:
> >>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-210+-+
> >>>>>>>>
> >>>
Provide+for+custom+error+handling++when+Kafka+Streams+fails+to+produce
> >>>>>>>>
> >>>>>>>> Also, how long to vote threads stay open generally before
changing
> >>> the
> >>>>>>>> status of the KIP?
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Matt
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> -- Guozhang
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> -- Guozhang
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
>


Re: [VOTE] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-12-12 Thread Matt Farmer
Current tally here is 2 binding +1s, 4 non-binding +1s.

The remaining remarks on the PR seem to mostly be nits, so I feel like
we’ve converged a bit. If a committer could take a look and either leave me
some feedback on the discussion thread or give me a +1, I’d really
appreciate it. :)

Thanks!


On December 6, 2017 at 2:07:08 PM, Matthias J. Sax (matth...@confluent.io)
wrote:

+1



On 12/6/17 7:54 AM, Bill Bejeck wrote:
> +1
>
> On Wed, Dec 6, 2017 at 9:54 AM, Matt Farmer <m...@frmr.me> wrote:
>
>> Bumping this thread so it’s visible given that the conversation on
KIP-210
>> has converged again.
>>
>> Current tally is 2 binding +1s, and 2 non-binding +1s.
>>
>> On November 8, 2017 at 12:26:32 PM, Damian Guy (damian@gmail.com)
>> wrote:
>>
>> +1 (binding)
>>
>> On Sat, 4 Nov 2017 at 16:50 Matthias J. Sax <matth...@confluent.io>
wrote:
>>
>>> Yes. A KIP needs 3 binding "+1" to be accepted.
>>>
>>> You can still work on the PR and get it ready to get merged -- I am
>>> quite confident that this KIP will be accepted :)
>>>
>>>
>>> -Matthias
>>>
>>> On 11/4/17 3:56 PM, Matt Farmer wrote:
>>>> Bump! I believe I need two more binding +1's to proceed?
>>>>
>>>> On Thu, Nov 2, 2017 at 11:49 AM Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> On Wed, Nov 1, 2017 at 4:50 PM, Guozhang Wang <wangg...@gmail.com>
>>> wrote:
>>>>>
>>>>>> +1 (binding) from me. Thanks!
>>>>>>
>>>>>> On Wed, Nov 1, 2017 at 4:50 PM, Guozhang Wang <wangg...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>> The vote should stay open for at least 72 hours. The bylaws can be
>>>>> found
>>>>>>> here https://cwiki.apache.org/confluence/display/KAFKA/Bylaws
>>>>>>>
>>>>>>> On Wed, Nov 1, 2017 at 8:09 AM, Matt Farmer <m...@frmr.me> wrote:
>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> It seems like discussion around KIP-210 has gone to a lull. I've
>> got
>>>>>> some
>>>>>>>> candidate work underway for it already, so I'd like to go ahead
and
>>>>> call
>>>>>>>> it
>>>>>>>> to a vote.
>>>>>>>>
>>>>>>>> For reference, the KIP can be found here:
>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-210+-+
>>>>>>>>
>>> Provide+for+custom+error+handling++when+Kafka+Streams+fails+to+produce
>>>>>>>>
>>>>>>>> Also, how long to vote threads stay open generally before changing
>>> the
>>>>>>>> status of the KIP?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Matt
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -- Guozhang
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> -- Guozhang
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>


Re: [VOTE] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-12-06 Thread Matt Farmer
Bumping this thread so it’s visible given that the conversation on KIP-210
has converged again.

Current tally is 2 binding +1s, and 2 non-binding +1s.

On November 8, 2017 at 12:26:32 PM, Damian Guy (damian@gmail.com) wrote:

+1 (binding)

On Sat, 4 Nov 2017 at 16:50 Matthias J. Sax <matth...@confluent.io> wrote:

> Yes. A KIP needs 3 binding "+1" to be accepted.
>
> You can still work on the PR and get it ready to get merged -- I am
> quite confident that this KIP will be accepted :)
>
>
> -Matthias
>
> On 11/4/17 3:56 PM, Matt Farmer wrote:
> > Bump! I believe I need two more binding +1's to proceed?
> >
> > On Thu, Nov 2, 2017 at 11:49 AM Ted Yu <yuzhih...@gmail.com> wrote:
> >
> >> +1
> >>
> >> On Wed, Nov 1, 2017 at 4:50 PM, Guozhang Wang <wangg...@gmail.com>
> wrote:
> >>
> >>> +1 (binding) from me. Thanks!
> >>>
> >>> On Wed, Nov 1, 2017 at 4:50 PM, Guozhang Wang <wangg...@gmail.com>
> >> wrote:
> >>>
> >>>> The vote should stay open for at least 72 hours. The bylaws can be
> >> found
> >>>> here https://cwiki.apache.org/confluence/display/KAFKA/Bylaws
> >>>>
> >>>> On Wed, Nov 1, 2017 at 8:09 AM, Matt Farmer <m...@frmr.me> wrote:
> >>>>
> >>>>> Hello all,
> >>>>>
> >>>>> It seems like discussion around KIP-210 has gone to a lull. I've
got
> >>> some
> >>>>> candidate work underway for it already, so I'd like to go ahead and
> >> call
> >>>>> it
> >>>>> to a vote.
> >>>>>
> >>>>> For reference, the KIP can be found here:
> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-210+-+
> >>>>>
> Provide+for+custom+error+handling++when+Kafka+Streams+fails+to+produce
> >>>>>
> >>>>> Also, how long to vote threads stay open generally before changing
> the
> >>>>> status of the KIP?
> >>>>>
> >>>>> Cheers,
> >>>>> Matt
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> -- Guozhang
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> -- Guozhang
> >>>
> >>
> >
>
>


Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-12-06 Thread Matt Farmer
There is already a vote thread for this KIP. I can bump it so that it’s
towards the top of your inbox.

With regard to your concerns:

1) We do not have the "ProductionExceptionHandler" interface defined in the
wiki page, thought it is sort of clear that it is a one-function interface
with record and exception. Could you add it?


It is defined, it’s just not defined using a code snippet. The KIP reads as
follows:

===

A public interface named ProductionExceptionHandler with a single method,
handle, that has the following signature:

   - ProductionExceptionHandlerResponse handle(ProducerRecord<byte[],
   byte[]> record, Exception exception)


===

If you’d like me to add a code snippet illustrating this that’s simple for
me to do, but it seemed superfluous.

2) A quick question about your example code: where would be the "logger"
object be created?


SLF4J loggers are typically created as a member of the class, such as:

private Logger logger = LoggerFactory.getLogger(HelloWorld.class);

I omit that in my implementation examples for brevity.
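
For completeness, a sketch of an implementation against the interface as
described in this KIP; the class name and log message are illustrative only,
and the final merged interface may differ slightly (for example, it may also
require a configure(...) method):

    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class LogAndContinueProductionExceptionHandler implements ProductionExceptionHandler {
        private final Logger logger =
                LoggerFactory.getLogger(LogAndContinueProductionExceptionHandler.class);

        @Override
        public ProductionExceptionHandlerResponse handle(ProducerRecord<byte[], byte[]> record,
                                                         Exception exception) {
            // Log enough context to debug later, then tell Streams to keep processing.
            logger.warn("Failed to produce record to topic {}: {}",
                    record.topic(), exception.getMessage());
            return ProductionExceptionHandlerResponse.CONTINUE;
        }
    }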

On December 6, 2017 at 2:14:58 AM, Guozhang Wang (wangg...@gmail.com) wrote:

Hello Matt,

Thanks for writing up the KIP. I made a pass over it and here is a few
minor comments. I think you can consider starting a voting thread for this
KIP while addressing them.

1) We do not have the "ProductionExceptionHandler" interface defined in the
wiki page, thought it is sort of clear that it is a one-function interface
with record and exception. Could you add it?

2) A quick question about your example code: where would be the "logger"
object be created? Note that the implementation of this interface have to
give a non-param constructor, or as a static field of the class but in that
case you would not be able to log which instance is throwing this error (we
may have multiple producers within a single instance, even within a
thread). Just a reminder to consider in your implementation.


Guozhang

On Tue, Dec 5, 2017 at 3:15 PM, Matthias J. Sax <matth...@confluent.io>
wrote:

> Thanks a lot for the update! Great write-up! Very clearly explained what
> the change will look like!
>
> Looks good to me. No further comments from my side.
>
>
> -Matthias
>
>
> On 12/5/17 9:14 AM, Matt Farmer wrote:
> > I have updated this KIP accordingly.
> >
> > Can you please take a look and let me know if what I wrote looks
correct
> to
> > you?
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 210+-+Provide+for+custom+error+handling++when+Kafka+
> Streams+fails+to+produce
> >
> > Thanks!
> >
> > Matt
> >
> >
> > On December 4, 2017 at 9:39:13 PM, Matt Farmer (m...@frmr.me) wrote:
> >
> > Hey Matthias, thanks for getting back to me.
> >
> > That's fine. But if we add it to `test` package, we don't need to talk
> > about it in the KIP. `test` is not public API.
> >
> > Yes, that makes sense. It was in the KIP originally because I was, at
one
> > point, planning on including it. We can remove it now that we’ve
decided
> we
> > won’t include it in the public API.
> >
> > Understood. That makes sense. We should explain this clearly in the KIP
> > and maybe log all other following exceptions at DEBUG level?
> >
> >
> > I thought it was clear in the KIP, but I can go back and double check
my
> > wording and revise it to try and make it clearer.
> >
> > I’ll take a look at doing more work on the KIP and the Pull Request
> > tomorrow.
> >
> > Thanks again!
> >
> > On December 4, 2017 at 5:50:33 PM, Matthias J. Sax (
> matth...@confluent.io)
> > wrote:
> >
> > Hey,
> >
> > About your questions:
> >
> >>>> Acknowledged, so is ProducerFencedException the only kind of
> exception I
> >>>> need to change my behavior on? Or are there other types I need to
> > check? Is
> >>>> there a comprehensive list somewhere?
> >
> > I cannot think if any other atm. We should list all fatal exceptions
for
> > which we don't call the handler and explain why (exception is "global"
> > and will affect all other records, too | ProducerFenced is
self-healing).
> >
> > We started to collect and categorize exception here (not completed
yet):
> > https://cwiki.apache.org/confluence/display/KAFKA/
> Kafka+Streams+Architecture#KafkaStreamsArchitecture-TypesofExceptions
> > :
> >
> > This list should be a good starting point though.
> >
> >> I include it in the test package because I have tests that assert that
> if
> >> the record collector impl encounters an Exception and receives a
> CONTINUE
> >> t

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-12-05 Thread Matt Farmer
I have updated this KIP accordingly.

Can you please take a look and let me know if what I wrote looks correct to
you?

https://cwiki.apache.org/confluence/display/KAFKA/KIP-210+-+Provide+for+custom+error+handling++when+Kafka+Streams+fails+to+produce

Thanks!

Matt


On December 4, 2017 at 9:39:13 PM, Matt Farmer (m...@frmr.me) wrote:

Hey Matthias, thanks for getting back to me.

That's fine. But if we add it to `test` package, we don't need to talk
about it in the KIP. `test` is not public API.

Yes, that makes sense. It was in the KIP originally because I was, at one
point, planning on including it. We can remove it now that we’ve decided we
won’t include it in the public API.

Understood. That makes sense. We should explain this clearly in the KIP
and maybe log all other following exceptions at DEBUG level?


I thought it was clear in the KIP, but I can go back and double check my
wording and revise it to try and make it clearer.

I’ll take a look at doing more work on the KIP and the Pull Request
tomorrow.

Thanks again!

On December 4, 2017 at 5:50:33 PM, Matthias J. Sax (matth...@confluent.io)
wrote:

Hey,

About your questions:

>>> Acknowledged, so is ProducerFencedException the only kind of exception I
>>> need to change my behavior on? Or are there other types I need to
check? Is
>>> there a comprehensive list somewhere?

I cannot think of any other atm. We should list all fatal exceptions for
which we don't call the handler and explain why (exception is "global"
and will affect all other records, too | ProducerFenced is self-healing).

We started to collect and categorize exceptions here (not completed yet):
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Architecture#KafkaStreamsArchitecture-TypesofExceptions
:

This list should be a good starting point though.

> I include it in the test package because I have tests that assert that if
> the record collector impl encounters an Exception and receives a CONTINUE
> that it actually does CONTINUE.

That's fine. But if we add it to `test` package, we don't need to talk
about it in the KIP. `test` is not public API.

> I didn't want to invoke the handler in places where the CONTINUE or FAIL
> result would just be ignored. Presumably, after a FAIL has been returned,
> subsequent exceptions are likely to be repeats or noise from my
> understanding of the code paths. If this is incorrect we can revisit.

Understood. That makes sense. We should explain this clearly in the KIP
and maybe log all other following exceptions at DEBUG level?


-Matthias


On 12/1/17 11:43 AM, Matt Farmer wrote:
> Bump! It's been three days here and I haven't seen any further feedback.
> Eager to get this resolved, approved, and merged. =)
>
> On Tue, Nov 28, 2017 at 9:53 AM Matt Farmer <m...@frmr.me> wrote:
>
>> Hi there, sorry for the delay in responding. Last week had a holiday and
>> several busy work days in it so I'm just now getting around to
responding.
>>
>>> We would only exclude
>>> exception Streams can handle itself (like ProducerFencedException) --
>>> thus, if the handler has code to react to this, it would not be bad, as
>>> this code is just never called.
>> [...]
>>> Thus, I am still in favor of not calling the ProductionExceptionHandler
>>> for fatal exception.
>>
>> Acknowledged, so is ProducerFencedException the only kind of exception I
>> need to change my behavior on? Or are there other types I need to check?
Is
>> there a comprehensive list somewhere?
>>
>>> About the "always continue" case. Sounds good to me to remove it (not
>>> sure why we need it in test package?)
>>
>> I include it in the test package because I have tests that assert that if
>> the record collector impl encounters an Exception and receives a CONTINUE
>> that it actually does CONTINUE.
>>
>>> What is there reasoning for invoking the handler only for the first
>>> exception?
>>
>> I didn't want to invoke the handler in places where the CONTINUE or FAIL
>> result would just be ignored. Presumably, after a FAIL has been returned,
>> subsequent exceptions are likely to be repeats or noise from my
>> understanding of the code paths. If this is incorrect we can revisit.
>>
>> Once I get the answers to these questions I can make another pass on the
>> pull request!
>>
>> Matt
>>
>> On Mon, Nov 20, 2017 at 4:07 PM Matthias J. Sax <matth...@confluent.io>
>> wrote:
>>
>>> Thanks for following up!
>>>
>>> One thought about an older reply from you:
>>>
>>>>>>> I strongly disagree here. The purpose of this handler isn't *just*
to
>>>>>>> make a de

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-12-04 Thread Matt Farmer
Hey Matthias, thanks for getting back to me.

That's fine. But if we add it to `test` package, we don't need to talk
about it in the KIP. `test` is not public API.

Yes, that makes sense. It was in the KIP originally because I was, at one
point, planning on including it. We can remove it now that we’ve decided we
won’t include it in the public API.

Understood. That makes sense. We should explain this clearly in the KIP
and maybe log all other following exceptions at DEBUG level?


I thought it was clear in the KIP, but I can go back and double check my
wording and revise it to try and make it clearer.

I’ll take a look at doing more work on the KIP and the Pull Request
tomorrow.

Thanks again!

On December 4, 2017 at 5:50:33 PM, Matthias J. Sax (matth...@confluent.io)
wrote:

Hey,

About your questions:

>>> Acknowledged, so is ProducerFencedException the only kind of exception
I
>>> need to change my behavior on? Or are there other types I need to
check? Is
>>> there a comprehensive list somewhere?

I cannot think of any other atm. We should list all fatal exceptions for
which we don't call the handler and explain why (exception is "global"
and will affect all other records, too | ProducerFenced is self-healing).

We started to collect and categorize exceptions here (not completed yet):
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Architecture#KafkaStreamsArchitecture-TypesofExceptions:


This list should be a good starting point though.

> I include it in the test package because I have tests that assert that if
> the record collector impl encounters an Exception and receives a CONTINUE
> that it actually does CONTINUE.

That's fine. But if we add it to `test` package, we don't need to talk
about it in the KIP. `test` is not public API.

> I didn't want to invoke the handler in places where the CONTINUE or FAIL
> result would just be ignored. Presumably, after a FAIL has been returned,
> subsequent exceptions are likely to be repeats or noise from my
> understanding of the code paths. If this is incorrect we can revisit.

Understood. That makes sense. We should explain this clearly in the KIP
and maybe log all other following exceptions at DEBUG level?


-Matthias


On 12/1/17 11:43 AM, Matt Farmer wrote:
> Bump! It's been three days here and I haven't seen any further feedback.
> Eager to get this resolved, approved, and merged. =)
>
> On Tue, Nov 28, 2017 at 9:53 AM Matt Farmer <m...@frmr.me> wrote:
>
>> Hi there, sorry for the delay in responding. Last week had a holiday and
>> several busy work days in it so I'm just now getting around to
responding.
>>
>>> We would only exclude
>>> exception Streams can handle itself (like ProducerFencedException) --
>>> thus, if the handler has code to react to this, it would not be bad, as
>>> this code is just never called.
>> [...]
>>> Thus, I am still in favor of not calling the ProductionExceptionHandler
>>> for fatal exception.
>>
>> Acknowledged, so is ProducerFencedException the only kind of exception I
>> need to change my behavior on? Or are there other types I need to check?
Is
>> there a comprehensive list somewhere?
>>
>>> About the "always continue" case. Sounds good to me to remove it (not
>>> sure why we need it in test package?)
>>
>> I include it in the test package because I have tests that assert that
if
>> the record collector impl encounters an Exception and receives a
CONTINUE
>> that it actually does CONTINUE.
>>
>>> What is there reasoning for invoking the handler only for the first
>>> exception?
>>
>> I didn't want to invoke the handler in places where the CONTINUE or FAIL
>> result would just be ignored. Presumably, after a FAIL has been
returned,
>> subsequent exceptions are likely to be repeats or noise from my
>> understanding of the code paths. If this is incorrect we can revisit.
>>
>> Once I get the answers to these questions I can make another pass on the
>> pull request!
>>
>> Matt
>>
>> On Mon, Nov 20, 2017 at 4:07 PM Matthias J. Sax <matth...@confluent.io>
>> wrote:
>>
>>> Thanks for following up!
>>>
>>> One thought about an older reply from you:
>>>
>>>>>>> I strongly disagree here. The purpose of this handler isn't *just*
to
>>>>>>> make a decision for streams. There may also be desirable side
>>> effects that
>>>>>>> users wish to cause when production exceptions occur. There may be
>>>>>>> side-effects that they wish to cause when AuthenticationExceptions
>>> occur,
>>>>>>> as well. We should s

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-12-01 Thread Matt Farmer
Bump! It's been three days here and I haven't seen any further feedback.
Eager to get this resolved, approved, and merged. =)

On Tue, Nov 28, 2017 at 9:53 AM Matt Farmer <m...@frmr.me> wrote:

> Hi there, sorry for the delay in responding. Last week had a holiday and
> several busy work days in it so I'm just now getting around to responding.
>
> > We would only exclude
> > exception Streams can handle itself (like ProducerFencedException) --
> > thus, if the handler has code to react to this, it would not be bad, as
> > this code is just never called.
> [...]
> > Thus, I am still in favor of not calling the ProductionExceptionHandler
> > for fatal exception.
>
> Acknowledged, so is ProducerFencedException the only kind of exception I
> need to change my behavior on? Or are there other types I need to check? Is
> there a comprehensive list somewhere?
>
> > About the "always continue" case. Sounds good to me to remove it (not
> > sure why we need it in test package?)
>
> I include it in the test package because I have tests that assert that if
> the record collector impl encounters an Exception and receives a CONTINUE
> that it actually does CONTINUE.
>
> > What is there reasoning for invoking the handler only for the first
> > exception?
>
> I didn't want to invoke the handler in places where the CONTINUE or FAIL
> result would just be ignored. Presumably, after a FAIL has been returned,
> subsequent exceptions are likely to be repeats or noise from my
> understanding of the code paths. If this is incorrect we can revisit.
>
> Once I get the answers to these questions I can make another pass on the
> pull request!
>
> Matt
>
> On Mon, Nov 20, 2017 at 4:07 PM Matthias J. Sax <matth...@confluent.io>
> wrote:
>
>> Thanks for following up!
>>
>> One thought about an older reply from you:
>>
>> >>>> I strongly disagree here. The purpose of this handler isn't *just* to
>> >>>> make a decision for streams. There may also be desirable side
>> effects that
>> >>>> users wish to cause when production exceptions occur. There may be
>> >>>> side-effects that they wish to cause when AuthenticationExceptions
>> occur,
>> >>>> as well. We should still give them the hooks to preform those side
>> effects.
>>
>> And your follow up:
>>
>> >>- I think I would rather invoke it for all exceptions that could
>> occur
>> >>from attempting to produce - even those exceptions were returning
>> CONTINUE
>> >>may not be a good idea (e.g. Authorization exception). Until there
>> is a
>> >>different type for exceptions that are totally fatal (for example a
>> >>FatalStreamsException or some sort), maintaining a list of
>> exceptions that
>> >>you can intercept with this handler and exceptions you cannot would
>> be
>> >>really error-prone and hard to keep correct.
>>
>> I understand what you are saying, however, consider that Streams needs
>> to die for a fatal exception. Thus, if you call the handler for a fatal
>> exception, we would  need to ignore the return value and fail -- this
>> seems to be rather intuitive. Furthermore, users can register an
>> uncaught-exception-handler and side effects for fatal errors can be
>> triggered there.
>>
>> Btw: an AuthorizationException is fatal -- not sure what you mean by an
>> "totally fatal" exception -- there is no superlative to fatal from my
>> understanding.
>>
>> About maintaining a list of exceptions: I don't think this is too hard,
>> and users also don't need to worry about it IMHO. We would only exclude
>> exception Streams can handle itself (like ProducerFencedException) --
>> thus, if the handler has code to react to this, it would not be bad, as
>> this code is just never called. In case that there is an exception
>> Streams could actually handle but we still call the handler (ie, bug),
>> there should be no harm either -- the handler needs to return either
>> CONTINUE or FAIL and we would obey. It could only happen, that Streams
>> dies---as request by the user(!)---even if Streams could actually handle
>> the error and move on. But this seems to be not a issue from my point of
>> view.
>>
>> Thus, I am still in favor of not calling the ProductionExceptionHandler
>> for fatal exception.
>>
>>
>>
>> About the "always continue" case. Sounds good to me to remove it (not
>> sure why we need it in test package?) and to r

Re: [DISCUSS]: KIP-230: Name Windowing Joins

2017-11-29 Thread Matt Farmer
Hi Matthias,

I certainly have found the auto-generated names unwieldy while doing
cluster administration.

I will point out that your KIP doesn't outline what would happen if you
picked a name that resulted in a non-unique topic name. What would be the
error handling behavior there?

On Wed, Nov 29, 2017 at 9:03 AM Matthias Margush 
wrote:

> Hi everyone,
>
> I created this KIP to allow windowing joins to be named. If named, then the
> associated internal topic names would be derived from that, instead of
> being randomly generated.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP+230%3A+Name+Windowing+Joins
>
> Thanks,
>
> Matthias
>


Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-11-28 Thread Matt Farmer
Hi there, sorry for the delay in responding. Last week had a holiday and
several busy work days in it so I'm just now getting around to responding.

> We would only exclude
> exception Streams can handle itself (like ProducerFencedException) --
> thus, if the handler has code to react to this, it would not be bad, as
> this code is just never called.
[...]
> Thus, I am still in favor of not calling the ProductionExceptionHandler
> for fatal exception.

Acknowledged, so is ProducerFencedException the only kind of exception I
need to change my behavior on? Or are there other types I need to check? Is
there a comprehensive list somewhere?

> About the "always continue" case. Sounds good to me to remove it (not
> sure why we need it in test package?)

I include it in the test package because I have tests that assert that if
the record collector impl encounters an Exception and receives a CONTINUE
that it actually does CONTINUE.

> What is the reasoning for invoking the handler only for the first
> exception?

I didn't want to invoke the handler in places where the CONTINUE or FAIL
result would just be ignored. Presumably, after a FAIL has been returned,
subsequent exceptions are likely to be repeats or noise from my
understanding of the code paths. If this is incorrect we can revisit.

Once I get the answers to these questions I can make another pass on the
pull request!

Matt

On Mon, Nov 20, 2017 at 4:07 PM Matthias J. Sax <matth...@confluent.io>
wrote:

> Thanks for following up!
>
> One thought about an older reply from you:
>
> >>>> I strongly disagree here. The purpose of this handler isn't *just* to
> >>>> make a decision for streams. There may also be desirable side effects
> that
> >>>> users wish to cause when production exceptions occur. There may be
> >>>> side-effects that they wish to cause when AuthenticationExceptions
> occur,
> >>>> as well. We should still give them the hooks to preform those side
> effects.
>
> And your follow up:
>
> >>- I think I would rather invoke it for all exceptions that could
> occur
> >>from attempting to produce - even those exceptions were returning
> CONTINUE
> >>may not be a good idea (e.g. Authorization exception). Until there
> is a
> >>different type for exceptions that are totally fatal (for example a
> >>FatalStreamsException or some sort), maintaining a list of
> exceptions that
> >>you can intercept with this handler and exceptions you cannot would
> be
> >>really error-prone and hard to keep correct.
>
> I understand what you are saying, however, consider that Streams needs
> to die for a fatal exception. Thus, if you call the handler for a fatal
> exception, we would  need to ignore the return value and fail -- this
> seems to be rather intuitive. Furthermore, users can register an
> uncaught-exception-handler and side effects for fatal errors can be
> triggered there.
>
> Btw: an AuthorizationException is fatal -- not sure what you mean by an
> "totally fatal" exception -- there is no superlative to fatal from my
> understanding.
>
> About maintaining a list of exceptions: I don't think this is too hard,
> and users also don't need to worry about it IMHO. We would only exclude
> exception Streams can handle itself (like ProducerFencedException) --
> thus, if the handler has code to react to this, it would not be bad, as
> this code is just never called. In case that there is an exception
> Streams could actually handle but we still call the handler (ie, bug),
> there should be no harm either -- the handler needs to return either
> CONTINUE or FAIL and we would obey. It could only happen that Streams
> dies---as requested by the user(!)---even if Streams could actually handle
> the error and move on. But this seems to be not an issue from my point of
> view.
>
> Thus, I am still in favor of not calling the ProductionExceptionHandler
> for fatal exception.
>
>
>
> About the "always continue" case. Sounds good to me to remove it (not
> sure why we need it in test package?) and to rename the "failing"
> handler to "Default" (even if "default" is less descriptive and I would
> still prefer "Fail" in the name).
>
>
> Last question:
>
> >>   - Continue to *only* invoke it on the first exception that we
> >>   encounter (before sendException is set)
>
>
> What is the reasoning for invoking the handler only for the first
> exception?
>
>
>
>
> -Matthias
>
> On 11/20/17 10:43 AM, Matt Farmer wrote:
> > Alright, here are some updates I'm planning to make after thinking on

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-11-20 Thread Matt Farmer
Alright, here are some updates I'm planning to make after thinking on this
for awhile:

   - Given that the "always continue" handler isn't something I'd recommend
   for production use as is, I'm going to move it into the test namespace and
   remove it from mention in the public API.
   - I'm going to rename the "AlwaysFailProductionExceptionHandler" to
   "DefaultProductionExceptionHandler"
   - Given that the API for the exception handler involves being invoked by
   streams to make a decision about whether or not to continue — I think that
   we should:
  - Continue to *only* invoke it on the first exception that we
  encounter (before sendException is set)
  - Stop invoking it for the self-healing fenced exceptions.
   - I think I would rather invoke it for all exceptions that could occur
   from attempting to produce - even those exceptions where returning CONTINUE
   may not be a good idea (e.g. Authorization exception). Until there is a
   different type for exceptions that are totally fatal (for example a
   FatalStreamsException of some sort), maintaining a list of exceptions that
   you can intercept with this handler and exceptions you cannot would be
   really error-prone and hard to keep correct.
  - I'm happy to file a KIP for the creation of this new Exception type
  if there is interest.
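To make the above concrete, this is roughly what wiring a handler into a
Streams config might look like once the KIP lands. The config key is still
being settled, so the key below, and the handler class name, are placeholders
rather than the final API.

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class HandlerConfigSketch {
    public static Properties buildConfig() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-app");        // placeholder id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder brokers
        // Placeholder key; the KIP defines the real config name for the handler.
        props.put("production.exception.handler",
                  "com.example.MyProductionExceptionHandler");                // hypothetical class
        return props;
    }
}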

@ Matthias — What do you think about the above?

On Tue, Nov 14, 2017 at 9:44 AM Matt Farmer <m...@frmr.me> wrote:

> I responded before reading your code review and didn't see the bit about
> how ProducerFencedException is self-healing. This error handling logic is
> *quite* confusing to reason about... I think I'm going to sit down with
> the code a bit more today, but I'm inclined to think that if the fenced
> exceptions are, indeed, self healing that we still invoke the handler but
> ignore its result for those exceptions.
>
> On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer <m...@frmr.me> wrote:
>
>> Hi there,
>>
>> Following up here...
>>
>> > One tiny comment: I would prefer to remove the "Always" from the
>> handler implementation names -- it sounds "cleaner" to me without it.
>>
>> Let me think on this. I generally prefer expressiveness to clean-ness,
>> and I think that comes out in the names I chose for things. The straw man
>> in my head is the person that tried to substitute in the "AlwaysContinue"
>> variant and the "Always" is a trigger to really consider whether or not
>> they *always* want to try to continue.
>>
>> To be truthful, after some thought, using the "AlwaysContinue" variant
>> isn't something I'd recommend generally in a production system. Ideally
>> these handlers are targeted at handling a specific series of exceptions
>> that a user wants to ignore, and not ignoring all exceptions. More on this
>> in a minute.
>>
>> > For the first category, it seems to not make sense to call the handle but
>> Streams should always fail. If we follow this design, the KIP should list
>> all exceptions for which the handler is not called.
>>
>> I strongly disagree here. The purpose of this handler isn't *just* to
>> make a decision for streams. There may also be desirable side effects that
>> users wish to cause when production exceptions occur. There may be
>> side-effects that they wish to cause when AuthenticationExceptions occur,
>> as well. We should still give them the hooks to preform those side effects.
>>
>> In light of the above, I'm thinking that the
>> "AlwaysContinueProductionExceptionHandler" variant should probably be
>> removed from the public API and moved into tests since that's where it's
>> most useful. The more I think on it, the more I feel that having that in
>> the public API is a landmine. If you agree, then, we can rename the
>> "AlwaysFailProductionExceptionHandler" to
>> "DefaultProductionExceptionHandler".
>>
>> Thoughts?
>>
>> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <matth...@confluent.io>
>> wrote:
>>
>>> I just review the PR, and there is one thing we should discuss.
>>>
>>> There are different types of exceptions that could occur. Some apply to
>>> all records (like Authorization exception) while others are for single
>>> records only (like record too large).
>>>
>>> For the first category, it seems to not make sense to call the handle
>>> but Streams should always fail. If we follow this design, the KIP should
>>> list all exceptions for which the handler is not called.
>>>
>>> WDYT?
>>>
>>>
>>> -Matthias

[jira] [Created] (KAFKA-6214) Using standby replicas with an in memory state store causes Streams to crash

2017-11-15 Thread Matt Farmer (JIRA)
Matt Farmer created KAFKA-6214:
--

 Summary: Using standby replicas with an in memory state store 
causes Streams to crash
 Key: KAFKA-6214
 URL: https://issues.apache.org/jira/browse/KAFKA-6214
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.11.0.1
Reporter: Matt Farmer


We decided to start experimenting with Standby Replicas of our State Stores by 
setting the following configuration setting:

{code}
num.standby.replicas=1
{code}

Most applications did okay with this except for one that used an in memory 
state store instead of a persistent state store. With the new configuration, 
the first instance of this application booted fine. When the second instance 
came up, both instances crashed with the following exception:

{code}
java.lang.IllegalStateException: Consumer is not subscribed to any topics or 
assigned any partitions
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1037)
at 
org.apache.kafka.streams.processor.internals.StreamThread.maybeUpdateStandbyTasks(StreamThread.java:752)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:524)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:480)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:457)
{code}

Monit attempted to restart both instances but they would just continue to crash 
over and over again. The state store in our problematic application is declared 
like so:

{code}
Stores
.create("TheStateStore")
.withStringKeys()
.withStringValues()
.inMemory()
.build()
{code}

Luckily we had a config switch in place that could turn on an alternate, 
persistent state store. As soon as we flipped to the persistent state store, 
things started working as we expected.
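An equivalent persistent declaration, for comparison, looks roughly like this 
(illustrative only, using the same 0.11 Stores builder; not copied verbatim from 
our code):

{code}
Stores
.create("TheStateStore")
.withStringKeys()
.withStringValues()
.persistent()
.build()
{code}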



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-11-14 Thread Matt Farmer
I responded before reading your code review and didn't see the bit about
how ProducerFencedException is self-healing. This error handling logic is
*quite* confusing to reason about... I think I'm going to sit down with the
code a bit more today, but I'm inclined to think that if the fenced
exceptions are, indeed, self healing that we still invoke the handler but
ignore its result for those exceptions.

On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer <m...@frmr.me> wrote:

> Hi there,
>
> Following up here...
>
> > One tiny comment: I would prefer to remove the "Always" from the
> handler implementation names -- it sounds "cleaner" to me without it.
>
> Let me think on this. I generally prefer expressiveness to clean-ness, and
> I think that comes out in the names I chose for things. The straw man in my
> head is the person that tried to substitute in the "AlwaysContinue" variant
> and the "Always" is a trigger to really consider whether or not they
> *always* want to try to continue.
>
> To be truthful, after some thought, using the "AlwaysContinue" variant
> isn't something I'd recommend generally in a production system. Ideally
> these handlers are targeted at handling a specific series of exceptions
> that a user wants to ignore, and not ignoring all exceptions. More on this
> in a minute.
>
> > For the first category, it seems to not make sense to call the handle but
> Streams should always fail. If we follow this design, the KIP should list
> all exceptions for which the handler is not called.
>
> I strongly disagree here. The purpose of this handler isn't *just* to
> make a decision for streams. There may also be desirable side effects that
> users wish to cause when production exceptions occur. There may be
> side-effects that they wish to cause when AuthenticationExceptions occur,
> as well. We should still give them the hooks to perform those side effects.
>
> In light of the above, I'm thinking that the
> "AlwaysContinueProductionExceptionHandler" variant should probably be
> removed from the public API and moved into tests since that's where it's
> most useful. The more I think on it, the more I feel that having that in
> the public API is a landmine. If you agree, then, we can rename the
> "AlwaysFailProductionExceptionHandler" to
> "DefaultProductionExceptionHandler".
>
> Thoughts?
>
> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <matth...@confluent.io>
> wrote:
>
>> I just review the PR, and there is one thing we should discuss.
>>
>> There are different types of exceptions that could occur. Some apply to
>> all records (like Authorization exception) while others are for single
>> records only (like record too large).
>>
>> For the first category, it seems to not make sense to call the handle
>> but Streams should always fail. If we follow this design, the KIP should
>> list all exceptions for which the handler is not called.
>>
>> WDYT?
>>
>>
>> -Matthias
>>
>>
>> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
>> > Just catching up on this KIP.
>> >
>> > One tiny comment: I would prefer to remove the "Always" from the handler
>> > implementation names -- it sounds "cleaner" to me without it.
>> >
>> >
>> > -Matthias
>> >
>> > On 11/5/17 12:57 PM, Matt Farmer wrote:
>> >> It is agreed, then. I've updated the pull request. I'm trying to also
>> >> update the KIP accordingly, but cwiki is being slow and dropping
>> >> connections. I'll try again a bit later but please consider the KIP
>> >> updated for all intents and purposes. heh.
>> >>
>> >> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <wangg...@gmail.com>
>> wrote:
>> >>
>> >>> That makes sense.
>> >>>
>> >>>
>> >>> Guozhang
>> >>>
>> >>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <m...@frmr.me> wrote:
>> >>>
>> >>>> Interesting. I'm not sure I agree. I've been bitten many times by
>> >>>> unintentionally shipping code that fails to properly implement
>> logging. I
>> >>>> always discover this at the exact *worst* moment, too. (Normally at
>> 3 AM
>> >>>> during an on-call shift. Hah.) However, if others feel the same way I
>> >>> could
>> >>>> probably be convinced to remove it.
>> >>>>
>> >>>> We could also meet halfway and say that when a customized
>> >

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-11-14 Thread Matt Farmer
Hi there,

Following up here...

> One tiny comment: I would prefer to remove the "Always" from the handler 
> implementation
names -- it sounds "cleaner" to me without it.

Let me think on this. I generally prefer expressiveness to clean-ness, and
I think that comes out in the names I chose for things. The straw man in my
head is the person that tried to substitute in the "AlwaysContinue" variant
and the "Always" is a trigger to really consider whether or not they
*always* want to try to continue.

To be truthful, after some thought, using the "AlwaysContinue" variant
isn't something I'd recommend generally in a production system. Ideally
these handlers are targeted at handling a specific series of exceptions
that a user wants to ignore, and not ignoring all exceptions. More on this
in a minute.
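As a rough illustration, the kind of targeted handler I have in mind would look
something like the sketch below. The specific exception is just an example, and
the interface is the one proposed in the KIP.

import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RecordTooLargeException;

// Narrowly-targeted sketch: skip oversized records, fail on everything else.
public class IgnoreRecordTooLargeHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        if (exception instanceof RecordTooLargeException) {
            return ProductionExceptionHandlerResponse.CONTINUE;
        }
        return ProductionExceptionHandlerResponse.FAIL;
    }
}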

> For the first category, it seems to not make sense to call the handler but
Streams should always fail. If we follow this design, the KIP should list
all exceptions for which the handler is not called.

I strongly disagree here. The purpose of this handler isn't *just* to make
a decision for streams. There may also be desirable side effects that users
wish to cause when production exceptions occur. There may be side-effects
that they wish to cause when AuthenticationExceptions occur, as well. We
should still give them the hooks to perform those side effects.

In light of the above, I'm thinking that the
"AlwaysContinueProductionExceptionHandler" variant should probably be
removed from the public API and moved into tests since that's where it's
most useful. The more I think on it, the more I feel that having that in
the public API is a landmine. If you agree, then, we can rename the
"AlwaysFailProductionExceptionHandler" to
"DefaultProductionExceptionHandler".

Thoughts?

On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <matth...@confluent.io>
wrote:

> I just reviewed the PR, and there is one thing we should discuss.
>
> There are different types of exceptions that could occur. Some apply to
> all records (like Authorization exception) while others are for single
> records only (like record too large).
>
> For the first category, it seems to not make sense to call the handler
> but Streams should always fail. If we follow this design, the KIP should
> list all exceptions for which the handler is not called.
>
> WDYT?
>
>
> -Matthias
>
>
> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
> > Just catching up on this KIP.
> >
> > One tiny comment: I would prefer to remove the "Always" from the handler
> > implementation names -- it sounds "cleaner" to me without it.
> >
> >
> > -Matthias
> >
> > On 11/5/17 12:57 PM, Matt Farmer wrote:
> >> It is agreed, then. I've updated the pull request. I'm trying to also
> >> update the KIP accordingly, but cwiki is being slow and dropping
> >> connections. I'll try again a bit later but please consider the KIP
> >> updated for all intents and purposes. heh.
> >>
> >> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <wangg...@gmail.com>
> wrote:
> >>
> >>> That makes sense.
> >>>
> >>>
> >>> Guozhang
> >>>
> >>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <m...@frmr.me> wrote:
> >>>
> >>>> Interesting. I'm not sure I agree. I've been bitten many times by
> >>>> unintentionally shipping code that fails to properly implement
> logging. I
> >>>> always discover this at the exact *worst* moment, too. (Normally at 3
> AM
> >>>> during an on-call shift. Hah.) However, if others feel the same way I
> >>> could
> >>>> probably be convinced to remove it.
> >>>>
> >>>> We could also meet halfway and say that when a customized
> >>>> ProductionExceptionHandler instructs Streams to CONTINUE, we log at
> DEBUG
> >>>> level instead of WARN level. Would that alternative be appealing to
> you?
> >>>>
> >>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <wangg...@gmail.com>
> >>> wrote:
> >>>>
> >>>>> Thanks for the updates. I made a pass over the wiki again and it
> looks
> >>>>> good.
> >>>>>
> >>>>> About whether record collector should still internally log the error
> in
> >>>>> addition to what the customized ProductionExceptionHandler does. I
> >>>>> personally would prefer only to log if the returned value is FAIL to
> >>>>> indicate that this thread is going to shu

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-11-05 Thread Matt Farmer
It is agreed, then. I've updated the pull request. I'm trying to also
update the KIP accordingly, but cwiki is being slow and dropping
connections. I'll try again a bit later but please consider the KIP
updated for all intents and purposes. heh.

On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <wangg...@gmail.com> wrote:

> That makes sense.
>
>
> Guozhang
>
> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <m...@frmr.me> wrote:
>
> > Interesting. I'm not sure I agree. I've been bitten many times by
> > unintentionally shipping code that fails to properly implement logging. I
> > always discover this at the exact *worst* moment, too. (Normally at 3 AM
> > during an on-call shift. Hah.) However, if others feel the same way I
> could
> > probably be convinced to remove it.
> >
> > We could also meet halfway and say that when a customized
> > ProductionExceptionHandler instructs Streams to CONTINUE, we log at DEBUG
> > level instead of WARN level. Would that alternative be appealing to you?
> >
> > On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <wangg...@gmail.com>
> wrote:
> >
> > > Thanks for the updates. I made a pass over the wiki again and it looks
> > > good.
> > >
> > > About whether record collector should still internally log the error in
> > > addition to what the customized ProductionExceptionHandler does. I
> > > personally would prefer only to log if the returned value is FAIL to
> > > indicate that this thread is going to shutdown and trigger the
> exception
> > > handler.
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <m...@frmr.me> wrote:
> > >
> > > > Hello, a bit later than I'd anticipated, but I've updated this KIP as
> > > > outlined above. The updated KIP is now ready for review again!
> > > >
> > > > On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <m...@frmr.me> wrote:
> > > >
> > > > > Ah. I actually created both of those in the PR and forgot to
> mention
> > > them
> > > > > by name in the KIP! Thanks for pointing out the oversight.
> > > > >
> > > > > I’ll revise the KIP this afternoon accordingly.
> > > > >
> > > > > The logging is actually provided for in the record collector.
> > Whenever
> > > a
> > > > > handler continues it’ll log a warning to ensure that it’s
> > *impossible*
> > > to
> > > > > write a handler that totally suppresses production exceptions from
> > the
> > > > log.
> > > > > As such, the default continue handler just returns the continue
> > value.
> > > I
> > > > > can add details about those semantics to the KIP as well.
> > > > > On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
> > matth...@confluent.io
> > > >
> > > > > wrote:
> > > > >
> > > > >> One more comment.
> > > > >>
> > > > >> You mention a default implementation for the handler that fails. I
> > > > >> think, this should be part of the public API and thus should have
> a
> > > > >> proper defined named that is mentioned in the KIP.
> > > > >>
> > > > >> We could also add a second implementation for the log-and-move-on
> > > > >> strategy, as both are the two most common cases. This class should
> > > also
> > > > >> be part of public API (so users can just set in the config) with a
> > > > >> proper name.
> > > > >>
> > > > >>
> > > > >> Otherwise, I like the KIP a lot! Thanks.
> > > > >>
> > > > >>
> > > > >> -Matthias
> > > > >>
> > > > >>
> > > > >> On 11/1/17 12:23 AM, Matt Farmer wrote:
> > > > >> > Thanks for the heads up. Yes, I think my changes are compatible
> > with
> > > > >> that
> > > > >> > PR, but there will be a merge conflict that happens whenever one
> > of
> > > > the
> > > > >> PRs
> > > > >> > is merged. Happy to reconcile the changes in my PR if 4148 goes
> in
> > > > >> first. :)
> > > > >> >
> > > > >> > On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
> wangg...@gmail.com
> > 

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-11-05 Thread Matt Farmer
Interesting. I'm not sure I agree. I've been bitten many times by
unintentionally shipping code that fails to properly implement logging. I
always discover this at the exact *worst* moment, too. (Normally at 3 AM
during an on-call shift. Hah.) However, if others feel the same way I could
probably be convinced to remove it.

We could also meet halfway and say that when a customized
ProductionExceptionHandler instructs Streams to CONTINUE, we log at DEBUG
level instead of WARN level. Would that alternative be appealing to you?

On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <wangg...@gmail.com> wrote:

> Thanks for the updates. I made a pass over the wiki again and it looks
> good.
>
> About whether record collector should still internally log the error in
> addition to what the customized ProductionExceptionHandler does. I
> personally would prefer only to log if the returned value is FAIL to
> indicate that this thread is going to shutdown and trigger the exception
> handler.
>
>
> Guozhang
>
>
> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <m...@frmr.me> wrote:
>
> > Hello, a bit later than I'd anticipated, but I've updated this KIP as
> > outlined above. The updated KIP is now ready for review again!
> >
> > On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <m...@frmr.me> wrote:
> >
> > > Ah. I actually created both of those in the PR and forgot to mention
> them
> > > by name in the KIP! Thanks for pointing out the oversight.
> > >
> > > I’ll revise the KIP this afternoon accordingly.
> > >
> > > The logging is actually provided for in the record collector. Whenever
> a
> > > handler continues it’ll log a warning to ensure that it’s *impossible*
> to
> > > write a handler that totally suppresses production exceptions from the
> > log.
> > > As such, the default continue handler just returns the continue value.
> I
> > > can add details about those semantics to the KIP as well.
> > > On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <matth...@confluent.io
> >
> > > wrote:
> > >
> > >> One more comment.
> > >>
> > >> You mention a default implementation for the handler that fails. I
> > >> think, this should be part of the public API and thus should have a
> > >> proper defined named that is mentioned in the KIP.
> > >>
> > >> We could also add a second implementation for the log-and-move-on
> > >> strategy, as both are the two most common cases. This class should
> also
> > >> be part of public API (so users can just set in the config) with a
> > >> proper name.
> > >>
> > >>
> > >> Otherwise, I like the KIP a lot! Thanks.
> > >>
> > >>
> > >> -Matthias
> > >>
> > >>
> > >> On 11/1/17 12:23 AM, Matt Farmer wrote:
> > >> > Thanks for the heads up. Yes, I think my changes are compatible with
> > >> that
> > >> > PR, but there will be a merge conflict that happens whenever one of
> > the
> > >> PRs
> > >> > is merged. Happy to reconcile the changes in my PR if 4148 goes in
> > >> first. :)
> > >> >
> > >> > On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <wangg...@gmail.com>
> > >> wrote:
> > >> >
> > >> >> That sounds reasonable, thanks Matt.
> > >> >>
> > >> >> As for the implementation, please note that there is another
> ongoing
> > PR
> > >> >> that may touch the same classes that you are working on:
> > >> >> https://github.com/apache/kafka/pull/4148
> > >> >>
> > >> >> So it may help if you can also take a look at that PR and see if it
> > is
> > >> >> compatible with your changes.
> > >> >>
> > >> >>
> > >> >>
> > >> >> Guozhang
> > >> >>
> > >> >>
> > >> >> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <m...@frmr.me>
> wrote:
> > >> >>
> > >> >>> I've opened this pull request to implement the KIP as currently
> > >> written:
> > >> >>> https://github.com/apache/kafka/pull/4165. It still needs some
> > tests
> > >> >>> added,
> > >> >>> but largely represents the shape I was going for.
> > >> >>>
> > >> >>> If there are more points that folks would like to

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-11-05 Thread Matt Farmer
Hello, a bit later than I'd anticipated, but I've updated this KIP as
outlined above. The updated KIP is now ready for review again!

On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <m...@frmr.me> wrote:

> Ah. I actually created both of those in the PR and forgot to mention them
> by name in the KIP! Thanks for pointing out the oversight.
>
> I’ll revise the KIP this afternoon accordingly.
>
> The logging is actually provided for in the record collector. Whenever a
> handler continues it’ll log a warning to ensure that it’s *impossible* to
> write a handler that totally suppresses production exceptions from the log.
> As such, the default continue handler just returns the continue value. I
> can add details about those semantics to the KIP as well.
> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <matth...@confluent.io>
> wrote:
>
>> One more comment.
>>
>> You mention a default implementation for the handler that fails. I
>> think, this should be part of the public API and thus should have a
>> proper defined named that is mentioned in the KIP.
>>
>> We could also add a second implementation for the log-and-move-on
>> strategy, as both are the two most common cases. This class should also
>> be part of public API (so users can just set in the config) with a
>> proper name.
>>
>>
>> Otherwise, I like the KIP a lot! Thanks.
>>
>>
>> -Matthias
>>
>>
>> On 11/1/17 12:23 AM, Matt Farmer wrote:
>> > Thanks for the heads up. Yes, I think my changes are compatible with
>> that
>> > PR, but there will be a merge conflict that happens whenever one of the
>> PRs
>> > is merged. Happy to reconcile the changes in my PR if 4148 goes in
>> first. :)
>> >
>> > On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <wangg...@gmail.com>
>> wrote:
>> >
>> >> That sounds reasonable, thanks Matt.
>> >>
>> >> As for the implementation, please note that there is another ongoing PR
>> >> that may touch the same classes that you are working on:
>> >> https://github.com/apache/kafka/pull/4148
>> >>
>> >> So it may help if you can also take a look at that PR and see if it is
>> >> compatible with your changes.
>> >>
>> >>
>> >>
>> >> Guozhang
>> >>
>> >>
>> >> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <m...@frmr.me> wrote:
>> >>
>> >>> I've opened this pull request to implement the KIP as currently
>> written:
>> >>> https://github.com/apache/kafka/pull/4165. It still needs some tests
>> >>> added,
>> >>> but largely represents the shape I was going for.
>> >>>
>> >>> If there are more points that folks would like to discuss, please let
>> me
>> >>> know. If I don't hear anything by tomorrow afternoon I'll probably
>> start
>> >> a
>> >>> [VOTE] thread.
>> >>>
>> >>> Thanks,
>> >>> Matt
>> >>>
>> >>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <m...@frmr.me> wrote:
>> >>>
>> >>>> I can’t think of a reason that would be problematic.
>> >>>>
>> >>>> Most of the time I would write a handler like this, I either want to
>> >>>> ignore the error or fail and bring everything down so that I can spin
>> >> it
>> >>>> back up later and resume from earlier offsets. When we start up after
>> >>>> crashing we’ll eventually try to process the message we failed to
>> >> produce
>> >>>> again.
>> >>>>
>> >>>> I’m concerned that “putting in a queue for later” opens you up to
>> >> putting
>> >>>> messages into the destination topic in an unexpected order. However
>> if
>> >>>> others feel differently, I’m happy to talk about it.
>> >>>>
>> >>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <wangg...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>>>> Please correct me if I'm wrong, but my understanding is that the
>> >>> record
>> >>>>>> metadata is always null if an exception occurred while trying to
>> >>>>> produce.
>> >>>>>
>> >>>>> That is right. Thanks.
>> >>>>>
>> >>>>> I looked at the example code, an

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-11-04 Thread Matt Farmer
Ah. I actually created both of those in the PR and forgot to mention them
by name in the KIP! Thanks for pointing out the oversight.

I’ll revise the KIP this afternoon accordingly.

The logging is actually provided for in the record collector. Whenever a
handler continues it’ll log a warning to ensure that it’s *impossible* to
write a handler that totally suppresses production exceptions from the log.
As such, the default continue handler just returns the continue value. I
can add details about those semantics to the KIP as well.
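Concretely, the decision point in the record collector is shaped roughly like
the sketch below. Names and structure are simplified for illustration; this is
not the literal code in the PR.

import org.apache.kafka.clients.producer.ProducerRecord;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch of the "always log on CONTINUE" rule described above.
public class ContinueOrFailSketch {

    private static final Logger log = LoggerFactory.getLogger(ContinueOrFailSketch.class);

    static boolean shouldContinue(final ProductionExceptionHandler handler,
                                  final ProducerRecord<byte[], byte[]> record,
                                  final Exception exception) {
        final ProductionExceptionHandler.ProductionExceptionHandlerResponse response =
            handler.handle(record, exception);
        if (response == ProductionExceptionHandler.ProductionExceptionHandlerResponse.CONTINUE) {
            // The collector logs the warning itself, so no handler can fully
            // suppress a production exception from the logs.
            log.warn("Error producing to topic {}; handler chose to CONTINUE", record.topic(), exception);
            return true;
        }
        log.error("Error producing to topic {}; handler chose to FAIL", record.topic(), exception);
        return false;
    }
}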
On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <matth...@confluent.io>
wrote:

> One more comment.
>
> You mention a default implementation for the handler that fails. I
> think this should be part of the public API and thus should have a
> properly defined name that is mentioned in the KIP.
>
> We could also add a second implementation for the log-and-move-on
> strategy, as both are the two most common cases. This class should also
> be part of public API (so users can just set in the config) with a
> proper name.
>
>
> Otherwise, I like the KIP a lot! Thanks.
>
>
> -Matthias
>
>
> On 11/1/17 12:23 AM, Matt Farmer wrote:
> > Thanks for the heads up. Yes, I think my changes are compatible with that
> > PR, but there will be a merge conflict that happens whenever one of the
> PRs
> > is merged. Happy to reconcile the changes in my PR if 4148 goes in
> first. :)
> >
> > On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <wangg...@gmail.com>
> wrote:
> >
> >> That sounds reasonable, thanks Matt.
> >>
> >> As for the implementation, please note that there is another ongoing PR
> >> that may touch the same classes that you are working on:
> >> https://github.com/apache/kafka/pull/4148
> >>
> >> So it may help if you can also take a look at that PR and see if it is
> >> compatible with your changes.
> >>
> >>
> >>
> >> Guozhang
> >>
> >>
> >> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <m...@frmr.me> wrote:
> >>
> >>> I've opened this pull request to implement the KIP as currently
> written:
> >>> https://github.com/apache/kafka/pull/4165. It still needs some tests
> >>> added,
> >>> but largely represents the shape I was going for.
> >>>
> >>> If there are more points that folks would like to discuss, please let
> me
> >>> know. If I don't hear anything by tomorrow afternoon I'll probably
> start
> >> a
> >>> [VOTE] thread.
> >>>
> >>> Thanks,
> >>> Matt
> >>>
> >>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <m...@frmr.me> wrote:
> >>>
> >>>> I can’t think of a reason that would be problematic.
> >>>>
> >>>> Most of the time I would write a handler like this, I either want to
> >>>> ignore the error or fail and bring everything down so that I can spin
> >> it
> >>>> back up later and resume from earlier offsets. When we start up after
> >>>> crashing we’ll eventually try to process the message we failed to
> >> produce
> >>>> again.
> >>>>
> >>>> I’m concerned that “putting in a queue for later” opens you up to
> >> putting
> >>>> messages into the destination topic in an unexpected order. However if
> >>>> others feel differently, I’m happy to talk about it.
> >>>>
> >>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <wangg...@gmail.com>
> >>> wrote:
> >>>>
> >>>>>> Please correct me if I'm wrong, but my understanding is that the
> >>> record
> >>>>>> metadata is always null if an exception occurred while trying to
> >>>>> produce.
> >>>>>
> >>>>> That is right. Thanks.
> >>>>>
> >>>>> I looked at the example code, and one thing I realized that since we
> >> are
> >>>>> not passing the context in the handle function, we may not be
> >> implement
> >>>>> the
> >>>>> logic to send the fail records into another queue for future
> >> processing.
> >>>>> Would people think that would be a big issue?
> >>>>>
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>>
> >>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <m...@frmr.me> wrote:
> >>

Re: [VOTE] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-11-04 Thread Matt Farmer
Bump! I believe I need two more binding +1's to proceed?

On Thu, Nov 2, 2017 at 11:49 AM Ted Yu <yuzhih...@gmail.com> wrote:

> +1
>
> On Wed, Nov 1, 2017 at 4:50 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > +1 (binding) from me. Thanks!
> >
> > On Wed, Nov 1, 2017 at 4:50 PM, Guozhang Wang <wangg...@gmail.com>
> wrote:
> >
> > > The vote should stay open for at least 72 hours. The bylaws can be
> found
> > > here https://cwiki.apache.org/confluence/display/KAFKA/Bylaws
> > >
> > > On Wed, Nov 1, 2017 at 8:09 AM, Matt Farmer <m...@frmr.me> wrote:
> > >
> > >> Hello all,
> > >>
> > >> It seems like discussion around KIP-210 has gone to a lull. I've got
> > some
> > >> candidate work underway for it already, so I'd like to go ahead and
> call
> > >> it
> > >> to a vote.
> > >>
> > >> For reference, the KIP can be found here:
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-210+-+
> > >> Provide+for+custom+error+handling++when+Kafka+Streams+fails+to+produce
> > >>
> > >> Also, how long to vote threads stay open generally before changing the
> > >> status of the KIP?
> > >>
> > >> Cheers,
> > >> Matt
> > >>
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: [DISCUSS] KIP-220: Add AdminClient into Kafka Streams' ClientSupplier

2017-11-03 Thread Matt Farmer
This seems like an A+ improvement to me.

On Fri, Nov 3, 2017 at 7:49 PM Guozhang Wang  wrote:

> Hello folks,
>
> I have filed a new KIP on adding AdminClient into Streams for internal
> topic management.
>
> Looking for feedback on
>
> *
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-220%3A+Add+AdminClient+into+Kafka+Streams%27+ClientSupplier
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-220%3A+Add+AdminClient+into+Kafka+Streams%27+ClientSupplier
> >*
>
> --
> -- Guozhang
>


[VOTE] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-11-01 Thread Matt Farmer
Hello all,

It seems like discussion around KIP-210 has gone to a lull. I've got some
candidate work underway for it already, so I'd like to go ahead and call it
to a vote.

For reference, the KIP can be found here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-210+-+Provide+for+custom+error+handling++when+Kafka+Streams+fails+to+produce

Also, how long do vote threads stay open generally before changing the
status of the KIP?

Cheers,
Matt


Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-10-31 Thread Matt Farmer
Thanks for the heads up. Yes, I think my changes are compatible with that
PR, but there will be a merge conflict that happens whenever one of the PRs
is merged. Happy to reconcile the changes in my PR if 4148 goes in first. :)

On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <wangg...@gmail.com> wrote:

> That sounds reasonable, thanks Matt.
>
> As for the implementation, please note that there is another ongoing PR
> that may touch the same classes that you are working on:
> https://github.com/apache/kafka/pull/4148
>
> So it may help if you can also take a look at that PR and see if it is
> compatible with your changes.
>
>
>
> Guozhang
>
>
> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <m...@frmr.me> wrote:
>
> > I've opened this pull request to implement the KIP as currently written:
> > https://github.com/apache/kafka/pull/4165. It still needs some tests
> > added,
> > but largely represents the shape I was going for.
> >
> > If there are more points that folks would like to discuss, please let me
> > know. If I don't hear anything by tomorrow afternoon I'll probably start
> a
> > [VOTE] thread.
> >
> > Thanks,
> > Matt
> >
> > On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <m...@frmr.me> wrote:
> >
> > > I can’t think of a reason that would be problematic.
> > >
> > > Most of the time I would write a handler like this, I either want to
> > > ignore the error or fail and bring everything down so that I can spin
> it
> > > back up later and resume from earlier offsets. When we start up after
> > > crashing we’ll eventually try to process the message we failed to
> produce
> > > again.
> > >
> > > I’m concerned that “putting in a queue for later” opens you up to
> putting
> > > messages into the destination topic in an unexpected order. However if
> > > others feel differently, I’m happy to talk about it.
> > >
> > > On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <wangg...@gmail.com>
> > wrote:
> > >
> > >> > Please correct me if I'm wrong, but my understanding is that the
> > record
> > >> > metadata is always null if an exception occurred while trying to
> > >> produce.
> > >>
> > >> That is right. Thanks.
> > >>
> > >> I looked at the example code, and one thing I realized is that since
> > >> we are not passing the context into the handle function, we may not be
> > >> able to implement the logic to send the failed records into another
> > >> queue for future processing. Would people think that would be a big
> > >> issue?
> > >>
> > >>
> > >> Guozhang
> > >>
> > >>
> > >> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <m...@frmr.me> wrote:
> > >>
> > >> > Hello all,
> > >> >
> > >> > I've updated the KIP based on this conversation, and made it so that
> > its
> > >> > interface, config setting, and parameters line up more closely with
> > the
> > >> > interface in KIP-161 (deserialization handler).
> > >> >
> > >> > I believe there are a few specific questions I need to reply to.
> > >> >
> > >> > > The question I had about the handle parameters is around the
> > >> > > record: should it be `ProducerRecord<byte[], byte[]>`, or generics
> > >> > > like `ProducerRecord` or `ProducerRecord<? extends Object,
> > >> > > ? extends Object>`?
> > >> >
> > >> > At this point in the code we're guaranteed that this is a
> > >> > ProducerRecord<byte[], byte[]>, so the generics would just make it
> > >> harder
> > >> > to work with the key and value.
> > >> >
> > >> > > Also, should the handle function include the `RecordMetadata` as
> > well
> > >> in
> > >> > > case it is not null?
> > >> >
> > >> > Please correct me if I'm wrong, but my understanding is that the
> > record
> > >> > metadata is always null if an exception occurred while trying to
> > >> produce.
> > >> >
> > >> > > We may probably try to write down at least the following handling
> > >> logic
> > >> > and
> > >> > > see if the given API is sufficient for it
> > >> >
> > >> >

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-10-31 Thread Matt Farmer
I've opened this pull request to implement the KIP as currently written:
https://github.com/apache/kafka/pull/4165. It still needs some tests added,
but largely represents the shape I was going for.

If there are more points that folks would like to discuss, please let me
know. If I don't hear anything by tomorrow afternoon I'll probably start a
[VOTE] thread.

Thanks,
Matt

On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <m...@frmr.me> wrote:

> I can’t think of a reason that would be problematic.
>
> Most of the time I would write a handler like this, I either want to
> ignore the error or fail and bring everything down so that I can spin it
> back up later and resume from earlier offsets. When we start up after
> crashing we’ll eventually try to process the message we failed to produce
> again.
>
> I’m concerned that “putting in a queue for later” opens you up to putting
> messages into the destination topic in an unexpected order. However if
> others feel differently, I’m happy to talk about it.
>
> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <wangg...@gmail.com> wrote:
>
>> > Please correct me if I'm wrong, but my understanding is that the record
>> > metadata is always null if an exception occurred while trying to
>> produce.
>>
>> That is right. Thanks.
>>
>> I looked at the example code, and one thing I realized is that since we
>> are not passing the context into the handle function, we may not be able
>> to implement the logic to send the failed records into another queue for
>> future processing. Would people think that would be a big issue?
>>
>>
>> Guozhang
>>
>>
>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <m...@frmr.me> wrote:
>>
>> > Hello all,
>> >
>> > I've updated the KIP based on this conversation, and made it so that its
>> > interface, config setting, and parameters line up more closely with the
>> > interface in KIP-161 (deserialization handler).
>> >
>> > I believe there are a few specific questions I need to reply to.
>> >
>> > > The question I had about the handle parameters is around the record:
>> > > should it be `ProducerRecord<byte[], byte[]>`, or generics like
>> > > `ProducerRecord` or `ProducerRecord<? extends Object, ? extends Object>`?
>> >
>> > At this point in the code we're guaranteed that this is a
>> > ProducerRecord<byte[], byte[]>, so the generics would just make it
>> harder
>> > to work with the key and value.
>> >
>> > > Also, should the handle function include the `RecordMetadata` as well
>> in
>> > > case it is not null?
>> >
>> > Please correct me if I'm wrong, but my understanding is that the record
>> > metadata is always null if an exception occurred while trying to
>> produce.
>> >
>> > > We may probably try to write down at least the following handling
>> logic
>> > and
>> > > see if the given API is sufficient for it
>> >
>> > I've added some examples to the KIP. Let me know what you think.
>> >
>> > Cheers,
>> > Matt
>> >
>> > On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <m...@frmr.me> wrote:
>> >
>> > > Thanks for this feedback. I’m at a conference right now and am
>> planning
>> > on
>> > > updating the KIP again with details from this conversation later this
>> > week.
>> > >
>> > > I’ll shoot you a more detailed response then! :)
>> > > On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <wangg...@gmail.com>
>> > wrote:
>> > >
>> > >> Thanks for the KIP Matt.
>> > >>
>> > >> Regarding the handle interface of ProductionExceptionHandlerResponse,
>> > >> could
>> > >> you write it on the wiki also, along with the actual added config
>> names
>> > >> (e.g. what
>> > >>
>> > >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
>> > deserialization+exception+handlers
>> > >> described).
>> > >>
>> > >> The question I had about the handle parameters is around the
>> > >> record: should it be `ProducerRecord<byte[], byte[]>`, or generics
>> > >> like `ProducerRecord` or `ProducerRecord<? extends Object,
>> > >> ? extends Object>`?
>> > >>
>> > >> Also, should the handle function include the `RecordMetadata`

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-10-27 Thread Matt Farmer
I can’t think of a reason that would be problematic.

Most of the time I would write a handler like this, I either want to ignore
the error or fail and bring everything down so that I can spin it back up
later and resume from earlier offsets. When we start up after crashing
we’ll eventually try to process the message we failed to produce again.

I’m concerned that “putting in a queue for later” opens you up to putting
messages into the destination topic in an unexpected order. However if
others feel differently, I’m happy to talk about it.
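
That said, if folks do want the "another queue" behavior, it could still be
approximated without the ProcessorContext by giving the handler its own
producer. A hedged sketch, assuming the handler interface discussed in this
thread and a hypothetical dead-letter topic name:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

// Hypothetical handler that forwards failed records to a dead-letter topic
// via its own producer and then lets processing continue. Ordering relative
// to the original destination topic is not guaranteed.
public class DeadLetterProductionExceptionHandler implements ProductionExceptionHandler {
    private final Producer<byte[], byte[]> deadLetterProducer;

    public DeadLetterProductionExceptionHandler() {
        final Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: broker address
        this.deadLetterProducer =
                new KafkaProducer<>(props, new ByteArraySerializer(), new ByteArraySerializer());
    }

    @Override
    public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        // Forward the failed record's key/value to a dead-letter topic (name is an assumption).
        deadLetterProducer.send(new ProducerRecord<>("streams-dead-letter", record.key(), record.value()));
        return ProductionExceptionHandlerResponse.CONTINUE;
    }
}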

On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <wangg...@gmail.com> wrote:

> > Please correct me if I'm wrong, but my understanding is that the record
> > metadata is always null if an exception occurred while trying to produce.
>
> That is right. Thanks.
>
> I looked at the example code, and one thing I realized is that since we are
> not passing the context into the handle function, we may not be able to
> implement the logic to send the failed records into another queue for future
> processing. Would people think that would be a big issue?
>
>
> Guozhang
>
>
> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <m...@frmr.me> wrote:
>
> > Hello all,
> >
> > I've updated the KIP based on this conversation, and made it so that its
> > interface, config setting, and parameters line up more closely with the
> > interface in KIP-161 (deserialization handler).
> >
> > I believe there are a few specific questions I need to reply to.
> >
> > > The question I had about the handle parameters is around the record:
> > > should it be `ProducerRecord<byte[], byte[]>`, or generics like
> > > `ProducerRecord` or `ProducerRecord<? extends Object, ? extends Object>`?
> >
> > At this point in the code we're guaranteed that this is a
> > ProducerRecord<byte[], byte[]>, so the generics would just make it harder
> > to work with the key and value.
> >
> > > Also, should the handle function include the `RecordMetadata` as well
> in
> > > case it is not null?
> >
> > Please correct me if I'm wrong, but my understanding is that the record
> > metadata is always null if an exception occurred while trying to produce.
> >
> > > We may probably try to write down at least the following handling logic
> > and
> > > see if the given API is sufficient for it
> >
> > I've added some examples to the KIP. Let me know what you think.
> >
> > Cheers,
> > Matt
> >
> > On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <m...@frmr.me> wrote:
> >
> > > Thanks for this feedback. I’m at a conference right now and am planning
> > on
> > > updating the KIP again with details from this conversation later this
> > week.
> > >
> > > I’ll shoot you a more detailed response then! :)
> > > On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <wangg...@gmail.com>
> > wrote:
> > >
> > >> Thanks for the KIP Matt.
> > >>
> > >> Regarding the handle interface of ProductionExceptionHandlerResponse,
> > >> could
> > >> you write it on the wiki also, along with the actual added config
> names
> > >> (e.g. what
> > >>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> > deserialization+exception+handlers
> > >> described).
> > >>
> > >> The question I had about the handle parameters is around the record:
> > >> should it be `ProducerRecord<byte[], byte[]>`, or generics like
> > >> `ProducerRecord` or `ProducerRecord<? extends Object, ? extends Object>`?
> > >>
> > >> Also, should the handle function include the `RecordMetadata` as well
> in
> > >> case it is not null?
> > >>
> > >> We may probably try to write down at least the following handling
> logic
> > >> and
> > >> see if the given API is sufficient for it: 1) throw exception
> > immediately
> > >> to fail fast and stop the world, 2) log the error and drop record and
> > >> proceed silently, 3) send such errors to a specific "error" Kafka
> topic,
> > >> or
> > >> record it as an app-level metric (
> > >> https://kafka.apache.org/documentation/#kafka_streams_monitoring) for
> > >> monitoring.
> > >>
> > >> Guozhang
> > >>
> > >>
> > >>
> > >> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <m...@frmr.me> wrote:
> > >>
> > 

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-10-26 Thread Matt Farmer
Hello all,

I've updated the KIP based on this conversation, and made it so that its
interface, config setting, and parameters line up more closely with the
interface in KIP-161 (deserialization handler).

I believe there are a few specific questions I need to reply to.

> The question I had about the handle parameters is around the record:
> should it be `ProducerRecord<byte[], byte[]>`, or generics like
> `ProducerRecord` or `ProducerRecord<? extends Object, ? extends Object>`?

At this point in the code we're guaranteed that this is a
ProducerRecord<byte[], byte[]>, so the generics would just make it harder
to work with the key and value.

> Also, should the handle function include the `RecordMetadata` as well in
> case it is not null?

Please correct me if I'm wrong, but my understanding is that the record
metadata is always null if an exception occurred while trying to produce.

> We may probably try to write down at least the following handling logic
and
> see if the given API is sufficient for it

I've added some examples to the KIP. Let me know what you think.
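
For anyone skimming: the producer callback only ever gets one of (metadata,
exception) non-null, which is why the handler signature omits RecordMetadata.
A rough sketch of the delegation point (not the actual RecordCollector code;
handler interface as discussed in this thread):

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch: build the producer callback that delegates failures to the handler.
public class HandlerDelegatingCallbackSketch {
    public static Callback callbackFor(final ProducerRecord<byte[], byte[]> record,
                                       final ProductionExceptionHandler handler) {
        return (metadata, exception) -> {
            if (exception == null) {
                return; // successful send; metadata is non-null here
            }
            // On failure, metadata is null and exception is non-null.
            if (handler.handle(record, exception)
                    == ProductionExceptionHandler.ProductionExceptionHandlerResponse.FAIL) {
                throw new RuntimeException("Unrecoverable error producing record", exception);
            }
            // CONTINUE: drop the record and keep processing.
        };
    }
}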

Cheers,
Matt

On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <m...@frmr.me> wrote:

> Thanks for this feedback. I’m at a conference right now and am planning on
> updating the KIP again with details from this conversation later this week.
>
> I’ll shoot you a more detailed response then! :)
> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <wangg...@gmail.com> wrote:
>
>> Thanks for the KIP Matt.
>>
>> Regarding the handle interface of ProductionExceptionHandlerResponse,
>> could
>> you write it on the wiki also, along with the actual added config names
>> (e.g. what
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers
>> described).
>>
>> The question I had about the handle parameters is around the record:
>> should it be `ProducerRecord<byte[], byte[]>`, or generics like
>> `ProducerRecord` or `ProducerRecord<? extends Object, ? extends Object>`?
>>
>> Also, should the handle function include the `RecordMetadata` as well in
>> case it is not null?
>>
>> We may probably try to write down at least the following handling logic
>> and
>> see if the given API is sufficient for it: 1) throw exception immediately
>> to fail fast and stop the world, 2) log the error and drop record and
>> proceed silently, 3) send such errors to a specific "error" Kafka topic,
>> or
>> record it as an app-level metric (
>> https://kafka.apache.org/documentation/#kafka_streams_monitoring) for
>> monitoring.
>>
>> Guozhang
>>
>>
>>
>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <m...@frmr.me> wrote:
>>
>> > I did some more digging tonight.
>> >
>> > @Ted: It looks like the deserialization handler uses
>> > "default.deserialization.exception.handler" for the config name. No
>> > ".class" on the end. I'm inclined to think this should use
>> > "default.production.exception.handler".
>> >
>> > On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <m...@frmr.me> wrote:
>> >
>> > > Okay, I've dug into this a little bit.
>> > >
>> > > I think getting access to the serialized record is possible, and
>> changing
>> > > the naming and return type is certainly doable. However, because we're
>> > > hooking into the onCompletion callback we have no guarantee that the
>> > > ProcessorContext state hasn't changed by the time this particular
>> handler
>> > > runs. So I think the signature would change to something like:
>> > >
>> > > ProductionExceptionHandlerResponse handle(final ProducerRecord<..>
>> > record,
>> > > final Exception exception)
>> > >
>> > > Would this be acceptable?
>> > >
>> > > On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <m...@frmr.me> wrote:
>> > >
>> > >> Ah good idea. Hmmm. I can line up the naming and return type but I’m
>> not
>> > >> sure if I can get my hands on the context and the record itself
>> without
>> > >> other changes.
>> > >>
>> > >> Let me dig in and follow up here tomorrow.
>> > >> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
>> matth...@confluent.io>
>> > >> wrote:
>> > >>
>> > >>> Thanks for the KIP.
>> > >>>
>> > >>> Are you familiar with KIP-161?
>> > >>>
>>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-10-23 Thread Matt Farmer
Thanks for this feedback. I’m at a conference right now and am planning on
updating the KIP again with details from this conversation later this week.

I’ll shoot you a more detailed response then! :)
On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <wangg...@gmail.com> wrote:

> Thanks for the KIP Matt.
>
> Regarding the handle interface of ProductionExceptionHandlerResponse, could
> you write it on the wiki also, along with the actual added config names
> (e.g. what
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers
> described).
>
> The question I had about the handle parameters is around the record:
> should it be `ProducerRecord<byte[], byte[]>`, or generics like
> `ProducerRecord` or `ProducerRecord<? extends Object, ? extends Object>`?
>
> Also, should the handle function include the `RecordMetadata` as well in
> case it is not null?
>
> We may probably try to write down at least the following handling logic and
> see if the given API is sufficient for it: 1) throw exception immediately
> to fail fast and stop the world, 2) log the error and drop record and
> proceed silently, 3) send such errors to a specific "error" Kafka topic, or
> record it as an app-level metric (
> https://kafka.apache.org/documentation/#kafka_streams_monitoring) for
> monitoring.
>
> Guozhang
>
>
>
> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <m...@frmr.me> wrote:
>
> > I did some more digging tonight.
> >
> > @Ted: It looks like the deserialization handler uses
> > "default.deserialization.exception.handler" for the config name. No
> > ".class" on the end. I'm inclined to think this should use
> > "default.production.exception.handler".
> >
> > On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <m...@frmr.me> wrote:
> >
> > > Okay, I've dug into this a little bit.
> > >
> > > I think getting access to the serialized record is possible, and
> changing
> > > the naming and return type is certainly doable. However, because we're
> > > hooking into the onCompletion callback we have no guarantee that the
> > > ProcessorContext state hasn't changed by the time this particular
> handler
> > > runs. So I think the signature would change to something like:
> > >
> > > ProductionExceptionHandlerResponse handle(final ProducerRecord<..>
> > record,
> > > final Exception exception)
> > >
> > > Would this be acceptable?
> > >
> > > On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <m...@frmr.me> wrote:
> > >
> > >> Ah good idea. Hmmm. I can line up the naming and return type but I’m
> not
> > >> sure if I can get my hands on the context and the record itself
> without
> > >> other changes.
> > >>
> > >> Let me dig in and follow up here tomorrow.
> > >> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
> matth...@confluent.io>
> > >> wrote:
> > >>
> > >>> Thanks for the KIP.
> > >>>
> > >>> Are you familiar with KIP-161?
> > >>>
> > >>>
> > >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> > deserialization+exception+handlers
> > >>>
> > >>> I think we should align the design (parameter naming, return types,
> > >>> class names, etc.) of KIP-210 with KIP-161 to get a unified user
> > >>> experience.
> > >>>
> > >>>
> > >>>
> > >>> -Matthias
> > >>>
> > >>>
> > >>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> > >>> > I’ll create the JIRA ticket.
> > >>> >
> > >>> > I think that config name will work. I’ll update the KIP
> accordingly.
> > >>> > On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <yuzhih...@gmail.com>
> wrote:
> > >>> >
> > >>> >> Can you create JIRA that corresponds to the KIP ?
> > >>> >>
> > >>> >> For the new config, how about naming it
> > >>> >> production.exception.processor.class
> > >>> >> ? This way it is clear that class name should be specified.
> > >>> >>
> > >>> >> Cheers
> > >>> >>
> > >>> >> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <m...@frmr.me>
> wrote:
> > >>> >>
> > >>> >>> Hello everyone,
> > >>> >>>
> > >>> >>> This is the discussion thread for the KIP that I just filed here:
> > >>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >>> >>> 210+-+Provide+for+custom+error+handling++when+Kafka+
> > >>> >>> Streams+fails+to+produce
> > >>> >>>
> > >>> >>> Looking forward to getting some feedback from folks about this
> idea
> > >>> and
> > >>> >>> working toward a solution we can contribute back. :)
> > >>> >>>
> > >>> >>> Cheers,
> > >>> >>> Matt Farmer
> > >>> >>>
> > >>> >>
> > >>> >
> > >>>
> > >>>
> >
>
>
>
> --
> -- Guozhang
>


Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-10-20 Thread Matt Farmer
I did some more digging tonight.

@Ted: It looks like the deserialization handler uses
"default.deserialization.exception.handler" for the config name. No
".class" on the end. I'm inclined to think this should use
"default.production.exception.handler".

On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <m...@frmr.me> wrote:

> Okay, I've dug into this a little bit.
>
> I think getting access to the serialized record is possible, and changing
> the naming and return type is certainly doable. However, because we're
> hooking into the onCompletion callback we have no guarantee that the
> ProcessorContext state hasn't changed by the time this particular handler
> runs. So I think the signature would change to something like:
>
> ProductionExceptionHandlerResponse handle(final ProducerRecord<..> record,
> final Exception exception)
>
> Would this be acceptable?
>
> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <m...@frmr.me> wrote:
>
>> Ah good idea. Hmmm. I can line up the naming and return type but I’m not
>> sure if I can get my hands on the context and the record itself without
>> other changes.
>>
>> Let me dig in and follow up here tomorrow.
>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <matth...@confluent.io>
>> wrote:
>>
>>> Thanks for the KIP.
>>>
>>> Are you familiar with KIP-161?
>>>
>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers
>>>
>>> I think we should align the design (parameter naming, return types,
>>> class names, etc.) of KIP-210 with KIP-161 to get a unified user experience.
>>>
>>>
>>>
>>> -Matthias
>>>
>>>
>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>>> > I’ll create the JIRA ticket.
>>> >
>>> > I think that config name will work. I’ll update the KIP accordingly.
>>> > On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <yuzhih...@gmail.com> wrote:
>>> >
>>> >> Can you create JIRA that corresponds to the KIP ?
>>> >>
>>> >> For the new config, how about naming it
>>> >> production.exception.processor.class
>>> >> ? This way it is clear that class name should be specified.
>>> >>
>>> >> Cheers
>>> >>
>>> >> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <m...@frmr.me> wrote:
>>> >>
>>> >>> Hello everyone,
>>> >>>
>>> >>> This is the discussion thread for the KIP that I just filed here:
>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> >>> 210+-+Provide+for+custom+error+handling++when+Kafka+
>>> >>> Streams+fails+to+produce
>>> >>>
>>> >>> Looking forward to getting some feedback from folks about this idea
>>> and
>>> >>> working toward a solution we can contribute back. :)
>>> >>>
>>> >>> Cheers,
>>> >>> Matt Farmer
>>> >>>
>>> >>
>>> >
>>>
>>>


Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-10-20 Thread Matt Farmer
Okay, I've dug into this a little bit.

I think getting access to the serialized record is possible, and changing
the naming and return type is certainly doable. However, because we're
hooking into the onCompletion callback we have no guarantee that the
ProcessorContext state hasn't changed by the time this particular handler
runs. So I think the signature would change to something like:

ProductionExceptionHandlerResponse handle(final ProducerRecord<..> record,
final Exception exception)

Would this be acceptable?
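
To make it concrete, a minimal sketch of what I have in mind (names are still
up for discussion and should not be read as final):

import org.apache.kafka.clients.producer.ProducerRecord;

// Draft shape of the handler interface discussed in this thread.
public interface ProductionExceptionHandler {
    enum ProductionExceptionHandlerResponse {
        CONTINUE, // drop the record and keep processing
        FAIL      // stop processing and surface the exception
    }

    ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                              final Exception exception);
}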

On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <m...@frmr.me> wrote:

> Ah good idea. Hmmm. I can line up the naming and return type but I’m not
> sure if I can get my hands on the context and the record itself without
> other changes.
>
> Let me dig in and follow up here tomorrow.
> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <matth...@confluent.io>
> wrote:
>
>> Thanks for the KIP.
>>
>> Are you familiar with KIP-161?
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers
>>
>> I think we should align the design (parameter naming, return types,
>> class names, etc.) of KIP-210 with KIP-161 to get a unified user experience.
>>
>>
>>
>> -Matthias
>>
>>
>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>> > I’ll create the JIRA ticket.
>> >
>> > I think that config name will work. I’ll update the KIP accordingly.
>> > On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <yuzhih...@gmail.com> wrote:
>> >
>> >> Can you create JIRA that corresponds to the KIP ?
>> >>
>> >> For the new config, how about naming it
>> >> production.exception.processor.class
>> >> ? This way it is clear that class name should be specified.
>> >>
>> >> Cheers
>> >>
>> >> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <m...@frmr.me> wrote:
>> >>
>> >>> Hello everyone,
>> >>>
>> >>> This is the discussion thread for the KIP that I just filed here:
>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> >>> 210+-+Provide+for+custom+error+handling++when+Kafka+
>> >>> Streams+fails+to+produce
>> >>>
>> >>> Looking forward to getting some feedback from folks about this idea
>> and
>> >>> working toward a solution we can contribute back. :)
>> >>>
>> >>> Cheers,
>> >>> Matt Farmer
>> >>>
>> >>
>> >
>>
>>


Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-10-19 Thread Matt Farmer
Ah good idea. Hmmm. I can line up the naming and return type but I’m not
sure if I can get my hands on the context and the record itself without
other changes.

Let me dig in and follow up here tomorrow.
On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <matth...@confluent.io>
wrote:

> Thanks for the KIP.
>
> Are you familiar with KIP-161?
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers
>
> I think we should align the design (parameter naming, return types,
> class names, etc.) of KIP-210 with KIP-161 to get a unified user experience.
>
>
>
> -Matthias
>
>
> On 10/18/17 4:20 PM, Matt Farmer wrote:
> > I’ll create the JIRA ticket.
> >
> > I think that config name will work. I’ll update the KIP accordingly.
> > On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <yuzhih...@gmail.com> wrote:
> >
> >> Can you create JIRA that corresponds to the KIP ?
> >>
> >> For the new config, how about naming it
> >> production.exception.processor.class
> >> ? This way it is clear that class name should be specified.
> >>
> >> Cheers
> >>
> >> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <m...@frmr.me> wrote:
> >>
> >>> Hello everyone,
> >>>
> >>> This is the discussion thread for the KIP that I just filed here:
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>> 210+-+Provide+for+custom+error+handling++when+Kafka+
> >>> Streams+fails+to+produce
> >>>
> >>> Looking forward to getting some feedback from folks about this idea and
> >>> working toward a solution we can contribute back. :)
> >>>
> >>> Cheers,
> >>> Matt Farmer
> >>>
> >>
> >
>
>


[jira] [Created] (KAFKA-6086) KIP-210 Provide for custom error handling when Kafka Streams fails to produce

2017-10-18 Thread Matt Farmer (JIRA)
Matt Farmer created KAFKA-6086:
--

 Summary: KIP-210 Provide for custom error handling when Kafka 
Streams fails to produce
 Key: KAFKA-6086
 URL: https://issues.apache.org/jira/browse/KAFKA-6086
 Project: Kafka
  Issue Type: Improvement
Reporter: Matt Farmer


This is an issue related to the following KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-210+-+Provide+for+custom+error+handling++when+Kafka+Streams+fails+to+produce



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-10-18 Thread Matt Farmer
I’ll create the JIRA ticket.

I think that config name will work. I’ll update the KIP accordingly.
On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <yuzhih...@gmail.com> wrote:

> Can you create JIRA that corresponds to the KIP ?
>
> For the new config, how about naming it
> production.exception.processor.class
> ? This way it is clear that class name should be specified.
>
> Cheers
>
> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <m...@frmr.me> wrote:
>
> > Hello everyone,
> >
> > This is the discussion thread for the KIP that I just filed here:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 210+-+Provide+for+custom+error+handling++when+Kafka+
> > Streams+fails+to+produce
> >
> > Looking forward to getting some feedback from folks about this idea and
> > working toward a solution we can contribute back. :)
> >
> > Cheers,
> > Matt Farmer
> >
>


[DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

2017-10-18 Thread Matt Farmer
Hello everyone,

This is the discussion thread for the KIP that I just filed here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-210+-+Provide+for+custom+error+handling++when+Kafka+Streams+fails+to+produce

Looking forward to getting some feedback from folks about this idea and
working toward a solution we can contribute back. :)

Cheers,
Matt Farmer


KIP Access

2017-10-18 Thread Matt Farmer
Hey everyone,

I'm a software engineer at MailChimp working on our Data Systems team. I'm
looking to file a KIP to improve the error handling hooks that Kafka
Streams exposes when producing messages. We've got a use case internally
that requires us to be able to ignore certain classes of errors (in this
case RecordTooLargeException), log some details about the message, and
carry on processing other messages. We developed some changes to allow this
internally, and would like to kick off the process of contributing back a
similar change upstream. (I kind of expect what we contribute back to have
a bit of a different shape than what we built internally.)
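
To give a flavor of what we're after, here's a hedged sketch of that internal
behavior, written against a hypothetical handler interface (the KIP will
propose the real names): log and skip RecordTooLargeException, fail on
anything else.

import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RecordTooLargeException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical handler: ignore over-sized records after logging some details,
// and fail fast for any other produce error.
public class IgnoreRecordTooLargeHandler implements ProductionExceptionHandler {
    private static final Logger log = LoggerFactory.getLogger(IgnoreRecordTooLargeHandler.class);

    @Override
    public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        if (exception instanceof RecordTooLargeException) {
            log.warn("Dropping over-sized record for topic {} (value size: {} bytes)",
                     record.topic(), record.value() == null ? 0 : record.value().length);
            return ProductionExceptionHandlerResponse.CONTINUE;
        }
        return ProductionExceptionHandlerResponse.FAIL;
    }
}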

It looks like filing a KIP is the correct next step here, but it looks like
I need some additional permissions on Confluence. What's the process for
getting those permissions? My Confluence username is farmdawgnation, if
that helps.

Thanks,
Matt


[jira] [Resolved] (KAFKA-5207) Addition of a way to manually revoke individual partitions from a consumer

2017-05-09 Thread Matt Farmer (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Farmer resolved KAFKA-5207.

Resolution: Fixed

Heh, it was pointed out to me that assignment is an overwrite operation, not a 
cumulative operation.
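
In other words, because assign() replaces the full assignment, "revoking" a
partition is just assigning the remaining set again. A quick sketch (broker
address and topic name are placeholders):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class ManualRevokeSketch {
    public static void main(String[] args) {
        final Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        final KafkaConsumer<byte[], byte[]> consumer =
                new KafkaConsumer<>(props, new ByteArrayDeserializer(), new ByteArrayDeserializer());

        final TopicPartition p0 = new TopicPartition("my-topic", 0);
        final TopicPartition p3 = new TopicPartition("my-topic", 3);
        consumer.assign(Arrays.asList(p0, p3));

        // "Revoke" partition 3 by re-assigning everything except it.
        final Set<TopicPartition> remaining = new HashSet<>(consumer.assignment());
        remaining.remove(p3);
        consumer.assign(remaining);

        consumer.close();
    }
}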

> Addition of a way to manually revoke individual partitions from a consumer
> --
>
> Key: KAFKA-5207
> URL: https://issues.apache.org/jira/browse/KAFKA-5207
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>    Reporter: Matt Farmer
>
> The {{Consumer.assign}} call permits us to manually assign topic and 
> partition pairs to a consumer. Unfortunately, there is no equivalent that 
> allows us to manually _revoke_ those partitions from the consumer. I'd like 
> to propose the addition of a {{Consumer.revoke}} that also takes a 
> {{Collection<TopicPartition>}} that will cause the Consumer to stop seeing 
> messages from that topic and partition pair in a manually assigned scenario.
> For some wider context on this request: I'm working on a problem where I need 
> two Consumers to consume from different topics on different threads. But the 
> topics are partitioned identically, and so if Consumer A is subscribed to 
> partition 3, Consumer B also needs to be subscribed to partition 3.
> The addition of this API would permit me to let Consumer A's partition 
> assignments be managed by Kafka, and allow me to use a 
> {{ConsumerRebalanceListener}} to tweak Consumer B's configuration when 
> there's a rebalance event.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5207) Addition of a way to manually revoke individual partitions from a consumer

2017-05-09 Thread Matt Farmer (JIRA)
Matt Farmer created KAFKA-5207:
--

 Summary: Addition of a way to manually revoke individual 
partitions from a consumer
 Key: KAFKA-5207
 URL: https://issues.apache.org/jira/browse/KAFKA-5207
 Project: Kafka
  Issue Type: Improvement
  Components: consumer
Reporter: Matt Farmer


The {{Consumer.assign}} call permits us to manually assign topic and partition 
pairs to a consumer. Unfortunately, there is no equivalent that allows us to 
manually _revoke_ those partitions from the consumer. I'd like to propose the 
addition of a {{Consumer.revoke}} that also takes a 
{{Collection<TopicPartition>}} that will cause the Consumer to stop seeing 
messages from that topic and partition pair in a manually assigned scenario.

For some wider context on this request: I'm working on a problem where I need 
two Consumers to consume from different topics on different threads. But the 
topics are partitioned identically, and so if Consumer A is subscribed to 
partition 3, Consumer B also needs to be subscribed to partition 3.

The addition of this API would permit me to let Consumer A's partition 
assignments be managed by Kafka, and allow me to use a 
{{ConsumerRebalanceListener}} to tweak Consumer B's configuration when there's 
a rebalance event.
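
A sketch of that workaround under the current API (the topic name "topic-b" is a 
placeholder, and in practice Consumer B must only be touched from its own thread, 
e.g. by handing it the new assignment via a queue):

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

// When Consumer A is assigned partitions of its topic, mirror the same
// partition numbers of the identically partitioned "topic-b" onto Consumer B.
public class MirrorAssignmentListener implements ConsumerRebalanceListener {
    private final Consumer<byte[], byte[]> consumerB;

    public MirrorAssignmentListener(final Consumer<byte[], byte[]> consumerB) {
        this.consumerB = consumerB;
    }

    @Override
    public void onPartitionsAssigned(final Collection<TopicPartition> partitions) {
        final List<TopicPartition> mirrored = new ArrayList<>();
        for (final TopicPartition tp : partitions) {
            mirrored.add(new TopicPartition("topic-b", tp.partition()));
        }
        consumerB.assign(mirrored); // assign() overwrites B's previous assignment
    }

    @Override
    public void onPartitionsRevoked(final Collection<TopicPartition> partitions) {
        // Nothing to do: the next onPartitionsAssigned overwrites B's assignment.
    }
}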



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)