Re: Permission to contribute to Apache Kafka

2024-05-22 Thread Yash Mayya
Hi Frédérik,

I've granted you the necessary permissions. Let me know if something
doesn't work as expected.

Cheers,
Yash

On Wed, May 22, 2024 at 1:38 PM Frédérik Rouleau
 wrote:

> Hi,
> As I now have my wiki Id: frouleau and my Jira Id: fred-ro, can I have the
> permission to contribute to KIP ?
>
> Regards,
>


Re: [ANNOUNCE] New committer: Igor Soarez

2024-04-24 Thread Yash Mayya
Congratulations Igor!

On Wed, 24 Apr, 2024, 23:36 Colin McCabe,  wrote:

> Hi all,
>
> The PMC of Apache Kafka is pleased to announce a new Kafka committer, Igor
> Soarez.
>
> Igor has been a Kafka contributor since 2019. In addition to being a
> regular contributor and reviewer, he has made significant contributions to
> improving Kafka's JBOD support in KRaft mode. He has also contributed to
> discussing and reviewing many KIPs such as KIP-690, KIP-554, KIP-866, and
> KIP-938.
>
> Congratulations, Igor!
>
> Thanks,
>
> Colin (on behalf of the Apache Kafka PMC)
>


Re: [ANNOUNCE] New Kafka PMC Member: Greg Harris

2024-04-13 Thread Yash Mayya
Congrats Greg!

On Sun, 14 Apr, 2024, 05:56 Randall Hauch,  wrote:

> Congratulations, Greg!
>
> On Sat, Apr 13, 2024 at 6:36 PM Luke Chen  wrote:
>
> > Congrats, Greg!
> >
> > On Sun, Apr 14, 2024 at 7:05 AM Viktor Somogyi-Vass
> >  wrote:
> >
> > > Congrats Greg! :)
> > >
> > > On Sun, Apr 14, 2024, 00:35 Bill Bejeck  wrote:
> > >
> > > > Congrats Greg!
> > > >
> > > > -Bill
> > > >
> > > > On Sat, Apr 13, 2024 at 4:25 PM Boudjelda Mohamed Said <
> > > bmsc...@gmail.com>
> > > > wrote:
> > > >
> > > > > Congratulations Greg
> > > > >
> > > > > On Sat 13 Apr 2024 at 20:42, Chris Egerton 
> > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Greg Harris has been a Kafka committer since July 2023. He has
> > > remained
> > > > > > very active and instructive in the community since becoming a
> > > > committer.
> > > > > > It's my pleasure to announce that Greg is now a member of Kafka
> > PMC.
> > > > > >
> > > > > > Congratulations, Greg!
> > > > > >
> > > > > > Chris, on behalf of the Apache Kafka PMC
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] KIP-477: Add PATCH method for connector config in Connect REST API

2024-04-12 Thread Yash Mayya
Hi Ivan,

Thanks for reviving this KIP, I think it will be a useful addition to
Connect!
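
For anyone skimming the archive, a rough sketch of the proposed usage
(connector name and config keys below are illustrative):

  PATCH /connectors/example-connector/config
  Content-Type: application/json

  { "tasks.max": "4" }

Per the KIP, only the keys included in the request body are modified;
omitted keys keep their existing values.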

+1 (binding)

Cheers,
Yash

On Tue, Apr 9, 2024 at 4:23 AM Knowles Atchison Jr 
wrote:

> +1 (non binding)
>
> On Mon, Apr 8, 2024, 3:30 PM Chris Egerton 
> wrote:
>
> > Thanks Ivan! +1 (binding) from me.
> >
> > On Mon, Apr 8, 2024, 06:59 Ivan Yurchenko  wrote:
> >
> > > Hello!
> > >
> > > I'd like to put the subj KIP[1] to a vote. Thank you.
> > >
> > > Best regards,
> > > Ivan
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-477%3A+Add+PATCH+method+for+connector+config+in+Connect+REST+API
> > >
> >
>


Re: Permission to assign tickets in Jira

2024-03-27 Thread Yash Mayya
Hi Pavel,

I've granted you the necessary permissions. Thanks for your interest in
contributing to the Apache Kafka project!

Cheers,
Yash

On Tue, Mar 26, 2024 at 11:32 PM Pavel Pozdeev 
wrote:

>
> Hi Team,
>
> Would it be possible to get a permission to assign tickets in Jira?
> I've got a Jira account, but can only leave a comments on tickets, can not
> assign a ticket to myself.
> My Jira username: pasharik
>
> Best,
> Pavel Pozdeev
>


Re: [ANNOUNCE] New committer: Christo Lolov

2024-03-26 Thread Yash Mayya
Congratulations Christo!

On Tue, Mar 26, 2024 at 5:34 PM Luke Chen  wrote:

> Hi, Everyone,
>
> The PMC of Apache Kafka is pleased to announce a new Kafka committer:
> Christo Lolov.
>
> Christo has been a Kafka contributor since 2021. He has made over 50
> commits. He authored KIP-902, KIP-963, and KIP-1005, as well as many tiered
> storage related tasks. He also co-drives the migration from EasyMock to
> Mockito and from Junit 4 to JUnit 5.
>
> Congratulations, Christo!
>
> Thanks,
> Luke (on behalf of the Apache Kafka PMC)
>


Re: [VOTE] KIP-995: Allow users to specify initial offsets while creating connectors

2024-03-05 Thread Yash Mayya
Hi Chris,

I followed up with Ashwin offline and I believe he wanted to take a closer
look at the `ConnectorInfoWithInitialOffsetsResponse` stuff he mentioned in
the previous email and whether or not that'll be required (alternatively
using some Jackson JSON tricks). However, that's an implementation detail
and shouldn't hold up the KIP. Bikeshedding a little on the
"initial_offsets_response" field - I'm wondering if something like
"offsets_status" might be more appropriate, what do you think? I don't
think the current name is terrible though, so I'm +1 (binding) if everyone
else agrees that it's suitable.
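
For illustration, the connector creation response with the new field might
look roughly like this (the connector details and message wording below are
placeholders, not the exact strings from the KIP):

  {
    "name": "example-source",
    "type": "source",
    "config": { "connector.class": "<source connector class>", "tasks.max": "1" },
    "tasks": [],
    "initial_offsets_response": "Initial offsets written successfully."
  }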

Thanks,
Yash

On Tue, Mar 5, 2024 at 9:51 PM Chris Egerton 
wrote:

> Hi all,
>
> Wanted to bump this and see if it looks good enough for a third vote. Yash,
> any thoughts?
>
> Cheers,
>
> Chris
>
> On Mon, Jan 29, 2024 at 2:55 AM Ashwin 
> wrote:
>
> > Thanks for reviewing this KIP,  Yash.
> >
> > Could you please elaborate on the cleanup steps? For instance, if we
> > > encounter an error after wiping existing offsets but before writing the
> > new
> > > offsets, there's not really any good way to "revert" the wiped offsets.
> > > It's definitely extremely unlikely that a user would expect the
> previous
> > > offsets for a connector to still be present (by creating a new
> connector
> > > with the same name but without initial offsets for instance) after
> such a
> > > failed operation, but it would still be good to call this out
> > explicitly. I
> > > presume that we'd want to wipe the newly written initial offsets if we
> > fail
> > > while writing the connector's config however?
> >
> >
> > Agree - I have clarified the cleanup here -
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-995%3A+Allow+users+to+specify+initial+offsets+while+creating+connectors#KIP995:Allowuserstospecifyinitialoffsetswhilecreatingconnectors-ProposedChanges
> > .
> >
> > The `PATCH /connectors/{connector}/offsets` and `DELETE
> > > /connectors/{connector}/offsets` endpoints have two possible success
> > > messages in the response depending on whether or not the connector
> plugin
> > > has implemented the `alterOffsets` connector method. Since we're
> > proposing
> > > to utilize the same offset validation during connector creation if
> > initial
> > > offsets are specified, I think it would be valuable to surface similar
> > > information to users here as well
> >
> >
> > Thanks for pointing this out. I have updated the response to include a
> new
> > field “initial_offsets_response” which will contain the response based on
> > whether the connector implements alterOffsets or not. This also means
> that
> > if initial_offsets is set in the ConnectorCreate request, we will return
> a
> > new REST entity (ConnectorInfoWithInitialOffsetsResponse ?) which will
> be a
> > child class of ConnectorInfo.
> >
> > (
> >
> >
> https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/rest/entities/ConnectorInfo.java#L28-L28
> > )
> >
> > Thanks,
> > Ashwin
> >
> > On Wed, Jan 17, 2024 at 4:48 PM Yash Mayya  wrote:
> >
> > > Hi Ashwin,
> > >
> > > Thanks for the KIP.
> > >
> > > > If Connect runtime encounters an error in any of these steps,
> > > > it will cleanup (if required) and return an error response
> > >
> > > Could you please elaborate on the cleanup steps? For instance, if we
> > > encounter an error after wiping existing offsets but before writing the
> > new
> > > offsets, there's not really any good way to "revert" the wiped offsets.
> > > It's definitely extremely unlikely that a user would expect the
> previous
> > > offsets for a connector to still be present (by creating a new
> connector
> > > with the same name but without initial offsets for instance) after
> such a
> > > failed operation, but it would still be good to call this out
> > explicitly. I
> > > presume that we'd want to wipe the newly written initial offsets if we
> > fail
> > > while writing the connector's config however?
> > >
> > > > Validate the offset using the same checks performed while
> > > > altering connector offsets (PATCH /$connector/offsets ) as
> > > > specified in KIP-875
> > >
> > > The `PATCH /connectors/{connector}/offsets` and `DELETE
> > > /connectors/{connector}/offsets` endpoints h

Re: [VOTE] KIP-910: Update Source offsets for Source Connectors without producing records

2024-01-30 Thread Yash Mayya
Hi Sagar,

Thanks for the KIP and apologies for the extremely long delay here! I think
we could do with some wordsmithing on the Javadoc for the new
`SourceTask::updateOffsets` method but that can be taken care of in the PR.

+1 (binding)

Thanks,
Yash

On Wed, Nov 15, 2023 at 11:43 PM Sagar  wrote:

> Hey all,
>
> Bumping this vote thread again after quite a while.
>
> Thanks!
> Sagar.
>
> On Wed, Sep 6, 2023 at 3:58 PM Sagar  wrote:
>
> > Hi All,
> >
> > Based on the latest discussion thread, it appears as if all open
> questions
> > have been answered.
> >
> > Hopefully now we are in a state where we can close out on the Voting
> > process.
> >
> > Thanks everyone for the great feedback.
> >
> > Thanks!
> > Sagar.
> >
> > On Fri, Aug 18, 2023 at 9:00 AM Sagar  wrote:
> >
> >> Hi All,
> >>
> >> Bumping the voting thread again.
> >>
> >> Thanks!
> >> Sagar.
> >>
> >> On Wed, Aug 2, 2023 at 4:43 PM Sagar  wrote:
> >>
> >>> Attaching the KIP link for reference:
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-910%3A+Update+Source+offsets+for+Source+Connectors+without+producing+records
> >>>
> >>> Thanks!
> >>> Sagar.
> >>>
> >>> On Wed, Aug 2, 2023 at 4:37 PM Sagar 
> wrote:
> >>>
>  Hi All,
> 
>  Calling a Vote on KIP-910 [1]. I feel we have converged to a
> reasonable
>  design. Ofcourse I am open to any feedback/suggestions and would
> address
>  them.
> 
>  Thanks!
>  Sagar.
> 
> >>>
>


[jira] [Created] (KAFKA-16196) Cast transform doesn't handle invalid whole value casts gracefully

2024-01-25 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-16196:
--

 Summary: Cast transform doesn't handle invalid whole value casts 
gracefully
 Key: KAFKA-16196
 URL: https://issues.apache.org/jira/browse/KAFKA-16196
 Project: Kafka
  Issue Type: Bug
  Components: connect
Reporter: Yash Mayya
Assignee: Yash Mayya


The Cast transform currently doesn't handle invalid whole value casts 
gracefully. A whole value cast is configured like {"spec": "int8"} as 
opposed to a field level cast like {"spec": "field1:int8"}.

 

If an invalid field level cast is specified (for instance, {"spec": 
"field1:invalid"}), this results in a {{ConfigException}} being thrown here 
- 
[https://github.com/apache/kafka/blob/5f410ceb04878ca44d2d007655155b5303a47907/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/Cast.java#L416]
 which is handled gracefully as a validation error here - 
[https://github.com/apache/kafka/blob/5f410ceb04878ca44d2d007655155b5303a47907/clients/src/main/java/org/apache/kafka/common/config/ConfigDef.java#L605-L609]

 

However, invalid whole value casts aren't handled appropriately and result in 
an 
{{IllegalArgumentException}} being thrown, which surfaces as an uncaught 
exception and a {{500 Internal Server Error}} response from the connector 
create / update / config validation REST API endpoint.
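
For reference, a minimal illustration of the two styles in a connector
configuration (the transform alias "cast" is arbitrary):

  Field level cast:
    "transforms": "cast",
    "transforms.cast.type": "org.apache.kafka.connect.transforms.Cast$Value",
    "transforms.cast.spec": "field1:int8"

  Whole value cast (the entire record value is cast):
    "transforms": "cast",
    "transforms.cast.type": "org.apache.kafka.connect.transforms.Cast$Value",
    "transforms.cast.spec": "int8"

An invalid whole value spec (e.g. "transforms.cast.spec": "invalid") in the
second form is what triggers the uncaught IllegalArgumentException described
above.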



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-995: Allow users to specify initial offsets while creating connectors

2024-01-17 Thread Yash Mayya
Hi Ashwin,

Thanks for the KIP.

> If Connect runtime encounters an error in any of these steps,
> it will cleanup (if required) and return an error response

Could you please elaborate on the cleanup steps? For instance, if we
encounter an error after wiping existing offsets but before writing the new
offsets, there's not really any good way to "revert" the wiped offsets.
It's definitely extremely unlikely that a user would expect the previous
offsets for a connector to still be present (by creating a new connector
with the same name but without initial offsets for instance) after such a
failed operation, but it would still be good to call this out explicitly. I
presume that we'd want to wipe the newly written initial offsets if we fail
while writing the connector's config however?

> Validate the offset using the same checks performed while
> altering connector offsets (PATCH /$connector/offsets ) as
> specified in KIP-875

The `PATCH /connectors/{connector}/offsets` and `DELETE
/connectors/{connector}/offsets` endpoints have two possible success
messages in the response depending on whether or not the connector plugin
has implemented the `alterOffsets` connector method. Since we're proposing
to utilize the same offset validation during connector creation if initial
offsets are specified, I think it would be valuable to surface similar
information to users here as well. Thoughts?
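
To make this concrete, a create request with initial offsets might look
roughly like the following (the source partition / offset keys are
placeholders for a hypothetical file source; the list format mirrors the
offsets body introduced in KIP-875):

  POST /connectors
  {
    "name": "example-source",
    "config": {
      "connector.class": "<source connector class>",
      "tasks.max": "1"
    },
    "initial_offsets": [
      {
        "partition": { "filename": "/tmp/input.txt" },
        "offset": { "position": 1024 }
      }
    ]
  }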

Thanks,
Yash

On Wed, Jan 17, 2024 at 3:31 PM Ashwin  wrote:

> Hi All ,
>
> Can I please get one more binding vote, so that the KIP is approved ?
> Thanks for the votes Chris and Mickael !
>
>
> - Ashwin
>
>
> On Thu, Jan 11, 2024 at 3:55 PM Mickael Maison 
> wrote:
>
> > Hi Ashwin,
> >
> > +1 (binding), thanks for the KIP
> >
> > Mickael
> >
> > On Tue, Jan 9, 2024 at 4:54 PM Chris Egerton 
> > wrote:
> > >
> > > Thanks for the KIP! +1 (binding)
> > >
> > > On Mon, Jan 8, 2024 at 9:35 AM Ashwin 
> > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I would like to start  a vote on KIP-995.
> > > >
> > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-995%3A+Allow+users+to+specify+initial+offsets+while+creating+connectors
> > > >
> > > > Discussion thread -
> > > > https://lists.apache.org/thread/msorbr63scglf4484yq764v7klsj7c4j
> > > >
> > > > Thanks!
> > > >
> > > > Ashwin
> > > >
> >
>


Re: [VOTE] KIP-1004: Enforce tasks.max property in Kafka Connect

2024-01-03 Thread Yash Mayya
Hi Chris,

+1 (binding), thanks for the KIP.

Based on discussion in other threads, it looks like the community is
aligned with having a 3.8 release before the 4.0 release so we should be
able to remove the 'tasks.max.enforce' connector property in 4.0 (we'd
discussed potentially having to live with this property until 5.0 in this
KIP's discussion thread). Once we have confirmation of a 3.8 release, will
this KIP be updated to reflect the exact AK versions where the deprecated
property will be introduced and removed?

Thanks,
Yash

On Wed, Jan 3, 2024 at 11:37 PM Greg Harris 
wrote:

> Hey Chris,
>
> Thanks for the KIP! I think the aggressive default and deprecation
> schedule is the right choice for this change.
>
> +1 (binding)
>
> On Wed, Jan 3, 2024 at 9:01 AM Mickael Maison 
> wrote:
> >
> > Hi Chris,
> >
> > +1 (binding), thanks for the KIP
> >
> > Mickael
> >
> > On Tue, Jan 2, 2024 at 8:55 PM Hector Geraldino (BLOOMBERG/ 919 3RD A)
> >  wrote:
> > >
> > > +1 (non-binding)
> > >
> > > Thanks Chris!
> > >
> > > From: dev@kafka.apache.org At: 01/02/24 11:49:18 UTC-5:00To:
> dev@kafka.apache.org
> > > Subject: Re: [VOTE] KIP-1004: Enforce tasks.max property in Kafka
> Connect
> > >
> > > Hi all,
> > >
> > > Happy New Year! Wanted to give this a bump now that the holidays are
> over
> > > for a lot of us. Looking forward to people's thoughts!
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Mon, Dec 4, 2023 at 10:36 AM Chris Egerton  wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'd like to call for a vote on KIP-1004, which adds enforcement for
> the
> > > > tasks.max connector property in Kafka Connect.
> > > >
> > > > The KIP:
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1004%3A+Enforce+tasks.max+
> > > property+in+Kafka+Connect
> > > >
> > > > The discussion thread:
> > > > https://lists.apache.org/thread/scx75cjwm19jyt19wxky41q9smf5nx6d
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > >
> > >
>


Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-28 Thread Yash Mayya
Congratulations Divij!

On Wed, Dec 27, 2023 at 5:15 PM Luke Chen  wrote:

> Hi, Everyone,
>
> Divij has been a Kafka committer since June, 2023. He has remained very
> active and instructive in the community since becoming a committer. It's my
> pleasure to announce that Divij is now a member of Kafka PMC.
>
> Congratulations Divij!
>
> Luke
> on behalf of Apache Kafka PMC
>


Re: Requesting permission to contribute to Apache Kafka

2023-12-14 Thread Yash Mayya
Hi Roman,

I've granted the required permissions to your accounts.

Cheers,
Yash

On Fri, Dec 15, 2023 at 12:12 PM Роман Бондарь  wrote:

> Hi all,
>
> Please add me as a contributor to kafka project
>
> JIRA username: rbond
> Wiki username: rbond
> GitHub username: gitrbond
>
> Thank you,
> Roman Bondar
>


Re: [DISCUSS] KIP-1004: Enforce tasks.max property in Kafka Connect

2023-11-22 Thread Yash Mayya
Hi Chris,

Thanks for the well written and comprehensive KIP! Given that we're already
past the KIP freeze deadline for 3.7.0 (
https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.7.0) and
there may not be a 3.8.0 release before the 4.0.0 release, would we then be
forced to punt the removal of "tasks.max.enforce" to a future 5.0.0
release? I don't have any other comments, and the proposed changes make
sense to me.
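
For context, the escape hatch proposed in the KIP is a per-connector
property, so a config that opts out of the new enforcement would look
something like this (connector name and class are placeholders):

  {
    "name": "example-connector",
    "config": {
      "connector.class": "<connector class>",
      "tasks.max": "4",
      "tasks.max.enforce": "false"
    }
  }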

Thanks,
Yash

On Mon, Nov 20, 2023 at 10:50 PM Chris Egerton 
wrote:

> Hi Hector,
>
> Thanks for taking a look! I think the key difference between the proposed
> behavior and the rejected alternative is that the set of tasks that will be
> running with the former is still a complete set of tasks, whereas the set
> of tasks in the latter is a subset of tasks. Also noteworthy but slightly
> less important: the problem will be more visible to users with the former
> (the connector will still be marked FAILED) than with the latter.
>
> Cheers,
>
> Chris
>
> On Tue, Nov 21, 2023, 00:53 Hector Geraldino (BLOOMBERG/ 919 3RD A) <
> hgerald...@bloomberg.net> wrote:
>
> > Thanks for the KIP Chris, adding this check makes total sense.
> >
> > I do have one question. The second paragraph in the Public Interfaces
> > section states:
> >
> > "If the connector generated excessive tasks after being reconfigured,
> then
> > any existing tasks for the connector will be allowed to continue running,
> > unless that existing set of tasks also exceeds the tasks.max property."
> >
> > Would not failing the connector land us in the second scenario of
> > 'Rejected Alternatives'?
> >
> > From: dev@kafka.apache.org At: 11/11/23 00:27:44 UTC-5:00To:
> > dev@kafka.apache.org
> > Subject: [DISCUSS] KIP-1004: Enforce tasks.max property in Kafka Connect
> >
> > Hi all,
> >
> > I'd like to open up KIP-1004 for discussion:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1004%3A+Enforce+tasks.max+
> > property+in+Kafka+Connect
> >
> > As a brief summary: this KIP proposes that the Kafka Connect runtime
> start
> > failing connectors that generate a greater number of tasks than the
> > tasks.max property, with an optional emergency override that can be used
> to
> > continue running these (probably-buggy) connectors if absolutely
> necessary.
> >
> > I'll be taking time off most of the next three weeks, so response latency
> > may be a bit higher than usual, but I wanted to kick off the discussion
> in
> > case we can land this in time for the upcoming 3.7.0 release.
> >
> > Cheers,
> >
> > Chris
> >
> >
> >
>


[jira] [Created] (KAFKA-15888) DistributedHerder log context should not use the same client ID for each Connect worker by default

2023-11-22 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-15888:
--

 Summary: DistributedHerder log context should not use the same 
client ID for each Connect worker by default
 Key: KAFKA-15888
 URL: https://issues.apache.org/jira/browse/KAFKA-15888
 Project: Kafka
  Issue Type: Bug
  Components: connect, KafkaConnect
Reporter: Yash Mayya
Assignee: Yash Mayya


By default, if there is no "client.id" configured on a Connect worker 
running in distributed mode, the same client ID ("connect-1") will be used in 
the log context for the DistributedHerder class in every single worker in the 
Connect cluster. This default is quite confusing and obviously not very useful. 
Further, based on how this default is configured 
([ref|https://github.com/apache/kafka/blob/150b0e8290cda57df668ba89f6b422719866de5a/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L299]),
 it seems like this might have been an unintentional bug. We could simply use 
the workerId (the advertised host name and port of the worker) by default 
instead, which should be unique for each worker in a cluster.
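
Until that is addressed, setting a distinct client ID per worker in the
worker properties avoids the shared default log context (values below are
illustrative):

  # connect-distributed.properties on worker 1
  group.id=connect-cluster
  client.id=connect-worker-1

  # connect-distributed.properties on worker 2
  group.id=connect-cluster
  client.id=connect-worker-2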



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Requesting permission to contribute to Apache Kafka.

2023-11-07 Thread Yash Mayya
Hi Afshin,

I've granted you the necessary permissions. Thanks for your interest in
contributing to Apache Kafka!

Cheers,
Yash

On Tue, Nov 7, 2023 at 3:52 PM Afshin Moazami
 wrote:

> Hi,
> I would like to request permission to contribute to Apache Kafka.
> wiki ID: amoazami
> Jira ID: afshing
>
>
> Thanks,
> Afshin Moazami
>


Re: Requesting permission to contribute to Apache Kafka

2023-11-02 Thread Yash Mayya
Hi Vedarth,

I've granted you the necessary permissions. Thanks for your interest in
contributing to Apache Kafka!

Cheers,
Yash

On Thu, Nov 2, 2023 at 1:22 PM Vedarth Sharma 
wrote:

> Hi,
>
> Please grant me permission to contribute to KIPs and assign jira tickets to
> myself.
>
> wiki id and jira id :- vedarth
> email :- vedarth.sha...@gmail.com
>
> Regards,
> Vedarth
>


Re: [VOTE] KIP-980: Allow creating connectors in a stopped state

2023-10-16 Thread Yash Mayya
Hi all,

Thanks for participating in the discussion and voting! KIP-980 has been
accepted with the following +1 votes:

   - Chris Egerton (binding)
   - Knowles Atchison Jr (non-binding)
   - Greg Harris (binding)
   - Mickael Maison (binding)
   - Yash Mayya (binding)

The target release for this KIP is 3.7.0

Thanks,
Yash

On Mon, Oct 16, 2023 at 7:39 PM Mickael Maison 
wrote:

> +1 (binding)
> Thanks for the KIP
>
> Mickael
>
> On Mon, Oct 16, 2023 at 9:05 AM Yash Mayya  wrote:
> >
> > Hi all,
> >
> > Bumping up this vote thread - we have two binding +1 votes and one
> > non-binding +1 vote so far.
> >
> > Thanks,
> > Yash
> >
> > On Mon, Oct 9, 2023 at 11:57 PM Greg Harris  >
> > wrote:
> >
> > > Thanks Yash for the well written KIP!
> > >
> > > And thank you for finally adding JSON support to the standalone mode
> > > that isn't file extension sensitive. That will be very useful.
> > >
> > > +1 (binding)
> > >
> > > On Mon, Oct 9, 2023 at 10:45 AM Knowles Atchison Jr
> > >  wrote:
> > > >
> > > > This is super useful for pipeline setup!
> > > >
> > > > +1 (non binding)
> > > >
> > > > On Mon, Oct 9, 2023, 7:57 AM Chris Egerton 
> > > wrote:
> > > >
> > > > > Thanks for the KIP, Yash!
> > > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Mon, Oct 9, 2023, 01:12 Yash Mayya 
> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I'd like to start a vote on KIP-980 which proposes allowing the
> > > creation
> > > > > of
> > > > > > connectors in a stopped (or paused) state.
> > > > > >
> > > > > > KIP -
> > > > > >
> > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state
> > > > > >
> > > > > > Discussion Thread -
> > > > > > https://lists.apache.org/thread/om803vl191ysf711qm7czv94285rtt5d
> > > > > >
> > > > > > Thanks,
> > > > > > Yash
> > > > > >
> > > > >
> > >
>


Re: [VOTE] KIP-980: Allow creating connectors in a stopped state

2023-10-16 Thread Yash Mayya
Hi all,

Bumping up this vote thread - we have two binding +1 votes and one
non-binding +1 vote so far.

Thanks,
Yash

On Mon, Oct 9, 2023 at 11:57 PM Greg Harris 
wrote:

> Thanks Yash for the well written KIP!
>
> And thank you for finally adding JSON support to the standalone mode
> that isn't file extension sensitive. That will be very useful.
>
> +1 (binding)
>
> On Mon, Oct 9, 2023 at 10:45 AM Knowles Atchison Jr
>  wrote:
> >
> > This is super useful for pipeline setup!
> >
> > +1 (non binding)
> >
> > On Mon, Oct 9, 2023, 7:57 AM Chris Egerton 
> wrote:
> >
> > > Thanks for the KIP, Yash!
> > >
> > > +1 (binding)
> > >
> > > On Mon, Oct 9, 2023, 01:12 Yash Mayya  wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'd like to start a vote on KIP-980 which proposes allowing the
> creation
> > > of
> > > > connectors in a stopped (or paused) state.
> > > >
> > > > KIP -
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state
> > > >
> > > > Discussion Thread -
> > > > https://lists.apache.org/thread/om803vl191ysf711qm7czv94285rtt5d
> > > >
> > > > Thanks,
> > > > Yash
> > > >
> > >
>


[jira] [Created] (KAFKA-15570) Add unit tests for MemoryConfigBackingStore

2023-10-10 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-15570:
--

 Summary: Add unit tests for MemoryConfigBackingStore
 Key: KAFKA-15570
 URL: https://issues.apache.org/jira/browse/KAFKA-15570
 Project: Kafka
  Issue Type: Test
  Components: connect, KafkaConnect
Reporter: Yash Mayya
Assignee: Yash Mayya


Currently, the 
[MemoryConfigBackingStore|https://github.com/apache/kafka/blob/6e164bb9ace3ea7a1a9542904d1a01c9fd3a1b48/connect/runtime/src/main/java/org/apache/kafka/connect/storage/MemoryConfigBackingStore.java#L37]
 class doesn't have any unit tests for its functionality. While most of its 
functionality is fairly lightweight today, changes will be introduced with 
[KIP-980|https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state]
 (potentially 
[KIP-976|https://cwiki.apache.org/confluence/display/KAFKA/KIP-976%3A+Cluster-wide+dynamic+log+adjustment+for+Kafka+Connect]
 as well) and it would be good to have a test setup in place before those 
changes are made.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] KIP-980: Allow creating connectors in a stopped state

2023-10-08 Thread Yash Mayya
Hi all,

I'd like to start a vote on KIP-980 which proposes allowing the creation of
connectors in a stopped (or paused) state.

KIP -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state

Discussion Thread -
https://lists.apache.org/thread/om803vl191ysf711qm7czv94285rtt5d

Thanks,
Yash


Re: [VOTE] KIP-976: Cluster-wide dynamic log adjustment for Kafka Connect

2023-10-08 Thread Yash Mayya
Hi Chris,

Thanks for the KIP!
+1 (binding)

Yash

On Fri, Oct 6, 2023 at 9:54 PM Greg Harris 
wrote:

> Hey Chris,
>
> Thanks for the KIP!
> I think that preserving the ephemeral nature of the logging change is
> the right choice here, and using the config topic for intra-cluster
> broadcast is better than REST forwarding.
>
> +1 (binding)
>
> Thanks,
> Greg
>
> On Fri, Oct 6, 2023 at 9:05 AM Chris Egerton 
> wrote:
> >
> > Hi all,
> >
> > I'd like to call for a vote on KIP-976, which augments the existing
> dynamic
> > logger adjustment REST API for Kafka Connect to apply changes
> cluster-wide
> > instead on a per-worker basis.
> >
> > The KIP:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-976:+Cluster-wide+dynamic+log+adjustment+for+Kafka+Connect
> >
> > The discussion thread:
> > https://lists.apache.org/thread/w3x3f3jmyd1vfjxho06y8xgt6mhhzpl5
> >
> > Cheers,
> >
> > Chris
>


Re: [DISCUSS] KIP-980: Allow creating connectors in a stopped state

2023-10-06 Thread Yash Mayya
Hi Chris,

I've updated the KIP to call out the parsing logic and the user
expectations explicitly. Thanks again for all your feedback on this KIP!
I'll wait for a few more days to see if anyone else has comments before
kicking off a vote thread.

Thanks,
Yash

On Thu, Oct 5, 2023 at 10:49 PM Chris Egerton 
wrote:

> Hi Yash,
>
> Yeah, I think just hardcoding with JSON-first, properties-second is fine.
> IMO it's worth calling this out explicitly in the KIP.
>
> Apart from that, no further comments. LGTM, thanks for the KIP!
>
> Cheers,
>
> Chris
>
> On Thu, Oct 5, 2023 at 6:23 AM Yash Mayya  wrote:
>
> > Hi Chris,
> >
> > Thanks for all your feedback so far!
> >
> > 3. That's a good question. I was thinking we'd do some "intelligent"
> > parsing internally during the implementation (i.e. essentially your last
> > option - attempting to parse first as one format, then the other) which
> is
> > why I didn't include any more details in the KIP itself (where I've only
> > outlined the contract changes). This would allow for the smoothest user
> > experience IMO and all the heavy lifting will be done in the parsing
> logic.
> > All the other options seemed either clunky or brittle from the user
> > experience point of view. In terms of the actual implementation itself,
> > we'd probably want to first try parsing it into the supported JSON
> > structures before trying to parse it into Java properties since the Java
> > properties format is very permissive (i.e. we won't really see any errors
> > on attempting to parse a JSON file into Java properties).
> >
> > Thanks,
> > Yash
> >
> > On Thu, Oct 5, 2023 at 1:39 AM Chris Egerton 
> > wrote:
> >
> > > Hi Yash,
> > >
> > > Looking great! Few more thoughts:
> > >
> > >
> > > 1. (Downgraded to nit) I still prefer dot-delimitation but it's not a
> > > blocker; thanks for addressing my concerns about the name of the field
> > and
> > > how it may be perceived by users.
> > >
> > > 2. (Addressed) Thanks for looking into this, and sorry it turned out to
> > be
> > > a bit of a dead end! I'm convinced that the current proposal is good
> > > enough.
> > >
> > > 3. Can you shed a little more light on how we'll determine whether a
> > > connector config should be parsed as JSON or as a properties file? Will
> > > this be based on file extension, a command-line flag (which might apply
> > to
> > > all configs, or individual configs), attempting to parse first as one
> > > format then the other, something else?
> > >
> > > 4. (Addressed) Thanks! Looks great.
> > >
> > > 6. (Addressed) Awesome, great to hear. The point about laggy connector
> > > startup is very convincing; my paranoia is satiated.
> > >
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Wed, Oct 4, 2023 at 5:35 AM Yash Mayya 
> wrote:
> > >
> > > > Hi Chris,
> > > >
> > > > Thanks for the quick follow up and the continued insightful
> discourse!
> > > >
> > > > 1. Fair point on the need to differentiate it from the actual state
> > > > displayed in the status API, I like the prefix of "initial" to make
> > that
> > > > differentiation (from your suggested alternatives previously).
> > Regarding
> > > > the dots vs underscores as delimiters - the new state field will be a
> > top
> > > > level field in the connector creation request body alongside the
> > "config"
> > > > map (i.e. it won't be a connector configuration itself), so I think
> we
> > > > should be using the underscore delimiter for consistency. For now,
> I've
> > > > updated the KIP to use "initial_state" as the new field's name - let
> me
> > > > know if you disagree, and I'd be happy to reconsider.
> > > >
> > > > 2. Hm, I actually hadn't considered the downgrade implications with
> > your
> > > > proposed single record approach. I agree that it's a bigger downside
> > than
> > > > writing two records to the config topic. I do understand your
> concerns
> > > with
> > > > the potential for config topic inconsistencies which is why I
> proposed
> > > > writing the target state first (since the presence of a target state
> > for
> > > a
> > > > connector with no configuration is a benign condition). Also, even

Re: [DISCUSS] KIP-980: Allow creating connectors in a stopped state

2023-10-05 Thread Yash Mayya
Hi Chris,

Thanks for all your feedback so far!

3. That's a good question. I was thinking we'd do some "intelligent"
parsing internally during the implementation (i.e. essentially your last
option - attempting to parse first as one format, then the other) which is
why I didn't include any more details in the KIP itself (where I've only
outlined the contract changes). This would allow for the smoothest user
experience IMO and all the heavy lifting will be done in the parsing logic.
All the other options seemed either clunky or brittle from the user
experience point of view. In terms of the actual implementation itself,
we'd probably want to first try parsing it into the supported JSON
structures before trying to parse it into Java properties since the Java
properties format is very permissive (i.e. we won't really see any errors
on attempting to parse a JSON file into Java properties).
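
To illustrate, a standalone-mode connector file in the JSON format would
simply mirror the REST creation request body, along these lines (names and
values are placeholders):

  {
    "name": "example-source",
    "config": {
      "connector.class": "<source connector class>",
      "tasks.max": "1"
    },
    "initial_state": "STOPPED"
  }

whereas the equivalent Java properties file would continue to hold the flat
configuration keys (name, connector.class, tasks.max, ...) as it does today.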

Thanks,
Yash

On Thu, Oct 5, 2023 at 1:39 AM Chris Egerton 
wrote:

> Hi Yash,
>
> Looking great! Few more thoughts:
>
>
> 1. (Downgraded to nit) I still prefer dot-delimitation but it's not a
> blocker; thanks for addressing my concerns about the name of the field and
> how it may be perceived by users.
>
> 2. (Addressed) Thanks for looking into this, and sorry it turned out to be
> a bit of a dead end! I'm convinced that the current proposal is good
> enough.
>
> 3. Can you shed a little more light on how we'll determine whether a
> connector config should be parsed as JSON or as a properties file? Will
> this be based on file extension, a command-line flag (which might apply to
> all configs, or individual configs), attempting to parse first as one
> format then the other, something else?
>
> 4. (Addressed) Thanks! Looks great.
>
> 6. (Addressed) Awesome, great to hear. The point about laggy connector
> startup is very convincing; my paranoia is satiated.
>
>
> Cheers,
>
> Chris
>
> On Wed, Oct 4, 2023 at 5:35 AM Yash Mayya  wrote:
>
> > Hi Chris,
> >
> > Thanks for the quick follow up and the continued insightful discourse!
> >
> > 1. Fair point on the need to differentiate it from the actual state
> > displayed in the status API, I like the prefix of "initial" to make that
> > differentiation (from your suggested alternatives previously). Regarding
> > the dots vs underscores as delimiters - the new state field will be a top
> > level field in the connector creation request body alongside the "config"
> > map (i.e. it won't be a connector configuration itself), so I think we
> > should be using the underscore delimiter for consistency. For now, I've
> > updated the KIP to use "initial_state" as the new field's name - let me
> > know if you disagree, and I'd be happy to reconsider.
> >
> > 2. Hm, I actually hadn't considered the downgrade implications with your
> > proposed single record approach. I agree that it's a bigger downside than
> > writing two records to the config topic. I do understand your concerns
> with
> > the potential for config topic inconsistencies which is why I proposed
> > writing the target state first (since the presence of a target state for
> a
> > connector with no configuration is a benign condition). Also, even in the
> > non-transactional config topic producer case - if there is a failure
> > between the two writes, the user will be notified of the error
> > synchronously via the API response (ref -
> > https://github.com/apache/kafka/pull/12984) and will be able to safely
> > retry the operation. I don't see how we'd be able to do a single record
> > write approach along with supporting clean downgrades since we'd either
> > need to introduce a new record type or add a new field to an existing
> > record type - neither of which would be recognized as such by an older
> > Connect worker.
> >
> > > Standalone mode has always supported the REST API,
> > > and so far FWICTwe've maintained feature parity between
> > > the two modes
> >
> > > add support for JSON files with standalone mode.
> >
> > 3. Thanks, I wasn't aware about standalone mode always having supported
> the
> > full REST API - I thought I'd seen some references earlier indicating
> > otherwise. In that case, I do agree that it makes sense to maintain
> parity
> > across both methods of connector creation for user experience
> consistency.
> > I really like the idea of updating the standalone mode CLI to be able to
> > parse JSON files (in the same format as the connector creation REST API
> > endpoint request body) along with Java properties files since I think
> that
> > offers two big benefit

[jira] [Resolved] (KAFKA-15547) Thread leak in MirrorMakerConfigTest#testClientConfigProperties

2023-10-04 Thread Yash Mayya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yash Mayya resolved KAFKA-15547.

Fix Version/s: 3.7.0
   Resolution: Fixed

> Thread leak in MirrorMakerConfigTest#testClientConfigProperties
> ---
>
> Key: KAFKA-15547
> URL: https://issues.apache.org/jira/browse/KAFKA-15547
> Project: Kafka
>  Issue Type: Bug
>Reporter: Kalpesh Patel
>Assignee: Kalpesh Patel
>Priority: Minor
> Fix For: 3.7.0
>
>
> The test MirrorMakerConfigTest#testClientConfigProperties opens a 
> ForwardingAdmin but fails to close it.
> we should enclose this in a try-with-resources statement to ensure the Admin 
> client is closed and there is no thread leak



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-980: Allow creating connectors in a stopped state

2023-10-04 Thread Yash Mayya
t accompanying connector
> config), even if it's meant to be only for a brief period. Thinking about
> it some more though, a one-record approach comes with drawbacks in the
> downgrade scenario: older workers wouldn't know how to handle the new
> config format and would just fall back to creating the connector in the
> running state. I suppose we should favor the two-record approach since the
> downgrade scenario is more likely than the other failure mode, but it'd be
> nice if we could think of a way to satisfy both concerns. Not a blocker,
> though.
>
> 3. Standalone mode has always supported the REST API, and so far FWICT
> we've maintained feature parity between the two modes for everything except
> exactly-once source connectors, which would have required significant
> additional work since we'd have to add support for storing source connector
> offsets in a Kafka topic instead of on local storage like we currently do.
> I'd really prefer if we could try to maintain feature parity wherever
> possible--one way we could possibly do that with this KIP is to also add
> support for JSON files with standalone mode.
>
> 4. Yeah, no need to block on that idea since there are other use cases for
> creating stopped connectors. We can treat it like the option to delete
> offsets along with the connector discussed in KIP-875: punt for now,
> possibly implement later pending user feedback and indication of demand.
> Might be worth adding to a "Future work" section as an indication that we
> haven't ruled it out (in which case it'd make sense as a rejected
> alternative) but have chosen not to implement yet.
>
>
> And I had one new thought that's pretty implementation-oriented but may
> influence the design slightly:
>
> 6. Right now we write an empty set of task configs to the config topic when
> handling requests to stop a connector in distributed mode. Do we need to do
> the same when creating connectors in the stopped state, or add any other
> special logic besides noting the new state in the config topic? Or is it
> sufficient to write a non-running target state to the config topic and then
> rely on existing logic to simply refuse to generate task configs for the
> newly-created connector? Is there any chance that the lack of task configs
> (as opposed to an empty list of task configs) in the config topic for a
> connector that exists will cause issues?
>
>
> Cheers,
>
> Chris
>
> On Tue, Oct 3, 2023 at 3:29 AM Yash Mayya  wrote:
>
> > Hi Chris,
> >
> > Thanks for taking a look at this KIP!
> >
> > 1. I chose to go with simply "state" as that exact term is already
> exposed
> > via some of the existing REST API responses and would be one that users
> are
> > already familiar with (although admittedly something like "initial_state"
> > wouldn't be much of a jump). Since it's a field in the request body for
> the
> > connector creation endpoint, wouldn't it be implied that it is the
> > "initial" state just like the "config" field represents the "initial"
> > configuration? Also, I don't think x.y has been established as the field
> > naming convention in the Connect REST API right? From what I can tell,
> x_y
> > is the convention being followed for fields in requests ("kafka_topic" /
> > "kafka_partition" / "kafka_offset" in the offsets APIs for instance) and
> > responses ("error_count", "kafka_cluster_id", "recommended_values" etc.).
> >
> > 2. The connector configuration record is currently used for both
> connector
> > create requests as well as connector config update requests. Since we're
> > only allowing configuring the target state for newly created connectors,
> I
> > feel like it'll be a cleaner separation of concerns to use the existing
> > records for connector configurations and connector target states rather
> > than bundling the "state" and "state.v2" (or equivalent) fields into the
> > connector configuration record. The additional write should be very
> minimal
> > overhead and the two writes would be an atomic operation for Connect
> > clusters that are using a transactional producer for the config topic
> > anyway. Thoughts?
> >
> > 3. I was thinking that we'd support standalone mode via the same
> connector
> > creation REST API endpoint changes (addition of the "state" field). If
> > there is further interest in adding similar support to the command line
> > method of creating connectors as well, perhaps we could do so in a small
> > follow-on KIP? I feel like ever since the standalone m

Re: [DISCUSS] KIP-980: Allow creating connectors in a stopped state

2023-10-03 Thread Yash Mayya
Hi Chris,

Thanks for taking a look at this KIP!

1. I chose to go with simply "state" as that exact term is already exposed
via some of the existing REST API responses and would be one that users are
already familiar with (although admittedly something like "initial_state"
wouldn't be much of a jump). Since it's a field in the request body for the
connector creation endpoint, wouldn't it be implied that it is the
"initial" state just like the "config" field represents the "initial"
configuration? Also, I don't think x.y has been established as the field
naming convention in the Connect REST API right? From what I can tell, x_y
is the convention being followed for fields in requests ("kafka_topic" /
"kafka_partition" / "kafka_offset" in the offsets APIs for instance) and
responses ("error_count", "kafka_cluster_id", "recommended_values" etc.).

2. The connector configuration record is currently used for both connector
create requests as well as connector config update requests. Since we're
only allowing configuring the target state for newly created connectors, I
feel like it'll be a cleaner separation of concerns to use the existing
records for connector configurations and connector target states rather
than bundling the "state" and "state.v2" (or equivalent) fields into the
connector configuration record. The additional write should be very minimal
overhead and the two writes would be an atomic operation for Connect
clusters that are using a transactional producer for the config topic
anyway. Thoughts?
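
Roughly speaking, the two writes would look like the following (the record
keys follow the existing KafkaConfigBackingStore key naming; the values are
simplified illustrations rather than the exact serialized schemas):

  key:   target-state-example-source
  value: { "state": "STOPPED" }

  key:   connector-example-source
  value: { "properties": { "connector.class": "...", "tasks.max": "1" } }

with the target state record written first, so that a failure between the
two writes only leaves behind a harmless orphaned target state.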

3. I was thinking that we'd support standalone mode via the same connector
creation REST API endpoint changes (addition of the "state" field). If
there is further interest in adding similar support to the command line
method of creating connectors as well, perhaps we could do so in a small
follow-on KIP? I feel like ever since the standalone mode started
supporting the full Connect REST API, the command line method of creating
connectors is more of a "legacy" concept.

4. Yeah, connector / offsets migration was just used as a representative
example of how this feature could be useful - I didn't intend for it to be
the sole purpose of this KIP. However, that said, I really like the idea of
accepting an "offsets" field in the connector creation request since it'd
reduce the number of user operations required from 3 (create the connector
in a STOPPED state; alter the offsets; resume the connector) to 1. I'd be
happy to either create or review a separate small KIP for that if it sounds
acceptable to you?

5. Whoops, thanks, I hadn't even noticed that that's where I had linked to.
Fixed!

Thanks,
Yash

On Mon, Oct 2, 2023 at 11:14 PM Chris Egerton 
wrote:

> Hi Yash,
>
> Thanks for the KIP! Frankly this feels like an oversight in 875 and I'm
> glad we're closing that loop ASAP.
>
>
> Here are my thoughts:
>
> 1. (Nit): IMO "start.state", "initial.state", or "target.state" might be
> better than just "state" for the field name in the connector creation
> request.
>
> 2. Why implement this in distributed mode with two writes to the config
> topic? We could augment the existing format for connector configs in the
> config topic [1] with a new field instead.
>
> 3. Although standalone mode is mentioned in the KIP, there's no detail on
> the Java properties file format that we support for standalone mode (i.e.,
> `connect-standalone config/connect-standalone.properties
> config/connect-file-source.properties
> config/connect-file-sink.properties`). Do we plan on adding support for
> that mode as well?
>
> 4. I suspect there will be advantages for this feature beyond offsets
> migration, but if offsets migration were the only motivation for it,
> wouldn't it be simpler to accept an "offsets" field in connector creation
> requests? That way, users wouldn't have to start a connector in a different
> state and then resume it, they could just create the connector like normal.
> I think the current proposal is acceptable, but wanted to float this
> alternative in case we anticipate lots of connector migrations and want to
> optimize for them a bit more.
>
> 5. (NIt): We can link to our own Javadocs [2] instead of javadoc.io
>
>
> [1] -
>
> https://github.com/apache/kafka/blob/dcd8c7d05f2f22f2d815405e7ab3ad7439669239/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaConfigBackingStore.java#L234-L236
>
> [2] - https://kafka.apache.org/35/javadoc/index.html?overview-summary.html
>
>
> Cheers,
>
> Chris
>
> On Thu, Sep 21, 2023 at 2:37 AM Yash Mayya  wrote:
>
> > Hi Ashwin,
> >
> > Thanks for taking a look and sharing your thoug

[jira] [Resolved] (KAFKA-15177) MirrorMaker 2 should implement the alterOffsets KIP-875 API

2023-09-27 Thread Yash Mayya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yash Mayya resolved KAFKA-15177.

Fix Version/s: 3.6.0
   Resolution: Fixed

> MirrorMaker 2 should implement the alterOffsets KIP-875 API
> ---
>
> Key: KAFKA-15177
> URL: https://issues.apache.org/jira/browse/KAFKA-15177
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect, mirrormaker
>    Reporter: Yash Mayya
>Assignee: Chris Egerton
>Priority: Minor
> Fix For: 3.6.0
>
>
> The {{MirrorSourceConnector}} class should implement the new alterOffsets API 
> added in 
> [KIP-875|https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect].
>  We could also implement the API in 
> {{MirrorCheckpointConnector}} and 
> {{MirrorHeartbeatConnector}} to prevent external modification of offsets 
> since the operation wouldn't really make sense in their case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-980: Allow creating connectors in a stopped state

2023-09-21 Thread Yash Mayya
Hi Ashwin,

Thanks for taking a look and sharing your thoughts!

1. Yes, the request / response formats of the two APIs were intentionally
made identical for such use-cases. [1]

2. I'm assuming you're referring to retaining the offset / config topic
records for a connector when it is deleted by a user? Firstly, a
connector's offsets aren't actually currently deleted when the connector is
deleted - it was listed as potential future work in KIP-875 [2]. Secondly,
the config topic is the source of truth for the state of a Connect cluster
and a connector deletion is done by writing a null-valued record
(tombstone) with the connector key to the config topic. We could
potentially modify the delete mechanism to use a special new record
(instead of a tombstone with the connector key) in order to retain the
latest configuration of a connector while still deleting the actual
connector - however, this would have its downsides and I don't see too many
benefits. Furthermore, connector migration between different Kafka clusters
was just used as a representational use case for creating connectors in a
stopped state - it isn't the primary focus of this KIP as such.

3. KIP-875 goes into some more details about this [3], but the TLDR is that
the STOPPED state will be treated like the PAUSED state on older workers
that don't recognize the STOPPED state.

Thanks,
Yash

[1] -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Request/responseformat

[2] -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Automaticallydeleteoffsetswithconnectors

[3] -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-STOPPEDtargetstate

On Wed, Sep 20, 2023 at 7:24 PM Ashwin  wrote:

> Thanks Yash! This is very useful for migrating connectors from one cluster
> to another.
>
> I had the following comments/questions
>
> 1. Is the offset read using `GET /offsets` api always guaranteed to be in a
> format accepted by `PATCH /offsets` ?
> 2. I had to tackle a similar migration situation but the two connect
> clusters in question were using the same backing Kafka cluster. The
> challenge in this case is that when I delete the original connector, I want
> to retain offsets and config topics. Do you think we should support
> deletion of a connector without removal of these topics as part of this KIP
> ?
> 3. In the case of a downgrade, how will Connect worker handle the optional
> “state” field in config topic ?
>
> Thanks,
> Ashwin
>
>
>
>
> On Sun, Sep 17, 2023 at 11:09 PM Yash Mayya  wrote:
>
> > Hi all,
> >
> > I'd like to begin discussion on a KIP to allow creating connectors in a
> > stopped state -
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state
> >
> >
> > Thanks,
> > Yash
> >
>


[DISCUSS] KIP-980: Allow creating connectors in a stopped state

2023-09-17 Thread Yash Mayya
Hi all,

I'd like to begin discussion on a KIP to allow creating connectors in a
stopped state -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state


Thanks,
Yash


[jira] [Created] (KAFKA-15470) Allow creating connectors in a stopped state

2023-09-15 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-15470:
--

 Summary: Allow creating connectors in a stopped state
 Key: KAFKA-15470
 URL: https://issues.apache.org/jira/browse/KAFKA-15470
 Project: Kafka
  Issue Type: New Feature
  Components: connect, KafkaConnect
Reporter: Yash Mayya
Assignee: Yash Mayya
 Fix For: 3.7.0


[KIP-875: First-class offsets support in Kafka 
Connect|https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect]
 introduced a new {{STOPPED}} state for connectors along with some REST API 
endpoints to retrieve and modify offsets for connectors. Currently, only 
connectors that already exist can be stopped and any newly created connector 
will always be in the {{RUNNING}} state initially. Allowing the creation of 
connectors in a {{STOPPED}} state will facilitate multiple new use cases. One 
interesting use case would be to migrate connectors from one Kafka Connect 
cluster to another. Individual connector migration would be useful in a number 
of scenarios such as breaking a large cluster into multiple smaller clusters 
(or vice versa), moving a connector from a cluster running in one data center 
to another etc. A connector migration could be achieved by using the following 
sequence of steps :-
 # Stop the running connector on the original Kafka Connect cluster
 # Retrieve the offsets for the connector via the {{GET 
/connectors/{connector}/offsets}} endpoint
 # Create the connector in a stopped state using the same configuration on the 
new Kafka Connect cluster
 # Alter the offsets for the connector on the new cluster via the {{PATCH 
/connectors/{connector}/offsets}} endpoint (using the offsets obtained from 
the original cluster)
 # Resume the connector on the new cluster and delete it on the original cluster

Another use case for creating connectors in a stopped state could be deploying 
connectors as a part of a larger data pipeline before the source / sink data 
system has been created or is ready for data transfer.
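
A rough sketch of that migration flow using the REST API (host names, the
connector name, and the offset contents are placeholders; "initial_state" is
the field proposed in KIP-980):

  # 1. Stop the connector on the original cluster
  curl -X PUT http://old-connect:8083/connectors/example-source/stop

  # 2. Retrieve its offsets
  curl http://old-connect:8083/connectors/example-source/offsets

  # 3. Create it on the new cluster in a stopped state
  curl -X POST -H "Content-Type: application/json" \
    http://new-connect:8083/connectors \
    -d '{"name": "example-source",
         "config": {"connector.class": "<source connector class>", "tasks.max": "1"},
         "initial_state": "STOPPED"}'

  # 4. Alter the offsets on the new cluster using the values from step 2
  curl -X PATCH -H "Content-Type: application/json" \
    http://new-connect:8083/connectors/example-source/offsets \
    -d '{"offsets": [{"partition": {"key": "value"}, "offset": {"key": "value"}}]}'

  # 5. Resume on the new cluster and delete on the original cluster
  curl -X PUT http://new-connect:8083/connectors/example-source/resume
  curl -X DELETE http://old-connect:8083/connectors/example-source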



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14067) Sink connector override.consumer.group.id can conflict with worker group.id

2023-09-12 Thread Yash Mayya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yash Mayya resolved KAFKA-14067.

Resolution: Fixed

> Sink connector override.consumer.group.id can conflict with worker group.id
> ---
>
> Key: KAFKA-14067
> URL: https://issues.apache.org/jira/browse/KAFKA-14067
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.3.0
>Reporter: Greg Harris
>Priority: Minor
> Fix For: 3.7.0
>
>
> Currently there is a validation step for connector names which prevents sink 
> connector consumer groups from colliding with the worker group.id.
> There is currently no such validation for consumer.override.group.id that 
> would prevent a conflicting connector from being configured, and so it is 
> possible to misconfigure a connector in a way that may be damaging to the 
> workers themselves.
> Reproduction steps:
> 1. Configure a connect distributed cluster with a certain group.id in the 
> worker config.
> 2. Configure a sink connector with consumer.override.group.id having the same 
> value as in the worker config
> Expected behavior:
> 1. An error is returned indicating that the consumer.override.group.id is 
> invalid
> 2. The connector is not created or started
> Actual behavior:
> 1. No error is returned, and the configuration is otherwise valid.
> 2. The connector is created and starts running.
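
For illustration, the conflicting pair of configurations looks like this
(connector class is a placeholder):

  # Worker config (connect-distributed.properties)
  group.id=connect-cluster

  # Sink connector config submitted to that same cluster
  {
    "name": "example-sink",
    "config": {
      "connector.class": "<sink connector class>",
      "topics": "example-topic",
      "consumer.override.group.id": "connect-cluster"
    }
  }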



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-970: Deprecate and remove Connect's redundant task configurations endpoint

2023-09-08 Thread Yash Mayya
Hi all,

KIP-970 has been accepted with three binding +1 votes from Chris, Greg and
Mickael (and three non-binding +1 votes from Hector, Andrew and Sagar).
Thanks all.

Cheers,
Yash

On Fri, Sep 8, 2023 at 7:38 PM Mickael Maison 
wrote:

> Hi Yash,
>
> +1 (binding)
> Thanks for the KIP!
>
> Mickael
>
> On Wed, Sep 6, 2023 at 6:05 PM Greg Harris 
> wrote:
> >
> > Hey Yash,
> >
> > +1(binding)
> >
> > Thanks for the KIP!
> > Greg
> >
> > On Wed, Sep 6, 2023 at 6:59 AM Yash Mayya  wrote:
> > >
> > > Hi all,
> > >
> > > I just wanted to bump up this vote thread. Thanks to everyone who's
> voted
> > > so far - we have 1 binding +1 vote and 3 non-binding +1 votes so far.
> > >
> > > Thanks,
> > > Yash
> > >
> > > On Wed, Aug 30, 2023 at 11:14 PM Sagar 
> wrote:
> > >
> > > > +1 (non - binding).
> > > >
> > > > Thanks !
> > > > Sagar.
> > > >
> > > > On Wed, 30 Aug 2023 at 11:09 PM, Chris Egerton
> 
> > > > wrote:
> > > >
> > > > > +1 (binding), thanks Yash!
> > > > >
> > > > > On Wed, Aug 30, 2023 at 1:34 PM Andrew Schofield <
> > > > > andrew_schofield_j...@outlook.com> wrote:
> > > > >
> > > > > > Thanks for the KIP. Looks good to me.
> > > > > >
> > > > > > +1 (non-binding).
> > > > > >
> > > > > > Andrew
> > > > > >
> > > > > > > On 30 Aug 2023, at 18:07, Hector Geraldino (BLOOMBERG/ 919 3RD
> A) <
> > > > > > hgerald...@bloomberg.net> wrote:
> > > > > > >
> > > > > > > This makes sense to me, +1 (non-binding)
> > > > > > >
> > > > > > > From: dev@kafka.apache.org At: 08/30/23 02:58:59 UTC-4:00To:
> > > > > > dev@kafka.apache.org
> > > > > > > Subject: [VOTE] KIP-970: Deprecate and remove Connect's
> redundant
> > > > task
> > > > > > configurations endpoint
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > This is the vote thread for KIP-970 which proposes deprecating
> (in
> > > > the
> > > > > > > Apache Kafka 3.7 release) and eventually removing (in the next
> major
> > > > > > Apache
> > > > > > > Kafka release - 4.0) Connect's redundant task configurations
> > > > endpoint.
> > > > > > >
> > > > > > > KIP -
> > > > > > >
> > > > > >
> > > > >
> > > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-970%3A+Deprecate+and+remov
> > > > > > > e+Connect%27s+redundant+task+configurations+endpoint
> > > > > > >
> > > > > > > Discussion thread -
> > > > > > >
> https://lists.apache.org/thread/997qg9oz58kho3c19mdrjodv0n98plvj
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Yash
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
>


Re: [VOTE] KIP-970: Deprecate and remove Connect's redundant task configurations endpoint

2023-09-06 Thread Yash Mayya
Hi all,

I just wanted to bump up this vote thread. Thanks to everyone who's voted
so far - we have 1 binding +1 vote and 3 non-binding +1 votes so far.

Thanks,
Yash

On Wed, Aug 30, 2023 at 11:14 PM Sagar  wrote:

> +1 (non - binding).
>
> Thanks !
> Sagar.
>
> On Wed, 30 Aug 2023 at 11:09 PM, Chris Egerton 
> wrote:
>
> > +1 (binding), thanks Yash!
> >
> > On Wed, Aug 30, 2023 at 1:34 PM Andrew Schofield <
> > andrew_schofield_j...@outlook.com> wrote:
> >
> > > Thanks for the KIP. Looks good to me.
> > >
> > > +1 (non-binding).
> > >
> > > Andrew
> > >
> > > > On 30 Aug 2023, at 18:07, Hector Geraldino (BLOOMBERG/ 919 3RD A) <
> > > hgerald...@bloomberg.net> wrote:
> > > >
> > > > This makes sense to me, +1 (non-binding)
> > > >
> > > > From: dev@kafka.apache.org At: 08/30/23 02:58:59 UTC-4:00To:
> > > dev@kafka.apache.org
> > > > Subject: [VOTE] KIP-970: Deprecate and remove Connect's redundant
> task
> > > configurations endpoint
> > > >
> > > > Hi all,
> > > >
> > > > This is the vote thread for KIP-970 which proposes deprecating (in
> the
> > > > Apache Kafka 3.7 release) and eventually removing (in the next major
> > > Apache
> > > > Kafka release - 4.0) Connect's redundant task configurations
> endpoint.
> > > >
> > > > KIP -
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-970%3A+Deprecate+and+remov
> > > > e+Connect%27s+redundant+task+configurations+endpoint
> > > >
> > > > Discussion thread -
> > > > https://lists.apache.org/thread/997qg9oz58kho3c19mdrjodv0n98plvj
> > > >
> > > > Thanks,
> > > > Yash
> > > >
> > > >
> > >
> > >
> >
>


Re: KIP-976: Cluster-wide dynamic log adjustment for Kafka Connect

2023-09-06 Thread Yash Mayya
Hi Chris,

Thanks for the clarification on the last modified timestamp tracking here
and on the KIP, things look good to me now.

On the persistence front, I hadn't considered the interplay between levels
set by the log level REST APIs and those set by the log4j configuration
files, and the user confusion that could arise from it. I think your
argument for keeping the changes made by these log level admin endpoints
ephemeral makes sense, thanks!

Hi Sagar,

> The entire namespace name in the config
> records key (along with logger-cluster prefix)
> seems to be a bit too verbose. My first
> thought was to not have the prefix
> org.apache.kafka.connect in the keys
> considering it is the root namespace. But
> since logging can be enabled at a root level,
> can we just use initials like (o.a.k.c) which is
> also a standard practice.

We allow modifying the log levels for any namespace - i.e. even packages
and classes outside of Kafka Connect itself (think connector classes,
dependencies etc.). I'm not sure I follow how we'd avoid naming clashes
without using the fully qualified name?

> I was also thinking if we could introduce a
>  new parameter which takes a subset of
> worker ids and enables logging for them in
> one go

The future section already talks about potentially introducing new scopes
such as a specific connector or a specific connector plugin. Are there any
use cases apart from these that would be satisfied by allowing changing the
log levels on a subset of workers in a Kafka Connect cluster?

Thanks,
Yash

On Wed, Sep 6, 2023 at 7:41 AM Sagar  wrote:

> Hey Chris,
>
> Thanks for making the updates.
>
> The updated definition of last_modified looks good to me. As a continuation
> to point number 2, could we also mention that this could be used to get
> insights into the propagation of the cluster wide log level updates. It is
> implicit but probably better to add it I feel.
>
> Regarding
>
> It's a little indirect on the front of worker restart behavior, so let me
> > know if that especially should be fleshed out a bit more (possibly by
> > calling it out in the "Config topic records" section).
>
>
> Yeah I would lean on the side of calling it out explicitly. Since the
> behaviour is similar to the current dynamically set log levels (i.e
> resetting to the log4j config files levels) so we can call it out stating
> that similarity just for completeness sake. It would be useful info for
> new/medium level users reading the KIP considering worker restarts is not
> uncommon.
>
>
> Thanks, I'm glad that this seems reasonable. I've tentatively added this to
> > the rejected alternatives section, but am happy to continue the
> > conversation if anyone wants to explore that possibility further.
>
>
> +1
>
> I had a nit level suggestion but not sure if it makes sense but would still
> call it out. The entire namespace name in the config records key (along
> with logger-cluster prefix) seems to be a bit too verbose. My first thought
> was to not have the prefix org.apache.kafka.connect in the keys considering
> it is the root namespace. But since logging can be enabled at a root level,
> can we just use initials like (o.a.k.c) which is also a standard practice.
> Let me know what you think?
>
> Lastly, I was also thinking if we could introduce a new parameter which
> takes a subset of worker ids and enables logging for them in one go. But
> this is already achievable by invoking scope=worker endpoint n times to
> reflect on n workers so maybe not a necessary change. But this could be
> useful on a large cluster. Do you think this is worth listing in the Future
> Work section? It's not important so can be ignored as well.
>
> Thanks!
> Sagar.
>
>
> On Wed, Sep 6, 2023 at 12:08 AM Chris Egerton 
> wrote:
>
> > Hi Sagar,
> >
> > Thanks for your thoughts! Responses inline:
> >
> > > 1) Considering that you have now clarified here that last_modified
> field
> > would be stored in-memory, it is not mentioned in the KIP. Although
> that's
> > something that's understandable, it wasn't apparent when reading the KIP.
> > Probably, we could mention it? Also, what happens if a worker restarts?
> In
> > that case, since the log level update message would be pre-dating the
> > startup of the worker, it would be ignored? We should probably mention
> that
> > behaviour as well IMO.
> >
> > I've tweaked the second paragraph of the "Last modified timestamp"
> section
> > to try to clarify this without getting too verbose: "Modification times
> > will be tracked in-memory and determined by when they are applied by the
> > worker, as opposed to when they are requested by the user or persisted to
> > the config topic (details below). If no modifications to the namespace
> have
> > been made since the worker finished startup, they will be null." Does
> that
> > feel sufficient? It's a little indirect on the front of worker restart
> > behavior, so let me know if that especially should be fleshed out a bit
> > more (possibly 

Re: KIP-976: Cluster-wide dynamic log adjustment for Kafka Connect

2023-09-04 Thread Yash Mayya
> If no modifications to a logging namespace have
> been made, won't the namespace itself be omitted
> from the response? It looks like we currently only
> return loggers that have non-null log levels in the
*> **GET /admin/loggers* endpoint.

This can be ignored - I didn't account for the fact that at worker startup
we'll still have loggers with non-null log levels - the root logger and any
other named loggers which have explicit log levels configured in the log4j
properties.

On Mon, Sep 4, 2023 at 12:00 PM Yash Mayya  wrote:

> Hi Chris,
>
> Thanks for the KIP, this looks like a really useful addition to Kafka
> Connect's log level REST APIs! I have a few questions and comments:
>
> > If no modifications to the namespace have
> > been made since the worker was started,
> > they will be null
>
> If no modifications to a logging namespace have been made, won't the
> namespace itself be omitted from the response? It looks like we currently
> only return loggers that have non-null log levels in the *GET
> /admin/loggers* endpoint.
>
> > Last modified timestamp
>
> From the proposed changes section, it isn't very clear to me how we'll be
> tracking this last modified timestamp to be returned in responses for the *GET
> /admin/loggers* and *GET /admin/loggers/{logger}* endpoints. Could you
> please elaborate on this? Also, will we track the last modified timestamp
> even for worker scoped modifications where we won't write any records to
> the config topic and the requests will essentially be processed
> synchronously?
>
> > Record values will have the following format, where ${level} is the new
> logging level for the namespace:
>
> In the current synchronous implementation for the *PUT
> /admin/loggers/{logger} *endpoint, we return a 404 error if the level is
> invalid (i.e. not one of TRACE, DEBUG, WARN etc.). Since the new cluster
> scoped variant of the endpoint will be asynchronous, can we also add a
> validation to synchronously surface erroneous log levels to users?
>
> > Workers that have not yet completed startup
> > will ignore these records, including if the worker
> > reads one during the read-to-end of the config
> > topic that all workers perform during startup.
>
> I'm curious to know what the rationale here is? In KIP-745, the stated
> reasoning behind ignoring restart requests during worker startup was that
> the worker will anyway be starting all connectors and tasks assigned to it
> so the restart request is essentially meaningless. With the log level API
> however, wouldn't it make more sense to apply any cluster scoped
> modifications to new workers in the cluster too? This also applies to any
> workers that are restarted soon after a request is made to *PUT
> /admin/loggers/{logger}?scope=cluster *on another worker. Maybe we could
> stack up all the cluster scoped log level modification requests during the
> config topic read at worker startup and apply the latest ones per namespace
> (assuming older records haven't already been compacted) after we've
> finished reading to the end of the config topic?
>
> > if you're debugging the distributed herder, you
> > need all the help you can get
>
> 
>
> As an aside, thanks for the impressively thorough testing plan in the KIP!
>
>
> Hi Ashwin,
>
> > isn't it better to forward the requests to the
> > leader in the initial implementation ?
>
> Would there be any downside to only introducing leader forwarding for
> connector/task specific scope types in the future (rather than introducing
> it at the outset here where it isn't strictly required)?
>
> > I would also recommend an additional system
> > test for Standalone herder to ensure that the
> > new scope parameter is honored and the response
> > contains the last modified time.
>
> Can't this be sufficiently covered using unit and / or integration tests?
> System tests are fairly expensive to run in terms of overall test runtime
> and they are also not run on every PR or commit to trunk / feature branches
> (unlike unit tests and integration tests).
>
> Thanks,
> Yash
>
> On Sat, Sep 2, 2023 at 5:12 AM Chris Egerton 
> wrote:
>
>> Hi all,
>>
>> Can't imagine a worse time to publish a new KIP (it's late on a Friday and
>> we're in the middle of the 3.6.0 release), but I wanted to put forth
>> KIP-976 for discussion:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-976%3A+Cluster-wide+dynamic+log+adjustment+for+Kafka+Connect
>>
>> TL;DR: The API to dynamically adjust log levels at runtime with Connect is
>> great, and I'd like to augment it with support to adjust log levels for
>> every worker in the cluster (instead of just the worker that receives the
>> REST request).
>>
>> I look forward to everyone's thoughts, but expect that this will probably
>> take a bump or two once the dust has settled on 3.6.0. Huge thanks to
>> everyone that's contributed to that release so far, especially our release
>> manager Satish!
>>
>> Cheers,
>>
>> Chris
>>
>


Re: KIP-976: Cluster-wide dynamic log adjustment for Kafka Connect

2023-09-04 Thread Yash Mayya
Hi Chris,

Thanks for the KIP, this looks like a really useful addition to Kafka
Connect's log level REST APIs! I have a few questions and comments:

> If no modifications to the namespace have
> been made since the worker was started,
> they will be null

If no modifications to a logging namespace have been made, won't the
namespace itself be omitted from the response? It looks like we currently
only return loggers that have non-null log levels in the *GET
/admin/loggers* endpoint.
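
For reference, an illustrative response from the existing endpoint (note that
only loggers with explicitly-set, i.e. non-null, levels show up):

    GET /admin/loggers
    {
      "org.apache.kafka.connect.runtime.rest": {"level": "WARN"},
      "root": {"level": "INFO"}
    }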

> Last modified timestamp

From the proposed changes section, it isn't very clear to me how we'll be
tracking this last modified timestamp to be returned in responses for the *GET
/admin/loggers* and *GET /admin/loggers/{logger}* endpoints. Could you
please elaborate on this? Also, will we track the last modified timestamp
even for worker scoped modifications where we won't write any records to
the config topic and the requests will essentially be processed
synchronously?

> Record values will have the following format, where ${level} is the new
logging level for the namespace:

In the current synchronous implementation for the *PUT
/admin/loggers/{logger} *endpoint, we return a 404 error if the level is
invalid (i.e. not one of TRACE, DEBUG, WARN etc.). Since the new cluster
scoped variant of the endpoint will be asynchronous, can we also add a
validation to synchronously surface erroneous log levels to users?
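
For example (the request bodies follow the existing endpoint's format, and the
scope=cluster query parameter is the one proposed in this KIP):

    # existing synchronous behavior
    PUT /admin/loggers/org.apache.kafka.connect.runtime
    {"level": "DEBUG"}

    # proposed cluster-scoped (asynchronous) variant
    PUT /admin/loggers/org.apache.kafka.connect.runtime?scope=cluster
    {"level": "TRACE"}

An invalid level such as {"level": "FOO"} is currently rejected synchronously
with a 404, and ideally would be surfaced synchronously for the cluster scope
as well.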

> Workers that have not yet completed startup
> will ignore these records, including if the worker
> reads one during the read-to-end of the config
> topic that all workers perform during startup.

I'm curious to know what the rationale here is? In KIP-745, the stated
reasoning behind ignoring restart requests during worker startup was that
the worker will anyway be starting all connectors and tasks assigned to it
so the restart request is essentially meaningless. With the log level API
however, wouldn't it make more sense to apply any cluster scoped
modifications to new workers in the cluster too? This also applies to any
workers that are restarted soon after a request is made to *PUT
/admin/loggers/{logger}?scope=cluster *on another worker. Maybe we could
stack up all the cluster scoped log level modification requests during the
config topic read at worker startup and apply the latest ones per namespace
(assuming older records haven't already been compacted) after we've
finished reading to the end of the config topic?
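
Something along these lines is what I had in mind - purely a sketch, and the
helper methods here are hypothetical:

    // while reading the config topic to its end during worker startup
    Map<String, String> latestLevelPerNamespace = new HashMap<>();
    for (ConsumerRecord<String, byte[]> record : configTopicRecords) {
        if (isClusterScopedLogLevelRecord(record)) {            // hypothetical helper
            latestLevelPerNamespace.put(namespaceOf(record), levelOf(record));
        }
    }
    // once the read-to-end completes, apply only the latest requested level per namespace
    latestLevelPerNamespace.forEach((namespace, level) -> setLogLevel(namespace, level));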

> if you're debugging the distributed herder, you
> need all the help you can get



As an aside, thanks for the impressively thorough testing plan in the KIP!


Hi Ashwin,

> isn't it better to forward the requests to the
> leader in the initial implementation ?

Would there be any downside to only introducing leader forwarding for
connector/task specific scope types in the future (rather than introducing
it at the outset here where it isn't strictly required)?

> I would also recommend an additional system
> test for Standalone herder to ensure that the
> new scope parameter is honored and the response
> contains the last modified time.

Can't this be sufficiently covered using unit and / or integration tests?
System tests are fairly expensive to run in terms of overall test runtime
and they are also not run on every PR or commit to trunk / feature branches
(unlike unit tests and integration tests).

Thanks,
Yash

On Sat, Sep 2, 2023 at 5:12 AM Chris Egerton 
wrote:

> Hi all,
>
> Can't imagine a worse time to publish a new KIP (it's late on a Friday and
> we're in the middle of the 3.6.0 release), but I wanted to put forth
> KIP-976 for discussion:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-976%3A+Cluster-wide+dynamic+log+adjustment+for+Kafka+Connect
>
> TL;DR: The API to dynamically adjust log levels at runtime with Connect is
> great, and I'd like to augment it with support to adjust log levels for
> every worker in the cluster (instead of just the worker that receives the
> REST request).
>
> I look forward to everyone's thoughts, but expect that this will probably
> take a bump or two once the dust has settled on 3.6.0. Huge thanks to
> everyone that's contributed to that release so far, especially our release
> manager Satish!
>
> Cheers,
>
> Chris
>


[VOTE] KIP-970: Deprecate and remove Connect's redundant task configurations endpoint

2023-08-30 Thread Yash Mayya
Hi all,

This is the vote thread for KIP-970 which proposes deprecating (in the
Apache Kafka 3.7 release) and eventually removing (in the next major Apache
Kafka release - 4.0) Connect's redundant task configurations endpoint.

KIP -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-970%3A+Deprecate+and+remove+Connect%27s+redundant+task+configurations+endpoint

Discussion thread -
https://lists.apache.org/thread/997qg9oz58kho3c19mdrjodv0n98plvj

Thanks,
Yash


Re: [DISCUSS] KIP-910: Update Source offsets for Source Connectors without producing records

2023-08-29 Thread Yash Mayya
Hi Sagar,

> The size of offsets topic can be controlled by
> setting appropriate topic retention values and
> that is a standard practice in Kafka

Kafka Connect enforces the `cleanup.policy` configuration for the offsets
topic to be "compact" only (references - [1], [2]), so the topic retention
related configurations won't be relevant right?

> Deleting offsets is not something which should
> be done very frequently and should be handled
> with care

> Agreed this involves some toil but it's not something
> that should be done on a very regular basis.

I'm not sure I follow how we came to this conclusion - could you please
expand on the pitfalls of allowing connector plugins to wipe the offsets
for source partitions that they no longer care about?

> The usecases you highlighted are edge cases at
> best. As I have been saying, if it is needed we can
> always add it in the future but that doesn't look like
> a problem we need to solve upfront.

I agree that these cases might not be too common, but I'm just trying to
understand the reasoning behind preventing this use case since null offsets
don't require any separate handling from the Connect runtime's point of
view (and wouldn't need any additional implementation work in this KIP).

Thanks,
Yash

[1] - https://kafka.apache.org/documentation/#connect_running
[2] -
https://github.com/apache/kafka/blob/0912ca27e2a229d2ebe02f4d1dabc40ed5fab0bb/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaTopicBasedBackingStore.java#L47

On Mon, Aug 28, 2023 at 1:38 PM Sagar  wrote:

> Hey Yash,
>
> Thanks for your further comments. Here are my responses:
>
> 1) Deleting offsets via updateOffsets.
>
> Hmm, I am not sure this is really necessary to be part of the KIP at this
> point, and we can always add it later on if needed. I say this for the
> following reasons:
>
>
>- The size of offsets topic can be controlled by setting appropriate
>topic retention values and that is a standard practice in Kafka. Sure
> it's
>not always possible to get the right values but as I said it is a
> standard
>practice. For Connect specifically, there is also a KIP (KIP-943
><
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255073470
> >)
>which is trying to solve the problem of a large connect-offsets topic.
> So,
>if that is really the motivation, then these are being addressed
> separately
>anyways.
>- Deleting offsets is not something which should be done very frequently
>and should be handled with care. That is why KIP-875's mechanism to have
>users/ cluster admin do this externally is the right thing to do. Agreed
>this involves some toil but it's not something that should be done on a
>very regular basis.
>- There is no stopping connector implementations to send tombstone
>records as offsets but in practice how many connectors actually do it?
>Maybe 1 or 2 from what we discussed.
>- The usecases you highlighted are edge cases at best. As I have been
>saying, if it is needed we can always add it in the future but that
> doesn't
>look like a problem we need to solve upfront.
>
> Due to these reasons, I don't think this is a point that we need to stress
> so much upon. I say this because offsets topic's purging/clean up can be
> handled either via standard Kafka techniques (point #1 above) or via
> Connect runtime techniques (Pt #2  above). IMO the problem we are trying to
> solve via this KIP has been solved by connectors using techniques which
> have been termed as having higher maintenance cost or a high cognitive load
> (i.e separate topic) and that needs to be addressed upfront. And since you
> yourself termed it as a nice to have feature, we can leave it to that and
> take it up as Future Work. Hope that's ok with you and other community
> members.
>
> 2) Purpose of offsets parameter in updateOffsets
>
> The main purpose is to provide the task with the visibility into what
> partitions are getting their offsets committed. It is not necessary that a
> task might choose to update offsets everytime it sees that a given source
> partition is missing from the about to be committed offsets. Maybe it
> chooses to wait for some X iterations or X amount of time and send out an
> updated offset for a partition only when such thresholds are breached. Even
> here we could argue that since it's sending the partition/offsets it can do
> the tracking on it's own, but IMO that is too much work given that the
> information is already available via offsets to be committed.
>
> Thanks!
> Sagar.
>


Re: [DISCUSS] KIP-910: Update Source offsets for Source Connectors without producing records

2023-08-22 Thread Yash Mayya
Hi Sagar,

Thanks for the updates and apologies for the delayed response.

> Hmm the question is how do you qualify a
> partition as stale or old? Let's say a connector
>  has implemented updateOffsets and for a certain
> partition for which no records are received then it
> will update it's offsets. So technically that offset can't
> be termed as stale anymore

The "staleness" was not from the point of view of the offsets, but the
source partition itself. For instance, if a database source connector is
monitoring a number of tables (each modelled as a source partition) and
detects that a table has been dropped, it might be nice to allow the
connector to wipe the offset for that source partition. Similarly, a file
based source connector that is reading from multiple files in a directory
might want to wipe the offsets for a source file that has been deleted.
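
To make that concrete, here's a purely hypothetical sketch of what a connector
could do if tombstones were allowed - the updateOffsets signature is only an
assumption based on this discussion, and detectDroppedTables() is a made-up
connector-specific helper:

    @Override
    public Map<Map<String, Object>, Map<String, Object>> updateOffsets(
            Map<Map<String, Object>, Map<String, Object>> offsetsAboutToBeCommitted) {
        Map<Map<String, Object>, Map<String, Object>> updates = new HashMap<>();
        for (String droppedTable : detectDroppedTables()) {
            // a null offset value would act as a tombstone and wipe the stale partition
            updates.put(Collections.singletonMap("table", droppedTable), null);
        }
        return updates;
    }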

> Even though I can't think of a side effect at this
> point to disallow offset deletion via this method,
> my opinion is to use a proper mechanism like the
> ones introduced in KIP-875 to delete offsets. Moreover,
> if I also consider the option presented in point #2 , for
> simplicity sake it seems better to not add this feature at
> this point

The KIP-875 APIs would allow users / cluster administrators to manually
wipe offsets externally. However, for the cases that I've outlined above,
it would be additional toil for the operator and something that would be
more suitable to be done by the connector itself. Also, I'm not sure if I'm
missing something here, but I don't get why allowing tombstone offsets
would add any complexity here?

> I get the point now. I can't think of cases where
> updating offsets would be needed.

Given that we're disallowing updating offsets for source partitions whose
offsets are about to be committed (or removing such source partitions
altogether), I'm wondering what purpose the "offsets" parameter in the
newly proposed SourceTask::updateOffsets method serves?

Thanks,
Yash

On Fri, Jul 28, 2023 at 1:41 PM Sagar  wrote:

> Hey Yash,
>
> Thanks for your comments.
>
> 1) Hmm the question is how do you qualify a partition as stale or old?
> Let's say a connector has implemented updateOffsets and for a certain
> partition for which no records are received then it will update it's
> offsets. So technically that offset can't be termed as stale anymore. Even
> though I can't think of a side effect at this point to disallow offset
> deletion via this method, my opinion is to use a proper mechanism like the
> ones introduced in KIP-875 to delete offsets. Moreover, if I also consider
> the option presented in point #2 , for simplicity sake it seems better to
> not add this feature at this point. If we feel it's really needed and users
> are requesting it, we can add support for it later on.
>
> 2) I get the point now. I can't think of cases where updating offsets would
> be needed. As with point #1, we can always add it back if needed later on.
> For now, I have removed that part from the KIP.
>
> 3) Yes, because the offset commit happens on a different thread, ordering
> guarantees might be harder to ensure if we do it from the other thread. The
> current mechanism proposed, even though gets invoked multiple times, keeps
> things simpler to reason about.
>
> Let me know how things look now. If it's all looking ok, I would go ahead
> and create a Vote thread for the same.
>
> Thanks!
> Sagar.
>
> On Tue, Jul 25, 2023 at 5:15 PM Yash Mayya  wrote:
>
> > Hi Sagar,
> >
> > Thanks for the updates. I had a few more follow up questions:
> >
> > > I have added that a better way of doing that would be
> > > via KIP-875. Also, I didn't want to include any mechamisms
> > > for users to meddle with the offsets topic. Allowing tombstone
> > > records via this method would be akin to publishing tombstone
> > > records directly to the offsets topic which is not recommended
> > > generally.
> >
> > KIP-875 would allow a way for cluster administrators and / or users to do
> > so manually externally whereas allowing tombstones in
> > SourceTask::updateOffsets would enable connectors to clean up offsets for
> > old / stale partitions without user intervention right? I'm not sure I
> > follow what you mean by "I didn't want to include any mechamisms for
> users
> > to meddle with the offsets topic" here? Furthermore, I'm not sure why
> > publishing tombstone records directly to the offsets topic would not be
> > recommended? Isn't that currently the only way to manually clean up
> offsets
> > for a source connector?
> >
> > > It could be useful in a scenario where the offset of a partition
> > > doesn't 

[DISCUSS] KIP-970: Deprecate and remove Connect's redundant task configurations endpoint

2023-08-22 Thread Yash Mayya
Hi all,

I'd like to start a discussion thread for this KIP -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-970%3A+Deprecate+and+remove+Connect%27s+redundant+task+configurations+endpoint
.

It proposes the deprecation and eventual removal of Kafka Connect's
redundant task configurations endpoint.

Thanks,
Yash


[jira] [Created] (KAFKA-15387) Deprecate and remove Connect's duplicate task configurations retrieval endpoint

2023-08-21 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-15387:
--

 Summary: Deprecate and remove Connect's duplicate task 
configurations retrieval endpoint
 Key: KAFKA-15387
 URL: https://issues.apache.org/jira/browse/KAFKA-15387
 Project: Kafka
  Issue Type: Task
  Components: KafkaConnect
Reporter: Yash Mayya
Assignee: Yash Mayya
 Fix For: 4.0.0


A new endpoint ({{GET /connectors/{connector}/tasks-config}}) was added to 
Kafka Connect's REST API to expose task configurations in 
[KIP-661|https://cwiki.apache.org/confluence/display/KAFKA/KIP-661%3A+Expose+task+configurations+in+Connect+REST+API].
 However, the original patch for Kafka Connect's REST API had already added an 
endpoint ({{GET /connectors/{connector}/tasks}}) to retrieve the list of a 
connector's tasks and their configurations (ref - 
[https://github.com/apache/kafka/pull/378] , 
https://issues.apache.org/jira/browse/KAFKA-2369) and this was missed in 
KIP-661. We can deprecate the endpoint added by KIP-661 in 3.7 (the next minor 
AK release) and remove it in 4.0 (the next major AK release) since it's 
redundant to have two separate endpoints to expose task configurations. Related 
discussions in 
[https://github.com/apache/kafka/pull/13424#discussion_r1144727886] and 
https://issues.apache.org/jira/browse/KAFKA-15377 





[jira] [Created] (KAFKA-15377) GET /connectors/{connector}/tasks-config endpoint exposes externalized secret values

2023-08-18 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-15377:
--

 Summary: GET /connectors/{connector}/tasks-config endpoint exposes 
externalized secret values
 Key: KAFKA-15377
 URL: https://issues.apache.org/jira/browse/KAFKA-15377
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Yash Mayya
Assignee: Yash Mayya


The {{GET /connectors/{connector}/tasks-config}} endpoint added in 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-661%3A+Expose+task+configurations+in+Connect+REST+API]
 exposes externalized secret values in task configurations (see 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-297%3A+Externalizing+Secrets+for+Connect+Configurations]).
 A similar bug was fixed in https://issues.apache.org/jira/browse/KAFKA-5117 / 
[https://github.com/apache/kafka/pull/6129] for the {{GET 
/connectors/{connector}/tasks}} endpoint. The config provider placeholder 
should be used instead of the resolved config value.
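
For illustration (values made up), the endpoint should return the KIP-297 style
placeholder rather than the value it resolves to:

    # expected
    "database.password": "${file:/etc/secrets/db.properties:password}"
    # actual (leaks the secret)
    "database.password": "my-actual-password"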





Re: Apache Kafka 3.6.0 release

2023-07-26 Thread Yash Mayya
Hi Hector,

KIP-959 actually still requires 2 more binding votes to be accepted (
https://cwiki.apache.org/confluence/display/KAFKA/Bylaws#Bylaws-Approvals).
The non-binding votes from people who aren't committers (including myself)
don't count towards the required lazy majority.

Thanks,
Yash

On Wed, Jul 26, 2023 at 7:35 PM Satish Duggana 
wrote:

> Hi Hector,
> Thanks for the update on KIP-959.
>
> ~Satish.
>
> On Wed, 26 Jul 2023 at 18:38, Hector Geraldino (BLOOMBERG/ 919 3RD A)
>  wrote:
> >
> > Hi Satish,
> >
> > I added KIP-959 [1] to the list. The KIP has received enough votes to
> pass, but I'm waiting the 72 hours before announcing the results. There's
> also a (small) PR with the implementation for this KIP that hopefully will
> get reviewed/merged soon.
> >
> > Best,
> >
> > [1]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-959%3A+Add+BooleanConverter+to+Kafka+Connect
> >
> > From: dev@kafka.apache.org At: 06/12/23 06:22:00 UTC-4:00To:
> dev@kafka.apache.org
> > Subject: Re: Apache Kafka 3.6.0 release
> >
> > Hi,
> > I have created a release plan for Apache Kafka version 3.6.0 on the
> > wiki. You can access the release plan and all related information by
> > following this link:
> > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.6.0
> >
> > The release plan outlines the key milestones and important dates for
> > version 3.6.0. Currently, the following dates have been set for the
> > release:
> >
> > KIP Freeze: 26th July 23
> > Feature Freeze : 16th Aug 23
> > Code Freeze : 30th Aug 23
> >
> > Please review the release plan and provide any additional information
> > or updates regarding KIPs targeting version 3.6.0. If you have
> > authored any KIPs that are missing a status or if there are incorrect
> > status details, please make the necessary updates and inform me so
> > that I can keep the plan accurate and up to date.
> >
> > Thanks,
> > Satish.
> >
> > On Mon, 17 Apr 2023 at 21:17, Luke Chen  wrote:
> > >
> > > Thanks for volunteering!
> > >
> > > +1
> > >
> > > Luke
> > >
> > > On Mon, Apr 17, 2023 at 2:03 AM Ismael Juma  wrote:
> > >
> > > > Thanks for volunteering Satish. +1.
> > > >
> > > > Ismael
> > > >
> > > > On Sun, Apr 16, 2023 at 10:08 AM Satish Duggana <
> satish.dugg...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > > I would like to volunteer as release manager for the next release,
> > > > > which will be Apache Kafka 3.6.0.
> > > > >
> > > > > If there are no objections, I will start a release plan a week
> after
> > > > > 3.5.0 release(around early May).
> > > > >
> > > > > Thanks,
> > > > > Satish.
> > > > >
> > > >
> >
> >
>


Re: [VOTE] KIP-959 Add BooleanConverter to Kafka Connect

2023-07-25 Thread Yash Mayya
Hi Hector,

Thanks for the KIP!

+1 (non-binding)

Thanks,
Yash

On Tue, Jul 25, 2023 at 11:01 PM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

> Thanks for the KIP. As you say, not that controversial.
>
> +1 (non-binding)
>
> Thanks,
> Andrew
>
> > On 25 Jul 2023, at 18:22, Hector Geraldino (BLOOMBERG/ 919 3RD A) <
> hgerald...@bloomberg.net> wrote:
> >
> > Hi everyone,
> >
> > The changes proposed by KIP-959 (Add BooleanConverter to Kafka Connect)
> have a limited scope and shouldn't be controversial. I'm opening a voting
> thread with the hope that it can be included in the next upcoming 3.6
> release.
> >
> > Here are some links:
> >
> > KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-959%3A+Add+BooleanConverter+to+Kafka+Connect
> > JIRA: https://issues.apache.org/jira/browse/KAFKA-15248
> > Discussion thread:
> https://lists.apache.org/thread/15c2t0kl9bozmzjxmkl5n57kv4l4o1dt
> > Pull Request: https://github.com/apache/kafka/pull/14093
> >
> > Thanks!
>
>
>


Re: [DISCUSS] KIP-910: Update Source offsets for Source Connectors without producing records

2023-07-25 Thread Yash Mayya
ms simpler and more
> >> focused, which I think is a win for users and developers alike. Here are
> >> my
> >> thoughts on the current draft:
> >>
> >> 1. (Nit) Can we move the "Public Interfaces" section before the
> "Proposed
> >> Changes" section? It's nice to have a summary of the
> user/developer-facing
> >> changes first since that answers many of the questions that I had while
> >> reading the "Proposed Changes" section. I'd bet that this is also why we
> >> use that ordering in the KIP template.
> >>
> >> 2. Why are we invoking SourceTask::updateOffsets so frequently when
> >> exactly-once support is disabled? Wouldn't it be simpler both for our
> >> implementation and for connector developers if we only invoked it
> directly
> >> before committing offsets, instead of potentially several times between
> >> offset commits, especially since that would also mirror the behavior
> with
> >> exactly-once support enabled?
> >>
> >> 3. Building off of point 2, we wouldn't need to specify any more detail
> >> than that "SourceTask::updateOffsets will be invoked directly before
> >> committing offsets, with the to-be-committed offsets". There would be no
> >> need to distinguish between when exactly-once support is enabled or
> >> disabled.
> >>
> >> 4. Some general stylistic feedback: we shouldn't mention the names of
> >> internal classes or methods in KIPs. KIPS are for discussing high-level
> >> design proposals. Internal names and APIS may change over time, and are
> >> not
> >> very helpful to readers who are not already familiar with the code base.
> >> Instead, we should describe changes in behavior, not code.
> >>
> >> 5. Why return a complete map of to-be-committed offsets instead of a map
> >> of
> >> just the offsets that the connector wants to change? This seems
> especially
> >> intuitive since we automatically re-insert source partitions that have
> >> been
> >> removed by the connector.
> >>
> >> 6. I don't think we don't need to return an Optional from
> >> SourceTask::updateOffsets. Developers can return null instead of
> >> Optional.empty(), and since the framework will have to handle null
> return
> >> values either way, this would reduce the number of cases for us to
> handle
> >> from three (Optional.of(...), Optional.empty(), null) to two (null,
> >> non-null).
> >>
> >> 7. Why disallow tombstone records? If an upstream resource disappears,
> >> then
> >> wouldn't a task want to emit a tombstone record without having to also
> >> emit
> >> an accompanying source record? This could help prevent an
> >> infinitely-growing offsets topic, although with KIP-875 coming out in
> the
> >> next release, perhaps we can leave this out for now and let Connect
> users
> >> and cluster administrators do this work manually instead of letting
> >> connector developers automate it.
> >>
> >> 8. Is the information on multiple offsets topics for exactly-once
> >> connectors relevant to this KIP? If not, we should remove it.
> >>
> >> 9. It seems like most of the use cases that motivate this KIP only
> require
> >> being able to add a new source partition/source offset pair to the
> >> to-be-committed offsets. Do we need to allow connector developers to
> >> modify
> >> source offsets for already-present source partitions at all? If we
> reduce
> >> the surface of the API, then the worst case is still just that the
> offsets
> >> we commit are at most one commit out-of-date.
> >>
> >> 10. (Nit) The "Motivation" section states that "offsets are written
> >> periodically by the connect framework to an offsets topic". This is only
> >> true in distributed mode; in standalone mode, we write offsets to a
> local
> >> file.
> >>
> >> Cheers,
> >>
> >> Chris
> >>
> >> On Tue, Jul 4, 2023 at 8:42 AM Yash Mayya  wrote:
> >>
> >> > Hi Sagar,
> >> >
> >> > Thanks for your continued work on this KIP! Here are my thoughts on
> your
> >> > updated proposal:
> >> >
> >> > 1) In the proposed changes section where you talk about modifying the
> >> > offsets, could you please clarify that tasks shouldn't modify the
> >> offsets
> >> > map that is passed as an ar

[jira] [Created] (KAFKA-15238) Connect workers can be disabled by DLQ related stuck admin client calls

2023-07-24 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-15238:
--

 Summary: Connect workers can be disabled by DLQ related stuck 
admin client calls
 Key: KAFKA-15238
 URL: https://issues.apache.org/jira/browse/KAFKA-15238
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Yash Mayya
Assignee: Yash Mayya


When Kafka Connect is run in distributed mode - if a sink connector's task is 
restarted (via a worker's REST API), the following sequence of steps will occur 
(on the DistributedHerder's thread):

 
 # The existing sink task will be stopped 
([ref|https://github.com/apache/kafka/blob/4981fa939d588645401619bfc3e321dc523d10e7/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1367])
 # A new sink task will be started 
([ref|https://github.com/apache/kafka/blob/4981fa939d588645401619bfc3e321dc523d10e7/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1867C40-L1867C40])
 # As a part of the above step, a new {{WorkerSinkTask}} will be instantiated 
([ref|https://github.com/apache/kafka/blob/4981fa939d588645401619bfc3e321dc523d10e7/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L656-L663])
 # The DLQ reporter (see 
[KIP-298|https://cwiki.apache.org/confluence/display/KAFKA/KIP-298%3A+Error+Handling+in+Connect])
 for the sink task is also instantiated and configured as a part of this 
([ref|https://github.com/apache/kafka/blob/4981fa939d588645401619bfc3e321dc523d10e7/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L1800])
 # The DLQ reporter setup involves two synchronous admin client calls to list 
topics and create the DLQ topic if it isn't already created 
([ref|https://github.com/apache/kafka/blob/4981fa939d588645401619bfc3e321dc523d10e7/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/errors/DeadLetterQueueReporter.java#L84-L87])

 

All of these are occurring synchronously on the herder's tick thread - in this 
portion 
[here|https://github.com/apache/kafka/blob/4981fa939d588645401619bfc3e321dc523d10e7/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L457-L469]
 where external requests are run. If the admin client call in the DLQ reporter 
setup step blocks for some time (due to auth failures and retries or network 
issues or whatever other reason), this can cause the Connect worker to become 
non-functional (REST API requests will time out) and even fall out of the 
Connect cluster and become a zombie (since the tick thread also drives group 
membership functions).





[jira] [Created] (KAFKA-15216) InternalSinkRecord::newRecord method ignores the headers argument

2023-07-19 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-15216:
--

 Summary: InternalSinkRecord::newRecord method ignores the headers 
argument
 Key: KAFKA-15216
 URL: https://issues.apache.org/jira/browse/KAFKA-15216
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Yash Mayya
Assignee: Yash Mayya


[https://github.com/apache/kafka/blob/a1f6ab69387deb10988461152a0087f0cd2827c4/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/InternalSinkRecord.java#L50-L56]
 - the headers argument passed to the {{InternalSinkRecord}} constructor is the 
instance field (obtained via the {{headers()}} accessor method) instead of the 
{{newRecord}} method's {{headers}} argument value.

 

Originally discovered 
[here.|https://github.com/apache/kafka/pull/14024#discussion_r1266917499]





[jira] [Created] (KAFKA-15182) Normalize offsets before invoking SourceConnector::alterOffsets

2023-07-12 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-15182:
--

 Summary: Normalize offsets before invoking 
SourceConnector::alterOffsets
 Key: KAFKA-15182
 URL: https://issues.apache.org/jira/browse/KAFKA-15182
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Yash Mayya
Assignee: Yash Mayya


See discussion 
[here|https://github.com/apache/kafka/pull/13945#discussion_r1260946148]

 

TLDR: When users attempt to externally modify source connector offsets via the 
{{PATCH /offsets}} endpoint (introduced in 
[KIP-875|https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect]),
 type mismatches can occur between offsets passed to 
{{SourceConnector::alterOffsets}} and the offsets that are retrieved by 
connectors / tasks via an instance of {{OffsetStorageReader}} after the offsets 
have been modified. In order to prevent this type mismatch that could lead to 
subtle bugs in connectors, we could serialize + deserialize the offsets using 
the worker's internal JSON converter before invoking 
{{SourceConnector::alterOffsets}}.
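
A rough sketch of the normalization idea (this is not the actual implementation;
the converter configuration simply mirrors the schemaless style used for the
worker's internal JSON converter):

    JsonConverter converter = new JsonConverter();
    converter.configure(Collections.singletonMap("schemas.enable", "false"), false);
    // round-trip each offset map through the converter so that the types passed to
    // SourceConnector::alterOffsets match what an OffsetStorageReader would later return
    byte[] serialized = converter.fromConnectData("", null, offset);   // offset: Map<String, ?>
    Map<String, Object> normalized =
            (Map<String, Object>) converter.toConnectData("", serialized).value();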





[jira] [Created] (KAFKA-15179) Add integration tests for the FileStream Sink and Source connectors

2023-07-11 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-15179:
--

 Summary: Add integration tests for the FileStream Sink and Source 
connectors
 Key: KAFKA-15179
 URL: https://issues.apache.org/jira/browse/KAFKA-15179
 Project: Kafka
  Issue Type: Improvement
Reporter: Yash Mayya
Assignee: Yash Mayya


Add integration tests for the FileStream Sink and Source connectors covering 
various different common scenarios.





[jira] [Created] (KAFKA-15177) MirrorMaker 2 should implement the alterOffsets KIP-875 API

2023-07-11 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-15177:
--

 Summary: MirrorMaker 2 should implement the alterOffsets KIP-875 
API
 Key: KAFKA-15177
 URL: https://issues.apache.org/jira/browse/KAFKA-15177
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect, mirrormaker
Reporter: Yash Mayya


The {{MirrorSourceConnector}} class should implement the new alterOffsets API 
added in 
[KIP-875|https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect].
 We could also implement the API in 
{{MirrorCheckpointConnector}} and 
{{MirrorHeartbeatConnector}} to prevent external modification of offsets since 
the operation wouldn't really make sense in their case.





Re: [ANNOUNCE] New committer: Greg Harris

2023-07-10 Thread Yash Mayya
Congrats Greg!

On Mon, Jul 10, 2023 at 9:15 PM Chris Egerton  wrote:

> Hi all,
>
> The PMC for Apache Kafka has invited Greg Harris to become a committer, and
> we are happy to announce that he has accepted!
>
> Greg has been contributing to Kafka since 2019. He has made over 50 commits
> mostly around Kafka Connect and Mirror Maker 2. His most notable
> contributions include KIP-898: "Modernize Connect plugin discovery" and a
> deep overhaul of the offset syncing logic in MM2 that addressed several
> technically-difficult, long-standing, high-impact issues.
>
> He has also been an active participant in discussions and reviews on the
> mailing lists and on GitHub.
>
> Thanks for all of your contributions, Greg. Congratulations!
>


Re: [VOTE] KIP-882: Kafka Connect REST API timeout improvements

2023-07-10 Thread Yash Mayya
Hi all,

I just wanted to post an update on this KIP since it has been dormant for a
while now. The original issue that motivated this KIP was resolved by
tweaking some configurations for the client that the connector was using to
talk to the external system. Instead of discarding this KIP, however, I've
heavily reduced its scope to only include the following improvements:

   - Abort create / update connector requests if the config validation
   causes the request to time out (instead of the existing behavior of
   proceeding to create / update a connector even after serving a 500 response
   to the user)
   - Fix the double config validation issue in Connect's distributed mode

I was on the fence about whether or not this required a KIP but since we're
technically changing the behavior of a public endpoint in certain cases, I
decided to err on the side of caution and continue with a public discussion
and vote.

Thanks,
Yash

On Wed, Mar 1, 2023 at 6:32 PM Yash Mayya  wrote:

> Hi all,
>
> I'd like to call for a vote on the (hopefully) straightforward KIP-882
> which adds support for configuring request timeouts on Kafka Connect REST
> APIs via query parameters along with a couple of related small improvements.
>
> KIP -
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-882%3A+Kafka+Connect+REST+API+timeout+improvements
>
> Discussion thread -
> https://lists.apache.org/thread/cygy115qmwpc3nj5omnj0crws2dw8nor
>
> Thanks,
> Yash
>


Re: Apache Kafka 3.6.0 release

2023-07-06 Thread Yash Mayya
Hi Satish,

KIP-793 [1] just passed voting and we should be able to wrap up the
implementation in time for the 3.6.0 feature freeze. Could we add it to the
release plan?

[1] -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-793%3A+Allow+sink+connectors+to+be+used+with+topic-mutating+SMTs

Thanks,
Yash

On Mon, Jun 12, 2023 at 3:52 PM Satish Duggana 
wrote:

> Hi,
> I have created a release plan for Apache Kafka version 3.6.0 on the
> wiki. You can access the release plan and all related information by
> following this link:
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.6.0
>
> The release plan outlines the key milestones and important dates for
> version 3.6.0. Currently, the following dates have been set for the
> release:
>
> KIP Freeze: 26th July 23
> Feature Freeze : 16th Aug 23
> Code Freeze : 30th Aug 23
>
> Please review the release plan and provide any additional information
> or updates regarding KIPs targeting version 3.6.0. If you have
> authored any KIPs that are missing a status or if there are incorrect
> status details, please make the necessary updates and inform me so
> that I can keep the plan accurate and up to date.
>
> Thanks,
> Satish.
>
> On Mon, 17 Apr 2023 at 21:17, Luke Chen  wrote:
> >
> > Thanks for volunteering!
> >
> > +1
> >
> > Luke
> >
> > On Mon, Apr 17, 2023 at 2:03 AM Ismael Juma  wrote:
> >
> > > Thanks for volunteering Satish. +1.
> > >
> > > Ismael
> > >
> > > On Sun, Apr 16, 2023 at 10:08 AM Satish Duggana <
> satish.dugg...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > > I would like to volunteer as release manager for the next release,
> > > > which will be Apache Kafka 3.6.0.
> > > >
> > > > If there are no objections, I will start a release plan a week after
> > > > 3.5.0 release(around early May).
> > > >
> > > > Thanks,
> > > > Satish.
> > > >
> > >
>


Re: [VOTE] KIP-793: Allow sink connectors to be used with topic-mutating SMTs

2023-07-06 Thread Yash Mayya
Hi all,

Thanks to everyone who participated in the voting and the discussion. I'll
close the voting since it has been open for over 72 hours and we have a
sufficient number of votes. KIP-793 has been accepted with the following +1
votes (and no +0 or -1 votes):


   - Chris Egerton (binding)
   - Greg Harris
   - Mickael Maison (binding)
   - Randall Hauch (binding)

I'll start working on the implementation next week and try to get it merged
in time for the 3.6.0 release.

Thanks,
Yash

On Thu, Jul 6, 2023 at 11:32 PM Randall Hauch  wrote:

> Hi, Yash.
>
> Thanks for the KIP and for fixing this long standing issue.
> +1 (binding)
>
> Randall
>
> On Tue, Jul 4, 2023 at 8:32 AM Mickael Maison 
> wrote:
>
> > Hi,
> >
> > +1 (binding)
> > Thanks for the KIP
> >
> > Mickael
> >
> >
> > On Mon, Jul 3, 2023 at 8:18 PM Greg Harris  >
> > wrote:
> > >
> > > Hey Yash,
> > >
> > > Thanks so much for your effort in the design and discussion phase!
> > >
> > > +1 (non-binding)
> > >
> > > Greg
> > >
> > > On Mon, Jul 3, 2023 at 7:19 AM Chris Egerton 
> > wrote:
> > > >
> > > > Hi Yash,
> > > >
> > > > Thanks for the KIP! +1 (binding)
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > On Mon, Jul 3, 2023 at 7:02 AM Yash Mayya 
> > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to start a vote on KIP-793 which enables sink connector
> > > > > implementations to be used with SMTs that mutate the topic /
> > partition /
> > > > > offset information of a record.
> > > > >
> > > > > KIP -
> > > > >
> > > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-793%3A+Allow+sink+connectors+to+be+used+with+topic-mutating+SMTs
> > > > >
> > > > > Discussion thread -
> > > > > https://lists.apache.org/thread/dfo3spv0xtd7vby075qoxvcwsgx5nkj8
> > > > >
> > > > > Thanks,
> > > > > Yash
> > > > >
> >
>


Re: [DISCUSS] KIP-943: Add independent "offset.storage.segment.bytes" for connect-distributed.properties

2023-07-06 Thread Yash Mayya
Also, just adding to the above point - we don't necessarily need to
explicitly add a new worker configuration, right? Instead, we could
potentially just use the new proposed default value internally which can be
overridden by users through setting a value for
"offset.storage.segment.bytes" (via the existing KIP-605 based mechanism).

On Thu, Jul 6, 2023 at 6:04 PM Yash Mayya  wrote:

> Hi hudeqi,
>
> Thanks for the KIP! Just to clarify - since KIP-605 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-605%3A+Expand+Connect+Worker+Internal+Topic+Settings)
> already allows configuring "segment.bytes" for the Connect cluster's
> offsets topic via a worker configuration ("offset.storage.segment.bytes",
> same as what is being proposed in this KIP), the primary motivation of this
> KIP is essentially to override the default value for that topic
> configuration to a smaller value and decouple it from the backing Kafka
> cluster's "log.segment.bytes" configuration? Also, I'm curious about how
> the new default value of 50 MB was chosen (if there were any experiments
> that were run etc.)?
>
> Thanks,
> Yash
>
> On Mon, Jul 3, 2023 at 6:08 PM hudeqi <16120...@bjtu.edu.cn> wrote:
>
>> Is anyone following this KIP? Bump this thread.
>
>


Re: [DISCUSS] KIP-943: Add independent "offset.storage.segment.bytes" for connect-distributed.properties

2023-07-06 Thread Yash Mayya
Hi hudeqi,

Thanks for the KIP! Just to clarify - since KIP-605 (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-605%3A+Expand+Connect+Worker+Internal+Topic+Settings)
already allows configuring "segment.bytes" for the Connect cluster's
offsets topic via a worker configuration ("offset.storage.segment.bytes",
same as what is being proposed in this KIP), the primary motivation of this
KIP is essentially to override the default value for that topic
configuration to a smaller value and decouple it from the backing Kafka
cluster's "log.segment.bytes" configuration? Also, I'm curious about how
the new default value of 50 MB was chosen (if there were any experiments
that were run etc.)?

Thanks,
Yash

On Mon, Jul 3, 2023 at 6:08 PM hudeqi <16120...@bjtu.edu.cn> wrote:

> Is anyone following this KIP? Bump this thread.


Re: [DISCUSS] KIP-910: Update Source offsets for Source Connectors without producing records

2023-07-04 Thread Yash Mayya
new method will still be compatible with
older Connect runtimes where the method will simply not be invoked.


Thanks,
Yash

On Wed, Jun 21, 2023 at 10:25 PM Sagar  wrote:

> Hi All,
>
> I have created this PR: https://github.com/apache/kafka/pull/13899 which
> implements the approach outlined in the latest version of the KIP. I
> thought I could use this to validate the approach based on my understanding
> while the KIP itself gets reviewed. I can always change the implementation
> once we move to a final decision on the KIP.
>
> Thanks!
> Sagar.
>
>
> On Wed, Jun 14, 2023 at 4:59 PM Sagar  wrote:
>
> > Hey All,
> >
> > Bumping this discussion thread again to see how the modified KIP looks
> > like.
> >
> > Thanks!
> > Sagar.
> >
> > On Mon, May 29, 2023 at 8:12 PM Sagar  wrote:
> >
> >> Hi,
> >>
> >> Bumping this thread again for further reviews.
> >>
> >> Thanks!
> >> Sagar.
> >>
> >> On Fri, May 12, 2023 at 3:38 PM Sagar 
> wrote:
> >>
> >>> Hi All,
> >>>
> >>> Thanks for the comments/reviews. I have updated the KIP
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-910%3A+Update+Source+offsets+for+Source+Connectors+without+producing+records
> >>> with a newer approach which shelves the need for an explicit topic.
> >>>
> >>> Please review again and let me know what you think.
> >>>
> >>> Thanks!
> >>> Sagar.
> >>>
> >>>
> >>> On Mon, Apr 24, 2023 at 3:35 PM Yash Mayya 
> wrote:
> >>>
> >>>> Hi Sagar,
> >>>>
> >>>> Thanks for the KIP! I have a few questions and comments:
> >>>>
> >>>> 1) I agree with Chris' point about the separation of a connector
> >>>> heartbeat
> >>>> mechanism and allowing source connectors to generate offsets without
> >>>> producing data. What is the purpose of the heartbeat topic here and
> are
> >>>> there any concrete use cases for downstream consumers on this topic?
> Why
> >>>> can't we instead simply introduce a mechanism to retrieve a list of
> >>>> source
> >>>> partition / source offset pairs from the source tasks?
> >>>>
> >>>> 2) With the currently described mechanism, the new
> >>>> "SourceTask::produceHeartbeatRecords" method returns a
> >>>> "List"
> >>>> - what happens with the topic in each of these source records? Chris
> >>>> pointed this out above, but it doesn't seem to have been addressed?
> The
> >>>> "SourceRecord" class also has a bunch of other fields which will be
> >>>> irrelevant here (partition, key / value schema, key / value data,
> >>>> timestamp, headers). In fact, it seems like only the source partition
> >>>> and
> >>>> source offset are relevant here, so we should either introduce a new
> >>>> abstraction or simply use a data structure like a mapping from source
> >>>> partitions to source offsets (adds to the above point)?
> >>>>
> >>>> 3) I'm not sure I fully follow why the heartbeat timer / interval is
> >>>> needed? What are the downsides of
> >>>> calling "SourceTask::produceHeartbeatRecords" in every execution loop
> >>>> (similar to the existing "SourceTask::poll" method)? Is this only to
> >>>> prevent the generation of a lot of offset records? Since Connect's
> >>>> offsets
> >>>> topics are log compacted (and source partitions are used as keys for
> >>>> each
> >>>> source offset), I'm not sure if such concerns are valid and such a
> >>>> heartbeat timer / interval mechanism is required?
> >>>>
> >>>> 4) The first couple of rejected alternatives state that the use of a
> >>>> null
> >>>> topic / key / value are preferably avoided - but the current proposal
> >>>> would
> >>>> also likely require connectors to use such workarounds (null topic
> when
> >>>> the
> >>>> heartbeat topic is configured at a worker level and always for the
> key /
> >>>> value)?
> >>>>
> >>>> 5) The third rejected alternative talks about subclassing the
> >>>> "SourceRecord" class - this presumably means allowing connector

[jira] [Created] (KAFKA-15145) AbstractWorkerSourceTask re-processes records filtered out by SMTs on retriable exceptions

2023-07-04 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-15145:
--

 Summary: AbstractWorkerSourceTask re-processes records filtered 
out by SMTs on retriable exceptions
 Key: KAFKA-15145
 URL: https://issues.apache.org/jira/browse/KAFKA-15145
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Yash Mayya
Assignee: Yash Mayya


If a RetriableException is thrown from an admin client or producer client 
operation in 
[AbstractWorkerSourceTask::sendRecords|https://github.com/apache/kafka/blob/5c2492bca71200806ccf776ea31639a90290d43e/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractWorkerSourceTask.java#L388],
 the send operation is retried for the remaining records in the batch. There is 
a minor bug in the logic for computing the remaining records for a batch which 
causes records that are filtered out by the task's transformation chain to be 
re-processed. This will also result in the SourceTask::commitRecord method 
being called twice for the same record, which can cause certain types of source 
connectors to fail. This bug appears to have existed since SMTs were first 
introduced in 0.10.2.
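
To make the failure mode concrete, here is a simplified sketch (not the actual 
AbstractWorkerSourceTask code; the class and method names are made up) of how computing the 
retry point from the number of records successfully sent, rather than the number of records 
processed (sent plus filtered out), causes filtered records to be transformed and committed 
a second time on retry:

import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Illustrative sketch of a batched send loop with a transformation chain that can filter
// records out by returning null. On a retriable send failure, the batch must be retried
// from 'processed' (which counts filtered records as handled); retrying from the count of
// records actually sent re-processes the filtered ones.
class SendLoopSketch<R> {
    // Returns the index from which the batch must be retried, or records.size() on success.
    int sendBatch(List<R> records, Function<R, R> transformChain, Predicate<R> trySend) {
        int processed = 0;
        for (R record : records) {
            R transformed = transformChain.apply(record);
            if (transformed == null) {
                processed++;      // filtered out by an SMT: still counts as handled
                continue;
            }
            if (!trySend.test(transformed)) {
                return processed; // retriable failure: retry the batch from this index
            }
            processed++;
        }
        return processed;
    }
}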



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] KIP-793: Allow sink connectors to be used with topic-mutating SMTs

2023-07-03 Thread Yash Mayya
Hi all,

I'd like to start a vote on KIP-793 which enables sink connector
implementations to be used with SMTs that mutate the topic / partition /
offset information of a record.

KIP -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-793%3A+Allow+sink+connectors+to+be+used+with+topic-mutating+SMTs

Discussion thread -
https://lists.apache.org/thread/dfo3spv0xtd7vby075qoxvcwsgx5nkj8

Thanks,
Yash


Re: [DISCUSS] KIP-793: Sink Connectors: Support topic-mutating SMTs for async connectors (preCommit users)

2023-07-03 Thread Yash Mayya
Hi Chris,

Thanks for pointing that out; I hadn't realized that the SubmittedRecords
class has almost exactly the same semantics needed for handling offset
commits in the per-sink-record ack API case. However, I agree that it isn't
worth the tradeoff and we've already discussed the backward compatibility
concerns imposed on connector developers if we were to consider deprecating
/ removing the preCommit hook in favor of a new ack-based API.
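
For context, the kind of per-partition bookkeeping being discussed could be sketched
roughly as below. This is illustrative only (not the actual SubmittedRecords API), and
note that it never assumes that consecutive offsets in a topic partition differ by one:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Records are registered in consumption order and may be acknowledged out of order; the
// committable offset for a partition is one past the longest fully-acknowledged prefix.
class AckTrackerSketch {
    static class InFlightRecord {
        final long kafkaOffset;
        boolean acked;
        InFlightRecord(long kafkaOffset) { this.kafkaOffset = kafkaOffset; }
    }

    private final Map<String, Deque<InFlightRecord>> inFlight = new HashMap<>();

    InFlightRecord submit(String topicPartition, long kafkaOffset) {
        InFlightRecord record = new InFlightRecord(kafkaOffset);
        inFlight.computeIfAbsent(topicPartition, tp -> new ArrayDeque<>()).addLast(record);
        return record;
    }

    void ack(InFlightRecord record) {
        record.acked = true;
    }

    // Offset to commit for a partition: one past the last record in the acked prefix.
    OptionalLong committableOffset(String topicPartition) {
        Deque<InFlightRecord> queue = inFlight.getOrDefault(topicPartition, new ArrayDeque<>());
        OptionalLong committable = OptionalLong.empty();
        while (!queue.isEmpty() && queue.peekFirst().acked) {
            committable = OptionalLong.of(queue.pollFirst().kafkaOffset + 1);
        }
        return committable;
    }
}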

Thanks,
Yash

On Thu, Jun 29, 2023 at 7:31 PM Chris Egerton 
wrote:

> Hi Yash,
>
> Thanks for your continued work on this tricky feature. I have no further
> comments or suggestions on the KIP and am ready to vote in favor of it.
>
> That said, I did want to quickly respond to this comment:
>
> > On a side note, this also means that the per sink record ack API
> that was proposed earlier wouldn't really work for this case since Kafka
> consumers themselves don't support per message acknowledgement semantics
> (and any sort of manual book-keeping based on offset linearity in a topic
> partition would be affected by things like log compaction, control records
> for transactional use cases etc.) right?
>
> I believe we could still use the SubmittedRecords class [1] (with some
> small tweaks) to track ack'd messages and the latest-committable offsets
> per topic partition, without relying on assumptions about offsets for
> consecutive records consumed from Kafka always differing by one. But at
> this point I think that, although this approach does come with the
> advantage of also enabling fine-grained metrics on record delivery to the
> sink system, it's not worth the tradeoff in intuition since it's less clear
> why users should prefer that API instead of using SinkTask::preCommit.
>
> [1] -
>
> https://github.com/apache/kafka/blob/12be344fdd3b20f338ccab87933b89049ce202a4/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/SubmittedRecords.java
>
> Cheers,
>
> Chris
>
> On Wed, Jun 21, 2023 at 9:46 AM Yash Mayya  wrote:
>
> > Hi Chris,
> >
> > Firstly, thanks for sharing your detailed thoughts on this thorny issue!
> > Point taken on Kafka Connect being a brownfield project and I guess we
> > might just need to trade off elegant / "clean" interfaces for fixing this
> > gap in functionality. Also, thanks for calling out all the existing
> > cross-plugin interactions and also the fact that connectors are not and
> > should not be developed in silos ignoring the rest of the ecosystem. That
> > said, here are my thoughts:
> >
> > > we could replace these methods with headers that the
> > > Connect runtime automatically injects into records directly
> > > before dispatching them to SinkTask::put.
> >
> > Hm, that's an interesting idea to get around the need for connectors to
> > handle potential 'NoSuchMethodError's in calls to
> > SinkRecord::originalTopic/originalKafkaPartition/originalKafkaOffset.
> > However, I'm inclined to agree that retrieving these values from the
> record
> > headers seems even less intuitive and I'm okay with adding this to the
> > rejected alternatives list.
> >
> > > we can consider eliminating the overridden
> > > SinkTask::open/close methods
> >
> > I tried to further explore the idea of keeping just the existing
> > SinkTask::open / SinkTask::close methods but only calling them with
> > post-transform topic partitions and ended up coming to the same
> conclusion
> > that you did earlier in this thread :)
> >
> > The overloaded SinkTask::open / SinkTask::close are currently the biggest
> > sticking points with the latest iteration of this KIP and I'd prefer this
> > elimination for now. The primary reasoning is that the information from
> > open / close on pre-transform topic partitions can be combined with the
> per
> > record information of both pre-transform and post-transform topic
> > partitions to handle most practical use cases without significantly
> > muddying the sink connector related public interfaces. The argument that
> > this makes it harder for sink connectors to deal with post-transform
> topic
> > partitions (i.e. in terms of grouping together or batching records for
> > writing to the sink system) can be countered with the fact that it'll be
> > similarly challenging even with the overloaded method approach of calling
> > open / close with both pre-transform and post-transform topic partitions
> > since the batching would be done on post-transform topic partitions
> whereas
> > offset tracking and reporting for commits would be done on pre-transform
> > topic partitions (and the two won't necessarily serially

[jira] [Created] (KAFKA-15121) FileStreamSourceConnector and FileStreamSinkConnector should implement KIP-875 APIs

2023-06-26 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-15121:
--

 Summary: FileStreamSourceConnector and FileStreamSinkConnector 
should implement KIP-875 APIs
 Key: KAFKA-15121
 URL: https://issues.apache.org/jira/browse/KAFKA-15121
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Yash Mayya
Assignee: Yash Mayya


[https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect]
 introduced the new SourceConnector::alterOffsets and 
SinkConnector::alterOffsets APIs. The FileStreamSourceConnector and 
FileStreamSinkConnector should implement these new methods to improve the user 
experience when modifying offsets for these connectors and also to serve as an 
example for other connectors.
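
For illustration, the source-side implementation could take roughly the following shape (a 
sketch only; the actual FileStreamSourceConnector validation may differ, and the "position" 
offset key used below is an assumption):

import java.util.Map;
import org.apache.kafka.connect.errors.ConnectException;
import org.apache.kafka.connect.source.SourceConnector;

// Sketch of validating user-supplied offsets in SourceConnector::alterOffsets (KIP-875)
// before the runtime persists them; returning true indicates the offsets were accepted.
public abstract class FileStreamSourceConnectorSketch extends SourceConnector {
    @Override
    public boolean alterOffsets(Map<String, String> connectorConfig,
                                Map<Map<String, ?>, Map<String, ?>> offsets) {
        for (Map<String, ?> offset : offsets.values()) {
            if (offset == null) {
                continue; // a null offset value is a reset for that source partition
            }
            Object position = offset.get("position");
            if (!(position instanceof Long) || (Long) position < 0) {
                throw new ConnectException("Invalid position in offset to alter: " + offset);
            }
        }
        return true;
    }
}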



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15113) Gracefully handle cases where a sink connector's admin and consumer client config overrides target different Kafka clusters

2023-06-22 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-15113:
--

 Summary: Gracefully handle cases where a sink connector's admin 
and consumer client config overrides target different Kafka clusters
 Key: KAFKA-15113
 URL: https://issues.apache.org/jira/browse/KAFKA-15113
 Project: Kafka
  Issue Type: Task
  Components: KafkaConnect
Reporter: Yash Mayya


Background reading -
 * 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-458%3A+Connector+Client+Config+Override+Policy]
 
 * 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect]
 

 

From [https://github.com/apache/kafka/pull/13434#discussion_r1144415671] -
{quote}Currently, admin clients are only instantiated for sink connectors to 
create the DLQ topic if required. So it seems like it could be technically 
possible for a sink connector's consumer client overrides to target a different 
Kafka cluster from its producer and admin client overrides. Such a setup won't 
work with this implementation of the get offsets API as it is using an admin 
client to get a sink connector's consumer group offsets. However, I'm not sure 
we want to use a consumer client to retrieve the offsets either as we shouldn't 
be disrupting the existing sink tasks' consumer group just to fetch offsets. 
Leveraging a sink task's consumer also isn't an option because fetching offsets 
for a stopped sink connector (where all the tasks will be stopped) should be 
allowed. I'm wondering if we should document that a connector's various client 
config override policies shouldn't target different Kafka clusters (side note - 
looks like we don't [currently 
document|https://kafka.apache.org/documentation/#connect] client config 
overrides for Connect beyond just the worker property 
{{connector.client.config.override.policy}}).
{quote}
 
{quote}I don't think we need to worry too much about this. I cannot imagine a 
sane use case that involves overriding a connector's Kafka clients with 
different Kafka clusters (not just bootstrap servers, but actually different 
clusters) for producer/consumer/admin. I'd be fine with adding a note to our 
docs that that kind of setup isn't supported but I really, really hope that 
it's not necessary and nobody's trying to do that in the first place. I also 
suspect that there are other places where this might cause issues, like with 
exactly-once source support or automatic topic creation for source connectors.

That said, there is a different case we may want to consider: someone may have 
configured consumer overrides for a sink connector, but not admin overrides. 
This may happen if they don't use a DLQ topic. I don't know if we absolutely 
need to handle this now and we may consider filing a follow-up ticket to look 
into this, but one quick-and-dirty thought I've had is to configure the admin 
client used here with a combination of the configurations for the connector's 
admin client and its consumer, giving precedence to the latter.
{quote}
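
A rough sketch of that quick-and-dirty idea (illustrative only, not actual worker code): 
build the admin client config from the connector's admin overrides and then layer the 
consumer overrides on top so that the latter take precedence.

import java.util.HashMap;
import java.util.Map;

// Merge the connector's admin client overrides with its consumer overrides, letting the
// consumer overrides win, and use the result to configure the admin client that lists or
// resets the sink connector's consumer group offsets. In practice, consumer-only config
// keys would also need to be filtered out of the merged map.
class OffsetAdminConfigSketch {
    static Map<String, Object> adminConfigForOffsets(Map<String, Object> adminOverrides,
                                                     Map<String, Object> consumerOverrides) {
        Map<String, Object> merged = new HashMap<>(adminOverrides);
        merged.putAll(consumerOverrides); // consumer overrides take precedence
        return merged;
    }
}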
 

Also from [https://github.com/apache/kafka/pull/13818#discussion_r1224138055] -
{quote}We will have undesirable behavior if the connector is targeting a Kafka 
cluster different from the Connect cluster's backing Kafka cluster and the user 
has configured the consumer overrides appropriately for their connector, but 
not the admin overrides (something we also discussed previously 
[here|https://github.com/apache/kafka/pull/13434#discussion_r1144415671]).

In the above case, if a user attempts to reset their sink connector's offsets 
via the {{DELETE /connectors/\{connector}/offsets}} endpoint, the following 
will occur:
 # We list the consumer group offsets via {{Admin::listConsumerGroupOffsets}} 
which returns an empty partition offsets map for the sink connector's consumer 
group ID (it exists on a different Kafka cluster to the one that the admin 
client is connecting to).
 # We call {{SinkConnector::alterOffsets}} with an empty offsets map which 
could cause the sink connector to propagate the offsets reset related changes 
to the sink system.
 # We attempt to delete the consumer group via {{Admin::deleteConsumerGroups}} 
which returns {{GroupIdNotFoundException}} which we essentially swallow in 
order to keep offsets reset operations idempotent and return a success message 
to the user (even though the real consumer group for the sink connector on the 
other Kafka cluster hasn't been deleted).

This will occur if the connector's admin overrides are missing OR the admin 
overrides are deliberately configured to target a Kafka cluster different from 
the consumer overrides (although like you pointed out in the other linked 
thread, this doesn't seem like a valid use case that we'd even want to support).

I guess we'd want to pursue the approach you suggested where we'd configure the 
admin client with a combination of the connector's admin overri

Re: [DISCUSS] KIP-793: Sink Connectors: Support topic-mutating SMTs for async connectors (preCommit users)

2023-06-21 Thread Yash Mayya
opers
> will probably have to be aware of at least the converter interface, some of
> the available implementations of it, and some details of Kafka clients
> (e.g., consumer groups for sink connectors). And this isn't a bad
> thing--it's unlikely that someone will write a Kafka connector without
> having or benefitting from some understanding of Kafka and the steps of the
> data pipeline that it will be a part of.
>
> Bringing this to the practical topic of discussion--transformations--I
> think it's actually in everyone's best interests for connector developers
> to be aware of transformations. This isn't just because of the specific
> problem that the KIP is trying to address. It's because there's plenty of
> logic that can be implemented via SMT that a naive connector developer will
> think that they have to implement on their own, which will ultimately lead
> to a sub-par experience for people who end up using those connectors due to
> inconsistent semantics (especially lack of predicates), inconsistent
> configuration syntax, increased chances for bugs, and FUD ("why wasn't this
> implemented as an SMT?").
>
> Finally, although preserving clean, composable interfaces that can be
> understood in isolation is a great principle to start with, we are now in
> what Anna McDonald recently referred to as "brownfield" space for Connect.
> We can't go back in time and redesign the SMT interface/contracts to make
> things cleaner. And I don't think it's fair to anyone to suddenly drop
> support for SMTs that mutate t/p/o information for sink records, especially
> since these can be used gainfully with plenty of existing sink connectors.
>
> Ultimately I still think the path forward that's best for the users is to
> make the impossible possible by addressing this long-standing API gap in
> Connect. Yes, it adds to the cognitive burden for connector developers, but
> if they can tolerate it, the end result is better for everyone involved,
> and if they can't, it's likely that the end result will be a preservation
> of existing behavior, which leaves us no worse than before.
>
>
> With all that said, I've thought about how to minimize or at least hide the
> API changes as much as possible. I've had two thoughts:
>
> 1. On the
> SinkRecord::originalTopic/originalKafkaPartition/originalKafkaOffset front,
> we could replace these methods with headers that the Connect runtime
> automatically injects into records directly before dispatching them to
> SinkTask::put. The names can be the proposed method names (e.g.,
> "originalTopic"). I believe this is inferior to the current proposal and
> should be a rejected alternative, but it at least seemed worth floating in
> the name of compromise. I dislike this approach for two reasons: first, it
> seems even less intuitive, and second, it doesn't come with the benefit of
> encouraging connector developers to understand the SMT interface and take
> it into account when designing connectors.
>
> 2. Although I'd hate to see the same bookkeeping logic implemented in
> multiple connectors, we can consider eliminating the overridden
> SinkTask::open/close methods. A note should be added to both methods
> clarifying that they are only invoked with the original, pre-transform
> topic partitions, and developers will be on their own if they want to deal
> with post-transform topic partitions instead. I'm on the fence with this
> one, but if it's a choice between passing this KIP without modifying
> SinkTask::open/close, or letting the KIP go dormant, I'd happily choose the
> former.
>
> Thanks Yash and Greg for the discussion so far, and apologies for the wall
> of text. Looking forward to your thoughts.
>
> Cheers,
>
> Chris
>
> [1] -
>
> https://kafka.apache.org/34/javadoc/org/apache/kafka/connect/source/SourceTask.html#commitRecord(org.apache.kafka.connect.source.SourceRecord,org.apache.kafka.clients.producer.RecordMetadata)
>
> On Sun, Apr 23, 2023 at 11:20 AM Yash Mayya  wrote:
>
> > Hi Greg,
> >
> > Thanks for the response and sorry for the late reply.
> >
> > > Currently the AK tests have a lot of calls to, for example, new
> > > SinkRecord(String topic, int partition, Schema keySchema,
> > > Object key, Schema valueSchema, Object value, long kafkaOffset)
> > > , a constructor without the original T/P/O values. I assumed that for
> > > backwards compatibility these constructors would still be usable in
> > > new runtimes. I imagine that there are also tests in downstream
> projects
> > > which make use of these constructors, whenever a Transform, Predicate,
> > > or Task is tested without a corresponding Converter. My question was
> > > abo

Re: [ANNOUNCE] New committer: Divij Vaidya

2023-06-13 Thread Yash Mayya
Congratulations Divij!

On Tue, Jun 13, 2023 at 9:20 PM Bruno Cadonna  wrote:

> Hi all,
>
> The PMC of Apache Kafka is pleased to announce a new Kafka committer
> Divij Vaidya.
>
> Divij's major contributions are:
>
> GDPR compliance enforcement of kafka-site -
> https://issues.apache.org/jira/browse/KAFKA-13868
>
> Performance improvements:
>
> Improve performance of VarInt encoding and decoding -
> https://github.com/apache/kafka/pull/13312
>
> Reduce data copy & buffer allocation during decompression -
> https://github.com/apache/kafka/pull/13135
>
> He also was heavily involved in the migration to Mockito.
>
> Furthermore, Divij is very active on the mailing lists as well as in
> maintaining and reviewing pull requests.
>
> Congratulations, Divij!
>
> Thanks,
>
> Bruno (on behalf of the Apache Kafka PMC)
>
>
>


Re: [VOTE] 3.4.1 RC3

2023-06-06 Thread Yash Mayya
Hi Prem,

You can follow the instructions here if you wish to unsubscribe from one or
more mailing lists - https://kafka.apache.org/contact

On Tue, Jun 6, 2023 at 2:44 PM Prem Sagar 
wrote:

> Please remove my mail ID from the database.
>
>
> Warm Regards
> K Prem Sagar
> Sr.Manager - Procurements
>  M: + 91 - 9100939886
> p...@pidatacenters.com
>  Amaravati | Bengaluru | Chennai | Delhi | Hyderabad | Kochi | Mumbai
> 
>
> On Mon, Jun 5, 2023 at 2:47 PM Josep Prat 
> wrote:
>
> > Hi Luke,
> >
> > Thanks a lot for the patience you had for this release!
> >
> > @Prem you are probably subscribed to either the user or dev mailing list
> > for Apache Kafka, this is why you are receiving these emails.
> >
> > Best,
> >
> > On Mon, Jun 5, 2023 at 10:32 AM Prem Sagar  > .invalid>
> > wrote:
> >
> > > Why this mail is marked to me ?
> > >
> > >
> > > Warm Regards
> > > K Prem Sagar
> > > Sr.Manager - Procurements
> > >  M: + 91 - 9100939886
> > > p...@pidatacenters.com
> > >  Amaravati | Bengaluru | Chennai | Delhi | Hyderabad | Kochi | Mumbai
> > > 
> > >
> > > On Mon, Jun 5, 2023 at 1:47 PM Luke Chen  wrote:
> > >
> > > > Hi Tom,
> > > >
> > > > Thanks for the vote.
> > > > I've re-run the 3.4 jenkins build, and the
> > > > `DynamicBrokerReconfigurationTest.testAdvertisedListenerUpdate` test
> > > still
> > > > pass.
> > > > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.4/142/
> > > >
> > > > And now, I've got:
> > > > Binding +1 PMC votes:
> > > > * Chris Egerton
> > > > * Mickael Maison
> > > > * Tom Bentley
> > > >
> > > > Non-binding votes:
> > > > * Federico Valeri
> > > > * Jakub Scholz
> > > > * Josep Prat
> > > >
> > > > I will close this vote thread and go ahead to complete the release
> > > process.
> > > >
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Fri, Jun 2, 2023 at 5:06 PM Josep Prat
>  > >
> > > > wrote:
> > > >
> > > > > Hi Tom,
> > > > > it failed for me a couple of times, I rebooted and things suddenly
> > > > worked.
> > > > > So maybe there was a dangling process holding a port from previous
> > test
> > > > > failures.
> > > > >
> > > > > On Fri, Jun 2, 2023 at 10:52 AM Tom Bentley 
> > > wrote:
> > > > >
> > > > > > Hi Luke,
> > > > > >
> > > > > > Thanks for running the release.
> > > > > >
> > > > > > I've checked signatures, eyeballed the Javadocs, built from
> source
> > > and
> > > > > run
> > > > > > the unit and integration tests.
> > > > > > DynamicBrokerReconfigurationTest.testAdvertisedListenerUpdate
> fails
> > > for
> > > > > me
> > > > > > repeatedly. I opened
> > > https://issues.apache.org/jira/browse/KAFKA-15049
> > > > > for
> > > > > > it since I couldn't find an existing issue for this one. I note
> > that
> > > > > others
> > > > > > seem to have run the integration tests without problems, so I
> don't
> > > > think
> > > > > > this is a blocker. I also did the Kafka, Connect and Streams
> > > > quickstarts.
> > > > > >
> > > > > > +1 binding.
> > > > > >
> > > > > > Tom
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, 1 Jun 2023 at 08:46, Luke Chen 
> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Thanks to everyone who has tested and voted for the RC3 so far!
> > > > > > > Currently, I've got 2 binding votes and 3 non-binding votes:
> > > > > > >
> > > > > > > Binding +1 PMC votes:
> > > > > > > * Chris Egerton
> > > > > > > * Mickael Maison
> > > > > > >
> > > > > > > Non-binding votes:
> > > > > > > * Federico Valeri
> > > > > > > * Jakub Scholz
> > > > > > > * Josep Prat
> > > > > > >
> > > > > > > If anyone is available (especially PMC members :)), please help
> > > > verify
> > > > > > the
> > > > > > > RC build.
> > > > > > >
> > > > > > > Thank you.
> > > > > > > Luke
> > > > > > >
> > > > > > > On Wed, May 31, 2023 at 1:53 AM Chris Egerton
> > > >  > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Luke,
> > > > > > > >
> > > > > > > > Many thanks for your continued work on this release!
> > > > > > > >
> > > > > > > > To verify, I:
> > > > > > > > - Built from source using Java 11 with both:
> > > > > > > > - - the 3.4.1-rc3 tag on GitHub
> > > > > > > > - - the kafka-3.4.1-src.tgz artifact from
> > > > > > > > 

[jira] [Resolved] (KAFKA-14956) Flaky test org.apache.kafka.connect.integration.OffsetsApiIntegrationTest#testGetSinkConnectorOffsetsDifferentKafkaClusterTargeted

2023-05-29 Thread Yash Mayya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yash Mayya resolved KAFKA-14956.

Resolution: Fixed

> Flaky test 
> org.apache.kafka.connect.integration.OffsetsApiIntegrationTest#testGetSinkConnectorOffsetsDifferentKafkaClusterTargeted
> --
>
> Key: KAFKA-14956
> URL: https://issues.apache.org/jira/browse/KAFKA-14956
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Sagar Rao
>Assignee: Yash Mayya
>Priority: Major
>  Labels: flaky-test
>
> ```
> h4. Error
> org.opentest4j.AssertionFailedError: Condition not met within timeout 15000. 
> Sink connector consumer group offsets should catch up to the topic end 
> offsets ==> expected:  but was: 
> h4. Stacktrace
> org.opentest4j.AssertionFailedError: Condition not met within timeout 15000. 
> Sink connector consumer group offsets should catch up to the topic end 
> offsets ==> expected:  but was: 
>  at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>  at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>  at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
>  at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
>  at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:211)
>  at 
> app//org.apache.kafka.test.TestUtils.lambda$waitForCondition$4(TestUtils.java:337)
>  at 
> app//org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:385)
>  at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:334)
>  at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:318)
>  at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:291)
>  at 
> app//org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.getAndVerifySinkConnectorOffsets(OffsetsApiIntegrationTest.java:150)
>  at 
> app//org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testGetSinkConnectorOffsetsDifferentKafkaClusterTargeted(OffsetsApiIntegrationTest.java:131)
>  at 
> java.base@17.0.7/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>  at 
> java.base@17.0.7/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base@17.0.7/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base@17.0.7/java.lang.reflect.Method.invoke(Method.java:568)
>  at 
> app//org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> app//org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> app//org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> app//org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> app//org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>  at 
> app//org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>  at app//org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>  at 
> app//org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  at app//org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>  at 
> app//org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  at 
> app//org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  at app//org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>  at app//org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>  at app//org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>  at app//org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>  at app//org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>  at app//org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>  at app//org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:108)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40)
>  at 
> org.gradle.api.interna

Re: [DISCUSS] Adding non-committers as Github collaborators

2023-05-24 Thread Yash Mayya
Hi Justine,

Thanks for the response. Non-committers don't have Apache accounts; are you
suggesting that there previously wasn't a need to sign in, and that a change
in this policy is now preventing collaborators from triggering Jenkins builds?

Thanks,
Yash

On Wed, May 24, 2023 at 9:30 PM Justine Olshan 
wrote:

> Yash,
>
> When I rebuild, I go to the CloudBees CI page and I have to log in with my
> apache account.
> Not sure if the change in the build system or the need to sign in is part
> of the problem.
>
>
> On Wed, May 24, 2023 at 4:54 AM Federico Valeri 
> wrote:
>
> > +1 on Divij suggestions
> >
> >
> > On Wed, May 24, 2023 at 12:04 PM Divij Vaidya 
> > wrote:
> > >
> > > Hey folks
> > >
> > > A week into this experiment, I am finding the ability to add labels,
> > > request for reviewers and ability to close PRs very useful.
> > >
> > > 1. May I suggest an improvement to the process by requesting for some
> > > guidance on the interest areas for various committers. This would help
> us
> > > request for reviews from the right set of individuals.
> > > As a reference, we have tried something similar with Apache TinkerPop
> > (see
> > > TinkerPop Contributors section at the end) [1], where the committers
> self
> > > identify their preferred area of interest.
> > >
> > > 2. I would also request creation of the following new labels:
> > > tiered-storage, transactions, security, refactor, zk-migration,
> > > first-contribution (so that we can prioritize reviews for first time
> > > contributors as an encouragement), build, metrics
> > >
> > > [1] https://tinkerpop.apache.org/
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Mon, May 15, 2023 at 11:07 PM John Roesler 
> > wrote:
> > >
> > > > Hello again, all,
> > > >
> > > > Just a quick update: after merging the changes to asf.yaml, I
> received
> > a
> > > > notification that the list is limited to only 10 people, not 20 as
> the
> > > > documentation states.
> > > >
> > > > Here is the list of folks who will now be able to triage PRs and
> > trigger
> > > > builds: Victoria Xia, Greg Harris, Divij Vaidya, Lucas Brutschy, Yash
> > > > Mayya, Philip Nee, vamossagar12, Christo Lolov, Federico Valeri, and
> > andymg3
> > > >
> > > > Thanks all,
> > > > -John
> > > >
> > > > On 2023/05/12 15:53:40 John Roesler wrote:
> > > > > Thanks again for bringing this up, David!
> > > > >
> > > > > As an update to the community, the PMC has approved a process to
> make
> > > > use of this feature.
> > > > >
> > > > > Here are the relevant updates:
> > > > >
> > > > > PR to add the policy:
> https://github.com/apache/kafka-site/pull/510
> > > > >
> > > > > PR to update the list: https://github.com/apache/kafka/pull/13713
> > > > >
> > > > > Ticket to automate this process.. Contributions welcome :)
> > > > https://issues.apache.org/jira/browse/KAFKA-14995
> > > > >
> > > > > And to make sure it doesn't fall through the cracks in the mean
> time,
> > > > here's the release process step:
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Process#ReleaseProcess-UpdatetheCollaboratorsList
> > > > >
> > > > > Unfortunately, the "collaborator" feature only allows 20 usernames,
> > so
> > > > we have decided to simply take the top 20 non-committer authors from
> > the
> > > > past year (according to git shortlog). Congratulations to our new
> > > > collaborators!
> > > > >
> > > > > Victoria Xia, Greg Harris, Divij Vaidya, Lucas Brutschy, Yash
> Mayya,
> > > > Philip Nee, vamossagar12, Christo Lolov, Federico Valeri, and andymg3
> > > > >
> > > > > Thanks,
> > > > > -John
> > > > >
> > > > > On 2023/04/27 18:45:09 David Arthur wrote:
> > > > > > Hey folks,
> > > > > >
> > > > > > I stumbled across this wiki page from the infra team that
> > describes the
> > > > > > various features supported in the ".asf.yaml" file:
> > > > > >
> > > >
> >
> https:/

Re: [DISCUSS] Adding non-committers as Github collaborators

2023-05-24 Thread Yash Mayya
Hey folks,

Are there any additional steps required in order to be able to trigger
Jenkins builds on pull requests? The seemingly relevant link in the
asf.yaml file appears to be a dead one -
https://github.com/apache/kafka/blob/5b3b385881d5518ef1b2c63cb55244cf80a80da2/.asf.yaml#L25
(else it's a permissions issue).

Thanks,
Yash

On Wed, May 24, 2023 at 3:35 PM Divij Vaidya 
wrote:

> Hey folks
>
> A week into this experiment, I am finding the ability to add labels,
> request for reviewers and ability to close PRs very useful.
>
> 1. May I suggest an improvement to the process by requesting for some
> guidance on the interest areas for various committers. This would help us
> request for reviews from the right set of individuals.
> As a reference, we have tried something similar with Apache TinkerPop (see
> TinkerPop Contributors section at the end) [1], where the committers self
> identify their preferred area of interest.
>
> 2. I would also request creation of the following new labels:
> tiered-storage, transactions, security, refactor, zk-migration,
> first-contribution (so that we can prioritize reviews for first time
> contributors as an encouragement), build, metrics
>
> [1] https://tinkerpop.apache.org/
>
> --
> Divij Vaidya
>
>
>
> On Mon, May 15, 2023 at 11:07 PM John Roesler  wrote:
>
> > Hello again, all,
> >
> > Just a quick update: after merging the changes to asf.yaml, I received a
> > notification that the list is limited to only 10 people, not 20 as the
> > documentation states.
> >
> > Here is the list of folks who will now be able to triage PRs and trigger
> > builds: Victoria Xia, Greg Harris, Divij Vaidya, Lucas Brutschy, Yash
> > Mayya, Philip Nee, vamossagar12, Christo Lolov, Federico Valeri, and
> andymg3
> >
> > Thanks all,
> > -John
> >
> > On 2023/05/12 15:53:40 John Roesler wrote:
> > > Thanks again for bringing this up, David!
> > >
> > > As an update to the community, the PMC has approved a process to make
> > use of this feature.
> > >
> > > Here are the relevant updates:
> > >
> > > PR to add the policy: https://github.com/apache/kafka-site/pull/510
> > >
> > > PR to update the list: https://github.com/apache/kafka/pull/13713
> > >
> > > Ticket to automate this process.. Contributions welcome :)
> > https://issues.apache.org/jira/browse/KAFKA-14995
> > >
> > > And to make sure it doesn't fall through the cracks in the mean time,
> > here's the release process step:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Process#ReleaseProcess-UpdatetheCollaboratorsList
> > >
> > > Unfortunately, the "collaborator" feature only allows 20 usernames, so
> > we have decided to simply take the top 20 non-committer authors from the
> > past year (according to git shortlog). Congratulations to our new
> > collaborators!
> > >
> > > Victoria Xia, Greg Harris, Divij Vaidya, Lucas Brutschy, Yash Mayya,
> > Philip Nee, vamossagar12, Christo Lolov, Federico Valeri, and andymg3
> > >
> > > Thanks,
> > > -John
> > >
> > > On 2023/04/27 18:45:09 David Arthur wrote:
> > > > Hey folks,
> > > >
> > > > I stumbled across this wiki page from the infra team that describes
> the
> > > > various features supported in the ".asf.yaml" file:
> > > >
> >
> https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features
> > > >
> > > > One section that looked particularly interesting was
> > > >
> >
> https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-AssigningexternalcollaboratorswiththetriageroleonGitHub
> > > >
> > > > github:
> > > >   collaborators:
> > > > - userA
> > > > - userB
> > > >
> > > > This would allow us to define non-committers as collaborators on the
> > Github
> > > > project. Concretely, this means they would receive the "triage"
> Github
> > role
> > > > (defined here
> > > >
> >
> https://docs.github.com/en/organizations/managing-user-access-to-your-organizations-repositories/repository-roles-for-an-organization#permissions-for-each-role
> > ).
> > > > Practically, this means we could let non-committers do things like
> > assign
> > > > labels and reviewers on Pull Requests.
> > > >
> > > > I wanted to see what the committer group thought about this feature.
> I
> > > > think it could be useful.
> > > >
> > > > Cheers,
> > > > David
> > > >
> > >
> >
>


Re: [DISCUSS] Adding non-committers as Github collaborators

2023-05-12 Thread Yash Mayya
Hey folks,

Thanks for driving this initiative! I think the ability to assign reviewers
/ apply labels to PRs and re-trigger Jenkins builds is really useful and
will also allow us to help out the community a bit more.

Thanks,
Yash

On Fri, May 12, 2023 at 9:24 PM John Roesler  wrote:

> Thanks again for bringing this up, David!
>
> As an update to the community, the PMC has approved a process to make use
> of this feature.
>
> Here are the relevant updates:
>
> PR to add the policy: https://github.com/apache/kafka-site/pull/510
>
> PR to update the list: https://github.com/apache/kafka/pull/13713
>
> Ticket to automate this process.. Contributions welcome :)
> https://issues.apache.org/jira/browse/KAFKA-14995
>
> And to make sure it doesn't fall through the cracks in the mean time,
> here's the release process step:
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Process#ReleaseProcess-UpdatetheCollaboratorsList
>
> Unfortunately, the "collaborator" feature only allows 20 usernames, so we
> have decided to simply take the top 20 non-committer authors from the past
> year (according to git shortlog). Congratulations to our new collaborators!
>
> Victoria Xia, Greg Harris, Divij Vaidya, Lucas Brutschy, Yash Mayya,
> Philip Nee, vamossagar12, Christo Lolov, Federico Valeri, andymg3,
> RivenSun, Kirk True, Matthew de Detrich, Akhilesh C, Alyssa Huang, Artem
> Livshits, Gantigmaa Selenge, Hao Li, Niket, and hudeqi
>
> Thanks,
> -John
>
> On 2023/04/27 18:45:09 David Arthur wrote:
> > Hey folks,
> >
> > I stumbled across this wiki page from the infra team that describes the
> > various features supported in the ".asf.yaml" file:
> >
> https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features
> >
> > One section that looked particularly interesting was
> >
> https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-AssigningexternalcollaboratorswiththetriageroleonGitHub
> >
> > github:
> >   collaborators:
> > - userA
> > - userB
> >
> > This would allow us to define non-committers as collaborators on the
> Github
> > project. Concretely, this means they would receive the "triage" Github
> role
> > (defined here
> >
> https://docs.github.com/en/organizations/managing-user-access-to-your-organizations-repositories/repository-roles-for-an-organization#permissions-for-each-role
> ).
> > Practically, this means we could let non-committers do things like assign
> > labels and reviewers on Pull Requests.
> >
> > I wanted to see what the committer group thought about this feature. I
> > think it could be useful.
> >
> > Cheers,
> > David
> >
>


[jira] [Created] (KAFKA-14974) Restore backward compatibility in KafkaBasedLog

2023-05-08 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-14974:
--

 Summary: Restore backward compatibility in KafkaBasedLog
 Key: KAFKA-14974
 URL: https://issues.apache.org/jira/browse/KAFKA-14974
 Project: Kafka
  Issue Type: Task
Reporter: Yash Mayya
Assignee: Yash Mayya


{{KafkaBasedLog}} is a widely used utility class that provides a generic 
implementation of a shared, compacted log of records in a Kafka topic. It isn't 
in Connect's public API, but has been used outside of Connect and we try to 
preserve backward compatibility whenever possible. 
https://issues.apache.org/jira/browse/KAFKA-14455 modified the two overloaded 
void {{KafkaBasedLog::send}} methods to return a {{Future}}. While this 
change is source compatible, it isn't binary compatible. We can restore 
backward compatibility simply by reinstating the older send methods and 
renaming the new Future-returning send methods.
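
To make the compatibility issue concrete, a simplified sketch (not the real KafkaBasedLog 
API; the new method name below is only a placeholder):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

// Changing a method's return type from void to Future is source compatible (callers that
// ignore the return value still compile against the new version) but binary incompatible,
// because the JVM resolves methods by their full descriptor, which includes the return type.
class KafkaBasedLogSketch {
    // Original signature, re-instated so existing compiled callers keep working:
    void send(byte[] key, byte[] value) {
        sendWithReceipt(key, value);
    }

    // Renamed Future-returning variant (placeholder name):
    Future<Void> sendWithReceipt(byte[] key, byte[] value) {
        // ... produce the record to the backing topic ...
        return CompletableFuture.completedFuture(null);
    }
}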



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14933) Document Kafka Connect's log level REST APIs added in KIP-495

2023-04-25 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-14933:
--

 Summary: Document Kafka Connect's log level REST APIs added in 
KIP-495
 Key: KAFKA-14933
 URL: https://issues.apache.org/jira/browse/KAFKA-14933
 Project: Kafka
  Issue Type: Task
  Components: documentation, KafkaConnect
Reporter: Yash Mayya
Assignee: Yash Mayya


[KIP-495|https://cwiki.apache.org/confluence/display/KAFKA/KIP-495%3A+Dynamically+Adjust+Log+Levels+in+Connect]
 added 3 REST APIs to allow dynamically adjusting log levels on Kafka Connect 
workers. This was added a long time ago (released in AK 2.4.0) but was never 
publicly documented. These REST APIs should be documented in 
[https://kafka.apache.org/documentation/#connect_rest]. 
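
For reference, the kind of example the documentation could include (the worker URL, port, 
and logger name below are placeholders):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Set a single logger's level on a Connect worker via the KIP-495 endpoint
// PUT /admin/loggers/{logger} with a JSON body of the form {"level": "DEBUG"}.
public class SetConnectLogLevel {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/admin/loggers/org.apache.kafka.connect.runtime.WorkerSourceTask"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString("{\"level\": \"DEBUG\"}"))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}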



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New PMC chair: Mickael Maison

2023-04-24 Thread Yash Mayya
Congratulations Mickael!

On Fri, Apr 21, 2023 at 8:39 PM Jun Rao  wrote:

> Hi, everyone,
>
> After more than 10 years, I am stepping down as the PMC chair of Apache
> Kafka. We now have a new chair Mickael Maison, who has been a PMC member
> since 2020. I plan to continue to contribute to Apache Kafka myself.
>
> Congratulations, Mickael!
>
> Jun
>


Re: [DISCUSS] KIP-910: Update Source offsets for Source Connectors without producing records

2023-04-24 Thread Yash Mayya
Hi Sagar,

Thanks for the KIP! I have a few questions and comments:

1) I agree with Chris' point about the separation of a connector heartbeat
mechanism and allowing source connectors to generate offsets without
producing data. What is the purpose of the heartbeat topic here and are
there any concrete use cases for downstream consumers on this topic? Why
can't we instead simply introduce a mechanism to retrieve a list of source
partition / source offset pairs from the source tasks?

2) With the currently described mechanism, the new
"SourceTask::produceHeartbeatRecords" method returns a "List"
- what happens with the topic in each of these source records? Chris
pointed this out above, but it doesn't seem to have been addressed? The
"SourceRecord" class also has a bunch of other fields which will be
irrelevant here (partition, key / value schema, key / value data,
timestamp, headers). In fact, it seems like only the source partition and
source offset are relevant here, so we should either introduce a new
abstraction or simply use a data structure like a mapping from source
partitions to source offsets (adds to the above point; a rough sketch follows after these questions)?

3) I'm not sure I fully follow why the heartbeat timer / interval is
needed? What are the downsides of
calling "SourceTask::produceHeartbeatRecords" in every execution loop
(similar to the existing "SourceTask::poll" method)? Is this only to
prevent the generation of a lot of offset records? Since Connect's offsets
topics are log compacted (and source partitions are used as keys for each
source offset), I'm not sure if such concerns are valid and such a
heartbeat timer / interval mechanism is required?

4) The first couple of rejected alternatives state that the use of a null
topic / key / value is preferably avoided - but the current proposal would
also likely require connectors to use such workarounds (null topic when the
heartbeat topic is configured at a worker level and always for the key /
value)?

5) The third rejected alternative talks about subclassing the
"SourceRecord" class - this presumably means allowing connectors to pass
special offset only records via the existing poll mechanism? Why was this
considered a more invasive option? Was it because of the backward
compatibility issues that would be introduced for plugins using the new
public API class that still need to be deployed onto older Connect workers?
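
To make the data structure suggested in question 2 concrete, here's a rough sketch of the 
alternative shape (the interface and method name are hypothetical and not something the 
KIP proposes):

import java.util.Map;

// Hypothetical sketch only: instead of returning a List of SourceRecords with mostly
// irrelevant fields, a task could expose just the source partition to source offset pairs
// that it wants committed without producing any records.
public interface OffsetsOnlySourceTask {
    // Keys are source partitions; values are the source offsets to commit for them.
    Map<Map<String, Object>, Map<String, Object>> offsetsToCommit();
}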

Thanks,
Yash

On Fri, Apr 14, 2023 at 6:45 PM Sagar  wrote:

> One thing I forgot to mention in my previous email was that the reason I
> chose to include the opt-in behaviour via configs was that the users of the
> connector know their workload patterns. If the workload is such that the
>  connector would receive regular valid updates then there’s ideally no need
> for moving offsets since it would update automatically.
>
> This way they aren’t forced to use this feature and can use it only when
> the workload is expected to be batchy or not frequent.
>
> Thanks!
> Sagar.
>
>
> On Fri, 14 Apr 2023 at 5:32 PM, Sagar  wrote:
>
> > Hi Chris,
> >
> > Thanks for following up on the response. Sharing my thoughts further:
> >
> > If we want to add support for connectors to emit offsets without
> >> accompanying source records, we could (and IMO should) do that without
> >> requiring users to manually enable that feature by adjusting worker or
> >> connector configurations.
> >
> >
> > With the current KIP design, I have tried to implement this in an opt-in
> > manner via configs. I guess what you are trying to say is that this
> doesn't
> > need a config of its own and instead could be part of the poll ->
> > transform etc -> produce -> commit cycle. That way, the users don't need
> to
> > set any config and if the connector supports moving offsets w/o producing
> > SourceRecords, it should happen automatically. Is that correct? If that
> > is the concern, then I can think of not exposing a config and trying to make
> > this process automatic. That should ease the load on connector users,
> > but your point about cognitive load on Connector developers, I am still
> not
> > sure how to address that. The offsets are privy to a connector and the
> > framework at best can provide hooks to the tasks to update their offsets.
> > Connector developers would still have to consider all cases before
> updating
> > offsets.  And if I ignore the heartbeat topic and heartbeat interval ms
> > configs, then what the KIP proposes currently isn't much different in
> that
> > regard. Just that it produces a List of SourceRecord which can be changed
> > to a Map of SourcePartition and their offsets if you think that would
> > simplify things. Are there other cases in your mind which need
> addressing?
> >
> > Here's my take on the usecases:
> >
> >1. Regarding the example about SMTs with Object Storage based
> >connectors, it was one of the scenarios identified. We have some
> connectors
> >that rely on the offsets topic to check if the next batch of files
> should
> >be 

Re: [DISCUSS] KIP-793: Sink Connectors: Support topic-mutating SMTs for async connectors (preCommit users)

2023-04-23 Thread Yash Mayya
tes see from the original T/P/O methods?
> If you inject the original T/P/O only before and after the chain, SMTs
> after an SMT which changes the original T/P/O will see whatever the earlier
> SMT emitted. Is this intentional, or should this be avoided?
> For existing SMTs use the SinkRecord constructor, either directly or via
> subclasses of ConnectRecord, they will drop the original T/P/O and fall
> back to the logic from question (1).
>
> > The rejected alternative basically says that we can't do a
> deterministic mapping from virtual coordinates to physical coordinates
> without doing a lot of book-keeping.
>
> I suppose there is a possible implementation of metadata book-keeping which
> provides a reasonable system of virtual coordinates, it just ended up
> equivalent to hydrating intermediate topics to compute a consistent record
> ordering. I wasn't convinced by calling it "book-keeping" since i've seen
> that phrase used to disregard much less complicated state management, and
> had to see exactly where that solution becomes unreasonable.
>
> Thanks,
> Greg
>
> On Sun, Mar 12, 2023 at 6:30 AM Yash Mayya  wrote:
>
> > Hi Greg,
> >
> > Thanks for the detailed review!
> >
> > > What is the expected state/behavior for SinkRecords
> > > which do not have original T/P/O information after the
> > > upgrade? Just browsing, it appears that tests make
> > > extensive use of the existing public SinkRecord
> > > constructors  for both Transformations and Connectors.
> >
> > I'm not sure I follow - are you asking about how the tests will be
> updated
> > post this change or about how upgrades will look like for clusters in
> > production? For the latter, we won't have to worry about sink records
> > without original T/P/O information at all once a cluster is fully rolled
> > and we will make it (hopefully) abundantly clear that connectors need to
> > account for missing original T/P/O getter methods if they expect to be
> > deployed on older Connect runtimes.
> >
> > > What is the expected behavior for Transformation
> > > implementations which do not use the newRecord
> > > methods and instead use public SinkRecord constructors?
> > > The KIP mentions this as a justification for the
> > > originalKafkaOffset method, but if existing implementations
> > > are using the existing constructors, those constructors won't
> > > forward the original T/P/O information to later transforms or
> > > the task.
> >
> > There shouldn't be any difference in behavior here - the framework will
> add
> > the original T/P/O metadata to the record after the entire transformation
> > chain has been applied and just before sending the record to the task for
> > processing. The KIP doesn't propose that transformations themselves
> should
> > also be able to retrieve original T/P/O information for a sink record.
> >
> > > This reasoning and the KIP design seems to imply that the
> > > connector is better equipped to solve this problem than the
> > > framework, but the stated reasons are not convincing for me.
> >
> > This was added to the KIP by the original author, but I don't think the
> > intention was to imply that the connector is better equipped to solve
> this
> > problem than the framework. The intention is to provide complete
> > information to the connector ("physical" and "virtual coordinates"
> instead
> > of the currently incomplete "virtual coordinates" as you've termed it) so
> > that connectors can use the virtual coordinates for writing data to the
> > sink system and physical coordinates for offset reporting back to the
> > framework. The rejected alternative basically says that we can't do a
> > deterministic mapping from virtual coordinates to physical coordinates
> > without doing a lot of book-keeping.
> >
> > I agree with the rest of your analysis on the tradeoffs between the
> > proposed approach versus the seemingly more attractive approach of
> handling
> > everything purely in the framework and only exposing "virtual
> coordinates"
> > to the connectors. I think the biggest thorn here is maintaining backward
> > compatibility with the considerable ecosystem of existing connectors
> which
> > is something Connect has always been burdened by.
> >
> > Thanks,
> > Yash
> >
> > On Wed, Mar 8, 2023 at 6:54 AM Greg Harris  >
> > wrote:
> >
> > > Hi Yash,
> > >
> > > I always use this issue as an example of a bug being caused by design

[jira] [Created] (KAFKA-14910) Consider cancelling ongoing alter connector offsets requests when the connector is resumed

2023-04-14 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-14910:
--

 Summary: Consider cancelling ongoing alter connector offsets 
requests when the connector is resumed
 Key: KAFKA-14910
 URL: https://issues.apache.org/jira/browse/KAFKA-14910
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Yash Mayya


See discussion here for more details - 
[https://github.com/apache/kafka/pull/13465#discussion_r1164465874]

The implementation for the _*PATCH /connectors/\{connector}/offsets*_ and 
_*DELETE /connectors/\{connector}/offsets*_ APIs is completely asynchronous and 
the check for whether the connector is stopped will only be made at the 
beginning of the request. 

If the connector is resumed while the alter / reset offsets request is being 
processed, this can lead to certain issues (especially with non-EoS source 
connectors). For sink connectors, admin client requests to alter / reset 
offsets for a consumer group will be rejected if the consumer group is active 
(i.e. when the connector tasks come up). For source connectors when exactly 
once support is enabled on the worker, we do a round of zombie fencing before 
the tasks are brought up and this will basically disable the transactional 
producer used to alter offsets (the transactional producer uses the 
transactional ID for task 0 of the connector). However, for source connectors 
when exactly once support is not enabled on the worker (this is the default), 
there are no such safeguards. We could potentially add some interruption logic 
that cancels ongoing alter / reset offset requests when a connector is resumed.
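
One illustrative shape for that interruption logic (a sketch only, not actual worker code):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Future;

// Track the in-flight offset-modification request per connector and cancel it when the
// connector is resumed, so that an alter / reset cannot complete after tasks come back up.
class OffsetRequestTrackerSketch {
    private final Map<String, Future<?>> inFlight = new ConcurrentHashMap<>();

    void onAlterOffsetsStarted(String connector, Future<?> request) {
        inFlight.put(connector, request);
    }

    void onConnectorResumed(String connector) {
        Future<?> request = inFlight.remove(connector);
        if (request != null) {
            request.cancel(true); // best-effort interruption of the ongoing alter / reset
        }
    }
}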



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-875: First-class offsets support in Kafka Connect

2023-04-12 Thread Yash Mayya
Hi Chris,

Thanks, that makes sense - I hadn't considered the case where the worker
itself becomes a zombie.
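
(For the archives, a minimal illustration of the fencing mechanism in question; this is 
plain producer code rather than anything from the Connect runtime, and the transactional 
ID below is just an example of the task-0 format.)

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.ByteArraySerializer;

// Calling initTransactions() with a given transactional.id bumps the producer epoch for
// that ID on the broker, so any older producer still using the same ID is fenced out and
// its next transactional operation fails with a ProducerFencedException.
public class FencingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "connect-cluster-my-connector-0");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            producer.initTransactions(); // fences older producers that used this transactional.id
            // ... begin a transaction, write the altered offsets, commit ...
        }
    }
}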

Thanks,
Yash

On Wed, Apr 12, 2023 at 10:40 PM Chris Egerton 
wrote:

> Hi Yash,
>
> Great, we can use the transactional ID for task zero for now 
>
> As far as why: we'd need to fence out that producer in the event that tasks
> for the connector are brought up while the alter offsets request is still
> taking place, since we'd want to make sure that the offsets aren't altered
> after tasks are brought up. I think it may be possible right now in
> extremely niche circumstances where the worker servicing the alter offsets
> request loses leadership of the cluster. More realistically, I think it
> leaves room for tweaking the logic for handling these requests to not use a
> five-second timeout in the future (we could potentially just
> fire-and-forget the request and, if new tasks are started for the connector
> in the meantime, trust that the request will get automatically fenced out
> without doing any more work).
>
> Cheers,
>
> Chris
>
> On Wed, Apr 12, 2023 at 1:01 PM Yash Mayya  wrote:
>
> > Hi Chris and Greg,
> >
> > The current implementation does already use the transactional ID for
> task 0
> > so no complaints from me. Although I'm not sure I follow the concerns
> w.r.t
> > zombie fencing? In which cases would we need to fence out the
> transactional
> > producer instantiated for altering offsets?
> >
> > Thanks,
> > Yash
> >
> > On Wed, Apr 12, 2023 at 9:02 PM Chris Egerton  wrote:
> >
> > > Hi Greg,
> > >
> > > I hadn't considered the implications W/R/T zombie fencing. I agree that
> > > using the transactional ID for task 0 is better in that case.
> > >
> > > Yash (who is implementing this part, cc'd), does this seem reasonable
> to
> > > you?
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Tue, Apr 11, 2023 at 3:23 PM Greg Harris
>  > >
> > > wrote:
> > >
> > >> Chris & Yash,
> > >>
> > >> 1. Since the global offsets topic does not have transactions on it
> > >> already,
> > >> I don't think adding transactions just for these reset operations
> would
> > be
> > >> an improvement. The transactional produce would not exclude other
> > >> non-transactional producers, but hanging transactions on the global
> > >> offsets
> > >> topic would negatively impact the general cluster health. Your
> proposed
> > >> strategy seems reasonable to me.
> > >>
> > >> 2. While it may be the connector performing the offset reset and not
> the
> > >> task, I think it would be preferable for the connector to use task 0's
> > >> task-id and 'impersonate' the task for the purpose of changing the
> > >> offsets.
> > >> I think the complication elsewhere (getting users to provide a new
> ACL,
> > >> expanding fencing to also fence the connector transaction id, etc) is
> > not
> > >> practically worth it to change 1 string value in the logs.
> > >> I would find a separate transaction ID beneficial if the connector
> could
> > >> be
> > >> given a different principal from the task, and be given distinct ACLs.
> > >> However, I don't think this is possible or desirable, and so I don't
> > think
> > >> it's relevant right now. Let me know if there are any other ways that
> > the
> > >> connector transaction ID would be useful.
> > >>
> > >> Thanks for all the effort on this feature!
> > >> Greg
> > >>
> > >> On Tue, Apr 11, 2023 at 7:52 AM Chris Egerton  >
> > >> wrote:
> > >>
> > >> > Hi all,
> > >> >
> > >> > A couple slight tweaks to the design have been proposed during
> > >> > implementation and I'd like to report them here to make sure that
> > >> they're
> > >> > acceptable to all who previously voted for this KIP. I've updated
> the
> > >> KIP
> > >> > to include these changes but will be happy to revert and/or amend if
> > >> there
> > >> > are any concerns.
> > >> >
> > >> > 1. We would like to refrain from using a transaction when resetting
> > >> source
> > >> > connector offsets in the worker's global offsets topic when
> > exactly-once
> > >> > support is enabled. We would 

Re: [VOTE] KIP-875: First-class offsets support in Kafka Connect

2023-04-12 Thread Yash Mayya
Hi Chris and Greg,

The current implementation does already use the transactional ID for task 0
so no complaints from me. Although I'm not sure I follow the concerns w.r.t
zombie fencing? In which cases would we need to fence out the transactional
producer instantiated for altering offsets?
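
For reference (mostly to check my own understanding of the mechanism),
fencing with the plain producer API works roughly like the sketch below:
initializing a new producer with the same transactional ID bumps the
producer epoch, and any older producer using that ID fails its next
transactional operation with a ProducerFencedException. The transactional
ID format, topic name and key below are placeholders, not the actual
worker code:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.ByteArraySerializer;

    public class FencingSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
            // placeholder for the transactional ID that task 0 of the connector would use
            props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "connect-cluster-my-connector-0");

            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                // bumps the epoch for this transactional ID; an older producer with the
                // same ID is fenced and fails its next transactional operation
                producer.initTransactions();
                producer.beginTransaction();
                // e.g. a placeholder serialized offset key with a null value (tombstone)
                // written to the connector's offsets topic to reset an offset
                producer.send(new ProducerRecord<>("my-connector-offsets", new byte[0], null));
                producer.commitTransaction();
            }
        }
    }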

Thanks,
Yash

On Wed, Apr 12, 2023 at 9:02 PM Chris Egerton  wrote:

> Hi Greg,
>
> I hadn't considered the implications W/R/T zombie fencing. I agree that
> using the transactional ID for task 0 is better in that case.
>
> Yash (who is implementing this part, cc'd), does this seem reasonable to
> you?
>
> Cheers,
>
> Chris
>
> On Tue, Apr 11, 2023 at 3:23 PM Greg Harris 
> wrote:
>
>> Chris & Yash,
>>
>> 1. Since the global offsets topic does not have transactions on it
>> already,
>> I don't think adding transactions just for these reset operations would be
>> an improvement. The transactional produce would not exclude other
>> non-transactional producers, but hanging transactions on the global
>> offsets
>> topic would negatively impact the general cluster health. Your proposed
>> strategy seems reasonable to me.
>>
>> 2. While it may be the connector performing the offset reset and not the
>> task, I think it would be preferable for the connector to use task 0's
>> task-id and 'impersonate' the task for the purpose of changing the
>> offsets.
>> I think the complication elsewhere (getting users to provide a new ACL,
>> expanding fencing to also fence the connector transaction id, etc) is not
>> practically worth it to change 1 string value in the logs.
>> I would find a separate transaction ID beneficial if the connector could
>> be
>> given a different principal from the task, and be given distinct ACLs.
>> However, I don't think this is possible or desirable, and so I don't think
>> it's relevant right now. Let me know if there are any other ways that the
>> connector transaction ID would be useful.
>>
>> Thanks for all the effort on this feature!
>> Greg
>>
>> On Tue, Apr 11, 2023 at 7:52 AM Chris Egerton 
>> wrote:
>>
>> > Hi all,
>> >
>> > A couple slight tweaks to the design have been proposed during
>> > implementation and I'd like to report them here to make sure that
>> they're
>> > acceptable to all who previously voted for this KIP. I've updated the
>> KIP
>> > to include these changes but will be happy to revert and/or amend if
>> there
>> > are any concerns.
>> >
>> > 1. We would like to refrain from using a transaction when resetting
>> source
>> > connector offsets in the worker's global offsets topic when exactly-once
>> > support is enabled. We would continue to use a transaction when
>> resetting
>> > offsets in the connector's offsets topic. Discussed in [1].
>> >
>> > 2. We would like to use a transactional ID of ${groupId}-${connector} to
>> > alter/reset source connector offsets when exactly-once support is
>> enabled,
>> > where ${groupId} is the group ID of the Connect cluster and
>> ${connector} is
>> > the name of the connector. This is raised here because it would
>> introduce
>> > an additional ACL requirement for this API. A less-elegant alternative
>> that
>> > would obviate the additional ACL requirement is to use the
>> transactional ID
>> > that would be used by task 0 of the connector, but this may be
>> confusing to
>> > users as it could indicate that the task is actually running. Discussed
>> in
>> > [2].
>> >
>> > [1] -
>> https://github.com/apache/kafka/pull/13465/#issuecomment-1486718538
>> > [2] -
>> https://github.com/apache/kafka/pull/13465/#discussion_r1159694956
>> >
>> > Cheers,
>> >
>> > Chris
>> >
>> > On Fri, Mar 3, 2023 at 10:22 AM Chris Egerton  wrote:
>> >
>> > > Hi all,
>> > >
>> > > Thanks for the votes! I'll cast a final +1 myself and close the vote
>> out.
>> > >
>> > > This KIP passes with the following +1 votes (and no +0 or -1 votes):
>> > >
>> > > • Greg Harris
>> > > • Yash Mayya
>> > > • Knowles Atchison Jr
>> > > • Mickael Maison (binding)
>> > > • Tom Bentley (binding)
>> > > • Josep Prat (binding)
>> > > • Chris Egerton (binding, author)
>> > >
>> > > I'll write up Jira tickets and begin implementing things next week.
>> > >
>> > > Ch

[jira] [Created] (KAFKA-14876) Public documentation for new Kafka Connect offset management REST APIs

2023-04-02 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-14876:
--

 Summary: Public documentation for new Kafka Connect offset 
management REST APIs
 Key: KAFKA-14876
 URL: https://issues.apache.org/jira/browse/KAFKA-14876
 Project: Kafka
  Issue Type: Sub-task
Reporter: Yash Mayya
Assignee: Yash Mayya


Add public documentation for the 3 new Kafka Connect offset management REST 
APIs being introduced in KIP-875 
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect):
 * GET /connectors/{connector}/offsets
 * PATCH /connectors/{connector}/offsets
 * DELETE /connectors/{connector}/offsets
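
For illustration, fetching a connector's committed offsets through the first 
endpoint could look like the following client-side sketch (the worker URL and 
connector name are placeholders; the response body layout is whatever the KIP 
and the new documentation specify):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class GetConnectorOffsetsSketch {
        public static void main(String[] args) throws Exception {
            HttpRequest request = HttpRequest.newBuilder()
                    // placeholder worker URL and connector name
                    .uri(URI.create("http://localhost:8083/connectors/my-connector/offsets"))
                    .GET()
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode()); // 200 on success
            System.out.println(response.body());       // JSON document with the connector's offsets
        }
    }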



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14844) Kafka Connect's OffsetBackingStore interface should handle (de)serialization and connector namespacing

2023-03-24 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-14844:
--

 Summary: Kafka Connect's OffsetBackingStore interface should 
handle (de)serialization and connector namespacing
 Key: KAFKA-14844
 URL: https://issues.apache.org/jira/browse/KAFKA-14844
 Project: Kafka
  Issue Type: Task
  Components: KafkaConnect
Reporter: Yash Mayya
Assignee: Yash Mayya


Relevant discussion here - 
https://github.com/apache/kafka/pull/13434/files#r114972

 

TLDR - we should move serialization / deserialization and key construction 
(connector namespacing) for source connector offsets from the 
OffsetStorageWriter / OffsetStorageReader interfaces into the 
OffsetBackingStore interface. 
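
For context, the key construction / serialization that would move is roughly of 
the following shape - an illustrative sketch only (the converter configuration 
and names here are assumptions, not the exact framework code):

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.Map;
    import org.apache.kafka.connect.json.JsonConverter;

    public class OffsetKeySketch {
        public static void main(String[] args) {
            JsonConverter keyConverter = new JsonConverter();
            // the worker's internal converter runs with schemas disabled
            keyConverter.configure(Collections.singletonMap("schemas.enable", "false"), true);

            String connectorName = "my-source-connector";  // placeholder
            Map<String, ?> sourcePartition = Collections.singletonMap("table", "users");  // placeholder

            // connector namespacing: the key is a [connector name, source partition] pair,
            // so offsets from different connectors can share one backing topic
            byte[] serializedKey = keyConverter.fromConnectData(
                    connectorName, null, Arrays.asList(connectorName, sourcePartition));

            // e.g. ["my-source-connector",{"table":"users"}]
            System.out.println(new String(serializedKey, StandardCharsets.UTF_8));
        }
    }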



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-882: Make Kafka Connect REST API request timeouts configurable

2023-03-19 Thread Yash Mayya
Hi Greg,

> Applying this to a hypothetical extension of timeouts
> to other endpoints: my concern was primarily about
> whether this design applied everywhere felt natural or
> awkward. In my opinion, a request parameter would
> draw a lot of outsized attention in documentation, and
> be the first parameter for a lot of the endpoints, while a
> header could be specified as a more obscure global
> modifier to all requests.

That's a fair point, I do agree that the choice of a request header here
seems more appropriate now.

> My chief concern was about the clarity of the API in the
> future when both configurable timeouts and asynchronous
>  APIs exist. Will the configurable timeouts still make sense,
>  and will they be a feature that people will use?

I do believe that there's a place for both of these to co-exist - there
will always be use cases that require config validation errors to be
reported synchronously so that connector configurations can be corrected or
iterated upon. I don't really see valid use cases for a timeout of 20 or 60
minutes as you suggested, but there are genuine edge cases today where a
config validation needs somewhat longer than the current 90 seconds (2-3
minutes perhaps), and those deserve a way to be handled until we have an
asynchronous config validation API (if we ever do go down that path).

I've updated the KIP to reflect the change to use a request header rather
than a query parameter to configure the timeout value.
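
To make that concrete, a create / update / validate request with the
proposed header would look something like the sketch below (the worker URL
and connector class are placeholders, and the header name and its
millisecond value are as currently proposed in the KIP, capped by the
worker-level maximum):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class ValidateWithTimeoutSketch {
        public static void main(String[] args) throws Exception {
            String configJson = "{\"connector.class\": \"com.example.MySinkConnector\", \"topics\": \"test\"}";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8083/connector-plugins/com.example.MySinkConnector/config/validate"))
                    .header("Content-Type", "application/json")
                    // per-request timeout override in milliseconds (proposed header)
                    .header("Request-Timeout", "180000")
                    .PUT(HttpRequest.BodyPublishers.ofString(configJson))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body());
        }
    }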

Thanks,
Yash

On Tue, Mar 14, 2023 at 2:46 AM Greg Harris 
wrote:

> Yash,
>
> 1. I think that Request-Timeout is a fine choice. I did not see any
> standard header for a use-case like this, so we are free to choose a name
> for ourselves.
>
> 2. While such proposals do not exist, I don't think that means that we
> should exclude them from consideration. If we do, then we run the risk of
> making future implementations more complicated, impossible, or awkward to
> use. This is a balance of course, as we cannot let future features paralyze
> us from making improvements today. We cannot hold ourselves responsible for
> handling unknown unknowns, but we can at least try to plan around known
> unknowns.
> Applying this to a hypothetical extension of timeouts to other endpoints:
> my concern was primarily about whether this design applied everywhere felt
> natural or awkward. In my opinion, a request parameter would draw a lot of
> outsized attention in documentation, and be the first parameter for a lot
> of the endpoints, while a header could be specified as a more obscure
> global modifier to all requests.
>
> 3. 4. Applying the same standard to a hypothetical asynchronous I don't
> believe it makes async validation substantially more complicated, except
> the "request timeout interrupts connector creation" interaction. I also do
> not think that it would make it impossible to implement an async validation
> scheme in the future, since the synchronization is already in place, the
> timeout is just different.
> My chief concern was about the clarity of the API in the future when both
> configurable timeouts and asynchronous APIs exist. Will the configurable
> timeouts still make sense, and will they be a feature that people will use?
> Are we expecting users to hold a single REST request open for 5, 20, or 60
> minutes to accommodate a connector with an excessively long validation
> step? Successfully validating a request requires the caller and the network
> to remain stable for the entire duration of the request to have a chance of
> success, and consumes (some small) amount of resources during that time.
> And what if they decide they no longer want the validation result, and wish
> to cancel the operation to free resources (both connect and external
> system)? There is no standard mechanism to discard or cancel an HTTP
> request (not even a connection loss).
>
> You have said that the default timeout for other endpoints is more than
> satisfactory for the typical user, and I agree. If other operations
> (deleting, restarting, pause/unpause, etc) had a substantial synchronous
> component that would benefit from a configurable timeout, I would be more
> inclined to support configurable timeouts (all at once, or validation first
> and other operations later).
> But as it stands, I don't think I see the value-add of configurable
> timeouts once there is another solution which targets the validate call
> duration specifically, and I'm concerned that configurable timeouts would
> be made irrelevant in that case.
>
> Thanks,
> Greg
>
> On Wed, Mar 8, 2023 at 7:21 AM Yash Mayya  wrote:
>
> > Hi Greg,
> >
> > Thanks for the response!
> >
> > 1. Hm that's a fair point, and while there aren't any 

Re: [DISCUSS] KIP-793: Sink Connectors: Support topic-mutating SMTs for async connectors (preCommit users)

2023-03-12 Thread Yash Mayya
n code review after the KIP is approved.
>
> I had some questions:
>
> 4. What is the expected state/behavior for SinkRecords which do not have
> original T/P/O information after the upgrade? Just browsing, it appears
> that tests make extensive use of the existing public SinkRecord
> constructors for both Transformations and Connectors.
>
> 5. What is the expected behavior for Transformation implementations which
> do not use the newRecord methods and instead use public SinkRecord
> constructors? The KIP mentions this as a justification for the
> originalKafkaOffset method, but if existing implementations are using the
> existing constructors, those constructors won't forward the original T/P/O
> information to later transforms or the task.
>
> For the last few points, I want to discuss this rejected alternative:
>
> > Address the offsets problem entirely within the framework, doing some
> kind of mapping from the transformed topic back to the original topic.
> > * This would only work in the cases where there’s no overlap between the
> transformed topic names, but would break for the rest of the
> transformations (e.g. static transformation, topic = “a”).
> > * Even if we wanted to limit the support to those cases, it would require
> considerable bookkeeping to add a validation to verify that the
> transformation chain adheres to that expectation (and fail fast if it
> doesn’t).
>
> 6. This reasoning and the KIP design seems to imply that the connector is
> better equipped to solve this problem than the framework, but the stated
> reasons are not convincing for me.
> * A static transformation still causes an offset collision in the connector
> * The connector is not permitted to see the transformation chain to do any
> fail-fast assertions
>
> Suppose we were to think of the records at the end of the transformation
> chain as being in "virtual partitions" with "virtual offsets".
> For example, with identity-routing SMTs, the virtual coordinates are
> exactly the same as the underlying physical coordinates. For 1-1 renames,
> each virtual topic would be the renamed topic corresponding to the
> underlying topic. For fan-out from one topic to multiple virtual topics,
> virtual offsets would use the underlying kafka offsets with gaps for
> records going to other virtual partitions. Virtual topics with dropped
> records have similar gaps in the offsets.
> Currently, these virtual coordinates are passed into the connector via
> SinkTask::put, but SinkTask::open/close/preCommit and
> SinkTaskContext::assignment/offsets/pause/resume all use physical
> coordinates.
> This proposal patches put,open, and close to have both physical and virtual
> coordinates, but leaves the other methods with physical coordinates. After
> this proposal, connectors would be intentionally made aware of the
> distinction between physical and virtual coordinates, and manage their own
> bookkeeping for the two systems.
>
> To avoid that connector logic, we could use virtual coordinates in all
> connector calls, never revealing that they are different from the physical
> coordinates. There's a whole design shopping list that we'd need:
> * Renumbering mechanism for disambiguating and making virtual offsets
> monotonic in the case of topic/partition collisions
> * Data structure and strategy for translating virtual offsets back to
> physical offsets
> * New limits on SinkTaskContext::offsets() calls to prevent rewinding
> before the latest commit
> * Backwards compatibility and upgrade design
>
> 7. This alternative was very appealing to me, because the strength of a
> plugin framework is the composability of different components. Among a
> collection of N connectors and M transforms, it should ideally only take
> N + M work to understand how the components combine to build the whole.
> However, once you start adding special cases to some plugins to support
> interactions with others, the whole system can take N * M work to
> understand. From a complexity standpoint, it would be very good for the
> framework to solve this in a way which was connector-agnostic.
> The current design compromises the logical isolation of the plugins
> slightly, but they can collapse offsets very memory-efficiently, and re-use
> the existing raw coordinate functions and keep everything else backwards
> compatible. After deriving all of the above, I think that's a reasonable
> tradeoff to make.
>
> Thanks,
> Greg
>
> On Tue, Feb 21, 2023 at 10:17 AM Chris Egerton 
> wrote:
>
> > Hi Yash,
> >
> > We'll probably want to make a few tweaks to the Javadocs for the new
> > methods (I'm imagining that notes on compatibility with older versions
> will
> > be required), b

Re: [ANNOUNCE] New Kafka PMC Member: Chris Egerton

2023-03-09 Thread Yash Mayya
Congratulations Chris!

On Thu, Mar 9, 2023, 23:42 Jun Rao  wrote:

> Hi, Everyone,
>
> Chris Egerton has been a Kafka committer since July 2022. He has been very
> instrumental to the community since becoming a committer. It's my pleasure
> to announce that Chris is now a member of Kafka PMC.
>
> Congratulations Chris!
>
> Jun
> on behalf of Apache Kafka PMC
>


Re: [ANNOUNCE] New Kafka PMC Member: David Arthur

2023-03-09 Thread Yash Mayya
Congrats David!

On Thu, Mar 9, 2023, 23:42 Jun Rao  wrote:

> Hi, Everyone,
>
> David Arthur has been a Kafka committer since 2013. He has been very
> instrumental to the community since becoming a committer. It's my pleasure
> to announce that David is now a member of Kafka PMC.
>
> Congratulations David!
>
> Jun
> on behalf of Apache Kafka PMC
>


Re: [DISCUSS] KIP-882: Make Kafka Connect REST API request timeouts configurable

2023-03-08 Thread Yash Mayya
Hi Greg,

Thanks for the response!

1. Hm that's a fair point, and while there aren't any strict guidelines or
rules governing whether something should be a query parameter versus a
request header, I agree that it might be more idiomatic for a request
timeout value to be accepted via a custom request header. What do you
think about using a header name like "X-Request-Timeout", or maybe simply
"Request-Timeout" if we want to take into account
https://www.rfc-editor.org/rfc/rfc6648?

2. 3. 4. Adding asynchronous validations via new endpoints / new request
parameters on existing endpoints is definitely an interesting idea but I'm
not sure whether this KIP should take into consideration a hypothetical
future proposal? If there were already such a concrete proposal, I would
definitely agree that this KIP should take that into consideration - but
that isn't the case today. Furthermore, I'm not sure I understand why the
addition of request timeout configurability to the existing endpoints would
preclude the introduction of an asynchronous validation API in the future?

Thanks,
Yash

On Sat, Mar 4, 2023 at 1:13 AM Greg Harris 
wrote:

> Yash,
>
> 1.
> Currently the request parameters in the REST API are individual and pertain
> to just one endpoint.
> They also change the content of the query result, or change the action
> taken on the cluster.
> I think that a request timeout is a property of the HTTP request more than
> it is a property of the cluster query or cluster action.
> The two solutions have very similar tradeoffs, but I'm interested in
> whether one is more idiomatic and more obvious to users.
>
> 2.
> I understand that only these three endpoints are in need of increased
> timeouts at this time due to long connector validations.
> From another perspective, this change is making the API more irregular and
> someone in the future might choose to make it more regular by standardizing
> the configurable timeout functionality.
> I wouldn't (in this KIP) dismiss someone's desire to configure other
> timeouts in the future (in another KIP), and design them into a corner.
> It is acceptable to limit the scope of this change to just the three
> endpoints due to practical reasons, but I don't think that should prevent
> us from trying to ensure that this design fits in the "end goal" state of
> the Connect service.
>
> 3. 4.
> I am not suggesting an incompatible change, as the current synchronous
> behavior is still a useful API for certain situations. I think that it is
> possible to add asynchronous validations in a backwards compatible way,
> using new endpoints or other new request parameters.
> The interface could be designed such that users with connectors that exceed
> the synchronous timeouts can utilize the asynchronous API. Tooling can use
> the asynchronous API when it is available, and fall back to the synchronous
> API when it is not.
> I think that it also may be more in-line with the design of the rest of the
> REST API, where nearly every other request is asynchronous. That's why
> you're only targeting these three endpoints, they're the only ones with a
> synchronicity constraint.
> Again, I'm not necessarily saying that you must implement such an
> asynchronous validation scheme in this KIP, but we should consider if that
> is a more extensible solution. If we decided to implement configurable
> synchronous timeouts now, how would that complement an asynchronous API in
> the future?
>
> On Thu, Mar 2, 2023 at 10:00 PM Yash Mayya  wrote:
>
> > Hi Greg,
> >
> > Thanks for taking a look!
> >
> > 1. I believe Chris suggested the use of a query parameter above as we
> > already have precedents for using query parameters to configure per
> request
> > behavior in Kafka Connect (the restart connectors API and the get
> > connectors API for instance). Also, the limited choice of endpoints
> > targeted is intentional (see my reply to the next point).
> >
> > 2. I intentionally targeted just the three listed endpoints where
> > synchronous connector config validations come into the picture. This is
> > because of the legitimate cases where config validation for specific
> > connector plugins might exceed the default request timeout in edge case
> > scenarios (outlined in the KIP's motivation section). Other Connect REST
> > endpoints shouldn't be taking longer than the default 90 second request
> > timeout; if they do so, it would either be indicative of a bug in the
> > Connect framework or a cluster health issue - neither of which should be
> > covered up by manually setting a longer request timeout.
> >
> > 3. 4. I think changing the config validation behavior would be a backward
> > incompatible chang

Re: [VOTE] KIP-882: Kafka Connect REST API timeout improvements

2023-03-02 Thread Yash Mayya
Hi Ashwin,

Thanks for the feedback. I intentionally went with the choice of
an unambiguous milliseconds unit and this would be clearly documented in
the public Kafka Connect documentation. I think supporting multiple time
units might lead to unnecessary confusion (what happens when no unit is
specified; what if an alternate short form for a time unit is used etc.).
Furthermore, the worker configuration for the max request timeout will also
only accept millisecond values, so this will be consistent across both
places.

However, it's definitely not too late to incorporate the change and I'd be
happy to do so if there are others from the community who share the same
opinion!

Thanks,
Yash

On Thu, Mar 2, 2023 at 12:18 PM Ashwin  wrote:

> Thanks for the KIP Yash - +1 (non-binding)
>
> One nitpick -
> In one of the responses Chris had mentioned that the timeout param could be
> like 'timeout=10s'.
> The KIP seems to favour a millisecond timeout value and has an unwieldy
> value in the example - `POST /connectors?timeout=12`
>
> I like the idea of having smaller timeout values and the unit as part of
> the value eg: '2m', '10s' or '500ms'.
> Is it too late to incorporate this change?
>
> Thanks,
> Ashwin
>
>
> On Wed, Mar 1, 2023 at 6:32 PM Yash Mayya  wrote:
>
> > Hi all,
> >
> > I'd like to call for a vote on the (hopefully) straightforward KIP-882
> > which adds support for configuring request timeouts on Kafka Connect REST
> > APIs via query parameters along with a couple of related small
> > improvements.
> >
> > KIP -
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-882%3A+Kafka+Connect+REST+API+timeout+improvements
> >
> > Discussion thread -
> > https://lists.apache.org/thread/cygy115qmwpc3nj5omnj0crws2dw8nor
> >
> > Thanks,
> > Yash
> >
>


Re: [DISCUSS] KIP-882: Make Kafka Connect REST API request timeouts configurable

2023-03-02 Thread Yash Mayya
Hi Greg,

Thanks for taking a look!

1. I believe Chris suggested the use of a query parameter above as we
already have precedents for using query parameters to configure per request
behavior in Kafka Connect (the restart connectors API and the get
connectors API for instance). Also, the limited choice of endpoints
targeted is intentional (see my reply to the next point).
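
(As a concrete reminder of that precedent, the restart API's per-request
query parameters from KIP-745 look something like the client-side sketch
below; the worker URL and connector name are placeholders.)

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RestartConnectorSketch {
        public static void main(String[] args) throws Exception {
            // includeTasks=true also restarts the connector's tasks, and
            // onlyFailed=true limits the restart to failed instances
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8083/connectors/my-connector/restart"
                            + "?includeTasks=true&onlyFailed=true"))
                    .POST(HttpRequest.BodyPublishers.noBody())
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode()); // the worker acknowledges the restart request
        }
    }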

2. I intentionally targeted just the three listed endpoints where
synchronous connector config validations come into the picture. This is
because of the legitimate cases where config validation for specific
connector plugins might exceed the default request timeout in edge case
scenarios (outlined in the KIP's motivation section). Other Connect REST
endpoints shouldn't be taking longer than the default 90 second request
timeout; if they do so, it would either be indicative of a bug in the
Connect framework or a cluster health issue - neither of which should be
covered up by manually setting a longer request timeout.

3. 4. I think changing the config validation behavior would be a backward
incompatible change and I wanted to avoid that in this particular KIP.
There are multiple programmatic UIs out there which rely on the current
synchronous config validation behavior and breaking the existing contract
would definitely require a larger discussion.

Thanks,
Yash

On Fri, Mar 3, 2023 at 12:04 AM Greg Harris 
wrote:

> Hey Yash,
>
> Thanks for the KIP, and sorry for the late review.
>
> 1. Have you considered a HTTP header to provide the client-configurable
> timeout? A header might more naturally extend to all of the other endpoints
> in the future, rather than duplicating the query parameter across
> endpoints.
>
> 2. I understand that this change is targeted at just long duration
> Connector::validation calls, is that due to voluntary scope constraints?
> Implementing configurable timeouts for all endpoints in a uniform way could
> be desirable, even if the default timeout will work for nearly all of the
> other calls.
>
> 3. Did you consider adding asynchronous validation as a user-facing
> feature? I think that relying on the synchronous validation results in a
> single HTTP request is a bit of an anti-pattern, and one that we've
> inherited from the original REST design. It seems useful when using the
> REST API by hand, but seems to be a liability when used in environments
> with an external management layer.
>
> 4. Perhaps rather than allowing synchronous calls to Connector:validate to
> increase in duration, we should provide a way for connectors to surface
> validation problems later in their lifecycle. Currently there is the
> ConnectorContext::raiseError that transitions the task to FAILED, perhaps a
> similar API could asynchronously emit re-validation results. We've also had
> a problem with long start() duration for the same reasons as validate().
>
> I understand if you want to keep this KIP as tightly focused as possible,
> but i'm worried that it is addressing the symptom and not the problem. I
> want to make sure that this change is impactful and isn't obsoleted by a
> later improvement.
>
> Thanks,
> Greg
>
>
> On Wed, Mar 1, 2023 at 5:07 AM Yash Mayya  wrote:
>
> > Hi all,
> >
> > Thanks for all the feedback and discussion. I've renamed the KIP title to
> > "Kafka Connect REST API timeout improvements" since we're introducing a
> > couple of improvements (cancelling create / update connector requests
> when
> > config validations timeout and avoiding double config validations in
> > distributed mode) along with making the request timeouts configurable.
> The
> > new KIP link is
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-882%3A+Kafka+Connect+REST+API+timeout+improvements
> > and I've called for a vote for the KIP here -
> > https://lists.apache.org/thread/mgx8lczx2f57pk7x3vh0nqk00s79grgp.
> >
> > Thanks,
> > Yash
> >
> > On Sat, Nov 5, 2022 at 11:42 PM Sagar  wrote:
> >
> > > Hey Yash,
> > >
> > > Thanks for the explanation. I think it should be fine to delegate the
> > > validation directly to the leader.
> > >
> > > Thanks!
> > > Sagar.
> > >
> > > On Sat, Nov 5, 2022 at 10:42 AM Yash Mayya 
> wrote:
> > >
> > > > Hi Sagar,
> > > >
> > > > Thanks for chiming in!
> > > >
> > > > > Having said that, why does the worker forward to the
> > > > > leader? I am thinking if the worker can perform the validation on
> > it's
> > > > own,
> > > > > we could let it do the validation instead of forwarding everything
> to
> > > the
> > 

Re: [DISCUSS] KIP-882: Make Kafka Connect REST API request timeouts configurable

2023-03-01 Thread Yash Mayya
Hi all,

Thanks for all the feedback and discussion. I've renamed the KIP title to
"Kafka Connect REST API timeout improvements" since we're introducing a
couple of improvements (cancelling create / update connector requests when
config validations timeout and avoiding double config validations in
distributed mode) along with making the request timeouts configurable. The
new KIP link is
https://cwiki.apache.org/confluence/display/KAFKA/KIP-882%3A+Kafka+Connect+REST+API+timeout+improvements
and I've called for a vote for the KIP here -
https://lists.apache.org/thread/mgx8lczx2f57pk7x3vh0nqk00s79grgp.

Thanks,
Yash

On Sat, Nov 5, 2022 at 11:42 PM Sagar  wrote:

> Hey Yash,
>
> Thanks for the explanation. I think it should be fine to delegate the
> validation directly to the leader.
>
> Thanks!
> Sagar.
>
> On Sat, Nov 5, 2022 at 10:42 AM Yash Mayya  wrote:
>
> > Hi Sagar,
> >
> > Thanks for chiming in!
> >
> > > Having said that, why does the worker forward to the
> > > leader? I am thinking if the worker can perform the validation on it's
> > own,
> > > we could let it do the validation instead of forwarding everything to
> the
> > > leader
> >
> > Only the leader is allowed to perform writes to the config topic, so any
> > request that requires a config topic write ends up being forwarded to the
> > leader. The `POST /connectors` and `PUT /connectors/{connector}/config`
> > endpoints call `Herder::putConnectorConfig` which internally does a
> config
> > validation first before initiating another herder request to write to the
> > config topic (in distributed mode). If we want to allow the first worker
> to
> > do the config validation, and then forward the request to the leader just
> > for the write to the config topic, we'd either need something like a skip
> > validations query parameter on the endpoint like Chris talks about above
> or
> > else a new internal only endpoint that just does a write to the config
> > topic without any prior config validation. However, we agreed that this
> > optimization doesn't really seem necessary for now and can be done later
> if
> > deemed useful. I'd be happy to hear differing thoughts if any, however.
> >
> > > I think a bound is certainly needed but IMO it shouldn't go beyond
> > > 10 mins considering this is just validation
> >
> > Yeah, I agree that this seems like a fair value - I've elected to go
> with a
> > default value of 10 minutes for the proposed worker configuration that
> sets
> > an upper bound for the timeout query parameter.
> >
> > Thanks,
> > Yash
> >
> > On Sat, Nov 5, 2022 at 10:30 AM Yash Mayya  wrote:
> >
> > > Hi Chris,
> > >
> > > Thanks again for your feedback. I think a worker configuration for the
> > > upper bound makes sense - I initially thought we could hardcode it
> (just
> > > like the current request timeout is), but there's no reason to set
> > another
> > > artificial bound that isn't user configurable which is exactly what
> we're
> > > trying to change here in the first place. I've updated the KIP based on
> > all
> > > our discussion above.
> > >
> > > Thanks,
> > > Yash
> > >
> > > On Thu, Nov 3, 2022 at 11:01 PM Sagar 
> wrote:
> > >
> > >> Hey Yash,
> > >>
> > >> Thanks for the KIP! This looks like a useful feature.
> > >>
> > >> I think the discussion thread already has some great points by Chris.
> > Just
> > >> a couple of points/clarifications=>
> > >>
> > >> Regarding, pt#2 , I guess it might be better to forward to the leader
> as
> > >> suggested by Yash. Having said that, why does the worker forward to
> the
> > >> leader? I am thinking if the worker can perform the validation on it's
> > >> own,
> > >> we could let it do the validation instead of forwarding everything to
> > the
> > >> leader(even though it might be cheap to forward all requests to the
> > >> leader).
> > >>
> > >> Pt#3 => I think a bound is certainly needed but IMO it shouldn't go
> > beyond
> > >> 10 mins considering this is just validation. We shouldn't end up in a
> > >> situation where a few faulty connectors end up blocking a lot of
> request
> > >> processing threads, so while increasing the config is certainly
> helpful,
> > >> we
> > >> shouldn't set too high a value IMO. Of course I am also open to
> > >&g

[VOTE] KIP-882: Kafka Connect REST API timeout improvements

2023-03-01 Thread Yash Mayya
Hi all,

I'd like to call for a vote on the (hopefully) straightforward KIP-882
which adds support for configuring request timeouts on Kafka Connect REST
APIs via query parameters along with a couple of related small improvements.

KIP -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-882%3A+Kafka+Connect+REST+API+timeout+improvements

Discussion thread -
https://lists.apache.org/thread/cygy115qmwpc3nj5omnj0crws2dw8nor

Thanks,
Yash


Re: [DISCUSS] KIP-793: Sink Connectors: Support topic-mutating SMTs for async connectors (preCommit users)

2023-02-21 Thread Yash Mayya
Hi Chris,

> we might try to introduce a framework-level configuration
> property to dictate which of the pre-transform and post-transform
> topic partitions are used for the fallback call to the single-arg
> variant if a task class has not overridden the multi-arg variant

Thanks for the explanation and I agree that this will be a tad bit too
convoluted. :)

Please do let me know if you'd like any further amendments to the KIP!

Thanks,
Yash

On Tue, Feb 21, 2023 at 8:42 PM Chris Egerton 
wrote:

> Hi Yash,
>
> I think the use case for pre-transform TPO coordinates (and topic partition
> writers created/destroyed in close/open) tends to boil down to exactly-once
> semantics, where it's desirable to preserve the guarantees that Kafka
> provides (every record has a unique TPO trio, and records are ordered by
> offset within a topic partition).
>
> It's my understanding that this approach is utilized in several connectors
> out there today, and it might break these connectors to start using the
> post-transform topic partitions automatically in their open/close methods.
>
> If we want to get really fancy with this and try to obviate or at least
> reduce the need for per-connector code changes, we might try to introduce a
> framework-level configuration property to dictate which of the
> pre-transform and post-transform topic partitions are used for the fallback
> call to the single-arg variant if a task class has not overridden the
> multi-arg variant. But I think this is going a bit too far and would prefer
> to keep things simple(r) for now.
>
> Cheers,
>
> Chris
>
>
> On Sun, Feb 19, 2023 at 2:34 AM Yash Mayya  wrote:
>
> > Hi Chris,
> >
> > > I was actually envisioning something like `void
> > > open(Collection<TopicPartition> originalPartitions,
> > > Collection<TopicPartition> transformedPartitions)`
> >
> > Ah okay, this does make a lot more sense. Sorry, I think I misunderstood
> > you earlier. I do agree with you that this seems better than splitting it
> > off into two new sets of open / close methods from a complexity
> standpoint.
> >
> > > Plus, if a connector is intentionally designed to use
> > > pre-transformation topic partitions in its open/close
> > > methods, wouldn't we just be trading one form of the
> > >  problem for another by making this switch?
> >
> > On thinking about this a bit more, I'm not so convinced that we need to
> > expose the pre-transform / original topic partitions in the new open /
> > close methods. The purpose of the open / close methods is to allow sink
> > tasks to allocate and deallocate resources for each topic partition
> > assigned to the task and the purpose of topic-mutating SMTs is to
> > essentially modify the source topic name from the point of view of the
> sink
> > connector. Why would a sink connector ever need to or want to allocate
> > resources for pre-transform topic partitions? Is the argument here that
> > since we'll be exposing both the pre-transform and post-transform topic
> > partitions per record, we should also expose the same info via open /
> close
> > and allow sink connector implementations to disregard topic-mutating SMTs
> > completely if they wanted to?
> >
> > Either way, I've gone ahead and updated the KIP to reflect all of
> > our previous discussion here since it had become quite outdated. I've
> also
> > updated the KIP title from "Sink Connectors: Support topic-mutating SMTs
> > for async connectors (preCommit users)" to "Allow sink connectors to be
> > used with topic-mutating SMTs" since the improvements to the open / close
> > mechanism doesn't pertain only to asynchronous sink connectors. The new
> KIP
> > URL is:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-793%3A+Allow+sink+connectors+to+be+used+with+topic-mutating+SMTs
> >
> >
> > Thanks,
> > Yash
> >
> > On Tue, Feb 14, 2023 at 11:39 PM Chris Egerton 
> > wrote:
> >
> > > Hi Yash,
> > >
> > > I was actually envisioning something like `void
> > > open(Collection
> > > originalPartitions, Collection transformedPartitions)`,
> > > since we already convert and transform each batch of records that we
> poll
> > > from the sink task's consumer en masse, meaning we could discover
> several
> > > new transformed partitions in between consecutive calls to
> SinkTask::put.
> > >
> > > It's also worth noting that we'll probably want to deprecate the
> existing
> > > open/close methods, at which point keeping one non-deprecated variant
> of
> > > each seem

Re: [DISCUSS] KIP-793: Sink Connectors: Support topic-mutating SMTs for async connectors (preCommit users)

2023-02-18 Thread Yash Mayya
Hi Chris,

> I was actually envisioning something like `void
> open(Collection<TopicPartition> originalPartitions,
> Collection<TopicPartition> transformedPartitions)`

Ah okay, this does make a lot more sense. Sorry, I think I misunderstood
you earlier. I do agree with you that this seems better than splitting it
off into two new sets of open / close methods from a complexity standpoint.

> Plus, if a connector is intentionally designed to use
> pre-transformation topic partitions in its open/close
> methods, wouldn't we just be trading one form of the
>  problem for another by making this switch?

On thinking about this a bit more, I'm not so convinced that we need to
expose the pre-transform / original topic partitions in the new open /
close methods. The purpose of the open / close methods is to allow sink
tasks to allocate and deallocate resources for each topic partition
assigned to the task and the purpose of topic-mutating SMTs is to
essentially modify the source topic name from the point of view of the sink
connector. Why would a sink connector ever need to or want to allocate
resources for pre-transform topic partitions? Is the argument here that
since we'll be exposing both the pre-transform and post-transform topic
partitions per record, we should also expose the same info via open / close
and allow sink connector implementations to disregard topic-mutating SMTs
completely if they wanted to?
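
To make the resource allocation point concrete, here's a simplified sketch
of the pattern I'm describing - the Writer type and everything it does are
stand-ins for connector-specific logic:

    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.connect.sink.SinkRecord;
    import org.apache.kafka.connect.sink.SinkTask;

    public class PartitionWriterSinkTask extends SinkTask {
        // hypothetical per-partition writer for the external system
        interface Writer { void write(SinkRecord record); default void close() { } }

        private final Map<TopicPartition, Writer> writers = new HashMap<>();

        @Override
        public void open(Collection<TopicPartition> partitions) {
            // allocate resources for each newly assigned topic partition
            for (TopicPartition tp : partitions) {
                writers.put(tp, record -> { /* write to the external system */ });
            }
        }

        @Override
        public void close(Collection<TopicPartition> partitions) {
            // deallocate resources for revoked topic partitions
            for (TopicPartition tp : partitions) {
                Writer writer = writers.remove(tp);
                if (writer != null) {
                    writer.close();
                }
            }
        }

        @Override
        public void put(Collection<SinkRecord> records) {
            for (SinkRecord record : records) {
                TopicPartition tp = new TopicPartition(record.topic(), record.kafkaPartition());
                Writer writer = writers.get(tp);
                if (writer == null) {
                    // with a topic-mutating SMT, record.topic() can name a partition that
                    // was never passed to open() - exactly the gap discussed in this thread
                    throw new IllegalStateException("No writer for " + tp);
                }
                writer.write(record);
            }
        }

        @Override public void start(Map<String, String> props) { }
        @Override public void stop() { }
        @Override public String version() { return "0.0.1"; }
    }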

Either way, I've gone ahead and updated the KIP to reflect all of
our previous discussion here since it had become quite outdated. I've also
updated the KIP title from "Sink Connectors: Support topic-mutating SMTs
for async connectors (preCommit users)" to "Allow sink connectors to be
used with topic-mutating SMTs" since the improvements to the open / close
mechanism don't pertain only to asynchronous sink connectors. The new KIP
URL is:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-793%3A+Allow+sink+connectors+to+be+used+with+topic-mutating+SMTs


Thanks,
Yash

On Tue, Feb 14, 2023 at 11:39 PM Chris Egerton 
wrote:

> Hi Yash,
>
> I was actually envisioning something like `void
> open(Collection<TopicPartition>
> originalPartitions, Collection<TopicPartition> transformedPartitions)`,
> since we already convert and transform each batch of records that we poll
> from the sink task's consumer en masse, meaning we could discover several
> new transformed partitions in between consecutive calls to SinkTask::put.
>
> It's also worth noting that we'll probably want to deprecate the existing
> open/close methods, at which point keeping one non-deprecated variant of
> each seems more appealing and less complex than keeping two.
>
> Honestly though, I think we're both on the same page enough that I wouldn't
> object to either approach. We've probably reached the saturation point for
> ROI here and as long as we provide developers a way to get the information
> they need from the runtime and take care to add Javadocs and update our
> docs page (possibly including the connector development quickstart), it
> should be fine.
>
> At this point, it might be worth updating the KIP based on recent
> discussion so that others can see the latest proposal, and we can both take
> a look and make sure everything looks good enough before opening a vote
> thread.
>
> Finally, I think you make a convincing case for a time-based eviction
> policy. I wasn't thinking about the fairly common SMT pattern of deriving a
> topic name from, e.g., a record field or header.
>
> Cheers,
>
> Chris
>
> On Tue, Feb 14, 2023 at 11:42 AM Yash Mayya  wrote:
>
> > Hi Chris,
> >
> > > Plus, if a connector is intentionally designed to
> > > use pre-transformation topic partitions in its
> > > open/close methods, wouldn't we just be trading
> > > one form of the problem for another by making this
> > > switch?
> >
> > Thanks, this makes sense, and given that the KIP already proposes a way
> for
> > sink connector implementations to distinguish between pre-transform and
> > post-transform topics per record, I think I'm convinced that going with
> new
> > `open()` / `close()` methods is the right approach. However, I still feel
> > like having overloaded methods will make it a lot less unintuitive given
> > that the two sets of methods would be different in terms of when they're
> > called and what arguments they are passed (also I'm presuming that the
> > overloaded methods you're prescribing will only have a single
> > `TopicPartition` rather than a `Collection` as their
> > parameters). I guess my concern is largely around the fact that it won't
> be
> > possible to distinguish between the overloaded methods' use cases just
> from
> > the method signatures. I agree that naming is going to be difficult h

[jira] [Created] (KAFKA-14732) Use an exponential backoff retry mechanism while reconfiguring connector tasks

2023-02-18 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-14732:
--

 Summary: Use an exponential backoff retry mechanism while 
reconfiguring connector tasks
 Key: KAFKA-14732
 URL: https://issues.apache.org/jira/browse/KAFKA-14732
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Yash Mayya
Assignee: Yash Mayya


Kafka Connect in distributed mode retries infinitely with a fixed retry backoff 
(250 ms) in case of errors arising during connector task reconfiguration. Tasks 
can be "reconfigured" during connector startup (to get the initial task configs 
from the connector), on a connector resume, or if a connector explicitly 
requests it via its context. Task reconfiguration essentially entails requesting 
a connector instance for its task configs and writing them to the Connect 
cluster's config storage (in case a change in task configs is detected).

A fixed retry backoff of 250 ms leads to very aggressive retries - consider a 
Debezium connector which attempts to initiate a database connection in its 
taskConfigs method 
(https://github.com/debezium/debezium/blob/bf347da71ad9b0819998a3bc9754b3cc96cc1563/debezium-connector-sqlserver/src/main/java/io/debezium/connector/sqlserver/SqlServerConnector.java#L63).
If the connection fails due to something like an invalid login, the Connect 
worker will essentially spam connection attempts frequently and indefinitely 
(until the connector config / database side configs are fixed). An exponential 
backoff retry mechanism seems better suited for the 
DistributedHerder::reconfigureConnectorTasksWithRetry method 
(https://github.com/apache/kafka/blob/a54a34a11c1c867ff62a7234334cad5139547fd7/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1873-L1898).
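
An illustrative sketch of the kind of bounded exponential backoff (with jitter) 
that could replace the fixed 250 ms delay - the constants are placeholders, not 
a concrete proposal:

    import java.util.concurrent.ThreadLocalRandom;

    public class ReconfigurationBackoffSketch {
        private static final long INITIAL_BACKOFF_MS = 250L;  // current fixed delay
        private static final long MAX_BACKOFF_MS = 60_000L;   // illustrative cap

        // delay before the given (0-based) retry attempt: 250 ms, 500 ms, 1 s, 2 s, ...
        // capped at MAX_BACKOFF_MS, with jitter so retries don't happen in lockstep
        static long backoffMs(int attempt) {
            double exponential = INITIAL_BACKOFF_MS * Math.pow(2, attempt);
            long bounded = (long) Math.min(exponential, MAX_BACKOFF_MS);
            long jitter = ThreadLocalRandom.current().nextLong(bounded / 5 + 1);
            return bounded + jitter;
        }

        public static void main(String[] args) {
            for (int attempt = 0; attempt < 10; attempt++) {
                System.out.printf("attempt %d -> wait ~%d ms%n", attempt, backoffMs(attempt));
            }
        }
    }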



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New committer: Lucas Bradstreet

2023-02-16 Thread Yash Mayya
Congratulations Lucas!

On Fri, Feb 17, 2023 at 3:25 AM Jun Rao  wrote:

> Hi, Everyone,
>
> The PMC of Apache Kafka is pleased to announce a new Kafka committer Lucas
> Bradstreet.
>
> Lucas has been a long time Kafka contributor since Oct. 2018. He has been
> extremely valuable for Kafka on both performance and correctness
> improvements.
>
> The following are his performance related contributions.
>
> KAFKA-9820: validateMessagesAndAssignOffsetsCompressed allocates batch
> iterator which is not used
> KAFKA-9685: Solve Set concatenation perf issue in AclAuthorizer
> KAFKA-9729: avoid readLock in authorizer ACL lookups
> KAFKA-9039: Optimize ReplicaFetcher fetch path
> KAFKA-8841: Reduce overhead of ReplicaManager.updateFollowerFetchState
>
> The following are his correctness related contributions.
>
> KAFKA-13194: LogCleaner may clean past highwatermark
> KAFKA-10432: LeaderEpochCache is incorrectly recovered on segment recovery
> for epoch 0
> KAFKA-9137: Fix incorrect FetchSessionCache eviction logic
>
> Congratulations, Lucas!
>
> Thanks,
>
> Jun (on behalf of the Apache Kafka PMC)
>


Re: [DISCUSS] KIP-793: Sink Connectors: Support topic-mutating SMTs for async connectors (preCommit users)

2023-02-14 Thread Yash Mayya
Hi Chris,

> Plus, if a connector is intentionally designed to
> use pre-transformation topic partitions in its
> open/close methods, wouldn't we just be trading
> one form of the problem for another by making this
> switch?

Thanks, this makes sense, and given that the KIP already proposes a way for
sink connector implementations to distinguish between pre-transform and
post-transform topics per record, I think I'm convinced that going with new
`open()` / `close()` methods is the right approach. However, I still feel
like having overloaded methods will make it a lot less intuitive, given
that the two sets of methods would differ in terms of when they're
called and what arguments they're passed (also, I'm presuming that the
overloaded methods you're prescribing will only have a single
`TopicPartition` rather than a `Collection<TopicPartition>` as their
parameters). I guess my concern is largely around the fact that it won't be
possible to distinguish between the overloaded methods' use cases just from
the method signatures. I agree that naming is going to be difficult here,
but I think that having two sets of `SinkTask::openXyz` /
`SinkTask::closeXyz` methods will be less complicated to understand from a
connector developer perspective (as compared to overloaded methods with
only differing documentation). Of your suggested options, I think
`openPreTransform` / `openPostTransform` are the most comprehensible ones.

> BTW, I wouldn't say that we can't make assumptions
> about the relationships between pre- and post-transformation
>  topic partitions.

I meant that the framework wouldn't be able to deterministically know when
to close a post-transform topic partition given that SMTs could use
per-record data / metadata to manipulate the topic names as and how
required (which supports the suggestion to use an eviction policy based
mechanism to call SinkTask::close for post-transform topic partitions).

> We might utilize a policy that assumes a deterministic
> mapping from the former to the latter, for example.

Wouldn't this be making the assumption that SMTs only use the topic name
itself and no other data / metadata while computing the new topic name? Are
you suggesting that since this assumption could work for a majority of
SMTs, it might be more efficient overall in terms of reducing the number of
"false-positive" calls to `SinkTask::closePostTransform` (and we'll also be
able to call `SinkTask::closePostTransform` immediately after topic
partitions are revoked from the consumer)? I was thinking something more
generic along the lines of a simple time based eviction policy that
wouldn't be making any assumptions regarding the SMT implementations.
Either way, I do like your earlier suggestion of keeping this logic
internal and not painting ourselves into a corner by promising any
particular behavior in the KIP.
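
Something along these lines is what I'm picturing - a purely illustrative
sketch of the internal bookkeeping (not actual framework code), where
post-transform partitions that haven't shown up in a delivered batch for
longer than some TTL are evicted and could then be passed to the task's
close:

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import org.apache.kafka.common.TopicPartition;

    class PostTransformPartitionTracker {
        private final long ttlMs;
        private final Map<TopicPartition, Long> lastSeenMs = new HashMap<>();

        PostTransformPartitionTracker(long ttlMs) { this.ttlMs = ttlMs; }

        // called with the post-transform topic partitions of every delivered batch
        void markSeen(Collection<TopicPartition> partitions, long nowMs) {
            for (TopicPartition tp : partitions) {
                lastSeenMs.put(tp, nowMs);
            }
        }

        // partitions that haven't appeared in any batch for longer than the TTL;
        // the framework could pass these to the task's close for post-transform partitions
        Collection<TopicPartition> evictStale(long nowMs) {
            List<TopicPartition> stale = new ArrayList<>();
            Iterator<Map.Entry<TopicPartition, Long>> it = lastSeenMs.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<TopicPartition, Long> entry = it.next();
                if (nowMs - entry.getValue() > ttlMs) {
                    stale.add(entry.getKey());
                    it.remove();
                }
            }
            return stale;
        }
    }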

Thanks,
Yash

On Tue, Feb 14, 2023 at 1:08 AM Chris Egerton 
wrote:

> Hi Yash,
>
> I think the key difference between adding methods/overloads related to
> SinkTask::open/SinkTask::close and SinkTask::put is that this isn't
> auxiliary information that may or may not be useful to connector
> developers. It's actually critical for them to understand the difference
> between the two concepts here, even if they look very similar. And yes, I
> do believe that switching from pre-transform to post-transform topic
> partitions is too big a change in behavior here. Plus, if a connector is
> intentionally designed to use pre-transformation topic partitions in its
> open/close methods, wouldn't we just be trading one form of the problem for
> another by making this switch?
>
> One possible alternative to overloading the existing methods is to split
> SinkTask::open into openOriginal (or possibly openPhysical or
> openPreTransform) and openTransformed (or openLogical or
> openPostTransform), with a similar change for SinkTask::close. The default
> implementation for SinkTask::openOriginal can be to call SinkTask::open,
> and the same can go for SinkTask::close. However, I prefer overloading the
> existing methods since this alternative increases complexity and none of
> the names are very informative.
>
> BTW, I wouldn't say that we can't make assumptions about the relationships
> between pre- and post-transformation topic partitions. We might utilize a
> policy that assumes a deterministic mapping from the former to the latter,
> for example. The distinction I'd draw is that the assumptions we make can
> and probably should favor some cases in terms of performance (i.e.,
> reducing the number of unnecessary calls to close/open over a given sink
> task's lifetime), but should not lead to guaranteed resource leaks or
> failure to obey API contract in any cases.
>
> Cheers,
>
> Chris
>
> On Mon, Feb 13, 2023 at 10:54 AM Yash Mayya  wrote:
>
> > Hi Chris,
>

Re: [DISCUSS] KIP-793: Sink Connectors: Support topic-mutating SMTs for async connectors (preCommit users)

2023-02-13 Thread Yash Mayya
Hi Chris,

> especially if connectors are intentionally designed around
> original topic partitions instead of transformed ones.

Ha, that's a good point and reminds me of Hyrum's Law [1] :)

> I think we have to provide connector developers with some
> way to differentiate between the two, but maybe there's a way
>  to do this that I haven't thought of yet

I can't think of a better way to do this either; would invoking the
existing `SinkTask::open` and `SinkTask::close` methods with post-transform
topic partitions instead of pre-transform topic partitions not be
acceptable even in a minor / major AK release? I feel like the proposed
approach of adding overloaded `SinkTask::open` / `SinkTask::close` methods
to differentiate between pre-transform and post-transform topic partitions
has similar pitfalls to the idea of the overloaded `SinkTask::put` method
we discarded earlier.

> Either way, I'm glad that the general idea of a cache and
> eviction policy for SinkTask::close seem reasonable; if
> we decide to go this route, it might make sense for the KIP
> to include an outline of one or more high-level strategies
> we might take, but without promising any particular behavior
> beyond occasionally calling SinkTask::close for post-transform
> topic partitions. I'm hoping that this logic can stay internal,
> and by notpainting ourselves into a corner with the KIP, we
> give ourselves leeway to tweak it in the future if necessary
> without filing another KIP or introducing a pluggable interface.

Thanks, that's a good idea. Given the flexibility of SMTs, the framework
can't really make any assumptions about topic partitions post
transformation, nor does it have any way to definitively get such
information from transformations, which is why the idea of a cache with an
eviction policy makes perfect sense!

[1] - https://www.hyrumslaw.com/


Thanks,
Yash

On Thu, Feb 9, 2023 at 9:38 PM Chris Egerton 
wrote:

> Hi Yash,
>
> > So it looks like with the current state of affairs, sink tasks that only
> instantiate writers in the SinkTask::open method (and don't do the lazy
> instantiation in SinkTask::put that you mentioned) might fail when used
> with topic/partition mutating SMTs even if they don't do any asynchronous
> processing?
>
> Yep, exactly 
>
> > What do you think about retaining just the existing methods
> but changing when they're called in the Connect runtime? For instance,
> instead of calling SinkTask::open after partition assignment post a
> consumer group rebalance, we could cache the currently "seen" topic
> partitions (post transformation) and before each call to SinkTask::put
> check whether there's any new "unseen" topic partitions, and if so call
> SinkTask::open (and also update the cache of course).
>
> IMO the issue here is that it's a drastic change in behavior to start
> invoking SinkTask::open and SinkTask::close with post-transform topic
> partitions instead of pre-transform, especially if connectors are
> intentionally designed around original topic partitions instead of
> transformed ones. I think we have to provide connector developers with some
> way to differentiate between the two, but maybe there's a way to do this
> that I haven't thought of yet. Interested to hear your thoughts.
>
> Either way, I'm glad that the general idea of a cache and eviction policy
> for SinkTask::close seem reasonable; if we decide to go this route, it
> might make sense for the KIP to include an outline of one or more
> high-level strategies we might take, but without promising any particular
> behavior beyond occasionally calling SinkTask::close for post-transform
> topic partitions. I'm hoping that this logic can stay internal, and by not
> painting ourselves into a corner with the KIP, we give ourselves leeway to
> tweak it in the future if necessary without filing another KIP or
> introducing a pluggable interface.
>
> Cheers,
>
> Chris
>
> On Thu, Feb 9, 2023 at 7:39 AM Yash Mayya  wrote:
>
> > Hi Chris,
> >
> > Thanks for the feedback.
> >
> > 1) That's a fair point; while I did scan everything publicly available on
> > GitHub, you're right in that it won't cover all possible SMTs that are
> out
> > there. Thanks for the example use-case as well, I've updated the KIP to
> add
> > the two new proposed methods.
> >
> > 2) So it looks like with the current state of affairs, sink tasks that
> only
> > instantiate writers in the SinkTask::open method (and don't do the lazy
> > instantiation in SinkTask::put that you mentioned) might fail when used
> > with topic/partition mutating SMTs even if they don't do any asynchronous
> > processing? Since they could encounter records in SinkTask::put with
> > topi

Re: [DISCUSS] KIP-793: Sink Connectors: Support topic-mutating SMTs for async connectors (preCommit users)

2023-02-09 Thread Yash Mayya
Hi Chris,

Thanks for the feedback.

1) That's a fair point; while I did scan everything publicly available on
GitHub, you're right in that it won't cover all possible SMTs that are out
there. Thanks for the example use-case as well, I've updated the KIP to add
the two new proposed methods.
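
To illustrate the intended usage (method names here are as proposed in the
KIP, and the connector logic is just a stand-in): a task can keep writing
using the transformed topic while tracking and reporting offsets against
the records' original Kafka coordinates.

    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.connect.sink.SinkRecord;
    import org.apache.kafka.connect.sink.SinkTask;

    public class OriginalCoordinatesSinkTask extends SinkTask {
        private final Map<TopicPartition, OffsetAndMetadata> toCommit = new HashMap<>();

        @Override
        public void put(Collection<SinkRecord> records) {
            for (SinkRecord record : records) {
                // record.topic() is the post-transform topic that data is written under
                writeToExternalSystem(record.topic(), record.value());

                // the original, pre-transform Kafka coordinates exposed by the proposed
                // accessors; these are what the framework needs for consumer offset commits
                TopicPartition original = new TopicPartition(
                        record.originalTopic(), record.originalKafkaPartition());
                toCommit.put(original, new OffsetAndMetadata(record.originalKafkaOffset() + 1));
            }
        }

        @Override
        public Map<TopicPartition, OffsetAndMetadata> preCommit(Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
            return new HashMap<>(toCommit);
        }

        private void writeToExternalSystem(String topic, Object value) { /* connector-specific */ }

        @Override public void start(Map<String, String> props) { }
        @Override public void stop() { }
        @Override public String version() { return "0.0.1"; }
    }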

2) So it looks like with the current state of affairs, sink tasks that only
instantiate writers in the SinkTask::open method (and don't do the lazy
instantiation in SinkTask::put that you mentioned) might fail when used
with topic/partition mutating SMTs even if they don't do any asynchronous
processing? Since they could encounter records in SinkTask::put with
topics/partitions that they might not have created writers for. Thanks for
pointing this out, it's definitely another incompatibility that needs to be
called out and fixed. The overloaded method approach is interesting, but
comes with the caveat of yet more new methods that will need to be
implemented by existing connectors if they want to make use of this new
functionality. What do you think about retaining just the existing methods
but changing when they're called in the Connect runtime? For instance,
instead of calling SinkTask::open after partition assignment post a
consumer group rebalance, we could cache the currently "seen" topic
partitions (post transformation) and before each call to SinkTask::put
check whether there are any new "unseen" topic partitions, and if so call
SinkTask::open (and also update the cache of course). I don't think this
would break the existing contract with sink tasks where SinkTask::open is
expected to be called for a topic partition before any records from the
topic partition are sent via SinkTask::put? The SinkTask::close case is a
lot trickier however, and would require some sort of cache eviction policy
that would be deemed appropriate as you pointed out too.
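
For illustration, a minimal sketch of what that worker-side bookkeeping could look
like (hypothetical helper class, not actual Connect runtime code):

    import java.util.HashSet;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Set;

    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.connect.sink.SinkRecord;
    import org.apache.kafka.connect.sink.SinkTask;

    class LazyOpenSketch {

        private final SinkTask task;
        // Post-transformation topic partitions the task has already been opened for.
        private final Set<TopicPartition> openedPartitions = new HashSet<>();

        LazyOpenSketch(SinkTask task) {
            this.task = task;
        }

        // Called with the already-transformed batch, before handing it to the task.
        void deliver(List<SinkRecord> transformedRecords) {
            Set<TopicPartition> unseen = new LinkedHashSet<>();
            for (SinkRecord record : transformedRecords) {
                TopicPartition tp = new TopicPartition(record.topic(), record.kafkaPartition());
                if (!openedPartitions.contains(tp)) {
                    unseen.add(tp);
                }
            }
            if (!unseen.isEmpty()) {
                task.open(unseen);               // open before the task sees these records
                openedPartitions.addAll(unseen);
            }
            task.put(transformedRecords);
        }
    }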

Thanks,
Yash

On Mon, Feb 6, 2023 at 11:27 PM Chris Egerton 
wrote:

> Hi Yash,
>
> I've had some time to think on this KIP and I think I'm in agreement about
> not blocking it on an official compatibility library or adding the "ack"
> API for sink records.
>
> I only have two more thoughts:
>
> 1. Because it is possible to manipulate sink record partitions and offsets
> with the current API we provide for transformations, I still believe
> methods should be added to the SinkRecord class to expose the original
> partition and offset, not just the original topic. The additional cognitive
> burden from these two methods is going to be minimal anyways; once users
> understand the difference between the transformed topic name and the
> original one, it's going to be trivial for them to understand how that same
> difference applies for partitions and offsets. It's not enough to scan the
> set of SMTs provided out of the box with Connect, ones developed by
> Confluent, or even everything available on GitHub, since there may be
> closed-source projects out there that rely on this ability. One potential
> use case could be re-routing partitions between Kafka and some other
> sharded system.
>
> 2. We still have to address the SinkTask::open [1] and SinkTask::close [2]
> methods. If a connector writes to the external system using the transformed
> topic partitions it reads from Kafka, then it's possible for the connector
> to lazily instantiate writers for topic partitions as it encounters them
> from records provided in SinkTask::put. However, connectors also need a way
> to de-allocate those writers (and the resources used by them) over time,
> which they can't do as easily. One possible approach here is to overload
> SinkTask::open and SinkTask::close with variants that distinguish between
> transformed and original topic partitions, and default to invoking the
> existing methods with just the original topic partitions. We would then
> have several options for how the Connect runtime can invoke these methods,
> but in general, an approach that guarantees that tasks are notified of
> transformed topic partitions in SinkTask::open before any records for that
> partition are given to it in SinkTask::put, and makes a best-effort attempt
> to close transformed topic partitions that appear to no longer be in use
> based on some eviction policy, would probably be sufficient.
>
> [1] -
>
> https://kafka.apache.org/33/javadoc/org/apache/kafka/connect/sink/SinkTask.html#open(java.util.Collection)
> [2] -
>
> https://kafka.apache.org/33/javadoc/org/apache/kafka/connect/sink/SinkTask.html#close(java.util.Collection)
>
> Cheers,
>
> Chris
>
> On Sat, Nov 5, 2022 at 5:46 AM Yash Mayya  wrote:
>
> > Hi Chris,
> >
> > Thanks a lot for your inputs!
> >
> > > would provide a simple, clean interface for developers to determine
> > > which features are supported by the ver

Re: [DISCUSS] KIP-877: Mechanism for plugins and connectors to register metrics

2023-02-03 Thread Yash Mayya
Hi Mickael,

Thanks for the updates.

> the PluginMetrics implementation will append a
> suffix to sensor names to uniquely identify
> the plugin (based on the class name and tags).

Can we call this out explicitly in the KIP, since it is important to avoid
clashes in sensor naming? Also, should we allow plugins to retrieve sensors
from `PluginMetrics` if we can check / verify that they own the sensor
(based on the suffix)?
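
For example, the suffixing could double as an ownership check along these lines
(hypothetical helper, just to illustrate the idea rather than the actual implementation):

    import java.util.Map;

    class PluginSensorNamingSketch {

        // Derives an internal sensor name that is unique per plugin instance, so two
        // plugins asking for the same user-facing name cannot clash with each other
        // or with the runtime's own sensors.
        static String internalSensorName(String name, Class<?> pluginClass, Map<String, String> tags) {
            return name + "." + pluginClass.getName() + "." + tags.hashCode();
        }

        // The same derivation lets a PluginMetrics implementation verify that a plugin
        // owns a sensor before returning or deleting it.
        static boolean ownsSensor(String internalName, Class<?> pluginClass) {
            return internalName.contains("." + pluginClass.getName() + ".");
        }
    }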

Other than the above minor points, this looks good to me now!

Thanks,
Yash

On Fri, Feb 3, 2023 at 2:29 AM Chris Egerton 
wrote:

> Hi Mickael,
>
> This is looking great. I have one small question left but I do not consider
> it a blocker.
>
> What is the intended use case for PluginMetrics::close? To me at least, it
> implies that plugin developers will be responsible for invoking that method
> themselves in order to clean up metrics that they've created, but wouldn't
> we want the runtime (i.e., KafkaProducer class, Connect framework, etc.) to
> handle that automatically when the resource that the plugin applies to is
> closed?
>
> Cheers,
>
> Chris
>
> On Thu, Jan 26, 2023 at 10:22 AM Mickael Maison 
> wrote:
>
> > Hi Yash,
> >
> > 1) To avoid conflicts with other sensors, the PluginMetrics
> > implementation will append a suffix to sensor names to uniquely identify
> > the plugin (based on the class name and tags). Also I changed the
> > semantics of the sensor() method to only create sensors (originally it
> > was get or create). If a sensor with the same name already exists, the
> > method will throw.
> > 2) Tags will be automatically added to metrics and sensors to uniquely
> > identify the plugin. For Connect plugins, the connector name, task id
> > and alias can be added if available. The class implementing
> > PluginMetrics will be similar to ConnectMetrics, as in it will provide
> > a simplified API wrapping Metrics. I'm planning to use PluginMetrics
> > for Connect plugin too and should not need to interact with
> > ConnectMetrics.
> > 3) Right, I fixed the last rejected alternative.
> >
> > Thanks,
> > Mickael
> >
> > On Thu, Jan 26, 2023 at 4:04 PM Mickael Maison  >
> > wrote:
> > >
> > > Hi Federico,
> > >
> > > - The metricName() method does not register anything, it just builds a
> > > MetricName instance which is just a container holding a name, group,
> > > description and tags for a metric. Each time it is called, it returns
> > > a new instance. If called with the same arguments, the returned value
> > > will be equal.
> > > - Initially I just copied the API of Metrics. I made some small
> > > changes so the metric and sensor methods are a bit more similar
> > > - Good catch! I fixed the example.
> > >
> > > Thanks,
> > > Mickael
> > >
> > >
> > > On Thu, Jan 26, 2023 at 3:54 PM Mickael Maison <
> mickael.mai...@gmail.com>
> > wrote:
> > > >
> > > > Hi Chris,
> > > >
> > > > 1) I updated the KIP to only mention the interface.
> > > > 2) This was a mistake. I've added ReplicationPolicy to the list of
> > plugins.
> > > >
> > > > Thanks,
> > > > Mickael
> > > >
> > > > On Tue, Jan 24, 2023 at 11:16 AM Yash Mayya 
> > wrote:
> > > > >
> > > > > Hi Mickael,
> > > > >
> > > > > Thanks for the updated KIP, this is looking really good! I had a
> > couple
> > > > > more questions -
> > > > >
> > > > > 1) Sensor names need to be unique across all groups for a `Metrics`
> > > > > instance. How are we planning to avoid naming clashes (both between
> > > > > different plugins as well as with pre-defined sensors)?
> > > > >
> > > > > 2) Connect has a `ConnectMetrics` wrapper around `Metrics` via
> which
> > > > > rebalance / worker / connector / task metrics are recorded. Could
> you
> > > > > please elaborate in the KIP how the plugin metrics for connectors /
> > tasks
> > > > > will inter-operate with this?
> > > > >
> > > > > Another minor point is that the third rejected alternative appears
> > to be an
> > > > > incomplete sentence?
> > > > >
> > > > > Thanks,
> > > > > Yash
> > > > >
> > > > > On Fri, Jan 13, 2023 at 10:56 PM Mickael Maison <
> > mickael.mai...@gmail.com>
> > > > > wrote:

Re: [VOTE] KIP-875: First-class offsets support in Kafka Connect

2023-01-24 Thread Yash Mayya
Hi Chris,

I'm +1 (non-binding). Thanks again for proposing this extremely
valuable addition to Kafka Connect!

Thanks,
Yash

On Thu, Jan 19, 2023 at 12:11 AM Chris Egerton 
wrote:

> Hi all,
>
> I'd like to call for a vote on KIP-875, which adds support for viewing and
> manipulating the offsets of connectors to the Kafka Connect REST API.
>
> The KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect
>
> The discussion thread:
> https://lists.apache.org/thread/m5bklnh5w4mwr9nbzrmfk0pftpxfjd02
>
> Cheers,
>
> Chris
>


Re: [DISCUSS] KIP-877: Mechanism for plugins and connectors to register metrics

2023-01-24 Thread Yash Mayya
Hi Mickael,

Thanks for the updated KIP, this is looking really good! I had a couple
more questions -

1) Sensor names need to be unique across all groups for a `Metrics`
instance. How are we planning to avoid naming clashes (both between
different plugins as well as with pre-defined sensors)?

2) Connect has a `ConnectMetrics` wrapper around `Metrics` via which
rebalance / worker / connector / task metrics are recorded. Could you
please elaborate in the KIP how the plugin metrics for connectors / tasks
will inter-operate with this?
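
For illustration, a narrow plugin-facing surface along the lines discussed in this
thread might look roughly like the sketch below; the names and signatures are
assumptions, not the KIP's actual API:

    import java.util.Map;

    import org.apache.kafka.common.MetricName;
    import org.apache.kafka.common.metrics.MetricValueProvider;
    import org.apache.kafka.common.metrics.Sensor;

    // Hypothetical sketch of a plugin-facing metrics handle.
    public interface PluginMetricsSketch {

        // Builds a MetricName whose tags would include runtime-injected identifiers
        // (e.g. connector name and task id for Connect plugins) so that metrics from
        // different plugin instances do not clash.
        MetricName metricName(String name, String description, Map<String, String> tags);

        // Registers a metric owned by this plugin; the runtime could clean it up
        // automatically when the plugin is closed.
        void addMetric(MetricName metricName, MetricValueProvider<?> metricValueProvider);

        void removeMetric(MetricName metricName);

        // Creates (never gets) a sensor scoped to this plugin; implementations could
        // throw if a sensor with the same name was already created by this plugin.
        Sensor sensor(String name);
    }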

Another minor point is that the third rejected alternative appears to be an
incomplete sentence?

Thanks,
Yash

On Fri, Jan 13, 2023 at 10:56 PM Mickael Maison 
wrote:

> Hi,
>
> I've updated the KIP based on the feedback.
>
> Now instead of receiving a Metrics instance, plugins get access to
> PluginMetrics that exposes a much smaller API. I've removed the
> special handling for connectors and tasks and they must now implement
> the Monitorable interface as well to use this feature. Finally the
> goal is to allow all plugins (apart from metrics reporters) to use
> this feature. I've listed them all (there are over 30 pluggable APIs)
> but I've not added the list in the KIP. The reason is that new plugins
> could be added in the future and instead I'll focus on adding support
> in all the place that instantiate classes.
>
> Thanks,
> Mickael
>
> On Tue, Jan 10, 2023 at 7:00 PM Mickael Maison 
> wrote:
> >
> > Hi Chris/Yash,
> >
> > Thanks for taking a look and providing feedback.
> >
> > 1) Yes you're right, when using incompatible version, metrics() would
> > trigger NoSuchMethodError. I thought using the context to pass the
> > Metrics object would be more idiomatic for Connect but maybe
> > implementing Monitorable would be simpler. It would also allow other
> > Connect plugins (transformations, converters, etc) to register
> > metrics. So I'll make that change.
> >
> > 2) As mentioned in the rejected alternatives, I considered having a
> > PluginMetrics class/interface with a limited API. But since Metrics is
> > part of the public API, I thought it would be simpler to reuse it.
> > That said you bring interesting points so I took another look today.
> > It's true that the Metrics API is pretty complex and most methods are
> > useless for plugin authors. I'd expect most use cases only need one
> > addMetric and one sensor methods. Rather than subclassing Metrics, I
> > think a delegate/forwarding pattern might work well here. A
> > PluginMetric class would forward its method to the Metrics instance
> > and could perform some basic validations such as only letting plugins
> > delete metrics they created, or automatically injecting tags with the
> > class name for example.
> >
> > 3) Between the clients, brokers, streams and connect, Kafka has quite
> > a lot! In practice I think registering metrics should be beneficial
> > for all plugins, I think the only exception would be metrics reporters
> > (which are instantiated before the Metrics object). I'll try to build
> > a list of all plugin types and add that to the KIP.
> >
> > Thanks,
> > Mickael
> >
> >
> >
> > On Tue, Dec 27, 2022 at 4:54 PM Chris Egerton 
> wrote:
> > >
> > > Hi Yash,
> > >
> > > Yes, a default no-op is exactly what I had in mind should the
> Connector and
> > > Task classes implement the Monitorable interface.
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Tue, Dec 20, 2022 at 2:46 AM Yash Mayya 
> wrote:
> > >
> > > > Hi Mickael,
> > > >
> > > > Thanks for creating this KIP, this will be a super useful feature to
> > > > enhance existing connectors in the Kafka Connect ecosystem.
> > > >
> > > > I have some similar concerns to the ones that Chris has outlined
> above,
> > > > especially with regard to directly exposing Connect's Metrics object
> to
> > > > plugins. I believe it would be a lot friendlier to developers if we
> instead
> > > > exposed wrapper methods in the context classes - such as one for
> > > > registering a new metric, one for recording metric values and so on.
> This
> > > > would also have the added benefit of minimizing the surface area for
> > > > potential misuse by custom plugins.
> > > >
> > > > > for connectors and tasks they should handle the
> > > > > metrics() method returning null when deployed on
> > > > > an older runtime.
> > > >

Re: [ANNOUNCE] New committer: Walker Carlson

2023-01-17 Thread Yash Mayya
Congrats, Walker!

On Wed, Jan 18, 2023 at 3:27 AM Matthias J. Sax  wrote:

> Dear community,
>
> I am pleased to announce Walker Carlson as a new Kafka committer.
>
> Walker has been contributing to Apache Kafka since November 2019. He
> made various contributions including the following KIPs.
>
> KIP-671: Introduce Kafka Streams Specific Uncaught Exception Handler
> KIP-696: Update Streams FSM to clarify ERROR state meaning
> KIP-715: Expose Committed offset in streams
>
>
> Congratulations Walker and welcome on board!
>
>
> Thanks,
>-Matthias (on behalf of the Apache Kafka PMC)
>

