[GitHub] [kafka-site] anlance closed pull request #492: Fixed typos in introduction documentation

2023-02-18 Thread via GitHub


anlance closed pull request #492: Fixed typos in introduction documentation
URL: https://github.com/apache/kafka-site/pull/492




Re: [DISCUSS] KIP-793: Sink Connectors: Support topic-mutating SMTs for async connectors (preCommit users)

2023-02-18 Thread Yash Mayya
Hi Chris,

> I was actually envisioning something like `void
> open(Collection<TopicPartition> originalPartitions,
> Collection<TopicPartition> transformedPartitions)`

Ah okay, this does make a lot more sense. Sorry, I think I misunderstood
you earlier. I do agree with you that this seems better than splitting it
off into two new sets of open / close methods from a complexity standpoint.

> Plus, if a connector is intentionally designed to use
> pre-transformation topic partitions in its open/close
> methods, wouldn't we just be trading one form of the
>  problem for another by making this switch?

On thinking about this a bit more, I'm not so convinced that we need to
expose the pre-transform / original topic partitions in the new open /
close methods. The purpose of the open / close methods is to allow sink
tasks to allocate and deallocate resources for each topic partition
assigned to the task, while the purpose of topic-mutating SMTs is
essentially to modify the source topic name from the point of view of the
sink connector. Why would a sink connector ever need to or want to allocate
resources for pre-transform topic partitions? Is the argument here that
since we'll be exposing both the pre-transform and post-transform topic
partitions per record, we should also expose the same info via open / close
and allow sink connector implementations to disregard topic-mutating SMTs
completely if they wanted to?
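
For illustration, here is a rough sketch of how a sink task could use the
proposed two-argument open / close variants. The signatures below are just the
shape we've been discussing rather than a finalized API, and the writers map is
a made-up stand-in for whatever per-partition resource a real connector manages:

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public abstract class ExampleSinkTask extends SinkTask {

    // Made-up per-partition resource (e.g. a buffer or writer for the sink system).
    private final Map<TopicPartition, StringBuilder> writers = new HashMap<>();

    // Hypothetical new-style open: the runtime would pass both the consumer-assigned
    // (pre-transform) partitions and the partitions observed after SMTs were applied.
    public void open(Collection<TopicPartition> originalPartitions,
                     Collection<TopicPartition> transformedPartitions) {
        // Allocate resources keyed by the post-transform partitions, since those carry
        // the topic names the sink system actually sees.
        for (TopicPartition tp : transformedPartitions) {
            writers.computeIfAbsent(tp, ignored -> new StringBuilder());
        }
    }

    // Hypothetical new-style close, mirroring open.
    public void close(Collection<TopicPartition> originalPartitions,
                      Collection<TopicPartition> transformedPartitions) {
        for (TopicPartition tp : transformedPartitions) {
            writers.remove(tp);
        }
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        for (SinkRecord record : records) {
            // With topic-mutating SMTs applied, record.topic() is the post-transform topic.
            TopicPartition tp = new TopicPartition(record.topic(), record.kafkaPartition());
            StringBuilder writer = writers.get(tp);
            if (writer != null) {
                writer.append(String.valueOf(record.value())).append('\n');
            }
        }
    }
}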

Either way, I've gone ahead and updated the KIP to reflect all of
our previous discussion here since it had become quite outdated. I've also
updated the KIP title from "Sink Connectors: Support topic-mutating SMTs
for async connectors (preCommit users)" to "Allow sink connectors to be
used with topic-mutating SMTs" since the improvements to the open / close
mechanism don't pertain only to asynchronous sink connectors. The new KIP
URL is:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-793%3A+Allow+sink+connectors+to+be+used+with+topic-mutating+SMTs


Thanks,
Yash

On Tue, Feb 14, 2023 at 11:39 PM Chris Egerton 
wrote:

> Hi Yash,
>
> I was actually envisioning something like `void
> open(Collection<TopicPartition>
> originalPartitions, Collection<TopicPartition> transformedPartitions)`,
> since we already convert and transform each batch of records that we poll
> from the sink task's consumer en masse, meaning we could discover several
> new transformed partitions in between consecutive calls to SinkTask::put.
>
> It's also worth noting that we'll probably want to deprecate the existing
> open/close methods, at which point keeping one non-deprecated variant of
> each seems more appealing and less complex than keeping two.
>
> Honestly though, I think we're both on the same page enough that I wouldn't
> object to either approach. We've probably reached the saturation point for
> ROI here and as long as we provide developers a way to get the information
> they need from the runtime and take care to add Javadocs and update our
> docs page (possibly including the connector development quickstart), it
> should be fine.
>
> At this point, it might be worth updating the KIP based on recent
> discussion so that others can see the latest proposal, and we can both take
> a look and make sure everything looks good enough before opening a vote
> thread.
>
> Finally, I think you make a convincing case for a time-based eviction
> policy. I wasn't thinking about the fairly common SMT pattern of deriving a
> topic name from, e.g., a record field or header.
>
> Cheers,
>
> Chris
>
> On Tue, Feb 14, 2023 at 11:42 AM Yash Mayya  wrote:
>
> > Hi Chris,
> >
> > > Plus, if a connector is intentionally designed to
> > > use pre-transformation topic partitions in its
> > > open/close methods, wouldn't we just be trading
> > > one form of the problem for another by making this
> > > switch?
> >
> > Thanks, this makes sense, and given that the KIP already proposes a way
> for
> > sink connector implementations to distinguish between pre-transform and
> > post-transform topics per record, I think I'm convinced that going with
> new
> > `open()` / `close()` methods is the right approach. However, I still feel
> > like having overloaded methods will make it a lot less intuitive given
> > that the two sets of methods would be different in terms of when they're
> > called and what arguments they are passed (also I'm presuming that the
> > overloaded methods you're prescribing will only have a single
> > `TopicPartition` rather than a `Collection<TopicPartition>` as their
> > parameters). I guess my concern is largely around the fact that it won't
> be
> > possible to distinguish between the overloaded methods' use cases just
> from
> > the method signatures. I agree that naming is going to be difficult here,
> > but I think that having two sets of `SinkTask::openXyz` /
> > `SinkTask::closeXyz` methods will be less complicated to understand from
> a
> > connector developer perspective (as compared to overloaded methods with
> > only differing documentation). Of your suggested options, I think
> > 

Re: [DISCUSS] KIP-641 A new java interface to replace 'kafka.common.MessageReader'

2023-02-18 Thread Chia-Ping Tsai



On 2023/02/18 08:44:05 Tom Bentley wrote:
> Hi Chia-Ping,
> 
> To be honest the stateful version, setting an input stream once using the
> `readFrom(InputStream)` method and then repeatedly asking for the next
> record using a parameterless `readRecord()`, seems a bit more natural to me
> than `readRecord(InputStream inputStream)` being called repeatedly with (I
> assume) the same input stream. I think the contract is simpler to describe
> and understand.

I also prefer readRecord(). It is a trade-off between having the `Configurable` 
interface and having a parameterless readRecord(). If `Configurable` is not 
required, I'd like to revert to the parameterless readRecord(). WDYT?

> 
> It's worth thinking about how implementers might have to read bytes from
> the stream to discover the end of one record and the start of the next.
> Unless we've guaranteed that the input stream supports mark and reset then
> they have to buffer the initial bytes of the next record that they've just
> read from the stream so that they can use them when called next time. So I
> think RecordReaders are (in general) inherently stateful and therefore it
> seems harmless for them to also have the input stream itself as some of
> that state.

You are right. Since the input stream may be keyboard input, it would be hard to 
know the number of bytes in one record up front.
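
To make the comparison concrete, here is a rough sketch of the stateful shape
(readFrom once, then a parameterless readRecord). The interface and method names
follow this thread, but nothing below is a finalized API, and the package is left
out given the separate discussion about where it should live:

import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.Configurable;

// Rough sketch only: a stateful reader bound to its input stream once and then
// asked for records one at a time.
public interface RecordReader extends Configurable, Closeable {

    // From Configurable: receive the console producer's reader properties.
    @Override
    default void configure(Map<String, ?> configs) {}

    // Bind the reader to its input once; implementations may keep internal state
    // between calls (e.g. bytes read past the previous record boundary).
    void readFrom(InputStream inputStream);

    // Return the next record, or null once the stream is exhausted.
    ProducerRecord<byte[], byte[]> readRecord() throws IOException;

    @Override
    default void close() {}
}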



Re: [VOTE] KIP-641 A new java interface to replace 'kafka.common.MessageReader'

2023-02-18 Thread Chia-Ping Tsai
Dear all,

There has been a change since I started this vote.

The package has been moved from "org.apache.kafka.common" to 
"org.apache.kafka.clients.tool". The purpose of the change is to enable 
tool-related interfaces to access code from "producer", "consumer", and "admin".

Please verify the change before voting. Thanks!
--
chia-ping


Re: [DISCUSS] KIP-641 A new java interface to replace 'kafka.common.MessageReader'

2023-02-18 Thread Chia-Ping Tsai
Dear all,

I noticed an issue after starting the vote :(

The package "org.apache.kafka.common" is not allowed to import 
"org.apache.kafka.clients.producer.ProducerRecord". Hence, 
"org.apache.kafka.common.RecordReader" can't reference 
"org.apache.kafka.clients.producer.ProducerRecord" directly. It seems to me that a 
tool-related interface should be able to access all public classes from 
"common", "producer", "consumer", and "admin". Accordingly, I'd like to move 
"RecordReader" from the "org.apache.kafka.common" package to 
"org.apache.kafka.clients.tool".

Please take a look at the above change.

thanks,
chia-ping


[jira] [Created] (KAFKA-14732) Use an exponential backoff retry mechanism while reconfiguring connector tasks

2023-02-18 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-14732:
--

 Summary: Use an exponential backoff retry mechanism while 
reconfiguring connector tasks
 Key: KAFKA-14732
 URL: https://issues.apache.org/jira/browse/KAFKA-14732
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Yash Mayya
Assignee: Yash Mayya


Kafka Connect in distributed mode retries infinitely with a fixed retry backoff 
(250 ms) when errors arise during connector task reconfiguration. Tasks can be 
"reconfigured" during connector startup (to get the initial task configs from 
the connector), on connector resume, or when a connector explicitly requests it 
via its context. Task reconfiguration essentially entails asking the connector 
instance for its task configs and writing them to the Connect cluster's config 
storage (if a change in task configs is detected). A fixed retry backoff of 
250 ms leads to very aggressive retries - consider a Debezium connector which 
attempts to initiate a database connection in its 
[taskConfigs 
method|https://github.com/debezium/debezium/blob/bf347da71ad9b0819998a3bc9754b3cc96cc1563/debezium-connector-sqlserver/src/main/java/io/debezium/connector/sqlserver/SqlServerConnector.java#L63].
 If the connection fails due to something like an invalid login, the Connect 
worker will essentially spam connection attempts every 250 ms, indefinitely 
(until the connector config / database-side configs are fixed). An exponential 
backoff retry mechanism seems better suited for the 
[DistributedHerder::reconfigureConnectorTasksWithRetry|https://github.com/apache/kafka/blob/a54a34a11c1c867ff62a7234334cad5139547fd7/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1873-L1898]
 method.
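
A minimal sketch of the kind of backoff being proposed (the class name and 
parameter values below are illustrative only, not the actual DistributedHerder 
change):

import java.util.concurrent.ThreadLocalRandom;

public final class ReconfigureRetryBackoff {

    private final long initialMs;
    private final long maxMs;
    private final double multiplier;

    public ReconfigureRetryBackoff(long initialMs, long maxMs, double multiplier) {
        this.initialMs = initialMs;
        this.maxMs = maxMs;
        this.multiplier = multiplier;
    }

    // Delay for the given attempt (0-based), capped at maxMs, with a little jitter
    // so that workers retrying in lockstep spread out their requests.
    public long backoffMs(int attempt) {
        long capped = (long) Math.min(initialMs * Math.pow(multiplier, attempt), maxMs);
        long jitter = ThreadLocalRandom.current().nextLong(capped / 5 + 1);
        return capped + jitter;
    }

    public static void main(String[] args) {
        // e.g. start at the current 250 ms, double on each failure, cap at one minute.
        ReconfigureRetryBackoff backoff = new ReconfigureRetryBackoff(250, 60_000, 2.0);
        for (int attempt = 0; attempt < 10; attempt++) {
            System.out.printf("attempt %d -> wait ~%d ms%n", attempt, backoff.backoffMs(attempt));
        }
    }
}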





Re: [VOTE] KIP-641 A new java interface to replace 'kafka.common.MessageReader'

2023-02-18 Thread Federico Valeri
+1 (non-binding)

Thanks

On Sat, Feb 18, 2023 at 9:36 AM Chia-Ping Tsai  wrote:
>
> Hi,
>
> I'd like to start the vote on KIP-641: A new java interface to replace 
> 'kafka.common.MessageReader'
>
> KIP-641: 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866569
>
> thread: 
> https://lists.apache.org/thread.html/r6db6708f64345bb8fe0d573e05014fb790e69d501f21f855ca65619a%40%3Cdev.kafka.apache.org%3E
>
> Cheers,
> Chia-Ping


Re: [DISCUSS] KIP-641 A new java interface to replace 'kafka.common.MessageReader'

2023-02-18 Thread Tom Bentley
Hi Chia-Ping,

To be honest the stateful version, setting an input stream once using the
`readFrom(InputStream)` method and then repeatedly asking for the next
record using a parameterless `readRecord()`, seems a bit more natural to me
than `readRecord(InputStream inputStream)` being called repeatedly with (I
assume) the same input stream. I think the contract is simpler to describe
and understand.

It's worth thinking about how implementers might have to read bytes from
the stream to discover the end of one record and the start of the next.
Unless we've guaranteed that the input stream supports mark and reset then
they have to buffer the initial bytes of the next record that they've just
read from the stream so that they can use them when called next time. So I
think RecordReaders are (in general) inherently stateful and therefore it
seems harmless for them to also have the input stream itself as some of
that state.
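
As a small standalone illustration of that buffering point (not taken from the
KIP), a reader that peeks ahead to find a record boundary has to carry the peeked
data as state between calls; a PushbackInputStream just makes the peeked byte
explicit:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class RecordBoundaryDemo {

    // Reads one newline-delimited record; returns null at end of stream. The
    // one-byte peek that distinguishes "another record follows" from "stream is
    // done" has to be pushed back (i.e. remembered) for the next call.
    static String readOneRecord(PushbackInputStream in) throws IOException {
        int peeked = in.read();
        if (peeked == -1) {
            return null;
        }
        in.unread(peeked);
        StringBuilder record = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            record.append((char) b);
        }
        return record.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "record-1\nrecord-2\n".getBytes(StandardCharsets.UTF_8);
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data), 1);
        String record;
        while ((record = readOneRecord(in)) != null) {
            System.out.println("read record: " + record);
        }
    }
}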

Cheers,

Tom

On Sat, 18 Feb 2023 at 08:25, Chia-Ping Tsai  wrote:

>
>
> On 2023/02/17 06:47:18 Luke Chen wrote:
> > Hi Chia-Ping,
> >
> > Thanks for the KIP!
> >
> > Overall LGTM, just one minor comment:
> > Could we log warning messages to users when using deprecated
> MessageReader?
>
> Sure. I will address it when implementing the KIP.
>
>


[VOTE] KIP-641 A new java interface to replace 'kafka.common.MessageReader'

2023-02-18 Thread Chia-Ping Tsai
Hi,

I'd like to start the vote on KIP-641: A new java interface to replace 
'kafka.common.MessageReader'

KIP-641: 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866569

thread: 
https://lists.apache.org/thread.html/r6db6708f64345bb8fe0d573e05014fb790e69d501f21f855ca65619a%40%3Cdev.kafka.apache.org%3E

Cheers,
Chia-Ping

Re: [DISCUSS] KIP-641 A new java interface to replace 'kafka.common.MessageReader'

2023-02-18 Thread Chia-Ping Tsai



On 2023/02/17 06:47:18 Luke Chen wrote:
> Hi Chia-Ping,
> 
> Thanks for the KIP!
> 
> Overall LGTM, just one minor comment:
> Could we log warning messages to users when using deprecated MessageReader?

Sure. I will address it when implementing the KIP.
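
For reference, here is a rough sketch of the kind of warning I have in mind
(using SLF4J, as elsewhere in Kafka; the helper class and message text are
illustrative only):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class DeprecatedReaderCheck {

    private static final Logger log = LoggerFactory.getLogger(DeprecatedReaderCheck.class);

    // Hypothetical helper: warn if the configured reader still implements the
    // deprecated kafka.common.MessageReader (checked by name so this sketch does
    // not need the core module on the classpath).
    public static void warnIfDeprecated(Object reader) {
        for (Class<?> iface : reader.getClass().getInterfaces()) {
            if ("kafka.common.MessageReader".equals(iface.getName())) {
                log.warn("{} implements the deprecated kafka.common.MessageReader; "
                        + "please migrate to the new RecordReader interface.",
                        reader.getClass().getName());
            }
        }
    }
}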