Re: [DISCUSS] KIP-933 Publish metrics when source connector fails to poll data
Hi Ravindra, One minor thing, the discussion thread URL that you had provided points to an incorrect page. Can you plz update it to this ( https://www.mail-archive.com/dev@kafka.apache.org/msg131894.html)? Thanks! Sagar. On Sun, Jul 2, 2023 at 12:06 AM Ravindra Nath Kakarla < ravindhran...@gmail.com> wrote: > Thanks for reviewing and providing the feedback. > > > 1) Does it make sense to drop the *record *part from the metric name as > it > doesn't seem to serve much purpose? I would rather call the metric as > *source-poll-errors-total > > Yes, "records" is not needed and misleading. > > > Staying on names, I am thinking, does it make more sense to have > *failures* in the name instead of *errors *i.e.* > source-poll-failures-total* and > *source-poll-failures-rate*? What do you think? > > Agree, "failures" is a more appropriate term here. > > > Regarding the inclusion of retriable exceptions, as of today, source > tasks don't retry even in cases of RetriableException. A PR was created to > modify this behaviour (https://github.com/apache/kafka/pull/13726) but the > reason I bring it up is that in that PR, the failures etc for retry context > would be computed from the RetryWithToleranceOperator. I am not sure when > would that get merged, but does it change the failure counting logic in any > ways? > > In my opinion, we should ignore retryable exceptions when SourceTasks > switches to using RetryWithToleranceOperator. I can update the KIP to call > this out. If the PR for this KIP is implemented first, we can include both > retriable and non-retriable exceptions. I can also add a comment on > https://github.com/apache/kafka/pull/13726 to remove them. What do you > think? > > Thank you > > > On Wed, Jun 28, 2023 at 1:09 PM Sagar wrote: > > > Hey Ravindra, > > > > Thanks for the KIP! It appears to be a useful addition to the metrics to > > understand poll related failures which can go untracked as of now. I just > > have a couple of minor comments: > > > > 1) Does it make sense to drop the *record *part from the metric name as > it > > doesn't seem to serve much purpose? I would rather call the metric as > > *source-poll-errors-total > > *and *source-poll-errors-rate*. > > 2) Staying on names, I am thinking, does it make more sense to have > > *failures* in the name instead of *errors *i.e.* > > source-poll-failures-total* and > > *source-poll-failures-rate*? What do you think? > > 3) Regarding the inclusion of retriable exceptions, as of today, source > > tasks don't retry even in cases of RetriableException. A PR was created > to > > modify this behaviour (https://github.com/apache/kafka/pull/13726) but > the > > reason I bring it up is that in that PR, the failures etc for retry > context > > would be computed from the RetryWithToleranceOperator. I am not sure when > > would that get merged, but does it change the failure counting logic in > any > > ways? > > > > Thanks! > > Sagar. > > > > > > On Sun, Jun 25, 2023 at 12:40 AM Ravindra Nath Kakarla < > > ravindhran...@gmail.com> wrote: > > > > > Hello, > > > > > > I would like to start a discussion on KIP-933 to add new metrics to > Kafka > > > Connect that helps monitoring polling failures with source connectors. > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-933%3A+Publish+metrics+when+source+connector+fails+to+poll+data > > > > > > Looking forward to feedback on this. > > > > > > Thank you, > > > Ravindranath > > > > > >
Re: [DISCUSS] KIP-933 Publish metrics when source connector fails to poll data
Hi Ravindra, When you say we should ignore retryable exceptions when SourceTasks switches to using > RetryWithToleranceOperator. you mean the metrics computation should be avoided? Now that I think about it, it might be better to keep the PR and the KIP separated from each other. For now, because there are no retries via RetryToleranceOperator, if poll() fails, we can just count it as a poll failure for both retriable and non-retriable(as you pointed out). Let me know what you think. Thanks! Sagar. On Sun, Jul 2, 2023 at 12:06 AM Ravindra Nath Kakarla < ravindhran...@gmail.com> wrote: > Thanks for reviewing and providing the feedback. > > > 1) Does it make sense to drop the *record *part from the metric name as > it > doesn't seem to serve much purpose? I would rather call the metric as > *source-poll-errors-total > > Yes, "records" is not needed and misleading. > > > Staying on names, I am thinking, does it make more sense to have > *failures* in the name instead of *errors *i.e.* > source-poll-failures-total* and > *source-poll-failures-rate*? What do you think? > > Agree, "failures" is a more appropriate term here. > > > Regarding the inclusion of retriable exceptions, as of today, source > tasks don't retry even in cases of RetriableException. A PR was created to > modify this behaviour (https://github.com/apache/kafka/pull/13726) but the > reason I bring it up is that in that PR, the failures etc for retry context > would be computed from the RetryWithToleranceOperator. I am not sure when > would that get merged, but does it change the failure counting logic in any > ways? > > In my opinion, we should ignore retryable exceptions when SourceTasks > switches to using RetryWithToleranceOperator. I can update the KIP to call > this out. If the PR for this KIP is implemented first, we can include both > retriable and non-retriable exceptions. I can also add a comment on > https://github.com/apache/kafka/pull/13726 to remove them. What do you > think? > > Thank you > > > On Wed, Jun 28, 2023 at 1:09 PM Sagar wrote: > > > Hey Ravindra, > > > > Thanks for the KIP! It appears to be a useful addition to the metrics to > > understand poll related failures which can go untracked as of now. I just > > have a couple of minor comments: > > > > 1) Does it make sense to drop the *record *part from the metric name as > it > > doesn't seem to serve much purpose? I would rather call the metric as > > *source-poll-errors-total > > *and *source-poll-errors-rate*. > > 2) Staying on names, I am thinking, does it make more sense to have > > *failures* in the name instead of *errors *i.e.* > > source-poll-failures-total* and > > *source-poll-failures-rate*? What do you think? > > 3) Regarding the inclusion of retriable exceptions, as of today, source > > tasks don't retry even in cases of RetriableException. A PR was created > to > > modify this behaviour (https://github.com/apache/kafka/pull/13726) but > the > > reason I bring it up is that in that PR, the failures etc for retry > context > > would be computed from the RetryWithToleranceOperator. I am not sure when > > would that get merged, but does it change the failure counting logic in > any > > ways? > > > > Thanks! > > Sagar. > > > > > > On Sun, Jun 25, 2023 at 12:40 AM Ravindra Nath Kakarla < > > ravindhran...@gmail.com> wrote: > > > > > Hello, > > > > > > I would like to start a discussion on KIP-933 to add new metrics to > Kafka > > > Connect that helps monitoring polling failures with source connectors. > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-933%3A+Publish+metrics+when+source+connector+fails+to+poll+data > > > > > > Looking forward to feedback on this. > > > > > > Thank you, > > > Ravindranath > > > > > >
Re: [DISCUSS] KIP-933 Publish metrics when source connector fails to poll data
Thanks for reviewing and providing the feedback. > 1) Does it make sense to drop the *record *part from the metric name as it doesn't seem to serve much purpose? I would rather call the metric as *source-poll-errors-total Yes, "records" is not needed and misleading. > Staying on names, I am thinking, does it make more sense to have *failures* in the name instead of *errors *i.e.* source-poll-failures-total* and *source-poll-failures-rate*? What do you think? Agree, "failures" is a more appropriate term here. > Regarding the inclusion of retriable exceptions, as of today, source tasks don't retry even in cases of RetriableException. A PR was created to modify this behaviour (https://github.com/apache/kafka/pull/13726) but the reason I bring it up is that in that PR, the failures etc for retry context would be computed from the RetryWithToleranceOperator. I am not sure when would that get merged, but does it change the failure counting logic in any ways? In my opinion, we should ignore retryable exceptions when SourceTasks switches to using RetryWithToleranceOperator. I can update the KIP to call this out. If the PR for this KIP is implemented first, we can include both retriable and non-retriable exceptions. I can also add a comment on https://github.com/apache/kafka/pull/13726 to remove them. What do you think? Thank you On Wed, Jun 28, 2023 at 1:09 PM Sagar wrote: > Hey Ravindra, > > Thanks for the KIP! It appears to be a useful addition to the metrics to > understand poll related failures which can go untracked as of now. I just > have a couple of minor comments: > > 1) Does it make sense to drop the *record *part from the metric name as it > doesn't seem to serve much purpose? I would rather call the metric as > *source-poll-errors-total > *and *source-poll-errors-rate*. > 2) Staying on names, I am thinking, does it make more sense to have > *failures* in the name instead of *errors *i.e.* > source-poll-failures-total* and > *source-poll-failures-rate*? What do you think? > 3) Regarding the inclusion of retriable exceptions, as of today, source > tasks don't retry even in cases of RetriableException. A PR was created to > modify this behaviour (https://github.com/apache/kafka/pull/13726) but the > reason I bring it up is that in that PR, the failures etc for retry context > would be computed from the RetryWithToleranceOperator. I am not sure when > would that get merged, but does it change the failure counting logic in any > ways? > > Thanks! > Sagar. > > > On Sun, Jun 25, 2023 at 12:40 AM Ravindra Nath Kakarla < > ravindhran...@gmail.com> wrote: > > > Hello, > > > > I would like to start a discussion on KIP-933 to add new metrics to Kafka > > Connect that helps monitoring polling failures with source connectors. > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-933%3A+Publish+metrics+when+source+connector+fails+to+poll+data > > > > Looking forward to feedback on this. > > > > Thank you, > > Ravindranath > > >
Re: [DISCUSS] KIP-933 Publish metrics when source connector fails to poll data
Hey Ravindra, Thanks for the KIP! It appears to be a useful addition to the metrics to understand poll related failures which can go untracked as of now. I just have a couple of minor comments: 1) Does it make sense to drop the *record *part from the metric name as it doesn't seem to serve much purpose? I would rather call the metric as *source-poll-errors-total *and *source-poll-errors-rate*. 2) Staying on names, I am thinking, does it make more sense to have *failures* in the name instead of *errors *i.e.* source-poll-failures-total* and *source-poll-failures-rate*? What do you think? 3) Regarding the inclusion of retriable exceptions, as of today, source tasks don't retry even in cases of RetriableException. A PR was created to modify this behaviour (https://github.com/apache/kafka/pull/13726) but the reason I bring it up is that in that PR, the failures etc for retry context would be computed from the RetryWithToleranceOperator. I am not sure when would that get merged, but does it change the failure counting logic in any ways? Thanks! Sagar. On Sun, Jun 25, 2023 at 12:40 AM Ravindra Nath Kakarla < ravindhran...@gmail.com> wrote: > Hello, > > I would like to start a discussion on KIP-933 to add new metrics to Kafka > Connect that helps monitoring polling failures with source connectors. > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-933%3A+Publish+metrics+when+source+connector+fails+to+poll+data > > Looking forward to feedback on this. > > Thank you, > Ravindranath >
[DISCUSS] KIP-933 Publish metrics when source connector fails to poll data
Hello, I would like to start a discussion on KIP-933 to add new metrics to Kafka Connect that helps monitoring polling failures with source connectors. https://cwiki.apache.org/confluence/display/KAFKA/KIP-933%3A+Publish+metrics+when+source+connector+fails+to+poll+data Looking forward to feedback on this. Thank you, Ravindranath