Re: [DISCUSS] KIP-1007: Introduce Remote Storage Not Ready Exception

2024-03-31 Thread Kamal Chandraprakash
Hi all,

While testing the patch [1], realised that introducing a
new REMOTE_STORAGE_NOT_READY error-code
is not compatible with the consumer. Consumer does not retry the FETCH
request for all the retriable
exceptions [2] instead it retries only for specific error codes. Dropping
the KIP-1007

as
it is not compatible
with the older clients. Thanks!

[1]: https://github.com/apache/kafka/pull/14822
[2]:
https://sourcegraph.com/github.com/apache/kafka@trunk/-/blob/clients/src/main/java/org/apache/kafka/clients/consumer/internals/FetchCollector.java?L325

--
Kamal

On Fri, Jan 5, 2024 at 5:03 PM Divij Vaidya  wrote:

> Thank you for addressing my concerns Kamal. Though, instead of the KIP, I
> actually was suggesting to add it in JavaDoc so that someone looking at the
> exception is able to understand what it means. We can discuss that during
> the PR review though.
>
> The KIP looks good to me.
>
> --
> Divij Vaidya
>
>
>
> On Fri, Jan 5, 2024 at 10:44 AM Satish Duggana 
> wrote:
>
> > Thanks for the KIP Kamal, LGTM.
> >
> > On Tue, 26 Dec 2023 at 10:23, Kamal Chandraprakash
> >  wrote:
> > >
> > > Hi Divij,
> > >
> > > Thanks for reviewing the KIP! I've updated the KIP with the below
> > > documentation. Let me know if it needs to be changed:
> > >
> > > The consumer can read the local data as long as it knows the offset
> from
> > > where to fetch the data from.
> > > When there is no initial offset, the consumer decides the offset based
> on
> > > the below config:
> > >
> > > ```
> > > auto.offset.reset = earliest / latest / none
> > > ```
> > >
> > >- For `earliest` offset policy and any offset that lies in the
> remote
> > >storage, the consumer (FETCH request)
> > >cannot be able to make progress until the remote log metadata gets
> > >synced.
> > >- In a FETCH request, when there are multiple partitions where a
> > subset
> > >of them are consuming from local
> > >and others from remote, then only the partitions which are consuming
> > >from the remote cannot make
> > >progress and the partitions that fetch data from local storage
> should
> > be
> > >able to make progress.
> > >- In a FETCH request, when the fetch-offset for a partition is
> within
> > >the local-storage, then it should be able
> > >to consume the messages.
> > >- All the calls to LIST_OFFETS will fail until the remote log
> metadata
> > >gets synced.
> > >
> > >
> > > The code link that is mentioned is referring to the `LIST_OFFSETS`
> > handler.
> > > Usually, consumers don't use this API
> > > unless it's explicitly called by the user.
> > >
> > >
> > > On Fri, Dec 22, 2023 at 4:10 PM Divij Vaidya 
> > > wrote:
> > >
> > > > Thanks for the KIP, Kamal.
> > > >
> > > > The change looks good to me, though, I think we can do a better job
> at
> > > > documenting what the error means for the clients and users.
> > > >
> > > > Correct me if I'm wrong, when remote metadata is being synced on a
> new
> > > > leader, we cannot fetch even the local data (as per [1]), hence,
> > partition
> > > > is considered "unreadable" but writes (and all other operations such
> as
> > > > admin operations) can continue to work on that partition. If my
> > > > understanding is correct, perhaps, please clarify this in the error
> > > > description. In absence of it, it is difficult to determine what this
> > error
> > > > means for operations that can be performed on a partition.
> > > >
> > > > [1]
> > > >
> > > >
> >
> https://github.com/apache/kafka/blob/82808873cbf6a95611243c2e7984c4aa6ff2cfff/core/src/main/scala/kafka/log/UnifiedLog.scala#L1336
> > > >
> > > >
> > > > --
> > > > Divij Vaidya
> > > >
> > > >
> > > >
> > > > On Tue, Dec 12, 2023 at 9:58 AM Kamal Chandraprakash <
> > > > kamal.chandraprak...@gmail.com> wrote:
> > > >
> > > > > Thanks Luke for reviewing this KIP!
> > > > >
> > > > > If there are no more comments from others, I'll start the VOTE
> since
> > this
> > > > > is a minor KIP.
> > > > >
> > > > > On Mon, Dec 11, 2023 at 1:01 PM Luke Chen 
> wrote:
> > > > >
> > > > > > Hi Kamal,
> > > > > >
> > > > > > Thanks for the KIP!
> > > > > > LGTM.
> > > > > >
> > > > > > Thanks.
> > > > > > Luke
> > > > > >
> > > > > > On Wed, Nov 22, 2023 at 7:28 PM Kamal Chandraprakash <
> > > > > > kamal.chandraprak...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I would like to start a discussion to introduce a new error
> code
> > for
> > > > > > > retriable remote storage errors. Please take a look at the
> > proposal:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1007%3A+Introduce+Remote+Storage+Not+Ready+Exception
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>


Re: [DISCUSS] KIP-1007: Introduce Remote Storage Not Ready Exception

2024-01-05 Thread Divij Vaidya
Thank you for addressing my concerns Kamal. Though, instead of the KIP, I
actually was suggesting to add it in JavaDoc so that someone looking at the
exception is able to understand what it means. We can discuss that during
the PR review though.

The KIP looks good to me.

--
Divij Vaidya



On Fri, Jan 5, 2024 at 10:44 AM Satish Duggana 
wrote:

> Thanks for the KIP Kamal, LGTM.
>
> On Tue, 26 Dec 2023 at 10:23, Kamal Chandraprakash
>  wrote:
> >
> > Hi Divij,
> >
> > Thanks for reviewing the KIP! I've updated the KIP with the below
> > documentation. Let me know if it needs to be changed:
> >
> > The consumer can read the local data as long as it knows the offset from
> > where to fetch the data from.
> > When there is no initial offset, the consumer decides the offset based on
> > the below config:
> >
> > ```
> > auto.offset.reset = earliest / latest / none
> > ```
> >
> >- For `earliest` offset policy and any offset that lies in the remote
> >storage, the consumer (FETCH request)
> >cannot be able to make progress until the remote log metadata gets
> >synced.
> >- In a FETCH request, when there are multiple partitions where a
> subset
> >of them are consuming from local
> >and others from remote, then only the partitions which are consuming
> >from the remote cannot make
> >progress and the partitions that fetch data from local storage should
> be
> >able to make progress.
> >- In a FETCH request, when the fetch-offset for a partition is within
> >the local-storage, then it should be able
> >to consume the messages.
> >- All the calls to LIST_OFFETS will fail until the remote log metadata
> >gets synced.
> >
> >
> > The code link that is mentioned is referring to the `LIST_OFFSETS`
> handler.
> > Usually, consumers don't use this API
> > unless it's explicitly called by the user.
> >
> >
> > On Fri, Dec 22, 2023 at 4:10 PM Divij Vaidya 
> > wrote:
> >
> > > Thanks for the KIP, Kamal.
> > >
> > > The change looks good to me, though, I think we can do a better job at
> > > documenting what the error means for the clients and users.
> > >
> > > Correct me if I'm wrong, when remote metadata is being synced on a new
> > > leader, we cannot fetch even the local data (as per [1]), hence,
> partition
> > > is considered "unreadable" but writes (and all other operations such as
> > > admin operations) can continue to work on that partition. If my
> > > understanding is correct, perhaps, please clarify this in the error
> > > description. In absence of it, it is difficult to determine what this
> error
> > > means for operations that can be performed on a partition.
> > >
> > > [1]
> > >
> > >
> https://github.com/apache/kafka/blob/82808873cbf6a95611243c2e7984c4aa6ff2cfff/core/src/main/scala/kafka/log/UnifiedLog.scala#L1336
> > >
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Tue, Dec 12, 2023 at 9:58 AM Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > > > Thanks Luke for reviewing this KIP!
> > > >
> > > > If there are no more comments from others, I'll start the VOTE since
> this
> > > > is a minor KIP.
> > > >
> > > > On Mon, Dec 11, 2023 at 1:01 PM Luke Chen  wrote:
> > > >
> > > > > Hi Kamal,
> > > > >
> > > > > Thanks for the KIP!
> > > > > LGTM.
> > > > >
> > > > > Thanks.
> > > > > Luke
> > > > >
> > > > > On Wed, Nov 22, 2023 at 7:28 PM Kamal Chandraprakash <
> > > > > kamal.chandraprak...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I would like to start a discussion to introduce a new error code
> for
> > > > > > retriable remote storage errors. Please take a look at the
> proposal:
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1007%3A+Introduce+Remote+Storage+Not+Ready+Exception
> > > > > >
> > > > >
> > > >
> > >
>


Re: [DISCUSS] KIP-1007: Introduce Remote Storage Not Ready Exception

2024-01-05 Thread Satish Duggana
Thanks for the KIP Kamal, LGTM.

On Tue, 26 Dec 2023 at 10:23, Kamal Chandraprakash
 wrote:
>
> Hi Divij,
>
> Thanks for reviewing the KIP! I've updated the KIP with the below
> documentation. Let me know if it needs to be changed:
>
> The consumer can read the local data as long as it knows the offset from
> where to fetch the data from.
> When there is no initial offset, the consumer decides the offset based on
> the below config:
>
> ```
> auto.offset.reset = earliest / latest / none
> ```
>
>- For `earliest` offset policy and any offset that lies in the remote
>storage, the consumer (FETCH request)
>cannot be able to make progress until the remote log metadata gets
>synced.
>- In a FETCH request, when there are multiple partitions where a subset
>of them are consuming from local
>and others from remote, then only the partitions which are consuming
>from the remote cannot make
>progress and the partitions that fetch data from local storage should be
>able to make progress.
>- In a FETCH request, when the fetch-offset for a partition is within
>the local-storage, then it should be able
>to consume the messages.
>- All the calls to LIST_OFFETS will fail until the remote log metadata
>gets synced.
>
>
> The code link that is mentioned is referring to the `LIST_OFFSETS` handler.
> Usually, consumers don't use this API
> unless it's explicitly called by the user.
>
>
> On Fri, Dec 22, 2023 at 4:10 PM Divij Vaidya 
> wrote:
>
> > Thanks for the KIP, Kamal.
> >
> > The change looks good to me, though, I think we can do a better job at
> > documenting what the error means for the clients and users.
> >
> > Correct me if I'm wrong, when remote metadata is being synced on a new
> > leader, we cannot fetch even the local data (as per [1]), hence, partition
> > is considered "unreadable" but writes (and all other operations such as
> > admin operations) can continue to work on that partition. If my
> > understanding is correct, perhaps, please clarify this in the error
> > description. In absence of it, it is difficult to determine what this error
> > means for operations that can be performed on a partition.
> >
> > [1]
> >
> > https://github.com/apache/kafka/blob/82808873cbf6a95611243c2e7984c4aa6ff2cfff/core/src/main/scala/kafka/log/UnifiedLog.scala#L1336
> >
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Tue, Dec 12, 2023 at 9:58 AM Kamal Chandraprakash <
> > kamal.chandraprak...@gmail.com> wrote:
> >
> > > Thanks Luke for reviewing this KIP!
> > >
> > > If there are no more comments from others, I'll start the VOTE since this
> > > is a minor KIP.
> > >
> > > On Mon, Dec 11, 2023 at 1:01 PM Luke Chen  wrote:
> > >
> > > > Hi Kamal,
> > > >
> > > > Thanks for the KIP!
> > > > LGTM.
> > > >
> > > > Thanks.
> > > > Luke
> > > >
> > > > On Wed, Nov 22, 2023 at 7:28 PM Kamal Chandraprakash <
> > > > kamal.chandraprak...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I would like to start a discussion to introduce a new error code for
> > > > > retriable remote storage errors. Please take a look at the proposal:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1007%3A+Introduce+Remote+Storage+Not+Ready+Exception
> > > > >
> > > >
> > >
> >


Re: [DISCUSS] KIP-1007: Introduce Remote Storage Not Ready Exception

2023-12-25 Thread Kamal Chandraprakash
Hi Divij,

Thanks for reviewing the KIP! I've updated the KIP with the below
documentation. Let me know if it needs to be changed:

The consumer can read the local data as long as it knows the offset from
where to fetch the data from.
When there is no initial offset, the consumer decides the offset based on
the below config:

```
auto.offset.reset = earliest / latest / none
```

   - For `earliest` offset policy and any offset that lies in the remote
   storage, the consumer (FETCH request)
   cannot be able to make progress until the remote log metadata gets
   synced.
   - In a FETCH request, when there are multiple partitions where a subset
   of them are consuming from local
   and others from remote, then only the partitions which are consuming
   from the remote cannot make
   progress and the partitions that fetch data from local storage should be
   able to make progress.
   - In a FETCH request, when the fetch-offset for a partition is within
   the local-storage, then it should be able
   to consume the messages.
   - All the calls to LIST_OFFETS will fail until the remote log metadata
   gets synced.


The code link that is mentioned is referring to the `LIST_OFFSETS` handler.
Usually, consumers don't use this API
unless it's explicitly called by the user.


On Fri, Dec 22, 2023 at 4:10 PM Divij Vaidya 
wrote:

> Thanks for the KIP, Kamal.
>
> The change looks good to me, though, I think we can do a better job at
> documenting what the error means for the clients and users.
>
> Correct me if I'm wrong, when remote metadata is being synced on a new
> leader, we cannot fetch even the local data (as per [1]), hence, partition
> is considered "unreadable" but writes (and all other operations such as
> admin operations) can continue to work on that partition. If my
> understanding is correct, perhaps, please clarify this in the error
> description. In absence of it, it is difficult to determine what this error
> means for operations that can be performed on a partition.
>
> [1]
>
> https://github.com/apache/kafka/blob/82808873cbf6a95611243c2e7984c4aa6ff2cfff/core/src/main/scala/kafka/log/UnifiedLog.scala#L1336
>
>
> --
> Divij Vaidya
>
>
>
> On Tue, Dec 12, 2023 at 9:58 AM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Thanks Luke for reviewing this KIP!
> >
> > If there are no more comments from others, I'll start the VOTE since this
> > is a minor KIP.
> >
> > On Mon, Dec 11, 2023 at 1:01 PM Luke Chen  wrote:
> >
> > > Hi Kamal,
> > >
> > > Thanks for the KIP!
> > > LGTM.
> > >
> > > Thanks.
> > > Luke
> > >
> > > On Wed, Nov 22, 2023 at 7:28 PM Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I would like to start a discussion to introduce a new error code for
> > > > retriable remote storage errors. Please take a look at the proposal:
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1007%3A+Introduce+Remote+Storage+Not+Ready+Exception
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-1007: Introduce Remote Storage Not Ready Exception

2023-12-22 Thread Divij Vaidya
Thanks for the KIP, Kamal.

The change looks good to me, though, I think we can do a better job at
documenting what the error means for the clients and users.

Correct me if I'm wrong, when remote metadata is being synced on a new
leader, we cannot fetch even the local data (as per [1]), hence, partition
is considered "unreadable" but writes (and all other operations such as
admin operations) can continue to work on that partition. If my
understanding is correct, perhaps, please clarify this in the error
description. In absence of it, it is difficult to determine what this error
means for operations that can be performed on a partition.

[1]
https://github.com/apache/kafka/blob/82808873cbf6a95611243c2e7984c4aa6ff2cfff/core/src/main/scala/kafka/log/UnifiedLog.scala#L1336


--
Divij Vaidya



On Tue, Dec 12, 2023 at 9:58 AM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Thanks Luke for reviewing this KIP!
>
> If there are no more comments from others, I'll start the VOTE since this
> is a minor KIP.
>
> On Mon, Dec 11, 2023 at 1:01 PM Luke Chen  wrote:
>
> > Hi Kamal,
> >
> > Thanks for the KIP!
> > LGTM.
> >
> > Thanks.
> > Luke
> >
> > On Wed, Nov 22, 2023 at 7:28 PM Kamal Chandraprakash <
> > kamal.chandraprak...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I would like to start a discussion to introduce a new error code for
> > > retriable remote storage errors. Please take a look at the proposal:
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1007%3A+Introduce+Remote+Storage+Not+Ready+Exception
> > >
> >
>


Re: [DISCUSS] KIP-1007: Introduce Remote Storage Not Ready Exception

2023-12-12 Thread Kamal Chandraprakash
Thanks Luke for reviewing this KIP!

If there are no more comments from others, I'll start the VOTE since this
is a minor KIP.

On Mon, Dec 11, 2023 at 1:01 PM Luke Chen  wrote:

> Hi Kamal,
>
> Thanks for the KIP!
> LGTM.
>
> Thanks.
> Luke
>
> On Wed, Nov 22, 2023 at 7:28 PM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Hi,
> >
> > I would like to start a discussion to introduce a new error code for
> > retriable remote storage errors. Please take a look at the proposal:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1007%3A+Introduce+Remote+Storage+Not+Ready+Exception
> >
>


Re: [DISCUSS] KIP-1007: Introduce Remote Storage Not Ready Exception

2023-12-10 Thread Luke Chen
Hi Kamal,

Thanks for the KIP!
LGTM.

Thanks.
Luke

On Wed, Nov 22, 2023 at 7:28 PM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi,
>
> I would like to start a discussion to introduce a new error code for
> retriable remote storage errors. Please take a look at the proposal:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1007%3A+Introduce+Remote+Storage+Not+Ready+Exception
>


[DISCUSS] KIP-1007: Introduce Remote Storage Not Ready Exception

2023-11-22 Thread Kamal Chandraprakash
Hi,

I would like to start a discussion to introduce a new error code for
retriable remote storage errors. Please take a look at the proposal:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-1007%3A+Introduce+Remote+Storage+Not+Ready+Exception