Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-11-03 Thread Péter Váry
I’m joining the UUIDv7 vs. generic unique identifier discussion a bit late,
so please correct me if I’ve missed any previously mentioned points.

From what I see, the main potential benefits for UUIDv7 keys are:

   - Debugging: For example, making it easier to trace when a key was
   generated.
   - Server-side optimizations: Such as enabling the server to discard
   stale keys or manage storage for expired keys more efficiently.

Of these, the first (debugging) seems more valuable, but it can also be
addressed through proper logging (including request source, user,
timestamp, and key). If users want to provide additional information to the
server, they can always encode it in the key or use UUIDv7, which is
optional but meets all requirements.
The second benefit is less convincing. Servers shouldn’t rely on any
client-provided data for critical decisions, especially considering the
risk of malicious clients. Also, stale keys should be extremely rare, so
optimizing for them on the server side may not be worthwhile.
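
To make the debugging benefit concrete, here is a minimal Python sketch of reading the creation time back out of a UUIDv7 key. It is only an illustration, not part of the proposal: `uuid7_timestamp` and `make_uuid7` are hypothetical helper names, and `make_uuid7` exists only because older Python versions ship no UUIDv7 generator. The layout assumed is the RFC 9562 one, with a 48-bit Unix millisecond timestamp in the most significant bits.

```python
import os
import uuid
from datetime import datetime, timezone
from typing import Optional


def make_uuid7(millis: int) -> str:
    """Build a UUIDv7 string by hand (hypothetical helper for this example)."""
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # 12 bits after the version
    rand_b = rand & ((1 << 62) - 1)               # 62 bits after the variant
    value = (millis << 80) | (0x7 << 76) | (rand_a << 64) | (0b10 << 62) | rand_b
    return str(uuid.UUID(int=value))


def uuid7_timestamp(key: str) -> Optional[datetime]:
    """Return the embedded creation time of a UUIDv7 key, or None otherwise."""
    parsed = uuid.UUID(key)
    if parsed.version != 7:
        return None  # e.g. UUIDv4 carries no timestamp
    millis = parsed.int >> 80  # top 48 bits: Unix time in milliseconds
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
```

The value recovered this way is whatever the client's clock said, so it is useful for tracing but tells the server nothing it can trust.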

Given these points, I would favor option 2.




yun zou wrote (on Sat, Nov 1, 2025, 3:15):

> -- I'm not sure why because the value is present to instruct the client
> for how long it is okay to retry the same request with the same key without
> refreshing the whole state of the operation, and since the only information
> the client has is the client timestamp (which is a local timestamp), I'm
> not sure how the client could infer the server receive timestamp...
>
> My understanding is that this field specifies how long the server retains
> an idempotency key after receiving it. During this retention period, the
> client can decide—based on its local clock—how long to continue reusing the
> same key. For instance, the client may stop using the key once its
> configured lifetime has elapsed since the first use on the client side.
> However, the server does not depend on the client’s timestamp to determine
> when to expire the key; it handles expiration independently.
>
> -- I realize I'm answering a question with another question but what is
> the value of creating a custom format with a timestamp in it vs using a
> standard format with an identifier in it? Is it better to create a special
> function to generate and parse this special identifier because it is the
> "iceberg" idempotency-key format vs actually creating a UUIDv7
> parser/generator (if none are already present in core or 3rd party
> libraries)?
>
> I’m not entirely sure I understand this question, as there was no mention
> of a customized format in the previous discussion. The main point raised
> was about introducing a dedicated field (or possibly another HTTP header in
> this context), such as *idempotency-key-timestamp*. The advantage of
> having a separate field is that it can be managed and evolved
> independently, with a clear and specific purpose.  However, before going
> further, I think the first question we need to answer is whether we
> actually need this field.
>
> Additionally, as Kevin mentioned, both client and server implementations
> have the flexibility to decide what format to produce and what format to
> accept.
>
> Best Regards,
> Yun
>
>
> On Thu, Oct 30, 2025 at 10:03 PM Kevin Liu  wrote:
>
>> Hey folks,
>>
>> Thanks for the great discussion on this topic. I missed the most recent
>> catalog sync but just caught up by watching the recording on YouTube [1]
>> (Thanks again Honah for uploading it). I also reviewed this thread and the
>> GitHub comments.
>>
>> It seems we’re aligned on the proposal, and the remaining question is
>> whether to explicitly require UUIDv7 as the idempotency key for clients and
>> server implementations.
>>
>> I believe the spec as written right now,  "*Key format: UUID (V7
>> preferred) in string format*" and "*idempotency key must be globally
>> unique*", is sufficient to address the original issue: making Iceberg’s
>> REST API mutation requests safely retryable. Clients and servers could also
>> choose to use UUIDv7 as part of their own implementation detail.
>>
>> Personally, I don’t see a strong reason to mandate UUIDv7 in the
>> specification. Doing so feels overly prescriptive and restrictive for both
>> client and server implementations.
>>
>> Most resources on idempotency simply require keys to be “universally
>> unique” without prescribing a specific version. For example,
>> Microsoft/Azure documentation states:
>> ```
>> The value of the client-request-id header MUST be a globally unique
>> identifier (GUID) in standard string representation
>> ```[2]
>> And further clarifies:
>> ```
>> globally unique identifier (GUID): A term used interchangeably with
>> universally unique identifier (UUID) in Microsoft protocol technical
>> documents (TDs). Interchanging the usage of these terms does not imply or
>> require a specific algorithm or mechanism to generate the value.
>> Specifically, the use of this term does not imply or require that the
>> algorithms describ

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-10-31 Thread yun zou
-- I'm not sure why because the value is present to instruct the client for
how long it is okay to retry the same request with the same key without
refreshing the whole state of the operation, and since the only information
the client has is the client timestamp (which is a local timestamp), I'm
not sure how the client could infer the server receive timestamp...

My understanding is that this field specifies how long the server retains
an idempotency key after receiving it. During this retention period, the
client can decide—based on its local clock—how long to continue reusing the
same key. For instance, the client may stop using the key once its
configured lifetime has elapsed since the first use on the client side.
However, the server does not depend on the client’s timestamp to determine
when to expire the key; it handles expiration independently.
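
As a rough illustration of that split, the Python sketch below keeps the two clocks separate. It is a hedged example under my own assumptions, not code from the spec PR: `IdempotencyKeyStore` and `ClientKey` are hypothetical names, and the in-memory dict stands in for whatever storage a real catalog would use.

```python
import time
from typing import Callable, Dict


class IdempotencyKeyStore:
    """Server side: retention is measured from the server's own receive time."""

    def __init__(self, lifetime_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._lifetime = lifetime_seconds
        self._clock = clock
        self._received_at: Dict[str, float] = {}

    def is_replay(self, key: str) -> bool:
        """Record the key on first sight; return True while it is retained."""
        now = self._clock()
        first_seen = self._received_at.get(key)
        if first_seen is not None and now - first_seen < self._lifetime:
            return True
        self._received_at[key] = now  # new key, or an expired key seen again
        return False


class ClientKey:
    """Client side: stop reusing a key once its lifetime has elapsed locally."""

    def __init__(self, key: str, lifetime_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.key = key
        self._clock = clock
        self._deadline = clock() + lifetime_seconds  # measured from first use

    def may_retry(self) -> bool:
        return self._clock() < self._deadline
```

Neither side reads the other's clock: the server never looks inside the key, and the client only needs the advertised lifetime.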

-- I realize I'm answering a question with another question but what is the
value of creating a custom format with a timestamp in it vs using a
standard format with an identifier in it? Is it better to create a special
function to generate and parse this special identifier because it is the
"iceberg" idempotency-key format vs actually creating a UUIDv7
parser/generator (if none are already present in core or 3rd party
libraries)?

I’m not entirely sure I understand this question, as there was no mention
of a customized format in the previous discussion. The main point raised
was about introducing a dedicated field (or possibly another HTTP header in
this context), such as *idempotency-key-timestamp*. The advantage of having
a separate field is that it can be managed and evolved independently, with
a clear and specific purpose.  However, before going further, I think the
first question we need to answer is whether we actually need this field.

Additionally, as Kevin mentioned, both client and server implementations
have the flexibility to decide what format to produce and what format to
accept.

Best Regards,
Yun


On Thu, Oct 30, 2025 at 10:03 PM Kevin Liu  wrote:

> Hey folks,
>
> Thanks for the great discussion on this topic. I missed the most recent
> catalog sync but just caught up by watching the recording on YouTube [1]
> (Thanks again Honah for uploading it). I also reviewed this thread and the
> GitHub comments.
>
> It seems we’re aligned on the proposal, and the remaining question is
> whether to explicitly require UUIDv7 as the idempotency key for clients and
> server implementations.
>
> I believe the spec as written right now,  "*Key format: UUID (V7
> preferred) in string format*" and "*idempotency key must be globally
> unique*", is sufficient to address the original issue: making Iceberg’s
> REST API mutation requests safely retryable. Clients and servers could also
> choose to use UUIDv7 as part of their own implementation detail.
>
> Personally, I don’t see a strong reason to mandate UUIDv7 in the
> specification. Doing so feels overly prescriptive and restrictive for both
> client and server implementations.
>
> Most resources on idempotency simply require keys to be “universally
> unique” without prescribing a specific version. For example,
> Microsoft/Azure documentation states:
> ```
> The value of the client-request-id header MUST be a globally unique
> identifier (GUID) in standard string representation
> ```[2]
> And further clarifies:
> ```
> globally unique identifier (GUID): A term used interchangeably with
> universally unique identifier (UUID) in Microsoft protocol technical
> documents (TDs). Interchanging the usage of these terms does not imply or
> require a specific algorithm or mechanism to generate the value.
> Specifically, the use of this term does not imply or require that the
> algorithms described in [RFC4122] or [C706] have to be used for generating
> the GUID.
> ```[3]
> From an implementation perspective, Microsoft’s Iceberg REST Catalog
> (OneLake Table API) will likely use UUIDv4, which aligns with our existing
> infrastructure.
>
> Several folks have highlighted that UUIDv7 offers potential optimizations
> for server-side logic. Since servers are primarily responsible for
> enforcing idempotency guarantees, they can choose to require UUIDv7 if
> desired. The spec as written allows this:
> * If the client provides the correct key format and the server validates
> it, proceed.
> * If the client provides an incorrect format, the server can reject the
> request.
>
> This approach is consistent with the Idempotency-Key HTTP header
> documentation as "Server responsibilities" [4]. Amogh also mentioned
> similar behavior in the PR comment:
> https://github.com/apache/iceberg/pull/14196#discussion_r2400619934.
>
>
> *In summary, I’m +1 on option (2): the spec should not require a specific
> UUID version.*
>
> Best,
> Kevin Liu
>
>
> [1] Iceberg Catalog Community Sync Oct 22nd 2025,
> https://youtu.be/rj9wGVPeTY0?si=Qj3fv2h8SaWTCvs3&t=2660
> [2]
> https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-kpp/57256

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-10-30 Thread Kevin Liu
Hey folks,

Thanks for the great discussion on this topic. I missed the most recent
catalog sync but just caught up by watching the recording on YouTube [1]
(Thanks again Honah for uploading it). I also reviewed this thread and the
GitHub comments.

It seems we’re aligned on the proposal, and the remaining question is
whether to explicitly require UUIDv7 as the idempotency key for clients and
server implementations.

I believe the spec as written right now,  "*Key format: UUID (V7 preferred)
in string format*" and "*idempotency key must be globally unique*",
is sufficient to address the original issue: making Iceberg’s REST API
mutation requests safely retryable. Clients and servers could also choose
to use UUIDv7 as part of their own implementation detail.

Personally, I don’t see a strong reason to mandate UUIDv7 in the
specification. Doing so feels overly prescriptive and restrictive for both
client and server implementations.

Most resources on idempotency simply require keys to be “universally
unique” without prescribing a specific version. For example,
Microsoft/Azure documentation states:
```
The value of the client-request-id header MUST be a globally unique
identifier (GUID) in standard string representation
```[2]
And further clarifies:
```
globally unique identifier (GUID): A term used interchangeably with
universally unique identifier (UUID) in Microsoft protocol technical
documents (TDs). Interchanging the usage of these terms does not imply or
require a specific algorithm or mechanism to generate the value.
Specifically, the use of this term does not imply or require that the
algorithms described in [RFC4122] or [C706] have to be used for generating
the GUID.
```[3]
From an implementation perspective, Microsoft’s Iceberg REST Catalog
(OneLake Table API) will likely use UUIDv4, which aligns with our existing
infrastructure.

Several folks have highlighted that UUIDv7 offers potential optimizations
for server-side logic. Since servers are primarily responsible for
enforcing idempotency guarantees, they can choose to require UUIDv7 if
desired. The spec as written allows this:
* If the client provides the correct key format and the server validates
it, proceed.
* If the client provides an incorrect format, the server can reject the
request.
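
The two bullets above could look roughly like this on the server. This is a hedged Python sketch, not taken from the spec or from any Iceberg implementation: `validate_idempotency_key` and its `required_version` parameter are hypothetical names, and a server that accepts any UUID would simply leave `required_version` unset.

```python
import uuid
from typing import Optional


def validate_idempotency_key(key: str,
                             required_version: Optional[int] = None) -> bool:
    """Return True if the key is a UUID the server is willing to accept."""
    try:
        parsed = uuid.UUID(key)
    except ValueError:
        return False  # not a UUID at all: the server can reject the request
    if required_version is None:
        return True   # any UUID version is acceptable
    return parsed.version == required_version
```

For example, a server that chose to require UUIDv7 would call `validate_idempotency_key(key, required_version=7)` and reject the request when it returns False. Note that Python's `uuid.UUID` parser is lenient about braces and the `urn:uuid:` prefix, so a stricter server would add its own format check.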

This approach is consistent with the Idempotency-Key HTTP header
documentation as "Server responsibilities" [4]. Amogh also mentioned
similar behavior in the PR comment:
https://github.com/apache/iceberg/pull/14196#discussion_r2400619934.


*In summary, I’m +1 on option (2): the spec should not require a specific
UUID version.*

Best,
Kevin Liu


[1] Iceberg Catalog Community Sync Oct 22nd 2025,
https://youtu.be/rj9wGVPeTY0?si=Qj3fv2h8SaWTCvs3&t=2660
[2]
https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-kpp/57256b09-b8be-43f8-be95-dfdb00ffcf56
[3]
https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-kpp/1335b5f2-435a-41ca-8e0c-ad209814f98d#gt_f49694cc-c350-462d-ab8e-816f0103c6c1
[4]
https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Idempotency-Key#server_responsibilities

On Thu, Oct 30, 2025 at 5:33 PM Laurent Goujon 
wrote:

> I'm going over the replies and the github comments so it's taking me a
> while to do a full reply to your message, but I also noticed you mentioned
> a couple of times intermediaries/proxies as actors which could also retry
> the request, and I don't see any support for it at the RFC level nor at the
> IRC spec level either. The likely reason is that the interpretation of the
> key, the expiration policy and the error messages is defined at the
> application level, not at the transport level so I don't see a generic
> proxy being capable of doing such a thing in the current state of the RFC.
>
> On Wed, Oct 29, 2025 at 10:01 PM Dennis Huo  wrote:
>
>> +1 to Option 2 for me.
>>
>> From what I remember of the meeting (it will be nice to review once the
>> recording is posted to make sure we didn't forget any important
>> viewpoints), the major concern wasn't just difficulty of client-side
>> implementations of UUIDv7, but two aspects of being risky to allow
>> underspecification:
>>
>> 1. In the same vein as what Yun said here, some people expressed a
>> preference for an explicit separate field to be more explicit about intent
>> and semantics instead of relying on the embedded portion of the UUIDv7
>> 2. There was concern about letting servers have divergent *behaviors*
>> that depend on interpreting the UUIDv7 in ways not prescribed in the spec
>>
>> For the second concern, my take on it is that if we're really going to
>> intend for servers to use it in conjunction with their
>> idempotency-key-expiration window to reject old requests, we should go all
>> the way and really specify the time-range rejection behaviors very
>> explicitly.
>>
>> Otherwise, overspecifying half of the protocol (UUIDv7) while
>> underspecifying the server-side behaviors could lead to problems down the
>> road.

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-10-30 Thread Laurent Goujon
I'm going over the replies and the github comments so it's taking me a
while to do a full reply to your message, but I also noticed you mentioned
a couple of times intermediaries/proxies as actors which could also retry
the request, and I don't see any support for it at the RFC level nor at the
IRC spec level either. The likely reason is that the interpretation of the
key, the expiration policy and the error messages is defined at the
application level, not at the transport level so I don't see a generic
proxy being capable of doing such a thing in the current state of the RFC.

On Wed, Oct 29, 2025 at 10:01 PM Dennis Huo  wrote:

> +1 to Option 2 for me.
>
> From what I remember of the meeting (it will be nice to review once the
> recording is posted to make sure we didn't forget any important
> viewpoints), the major concern wasn't just difficulty of client-side
> implementations of UUIDv7, but two aspects of being risky to allow
> underspecification:
>
> 1. In the same vein as what Yun said here, some people expressed a
> preference for an explicit separate field to be more explicit about intent
> and semantics instead of relying on the embedded portion of the UUIDv7
> 2. There was concern about letting servers have divergent *behaviors* that
> depend on interpreting the UUIDv7 in ways not prescribed in the spec
>
> For the second concern, my take on it is that if we're really going to
> intend for servers to use it in conjunction with their
> idempotency-key-expiration window to reject old requests, we should go all
> the way and really specify the time-range rejection behaviors very
> explicitly.
>
> Otherwise, overspecifying half of the protocol (UUIDv7) while
> underspecifying the server-side behaviors could lead to problems down the
> road.
>
> There was a comment in the PR discussion (
> https://github.com/apache/iceberg/pull/14196#discussion_r2453115088)
> which I think is worth discussing further here:
>
>> if a client is misbehaving but assume things are fine because one server
>> is not complaining, but things start to go wrong when the client is
>> addressing a different server (like rejecting a non v7 uuid or an old one),
>> then it's a major client issue.
>
>
>
>> IMHO server is not required to do anything (it's not even required to
>> support idempotency key) but if it chooses to, it should be able to
>> leverage as much information the key can give.
>
>
> The problem here is that if the spec is underspecified, then it's not
> clearly defined whether the client is "misbehaving" in those underspecified
> scenarios - for example, is the client misbehaving if their clock skews
> from the server by 1 second? 10 seconds? 60 seconds? 300 seconds? Is it
> misbehaving if its time appears to be "in the future" even after accounting
> for an allowed clock skew? If the client doesn't know whether intermediate
> proxies are retrying, can it *rely* on the server to reject requests older
> than the idempotency-key-lifetime? Or does it still need to proactively
> guarantee no retries older than the idempotency key lifetime?
>
> The goal of the spec and consistent server implementations is to help
> ensure a good experience for the client-side, so it immediately knows
> whether it's misbehaving.
>
> I disagree with allowing the server to leverage as much information as the
> key can give when it comes to leveraging information for actual behavioral
> decisions. It's one thing to choose not to implement an optional feature,
> and advertise the server capabilities as such, but another thing for
> idiosyncratic behaviors to start arising when a feature *is* implemented.
> It's certainly true that it's *possible* to leave it unspecified and let
> various de-facto standards emerge organically, but that seems to be the
> kind of divergence we're precisely trying to prevent by having this shared
> REST spec, just like the stricter web standards enforcement in the early
> 2000s/2010s to make cross-browser development manageable.
>
> A thought experiment - if we were to specify UUIDv1 as required and leave
> the usage of the internals of UUIDv1 underspecified, would it be
> appropriate for a server to choose to use the MAC-address portion (bits 80
> through 127) for special behavioral handling that draws conclusions about
> multiple requests coming from the (allegedly) same client machine? For
> example, one could imagine fancy heuristics where one decides to reduce an
> idempotency-key retention window after consecutive requests from the same
> client machine indicate that the network connection from that machine is
> "healthy".
>
> Overall, it seems like getting the lifecycle-window handling fully thought
> through will require some work, and for such a complex piece it seems
> cleaner to have separation of concerns here, with the UUID in itself just
> serving the purpose of being a UUID and a separate explicit timestamp field
> allowing better reasoning about protocol expectations, or lack thereof
> (e.g. a client

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-10-30 Thread Laurent Goujon
>
> Yun has a good point here. Current spec PR for "idempotency-key-lifetime"
> should be "server-receive timestamp".


I'm not sure why because the value is present to instruct the client for
how long it is okay to retry the same request with the same key without
refreshing the whole state of the operation, and since the only information
the client has is the client timestamp (which is a local timestamp), I'm
not sure how the client could infer the server receive timestamp...

> If we really want to protect against this scenario, should a separate
> timestamp field be used? This question has been raised by Dennis and Ryan
> in the community sync, and Yun above in this thread.


I realize I'm answering a question with another question but what is the
value of creating a custom format with a timestamp in it vs using a
standard format with an identifier in it? Is it better to create a special
function to generate and parse this special identifier because it is the
"iceberg" idempotency-key format vs actually creating a UUIDv7
parser/generator (if none are already present in core or 3rd party
libraries)?

On Wed, Oct 29, 2025 at 8:05 PM Steven Wu  wrote:

> > When the server decides to expire a key, will it rely on the
> > server-receive timestamp, the client request timestamp, or the key creation
> > timestamp?
>
> Yun has a good point here. Current spec PR for "idempotency-key-lifetime"
> should be "server-receive timestamp".
>
> > Now, let's say a client is non-conformant and is reusing the same key or
> > is not applying correctly the lifetime directive, but send the same request
> > to the server which has expired it (so it's a new request for the server),
> > it could have the potential to cause some kind of corruption then, while at
> > the same time adding a time component to the key would have prevented the
> > issue?
>
> If we really want to protect against this scenario, should a separate
> timestamp field be used? This question has been raised by Dennis and Ryan
> in the community sync, and Yun above in this thread.
>
>
>
> On Wed, Oct 29, 2025 at 7:19 PM yun zou 
> wrote:
>
>> Hi All,
>>
>> It sounds like the idea is to use the timestamp component to help
>> determine whether a key should expire. However, I’m not clear on how
>> exactly the timestamp would be used for this purpose. When the server
>> decides to expire a key, will it rely on the server-receive timestamp,
>> the client request timestamp, or the key creation timestamp?
>>
>> If the timestamp component within the key is intended to represent the
>> key creation time, that raises a couple of concerns:
>> 1. A key could be created well before it’s actually used.
>> 2. Clock skew between the client and server could lead to inconsistent
>> expiration behavior.
>>
>> If the server is responsible for managing the key lifecycle, it’s
>> generally more robust and consistent to rely on the server clock for
>> expiration decisions rather than client-provided timestamps.
>>
>> Additionally, if the timestamp is an important piece of information,
>> it might be cleaner to make it an explicit field instead of
>> overloading the key itself with multiple purposes. Having a separate,
>> well-defined field would make the specification clearer and easier to
>> maintain.
>>
>> From the client’s perspective, requiring the use of UUIDv7 introduces
>> unnecessary constraints on implementation. That said, clients are free
>> to adopt UUIDv7 if they prefer. Since the server ultimately manages
>> expiration, it’s generally better to keep the client logic simple and
>> decoupled from server-side decisions.
>>
>> Best Regards,
>> Yun
>>
>> On Wed, Oct 29, 2025 at 12:50 PM Dmitri Bourlatchkov 
>> wrote:
>> >
>> > Hi All,
>> >
>> > From my POV (and I may be repeating what I put in GH comments), the
>> > main point in using UUID v7 is specifying that a timestamp should be part
>> > of the idempotency key. As previously discussed, having this timestamp is
>> > beneficial to server implementations.
>> >
>> > The IETF Idempotency Key draft v7 [1] allows servers to require
>> > specific ID generation algorithms.
>> >
>> > We could have a custom ID format, but UUID v7 is already defined and
>> > fits this use case.
>> >
>> > If for some reason UUID v7 becomes "weak" in the future, such an event
>> > will have a much greater impact than the REST Catalog API. In any case, if
>> > that happens, nothing prevents revisioning the REST API spec to allow for
>> > stronger ID generators.
>> >
>> > [1]
>> > https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-07#name-client
>> >
>> > Cheers,
>> > Dmitri.
>> >
>> > On Mon, Oct 27, 2025 at 2:33 PM Yufei Gu  wrote:
>> >>
>> >> +1 on option 2: don’t mandate a specific key format.
>> >>
>> >> Concerns with option 1 (UUIDv7-mandatory):
>> >> 1. Overspecification risk. If UUIDv7 shows weaknesses later, we’re
>> >> stuck with a brittle contract.
>> >> 2. Unnecessary constraints. It binds both client and server
>> >> implementations. One of IRC’s goa

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-10-29 Thread Dennis Huo
+1 to Option 2 for me.

From what I remember of the meeting (it will be nice to review once the
recording is posted to make sure we didn't forget any important
viewpoints), the major concern wasn't just difficulty of client-side
implementations of UUIDv7, but two aspects of being risky to allow
underspecification:

1. In the same vein as what Yun said here, some people expressed a
preference for an explicit separate field to be more explicit about intent
and semantics instead of relying on the embedded portion of the UUIDv7
2. There was concern about letting servers have divergent *behaviors* that
depend on interpreting the UUIDv7 in ways not prescribed in the spec

For the second concern, my take on it is that if we're really going to
intend for servers to use it in conjunction with their
idempotency-key-expiration window to reject old requests, we should go all
the way and really specify the time-range rejection behaviors very
explicitly.

Otherwise, overspecifying half of the protocol (UUIDv7) while
underspecifying the server-side behaviors could lead to problems down the
road.

There was a comment in the PR discussion (
https://github.com/apache/iceberg/pull/14196#discussion_r2453115088) which
I think is worth discussing further here:

> if a client is misbehaving but assume things are fine because one server is
> not complaining, but things start to go wrong when the client is addressing
> a different server (like rejecting a non v7 uuid or an old one), then it's
> a major client issue.



> IMHO server is not required to do anything (it's not even required to
> support idempotency key) but if it chooses to, it should be able to
> leverage as much information the key can give.


The problem here is that if the spec is underspecified, then it's not
clearly defined whether the client is "misbehaving" in those underspecified
scenarios - for example, is the client misbehaving if their clock skews
from the server by 1 second? 10 seconds? 60 seconds? 300 seconds? Is it
misbehaving if its time appears to be "in the future" even after accounting
for an allowed clock skew? If the client doesn't know whether intermediate
proxies are retrying, can it *rely* on the server to reject requests older
than the idempotency-key-lifetime? Or does it still need to proactively
guarantee no retries older than the idempotency key lifetime?
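
To show how much would have to be pinned down, here is a small Python sketch of a time-range check on the timestamp embedded in a key. It is purely illustrative: `classify_key_time`, `KeyAge`, and every threshold below are hypothetical, and none of the numbers come from the spec.

```python
from enum import Enum


class KeyAge(Enum):
    ACCEPT = "accept"
    TOO_OLD = "too_old"
    IN_FUTURE = "in_future"


def classify_key_time(key_millis: int, server_now_millis: int,
                      lifetime_millis: int, allowed_skew_millis: int) -> KeyAge:
    """Classify a key's embedded time against the server clock."""
    if key_millis > server_now_millis + allowed_skew_millis:
        return KeyAge.IN_FUTURE
    if key_millis < server_now_millis - lifetime_millis - allowed_skew_millis:
        return KeyAge.TOO_OLD
    return KeyAge.ACCEPT


now = 1_000_000
lifetime = 600_000           # 10 minutes, hypothetical
key_time = now + 60_000      # client clock runs 60 seconds ahead

strict = classify_key_time(key_time, now, lifetime, allowed_skew_millis=10_000)
lenient = classify_key_time(key_time, now, lifetime, allowed_skew_millis=300_000)
```

With these numbers the same client request is rejected as `IN_FUTURE` by the strict server and accepted by the lenient one, which is exactly the kind of divergence that appears when the allowed skew is left to each implementation.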

The goal of the spec and consistent server implementations is to help
ensure a good experience for the client-side, so it immediately knows
whether it's misbehaving.

I disagree with allowing the server to leverage as much information as the
key can give when it comes to leveraging information for actual behavioral
decisions. It's one thing to choose not to implement an optional feature,
and advertise the server capabilities as such, but another thing for
idiosyncratic behaviors to start arising when a feature *is* implemented.
It's certainly true that it's *possible* to leave it unspecified and let
various de-facto standards emerge organically, but that seems to be the
kind of divergence we're precisely trying to prevent by having this shared
REST spec, just like the stricter web standards enforcement in the early
2000s/2010s to make cross-browser development manageable.

A thought experiment - if we were to specify UUIDv1 as required and leave
the usage of the internals of UUIDv1 underspecified, would it be
appropriate for a server to choose to use the MAC-address portion (bits 80
through 127) for special behavioral handling that draws conclusions about
multiple requests coming from the (allegedly) same client machine? For
example, one could imagine fancy heuristics where one decides to reduce an
idempotency-key retention window after consecutive requests from the same
client machine indicate that the network connection from that machine is
"healthy".
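
For what it's worth, pulling that field out is trivial, which is part of why the temptation exists. A minimal Python sketch, where `uuid1_node` is a hypothetical helper name and the node value in the usage note is made up:

```python
import uuid
from typing import Optional


def uuid1_node(key: str) -> Optional[int]:
    """Return the 48-bit node field of a UUIDv1 key, or None for other versions."""
    parsed = uuid.UUID(key)
    if parsed.version != 1:
        return None
    # Lowest 48 bits of the 128-bit value, i.e. bits 80 through 127 when
    # counting from the most significant bit.
    return parsed.int & ((1 << 48) - 1)
```

For a key generated with `uuid.uuid1(node=0x001122334455)`, this returns `0x001122334455`. Some generators put a random value in the node field instead of a MAC address, so any conclusion a server draws from it is a guess.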

Overall, it seems like getting the lifecycle-window handling fully thought
through will require some work, and for such a complex piece it seems
cleaner to have separation of concerns here, with the UUID in itself just
serving the purpose of being a UUID and a separate explicit timestamp field
allowing better reasoning about protocol expectations, or lack thereof
(e.g. a client that doesn't have a reliable clock, but is only worried
about intermediate proxies performing retries can still be successful in
using UUIDv4 without giving the explicit idempotency-key-timestamp, if
intermediate proxies can only cause retries for a duration much shorter
than the key lifetime).


On Wed, Oct 29, 2025 at 8:05 PM Steven Wu  wrote:

> > When the server decides to expire a key, will it rely on the
> > server-receive timestamp, the client request timestamp, or the key creation
> > timestamp?
>
> Yun has a good point here. Current spec PR for "idempotency-key-lifetime"
> should be "server-receive timestamp".
>
> > Now, let's say a client is non-conformant and is reusing the same key or
> > is not applying correctly the lifetime directive, but send the same request

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-10-29 Thread Steven Wu
> When the server decides to expire a key, will it rely on the
> server-receive timestamp, the client request timestamp, or the key creation
> timestamp?

Yun has a good point here. Current spec PR for "idempotency-key-lifetime"
should be "server-receive timestamp".

> Now, let's say a client is non-conformant and is reusing the same key or
> is not applying correctly the lifetime directive, but send the same request
> to the server which has expired it (so it's a new request for the server),
> it could have the potential to cause some kind of corruption then, while at
> the same time adding a time component to the key would have prevented the
> issue?

If we really want to protect against this scenario, should a separate
timestamp field be used? This question has been raised by Dennis and Ryan
in the community sync, and Yun above in this thread.



On Wed, Oct 29, 2025 at 7:19 PM yun zou  wrote:

> Hi All,
>
> It sounds like the idea is to use the timestamp component to help
> determine whether a key should expire. However, I’m not clear on how
> exactly the timestamp would be used for this purpose. When the server
> decides to expire a key, will it rely on the server-receive timestamp,
> the client request timestamp, or the key creation timestamp?
>
> If the timestamp component within the key is intended to represent the
> key creation time, that raises a couple of concerns:
> 1. A key could be created well before it’s actually used.
> 2. Clock skew between the client and server could lead to inconsistent
> expiration behavior.
>
> If the server is responsible for managing the key lifecycle, it’s
> generally more robust and consistent to rely on the server clock for
> expiration decisions rather than client-provided timestamps.
>
> Additionally, if the timestamp is an important piece of information,
> it might be cleaner to make it an explicit field instead of
> overloading the key itself with multiple purposes. Having a separate,
> well-defined field would make the specification clearer and easier to
> maintain.
>
> From the client’s perspective, requiring the use of UUIDv7 introduces
> unnecessary constraints on implementation. That said, clients are free
> to adopt UUIDv7 if they prefer. Since the server ultimately manages
> expiration, it’s generally better to keep the client logic simple and
> decoupled from server-side decisions.
>
> Best Regards,
> Yun
>
> On Wed, Oct 29, 2025 at 12:50 PM Dmitri Bourlatchkov 
> wrote:
> >
> > Hi All,
> >
> > From my POV (and I may be repeating what I put in GH comments), the main
> point in using UUID v7 is specifying that a timestamp should be part of the
> idempotency key. As previously discussed, having this timestamp is
> beneficial to server implementations.
> >
> > The IETF Idempotency Key draft v7 [1] allows servers to require specific
> ID generation algorithms.
> >
> > We could have a custom ID format, but UUID v7 is already defined and
> fits this use case.
> >
> > If for some reason UUID v7 becomes "weak" in the future, such an event
> will have a much greater impact than the REST Catalog API. In any case, if
> that happens, nothing prevents revisioning the REST API spec to allow for
> stronger ID generators.
> >
> > [1]
> https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-07#name-client
> >
> > Cheers,
> > Dmitri.
> >
> > On Mon, Oct 27, 2025 at 2:33 PM Yufei Gu  wrote:
> >>
> >> +1 on option 2: don’t mandate a specific key format.
> >>
> >> Concerns with option 1 (UUIDv7-mandatory):
> >> 1. Overspecification risk. If UUIDv7 shows weaknesses later, we’re
> stuck with a brittle contract.
> >> 2. Unnecessary constraints. It binds both client and server
> implementations. One of IRC’s goals is to simplify client work; forcing
> UUIDv7 limits client choices for marginal gain (the embedded timestamp).
> >>
> >> Here are existing implementations for reference:
> >>
> >> Stripe[1]: recommends UUIDv4 but does not enforce a format for
> idempotency keys.
> >> AWS EC2[2]: accepts any unique, case-sensitive string up to 64 ASCII
> characters for the client token.
> >>
> >> I'd propose to treat the idempotency key as an opaque string with basic
> requirements and guidance(e.g., “unique string values; UUIDv4 or v7 are
> fine”) but avoid making the format mandatory. This keeps the API
> future-proof and client-friendly while preserving server-side flexibility.
> >>
> >> 1. https://docs.stripe.com/api/expanding_objects
> >> 2.
> https://docs.aws.amazon.com/ec2/latest/devguide/ec2-api-idempotency.html
> >>
> >> Yufei
> >>
> >>
> >> On Mon, Oct 27, 2025 at 9:53 AM huaxin gao 
> wrote:
> >>>
> >>> Hi Yun,
> >>> Thanks for the thoughtful feedback!
> >>>
> >>> Yes, the key itself is expected to be globally unique. You’re also
> right that we don’t need to mandate UUIDs to achieve that; other schemes
> can provide global uniqueness.
> >>>
> >>> I have chosen UUID because several folks in the community prefer it as
> a common, interoperable choice. That said,

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-10-29 Thread yun zou
Hi All,

It sounds like the idea is to use the timestamp component to help
determine whether a key should expire. However, I’m not clear on how
exactly the timestamp would be used for this purpose. When the server
decides to expire a key, will it rely on the server-receive timestamp,
the client request timestamp, or the key creation timestamp?

If the timestamp component within the key is intended to represent the
key creation time, that raises a couple of concerns:
1. A key could be created well before it’s actually used.
2. Clock skew between the client and server could lead to inconsistent
expiration behavior.

If the server is responsible for managing the key lifecycle, it’s
generally more robust and consistent to rely on the server clock for
expiration decisions rather than client-provided timestamps.

Additionally, if the timestamp is an important piece of information,
it might be cleaner to make it an explicit field instead of
overloading the key itself with multiple purposes. Having a separate,
well-defined field would make the specification clearer and easier to
maintain.

From the client’s perspective, requiring the use of UUIDv7 introduces
unnecessary constraints on implementation. That said, clients are free
to adopt UUIDv7 if they prefer. Since the server ultimately manages
expiration, it’s generally better to keep the client logic simple and
decoupled from server-side decisions.
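
To make the first concern concrete, the following Python sketch decodes the timestamp embedded in a UUIDv7 and compares it with a server-receive time. The helper names `uuid7_unix_ms` and `apparent_age_ms` are hypothetical; the bit layout (a 48-bit Unix-millisecond timestamp in the most significant bits) and the sample value come from RFC 9562.

```python
import uuid


def uuid7_unix_ms(key: str) -> int:
    """Return the 48-bit Unix-millisecond timestamp embedded in a UUIDv7."""
    parsed = uuid.UUID(key)
    if parsed.version != 7:
        raise ValueError(f"not a UUIDv7: {key}")
    # The timestamp occupies the top 48 of the 128 bits.
    return parsed.int >> 80


def apparent_age_ms(key: str, server_received_ms: int) -> int:
    """Age of the key as seen by the server. It mixes the client's clock
    (key creation) with the server's clock (receipt), so it reflects clock
    skew and any delay between creating the key and first using it."""
    return server_received_ms - uuid7_unix_ms(key)


# Example value from RFC 9562; its embedded timestamp is 1645557742000 ms.
rfc_example = "017F22E2-79B0-7CC3-98C4-DC0C0C07398F"
# A key created one hour before first use already looks one hour old.
print(apparent_age_ms(rfc_example, 1645557742000 + 3_600_000))
```

The embedded value says when the key was generated on the client, which is not necessarily when the request was sent or received.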

Best Regards,
Yun

On Wed, Oct 29, 2025 at 12:50 PM Dmitri Bourlatchkov  wrote:
>
> Hi All,
>
> From my POV (and I may be repeating what I put in GH comments), the main 
> point in using UUID v7 is specifying that a timestamp should be part of the 
> idempotency key. As previously discussed, having this timestamp is beneficial 
> to server implementations.
>
> The IETF Idempotency Key draft v7 [1] allows servers to require specific ID 
> generation algorithms.
>
> We could have a custom ID format, but UUID v7 is already defined and fits 
> this use case.
>
> If for some reason UUID v7 becomes "weak" in the future, such an event will 
> have a much greater impact than the REST Catalog API. In any case, if that 
> happens, nothing prevents revisioning the REST API spec to allow for stronger 
> ID generators.
>
> [1] 
> https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-07#name-client
>
> Cheers,
> Dmitri.
>
> On Mon, Oct 27, 2025 at 2:33 PM Yufei Gu  wrote:
>>
>> +1 on option 2: don’t mandate a specific key format.
>>
>> Concerns with option 1 (UUIDv7-mandatory):
>> 1. Overspecification risk. If UUIDv7 shows weaknesses later, we’re stuck 
>> with a brittle contract.
>> 2. Unnecessary constraints. It binds both client and server implementations. 
>> One of IRC’s goals is to simplify client work; forcing UUIDv7 limits client 
>> choices for marginal gain (the embedded timestamp).
>>
>> Here are existing implementations for reference:
>>
>> Stripe[1]: recommends UUIDv4 but does not enforce a format for idempotency 
>> keys.
>> AWS EC2[2]: accepts any unique, case-sensitive string up to 64 ASCII 
>> characters for the client token.
>>
>> I'd propose to treat the idempotency key as an opaque string with basic 
>> requirements and guidance(e.g., “unique string values; UUIDv4 or v7 are 
>> fine”) but avoid making the format mandatory. This keeps the API 
>> future-proof and client-friendly while preserving server-side flexibility.
>>
>> 1. https://docs.stripe.com/api/expanding_objects
>> 2. https://docs.aws.amazon.com/ec2/latest/devguide/ec2-api-idempotency.html
>>
>> Yufei
>>
>>
>> On Mon, Oct 27, 2025 at 9:53 AM huaxin gao  wrote:
>>>
>>> Hi Yun,
>>> Thanks for the thoughtful feedback!
>>>
>>> Yes, the key itself is expected to be globally unique. You’re also right 
>>> that we don’t need to mandate UUIDs to achieve that; other schemes can 
>>> provide global uniqueness.
>>>
>>> I have chosen UUID because several folks in the community prefer it as a 
>>> common, interoperable choice. That said, I agree that mandating UUIDv7 adds 
>>> constraints on clients without clear spec-level benefit.
>>>
>>> I also agree we should separate spec from implementation; details like the 
>>> key generation method can live in implementation guidance.
>>>
>>> From your note, it sounds like you support Option 2 
>>> (version-agnostic)—i.e., require a “globally unique idempotency key” and 
>>> accept any RFC 9562 UUID (with v7 as a non-normative recommendation), while 
>>> leaving timestamp/expiry mechanics to the server-side doc. I’ll count this 
>>> as a +1 for Option 2.
>>>
>>> Thanks,
>>>
>>> Huaxin
>>>
>>>
>>> On Fri, Oct 24, 2025 at 7:00 PM yun zou  wrote:

>>>> Sorry, I accidentally sent the email before complete, please ignore my
>>>> previous email. Sorry for the noise and inconvenience.
>>>>
>>>> Hi Huaxin,
>>>>
>>>> This is a really interesting and valuable proposal — it provides a
>>>> great way to address the issue of duplicate client requests. Thank you
>>>> for proposing and driving this forwar

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-10-29 Thread Dmitri Bourlatchkov
Hi All,

From my POV (and I may be repeating what I put in GH comments), the main
point in using UUID v7 is specifying that a timestamp should be part of the
idempotency key. As previously discussed, having this timestamp is
beneficial to server implementations.

The IETF Idempotency Key draft v7 [1] allows servers to require specific ID
generation algorithms.

We could have a custom ID format, but UUID v7 is already defined and fits
this use case.

If for some reason UUID v7 becomes "weak" in the future, such an event will
have an impact reaching far beyond the REST Catalog API. In any case, if that
happens, nothing prevents revising the REST API spec to allow for
stronger ID generators.

[1]
https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-07#name-client

Cheers,
Dmitri.

On Mon, Oct 27, 2025 at 2:33 PM Yufei Gu  wrote:

> +1 on option 2: don’t mandate a specific key format.
>
> Concerns with option 1 (UUIDv7-mandatory):
> 1. Overspecification risk. If UUIDv7 shows weaknesses later, we’re stuck
> with a brittle contract.
> 2. Unnecessary constraints. It binds both client and server
> implementations. One of IRC’s goals is to simplify client work; forcing
> UUIDv7 limits client choices for marginal gain (the embedded timestamp).
>
> Here are existing implementations for reference:
>
>    - Stripe[1]: recommends UUIDv4 but does not enforce a format for
>    idempotency keys.
>    - AWS EC2[2]: accepts any unique, case-sensitive string up to 64 ASCII
>    characters for the client token.
>
> I'd propose to treat the idempotency key as an opaque string with basic
> requirements and guidance(e.g., “unique string values; UUIDv4 or v7 are
> fine”) but avoid making the format mandatory. This keeps the API
> future-proof and client-friendly while preserving server-side flexibility.
>
> 1. https://docs.stripe.com/api/expanding_objects
> 2.
> https://docs.aws.amazon.com/ec2/latest/devguide/ec2-api-idempotency.html
>
> Yufei
>
>
> On Mon, Oct 27, 2025 at 9:53 AM huaxin gao  wrote:
>
>> Hi Yun,
>> Thanks for the thoughtful feedback!
>>
>> Yes, the key itself is expected to be globally unique. You’re also right
>> that we don’t need to mandate UUIDs to achieve that; other schemes can
>> provide global uniqueness.
>>
>> I have chosen UUID because several folks in the community prefer it as a
>> common, interoperable choice. That said, I agree that mandating UUIDv7 adds
>> constraints on clients without clear spec-level benefit.
>>
>> I also agree we should separate spec from implementation; details like
>> the key generation method can live in implementation guidance.
>>
>> From your note, it sounds like you support Option 2
>> (version-agnostic)—i.e., require a “globally unique idempotency key” and
>> accept any RFC 9562 UUID (with v7 as a non-normative recommendation), while
>> leaving timestamp/expiry mechanics to the server-side doc. I’ll count this
>> as a +1 for Option 2.
>>
>> Thanks,
>>
>> Huaxin
>>
>> On Fri, Oct 24, 2025 at 7:00 PM yun zou 
>> wrote:
>>
>>> Sorry, I accidentally sent the email before complete, please ignore my
>>> previous email. Sorry for the noise and inconvenience.
>>>
>>> Hi Huaxin,
>>>
>>> This is a really interesting and valuable proposal — it provides a
>>> great way to address the issue of duplicate client requests. Thank you
>>> for proposing and driving this forward!
>>>
>>> One point that isn’t entirely clear to me is how the server uniquely
>>> identifies each request.  Are we relying solely on the idempotency-key
>>> being globally unique, or is there an additional identifier such as
>>> clientId + idempotency-key? Based on the current discussion, it sounds
>>> like the proposal expects the key itself to be globally unique, likely
>>> through the use of a UUID, but I’d like to double-check my
>>> understanding.
>>>
>>> If we are indeed relying on the client to generate a globally unique
>>> ID, that approach makes sense. However, it doesn’t seem necessary to
>>> mandate the use of UUIDs, as there are other valid methods for
>>> achieving global uniqueness. Imposing a further restriction to UUIDv7
>>> would place additional constraints on the client implementation.
>>>
>>> From a specification perspective, I think it would be better to
>>> separate the spec from the implementation. In other words, we should
>>> make it clear that the key must be globally unique, but we don’t need
>>> to specify that it must be a UUID or UUIDv7.
>>>
>>> Best Regards,
>>> Yun
>>>
>>> On Fri, Oct 24, 2025 at 4:41 PM huaxin gao 
>>> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > Thank you for taking the time to review my proposal and PR—I really
>>> appreciate the input.
>>> >
>>> > There’s one remaining issue I’d like to settle. In the Iceberg Catalog
>>> Community sync, many preferred mandating UUIDv7 for the idempotency key. At
>>> the same time, there are some concerns:
>>> >
>>> > If we need a timestamp, it should be a separate field; we shouldn’t
>>> use the UUIDv7 timest

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-10-29 Thread Laurent Goujon
tl;dr: Option 1 for me

The topic was discussed quite a lot during the last catalog community sync
(video not published yet but the consensus at that time seemed to be to
mandate uuid v7) and before that on the github PR (here's the thread for
reference ->
https://github.com/apache/iceberg/pull/14196#discussion_r2392476299)

The main objection seems to be that generating a UUIDv7 is too much of a
burden for clients. My arguments are that:
- UUIDv7 is fully specified
- Many languages already offer UUIDv7 as part of their standard library;
where they do not, external libraries exist for the same task
- Even if we want to reduce our reliance on third-party libraries, we are
talking about roughly 10 lines of code to generate a UUIDv7 ->
https://github.com/apache/iceberg/pull/14196#discussion_r2450317404

During the community sync, we mentioned that UUIDv7 would at a minimum
provide some useful debugging capability, and could also help servers be
more robust and efficient, for example by automatically discarding keys that
are obviously too old. While many people are concerned about clients (whose
only task is to generate a key according to the spec and send it with each
request), servers have to process those keys and store them for a period of
time, which has a non-negligible cost, possibly in a distributed manner.
Now, suppose a non-conformant client reuses the same key, or does not apply
the lifetime directive correctly, and sends the same request after the server
has expired the key (so the server sees it as a new request). That could
cause some kind of corruption, whereas a time component in the key would
have prevented the issue, wouldn't it?
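
As a rough illustration of the "about 10 lines" claim, here is a Python sketch that builds a UUIDv7 by hand following the RFC 9562 bit layout. It is not the code from the linked PR comment, the function name `new_uuid7` is hypothetical, and it makes no attempt at monotonic ordering within the same millisecond.

```python
import os
import time
import uuid


def new_uuid7() -> uuid.UUID:
    """Build a UUIDv7: 48-bit Unix ms timestamp, version, 12 random bits,
    variant, 62 random bits (layout per RFC 9562)."""
    unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80      # unix_ts_ms
    value |= 0x7 << 76                             # version = 7
    value |= ((rand >> 68) & 0xFFF) << 64          # rand_a (12 bits)
    value |= 0b10 << 62                            # variant = 0b10
    value |= rand & ((1 << 62) - 1)                # rand_b (62 bits)
    return uuid.UUID(int=value)


print(new_uuid7())
```

The sketch relies only on the standard library, which is the point being argued: a client without a UUIDv7 library can still produce conformant keys.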



On Mon, Oct 27, 2025 at 11:33 AM Yufei Gu  wrote:

> +1 on option 2: don’t mandate a specific key format.
>
> Concerns with option 1 (UUIDv7-mandatory):
> 1. Overspecification risk. If UUIDv7 shows weaknesses later, we’re stuck
> with a brittle contract.
> 2. Unnecessary constraints. It binds both client and server
> implementations. One of IRC’s goals is to simplify client work; forcing
> UUIDv7 limits client choices for marginal gain (the embedded timestamp).
>
> Here are existing implementations for reference:
>
>    - Stripe[1]: recommends UUIDv4 but does not enforce a format for
>    idempotency keys.
>    - AWS EC2[2]: accepts any unique, case-sensitive string up to 64 ASCII
>    characters for the client token.
>
> I'd propose to treat the idempotency key as an opaque string with basic
> requirements and guidance(e.g., “unique string values; UUIDv4 or v7 are
> fine”) but avoid making the format mandatory. This keeps the API
> future-proof and client-friendly while preserving server-side flexibility.
>
> 1. https://docs.stripe.com/api/expanding_objects
> 2.
> https://docs.aws.amazon.com/ec2/latest/devguide/ec2-api-idempotency.html
>
> Yufei
>
>
> On Mon, Oct 27, 2025 at 9:53 AM huaxin gao  wrote:
>
>> Hi Yun,
>> Thanks for the thoughtful feedback!
>>
>> Yes, the key itself is expected to be globally unique. You’re also right
>> that we don’t need to mandate UUIDs to achieve that; other schemes can
>> provide global uniqueness.
>>
>> I have chosen UUID because several folks in the community prefer it as a
>> common, interoperable choice. That said, I agree that mandating UUIDv7 adds
>> constraints on clients without clear spec-level benefit.
>>
>> I also agree we should separate spec from implementation; details like
>> the key generation method can live in implementation guidance.
>>
>> From your note, it sounds like you support Option 2
>> (version-agnostic)—i.e., require a “globally unique idempotency key” and
>> accept any RFC 9562 UUID (with v7 as a non-normative recommendation), while
>> leaving timestamp/expiry mechanics to the server-side doc. I’ll count this
>> as a +1 for Option 2.
>>
>> Thanks,
>>
>> Huaxin
>>
>> On Fri, Oct 24, 2025 at 7:00 PM yun zou 
>> wrote:
>>
>>> Sorry, I accidentally sent the email before complete, please ignore my
>>> previous email. Sorry for the noise and inconvenience.
>>>
>>> Hi Huaxin,
>>>
>>> This is a really interesting and valuable proposal — it provides a
>>> great way to address the issue of duplicate client requests. Thank you
>>> for proposing and driving this forward!
>>>
>>> One point that isn’t entirely clear to me is how the server uniquely
>>> identifies each request.  Are we relying solely on the idempotency-key
>>> being globally unique, or is there an additional identifier such as
>>> clientId + idempotency-key? Based on the current discussion, it sounds
>>> like the proposal expects the key itself to be globally unique, likely
>>> through the use of a UUID, but I’d like to double-check my
>>> understanding.
>>>
>>> If we are indeed relying on the client to generate a globally unique
>>> ID, that approach makes sense. However, it doesn’t seem necessary to
>>> mandate the use of UUIDs, as there are other valid methods for
>>> achieving global uniqueness. Impos

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-10-27 Thread Yufei Gu
+1 on option 2: don’t mandate a specific key format.

Concerns with option 1 (UUIDv7-mandatory):
1. Overspecification risk. If UUIDv7 shows weaknesses later, we’re stuck
with a brittle contract.
2. Unnecessary constraints. It binds both client and server
implementations. One of IRC’s goals is to simplify client work; forcing
UUIDv7 limits client choices for marginal gain (the embedded timestamp).

Here are existing implementations for reference:

   - Stripe[1]: recommends UUIDv4 but does not enforce a format for
   idempotency keys.
   - AWS EC2[2]: accepts any unique, case-sensitive string up to 64 ASCII
   characters for the client token.

I'd propose treating the idempotency key as an opaque string with basic
requirements and guidance (e.g., “unique string values; UUIDv4 or v7 are
fine”) but avoid making the format mandatory. This keeps the API
future-proof and client-friendly while preserving server-side flexibility.
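
A sketch of what "opaque string with basic requirements" could look like on the server side, in Python. The 64-character limit is borrowed from the EC2 example above purely for illustration, and both `MAX_KEY_LENGTH` and `is_acceptable_key` are hypothetical names; the spec PR may settle on different rules.

```python
MAX_KEY_LENGTH = 64  # assumption: limit borrowed from the EC2 client token


def is_acceptable_key(key: str) -> bool:
    """Accept any non-empty string of printable, non-space ASCII characters
    up to MAX_KEY_LENGTH. The content is never interpreted."""
    if not 0 < len(key) <= MAX_KEY_LENGTH:
        return False
    return all(0x21 <= ord(ch) <= 0x7E for ch in key)


# A UUID in any version passes, and so does any other unique token.
print(is_acceptable_key("919108f7-52d1-4320-9bac-f847db4148a8"))
print(is_acceptable_key("order-42/retry"))
```

Because the server only checks shape and length, clients stay free to pick UUIDv4, UUIDv7, or another scheme that gives them global uniqueness.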

1. https://docs.stripe.com/api/expanding_objects
2. https://docs.aws.amazon.com/ec2/latest/devguide/ec2-api-idempotency.html

Yufei


On Mon, Oct 27, 2025 at 9:53 AM huaxin gao  wrote:

> Hi Yun,
> Thanks for the thoughtful feedback!
>
> Yes, the key itself is expected to be globally unique. You’re also right
> that we don’t need to mandate UUIDs to achieve that; other schemes can
> provide global uniqueness.
>
> I have chosen UUID because several folks in the community prefer it as a
> common, interoperable choice. That said, I agree that mandating UUIDv7 adds
> constraints on clients without clear spec-level benefit.
>
> I also agree we should separate spec from implementation; details like the
> key generation method can live in implementation guidance.
>
> From your note, it sounds like you support Option 2
> (version-agnostic)—i.e., require a “globally unique idempotency key” and
> accept any RFC 9562 UUID (with v7 as a non-normative recommendation), while
> leaving timestamp/expiry mechanics to the server-side doc. I’ll count this
> as a +1 for Option 2.
>
> Thanks,
>
> Huaxin
>
> On Fri, Oct 24, 2025 at 7:00 PM yun zou 
> wrote:
>
>> Sorry, I accidentally sent the email before complete, please ignore my
>> previous email. Sorry for the noise and inconvenience.
>>
>> Hi Huaxin,
>>
>> This is a really interesting and valuable proposal — it provides a
>> great way to address the issue of duplicate client requests. Thank you
>> for proposing and driving this forward!
>>
>> One point that isn’t entirely clear to me is how the server uniquely
>> identifies each request.  Are we relying solely on the idempotency-key
>> being globally unique, or is there an additional identifier such as
>> clientId + idempotency-key? Based on the current discussion, it sounds
>> like the proposal expects the key itself to be globally unique, likely
>> through the use of a UUID, but I’d like to double-check my
>> understanding.
>>
>> If we are indeed relying on the client to generate a globally unique
>> ID, that approach makes sense. However, it doesn’t seem necessary to
>> mandate the use of UUIDs, as there are other valid methods for
>> achieving global uniqueness. Imposing a further restriction to UUIDv7
>> would place additional constraints on the client implementation.
>>
>> From a specification perspective, I think it would be better to
>> separate the spec from the implementation. In other words, we should
>> make it clear that the key must be globally unique, but we don’t need
>> to specify that it must be a UUID or UUIDv7.
>>
>> Best Regards,
>> Yun
>>
>> On Fri, Oct 24, 2025 at 4:41 PM huaxin gao 
>> wrote:
>> >
>> > Hi all,
>> >
>> > Thank you for taking the time to review my proposal and PR—I really
>> appreciate the input.
>> >
>> > There’s one remaining issue I’d like to settle. In the Iceberg Catalog
>> Community sync, many preferred mandating UUIDv7 for the idempotency key. At
>> the same time, there are some concerns:
>> >
>> > If we need a timestamp, it should be a separate field; we shouldn’t use
>> the UUIDv7 timestamp.
>> >
>> > If we use the UUID timestamp for expiry, we’d have to require keys to
>> be generated at request time, which feels over-engineered.
>> >
>> > If we want to use the UUIDv7 timestamp, it should be for debugging only.
>> >
>> > Based on that, here’s a draft update to the spec:
>> >
>> > Key Requirements:
>> > - Key format: UUIDv7 in string format as defined in RFC 9562.
>> >   See
>> https://datatracker.ietf.org/doc/html/rfc9562#name-example-of-a-uuidv7-value
>> .
>> > - The idempotency key must be globally unique (no reuse across
>> different operations).
>> > - Catalogs SHOULD NOT expire keys before the end of the advertised
>> token lifetime.
>> > - If Idempotency-Key is used, clients MUST reuse the same key when
>> retrying the same
>> >   logical operation and MUST generate a new key for a different
>> operation.
>> > - Server behavior: Servers MUST validate the syntactic validity of
>> UUIDv7 (per RFC 9562).
>> >   Servers MUST NOT make behavioral decisions 

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-10-27 Thread huaxin gao
Hi Yun,
Thanks for the thoughtful feedback!

Yes, the key itself is expected to be globally unique. You’re also right
that we don’t need to mandate UUIDs to achieve that; other schemes can
provide global uniqueness.

I have chosen UUID because several folks in the community prefer it as a
common, interoperable choice. That said, I agree that mandating UUIDv7 adds
constraints on clients without clear spec-level benefit.

I also agree we should separate spec from implementation; details like the
key generation method can live in implementation guidance.

From your note, it sounds like you support Option 2
(version-agnostic)—i.e., require a “globally unique idempotency key” and
accept any RFC 9562 UUID (with v7 as a non-normative recommendation), while
leaving timestamp/expiry mechanics to the server-side doc. I’ll count this
as a +1 for Option 2.
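
For comparison, a version-agnostic check in the sense described above (any RFC 9562 UUID in its textual form, no version requirement) might look like the following Python sketch. The function name `parse_idempotency_key` is hypothetical, and rejecting non-canonical spellings is an assumption of this example rather than something the proposal states.

```python
import uuid


def parse_idempotency_key(value: str) -> uuid.UUID:
    """Accept any UUID version written in the 8-4-4-4-12 textual form."""
    parsed = uuid.UUID(value)  # raises ValueError on malformed input
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes and missing
    # hyphens; this sketch insists on the canonical hyphenated form.
    if str(parsed) != value.lower():
        raise ValueError(f"not in canonical UUID textual form: {value}")
    if parsed.variant != uuid.RFC_4122:
        raise ValueError(f"unsupported UUID variant: {value}")
    return parsed


# Both a v4 and a v7 value are accepted; the version is not inspected.
print(parse_idempotency_key("919108f7-52d1-4320-9bac-f847db4148a8").version)
print(parse_idempotency_key("017F22E2-79B0-7CC3-98C4-DC0C0C07398F").version)
```

Nothing in this check reads the timestamp bits, so any timestamp or expiry mechanics remain a server-side concern.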

Thanks,

Huaxin

On Fri, Oct 24, 2025 at 7:00 PM yun zou  wrote:

> Sorry, I accidentally sent the email before complete, please ignore my
> previous email. Sorry for the noise and inconvenience.
>
> Hi Huaxin,
>
> This is a really interesting and valuable proposal — it provides a
> great way to address the issue of duplicate client requests. Thank you
> for proposing and driving this forward!
>
> One point that isn’t entirely clear to me is how the server uniquely
> identifies each request.  Are we relying solely on the idempotency-key
> being globally unique, or is there an additional identifier such as
> clientId + idempotency-key? Based on the current discussion, it sounds
> like the proposal expects the key itself to be globally unique, likely
> through the use of a UUID, but I’d like to double-check my
> understanding.
>
> If we are indeed relying on the client to generate a globally unique
> ID, that approach makes sense. However, it doesn’t seem necessary to
> mandate the use of UUIDs, as there are other valid methods for
> achieving global uniqueness. Imposing a further restriction to UUIDv7
> would place additional constraints on the client implementation.
>
> From a specification perspective, I think it would be better to
> separate the spec from the implementation. In other words, we should
> make it clear that the key must be globally unique, but we don’t need
> to specify that it must be a UUID or UUIDv7.
>
> Best Regards,
> Yun
>
> On Fri, Oct 24, 2025 at 4:41 PM huaxin gao  wrote:
> >
> > Hi all,
> >
> > Thank you for taking the time to review my proposal and PR—I really
> appreciate the input.
> >
> > There’s one remaining issue I’d like to settle. In the Iceberg Catalog
> Community sync, many preferred mandating UUIDv7 for the idempotency key. At
> the same time, there are some concerns:
> >
> > If we need a timestamp, it should be a separate field; we shouldn’t use
> the UUIDv7 timestamp.
> >
> > If we use the UUID timestamp for expiry, we’d have to require keys to be
> generated at request time, which feels over-engineered.
> >
> > If we want to use the UUIDv7 timestamp, it should be for debugging only.
> >
> > Based on that, here’s a draft update to the spec:
> >
> > Key Requirements:
> > - Key format: UUIDv7 in string format as defined in RFC 9562.
> >   See
> https://datatracker.ietf.org/doc/html/rfc9562#name-example-of-a-uuidv7-value
> .
> > - The idempotency key must be globally unique (no reuse across different
> operations).
> > - Catalogs SHOULD NOT expire keys before the end of the advertised token
> lifetime.
> > - If Idempotency-Key is used, clients MUST reuse the same key when
> retrying the same
> >   logical operation and MUST generate a new key for a different
> operation.
> > - Server behavior: Servers MUST validate the syntactic validity of
> UUIDv7 (per RFC 9562).
> >   Servers MUST NOT make behavioral decisions based on the UUID’s
> internal timestamp fields.
> >   The idempotency key is an opaque, unique identifier used only for
> lookup/deduplication.
> >
> > This reads a bit awkward to me: we mandate UUIDv7 but prohibit using its
> timestamp, which seems to undercut the reason to require v7 in the first
> place.
> >
> > I’d appreciate feedback on whether we should:
> >
> > Option 1 — Require v7.
> > Keep UUIDv7 required, with the server restrictions above (syntactic v7
> validation only; no behavioral decisions based on the embedded timestamp).
> >
> > Option 2 — Version-agnostic.
> > Make the client spec version-agnostic (require RFC 9562 UUID textual
> form; allow v7 as a recommendation). Leave any timestamp/lifetime mechanics
> to a server-side (Polaris idempotency) document.
> >
> > Thanks again for the thoughtful discussion.
> >
> > Best,
> >
> > Huaxin
> >
> >
> > On Mon, Sep 29, 2025 at 5:47 PM Dmitri Bourlatchkov 
> wrote:
> >>
> >> Hi Huaxin,
> >>
> >> Sorry about the delay. I posted some comments on
> https://github.com/apache/iceberg/pull/14196 Some of them I might have
> mentioned on the doc too, so apologies if they got answered in the doc and
> I missed it.
> >>
> >> Cheers,
> >> Dmitri.
> >>
> >> On Thu, Sep

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-10-24 Thread yun zou
Sorry, I accidentally sent the email before it was complete; please ignore my
previous email. Sorry for the noise and inconvenience.

Hi Huaxin,

This is a really interesting and valuable proposal — it provides a
great way to address the issue of duplicate client requests. Thank you
for proposing and driving this forward!

One point that isn’t entirely clear to me is how the server uniquely
identifies each request.  Are we relying solely on the idempotency-key
being globally unique, or is there an additional identifier such as
clientId + idempotency-key? Based on the current discussion, it sounds
like the proposal expects the key itself to be globally unique, likely
through the use of a UUID, but I’d like to double-check my
understanding.

If we are indeed relying on the client to generate a globally unique
ID, that approach makes sense. However, it doesn’t seem necessary to
mandate the use of UUIDs, as there are other valid methods for
achieving global uniqueness. Imposing a further restriction to UUIDv7
would place additional constraints on the client implementation.

From a specification perspective, I think it would be better to
separate the spec from the implementation. In other words, we should
make it clear that the key must be globally unique, but we don’t need
to specify that it must be a UUID or UUIDv7.

Best Regards,
Yun

On Fri, Oct 24, 2025 at 4:41 PM huaxin gao  wrote:
>
> Hi all,
>
> Thank you for taking the time to review my proposal and PR—I really 
> appreciate the input.
>
> There’s one remaining issue I’d like to settle. In the Iceberg Catalog 
> Community sync, many preferred mandating UUIDv7 for the idempotency key. At 
> the same time, there are some concerns:
>
> If we need a timestamp, it should be a separate field; we shouldn’t use the 
> UUIDv7 timestamp.
>
> If we use the UUID timestamp for expiry, we’d have to require keys to be 
> generated at request time, which feels over-engineered.
>
> If we want to use the UUIDv7 timestamp, it should be for debugging only.
>
> Based on that, here’s a draft update to the spec:
>
> Key Requirements:
> - Key format: UUIDv7 in string format as defined in RFC 9562.
>   See 
> https://datatracker.ietf.org/doc/html/rfc9562#name-example-of-a-uuidv7-value.
> - The idempotency key must be globally unique (no reuse across different 
> operations).
> - Catalogs SHOULD NOT expire keys before the end of the advertised token 
> lifetime.
> - If Idempotency-Key is used, clients MUST reuse the same key when retrying 
> the same
>   logical operation and MUST generate a new key for a different operation.
> - Server behavior: Servers MUST validate the syntactic validity of UUIDv7 
> (per RFC 9562).
>   Servers MUST NOT make behavioral decisions based on the UUID’s internal 
> timestamp fields.
>   The idempotency key is an opaque, unique identifier used only for 
> lookup/deduplication.
>
> This reads a bit awkward to me: we mandate UUIDv7 but prohibit using its 
> timestamp, which seems to undercut the reason to require v7 in the first 
> place.
>
> I’d appreciate feedback on whether we should:
>
> Option 1 — Require v7.
> Keep UUIDv7 required, with the server restrictions above (syntactic v7 
> validation only; no behavioral decisions based on the embedded timestamp).
>
> Option 2 — Version-agnostic.
> Make the client spec version-agnostic (require RFC 9562 UUID textual form; 
> allow v7 as a recommendation). Leave any timestamp/lifetime mechanics to a 
> server-side (Polaris idempotency) document.
>
> Thanks again for the thoughtful discussion.
>
> Best,
>
> Huaxin
>
>
> On Mon, Sep 29, 2025 at 5:47 PM Dmitri Bourlatchkov  wrote:
>>
>> Hi Huaxin,
>>
>> Sorry about the delay. I posted some comments on 
>> https://github.com/apache/iceberg/pull/14196 Some of them I might have 
>> mentioned on the doc too, so apologies if they got answered in the doc and I 
>> missed it.
>>
>> Cheers,
>> Dmitri.
>>
>> On Thu, Sep 25, 2025 at 12:27 PM huaxin gao  wrote:
>>>
>>> Thank you all for taking the time to review and discuss! I’ve responded to 
>>> all questions and updated the proposal. If there are no additional 
>>> concerns, I’ll proceed to start a VOTE thread.
>>>
>>> Thanks,
>>> Huaxin
>>>
>>> On Mon, Sep 22, 2025 at 1:30 AM Maninder Parmar 
>>>  wrote:

 +1, for low level retry which ensures that the idempotent key is never 
 committed twice. I also agree that canonicalizing the request body where 
 the client can change it due to conflict resolution and retry would be 
 hard to get right.

 On Sat, Sep 20, 2025 at 5:58 AM Dennis Huo  wrote:
>
> +1 to this being mostly targeting a "low-level" retry semantic. Expanding 
> on that though I'd say even "client-side retries" really have two 
> distinct flavors:
>
> A. Business-logic-agnostic retries, e.g. in a common low-level HTTP 
> client library - behaviorally, these should behave largely the same as 
> "network infra retries". The key distinction is that in

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-10-24 Thread yun zou
Hi Huaxin,

This is a really interesting proposal and very useful to help solve a
client issue when there are duplicated requests sent. Thanks a lot for
proposing and driving this!

Something that isn't very clear to me in the proposal is how the
server uniquely identifies the request: do we rely on the
idempotency-key being globally unique, or will the server have some
way to identify it, such as using clientId + idempotency-key? Based on
what has been discussed here, it seems to be the former, where we want
the key sent to be globally unique by using a UUID, but I want to
double-check.

If we are relying on client to generate a global unique id, using UUID


On Fri, Oct 24, 2025 at 4:41 PM huaxin gao  wrote:
>
> Hi all,
>
> Thank you for taking the time to review my proposal and PR—I really 
> appreciate the input.
>
> There’s one remaining issue I’d like to settle. In the Iceberg Catalog 
> Community sync, many preferred mandating UUIDv7 for the idempotency key. At 
> the same time, there are some concerns:
>
> If we need a timestamp, it should be a separate field; we shouldn’t use the 
> UUIDv7 timestamp.
>
> If we use the UUID timestamp for expiry, we’d have to require keys to be 
> generated at request time, which feels over-engineered.
>
> If we want to use the UUIDv7 timestamp, it should be for debugging only.
>
> Based on that, here’s a draft update to the spec:
>
> Key Requirements:
> - Key format: UUIDv7 in string format as defined in RFC 9562.
>   See 
> https://datatracker.ietf.org/doc/html/rfc9562#name-example-of-a-uuidv7-value.
> - The idempotency key must be globally unique (no reuse across different 
> operations).
> - Catalogs SHOULD NOT expire keys before the end of the advertised token 
> lifetime.
> - If Idempotency-Key is used, clients MUST reuse the same key when retrying 
> the same
>   logical operation and MUST generate a new key for a different operation.
> - Server behavior: Servers MUST validate the syntactic validity of UUIDv7 
> (per RFC 9562).
>   Servers MUST NOT make behavioral decisions based on the UUID’s internal 
> timestamp fields.
>   The idempotency key is an opaque, unique identifier used only for 
> lookup/deduplication.
>
> This reads a bit awkward to me: we mandate UUIDv7 but prohibit using its 
> timestamp, which seems to undercut the reason to require v7 in the first 
> place.
>
> I’d appreciate feedback on whether we should:
>
> Option 1 — Require v7.
> Keep UUIDv7 required, with the server restrictions above (syntactic v7 
> validation only; no behavioral decisions based on the embedded timestamp).
>
> Option 2 — Version-agnostic.
> Make the client spec version-agnostic (require RFC 9562 UUID textual form; 
> allow v7 as a recommendation). Leave any timestamp/lifetime mechanics to a 
> server-side (Polaris idempotency) document.
>
> Thanks again for the thoughtful discussion.
>
> Best,
>
> Huaxin
>
>
> On Mon, Sep 29, 2025 at 5:47 PM Dmitri Bourlatchkov  wrote:
>>
>> Hi Huaxin,
>>
>> Sorry about the delay. I posted some comments on 
>> https://github.com/apache/iceberg/pull/14196 Some of them I might have 
>> mentioned on the doc too, so apologies if they got answered in the doc and I 
>> missed it.
>>
>> Cheers,
>> Dmitri.
>>
>> On Thu, Sep 25, 2025 at 12:27 PM huaxin gao  wrote:
>>>
>>> Thank you all for taking the time to review and discuss! I’ve responded to 
>>> all questions and updated the proposal. If there are no additional 
>>> concerns, I’ll proceed to start a VOTE thread.
>>>
>>> Thanks,
>>> Huaxin
>>>
>>> On Mon, Sep 22, 2025 at 1:30 AM Maninder Parmar 
>>>  wrote:

 +1, for low level retry which ensures that the idempotent key is never 
 committed twice. I also agree that canonicalizing the request body where 
 the client can change it due to conflict resolution and retry would be 
 hard to get right.

 On Sat, Sep 20, 2025 at 5:58 AM Dennis Huo  wrote:
>
> +1 to this being mostly targeting a "low-level" retry semantic. Expanding 
> on that though I'd say even "client-side retries" really have two 
> distinct flavors:
>
> A. Business-logic-agnostic retries, e.g. in a common low-level HTTP 
> client library - behaviorally, these should behave largely the same as 
> "network infra retries". The key distinction is that in this case any 
> content hashing would be *post* serialization and even agnostic to 
> request-body content-type (i.e. not JSON-specific).
> B. Application-specific retries, such as when Iceberg client will 
> potentially rebase on a new snapshot
>
> I think this aligns with what Peter and others mentioned earlier where 
> trying to canonicalize the *semantic* content of a request is probably 
> brittle/risky. And as Yufei mentions, case 2.B (client-side real 
> application-layer retries) should be using a new idempotency-key if it's 
> ever doing the retry at the layer that requires re-serializing JSON.
>
> Overall though I agr

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-10-24 Thread huaxin gao
Hi all,

Thank you for taking the time to review my proposal and PR—I really
appreciate the input.

There’s one remaining issue I’d like to settle. In the Iceberg Catalog
Community sync, many preferred mandating UUIDv7 for the idempotency key. At
the same time, there are some concerns:

   - If we need a timestamp, it should be a separate field; we shouldn’t use
   the UUIDv7 timestamp.
   - If we use the UUID timestamp for expiry, we’d have to require keys to be
   generated at request time, which feels over-engineered.
   - If we want to use the UUIDv7 timestamp, it should be for debugging only.

Based on that, here’s a draft update to the spec:

*Key Requirements:*

   - Key format: UUIDv7 in string format as defined in RFC 9562. See
   https://datatracker.ietf.org/doc/html/rfc9562#name-example-of-a-uuidv7-value.
   - The idempotency key must be globally unique (no reuse across different
   operations).
   - Catalogs SHOULD NOT expire keys before the end of the advertised token
   lifetime.
   - If Idempotency-Key is used, clients MUST reuse the same key when retrying
   the same logical operation and MUST generate a new key for a different
   operation.
   - Server behavior: Servers MUST validate the syntactic validity of UUIDv7
   (per RFC 9562). Servers MUST NOT make behavioral decisions based on the
   UUID’s internal timestamp fields. The idempotency key is an opaque, unique
   identifier used only for lookup/deduplication.
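
For illustration only, the "syntactic validation, no timestamp" rule could be
sketched in Python roughly as follows. The function names are hypothetical and
not part of the proposal or of any Iceberg or Polaris API. The sketch assumes
the key arrives in the hyphenated 8-4-4-4-12 text form, and it inspects only
the version and variant bits, never the embedded timestamp.

```python
import re

# Hyphenated 8-4-4-4-12 hex form. For UUIDv7 the version nibble is 7 and
# the variant bits are 10xx, so the fourth group starts with 8, 9, a or b.
_UUID_V7 = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-7[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}",
    re.IGNORECASE,
)

# Version-agnostic variant (Option 2): any UUID in the same text form.
_UUID_ANY = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def is_syntactically_valid_uuidv7(key: str) -> bool:
    """Option 1: accept only keys whose text form is a UUIDv7.

    The timestamp bits are deliberately not decoded or compared.
    """
    return _UUID_V7.fullmatch(key) is not None


def is_uuid_text(key: str) -> bool:
    """Option 2: accept any UUID version in hyphenated text form."""
    return _UUID_ANY.fullmatch(key) is not None
```

Under either option the server would then treat the accepted string as an
opaque lookup key.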

This reads a bit awkward to me: we mandate UUIDv7 but prohibit using its
timestamp, which seems to undercut the reason to require v7 in the first
place.

I’d appreciate feedback on whether we should:

*Option 1 — Require v7.*
Keep UUIDv7 *required*, with the server restrictions above (syntactic v7
validation only; no behavioral decisions based on the embedded timestamp).

*Option 2 — Version-agnostic.*
Make the client spec *version-agnostic* (require RFC 9562 UUID textual
form; allow v7 as a recommendation). Leave any timestamp/lifetime mechanics
to a *server-side (Polaris idempotency)* document.

Thanks again for the thoughtful discussion.

Best,

Huaxin

On Mon, Sep 29, 2025 at 5:47 PM Dmitri Bourlatchkov 
wrote:

> Hi Huaxin,
>
> Sorry about the delay. I posted some comments on
> https://github.com/apache/iceberg/pull/14196 Some of them I might have
> mentioned on the doc too, so apologies if they got answered in the doc and
> I missed it.
>
> Cheers,
> Dmitri.
>
> On Thu, Sep 25, 2025 at 12:27 PM huaxin gao 
> wrote:
>
>> Thank you all for taking the time to review and discuss! I’ve responded
>> to all questions and updated the proposal. If there are no additional
>> concerns, I’ll proceed to start a VOTE thread.
>>
>> Thanks,
>> Huaxin
>>
>> On Mon, Sep 22, 2025 at 1:30 AM Maninder Parmar <
>> [email protected]> wrote:
>>
>>> +1, for low level retry which ensures that the idempotent key is never
>>> committed twice. I also agree that canonicalizing the request body where
>>> the client can change it due to conflict resolution and retry would be hard
>>> to get right.
>>>
>>> On Sat, Sep 20, 2025 at 5:58 AM Dennis Huo  wrote:
>>>
 +1 to this being mostly targeting a "low-level" retry semantic.
 Expanding on that though I'd say even "client-side retries" really have two
 distinct flavors:

 A. Business-logic-agnostic retries, e.g. in a common low-level HTTP
 client library - behaviorally, these should behave largely the same as
 "network infra retries". The key distinction is that in this case any
 content hashing would be *post* serialization and even agnostic to
 request-body content-type (i.e. not JSON-specific).
 B. Application-specific retries, such as when Iceberg client will
 potentially rebase on a new snapshot

 I think this aligns with what Peter and others mentioned earlier where
 trying to canonicalize the *semantic* content of a request is probably
 brittle/risky. And as Yufei mentions, case 2.B (client-side real
 application-layer retries) should be using a new idempotency-key if it's
 ever doing the retry at the layer that requires re-serializing JSON.

 Overall though I agree making the content-hash checking optional is a
 good idea.

 On Fri, Sep 19, 2025 at 4:33 PM huaxin gao 
 wrote:

> Thanks, Peter and Yufei. I agree the main use case is
> network‑infrastructure retries. To keep the specification simple and move
> the proposal forward, let’s make the baseline key‑only idempotency. If
> there’s demand, we can add an optional payload‑binding mode (canonical 
> JSON
> + SHA‑256), advertised via /v1/config.
>
> Thanks,
>
> Huaxin
>
> On Fri, Sep 19, 2025 at 1:31 PM Yufei Gu  wrote:
>
>> "*Network infrastructure retries*" would be the dominant use case.
>> I'd NOT recommend clients retry with the same idempotency key if it
>> regen

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-10-17 Thread Dmitri Bourlatchkov
Hi Huaxin,

Sorry about the delay. I posted some comments on
https://github.com/apache/iceberg/pull/14196 Some of them I might have
mentioned on the doc too, so apologies if they got answered in the doc and
I missed it.

Cheers,
Dmitri.

On Thu, Sep 25, 2025 at 12:27 PM huaxin gao  wrote:

> Thank you all for taking the time to review and discuss! I’ve responded to
> all questions and updated the proposal. If there are no additional
> concerns, I’ll proceed to start a VOTE thread.
>
> Thanks,
> Huaxin
>
> On Mon, Sep 22, 2025 at 1:30 AM Maninder Parmar <
> [email protected]> wrote:
>
>> +1, for low level retry which ensures that the idempotent key is never
>> committed twice. I also agree that canonicalizing the request body where
>> the client can change it due to conflict resolution and retry would be hard
>> to get right.
>>
>> On Sat, Sep 20, 2025 at 5:58 AM Dennis Huo  wrote:
>>
>>> +1 to this being mostly targeting a "low-level" retry semantic.
>>> Expanding on that though I'd say even "client-side retries" really have two
>>> distinct flavors:
>>>
>>> A. Business-logic-agnostic retries, e.g. in a common low-level HTTP
>>> client library - behaviorally, these should behave largely the same as
>>> "network infra retries". The key distinction is that in this case any
>>> content hashing would be *post* serialization and even agnostic to
>>> request-body content-type (i.e. not JSON-specific).
>>> B. Application-specific retries, such as when Iceberg client will
>>> potentially rebase on a new snapshot
>>>
>>> I think this aligns with what Peter and others mentioned earlier where
>>> trying to canonicalize the *semantic* content of a request is probably
>>> brittle/risky. And as Yufei mentions, case 2.B (client-side real
>>> application-layer retries) should be using a new idempotency-key if it's
>>> ever doing the retry at the layer that requires re-serializing JSON.
>>>
>>> Overall though I agree making the content-hash checking optional is a
>>> good idea.
>>>
>>> On Fri, Sep 19, 2025 at 4:33 PM huaxin gao 
>>> wrote:
>>>
 Thanks, Peter and Yufei. I agree the main use case is
 network‑infrastructure retries. To keep the specification simple and move
 the proposal forward, let’s make the baseline key‑only idempotency. If
 there’s demand, we can add an optional payload‑binding mode (canonical JSON
 + SHA‑256), advertised via /v1/config.

 Thanks,

 Huaxin

 On Fri, Sep 19, 2025 at 1:31 PM Yufei Gu  wrote:

> "*Network infrastructure retries*" would be the dominant use case.
> I'd NOT recommend clients retry with the same idempotency key if it
> regenerated the request, instead, clients should reload before retry in
> that case.
>
> Yufei
>
>
> On Fri, Sep 19, 2025 at 2:05 AM Péter Váry <
> [email protected]> wrote:
>
>> Hi Huaxin,
>>
>> Could you clarify the specific use cases we intend to support
>> regarding retry checking? Here are a couple of possibilities I had in 
>> mind:
>>
>>- *Network infrastructure retries* – where the exact same request
>>is retried.
>>- *Client-side retries* – where the client regenerates the
>>request using the same program logic, resulting in identical content.
>>
>> If there are no security or other concerns, I’d suggest keeping the
>> specification simple and avoiding mechanisms that surface client-side
>> implementation errors. The cleanest approach might be to ignore the 
>> request
>> content and rely solely on a user-provided key.
>>
>> Alternatively, we could include an optional error code in the
>> response, which implementations may use to signal conflicts. The actual
>> conflict detection logic can be left to the implementations—we don’t need
>> to define it in the specification. If we go this route, we should also
>> offer a way to disable these checks, since there will inevitably be cases
>> where semantically identical requests are incorrectly flagged as
>> conflicting.
>>
>> Thanks,
>> Peter
>>
>> On Fri, Sep 19, 2025 at 1:38, huaxin gao wrote:
>>
>>> Thanks Steven for the +1 and for raising the fingerprint question!
>>> Great points!
>>>
>>> What we need to protect against:
>>>
>>>
>>>- Same logical request, different bytes across retries (pretty
>>>vs compact JSON, map key order, ...).
>>>- Accidental key reuse with a changed payload.
>>>
>>> Options and tradeoffs:
>>>
>>>
>>>- Exact byte checksum (e.g., SHA‑256 over raw body)
>>>   - Pro: trivial, fast
>>>   - Con: too strict; benign diffs cause false mismatches
>>>
>>>
>>>- Canonical JSON over full request, then hash (proposed)
>>>   - Pro: stable across whitespace/key order; simple to
>>>   implement for typed payloads
>>

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-09-25 Thread huaxin gao
Thank you all for taking the time to review and discuss! I’ve responded to
all questions and updated the proposal. If there are no additional
concerns, I’ll proceed to start a VOTE thread.

Thanks,
Huaxin

On Mon, Sep 22, 2025 at 1:30 AM Maninder Parmar <
[email protected]> wrote:

> +1, for low level retry which ensures that the idempotent key is never
> committed twice. I also agree that canonicalizing the request body where
> the client can change it due to conflict resolution and retry would be hard
> to get right.
>
> On Sat, Sep 20, 2025 at 5:58 AM Dennis Huo  wrote:
>
>> +1 to this being mostly targeting a "low-level" retry semantic. Expanding
>> on that though I'd say even "client-side retries" really have two distinct
>> flavors:
>>
>> A. Business-logic-agnostic retries, e.g. in a common low-level HTTP
>> client library - behaviorally, these should behave largely the same as
>> "network infra retries". The key distinction is that in this case any
>> content hashing would be *post* serialization and even agnostic to
>> request-body content-type (i.e. not JSON-specific).
>> B. Application-specific retries, such as when Iceberg client will
>> potentially rebase on a new snapshot
>>
>> I think this aligns with what Peter and others mentioned earlier where
>> trying to canonicalize the *semantic* content of a request is probably
>> brittle/risky. And as Yufei mentions, case 2.B (client-side real
>> application-layer retries) should be using a new idempotency-key if it's
>> ever doing the retry at the layer that requires re-serializing JSON.
>>
>> Overall though I agree making the content-hash checking optional is a
>> good idea.
>>
>> On Fri, Sep 19, 2025 at 4:33 PM huaxin gao 
>> wrote:
>>
>>> Thanks, Peter and Yufei. I agree the main use case is
>>> network‑infrastructure retries. To keep the specification simple and move
>>> the proposal forward, let’s make the baseline key‑only idempotency. If
>>> there’s demand, we can add an optional payload‑binding mode (canonical JSON
>>> + SHA‑256), advertised via /v1/config.
>>>
>>> Thanks,
>>>
>>> Huaxin
>>>
>>> On Fri, Sep 19, 2025 at 1:31 PM Yufei Gu  wrote:
>>>
 "*Network infrastructure retries*" would be the dominant use case. I'd
 NOT recommend clients retry with the same idempotency key if it regenerated
 the request, instead, clients should reload before retry in that case.

 Yufei


 On Fri, Sep 19, 2025 at 2:05 AM Péter Váry 
 wrote:

> Hi Huaxin,
>
> Could you clarify the specific use cases we intend to support
> regarding retry checking? Here are a couple of possibilities I had in 
> mind:
>
>- *Network infrastructure retries* – where the exact same request
>is retried.
>- *Client-side retries* – where the client regenerates the request
>using the same program logic, resulting in identical content.
>
> If there are no security or other concerns, I’d suggest keeping the
> specification simple and avoiding mechanisms that surface client-side
> implementation errors. The cleanest approach might be to ignore the 
> request
> content and rely solely on a user-provided key.
>
> Alternatively, we could include an optional error code in the
> response, which implementations may use to signal conflicts. The actual
> conflict detection logic can be left to the implementations—we don’t need
> to define it in the specification. If we go this route, we should also
> offer a way to disable these checks, since there will inevitably be cases
> where semantically identical requests are incorrectly flagged as
> conflicting.
>
> Thanks,
> Peter
>
> On Fri, Sep 19, 2025 at 1:38, huaxin gao wrote:
>
>> Thanks Steven for the +1 and for raising the fingerprint question!
>> Great points!
>>
>> What we need to protect against:
>>
>>
>>- Same logical request, different bytes across retries (pretty vs
>>compact JSON, map key order, ...).
>>- Accidental key reuse with a changed payload.
>>
>> Options and tradeoffs:
>>
>>
>>- Exact byte checksum (e.g., SHA‑256 over raw body)
>>   - Pro: trivial, fast
>>   - Con: too strict; benign diffs cause false mismatches
>>
>>
>>- Canonical JSON over full request, then hash (proposed)
>>   - Pro: stable across whitespace/key order; simple to implement
>>   for typed payloads
>>   - Con: slightly more work than raw checksum;
>>
>>
>>- Checksum of selected fields / field-by-field match
>>   - Pro: can be faster for huge payloads; can ignore noisy fields
>>   - Con: could miss legitimate differences
>>
>>
>>- Request digest/signature
>>   - Pro: very strong
>>   - Con: heavyweight
>>
>> Maybe we could make this configurable:
>>
>>
>

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-09-22 Thread Maninder Parmar
+1, for low level retry which ensures that the idempotent key is never
committed twice. I also agree that canonicalizing the request body where
the client can change it due to conflict resolution and retry would be hard
to get right.

On Sat, Sep 20, 2025 at 5:58 AM Dennis Huo  wrote:

> +1 to this being mostly targeting a "low-level" retry semantic. Expanding
> on that though I'd say even "client-side retries" really have two distinct
> flavors:
>
> A. Business-logic-agnostic retries, e.g. in a common low-level HTTP client
> library - behaviorally, these should behave largely the same as "network
> infra retries". The key distinction is that in this case any content
> hashing would be *post* serialization and even agnostic to request-body
> content-type (i.e. not JSON-specific).
> B. Application-specific retries, such as when Iceberg client will
> potentially rebase on a new snapshot
>
> I think this aligns with what Peter and others mentioned earlier where
> trying to canonicalize the *semantic* content of a request is probably
> brittle/risky. And as Yufei mentions, case 2.B (client-side real
> application-layer retries) should be using a new idempotency-key if it's
> ever doing the retry at the layer that requires re-serializing JSON.
>
> Overall though I agree making the content-hash checking optional is a good
> idea.
>
> On Fri, Sep 19, 2025 at 4:33 PM huaxin gao  wrote:
>
>> Thanks, Peter and Yufei. I agree the main use case is
>> network‑infrastructure retries. To keep the specification simple and move
>> the proposal forward, let’s make the baseline key‑only idempotency. If
>> there’s demand, we can add an optional payload‑binding mode (canonical JSON
>> + SHA‑256), advertised via /v1/config.
>>
>> Thanks,
>>
>> Huaxin
>>
>> On Fri, Sep 19, 2025 at 1:31 PM Yufei Gu  wrote:
>>
>>> "*Network infrastructure retries*" would be the dominant use case. I'd
>>> NOT recommend clients retry with the same idempotency key if it regenerated
>>> the request, instead, clients should reload before retry in that case.
>>>
>>> Yufei
>>>
>>>
>>> On Fri, Sep 19, 2025 at 2:05 AM Péter Váry 
>>> wrote:
>>>
 Hi Huaxin,

 Could you clarify the specific use cases we intend to support regarding
 retry checking? Here are a couple of possibilities I had in mind:

- *Network infrastructure retries* – where the exact same request
is retried.
- *Client-side retries* – where the client regenerates the request
using the same program logic, resulting in identical content.

 If there are no security or other concerns, I’d suggest keeping the
 specification simple and avoiding mechanisms that surface client-side
 implementation errors. The cleanest approach might be to ignore the request
 content and rely solely on a user-provided key.

 Alternatively, we could include an optional error code in the response,
 which implementations may use to signal conflicts. The actual conflict
 detection logic can be left to the implementations—we don’t need to define
 it in the specification. If we go this route, we should also offer a way to
 disable these checks, since there will inevitably be cases where
 semantically identical requests are incorrectly flagged as conflicting.

 Thanks,
 Peter

 On Fri, Sep 19, 2025 at 1:38, huaxin gao wrote:

> Thanks Steven for the +1 and for raising the fingerprint question!
> Great points!
>
> What we need to protect against:
>
>
>- Same logical request, different bytes across retries (pretty vs
>compact JSON, map key order, ...).
>- Accidental key reuse with a changed payload.
>
> Options and tradeoffs:
>
>
>- Exact byte checksum (e.g., SHA‑256 over raw body)
>   - Pro: trivial, fast
>   - Con: too strict; benign diffs cause false mismatches
>
>
>- Canonical JSON over full request, then hash (proposed)
>   - Pro: stable across whitespace/key order; simple to implement
>   for typed payloads
>   - Con: slightly more work than raw checksum;
>
>
>- Checksum of selected fields / field-by-field match
>   - Pro: can be faster for huge payloads; can ignore noisy fields
>   - Con: could miss legitimate differences
>
>
>- Request digest/signature
>   - Pro: very strong
>   - Con: heavyweight
>
> Maybe we could make this configurable:
>
>
>- canonical-json-sha256 (default)
>- raw-bytes-sha256 (strict)
>- trust-client-key (no fingerprint check)
>
> On the IETF draft status:
>
> I have also noted the draft’s expiry. We will align with its semantics
> for now and can adjust if a new version lands.
>
> Thanks,
>
> Huaxin
>
> On Thu, Sep 18, 2025 at 4:01 PM Steven Wu 
> wrote:
>
>> +1 for the fea

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-09-20 Thread Péter Váry
Thanks Huaxin for the proposal, and sorry for the late review - I had a bit
of a busy week.
I have one main question, which I have also added as a comment to the doc:
- Why do we try to compare the request contents when the Idempotency-Key is
the same for the requests? The comparison algorithm is a bit complicated,
and seems brittle to me. Differences in field ordering, map ordering, and
maybe even upper case/lower case letters might still mean technically the
same request.

In my previous roles (admittedly more than 10 years ago) I worked
extensively on APIs like this, and we never really succeeded in creating a
good enough "are these 2 requests really the same semantically" check.

I would simplify these requirements, unless there are serious arguments for
the existence of these checks:

   1. Either check for exact matches - without any magic - this could be
   used for detecting issues where the duplication happens on the network
   side, or
   2. Rely entirely on the clients to provide the correct Idempotency-Key.

I would prefer the 2nd.
Otherwise I agree with the contents of the proposal. It is nicely done!
(edited)
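
To make the second option concrete, here is a minimal Python sketch of
key-only deduplication: the first processed request wins and later requests
with the same key get the stored result back without re-execution. The class
and method names are hypothetical, the store is in-memory and not thread-safe,
and retention is measured from the server's own receive time rather than from
anything the client sends. It is a sketch under those assumptions, not a
description of any existing catalog implementation.

```python
import time
from typing import Callable, Dict, Tuple


class KeyOnlyIdempotencyStore:
    """Replays the first result recorded for a key; ignores the payload."""

    def __init__(
        self,
        lifetime_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lifetime = lifetime_seconds
        self._clock = clock  # injectable so tests can control time
        self._results: Dict[str, Tuple[float, object]] = {}

    def execute(self, key: str, operation: Callable[[], object]) -> object:
        now = self._clock()
        entry = self._results.get(key)
        # Replay while the key is still inside its retention window.
        if entry is not None and now - entry[0] < self._lifetime:
            return entry[1]
        # First use (or expired key): run the operation and remember it.
        result = operation()
        self._results[key] = (now, result)
        return result
```

A real server would also need to handle concurrent requests that carry the
same key and would persist entries across restarts; both are left out here.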

On Thu, Sep 18, 2025 at 2:54, Yufei Gu wrote:

> Thanks for the proposal. It's a nice feature to make retry more reliable
> and efficient. Left some comments.
>
> Yufei
>
>
> On Mon, Sep 15, 2025 at 3:53 PM Kevin Liu  wrote:
>
>>
>> Thanks for writing up the proposal! Makes sense to add idempotency to
>> mutation requests.
>>
>> It would be helpful to add this feature to both the catalog test
>> framework and the iceberg-rest-fixture.
>> The latter is used by the subprojects for testing and would come in handy
>> when we want to test out the client implementation.
>>
>> For other reviewers, the Stripe documentation on idempotency was a
>> helpful read, https://docs.stripe.com/api/idempotent_requests.
>>
>>
>> Best,
>> Kevin Liu
>>
>> On Mon, Sep 15, 2025 at 11:38 AM Szehon Ho 
>> wrote:
>>
>>> Hi,
>>>
>>> Sounds like fairly standard practice and makes sense to me in the first
>>> read.
>>>
>>> Thanks,
>>> Szehon
>>>
>>> On Mon, Sep 15, 2025 at 10:09 AM Russell Spitzer <
>>> [email protected]> wrote:
>>>
 I think based on the feedback on the proposal and in recent syncs we
 should probably move forward with the actual Spec Change PR so we can see
 what this looks like and move on to a discussion of how the Catalog test
 framework should test this.

 On 2025/08/22 18:26:23 huaxin gao wrote:
 > Hi all,
 >
 > I’d like to propose a change to Iceberg’s REST API to make mutation
 > requests safely retryable.
 >
 > *The Problem*
 > If a POST mutation (e.g., updateTable) succeeds in the catalog but the
 > client doesn’t receive the response (timeout, connection closed, etc.), a
 > second attempt can hit 409 Conflict. The client interprets the 409 as a
 > failed commit and deletes the associated metadata files, causing
 > catalog/storage inconsistency.
 >
 > *The Proposed Solution*
 > Introduces an optional Idempotency-Key HTTP header on REST mutation
 > endpoints and has the Iceberg client pass it through.
 >
 > *Semantics* (first processed request wins):
 >
 >    - Same key + same canonical payload -> return the original result (no
 >    re-execution).
 >    - Same key + different payload -> 422 (Unprocessable Content).
 >
 > *Capability discovery:* catalogs can advertise support and retention so
 > clients know when a retry is safe, e.g.
 >
 > {
 >   "idempotency-tokens-respected": true,
 >   "idempotency-token-lifetime": "30m"
 > }
 >
 > *Scope in Iceberg:* update the OpenAPI to include the header, and add
 > client pass-through + honoring capability discovery. No server
 > implementation is mandated—catalogs (e.g., Polaris) can implement
 > storage/TTL/replay as they choose.
 >
 > *Standards alignment:* uses the industry-standard header name and matches
 > the IETF HTTPAPI Idempotency-Key draft
 > <https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header>
 > semantics.
 >
 > *Compatibility:* fully backward compatible. Servers that don’t support it
 > can ignore the header; clients can detect support via capability discovery.
 >
 > Here is the proposal
 > <https://docs.google.com/document/d/1WyiIk08JRe8AjWh63txIP4i2xcIUHYQWFrF_1CCS3uw/edit?tab=t.0>.
 > Looking forward to your thoughts.
 >
 > Thanks,
 >
 > Huaxin
 >

>>>


Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-09-19 Thread huaxin gao
Thanks, Peter and Yufei. I agree the main use case is
network‑infrastructure retries. To keep the specification simple and move
the proposal forward, let’s make the baseline key‑only idempotency. If
there’s demand, we can add an optional payload‑binding mode (canonical JSON
+ SHA‑256), advertised via /v1/config.
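
As a rough illustration of what such an optional payload-binding mode might
compute, here is a minimal Python sketch. The function names are hypothetical,
and the canonical form used here (sorted keys, compact separators, UTF-8) is
only an assumption for the example; the thread does not fix a canonicalization
scheme, and this is not a full RFC 8785 implementation.

```python
import hashlib
import json


def raw_bytes_sha256(body: bytes) -> str:
    """Strict mode: hash the request body exactly as it was sent."""
    return hashlib.sha256(body).hexdigest()


def canonical_json_sha256(body: bytes) -> str:
    """Hash a canonical re-serialization of the JSON body.

    Sorting keys and removing optional whitespace makes the digest stable
    across pretty vs. compact output and across map key order.
    """
    parsed = json.loads(body)
    canonical = json.dumps(
        parsed, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With key-only idempotency as the baseline, neither digest would be computed;
the server would look up the Idempotency-Key alone.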

Thanks,

Huaxin

On Fri, Sep 19, 2025 at 1:31 PM Yufei Gu  wrote:

> "*Network infrastructure retries*" would be the dominant use case. I'd
> NOT recommend clients retry with the same idempotency key if it regenerated
> the request, instead, clients should reload before retry in that case.
>
> Yufei
>
>
> On Fri, Sep 19, 2025 at 2:05 AM Péter Váry 
> wrote:
>
>> Hi Huaxin,
>>
>> Could you clarify the specific use cases we intend to support regarding
>> retry checking? Here are a couple of possibilities I had in mind:
>>
>>- *Network infrastructure retries* – where the exact same request is
>>retried.
>>- *Client-side retries* – where the client regenerates the request
>>using the same program logic, resulting in identical content.
>>
>> If there are no security or other concerns, I’d suggest keeping the
>> specification simple and avoiding mechanisms that surface client-side
>> implementation errors. The cleanest approach might be to ignore the request
>> content and rely solely on a user-provided key.
>>
>> Alternatively, we could include an optional error code in the response,
>> which implementations may use to signal conflicts. The actual conflict
>> detection logic can be left to the implementations—we don’t need to define
>> it in the specification. If we go this route, we should also offer a way to
>> disable these checks, since there will inevitably be cases where
>> semantically identical requests are incorrectly flagged as conflicting.
>>
>> Thanks,
>> Peter
>>
>> On Fri, Sep 19, 2025 at 1:38, huaxin gao wrote:
>>
>>> Thanks Steven for the +1 and for raising the fingerprint question! Great
>>> points!
>>>
>>> What we need to protect against:
>>>
>>>
>>>- Same logical request, different bytes across retries (pretty vs
>>>compact JSON, map key order, ...).
>>>- Accidental key reuse with a changed payload.
>>>
>>> Options and tradeoffs:
>>>
>>>
>>>- Exact byte checksum (e.g., SHA‑256 over raw body)
>>>   - Pro: trivial, fast
>>>   - Con: too strict; benign diffs cause false mismatches
>>>
>>>
>>>- Canonical JSON over full request, then hash (proposed)
>>>   - Pro: stable across whitespace/key order; simple to implement
>>>   for typed payloads
>>>   - Con: slightly more work than raw checksum;
>>>
>>>
>>>- Checksum of selected fields / field-by-field match
>>>   - Pro: can be faster for huge payloads; can ignore noisy fields
>>>   - Con: could misses legitimate differences
>>>
>>>
>>>- Request digest/signature
>>>   - Pro: very strong
>>>   - Con: heavyweight
>>>
>>> Maybe we could make this configurable:
>>>
>>>
>>>- canonical-json-sha256 (default)
>>>- raw-bytes-sha256 (strict)
>>>- trust-client-key (no fingerprint check)
>>>
>>> On the IETF draft status:
>>>
>>> I have also noted the draft’s expiry. We will align with its semantics
>>> for now and can adjust if a new version lands.
>>>
>>> Thanks,
>>>
>>> Huaxin
>>>

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-09-19 Thread Dennis Huo
+1 to this being mostly targeted at a "low-level" retry semantic. Expanding
on that, though, I'd say even "client-side retries" really have two distinct
flavors:

A. Business-logic-agnostic retries, e.g. in a common low-level HTTP client
library - behaviorally, these should behave largely the same as "network
infra retries". The key distinction is that in this case any content
hashing would be *post* serialization and even agnostic to request-body
content-type (i.e. not JSON-specific).
B. Application-specific retries, such as when an Iceberg client potentially
rebases on a new snapshot.

I think this aligns with what Peter and others mentioned earlier, where
trying to canonicalize the *semantic* content of a request is probably
brittle/risky. And as Yufei mentions, case 2.B (client-side real
application-layer retries) should use a new idempotency-key if it's
ever doing the retry at the layer that requires re-serializing JSON.

Overall though I agree making the content-hash checking optional is a good
idea.
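
To make the distinction for flavor B concrete, here is a minimal Python
sketch of an application-level retry that reloads state, rebuilds the
request, and therefore mints a fresh key on every attempt. All names here
(commit_with_rebase, load_state, build_body, send) are hypothetical, and
treating HTTP 409 as the "rebase and try again" signal is an assumption of
this sketch, not something settled in this thread.

```python
import uuid
from typing import Callable, Tuple


def commit_with_rebase(
    load_state: Callable[[], dict],
    build_body: Callable[[dict], bytes],
    send: Callable[[str, bytes], int],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Application-level retry: every regenerated request gets a new key.

    Hypothetical helper. `send(key, body)` stands in for the HTTP call and
    returns a status code; 409 is assumed to mean "conflict, rebase needed".
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    key, status = "", 0
    for _ in range(max_attempts):
        state = load_state()       # reload before regenerating the request
        body = build_body(state)   # re-serialization happens at this layer
        key = str(uuid.uuid4())    # new payload, so a new idempotency key
        status = send(key, body)
        if status != 409:
            break
    return key, status
```

Low-level retries of the exact same bytes (flavor A) would sit inside
`send` and reuse the key it was given.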

On Fri, Sep 19, 2025 at 4:33 PM huaxin gao  wrote:

> Thanks, Peter and Yufei. I agree the main use case is
> network‑infrastructure retries. To keep the specification simple and move
> the proposal forward, let’s make the baseline key‑only idempotency. If
> there’s demand, we can add an optional payload‑binding mode (canonical JSON
> + SHA‑256), advertised via /v1/config.
>
> Thanks,
>
> Huaxin
>
> On Fri, Sep 19, 2025 at 1:31 PM Yufei Gu  wrote:
>
>> "*Network infrastructure retries*" would be the dominant use case. I'd
>> NOT recommend clients retry with the same idempotency key if it regenerated
>> the request, instead, clients should reload before retry in that case.
>>
>> Yufei
>>
>>
>> On Fri, Sep 19, 2025 at 2:05 AM Péter Váry 
>> wrote:
>>
>>> Hi Huaxin,
>>>
>>> Could you clarify the specific use cases we intend to support regarding
>>> retry checking? Here are a couple of possibilities I had in mind:
>>>
>>>- *Network infrastructure retries* – where the exact same request is
>>>retried.
>>>- *Client-side retries* – where the client regenerates the request
>>>using the same program logic, resulting in identical content.
>>>
>>> If there are no security or other concerns, I’d suggest keeping the
>>> specification simple and avoiding mechanisms that surface client-side
>>> implementation errors. The cleanest approach might be to ignore the request
>>> content and rely solely on a user-provided key.
>>>
>>> Alternatively, we could include an optional error code in the response,
>>> which implementations may use to signal conflicts. The actual conflict
>>> detection logic can be left to the implementations—we don’t need to define
>>> it in the specification. If we go this route, we should also offer a way to
>>> disable these checks, since there will inevitably be cases where
>>> semantically identical requests are incorrectly flagged as conflicting.
>>>
>>> Thanks,
>>> Peter
>>>

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-09-19 Thread Yufei Gu
"*Network infrastructure retries*" would be the dominant use case. I'd NOT
recommend that clients retry with the same idempotency key if they regenerated
the request; instead, clients should reload before retrying in that case.

Yufei


On Fri, Sep 19, 2025 at 2:05 AM Péter Váry 
wrote:

> Hi Huaxin,
>
> Could you clarify the specific use cases we intend to support regarding
> retry checking? Here are a couple of possibilities I had in mind:
>
>- *Network infrastructure retries* – where the exact same request is
>retried.
>- *Client-side retries* – where the client regenerates the request
>using the same program logic, resulting in identical content.
>
> If there are no security or other concerns, I’d suggest keeping the
> specification simple and avoiding mechanisms that surface client-side
> implementation errors. The cleanest approach might be to ignore the request
> content and rely solely on a user-provided key.
>
> Alternatively, we could include an optional error code in the response,
> which implementations may use to signal conflicts. The actual conflict
> detection logic can be left to the implementations—we don’t need to define
> it in the specification. If we go this route, we should also offer a way to
> disable these checks, since there will inevitably be cases where
> semantically identical requests are incorrectly flagged as conflicting.
>
> Thanks,
> Peter
>
> On Fri, Sep 19, 2025 at 1:38, huaxin gao wrote:
>
>> Thanks Steven for the +1 and for raising the fingerprint question! Great
>> points!
>>
>> What we need to protect against:
>>
>>
>>- Same logical request, different bytes across retries (pretty vs
>>compact JSON, map key order, ...).
>>- Accidental key reuse with a changed payload.
>>
>> Options and tradeoffs:
>>
>>
>>- Exact byte checksum (e.g., SHA‑256 over raw body)
>>   - Pro: trivial, fast
>>   - Con: too strict; benign diffs cause false mismatches
>>
>>
>>- Canonical JSON over full request, then hash (proposed)
>>   - Pro: stable across whitespace/key order; simple to implement for
>>   typed payloads
>>   - Con: slightly more work than raw checksum;
>>
>>
>>- Checksum of selected fields / field-by-field match
>>   - Pro: can be faster for huge payloads; can ignore noisy fields
>>   - Con: could misses legitimate differences
>>
>>
>>- Request digest/signature
>>   - Pro: very strong
>>   - Con: heavyweight
>>
>> Maybe we could make this configurable:
>>
>>
>>- canonical-json-sha256 (default)
>>- raw-bytes-sha256 (strict)
>>- trust-client-key (no fingerprint check)
>>
>> On the IETF draft status:
>>
>> I have also noted the draft’s expiry. We will align with its semantics
>> for now and can adjust if a new version lands.
>>
>> Thanks,
>>
>> Huaxin
>>
>> On Thu, Sep 18, 2025 at 4:01 PM Steven Wu  wrote:
>>
>>> +1 for the feature that can make retry safe for 500s and improve the
>>> client fault-tolerance of transient server failures.
>>>
>>> Peter and Dimitri raised a good question on the fingerprint. The IETF
>>> draft doesn't actually define the fingerprint algo. We can also go with
>>> simple checksum of the entire request payload, which would be cheap to
>>> compute. Do we anticipate any anticipated scenarios where clients may
>>> rewrite the payload in different forms of serialized bytes during retries?
>>>
>>>*  Checksum of the entire request payload.
>>>*  Checksum of selected element(s) in the request payload.
>>>*  Field value match for each field in the request payload.
>>>*  Field value match for selected element(s) in the request payload.
>>>*  Request digest/signature
>>>
>>>
>>> BTW, the IETF draft seems to have expired without approval
>>>
>>> https://datatracker.ietf.org/doc/draft-ietf-httpapi-idempotency-key-header/
>>>

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-09-19 Thread Péter Váry
Hi Huaxin,

Could you clarify the specific use cases we intend to support regarding
retry checking? Here are a couple of possibilities I had in mind:

   - *Network infrastructure retries* – where the exact same request is
   retried.
   - *Client-side retries* – where the client regenerates the request using
   the same program logic, resulting in identical content.

If there are no security or other concerns, I’d suggest keeping the
specification simple and avoiding mechanisms that surface client-side
implementation errors. The cleanest approach might be to ignore the request
content and rely solely on a user-provided key.

Alternatively, we could include an optional error code in the response,
which implementations may use to signal conflicts. The actual conflict
detection logic can be left to the implementations—we don’t need to define
it in the specification. If we go this route, we should also offer a way to
disable these checks, since there will inevitably be cases where
semantically identical requests are incorrectly flagged as conflicting.

Thanks,
Peter

On Fri, Sep 19, 2025 at 1:38, huaxin gao wrote:

> Thanks Steven for the +1 and for raising the fingerprint question! Great
> points!
>
> What we need to protect against:
>
>
>- Same logical request, different bytes across retries (pretty vs
>compact JSON, map key order, ...).
>- Accidental key reuse with a changed payload.
>
> Options and tradeoffs:
>
>
>- Exact byte checksum (e.g., SHA‑256 over raw body)
>   - Pro: trivial, fast
>   - Con: too strict; benign diffs cause false mismatches
>
>
>- Canonical JSON over full request, then hash (proposed)
>   - Pro: stable across whitespace/key order; simple to implement for
>   typed payloads
>   - Con: slightly more work than raw checksum;
>
>
>- Checksum of selected fields / field-by-field match
>   - Pro: can be faster for huge payloads; can ignore noisy fields
>   - Con: could misses legitimate differences
>
>
>- Request digest/signature
>   - Pro: very strong
>   - Con: heavyweight
>
> Maybe we could make this configurable:
>
>
>- canonical-json-sha256 (default)
>- raw-bytes-sha256 (strict)
>- trust-client-key (no fingerprint check)
>
> On the IETF draft status:
>
> I have also noted the draft’s expiry. We will align with its semantics for
> now and can adjust if a new version lands.
>
> Thanks,
>
> Huaxin
>
> On Thu, Sep 18, 2025 at 4:01 PM Steven Wu  wrote:
>
>> +1 for the feature that can make retry safe for 500s and improve the
>> client fault-tolerance of transient server failures.
>>
>> Peter and Dimitri raised a good question on the fingerprint. The IETF
>> draft doesn't actually define the fingerprint algo. We can also go with
>> simple checksum of the entire request payload, which would be cheap to
>> compute. Do we anticipate any anticipated scenarios where clients may
>> rewrite the payload in different forms of serialized bytes during retries?
>>
>>*  Checksum of the entire request payload.
>>*  Checksum of selected element(s) in the request payload.
>>*  Field value match for each field in the request payload.
>>*  Field value match for selected element(s) in the request payload.
>>*  Request digest/signature
>>
>>
>> BTW, the IETF draft seems to have expired without approval
>>
>> https://datatracker.ietf.org/doc/draft-ietf-httpapi-idempotency-key-header/
>>
>> On Thu, Sep 18, 2025 at 3:46 PM huaxin gao 
>> wrote:
>>
>>> Thanks Peter and Dmitri for the thoughtful feedback! I really appreciate
>>> you taking a close look at my proposal. I agree that "semantic equality" is
>>> tricky, that's why the scope here is intentionally narrow.
>>>
>>> Just to clarify scope: I’m not trying to solve general semantic
>>> equivalence. For these specific, typed request payloads, I serialize to a
>>> deterministic JSON and hash it. That normalizes benign diffs (map order,
>>> whitespace) without trying to infer meaning. The goal is a stable
>>> fingerprint so that if a key is accidentally reused with a changed payload,
>>> we surface that instead of silently diverging.
>>>
>>> To make this feel less brittle, I’ll add tests for the practical cases
>>> (ordering/whitespace, nested maps, a clear null‑vs‑missing rule, numeric
>>> formatting), plus end‑to‑end tests in the in‑memory REST fixture with
>>> failure injection (in‑flight dup, finalize failure -> reconcile, etc.).
>>> Happy to walk through these if helpful.
>>>
>>> I’m also open to adding a config switch for “trust‑client‑key only” if
>>> that’s preferred in some environments. My intent is to stay aligned with
>>> the IETF Idempotency‑Key guidance (first request wins; conflicting reuse is
>>> rejected, and reusing a key with a different request payload is rejected
>>> via an idempotency fingerprint) while keeping things as simple as possible
>>> and protecting us from accidental key misuse. Would love to align on the
>>> lightest approach 

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-09-18 Thread huaxin gao
Thanks Steven for the +1 and for raising the fingerprint question! Great
points!

What we need to protect against:


   - Same logical request, different bytes across retries (pretty vs
   compact JSON, map key order, ...).
   - Accidental key reuse with a changed payload.

Options and tradeoffs:


   - Exact byte checksum (e.g., SHA‑256 over raw body)
      - Pro: trivial, fast
      - Con: too strict; benign diffs cause false mismatches

   - Canonical JSON over full request, then hash (proposed)
      - Pro: stable across whitespace/key order; simple to implement for
      typed payloads
      - Con: slightly more work than raw checksum

   - Checksum of selected fields / field-by-field match
      - Pro: can be faster for huge payloads; can ignore noisy fields
      - Con: could miss legitimate differences

   - Request digest/signature
      - Pro: very strong
      - Con: heavyweight

Maybe we could make this configurable:


   - canonical-json-sha256 (default)
   - raw-bytes-sha256 (strict)
   - trust-client-key (no fingerprint check)
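
As a rough illustration of the three modes above, here is a minimal Python
sketch. The function name `fingerprint` is hypothetical, and using
`json.dumps(..., sort_keys=True)` with compact separators as the canonical
form is an assumption for illustration; the proposal does not pin down a
canonicalization algorithm here.

```python
import hashlib
import json
from typing import Optional


def fingerprint(body: bytes, mode: str = "canonical-json-sha256") -> Optional[str]:
    """Return a hex fingerprint for a request body, or None if checks are off.

    Hypothetical helper: the mode names mirror the list above, but the
    canonical form (sorted keys, compact separators) is this sketch's choice.
    """
    if mode == "trust-client-key":
        return None  # no fingerprint check; the key alone identifies the request
    if mode == "raw-bytes-sha256":
        return hashlib.sha256(body).hexdigest()  # strict: any byte change differs
    if mode == "canonical-json-sha256":
        # Re-serialize with sorted keys and no insignificant whitespace.
        canonical = json.dumps(json.loads(body), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    raise ValueError(f"unknown fingerprint mode: {mode}")


# Same logical request, different bytes (key order and whitespace).
compact = b'{"name":"t","properties":{"a":"1","b":"2"}}'
pretty = b'{\n  "properties": {"b": "2", "a": "1"},\n  "name": "t"\n}'

same_canonical = fingerprint(compact) == fingerprint(pretty)
same_raw = fingerprint(compact, "raw-bytes-sha256") == fingerprint(pretty, "raw-bytes-sha256")
print(same_canonical, same_raw)  # prints: True False
```

The canonical mode tolerates the benign diffs; the raw-bytes mode flags
them as mismatches, which is the "too strict" con noted above.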

On the IETF draft status:

I have also noted the draft’s expiry. We will align with its semantics for
now and can adjust if a new version lands.

Thanks,

Huaxin

On Thu, Sep 18, 2025 at 4:01 PM Steven Wu  wrote:

> +1 for the feature that can make retry safe for 500s and improve the
> client fault-tolerance of transient server failures.
>
> Peter and Dimitri raised a good question on the fingerprint. The IETF
> draft doesn't actually define the fingerprint algo. We can also go with
> simple checksum of the entire request payload, which would be cheap to
> compute. Do we anticipate any anticipated scenarios where clients may
> rewrite the payload in different forms of serialized bytes during retries?
>
>*  Checksum of the entire request payload.
>*  Checksum of selected element(s) in the request payload.
>*  Field value match for each field in the request payload.
>*  Field value match for selected element(s) in the request payload.
>*  Request digest/signature
>
>
> BTW, the IETF draft seems to have expired without approval
> https://datatracker.ietf.org/doc/draft-ietf-httpapi-idempotency-key-header/
>
> On Thu, Sep 18, 2025 at 3:46 PM huaxin gao  wrote:
>
>> Thanks Peter and Dmitri for the thoughtful feedback! I really appreciate
>> you taking a close look at my proposal. I agree that "semantic equality" is
>> tricky, that's why the scope here is intentionally narrow.
>>
>> Just to clarify scope: I’m not trying to solve general semantic
>> equivalence. For these specific, typed request payloads, I serialize to a
>> deterministic JSON and hash it. That normalizes benign diffs (map order,
>> whitespace) without trying to infer meaning. The goal is a stable
>> fingerprint so that if a key is accidentally reused with a changed payload,
>> we surface that instead of silently diverging.
>>
>> To make this feel less brittle, I’ll add tests for the practical cases
>> (ordering/whitespace, nested maps, a clear null‑vs‑missing rule, numeric
>> formatting), plus end‑to‑end tests in the in‑memory REST fixture with
>> failure injection (in‑flight dup, finalize failure -> reconcile, etc.).
>> Happy to walk through these if helpful.
>>
>> I’m also open to adding a config switch for “trust‑client‑key only” if
>> that’s preferred in some environments. My intent is to stay aligned with
>> the IETF Idempotency‑Key guidance (first request wins; conflicting reuse is
>> rejected, and reusing a key with a different request payload is rejected
>> via an idempotency fingerprint) while keeping things as simple as possible
>> and protecting us from accidental key misuse. Would love to align on the
>> lightest approach that meets those goals.
>>
>> Thanks,
>>
>> Huaxin
>>
>> On Thu, Sep 18, 2025 at 6:17 AM Dmitri Bourlatchkov 
>> wrote:
>>
>>> Hi All,
>>>
>>> I agree that checking request contents is almost redundant in this case.
>>>
>>> If the randomness quality of Idempotency-Key value is good, collisions
>>> are very unlikely on the server side. Given that, any content checks the
>>> server performs are essentially validating that clients correctly reuse the
>>> generated Idempotency-Key value. (this is mostly the same as my comment on
>>> the related Polaris discussion).
>>>
>>> I'd like to propose making the content check optional so that servers
>>> may or may not implement it according to their design principles and
>>> constraints and emphasizing that clients should use unique keys (e.g.
>>> UUIDs)... basically going with option 2 from Peter's email.
>>>
>>> I believe this is in line with the SHOULD word used for this case in the
>>> IETF draft [1] (section 2.7).
>>>
>>> [1]
>>> https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-06
>>>
>>> Thanks,
>>> Dmitri.
>>>

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-09-18 Thread Steven Wu
+1 for the feature, which can make retries safe for 500s and improve client
fault tolerance for transient server failures.

Peter and Dmitri raised a good question on the fingerprint. The IETF draft
doesn't actually define the fingerprint algo. We can also go with a simple
checksum of the entire request payload, which would be cheap to compute. Do
we anticipate any scenarios where clients may rewrite the payload in
different forms of serialized bytes during retries?

   *  Checksum of the entire request payload.
   *  Checksum of selected element(s) in the request payload.
   *  Field value match for each field in the request payload.
   *  Field value match for selected element(s) in the request payload.
   *  Request digest/signature
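
For the "selected element(s)" options in the list above, a minimal Python
sketch could look like the following. The function name and the example
field names (name, location, note) are hypothetical and only illustrate the
idea; they are not taken from the REST spec.

```python
import hashlib
import json
from typing import Iterable


def selected_fields_checksum(body: bytes, fields: Iterable[str]) -> str:
    """SHA-256 over a chosen subset of top-level JSON fields (hypothetical).

    Note: dict.get() maps a missing field to None, so this sketch does not
    distinguish "missing" from an explicit JSON null.
    """
    payload = json.loads(body)
    subset = {name: payload.get(name) for name in sorted(fields)}
    encoded = json.dumps(subset, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


first = b'{"name":"t","location":"s3://x","note":"first"}'
retry = b'{"note":"second","location":"s3://x","name":"t"}'

# The "note" field is ignored, so both requests produce the same checksum.
print(selected_fields_checksum(first, ["name", "location"])
      == selected_fields_checksum(retry, ["name", "location"]))
```

The trade-off is visible here: any difference outside the selected fields
goes undetected.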


BTW, the IETF draft seems to have expired without approval
https://datatracker.ietf.org/doc/draft-ietf-httpapi-idempotency-key-header/

On Thu, Sep 18, 2025 at 3:46 PM huaxin gao  wrote:

> Thanks Peter and Dmitri for the thoughtful feedback! I really appreciate
> you taking a close look at my proposal. I agree that "semantic equality" is
> tricky, that's why the scope here is intentionally narrow.
>
> Just to clarify scope: I’m not trying to solve general semantic
> equivalence. For these specific, typed request payloads, I serialize to a
> deterministic JSON and hash it. That normalizes benign diffs (map order,
> whitespace) without trying to infer meaning. The goal is a stable
> fingerprint so that if a key is accidentally reused with a changed payload,
> we surface that instead of silently diverging.
>
> To make this feel less brittle, I’ll add tests for the practical cases
> (ordering/whitespace, nested maps, a clear null‑vs‑missing rule, numeric
> formatting), plus end‑to‑end tests in the in‑memory REST fixture with
> failure injection (in‑flight dup, finalize failure -> reconcile, etc.).
> Happy to walk through these if helpful.
>
> I’m also open to adding a config switch for “trust‑client‑key only” if
> that’s preferred in some environments. My intent is to stay aligned with
> the IETF Idempotency‑Key guidance (first request wins; conflicting reuse is
> rejected, and reusing a key with a different request payload is rejected
> via an idempotency fingerprint) while keeping things as simple as possible
> and protecting us from accidental key misuse. Would love to align on the
> lightest approach that meets those goals.
>
> Thanks,
>
> Huaxin
>
> On Thu, Sep 18, 2025 at 6:17 AM Dmitri Bourlatchkov 
> wrote:
>
>> Hi All,
>>
>> I agree that checking request contents is almost redundant in this case.
>>
>> If the randomness quality of Idempotency-Key value is good, collisions
>> are very unlikely on the server side. Given that, any content checks the
>> server performs are essentially validating that clients correctly reuse the
>> generated Idempotency-Key value. (this is mostly the same as my comment on
>> the related Polaris discussion).
>>
>> I'd like to propose making the content check optional so that servers may
>> or may not implement it according to their design principles and
>> constraints and emphasizing that clients should use unique keys (e.g.
>> UUIDs)... basically going with option 2 from Peter's email.
>>
>> I believe this is in line with the SHOULD word used for this case in the
>> IETF draft [1] (section 2.7).
>>
>> [1]
>> https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-06
>>
>> Thanks,
>> Dmitri.
>>
>> On Thu, Sep 18, 2025 at 7:56 AM Péter Váry 
>> wrote:
>>
>>> Thanks Huaxin for the proposal, and sorry for the late review - I had a
>>> bit of a busy week.
>>> I have one main question, which I have also added as a comment to the
>>> doc:
>>> - Why do we try to compare the request contents when the Idempotency-Key
>>> is the same for the requests? The comparison algorithm is a bit
>>> complicated, and seems brittle to me. Consistent field ordering, maps, and
>>> maybe even inconsistency in upper case/lower case letters might mean
>>> technically the same request.
>>>
>>> In my previous roles (admittedly more than 10 years ago) I was
>>> extensively working on APIs like this, and we have never really succeeded
>>> in creating a good enough "are these 2 requests are really the same
>>> semantically" checks.
>>>
>>> I would simplify these requirements, unless there are serious arguments
>>> for the existence of these checks:
>>>
>>>1. Either check for exact matches - without any magic - this could
>>>be used for detecting issues where the duplication happens on the network
>>>side, or
>>>2. Rely entirely on the clients to provide the correct
>>>Idempotency-Key.
>>>
>>> I would prefer the 2nd.
>>> Otherwise I agree with the contents of the proposal. It is nicely done!
>>> (edited)
>>>
>>> On Thu, Sep 18, 2025 at 2:54, Yufei Gu wrote:
>>>
>>>> Thanks for the proposal. It's a nice feature to make retry more
>>>> reliable and efficient. Left some comments.
>>>>
>>>> Yufei

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-09-18 Thread huaxin gao
Thanks Peter and Dmitri for the thoughtful feedback! I really appreciate
you taking a close look at my proposal. I agree that "semantic equality" is
tricky; that's why the scope here is intentionally narrow.

Just to clarify scope: I’m not trying to solve general semantic
equivalence. For these specific, typed request payloads, I serialize to a
deterministic JSON and hash it. That normalizes benign diffs (map order,
whitespace) without trying to infer meaning. The goal is a stable
fingerprint so that if a key is accidentally reused with a changed payload,
we surface that instead of silently diverging.

To make this feel less brittle, I’ll add tests for the practical cases
(ordering/whitespace, nested maps, a clear null‑vs‑missing rule, numeric
formatting), plus end‑to‑end tests in the in‑memory REST fixture with
failure injection (in‑flight dup, finalize failure -> reconcile, etc.).
Happy to walk through these if helpful.

I’m also open to adding a config switch for “trust‑client‑key only” if
that’s preferred in some environments. My intent is to stay aligned with
the IETF Idempotency‑Key guidance (first request wins; conflicting reuse is
rejected, and reusing a key with a different request payload is rejected
via an idempotency fingerprint) while keeping things as simple as possible
and protecting us from accidental key misuse. Would love to align on the
lightest approach that meets those goals.
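
To sketch the behavior described above (first request wins, a replay
returns the stored result, and key reuse with a different payload is
rejected), here is a minimal in-memory Python example. The class and
exception names are hypothetical, the `check_payload` flag stands in for
the "trust-client-key only" switch, and the sketch deliberately ignores
in-flight duplicates, concurrency, and key expiry.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class IdempotencyConflict(Exception):
    """Hypothetical error: a key was reused with a different payload."""


@dataclass
class Entry:
    fingerprint: Optional[str]
    response: str


class IdempotencyStore:
    """In-memory sketch; a real server would need durable, concurrent storage."""

    def __init__(self, check_payload: bool = True) -> None:
        self.check_payload = check_payload  # False ~ "trust-client-key only"
        self.entries: Dict[str, Entry] = {}

    def handle(self, key: str, fp: Optional[str], execute: Callable[[], str]) -> str:
        entry = self.entries.get(key)
        if entry is None:
            response = execute()  # first request wins and is executed once
            self.entries[key] = Entry(fp, response)
            return response
        if self.check_payload and entry.fingerprint != fp:
            raise IdempotencyConflict(f"key {key!r} reused with a different payload")
        return entry.response  # replay the stored result without re-executing


store = IdempotencyStore()
executions = []


def commit() -> str:
    executions.append("commit")
    return "committed"


store.handle("key-1", "fp-a", commit)
store.handle("key-1", "fp-a", commit)  # replayed, not executed again
print(len(executions))  # prints: 1
```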

Thanks,

Huaxin

On Thu, Sep 18, 2025 at 6:17 AM Dmitri Bourlatchkov 
wrote:

> Hi All,
>
> I agree that checking request contents is almost redundant in this case.
>
> If the randomness quality of Idempotency-Key value is good, collisions are
> very unlikely on the server side. Given that, any content checks the server
> performs are essentially validating that clients correctly reuse the
> generated Idempotency-Key value. (this is mostly the same as my comment on
> the related Polaris discussion).
>
> I'd like to propose making the content check optional so that servers may
> or may not implement it according to their design principles and
> constraints and emphasizing that clients should use unique keys (e.g.
> UUIDs)... basically going with option 2 from Peter's email.
>
> I believe this is in line with the SHOULD word used for this case in the
> IETF draft [1] (section 2.7).
>
> [1]
> https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-06
>
> Thanks,
> Dmitri.
>
> On Thu, Sep 18, 2025 at 7:56 AM Péter Váry 
> wrote:
>
>> Thanks Huaxin for the proposal, and sorry for the late review - I had a
>> bit of a busy week.
>> I have one main question, which I have also added as a comment to the doc:
>> - Why do we try to compare the request contents when the Idempotency-Key
>> is the same for the requests? The comparison algorithm is a bit
>> complicated, and seems brittle to me. Consistent field ordering, maps, and
>> maybe even inconsistency in upper case/lower case letters might mean
>> technically the same request.
>>
>> In my previous roles (admittedly more than 10 years ago) I was
>> extensively working on APIs like this, and we have never really succeeded
>> in creating a good enough "are these 2 requests are really the same
>> semantically" checks.
>>
>> I would simplify these requirements, unless there are serious arguments
>> for the existence of these checks:
>>
>>1. Either check for exact matches - without any magic - this could be
>>used for detecting issues where the duplication happens on the network
>>side, or
>>2. Rely entirely on the clients to provide the correct
>>Idempotency-Key.
>>
>> I would prefer the 2nd.
>> Otherwise I agree with the contents of the proposal. It is nicely done!
>> (edited)
>>
>> On Thu, Sep 18, 2025 at 2:54, Yufei Gu wrote:
>>
>>> Thanks for the proposal. It's a nice feature to make retry more reliable
>>> and efficient. Left some comments.
>>>
>>> Yufei
>>>
>>>
>>> On Mon, Sep 15, 2025 at 3:53 PM Kevin Liu  wrote:
>>>

>>>> Thanks for writing up the proposal! Makes sense to add idempotency to
>>>> mutation requests.
>>>>
>>>> It would be helpful to add this feature to both the catalog test
>>>> framework and the iceberg-rest-fixture.
>>>> The latter is used by the subprojects for testing and would come in handy
>>>> when we want to test out the client implementation.
>>>>
>>>> For other reviewers, the Stripe documentation on idempotency was a
>>>> helpful read, https://docs.stripe.com/api/idempotent_requests.
>>>>
>>>>
>>>> Best,
>>>> Kevin Liu
>>>>
>>>> On Mon, Sep 15, 2025 at 11:38 AM Szehon Ho 
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Sounds like fairly standard practice and makes sense to me in the
>>>>> first read.
>>>>>
>>>>> Thanks,
>>>>> Szehon
>>>>>
>>>>> On Mon, Sep 15, 2025 at 10:09 AM Russell Spitzer <
>>>>> [email protected]

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-09-18 Thread Dmitri Bourlatchkov
Hi All,

I agree that checking request contents is almost redundant in this case.

If the randomness quality of Idempotency-Key value is good, collisions are
very unlikely on the server side. Given that, any content checks the server
performs are essentially validating that clients correctly reuse the
generated Idempotency-Key value. (this is mostly the same as my comment on
the related Polaris discussion).
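
As a rough illustration of "very unlikely": the sketch below estimates the collision chance with the birthday bound, assuming clients send random version-4 UUIDs (122 random bits). The function name and the key counts are assumptions picked for illustration, not part of the proposal.

```python
import math

UUID4_RANDOM_BITS = 122  # a version-4 UUID fixes 6 of its 128 bits


def collision_probability(num_keys: int, random_bits: int = UUID4_RANDOM_BITS) -> float:
    """Birthday-bound estimate of at least one collision among num_keys random keys."""
    space = 2.0 ** random_bits
    # p ~ 1 - exp(-n(n-1) / (2N)); expm1 keeps precision when p is tiny.
    return -math.expm1(-num_keys * (num_keys - 1) / (2.0 * space))


if __name__ == "__main__":
    # Hypothetical numbers of keys retained by a server at the same time.
    for n in (10**6, 10**9, 10**12):
        print(f"{n:>17,d} keys -> p ~ {collision_probability(n):.3e}")
```

The estimate only holds if keys really are uniformly random; it says nothing about clients that reuse a key by mistake, which is the case a content check would catch.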

I'd like to propose making the content check optional so that servers may
or may not implement it according to their design principles and
constraints and emphasizing that clients should use unique keys (e.g.
UUIDs)... basically going with option 2 from Peter's email.
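
A minimal sketch of what an optional content check could look like on the server side. Everything here is hypothetical: the `IdempotencyStore` class, the in-memory dict, the `check_payload` switch, and the key-sorted JSON fingerprint are assumptions for illustration, not something the proposal or the IETF draft prescribes.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, dict]


@dataclass
class StoredResult:
    fingerprint: str
    status: int
    body: dict
    expires_at: float


@dataclass
class IdempotencyStore:
    """Hypothetical in-memory store; a real catalog would persist entries."""
    lifetime_seconds: float = 30 * 60
    check_payload: bool = False  # the optional content check
    _entries: Dict[str, StoredResult] = field(default_factory=dict)

    @staticmethod
    def fingerprint(payload: dict) -> str:
        # Exact match on a key-sorted JSON encoding; no deeper semantic comparison.
        encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(encoded.encode("utf-8")).hexdigest()

    def handle(self, key: str, payload: dict,
               execute: Callable[[dict], Response],
               now: Optional[float] = None) -> Response:
        now = time.time() if now is None else now
        fp = self.fingerprint(payload)
        entry = self._entries.get(key)
        if entry is not None and entry.expires_at > now:
            if self.check_payload and entry.fingerprint != fp:
                return 422, {"error": "Idempotency-Key reused with a different payload"}
            return entry.status, entry.body  # replay the first result, no re-execution
        status, body = execute(payload)  # first request (or expired key): run it
        self._entries[key] = StoredResult(fp, status, body, now + self.lifetime_seconds)
        return status, body
```

With `check_payload=False` the server trusts the client's key entirely (option 2); with `check_payload=True` it answers 422 when the same key arrives with a different payload.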

I believe this is in line with the SHOULD word used for this case in the
IETF draft [1] (section 2.7).

[1]
https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-06

Thanks,
Dmitri.

On Thu, Sep 18, 2025 at 7:56 AM Péter Váry 
wrote:

> Thanks Huaxin for the proposal, and sorry for the late review - I had a
> bit of a busy week.
> I have one main question, which I have also added as a comment to the doc:
> - Why do we try to compare the request contents when the Idempotency-Key
> is the same for the requests? The comparison algorithm is a bit
> complicated, and seems brittle to me. Requests that differ only in field
> ordering, map ordering, or maybe even upper case/lower case letters might
> still be technically the same request.
>
> In my previous roles (admittedly more than 10 years ago) I worked
> extensively on APIs like this, and we never really succeeded in creating a
> good enough "are these two requests really the same semantically" check.
>
> I would simplify these requirements, unless there are serious arguments
> for the existence of these checks:
>
>    1. Either check for exact matches - without any magic - this could be
>       used for detecting issues where the duplication happens on the
>       network side, or
>    2. Rely entirely on the clients to provide the correct Idempotency-Key.
>
> I would prefer the 2nd.
> Otherwise I agree with the contents of the proposal. It is nicely done!
> (edited)
>
> Yufei Gu wrote (on Thu, Sep 18, 2025, 2:54):
>
>> Thanks for the proposal. It's a nice feature to make retry more reliable
>> and efficient. Left some comments.
>>
>> Yufei
>>
>>
>> On Mon, Sep 15, 2025 at 3:53 PM Kevin Liu  wrote:
>>
>>>
>>> Thanks for writing up the proposal! Makes sense to add idempotency to
>>> mutation requests.
>>>
>>> It would be helpful to add this feature to both the catalog test
>>> framework and the iceberg-rest-fixture.
>>> The latter is used by the subprojects for testing and would come in handy
>>> when we want to test out the client implementation.
>>>
>>> For other reviewers, the Stripe documentation on idempotency was a
>>> helpful read, https://docs.stripe.com/api/idempotent_requests.
>>>
>>>
>>> Best,
>>> Kevin Liu
>>>
>>> On Mon, Sep 15, 2025 at 11:38 AM Szehon Ho 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Sounds like fairly standard practice and makes sense to me in the first
>>>> read.
>>>>
>>>> Thanks,
>>>> Szehon
>>>>
>>>> On Mon, Sep 15, 2025 at 10:09 AM Russell Spitzer wrote:
>>>>
>>>>> I think based on the feedback on the proposal and in recent syncs we
>>>>> should probably move forward with the actual Spec Change PR so we can see
>>>>> what this looks like and move on to a discussion of how the Catalog test
>>>>> framework should test this.
>>>>>
>>>>> On 2025/08/22 18:26:23 huaxin gao wrote:
>>>>> > Hi all,
>>>>> >
>>>>> > I’d like to propose a change to Iceberg’s REST API to make mutation
>>>>> > requests safely retryable.
>>>>> >
>>>>> > *The Problem*
>>>>> > If a POST mutation (e.g., updateTable) succeeds in the catalog but the
>>>>> > client doesn’t receive the response (timeout, connection closed, etc.),
>>>>> > a second attempt can hit 409 Conflict. The client interprets the 409 as
>>>>> > a failed commit and deletes the associated metadata files, causing
>>>>> > catalog/storage inconsistency.
>>>>> >
>>>>> > *The Proposed Solution*
>>>>> > Introduces an optional Idempotency-Key HTTP header on REST mutation
>>>>> > endpoints and has the Iceberg client pass it through.
>>>>> >
>>>>> > *Semantics *(first processed request wins):
>>>>> >
>>>>> >    - Same key + same canonical payload -> return the original result
>>>>> >      (no re-execution).
>>>>> >    - Same key + different payload -> 422 (Unprocessable Content).
>>>>> >
>>>>> > *Capability discovery:* catalogs can advertise support and retention so
>>>>> > clients know when a retry is safe, e.g.
>>>>> >
>>>>> > {
>>>>> >   "idempotency-tokens-respected": true,
>>>>> >   "idempotency-token-lifetime": "30m"
>>>>> > }
>>>>> >
>>>>> > *Sco

Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-09-17 Thread Yufei Gu
Thanks for the proposal. It's a nice feature to make retry more reliable
and efficient. Left some comments.

Yufei


On Mon, Sep 15, 2025 at 3:53 PM Kevin Liu  wrote:

>
> Thanks for writing up the proposal! Makes sense to add idempotency to
> mutation requests.
>
> It would be helpful to add this feature to both the catalog test framework
> and the iceberg-rest-fixture.
> The latter is used by the subprojects for testing and would come in handy
> when we want to test out the client implementation.
>
> For other reviewers, the Stripe documentation on idempotency was a helpful
> read, https://docs.stripe.com/api/idempotent_requests.
>
>
> Best,
> Kevin Liu
>
> On Mon, Sep 15, 2025 at 11:38 AM Szehon Ho 
> wrote:
>
>> Hi,
>>
>> Sounds like fairly standard practice and makes sense to me in the first
>> read.
>>
>> Thanks,
>> Szehon
>>
>> On Mon, Sep 15, 2025 at 10:09 AM Russell Spitzer wrote:
>>
>>> I think based on the feedback on the proposal and in recent syncs we
>>> should probably move forward with the actual Spec Change PR so we can see
>>> what this looks like and move on to a discussion of how the Catalog test
>>> framework should test this.
>>>
>>> On 2025/08/22 18:26:23 huaxin gao wrote:
>>> > Hi all,
>>> >
>>> > I’d like to propose a change to Iceberg’s REST API to make mutation
>>> > requests safely retryable.
>>> >
>>> > *The Problem*
>>> > If a POST mutation (e.g., updateTable) succeeds in the catalog but the
>>> > client doesn’t receive the response (timeout, connection closed, etc.),
>>> > a second attempt can hit 409 Conflict. The client interprets the 409 as
>>> > a failed commit and deletes the associated metadata files, causing
>>> > catalog/storage inconsistency.
>>> >
>>> > *The Proposed Solution*
>>> > Introduces an optional Idempotency-Key HTTP header on REST mutation
>>> > endpoints and has the Iceberg client pass it through.
>>> >
>>> > *Semantics *(first processed request wins):
>>> >
>>> >    - Same key + same canonical payload -> return the original result
>>> >      (no re-execution).
>>> >    - Same key + different payload -> 422 (Unprocessable Content).
>>> >
>>> > *Capability discovery:* catalogs can advertise support and retention so
>>> > clients know when a retry is safe, e.g.
>>> >
>>> > {
>>> >   "idempotency-tokens-respected": true,
>>> >   "idempotency-token-lifetime": "30m"
>>> > }
>>> >
>>> > *Scope in Iceberg:* update the OpenAPI to include the header, and add
>>> > client pass-through + honoring capability discovery. No server
>>> > implementation is mandated—catalogs (e.g., Polaris) can implement
>>> > storage/TTL/replay as they choose.
>>> >
>>> > *Standards alignment:* uses the industry-standard header name and
>>> > matches the IETF HTTPAPI Idempotency-Key draft
>>> > <https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header>
>>> > semantics.
>>> >
>>> > *Compatibility:* fully backward compatible. Servers that don’t support
>>> > it can ignore the header; clients can detect support via capability
>>> > discovery.
>>> >
>>> > Here is the proposal
>>> > <https://docs.google.com/document/d/1WyiIk08JRe8AjWh63txIP4i2xcIUHYQWFrF_1CCS3uw/edit?tab=t.0>.
>>> > Looking forward to your thoughts.
>>> >
>>> > Thanks,
>>> >
>>> > Huaxin
>>> >
>>>
>>


Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-09-15 Thread Kevin Liu
Thanks for writing up the proposal! Makes sense to add idempotency to
mutation requests.

It would be helpful to add this feature to both the catalog test framework
and the iceberg-rest-fixture.
The latter is used by the subprojects for testing and would come in handy
when we want to test out the client implementation.

For other reviewers, the Stripe documentation on idempotency was a helpful
read, https://docs.stripe.com/api/idempotent_requests.


Best,
Kevin Liu

On Mon, Sep 15, 2025 at 11:38 AM Szehon Ho  wrote:

> Hi,
>
> Sounds like fairly standard practice and makes sense to me in the first
> read.
>
> Thanks,
> Szehon
>
> On Mon, Sep 15, 2025 at 10:09 AM Russell Spitzer wrote:
>
>> I think based on the feedback on the proposal and in recent syncs we
>> should probably move forward with the actual Spec Change PR so we can see
>> what this looks like and move on to a discussion of how the Catalog test
>> framework should test this.
>>
>> On 2025/08/22 18:26:23 huaxin gao wrote:
>> > Hi all,
>> >
>> > I’d like to propose a change to Iceberg’s REST API to make mutation
>> > requests safely retryable.
>> >
>> > *The Problem*
>> > If a POST mutation (e.g., updateTable) succeeds in the catalog but the
>> > client doesn’t receive the response (timeout, connection closed, etc.),
>> > a second attempt can hit 409 Conflict. The client interprets the 409 as
>> > a failed commit and deletes the associated metadata files, causing
>> > catalog/storage inconsistency.
>> >
>> > *The Proposed Solution*
>> > Introduces an optional Idempotency-Key HTTP header on REST mutation
>> > endpoints and has the Iceberg client pass it through.
>> >
>> > *Semantics *(first processed request wins):
>> >
>> >    - Same key + same canonical payload -> return the original result
>> >      (no re-execution).
>> >    - Same key + different payload -> 422 (Unprocessable Content).
>> >
>> > *Capability discovery:* catalogs can advertise support and retention so
>> > clients know when a retry is safe, e.g.
>> >
>> > {
>> >   "idempotency-tokens-respected": true,
>> >   "idempotency-token-lifetime": "30m"
>> > }
>> >
>> > *Scope in Iceberg:* update the OpenAPI to include the header, and add
>> > client pass-through + honoring capability discovery. No server
>> > implementation is mandated—catalogs (e.g., Polaris) can implement
>> > storage/TTL/replay as they choose.
>> >
>> > *Standards alignment:* uses the industry-standard header name and
>> > matches the IETF HTTPAPI Idempotency-Key draft
>> > <https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header>
>> > semantics.
>> >
>> > *Compatibility:* fully backward compatible. Servers that don’t support
>> > it can ignore the header; clients can detect support via capability
>> > discovery.
>> >
>> > Here is the proposal
>> > <https://docs.google.com/document/d/1WyiIk08JRe8AjWh63txIP4i2xcIUHYQWFrF_1CCS3uw/edit?tab=t.0>.
>> > Looking forward to your thoughts.
>> >
>> > Thanks,
>> >
>> > Huaxin
>> >
>>
>


Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-09-15 Thread Szehon Ho
Hi,

Sounds like fairly standard practice and makes sense to me in the first
read.

Thanks,
Szehon

On Mon, Sep 15, 2025 at 10:09 AM Russell Spitzer 
wrote:

> I think based on the feedback on the proposal and in recent syncs we
> should probably move forward with the actual Spec Change PR so we can see
> what this looks like and move on to a discussion of how the Catalog test
> framework should test this.
>
> On 2025/08/22 18:26:23 huaxin gao wrote:
> > Hi all,
> >
> > I’d like to propose a change to Iceberg’s REST API to make mutation
> > requests safely retryable.
> >
> > *The Problem*
> > If a POST mutation (e.g., updateTable) succeeds in the catalog but the
> > client doesn’t receive the response (timeout, connection closed, etc.), a
> > second attempt can hit 409 Conflict. The client interprets the 409 as a
> > failed commit and deletes the associated metadata files, causing
> > catalog/storage inconsistency.
> >
> > *The Proposed Solution*
> > Introduces an optional Idempotency-Key HTTP header on REST mutation
> > endpoints and has the Iceberg client pass it through.
> >
> > *Semantics *(first processed request wins):
> >
> >    - Same key + same canonical payload -> return the original result
> >      (no re-execution).
> >    - Same key + different payload -> 422 (Unprocessable Content).
> >
> > *Capability discovery:* catalogs can advertise support and retention so
> > clients know when a retry is safe, e.g.
> >
> > {
> >   "idempotency-tokens-respected": true,
> >   "idempotency-token-lifetime": "30m"
> > }
> >
> > *Scope in Iceberg:* update the OpenAPI to include the header, and add
> > client pass-through + honoring capability discovery. No server
> > implementation is mandated—catalogs (e.g., Polaris) can implement
> > storage/TTL/replay as they choose.
> >
> > *Standards alignment:* uses the industry-standard header name and matches
> > the IETF HTTPAPI Idempotency-Key draft
> > <https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header>
> > semantics.
> >
> > *Compatibility:* fully backward compatible. Servers that don’t support it
> > can ignore the header; clients can detect support via capability
> > discovery.
> >
> > Here is the proposal
> > <https://docs.google.com/document/d/1WyiIk08JRe8AjWh63txIP4i2xcIUHYQWFrF_1CCS3uw/edit?tab=t.0>.
> > Looking forward to your thoughts.
> >
> > Thanks,
> >
> > Huaxin
> >
>


Re: [DISCUSS] Iceberg REST Catalog Idempotency

2025-09-15 Thread Russell Spitzer
I think based on the feedback on the proposal and in recent syncs we should 
probably move forward with the actual Spec Change PR so we can see what this 
looks like and move on to a discussion of how the Catalog test framework should 
test this.

On 2025/08/22 18:26:23 huaxin gao wrote:
> Hi all,
> 
> I’d like to propose a change to Iceberg’s REST API to make mutation
> requests safely retryable.
> 
> *The Problem*
> If a POST mutation (e.g., updateTable) succeeds in the catalog but the
> client doesn’t receive the response (timeout, connection closed, etc.), a
> second attempt can hit 409 Conflict. The client interprets the 409 as a
> failed commit and deletes the associated metadata files, causing
> catalog/storage inconsistency.
> 
> *The Proposed Solution*
> Introduces an optional Idempotency-Key HTTP header on REST mutation
> endpoints and has the Iceberg client pass it through.
> 
> *Semantics *(first processed request wins):
> 
>    - Same key + same canonical payload -> return the original result
>      (no re-execution).
>    - Same key + different payload -> 422 (Unprocessable Content).
>
> *Capability discovery:* catalogs can advertise support and retention so
> clients know when a retry is safe, e.g.
>
> {
>   "idempotency-tokens-respected": true,
>   "idempotency-token-lifetime": "30m"
> }
> 
> *Scope in Iceberg:* update the OpenAPI to include the header, and add
> client pass-through + honoring capability discovery. No server
> implementation is mandated—catalogs (e.g., Polaris) can implement
> storage/TTL/replay as they choose.
> 
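> *Standards alignment:* uses the industry-standard header name and matches
> the IETF HTTPAPI Idempotency-Key draft
> <https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header>
> semantics.
>
> *Compatibility:* fully backward compatible. Servers that don’t support it
> can ignore the header; clients can detect support via capability discovery.
>
> Here is the proposal
> <https://docs.google.com/document/d/1WyiIk08JRe8AjWh63txIP4i2xcIUHYQWFrF_1CCS3uw/edit?tab=t.0>.
> Looking forward to your thoughts.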
> 
> Thanks,
> 
> Huaxin
>
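
For readers following the capability-discovery part of the quoted proposal, here is a rough client-side sketch of how the two advertised fields might be used. The field names come from the example above; the `"30m"`-style duration grammar, the helper names, and the `RetryKey` class are assumptions for illustration only, and the final spec may define the lifetime format differently.

```python
import re
import time
import uuid
from dataclasses import dataclass
from typing import Optional

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_lifetime(text: str) -> int:
    """Parse a duration such as "30m" into seconds (assumed format)."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported lifetime: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


@dataclass
class RetryKey:
    value: str            # sent as the Idempotency-Key header on every retry
    first_used: float     # client-local clock at first use
    lifetime_seconds: int

    def usable(self, now: float) -> bool:
        # Stop reusing the key once the advertised lifetime has elapsed locally.
        return now - self.first_used < self.lifetime_seconds


def new_retry_key(config: dict, now: Optional[float] = None) -> Optional[RetryKey]:
    """Return a fresh key if the catalog advertises support, else None."""
    if not config.get("idempotency-tokens-respected", False):
        return None
    lifetime = parse_lifetime(config.get("idempotency-token-lifetime", "0s"))
    now = time.time() if now is None else now
    return RetryKey(str(uuid.uuid4()), now, lifetime)
```

A client would call `new_retry_key` once per logical mutation, send `key.value` with each attempt, and fall back to refreshing the operation state once `key.usable(...)` returns false.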