Re: [DISCUSS] Describing REST Server capabilities

2024-08-14 Thread Eduard Tudenhöfner
@Walaa: Thanks, those are all good feedback points. I realized that the
capabilities probably add more complexity than necessary.
As you mentioned, in the end we really want to know which endpoints a
server supports.

I've created a new design doc and opened a separate discussion thread for
that topic.

Thanks
Eduard

On Tue, Aug 6, 2024 at 5:37 AM Walaa Eldin Moustafa 
wrote:

> Catching up here.
>
> From Eduard's doc [1], it seems that at the end of the day, the
> capability boils down to whether an endpoint is implemented by the
> server or not. Therefore, I feel we could simplify things by skipping
> the categorization/grouping (e.g., tables, views, UDFs, etc.) and just
> allow servers to declare whether an endpoint is implemented or not.
> We could have a discussion around how to assign identities to
> endpoints. I think skipping the categorization has some benefits:
> * It removes one concept that servers and client implementations need
> to be aware of. It also removes one level of indirection.
> * It transitively removes the "capability version" concept, since with
> the capabilities solution, it seems within each category we should
> group some APIs into versions. It is not clear what the process is
> around creating versions and what group of APIs qualify as a version.
> It sounds like there will be a dedicated process to create those.
> * It allows us to future-proof the concept of capabilities since it
> will be clearly correlated with endpoints and we do not have to come
> up with new terms for new capabilities and retro-fit them to existing
> capabilities.
>
> I am trying to see if we could simplify how to think about
> compatibility in Iceberg since there are quite a few moving parts
> such as Iceberg library version number, table format version number,
> REST spec version number, REST endpoint root version number, etc, all
> of those interacting with reader, writer, server versions (which is a
> topic that I think is worth discussing but possibly in another
> thread).
>
> Thanks,
> Walaa.
>
> [1]
> https://docs.google.com/document/d/1F1xh6SJhS-opgWRe1pPvWh01j8VHNHRocfCCFttKNf0/edit
>


Re: [DISCUSS] Describing REST Server capabilities

2024-08-05 Thread Walaa Eldin Moustafa
Catching up here.

From Eduard's doc [1], it seems that at the end of the day, the
capability boils down to whether an endpoint is implemented by the
server or not. Therefore, I feel we could simplify things by skipping
the categorization/grouping (e.g., tables, views, UDFs, etc.) and just
allow servers to declare whether an endpoint is implemented or not.
We could have a discussion around how to assign identities to
endpoints. I think skipping the categorization has some benefits:
* It removes one concept that servers and client implementations need
to be aware of. It also removes one level of indirection.
* It transitively removes the "capability version" concept, since with
the capabilities solution, it seems within each category we should
group some APIs into versions. It is not clear what the process is
around creating versions and what group of APIs qualify as a version.
It sounds like there will be a dedicated process to create those.
* It allows us to future-proof the concept of capabilities since it
will be clearly correlated with endpoints and we do not have to come
up with new terms for new capabilities and retro-fit them to existing
capabilities.

I am trying to see if we could simplify how to think about
compatibility in Iceberg since there are quite a few moving parts
such as Iceberg library version number, table format version number,
REST spec version number, REST endpoint root version number, etc, all
of those interacting with reader, writer, server versions (which is a
topic that I think is worth discussing but possibly in another
thread).

Thanks,
Walaa.

[1]  
https://docs.google.com/document/d/1F1xh6SJhS-opgWRe1pPvWh01j8VHNHRocfCCFttKNf0/edit
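
To make the "servers just declare which endpoints they implement" idea
concrete, here is a minimal sketch. It assumes (hypothetically) that the
config response carries an `endpoints` list of "<METHOD> <path template>"
strings; neither that field name nor the string format is settled anywhere
in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Identity of an endpoint: HTTP method plus path template."""
    method: str
    path: str

    @classmethod
    def parse(cls, text):
        # "GET /v1/{prefix}/namespaces" -> Endpoint("GET", "/v1/{prefix}/namespaces")
        method, path = text.split(" ", 1)
        return cls(method.upper(), path)


def supported_endpoints(config_response):
    # "endpoints" is a hypothetical field name used only for illustration.
    return {Endpoint.parse(e) for e in config_response.get("endpoints", [])}


def supports(endpoints, method, path):
    return Endpoint(method.upper(), path) in endpoints
```

A client would build the set once from the config response and consult
`supports(...)` before calling an optional endpoint, with no category or
capability version in between.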


Re: [DISCUSS] Describing REST Server capabilities

2024-07-31 Thread Dmitri Bourlatchkov
> endpoint version should bump (e.g. GET /v1/namespaces to GET
/v2/namespaces) when there is a significant backwards incompatible change.

That makes sense to me too.

> (2) version the entire catalog spec. A released catalog spec version will
contain a list of configs it supports, and also a set of APIs and all
features embedded in the APIs. A server will report the specific catalog
version it adheres to, and then document the nuances.

This is very similar to the approach Nessie takes [1] and it worked quite
well over a chain of releases that included major API and behaviour changes.

In that regard, when applied to Iceberg I'd think the "Catalog spec
version" would define a complete set of behaviours expected from compliant
servers when accessed via any API version.

When a new server-side feature/behaviour is introduced, the spec version is
increased. If the new feature can be expressed in terms of existing API
(possibly with backward-compatible changes), the API version stays the
same. If a new API version is necessary, the spec should mention that
compliant servers should support that API version.

Each REST server implementation would report one specific Catalog spec
version that it adheres to (as a simple config property).

If we want to isolate server-side features within a spec version, that
should only require one boolean config property per feature (e.g. multi
table commit) since the behavioural nuances will be defined by the spec
version itself.

Ideally the Catalog spec should apply to all catalogs (not just to REST).
For example (linking the Hadoop discussion [2]), if a catalog is not able
to provide atomicity, but the catalog spec requires that, the catalog
implementation would also indicate that via a boolean flag.

I think this approach would work well with TCKs. Basically TCKs can be
written based on the Catalog spec (potentially a different set of TCKs per
spec version).

Also, having a textual description of server behaviours should help people
that write "think clients" that utilize an Iceberg catalog via one of the
language-specific libraries.

Cheers,
Dmitri.

[1] https://github.com/projectnessie/nessie/blob/main/api/NESSIE-SPEC-2-0.md
[2] https://lists.apache.org/thread/oohcjfp1vpo005h2r0f6gfpsp6op0qps
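
As a rough illustration of "one reported spec version plus one boolean per
isolated feature", a client-side check might look like the sketch below. The
property names `catalog-spec-version` and `multi-table-commit-enabled`, and
the rule that a feature defaults to enabled once the reported spec version
includes it, are assumptions made for this example only.

```python
def parse_spec_version(text):
    """Turn "1.2.0" into (1, 2, 0) so versions compare as tuples."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def server_supports(config, feature_flag, introduced_in):
    # "catalog-spec-version" is a hypothetical config property.
    reported = parse_spec_version(config.get("catalog-spec-version", "0.0.0"))
    if reported < parse_spec_version(introduced_in):
        # The spec version the server adheres to predates the feature.
        return False
    # Assumption: the feature is on unless the server opts out explicitly.
    return config.get(feature_flag, "true").lower() == "true"
```

The behavioural details of each feature would live in the spec text for that
version; the flag only says whether the server opted out.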

On Wed, Jul 31, 2024 at 10:06 AM Jack Ye  wrote:

> One thing to clarify, regarding per-endpoint versioning, my understanding
> is that endpoint version should bump (e.g. GET /v1/namespaces to GET
> /v2/namespaces) when there is a significant backwards incompatible change.
>
> -Jack
>
> On Tue, Jul 30, 2024 at 7:56 PM Jack Ye  wrote:
>
>> > are you talking about the endpoint path like "/v1/"
>>
>> No, I mean in general the catalog spec version, which is marked currently
>> as 0.0.1:
>> https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L27
>>
>> And in my mind it would be you periodically release a version of the
>> catalog spec, just like you release Iceberg java libraries and pyiceberg.
>> And catalog providers will choose to upgrade to that version, and decide if
>> anything has changed for their existing support, add new support for new
>> features, document new restrictions and limitations. Basically just go
>> through a typical product lifecycle when integrating with the spec.
>>
>> -Jack
>>
>>
>> On Tue, Jul 30, 2024 at 12:24 PM Steven Wu  wrote:
>>
>>> >  (2) version the entire catalog spec. A released catalog spec version
>>> will contain a list of configs it supports, and also a set of APIs and all
>>> features embedded in the APIs. A server will report the specific catalog
>>> version it adheres to, and then document the nuances.
>>>
>>> Jack, just to clarify, are you talking about the endpoint path like
>>> "/v1/"? Also, does that mean every API/feature addition would require a
>>> catalog version bump?
>>>
>>> On Tue, Jul 30, 2024 at 8:34 AM Jack Ye  wrote:
>>>
>>>> Since the catalog sync was canceled this week, I find maybe it is
>>>> better to reply here for my latest take on this topic.
>>>>
>>>> I think we have 2 discussions intertwined here, that I would like to
>>>> decouple if possible.
>>>>
>>>> (1) is it worth having a concept of capabilities to control client
>>>> behaviors?
>>>> (2) suppose we introduce capabilities, is it worth having versioned
>>>> capabilities?
>>>>
>>>> Personally speaking I am currently still more inclined to not have
>>>> capabilities. An alternative here is to keep doing what has been done for
>>>> metrics API, which is to introduce feature flags like
>>>> rest-metrics-reporting-enabled. One strong argument I saw for this
>>>> alternative is that a feature flag can express non-binary options. For
>>>> capabilities, you are bound to say just whether the server has this
>>>> capability or not. But what we really want is to control client behavior
>>>> based on the capability. And for that, there could be multiple options for
>>>> the client to interact with the server in existence/absence of a feature.
>>>> F

Re: [DISCUSS] Describing REST Server capabilities

2024-07-31 Thread Jack Ye
One thing to clarify, regarding per-endpoint versioning, my understanding
is that endpoint version should bump (e.g. GET /v1/namespaces to GET
/v2/namespaces) when there is a significant backwards incompatible change.

-Jack

On Tue, Jul 30, 2024 at 7:56 PM Jack Ye  wrote:

> > are you talking about the endpoint path like "/v1/"
>
> No, I mean in general the catalog spec version, which is marked currently
> as 0.0.1:
> https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L27
>
> And in my mind it would be you periodically release a version of the
> catalog spec, just like you release Iceberg java libraries and pyiceberg.
> And catalog providers will choose to upgrade to that version, and decide if
> anything has changed for their existing support, add new support for new
> features, document new restrictions and limitations. Basically just go
> through a typical product lifecycle when integrating with the spec.
>
> -Jack
>
>
> On Tue, Jul 30, 2024 at 12:24 PM Steven Wu  wrote:
>
>> >  (2) version the entire catalog spec. A released catalog spec version
>> will contain a list of configs it supports, and also a set of APIs and all
>> features embedded in the APIs. A server will report the specific catalog
>> version it adheres to, and then document the nuances.
>>
>> Jack, just to clarify, are you talking about the endpoint path like
>> "/v1/"? Also, does that mean every API/feature addition would require a
>> catalog version bump?
>>
>> On Tue, Jul 30, 2024 at 8:34 AM Jack Ye  wrote:
>>
>>> Since the catalog sync was canceled this week, I find maybe it is better
>>> to reply here for my latest take on this topic.
>>>
>>> I think we have 2 discussions intertwined here, that I would like to
>>> decouple if possible.
>>>
>>> (1) is it worth having a concept of capabilities to control client
>>> behaviors?
>>> (2) suppose we introduce capabilities, is it worth having versioned
>>> capabilities?
>>>
>>> Personally speaking I am currently still more inclined to not have
>>> capabilities. An alternative here is to keep doing what has been done for
>>> metrics API, which is to introduce feature flags like
>>> rest-metrics-reporting-enabled. One strong argument I saw for this
>>> alternative is that a feature flag can express non-binary options. For
>>> capabilities, you are bound to say just whether the server has this
>>> capability or not. But what we really want is to control client behavior
>>> based on the capability. And for that, there could be multiple options for
>>> the client to interact with the server in existence/absence of a feature.
>>> For example, for multi-table commit, there could be 2 different behaviors
>>> when the server does not support the endpoint, (1) fail the operation
>>> early, (2) fallback to use single-table commit for each table.
>>>
>>> And with this alternative, there is of course no versioned capabilities.
>>> But I think the reason we want versioned capabilities is because we want a
>>> general versioning story for the catalog spec with forward and backward
>>> compatibility guarantees. If that is the goal, why not: (1) acknowledge the
>>> feature flag configs as a part of the spec, (2) version the entire catalog
>>> spec. A released catalog spec version will contain a list of configs it
>>> supports, and also a set of APIs and all features embedded in the APIs. A
>>> server will report the specific catalog version it adheres to, and then
>>> document the nuances. I feel this would put catalog providers in a more
>>> comfortable situation, as they now have a stable catalog spec to adhere to
>>> as the basis, that does not just automatically evolve within the same
>>> version. They can implement a catalog spec and upgrade at their own pace
>>> following a common versioning semantics. They will also report whatever
>>> level of support and detailed behaviors they want, without the need to tie
>>> specific behaviors to different capabilities.
>>>
>>> I think we have been spending quite a long time on this topic, but this
>>> is so fundamental that I feel we should think through the alternatives.
>>> Would it be possible to at least document in the design proposal why the
>>> alternatives are not desirable, what are the pros and cons?
>>>
>>> -Jack
>>>
>>> On Tue, Jul 16, 2024 at 5:47 AM Eduard Tudenhöfner wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> I've written up
>>>> https://docs.google.com/document/d/1F1xh6SJhS-opgWRe1pPvWh01j8VHNHRocfCCFttKNf0/edit
>>>> to
>>>> provide an easier way of giving feedback to the proposal.
>>>> Please take a look so that we can discuss how we'd like to handle the
>>>> default fallback behavior (*tables* vs *everything that's currently in
>>>> the spec*) when a newer client talks to an older server.
>>>>
>>>>
>>>> Eduard
>>>>
>>>> On Mon, Jul 15, 2024 at 4:24 PM Dmitri Bourlatchkov
>>>> wrote:
>>>>
>>>>> So I would argue to define the current set of APIs and

Re: [DISCUSS] Describing REST Server capabilities

2024-07-30 Thread Jack Ye
> are you talking about the endpoint path like "/v1/"

No, I mean in general the catalog spec version, which is marked currently
as 0.0.1:
https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L27

In my mind, you would periodically release a version of the catalog
spec, just like you release the Iceberg Java libraries and PyIceberg.
And catalog providers will choose to upgrade to that version, and decide if
anything has changed for their existing support, add new support for new
features, document new restrictions and limitations. Basically just go
through a typical product lifecycle when integrating with the spec.

-Jack


On Tue, Jul 30, 2024 at 12:24 PM Steven Wu  wrote:

> >  (2) version the entire catalog spec. A released catalog spec version
> will contain a list of configs it supports, and also a set of APIs and all
> features embedded in the APIs. A server will report the specific catalog
> version it adheres to, and then document the nuances.
>
> Jack, just to clarify, are you talking about the endpoint path like
> "/v1/"? Also, does that mean every API/feature addition would require a
> catalog version bump?
>
> On Tue, Jul 30, 2024 at 8:34 AM Jack Ye  wrote:
>
>> Since the catalog sync was canceled this week, I find maybe it is better
>> to reply here for my latest take on this topic.
>>
>> I think we have 2 discussions intertwined here, that I would like to
>> decouple if possible.
>>
>> (1) is it worth having a concept of capabilities to control client
>> behaviors?
>> (2) suppose we introduce capabilities, is it worth having versioned
>> capabilities?
>>
>> Personally speaking I am currently still more inclined to not have
>> capabilities. An alternative here is to keep doing what has been done for
>> metrics API, which is to introduce feature flags like
>> rest-metrics-reporting-enabled. One strong argument I saw for this
>> alternative is that a feature flag can express non-binary options. For
>> capabilities, you are bound to say just whether the server has this
>> capability or not. But what we really want is to control client behavior
>> based on the capability. And for that, there could be multiple options for
>> the client to interact with the server in existence/absence of a feature.
>> For example, for multi-table commit, there could be 2 different behaviors
>> when the server does not support the endpoint, (1) fail the operation
>> early, (2) fallback to use single-table commit for each table.
>>
>> And with this alternative, there is of course no versioned capabilities.
>> But I think the reason we want versioned capabilities is because we want a
>> general versioning story for the catalog spec with forward and backward
>> compatibility guarantees. If that is the goal, why not: (1) acknowledge the
>> feature flag configs as a part of the spec, (2) version the entire catalog
>> spec. A released catalog spec version will contain a list of configs it
>> supports, and also a set of APIs and all features embedded in the APIs. A
>> server will report the specific catalog version it adheres to, and then
>> document the nuances. I feel this would put catalog providers in a more
>> comfortable situation, as they now have a stable catalog spec to adhere to
>> as the basis, that does not just automatically evolve within the same
>> version. They can implement a catalog spec and upgrade at their own pace
>> following a common versioning semantics. They will also report whatever
>> level of support and detailed behaviors they want, without the need to tie
>> specific behaviors to different capabilities.
>>
>> I think we have been spending quite a long time on this topic, but this
>> is so fundamental that I feel we should think through the alternatives.
>> Would it be possible to at least document in the design proposal why the
>> alternatives are not desirable, what are the pros and cons?
>>
>> -Jack
>>
>> On Tue, Jul 16, 2024 at 5:47 AM Eduard Tudenhöfner wrote:
>>
>>> Hey everyone,
>>>
>>> I've written up
>>> https://docs.google.com/document/d/1F1xh6SJhS-opgWRe1pPvWh01j8VHNHRocfCCFttKNf0/edit
>>>  to
>>> provide an easier way of giving feedback to the proposal.
>>> Please take a look so that we can discuss how we'd like to handle the
>>> default fallback behavior (*tables* vs *everything that's currently in
>>> the spec*) when a newer client talks to an older server.
>>>
>>>
>>> Eduard
>>>
>>> On Mon, Jul 15, 2024 at 4:24 PM Dmitri Bourlatchkov
>>>  wrote:
>>>
>>>> So I would argue to define the current set of APIs and specs as the
>>>>> default if the `capabilities` field is missing.
>>>>
>>>>
>>>> There have been two sides to this in prior discussions. Having *tables*
>>>> as the default vs having what's *currently in the spec* as the
>>>> default. The argument for having *tables* as the default is because we
>>>> can't assume that every REST server out there already supports views.
>>>>
>>>>
>>>> Can

Re: [DISCUSS] Describing REST Server capabilities

2024-07-30 Thread Steven Wu
>  (2) version the entire catalog spec. A released catalog spec version
will contain a list of configs it supports, and also a set of APIs and all
features embedded in the APIs. A server will report the specific catalog
version it adheres to, and then document the nuances.

Jack, just to clarify, are you talking about the endpoint path like "/v1/"?
Also, does that mean every API/feature addition would require a catalog
version bump?

On Tue, Jul 30, 2024 at 8:34 AM Jack Ye  wrote:

> Since the catalog sync was canceled this week, I find maybe it is better
> to reply here for my latest take on this topic.
>
> I think we have 2 discussions intertwined here, that I would like to
> decouple if possible.
>
> (1) is it worth having a concept of capabilities to control client
> behaviors?
> (2) suppose we introduce capabilities, is it worth having versioned
> capabilities?
>
> Personally speaking I am currently still more inclined to not have
> capabilities. An alternative here is to keep doing what has been done for
> metrics API, which is to introduce feature flags like
> rest-metrics-reporting-enabled. One strong argument I saw for this
> alternative is that a feature flag can express non-binary options. For
> capabilities, you are bound to say just whether the server has this
> capability or not. But what we really want is to control client behavior
> based on the capability. And for that, there could be multiple options for
> the client to interact with the server in existence/absence of a feature.
> For example, for multi-table commit, there could be 2 different behaviors
> when the server does not support the endpoint, (1) fail the operation
> early, (2) fallback to use single-table commit for each table.
>
> And with this alternative, there is of course no versioned capabilities.
> But I think the reason we want versioned capabilities is because we want a
> general versioning story for the catalog spec with forward and backward
> compatibility guarantees. If that is the goal, why not: (1) acknowledge the
> feature flag configs as a part of the spec, (2) version the entire catalog
> spec. A released catalog spec version will contain a list of configs it
> supports, and also a set of APIs and all features embedded in the APIs. A
> server will report the specific catalog version it adheres to, and then
> document the nuances. I feel this would put catalog providers in a more
> comfortable situation, as they now have a stable catalog spec to adhere to
> as the basis, that does not just automatically evolve within the same
> version. They can implement a catalog spec and upgrade at their own pace
> following a common versioning semantics. They will also report whatever
> level of support and detailed behaviors they want, without the need to tie
> specific behaviors to different capabilities.
>
> I think we have been spending quite a long time on this topic, but this is
> so fundamental that I feel we should think through the alternatives. Would
> it be possible to at least document in the design proposal why the
> alternatives are not desirable, what are the pros and cons?
>
> -Jack
>
> On Tue, Jul 16, 2024 at 5:47 AM Eduard Tudenhöfner wrote:
>
>> Hey everyone,
>>
>> I've written up
>> https://docs.google.com/document/d/1F1xh6SJhS-opgWRe1pPvWh01j8VHNHRocfCCFttKNf0/edit
>>  to
>> provide an easier way of giving feedback to the proposal.
>> Please take a look so that we can discuss how we'd like to handle the
>> default fallback behavior (*tables* vs *everything that's currently in
>> the spec*) when a newer client talks to an older server.
>>
>>
>> Eduard
>>
>> On Mon, Jul 15, 2024 at 4:24 PM Dmitri Bourlatchkov
>>  wrote:
>>
>>> So I would argue to define the current set of APIs and specs as the
>>>> default if the `capabilities` field is missing.
>>>
>>>
>>> There have been two sides to this in prior discussions. Having *tables*
>>> as the default vs having what's *currently in the spec* as the default.
>>> The argument for having *tables* as the default is because we can't
>>> assume that every REST server out there already supports views.
>>>
>>>
>>> Can we assume that a server that does not declare capabilities does NOT
>>> implement views? IMHO, that assumption is too strong and will break use
>>> cases when the client is upgraded, but the server is not.
>>>
>>> Before capabilities were introduced, clients used to work in a certain
>>> way. I think when the client starts interpreting capabilities, but the
>>> server does not declare the capabilities property at all, the client should
>>> (by default) work the same way as when it did not expect capabilities to be
>>> declared.
>>>
>>>
>>> Hence we're opting for the middle ground with *tables* + having a
>>> *configurable fallback mechanism*. Servers that already support views
>>> can configure their clients to default to *tables / views*, meaning
>>> that no additional (manual) configuration from a clie

Re: [DISCUSS] Describing REST Server capabilities

2024-07-30 Thread Jack Ye
Since the catalog sync was canceled this week, I find maybe it is better to
reply here for my latest take on this topic.

I think we have 2 discussions intertwined here, that I would like to
decouple if possible.

(1) is it worth having a concept of capabilities to control client
behaviors?
(2) suppose we introduce capabilities, is it worth having versioned
capabilities?

Personally speaking I am currently still more inclined to not have
capabilities. An alternative here is to keep doing what has been done for
metrics API, which is to introduce feature flags like
rest-metrics-reporting-enabled. One strong argument I saw for this
alternative is that a feature flag can express non-binary options. For
capabilities, you are bound to say just whether the server has this
capability or not. But what we really want is to control client behavior
based on the capability. And for that, there could be multiple options for
the client to interact with the server in the presence or absence of a
feature. For example, for multi-table commit, there could be 2 different
behaviors when the server does not support the endpoint: (1) fail the
operation early, or (2) fall back to a single-table commit for each table.
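
To illustrate what a non-binary feature flag could look like for the
multi-table commit case, here is a minimal sketch. The property name
`rest-multi-table-commit-mode`, its three values, and the default are all
hypothetical; nothing like this exists in the spec today.

```python
from enum import Enum


class UnsupportedOperation(Exception):
    """Raised when the client is told to fail early."""


class MultiTableCommitMode(Enum):
    NATIVE = "native"                    # server implements the endpoint
    FAIL = "fail"                        # (1) fail the operation early
    FALLBACK = "fallback-single-table"   # (2) one commit per table


def parse_mode(config, default=MultiTableCommitMode.NATIVE):
    # Hypothetical property name; the default is an assumption as well.
    value = config.get("rest-multi-table-commit-mode")
    return MultiTableCommitMode(value) if value is not None else default


def plan_commits(mode, tables):
    """Return the groups of tables that would each be sent as one commit."""
    if mode is MultiTableCommitMode.NATIVE:
        return [tuple(tables)]
    if mode is MultiTableCommitMode.FALLBACK:
        # Note: the per-table commits are no longer atomic as a group.
        return [(table,) for table in tables]
    raise UnsupportedOperation("server does not support multi-table commit")
```

A plain supported/unsupported capability could only distinguish the first
case from the other two, which is the point about feature flags being more
expressive.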

And with this alternative, there are of course no versioned capabilities.
But I think the reason we want versioned capabilities is because we want a
general versioning story for the catalog spec with forward and backward
compatibility guarantees. If that is the goal, why not: (1) acknowledge the
feature flag configs as a part of the spec, (2) version the entire catalog
spec. A released catalog spec version will contain a list of configs it
supports, and also a set of APIs and all features embedded in the APIs. A
server will report the specific catalog version it adheres to, and then
document the nuances. I feel this would put catalog providers in a more
comfortable situation, as they now have a stable catalog spec to adhere to
as the basis, that does not just automatically evolve within the same
version. They can implement a catalog spec and upgrade at their own pace
following a common versioning semantics. They will also report whatever
level of support and detailed behaviors they want, without the need to tie
specific behaviors to different capabilities.

I think we have been spending quite a long time on this topic, but this is
so fundamental that I feel we should think through the alternatives. Would
it be possible to at least document in the design proposal why the
alternatives are not desirable, what are the pros and cons?

-Jack










On Tue, Jul 16, 2024 at 5:47 AM Eduard Tudenhöfner 
wrote:

> Hey everyone,
>
> I've written up
> https://docs.google.com/document/d/1F1xh6SJhS-opgWRe1pPvWh01j8VHNHRocfCCFttKNf0/edit
>  to
> provide an easier way of giving feedback to the proposal.
> Please take a look so that we can discuss how we'd like to handle the
> default fallback behavior (*tables* vs *everything that's currently in
> the spec*) when a newer client talks to an older server.
>
>
> Eduard
>
> On Mon, Jul 15, 2024 at 4:24 PM Dmitri Bourlatchkov
>  wrote:
>
>> So I would argue to define the current set of APIs and specs as the
>>> default if the `capabilities` field is missing.
>>
>>
>> There have been two sides to this in prior discussions. Having *tables*
>> as the default vs having what's *currently in the spec* as the default.
>> The argument for having *tables* as the default is because we can't
>> assume that every REST server out there already supports views.
>>
>>
>> Can we assume that a server that does not declare capabilities does NOT
>> implement views? IMHO, that assumption is too strong and will break use
>> cases when the client is upgraded, but the server is not.
>>
>> Before capabilities were introduced, clients used to work in a certain
>> way. I think when the client starts interpreting capabilities, but the
>> server does not declare the capabilities property at all, the client should
>> (by default) work the same way as when it did not expect capabilities to be
>> declared.
>>
>>
>> Hence we're opting for the middle ground with *tables* + having a
>> *configurable fallback mechanism*. Servers that already support views
>> can configure their clients to default to *tables / views*, meaning that
>> no additional (manual) configuration from a client's perspective is
>> required to get table & view behavior.
>>
>>
>> Forcing a server upgrade when users just want to upgrade the client is
>> too much of a burden, I think. Servers and clients are often managed by
>> different groups of people.
>>
>> In the end, IIRC previous posts in this thread correctly, declaring
>> server capabilities is an optimization to allow more efficient / less
>> error-prone client operation. I do not think it should impose additional
>> functional / interoperability requirements on servers.
>>
>> Cheers,
>> Dmitri.
>>
>> On Mon, Jul 15, 2024 at 10:11 AM Eduard Tudenhöfner wrote:
>>
>>> 

Re: [DISCUSS] Describing REST Server capabilities

2024-07-16 Thread Eduard Tudenhöfner
Hey everyone,

I've written up
https://docs.google.com/document/d/1F1xh6SJhS-opgWRe1pPvWh01j8VHNHRocfCCFttKNf0/edit
to
provide an easier way of giving feedback to the proposal.
Please take a look so that we can discuss how we'd like to handle the
default fallback behavior (*tables* vs *everything that's currently in the
spec*) when a newer client talks to an older server.
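
For reference while reading the doc, the fallback being discussed could be
sketched roughly as follows. It assumes the config response may carry a
`capabilities` list, and that `rest-default-capabilities` is a
comma-separated property delivered through the config endpoint's `defaults`
/ `overrides` maps; the exact field shapes are assumptions, and client-side
properties are left out for brevity.

```python
# Assumed baseline when nothing is declared and no fallback is configured.
DEFAULT_CAPABILITIES = frozenset({"tables"})


def effective_capabilities(config_response):
    """Resolve the capabilities a newer client should assume."""
    declared = config_response.get("capabilities")
    if declared is not None:
        # A newer server declares its capabilities explicitly.
        return frozenset(declared)

    # Older server: look for a server-provided fallback property.
    props = {
        **config_response.get("defaults", {}),
        **config_response.get("overrides", {}),
    }
    fallback = props.get("rest-default-capabilities")
    if fallback:
        return frozenset(c.strip() for c in fallback.split(",") if c.strip())

    return DEFAULT_CAPABILITIES
```

The open question is only what `DEFAULT_CAPABILITIES` should be: *tables*,
or everything that is currently in the spec.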


Eduard

On Mon, Jul 15, 2024 at 4:24 PM Dmitri Bourlatchkov
 wrote:

> So I would argue to define the current set of APIs and specs as the
>> default if the `capabilities` field is missing.
>
>
> There have been two sides to this in prior discussions. Having *tables*
> as the default vs having what's *currently in the spec* as the default.
> The argument for having *tables* as the default is because we can't
> assume that every REST server out there already supports views.
>
>
> Can we assume that a server that does not declare capabilities does NOT
> implement views? IMHO, that assumption is too strong and will break use
> cases when the client is upgraded, but the server is not.
>
> Before capabilities were introduced, clients used to work in a certain
> way. I think when the client starts interpreting capabilities, but the
> server does not declare the capabilities property at all, the client should
> (by default) work the same way as when it did not expect capabilities to be
> declared.
>
>
> Hence we're opting for the middle ground with *tables* + having a
> *configurable fallback mechanism*. Servers that already support views
> can configure their clients to default to *tables / views*, meaning that
> no additional (manual) configuration from a client's perspective is
> required to get table & view behavior.
>
>
> Forcing a server upgrade when users just want to upgrade the client is too
> much of a burden, I think. Servers and clients are often managed by
> different groups of people.
>
> In the end, IIRC previous posts in this thread correctly, declaring server
> capabilities is an optimization to allow more efficient / less error-prone
> client operation. I do not think it should impose additional functional /
> interoperability requirements on servers.
>
> Cheers,
> Dmitri.
>
> On Mon, Jul 15, 2024 at 10:11 AM Eduard Tudenhöfner wrote:
>
>> Current servers do not send a `capabilities` field at all. You're
>>> suggesting to use a new `rest-default-capabilities` property to let newer
>>> clients assume `1`.  Once the table/view/etc-spec capabilities are needed,
>>> those newer clients would assume table-spec v1. That's wrong IMO.
>>
>>
>> That statement I mentioned only applies to the capabilities that are
>> currently in the PR and not to *table-spec / view-spec*.
>>
>>
>> I'm not a fan of a `rest-default-capabilities` property at all, because
>>> every user has to configure it explicitly and correctly
>>>
>>
>> As I mentioned, servers can configure this for *all* of their clients
>> via the *config* endpoint, so clients wouldn't have to do this *manually*.
>>
>>
>> So I would argue to define the current set of APIs and specs as the
>>> default if the `capabilities` field is missing.
>>
>>
>> There have been two sides to this in prior discussions. Having *tables*
>> as the default vs having what's *currently in the spec* as the default.
>> The argument for having *tables* as the default is because we can't
>> assume that every REST server out there already supports views.
>>
>> Hence we're opting for the middle ground with *tables* + having a
>> *configurable fallback mechanism*. Servers that already support views can
>> configure their clients to default to *tables / views*, meaning that no
>> additional (manual) configuration from a client's perspective is required
>> to get table & view behavior.
>>
>> Eduard
>>
>> On Mon, Jul 15, 2024 at 3:00 PM Robert Stupp  wrote:
>>
>>> Sorry, I don't understand the two suggestions, especially when used in
>>> combination. Current servers do not send a `capabilities` field at all.
>>> You're suggesting to use a new `rest-default-capabilities` property to let
>>> newer clients assume `1`.  Once the table/view/etc-spec capabilities are
>>> needed, those newer clients would assume table-spec v1. That's wrong IMO.
>>>
>>> I'm not a fan of a `rest-default-capabilities` property at all, because
>>> every user has to configure it explicitly and correctly. I predict quite
>>> some users not doing this or not doing it correctly, causing some trouble
>>> that can be prevented. The way things are configured is already quite
>>> complex, and yet adding another option adds more complexity to Iceberg. So
>>> I would argue to define the current set of APIs and specs as the default if
>>> the `capabilities` field is missing.
>>>
>>> Just because the *current* implementation doesn't use
>>> table-spec/view-spec doesn't mean near future clients wouldn't need it -
>>> table-spec v3 isn't that far away. And with new data types, view-spec v2
>>> isn't far away either.
>>

Re: [DISCUSS] Describing REST Server capabilities

2024-07-15 Thread Dmitri Bourlatchkov
So I would argue to define the current set of APIs and specs as the default
> if the `capabilities` field is missing.


There have been two sides to this in prior discussions. Having *tables* as
the default vs having what's *currently in the spec* as the default. The
argument for having *tables* as the default is because we can't assume that
every REST server out there already supports views.


Can we assume that a server that does not declare capabilities does NOT
implement views? IMHO, that assumption is too strong and will break use
cases when the client is upgraded, but the server is not.

Before capabilities were introduced, clients used to work in a certain way.
I think when the client starts interpreting capabilities, but the server
does not declare the capabilities property at all, the client should (by
default) work the same way as when it did not expect capabilities to be
declared.


Hence we're opting for the middle ground with *tables* + having a *configurable
fallback mechanism*. Servers that already support views can configure their
clients to default to *tables / views*, meaning that no additional (manual)
configuration from a client's perspective is required to get table & view
behavior.


Forcing a server upgrade when users just want to upgrade the client is too
much of a burden, I think. Servers and clients are often managed by
different groups of people.

In the end, if I recall previous posts in this thread correctly, declaring
server capabilities is an optimization to allow more efficient / less
error-prone client operation. I do not think it should impose additional
functional / interoperability requirements on servers.
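
To illustrate the backward-compatible default argued for above, here is a
minimal sketch. The function name `should_attempt` and the representation of
declared capabilities as a collection of names are assumptions made for
illustration; neither comes from the PR or the spec.

```python
from typing import Collection, Optional


def should_attempt(capability: str, declared: Optional[Collection[str]]) -> bool:
    """Decide whether a client should call endpoints behind `capability`.

    `declared` is None when the server response carries no capabilities
    at all (hypothetical representation, for illustration only).
    """
    if declared is None:
        # The server predates capabilities: behave like an older client,
        # attempt the call and handle any error response as before.
        return True
    # The server declared its capabilities, so the client can trust the list.
    return capability in declared
```

With this default, upgrading only the client does not change which calls it
attempts against a server that has not been upgraded.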

Cheers,
Dmitri.

On Mon, Jul 15, 2024 at 10:11 AM Eduard Tudenhöfner <
[email protected]> wrote:

> Current servers do not send a `capabilities` field at all. You're
>> suggesting to use a new `rest-default-capabilities` property to let newer
>> clients assume `1`.  Once the table/view/etc-spec capabilities are needed,
>> those newer clients would assume table-spec v1. That's wrong IMO.
>
>
> That statement I mentioned only applies to the capabilities that are
> currently in the PR and not to *table-spec / view-spec*.
>
>
> I'm not a fan of a `rest-default-capabilities` property at all, because
>> every user has to configure it explicitly and correctly
>>
>
> As I mentioned, servers can configure this for *all* of their clients via
> the *config* endpoint, so clients wouldn't have to do this *manually*.
>
>
> So I would argue to define the current set of APIs and specs as the
>> default if the `capabilities` field is missing.
>
>
> There have been two sides to this in prior discussions. Having *tables*
> as the default vs having what's *currently in the spec* as the default.
> The argument for having *tables* as the default is because we can't
> assume that every REST server out there already supports views.
>
> Hence we're opting for the middle ground with *tables* + having a
> *configurable fallback mechanism*. Servers that already support views can
> configure their clients to default to *tables / views*, meaning that no
> additional (manual) configuration from a client's perspective is required
> to get table & view behavior.
>
> Eduard
>
> On Mon, Jul 15, 2024 at 3:00 PM Robert Stupp  wrote:
>
>> Sorry, I don't understand the two suggestions, especially when used in
>> combination. Current servers do not send a `capabilities` field at all.
>> You're suggesting to use a new `rest-default-capabilities` property to let
>> newer clients assume `1`.  Once the table/view/etc-spec capabilities are
>> needed, those newer clients would assume table-spec v1. That's wrong IMO.
>>
>> I'm not a fan of a `rest-default-capabilities` property at all, because
>> every user has to configure it explicitly and correctly. I predict quite
>> some users not doing this or not doing it correctly, causing some trouble
>> that can be prevented. The way things are configured is already quite
>> complex, and yet adding another option adds more complexity to Iceberg. So
>> I would argue to define the current set of APIs and specs as the default if
>> the `capabilities` field is missing.
>>
>> Just because the *current* implementation doesn't use
>> table-spec/view-spec doesn't mean near future clients wouldn't need it -
>> table-spec v3 isn't that far away. And with new data types, view-spec v2
>> isn't far away either.
>>
>> Adding table-spec + view-spec capabilities now saves a lot of headaches
>> for Iceberg users in the near future.
>>
>>
>> On 15.07.24 11:27, Eduard Tudenhöfner wrote:
>>
>> I would suggest adding *table-spec / view-spec / udf-spec *capabilities
>> later when new requirements/updates get added. The current implementation
>> wouldn't make any use of these capabilities, so I don't see a good enough
>> reason to add them at this point.
>>
>> The PR currently says: "tables -> default capability in case the
>>> `capabilities` property doesn't exist or is empty in the response"

Re: [DISCUSS] Describing REST Server capabilities

2024-07-15 Thread Robert Stupp


On 15.07.24 16:10, Eduard Tudenhöfner wrote:


Current servers do not send a `capabilities` field at all. You're
suggesting to use a new `rest-default-capabilities` property to
let newer clients assume `1`.  Once the table/view/etc-spec
capabilities are needed, those newer clients would assume
table-spec v1. That's wrong IMO.


That statement I mentioned only applies to the capabilities that are 
currently in the PR and not to *table-spec / view-spec*.



I'm not a fan of a `rest-default-capabilities` property at all,
because every user has to configure it explicitly and correctly


As I mentioned, servers can configure this for *all* of their clients 
via the *config* endpoint, so clients wouldn't have to do this 
*manually*.


But what's the point of letting a server send a property
override/default instead of sending the `capabilities` field?


Old servers as they run _today_ would have to be explicitly configured 
by users to send that value.




So I would argue to define the current set of APIs and specs as
the default if the `capabilities` field is missing.


There have been two sides to this in prior discussions. Having 
*tables* as the default vs having what's *currently in the spec* as 
the default. The argument for having *tables* as the default is 
because we can't assume that every REST server out there already 
supports views.


You cannot "know" it right now either. So it wouldn't be a regression. 
But if we follow the "tables only" route, existing servers would 
effectively lose the views capability - unless users know that they have 
to configure something explicitly.



Hence we're opting for the middle ground with *tables* + having a 
*configurable fallback mechanism*. Servers that already support views 
can configure their clients to default to *tables / views*, meaning 
that no additional (manual) configuration from a client's perspective 
is required to get table & view behavior.


See my point about users of server implementations above. Users have to 
re-configure their servers.





Eduard

On Mon, Jul 15, 2024 at 3:00 PM Robert Stupp  wrote:

Sorry, I don't understand the two suggestions, especially when
used in combination. Current servers do not send a `capabilities`
field at all. You're suggesting to use a new
`rest-default-capabilities` property to let newer clients assume
`1`.  Once the table/view/etc-spec capabilities are needed, those
newer clients would assume table-spec v1. That's wrong IMO.

I'm not a fan of a `rest-default-capabilities` property at all,
because every user has to configure it explicitly and correctly. I
predict quite some users not doing this or not doing it correctly,
causing some trouble that can be prevented. The way things are
configured is already quite complex, and yet adding another option
adds more complexity to Iceberg. So I would argue to define the
current set of APIs and specs as the default if the `capabilities`
field is missing.

Just because the *current* implementation doesn't use
table-spec/view-spec doesn't mean near future clients wouldn't need
it - table-spec v3 isn't that far away. And with new data types,
view-spec v2 isn't far away either.

Adding table-spec + view-spec capabilities now saves a lot of
headaches for Iceberg users in the near future.


On 15.07.24 11:27, Eduard Tudenhöfner wrote:

I would suggest adding *table-spec / view-spec / udf-spec
*capabilities later when new requirements/updates get added. The
current implementation wouldn't make any use of
these capabilities, so I don't see a good enough reason to add
them at this point.

The PR currently says: "tables -> default capability in case
the `capabilities` property doesn't exist or is empty in the
response" - meaning: the server would _only_ support tables.
This phrase in the spec proposal effectively removes the view
functionality from all currently existing Iceberg REST
implementations.


This is why the configurable fallback mechanism was mentioned in
the Catalog sync, which can be realized with
*rest-default-capabilities=tables,views,abc,xyz* (all of them
defaulting to version 1). A server could send that property via
the config route without clients having to change anything.


On Mon, Jul 15, 2024 at 10:24 AM Robert Stupp  wrote:

Hi,

I still have concerns regarding the missing
table-spec/view-spec capabilities. Newer clients can send
create/update requests with requirements/updates of newer
Iceberg table/view/udf specs to a server that doesn't support
those spec versions - the outcome is rather undefined. What
should a server do? Ignore the unknown fields and
requirement/update types and hence do what it's potentially
_not_ supposed to do? Reply with a then ambiguous 501 (is it

Re: [DISCUSS] Describing REST Server capabilities

2024-07-15 Thread Eduard Tudenhöfner
>
> Current servers do not send a `capabilities` field at all. You're
> suggesting to use a new `rest-default-capabilities` property to let newer
> clients assume `1`.  Once the table/view/etc-spec capabilities are needed,
> those newer clients would assume table-spec v1. That's wrong IMO.


That statement I mentioned only applies to the capabilities that are
currently in the PR and not to *table-spec / view-spec*.


I'm not a fan of a `rest-default-capabilities` property at all, because
> every user has to configure it explicitly and correctly
>

As I mentioned, servers can configure this for *all* of their clients via
the *config* endpoint, so clients wouldn't have to do this *manually*.


So I would argue to define the current set of APIs and specs as the default
> if the `capabilities` field is missing.


There have been two sides to this in prior discussions. Having *tables* as
the default vs having what's *currently in the spec* as the default. The
argument for having *tables* as the default is because we can't assume that
every REST server out there already supports views.

Hence we're opting for the middle ground with *tables* + having a *configurable
fallback mechanism*. Servers that already support views can configure their
clients to default to *tables / views*, meaning that no additional (manual)
configuration from a client's perspective is required to get table & view
behavior.
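
As a rough illustration of the fallback order described above, the sketch
below resolves the effective capabilities on the client. The property name
`rest-default-capabilities` is the one proposed in this thread; the function
name, the plain-dict view of the config response, and the tuple return type
are assumptions, not the actual client implementation.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Proposed default when nothing else is known: tables only.
DEFAULT_CAPABILITIES: Tuple[str, ...] = ("tables",)


def resolve_capabilities(
    declared: Optional[Sequence[str]],
    config: Mapping[str, str],
) -> Tuple[str, ...]:
    """Pick the capabilities a client should assume for a server."""
    # 1. Capabilities declared by the server always win.
    if declared:
        return tuple(declared)
    # 2. Otherwise use the configurable fallback, which a server can send
    #    to all of its clients via the config endpoint.
    fallback = config.get("rest-default-capabilities", "")
    names = tuple(name.strip() for name in fallback.split(",") if name.strip())
    # 3. If neither is present, fall back to tables only.
    return names or DEFAULT_CAPABILITIES
```

A server that already supports views but does not yet declare capabilities
would send `rest-default-capabilities=tables,views` in its config response,
and clients would then assume both without any manual configuration.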

Eduard

On Mon, Jul 15, 2024 at 3:00 PM Robert Stupp  wrote:

> Sorry, I don't understand the two suggestions, especially when used in
> combination. Current servers do not send a `capabilities` field at all.
> You're suggesting to use a new `rest-default-capabilities` property to let
> newer clients assume `1`.  Once the table/view/etc-spec capabilities are
> needed, those newer clients would assume table-spec v1. That's wrong IMO.
>
> I'm not a fan of a `rest-default-capabilities` property at all, because
> every user has to configure it explicitly and correctly. I predict quite
> some users not doing this or not doing it correctly, causing some trouble
> that can be prevented. The way things are configured is already quite
> complex, and yet adding another option adds more complexity to Iceberg. So
> I would argue to define the current set of APIs and specs as the default if
> the `capabilities` field is missing.
>
> Just because the *current* implementation doesn't use table-spec/view-spec
> doesn't mean near future clients wouldn't need it - table-spec v3 isn't that
> far away. And with new data types, view-spec v2 isn't far away either.
>
> Adding table-spec + view-spec capabilities now saves a lot of headaches
> for Iceberg users in the near future.
>
>
> On 15.07.24 11:27, Eduard Tudenhöfner wrote:
>
> I would suggest adding *table-spec / view-spec / udf-spec *capabilities
> later when new requirements/updates get added. The current implementation
> wouldn't make any use of these capabilities, so I don't see a good enough
> reason to add them at this point.
>
> The PR currently says: "tables -> default capability in case the
>> `capabilities` property doesn't exist or is empty in the response" -
>> meaning: the server would _only_ support tables. This phrase in the spec
>> proposal effectively removes the view functionality from all currently
>> existing Iceberg REST implementations.
>
>
> This is why the configurable fallback mechanism was mentioned in the
> Catalog sync, which can be realized with
> *rest-default-capabilities=tables,views,abc,xyz* (all of them defaulting
> to version 1). A server could send that property via the config route
> without clients having to change anything.
>
>
> On Mon, Jul 15, 2024 at 10:24 AM Robert Stupp  wrote:
>
>> Hi,
>>
>> I still have concerns regarding the missing table-spec/view-spec
>> capabilities. Newer clients can send create/update requests with
>> requirements/updates of newer Iceberg table/view/udf specs to a server that
>> doesn't support those spec versions - the outcome is rather undefined. What
>> should a server do? Ignore the unknown fields and requirement/update types
>> and hence do what it's potentially _not_ supposed to do? Reply with a then
>> ambiguous 501 (is it the endpoint that's not implemented or the request
>> content not supported)? Similarly, what if a server decides to not support
>> for example table-spec v1 and just drop the manifest-file list in a table
>> snapshot leading to data loss?
>>
>> IMO capabilities must contain the table/view/... spec versions supported
>> by the server.
>>
>> There's also the concern about the behavior if the `capabilities` field is
>> missing (see
>> https://github.com/apache/iceberg/pull/9940/files#r1676113409, not sure
>> why the comment thread's resolved). The PR currently says: "tables ->
>> default capability in case the `capabilities` property doesn't exist or is
>> empty in the response" - meaning: the server would _only_ support tables.
>> This phrase in the spec proposal effectively removes the view func

Re: [DISCUSS] Describing REST Server capabilities

2024-07-15 Thread Robert Stupp
Sorry, I don't understand the two suggestions, especially when used in 
combination. Current servers do not send a `capabilities` field at all. 
You're suggesting to use a new `rest-default-capabilities` property to 
let newer clients assume `1`.  Once the table/view/etc-spec capabilities 
are needed, those newer clients would assume table-spec v1. That's wrong 
IMO.


I'm not a fan of a `rest-default-capabilities` property at all, because 
every user has to configure it explicitly and correctly. I predict quite 
some users not doing this or not doing it correctly, causing some 
trouble that can be prevented. The way things are configured is already 
quite complex, and yet adding another option adds more complexity to 
Iceberg. So I would argue to define the current set of APIs and specs as 
the default if the `capabilities` field is missing.


Just because the *current* implementation doesn't use
table-spec/view-spec doesn't mean near future clients wouldn't need it -
table-spec v3 isn't that far away. And with new data types, view-spec v2 
isn't far away either.


Adding table-spec + view-spec capabilities now saves a lot of headaches 
for Iceberg users in the near future.



On 15.07.24 11:27, Eduard Tudenhöfner wrote:
I would suggest adding *table-spec / view-spec / udf-spec 
*capabilities later when new requirements/updates get added. The 
current implementation wouldn't make any use of these capabilities, so 
I don't see a good enough reason to add them at this point.


The PR currently says: "tables -> default capability in case the
`capabilities` property doesn't exist or is empty in the response"
- meaning: the server would _only_ support tables. This phrase in
the spec proposal effectively removes the view functionality from
all currently existing Iceberg REST implementations.


This is why the configurable fallback mechanism was mentioned in the 
Catalog sync, which can be realized with 
*rest-default-capabilities=tables,views,abc,xyz* (all of them
defaulting to version 1). A server could send that property via the
config route without clients having to change anything.



On Mon, Jul 15, 2024 at 10:24 AM Robert Stupp  wrote:

Hi,

I still have concerns regarding the missing table-spec/view-spec
capabilities. Newer clients can send create/update requests with
requirements/updates of newer Iceberg table/view/udf specs to a
server that doesn't support those spec versions - the outcome is
rather undefined. What should a server do? Ignore the unknown
fields and requirement/update types and hence do what it's
potentially _not_ supposed to do? Reply with a then ambiguous 501
(is it the endpoint that's not implemented or the request content
not supported)? Similarly, what if a server decides to not support
for example table-spec v1 and just drop the manifest-file list in
a table snapshot leading to data loss?

IMO capabilities must contain the table/view/... spec versions
supported by the server.

There's also the concern about the behavior if the `capabilities`
field is missing (see
https://github.com/apache/iceberg/pull/9940/files#r1676113409, not
sure why the comment thread's resolved). The PR currently says:
"tables -> default capability in case the `capabilities` property
doesn't exist or is empty in the response" - meaning: the server
would _only_ support tables. This phrase in the spec proposal
effectively removes the view functionality from all currently
existing Iceberg REST implementations.


On 11.07.24 08:42, Eduard Tudenhöfner wrote:

Are there any other concerns with the proposal or should we start
a VOTE thread?

Eduard

On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov

 wrote:

Re: remote signing, I agree that it does not look
like a server capability that a client can / should
discover. It is more like something that the server
instructs / configures the client to do.

While a server can control this behavior and instruct the
client to use remote signing, technically nothing is
preventing a client from configuring
s3.remote-signing-enabled=true. In such a case it seems
more appropriate to indicate that this capability isn't
supported rather than a generic 501, because not every
server will support remote signing.


Good point regarding clients taking initiative and using
request signing without an explicit server-provided config.
It moves the client operations into a mode where the server
has more control (over having longer term client-side
credentials), so it looks like a reasonable mode to support
from the security perspective.

Let's keep that capability flag.

Cheers,
Dmitri.

 

Re: [DISCUSS] Describing REST Server capabilities

2024-07-15 Thread Eduard Tudenhöfner
I would suggest adding *table-spec / view-spec / udf-spec *capabilities
later when new requirements/updates get added. The current implementation
wouldn't make any use of these capabilities, so I don't see a good enough
reason to add them at this point.

The PR currently says: "tables -> default capability in case the
> `capabilities` property doesn't exist or is empty in the response" -
> meaning: the server would _only_ support tables. This phrase in the spec
> proposal effectively removes the view functionality from all currently
> existing Iceberg REST implementations.


This is why the configurable fallback mechanism was mentioned in the
Catalog sync, which can be realized with
*rest-default-capabilities=tables,views,abc,xyz* (all of them defaulting to
version 1). A server could send that property via the config route without
clients having to change anything.
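
For illustration, the property value above could be turned into
capability/version pairs roughly as follows. The function name and the
dict-of-versions representation are assumptions; only the property format
(comma-separated names, each defaulting to version 1) comes from this thread.

```python
from typing import Dict


def parse_default_capabilities(value: str) -> Dict[str, int]:
    """Map each comma-separated capability name to version 1."""
    names = [name.strip() for name in value.split(",")]
    # Skip empty entries such as those produced by a trailing comma.
    return {name: 1 for name in names if name}


print(parse_default_capabilities("tables,views,abc,xyz"))
# {'tables': 1, 'views': 1, 'abc': 1, 'xyz': 1}
```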


On Mon, Jul 15, 2024 at 10:24 AM Robert Stupp  wrote:

> Hi,
>
> I still have concerns regarding the missing table-spec/view-spec
> capabilities. Newer clients can send create/update requests with
> requirements/updates of newer Iceberg table/view/udf specs to a server that
> doesn't support those spec versions - the outcome is rather undefined. What
> should a server do? Ignore the unknown fields and requirement/update types
> and hence do what it's potentially _not_ supposed to do? Reply with a then
> ambiguous 501 (is it the endpoint that's not implemented or the request
> content not supported)? Similarly, what if a server decides to not support
> for example table-spec v1 and just drop the manifest-file list in a table
> snapshot leading to data loss?
>
> IMO capabilities must contain the table/view/... spec versions supported
> by the server.
>
> There's also the concern about the behavior if the `capabilities` field is
> missing (see https://github.com/apache/iceberg/pull/9940/files#r1676113409,
> not sure why the comment thread's resolved). The PR currently says: "tables
> -> default capability in case the `capabilities` property doesn't exist or
> is empty in the response" - meaning: the server would _only_ support
> tables. This phrase in the spec proposal effectively removes the view
> functionality from all currently existing Iceberg REST implementations.
>
>
> On 11.07.24 08:42, Eduard Tudenhöfner wrote:
>
> Are there any other concerns with the proposal or should we start a VOTE
> thread?
>
> Eduard
>
> On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
> 
>  wrote:
>
>> Re: remote signing, I agree that it does not look like a server
>>> capability that a client can / should discover. It is more like something
>>> that the server instructs / configures the client to do.
>>
>>
>> While a server can control this behavior and instruct the client to use
>> remote signing, technically nothing is preventing a client from configuring
>> s3.remote-signing-enabled=true. In such a case it seems more
>> appropriate to indicate that this capability isn't supported rather than a
>> generic 501, because not every server will support remote signing.
>>
>>
>> Good point regarding clients taking initiative and using request signing
>> without an explicit server-provided config. It moves the client operations
>> into a mode where the server has more control (over having longer term
>> client-side credentials), so it looks like a reasonable mode to support
>> from the security perspective.
>>
>> Let's keep that capability flag.
>>
>> Cheers,
>> Dmitri.
>>
>> On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner <
>> [email protected]> wrote:
>>
>>> Hey everyone,
>>>
>>> I've added a few inline comments below.
>>>
>>>
>>>
>>>> Re: remote signing, I agree that it does not look like a server
>>>> capability that a client can / should discover. It is more like something
>>>> that the server instructs / configures the client to do.
>>>
>>>
>>> While a server can control this behavior and instruct the client to use
>>> remote signing, technically nothing is preventing a client from configuring
>>> s3.remote-signing-enabled=true. In such a case it seems more
>>> appropriate to indicate that this capability isn't supported rather than a
>>> generic 501, because not every server will support remote signing.
>>>
>>> The *vended-credentials* capability on the other hand is more
>>> informative in its nature and a server indeed configures a client. I think
>>> that was also one of the reasons I removed this capability but added it
>>> back later due to a comment from Jack.
>>>
>>> I'm ok either way in terms of removing / keeping *vended-credentials*
>>> as a capability but given that we'd want to include *actionable*
>>> capabilities at this point, I'd just remove it (nothing is preventing us
>>> from adding it later if necessary).
>>>
>>>
>>> In that case, why do we need all these other capabilities like tables,
>>>> remote-signing, etc. in the first place?
>>>
>>>
>>> Given that capabilities also carry versioning information, clients can
>>> make m

Re: [DISCUSS] Describing REST Server capabilities

2024-07-15 Thread Robert Stupp

Hi,

I still have concerns regarding the missing table-spec/view-spec 
capabilities. Newer clients can send create/update requests with 
requirements/updates of newer Iceberg table/view/udf specs to a server 
that doesn't support those spec versions - the outcome is rather 
undefined. What should a server do? Ignore the unknown fields and 
requirement/update types and hence do what it's potentially _not_ 
supposed to do? Reply with a then ambiguous 501 (is it the endpoint 
that's not implemented or the request content not supported)? Similarly,
what if a server decides to not support for example table-spec v1 and 
just drop the manifest-file list in a table snapshot leading to data loss?


IMO capabilities must contain the table/view/... spec versions supported 
by the server.
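
To make the idea concrete, a client-side guard could look roughly like the
sketch below. The capability keys `table-spec` / `view-spec` mapping to sets
of supported versions, the exception type, and the function name are all
hypothetical and not part of the PR; what a client should do when a spec is
not declared at all is exactly the open question in this thread, so the
sketch leaves that case to the server.

```python
from typing import Collection, Mapping


class UnsupportedSpecVersion(Exception):
    """Raised before sending a request the server has said it cannot handle."""


def check_spec_version(
    capabilities: Mapping[str, Collection[int]],
    spec: str,
    version: int,
) -> None:
    """Fail on the client if the server declares `spec` without `version`."""
    supported = capabilities.get(spec)
    if supported is None:
        # Nothing declared for this spec: the client cannot decide locally
        # and has to rely on the server's response.
        return
    if version not in supported:
        raise UnsupportedSpecVersion(
            f"server supports {spec} versions {sorted(supported)}, "
            f"but the request needs version {version}"
        )
```

A check like this would let a client distinguish "request content not
supported" from "endpoint not implemented" without relying on a 501.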


There's also the concern about the behavior if the `capabilities` field
is missing (see 
https://github.com/apache/iceberg/pull/9940/files#r1676113409, not sure 
why the comment thread's resolved). The PR currently says: "tables -> 
default capability in case the `capabilities` property doesn't exist or 
is empty in the response" - meaning: the server would _only_ support 
tables. This phrase in the spec proposal effectively removes the view 
functionality from all currently existing Iceberg REST implementations.



On 11.07.24 08:42, Eduard Tudenhöfner wrote:
Are there any other concerns with the proposal or should we start a 
VOTE thread?


Eduard

On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov 
 wrote:


Re: remote signing, I agree that it does not look like a
server capability that a client can / should discover. It
is more like something that the server instructs /
configures the client to do.

While a server can control this behavior and instruct the
client to use remote signing, technically nothing is
preventing a client from configuring
s3.remote-signing-enabled=true. In such a case it seems more
appropriate to indicate that this capability isn't supported
rather than a generic 501, because not every server will
support remote signing.


Good point regarding clients taking initiative and using request
signing without an explicit server-provided config. It moves the
client operations into a mode where the server has more control
(over having longer term client-side credentials), so it looks
like a reasonable mode to support from the security perspective.

Let's keep that capability flag.

Cheers,
Dmitri.

On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner
 wrote:

Hey everyone,

I've added a few inline comments below.

Re: remote signing, I agree that it does not look like a
server capability that a client can / should discover. It
is more like something that the server instructs /
configures the client to do.

While a server can control this behavior and instruct the
client to use remote signing, technically nothing is
preventing a client from configuring
s3.remote-signing-enabled=true. In such a case it seems more
appropriate to indicate that this capability isn't supported
rather than a generic 501, because not every server will
support remote signing.

The *vended-credentials* capability on the other hand is more
informative in its nature and a server indeed configures a
client. I think that was also one of the reasons I removed
this capability but added it back later due to a comment from
Jack.

I'm ok either way in terms of removing / keeping
*vended-credentials* as a capability but given that we'd want
to include *actionable* capabilities at this point, I'd just
remove it (nothing is preventing us from adding it later if
necessary).


In that case, why do we need all these other capabilities
like tables, remote-signing, etc. in the first place?


Given that capabilities also carry versioning information,
clients can make more informed decisions on which endpoints to
call. One could argue that generally throwing a 501 on
everything that isn't supported might be sufficient, but that
doesn't necessarily help a client in knowing which versions of
a capability are safe to call/use.

Regarding the control of client-side fallback behavior:
I think the default fallback behavior should be *tables* (with
version 1) with a property in the REST catalog that allows
configuring this to e.g.
*rest-default-capabilities=tables,views,abc,xyz* (all of them
defaulting to version 1).


Eduard


On Tue, Jul 9, 2024 at 7:00 PM Jack Ye 
wrote:

Yes I agree that sounds like a valid use case. So the
criteria so far is that capabilities 

Re: [DISCUSS] Describing REST Server capabilities

2024-07-12 Thread Eduard Tudenhöfner
Let's remove the *remote-signing* capability for now and go with *tables /
views / multi-table-commit*. As I mentioned earlier, we can always add it
when there's a clear benefit.

Eduard

On Fri, Jul 12, 2024 at 5:09 PM Dmitri Bourlatchkov
 wrote:

> After more thinking about the "remote signing" capability flag, I am still
> not sure it is actually useful for making decisions on the client side.
>
> Granted, the client may have s3.remote-signing-enabled=true set
> independently of the server and then use the remote signing call paths.
> However, in this case the capability flag is irrelevant. Whoever sets
> s3.remote-signing-enabled=true must have prior knowledge that remote
> signing is available.
>
> If we use the "remote signing" capability flag only to produce nicer
> user-level error messages, will it not be an extra burden on server
> implementations to keep this flag in sync with the actual behaviour of the
> signing endpoint?
>
> The signing endpoint responses may differ from one request to another
> (e.g. due to different access credentials), so the REST client has to deal
> with the full range of possible error responses even when the "remote
> signing" capability flag is set.
>
> WDYT?
>
> Thanks,
> Dmitri.
>
> On Thu, Jul 11, 2024 at 11:31 PM Jack Ye  wrote:
>
>> > While a server can control this behavior and instruct the client to use
>> remote signing, technically nothing is preventing a client from configuring
>> s3.remote-signing-enabled=true. In such a case it seems more
>> appropriate to indicate that this capability isn't supported rather than a
>> generic 501, because not every server will support remote signing.
>>
>> This is what I did not fully understand, because it seems like we are
>> saying in addition to the criteria of using capability to:
>> - controlling client-side fallback behavior
>> - failing expensive operations early if we know it will eventually fail
>> due to missing capability
>>
>> you also try to fine-tune any client side behavior to match server side
>> capabilities so it fails early and more gracefully rather than just a
>> generic 501. But why does this logic not apply to per-endpoint versioning?
>> Isn't it also nice to just fail at client side instead of calling server
>> and getting a "generic 501"?
>>
>> -Jack
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Jul 11, 2024 at 9:51 AM Jack Ye  wrote:
>>
>>> Sorry I will take a look at the new comments later today.
>>>
>>> -Jack
>>>
>>> On Wed, Jul 10, 2024, 11:42 PM Eduard Tudenhöfner <
>>> [email protected]> wrote:
>>>
 Are there any other concerns with the proposal or should we start a
 VOTE thread?

 Eduard

 On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
  wrote:

> Re: remote signing, I agree that it does not look like a server
>> capability that a client can / should discover. It is more like something
>> that the server instructs / configures the client to do.
>
>
> While a server can control this behavior and instruct the client to
> use remote signing, technically nothing is preventing a client from
> configuring s3.remote-signing-enabled=true. In such a case it seems
> more appropriate to indicate that this capability isn't supported rather
> than a generic 501, because not every server will support remote signing.
>
>
> Good point regarding clients taking initiative and using request
> signing without an explicit server-provided config. It moves the client
> operations into a mode where the server has more control (over having
> longer term client-side credentials), so it looks like a reasonable mode
> to support from the security perspective.
>
> Let's keep that capability flag.
>
> Cheers,
> Dmitri.
>
> On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner <
> [email protected]> wrote:
>
>> Hey everyone,
>>
>> I've added a few inline comments below.
>>
>>
>>
>>> Re: remote signing, I agree that it does not look like a server
>>> capability that a client can / should discover. It is more like
>>> something that the server instructs / configures the client to do.
>>
>>
>> While a server can control this behavior and instruct the client to
>> use remote signing, technically nothing is preventing a client from
>> configuring s3.remote-signing-enabled=true. In such a case it seems
>> more appropriate to indicate that this capability isn't supported rather
>> than a generic 501, because not every server will support remote signing.
>>
>> The *vended-credentials* capability on the other hand is more
>> informative in its nature and a server indeed configures a client. I
>> think that was also one of the reasons I removed this capability but added it
>> later back due to a comment from Jack.
>>
>> I'm ok either way in terms of removing / keeping *vended-crede

Re: [DISCUSS] Describing REST Server capabilities

2024-07-12 Thread Dmitri Bourlatchkov
After more thinking about the "remote signing" capability flag, I am still
not sure it is actually useful for making decisions on the client side.

Granted, the client may have s3.remote-signing-enabled=true set
independently of the server and then use the remote signing call paths.
However, in this case the capability flag is irrelevant. Whoever sets
s3.remote-signing-enabled=true must have prior knowledge that remote
signing is available.

If we use the "remote signing" capability flag only to produce nicer
user-level error messages, will it not be an extra burden on server
implementations to keep this flag in sync with the actual behaviour of the
signing endpoint?

The signing endpoint responses may differ from one request to another (e.g.
due to different access credentials), so the REST client has to deal with
the full range of possible error responses even when the "remote signing"
capability flag is set.

WDYT?

Thanks,
Dmitri.

On Thu, Jul 11, 2024 at 11:31 PM Jack Ye  wrote:

> > While a server can control this behavior and instruct the client to use
> remote signing, technically nothing is preventing a client from configuring
> s3.remote-signing-enabled=true. In such a case it seems more
> appropriate to indicate that this capability isn't supported rather than a
> generic 501, because not every server will support remote signing.
>
> This is what I did not fully understand, because it seems like we are
> saying in addition to the criteria of using capability to:
> - controlling client-side fallback behavior
> - failing expensive operations early if we know it will eventually fail
> due to missing capability
>
> you also try to fine-tune any client side behavior to match server side
> capabilities so it fails early and more gracefully rather than just a
> generic 501. But why does this logic not apply to per-endpoint versioning?
> Isn't it also nice to just fail at client side instead of calling server
> and getting a "generic 501"?
>
> -Jack
>
>
>
>
>
>
>
> On Thu, Jul 11, 2024 at 9:51 AM Jack Ye  wrote:
>
>> Sorry I will take a look at the new comments later today.
>>
>> -Jack
>>
>> On Wed, Jul 10, 2024, 11:42 PM Eduard Tudenhöfner <
>> [email protected]> wrote:
>>
>>> Are there any other concerns with the proposal or should we start a VOTE
>>> thread?
>>>
>>> Eduard
>>>
>>> On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
>>>  wrote:
>>>
 Re: remote signing, I agree that it does not look like a server
> capability that a client can / should discover. It is more like something
> that the server instructs / configures the client to do.


 While a server can control this behavior and instruct the client to use
 remote signing, technically nothing is preventing a client from configuring
 s3.remote-signing-enabled=true. In such a case it seems more
 appropriate to indicate that this capability isn't supported rather than a
 generic 501, because not every server will support remote signing.


 Good point regarding clients taking initiative and using request
 signing without an explicit server-provided config. It moves the client
 operations into a mode where the server has more control (over having
 longer term client-side credentials), so it looks like a reasonable mode to
 support from the security perspective.

 Let's keep that capability flag.

 Cheers,
 Dmitri.

 On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner <
 [email protected]> wrote:

> Hey everyone,
>
> I've added a few inline comments below.
>
>
>
>> Re: remote signing, I agree that it does not look like a server
>> capability that a client can / should discover. It is more like something
>> that the server instructs / configures the client to do.
>
>
> While a server can control this behavior and instruct the client to
> use remote signing, technically nothing is preventing a client from
> configuring s3.remote-signing-enabled=true. In such a case it seems
> more appropriate to indicate that this capability isn't supported rather
> than a generic 501, because not every server will support remote signing.
>
> The *vended-credentials* capability on the other hand is more
> informative in its nature and a server indeed configures a client. I think
> that was also one of the reasons I removed this capability but added it
> later back due to a comment from Jack.
>
> I'm ok either way in terms of removing / keeping *vended-credentials*
> as a capability but given that we'd want to include *actionable*
> capabilities at this point, I'd just remove it (nothing is preventing us
> from adding it later if necessary).
>
>
> In that case, why do we need all these other capabilities like tables,
>> remote-signing, etc. in the first place?
>
>
> Given that capabilities also carry versioning in

Re: [DISCUSS] Describing REST Server capabilities

2024-07-12 Thread Eduard Tudenhöfner
>
> But why does this logic not apply to per-endpoint versioning? Isn't it
> also nice to just fail at client side instead of calling server and getting
> a "generic 501"?


Yes of course that would be nice, but that would be at the cost of having
finer-grained capabilities which we want to avoid based on recent
discussions.
If you feel strongly about the *remote-signing* capability, then we can
remove it at this point (and add it later if necessary), but my thinking
with *remote-signing* is that this capability is *actionable* on the client
side.

So to summarize things in the PR, there are the following *capabilities*
with versioning information:

   - *tables*
   - *views*
   - *remote-signing*
   - *multi-table-commit*

For servers that only *partially* implement endpoints under a capability
the spec requires the server to throw a *501 Not Implemented*.

The default fallback behavior when a newer client talks to an older server
that doesn't send *capabilities* would be *tables* (with version 1). That
fallback behavior can be configured via a property, such as
*rest-default-capabilities=tables,views,abc,xyz* (all of them defaulting to
version 1).
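
As a rough illustration of that fallback (not taken from the proposal or the PR), a client could resolve its effective capabilities along the following lines. The function name, the dict shape, and the idea that advertised capabilities arrive as a name-to-version map are assumptions made for this sketch; only the property name rest-default-capabilities and the version-1 default come from the discussion above.

```python
# Hypothetical sketch: the thread does not define a wire format, so the
# name -> version dict used here is an assumption for illustration only.
DEFAULT_PROPERTY = "rest-default-capabilities"


def resolve_capabilities(server_capabilities, client_properties):
    """Return a {capability: version} map the client can rely on."""
    if server_capabilities:
        # The server advertised capabilities, so use them as-is.
        return dict(server_capabilities)
    # Older server that sent nothing: fall back to the configured list,
    # or to "tables" alone when the property is not set.
    configured = client_properties.get(DEFAULT_PROPERTY, "tables")
    names = [name.strip() for name in configured.split(",") if name.strip()]
    # Every fallback capability defaults to version 1.
    return {name: 1 for name in names}
```

With no advertised capabilities and no property set, this yields only tables at version 1; setting the property to "tables,views" would add views at version 1 as well.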

Please provide additional feedback if necessary or signal that you're ok
with having a vote thread.

Thanks
Eduard



On Fri, Jul 12, 2024 at 8:47 AM Ajantha Bhat  wrote:

> Are there any other concerns with the proposal or should we start a VOTE
>> thread?
>
>
> We should summarize the consensus since the thread is quite long
> and then check if anyone has additional points to add before we proceed to
> voting.
>
> - Ajantha
>
>
>
> On Fri, Jul 12, 2024 at 9:09 AM Jack Ye  wrote:
>
>> > While a server can control this behavior and instruct the client to use
>> remote signing, technically nothing is preventing a client from configuring
>> s3.remote-signing-enabled=true. In such a case it seems more
>> appropriate to indicate that this capability isn't supported rather than a
>> generic 501, because not every server will support remote signing.
>>
>> This is what I did not fully understand, because it seems like we are
>> saying in addition to the criteria of using capability to:
>> - controlling client-side fallback behavior
>> - failing expensive operations early if we know it will eventually fail
>> due to missing capability
>>
>> you also try to fine-tune any client side behavior to match server side
>> capabilities so it fails early and more gracefully rather than just a
>> generic 501. But why does this logic not apply to per-endpoint versioning?
>> Isn't it also nice to just fail at client side instead of calling server
>> and getting a "generic 501"?
>>
>> -Jack
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Jul 11, 2024 at 9:51 AM Jack Ye  wrote:
>>
>>> Sorry I will take a look at the new comments later today.
>>>
>>> -Jack
>>>
>>> On Wed, Jul 10, 2024, 11:42 PM Eduard Tudenhöfner <
>>> [email protected]> wrote:
>>>
 Are there any other concerns with the proposal or should we start a
 VOTE thread?

 Eduard

 On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
  wrote:

> Re: remote signing, I agree that it does not look like a server
>> capability that a client can / should discover. It is more like something
>> that the server instructs / configures the client to do.
>
>
> While a server can control this behavior and instruct the client to
> use remote signing, technically nothing is preventing a client from
> configuring s3.remote-signing-enabled=true. In such a case it seems
> more appropriate to indicate that this capability isn't supported rather
> than a generic 501, because not every server will support remote signing.
>
>
> Good point regarding clients taking initiative and using request
> signing without an explicit server-provided config. It moves the client
> operations into a mode where the server has more control (over having
> longer term client-side credentials), so it looks like a reasonable mode
> to support from the security perspective.
>
> Let's keep that capability flag.
>
> Cheers,
> Dmitri.
>
> On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner <
> [email protected]> wrote:
>
>> Hey everyone,
>>
>> I've added a few inline comments below.
>>
>>
>>
>>> Re: remote signing, I agree that it does not look like a server
>>> capability that a client can / should discover. It is more like
>>> something that the server instructs / configures the client to do.
>>
>>
>> While a server can control this behavior and instruct the client to
>> use remote signing, technically nothing is preventing a client from
>> configuring s3.remote-signing-enabled=true. In such a case it seems
>> more appropriate to indicate that this capability isn't supported rather
>> than a generic 501, because not every server will support remote signing.
>

Re: [DISCUSS] Describing REST Server capabilities

2024-07-11 Thread Ajantha Bhat
>
> Are there any other concerns with the proposal or should we start a VOTE
> thread?


We should summarize the consensus since the thread is quite long
and then check if anyone has additional points to add before we proceed to
voting.

- Ajantha



On Fri, Jul 12, 2024 at 9:09 AM Jack Ye  wrote:

> > While a server can control this behavior and instruct the client to use
> remote signing, technically nothing is preventing a client from configuring
> s3.remote-signing-enabled=true. In such a case it seems more
> appropriate to indicate that this capability isn't supported rather than a
> generic 501, because not every server will support remote signing.
>
> This is what I did not fully understand, because it seems like we are
> saying in addition to the criteria of using capability to:
> - controlling client-side fallback behavior
> - failing expensive operations early if we know it will eventually fail
> due to missing capability
>
> you also try to fine-tune any client side behavior to match server side
> capabilities so it fails early and more gracefully rather than just a
> generic 501. But why does this logic not apply to per-endpoint versioning?
> Isn't it also nice to just fail at client side instead of calling server
> and getting a "generic 501"?
>
> -Jack
>
>
>
>
>
>
>
> On Thu, Jul 11, 2024 at 9:51 AM Jack Ye  wrote:
>
>> Sorry I will take a look at the new comments later today.
>>
>> -Jack
>>
>> On Wed, Jul 10, 2024, 11:42 PM Eduard Tudenhöfner <
>> [email protected]> wrote:
>>
>>> Are there any other concerns with the proposal or should we start a VOTE
>>> thread?
>>>
>>> Eduard
>>>
>>> On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
>>>  wrote:
>>>
 Re: remote signing, I agree that it does not look like a server
> capability that a client can / should discover. It is more like something
> that the server instructs / configures the client to do.


 While a server can control this behavior and instruct the client to use
 remote signing, technically nothing is preventing a client from configuring
 s3.remote-signing-enabled=true. In such a case it seems more
 appropriate to indicate that this capability isn't supported rather than a
 generic 501, because not every server will support remote signing.


 Good point regarding clients taking initiative and using request
 signing without an explicit server-provided config. It moves the client
 operations into a mode where the server has more control (over having
 longer term client-side credentials), so it looks like a reasonable mode to
 support from the security perspective.

 Let's keep that capability flag.

 Cheers,
 Dmitri.

 On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner <
 [email protected]> wrote:

> Hey everyone,
>
> I've added a few inline comments below.
>
>
>
>> Re: remote signing, I agree that it does not look like a server
>> capability that a client can / should discover. It is more like something
>> that the server instructs / configures the client to do.
>
>
> While a server can control this behavior and instruct the client to
> use remote signing, technically nothing is preventing a client from
> configuring s3.remote-signing-enabled=true. In such a case it seems
> more appropriate to indicate that this capability isn't supported rather
> than a generic 501, because not every server will support remote signing.
>
> The *vended-credentials* capability on the other hand is more
> informative in its nature and a server indeed configures a client. I think
> that was also one of the reasons I removed this capability but added it
> later back due to a comment from Jack.
>
> I'm ok either way in terms of removing / keeping *vended-credentials*
> as a capability but given that we'd want to include *actionable*
> capabilities at this point, I'd just remove it (nothing is preventing us
> from adding it later if necessary).
>
>
> In that case, why do we need all these other capabilities like tables,
>> remote-signing, etc. in the first place?
>
>
> Given that capabilities also carry versioning information, clients can
> make more informed decisions on which endpoints to call. One could argue
> that generally throwing a 501 on everything that isn't supported might be
> sufficient, but that doesn't necessarily help a client in knowing which
> versions of a capability are safe to call/use.
>
> Regarding the control of client-side fallback behavior:
> I think the default fallback behavior should be *tables* (with
> version 1) with a property in the REST catalog that allows configuring
> this to e.g. *rest-default-capabilities=tables,views,abc,xyz* (all of them
> defaulting to version 1).
>
>
> Eduard
>
>
> On Tue, Ju

Re: [DISCUSS] Describing REST Server capabilities

2024-07-11 Thread Jack Ye
> While a server can control this behavior and instruct the client to use
remote signing, technically nothing is preventing a client from configuring
s3.remote-signing-enabled=true. In such a case it seems more appropriate to
indicate that this capability isn't supported rather than a generic 501,
because not every server will support remote signing.

This is what I did not fully understand, because it seems like we are
saying that, in addition to the criteria of using capabilities for:
- controlling client-side fallback behavior
- failing expensive operations early if we know it will eventually fail due
to missing capability

you also try to fine-tune any client side behavior to match server side
capabilities so it fails early and more gracefully rather than just a
generic 501. But why does this logic not apply to per-endpoint versioning?
Isn't it also nice to just fail at client side instead of calling server
and getting a "generic 501"?
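
To make the two criteria concrete, here is a minimal sketch of a client-side pre-flight check that fails before any expensive work starts. All names (MissingCapabilityError, require_capability, run_multi_table_transaction) and the name-to-version map are hypothetical and exist only for this illustration; they are not part of the REST spec or any client library.

```python
# Hypothetical sketch: names and the capability map shape are assumptions.
class MissingCapabilityError(Exception):
    """Raised on the client before any request is sent to the server."""


def require_capability(capabilities, name, min_version=1):
    # Fail early if the capability is absent or older than required.
    version = capabilities.get(name)
    if version is None or version < min_version:
        raise MissingCapabilityError(
            f"server does not support {name} (version >= {min_version})"
        )


def run_multi_table_transaction(capabilities, statements):
    # Check first, so a long transaction is not executed only to fail
    # at commit time because the server cannot commit it.
    require_capability(capabilities, "multi-table-commit")
    # Stand-in for the real work; returns a trace of what was executed.
    return [f"executed: {statement}" for statement in statements]
```

The same check would apply unchanged whether the map is keyed by capability or by endpoint, which is the crux of the question above.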

-Jack







On Thu, Jul 11, 2024 at 9:51 AM Jack Ye  wrote:

> Sorry I will take a look at the new comments later today.
>
> -Jack
>
> On Wed, Jul 10, 2024, 11:42 PM Eduard Tudenhöfner <
> [email protected]> wrote:
>
>> Are there any other concerns with the proposal or should we start a VOTE
>> thread?
>>
>> Eduard
>>
>> On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
>>  wrote:
>>
>>> Re: remote signing, I agree that it does not look like a server
 capability that a client can / should discover. It is more like something
 that the server instructs / configures the client to do.
>>>
>>>
>>> While a server can control this behavior and instruct the client to use
>>> remote signing, technically nothing is preventing a client from configuring
>>> s3.remote-signing-enabled=true. In such a case it seems more
>>> appropriate to indicate that this capability isn't supported rather than a
>>> generic 501, because not every server will support remote signing.
>>>
>>>
>>> Good point regarding clients taking initiative and using request signing
>>> without an explicit server-provided config. It moves the client operations
>>> into a mode where the server has more control (over having longer term
>>> client-side credentials), so it looks like a reasonable mode to support
>>> from the security perspective.
>>>
>>> Let's keep that capability flag.
>>>
>>> Cheers,
>>> Dmitri.
>>>
>>> On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner <
>>> [email protected]> wrote:
>>>
 Hey everyone,

 I've added a few inline comments below.



> Re: remote signing, I agree that it does not look like a server
> capability that a client can / should discover. It is more like something
> that the server instructs / configures the client to do.


 While a server can control this behavior and instruct the client to use
 remote signing, technically nothing is preventing a client from configuring
 s3.remote-signing-enabled=true. In such a case it seems more
 appropriate to indicate that this capability isn't supported rather than a
 generic 501, because not every server will support remote signing.

 The *vended-credentials* capability on the other hand is more
 informative in its nature and a server indeed configures a client. I think
 that was also one of the reasons I removed this capability but added it
 later back due to a comment from Jack.

 I'm ok either way in terms of removing / keeping *vended-credentials*
 as a capability but given that we'd want to include *actionable*
 capabilities at this point, I'd just remove it (nothing is preventing us
 from adding it later if necessary).


 In that case, why do we need all these other capabilities like tables,
> remote-signing, etc. in the first place?


 Given that capabilities also carry versioning information, clients can
 make more informed decisions on which endpoints to call. One could argue
 that generally throwing a 501 on everything that isn't supported might be
 sufficient, but that doesn't necessarily help a client in knowing which
 versions of a capability are safe to call/use.

 Regarding the control of client-side fallback behavior:
 I think the default fallback behavior should be *tables* (with version
 1) with a property in the REST catalog that allows configuring this to e.g.
 *rest-default-capabilities=tables,views,abc,xyz* (all of them
 defaulting to version 1).


 Eduard


 On Tue, Jul 9, 2024 at 7:00 PM Jack Ye  wrote:

> Yes I agree that sounds like a valid use case. So the criteria so far
> is that capabilities are used for:
> - controlling client-side fallback behavior
> - failing expensive operations early if we know it will eventually
> fail due to missing capability
>
> Do we agree if this is the criteria we should use? What about the
> other capabilities, namely tables, remote

Re: [DISCUSS] Describing REST Server capabilities

2024-07-11 Thread Jack Ye
Sorry I will take a look at the new comments later today.

-Jack

On Wed, Jul 10, 2024, 11:42 PM Eduard Tudenhöfner 
wrote:

> Are there any other concerns with the proposal or should we start a VOTE
> thread?
>
> Eduard
>
> On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
>  wrote:
>
>> Re: remote signing, I agree that it does not look like a server
>>> capability that a client can / should discover. It is more like something
>>> that the server instructs / configures the client to do.
>>
>>
>> While a server can control this behavior and instruct the client to use
>> remote signing, technically nothing is preventing a client from configuring
>> s3.remote-signing-enabled=true. In such a case it seems more
>> appropriate to indicate that this capability isn't supported rather than a
>> generic 501, because not every server will support remote signing.
>>
>>
>> Good point regarding clients taking initiative and using request signing
>> without an explicit server-provided config. It moves the client operations
>> into a mode where the server has more control (over having longer term
>> client-side credentials), so it looks like a reasonable mode to support
>> from the security perspective.
>>
>> Let's keep that capability flag.
>>
>> Cheers,
>> Dmitri.
>>
>> On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner <
>> [email protected]> wrote:
>>
>>> Hey everyone,
>>>
>>> I've added a few inline comments below.
>>>
>>>
>>>
 Re: remote signing, I agree that it does not look like a server
 capability that a client can / should discover. It is more like something
 that the server instructs / configures the client to do.
>>>
>>>
>>> While a server can control this behavior and instruct the client to use
>>> remote signing, technically nothing is preventing a client from configuring
>>> s3.remote-signing-enabled=true. In such a case it seems more
>>> appropriate to indicate that this capability isn't supported rather than a
>>> generic 501, because not every server will support remote signing.
>>>
>>> The *vended-credentials* capability on the other hand is more
>>> informative in its nature and a server indeed configures a client. I think
>>> that was also one of the reasons I removed this capability but added it
>>> later back due to a comment from Jack.
>>>
>>> I'm ok either way in terms of removing / keeping *vended-credentials*
>>> as a capability but given that we'd want to include *actionable*
>>> capabilities at this point, I'd just remove it (nothing is preventing us
>>> from adding it later if necessary).
>>>
>>>
>>> In that case, why do we need all these other capabilities like tables,
 remote-signing, etc. in the first place?
>>>
>>>
>>> Given that capabilities also carry versioning information, clients can
>>> make more informed decisions on which endpoints to call. One could argue
>>> that generally throwing a 501 on everything that isn't supported might be
>>> sufficient, but that doesn't necessarily help a client in knowing which
>>> versions of a capability are safe to call/use.
>>>
>>> Regarding the control of client-side fallback behavior:
>>> I think the default fallback behavior should be *tables* (with version
>>> 1) with a property in the REST catalog that allows configuring this to e.g.
>>> *rest-default-capabilities=tables,views,abc,xyz* (all of them
>>> defaulting to version 1).
>>>
>>>
>>> Eduard
>>>
>>>
>>> On Tue, Jul 9, 2024 at 7:00 PM Jack Ye  wrote:
>>>
 Yes I agree that sounds like a valid use case. So the criteria so far
 is that capabilities are used for:
 - controlling client-side fallback behavior
 - failing expensive operations early if we know it will eventually fail
 due to missing capability

 Do we agree if this is the criteria we should use? What about the other
 capabilities, namely tables, remote-signing, credential-vending?

 -Jack


 On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue 
 wrote:

> > does it make a difference if I declare the capability or not?
>
> I think that it does in other cases. Multi-table commits, for example,
> are a building block for multi-statement transactions. If a service
> doesn't support multi-table commits then we ideally want clients to know
> that ahead of time so that they don't run a big transaction and then fail
> because the commit is not supported.
>
> On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov
>  wrote:
>
>> Re: remote signing, I agree that it does not look like a server
>> capability that a client can / should discover. It is more like something
>> that the server instructs / configures the client to do.
>>
>> Cheers,
>> Dmitri.
>>
>> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye  wrote:
>>
>>> I was reconciling the discussion yesterday, one point that was
>>> interesting to me was that we agreed the purpose of these capabilities 
>>> is
>>> t

Re: [DISCUSS] Describing REST Server capabilities

2024-07-10 Thread Eduard Tudenhöfner
Are there any other concerns with the proposal or should we start a VOTE
thread?

Eduard

On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
 wrote:

> Re: remote signing, I agree that it does not look like a server capability
>> that a client can / should discover. It is more like something that the
>> server instructs / configures the client to do.
>
>
> While a server can control this behavior and instruct the client to use
> remote signing, technically nothing is preventing a client from configuring
> s3.remote-signing-enabled=true. In such a case it seems more
> appropriate to indicate that this capability isn't supported rather than a
> generic 501, because not every server will support remote signing.
>
>
> Good point regarding clients taking initiative and using request signing
> without an explicit server-provided config. It moves the client operations
> into a mode where the server has more control (over having longer term
> client-side credentials), so it looks like a reasonable mode to support
> from the security perspective.
>
> Let's keep that capability flag.
>
> Cheers,
> Dmitri.
>
> On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner <
> [email protected]> wrote:
>
>> Hey everyone,
>>
>> I've added a few inline comments below.
>>
>>
>>
>>> Re: remote signing, I agree that it does not look like a server
>>> capability that a client can / should discover. It is more like something
>>> that the server instructs / configures the client to do.
>>
>>
>> While a server can control this behavior and instruct the client to use
>> remote signing, technically nothing is preventing a client from configuring
>> s3.remote-signing-enabled=true. In such a case it seems more
>> appropriate to indicate that this capability isn't supported rather than a
>> generic 501, because not every server will support remote signing.
>>
>> The *vended-credentials* capability on the other hand is more
>> informative in its nature and a server indeed configures a client. I think
>> that was also one of the reasons I removed this capability but added it
>> later back due to a comment from Jack.
>>
>> I'm ok either way in terms of removing / keeping *vended-credentials* as
>> a capability but given that we'd want to include *actionable* capabilities
>> at this point, I'd just remove it (nothing is preventing us from adding it
>> later if necessary).
>>
>>
>> In that case, why do we need all these other capabilities like tables,
>>> remote-signing, etc. in the first place?
>>
>>
>> Given that capabilities also carry versioning information, clients can
>> make more informed decisions on which endpoints to call. One could argue
>> that generally throwing a 501 on everything that isn't supported might be
>> sufficient, but that doesn't necessarily help a client in knowing which
>> versions of a capability are safe to call/use.
>>
>> Regarding the control of client-side fallback behavior:
>> I think the default fallback behavior should be *tables* (with version
>> 1) with a property in the REST catalog that allows configuring this to e.g.
>> *rest-default-capabilities=tables,views,abc,xyz* (all of them defaulting
>> to version 1).
>>
>>
>> Eduard
>>
>>
>> On Tue, Jul 9, 2024 at 7:00 PM Jack Ye  wrote:
>>
>>> Yes I agree that sounds like a valid use case. So the criteria so far is
>>> that capabilities are used for:
>>> - controlling client-side fallback behavior
>>> - failing expensive operations early if we know it will eventually fail
>>> due to missing capability
>>>
>>> Do we agree if this is the criteria we should use? What about the other
>>> capabilities, namely tables, remote-signing, credential-vending?
>>>
>>> -Jack
>>>
>>>
>>> On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue 
>>> wrote:
>>>
 > does it make a difference if I declare the capability or not?

 I think that it does in other cases. Multi-table commits, for example,
 are a building block for multi-statement transactions. If a service doesn't
 support multi-table commits then we ideally want clients to know that ahead
 of time so that they don't run a big transaction and then fail because the
 commit is not supported.

 On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov
  wrote:

> Re: remote signing, I agree that it does not look like a server
> capability that a client can / should discover. It is more like something
> that the server instructs / configures the client to do.
>
> Cheers,
> Dmitri.
>
> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye  wrote:
>
>> I was reconciling the discussion yesterday, one point that was
>> interesting to me was that we agreed the purpose of these capabilities is
>> to "control client-side fallback behavior", or at least the client should
>> behave differently based on these capabilities. However, this seems to be
>> only needed so far for views, or more specifically, for loadView API only
>> because it impacts the fallback behavi

Re: [DISCUSS] Describing REST Server capabilities

2024-07-10 Thread Dmitri Bourlatchkov
Re: remote signing, I agree that it does not look like a server capability
> that a client can / should discover. It is more like something that the
> server instructs / configures the client to do.


While a server can control this behavior and instruct the client to use
remote signing, technically nothing is preventing a client from configuring
s3.remote-signing-enabled=true. In such a case it seems more appropriate to
indicate that this capability isn't supported rather than a generic 501,
because not every server will support remote signing.


Good point regarding clients taking initiative and using request signing
without an explicit server-provided config. It moves the client operations
into a mode where the server has more control (over having longer term
client-side credentials), so it looks like a reasonable mode to support
from the security perspective.

Let's keep that capability flag.

Cheers,
Dmitri.

On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner 
wrote:

> Hey everyone,
>
> I've added a few inline comments below.
>
>
>
>> Re: remote signing, I agree that it does not look like a server
>> capability that a client can / should discover. It is more like something
>> that the server instructs / configures the client to do.
>
>
> While a server can control this behavior and instruct the client to use
> remote signing, technically nothing is preventing a client from configuring
> s3.remote-signing-enabled=true. In such a case it seems more
> appropriate to indicate that this capability isn't supported rather than a
> generic 501, because not every server will support remote signing.
>
> The *vended-credentials* capability on the other hand is more informative
> in its nature and a server indeed configures a client. I think that was
> also one of the reasons I removed this capability but added it later back
> due to a comment from Jack.
>
> I'm ok either way in terms of removing / keeping *vended-credentials* as
> a capability but given that we'd want to include *actionable* capabilities
> at this point, I'd just remove it (nothing is preventing us from adding it
> later if necessary).
>
>
> In that case, why do we need all these other capabilities like tables,
>> remote-signing, etc. in the first place?
>
>
> Given that capabilities also carry versioning information, clients can
> make more informed decisions on which endpoints to call. One could argue
> that generally throwing a 501 on everything that isn't supported might be
> sufficient, but that doesn't necessarily help a client in knowing which
> versions of a capability are safe to call/use.
>
> Regarding the control of client-side fallback behavior:
> I think the default fallback behavior should be *tables* (with version 1)
> with a property in the REST catalog that allows configuring this to e.g.
> *rest-default-capabilities=tables,views,abc,xyz* (all of them defaulting
> to version 1).
>
>
> Eduard
>
>
> On Tue, Jul 9, 2024 at 7:00 PM Jack Ye  wrote:
>
>> Yes I agree that sounds like a valid use case. So the criteria so far is
>> that capabilities are used for:
>> - controlling client-side fallback behavior
>> - failing expensive operations early if we know it will eventually fail
>> due to missing capability
>>
>> Do we agree if this is the criteria we should use? What about the other
>> capabilities, namely tables, remote-signing, credential-vending?
>>
>> -Jack
>>
>>
>> On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue 
>> wrote:
>>
>>> > does it make a difference if I declare the capability or not?
>>>
>>> I think that it does in other cases. Multi-table commits, for example,
>>> are a building block for multi-statement transactions. If a service doesn't
>>> support multi-table commits then we ideally want clients to know that ahead
>>> of time so that they don't run a big transaction and then fail because the
>>> commit is not supported.
>>>
>>> On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov
>>>  wrote:
>>>
 Re: remote signing, I agree that it does not look like a server
 capability that a client can / should discover. It is more like something
 that the server instructs / configures the client to do.

 Cheers,
 Dmitri.

 On Tue, Jul 9, 2024 at 12:05 PM Jack Ye  wrote:

> I was reconciling the discussion yesterday, one point that was
> interesting to me was that we agreed the purpose of these capabilities is
> to "control client-side fallback behavior", or at least the client should
> behave differently based on these capabilities. However, this seems to be
> only needed so far for views, or more specifically, for loadView API only
> because it impacts the fallback behavior to resolve the identifier as a
> table or not.
>
> For all the other capabilities listed, and even the other endpoints in
> view, because a server can decide to implement it partially anyway and just
> document the behavior, does it make a difference if I declare the
> capabil

Re: [DISCUSS] Describing REST Server capabilities

2024-07-10 Thread Eduard Tudenhöfner
Hey everyone,

I've added a few inline comments below.



> Re: remote signing, I agree that it does not look like a server capability
> that a client can / should discover. It is more like something that the
> server instructs / configures the client to do.


While a server can control this behavior and instruct the client to use
remote signing, technically nothing is preventing a client from configuring
s3.remote-signing-enabled=true. In such a case it seems more appropriate to
indicate that this capability isn't supported rather than a generic 501,
because not every server will support remote signing.

The *vended-credentials* capability on the other hand is more informative
in its nature and a server indeed configures a client. I think that was
also one of the reasons I removed this capability but added it later back
due to a comment from Jack.

I'm ok either way in terms of removing / keeping *vended-credentials* as a
capability but given that we'd want to include *actionable* capabilities at
this point, I'd just remove it (nothing is preventing us from adding it
later if necessary).


In that case, why do we need all these other capabilities like tables,
> remote-signing, etc. in the first place?


Given that capabilities also carry versioning information, clients can make
more informed decisions on which endpoints to call. One could argue that
generally throwing a 501 on everything that isn't supported might be
sufficient, but that doesn't necessarily help a client in knowing which
versions of a capability are safe to call/use.

Regarding the control of client-side fallback behavior:
I think the default fallback behavior should be *tables* (with version 1)
with a property in the REST catalog that allows configuring this to e.g.
*rest-default-capabilities=tables,views,abc,xyz* (all of them defaulting to
version 1).
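
The fallback described above could be sketched roughly as follows. This is only an illustration, assuming the property is a comma-separated string: the function name `default_capabilities` and the dict representation are hypothetical and are not taken from the PR or the REST spec.

```python
# Hypothetical sketch: the helper name and return shape are assumptions.
DEFAULT_CAPABILITIES = "tables"  # used when the property is not set


def default_capabilities(properties: dict[str, str]) -> dict[str, int]:
    """Map each fallback capability name to version 1."""
    raw = properties.get("rest-default-capabilities", DEFAULT_CAPABILITIES)
    # Tolerate whitespace and empty entries such as "tables, views,".
    names = [name.strip() for name in raw.split(",") if name.strip()]
    return {name: 1 for name in names}
```

With no property set this yields only *tables* at version 1; with the example value above it yields *tables*, *views*, *abc* and *xyz*, all at version 1.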


Eduard


On Tue, Jul 9, 2024 at 7:00 PM Jack Ye  wrote:

> Yes I agree that sounds like a valid use case. So the criteria so far is
> that capabilities are used for:
> - controlling client-side fallback behavior
> - failing expensive operations early if we know it will eventually fail
> due to missing capability
>
> Do we agree if this is the criteria we should use? What about the other
> capabilities, namely tables, remote-signing, credential-vending?
>
> -Jack
>
>
> On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue 
> wrote:
>
>> > does it make a difference if I declare the capability or not?
>>
>> I think that it does in other cases. Multi-table commits, for example,
>> are a building block for multi-statement transactions. If a service doesn't
>> support multi-table commits then we ideally want clients to know that ahead
>> of time so that they don't run a big transaction and then fail because the
>> commit is not supported.
>>
>> On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov
>>  wrote:
>>
>>> Re: remote signing, I agree that it does not look like a server
>>> capability that a client can / should discover. It is more like something
>>> that the server instructs / configures the client to do.
>>>
>>> Cheers,
>>> Dmitri.
>>>
>>> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye  wrote:
>>>
 I was reconciling the discussion yesterday, one point that was
 interesting to me was that we agreed the purpose of these capabilities is
 to "control client-side fallback behavior", or at least the client should
 behave differently based on these capabilities. However, this seems to be
 only needed so far for views, or more specifically, for loadView API only
 because it impacts the fallback behavior to resolve the identifier as a
 table or not.

 For all the other capabilities listed, and even the other endpoints in
 view, because a server can decide to implement it partially anyway and just
 document the behavior, does it make a difference if I declare the
 capability or not? The client will not stop the request, the server will
 just error out if it is not supported. Maybe the error is not in the
 expected code or message, but it is still an error. In that case, why do we
 need all these other capabilities like tables, remote-signing, etc. in the
 first place?

 Maybe it is too extreme of a thought, but could anyone help describe
 how the other capabilities could be used beyond potentially returning an
 error earlier?

 -Jack




 On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov
  wrote:

> Hi Eduard,
>
> > I've also added the 501 error to the response of the respective
> endpoints but worth mentioning that *HEAD* / *GET* requests must not
> return a 501
>  (this
> implies that the server impl would e.g. return a *404* in such a
> case).
>
> My reading on the Mozilla page makes me think that it is phrased too
> narrowly. Reading RFC 2616 [1] I believe that it does not preclude
> respon

Re: [DISCUSS] Describing REST Server capabilities

2024-07-09 Thread Jack Ye
Yes I agree that sounds like a valid use case. So the criteria so far is
that capabilities are used for:
- controlling client-side fallback behavior
- failing expensive operations early if we know it will eventually fail due
to missing capability

Do we agree if this is the criteria we should use? What about the other
capabilities, namely tables, remote-signing, credential-vending?

-Jack


On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue 
wrote:

> > does it make a difference if I declare the capability or not?
>
> I think that it does in other cases. Multi-table commits, for example, are
> a building block for multi-statement transactions. If a service doesn't
> support multi-table commits then we ideally want clients to know that ahead
> of time so that they don't run a big transaction and then fail because the
> commit is not supported.
>
> On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov
>  wrote:
>
>> Re: remote signing, I agree that it does not look like a server
>> capability that a client can / should discover. It is more like something
>> that the server instructs / configures the client to do.
>>
>> Cheers,
>> Dmitri.
>>
>> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye  wrote:
>>
>>> I was reconciling the discussion yesterday, one point that was
>>> interesting to me was that we agreed the purpose of these capabilities is
>>> to "control client-side fallback behavior", or at least the client should
>>> behave differently based on these capabilities. However, this seems to be
>>> only needed so far for views, or more specifically, for loadView API only
>>> because it impacts the fallback behavior to resolve the identifier as a
>>> table or not.
>>>
>>> For all the other capabilities listed, and even the other endpoints in
>>> view, because a server can decide to implement it partially anyway and just
>>> document the behavior, does it make a difference if I declare the
>>> capability or not? The client will not stop the request, the server will
>>> just error out if it is not supported. Maybe the error is not in the
>>> expected code or message, but it is still an error. In that case, why do we
>>> need all these other capabilities like tables, remote-signing, etc. in the
>>> first place?
>>>
>>> Maybe it is too extreme of a thought, but could anyone help describe how
>>> the other capabilities could be used beyond potentially returning an error
>>> earlier?
>>>
>>> -Jack
>>>
>>>
>>>
>>>
>>> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov
>>>  wrote:
>>>
 Hi Eduard,

 > I've also added the 501 error to the response of the respective
 endpoints but worth mentioning that *HEAD* / *GET* requests must not
 return a 501
  (this
 implies that the server impl would e.g. return a *404* in such a case).

 My reading on the Mozilla page makes me think that it is phrased too
 narrowly. Reading RFC 2616 [1] I believe that it does not preclude
 responding with 501 to GET and HEAD requests. I think it means that GET and
 HEAD methods must be supported by "general purpose" servers. The Iceberg
 REST server is not a general purpose server for resources. So, I think it
 should be fine to respond with 501 to unimplemented endpoints.

 Cheers,
 Dmitri.

 [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1

 On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner wrote:

> Hey everyone,
>
> I watched the catalog sync recording today and updated the PR
>  to remove fine-grained
> capabilities like *register-table / table-metrics*.
>
> The current capabilities (with versioning information) in the PR are:
>
>- tables
>- views
>- remote-signing
>- vended-credentials
>- multi-table-commit
>
> For servers that only *partially* implement endpoints under a
> capability the spec requires the server to throw a *501 Not
> Implemented*. I've also added the 501 error to the response of the
> respective endpoints but worth mentioning that *HEAD* / *GET* requests
> must not return a 501
>  (this
> implies that the server impl would e.g. return a *404* in such a
> case).
>
>
> Regards
> Eduard
>
> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré 
> wrote:
>
>> Hi Eduard,
>>
>> It makes sense to return 501 for servers which don't implement all
>> endpoints. It means that the server will at least have to implement
>> empty endpoints if needed (that makes sense to me).
>>
>> I think we should focus on only "identified capabilities". I think
>> that I proposed before that the capabilities can be
>> overridden/provided by server implementation. Else, I'm afraid we
>> won't be flexibl

Re: [DISCUSS] Describing REST Server capabilities

2024-07-09 Thread Ryan Blue
> does it make a difference if I declare the capability or not?

I think that it does in other cases. Multi-table commits, for example, are
a building block for multi-statement transactions. If a service doesn't
support multi-table commits then we ideally want clients to know that ahead
of time so that they don't run a big transaction and then fail because the
commit is not supported.
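
A rough sketch of that "fail early" idea, assuming the client already holds the set of capability names the server declared. `MissingCapabilityError` and `run_transaction` are hypothetical names used for illustration, not an existing client API.

```python
# Hypothetical sketch: names and signatures are assumptions.
class MissingCapabilityError(Exception):
    """Raised before any work starts when a required capability is absent."""


def run_transaction(server_capabilities, statements, commit):
    # Check up front, so the expensive statements never run in vain.
    if "multi-table-commit" not in server_capabilities:
        raise MissingCapabilityError("server does not declare multi-table-commit")
    results = [statement() for statement in statements]
    return commit(results)
```

Without the declared capability, the client could only find out at commit time, after all statements have already run.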

On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov
 wrote:

> Re: remote signing, I agree that it does not look like a server capability
> that a client can / should discover. It is more like something that the
> server instructs / configures the client to do.
>
> Cheers,
> Dmitri.
>
> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye  wrote:
>
>> I was reconciling the discussion yesterday, one point that was
>> interesting to me was that we agreed the purpose of these capabilities is
>> to "control client-side fallback behavior", or at least the client should
>> behave differently based on these capabilities. However, this seems to be
>> only needed so far for views, or more specifically, for loadView API only
>> because it impacts the fallback behavior to resolve the identifier as a
>> table or not.
>>
>> For all the other capabilities listed, and even the other endpoints in
>> view, because a server can decide to implement it partially anyway and just
>> document the behavior, does it make a difference if I declare the
>> capability or not? The client will not stop the request, the server will
>> just error out if it is not supported. Maybe the error is not in the
>> expected code or message, but it is still an error. In that case, why do we
>> need all these other capabilities like tables, remote-signing, etc. in the
>> first place?
>>
>> Maybe it is too extreme of a thought, but could anyone help describe how
>> the other capabilities could be used beyond potentially returning an error
>> earlier?
>>
>> -Jack
>>
>>
>>
>>
>> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov
>>  wrote:
>>
>>> Hi Eduard,
>>>
>>> > I've also added the 501 error to the response of the respective
>>> endpoints but worth mentioning that *HEAD* / *GET* requests must not
>>> return a 501
>>>  (this
>>> implies that the server impl would e.g. return a *404* in such a case).
>>>
>>> My reading on the Mozilla page makes me think that it is phrased too
>>> narrowly. Reading RFC 2616 [1] I believe that it does not preclude
>>> responding with 501 to GET and HEAD requests. I think it means that GET and
>>> HEAD methods must be supported by "general purpose" servers. The Iceberg
>>> REST server is not a general purpose server for resources. So, I think it
>>> should be fine to respond with 501 to unimplemented endpoints.
>>>
>>> Cheers,
>>> Dmitri.
>>>
>>> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1
>>>
>>> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner wrote:
>>>
 Hey everyone,

 I watched the catalog sync recording today and updated the PR
  to remove fine-grained
 capabilities like *register-table / table-metrics*.

 The current capabilities (with versioning information) in the PR are:

- tables
- views
- remote-signing
- vended-credentials
- multi-table-commit

 For servers that only *partially* implement endpoints under a
 capability the spec requires the server to throw a *501 Not
 Implemented*. I've also added the 501 error to the response of the
 respective endpoints but worth mentioning that *HEAD* / *GET* requests must
 not return a 501
  (this
 implies that the server impl would e.g. return a *404* in such a case).


 Regards
 Eduard

 On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré 
 wrote:

> Hi Eduard,
>
> It makes sense to return 501 for servers which don't implement all
> endpoints. It means that the server will at least have to implement
> empty endpoints if needed (that makes sense to me).
>
> I think we should focus on only "identified capabilities". I think
> that I proposed before that the capabilities can be
> overridden/provided by server implementation. Else, I'm afraid we
> won't be flexible enough or always behind the implementation (if an
> implementation wants to add "my-foo-cap").
>
> Regards
> JB
>
> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner
>  wrote:
> >
> > I have clarified the wording in #9940 around the requirement on
> having to implement all endpoints under a particular capability.
> >
> > For servers that only partially implement endpoints under a
> capability the spec requires the server to throw a 501 Not Implemented.
> This was suggested by Jack and it seems reasonable 

Re: [DISCUSS] Describing REST Server capabilities

2024-07-09 Thread Dmitri Bourlatchkov
Re: remote signing, I agree that it does not look like a server capability
that a client can / should discover. It is more like something that the
server instructs / configures the client to do.

Cheers,
Dmitri.

On Tue, Jul 9, 2024 at 12:05 PM Jack Ye  wrote:

> I was reconciling the discussion yesterday, one point that was interesting
> to me was that we agreed the purpose of these capabilities is to "control
> client-side fallback behavior", or at least the client should behave
> differently based on these capabilities. However, this seems to be only
> needed so far for views, or more specifically, for loadView API only
> because it impacts the fallback behavior to resolve the identifier as a
> table or not.
>
> For all the other capabilities listed, and even the other endpoints in
> view, because a server can decide to implement it partially anyway and just
> document the behavior, does it make a difference if I declare the
> capability or not? The client will not stop the request, the server will
> just error out if it is not supported. Maybe the error is not in the
> expected code or message, but it is still an error. In that case, why do we
> need all these other capabilities like tables, remote-signing, etc. in the
> first place?
>
> Maybe it is too extreme of a thought, but could anyone help describe how
> the other capabilities could be used beyond potentially returning an error
> earlier?
>
> -Jack
>
>
>
>
> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov
>  wrote:
>
>> Hi Eduard,
>>
>> > I've also added the 501 error to the response of the respective
>> endpoints but worth mentioning that *HEAD* / *GET* requests must not
>> return a 501
>>  (this
>> implies that the server impl would e.g. return a *404* in such a case).
>>
>> My reading on the Mozilla page makes me think that it is phrased too
>> narrowly. Reading RFC 2616 [1] I believe that it does not preclude
>> responding with 501 to GET and HEAD requests. I think it means that GET and
>> HEAD methods must be supported by "general purpose" servers. The Iceberg
>> REST server is not a general purpose server for resources. So, I think it
>> should be fine to respond with 501 to unimplemented endpoints.
>>
>> Cheers,
>> Dmitri.
>>
>> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1
>>
>> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner wrote:
>>
>>> Hey everyone,
>>>
>>> I watched the catalog sync recording today and updated the PR
>>>  to remove fine-grained
>>> capabilities like *register-table / table-metrics*.
>>>
>>> The current capabilities (with versioning information) in the PR are:
>>>
>>>- tables
>>>- views
>>>- remote-signing
>>>- vended-credentials
>>>- multi-table-commit
>>>
>>> For servers that only *partially* implement endpoints under a
>>> capability the spec requires the server to throw a *501 Not Implemented*.
>>> I've also added the 501 error to the response of the respective endpoints
>>> but worth mentioning that *HEAD* / *GET* requests must not return a 501
>>>  (this
>>> implies that the server impl would e.g. return a *404* in such a case).
>>>
>>>
>>> Regards
>>> Eduard
>>>
>>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré 
>>> wrote:
>>>
 Hi Eduard,

 It makes sense to return 501 for servers which don't implement all
 endpoints. It means that the server will at least have to implement
 empty endpoints if needed (that makes sense to me).

 I think we should focus on only "identified capabilities". I think
 that I proposed before that the capabilities can be
 overridden/provided by server implementation. Else, I'm afraid we
 won't be flexible enough or always behind the implementation (if an
 implementation wants to add "my-foo-cap").

 Regards
 JB

 On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner
  wrote:
 >
 > I have clarified the wording in #9940 around the requirement on
 having to implement all endpoints under a particular capability.
 >
 > For servers that only partially implement endpoints under a
 capability the spec requires the server to throw a 501 Not Implemented.
 This was suggested by Jack and it seems reasonable to do that.
 >
 > Regarding the inclusion of table-spec / view-spec as a capability: I
 think this might make sense for the next iteration of the REST spec but as
 I mentioned earlier I don't see any clear benefit for the current REST spec
 as the client wouldn't do anything with that information.
 > If there is a clear benefit of having this, then this can still be
 added later to the current REST spec but I believe we should rather have a
 few well-defined and actionable capabilities rather than too many.
 >
 > Eduard
 >

Re: [DISCUSS] Describing REST Server capabilities

2024-07-09 Thread Jack Ye
I was reconciling the discussion yesterday, one point that was interesting
to me was that we agreed the purpose of these capabilities is to "control
client-side fallback behavior", or at least the client should behave
differently based on these capabilities. However, this seems to be only
needed so far for views, or more specifically, for loadView API only
because it impacts the fallback behavior to resolve the identifier as a
table or not.
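
To make the loadView case concrete, here is one possible reading of that fallback, as a sketch only. The lookup order (table first, then view) and the names `resolve`, `load_table` and `load_view` are assumptions for illustration; actual clients may resolve identifiers differently.

```python
# Hypothetical sketch: lookup order and function names are assumptions.
def resolve(identifier, capabilities, load_table, load_view):
    """Return ("table", obj), ("view", obj), or None if nothing matches."""
    table = load_table(identifier)
    if table is not None:
        return ("table", table)
    # Only call loadView when the server declares view support.
    if "views" in capabilities:
        view = load_view(identifier)
        if view is not None:
            return ("view", view)
    return None
```

Here the capability changes what the client does (it skips the second lookup entirely), which is different from merely surfacing an error sooner.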

For all the other capabilities listed, and even the other endpoints in
view, because a server can decide to implement it partially anyway and just
document the behavior, does it make a difference if I declare the
capability or not? The client will not stop the request, the server will
just error out if it is not supported. Maybe the error is not in the
expected code or message, but it is still an error. In that case, why do we
need all these other capabilities like tables, remote-signing, etc. in the
first place?

Maybe it is too extreme of a thought, but could anyone help describe how
the other capabilities could be used beyond potentially returning an error
earlier?

-Jack




On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov
 wrote:

> Hi Eduard,
>
> > I've also added the 501 error to the response of the respective
> endpoints but worth mentioning that *HEAD* / *GET* requests must not
> return a 501
>  (this
> implies that the server impl would e.g. return a *404* in such a case).
>
> My reading on the Mozilla page makes me think that it is phrased too
> narrowly. Reading RFC 2616 [1] I believe that it does not preclude
> responding with 501 to GET and HEAD requests. I think it means that GET and
> HEAD methods must be supported by "general purpose" servers. The Iceberg
> REST server is not a general purpose server for resources. So, I think it
> should be fine to respond with 501 to unimplemented endpoints.
>
> Cheers,
> Dmitri.
>
> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1
>
> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner wrote:
>
>> Hey everyone,
>>
>> I watched the catalog sync recording today and updated the PR
>>  to remove fine-grained
>> capabilities like *register-table / table-metrics*.
>>
>> The current capabilities (with versioning information) in the PR are:
>>
>>- tables
>>- views
>>- remote-signing
>>- vended-credentials
>>- multi-table-commit
>>
>> For servers that only *partially* implement endpoints under a capability
>> the spec requires the server to throw a *501 Not Implemented*. I've also
>> added the 501 error to the response of the respective endpoints but worth
>> mentioning that *HEAD* / *GET* requests must not return a 501
>>  (this
>> implies that the server impl would e.g. return a *404* in such a case).
>>
>>
>> Regards
>> Eduard
>>
>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré 
>> wrote:
>>
>>> Hi Eduard,
>>>
>>> It makes sense to return 501 for servers which don't implement all
>>> endpoints. It means that the server will at least have to implement
>>> empty endpoints if needed (that makes sense to me).
>>>
>>> I think we should focus on only "identified capabilities". I think
>>> that I proposed before that the capabilities can be
>>> overridden/provided by server implementation. Else, I'm afraid we
>>> won't be flexible enough or always behind the implementation (if an
>>> implementation wants to add "my-foo-cap").
>>>
>>> Regards
>>> JB
>>>
>>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner
>>>  wrote:
>>> >
>>> > I have clarified the wording in #9940 around the requirement on having
>>> to implement all endpoints under a particular capability.
>>> >
>>> > For servers that only partially implement endpoints under a capability
>>> the spec requires the server to throw a 501 Not Implemented. This was
>>> suggested by Jack and it seems reasonable to do that.
>>> >
>>> > Regarding the inclusion of table-spec / view-spec as a capability: I
>>> think this might make sense for the next iteration of the REST spec but as
>>> I mentioned earlier I don't see any clear benefit for the current REST spec
>>> as the client wouldn't do anything with that information.
>>> > If there is a clear benefit of having this, then this can still be
>>> added later to the current REST spec but I believe we should rather have a
>>> few well-defined and actionable capabilities rather than too many.
>>> >
>>> > Eduard
>>> >
>>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu 
>>> wrote:
>>> >>>
>>> >>> Spec is an interesting topic we did not discuss. Robert, how do you
>>> envision this to be used?
>>> >>> In my mind, if a new table format v3 is launched, there are 2
>>> approaches we can go with, taking CreateTable as an example:
>>> >>> (1) increment the related operation version, which means that POST
>>> /v2/{prefi

Re: [DISCUSS] Describing REST Server capabilities

2024-07-09 Thread Dmitri Bourlatchkov
Hi Eduard,

> I've also added the 501 error to the response of the respective endpoints
but worth mentioning that *HEAD* / *GET* requests must not return a 501
 (this
implies that the server impl would e.g. return a *404* in such a case).

My reading on the Mozilla page makes me think that it is phrased too
narrowly. Reading RFC 2616 [1] I believe that it does not preclude
responding with 501 to GET and HEAD requests. I think it means that GET and
HEAD methods must be supported by "general purpose" servers. The Iceberg
REST server is not a general purpose server for resources. So, I think it
should be fine to respond with 501 to unimplemented endpoints.
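
As a small sketch of what I mean, assuming a server that keeps a set of implemented (method, route) pairs; the `status_for` helper and the route set are hypothetical and only illustrate the idea:

```python
# Hypothetical sketch: the helper and the route set are assumptions.
from http import HTTPStatus

IMPLEMENTED = {
    ("GET", "/v1/config"),
    ("GET", "/v1/{prefix}/namespaces"),
}


def status_for(method, route, implemented=IMPLEMENTED):
    """Answer 501 for any unimplemented endpoint, regardless of method."""
    if (method, route) in implemented:
        return HTTPStatus.OK
    return HTTPStatus.NOT_IMPLEMENTED  # 501, also for GET and HEAD
```

A GET on an unimplemented view endpoint would then get a 501 rather than a 404, so the client can tell "not supported" apart from "does not exist".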

Cheers,
Dmitri.

[1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1

On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner 
wrote:

> Hey everyone,
>
> I watched the catalog sync recording today and updated the PR
>  to remove fine-grained
> capabilities like *register-table / table-metrics*.
>
> The current capabilities (with versioning information) in the PR are:
>
>- tables
>- views
>- remote-signing
>- vended-credentials
>- multi-table-commit
>
> For servers that only *partially* implement endpoints under a capability
> the spec requires the server to throw a *501 Not Implemented*. I've also
> added the 501 error to the response of the respective endpoints but worth
> mentioning that *HEAD* / *GET* requests must not return a 501
>  (this
> implies that the server impl would e.g. return a *404* in such a case).
>
>
> Regards
> Eduard
>
> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré 
> wrote:
>
>> Hi Eduard,
>>
>> It makes sense to return 501 for servers which don't implement all
>> endpoints. It means that the server will at least have to implement
>> empty endpoints if needed (that makes sense to me).
>>
>> I think we should focus on only "identified capabilities". I think
>> that I proposed before that the capabilities can be
>> overridden/provided by server implementation. Else, I'm afraid we
>> won't be flexible enough or always behind the implementation (if an
>> implementation wants to add "my-foo-cap").
>>
>> Regards
>> JB
>>
>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner
>>  wrote:
>> >
>> > I have clarified the wording in #9940 around the requirement on having
>> to implement all endpoints under a particular capability.
>> >
>> > For servers that only partially implement endpoints under a capability
>> the spec requires the server to throw a 501 Not Implemented. This was
>> suggested by Jack and it seems reasonable to do that.
>> >
>> > Regarding the inclusion of table-spec / view-spec as a capability: I
>> think this might make sense for the next iteration of the REST spec but as
>> I mentioned earlier I don't see any clear benefit for the current REST spec
>> as the client wouldn't do anything with that information.
>> > If there is a clear benefit of having this, then this can still be
>> added later to the current REST spec but I believe we should rather have a
>> few well-defined and actionable capabilities rather than too many.
>> >
>> > Eduard
>> >
>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu 
>> wrote:
>> >>>
>> >>> Spec is an interesting topic we did not discuss. Robert, how do you
>> envision this to be used?
>> >>> In my mind, if a new table format v3 is launched, there are 2
>> approaches we can go with, taking CreateTable as an example:
>> >>> (1) increment the related operation version, which means that POST
>> /v2/{prefix}/namespaces/{ns}/tables will be created and allow creating
>> tables in the v3 version.
>> >>> (2) update the existing table metadata model to support both v2 and
>> v3 fields, and the server enforces the payload differently based on the
>> TableMetadata.format-version field. If the server does not support v3, it
>> can return unsupported at that time.
>> >>> Either way we go, the table-spec version does not need to be a
>> capability. (1) seems to be cleaner, but has some overhead in provisioning
>> a new endpoint compared to (2).
>> >>> Do you see another way to do this leveraging the table-spec version?
>> >>
>> >>
>> >> 2 is cleaner but maybe inconsistent with current behavior, since
>> /v1/tables operation supports both v1 and v3. We should only go to 2 only
>> when we have incompatible fields/break changes according to discussion.
>> >>
>> >> Generally I agree with adding table-spec into capabilities. For
>> example, we can expose this to user in api so that user could choose a
>> supported table format version without throwing exception.
>> >>
>> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye  wrote:
>> >>>
>> >>> Spec is an interesting topic we did not discuss. Robert, how do you
>> envision this to be used?
>> >>>
>> >>> In my mind, if a new table format v3 is launched, there are 2
>> approaches we can go

Re: [DISCUSS] Describing REST Server capabilities

2024-07-09 Thread Eduard Tudenhöfner
Hey everyone,

I watched the catalog sync recording today and updated the PR to remove
fine-grained capabilities like *register-table / table-metrics*.

The current capabilities (with versioning information) in the PR are:

   - tables
   - views
   - remote-signing
   - vended-credentials
   - multi-table-commit

For servers that only *partially* implement endpoints under a capability
the spec requires the server to throw a *501 Not Implemented*. I've also
added the 501 error to the response of the respective endpoints but worth
mentioning that *HEAD* / *GET* requests must not return a 501 (this
implies that the server impl would e.g. return a *404* in such a case).


Regards
Eduard

On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré  wrote:

> Hi Eduard,
>
> It makes sense to return 501 for servers which don't implement all
> endpoints. It means that the server will at least have to implement
> empty endpoints if needed (that makes sense to me).
>
> I think we should focus on only "identified capabilities". I think
> that I proposed before that the capabilities can be
> overridden/provided by server implementation. Else, I'm afraid we
> won't be flexible enough or always behind the implementation (if an
> implementation wants to add "my-foo-cap").
>
> Regards
> JB
>
> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner
>  wrote:
> >
> > I have clarified the wording in #9940 around the requirement on having
> to implement all endpoints under a particular capability.
> >
> > For servers that only partially implement endpoints under a capability
> the spec requires the server to throw a 501 Not Implemented. This was
> suggested by Jack and it seems reasonable to do that.
> >
> > Regarding the inclusion of table-spec / view-spec as a capability: I
> think this might make sense for the next iteration of the REST spec but as
> I mentioned earlier I don't see any clear benefit for the current REST spec
> as the client wouldn't do anything with that information.
> > If there is a clear benefit of having this, then this can still be added
> later to the current REST spec but I believe we should rather have a few
> well-defined and actionable capabilities rather than too many.
> >
> > Eduard
> >
> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu 
> wrote:
> >>>
> >>> Spec is an interesting topic we did not discuss. Robert, how do you
> envision this to be used?
> >>> In my mind, if a new table format v3 is launched, there are 2
> approaches we can go with, taking CreateTable as an example:
> >>> (1) increment the related operation version, which means that POST
> /v2/{prefix}/namespaces/{ns}/tables will be created and allow creating
> tables in the v3 version.
> >>> (2) update the existing table metadata model to support both v2 and v3
> fields, and the server enforces the payload differently based on the
> TableMetadata.format-version field. If the server does not support v3, it
> can return unsupported at that time.
> >>> Either way we go, the table-spec version does not need to be a
> capability. (1) seems to be cleaner, but has some overhead in provisioning
> a new endpoint compared to (2).
> >>> Do you see another way to do this leveraging the table-spec version?
> >>
> >>
> >> 2 is cleaner but maybe inconsistent with current behavior, since
> /v1/tables operation supports both v1 and v3. We should only go to 2 only
> when we have incompatible fields/break changes according to discussion.
> >>
> >> Generally I agree with adding table-spec into capabilities. For
> example, we can expose this to user in api so that user could choose a
> supported table format version without throwing exception.
> >>
> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye  wrote:
> >>>
> >>> Spec is an interesting topic we did not discuss. Robert, how do you
> envision this to be used?
> >>>
> >>> In my mind, if a new table format v3 is launched, there are 2
> approaches we can go with, taking CreateTable as an example:
> >>>
> >>> (1) increment the related operation version, which means that POST
> /v2/{prefix}/namespaces/{ns}/tables will be created and allow creating
> tables in the v3 version.
> >>>
> >>> (2) update the existing table metadata model to support both v2 and v3
> fields, and the server enforces the payload differently based on the
> TableMetadata.format-version field. If the server does not support v3, it
> can return unsupported at that time.
> >>>
> >>> Either way we go, the table-spec version does not need to be a
> capability. (1) seems to be cleaner, but has some overhead in provisioning
> a new endpoint compared to (2).
> >>>
> >>> Do you see another way to do this leveraging the table-spec version?
> >>>
> >>> -Jack
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner
>  wrote:
> 
> 
>  I couldn't make it to the catalog sync meeting yesterday but I
> watched the recordin

Re: [DISCUSS] Describing REST Server capabilities

2024-07-04 Thread Jean-Baptiste Onofré
Hi Eduard,

It makes sense to return 501 for servers which don't implement all
endpoints. It means that the server will at least have to implement
empty endpoints if needed (that makes sense to me).

I think we should focus on only "identified capabilities". I think I
proposed before that capabilities can be overridden/provided by the
server implementation. Otherwise, I'm afraid we won't be flexible
enough, or we will always be behind the implementations (e.g. if an
implementation wants to add "my-foo-cap").

Regards
JB

On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner
 wrote:
>
> I have clarified the wording in #9940 around the requirement on having to 
> implement all endpoints under a particular capability.
>
> For servers that only partially implement endpoints under a capability the 
> spec requires the server to throw a 501 Not Implemented. This was suggested 
> by Jack and it seems reasonable to do that.
>
> Regarding the inclusion of table-spec / view-spec as a capability: I think 
> this might make sense for the next iteration of the REST spec but as I 
> mentioned earlier I don't see any clear benefit for the current REST spec as 
> the client wouldn't do anything with that information.
> If there is a clear benefit of having this, then this can still be added 
> later to the current REST spec but I believe we should rather have a few 
> well-defined and actionable capabilities rather than too many.
>
> Eduard
>
> On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu  wrote:
>>>
>>> Spec is an interesting topic we did not discuss. Robert, how do you 
>>> envision this to be used?
>>> In my mind, if a new table format v3 is launched, there are 2 approaches we 
>>> can go with, taking CreateTable as an example:
>>> (1) increment the related operation version, which means that POST 
>>> /v2/{prefix}/namespaces/{ns}/tables will be created and allow creating 
>>> tables in the v3 version.
>>> (2) update the existing table metadata model to support both v2 and v3 
>>> fields, and the server enforces the payload differently based on the 
>>> TableMetadata.format-version field. If the server does not support v3, it 
>>> can return unsupported at that time.
>>> Either way we go, the table-spec version does not need to be a capability. 
>>> (1) seems to be cleaner, but has some overhead in provisioning a new 
>>> endpoint compared to (2).
>>> Do you see another way to do this leveraging the table-spec version?
>>
>>
>> 2 is cleaner but maybe inconsistent with current behavior, since /v1/tables 
>> operation supports both v1 and v3. We should only go to 2 only when we have 
>> incompatible fields/break changes according to discussion.
>>
>> Generally I agree with adding table-spec into capabilities. For example, we 
>> can expose this to user in api so that user could choose a supported table 
>> format version without throwing exception.
>>
>> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye  wrote:
>>>
>>> Spec is an interesting topic we did not discuss. Robert, how do you 
>>> envision this to be used?
>>>
>>> In my mind, if a new table format v3 is launched, there are 2 approaches we 
>>> can go with, taking CreateTable as an example:
>>>
>>> (1) increment the related operation version, which means that POST 
>>> /v2/{prefix}/namespaces/{ns}/tables will be created and allow creating 
>>> tables in the v3 version.
>>>
>>> (2) update the existing table metadata model to support both v2 and v3 
>>> fields, and the server enforces the payload differently based on the 
>>> TableMetadata.format-version field. If the server does not support v3, it 
>>> can return unsupported at that time.
>>>
>>> Either way we go, the table-spec version does not need to be a capability. 
>>> (1) seems to be cleaner, but has some overhead in provisioning a new 
>>> endpoint compared to (2).
>>>
>>> Do you see another way to do this leveraging the table-spec version?
>>>
>>> -Jack
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner 
>>>  wrote:


 I couldn't make it to the catalog sync meeting yesterday but I watched the 
 recording today (thanks for providing that).

> The missing piece is how (new, capabilities-aware) clients handle the 
> case when a service does _not_ return the capabilities field (absent). My 
> proposal would be that a client should in this case assume that all 
> _currently_ existing capabilities are supported.
>
> - tables: [1]
> - views: [1]
> - remote-signing: [1]
> - multi-table-commit: [1]
> - register-table: [1]
> - table-metrics: [1]
> - table-spec: [1,2]
> - view-spec: [1,2]
>
>
 The one thing I would like to add here is that the current PR uses the 
 tables capability (as version 1) as the default when a server doesn't 
 return capabilities but it might be also ok to include views (as version 
 1) because the current client impl has some code to deal with errors in 
 case endpoints don't exist.

 Unless we

Re: [DISCUSS] Describing REST Server capabilities

2024-07-04 Thread Robert Stupp


On 04.07.24 10:32, Eduard Tudenhöfner wrote:
For servers that only *partially* implement endpoints under a 
capability the spec requires the server to throw a *501 Not 
Implemented*. This was suggested by Jack and it seems reasonable to do 
that.


SGTM


Regarding the inclusion of table-spec / *view-spec *as a capability: I 
think this might make sense for the next iteration of the REST spec 
but as I mentioned earlier I don't see any clear benefit for the 
current REST spec as the client wouldn't do anything with that 
information.


It's IMO better to add those now. Omitting them would force (future)
clients to guess the "right values" when talking to older REST
services. It's not much effort to add those now, but leaving them out
can cause a lot of pain in the future.




On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu  wrote:

Spec is an interesting topic we did not discuss. Robert, how
do you envision this to be used?
In my mind, if a new table format v3 is launched, there are 2
approaches we can go with, taking CreateTable as an example:
(1) increment the related operation version, which means that
POST /v2/{prefix}/namespaces/{ns}/tables will be created and
allow creating tables in the v3 version.

I think this is mixing REST endpoint versioning with payload/spec 
versioning, which are very different things IMO




(2) update the existing table metadata model to support both
v2 and v3 fields, and the server enforces the payload
differently based on the TableMetadata.format-version field.
If the server does not support v3, it can return unsupported
at that time.
Either way we go, the table-spec version does not need to be a
capability. (1) seems to be cleaner, but has some overhead in
provisioning a new endpoint compared to (2).
Do you see another way to do this leveraging the table-spec
version?


2 is cleaner but maybe inconsistent with current behavior, since
/v1/tables operation supports both v1 and v3. We should only go to
2 only when we have incompatible fields/break changes according to
discussion.

Generally I agree with adding table-spec into capabilities. For
example, we can expose this to user in api so that user could
choose a supported table format version without throwing exception.

On Wed, Jul 3, 2024 at 12:18 AM Jack Ye  wrote:

Spec is an interesting topic we did not discuss. Robert, how
do you envision this to be used?

In my mind, if a new table format v3 is launched, there are 2
approaches we can go with, taking CreateTable as an example:

(1) increment the related operation version, which means that
POST /v2/{prefix}/namespaces/{ns}/tables will be created and
allow creating tables in the v3 version.

(2) update the existing table metadata model to support both
v2 and v3 fields, and the server enforces the payload
differently based on the TableMetadata.format-version field.
If the server does not support v3, it can return unsupported
at that time.

Either way we go, the table-spec version does not need to be a
capability. (1) seems to be cleaner, but has some overhead in
provisioning a new endpoint compared to (2).

Do you see another way to do this leveraging the table-spec
version?

-Jack





On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner
 wrote:


I couldn't make it to the catalog sync meeting yesterday
but I watched the recording today (thanks for providing that).

The missing piece is how (new, capabilities-aware)
clients handle the case when a service does _not_
return the capabilities field (absent). My proposal
would be that a client should in this case assume that
all _currently_ existing capabilities are supported.

- tables: [1]
- views: [1]
- remote-signing: [1]
- multi-table-commit: [1]
- register-table: [1]
- table-metrics: [1]
- table-spec: [1,2]
- view-spec: [1,2]




The one thing I would like to add here is that the current
PR uses the *tables* capability (as version 1) as the
default when a server doesn't return *capabilities *but it
might be also ok to include *views *(as version 1) because
the current client impl has /some/ code to deal with
errors in case endpoints don't exist.

Unless we agree that the currently existing functionality
in the REST spec is the *default* behavior to be assumed
for older server, I'm not sure about including
*remote-signing / multi-table-commit / register-table /

Re: [DISCUSS] Describing REST Server capabilities

2024-07-04 Thread Eduard Tudenhöfner
I have clarified the wording in #9940 around the requirement on having to
implement all endpoints under a particular capability.

For servers that only *partially* implement endpoints under a capability
the spec requires the server to throw a *501 Not Implemented*. This was
suggested by Jack and it seems reasonable to do that.

Regarding the inclusion of *table-spec / view-spec* as a capability: I
think this might make sense for the next iteration of the REST spec but as
I mentioned earlier I don't see any clear benefit for the current REST spec
as the client wouldn't do anything with that information.
If there is a clear benefit of having this, then this can still be added
later to the current REST spec but I believe we should rather have a few
well-defined and actionable capabilities rather than too many.

Eduard

On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu  wrote:

> Spec is an interesting topic we did not discuss. Robert, how do you
>> envision this to be used?
>> In my mind, if a new table format v3 is launched, there are 2 approaches
>> we can go with, taking CreateTable as an example:
>> (1) increment the related operation version, which means that POST
>> /v2/{prefix}/namespaces/{ns}/tables will be created and allow creating
>> tables in the v3 version.
>> (2) update the existing table metadata model to support both v2 and v3
>> fields, and the server enforces the payload differently based on the
>> TableMetadata.format-version field. If the server does not support v3, it
>> can return unsupported at that time.
>> Either way we go, the table-spec version does not need to be a
>> capability. (1) seems to be cleaner, but has some overhead in provisioning
>> a new endpoint compared to (2).
>> Do you see another way to do this leveraging the table-spec version?
>
>
> 2 is cleaner but maybe inconsistent with current behavior, since
> /v1/tables operation supports both v1 and v3. We should only go to 2 only
> when we have incompatible fields/break changes according to discussion.
>
> Generally I agree with adding table-spec into capabilities. For example,
> we can expose this to user in api so that user could choose a supported
> table format version without throwing exception.
>
> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye  wrote:
>
>> Spec is an interesting topic we did not discuss. Robert, how do you
>> envision this to be used?
>>
>> In my mind, if a new table format v3 is launched, there are 2 approaches
>> we can go with, taking CreateTable as an example:
>>
>> (1) increment the related operation version, which means that POST
>> /v2/{prefix}/namespaces/{ns}/tables will be created and allow creating
>> tables in the v3 version.
>>
>> (2) update the existing table metadata model to support both v2 and v3
>> fields, and the server enforces the payload differently based on the
>> TableMetadata.format-version field. If the server does not support v3, it
>> can return unsupported at that time.
>>
>> Either way we go, the table-spec version does not need to be a
>> capability. (1) seems to be cleaner, but has some overhead in provisioning
>> a new endpoint compared to (2).
>>
>> Do you see another way to do this leveraging the table-spec version?
>>
>> -Jack
>>
>>
>>
>>
>>
>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner
>>  wrote:
>>
>>>
>>> I couldn't make it to the catalog sync meeting yesterday but I watched
>>> the recording today (thanks for providing that).
>>>
>>> The missing piece is how (new, capabilities-aware) clients handle the
 case when a service does _not_ return the capabilities field (absent). My
 proposal would be that a client should in this case assume that all
 _currently_ existing capabilities are supported.

 - tables: [1]
 - views: [1]
 - remote-signing: [1]
 - multi-table-commit: [1]
 - register-table: [1]
 - table-metrics: [1]
 - table-spec: [1,2]
 - view-spec: [1,2]


 The one thing I would like to add here is that the current PR uses the
>>> *tables* capability (as version 1) as the default when a server doesn't
>>> return *capabilities *but it might be also ok to include *views *(as
>>> version 1) because the current client impl has *some* code to deal with
>>> errors in case endpoints don't exist.
>>>
>>> Unless we agree that the currently existing functionality in the REST
>>> spec is the *default* behavior to be assumed for older server, I'm not
>>> sure about including *remote-signing / multi-table-commit /
>>> register-table / table-metrics* as it has been indicated in earlier
>>> comments on the PR/ML that not every REST server supports these.
>>>
>>> That being said, we should discuss whether we want the *default*
>>> behavior (when an older server doesn't send back *capabilities*) to be
>>> a) *tables* (version 1) only
>>> b) the currently existing functionality as defined in the REST spec (as
>>> version 1)
>>>
>>>
>>> On another note: Including *table-spec / vie

Re: [DISCUSS] Describing REST Server capabilities

2024-07-02 Thread Renjie Liu
>
> Spec is an interesting topic we did not discuss. Robert, how do you
> envision this to be used?
> In my mind, if a new table format v3 is launched, there are 2 approaches
> we can go with, taking CreateTable as an example:
> (1) increment the related operation version, which means that POST
> /v2/{prefix}/namespaces/{ns}/tables will be created and allow creating
> tables in the v3 version.
> (2) update the existing table metadata model to support both v2 and v3
> fields, and the server enforces the payload differently based on the
> TableMetadata.format-version field. If the server does not support v3, it
> can return unsupported at that time.
> Either way we go, the table-spec version does not need to be a capability.
> (1) seems to be cleaner, but has some overhead in provisioning a new
> endpoint compared to (2).
> Do you see another way to do this leveraging the table-spec version?


2 is cleaner but may be inconsistent with current behavior, since the
/v1/tables operation supports both v1 and v3. According to the discussion,
we should go to 2 only when we have incompatible fields/breaking changes.

Generally I agree with adding table-spec to the capabilities. For example,
we can expose this to users in the API so that a user can choose a
supported table format version without an exception being thrown.

On Wed, Jul 3, 2024 at 12:18 AM Jack Ye  wrote:

> Spec is an interesting topic we did not discuss. Robert, how do you
> envision this to be used?
>
> In my mind, if a new table format v3 is launched, there are 2 approaches
> we can go with, taking CreateTable as an example:
>
> (1) increment the related operation version, which means that POST
> /v2/{prefix}/namespaces/{ns}/tables will be created and allow creating
> tables in the v3 version.
>
> (2) update the existing table metadata model to support both v2 and v3
> fields, and the server enforces the payload differently based on the
> TableMetadata.format-version field. If the server does not support v3, it
> can return unsupported at that time.
>
> Either way we go, the table-spec version does not need to be a capability.
> (1) seems to be cleaner, but has some overhead in provisioning a new
> endpoint compared to (2).
>
> Do you see another way to do this leveraging the table-spec version?
>
> -Jack
>
>
>
>
>
> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner
>  wrote:
>
>>
>> I couldn't make it to the catalog sync meeting yesterday but I watched
>> the recording today (thanks for providing that).
>>
>> The missing piece is how (new, capabilities-aware) clients handle the
>>> case when a service does _not_ return the capabilities field (absent). My
>>> proposal would be that a client should in this case assume that all
>>> _currently_ existing capabilities are supported.
>>>
>>> - tables: [1]
>>> - views: [1]
>>> - remote-signing: [1]
>>> - multi-table-commit: [1]
>>> - register-table: [1]
>>> - table-metrics: [1]
>>> - table-spec: [1,2]
>>> - view-spec: [1,2]
>>>
>>>
>>> The one thing I would like to add here is that the current PR uses the
>> *tables* capability (as version 1) as the default when a server doesn't
>> return *capabilities *but it might be also ok to include *views *(as
>> version 1) because the current client impl has *some* code to deal with
>> errors in case endpoints don't exist.
>>
>> Unless we agree that the currently existing functionality in the REST
>> spec is the *default* behavior to be assumed for older server, I'm not
>> sure about including *remote-signing / multi-table-commit /
>> register-table / table-metrics* as it has been indicated in earlier
>> comments on the PR/ML that not every REST server supports these.
>>
>> That being said, we should discuss whether we want the *default*
>> behavior (when an older server doesn't send back *capabilities*) to be
>> a) *tables* (version 1) only
>> b) the currently existing functionality as defined in the REST spec (as
>> version 1)
>>
>>
>> On another note: Including *table-spec / view-spec* seems to be more
>> informative in its nature as I don't think a client would act differently
>> right now when seeing these.
>>
>> Thanks
>> Eduard
>>
>>


Re: [DISCUSS] Describing REST Server capabilities

2024-07-02 Thread Robert Stupp
The opposite is, as I expressed, that a client would _not_ use 
functionality that worked before. In other words: the 
capabilities-change would become a _breaking_ change - not backwards 
compatible.



On 02.07.24 15:02, Eduard Tudenhöfner wrote:


I couldn't make it to the catalog sync meeting yesterday but I watched 
the recording today (thanks for providing that).


The missing piece is how (new, capabilities-aware) clients handle
the case when a service does _not_ return the capabilities field
(absent). My proposal would be that a client should in this case
assume that all _currently_ existing capabilities are supported.

- tables: [1]
- views: [1]
- remote-signing: [1]
- multi-table-commit: [1]
- register-table: [1]
- table-metrics: [1]
- table-spec: [1,2]
- view-spec: [1,2]
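
The fallback proposed above can be sketched as follows. This is a minimal
illustration only: it assumes the server's config response is a dict, that
capabilities are carried in a field named `capabilities`, and that each
capability maps to a list of supported versions. None of these shapes are
confirmed by the PR, and `effective_capabilities` is a hypothetical helper.

```python
# Defaults taken from the list above; used only when the server sends nothing.
DEFAULT_CAPABILITIES = {
    "tables": [1],
    "views": [1],
    "remote-signing": [1],
    "multi-table-commit": [1],
    "register-table": [1],
    "table-metrics": [1],
    "table-spec": [1, 2],
    "view-spec": [1, 2],
}


def effective_capabilities(config_response: dict) -> dict:
    """Return the capabilities a client should assume for this server."""
    caps = config_response.get("capabilities")
    if caps is None:
        # Older server: the field is absent, so fall back to the defaults.
        # Copy the lists so callers cannot mutate the shared defaults.
        return {name: list(versions) for name, versions in DEFAULT_CAPABILITIES.items()}
    # Capabilities-aware server: trust exactly what it declared.
    return caps
```

Note that an explicitly empty `capabilities` value is treated as "nothing
supported", not as "absent"; only a missing field triggers the defaults.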



The one thing I would like to add here is that the current PR uses the 
*tables* capability (as version 1) as the default when a server 
doesn't return *capabilities *but it might be also ok to include 
*views *(as version 1) because the current client impl has /some/ code 
to deal with errors in case endpoints don't exist.


Unless we agree that the currently existing functionality in the REST 
spec is the *default* behavior to be assumed for older server, I'm not 
sure about including *remote-signing / multi-table-commit / 
register-table / table-metrics* as it has been indicated in earlier 
comments on the PR/ML that not every REST server supports these.


That being said, we should discuss whether we want the *default* 
behavior (when an older server doesn't send back *capabilities*) to be

a) *tables* (version 1) only
b) the currently existing functionality as defined in the REST spec 
(as version 1)



On another note: Including *table-spec / view-spec* seems to be more 
informative in its nature as I don't think a client would act 
differently right now when seeing these.


Thanks
Eduard


--
Robert Stupp
@snazy


Re: [DISCUSS] Describing REST Server capabilities

2024-07-02 Thread Robert Stupp
I think that we really need to let a service express which 
table/view/udf/etc-spec versions it supports. Otherwise, as you noted, a 
client would only receive some error. If a "table-spec v4" aware client 
knows that the service just knows about "table-spec v2", it can just use 
that version - preventing an error that a user has to deal with. Users 
will just come to the mailing-list or Slack and say: "here's an 
exception" - we can prevent this.
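
A minimal sketch of that idea, assuming the service advertises its supported
table-spec versions as a list of integers (the helper name
`pick_spec_version` and the data shape are hypothetical):

```python
from typing import Iterable, Optional


def pick_spec_version(client_versions: Iterable[int],
                      server_versions: Iterable[int]) -> Optional[int]:
    """Pick the highest spec version that both client and service support."""
    common = set(client_versions) & set(server_versions)
    if not common:
        # No overlap: the caller has to report a clear error to the user.
        return None
    return max(common)
```

A "table-spec v4" aware client talking to a service that only knows
"table-spec v2" would call `pick_spec_version([1, 2, 3, 4], [1, 2])` and
write v2 metadata instead of failing on the request.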



On 02.07.24 16:59, Jack Ye wrote:
Spec is an interesting topic we did not discuss. Robert, how do you 
envision this to be used?


In my mind, if a new table format v3 is launched, there are 2 
approaches we can go with, taking CreateTable as an example:


(1) increment the related operation version, which means that POST 
/v2/{prefix}/namespaces/{ns}/tables will be created and allow creating 
tables in the v3 version.


(2) update the existing table metadata model to support both v2 and v3 
fields, and the server enforces the payload differently based on the 
TableMetadata.format-version field. If the server does not support v3, 
it can return unsupported at that time.


Either way we go, the table-spec version does not need to be a 
capability. (1) seems to be cleaner, but has some overhead in 
provisioning a new endpoint compared to (2).


Do you see another way to do this leveraging the table-spec version?

-Jack





On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner 
 wrote:



I couldn't make it to the catalog sync meeting yesterday but I
watched the recording today (thanks for providing that).

The missing piece is how (new, capabilities-aware) clients
handle the case when a service does _not_ return the
capabilities field (absent). My proposal would be that a
client should in this case assume that all _currently_
existing capabilities are supported.

- tables: [1]
- views: [1]
- remote-signing: [1]
- multi-table-commit: [1]
- register-table: [1]
- table-metrics: [1]
- table-spec: [1,2]
- view-spec: [1,2]




The one thing I would like to add here is that the current PR uses
the *tables* capability (as version 1) as the default when a
server doesn't return *capabilities *but it might be also ok to
include *views *(as version 1) because the current client impl has
/some/ code to deal with errors in case endpoints don't exist.

Unless we agree that the currently existing functionality in the
REST spec is the *default* behavior to be assumed for older
server, I'm not sure about including *remote-signing /
multi-table-commit / register-table / table-metrics* as it has
been indicated in earlier comments on the PR/ML that not every
REST server supports these.

That being said, we should discuss whether we want the *default*
behavior (when an older server doesn't send back *capabilities*) to be
a) *tables* (version 1) only
b) the currently existing functionality as defined in the REST
spec (as version 1)


On another note: Including *table-spec / view-spec* seems to be
more informative in its nature as I don't think a client would act
differently right now when seeing these.

Thanks
Eduard


--
Robert Stupp
@snazy


Re: [DISCUSS] Describing REST Server capabilities

2024-07-02 Thread Jack Ye
Spec is an interesting topic we did not discuss. Robert, how do you
envision this to be used?

In my mind, if a new table format v3 is launched, there are 2 approaches we
can go with, taking CreateTable as an example:

(1) increment the related operation version, which means that POST
/v2/{prefix}/namespaces/{ns}/tables will be created and allow creating
tables in the v3 version.

(2) update the existing table metadata model to support both v2 and v3
fields, and the server enforces the payload differently based on the
TableMetadata.format-version field. If the server does not support v3, it
can return unsupported at that time.

Either way we go, the table-spec version does not need to be a capability.
(1) seems to be cleaner, but has some overhead in provisioning a new
endpoint compared to (2).
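
Approach (2) above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the exception type, the set of supported versions and the fallback to version 2 when the field is absent are all hypothetical and not taken from the REST spec or any server implementation.

```python
# Hypothetical sketch of approach (2): one CreateTable endpoint, with the payload
# checked against the table format versions this server supports.
SUPPORTED_FORMAT_VERSIONS = {1, 2}  # assumption: this server does not support v3 yet


class UnsupportedFormatVersion(Exception):
    """Raised when a payload asks for a table format the server cannot handle."""


def validate_create_table(payload: dict) -> int:
    """Return the requested format version, or raise if it is unsupported."""
    # Assumption: a missing "format-version" falls back to 2 in this sketch.
    version = payload.get("format-version", 2)
    if version not in SUPPORTED_FORMAT_VERSIONS:
        raise UnsupportedFormatVersion(f"table format v{version} is not supported")
    return version
```

With this shape, a v3 payload sent to a server that only knows v1 and v2 is rejected at request time, which is the "return unsupported" behavior described in (2).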

Do you see another way to do this leveraging the table-spec version?

-Jack





On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner
 wrote:

>
> I couldn't make it to the catalog sync meeting yesterday but I watched the
> recording today (thanks for providing that).
>
>> The missing piece is how (new, capabilities-aware) clients handle the case
>> when a service does _not_ return the capabilities field (absent). My
>> proposal would be that a client should in this case assume that all
>> _currently_ existing capabilities are supported.
>>
>> - tables: [1]
>> - views: [1]
>> - remote-signing: [1]
>> - multi-table-commit: [1]
>> - register-table: [1]
>> - table-metrics: [1]
>> - table-spec: [1,2]
>> - view-spec: [1,2]
>>
>>
> The one thing I would like to add here is that the current PR uses the
> *tables* capability (as version 1) as the default when a server doesn't
> return *capabilities*, but it might also be OK to include *views* (as
> version 1) because the current client impl has *some* code to deal with
> errors in case endpoints don't exist.
>
> Unless we agree that the currently existing functionality in the REST spec
> is the *default* behavior to be assumed for older servers, I'm not sure
> about including *remote-signing / multi-table-commit / register-table /
> table-metrics* as it has been indicated in earlier comments on the PR/ML
> that not every REST server supports these.
>
> That being said, we should discuss whether we want the *default* behavior
> (when an older server doesn't send back *capabilities*) to be
> a) *tables* (version 1) only
> b) the currently existing functionality as defined in the REST spec (as
> version 1)
>
>
> On another note: Including *table-spec / view-spec* seems to be more
> informative in its nature as I don't think a client would act differently
> right now when seeing these.
>
> Thanks
> Eduard
>
>


Re: [DISCUSS] Describing REST Server capabilities

2024-07-02 Thread Eduard Tudenhöfner
I couldn't make it to the catalog sync meeting yesterday but I watched the
recording today (thanks for providing that).

> The missing piece is how (new, capabilities-aware) clients handle the case
> when a service does _not_ return the capabilities field (absent). My
> proposal would be that a client should in this case assume that all
> _currently_ existing capabilities are supported.
>
> - tables: [1]
> - views: [1]
> - remote-signing: [1]
> - multi-table-commit: [1]
> - register-table: [1]
> - table-metrics: [1]
> - table-spec: [1,2]
> - view-spec: [1,2]
>
>
The one thing I would like to add here is that the current PR uses the
*tables* capability (as version 1) as the default when a server doesn't
return *capabilities*, but it might also be OK to include *views* (as
version 1) because the current client impl has *some* code to deal with
errors in case endpoints don't exist.

Unless we agree that the currently existing functionality in the REST spec
is the *default* behavior to be assumed for older servers, I'm not sure
about including *remote-signing / multi-table-commit / register-table /
table-metrics*, as it has been indicated in earlier comments on the PR/ML
that not every REST server supports these.

That being said, we should discuss whether we want the *default* behavior
(when an older server doesn't send back *capabilities*) to be
a) *tables* (version 1) only
b) the currently existing functionality as defined in the REST spec (as
version 1)


On another note: Including *table-spec / view-spec* seems to be more
informational in nature, as I don't think a client would act differently
right now when seeing these.

Thanks
Eduard
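
The two default behaviors under discussion, (a) and (b), could be sketched on the client side roughly like this. It is a minimal sketch under assumptions: the "capabilities" key, the dictionary shape and the function name are hypothetical, and the informational *table-spec / view-spec* entries are left out of the defaults.

```python
# Option (a): an older server that omits capabilities is assumed to support tables v1 only.
DEFAULT_TABLES_ONLY = {"tables": [1]}

# Option (b): an older server is assumed to support everything in the current REST spec.
DEFAULT_CURRENT_SPEC = {
    "tables": [1],
    "views": [1],
    "remote-signing": [1],
    "multi-table-commit": [1],
    "register-table": [1],
    "table-metrics": [1],
}


def effective_capabilities(config_response: dict, fallback: dict) -> dict:
    """Use the server's capabilities if present, otherwise the agreed default."""
    # Assumption: the config response carries capabilities under a "capabilities" key.
    capabilities = config_response.get("capabilities")
    if capabilities is None:
        # Copy the fallback so callers cannot mutate the shared default.
        return {name: list(versions) for name, versions in fallback.items()}
    return capabilities
```

Which fallback gets passed in is exactly the open question: option (a) is the conservative choice, option (b) matches what current clients already attempt against older servers.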


Re: [DISCUSS] Describing REST Server capabilities

2024-07-02 Thread Robert Stupp
We also talked briefly about whether clients/server must fail on unknown 
attributes/values or not. Please correct me if I'm wrong, but I think 
that the consensus in the meeting yesterday was clients must fail.


(more comments inline below)


On 02.07.24 04:44, Jack Ye wrote:
Each operation will be versioned separately, starting at v1 which is 
the current version on the spec for all operations. When there are 
significant changes to the request response model, or highly backwards 
incompatible changes, the version of that operation will be 
incremented and it will have a new URL route at /v{nextVersion}


Yup, just adding more details: "when there are significant, 
non-backwards compatible changes to the endpoint parameters and/or 
request/response schema/model"



An IRC server must implement the getConfig operation. This operation 
expresses a set of logical capabilities. Each capability could cover 
certain behaviors in a set of operation versions.


Yes, that allows the service to announce supported "behavior" and 
"table-spec" / "view-spec" / "udf-spec" versions.


The latter (spec versions) should IMO also be expressed in the 
capabilities-PR.



The definition of new capability is based on new features proposed in 
IRC, usually a new backwards incompatible feature results in a new 
capability defined to control client behavior.


I'd just formally add that clients must not fail when they encounter a 
capability name or version they don't know about.



The server could implement a capability partially, and in that case 
the server needs to document the behavior.


I think that's fine and leaves an "escape path" for example for 
read-only implementations. The wording in the capabilities PR should 
then be adjusted (replace "must implement all endpoints" with "should ...").



Each capability is also versioned. Therefore the response of getConfig 
should be something like a map where key is the capability name, and 
value is a set of supported versions of that capability. The 
capability version is incremented when the same logical capability is 
updated and is backwards incompatible.


Not sure whether a version "3" needs to be backwards compatible with 
version "2". The service can announce both ("2" and "3") to express this.



The client looks at each capability and the supported versions, and 
chooses the highest capability version it can use by default, unless 
overwritten by some client side config.


Not sure whether we should demand the "override" functionality - I'd 
leave that up to the implementation.



The missing piece is how (new, capabilities-aware) clients handle the 
case when a service does _not_ return the capabilities field (absent). 
My proposal would be that a client should in this case assume that all 
_currently_ existing capabilities are supported.


- tables: [1]
- views: [1]
- remote-signing: [1]
- multi-table-commit: [1]
- register-table: [1]
- table-metrics: [1]
- table-spec: [1,2]
- view-spec: [1,2]



What do we think about this?

-Jack








On Thu, Jun 27, 2024 at 12:53 PM Péter Váry 
 wrote:


Ignore my previous email - fat thumbed...

Here is the full version:

I think most of us agree that the server should announce its exact
capabilities, so the clients don't need to guess. The debate is
around how granular this definition should be.

If we do it on service level, then the client needs to examine
each and every service it is using whether it has the specific
capability. While this is more flexible, I think this will become
another property file on the client side listing all the services
and versions which will be hard to understand/work with.

Compatibility affects how often we need to define new versions, so
we still have to touch on the topic a bit:
- If we define that clients should be able to ignore unknown
properties in the response, then we could decrease the cost of
version handling on the service side, as the server doesn't need
to know the exact client version. We push this cost to the
client side. It still doesn't affect the number of versions, as the
new capabilities have to be advertised.
- We have to be careful with services designed to be forward
compatible. They could hide behavioral changes while technically
keeping compatibility. Consider a new key in the map which could
cause a commit to fail. We should prefer to encode as much of the
capabilities in the specification as possible, so that such a
change becomes a new capability.

Based on this, whatever we do, we should expect a high number of
changes to the API specification. For me this means that it
should be human readable and easy to understand. I think grouping
fits that description better.

Thanks, Peter

On Thu, Jun 27, 2024, 21:12 Péter Váry
 wrote:

I think most of us agree that the server should announce its
exact capabilities, so the clients don't need to gu

Re: [DISCUSS] Describing REST Server capabilities

2024-07-01 Thread Jack Ye
Let me try to summarize what my understanding is after the sync, and we can
see if we agree:

Each operation will be versioned separately, starting at v1 which is the
current version on the spec for all operations. When there are significant
changes to the request response model, or highly backwards incompatible
changes, the version of that operation will be incremented and it will have
a new URL route at /v{nextVersion}

An IRC server must implement the getConfig operation. This operation
expresses a set of logical capabilities. Each capability could cover
certain behaviors in a set of operation versions.

The definition of a new capability is based on new features proposed in IRC;
usually a new backwards-incompatible feature results in a new capability
defined to control client behavior.

The server could implement a capability partially, and in that case the
server needs to document the behavior.

Each capability is also versioned. Therefore the response of getConfig
should be something like a map where the key is the capability name and the
value is the set of supported versions of that capability. The capability
version is incremented when the same logical capability is updated in a
backwards-incompatible way.

The client looks at each capability and the supported versions, and chooses
the highest capability version it can use by default, unless overridden by
some client-side config.
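
The selection rule in the last two paragraphs could be sketched as below. This is a minimal illustration, not the client implementation: the function names and the shape of the override map are hypothetical, and skipping capability names the client does not know is an assumption of the sketch.

```python
from typing import Dict, Iterable, Optional


def choose_version(server_versions: Iterable[int],
                   client_versions: Iterable[int],
                   override: Optional[int] = None) -> Optional[int]:
    """Pick the highest version both sides support, unless an override is set."""
    common = set(server_versions) & set(client_versions)
    if override is not None:
        # A client-side override only applies if both sides support that version.
        return override if override in common else None
    return max(common) if common else None


def negotiate(server_caps: Dict[str, Iterable[int]],
              client_caps: Dict[str, Iterable[int]],
              overrides: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Map each capability the client knows to the version it will use."""
    overrides = overrides or {}
    chosen = {}
    for name, client_versions in client_caps.items():
        version = choose_version(server_caps.get(name, []), client_versions,
                                 overrides.get(name))
        if version is not None:
            chosen[name] = version
    return chosen
```

For example, a server announcing tables [1, 2, 3] to a client that implements tables [1, 2] would settle on version 2, and a capability the server announces but the client does not know is simply left out of the result.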

What do we think about this?

-Jack








On Thu, Jun 27, 2024 at 12:53 PM Péter Váry 
wrote:

> Ignore my previous email - fat thumbed...
>
> Here is the full version:
>
> I think most of us agree that the server should announce its exact
> capabilities, so the clients don't need to guess. The debate is around how
> granular this definition should be.
>
> If we do it on service level, then the client needs to examine each and
> every service it is using whether it has the specific capability. While
> this is more flexible, I think this will become another property file on
> the client side listing all the services and versions which will be hard to
> understand/work with.
>
> Compatibility affects how often we need to define new versions, so we
> still have to touch on the topic a bit:
> - If we define that clients should be able to ignore unknown
> properties in the response, then we could decrease the cost of version
> handling on the service side, as the server doesn't need to know the exact
> client version. We push this cost to the client side. It still doesn't
> affect the number of versions, as the new capabilities have to be advertised.
> - We have to be careful with services designed to be forward compatible.
> They could hide behavioral changes while technically keeping
> compatibility. Consider a new key in the map which could cause a commit
> to fail. We should prefer to encode as much of the capabilities in the
> specification as possible, so that such a change becomes a new capability.
>
> Based on this, whatever we do, we should expect a high number of changes to
> the API specification. For me this means that it should be human readable
> and easy to understand. I think grouping fits that description better.
>
> Thanks, Peter
>
> On Thu, Jun 27, 2024, 21:12 Péter Váry 
> wrote:
>
>> I think most of us agree that the server should announce its exact
>> capabilities, so the clients don't need to guess. The debate is around how
>> granular this definition should be.
>>
>> If we do it on service level, then the client needs to examine each and
>> every service it is using whether it has the specific capability. While
>> this is more flexible, I think this will become another property file on
>> the client side listing
>>
>> If we define that the clients should be able to ignore unknown properties
>> in the response, then we could decrease the cost of version handling on the
>> service side, as the server doesn't need to know the exact client version.
>> We push this cost on the client side.
>>
>> On Thu, Jun 27, 2024, 19:24 Robert Stupp  wrote:
>>
>>> IMO that would be a list of "capability" to "set of versions" tuples.
>>> The reason to have a "set of (integer) version" is that you have to plan
>>> for the future, now.
>>>
>>> I also think we do need "logical" capabilities to express for example
>>> which table/view/etc specs a service supports and to express which request
>>> /response schema (query params + request/response object params) a service
>>> supports. Not doing that will put clients in the spot of guessing whether a
>>> service supports a specific feature of an existing functionality, something
>>> that's been added after the functionality has been introduced.
>>>
>>> Whether we need a capability for each and everything ... I suspect that
>>> depends on the actual feature/functionality/change. Many things can be
>>> designed in a backwards/forwards compatible way and don't deserve a (REST)
>>> spec/capability version bump.
>>> Repeating my point that the service must really fail when it encounters
>>> an unknown attribut

Re: [DISCUSS] Describing REST Server capabilities

2024-06-27 Thread Péter Váry
Ignore my previous email - fat thumbed...

Here is the full version:

I think most of us agree that the server should announce its exact
capabilities, so the clients don't need to guess. The debate is around how
granular this definition should be.

If we do it on service level, then the client needs to examine each and
every service it is using whether it has the specific capability. While
this is more flexible, I think this will become another property file on
the client side listing all the services and versions which will be hard to
understand/work with.

Compatibility affects how often we need to define new versions, so we still
have to touch on the topic a bit:
- If we define that clients should be able to ignore unknown properties
in the response, then we could decrease the cost of version handling on the
service side, as the server doesn't need to know the exact client version.
We push this cost to the client side. It still doesn't affect the number of
versions, as the new capabilities have to be advertised.
- We have to be careful with services designed to be forward compatible.
They could hide behavioral changes while technically keeping
compatibility. Consider a new key in the map which could cause a commit
to fail. We should prefer to encode as much of the capabilities in the
specification as possible, so that such a change becomes a new capability.

Based on this, whatever we do, we should expect a high number of changes to
the API specification. For me this means that it should be human readable
and easy to understand. I think grouping fits that description better.

Thanks, Peter

On Thu, Jun 27, 2024, 21:12 Péter Váry  wrote:

> I think most of us agree that the server should announce its exact
> capabilities, so the clients don't need to guess. The debate is around how
> granular this definition should be.
>
> If we do it on service level, then the client needs to examine each and
> every service it is using whether it has the specific capability. While
> this is more flexible, I think this will become another property file on
> the client side listing
>
> If we define that the clients should be able to ignore unknown properties
> in the response, then we could decrease the cost of version handling on the
> service side, as the server doesn't need to know the exact client version.
> We push this cost on the client side.
>
> On Thu, Jun 27, 2024, 19:24 Robert Stupp  wrote:
>
>> IMO that would be a list of "capability" to "set of versions" tuples. The
>> reason to have a "set of (integer) version" is that you have to plan for
>> the future, now.
>>
>> I also think we do need "logical" capabilities to express for example
>> which table/view/etc specs a service supports and to express which request
>> /response schema (query params + request/response object params) a service
>> supports. Not doing that will put clients in the spot of guessing whether a
>> service supports a specific feature of an existing functionality, something
>> that's been added after the functionality has been introduced.
>>
>> Whether we need a capability for each and everything ... I suspect that
>> depends on the actual feature/functionality/change. Many things can be
>> designed in a backwards/forwards compatible way and don't deserve a (REST)
>> spec/capability version bump.
>> Repeating my point that the service must really fail when it encounters
>> an unknown attribute: Imagine there's a new table-requirement - that has to
>> be reflected using the version of a capability. A service should really not
>> just ignore an unknown capability, otherwise important or data correctness
>> issues will occur.
>>
>> For new versions that are still in development, it's possibly the easiest
>> to not do anything special, not even announce the new version. Whether we
>> indicate some "beta version" or reserve "Integer.MAX_VALUE" doesn't really
>> help anyways - client and server can have a completely different
>> understanding of "that particular" beta version. I.e. if a client or
>> service uses "in development" functionality, it's up to them - whether it
>> works or not.
>>
>> On 27.06.24 17:55, Jack Ye wrote:
>>
>> I feel Alex is already tapping into the more complex territory I do not
>> want to go into, because as he says, a "capability" is logical, and it can
>> be a set of overlapping endpoints, small features in some endpoints, etc.
>> We already saw that in the original PR we tried to say "pagination" is a
>> capability, but it is really just a very small feature of an endpoint, and
>> it might also evolve on its own to extend to other endpoints in the future,
>> and maybe one endpoint supports it and one does not...
>>
>> My fear is that we are getting into the business of defining things that
>> are totally unnecessary. It takes our energy to define that and debate the
>> boundary of each capability, but in the end what does it buy us? As Eduard
>> says, you still need to have documentation to explain what "partial
>> capabilities" are 

Re: [DISCUSS] Describing REST Server capabilities

2024-06-27 Thread Péter Váry
I think most of us agree that the server should announce its exact
capabilities, so the clients don't need to guess. The debate is around how
granular this definition should be.

If we do it on service level, then the client needs to examine each and
every service it is using whether it has the specific capability. While
this is more flexible, I think this will become another property file on
the client side listing

If we define that the clients should be able to ignore unknown properties
in the response, then we could decrease the cost of version handling on the
service side, as the server doesn't need to know the exact client version.
We push this cost on the client side.

On Thu, Jun 27, 2024, 19:24 Robert Stupp  wrote:

> IMO that would be a list of "capability" to "set of versions" tuples. The
> reason to have a "set of (integer) version" is that you have to plan for
> the future, now.
>
> I also think we do need "logical" capabilities to express for example
> which table/view/etc specs a service supports and to express which request
> /response schema (query params + request/response object params) a service
> supports. Not doing that will put clients in the spot of guessing whether a
> service supports a specific feature of an existing functionality, something
> that's been added after the functionality has been introduced.
>
> Whether we need a capability for each and everything ... I suspect that
> depends on the actual feature/functionality/change. Many things can be
> designed in a backwards/forwards compatible way and don't deserve a (REST)
> spec/capability version bump.
> Repeating my point that the service must really fail when it encounters an
> unknown attribute: Imagine there's a new table-requirement - that has to be
> reflected using the version of a capability. A service should really not
> just ignore an unknown capability, otherwise important or data correctness
> issues will occur.
>
> For new versions that are still in development, it's possibly the easiest
> to not do anything special, not even announce the new version. Whether we
> indicate some "beta version" or reserve "Integer.MAX_VALUE" doesn't really
> help anyways - client and server can have a completely different
> understanding of "that particular" beta version. I.e. if a client or
> service uses "in development" functionality, it's up to them - whether it
> works or not.
>
> On 27.06.24 17:55, Jack Ye wrote:
>
> I feel Alex is already tapping into the more complex territory I do not
> want to go into, because as he says, a "capability" is logical, and it can
> be a set of overlapping endpoints, small features in some endpoints, etc.
> We already saw that in the original PR we tried to say "pagination" is a
> capability, but it is really just a very small feature of an endpoint, and
> it might also evolve on its own to extend to other endpoints in the future,
> and maybe one endpoint supports it and one does not...
>
> My fear is that we are getting into the business of defining things that
> are totally unnecessary. It takes our energy to define that and debate the
> boundary of each capability, but in the end what does it buy us? As Eduard
> says, you still need to have documentation to explain what "partial
> capabilities" are supported for a catalog, and people are supposed to read
> the documentation to not do the unsupported things.
>
> In the end, the client-server needs to understand exactly (1) what
> endpoints can be invoked, and (2) using what request-response schema, that
> is the key to me. If it means returning a response with 20-30ish hard-coded
> entries, and the client is configured based on that, that seems totally
> reasonable to me.
>
> -Jack
>
>
>
>
>
> On Thu, Jun 27, 2024 at 7:58 AM Alex Dutra 
>  wrote:
>
>> Hi all,
>>
>> So far we've been thinking of capabilities as equivalent to a set of
>> endpoints.
>>
>> That's a rather technical definition. It also brings one important
>> limitation: one endpoint can only be "governed" by one capability.
>>
>> Granted, most capabilities do require implementing specific endpoints.
>> But I wonder if, for the sake of being future-proof, we shouldn't broaden
>> the meaning of that term to embrace *logical* or *behavioral* concepts
>> as well.
>>
>> One example that comes to mind: a REST catalog implementor may choose to
>> implement the transactions-commit endpoint to fully comply with the
>> "tables" capability; but for performance reasons, or simply because it's
>> too complex, they could opt for rejecting multi-table commits (iow, if a
>> CommitTransactionRequest contains one single CommitTableRequest, that's
>> fine, otherwise, the endpoint would return an error). It would be nice to
>> express that as a capability: this way the client knows that it is safe to
>> call the transactions-commit endpoint, but with one CommitTableRequest at a
>> time.
>>
>> Such a capability would not be defined by a specific endpoint, but
>> rather, would influence the behavior

Re: [DISCUSS] Describing REST Server capabilities

2024-06-27 Thread Robert Stupp
IMO that would be a list of "capability" to "set of versions" tuples. 
The reason to have a "set of (integer) version" is that you have to plan 
for the future, now.


I also think we do need "logical" capabilities to express for example 
which table/view/etc specs a service supports and to express which 
request /response schema (query params + request/response object params) 
a service supports. Not doing that will put clients in the spot of 
guessing whether a service supports a specific feature of an existing 
functionality, something that's been added after the functionality has 
been introduced.


Whether we need a capability for each and everything ... I suspect that 
depends on the actual feature/functionality/change. Many things can be 
designed in a backwards/forwards compatible way and don't deserve a 
(REST) spec/capability version bump.


Repeating my point that the service must really fail when it encounters 
an unknown attribute: Imagine there's a new table-requirement - that has 
to be reflected using the version of a capability. A service should 
really not just ignore an unknown capability, otherwise important or 
data correctness issues will occur.
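
The "fail on unknown" rule for the service side could look roughly like the sketch below. It is an illustration under assumptions: the function and exception names are hypothetical, and the set of known requirement types is deliberately reduced to two entries rather than being the spec's full list.

```python
# Assumption: a deliberately short list of requirement types this service understands.
KNOWN_REQUIREMENT_TYPES = {"assert-create", "assert-table-uuid"}


class UnknownRequirement(Exception):
    """Raised instead of silently ignoring a requirement the service does not know."""


def check_requirements(requirements: list) -> None:
    """Reject the whole request if any requirement type is unknown to this service."""
    for requirement in requirements:
        kind = requirement.get("type")
        if kind not in KNOWN_REQUIREMENT_TYPES:
            # Ignoring it could let a commit through that the client meant to guard.
            raise UnknownRequirement(f"unknown table requirement: {kind!r}")
```

The point of raising rather than skipping is that an ignored requirement looks like a successful check to the client, which is where the data correctness risk comes from.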


For new versions that are still in development, it's possibly the 
easiest to not do anything special, not even announce the new version. 
Whether we indicate some "beta version" or reserve "Integer.MAX_VALUE" 
doesn't really help anyways - client and server can have a completely 
different understanding of "that particular" beta version. I.e. if a 
client or service uses "in development" functionality, it's up to them - 
whether it works or not.


On 27.06.24 17:55, Jack Ye wrote:
I feel Alex is already tapping into the more complex territory I do 
not want to go into, because as he says, a "capability" is logical, 
and it can be a set of overlapping endpoints, small features in some 
endpoints, etc. We already saw that in the original PR we tried to say 
"pagination" is a capability, but it is really just a very small 
feature of an endpoint, and it might also evolve on its own to extend 
to other endpoints in the future, and maybe one endpoint supports it 
and one does not...


My fear is that we are getting into the business of defining things 
that are totally unnecessary. It takes our energy to define that and 
debate the boundary of each capability, but in the end what does it 
buy us? As Eduard says, you still need to have documentation to 
explain what "partial capabilities" are supported for a catalog, and 
people are supposed to read the documentation to not do the 
unsupported things.


In the end, the client-server needs to understand exactly (1) what 
endpoints can be invoked, and (2) using what request-response schema, 
that is the key to me. If it means returning a response with 20-30ish 
hard-coded entries, and the client is configured based on that, that 
seems totally reasonable to me.
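
A response with a flat list of hard-coded endpoint entries, and a client configured from it, could be sketched like this. The "METHOD path" string format and both function names are assumptions made for illustration; nothing in this thread fixes how endpoints would be identified.

```python
from typing import Iterable, Set, Tuple


def parse_endpoints(entries: Iterable[str]) -> Set[Tuple[str, str]]:
    """Turn 'METHOD /path/template' strings into (method, path) pairs."""
    supported = set()
    for entry in entries:
        # Assumption: each entry is an HTTP method, one space, then the path template.
        method, path = entry.split(" ", 1)
        supported.add((method.upper(), path))
    return supported


def can_invoke(supported: Set[Tuple[str, str]], method: str, path_template: str) -> bool:
    """Check whether the server declared this endpoint as implemented."""
    return (method.upper(), path_template) in supported
```

A client would call `can_invoke` before choosing a code path, for instance falling back or failing early when a rename or metrics endpoint is not in the declared set.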


-Jack





On Thu, Jun 27, 2024 at 7:58 AM Alex Dutra 
 wrote:


Hi all,

So far we've been thinking of capabilities as equivalent to a set
of endpoints.

That's a rather technical definition. It also brings one important
limitation: one endpoint can only be "governed" by one capability.

Granted, most capabilities do require implementing specific
endpoints. But I wonder if, for the sake of being future-proof, we
shouldn't broaden the meaning of that term to embrace /logical/ or
/behavioral/ concepts as well.

One example that comes to mind: a REST catalog implementor may
choose to implement the transactions-commit endpoint to fully
comply with the "tables" capability; but for performance reasons,
or simply because it's too complex, they could opt for rejecting
multi-table commits (iow, if a CommitTransactionRequest contains
one single CommitTableRequest, that's fine, otherwise, the
endpoint would return an error). It would be nice to express that
as a capability: this way the client knows that it is safe to call
the transactions-commit endpoint, but with one CommitTableRequest
at a time.

Such a capability would not be defined by a specific endpoint, but
rather, would influence the behavior exhibited by certain endpoints.
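
The behavioral restriction in this example could be sketched on the server side as below. It is only an illustration: the function name, the exception type and the `allow_multi_table` flag are hypothetical, and the "table-changes" key for the list of CommitTableRequest entries should be treated as an assumption of the sketch.

```python
class MultiTableCommitRejected(Exception):
    """Raised when a transaction touches more tables than this server accepts."""


def check_commit_transaction(request: dict, allow_multi_table: bool = False) -> int:
    """Return the number of table changes, or raise if the request is not acceptable."""
    # Assumption: the CommitTableRequest entries are listed under "table-changes".
    changes = request.get("table-changes", [])
    if not changes:
        raise ValueError("a commit needs at least one table change")
    if len(changes) > 1 and not allow_multi_table:
        # The endpoint exists, but this server only accepts one table per transaction.
        raise MultiTableCommitRejected(
            f"got {len(changes)} table changes, only 1 is accepted")
    return len(changes)
```

Advertising this as a capability would let a client learn the restriction up front instead of discovering it through the error.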

Thanks,

Alex

On Thu, Jun 27, 2024 at 11:34 AM Jean-Baptiste Onofré
 wrote:

Hi Jack

I like Robert's proposal. Back to the topics, I think grouping with
tags is more "flexible" (it was what we included in the REST spec
proposal as well).

Regards
JB

On Wed, Jun 26, 2024 at 6:26 PM Jack Ye 
wrote:
>
> It seems like there are 2 sub-topics here:
> 1. should we group operations with tags, or should we do
> this per-operation/endpoint?
> 2. how should we do the capability/versioning for each unit
> (either per tag or per operation)
>
> Shall we first conclude on 1?
>
> For 1, my 

Re: [DISCUSS] Describing REST Server capabilities

2024-06-27 Thread Robert Stupp



On 27.06.24 19:05, Micah Kornfield wrote:
Maybe it pays to prototype the individual end point approach to 
demonstrate its relative complexity?


The math is pretty simple: you need to duplicate all endpoints, all 
request/response schema types, all tests, and duplicate and/or adapt client 
code. Now imagine that things evolve rather quickly, and you end up with 
a huge amount of boilerplate code (because it is mostly repeated). And finally, 
pray that all endpoint versions and request/response schema definitions 
are correct.




Re: [DISCUSS] Describing REST Server capabilities

2024-06-27 Thread Micah Kornfield
>
> If it means returning a response with 20-30ish hard-coded entries, and the
> client is configured based on that, that seems totally reasonable to me.


It reads to me that a lot of the debate is around the complexity of one
approach versus the other. Maybe it pays to prototype the individual
endpoint approach to demonstrate its relative complexity?

Thanks,
Micah



On Thu, Jun 27, 2024 at 9:33 AM Jack Ye  wrote:

> I feel Alex is already tapping into the more complex territory I do not
> want to go into, because as he says, a "capability" is logical, and it can
> be a set of overlapping endpoints, small features in some endpoints, etc.
> We already saw that in the original PR we tried to say "pagination" is a
> capability, but it is really just a very small feature of an endpoint, and
> it might also evolve on its own to extend to other endpoints in the future,
> and maybe one endpoint supports it and one does not...
>
> My fear is that we are getting into the business of defining things that
> are totally unnecessary. It takes our energy to define that and debate the
> boundary of each capability, but in the end what does it buy us? As Eduard
> says, you still need to have documentation to explain what "partial
> capabilities" are supported for a catalog, and people are supposed to read
> the documentation to not do the unsupported things.
>
> In the end, the client-server needs to understand exactly (1) what
> endpoints can be invoked, and (2) using what request-response schema, that
> is the key to me. If it means returning a response with 20-30ish hard-coded
> entries, and the client is configured based on that, that seems totally
> reasonable to me.
>
> -Jack
>
>
>
>
>
> On Thu, Jun 27, 2024 at 7:58 AM Alex Dutra 
> wrote:
>
>> Hi all,
>>
>> So far we've been thinking of capabilities as equivalent to a set of
>> endpoints.
>>
>> That's a rather technical definition. It also brings one important
>> limitation: one endpoint can only be "governed" by one capability.
>>
>> Granted, most capabilities do require implementing specific endpoints.
>> But I wonder if, for the sake of being future-proof, we shouldn't broaden
>> the meaning of that term to embrace *logical* or *behavioral* concepts
>> as well.
>>
>> One example that comes to mind: a REST catalog implementor may choose to
>> implement the transactions-commit endpoint to fully comply with the
>> "tables" capability; but for performance reasons, or simply because it's
>> too complex, they could opt for rejecting multi-table commits (iow, if a
>> CommitTransactionRequest contains one single CommitTableRequest, that's
>> fine, otherwise, the endpoint would return an error). It would be nice to
>> express that as a capability: this way the client knows that it is safe to
>> call the transactions-commit endpoint, but with one CommitTableRequest at a
>> time.
>>
>> Such a capability would not be defined by a specific endpoint, but
>> rather, would influence the behavior exhibited by certain endpoints.
>>
>> Thanks,
>>
>> Alex
>>
>> On Thu, Jun 27, 2024 at 11:34 AM Jean-Baptiste Onofré 
>> wrote:
>>
>>> Hi Jack
>>>
>>> I like Robert's proposal. Back to the topics, I think grouping with
>>> tags is more "flexible" (it was what we included in the REST spec
>>> proposal as well).
>>>
>>> Regards
>>> JB
>>>
>>> On Wed, Jun 26, 2024 at 6:26 PM Jack Ye  wrote:
>>> >
>>> > It seems like there are 2 sub-topics here:
>>> > 1. should we group operations with tags, or should we do this
>>> per-operation/endpoint?
>>> > 2. how should we do the capability/versioning for each unit (either
>>> per tag or per operation)
>>> >
>>> > Shall we first conclude on 1?
>>> >
>>> > For 1, my take is that we will need to do it per operation, for 2
>>> reasons:
>>> >
>>> > (1) There are many REST services that would only implement a very
>>> small set of APIs, such as just loadTable and loadView. Some will choose to
>>> not implement very specific endpoints, such as renameTable. Tags seems
>>> convenient but it is mandating people to implement a specific group of APIs
>>> together, which is a lot of burdens for especially small organizations, if
>>> they just want to support very specific goals like reading through IRC.
>>> >
>>> > (2) Suppose a new tag is added in the future, the server returns that
>>> tag, but an older client does not understand it, it might cause mistakes in
>>> the client's understanding of what is supported and what is not, when a tag
>>> contains both features in existing APIs and also new APIs. If we define
>>> that tags do not overlap with each other, this is probably not a concern.
>>> However, (1) still is a problem from a usability perspective.
>>> >
>>> > Best,
>>> > Jack Ye
>>> >
>>> >
>>> >
>>> >
>>> > On Wed, Jun 26, 2024 at 9:02 AM Daniel Weeks 
>>> wrote:
>>> >>
>>> >> I think Robert's approach is a reasonable compromise here.
>>> >>
>>> >> If we wanted a "per operation/endpoint" versioning, I think I'd
>>> prefer Micah's OpenAPI spec based ap

Re: [DISCUSS] Describing REST Server capabilities

2024-06-27 Thread Jack Ye
I feel Alex is already tapping into the more complex territory I do not
want to go into, because as he says, a "capability" is logical, and it can
be a set of overlapping endpoints, small features in some endpoints, etc.
We already saw that in the original PR we tried to say "pagination" is a
capability, but it is really just a very small feature of an endpoint, and
it might also evolve on its own to extend to other endpoints in the future,
and maybe one endpoint supports it and one does not...

My fear is that we are getting into the business of defining things that
are totally unnecessary. It takes our energy to define that and debate the
boundary of each capability, but in the end what does it buy us? As Eduard
says, you still need to have documentation to explain what "partial
capabilities" are supported for a catalog, and people are supposed to read
the documentation to not do the unsupported things.

In the end, the client and server need to agree on exactly (1) what
endpoints can be invoked, and (2) using what request-response schema. That
is the key to me. If it means returning a response with 20-30ish hard-coded
entries, and the client is configured based on that, that seems totally
reasonable to me.
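
To make the "hard-coded entries" idea concrete, here is a minimal sketch of a
client that is configured from a server-declared endpoint list. Everything in
it is an assumption for illustration: the "<METHOD> <path>" string format, the
Endpoint and EndpointAwareClient names, and the two example paths are
hypothetical and not taken from any agreed spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """One REST operation, identified by HTTP method and path template."""

    method: str
    path: str

    @staticmethod
    def parse(text: str) -> "Endpoint":
        # Assumed wire format: "<METHOD> <path template>".
        method, path = text.split(" ", 1)
        return Endpoint(method.upper(), path)


class EndpointAwareClient:
    """Hypothetical client that only calls endpoints the server declared."""

    def __init__(self, declared: list[str]):
        self.supported = frozenset(Endpoint.parse(e) for e in declared)

    def check(self, endpoint: Endpoint) -> None:
        # Fail fast on the client side instead of sending an unsupported call.
        if endpoint not in self.supported:
            raise NotImplementedError(
                f"server does not declare {endpoint.method} {endpoint.path}"
            )


# Illustrative path templates; the real list would come from the server.
LOAD_TABLE = Endpoint("GET", "/v1/{prefix}/namespaces/{namespace}/tables/{table}")
RENAME_TABLE = Endpoint("POST", "/v1/{prefix}/tables/rename")

client = EndpointAwareClient(
    ["GET /v1/{prefix}/namespaces/{namespace}/tables/{table}"]
)
client.check(LOAD_TABLE)  # declared by the server, so this passes silently
```

With this shape, a read-only server simply returns a shorter list, and
client.check(RENAME_TABLE) raises before any request is sent.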

-Jack





On Thu, Jun 27, 2024 at 7:58 AM Alex Dutra 
wrote:

> Hi all,
>
> So far we've been thinking of capabilities as equivalent to a set of
> endpoints.
>
> That's a rather technical definition. It also brings one important
> limitation: one endpoint can only be "governed" by one capability.
>
> Granted, most capabilities do require implementing specific endpoints. But
> I wonder if, for the sake of being future-proof, we shouldn't broaden the
> meaning of that term to embrace *logical* or *behavioral* concepts as
> well.
>
> One example that comes to mind: a REST catalog implementor may choose to
> implement the transactions-commit endpoint to fully comply with the
> "tables" capability; but for performance reasons, or simply because it's
> too complex, they could opt for rejecting multi-table commits (iow, if a
> CommitTransactionRequest contains one single CommitTableRequest, that's
> fine, otherwise, the endpoint would return an error). It would be nice to
> express that as a capability: this way the client knows that it is safe to
> call the transactions-commit endpoint, but with one CommitTableRequest at a
> time.
>
> Such a capability would not be defined by a specific endpoint, but rather,
> would influence the behavior exhibited by certain endpoints.
>
> Thanks,
>
> Alex
>
> On Thu, Jun 27, 2024 at 11:34 AM Jean-Baptiste Onofré 
> wrote:
>
>> Hi Jack
>>
>> I like Robert's proposal. Back to the topics, I think grouping with
>> tags is more "flexible" (it was what we included in the REST spec
>> proposal as well).
>>
>> Regards
>> JB
>>
>> On Wed, Jun 26, 2024 at 6:26 PM Jack Ye  wrote:
>> >
>> > It seems like there are 2 sub-topics here:
>> > 1. should we group operations with tags, or should we do this
>> per-operation/endpoint?
>> > 2. how should we do the capability/versioning for each unit (either per
>> tag or per operation)
>> >
>> > Shall we first conclude on 1?
>> >
>> > For 1, my take is that we will need to do it per operation, for 2
>> reasons:
>> >
>> > (1) There are many REST services that would only implement a very small
>> set of APIs, such as just loadTable and loadView. Some will choose to not
>> implement very specific endpoints, such as renameTable. Tags seems
>> convenient but it is mandating people to implement a specific group of APIs
>> together, which is a lot of burdens for especially small organizations, if
>> they just want to support very specific goals like reading through IRC.
>> >
>> > (2) Suppose a new tag is added in the future, the server returns that
>> tag, but an older client does not understand it, it might cause mistakes in
>> the client's understanding of what is supported and what is not, when a tag
>> contains both features in existing APIs and also new APIs. If we define
>> that tags do not overlap with each other, this is probably not a concern.
>> However, (1) still is a problem from a usability perspective.
>> >
>> > Best,
>> > Jack Ye
>> >
>> >
>> >
>> >
>> > On Wed, Jun 26, 2024 at 9:02 AM Daniel Weeks  wrote:
>> >>
>> >> I think Robert's approach is a reasonable compromise here.
>> >>
>> >> If we wanted a "per operation/endpoint" versioning, I think I'd prefer
>> Micah's OpenAPI spec based approach because it's more standardized, but I
>> feel adds a lot of client complexity.
>> >>
>> >> -Dan
>> >>
>> >>
>> >>
>> >> On Wed, Jun 26, 2024 at 6:59 AM Robert Stupp  wrote:
>> >>>
>> >>> (I think, compatibility deserves a separate thread - it's a "huge"
>> topic)
>> >>>
>> >>> Based on experience, we decided on the following with Nessie:
>> >>>
>> >>> Unknown fields/attributes in a structure _DO_ cause (de)serialization
>> failures.
>> >>> "Stable API versions" - endpoint additions and/or added query
>> parameters and/or enhanced structures do 

Re: [DISCUSS] Describing REST Server capabilities

2024-06-27 Thread Alex Dutra
Hi all,

So far we've been thinking of capabilities as equivalent to a set of
endpoints.

That's a rather technical definition. It also brings one important
limitation: one endpoint can only be "governed" by one capability.

Granted, most capabilities do require implementing specific endpoints. But
I wonder if, for the sake of being future-proof, we shouldn't broaden the
meaning of that term to embrace *logical* or *behavioral* concepts as well.

One example that comes to mind: a REST catalog implementor may choose to
implement the transactions-commit endpoint to fully comply with the
"tables" capability; but for performance reasons, or simply because it's
too complex, they could opt for rejecting multi-table commits (iow, if a
CommitTransactionRequest contains one single CommitTableRequest, that's
fine, otherwise, the endpoint would return an error). It would be nice to
express that as a capability: this way the client knows that it is safe to
call the transactions-commit endpoint, but with one CommitTableRequest at a
time.

Such a capability would not be defined by a specific endpoint, but rather,
would influence the behavior exhibited by certain endpoints.
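
A rough sketch of how a client could act on such a behavioral capability,
under stated assumptions: the capability name "multi-table-commit" and the
plan_commits helper are hypothetical, and each table change is modelled as a
plain dict rather than a real CommitTableRequest.

```python
# Hypothetical capability name; nothing in the spec defines it.
MULTI_TABLE_COMMIT = "multi-table-commit"


def plan_commits(table_changes: list[dict], capabilities: set[str]) -> list[list[dict]]:
    """Group table changes into transactions-commit requests the server accepts.

    Each inner list stands for the table changes of one request.
    """
    if not table_changes:
        return []
    if MULTI_TABLE_COMMIT in capabilities:
        # The server accepts several table changes in one request.
        return [list(table_changes)]
    # Otherwise send one table change per request. Note that this gives up
    # atomicity across tables, so a real client may prefer to fail instead.
    return [[change] for change in table_changes]


changes = [{"table": "db.a"}, {"table": "db.b"}]
single = plan_commits(changes, capabilities=set())
batched = plan_commits(changes, capabilities={MULTI_TABLE_COMMIT})
```

Here single contains two one-element requests and batched contains one
two-element request.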

Thanks,

Alex

On Thu, Jun 27, 2024 at 11:34 AM Jean-Baptiste Onofré 
wrote:

> Hi Jack
>
> I like Robert's proposal. Back to the topics, I think grouping with
> tags is more "flexible" (it was what we included in the REST spec
> proposal as well).
>
> Regards
> JB
>
> On Wed, Jun 26, 2024 at 6:26 PM Jack Ye  wrote:
> >
> > It seems like there are 2 sub-topics here:
> > 1. should we group operations with tags, or should we do this
> per-operation/endpoint?
> > 2. how should we do the capability/versioning for each unit (either per
> tag or per operation)
> >
> > Shall we first conclude on 1?
> >
> > For 1, my take is that we will need to do it per operation, for 2
> reasons:
> >
> > (1) There are many REST services that would only implement a very small
> set of APIs, such as just loadTable and loadView. Some will choose to not
> implement very specific endpoints, such as renameTable. Tags seems
> convenient but it is mandating people to implement a specific group of APIs
> together, which is a lot of burdens for especially small organizations, if
> they just want to support very specific goals like reading through IRC.
> >
> > (2) Suppose a new tag is added in the future, the server returns that
> tag, but an older client does not understand it, it might cause mistakes in
> the client's understanding of what is supported and what is not, when a tag
> contains both features in existing APIs and also new APIs. If we define
> that tags do not overlap with each other, this is probably not a concern.
> However, (1) still is a problem from a usability perspective.
> >
> > Best,
> > Jack Ye
> >
> >
> >
> >
> > On Wed, Jun 26, 2024 at 9:02 AM Daniel Weeks  wrote:
> >>
> >> I think Robert's approach is a reasonable compromise here.
> >>
> >> If we wanted a "per operation/endpoint" versioning, I think I'd prefer
> Micah's OpenAPI spec based approach because it's more standardized, but I
> feel adds a lot of client complexity.
> >>
> >> -Dan
> >>
> >>
> >>
> >> On Wed, Jun 26, 2024 at 6:59 AM Robert Stupp  wrote:
> >>>
> >>> (I think, compatibility deserves a separate thread - it's a "huge"
> topic)
> >>>
> >>> Based on experience, we decided on the following with Nessie:
> >>>
> >>> Unknown fields/attributes in a structure _DO_ cause (de)serialization
> failures.
> >>> "Stable API versions" - endpoint additions and/or added query
> parameters and/or enhanced structures do _NOT_ require a new API version
> (as in the endpoint's route/path).
> >>> "Flexible spec versions" - new and updated "capabilities" however
> might cause a bump in the "spec version" that the server announces in its
> `getConfig` result.
> >>>
> >>> Adding new routes/paths may require new endpoint implementations on
> the server side, which can easily lead to a lot of (unnecessarily
> boilerplate) code. Using different routes/paths is justified if the API is
> changed "fundamentally". We call the "path component" (api/v1/...,
> api/v2/...) API version - the server indicates the minimum and maximum
> supported API version, in case a client wants to "upgrade". I recommend to
> _not_ bump the API version in the route/path if it's not really necessary.
> >>>
> >>> Regarding the requirement to fail on unknown attributes: Unknown
> attributes may contain important information. A client may send a newer
> version of a request object with an important new field, but the (older)
> server discards the new attribute. Think of an attribute that for example
> defines a "commit condition" that the client expects to be respected. "New"
> attributes must be omittable (e.g. don't serialize if null/default) -
> clients indicate the "usage" of an added attribute using some request
> attribute (for example: "boolean returnExtendedInformation").
> >>>
> >>> The list of capabilities can be indicated with include

Re: [DISCUSS] Describing REST Server capabilities

2024-06-27 Thread Jean-Baptiste Onofré
Hi Jack

I like Robert's proposal. Back to the topics, I think grouping with
tags is more "flexible" (it was what we included in the REST spec
proposal as well).

Regards
JB

On Wed, Jun 26, 2024 at 6:26 PM Jack Ye  wrote:
>
> It seems like there are 2 sub-topics here:
> 1. should we group operations with tags, or should we do this 
> per-operation/endpoint?
> 2. how should we do the capability/versioning for each unit (either per tag 
> or per operation)
>
> Shall we first conclude on 1?
>
> For 1, my take is that we will need to do it per operation, for 2 reasons:
>
> (1) There are many REST services that would only implement a very small set 
> of APIs, such as just loadTable and loadView. Some will choose to not 
> implement very specific endpoints, such as renameTable. Tags seems convenient 
> but it is mandating people to implement a specific group of APIs together, 
> which is a lot of burdens for especially small organizations, if they just 
> want to support very specific goals like reading through IRC.
>
> (2) Suppose a new tag is added in the future, the server returns that tag, 
> but an older client does not understand it, it might cause mistakes in the 
> client's understanding of what is supported and what is not, when a tag 
> contains both features in existing APIs and also new APIs. If we define that 
> tags do not overlap with each other, this is probably not a concern. However, 
> (1) still is a problem from a usability perspective.
>
> Best,
> Jack Ye
>
>
>
>
> On Wed, Jun 26, 2024 at 9:02 AM Daniel Weeks  wrote:
>>
>> I think Robert's approach is a reasonable compromise here.
>>
>> If we wanted a "per operation/endpoint" versioning, I think I'd prefer 
>> Micah's OpenAPI spec based approach because it's more standardized, but I 
>> feel adds a lot of client complexity.
>>
>> -Dan
>>
>>
>>
>> On Wed, Jun 26, 2024 at 6:59 AM Robert Stupp  wrote:
>>>
>>> (I think, compatibility deserves a separate thread - it's a "huge" topic)
>>>
>>> Based on experience, we decided on the following with Nessie:
>>>
>>> Unknown fields/attributes in a structure _DO_ cause (de)serialization 
>>> failures.
>>> "Stable API versions" - endpoint additions and/or added query parameters 
>>> and/or enhanced structures do _NOT_ require a new API version (as in the 
>>> endpoint's route/path).
>>> "Flexible spec versions" - new and updated "capabilities" however might 
>>> cause a bump in the "spec version" that the server announces in its 
>>> `getConfig` result.
>>>
>>> Adding new routes/paths may require new endpoint implementations on the 
>>> server side, which can easily lead to a lot of (unnecessarily boilerplate) 
>>> code. Using different routes/paths is justified if the API is changed 
>>> "fundamentally". We call the "path component" (api/v1/..., api/v2/...) API 
>>> version - the server indicates the minimum and maximum supported API 
>>> version, in case a client wants to "upgrade". I recommend to _not_ bump the 
>>> API version in the route/path if it's not really necessary.
>>>
>>> Regarding the requirement to fail on unknown attributes: Unknown attributes 
>>> may contain important information. A client may send a newer version of a 
>>> request object with an important new field, but the (older) server discards 
>>> the new attribute. Think of an attribute that for example defines a "commit 
>>> condition" that the client expects to be respected. "New" attributes must 
>>> be omittable (e.g. don't serialize if null/default) - clients indicate the 
>>> "usage" of an added attribute using some request attribute (for example: 
>>> "boolean returnExtendedInformation").
>>>
>>> The list of capabilities can be indicated with included "spec versions", to 
>>> tell clients which features/functionalities a server supports. "Production" 
>>> spec versions could start with 1, and "reserve" 0 for 
>>> experimental/unsupported/poc kind of implementation. It could look like 
>>> this:
>>>   capabilities: [
>>>     "table-spec/2,3",   // but not table-spec v1 here
>>>     "view-spec/1",
>>>     "table-api/1",
>>>     "view-api/1",
>>>     "udf-api/1",
>>>     "super-feature/2,4,6",   // but not spec versions 0,1,3,5,7+
>>>     ...
>>>   ]
>>> Incrementing a spec version in the list of capabilities doesn't break any 
>>> client. We could also define a structure to describe each capability:
>>>   components:
>>>     schemas:
>>>       Capability:
>>>         name:
>>>           type: string
>>>           description: Name of the capability
>>>         versions:
>>>           type: array
>>>           description: List of supported spec versions of this capability.
>>>             0 means experimental (non-production) without any guarantees
>>>             about the stability of schema for request and response parameters.
>>>           items:
>>>             type: integer
>>>             format: int32
>>>
>>> In Nessie, we ensure backwards and forwards compatibility using a 
>>> specialized test suite that runs the "in tree" client agai

Re: [DISCUSS] Describing REST Server capabilities

2024-06-27 Thread Eduard Tudenhöfner
IMO a capability is a coarse-grained way of describing that a catalog
supports X, Y, Z.
In order to support X it needs to implement the particular endpoints under
X, otherwise it doesn't fully support X.

From a user's perspective, this makes it easy to understand whether a
catalog fully supports e.g. views, tables, scan planning, and so on.

If a catalog only partially supports a capability then this can be
highlighted through docs or other meaningful ways.
In the long run we want to make it easy to understand for humans and REST
clients what capabilities a server supports.

That being said, I'm in favor of grouping capabilities by tags.

Coming back to the versioning that Robert suggested, I think it's
reasonable to add some versioning info to a capability to signal which
version of a capability a server is supporting.
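
As a small sketch of what version-aware capability checks could look like on
the client, the snippet below parses entries in the "name/v1,v2" style from
Robert's example (e.g. "table-spec/2,3"). The function names are hypothetical
and the string format is only the one floated in this thread, not an agreed
spec.

```python
def parse_capability(entry: str) -> tuple[str, frozenset[int]]:
    """Split an entry such as "table-spec/2,3" into its name and versions."""
    name, _, versions = entry.partition("/")
    if not name or not versions:
        raise ValueError(f"malformed capability entry: {entry!r}")
    return name, frozenset(int(v) for v in versions.split(","))


def supports(declared: list[str], name: str, version: int) -> bool:
    """Return True if the server declared this capability at this version."""
    parsed = dict(parse_capability(entry) for entry in declared)
    # Unknown capabilities are treated as unsupported rather than as errors,
    # so an older client is not broken by entries it does not recognize.
    return version in parsed.get(name, frozenset())


declared = ["table-spec/2,3", "view-spec/1", "super-feature/2,4,6"]
```

For this list, supports(declared, "table-spec", 3) is true while
supports(declared, "table-spec", 1) and supports(declared, "udf-api", 1) are
both false.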

Thanks
Eduard

On Thu, Jun 27, 2024 at 8:08 AM Péter Váry 
wrote:

> I don't have a very strong opinion in the groups vs. single services
> debate, but I lean towards grouping, as that makes the result of the
> service human readable too. I expect that small/incomplete services will be
> the exception in the long run, and they can highlight the implemented
> services in their documentation for their smaller userbase.
>
> About the API proposed by Robert:
> - It took me some time, to understand that "table-spec" is for the
> supported object specification, and "table-api" is for the supported
> features.
>
> I would split these to 2 different lists for easier understanding, like
> "specifications" and "capabilities".
>
>  Thanks, Peter
>
>
> On Wed, Jun 26, 2024, 23:00 Jack Ye  wrote:
>
>> > evaluate a REST service and if it's good for their use case
>>
>> Feels like you are talking from the perspective of people choosing a
>> vendor product. I believe most vendors will offer near-full capability. But
>> I am coming from an angle of small organizations that are building REST
>> servers just for opening connectivity of internal systems to open engines,
>> and I know there are many doing that right now. And all they need is
>> technically just a handful of APIs, many of them will be read-only, very
>> limited in auxiliary features.
>>
>> And I believe this is also the current state of Databricks Unity, which
>> only implements the following:
>> - getConfig
>> - listNamespaces
>> - loadNamespaceMetadata
>> - listTables
>> - loadTable
>> - tableExists
>> - emitMetrics
>>
>> Code ref:
>> https://github.com/unitycatalog/unitycatalog/blob/main/server/src/main/java/io/unitycatalog/server/service/IcebergRestCatalogService.java
>>
>> How do I express the capability of that?
>>
>> Maybe with your team joining Databricks, you will add full capabilities,
>> but I think it is totally okay in its current state, it serves the main use
>> cases very well.
>>
>> And feels like we are optimizing people writing 5 capabilities vs 20
>> capabilities, and we need to spend a lot of effort in debating how to group
>> capabilities going forward, which I'd rather spend the energy doing other
>> things...
>>
>> -Jack
>>
>>
>>
>>
>>
>>
>> On Wed, Jun 26, 2024 at 1:41 PM Amogh Jahagirdar 
>> wrote:
>>
>>> I'm in favor of grouping by tags. The way I look at this, there are 2
>>> primary considerations:
>>>
>>> 1.) The client/server protocol complexity tradeoffs. On the first
>>> consideration, unless I'm missing something the client side becomes
>>> significantly more complex; if this has been sketched out earlier in this
>>> thread just point me to it. Grouping by tag seems more easy to manage from
>>> a client side but happy to be proven wrong here if that's not the case,
>>> this is just going off whats in my head at the moment.
>>>
>>> 2.) What is more useful for end users of Iceberg to evaluate the
>>> capabilities of a REST Server?
>>> In the end as REST is becoming more adopted, I think it's more healthy
>>> for the ecosystem
>>> if the community can define groupings which end users can more easily
>>> understand when trying to understand what a REST implementation can
>>> actually do. To me defining "X operation/endpoint Version Y" for all
>>> operations makes it needlessly more difficult for users to evaluate a REST
>>> service and if it's good for their use case. Standardizing the grouping by
>>> a simple tag name has a clear benefit to end users imo.
>>>
>>> > (1) There are many REST services that would only implement a very
>>> small set
>>> > of APIs, such as just loadTable and loadView. Some will choose to not
>>> > implement very specific endpoints, such as renameTable. Tags seems
>>> > convenient but it is mandating people to implement a specific group of
>>> APIs
>>> > together, which is a lot of burdens for especially small
>>> organizations, if
>>> > they just want to support very specific goals like reading through IRC.
>>>
>>> I can understand the concern here but going back to my earlier point,
>>> the capability tagging really should be beneficial for end users which I
>>> think optimizing for tha

Re: [DISCUSS] Describing REST Server capabilities

2024-06-26 Thread Péter Váry
I don't have a very strong opinion in the groups vs. single services
debate, but I lean towards grouping, as that makes the result of the
service human readable too. I expect that small/incomplete services will be
the exception in the long run, and they can highlight the implemented
services in their documentation for their smaller userbase.

About the API proposed by Robert:
- It took me some time to understand that "table-spec" is for the
supported object specification, and "table-api" is for the supported
features.

I would split these into 2 different lists for easier understanding, like
"specifications" and "capabilities".

 Thanks, Peter


On Wed, Jun 26, 2024, 23:00 Jack Ye  wrote:

> > evaluate a REST service and if it's good for their use case
>
> Feels like you are talking from the perspective of people choosing a
> vendor product. I believe most vendors will offer near-full capability. But
> I am coming from an angle of small organizations that are building REST
> servers just for opening connectivity of internal systems to open engines,
> and I know there are many doing that right now. And all they need is
> technically just a handful of APIs, many of them will be read-only, very
> limited in auxiliary features.
>
> And I believe this is also the current state of Databricks Unity, which
> only implements the following:
> - getConfig
> - listNamespaces
> - loadNamespaceMetadata
> - listTables
> - loadTable
> - tableExists
> - emitMetrics
>
> Code ref:
> https://github.com/unitycatalog/unitycatalog/blob/main/server/src/main/java/io/unitycatalog/server/service/IcebergRestCatalogService.java
>
> How do I express the capability of that?
>
> Maybe with your team joining Databricks, you will add full capabilities,
> but I think it is totally okay in its current state, it serves the main use
> cases very well.
>
> And feels like we are optimizing people writing 5 capabilities vs 20
> capabilities, and we need to spend a lot of effort in debating how to group
> capabilities going forward, which I'd rather spend the energy doing other
> things...
>
> -Jack
>
>
>
>
>
>
> On Wed, Jun 26, 2024 at 1:41 PM Amogh Jahagirdar 
> wrote:
>
>> I'm in favor of grouping by tags. The way I look at this, there are 2
>> primary considerations:
>>
>> 1.) The client/server protocol complexity tradeoffs. On the first
>> consideration, unless I'm missing something the client side becomes
>> significantly more complex; if this has been sketched out earlier in this
>> thread just point me to it. Grouping by tag seems more easy to manage from
>> a client side but happy to be proven wrong here if that's not the case,
>> this is just going off whats in my head at the moment.
>>
>> 2.) What is more useful for end users of Iceberg to evaluate the
>> capabilities of a REST Server?
>> In the end as REST is becoming more adopted, I think it's more healthy
>> for the ecosystem
>> if the community can define groupings which end users can more easily
>> understand when trying to understand what a REST implementation can
>> actually do. To me defining "X operation/endpoint Version Y" for all
>> operations makes it needlessly more difficult for users to evaluate a REST
>> service and if it's good for their use case. Standardizing the grouping by
>> a simple tag name has a clear benefit to end users imo.
>>
>> > (1) There are many REST services that would only implement a very small
>> set
>> > of APIs, such as just loadTable and loadView. Some will choose to not
>> > implement very specific endpoints, such as renameTable. Tags seems
>> > convenient but it is mandating people to implement a specific group of
>> APIs
>> > together, which is a lot of burdens for especially small organizations,
>> if
>> > they just want to support very specific goals like reading through IRC.
>>
>> I can understand the concern here but going back to my earlier point, the
>> capability tagging really should be beneficial for end users which I think
>> optimizing for that is more important.  It's true that for a REST server to
>> be considered "capability X compliant" it needs to implement all the
>> endpoints for X which does have a burden on a server implementation, but I
>> think that's net better for the ecosystem since a broader set of users have
>> a clear idea of what's supported and can make good decisions for themselves
>> since everyone is speaking the same standard language.
>>
>> Furthermore, we could also look at if it makes sense to make the tags
>> more granular if the scenario described is actually common.
>>
>> On 2024/06/26 16:26:29 Jack Ye wrote:
>> > It seems like there are 2 sub-topics here:
>> > 1. should we group operations with tags, or should we do this
>> > per-operation/endpoint?
>> > 2. how should we do the capability/versioning for each unit (either per
>> tag
>> > or per operation)
>> >
>> > Shall we first conclude on 1?
>> >
>> > For 1, my take is that we will need to do it per operation, for 2
>> reasons:
>> >
>> > (1) There are m

Re: [DISCUSS] Describing REST Server capabilities

2024-06-26 Thread Jack Ye
> evaluate a REST service and if it's good for their use case

Feels like you are talking from the perspective of people choosing a vendor
product. I believe most vendors will offer near-full capability. But I am
coming from an angle of small organizations that are building REST servers
just for opening connectivity of internal systems to open engines, and I
know there are many doing that right now. And all they need is technically
just a handful of APIs, many of them will be read-only, very limited in
auxiliary features.

And I believe this is also the current state of Databricks Unity, which
only implements the following:
- getConfig
- listNamespaces
- loadNamespaceMetadata
- listTables
- loadTable
- tableExists
- emitMetrics

Code ref:
https://github.com/unitycatalog/unitycatalog/blob/main/server/src/main/java/io/unitycatalog/server/service/IcebergRestCatalogService.java

How do I express the capability of that?
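
To illustrate the question, here is a sketch that checks the seven operations
listed above against tag definitions. The tag names and the operations
assigned to each tag are hypothetical, invented only to show how a small
server ends up "partial" under a grouped scheme.

```python
# The operations listed above for the server in question.
IMPLEMENTED = {
    "getConfig", "listNamespaces", "loadNamespaceMetadata",
    "listTables", "loadTable", "tableExists", "emitMetrics",
}

# Hypothetical tag definitions; the thread has not agreed on any grouping.
TAGS = {
    "tables": {
        "listTables", "loadTable", "tableExists",
        "createTable", "updateTable", "dropTable", "renameTable",
    },
    "views": {"listViews", "loadView", "createView", "dropView"},
}


def coverage(implemented: set[str], tags: dict[str, set[str]]) -> dict[str, str]:
    """Classify each tag as fully, partially, or not covered by the server."""
    result = {}
    for tag, operations in tags.items():
        present = operations & implemented
        if present == operations:
            result[tag] = "full"
        elif present:
            result[tag] = "partial"
        else:
            result[tag] = "none"
    return result


report = coverage(IMPLEMENTED, TAGS)
```

Under these assumed groupings the server is only "partial" for tables, so a
tag-based declaration could not describe it without extra documentation,
whereas a per-endpoint list would just contain the seven operations.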

Maybe with your team joining Databricks, you will add full capabilities,
but I think it is totally okay in its current state, it serves the main use
cases very well.

And it feels like we are optimizing for people writing 5 capabilities vs 20
capabilities, while needing to spend a lot of effort debating how to group
capabilities going forward. I'd rather spend that energy on other
things...

-Jack






On Wed, Jun 26, 2024 at 1:41 PM Amogh Jahagirdar  wrote:

> I'm in favor of grouping by tags. The way I look at this, there are 2
> primary considerations:
>
> 1.) The client/server protocol complexity tradeoffs. On the first
> consideration, unless I'm missing something the client side becomes
> significantly more complex; if this has been sketched out earlier in this
> thread just point me to it. Grouping by tag seems more easy to manage from
> a client side but happy to be proven wrong here if that's not the case,
> this is just going off whats in my head at the moment.
>
> 2.) What is more useful for end users of Iceberg to evaluate the
> capabilities of a REST Server?
> In the end as REST is becoming more adopted, I think it's more healthy for
> the ecosystem
> if the community can define groupings which end users can more easily
> understand when trying to understand what a REST implementation can
> actually do. To me defining "X operation/endpoint Version Y" for all
> operations makes it needlessly more difficult for users to evaluate a REST
> service and if it's good for their use case. Standardizing the grouping by
> a simple tag name has a clear benefit to end users imo.
>
> > (1) There are many REST services that would only implement a very small
> set
> > of APIs, such as just loadTable and loadView. Some will choose to not
> > implement very specific endpoints, such as renameTable. Tags seems
> > convenient but it is mandating people to implement a specific group of
> APIs
> > together, which is a lot of burdens for especially small organizations,
> if
> > they just want to support very specific goals like reading through IRC.
>
> I can understand the concern here but going back to my earlier point, the
> capability tagging really should be beneficial for end users which I think
> optimizing for that is more important.  It's true that for a REST server to
> be considered "capability X compliant" it needs to implement all the
> endpoints for X which does have a burden on a server implementation, but I
> think that's net better for the ecosystem since a broader set of users have
> a clear idea of what's supported and can make good decisions for themselves
> since everyone is speaking the same standard language.
>
> Furthermore, we could also look at if it makes sense to make the tags more
> granular if the scenario described is actually common.
>
> On 2024/06/26 16:26:29 Jack Ye wrote:
> > It seems like there are 2 sub-topics here:
> > 1. should we group operations with tags, or should we do this
> > per-operation/endpoint?
> > 2. how should we do the capability/versioning for each unit (either per
> tag
> > or per operation)
> >
> > Shall we first conclude on 1?
> >
> > For 1, my take is that we will need to do it per operation, for 2
> reasons:
> >
> > (1) There are many REST services that would only implement a very small
> set
> > of APIs, such as just loadTable and loadView. Some will choose to not
> > implement very specific endpoints, such as renameTable. Tags seems
> > convenient but it is mandating people to implement a specific group of
> APIs
> > together, which is a lot of burdens for especially small organizations,
> if
> > they just want to support very specific goals like reading through IRC.
> >
> > (2) Suppose a new tag is added in the future, the server returns that
> tag,
> > but an older client does not understand it, it might cause mistakes in
> the
> > client's understanding of what is supported and what is not, when a tag
> > contains both features in existing APIs and also new APIs. If we define
> > that tags do not overlap with each other, this is probably n

Re: [DISCUSS] Describing REST Server capabilities

2024-06-26 Thread Amogh Jahagirdar
I'm in favor of grouping by tags. The way I look at this, there are 2 primary
considerations:

1.) The client/server protocol complexity tradeoffs. On this first
consideration, unless I'm missing something, the client side becomes
significantly more complex with per-operation versioning; if this has been
sketched out earlier in this thread, just point me to it. Grouping by tag seems
easier to manage from the client side, but I'm happy to be proven wrong here if
that's not the case; this is just going off what's in my head at the moment.

2.) What is more useful for end users of Iceberg to evaluate the capabilities
of a REST Server?
In the end, as REST is becoming more adopted, I think it's healthier for the
ecosystem if the community can define groupings which end users can more easily
understand when trying to work out what a REST implementation can actually
do. To me, defining "X operation/endpoint Version Y" for all operations makes it
needlessly more difficult for users to evaluate a REST service and whether it's
good for their use case. Standardizing the grouping by a simple tag name has a
clear benefit to end users imo.

> (1) There are many REST services that would only implement a very small set
> of APIs, such as just loadTable and loadView. Some will choose to not
> implement very specific endpoints, such as renameTable. Tags seems
> convenient but it is mandating people to implement a specific group of APIs
> together, which is a lot of burdens for especially small organizations, if
> they just want to support very specific goals like reading through IRC.

I can understand the concern here, but going back to my earlier point, the
capability tagging really should be beneficial for end users, and I think
optimizing for that is more important.  It's true that for a REST server to be
considered "capability X compliant" it needs to implement all the endpoints for
X, which does put a burden on a server implementation, but I think that's net
better for the ecosystem since a broader set of users have a clear idea of
what's supported and can make good decisions for themselves since everyone is
speaking the same standard language.

Furthermore, we could also look at whether it makes sense to make the tags more
granular if the scenario described is actually common.
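
To make the "capability X compliant" rule concrete, here is a minimal Python
sketch of how tag compliance could be derived from the endpoints a server
implements. The tag names and the endpoint groupings are hypothetical and only
serve as an illustration; nothing here has been agreed on in the thread.

```python
# Hypothetical tag-to-endpoint grouping; the tag names and endpoint lists
# are illustrative assumptions, not part of any agreed spec.
TAG_ENDPOINTS = {
    "tables-read": {"loadTable", "listTables"},
    "tables-write": {"createTable", "updateTable", "dropTable", "renameTable"},
    "views-read": {"loadView", "listViews"},
}


def compliant_tags(implemented):
    """Return the sorted tags whose endpoints are all implemented."""
    implemented = set(implemented)
    return sorted(
        tag for tag, endpoints in TAG_ENDPOINTS.items() if endpoints <= implemented
    )


# A mostly read-only server that implements createTable but not renameTable.
server_endpoints = {"loadTable", "listTables", "loadView", "listViews", "createTable"}
print(compliant_tags(server_endpoints))
```

Under this grouping the server can advertise the two read tags but not the
write tag, even though it implements createTable. That is the burden discussed
above, and making the tags more granular would shrink it.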

On 2024/06/26 16:26:29 Jack Ye wrote:
> It seems like there are 2 sub-topics here:
> 1. should we group operations with tags, or should we do this
> per-operation/endpoint?
> 2. how should we do the capability/versioning for each unit (either per tag
> or per operation)
> 
> Shall we first conclude on 1?
> 
> For 1, my take is that we will need to do it per operation, for 2 reasons:
> 
> (1) There are many REST services that would only implement a very small set
> of APIs, such as just loadTable and loadView. Some will choose to not
> implement very specific endpoints, such as renameTable. Tags seems
> convenient but it is mandating people to implement a specific group of APIs
> together, which is a lot of burdens for especially small organizations, if
> they just want to support very specific goals like reading through IRC.
> 
> (2) Suppose a new tag is added in the future, the server returns that tag,
> but an older client does not understand it, it might cause mistakes in the
> client's understanding of what is supported and what is not, when a tag
> contains both features in existing APIs and also new APIs. If we define
> that tags do not overlap with each other, this is probably not a concern.
> However, (1) still is a problem from a usability perspective.
> 
> Best,
> Jack Ye
> 
> 
> 
> 
> On Wed, Jun 26, 2024 at 9:02 AM Daniel Weeks  wrote:
> 
> > I think Robert's approach is a reasonable compromise here.
> >
> > If we wanted a "per operation/endpoint" versioning, I think I'd prefer
> > Micah's OpenAPI spec based approach because it's more standardized, but I
> > feel adds a lot of client complexity.
> >
> > -Dan
> >
> >
> >
> > On Wed, Jun 26, 2024 at 6:59 AM Robert Stupp  wrote:
> >
> >> (I think, compatibility deserves a separate thread - it's a "huge" topic)
> >>
> >> Based on experience, we decided on the following with Nessie:
> >>
> >>- Unknown fields/attributes in a structure _DO_ cause
> >>(de)serialization failures.
> >>- "Stable API versions" - endpoint additions and/or added query
> >>parameters and/or enhanced structures do _NOT_ require a new API version
> >>(as in the endpoint's route/path).
> >>- "Flexible spec versions" - new and updated "capabilities" however
> >>might cause a bump in the "spec version" that the server announces in 
> >> its
> >>`getConfig` result.
> >>
> >> Adding new routes/paths may require new endpoint implementations on the
> >> server side, which can easily lead to a lot of (unnecessarily boilerplate)
> >> code. Using different routes/paths is justified if the API is changed
> >> "fundamentally". We call the "path component" (api/v1/..., api/v2/...) API
> >> version - the server indicates

Re: [DISCUSS] Describing REST Server capabilities

2024-06-26 Thread Jack Ye
It seems like there are 2 sub-topics here:
1. should we group operations with tags, or should we do this
per-operation/endpoint?
2. how should we do the capability/versioning for each unit (either per tag
or per operation)

Shall we first conclude on 1?

For 1, my take is that we will need to do it per operation, for 2 reasons:

(1) There are many REST services that would only implement a very small set
of APIs, such as just loadTable and loadView. Some will choose not to
implement very specific endpoints, such as renameTable. Tags seem
convenient, but they mandate that people implement a specific group of APIs
together, which is a lot of burden, especially for small organizations, if
they just want to support very specific goals like reading through IRC.

(2) Suppose a new tag is added in the future and the server returns that tag,
but an older client does not understand it. This might cause mistakes in the
client's understanding of what is supported and what is not, when a tag
contains both features in existing APIs and also new APIs. If we define
that tags do not overlap with each other, this is probably not a concern.
However, (1) is still a problem from a usability perspective.

Best,
Jack Ye




On Wed, Jun 26, 2024 at 9:02 AM Daniel Weeks  wrote:

> I think Robert's approach is a reasonable compromise here.
>
> If we wanted a "per operation/endpoint" versioning, I think I'd prefer
> Micah's OpenAPI spec based approach because it's more standardized, but I
> feel adds a lot of client complexity.
>
> -Dan
>
>
>
> On Wed, Jun 26, 2024 at 6:59 AM Robert Stupp  wrote:
>
>> (I think, compatibility deserves a separate thread - it's a "huge" topic)
>>
>> Based on experience, we decided on the following with Nessie:
>>
>>- Unknown fields/attributes in a structure _DO_ cause
>>(de)serialization failures.
>>- "Stable API versions" - endpoint additions and/or added query
>>parameters and/or enhanced structures do _NOT_ require a new API version
>>(as in the endpoint's route/path).
>>- "Flexible spec versions" - new and updated "capabilities" however
>>might cause a bump in the "spec version" that the server announces in its
>>`getConfig` result.
>>
>> Adding new routes/paths may require new endpoint implementations on the
>> server side, which can easily lead to a lot of (unnecessarily boilerplate)
>> code. Using different routes/paths is justified if the API is changed
>> "fundamentally". We call the "path component" (api/v1/..., api/v2/...) API
>> version - the server indicates the minimum and maximum supported API
>> version, in case a client wants to "upgrade". I recommend to _not_ bump the
>> API version in the route/path if it's not really necessary.
>>
>> Regarding the requirement to fail on unknown attributes: Unknown
>> attributes may contain important information. A client may send a newer
>> version of a request object with an important new field, but the (older)
>> server discards the new attribute. Think of an attribute that for example
>> defines a "commit condition" that the client expects to be respected. "New"
>> attributes must be omittable (e.g. don't serialize if null/default) -
>> clients indicate the "usage" of an added attribute using some request
>> attribute (for example: "boolean returnExtendedInformation").
>>
>> The list of capabilities can be indicated with included "spec versions",
>> to tell clients which features/functionalities a server
>> supports."Production" spec versions could start with 1, and "reserve" 0 for
>> experimental/unsupported/poc kind of implementation. It could look like
>> this:
>>   capabilities: [
>>     "table-spec/2,3",   // but not table-spec v1 here
>>     "view-spec/1",
>>     "table-api/1",
>>     "view-api/1",
>>     "udf-api/1",
>>     "super-feature/2,4,6",   // but not spec versions 0,1,3,5,7+
>>     ...
>>   ]
>> Incrementing a spec version in the list of capabilities doesn't break any
>> client. We could also define a structure to describe each capability:
>>   components:
>>     schemas:
>>       Capability:
>>         name:
>>           type: string
>>           description: Name of the capability
>>         versions:
>>           type: array
>>           description: List of supported spec versions of this
>>             capability. 0 means experimental (non-production) without any
>>             guarantees about the stability of schema for request and
>>             response parameters.
>>           items:
>>             type: integer
>>             format: int32
>>
>> In Nessie, we ensure backwards and forwards compatibility using a
>> specialized test suite that runs the "in tree" client against older server
>> versions and older client versions against the "in tree" server version. It
>> works fine for us for a few years now - and it did help preventing
>> compatibility issues.
>>
>>
>> On 26.06.24 07:44, Péter Váry wrote:
>>
>> Hi everyone,
>>
>> A few considerations:
>> - I think we should explicitly state which client/service
>> interoperability w

Re: [DISCUSS] Describing REST Server capabilities

2024-06-26 Thread Daniel Weeks
I think Robert's approach is a reasonable compromise here.

If we wanted a "per operation/endpoint" versioning, I think I'd prefer
Micah's OpenAPI spec based approach because it's more standardized, but I
feel adds a lot of client complexity.

-Dan



On Wed, Jun 26, 2024 at 6:59 AM Robert Stupp  wrote:

> (I think, compatibility deserves a separate thread - it's a "huge" topic)
>
> Based on experience, we decided on the following with Nessie:
>
>- Unknown fields/attributes in a structure _DO_ cause
>(de)serialization failures.
>- "Stable API versions" - endpoint additions and/or added query
>parameters and/or enhanced structures do _NOT_ require a new API version
>(as in the endpoint's route/path).
>- "Flexible spec versions" - new and updated "capabilities" however
>might cause a bump in the "spec version" that the server announces in its
>`getConfig` result.
>
> Adding new routes/paths may require new endpoint implementations on the
> server side, which can easily lead to a lot of (unnecessarily boilerplate)
> code. Using different routes/paths is justified if the API is changed
> "fundamentally". We call the "path component" (api/v1/..., api/v2/...) API
> version - the server indicates the minimum and maximum supported API
> version, in case a client wants to "upgrade". I recommend to _not_ bump the
> API version in the route/path if it's not really necessary.
>
> Regarding the requirement to fail on unknown attributes: Unknown
> attributes may contain important information. A client may send a newer
> version of a request object with an important new field, but the (older)
> server discards the new attribute. Think of an attribute that for example
> defines a "commit condition" that the client expects to be respected. "New"
> attributes must be omittable (e.g. don't serialize if null/default) -
> clients indicate the "usage" of an added attribute using some request
> attribute (for example: "boolean returnExtendedInformation").
>
> The list of capabilities can be indicated with included "spec versions",
> to tell clients which features/functionalities a server
> supports."Production" spec versions could start with 1, and "reserve" 0 for
> experimental/unsupported/poc kind of implementation. It could look like
> this:
>   capabilities: [
>     "table-spec/2,3",   // but not table-spec v1 here
>     "view-spec/1",
>     "table-api/1",
>     "view-api/1",
>     "udf-api/1",
>     "super-feature/2,4,6",   // but not spec versions 0,1,3,5,7+
>     ...
>   ]
> Incrementing a spec version in the list of capabilities doesn't break any
> client. We could also define a structure to describe each capability:
>   components:
>     schemas:
>       Capability:
>         name:
>           type: string
>           description: Name of the capability
>         versions:
>           type: array
>           description: List of supported spec versions of this
>             capability. 0 means experimental (non-production) without any
>             guarantees about the stability of schema for request and
>             response parameters.
>           items:
>             type: integer
>             format: int32
>
> In Nessie, we ensure backwards and forwards compatibility using a
> specialized test suite that runs the "in tree" client against older server
> versions and older client versions against the "in tree" server version. It
> works fine for us for a few years now - and it did help preventing
> compatibility issues.
>
>
> On 26.06.24 07:44, Péter Váry wrote:
>
> Hi everyone,
>
> A few considerations:
> - I think we should explicitly state which client/service interoperability
> we are aiming for. I expect that we want to support both old client -> new
> server, and new client -> old server communications.
> - I agree with Jack, that we should think about versions in advance - HMS
> tried to be backwards compatible for everything, and that made it hard to
> move forward / deprecate things.
> - Still we should try to keep the backwards incompatible changes minimal.
> (All clients should be able to ignore unknown incoming fields / New
> optional input parameter should drive new features / Try to avoid enums in
> responses where we expect changes (?))
> - OTOH, it could be important for clients to know which of the backwards
> compatible changes are implemented for the given server - so I would
> decouple the URI from the versioning. Maybe major version change should
> (could) change the URI, but backwards compatible changes should be served
> on the same URI, but could be identified by different minor versions.
>
> This is exciting stuff!
> Thanks for pushing this forward!
>
> Peter
>
>
> On Wed, Jun 26, 2024, 00:15 Jack Ye  wrote:
>
>> Hi everyone,
>>
>> I feel I do not see a good answer to why not just simply version each
>> API? When using tag, it means I have to offer capabilities per-tagged
>> group. However, I could for example just offer loadTable and nothing else
>> in a catalog, and that should still be Iceberg

Re: [DISCUSS] Describing REST Server capabilities

2024-06-26 Thread Robert Stupp

(I think compatibility deserves a separate thread - it's a "huge" topic)

Based on experience, we decided on the following with Nessie:

 * Unknown fields/attributes in a structure _DO_ cause
   (de)serialization failures.
 * "Stable API versions" - endpoint additions and/or added query
   parameters and/or enhanced structures do _NOT_ require a new API
   version (as in the endpoint's route/path).
 * "Flexible spec versions" - new and updated "capabilities" however
   might cause a bump in the "spec version" that the server announces
   in its `getConfig` result.

Adding new routes/paths may require new endpoint implementations on the
server side, which can easily lead to a lot of (unnecessary boilerplate)
code. Using different routes/paths is justified if the API is changed
"fundamentally". We call the "path component" (api/v1/..., api/v2/...)
the API version - the server indicates the minimum and maximum supported
API version, in case a client wants to "upgrade". I recommend _not_
bumping the API version in the route/path if it's not really necessary.


Regarding the requirement to fail on unknown attributes: Unknown 
attributes may contain important information. A client may send a newer 
version of a request object with an important new field, but the (older) 
server discards the new attribute. Think of an attribute that for 
example defines a "commit condition" that the client expects to be 
respected. "New" attributes must be omittable (e.g. don't serialize if 
null/default) - clients indicate the "usage" of an added attribute using 
some request attribute (for example: "boolean returnExtendedInformation").
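
As a rough illustration of the "fail on unknown attributes" rule, the
following Python sketch rejects a request that carries a field the (older)
server does not know. The `CommitRequest` shape and the `parse_strict` helper
are hypothetical names made up for this example; this is not how Nessie
implements it.

```python
from dataclasses import dataclass, fields


@dataclass
class CommitRequest:
    # Hypothetical request shape, used only for illustration.
    table: str
    snapshot_id: int


def parse_strict(cls, payload):
    """Build cls from a dict, failing on any attribute cls does not declare."""
    known = {f.name for f in fields(cls)}
    unknown = sorted(set(payload) - known)
    if unknown:
        raise ValueError(f"unknown attributes: {unknown}")
    return cls(**payload)


# An older server rejects a request carrying a field it does not know,
# instead of silently dropping a condition the client relies on.
try:
    parse_strict(
        CommitRequest,
        {"table": "t", "snapshot_id": 1, "commit_condition": "x"},
    )
except ValueError as err:
    print(err)
```

A lenient parser would have accepted the request and ignored
`commit_condition`, which is exactly the silent-discard case described above.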


The list of capabilities can be indicated with included "spec versions",
to tell clients which features/functionalities a server supports.
"Production" spec versions could start with 1, and "reserve" 0
for experimental/unsupported/PoC kinds of implementations. It could look
like this:

  capabilities: [
    "table-spec/2,3",   // but not table-spec v1 here
    "view-spec/1",
    "table-api/1",
    "view-api/1",
    "udf-api/1",
    "super-feature/2,4,6",   // but not spec versions 0,1,3,5,7+
    ...
  ]
Incrementing a spec version in the list of capabilities doesn't break 
any client. We could also define a structure to describe each capability:

  components:
    schemas:
      Capability:
        name:
          type: string
          description: Name of the capability
        versions:
          type: array
          description: List of supported spec versions of this
            capability. 0 means experimental (non-production) without any
            guarantees about the stability of schema for request and
            response parameters.
          items:
            type: integer
            format: int32
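
On the client side, capability strings of the form shown above
("name/v1,v2,...") could be parsed roughly as follows. This is only a sketch
in Python; the helper names `parse_capabilities` and `supports` are
assumptions made for this example, and the string format is taken from the
list above.

```python
def parse_capabilities(entries):
    """Map each capability name to the set of spec versions it lists."""
    parsed = {}
    for entry in entries:
        name, _, versions = entry.partition("/")
        parsed[name] = {int(v) for v in versions.split(",") if v}
    return parsed


def supports(parsed, name, version):
    """True if the server listed this spec version for the capability."""
    return version in parsed.get(name, set())


caps = parse_capabilities(["table-spec/2,3", "view-spec/1", "super-feature/2,4,6"])
print(supports(caps, "table-spec", 1))     # table-spec v1 is not listed
print(supports(caps, "super-feature", 4))  # listed explicitly
```

An unknown capability name simply yields "not supported", so an older client
is not broken when a server adds new entries or new spec versions.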

In Nessie, we ensure backwards and forwards compatibility using a
specialized test suite that runs the "in tree" client against older
server versions and older client versions against the "in tree" server
version. It has worked fine for us for a few years now - and it did help
prevent compatibility issues.



On 26.06.24 07:44, Péter Váry wrote:

Hi everyone,

A few considerations:
- I think we should explicitly state which client/service 
interoperability we are aiming for. I expect that we want to support 
both old client -> new server, and new client -> old server 
communications.
- I agree with Jack, that we should think about versions in advance - 
HMS tried to be backwards compatible for everything, and that made it 
hard to move forward / deprecate things.
- Still we should try to keep the backwards incompatible changes 
minimal. (All clients should be able to ignore unknown incoming fields 
/ New optional input parameter should drive new features / Try to 
avoid enums in responses where we expect changes (?))
- OTOH, it could be important for clients to know which of the 
backwards compatible changes are implemented for the given server - so 
I would decouple the URI from the versioning. Maybe major version 
change should (could) change the URI, but backwards compatible changes 
should be served on the same URI, but could be identified by different 
minor versions.


This is exciting stuff!
Thanks for pushing this forward!

Peter


On Wed, Jun 26, 2024, 00:15 Jack Ye  wrote:

Hi everyone,

I feel I do not see a good answer to why not just simply version
each API? When using tag, it means I have to offer capabilities
per-tagged group. However, I could for example just offer
loadTable and nothing else in a catalog, and that should still be
Iceberg REST compliant. And I think we need a versioning story
anyway, there is no way around it.

Here is the workflow in my mind with versioning:

1. Going forward, every time the REST catalog spec introduces any
new API endpoints or backwards incompatible changes to the
existing APIs, the version of the specific API is incremented. So
suppose the PlanTable API is added, this API will be at version
v1. Suppose U

Re: [DISCUSS] Describing REST Server capabilities

2024-06-25 Thread Péter Váry
Hi everyone,

A few considerations:
- I think we should explicitly state which client/service interoperability
we are aiming for. I expect that we want to support both old client -> new
server, and new client -> old server communications.
- I agree with Jack, that we should think about versions in advance - HMS
tried to be backwards compatible for everything, and that made it hard to
move forward / deprecate things.
- Still we should try to keep the backwards incompatible changes minimal.
(All clients should be able to ignore unknown incoming fields / New
optional input parameter should drive new features / Try to avoid enums in
responses where we expect changes (?))
- OTOH, it could be important for clients to know which of the backwards
compatible changes are implemented for the given server - so I would
decouple the URI from the versioning. Maybe major version change should
(could) change the URI, but backwards compatible changes should be served
on the same URI, but could be identified by different minor versions.

This is exciting stuff!
Thanks for pushing this forward!

Peter


On Wed, Jun 26, 2024, 00:15 Jack Ye  wrote:

> Hi everyone,
>
> I feel I do not see a good answer to why not just simply version each API?
> When using tag, it means I have to offer capabilities per-tagged group.
> However, I could for example just offer loadTable and nothing else in a
> catalog, and that should still be Iceberg REST compliant. And I think we
> need a versioning story anyway, there is no way around it.
>
> Here is the workflow in my mind with versioning:
>
> 1. Going forward, every time the REST catalog spec introduces any new API
> endpoints or backwards incompatible changes to the existing APIs, the
> version of the specific API is incremented. So suppose the PlanTable API is
> added, this API will be at version v1. Suppose UpdateTable is updated with
> a new update type, that API will be at version v2, but PlanTable will
> remain at v1.
>
> 2. a catalog must implement getConfig. This API is the only one that is
> required.
>
> 3. in getConfig, in the defaults map (it could be in some new metadata
> structure, but since we want strong backwards compatibility guarantee,
> reusing string maps seems to be the best way), server returns key-value
> pairs of:
> - key: operation:
> - value: version number
>
> 4. the client assumes that the map is ordered, and resolves API versions
> sequentially. For example, suppose I have the following map:
>
> { "operation:planTable": "1", "operation:loadTable": "2" }
>
> Note that by "supporting", it means to return a response in a predictable
> way that is compliant with the spec. It can also return 406
> UnsupportedOperation as a way to support it.
>
> There is also a special version *, that means any version can work.
>
> 5. Backwards compatibility: suppose the client is at a higher version than
> the server, then the client should always be able to understand the
> server's full list of capabilities.
>
> 6. Forward compatibility: suppose the client is at a lower version than
> the server, then the client should parse whatever operation it understands,
> and use the highest version it could support to execute the operation.
> Suppose the client only supports loadTable v1, then it will continue to hit
> the GET v1/namespaces/{ns}/tables/{table} route, instead of GET
> v2/namespaces/{ns}/tables/{table}. The v1 route could continue to support
> the client, or it could throw 406 to indicate that this route is deprecated
> and the client needs to upgrade.
>
> For initial backwards compatibility, I think not returning anything should
> mean that all API that the client understands are having version *.
>
> What do people think of it, compared to the tag approach?
>
> Best,
> Jack Ye
>
> On Mon, Jun 24, 2024 at 1:42 PM Micah Kornfield 
> wrote:
>
>> I don't have strong opinions either way here, just thought it was worth
>> raising some concerns over possible evolution here.  Some responses inline,
>> but if capabilities seem to meet the requirement at hand, then it does
>> potentially seem the simplest mechanism.
>>
>>
>> I think we also want to avoid relyance on server specific published
>>> OpenAPI as they may leak other options/parameters/etc.  This may lead to
>>> confusion around what the canonical spec is and make clients incompatible
>>> if they're generated off of a non-standard spec document.
>>
>>
>> Yeah, I wasn't proposing necessarily using built in functionality but a
>> pre-scrubbed document.  Since there is no reference service implementation
>> for REST it seems like each implementor would need to describe the best way
>> of scrubbing there description.
>>
>>
>>
>>> @Micah this sounds to me as if the client would then have to parse a
>>> bunch of endpoints to figure out whether it's safe to e.g. call loading a
>>> view or dropping a table on the given REST server. Rather than having a
>>> dedicated endpoint we're just using the */

Re: [DISCUSS] Describing REST Server capabilities

2024-06-25 Thread Jack Ye
Hi everyone,

I feel I have not seen a good answer to the question: why not simply version
each API? When using tags, I have to offer capabilities per tagged group.
However, I could for example just offer loadTable and nothing else in a
catalog, and that should still be Iceberg REST compliant. And I think we
need a versioning story anyway; there is no way around it.

Here is the workflow in my mind with versioning:

1. Going forward, every time the REST catalog spec introduces any new API
endpoints or backwards incompatible changes to the existing APIs, the
version of the specific API is incremented. So if the PlanTable API is
added, this API will be at version v1. If UpdateTable is updated with
a new update type, that API will be at version v2, but PlanTable will
remain at v1.

2. A catalog must implement getConfig. This API is the only one that is
required.

3. In getConfig, in the defaults map (it could be in some new metadata
structure, but since we want a strong backwards compatibility guarantee,
reusing string maps seems to be the best way), the server returns key-value
pairs of:
- key: operation:
- value: version number

4. The client assumes that the map is ordered, and resolves API versions
sequentially. For example, suppose I have the following map:

{ "operation:planTable": "1", "operation:loadTable": "2" }

Note that "supporting" an operation means returning a response in a
predictable way that is compliant with the spec. A server can also return 406
UnsupportedOperation as a way to support it.

There is also a special version *, which means any version can work.

5. Backwards compatibility: suppose the client is at a higher version than
the server, then the client should always be able to understand the
server's full list of capabilities.

6. Forward compatibility: suppose the client is at a lower version than the
server, then the client should parse whatever operation it understands, and
use the highest version it could support to execute the operation. Suppose
the client only supports loadTable v1, then it will continue to hit the GET
v1/namespaces/{ns}/tables/{table} route, instead of GET
v2/namespaces/{ns}/tables/{table}. The v1 route could continue to support
the client, or it could throw 406 to indicate that this route is deprecated
and the client needs to upgrade.

For initial backwards compatibility, I think not returning anything should
mean that all APIs the client understands have version *.
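
A minimal Python sketch of the client-side resolution described in this
workflow is below. It assumes that a server advertising version N of an
operation can still be called with any lower version the client implements,
and it treats a config without any "operation:" keys as version * for
everything. The function name `resolve_version` and its parameters are
hypothetical.

```python
PREFIX = "operation:"


def resolve_version(defaults, operation, client_versions):
    """Pick the version the client should use for an operation, or None.

    defaults: string map from the getConfig response.
    client_versions: set of versions the client implements for the operation.
    """
    ops = {k[len(PREFIX):]: v for k, v in defaults.items() if k.startswith(PREFIX)}
    if not ops:
        # Nothing advertised: treat every operation as version "*".
        return max(client_versions)
    if operation not in ops:
        return None  # the server does not advertise this operation
    advertised = ops[operation]
    if advertised == "*":
        return max(client_versions)
    # Assumption: use the highest client version not above the server's.
    usable = [v for v in client_versions if v <= int(advertised)]
    return max(usable) if usable else None


defaults = {"operation:planTable": "1", "operation:loadTable": "2"}
print(resolve_version(defaults, "loadTable", {1}))     # older client stays on v1
print(resolve_version(defaults, "planTable", {1, 2}))  # newer client drops to v1
print(resolve_version(defaults, "dropTable", {1}))     # not advertised
```

Whether the v1 route still answers or returns 406 is up to the server, as
described in point 6; the sketch only decides which route the client tries.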

What do people think of it, compared to the tag approach?

Best,
Jack Ye

On Mon, Jun 24, 2024 at 1:42 PM Micah Kornfield 
wrote:

> I don't have strong opinions either way here, just thought it was worth
> raising some concerns over possible evolution here.  Some responses inline,
> but if capabilities seem to meet the requirement at hand, then it does
> potentially seem the simplest mechanism.
>
>
> I think we also want to avoid relyance on server specific published
>> OpenAPI as they may leak other options/parameters/etc.  This may lead to
>> confusion around what the canonical spec is and make clients incompatible
>> if they're generated off of a non-standard spec document.
>
>
> Yeah, I wasn't proposing necessarily using built in functionality but a
> pre-scrubbed document.  Since there is no reference service implementation
> for REST it seems like each implementor would need to describe the best way
> of scrubbing there description.
>
>
>
>> @Micah this sounds to me as if the client would then have to parse a
>> bunch of endpoints to figure out whether it's safe to e.g. call loading a
>> view or dropping a table on the given REST server. Rather than having a
>> dedicated endpoint we're just using the */config* endpoint to provide
>> information about what a server supports.
>
>
> I was not suggesting multiple endpoints here, simply different contents
> for */config *I agree in the short term this does add complexity on the
> clients. But given that the canonical REST API clients are being developed
> into the standard library, I'm not sure how much toil this would cause in
> general. This also does not necessarily need to called up-front but could
> be called to verify existence vs a permission issue after an error was
> received.
>
> What round-trips did you have in mind here?
>
>
> All good points though, but I'm not aware of a standard way to handle this.
>
>
> IIUC, this sounds like a standard service description problem to me, the
> solution with capabilities appears to be one level abstraction on top of
> this.  Service discovery seems like it has been reimplemented a few
> different times depending on the technology [1][2][3]
>
>
> I think versioning adds another level of complexity, but might be
>> necessary since I expect these will evolve to some extent and may even
>> require hitting versioned urls.
>
>
> If there is no concrete proposal on versioning, I agree it probably pays
> to side step this.  The endpoint transitioning from list of strings to list
> of obj

Re: [DISCUSS] Describing REST Server capabilities

2024-06-24 Thread Micah Kornfield
I don't have strong opinions either way here, just thought it was worth
raising some concerns over possible evolution here.  Some responses inline,
but if capabilities seem to meet the requirement at hand, then it does
potentially seem the simplest mechanism.


> I think we also want to avoid reliance on server specific published OpenAPI
> as they may leak other options/parameters/etc.  This may lead to confusion
> around what the canonical spec is and make clients incompatible if they're
> generated off of a non-standard spec document.


Yeah, I wasn't necessarily proposing using built-in functionality but a
pre-scrubbed document.  Since there is no reference service implementation
for REST, it seems like each implementor would need to describe the best way
of scrubbing their description.



> @Micah this sounds to me as if the client would then have to parse a bunch
> of endpoints to figure out whether it's safe to e.g. call loading a view or
> dropping a table on the given REST server. Rather than having a dedicated
> endpoint we're just using the */config* endpoint to provide information
> about what a server supports.


I was not suggesting multiple endpoints here, simply different contents
for */config*. I agree that in the short term this does add complexity on the
clients. But given that the canonical REST API clients are being developed
into the standard library, I'm not sure how much toil this would cause in
general. This also does not necessarily need to be called up-front but could
be called to verify existence vs a permission issue after an error was
received.

What round-trips did you have in mind here?


> All good points though, but I'm not aware of a standard way to handle this.


IIUC, this sounds like a standard service description problem to me; the
solution with capabilities appears to be one level of abstraction on top of
this.  Service discovery seems like it has been reimplemented a few
different times depending on the technology [1][2][3].


> I think versioning adds another level of complexity, but might be necessary
> since I expect these will evolve to some extent and may even require
> hitting versioned urls.


If there is no concrete proposal on versioning, I agree it probably pays to
sidestep this.  The endpoint transitioning from a list of strings to a list of
objects would be an obvious sign to clients that they are out of date.  I
think serving a service description(s), despite its complexity, is likely
the most principled way of versioning items appropriately, but this
definitely requires more in-depth thought/design.


Thanks,
Micah

[1] https://en.wikipedia.org/wiki/Web_Services_Description_Language
[2] https://en.wikipedia.org/wiki/Web_Application_Description_Language
[3] https://developers.google.com/discovery/v1/reference/apis




On Mon, Jun 24, 2024 at 12:42 PM Daniel Weeks  wrote:

> Hey Micah,
>
> I think what we're trying to achieve is strike a balance between client
> complexity and ability to support multiple server-side capabilities.  One
> challenge we've run into is if a client performs an operation (e.g.
> listViews), but receives a 403 code, it's not clear whether the client
> doesn't have access or the server doesn't support an endpoint but isn't
> sending a 404 for security reasons.  This is a simple way for the client to
> understand what it should expect from the server.
>
> >  Another option would be just list all endpoints . . . and let clients
> take appropriate actions
> > This could be done by vending the OpenAPI spec the server supports at
> its own endpoint. I think this avoids the future problem of having to
> classify new endpoints into a specific capability.
>
> You're right that this would be the most complete way to handle this, but
> it's really complicated and may require additional "handshake" calls even
> for small interactions with the catalog service.  I think this puts a lot
> of onus on the client, when what we're describing is a set of endpoints
> that correspond to a capability.
>
> I think we also want to avoid relyance on server specific published
> OpenAPI as they may leak other options/parameters/etc.  This may lead to
> confusion around what the canonical spec is and make clients incompatible
> if they're generated off of a non-standard spec document.
>
> All good points though, but I'm not aware of a standard way to handle this.
>
> I think versioning adds another level of complexity, but might be
> necessary since I expect these will evolve to some extent and may even
> require hitting versioned urls.
>
> -Dan
>
>
>
>
> On Mon, Jun 24, 2024 at 12:03 AM Eduard Tudenhöfner <
> [email protected]> wrote:
>
>> We had a separate discussion with Dan on the *oauth2* flag last week and
>> came to the same conclusion that removing the *oauth2* capability is
>> probably the best for now.
>> This is mainly because we can't really act on the *oauth2* capability
>> right now, because the */tokens* endpoint is called before we hit the
>> */config* endp

Re: [DISCUSS] Describing REST Server capabilities

2024-06-24 Thread Daniel Weeks
Hey Micah,

I think what we're trying to do is strike a balance between client
complexity and the ability to support multiple server-side capabilities.  One
challenge we've run into is that if a client performs an operation (e.g.
listViews) but receives a 403 code, it's not clear whether the client
doesn't have access or the server doesn't support an endpoint but isn't
sending a 404 for security reasons.  Capabilities are a simple way for the
client to understand what it should expect from the server.
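
To illustrate the ambiguity, here is a minimal Python sketch, assuming the
client has already fetched a list of capability strings from */config*. The
function name `classify_error` and the returned labels are hypothetical; they
are not part of the REST spec or of any Iceberg client.

```python
from typing import Iterable


def classify_error(status: int, capability: str, advertised: Iterable[str]) -> str:
    """Interpret a 403/404 using the capabilities a server advertised.

    `capability` is the (assumed) capability name that covers the endpoint
    the client just called, e.g. "views" for listViews.
    """
    if status not in (403, 404):
        return "other"
    if capability in set(advertised):
        # The server says it supports the feature, so the failure is about
        # access (403) or a missing object (404), not a missing endpoint.
        return "permission-denied" if status == 403 else "not-found"
    # The feature was never advertised: treat either status as "unsupported".
    return "unsupported"


print(classify_error(403, "views", ["tables", "views"]))
print(classify_error(403, "views", ["tables"]))
```

Without the advertised list, both calls above would look identical to the
client, which is the problem described here.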

>  Another option would be just list all endpoints . . . and let clients
> take appropriate actions
> This could be done by vending the OpenAPI spec the server supports at its
> own endpoint. I think this avoids the future problem of having to classify
> new endpoints into a specific capability.

You're right that this would be the most complete way to handle this, but
it's really complicated and may require additional "handshake" calls even
for small interactions with the catalog service.  I think this puts a lot
of onus on the client, when what we're describing is a set of endpoints
that correspond to a capability.

I think we also want to avoid reliance on server-specific published OpenAPI
specs, as they may leak other options/parameters/etc.  This may lead to
confusion around what the canonical spec is and make clients incompatible if
they're generated off of a non-standard spec document.

All good points though, but I'm not aware of a standard way to handle this.

I think versioning adds another level of complexity, but might be necessary
since I expect these will evolve to some extent and may even require
hitting versioned urls.

-Dan




On Mon, Jun 24, 2024 at 12:03 AM Eduard Tudenhöfner <
[email protected]> wrote:

> We had a separate discussion with Dan on the *oauth2* flag last week and
> came to the same conclusion that removing the *oauth2* capability is
> probably the best for now.
> This is mainly because we can't really act on the *oauth2* capability
> right now, because the */tokens* endpoint is called before we hit the
> */config* endpoint.
>
> > Another option would be just list all endpoints (and maybe even further
> which operations are supported) the server actually supports and let
> clients take appropriate actions (i.e. grouping could happen on the client
> side).  This could be done by vending the OpenAPI spec the server supports
> at its own endpoint. I think this avoids the future problem of having to
> classify new endpoints into a specific capability.
>
> @Micah this sounds to me as if the client would then have to parse a bunch
> of endpoints to figure out whether it's safe to e.g. call loading a view or
> dropping a table on the given REST server. Rather than having a dedicated
> endpoint we're just using the */config* endpoint to provide information
> about what a server supports.
>
> Thanks
> Eduard
>
> On Fri, Jun 21, 2024 at 8:27 PM Ryan Blue 
> wrote:
>
>> Let's remove the oauth2 tag for now until we figure out how to move
>> forward there. That makes sense to me.
>>
>> On Fri, Jun 21, 2024 at 9:30 AM Dmitri Bourlatchkov
>>  wrote:
>>
>>> Hi Eduard,
>>>
>>> The capabilities PR looks good to me overall. I have a concern with the
>>> "oauth2" tag name though.
>>>
>>> I also commented [1] in GH but the comment appears to be closed by
>>> default :)
>>>
>>> I believe the term "oauth2" is confusing in this context with respect to
>>> RFC 6749 [2] as discussed in depth on another thread [3]
>>>
>>> The functionality behind the /tokens endpoint is quite specific to the
>>> Iceberg REST spec and as the other discussion highlights, there are
>>> concerns with respect to OAuth2 interoperability with other OAuth2 servers.
>>>
>>> What do you think about using a different tag name for it, for example
>>> "local-tokens" or "auth-tokens"?
>>>
>>> Thanks,
>>> Dmitri.
>>>
>>> [1]
>>> https://github.com/apache/iceberg/pull/9940/files/15c769a52b85ac4deff5659978c7ffa7802612b0#r1649173934
>>> [2] https://www.rfc-editor.org/rfc/rfc6749
>>> [3] https://lists.apache.org/thread/twk84xx7v0xy5q5tfd9x5torgr82vv50
>>>
>>> On Thu, Jun 20, 2024 at 7:28 AM Eduard Tudenhoefner <
>>> [email protected]> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> I'd like to bring up the discussion around describing REST server
>>>> capabilities via the */config* endpoint.
>>>> There is PR #9940 that
>>>> describes the OpenAPI spec changes.
>>>>
>>>> Mainly we'd like to have a *capabilities* field in the *ConfigResponse*
>>>> that allows servers to indicate to clients which capabilities are being
>>>> supported.
>>>>
>>>> So far we have the following capabilities:
>>>>
>>>>- tables
>>>>- views
>>>>- remote-signing
>>>>- vended-credentials
>>>>- multi-table-commit
>>>>- register-table
>>>>- table-metrics
>>>>- oauth2
>>>>
>>>>
>>>> The general idea behind a capability is that if e.g. a server supports
>>>> *views*, then that server must implement all endp

Re: [DISCUSS] Describing REST Server capabilities

2024-06-24 Thread Eduard Tudenhöfner
We had a separate discussion with Dan on the *oauth2* flag last week and
came to the same conclusion that removing the *oauth2* capability is
probably the best for now.
This is mainly because we can't really act on the *oauth2* capability right
now, because the */tokens* endpoint is called before we hit the */config*
endpoint.

> Another option would be just list all endpoints (and maybe even further
> which operations are supported) the server actually supports and let
> clients take appropriate actions (i.e. grouping could happen on the client
> side).  This could be done by vending the OpenAPI spec the server supports
> at its own endpoint. I think this avoids the future problem of having to
> classify new endpoints into a specific capability.

@Micah this sounds to me as if the client would then have to parse a bunch
of endpoints to figure out whether it's safe to e.g. call loading a view or
dropping a table on the given REST server. Rather than having a dedicated
endpoint we're just using the */config* endpoint to provide information
about what a server supports.

Thanks
Eduard

On Fri, Jun 21, 2024 at 8:27 PM Ryan Blue 
wrote:

> Let's remove the oauth2 tag for now until we figure out how to move
> forward there. That makes sense to me.
>
> On Fri, Jun 21, 2024 at 9:30 AM Dmitri Bourlatchkov
>  wrote:
>
>> Hi Eduard,
>>
>> The capabilities PR looks good to me overall. I have a concern with the
>> "oauth2" tag name though.
>>
>> I also commented [1] in GH but the comment appears to be closed by
>> default :)
>>
>> I believe the term "oauth2" is confusing in this context with respect to
>> RFC 6749 [2] as discussed in depth on another thread [3]
>>
>> The functionality behind the /tokens endpoint is quite specific to the
>> Iceberg REST spec and as the other discussion highlights, there are
>> concerns with respect to OAuth2 interoperability with other OAuth2 servers.
>>
>> What do you think about using a different tag name for it, for example
>> "local-tokens" or "auth-tokens"?
>>
>> Thanks,
>> Dmitri.
>>
>> [1]
>> https://github.com/apache/iceberg/pull/9940/files/15c769a52b85ac4deff5659978c7ffa7802612b0#r1649173934
>> [2] https://www.rfc-editor.org/rfc/rfc6749
>> [3] https://lists.apache.org/thread/twk84xx7v0xy5q5tfd9x5torgr82vv50
>>
>> On Thu, Jun 20, 2024 at 7:28 AM Eduard Tudenhoefner <
>> [email protected]> wrote:
>>
>>> Hey everyone,
>>>
>>> I'd like to bring up the discussion around describing REST server
>>> capabilities via the */config* endpoint.
>>> There is PR #9940  that
>>> describes the OpenAPI spec changes.
>>>
>>> Mainly we'd like to have a *capabilities* field in the *ConfigResponse* that
>>> allows servers to indicate to clients which capabilities are being
>>> supported.
>>>
>>> So far we have the following capabilities:
>>>
>>>- tables
>>>- views
>>>- remote-signing
>>>- vended-credentials
>>>- multi-table-commit
>>>- register-table
>>>- table-metrics
>>>- oauth2
>>>
>>>
>>> The general idea behind a capability is that if e.g. a server supports
>>> *views*, then that server must implement all endpoints grouped under
>>> that capability.
>>> It's worth noting that the */config* endpoint is currently being
>>> implicit (meaning that every REST server would have to implement it).
>>>
>>> One discussion point that came up during review is how we want to handle
>>> capabilities and backwards compatibility and what the default capability
>>> would be, since older servers don't know anything about *capabilities* (in
>>> such a case we could assume that the default capabilities would be
>>> *oauth2* / *tables*).
>>>
>>> Are there any other capabilities that we'd like to include in the list?
>>>
>>> Eduard
>>>
>>
>
> --
> Ryan Blue
> Databricks
>


Re: [DISCUSS] Describing REST Server capabilities

2024-06-21 Thread Ryan Blue
Let's remove the oauth2 tag for now until we figure out how to move forward
there. That makes sense to me.

On Fri, Jun 21, 2024 at 9:30 AM Dmitri Bourlatchkov
 wrote:

> Hi Eduard,
>
> The capabilities PR looks good to me overall. I have a concern with the
> "oauth2" tag name though.
>
> I also commented [1] in GH but the comment appears to be closed by default
> :)
>
> I believe the term "oauth2" is confusing in this context with respect to
> RFC 6749 [2] as discussed in depth on another thread [3]
>
> The functionality behind the /tokens endpoint is quite specific to the
> Iceberg REST spec and as the other discussion highlights, there are
> concerns with respect to OAuth2 interoperability with other OAuth2 servers.
>
> What do you think about using a different tag name for it, for example
> "local-tokens" or "auth-tokens"?
>
> Thanks,
> Dmitri.
>
> [1]
> https://github.com/apache/iceberg/pull/9940/files/15c769a52b85ac4deff5659978c7ffa7802612b0#r1649173934
> [2] https://www.rfc-editor.org/rfc/rfc6749
> [3] https://lists.apache.org/thread/twk84xx7v0xy5q5tfd9x5torgr82vv50
>
> On Thu, Jun 20, 2024 at 7:28 AM Eduard Tudenhoefner <
> [email protected]> wrote:
>
>> Hey everyone,
>>
>> I'd like to bring up the discussion around describing REST server
>> capabilities via the */config* endpoint.
>> There is PR #9940  that
>> describes the OpenAPI spec changes.
>>
>> Mainly we'd like to have a *capabilities* field in the *ConfigResponse* that
>> allows servers to indicate to clients which capabilities are being
>> supported.
>>
>> So far we have the following capabilities:
>>
>>- tables
>>- views
>>- remote-signing
>>- vended-credentials
>>- multi-table-commit
>>- register-table
>>- table-metrics
>>- oauth2
>>
>>
>> The general idea behind a capability is that if e.g. a server supports
>> *views*, then that server must implement all endpoints grouped under
>> that capability.
>> It's worth noting that the */config* endpoint is currently being
>> implicit (meaning that every REST server would have to implement it).
>>
>> One discussion point that came up during review is how we want to handle
>> capabilities and backwards compatibility and what the default capability
>> would be, since older servers don't know anything about *capabilities* (in
>> such a case we could assume that the default capabilities would be
>> *oauth2* / *tables*).
>>
>> Are there any other capabilities that we'd like to include in the list?
>>
>> Eduard
>>
>

-- 
Ryan Blue
Databricks


Re: [DISCUSS] Describing REST Server capabilities

2024-06-21 Thread Dmitri Bourlatchkov
Hi Eduard,

The capabilities PR looks good to me overall. I have a concern with the
"oauth2" tag name though.

I also commented [1] in GH but the comment appears to be closed by default
:)

I believe the term "oauth2" is confusing in this context with respect to
RFC 6749 [2] as discussed in depth on another thread [3]

The functionality behind the /tokens endpoint is quite specific to the
Iceberg REST spec and as the other discussion highlights, there are
concerns with respect to OAuth2 interoperability with other OAuth2 servers.

What do you think about using a different tag name for it, for example
"local-tokens" or "auth-tokens"?

Thanks,
Dmitri.

[1]
https://github.com/apache/iceberg/pull/9940/files/15c769a52b85ac4deff5659978c7ffa7802612b0#r1649173934
[2] https://www.rfc-editor.org/rfc/rfc6749
[3] https://lists.apache.org/thread/twk84xx7v0xy5q5tfd9x5torgr82vv50

On Thu, Jun 20, 2024 at 7:28 AM Eduard Tudenhoefner <
[email protected]> wrote:

> Hey everyone,
>
> I'd like to bring up the discussion around describing REST server
> capabilities via the */config* endpoint.
> There is PR #9940  that
> describes the OpenAPI spec changes.
>
> Mainly we'd like to have a *capabilities* field in the *ConfigResponse* that
> allows servers to indicate to clients which capabilities are being
> supported.
>
> So far we have the following capabilities:
>
>- tables
>- views
>- remote-signing
>- vended-credentials
>- multi-table-commit
>- register-table
>- table-metrics
>- oauth2
>
>
> The general idea behind a capability is that if e.g. a server supports
> *views*, then that server must implement all endpoints grouped under that
> capability.
> It's worth noting that the */config* endpoint is currently being implicit
> (meaning that every REST server would have to implement it).
>
> One discussion point that came up during review is how we want to handle
> capabilities and backwards compatibility and what the default capability
> would be, since older servers don't know anything about *capabilities* (in
> such a case we could assume that the default capabilities would be
> *oauth2* / *tables*).
>
> Are there any other capabilities that we'd like to include in the list?
>
> Eduard
>


Re: [DISCUSS] Describing REST Server capabilities

2024-06-21 Thread Jean-Baptiste Onofré
Hi Ryan,

I think I wasn't clear (sorry about that): by
"catalog-level-versioning" capability, I don't mean to actually define
any specific version; it's more to indicate to the client how the
catalog behaves in terms of versioning (per table/view or global to the
catalog). It's a "pure" capability informing the client about catalog
behavior.

Regards
JB

On Fri, Jun 21, 2024 at 1:44 AM Ryan Blue  wrote:
>
> I think the capabilities proposal is intended to let people build in a 
> different way than a versioning system would. It's probably valuable to think 
> through the differences between the approaches.
>
> The capabilities that are proposed let catalogs declare sets of features that 
> are supported, like support for tables, views, server-side planning, etc. 
> While you can think of that as a sort of versioning, the intent is to be more 
> flexible. A catalog implementation might not choose to implement server-side 
> planning because it doesn't have access to the underlying metadata files. For 
> example see #10089 for a use case where object store permissions aren't what 
> you might expect. Capabilities allow the service to tell the client what is 
> supported for graceful fallback, not just backward compatibility.
>
> Versioning the API is a different approach because to support the latest 
> version, an implementation would need to support everything. If we had tables 
> in v1, views in v2, and server-side planning in v3, then to support 
> server-side planning a catalog would also need to support views. That's not 
> necessarily a bad thing since it creates strong compatibility requirements; 
> that's what we chose to do for the table format.
>
> I think the question to help us choose between the two options is whether we 
> expect catalogs to all support the same set of features over time, or if we 
> expect some differences. I have a weakly-held opinion that we expect catalogs 
> not to support the same features, but it is very likely based on the 
> assumption that catalogs have significant differences because of limited 
> back-ends (like Hive). That may not be correct since there are quite a few 
> new catalog implementations using the protocol. Perhaps we should consider 
> stronger requirements for what needs to be provided through versioning.
>
> Ryan
>
> On Thu, Jun 20, 2024 at 7:15 AM Jean-Baptiste Onofré  
> wrote:
>>
>> Hi Eduard,
>>
>> That makes sense. Thanks.
>>
>> Maybe we can already anticipate a little and add a "catalog-level
>> versioning" capability as it's a feature supported by Nessie catalog
>> for instance ?
>> We can also imagine a more generic capability like "version scope".
>>
>> Regards
>> JB
>>
>> On Thu, Jun 20, 2024 at 3:47 PM Eduard Tudenhoefner
>>  wrote:
>> >
>> > Hey JB,
>> >
>> > If adding UDFs would require adding new endpoints, then you'd also add a 
>> > udf capability when adding UDF support to the REST catalog.
>> > That way a client knows whether it's safe to call the UDF endpoints on a 
>> > given server.
>> >
>> > Eduard
>> >
>> > On Thu, Jun 20, 2024 at 1:59 PM Jean-Baptiste Onofré  
>> > wrote:
>> >>
>> >> Hi Eduard
>> >>
>> >> It looks good to me. I have a question however :)
>> >>
>> >> Later, Imagine, we add UDF support in Iceberg. Does it mean that you
>> >> will need to update REST Spec (ConfigResponse/capabilities) to add
>> >> this capability ?
>> >> For consistency, I think it makes sense as I don't think we often add
>> >> new capability. And also as every REST server would have to implement
>> >> it, /config is generic enough to add custom/new capabilities (but the
>> >> client will have to deal with capability).
>> >>
>> >> Am I right?
>> >>
>> >> Thanks !
>> >> Regards
>> >> JB
>> >>
>> >> On Thu, Jun 20, 2024 at 1:28 PM Eduard Tudenhoefner
>> >>  wrote:
>> >> >
>> >> > Hey everyone,
>> >> >
>> >> > I'd like to bring up the discussion around describing REST server 
>> >> > capabilities via the /config endpoint.
>> >> > There is PR #9940 that describes the OpenAPI spec changes.
>> >> >
>> >> > Mainly we'd like to have a capabilities field in the ConfigResponse 
>> >> > that allows servers to indicate to clients which capabilities are being 
>> >> > supported.
>> >> >
>> >> > So far we have the following capabilities:
>> >> >
>> >> > tables
>> >> > views
>> >> > remote-signing
>> >> > vended-credentials
>> >> > multi-table-commit
>> >> > register-table
>> >> > table-metrics
>> >> > oauth2
>> >> >
>> >> >
>> >> > The general idea behind a capability is that if e.g. a server supports 
>> >> > views, then that server must implement all endpoints grouped under that 
>> >> > capability.
>> >> > It's worth noting that the /config endpoint is currently being implicit 
>> >> > (meaning that every REST server would have to implement it).
>> >> >
>> >> > One discussion point that came up during review is how we want to 
>> >> > handle capabilities and backwards compatibility and what the default 
>> >> > capability would be, since older servers don'

Re: [DISCUSS] Describing REST Server capabilities

2024-06-20 Thread Micah Kornfield
>
> The general idea behind a capability is that if e.g. a server supports
> *views*, then that server must implement all endpoints grouped under that
> capability.


I haven't thought deeply about this, but is there a reason to be
prescriptive about this by grouping endpoints in capabilities?  Another
option would be to just list all endpoints (and maybe even which
operations are supported) that the server actually supports and let clients
take appropriate actions (i.e. grouping could happen on the client side).
This could be done by vending the OpenAPI spec the server supports at its own
endpoint. I think this avoids the future problem of having to classify new
endpoints into a specific capability.

On the versioning aspects, I'm not sure if this is what J.B. meant, but
another way to model this could be as a list of objects where each object
is {"capability": "version (or other metadata relevant to the capability)"}.
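
A minimal Python sketch of that shape, assuming a client wants to accept both
a plain list of strings and a list of single-key objects. The function name
and the sample payload are hypothetical; neither form is defined by the REST
spec in this thread.

```python
from typing import Any, Dict, List, Optional


def normalize_capabilities(raw: List[Any]) -> Dict[str, Optional[Any]]:
    """Map each capability name to its metadata (None when none is given)."""
    result: Dict[str, Optional[Any]] = {}
    for entry in raw:
        if isinstance(entry, str):
            # Plain string form: the capability carries no metadata.
            result[entry] = None
        elif isinstance(entry, dict):
            # Object form: {"capability-name": version-or-other-metadata}.
            for name, metadata in entry.items():
                result[name] = metadata
        else:
            raise ValueError(f"unsupported capability entry: {entry!r}")
    return result


# Hypothetical payload mixing both forms.
capabilities = normalize_capabilities(["tables", {"views": "1"}])
print(capabilities)
```

A client written this way keeps working if the field later moves from strings
to objects, at the cost of having to decide what unknown metadata means.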

Thanks,
Micah

On Thu, Jun 20, 2024 at 4:45 PM Ryan Blue 
wrote:

> I think the capabilities proposal is intended to let people build in a
> different way than a versioning system would. It's probably valuable to
> think through the differences between the approaches.
>
> The capabilities that are proposed let catalogs declare sets of features
> that are supported, like support for tables, views, server-side planning,
> etc. While you can think of that as a sort of versioning, the intent is to
> be more flexible. A catalog implementation might not choose to implement
> server-side planning because it doesn't have access to the underlying
> metadata files. For example see #10089
>  for a use case where
> object store permissions aren't what you might expect. Capabilities allow
> the service to tell the client what is supported for graceful fallback, not
> just backward compatibility.
>
> Versioning the API is a different approach because to support the latest
> version, an implementation would need to support everything. If we had
> tables in v1, views in v2, and server-side planning in v3, then to support
> server-side planning a catalog would also need to support views. That's not
> necessarily a bad thing since it creates strong compatibility requirements;
> that's what we chose to do for the table format.
>
> I think the question to help us choose between the two options is whether
> we expect catalogs to all support the same set of features over time, or if
> we expect some differences. I have a weakly-held opinion that we expect
> catalogs not to support the same features, but it is very likely based on
> the assumption that catalogs have significant differences because of
> limited back-ends (like Hive). That may not be correct since there are
> quite a few new catalog implementations using the protocol. Perhaps we
> should consider stronger requirements for what needs to be provided through
> versioning.
>
> Ryan
>
> On Thu, Jun 20, 2024 at 7:15 AM Jean-Baptiste Onofré 
> wrote:
>
>> Hi Eduard,
>>
>> That makes sense. Thanks.
>>
>> Maybe we can already anticipate a little and add a "catalog-level
>> versioning" capability as it's a feature supported by Nessie catalog
>> for instance ?
>> We can also imagine a more generic capability like "version scope".
>>
>> Regards
>> JB
>>
>> On Thu, Jun 20, 2024 at 3:47 PM Eduard Tudenhoefner
>>  wrote:
>> >
>> > Hey JB,
>> >
>> > If adding UDFs would require adding new endpoints, then you'd also add
>> a udf capability when adding UDF support to the REST catalog.
>> > That way a client knows whether it's safe to call the UDF endpoints on
>> a given server.
>> >
>> > Eduard
>> >
>> > On Thu, Jun 20, 2024 at 1:59 PM Jean-Baptiste Onofré 
>> wrote:
>> >>
>> >> Hi Eduard
>> >>
>> >> It looks good to me. I have a question however :)
>> >>
>> >> Later, Imagine, we add UDF support in Iceberg. Does it mean that you
>> >> will need to update REST Spec (ConfigResponse/capabilities) to add
>> >> this capability ?
>> >> For consistency, I think it makes sense as I don't think we often add
>> >> new capability. And also as every REST server would have to implement
>> >> it, /config is generic enough to add custom/new capabilities (but the
>> >> client will have to deal with capability).
>> >>
>> >> Am I right?
>> >>
>> >> Thanks !
>> >> Regards
>> >> JB
>> >>
>> >> On Thu, Jun 20, 2024 at 1:28 PM Eduard Tudenhoefner
>> >>  wrote:
>> >> >
>> >> > Hey everyone,
>> >> >
>> >> > I'd like to bring up the discussion around describing REST server
>> capabilities via the /config endpoint.
>> >> > There is PR #9940 that describes the OpenAPI spec changes.
>> >> >
>> >> > Mainly we'd like to have a capabilities field in the ConfigResponse
>> that allows servers to indicate to clients which capabilities are being
>> supported.
>> >> >
>> >> > So far we have the following capabilities:
>> >> >
>> >> > tables
>> >> > views
>> >> > remote-signing
>> >> > vended-credentials
>> >> > multi-table-commit
>> >> > register-table
>

Re: [DISCUSS] Describing REST Server capabilities

2024-06-20 Thread Ryan Blue
I think the capabilities proposal is intended to let people build in a
different way than a versioning system would. It's probably valuable to
think through the differences between the approaches.

The capabilities that are proposed let catalogs declare sets of features
that are supported, like support for tables, views, server-side planning,
etc. While you can think of that as a sort of versioning, the intent is to
be more flexible. A catalog implementation might choose not to implement
server-side planning because it doesn't have access to the underlying
metadata files. For example, see #10089 for a use case where
object store permissions aren't what you might expect. Capabilities allow
the service to tell the client what is supported for graceful fallback, not
just backward compatibility.

Versioning the API is a different approach because to support the latest
version, an implementation would need to support everything. If we had
tables in v1, views in v2, and server-side planning in v3, then to support
server-side planning a catalog would also need to support views. That's not
necessarily a bad thing since it creates strong compatibility requirements;
that's what we chose to do for the table format.
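
The difference between the two models can be sketched in Python as follows.
This is only an illustration under stated assumptions: the version-to-feature
mapping is the hypothetical v1/v2/v3 example from the paragraph above, and the
identifier "server-side-planning" is made up for the sketch (it is not one of
the capabilities listed in the proposal).

```python
from typing import Dict, Set

# Hypothetical mapping taken from the v1/v2/v3 example in the text.
VERSION_FEATURES: Dict[int, Set[str]] = {
    1: {"tables"},
    2: {"views"},
    3: {"server-side-planning"},
}


def features_for_version(version: int) -> Set[str]:
    """Versioning model: supporting version N implies every feature in 1..N."""
    features: Set[str] = set()
    for v in range(1, version + 1):
        features |= VERSION_FEATURES[v]
    return features


def supports(capabilities: Set[str], feature: str) -> bool:
    """Capabilities model: each feature is declared independently."""
    return feature in capabilities


# A v3 server must also implement views.
print("views" in features_for_version(3))
# A capabilities-based server may offer planning without views.
print(supports({"tables", "server-side-planning"}, "views"))
```

Under versioning the first check is always true for a v3 server, while under
capabilities the second check can be false, which is the flexibility (and the
weaker compatibility guarantee) being weighed here.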

I think the question to help us choose between the two options is whether
we expect catalogs to all support the same set of features over time, or if
we expect some differences. I have a weakly-held opinion that we expect
catalogs not to support the same features, but it is very likely based on
the assumption that catalogs have significant differences because of
limited back-ends (like Hive). That may not be correct since there are
quite a few new catalog implementations using the protocol. Perhaps we
should consider stronger requirements for what needs to be provided through
versioning.

Ryan

On Thu, Jun 20, 2024 at 7:15 AM Jean-Baptiste Onofré 
wrote:

> Hi Eduard,
>
> That makes sense. Thanks.
>
> Maybe we can already anticipate a little and add a "catalog-level
> versioning" capability as it's a feature supported by Nessie catalog
> for instance ?
> We can also imagine a more generic capability like "version scope".
>
> Regards
> JB
>
> On Thu, Jun 20, 2024 at 3:47 PM Eduard Tudenhoefner
>  wrote:
> >
> > Hey JB,
> >
> > If adding UDFs would require adding new endpoints, then you'd also add a
> udf capability when adding UDF support to the REST catalog.
> > That way a client knows whether it's safe to call the UDF endpoints on a
> given server.
> >
> > Eduard
> >
> > On Thu, Jun 20, 2024 at 1:59 PM Jean-Baptiste Onofré 
> wrote:
> >>
> >> Hi Eduard
> >>
> >> It looks good to me. I have a question however :)
> >>
> >> Later, Imagine, we add UDF support in Iceberg. Does it mean that you
> >> will need to update REST Spec (ConfigResponse/capabilities) to add
> >> this capability ?
> >> For consistency, I think it makes sense as I don't think we often add
> >> new capability. And also as every REST server would have to implement
> >> it, /config is generic enough to add custom/new capabilities (but the
> >> client will have to deal with capability).
> >>
> >> Am I right?
> >>
> >> Thanks !
> >> Regards
> >> JB
> >>
> >> On Thu, Jun 20, 2024 at 1:28 PM Eduard Tudenhoefner
> >>  wrote:
> >> >
> >> > Hey everyone,
> >> >
> >> > I'd like to bring up the discussion around describing REST server
> capabilities via the /config endpoint.
> >> > There is PR #9940 that describes the OpenAPI spec changes.
> >> >
> >> > Mainly we'd like to have a capabilities field in the ConfigResponse
> that allows servers to indicate to clients which capabilities are being
> supported.
> >> >
> >> > So far we have the following capabilities:
> >> >
> >> > tables
> >> > views
> >> > remote-signing
> >> > vended-credentials
> >> > multi-table-commit
> >> > register-table
> >> > table-metrics
> >> > oauth2
> >> >
> >> >
> >> > The general idea behind a capability is that if e.g. a server
> supports views, then that server must implement all endpoints grouped under
> that capability.
> >> > It's worth noting that the /config endpoint is currently
> implicit (meaning that every REST server would have to implement it).
> >> >
> >> > One discussion point that came up during review is how we want to
> handle capabilities and backwards compatibility and what the default
> capability would be, since older servers don't know anything about
> capabilities (in such a case we could assume that the default capabilities
> would be oauth2 / tables).
> >> >
> >> > Are there any other capabilities that we'd like to include in the
> list?
> >> >
> >> > Eduard
>


-- 
Ryan Blue
Databricks


Re: [DISCUSS] Describing REST Server capabilities

2024-06-20 Thread Jean-Baptiste Onofré
Hi Eduard,

That makes sense. Thanks.

Maybe we can already anticipate a little and add a "catalog-level
versioning" capability, since it's a feature supported by the Nessie
catalog, for instance?
We can also imagine a more generic capability like "version scope".

Regards
JB

On Thu, Jun 20, 2024 at 3:47 PM Eduard Tudenhoefner
 wrote:
>
> Hey JB,
>
> If adding UDFs would require adding new endpoints, then you'd also add a udf 
> capability when adding UDF support to the REST catalog.
> That way a client knows whether it's safe to call the UDF endpoints on a 
> given server.
>
> Eduard
>
> On Thu, Jun 20, 2024 at 1:59 PM Jean-Baptiste Onofré  
> wrote:
>>
>> Hi Eduard
>>
>> It looks good to me. I have a question however :)
>>
>> Later, imagine we add UDF support in Iceberg. Does that mean that you
>> will need to update the REST spec (ConfigResponse/capabilities) to add
>> this capability?
>> For consistency, I think it makes sense, as I don't think we often add
>> new capabilities. And also, as every REST server would have to implement
>> it, /config is generic enough to add custom/new capabilities (but the
>> client will have to deal with each capability).
>>
>> Am I right?
>>
>> Thanks !
>> Regards
>> JB
>>
>> On Thu, Jun 20, 2024 at 1:28 PM Eduard Tudenhoefner
>>  wrote:
>> >
>> > Hey everyone,
>> >
>> > I'd like to bring up the discussion around describing REST server 
>> > capabilities via the /config endpoint.
>> > There is PR #9940 that describes the OpenAPI spec changes.
>> >
>> > Mainly we'd like to have a capabilities field in the ConfigResponse that 
>> > allows servers to indicate to clients which capabilities are being 
>> > supported.
>> >
>> > So far we have the following capabilities:
>> >
>> > tables
>> > views
>> > remote-signing
>> > vended-credentials
>> > multi-table-commit
>> > register-table
>> > table-metrics
>> > oauth2
>> >
>> >
>> > The general idea behind a capability is that if e.g. a server supports 
>> > views, then that server must implement all endpoints grouped under that 
>> > capability.
>> > It's worth noting that the /config endpoint is currently implicit
>> > (meaning that every REST server would have to implement it).
>> >
>> > One discussion point that came up during review is how we want to handle 
>> > capabilities and backwards compatibility and what the default capability 
>> > would be, since older servers don't know anything about capabilities (in 
>> > such a case we could assume that the default capabilities would be oauth2 
>> > / tables).
>> >
>> > Are there any other capabilities that we'd like to include in the list?
>> >
>> > Eduard


Re: [DISCUSS] Describing REST Server capabilities

2024-06-20 Thread Eduard Tudenhoefner
Hey JB,

If adding UDFs would require adding new endpoints, then you'd also add a *udf*
capability when adding UDF support to the REST catalog.
That way a client knows whether it's safe to call the UDF endpoints on a
given server.
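
A rough sketch of what that client-side check could look like. The
"capabilities" field name comes from the proposal in this thread; the
list-of-strings shape and the helper name `supports` are only assumptions for
illustration, not what PR #9940 defines.

```python
def supports(config_response: dict, capability: str) -> bool:
    """Return True if the /config response advertises the given capability."""
    # Assumption: capabilities arrive as a list of strings under "capabilities".
    return capability in config_response.get("capabilities", [])


# Hypothetical /config payload from a server that has added UDF support.
config = {"capabilities": ["tables", "views", "udf"]}

if supports(config, "udf"):
    print("UDF endpoints may be called on this server")
else:
    print("skip UDF endpoints on this server")
```

A client would run this once after fetching /config and skip any endpoint
group the server does not advertise.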

Eduard

On Thu, Jun 20, 2024 at 1:59 PM Jean-Baptiste Onofré 
wrote:

> Hi Eduard
>
> It looks good to me. I have a question however :)
>
> Later, imagine we add UDF support in Iceberg. Does that mean that you
> will need to update the REST spec (ConfigResponse/capabilities) to add
> this capability?
> For consistency, I think it makes sense, as I don't think we often add
> new capabilities. And also, as every REST server would have to implement
> it, /config is generic enough to add custom/new capabilities (but the
> client will have to deal with each capability).
>
> Am I right?
>
> Thanks !
> Regards
> JB
>
> On Thu, Jun 20, 2024 at 1:28 PM Eduard Tudenhoefner
>  wrote:
> >
> > Hey everyone,
> >
> > I'd like to bring up the discussion around describing REST server
> capabilities via the /config endpoint.
> > There is PR #9940 that describes the OpenAPI spec changes.
> >
> > Mainly we'd like to have a capabilities field in the ConfigResponse that
> allows servers to indicate to clients which capabilities are being
> supported.
> >
> > So far we have the following capabilities:
> >
> > tables
> > views
> > remote-signing
> > vended-credentials
> > multi-table-commit
> > register-table
> > table-metrics
> > oauth2
> >
> >
> > The general idea behind a capability is that if e.g. a server supports
> views, then that server must implement all endpoints grouped under that
> capability.
> > It's worth noting that the /config endpoint is currently implicit
> (meaning that every REST server would have to implement it).
> >
> > One discussion point that came up during review is how we want to handle
> capabilities and backwards compatibility and what the default capability
> would be, since older servers don't know anything about capabilities (in
> such a case we could assume that the default capabilities would be oauth2 /
> tables).
> >
> > Are there any other capabilities that we'd like to include in the list?
> >
> > Eduard
>


Re: [DISCUSS] Describing REST Server capabilities

2024-06-20 Thread Jean-Baptiste Onofré
Hi Eduard

It looks good to me. I have a question however :)

Later, imagine we add UDF support in Iceberg. Does that mean that you
will need to update the REST spec (ConfigResponse/capabilities) to add
this capability?
For consistency, I think it makes sense, as I don't think we often add
new capabilities. And also, as every REST server would have to implement
it, /config is generic enough to add custom/new capabilities (but the
client will have to deal with each capability).

Am I right?

Thanks !
Regards
JB

On Thu, Jun 20, 2024 at 1:28 PM Eduard Tudenhoefner
 wrote:
>
> Hey everyone,
>
> I'd like to bring up the discussion around describing REST server 
> capabilities via the /config endpoint.
> There is PR #9940 that describes the OpenAPI spec changes.
>
> Mainly we'd like to have a capabilities field in the ConfigResponse that 
> allows servers to indicate to clients which capabilities are being supported.
>
> So far we have the following capabilities:
>
> tables
> views
> remote-signing
> vended-credentials
> multi-table-commit
> register-table
> table-metrics
> oauth2
>
>
> The general idea behind a capability is that if e.g. a server supports views, 
> then that server must implement all endpoints grouped under that capability.
> It's worth noting that the /config endpoint is currently implicit
> (meaning that every REST server would have to implement it).
>
> One discussion point that came up during review is how we want to handle 
> capabilities and backwards compatibility and what the default capability 
> would be, since older servers don't know anything about capabilities (in such 
> a case we could assume that the default capabilities would be oauth2 / 
> tables).
>
> Are there any other capabilities that we'd like to include in the list?
>
> Eduard
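
To make the backwards-compatibility question concrete, here is a minimal
sketch of how a client might fall back to defaults when an older server
returns no capabilities at all. The defaults (oauth2 / tables) are the ones
suggested in the quoted message; the field shape and the function name
`effective_capabilities` are assumptions for illustration only.

```python
# Defaults suggested in the thread for servers that predate capabilities.
DEFAULT_CAPABILITIES = frozenset({"oauth2", "tables"})


def effective_capabilities(config_response: dict) -> frozenset:
    """Resolve the capabilities a client should assume for a server."""
    caps = config_response.get("capabilities")
    if caps is None:
        # Older server: the field is absent, so assume the defaults.
        return DEFAULT_CAPABILITIES
    # Newer server: trust exactly what it declares, even an empty list.
    return frozenset(caps)


print(sorted(effective_capabilities({})))
print(sorted(effective_capabilities({"capabilities": ["tables", "views"]})))
```

Note the distinction between a missing field (older server, use defaults) and
an explicitly empty list (server declares nothing); whether the spec should
treat those two cases differently is still an open point.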