Re: Inconsistency between REST spec and table/view spec

2024-03-01 Thread Dmitri Bourlatchkov
I'd like to make a slightly different point regarding metadata files.

Currently the table spec does require that metadata be stored in a "file",
however there is no way to discover that file outside of a Catalog. In
other words, a client that is operating purely at the file level has no way
of determining what is the "current" metadata file and whether some
metadata file that is visible represents a valid (committed) state of the
table.
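
To make that concrete, here is a rough sketch of what a file-level client can and cannot do. It assumes the `<version>-<uuid>.metadata.json` naming that some implementations use; the naming pattern and the `candidate_metadata_files` helper are assumptions for illustration, not something the table spec mandates. Sorting by the version prefix only yields candidates; nothing in the listing says which file, if any, was committed.

```python
import re

# Assumed naming pattern: "<zero-padded version>-<uuid>.metadata.json".
_METADATA_NAME = re.compile(r"^(\d+)-[0-9a-fA-F-]+\.metadata\.json$")


def candidate_metadata_files(names):
    """Return metadata file names ordered by their numeric version prefix.

    The result is only a list of candidates: a file may have been written
    by a commit that later failed, so the last entry is a guess, not the
    committed state of the table.
    """
    matched = []
    for name in names:
        m = _METADATA_NAME.match(name)
        if m:
            matched.append((int(m.group(1)), name))
    return [name for _, name in sorted(matched)]


listing = [
    "00002-bbbb.metadata.json",
    "snap-123.avro",
    "00001-aaaa.metadata.json",
]
print(candidate_metadata_files(listing))
```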

The REST Catalog API returns actual table metadata JSON, not a file
location of it.

I think / agree it would be worth detaching Iceberg table format concepts
from the storage details. Perhaps we could consider the concept of
exporting a Catalog to a set of files, which would then specify how files
are laid out (cross-referenced). At runtime the Catalog may choose to
export every change, or do it periodically, or on user request, etc.

Cheers,
Dmitri.

On Thu, Feb 29, 2024 at 6:39 PM Jack Ye  wrote:

> > For example, I cannot validate the atomic behaviors Glue claims, but I
> wouldn't assert that it is non-compliant because of that.
>
> I think these are not comparable claims because the API scope is
> completely different, but I don't think it's worth arguing in depth. Let's
> try to see if we can have some consensus.
>
> Based on what you said above, do you agree with the following 3 points?
>
> 1. Today, a table/view in any catalog including a REST spec-compatible
> catalog is an Iceberg table/view if and only if it points to a JSON
> metadata file in storage. This concept is a part of the Iceberg table/view
> spec. There is a debate to be had for if we want to remove this requirement
> or not. The argument for it (as Yufei said) is to use other storage for
> better performance. The argument against it (as Amogh said) is to keep
> Iceberg open source friendly through the JSON format.
>
> 2. Today, a table/view in any catalog including a REST spec-compatible
> catalog is an Iceberg table/view if and only if it behind the scene
> performs the atomic metadata file swap for every commit. This concept is a
> part of the Iceberg table/view spec. We should consider removing this
> requirement in the Iceberg table/view spec.
>
> 3. A table/view in an Iceberg REST spec-compatible catalog may or may not
> be an Iceberg table/view. The REST spec does not enforce this, and this
> stance will remain true going forward. For example, it could use the
> Iceberg table/view metadata structure but does not store the metadata in
> JSON file, or not use the metadata file swap commit procedure, or both, and
> in those cases it is not an Iceberg table/view. More extremely, it might be
> a totally different kind of table that is only surfaced through the REST
> models.
>
> -Jack
>
> On Thu, Feb 29, 2024 at 2:13 PM Daniel Weeks 
> wrote:
>
>> > In that case are tables in a REST-compliant catalog still an Iceberg
>> table? I don't think so, because it is a table that only partially follows
>> the Iceberg table spec.
>>
>> If the catalog is REST compliant and complies with the Iceberg spec, they
>> are still Iceberg tables.  I can see there is an argument that if the
>> catalog is REST compliant but does not follow the commit requirements (or
>> aspects of the Iceberg spec), that you cannot call those Iceberg tables.
>> But the assertion that Iceberg tables in a REST catalog are de facto
>> non-compliant is incorrect.
>>
>> > I like the idea about validation for format compliance. But don't think
>> you can technically validate this. You can validate the static table to see
>> if it has all the Iceberg metadata components, but you can not validate the
>> internal behavior of the service during a commit to see if it really
>> atomically swapped a metadata file.
>>
>> Just because you cannot see/validate the implementation doesn't mean that
>> it is non-compliant.  For example, I cannot validate the atomic behaviors
>> Glue claims, but I wouldn't assert that it is non-compliant because of that.
>>
>> I do think there is a discussion to be had about if/when we might adjust
>> the storage/swap requirements, but to reinforce Amogh's point, removing
>> those requirements would impact the openness and accessibility of Iceberg,
>> which I feel would hamper adoption.
>>
>> -Dan
>>
>>
>>
>> On Thu, Feb 29, 2024 at 1:53 PM Yufei Gu  wrote:
>>
>>> We've periodically discussed removing the storage requirement and I
>>> think there's a path forward to do that and would agree that standardizing
>>> on REST, but I wouldn't say the justification for making this push is that
>>> REST is not compliant so we can just ignore the table spec requirements.
>>> There are a few more things to consider, which is that not everything
>>> can use REST currently and making a hard cut away from file based metadata
>>> could bifurcate access to Iceberg data.  There are also aspects to the spec
>>> that reference the metadata paths (like metadata log, though it's
>>> optional), but would likely need to be addressed.
>>>
>

Re: Inconsistency between REST spec and table/view spec

2024-03-01 Thread Yufei Gu
>
> We may also choose to relax the requirement for metadata files in the
> future — I see support for the idea and have considered proposing it also.
> But for now, it’s a requirement, even if you don’t have to send the
> location to the client (though note that the client has a hard dependency
> on it!).


Curious to know where a client has a hard dependency on the metadata file.
One of the core ideas of REST catalogs is to decouple metadata file handling
from clients. This means clients might not always get to peek at that file,
sometimes due to permission boundaries.

Yufei


On Thu, Feb 29, 2024 at 4:35 PM Ryan Blue  wrote:

> > I did not notice the difference between table and view. Should we change
> > that for tables then?
>
> It depends on what we consider a breaking change at this point. Plus, we
> may want it to be optional in the future.
>
> My main point, though, is that I wouldn’t read too much into it being
> optional. I think we all have the same expectations for REST services today
> — that they need to follow both the Iceberg table spec and the REST spec. I
> would treat findings like this as an opportunity to make the specs more
> clear about requirements.
>
> Ryan
>
> On Thu, Feb 29, 2024 at 4:28 PM Jack Ye  wrote:
>
>> > I feel like the goal is to identify those cases and steer them back
>> into compliance with the spec
>>
>> +10
>>
>> > as opposed to immediately claiming they're something entirely different
>>
>> In case this comment is talking about my last sentence "More extremely,
>> it might be a totally different kind of table that is only surfaced through
>> the REST models." I don't mean the Iceberg tables/view in a REST compatible
>> catalog are entirely different. I mean there is another more extreme use
>> case, where the REST catalog can surface other non-Iceberg tables (e.g.
>> Hive Parquet tables) through the same REST model, which is a use case I
>> mentioned previously that is an interesting application of REST that we see
>> some users are interested in. Which is also why the metadata location is
>> important to guide those use cases.
>>
>> > when we added the endpoint to load a VIEW, metadata-location was
>> correctly marked as required
>>
>> hmmm interesting, you are right, I did not notice the difference between
>> table and view. Should we change that for tables then?
>>
>> -Jack
>>
>> On Thu, Feb 29, 2024 at 4:20 PM Ryan Blue  wrote:
>>
>>> Oops. In the first paragraph, I meant “when we added the endpoint to
>>> load a VIEW, metadata-location was correctly marked as required."
>>>
>>> On Thu, Feb 29, 2024 at 4:18 PM Ryan Blue  wrote:
>>>
>>>> Once again, I’m catching up late and might have a helpful perspective.
>>>>
>>>> I think there was a mistake in the OpenAPI spec for loading tables and
>>>> the metadata-location is not listed as required. I don’t recall that
>>>> being intentional, but maybe it was? Maybe for a different reason? Either
>>>> way, when we added the endpoint to load a catalog, metadata-location
>>>> was correctly marked as required.
>>>>
>>>> Whatever the reason for the field being optional, *the intent was
>>>> never to change requirements from Iceberg* that metadata is written to
>>>> files and atomic operations guarantee a linear history.
>>>>
>>>> I’m glad to clear up the confusion on that. Right now, *catalogs must
>>>> write metadata files for Iceberg tables and should guarantee a linear
>>>> history*.
>>>>
>>>> You may be able to get away with bending those rules (what Dan refers
>>>> to as not compliant), but that’s unintentional. We may also choose to relax
>>>> the requirement for metadata files in the future; I see support for the
>>>> idea and have considered proposing it also. But for now, it’s a
>>>> requirement, even if you don’t have to send the location to the client
>>>> (though note that the client has a hard dependency on it!).
>>>>
>>>> Ryan
>>>>
>>>> On Thu, Feb 29, 2024 at 4:06 PM Daniel Weeks
>>>> wrote:
>>>>
>>>>> 1. I agree, this is what the spec currently requires
>>>>>
>>>>> 2. I agree, it's up for consideration
>>>>>
>>>>> 3. I agree, I think if an implementation didn't adhere to the current
>>>>> spec requirements, I would say it's out of spec (not sure I'd go as far as
>>>>> to say it's a different kind of table entirely).
>>>>>
>>>>> Just to expand on #3, we will find lots of cases where implementations
>>>>> deviate (likely unintentionally) from the rest/table spec and I feel like
>>>>> the goal is to identify those cases and steer them back into compliance
>>>>> with the spec as opposed to immediately claiming they're something entirely
>>>>> different.  The overarching goal is to improve openness and
>>>>> interoperability.
>>>>>
>>>>> My main point is that there isn't an inherent incompatibility between
>>>>> the REST spec and the Iceberg spec.  The preservation of the storage
>>>>> representation was discussed and intentional during the design/development
>>>>> of the REST spec.
>>>>>
>>>>> -Dan
>>>>>
>>>>>

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Ryan Blue
> I did not notice the difference between table and view. Should we change
> that for tables then?

It depends on what we consider a breaking change at this point. Plus, we
may want it to be optional in the future.

My main point, though, is that I wouldn’t read too much into it being
optional. I think we all have the same expectations for REST services today
— that they need to follow both the Iceberg table spec and the REST spec. I
would treat findings like this as an opportunity to make the specs more
clear about requirements.

Ryan

On Thu, Feb 29, 2024 at 4:28 PM Jack Ye  wrote:

> > I feel like the goal is to identify those cases and steer them back into
> compliance with the spec
>
> +10
>
> > as opposed to immediately claiming they're something entirely different
>
> In case this comment is talking about my last sentence "More extremely, it
> might be a totally different kind of table that is only surfaced through
> the REST models." I don't mean the Iceberg tables/view in a REST compatible
> catalog are entirely different. I mean there is another more extreme use
> case, where the REST catalog can surface other non-Iceberg tables (e.g.
> Hive Parquet tables) through the same REST model, which is a use case I
> mentioned previously that is an interesting application of REST that we see
> some users are interested in. Which is also why the metadata location is
> important to guide those use cases.
>
> > when we added the endpoint to load a VIEW, metadata-location was
> correctly marked as required
>
> hmmm interesting, you are right, I did not notice the difference between
> table and view. Should we change that for tables then?
>
> -Jack
>
> On Thu, Feb 29, 2024 at 4:20 PM Ryan Blue  wrote:
>
>> Oops. In the first paragraph, I meant “when we added the endpoint to load
>> a VIEW, metadata-location was correctly marked as required."
>>
>> On Thu, Feb 29, 2024 at 4:18 PM Ryan Blue  wrote:
>>
>>> Once again, I’m catching up late and might have a helpful perspective.
>>>
>>> I think there was a mistake in the OpenAPI spec for loading tables and
>>> the metadata-location is not listed as required. I don’t recall that
>>> being intentional, but maybe it was? Maybe for a different reason? Either
>>> way, when we added the endpoint to load a catalog, metadata-location
>>> was correctly marked as required.
>>>
>>> Whatever the reason for the field being optional, *the intent was never
>>> to change requirements from Iceberg* that metadata is written to files
>>> and atomic operations guarantee a linear history.
>>>
>>> I’m glad to clear up the confusion on that. Right now, *catalogs must
>>> write metadata files for Iceberg tables and should guarantee a linear
>>> history*.
>>>
>>> You may be able to get away with bending those rules (what Dan refers to
>>> as not compliant), but that’s unintentional. We may also choose to relax
>>> the requirement for metadata files in the future — I see support for the
>>> idea and have considered proposing it also. But for now, it’s a
>>> requirement, even if you don’t have to send the location to the client
>>> (though note that the client has a hard dependency on it!).
>>>
>>> Ryan
>>>
>>> On Thu, Feb 29, 2024 at 4:06 PM Daniel Weeks 
>>> wrote:
>>>
>>>> 1. I agree, this is what the spec currently requires
>>>>
>>>> 2. I agree, it's up for consideration
>>>>
>>>> 3. I agree, I think if an implementation didn't adhere to the current
>>>> spec requirements, I would say it's out of spec (not sure I'd go as far as
>>>> to say it's a different kind of table entirely).
>>>>
>>>> Just to expand on #3, we will find lots of cases where implementations
>>>> deviate (likely unintentionally) from the rest/table spec and I feel like
>>>> the goal is to identify those cases and steer them back into compliance
>>>> with the spec as opposed to immediately claiming they're something entirely
>>>> different.  The overarching goal is to improve openness and
>>>> interoperability.
>>>>
>>>> My main point is that there isn't an inherent incompatibility between
>>>> the REST spec and the Iceberg spec.  The preservation of the storage
>>>> representation was discussed and intentional during the design/development
>>>> of the REST spec.
>>>>
>>>> -Dan
>>>>
>>>>
>>>> On Thu, Feb 29, 2024 at 3:40 PM Jack Ye  wrote:
>>>>
>>>>> > For example, I cannot validate the atomic behaviors Glue claims, but
>>>>> I wouldn't assert that it is non-compliant because of that.
>>>>>
>>>>> I think these are not comparable claims because the API scope is
>>>>> completely different, but I don't think it's worth arguing in depth. Let's
>>>>> try to see if we can have some consensus.
>>>>>
>>>>> Based on what you said above, do you agree with the following 3 points?
>>>>>
>>>>> 1. Today, a table/view in any catalog including a REST spec-compatible
>>>>> catalog is an Iceberg table/view if and only if it points to a JSON
>>>>> metadata file in storage. This concept is a part of the Iceberg table/view
>>>>> spec. There

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jack Ye
> I feel like the goal is to identify those cases and steer them back into
> compliance with the spec

+10

> as opposed to immediately claiming they're something entirely different

In case this comment is about my last sentence, "More extremely, it
might be a totally different kind of table that is only surfaced through
the REST models": I don't mean that the Iceberg tables/views in a
REST-compatible catalog are entirely different. I mean there is another,
more extreme use case, where the REST catalog can surface other non-Iceberg
tables (e.g. Hive Parquet tables) through the same REST model. I mentioned
this use case previously; it is an interesting application of REST that we
see some users are interested in. That is also why the metadata location is
important to guide those use cases.

> when we added the endpoint to load a VIEW, metadata-location was
> correctly marked as required

hmmm interesting, you are right, I did not notice the difference between
table and view. Should we change that for tables then?

-Jack

On Thu, Feb 29, 2024 at 4:20 PM Ryan Blue  wrote:

> Oops. In the first paragraph, I meant “when we added the endpoint to load
> a VIEW, metadata-location was correctly marked as required."
>
> On Thu, Feb 29, 2024 at 4:18 PM Ryan Blue  wrote:
>
>> Once again, I’m catching up late and might have a helpful perspective.
>>
>> I think there was a mistake in the OpenAPI spec for loading tables and
>> the metadata-location is not listed as required. I don’t recall that
>> being intentional, but maybe it was? Maybe for a different reason? Either
>> way, when we added the endpoint to load a catalog, metadata-location was
>> correctly marked as required.
>>
>> Whatever the reason for the field being optional, *the intent was never
>> to change requirements from Iceberg* that metadata is written to files
>> and atomic operations guarantee a linear history.
>>
>> I’m glad to clear up the confusion on that. Right now, *catalogs must
>> write metadata files for Iceberg tables and should guarantee a linear
>> history*.
>>
>> You may be able to get away with bending those rules (what Dan refers to
>> as not compliant), but that’s unintentional. We may also choose to relax
>> the requirement for metadata files in the future — I see support for the
>> idea and have considered proposing it also. But for now, it’s a
>> requirement, even if you don’t have to send the location to the client
>> (though note that the client has a hard dependency on it!).
>>
>> Ryan
>>
>> On Thu, Feb 29, 2024 at 4:06 PM Daniel Weeks 
>> wrote:
>>
>>> 1. I agree, this is what the spec currently requires
>>>
>>> 2. I agree, it's up for consideration
>>>
>>> 3. I agree, I think if an implementation didn't adhere to the current
>>> spec requirements, I would say it's out of spec (not sure I'd go as far as
>>> to say it's a different kind of table entirely).
>>>
>>> Just to expand on #3, we will find lots of cases where implementations
>>> deviate (likely unintentionally) from the rest/table spec and I feel like
>>> the goal is to identify those cases and steer them back into compliance
>>> with the spec as opposed to immediately claiming they're something entirely
>>> different.  The overarching goal is to improve openness and
>>> interoperability.
>>>
>>> My main point is that there isn't an inherent incompatibility between
>>> the REST spec and the Iceberg spec.  The preservation of the storage
>>> representation was discussed and intentional during the design/development
>>> of the REST spec.
>>>
>>> -Dan
>>>
>>>
>>> On Thu, Feb 29, 2024 at 3:40 PM Jack Ye  wrote:
>>>
>>>> > For example, I cannot validate the atomic behaviors Glue claims, but
>>>> I wouldn't assert that it is non-compliant because of that.
>>>>
>>>> I think these are not comparable claims because the API scope is
>>>> completely different, but I don't think it's worth arguing in depth. Let's
>>>> try to see if we can have some consensus.
>>>>
>>>> Based on what you said above, do you agree with the following 3 points?
>>>>
>>>> 1. Today, a table/view in any catalog including a REST spec-compatible
>>>> catalog is an Iceberg table/view if and only if it points to a JSON
>>>> metadata file in storage. This concept is a part of the Iceberg table/view
>>>> spec. There is a debate to be had for if we want to remove this requirement
>>>> or not. The argument for it (as Yufei said) is to use other storage for
>>>> better performance. The argument against it (as Amogh said) is to keep
>>>> Iceberg open source friendly through the JSON format.
>>>>
>>>> 2. Today, a table/view in any catalog including a REST spec-compatible
>>>> catalog is an Iceberg table/view if and only if it behind the scene
>>>> performs the atomic metadata file swap for every commit. This concept is a
>>>> part of the Iceberg table/view spec. We should consider removing this
>>>> requirement in the Iceberg table/view spec.
>>>>
>>>> 3. A table/view in an Iceberg REST spec-compatible catalog m

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Ryan Blue
Oops. In the first paragraph, I meant "when we added the endpoint to load a
VIEW, metadata-location was correctly marked as required."

On Thu, Feb 29, 2024 at 4:18 PM Ryan Blue  wrote:

> Once again, I’m catching up late and might have a helpful perspective.
>
> I think there was a mistake in the OpenAPI spec for loading tables and the
> metadata-location is not listed as required. I don’t recall that being
> intentional, but maybe it was? Maybe for a different reason? Either way,
> when we added the endpoint to load a catalog, metadata-location was
> correctly marked as required.
>
> Whatever the reason for the field being optional, *the intent was never
> to change requirements from Iceberg* that metadata is written to files
> and atomic operations guarantee a linear history.
>
> I’m glad to clear up the confusion on that. Right now, *catalogs must
> write metadata files for Iceberg tables and should guarantee a linear
> history*.
>
> You may be able to get away with bending those rules (what Dan refers to
> as not compliant), but that’s unintentional. We may also choose to relax
> the requirement for metadata files in the future — I see support for the
> idea and have considered proposing it also. But for now, it’s a
> requirement, even if you don’t have to send the location to the client
> (though note that the client has a hard dependency on it!).
>
> Ryan
>
> On Thu, Feb 29, 2024 at 4:06 PM Daniel Weeks 
> wrote:
>
>> 1. I agree, this is what the spec currently requires
>>
>> 2. I agree, it's up for consideration
>>
>> 3. I agree, I think if an implementation didn't adhere to the current
>> spec requirements, I would say it's out of spec (not sure I'd go as far as
>> to say it's a different kind of table entirely).
>>
>> Just to expand on #3, we will find lots of cases where implementations
>> deviate (likely unintentionally) from the rest/table spec and I feel like
>> the goal is to identify those cases and steer them back into compliance
>> with the spec as opposed to immediately claiming they're something entirely
>> different.  The overarching goal is to improve openness and
>> interoperability.
>>
>> My main point is that there isn't an inherent incompatibility between the
>> REST spec and the Iceberg spec.  The preservation of the storage
>> representation was discussed and intentional during the design/development
>> of the REST spec.
>>
>> -Dan
>>
>>
>> On Thu, Feb 29, 2024 at 3:40 PM Jack Ye  wrote:
>>
>>> > For example, I cannot validate the atomic behaviors Glue claims, but I
>>> wouldn't assert that it is non-compliant because of that.
>>>
>>> I think these are not comparable claims because the API scope is
>>> completely different, but I don't think it's worth arguing in depth. Let's
>>> try to see if we can have some consensus.
>>>
>>> Based on what you said above, do you agree with the following 3 points?
>>>
>>> 1. Today, a table/view in any catalog including a REST spec-compatible
>>> catalog is an Iceberg table/view if and only if it points to a JSON
>>> metadata file in storage. This concept is a part of the Iceberg table/view
>>> spec. There is a debate to be had for if we want to remove this requirement
>>> or not. The argument for it (as Yufei said) is to use other storage for
>>> better performance. The argument against it (as Amogh said) is to keep
>>> Iceberg open source friendly through the JSON format.
>>>
>>> 2. Today, a table/view in any catalog including a REST spec-compatible
>>> catalog is an Iceberg table/view if and only if it behind the scene
>>> performs the atomic metadata file swap for every commit. This concept is a
>>> part of the Iceberg table/view spec. We should consider removing this
>>> requirement in the Iceberg table/view spec.
>>>
>>> 3. A table/view in an Iceberg REST spec-compatible catalog may or may
>>> not be an Iceberg table/view. The REST spec does not enforce this, and this
>>> stance will remain true going forward. For example, it could use the
>>> Iceberg table/view metadata structure but does not store the metadata in
>>> JSON file, or not use the metadata file swap commit procedure, or both, and
>>> in those cases it is not an Iceberg table/view. More extremely, it might be
>>> a totally different kind of table that is only surfaced through the REST
>>> models.
>>>
>>> -Jack
>>>
>>> On Thu, Feb 29, 2024 at 2:13 PM Daniel Weeks 
>>> wrote:
>>>
>>>> > In that case are tables in a REST-compliant catalog still an Iceberg
>>>> table? I don't think so, because it is a table that only partially follows
>>>> the Iceberg table spec.
>>>>
>>>> If the catalog is REST compliant and complies with the Iceberg spec,
>>>> they are still Iceberg tables.  I can see there is an argument that if the
>>>> catalog is REST compliant but does not follow the commit requirements (or
>>>> aspects of the Iceberg spec), that you cannot call those Iceberg tables.
>>>> But the assertion that Iceberg tables in a REST catalog are de facto
>>>> non-compliant is incorrect.
>

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Ryan Blue
Once again, I’m catching up late and might have a helpful perspective.

I think there was a mistake in the OpenAPI spec for loading tables and the
metadata-location is not listed as required. I don’t recall that being
intentional, but maybe it was? Maybe for a different reason? Either way,
when we added the endpoint to load a catalog, metadata-location was
correctly marked as required.

Whatever the reason for the field being optional, *the intent was never to
change requirements from Iceberg* that metadata is written to files and
atomic operations guarantee a linear history.

I’m glad to clear up the confusion on that. Right now, *catalogs must write
metadata files for Iceberg tables and should guarantee a linear history*.
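
As a minimal sketch of what "atomic operations guarantee a linear history" can look like, the toy class below does a compare-and-swap on a metadata pointer. The `CatalogPointer` class, its in-memory lock, and the file names are assumptions for illustration only; a real catalog would rely on whatever atomic primitive its backing store offers.

```python
import threading


class CatalogPointer:
    """Holds the current metadata location for one table (in memory only)."""

    def __init__(self, location=None):
        self._location = location
        self._lock = threading.Lock()
        self.history = [] if location is None else [location]

    def current(self):
        with self._lock:
            return self._location

    def commit(self, expected, new_location):
        """Swap the pointer only if it still equals `expected`.

        Returns True on success. A False result means another writer
        committed first and the caller has to retry from the new state.
        """
        with self._lock:
            if self._location != expected:
                return False
            self._location = new_location
            self.history.append(new_location)
            return True


pointer = CatalogPointer("00001-aaaa.metadata.json")
base = pointer.current()
first = pointer.commit(base, "00002-bbbb.metadata.json")   # base is current
second = pointer.commit(base, "00002-cccc.metadata.json")  # base is now stale
print(first, second, pointer.history)
```

Because the second writer started from a stale base, its commit is rejected and the history stays a single chain rather than forking.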

You may be able to get away with bending those rules (what Dan refers to as
not compliant), but that’s unintentional. We may also choose to relax the
requirement for metadata files in the future — I see support for the idea
and have considered proposing it also. But for now, it’s a requirement,
even if you don’t have to send the location to the client (though note that
the client has a hard dependency on it!).
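
For anyone who wants to check a service against that expectation, a rough sketch follows. It treats a load-table response as a plain dict and flags a missing `metadata-location`, even though the OpenAPI schema does not currently mark it as required for tables. The `check_load_table_response` helper and the sample values are hypothetical; only the `metadata-location` and `metadata` field names come from the REST spec.

```python
def check_load_table_response(response):
    """Return a list of problems found in a load-table response dict.

    Stricter than the current OpenAPI schema: a missing or empty
    'metadata-location' is reported, reflecting the stated intent that
    catalogs still write metadata files for Iceberg tables.
    """
    problems = []
    if "metadata" not in response:
        problems.append("missing 'metadata'")
    if not response.get("metadata-location"):
        problems.append("missing 'metadata-location'")
    return problems


# Sample responses (values are made up for illustration).
ok = {
    "metadata-location": "s3://bucket/db/t/metadata/00001-aaaa.metadata.json",
    "metadata": {"format-version": 2},
}
no_location = {"metadata": {"format-version": 2}}

print(check_load_table_response(ok))
print(check_load_table_response(no_location))
```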

Ryan

On Thu, Feb 29, 2024 at 4:06 PM Daniel Weeks 
wrote:

> 1. I agree, this is what the spec currently requires
>
> 2. I agree, it's up for consideration
>
> 3. I agree, I think if an implementation didn't adhere to the current spec
> requirements, I would say it's out of spec (not sure I'd go as far as to
> say it's a different kind of table entirely).
>
> Just to expand on #3, we will find lots of cases where implementations
> deviate (likely unintentionally) from the rest/table spec and I feel like
> the goal is to identify those cases and steer them back into compliance
> with the spec as opposed to immediately claiming they're something entirely
> different.  The overarching goal is to improve openness and
> interoperability.
>
> My main point is that there isn't an inherent incompatibility between the
> REST spec and the Iceberg spec.  The preservation of the storage
> representation was discussed and intentional during the design/development
> of the REST spec.
>
> -Dan
>
>
> On Thu, Feb 29, 2024 at 3:40 PM Jack Ye  wrote:
>
>> > For example, I cannot validate the atomic behaviors Glue claims, but I
>> wouldn't assert that it is non-compliant because of that.
>>
>> I think these are not comparable claims because the API scope is
>> completely different, but I don't think it's worth arguing in depth. Let's
>> try to see if we can have some consensus.
>>
>> Based on what you said above, do you agree with the following 3 points?
>>
>> 1. Today, a table/view in any catalog including a REST spec-compatible
>> catalog is an Iceberg table/view if and only if it points to a JSON
>> metadata file in storage. This concept is a part of the Iceberg table/view
>> spec. There is a debate to be had for if we want to remove this requirement
>> or not. The argument for it (as Yufei said) is to use other storage for
>> better performance. The argument against it (as Amogh said) is to keep
>> Iceberg open source friendly through the JSON format.
>>
>> 2. Today, a table/view in any catalog including a REST spec-compatible
>> catalog is an Iceberg table/view if and only if it behind the scene
>> performs the atomic metadata file swap for every commit. This concept is a
>> part of the Iceberg table/view spec. We should consider removing this
>> requirement in the Iceberg table/view spec.
>>
>> 3. A table/view in an Iceberg REST spec-compatible catalog may or may not
>> be an Iceberg table/view. The REST spec does not enforce this, and this
>> stance will remain true going forward. For example, it could use the
>> Iceberg table/view metadata structure but does not store the metadata in
>> JSON file, or not use the metadata file swap commit procedure, or both, and
>> in those cases it is not an Iceberg table/view. More extremely, it might be
>> a totally different kind of table that is only surfaced through the REST
>> models.
>>
>> -Jack
>>
>> On Thu, Feb 29, 2024 at 2:13 PM Daniel Weeks 
>> wrote:
>>
>>> > In that case are tables in a REST-compliant catalog still an Iceberg
>>> table? I don't think so, because it is a table that only partially follows
>>> the Iceberg table spec.
>>>
>>> If the catalog is REST compliant and complies with the Iceberg spec,
>>> they are still Iceberg tables.  I can see there is an argument that if the
>>> catalog is REST compliant but does not follow the commit requirements (or
>>> aspects of the Iceberg spec), that you cannot call those Iceberg tables.
>>> But the assertion that Iceberg tables in a REST catalog are de facto
>>> non-compliant is incorrect.
>>>
>>> > I like the idea about validation for format compliance. But don't
>>> think you can technically validate this. You can validate the static table
>>> to see if it has all the Iceberg metadata components, but you can not
>>> validate the internal behavior of the service during a commit to see if it
>

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Daniel Weeks
1. I agree, this is what the spec currently requires

2. I agree, it's up for consideration

3. I agree, I think if an implementation didn't adhere to the current spec
requirements, I would say it's out of spec (not sure I'd go as far as to
say it's a different kind of table entirely).

Just to expand on #3, we will find lots of cases where implementations
deviate (likely unintentionally) from the REST/table spec, and I feel like
the goal is to identify those cases and steer them back into compliance
with the spec, as opposed to immediately claiming they're something entirely
different. The overarching goal is to improve openness and
interoperability.

My main point is that there isn't an inherent incompatibility between the
REST spec and the Iceberg spec.  The preservation of the storage
representation was discussed and intentional during the design/development
of the REST spec.

-Dan


On Thu, Feb 29, 2024 at 3:40 PM Jack Ye  wrote:

> > For example, I cannot validate the atomic behaviors Glue claims, but I
> wouldn't assert that it is non-compliant because of that.
>
> I think these are not comparable claims because the API scope is
> completely different, but I don't think it's worth arguing in depth. Let's
> try to see if we can have some consensus.
>
> Based on what you said above, do you agree with the following 3 points?
>
> 1. Today, a table/view in any catalog including a REST spec-compatible
> catalog is an Iceberg table/view if and only if it points to a JSON
> metadata file in storage. This concept is a part of the Iceberg table/view
> spec. There is a debate to be had for if we want to remove this requirement
> or not. The argument for it (as Yufei said) is to use other storage for
> better performance. The argument against it (as Amogh said) is to keep
> Iceberg open source friendly through the JSON format.
>
> 2. Today, a table/view in any catalog including a REST spec-compatible
> catalog is an Iceberg table/view if and only if it behind the scene
> performs the atomic metadata file swap for every commit. This concept is a
> part of the Iceberg table/view spec. We should consider removing this
> requirement in the Iceberg table/view spec.
>
> 3. A table/view in an Iceberg REST spec-compatible catalog may or may not
> be an Iceberg table/view. The REST spec does not enforce this, and this
> stance will remain true going forward. For example, it could use the
> Iceberg table/view metadata structure but does not store the metadata in
> JSON file, or not use the metadata file swap commit procedure, or both, and
> in those cases it is not an Iceberg table/view. More extremely, it might be
> a totally different kind of table that is only surfaced through the REST
> models.
>
> -Jack
>
> On Thu, Feb 29, 2024 at 2:13 PM Daniel Weeks 
> wrote:
>
>> > In that case are tables in a REST-compliant catalog still an Iceberg
>> table? I don't think so, because it is a table that only partially follows
>> the Iceberg table spec.
>>
>> If the catalog is REST compliant and complies with the Iceberg spec, they
>> are still Iceberg tables.  I can see there is an argument that if the
>> catalog is REST compliant but does not follow the commit requirements (or
>> aspects of the Iceberg spec), that you cannot call those Iceberg tables.
>> But the assertion that Iceberg tables in a REST catalog are de facto
>> non-compliant is incorrect.
>>
>> > I like the idea about validation for format compliance. But don't think
>> you can technically validate this. You can validate the static table to see
>> if it has all the Iceberg metadata components, but you can not validate the
>> internal behavior of the service during a commit to see if it really
>> atomically swapped a metadata file.
>>
>> Just because you cannot see/validate the implementation doesn't mean that
>> it is non-compliant.  For example, I cannot validate the atomic behaviors
>> Glue claims, but I wouldn't assert that it is non-compliant because of that.
>>
>> I do think there is a discussion to be had about if/when we might adjust
>> the storage/swap requirements, but to reinforce Amogh's point, removing
>> those requirements would impact the openness and accessibility of Iceberg,
>> which I feel would hamper adoption.
>>
>> -Dan
>>
>>
>>
>> On Thu, Feb 29, 2024 at 1:53 PM Yufei Gu  wrote:
>>
>>>> We've periodically discussed removing the storage requirement and I
>>>> think there's a path forward to do that and would agree that standardizing
>>>> on REST, but I wouldn't say the justification for making this push is that
>>>> REST is not compliant so we can just ignore the table spec requirements.
>>>> There are a few more things to consider, which is that not everything
>>>> can use REST currently and making a hard cut away from file based metadata
>>>> could bifurcate access to Iceberg data.  There are also aspects to the spec
>>>> that reference the metadata paths (like metadata log, though it's
>>>> optional), but would likely need to be addressed.
>>>

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jack Ye
> For example, I cannot validate the atomic behaviors Glue claims, but I
wouldn't assert that it is non-compliant because of that.

I think these are not comparable claims because the API scope is completely
different, but I don't think it's worth arguing in depth. Let's try to see
if we can have some consensus.

Based on what you said above, do you agree with the following 3 points?

1. Today, a table/view in any catalog including a REST spec-compatible
catalog is an Iceberg table/view if and only if it points to a JSON
metadata file in storage. This concept is a part of the Iceberg table/view
spec. There is a debate to be had about whether to remove this requirement.
The argument for removing it (as Yufei said) is to use other storage for
better performance. The argument against (as Amogh said) is to keep
Iceberg open source friendly through the JSON format.

2. Today, a table/view in any catalog including a REST spec-compatible
catalog is an Iceberg table/view if and only if it performs, behind the
scenes, the atomic metadata file swap for every commit. This concept is a
part of the Iceberg table/view spec. We should consider removing this
requirement in the Iceberg table/view spec.

3. A table/view in an Iceberg REST spec-compatible catalog may or may not
be an Iceberg table/view. The REST spec does not enforce this, and this
stance will remain true going forward. For example, it could use the
Iceberg table/view metadata structure but not store the metadata in a
JSON file, or not use the metadata file swap commit procedure, or both, and
in those cases it is not an Iceberg table/view. More extremely, it might be
a totally different kind of table that is only surfaced through the REST
models.
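
As an illustration of the "atomic metadata file swap" in point 2, here is a
minimal sketch. It is not taken from any real catalog: `PointerCatalog`,
`CommitConflict`, and the `s3://` paths are hypothetical, and the only
assumption is that the catalog keeps one metadata pointer per table and
replaces it with a compare-and-swap.

```python
import threading


class CommitConflict(Exception):
    """Raised when the expected metadata location is no longer current."""


class PointerCatalog:
    """Hypothetical in-memory catalog holding one metadata pointer per table."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pointers = {}

    def current(self, table):
        with self._lock:
            return self._pointers.get(table)

    def swap(self, table, expected, new):
        # Compare-and-swap: succeed only if nobody committed in between.
        with self._lock:
            if self._pointers.get(table) != expected:
                raise CommitConflict(table)
            self._pointers[table] = new


catalog = PointerCatalog()
catalog.swap("db.events", None, "s3://bucket/db/events/metadata/v1.metadata.json")

base = catalog.current("db.events")
catalog.swap("db.events", base, "s3://bucket/db/events/metadata/v2.metadata.json")

try:
    # A second writer that still holds the old base loses the race.
    catalog.swap("db.events", base, "s3://bucket/db/events/metadata/v2-other.metadata.json")
    stale_commit_rejected = False
except CommitConflict:
    stale_commit_rejected = True
```

A client sees only whether `swap` succeeded; whether the service really
performs such a swap internally is exactly the part that cannot be observed
from outside.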

-Jack

On Thu, Feb 29, 2024 at 2:13 PM Daniel Weeks 
wrote:

> > In that case are tables in a REST-compliant catalog still an Iceberg
> table? I don't think so, because it is a table that only partially follows
> the Iceberg table spec.
>
> If the catalog is REST compliant and complies with the Iceberg spec, they
> are still Iceberg tables.  I can see there is an argument that if the
> catalog is REST compliant but does not follow the commit requirements (or
> aspects of the Iceberg spec), that you cannot call those Iceberg tables.
> But the assertion that Iceberg tables in a REST catalog are de facto
> non-compliant is incorrect.
>
> > I like the idea about validation for format compliance. But don't think
> you can technically validate this. You can validate the static table to see
> if it has all the Iceberg metadata components, but you can not validate the
> internal behavior of the service during a commit to see if it really
> atomically swapped a metadata file.
>
> Just because you cannot see/validate the implementation doesn't mean that
> it is non-compliant.  For example, I cannot validate the atomic behaviors
> Glue claims, but I wouldn't assert that it is non-compliant because of that.
>
> I do think there is a discussion to be had about if/when we might adjust
> the storage/swap requirements, but to reinforce Amogh's point, removing
> those requirements would impact the openness and accessibility of Iceberg,
> which I feel would hamper adoption.
>
> -Dan
>
>
>
> On Thu, Feb 29, 2024 at 1:53 PM Yufei Gu  wrote:
>
>>> We've periodically discussed removing the storage requirement and I think
>>> there's a path forward to do that and would agree that standardizing on
>>> REST, but I wouldn't say the justification for making this push is that
>>> REST is not compliant so we can just ignore the table spec requirements.
>>> There are a few more things to consider, which is that not everything
>>> can use REST currently and making a hard cut away from file based metadata
>>> could bifurcate access to Iceberg data.  There are also aspects to the spec
>>> that reference the metadata paths (like metadata log, though it's
>>> optional), but would likely need to be addressed.
>>
>>
>> This is a bit off-topic. It makes sense to me to remove the storage
>> requirement moving forward. The metadata.json file isn't necessary in the
>> REST catalog. For example, the REST catalog may not have permission to
>> write to the table owner's storage. It can still save the metadata as a
>> file, of course, but that doesn't quite make sense. Putting it in a
>> key-value store or an RDBMS could be a better option.
>>
>> Given that we are going to remove the storage requirement, should we
>> avoid the file path in the current design for things like the view spec? A
>> solution like table identifier + version UUID may serve the purpose.
>>
>> Yufei
>>
>>
>> On Thu, Feb 29, 2024 at 1:29 PM Jack Ye  wrote:
>>
>>> > There's no exemption that says if you're using REST you don't need to
>>> follow the spec.  Why do you think that's the case?
>>>
>>> In that case are tables in a REST-compliant catalog still an Iceberg
>>> table? I don't think so, because it is a table that only partially follows
>>> the Iceberg table spec.
>>>
>>> I like the idea about validat

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Daniel Weeks
> In that case are tables in a REST-compliant catalog still an Iceberg
table? I don't think so, because it is a table that only partially follows
the Iceberg table spec.

If the catalog is REST compliant and complies with the Iceberg spec, they
are still Iceberg tables.  I can see there is an argument that if the
catalog is REST compliant but does not follow the commit requirements (or
aspects of the Iceberg spec), that you cannot call those Iceberg tables.
But the assertion that Iceberg tables in a REST catalog are de facto
non-compliant is incorrect.

> I like the idea about validation for format compliance. But don't think
you can technically validate this. You can validate the static table to see
if it has all the Iceberg metadata components, but you can not validate the
internal behavior of the service during a commit to see if it really
atomically swapped a metadata file.

Just because you cannot see/validate the implementation doesn't mean that
it is non-compliant.  For example, I cannot validate the atomic behaviors
Glue claims, but I wouldn't assert that it is non-compliant because of that.

I do think there is a discussion to be had about if/when we might adjust
the storage/swap requirements, but to reinforce Amogh's point, removing
those requirements would impact the openness and accessibility of Iceberg,
which I feel would hamper adoption.

-Dan



On Thu, Feb 29, 2024 at 1:53 PM Yufei Gu  wrote:

>> We've periodically discussed removing the storage requirement and I think
>> there's a path forward to do that and would agree that standardizing on
>> REST, but I wouldn't say the justification for making this push is that
>> REST is not compliant so we can just ignore the table spec requirements.
>> There are a few more things to consider, which is that not everything can
>> use REST currently and making a hard cut away from file based metadata
>> could bifurcate access to Iceberg data.  There are also aspects to the spec
>> that reference the metadata paths (like metadata log, though it's
>> optional), but would likely need to be addressed.
>
>
> This is a bit off-topic. It makes sense to me to remove the storage
> requirement moving forward. The metadata.json file isn't necessary in the
> REST catalog. For example, the REST catalog may not have permission to
> write to the table owner's storage. It can still save the metadata as a
> file, of course, but that doesn't quite make sense. Putting it in a
> key-value store or an RDBMS could be a better option.
>
> Given that we are going to remove the storage requirement, should we avoid
> the file path in the current design for things like the view spec? A solution
> like table identifier + version UUID may serve the purpose.
>
> Yufei
>
>
> On Thu, Feb 29, 2024 at 1:29 PM Jack Ye  wrote:
>
>> > There's no exemption that says if you're using REST you don't need to
>> follow the spec.  Why do you think that's the case?
>>
>> In that case are tables in a REST-compliant catalog still an Iceberg
>> table? I don't think so, because it is a table that only partially follows
>> the Iceberg table spec.
>>
>> I like the idea about validation for format compliance. But don't think
>> you can technically validate this. You can validate the static table to see
>> if it has all the Iceberg metadata components, but you can not validate the
>> internal behavior of the service during a commit to see if it really
>> atomically swapped a metadata file.
>>
>> So I think at minimum we should update the table/view spec to remove the
>> metadata file swap requirement. The Iceberg table/view spec should be a
>> pure format spec that specifies how the file is laid out in storage.
>>
>> -Jack
>>
>> On Thu, Feb 29, 2024 at 1:22 PM Amogh Jahagirdar 
>> wrote:
>>
>>> I want to echo Dan's point that just because there is a separate spec
>>> for a REST Catalog does not mean that implementations can deviate from the
>>> spec's definition of the commit protocol or metadata layout, and still be
>>> considered "spec compliant".
>>>
>>> > Secondly, once we do that, we should declare REST spec as the official
>>> catalog spec to interact with Iceberg tables. Otherwise at least I will be
>>> very tempted to just break the atomic pointer swap pattern and store the
>>> entire metadata using the Glue Table object to achieve much better
>>> performance and also Glue native feature integrations, and I think other
>>> players will be equally motivated to do something similar. That will lead
>>> to even more chaos in the Iceberg catalog space.
>>>
>>> On this, a second point I want to make is around the openness of this
>>> ecosystem. We all already know that openness (the file formats, the
>>> metadata layout, the spec itself) is a fundamental tenet of the project.
>>> If we take the provided example of removing the metadata JSON file and
>>> moving it to some other storage, I think that goes against this principle
>>> since a JSON file is quite open by definition. Going back to the first
>>> point, I think a ca

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Yufei Gu
>
> We've periodically discussed removing the storage requirement and I think
> there's a path forward to do that and would agree that standardizing on
> REST, but I wouldn't say the justification for making this push is that
> REST is not compliant so we can just ignore the table spec requirements.
> There are a few more things to consider, which is that not everything can
> use REST currently and making a hard cut away from file based metadata
> could bifurcate access to Iceberg data.  There are also aspects to the spec
> that reference the metadata paths (like metadata log, though it's
> optional), but would likely need to be addressed.


This is a bit off-topic. It makes sense to me to remove the storage
requirement moving forward. The metadata.json file isn't necessary in the
REST catalog. For example, the REST catalog may not have permission to
write to the table owner's storage. It can still save the metadata as a
file, of course, but that doesn't quite make sense. Putting it in a
key-value store or an RDBMS could be a better option.

Given that we are going to remove the storage requirement, should we avoid
the file path in the current design for things like the view spec? A solution
like table identifier + version UUID may serve the purpose.
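
A minimal sketch of what "table identifier + version UUID" could look like
as a reference, assuming the metadata lives in some key-value store rather
than at a file path. `VersionRef` and `MetadataStore` are hypothetical names
used only for illustration, and the dict stands in for the real store.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionRef:
    """Refers to one metadata version without using a file path."""
    table_identifier: str
    version_uuid: uuid.UUID


class MetadataStore:
    """Hypothetical key-value store keyed by (table identifier, version UUID)."""

    def __init__(self):
        self._versions = {}

    def put(self, table_identifier, metadata):
        # Each stored version gets a fresh UUID; older versions stay readable.
        ref = VersionRef(table_identifier, uuid.uuid4())
        self._versions[ref] = metadata
        return ref

    def get(self, ref):
        return self._versions[ref]


store = MetadataStore()
v1 = store.put("db.events_view", {"format-version": 1, "note": "first version"})
v2 = store.put("db.events_view", {"format-version": 1, "note": "second version"})
```

Anything that currently records a metadata file location would record a
`VersionRef` instead, which is the part of the design that would need to be
agreed on.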

Yufei


On Thu, Feb 29, 2024 at 1:29 PM Jack Ye  wrote:

> > There's no exemption that says if you're using REST you don't need to
> follow the spec.  Why do you think that's the case?
>
> In that case are tables in a REST-compliant catalog still an Iceberg
> table? I don't think so, because it is a table that only partially follows
> the Iceberg table spec.
>
> I like the idea about validation for format compliance. But don't think
> you can technically validate this. You can validate the static table to see
> if it has all the Iceberg metadata components, but you can not validate the
> internal behavior of the service during a commit to see if it really
> atomically swapped a metadata file.
>
> So I think at minimum we should update the table/view spec to remove the
> metadata file swap requirement. The Iceberg table/view spec should be a
> pure format spec that specifies how the file is laid out in storage.
>
> -Jack
>
> On Thu, Feb 29, 2024 at 1:22 PM Amogh Jahagirdar  wrote:
>
>> I want to echo Dan's point that just because there is a separate spec for
>> a REST Catalog does not mean that implementations can deviate from the
>> spec's definition of the commit protocol or metadata layout, and still be
>> considered "spec compliant".
>>
>> > Secondly, once we do that, we should declare REST spec as the official
>> catalog spec to interact with Iceberg tables. Otherwise at least I will be
>> very tempted to just break the atomic pointer swap pattern and store the
>> entire metadata using the Glue Table object to achieve much better
>> performance and also Glue native feature integrations, and I think other
>> players will be equally motivated to do something similar. That will lead
>> to even more chaos in the Iceberg catalog space.
>>
>> On this, a second point I want to make is around the openness of this
>> ecosystem. We all already know that openness (the file formats, the
>> metadata layout, the spec itself) is a fundamental tenet of the project.
>> If we take the provided example of removing the metadata JSON file and
>> moving it to some other storage, I think that goes against this principle
>> since a JSON file is quite open by definition. Going back to the first
>> point, I think a catalog which has such a behavior would *not* be
>> considered spec compliant. Another reason this is important is if we think
>> about what's healthiest for all users of Iceberg, is to have a healthy list
>> of options for catalog choices. Storing the metadata JSON in non-open ways
>> can make users' lives harder for trying out new catalogs since now the
>> metadata would be stored in their own way, and the users will have a harder
>> time accessing their own data.
>>
>> A last point I'd like to make is I think there's a good discussion to be
>> had on how do we validate that a REST Catalog implementation is spec
>> compliant. I think that's really beneficial for the ecosystem as a whole.
>> Before that, I think first though we'd want to conclude on this topic
>> itself.
>>
>> On Thu, Feb 29, 2024 at 12:29 PM Daniel Weeks 
>> wrote:
>>
>>> > REST spec-compliant catalog does not need to follow the Iceberg spec
>>> to commit or store metadata
>>>
>>> If the REST implementation doesn't follow the Iceberg spec for commit
>>> requirements, it's not compliant with the spec.  There's no exemption that
>>> says if you're using REST you don't need to follow the spec.  Why do you
>>> think that's the case?
>>>
>>> I don't believe there's a reason to say that the REST spec needs to
>>> enforce the commit requirements either, that's a requirement of the Iceberg
>>> spec and still needs to be complied with.
>>>
>>> -Dan
>>>
>>> On Thu, Feb 29, 2024 at 12:19 PM Jack Ye  wrote:
>>>
 > The implementat

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jack Ye
> There's no exemption that says if you're using REST you don't need to
follow the spec.  Why do you think that's the case?

In that case are tables in a REST-compliant catalog still an Iceberg table?
I don't think so, because it is a table that only partially follows the
Iceberg table spec.

I like the idea of validation for format compliance, but I don't think you
can technically validate this. You can validate the static table to see if
it has all the Iceberg metadata components, but you cannot validate the
internal behavior of the service during a commit to see if it really
atomically swapped a metadata file.
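
To illustrate the "static" half of this, here is a minimal sketch of
checking a metadata JSON document for expected fields. The field names are a
subset of those the table spec lists for format version 2, chosen for
illustration only; `missing_fields` and the sample document are hypothetical.

```python
import json

# Illustrative subset of top-level table metadata fields (format version 2).
REQUIRED_FIELDS = (
    "format-version",
    "table-uuid",
    "location",
    "schemas",
    "current-schema-id",
    "partition-specs",
)


def missing_fields(metadata_json):
    """Return the expected top-level fields absent from a metadata document."""
    metadata = json.loads(metadata_json)
    return [field for field in REQUIRED_FIELDS if field not in metadata]


sample = json.dumps({
    "format-version": 2,
    "table-uuid": "00000000-0000-0000-0000-000000000000",
    "location": "s3://bucket/db/events",
    "schemas": [],
    "current-schema-id": 0,
})

problems = missing_fields(sample)
```

A check like this only inspects the document it is given. It says nothing
about how the service produced or committed that document.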

So I think at minimum we should update the table/view spec to remove the
metadata file swap requirement. The Iceberg table/view spec should be a
pure format spec that specifies how the file is laid out in storage.

-Jack

On Thu, Feb 29, 2024 at 1:22 PM Amogh Jahagirdar  wrote:

> I want to echo Dan's point that just because there is a separate spec for
> a REST Catalog does not mean that implementations can deviate from the
> spec's definition of the commit protocol or metadata layout, and still be
> considered "spec compliant".
>
> > Secondly, once we do that, we should declare REST spec as the official
> catalog spec to interact with Iceberg tables. Otherwise at least I will be
> very tempted to just break the atomic pointer swap pattern and store the
> entire metadata using the Glue Table object to achieve much better
> performance and also Glue native feature integrations, and I think other
> players will be equally motivated to do something similar. That will lead
> to even more chaos in the Iceberg catalog space.
>
> On this, a second point I want to make is around the openness of this
> ecosystem. We all already know that openness (the file formats, the
> metadata layout, the spec itself) is a fundamental tenet of the project.
> If we take the provided example of removing the metadata JSON file and
> moving it to some other storage, I think that goes against this principle
> since a JSON file is quite open by definition. Going back to the first
> point, I think a catalog which has such a behavior would *not* be
> considered spec compliant. Another reason this is important is if we think
> about what's healthiest for all users of Iceberg, is to have a healthy list
> of options for catalog choices. Storing the metadata JSON in non-open ways
> can make users' lives harder for trying out new catalogs since now the
> metadata would be stored in their own way, and the users will have a harder
> time accessing their own data.
>
> A last point I'd like to make is I think there's a good discussion to be
> had on how do we validate that a REST Catalog implementation is spec
> compliant. I think that's really beneficial for the ecosystem as a whole.
> Before that, I think first though we'd want to conclude on this topic
> itself.
>
> On Thu, Feb 29, 2024 at 12:29 PM Daniel Weeks 
> wrote:
>
>> > REST spec-compliant catalog does not need to follow the Iceberg spec to
>> commit or store metadata
>>
>> If the REST implementation doesn't follow the Iceberg spec for commit
>> requirements, it's not compliant with the spec.  There's no exemption that
>> says if you're using REST you don't need to follow the spec.  Why do you
>> think that's the case?
>>
>> I don't believe there's a reason to say that the REST spec needs to
>> enforce the commit requirements either, that's a requirement of the Iceberg
>> spec and still needs to be complied with.
>>
>> -Dan
>>
>> On Thu, Feb 29, 2024 at 12:19 PM Jack Ye  wrote:
>>
>>> > The implementation of the spec can either be compliant or not.
>>>
>>> This is exactly the problem we are talking about right? Just to give an
>>> example, we cannot technically say that tables/views in the Tabular catalog
>>> are Iceberg tables/views, because a REST spec-compliant catalog does not
>>> need to follow the Iceberg spec to commit or store metadata. Even if you
>>> say it is, there is no way to really prove that, because the REST spec does
>>> not enforce it.
>>>
>>> JB, what do you mean by participating on the Catalog RFC? Is there
>>> already an ongoing RFC?
>>>
>>> -Jack
>>>
>>>
>>> On Thu, Feb 29, 2024 at 12:08 PM Jean-Baptiste Onofré 
>>> wrote:
>>>
>>>> Hi Dan,
>>>>
>>>> I agree with your statement that the REST Spec is not an implementation,
>>>> but I strongly disagree with your statement "impl of the spec can either be
>>>> compliant or not".
>>>>
>>>> The REST Catalog spec impl should be consistent with the REST Spec.
>>>> That's why a reference implementation in Iceberg would be a must, with a
>>>> TCK.
>>>>
>>>> The REST Spec should bridge/give access to Table/View metadata. I think
>>>> it would make sense to have a resource to GET the Table/View metadata, also
>>>> supporting PUT to update.
>>>> JSON Schema and eventually JSON RPC could help on some area here
>>>> (compliant with OpenAPI).
>>>>
>>>> In another thread, I propose to work on a Catalog RFC, exactly to
>>>> ta

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Amogh Jahagirdar
I want to echo Dan's point that just because there is a separate spec for a
REST Catalog does not mean that implementations can deviate from the spec's
definition of the commit protocol or metadata layout, and still be
considered "spec compliant".

> Secondly, once we do that, we should declare REST spec as the official
catalog spec to interact with Iceberg tables. Otherwise at least I will be
very tempted to just break the atomic pointer swap pattern and store the
entire metadata using the Glue Table object to achieve much better
performance and also Glue native feature integrations, and I think other
players will be equally motivated to do something similar. That will lead
to even more chaos in the Iceberg catalog space.

On this, a second point I want to make is around the openness of this
ecosystem. We all already know that openness (the file formats, the
metadata layout, the spec itself) is a fundamental tenet of the project.
If we take the provided example of removing the metadata JSON file and
moving it to some other storage, I think that goes against this principle
since a JSON file is quite open by definition. Going back to the first
point, I think a catalog with such behavior would *not* be
considered spec compliant. This also matters because what's healthiest
for all users of Iceberg is a healthy list of catalog options. Storing
the metadata JSON in non-open ways can make users' lives harder when
trying out new catalogs, since the metadata would be stored in each
catalog's own way and users would have a harder time accessing their own
data.

A last point I'd like to make is that I think there's a good discussion to
be had on how we validate that a REST Catalog implementation is spec
compliant. I think that's really beneficial for the ecosystem as a whole.
Before that, though, I think we'd first want to conclude this topic
itself.

On Thu, Feb 29, 2024 at 12:29 PM Daniel Weeks 
wrote:

> > REST spec-compliant catalog does not need to follow the Iceberg spec to
> commit or store metadata
>
> If the REST implementation doesn't follow the Iceberg spec for commit
> requirements, it's not compliant with the spec.  There's no exemption that
> says if you're using REST you don't need to follow the spec.  Why do you
> think that's the case?
>
> I don't believe there's a reason to say that the REST spec needs to
> enforce the commit requirements either, that's a requirement of the Iceberg
> spec and still needs to be complied with.
>
> -Dan
>
> On Thu, Feb 29, 2024 at 12:19 PM Jack Ye  wrote:
>
>> > The implementation of the spec can either be compliant or not.
>>
>> This is exactly the problem we are talking about right? Just to give an
>> example, we cannot technically say that tables/views in the Tabular catalog
>> are Iceberg tables/views, because a REST spec-compliant catalog does not
>> need to follow the Iceberg spec to commit or store metadata. Even if you
>> say it is, there is no way to really prove that, because the REST spec does
>> not enforce it.
>>
>> JB, what do you mean by participating on the Catalog RFC? Is there
>> already an ongoing RFC?
>>
>> -Jack
>>
>>
>> On Thu, Feb 29, 2024 at 12:08 PM Jean-Baptiste Onofré 
>> wrote:
>>
>>> Hi Dan,
>>>
>>> I agree with your statement that the REST Spec is not an implementation,
>>> but I strongly disagree with your statement "impl of the spec can either be
>>> compliant or not".
>>>
>>> The REST Catalog spec impl should be consistent with the REST Spec.
>>> That's why a reference implementation in Iceberg would be a must, with a
>>> TCK.
>>>
>>> The REST Spec should bridge/give access to Table/View metadata. I think
>>> it would make sense to have a resource to GET the Table/View metadata, also
>>> supporting PUT to update.
>>> JSON Schema and eventually JSON RPC could help on some area here
>>> (compliant with OpenAPI).
>>>
>>> In another thread, I propose to work on a Catalog RFC, exactly to target
>>> this. I think it would make sense to have the REST/Catalog RFC as the main
>>> catalog API, so it has to be both consistent (giving access to table/view
>>> metadata) and extensible (via OpenAPI Extensions for instance).
>>>
>>> So, I agree with Jack: the minimum would be to have JSON metadata
>>> exposed by the REST Spec.
>>>
>>> @Jack, short term I'm in favor of your proposal, long term, I propose to
>>> participate on the Catalog RFC (REST Spec). WDYT ?
>>>
>>> Thanks !
>>> Regards
>>> JB
>>>
>>>
>>> Le jeu. 29 févr. 2024 à 20:47, Daniel Weeks 
>>> a écrit :
>>>
>>>> Hey Jack,
>>>>
>>>> I'm not sure I agree with the framing of this argument.  The REST Spec
>>>> defines a protocol, not an implementation.
>>>>
>>>> The implementation of the spec can either be compliant or not.  So a
>>>> REST Implementation that adheres to all the requirements (atomic location
>>>> swap, json representation, etc.), would be compliant.  There's no
>>>> requirement around who performs these operations and with REST, that

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jean-Baptiste Onofré
Hey Jack

It's a proposal in another thread (community effort on Catalog RFC).

Regards
JB

On Thu, Feb 29, 2024 at 9:19 PM Jack Ye  wrote:
>
> > The implementation of the spec can either be compliant or not.
>
> This is exactly the problem we are talking about right? Just to give an 
> example, we cannot technically say that tables/views in the Tabular catalog 
> are Iceberg tables/views, because a REST spec-compliant catalog does not need 
> to follow the Iceberg spec to commit or store metadata. Even if you say it 
> is, there is no way to really prove that, because the REST spec does not 
> enforce it.
>
> JB, what do you mean by participating on the Catalog RFC? Is there already an 
> ongoing RFC?
>
> -Jack
>
>
> On Thu, Feb 29, 2024 at 12:08 PM Jean-Baptiste Onofré  
> wrote:
>>
>> Hi Dan,
>>
>> I agree with your statement that the REST Spec is not an implementation,
>> but I strongly disagree with your statement "impl of the spec can either be
>> compliant or not".
>>
>> The REST Catalog spec impl should be consistent with the REST Spec. That's 
>> why a reference implementation in Iceberg would be a must, with a TCK.
>>
>> The REST Spec should bridge/give access to Table/View metadata. I think it 
>> would make sense to have a resource to GET the Table/View metadata, also 
>> supporting PUT to update.
>> JSON Schema and eventually JSON RPC could help on some area here (compliant 
>> with OpenAPI).
>>
>> In another thread, I propose to work on a Catalog RFC, exactly to target 
>> this. I think it would make sense to have the REST/Catalog RFC as the main 
>> catalog API, so it has to be both consistent (giving access to table/view 
>> metadata) and extensible (via OpenAPI Extensions for instance).
>>
>> So, I agree with Jack: the minimum would be to have JSON metadata exposed by 
>> the REST Spec.
>>
>> @Jack, short term I'm in favor of your proposal, long term, I propose to 
>> participate on the Catalog RFC (REST Spec). WDYT ?
>>
>> Thanks !
>> Regards
>> JB
>>
>>
>> Le jeu. 29 févr. 2024 à 20:47, Daniel Weeks  a 
>> écrit :
>>>
>>> Hey Jack,
>>>
>>> I'm not sure I agree with the framing of this argument.  The REST Spec 
>>> defines a protocol, not an implementation.
>>>
>>> The implementation of the spec can either be compliant or not.  So a REST 
>>> Implementation that adheres to all the requirements (atomic location swap, 
>>> json representation, etc.), would be compliant.  There's no requirement 
>>> around who performs these operations and with REST, that is delegated to 
>>> the server.  The optional metadata location doesn't mean that there isn't a 
>>> metadata location, just that it may not be exposed directly in the response.
>>>
>>> Therefore, an implementation where you just store the table metadata in a 
>>> Glue Table object, would not be compliant, currently.
>>>
>>> We've periodically discussed removing the storage requirement and I think 
>>> there's a path forward to do that and would agree that standardizing on 
>>> REST, but I wouldn't say the justification for making this push is that 
>>> REST is not compliant so we can just ignore the table spec requirements.
>>>
>>> There are a few more things to consider, which is that not everything can 
>>> use REST currently and making a hard cut away from file based metadata 
>>> could bifurcate access to Iceberg data.  There are also aspects to the spec 
>>> that reference the metadata paths (like metadata log, though it's 
>>> optional), but would likely need to be addressed.
>>>
>>> -Dan
>>>
>>>
>>>
>>> On Thu, Feb 29, 2024 at 11:13 AM Jack Ye wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> Just want to pull this specific topic out of the materialized view
>>>> discussion thread. I noticed this during the MV discussion, and I think it
>>>> is important to clarify this not just for the MV topic, but also for the
>>>> ongoing discussion to consolidate all the different catalogs.
>>>>
>>>> How the table/view spec defines Iceberg table/view
>>>>
>>>> If we look into the table/view spec, the optimistic concurrency section
>>>> requires the existence of a metadata file, and the atomic swap of the
>>>> metadata file ensures serializable isolation. This implies 2 things:
>>>> 1. the metadata file in a storage that holds the information described in
>>>> the rest of the spec.
>>>> 2. there is an object in a catalog that holds the pointer of the metadata
>>>> file. What object and what catalog is implementation dependent, but these
>>>> generalized concepts are always intact.
>>>>
>>>> The JSON serialization parts of the spec plus the reader requirements also
>>>> implies that the metadata file is in JSON format.
>>>>
>>>> So when we talk about an Iceberg table/view that is compliant with the
>>>> spec, it is the combination of all these 5 requirements:
>>>> 1. there is an object in the catalog representing this table/view
>>>> 2. there is a pointer to a JSON metadata file in the object
>>>> 3. the JSON metadata file exists in storage and contai

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jean-Baptiste Onofré
Hey Dan

IMHO, the REST Spec should provide access to the Iceberg spec layer. I'm
not saying both should be in sync, but the REST Spec should expose the
resources of the Iceberg Spec.

Otherwise, I would consider it incomplete and limited in terms of features.

Regards
JB

On Thu, Feb 29, 2024 at 9:28 PM Daniel Weeks wrote:
>
> > REST spec-compliant catalog does not need to follow the Iceberg spec to 
> > commit or store metadata
>
> If the REST implementation doesn't follow the Iceberg spec for commit 
> requirements, it's not compliant with the spec.  There's no exemption that 
> says if you're using REST you don't need to follow the spec.  Why do you 
> think that's the case?
>
> I don't believe there's a reason to say that the REST spec needs to enforce 
> the commit requirements either, that's a requirement of the Iceberg spec and 
> still needs to be complied with.
>
> -Dan
>
> On Thu, Feb 29, 2024 at 12:19 PM Jack Ye  wrote:
>>
>> > The implementation of the spec can either be compliant or not.
>>
>> This is exactly the problem we are talking about right? Just to give an 
>> example, we cannot technically say that tables/views in the Tabular catalog 
>> are Iceberg tables/views, because a REST spec-compliant catalog does not 
>> need to follow the Iceberg spec to commit or store metadata. Even if you say 
>> it is, there is no way to really prove that, because the REST spec does not 
>> enforce it.
>>
>> JB, what do you mean by participating on the Catalog RFC? Is there already 
>> an ongoing RFC?
>>
>> -Jack
>>
>>
>> On Thu, Feb 29, 2024 at 12:08 PM Jean-Baptiste Onofré  
>> wrote:
>>>
>>> Hi Dan,
>>>
>>> I agree with your statement about REST Spec is not an implement but I 
>>> strongly disagree with your statement "impl of the spec can either be 
>>> compliant or not".
>>>
>>> The REST Catalog spec impl should be consistent with the REST Spec. That's 
>>> why a reference implementation in Iceberg would be a must, with a TCK.
>>>
>>> The REST Spec should bridge/give access to Table/View metadata. I think it 
>>> would make sense to have a resource to GET the Table/View metadata, also 
>>> supporting PUT to update.
>>> JSON Schema and eventually JSON RPC could help on some area here (compliant 
>>> with OpenAPI).
>>>
>>> In another thread, I propose to work on a Catalog RFC, exactly to target 
>>> this. I think it would make sense to have the REST/Catalog RFC as the main 
>>> catalog API, so it has to be both consistent (giving access to table/view 
>>> metadata) and extensible (via OpenAPI Extensions for instance).
>>>
>>> So, I agree with Jack: the minimum would be to have JSON metadata exposed 
>>> by the REST Spec.
>>>
>>> @Jack, short term I'm in favor of your proposal, long term, I propose to 
>>> participate on the Catalog RFC (REST Spec). WDYT ?
>>>
>>> Thanks !
>>> Regards
>>> JB
>>>
>>>
>>> On Thu, Feb 29, 2024 at 8:47 PM Daniel Weeks wrote:

>>>> Hey Jack,
>>>>
>>>> I'm not sure I agree with the framing of this argument.  The REST Spec
>>>> defines a protocol, not an implementation.
>>>>
>>>> The implementation of the spec can either be compliant or not.  So a REST
>>>> Implementation that adheres to all the requirements (atomic location swap,
>>>> json representation, etc.), would be compliant.  There's no requirement
>>>> around who performs these operations and with REST, that is delegated to
>>>> the server.  The optional metadata location doesn't mean that there isn't
>>>> a metadata location, just that it may not be exposed directly in the
>>>> response.
>>>>
>>>> Therefore, an implementation where you just store the table metadata in a
>>>> Glue Table object, would not be compliant, currently.
>>>>
>>>> We've periodically discussed removing the storage requirement and I think
>>>> there's a path forward to do that and would agree that standardizing on
>>>> REST, but I wouldn't say the justification for making this push is that
>>>> REST is not compliant so we can just ignore the table spec requirements.
>>>>
>>>> There are a few more things to consider, which is that not everything can
>>>> use REST currently and making a hard cut away from file based metadata
>>>> could bifurcate access to Iceberg data.  There are also aspects to the
>>>> spec that reference the metadata paths (like metadata log, though it's
>>>> optional), but would likely need to be addressed.
>>>>
>>>> -Dan
>>>>
>>>>
>>>>
>>>> On Thu, Feb 29, 2024 at 11:13 AM Jack Ye wrote:
>>>>>
>>>>> Hi everyone,
>>>>>
>>>>> Just want to pull this specific topic out of the materialized view
>>>>> discussion thread. I noticed this during the MV discussion, and I think
>>>>> it is important to clarify this not just for the MV topic, but also for
>>>>> the ongoing discussion to consolidate all the different catalogs.
>>>>>
>>>>> How the table/view spec defines Iceberg table/view
>>>>>
>>>>> If we look into the table/view spec, the optimistic concurrency section
>>>>> requires the existence of a metadata fil

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Daniel Weeks
> REST spec-compliant catalog does not need to follow the Iceberg spec to
> commit or store metadata

If the REST implementation doesn't follow the Iceberg spec for commit
requirements, it's not compliant with the spec.  There's no exemption that
says if you're using REST you don't need to follow the spec.  Why do you
think that's the case?

I don't believe there's a reason to say that the REST spec needs to enforce
the commit requirements either; that's a requirement of the Iceberg spec
and still needs to be complied with.

-Dan

On Thu, Feb 29, 2024 at 12:19 PM Jack Ye wrote:

> > The implementation of the spec can either be compliant or not.
>
> This is exactly the problem we are talking about right? Just to give an
> example, we cannot technically say that tables/views in the Tabular catalog
> are Iceberg tables/views, because a REST spec-compliant catalog does not
> need to follow the Iceberg spec to commit or store metadata. Even if you
> say it is, there is no way to really prove that, because the REST spec does
> not enforce it.
>
> JB, what do you mean by participating on the Catalog RFC? Is there already
> an ongoing RFC?
>
> -Jack
>
>
> On Thu, Feb 29, 2024 at 12:08 PM Jean-Baptiste Onofré 
> wrote:
>
>> Hi Dan,
>>
>> I agree with your statement about REST Spec is not an implement but I
>> strongly disagree with your statement "impl of the spec can either be
>> compliant or not".
>>
>> The REST Catalog spec impl should be consistent with the REST Spec.
>> That's why a reference implementation in Iceberg would be a must, with a
>> TCK.
>>
>> The REST Spec should bridge/give access to Table/View metadata. I think
>> it would make sense to have a resource to GET the Table/View metadata, also
>> supporting PUT to update.
>> JSON Schema and eventually JSON RPC could help on some area here
>> (compliant with OpenAPI).
>>
>> In another thread, I propose to work on a Catalog RFC, exactly to target
>> this. I think it would make sense to have the REST/Catalog RFC as the main
>> catalog API, so it has to be both consistent (giving access to table/view
>> metadata) and extensible (via OpenAPI Extensions for instance).
>>
>> So, I agree with Jack: the minimum would be to have JSON metadata exposed
>> by the REST Spec.
>>
>> @Jack, short term I'm in favor of your proposal, long term, I propose to
>> participate on the Catalog RFC (REST Spec). WDYT ?
>>
>> Thanks !
>> Regards
>> JB
>>
>>
>> On Thu, Feb 29, 2024 at 8:47 PM Daniel Weeks wrote:
>>
>>> Hey Jack,
>>>
>>> I'm not sure I agree with the framing of this argument.  The REST Spec
>>> defines a protocol, not an implementation.
>>>
>>> The implementation of the spec can either be compliant or not.  So a
>>> REST Implementation that adheres to all the requirements (atomic location
>>> swap, json representation, etc.), would be compliant.  There's no
>>> requirement around who performs these operations and with REST, that is
>>> delegated to the server.  The optional metadata location doesn't mean that
>>> there isn't a metadata location, just that it may not be exposed directly
>>> in the response.
>>>
>>> Therefore, an implementation where you just store the table metadata in
>>> a Glue Table object, would not be compliant, currently.
>>>
>>> We've periodically discussed removing the storage requirement and I
>>> think there's a path forward to do that and would agree that standardizing
>>> on REST, but I wouldn't say the justification for making this push is that
>>> REST is not compliant so we can just ignore the table spec requirements.
>>>
>>> There are a few more things to consider, which is that not everything
>>> can use REST currently and making a hard cut away from file based metadata
>>> could bifurcate access to Iceberg data.  There are also aspects to the spec
>>> that reference the metadata paths (like metadata log, though it's
>>> optional), but would likely need to be addressed.
>>>
>>> -Dan
>>>
>>>
>>>
>>> On Thu, Feb 29, 2024 at 11:13 AM Jack Ye  wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Just want to pull this specific topic out of the materialized view
>>>> discussion thread. I noticed this during the MV discussion, and I think it
>>>> is important to clarify this not just for the MV topic, but also for the
>>>> ongoing discussion to consolidate all the different catalogs.
>>>>
>>>> *How the table/view spec defines Iceberg table/view*
>>>>
>>>> If we look into the table/view spec, the optimistic concurrency section
>>>> requires the
>>>> existence of a metadata file, and the atomic swap of the metadata file
>>>> ensures serializable isolation. This implies 2 things:
>>>> 1. the metadata file in a storage that holds the information described
>>>> in the rest of the spec.
>>>> 2. there is an object in a catalog that holds the pointer of the
>>>> metadata file. What object and what catalog is implementation dependent,
>>>> but these generalized concepts are always intact.


Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jack Ye
> The implementation of the spec can either be compliant or not.

This is exactly the problem we are talking about, right? Just to give an
example, we cannot technically say that tables/views in the Tabular catalog
are Iceberg tables/views, because a REST spec-compliant catalog does not
need to follow the Iceberg spec to commit or store metadata. Even if you
say they are, there is no way to really prove that, because the REST spec
does not enforce it.
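
To make the point concrete, here is a minimal sketch from the client's side. It assumes a load-table response is a JSON object with a required "metadata" field and an optional "metadata-location" field; the function name and the returned strings are hypothetical and only for illustration.

```python
from typing import Optional


def reported_metadata_location(load_table_response: dict) -> Optional[str]:
    """Return the metadata file location a server chose to report, if any.

    Assumption: the response carries the table metadata inline under
    "metadata" and may or may not include "metadata-location".
    """
    if "metadata" not in load_table_response:
        raise ValueError("response does not contain table metadata")
    # The field is optional, so its absence is a perfectly valid response.
    return load_table_response.get("metadata-location")


def describe(load_table_response: dict) -> str:
    location = reported_metadata_location(load_table_response)
    if location is None:
        return "inline metadata only; no file location exposed"
    # Even here the client only sees a claimed path; it cannot observe
    # how the server committed or whether a pointer swap took place.
    return "inline metadata; server reports file at " + location
```

Both kinds of response are acceptable to a client, and neither lets the client check how the server stored or committed the metadata.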

JB, what do you mean by participating in the Catalog RFC? Is there already
an ongoing RFC?

-Jack


On Thu, Feb 29, 2024 at 12:08 PM Jean-Baptiste Onofré wrote:

> Hi Dan,
>
> I agree with your statement about REST Spec is not an implement but I
> strongly disagree with your statement "impl of the spec can either be
> compliant or not".
>
> The REST Catalog spec impl should be consistent with the REST Spec. That's
> why a reference implementation in Iceberg would be a must, with a TCK.
>
> The REST Spec should bridge/give access to Table/View metadata. I think it
> would make sense to have a resource to GET the Table/View metadata, also
> supporting PUT to update.
> JSON Schema and eventually JSON RPC could help on some area here
> (compliant with OpenAPI).
>
> In another thread, I propose to work on a Catalog RFC, exactly to target
> this. I think it would make sense to have the REST/Catalog RFC as the main
> catalog API, so it has to be both consistent (giving access to table/view
> metadata) and extensible (via OpenAPI Extensions for instance).
>
> So, I agree with Jack: the minimum would be to have JSON metadata exposed
> by the REST Spec.
>
> @Jack, short term I'm in favor of your proposal, long term, I propose to
> participate on the Catalog RFC (REST Spec). WDYT ?
>
> Thanks !
> Regards
> JB
>
>
> On Thu, Feb 29, 2024 at 8:47 PM Daniel Weeks wrote:
>
>> Hey Jack,
>>
>> I'm not sure I agree with the framing of this argument.  The REST Spec
>> defines a protocol, not an implementation.
>>
>> The implementation of the spec can either be compliant or not.  So a REST
>> Implementation that adheres to all the requirements (atomic location swap,
>> json representation, etc.), would be compliant.  There's no requirement
>> around who performs these operations and with REST, that is delegated to
>> the server.  The optional metadata location doesn't mean that there isn't a
>> metadata location, just that it may not be exposed directly in the response.
>>
>> Therefore, an implementation where you just store the table metadata in a
>> Glue Table object, would not be compliant, currently.
>>
>> We've periodically discussed removing the storage requirement and I think
>> there's a path forward to do that and would agree that standardizing on
>> REST, but I wouldn't say the justification for making this push is that
>> REST is not compliant so we can just ignore the table spec requirements.
>>
>> There are a few more things to consider, which is that not everything can
>> use REST currently and making a hard cut away from file based metadata
>> could bifurcate access to Iceberg data.  There are also aspects to the spec
>> that reference the metadata paths (like metadata log, though it's
>> optional), but would likely need to be addressed.
>>
>> -Dan
>>
>>
>>
>> On Thu, Feb 29, 2024 at 11:13 AM Jack Ye  wrote:
>>
>>> Hi everyone,
>>>
>>> Just want to pull this specific topic out of the materialized view
>>> discussion thread. I noticed this during the MV discussion, and I think it
>>> is important to clarify this not just for the MV topic, but also for the
>>> ongoing discussion to consolidate all the different catalogs.
>>>
>>> *How the table/view spec defines Iceberg table/view*
>>>
>>> If we look into the table/view spec, the optimistic concurrency section
>>> requires the
>>> existence of a metadata file, and the atomic swap of the metadata file
>>> ensures serializable isolation. This implies 2 things:
>>> 1. the metadata file in a storage that holds the information described
>>> in the rest of the spec.
>>> 2. there is an object in a catalog that holds the pointer of the
>>> metadata file. What object and what catalog is implementation dependent,
>>> but these generalized concepts are always intact.
>>>
>>> The JSON serialization parts of the spec plus the reader requirements
>>> also implies that the metadata file is in JSON format.
>>>
>>> So when we talk about an Iceberg table/view that is compliant with the
>>> spec, it is the combination of all these 5 requirements:
>>> 1. there is an object in the catalog representing this table/view
>>> 2. there is a pointer to a JSON metadata file in the object
>>> 3. the JSON metadata file exists in storage and contains the table/view
>>> metadata content
>>> 4. the metadata content is compliant with the standard described in the
>>> spec
>>> 5. serializable isolation is achieved by atomic swap of the object
>>> pointer
>>>
>>> *How non-REST catalogs are comp

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Jean-Baptiste Onofré
Hi Dan,

I agree with your statement that the REST Spec is not an implementation, but
I strongly disagree with your statement "impl of the spec can either be
compliant or not".

The REST Catalog spec impl should be consistent with the REST Spec. That's
why a reference implementation in Iceberg would be a must, with a TCK.

The REST Spec should bridge/give access to Table/View metadata. I think it
would make sense to have a resource to GET the Table/View metadata, also
supporting PUT to update.
JSON Schema and eventually JSON-RPC could help in some areas here (compliant
with OpenAPI).
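
As a rough illustration of the idea (not an existing endpoint), a metadata resource with GET and PUT could behave like the in-memory sketch below. The class name, the (namespace, table) key, and the status codes are all assumptions made for the example.

```python
import json


class TableMetadataResource:
    """Hypothetical resource exposing table/view metadata as JSON."""

    def __init__(self):
        # (namespace, table) -> serialized JSON metadata
        self._store = {}

    def handle(self, method, namespace, table, body=None):
        """Return an (HTTP-like status, payload) pair for one request."""
        key = (namespace, table)
        if method == "GET":
            if key not in self._store:
                return 404, None
            return 200, json.loads(self._store[key])
        if method == "PUT":
            # Unconditional overwrite: this sketch deliberately ignores the
            # commit requirements debated in this thread.
            self._store[key] = json.dumps(body)
            return 204, None
        return 405, None
```

A real update path would still have to satisfy whatever commit requirements the table/view spec imposes; the sketch only shows the shape of the resource.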

In another thread, I propose to work on a Catalog RFC, exactly to target
this. I think it would make sense to have the REST/Catalog RFC as the main
catalog API, so it has to be both consistent (giving access to table/view
metadata) and extensible (via OpenAPI Extensions for instance).

So, I agree with Jack: the minimum would be to have JSON metadata exposed
by the REST Spec.

@Jack, short term I'm in favor of your proposal; long term, I propose to
participate in the Catalog RFC (REST Spec). WDYT?

Thanks!
Regards
JB


On Thu, Feb 29, 2024 at 8:47 PM Daniel Weeks wrote:

> Hey Jack,
>
> I'm not sure I agree with the framing of this argument.  The REST Spec
> defines a protocol, not an implementation.
>
> The implementation of the spec can either be compliant or not.  So a REST
> Implementation that adheres to all the requirements (atomic location swap,
> json representation, etc.), would be compliant.  There's no requirement
> around who performs these operations and with REST, that is delegated to
> the server.  The optional metadata location doesn't mean that there isn't a
> metadata location, just that it may not be exposed directly in the response.
>
> Therefore, an implementation where you just store the table metadata in a
> Glue Table object, would not be compliant, currently.
>
> We've periodically discussed removing the storage requirement and I think
> there's a path forward to do that and would agree that standardizing on
> REST, but I wouldn't say the justification for making this push is that
> REST is not compliant so we can just ignore the table spec requirements.
>
> There are a few more things to consider, which is that not everything can
> use REST currently and making a hard cut away from file based metadata
> could bifurcate access to Iceberg data.  There are also aspects to the spec
> that reference the metadata paths (like metadata log, though it's
> optional), but would likely need to be addressed.
>
> -Dan
>
>
>
> On Thu, Feb 29, 2024 at 11:13 AM Jack Ye  wrote:
>
>> Hi everyone,
>>
>> Just want to pull this specific topic out of the materialized view
>> discussion thread. I noticed this during the MV discussion, and I think it
>> is important to clarify this not just for the MV topic, but also for the
>> ongoing discussion to consolidate all the different catalogs.
>>
>> *How the table/view spec defines Iceberg table/view*
>>
>> If we look into the table/view spec, the optimistic concurrency section
>> requires the
>> existence of a metadata file, and the atomic swap of the metadata file
>> ensures serializable isolation. This implies 2 things:
>> 1. the metadata file in a storage that holds the information described in
>> the rest of the spec.
>> 2. there is an object in a catalog that holds the pointer of the metadata
>> file. What object and what catalog is implementation dependent, but these
>> generalized concepts are always intact.
>>
>> The JSON serialization parts of the spec plus the reader requirements
>> also implies that the metadata file is in JSON format.
>>
>> So when we talk about an Iceberg table/view that is compliant with the
>> spec, it is the combination of all these 5 requirements:
>> 1. there is an object in the catalog representing this table/view
>> 2. there is a pointer to a JSON metadata file in the object
>> 3. the JSON metadata file exists in storage and contains the table/view
>> metadata content
>> 4. the metadata content is compliant with the standard described in the
>> spec
>> 5. serializable isolation is achieved by atomic swap of the object pointer
>>
>> *How non-REST catalogs are compliant with the table/view spec*
>>
>> An implementation of the Iceberg table/view is essentially specifying:
>> 1. what is the exact implementation of the catalog, e.g. JDBC, Hive
>> metastore (HMS), Glue, etc.
>> 2. what is the object that represents a table, e.g. a row in the
>> "iceberg_tables" table in JDBC, a Table object in HMS/Glue, etc.
>> 3. how is the JSON metadata file pointer stored, e.g. a column in the
>> table's row in JDBC, metadata_location key in the Table's parameter map in
>> HMS/Glue, etc.
>> 4. how the atomic swap is implemented, e.g. SQL atomic update in JDBC,
>> conditional parameter update in HMS, conditional version update in Glue,
>> etc.
>>
>> *How the REST spec is NOT compliant with the table/view spec*
>

Re: Inconsistency between REST spec and table/view spec

2024-02-29 Thread Daniel Weeks
Hey Jack,

I'm not sure I agree with the framing of this argument.  The REST Spec
defines a protocol, not an implementation.

The implementation of the spec can either be compliant or not.  So a REST
Implementation that adheres to all the requirements (atomic location swap,
json representation, etc.), would be compliant.  There's no requirement
around who performs these operations and with REST, that is delegated to
the server.  The optional metadata location doesn't mean that there isn't a
metadata location, just that it may not be exposed directly in the response.

Therefore, an implementation where you just store the table metadata in a
Glue Table object, would not be compliant, currently.
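
To illustrate the compliant case, here is a minimal sketch of a server that writes a metadata file, swaps the pointer atomically, and may or may not expose the location in its response. Everything in it (class names, the in-memory "storage", the expose_location flag) is hypothetical and not taken from any real catalog.

```python
import threading


class CommitConflict(Exception):
    """Raised when the stored pointer is not the one the committer expected."""


class SketchCatalogServer:
    """Hypothetical REST catalog backend that keeps a metadata file pointer."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pointers = {}  # table name -> current metadata file location
        self._files = {}     # stands in for JSON files in object storage

    def commit(self, table, expected_location, new_location, metadata):
        # Write the new metadata file first, then swap the pointer.
        self._files[new_location] = metadata
        with self._lock:
            # Compare-and-swap: only move the pointer if nobody else did.
            if self._pointers.get(table) != expected_location:
                raise CommitConflict(table)
            self._pointers[table] = new_location

    def load_table(self, table, expose_location=False):
        location = self._pointers[table]
        response = {"metadata": self._files[location]}
        # The location always exists on the server; returning it is optional.
        if expose_location:
            response["metadata-location"] = location
        return response
```

The swap happens entirely on the server, so a client sees the same metadata whether or not the location is included in the response.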

We've periodically discussed removing the storage requirement, and I think
there's a path forward to do that. I would also agree with standardizing on
REST, but I wouldn't say the justification for making this push is that
REST is not compliant so we can just ignore the table spec requirements.

There are a few more things to consider: not everything can use REST
currently, and making a hard cut away from file-based metadata could
bifurcate access to Iceberg data.  There are also aspects of the spec
that reference the metadata paths (like the metadata log, though it's
optional) that would likely need to be addressed.

-Dan



On Thu, Feb 29, 2024 at 11:13 AM Jack Ye wrote:

> Hi everyone,
>
> Just want to pull this specific topic out of the materialized view
> discussion thread. I noticed this during the MV discussion, and I think it
> is important to clarify this not just for the MV topic, but also for the
> ongoing discussion to consolidate all the different catalogs.
>
> *How the table/view spec defines Iceberg table/view*
>
> If we look into the table/view spec, the optimistic concurrency section
> requires the existence of a metadata file, and the atomic swap of the
> metadata file ensures serializable isolation. This implies 2 things:
> 1. there is a metadata file in storage that holds the information
> described in the rest of the spec.
> 2. there is an object in a catalog that holds the pointer to the metadata
> file. Which object and which catalog is implementation dependent, but these
> generalized concepts are always intact.
>
> The JSON serialization parts of the spec plus the reader requirements also
> imply that the metadata file is in JSON format.
>
> So when we talk about an Iceberg table/view that is compliant with the
> spec, it is the combination of all these 5 requirements:
> 1. there is an object in the catalog representing this table/view
> 2. there is a pointer to a JSON metadata file in the object
> 3. the JSON metadata file exists in storage and contains the table/view
> metadata content
> 4. the metadata content is compliant with the standard described in the
> spec
> 5. serializable isolation is achieved by atomic swap of the object pointer
>
> *How non-REST catalogs are compliant with the table/view spec*
>
> An implementation of the Iceberg table/view is essentially specifying:
> 1. what is the exact implementation of the catalog, e.g. JDBC, Hive
> metastore (HMS), Glue, etc.
> 2. what is the object that represents a table, e.g. a row in the
> "iceberg_tables" table in JDBC, a Table object in HMS/Glue, etc.
> 3. how is the JSON metadata file pointer stored, e.g. a column in the
> table's row in JDBC, metadata_location key in the Table's parameter map in
> HMS/Glue, etc.
> 4. how the atomic swap is implemented, e.g. SQL atomic update in JDBC,
> conditional parameter update in HMS, conditional version update in Glue,
> etc.
>
> *How the REST spec is NOT compliant with the table/view spec*
>
> The REST spec technically does not match the following table/view spec
> requirements:
> 2. there is a pointer to a JSON metadata file in the object
> 3. the JSON metadata file exists in storage and contains the table/view
> metadata content
> 5. serializable isolation is achieved by atomic swap of the object pointer
>
> The key parts in REST spec that are not compliant are:
> 1. metadata-location field is optional in LoadTableResponse
> 2. pointer swap is not enforced in the UpdateTable operation
>
> Therefore, it opens the door for a REST service to be completely
> independent of a JSON metadata file, store the Iceberg table/view metadata
> somewhere other than a file, and achieve much better performance
> characteristics than other catalogs. This technically gives a unique
> advantage to REST catalog adopters that is not there for non-REST catalogs
> like HMS and Glue.
>
> *How can we fix this?*
>
> I suggest the following:
>
> Firstly, I think it is good that we try to remove the requirements of JSON
> metadata file pointer and atomic pointer swap. We know these requir