Re: Change default format-version of our forked Iceberg to v2

2023-03-20 Manu Zhang
Also, unit tests like TestCreateTableAsSelect#testCreateRTASWithPartitionSpecChanging
need to be updated to match the different logic and results of partition spec
updates in v2.

Regards,
Manu


Re: Change default format-version of our forked Iceberg to v2

2023-03-20 Manu Zhang
Thanks Gabor, I realized it was already done right after sending my last
reply. The setting is actually "table-default.".
In case someone else needs a backport as well, the related PR is
https://github.com/apache/iceberg/pull/4011

Regards,
Manu


Re: Change default format-version of our forked Iceberg to v2

2023-03-20 Gabor Kaszab
I believe the conclusion here was that there is already a catalog-level
property for adding table defaults. It could be used to make v2 the default
table format on a particular catalog. See my last email on this thread. One
thing I haven't checked is whether this property works for all catalog types
or just a subset of them, but I think it's worth trying it in your
environment.
It's the "table.default." setting.


Re: Change default format-version of our forked Iceberg to v2

2023-03-19 Manu Zhang
Is there any progress on making the default format version a catalog property?

Thanks,
Manu


Re: Change default format-version of our forked Iceberg to v2

2023-01-18 Gabor Kaszab
I also ran into this "table-default." setting prefix. It seems to me that
it's a catalog-level config, so it's enough to provide e.g.
"table-default.format-version" = "2" to each catalog as a startup flag. It
also seems that catalogs derived from BaseMetastoreCatalog use this
table-default prefix.

Gabor
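A minimal Python sketch of the prefix mechanism described above, assuming that a catalog strips the "table-default." prefix from its own properties and uses the remainder as the starting properties of every new table, with properties given explicitly at creation time taking precedence. The function names are hypothetical; this is an illustration, not Iceberg's actual Java implementation.

```python
from typing import Dict, Mapping

# Catalog-level prefix discussed in this thread.
TABLE_DEFAULT_PREFIX = "table-default."


def table_defaults(catalog_props: Mapping[str, str]) -> Dict[str, str]:
    """Return catalog properties that carry the prefix, with the prefix stripped."""
    return {
        key[len(TABLE_DEFAULT_PREFIX):]: value
        for key, value in catalog_props.items()
        if key.startswith(TABLE_DEFAULT_PREFIX)
    }


def new_table_properties(
    catalog_props: Mapping[str, str], table_props: Mapping[str, str]
) -> Dict[str, str]:
    """Start from the catalog defaults, then let explicit table properties win.

    The precedence (explicit table properties over catalog defaults) is an
    assumption of this sketch.
    """
    merged = table_defaults(catalog_props)
    merged.update(table_props)
    return merged
```

With catalog properties {"uri": "...", "table-default.format-version": "2"}, a table created without properties would get {"format-version": "2"}, while one created with {"format-version": "1"} would stay on v1; unprefixed catalog properties such as `uri` are not copied to tables.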


Re: Change default format-version of our forked Iceberg to v2

2023-01-17 Yufei Gu
The functionality already exists if we are talking about setting the
default format version at the Iceberg catalog level. For example, we can
configure a catalog like this, and all tables it creates will be v2 tables:
spark.sql.catalog.hive_prod.table-default.format-version = "2"

Of course, we need to set it for each Spark app. Setting it in Trino would be
easier, since it would be a single catalog-level change.

Best,

Yufei

`This is not a contribution`
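A small Python sketch that generates the per-application Spark settings mentioned above as spark-submit style `--conf` flags. It relies only on Spark's convention of passing every `spark.sql.catalog.<name>.*` option to the catalog named `<name>`; the helper name is hypothetical, and `hive_prod` is just the example catalog name from the message.

```python
from typing import List, Mapping


def spark_catalog_conf_flags(catalog: str, table_defaults: Mapping[str, str]) -> List[str]:
    """Turn per-table defaults into --conf flags for one Spark catalog.

    Spark hands every `spark.sql.catalog.<catalog>.<key>` setting to the
    catalog named <catalog>; per this thread, Iceberg catalogs then apply the
    `table-default.` subset to newly created tables.
    """
    flags: List[str] = []
    # Sort for a stable, reproducible command line.
    for key, value in sorted(table_defaults.items()):
        flags.append("--conf")
        flags.append(f"spark.sql.catalog.{catalog}.table-default.{key}={value}")
    return flags
```

For example, `spark_catalog_conf_flags("hive_prod", {"format-version": "2"})` returns `["--conf", "spark.sql.catalog.hive_prod.table-default.format-version=2"]`, which still has to be added to every Spark application, as noted above.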


Re: Change default format-version of our forked Iceberg to v2

2023-01-16 Gabor Kaszab
It seems we have a consensus on the approach. I can take a look at
implementing this if no one has any objections.

Gabor


Re: Change default format-version of our forked Iceberg to v2

2023-01-13 Ryan Blue
That sounds like a good idea to me.

-- 
Ryan Blue
Tabular


Re: Change default format-version of our forked Iceberg to v2

2023-01-13 Jack Ye
> I think the issue is that all of the built-in catalogs currently call the
> version of `newTableMetadata` that defaults to v1.

Yes, I think this is the key issue for the catalogs that extend
BaseMetastoreCatalog. It looks like we should make the default format
version a catalog property instead of hard-coding it in TableMetadata?

-Jack
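A hedged Python sketch of the proposal above: the format version of a new table comes from an explicit `format-version` table property if present, otherwise from a catalog-level default, otherwise from the hard-coded v1 fallback. The function, its parameters, the precedence order, and the set of supported versions are assumptions for illustration, not Iceberg's API.

```python
from typing import Mapping, Optional

HARD_CODED_DEFAULT = 1       # the v1 fallback the thread says built-in catalogs use
SUPPORTED_VERSIONS = (1, 2)  # versions accepted by this sketch


def resolve_format_version(
    table_props: Mapping[str, str],
    catalog_default: Optional[int] = None,
) -> int:
    """Pick the format version for a new table.

    Assumed precedence: explicit table property, then the catalog-level
    default, then the hard-coded fallback.
    """
    if "format-version" in table_props:
        version = int(table_props["format-version"])
    elif catalog_default is not None:
        version = catalog_default
    else:
        version = HARD_CODED_DEFAULT
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported format version: {version}")
    return version
```

Moving the default out of the metadata builder this way keeps the table-level override intact while letting each catalog, rather than TableMetadata, decide what "default" means.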


Re: Change default format-version of our forked Iceberg to v2

2023-01-12 Jean-Baptiste Onofré
Hi Gabor,

It makes sense to me. AFAIK, as table creation goes through the catalog
"controller", the catalog can "decide" the version. So it would be up to each
catalog to choose how, and with which version, it creates tables.

Regards
JB

On Wed, Jan 11, 2023 at 11:11 PM Gabor Kaszab  wrote:
>
> Naively asking: can't we add a property that tells Iceberg which version to
> use by default when creating tables (if there is no such setting currently)?
>
> Gabor
>
> On Wed, Jan 11, 2023 at 20:04, Jack Ye wrote:
>>
>> Should we start a community vote on this?
>>
>> I remember that in today's community sync meeting Russell briefly discussed
>> some compaction support that is not there yet, and that some users are
>> struggling with the small delete files issue; to some extent, that is why
>> Spark still defaults to v1.
>>
>> On the feature side, changelog scan is mostly there in Spark, and there
>> will likely be movement on the Trino side for it very soon as well.
>>
>> Overall, I think it would be beneficial to move the default to v2, which could
>> incentivize the completion of those missing parts across engines.
>>
>> Best,
>> Jack Ye
>>
>>
>>
>>
>> On Wed, Jan 11, 2023 at 5:47 AM Piotr Findeisen  
>> wrote:
>>>
>>> Hi,
>>>
>>> FWIW Trino already creates v2 tables by default.
>>> Thought it's worth sharing for context.
>>>
>>> Best
>>> PF
>>>
>>>
>>>
>>>
>>> On Tue, Jan 10, 2023 at 10:09 AM Manu Zhang  wrote:

 Hi all,

 We've maintained a forked Iceberg internally, and all our use cases involve
 v2 tables with row-level updates and deletes. Our users need to remember
 to create tables with the `'format-version'='2'` option or alter them
 afterwards.

 I'm thinking about changing the default format-version of our forked
 Iceberg to v2. Are there any concerns about this change? Any hidden issues
 I've missed?

 Thanks,
 Manu


Re: Change default format-version of our forked Iceberg to v2

2023-01-12 Thread Ryan Blue
Gabor makes a good point. This is controlled by catalogs, which can choose
to create either version. The Catalog interface is version-agnostic. I think
the issue is that all of the built-in catalogs currently call the version
of `newTableMetadata` that defaults to v1.

This is mostly an administrator setting, too. Do we want to have a library
default when this is something that should probably be chosen by a platform
person? It depends on what engines you're using, so we may just want to
make it a catalog config option.
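Assuming the catalog config option suggested here takes the form of the "table-default." prefix brought up later in this thread, a Spark application would set it per catalog. A minimal sketch: the helper `iceberg_catalog_conf` and the catalog name `hive_prod` are hypothetical; `spark.sql.catalog.<name>`, its `.type` key, and `org.apache.iceberg.spark.SparkCatalog` are the usual Iceberg-on-Spark catalog settings. It also assumes the underlying catalog honors the prefix (later in the thread this was only confirmed for catalogs derived from BaseMetastoreCatalog).

```python
from typing import Dict


def iceberg_catalog_conf(
    name: str, catalog_type: str, default_format_version: int
) -> Dict[str, str]:
    """Spark settings registering Iceberg catalog `name` with a table default.

    Hypothetical helper: tables this catalog creates without an explicit
    'format-version' property would get `default_format_version`, assuming
    the catalog applies "table-default." properties at creation time.
    """
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": catalog_type,
        f"{prefix}.table-default.format-version": str(default_format_version),
    }


if __name__ == "__main__":
    # One "key=value" line per setting, the shape used by spark-submit --conf.
    for key, value in sorted(iceberg_catalog_conf("hive_prod", "hive", 2).items()):
        print(f"{key}={value}")
```

With PySpark, each pair can be applied through `SparkSession.builder.config(key, value)` before `getOrCreate()`. Since this is session configuration, it has to be repeated for every Spark application.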

On Wed, Jan 11, 2023 at 2:12 PM Gabor Kaszab  wrote:

> Naively asking: can't we add a property that tells Iceberg which version
> to use by default when creating tables (if there is no such setting
> currently)?
>
> Gabor
>
> On Wed, Jan 11, 2023 at 20:04, Jack Ye wrote:
>
>> Should we start a community vote on this?
>>
>> I remember that in today's community sync meeting Russell briefly discussed
>> some compaction support that is not there yet, and that some users are
>> struggling with the small delete files issue; to some extent, that is why
>> Spark still defaults to v1.
>>
>> On the feature side, changelog scan is mostly there in Spark, and
>> there will likely be movement on the Trino side for it very soon as well.
>>
>> Overall, I think it would be beneficial to move the default to v2, which
>> could incentivize the completion of those missing parts across engines.
>>
>> Best,
>> Jack Ye
>>
>>
>>
>>
>> On Wed, Jan 11, 2023 at 5:47 AM Piotr Findeisen 
>> wrote:
>>
>>> Hi,
>>>
>>> FWIW Trino already creates v2 tables by default.
>>> Thought it's worth sharing for context.
>>>
>>> Best
>>> PF
>>>
>>>
>>>
>>>
>>> On Tue, Jan 10, 2023 at 10:09 AM Manu Zhang 
>>> wrote:
>>>
 Hi all,

 We've maintained a forked Iceberg internally, and all our use cases
 involve v2 tables with row-level updates and deletes. Our users need to
 remember to create tables with the `'format-version'='2'` option or alter
 them afterwards.

 I'm thinking about changing the default format-version of our
 forked Iceberg to v2. Are there any concerns about this change? Any hidden
 issues I've missed?

 Thanks,
 Manu

>>>

-- 
Ryan Blue
Tabular


Re: Change default format-version of our forked Iceberg to v2

2023-01-11 Thread Gabor Kaszab
Naively asking: can't we add a property that tells Iceberg which version to
use by default when creating tables (if there is no such setting currently)?

Gabor
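A minimal sketch of how such a property could be resolved, assuming a catalog-level prefix like the "table-default." setting mentioned later in this thread. The precedence shown (explicit table property, then catalog default, then a library fallback of v1) is an illustration; the function and constant names are hypothetical, not Iceberg's API.

```python
from typing import Dict, Mapping

# Hypothetical constants for illustration. The prefix mirrors the
# "table-default." catalog property mentioned later in this thread, and the
# fallback of 1 reflects the v1 default the built-in catalogs used at the time.
TABLE_DEFAULT_PREFIX = "table-default."
LIBRARY_DEFAULT_FORMAT_VERSION = 1


def effective_table_properties(
    catalog_props: Mapping[str, str], table_props: Mapping[str, str]
) -> Dict[str, str]:
    """Merge catalog-level table defaults with properties given at creation.

    Catalog properties starting with TABLE_DEFAULT_PREFIX become table
    properties with the prefix stripped; other catalog properties are ignored.
    Explicit table properties override the catalog defaults.
    """
    merged = {
        key[len(TABLE_DEFAULT_PREFIX):]: value
        for key, value in catalog_props.items()
        if key.startswith(TABLE_DEFAULT_PREFIX)
    }
    merged.update(table_props)
    return merged


def effective_format_version(
    catalog_props: Mapping[str, str], table_props: Mapping[str, str]
) -> int:
    """Format version a new table would get under this precedence order."""
    props = effective_table_properties(catalog_props, table_props)
    return int(props.get("format-version", LIBRARY_DEFAULT_FORMAT_VERSION))


if __name__ == "__main__":
    catalog = {"uri": "thrift://metastore:9083", "table-default.format-version": "2"}
    print(effective_format_version(catalog, {}))                       # 2
    print(effective_format_version(catalog, {"format-version": "1"}))  # 1
    print(effective_format_version({}, {}))                            # 1
```

Keeping the default in the catalog rather than in the library matches the point that different catalogs may want different versions, while a user can still pin a version per table.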

On Wed, Jan 11, 2023 at 20:04, Jack Ye wrote:

> Should we start a community vote on this?
>
> I remember that in today's community sync meeting Russell briefly discussed
> some compaction support that is not there yet, and that some users are
> struggling with the small delete files issue; to some extent, that is why
> Spark still defaults to v1.
>
> On the feature side, changelog scan is mostly there in Spark, and there
> will likely be movement on the Trino side for it very soon as well.
>
> Overall, I think it would be beneficial to move the default to v2, which could
> incentivize the completion of those missing parts across engines.
>
> Best,
> Jack Ye
>
>
>
>
> On Wed, Jan 11, 2023 at 5:47 AM Piotr Findeisen 
> wrote:
>
>> Hi,
>>
>> FWIW Trino already creates v2 tables by default.
>> Thought it's worth sharing for context.
>>
>> Best
>> PF
>>
>>
>>
>>
>> On Tue, Jan 10, 2023 at 10:09 AM Manu Zhang 
>> wrote:
>>
>>> Hi all,
>>>
>>> We've maintained a forked Iceberg internally, and all our use cases
>>> involve v2 tables with row-level updates and deletes. Our users need to
>>> remember to create tables with the `'format-version'='2'` option or alter
>>> them afterwards.
>>>
>>> I'm thinking about changing the default format-version of our
>>> forked Iceberg to v2. Are there any concerns about this change? Any hidden
>>> issues I've missed?
>>>
>>> Thanks,
>>> Manu
>>>
>>


Re: Change default format-version of our forked Iceberg to v2

2023-01-11 Thread Jack Ye
Should we start a community vote on this?

I remember that in today's community sync meeting Russell briefly discussed
some compaction support that is not there yet, and that some users are
struggling with the small delete files issue; to some extent, that is why
Spark still defaults to v1.

On the feature side, changelog scan is mostly there in Spark, and there
will likely be movement on the Trino side for it very soon as well.

Overall, I think it would be beneficial to move the default to v2, which could
incentivize the completion of those missing parts across engines.

Best,
Jack Ye




On Wed, Jan 11, 2023 at 5:47 AM Piotr Findeisen 
wrote:

> Hi,
>
> FWIW Trino already creates v2 tables by default.
> Thought it's worth sharing for context.
>
> Best
> PF
>
>
>
>
> On Tue, Jan 10, 2023 at 10:09 AM Manu Zhang 
> wrote:
>
>> Hi all,
>>
>> We've maintained a forked Iceberg internally, and all our use cases
>> involve v2 tables with row-level updates and deletes. Our users need to
>> remember to create tables with the `'format-version'='2'` option or alter
>> them afterwards.
>>
>> I'm thinking about changing the default format-version of our
>> forked Iceberg to v2. Are there any concerns about this change? Any hidden
>> issues I've missed?
>>
>> Thanks,
>> Manu
>>
>


Re: Change default format-version of our forked Iceberg to v2

2023-01-11 Thread Piotr Findeisen
Hi,

FWIW Trino already creates v2 tables by default.
Thought it's worth sharing for context.

Best
PF




On Tue, Jan 10, 2023 at 10:09 AM Manu Zhang  wrote:

> Hi all,
>
> We've maintained a forked Iceberg internally, and all our use cases involve
> v2 tables with row-level updates and deletes. Our users need to remember to
> create tables with the `'format-version'='2'` option or alter
> them afterwards.
>
> I'm thinking about changing the default format-version of our
> forked Iceberg to v2. Are there any concerns about this change? Any hidden
> issues I've missed?
>
> Thanks,
> Manu
>
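For reference, the two per-table workarounds from the original question can be written as Spark SQL. A minimal sketch: the helper names, the table `db.events`, and its columns are hypothetical placeholders; the `USING iceberg` and `TBLPROPERTIES ('format-version'='2')` clauses follow Iceberg's Spark DDL.

```python
def create_v2_table_sql(table: str, columns: str) -> str:
    """CREATE TABLE statement that requests format v2 up front (hypothetical helper)."""
    return (
        f"CREATE TABLE {table} ({columns}) USING iceberg "
        "TBLPROPERTIES ('format-version'='2')"
    )


def upgrade_to_v2_sql(table: str) -> str:
    """ALTER TABLE statement that upgrades an existing table to v2 afterwards."""
    return f"ALTER TABLE {table} SET TBLPROPERTIES ('format-version'='2')"


if __name__ == "__main__":
    # Placeholder identifiers; pass either string to spark.sql(...).
    print(create_v2_table_sql("db.events", "id bigint, data string"))
    print(upgrade_to_v2_sql("db.events"))
```

Either string can be passed to `spark.sql(...)`. Note that Iceberg does not support downgrading a table's format version, so the ALTER TABLE route is one-way.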