Re: [VOTE] Deprecate or remove distinct_count

2025-03-03 Thread Fokko Driesprong
Thanks everyone. It seems like there is a consensus, and I'll go ahead and
mark the field as deprecated for now to avoid any future confusion.

Kind regards,
Fokko

On Tue, Feb 25, 2025 at 12:54 AM Ajantha Bhat wrote:

> +1 to deprecate it again and remove it later on.
>
> I did some digging and found out that Dremio was interested in this field
> for secondary indexes.
> https://lists.apache.org/thread/z948wfssgvrpf9b3g6660gh5cxb2d3sn
>
> But we didn't make progress on that.
>
> - Ajantha
>
> On Tue, Feb 25, 2025 at 5:03 AM Scott Cowell 
> wrote:
>
>> Speaking for Dremio, I checked and we're not using distinct_counts
>> anywhere, we interact with manifests exclusively through the Iceberg Java
>> API which as mentioned doesn't support this field. I'm in favor of
>> removing it, I didn't even know it existed as I tend to look at the Java
>> DataFile/ContentFile interfaces when browsing the metadata structure vs.
>> going to the spec 😂
>>
>>
>> On Mon, Feb 24, 2025 at 3:00 PM [email protected] 
>> wrote:
>>
>>> I can provide some context here. The field is very old and when we
>>> realized that it was not only unused but also difficult to produce and use
>>> in practice (can't be combined) we deprecated the field. However, some
>>> folks from Dremio wanted to bring it back because they said they could
>>> store values there and had a way to use them.
>>>
>>> +1, but it would be good to check in with some Dremio engineers and see
>>> if they are using it. I assume they aren't since this thread hasn't gotten
>>> much attention. Thanks for bringing this up!
>>>
>>> On Thu, Feb 13, 2025 at 8:02 AM Jacob Marble
>>>  wrote:
>>>
>>>> Xuanwo, do you favor deprecating or removing `distinct_count`?
>>>>
>>>> Due to lack of any real implementation, I myself favor removal (PR
>>>> 12183).
>>>>
>>>> Jacob Marble
>>>> 🔥🐅
>>>>
>>>>
>>>> On Tue, Feb 11, 2025 at 10:25 PM Xuanwo wrote:

> Here is my +1 binding.
>
> The current status of `distinct_count` is quite confusing, which has
> also led to additional discussions in `iceberg-rust` about whether we need
> to add it and how to maintain it.
>
> Removing it seems reasonable to me, as there are no known use cases
> for `distinct_count` in a single data file.
>
> On Tue, Feb 11, 2025, at 23:05, Fokko Driesprong wrote:
>
> My mistake, I suggested sending out an email with a quick vote on the
> PR. I like the suggestion to use this thread for discussion since the
> number of options is limited.
>
> I'm in favor of deprecating the field, to avoid that we re-use the
> field-id in the future.
>
> Kind regards,
> Fokko
>
> On Tue, Feb 11, 2025 at 5:46 AM Manu Zhang wrote:
>
> Hi Jacob,
>
> Thanks for initiating the vote.
> Typically, we would first have a DISCUSSION thread to reach a
> consensus on the preferred option and then follow it up with a VOTE thread
> for confirmation.
>
> Maybe we can take this as a DISCUSSION thread?
>
> Best,
> Manu
>
>
> On Tue, Feb 11, 2025 at 7:20 AM Jacob Marble
>  wrote:
>
> This vote will be open for at least 72 hours.
>
> I propose that distinct_counts be either deprecated (#12182) or removed
> (#12183) from the spec.
>
> According to #767, data_file.distinct_counts was deprecated about four
> years ago. Furthermore, it is not implemented in the canonical Java and
> Python implementations.
>
> Please share your thoughts, and vote for one of the following:
> - remove
> - deprecate
> - no-op
>
> Jacob Marble
> 🔥🐅
>
> Xuanwo
>
> https://xuanwo.io/
>
>


Re: [VOTE] Deprecate or remove distinct_count

2025-02-24 Thread Ajantha Bhat
+1 to deprecate it again and remove it later on.

I did some digging and found out that Dremio was interested in this field
for secondary indexes.
https://lists.apache.org/thread/z948wfssgvrpf9b3g6660gh5cxb2d3sn

But we didn't make progress on that.

- Ajantha

On Tue, Feb 25, 2025 at 5:03 AM Scott Cowell  wrote:

> Speaking for Dremio, I checked and we're not using distinct_counts
> anywhere, we interact with manifests exclusively through the Iceberg Java
> API which as mentioned doesn't support this field. I'm in favor of
> removing it, I didn't even know it existed as I tend to look at the Java
> DataFile/ContentFile interfaces when browsing the metadata structure vs.
> going to the spec 😂
>
>
> On Mon, Feb 24, 2025 at 3:00 PM [email protected]  wrote:
>
>> I can provide some context here. The field is very old and when we
>> realized that it was not only unused but also difficult to produce and use
>> in practice (can't be combined) we deprecated the field. However, some
>> folks from Dremio wanted to bring it back because they said they could
>> store values there and had a way to use them.
>>
>> +1, but it would be good to check in with some Dremio engineers and see
>> if they are using it. I assume they aren't since this thread hasn't gotten
>> much attention. Thanks for bringing this up!
>>
>> On Thu, Feb 13, 2025 at 8:02 AM Jacob Marble
>>  wrote:
>>
>>> Xuanwo, do you favor deprecating or removing `distinct_count`?
>>>
>>> Due to lack of any real implementation, I myself favor removal (PR
>>> 12183).
>>>
>>> Jacob Marble
>>> 🔥🐅
>>>
>>>
>>> On Tue, Feb 11, 2025 at 10:25 PM Xuanwo  wrote:
>>>
 Here is my +1 binding.

 The current status of `distinct_count` is quite confusing, which has
 also led to additional discussions in `iceberg-rust` about whether we need
 to add it and how to maintain it.

 Removing it seems reasonable to me, as there are no known use cases for
 `distinct_count` in a single data file.

 On Tue, Feb 11, 2025, at 23:05, Fokko Driesprong wrote:

 My mistake, I suggested sending out an email with a quick vote on the
 PR. I like the suggestion to use this thread for discussion since the
 number of options is limited.

 I'm in favor of deprecating the field, to avoid that we re-use the
 field-id in the future.

 Kind regards,
 Fokko

 On Tue, Feb 11, 2025 at 5:46 AM Manu Zhang wrote:

 Hi Jacob,

 Thanks for initiating the vote.
 Typically, we would first have a DISCUSSION thread to reach a consensus
 on the preferred option and then follow it up with a VOTE thread for
 confirmation.

 Maybe we can take this as a DISCUSSION thread?

 Best,
 Manu


 On Tue, Feb 11, 2025 at 7:20 AM Jacob Marble
  wrote:

 This vote will be open for at least 72 hours.

 I propose that distinct_counts be either deprecated (#12182) or removed
 (#12183) from the spec.

 According to #767, data_file.distinct_counts was deprecated about four
 years ago. Furthermore, it is not implemented in the canonical Java and
 Python implementations.

 Please share your thoughts, and vote for one of the following:
 - remove
 - deprecate
 - no-op

 Jacob Marble
 🔥🐅

 Xuanwo

 https://xuanwo.io/




Re: [VOTE] Deprecate or remove distinct_count

2025-02-24 Thread Scott Cowell
Speaking for Dremio, I checked and we're not using distinct_counts
anywhere. We interact with manifests exclusively through the Iceberg Java
API, which, as mentioned, doesn't support this field. I'm in favor of
removing it. I didn't even know it existed, as I tend to look at the Java
DataFile/ContentFile interfaces when browsing the metadata structure rather
than going to the spec 😂


On Mon, Feb 24, 2025 at 3:00 PM [email protected]  wrote:

> I can provide some context here. The field is very old and when we
> realized that it was not only unused but also difficult to produce and use
> in practice (can't be combined) we deprecated the field. However, some
> folks from Dremio wanted to bring it back because they said they could
> store values there and had a way to use them.
>
> +1, but it would be good to check in with some Dremio engineers and see if
> they are using it. I assume they aren't since this thread hasn't gotten
> much attention. Thanks for bringing this up!
>
> On Thu, Feb 13, 2025 at 8:02 AM Jacob Marble
>  wrote:
>
>> Xuanwo, do you favor deprecating or removing `distinct_count`?
>>
>> Due to lack of any real implementation, I myself favor removal (PR 12183).
>>
>> Jacob Marble
>> 🔥🐅
>>
>>
>> On Tue, Feb 11, 2025 at 10:25 PM Xuanwo  wrote:
>>
>>> Here is my +1 binding.
>>>
>>> The current status of `distinct_count` is quite confusing, which has
>>> also led to additional discussions in `iceberg-rust` about whether we need
>>> to add it and how to maintain it.
>>>
>>> Removing it seems reasonable to me, as there are no known use cases for
>>> `distinct_count` in a single data file.
>>>
>>> On Tue, Feb 11, 2025, at 23:05, Fokko Driesprong wrote:
>>>
>>> My mistake, I suggested sending out an email with a quick vote on the
>>> PR. I like the suggestion to use this thread for discussion since the
>>> number of options is limited.
>>>
>>> I'm in favor of deprecating the field, to avoid that we re-use the
>>> field-id in the future.
>>>
>>> Kind regards,
>>> Fokko
>>>
>>> On Tue, Feb 11, 2025 at 5:46 AM Manu Zhang wrote:
>>>
>>> Hi Jacob,
>>>
>>> Thanks for initiating the vote.
>>> Typically, we would first have a DISCUSSION thread to reach a consensus
>>> on the preferred option and then follow it up with a VOTE thread for
>>> confirmation.
>>>
>>> Maybe we can take this as a DISCUSSION thread?
>>>
>>> Best,
>>> Manu
>>>
>>>
>>> On Tue, Feb 11, 2025 at 7:20 AM Jacob Marble
>>>  wrote:
>>>
>>> This vote will be open for at least 72 hours.
>>>
>>> I propose that distinct_counts be either deprecated (#12182) or removed
>>> (#12183) from the spec.
>>>
>>> According to #767, data_file.distinct_counts was deprecated about four
>>> years ago. Furthermore, it is not implemented in the canonical Java and
>>> Python implementations.
>>>
>>> Please share your thoughts, and vote for one of the following:
>>> - remove
>>> - deprecate
>>> - no-op
>>>
>>> Jacob Marble
>>> 🔥🐅
>>>
>>> Xuanwo
>>>
>>> https://xuanwo.io/
>>>
>>>


Re: [VOTE] Deprecate or remove distinct_count

2025-02-24 Thread [email protected]
I can provide some context here. The field is very old. When we realized
that it was not only unused but also difficult to produce and use in
practice (per-file values can't be combined), we deprecated it. However,
some folks from Dremio wanted to bring it back because they said they could
store values there and had a way to use them.
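
To illustrate the "can't be combined" point, here is a minimal sketch with
made-up values. The file contents and the helper name are assumptions for
illustration only and are not part of Iceberg. Two per-file distinct counts
give only a lower and an upper bound for the combined count, because the
overlap between the files is unknown:

```python
# Hypothetical column values stored in two data files (illustration only).
file_a = ["x", "y", "z"]
file_b = ["y", "z", "w"]


def distinct_count(values):
    """Number of distinct values in a single collection."""
    return len(set(values))


count_a = distinct_count(file_a)
count_b = distinct_count(file_b)

# The true combined count requires the values themselves.
true_combined = distinct_count(file_a + file_b)

# From the two per-file counts alone we can only derive bounds.
lower_bound = max(count_a, count_b)
upper_bound = count_a + count_b

print(count_a, count_b, true_combined, lower_bound, upper_bound)
```

Unlike value counts or null counts, which can simply be summed across files,
the per-file distinct counts here do not determine the combined result.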

+1, but it would be good to check in with some Dremio engineers and see if
they are using it. I assume they aren't since this thread hasn't gotten
much attention. Thanks for bringing this up!

On Thu, Feb 13, 2025 at 8:02 AM Jacob Marble
 wrote:

> Xuanwo, do you favor deprecating or removing `distinct_count`?
>
> Due to lack of any real implementation, I myself favor removal (PR 12183).
>
> Jacob Marble
> 🔥🐅
>
>
> On Tue, Feb 11, 2025 at 10:25 PM Xuanwo  wrote:
>
>> Here is my +1 binding.
>>
>> The current status of `distinct_count` is quite confusing, which has also
>> led to additional discussions in `iceberg-rust` about whether we need to
>> add it and how to maintain it.
>>
>> Removing it seems reasonable to me, as there are no known use cases for
>> `distinct_count` in a single data file.
>>
>> On Tue, Feb 11, 2025, at 23:05, Fokko Driesprong wrote:
>>
>> My mistake, I suggested sending out an email with a quick vote on the PR.
>> I like the suggestion to use this thread for discussion since the number of
>> options is limited.
>>
>> I'm in favor of deprecating the field, to avoid that we re-use the
>> field-id in the future.
>>
>> Kind regards,
>> Fokko
>>
>> On Tue, Feb 11, 2025 at 5:46 AM Manu Zhang wrote:
>>
>> Hi Jacob,
>>
>> Thanks for initiating the vote.
>> Typically, we would first have a DISCUSSION thread to reach a consensus
>> on the preferred option and then follow it up with a VOTE thread for
>> confirmation.
>>
>> Maybe we can take this as a DISCUSSION thread?
>>
>> Best,
>> Manu
>>
>>
>> On Tue, Feb 11, 2025 at 7:20 AM Jacob Marble
>>  wrote:
>>
>> This vote will be open for at least 72 hours.
>>
>> I propose that distinct_counts be either deprecated (#12182) or removed
>> (#12183) from the spec.
>>
>> According to #767, data_file.distinct_counts was deprecated about four
>> years ago. Furthermore, it is not implemented in the canonical Java and
>> Python implementations.
>>
>> Please share your thoughts, and vote for one of the following:
>> - remove
>> - deprecate
>> - no-op
>>
>> Jacob Marble
>> 🔥🐅
>>
>> Xuanwo
>>
>> https://xuanwo.io/
>>
>>


Re: [VOTE] Deprecate or remove distinct_count

2025-02-23 Thread Xuanwo
Oh, sorry for the mistake.

My vote should be non-binding.

On Wed, Feb 12, 2025, at 14:23, Xuanwo wrote:
> Here is my +1 binding.
> 
> The current status of `distinct_count` is quite confusing, which has also led 
> to additional discussions in `iceberg-rust` about whether we need to add it 
> and how to maintain it.
> 
> Removing it seems reasonable to me, as there are no known use cases for 
> `distinct_count` in a single data file.
> 
> On Tue, Feb 11, 2025, at 23:05, Fokko Driesprong wrote:
>> My mistake, I suggested sending out an email with a quick vote on the PR. I 
>> like the suggestion to use this thread for discussion since the number of 
>> options is limited.
>> 
>> I'm in favor of deprecating the field, to avoid that we re-use the field-id 
>> in the future.
>> 
>> Kind regards,
>> Fokko
>> 
>> On Tue, Feb 11, 2025 at 5:46 AM Manu Zhang wrote:
>>> Hi Jacob,
>>> 
>>> Thanks for initiating the vote.
>>> Typically, we would first have a DISCUSSION thread to reach a consensus on 
>>> the preferred option and then follow it up with a VOTE thread for 
>>> confirmation.
>>> 
>>> Maybe we can take this as a DISCUSSION thread?
>>> 
>>> Best,
>>> Manu
>>> 
>>> 
>>> On Tue, Feb 11, 2025 at 7:20 AM Jacob Marble 
>>>  wrote:
>>>> This vote will be open for at least 72 hours.
>>>>
>>>> I propose that distinct_counts be either deprecated (#12182) or removed
>>>> (#12183) from the spec.
>>>>
>>>> According to #767, data_file.distinct_counts was deprecated about four
>>>> years ago. Furthermore, it is not implemented in the canonical Java and
>>>> Python implementations.
>>>>
>>>> Please share your thoughts, and vote for one of the following:
>>>> - remove
>>>> - deprecate
>>>> - no-op
>>>>
>>>> Jacob Marble
>>>> 🔥🐅
> Xuanwo
> 
> https://xuanwo.io/
> 
Xuanwo

https://xuanwo.io/


Re: [VOTE] Deprecate or remove distinct_count

2025-02-23 Thread Jacob Marble
Fokko, what is the next step toward merging one of the two PRs?

Jacob Marble
🔥🐅


On Thu, Feb 13, 2025 at 8:01 AM Jacob Marble 
wrote:

> Xuanwo, do you favor deprecating or removing `distinct_count`?
>
> Due to lack of any real implementation, I myself favor removal (PR 12183).
>
> Jacob Marble
> 🔥🐅
>
>
> On Tue, Feb 11, 2025 at 10:25 PM Xuanwo  wrote:
>
>> Here is my +1 binding.
>>
>> The current status of `distinct_count` is quite confusing, which has also
>> led to additional discussions in `iceberg-rust` about whether we need to
>> add it and how to maintain it.
>>
>> Removing it seems reasonable to me, as there are no known use cases for
>> `distinct_count` in a single data file.
>>
>> On Tue, Feb 11, 2025, at 23:05, Fokko Driesprong wrote:
>>
>> My mistake, I suggested sending out an email with a quick vote on the PR.
>> I like the suggestion to use this thread for discussion since the number of
>> options is limited.
>>
>> I'm in favor of deprecating the field, to avoid that we re-use the
>> field-id in the future.
>>
>> Kind regards,
>> Fokko
>>
>> On Tue, Feb 11, 2025 at 5:46 AM Manu Zhang wrote:
>>
>> Hi Jacob,
>>
>> Thanks for initiating the vote.
>> Typically, we would first have a DISCUSSION thread to reach a consensus
>> on the preferred option and then follow it up with a VOTE thread for
>> confirmation.
>>
>> Maybe we can take this as a DISCUSSION thread?
>>
>> Best,
>> Manu
>>
>>
>> On Tue, Feb 11, 2025 at 7:20 AM Jacob Marble
>>  wrote:
>>
>> This vote will be open for at least 72 hours.
>>
>> I propose that distinct_counts be either deprecated (#12182) or removed
>> (#12183) from the spec.
>>
>> According to #767, data_file.distinct_counts was deprecated about four
>> years ago. Furthermore, it is not implemented in the canonical Java and
>> Python implementations.
>>
>> Please share your thoughts, and vote for one of the following:
>> - remove
>> - deprecate
>> - no-op
>>
>> Jacob Marble
>> 🔥🐅
>>
>> Xuanwo
>>
>> https://xuanwo.io/
>>
>>


Re: [VOTE] Deprecate or remove distinct_count

2025-02-13 Thread Jacob Marble
Xuanwo, do you favor deprecating or removing `distinct_count`?

Due to lack of any real implementation, I myself favor removal (PR 12183).

Jacob Marble
🔥🐅


On Tue, Feb 11, 2025 at 10:25 PM Xuanwo  wrote:

> Here is my +1 binding.
>
> The current status of `distinct_count` is quite confusing, which has also
> led to additional discussions in `iceberg-rust` about whether we need to
> add it and how to maintain it.
>
> Removing it seems reasonable to me, as there are no known use cases for
> `distinct_count` in a single data file.
>
> On Tue, Feb 11, 2025, at 23:05, Fokko Driesprong wrote:
>
> My mistake, I suggested sending out an email with a quick vote on the PR.
> I like the suggestion to use this thread for discussion since the number of
> options is limited.
>
> I'm in favor of deprecating the field, to avoid that we re-use the
> field-id in the future.
>
> Kind regards,
> Fokko
>
> On Tue, Feb 11, 2025 at 5:46 AM Manu Zhang wrote:
>
> Hi Jacob,
>
> Thanks for initiating the vote.
> Typically, we would first have a DISCUSSION thread to reach a consensus on
> the preferred option and then follow it up with a VOTE thread for
> confirmation.
>
> Maybe we can take this as a DISCUSSION thread?
>
> Best,
> Manu
>
>
> On Tue, Feb 11, 2025 at 7:20 AM Jacob Marble
>  wrote:
>
> This vote will be open for at least 72 hours.
>
> I propose that distinct_counts be either deprecated (#12182) or removed
> (#12183) from the spec.
>
> According to #767, data_file.distinct_counts was deprecated about four
> years ago. Furthermore, it is not implemented in the canonical Java and
> Python implementations.
>
> Please share your thoughts, and vote for one of the following:
> - remove
> - deprecate
> - no-op
>
> Jacob Marble
> 🔥🐅
>
> Xuanwo
>
> https://xuanwo.io/
>
>


Re: [VOTE] Deprecate or remove distinct_count

2025-02-11 Thread Xuanwo
Here is my +1 binding.

The current status of `distinct_count` is quite confusing, which has also led 
to additional discussions in `iceberg-rust` about whether we need to add it and 
how to maintain it.

Removing it seems reasonable to me, as there are no known use cases for 
`distinct_count` in a single data file.

On Tue, Feb 11, 2025, at 23:05, Fokko Driesprong wrote:
> My mistake, I suggested sending out an email with a quick vote on the PR. I 
> like the suggestion to use this thread for discussion since the number of 
> options is limited.
> 
> I'm in favor of deprecating the field, to avoid that we re-use the field-id 
> in the future.
> 
> Kind regards,
> Fokko
> 
> On Tue, Feb 11, 2025 at 5:46 AM Manu Zhang wrote:
>> Hi Jacob,
>> 
>> Thanks for initiating the vote.
>> Typically, we would first have a DISCUSSION thread to reach a consensus on 
>> the preferred option and then follow it up with a VOTE thread for 
>> confirmation.
>> 
>> Maybe we can take this as a DISCUSSION thread?
>> 
>> Best,
>> Manu
>> 
>> 
>> On Tue, Feb 11, 2025 at 7:20 AM Jacob Marble 
>>  wrote:
>>> This vote will be open for at least 72 hours.
>>> 
>>> I propose that distinct_counts be either deprecated (#12182) or removed
>>> (#12183) from the spec.
>>>
>>> According to #767, data_file.distinct_counts was deprecated about four
>>> years ago. Furthermore, it is not implemented in the canonical Java and
>>> Python implementations.
>>>
>>> Please share your thoughts, and vote for one of the following:
>>> - remove
>>> - deprecate
>>> - no-op
>>> 
>>> Jacob Marble
>>> 🔥🐅
Xuanwo

https://xuanwo.io/


Re: [VOTE] Deprecate or remove distinct_count

2025-02-11 Thread Fokko Driesprong
My mistake, I suggested sending out an email with a quick vote on the PR. I
like the suggestion to use this thread for discussion since the number of
options is limited.

I'm in favor of deprecating the field, so that we don't accidentally reuse
its field-id in the future.
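
As a rough sketch of why deprecating rather than deleting helps: a spec can
keep a list of retired field ids and reject any new field that tries to claim
one. The ids, field names, and helper below are hypothetical and are not taken
from the Iceberg spec or any implementation.

```python
# Hypothetical ids and names, for illustration only.
ACTIVE_FIELDS = {100: "file_path", 101: "file_format"}
DEPRECATED_FIELDS = {900: "old_stat"}  # retired, but the id stays reserved


def add_field(field_id: int, name: str) -> None:
    """Register a new field, refusing ids that are in use or reserved."""
    if field_id in ACTIVE_FIELDS:
        raise ValueError(
            f"field id {field_id} is already used by {ACTIVE_FIELDS[field_id]}"
        )
    if field_id in DEPRECATED_FIELDS:
        raise ValueError(
            f"field id {field_id} is reserved for deprecated field "
            f"{DEPRECATED_FIELDS[field_id]}"
        )
    ACTIVE_FIELDS[field_id] = name


add_field(102, "new_stat")  # a fresh id is accepted

try:
    add_field(900, "another_stat")  # the reserved id is rejected
except ValueError as err:
    print(err)
```

If the deprecated entry were deleted outright, nothing would stop a later
change from assigning the same id to a field with a different meaning.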

Kind regards,
Fokko

On Tue, Feb 11, 2025 at 5:46 AM Manu Zhang wrote:

> Hi Jacob,
>
> Thanks for initiating the vote.
> Typically, we would first have a DISCUSSION thread to reach a consensus on
> the preferred option and then follow it up with a VOTE thread for
> confirmation.
>
> Maybe we can take this as a DISCUSSION thread?
>
> Best,
> Manu
>
>
> On Tue, Feb 11, 2025 at 7:20 AM Jacob Marble
>  wrote:
>
>> This vote will be open for at least 72 hours.
>>
>> I propose that distinct_counts be either deprecated (#12182) or removed
>> (#12183) from the spec.
>>
>> According to #767, data_file.distinct_counts was deprecated about four
>> years ago. Furthermore, it is not implemented in the canonical Java and
>> Python implementations.
>>
>> Please share your thoughts, and vote for one of the following:
>> - remove
>> - deprecate
>> - no-op
>>
>> Jacob Marble
>> 🔥🐅
>>
>


Re: [VOTE] Deprecate or remove distinct_count

2025-02-10 Thread Manu Zhang
Hi Jacob,

Thanks for initiating the vote.
Typically, we would first have a DISCUSSION thread to reach a consensus on
the preferred option and then follow it up with a VOTE thread for
confirmation.

Maybe we can take this as a DISCUSSION thread?

Best,
Manu


On Tue, Feb 11, 2025 at 7:20 AM Jacob Marble
 wrote:

> This vote will be open for at least 72 hours.
>
> I propose that distinct_counts be either deprecated (#12182) or removed
> (#12183) from the spec.
>
> According to #767, data_file.distinct_counts was deprecated about four
> years ago. Furthermore, it is not implemented in the canonical Java and
> Python implementations.
>
> Please share your thoughts, and vote for one of the following:
> - remove
> - deprecate
> - no-op
>
> Jacob Marble
> 🔥🐅
>