> Instead of using an existing columnar format like parquet (one file for
> one type of stats) to store indexes, any reason why we have developed our
> own format and any benchmarks taken against Puffin vs other formats?
The format needs to store large blobs, which can easily be multiple
megabytes
+1
On Wed, Jun 22, 2022 at 9:34 AM Piotr Findeisen wrote:
Hi Ajantha,
Thank you for spending the time to look into this.
re a: I think I remember Ryan saying Parquet isn't good for bigger pieces
of data, and some stats sketches or indices can be bigger than others.
Also, the Parquet row logical / columnar storage format doesn't give as
much benefit for
Thank you Piotr for all of the work you’ve put into this.
I just checked the spec. I have a few newbie questions.
a. Instead of using an existing columnar format like parquet (one file for
one type of stats) to store indexes, any reason why we have developed our
own format and any benchmarks
+1 on the format! It looks great!
Thanks for materializing the initial design idea.
Miao
From: Kyle Bendickson
Date: Sunday, June 12, 2022 at 1:55 PM
To: dev@iceberg.apache.org
Subject: Re: [VOTE] Adopt Puffin format as a file format for statistics and
indexes
+1 [non-binding]
Thank you Piotr for all of the work you’ve put into this.
This should greatly benefit not only Iceberg on Trino, but hopefully can be
used in many novel ways due to its well thought out generic design and
incorporation of the ability to extend with new sketches.
Looking forward
+1, let's do it!
On Fri, Jun 10, 2022 at 2:47 PM John Zhuge wrote:
+1 Looking forward to the features it enables.
On Fri, Jun 10, 2022 at 10:11 AM Yufei Gu wrote:
+1. Looking forward to the partition stats.
Best,
Yufei
On Thu, Jun 9, 2022 at 6:32 PM Daniel Weeks wrote:
+1 as well. Excited about the progress here.
-Dan
On Thu, Jun 9, 2022, 6:25 PM Junjie Chen wrote:
+1, really nice! Indexes are coming!
On Fri, Jun 10, 2022 at 8:04 AM Szehon Ho wrote:
+1, it's an exciting step for Iceberg, look forward to all the new
statistics and secondary indices it will allow.
Had a few questions of what the reference to Puffin file(s) will be in the
Iceberg spec, but it's orthogonal to Puffin file format itself.
Thanks,
Szehon
On Thu, Jun 9, 2022 at
+1 from me!
There may also be people that haven't followed the design discussions and
we can start a DISCUSS thread if needed. But if everyone is comfortable
with the design and implementation, I think it's ready for a vote as well.
Huge thanks to Piotr for getting this ready! I think the format
Hi Everyone,
I propose that we adopt Puffin file format as a file format for statistics
and indexes in Iceberg tables.
Puffin file format specification:
https://github.com/apache/iceberg/blob/master/format/puffin-spec.md
(previous discussions: https://github.com/apache/iceberg/pull/4944,