Re: [VOTE] Adopt Variant from Spark

2024-09-10 Thread rdb...@gmail.com
+1 for adding the variant spec to Parquet. I'm looking forward to working on the addition of shredding. As for the details, I think I also prefer a separate repository, `parquet-variant`, but I don't think we necessarily need to determine that question up front. On Tue, Sep 10, 2024 at 9:05 AM Ga

Re: [DISCUSS] Moving Variant to Parquet Details

2024-09-10 Thread rdb...@gmail.com
To me, what matters the most is not really the repository, but the release process. Since the variant code is going to be fairly rapidly developed and may not have a stable API, I'd prefer to have it on a separate release cycle and start the versioning at 0.1.0 to avoid a misconception that the API

Re: [VOTE] Adopt Variant from Spark

2024-09-13 Thread rdb...@gmail.com
On Wed, Sep 11, 2024 at 8:53 AM Gang Wu <ustcwg-re5jqeeqqe8avxtiumw...@public.gmane.org> wrote: > Let's just vote for the adoption in this thread and discuss the

Re: [DISCUSS] Clarify backward-compatibility rules on legacy LIST type

2024-10-31 Thread rdb...@gmail.com
Hopefully I can help because I wrote those rules. I think that the correct type is List>. Because none of the first 3 rules apply, the element type is the repeated type, which is a repeated int32. The rules are primarily trying to account for cases where known structures were used. If the repeate

Re: [DISCUSS] Deprecation of parquet-pig

2025-02-05 Thread rdb...@gmail.com
+1 People that need support can still use older versions, so I don't think that this would be a significant problem for anyone. On Wed, Feb 5, 2025 at 8:20 AM Fokko Driesprong wrote: > Hi everyone, > > I would like to discuss the deprecation/removal of parquet-pig. > > The last Pig release

Re: [VOTE][Format] Add Geometry & Geography types to the specification

2025-02-06 Thread rdb...@gmail.com
+1 (binding) Thanks to everyone that worked on getting this update done! It's been an amazing amount of discussion and I'm excited to see it ready to go. On Thu, Feb 6, 2025 at 8:11 AM Jia Yu wrote: > +1 (non-binding) > > I’m really looking forward to this! It’s going to be a fantastic addition

Re: [DISCUSS] Open Variant Shredding Issues

2024-12-09 Thread rdb...@gmail.com
I think that Parquet should exactly reproduce the data that is written to files, rather than either allowing or requiring Parquet implementations to normalize types. To me, that's a fundamental guarantee of the storage layer. The compute layer can decide to normalize types and take actions to make

Re: [DISCUSS] Open Variant Shredding Issues

2024-12-20 Thread rdb...@gmail.com
> implementations to shred data by taking the schema of a field the first time it appears as a reasonable heuristic? More generally it might be good to start discussing what API changes we expect are needed to support shredding in re

Re: Conflicting sync meetings inventory

2025-03-05 Thread rdb...@gmail.com
Iceberg has conflicting meetings. There is an Iceberg community sync at 9 AM PT every 3 weeks, with the next one on 19 March. There is also an Iceberg REST catalog sync every 3 weeks at 9 AM PT one week after each general community sync, so the next is on 26 March. On Wed, Mar 5, 2025 at 1:02 PM J

Re: Release parquet-format for Variant logical type

2025-02-20 Thread rdb...@gmail.com
The Variant in the thrift definition is a struct, so we can easily add version later. The only reason to add it now is if we want to be able to break forward compatibility with shredding. I'd be fine adding an encoding/shredding version = 1. On Thu, Feb 20, 2025 at 4:49 PM Micah Kornfield wrote: