Re: Implementation Status Page is growing with an entry for Polars

2026-02-04 Thread Micah Kornfield
Awesome, thank you Gijs!

On Wed, Feb 4, 2026 at 11:45 AM Andrew Lamb  wrote:

> Hello,
>
> The Implementation Status page[1] continues to grow in completeness.
> I think keeping a list of known open source implementations up to date will
> continue to help Parquet spread across the ecosystem.
>
> Thanks to Gijs Burghoorn, there is now a PR[2] to add the Polars Parquet
> implementation to the list. If you would like to help review or have any
> comments, please leave them on the PR.
>
> Thanks,
> Andrew
>
>
> [1]: https://parquet.apache.org/docs/file-format/implementationstatus/
> [2]: https://github.com/apache/parquet-site/pull/153
>


Re: Implementation status

2025-10-24 Thread Julien Le Dem
We'll change the format from Markdown to flatbuffer.

(this joke has a very small TAM)

On Fri, Oct 24, 2025 at 5:17 AM Antoine Pitrou  wrote:

>
> Ok, but let's keep in mind that parsing the Thrift footer for all those
> columns will become expensive.
>
> Regards
>
> Antoine.
>
>
> On Thu, 23 Oct 2025 11:13:53 -0700
> Julien Le Dem  wrote:
> > Do relevant people on that list who work at said vendors feel like adding
> > their respective columns?
> > BigQuery, Databricks, Dremio, Snowflake, ... ?
> >
> > https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md
> >
> > On Wed, Oct 22, 2025 at 10:05 PM Arnav Balyan wrote:
> >
> > > +1, I can try to help crowdsource the list. Maybe we could use the public
> > > Slack channel (seems to have 100+ people)
> > >
> > > On Thu, Oct 23, 2025 at 5:57 AM Andrew Lamb wrote:
> > >
> > > > I think it is a great idea -- I can certainly add the columns, but as you
> > > > say only people from those companies would be able to fill them out.
> > > >
> > > > Maybe if we added some columns that would add some (positive) pressure to
> > > > provide the information
> > > >
> > > > Andrew
> > > >
> > > > On Wed, Oct 22, 2025 at 2:50 PM Julien Le Dem wrote:
> > > >
> > > > > [forking into a new thread]
> > > > > Should we add columns for BigQuery, Databricks, Snowflake, Dremio, ...?
> > > > > I feel that this page is even more important for proprietary engines that
> > > > > we can not look at the implementation to check. (But they are important
> > > > > members of the ecosystem)
> > > > >
> > > > >
> > > > > On Wed, Oct 22, 2025 at 6:38 AM Andrew Lamb wrote:
> > > > >
> > > > > > > Someone has to add V2 data pages to
> > > > > > > https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md
> > > > > > > :)
> > > > > >
> > > > > > Your wish is my command: https://github.com/apache/parquet-site/pull/124
> > > > > >
> > > > > > As the format grows in popularity and momentum builds to evolve, I feel the
> > > > > > content on the parquet.apache.org site could use refreshing / updating.
> > > > > > So, while I had the site open, I made some other PRs to scratch various
> > > > > > itches
> > > > > >
> > > > > > (I am absolutely 🎣 for someone to please review 🙏):
> > > > > >
> > > > > > 1. Add Variant/Geometry/Geography types to implementation status matrix:
> > > > > > https://github.com/apache/parquet-site/pull/123
> > > > > > 2. Improve introduction / overview, add more links to spec and
> > > > > > implementation status: https://github.com/apache/parquet-site/pull/125
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Andrew
> > > > > >
> > > > > > On Wed, Oct 22, 2025 at 4:09 AM Antoine Pitrou wrote:
> > > > > >
> > > > > > >
> > > > > > > Hi Julien, hi all,
> > > > > > >
> > > > > > > On Mon, 20 Oct 2025 15:14:58 -0700
> > > > > > > Julien Le Dem  wrote:
> > > > > > > >
> > > > > > > > Another question from me:
> > > > > > > >
> > > > > > > > Since the goal is to not use compression at all in this case (no ZSTD)
> > > > > > > > I'm assuming we would be using either:
> > > > > > > > - the Data Page V1 with UNCOMPRESSED in the ColumnMetadata.column
> > > > > > > > <https://github.com/apache/parquet-format/blob/786142e26740487930ddc3ec5e39d780bd930907/src/main/thrift/parquet.thrift#L887>
> > > > > > > > field.
> > > > > > > > - the Data Page V2 with false in the DataPageHeaderV2.is_compressed
> > > > > > > > <https://github.com/apache/parquet-format/blob/786142e26740487930ddc3ec5e39d780bd930907/src/main/thrift/parquet.thrift#L746>
> > > > > > > > field
> > > > > > > > The second helping decide if we can selectively compress some pages if they
> > > > > > > > are less compressed by the
> > > > > > > > A few years ago there was a question on the support of the DATA_PAGE_V2 and
> > > > > > > > I was curious to hear a refresh on how that's generally supported in
> > > > > > > > Parquet implementations. The is_compressed field was exactly intended to
> > > > > > > > avoid block compression when the encoding itself is good enough.
> > > > > > >
> > > > > > > Someone has to add V2 data pages to
> > > > > > > https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md
> > > > > > > :)
> > > > > > >
> > > > > > > C++, Java and Rust support them for sure. I feel like we should
> > > > > > > probably default to V2 at some point.
> > > >

Re: Implementation status

2025-10-24 Thread Antoine Pitrou


Ok, but let's keep in mind that parsing the Thrift footer for all those
columns will become expensive.

Regards

Antoine.


On Thu, 23 Oct 2025 11:13:53 -0700
Julien Le Dem  wrote:
> Do relevant people on that list who work at said vendors feel like adding
> their respective columns?
> BigQuery, Databricks, Dremio, Snowflake, ... ?
> https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md
> 
> On Wed, Oct 22, 2025 at 10:05 PM Arnav Balyan 
> wrote:
> 
> > +1, I can try to help crowdsource the list. Maybe we could use the public
> > Slack channel (seems to have 100+ people)
> >
> > On Thu, Oct 23, 2025 at 5:57 AM Andrew Lamb 
> > wrote:
> >  
> > > I think it is a great idea -- I can certainly add the columns, but as you
> > > say only people from those companies would be able to fill them out.
> > >
> > > Maybe if we added some columns that would add some (positive) pressure to
> > > provide the information
> > >
> > > Andrew
> > >
> > > On Wed, Oct 22, 2025 at 2:50 PM Julien Le Dem wrote:
> > >
> > > > [forking into a new thread]
> > > > Should we add columns for BigQuery, Databricks, Snowflake, Dremio, ...?
> > > > I feel that this page is even more important for proprietary engines that
> > > > we can not look at the implementation to check. (But they are important
> > > > members of the ecosystem)
> > > >
> > > >
> > > > On Wed, Oct 22, 2025 at 6:38 AM Andrew Lamb wrote:
> > > >  
> > > > > > Someone has to add V2 data pages to
> > > > > > https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md
> > > > > > :)
> > > > >
> > > > > Your wish is my command: https://github.com/apache/parquet-site/pull/124
> > > > >
> > > > > As the format grows in popularity and momentum builds to evolve, I feel the
> > > > > content on the parquet.apache.org site could use refreshing / updating.
> > > > > So, while I had the site open, I made some other PRs to scratch various
> > > > > itches
> > > > >
> > > > > (I am absolutely 🎣 for someone to please review 🙏):
> > > > >
> > > > > 1. Add Variant/Geometry/Geography types to implementation status matrix:
> > > > > https://github.com/apache/parquet-site/pull/123
> > > > > 2. Improve introduction / overview, add more links to spec and
> > > > > implementation status: https://github.com/apache/parquet-site/pull/125
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Andrew
> > > > >
> > > > > On Wed, Oct 22, 2025 at 4:09 AM Antoine Pitrou wrote:
> > > > >  
> > > > > >
> > > > > > Hi Julien, hi all,
> > > > > >
> > > > > > On Mon, 20 Oct 2025 15:14:58 -0700
> > > > > > Julien Le Dem  wrote:  
> > > > > > >
> > > > > > > Another question from me:
> > > > > > >
> > > > > > > Since the goal is to not use compression at all in this case (no ZSTD)
> > > > > > > I'm assuming we would be using either:
> > > > > > > - the Data Page V1 with UNCOMPRESSED in the ColumnMetadata.column
> > > > > > > <https://github.com/apache/parquet-format/blob/786142e26740487930ddc3ec5e39d780bd930907/src/main/thrift/parquet.thrift#L887>
> > > > > > > field.
> > > > > > > - the Data Page V2 with false in the DataPageHeaderV2.is_compressed
> > > > > > > <https://github.com/apache/parquet-format/blob/786142e26740487930ddc3ec5e39d780bd930907/src/main/thrift/parquet.thrift#L746>
> > > > > > > field
> > > > > > > The second helping decide if we can selectively compress some pages if they
> > > > > > > are less compressed by the
> > > > > > > A few years ago there was a question on the support of the DATA_PAGE_V2 and
> > > > > > > I was curious to hear a refresh on how that's generally supported in
> > > > > > > Parquet implementations. The is_compressed field was exactly intended to
> > > > > > > avoid block compression when the encoding itself is good enough.
> > > > > >
> > > > > > Someone has to add V2 data pages to
> > > > > > https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md
> > > > > > :)
> > > > > >
> > > > > > C++, Java and Rust support them for sure. I feel like we should
> > > > > > probably default to V2 at some point.
> > > > > >
> > > > > > Also see https://github.com/apache/parquet-java/issues/3344 for Java.
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > Antoine.
> > > > > >
> > > > > >  
> > > > > > >
> > > > > > > Julien
> > > > > > >
> > > > > > > On Mon, Oct 20, 2025 at 11:57 AM Andrew Lamb wrote:
> > > > > > >
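
For readers following the is_compressed discussion quoted above, here is a minimal sketch of the per-page decision that the flag makes possible. Nothing in it is taken from a Parquet implementation: encode_page, decode_page and the min_saving threshold are hypothetical names, and zlib stands in for whichever codec the column is configured with (ZSTD, for example) only because it ships with Python.

```python
import os
import zlib


def encode_page(encoded_values: bytes, min_saving: float = 0.05):
    """Return (is_compressed, payload) for the values section of one page.

    Hypothetical helper: block compression is kept only when it shrinks
    the already-encoded bytes by at least `min_saving` (a fraction).
    """
    compressed = zlib.compress(encoded_values)
    if len(compressed) <= len(encoded_values) * (1.0 - min_saving):
        return True, compressed
    # The encoding alone was good enough: store the bytes as they are.
    return False, encoded_values


def decode_page(is_compressed: bool, payload: bytes) -> bytes:
    """Undo encode_page by consulting the per-page flag."""
    return zlib.decompress(payload) if is_compressed else payload


if __name__ == "__main__":
    repetitive = b"parquet" * 1000  # compresses well
    dense = os.urandom(4096)        # high entropy, like tightly encoded data
    for page in (repetitive, dense):
        flag, payload = encode_page(page)
        assert decode_page(flag, payload) == page
        print(flag, len(page), len(payload))
```

With Data Page V1 the codec is recorded once in the column-chunk metadata, so a writer has no per-page flag with which to make this choice; that is the difference the quoted question is pointing at.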

Re: Implementation status

2025-10-23 Thread Julien Le Dem
Do relevant people on that list who work at said vendors feel like adding
their respective columns?
BigQuery, Databricks, Dremio, Snowflake, ... ?
https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md

On Wed, Oct 22, 2025 at 10:05 PM Arnav Balyan 
wrote:

> +1, I can try to help crowdsource the list. Maybe we could use the public
> Slack channel (seems to have 100+ people)
>
> On Thu, Oct 23, 2025 at 5:57 AM Andrew Lamb 
> wrote:
>
> > I think it is a great idea -- I can certainly add the columns, but as you
> > say only people from those companies would be able to fill them out.
> >
> > Maybe if we added some columns that would add some (positive) pressure to
> > provide the information
> >
> > Andrew
> >
> > On Wed, Oct 22, 2025 at 2:50 PM Julien Le Dem wrote:
> >
> > > [forking into a new thread]
> > > Should we add columns for BigQuery, Databricks, Snowflake, Dremio, ...?
> > > I feel that this page is even more important for proprietary engines that
> > > we can not look at the implementation to check. (But they are important
> > > members of the ecosystem)
> > >
> > >
> > > On Wed, Oct 22, 2025 at 6:38 AM Andrew Lamb wrote:
> > >
> > > > > Someone has to add V2 data pages to
> > > > > https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md
> > > > > :)
> > > >
> > > > Your wish is my command: https://github.com/apache/parquet-site/pull/124
> > > >
> > > > As the format grows in popularity and momentum builds to evolve, I feel the
> > > > content on the parquet.apache.org site could use refreshing / updating.
> > > > So, while I had the site open, I made some other PRs to scratch various
> > > > itches
> > > >
> > > > (I am absolutely 🎣 for someone to please review 🙏):
> > > >
> > > > 1. Add Variant/Geometry/Geography types to implementation status matrix:
> > > > https://github.com/apache/parquet-site/pull/123
> > > > 2. Improve introduction / overview, add more links to spec and
> > > > implementation status: https://github.com/apache/parquet-site/pull/125
> > > >
> > > >
> > > > Thanks,
> > > > Andrew
> > > >
> > > > On Wed, Oct 22, 2025 at 4:09 AM Antoine Pitrou wrote:
> > > >
> > > > >
> > > > > Hi Julien, hi all,
> > > > >
> > > > > On Mon, 20 Oct 2025 15:14:58 -0700
> > > > > Julien Le Dem  wrote:
> > > > > >
> > > > > > Another question from me:
> > > > > >
> > > > > > Since the goal is to not use compression at all in this case (no ZSTD)
> > > > > > I'm assuming we would be using either:
> > > > > > - the Data Page V1 with UNCOMPRESSED in the ColumnMetadata.column
> > > > > > <https://github.com/apache/parquet-format/blob/786142e26740487930ddc3ec5e39d780bd930907/src/main/thrift/parquet.thrift#L887>
> > > > > > field.
> > > > > > - the Data Page V2 with false in the DataPageHeaderV2.is_compressed
> > > > > > <https://github.com/apache/parquet-format/blob/786142e26740487930ddc3ec5e39d780bd930907/src/main/thrift/parquet.thrift#L746>
> > > > > > field
> > > > > > The second helping decide if we can selectively compress some pages if they
> > > > > > are less compressed by the
> > > > > > A few years ago there was a question on the support of the DATA_PAGE_V2 and
> > > > > > I was curious to hear a refresh on how that's generally supported in
> > > > > > Parquet implementations. The is_compressed field was exactly intended to
> > > > > > avoid block compression when the encoding itself is good enough.
> > > > >
> > > > > Someone has to add V2 data pages to
> > > > > https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md
> > > > > :)
> > > > >
> > > > > C++, Java and Rust support them for sure. I feel like we should
> > > > > probably default to V2 at some point.
> > > > >
> > > > > Also see https://github.com/apache/parquet-java/issues/3344 for Java.
> > > > >
> > > > > Regards
> > > > >
> > > > > Antoine.
> > > > >
> > > > >
> > > > > >
> > > > > > Julien
> > > > > >
> > > > > > On Mon, Oct 20, 2025 at 11:57 AM Andrew Lamb wrote:
> > > > > >
> > > > > > > Thanks again Prateek and co for pushing this along!
> > > > > > >
> > > > > > >
> > > > > > > > 1. Design and write our own Parquet-ALP spec so that implementations
> > > > > > > > know exactly how to encode and represent data
> > > > > > >
> > > > > > > 100% agree with this (similar to what was done for ParquetVariant)
> > > > > > >
> > > > > > > > 2. I may be missing something, but the paper doesn't seem to mention
> > > > > > > > non-finite values (such as +/-Inf and NaNs).
> > > > > > >
> > > > > > > I think they are handled via the "Exception" mechanism. Vortex's ALP
> > > > > > > implementation (below)

Re: Implementation status

2025-10-22 Thread Arnav Balyan
+1, I can try to help crowdsource the list. Maybe we could use the public
Slack channel (seems to have 100+ people)

On Thu, Oct 23, 2025 at 5:57 AM Andrew Lamb  wrote:

> I think it is a great idea -- I can certainly add the columns, but as you
> say only people from those companies would be able to fill them out.
>
> Maybe if we added some columns that would add some (positive) pressure to
> provide the information
>
> Andrew
>
> On Wed, Oct 22, 2025 at 2:50 PM Julien Le Dem  wrote:
>
> > [forking into a new thread]
> > Should we add columns for BigQuery, Databricks, Snowflake, Dremio, ...?
> > I feel that this page is even more important for proprietary engines that
> > we can not look at the implementation to check. (But they are important
> > members of the ecosystem)
> >
> >
> > On Wed, Oct 22, 2025 at 6:38 AM Andrew Lamb 
> > wrote:
> >
> > > > Someone has to add V2 data pages to
> > > > https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md
> > > > :)
> > >
> > > Your wish is my command: https://github.com/apache/parquet-site/pull/124
> > >
> > > As the format grows in popularity and momentum builds to evolve, I feel the
> > > content on the parquet.apache.org site could use refreshing / updating.
> > > So, while I had the site open, I made some other PRs to scratch various
> > > itches
> > >
> > > (I am absolutely 🎣 for someone to please review 🙏):
> > >
> > > 1. Add Variant/Geometry/Geography types to implementation status matrix:
> > > https://github.com/apache/parquet-site/pull/123
> > > 2. Improve introduction / overview, add more links to spec and
> > > implementation status: https://github.com/apache/parquet-site/pull/125
> > >
> > >
> > > Thanks,
> > > Andrew
> > >
> > > On Wed, Oct 22, 2025 at 4:09 AM Antoine Pitrou wrote:
> > >
> > > >
> > > > Hi Julien, hi all,
> > > >
> > > > On Mon, 20 Oct 2025 15:14:58 -0700
> > > > Julien Le Dem  wrote:
> > > > >
> > > > > Another question from me:
> > > > >
> > > > > Since the goal is to not use compression at all in this case (no ZSTD)
> > > > > I'm assuming we would be using either:
> > > > > - the Data Page V1 with UNCOMPRESSED in the ColumnMetadata.column
> > > > > <https://github.com/apache/parquet-format/blob/786142e26740487930ddc3ec5e39d780bd930907/src/main/thrift/parquet.thrift#L887>
> > > > > field.
> > > > > - the Data Page V2 with false in the DataPageHeaderV2.is_compressed
> > > > > <https://github.com/apache/parquet-format/blob/786142e26740487930ddc3ec5e39d780bd930907/src/main/thrift/parquet.thrift#L746>
> > > > > field
> > > > > The second helping decide if we can selectively compress some pages if they
> > > > > are less compressed by the
> > > > > A few years ago there was a question on the support of the DATA_PAGE_V2 and
> > > > > I was curious to hear a refresh on how that's generally supported in
> > > > > Parquet implementations. The is_compressed field was exactly intended to
> > > > > avoid block compression when the encoding itself is good enough.
> > > >
> > > > Someone has to add V2 data pages to
> > > > https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md
> > > > :)
> > > >
> > > > C++, Java and Rust support them for sure. I feel like we should
> > > > probably default to V2 at some point.
> > > >
> > > > Also see https://github.com/apache/parquet-java/issues/3344 for Java.
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > > >
> > > > >
> > > > > Julien
> > > > >
> > > > > On Mon, Oct 20, 2025 at 11:57 AM Andrew Lamb wrote:
> > > > >
> > > > > > Thanks again Prateek and co for pushing this along!
> > > > > >
> > > > > >
> > > > > > > 1. Design and write our own Parquet-ALP spec so that implementations
> > > > > > > know exactly how to encode and represent data
> > > > > >
> > > > > > 100% agree with this (similar to what was done for ParquetVariant)
> > > > > >
> > > > > > > 2. I may be missing something, but the paper doesn't seem to mention
> > > > > > > non-finite values (such as +/-Inf and NaNs).
> > > > > >
> > > > > > I think they are handled via the "Exception" mechanism. Vortex's ALP
> > > > > > implementation (below) does appear to handle finite numbers[2]
> > > > > >
> > > > > > > 3. It seems there is a single implementation, which is the one published
> > > > > > > together with the paper. It is not obvious that it will be
> > > > > > > maintained in the future, and reusing it is probably not an option for
> > > > > > > non-C++ Parquet implementations
> > > > > >
> > > > > > My understanding from the call was that Prateek and team re-implemented
> > > > > > ALP (did not use the implementation from CWI[3]) but that would be good to
> > > > > > confirm.
> > > > > >
> > > > > > There i

Re: Implementation status

2025-10-22 Thread Andrew Lamb
I think it is a great idea -- I can certainly add the columns, but as you
say only people from those companies would be able to fill them out.

Maybe if we added some columns that would add some (positive) pressure to
provide the information

Andrew

On Wed, Oct 22, 2025 at 2:50 PM Julien Le Dem  wrote:

> [forking into a new thread]
> Should we add columns for BigQuery, Databricks, Snowflake, Dremio, ...?
> I feel that this page is even more important for proprietary engines that
> we can not look at the implementation to check. (But they are important
> members of the ecosystem)
>
>
> On Wed, Oct 22, 2025 at 6:38 AM Andrew Lamb 
> wrote:
>
> > > Someone has to add V2 data pages to
> > > https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md
> > > :)
> >
> > Your wish is my command: https://github.com/apache/parquet-site/pull/124
> >
> > As the format grows in popularity and momentum builds to evolve, I feel the
> > content on the parquet.apache.org site could use refreshing / updating.
> > So, while I had the site open, I made some other PRs to scratch various
> > itches
> >
> > (I am absolutely 🎣 for someone to please review 🙏):
> >
> > 1. Add Variant/Geometry/Geography types to implementation status matrix:
> > https://github.com/apache/parquet-site/pull/123
> > 2. Improve introduction / overview, add more links to spec and
> > implementation status: https://github.com/apache/parquet-site/pull/125
> >
> >
> > Thanks,
> > Andrew
> >
> > On Wed, Oct 22, 2025 at 4:09 AM Antoine Pitrou wrote:
> >
> > >
> > > Hi Julien, hi all,
> > >
> > > On Mon, 20 Oct 2025 15:14:58 -0700
> > > Julien Le Dem  wrote:
> > > >
> > > > Another question from me:
> > > >
> > > > Since the goal is to not use compression at all in this case (no ZSTD)
> > > > I'm assuming we would be using either:
> > > > - the Data Page V1 with UNCOMPRESSED in the ColumnMetadata.column
> > > > <https://github.com/apache/parquet-format/blob/786142e26740487930ddc3ec5e39d780bd930907/src/main/thrift/parquet.thrift#L887>
> > > > field.
> > > > - the Data Page V2 with false in the DataPageHeaderV2.is_compressed
> > > > <https://github.com/apache/parquet-format/blob/786142e26740487930ddc3ec5e39d780bd930907/src/main/thrift/parquet.thrift#L746>
> > > > field
> > > > The second helping decide if we can selectively compress some pages if they
> > > > are less compressed by the
> > > > A few years ago there was a question on the support of the DATA_PAGE_V2 and
> > > > I was curious to hear a refresh on how that's generally supported in
> > > > Parquet implementations. The is_compressed field was exactly intended to
> > > > avoid block compression when the encoding itself is good enough.
> > >
> > > Someone has to add V2 data pages to
> > > https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md
> > > :)
> > >
> > > C++, Java and Rust support them for sure. I feel like we should
> > > probably default to V2 at some point.
> > >
> > > Also see https://github.com/apache/parquet-java/issues/3344 for Java.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > >
> > > > Julien
> > > >
> > > > On Mon, Oct 20, 2025 at 11:57 AM Andrew Lamb wrote:
> > > >
> > > > > Thanks again Prateek and co for pushing this along!
> > > > >
> > > > >
> > > > > > 1. Design and write our own Parquet-ALP spec so that implementations
> > > > > > know exactly how to encode and represent data
> > > > >
> > > > > 100% agree with this (similar to what was done for ParquetVariant)
> > > > >
> > > > > > 2. I may be missing something, but the paper doesn't seem to mention
> > > > > > non-finite values (such as +/-Inf and NaNs).
> > > > >
> > > > > I think they are handled via the "Exception" mechanism. Vortex's ALP
> > > > > implementation (below) does appear to handle finite numbers[2]
> > > > >
> > > > > > 3. It seems there is a single implementation, which is the one published
> > > > > > together with the paper. It is not obvious that it will be
> > > > > > maintained in the future, and reusing it is probably not an option for
> > > > > > non-C++ Parquet implementations
> > > > >
> > > > > My understanding from the call was that Prateek and team re-implemented
> > > > > ALP (did not use the implementation from CWI[3]) but that would be good to
> > > > > confirm.
> > > > >
> > > > > There is also a Rust implementation of ALP[1] that is part of the Vortex
> > > > > file format implementation. I have not reviewed it to see if it deviates
> > > > > from the algorithm presented in the paper.
> > > > >
> > > > > Andrew
> > > > >
> > > > > [1]: https://github.com/vortex-data/vortex/blob/534821969201b91985a8735b23fc0c415a425a56/encodings/alp/src/lib.rs
> > > > > [2]:
> > > > >
> > > > >
> > >
> >
> https://github.com