On Mon, Jul 6, 2020 at 11:08 AM Antoine Pitrou wrote:
>
>
> On 06/07/2020 at 17:57, Steve Kim wrote:
> > The Parquet format specification is ambiguous about the exact details of
> > LZ4 compression. However, the *de facto* reference implementation in Java
> > (parquet-mr) uses the Hadoop LZ4 codec.
> Would that keep compatibility with existing files produced by Parquet C++?
Changing the lz4 implementation to be compatible with parquet-mr/hadoop
would break compatibility with any existing files that were written by
Parquet C++ using lz4 compression. I believe that it is not possible to
reliab
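To illustrate why telling the two layouts apart is hard, here is a minimal Python sketch of a heuristic check. It assumes the Hadoop layout is a sequence of records of the form [4-byte big-endian uncompressed size][4-byte big-endian compressed size][raw LZ4 block]. The function `looks_hadoop_framed` is hypothetical, and `expected_uncompressed` stands for the uncompressed size a reader would take from the Parquet page header. A raw LZ4 block whose bytes happen to parse as consistent lengths could in principle still pass, so a positive result is only a hint.

```python
import struct


def looks_hadoop_framed(buf: bytes, expected_uncompressed: int) -> bool:
    """Heuristically check whether buf uses Hadoop-style LZ4 framing.

    Hypothetical helper (not an Arrow or parquet-mr API). Assumed layout:
    repeated [4-byte BE uncompressed size][4-byte BE compressed size][block].
    """
    if len(buf) < 8:
        return False  # too short to hold even one header
    offset = 0
    total_uncompressed = 0
    while offset < len(buf):
        if len(buf) - offset < 8:
            return False  # trailing bytes too short for another header
        uncompressed, compressed = struct.unpack_from(">II", buf, offset)
        offset += 8
        if compressed > len(buf) - offset:
            return False  # declared block runs past the end of the buffer
        offset += compressed
        total_uncompressed += uncompressed
    # The records tiled the buffer exactly; they must also account for
    # every byte the page header says the decompressed page contains.
    return total_uncompressed == expected_uncompressed
```

The check is deliberately strict (exact tiling plus a matching total size), which makes false positives unlikely but not impossible.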
The Parquet format specification is ambiguous about the exact details of
LZ4 compression. However, the *de facto* reference implementation in Java
(parquet-mr) uses the Hadoop LZ4 codec.

I think that it is important for Parquet C++ to have compatibility and
feature parity with parquet-mr when possible.
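For reference, a minimal Python sketch of the writer side of the Hadoop LZ4 layout, assuming each block is written as a 4-byte big-endian uncompressed size, a 4-byte big-endian compressed size, and then a single raw LZ4 block. That is the common case; this sketch does not try to cover every variant Hadoop can emit. The helper `hadoop_lz4_frame` is hypothetical, and producing the compressed bytes themselves is left to an LZ4 library.

```python
import struct
from typing import Iterable, Tuple


def hadoop_lz4_frame(blocks: Iterable[Tuple[int, bytes]]) -> bytes:
    """Wrap already-compressed LZ4 blocks in Hadoop-style framing.

    Hypothetical helper. Each item is (uncompressed_size, compressed_block);
    assumes one compressed chunk per block.
    """
    out = bytearray()
    for uncompressed_size, block in blocks:
        # 4-byte big-endian uncompressed size, then 4-byte big-endian
        # compressed size, then the raw LZ4 block bytes.
        out += struct.pack(">II", uncompressed_size, len(block))
        out += block
    return bytes(out)
```

For example, a 10-byte input whose compressed form is a (placeholder) 3-byte payload `abc` would be framed as `00 00 00 0a 00 00 00 03` followed by `abc`. A plain LZ4 block writer emits only the payload, which is where the incompatibility comes from.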
I don't have a sense of how conservative Parquet users generally are.
Is it worth adding an LZ4_FRAMED compression option in the Parquet
format, or would people just not use it?
Regards
Antoine.
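Assuming LZ4_FRAMED would mean the standard LZ4 frame format, one attraction is that every frame begins with the magic number 0x184D2204, stored little-endian, so a reader can recognize it without guessing at length headers. A minimal Python sketch of that check; `is_lz4_frame` is a hypothetical helper that only inspects the magic number and does not validate the rest of the frame header. A raw block could in principle begin with the same four bytes, so this is strong evidence rather than proof.

```python
import struct

# Magic number from the LZ4 frame format description.
LZ4_FRAME_MAGIC = 0x184D2204


def is_lz4_frame(buf: bytes) -> bool:
    """Return True if buf starts with the LZ4 frame magic number.

    Hypothetical helper; checks only the first four bytes.
    """
    if len(buf) < 4:
        return False
    (magic,) = struct.unpack_from("<I", buf, 0)  # magic is little-endian
    return magic == LZ4_FRAME_MAGIC
```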
On Tue, 30 Jun 2020 14:33:17 +0200
"Uwe L. Korn" wrote:
I'm also in favor of disabling support for now. Having to deal with broken
files or the detection of various incompatible implementations in the long term
will do more harm than not supporting LZ4 for a while. Snappy is generally more
widely used than LZ4 in this category as it has been available since th
On Thu, Jun 25, 2020 at 3:31 AM Antoine Pitrou wrote:
>
>
> On 25/06/2020 at 00:02, Wes McKinney wrote:
> > hi folks,
> >
> > (cross-posting to dev@arrow and dev@parquet since there are
> > stakeholders in both places)
> >
> > It seems there are still problems at least with the C++ implementation
hi folks,
(cross-posting to dev@arrow and dev@parquet since there are
stakeholders in both places)
It seems there are still problems at least with the C++ implementation
of LZ4 compression in Parquet files:
https://issues.apache.org/jira/browse/PARQUET-1241
https://issues.apache.org/jira/browse/P