I don't have a sense of how conservative Parquet users generally are. Is it worth adding an LZ4_FRAMED compression option to the Parquet format, or would people just not use it?
Regards

Antoine.


On Tue, 30 Jun 2020 14:33:17 +0200
"Uwe L. Korn" <uw...@xhochy.com> wrote:
> I'm also in favor of disabling support for now. Having to deal with broken
> files or the detection of various incompatible implementations in the
> long-term will harm more than not supporting LZ4 for a while. Snappy is
> generally more used than LZ4 in this category as it has been available since
> the inception of Parquet and thus should be considered as a viable
> alternative.
>
> Cheers
> Uwe
>
> On Mon, Jun 29, 2020, at 11:48 PM, Wes McKinney wrote:
> > On Thu, Jun 25, 2020 at 3:31 AM Antoine Pitrou <anto...@python.org> wrote:
> > >
> > >
> > > On 25/06/2020 at 00:02, Wes McKinney wrote:
> > > > hi folks,
> > > >
> > > > (cross-posting to dev@arrow and dev@parquet since there are
> > > > stakeholders in both places)
> > > >
> > > > It seems there are still problems at least with the C++ implementation
> > > > of LZ4 compression in Parquet files
> > > >
> > > > https://issues.apache.org/jira/browse/PARQUET-1241
> > > > https://issues.apache.org/jira/browse/PARQUET-1878
> > >
> > > I don't have any particular opinion on how to solve the LZ4 issue, but
> > > I'd like to mention that LZ4 and ZStandard are the two most efficient
> > > compression algorithms available, and they span different parts of the
> > > speed/compression spectrum, so it would be a pity to disable one of them.
> > >
> >
> > It's true, however I think it's worse to write LZ4-compressed files
> > that cannot be read by other Parquet implementations (if that's what's
> > happening as I understand it?). If we are indeed shipping something
> > broken then we either should fix it or disable it until it can be
> > fixed.
> >
> > > Regards
> > >
> > > Antoine.
> > >