James,

> -----Original Message-----
> From: James Turton <[email protected]>

> Zip is a file format, not a codec.  Various codecs are employed in Zip
> archives, most commonly DEFLATE.  The different set of codecs that are
> supported in the Parquet file format are described in
> https://github.com/apache/parquet-format/blob/master/Compression.md.

Thanks for the link.  The problem is that the codec and the file format are
often treated as synonymous, so people like me don't make the distinction.
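
To make the container-versus-codec distinction concrete, here is a minimal
sketch using Python's standard zipfile module (Python is only my choice for
illustration, nothing in this thread depends on it).  It writes the same
made-up payload into three in-memory Zip archives, each with a different
compression method, and reads back the method recorded for the entry.  The
payload and the entry name "data.csv" are hypothetical.

```python
import io
import zipfile

# Hypothetical payload: repetitive text, so it compresses well.
PAYLOAD = b"drill,parquet,codec\n" * 1000

# Compression methods defined by the Zip format that zipfile can write.
METHODS = {
    "stored": zipfile.ZIP_STORED,      # no compression at all
    "deflate": zipfile.ZIP_DEFLATED,   # the method most Zip tools default to
    "bzip2": zipfile.ZIP_BZIP2,
}


def zip_with(method: int) -> zipfile.ZipInfo:
    """Write PAYLOAD into an in-memory Zip archive and return the entry metadata."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr("data.csv", PAYLOAD)
    with zipfile.ZipFile(buf) as zf:
        return zf.getinfo("data.csv")


results = {name: zip_with(method) for name, method in METHODS.items()}

for name, info in results.items():
    print(f"{name:8s} method={info.compress_type:2d} "
          f"{info.file_size} -> {info.compress_size} bytes")
```

All three outputs are valid Zip archives; only the codec applied to the entry
differs, which is the sense in which "zip" alone does not name a codec.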

It doesn't help that the Drill options use the ambiguous term "Compression
Type" rather than "codec".


> Since, then, Zip is not sensible or possible inside a Parquet file, the only
> way to effect what you describe would be to embed a Parquet file inside a Zip
> archive.  This would be perverse and misguided but possibly still queryable
> since Drill might transparently do the right things to decode it anyway.
> Using a supported codec within the Parquet file format and forgetting about
> Zip is certainly a better approach.

It might seem perverse to you.  However, given that "zip compression" support
for text files was added in v1.17.0 (DRILL-5674)*, I think it is reasonable to
ask about the same support for Parquet files.

* There were no details on which codecs are supported.


>  If you want compression ratios comparable to
> those found in Zip files then you would choose GZip and pay with CPU
> cycles.  When Drill gains support for Zstandard there will be little reason to
> choose anything else.
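
On the Zip/GZip comparison: the usual Zip method and gzip are both built on
DEFLATE, so similar compressed sizes are to be expected.  A rough sketch with
Python's standard library, using a made-up CSV-like payload and a hypothetical
entry name, compares the size of a DEFLATE entry inside a Zip archive with the
size of a gzip stream of the same bytes.  It says nothing about how Drill or
Parquet apply GZip to column data.

```python
import gzip
import io
import zipfile

# Hypothetical CSV-like payload; any reasonably compressible bytes would do.
PAYLOAD = "".join(f"{i},{i * i},row-{i % 7}\n" for i in range(5000)).encode()


def zip_deflate_size(data: bytes, level: int = 9) -> int:
    """Compressed size of a single DEFLATE entry in an in-memory Zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED, compresslevel=level) as zf:
        zf.writestr("data.csv", data)
        return zf.getinfo("data.csv").compress_size


def gzip_size(data: bytes, level: int = 9) -> int:
    """Total size of a gzip stream: DEFLATE data plus gzip header and trailer."""
    return len(gzip.compress(data, compresslevel=level))


zip_size = zip_deflate_size(PAYLOAD)
gz_size = gzip_size(PAYLOAD)

print(f"original:            {len(PAYLOAD)} bytes")
print(f"zip (deflate entry): {zip_size} bytes")
print(f"gzip:                {gz_size} bytes")
```

The two compressed sizes should differ only by a small amount of framing
overhead, since the underlying DEFLATE data is produced the same way.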

This is another area of confusion: if Parquet supports ZSTD (as well as other
codecs), why doesn't Drill?

Isn't there a standard "Parquet library", covering all of the format's
"features", that any project implementing Parquet file support would use?



> 
> On 2021/06/17 18:59, Leyne, Sean wrote:
> > Luoc,
> >
> >>    Could you please tell me first which case you are talking about?
> >> Only write(CTAS syntax) or read(SELECT)?
> > Really both, since you need a mechanism to create the zip'd parquet file to
> > begin with.  Having to create a special/side process to zip the file outside
> > of drill would be ... awkward.
> >
> >
> > Sean
> >
> >>> On 16 June 2021, at 02:26, Leyne, Sean
> >> <[email protected]> wrote:
> >>> All,
> >>>
> >>> The documentation describes gzip/gz compression as supported for
> >>> text files, and snappy and gzip as supported for parquet files.
> >>> I have also read that zip compression was added (though not
> >>> documented) for text files.
> >>>
> >>> But is zip also supported for parquet files?
> >>>
> >>> What about support for other compression algorithms/methods?  LZ4?
> >>> Bzip2? zstd??
> >>>
> >>> Sean
> >>>
> >>>
> >>>
