Hello,

We are using the default compression codec for Parquet when we store our
dataframes. The dataframe has a StringType column whose values can be up to
several MB in size.

The odd thing is that once it's stored, we can browse the file content
with a plain text editor and see large portions of the string contents
uncompressed.

If we use parquet-tools to browse the metadata, it reports the column as
GZIP with a compression ratio of 2.6x, but that doesn't seem right given
how much of the text is readable in the raw file.

Does anybody know what's going on?
