Hello, we are using the default compression codec for Parquet when we store our dataframes. The dataframe has a StringType column whose values can be up to several MB in size.
The strange thing is that once it's written, we can open the file in a plain text editor and see large portions of the string contents in readable form, apparently uncompressed. If we use parquet-tools to inspect the metadata, it reports the column as GZIP with a compression ratio of 2.6x, but that doesn't seem right. Does anybody know what's going on?
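For reference, here is a minimal sketch (stdlib only, not Parquet-specific) of why this is surprising. Text that really went through GZIP normally doesn't survive as readable runs, because deflate replaces repeats with back-references and Huffman-codes the remaining bytes. The sample string, the 64-byte probe size and the helper name `plaintext_visible` are all made up for illustration. Note that a Parquet file also has parts the column codec does not compress, such as page headers and the footer metadata (which can hold min/max column statistics), so readable text in the file doesn't by itself mean the data pages are uncompressed.

```python
import gzip


def plaintext_visible(data: bytes, probe: bytes) -> bool:
    """Return True if `probe` appears verbatim somewhere in `data`."""
    return probe in data


# Hypothetical stand-in for one large, compressible string value from the column.
value = ("lorem ipsum dolor sit amet " * 40000).encode("utf-8")

# Compress it the way a GZIP codec would (deflate inside a gzip wrapper).
compressed = gzip.compress(value)

# Compression ratio = uncompressed size / compressed size.
ratio = len(value) / len(compressed)

# Take a short slice of the original text and look for it in both buffers.
probe = value[:64]

print(f"ratio: {ratio:.1f}x")
print("probe visible in raw value:", plaintext_visible(value, probe))
print("probe visible after GZIP:", plaintext_visible(compressed, probe))
```

To try the same check on a real file, you could read its bytes with `open(path, "rb").read()` and pass them to `plaintext_visible` together with a snippet you can see in the text editor.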