Jörn,
I agree with you, but the vendor is a little difficult to work with. For now, I
will try to decompress it from S3 and save it plainly into HDFS. If someone
already has this example, please let me know.
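For what it's worth, a minimal sketch of the decompress-then-store step (file names and paths are placeholders, not the vendor's actual layout):

```python
import gzip
import shutil

def gunzip_file(src_path: str, dst_path: str) -> None:
    """Decompress a vendor-gzipped Parquet file so Spark can read it directly.

    Spark's Parquet reader expects a plain .parquet file; a whole-file gzip
    wrapper on top of it is not transparently handled, so we strip it first.
    """
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, no full read into memory

# Example (hypothetical paths): after pulling the object down from S3,
# decompress it, push the plain file to HDFS, and read it as Parquet:
#   gunzip_file("/tmp/data.parquet.gz", "/tmp/data.parquet")
#   # hdfs dfs -put /tmp/data.parquet /data/vendor/
#   df = spark.read.parquet("hdfs:///data/vendor/data.parquet")
```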
Cheers,
Ben
> On Feb 13, 2017, at 9:50 AM, Jörn Franke wrote:
>
> Your vendor should use the parquet internal compression and not take a parquet
> file and gzip it.
> On 13 Feb 2017, at 18:48, Benjamin Kim wrote:
>
> We are receiving files from an outside vendor who creates a Parquet data file
> and Gzips it before delivery. Does anyone know how to Gunzip the file in Spark
> and inject the Parquet data into a DataFrame? I thought using sc.textFile or
> sc.wholeTextFiles would automatically Gunzip the file, but I'm