Compression is generally supported by Hadoop directly, and all input formats
can use it. Those compression options are configured in Hadoop via
Configuration (string key/value pairs).
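
A minimal sketch of what that looks like in a Scalding job (the
"parquet.compression" key is the standard Parquet output option; the job name
is a placeholder):

import com.twitter.scalding._

class MyParquetJob(args: Args) extends Job(args) {
  // Hadoop/Parquet compression options are plain string key -> value pairs
  // layered onto the job's Hadoop Configuration.
  override def config: Map[AnyRef, AnyRef] =
    super.config ++ Map("parquet.compression" -> "SNAPPY")

  // ... rest of the job elided ...
}

For reads no extra setting should be needed: the codec used for each column
chunk is recorded in the Parquet file footer, so the reader picks it up
automatically.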

Did you try to run with snappy and hit an error?

We use snappy with Parquet at Stripe.
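
For the DECIMAL columns in the question below: a FIXED_LEN_BYTE_ARRAY DECIMAL
stores the unscaled value as big-endian two's-complement bytes, so writing the
raw tuple out to Tsv will look like garbage rather than a number. A minimal
sketch of decoding such a value by hand (decodeDecimal is a hypothetical
helper, not part of scalding-parquet; the scale comes from the column's
schema, 0 for both columns below):

import java.math.{BigDecimal => JBigDecimal, BigInteger}

// Interpret the fixed-length bytes as a big-endian two's-complement unscaled
// value, then attach the schema's scale to get the decimal.
def decodeDecimal(bytes: Array[Byte], scale: Int): JBigDecimal =
  new JBigDecimal(new BigInteger(bytes), scale)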

On Wed, Apr 24, 2019 at 17:05 ybrovman via Scalding Development <
[email protected]> wrote:

> Does the scalding-parquet library support reading snappy-compressed Parquet
> files?
>
> I am trying to read in Parquet files of the form:
> > hadoop jar parquet-tools-1.10.1.jar schema
> /my/path/part-00000.snappy.parquet
> message spark_schema {
>   optional fixed_len_byte_array(8) fieldName1 (DECIMAL(18,0));
>   optional fixed_len_byte_array(2) fieldName2 (DECIMAL(4,0));
>   optional binary fieldName3 (UTF8);
> }
>
> I am using the following code:
> val fields = new Fields("fieldName1","fieldName2","fieldName3")
> ParquetTupleSource(fields, inputPath)
>   .read
>   .write(Tsv(outputPath))
>
> The fieldName3 column produces normal output that matches the input string;
> however, the fieldName1 and fieldName2 columns produce garbage output.
> Does the scalding-parquet library support snappy-compressed Parquet files?
> Does it support reading the fixed_len_byte_array type, and how do I specify
> this in the TypedParquet setting?
>
> Thank you for your help!
> Best,
> Yuri
>

