@Oscar I am able to read in the data, but the fixed_len_byte_array / DECIMAL fields produce garbage results, so I was wondering whether it has to do with snappy compression. The binary / UTF8 fields read correctly.
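For reference, my understanding is that Parquet stores a DECIMAL in a fixed_len_byte_array as the unscaled integer in big-endian two's-complement bytes, so if I can get the field back as a raw Array[Byte] I assume something like the sketch below would rebuild the value (the helper name is just for illustration; the scale comes from the schema):

import java.math.{BigDecimal => JBigDecimal, BigInteger}

// Parquet encodes DECIMAL(precision, scale) in a fixed_len_byte_array as the
// unscaled value in big-endian two's-complement form; combine it with the
// declared scale to recover the decimal.
def decodeFixedLenDecimal(bytes: Array[Byte], scale: Int): JBigDecimal =
  new JBigDecimal(new BigInteger(bytes), scale)

// e.g. for fieldName1 (DECIMAL(18,0), 8 bytes): decodeFixedLenDecimal(rawBytes, 0)

Is that the right approach, or does the library handle this conversion somewhere?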
Is there an example of how to read a fixed_len_byte_array / DECIMAL field? Thank you.

On Wednesday, April 24, 2019 at 8:05:10 PM UTC-4, [email protected] wrote:
>
> Does the scalding-parquet library support reading snappy-compressed Parquet files?
>
> I am trying to read Parquet files of the form:
>
> hadoop jar parquet-tools-1.10.1.jar schema /my/path/part-00000.snappy.parquet
> message spark_schema {
>   optional fixed_len_byte_array(8) fieldName1 (DECIMAL(18,0));
>   optional fixed_len_byte_array(2) fieldName2 (DECIMAL(4,0));
>   optional binary fieldName3 (UTF8);
> }
>
> I am using the following code:
>
> val fields = new Fields("fieldName1", "fieldName2", "fieldName3")
> ParquetTupleSource(fields, inputPath)
>   .read
>   .write(Tsv(outputPath))
>
> The fieldName3 column produces normal output that matches the input string; however, the fieldName1 and fieldName2 columns produce garbage output.
> Does the scalding-parquet library support snappy-compressed Parquet files?
> Does it support reading the fixed_len_byte_array type, and how do I specify this in the TypedParquet setting?
>
> Thank you for your help!
> Best,
> Yuri
