Does the scalding-parquet library support reading Snappy-compressed Parquet files?
I am trying to read in Parquet files of the form:
> hadoop jar parquet-tools-1.10.1.jar schema /my/path/part-00000.snappy.parquet
message spark_schema {
  optional fixed_len_byte_array(8) fieldName1 (DECIMAL(18,0));
  optional fixed_len_byte_array(2) fieldName2 (DECIMAL(4,0));
  optional binary fieldName3 (UTF8);
}
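
In case it is relevant, my understanding (from the Parquet format spec, not from anything scalding-parquet-specific) is that a DECIMAL stored as fixed_len_byte_array holds the two's-complement, big-endian unscaled value, so outside of scalding I can decode the raw bytes with a helper like this (just a sketch; decodeDecimal is my own function, not part of the library):

import java.math.{BigDecimal => JBigDecimal, BigInteger}

// Two's-complement, big-endian unscaled value; both decimal fields above have scale 0.
def decodeDecimal(bytes: Array[Byte], scale: Int = 0): JBigDecimal =
  new JBigDecimal(new BigInteger(bytes), scale)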
I am using the following code:
val fields = new Fields("fieldName1", "fieldName2", "fieldName3")

ParquetTupleSource(fields, inputPath)
  .read
  .write(Tsv(outputPath))
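
What I was hoping to do inside my Job, assuming the two decimal fields come through the pipe as raw byte arrays (that assumption may well be wrong, which is part of my question), is decode them with the helper sketched above before writing, e.g.:

ParquetTupleSource(fields, inputPath)
  .read
  // Assumes fieldName1/fieldName2 arrive as Array[Byte]; decodeDecimal is my
  // own helper from above, not a scalding-parquet API.
  .map(('fieldName1, 'fieldName2) -> ('decimal1, 'decimal2)) {
    pair: (Array[Byte], Array[Byte]) =>
      (decodeDecimal(pair._1), decodeDecimal(pair._2))
  }
  .write(Tsv(outputPath))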
The fieldName3 column produces normal output that matches the input strings; however, the fieldName1 and fieldName2 columns produce garbage output.
Does the scalding-parquet library support Snappy-compressed Parquet files? Does it support reading the fixed_len_byte_array type, and if so, how do I specify this in the TypedParquet setting?
Thank you for your help!
Best,
Yuri