@Oscar I am able to read in the data, but the fixed_len_byte_array / DECIMAL fields produce garbage results, so I was wondering whether it has to do with snappy compression. The binary / UTF8 fields read correctly.
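For reference, my understanding is that Parquet stores a DECIMAL in a fixed_len_byte_array as the unscaled integer in big-endian two's-complement bytes, so if I can get the field back as a raw Array[Byte] I assume something like the sketch below would rebuild the value (the helper name is just for illustration; the scale comes from the schema):

import java.math.{BigDecimal => JBigDecimal, BigInteger}

// Parquet encodes DECIMAL(precision, scale) in a fixed_len_byte_array as the
// unscaled value in big-endian two's-complement form; combine it with the
// declared scale to recover the decimal.
def decodeFixedLenDecimal(bytes: Array[Byte], scale: Int): JBigDecimal =
  new JBigDecimal(new BigInteger(bytes), scale)

// e.g. for fieldName1 (DECIMAL(18,0), 8 bytes): decodeFixedLenDecimal(rawBytes, 0)

Is that the right approach, or does the library handle this conversion somewhere?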
Is there an example of how to read a fixed_len_byte_array / DECIMAL field? Thank you.

On Wednesday, April 24, 2019 at 8:05:10 PM UTC-4, [email protected] wrote:
>
> Does the scalding-parquet library support reading snappy-compressed Parquet files?
>
> I am trying to read Parquet files of the form:
>
> hadoop jar parquet-tools-1.10.1.jar schema /my/path/part-00000.snappy.parquet
> message spark_schema {
>   optional fixed_len_byte_array(8) fieldName1 (DECIMAL(18,0));
>   optional fixed_len_byte_array(2) fieldName2 (DECIMAL(4,0));
>   optional binary fieldName3 (UTF8);
> }
>
> I am using the following code:
>
> val fields = new Fields("fieldName1", "fieldName2", "fieldName3")
> ParquetTupleSource(fields, inputPath)
>   .read
>   .write(Tsv(outputPath))
>
> The fieldName3 column produces normal output that matches the input string; however, the fieldName1 and fieldName2 columns produce garbage output.
> Does the scalding-parquet library support snappy-compressed Parquet files?
> Does it support reading the fixed_len_byte_array type, and how do I specify this in the TypedParquet setting?
>
> Thank you for your help!
> Best,
> Yuri
