Parquet handles data encoding / compression a little differently from most formats: it doesn't write a file and then compress the entire file's bytes. Instead, it compresses parts of the file individually (each page is compressed separately). I don't know off the top of my head what would cause this, though, and I think you might get a better answer to this question on the Parquet mailing list. It would help to show all the settings used and, specifically, how the data appears to be corrupt / garbage.
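One thing that may be worth checking, since only the DECIMAL columns look wrong: the Parquet format stores a DECIMAL backed by fixed_len_byte_array as the unscaled integer value in big-endian two's complement, so printing those bytes as if they were a string would look like garbage regardless of compression. Below is a minimal sketch of decoding such a value, assuming you can already get the raw bytes for the field as an `Array[Byte]` (how to pull them out of the tuple is not shown here and depends on the library). The `DecimalDecoding` object and `decodeDecimal` name are made up for illustration and are not part of scalding-parquet.

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger}

// Hypothetical helper, not part of scalding-parquet.
object DecimalDecoding {
  /** Decode a Parquet DECIMAL stored as fixed_len_byte_array.
    *
    * @param bytes raw field bytes: the unscaled value, big-endian two's complement
    * @param scale the scale from the schema, e.g. 0 for DECIMAL(18,0)
    */
  def decodeDecimal(bytes: Array[Byte], scale: Int): JBigDecimal =
    // BigInteger(byte[]) interprets its argument as big-endian two's complement.
    new JBigDecimal(new BigInteger(bytes), scale)
}

object DecimalDecodingExample extends App {
  import DecimalDecoding.decodeDecimal

  // A 2-byte value as it could appear in a fixed_len_byte_array(2) DECIMAL(4,0) column.
  println(decodeDecimal(Array[Byte](0x04, 0xD2.toByte), 0)) // 1234

  // Negative values rely on the sign bit of the first byte.
  println(decodeDecimal(Array[Byte](0xFF.toByte, 0xFE.toByte), 0)) // -2
}
```

If the decoded numbers match what Spark wrote, the snappy compression is probably not the culprit and the issue is only how the bytes are being interpreted on read.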

On Thu, Apr 25, 2019 at 10:07 AM ybrovman via Scalding Development <[email protected]> wrote:

> @Oscar
> I am able to read in the data but the fixed_len_byte_array / DECIMAL type
> fields produce garbage results, so I was wondering if it had to do with
> snappy compression. The binary / UTF8 fields read correctly.
>
> Is there an example of how to read in the fixed_len_byte_array / DECIMAL type
> field?
> Thank you
>
> On Wednesday, April 24, 2019 at 8:05:10 PM UTC-4, [email protected] wrote:
>>
>> Does scalding-parquet library support reading in snappy compressed
>> Parquet files?
>>
>> I am trying to read in Parquet files of the form:
>> > hadoop jar parquet-tools-1.10.1.jar schema /my/path/part-00000.snappy.parquet
>> message spark_schema {
>>   optional fixed_len_byte_array(8) fieldName1 (DECIMAL(18,0));
>>   optional fixed_len_byte_array(2) fieldName2 (DECIMAL(4,0));
>>   optional binary fieldName3 (UTF8);
>> }
>>
>> I am using the following code:
>> val fields = new Fields("fieldName1","fieldName2","fieldName3")
>> ParquetTupleSource(fields, inputPath)
>>   .read
>>   .write(Tsv(outputPath))
>>
>> The fieldName3 column output produces normal output that matches the
>> input string, however, fieldName1 and fieldName2 columns produce garbage
>> output. Does scalding-parquet library support snappy compressed Parquet
>> files? Does it support reading fixed_len_byte_array type, how do I specify
>> this in the TypedParquet setting?
>>
>> Thank you for your help!
>> Best,
>> Yuri
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "Scalding Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>

--
Alex Levenson
@THISWILLWORK
