Parquet handles data encoding / compression a little differently from
most formats: it doesn't write a file and then compress the entire file's
bytes. Instead, it compresses parts of the file individually, in separate
chunks (each page is compressed separately). I think you might get a
better answer to this question on the parquet mailing list. I don't know
what would cause this off the top of my head, though.
It'd help to show all the settings used and, specifically, how the data
appears to be corrupt / garbage.
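
One guess, and it is only a guess: the values may not be corrupt at all, just
undecoded. The Parquet format stores a DECIMAL backed by fixed_len_byte_array
as the unscaled value in big-endian two's complement, so the raw bytes look
like garbage when written out as text. Here is a minimal sketch of the
decoding step, assuming you can get at the raw bytes for the field (how
scalding-parquet hands them to you is not something I've checked, and
DecimalDecode / decodeDecimal are made-up names for illustration):

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger}

// Hypothetical helper, not part of scalding-parquet.
object DecimalDecode {
  // BigInteger(byte[]) reads big-endian two's complement, which matches
  // the layout Parquet uses for the unscaled DECIMAL value.
  def decodeDecimal(bytes: Array[Byte], scale: Int): JBigDecimal =
    new JBigDecimal(new BigInteger(bytes), scale)

  def main(args: Array[String]): Unit = {
    // A fixed_len_byte_array(2) value for a DECIMAL(4,0) column: 0x04D2
    val raw = Array[Byte](0x04, 0xD2.toByte)
    println(decodeDecimal(raw, 0)) // prints 1234
  }
}
```

The scale comes from the schema annotation, so it would be 0 for both
DECIMAL(18,0) and DECIMAL(4,0) in the schema quoted below.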

On Thu, Apr 25, 2019 at 10:07 AM ybrovman via Scalding Development <
[email protected]> wrote:

> @Oscar
> I am able to read in the data but the fixed_len_byte_array / DECIMAL type
> fields produce garbage results, so I was wondering if it had to do with
> snappy compression. The binary / UTF8 fields read correctly.
>
> Is there an example of how to read in the fixed_len_byte_array / DECIMAL type
> field?
> Thank you
>
> On Wednesday, April 24, 2019 at 8:05:10 PM UTC-4, [email protected] wrote:
>>
>> Does scalding-parquet library support reading in snappy compressed
>> Parquet files?
>>
>> I am trying to read in Parquet files of the form:
>> > hadoop jar parquet-tools-1.10.1.jar schema
>> /my/path/part-00000.snappy.parquet
>> message spark_schema {
>>   optional fixed_len_byte_array(8) fieldName1 (DECIMAL(18,0));
>>   optional fixed_len_byte_array(2) fieldName2 (DECIMAL(4,0));
>>   optional binary fieldName3 (UTF8);
>> }
>>
>> I am using the following code:
>> val fields = new Fields("fieldName1","fieldName2","fieldName3")
>> ParquetTupleSource(fields, inputPath)
>>   .read
>>   .write(Tsv(outputPath))
>>
>> The fieldName3 column output produces normal output that matches the
>> input string, however, fieldName1 and fieldName2 columns produce garbage
>> output. Does the scalding-parquet library support snappy compressed Parquet
>> files? Does it support reading the fixed_len_byte_array type, and if so, how
>> do I specify this in the TypedParquet setting?
>>
>> Thank you for your help!
>> Best,
>> Yuri
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "Scalding Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>


-- 
Alex Levenson
@THISWILLWORK
