Rakesh Setty commented on PIG-794:

While trying to address the comment about eliminating the AvroValueReader, I 
noticed that the way pos (the current position in the stream) is handled is 
wrong. Only the ValueReader (in the Avro codebase) can track the position in 
the stream, because Avro stores data in a non-standard way: it does not use 
DataOutput's methods, and its encodings are variable-length. For example, an 
integer can take anywhere from 1 to 5 bytes, while a long can take anywhere 
from 1 to 10 bytes, so the position cannot be derived from the number and 
types of the values read.
I think we have to ask the Avro team to expose the current position in the 
stream before we can proceed with this.
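
To illustrate why the byte count depends on the value, here is a minimal 
sketch of the zig-zag, variable-length encoding that the Avro specification 
describes for int and long. The class VarIntSizes and its methods are 
hypothetical names written for this example; they are not part of the Avro 
API or of the attached patch.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo class, not Avro code: shows that encoded size varies by value.
class VarIntSizes {

    // Zig-zag maps small negative and positive ints to small unsigned values,
    // then the result is written 7 bits at a time, low-order group first.
    static byte[] encodeInt(int n) {
        int z = (n << 1) ^ (n >> 31);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7F) != 0) {
            out.write((z & 0x7F) | 0x80); // high bit set: more bytes follow
            z >>>= 7;
        }
        out.write(z);
        return out.toByteArray();
    }

    // Same scheme for 64-bit values.
    static byte[] encodeLong(long n) {
        long z = (n << 1) ^ (n >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] ints = {0, 63, 64, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int i : ints) {
            System.out.println("int " + i + " -> " + encodeInt(i).length + " byte(s)");
        }
        long[] longs = {0L, Long.MAX_VALUE, Long.MIN_VALUE};
        for (long l : longs) {
            System.out.println("long " + l + " -> " + encodeLong(l).length + " byte(s)");
        }
    }
}
```

Since a reader outside the decoder only sees the decoded values, it has no 
way to know how many bytes each one consumed, which is why the position 
would have to be reported by the decoder itself.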

> Use Avro serialization in Pig
> -----------------------------
>                 Key: PIG-794
>                 URL: https://issues.apache.org/jira/browse/PIG-794
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Rakesh Setty
>         Attachments: AvroBinStorage.patch
> We would like to use Avro serialization in Pig to pass data between MR jobs 
> instead of the current BinStorage. Attached is an implementation of 
> AvroBinStorage which performs significantly better compared to BinStorage on 
> our benchmarks.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
