[ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705351#action_12705351
 ] 

Olga Natkovich commented on PIG-794:
------------------------------------

Hi Rakesh,

Thanks for the patch. A few comments below.

First, a few general comments:

(1) AVROBinStorage should not be in builtins. We don't want to expose to the 
end user because in the past we had issues with backward compatibility (with 
BinStorage) when the same function was used both internally ane externally.
(2) Every new file needs to have an apache license header. You can get one from 
a file in SVN.
(3) I would just call the class AVROStorage
(4) Once we are fully integrated with AVRO, we should at unit tests but for now 
this is fine
(5) It would be nice to have javadoc comments in the data. At a minimum a 
header for each class on what it does and each public method. Also, it would be 
good to document any non-obvious code.

Now, code related comments: what is the reason for having AVROValueReader. It 
seems to be a streight wrapper around ValueReader + position which we can keep 
track separately. I am concerned with the performance overhad that happens on 
each call. 


> Use Avro serialization in Pig
> -----------------------------
>
>                 Key: PIG-794
>                 URL: https://issues.apache.org/jira/browse/PIG-794
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Rakesh Setty
>         Attachments: AvroBinStorage.patch
>
>
> We would like to use Avro serialization in Pig to pass data between MR jobs 
> instead of the current BinStorage. Attached is an implementation of 
> AvroBinStorage which performs significantly better compared to BinStorage on 
> our benchmarks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to