Ankit Modi commented on PIG-960:

Thanks for the comments, Daniel.

1. PigLineRecordReader (PLRR) needs to know which type of InputStream it is
handling: BZip2 or uncompressed. Depending on the type of input stream, it
chooses which reader to use. BPIS (BufferedPositionedInputStream) stores
the input stream as a protected member. PLRR can access it in one of three
ways:
   - making the member public,
   - adding a get method to access it, or
   - inheriting from BPIS.
I implemented the last one because it requires the fewest changes to BPIS.
2. Good one. Will be fixed in the next patch.
3. Will be added in the next patch.
4. Corrected in the next patch.
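
For illustration, the inheritance option from point 1 might look roughly like
the sketch below. The class names (BufferedStreamBase, LineReaderStream,
CompressedMarkerStream), the field name `in`, and the instanceof check are
hypothetical stand-ins chosen for this example; they are not the actual BPIS
or PLRR code from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for BPIS: keeps the wrapped stream as a protected member.
class BufferedStreamBase extends InputStream {
    protected final InputStream in;

    BufferedStreamBase(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        return in.read();
    }
}

// Hypothetical marker type standing in for a BZip2 input stream.
class CompressedMarkerStream extends ByteArrayInputStream {
    CompressedMarkerStream(byte[] data) {
        super(data);
    }
}

// Hypothetical stand-in for PLRR: subclassing gives access to `in`
// without making it public or adding a getter to the base class.
class LineReaderStream extends BufferedStreamBase {
    LineReaderStream(InputStream in) {
        super(in);
    }

    // Chooses a reader kind from the runtime type of the wrapped stream.
    String readerKind() {
        return (in instanceof CompressedMarkerStream) ? "bzip2" : "plain";
    }
}
```

The base class is left untouched here, which is the point of the third option:
only the subclass changes.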

> Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
> ---------------------------------------------------------------------------
>                 Key: PIG-960
>                 URL: https://issues.apache.org/jira/browse/PIG-960
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Ankit Modi
>         Attachments: pig_rlr.patch
> PigStorage's reading of Tuples (lines) can be optimized using Hadoop's 
> {{LineRecordReader}}.
> This can help in the following areas:
> - Improving the performance of reading Tuples (lines) in {{PigStorage}}
> - Any future improvements to line reading in Hadoop's 
> {{LineRecordReader}} are automatically carried over to Pig
> Issues that are handled by this patch:
> - BZip uses internal buffers and positioning to determine the number of 
> bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off
> - The current implementation of {{LocalSeekableInputStream}} does not implement 
> the {{available}} method. This method has to be implemented.
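
As a rough illustration of the last point, a seekable stream over a local file
could report `available` as the number of bytes between the current position
and the end of the file. The class names SeekableFileStream and AvailableDemo
are hypothetical, and this sketch assumes a RandomAccessFile backing store; it
is not Pig's LocalSeekableInputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical seekable stream over a local file.
class SeekableFileStream extends InputStream {
    private final RandomAccessFile file;

    SeekableFileStream(RandomAccessFile file) {
        this.file = file;
    }

    @Override
    public int read() throws IOException {
        return file.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return file.read(b, off, len);
    }

    public void seek(long pos) throws IOException {
        file.seek(pos);
    }

    // Bytes remaining between the current position and end of file,
    // clamped to the int range that InputStream.available() returns.
    @Override
    public int available() throws IOException {
        long remaining = file.length() - file.getFilePointer();
        return (int) Math.max(0, Math.min(Integer.MAX_VALUE, remaining));
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}

class AvailableDemo {
    // Writes data to a temp file, seeks to pos, and returns available().
    static int availableAfterSeek(byte[] data, long pos) {
        try {
            Path tmp = Files.createTempFile("seekable", ".bin");
            try {
                Files.write(tmp, data);
                try (SeekableFileStream s =
                         new SeekableFileStream(new RandomAccessFile(tmp.toFile(), "r"))) {
                    s.seek(pos);
                    return s.available();
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without an override, `InputStream.available()` returns 0, so a caller that
relies on it to detect remaining input would see the stream as exhausted.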
