Alan Gates updated PIG-599:

    Attachment: loadperf.patch

This patch changes BufferedPositionedInputStream to wrap a BufferedInputStream 
around the provided InputStream.  It also adds a new constructor for 
DefaultTuple (and new calls in TupleFactory) that takes an ArrayList<Object> 
and uses it directly to construct the DefaultTuple instead of copying the list 
(as was done previously).  In a run of the PigMix queries, these changes made 
most queries about 25-40% faster.
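
To illustrate the no-copy constructor idea, here is a minimal sketch.  It is 
not the code in loadperf.patch: SketchTuple, copyOf and adopt are hypothetical 
names chosen for this example, and the real DefaultTuple and TupleFactory 
signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tuple class; NOT Pig's actual DefaultTuple.
final class SketchTuple {
    private final List<Object> fields;

    private SketchTuple(List<Object> fields) {
        this.fields = fields;
    }

    // Copying path: allocates a new list and copies every element reference.
    static SketchTuple copyOf(List<Object> source) {
        return new SketchTuple(new ArrayList<>(source));
    }

    // No-copy path: adopts the caller's list as the backing store.
    // The caller must not reuse or mutate the list afterwards.
    static SketchTuple adopt(ArrayList<Object> source) {
        return new SketchTuple(source);
    }

    Object get(int index) {
        return fields.get(index);
    }

    int size() {
        return fields.size();
    }
}
```

The trade-off is aliasing: the adopted list and the tuple share storage, so 
this path only suits callers (such as a loader building one fresh list per 
record) that hand the list over and never touch it again.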

> BufferedPositionedInputStream isn't buffered
> --------------------------------------------
>                 Key: PIG-599
>                 URL: https://issues.apache.org/jira/browse/PIG-599
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>             Fix For: types_branch
>         Attachments: loadperf.patch
>
> org.apache.pig.impl.io.BufferedPositionedInputStream is not actually 
> buffered.  This is because it sits atop an FSDataInputStream (somewhere down 
> the stack), which is buffered.  So to avoid double buffering, which can be 
> bad, BufferedPositionedInputStream was written without buffering.  But the 
> FSDataInputStream is far enough down the stack that it is still quite costly 
> to call read() individually for each byte.  A run through a profiler shows 
> that a fair amount of time is being spent in 
> BufferedPositionedInputStream.read().
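
The per-byte cost described above can be sketched as follows.  This is an 
illustration under stated assumptions, not the actual Pig source: 
PositionedBufferedStream, CountingInputStream and BufferingDemo are 
hypothetical names, and a ByteArrayInputStream stands in for the deep stack 
that ends in FSDataInputStream.  Only java.io classes are used.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Helper for the demo: counts how many read calls reach the wrapped stream.
final class CountingInputStream extends InputStream {
    private final InputStream in;
    long calls = 0;

    CountingInputStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        calls++;
        return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return in.read(b, off, len);
    }
}

// Hypothetical sketch; NOT the actual BufferedPositionedInputStream source.
final class PositionedBufferedStream extends InputStream {
    private final InputStream in;
    private long pos;

    PositionedBufferedStream(InputStream raw, long startPos) {
        // Wrap the provided stream so single-byte reads hit a local buffer.
        this.in = new BufferedInputStream(raw);
        this.pos = startPos;
    }

    @Override
    public int read() throws IOException {
        int c = in.read();
        if (c != -1) {
            pos++; // position counts bytes consumed by the caller
        }
        return c;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        if (n > 0) {
            pos += n;
        }
        return n;
    }

    long getPosition() {
        return pos;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

final class BufferingDemo {
    // Reads `size` bytes one at a time and returns
    // {bytesRead, finalPosition, callsThatReachedTheUnderlyingStream}.
    static long[] drain(int size) {
        CountingInputStream counting =
                new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        try (PositionedBufferedStream s = new PositionedBufferedStream(counting, 0)) {
            long n = 0;
            while (s.read() != -1) {
                n++;
            }
            return new long[] { n, s.getPosition(), counting.calls };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling BufferingDemo.drain(100000) reads every byte through read() while the 
underlying stream sees only a handful of bulk reads, which is the effect the 
wrapper is meant to achieve.  The position is tracked at the caller-facing 
layer, so read-ahead inside the buffer does not distort it.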

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
