[ 
https://issues.apache.org/jira/browse/PIG-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712200#action_12712200
 ] 

Pradeep Kamath commented on PIG-814:
------------------------------------

The patch also contains a simple fix to enable split by 'file' in the load 
statement. In this case, Pig should not try to split the input file by block 
size, but should process the entire file in a single map.

> Make Binstorage more robust when data contains record markers
> -------------------------------------------------------------
>
>                 Key: PIG-814
>                 URL: https://issues.apache.org/jira/browse/PIG-814
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.2.1
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: 0.3.0
>
>         Attachments: PIG-814.patch
>
>
> When the input stream for BinStorage is at a position where the data contains 
> the record marker sequence, the code incorrectly assumes that it is at the 
> beginning of a record (tuple) and calls DataReaderWriter.readDatum() to try 
> to read the tuple. The problem is more likely when RandomSampleLoader (used 
> in the order by implementation) skips ahead in the input stream for sampling 
> and then calls BinStorage.getNext(). The code should be more robust in such 
> cases.
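
To make the failure mode in the description concrete, here is a minimal, self-contained Java sketch. It does not use Pig's classes and is not the code from the patch. The marker bytes, the one-byte length prefix, and the names MarkerScanSketch, sampleStream, firstMarkerFrom and findMarkerOffsets are all assumptions made up for illustration. It only shows that a scan for the marker sequence, started from an arbitrary offset (as after a skip for sampling), can stop inside a record's data.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

class MarkerScanSketch {
    // Hypothetical record marker bytes (an assumption for this sketch).
    static final byte[] MARKER = {0x01, 0x02, 0x03};

    // Appends one record: marker, one-byte payload length, payload.
    // This layout is invented for the sketch, not BinStorage's format.
    static void appendRecord(ByteArrayOutputStream out, byte[] payload) {
        out.write(MARKER, 0, MARKER.length);
        out.write(payload.length);
        out.write(payload, 0, payload.length);
    }

    // Two records; the first payload happens to contain the marker bytes.
    static byte[] sampleStream() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        appendRecord(out, new byte[] {0x41, 0x01, 0x02, 0x03, 0x42});
        appendRecord(out, new byte[] {0x43, 0x44});
        return out.toByteArray();
    }

    // Returns the first offset >= start where MARKER occurs, or -1 if none.
    static int firstMarkerFrom(byte[] data, int start) {
        for (int i = Math.max(start, 0); i + MARKER.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    // Returns every offset at which the marker byte sequence occurs.
    static List<Integer> findMarkerOffsets(byte[] data) {
        List<Integer> offsets = new ArrayList<>();
        int pos = firstMarkerFrom(data, 0);
        while (pos >= 0) {
            offsets.add(pos);
            pos = firstMarkerFrom(data, pos + 1);
        }
        return offsets;
    }

    public static void main(String[] args) {
        byte[] stream = sampleStream();
        // Real record boundaries are at offsets 0 and 9; offset 5 is inside data.
        System.out.println(findMarkerOffsets(stream));
        // A reader that skipped to offset 4 would treat offset 5 as a record start.
        System.out.println(firstMarkerFrom(stream, 4));
    }
}
```

In this sketch the marker sequence is found at three offsets although the stream holds only two records, so a reader that trusts the first match after a skip would begin parsing in the middle of the first record. A more robust reader would need some additional check before accepting a match as a record start; what that check should be is left open here.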

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
