Pradeep Kamath updated PIG-691:

        Fix Version/s: types_branch
    Affects Version/s: types_branch
               Status: Patch Available  (was: Open)

BinStorage uses the byte markers RECORD_1, RECORD_2 and RECORD_3 (the bytes 
0x01, 0x02 and 0x03) to mark the beginning of a new record. The bug is in 
getNext(): the code looks for RECORD_1 and, once it finds it, looks for 
RECORD_2. If it fails to find RECORD_2, it goes back and searches for the 
entire sequence again, starting with RECORD_1. However, this fails on the 
following sequence: RECORD_1-RECORD_1-RECORD_2-RECORD_3. After reading the 
second RECORD_1 in that sequence, we should not look for RECORD_1 again but 
continue by looking for RECORD_2. This is an issue only when a record in 
BinStorage spans two blocks and the part at the head of the second block 
contains the above sequence. This can happen when the last field in the record 
is null (null is represented by the byte 0x01, which is RECORD_1). The attached 
patch fixes this issue.
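
For illustration only, here is a minimal sketch of the scanning logic described 
above. It is not the attached patch and not the actual BinStorage code: the 
class name RecordMarkerScan and both method names are hypothetical, and the 
scan runs over a byte array rather than the input stream that getNext() reads. 
The first method is an assumed reconstruction of the buggy behaviour (a 
mismatched byte is consumed and discarded); the second re-examines the 
mismatched byte as a possible RECORD_1.

```java
class RecordMarkerScan {
    // Marker values as given in the description above.
    static final byte RECORD_1 = 0x01;
    static final byte RECORD_2 = 0x02;
    static final byte RECORD_3 = 0x03;

    // Assumed reconstruction of the bug: when the byte after RECORD_1 is not
    // RECORD_2, that byte is thrown away and the search restarts from the
    // byte after it, even if the discarded byte was itself RECORD_1.
    static int findRecordStartNaive(byte[] buf) {
        int i = 0;
        while (i < buf.length) {
            if (buf[i++] != RECORD_1) continue;
            if (i >= buf.length || buf[i++] != RECORD_2) continue;
            if (i >= buf.length || buf[i++] != RECORD_3) continue;
            return i; // index of the first byte after the marker sequence
        }
        return -1; // no marker sequence found
    }

    // Sketch of the corrected scan: "matched" counts how many marker bytes
    // have been seen in a row. A byte that breaks the match is checked
    // against RECORD_1 before the count is reset to zero.
    static int findRecordStart(byte[] buf) {
        int matched = 0;
        for (int i = 0; i < buf.length; i++) {
            byte b = buf[i];
            if (matched == 2 && b == RECORD_3) {
                return i + 1; // index of the first byte after the marker
            }
            if (matched == 1 && b == RECORD_2) {
                matched = 2;
                continue;
            }
            // Either a fresh start or a broken match: this byte may still
            // be the RECORD_1 that begins the real marker sequence.
            matched = (b == RECORD_1) ? 1 : 0;
        }
        return -1; // no marker sequence found
    }

    public static void main(String[] args) {
        // A trailing null field (0x01) followed by a record marker and data.
        byte[] block = {0x01, 0x01, 0x02, 0x03, 0x2A};
        System.out.println(findRecordStartNaive(block)); // prints -1
        System.out.println(findRecordStart(block));      // prints 4
    }
}
```

On the sequence RECORD_1-RECORD_1-RECORD_2-RECORD_3, the first method misses 
the record entirely, while the second returns the offset just past RECORD_3.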

> BinStorage skips tuples when ^A is present in data
> --------------------------------------------------
>                 Key: PIG-691
>                 URL: https://issues.apache.org/jira/browse/PIG-691
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
> Pradeep found a problem with BinStorage.getNext function that causes data 
> loss. He is working on the fix.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
