[ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-1080:
--------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Patch committed. Thanks, Richard!
> PigStorage may miss records when loading a file
> -----------------------------------------------
>
> Key: PIG-1080
> URL: https://issues.apache.org/jira/browse/PIG-1080
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.6.0
> Reporter: Richard Ding
> Assignee: Richard Ding
> Attachments: PIG-1080.patch, PIG-1080.patch, PIG-1080.patch
>
>
> When a file is assigned to multiple mappers (one block per mapper), a block
> may not end exactly at a record boundary. Special care is taken to ensure
> that every record is loaded by exactly one mapper, even when the record
> crosses a block boundary.
> PigStorage, however, does not correctly handle the case where a block ends
> exactly at a record boundary, which results in missing records.
>
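
To illustrate the boundary case described in the issue, here is a minimal Python sketch. It is not the PigStorage code or the attached patch. The names read_split and read_all and the back_up flag are hypothetical, and the sketch assumes newline-terminated records and byte-offset splits. With back_up=True, a reader that does not start at offset 0 seeks to start - 1 before discarding through the first newline, so a record that begins exactly at the split start is kept. With back_up=False, the reader discards from start itself, which drops that record; the previous split has already stopped at its end offset, so no mapper loads it.

```python
def read_split(data: bytes, start: int, end: int, back_up: bool = True) -> list:
    """Return the records owned by the split [start, end).

    A split owns every record whose first byte lies in [start, end).
    """
    pos = start
    if start > 0:
        # Skip whatever the previous split is responsible for.
        # Backing up one byte means that a newline sitting at start - 1
        # is the only thing discarded, so a record that begins exactly
        # at `start` survives.
        pos = start - 1 if back_up else start
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1

    records = []
    # A record that starts before `end` is read in full, even if it
    # extends past `end` into the next block.
    while pos < end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        records.append(data[pos:stop])
        pos = stop + 1
    return records


def read_all(data: bytes, split_size: int, back_up: bool = True) -> list:
    """Read every split of `data` in order and concatenate the records."""
    out = []
    for start in range(0, len(data), split_size):
        end = min(start + split_size, len(data))
        out.extend(read_split(data, start, end, back_up))
    return out


# Four records of 4 bytes each (including the newline).
data = b"aaa\nbbb\nccc\nddd\n"

# Split size 8: the first split ends exactly at a record boundary.
complete = read_all(data, 8, back_up=True)
lossy = read_all(data, 8, back_up=False)
```

In this sketch, complete contains all four records, while lossy is missing "ccc", the record that begins exactly where the first split ends. With a split size of 5, where no split boundary coincides with a record boundary, both variants return all four records, which is why this kind of bug can go unnoticed on most inputs.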