[
https://issues.apache.org/jira/browse/PIG-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yan Zhou updated PIG-1207:
--------------------------
Status: Patch Available (was: Open)
> [zebra] Data sanity check should be performed at the end of writing instead
> of later at query time
> ---------------------------------------------------------------------------------------------------
>
> Key: PIG-1207
> URL: https://issues.apache.org/jira/browse/PIG-1207
> Project: Pig
> Issue Type: Improvement
> Reporter: Yan Zhou
> Assignee: Yan Zhou
> Attachments: PIG-1207.patch, PIG-1207.patch
>
>
> Currently the equality check on the number of rows across different column
> groups is performed at query time. The error information is sketchy: the
> query only emits "Column groups are not evenly distributed" or, worse,
> throws an IndexOutOfBounds exception from CGScanner.getCGValue. This happens
> because BasicTable.atEnd and BasicTable.getKey, which are called just before
> BasicTable.getValue, only check the first column group in the projection.
> Any discrepancy in the number of rows per file across the column groups in
> the projection can therefore let BasicTable.atEnd return false and
> BasicTable.getKey return a key normally while another column group has
> already exhausted its current file, so the call to its
> CGScanner.getCGValue throws the exception.
> This check should also be performed at the end of writing, and the error
> information should be more informative.
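
To illustrate the kind of write-time check the description asks for, here is a minimal sketch. It is not taken from the attached patch and does not use Zebra's actual writer API: the class RowCountSanityCheck, the method describeMismatch, and the column group and file names are all hypothetical. It assumes the writer can collect, for each output file, the number of rows written to every column group.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Zebra: compares the per-column-group row
// counts recorded for one output file when writing finishes.
public class RowCountSanityCheck {

    /**
     * Returns null when every column group holds the same number of rows,
     * otherwise a message naming the file, both column groups and both counts.
     */
    public static String describeMismatch(String fileName,
                                          Map<String, Long> rowsPerColumnGroup) {
        String firstName = null;
        long firstCount = 0L;
        for (Map.Entry<String, Long> entry : rowsPerColumnGroup.entrySet()) {
            if (firstName == null) {
                // Use the first column group as the reference count.
                firstName = entry.getKey();
                firstCount = entry.getValue();
            } else if (entry.getValue() != firstCount) {
                return "Row count mismatch in " + fileName + ": column group "
                        + firstName + " has " + firstCount
                        + " rows but column group " + entry.getKey()
                        + " has " + entry.getValue() + " rows";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed sample data: two column groups written to the same file.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("CG0", 1000L);
        counts.put("CG1", 998L);

        String problem = describeMismatch("part-00000", counts);
        if (problem != null) {
            System.err.println(problem);
        }
    }
}
```

Reporting the file, the column groups and the counts at the end of writing would surface the problem before any query runs, instead of leaving a scanner to fail in CGScanner.getCGValue.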