[zebra] Data sanity check should be performed at the end of writing instead of 
later at query time
---------------------------------------------------------------------------------------------------

                 Key: PIG-1207
                 URL: https://issues.apache.org/jira/browse/PIG-1207
             Project: Pig
          Issue Type: Improvement
            Reporter: Yan Zhou


Currently the equality check on the number of rows across column groups is 
performed at query time. The error info is sketchy: the query either emits only 
"Column groups are not evenly distributed" or, worse, throws an 
IndexOutOfBounds exception from CGScanner.getCGValue. The exception arises 
because BasicTable.atEnd and BasicTable.getKey, which are called just before 
BasicTable.getValue, check only the first column group in the projection. If 
the number of rows per file differs across the column groups in the 
projection, BasicTable.atEnd can return false and BasicTable.getKey can return 
a key normally while another column group has already exhausted its current 
file, so the call to its CGScanner.getCGValue throws the exception.

This check should also be performed at the end of writing, and the error info 
should be more informative.
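
As a rough illustration of what such a write-time check might look like, the 
sketch below compares per-file row counts across column groups and reports 
exactly which group and file disagree. It is not Zebra code: the class 
RowCountCheck, the method findMismatches, and the idea that the writer has a 
map from column group name to per-file row counts are all assumptions made 
for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical write-time sanity check; not part of the Zebra API. */
public class RowCountCheck {

    /**
     * rowsPerFile maps a column group name to the number of rows written to
     * each of its files, in file order (an assumed bookkeeping structure).
     * The first column group is used as the reference. Returns one message
     * per discrepancy; an empty list means all groups agree.
     */
    public static List<String> findMismatches(Map<String, long[]> rowsPerFile) {
        List<String> problems = new ArrayList<>();
        String refName = null;
        long[] refCounts = null;
        for (Map.Entry<String, long[]> e : rowsPerFile.entrySet()) {
            if (refCounts == null) {
                // First entry becomes the reference column group.
                refName = e.getKey();
                refCounts = e.getValue();
                continue;
            }
            long[] counts = e.getValue();
            if (counts.length != refCounts.length) {
                problems.add("Column group '" + e.getKey() + "' has "
                        + counts.length + " files, but '" + refName
                        + "' has " + refCounts.length);
                continue;
            }
            for (int i = 0; i < counts.length; i++) {
                if (counts[i] != refCounts[i]) {
                    problems.add("Column group '" + e.getKey() + "' file " + i
                            + " has " + counts[i] + " rows, but '" + refName
                            + "' has " + refCounts[i]);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Made-up counts: CG1 is one row short in its second file.
        Map<String, long[]> counts = new LinkedHashMap<>();
        counts.put("CG0", new long[] {100, 250});
        counts.put("CG1", new long[] {100, 249});
        for (String problem : findMismatches(counts)) {
            System.out.println(problem);
        }
    }
}
```

Naming the column group, the file index, and both row counts in the message 
is the point of the sketch: it gives the writer enough detail to locate the 
bad file, which "Column groups are not evenly distributed" does not.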

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.