Pradeep Kamath commented on PIG-1014:

This JIRA tracks whether it is possible to automatically convert a 
COUNT(relation) in the script to COUNT_STAR(relation) in the plan, so that 
nullness of the fields in the records is not considered when returning the 
count. For example, if a relation (A) has two fields and the script contains 
the following snippet:
B = group A by $0;
C = foreach B generate group, COUNT(A);
This is equivalent to a COUNT(*) after grouping on the first column in SQL. Per 
SQL semantics, COUNT(*) counts all records in the group regardless of whether 
individual fields are null. In Pig, this behavior is provided by the COUNT_STAR 
built-in. The COUNT built-in, however, is meant for counting a bag with a 
single column (for example, COUNT(A.$0) above), so its implementation checks 
whether the first field in the bag is null and counts only non-null values. In 
the script above, any record whose first column in the bag is null is not 
counted, which differs from the expected result of COUNT(*) in SQL. So if the 
compilation phase in Pig can detect that the COUNT is being performed on a 
whole relation (rather than an individual column), it can replace the COUNT 
with COUNT_STAR and achieve the desired 
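
To make the difference concrete, here is a minimal Python sketch that models the 
two counting behaviors described above. It is not Pig's implementation: the 
function names, the sample data, and the choice to place null keys in a single 
group are assumptions made only for illustration.

```python
# Illustrative model only; none of these names come from Pig's code base.
from collections import defaultdict

def count(bag):
    """Model of COUNT: count only tuples whose first field is non-null."""
    return sum(1 for t in bag if t[0] is not None)

def count_star(bag):
    """Model of COUNT_STAR: count every tuple, ignoring nullness."""
    return len(bag)

def group_by_first(relation):
    """Model of 'group A by $0' (assumes null keys form one group)."""
    groups = defaultdict(list)
    for t in relation:
        groups[t[0]].append(t)
    return dict(groups)

# Hypothetical two-field relation A; None stands in for a null field.
A = [(1, "a"), (1, None), (None, "b"), (None, "c")]

results = {
    key: (count(bag), count_star(bag))
    for key, bag in group_by_first(A).items()
}

for key, (c, cs) in results.items():
    print(key, c, cs)
```

In this model the group with a null key holds two records, yet the COUNT-style 
function reports 0 for it while the COUNT_STAR-style function reports 2. That 
is the discrepancy with SQL's COUNT(*) that the proposed conversion would 
remove.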

> Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all 
> records are counted without considering nullness of the fields in the records
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>                 Key: PIG-1014
>                 URL: https://issues.apache.org/jira/browse/PIG-1014
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Pradeep Kamath
