[jira] [Created] (PIG-3042) Implement new SPLIT_DISTINCT relational operator

Johannes Schwenk (JIRA) Thu, 08 Nov 2012 05:43:24 -0800

Johannes Schwenk created PIG-3042:
-------------------------------------

             Summary: Implement new SPLIT_DISTINCT relational operator
                 Key: PIG-3042
                 URL: https://issues.apache.org/jira/browse/PIG-3042
             Project: Pig
          Issue Type: New Feature
            Reporter: Johannes Schwenk



If DISTINCT operated as a function, we could do something like this:

{code}
SPLIT data INTO
    new_entries IF COUNT(DISTINCT(*)) > 1,
    duplicate_entries OTHERWISE;
{code}

Since this is unfortunately not the case (see also PIG-826), I would like to 
propose a new SPLIT_DISTINCT operator (the name is up for discussion) that 
behaves the way the code above intends. One would then just have to write:

{code}
SPLIT_DISTINCT data INTO new_entries, duplicate_entries;
{code}
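The proposal does not pin down exact semantics. Below is a minimal Python sketch of one possible reading, assuming that the first occurrence of each record goes to new_entries and every further copy goes to duplicate_entries. The function name `split_distinct` is hypothetical and only models the behavior; it is not Pig code. Because Pig relations are unordered, which copy counts as "first" would be arbitrary in Pig; the sketch simply uses input order.

{code}
from typing import Hashable, Iterable, List, Tuple, TypeVar

R = TypeVar("R", bound=Hashable)


def split_distinct(records: Iterable[R]) -> Tuple[List[R], List[R]]:
    """Hypothetical model of SPLIT_DISTINCT.

    The first occurrence of each record is routed to new_entries;
    every later copy of an already-seen record goes to duplicate_entries.
    """
    seen = set()
    new_entries: List[R] = []
    duplicate_entries: List[R] = []
    for record in records:
        if record in seen:
            duplicate_entries.append(record)
        else:
            seen.add(record)
            new_entries.append(record)
    return new_entries, duplicate_entries


# Example with log-like (host, request) tuples:
data = [("10.0.0.1", "GET /"), ("10.0.0.2", "GET /a"), ("10.0.0.1", "GET /")]
new_entries, duplicate_entries = split_distinct(data)
print(new_entries)        # [('10.0.0.1', 'GET /'), ('10.0.0.2', 'GET /a')]
print(duplicate_entries)  # [('10.0.0.1', 'GET /')]
{code}

Under this reading, new_entries holds the same records that DISTINCT data already produces, and the two outputs together contain exactly the input records, so no row is lost.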

Separating duplicates from the rest of the data (e.g. log data) is, I think, a 
common scenario, and the new operator would make this task much simpler.

