Johannes Schwenk created PIG-3042:
-------------------------------------

             Summary: Implement new SPLIT_DISTINCT relational operator
                 Key: PIG-3042
                 URL: https://issues.apache.org/jira/browse/PIG-3042
             Project: Pig
          Issue Type: New Feature
            Reporter: Johannes Schwenk

If DISTINCT operated as a function, we could write something like this:

{code}
SPLIT data INTO new_entries IF COUNT(DISTINCT(*)) > 1, duplicate_entries OTHERWISE;
{code}

Since this is unfortunately not the case (see also PIG-826), I would like to propose a new SPLIT_DISTINCT operator (the name is up for discussion) that does what the code above intends. One would then just have to write:

{code}
SPLIT_DISTINCT data INTO new_entries, duplicate_entries;
{code}

Separating duplicates from the rest of, e.g., log data is, I think, a common scenario, and the new operator would make this task a lot simpler.
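
For comparison, here is a minimal sketch of how this separation might be written with the existing operators. It assumes one possible reading of the proposal: new_entries receives records that occur exactly once and duplicate_entries receives records that occur more than once. The input path, the (host, msg) schema, and the aliases grouped and counted are made up for illustration.

{code}
-- Hypothetical input: two-field log records.
data = LOAD 'logs' AS (host:chararray, msg:chararray);

-- Group on every field, so identical records fall into the same group.
grouped = GROUP data BY (host, msg);

-- Emit one row per distinct record together with its number of occurrences.
counted = FOREACH grouped GENERATE FLATTEN(group) AS (host, msg), COUNT(data) AS n;

-- Records seen once go to new_entries, records seen several times to duplicate_entries.
SPLIT counted INTO new_entries IF n == 1, duplicate_entries IF n > 1;
{code}

This version yields one row per distinct record plus its count; flattening the data bag instead of the group key would keep every original row. Whether SPLIT_DISTINCT should behave this way or, say, route the first occurrence of each record to new_entries is something the proposal leaves open.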