reducing number of MR stages with ORDER BY
------------------------------------------
Key: PIG-791
URL: https://issues.apache.org/jira/browse/PIG-791
Project: Pig
Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich

When an ORDER BY is not the only operation in a Pig script, it is done in two additional MR jobs. The first job samples the data using a sampling loader; the second does the sort. The sample is used to construct a partitioner that balances the data evenly across the reducers of the sort.

The sampler needs to be changed to be an EvalFunc instead of a loader. This way a split can be put in the preceding MR job, with the main data being written out and the other branch flowing to the sampler function, which can then write out the sample. The final MR job can then be the sort, so the ORDER BY costs one additional MR job instead of two.

This change depends on the multiquery code.
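The proposed flow can be illustrated outside Hadoop with a small, self-contained Java simulation. This is a sketch under stated assumptions, not Pig's code: the class and method names (`SampledOrderBySketch`, `sampleWhilePassingThrough`, `boundariesFromSample`, `partitionFor`, `sortJob`) are hypothetical, in-memory lists stand in for MR jobs and their outputs, and reservoir sampling is used as a stand-in for whatever sampling the real sampler performs. The first pass plays the role of the preceding job with the split: every record goes to the main output, and the same record also feeds the sampler. The sample is then turned into range boundaries, and the sort job routes each key to the partition for its range, so concatenating the per-partition sorted outputs yields a total order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical in-memory simulation of "sample in the preceding job, then sort". */
public class SampledOrderBySketch {

    /**
     * Stand-in for the preceding job with a split: every record is written to the
     * main output, and the same record also feeds a reservoir sampler (Algorithm R),
     * which keeps a uniform random sample of at most k records.
     */
    static int[] sampleWhilePassingThrough(int[] input, int k, List<Integer> mainOutput, Random rnd) {
        int[] reservoir = new int[Math.min(k, input.length)];
        for (int i = 0; i < input.length; i++) {
            mainOutput.add(input[i]); // main branch: data written out unchanged
            if (i < reservoir.length) {
                reservoir[i] = input[i]; // fill the reservoir first
            } else {
                int j = rnd.nextInt(i + 1); // uniform in [0, i]
                if (j < reservoir.length) reservoir[j] = input[i];
            }
        }
        return reservoir;
    }

    /** Picks numPartitions - 1 cut points at evenly spaced quantiles of the sample (numPartitions >= 1). */
    static int[] boundariesFromSample(int[] sample, int numPartitions) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        if (sorted.length == 0) return new int[0]; // no sample: everything goes to partition 0
        int[] cuts = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            cuts[i - 1] = sorted[(int) ((long) i * sorted.length / numPartitions)];
        }
        return cuts;
    }

    /** Range partitioner: partition p holds keys with cuts[p-1] < key <= cuts[p]. */
    static int partitionFor(int key, int[] cuts) {
        int p = 0;
        while (p < cuts.length && key > cuts[p]) p++;
        return p;
    }

    /** Stand-in for the final sort job: route keys by range, sort each partition, concatenate. */
    static List<Integer> sortJob(List<Integer> data, int[] cuts) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i <= cuts.length; i++) parts.add(new ArrayList<>());
        for (int key : data) parts.get(partitionFor(key, cuts)).add(key);
        List<Integer> out = new ArrayList<>();
        for (List<Integer> part : parts) {
            Collections.sort(part); // each "reducer" sorts only its own range
            out.addAll(part);       // ranges are ordered, so concatenation is totally ordered
        }
        return out;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int[] input = new int[10_000];
        for (int i = 0; i < input.length; i++) input[i] = rnd.nextInt(1_000_000);

        // Job 1: the preceding job writes out its data and samples it in the same pass.
        List<Integer> written = new ArrayList<>();
        int[] sample = sampleWhilePassingThrough(input, 200, written, rnd);

        // Job 2: the sort, range-partitioned with cut points derived from the sample.
        int[] cuts = boundariesFromSample(sample, 4);
        List<Integer> sorted = sortJob(written, cuts);

        int[] sizes = new int[cuts.length + 1];
        for (int key : written) sizes[partitionFor(key, cuts)]++;
        System.out.println("partition sizes: " + Arrays.toString(sizes));
        System.out.println("min/max: " + sorted.get(0) + " / " + sorted.get(sorted.size() - 1));
    }
}
```

Because the cut points are taken at evenly spaced quantiles of the sample, each partition should receive roughly the same number of records, which the printed partition sizes let you check. The point of the proposal is visible in `main`: the sample is a by-product of a pass that already had to run, so only the sort remains as an extra job.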