[ 
https://issues.apache.org/jira/browse/PIG-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834957#action_12834957
 ] 

Pradeep Kamath commented on PIG-1218:
-------------------------------------

+1. The patch mostly looks good. A couple of comments:
 * In a couple of places, instead of using Configuration and JobConf based on 
PigMapReduce.sJobConf, you should create a new Configuration(false) and a new 
JobConf(false), so that we create fresh data structures without any properties 
coming from the MapReduce-based data structures.
 * Since partitionFile is no longer used in POPartitionRearrange.java, we should 
remove it.

You can make these changes and go ahead and commit the patch if it passes the tests.

> Use distributed cache to store samples
> --------------------------------------
>
>                 Key: PIG-1218
>                 URL: https://issues.apache.org/jira/browse/PIG-1218
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>         Attachments: PIG-1218.patch, PIG-1218_2.patch
>
>
> Currently, in the case of skew join and order by, we use a sample that is just 
> written to the DFS (not the distributed cache) and, as a result, gets opened and 
> copied around more than necessary. This impacts query performance and also 
> places unnecessary load on the name node.

