Skewed join sampler misses out the key with the highest frequency
-----------------------------------------------------------------

                 Key: PIG-1264
                 URL: https://issues.apache.org/jira/browse/PIG-1264
             Project: Pig
          Issue Type: Bug
            Reporter: Sriranjan Manjunath
            Assignee: Richard Ding
             Fix For: 0.7.0


I am seeing two issues with the sampler used in skewed join:
1. It does not allocate multiple reducers to the key with the highest frequency.
2. It appears to allocate the same number of reducers to every key (8 in 
this case, which matches the parallel setting in the query below).

Query:

a = load 'studenttab10k' using PigStorage() as (name, age, gpa);
b = load 'votertab10k' as (name, age, registration, contributions);
e = join a by name right, b by name using "skewed" parallel 8;
store e into 'SkewedJoin_9.out';
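
To see which key the sampler would be expected to spread across several reducers, the key frequencies can be counted separately. The following is only a sketch: it assumes studenttab10k is the input carrying the skew (the report does not say which side is skewed, so the same check can be run on votertab10k), and the aliases g, freq, sorted and top_keys are illustrative names that do not appear in the original query.

```pig
-- Count how often each join key occurs in one input (assumed to be the skewed side).
a = load 'studenttab10k' using PigStorage() as (name, age, gpa);
g = group a by name;
freq = foreach g generate group as name, COUNT(a) as cnt;

-- List the most frequent keys first; the top row is the key the report says is missed.
sorted = order freq by cnt desc;
top_keys = limit sorted 10;
dump top_keys;
```

Comparing the top of this list with the reducer allocation produced by the sampler should show whether the most frequent key was left with a single reducer.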

