Subject: Re: skew join in pig
It's just whatever the hash function happens to do. By the time the "hot"
keys are slotted to be spread among multiple reducers, they are no longer
hot, so it doesn't matter if you put a few of the partitions in the same
reducer. Remember, we mostly car
> 1, 2, 3 for key2? Or 3, 4, 5, 6? What is the idea
> behind the allocation for all of the hot keys?
>
> Thanks,
> -Gang
>
> - Original Message
> From: Alan Gates
> To: pig-dev@hadoop.apache.org
> Sent: 2010/6/18 (Fri) 2:46:09 PM
> Subject: Re: skew join in pig
behind
the allocation for all of the hot keys?
Thanks,
-Gang
- Original Message
From: Alan Gates
To: pig-dev@hadoop.apache.org
Sent: 2010/6/18 (Fri) 2:46:09 PM
Subject: Re: skew join in pig
Are you asking how many reducers are used to split a hot key? If so, the
answer is as many as we
g-dev@hadoop.apache.org
Sent: 2010/6/16 (Wed) 12:16:13 PM
Subject: Re: skew join in pig
On Jun 16, 2010, at 8:36 AM, Gang Luo wrote:
Hi,
there is something confusing me in the skew join
(http://wiki.apache.org/pig/PigSkewedJoinSpec)
1. does the sampling job sample and build a histogram on both tables,
Alan Gates
To: pig-dev@hadoop.apache.org
Sent: 2010/6/16 (Wed) 12:16:13 PM
Subject: Re: skew join in pig
On Jun 16, 2010, at 8:36 AM, Gang Luo wrote:
> Hi,
> there is something confusing me in the skew join
> (http://wiki.apache.org/pig/PigSkewedJoinSpec)
> 1. does the sampling job
On Wed, Jun 16, 2010 at 9:16 AM, Alan Gates wrote:
>
>
>> 4. for non-hot keys, my understanding is that they are shuffled to reducers
>> based on the default hash partitioner. However, it could happen that the keys
>> shuffled to one reducer together incur skew even though none of them is
>> skewed individually.
>>
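
To make the routing discussed in this thread concrete, here is a minimal Python sketch. It is not Pig's actual partitioner. It assumes that left-table tuples for a hot key are dealt round-robin across the reducers allotted to that key, that right-table tuples for the same key are copied to every one of those reducers, and that all other keys go through a plain hash. The names `make_partitioner`, `hot_keys` and `from_left`, the reducer allotment, and the use of `zlib.crc32` in place of Hadoop's hash partitioner are all hypothetical.

```python
import zlib
from itertools import count


def make_partitioner(num_reducers, hot_keys):
    """Build a routing function for a skew-aware join (illustrative only).

    hot_keys maps each hot key to the list of reducer indices allotted to it.
    In a real system that allotment would come from the sampling job; here it
    is supplied by hand.
    """
    # One independent round-robin counter per hot key.
    counters = {key: count() for key in hot_keys}

    def partition(key, from_left):
        """Return the list of reducers that should receive a tuple with this key."""
        if key not in hot_keys:
            # Non-hot key: ordinary hash partitioning. crc32 is a deterministic
            # stand-in for the default hash partitioner.
            return [zlib.crc32(str(key).encode("utf-8")) % num_reducers]
        reducers = hot_keys[key]
        if from_left:
            # Assumption: the sampled (left) table's hot tuples are spread
            # round-robin over the reducers allotted to the key.
            return [reducers[next(counters[key]) % len(reducers)]]
        # Assumption: the other table's tuples are replicated to every
        # reducer that holds a slice of the hot key.
        return list(reducers)

    return partition


if __name__ == "__main__":
    route = make_partitioner(4, {"key1": [0, 1, 2]})
    for _ in range(4):
        print("left  key1 ->", route("key1", True))
    print("right key1 ->", route("key1", False))
    print("left  key9 ->", route("key9", True))
```

Note that nothing in this sketch balances the non-hot keys: several of them can still hash to the same reducer, which is the situation question 4 asks about.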
On Jun 16, 2010, at 8:36 AM, Gang Luo wrote:
Hi,
there is something confusing me in the skew join
(http://wiki.apache.org/pig/PigSkewedJoinSpec)
1. does the sampling job sample and build a histogram on both tables,
or just one table (in this case, which one)?
Just the left one.
2. the join
Hi,
there is something confusing me in the skew join
(http://wiki.apache.org/pig/PigSkewedJoinSpec)
1. does the sampling job sample and build a histogram on both tables, or just one
table (in this case, which one)?
2. the join job still takes the two tables as inputs, and shuffles tuples from
partit