Re: skew join in pig

2010-06-21 Thread Gang Luo
Subject: Re: skew join in pig It's just whatever the hash function happens to do. By the time the "hot" keys are slotted to be spread among multiple reducers, they are no longer hot, so it doesn't matter if you put a few of the partitions in the same reducer. Remember, we mostly car...
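The point above about non-hot keys can be illustrated with a minimal sketch of modulo hash partitioning. It assumes a partitioner in the style of Hadoop's default HashPartitioner, which computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The Python below substitutes `zlib.crc32` for Java's `hashCode()`, because Python's built-in `hash()` of strings is randomized per process. The helper names `partition` and `reducer_loads` are hypothetical. The sketch only shows that several cold keys can end up on the same reducer; it is not Pig's partitioner.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Map a key to a reducer index with a modulo hash.

    Stand-in for a default hash partitioner: zlib.crc32 replaces Java's
    hashCode() because Python's built-in hash() of str is randomized
    per process. crc32 is already non-negative in Python 3.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(key_counts: dict[str, int], num_reducers: int) -> list[int]:
    """Total tuples each reducer receives when every key is hash-partitioned."""
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        loads[partition(key, num_reducers)] += count
    return loads


if __name__ == "__main__":
    # 20 cold keys of 100 tuples each: no single key is skewed, but the
    # hash may still place more keys on some reducers than on others.
    cold = {f"k{i}": 100 for i in range(20)}
    print(reducer_loads(cold, 4))
```

Because every key here is small, an uneven split of keys across reducers costs some balance but never forces one huge key onto a single reducer.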

Re: skew join in pig

2010-06-21 Thread Dmitriy Ryaboy
...1, 2, 3 for key2? Or 3, 4, 5, 6? What is the idea > behind the allocation for all of the hot keys? > > Thanks, > -Gang > - Original Message > From: Alan Gates > To: pig-dev@hadoop.apache.org > Sent: 2010/6/18 (Fri) 2:46:09 PM > Subject: Re: skew join in pig

Re: skew join in pig

2010-06-21 Thread Gang Luo
...behind the allocation for all of the hot keys? Thanks, -Gang - Original Message From: Alan Gates To: pig-dev@hadoop.apache.org Sent: 2010/6/18 (Fri) 2:46:09 PM Subject: Re: skew join in pig Are you asking how many reducers are used to split a hot key? If so, the answer is as many as we...
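Gang's question about which reducers a hot key receives (1, 2, 3 or 3, 4, 5, 6) can be made concrete with a hypothetical allocation scheme. The sketch assumes each hot key gets `ceil(estimated_count / tuples_per_reducer)` reducers, capped at the total, handed out as consecutive indices that wrap around. `allocate_hot_keys` and `tuples_per_reducer` are illustrative names, and Alan's actual answer is cut off in this snippet, so this is not a description of Pig's rule.

```python
import math


def allocate_hot_keys(hot_counts: dict[str, int], tuples_per_reducer: int,
                      num_reducers: int) -> dict[str, list[int]]:
    """Hypothetical scheme: give each hot key a consecutive, wrapping range
    of reducer indices, sized ceil(count / tuples_per_reducer) and capped
    at num_reducers."""
    allocation = {}
    next_reducer = 0
    for key in sorted(hot_counts):  # sorted for a deterministic assignment
        needed = math.ceil(hot_counts[key] / tuples_per_reducer)
        needed = max(1, min(num_reducers, needed))
        allocation[key] = [(next_reducer + i) % num_reducers for i in range(needed)]
        next_reducer = (next_reducer + needed) % num_reducers
    return allocation


if __name__ == "__main__":
    print(allocate_hot_keys({"key1": 250, "key2": 400}, 100, 6))
    # {'key1': [0, 1, 2], 'key2': [3, 4, 5, 0]}
```

In the example, `key2` wraps around and shares reducer 0 with `key1`. The reply in this thread argues such sharing is harmless, since each slice of a split hot key is no longer hot.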

Re: skew join in pig

2010-06-18 Thread Alan Gates
...g-dev@hadoop.apache.org Sent: 2010/6/16 (Wed) 12:16:13 PM Subject: Re: skew join in pig On Jun 16, 2010, at 8:36 AM, Gang Luo wrote: Hi, there is something confusing me in the skew join (http://wiki.apache.org/pig/PigSkewedJoinSpec) 1. does the sampling job sample and build a histogram on both tables,...

Re: skew join in pig

2010-06-16 Thread Gang Luo
...Alan Gates To: pig-dev@hadoop.apache.org Sent: 2010/6/16 (Wed) 12:16:13 PM Subject: Re: skew join in pig On Jun 16, 2010, at 8:36 AM, Gang Luo wrote: > Hi, > there is something confusing me in the skew join > (http://wiki.apache.org/pig/PigSkewedJoinSpec) > 1. does the sampling job...

Re: skew join in pig

2010-06-16 Thread Dmitriy Ryaboy
On Wed, Jun 16, 2010 at 9:16 AM, Alan Gates wrote: >> 4. for non-hot keys, my understanding is that they are shuffled to reducers based on the default hash partitioner. However, it could happen that all the keys shuffled to one reducer incur skew even though none of them is skewed individually. >>

Re: skew join in pig

2010-06-16 Thread Alan Gates
On Jun 16, 2010, at 8:36 AM, Gang Luo wrote: Hi, there is something confusing me in the skew join (http://wiki.apache.org/pig/PigSkewedJoinSpec) 1. does the sampling job sample and build a histogram on both tables, or just one table (in this case, which one)? Just the left one. 2. the join...
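Alan's answer that only the left input is sampled can be sketched as three steps: take a sample of the left input's join keys, scale the sampled counts back up into an estimated histogram, and flag keys above a threshold as hot. To keep the example deterministic it uses systematic sampling (every `step`-th tuple) rather than random sampling. `sample_histogram`, `hot_keys`, `step`, and `threshold` are hypothetical names; Pig's actual sampler and hot-key criterion are not described in this thread.

```python
from collections import Counter
from collections.abc import Sequence


def sample_histogram(left_keys: Sequence[str], step: int) -> dict[str, int]:
    """Estimate key counts of the left input from every step-th tuple.

    Systematic sampling keeps the example deterministic; each sampled
    occurrence is scaled by step to estimate the full count.
    """
    sampled = Counter(left_keys[::step])
    return {key: n * step for key, n in sampled.items()}


def hot_keys(histogram: dict[str, int], threshold: int) -> set[str]:
    """Keys whose estimated count exceeds the (hypothetical) threshold."""
    return {key for key, est in histogram.items() if est > threshold}


if __name__ == "__main__":
    left = ["a"] * 90 + ["b"] * 6 + ["c"] * 4
    hist = sample_histogram(left, 5)
    print(hist)                 # {'a': 90, 'b': 10}
    print(hot_keys(hist, 50))   # {'a'}
```

Key `"c"` never appears in the sample, which is why a sampled histogram is only an estimate; rare keys are also the ones that matter least for skew.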

skew join in pig

2010-06-16 Thread Gang Luo
Hi, there is something confusing me in the skew join (http://wiki.apache.org/pig/PigSkewedJoinSpec) 1. does the sampling job sample and build a histogram on both tables, or just one table (in this case, which one)? 2. the join job still takes the two tables as inputs, and shuffles tuples from partit...
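Question 2 asks how the join job routes tuples from the two inputs. A common skewed-join scheme, sketched here as an assumption, splits the left-input tuples of a hot key round-robin across that key's reducers and copies every right-input tuple of the key to all of those reducers, so each left slice still meets every matching right tuple. Non-hot keys are hash-partitioned identically on both sides. The `allocation` map (hot key to reducer indices), `route_left`, and `route_right` are hypothetical, and the thread's answer to this question is truncated.

```python
import zlib


def _hash_partition(key: str, num_reducers: int) -> int:
    """Plain modulo hash used for keys that are not hot."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route_left(key: str, allocation: dict[str, list[int]], num_reducers: int,
               seen: dict[str, int]) -> int:
    """Reducer for one left-input tuple.

    Tuples of a hot key rotate through that key's reducers, so the key is
    split; `seen` counts how many tuples of each hot key were routed so far.
    """
    if key in allocation:
        reducers = allocation[key]
        n = seen.get(key, 0)
        seen[key] = n + 1
        return reducers[n % len(reducers)]
    return _hash_partition(key, num_reducers)


def route_right(key: str, allocation: dict[str, list[int]],
                num_reducers: int) -> list[int]:
    """Reducers for one right-input tuple.

    A hot key's right tuples are copied to every reducer in its range so
    each slice of the left side still sees all matching right tuples.
    """
    if key in allocation:
        return list(allocation[key])
    return [_hash_partition(key, num_reducers)]
```

The design trades extra shuffle volume on the right side (one copy per reducer of a hot key) for bounded per-reducer work on the skewed left side.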