Re: skew join optimization

Jov Sun, 20 Mar 2011 06:07:54 -0700

2011/3/20 Igor Tatarinov <i...@decide.com>:
> I have the following join that takes 4.5 hours (with 12 nodes) mostly
> because of a single reduce task that gets the bulk of the work:
> SELECT ...
> FROM T
> LEFT OUTER JOIN S
> ON T.timestamp = S.timestamp and T.id = S.id
> This is a 1:0/1 join so the size of the output is exactly the same as the
> size of T (500M records). S is actually very small (5K).
> I've tried:
> - switching the order of the join conditions
> - using a different hash function setting (jenkins instead of murmur)
> - using SET hive.auto.convert.join = true;


Are you sure your query is actually being converted to a map join? If
not, try using an explicit MAPJOIN hint.
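
Something like the following, as a rough sketch rather than a tested
query: the select list stays elided as in your original, and I am
assuming S (5K rows) fits in memory on each mapper. Prefixing it with
EXPLAIN lets you check the plan before running it.

```sql
-- Sketch only: the select list is elided exactly as in the original query.
-- The hint asks Hive to load the small table S into memory on each mapper,
-- so the join runs map-side and no single reducer receives the skewed keys.
-- In the EXPLAIN output, a map join should appear as a "Map Join Operator"
-- in a map stage rather than a "Join Operator" in a reduce stage.
EXPLAIN
SELECT /*+ MAPJOIN(S) */ ...
FROM T
LEFT OUTER JOIN S
ON T.timestamp = S.timestamp AND T.id = S.id;
```

Since S is on the right side of the LEFT OUTER JOIN, it is the side
that is allowed to be held in memory, so the hint should apply here.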


> - using SET hive.optimize.skewjoin = true;
> but nothing helped :(
> Anything else I can try?
> Thanks!
