2011/3/20 Igor Tatarinov <i...@decide.com>: > I have the following join that takes 4.5 hours (with 12 nodes) mostly > because of a single reduce task that gets the bulk of the work: > SELECT ... > FROM T > LEFT OUTER JOIN S > ON T.timestamp = S.timestamp and T.id = S.id > This is a 1:0/1 join so the size of the output is exactly the same as the > size of T (500M records). S is actually very small (5K). > I've tried: > - switching the order of the join conditions > - using a different hash function setting (jenkins instead of murmur) > - using SET set hive.auto.convert.join = true;
are you sure your query convert to mapjoin? if not,try use explicit mapjoin hint. > - using SET hive.optimize.skewjoin = true; > but nothing helped :( > Anything else I can try? > Thanks!