Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

Lawrence, Ramon Thu, 19 Feb 2009 08:39:49 -0800

________________________________

From: [email protected] on behalf of Robert Haas
I think what we need here is some very simple testing to demonstrate
that this patch demonstrates a speed-up even when the inner side of
the join is a joinrel rather than a baserel.  Can you suggest a single
query against the skewed TPCH dataset that will result in two or more
multi-batch hash joins?  If so, it should be a simple matter to run
that query with and without the patch and verify that the former is
faster than the latter.


This query will have the outer relation be a joinrel rather than a baserel:
 
select count(*) from supplier, part, lineitem where l_partkey = p_partkey and 
s_suppkey = l_suppkey;
 
The approach collects statistics on the outer relation (not the inner relation),
so the code had to be able to find a stats tuple for a joinrel as well as for
a baserel.
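
For readers unfamiliar with the idea, here is a rough Python sketch of why
outer-side statistics matter for a multi-batch hash join. It is a toy model
under stated assumptions, not the patch code: the function name
toy_skew_hash_join, the num_hot_keys parameter, and the use of a frequency
count over the outer input as a stand-in for a planner stats tuple are all
hypothetical. Inner tuples whose keys are frequent on the outer side stay in
memory, so the matching outer tuples are joined in the first pass instead of
being written out to a later batch.

```python
from collections import Counter, defaultdict


def toy_skew_hash_join(inner, outer, num_batches=4, num_hot_keys=1):
    """Join two lists of (key, payload) tuples.

    Returns (joined_rows, deferred), where deferred counts the outer tuples
    that a real multi-batch join would have to write out and re-read later.
    """
    # Stand-in for planner statistics: the most frequent OUTER join keys.
    key_counts = Counter(key for key, _ in outer)
    hot_keys = {key for key, _ in key_counts.most_common(num_hot_keys)}

    # Build phase: inner tuples with hot keys go to an always-resident table,
    # everything else is partitioned into batches by hash value.
    hot_table = defaultdict(list)
    batch_tables = [defaultdict(list) for _ in range(num_batches)]
    for key, payload in inner:
        if key in hot_keys:
            hot_table[key].append(payload)
        else:
            batch_tables[hash(key) % num_batches][key].append(payload)

    joined, deferred = [], 0
    spilled = [[] for _ in range(num_batches)]

    # Probe phase, first pass: only the hot table and batch 0 are in memory.
    for key, payload in outer:
        batch = hash(key) % num_batches
        if key in hot_keys:
            joined.extend((key, ip, payload) for ip in hot_table.get(key, []))
        elif batch == 0:
            joined.extend((key, ip, payload) for ip in batch_tables[0].get(key, []))
        else:
            spilled[batch].append((key, payload))
            deferred += 1

    # Later passes: process each remaining batch against its own hash table.
    for batch in range(1, num_batches):
        for key, payload in spilled[batch]:
            joined.extend((key, ip, payload) for ip in batch_tables[batch].get(key, []))

    return joined, deferred
```

With eight inner keys and an outer input in which key 1 appears 51 times out
of 58 tuples, this toy defers 5 outer tuples when one hot key is tracked and
56 when none is, while producing the same join result in both cases. On data
with no skew the hot table buys nothing, which is why an almost zero-skew
database is the worst case.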
 
Joshua sent us some preliminary data with this query and others and indicated 
that we could post it.  He wanted time to clean it up and re-run some 
experiments, but the data is generally good and the algorithm performs as 
expected.  I have attached this data to the post.  Note that the last set of
data (although labelled as Z7) is actually an almost zero-skew database and
represents the worst case for the algorithm (for most queries the optimization
is not even used).
 
--
Ramon Lawrence

Attachment: JoshuaTolleyData.xls

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
