On 1 January 2015 at 02:47, David Rowley <dgrowle...@gmail.com> wrote:
> Hi, > > I've been hacking a bit at the join code again... This time I've been > putting some effort into optimising the case where the inner side of the > join is known to be unique. > For example, given the tables: > > create table t1 (id int primary key); > create table t2 (id int primary key); > > And query such as: > > select * from t1 left outer join t2 on t1.id=t2.id; > > It is possible to deduce at planning time that "for each row in the outer > relation, only 0 or 1 rows can exist in the inner relation", (inner being > t2) > I've been hacking at this unique join idea again and I've now got it working for all join types -- Patch attached. Here's how the performance is looking: postgres=# create table t1 (id int primary key); CREATE TABLE postgres=# create table t2 (id int primary key); CREATE TABLE postgres=# insert into t1 select x.x from generate_series(1,1000000) x(x); INSERT 0 1000000 postgres=# insert into t2 select x.x from generate_series(1,1000000) x(x); INSERT 0 1000000 postgres=# vacuum analyze; VACUUM postgres=# \q Query: select count(t1.id) from t1 inner join t2 on t1.id=t2.id; With Patch on master as of 32bf6ee D:\Postgres\install\bin>pgbench -f d:\unijoin3.sql -T 60 -n postgres transaction type: Custom query scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 duration: 60 s number of transactions actually processed: 78 latency average: 769.231 ms tps = 1.288260 (including connections establishing) tps = 1.288635 (excluding connections establishing) Master as of 32bf6ee D:\Postgres\install\bin>pgbench -f d:\unijoin3.sql -T 60 -n postgres transaction type: Custom query scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 duration: 60 s number of transactions actually processed: 70 latency average: 857.143 ms tps = 1.158905 (including connections establishing) tps = 1.159264 (excluding connections establishing) That's a 10% performance increase. I still need to perform more thorough benchmarking with different data types. One weird thing that I noticed before is that in an earlier revision of the patch in the executor's join Initialise node code, I had set the unique_inner to true for semi joins and replaced the SEMI_JOIN check for a unique_join check in the execute node for each join method. With this the performance results barely changed from standard... I've yet to find out why. The patch also has added a property to the EXPLAIN (VERBOSE) output which states if the join was found to be unique or not. The patch also still requires a final pass of comment fix-ups. I've just plain run out of time for now. I'll pick this up in 2 weeks time. Regards David Rowley
unijoin_2015-01-31_9af9cfa.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers