On 1 January 2015 at 02:47, David Rowley <dgrowle...@gmail.com> wrote:

> Hi,
>
> I've been hacking a bit at the join code again... This time I've been
> putting some effort into  optimising the case where the inner side of the
> join is known to be unique.
> For example, given the tables:
>
> create table t1 (id int primary key);
> create table t2 (id int primary key);
>
> And query such as:
>
> select * from t1 left outer join t2 on t1.id=t2.id;
>
> It is possible to deduce at planning time that "for each row in the outer
> relation, only 0 or 1 rows can exist in the inner relation", (inner being
> t2)
>

I've been hacking at this unique join idea again and I've now got it
working for all join types -- Patch attached.

Here's how the performance is looking:

postgres=# create table t1 (id int primary key);
CREATE TABLE
postgres=# create table t2 (id int primary key);
CREATE TABLE
postgres=# insert into t1 select x.x from generate_series(1,1000000) x(x);
INSERT 0 1000000
postgres=# insert into t2 select x.x from generate_series(1,1000000) x(x);
INSERT 0 1000000
postgres=# vacuum analyze;
VACUUM
postgres=# \q

Query: select count(t1.id) from t1 inner join t2 on t1.id=t2.id;

With Patch on master as of 32bf6ee

D:\Postgres\install\bin>pgbench -f d:\unijoin3.sql -T 60 -n postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 78
latency average: 769.231 ms
tps = 1.288260 (including connections establishing)
tps = 1.288635 (excluding connections establishing)

Master as of 32bf6ee

D:\Postgres\install\bin>pgbench -f d:\unijoin3.sql -T 60 -n postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 70
latency average: 857.143 ms
tps = 1.158905 (including connections establishing)
tps = 1.159264 (excluding connections establishing)

That's a 10% performance increase.

I still need to perform more thorough benchmarking with different data
types.

One weird thing that I noticed before is that in an earlier revision of the
patch in the executor's join Initialise node code, I had set the
unique_inner to true for semi joins and replaced the SEMI_JOIN check for a
unique_join check in the execute node for each join method. With this the
performance results barely changed from standard... I've yet to find out
why.

The patch also has added a property to the EXPLAIN (VERBOSE) output which
states if the join was found to be unique or not.

The patch also still requires a final pass of comment fix-ups. I've just
plain run out of time for now.

I'll pick this up in 2 weeks time.

Regards

David Rowley

Attachment: unijoin_2015-01-31_9af9cfa.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to