Dear all,
First of all, I should apologize if my email doesn't follow all the guidelines.
I'm trying to do that though!

If referencing to links is OK, you can find the full description of
the issue at:
http://dba.stackexchange.com/questions/127082/postgresql-seems-to-create-inefficient-plans-in-simple-conditional-joins

It contains table definitions, queries, explan/explan analyze for them, and
a description of test conditions. But I'll provide a summary of the planning
issue below.

I'm using postgresql 9.3. I've run VACCUME ANALYZE on DB and it is
not modified after that.

Consider these tables:
CREATE TABLE t1
(
  id bigint NOT NULL DEFAULT nextval('ids_seq'::regclass),
  total integer NOT NULL,
  price integer NOT NULL,
  CONSTRAINT pk_t1 PRIMARY KEY (id)
)

CREATE TABLE t2
(
  id bigint NOT NULL,
  category smallint NOT NULL,
  CONSTRAINT pk_t2 PRIMARY KEY (id),
  CONSTRAINT fk_id FOREIGN KEY (id)
      REFERENCES t1 (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)

Personally, I expect both queries below to perform exactly the same:

SELECT
    t1.id, *
FROM
    t1
INNER JOIN
    t2 ON t1.id = t2.id
    where t1.id > -9223372036513411363;

And:

SELECT
    t1.id, *
FROM
    t1
INNER JOIN
    t2 ON t1.id = t2.id
    where t1.id > -9223372036513411363 and t2.id > -9223372036513411363;

Unfortunately, they do not. PostgreSQL creates different plans for these
queries, which results in very poor performance for the first one compared
to the second (What I'm testing against is a DB with around 350 million
rows in t1, and slightly less in t2).

EXPLAIN output:
First query: http://explain.depesz.com/s/uauk
Second query: link: http://explain.depesz.com/s/uQd

The problem with the plan for the first query is that it limits
index scan on t1 with the where condition, but doesn't do so for t2.

A similar behavior happens if you replace INNER JOIN with LEFT JOIN,
and if you use "USING (id) where id > -9223372036513411363" instead
of "ON ...".

But it is important to get the first query right. Consider that I want to create
a view on SELECT statement (without condition) to simplify creating queries on
the data. If providing a single id column in the view, a SELECT query
on the view
with such a condition on id column will result in a query similar to
the first one.
With this problem, I should provide both ID columns in the view so that queries
can add each condition on ID column for both of them. Now assume what happens
when we are joining many tables together with ID column...

Is there anything wrong with my queries or with me expecting both queries to be
the sam? Can I do anything so that PostgreSQL will behave similarly for the
first query? Or if this is fixed in newer versions?

Thanks in advance,
Hedayat


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to