On Saturday, May 30, 2015, Glen M. Witherington <g...@fea.st> wrote: > Sorry about the horrendous subject, let me explain by example: > > Let's take this schema: > > > ``` > CREATE TABLE a ( > id bigserial PRIMARY KEY, > created_at timestamp with time zone NOT NULL DEFAULT NOW() > ); > > CREATE TABLE b( > id bigserial PRIMARY KEY, > a_id bigint NOT NULL REFERENCES a(id), > created_at timestamp with time zone NOT NULL DEFAULT NOW() > ); > > CREATE TABLE c( > id bigserial PRIMARY KEY, > b_id bigint NOT NULL REFERENCES b(id), > created_at timestamp with time zone NOT NULL DEFAULT NOW() > ); > ``` > > And let's fill it up with some dummy data, that roughly matches the > distribution of mine: > > ``` > INSERT INTO a SELECT FROM generate_series(1, 5); > INSERT INTO b(a_id) SELECT (i % 5) + 1 FROM generate_series(1, 100) i; > INSERT INTO c(b_id) SELECT (trunc(random() * 100)+1) FROM > generate_series(1, 1000000); > ``` > > > And here's the query I want to do, efficiently: > > ```` > SELECT * FROM c > JOIN b ON b.id = c.b_id > JOIN a ON a.id = b.a_id > WHERE a.id = 3 > ORDER BY b.created_at DESC > LIMIT 10 > ``` > > There seems to simply be no index I can put on the data, that allows me > to run this query efficiently. Four hours of playing with this, the only > solution I can see is, normalizing table `c` by adding a field "b's > a_id" and then creating an index on (bs_a_id, created_at). > > But surely there's a better solution? > > This is one problem with using made up surrogate keys...
The PK of A is a component of both the PK of B and the PK of C but you throw that information away by using serial fields for PKs instead. You should have unique indexes on B and C that incorporate the ID from A and then indeed you will end up with a join sequence that can be executed against efficiently. All that said you really should put indexes on the foreign keys... I haven't run the code to see the actual plan.... David J.