Suppose I have two related tables:

CREATE TABLE events (
     id INTEGER PRIMARY KEY NOT NULL,
     place_id INTEGER
);

CREATE TABLE places (
     id INTEGER PRIMARY KEY NOT NULL,
     name TEXT
);

INSERT INTO places (id, name) VALUES (1, 'Place A');
INSERT INTO places (id, name) VALUES (2, 'Place B');
INSERT INTO places (id, name) VALUES (3, 'Place C');

INSERT INTO events (id, place_id) VALUES (1, 1);
INSERT INTO events (id, place_id) VALUES (2, 2);
INSERT INTO events (id, place_id) VALUES (3, 1);

Now, I want to count the number of 'events' at every 'place'. I started
with a simple join and a group by clause to perform the count:

SELECT name, count (*) AS count
FROM events LEFT JOIN places ON places.id = events.place_id
GROUP BY events.place_id;

name|count
Place A|2
Place B|1

It executes very fast (because the join can take advantage of the
primary key) but does not produce the desired result. As you can see,
not all places are included in the result:

So I changed swapped the tables in the join:

SELECT name, count (events.place_id) AS count
FROM places LEFT JOIN events ON events.place_id = places.id
GROUP BY places.id;

name|count
Place A|2
Place B|1
Place C|1

With this query, the correct result is returned, but the query runs much
slower. Probably because the join cannot use the primary key (according
to 'explain query plan', it is used for the sorting of the group by
clause now). Adding an index improves the speed

CREATE INDEX events_idx_place_id ON events (place_id);

Is there a difference between using a primary key or another index for a
join? And what is the advantage (if there is one) over not using a join
at all:

SELECT name, (SELECT count (*) FROM events WHERE events.place_id =
places.id) AS count FROM places;





-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to