Hi all, I have a large database with e-mail meta-data (no bodies) for over 100 million messages. I am running PostgreSQL 8.2.4 on a server with 2GB of RAM (shared_buffers = 240MB, temp_buffers = 128MB, work_mem = 256MB, maintenance_work_mem = 256MB). I have the data split in two separate tables, "email" and "email_extras":
Table "public.email"
Column | Type | Modifiers
-------------------+-----------------------------+-----------
id | bigint | not null
load_id | integer | not null
ts | timestamp without time zone | not null
ip | inet | not null
mfrom | text | not null
helo | text |
Table "public.email_extras"
Column | Type | Modifiers
-------------------+-----------------------------+-----------
id | bigint | not null
ts | timestamp without time zone | not null
size | integer | not null
hdr_from | text |
Each of these tables has been partitioned equally based on the "ts"
(timestamp) field into two dozen or so tables, each covering one week of
messages. For testing purposes, I have only one week's partition filled
for each of the "email" and "email_extras" tables (email_2007_week34
{,extras}).
Now if I perform the following simple join on the "email" and "email_
extras" tables ...
SELECT
count(*)
FROM
email
INNER JOIN email_extras USING (id, ts)
WHERE
mfrom <> hdr_from;
then I get the following horrendously inefficient plan:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=391396890.89..391396890.90 rows=1 width=0)
-> Merge Join (cost=9338881.64..349156398.02 rows=16896197148 width=0)
Merge Cond: ((public.email_extras.id = public.email.id) AND
(public.email_extras.ts = public.email.ts))
Join Filter: (public.email.mfrom <> public.email_extras.hdr_from)
-> Sort (cost=4592966.95..4658121.33 rows=26061752 width=48)
Sort Key: public.email_extras.id, public.email_extras.ts
-> Append (cost=0.00..491969.52 rows=26061752 width=48)
-> Seq Scan on email_extras (cost=0.00..13.30 rows=330
width=48)
-> Seq Scan on email_2007_week13_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week14_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week15_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week16_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week17_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week18_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week19_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week20_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week21_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week22_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week23_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week24_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week25_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week26_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week27_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week28_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week29_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week30_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week31_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week32_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week33_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week34_extras email_extras
(cost=0.00..491597.12 rows=26052512 width=33)
-> Seq Scan on email_2007_week35_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week36_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week37_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week38_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week39_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week40_extras email_extras
(cost=0.00..13.30 rows=330 width=48)
-> Sort (cost=4745914.69..4811071.87 rows=26062872 width=48)
Sort Key: public.email.id, public.email.ts
-> Append (cost=0.00..644732.72 rows=26062872 width=48)
-> Seq Scan on email (cost=0.00..13.70 rows=370
width=48)
-> Seq Scan on email_2007_week13 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week14 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week15 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week16 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week17 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week18 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week19 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week20 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week21 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week22 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week23 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week24 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week25 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week26 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week27 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week28 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week29 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week30 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week31 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week32 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week33 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week34 email
(cost=0.00..644349.12 rows=26052512 width=33)
-> Seq Scan on email_2007_week35 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week36 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week37 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week38 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week39 email
(cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week40 email
(cost=0.00..13.70 rows=370 width=48)
(68 rows)
However, if I restrict the query to just the partitions that actually do
have data in them ...
SELECT
count(*)
FROM
email_2007_week34
INNER JOIN email_2007_week34_extras USING (id, ts)
WHERE
mfrom <> hdr_from;
then I get a much better plan that uses a hash join:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=4266338.94..4266338.95 rows=1 width=0)
-> Hash Join (cost=1111362.80..4266336.07 rows=1145 width=0)
Hash Cond: ((email_2007_week34.ts = email_2007_week34_extras.ts) AND
(email_2007_week34.id = email_2007_week34_extras.id))
Join Filter: (email_2007_week34.mfrom <>
email_2007_week34_extras.hdr_from)
-> Seq Scan on email_2007_week34 (cost=0.00..644349.12
rows=26052512 width=33)
-> Hash (cost=491597.12..491597.12 rows=26052512 width=33)
-> Seq Scan on email_2007_week34_extras
(cost=0.00..491597.12 rows=26052512 width=33)
(7 rows)
Yes, I have `ANALYZE`d the database before running the queries.
How come the query planner gets thrown off that far by the simple table
partitioning? What can I do to put the query planner back on the right
track?
Julian.
signature.asc
Description: This is a digitally signed message part.
