I am using PostgreSQL (7.4) with a schema that was generated
automatically (using hibernate). The schema consists of about 650
relations. One particular query (also generated automatically)
consists of left joining approximately 350 tables.

May I be the first to offer an "ouch"!

At this stage, most tables are empty and those with values have less
than 50 entries. The query takes about 90 seconds to execute (on a
P4, 2.6Ghz).

All of the relations have a primary key which is indexed and all of
the joins are on foreign keys which are explicitly declared. I've
checked the obvious tunables (effective_cache_size, shared_memory and
sort_buffer) but changing these has had no effect. The system has a
total of 750MB RAM, I've varied the shared memory up to 256MB and the
sort buffer up to 128MB without affecting the performance.

The sort-mem is the only thing I can see helping with a single query.

Running the query as a JDBC prepared statement indicates that the
query optimiser is spending a negligable amount of time on the task
(~ 36 ms) compared to the executor (~ 90 seconds). The output of
EXPLAIN indicates (AFAICT) that all of the joins are of type "Nested
Loop Left Join" and all of the scans are of type "Seq Scan". I have
refrained from posting the query and the query plan since these are
80K and 100K apiece but if anyone wants to see them I can certainly
forward them on.

Well, if most tables are small then a seq-scan makes sense. Does it look like it's estimating the number of rows badly anywhere? I'm not sure the list will accept attachments that large - is it possible to upload them somewhere accessible?

My (uninformed) suspicion is that the optimiser has failed over to
the default plan on the basis of the number of tables in the join. My
question is, is there anyone out there using PostgreSQL with this
size of schema? Is there anything that can be done to bring about the
order of magnitude increase in speed that I need?

Well - the genetic planner must surely be kicking in here (see the run-time configuration chapter of the manuals, query-planning, geqo_threshold). However, I'm not sure how much leeway there is in planning a largely left-joined query.

It could be there's some overhead in the executor that's only noticable with hundreds of tables involved, you're running at about 0.25 secs per join.

I take it you have no control over the schema or query, so there's not much fiddling you can do. You've tried sort_mem, so there are only two things I can think of: 1. Try the various enable_xxx config settings and see if disabling seq-scan or the relevant join-type does anything (I'm not sure it will)
2. Try against 8.0 - there may be some improvement there.

Other people on this list have experience on larger systems than me, so they may be able to help more.
  Richard Huxton
  Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Reply via email to