Re: [PERFORM] Query plan for very large number of joins

Richard Huxton Thu, 02 Jun 2005 09:13:43 -0700

[EMAIL PROTECTED] wrote:

Hi,


I am using PostgreSQL (7.4) with a schema that was generated
automatically (using hibernate). The schema consists of about 650
relations. One particular query (also generated automatically)
consists of left joining approximately 350 tables.


May I be the first to offer an "ouch"!

At this stage, most tables are empty and those with values have less
than 50 entries. The query takes about 90 seconds to execute (on a
P4, 2.6Ghz).

All of the relations have a primary key which is indexed and all of
the joins are on foreign keys which are explicitly declared. I've
checked the obvious tunables (effective_cache_size, shared_memory and
sort_buffer) but changing these has had no effect. The system has a
total of 750MB RAM, I've varied the shared memory up to 256MB and the
sort buffer up to 128MB without affecting the performance.


The sort-mem is the only thing I can see helping with a single query.

Running the query as a JDBC prepared statement indicates that the
query optimiser is spending a negligable amount of time on the task
(~ 36 ms) compared to the executor (~ 90 seconds). The output of
EXPLAIN indicates (AFAICT) that all of the joins are of type "Nested
Loop Left Join" and all of the scans are of type "Seq Scan". I have
refrained from posting the query and the query plan since these are
80K and 100K apiece but if anyone wants to see them I can certainly
forward them on.

Well, if most tables are small then a seq-scan makes sense. Does it looklike it's estimating the number of rows badly anywhere? I'm not sure thelist will accept attachments that large - is it possible to upload themsomewhere accessible?

My (uninformed) suspicion is that the optimiser has failed over to
the default plan on the basis of the number of tables in the join. My
question is, is there anyone out there using PostgreSQL with this
size of schema? Is there anything that can be done to bring about the
order of magnitude increase in speed that I need?

Well - the genetic planner must surely be kicking in here (see therun-time configuration chapter of the manuals, query-planning,geqo_threshold). However, I'm not sure how much leeway there is inplanning a largely left-joined query.

It could be there's some overhead in the executor that's only noticablewith hundreds of tables involved, you're running at about 0.25 secs perjoin.

I take it you have no control over the schema or query, so there's notmuch fiddling you can do. You've tried sort_mem, so there are only twothings I can think of:1. Try the various enable_xxx config settings and see if disablingseq-scan or the relevant join-type does anything (I'm not sure it will)

2. Try against 8.0 - there may be some improvement there.

Other people on this list have experience on larger systems than me, sothey may be able to help more.

--
  Richard Huxton
  Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Re: [PERFORM] Query plan for very large number of joins

Reply via email to