Re: [PERFORM] This query is still running after 10 hours...

2004-09-28 Thread Robert Creager
When grilled further on (Tue, 28 Sep 2004 11:04:23 -0400),
Tom Lane <[EMAIL PROTECTED]> confessed:

> Robert Creager <[EMAIL PROTECTED]> writes:
> > Normally, this query takes from 5 minutes to 2 hours to run.  On this
> > update, it's been running for more than 10 hours.
> 
> > ...
> > ->  Nested Loop  (cost=250.69..129094.19 rows=77211 width=59)
> >   ->  Hash Join  (cost=250.69..307.34 rows=67 width=12)
> > Hash Cond: ("outer".pair_id = "inner".pair_id)
> > ...
> 
> It chose a nested loop here because it was only expecting 67 rows out of
> the next-lower join, and so it thought it would only need 67 repetitions
> of the index probe into obs_v_file_id_index.  I'm suspicious that that
> estimate was way low and so the nestloop is taking forever.  You might
> try "SET enable_nestloop = off" as a crude way of avoiding that trap.

I tried your suggestion.  It did generate a different plan (below), but the
estimate is still blown, and it still used a nested loop.  The query is currently
running (42 minutes so far).  For the query in question, there are 151 different
pair_id's in the pairs table, which equates to 302 entries in the files table
(part of the query), which moves on to 533592 entries in the obs_v table and
533699 entries in the obs_i table.

The groups table has 76 total entries, files 9028, zero_pair 2532, color_groups
147. Only the obs_v and obs_i tables hold data in any significant quantity,
with 10M rows apiece.  The trigger that hits the catalog table (875499 entries)
searches for single matching entries on an index, firing once per obs_v/obs_i
update (a lookup of a random id just now took 54ms on the first query).
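
A sketch of that spot check, from psql (the star_id value here is made up):

tassiv=# \timing
Timing is on.
tassiv=# SELECT * FROM catalog WHERE star_id = 123456;  -- single-row probe via catalog_pkey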

There is no significant disk activity (reads at 0), one CPU is pegged, and that
process is consuming 218M resident and 168M shared memory (10% of total
available memory).  All reasonable, except for the fact that it doesn't come back...

Hash Join  (cost=100267870.17..100751247.13 rows=1578889 width=63)
  Hash Cond: ("outer".star_id = "inner".star_id)
  ->  Seq Scan on obs_i i  (cost=0.00..213658.19 rows=10391319 width=8)
  ->  Hash  (cost=100266886.39..100266886.39 rows=77113 width=59)
        ->  Hash Join  (cost=10307.51..100266886.39 rows=77113 width=59)
              Hash Cond: ("outer".file_id = "inner".file_id)
              ->  Seq Scan on obs_v  (cost=0.00..213854.50 rows=10390650 width=51)
              ->  Hash  (cost=10307.34..10307.34 rows=67 width=12)
                    ->  Hash Join  (cost=10250.69..10307.34 rows=67 width=12)
                          Hash Cond: ("outer".pair_id = "inner".pair_id)
                          ->  Seq Scan on zero_pair zp  (cost=0.00..43.32 rows=2532 width=8)
                          ->  Hash  (cost=10250.40..10250.40 rows=118 width=12)
                                ->  Hash Join  (cost=10004.80..10250.40 rows=118 width=12)
                                      Hash Cond: ("outer".night_id = "inner".night_id)
                                      ->  Seq Scan on files f  (cost=0.00..199.28 rows=9028 width=12)
                                      ->  Hash  (cost=10004.80..10004.80 rows=1 width=8)
                                            ->  Nested Loop  (cost=1.00..10004.80 rows=1 width=8)
                                                  ->  Seq Scan on color_groups cg  (cost=0.00..2.84 rows=1 width=8)
                                                        Filter: (175 = group_id)
                                                  ->  Seq Scan on groups g  (cost=0.00..1.95 rows=1 width=8)
                                                        Filter: (group_id = 175)



-- 
 20:48:23 up 5 days, 23:14,  2 users,  load average: 2.56, 2.91, 2.78
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004




Re: [PERFORM] This query is still running after 10 hours...

2004-09-28 Thread Robert Creager
When grilled further on (Tue, 28 Sep 2004 21:41:50 -0500),
Kevin Barnard <[EMAIL PROTECTED]> confessed:

> On Tue, 28 Sep 2004 20:21:40 -0600, Robert Creager
> <[EMAIL PROTECTED]> wrote:
> > 
> > The trigger keeps another table (catalog) up to date with the information
> > from the obs_v and obs_i tables.  There are no direct insert/update/delete's
> > on the catalog table, only through the trigger.
> > 
> 
> It's possible that the update to catalog is what is really taking a
> long time.  You might wish to try to EXPLAIN that query just to make
> sure.  You might also wish to disable the trigger just to rule it out.
> Does catalog have any triggers on it?  Does it have any foreign keys?

A select on the catalog is really quick (54ms on a random query - ~1M entries).  The 
updates use the index.  The catalog table has no triggers or foreign keys.  The 
trigger on the obs_? tables manages the catalog table.
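
A quick way to confirm the updates hit the index (the SET list here is
illustrative, not what the trigger actually computes):

tassiv=# EXPLAIN UPDATE catalog SET loc_count = loc_count + 1 WHERE star_id = 123456;

which shows an index scan on catalog_pkey rather than a seq scan.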

tassiv=# \d catalog
Table "public.catalog"
  Column  |   Type   |Modifiers
--+--+-
 star_id  | integer  | not null default nextval('"star_id_seq"'::text)
 loc_count| integer  | default 0
 loc  | spoint   | not null
 ra_sum   | double precision | default 0
 ra_sigma | real | default 0
 ra_sum_square| double precision | default 0
 dec_sum  | double precision | default 0
 dec_sigma| real | default 0
 dec_sum_square   | double precision | default 0
 mag_u_count  | integer  | default 0
 mag_u| real | default 99
 mag_u_sum| double precision | default 0
 mag_u_sigma  | real | default 0
 mag_u_sum_square | double precision | default 0
 mag_b_count  | integer  | default 0
 mag_b| real | default 99
 mag_b_sum| double precision | default 0
 mag_b_sigma  | real | default 0
 mag_b_sum_square | double precision | default 0
 mag_v_count  | integer  | default 0
 mag_v| real | default 99
 mag_v_sum| double precision | default 0
 mag_v_sigma  | real | default 0
 mag_v_sum_square | double precision | default 0
 mag_r_count  | integer  | default 0
 mag_r| real | default 99
 mag_r_sum| double precision | default 0
 mag_r_sigma  | real | default 0
 mag_r_sum_square | double precision | default 0
 mag_i_count  | integer  | default 0
 mag_i| real | default 99
 mag_i_sum| double precision | default 0
 mag_i_sigma  | real | default 0
 mag_i_sum_square | double precision | default 0
Indexes:
"catalog_pkey" primary key, btree (star_id)
"catalog_ra_decl_index" gist (loc)


-- 
 21:44:49 up 6 days, 11 min,  2 users,  load average: 2.03, 2.17, 2.39
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004




Re: [PERFORM] This query is still running after 10 hours...

2004-09-28 Thread Kevin Barnard
On Tue, 28 Sep 2004 20:21:40 -0600, Robert Creager
<[EMAIL PROTECTED]> wrote:
> 
> The trigger keeps another table (catalog) up to date with the information from the 
> obs_v and obs_i tables.  There are no direct insert/update/delete's on the catalog 
> table, only through the trigger.
> 

It's possible that the update to catalog is what is really taking a
long time.  You might wish to try to EXPLAIN that query just to make
sure.  You might also wish to disable the trigger just to rule it out.
Does catalog have any triggers on it?  Does it have any foreign keys?
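
A sketch of how to rule the trigger out without losing anything, using the
trigger name from the original post:

BEGIN;
DROP TRIGGER obs_v_trig ON obs_v;  -- temporarily remove the trigger
-- run the problem UPDATE here and see how long it takes
ROLLBACK;                          -- DDL is transactional, so the trigger comes back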

I've shot myself in the foot on this before, which is the only reason I
ask about it.



Re: [PERFORM] This query is still running after 10 hours...

2004-09-28 Thread Robert Creager
When grilled further on (Tue, 28 Sep 2004 16:55:13 +0200),
Gaetano Mendola <[EMAIL PROTECTED]> confessed:

> Robert Creager wrote:
> > Help?
> > 
> > Normally, this query takes from 5 minutes to 2 hours to run.  On this
> > update, it's been running for more than 10 hours.
> > 
> > Can it be helped?
> 
> 
> When I see this, it usually means the tables are full of
> dead rows.  Did you vacuum your DB?  Which version are you
> using?
> 

Gee, the two questions I realized I forgot to answer as I headed into work ;-)  I run
pg_autovacuum, and it's working.  I even ran a FULL ANALYZE; no help.  The version
is 7.4.1.
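
By a FULL ANALYZE I mean the combined command:

VACUUM FULL ANALYZE;  -- reclaims dead-row space and refreshes planner statistics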

Cheers,
Rob 

-- 
 20:22:11 up 5 days, 22:48,  2 users,  load average: 2.16, 2.18, 2.15
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004




Re: [PERFORM] This query is still running after 10 hours...

2004-09-28 Thread Robert Creager
When grilled further on (Tue, 28 Sep 2004 09:28:47 -0500),
Kevin Barnard <[EMAIL PROTECTED]> confessed:

> What does observations_trigger do?
> 

The trigger keeps another table (catalog) up to date with the information from the 
obs_v and obs_i tables.  There are no direct insert/update/delete's on the catalog 
table, only through the trigger.

-- 
 19:56:54 up 5 days, 22:23,  2 users,  load average: 2.46, 2.27, 2.15
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004




Re: [PERFORM] Caching of Queries

2004-09-28 Thread Matthew Nuzum
I could spend a week or two tweaking the performance of my database servers
and probably make some sizeable improvements, but I'm not going to.

Why? Because PostgreSQL screams as it is.

I would make sure that if the consensus is to add some sort of caching that
it be done only if there is no hit to current performance and stability.
That being said, I think that server side caching has major buzz and there's
nothing wrong with adding features that sell.

I will disagree with three points made in the argument against caching.
Specifically, the benefit of doing caching on the db server is that the
benefits may be reaped by multiple clients, whereas caching on the client
side must be done by each client and may not be as effective.

So what if the caching has a slight chance of returning stale results?  Just
make sure people know about it in advance.  There are some things where
stale results are no big deal and if I can easily benefit from an aggressive
caching system, I will (and I do now with the adodb caching library, but
like I said, caching has to be done for each client).  In fact, I'm all for
using a low-tech cache expiration algorithm to keep complexity down.
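
To illustrate what I mean by low-tech, here's a hypothetical sketch of the
idea in SQL (the table, the md5 key, and the 5-minute window are all made up;
adodb does the equivalent on the client side):

-- one row per cached query, keyed by a hash of the query text
CREATE TABLE query_cache (
    query_hash text PRIMARY KEY,
    result     text NOT NULL,
    cached_at  timestamp NOT NULL DEFAULT now()
);

-- low-tech expiration: anything older than 5 minutes is simply treated as stale
SELECT result
  FROM query_cache
 WHERE query_hash = md5('SELECT count(*) FROM some_table')
   AND cached_at > now() - interval '5 minutes';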

Finally, if the caching is not likely to help (or may even hurt) simple
queries but is likely to help complex queries, then fine: make sure people
know about it and let them decide if they can benefit.

Sorry if I'm beating a dead horse or playing the devil's advocate.  Just
felt compelled to chime in.

-- 
Matthew Nuzum      : "Man was born free, and everywhere he is in chains," Rousseau
www.bearfruit.org  : "Then you will know the truth, and the TRUTH will set
you free," Jesus Christ (John 8:32 NIV)

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Tom Lane
Sent: Monday, September 27, 2004 1:19 AM
To: Neil Conway
Cc: Aaron Werman; Scott Kirkwood; [EMAIL PROTECTED]
Subject: Re: [PERFORM] Caching of Queries

Neil Conway <[EMAIL PROTECTED]> writes:
> I think the conclusion of past discussions about this feature is that
> it's a bad idea. Last I checked, MySQL has to clear the *entire* query
> cache when a single DML statement modifying the table in question is
> issued.

Do they actually make a rigorous guarantee that the cached result is
still accurate when/if it is returned to the client?  (That's an honest
question --- I don't know how MySQL implements this.)

IIRC, in our past threads on this topic, it was suggested that if you
can tolerate not-necessarily-up-to-date results, you should be doing
this sort of caching on the client side and not in the DB server at all.
I wouldn't try that in a true "client" scenario, but when the DB client
is application-server middleware, it would make some sense to cache in
the application server.

regards, tom lane





[PERFORM] Interest in perf testing?

2004-09-28 Thread Josh Berkus
Folks,

I'm beginning a series of tests on OSDL's Scalable Test Platform in order to 
determine some recommended settings for many of the new PostgreSQL.conf 
parameters as well as pg_autovacuum.

Is anyone else interested in helping me with this? 

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] This query is still running after 10 hours...

2004-09-28 Thread Gaetano Mendola
Robert Creager wrote:
> Help?
>
> Normally, this query takes from 5 minutes to 2 hours to run.  On this
> update, it's been running for more than 10 hours.
>
> Can it be helped?

When I see this, it usually means the tables are full of
dead rows.  Did you vacuum your DB?  Which version are you
using?
Regards
Gaetano Mendola


Re: [PERFORM] This query is still running after 10 hours...

2004-09-28 Thread Tom Lane
Robert Creager <[EMAIL PROTECTED]> writes:
> Normally, this query takes from 5 minutes to 2 hours to run.  On this update, it's 
> been running for more than 10 hours.

> ...
> ->  Nested Loop  (cost=250.69..129094.19 rows=77211 width=59)
>   ->  Hash Join  (cost=250.69..307.34 rows=67 width=12)
> Hash Cond: ("outer".pair_id = "inner".pair_id)
> ...

It chose a nested loop here because it was only expecting 67 rows out of
the next-lower join, and so it thought it would only need 67 repetitions
of the index probe into obs_v_file_id_index.  I'm suspicious that that
estimate was way low and so the nestloop is taking forever.  You might
try "SET enable_nestloop = off" as a crude way of avoiding that trap.

regards, tom lane



Re: [PERFORM] Caching of Queries

2004-09-28 Thread Aaron Werman

----- Original Message -----
From: "Tom Lane" <[EMAIL PROTECTED]>
To: "Aaron Werman" <[EMAIL PROTECTED]>
Cc: "Iain" <[EMAIL PROTECTED]>; "Jim C. Nasby" <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>
Sent: Tuesday, September 28, 2004 9:58 AM
Subject: Re: [PERFORM] Caching of Queries


> "Aaron Werman" <[EMAIL PROTECTED]> writes:
> > I imagine a design where a shared plan cache would consist of the plans,
> > indexed by a statement hash and again by dependant objects.  A statement
to
> > be planned would be hashed and matched to the cache. DDL would need to
> > synchronously destroy all dependant plans. If each plan maintains a
validity
>   ^
> > flag, changing the cache wouldn't have to block so I don't see where
there
>^^
> > would be contention.
>
> You have contention to access a shared data structure *at all* -- for
> instance readers must lock out writers.  Or didn't you notice the self-
> contradictions in what you just said?
>
> Our current scalability problems dictate reducing such contention, not
> adding whole new sources of it.

You're right - that seems unclear. What I meant is that there can be a
global hash table that is never locked, and the hashes point to chains of
plans that are only locally locked for maintenance, such as gc and chaining
hash collisions. If maintenance were relatively rare and only local, my
assumption is that it wouldn't have a global impact.

The nice thing about plan caching is that it can be sloppy, unlike the block
cache, because it is only an optimization tweak. So, for example, if the
plan has atomic reference times or counts, there is no need to block, since
overwriting is not so bad. If the multiprocessing planner chains the same
plan twice, the second one would ultimately age out.

/Aaron

>
> regards, tom lane
>



Re: [PERFORM] This query is still running after 10 hours...

2004-09-28 Thread Kevin Barnard
What does observations_trigger do?



On Tue, 28 Sep 2004 08:19:57 -0600, Robert Creager
<[EMAIL PROTECTED]> wrote:
>
> Help?
>
> Normally, this query takes from 5 minutes to 2 hours to run.  On this update, it's 
> been running for more than 10 hours.
>
> Can it be helped?
>
> UPDATE obs_v
> SET mag = obs_v.imag + zp.zero_v + cg.color_v * (obs_v.imag - i.imag),
> use = true
> FROM color_groups AS cg, zero_pair AS zp, obs_i AS i, files AS f, groups AS g
> WHERE  obs_v.star_id = i.star_id
>    AND obs_v.file_id = f.file_id
>    AND cg.group_id = g.group_id
>    AND g.night_id = f.night_id
>    AND g.group_id = $group_id
>    AND zp.pair_id = f.pair_id
>
> [snip: explain plan and table definitions, included in full in the original post below]

[PERFORM] This query is still running after 10 hours...

2004-09-28 Thread Robert Creager

Help?

Normally, this query takes from 5 minutes to 2 hours to run.  On this update, it's 
been running for more than 10 hours.

Can it be helped?

UPDATE obs_v
SET mag = obs_v.imag + zp.zero_v + cg.color_v * (obs_v.imag - i.imag),
use = true
FROM color_groups AS cg, zero_pair AS zp, obs_i AS i, files AS f, groups AS g
WHERE  obs_v.star_id = i.star_id
   AND obs_v.file_id = f.file_id
   AND cg.group_id = g.group_id
   AND g.night_id = f.night_id
   AND g.group_id = $group_id
   AND zp.pair_id = f.pair_id

Hash Join  (cost=130079.22..639663.94 rows=1590204 width=63)
  Hash Cond: ("outer".star_id = "inner".star_id)
  ->  Seq Scan on obs_i i  (cost=0.00..213658.19 rows=10391319 width=8)
  ->  Hash  (cost=129094.19..129094.19 rows=77211 width=59)
        ->  Nested Loop  (cost=250.69..129094.19 rows=77211 width=59)
              ->  Hash Join  (cost=250.69..307.34 rows=67 width=12)
                    Hash Cond: ("outer".pair_id = "inner".pair_id)
                    ->  Seq Scan on zero_pair zp  (cost=0.00..43.32 rows=2532 width=8)
                    ->  Hash  (cost=250.40..250.40 rows=118 width=12)
                          ->  Hash Join  (cost=4.80..250.40 rows=118 width=12)
                                Hash Cond: ("outer".night_id = "inner".night_id)
                                ->  Seq Scan on files f  (cost=0.00..199.28 rows=9028 width=12)
                                ->  Hash  (cost=4.80..4.80 rows=1 width=8)
                                      ->  Nested Loop  (cost=0.00..4.80 rows=1 width=8)
                                            ->  Seq Scan on color_groups cg  (cost=0.00..2.84 rows=1 width=8)
                                                  Filter: (171 = group_id)
                                            ->  Seq Scan on groups g  (cost=0.00..1.95 rows=1 width=8)
                                                  Filter: (group_id = 171)
              ->  Index Scan using obs_v_file_id_index on obs_v  (cost=0.00..1893.23 rows=2317 width=51)
                    Index Cond: (obs_v.file_id = "outer".file_id)

Table definitions:

tassiv=# \d color_groups
                             Table "public.color_groups"
    Column    |  Type   |                           Modifiers
--------------+---------+----------------------------------------------------------------
 group_id     | integer | not null default nextval('"color_groups_group_id_seq"'::text)
 color_u      | real    |
 color_b      | real    |
 color_v      | real    |
 color_r      | real    |
 color_i      | real    |
 max_residual | real    |
Indexes:
    "color_groups_pkey" primary key, btree (group_id)
    "color_group_group_id_index" btree (group_id)

tassiv=# \d zero_pair
     Table "public.zero_pair"
 Column  |  Type   | Modifiers
---------+---------+-----------
 pair_id | integer | not null
 zero_u  | real    | default 0
 zero_b  | real    | default 0
 zero_v  | real    | default 0
 zero_r  | real    | default 0
 zero_i  | real    | default 0
Indexes:
    "zero_pair_pkey" primary key, btree (pair_id)
Foreign-key constraints:
    "$1" FOREIGN KEY (pair_id) REFERENCES pairs(pair_id) ON DELETE CASCADE

tassiv=# \d obs_v
                          Table "public.obs_v"
 Column  |  Type   |                   Modifiers
---------+---------+------------------------------------------------
 x       | real    | not null
 y       | real    | not null
 imag    | real    | not null
 smag    | real    | not null
 loc     | spoint  | not null
 obs_id  | integer | not null default nextval('"obs_id_seq"'::text)
 file_id | integer | not null
 use     | boolean | default false
 solve   | boolean | default false
 star_id | integer |
 mag     | real    |
Indexes:
    "obs_v_file_id_index" btree (file_id)
    "obs_v_loc_index" gist (loc)
    "obs_v_obs_id_index" btree (obs_id)
    "obs_v_star_id_index" btree (star_id)
    "obs_v_use_index" btree (use)
Foreign-key constraints:
    "obs_v_files_constraint" FOREIGN KEY (file_id) REFERENCES files(file_id) ON DELETE CASCADE
    "obs_v_star_id_constraint" FOREIGN KEY (star_id) REFERENCES catalog(star_id) ON DELETE SET NULL
Triggers:
    obs_v_trig BEFORE INSERT OR DELETE OR UPDATE ON obs_v FOR EACH ROW EXECUTE PROCEDURE observations_trigger()

tassiv=# \d files
                                Table "public.files"
  Column  |            Type             |                      Modifiers
----------+-----------------------------+------------------------------------------------------
 file_id  | integer                     | not null default nextval('"files_file_id_seq"'::text)
 night_id | integer                     |
 pair_id  | integer                     |
 name     | character varying           | not null
 date     | timestamp without time zone |
Indexes:
    "files_pkey" primary key, btree (file_id)
    "files_name_key" unique, btree (name)
    "files_id_index" btree (file_id, night_id, pair_id)
Fore

Re: [PERFORM] Caching of Queries

2004-09-28 Thread Tom Lane
"Aaron Werman" <[EMAIL PROTECTED]> writes:
> I imagine a design where a shared plan cache would consist of the plans,
> indexed by a statement hash and again by dependant objects.  A statement to
> be planned would be hashed and matched to the cache. DDL would need to
> synchronously destroy all dependant plans. If each plan maintains a validity
                                                                     ^^^^^^^^
> flag, changing the cache wouldn't have to block so I don't see where there
  ^^^^
> would be contention.

You have contention to access a shared data structure *at all* -- for
instance readers must lock out writers.  Or didn't you notice the self-
contradictions in what you just said?

Our current scalability problems dictate reducing such contention, not
adding whole new sources of it.

regards, tom lane



Re: [PERFORM] Caching of Queries

2004-09-28 Thread Aaron Werman

----- Original Message -----
From: "Tom Lane" <[EMAIL PROTECTED]>
To: "Iain" <[EMAIL PROTECTED]>
Cc: "Jim C. Nasby" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, September 27, 2004 11:17 PM
Subject: Re: [PERFORM] Caching of Queries


> "Iain" <[EMAIL PROTECTED]> writes:
> > I can only tell you (roughly) how it works with Oracle,
>
> Which unfortunately has little to do with how it works with Postgres.
> This "latches" stuff is irrelevant to us.

Latches are the Oracle term for semaphores. Both Oracle and pg use
semaphores and spin locks to serialize activity in critical sections. I
believe that the point that blocking/queuing reduces scalability is valid.

>
> In practice, any repetitive planning in PG is going to be consulting
> catalog rows that it draws from the backend's local catalog caches.
> After the first read of a given catalog row, the backend won't need
> to re-read it unless the associated table has a schema update.  (There
> are some other cases, like a VACUUM FULL of the catalog the rows came
> from, but in practice catalog cache entries don't change often in most
> scenarios.)  We need place only one lock per table referenced in order
> to interlock against schema updates; not one per catalog row used.
>
> The upshot of all this is that any sort of shared plan cache is going to
> create substantially more contention than exists now --- and that's not
> even counting the costs of managing the cache, ie deciding when to throw
> away entries.

I imagine a design where a shared plan cache would consist of the plans,
indexed by a statement hash and again by dependant objects.  A statement to
be planned would be hashed and matched to the cache. DDL would need to
synchronously destroy all dependant plans. If each plan maintains a validity
flag, changing the cache wouldn't have to block so I don't see where there
would be contention.

>
> A backend-local plan cache would avoid the contention issues, but would
> of course not allow amortizing planning costs across multiple backends.
>
> I'm personally dubious that sharing planning costs is a big deal.
> Simple queries generally don't take that long to plan.  Complicated
> queries do, but I think the reusability odds go down with increasing
> query complexity.
>

I think both parsing and planning are major tasks if the transaction rate
is high.  Simple queries can easily take much longer to plan than to execute,
so this is a scalability concern.  Caching complicated queries is valuable;
apps seem to have lots of similar queries because they are intimately
related to the data model.
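
Worth noting: per-backend plan reuse already exists at the SQL level via
PREPARE, which is the non-shared version of this idea (statement and table
names below are just for illustration):

PREPARE q1 (integer) AS
    SELECT * FROM some_table WHERE id = $1;  -- parsed and planned once per backend
EXECUTE q1(42);                              -- later executions reuse the stored plan

The question in this thread is effectively whether that amortization can be
shared across backends.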

> regards, tom lane
>
