Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-11 Thread Sean Chittenden
> >NB> shared_buffers = 1/16th of total memory
> >NB> effective_cache_size = 80% of the supposed kernel cache.
> 
> I think Sean(?) mentioned this one for FreeBSD (Bash code):

sh, not bash.  :)

> echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
> 
> I've used it for my dedicated servers.  Is this calculation correct?

Yes, or it's real close at least.  vfs.hibufspace is the amount of
kernel space that's used for caching IO operations (minus the
necessary space taken by the kernel).  If you're real paranoid, you
could do some kernel profiling, figure out how much of the cache is
actually disk IO, and multiply the above by some percentage, say 80%.
I haven't found it necessary to do so yet: since hibufspace is all IO
and caching, and caching any net activity is kinda pointless, I assume
that 100% of it is used for a disk cache and don't use a multiplier.
The 8192, however, is the size of a PG page, so if you tweak PG's page
size, you have to change this constant (*grumbles*).
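[Editorial sketch of the arithmetic above, in Python; the hibufspace
figure is only an illustrative value - on FreeBSD you'd obtain the real
one with `sysctl -n vfs.hibufspace`.]

```python
# Example: turning vfs.hibufspace (bytes) into effective_cache_size,
# expressed in PostgreSQL pages.
hibufspace = 186646528   # example value, in bytes
pg_page = 8192           # PG's page size; change if PG was compiled differently
effective_cache_size = hibufspace // pg_page
print(f"effective_cache_size = {effective_cache_size}")  # -> effective_cache_size = 22784
```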

-sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] sql performance and cache

2003-10-11 Thread Christopher Kings-Lynne

PostgreSQL does not have, and has never had a query cache - so nothing 
you do is going to make that second query faster.
Let me clarify that.  PostgreSQL will of course cache the disk pages 
used in getting the data for your query, which is why the second time 
you run it, it is 3 seconds faster.

However, it does not cache the _results_ of the query.  Each time you 
run it, it will be fully re-evaluated.

The btree error you give is bad and I'm sure the more experienced list 
members will want you to dig into it for them.

Chris





Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-11 Thread Ron Johnson
On Sat, 2003-10-11 at 05:22, Harald Fuchs wrote:
> In article <[EMAIL PROTECTED]>,
> Rod Taylor <[EMAIL PROTECTED]> writes:
> 
> >> Would anyone object to a patch that exports the blocksize via a
> >> readonly GUC?  Too many tunables are page dependant, which is
> >> infuriating when copying configs from DB to DB.  I wish pgsql had some
> >> notion of percentages for values that end with a '%'.
> 
> > Rather than showing the block size, how about we change the tunables to
> > be physical sizes rather than block based?
> 
> > effective_cache_size = 1.5GB
> > shared_buffers = 25MB
> 
> Amen!  Being forced to set config values in some obscure units rather
> than bytes is an ugly braindamage which should be easy to fix.

But it's too user-friendly to do it this way!

-- 
-
Ron Johnson, Jr. [EMAIL PROTECTED]
Jefferson, LA USA

When Swedes start committing terrorism, I'll become suspicious of
Scandinavians.




Re: [PERFORM] [SQL] sql performance and cache

2003-10-11 Thread Christopher Kings-Lynne

>  Nested Loop  (cost=0.00..147.11 rows=1 width=148) (actual
> time=84.00..12323.00 rows=67 loops=1)

The planner estimate doesn't seem to match reality in that particular
step.  Are you sure you've run:

ANALYZE oscar_node;
ANALYZE oscar_point;

And you could even run VACUUM FULL on them just to make sure.

Does that make any difference?

Chris





[PERFORM] sql performance and cache

2003-10-11 Thread Chris Faulkner
Hello all

I have two very similar queries which I need to execute. They both have
exactly the same from / where conditions. When I execute the first, it takes
about 16 seconds. The second is executed almost immediately after, it takes
13 seconds. In short, I'd like to know why the query result isn't being
cached and any ideas on how to improve the execution.

The first query attempts to find the maximum size of an array in the result
set - the field is called "level". It contains anything between 1 and 10
integers; I just need to know what the largest size is.

"max(replace(split_part(array_dims(level),':',2),']','')::int)"

I know this is big and ugly but is there any better way of doing it ?
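[For reference, here is what that expression computes, sketched in Python
with hypothetical array_dims() strings such as '[1:7]': split_part(...,':',2)
takes the part after the colon, replace() strips the ']', and the cast yields
the upper bound. If your PostgreSQL version has array_upper(), that may be a
tidier route - worth checking.]

```python
# What the SQL expression does, per row, on hypothetical array_dims()
# output like '[1:7]' (a one-dimensional array with upper bound 7).
def upper_bound(dims: str) -> int:
    # split_part(dims, ':', 2) -> '7]';  replace(']', '') -> '7';  ::int -> 7
    return int(dims.split(':')[1].replace(']', ''))

dims_per_row = ['[1:3]', '[1:7]', '[1:10]']       # hypothetical result set
print(max(upper_bound(d) for d in dims_per_row))  # -> 10
```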

The second query just returns the result set - it has exactly the same
FROM/Where clause.

OK - so I could execute the query once, and get the maximum size of the
array and the result set in one. I know what I am doing is less than optimal,
but I had expected the query results to be cached, so the second execution
would be very quick. So why isn't it? I have increased my cache size -
shared_buffers is 2000 and I have doubled the default max_fsm... settings
(although I am not sure what they do). sort_mem is 8192.

The from / where is

FROM oscar_node N, oscar_point P
where N."GEOM_ID_OF_POINT" = P."POINT_ID"
and N."TILE_REF" = P."TILE_REF"
and N."TILE_REF" in ('TQ27NE','TQ28SE','TQ37NW','TQ38SW')
and P."TILE_REF" in ('TQ27NE','TQ28SE','TQ37NW','TQ38SW')
and P."FEAT_CODE" = 3500
and P.wkb_geometry && GeometryFromText('BOX3D(529540.0 179658.88,530540.0
180307.12)'::box3d,-1)

oscar_node and oscar_point both have about 3m rows. PK on oscar_node is
composite of "TILE_REF" and "NODE_ID". PK on oscar_point is "TILE_REF" and
"POINT_ID". The tables are indexed on feat_code and I have an index on
wkb_geometry. (This is a GIST index). I have increased the statistics size
and done the analyze command.

Here is my explain plan

 Nested Loop  (cost=0.00..147.11 rows=1 width=148)
   Join Filter: ("inner"."GEOM_ID_OF_POINT" = "outer"."POINT_ID")
   ->  Index Scan using gidx_oscar_point on oscar_point p  (cost=0.00..61.34
rows=1 width=57)
 Index Cond: (wkb_geometry && 'SRID=-1;BOX3D(529540 179658.88
0,530540 180307.12 0)'::geometry)
 Filter: ((("TILE_REF" = 'TQ27NE'::bpchar) OR ("TILE_REF" =
'TQ28SE'::bpchar) OR ("TILE_REF" = 'TQ37NW'::bpchar) OR ("TILE_REF" =
'TQ38SW'::bpchar)) AND ("FEAT_CODE" = 3500))
   ->  Index Scan using idx_on_tile_ref on oscar_node n  (cost=0.00..85.74
rows=2 width=91)
 Index Cond: (n."TILE_REF" = "outer"."TILE_REF")
 Filter: (("TILE_REF" = 'TQ27NE'::bpchar) OR ("TILE_REF" =
'TQ28SE'::bpchar) OR ("TILE_REF" = 'TQ37NW'::bpchar) OR ("TILE_REF" =
'TQ38SW'::bpchar))


I am seeing this message in my logs.

"bt_fixroot: not valid old root page"

Maybe this is relevant to my performance problems.

I know this has been a long message but I would really appreciate any
performance tips.

Thanks


Chris





Re: [PERFORM] sql performance and cache

2003-10-11 Thread Christopher Kings-Lynne

> I have two very similar queries which I need to execute. They both have
> exactly the same from / where conditions. When I execute the first, it takes
> about 16 seconds. The second is executed almost immediately after, it takes
> 13 seconds. In short, I'd like to know why the query result isn't being
> cached and any ideas on how to improve the execution.
>
> OK - so I could execute the query once, and get the maximum size of the
> array and the result set in one. I know what I am doing is less than optimal
> but I had expected the query results to be cached. So the second execution
> would be very quick. So why aren't they ? I have increased my cache size -
> shared_buffers is 2000 and I have doubled the default max_fsm... settings
> (although I am not sure what they do). sort_mem is 8192.

PostgreSQL does not have, and has never had a query cache - so nothing 
you do is going to make that second query faster.

Perhaps you are confusing it with the MySQL query cache?

Chris



Re: [PERFORM] [SQL] sql performance and cache

2003-10-11 Thread Richard Huxton
On Saturday 11 October 2003 10:43, Chris Faulkner wrote:
> Hello all
>
> I have two very similar queries which I need to execute. They both have
> exactly the same from / where conditions. When I execute the first, it
> takes about 16 seconds. The second is executed almost immediately after, it
> takes 13 seconds. In short, I'd like to know why the query result isn't
> being cached and any ideas on how to improve the execution.

The short answer is that PG doesn't cache query results. The only way it could 
do so safely is to lock all tables you access to make sure that no other 
process changes them. That would effectively turn PG into a single-user DB in 
short notice.

> The first query attempts to find the maximum size of an array in the result
> set- the field is called "level". IT contains anything between 1 and 10
> integers. I just need to know what the largest size is. I do this to find
> out the maximum size of the "level" array.
>
> "max(replace(split_part(array_dims(level),':',2),']','')::int)"
>
> I know this is big and ugly but is there any better way of doing it ?
>
> The second query just returns the result set - it has exactly the same
> FROM/Where clause.

I assume these two queries are linked? If you rely on the max size being 
unchanged and have more than one process using the database, you should make 
sure you lock the rows in question.

> OK - so I could execute the query once, and get the maximum size of the
> array and the result set in one. I know what I am doing is less than
> optimal but I had expected the query results to be cached. So the second
> execution would be very quick. So why aren't they ? I have increased my
> cache size - shared_buffers is 2000 and I have doubled the default
> max_fsm... settings (although I am not sure what they do). sort_mem is
> 8192.

PG will cache the underlying data, but not the results. The values you are 
changing are used to hold table/index rows etc. This means the second query 
shouldn't need to access the disk if the rows it requires are cached.

There is a discussion of the postgresql.conf file and how to tune it at:
  http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

Given the explain attached, 16 secs seems slow. Could you post an EXPLAIN 
ANALYSE of either/both queries to the performance list? I'd drop the sql list 
when we're just talking about performance.
-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] [SQL] sql performance and cache

2003-10-11 Thread Chris Faulkner
Hello

Thanks for the reply.

> The short answer is that PG doesn't cache query results. The only
> way it could
> do so safely is to lock all tables you access to make sure that no other
> process changes them. That would effectively turn PG into a
> single-user DB in
> short notice.

I am not sure I agree with you. I have done similar things with Oracle and
found that the second query will execute much more quickly than the first.
It could be made to work in at least two scenarios:

- as a user/application perspective - you accept that the result might not
be up-to-date and take what comes back. This would be acceptable in my case
because I know that the tables will not change.
OR
- the database could cache the result set. If some of the data is changed by
another query or session, then the database flushes the result set out of
the cache.

> I assume these two queries are linked? If you rely on the max size being
> unchanged and have more than one process using the database, you
> should make
> sure you lock the rows in question.

I can rely on the max size remaining the same. As I mentioned above, the
tables are entirely read only. The data will not be updated or deleted by
anyone - I don't need to worry about that. The data will be updated en masse
once every 3 months.

> There is a discussion of the postgresql.conf file and how to tune it at:
>   http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

Thanks for that.

> Given the explain attached, 16 secs seems slow. Could you post an EXPLAIN
> ANALYSE of either/both queries to the performance list. I'd drop
> the sql list
> when we're just talking about performance.

To be honest, my main concern was about the cache. If the second one could
use a cache and execute in 2 seconds, that would be better than reducing the
execution of each individual query by 30% or so.

Thanks for the offer of help on this one. explain analyze gives me the same
as the last message - did you want verbose ?

 Nested Loop  (cost=0.00..147.11 rows=1 width=148) (actual time=84.00..12323.00 rows=67 loops=1)
   Join Filter: ("inner"."GEOM_ID_OF_POINT" = "outer"."POINT_ID")
   ->  Index Scan using gidx_oscar_point on oscar_point p  (cost=0.00..61.34 rows=1 width=57) (actual time=0.00..9.00 rows=67 loops=1)
         Index Cond: (wkb_geometry && 'SRID=-1;BOX3D(529540 179658.88 0,530540 180307.12 0)'::geometry)
         Filter: ((("TILE_REF" = 'TQ27NE'::bpchar) OR ("TILE_REF" = 'TQ28SE'::bpchar) OR ("TILE_REF" = 'TQ37NW'::bpchar) OR ("TILE_REF" = 'TQ38SW'::bpchar)) AND ("FEAT_CODE" = 3500))
   ->  Index Scan using idx_on_tile_ref on oscar_node n  (cost=0.00..85.74 rows=2 width=91) (actual time=0.06..150.07 rows=4797 loops=67)
         Index Cond: (n."TILE_REF" = "outer"."TILE_REF")
         Filter: (("TILE_REF" = 'TQ27NE'::bpchar) OR ("TILE_REF" = 'TQ28SE'::bpchar) OR ("TILE_REF" = 'TQ37NW'::bpchar) OR ("TILE_REF" = 'TQ38SW'::bpchar))
 Total runtime: 12325.00 msec
(9 rows)

Thanks


Chris







Re: [PERFORM] [SQL] sql performance and cache

2003-10-11 Thread Richard Huxton
On Saturday 11 October 2003 12:12, Chris Faulkner wrote:
> Hello
>
> Thanks for the reply.
>
> > The short answer is that PG doesn't cache query results. The only
> > way it could
> > do so safely is to lock all tables you access to make sure that no other
> > process changes them. That would effectively turn PG into a
> > single-user DB in
> > short notice.
>
> I am not sure I agree with you. I have done similar things with Oracle and
> found that the second query will execute much more quickly than the first.
> It could be made to work in at least two scenarios

I'm guessing that's because the underlying rows, and perhaps the plan, are
cached, rather than the results. If you cached the results of the first query
you'd only have the max length, not your other data anyway.

[snip]

> > I assume these two queries are linked? If you rely on the max size being
> > unchanged and have more than one process using the database, you
> > should make
> > sure you lock the rows in question.
>
> I can rely on the max size remaining the same. As I mentioned above, the
> tables are entirely read only. The data will not be updated or deleted by
> anyone - I don't need to worry about that. The data will be updated en
> masse once every 3 months.

Hmm - might be worth adding a column for your array length and pre-calculating 
if your data is basically static.

> > There is a discussion of the postgresql.conf file and how to tune it at:
> >   http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
>
> Thanks for that.
>
> > Given the explain attached, 16 secs seems slow. Could you post an EXPLAIN
> > ANALYSE of either/both queries to the performance list. I'd drop
> > the sql list
> > when we're just talking about performance.
>
> To be honest, my main concern was about the cache. If the second one could
> use a cache amd execute in 2 seconds, that would be better that reducing
> the execution of each individual query by 30% or so.

I'm puzzled as to why they aren't both below 2 seconds to start with - you're 
not dealing with that many rows.

> Thanks for the offer of help on this one. explain analyze gives me the same
> as the last message - did you want verbose ?

Nope, this is what I need. Verbose prints pages of stuff that only the 
developers would be interested in. This one actually runs the query and gives 
you a second set of figures showing times.

>  Nested Loop  (cost=0.00..147.11 rows=1 width=148) (actual time=84.00..12323.00 rows=67 loops=1)
>    Join Filter: ("inner"."GEOM_ID_OF_POINT" = "outer"."POINT_ID")
>    ->  Index Scan using gidx_oscar_point on oscar_point p  (cost=0.00..61.34 rows=1 width=57) (actual time=0.00..9.00 rows=67 loops=1)
>          Index Cond: (wkb_geometry && 'SRID=-1;BOX3D(529540 179658.88 0,530540 180307.12 0)'::geometry)
>          Filter: ((("TILE_REF" = 'TQ27NE'::bpchar) OR ("TILE_REF" = 'TQ28SE'::bpchar) OR ("TILE_REF" = 'TQ37NW'::bpchar) OR ("TILE_REF" = 'TQ38SW'::bpchar)) AND ("FEAT_CODE" = 3500))

This next bit is the issue. It's joining on TILE_REF and then filtering by 
your four static values. That's taking 67 * 150ms = 10.05 secs

>    ->  Index Scan using idx_on_tile_ref on oscar_node n  (cost=0.00..85.74 rows=2 width=91) (actual time=0.06..150.07 rows=4797 loops=67)
>          Index Cond: (n."TILE_REF" = "outer"."TILE_REF")
>          Filter: (("TILE_REF" = 'TQ27NE'::bpchar) OR ("TILE_REF" = 'TQ28SE'::bpchar) OR ("TILE_REF" = 'TQ37NW'::bpchar) OR ("TILE_REF" = 'TQ38SW'::bpchar))

Now if you look at the first set of figures, it's estimating 2 rows rather 
than the 4797 you're actually getting. That's probably why it's chosen to 
join then filter rather than the other way around.
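[Spelling out that arithmetic from the EXPLAIN ANALYZE figures: the inner
index scan averages ~150 ms per iteration and runs once per outer row
(loops=67), which accounts for most of the 12.3-second runtime:]

```python
# Numbers taken from the EXPLAIN ANALYZE output above.
per_loop_ms = 150.07   # actual time per iteration of the inner index scan
loops = 67             # one iteration per outer row
inner_total_s = per_loop_ms * loops / 1000.0
print(round(inner_total_s, 2))   # -> 10.05 (of the 12.325 s total runtime)
```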

I'd suggest the following:
1. VACUUM FULL on the table in question if you haven't done so since the last 
update/reload. If you aren't doing this after every bulk upload, you probably 
should be.
2. VACUUM ANALYSE/ANALYSE the table.
3. Check the tuning document I mentioned and make sure your settings are at 
least reasonable. They don't have to be perfect - that last 10% takes 
forever, but if they are badly wrong it can cripple you.
4. PG should now have up-to-date stats and a reasonable set of config 
settings. If it's still getting its row estimates wrong, we'll have to look 
at the statistics it's got.

If we reach the statistics tinkering stage, it might be better to wait til 
Monday if you can - more people on the list then.
-- 
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-11 Thread Nick Barr
Josh Berkus wrote:

> > shared_buffers = 1/16th of total memory
> > effective_cache_size = 80% of the supposed kernel cache.
>
> But only if it's a dedicated DB machine.  If it's not, all memory values
> should be cut in half.
>
> What I would prefer would be an interactive script which would, by asking the
> user simple questions and system scanning, collect all the information
> necessary to set:
>
> max_connections
> shared_buffers
> sort_mem
> vacuum_mem
> effective_cache_size
> random_page_cost
> max_fsm_pages
> checkpoint_segments & checkpoint_timeout
> tcp_ip
>
> and on the OS, it should set:
> shmmax & shmmall
>
> and should offer to create a cron job which does appropriate frequency VACUUM
> ANALYZE.

I reckon do a system scan first, and parse the current PostgreSQL conf 
file to figure out what the settings are. Also back it up with a date 
and time appended to the end to make sure there is a backup before 
overwriting the real conf file. Then a bunch of questions. What sort of 
questions would need to be asked and which parameters would these 
questions affect? So far, and from my limited understanding of the .conf 
file, I reckon there should be the following:

Here is your hardware configuration as detected. Is this correct?

   This could potentially be several questions, i.e. one for proc, mem, 
os, hdd etc
   Would affect shared_buffers, sort_mem, effective_cache_size, 
random_page_cost

How was PostgreSQL compiled?

   This would be parameters such as the block size and a few other 
compile-time parameters. If we can get at some of these read-only 
parameters then that would make this step easier, certainly for the new 
recruits amongst us.

Is PostgreSQL the only thing being run on this computer?

   Then my previous assumptions about shared_buffers and 
effective_cache_size would be true.

If shmmax and shmmall are too small, then:

PostgreSQL requires some more shared memory to cache some tables, x Mb, 
do you want to increase your OS kernel parameters?

   Tweak shmmax and shmmall

How are the clients going to connect?

   i.e. TCP or Unix sockets

How many clients can connect to this database at once?

   Affects max_connections

How many databases and how many tables in each database are going to be 
present?

   Affects max_fsm_pages, checkpoint_segments, checkpoint_timeout

Do you want to vacuum your database regularly?

   Initial question for cron job

It is recommended that you vacuum analyze every night; do you want to do 
this?
It is also recommended that you vacuum full every month; do you want to 
do this?
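[Editorial sketch of the calculation step of such a script. The 1/16th and
80% heuristics are the ones quoted earlier in the thread; detection of total
RAM and kernel cache size is platform-specific and deliberately stubbed out
here.]

```python
PG_BLOCK = 8192   # bytes; must match PostgreSQL's compiled block size

def suggest(total_mem, kernel_cache, dedicated=True):
    """Rule-of-thumb settings, in PG's native 8 KB-page units."""
    shared = total_mem // 16             # 1/16th of total memory
    cache = int(kernel_cache * 0.8)      # 80% of the kernel cache
    if not dedicated:                    # shared box: halve all memory values
        shared //= 2
        cache //= 2
    return {
        'shared_buffers': shared // PG_BLOCK,
        'effective_cache_size': cache // PG_BLOCK,
    }

# e.g. a 1 GB box with an 800 MB kernel cache, dedicated to PostgreSQL:
print(suggest(1024**3, 800 * 1024**2))
# -> {'shared_buffers': 8192, 'effective_cache_size': 81920}
```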



Thoughts?

Nick



[PERFORM] Another weird one with an UPDATE

2003-10-11 Thread David Griffiths




I am running an update-query to benchmark various databases; the
postgres version is:

UPDATE user_account SET last_name = 'abc'
WHERE user_account_id IN
  (SELECT user_account_id FROM commercial_entity, commercial_service
   WHERE yw_account_id IS NULL
   AND commercial_entity.commercial_entity_id =
       commercial_service.commercial_entity_id);

The inner query (the select), run by itself, takes about a second. Add
the outer query (the update-portion), and the query dies. The machine
has been vacuum-analyzed. Here is the explain-analyze:

benchtest=# EXPLAIN ANALYZE UPDATE user_account SET last_name = 'abc'
benchtest-# WHERE user_account_id IN (SELECT user_account_id FROM commercial_entity, commercial_service WHERE yw_account_id IS NULL
benchtest(# AND commercial_entity.commercial_entity_id = commercial_service.commercial_entity_id);

 Seq Scan on user_account  (cost=0.00..813608944.88 rows=36242 width=718) (actual time=15696258.98..16311130.29 rows=3075 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=11224.77..11224.77 rows=86952 width=36) (actual time=0.06..106.40 rows=84831 loops=72483)
           ->  Merge Join  (cost=0.00..11224.77 rows=86952 width=36) (actual time=0.21..1845.13 rows=85158 loops=1)
                 Merge Cond: ("outer".commercial_entity_id = "inner".commercial_entity_id)
                 ->  Index Scan using commercial_entity_pkey on commercial_entity  (cost=0.00..6787.27 rows=77862 width=24) (actual time=0.06..469.56 rows=78132 loops=1)
                       Filter: (yw_account_id IS NULL)
                 ->  Index Scan using comm_serv_comm_ent_id_i on commercial_service  (cost=0.00..2952.42 rows=88038 width=12) (actual time=0.03..444.80 rows=88038 loops=1)
 Total runtime: 16332976.21 msec
(10 rows)

Here are the relevant parts of the schema:

USER_ACCOUNT

     Column      |          Type          | Modifiers
-----------------+------------------------+-----------
 user_account_id | numeric(10,0)          | not null
 first_name      | character varying(100) |
 last_name       | character varying(100) |
Indexes: user_account_pkey primary key btree (user_account_id),
         usr_acc_last_name_i btree (last_name)
Foreign Key constraints:
    $1 FOREIGN KEY (lang_id) REFERENCES lang(lang_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
    $2 FOREIGN KEY (source_id) REFERENCES source(source_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
    $3 FOREIGN KEY (user_role_id) REFERENCES user_role(user_role_id) ON UPDATE NO ACTION ON DELETE NO ACTION

COMMERCIAL_ENTITY

        Column        |     Type      | Modifiers
----------------------+---------------+-----------
 commercial_entity_id | numeric(10,0) | not null
 yw_account_id        | numeric(10,0) |
Indexes: commercial_entity_pkey primary key btree (commercial_entity_id),
         comm_ent_yw_acc_id_i btree (yw_account_id)
Foreign Key constraints:
    $1 FOREIGN KEY (source_id) REFERENCES source(source_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
    $2 FOREIGN KEY (user_account_id) REFERENCES user_account(user_account_id) ON UPDATE NO ACTION ON DELETE NO ACTION

COMMERCIAL_SERVICE

        Column        |     Type      | Modifiers
----------------------+---------------+-----------
 commercial_entity_id | numeric(10,0) | not null
 service_type_id      | numeric(10,0) | not null
 source_id            | numeric(10,0) | not null
Indexes: commercial_service_pkey primary key btree (commercial_entity_id, service_type_id),
         comm_serv_comm_ent_id_i btree (commercial_entity_id),
         comm_serv_serv_type_id_i btree (service_type_id),
         comm_serv_source_id_i btree (source_id)
Foreign Key constraints:
    $1 FOREIGN KEY (commercial_entity_id) REFERENCES commercial_entity(commercial_entity_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
    $2 FOREIGN KEY (source_id) REFERENCES source(source_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
    $3 FOREIGN KEY (service_type_id) REFERENCES service_type(service_type_id) ON UPDATE NO ACTION ON DELETE NO ACTION

Here is the postgres.conf (or the variables that are not commented out):

tcpip_socket = true
max_connections = 500
shared_buffers = 32768        # min max_connections*2 or 16, 8KB each
wal_buffers = 128             # min 4, typically 8KB each
sort_mem = 4096               # min 64, size in KB
effective_cache_size = 5    # typically 8KB each

Is it a problem with "IN"?

David


Re: [PERFORM] Another weird one with an UPDATE

2003-10-11 Thread David Griffiths



Sorry - just found the FAQ
(http://jamesthornton.com/postgres/FAQ/faq-english.html#4.22) on how IN
is very slow.

So I rewrote the query:

\o ./data/temp.txt
SELECT current_timestamp;
UPDATE user_account SET last_name = 'abc'
WHERE EXISTS (SELECT ua.user_account_id FROM user_account ua,
commercial_entity ce, commercial_service cs
WHERE ua.user_account_id = ce.user_account_id AND
ce.commercial_entity_id = cs.commercial_entity_id);
SELECT current_timestamp;
\o

EXISTS is kind of a weird statement, and it doesn't appear to be
identical (the number of rows updated was 72,000 rather than 3500). It
also took 4 minutes to execute.

Is there any way around this other than breaking the query into two? As in:

pstmt1 = conn.prepareStatement("SELECT ua.user_account_id FROM
user_account ua, commercial_entity ce, commercial_service cs WHERE
ua.user_account_id = ce.user_account_id AND ce.commercial_entity_id =
cs.commercial_entity_id");
rset = pstmt1.executeQuery();
while (rset.next())
{
    pstmt2 = conn.prepareStatement("UPDATE user_account SET last_name =
'abc' WHERE user_account_id = ?");
    pstmt2.setLong(1, rset.getLong(1));
    ...
}

Unfortunately, that will be a lot of data moved from
Postgres->middle-tier (Weblogic/Resin), which is inefficient.

Anyone see another solution?

David.

  - Original Message -
  From: David Griffiths
  To: [EMAIL PROTECTED]
  Sent: Saturday, October 11, 2003 12:44 PM
  Subject: [PERFORM] Another weird one with an UPDATE

  [snip]

Re: [PERFORM] Another weird one with an UPDATE

2003-10-11 Thread Stephan Szabo
On Sat, 11 Oct 2003, David Griffiths wrote:

> Sorry - just found the FAQ (
> http://jamesthornton.com/postgres/FAQ/faq-english.html#4.22
>  ) on how
> IN is very slow.
>
> So I rewrote the query:
>
> \o ./data/temp.txt
> SELECT current_timestamp;
> UPDATE user_account SET last_name = 'abc'
> WHERE EXISTS (SELECT ua.user_account_id FROM user_account ua,
> commercial_entity ce, commercial_service cs
> WHERE ua.user_account_id = ce.user_account_id AND
> ce.commercial_entity_id = cs.commercial_entity_id);
> SELECT current_timestamp;

I don't think that's the query you want.  You're not binding the subselect
to the outer values of user_account.

I think you want something like:
UPDATE user_account SET last_name = 'abc'
 WHERE EXISTS (SELECT 1 FROM commercial_entity ce, commercial_service cs
 WHERE user_account.user_account_id = ce.user_account_id AND
 ce.commercial_entity_id = cs.commercial_entity_id);
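[The difference in semantics, sketched with hypothetical in-memory data:
the first form's subquery never references the outer row, so once it is
non-empty every row matches, while the correlated form is re-evaluated
for each outer row.]

```python
# Hypothetical data standing in for the tables.
user_account_ids = [1, 2, 3]
linked_ids = {1}   # user_account ids reachable via commercial_entity/service

# Uncorrelated EXISTS: the subquery ignores the outer row, so if it
# returns anything at all, *every* user_account row is updated.
updated_uncorrelated = [ua for ua in user_account_ids if len(linked_ids) > 0]

# Correlated EXISTS: re-evaluated against each outer row.
updated_correlated = [ua for ua in user_account_ids if ua in linked_ids]

print(len(updated_uncorrelated), len(updated_correlated))  # -> 3 1
```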



Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-11 Thread Christopher Kings-Lynne

> If shmmax and shmmall are too small, then:
>
> PostgreSQL requires some more shared memory to cache some tables, x Mb,
> do you want to increase your OS kernel parameters?
>
>    Tweak shmmax and shmmall

Note that this still requires a kernel recompile on FreeBSD :(

Chris



Re: [PERFORM] Another weird one with an UPDATE

2003-10-11 Thread David Griffiths
Thanks - that worked.

David
- Original Message -
From: "Stephan Szabo" <[EMAIL PROTECTED]>
To: "David Griffiths" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Saturday, October 11, 2003 3:34 PM
Subject: Re: [PERFORM] Another weird one with an UPDATE


> On Sat, 11 Oct 2003, David Griffiths wrote:
>
> > Sorry - just found the FAQ (
> > http://jamesthornton.com/postgres/FAQ/faq-english.html#4.22
> >  ) on how
> > IN is very slow.
> >
> > So I rewrote the query:
> >
> > \o ./data/temp.txt
> > SELECT current_timestamp;
> > UPDATE user_account SET last_name = 'abc'
> > WHERE EXISTS (SELECT ua.user_account_id FROM user_account ua,
> > commercial_entity ce, commercial_service cs
> > WHERE ua.user_account_id = ce.user_account_id AND
> > ce.commercial_entity_id = cs.commercial_entity_id);
> > SELECT current_timestamp;
>
> I don't think that's the query you want.  You're not binding the subselect
> to the outer values of user_account.
>
> I think you want something like:
> UPDATE user_account SET last_name = 'abc'
>  WHERE EXISTS (SELECT 1 FROM commercial_entity ce, commercial_service cs
>  WHERE user_account.user_account_id = ce.user_account_id AND
>  ce.commercial_entity_id = cs.commercial_entity_id);
