[PERFORM] Update performance ... is 200,000 updates per hour what I should expect?

2003-12-02 Thread Erik Norvelle
Folks:

I'm running a query which is designed to generate a foreign key for a table of approx. 10 million records (I've mentioned this in an earlier posting).  The table is called indethom, and each row contains a single word from the works of St. Thomas Aquinas, along with grammatical data about the word form, and (most importantly for my current problem) a set of columns identifying the particular work/section/paragraph that the word appears in.

This database is completely non-normalized, and I'm working on performing some basic normalization, beginning with creating a table called s2.sectiones which (naturally) contains a complete listing of all of the sections of all the works of St. Thomas.  I will then eliminate this information from the original indethom table, replacing it with the foreign key I am currently generating.
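For illustration, here is roughly the kind of statement that can populate such a section table from indethom (just a sketch; the sequence name s2.sectiones_clavis_seq is only a placeholder):

create sequence s2.sectiones_clavis_seq;   -- placeholder name

insert into s2.sectiones
    (clavis, nomeoper, refere1a, refere1b, refere2a, refere2b,
     refere3a, refere3b, refere4a, refere4b)
select nextval('s2.sectiones_clavis_seq'),
       nomeoper, refere1a, refere1b, refere2a, refere2b,
       refere3a, refere3b, refere4a, refere4b
  from (select distinct nomeoper, refere1a, refere1b, refere2a, refere2b,
                        refere3a, refere3b, refere4a, refere4b
          from indethom) as distinct_sections;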

** My question has to do with whether or not I am getting maximal speed out of PostgreSQL, or whether I need to perform further optimizations.  I am currently getting about 200,000 updates per hour, and updating the entire 10 million rows thus requires 50 hours, which seems a bit much.

Here's the query I am running: 
update indethom
   set query_counter = nextval('s2.query_counter_seq'),   -- Just for keeping track of how fast the query is running
       sectref = (select clavis from s2.sectiones
                   where s2.sectiones.nomeoper = indethom.nomeoper
                     and s2.sectiones.refere1a = indethom.refere1a and s2.sectiones.refere1b = indethom.refere1b
                     and s2.sectiones.refere2a = indethom.refere2a and s2.sectiones.refere2b = indethom.refere2b
                     and s2.sectiones.refere3a = indethom.refere3a and s2.sectiones.refere3b = indethom.refere3b
                     and s2.sectiones.refere4a = indethom.refere4a and s2.sectiones.refere4b = indethom.refere4b);

Here's the query plan:
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Seq Scan on indethom  (cost=0.00..1310352.72 rows=10631972 width=212)
   SubPlan
     ->  Index Scan using sectiones_ndx on sectiones  (cost=0.00..6.03 rows=1 width=4)
           Index Cond: ((nomeoper = $0) AND (refere1a = $1) AND (refere1b = $2) AND (refere2a = $3) AND (refere2b = $4) AND (refere3a = $5) AND (refere3b = $6) AND (refere4a = $7) AND (refere4b = $8))
(4 rows)

Note:  I have just performed a VACUUM ANALYZE on the indethom table, as suggested on this list.

Here's the structure of the s2.sectiones table:
it=> \d s2.sectiones
    Table "s2.sectiones"
  Column  |     Type     | Modifiers
----------+--------------+-----------
nomeoper | character(3) | 
refere1a | character(2) | 
refere1b | character(2) | 
refere2a | character(2) | 
refere2b | character(2) | 
refere3a | character(2) | 
refere3b | character(2) | 
refere4a | character(2) | 
refere4b | character(2) | 
clavis   | integer  | 
Indexes: sectiones_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b, refere4a, refere4b)

Finally, here is the structure of indethom (some non-relevant columns not shown):
it=> \d indethom
      Table "public.indethom"
   Column     |         Type          | Modifiers
--------------+-----------------------+-----------
numeoper  | smallint  | not null
nomeoper  | character(3)  | not null
editcrit  | character(1)  | 
refere1a  | character(2)  | 
refere1b  | character(2)  | 
refere2a  | character(2)  | 
refere2b  | character(2)  | 
refere3a  | character(2)  | 
refere3b  | character(2)  | 
refere4a  | character(2)  | 
refere4b  | character(2)  | 
refere5a  | character(2)  | not null
refere5b  | smallint  | not null
referen6  | smallint  | not null
... several columns skipped ...
verbum| character varying(22) | not null
... other columns skipped ...
poslinop  | integer   | not null
posverli  | smallint  | not null
posverop  | integer   | not null
clavis| integer   | not null
articref  | integer   | 
sectref   | integer   | 
query_counter | integer   | 
Indexes: indethom_pkey primary key btree (clavis),
indethom_articulus_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b),
indethom_sectio_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b, refere4a, refere4b),
verbum_ndx btree (verbum)

Thanks for your assistance!
-Erik Norvelle

Re: [PERFORM] Update performance ... is 200,000 updates per hour

2003-12-02 Thread Stephan Szabo

On Tue, 2 Dec 2003, Erik Norvelle wrote:

 ** My question has to do with whether or not I am getting maximal speed
 out of PostgreSQL, or whether I need to perform further optimizations.
 I am currently getting about 200,000 updates per hour, and updating the
 entire 10 million rows thus requires 50 hours, which seems a bit much.

Well, it doesn't entirely surprise me, given the presumably 10 million
iterations of the index scan that it's doing. Explain analyze output (even
over a subset of the indethom table by adding a where clause) would
probably help to get better info.
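For example, something along these lines (the cutoff on clavis is purely
illustrative, and the begin/rollback is just so the partial update doesn't
stick, since explain analyze actually executes the statement):

begin;
explain analyze
update indethom
   set query_counter = nextval('s2.query_counter_seq'),
       sectref = (select clavis from s2.sectiones
                   where s2.sectiones.nomeoper = indethom.nomeoper
                     and s2.sectiones.refere1a = indethom.refere1a
                     and s2.sectiones.refere1b = indethom.refere1b
                     and s2.sectiones.refere2a = indethom.refere2a
                     and s2.sectiones.refere2b = indethom.refere2b
                     and s2.sectiones.refere3a = indethom.refere3a
                     and s2.sectiones.refere3b = indethom.refere3b
                     and s2.sectiones.refere4a = indethom.refere4a
                     and s2.sectiones.refere4b = indethom.refere4b)
 where indethom.clavis < 100000;   -- illustrative cutoff on the primary key
rollback;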

I'd suggest seeing if something like:
update indethom set query_counter=..., sectref=s.clavis
 FROM s2.sectiones s where
  s.nomeoper = indethom.nomeoper and ...;
tries a join that might give a better plan.


 Here's the query I am running:
 update indethom
   set query_counter = nextval('s2.query_counter_seq'),   -- Just
 for keeping track of how fast the query is running
   sectref = (select clavis from s2.sectiones where
   s2.sectiones.nomeoper = indethom.nomeoper
   and s2.sectiones.refere1a = indethom.refere1a and
 s2.sectiones.refere1b = indethom.refere1b
   and s2.sectiones.refere2a = indethom.refere2a  and
 s2.sectiones.refere2b = indethom.refere2b
   and s2.sectiones.refere3a = indethom.refere3a  and
 s2.sectiones.refere3b = indethom.refere3b
   and s2.sectiones.refere4a = indethom.refere4a and
 s2.sectiones.refere4b = indethom.refere4b);

 Here's the query plan:
                                  QUERY PLAN
 -----------------------------------------------------------------------------
   Seq Scan on indethom  (cost=0.00..1310352.72 rows=10631972 width=212)
     SubPlan
       ->  Index Scan using sectiones_ndx on sectiones  (cost=0.00..6.03
 rows=1 width=4)
             Index Cond: ((nomeoper = $0) AND (refere1a = $1) AND
 (refere1b = $2) AND (refere2a = $3) AND (refere2b = $4) AND (refere3a =
 $5) AND (refere3b = $6) AND (refere4a = $7) AND (refere4b = $8))
 (4 rows)



Re: [PERFORM] Update performance ... is 200,000 updates per hour what I should expect?

2003-12-02 Thread Tom Lane
Erik Norvelle [EMAIL PROTECTED] writes:
 update indethom
   set query_counter = nextval('s2.query_counter_seq'),   -- Just
 for keeping track of how fast the query is running
   sectref = (select clavis from s2.sectiones where
   s2.sectiones.nomeoper = indethom.nomeoper
   and s2.sectiones.refere1a = indethom.refere1a and
 s2.sectiones.refere1b = indethom.refere1b
   and s2.sectiones.refere2a = indethom.refere2a and
 s2.sectiones.refere2b = indethom.refere2b
   and s2.sectiones.refere3a = indethom.refere3a and
 s2.sectiones.refere3b = indethom.refere3b
   and s2.sectiones.refere4a = indethom.refere4a and
 s2.sectiones.refere4b = indethom.refere4b);

This is effectively forcing a nestloop-with-inner-indexscan join.  You
might be better off with

update indethom
set query_counter = nextval('s2.query_counter_seq'),
sectref = sectiones.clavis
from s2.sectiones
where
s2.sectiones.nomeoper = indethom.nomeoper
and s2.sectiones.refere1a = indethom.refere1a and  
s2.sectiones.refere1b = indethom.refere1b
and s2.sectiones.refere2a = indethom.refere2a  and  
s2.sectiones.refere2b = indethom.refere2b
and s2.sectiones.refere3a = indethom.refere3a  and  
s2.sectiones.refere3b = indethom.refere3b
and s2.sectiones.refere4a = indethom.refere4a and  
s2.sectiones.refere4b = indethom.refere4b;

regards, tom lane



Re: [PERFORM] Update performance ... is 200,000 updates per hour what I should expect?

2003-12-02 Thread Greg Stark
Erik Norvelle [EMAIL PROTECTED] writes:

 Here's the query I am running:
 update indethom
   set query_counter = nextval('s2.query_counter_seq'), -- Just for keeping track 
 of how fast the query is running
   sectref = (select clavis from s2.sectiones where
   s2.sectiones.nomeoper = indethom.nomeoper
   and s2.sectiones.refere1a = indethom.refere1a and 
 s2.sectiones.refere1b = indethom.refere1b
   and s2.sectiones.refere2a = indethom.refere2a and 
 s2.sectiones.refere2b = indethom.refere2b
   and s2.sectiones.refere3a = indethom.refere3a and 
 s2.sectiones.refere3b = indethom.refere3b
   and s2.sectiones.refere4a = indethom.refere4a and 
 s2.sectiones.refere4b = indethom.refere4b);
 
 Here's the query plan:
                              QUERY PLAN
 -----------------------------------------------------------------------------
  Seq Scan on indethom (cost=0.00..1310352.72 rows=10631972 width=212)
    SubPlan
      ->  Index Scan using sectiones_ndx on sectiones (cost=0.00..6.03 rows=1 width=4)
            Index Cond: ((nomeoper = $0) AND (refere1a = $1) AND (refere1b = $2) AND
 (refere2a = $3) AND (refere2b = $4) AND (refere3a = $5) AND (refere3b = $6) AND
 (refere4a = $7) AND (refere4b = $8))
 (4 rows)

Firstly, you might try running VACUUM FULL on both tables. If there are tons
of extra dead records left over, they could be slowing down the update.
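That is, something along the lines of:

vacuum full analyze indethom;
vacuum full analyze s2.sectiones;

(Note that vacuum full takes an exclusive lock on each table while it runs.)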

This isn't the fastest possible plan but it's pretty good.

You might be able to get it somewhat faster using the non-standard from
clause on the update statement.

update indethom
   set sectref = clavis
  from sectiones
 where sectiones.nomeoper = indethom.nomeoper
   and sectiones.refere1a = indethom.refere1a
   and sectiones.refere1b = indethom.refere1b
   and sectiones.refere2a = indethom.refere2a
   and sectiones.refere2b = indethom.refere2b
   and sectiones.refere3a = indethom.refere3a
   and sectiones.refere3b = indethom.refere3b
   and sectiones.refere4a = indethom.refere4a
   and sectiones.refere4b = indethom.refere4b

This might be able to use a merge join which will take longer to get started
because it has to sort both tables, but might finish faster.

You might also try paring the index down to just the two or three most
useful columns. Is it common that something matches refere1a and refere1b but
doesn't match the remaining columns? An 8-column index is a lot of overhead. I'm
not sure how much that affects lookup times, but it might be substantial.
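For example (purely a sketch; which columns are worth keeping depends on how
selective they are in your data, and the index name here is made up):

create index sectiones_short_ndx          -- hypothetical name
    on s2.sectiones (nomeoper, refere1a, refere1b);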


-- 
greg




Re: [PERFORM] A question on the query planner

2003-12-02 Thread Robert Treat
On Mon, 2003-12-01 at 16:44, Jared Carr wrote:
 I am currently working on optimizing some fairly time consuming queries 
 on a decently large
 dataset.
 
 The Following is the query in question.
 
 SELECT z.lat, z.lon, z.city, z.state, q.date_time, c.make, c.model, c.year
 FROM quotes AS q, zips AS z, cars AS c
 WHERE
 z.zip = q.zip AND
 c.car_id = q.car_id AND
 z.state != 'AA' AND
 z.state != 'AE' AND
 z.state != 'AP' AND
 z.state = 'WA'
  ORDER BY date_time;
 

This won't completely solve your problem, but z.state = 'WA' would seem
to be mutually exclusive of the != AA|AE|AP conditions.  While it's not much, it is
extra overhead there doesn't seem to be any need for...
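That is, the where clause could presumably be trimmed to just:

SELECT z.lat, z.lon, z.city, z.state, q.date_time, c.make, c.model, c.year
FROM quotes AS q, zips AS z, cars AS c
WHERE
z.zip = q.zip AND
c.car_id = q.car_id AND
z.state = 'WA'
 ORDER BY date_time;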

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL




Re: [PERFORM] A question on the query planner

2003-12-02 Thread Greg Stark
Jared Carr [EMAIL PROTECTED] writes:

 Furthermore I noticed that in the following query plan it is doing the
 sequential scan on quotes first, and then doing the sequential scan on zips. IMHO
 this should be the other way around, since the result set for zips is
 considerably smaller, especially given that we are using a where clause to
 limit the number of items returned from zips, so it would seem that it would
 be faster to scan zips and then join onto quotes, but perhaps it needs to do the
 sequential scan on both regardless.

->  Hash Join  (cost=1088.19..56382.58 rows=4058 width=62) (actual
time=86.111..1834.682 rows=10193 loops=1)
      Hash Cond: (("outer".zip)::text = ("inner".zip)::text)
      ->  Seq Scan on quotes q  (cost=0.00..10664.25 rows=336525 width=27) (actual
 time=0.098..658.905 rows=336963 loops=1)
      ->  Hash  (cost=1086.90..1086.90 rows=516 width=52) (actual time=85.798..85.798
 rows=0 loops=1)
            ->  Seq Scan on zips z  (cost=0.00..1086.90 rows=516 width=52) (actual
 time=79.532..84.151 rows=718 loops=1)
                  Filter: ((state)::text = 'WA'::text)

You're misreading it. A hash join is done by reading one table into a hash
table, then reading the other table and looking up entries in the hash table. The
zips are being read into the hash table, which is appropriate if it's the
smaller table.


 Of course still there is the holy grail of getting it to actually use 
 the indexes. :P

  Merge Cond: ("outer"."?column7?" = "inner"."?column5?")

Well it looks like you have something strange going on. What data type is
car_id in each table? 


-- 
greg




Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Jeremiah Jahn
Thanks to all; I had already run ANALYZE, but the STATISTICS setting
seems to have worked. I'm just not sure what it did. Would anyone care
to explain?


On Mon, 2003-12-01 at 13:47, Josh Berkus wrote:
 Jeremiah,
 
  I've attached the Analyze below. I have no idea why the db thinks there
  is only 1 judge named smith. Is there some way I can inform the DB
  about this? In actuality, there aren't any judges named smith at the
  moment, but there are 22K people named smith.
 
 No, Hannu meant that you may need to run the following command:
 
 ANALYZE actor;
 
 ... to update the database statistics on the actor table.  That is a
 maintenance task that needs to be run periodically.
 
 If that doesn't fix the bad plan, then the granularity of statistics on the 
 full_name column needs updating; I suggest:
 
 ALTER TABLE actor ALTER COLUMN full_name SET STATISTICS 100;
 ANALYZE actor;
 
 And if it's still choosing  a slow nested loop, up the stats to 250.
-- 
Jeremiah Jahn [EMAIL PROTECTED]




Re: [PERFORM] A question on the query planner

2003-12-02 Thread Jared Carr
Greg Stark wrote:

  Merge Cond: ("outer"."?column7?" = "inner"."?column5?")

 Well it looks like you have something strange going on. What data type is
 car_id in each table?

car_id is a varchar(10) in both tables.



Re: [PERFORM] A question on the query planner

2003-12-02 Thread Greg Stark
Jared Carr [EMAIL PROTECTED] writes:

 Greg Stark wrote:

  Merge Cond: ("outer"."?column7?" = "inner"."?column5?")

  Well it looks like you have something strange going on. What data type is
  car_id in each table?

 car_id is a varchar(10) in both tables.

Well, for some reason it's being cast to text to do the merge.

What version of postgres is this, btw? The analyzes look like 7.4?

-- 
greg




Re: [PERFORM] A question on the query planner

2003-12-02 Thread Jared Carr
Greg Stark wrote:

 Jared Carr [EMAIL PROTECTED] writes:

  Greg Stark wrote:

   Merge Cond: ("outer"."?column7?" = "inner"."?column5?")

  Well it looks like you have something strange going on. What data type is
  car_id in each table?

  car_id is a varchar(10) in both tables.

 Well, for some reason it's being cast to text to do the merge.

 What version of postgres is this, btw? The analyzes look like 7.4?

Yes, this is 7.4.



[PERFORM] autovacuum daemon stops doing work after about an hour

2003-12-02 Thread Vivek Khera
I took advantage of last weekend to upgrade from 7.2.4 to 7.4.0 on a
new faster box.

Now I'm trying to implement pg_autovacuum.  It seems to work ok, but
after about an hour or so, it does nothing.  The process still is
running, but nothing is sent to the log file.

I'm running the daemon as distributed with PG 7.4 release as follows:

pg_autovacuum -d4 -V 0.15 -A 1 -U postgres -L /var/tmp/autovacuum.log -D

the last few lines of the log are:

[2003-12-02 11:43:58 AM] VACUUM ANALYZE public.msg_recipients
[2003-12-02 12:24:33 PM] select relfilenode,reltuples,relpages from pg_class where 
relfilenode=18588239
[2003-12-02 12:24:33 PM]   table name: vkmlm.public.msg_recipients
[2003-12-02 12:24:33 PM]  relfilenode: 18588239;   relisshared: 0
[2003-12-02 12:24:33 PM]  reltuples: 9;  relpages: 529132
[2003-12-02 12:24:33 PM]  curr_analyze_count:  1961488; cur_delete_count:   1005040
[2003-12-02 12:24:33 PM]  ins_at_last_analyze: 1961488; del_at_last_vacuum: 1005040
[2003-12-02 12:24:33 PM]  insert_threshold: 509; delete_threshold: 1001
[2003-12-02 12:24:33 PM] Performing: VACUUM ANALYZE public.user_list
[2003-12-02 12:24:33 PM] VACUUM ANALYZE public.user_list
[2003-12-02 12:43:19 PM] select relfilenode,reltuples,relpages from pg_class where 
relfilenode=18588202
[2003-12-02 12:43:19 PM]   table name: vkmlm.public.user_list
[2003-12-02 12:43:19 PM]  relfilenode: 18588202;   relisshared: 0
[2003-12-02 12:43:19 PM]  reltuples: 9;  relpages: 391988
[2003-12-02 12:43:19 PM]  curr_analyze_count:  1159843; cur_delete_count:   1118540
[2003-12-02 12:43:19 PM]  ins_at_last_analyze: 1159843; del_at_last_vacuum: 1118540
[2003-12-02 12:43:19 PM]  insert_threshold: 509; delete_threshold: 1001

Then it just sits there.  I started it at 11:35am, and it is now
3:30pm.

I did the same last night at about 10:58pm, and it ran and did work until
11:57pm, then sat there until I killed/restarted pg_autovacuum this
morning at 11:35.  The process is not using any CPU time.

I just killed/restarted it and it found work to do on my busy tables
which I'd expect.

I'm running Postgres 7.4 release on FreeBSD 4.9-RELEASE.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/



Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Neil Conway
Josh Berkus [EMAIL PROTECTED] writes:
 1) to keep it working, you will probably need to run ANALYZE more
often than you have been;

I'm not sure why this would be the case -- can you elaborate?

 4) Currently, pg_dump does *not* back up statistics settings.

Yes, it does.

-Neil




Re: [PERFORM] A question on the query planner

2003-12-02 Thread Greg Stark

Jared Carr [EMAIL PROTECTED] writes:

 Greg Stark wrote:
 
  Well it looks like you have something strange going on. What data type is
  car_id in each table?
 
 car_id is a varchar(10) in both tables.

Huh. The following shows something strange. It seems joining on two varchars
no longer works well. Instead the optimizer has to convert both columns to
text.

I know some inter-type comparisons were removed a while ago, but I would not
have thought that would affect varchar-varchar comparisons. I think this is
pretty bad.


test=# create table a (x varchar primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "a_pkey" for table "a"
CREATE TABLE
test=# create table b (x varchar primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "b_pkey" for table "b"
CREATE TABLE
test=# select * from a,b where a.x=b.x;
 x | x 
---+---
(0 rows)

test=# explain select * from a,b where a.x=b.x;
                            QUERY PLAN
------------------------------------------------------------------
 Merge Join  (cost=139.66..159.67 rows=1001 width=64)
   Merge Cond: ("outer"."?column2?" = "inner"."?column2?")
   ->  Sort  (cost=69.83..72.33 rows=1000 width=32)
         Sort Key: (a.x)::text
         ->  Seq Scan on a  (cost=0.00..20.00 rows=1000 width=32)
   ->  Sort  (cost=69.83..72.33 rows=1000 width=32)
         Sort Key: (b.x)::text
         ->  Seq Scan on b  (cost=0.00..20.00 rows=1000 width=32)
(8 rows)

test=# create table a2 (x text primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "a2_pkey" for table "a2"
CREATE TABLE
test=# create table b2 (x text primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "b2_pkey" for table "b2"
CREATE TABLE
test=# explain select * from a2,b2 where a2.x=b2.x;
                         QUERY PLAN
-------------------------------------------------------------
 Hash Join  (cost=22.50..57.51 rows=1001 width=64)
   Hash Cond: ("outer".x = "inner".x)
   ->  Seq Scan on a2  (cost=0.00..20.00 rows=1000 width=32)
   ->  Hash  (cost=20.00..20.00 rows=1000 width=32)
         ->  Seq Scan on b2  (cost=0.00..20.00 rows=1000 width=32)
(5 rows)


-- 
greg




Re: [PERFORM] A question on the query planner

2003-12-02 Thread Greg Stark

Greg Stark [EMAIL PROTECTED] writes:

 Huh. The following shows something strange. 

Worse, with enable_hashjoin off it's even more obvious something's broken:


test=# set enable_hashjoin = off;
SET
test=# explain select * from a,b where a.x=b.x;
                            QUERY PLAN
------------------------------------------------------------------
 Merge Join  (cost=139.66..159.67 rows=1001 width=64)
   Merge Cond: ("outer"."?column2?" = "inner"."?column2?")
   ->  Sort  (cost=69.83..72.33 rows=1000 width=32)
         Sort Key: (a.x)::text
         ->  Seq Scan on a  (cost=0.00..20.00 rows=1000 width=32)
   ->  Sort  (cost=69.83..72.33 rows=1000 width=32)
         Sort Key: (b.x)::text
         ->  Seq Scan on b  (cost=0.00..20.00 rows=1000 width=32)
(8 rows)

test=# explain select * from a2,b2 where a2.x=b2.x;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Merge Join  (cost=0.00..63.04 rows=1001 width=64)
   Merge Cond: ("outer".x = "inner".x)
   ->  Index Scan using a2_pkey on a2  (cost=0.00..24.00 rows=1000 width=32)
   ->  Index Scan using b2_pkey on b2  (cost=0.00..24.00 rows=1000 width=32)
(4 rows)

-- 
greg




Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Josh Berkus
Neil,

  1) to keep it working, you will probably need to run ANALYZE more
 often than you have been;
 
 I'm not sure why this would be the case -- can you elaborate?

For the more granular stats to be useful, they have to be accurate; otherwise
you'll go back to a nestloop as soon as the query planner encounters a value
that it doesn't think is in the table at all.
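For example, even a simple nightly cron entry would cover it (the schedule and
database name here are just placeholders):

# re-analyze the actor table every night at 3am (illustrative only)
0 3 * * *   psql -d yourdb -c 'ANALYZE actor;'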

 
  4) Currently, pg_dump does *not* back up statistics settings.
 
 Yes, it does.

Oh, good.  Was this a 7.4 improvement?  I missed that in the changelogs.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] A question on the query planner

2003-12-02 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Huh. The following shows something strange. It seems joining on two varchars
 no longer works well. Instead the optimizer has to convert both columns to
 text.

Define "no longer works well".  varchar doesn't have its own comparison
operators anymore, but AFAIK that makes no difference.

regards, tom lane



Re: [PERFORM] A question on the query planner

2003-12-02 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Define "no longer works well".

 Well it seems to completely bar the use of a straight merge join between two
 index scans:

Hmmm ... [squints] ... it's not supposed to do that ... [digs] ... yeah,
there's something busted here.  Will get back to you ...

regards, tom lane



Re: [PERFORM] Minimum hardware requirements for Postgresql db

2003-12-02 Thread Christopher Browne
After takin a swig o' Arrakan spice grog, [EMAIL PROTECTED] belched out:
 We would be recommending to our client the use of a Postgresql db as
 compared to MS SQL Server. We are targeting Redhat Linux ES
 v2.1, Postgresql v7.3.4 and Postgresql ODBC 07.03.0100.

 We would like to know the minimum specs required for our below
 target. The minimum specs refer to the no. of CPUs, memory,
 harddisk capacity, RAID technology etc., and also the Postgresql
 parameters and configuration to run such a system.

 1) We will be running 2 x Postgresql db  in the machine.

 2) Total number of connections to be around 100. The connections
 from the clients machines will be in ODBC and socket connections.

 3) Estimated number of transactions to be written into the
 Postgresql db is around 15000 records per day.

 The growth rate in terms of number of connections is around 10% per
 year and the data retention is kept on average at least for 18
 months for the 2 databases.

 Are there any reference books or sites that I can tap on for the
 above requirement?

Perhaps the best reference on detailed performance information is the
General Bits documents.

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

These don't point particularly at minimal hardware requirements, but
rather at how to configure the DBMS to best reflect what hardware you
have.  But there's some degree to which you can work backwards...

If you'll need to support 100 concurrent connections, then minimum
shared_buffers is 200, which implies 1600K of RAM required for shared
buffers.
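In postgresql.conf terms that works out to something like this (the values are
only meant to illustrate the arithmetic above, not to be tuned recommendations):

max_connections = 100
shared_buffers = 200        # minimum is 2 x max_connections; 200 x 8kB = 1600kB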

100 connections probably implies around 100MB of memory for the
backend processes to support the connections.

That all points to the notion that you'd more than probably get
half-decent performance if you had a mere 256MB of RAM, which is about
$50 worth these days.

None of it sounds terribly challenging; 15K records per day is 625
records per hour which represents an INSERT every 6 seconds.  Even if
that has to fit into an 8 hour day, that's still not a high number of
transactions per second.  That _sounds like_ an application that could
work on old, obsolete hardware.  I would imagine that my old Intel
Pentium Pro 200 might cope with the load, in much the way that that
server is more than capable of supporting a web server that would
serve a local workgroup.  (I only have 64MB of RAM on that box, which
would be a mite low, but it's an _ancient_ server...)

The only thing that makes me a little suspicious that there's
something funny about the prescription is your indication of having
100 concurrent users, which is really rather heavyweight in comparison
with the comparatively tiny number of transactions.  Is this for some
sort of departmental application?  Where there's a lot of manual
data entry, so that each user would generate a transaction every 3-4
minutes?  That actually sounds about right...

Let me suggest that the cost driver in this will _not_ be the cost
of the hardware to support the database itself; it will instead be in
having redundant hardware and backup hardware to ensure reliability.

It would seem likely that just about any sort of modern hardware would
be pretty adequate to the task.  You can hardly _buy_ a system with
less than Gigahertz-speed CPUs, 40GB of disk, and 256MB of RAM.
Upgrade to have 2 SCSI disks, 512MB (or more, which is better) of RAM,
and the cost of a suitable system still won't be outrageous.

Double it, buying a standby server, and the cost still oughtn't be
real scary.  And if the application is important, you _should_ have a
standby server, irrespective of what software you might be running.
-- 
(reverse (concatenate 'string moc.enworbbc @ enworbbc))
http://www3.sympatico.ca/cbbrowne/x.html
Rules of the Evil Overlord #199. I will not make alliances with those
more powerful than myself. Such a person would only double-cross me in
my moment  of glory.  I will make  alliances with those  less powerful
than myself. I will then  double-cross them in their moment of glory.
http://www.eviloverlord.com/
