[PERFORM] Update performance ... is 200,000 updates per hour what I should expect?

2003-12-02 Thread Erik Norvelle
Folks:

I'm running a query which is designed to generate a foreign key for a table of approx. 10 million records (I've mentioned this in an earlier posting).  The table is called indethom, and each row contains a single word from the works of St. Thomas Aquinas, along with grammatical data about the word form, and (most importantly for my current problem) a set of columns identifying the particular work/section/paragraph that the word appears in.

This database is completely non-normalized, and I'm working on performing some basic normalization, beginning with creating a table called s2.sectiones which (naturally) contains a complete listing of all of the sections of all the works of St. Thomas.  I will then eliminate this information from the original indethom table, replacing it with the foreign key I am currently generating.
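For illustration, here is roughly the kind of statement that can populate such a section table from indethom (just a sketch; the sequence name s2.sectiones_clavis_seq is only a placeholder):

create sequence s2.sectiones_clavis_seq;   -- placeholder name

insert into s2.sectiones
    (clavis, nomeoper, refere1a, refere1b, refere2a, refere2b,
     refere3a, refere3b, refere4a, refere4b)
select nextval('s2.sectiones_clavis_seq'),
       nomeoper, refere1a, refere1b, refere2a, refere2b,
       refere3a, refere3b, refere4a, refere4b
  from (select distinct nomeoper, refere1a, refere1b, refere2a, refere2b,
                        refere3a, refere3b, refere4a, refere4b
          from indethom) as distinct_sections;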

** My question has to do with whether or not I am getting maximal speed out of PostgreSQL, or whether I need to perform further optimizations.  I am currently getting about 200,000 updates per hour, and updating the entire 10 million rows thus requires 50 hours, which seems a bit much.

Here's the query I am running: 
update indethom
   set query_counter = nextval('s2.query_counter_seq'),   -- Just for keeping track of how fast the query is running
       sectref = (select clavis from s2.sectiones
                   where s2.sectiones.nomeoper = indethom.nomeoper
                     and s2.sectiones.refere1a = indethom.refere1a and s2.sectiones.refere1b = indethom.refere1b
                     and s2.sectiones.refere2a = indethom.refere2a and s2.sectiones.refere2b = indethom.refere2b
                     and s2.sectiones.refere3a = indethom.refere3a and s2.sectiones.refere3b = indethom.refere3b
                     and s2.sectiones.refere4a = indethom.refere4a and s2.sectiones.refere4b = indethom.refere4b);

Here's the query plan:
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Seq Scan on indethom  (cost=0.00..1310352.72 rows=10631972 width=212)
   SubPlan
     ->  Index Scan using sectiones_ndx on sectiones  (cost=0.00..6.03 rows=1 width=4)
           Index Cond: ((nomeoper = $0) AND (refere1a = $1) AND (refere1b = $2) AND (refere2a = $3) AND (refere2b = $4) AND (refere3a = $5) AND (refere3b = $6) AND (refere4a = $7) AND (refere4b = $8))
(4 rows)

Note:  I have just performed a VACUUM ANALYZE on the indethom table, as suggested on this list.

Here's the structure of the s2.sectiones table:
it=> \d s2.sectiones
    Table "s2.sectiones"
  Column  |     Type     | Modifiers
----------+--------------+-----------
nomeoper | character(3) | 
refere1a | character(2) | 
refere1b | character(2) | 
refere2a | character(2) | 
refere2b | character(2) | 
refere3a | character(2) | 
refere3b | character(2) | 
refere4a | character(2) | 
refere4b | character(2) | 
clavis   | integer  | 
Indexes: sectiones_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b, refere4a, refere4b)

Finally, here is the structure of indethom (some non-relevant columns not shown):
it=> \d indethom
      Table "public.indethom"
   Column     |         Type          | Modifiers
--------------+-----------------------+-----------
numeoper  | smallint  | not null
nomeoper  | character(3)  | not null
editcrit  | character(1)  | 
refere1a  | character(2)  | 
refere1b  | character(2)  | 
refere2a  | character(2)  | 
refere2b  | character(2)  | 
refere3a  | character(2)  | 
refere3b  | character(2)  | 
refere4a  | character(2)  | 
refere4b  | character(2)  | 
refere5a  | character(2)  | not null
refere5b  | smallint  | not null
referen6  | smallint  | not null
... several columns skipped ...
verbum| character varying(22) | not null
... other columns skipped ...
poslinop  | integer   | not null
posverli  | smallint  | not null
posverop  | integer   | not null
clavis| integer   | not null
articref  | integer   | 
sectref   | integer   | 
query_counter | integer   | 
Indexes: indethom_pkey primary key btree (clavis),
indethom_articulus_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b),
indethom_sectio_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b, refere4a, refere4b),
verbum_ndx btree (verbum)

Thanks for your assistance!
-Erik Norvelle

Re: [PERFORM] Update performance ... is 200,000 updates per hour

2003-12-02 Thread Stephan Szabo

On Tue, 2 Dec 2003, Erik Norvelle wrote:

 ** My question has to do with whether or not I am getting maximal speed
 out of PostgreSQL, or whether I need to perform further optimizations.
 I am currently getting about 200,000 updates per hour, and updating the
 entire 10 million rows thus requires 50 hours, which seems a bit much.

Well, it doesn't entirely surprise me, given the presumably 10 million
iterations of the index scan that it's doing. Explain analyze output (even
over a subset of the indethom table by adding a where clause) would
probably help to get better info.
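For example, something along these lines (the cutoff on clavis is purely
illustrative, and the begin/rollback is just so the partial update doesn't
stick, since explain analyze actually executes the statement):

begin;
explain analyze
update indethom
   set query_counter = nextval('s2.query_counter_seq'),
       sectref = (select clavis from s2.sectiones
                   where s2.sectiones.nomeoper = indethom.nomeoper
                     and s2.sectiones.refere1a = indethom.refere1a
                     and s2.sectiones.refere1b = indethom.refere1b
                     and s2.sectiones.refere2a = indethom.refere2a
                     and s2.sectiones.refere2b = indethom.refere2b
                     and s2.sectiones.refere3a = indethom.refere3a
                     and s2.sectiones.refere3b = indethom.refere3b
                     and s2.sectiones.refere4a = indethom.refere4a
                     and s2.sectiones.refere4b = indethom.refere4b)
 where indethom.clavis < 100000;   -- illustrative cutoff on the primary key
rollback;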

I'd suggest seeing if something like:
update indethom set query_counter=..., sectref=s.clavis
 FROM s2.sectiones s where
  s.nomeoper = indethom.nomeoper and ...;
tries a join that might give a better plan.


 Here's the query I am running:
 update indethom
   set query_counter = nextval('s2.query_counter_seq'),   -- Just
 for keeping track of how fast the query is running
   sectref = (select clavis from s2.sectiones where
   s2.sectiones.nomeoper = indethom.nomeoper
   and s2.sectiones.refere1a = indethom.refere1a and
 s2.sectiones.refere1b = indethom.refere1b
   and s2.sectiones.refere2a = indethom.refere2a  and
 s2.sectiones.refere2b = indethom.refere2b
   and s2.sectiones.refere3a = indethom.refere3a  and
 s2.sectiones.refere3b = indethom.refere3b
   and s2.sectiones.refere4a = indethom.refere4a and
 s2.sectiones.refere4b = indethom.refere4b);

 Here's the query plan:
                                  QUERY PLAN
 -----------------------------------------------------------------------------
   Seq Scan on indethom  (cost=0.00..1310352.72 rows=10631972 width=212)
     SubPlan
       ->  Index Scan using sectiones_ndx on sectiones  (cost=0.00..6.03
 rows=1 width=4)
             Index Cond: ((nomeoper = $0) AND (refere1a = $1) AND
 (refere1b = $2) AND (refere2a = $3) AND (refere2b = $4) AND (refere3a =
 $5) AND (refere3b = $6) AND (refere4a = $7) AND (refere4b = $8))
 (4 rows)



Re: [PERFORM] Update performance ... is 200,000 updates per hour what I should expect?

2003-12-02 Thread Tom Lane
Erik Norvelle [EMAIL PROTECTED] writes:
 update indethom
   set query_counter = nextval('s2.query_counter_seq'),   -- Just
 for keeping track of how fast the query is running
   sectref = (select clavis from s2.sectiones where
   s2.sectiones.nomeoper = indethom.nomeoper
   and s2.sectiones.refere1a = indethom.refere1a and
 s2.sectiones.refere1b = indethom.refere1b
   and s2.sectiones.refere2a = indethom.refere2a and
 s2.sectiones.refere2b = indethom.refere2b
   and s2.sectiones.refere3a = indethom.refere3a and
 s2.sectiones.refere3b = indethom.refere3b
   and s2.sectiones.refere4a = indethom.refere4a and
 s2.sectiones.refere4b = indethom.refere4b);

This is effectively forcing a nestloop-with-inner-indexscan join.  You
might be better off with

update indethom
set query_counter = nextval('s2.query_counter_seq'),
sectref = sectiones.clavis
from s2.sectiones
where
s2.sectiones.nomeoper = indethom.nomeoper
and s2.sectiones.refere1a = indethom.refere1a and  
s2.sectiones.refere1b = indethom.refere1b
and s2.sectiones.refere2a = indethom.refere2a  and  
s2.sectiones.refere2b = indethom.refere2b
and s2.sectiones.refere3a = indethom.refere3a  and  
s2.sectiones.refere3b = indethom.refere3b
and s2.sectiones.refere4a = indethom.refere4a and  
s2.sectiones.refere4b = indethom.refere4b;

regards, tom lane



Re: [PERFORM] Update performance ... is 200,000 updates per hour what I should expect?

2003-12-02 Thread Greg Stark
Erik Norvelle [EMAIL PROTECTED] writes:

 Here's the query I am running:
 update indethom
   set query_counter = nextval('s2.query_counter_seq'), -- Just for keeping track 
 of how fast the query is running
   sectref = (select clavis from s2.sectiones where
   s2.sectiones.nomeoper = indethom.nomeoper
   and s2.sectiones.refere1a = indethom.refere1a and 
 s2.sectiones.refere1b = indethom.refere1b
   and s2.sectiones.refere2a = indethom.refere2a and 
 s2.sectiones.refere2b = indethom.refere2b
   and s2.sectiones.refere3a = indethom.refere3a and 
 s2.sectiones.refere3b = indethom.refere3b
   and s2.sectiones.refere4a = indethom.refere4a and 
 s2.sectiones.refere4b = indethom.refere4b);
 
 Here's the query plan:
                              QUERY PLAN
 -----------------------------------------------------------------------------
  Seq Scan on indethom (cost=0.00..1310352.72 rows=10631972 width=212)
    SubPlan
      ->  Index Scan using sectiones_ndx on sectiones (cost=0.00..6.03 rows=1 width=4)
            Index Cond: ((nomeoper = $0) AND (refere1a = $1) AND (refere1b = $2) AND
 (refere2a = $3) AND (refere2b = $4) AND (refere3a = $5) AND (refere3b = $6) AND
 (refere4a = $7) AND (refere4b = $8))
 (4 rows)

Firstly, you might try running VACUUM FULL on both tables. If there are tons
of extra dead records left over, they could be slowing down the update.
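That is, something along the lines of:

vacuum full analyze indethom;
vacuum full analyze s2.sectiones;

(Note that vacuum full takes an exclusive lock on each table while it runs.)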

This isn't the fastest possible plan but it's pretty good.

You might be able to get it somewhat faster using the non-standard from
clause on the update statement.

update indethom
   set sectref = clavis
  from sectiones
 where sectiones.nomeoper = indethom.nomeoper
   and sectiones.refere1a = indethom.refere1a
   and sectiones.refere1b = indethom.refere1b
   and sectiones.refere2a = indethom.refere2a
   and sectiones.refere2b = indethom.refere2b
   and sectiones.refere3a = indethom.refere3a
   and sectiones.refere3b = indethom.refere3b
   and sectiones.refere4a = indethom.refere4a
   and sectiones.refere4b = indethom.refere4b

This might be able to use a merge join which will take longer to get started
because it has to sort both tables, but might finish faster.

You might also try paring the index down to just the two or three most
useful columns. Is it common that something matches refere1a and refere1b but
doesn't match the remaining columns? An 8-column index is a lot of overhead. I'm
not sure how much that affects lookup times, but it might be substantial.
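For example (purely a sketch; which columns are worth keeping depends on how
selective they are in your data, and the index name here is made up):

create index sectiones_short_ndx          -- hypothetical name
    on s2.sectiones (nomeoper, refere1a, refere1b);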


-- 
greg




Re: [PERFORM] A question on the query planner

2003-12-02 Thread Robert Treat
On Mon, 2003-12-01 at 16:44, Jared Carr wrote:
 I am currently working on optimizing some fairly time consuming queries 
 on a decently large
 dataset.
 
 The Following is the query in question.
 
 SELECT z.lat, z.lon, z.city, z.state, q.date_time, c.make, c.model, c.year
 FROM quotes AS q, zips AS z, cars AS c
 WHERE
 z.zip = q.zip AND
 c.car_id = q.car_id AND
 z.state != 'AA' AND
 z.state != 'AE' AND
 z.state != 'AP' AND
 z.state = 'WA'
  ORDER BY date_time;
 

This won't completely solve your problem, but z.state = 'WA' would seem
to be mutually exclusive of the != AA|AE|AP conditions.  While it's not much, it is
extra overhead there doesn't seem to be any need for...
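That is, the where clause could presumably be trimmed to just:

SELECT z.lat, z.lon, z.city, z.state, q.date_time, c.make, c.model, c.year
FROM quotes AS q, zips AS z, cars AS c
WHERE
z.zip = q.zip AND
c.car_id = q.car_id AND
z.state = 'WA'
 ORDER BY date_time;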

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL




Re: [PERFORM] A question on the query planner

2003-12-02 Thread Greg Stark
Jared Carr [EMAIL PROTECTED] writes:

 Furthermore I noticed that in the following query plan it is doing the
 sequential scan on quotes first, and then doing the sequential scan on zips. IMHO
 this should be the other way around, since the result set for zips is
 considerably smaller, especially given that we are using a where clause to
 limit the number of items returned from zips, so it would seem that it would
 be faster to scan zips and then join onto quotes, but perhaps it needs to do the
 sequential scan on both regardless.

->  Hash Join  (cost=1088.19..56382.58 rows=4058 width=62) (actual
time=86.111..1834.682 rows=10193 loops=1)
      Hash Cond: (("outer".zip)::text = ("inner".zip)::text)
      ->  Seq Scan on quotes q  (cost=0.00..10664.25 rows=336525 width=27) (actual
 time=0.098..658.905 rows=336963 loops=1)
      ->  Hash  (cost=1086.90..1086.90 rows=516 width=52) (actual time=85.798..85.798
 rows=0 loops=1)
            ->  Seq Scan on zips z  (cost=0.00..1086.90 rows=516 width=52) (actual
 time=79.532..84.151 rows=718 loops=1)
                  Filter: ((state)::text = 'WA'::text)

You're misreading it. A hash join is done by reading one table into a hash
table, then reading the other table and looking up entries in the hash table. The
zips are being read into the hash table, which is appropriate if it's the
smaller table.


 Of course still there is the holy grail of getting it to actually use 
 the indexes. :P

  Merge Cond: ("outer"."?column7?" = "inner"."?column5?")

Well it looks like you have something strange going on. What data type is
car_id in each table? 


-- 
greg




Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Jeremiah Jahn
Thanks to all; I had already run ANALYZE, but the STATISTICS setting
seems to have worked. I'm just not sure what it did. Would anyone care
to explain?


On Mon, 2003-12-01 at 13:47, Josh Berkus wrote:
 Jeremiah,
 
  I've attached the Analyze below. I have no idea why the db thinks there
  is only 1 judge named smith. Is there some way I can inform the DB
  about this? In actuality, there aren't any judges named smith at the
  moment, but there are 22K people named smith.
 
 No, Hannu meant that you may need to run the following command:
 
 ANALYZE actor;
 
 ... to update the database statistics on the actor table.  That is a
 maintenance task that needs to be run periodically.
 
 If that doesn't fix the bad plan, then the granularity of statistics on the 
 full_name column needs updating; I suggest:
 
 ALTER TABLE actor ALTER COLUMN full_name SET STATISTICS 100;
 ANALYZE actor;
 
 And if it's still choosing  a slow nested loop, up the stats to 250.
-- 
Jeremiah Jahn [EMAIL PROTECTED]




Re: [PERFORM] A question on the query planner

2003-12-02 Thread Jared Carr
Greg Stark wrote:

  Merge Cond: ("outer"."?column7?" = "inner"."?column5?")

 Well it looks like you have something strange going on. What data type is
 car_id in each table?

car_id is a varchar(10) in both tables.



Re: [PERFORM] A question on the query planner

2003-12-02 Thread Greg Stark
Jared Carr [EMAIL PROTECTED] writes:

 Greg Stark wrote:

  Merge Cond: ("outer"."?column7?" = "inner"."?column5?")

  Well it looks like you have something strange going on. What data type is
  car_id in each table?

 car_id is a varchar(10) in both tables.

Well, for some reason it's being cast to text to do the merge.

What version of postgres is this, btw? The analyzes look like 7.4?

-- 
greg




Re: [PERFORM] A question on the query planner

2003-12-02 Thread Jared Carr
Greg Stark wrote:

 Jared Carr [EMAIL PROTECTED] writes:

  Greg Stark wrote:

   Merge Cond: ("outer"."?column7?" = "inner"."?column5?")

  Well it looks like you have something strange going on. What data type is
  car_id in each table?

  car_id is a varchar(10) in both tables.

 Well, for some reason it's being cast to text to do the merge.

 What version of postgres is this, btw? The analyzes look like 7.4?

Yes, this is 7.4.



[PERFORM] autovacuum daemon stops doing work after about an hour

2003-12-02 Thread Vivek Khera
I took advantage of last weekend to upgrade from 7.2.4 to 7.4.0 on a
new faster box.

Now I'm trying to implement pg_autovacuum.  It seems to work ok, but
after about an hour or so, it does nothing.  The process still is
running, but nothing is sent to the log file.

I'm running the daemon as distributed with PG 7.4 release as follows:

pg_autovacuum -d4 -V 0.15 -A 1 -U postgres -L /var/tmp/autovacuum.log -D

the last few lines of the log are:

[2003-12-02 11:43:58 AM] VACUUM ANALYZE public.msg_recipients
[2003-12-02 12:24:33 PM] select relfilenode,reltuples,relpages from pg_class where 
relfilenode=18588239
[2003-12-02 12:24:33 PM]   table name: vkmlm.public.msg_recipients
[2003-12-02 12:24:33 PM]  relfilenode: 18588239;   relisshared: 0
[2003-12-02 12:24:33 PM]  reltuples: 9;  relpages: 529132
[2003-12-02 12:24:33 PM]  curr_analyze_count:  1961488; cur_delete_count:   1005040
[2003-12-02 12:24:33 PM]  ins_at_last_analyze: 1961488; del_at_last_vacuum: 1005040
[2003-12-02 12:24:33 PM]  insert_threshold: 509; delete_threshold: 1001
[2003-12-02 12:24:33 PM] Performing: VACUUM ANALYZE public.user_list
[2003-12-02 12:24:33 PM] VACUUM ANALYZE public.user_list
[2003-12-02 12:43:19 PM] select relfilenode,reltuples,relpages from pg_class where 
relfilenode=18588202
[2003-12-02 12:43:19 PM]   table name: vkmlm.public.user_list
[2003-12-02 12:43:19 PM]  relfilenode: 18588202;   relisshared: 0
[2003-12-02 12:43:19 PM]  reltuples: 9;  relpages: 391988
[2003-12-02 12:43:19 PM]  curr_analyze_count:  1159843; cur_delete_count:   1118540
[2003-12-02 12:43:19 PM]  ins_at_last_analyze: 1159843; del_at_last_vacuum: 1118540
[2003-12-02 12:43:19 PM]  insert_threshold: 509; delete_threshold: 1001

Then it just sits there.  I started it at 11:35am, and it is now
3:30pm.

I did the same last night at about 10:58pm, and it ran and did work until
11:57pm, then sat there until I killed/restarted pg_autovacuum this
morning at 11:35.  The process is not using any CPU time.

I just killed/restarted it and it found work to do on my busy tables
which I'd expect.

I'm running Postgres 7.4 release on FreeBSD 4.9-RELEASE.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/



Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Neil Conway
Josh Berkus [EMAIL PROTECTED] writes:
 1) to keep it working, you will probably need to run ANALYZE more
often than you have been;

I'm not sure why this would be the case -- can you elaborate?

 4) Currently, pg_dump does *not* back up statistics settings.

Yes, it does.

-Neil




Re: [PERFORM] A question on the query planner

2003-12-02 Thread Greg Stark

Jared Carr [EMAIL PROTECTED] writes:

 Greg Stark wrote:
 
  Well it looks like you have something strange going on. What data type is
  car_id in each table?
 
 car_id is a varchar(10) in both tables.

Huh. The following shows something strange. It seems joining on two varchars
no longer works well. Instead the optimizer has to convert both columns to
text.

I know some inter-type comparisons were removed a while ago, but I would not
have thought that would affect varchar-varchar comparisons. I think this is
pretty bad.


test=# create table a (x varchar primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "a_pkey" for table "a"
CREATE TABLE
test=# create table b (x varchar primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "b_pkey" for table "b"
CREATE TABLE
test=# select * from a,b where a.x=b.x;
 x | x 
---+---
(0 rows)

test=# explain select * from a,b where a.x=b.x;
                            QUERY PLAN
------------------------------------------------------------------
 Merge Join  (cost=139.66..159.67 rows=1001 width=64)
   Merge Cond: ("outer"."?column2?" = "inner"."?column2?")
   ->  Sort  (cost=69.83..72.33 rows=1000 width=32)
         Sort Key: (a.x)::text
         ->  Seq Scan on a  (cost=0.00..20.00 rows=1000 width=32)
   ->  Sort  (cost=69.83..72.33 rows=1000 width=32)
         Sort Key: (b.x)::text
         ->  Seq Scan on b  (cost=0.00..20.00 rows=1000 width=32)
(8 rows)

test=# create table a2 (x text primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "a2_pkey" for table "a2"
CREATE TABLE
test=# create table b2 (x text primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "b2_pkey" for table "b2"
CREATE TABLE
test=# explain select * from a2,b2 where a2.x=b2.x;
                         QUERY PLAN
-------------------------------------------------------------
 Hash Join  (cost=22.50..57.51 rows=1001 width=64)
   Hash Cond: ("outer".x = "inner".x)
   ->  Seq Scan on a2  (cost=0.00..20.00 rows=1000 width=32)
   ->  Hash  (cost=20.00..20.00 rows=1000 width=32)
         ->  Seq Scan on b2  (cost=0.00..20.00 rows=1000 width=32)
(5 rows)


-- 
greg




Re: [PERFORM] A question on the query planner

2003-12-02 Thread Greg Stark

Greg Stark [EMAIL PROTECTED] writes:

 Huh. The following shows something strange. 

Worse, with enable_hashjoin off it's even more obvious something's broken:


test=# set enable_hashjoin = off;
SET
test=# explain select * from a,b where a.x=b.x;
                            QUERY PLAN
------------------------------------------------------------------
 Merge Join  (cost=139.66..159.67 rows=1001 width=64)
   Merge Cond: ("outer"."?column2?" = "inner"."?column2?")
   ->  Sort  (cost=69.83..72.33 rows=1000 width=32)
         Sort Key: (a.x)::text
         ->  Seq Scan on a  (cost=0.00..20.00 rows=1000 width=32)
   ->  Sort  (cost=69.83..72.33 rows=1000 width=32)
         Sort Key: (b.x)::text
         ->  Seq Scan on b  (cost=0.00..20.00 rows=1000 width=32)
(8 rows)

test=# explain select * from a2,b2 where a2.x=b2.x;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Merge Join  (cost=0.00..63.04 rows=1001 width=64)
   Merge Cond: ("outer".x = "inner".x)
   ->  Index Scan using a2_pkey on a2  (cost=0.00..24.00 rows=1000 width=32)
   ->  Index Scan using b2_pkey on b2  (cost=0.00..24.00 rows=1000 width=32)
(4 rows)

-- 
greg




Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Josh Berkus
Neil,

  1) to keep it working, you will probably need to run ANALYZE more
 often than you have been;
 
 I'm not sure why this would be the case -- can you elaborate?

For the more granular stats to be useful, they have to be accurate; otherwise
you'll go back to a nestloop as soon as the query planner encounters a value
that it doesn't think is in the table at all.
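For example, even a simple nightly cron entry would cover it (the schedule and
database name here are just placeholders):

# re-analyze the actor table every night at 3am (illustrative only)
0 3 * * *   psql -d yourdb -c 'ANALYZE actor;'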

 
  4) Currently, pg_dump does *not* back up statistics settings.
 
 Yes, it does.

Oh, good.  Was this a 7.4 improvement?  I missed that in the changelogs.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] A question on the query planner

2003-12-02 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Huh. The following shows something strange. It seems joining on two varchars
 no longer works well. Instead the optimizer has to convert both columns to
 text.

Define "no longer works well".  varchar doesn't have its own comparison
operators anymore, but AFAIK that makes no difference.

regards, tom lane



Re: [PERFORM] A question on the query planner

2003-12-02 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Define "no longer works well".

 Well it seems to completely bar the use of a straight merge join between two
 index scans:

Hmmm ... [squints] ... it's not supposed to do that ... [digs] ... yeah,
there's something busted here.  Will get back to you ...

regards, tom lane



Re: [PERFORM] Minimum hardware requirements for Postgresql db

2003-12-02 Thread Christopher Browne
After takin a swig o' Arrakan spice grog, [EMAIL PROTECTED] belched out:
 We would be recommending to our client the use of a Postgresql db as
 compared to MS SQL Server. We are targeting Redhat Linux ES
 v2.1, Postgresql v7.3.4 and Postgresql ODBC 07.03.0100.

 We would like to know the minimum specs required for our below
 target. The minimum specs refer to the no. of CPUs, memory,
 harddisk capacity, RAID technology etc., and also the Postgresql
 parameters and configuration to run such a system.

 1) We will be running 2 x Postgresql db  in the machine.

 2) Total number of connections to be around 100. The connections
 from the clients machines will be in ODBC and socket connections.

 3) Estimated number of transactions to be written into the
 Postgresql db is around 15000 records per day.

 The growth rate in terms of number of connections is around 10% per
 year and the data retention is kept on average at least for 18
 months for the 2 databases.

 Are there any reference books or sites that I can tap on for the
 above requirement?

Perhaps the best reference on detailed performance information is the
General Bits documents.

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

These don't point particularly at minimal hardware requirements, but
rather at how to configure the DBMS to best reflect what hardware you
have.  But there's some degree to which you can work backwards...

If you'll need to support 100 concurrent connections, then minimum
shared_buffers is 200, which implies 1600K of RAM required for shared
buffers.
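In postgresql.conf terms that works out to something like this (the values are
only meant to illustrate the arithmetic above, not to be tuned recommendations):

max_connections = 100
shared_buffers = 200        # minimum is 2 x max_connections; 200 x 8kB = 1600kB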

100 connections probably implies around 100MB of memory for the
backend processes to support the connections.

That all points to the notion that you'd more than probably get
half-decent performance if you had a mere 256MB of RAM, which is about
$50 worth these days.

None of it sounds terribly challenging; 15K records per day is 625
records per hour which represents an INSERT every 6 seconds.  Even if
that has to fit into an 8 hour day, that's still not a high number of
transactions per second.  That _sounds like_ an application that could
work on old, obsolete hardware.  I would imagine that my old Intel
Pentium Pro 200 might cope with the load, in much the way that that
server is more than capable of supporting a web server that would
serve a local workgroup.  (I only have 64MB of RAM on that box, which
would be a mite low, but it's an _ancient_ server...)

The only thing that makes me a little suspicious that there's
something funny about the prescription is your indication of having
100 concurrent users, which is really rather heavyweight in comparison
with the comparatively tiny number of transactions.  Is this for some
sort of departmental application?  Where there's a lot of manual
data entry, so that each user would generate a transaction every 3-4
minutes?  That actually sounds about right...

Let me suggest that the cost driver in this will _not_ be the cost
of the hardware to support the database itself; it will instead be in
having redundant hardware and backup hardware to ensure reliability.

It would seem likely that just about any sort of modern hardware would
be pretty adequate to the task.  You can hardly _buy_ a system with
less than Gigahertz-speed CPUs, 40GB of disk, and 256MB of RAM.
Upgrade to have 2 SCSI disks, 512MB (or more, which is better) of RAM,
and the cost of a suitable system still won't be outrageous.

Double it, buying a standby server, and the cost still oughtn't be
real scary.  And if the application is important, you _should_ have a
standby server, irrespective of what software you might be running.
-- 
(reverse (concatenate 'string moc.enworbbc @ enworbbc))
http://www3.sympatico.ca/cbbrowne/x.html
Rules of the Evil Overlord #199. I will not make alliances with those
more powerful than myself. Such a person would only double-cross me in
my moment  of glory.  I will make  alliances with those  less powerful
than myself. I will then  double-cross them in their moment of glory.
http://www.eviloverlord.com/
