Re: [PERFORM] autovacuum daemon stops doing work after about an hour

2003-12-03 Thread Matthew T. O'Connor
On Tue, 2003-12-02 at 15:37, Vivek Khera wrote:
> Now I'm trying to implement pg_autovacuum.  It seems to work ok, but
> after about an hour or so, it does nothing.  The process still is
> running, but nothing is sent to the log file.
> 
> I'm running the daemon as distributed with PG 7.4 release as follows:
> 
> pg_autovacuum -d4 -V 0.15 -A 1 -U postgres -L /var/tmp/autovacuum.log -D
> 
> the last few lines of the log are:
> 
> [2003-12-02 11:43:58 AM] VACUUM ANALYZE "public"."msg_recipients"
> [2003-12-02 12:24:33 PM] select relfilenode,reltuples,relpages from pg_class where relfilenode=18588239
> [2003-12-02 12:24:33 PM]   table name: vkmlm."public"."msg_recipients"
> [2003-12-02 12:24:33 PM]  relfilenode: 18588239;   relisshared: 0
> [2003-12-02 12:24:33 PM]  reltuples: 9;  relpages: 529132
> [2003-12-02 12:24:33 PM]  curr_analyze_count:  1961488; cur_delete_count: 1005040
> [2003-12-02 12:24:33 PM]  ins_at_last_analyze: 1961488; del_at_last_vacuum: 1005040
> [2003-12-02 12:24:33 PM]  insert_threshold: 509; delete_threshold: 1001
> [2003-12-02 12:24:33 PM] Performing: VACUUM ANALYZE "public"."user_list"
> [2003-12-02 12:24:33 PM] VACUUM ANALYZE "public"."user_list"
> [2003-12-02 12:43:19 PM] select relfilenode,reltuples,relpages from pg_class where relfilenode=18588202
> [2003-12-02 12:43:19 PM]   table name: vkmlm."public"."user_list"
> [2003-12-02 12:43:19 PM]  relfilenode: 18588202;   relisshared: 0
> [2003-12-02 12:43:19 PM]  reltuples: 9;  relpages: 391988
> [2003-12-02 12:43:19 PM]  curr_analyze_count:  1159843; cur_delete_count: 1118540
> [2003-12-02 12:43:19 PM]  ins_at_last_analyze: 1159843; del_at_last_vacuum: 1118540
> [2003-12-02 12:43:19 PM]  insert_threshold: 509; delete_threshold: 1001
> 
> Then it just sits there.  I started it at 11:35am, and it is now
> 3:30pm.

Weird.  Alphabetically speaking, is vkmlm."public"."user_list" the
last table in the last schema in the last database?  You are running
with -d4, so you would get a message about going to sleep shortly after
dealing with the last table, but you didn't get the sleep message, so I
don't think the problem is that pg_autovacuum is sleeping for an
inordinate amount of time.
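
For what it's worth, you can eyeball the same activity counters that
pg_autovacuum watches with a hand-run query; a rough sketch, assuming the
stats collector is enabled (these are standard pg_stat_all_tables columns
in 7.4):

select relname, n_tup_ins, n_tup_upd, n_tup_del
  from pg_stat_all_tables
 order by n_tup_del desc;

If those counters keep climbing past the thresholds in your log while the
daemon stays quiet, that would confirm it is wedged rather than idle.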

> I did the same last night at about 10:58pm, and it ran and did work until
> 11:57pm, then sat there until I killed/restarted pg_autovacuum this
> morning at 11:35.  The process is not using any CPU time.
> 
> I just killed/restarted it and it found work to do on my busy tables
> which I'd expect.

When you kill it, do you get a core file?  Could you do a backtrace and
see where pg_autovacuum is hung up?

> I'm running Postgres 7.4 release on FreeBSD 4.9-RELEASE.

I don't run FreeBSD, so I haven't tested with FreeBSD.  Recently Craig
Boston reported and submitted a patch for a crash on FreeBSD, but that
doesn't sound like your problem.  It could be some other type of
platform-dependent problem.



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Minimum hardware requirements for Postgresql db

2003-12-03 Thread CHEWTC

Dear all


Sorry for my mistake on the 15000 recs per day.

In fact, this server is planned as an OLTP database server for a retailer.
Our intention is to set up either 1 or 2 Postgresql db in the server.

The proper sizing info for the 1st Postgresql db should be:

No. of item master records: 200,000
(This item master grows at 0.5% daily.)

No. of transactions from Point-of-Sale machines: 25,000

Plus other tables, the total sizing that I estimated is 590,000 records
daily.

The 2nd Postgresql db will be used by end users on client machines linked
via ODBC, doing manual data entry.  This will house the item master,
loyalty card master and other historical data to be kept for at least 1.5
years.

Therefore the total sizing for this db is around 165,000,000 recs at any
time.

In summary, the single machine must be able to handle around 100 user
connections via both socket and ODBC, and house the above number of
records.


Thank you,
Regards.


Christopher Browne <[EMAIL PROTECTED]> wrote on 03/12/2003 12:44 PM:
After takin a swig o' Arrakan spice grog, [EMAIL PROTECTED] belched out:
> We would be recommending to our ct. on the use of Postgresql db as
> compared to MS SQL Server. We are targetting to use Redhat Linux ES
> v2.1, Postgresql v7.3.4 and Postgresql ODBC 07.03.0100.
>
> We would like to know the minimum specs required for our below
> target. The minimum specs is referring to no. of CPU, memory,
> harddisk capacity, RAID technology etc. And also the Postgresql
> parameters and configuration to run such a system.
>
> 1) We will be running 2 x Postgresql db  in the machine.
>
> 2) Total number of connections to be around 100. The connections
> from the clients machines will be in ODBC and socket connections.
>
> 3) Estimated number of transactions to be written into the
> Postgresql db is around 15000 records per day.
>
> The growth rate in terms of number of connections is around 10% per
> year and the data retention is kept on average at least for 18
> months for the 2 databases.
>
> Are there any reference books or sites that I can tap on for the
> above requirement?

Perhaps the best reference on detailed performance information is the
"General Bits" documents.

These don't point particularly at minimal hardware requirements, but
rather at how to configure the DBMS to best reflect what hardware you
have.  But there's some degree to which you can work backwards...

If you'll need to support 100 concurrent connections, then minimum
shared_buffers is 200, which implies 1600K of RAM required for shared
buffers.
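
(A quick way to sanity-check that arithmetic on a running server; the
numbers below merely restate the calculation above:)

show shared_buffers;            -- reports the number of 8KB buffers, e.g. 200
select 200 * 8 as shared_kb;    -- 200 buffers * 8KB = 1600K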

100 connections probably implies around 100MB of memory for the
backend processes to support the connections.

That all points to the notion that you'd more than probably get
half-decent performance if you had a mere 256MB of RAM, which is about
$50 worth these days.

None of it sounds terribly challenging; 15K records per day is 625
records per hour which represents an INSERT every 6 seconds.  Even if
that has to fit into an 8 hour day, that's still not a high number of
transactions per second.  That _sounds like_ an application that could
work on old, obsolete hardware.  I would imagine that my old Intel
Pentium Pro 200 might cope with the load, in much the way that that
server is more than capable of supporting a web server that would
serve a local workgroup.

Re: [PERFORM] Has anyone run on the new G5 yet

2003-12-03 Thread Sean Shanny
Gaetano,

I don't believe we have ever run the system without it turned on.  
Another switch to fiddle with. :-)

--sean

Gaetano Mendola wrote:

Sean Shanny wrote:

We are currently running on a Dell 2650 with 2 Xeon 2.8 processors in 
hyper-threading mode, 4GB of ram, and 5 SCSI drives in a RAID 0, 
Adaptec PERC3/Di,  configuration.  I believe they are 10k drives.  
File system is EXT3. We are running RH9 Linux kernel 2.4.20-20.9SMP 
with bigmem turned on.  This box is used only for the warehouse.  All 
the ETL work is done on this machine as well.  DB version is 
postgreSQL 7.4.


Are you experiencing improvement using the hyper-threading?

Regards
Gaetano Mendola



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Has anyone run on the new G5 yet

2003-12-03 Thread Gaetano Mendola
Sean Shanny wrote:

We are currently running on a Dell 2650 with 2 Xeon 2.8 processors in 
hyper-threading mode, 4GB of ram, and 5 SCSI drives in a RAID 0, Adaptec 
PERC3/Di,  configuration.  I believe they are 10k drives.  File system 
is EXT3. We are running RH9 Linux kernel 2.4.20-20.9SMP with bigmem 
turned on.  This box is used only for the warehouse.  All the ETL work 
is done on this machine as well.  DB version is postgreSQL 7.4.
Are you experiencing improvement using the hyper-threading?

Regards
Gaetano Mendola
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] A question on the query planner

2003-12-03 Thread Gaetano Mendola
Tom Lane wrote:

Hmmm ... [squints] ... it's not supposed to do that ...


The attached patch seems to make it better.
I guess it is too late for 7.3.5.

:-(

Any chance for 7.4.1 ?



Regards
Gaetano Mendola
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] A question on the query planner

2003-12-03 Thread Jared Carr
Greg Stark wrote:

> Jared Carr <[EMAIL PROTECTED]> writes:
>
>> The patch definitely makes things more consistent...unfortunately it is more
>> consistent toward the slower execution times. Of course I am looking at this
>> simply from a straight performance standpoint and not a viewpoint of what
>> *should* be happening. At any rate here are the query plans with the various
>> settings.
>
> The optimizer seems to be at least considering reasonable plans now. It seems
> from the estimates that you need to rerun analyze. You might try "vacuum full
> analyze" to be sure.
>
> Also, you might try raising effective_cache_size and/or lowering
> random_page_cost (it looks like something around 2 might help).

Yep, I had forgotten to run vacuum after applying the patch :P.  The
overall performance is definitely better.  I will go ahead and tweak the
server settings and see what I can get.  Thanks again for all the help.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] A question on the query planner

2003-12-03 Thread Bruce Momjian
Tom Lane wrote:
> Greg Stark <[EMAIL PROTECTED]> writes:
> > Tom Lane <[EMAIL PROTECTED]> writes:
> >> Define "no longer works well".
> 
> > Well it seems to completely bar the use of a straight merge join between two
> > index scans:
> 
> Hmmm ... [squints] ... it's not supposed to do that ... [digs] ... yeah,
> there's something busted here.  Will get back to you ...

LOL, but I am not sure why.  :-)

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Has anyone run on the new G5 yet

2003-12-03 Thread Fred Moyer
> We are running into issues with IO saturation obviously.  Since this
> thing is only going to get bigger we are looking for some advice on 
> how to accommodate DB's of this size.

> Second and more radical, has anyone run 
> postgreSQL on the new Apple G5 with an XRaid system?  This seems like 
> a great value combination.  Fast CPU, wide bus, Fibre Channel IO, 
> 2.5TB all for ~17k.

If you are going for I/O performance you are best off with one of the
Xserve competitors listed at http://www.apple.com/xserve/raid/.  The
Xserve is based on IDE drives, which have a higher seek time (say 8.9 ms)
compared to SCSI (3.6 ms for a Seagate Cheetah).  For small random
read/write operations (like databases) SCSI will give you a noticeable
improvement in performance over IDE drives.  Also make sure to get as
many drives as possible; more spindles equals better I/O performance.

> I keep seeing references to terabyte postgreSQL installations, I was
> wondering if anyone on this list is in charge of one of those and can 
> offer some advice on how to position ourselves hardware wise.

I've gone to about half-terabyte size and all I can say is you should
plan for at least one quarter to one half a rack of drive space (assuming
14 drives per 4U, that's 42 to 84 drives).  Do yourself a favor and get
more rather than less; you will really appreciate it.  I averaged about
2 MB/s per drive (via the raid controller stats) on a 14-drive array
during I/O-bound seek and update operations in 2 RAID 10 arrays (half
xlogs and half data).  That comes out to around 2 hours for a terabyte
with 70 drives, assuming constant scaling.  You may be able to get more
or less depending on your setup and query workload.

> Thanks.
>
> --sean
>
>
> ---(end of 
> broadcast)---
> TIP 3: if posting/reading through Usenet, please send an appropriate
>  subscribe-nomail command to [EMAIL PROTECTED] so that your
>  message can get through to the mailing list cleanly
>


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html




[PERFORM] sequence overhead

2003-12-03 Thread Robert Treat
Just wondering if anyone has done any testing on the amount of insert
overhead you might incur by adding a serial column to a table.  I'm
thinking of adding a few to some tables that get an average of 30 - 40
inserts per second, sometimes bursting over 100 inserts per second, and
wondering if there will be any noticeable impact.
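
For reference, a serial column boils down to a sequence plus a column
default, so the per-row insert cost is essentially one nextval() call; a
sketch of the equivalent DDL (table and sequence names are illustrative):

create sequence mytable_id_seq;
alter table mytable add column id integer;
alter table mytable alter column id set default nextval('mytable_id_seq');
-- note: existing rows are left with a NULL id; backfill with UPDATE if needed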

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Has anyone run on the new G5 yet

2003-12-03 Thread Eric Soroos
Sean

> Second and more radical, has anyone run postgreSQL on the new Apple
> G5 with an XRaid system?  This seems like a great value combination.
> Fast CPU, wide bus, Fibre Channel IO, 2.5TB all for ~17k.
>
> I keep seeing references to terabyte postgreSQL installations, I was
> wondering if anyone on this list is in charge of one of those and can
> offer some advice on how to position ourselves hardware wise.

From my (admittedly low end) OS X experience, you just don't have the
filesystem options on OS X that you have on Linux, from noatime mounts
to filesystem types to the RAID options.  I also feel that the software
stack is a bit more mature and tested on the Linux side of things.

I doubt that the g5 hardware is that much faster than what you have 
right now. The raid hardware might be a good deal for you even on a 
linux platform. There are reports of it 'just working' with x86 linux 
hardware.

eric

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Has anyone run on the new G5 yet

2003-12-03 Thread Sean Shanny
I should also add that we have already done a ton of tuning based on the 
archives of this list so we are not starting from scratch here.

Thanks.

--sean

Sean Shanny wrote:

To all,

We are building a data warehouse composed of essentially click stream 
data.  The DB is growing fairly quickly as to be expected, currently 
at 90GB for one month's data.  The idea is to keep 6 months detailed 
data on line and then start aggregating older data to summary tables.  
We have 2 fact tables currently, one with about 68 million rows and 
the other with about 210 million rows.  Numerous dimension tables 
ranging from a dozen rows to millions.

We are currently running on a Dell 2650 with 2 Xeon 2.8 processors in 
hyper-threading mode, 4GB of ram, and 5 SCSI drives in a RAID 0, 
Adaptec PERC3/Di,  configuration.  I believe they are 10k drives.  
File system is EXT3. We are running RH9 Linux kernel 2.4.20-20.9SMP 
with bigmem turned on.  This box is used only for the warehouse.  All 
the ETL work is done on this machine as well.  DB version is 
postgreSQL 7.4.

We are running into issues with IO saturation obviously.  Since this 
thing is only going to get bigger we are looking for some advice on 
how to accommodate DB's of this size.

First question is do we gain anything by moving to the RH Enterprise 
version of Linux in terms of performance, mainly in the IO realm as we 
are not CPU bound at all?  Second and more radical, has anyone run 
postgreSQL on the new Apple G5 with an XRaid system?  This seems like 
a great value combination.  Fast CPU, wide bus, Fibre Channel IO, 
2.5TB all for ~17k.

I keep seeing references to terabyte postgreSQL installations, I was 
wondering if anyone on this list is in charge of one of those and can 
offer some advice on how to position ourselves hardware wise.

Thanks.

--sean

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] Has anyone run on the new G5 yet

2003-12-03 Thread Sean Shanny
To all,

We are building a data warehouse composed of essentially click stream 
data.  The DB is growing fairly quickly as to be expected, currently at 
90GB for one month's data.  The idea is to keep 6 months detailed data on 
line and then start aggregating older data to summary tables.  We have 2 
fact tables currently, one with about 68 million rows and the other with 
about 210 million rows.  Numerous dimension tables ranging from a dozen 
rows to millions.

We are currently running on a Dell 2650 with 2 Xeon 2.8 processors in 
hyper-threading mode, 4GB of ram, and 5 SCSI drives in a RAID 0, Adaptec 
PERC3/Di,  configuration.  I believe they are 10k drives.  File system 
is EXT3. We are running RH9 Linux kernel 2.4.20-20.9SMP with bigmem 
turned on.  This box is used only for the warehouse.  All the ETL work 
is done on this machine as well.  DB version is postgreSQL 7.4.

We are running into issues with IO saturation obviously.  Since this 
thing is only going to get bigger we are looking for some advice on how 
to accommodate DB's of this size.

First question is do we gain anything by moving to the RH Enterprise 
version of Linux in terms of performance, mainly in the IO realm as we 
are not CPU bound at all?  Second and more radical, has anyone run 
postgreSQL on the new Apple G5 with an XRaid system?  This seems like a 
great value combination.  Fast CPU, wide bus, Fibre Channel IO, 2.5TB 
all for ~17k.

I keep seeing references to terabyte postgreSQL installations, I was 
wondering if anyone on this list is in charge of one of those and can 
offer some advice on how to position ourselves hardware wise.

Thanks.

--sean

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] Update performance ... Recommended configuration

2003-12-03 Thread Rod Taylor
> shared_buffers = 128            # min max_connections*2 or 16, 8KB each

Try 1500.

> sort_mem = 65535                # min 64, size in KB

I'd pull this in.  You only have 640MB of RAM, which means about 8 large
sorts would push you into swap.

How about 16000?
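
(sort_mem can be tried per session before touching postgresql.conf; a
sketch:)

set sort_mem = 16000;           -- in KB, so roughly 16MB per sort
show sort_mem;                  -- confirm the session-local setting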

> fsync = false

I presume you understand the risks of data loss involved with this
setting.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Update performance ... Recommended configuration changes?

2003-12-03 Thread erik

Thanks to Greg Stark, Tom Lane and Stephan Szabo for their advice on 
rewriting my query... the revised query plan claims it should only take 
about half the time my original query did.

Now for a somewhat different question:  How might I improve my DB 
performance by adjusting the various parameters in postgresql.conf and 
kernel config?  Again, TKA.

Here's what I've currently got (hardware, kernel config, and
postgresql.conf):

Hardware: Mac iBook, G3 900MHz, 640MB memory (this is my research machine :p)
OS:  OS X 10.2.6
Postgresql version: 7.3.2
Kernel Config:
sysctl -w kern.sysv.shmmax=4194304
sysctl -w kern.sysv.shmmin=1
sysctl -w kern.sysv.shmmni=32
sysctl -w kern.sysv.shmseg=8
sysctl -w kern.sysv.shmall=1024

= Snip of postgresql.conf =

#
#   Shared Memory Size
#
shared_buffers = 128            # min max_connections*2 or 16, 8KB each
max_fsm_relations = 2000        # min 10, fsm is free space map, ~40 bytes
max_fsm_pages = 2               # min 1000, fsm is free space map, ~6 bytes
max_locks_per_transaction = 128 # min 10
wal_buffers = 16                # min 4, typically 8KB each
#
#   Non-shared Memory Sizes
#
sort_mem = 65535                # min 64, size in KB
vacuum_mem = 8192               # min 1024, size in KB

#
#   Write-ahead log (WAL)
#
#checkpoint_segments = 3        # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300       # range 30-3600, in seconds
#
#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000
#
fsync = false
#wal_sync_method = fsync        # the default varies across platforms:
#                               # fsync, fdatasync, open_sync, or open_datasync
#wal_debug = 0                  # range 0-16

== End Snip ===

Saludos,
Erik Norvelle



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Minimum hardware requirements for Postgresql db

2003-12-03 Thread Al Hulaton
> 1) We will be running 2 x Postgresql db  in the machine.
>
> 2) Total number of connections to be around 100. The connections from the
> clients machines will be in ODBC and socket connections.
>
> 3) Estimated number of transactions to be written into the Postgresql db is
> around 15000 records per day.
Assuming this server will be dedicated to PostgreSQL only, the needs 
outlined above are modest.

As was pointed out in other posts, a simple sub-GHz machine with 512MB
of RAM is more than enough, but I'd slap on a gig only because RAM is
cheaper now.  If the database on this server is crucial, I'd look at
setting up a UPS, RAID (at this level even software-based RAID will do
fine, RAID 5 preferably) and investing in a backup/replication solution.

--
Best,
Al Hulaton|  Sr. Account Engineer  |  Command Prompt, Inc.
503.667.4564  |  [EMAIL PROTECTED]
Home of Mammoth Replicator for PostgreSQL
Managed PostgreSQL, Linux services and consulting
Read and Search O'Reilly's 'Practical PostgreSQL' at
http://www.commandprompt.com
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] A question on the query planner

2003-12-03 Thread Greg Stark
Jared Carr <[EMAIL PROTECTED]> writes:

> The patch definitely makes things more consistent...unfortunately it is more
> consistent toward the slower execution times. Of course I am looking at this
> simply from a straight performance standpoint and not a viewpoint of what
> *should* be happening. At any rate here are the query plans with the various
> settings.

The optimizer seems to be at least considering reasonable plans now. It seems
from the estimates that you need to rerun analyze. You might try "vacuum full
analyze" to be sure.

Also, you might try raising effective_cache_size and/or lowering
random_page_cost (it looks like something around 2 might help).
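
(Both settings can be experimented with per session before editing
postgresql.conf; the values below are only illustrative:)

set effective_cache_size = 32768;  -- in 8KB pages; 32768 is ~256MB
set random_page_cost = 2;          -- default is 4; lower favors index scans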

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Minimum hardware requirements for Postgresql db

2003-12-03 Thread Greg Stark

"scott.marlowe" <[EMAIL PROTECTED]> writes:

> > 3) Estimated number of transactions to be written into the Postgresql db is
> > around 15000 records per day.
> > 
> > The growth rate in terms of number of connections is around 10% per year
> > and the data retention is kept on average at least for 18 months for the 2
> > databases.

> Like another poster pointed out, this is a walk in the park for 
> postgresql.  My workstation (1.1GHz celeron, 40 gig IDE drive, 512 Meg 
> memory) could handle this load while still being my workstation.

Well there's some info missing. Like what would you actually be _doing_ with
these data?

15,000 inserts per day is nothing. But after 18 months that's over 5M records
not including the 10% growth rate. 5M records isn't really all that much but
it's enough that it's possible to write slow queries against it.

If you're doing big batch updates or complex reports against the data,
that will be more interesting than the inserts.
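
For instance, if the reports filter on a date range, an index on that
column is what keeps a multi-million-row table comfortable; a sketch with
hypothetical table and column names:

create index transactions_trans_date_idx on transactions (trans_date);
-- e.g. for reports like:
--   select count(*) from transactions
--    where trans_date >= '2003-01-01' and trans_date < '2003-02-01';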

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Minimum hardware requirements for Postgresql db

2003-12-03 Thread scott.marlowe
On Wed, 3 Dec 2003 [EMAIL PROTECTED] wrote:

> Dear all
> 
> We would be recommending to our ct. on the use of Postgresql db as compared
> to MS SQL Server. We are targetting to use Redhat Linux ES v2.1, Postgresql
> v7.3.4 and Postgresql ODBC 07.03.0100.
> 
> We would like to know the minimum specs required for our below target. The
> minimum specs is referring to no. of CPU, memory, harddisk capacity, RAID
> technology etc. And also the Postgresql parameters and configuration to run
> such a  system.
> 
> 1) We will be running 2 x Postgresql db  in the machine.
> 
> 2) Total number of connections to be around 100. The connections from the
> clients machines will be in ODBC and socket connections.
> 
> 3) Estimated number of transactions to be written into the Postgresql db is
> around 15000 records per day.
> 
> 
> The growth rate in terms of number of connections is around 10% per year
> and the data retention is kept on average at least for 18 months for the 2
> databases.
> 
> Are there any reference books or sites that I can tap on for the above
> requirement?

Like another poster pointed out, this is a walk in the park for 
postgresql.  My workstation (1.1GHz celeron, 40 gig IDE drive, 512 Meg 
memory) could handle this load while still being my workstation.
:-)


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] A question on the query planner

2003-12-03 Thread Jared Carr
Tom Lane wrote:

> Hmmm ... [squints] ... it's not supposed to do that ...
>
> The attached patch seems to make it better.

The patch definitely makes things more consistent...unfortunately it is
more consistent toward the slower execution times.  Of course I am
looking at this simply from a straight performance standpoint and not a
viewpoint of what *should* be happening.  At any rate here are the query
plans with the various settings.

Default Settings:

                                QUERY PLAN
--------------------------------------------------------------------------
 Sort  (cost=15290.20..15300.34 rows=4058 width=80) (actual time=2944.650..2951.292 rows=4672 loops=1)
   Sort Key: q.date_time
   ->  Hash Join  (cost=13529.79..15046.99 rows=4058 width=80) (actual time=2678.033..2873.475 rows=4672 loops=1)
         Hash Cond: (("outer".car_id)::text = ("inner".car_id)::text)
         ->  Seq Scan on cars c  (cost=0.00..227.01 rows=9401 width=37) (actual time=19.887..50.971 rows=9401 loops=1)
         ->  Hash  (cost=13475.65..13475.65 rows=4058 width=62) (actual time=2643.377..2643.377 rows=0 loops=1)
               ->  Hash Join  (cost=1088.19..13475.65 rows=4058 width=62) (actual time=86.739..2497.558 rows=10284 loops=1)
                     Hash Cond: (("outer".zip)::text = ("inner".zip)::text)
                     ->  Seq Scan on quotes q  (cost=0.00..10664.25 rows=336525 width=27) (actual time=0.223..1308.561 rows=340694 loops=1)
                     ->  Hash  (cost=1086.90..1086.90 rows=516 width=52) (actual time=84.329..84.329 rows=0 loops=1)
                           ->  Seq Scan on zips z  (cost=0.00..1086.90 rows=516 width=52) (actual time=78.363..82.901 rows=718 loops=1)
                                 Filter: ((state)::text = 'WA'::text)
 Total runtime: 2955.366 ms

SET enable_seqscan = false;

                                QUERY PLAN
--------------------------------------------------------------------------
 Sort  (cost=103557.82..103567.97 rows=4058 width=80) (actual time=1015.122..1021.750 rows=4673 loops=1)
   Sort Key: q.date_time
   ->  Merge Join  (cost=102734.94..103314.61 rows=4058 width=80) (actual time=802.908..941.520 rows=4673 loops=1)
         Merge Cond: ("outer"."?column7?" = ("inner".car_id)::text)
         ->  Sort  (cost=102734.94..102745.08 rows=4058 width=62) (actual time=802.112..812.755 rows=4827 loops=1)
               Sort Key: (q.car_id)::text
               ->  Nested Loop  (cost=0.00..102491.73 rows=4058 width=62) (actual time=148.535..555.653 rows=10285 loops=1)
                     ->  Index Scan using zip_zips_index on zips z  (cost=0.00..1272.69 rows=516 width=52) (actual time=148.243..155.577 rows=718 loops=1)
                           Filter: ((state)::text = 'WA'::text)
                     ->  Index Scan using zip_quotes_index on quotes q  (cost=0.00..195.55 rows=48 width=27) (actual time=0.042..0.454 rows=14 loops=718)
                           Index Cond: (("outer".zip)::text = (q.zip)::text)
         ->  Index Scan using cars_car_id_btree_index on cars c  (cost=0.00..506.87 rows=9401 width=37) (actual time=0.220..46.910 rows=12019 loops=1)
 Total runtime: 1027.339 ms

There is still a 3x decrease in execution time here, but it is overall
slower than before the patch was applied.

SET enable_mergejoin = false; AND SET enable_seqscan = false;

                                QUERY PLAN
--------------------------------------------------------------------------
 Sort  (cost=104586.15..104596.29 rows=4058 width=80) (actual time=887.719..894.358 rows=4673 loops=1)
   Sort Key: q.date_time
   ->  Hash Join  (cost=102545.88..104342.94 rows=4058 width=80) (actual time=593.710..815.541 rows=4673 loops=1)
         Hash Cond: (("outer".car_id)::text = ("inner".car_id)::text)
         ->  Index Scan using cars_car_id_btree_index on cars c  (cost=0.00..506.87 rows=9401 width=37) (actual time=0.182..37.306 rows=9401 loops=1)
         ->  Hash  (cost=102491.73..102491.73 rows=4058 width=62) (actual time=593.040..593.040 rows=0 loops=1)
               ->  Nested Loop  (cost=0.00..102491.73 rows=4058 width=62) (actual time=146.647..551.975 rows=10285 loops=1)
                     ->  Index Scan using zip_zips_index on zips z  (cost=0.00..1272.69 rows=516 width=52) (actual time=146.378..153.767 rows=718 loops=1)
                           Filter: ((state)::text = 'WA'::text)
                     ->  Index Scan using zip_quotes_index on quotes q  (cost=0.00..195.55 rows=48 width=27) (actual time=0.044..0.464 rows=14 loops=718)
                           Index Cond: (("outer".zip)::text = (q.zip)::text)
 Total runtime: 898.438 ms

Again a d

Re: [PERFORM] A question on the query planner

2003-12-03 Thread Tom Lane
> Hmmm ... [squints] ... it's not supposed to do that ...

The attached patch seems to make it better.

regards, tom lane
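
(Context for the patch below: the problem shows up when a join compares
columns whose types are merely binary-compatible, e.g. varchar columns
compared via text operators, as the ::text relabelings in Jared's plans
show.  A minimal reproduction sketch with hypothetical tables:)

create table cars   (car_id varchar(32));
create table quotes (car_id varchar(32), date_time timestamp);
-- varchar has no comparison operators of its own, so the planner inserts
-- RelabelType nodes to treat the columns as text -- the case patched below
explain select * from cars c, quotes q where c.car_id = q.car_id;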


Index: src/backend/optimizer/path/costsize.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/optimizer/path/costsize.c,v
retrieving revision 1.115
diff -c -r1.115 costsize.c
*** src/backend/optimizer/path/costsize.c	5 Oct 2003 22:44:25 -0000	1.115
--- src/backend/optimizer/path/costsize.c	3 Dec 2003 17:40:58 -0000
***************
*** 1322,1327 ****
--- 1322,1331 ----
float4 *numbers;
int nnumbers;
  
+   /* Ignore any binary-compatible relabeling */
+   if (var && IsA(var, RelabelType))
+   var = (Var *) ((RelabelType *) var)->arg;
+ 
/*
 * Lookup info about var's relation and attribute; if none available,
 * return default estimate.
Index: src/backend/optimizer/path/pathkeys.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/optimizer/path/pathkeys.c,v
retrieving revision 1.53
diff -c -r1.53 pathkeys.c
*** src/backend/optimizer/path/pathkeys.c	4 Aug 2003 02:40:00 -0000	1.53
--- src/backend/optimizer/path/pathkeys.c	3 Dec 2003 17:40:58 -0000
***************
*** 25,36 ****
  #include "optimizer/tlist.h"
  #include "optimizer/var.h"
  #include "parser/parsetree.h"
  #include "parser/parse_func.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  
  
! static PathKeyItem *makePathKeyItem(Node *key, Oid sortop);
  static List *make_canonical_pathkey(Query *root, PathKeyItem *item);
  static Var *find_indexkey_var(Query *root, RelOptInfo *rel,
  AttrNumber varattno);
--- 25,37 ----
  #include "optimizer/tlist.h"
  #include "optimizer/var.h"
  #include "parser/parsetree.h"
+ #include "parser/parse_expr.h"
  #include "parser/parse_func.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  
  
! static PathKeyItem *makePathKeyItem(Node *key, Oid sortop, bool checkType);
  static List *make_canonical_pathkey(Query *root, PathKeyItem *item);
  static Var *find_indexkey_var(Query *root, RelOptInfo *rel,
  AttrNumber varattno);
***************
*** 41,50 ****
   *create a PathKeyItem node
   */
  static PathKeyItem *
! makePathKeyItem(Node *key, Oid sortop)
  {
PathKeyItem *item = makeNode(PathKeyItem);
  
item->key = key;
item->sortop = sortop;
return item;
--- 42,70 ----
   *create a PathKeyItem node
   */
  static PathKeyItem *
! makePathKeyItem(Node *key, Oid sortop, bool checkType)
  {
PathKeyItem *item = makeNode(PathKeyItem);
  
+   /*
+* Some callers pass expressions that are not necessarily of the same
+* type as the sort operator expects as input (for example when dealing
+* with an index that uses binary-compatible operators).  We must relabel
+* these with the correct type so that the key expressions will be seen
+* as equal() to expressions that have been correctly labeled.
+*/
+   if (checkType)
+   {
+   Oid lefttype,
+   righttype;
+ 
+   op_input_types(sortop, &lefttype, &righttype);
+   if (exprType(key) != lefttype)
+ 			key = (Node *) makeRelabelType((Expr *) key,
+ 										   lefttype, -1,
+ 										   COERCE_DONTCARE);
+   }
+ 
item->key = key;
item->sortop = sortop;
return item;
***************
*** 70,78 ****
  {
Expr   *clause = restrictinfo->clause;
  	PathKeyItem *item1 = makePathKeyItem(get_leftop(clause),
! 										 restrictinfo->left_sortop);
  	PathKeyItem *item2 = makePathKeyItem(get_rightop(clause),
! 										 restrictinfo->right_sortop);
  	List	   *newset,
  			   *cursetlink;
  
--- 90,100 ----
  {
Expr   *clause = restrictinfo->clause;
  	PathKeyItem *item1 = makePathKeyItem(get_leftop(clause),
! 										 restrictinfo->left_sortop,
! 										 false);
  	PathKeyItem *item2 = makePathKeyItem(get_rightop(clause),
! 										 restrictinfo->right_sortop,
! 										 false);
  	List	   *newset,
   *cu