Re: [PERFORM] Minimum hardware requirements for Postgresql db

2003-12-03 Thread Greg Stark

scott.marlowe [EMAIL PROTECTED] writes:

  3) Estimated number of transactions to be written into the Postgresql db is
  around 15000 records per day.
  
  The growth rate in terms of number of connections is around 10% per year
  and the data retention is kept on average at least for 18 months for the 2
  databases.

 Like another poster pointed out, this is a walk in the park for 
 postgresql.  My workstation (1.1GHz celeron, 40 gig IDE drive, 512 Meg 
 memory) could handle this load while still being my workstation.

Well there's some info missing. Like what would you actually be _doing_ with
these data?

15,000 inserts per day is nothing. But after 18 months that's over 5M records
not including the 10% growth rate. 5M records isn't really all that much but
it's enough that it's possible to write slow queries against it.

If you're doing big batch updates or complex reports against the data that
will be more interesting than the inserts.

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Update performance ... Recommended configuration changes?

2003-12-03 Thread erik

Thanks to Greg Stark, Tom Lane and Stephan Szabo for their advice on 
rewriting my query... the revised query plan claims it should only take 
about half the time my original query did.

Now for a somewhat different question:  How might I improve my DB 
performance by adjusting the various parameters in postgresql.conf and 
kernel config?  Again, TKA.

Here's what I've currently got (hardware, kernel config. and 
postgresql.conf)

Hardware: Mac iBook, G3 900Mhz, 640MB memory (This is my research machine :p 
)
OS:  OS X 10.2.6
Postgresql version: 7.3.2
Kernel Config:
sysctl -w kern.sysv.shmmax=4194304
sysctl -w kern.sysv.shmmin=1
sysctl -w kern.sysv.shmmni=32
sysctl -w kern.sysv.shmseg=8
sysctl -w kern.sysv.shmall=1024

= Snip of postgresql.conf =

#
#   Shared Memory Size
#
shared_buffers = 128# min max_connections*2 or 16, 8KB each
max_fsm_relations = 2000# min 10, fsm is free space map, ~40 bytes
max_fsm_pages = 2   # min 1000, fsm is free space map, ~6 bytes
max_locks_per_transaction = 128 # min 10
wal_buffers = 16# min 4, typically 8KB each
#
#   Non-shared Memory Sizes
#
sort_mem = 65535# min 64, size in KB
vacuum_mem = 8192   # min 1024, size in KB

#
#   Write-ahead log (WAL)
#
#checkpoint_segments = 3# in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300   # range 30-3600, in seconds
#
#commit_delay = 0   # range 0-10, in microseconds
#commit_siblings = 5# range 1-1000
#
fsync = false
#wal_sync_method = fsync# the default varies across platforms:
#   # fsync, fdatasync, open_sync, or open_datasync
#wal_debug = 0  # range 0-16

== End Snip ===

Saludos,
Erik Norvelle



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Update performance ... Recommended configuration

2003-12-03 Thread Rod Taylor
 shared_buffers = 128# min max_connections*2 or 16, 8KB each

Try 1500.

 sort_mem = 65535# min 64, size in KB

I'd pull this in. You only have 640MB ram, which means about 8 large
sorts to swap.

How about 16000?

 fsync = false

I presume you understand the risks involved with this setting and
dataloss.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] Has anyone run on the new G5 yet

2003-12-03 Thread Sean Shanny
To all,

We are building a data warehouse composed of essentially click stream 
data.  The DB is growing fairly quickly as to be expected, currently at 
90GB for one months data.  The idea is to keep 6 months detailed data on 
line and then start aggregating older data to summary tables.  We have 2 
fact tables currently, one with about 68 million rows and the other with 
about 210 million rows.  Numerous dimension tables ranging from a dozen 
rows to millions.

We are currently running on a Dell 2650 with 2 Xeon 2.8 processors in 
hyper-threading mode, 4GB of ram, and 5 SCSI drives in a RAID 0, Adaptec 
PERC3/Di,  configuration.  I believe they are 10k drives.  Files system 
is EXT3. We are running RH9 Linux kernel 2.4.20-20.9SMP with bigmem 
turned on.  This box is used only for the warehouse.  All the ETL work 
is done on this machine as well.  DB version is postgreSQL 7.4.

We are running into issues with IO saturation obviously.  Since this 
thing is only going to get bigger we are looking for some advice on how 
to accommodate DB's of this size.

First question is do we gain anything by moving the RH Enterprise 
version of Linux in terms of performance, mainly in the IO realm as we 
are not CPU bound at all?  Second and more radical, has anyone run 
postgreSQL on the new Apple G5 with an XRaid system?  This seems like a 
great value combination.  Fast CPU, wide bus, Fibre Channel IO, 2.5TB 
all for ~17k.

I keep see references to terabyte postgreSQL installations, I was 
wondering if anyone on this list is in charge of one of those and can 
offer some advice on how to position ourselves hardware wise.

Thanks.

--sean

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


[PERFORM] sequence overhead

2003-12-03 Thread Robert Treat
Just wondering if anyone has done any testing on the amount of overhead
for insert you might gain by adding a serial column to a table. I'm 
thinking of adding a few to some tables that get an average of 30 - 40
inserts per second, sometimes bursting over 100 inserts per second and
wondering if there will be any noticeable impact. 

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Has anyone run on the new G5 yet

2003-12-03 Thread Fred Moyer
 We are running into issues with IO saturation obviously.  Since this
 thing is only going to get bigger we are looking for some advice on 
 how to accommodate DB's of this size.
snip
 Second and more radical, has anyone run 
 postgreSQL on the new Apple G5 with an XRaid system?  This seems like 
 a great value combination.  Fast CPU, wide bus, Fibre Channel IO, 
 2.5TB all for ~17k.
snip
If you are going for I/O performance you are best off with one of the
Xserve competitors listed at http://www.apple.com/xserve/raid/.  The
Xserve is based on IDE drives which have a lower seek time (say 8.9 ms)
compared to scsi (3.6 ms for seagate cheetah).  For small random
read/write operations (like databases) this will give you a noticable
improvement in performance over ide drives.  Also make sure to get as
many drives as possible, more spindles equals better I/O performance.

 I keep see references to terabyte postgreSQL installations, I was
 wondering if anyone on this list is in charge of one of those and can 
 offer some advice on how to position ourselves hardware wise.

I've gone to about half terabyte size and all I can say is you should
plan for at least one quarter to one half a rack of drivespace (assuming
14 drives per 4u that's 42 to 84 drives).  Do yourself a favor and get
more rather than less, you will really appreciate it.  I averaged about
2 mb/s average per drive via the raid controller stats on 14 drive array
during I/O bound seek and update operations in 2 raid 10 arrays (half
xlogs and half data).  That comes out to around 2 hours for a terabyte
with 70 drives assuming a constant scaling.  You may be able to get more
or less depending on your setup and query workload.

 Thanks.

 --sean


 ---(end of 
 broadcast)---
 TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] A question on the query planner

2003-12-03 Thread Bruce Momjian
Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  Tom Lane [EMAIL PROTECTED] writes:
  Define no longer works well.
 
  Well it seems to completely bar the use of a straight merge join between two
  index scans:
 
 Hmmm ... [squints] ... it's not supposed to do that ... [digs] ... yeah,
 there's something busted here.  Will get back to you ...

LOL, but I am not sure why.  :-)

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Has anyone run on the new G5 yet

2003-12-03 Thread Gaetano Mendola
Sean Shanny wrote:

We are currently running on a Dell 2650 with 2 Xeon 2.8 processors in 
hyper-threading mode, 4GB of ram, and 5 SCSI drives in a RAID 0, Adaptec 
PERC3/Di,  configuration.  I believe they are 10k drives.  Files system 
is EXT3. We are running RH9 Linux kernel 2.4.20-20.9SMP with bigmem 
turned on.  This box is used only for the warehouse.  All the ETL work 
is done on this machine as well.  DB version is postgreSQL 7.4.
Are you experiencing improvment using the hyper-threading ?

Regards
Gaetano Mendola
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster