Re: [PERFORM] Reading data in bulk - help?

2003-09-10 Thread Dennis Bjorklund
On Wed, 10 Sep 2003, Chris Huston wrote:

 Sometime before then I need to figure out how to cut the fetch times 
 from the now 200ms to something like 10ms.

You didn't say anything about Josh's first point of adjusting
postgresql.conf to match your machine. Settings like effective_cache_size
almost always need to be increased from the default, and the same goes for
the shared memory settings (shared_buffers).
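
As a sketch of the shared-memory side: raising shared_buffers past the
kernel's default shared memory limit means raising SHMMAX as well, or the
postmaster won't start. Illustrative commands, assuming Linux or Mac OS X
and an arbitrary 128MB limit:

    # Linux
    sysctl -w kernel.shmmax=134217728
    # Mac OS X (normally has to be set at boot, e.g. via /etc/sysctl.conf)
    sysctl -w kern.sysv.shmmax=134217728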

-- 
/Dennis




Re: [PERFORM] Reading data in bulk - help?

2003-09-10 Thread William Yu
1) Memory - clumsily adjusted shared_buffer - tried three values: 64, 
128, 256 with no discernible change in performance. Also adjusted, 
clumsily, effective_cache_size to 1000, 2000, 4000 - with no discernible 
change in performance. I looked at the Admin manual and googled around 
for how to set these values and I confess I'm clueless here. I have no 
idea how many kernel disk page buffers are used nor do I understand what 
the shared memory buffers are used for (although the postgresql.conf 
file hints that it's for communication between multiple connections). 
Any advice or pointers to articles/docs is appreciated.

The standard rule of thumb is 1/4 of your memory for shared_buffers. Since
shared_buffers is counted in 8 KB pages, the easiest way to calculate it is
###MB / 32 * 1000. E.g. if you have 256MB of memory, your shared_buffers
should be about 256 / 32 * 1000 = 8000.

The memory you have left over should be accounted for as OS cache via the
effective_cache_size setting. I usually just multiply the shared_buffers
value by 3 on systems with a lot of memory. With less memory, the OS,
Postgres, etc. take up a larger percentage of it, so a multiplier of 2 or
2.5 is more accurate.
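
Written out as postgresql.conf lines for that 256MB example (both settings
are counted in 8 KB pages; the numbers below just apply the rule of thumb
above, they aren't measured values):

    shared_buffers = 8000          # ~64MB, about 1/4 of RAM
    effective_cache_size = 20000   # ~156MB, about 2.5x shared_buffers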



Re: [PERFORM] Reading data in bulk - help?

2003-09-10 Thread Josh Berkus
Chris,

 1) Memory - clumsily adjusted shared_buffer - tried three values: 64,
 128, 256 with no discernible change in performance. Also adjusted,
 clumsily, effective_cache_size to 1000, 2000, 4000 - with no
 discernible change in performance. I looked at the Admin manual and
 googled around for how to set these values and I confess I'm clueless
 here. I have no idea how many kernel disk page buffers are used nor do
 I understand what the shared memory buffers are used for (although
 the postgresql.conf file hints that it's for communication between
 multiple connections). Any advice or pointers to articles/docs is
 appreciated.

You want values *much* higher than that.   How much RAM do you have?  See:
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

For example, if you have 512MB RAM, I'd crank up the shared_buffers to 8000,
the sort_mem to 8MB, and the effective_cache_size to 24,000.
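
As postgresql.conf lines, those suggestions would look roughly like this
(shared_buffers and effective_cache_size are counted in 8 KB pages, sort_mem
in KB per sort):

    shared_buffers = 8000           # ~64MB
    sort_mem = 8192                 # 8MB per sort operation
    effective_cache_size = 24000    # ~190MB assumed available as OS cache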

 3) RAID - haven't tried it - but I'm guessing that the speed
 improvement from a RAID 5 may be on the order of 10x

Probably not ... more like 1.5x - 2.0x, but that's still a significant help,
yes?  Also, the advantage will increase as your data grows.

  - which I can
 likely get from using something like HDF. 

HDF sucks for I/O speed.  XServe will become a much more significant option
in the market when Apple can bring themselves to abandon HDF and adopt XFS
or something.  This is part of your problem.

 Since the data is unlikely to
 grow beyond 10-20gig, a fast drive and firewire ought to give me the
 performance I need.

Not sure about that.  Is Firewire really faster for I/O than modern SCSI or
233MHz ATA?  I don't do much Mac anymore, but I had the impression that
Firewire was mainly for peripherals.

What is important for your app in terms of speed is to get the data coming 
from multiple drives over multiple channels.   Were it a PC, I'd recommend a 
motherboard with 4 IDE channels or Serial ATA, and spreading the data over 4 
drives via RAID 0 or RAID 5, and adding dual processors.  Then you could use 
multiple postgres connections to read different parts of the table 
simultaneously.
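
For instance, a hypothetical way to split the scan across connections (the
table and column names here are invented, since the actual schema isn't
shown): each of four clients opens its own connection and reads one slice,
so the I/O can overlap across drives:

    -- connection 1
    SELECT * FROM samples WHERE set_id % 4 = 0;
    -- connection 2
    SELECT * FROM samples WHERE set_id % 4 = 1;
    -- ...and likewise for connections 3 and 4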

  I know experimentally that the current machine can
 sustain a 20MB/s transfer rate which is 20-30x the speed of these
 queries.

That is interesting.  Adjust your postgresql.conf and see what results you
get.  It's possible that PostgreSQL is convinced that you have little or no
RAM because of your .conf settings, and is spilling work to temp files on
disk.
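
A quick way to check what the running server actually believes, and how a
typical fetch is being executed (the table and column in the EXPLAIN are
placeholders):

    SHOW shared_buffers;
    SHOW sort_mem;
    SHOW effective_cache_size;
    EXPLAIN ANALYZE SELECT * FROM samples WHERE set_id = 42;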

 4) I'd previously commented out the output/writing steps from the app -
 to isolate read performance.

OK.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Reading data in bulk - help?

2003-09-09 Thread Josh Berkus
Chris,

 I've got an application that needs to chunk through ~2GB of data. The 
 data is ~7000 different sets of 300 records each. I put all of the data 
 into a postgres database but that doesn't look like it's going to work 
 because of how the data lives on the disk.

Your problem is curable through 4 steps:

1) adjust your postgresql.conf to appropriate levels for memory usage.

2) if those sets of 300 are blocks in some contiguous order, then cluster them 
to force their physical ordering on disk to be the same order you want to 
read them in (see the CLUSTER sketch after step 4).   This will require you 
to re-cluster whenever you change a significant number of records, but from 
the sound of it that happens in batches.

3) Get better disks, preferably a RAID array, or just very fast SCSI if the 
database is small.   If you're budget-constrained, Linux software RAID (or 
BSD RAID) on IDE disks is cheap.  What kind of RAID depends on what else 
you'll be doing with the app; RAID 5 is better for read-only access, RAID 1+0 
is better for read-write.

4) Make sure that you aren't dumping the data to the same disk PostgreSQL 
lives on!   Preferably, make sure that your swap partition is on a different 
disk/array from PostgreSQL.   If the computing app is complex and requires 
disk reads aside from the postgres data, you should make sure that it lives 
on yet another disk.   Or you can simplify this with a good, really large 
multi-channel RAID array.
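
To illustrate step 2, a hypothetical CLUSTER invocation (the table and index
names are placeholders; the index should be on whatever key the sets are
fetched by):

    -- PostgreSQL 7.x syntax: physically reorder the table along an index
    CLUSTER samples_set_id_idx ON samples;
    -- CLUSTER is a one-time reorder, so repeat it after large batch changes
    -- (later releases spell this: CLUSTER samples USING samples_set_id_idx)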

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco

