Re: [PERFORM] Reading data in bulk - help?
On Wed, 10 Sep 2003, Chris Huston wrote:

> Sometime before then I need to figure out how to cut the fetch times from
> the now 200ms to something like 10ms.

You didn't say anything about Josh's first point of adjusting postgresql.conf to match your machine. Settings like effective_cache_size you almost always want to increase from the default, and the same goes for the shared memory setting (shared_buffers).

--
/Dennis
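For reference, the values the server is actually running with can be checked from psql with SHOW; a minimal sketch, assuming a 7.x-era server (the defaults noted in the comments are the stock 7.x defaults, not Chris's actual values):

    SHOW shared_buffers;         -- 7.x default is 64 (8KB pages, i.e. only 512KB)
    SHOW effective_cache_size;   -- default 1000 pages, i.e. the planner assumes ~8MB of OS cache
    SHOW sort_mem;               -- per-sort working memory, in KB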
Re: [PERFORM] Reading data in bulk - help?
> 1) Memory - clumsily adjusted shared_buffers - tried three values: 64, 128,
> 256 with no discernible change in performance. Also adjusted, clumsily,
> effective_cache_size to 1000, 2000, 4000 - with no discernible change in
> performance. I looked at the Admin manual and googled around for how to set
> these values and I confess I'm clueless here. I have no idea how many kernel
> disk page buffers are used nor do I understand what the shared memory
> buffers are used for (although the postgresql.conf file hints that it's for
> communication between multiple connections). Any advice or pointers to
> articles/docs is appreciated.

The standard procedure is to give about 1/4 of your memory to shared_buffers. Since shared_buffers is counted in 8KB pages, the easiest way to calculate it is roughly ###MB / 32 * 1000. E.g. if you have 256MB of memory, your shared_buffers should be about 256 / 32 * 1000 = 8000 (roughly 64MB). The memory you have left over should be accounted for as OS cache via the effective_cache_size setting. On systems with a lot of memory I usually just multiply the shared_buffers value by 3; with less memory, the OS, Postgres, etc. take up a larger percentage, so a factor of 2 or 2.5 would be more accurate.
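Worked out for a hypothetical 1GB machine (these numbers are just the rule of thumb above applied to an assumed amount of RAM, not settings taken from the thread), the postgresql.conf entries would look roughly like:

    # hypothetical 1GB machine, 7.x-style settings (both values are counts of 8KB pages)
    shared_buffers       = 32000    # 1024 / 32 * 1000, ~256MB, about 1/4 of RAM
    effective_cache_size = 96000    # shared_buffers * 3, ~750MB of assumed OS cache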
Re: [PERFORM] Reading data in bulk - help?
Chris,

> 1) Memory - clumsily adjusted shared_buffers - tried three values: 64, 128,
> 256 with no discernible change in performance. Also adjusted, clumsily,
> effective_cache_size to 1000, 2000, 4000 - with no discernible change in
> performance. I looked at the Admin manual and googled around for how to set
> these values and I confess I'm clueless here. I have no idea how many kernel
> disk page buffers are used nor do I understand what the shared memory
> buffers are used for (although the postgresql.conf file hints that it's for
> communication between multiple connections). Any advice or pointers to
> articles/docs is appreciated.

You want values *much* higher than that. How much RAM do you have? See:
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

For example, if you have 512MB RAM, I'd crank up the shared_buffers to 8000, the sort_mem to 8MB, and the effective_cache_size to 24,000.

> 3) RAID - haven't tried it - but I'm guessing that the speed improvement
> from a RAID 5 may be on the order of 10x

Probably not ... more like 1.5x - 2.0x, but that's still a significant help, yes? Also, the advantage will get better the more your data grows.

> - which I can likely get from using something like HDF.

HDF sucks for I/O speed. XServe will become a much more significant option in the market when Apple can bring themselves to abandon HDF and adopt XFS or something. This is part of your problem.

> Since the data is unlikely to grow beyond 10-20gig, a fast drive and
> firewire ought to give me the performance I need.

Not sure about that. Is Firewire really faster for I/O than modern SCSI or 233MHz ATA? I don't do much Mac anymore, but I had the impression that Firewire was mainly for peripherals.

What is important for your app in terms of speed is to get the data coming from multiple drives over multiple channels. Were it a PC, I'd recommend a motherboard with 4 IDE channels or Serial ATA, spreading the data over 4 drives via RAID 0 or RAID 5, and adding dual processors. Then you could use multiple postgres connections to read different parts of the table simultaneously.

> I know experimentally that the current machine can sustain a 20MB/s
> transfer rate which is 20-30x the speed of these queries.

That is interesting. Adjust your postgresql.conf and see what results you get. It's possible that PostgreSQL is convinced that you have little or no RAM because of your .conf settings, and is swapping stuff to temp files on disk.

> 4) I'd previously commented out the output/writing steps from the app - to
> isolate read performance.

OK.

--
Josh Berkus
Aglio Database Solutions
San Francisco
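To make the "multiple connections reading different parts of the table" idea concrete, a minimal sketch follows; the table and column names (measurements, group_id) are made up for illustration, since the original post doesn't give the schema:

    -- hypothetical schema: ~7000 groups of ~300 rows each, keyed by group_id
    -- connection 1 (one psql session, or one thread in the app):
    SELECT * FROM measurements WHERE group_id BETWEEN 1 AND 3500;

    -- connection 2, run at the same time against the other half of the key range:
    SELECT * FROM measurements WHERE group_id BETWEEN 3501 AND 7000;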
Re: [PERFORM] Reading data in bulk - help?
Chris,

> I've got an application that needs to chunk through ~2GB of data. The data
> is ~7000 different sets of 300 records each. I put all of the data into a
> postgres database but that doesn't look like it's going to work because of
> how the data lives on the disk.

Your problem is curable through 4 steps:

1) Adjust your postgresql.conf to appropriate levels for memory usage.

2) If those sets of 300 are blocks in some contiguous order, then cluster them to force their physical ordering on disk to be the same order you want to read them in (sketch below). This will require you to re-cluster whenever you change a significant number of records, but from the sound of it that happens in batches.

3) Get better disks, preferably a RAID array, or just very fast SCSI if the database is small. If you're budget-constrained, Linux software RAID (or BSD RAID) on IDE disks is cheap. What kind of RAID depends on what else you'll be doing with the app; RAID 5 is better for read-only access, RAID 1+0 is better for read-write.

4) Make sure that you aren't dumping the data to the same disk PostgreSQL lives on! Preferably, make sure that your swap partition is on a different disk/array from PostgreSQL. If the computing app is complex and requires disk reads aside from postgres data, you should make sure that it lives on yet another disk. Or you can simplify this with a good, really large multi-channel RAID array.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
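A minimal sketch of step 2, assuming a 7.x-era server and made-up table/index names (the thread doesn't give the real schema):

    -- index on whatever key defines the sets of ~300 rows, then cluster on it
    CREATE INDEX measurements_group_idx ON measurements (group_id);
    CLUSTER measurements_group_idx ON measurements;   -- 7.x syntax: CLUSTER indexname ON tablename
    ANALYZE measurements;                             -- refresh planner statistics after the rewrite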