Re: [PERFORM] Reading data in bulk - help?

Chris Huston Wed, 10 Sep 2003 14:03:56 -0700

On Wednesday, Sep 10, 2003, at 11:16 America/Denver, Josh Berkus wrote:

What about explaining your disk setup? Or mentioning postgresql.conf? For somebody who wants help, you're ignoring a lot of advice and questions.

Personally, I'm not going to be of any further help until you report back on the other 3 of 4 options.

EEEK! Peace. Sorry I didn't include that info in the response.

1) Memory - clumsily adjusted shared_buffer - tried three values: 64, 128, 256 with no discernible change in performance. Also adjusted, clumsily, effective_cache_size to 1000, 2000, 4000 - with no discernible change in performance. I looked at the Admin manual and googled around for how to set these values and I confess I'm clueless here. I have no idea how many kernel disk page buffers are used nor do I understand what the "shared memory buffers" are used for (although the postgresql.conf file hints that it's for communication between multiple connections). Any advice or pointers to articles/docs is appreciated.

2) Clustering - tried it - definite improvement - thanks for the tip

3) RAID - haven't tried it - but I'm guessing that the speed improvement from a RAID 5 may be on the order of 10x - which I can likely get from using something like HDF. Since the data is unlikely to grow beyond 10-20gig, a fast drive and firewire ought to give me the performance I need. I know experimentally that the current machine can sustain a 20MB/s transfer rate which is 20-30x the speed of these queries. (If there's any concern about my enthusiasm for postgres - no worries - I've been very happy with it on several projects - it might not be the right tool for this kind of job - but I haven't come to that conclusion yet.)

4) I'd previously commented out the output/writing steps from the app - to isolate read performance.

On Wednesday, Sep 10, 2003, at 05:47 America/Denver, Magnus Naeslund(w) wrote:

How are you fetching the data? If you are using cursors, be sure to fetch a substatial bit at a time so that youre not punished by latency. I got a big speedup when i changed my original clueless code to fetch 64 rows in a go instead of only one.

That's an excellent question... I hadn't thought about it. I'm using a JDBC connection... I have no idea (yet) how the results are moving between postgres and the client app. I'm testing once with the app and the DB on the same machine (to remove network latency) and once with db/app on separate machines. However, I wonder if postgres is blocking on network io (even if it's the loopback interface) and not on disk?!

I'll definitely look into it. Maybe I'll try a loop in psql and see what the performance looks like. Thanks Magnus.

On Wednesday, Sep 10, 2003, at 07:05 America/Denver, Sean McCorkle wrote:

    I ended up solving the problem by going "retro" and using the
    quasi-database functions of unix and flat files:  grep, sort,
    uniq and awk.

That's an cool KISS approach. If I end up moving out of postgres I'll speed test this approach against HDF. Thanks.


This is a very helpful list,
- Chris


---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Re: [PERFORM] Reading data in bulk - help?

Reply via email to