(Pardon my replying to two replies at once; I only get the digest and this was easier.)

Michael Stone wrote:
[...]
>> Well, that's what you'd expect.  But a first time 70MB fetch on a freshly
>> rebooted system took just as long as all secondary times.  (Took over a minute
>> to fetch, which is too long for my needs, at least on secondary attempts).


> If the query involves a table scan and the data set is larger than your
> available memory, you'll need a full scan every time. If you do a table
> scan and the table fits in RAM, subsequent runs should be faster. If you
> have an index and only need to look at a subset of the table, subsequent
> runs should be faster. Without knowing more about your queries it's not
> clear what your situation is.

I must amend my original statement: I'm not using a parameterized statement. The system is effectively fetching file content stored in the database, for portions of one or more files. It attempts to batch the records being fetched into as few non-parameterized queries as possible, while balancing that against the memory impact of the retrieved rowsets.

Currently that means it will request up to 16K records in a query
assembled from a combination of IN (recids...), BETWEEN ranges, and
UNION ALL for multiple file IDs.  I do this to minimize the latency of
dbclient/dbserver round trips, while at the same time capping the data
returned by a single DBIO at about 1.2MB per retrieved record set.
(I'm trying not to pound the Java app server's memory via JDBC.)
There's an ORDER BY on the file id column too.
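
For what it's worth, the generated SQL has roughly this shape.  The table and column names below are invented for illustration (the real statement is machine-generated), but the IN/BETWEEN/UNION ALL/ORDER BY combination is the same:

SELECT file_id, rec_id, rec_data
  FROM file_blocks
 WHERE file_id = 101
   AND (rec_id IN (3, 17, 42) OR rec_id BETWEEN 200 AND 4000)
UNION ALL
SELECT file_id, rec_id, rec_data
  FROM file_blocks
 WHERE file_id = 102
   AND rec_id BETWEEN 1 AND 12000
 ORDER BY file_id;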

It sounds like a simple enough thing to do, but this "pieces of many files in a database" problem is actually pretty hard to optimize. Fetching all records for all files, even though I don't need all of them, is both inefficient and likely to use too much memory. Fetching one file at a time is likely to result in too many queries (latency overhead). So right now I err on the side of large but record-limited queries. That lets me process many files in one query, unless the pieces of the files I need are substantial. (I've been burned by trying to use setFetchSize so many times it isn't funny; I don't count on it any more.)

An index is in place to assist with record selection: a composite index on file-id and record-id-within-the-file. I'll double-check that it's actually being used.
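
(Using the same invented names as the sketch above, the index is declared along these lines:)

CREATE INDEX file_blocks_file_rec_idx
    ON file_blocks (file_id, rec_id);   -- file id first, record id within the file second

A composite btree like that should be usable by a WHERE clause that pins file_id and applies an IN list or BETWEEN range to rec_id, which is what the generated queries do.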

------------------------


Greg Stark wrote:
[...]

> What is your shared_buffers setting? Perhaps you have it set way too high or
> way too low?

I generally run with the conservative installation default. I did some experimenting with larger values but didn't see any improvement (and yes, I restarted the postmaster). That testing was done a while ago, and I no longer have the numbers at hand, so I can't tell you what they were.
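
(Next time around I'll record the value up front, i.e. something like the following; the comment just notes how it gets changed:)

SHOW shared_buffers;
-- to change it, edit shared_buffers in postgresql.conf and restart the postmaster
-- (it can't be changed with a plain SET at runtime)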


> Also, you probably should post the "explain analyze" output of the actual
> query you're trying to optimize. Even if you're not looking for a better plan
> having hard numbers is better than guessing.

A good suggestion.  I'll look into it.
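
When I do, the plan is to run something like the following against one of the real generated queries and post the output (the query below is just a simplified stand-in for the actual UNION ALL batch). That should also show whether the composite index gets picked up.

EXPLAIN ANALYZE
SELECT file_id, rec_id, rec_data
  FROM file_blocks
 WHERE file_id = 101
   AND (rec_id IN (3, 17, 42) OR rec_id BETWEEN 200 AND 4000)
 ORDER BY file_id;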


> And the best way to tell if the data is cached is having a "vmstat 1" running
> in another window. Start the query and look at the bi/bo columns. If you see
> bi spike upwards then it's reading from disk.

Another good suggestion.

I'll look into gathering the additional data suggested above.

I'm also looking into getting a gig or two of RAM to make sure that isn't an issue.

My original post was really just to confirm that, all things being equal, there's no reason the disk I/O done on behalf of the database shouldn't be cached by the operating system/filesystem, so that repeated reads can benefit from the in-memory data.

