On Wed, 2005-08-17 at 21:21 -0500, John A Meinel wrote:
> Jeremiah Jahn wrote:
> > I just put together a system with 6GB of ram on a 14 disk raid 10 array.
> > When I run my usual big painful queries, I get very little to know
> > memory usage. My production box (raid 5 4GB ram) hovers at 3.9GB used
> > most of the time. the new devel box sits at around 250MB.
> >
> > I've switched to an 8.0 system on the new devel box, but the .conf
> > really didn't change. Index usage is the same. Something seems wrong and
> > I'm not sure why.
> >
> How big is your actual database on disk? And how much of it is actually
> touched by your queries?
The DB is about 60GB. About 10GB is actually used in real queries,
versus get me this single record with this ID. I have a large query that
finds court cases based on certain criteria that is name based. I get a
full seq scan on the name table in about 7 seconds, This table has about
6 million names (most being 'smith, something'). The index scan takes
much less time of course, once it's been cached (somewhere but not
apparently memory). The really query can take 60 seconds on a first run.
And 1.3 seconds on a second run. I'm very happy with the cached results,
just not really sure where that caching is happening since it doesn't
show up as memory usage. I do know that the caching that happens seems
to be independent of the DB. I can restart the DB and my speeds are
still the same as the cached second query. Is there some way to
pre-cache some of the tables/files on the file system? If I switch my
query to search for 'jones%' instead of 'smith%', I take a hit. But if I
then rerun the smith search, I still get cached speed. I only have two
tables essentially names and events that have to do any real work ie.
not very atomic data. I'd love to be able to force these two tables into
a cache somewhere. This is a linux system (RHEL ES4) by the way. 
> It seems that your tough queries might only be exercising a portion of
> the database. If you really want to make memory usage increase try
> something like:
> find . -type f -print0 | xargs -0 cat >/dev/null
> Which should read all the files. After doing that, does the memory usage
> increase?
> >
> > any thoughts,
> > -jj-
> >
> >
> > shared_buffers = 32768          # min 16, at least max_connections*2, 8KB 
> > each
> > work_mem = 2097151              # min 64, size in KB
> This seems awfully high. 2GB Per sort? This might actually be flushing
> some of your ram, since it would get allocated and filled, and then
> freed when finished. Remember, depending on what you are doing, this
> amount can get allocated more than once per query.
What's a good way to determine the optimal size?

> > maintenance_work_mem = 819200   # min 1024, size in KB
> > max_fsm_pages = 80000           # min max_fsm_relations*16, 6 bytes each
> > checkpoint_segments = 30        # in logfile segments, min 1, 16MB each
> > effective_cache_size = 3600000   <-----this is a little out of control, but 
> > would it have any real effect?
> It should just tell the planner that it is more likely to have buffers
> in cache, so index scans are slightly cheaper than they would otherwise be.
> > random_page_cost = 2            # units are one sequential page fetch cost
> > log_min_duration_statement = 10000 # -1 is disabled, in milliseconds.
> > lc_messages = 'C'               # locale for system error message strings
> > lc_monetary = 'C'               # locale for monetary formatting
> > lc_numeric = 'C'                # locale for number formatting
> > lc_time = 'C'                   # locale for time formatting
> >
> John
> =:->
"Now this is a totally brain damaged algorithm.  Gag me with a
                -- P. Buhr, Computer Science 354

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not

Reply via email to