At 06:22 PM 9/1/2005, Matthew Sackman wrote:
> On Thu, Sep 01, 2005 at 06:05:43PM -0400, Ron wrote:
> >
> > Since I assume you are not going to run anything with the string
> > "unstable" in its name in production (?!), why not try a decent
> > production-ready distro like SUSE 9.x and see how pg 8.0.3 runs on
> > an OS more representative of what you are likely (or at least what
> > is safe...) to run in production?
>
> Well, you see, as ever, it's a bit complicated. The company I'm doing
> the development for has been subcontracted to do it, and the
> contractor was contracted by the actual "client". So there are two
> companies involved in addition to the "client". Sadly, the "client"
> has actually dictated things like "it will be deployed on FreeBSD and
> thou shall not argue".
At least get them to promise they will use a release the BSD folks
mark "stable"!
> At this point in time, I actually have very little information about
> the specification of the boxen that'll be running this application.
> This is something I'm hoping to solve very soon. The worst part of it
> is that I'm not going to have direct (ssh) access to the box, and all
> configuration changes will most likely have to be relayed through
> techies at the "client", so fine-tuning this is going to be a
> veritable nightmare.
IME, what you have actually just said is "It will not be possible to
safely fine-tune the DB unless and until I have direct access, and/or
someone who does have direct access is correctly trained."
Ick.
> > >> > I need to get to the stage where I can run queries such as:
> > >> > select street, locality_1, locality_2, city from address
> > >> > where (city = 'Nottingham' or locality_2 = 'Nottingham'
> > >> > or locality_1 = 'Nottingham')
> > >> > and upper(substring(street from 1 for 1)) = 'A'
> > >> > group by street, locality_1, locality_2, city
> > >> > order by street
> > >> > limit 20 offset 0
> > >>
> > >> This might be a lot quicker than pulling all the records like in
> > >> your example queries...
> > >
> > > Yes, that certainly does seem to be the case - around 4 seconds.
> > > But I need it to be 10 times faster (or thereabouts), otherwise I
> > > have big problems!
> >
> > *beats drum* Get it in RAM, Get it in RAM, ...
> Ok, but I currently have 2 million rows. When this launches in a
> couple of weeks, it'll launch with 5 million+ and then gain more than
> a million a year.
At my previously mentioned optimum of 85B per row, 2M rows is 170MB and
5M rows is 425MB. Assuming a gain of 1M rows per year, that's +85MB per
year for this table.
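(If you want to sanity-check that estimate once real data is loaded,
the catalog already tracks it. A quick sketch, assuming the table is
named "address" as in your query, the default 8KB block size, and a
recent VACUUM/ANALYZE so the page count is current:

   -- approximate on-disk size of the table in MB
   select relname, relpages, relpages * 8 / 1024 as approx_mb
   from pg_class
   where relname = 'address';

Each index has its own pg_class row, so size those the same way.)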
DIMMs of up to 2GB are currently standard, and 4GB DIMMs are just
being introduced. Mainboards with anywhere from 4 to 16 DIMM slots
are widely available.
IOW, given the description you've provided, this DB should _always_
fit in RAM. Size the production system so that the entire DB fits in
RAM during ordinary operation, with an extra 1GB of RAM tossed on
initially as a safety measure, and the client will be upgrading the
HW because it's obsolete before they run out of room in RAM.
> I think the upshot of all this is that my immediate plan is 4GB RAM
> as a minimum, plus judicious use of normalization to avoid more
> expensive string comparisons and to reduce table size (along with
> proper configuration of pg).
My suggestion is only slightly different. Reduce table size(s) and
up the RAM to the point where the whole DB fits comfortably in RAM.
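For instance, on the normalization side, something like this might
work (a hypothetical sketch -- beyond the column names in your query,
the table and key names here are invented): pull the repeated place
names out into a lookup table and have the address rows carry small
integer keys instead of variable-length strings, so the hot
comparisons become integer compares:

   -- one row per distinct place name
   create table place (
       id   serial primary key,
       name text not null unique
   );

   -- address rows reference place by id instead of storing
   -- the city/locality strings over and over
   create table address_n (
       street     text not null,
       locality_1 int references place(id),
       locality_2 int references place(id),
       city       int not null references place(id)
   );

The 'Nottingham' test then becomes one lookup in place followed by
integer equality checks on address_n.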
You've got the rare opportunity to build a practical Memory Resident
Database. It should run like a banshee when you're done. I'd love
to see the benches on the final product.
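And as a starting point on the "proper configuration of pg" front,
the knobs that matter most for a RAM-resident DB on 8.0.x are the
buffer cache, the planner's estimate of the OS cache, and the free
space map. Strictly illustrative numbers for a 4GB box -- note that
in 8.0 these settings are counts of 8KB pages, not bytes, so check
the units before copying anything:

   # postgresql.conf -- illustrative starting values, not tuned ones
   shared_buffers = 50000          # ~400MB of pg buffer cache
   effective_cache_size = 350000   # ~2.7GB the OS cache can hold
   max_fsm_pages = 200000          # free space map headroom so space
                                   # from updated/deleted rows is reused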
Ron Peacetree