On Thu, Sep 01, 2005 at 06:05:43PM -0400, Ron wrote:
> > Selection from the database is, hence the indexes.
> A DB _without_ indexes that fits into RAM during ordinary operation
> may actually be faster than a DB _with_ indexes that does
> not. Fitting the entire DB into RAM during ordinary operation if at
> all possible should be the first priority with a small data mine-like
> application such as you've described.
That makes sense.
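As a sanity check on that, something like the following should give a rough
idea of the sizes of the biggest relations (approximate, since relpages is
only refreshed by VACUUM/ANALYZE, but it needs nothing beyond the 7.4
system catalogs; pages are 8KB):

    SELECT relname, relpages * 8 / 1024 AS approx_mb
    FROM pg_class
    ORDER BY relpages DESC
    LIMIT 10;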
> Also normalization is _not_ always a good thing for data mining like
> apps. Having most or everything you need in one place in a compact
> and regular format is usually more effective for data mines than "Nth
> Order Normal Form" optimization to the degree usually found in
> textbooks using OLTP-like examples.
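Point taken. As a sketch of what that might look like here (the join to a
city table with id/name columns is purely illustrative), the search data
could be flattened into a single table so the queries never need a join:

    CREATE TABLE address_flat AS
    SELECT a.street, a.locality_1, a.locality_2, c.name AS city
    FROM address a
    JOIN city c ON c.id = a.city_id;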
> >Ok, I did try 8.0 when I started this and found that the server bind
> >parameters (both via DBD::Pg (with pg_prepare_server => 1) and via JDBC
> >(various versions I tried)) failed - the parameters were clearly not
> >being substituted. This was Postgresql 8.0 from Debian unstable. That
> >was a couple of weeks ago and I've not been back to check whether it's
> >been fixed. Anyway, because of these problems I dropped back to 7.4.
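Incidentally, server-side parameters can be exercised from plain psql,
independent of DBD::Pg or JDBC (PREPARE has been in the server since 7.3),
which should show whether the problem is in the server or the drivers:

    PREPARE addr_q(text) AS
      SELECT street, city FROM address WHERE city = $1;
    EXECUTE addr_q('Nottingham');
    DEALLOCATE addr_q;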
> Since I assume you are not going to run anything with the string
> "unstable" in its name in production (?!), why not try a decent
> production-ready distro like SUSE 9.x and see how pg 8.0.3 runs on an
> OS more representative of what you are likely (or at least what is
> safe...) to run in production?
Well, you see, as ever, it's a bit complicated. The company I'm doing
the development for has been subcontracted to do it and the contractor was
contracted by the actual "client". So there are two companies involved
in addition to the "client". Sadly, the "client" actually has dictated
things like "it will be deployed on FreeBSD and thou shalt not argue".
At this point in time, I actually have very little information about the
specification of the boxen that'll be running this application. This is
something I'm hoping to solve very soon. The worst part of it is that
I'm not going to have direct (ssh) access to the box and all configuration
changes will most likely have to be relayed through techies at the
"client" so fine tuning this is going to be a veritable nightmare.
> >> > I need to get to the stage where I can run queries such as:
> >> > select street, locality_1, locality_2, city from address
> >> > where (city = 'Nottingham' or locality_2 = 'Nottingham'
> >> > or locality_1 = 'Nottingham')
> >> > and upper(substring(street from 1 for 1)) = 'A'
> >> > group by street, locality_1, locality_2, city
> >> > order by street
> >> > limit 20 offset 0
> >> This might be a lot quicker than pulling all the records like in
> >> your example queries...
> >Yes, that certainly does seem to be the case - around 4 seconds. But I
> >need it to be 10 times faster (or thereabouts) otherwise I have big
> *beats drum* Get it in RAM, Get it in RAM, ...
Ok, but I currently have 2 million rows. When this launches in a couple
of weeks, it'll launch with 5 million+ and then gain > a million a year.
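As an aside, since 7.4 supports indexes on expressions, the first-letter
filter in the query above could in principle use something like:

    CREATE INDEX address_street_initial
      ON address (upper(substring(street from 1 for 1)));

though I'd want to check with EXPLAIN that the planner actually matches
the expression before relying on it.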
I think the upshot of all this is that my immediate plan is a minimum of
4GB RAM, plus judicious use of normalization to avoid the more expensive
string comparisons and reduce table size (along with proper configuration).
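On the configuration front, rough starting values for a 4GB box on 7.4
might be something like this (entirely hypothetical until I know the real
hardware; shared_buffers and effective_cache_size are in 8KB pages,
sort_mem is in KB):

    shared_buffers = 50000          # ~400MB
    effective_cache_size = 350000   # ~2.7GB assumed in the OS cache
    sort_mem = 16384                # 16MB per sort operation
    random_page_cost = 2            # lower than the default 4 if mostly cached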