[PERFORM] How to handle a large DB and simultaneous accesses?

2006-01-10 Thread Charles A. Landemaine
Hello,

I have to develop a company search engine (it looks like the Yellow
Pages). We're using PostgreSQL at the company, and the initial DB is
2GB, as it holds companies from all over the world with a fair amount
of information about each.

What reading do you suggest so that we can develop the search engine
core in a way that keeps result pages showing up instantly, no matter
how heavy the load or how big the DB gets? The DB is 2GB now but
should grow to about 10GB within 2 years, and we expect 250,000
unique visitors per month by the end of the year.

Are there special techniques? Maybe there's a way to cache search
results? We're using PHP5 + phpAccelerator.
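
Something like a simple file cache of rendered result pages is
roughly what I have in mind (just a sketch; the directory, TTL and
helper name below are made up):

<?php
// Rough sketch of a file-based cache for search results.
// The cache directory, 10-minute TTL and function name are illustrative.
function cached_search($query, $fetch,
                       $cacheDir = '/tmp/search-cache', $ttl = 600)
{
    $file = $cacheDir . '/' . md5($query) . '.cache';

    // Serve the cached copy if it exists and is still fresh.
    if (file_exists($file) && (time() - filemtime($file)) < $ttl) {
        return unserialize(file_get_contents($file));
    }

    // Otherwise run the real query and remember the result.
    $result = $fetch($query);  // $fetch is whatever function hits PostgreSQL
    file_put_contents($file, serialize($result));
    return $result;
}
?>
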
Thanks,

--
Charles A. Landemaine.



Re: [PERFORM] How to handle a large DB and simultaneous accesses?

2006-01-10 Thread David Lang

On Tue, 10 Jan 2006, Charles A. Landemaine wrote:

> Are there special techniques? Maybe there's a way to cache search
> results? We're using PHP5 + phpAccelerator.

Frankly, that is a small enough amount of data compared to available
memory sizes that I think your best bet is to plan on having enough
RAM that you only touch the disks for writes and at boot time.
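
As a quick sanity check that the data plus indexes really fit, you
can ask the database itself. An untested sketch, assuming PHP's
pgsql extension and PostgreSQL 8.1's pg_database_size(); the
connection parameters are placeholders:

<?php
// Untested sketch: compare the on-disk database size (data plus
// indexes) to the RAM you plan to install.
$db = pg_connect('host=localhost dbname=companies user=web');

$res = pg_query($db, "SELECT pg_database_size(current_database())");
$row = pg_fetch_row($res);

printf("database is %.2f GB on disk\n", $row[0] / (1024 * 1024 * 1024));
// If this number (plus headroom for sorts, connections and the OS) is
// well under the installed RAM, the whole working set can stay cached.
?>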


A dual-socket Opteron system can hold 16GB with 2GB memory modules
(32GB once 4GB modules become readily available over the next couple
of years). That should be enough to keep your data and indexes in RAM
at all times. If you find that other system processes push the data
out of RAM, consider loading it from disk into a ramfs filesystem;
just make sure you don't update the RAM-only copy (or, if you do,
that you have replication set up from the RAM copy to a copy on real
disks somewhere). Depending on your load you could go with single-core
or dual-core chips, and the CPUs are a small enough cost compared to
this much RAM that you may as well go with dual-core.
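
If you do try the RAM-filesystem route, one way to wire it into
PostgreSQL 8.0 or later is a tablespace on a tmpfs mount. This is a
sketch only: the mount point and table name are made up, the mount
must already exist and be owned by the postgres user, and everything
on it disappears at reboot, so keep the authoritative copy on real
disks as described above.

<?php
// Sketch only: move a read-only copy of the hot table onto a
// tmpfs-backed tablespace. '/mnt/pgram' and 'companies_search'
// are placeholders.
$db = pg_connect('host=localhost dbname=companies user=postgres');

pg_query($db, "CREATE TABLESPACE ram_space LOCATION '/mnt/pgram'");
pg_query($db, "ALTER TABLE companies_search SET TABLESPACE ram_space");
?>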


Now, even with your data in RAM you can still slow down if your
queries, indexes, and other settings are wrong, but if performance is
important you should be able to essentially eliminate disk I/O for a
database of this size.
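
It is also worth running EXPLAIN ANALYZE on the main search queries
to make sure they use indexes rather than sequential scans. From PHP
that can be as simple as the sketch below (the query, table and
connection details are placeholders):

<?php
// Sketch: print the plan PostgreSQL actually chose for a search
// query, so sequential scans and missing indexes stand out.
$db = pg_connect('host=localhost dbname=companies user=web');

$plan = pg_query($db, "EXPLAIN ANALYZE
    SELECT name, city FROM companies WHERE lower(name) LIKE 'acme%'");

while ($row = pg_fetch_row($plan)) {
    echo $row[0], "\n";
}
?>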


David Lang
