>
>1. "Souped-up" DB server - Dual CPU, 4 GB RAM (min), RAID 5 or 10, 1-2
>NICs
>  
>
This is the 'fetcher' server?

This is your fetcher/crawler/indexer -- create the final segments here, then
move them to the search servers. That way, if a search server goes down, you
simply move its segments to another server.
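
To make the failover idea concrete, here is a minimal sketch of the
bookkeeping only. It assumes the indexer box keeps a master copy of every
segment and tracks which search server holds which one; the function name,
server names and segment names are hypothetical, and the actual file copy
(rsync, scp, whatever you use) is left out.

```python
def reassign_segments(assignments, failed, healthy):
    """Return a new server -> segments map with the failed server's
    segments spread over the least-loaded healthy servers.

    assignments: dict of server name -> list of segment names
    failed:      name of the server that went down
    healthy:     list of servers that can take over (must not include failed)
    """
    # Copy the map without the failed server so the input stays untouched.
    new_map = {srv: list(segs) for srv, segs in assignments.items() if srv != failed}
    for srv in healthy:
        new_map.setdefault(srv, [])

    # Hand each orphaned segment to whichever healthy server holds the fewest.
    for seg in assignments.get(failed, []):
        target = min(healthy, key=lambda srv: len(new_map[srv]))
        new_map[target].append(seg)
    return new_map


before = {"search1": ["seg-a", "seg-b"], "search2": ["seg-c"]}
after = reassign_segments(before, failed="search1", healthy=["search2", "search3"])
```

The indexer would then push each reassigned segment from its master copy to
the new server.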

>2. Basic Search Servers - Single/Dual CPU, Maximum RAM, Single IDE/SATA 
>drive (or 2 for redundancy)
>  
>
These are the 'fetched' segment backup and search servers?
If I have 10 million pages per server, is this estimate right: 2 KB * 10
million = 20 GB of RAM? Or is 10 GB enough to start, adding more later if
needed?

Actually, you'd want 20 GB of RAM only if you're trying to displace MSN as
the fastest search engine. Believe it or not, Lucene is EXTREMELY fast even
when reading from disk (who's the genius who wrote that software?). I would
keep about 4-8 million pages per server and give it about 1 GB of RAM per
million pages. Let the Linux file caching system do its magic. After the
first 20-30 searches, things should be pretty fast. Take a look at
filangy.com - search is pretty fast and we're hitting the disk. The only
drawback is that, from disk, we see things starting to slow down if more
than 5-6 searches happen simultaneously. That's 5-6 per second -- and we
usually improve on that by adding another server. Given that 1 GB sticks are
much cheaper than 2 GB sticks, you'll find that adding another cheap server
is cheaper than adding more RAM. And 2 GB sticks are supported only in more
high-end servers -- so cheap hardware cannot be used anymore.
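
As a rough worked example of the sizing rule above (4-8 million pages per
server, about 1 GB of RAM per million pages), here is a small sketch. The
function name and the 6 million default are my own assumptions, picked from
the middle of that range; treat the output as a starting point, not a
benchmark.

```python
import math


def size_search_tier(total_pages, pages_per_server=6_000_000, gb_per_million=1.0):
    """Estimate server count, pages per server and RAM per server.

    pages_per_server: assumed cap per box (the rule of thumb is 4-8 million)
    gb_per_million:   assumed RAM per million pages (rule of thumb: about 1 GB)
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    servers = math.ceil(total_pages / pages_per_server)
    # Spread the pages evenly across the servers we ended up with.
    pages_each = math.ceil(total_pages / servers)
    ram_gb = pages_each / 1_000_000 * gb_per_million
    return servers, pages_each, ram_gb


servers, pages_each, ram_gb = size_search_tier(10_000_000)
print(f"{servers} servers, {pages_each} pages each, ~{ram_gb:.1f} GB RAM each")
```

For 10 million pages this comes out to two cheap boxes with about 5 GB each,
rather than one box with 20 GB.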
  

>3. Basic Web Servers - Single/Dual CPU, Medium RAM
>  
>
In these boxes I will put 1-2 GB of RAM.
I would like to put Apache2 and mod_jk2 in front. Would that be a
bottleneck, or would it let me tune some things, such as caching of static
images, web pages, etc.? Or is it better to expose the Tomcats directly to
the web?


Go with Tomcat straight for now -- you don't want the search pages to take
the Apache/mod_jk2 hit every time. Later you can split the static pages off
into a separate site that can run on Apache. For loading images, set up a
separate URL, image.domain.com, and load them from there.
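
One way to keep the image host in a single place is a tiny helper that the
page templates call when they emit image links. This is only a sketch: the
helper name and the example path are hypothetical, and image.domain.com is
the placeholder host from above.

```python
from urllib.parse import urljoin

# Assumed placeholder host; swap in the real image site.
IMAGE_HOST = "http://image.domain.com/"


def image_url(path, image_host=IMAGE_HOST):
    """Build an absolute image URL on the separate image host."""
    # Strip the leading slash so the path is joined under the host root.
    return urljoin(image_host, path.lstrip("/"))


logo = image_url("/logos/search.png")
```

With this in place, moving images to another server later means changing
one constant instead of every page.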



