Hi all

Having spent some time researching and developing prototypes with the
Nutch search engine, we're now in the process of planning a live
deployment.

The production server(s) would only need to crawl a relatively small
number of HTML pages, roughly 10,000.  They would, however, need to
serve a fairly large number of searches, up to about 10 searches per
second.
Could anyone give a rough idea of the kind of hardware we'll need for
this, e.g. how much RAM, how many CPUs, and how much disk space?  My
thinking is that since there aren't many pages it won't need much RAM
or disk space, but it will need a decent CPU (or CPUs) to handle the
query load.
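To make that concrete, here is a back-of-envelope sketch of the sizing arithmetic; the per-page size, index ratio and per-query cost below are placeholder assumptions rather than measured Nutch figures, so substitute numbers from your own prototype:

// Rough capacity estimate. All constants marked "assumed" are placeholders,
// not measured values -- replace them with figures from your prototype runs.
public class SizingEstimate {
    public static void main(String[] args) {
        int pages = 10000;                  // pages to crawl (from the requirements above)
        double searchesPerSecond = 10.0;    // peak query rate (from the requirements above)

        double htmlKbPerPage = 20.0;        // assumed average fetched-page size in KB
        double indexFraction = 0.35;        // assumed index size relative to raw HTML

        double rawMb = pages * htmlKbPerPage / 1024.0;
        double indexMb = rawMb * indexFraction;

        double msPerQuery = 50.0;           // assumed average search latency on one core
        double coresNeeded = searchesPerSecond * msPerQuery / 1000.0;

        System.out.printf("Raw crawl data:  ~%.0f MB%n", rawMb);
        System.out.printf("Index size:      ~%.0f MB%n", indexMb);
        System.out.printf("Cores for %.0f qps at %.0f ms/query: ~%.1f%n",
                searchesPerSecond, msPerQuery, coresNeeded);
    }
}

With those assumptions the data set stays well under a gigabyte, which is why the question really comes down to CPU for the query load.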
Would it be beneficial to use Hadoop to distribute the work over two
(or more) machines, or would it be easier to put a separate Nutch
installation on each machine and load balance between them?
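To illustrate the second option, here is a minimal sketch of a round-robin chooser that a search front end could use to spread queries over two independent Nutch installations; the host names and port are hypothetical, and a dedicated HTTP or hardware load balancer would serve the same purpose:

// Minimal round-robin sketch for the "separate Nutch installation per
// machine" option. Host names are hypothetical placeholders.
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSearchHosts {
    private final List<String> hosts;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinSearchHosts(List<String> hosts) {
        this.hosts = hosts;
    }

    /** Returns the base URL of the next search host in rotation. */
    public String nextHost() {
        int i = Math.floorMod(next.getAndIncrement(), hosts.size());
        return hosts.get(i);
    }

    public static void main(String[] args) {
        RoundRobinSearchHosts lb = new RoundRobinSearchHosts(
                List.of("http://nutch-search-1:8080", "http://nutch-search-2:8080"));
        for (int n = 0; n < 4; n++) {
            System.out.println("query " + n + " -> " + lb.nextHost());
        }
    }
}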

Thanks for any help

Regards
Aled


