Actually, at mozdex we have consolidated a bit and are rebuilding under the latest release. For 50 million URLs, a 200 GB disk is all you need. That leaves you enough room for your segments, the db, and the working space needed to process (about double your db size).
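If you want to sanity-check that figure, here is a back-of-envelope sketch. The per-page sizes are illustrative assumptions (picked so the totals land near 200 GB), not measured mozdex numbers; substitute figures observed on your own crawl:

    // Back-of-envelope disk sizing for a 50M-page crawl.
    // The per-page constants are ASSUMPTIONS for illustration only.
    public class DiskEstimate {
        public static void main(String[] args) {
            long pages = 50000000L;          // 50 million URLs
            double dbBytesPerPage = 500;     // ASSUMPTION: webdb entry per page
            double segBytesPerPage = 2500;   // ASSUMPTION: fetched+parsed segment data per page

            double dbGb   = pages * dbBytesPerPage  / 1e9;  // 25 GB
            double segGb  = pages * segBytesPerPage / 1e9;  // 125 GB
            double workGb = 2 * dbGb;  // rule of thumb above: ~double the db for processing

            System.out.printf("db: %.0f GB, segments: %.0f GB, working: %.0f GB, total: %.0f GB%n",
                    dbGb, segGb, workGb, dbGb + segGb + workGb);
        }
    }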
The biggest boost you can give your query servers is tons of memory. SATA-150 or SCSI drives at 10k RPM are also a bonus. We have finished migrating entirely to Athlon 64s, and I'll be posting our build on the site and wiki.

-byron

-----Original Message-----
From: Philippe LE NAOUR <[EMAIL PROTECTED]>
To: [email protected]
Date: Fri, 20 May 2005 18:27:53 +0200
Subject: Hardware requirements and some other questions about Nutch

> Hi,
>
> I'm new to this list.
>
> I have some questions about Nutch to see if it suits my needs.
>
> First of all, I have a database that contains 50,000 URLs classified by
> categories and sub-categories, and I wish to fully crawl the 50,000 sites
> behind those URLs. No problem, I can provide the URLs to Nutch.
>
> I want to use the category information in searches to restrict
> results; for example, a user could search all sites that contain "cat" in
> the "pet" category. Is this possible with Nutch? I've seen that I can add
> plugins; perhaps it is possible with plugins?
>
> Second part: hardware requirements.
>
> Let's say that each website has a maximum of 1,000 pages, so I must store
> the index for 50,000,000 pages. How much disk storage do I need?
> I've seen that Mozdex works with 10 servers for 100,000,000 pages, but I
> don't know how many requests it serves. Is there something to do to
> reduce the number of servers?
>
> Thanks for your replies.
>
> PS: sorry for my very bad English.
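On the category-restriction question in the quoted message: Nutch's index is a plain Lucene index, so the idea can be sketched directly with the Lucene 1.4-era API. In Nutch itself the indexing half would live in an indexing plugin and the query half in a query plugin; the field name "category", the "index" path, and the literal values below are illustrative assumptions, not something Nutch provides out of the box:

    // Sketch: restrict search results to a category stored as an
    // untokenized Lucene field (Lucene 1.4-era API).
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.*;

    public class CategorySearch {
        public static void main(String[] args) throws Exception {
            // Index side: store the category from your URL database as a
            // keyword field alongside the page content.
            IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);
            Document doc = new Document();
            doc.add(Field.Keyword("url", "http://example.com/"));
            doc.add(Field.Keyword("category", "pets"));
            doc.add(Field.Text("content", "all about cats and dogs"));
            writer.addDocument(doc);
            writer.close();

            // Query side: AND the user's query with a required TermQuery
            // on the category field.
            Query userQuery = QueryParser.parse("cat", "content", new StandardAnalyzer());
            BooleanQuery restricted = new BooleanQuery();
            restricted.add(userQuery, true, false);                                   // required
            restricted.add(new TermQuery(new Term("category", "pets")), true, false); // required
            IndexSearcher searcher = new IndexSearcher("index");
            Hits hits = searcher.search(restricted);
            System.out.println(hits.length() + " hits in category 'pets'");
            searcher.close();
        }
    }

The design point is that the category is attached at index time from your existing URL database, so restricting a query is just one extra required clause and costs almost nothing at search time.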
