I am not an expert on this, but I am doing something similar. You have 100k pages, which is very few by Nutch's standards. I think crawling will be the slow part, not because of hardware, but because if you crawl faster than 1 page/second per site, you may be blocked by some sites. If you really want to update it every day, this may be a problem.
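To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the figures from the question (100 sites, 1000 pages each) and a 1 page/second limit per site. The function name and the `parallel_hosts` parameter are made up for illustration and are not Nutch settings; it also ignores download and parsing time, so treat the result as a lower bound.

```python
import math


def crawl_time_seconds(sites, pages_per_site, delay_s, parallel_hosts):
    """Estimate wall-clock seconds for one full pass over all sites.

    Assumes each site is fetched at one page per `delay_s` seconds and
    that up to `parallel_hosts` sites are fetched at the same time.
    """
    per_site = pages_per_site * delay_s             # time to finish one site
    batches = math.ceil(sites / parallel_hosts)     # rounds of parallel sites
    return batches * per_site


if __name__ == "__main__":
    all_parallel = crawl_time_seconds(100, 1000, 1.0, parallel_hosts=100)
    one_by_one = crawl_time_seconds(100, 1000, 1.0, parallel_hosts=1)
    print(f"all sites in parallel: {all_parallel / 60:.1f} minutes")
    print(f"one site at a time:    {one_by_one / 3600:.1f} hours")
```

With all 100 sites fetched in parallel, a full pass is about 17 minutes; fetched one site at a time it is roughly 28 hours, which no longer fits in a daily re-crawl. Either way the limit comes from the per-site delay, not from the machine.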
The searching part is really fast. I was worried about it too, but once I saw that my AMD 1800+ PC (1 GB of memory) can do a search in less than 0.1 second, I stopped looking into the problem. I also saw someone on this list doing crawling and searching on a PIII at reasonable speed.

Regards
Pan

Tomislav Poljak wrote:
>
> I need help determining hardware specs for crawling 100 sites with 1000
> pages each. Regular re-crawl is needed probably every day (maybe even
> more often). So will one server meet these crawling requirements (only
> crawling, searching will be handled by other machine)? If so, what
> hardware specification would be recommended (how much Ram, CPU's, hard
> disk space)?
>
> Thanks,
> Tomislav
>

--
View this message in context: http://www.nabble.com/help-with-hardware-requirements-tf4333859.html#a12381466
Sent from the Nutch - User mailing list archive at Nabble.com.
