I'm new to Nutch. Several days ago I finished building a simple intranet search engine based on Nutch 0.6, and I have spent two weeks reading the Nutch 0.6 source code.
Now I want to build a bigger one that crawls pages from several websites I specify. My server is a modest machine with 1 CPU, 1 GB of memory, and a 320 GB hard disk, on a 10 Mbps connection. I want to provide a search service for a specific domain, so I have chosen some big websites to crawl.

My questions are: Must I update (re-crawl) all the sites in one crawl procedure, or may I crawl one site per day and then run a program to index them together? If the crawl procedure takes too long, how can I keep providing my service in the meantime? Is there a good existing system for me to study?

Any advice would be greatly appreciated.

--
Best regards,
Heart
mailto:[EMAIL PROTECTED]
