Hi, I can share my times with you. I'm fetching 500,000 pages in each run:

generating: 5 hrs
fetching:   8 hrs
parsing:    2.5 hrs
updating:   3.5 hrs

I have about 30 million URLs in the DB right now, and those times are for a cluster of 3 machines. So yes, it takes a lot of time. I think that using the native Hadoop libraries could speed it up a bit, but unfortunately I can't get them to work on Debian. I'll switch the cluster to Fedora or some other supported Linux and check then, but that will probably be next week. I hope you realize that you will need huge storage to hold the segments for 100 million pages. I'd also suggest running a nightly build of Nutch, as it uses Hadoop 0.15, which has been much more stable for me.
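To put those numbers in perspective, here is a quick back-of-envelope sketch. The ~50 KB per fetched page of segment data is purely my assumption (it depends on page sizes and what you store); the other figures are the ones above.

    # Rough extrapolation from the cycle times above to a 100M-page crawl.
    # Assumption: ~50 KB of segment data per fetched page (raw + parsed).

    PAGES_PER_RUN = 500_000
    CYCLE_HOURS = 5 + 8 + 2.5 + 3.5        # generate + fetch + parse + update
    BYTES_PER_PAGE = 50 * 1024             # assumed average segment footprint

    TARGET_PAGES = 100_000_000

    runs = TARGET_PAGES / PAGES_PER_RUN                   # 200 runs
    total_hours = runs * CYCLE_HOURS                      # 3,800 hrs
    throughput = PAGES_PER_RUN / (CYCLE_HOURS * 3600)     # ~7.3 pages/sec
    storage_tb = TARGET_PAGES * BYTES_PER_PAGE / 1024**4  # ~4.7 TB of segments

    print(f"{runs:.0f} runs, {total_hours / 24:.0f} days of crawling")
    print(f"effective throughput: {throughput:.1f} pages/sec on 3 machines")
    print(f"segment storage for {TARGET_PAGES:,} pages: ~{storage_tb:.1f} TB")

At the current rate that works out to roughly 160 days of continuous crawling for 100 million pages, which is why squeezing more speed out of the cluster (native libraries, more machines) matters so much.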
--
Karol Rybak
Programmer
Applications section
Wyższa Szkoła Informatyki i Zarządzania / University of Internet Technology and Management
+48(17)8661277
