Dmoz isn't big, but it isn't small either: 5-10% of my target sites are listed in dmoz. It is a nice starting point for large crawls.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 31, 2005 5:13 PM
To: [EMAIL PROTECTED]
Subject: Re: [Nutch-general] DMOZ Web coverage

I imagine the only people who can answer this question are those who have crawled a laaaaarge portion of the Web (i.e. Google, Yahoo, Teoma...), and I don't think they'll care to share :(

Otis

--- Chetan Sahasrabudhe <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I am trying to figure out how much web coverage is achievable
> through the dmoz file.
> If I want to crawl the whole web, how much time would it take, and
> what should the approach be?
>
> The parameters I am interested in are:
>
> 1. Size of the whole-web index.
> 2. Time needed to generate the whole-web index.
> 3. How much web coverage the dmoz file provides.
>
> Regards
> Chetan
>
>
> -------------------------------------------------------
> SF.Net email is Sponsored by the Better Software Conference & EXPO
> September 19-22, 2005 * San Francisco, CA * Development Lifecycle
> Practices
> Agile & Plan-Driven Development * Managing Projects & Teams * Testing
> & QA
> Security * Process Improvement & Measurement *
> http://www.sqe.com/bsce5sf
> _______________________________________________
> Nutch-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/nutch-general
