0.1 billion is pages, not URLs; sorry for that. It should be 4TB for 0.1 billion pages.
On 10/6/09, Gaurang Patel <gaurangtpa...@gmail.com> wrote:
> Hey Jack,
>
> *One concern:*
>
> I am not sure where I can get 0.1 billion page URLs. I am using the DMOZ
> Open Directory (which has around 3M URLs) to inject the crawldb.
>
> Please help.
>
> Regards,
> Gaurang
>
> 2009/10/4 Jack Yu <jackyu...@gmail.com>
>
>> 0.1 billion pages for 1.5TB
>>
>> On 10/5/09, Gaurang Patel <gaurangtpa...@gmail.com> wrote:
>> > All-
>> >
>> > I am new to using Nutch. Can anyone tell me the estimated size (I
>> > suppose, in TBs) required to store the crawled results for the whole
>> > web? I want an estimate of the storage requirements for my project,
>> > which uses the Nutch web crawler.
>> >
>> > Regards,
>> > Gaurang Patel
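For what it's worth, a quick back-of-envelope check of the figures quoted in this thread (1.5TB and 4TB for 0.1 billion pages). This is only a sketch, assuming TB means 10^12 bytes and that the figures cover stored page content alone, not Nutch's crawldb/linkdb/index overhead:

```python
# Sanity check: implied average size per crawled page.
# Assumptions: TB = 10^12 bytes; figures cover raw page content only.
pages = 0.1e9  # 0.1 billion pages

for total_tb in (1.5, 4.0):
    bytes_per_page = total_tb * 1e12 / pages
    print(f"{total_tb} TB / 0.1 billion pages = {bytes_per_page / 1e3:.0f} KB per page")
```

So 1.5TB works out to about 15 KB per page and 4TB to about 40 KB per page, which is why the two estimates differ: it comes down to how much of each page you assume is stored.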