I guess that is not enough. Here is the du -hs output for a 500K-page segment:

2.7G    ./urlsegments/20060805144644/content
2.8G    ./urlsegments/20060805144644/parse_data
993M    ./urlsegments/20060805144644/parse_text
1.7G    ./urlsegments/20060805144644/index
94M     ./urlsegments/20060805144644/fetcher
71M     ./urlsegments/20060805144644/fetchlist
8.2G    ./urlsegments/20060805144644

So I need about 16 GB for one million pages, plus an additional 1.7 GB if I
need a merged index.
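
As a rough sketch of that arithmetic, the following scales the measured
segment size linearly with page count. The linear scaling, the constant names
and the function name are assumptions made for illustration; the only measured
values are the 8.2G total and the 500K page count from the du output above.
The merged index is left out of the estimate.

```python
# Measured with `du -hs` on one segment (values taken from the listing above).
SEGMENT_PAGES = 500_000
SEGMENT_TOTAL_GB = 8.2


def estimate_storage_gb(pages):
    """Estimate disk usage in GB, assuming size grows linearly with page count."""
    return SEGMENT_TOTAL_GB * pages / SEGMENT_PAGES


if __name__ == "__main__":
    for pages in (1_000_000, 100_000_000):
        print(f"{pages:>11,} pages: about {estimate_storage_gb(pages):,.1f} GB")
```

Under these assumptions, 100 million pages come out well above the 1 terabyte
disk mentioned in the original question.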

By the way, can I delete the fetcher and fetchlist directories, or are they
still needed?

Matthias

> 
> You will also need more than 1 terabyte to get to 100 million pages.  A
> good rule of thumb is 2 gigs * replication factor for every 1 million
> pages.
> 
> Dennis
> 
> Dan Morrill wrote:
> > Hi,
> >
> > I found that with a 3 meg DSL line and a similar setup I was averaging
> > 8 pages per second. At that rate, reaching 100 million pages would take
> > about 144 days.
> >
> > 100,000,000 / 8 pages per second / 60 seconds per minute / 60 minutes per
> > hour / 24 hours in a day.
> >
> > Just an FYI rule of thumb on a Qwest DSL line with no metering.
> >
> > r/d
> >
> > -----Original Message-----
> > From: Bui Quang Hung [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, August 23, 2006 4:50 AM
> > To: [email protected]
> > Subject: How long to get 100 million page
> >
> >
> >
> > Hi,
> > I am planning to create an index of 100 million pages using a back-end
> > machine: a single-processor box with 1 gigabyte of RAM and a 1 terabyte
> > hard disk. Can you tell me how long it will take?
> > Thank you in advance.
> > Regards,
> > B.Q. Hung
> >
> >
> >


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
