I guess that is not enough. Here is a du -hs of a 500K-page segment:

2.7G  ./urlsegments/20060805144644/content
2.8G  ./urlsegments/20060805144644/parse_data
993M  ./urlsegments/20060805144644/parse_text
1.7G  ./urlsegments/20060805144644/index
94M   ./urlsegments/20060805144644/fetcher
71M   ./urlsegments/20060805144644/fetchlist
8.2G  ./urlsegments/20060805144644

So I need about 16 GB for one million pages, plus an additional 1.7 GB if I need a merged index.

By the way, could I delete the fetcher and fetchlist directories, or are they still needed?

Matthias

> You will also need more than 1 terabyte to get to 100 million pages. A
> good rule of thumb is 2 gigs * replication factor for every 1 million pages.
>
> Dennis
>
> Dan Morrill wrote:
> > Hi,
> >
> > I found that with a 3 meg DSL line I was averaging 8 pages per second with a
> > similar set up, to reach 100 million pages would take about 144 days.
> >
> > 100,000,000 / 8 pages per second / 60 seconds per minute / 60 minutes per
> > hour / 24 hours in a day.
> >
> > Just a FYI rule of thumb on a qwest DSL line with no metering.
> >
> > r/d
> >
> > -----Original Message-----
> > From: Bui Quang Hung [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, August 23, 2006 4:50 AM
> > To: [email protected]
> > Subject: How long to get 100 million page
> >
> > Hi,
> > I am planning to create an index of 100 million pages by using a back-end
> > machine which includes a single-processor box with 1 gigabyte of RAM, 1
> > terabyte hard disk. Can you teach me that how long it will take?
> > Thank you in advance.
> > Regards,
> > B.Q. Hung

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
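
For anyone who wants to replay the numbers in this thread, here is a rough sketch of the arithmetic in Python. The function names are made up for illustration, the replication factor of 1 is an assumption (a single box), and the "measured" figure simply doubles the 8.2G reported for the 500K-page segment, so treat the results as estimates rather than guarantees.

```python
# Back-of-the-envelope estimates using only figures quoted in this thread.
# Function names are hypothetical; replication factor 1 is an assumption.

SECONDS_PER_DAY = 60 * 60 * 24


def crawl_days(pages, pages_per_second):
    """Days needed to fetch `pages` at a constant fetch rate."""
    return pages / pages_per_second / SECONDS_PER_DAY


def storage_gb(pages, gb_per_million, replication=1):
    """Disk space in GB for `pages`, given GB per one million pages."""
    return pages / 1_000_000 * gb_per_million * replication


TARGET_PAGES = 100_000_000

# Dan's figure: 8 pages per second on a 3 meg DSL line.
days = crawl_days(TARGET_PAGES, 8)

# Dennis's rule of thumb: 2 GB per million pages (times replication factor).
rule_of_thumb_gb = storage_gb(TARGET_PAGES, 2)

# Matthias's measurement: 8.2 GB for a 500K-page segment, doubled per million.
measured_gb_per_million = 8.2 * 2
measured_gb = storage_gb(TARGET_PAGES, measured_gb_per_million)

print(f"crawl time:    {days:.1f} days")
print(f"rule of thumb: {rule_of_thumb_gb:.0f} GB")
print(f"measured:      {measured_gb:.0f} GB")
```

With the measured segment size, the estimate for 100 million pages comes out well above the 1 terabyte disk in the original question, which is consistent with both replies above.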
