Hi Gal: Yes, I set db.max.per.host == 1.
Another interesting thing: while debugging, I printed out page information in FetchListTool.java during generation and checked the log. I found "...Next fetch: Fri Apr 14 19:49:3...". I generated this webdb on April 9, and the refetching interval is set to 1 day. Shouldn't the "Next fetch" date be around April 10th? Why does this happen?

Thanks,

Michael

--- Gal Nitzan <[EMAIL PROTECTED]> wrote:

> What about db.max.per.host? Is it set to -1?
>
> -----Original Message-----
> From: Michael Ji [mailto:[EMAIL PROTECTED]
> Sent: Monday, April 10, 2006 3:18 AM
> To: [email protected]
> Subject: refetching interval
>
> hi there,
>
> I have a webdb with over 60,000 pages (counted using the nutch/admin
> dumptxt command), and the refetching interval is set to 1 day:
>
> <property>
>   <name>db.default.fetch.interval</name>
>   <value>1</value>
>   <description>The default number of days between re-fetches of a page.
>   </description>
> </property>
>
> But when I crawl based on this webdb the next day, the generate log
> shows only around 8,000 pages generated for fetching, and only about
> 7,500 pages are actually fetched.
>
> Any reason why it behaves like that? Shouldn't all 60,000 pages be
> fetched this time?
>
> thanks,
>
> Michael
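A minimal sketch of the date arithmetic behind the "Next fetch" question, written in Java since FetchListTool.java is Java. It rests on an assumption that is not taken from Nutch's source: that the next fetch time is simply the generation time plus db.default.fetch.interval days. The class and method names (NextFetchSketch, expectedNextFetch) are hypothetical, and the 19:49 time of day is made up for illustration.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch, NOT Nutch code: assumes the next fetch time is just
// the generation/fetch time plus db.default.fetch.interval days.
public class NextFetchSketch {

    // Assumption: nextFetch = lastFetch + intervalDays, with no other adjustment.
    static LocalDateTime expectedNextFetch(LocalDateTime lastFetch, int intervalDays) {
        return lastFetch.plusDays(intervalDays);
    }

    public static void main(String[] args) {
        // webdb generated on April 9, 2006; the time of day is made up.
        LocalDateTime generated = LocalDateTime.of(2006, 4, 9, 19, 49);
        int intervalDays = 1; // value of db.default.fetch.interval in the quoted config

        LocalDateTime expected = expectedNextFetch(generated, intervalDays);
        System.out.println("Expected next fetch: " + expected);

        // Date reported in the FetchListTool debug log ("Fri Apr 14 ...").
        LocalDate logged = LocalDate.of(2006, 4, 14);
        long gapDays = ChronoUnit.DAYS.between(expected.toLocalDate(), logged);
        System.out.println("Logged date is " + gapDays + " day(s) later than expected");
    }
}
```

Under this simple assumption the page would be due on April 10, so the logged April 14 date is 4 days later. The sketch does not try to explain where that extra time comes from; it only makes the expected value behind the question explicit.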
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
