Hi there,

I have a webdb with over 60,000 pages (counted using the nutch/admin dumptxt command), and the refetch interval is set to 1 day:
<property>
  <name>db.default.fetch.interval</name>
  <value>1</value>
  <description>The default number of days between re-fetches of a page.</description>
</property>

But when I crawl from this webdb the next day, the generate log shows only around 8,000 pages being generated for fetching, and only about 7,500 pages are actually fetched. Is there any reason why it behaves like that? Shouldn't all 60,000 pages be fetched this time?

Thanks,
Michael

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
