Hi Gal:

Yes, I set db.max.per.host == 1.

Another interesting thing I found: when I added debug
output to FetchListTool.java to print page information
during generation and checked the log, I found
"...Next fetch: Fri Apr 14 19:49:3...". I generated
this webdb on April 9, and the refetching interval is
set to 1 day.

Shouldn't the "Next fetch" date be around April 10th?

Why does this happen?
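
To make the expectation concrete, here is a minimal sketch of the date arithmetic. It is plain Java, not Nutch code: the class and method names are hypothetical, and it assumes the next fetch date is simply the generation date plus the configured interval in days, which may not be how FetchListTool actually computes it.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class NextFetchSketch {
    // Hypothetical helper (not a Nutch API): assumes next fetch = last fetch + interval.
    static LocalDate expectedNextFetch(LocalDate lastFetch, int intervalDays) {
        return lastFetch.plusDays(intervalDays);
    }

    // Number of whole days between two dates.
    static long daysBetween(LocalDate from, LocalDate to) {
        return ChronoUnit.DAYS.between(from, to);
    }

    public static void main(String[] args) {
        LocalDate generated = LocalDate.of(2006, 4, 9); // webdb generated on April 9
        LocalDate logged = LocalDate.of(2006, 4, 14);   // "Next fetch" date seen in the log

        // With db.default.fetch.interval = 1, the expected date is April 10.
        System.out.println("Expected: " + expectedNextFetch(generated, 1));
        // The logged date is 5 days after generation, not 1.
        System.out.println("Observed gap in days: " + daysBetween(generated, logged));
    }
}
```

Under that assumption the log should show April 10, so the observed gap of 5 days is what needs explaining.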

thanks,

Michael,

--- Gal Nitzan <[EMAIL PROTECTED]> wrote:

> 
> What about db.max.per.host? Is it set to -1?
> 
> 
> -----Original Message-----
> From: Michael Ji [mailto:[EMAIL PROTECTED] 
> Sent: Monday, April 10, 2006 3:18 AM
> To: [email protected]
> Subject: refetching interval
> 
> hi there,
> 
> I have a webdb with over 60,000 pages (according to
> the nutch/admin dumptxt command), and the refetching
> interval is set to 1 day:
> 
> <property>
>   <name>db.default.fetch.interval</name>
>   <value>1</value>
>   <description>The default number of days between
> re-fetches of a page.
>   </description>
> </property>
> 
> But when I crawl based on this webdb the next day,
> the generate log shows that only around 8,000 pages
> were generated for fetching, and only about 7,500
> pages were actually fetched.
> 
> Any reason why it behaves like that? Shouldn't all
> 60,000 pages be fetched this time?
> 
> thanks,
> 
> Michael,
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam
> protection around 
> http://mail.yahoo.com 
> 
> 
> 


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general