Hi Andrzej,

nutch-site.xml says:
<name>db.default.fetch.interval</name>
<value>15</value>

I tried readdb -dump.
I am not an expert in the dump output, but to me it seems that the db is not
being updated. I have two dump outputs (pre and post updatedb), and diffing
them I found the following differences:
1) Some score values were changed.
2) Only one fetch time for one document has been changed, but that is
not any of those three fetched pages...
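
A comparison like the one above can be sketched in Python as follows. This is only a minimal sketch: it treats the two dumps as plain text and assumes nothing about the record layout that readdb -dump writes. The function names and the file paths passed to them are hypothetical.

```python
import difflib


def changed_lines(pre_text, post_text):
    """Return only the removed ('-') and added ('+') lines between two texts."""
    diff = list(difflib.unified_diff(
        pre_text.splitlines(),
        post_text.splitlines(),
        fromfile="pre",
        tofile="post",
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    ))
    # The first two entries are the '--- pre' / '+++ post' headers; skip them,
    # then drop the '@@' hunk markers by keeping only '+' and '-' lines.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


def diff_dump_files(pre_path, post_path):
    """Read two dump files (hypothetical paths) and return their changed lines."""
    with open(pre_path, encoding="utf-8") as pre_file:
        pre_text = pre_file.read()
    with open(post_path, encoding="utf-8") as post_file:
        post_text = post_file.read()
    return changed_lines(pre_text, post_text)
```

If the URLs of the three fetched pages never show up in the returned lines, their entries did not change between the two dumps.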

I also checked these three pages and they are still unfetched.

Wow that seems very strange...
Any idea?

Regards,
Lukas

On 5/16/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
Lukas Vlcek wrote:
> Hi,
>
> I am using nutch0.8-dev. I have a small shell script for
> generate/fetch/update cycle. I used generate command with -topN 500.
> After crawling about 2000 pages I changed -topN to 3 (yes three pages
> only) to see what pages are crawled.
>
> I found that generate/fetch/update cycles are always crawling the same
> three pages!
> I would expect that it should crawl different pages in every cycle
> (and we have more than 3 pages on intranet and I am sure I injected
> enough link food).
>
> Can anybody tell me what I am doing wrong?

This indeed sounds strange - looks like their information is not being
updated in the db. What was the fetch interval for these pages? Could
you run a readdb -dump before and after updatedb?

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general