Lukas Vlcek wrote:
Hi Andrzej,

nutch-site.xml says:
<name>db.default.fetch.interval</name>
<value>15</value>

I tried readdb -dump.
I am not an expert in dump output, but to me it seems that the db is not updated.
I have two dump outputs (pre and post), and diffing them I found the
following differences:
1) Some score values were changed.
2) Only one fetch time for one document has been changed, but that is
not any of those three fetched pages...

I also checked these three pages and they are still unfetched.

Wow that seems very strange...
Any idea?

Ok, this could indicate some bugs in either Generate or CrawlDbReducer (both of which have recently been changed in a couple of places). Could you please do the following:

* prepare a fragment of the crawldb dump with the data about these three pages.

* generate, so that you get these three pages in the fetchlist (easy to check with segread).

* fetch

* prepare a fragment of the segment dump (segread -dump) with the data about these pages

* run updatedb

* prepare a fragment of the crawldb dump after updating

And then package this data nicely and send it to me. Thanks!
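
If it helps with preparing the fragments, here is a minimal Python sketch that pulls the records for a given set of URLs out of a text dump and reports which lines changed between the pre and post dumps. It assumes that each record in the dump is a block of lines separated from the next record by a blank line, and that the block starts with the page URL; please check this against your actual dump output before relying on it. The function names (split_records, extract_fragment, changed_lines) are hypothetical helpers, not part of Nutch.

```python
import re


def split_records(dump_text):
    """Map URL -> list of record lines.

    Assumption: records are blocks separated by blank lines, and the
    first whitespace-delimited token of each block is the page URL.
    """
    records = {}
    for block in re.split(r"\n\s*\n", dump_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        url = lines[0].split()[0]
        records[url] = lines
    return records


def extract_fragment(dump_text, urls):
    """Keep only the records for the URLs of interest."""
    records = split_records(dump_text)
    return {u: records[u] for u in urls if u in records}


def changed_lines(pre_text, post_text, urls):
    """Map URL -> (lines only in pre, lines only in post), changed records only."""
    pre = extract_fragment(pre_text, urls)
    post = extract_fragment(post_text, urls)
    changes = {}
    for u in urls:
        before, after = pre.get(u, []), post.get(u, [])
        if before != after:
            removed = [ln for ln in before if ln not in after]
            added = [ln for ln in after if ln not in before]
            changes[u] = (removed, added)
    return changes
```

Read the pre and post dump files into strings, pass them to changed_lines together with the three URLs, and an empty result would mean that updatedb left those records untouched.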

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
