Doğacan Güney wrote:
Hi,
On 9/18/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
Tim Gautier wrote:
It seems to me that there may be a bug in either updatedb or readdb
-stats. Can anyone help me out? I'm really hoping that I'm just
doing something wrong, but I can't figure out what it might be.
I'm trying to track down a similar issue in an existing installation,
which uses a snapshot of the trunk/ as of ~3 months ago, and where we
applied the patches in NUTCH-522 and NUTCH-547. Unfortunately, the
diff against the current trunk/ is several MB in size ...
Could you perform the same test with the version before NUTCH-439? In
terms of date, this would be before 2007-08-21 12:50:07 +0200, and
before rev. 568053.
Andrzej, do you think that this is somehow related to a commit around
NUTCH-439? Looking at CHANGES.TXT, I don't see anything there that could
trigger such a bug.
Right, I'm confused too. In my case, a very similar problem appeared
during fetching: before I applied the code from these two issues,
everything worked fine; after I applied the patches, both Fetcher and
Fetcher2 started losing URLs, i.e. they fetched only about 1/10th of
the URLs in the fetchlist, with no messages in the logs ... This, of
course, later caused strange results during updatedb.
I would hazard a guess that this is related to the adaptive crawl code.
There may still be a bug there that we are missing, or one of the later
commits may have broken it.
Right. Let's keep digging ...
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com