A good analysis

I was also doing something along similar lines.

We should also get more people testing this and contributing, so that we can
commit this to Nutch.

Rgds
Prabhu


On 3/8/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
>
> D.Saravanaraj wrote:
> > Hi Andrzej,
> >
> > Thanks for your Adaptive Refetch patch. I didn't fully understand how
> > adaptive refetch works, so I examined its behaviour by reading the
> > crawldb. I created a folder in Windows with 2 files and tried adaptive
> > refetching on that (URL is file:/D:/Test/).
> >
>
>
> > What I infer is:
> >
> >    1. For every refetch, the score of the files (but not the directory)
> >    increases
> >
>
> This is curious; it should not be so. However, it's the same in the
> vanilla version of Nutch (without this patch), so we'll address this
> separately.
>
>
> >    2. Irrespective of the retry interval, the files will be fetched
> >    when their modified date changes
> >
>
> Or if the last fetch time (or their next fetch time) is greater than the
> system-wide db.max.fetch.interval. This is to prevent pages being lost
> when you phase out old segments.
>
>
> >    3. Even though the directory's modified date has not changed, its
> >    contents have changed (the last modified date of one of the files,
> >    which is indexed as the content of the directory, has changed), so
> >    the directory is refetched
> >
>
> Well, yes, in the case of the protocol-file plugin this seems OK, don't you agree?
>
>
> > Please let me know if my inferences are correct, and sorry for the long
> > mail.
> >
>
> No problem with the size. Yes, your conclusions seem correct.
>
> --
> Best regards,
> Andrzej Bialecki     <><
> ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
>
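
For anyone following the thread, the refetch rules discussed in the quoted
message can be sketched roughly as below. This is only an illustration of the
behaviour described in the mail, not code from the patch: the class name
RefetchSketch, the method shouldRefetch and all of its parameters are
hypothetical, and the final "follow the normal schedule" rule is an assumption
about what happens when neither of the two discussed conditions applies. The
max-interval check reflects one reading of Andrzej's remark, namely that a page
is forced back into the fetch list once more than db.max.fetch.interval has
passed since its last fetch.

```java
// Hypothetical sketch of the refetch decision discussed in this thread.
// It is NOT the actual Nutch or patch implementation.
public final class RefetchSketch {

    private RefetchSketch() {
        // Utility class, no instances.
    }

    /**
     * Decide whether a page should be fetched again.
     * All times are in milliseconds; every name here is an illustrative assumption.
     *
     * @param nowMs           current time
     * @param lastFetchMs     time of the previous fetch
     * @param nextFetchMs     next fetch time computed from the retry interval
     * @param lastModifiedMs  modification time currently reported for the resource
     * @param knownModifiedMs modification time recorded at the previous fetch
     * @param maxIntervalMs   system-wide maximum fetch interval
     */
    public static boolean shouldRefetch(long nowMs, long lastFetchMs, long nextFetchMs,
                                        long lastModifiedMs, long knownModifiedMs,
                                        long maxIntervalMs) {
        // Point 2 of the mail: a changed modified date triggers a fetch,
        // irrespective of the retry interval.
        if (lastModifiedMs != knownModifiedMs) {
            return true;
        }
        // Andrzej's addition (as read here): force a fetch once the maximum
        // interval has elapsed, so pages are not lost with old segments.
        if (nowMs - lastFetchMs > maxIntervalMs) {
            return true;
        }
        // Assumption: otherwise the page simply waits for its scheduled time.
        return nowMs >= nextFetchMs;
    }
}
```

With these rules, an unchanged file whose next fetch time has not yet arrived is
skipped, while the same file is picked up as soon as its modified date differs
from the recorded one.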
