A good analysis. I was also doing something in a similar manner.
We should also have more people testing this and contributing so that we can commit this to Nutch.

Rgds,
Prabhu

On 3/8/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> D.Saravanaraj wrote:
> > Hi Andrzej,
> >
> > Thanks for your Adaptive Refetch patch. I didn't fully understand how
> > adaptive refetch works, so I examined it by reading the crawldb. I
> > created a folder in Windows with 2 files and tried adaptive refetching
> > on that (URL is file:/D:/Test/).
> >
> > What I infer is:
> >
> > 1. For every refetch, the score of the files (but not the directory)
> > is increasing.
>
> This is curious, it should not be so. However, it's the same in the
> vanilla version of Nutch (without this patch), so we'll address this
> separately.
>
> > 2. Irrespective of the retry interval, the files will be fetched when
> > their modified date is changed.
>
> Or if the last fetch time (or their next fetch time) is greater than the
> system-wide db.max.fetch.interval. This is to prevent pages being lost
> when you phase out old segments.
>
> > 3. Even though the directory's modified date is not changed, since its
> > contents changed (as the last modified date of one of the files is
> > changed, which is indexed as the content of the directory), that
> > directory is refetched.
>
> Well, yes, in case of the protocol-file plugin this seems ok, don't you
> agree?
>
> > Please let me know if my inferences are correct, and sorry for the
> > long mail.
>
> No problem with the size. Yes, your conclusions seem correct.
>
> --
> Best regards,
> Andrzej Bialecki
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
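
For anyone following the thread, the refetch conditions discussed above can be sketched roughly as below. This is only an illustration of the behaviour described in the mails, not the patch's actual code: the class name `RefetchSketch`, the method `shouldRefetch`, and all parameter names are hypothetical, and the only setting taken from the thread is `db.max.fetch.interval` (represented here by `maxIntervalMs`).

```java
// Hypothetical sketch of the refetch decision described in this thread.
// None of these names come from Nutch or from the patch itself.
class RefetchSketch {

    /**
     * Returns true when an entry should be fetched again.
     * All times are in milliseconds.
     */
    static boolean shouldRefetch(long nowMs,
                                 long lastFetchMs,
                                 long intervalMs,
                                 long maxIntervalMs,
                                 long storedModifiedMs,
                                 long currentModifiedMs) {
        // Point 2: a changed modification date triggers a fetch,
        // irrespective of the retry interval.
        if (currentModifiedMs != storedModifiedMs) {
            return true;
        }
        long sinceLastFetchMs = nowMs - lastFetchMs;
        // Andrzej's addition: entries older than the system-wide maximum
        // (assumed to mirror db.max.fetch.interval) are always refetched,
        // so pages are not lost when old segments are phased out.
        if (sinceLastFetchMs > maxIntervalMs) {
            return true;
        }
        // Otherwise fall back to the entry's own retry interval.
        return sinceLastFetchMs >= intervalMs;
    }

    public static void main(String[] args) {
        long day = 86_400_000L;
        // Unchanged file, fetched one day ago, 30-day interval, 90-day maximum.
        System.out.println(shouldRefetch(100 * day, 99 * day, 30 * day, 90 * day, 10 * day, 10 * day));
        // Same entry, but the file's modification date has changed.
        System.out.println(shouldRefetch(100 * day, 99 * day, 30 * day, 90 * day, 10 * day, 11 * day));
    }
}
```

The modification-date check comes first so that it overrides the interval, which matches inference 2; the maximum-interval check mirrors Andrzej's reply to it.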
