Re: Adaptive fetch

2006-04-04 Thread D . Saravanaraj
Hi,

Has the patch for Adaptive Refetch been released? We use Nutch on an
intranet to index large static HTML pages, so I expect this feature to
play a crucial role. Please update me on this.

Thanks,
D.Saravanaraj

On 3/31/06, Andrzej Bialecki [EMAIL PROTECTED] wrote:

 Raghavendra Prabhu wrote:
  I believe there was a recent mail about a redirection problem as well
  (with this patch applied).
 
  And as you said, more people testing the patch would be better.
 
  Considering that this has the most votes among add-on features, I guess
  it is a critical one.
 

 Ok, I'll bring this patch up to date over the weekend.

 --
 Best regards,
 Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
 [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
 ___|||__||  \|  ||  |  Embedded Unix, System Integration
 http://www.sigram.com  Contact: info at sigram dot com





Re: Adaptive fetch

2006-03-31 Thread Andrzej Bialecki

Raghavendra Prabhu wrote:

Hi Andrzej

Could you post the latest version of the diff for the adaptive fetch?

We seem to have problems applying the patch against the latest release.

This should help us test it.
  


The patch is probably out of sync, there have been many (trivial) 
changes in the meantime. The best option would be to commit this 
functionality, if enough people consider it of a sufficiently good 
quality. What prevents me from doing this is that I don't use this 
version on a regular basis - the original version is good enough for my 
use, even though not ideal. And I have a feeling that not too many 
people really reviewed this patch.


So, IMHO these patches need more testing, because the potential for 
disruption is rather large.


--
Best regards,
Andrzej Bialecki 




Re: Adaptive fetch

2006-03-31 Thread Raghavendra Prabhu
I believe there was a recent mail about a redirection problem as well (with
this patch applied).

And as you said, more people testing the patch would be better.

Considering that this has the most votes among add-on features, I guess it
is a critical one.


Rgds
Prabhu

On 3/31/06, Andrzej Bialecki [EMAIL PROTECTED] wrote:

 Raghavendra Prabhu wrote:
  Hi Andrzej
 
  Could you post the latest version of the diff for the adaptive fetch?
 
  We seem to have problems applying the patch against the latest release.
 
  This should help us test it.
 

 The patch is probably out of sync, there have been many (trivial)
 changes in the meantime. The best option would be to commit this
 functionality, if enough people consider it of a sufficiently good
 quality. What prevents me from doing this is that I don't use this
 version on a regular basis - the original version is good enough for my
 use, even though not ideal. And I have a feeling that not too many
 people really reviewed this patch.

 So, IMHO these patches need more testing, because the potential for
 disruption is rather large.

 --
 Best regards,
 Andrzej Bialecki 





Re: Adaptive fetch

2006-03-31 Thread Andrzej Bialecki

Raghavendra Prabhu wrote:

I believe there was a recent mail about a redirection problem as well (with
this patch applied).

And as you said, more people testing the patch would be better.

Considering that this has the most votes among add-on features, I guess it
is a critical one.
  


Ok, I'll bring this patch up to date over the weekend.

--
Best regards,
Andrzej Bialecki 




Re: adaptive fetch

2006-03-28 Thread Andrzej Bialecki

Raghavendra Prabhu wrote:

Hi Andrzej

After applying the patch, I found some strange behaviour.

A fetch list was being created for each URL even though
db.default.fetch.interval had not been reached.
  


You probably forgot to change the interval from days to seconds. It's 
now expressed in seconds. This defines the maximum allowed interval, and 
any pages with interval higher than that will be refetched anyway - so 
if it's 30 (seconds :) ) then there is a high probability that you reach 
this limit before each cycle completes...
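
[Editor's illustration, not part of the original message.] The unit change described above can be sketched in a few lines. This is a minimal sketch and not Nutch code: the function names are hypothetical, and it assumes the maximum interval acts as a cap on each page's own interval, which is one reading of "will be refetched anyway".

```python
SECONDS_PER_DAY = 24 * 60 * 60

def days_to_seconds(days):
    """Convert an interval configured in days to the seconds-based unit."""
    return int(days * SECONDS_PER_DAY)

def is_due(last_fetch, interval, max_interval, now):
    """Return True if a page should be refetched.

    All arguments are in seconds. ASSUMPTION: the page's own interval
    is capped at max_interval, so a page can never wait longer than that.
    """
    effective = min(interval, max_interval)
    return now - last_fetch >= effective

# A value of 30 that was meant as days but is read as seconds makes
# nearly every page due after a single crawl cycle.
thirty_days = days_to_seconds(30)
print(is_due(last_fetch=0, interval=thirty_days, max_interval=30, now=60))
print(is_due(last_fetch=0, interval=thirty_days, max_interval=thirty_days, now=60))
```

With the cap left at 30 the first call reports the page as due after one minute, while the converted value in the second call does not.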



I thought this was supposed to happen in this order:

1) For the particular URL/file, get the db fetch interval (which changes).

2) If the current date exceeds the db fetch interval, generate a fetch list
entry for that file/URL.

3) The fetch list step checks the file's modified date and then decides
whether to fetch the latest contents of the file/URL.

It is supposed to function in the above manner, right? Did I miss anything?

  


Yes, this is how it's supposed to work.
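
[Editor's illustration, not part of the original message.] The three steps confirmed above can be sketched as follows. This is a simplified model, not the patch itself: `PageRecord`, `generate_fetch_list` and `should_download` are hypothetical names, and all times are assumed to be plain epoch seconds.

```python
from dataclasses import dataclass

@dataclass
class PageRecord:
    url: str
    last_fetch: int       # epoch seconds of the previous fetch
    fetch_interval: int   # per-page interval in seconds (step 1)
    stored_modified: int  # modification time recorded at the previous fetch

def generate_fetch_list(records, now):
    """Step 2: select only the pages whose interval has elapsed."""
    return [r for r in records if now >= r.last_fetch + r.fetch_interval]

def should_download(record, current_modified):
    """Step 3: fetch the content only if it changed since the last fetch."""
    return current_modified > record.stored_modified

records = [
    PageRecord("http://intranet.example/a.html", 0, 100, 0),
    PageRecord("http://intranet.example/b.html", 0, 1000, 0),
]
due = generate_fetch_list(records, now=500)
print([r.url for r in due])
print(should_download(due[0], current_modified=250))
```

Only the first page is due at time 500, and it is downloaded because its modification time is newer than the stored one.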

--
Best regards,
Andrzej Bialecki 




Re: Adaptive fetch schedule

2006-03-22 Thread Andrzej Bialecki

(Moved to the proper list)

Raghavendra Prabhu wrote:

Hi

Does the fix for the inlink value problem also solve the existing OPIC
problem?

That is, on a recrawl, the page would get a higher score.

Does this fix that problem?
  

No, it doesn't. But it prevents your linkDB from growing indefinitely, which is 
also good.

--
Best regards,
Andrzej Bialecki 




Re: Adaptive fetch

2006-02-28 Thread Raghavendra Prabhu
Point noted, Andrzej.

We will experiment with the schedules and let you know how it works out.
It is flexible right now, I guess.

Thanks


Rgds
Prabhu


On 2/28/06, Andrzej Bialecki [EMAIL PROTECTED] wrote:

 Raghavendra Prabhu wrote:
  Maybe we can add a function which will do this, so that people using
  crawl can make use of it (a new function with a minor modification in
  update database, so that it will set db.default.fetch.interval in the
  webdb to zero).
 

 Ah, well... there will always be this or that that you can add, the
 question is whether you should?

 Somehow I don't see that it would be needed to put this functionality in
 the FetchSchedule interface... that was the whole point of this patch,
 so that you can experiment and implement various fetch schedules as you
 wish. In this case I recommend that you do just that ;-)


 --
 Best regards,
 Andrzej Bialecki 