Hi,

I'd like to crawl our chat log pages, which change whenever someone sends
a message in one of our chat rooms, and that happens every couple of seconds.
The HTML log pages are updated instantly by the Prosody Jabber server
and thus always have current timestamps.

Nutch seems to reject them now because they are too new:

> -shouldFetch rejected
>  'http://conference.nr:5290/muc_log/',
>  fetchTime=1314950217363, curTime=1312358255779


I have two questions:

1. Which timestamp format is that? They don't seem to be Unix
   timestamps, because
   > $ php -r 'echo date("Y-m-d H:i:s", 1312358255779);'
   > 43556-12-23 16:56:19
   gives the wrong year :)
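A minimal sketch for checking the values, assuming they are milliseconds since the Unix epoch (the unit Java's `System.currentTimeMillis()` returns) rather than seconds. `ms_to_utc` is a hypothetical helper name, and the script assumes GNU `date`; on BSD/macOS the equivalent is `date -u -r SECONDS`.

```shell
#!/usr/bin/env bash
# Sketch: interpret the fetchTime/curTime values from the log line as
# milliseconds since the Unix epoch (an assumption, not confirmed here).
# Requires GNU date for the "-d @SECONDS" syntax.

# Hypothetical helper: convert epoch milliseconds to a UTC date string.
ms_to_utc() {
  # Integer division drops the millisecond part before formatting.
  date -u -d "@$(( $1 / 1000 ))" '+%Y-%m-%d %H:%M:%S'
}

ms_to_utc 1312358255779   # curTime
ms_to_utc 1314950217363   # fetchTime

# Gap between fetchTime and curTime in whole seconds.
echo $(( (1314950217363 - 1312358255779) / 1000 ))
```

Under that assumption, curTime comes out as 2011-08-03 07:57:35 UTC and fetchTime as 2011-09-02 07:56:57 UTC, 2591961 seconds apart, which is just under 30 days (2592000 seconds).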

2. What can I do to keep those URLs from being rejected? I already tried
   setting
   > db.fetch.schedule.adaptive.sync_delta
   to false and
   > db.fetch.schedule.adaptive.inc_rate
   > db.fetch.schedule.adaptive.dec_rate
   to 0, but that didn't help.

-- 
Best regards
Christian Weiske
