Hello again,
I was inspecting the generator because it doesn't deliver all URLs for
the fetch list from the crawldb, even if I set the addDays attribute to a
value much higher than the max fetch interval.
When I looked at the log file, I noticed that it uses a timestamp format
which I don't recognize:
2012-01-20 18:32:24,506 DEBUG org.apache.nutch.crawl.Generator:
-shouldFetch rejected 'http://(...)', fetchTime=1327667923420,
curTime=1327076858662
So I wanted to see what times these two values actually represent and
converted them using the date command:
date -u -d @1327667923420
Thu Feb 13 21:23:40 UTC 44042
So the fetch time is in the year 44042? Quite a long time to wait. The
same goes for the system time:
date -u -d @1327076858662
Tue May 23 20:44:22 UTC 44023
(My system is NOT set to that date!) ;-)
Does the generator use a different kind of timestamp than Unix systems? Or
is something terribly wrong here?
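One guess on my side: the values might be milliseconds since the Unix
epoch (which is what Java's System.currentTimeMillis() returns), while
date -d @N expects seconds. This is only an assumption; I have not
confirmed it in the Generator source. A minimal Python sketch to check it
(the helper name millis_to_utc is made up for illustration):

```python
from datetime import datetime, timedelta, timezone

# Start of the Unix epoch, in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(ms: int) -> datetime:
    """Interpret ms as milliseconds since the Unix epoch.

    Assumption: the logged values are Java-style epoch milliseconds.
    """
    return EPOCH + timedelta(milliseconds=ms)


fetch_time = 1327667923420  # fetchTime from the log line
cur_time = 1327076858662    # curTime from the log line

print(millis_to_utc(fetch_time).isoformat())  # 2012-01-27T12:38:43.420000+00:00
print(millis_to_utc(cur_time).isoformat())    # 2012-01-20T16:27:38.662000+00:00
```

Read that way, both values fall in January 2012, and fetchTime is roughly
a week after curTime.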
Thanks a lot in advance