Armel T. Nene wrote:
> Hi guys,
>
> I am using Nutch 0.8.2-dev. I have noticed that the crawldb does not actually
> save the last modified date of files. I have run a crawl on my local file
> system and the web. When I dumped the content of the crawldb for both crawls,
> the modified date of the files was set to 01-Jan-1970 01:00:00. I don't know
> if it's intended to be as is or if it's a bug. Therefore my question is:
>
> * How does the generator know which file to crawl again?
>   o Is it looking at the fetch time?
>   o Or the modified date, as this can be misleading?
>
> There is a modified date returned in most HTTP headers, and files on the file
> system all have a last modified date. How come it's not stored in the crawldb?

This is the issue described in NUTCH-61 - patches from that issue will be applied soon to trunk/.

-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com