Isn't he in fact NOT using the US date notation? AFAIK, the US date notation is mm/dd/yyyy.
Russ

------Original Message------
From: Andrzej Bialecki
To: [email protected]
ReplyTo: [email protected]
Sent: Sep 18, 2008 11:18 AM
Subject: Re: Dedup

David Jashi wrote:
> Hello, colleagues.
>
> I have a theoretical question - let's say
> on 01/01/2008 we have crawled page http://www.site.com/page.html
> on 10/01/2008 the page changed
> on 01/02/2008 we crawled it once again and merged old and new indexes
>
> which version of this page Nutch dedup will leave in index?

If we assume that you're using the US date notation (how quaint ;) ), then yes - Dedup always keeps the latest version of the page with the same url, and discards all other versions.

-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
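
For anyone following the thread, here is a rough sketch of the two points above: the "keep the latest version per URL" rule, and how the date notation changes whether the second crawl could have seen the page change. This is plain Python for illustration only, not Nutch's actual dedup code. The names `parse`, `dedup_by_url`, `surviving` and `change_seen` are made up for this example, and the content labels are placeholders.

```python
from datetime import datetime

URL = "http://www.site.com/page.html"

def parse(date_text, day_first):
    """Parse nn/nn/yyyy as dd/mm/yyyy (day_first) or US-style mm/dd/yyyy."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(date_text, fmt)

def dedup_by_url(records):
    """Keep only the most recently fetched record for each URL.

    records: iterable of (url, fetch_time, content) tuples.
    This mirrors the rule stated in the thread, not Nutch internals.
    """
    latest = {}
    for url, fetch_time, content in records:
        kept = latest.get(url)
        if kept is None or fetch_time > kept[0]:
            latest[url] = (fetch_time, content)
    return latest

# The two crawls from the original question (labels are placeholders).
CRAWLS = [("01/01/2008", "first crawl"), ("01/02/2008", "second crawl")]

def surviving(day_first):
    """Which crawl survives dedup under the given date notation."""
    records = [(URL, parse(d, day_first), c) for d, c in CRAWLS]
    return dedup_by_url(records)[URL][1]

def change_seen(day_first):
    """Did the page change (10/01/2008) happen before the second crawl?"""
    return parse("10/01/2008", day_first) <= parse("01/02/2008", day_first)
```

Under either notation the second crawl is the later one and is the copy kept. What differs is whether that crawl happened after the page changed: with dd/mm/yyyy the change (10 January) precedes the second crawl (1 February), while with mm/dd/yyyy the change (October 1) comes after it (January 2).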
