Not to be a pedant, but that could be either.

On Sep 18, 2008, at 8:43 AM, [EMAIL PROTECTED] wrote:

Isn't he in fact NOT using the US date notation? AFAIK, the US date notation is mm/dd/yyyy.
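
To make the ambiguity concrete, here is a minimal sketch that reads the three dates from the question under both conventions. The helper names (`parse`, `change_seen_by_second_crawl`) are made up for illustration and are not part of Nutch.

```python
from datetime import datetime

# The three dates from the question, exactly as written in the thread.
FIRST_CRAWL, PAGE_CHANGED, SECOND_CRAWL = "01/01/2008", "10/01/2008", "01/02/2008"


def parse(text, day_first):
    """Parse an ambiguous xx/xx/yyyy string under one of the two conventions."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(text, fmt).date()


def change_seen_by_second_crawl(day_first):
    """True if the page changed on or before the day of the second crawl."""
    return parse(PAGE_CHANGED, day_first) <= parse(SECOND_CRAWL, day_first)


for day_first in (False, True):
    label = "dd/mm/yyyy" if day_first else "mm/dd/yyyy"
    dates = [parse(d, day_first).isoformat()
             for d in (FIRST_CRAWL, PAGE_CHANGED, SECOND_CRAWL)]
    print(label, dates, "change seen by second crawl:",
          change_seen_by_second_crawl(day_first))
```

Read as mm/dd/yyyy, the page changes on 1 October, after the second crawl on 2 January, so the second crawl cannot contain the change. Read as dd/mm/yyyy, the change on 10 January falls before the second crawl on 1 February.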

Russ
------Original Message------
From: Andrzej Bialecki
To: [email protected]
ReplyTo: [email protected]
Sent: Sep 18, 2008 11:18 AM
Subject: Re: Dedup

David Jashi wrote:
Hello, colleagues.

I have a theoretical question - let's say
on 01/01/2008 we have crawled page http://www.site.com/page.html
on 10/01/2008 the page changed
on 01/02/2008 we crawled it once again and merged old and new indexes

which version of this page will Nutch dedup leave in the index?

If we assume that you're using the US date notation (how quaint ;) ),
then yes - Dedup always keeps the latest version of the page with the
same url, and discards all other versions.
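
The rule described above (same URL, keep the most recent fetch) can be sketched roughly as follows. This is an illustration only, not Nutch's actual dedup code: the record layout (`url`, `fetched`, `segment`) and the function name `dedup_keep_latest` are assumptions made for the example.

```python
from datetime import date


def dedup_keep_latest(records):
    """Keep, for each URL, only the record with the most recent fetch date."""
    latest = {}
    for rec in records:
        kept = latest.get(rec["url"])
        # Replace the kept record only when this one was fetched later.
        if kept is None or rec["fetched"] > kept["fetched"]:
            latest[rec["url"]] = rec
    return list(latest.values())


# Hypothetical merged index holding both crawls of the same page.
# The second crawl is later than the first under either date reading;
# 2 January is used here arbitrarily.
merged_index = [
    {"url": "http://www.site.com/page.html", "fetched": date(2008, 1, 1), "segment": "old"},
    {"url": "http://www.site.com/page.html", "fetched": date(2008, 1, 2), "segment": "new"},
]

survivors = dedup_keep_latest(merged_index)
print(survivors)
```

Only the record from the later crawl survives. Whether that record reflects the changed page depends solely on whether the change happened before that crawl.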


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


