Elwin wrote:
When I read pages out of a webdb and printed out the url of each page, I
found two urls that are exactly the same.
Is it possible for two pages to have the same url?

WebDB should not allow two URLs that are exactly the same (Nutch uses an MD5 signature for that). Please check them carefully: most probably they differ in only a single character or in whitespace.
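
One way to check is to print the two strings in an unambiguous form and compare them character by character. The following is only a rough sketch in Python, not Nutch code: the two example URLs and the helper names md5_hex and first_difference are made up for illustration, and hashing the UTF-8 bytes of the URL is an assumption about how one might reproduce such a signature by hand.

```python
import hashlib


def md5_hex(url: str) -> str:
    """Return the MD5 hex digest of the URL's UTF-8 bytes (illustrative only)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def first_difference(a: str, b: str):
    """Return the index of the first differing character, or None if equal."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    # One string may be a prefix of the other (e.g. trailing whitespace).
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Hypothetical URLs: the second one carries a trailing space.
url_a = "http://www.example.com/index.html"
url_b = "http://www.example.com/index.html "

# repr() makes invisible characters such as trailing whitespace visible.
print(repr(url_a), md5_hex(url_a))
print(repr(url_b), md5_hex(url_b))
print("first difference at index:", first_difference(url_a, url_b))
```

If the two digests differ, the strings are not identical, and the reported index shows where to look.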

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

