Re: find duplicate urls in webdb

2006-03-06 Thread Andrzej Bialecki
Elwin wrote:
> When I read pages out of a webdb and printed out the URL of each page, I found two URLs that are exactly the same. Is it possible for two pages to have the same URL?

WebDB should not allow two URLs that are exactly the same (Nutch uses an MD5 signature for that). Please check them
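Two URLs can print identically and still differ by an invisible character such as a trailing space, so a byte-level check is one way to follow up on the advice above. The following is a minimal sketch using only the Java standard library. It does not call any Nutch API, the class and method names (UrlCompare, md5Hex, codePoints) are hypothetical, and hashing the URL's UTF-8 bytes is an assumption made for illustration, not a description of how WebDB computes its signature.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Nutch: compares two URL strings at the
// byte and code-point level to expose differences that printing hides.
class UrlCompare {

    /** Hex-encoded MD5 of the URL string's UTF-8 bytes (illustrative assumption). */
    static String md5Hex(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Lists every character as a code point so invisible characters show up. */
    static String codePoints(String url) {
        StringBuilder sb = new StringBuilder();
        url.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Example inputs: the second URL has a trailing space.
        String a = "http://example.com/page";
        String b = "http://example.com/page ";

        System.out.println("equal strings: " + a.equals(b));
        System.out.println("md5(a) = " + md5Hex(a));
        System.out.println("md5(b) = " + md5Hex(b));
        System.out.println("a: " + codePoints(a));
        System.out.println("b: " + codePoints(b));
    }
}
```

If the two hashes differ, the URLs are not the same string even though they look the same on screen, and the code-point listing shows where they diverge.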

find duplicate urls in webdb

2006-03-05 Thread Elwin
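When I read pages out of a webdb and printed out the URL of each page, I found two URLs that are exactly the same. Is it possible for two pages to have the same URL?

-- 
《盖世豪侠》 drew rave reviews and kept TVB's ratings high, yet for all its delight TVB still did not give him bigger roles. Stephen Chow was never one to stay in a small pond: once his comic talent had shown itself, he was not willing to be sidelined, so he moved to the film industry and made his mark on the big screen. TVB had found a rare talent and then lost him, and naturally regretted it too late.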