Hi,

At the moment you cannot do this out of the box. It's a very, very nasty 
problem, and preventing the download of such URLs in the first place needs 
a lot of thought. 
What you can do is simply download them and mark them as duplicates 
afterwards, using either the simple hashing algorithm or the more advanced 
text profile signature. 
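
To make the download-then-mark idea concrete, here is a rough sketch in 
Python. It is not Nutch's code: the function names, the parameters and the 
sample page texts are made up for illustration, and profile_signature is only 
loosely modelled on the idea behind a text profile signature (hash a quantised 
word-frequency profile instead of the raw bytes). The two URLs are the ones 
from the question, with the emphasis markers dropped.

```python
import hashlib
import re
from collections import Counter


def exact_signature(text: str) -> str:
    """MD5 of the raw text: only identical pages get the same signature."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def profile_signature(text: str, min_len: int = 2, quant_rate: float = 0.01) -> str:
    """MD5 of a quantised word-frequency profile (hypothetical simplification)."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if len(t) >= min_len]
    counts = Counter(tokens)
    if not counts:
        return hashlib.md5(b"").hexdigest()
    max_freq = max(counts.values())
    quant = round(max_freq * quant_rate)
    if quant < 2:
        quant = 2 if max_freq > 1 else 1
    # Round every frequency down to a multiple of quant and drop terms that
    # fall to zero, so rare words (a name, a date) do not change the result.
    profile = sorted(
        (term, (freq // quant) * quant)
        for term, freq in counts.items()
        if freq // quant > 0
    )
    joined = "\n".join(f"{term} {freq}" for term, freq in profile)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def mark_duplicates(pages, signature):
    """Map each duplicate URL to the first URL seen with the same signature."""
    first_seen = {}
    duplicates = {}
    for url, text in pages.items():
        sig = signature(text)
        if sig in first_seen:
            duplicates[url] = first_seen[sig]
        else:
            first_seen[sig] = url
    return duplicates


# Made-up page texts: same body, only the name differs.
pages = {
    "http://example.com/pageBla/John/123/blabla":
        "nutch nutch nutch crawl crawl index index John",
    "http://example.com/pageBla/Doe/123/albalb":
        "nutch nutch nutch crawl crawl index index Doe",
}

print(mark_duplicates(pages, exact_signature))
print(mark_duplicates(pages, profile_signature))
```

With plain hashing the two sample pages count as different because one word 
differs; with the profile variant the second URL is marked as a duplicate of 
the first. Both pages are still fetched either way, the marking only happens 
afterwards.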

Cheers,

On Thursday 18 August 2011 15:35:26 Dinçer Kavraal wrote:
> Hi,
> 
> I have two URLs such as:
> http://example.com/pageBla/John/*123*/blabla
> http://example.com/pageBla/Doe/*123*/albalb
> The thing is these two URLs are the same because of the id part of the URL
> (which is *123* in this sample). How can I avoid downloading the same
> thing twice because of that?
> 
> I think I can customize injection classes but how could I check if another
> form of the URL is already fetched?
> 
> Any ideas? Thanks

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
