Hi Markus,

Thanks, but I need to prevent the download itself, because I have more CPU resources than bandwidth :) So for me it is more important to deal with the beast before it is born.

Dincer

2011/8/18 Markus Jelsma <[email protected]>

> Hi,
>
> At the moment you cannot do this out-of-the-box. It's a very, very, nasty
> problem that needs a lot of thinking if you want to prevent downloading such
> URL's.
> What you can do is just download them and mark them as duplicate by either
> using the simple hashing algorithm or a more advanced text profile signature.
>
> Cheers,
>
> On Thursday 18 August 2011 15:35:26 Dinçer Kavraal wrote:
> > Hi,
> >
> > I have two URLs such as:
> > http://example.com/pageBla/John/*123*/blabla
> > http://example.com/pageBla/Doe/*123*/albalb
> > The thing is these two URLs are same because of the id part of the URL
> > (which is *123* in this sample). How could I manage to prevent download
> > same thing twice because of that?
> >
> > I think I can customize injection classes but how could I check if another
> > form of the URL is already fetched?
> >
> > Any ideas? Thanks
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
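
To make the question in the quoted thread concrete, here is a rough sketch of a check that runs before fetching. It is plain Python, not Nutch code, and it rests on assumptions: the id is taken to be a numeric third path segment (the asterisks around 123 in the sample URLs are treated as emphasis markers, not part of the URL), and the names ID_PATTERN, dedup_key and filter_unseen are made up for illustration. In Nutch, logic like this would presumably sit in a custom URL filter or normalizer plugin, and the set of seen keys would have to persist across crawl rounds; neither is shown here.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: assumes paths look like /pageBla/<name>/<id>/<slug>
# with a numeric id. Adjust it to the real site layout.
ID_PATTERN = re.compile(r"^/pageBla/[^/]+/(\d+)(?:/|$)")


def dedup_key(url):
    """Return a key shared by every URL variant that carries the same id.

    URLs that do not match the pattern are returned unchanged, so they are
    only deduplicated against exact copies of themselves.
    """
    parts = urlsplit(url)
    match = ID_PATTERN.match(parts.path)
    if match is None:
        return url
    return f"{parts.netloc.lower()}/pageBla/{match.group(1)}"


def filter_unseen(urls, seen=None):
    """Keep only the first URL seen for each key; later variants are dropped."""
    seen = set() if seen is None else seen
    keep = []
    for url in urls:
        key = dedup_key(url)
        if key in seen:
            continue  # another form of this page is already queued or fetched
        seen.add(key)
        keep.append(url)
    return keep
```

With the two sample URLs, only the first one would be kept for fetching, since both map to the same key. The cost is that the rule is site-specific: a wrong pattern silently drops pages that are in fact different.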

