Hi Markus,

Thanks, but I need to prevent the download itself, because I have more CPU
resources than bandwidth :) So it is more important to deal with the beast
before it is born.
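
Something like the sketch below is roughly what I have in mind: pull the
id segment out of the URL and remember which ids were already accepted, so
that a second form of the same URL is dropped before it is fetched. The
class name, the regex, and the in-memory set are my own assumptions (as is
reading the asterisks around 123 as emphasis only), not anything Nutch
ships. In a real crawl this would presumably have to sit in a URL filter
plugin and use a store shared across all fetch rounds.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: rejects a URL when its id
// segment was already seen under another form of the URL.
class IdDedupFilter {
    // Assumed URL layout: http://<host>/pageBla/<name>/<numeric id>/<rest>
    private static final Pattern ID_PATTERN =
            Pattern.compile("^https?://([^/]+)/pageBla/[^/]+/(\\d+)/");

    // Keys of the form "<host>/<id>" that have already been accepted.
    private final Set<String> seenKeys = new HashSet<>();

    // Returns the URL if it should be fetched, or null if another URL
    // with the same host and id was accepted earlier. URLs that do not
    // match the assumed layout pass through unchanged.
    public String filter(String url) {
        Matcher m = ID_PATTERN.matcher(url);
        if (!m.find()) {
            return url;
        }
        String key = m.group(1) + "/" + m.group(2);
        // Set.add returns false when the key is already present.
        return seenKeys.add(key) ? url : null;
    }
}
```

With one filter instance, the John URL from my first mail would be
accepted and the Doe URL with the same id would be rejected. The catch is
that whichever form is seen first wins, and the set only lives as long as
the process does.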

Dincer


2011/8/18 Markus Jelsma <[email protected]>

> Hi,
>
> At the moment you cannot do this out of the box. It's a very, very nasty
> problem that needs a lot of thinking if you want to prevent downloading
> such URLs.
> What you can do is just download them and mark them as duplicates, using
> either the simple hashing algorithm or a more advanced text profile
> signature.
>
> Cheers,
>
> On Thursday 18 August 2011 15:35:26 Dinçer Kavraal wrote:
> > Hi,
> >
> > I have two URLs such as:
> > http://example.com/pageBla/John/*123*/blabla
> > http://example.com/pageBla/Doe/*123*/albalb
> > The thing is, these two URLs are the same page because of the id part of
> > the URL (which is *123* in this sample). How can I avoid downloading the
> > same thing twice because of that?
> >
> > I think I can customize the injection classes, but how can I check
> > whether another form of the URL has already been fetched?
> >
> > Any ideas? Thanks
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>
