If I'm in a directory foo with a subdirectory sth, a URL ending in
/sth/../sth/../sth/../ resolves back to foo. If a site keeps producing
links of the form (/sth/..)*, the crawler can follow arbitrarily long
URLs that all point to the same place, which is wasteful. The filter
exists to cut such loops off.
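
To make this concrete, here is a small Python sketch (not Nutch code) that
runs the pattern from regex-urlfilter.txt against URLs of this shape. The
host example.com, the base directory and the helper is_blocked are made up
for illustration, and I am assuming the filter searches for the pattern
anywhere in the URL rather than requiring a full match.

```python
import re
from urllib.parse import urljoin

# Pattern copied from the regex-urlfilter.txt line quoted below, without
# the leading "-" that marks it as an exclusion rule.
LOOP_PATTERN = re.compile(r".*(/.+?)/.*?\1/.*?\1/")


def is_blocked(url: str) -> bool:
    """Return True if the loop-breaking pattern is found in url.

    Hypothetical helper: assumes the filter uses search semantics.
    """
    return LOOP_PATTERN.search(url) is not None


BASE = "http://example.com/foo/"  # hypothetical site and directory

# Each extra "sth/../" gives a new URL string for the same directory.
for repeats in range(1, 4):
    relative = "sth/../" * repeats
    url = BASE + relative
    # urljoin resolves the ".." segments, showing where the URL really leads.
    print(url, "->", urljoin(BASE, relative), "blocked:", is_blocked(url))
```

In this sketch only the URL with three repetitions is rejected; the shorter
ones pass. Note that a legitimate URL in which the same segment happens to
appear three times, such as /a/x/a/y/a/, would be rejected as well.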

On 9/7/07, Smith Norton <[EMAIL PROTECTED]> wrote:
> /sth/../sth/../sth/ also works. Thanks for the quick response.
>
> Why is this filter necessary? It says to break out of loops.
>
> Could someone please tell me what can go wrong if I choose to remove this
> filter?
>
> On 9/7/07, Damian Florczyk <[EMAIL PROTECTED]> wrote:
> > "Smith Norton" <[EMAIL PROTECTED]> wrote:
> >
> > > This is a very basic question and unfortunately I am not able to
> > > figure this out.
> > >
> > > In the regex-urlfilter.txt, I find this line present:-
> > >
> > > # skip URLs with slash-delimited segment that repeats 3+ times, to break loops
> > > -.*(/.+?)/.*?\1/.*?\1/
> > >
> > > What type of URLs does it block? What does 'segment' mean here? Could
> > > someone please provide an example of a URL that this particular regex
> > > will select and prevent from being crawled?
> > For example:
> >
> > /sth/../sth/../sth/
> >
> > --
> > Damian Florczyk aka thunder
> > Gentoo Developer, Gentoo/NetBSD Development Lead
> >
>
