If I'm in directory foo, which has a subdirectory sth, a URL ending in /sth/../sth/../sth/../ still points to foo. Repeating (/sth/..)* any number of times gives arbitrarily long URLs that all point to the same place, so a crawler could keep fetching them without ever reaching new content, which is wasteful.
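
To see which URLs the rule catches, here is a minimal sketch that runs the pattern quoted below through Python's re module. The host example.com and the helper name is_blocked are made up for illustration, and I am assuming the filter rejects a URL when the pattern is found anywhere in it; the real filter runs on Java's regex engine, which may differ in corner cases.

```python
import re

# The pattern from regex-urlfilter.txt without the leading "-".
# The "-" only marks the rule as "reject"; it is not part of the regex.
LOOP_PATTERN = re.compile(r".*(/.+?)/.*?\1/.*?\1/")


def is_blocked(url: str) -> bool:
    """Return True if the loop rule would reject this URL.

    Assumption: the rule fires when the pattern is found anywhere in the URL.
    """
    return LOOP_PATTERN.search(url) is not None


if __name__ == "__main__":
    # Hypothetical URLs, chosen only to exercise the pattern.
    samples = [
        "http://example.com/sth/../sth/../sth/",  # "/sth" occurs 3 times
        "http://example.com/a/b/a/c/a/",          # "/a" occurs 3 times, not adjacent
        "http://example.com/sth/../sth/",         # only 2 occurrences
        "http://example.com/docs/index.html",     # no repetition
    ]
    for url in samples:
        print(is_blocked(url), url)
```

The first two samples are rejected and the last two pass. Note that the repeats do not have to be adjacent: the captured "/sth" (the 'segment') only has to show up three times, each time followed by a slash, with anything in between.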

On 9/7/07, Smith Norton <[EMAIL PROTECTED]> wrote:
> /sth/../sth/../sth/ also works. Thanks for the quick response.
>
> Why is this filter necessary? It says to break out of loops.
>
> Could someone please tell me what can go wrong if I chose to remove this
> filter?
>
> On 9/7/07, Damian Florczyk <[EMAIL PROTECTED]> wrote:
> > "Smith Norton" <[EMAIL PROTECTED]> wrote:
> >
> > > This is a very basic question and unfortunately I am not able to
> > > figure this out.
> > >
> > > In the regex-urlfilter.txt, I find this line present:-
> > >
> > > # skip URLs with slash-delimited segment that repeats 3+ times, to break
> > > loops
> > > -.*(/.+?)/.*?\1/.*?\1/
> > >
> > > What type of URLs does it block? What does 'segment' mean here? Could
> > > someone please provide an example of an URL that this particular regex
> > > will select and prevent from being crawled.
> > For example:
> >
> > /sth/../sth/../sth/
> >
> > --
> > Damian Florczyk aka thunder
> > Gentoo Developer, Gentoo/NetBSD Development Lead
> >
>
