/sth/../sth/../sth/ also works. Thanks for the quick response. Why is this filter necessary? The comment above it says it is there to break loops.
Could someone please tell me what can go wrong if I choose to remove this filter?

On 9/7/07, Damian Florczyk <[EMAIL PROTECTED]> wrote:
> "Smith Norton" <[EMAIL PROTECTED]> wrote:
>
> > This is a very basic question and unfortunately I am not able to
> > figure this out.
> >
> > In the regex-urlfilter.txt, I find this line present:-
> >
> > # skip URLs with slash-delimited segment that repeats 3+ times, to break loops
> > -.*(/.+?)/.*?\1/.*?\1/
> >
> > What type of URLs does it block? What does 'segment' mean here? Could
> > someone please provide an example of an URL that this particular regex
> > will select and prevent from being crawled.
>
> For example:
>
> /sth/../sth/../sth/
>
> --
> Damian Florczyk aka thunder
> Gentoo Developer, Gentoo/NetBSD Development Lead
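
For anyone who wants to try the rule against their own URLs, here is a minimal sketch in Python. It is not how Nutch itself evaluates the filter: Nutch uses Java regular expressions, and this sketch assumes the rule fires when the pattern is found anywhere in the URL. The helper name is_skipped and the example.com URLs are made up for illustration; only the pattern and the /sth/../sth/../sth/ path come from this thread.

```python
import re

# Pattern copied from the regex-urlfilter.txt line quoted above, without the
# leading "-" (which marks the rule as an exclusion rather than being regex).
REPEATED_SEGMENT = re.compile(r".*(/.+?)/.*?\1/.*?\1/")


def is_skipped(url: str) -> bool:
    """Return True if the pattern is found somewhere in the URL.

    Hypothetical helper: assumes "found anywhere" semantics for the rule.
    """
    return REPEATED_SEGMENT.search(url) is not None


if __name__ == "__main__":
    samples = [
        "http://example.com/sth/../sth/../sth/",     # path from this thread
        "http://example.com/a/b/a/b/a/b/",           # made-up looping path
        "http://example.com/docs/guide/index.html",  # no repeated segment
    ]
    for url in samples:
        print(url, "skip" if is_skipped(url) else "keep")
```

The group (/.+?) captures a slash plus a segment, and each \1/ requires that same text to appear again followed by a slash, so the URL has to contain the captured segment three times before the rule applies.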
