np, you are welcome
Dean Elwood wrote:
Ah, thanks EM, so basically we need to escape the dots.
That's something that didn't even occur to me. Many thanks!
Dean
----- Original Message -----
From: "EM" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Sunday, November 06, 2005 9:43 PM
Subject: Re: Not crawling specific pages
Here's the relevant extract from my crawl-urlfilter.txt file:-
# Site to crawl
+^http://([a-z0-9]*\.)*mysite.org/
# ignore error pages
-^http://www.mysite.org/view/.error_page
As you can see, I guessed that I could simply use the minus
sign to exclude the page that I don't want crawled.
This doesn't seem to work. Any guidance would be greatly appreciated.
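A minimal sketch of how a first-match URL filter would treat the two rules in the extract above, using Python's `re.search` as a stand-in for the filter's regex engine. It assumes the semantics described in the comments of Nutch's default crawl-urlfilter.txt: rules are tried in file order, the first matching pattern decides, and a URL that matches no pattern is ignored. The names `accept`, `RULES` and `REORDERED` are hypothetical. Under that assumption, the site-wide `+` rule matches the error page before the `-` rule is ever consulted.

```python
import re

# Rules in the order the extract above lists them (hypothetical list form).
RULES = [
    ("+", r"^http://([a-z0-9]*\.)*mysite.org/"),
    ("-", r"^http://www.mysite.org/view/.error_page"),
]

# Same rules with the more specific "-" line moved above the "+" line.
REORDERED = [RULES[1], RULES[0]]

def accept(url, rules):
    """Return True if the first rule whose pattern is found in `url` is '+'.

    Assumes first-match-wins semantics and that a URL matching no rule is
    rejected; patterns are searched for anywhere in the URL, so the leading
    '^' is what anchors them to the start.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False

if __name__ == "__main__":
    page = "http://www.mysite.org/view/.error_page"
    print("original order:", accept(page, RULES))
    print("reordered:     ", accept(page, REORDERED))
```

In this sketch the page is accepted with the original order and rejected once the `-` line comes first, while ordinary pages such as `http://www.mysite.org/index.html` are still accepted by the `+` rule.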
Any dot in the URL has to be replaced with "\." (without the quotes).
A bare dot in the expression matches any character.
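A small Python check of the point about dots, again using `re.search` as a stand-in (for simple constructs such as `.`, `\.`, `*` and `^`, Perl-style regex engines behave the same way). `LOOSE`, `STRICT`, `matches` and the look-alike URL are made up for illustration. Note that a bare dot also matches a literal dot, so escaping makes the rule stricter; on its own it does not change whether the real error page matches.

```python
import re

# Pattern as written in the question: each bare "." matches ANY character.
LOOSE = r"^http://www.mysite.org/view/.error_page"
# Same pattern with every dot escaped: "\." matches only a literal dot.
STRICT = r"^http://www\.mysite\.org/view/\.error_page"

def matches(pattern, url):
    """Return True if `pattern` is found in `url` (search, not full match)."""
    return re.search(pattern, url) is not None

if __name__ == "__main__":
    urls = [
        "http://www.mysite.org/view/.error_page",  # the page to exclude
        "http://wwwxmysite.org/view/_error_page",  # hypothetical look-alike URL
    ]
    for url in urls:
        print(url, matches(LOOSE, url), matches(STRICT, url))
```

Running it prints `True True` for the real page and `True False` for the look-alike: both patterns catch the intended URL, but only the escaped one refuses URLs that merely have other characters where the dots should be.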
For more, google "regex".
Hope this helps,
EM