How do I use a URL pattern that has a '?' or '\' in the URL (Was Re: Spidering Help)

Robert O'Connor Wed, 02 Oct 2002 05:30:10 -0700

> I'm using the new version of the Plucker Desktop, and I'm trying to Pluck an
> article that spans several pages, but I'm not having much luck, could someone
> point out what I'm doing wrong?
> 
> The article is at:
> http://avault.com/developer/getarticle.asp?name=bsawyer1
> 
> and each page is linked thus:
> http://avault.com/developer/getarticle.asp?name=bsawyer1&page=2
> http://avault.com/developer/getarticle.asp?name=bsawyer1&page=3
> etc etc.
>
> I used a
> URL pattern filter of ".*avault.com/developer/getarticle.asp?name=bsawyer1.*".


Hi Ian,

The URL above has a '?' in it, which needs to be escaped with a backslash,
because '?' has a special meaning in regular expressions: it makes the
preceding character optional. For example, try this URL pattern:
.*avault\.com/developer/getarticle\.asp\?name=bsawyer.*

(It is not as important to escape the periods, since an unescaped '.' matches
any character, including a literal period. The '?' is the one that matters.)
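
To see the difference outside of Plucker, here is a minimal sketch using
Python's re module directly. It assumes the pattern is matched against the
whole URL with re.match; I have not checked exactly how the parser applies
the pattern, so treat that part as an assumption.

```python
import re

# One of the article pages from the original question.
url = "http://avault.com/developer/getarticle.asp?name=bsawyer1&page=2"

# Pattern as first tried: the bare '?' makes the 'p' in 'asp' optional
# instead of matching a literal question mark.
unescaped = r".*avault.com/developer/getarticle.asp?name=bsawyer1.*"

# Pattern with the '?' (and the periods) escaped.
escaped = r".*avault\.com/developer/getarticle\.asp\?name=bsawyer.*"

unescaped_hit = re.match(unescaped, url)
escaped_hit = re.match(escaped, url)

print("unescaped matches:", unescaped_hit is not None)
print("escaped matches:  ", escaped_hit is not None)
```

The unescaped pattern is really looking for "getarticle.as" or
"getarticle.asp" followed immediately by "name=", which never occurs in the
URL, so nothing gets through the filter.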

This would probably also apply to MSW users using a URL pattern on their local
filesystem, as many backslash-letter combinations have special meaning, and the
MSW filesystem unfortunately uses backslashes.
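
A rough illustration of the backslash problem, again using Python's re module
on its own. The path below is made up for the example, and matching with
re.match is an assumption about how the pattern gets applied.

```python
import re

# Hypothetical local path, for illustration only.
path = r"C:\temp\notes\index.html"

# Typed in as-is: the regex engine reads \t as a tab and \n as a newline,
# so this pattern never sees the literal backslashes in the path.
naive = r".*\temp\notes.*"

# Each literal backslash doubled by hand.
by_hand = r".*C:\\temp\\notes.*"

# re.escape() does the same escaping for a literal piece of text.
auto = ".*" + re.escape(r"C:\temp\notes") + ".*"

naive_hit = re.match(naive, path)
by_hand_hit = re.match(by_hand, path)
auto_hit = re.match(auto, path)

print("naive matches:  ", naive_hit is not None)
print("by hand matches:", by_hand_hit is not None)
print("auto matches:   ", auto_hit is not None)
```

Other backslash-letter pairs may behave differently again (some are special
sequences such as \d, and some are rejected outright by newer Python
versions), so doubling every backslash is the safe habit.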

This would make a good tip to add to the help: mention the need to escape
special characters in URLs.

Here is a one-page reference for regular expression syntax, as it applies to
the Python parser:
http://www.python.org/doc/current/lib/re-syntax.html

Best wishes,
Robert
_______________________________________________
plucker-list mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list
