The probability of encountering a $ sign somewhere inside a URL is not insignificant. I agree that it's very unlikely (perhaps even illegal) to use ^ in URLs, but $ is sometimes used.

I'd have to take a look at the spec, but I think both characters should be URL-encoded anyway. Maybe it'd be a good idea to include a URL-normalizing filter that would encode everything properly (according to www-url-encoding) before the regexes are applied?
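
To make the idea concrete, here is a rough sketch of such a normalizing step. It is written in Python only for brevity and is not Nutch code; the name normalize_url and the SAFE character set are my own assumptions. It percent-encodes $ and ^ (and anything else outside the set) while leaving existing %XX escapes alone, so a later regex only ever sees the encoded forms.

```python
from urllib.parse import quote

# Characters passed through unchanged (an assumption for this sketch):
# common URL delimiters plus "%", so existing %XX escapes are not
# encoded a second time. "$" and "^" are deliberately left out of the
# set so that both get percent-encoded.
SAFE = ":/?#[]@!&'()*+,;=%"


def normalize_url(url: str) -> str:
    """Percent-encode every character outside SAFE and the unreserved set.

    Letters, digits and "_.-~" are never encoded by quote().
    """
    return quote(url, safe=SAFE)
```

With this in place, "http://example.com/a$b^c?x=1" becomes "http://example.com/a%24b%5Ec?x=1", and running the function on its own output changes nothing. Whether $ really has to be encoded is something the spec would have to settle; the sketch just follows the suggestion above.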

D.


_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
