On Monday, December 15, 2003, at 02:34 PM, Al wrote:

"Philip J. Newman" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
so far i got ...
if (!ereg("^http:\/\/([_\.0-9a-zA-Z-]+\.)+[a-zA-Z]/i",$websiteUrl) {

On a quick glance, three things stand out:


1) You allow underscores in the website domain, but these are not valid
characters for domain names.
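2) You allow dashes in the first pattern, but do not escape the dash
character. (It works here only because the dash sits last in the class,
where it is literal; move it and it becomes a range.)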
3) You have not provided a quantifier for the top-level domain pattern
[a-zA-Z], so it only looks for one character that fits the class [a-zA-Z].
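
As a rough sketch of those three fixes, assuming a switch from ereg() to the PCRE function preg_match() (the helper name looks_like_http_host is made up for illustration, and it still only checks the "http://host.tld" prefix, like the original pattern):

```php
<?php
// Hypothetical helper, not from the original post: checks only that the
// string starts with "http://" followed by a plausible host name.
function looks_like_http_host(string $url): bool
{
    // 1) no underscore in the label class
    // 2) the dash is placed last in the class, where it is literal
    // 3) the top-level domain has a quantifier: two or more letters
    $pattern = '~^http://([0-9a-z-]+\.)+[a-z]{2,}~i';

    return preg_match($pattern, $url) === 1;
}

var_dump(looks_like_http_host('http://example.com'));  // bool(true)
var_dump(looks_like_http_host('http://my_site.com'));  // bool(false)
```

Using ~ as the delimiter avoids having to escape the slashes in "http://".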

I'd also add (depending on the situation):


4) allow for https:// and ftp:// and other schemes
5) allow for .htaccess password combinations (guest:[EMAIL PROTECTED])
6) allow for a trailing slash (http://example.com/)
7) allow for directories and files [paths] (http://example.com/path/to/foo.something)
8) allow for query strings (everything after the ?) and anchors (#)


#7 & #8 may seem stupid, but not everyone's base url is a full domain -- sometimes people are buried a few levels deep in directories, and you will undoubtedly get people copying and pasting things like 'example.com/index.html?id=5#foo'

Now, the above is starting to look pretty complex, huh?

Take a look at http://php.net/parse_url

It returns the URL as an array of manageable chunks (scheme, host, path, query, fragment), on which you can then run small, focused checks and regexps to confirm the URL conforms to what you want.
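
Something along these lines, as a minimal sketch only: the helper name check_website_url, the list of allowed schemes, and the host pattern are my own assumptions, not anything parse_url() prescribes.

```php
<?php
// Hypothetical helper: returns a list of problems found in $url.
// An empty array means the URL passed every check.
function check_website_url(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false) {
        // parse_url() returns false on seriously malformed URLs.
        return ['could not be parsed'];
    }

    $problems = [];

    // Scheme: the allowed list here is an assumption; adjust to taste.
    $scheme = strtolower($parts['scheme'] ?? '');
    if (!in_array($scheme, ['http', 'https', 'ftp'], true)) {
        $problems[] = 'missing or unsupported scheme';
    }

    // Host: dot-separated labels of letters, digits and dashes,
    // ending in a top-level domain of two or more letters.
    $host = $parts['host'] ?? '';
    if (!preg_match('~^([0-9a-z-]+\.)+[a-z]{2,}$~i', $host)) {
        $problems[] = 'invalid host';
    }

    // Path, query and fragment are accepted as-is in this sketch.
    return $problems;
}

// A full URL with path, query string and anchor passes both checks.
print_r(check_website_url('http://example.com/path/to/foo.something?id=5#foo'));

// Without a scheme, parse_url() puts 'example.com/index.html' in 'path',
// so both the scheme check and the host check complain.
print_r(check_website_url('example.com/index.html?id=5#foo'));
```

Each check stays small enough to read at a glance, and the path, query and fragment can get their own checks later if you need them.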


Justin French


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


