php-general Digest 9 Jan 2011 08:06:06 -0000 Issue 7123

Topics (messages 310606 through 310614):

Re: Command line PHP
        310606 by: Daniel Brown

Re: Validate Domain Name by Regular Express
        310607 by: Ashley Sheridan
        310608 by: TR Shaw
        310609 by: Al
        310610 by: Tamara Temple
        310611 by: WalkinRaven
        310612 by: WalkinRaven
        310613 by: WalkinRaven
        310614 by: Ashley Sheridan

Administrivia:

To subscribe to the digest, e-mail:
        [email protected]

To unsubscribe from the digest, e-mail:
        [email protected]

To post to the list, e-mail:
        [email protected]


----------------------------------------------------------------------
--- Begin Message ---
On Sat, Jan 8, 2011 at 00:23, Larry Garfield <[email protected]> wrote:
> On Friday, January 07, 2011 9:34:42 pm David Hutto wrote:
>
>> Which yielded this as the first result:
>>
>>
>> http://php.net/manual/en/features.commandline.php
>
> As noted in my original email, I find the native SAPI clunky and difficult to
> work with.  Hence I was looking for something more usable and robust built on
> top of it that I could leverage rather than rolling my own one-off.  Of
> course, I got lost somewhere in the language holy wars (dear god, people...)
> so I'll probably just take the "roll my own" approach.

    Larry, I have a malicious process detector I wrote a while back.
Haven't maintained or even looked at the code in a while, and it was
only intended for my own usage (so it's not going to be in the best of
shape), but you're welcome to it if you want something to jump-start
your work.  Just let me know.

-- 
</Daniel P. Brown>
Network Infrastructure Manager
Documentation, Webmaster Teams
http://www.php.net/

--- End Message ---
--- Begin Message ---
On Sat, 2011-01-08 at 16:55 +0800, WalkinRaven wrote:

> PHP 5.3 PCRE
> 
> Regular expression to match the domain name format according to
> RFC 1034 - DOMAIN NAMES - CONCEPTS AND FACILITIES
> 
> /^
> (
>    [a-z]                 |
>    [a-z] (?:[a-z]|[0-9]) |
>    [a-z] (?:[a-z]|[0-9]|\-){1,61} (?:[a-z]|[0-9])
> )                   # One label
> 
> (?:\.(?1))*+        # More labels
> \.?                 # Root domain name
> $/iDx
> 
> This rule matches only <label> and <label>. but not <label>.<label>...
> 
> I don't know what's wrong with it.
> 
> Thank you.
> 
> 



I think trying to do all of this in one regex will prove more trouble
than it's worth. Maybe breaking it down into something like this:

<?php
$domain = "www.ashleysheridan.co.uk";
$valid = false;

$tlds = array('aero', 'asia', 'biz', 'cat', 'com', 'coop', 'edu', 'gov',
'info', 'int', 'jobs', 'mil', 'mobi', 'museum', 'name', 'net', 'org',
'pro', 'tel', 'travel', 'xxx', 'ac', 'ad', 'ae', 'af', 'ag', 'ai', 'al',
'am', 'an', 'ao', 'aq', 'ar', 'as', 'at', 'au', 'aw', 'ax', 'az', 'ba',
'bb', 'bd', 'be', 'bf', 'bg', 'bh', 'bi', 'bj', 'bm', 'bn', 'bo', 'br',
'bs', 'bt', 'bv', 'bw', 'by', 'bz', 'ca', 'cc', 'cd', 'cf', 'cg', 'ch',
'ci', 'ck', 'cl', 'cm', 'cn', 'co', 'cr', 'cu', 'cv', 'cx', 'cy', 'cz',
'de', 'dj', 'dk', 'dm', 'do', 'dz', 'ec', 'ee', 'eg', 'er', 'es', 'et',
'eu', 'fi', 'fj', 'fk', 'fm', 'fo', 'fr', 'ga', 'gb', 'gd', 'ge', 'gf',
'gg', 'gh', 'gi', 'gl', 'gm', 'gn', 'gp', 'gq', 'gr', 'gs', 'gt', 'gu',
'gw', 'gy', 'hk', 'hm', 'hn', 'hr', 'ht', 'hu', 'id', 'ie', 'il', 'im',
'in', 'io', 'iq', 'ir', 'is', 'it', 'je', 'jm', 'jo', 'jp', 'ke', 'kg',
'kh', 'ki', 'km', 'kn', 'kp', 'kr', 'kw', 'ky', 'kz', 'la', 'lb', 'lc',
'li', 'lk', 'lr', 'ls', 'lt', 'lu', 'lv', 'ly', 'ma', 'mc', 'md', 'me',
'mg', 'mh', 'mk', 'ml', 'mm', 'mn', 'mo', 'mp', 'mq', 'mr', 'ms', 'mt',
'mu', 'mv', 'mw', 'mx', 'my', 'mz', 'na', 'nc', 'ne', 'nf', 'ng', 'ni',
'nl', 'no', 'np', 'nr', 'nu', 'nz', 'om', 'pa', 'pe', 'pf', 'pg', 'ph',
'pk', 'pl', 'pm', 'pn', 'pr', 'ps', 'pt', 'pw', 'py', 'qa', 're', 'ro',
'rs', 'ru', 'rw', 'sa', 'sb', 'sc', 'sd', 'se', 'sg', 'sh', 'si', 'sj',
'sk', 'sl', 'sm', 'sn', 'so', 'sr', 'st', 'su', 'sv', 'sy', 'sz', 'tc',
'td', 'tf', 'tg', 'th', 'tj', 'tk', 'tl', 'tm', 'tn', 'to', 'tp', 'tr',
'tt', 'tv', 'tw', 'tz', 'ua', 'ug', 'uk', 'us', 'uy', 'uz', 'va', 'vc',
've', 'vg', 'vi', 'vn', 'vu', 'wf', 'ws', 'ye', 'yt', 'za', 'zm',
'zw', );


if(strlen($domain) <= 253)
{
        $labels = explode('.', $domain);
        if(in_array($labels[count($labels)-1], $tlds))
        {
                for($i=0; $i<count($labels) -1; $i++)
                {
                        // a label is wrong if it is too long, breaks the
                        // LDH rule, or consists only of digits
                        if(strlen($labels[$i]) > 63 ||
                           !preg_match('/^[a-z0-9](?:[a-z0-9\-]*[a-z0-9])?$/D', $labels[$i]) ||
                           preg_match('/^[0-9]+$/D', $labels[$i]))
                        {
                                $valid = false;
                                break;  // no point continuing if one label is wrong
                        }
                        else
                        {
                                $valid = true;
                        }
                }
        }
}

var_dump($valid);


This checks the last label against the list of TLDs, and every other
label against the standard a-z, 0-9 and hyphen rule given as the
preferred characters allowed in a label (the LDH rule), with the added
condition that the start and end character of a label isn't a hyphen
(oddly enough it doesn't mention starting with a digit!)

Also, each label is checked to ensure it doesn't run over 63 characters,
and the whole thing isn't over 253 characters. Lastly, each label is
checked to ensure it doesn't completely consist of digits.

I've tested it only with my domain so far, but it should work fairly
well. As I said before, I couldn't think of a way to do it all with one
regex. It could probably be done, but would you really want to create a
huge and difficult to read/understand expression just because it's
possible?

Thanks,
Ash
http://www.ashleysheridan.co.uk



--- End Message ---
--- Begin Message ---
On Jan 8, 2011, at 12:09 PM, Ashley Sheridan wrote:

> I've tested it only with my domain so far, but it should work fairly
> well. As I said before, I couldn't think of a way to do it all with one
> regex. It could probably be done, but would you really want to create a
> huge and difficult to read/understand expression just because it's
> possible?

Ash

I doubt it's possible, since the ccTLDs have valid domain names with three
or more dotted parts. You should see .us. And .uk doesn't follow the same
ccTLD rules as .tk, for example.

Now, if the purpose is to write a regex for a host name then that's a different 
story.

Tom

--- End Message ---
--- Begin Message ---





Look at filter_var() with FILTER_VALIDATE_URL. From the manual:

Validates value as URL (according to http://www.faqs.org/rfcs/rfc2396).
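
A short sketch of what that looks like, with two caveats stated as assumptions: FILTER_VALIDATE_URL wants a full URL with a scheme, so a bare domain name is rejected, and FILTER_VALIDATE_DOMAIN with FILTER_FLAG_HOSTNAME only exists from PHP 7.0, so it is not available on the PHP 5.3 mentioned in this thread. The helper names is_url and is_hostname are hypothetical.

```php
<?php
// Hypothetical helper: true if $value is a full URL (scheme required).
function is_url($value)
{
    return filter_var($value, FILTER_VALIDATE_URL) !== false;
}

// Hypothetical helper: true if $value looks like a hostname.
// Assumes PHP 7.0 or later, where FILTER_VALIDATE_DOMAIN is defined.
function is_hostname($value)
{
    return filter_var($value, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) !== false;
}

var_dump(is_url('http://www.example.com'));  // bool(true)
var_dump(is_url('www.example.com'));         // bool(false), no scheme
var_dump(is_hostname('www.example.com'));    // bool(true)
```

So for a bare domain name the URL filter alone is probably not what is wanted; the hostname filter is closer to a format-only check.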


--- End Message ---
--- Begin Message ---
On Jan 8, 2011, at 2:22 PM, Al wrote:



> Look at filter_var()
>
> Validates value as URL (according to http://www.faqs.org/rfcs/rfc2396)



I'm wondering what modifications this needs now that Unicode characters are allowed in domain names.
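
One possible approach, sketched under assumptions: convert the name to its ASCII ("xn--") form first, then apply the usual letter-digit-hyphen checks to the result. This relies on idn_to_ascii() from the intl extension, which may not be installed; to_ascii_name is a hypothetical helper name.

```php
<?php
// Hypothetical helper: convert an internationalized name to its ASCII
// (Punycode) form so the usual letter-digit-hyphen checks can be applied.
// Returns null when the intl extension is missing or the conversion fails.
function to_ascii_name($name)
{
    if (!function_exists('idn_to_ascii')) {
        return null;
    }
    $ascii = idn_to_ascii($name, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    return $ascii === false ? null : $ascii;
}

var_dump(to_ascii_name('bücher.example'));
```

The converted labels contain only letters, digits and hyphens, so a format check written for plain ASCII names should not need further changes.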



--- End Message ---
--- Begin Message ---




Thank you for replying, Ash.

I know it may be better to pre-process the name with something like explode(), which would allow a less complex regular expression. But I just want to know what the problem is in my regular expression.

As for the code you've offered, I don't like the idea of a limited set of suffixes, since the set may be updated from time to time. I just want to do format validation, not content validation.

As for the regular expression itself: yes, it is complex, but I've checked it several times very carefully, letter by letter, and I just don't understand what's wrong with it. Or is there some bug in the PCRE engine?
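
A likely explanation, offered as a reading of the pattern rather than a confirmed diagnosis: the alternation lists the one-letter form first, and the possessive *+ never gives a match back. So for ".com" the call (?1) takes "c" via the first alternative, the loop stops at "o", and $ fails with no way to retry the longer alternatives. The first label still works because the plain group ( ... ) can backtrack. A minimal sketch that sidesteps both issues by writing one label as a single expression (the function name matches_domain_format is made up for illustration, and the overall 255-octet limit is not checked):

```php
<?php
// Hypothetical helper: format-only check of a domain name with one regex.
// Each label starts with a letter, ends with a letter or digit, has
// hyphens only in the interior, and is at most 63 characters long.
function matches_domain_format($name)
{
    $label   = '[a-z](?:[a-z0-9-]{0,61}[a-z0-9])?';
    $pattern = '/^' . $label . '(?:\.' . $label . ')*\.?$/iD';
    return preg_match($pattern, $name) === 1;
}

var_dump(matches_domain_format('example'));          // bool(true)
var_dump(matches_domain_format('example.'));         // bool(true)
var_dump(matches_domain_format('www.example.com'));  // bool(true)
var_dump(matches_domain_format('-bad.example'));     // bool(false)
```

Keeping the original structure and simply listing the longest alternative first should also work, since the engine then tries the long form before settling for a single letter.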
--- End Message ---
--- Begin Message ---
Right, RFC 1034 allows any number of dot-separated parts, as long as the total length does not exceed 255.
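
For what it's worth, the 255 here and the 253 in Ash's check appear to be the same limit counted two ways. On the wire each label is stored with one length octet, and the name ends with a zero-length root label, so a dotted name of N characters (without the trailing dot) takes N + 2 octets. A small sketch of that arithmetic; wire_length is a hypothetical helper, not part of PHP, and it does no validation of the labels themselves:

```php
<?php
// Hypothetical helper: number of octets a name occupies in DNS wire format.
// Each label costs one length octet plus its characters; the root label
// adds one final zero-length octet.
function wire_length($name)
{
    $name = preg_replace('/\.$/D', '', $name);  // drop one optional trailing dot
    if ($name === '') {
        return 1;                               // root label only
    }
    $total = 1;                                 // terminating root label
    foreach (explode('.', $name) as $label) {
        $total += 1 + strlen($label);
    }
    return $total;
}

var_dump(wire_length('www.example.com'));  // int(17), i.e. 15 characters + 2
```

A 253-character dotted name therefore lands exactly on the 255-octet limit.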

On 01/09/2011 01:21 AM, TR Shaw wrote:
Ash

I doubt it's possible, since the ccTLDs have valid domain names with three
or more dotted parts. You should see .us. And .uk doesn't follow the same
ccTLD rules as .tk, for example.

Now, if the purpose is to write a regex for a host name then that's a different 
story.

Tom

--
Me at:
http://WalkinRaven.name


--- End Message ---
--- Begin Message ---
On Sun, 2011-01-09 at 11:37 +0800, WalkinRaven wrote:

> 
> Thank you for replying, Ash.
> 
> I know it may be better to pre-process the name with something like
> explode(), which would allow a less complex regular expression. But I
> just want to know what the problem is in my regular expression.
> 
> As for the code you've offered, I don't like the idea of a limited set
> of suffixes, since the set may be updated from time to time. I just want
> to do format validation, not content validation.
> 
> As for the regular expression itself: yes, it is complex, but I've
> checked it several times very carefully, letter by letter, and I just
> don't understand what's wrong with it. Or is there some bug in the PCRE
> engine?


The list there limits it to the currently valid TLDs. That's
correct, and is what a proper validator should do. You can always add
new ones to the array as necessary.

Thanks,
Ash
http://www.ashleysheridan.co.uk



--- End Message ---
