Edit report at https://bugs.php.net/bug.php?id=61018&edit=1
ID:                 61018
Comment by:         mattfic...@php.net
Reported by:        dey101+php at gmail dot com
Summary:            Unexplained bool(false) returned from preg_match
Status:             Open
Type:               Bug
Package:            PCRE related
Operating System:   Windows Server 2008
PHP Version:        5.3.10
Block user comment: N
Private report:     N

New Comment:

Thank you for your report and for helping to make PHP better.

When I ran your script on Windows 2008 and on Linux (using a TS build of
PHP 5.3.10), the output was the same on both OSes, so I don't think this
is a PHP on Windows bug.

If you would like, I can reclassify this bug as a general bug, not
specific to Windows. Or am I missing something? Is this really a PHP on
Windows problem?

win2008 sp1 x64 output (TS build):

Regex: /^[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHI234567890.
    Result: (no match) int(0)
  Host: ABCDEFGHIJ1234567890
    Result: (match) int(1)
  Host: ABCDEFGHI1234567890
    Result: (match) int(1)
  Host: ABCDEFGHI123456789
    Result: (match) int(1)
  Host: ABCDEFGHIJ-1234567890
    Result: (match) int(1)
  Host: ABCDEFGHIJ-123456789
    Result: (match) int(1)
  Host: ABCDEFGHI-123456789
    Result: (match) int(1)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (no match) int(0)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (no match) int(0)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (match) int(1)
  Host: ABCDEFGHI234567890.
    Result: (match) int(1)
  Host: ABCDEFGHIJ1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI123456789
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHIJ-123456789
    Result: (error) bool(false)
  Host: ABCDEFGHI-123456789
    Result: (no match) int(0)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (error) bool(false)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (error) bool(false)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)+$/
  Host: ABCDEFGHIJ1234567890.
    Result: (match) int(1)
  Host: ABCDEFGHI234567890.
    Result: (match) int(1)
  Host: ABCDEFGHIJ1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHI1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHI123456789
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-123456789
    Result: (no match) int(0)
  Host: ABCDEFGHI-123456789
    Result: (no match) int(0)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (error) bool(false)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (error) bool(false)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHI234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHIJ1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI123456789
    Result: (match) int(1)
  Host: ABCDEFGHIJ-1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHIJ-123456789
    Result: (error) bool(false)
  Host: ABCDEFGHI-123456789
    Result: (match) int(1)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (match) int(1)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (match) int(1)

Linux-x64-gentoo output:

Regex: /^[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHI234567890.
    Result: (no match) int(0)
  Host: ABCDEFGHIJ1234567890
    Result: (match) int(1)
  Host: ABCDEFGHI1234567890
    Result: (match) int(1)
  Host: ABCDEFGHI123456789
    Result: (match) int(1)
  Host: ABCDEFGHIJ-1234567890
    Result: (match) int(1)
  Host: ABCDEFGHIJ-123456789
    Result: (match) int(1)
  Host: ABCDEFGHI-123456789
    Result: (match) int(1)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (no match) int(0)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (no match) int(0)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (match) int(1)
  Host: ABCDEFGHI234567890.
    Result: (match) int(1)
  Host: ABCDEFGHIJ1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI123456789
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHIJ-123456789
    Result: (error) bool(false)
  Host: ABCDEFGHI-123456789
    Result: (no match) int(0)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (error) bool(false)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (error) bool(false)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)+$/
  Host: ABCDEFGHIJ1234567890.
    Result: (match) int(1)
  Host: ABCDEFGHI234567890.
    Result: (match) int(1)
  Host: ABCDEFGHIJ1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHI1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHI123456789
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-123456789
    Result: (no match) int(0)
  Host: ABCDEFGHI-123456789
    Result: (no match) int(0)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (error) bool(false)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (error) bool(false)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHI234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHIJ1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI123456789
    Result: (match) int(1)
  Host: ABCDEFGHIJ-1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHIJ-123456789
    Result: (error) bool(false)
  Host: ABCDEFGHI-123456789
    Result: (match) int(1)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (match) int(1)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (match) int(1)

Previous Comments:
------------------------------------------------------------------------
[2012-02-08 18:43:42] dey101+php at gmail dot com

Description:
------------
PHP VC9 x86 Thread Safe (from http://windows.php.net/download/)

Using a regex to validate whether a string is a valid hostname (host or
FQDN). It seems that, for strings of certain lengths, trying to match a
literal period at the end causes preg_match to return false if the
string does not have a period in it. It also returns false if the string
has a period at the end and the regex does not try to match it. The
regex uses subpatterns () to apply the zero-or-more repetition
quantifier *.
I tried both capturing () and non-capturing (?:) subpatterns; both yield
the same result. However, if I use the one-or-more quantifier +, it does
not return bool(false). Using {0,} instead of * does not change the
outcome.

It seems that the cutoff length for the string is about 20 characters.
Below that, the result is int(0) or int(1) depending on whether the
regex matches; above that, bool(false) is returned. If the subpattern is
part of a longer string, it does work as anticipated. Matching a literal
period at the beginning of the pattern does not yield an error.
Substituting a-zA-Z0-9 for the [:alnum:] character class does not affect
the results.

error_get_last() does not return anything, and nothing shows up in the
logs with error_reporting(-1) set either.

Test script:
---------------
<?php
$regexs = array(
    '/^[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/',
    '/^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*$/',
    '/^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)+$/',
    '/^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/'
);

$hosts = array(
    'ABCDEFGHIJ1234567890.',         // long string with period at end
    'ABCDEFGHI234567890.',           // slightly shorter string with period at end
    'ABCDEFGHIJ1234567890',          // long string no period
    'ABCDEFGHI1234567890',           // a little shorter
    'ABCDEFGHI123456789',            // even shorter
    'ABCDEFGHIJ-1234567890',         // long with hyphen
    'ABCDEFGHIJ-123456789',          // shorter with hyphen
    'ABCDEFGHI-123456789',           // even shorter with hyphen
    'WWW.ABCDEFGHIJ-1234567890.COM', // a FQDN with long string and hyphen
    'WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM' // a really long FQDN
);

// Run every regex against every host and report match / no match / error.
foreach ($regexs as $regex) {
    echo "\nRegex: $regex\n";
    foreach ($hosts as $host) {
        echo "  Host: $host\n";
        $result = preg_match($regex, $host);
        echo '    Result: ';
        if ($result === false) {
            echo '(error) ';
            print_r(error_get_last()); // never prints anything?
        } else {
            echo ($result) ?
                '(match) ' : '(no match) ';
        }
        var_dump($result);
    }
}

Expected result:
----------------
None of the results should yield bool(false).

Actual result:
--------------
// just the output from the last regex, but others yield bool(false)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHI234567890.
    Result: (error) bool(false)
  Host: .ABCDEFGHIJ1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHIJ1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI123456789
    Result: (match) int(1)
  Host: ABCDEFGHIJ-1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHIJ-123456789
    Result: (error) bool(false)
  Host: ABCDEFGHI-123456789
    Result: (match) int(1)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (match) int(1)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (match) int(1)

------------------------------------------------------------------------
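
The description notes that error_get_last() stays empty when preg_match() returns false. preg_last_error() returns the error code of the last PCRE call, so a minimal sketch along the following lines might show what kind of failure is being reported. The helper names preg_error_name() and checked_preg_match() are hypothetical and not part of the original test script. The idea that a PCRE backtrack or recursion limit is involved is an assumption, not something this report confirms.

```php
<?php
// Hypothetical helper (not in the original report): translate a
// preg_last_error() code into the name of the matching PREG_* constant.
function preg_error_name($code)
{
    $names = array(
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
    );
    // Codes outside this table are reported by number only.
    return isset($names[$code]) ? $names[$code] : "unknown ($code)";
}

// Hypothetical wrapper: run preg_match() and pair its return value with
// the name of the PCRE error state left behind by that call.
function checked_preg_match($regex, $subject)
{
    $result = preg_match($regex, $subject);
    return array($result, preg_error_name(preg_last_error()));
}

// One of the regex/host pairs from the test script that returned bool(false).
list($result, $error) = checked_preg_match(
    '/^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*$/',
    'ABCDEFGHIJ1234567890'
);
var_dump($result);
echo $error, "\n";
```

If the printed name turns out to be one of the limit errors, the pcre.backtrack_limit and pcre.recursion_limit ini settings would be the next things to look at. The result may differ between PHP and PCRE versions.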