From: strata_ranger at hotmail dot com
Operating system: *
PHP version: 5.2.10
PHP Bug Type: PCRE related
Bug description: PREG_BAD_UTF8_ERROR should emit E_NOTICE
Description:
------------
This is not a PHP bug, but a suggestion that would help with
troubleshooting PCRE calls in one's own PHP scripts.
When using the /u modifier in PCRE, if the subject string contains an
invalid Unicode sequence, this generates a PREG_BAD_UTF8_ERROR (which can
be retrieved using preg_last_error() ). This is expected behavior for
PCRE, but it should also emit an E_NOTICE to the user because it could
indicate an error in their script (the definition of an E_NOTICE).
Specifically, when preg_replace() is used in an assignment context (e.g.
$subject = preg_replace($foo, $bar, $subject) ), a PREG_BAD_UTF8_ERROR
causes the subject string to be "erased" (re-assigned NULL) if the script
author did not first ensure that the subject string was valid UTF-8
before calling preg_replace().
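A minimal sketch of such a pre-check, assuming a hypothetical helper named
is_valid_utf8() (not a PHP built-in). It relies on the behavior described
above: with the /u modifier, preg_match() fails on a malformed subject, so an
empty pattern can serve as a validity test before the subject is overwritten.

```php
<?php
// Hypothetical helper (not part of PHP): true if $s is valid UTF-8.
function is_valid_utf8($s)
{
    // With /u, the subject is validated before matching; the empty pattern
    // matches any valid string (returns 1) and fails on bad UTF-8 (false).
    return preg_match('//u', $s) === 1;
}

$subject = "fa\xa0ade"; // Valid in ISO-8859-1 (but not UTF-8!)

// Only overwrite $subject when the replacement cannot fail on encoding.
if (is_valid_utf8($subject)) {
    $subject = preg_replace('/a/u', 'A', $subject);
}
// Otherwise $subject is left untouched.
```

The check costs one extra pass over the string, which may or may not be
acceptable depending on how large the subjects are.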
Even though it's the fault of the script author, the preg_* functions
should still at least emit an E_NOTICE about bad UTF-8; it's a pain to hunt
through one's proverbial 'miles of code' to figure out why a variable
suddenly 'disappeared', without a file name or line number from which to
start troubleshooting.
Workarounds available in the meantime are:
// As of PHP 5.3 (falls back to the original whenever the result is falsy:
// NULL on error, but also the strings '' and '0')
$string = preg_replace(..., $string) ?: $string;
// Other workaround (any PHP version)
$string = is_string($repl = preg_replace(..., $string)) ? $repl : $string;
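The requested behavior can also be approximated in userland. The following is
a rough sketch only, assuming a hypothetical wrapper named
preg_replace_or_notice() (not part of PHP) and a string subject: it raises an
E_USER_NOTICE carrying the preg_last_error() code and returns the original
subject instead of NULL.

```php
<?php
// Hypothetical wrapper (not part of PHP): like preg_replace() for a string
// subject, but on failure it raises E_USER_NOTICE and keeps the original.
function preg_replace_or_notice($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // preg_replace() returns NULL on error; report the PCRE error code.
        trigger_error(
            'preg_replace() failed, preg_last_error() = ' . preg_last_error(),
            E_USER_NOTICE
        );
        return $subject;
    }
    return $result;
}

$string = "fa\xa0ade"; // Not valid UTF-8
$string = preg_replace_or_notice('//u', '', $string); // Notice raised, $string kept
```

Note that the notice reports the file and line of the trigger_error() call
inside the wrapper, not of the caller, so it is still less helpful than a
notice emitted by the preg_* functions themselves would be.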
Reproduce code:
---------------
---
From manual page: reference.pcre.pattern.modifiers
---
error_reporting(-1); // Emit all errors
$subject = "fa\xa0ade"; // Valid in ISO-8859-1 (but not UTF-8!)
// Causes a PREG_BAD_UTF8_ERROR and sets $subject to NULL.
// And we didn't make a copy of the original $subject. Oops!
$subject = preg_replace('//u', '', $subject);
var_dump($subject); // NULL
var_dump(preg_last_error()); // int(4), i.e. PREG_BAD_UTF8_ERROR
---
Actual result:
--------------
preg_replace() returns NULL; checking preg_last_error() verifies a
PREG_BAD_UTF8_ERROR. No errors, warnings, or notices of any kind were
generated.
We did, however, immediately assign the result of preg_replace() back to
$subject, so $subject is now NULL and has lost whatever data it originally
contained.
Even though this was obviously our fault, an E_NOTICE would have told us
about it.
--
Edit bug report at http://bugs.php.net/?id=49339&edit=1
--