 ID:               35711
 Updated by:       [EMAIL PROTECTED]
 Reported By:      matteo at beccati dot com
-Status:           Open
+Status:           Assigned
 Bug Type:         mbstring related
 Operating System: Debian GNU/Linux
 PHP Version:      5.1-dev
 Assigned To:      hirokawa
 New Comment:

Rui, could you take a look at this once again?


Previous Comments:
------------------------------------------------------------------------

[2005-12-24 13:59:10] matteo at beccati dot com

I've made a patch which adds an mbstring.strict_detection php.ini flag
that specifies the default behaviour (defaults to off).

I have only just started looking at PHP internals, so I may have made
mistakes; make test passes the mbstring-related checks, and I'll do more
tests later.

http://beccati.com/download/mbstring-patch-20051224.txt

------------------------------------------------------------------------

[2005-12-24 12:30:08] matteo at beccati dot com

This is great news and I'm really thankful for your help.

Now mb_detect_encoding works correctly when the strict flag is set,
but...

- There's no way to set the strict flag in mb_convert_encoding; however,
  one can pass the result of mb_detect_encoding with the strict flag as
  the source charset.
- There's no way to set the strict flag for http_input translation,
  where it would be much more useful (that's how I found the problem
  described here).

------------------------------------------------------------------------

[2005-12-24 02:23:06] [EMAIL PROTECTED]

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.

Thank you for the report, and for helping us make PHP better.

Character-end detection has been added to strict mode
(mb_detect_encoding($s, $list, TRUE)).
Please try strict mode.

------------------------------------------------------------------------

[2005-12-24 01:03:21] [EMAIL PROTECTED]

Have you tried strict mode (default: FALSE)?

string mb_detect_encoding ( string str [, mixed encoding_list [, bool strict]] )

------------------------------------------------------------------------

[2005-12-20 17:10:56] matteo at beccati dot com

Of course, I agree that 0xe8 is valid if taken as part of a multibyte
character, but I don't think it can be considered valid if the following
bytes are missing (because the string ends prematurely).

The iconv extension raises notices when it finds illegal or incomplete
multibyte characters; I don't see why mbstring should accept as valid
UTF-8 a string which isn't. The same should apply to other multibyte
encodings.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/35711
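
For illustration only, the workaround mentioned in the 12:30:08 comment (detect the encoding strictly, then pass the result to mb_convert_encoding as the source charset) might be sketched as follows. The helper name convert_with_strict_detection and the candidate list are hypothetical and are not part of mbstring or of the patch; the sketch assumes a PHP build where the strict argument of mb_detect_encoding performs the character-end check described above.

```php
<?php
// Hypothetical helper (not part of mbstring): convert $s to $to, choosing the
// source encoding with strict detection so that truncated multibyte
// sequences are not accepted as valid.
function convert_with_strict_detection(string $s, string $to, array $candidates): ?string
{
    // Third argument TRUE enables strict mode.
    $from = mb_detect_encoding($s, $candidates, true);
    if ($from === false) {
        return null; // no candidate encoding matches strictly
    }
    return mb_convert_encoding($s, $to, $from);
}

// 0xE8 opens a three-byte UTF-8 sequence, but the string ends before the
// continuation bytes arrive, so strict detection should not pick UTF-8.
$truncated = "\xE8";
$result = convert_with_strict_detection($truncated, 'UTF-8', ['UTF-8', 'ISO-8859-1']);
var_dump($result === null ? null : bin2hex($result));
```

This only covers explicit conversions; it does nothing for http_input translation, which is the case the proposed mbstring.strict_detection ini flag is meant to address.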