ID: 41147
User updated by: teracci2002 at yahoo dot co dot jp
Reported By: teracci2002 at yahoo dot co dot jp
Status: Closed
Bug Type: mbstring related
Operating System: Linux
PHP Version: 5.2.1
Assigned To: hirokawa
New Comment:
I guess the problem is not only in the documentation.
var_dump(mb_check_encoding("\x00\xE3","UTF-8"));
=> bool(true): this seems to check the validity of a "byte stream"
var_dump(mb_check_encoding("\xE3", "UTF-8"));
=> bool(false): this seems to check the validity of a "string"
# I hope that this function checks the validity of a "string", not a
"byte stream" (but this is just my opinion).
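For anyone who needs the strict "string" reading today, here is a minimal
sketch of one possible workaround. It relies on preg_match() with the /u
modifier, which makes PCRE validate the whole subject as UTF-8 and return
false when it is malformed or truncated. The helper name is_complete_utf8()
is hypothetical and not part of mbstring.

```php
<?php
// Hypothetical helper (not an mbstring function): returns true only when
// $s is a complete, well-formed UTF-8 string.
function is_complete_utf8($s)
{
    // With /u, PCRE checks the subject as UTF-8 before matching.
    // preg_match() returns 1 on a match and false on bad UTF-8.
    return preg_match('//u', $s) === 1;
}

var_dump(is_complete_utf8("\x00\xE3"));         // NUL + lone lead byte (truncated)
var_dump(is_complete_utf8("\x00\xE3\x81\x82")); // NUL + complete three-byte character
```

This only covers UTF-8; it says nothing about Shift_JIS or other encodings.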
Previous Comments:
------------------------------------------------------------------------
[2007-09-24 10:15:32] [EMAIL PROTECTED]
This bug has been fixed in CVS.
Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
Thank you for the report, and for helping us make PHP better.
It is a documentation problem, and it is already fixed in CVS.
------------------------------------------------------------------------
[2007-09-19 20:52:48] mike at silverorange dot com
0x00, 0xe3 is a valid byte sequence in UTF-8 but by itself is not a
valid UTF-8 string (it's missing two bytes).
The function is documented as checking the validity of a string so it
should return false for this case. If the function is only supposed to
validate byte-streams then the documentation should be fixed.
------------------------------------------------------------------------
[2007-09-16 08:56:57] [EMAIL PROTECTED]
Sorry for the delayed response.
0x00,0x81 is also a valid byte sequence in Shift_JIS
because 0x81 is a valid first byte of a double-byte
JIS X 0208 character.
See: http://en.wikipedia.org/wiki/Shift_jis
We cannot decide whether the byte stream is valid or
invalid, because the last byte of the byte stream (0x81)
is a valid first byte of a double-byte character.
In this case, true (valid) will be returned.
A byte stream containing a valid first byte followed by
an invalid second byte returns false.
For example,
var_dump(mb_check_encoding("\x81\x00", "Shift_JIS"));
returns false (invalid).
This is because 0x81 is a valid first byte of a double-byte
JIS X 0208 character, but 0x00 is an invalid second byte of
a double-byte JIS X 0208 character.
Likewise,
0x00, 0xe3 in UTF-8 is also a
valid byte sequence (a null byte + the first byte of
a three-byte UTF-8 character).
See: http://en.wikipedia.org/wiki/UTF-8
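The three outcomes described above (complete, ends after a valid first byte,
contains an invalid second byte) can be sketched as a small classifier. This
is only an illustration, not how mbstring is implemented. The function name
sjis_classify() is hypothetical, and the byte ranges are an assumption that
covers plain Shift_JIS (JIS X 0201 + JIS X 0208) without vendor extensions.

```php
<?php
// Hypothetical helper, not an mbstring function.
// Returns 'complete', 'truncated' or 'invalid' for a Shift_JIS byte stream.
function sjis_classify($bytes)
{
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        // Single-byte character: ASCII range or half-width katakana.
        if ($b <= 0x7F || ($b >= 0xA1 && $b <= 0xDF)) {
            continue;
        }
        // First byte of a double-byte character (assumed ranges).
        if (($b >= 0x81 && $b <= 0x9F) || ($b >= 0xE0 && $b <= 0xEF)) {
            if ($i + 1 >= $len) {
                return 'truncated'; // stream ends right after a first byte
            }
            $t = ord($bytes[++$i]);
            if (($t >= 0x40 && $t <= 0x7E) || ($t >= 0x80 && $t <= 0xFC)) {
                continue; // valid second byte
            }
            return 'invalid'; // first byte followed by a bad second byte
        }
        return 'invalid'; // byte that cannot start a character
    }
    return 'complete';
}

var_dump(sjis_classify("\x00\x81")); // ends after a first byte
var_dump(sjis_classify("\x81\x00")); // bad second byte
```

A check that treats 'truncated' as valid matches the "byte stream" reading;
one that treats it as invalid matches the "string" reading.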
------------------------------------------------------------------------
[2007-09-04 22:38:26] [EMAIL PROTECTED]
Did you read it Rui? (why do your reports end up as 'Analyzed' all the
time? :)
------------------------------------------------------------------------
[2007-09-04 14:55:58] teracci2002 at yahoo dot co dot jp
> 0x00+0xa1 is valid byte sequence in Shift_JIS sequence.
I know that.
But 0x00+0x81 is an invalid sequence in Shift_JIS.
Then why does the statement below return "bool(true)"?
var_dump(mb_check_encoding("\x00\x81", "Shift_JIS"));
Please read the bug report again.
------------------------------------------------------------------------
The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/41147