ID:               41147
 User updated by:  teracci2002 at yahoo dot co dot jp
 Reported By:      teracci2002 at yahoo dot co dot jp
 Status:           Closed
 Bug Type:         mbstring related
 Operating System: Linux
 PHP Version:      5.2.1
 Assigned To:      hirokawa
 New Comment:

I think the problem is not only in the documentation.

var_dump(mb_check_encoding("\x00\xE3","UTF-8"));
=> bool(true)        may be checking validity of "byte streams"

var_dump(mb_check_encoding("\xE3", "UTF-8"));
=> bool(false)       may be checking validity of "string"

# I hope that this function checks validity of "string", not "byte
streams" (but this is just my opinion).
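
For illustration, the "string" semantics asked for above can be approximated in userland. This is only a sketch under assumptions: `is_complete_utf8` is a hypothetical helper name, not part of mbstring, the type declarations assume PHP 7 or later, and it relies on the PCRE `u` modifier rejecting malformed subjects rather than on mb_check_encoding() itself.

```php
<?php
// Hypothetical helper (not part of mbstring): treats the input as a
// complete string, so a truncated multi-byte sequence is rejected.
function is_complete_utf8(string $s): bool
{
    // With the /u modifier PCRE validates the whole subject as UTF-8,
    // and preg_match() returns false when the subject is malformed.
    return preg_match('//u', $s) === 1;
}

var_dump(is_complete_utf8("\x00\xE3"));     // null byte + lone lead byte
var_dump(is_complete_utf8("\xE3\x81\x82")); // U+3042, a complete character
```

Under these assumptions the first call should report false and the second true, because a lead byte with no continuation bytes is not a complete character.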


Previous Comments:
------------------------------------------------------------------------

[2007-09-24 10:15:32] [EMAIL PROTECTED]

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

It is a documentation problem, and it is already fixed in CVS.


------------------------------------------------------------------------

[2007-09-19 20:52:48] mike at silverorange dot com

0x00, 0xe3 is a valid byte sequence in UTF-8 but by itself is not a
valid UTF-8 string (it's missing two bytes).

The function is documented as checking the validity of a string so it
should return false for this case. If the function is only supposed to
validate byte-streams then the documentation should be fixed.
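
The distinction drawn here between a byte stream and a string can be made concrete with a small validator that takes the choice as a parameter. This is a sketch only, not mbstring's implementation: `utf8_check` and `$allowIncompleteTail` are hypothetical names, the type declarations assume PHP 7 or later, and the byte ranges follow the well-formed UTF-8 table in the Unicode Standard.

```php
<?php
// Hypothetical validator, not mbstring's code.
// $allowIncompleteTail = true  : "byte stream" view, more bytes may follow.
// $allowIncompleteTail = false : "string" view, the input must be complete.
function utf8_check(string $bytes, bool $allowIncompleteTail): bool
{
    $len = strlen($bytes);
    $i = 0;
    while ($i < $len) {
        $b = ord($bytes[$i]);
        if ($b <= 0x7F) {          // single-byte (ASCII) character
            $i++;
            continue;
        }
        // Sequence length and allowed range of the second byte.
        $lo = 0x80;
        $hi = 0xBF;
        if ($b >= 0xC2 && $b <= 0xDF) {
            $n = 2;
        } elseif ($b === 0xE0) {
            $n = 3; $lo = 0xA0;    // excludes overlong forms
        } elseif ($b === 0xED) {
            $n = 3; $hi = 0x9F;    // excludes surrogates
        } elseif ($b >= 0xE1 && $b <= 0xEF) {
            $n = 3;
        } elseif ($b === 0xF0) {
            $n = 4; $lo = 0x90;    // excludes overlong forms
        } elseif ($b >= 0xF1 && $b <= 0xF3) {
            $n = 4;
        } elseif ($b === 0xF4) {
            $n = 4; $hi = 0x8F;    // nothing above U+10FFFF
        } else {
            return false;          // not a valid lead byte
        }
        for ($k = 1; $k < $n; $k++) {
            if ($i + $k >= $len) {
                // Input ended in the middle of a character.
                return $allowIncompleteTail;
            }
            $c = ord($bytes[$i + $k]);
            $min = ($k === 1) ? $lo : 0x80;
            $max = ($k === 1) ? $hi : 0xBF;
            if ($c < $min || $c > $max) {
                return false;      // invalid continuation byte
            }
        }
        $i += $n;
    }
    return true;
}

var_dump(utf8_check("\x00\xE3", true));  // stream view
var_dump(utf8_check("\x00\xE3", false)); // string view
```

With this sketch the same input is accepted as a stream prefix and rejected as a complete string, which is the ambiguity the documentation would need to resolve.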

------------------------------------------------------------------------

[2007-09-16 08:56:57] [EMAIL PROTECTED]


Sorry for the delayed response.

0x00, 0x81 is also a valid byte sequence in Shift_JIS
because 0x81 is a valid first byte of a double-byte
JIS X 0208 character.

See: http://en.wikipedia.org/wiki/Shift_jis

We cannot decide whether the byte stream is valid or
invalid, because its last byte (0x81) is a valid
first byte of a double-byte character.
In this case, true (valid) is returned.

A byte stream containing a valid first byte followed by
an invalid second byte returns false.

For example,

var_dump(mb_check_encoding("\x81\x00", "Shift_JIS"));

returns false (invalid).

This is because 0x81 is a valid first byte of a double-byte
JIS X 0208 character, but 0x00 is an invalid second byte of
a double-byte JIS X 0208 character.

Likewise, 0x00, 0xe3 is a valid byte sequence in UTF-8
(a null byte followed by the first byte of
a three-byte UTF-8 character).

See: http://en.wikipedia.org/wiki/UTF-8

------------------------------------------------------------------------

[2007-09-04 22:38:26] [EMAIL PROTECTED]

Did you read it Rui? (why do your reports end up as 'Analyzed' all the
time? :)

------------------------------------------------------------------------

[2007-09-04 14:55:58] teracci2002 at yahoo dot co dot jp

> 0x00+0xa1 is valid byte sequence in Shift_JIS sequence.

I know that.
But 0x00+0x81 is an invalid sequence in Shift_JIS.
So why does the statement below return "bool(true)"?

var_dump(mb_check_encoding("\x00\x81", "Shift_JIS"));

Please read the bug report again.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/41147
