 ID:               27505
 Updated by:       [EMAIL PROTECTED]
 Reported By:      ywliu at hotmail dot com
-Status:           Open
+Status:           Closed
 Bug Type:         *Languages/Translation
 Operating System: linux
 PHP Version:      4.3.4
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.

Thank you for the report, and for helping us make PHP better.


Previous Comments:
------------------------------------------------------------------------

[2004-03-05 03:43:06] ywliu at hotmail dot com

Description:
------------
In ext/standard/html.c, htmlentities() fails to identify BIG5 Chinese
characters correctly. I have checked CVS version 1.87, the bug is still
there.

Reproduce code:
---------------
In html.c, look for this piece of code:

    case cs_big5:
    case cs_gb2312:
    case cs_big5hkscs:
        {
            /* check if this is the first of a 2-byte sequence */
            if (this_char >= 0xa1 && this_char <= 0xf9) {
                /* peek at the next char */
                unsigned char next_char = str[pos];
                if ((next_char >= 0x40 && next_char <= 0x73) ||
                        (next_char >= 0xa1 && next_char <= 0xfe)) {

Expected result:
----------------
In fact, the first byte should be from 0xa1 to 0xfe, and the second
byte should be from 0x40-0x7e and 0xa1-0xfe. (From page 88,
"Understanding Japanese Information Processing" by Ken Lunde,
O'Reilly.)

Actual result:
--------------
So it should be:

    if (this_char >= 0xa1 && this_char <= 0xfe) {

and

    if ((next_char >= 0x40 && next_char <= 0x7e) ||
            (next_char >= 0xa1 && next_char <= 0xfe)) {

------------------------------------------------------------------------
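
For illustration only, the corrected ranges proposed in the report can be
sketched as a standalone C fragment. This is not the code committed to
html.c: the helper names big5_is_lead_byte, big5_is_trail_byte and
big5_char_len are hypothetical, the byte ranges are the ones quoted in the
report, and the sketch covers Big5 only, whereas the original case block
also handles cs_gb2312 and cs_big5hkscs.

```c
#include <stddef.h>

/* Hypothetical helpers; html.c performs these checks inline in a switch. */

/* Lead byte of a 2-byte Big5 sequence: 0xa1..0xfe (range from the report). */
static int big5_is_lead_byte(unsigned char c)
{
    return c >= 0xa1 && c <= 0xfe;
}

/* Trail byte: 0x40..0x7e or 0xa1..0xfe (range from the report). */
static int big5_is_trail_byte(unsigned char c)
{
    return (c >= 0x40 && c <= 0x7e) || (c >= 0xa1 && c <= 0xfe);
}

/*
 * Length in bytes of the character starting at str[pos]:
 * 2 for a valid lead/trail pair, 1 for anything else,
 * 0 if pos is already at or past the end of the buffer.
 */
static size_t big5_char_len(const unsigned char *str, size_t len, size_t pos)
{
    if (pos >= len) {
        return 0;
    }
    if (big5_is_lead_byte(str[pos]) &&
        pos + 1 < len &&
        big5_is_trail_byte(str[pos + 1])) {
        return 2;
    }
    return 1;
}
```

With the old bounds (lead byte up to 0xf9, trail byte up to 0x73), a pair
such as 0xfa 0x7e would be treated as two single bytes; with the ranges
above it is recognised as one 2-byte character.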
