[PHP-DOC] #30801 [Opn->Csd]: imagettftext 'text' parameter wrongly describes Unicode character entities

vrana Wed, 29 Dec 2004 03:21:24 -0800

 ID:               30801
 Updated by:       [EMAIL PROTECTED]
 Reported By:      andy at andyh dot co dot uk
-Status:           Open
+Status:           Closed
 Bug Type:         Documentation problem
 Operating System: n/a
 PHP Version:      5.0.2
 New Comment:


This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.

Thank you for the report, and for helping us make our documentation
better.




Previous Comments:
------------------------------------------------------------------------

[2004-12-19 18:41:31] andy at andyh dot co dot uk

The updated documentation has the following problems:

(1) It does not mention UTF-8 at all, despite this being one of the major
points: the function can accept UTF-8 encoded Unicode strings directly.

(2) The example character reference following the words "of the form:"
appears as three garbled characters "€". Was the leading '&' left
unescaped, instead of being written as '&amp;' so that it appears as a
literal & on the page?
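
A minimal sketch of point (2), in Python purely for illustration. Python's html.unescape stands in for whatever the documentation build actually does, and that substitution is an assumption. Resolving references once turns a bare &#8364; into the Euro sign itself. By contrast, &amp;#8364; survives as the literal text the page is meant to show. The two strings are hypothetical stand-ins for the doc source.

```python
import html

# Hypothetical fragments of the doc source, before rendering.
bare = "of the form: &#8364;"          # '&' left as-is
escaped = "of the form: &amp;#8364;"   # '&' written as '&amp;'

# A single pass of reference resolution (an assumed stand-in for the build).
rendered_bare = html.unescape(bare)        # reference becomes the Euro sign
rendered_escaped = html.unescape(escaped)  # reference survives as text

print(ascii(rendered_bare))     # 'of the form: \u20ac'
print(rendered_escaped)         # of the form: &#8364;
```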

Thanks.

------------------------------------------------------------------------

[2004-11-16 10:33:55] [EMAIL PROTECTED]

This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.

Thank you for the report, and for helping us make our documentation
better.

------------------------------------------------------------------------

[2004-11-15 20:44:32] andy at andyh dot co dot uk

Description:
------------
The description of the 'text' parameter in imagettftext wrongly
describes decimal numeric character entities as "UTF-8 character
sequences", whereas the function actually accepts BOTH UTF-8 character
sequences and decimal numeric character entities.

http://uk2.php.net/imagettftext
"
text
    The text string.

    May include any UTF-8 character sequences (of the form: &#123;) to
access characters in a font beyond the first 255
"

&#123; is not a UTF-8 character sequence. It is a decimal numeric
character reference as per section 5.3.1 of HTML 4.01.

http://www.w3.org/TR/html4/charset.html#h-5.3.1

This has nothing to do with the UTF-8 encoding of Unicode characters;
the reference is entirely ASCII, and the character it refers to is the
code point in the Unicode character set - not the corresponding UTF-8
encoding of that character.
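
To make the distinction concrete, here is a minimal Python sketch. It is an illustration only, not PHP or GD code. It takes the reference &#8364; apart: the reference is seven ASCII bytes, it names code point 8364, and the UTF-8 encoding of that code point is a different, three-byte sequence.

```python
import html

ref = "&#8364;"

# The reference itself is plain ASCII: 7 bytes, all below 128.
ref_bytes = ref.encode("ascii")

# It names the Unicode code point 8364 (U+20AC, the Euro sign)...
char = html.unescape(ref)
code_point = ord(char)

# ...whose UTF-8 encoding is a separate three-byte sequence.
utf8_bytes = char.encode("utf-8")

print(len(ref_bytes), code_point, hex(code_point), utf8_bytes)
# 7 8364 0x20ac b'\xe2\x82\xac'
```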

Also, the example value 123 is not beyond the first 255 characters in
the font (code point 123 is '{', a plain ASCII character).

imagettftext passes the string to GD, which, according to its
documentation, expects a UTF-8 encoded string. So any byte above 127
that is not part of a valid multi-byte sequence makes the string
malformed UTF-8.

See: http://www.boutell.com/gd/manual2.0.33.html#gdImageStringFT
"
The null-terminated string argument is considered to be encoded via the
UTF_8 standard; also, HTML entities are supported, including decimal,
hexadecimal, and named entities (2.0.26).
"

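As a small Python illustration, the three entity styles the GD manual lists can all denote the same character. Python's html module follows the HTML5 entity table. The quote above does not say which named entities GD itself recognises, so treat the named form &euro; as an assumption for GD.

```python
import html

# The Euro sign written in the three entity styles the GD manual mentions:
# decimal, hexadecimal and named.
forms = ["&#8364;", "&#x20AC;", "&euro;"]
resolved = [html.unescape(f) for f in forms]

# Each resolves to a single character, and all three are identical.
print([hex(ord(c)) for c in resolved])  # ['0x20ac', '0x20ac', '0x20ac']
```
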
When given bytes in the 128-255 range that do not form valid UTF-8, GD
appears to fall back to treating each one as a single-byte character,
but this is not documented.

Suggest the description be changed to something like:

"
text
    The text string, encoded in UTF-8.

    May include decimal numeric character references (of the form:
&#8364;) to access characters in a font beyond position 127.
"

(8364 is the decimal code point of the Euro sign, U+20AC.)
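
Text in the form the suggested description recommends (ASCII plus decimal references above 127) can be produced mechanically. Here is a minimal Python sketch using the standard 'xmlcharrefreplace' error handler. In PHP itself, mb_encode_numericentity() with a suitable conversion map serves a similar purpose. The function name to_numeric_refs is hypothetical.

```python
def to_numeric_refs(text: str) -> str:
    """Return text with every non-ASCII character as a decimal reference."""
    # 'xmlcharrefreplace' turns each character that ASCII cannot encode into
    # &#NNNN; using its decimal Unicode code point, e.g. U+20AC -> &#8364;.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_numeric_refs("Price: 5\u20ac"))  # Price: 5&#8364;
print(to_numeric_refs("caf\u00e9"))       # caf&#233;
```

One caveat applies. Any literal "&#..." already present in the input would also be read as a reference, so such text may need its '&' characters handled first.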



------------------------------------------------------------------------

