The attached patch is the result of my work today on htmlentities.
It provides a charset-aware implementation that works for:

iso-8859-1 (latin1)
iso-8859-15 (latin9)
Windows-1252 (latin1 with MS extensions)

I have added a third optional parameter to htmlentities() to allow the
charset to be specified.
If omitted and the build supports setlocale(), it tries to determine the
charset from the locale (pretty futile to be honest, but worth a go).
If it can't determine the charset, or none is supplied, it defaults to

I didn't really want to add the optional parameter, but setlocale() sucks
and an ini option didn't seem appropriate to something that could
theoretically be changed multiple times per script.
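
The lookup order described above (explicit argument first, then the locale,
then a default) could be sketched roughly as follows. This is only an
illustration, not the code from the patch: charset_from_locale_name() and
resolve_charset() are hypothetical names, the locale string is assumed to
have the usual "lang_TERRITORY.codeset@modifier" shape, and the fallback is
passed in by the caller rather than hard-coded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the codeset part of a locale name such as
 * "de_DE.ISO-8859-15@euro" into buf. Returns 1 on success, 0 if the
 * locale name carries no codeset (e.g. "C" or "en_US"). */
static int charset_from_locale_name(const char *locale, char *buf, size_t buflen)
{
    const char *dot, *end;
    size_t len;

    assert(buf != NULL && buflen > 0);
    if (locale == NULL)
        return 0;
    dot = strchr(locale, '.');
    if (dot == NULL)
        return 0;
    dot++;                                  /* skip the '.' itself */
    end = strchr(dot, '@');                 /* optional modifier */
    len = end ? (size_t)(end - dot) : strlen(dot);
    if (len == 0 || len >= buflen)
        return 0;
    memcpy(buf, dot, len);
    buf[len] = '\0';
    return 1;
}

/* Resolution order: explicit argument, then locale, then the fallback
 * supplied by the caller. */
static const char *resolve_charset(const char *explicit_cs, const char *locale,
                                   const char *fallback, char *buf, size_t buflen)
{
    if (explicit_cs != NULL && *explicit_cs != '\0')
        return explicit_cs;
    if (charset_from_locale_name(locale, buf, buflen))
        return buf;
    return fallback;
}
```

In a real build the locale argument would presumably come from
setlocale(LC_CTYPE, NULL), which is exactly where the guesswork gets fragile:
many locale names carry no codeset at all.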

If the charset is utf-8, it unpacks the utf-8 encoding and uses the
entities if any match.
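
To make the utf-8 case concrete, here is a minimal sketch of unpacking one
utf-8 sequence into a code point and then checking for an entity. It is an
assumption about the general approach, not the patch itself: utf8_decode()
and entity_for() are hypothetical names, the entity table is a tiny
illustrative subset, and the decoder skips overlong-form and surrogate
checks for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence starting at s (len bytes available).
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 if the input is malformed or truncated. */
static size_t utf8_decode(const unsigned char *s, size_t len, unsigned long *cp)
{
    size_t need, i;
    unsigned long c;

    assert(s != NULL && cp != NULL);
    if (len == 0)
        return 0;
    if (s[0] < 0x80) {                       /* plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {      /* 110xxxxx: 2 bytes */
        need = 2; c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {      /* 1110xxxx: 3 bytes */
        need = 3; c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {      /* 11110xxx: 4 bytes */
        need = 4; c = s[0] & 0x07;
    } else {
        return 0;                            /* stray continuation byte etc. */
    }
    if (len < need)
        return 0;
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)           /* must be 10xxxxxx */
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return need;
}

/* Illustrative lookup only; returns NULL when no entity matches, in which
 * case the original bytes would be copied through unchanged. */
static const char *entity_for(unsigned long cp)
{
    switch (cp) {
    case 0x26:   return "&amp;";
    case 0x3C:   return "&lt;";
    case 0xA9:   return "&copy;";
    case 0xE9:   return "&eacute;";
    case 0x20AC: return "&euro;";
    default:     return NULL;
    }
}
```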

I've rewritten the entity substitution code so that multiple tables can
be used for different character ranges and shared between charsets (eg:
and utf-8 use the same table for the latin1 range).
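
One way to picture the table arrangement is a list of (first, last, names)
ranges per charset, where several charsets point at the same names array.
The sketch below is an assumption about the shape of such a structure, not
the layout used in the patch; struct entity_range, lookup_entity() and the
table names are hypothetical, and the tables are cut down to a few entries.

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous run of character codes and the entity names for it. */
struct entity_range {
    unsigned int first;
    unsigned int last;
    const char *const *names;   /* names[ch - first] */
};

/* 0xA0..0xA3 mean the same thing in latin1 and Windows-1252, so this
 * table can be shared. */
static const char *const tbl_a0_a3[] = { "nbsp", "iexcl", "cent", "pound" };

/* Windows-1252 puts the euro sign at 0x80; latin1 has no entity there. */
static const char *const tbl_cp1252_80[] = { "euro" };

static const struct entity_range latin1_ranges[] = {
    { 0xA0, 0xA3, tbl_a0_a3 }
};

static const struct entity_range cp1252_ranges[] = {
    { 0x80, 0x80, tbl_cp1252_80 },
    { 0xA0, 0xA3, tbl_a0_a3 }        /* shared with latin1 */
};

/* Return the entity name for ch, or NULL if no range covers it. */
static const char *lookup_entity(const struct entity_range *ranges, size_t n,
                                 unsigned int ch)
{
    size_t i;

    assert(ranges != NULL);
    for (i = 0; i < n; i++) {
        if (ch >= ranges[i].first && ch <= ranges[i].last)
            return ranges[i].names[ch - ranges[i].first];
    }
    return NULL;
}
```

With this layout, adding a charset means adding a short list of ranges
rather than duplicating a full 256-entry table.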

This patch has an impact on the wddx extension which needs to pass a NULL
for the charset, which will in turn search the locale to determine which
It might be a good idea to define a magic charset value that means "use
default latin1 tables"
#define HTML_ENTITY_LATIN1_CHARSET  (char*)(-1)
or just pass "iso-8859-1" instead of NULL.

Any comments?



PHP Development Mailing List <http://www.php.net/>