Re: [PHP-DEV] [PATCH] Changing entity charset handling in ext/standard/html.c
Hey Moriyoshi,

Sorry for my late entry into the debate, but I ran into the htmlentities() default charset problem today. I wonder why you opted to use the mbstring ini setting (thus making this nice feature mbstring-dependent) when we have the default_charset ini setting. It just sounds more logical to me to use SG(default_charset) for the default charset of htmlentities().

Your thoughts?

Edin

----- Original Message -----
From: Moriyoshi Koizumi [EMAIL PROTECTED]
To: Wez Furlong [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Thursday, October 17, 2002 7:48 AM
Subject: Re: [PHP-DEV] [PATCH] Changing entity charset handling in ext/standard/html.c

> Yep, as far as I have read the archives, I haven't found any discussions of the charset-related backward compatibility problems. That is why I asked *exactly* about this issue.
>
> You may want to redirect me to bug #9392 (http://bugs.php.net/bug.php?id=9392), but it doesn't seem to help...
>
> In addition, I found that determining the internal charset by LC_CTYPE is dangerous, because setlocale() is not thread-safe in some libc implementations (glibc seems to be one of them).
>
> I'm going to read the archives more carefully, though I think even handling the charset in phpinfo() will lead to the same discussion in the future.
>
> Moriyoshi Koizumi

--
PHP Development Mailing List http://www.php.net/
To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Changing entity charset handling in ext/standard/html.c
Hello Edin,

I don't know whether your proposal is logical or not. I understand the concern in its historical context: it shouldn't rely on the mbstring extension too much, because it has been there since mbstring (formerly called jstring) was introduced.

But it is a problem that the internal encoding is not necessarily the same as the output encoding when mb_output_handler is enabled. In that sense, giving higher priority to mbstring.internal_encoding seems quite natural to me. In addition, there's a hack in mbstring.c that overrides the Content-Type header, whatever the SAPI setting is, when the output handler is enabled by the ini setting.

I think the real issue is that we have two similar options that will stay separate as long as the ZE's parser doesn't support various charsets, at least those that the current version of mbstring can handle. You may want to point out that we already have --enable-zend-multibyte, but IMO it's virtually a hack, and it should be integrated into the core at a lower level in a future version.

BTW, the temporary solution is to give a priority to each setting, like

1. MBSTRG(internal_encoding)
2. SG(default_charset)
3. System's locale setting

How about this option?

Moriyoshi

Edin Kadribasic [EMAIL PROTECTED] wrote:
> Hey Moriyoshi,
>
> Sorry for my late entry into the debate, but I ran into the htmlentities() default charset problem today. I wonder why you opted to use the mbstring ini setting (thus making this nice feature mbstring-dependent) when we have the default_charset ini setting. It just sounds more logical to me to use SG(default_charset) for the default charset of htmlentities().
>
> Your thoughts?
>
> Edin
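The priority order proposed above can be sketched as a simple fallback chain. This is a minimal, standalone illustration and not code from the patch: the struct `charset_sources` and the function `resolve_entity_charset` are hypothetical names, the three fields merely stand in for MBSTRG(internal_encoding), SG(default_charset), and the locale's codeset, and the final fallback to ISO-8859-1 reflects the documented default mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the three candidate settings.
 * None of these names exist in PHP; they only model the proposal. */
struct charset_sources {
    const char *mbstring_internal_encoding; /* stands in for MBSTRG(internal_encoding) */
    const char *default_charset;            /* stands in for SG(default_charset) */
    const char *locale_codeset;             /* assumed to be looked up once by the caller */
};

/* A setting counts as "set" only if it is a non-empty string. */
static int is_set(const char *s)
{
    return s != NULL && s[0] != '\0';
}

/* Return the first configured charset in priority order,
 * falling back to the historical default of htmlentities(). */
const char *resolve_entity_charset(const struct charset_sources *src)
{
    assert(src != NULL);

    if (is_set(src->mbstring_internal_encoding)) {
        return src->mbstring_internal_encoding;
    }
    if (is_set(src->default_charset)) {
        return src->default_charset;
    }
    if (is_set(src->locale_codeset)) {
        return src->locale_codeset;
    }
    return "ISO-8859-1";
}
```

In this model, a build without mbstring would simply leave the first field unset, so default_charset decides, and the locale is consulted only when neither ini setting is present.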
Re: [PHP-DEV] [PATCH] Changing entity charset handling in ext/standard/html.c
> BTW, the temporary solution is to give a priority to each setting, like
>
> 1. MBSTRG(internal_encoding)
> 2. SG(default_charset)
> 3. System's locale setting
>
> How about this option?

This sounds fine. This way, people who compile PHP without mbstring support can alter the default charset.

Edin
Re: [PHP-DEV] [PATCH] Changing entity charset handling in ext/standard/html.c
On 10/17/02, Moriyoshi Koizumi [EMAIL PROTECTED] wrote:
> Yep, as far as I have read the archives, I haven't found any discussions of the charset-related backward compatibility problems. That is why I asked *exactly* about this issue.

Search for "htmlentities charset". Both Thies and I (and probably others) were discussing this. In short: there are many, many, many people who have scripts that rely on htmlentities() defaulting to iso-8859-1 (the documented default forever).

> I'm going to read the archives more carefully, though I think even handling the charset in phpinfo() will lead to the same discussion in the future.

This is a separate issue and has nothing to do with changing the behaviour of htmlentities().

--Wez.
Re: [PHP-DEV] [PATCH] Changing entity charset handling in ext/standard/html.c
Yep, as far as I have read the archives, I haven't found any discussions of the charset-related backward compatibility problems. That is why I asked *exactly* about this issue.

You may want to redirect me to bug #9392 (http://bugs.php.net/bug.php?id=9392), but it doesn't seem to help...

In addition, I found that determining the internal charset by LC_CTYPE is dangerous, because setlocale() is not thread-safe in some libc implementations (glibc seems to be one of them).

I'm going to read the archives more carefully, though I think even handling the charset in phpinfo() will lead to the same discussion in the future.

Moriyoshi Koizumi

Wez Furlong [EMAIL PROTECTED] wrote:
> Search the archives for the discussion. phpinfo could determine the charset as your patch does at the start, and then pass the info to php_escape_html_entities. Seems easy to me.
>
> --Wez.
>
> On 10/16/02, Moriyoshi Koizumi [EMAIL PROTECTED] wrote:
> > Wez Furlong [EMAIL PROTECTED] wrote:
> > > Unfortunately, we absolutely must remain 100% backwards compatible with htmlentities(), so this patch should not be applied.
> >
> > Were there any discussions exactly about this issue? Though I can see there may be some historical reason, I don't understand why 100% backwards compatibility is required for htmlentities(), because the patched htmlentities() behaves the same way with the default configuration, and IMHO defaulting to iso-8859-1 is quite meaningless for scripts that use other charsets.
> >
> > Hmm... otherwise I would like to suggest an mbstring function like mb_htmlentities(), but that would sound like a reinvention of the same wheel...
> >
> > > However, I don't see a problem with making phpinfo determine the charset and passing that on to the internal htmlentities function?
> >
> > The problem is that php_info_html_esc() in ext/standard/info.c calls php_escape_html_entities() with no charset information specified. Without the patch, every character is treated as ISO-8859-1, even if the fetched byte is actually just the first byte of a multibyte character.
> >
> > Moriyoshi Koizumi
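The failure mode described above, where each byte of a multibyte character is escaped as if it were an ISO-8859-1 character, can be reproduced with a deliberately simplified escaper. This is a sketch under stated assumptions, not PHP's implementation: `escape_as_latin1` is a hypothetical helper that emits numeric character references instead of named entities and leaves `<`, `>`, and `&` alone; it only shows what happens when no charset information is passed along.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: treats every input byte as one ISO-8859-1 character.
 * Bytes >= 0xA0 become numeric references; all other bytes are copied as-is.
 * Returns the number of bytes written, excluding the terminating NUL. */
size_t escape_as_latin1(const unsigned char *in, size_t in_len,
                        char *out, size_t out_size)
{
    size_t used = 0;

    assert(in != NULL && out != NULL && out_size > 0);

    for (size_t i = 0; i < in_len; i++) {
        int n;

        if (in[i] >= 0xA0) {
            /* ISO-8859-1 byte values equal their Unicode code points. */
            n = snprintf(out + used, out_size - used, "&#%u;", (unsigned)in[i]);
        } else {
            n = snprintf(out + used, out_size - used, "%c", in[i]);
        }
        if (n < 0 || (size_t)n >= out_size - used) {
            break; /* output buffer full: stop before the partial write */
        }
        used += (size_t)n;
    }
    out[used] = '\0';
    return used;
}
```

Fed the ISO-8859-1 bytes for "café" (the last byte being 0xE9), the helper produces `caf&#233;`, which is correct. Fed the UTF-8 encoding of the same word, where "é" is the two bytes 0xC3 0xA9, it produces `caf&#195;&#169;`, which a browser renders as "cafÃ©". An escaper that knows the charset would consume both bytes as one character instead.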