Re: [PHP-DEV] [PATCH] Changing entity charset handling in ext/standard/html.c

2002-10-24 Thread Edin Kadribasic
Hey Moriyoshi,

Sorry for my late entry into the debate, but I ran into the
htmlentities() default charset problem today. I wonder why you
opted to use the mbstring ini setting (thus making this nice feature
mbstring-dependent) when we have the default_charset ini setting.

It just sounds more logical to me to use SG(default_charset) for the
default charset of htmlentities(). Your thoughts?

Edin

-- 
PHP Development Mailing List http://www.php.net/
To unsubscribe, visit: http://www.php.net/unsub.php




Re: [PHP-DEV] [PATCH] Changing entity charset handling in ext/standard/html.c

2002-10-24 Thread Moriyoshi Koizumi
Hello Edin,

I don't know whether your proposal is more logical or not. I looked at the
problem in its historical context: it shouldn't rely on the mbstring extension
too much, because it has been there since mbstring (formerly called jstring)
was introduced.

But it is a problem that the internal encoding is not necessarily the same
as the output encoding when mb_output_handler is enabled. In this sense,
giving higher priority to mbstring.internal_encoding seems quite natural to me.

In addition, there's a hack in mbstring.c that overrides the Content-Type
header, whatever the SAPI setting is, when the output handler is enabled by
the ini setting.

I think the real issue is that we have two similar options that seem bound to
stay separate as long as the ZE's parser doesn't support various charsets,
at least those which can be handled by the current version of mbstring.

You may want to point out that we already have --enable-zend-multibyte,
but it's virtually a hack IMO, and it should be integrated into the core at
a lower level in a future version.

BTW, the temporary solution is to give a priority to each setting, like

1. MBSTRG(internal_encoding)
2. SG(default_charset)
3. System's locale setting

How about this option?
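
The priority order above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code in ext/standard/html.c:
pick_charset() and FALLBACK_CHARSET are hypothetical names, and the three
settings are passed in as plain strings rather than read through
MBSTRG(internal_encoding), SG(default_charset) or the locale. The final
fallback is the iso-8859-1 default mentioned elsewhere in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Historical htmlentities() default, as stated in this thread. */
#define FALLBACK_CHARSET "ISO-8859-1"

/*
 * Hypothetical helper: returns the first candidate that is neither NULL nor
 * empty, scanning in priority order, or FALLBACK_CHARSET if none is set.
 */
static const char *pick_charset(const char *const *candidates, size_t count)
{
    size_t i;

    assert(candidates != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (candidates[i] != NULL && strlen(candidates[i]) > 0) {
            return candidates[i];
        }
    }
    return FALLBACK_CHARSET;
}
```

A caller would fill the array in the order 1, 2, 3 from the list; a build
without mbstring would simply leave the first slot NULL, so the second
setting takes effect.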


Moriyoshi





Re: [PHP-DEV] [PATCH] Changing entity charset handling in ext/standard/html.c

2002-10-24 Thread Edin Kadribasic
> BTW, the temporary solution is to give a priority to each setting, like
>
> 1. MBSTRG(internal_encoding)
> 2. SG(default_charset)
> 3. System's locale setting
>
> How about this option?

This sounds fine. This way people who compile php without mbstring
support can alter the default charset.

Edin






Re: [PHP-DEV] [PATCH] Changing entity charset handling in ext/standard/html.c

2002-10-17 Thread Wez Furlong

On 10/17/02, Moriyoshi Koizumi [EMAIL PROTECTED] wrote:
> Yep, as far as I read the archives, I haven't found any discussions on the
> charset related backwards problems. So I wrote *exactly* about this
> issue.

Search for "htmlentities charset". Both Thies and I (and probably others)
were discussing this.
In short: there are many, many, many people who have scripts that rely
on htmlentities defaulting to iso-8859-1 (the documented default forever).

> I'm going to read archives more carefully, though I think even handling
> the charset in phpinfo() will yield the same discussion in the future.

This is a separate issue and has nothing to do with changing the behaviour of
htmlentities().

--Wez.






Re: [PHP-DEV] [PATCH] Changing entity charset handling in ext/standard/html.c

2002-10-16 Thread Moriyoshi Koizumi

Yep, as far as I have read the archives, I haven't found any discussions of the
charset-related backwards-compatibility problems. That is why I wrote *exactly*
about this issue.

You may want to redirect me to bug #9392 (http://bugs.php.net/bug.php?id=9392), but it
doesn't seem to help...

In addition, I found that determining the internal charset by LC_CTYPE is
dangerous because setlocale() is not thread-safe in some libc
implementations (glibc seems to be one of them).

I'm going to read the archives more carefully, though I think even handling
the charset in phpinfo() will yield the same discussion in the future.


Moriyoshi Koizumi

Wez Furlong [EMAIL PROTECTED] wrote:

> Search the archives for the discussion.
> phpinfo could determine the charset as your patch does at the start,
> and then pass the info to php_escape_html_entities.
>
> Seems easy to me.
>
> --Wez.
>
> On 10/16/02, Moriyoshi Koizumi [EMAIL PROTECTED] wrote:
>> Wez Furlong [EMAIL PROTECTED] wrote:
>>> Unfortunately, we absolutely must remain 100% backwards compatible with
>>> htmlentities(), so this patch should not be applied.
>>
>> Were there any discussions exactly about this issue? Though I have to see
>> some historical reason, I don't understand why 100% backwards
>> compatibility is required for htmlentities(),
>> because the patched htmlentities() acts in the same way with the default
>> configuration, and IMHO defaulting to iso-8859-1 is quite meaningless for
>> scripts that use charsets other than that one.
>>
>> Hmm... otherwise I would like to suggest a mbstring function like
>> mb_htmlentities(), but it would sound like a reinvention of the same
>> wheel...
>>
>>> However, I don't see a problem with making phpinfo determine the charset
>>> and passing that on to the internal htmlentities function?
>>
>> The problem is that php_info_html_esc() in ext/standard/info.c calls
>> php_escape_html_entities() with no charset information specified. Without
>> the patch, every character is treated as ISO-8859-1 even if a fetched
>> character is actually a mere first byte of a multibyte character.
>>
>>
>> Moriyoshi Koizumi
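
The first-byte problem described in the quoted text can be illustrated with a
small sketch. It is not the code in ext/standard/html.c: escape_bytewise(),
escape_utf8() and append_char() are hypothetical names, only non-ASCII
handling is shown (no escaping of <, > or &), numeric entities stand in for
named ones, and the UTF-8 decoder assumes well-formed input. The point is only
that a byte-wise escaper turns one multibyte character into two unrelated
entities, whereas a charset-aware one emits a single entity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Appends one code point to out, as "&#N;" if it is >= 0x80.
 * Returns 0 (leaving out terminated) when the buffer is full. */
static int append_char(char *out, size_t outsz, size_t *used, unsigned long cp)
{
    int n;

    if (cp >= 0x80) {
        n = snprintf(out + *used, outsz - *used, "&#%lu;", cp);
    } else {
        n = snprintf(out + *used, outsz - *used, "%c", (int)cp);
    }
    if (n < 0 || (size_t)n >= outsz - *used) {
        out[*used] = '\0'; /* drop the partial write */
        return 0;
    }
    *used += (size_t)n;
    return 1;
}

/* Treats every byte as one character, as a single-byte charset
 * such as ISO-8859-1 would require. */
static void escape_bytewise(const char *in, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    while (*p != '\0' && append_char(out, outsz, &used, *p)) {
        p++;
    }
}

/* Decodes UTF-8 sequences first, then escapes whole code points.
 * Assumes well-formed input; no validation is attempted. */
static void escape_utf8(const char *in, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    while (*p != '\0') {
        unsigned long cp = *p;
        int extra = 0;

        if (*p >= 0xF0)      { cp = *p & 0x07; extra = 3; }
        else if (*p >= 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if (*p >= 0xC0) { cp = *p & 0x1F; extra = 1; }
        p++;
        /* Fold in the continuation bytes (10xxxxxx). */
        while (extra-- > 0 && (*p & 0xC0) == 0x80) {
            cp = (cp << 6) | (unsigned long)(*p++ & 0x3F);
        }
        if (!append_char(out, outsz, &used, cp)) {
            break;
        }
    }
}
```

For the UTF-8 input "caf\xC3\xA9", the byte-wise version emits two entities
for the bytes 0xC3 and 0xA9, while the charset-aware version emits one entity
for U+00E9.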
  
  
  