php-i18n Digest 16 Dec 2002 14:28:44 -0000 Issue 137

Topics (messages 369 through 377):

Re: htmlspecialchars and UTF-8
        369 by: Moriyoshi Koizumi
        371 by: a.h.s. boy
        372 by: Renato De Giovanni
        373 by: Renato De Giovanni
        374 by: Moriyoshi Koizumi
        375 by: Moriyoshi Koizumi

Unicode localizations?
        370 by: a.h.s. boy

Questions about multi-lingual web site
        376 by: William Lam
        377 by: n0n4m3d

Administrivia:

To subscribe to the digest, e-mail:
        [EMAIL PROTECTED]

To unsubscribe from the digest, e-mail:
        [EMAIL PROTECTED]

To post to the list, e-mail:
        [EMAIL PROTECTED]


----------------------------------------------------------------------
--- Begin Message ---
Hi,

If you want to do the full entity translation, use htmlentities() instead 
of htmlspecialchars()

http://www.php.net/htmlspecialchars

Moriyoshi

"Renato De Giovanni" <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> We're working on a program that needs to generate an xml document, utf-8 
> encoded, based on information stored with another charset encoding inside a 
> database.
> 
> So basically at some point we have a function that looks like:
> 
> function encodeString($s, $encoding)
> {
>   $s = mb_convert_encoding($s, 'UTF-8', $encoding);
> 
>   $s = htmlspecialchars($s, ENT_COMPAT, 'UTF-8');
> 
>   return $s;
> }
> 
> What happens is that "htmlspecialchars" seems to be always returning a 
> latin1 string, therefore causing problems with the generated document 
> supposedly utf-8 encoded.
> 
> My question is: is this a php bug or am I misunderstanding something related 
> to multi-byte character support?
> 
> PHP version is 4.2.3
> 
> Thank you very much!
> --
> Renato
> CRIA - Centro de Referencia em Informacao Ambiental
> http://www.cria.org.br/
> 
> 
> --
> This message has been scanned for viruses and
> dangerous content and is believed to be clean.
> 
> 
> -- 
> PHP Internationalization Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
> 

--- End Message ---
--- Begin Message ---
On Wednesday, December 4, 2002, at 01:37  PM, Renato De Giovanni wrote:

> What happens is that "htmlspecialchars" seems to be always returning a
> latin1 string, therefore causing problems with the generated document
> supposedly utf-8 encoded.
>
> My question is: is this a php bug or am I misunderstanding something related
> to multi-byte character support?

It's probable that it's a PHP...erm..."fact of life" right now. I ran into similar problems with iso-8859-7 and -9, using both htmlspecialchars and htmlentities with the (optional) 3rd parameter. Things worked unpredictably. In the PHP build I have now (4.4ish, from recent CVS), htmlspecialchars actually prints out a PHP error message (E_WARNING, I believe) that:

"ISO-8859-7 is not supported by htmlspecialchars(); assuming ISO-8859-1"

So I wouldn't be surprised if you were running into this same problem, which wasn't officially recognized until after 4.2 was released. Look at bugs.php.net for related bugs...it's the only good way to keep up on the issue, which seems to be evolving...

Cheers,
spud.

-------------------------------------------------------------------
a.h.s. boy
spud(at)nothingness.org            "as yes is to if,love is to yes"
http://www.nothingness.org/
-------------------------------------------------------------------

--- End Message ---
--- Begin Message ---
Hi Moriyoshi,

To generate a valid xml document I only need to escape five characters 
inside any content: &<>"'

So what I really need is "htmlspecialchars", not "htmlentities". 
And besides the unnecessary translation of many characters, by doing so 
"htmlentities" produces an invalid xml document...
--
Renato
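
The difference described above can be seen on a short UTF-8 string. This is a minimal sketch, assuming a PHP version whose htmlentities() and htmlspecialchars() accept 'UTF-8' as the third argument; the sample string is made up for illustration. XML predefines only &amp;, &lt;, &gt;, &quot; and &apos;, so a named entity such as &eacute; is undefined in an XML document unless a DTD declares it.

```php
<?php
// Sample text (an illustration, not from the thread): "café & <tea>" in UTF-8.
$s = "caf\xC3\xA9 & <tea>";

// htmlentities() also translates the accented letter into an HTML named entity.
$full = htmlentities($s, ENT_COMPAT, 'UTF-8');

// htmlspecialchars() only touches &, <, > and " and leaves the UTF-8 bytes alone.
$minimal = htmlspecialchars($s, ENT_COMPAT, 'UTF-8');

echo $full, "\n";    // caf&eacute; &amp; &lt;tea&gt;
echo $minimal, "\n"; // café &amp; &lt;tea&gt;
```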

On 5 Dec 2002 at 4:22, Moriyoshi Koizumi wrote:

> Hi,
> 
> If you want to do the full entity translation, use htmlentities() instead 
> of htmlspecialchars()
> 
> http://www.php.net/htmlspecialchars
> 
> Moriyoshi

--- End Message ---
--- Begin Message ---
> It's probable that it's a PHP...erm..."fact of life" right now. I ran 
> into similar problems with iso-8859-7 and -9, using both 
> htmlspecialchars and htmlentities with the (optional) 3rd parameter. 
> Things worked unpredictably. In the PHP build I have now (4.4ish, from 
> recent CVS), htmlspecialchars actually prints out a PHP error message 
> (E_WARNING, I believe) that:
> 
> "ISO-8859-7 is not supported by htmlspecialchars(); assuming ISO-8859-1"
> 
> So I wouldn't be surprised if you were running into this same problem, 
> which wasn't officially recognized until after 4.2 was released. Look 
> at bugs.php.net for related bugs...it's the only good way to keep up on 
> the issue, which seems to be evolving...
> 
> Cheers,
> spud.

Ok, so it's a known "missing feature".

Meanwhile, it's possible to replace:

$s = htmlspecialchars($s, ENT_COMPAT, 'UTF-8');

with:

mb_regex_encoding('UTF-8');
$s = mb_ereg_replace('&', '&amp;', $s);
$s = mb_ereg_replace('>', '&gt;', $s);
$s = mb_ereg_replace('<', '&lt;', $s);
$s = mb_ereg_replace('"', '&quot;', $s);

...which should decrease performance considerably, but I see no other 
workaround.
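
A cheaper alternative to the regex calls may be plain byte-wise replacement. In UTF-8 every byte of a multi-byte sequence is 0x80 or higher, so the ASCII characters being escaped can never occur inside another character. The sketch below relies on that property; xmlEscapeUtf8() is a hypothetical name, and the input is assumed to be valid UTF-8 already. It also covers the apostrophe, the fifth character listed earlier in the thread.

```php
<?php
// Hypothetical helper (not from the thread): escape the five XML special
// characters in a string that is already valid UTF-8.
function xmlEscapeUtf8($s)
{
    // str_replace() applies the pairs in order, so '&' must come first;
    // otherwise the '&' of the entities added later would be escaped again.
    return str_replace(
        array('&', '<', '>', '"', "'"),
        array('&amp;', '&lt;', '&gt;', '&quot;', '&apos;'),
        $s
    );
}

echo xmlEscapeUtf8("Tom & \"Jerry\" <'x'>"), "\n";
// Tom &amp; &quot;Jerry&quot; &lt;&apos;x&apos;&gt;
```

Multi-byte characters pass through untouched because none of their bytes match the five ASCII search strings.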

Thanks,
--
Renato


--- End Message ---
--- Begin Message ---
Hi Renato,

Hmm, I mistook your purpose somewhat...
I've just looked through the source code and found no related bugs; I was
also working on improving htmlspecialchars() and htmlentities() a month ago.

Moriyoshi

"Renato De Giovanni" <[EMAIL PROTECTED]> wrote:

> Hi Moriyoshi,
> 
> To generate a valid xml document I only need to escape five characters 
> inside any content: &<>"'
> 
> So what I really need is "htmlspecialchars", not "htmlentities". 
> And besides the unnecessary translation of many characters, by doing so 
> "htmlentities" produces an invalid xml document...
> --
> Renato

--- End Message ---
--- Begin Message ---
Hi,

For your information: htmlentities() and htmlspecialchars() support the
following character sets in 4.3.0.

ISO-8859-1 (alias: ISO8859-1)
ISO-8859-15 (alias: ISO8859-15)
UTF-8
cp1252 (alias: Windows-1252, 1252)
BIG5 (alias: 950)
GB2312 (alias: 936)
BIG5-HKSCS
Shift_JIS (alias: SJIS, 932)
EUCJP (alias: EUC-JP)
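
ISO-8859-7, the charset named in the warning quoted in this thread, is not in this list. One possible workaround, in the spirit of the encodeString() function from the start of the thread, is to convert the text to UTF-8 first and then escape it as UTF-8. This is only a sketch: it assumes the mbstring extension is loaded and knows the source charset, and escapeFromCharset() is a hypothetical name.

```php
<?php
// Hypothetical helper: escape text stored in a charset that
// htmlspecialchars() does not accept, by going through UTF-8.
function escapeFromCharset($s, $charset)
{
    // Step 1: transcode to UTF-8 (requires the mbstring extension).
    $utf8 = mb_convert_encoding($s, 'UTF-8', $charset);

    // Step 2: escape &, <, > and " while treating the bytes as UTF-8.
    return htmlspecialchars($utf8, ENT_COMPAT, 'UTF-8');
}

// "\xE1" and "\xE2" are Greek alpha and beta in ISO-8859-7.
echo escapeFromCharset("\xE1 < \xE2", 'ISO-8859-7'), "\n"; // α &lt; β
```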

Regards,
Moriyoshi

"a.h.s. boy" <[EMAIL PROTECTED]> wrote:

> On Wednesday, December 4, 2002, at 01:37  PM, Renato De Giovanni wrote:
> 
> > What happens is that "htmlspecialchars" seems to be always returning a
> > latin1 string, therefore causing problems with the generated document
> > supposedly utf-8 encoded.
> >
> > My question is: is this a php bug or am I misunderstanding something 
> > related
> > to multi-byte character support?
> 
> It's probable that it's a PHP...erm..."fact of life" right now. I ran 
> into similar problems with iso-8859-7 and -9, using both 
> htmlspecialchars and htmlentities with the (optional) 3rd parameter. 
> Things worked unpredictably. In the PHP build I have now (4.4ish, from 
> recent CVS), htmlspecialchars actually prints out a PHP error message 
> (E_WARNING, I believe) that:
> 
> "ISO-8859-7 is not supported by htmlspecialchars(); assuming ISO-8859-1"
> 
> So I wouldn't be surprised if you were running into this same problem, 
> which wasn't officially recognized until after 4.2 was released. Look 
> at bugs.php.net for related bugs...it's the only good way to keep up on 
> the issue, which seems to be evolving...
> 
> Cheers,
> spud.

--- End Message ---
--- Begin Message ---
While modifying my internationalized application so that it not only supports multiple languages, but tries its best to display Turkish, Greek and English nicely on the same page, I ended up using UTF-8 as my default encoding. It makes the languages work nicely together, but it ruins other aspects of localization.

Since "switching languages" no longer switches encodings for the page (e.g. changing to "Greek" does NOT change the encoding for the page from UTF-8 to ISO-8859-7), the only real effect it has is on date and currency formatting. Unfortunately, while my input text displays beautifully in Greek, the Linux (Red Hat) Greek localization file (or whatever its nomenclature) at /usr/share/i18n/locales/el_GR seems to specify ISO-8859-7 as the charset, so my dates aren't displaying correctly in UTF-8.

D'oh! Silly me...I just discovered that there's a folder for "el_GR.utf8". Specifying that seems to do the trick! How easy that was!

Apologies for dragging you through my discovery process, but I figure this might be worth sending anyway, to preempt any future travelers in my fool's footsteps.
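
For future travelers, the fix might look roughly like the sketch below. The locale name 'el_GR.utf8' is the one mentioned above; the alternative spelling 'el_GR.UTF-8' and the sample date are assumptions, and whether either name works depends on which locales are installed on the system. Note that strftime() is deprecated as of PHP 8.1.

```php
<?php
// Ask for the UTF-8 variant of the Greek locale. Several names can be
// passed; setlocale() uses the first one the system accepts.
$locale = setlocale(LC_TIME, 'el_GR.utf8', 'el_GR.UTF-8');

if ($locale === false) {
    // Neither name is installed here; date output stays in the old locale.
    echo "Greek UTF-8 locale not available\n";
} else {
    // strftime() honours LC_TIME, so day and month names come out in
    // Greek, encoded as UTF-8 to match the page.
    echo strftime('%A %d %B %Y', mktime(0, 0, 0, 12, 16, 2002)), "\n";
}
```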

Cheers,
spud.

-------------------------------------------------------------------
a.h.s. boy
spud(at)nothingness.org "as yes is to if,love is to yes"
http://www.nothingness.org/
-------------------------------------------------------------------

--- End Message ---
--- Begin Message ---
Hi all.

I've just got a project to develop a Traditional Chinese (Big5) &
Simplified Chinese (GB) web site.

When I worked on multi-lingual web sites before, I used to store every
single string in the database and load it on each page. It's very slow.

Is there a better method to do that? Also, is any tutorial / article
available online?

I've heard of people using gettext to do it. I'm planning to use Smarty with
my application; how should the two work together?

I'll be using PHP 4.23 with MySQL

Thanks,

William Lam
Solution Specialist
Zenon Consulting


--- End Message ---
--- Begin Message ---
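You can run the query only on the first page load, not on every page
load.

You can also write your own template engine, prepared for multi-language
use.

Register an array of your result in the session:

session_start(); // must be called before $_SESSION is used
if (!isset($_SESSION['strings'])) {
    // $result is assumed to be the MySQL result set holding the strings
    $_SESSION['strings'] = array();
    while ($row = mysql_fetch_array($result)) {
        $_SESSION['strings'][] = $row;
    }
}
echo "<HR><PRE>\n";
print_r($_SESSION);
echo "</PRE><HR>\n";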
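
Once the strings are cached, each page only needs a lookup. The following is a minimal sketch rather than a finished design: the array layout (language code, then string key), the sample strings and the t() helper are all hypothetical, and the table is hard-coded here where a real page would read it from the session.

```php
<?php
// Hypothetical string table: language code => string key => translated text.
$strings = array(
    'en'    => array('welcome' => 'Welcome'),
    'zh-TW' => array('welcome' => '歡迎'),
);

// Return the string for $lang, fall back to $fallback, then to the key itself.
function t($strings, $lang, $key, $fallback = 'en')
{
    if (isset($strings[$lang][$key])) {
        return $strings[$lang][$key];
    }
    if (isset($strings[$fallback][$key])) {
        return $strings[$fallback][$key];
    }
    return $key;
}

echo t($strings, 'zh-TW', 'welcome'), "\n"; // 歡迎
echo t($strings, 'zh-CN', 'welcome'), "\n"; // Welcome (falls back to 'en')
echo t($strings, 'en', 'missing'), "\n";    // missing
```

Falling back to the key itself keeps a page readable when a translation is missing, at the cost of showing an untranslated identifier.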


"William Lam" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
> Hi all.
>
> I've just got a project to develop a Traditional Chinese (Big5) &
> Simplified Chinese (GB) web site.
>
> When I worked on multi-lingual web sites before, I used to store every
> single string in the database and load it on each page. It's very slow.
>
> Is there a better method to do that? Also, is any tutorial / article
> available online?
>
> I've heard of people using gettext to do it. I'm planning to use Smarty with
> my application; how should the two work together?
>
> I'll be using PHP 4.23 with MySQL
>
> Thanks,
>
> William Lam
> Solution Specialist
> Zenon Consulting


--- End Message ---
