php-i18n Digest 24 Feb 2004 09:58:23 -0000 Issue 216
Topics (messages 667 through 673):
Re: gettext and utf8
667 by: walter fan
668 by: Moriyoshi Koizumi
670 by: walter fan
messages.po HTML encoding
669 by: a.h.s. boy
671 by: Moriyoshi Koizumi
672 by: Moriyoshi Koizumi
mb_strcoll
673 by: Brodie Thiesfield
Administrivia:
To subscribe to the digest, e-mail:
[EMAIL PROTECTED]
To unsubscribe from the digest, e-mail:
[EMAIL PROTECTED]
To post to the list, e-mail:
[EMAIL PROTECTED]
----------------------------------------------------------------------
--- Begin Message ---
Hi Moriyoshi,
Thanks for the information. Something strange happens: when I set it to
charset=UTF-8, the page cannot display the UTF-8 strings, but when I set it to
charset=CHARSET, it works. My server is RedHat Linux 7.2.
Btw, why are the translated strings still displayed by gettext after I deleted the
po and mo files in the locale folder?
Thanks & Regards,
Walter Fan
----- Original Message -----
From: "Moriyoshi Koizumi" <[EMAIL PROTECTED]>
To: "walter fan" <[EMAIL PROTECTED]>
Sent: Tuesday, February 10, 2004 3:37 AM
Subject: Re: [PHP-I18N] gettext and utf8
> On 2004/02/05, at 12:42, walter fan wrote:
>
> > 2. I modified the charset of php files and messages.po to UTF-8
> >
> > $ iconv -f gb2312 -t utf-8 messages_gb2312.po > messages.po
> > $ msgfmt messages.po
> >
> > But the page didn't display UTF-8 strings.
> >
>
> Be sure to specify "Content-Type" in your po file.
>
> ex.
> "Content-Type: text/plain; charset=gb2312"
> "Content-Type: text/plain; charset=UTF-8"
>
> Moriyoshi
>
>
--- End Message ---
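On the po header discussed in the message above: xgettext writes "charset=CHARSET" into a freshly generated template as a placeholder, which is meant to be replaced with the file's real encoding. A minimal sketch for checking what a catalog declares follows. The helper name po_declared_charset() is made up for illustration and is not part of PHP or gettext.

```php
<?php
// Sketch: read the charset declared in a .po header.
// po_declared_charset() is a hypothetical helper, not a PHP or gettext function.
function po_declared_charset(string $poText): ?string
{
    // The header entry contains a line such as:
    //   "Content-Type: text/plain; charset=UTF-8\n"
    if (preg_match('/^"Content-Type:[^"]*charset=([^\s"\\\\]+)/mi', $poText, $m)) {
        return $m[1];
    }
    return null; // no Content-Type line found
}

// Example input: a header entry as it would appear in messages.po.
$po = "msgid \"\"\nmsgstr \"\"\n\"Content-Type: text/plain; charset=UTF-8\\n\"\n";
echo po_declared_charset($po) ?? 'none', "\n";
```

For a file converted with the iconv command quoted above, the declaration one would expect to see is UTF-8; a result of CHARSET means the placeholder was never filled in.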
--- Begin Message ---
On 2004/02/10, at 9:39, walter fan wrote:
> Hi Moriyoshi,
>
> Thanks for the information. Something strange happens: when I set
> it to charset=UTF-8, the page cannot display the UTF-8 strings, but
> when I set it to charset=CHARSET, it works. My server is RedHat Linux 7.2.
Perhaps you are not sending a correct HTTP header to the browser.
Just add header('Content-Type: text/html; charset=UTF-8')
at the top of your script.
Also try using bind_textdomain_codeset() to specify the output charset
in which you want the texts to be sent.
> Btw, why are the translated strings still displayed by gettext after I
> deleted the po and mo files in the locale folder?
Do you mean translated strings continue to appear if the message
catalogs are removed?
Maybe you have two or more base directories and you are passing
bindtextdomain() a different base, one that does not contain the
removed catalogs.
Moriyoshi
--- End Message ---
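The two suggestions in the message above (an HTTP header plus bind_textdomain_codeset()) can be sketched together. This is only an illustration under assumptions: the function name init_utf8_gettext(), the "messages" domain, the zh_CN.UTF-8 locale and the ./locale directory are all placeholders, not details taken from the thread.

```php
<?php
// Hypothetical helper: sends the HTTP charset and binds a gettext domain as UTF-8.
// Returns the codeset gettext reports for the domain, or null if it is unavailable.
function init_utf8_gettext(string $domain, string $localeDir): ?string
{
    // Must run before any output, otherwise the header can no longer be changed.
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=UTF-8');
    }
    if (!function_exists('bind_textdomain_codeset')) {
        return null; // gettext extension not loaded
    }
    bindtextdomain($domain, $localeDir);                   // where the .mo files live
    $codeset = bind_textdomain_codeset($domain, 'UTF-8');  // charset of returned strings
    textdomain($domain);
    return is_string($codeset) ? $codeset : null;
}

// Assumed layout: ./locale/zh_CN/LC_MESSAGES/messages.mo
setlocale(LC_ALL, 'zh_CN.UTF-8'); // assumed locale name; it must exist on the server
init_utf8_gettext('messages', __DIR__ . '/locale');
```

With this in place, gettext() should hand back UTF-8 regardless of the encoding the catalog was written in, provided the catalog's own Content-Type header is accurate.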
--- Begin Message ---
Moriyoshi, thanks again.
I tried your method, but the problem persists. Although I set the
charset using header('Content-Type: text/html; charset=UTF-8'),
the page is still displayed with Chinese characters, so these strings
show up as garbage.
> Do you mean translated strings continue to appear if the message
> catalogs are removed?
Yes.
> Maybe you have two or more base directory and you are specifying
> another base to bindtextdomain()
> which isn't involved with the removed catalogs.
I have only one directory on my server and call bindtextdomain() only
once.
I think it may be caused by the server (Redhat Linux 7.3 + Apache 1.3 +
PHP 4.2.2 + gettext 0.13), but I haven't found the real reason.
Do I need to configure something on the server?
Thanks & Regards,
Walter Fan
--- End Message ---
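One way to narrow down the "translations survive after deleting the catalogs" question from the message above is to print the exact file gettext would be expected to load. The sketch below assumes the usual <base>/<locale>/LC_MESSAGES/<domain>.mo layout; catalog_path() and the names passed to it are hypothetical.

```php
<?php
// Hypothetical helper: builds the path gettext searches for a compiled catalog,
// i.e. <base>/<locale>/LC_MESSAGES/<domain>.mo
function catalog_path(string $baseDir, string $locale, string $domain): string
{
    return rtrim($baseDir, '/') . "/$locale/LC_MESSAGES/$domain.mo";
}

// Assumed names; use the values actually passed to bindtextdomain() and setlocale().
$path = catalog_path(__DIR__ . '/locale', 'zh_CN', 'messages');
echo $path, file_exists($path) ? " exists\n" : " is missing\n";
```

If the file really is gone and translations still appear, another possibility (assuming PHP runs inside Apache) is that the catalog was already loaded into the server process. gettext keeps loaded catalogs in memory, so a restart of the web server is often needed before a removed or replaced .mo file takes effect.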
--- Begin Message ---
I have had a functional gettext-based internationalized content management
system for a while now. A number of translators have offered their
support, and I have localization files for Swedish, Norwegian, Chinese,
Arabic, Turkish, Japanese, Spanish, etc.
The PHP software system is utf-8 based, so character sets haven't been
an issue. Indeed, everything's been working quite well, but I just
noticed a procedural item that made me wonder what the best approach
is.
When non-Roman language translators (Japanese, Arabic, Chinese) send me
their messages.po files, I open and save them as "utf-8 (no BOM)" files
to preserve their integrity. (I use BBEdit on Mac OS X, which handles
this nicely).
When using Spanish, Swedish, etc. files, however, many of the
translators have converted the text strings to HTML entities, e.g.
"espa&ntilde;ol". In one way, this makes sense, since they are to be
displayed on a web page. But is it the right thing to do? Or should
such strings be in messages.po with all their accents, and converted
with htmlspecialchars() before output?
The issue cropped up because I'm converting the site to XHTML 1.1
output, and that means encoding things like ampersands. I have
functions for creating drop-down menus (e.g. "categories" and
"languages"). If a menu has an item like a "Crime & Punishment"
category, I'd want to convert it to "Crime &amp; Punishment" for XHTML
compliance. But I don't want the language menu to RE-encode
"espa&ntilde;ol" as "espa&amp;ntilde;ol", which would screw everything
up.
So what's the best way to handle the relationship between HTML entities
and gettext-based messages.po files?
In fact, the larger question is: do HTML entities really need to be
entity-ized on utf-8 pages, whose character set actually should be
capable of displaying the characters? Obviously "htmlspecialchars()"
handles characters that cause output problems (like < and >, which
indicate tag opening/closing), but for a utf-8 based system, "n tilde"
doesn't need to be encoded at all, does it?
It seems like early HTML education would state categorically that
"español" needs to be written as "espa&ntilde;ol" on a web page, but
that isn't really true for utf-8 pages, is it?
spud.
-------------------------------------------------------------------
a.h.s. boy
spud(at)nothingness.org "as yes is to if,love is to yes"
http://www.nothingness.org/
-------------------------------------------------------------------
--- End Message ---
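The option raised in the message above (keep plain accented text in messages.po and escape only at output) might look like the following. This is a sketch, assuming the catalog strings are UTF-8; the helper name e() is made up for illustration.

```php
<?php
// Hypothetical helper: escape a UTF-8 string for HTML/XHTML at output time.
function e(string $text): string
{
    // Only &, <, > and quotes are converted; accented characters pass through.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo '<option>', e('Crime & Punishment'), "</option>\n"; // & becomes &amp;
echo '<option>', e('español'), "</option>\n";            // unchanged in UTF-8
```

Because the catalog holds no entities, every string goes through the same escaping exactly once, so the double-encoding problem with the language menu does not arise.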
--- Begin Message ---
On 2004/02/11, at 1:45, a.h.s. boy wrote:
> When using Spanish, Swedish, etc. files, however, many of the
> translators have converted the text strings to HTML entities, e.g.
> "espa&ntilde;ol". In one way, this makes sense, since they are to be
> displayed on a web page. But is it the right thing to do? Or should
> such strings be in messages.po with all their accents, and converted
> with htmlspecialchars() before output?
Yep, I guess you should. It wouldn't be a good idea to have accented
characters as entities in the .po file, because that only makes sense
when gettext is used in conjunction with HTML / XML. Besides, you won't
need to convert such strings into their entity form as long as you
choose UTF-8 as the output charset.
--
PHP Internationalization Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
--- End Message ---
--- Begin Message ---
I just clicked the "send" button too early. Please ignore the previous one,
sorry :)
On 2004/02/11, at 1:45, a.h.s. boy wrote:
> When using Spanish, Swedish, etc. files, however, many of the
> translators have converted the text strings to HTML entities, e.g.
> "espa&ntilde;ol". In one way, this makes sense, since they are to be
> displayed on a web page. But is it the right thing to do? Or should
> such strings be in messages.po with all their accents, and converted
> with htmlspecialchars() before output?
Yep, I guess you should. It wouldn't be a good idea to have accented
characters as entities in the .po file, because that only makes sense
when gettext is used in conjunction with HTML / XML. Besides, you won't
need to convert such strings into their entity form as long as you
choose UTF-8 as the output charset.
> In fact, the larger question is: do HTML entities really need to be
> entity-ized on utf-8 pages, whose character set actually should be
> capable of displaying the characters? Obviously "htmlspecialchars()"
> handles characters that cause output problems (like < and >, which
> indicate tag opening/closing), but for a utf-8 based system, "n tilde"
> doesn't need to be encoded at all, does it?
They don't have to be entitized. The core idea behind HTML entities
is to represent, in a document written in a legacy character set,
characters that are not always available in other character sets.
UTF-8 was developed to resolve such issues.
Moriyoshi
--- End Message ---
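Following the reply above, catalog strings that translators already delivered with entities could be converted back to plain UTF-8 once, before they go into messages.po. A minimal sketch, with decode_legacy_entities() as a hypothetical helper name:

```php
<?php
// Hypothetical one-off helper: turn entity-encoded catalog text into plain UTF-8.
function decode_legacy_entities(string $text): string
{
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

echo decode_legacy_entities('espa&ntilde;ol'), "\n"; // prints "español"
```

Note that this also decodes entities a translator may have written on purpose (for example &lt;), so the converted strings are worth a quick review.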
--- Begin Message ---
Hi,
Are there any plans to make a multibyte version of strcoll? Or at least a
version which supports utf-8 and uses the Unicode collation
algorithm/tables (http://www.unicode.org/reports/tr10/)?
Regards,
Brodie
--- End Message ---
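The digest contains no reply to the question above. For reference, PHP's intl extension, which did not exist when this was posted, wraps ICU, whose collation is based on the Unicode Collation Algorithm. The sketch below assumes intl may or may not be installed and falls back to byte-order sorting; sort_utf8() and the sample locale are hypothetical.

```php
<?php
// Hypothetical helper: locale-aware sort of UTF-8 strings.
function sort_utf8(array $words, string $locale): array
{
    if (class_exists('Collator')) {
        $collator = new Collator($locale); // ICU collation rules for the locale
        $collator->sort($words);
    } else {
        sort($words, SORT_STRING);         // fallback: plain byte order
    }
    return array_values($words);
}

// 'sv_SE' is only an example locale.
print_r(sort_utf8(['zebra', 'äpple', 'apple'], 'sv_SE'));
```

The fallback branch is not a substitute for real collation; it only keeps the sketch runnable where intl is absent.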