php-i18n Digest 29 Nov 2004 19:34:50 -0000 Issue 264
Topics (messages 813 through 818):
Re: Accented characters
813 by: steve
816 by: Christophe Chisogne
817 by: steve
Using Translation from PEAR, other libraries
814 by: Jacob Singh
818 by: Jochem Maas
Re: GETTEXT strings occasionally don't get translated
815 by: Xavier O
Administrivia:
To subscribe to the digest, e-mail:
[EMAIL PROTECTED]
To unsubscribe from the digest, e-mail:
[EMAIL PROTECTED]
To post to the list, e-mail:
[EMAIL PROTECTED]
----------------------------------------------------------------------
--- Begin Message ---
Jacob Singh wrote:
> Anyway, I had the same problem as steve (I think - I've read the entire
> thread). It was a HASSLE. We got a new server after a crash, so I
> uploaded my DB dump from the local box onto a fresh MySQL 3.23 and
> Apache 2. It seemed more or less okay - I didn't test extensively - and a
> week later, after 10,000 inserts had been made, I realized the accented
> chars were screwed up, and we use about 25 of them (28 to be exact). So
> I looked in the DB and, lo and behold, they were all corrupted and
> replaced with the char combos Steve mentioned. To be specific, it was an
> upper-case A with two dots above it followed by another char, usually
> something weird like the euro sign or a superscript three.
Well, I *think* I may have located my problem - and I think that Apache, PHP
and MySQL are all in the clear - the problem was MySQLcc.
Here's why I think that. I dumped a table from the live server (as I did
before) and viewed it with Kwrite (which is set to open/save as
iso-8859-1). This showed that the dumped file did indeed contain latin1
characters.
Now, I had been uploading the tables to my local server using the SQL panel
in MySQLcc - just open the dumped file, and go... When the data were viewed
in MySQLcc, the characters rendered correctly and (here's the snag)
according to all the MySQL config files and the server info in MySQLcc,
everything MySQL-related was set to use latin1. BUT...
If I dumped the table from the local server using mysqldump, the accented
chars were now in utf-8 - note, no Apache or PHP involved. This told me it
was a MySQL problem.
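For anyone wanting to check a dump without opening an editor, here's a rough
heuristic that could be scripted in PHP (a sketch - it assumes the dump fits
in memory and only catches two-byte sequences, so it's a hint, not proof):

<?php
// Does the dump contain UTF-8 two-byte sequences, or plain latin1?
// A lead byte 0xC2-0xDF followed by a continuation byte 0x80-0xBF
// is the signature of UTF-8 encoded accented chars.
$data = file_get_contents('filename.sql');
if (preg_match('/[\xC2-\xDF][\x80-\xBF]/', $data)) {
    echo "Looks like UTF-8 (two-byte sequences found)\n";
} else {
    echo "No UTF-8 sequences - probably latin1\n";
}
?>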
So, I took the original dumped file, which I knew to be in latin1 and
uploaded it to the MySQL server from the command line:
mysql -D database < filename.sql
And lo! It was *still* in latin1 and now works correctly on the web page set
to iso-8859-1.
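The page charset can also be pinned down from the PHP side, so the browser
doesn't have to guess - a minimal sketch (it only sets the HTTP header, and
must run before any output is sent):

<?php
// Declare the page encoding explicitly in the HTTP header;
// this takes precedence over any <meta> tag guesswork.
header('Content-Type: text/html; charset=iso-8859-1');
?>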
But, the characters don't render correctly when viewed with MySQLcc - which
I'm now convinced is using utf-8. I can't find any config settings for
MySQLcc relating to encoding, so maybe it's something to do with KDE? I
don't know - thing is, I've solved the problem. I'll just avoid using
MySQLcc for loading tables.
I'll be sticking with latin1 (or maybe iso-8859-15). I'll never produce a
site that uses more than English and French, so messing with unicode is
just too much grief for me...
--
@+
Steve
--- End Message ---
--- Begin Message ---
steve wrote:
> But, the characters don't render correctly when viewed with MySQLcc - which
> I'm now convinced is using utf-8. I can't find any config settings for
> MySQLcc relating to encoding, so maybe it's something to do with KDE?
If mysqlcc uses locales, just set the locale before launching it (e.g. from
an xterm). Under Linux, for a French locale, you can choose it via the LANG
env variable:
LANG=fr_BE.iso88591 mysqlcc
LANG=fr_BE@euro mysqlcc
LANG=fr_FR.utf8 mysqlcc
Sorry, I don't use KDE/Gnome very often, but I guess they both default
to utf-8 these days.
> I'll be sticking with latin1 (or maybe iso-8859-15). I'll never produce a
> site that uses more than English and French
As Tex pointed out:
"1) ISO 8859-1 does not have the Euro character so is not really suitable for
France or Europe, unless you never have or discuss commercial transactions."
and "(...) Greek (...) is also not covered by latin-1".
About iso-8859-15 (aka latin9, aka latin0), from "man iso_8859-15":
"(...latin1...) lacks the EURO symbol and does not fully cover Finnish and
French. ISO 8859-15 is a modification of ISO 8859-1 that covers these needs."
FYI, I made a diff between latin1 and latin9 (with man -7 and diff):
hex  iso-8859-1 / latin1             iso-8859-15 / latin9
---  ------------------------------  -------------------------------------
A4   CURRENCY SIGN                   EURO SIGN
A6   BROKEN BAR                      LATIN CAPITAL LETTER S WITH CARON
A8   DIAERESIS                       LATIN SMALL LETTER S WITH CARON
B4   ACUTE ACCENT                    LATIN CAPITAL LETTER Z WITH CARON
B8   CEDILLA                         LATIN SMALL LETTER Z WITH CARON
BC   VULGAR FRACTION ONE QUARTER     LATIN CAPITAL LIGATURE OE
BD   VULGAR FRACTION ONE HALF        LATIN SMALL LIGATURE OE
BE   VULGAR FRACTION THREE QUARTERS  LATIN CAPITAL LETTER Y WITH DIAERESIS
Be careful: some chars are undefined in latin1 (hex 80-9F, decimal 128-159).
You also need to take into account that Micro$oft, in its own little world,
has its own "latin1": cp1252 [1]. Windows users often use it, it's
incompatible with latin1, and it adds a few chars to latin1 in the 0x80-0x9F
range. This means some translation must take place, whatever you choose
(latin1, latin9, utf-8).
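A minimal sketch of such a translation step in PHP, assuming the iconv
extension is available (//TRANSLIT asks iconv to approximate chars the
target charset lacks):

<?php
// Convert text pasted from a Windows app (cp1252) to latin1,
// approximating chars like curly quotes that latin1 doesn't have.
$latin1 = iconv('CP1252', 'ISO-8859-1//TRANSLIT', $windows_text);
// Or convert straight to UTF-8, which can hold every cp1252 char.
$utf8 = iconv('CP1252', 'UTF-8', $windows_text);
?>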
Some facts are worth knowing. E.g. the M$ cp1252 char A4 is "CURRENCY SIGN"
too, but M$ fonts (e.g. Arial) really use the euro sign for that char (even
from Windows 95, with the MS "euro patches"). So the lack of a euro sign can
be dealt with by simply declaring (as MS does) that the A4 char is the euro
sign. Ugly, but it works quite well: MS users are happy, but the problem
remains for Mac/Unix/old-Windows users.
Using iso-8859-15 also means not (really) using iso-8859-1, which is
the same as Unicode in its lower 8 bits. To prepare a Unicode migration
(utf-8 or other encodings), perhaps it's better to choose latin1.
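Which is also why the latin1 route stays cheap to leave: PHP's built-in
utf8_encode()/utf8_decode() convert exactly between ISO-8859-1 and UTF-8
(a sketch; note they know nothing about latin9 or cp1252):

<?php
// latin1 -> UTF-8 is lossless: every latin1 byte maps 1:1 to a
// Unicode code point.
$utf8 = utf8_encode($latin1_text);
// UTF-8 -> latin1 loses anything outside latin1 (e.g. the euro).
$latin1 = utf8_decode($utf8);
?>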
As Tex said too
"you will have to either go thru the work to convert to utf-8 anyway"
Everyone is migrating to Unicode (often the utf-8 encoding) to avoid
encoding problems/headaches, so you'll have to do it someday.
But not everyone is always up-to-date, on the edge, etc.
E.g. many people still use Windows 98 (21% of Google users in mid-2004 [2]),
not the newest XP. That said, there was already some Unicode support
back in MS Office 97.
Everyone is moving to Unicode; it's up to you to decide when you'll do it.
Personally, I think that for very "local" websites
(like only English/French/Dutch in Belgium/France)
latin1 is still an option, even if utf-8 will replace it
in a somewhat near future -- I mean, when (nearly) all "old"
softs/web-apps using latin1 have been upgraded to Unicode.
But yes, Unicode will be the only choice quite soon,
so being prepared seems a good idea.
Christophe
[1] cp1252:
http://www.microsoft.com/typography/unicode/1252.htm
[2] Google Zeitgeist, April 2004:
http://www.google.com/press/zeitgeist/zeitgeist-apr04.html
--- End Message ---
--- Begin Message ---
Christophe Chisogne wrote:
[lots of useful info snipped]
Thanks for all that, Christophe - still digesting much of it.
As I don't code or design websites for a living, and now have a site that's
at least working, I think I can safely put off the whole unicode issue
indefinitely ;-)
One issue remains in my mind, should I decide to go the unicode route: given
that my hosting company uses latin1 for its MySQL server, are there any
issues in having a MySQL server using latin1 while the web pages and the
data itself are utf-8? I've already proven that I can convert latin1 data to
utf-8 without even trying... :-(
--
@+
Steve
--- End Message ---
--- Begin Message ---
What framework do people commonly use for I18N on your sites? John
Coggenshall has an article in PHPBuilder about using Smarty filters. I
don't really approve of this approach because it forces me into
Smarty, which I am not particularly fond of.
I like the look of PEAR::Translation2, but I am not sure about the best
way to implement it. I feel that a good I18N package, like any other
package, doesn't compromise your framework intentions. This one seems
to require that you use PEAR::DB through its connection, which is a
problem because of connection pooling, and because I don't use
PEAR::DB - I am using Propel.
Any thoughts on this? I need to make a site that is UTF-8 and has
translations not only for labels and images, but in many cases for
actual data.
I'm thinking of storing my data in an XML format in MySQL with multiple
translations and making my own search index for each language. The
problem with this is that I have to grab the entire XML doc for each
field, which may have 10-15 translations, parse it and then display it,
wasting lots of processing and database time.
I'm not familiar with XML databases, and I'm told they are bad voodoo,
but what is another solution if you have to store user-entered records
in 'n' languages?
Thanks
Jacob
--- End Message ---
--- Begin Message ---
Jacob Singh wrote:
> What framework do people commonly use for I18N on your sites? John
> Coggenshall has an article in PHPBuilder about using Smarty filters. I
> don't really approve of this approach because it forces me into
> Smarty, which I am not particularly fond of.
> I like the look of PEAR::Translation2, but I am not sure about the best
> way to implement it. I feel that a good I18N package, like any other
> package, doesn't compromise your framework intentions. This one seems
> to require that you use PEAR::DB through its connection, which is a
> problem because of connection pooling, and because I don't use
> PEAR::DB - I am using Propel.
> Any thoughts on this? I need to make a site that is UTF-8 and has
> translations not only for labels and images, but in many cases for
> actual data.
a few thoughts:
1. I believe translation is integral to any web framework, because a
framework is about managing contextual content display (and the language
is a variable attribute of the content). Also, I wouldn't expect a lot of
code out there that doesn't come with some baggage (from the point of
view of your own framework); then again, there is nothing to stop you
from stripping down a PEAR module to suit your needs.
2. I view static text (e.g. button labels) and user text as
fundamentally different. For static texts I use a class that handles
translating placeholder strings; for user-created text I have an
integrated translation service in my data objects - one tells the DB
class to attempt to translate relevant values (i.e. fields marked in the
data objects as 'translatable') when 'getting' values, and if a
translation for the current language is not found then the original
value is shown. The translations are stored in a separate table a la:
KEY  - a user-created string taken from an arbitrary row & table
       in the DB
LANG - a language code giving the language of the value of the
       TEXT field
TEXT - the translated value of KEY
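A minimal sketch of that lookup-with-fallback (the table name
'translations' is made up for illustration, and an open mysql connection
is assumed):

<?php
// Return the translation of $text for $lang, falling back to the
// original value when no translation exists. (`KEY` is backquoted
// because it is a reserved word in MySQL.)
function translate($text, $lang)
{
    $sql = sprintf(
        "SELECT TEXT FROM translations WHERE `KEY` = '%s' AND LANG = '%s'",
        mysql_real_escape_string($text),
        mysql_real_escape_string($lang)
    );
    $res = mysql_query($sql);
    if ($res && mysql_num_rows($res) > 0) {
        $row = mysql_fetch_row($res);
        return $row[0];
    }
    return $text; // no translation found: show the original
}
?>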
> I'm thinking of storing my data in an XML format in MySQL with multiple
> translations and making my own search index for each language. The
> problem with this is that I have to grab the entire XML doc for each
> field, which may have 10-15 translations, parse it and then display it,
> wasting lots of processing and database time.
> I'm not familiar with XML databases, and I'm told they are bad voodoo,
> but what is another solution if you have to store user-entered records
> in 'n' languages?
The table I describe above actually covers that scenario - how you
present the management interface is of course up to you. For a given KEY
(the text to translate) and LANG (the id of the desired language) it is
possible to retrieve a translation - the table stipulates the 3 bits of
information required for every/any specific translation that needs to
occur. You could alternatively implement it as a set of arrays (one for
each lang), e.g.
$Lang['KEY'] = 'TEXT';
(I do something like this for what I call 'static' texts.)
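Fleshed out a little (a sketch - the file layout and names are made up;
validate $userLang against a whitelist before using it in an include):

<?php
// lang/fr.php would contain one such array per language, e.g.
//   $Lang = array('Submit' => 'Envoyer', 'Cancel' => 'Annuler');
include 'lang/' . $userLang . '.php';

// Fall back to the key itself when a label has no translation yet.
function t($key)
{
    global $Lang;
    return isset($Lang[$key]) ? $Lang[$key] : $key;
}

echo t('Submit'); // prints "Envoyer" when $userLang is 'fr'
?>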
Bear in mind that you could use foreign key relationships to create
M-to-N joining table(s) that store translations for given entities in
the DB, e.g.
WEBPAGES
id
title
url
WEBPAGE_CONTENTS
webpage_id --> WEBPAGES.id
lang_id --> LANGS.id
content
LANGS
id
name
(This is another trick I use when it is not feasible to use a default
value as a key - i.e. a whole page of text makes rather a large key
value, rather larger than most DBs expect for indexable key fields.)
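A sketch of fetching one page's content with that schema (ids hard-coded
for illustration; an open mysql connection is assumed):

<?php
// Content of webpage 42 in language 2, per the schema above.
$res = mysql_query(
    "SELECT content FROM WEBPAGE_CONTENTS
      WHERE webpage_id = 42 AND lang_id = 2"
);
if ($res && mysql_num_rows($res) > 0) {
    $row = mysql_fetch_row($res);
    echo $row[0];
} else {
    // no content stored for this page/language pair: fall back
    // to a default language or to WEBPAGES.title
}
?>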
You mention John C.'s article about Smarty filters - you might then
want to look at Apache 2 output filters: very cool stuff by all accounts,
although I have no personal experience with them.
---
I18N / L10N can be a bitch: not only do you have to implement it,
but then you have users who want to quickly/easily manage 100s/10,000s
of translatable texts. On top of which, you will find yourself in the
murky waters of encoding translation and/or Unicode (UTF-8/16). The
reason I say this is that these things can be complex enough without
making life even harder by starting off determined to use XML as part of
the solution. Besides, unless you are going to use some serious caching
of output (e.g. Smarty caching, homebrewed output caching, Squid etc.),
then extracting large chunks of XML from a DB and having to parse it
before extracting the relevant values (probably repeated more than once
per request) is probably going to make your site a lot slower.
I'll say that another way: deciding to use XML should be the endpoint
of your investigation, not the starting point.
Hope that's given you some stuff to think about and maybe sparked some ideas!
grds,
Jochem
> Thanks
> Jacob
--- End Message ---
--- Begin Message ---
Hi,
We have the same problem: sometimes the translation is displayed,
sometimes the original is displayed. Has anybody found a solution?
Regards,
Xavier
Patrick Savelberg wrote:
> Hi,
> I have an application written in PHP with gettext support. Every now and
> then the messages don't get translated. A refresh of the page will
> sometimes help, but after reloading the page about five times the
> untranslated strings show up again. There seems to be no clear reason why
> this happens. Anybody?
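For reference, the usual gettext setup in PHP looks like the sketch below
(domain and paths are made up). One commonly cited cause of this kind of
intermittent behaviour is that the locale is per-process: under Apache,
each child process keeps its own locale state, so requests served by a
child where setlocale() failed or was never called come back untranslated.

<?php
// Set the locale both in the environment and via setlocale(),
// since gettext consults both (hypothetical domain and path).
putenv('LC_ALL=nl_NL');
setlocale(LC_ALL, 'nl_NL');
bindtextdomain('myapp', './locale');
textdomain('myapp');
echo _('Hello'); // looks up ./locale/nl_NL/LC_MESSAGES/myapp.mo
?>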
--- End Message ---