php-i18n Digest 29 Nov 2004 19:34:50 -0000 Issue 264

Topics (messages 813 through 818):

Re: Accented characters
        813 by: steve
        816 by: Christophe Chisogne
        817 by: steve

Using Translation from PEAR, other libraries
        814 by: Jacob Singh
        818 by: Jochem Maas

Re: GETTEXT strings occasionally don't get translated
        815 by: Xavier O

Administrivia:

To subscribe to the digest, e-mail:
        [EMAIL PROTECTED]

To unsubscribe from the digest, e-mail:
        [EMAIL PROTECTED]

To post to the list, e-mail:
        [EMAIL PROTECTED]


----------------------------------------------------------------------
--- Begin Message ---
Jacob Singh wrote:

> Anyway, I had the same problem as steve (I think, I've read the entire
> thread).  It was a HASSLE.  We got a new server after a crash so I
> uploaded my DB dump from the local box onto a fresh MySQL 3.23 and
> Apache 2.  It seemed more or less okay, I didn't test extensively and a
> week later after 10,000 inserts had been made, I realized the accented
> chars were screwed up, and we use about 25 of them (28 to be exact).  So
> I looked in the DB and lo and behold, they were all corrupted and
> replaced with the char combos Steve mentioned.  To be specific it was an
> upper-case A with two dots above it followed by another char, usually
> something weird like the euro sign or a cubed exponent.

Well, I *think* I may have located my problem - and I think that Apache, PHP
and MySQL are all in the clear - the problem was MySQLcc.

Here's why I think that. I dumped a table from the live server (as I did
before) and viewed it with Kwrite (which is set to open/save as
iso-8859-1). This showed that the dumped file did indeed contain latin1
characters.

Now, I had been uploading the tables to my local server using the SQL panel
in MySQLcc - just open the dumped file, and go... When the data were viewed
in MySQLcc, the characters rendered correctly and (here's the snag)
according to all the MySQL config files and the server info in MySQLcc,
everything MySQL-related was set to use latin1. BUT...

If I dumped the table from the local server using mysqldump, the accented
chars were now in utf-8 - note, no Apache or PHP involved. This told me it
was a MySQL problem.
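
A handy sanity check, by the way, is to ask the server what it thinks it's
using - a rough sketch, the connection details are placeholders (I believe
4.1+ reports separate client/connection/server sets, while 3.23 only has a
single character_set variable):

    <?php
    // ask MySQL which character set(s) it believes it is using
    $link = mysql_connect('localhost', 'user', 'password')
        or die('Connect failed: ' . mysql_error());
    mysql_select_db('database', $link);

    $result = mysql_query("SHOW VARIABLES LIKE 'character_set%'", $link);
    while ($row = mysql_fetch_row($result)) {
        echo $row[0] . ' = ' . $row[1] . "\n";
    }
    mysql_close($link);
    ?>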

So, I took the original dumped file, which I knew to be in latin1 and
uploaded it to the MySQL server from the command line:
        mysql -D database < filename.sql
And lo! It was *still* in latin1 and now works correctly on the web page set
to iso-8859-1.

But, the characters don't render correctly when viewed with MySQLcc - which
I'm now convinced is using utf-8. I can't find any config settings for
MySQLcc relating to encoding, so maybe it's something to do with KDE? I
don't know - thing is, I've solved the problem. I'll just avoid using
MySQLcc for loading tables.

I'll be sticking with latin1 (or maybe iso-8859-15). I'll never produce a
site that uses more than English and French, so messing with unicode is
just too much grief for me...

-- 
@+
Steve

--- End Message ---
--- Begin Message ---
steve wrote:

> But, the characters don't render correctly when viewed with MySQLcc - which
> I'm now convinced is using utf-8. I can't find any config settings for
> MySQLcc relating to encoding, so maybe it's something to do with KDE?

If mysqlcc uses locales, just set the locale before launching it (via an
xterm). Under Linux, for a French locale, you can choose it via the LANG
env variable:
        LANG=fr_BE.iso88591 mysqlcc
        [EMAIL PROTECTED] mysqlcc
        LANG=fr_FR.utf8 mysqlcc

Sorry, I don't use KDE/Gnome very often. But I guess they both default
to utf-8 these days.

> I'll be sticking with latin1 (or maybe iso-8859-15). I'll never produce a
> site that uses more than English and French

As Tex pointed out, "1) ISO 8859-1 does not have the Euro character so is
not really suitable for France or Europe, unless you never have or discuss
commercial transactions." and "(...) Greek (...) is also not covered by
latin-1".

About iso-8859-15 (aka latin9, aka latin0), from "man iso_8859-15":
"(...latin1...) lacks the EURO symbol and does not fully cover Finnish and
French. ISO 8859-15 is a modification of ISO 8859-1 that covers these
needs."

FYI I made a diff between latin1 and latin9 (with man -7 and diff):

hex     iso-8859-1/latin1               iso-8859-15/latin9
----------------------------------------------------------------------------
A4      CURRENCY SIGN                   EURO SIGN
A6      BROKEN BAR                      LATIN CAPITAL LETTER S WITH CARON
A8      DIAERESIS                       LATIN SMALL LETTER S WITH CARON
B4      ACUTE ACCENT                    LATIN CAPITAL LETTER Z WITH CARON
B8      CEDILLA                         LATIN SMALL LETTER Z WITH CARON
BC      VULGAR FRACTION ONE QUARTER     LATIN CAPITAL LIGATURE OE
BD      VULGAR FRACTION ONE HALF        LATIN SMALL LIGATURE OE
BE      VULGAR FRACTION THREE QUARTERS  LATIN CAPITAL LETTER Y WITH DIAERESIS

Be careful that some chars are undefined in latin1 (hex 80-9F, decimal 128-159).

You also need to take into account that Micro$oft, in its own little world,
has its own "latin1": cp1252 [1]. Windows users often use it, it's
incompatible with latin1, and it adds a few chars in the 0x80-0x9F range.
This means some translation must take place, whatever you choose
(latin1, latin9, utf-8).
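
PHP's iconv extension can do that kind of translation. A minimal sketch
(assuming iconv is compiled in - it's optional in PHP 4):

    <?php
    // "Café" plus euro as cp1252 bytes (0x80 is the euro sign in cp1252)
    $cp1252 = "Caf\xE9 \x80";

    // the euro lands on 0xA4 in latin9; //TRANSLIT approximates any
    // char without a direct equivalent instead of failing
    $latin9 = iconv('CP1252', 'ISO-8859-15//TRANSLIT', $cp1252);
    $utf8   = iconv('CP1252', 'UTF-8', $cp1252);
    ?>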

Some facts can be worth knowing. E.g. the M$ cp1252 char A4 is 'Currency
sign' too, but M$ fonts (e.g. Arial) really use the euro sign for that char
(even back on Windows 95, with the MS 'euro patches'). So the lack of a euro
sign can be dealt with by simply declaring (as MS does) that the A4 char is
the euro sign. Ugly, but it works quite well: MS users are happy, but the
problem remains for Mac/Unix/old-Windows users.

Using iso-8859-15 also means not (really) using iso-8859-1, which is the
same as Unicode in its lower 8 bits. To prepare for a Unicode migration
(utf-8 or other encodings), perhaps it's better to choose latin1.

As Tex also said:
"you will have to either go thru the work to convert to utf-8 anyway"

Everyone is migrating to Unicode (often the utf-8 encoding) to avoid
encoding problems/headaches, so you'll have to do it someday. But not
everyone is always up-to-date, on the edge, etc. For example, many people
still use Windows 98 (21% of Google users in mid-2004 [2]), not the newest
XP. That said, there was already some Unicode support back in MS Office 97.

Everyone is moving to Unicode; it's up to you to decide when you'll do it.

Personally, I think that for very 'local' websites
(like only English/French/Dutch in Belgium/France)
latin1 is still an option, even if utf-8 will replace it
in the somewhat near future -- I mean when (nearly) all "old"
softs/web-apps using latin1 have been upgraded to Unicode.

But yes, Unicode will be the only choice quite soon,
so being prepared seems like a good idea.

Christophe

[1] cp1252
http://www.microsoft.com/typography/unicode/1252.htm

[2] april 2004 zeitgeist google
http://www.google.com/press/zeitgeist/zeitgeist-apr04.html

--- End Message ---
--- Begin Message ---
Christophe Chisogne wrote:
[lots of useful info snipped]

Thanks for all that, Christophe - still digesting much of it.

As I don't code or web design for a living, and now have a site that's at
least working, I think I can safely put off the whole unicode issue
indefinitely ;-)

One issue remains in my mind, should I decide to go the unicode route: given
that my hosting company uses latin1 for its MySQL server, are there any
issues in having a MySQL server using latin1 while the web pages and the
data itself are utf-8? I've already proven that I can convert latin1 data to
utf-8 without even trying... :-(

-- 
@+
Steve

--- End Message ---
--- Begin Message ---
What is the common framework people use for I18N on your sites? John
Coggeshall has an article in PHPBuilder about using Smarty filters. I don't
really approve of this approach because it forces me into Smarty, which I am
not particularly fond of.

I like the look of PEAR::Translation2, but I am not sure about the best way
to implement it. I feel that a good I18N package, like any other package,
shouldn't compromise your framework intentions. This one seems to require
that you use PEAR::DB through its own connection, which is a problem because
of connection pooling and the fact that I don't use PEAR::DB; I am using
Propel.

Any thoughts on this? I need to make a site that is UTF-8 and has translations not only for labels and images, but in many cases for actual data.

I'm thinking of storing my data in an XML format in MySQL with multiple translations and making my own search index for each language. The problem with this is that I have to grab the entire XML doc for each field which may have 10-15 translations, parse and then display, wasting lots of processing and database time.

I'm not familiar with XML databases, and I'm told they are bad voodoo, but
what is another solution if you have to store user-entered records in 'n'
languages?

Thanks
Jacob

--- End Message ---
--- Begin Message ---
Jacob Singh wrote:

> What is the common framework people use for I18N on your sites? John
> Coggeshall has an article in PHPBuilder about using Smarty filters. I
> don't really approve of this approach because it forces me into Smarty,
> which I am not particularly fond of.
>
> I like the look of PEAR::Translation2, but I am not sure about the best
> way to implement it. I feel that a good I18N package, like any other
> package, shouldn't compromise your framework intentions. This one seems
> to require that you use PEAR::DB through its own connection, which is a
> problem because of connection pooling and the fact that I don't use
> PEAR::DB; I am using Propel.
>
> Any thoughts on this? I need to make a site that is UTF-8 and has
> translations not only for labels and images, but in many cases for
> actual data.

a few thoughts:

1. I believe translation is integral to any web framework, because a
framework is about managing contextual content display (and the language is
a variable attribute of the content). Also, I wouldn't expect a lot of code
out there that doesn't come with some baggage (from the point of view of
your own framework); then again, there is nothing to stop you from stripping
down a PEAR module to suit your needs.

2. I view static text (e.g. button labels) and user text as fundamentally
different. For the static texts I use a class that handles translating
placeholder strings; for user-created text I have an integrated translation
service in my data objects. One tells the DB class to attempt to translate
relevant values (i.e. fields marked in the data objects as 'translatable')
when 'getting' values; if a translation for the current language is not
found, the original value is shown. The translations are stored in a
separate table a la:


KEY     - a user created string taken from an arbitrary row & table
          in the DB.
LANG    - a language code relating to the language of the value of the
          TEXT field
TEXT    - the translated value of KEY
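
In PHP that lookup boils down to something like this (a simplified sketch -
the table/column names are invented for the example, and the real thing
needs more error handling):

    <?php
    // return the translation of $key for $lang, or $key itself when
    // no translation exists (the fallback described above)
    function translate($key, $lang, $link)
    {
        $sql = sprintf("SELECT text FROM translations
                        WHERE key_string = '%s' AND lang = '%s'",
                       mysql_real_escape_string($key, $link),
                       mysql_real_escape_string($lang, $link));
        $result = mysql_query($sql, $link);
        if ($result && mysql_num_rows($result) > 0) {
            return mysql_result($result, 0);   // the TEXT column
        }
        return $key;   // no translation for this language: show original
    }
    ?>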


> I'm thinking of storing my data in an XML format in MySQL with multiple
> translations and making my own search index for each language. The
> problem with this is that I have to grab the entire XML doc for each
> field which may have 10-15 translations, parse and then display, wasting
> lots of processing and database time.
>
> I'm not familiar with XML databases, and I'm told they are bad voodoo,
> but what is another solution if you have to store user-entered records
> in 'n' languages?

The table I describe above actually covers that scenario - how you present
the management interface is of course up to you. For a given KEY (the text
to translate) and LANG (id of the desired language) it is possible to
retrieve a translation - the table stipulates the 3 bits of information
required for every/any specific translation that needs to occur. You could
alternatively implement it as a set of arrays (one for each lang), e.g.


$Lang['KEY'] = 'TEXT';

(I do something like this for what I call 'static' texts).
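
In practice that can be as simple as one array file per language - the file
names here are invented for the example:

    <?php
    // lang/fr.php - one entry per placeholder string
    $Lang['submit'] = 'Envoyer';
    $Lang['cancel'] = 'Annuler';
    ?>

    <?php
    // at the top of a page: load the visitor's language, then use the keys
    include 'lang/' . $currentLang . '.php';
    echo $Lang['submit'];
    ?>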

Bear in mind that you could use foreign key relationships to create
an M-to-N joining table(s) that stores translations for given entities in
the DB, e.g.


WEBPAGES
id
title
url

WEBPAGE_CONTENTS
webpage_id      --> WEBPAGES.id
lang_id         --> LANGS.id
content

LANGS
id
name

(Another trick I use when it is not feasible to use a default value as a
key - i.e. a whole page of text makes rather a large key value - rather
larger than most DBs expect for indexable key fields.)
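
With that layout, fetching a page in the visitor's language is a single
join - again only a sketch, following the table names above (the URL and
language values are just examples):

    <?php
    $sql = "SELECT wc.content
            FROM WEBPAGES w
            JOIN WEBPAGE_CONTENTS wc ON wc.webpage_id = w.id
            JOIN LANGS l ON l.id = wc.lang_id
            WHERE w.url = '/about' AND l.name = 'fr'";
    $result = mysql_query($sql);
    ?>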


You mention John C.'s article about Smarty filters - you might then want to
look at Apache2 output filters; very cool stuff by all accounts, although I
have no personal experience with them.


---

I18N / L10N can be a bitch. Not only do you have to implement it, but then
you have users who want to quickly/easily manage 100's/10,000's of
translatable texts. On top of which you will find yourself in the murky
waters of encoding translation and/or Unicode (UTF-8/16). The reason I say
this is that these things can be complex enough without making life even
harder by starting off determined to use XML as part of the solution.

Besides, unless you are going to use some serious caching of output (e.g.
Smarty caching, homebrewed output caching, squid, etc.), extracting large
chunks of XML from a DB and then having to parse it before extracting the
relevant values (probably repeated more than once per request) is probably
going to make your site a lot slower. I'll say that another way: deciding to
use XML should be the endpoint of your investigation, not the starting
point.

Hope that's given you some stuff to think about and maybe sparked some ideas!

grds,
Jochem


--- End Message ---
--- Begin Message ---
Hi,

We have the same problem. Sometimes the translation is displayed, sometimes
the original is displayed. Has anybody found a solution?
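
One suspect we have seen mentioned (but not confirmed) is that each Apache
child process caches its own locale, so different children can answer with
different settings - which would fit the random behaviour. The usual
suggestion is to force everything on every request, roughly like this
(untested on our side; the domain, locale and paths are placeholders):

    <?php
    putenv('LC_ALL=fr_FR');            // some setups also need LANGUAGE
    setlocale(LC_ALL, 'fr_FR');
    bindtextdomain('messages', './locale');
    textdomain('messages');
    echo _('Hello');                   // should now translate consistently
    ?>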

Regards,

Xavier

Patrick Savelberg wrote:

> Hi,
>
> I have an application written in PHP with gettext support. Every now and
> then the messages don't get translated. A refresh of the page will
> sometimes help. But after reloading the page about five times the
> untranslated strings show up again. There seems to be no clear reason why
> this happens. Anybody?

--- End Message ---
