Hi group!

My name is Jacob. I'm reworking a site (calabashmusic.com), a music download site specializing in music from around the world. We also have tie-ups with numerous international partners. The current version is Latin-1 encoded and shown only in English. The new one will be M17N (multilingualized) and will need UTF-8 because we are adding Japanese.

Anyway, I had the same problem as Steve (I think; I've read the entire thread). It was a HASSLE. We got a new server after a crash, so I loaded my DB dump from the local box into a fresh MySQL 3.23 with Apache 2. It seemed more or less okay, but I didn't test extensively, and a week later, after 10,000 inserts had been made, I realized the accented characters were screwed up. We use 28 of them. So I looked in the DB and, lo and behold, they had all been corrupted and replaced with the character combinations Steve mentioned. To be specific, each was an uppercase A with two dots above it followed by another character, usually something weird like the Euro sign or a cubed exponent (³).
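The "A plus a strange character" pattern is typical of UTF-8 bytes being displayed as a single-byte Western charset. The minimal Python sketch below assumes the misreading charset is Windows-1252 (an assumption; the thread does not confirm which layer misread the bytes). For accented letters in the Latin-1 range the first UTF-8 byte is 0xC3, which displays as Ã, and the second byte supplies the "weird" character such as ©, ³ or €.

```python
# Sketch: how UTF-8 text turns into two-character "garbage" when some layer
# (browser, PHP, MySQL client) assumes a single-byte Western charset.
# Assumption: the misreading charset is Windows-1252 ("cp1252").

def misread_utf8_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode those bytes as cp1252."""
    return text.encode("utf-8").decode("cp1252")

# Note: Python's cp1252 codec rejects the five bytes cp1252 leaves undefined
# (0x81, 0x8D, 0x8F, 0x90, 0x9D), so e.g. "Á" (UTF-8 C3 81) would raise here.
for original in ["é", "ó", "À", "ü"]:
    raw = original.encode("utf-8")
    print(original, raw.hex(), "->", misread_utf8_as_cp1252(original))
```

Each accented letter is two bytes in UTF-8, which is why one character becomes two (or, for characters outside Latin-1, three) on screen.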

After struggling for a long time, I eventually wrote a PHP script to traverse the database (all tables and columns) and change these screwy characters back to their equivalents, which I had to work out by matching similar records in my old DB. I also upgraded to MySQL 4.0.2 because I thought this might help.
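Because this kind of garbling is a byte-level round trip, it can often be reversed mechanically instead of by matching records by hand. The Python sketch below is a swapped-in technique, not the PHP script described above: it assumes each corrupted value is UTF-8 that was read exactly once as Windows-1252, and it leaves alone any value that does not fit that pattern. Walking the tables and writing values back is not shown.

```python
def repair_double_encoded(value: str) -> str:
    """Undo one round of "UTF-8 bytes misread as Windows-1252".

    Heuristic sketch: re-encode the garbled text as cp1252 to recover the
    original bytes, then decode them as UTF-8. If either step fails, the
    value probably was not garbled this way, so it is returned unchanged.
    Values containing characters cp1252 cannot encode (for example Japanese
    text, or the control characters left by bytes such as 0x81) are skipped.
    """
    try:
        return value.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return value


if __name__ == "__main__":
    samples = ["Caf\u00c3\u00a9", "Caf\u00e9", "plain ASCII"]
    for s in samples:
        print(repr(s), "->", repr(repair_double_encoded(s)))
```

Since correct text can occasionally happen to look like valid garble, it is safer to run something like this on a copy of the data and review the rows that change.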

Now everything seems to be pretty much okay, but I'm still shaking from the experience. I have no idea how it happened, or whether it will happen again. I have MySQL 4.0.2 with latin1, and Apache with ISO-8859-1 as the default charset.

Regarding what happened, any ideas? Regarding the future, how do I set up a purely Unicode environment, and how do I convert my old data to it?
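For the conversion question, one approach is to re-encode the old dump exactly once, outside MySQL, before importing it into UTF-8 tables. The minimal Python sketch below assumes the dump's text is Latin-1/Windows-1252, and the function names and paths are hypothetical. The MySQL side (declaring UTF-8 for the connection and columns) depends on the server version; per-column character sets only arrived in MySQL 4.1.

```python
def latin1_to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode single-byte Western text as UTF-8.

    Assumption: the input really is Latin-1/Windows-1252. Running this on
    data that is already UTF-8 produces the same double-encoded garble
    described in the post, so convert each file only once.
    """
    return data.decode(source_encoding).encode("utf-8")


def convert_dump(src_path: str, dst_path: str, source_encoding: str = "cp1252") -> None:
    """Read a dump file as raw bytes and write a UTF-8 copy (hypothetical paths)."""
    with open(src_path, "rb") as f_in:
        data = f_in.read()
    with open(dst_path, "wb") as f_out:
        f_out.write(latin1_to_utf8(data, source_encoding))
```

Decoding as "cp1252" raises on its five undefined bytes; passing source_encoding="latin-1" never fails, because Latin-1 maps all 256 byte values.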

Thanks
Jacob

Steve wrote:
I know, I know, you've had this a million times. But I have Googled on this
and not come up with anything that really matches my problem, so I need
some advice about refining my search. Here's the situation:

I have a site using shared hosting which is running Apache 1.3.27, PHP
4.1.2. As this is a site about France, accented characters are used a lot,
but have never been a problem. Some such characters are entered via HTML
forms on the site, others are in MySQL databases where they have been
entered on my local system via MySQLcc. The characters are all 'proper'
characters, i.e. they are not stored as HTML entities.

And that's all working fine. But...

I'm revamping the site, and on my local system (Apache 2, PHP 4.3.4), where
I'm doing the development, the exact same databases, using the exact same
browser (Firefox, FWIW) have problems with accented chars, which are shown
as a jumble of 2-3 chars.

Any ideas why there might be this discrepancy? Could it have anything to do
with the way PHP is installed on the two systems?

And what is the general recommendation about storing accented characters in
text fields in MySQL DBs? Convert them with htmlentities() when saving?
The problem with that is that I might need the same databases for generating
plain-text mail-outs where I'm not using HTML...
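A common alternative to storing entities is to store plain text and escape only at output time, so the same column serves both HTML pages and plain-text mail. A short Python sketch of that idea follows; html.escape stands in for PHP's htmlspecialchars(), and the sample string and function names are illustrative only.

```python
import html

# Illustrative value, stored in the database as plain text (no entities).
stored = "Cr\u00e8me br\u00fbl\u00e9e & caf\u00e9"


def render_html(text: str) -> str:
    """Escape only HTML-special characters (&, <, >, quotes).

    Accented letters pass through unchanged, so the page itself must
    declare its charset for the browser to display them correctly.
    """
    return html.escape(text)


def render_plain_text(text: str) -> bytes:
    """Use the stored text as-is for a plain-text email body, encoded as UTF-8.

    The message's Content-Type header would need a matching charset=UTF-8.
    """
    return text.encode("utf-8")


print(render_html(stored))
print(render_plain_text(stored))
```

By contrast, text stored as "Cr&egrave;me" would have to be unescaped again before every plain-text mail-out.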

This is a problem I thought I'd solved ages ago, so my head's in a bit of a
spin. Any advice is most welcome.


-- PHP Internationalization Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php


