Hi group!
My name is Jacob. I'm reworking a site (calabashmusic.com), a music download site specializing in music from around the world. We also have tie-ups with numerous international partners. The current version is Latin-1 encoded and is shown only in English. The new one will be M17N and will need Unicode (UTF-8) because we are expanding to Japan.
Anyway, I had the same problem as Steve (I think; I've read the entire thread). It was a HASSLE. We got a new server after a crash, so I uploaded my DB dump from the local box onto a fresh MySQL 3.23 and Apache 2. It seemed more or less okay, but I didn't test extensively, and a week later, after 10,000 inserts had been made, I realized the accented chars were screwed up, and we use about 25 of them (28 to be exact). So I looked in the DB and, lo and behold, they were all corrupted and replaced with the char combos Steve mentioned. To be specific, each one was an uppercase A with two dots above it followed by another char, usually something weird like the Euro sign or a cubed exponent (superscript 3).
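The symptom described above resembles what happens when UTF-8 bytes are read back as a single-byte Western encoding. Below is a minimal Python sketch of that effect, assuming the display side used Windows-1252 (an assumption; the post does not say which encoding was involved). The helper name `mojibake` is hypothetical. Note that in this sketch the leading character comes out as an A with a tilde rather than two dots, so it may or may not match exactly what was in the database.

```python
def mojibake(text: str) -> str:
    """Encode text as UTF-8, then misread those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


# Each accented letter becomes two characters: an accented capital A
# followed by a symbol such as the copyright sign, a superscript 3,
# or the Euro sign.
for ch in "éóÀ":
    print(ch, "->", mojibake(ch))
```

If the garbage in a table looks like these pairs, the stored bytes are probably UTF-8 that some layer is treating as Latin-1 or Windows-1252.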
After struggling for a long time, I eventually wrote a PHP script to traverse the database, all tables and columns, and change these screwy chars back to their equivalents, which I had to work out by matching against similar records in my old DB. I also upgraded to MySQL 4.0.2 because I thought this might help.
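The cleanup script itself is not shown in this thread. As an alternative to a hand-built lookup table, here is a minimal Python sketch that reverses the damage mechanically, assuming the corruption really was UTF-8 bytes misread as Windows-1252 (an assumption, not something the post confirms). The function name `repair` is hypothetical.

```python
def repair(text: str) -> str:
    """Undo a UTF-8-read-as-Windows-1252 round trip where possible.

    Strings that cannot be reversed this way (for example, text that was
    never corrupted) are returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(repair("cafÃ©"))  # corrupted input
print(repair("café"))   # already-correct input is left alone
```

Falling back to the original string on error makes the function safe to run over a whole column, since correctly stored accented text is generally not valid UTF-8 once re-encoded and so is left untouched.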
Now everything seems to be pretty much okay, but I'm still shaking from the experience. I have no idea how it happened or whether it will happen again. I have MySQL 4.0.2 with latin1, and Apache with ISO-8859-1 as the default charset.
Regarding what happened, any ideas? Regarding the future, how do I set up a purely Unicode environment, and how do I convert my old data to it?
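On the conversion question, the byte-level step can be sketched in a few lines of Python. This is only a sketch, assuming the old data is strictly ISO-8859-1 and has not already been converted; the function name `latin1_to_utf8` and the sample string are hypothetical. The database and web server would still need to be configured for UTF-8 separately, which this does not cover.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret ISO-8859-1 bytes as text and re-encode them as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


sample = "Café".encode("latin-1")   # one byte for the accented letter
converted = latin1_to_utf8(sample)  # two bytes for it after conversion
print(len(sample), len(converted))
```

Running the conversion a second time on already-converted data would double-encode it and produce exactly the kind of two-character garbage described earlier, so it is worth converting once and from a known-good dump.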
Thanks,
Jacob
Steve wrote:
I know, I know, you've had this a million times. But I have Googled on this and not come up with anything that really matches my problem, so I need some advice about refining my search. Here's the situation:
I have a site using shared hosting which is running Apache 1.3.27, PHP 4.1.2. As this is a site about France, accented characters are used a lot, but have never been a problem. Some such characters are entered via HTML forms on the site, others are in MySQL databases where they have been entered on my local system via MySQLcc. The characters are all 'proper' characters, i.e., they are not stored as HTML entities.
And that's all working fine. But...
I'm revamping the site, and on my local system (Apache 2, PHP 4.3.4), where I'm doing the development, the exact same databases, viewed in the exact same browser (Firefox, FWIW), have problems with accented chars, which are shown as a jumble of 2-3 chars.
Any ideas why there might be this discrepancy? Could it have anything to do with the way PHP is installed on the two systems?
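One way to narrow down a discrepancy like this is to look at the raw bytes coming out of the database and check which encoding they are actually in. Here is a minimal Python sketch, assuming you can get at the raw bytes of a field; the function name `looks_like_utf8` is hypothetical, and this is a heuristic rather than a definitive test.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data has non-ASCII bytes that form valid UTF-8."""
    if data.isascii():
        return False  # pure ASCII says nothing about the encoding
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(looks_like_utf8("é".encode("utf-8")))    # two-byte sequence
print(looks_like_utf8("é".encode("latin-1")))  # single byte 0xE9
```

If the stored bytes turn out to be UTF-8 while the page is served as ISO-8859-1 (or the reverse), the mismatch would show up as the multi-character jumble described above.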
And what is the general recommendation about storing accented characters in text fields on MySQL DBs? Convert to htmlentities during the saving? Problem with that is that I might need the same databases for generating email mail-outs where I'm not using HTML...
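To illustrate the trade-off with entities, here is a minimal Python sketch (not PHP, and not the behaviour of PHP's `htmlentities`, which emits named entities such as `&eacute;`). It stores text as numeric character references and shows that a plain-text consumer such as an email mail-out would have to decode them again. The function names `to_entities` and `from_entities` are hypothetical.

```python
import html


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_entities(text: str) -> str:
    """Decode named and numeric character references back to characters."""
    return html.unescape(text)


stored = to_entities("café")
print(stored)                 # what an HTML page could use directly
print(from_entities(stored))  # what a plain-text email would need
```

Storing the real characters and converting to entities only at HTML output time avoids the extra decoding step for non-HTML uses.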
This is a problem I thought I'd solved ages ago, so my head's in a bit of a spin. Any advice is most welcome.
-- PHP Internationalization Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php