I don't really have a good understanding of issues around character sets, encoding, what have you, though I am starting to work on it.
My problem involves a MySQL database and accented characters such as those you find in Spanish and French. My web server sends a "content-type: text/html; charset=iso-8859-1" header and my docs have an equivalent meta tag. My mysql's config says

    default-character-set = latin1
    character_set_server = latin1
    collation_server = latin1_general_ci

and my data tables "SHOW CREATE" typically look like

    CREATE TABLE `people` (
      `id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
      `lastname` varchar(40) COLLATE latin1_general_ci NOT NULL,
      `firstname` varchar(40) COLLATE latin1_general_ci NOT NULL,
      /* etc */
    ) ENGINE=MyISAM AUTO_INCREMENT=546 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci

So what's the problem? Generally there is none. Characters like ó and ñ render correctly. The snag I am hitting now is writing a regular expression to whitelist the characters I can accept in proper names. I would think that the regex

    /^[-a-zA-Z\xC0-\xFF ']+$/

would test for anything that isn't a "letter" in most western european languages, or a space, or an apostrophe. But it is returning true (meaning yes there is an illegal character) in the name Barceló, where false is what I would like to hear.

Would this regex work if the data were utf-8? Should I consider converting everything and working in utf-8, and if so, how painful is it to convert a MySQL database? My initial research suggests that it isn't painless.

--
Support real health care reform: http://phimg.org/
--
David Mintz
http://davidmintz.org/
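
For what it is worth, one possible explanation (an assumption, not something confirmed above) is that the name reaches the regex as UTF-8 bytes rather than latin1 bytes, for example because the form was submitted in a different encoding than the page declares. The sketch below is in Python rather than PHP purely for illustration. It applies the same character class to raw bytes, which is roughly how a byte-oriented regex engine sees the input. The helper name is_clean is hypothetical.

```python
import re

# Same character class as the regex in the post, compiled as a bytes
# pattern so that \xC0-\xFF means "one byte in the range 0xC0..0xFF".
NAME_RE = re.compile(rb"^[-a-zA-Z\xC0-\xFF ']+$")


def is_clean(raw: bytes) -> bool:
    """Return True when every byte of raw is in the whitelist (hypothetical helper)."""
    return NAME_RE.match(raw) is not None


name = "Barceló"
latin1_bytes = name.encode("latin-1")  # ó becomes the single byte 0xF3
utf8_bytes = name.encode("utf-8")      # ó becomes the two bytes 0xC3 0xB3

print(latin1_bytes, is_clean(latin1_bytes))  # b'Barcel\xf3' True
print(utf8_bytes, is_clean(utf8_bytes))      # b'Barcel\xc3\xb3' False
```

In latin1 the ó is one byte, 0xF3, which falls inside \xC0-\xFF. In UTF-8 it is two bytes, and the second one (0xB3) falls outside that range, so the same pattern rejects the same name. Dumping the raw bytes of the failing input (bin2hex() in PHP) would show which case applies.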
_______________________________________________
New York PHP Users Group Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

http://www.nyphp.org/Show-Participation