On 10/19/07 2:29 PM, "Michael B Allen" <[EMAIL PROTECTED]> wrote:
> On 10/19/07, Cliff Hirsch <[EMAIL PROTECTED]> wrote:
>>
>> There was recently a thread about some character set problem. I just found
>> a similar issue. I just transferred a site from a Windows XP dev. platform
>> to RHEL. Everything looks fine except for a few special characters.
>>
>> Windows -> RHEL
>> it's -> it?s
>> -> ? (should be the long dash, an em dash I think)
>> 'blahblah' -> ?blahblah?
>> " -> ?
>
> Hey Cliff,
>
> That's actually not a character encoding issue. The '?' or an empty
> box is commonly displayed whenever a glyph associated with a character
> value is not available, meaning the client doesn't have the necessary
> font. It also means that whatever editor was used to input those single
> quotes didn't input the more common ASCII single quote character value
> of 0x27. If you hexdump that content you'll see it's something else
> (it will probably be a multibyte UTF-8 sequence which, when decoded,
> will give you a Unicode value that you can look up in Adobe's glyph
> tables).
>
> This is the sort of thing that happens when you create some content
> with a word processor and then copy and paste it into the web page.
>
> The way to fix this problem is to just seek and destroy all of those
> characters and replace them with their more common equivalent values
> (e.g. the single quote 0x27 ASCII value).
>
> Or you could install whatever wacked out font has that character
> on every client that will ever visit the page, but that's probably not
> the more desirable solution.
>
>> In phpMyAdmin I see: can't
>> In my app, I see: can?t
>> So phpMyAdmin is displaying things correctly on either platform.
>
> That's odd. Maybe phpMyAdmin is doing some transliteration.
>
>> Where should I start looking? What is the best charset to use anyway?
>> ISO-8859-1 or UTF-8?
>
> Look at the page with hexdump to verify what the encoding is and
> what the Unicode value of one of the errant characters really is. Then
> you can start to figure out where things went wrong.
>
> Mike

Mike:

Thanks. This is helpful. Here's another interesting puzzle. Why does the
page info in Firefox say encoding: UTF-8 while the Content-Type is
charset=iso-8859-1?

Ah, I think I see it. The encoding is how the page was saved. And as usual,
Microsoft butchers everything. But this is PHP -- the page is dynamically
generated. So is the encoding picked up from my PHP script, index.php, or
the template file index.tpl?

_______________________________________________
New York PHP Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk
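
For anyone following the thread, here is a rough sketch of the two steps Mike describes: look at the actual bytes behind the errant characters, then replace them with plain ASCII equivalents. It is written in Python purely for illustration and nothing in it comes from Cliff's site. The function names and the replacement table are my own assumptions, and the table only covers a handful of characters a word processor typically inserts. The last line of the usage section shows one possible way a literal '?' can appear: U+2019 has no representation in ISO-8859-1, so a lossy conversion to that charset substitutes '?'. Whether that is what happened here is not something the thread establishes.

```python
# Hypothetical helper table: typographic characters mapped to ASCII.
# Illustrative only, not an exhaustive list.
ASCII_EQUIVALENTS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
}


def describe_non_ascii(text):
    """List each non-ASCII character with its code point and UTF-8 bytes in hex."""
    report = []
    for ch in text:
        if ord(ch) > 127:
            utf8_hex = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
            report.append((ch, f"U+{ord(ch):04X}", utf8_hex))
    return report


def to_ascii_punctuation(text):
    """Replace the characters in ASCII_EQUIVALENTS with their ASCII stand-ins."""
    return text.translate(str.maketrans(ASCII_EQUIVALENTS))


if __name__ == "__main__":
    sample = "it\u2019s"  # "it's" typed with a right single quotation mark
    print(describe_non_ascii(sample))
    print(to_ascii_punctuation(sample))
    # A lossy conversion to ISO-8859-1 replaces the unrepresentable character.
    print(sample.encode("iso-8859-1", errors="replace"))
```

Running the inspection step against the page source (rather than a hard-coded sample) would show whether the stored bytes are UTF-8 sequences, single Windows-1252 bytes, or already a literal 0x3F question mark, which narrows down where the conversion went wrong.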