I have a question about an odd phenomenon. It doesn't have much to do with
PHP except that I used strtr to solve it, and it may be that the problem is
being caused by a setting in PHP, but I would like to get some more
background info as to why this is happening.

 

On a typical Windows system, most applications use the windows-1252
character set, which is an 8-bit set (one byte per character). Linux
typically uses Unicode encoded as UTF-8, which is variable-width: one to
four bytes per character.

 

Well, I have a form on a website that has to be able to take in text from
MS Word, Notepad and the like. If someone has been using "Autoformatting"
in MS Word, the "special characters" (curly quotes, dashes and so on) get
translated into a UTF-8 equivalent. What's odd is that these 8-bit Windows
characters become 24-bit combinations, I think. When I look at the
characters in hex, each one is represented by three bytes, the first one
always being 0xE2.
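
To show what I mean, here is a minimal sketch that dumps the bytes in hex.
The sample characters are my own picks (the punctuation Word's
autoformatting usually produces), written out as raw UTF-8 byte escapes
rather than taken from the real form data, so treat the list as an
assumption:

```php
<?php
// Sample "smart" punctuation, written as raw UTF-8 bytes.
// These are illustrative picks, not the actual form input.
$samples = [
    'left single quote'  => "\xE2\x80\x98", // U+2018
    'right single quote' => "\xE2\x80\x99", // U+2019
    'left double quote'  => "\xE2\x80\x9C", // U+201C
    'right double quote' => "\xE2\x80\x9D", // U+201D
    'en dash'            => "\xE2\x80\x93", // U+2013
    'em dash'            => "\xE2\x80\x94", // U+2014
    'ellipsis'           => "\xE2\x80\xA6", // U+2026
];

foreach ($samples as $name => $bytes) {
    // strlen() counts bytes, not characters, so each entry reports 3.
    printf("%-20s %s (%d bytes)\n", $name, strtoupper(bin2hex($bytes)), strlen($bytes));
}
```

Every entry comes out as three bytes, and each one starts with E2.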

 

Why is there an 0xE2 beginning the character combination and why does PHP
translate these characters this way? Is there something you can do to
minimize them besides writing some kind of character scrubber?
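
For reference, the kind of scrubber I mean looks roughly like the sketch
below. The function name and the replacement table are illustrative
assumptions, not my exact code; the only real piece is the use of strtr
with an array of replacements:

```php
<?php
// Hypothetical helper (the name is illustrative): replace common UTF-8
// "smart" punctuation with plain ASCII equivalents.
function scrub_smart_punctuation(string $text): string
{
    // Assumed mapping; extend it with whatever characters show up in practice.
    $map = [
        "\xE2\x80\x98" => "'",   // U+2018 left single quote
        "\xE2\x80\x99" => "'",   // U+2019 right single quote
        "\xE2\x80\x9C" => '"',   // U+201C left double quote
        "\xE2\x80\x9D" => '"',   // U+201D right double quote
        "\xE2\x80\x93" => '-',   // U+2013 en dash
        "\xE2\x80\x94" => '--',  // U+2014 em dash
        "\xE2\x80\xA6" => '...', // U+2026 ellipsis
    ];

    // With an array argument, strtr tries the longest keys first and does
    // not re-scan text it has already replaced.
    return strtr($text, $map);
}

echo scrub_smart_punctuation("\xE2\x80\x9CHello\xE2\x80\x9D it\xE2\x80\x99s fine\xE2\x80\xA6"), "\n";
// prints: "Hello" it's fine...
```

This works byte-for-byte, so it needs no multibyte extension, but it only
catches the sequences listed in the table.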

 

Thanks,

Carl

 
