On Wednesday 11 May 2005 07:43, Carl Furst wrote:
> I have a question about an odd phenomenon. It doesn't have much to do with
> PHP, except that I used strtr to solve it, and it may be that the problem is
> caused by a PHP setting, but I would like some background on why this is
> happening.
>
> On a typical Windows system, most applications use the windows-1252
> character set, while Linux typically uses UTF-8, an encoding of Unicode.
> windows-1252 is a single-byte (8-bit) set; UTF-8 is variable-width and
> uses one to four bytes per character.
>
> I have a form on a website that has to accept text from MS Word, Notepad
> and the like. If someone has been using AutoFormat in MS Word, the "special
> characters" (curly quotes, dashes and so on) get translated into their UTF-8
> equivalents. What's odd is that these 8-bit Windows characters seem to
> become 24-bit sequences: when I look at them in hex, each is three bytes
> long, and the first byte is always 0xE2.
>
> Why does each of these sequences begin with 0xE2, and why does PHP
> translate the characters this way? Is there anything I can do to minimize
> them besides writing some kind of character scrubber?

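If you check the Unicode code charts (http://www.unicode.org/charts/), you
will see that these characters are not in the Basic Latin block but in
General Punctuation (U+2000 to U+206F). UTF-8 encodes every code point from
U+0800 to U+FFFF as three bytes, and for U+2000 to U+2FFF the first of those
bytes is always 0xE2.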
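To see where the 0xE2 comes from, here is a small Python sketch (Python only because it is quick to run; the sample characters are my assumption about what AutoFormat typically produces). It prints the windows-1252 byte and the UTF-8 bytes for some common "smart" punctuation, and checks the lead byte against the three-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx, where the xxxx in the first byte are the top four bits of the code point:

```python
# Sample of typographic characters that Word's AutoFormat commonly inserts
# (an assumed, non-exhaustive set).
SMART_CHARS = {
    "\u2018": "left single quote",
    "\u2019": "right single quote",
    "\u201c": "left double quote",
    "\u201d": "right double quote",
    "\u2013": "en dash",
    "\u2014": "em dash",
    "\u2026": "ellipsis",
}


def utf8_lead_byte(code_point: int) -> int:
    """Compute the first byte of a three-byte UTF-8 sequence by hand.

    Three-byte form: 1110xxxx 10xxxxxx 10xxxxxx. The four x bits of the
    lead byte are the top four bits of the 16-bit code point.
    """
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("code point is not encoded with three UTF-8 bytes")
    return 0xE0 | (code_point >> 12)


def describe(ch: str) -> str:
    """Show a character's code point, its windows-1252 byte and its UTF-8 bytes."""
    cp = ord(ch)
    cp1252 = ch.encode("cp1252")  # single byte in windows-1252
    utf8 = ch.encode("utf-8")     # three bytes in UTF-8
    return (f"U+{cp:04X} {SMART_CHARS[ch]:<18} "
            f"cp1252={cp1252.hex(' ').upper()}  utf-8={utf8.hex(' ').upper()}")


if __name__ == "__main__":
    for ch in SMART_CHARS:
        # The hand-computed lead byte matches Python's encoder, and it is 0xE2.
        assert utf8_lead_byte(ord(ch)) == ch.encode("utf-8")[0] == 0xE2
        print(describe(ch))
```

For example, U+2019 (right single quote) is the single byte 92 in windows-1252 but E2 80 99 in UTF-8. Every code point from U+2000 to U+2FFF has 0x2 as its top four bits, which is why the lead byte comes out as 0xE2 each time.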

> Thanks,
>
> Carl
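As for the scrubber: if you do end up mapping these characters to plain ASCII, here is a rough Python analogue of the strtr approach. It assumes the submitted form data is UTF-8 (worth confirming for your own form), and the replacement table is a hypothetical choice of mine, not anything standard:

```python
# Hypothetical table mapping typographic punctuation to ASCII stand-ins.
# str.maketrans accepts single-character keys and replacement strings of any length.
ASCII_FALLBACKS = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def scrub(raw: bytes) -> str:
    """Decode UTF-8 form input and replace 'smart' punctuation with ASCII.

    Raises UnicodeDecodeError if the input is not valid UTF-8.
    """
    text = raw.decode("utf-8")
    return text.translate(ASCII_FALLBACKS)


if __name__ == "__main__":
    sample = "It\u2019s \u201cdone\u201d \u2014 ok\u2026".encode("utf-8")
    print(scrub(sample))
```

Decoding first and translating whole characters avoids matching on byte fragments. With PHP's strtr on a raw byte string you would instead use the full three-byte sequences, such as "\xE2\x80\x99", as the keys of the replacement array.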

-- 

Cyberly yours,
Petar Nedyalkov
Devoted Orbitel Fan :-)

PGP ID: 7AE45436
PGP Public Key: http://bu.orbitel.bg/pgp/bu.asc
PGP Fingerprint: 7923 8D52 B145 02E8 6F63 8BDA 2D3F 7C0B 7AE4 5436
