Re: utf-8 != latin-1

George Zeigler Sat, 14 Oct 2000 05:23:16 -0700
Hello,

I didn't get it.  So what happens if a company has a job site in Unicode,
and people copy resume text from a Word document written in ISO 8859-1
and paste it into a text box in the browser?  Is the character set
converted correctly automatically, or does the user need to run a character set
converter like Recode?

Thanks
George 
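
To make the question concrete, here is a minimal Python sketch of what "converting" means at the byte level. It does not say what any particular browser does with pasted text; the sample string and the helper name latin1_to_utf8 are made up for illustration. It only shows that the same character is stored as different bytes in the two encodings, so something has to transcode.

```python
# Hypothetical sample: "lower", a soft hyphen (U+00AD), then "case".
text = "lower\u00adcase"

latin1_bytes = text.encode("iso-8859-1")  # soft hyphen is the single byte AD
utf8_bytes = text.encode("utf-8")         # soft hyphen is the two bytes C2 AD


def latin1_to_utf8(raw: bytes) -> bytes:
    """Transcode ISO 8859-1 bytes to UTF-8: decode first, then re-encode."""
    return raw.decode("iso-8859-1").encode("utf-8")


print(latin1_bytes.hex())  # hex dump of the ISO 8859-1 form
print(utf8_bytes.hex())    # hex dump of the UTF-8 form
print(latin1_to_utf8(latin1_bytes) == utf8_bytes)
```

If the ISO 8859-1 bytes are stored unchanged in a file that is later read as UTF-8, the lone AD byte is not a valid UTF-8 sequence.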

On Sat, 14 Oct 2000, you wrote:
> Here's a gotcha story ..
> 
> Someone was working on documentation files in XML.  The PDF generator
> all of a sudden started choking, complaining that there was "Illegal
> character U+DC73" somewhere in the late stages of PDF generation. Well,
> the low surrogate certainly didn't belong there. Software bug? Memory
> corruption?
> 
> I converted the 1.1 MB intermediate file into literal \uXXXX notation and
> searched for DC73. Sure enough, there was lower\uE54E\uDC73e (U+E54E, a
> PUA character, and U+DC73) in place of what was "lower-case" in the source
> text. Definitely memory corruption.. But wait..
> On a hunch, I deleted the hyphen and retyped it, and that somehow fixed it.
> I was told that the text "lower-case" was copied from another document.
> 
> Further inspection showed that the offending hyphen was actually \xAD,
> "soft hyphen".  Since the XML document had no encoding declaration, it
> defaults to ..... UTF-8!  What happened was that the byte sequence
> AD 63 61 73 was interpreted as U+E54E U+DC73..
> 
> So moral: BE CAREFUL when you are pasting text into utf-8 documents..
> 
> -steven
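
The two odd values in the story can be reproduced with a little arithmetic. The sketch below is a guess at the mechanism, not something stated in the message: it assumes a non-validating decoder that treats AD as the lead byte of a 4-byte sequence, keeps its low four bits (the 0x0F mask is an assumption, chosen because it reproduces the reported values), never checks that the next three bytes are continuation bytes, and then splits the out-of-range result into a UTF-16 surrogate pair. The function names are hypothetical.

```python
from typing import Tuple

raw = bytes([0xAD, 0x63, 0x61, 0x73])  # Latin-1 soft hyphen, then "cas"


def lenient_four_byte_decode(b: bytes) -> Tuple[int, int]:
    """Hypothetical non-validating decoder: treat b[0] as a 4-byte lead byte."""
    code_point = (
        ((b[0] & 0x0F) << 18)    # assumed mask; a strict decoder uses 0x07
        | ((b[1] & 0x3F) << 12)  # no check that these are 10xxxxxx bytes
        | ((b[2] & 0x3F) << 6)
        | (b[3] & 0x3F)
    )
    # Split into UTF-16 surrogates without checking code_point <= 0x10FFFF.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return high, low


def is_valid_utf8(b: bytes) -> bool:
    """Strict check: Python's UTF-8 decoder rejects a stray AD byte."""
    try:
        b.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


high, low = lenient_four_byte_decode(raw)
print("U+%04X U+%04X" % (high, low))     # U+E54E U+DC73
print(is_valid_utf8(b"lower\xadcase"))   # False
```

Under those assumptions the four bytes collapse into exactly the pair from the story, and the trailing "e" is left over, which matches lower\uE54E\uDC73e. A strict decoder refuses the input instead, so validating pasted text before it goes into a UTF-8 document catches the problem early.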