Even in Windows 2000, where Notepad supports UTF-8 and will try to auto-detect
it, a BOM-less file will not be assumed to be UTF-8 if there is no reason it
could not be represented as a non-Unicode text file using the default system
code page.
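A minimal sketch of this kind of BOM-based detection (the function name and the cp1252 fallback are my assumptions for illustration, not Notepad's actual algorithm, which also applies heuristics to BOM-less content):

```python
import codecs

def guess_encoding(raw: bytes, fallback: str = "cp1252") -> str:
    """Guess an encoding from a leading BOM only; otherwise fall back
    to the (assumed) default system code page."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"      # UTF-8 with BOM
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return fallback             # BOM-less: assume the system code page

print(guess_encoding(codecs.BOM_UTF8 + b"hello"))  # utf-8-sig
print(guess_encoding(b"hello"))                    # cp1252
```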

If Notepad was being used and the file was saved with a BOM, then it would
have worked. I think a BOM is the answer. Certainly Notepad will never look at
the XML encoding declaration (it's not an XML parser, and technically a valid
XML parser does not have to respect the declaration, per the spec).
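Writing the BOM is straightforward; a sketch in Python (just for illustration, with a hypothetical filename), where the `utf-8-sig` codec prepends the BOM automatically:

```python
# Save an XML file as UTF-8 with a BOM so BOM-aware editors such as
# Notepad can detect the encoding; 'utf-8-sig' writes the BOM for us.
text = '<?xml version="1.0" encoding="UTF-8"?>\n<r\u00e9sum\u00e9/>\n'

with open("resume.xml", "w", encoding="utf-8-sig") as f:
    f.write(text)

# The first three bytes on disk are the UTF-8 BOM, EF BB BF.
with open("resume.xml", "rb") as f:
    raw = f.read()
print(raw[:3])  # b'\xef\xbb\xbf'
```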

michka

a new book on internationalization in VB at
http://www.i18nWithVB.com/

----- Original Message -----
From: "Steven R. Loomis" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Saturday, October 14, 2000 2:48 PM
Subject: Re: utf-8 != latin-1


> Doug Ewell wrote:
> > Why?  As an illegal UTF-8 sequence, it shouldn't be interpreted as
> > anything.
>
>  It wasn't interpreted as anything. It halted processing at that point
> in the text, as an error.
>
> George Zeigler wrote:
> >       I didn't get it.  So what happens if a company had a job site in
> > Unicode, and people were copying resume text from Word written in
> > ISO 8859-1 and pasting into a text window in the browser?  Does the
> > character set automatically convert correctly?  Or does the user need
> > to use a character set converter like Recode?
>
>  It was pasted into Windows Notepad or some other editor editing an XML
> file. XML files unless otherwise tagged are UTF-8, but the editor
> thought it was something like Windows-1252. So, the right thing to do
> *might* be to tag the file as being 'windows-1252'.  A better solution
> would be to use UTF-8 aware editors only.
>
>  My point is that it was hard to tell visually whether the data being
> copied was a 'safe' subset of both utf-8 and windows-1252 [such as
> ASCII].
>
> -s
>
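Steven's point about a "safe" subset can be checked mechanically: pure ASCII bytes mean the same thing in both UTF-8 and windows-1252, while a windows-1252 "é" (byte 0xE9) is an illegal UTF-8 sequence that halts decoding, exactly as he describes. A quick sketch (the helper name is mine):

```python
def safe_in_both(raw: bytes) -> bool:
    """True if the bytes decode identically under UTF-8 and
    windows-1252; ASCII (0x00-0x7F) is the common subset."""
    return all(b < 0x80 for b in raw)

ascii_text = "plain resume text".encode("cp1252")
latin_text = "r\u00e9sum\u00e9".encode("cp1252")  # é -> single byte 0xE9

print(safe_in_both(ascii_text))  # True
print(safe_in_both(latin_text))  # False

# The unsafe bytes are also an illegal UTF-8 sequence:
try:
    latin_text.decode("utf-8")
except UnicodeDecodeError as e:
    print("decode halted at byte", e.start)
```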
