Scott Eade wrote:
Okay, I'll answer my own question:
1. The character /u2019 will not be converted to a character reference when
UTF-8 is used (it will use two bytes and will not be displayed correctly in applications that do not correctly deal with UTF-8 - e.g. Windows notepad).
Correct, except that U+2019 takes three bytes in UTF-8 (E2 80 99), not two.
Notepad _can_ display Unicode characters from files that have been saved as UTF-8, as long as the font you use in Notepad has glyphs for those characters. At work, we have lots of files containing Chinese characters that are saved as UTF-8, and I use the SimSun or SimHei font to view those files, including XML files in UTF-8.
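If you want to check the byte count and the character-reference form of U+2019 yourself, here is a minimal sketch. Python is my choice for illustration, not something from the thread, and the variable names are made up:

```python
# U+2019 is RIGHT SINGLE QUOTATION MARK (the "smart" apostrophe).
ch = "\u2019"

# Encoded as UTF-8, the character is written out directly as raw bytes.
utf8_bytes = ch.encode("utf-8")
utf8_hex = " ".join(f"{b:02x}" for b in utf8_bytes)
print(utf8_hex)          # e2 80 99

# With an encoding that cannot represent the character (ASCII here),
# a serializer has to fall back to a numeric character reference.
ascii_with_ref = ch.encode("ascii", errors="xmlcharrefreplace")
print(ascii_with_ref)    # b'&#8217;'
```

So whether you see the raw character or `&#8217;` in the output depends on whether the target encoding can represent the character at all.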
When you do a "Save As", you have the option to save a file as UTF-8 (and UTF-16, I think). Notepad also puts a BOM (Byte Order Mark) at the front of the file. You can see this BOM in a hex editor.
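If you don't have a hex editor handy, the BOM is easy to spot programmatically. A rough Python sketch follows; the sample text is invented, and I use the `utf-8-sig` codec only to mimic a file saved with a leading BOM:

```python
import codecs

# Simulate a file saved as UTF-8 with a leading BOM.
data = "caf\u00e9".encode("utf-8-sig")

# The UTF-8 BOM is the three bytes EF BB BF.
has_bom = data.startswith(codecs.BOM_UTF8)
print(has_bom)                      # True
print(data[:3].hex())               # efbbbf

# Decoding with "utf-8-sig" strips the BOM if present; plain "utf-8"
# would leave U+FEFF as the first character of the text.
text = data.decode("utf-8-sig")
text_with_bom_char = data.decode("utf-8")
```

Tools that read the file as plain UTF-8 without expecting a BOM may show that stray U+FEFF at the start.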
2. In the cases where character references are used an editing component is causing them to be encoded - the component is not being used in the places where the characters are not encoded.
3. Windows file encodings are a PITA.
The default is called windows-1252 in most cases at least (will be different of course for someone running Windows Thai).
It's _not_ the same as iso-8859-1. You can think of windows-1252 as a superset of iso-8859-1.
http://czyborra.com/charsets/iso8859.html
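To see exactly where the two encodings part ways, you can compare them byte by byte. This is just a sketch using Python's built-in `cp1252` and `latin-1` codecs (Python's names for windows-1252 and iso-8859-1); bytes that Python's `cp1252` leaves undefined are decoded with `errors="replace"` so the loop does not raise:

```python
# Collect every byte value that the two codecs decode differently.
differing = [
    b for b in range(256)
    if bytes([b]).decode("cp1252", errors="replace")
    != bytes([b]).decode("latin-1")
]

# All differences fall in the range 0x80-0x9F, which iso-8859-1 maps to
# C1 control characters and windows-1252 mostly fills with printable ones.
print(hex(differing[0]), hex(differing[-1]), len(differing))

# Byte 0x92 is the smart apostrophe in windows-1252 ...
as_cp1252 = b"\x92".decode("cp1252")    # '\u2019'
# ... but an invisible control character in iso-8859-1.
as_latin1 = b"\x92".decode("latin-1")   # '\x92'
```

Everything outside that 0x80-0x9F block decodes identically, which is why the mismatch so often goes unnoticed.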
On some websites, what were supposed to be "smart quote" characters appear as question marks or as some other funny character in your non-IE browser.
It turns out that the HTTP header for the webpage advertised iso-8859-1, but the file itself was encoded in windows-1252.
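That mismatch is easy to reproduce. A small sketch, assuming a made-up page body and treating the declared charset as a plain string rather than a real HTTP response:

```python
# The page author saved the text as windows-1252 ...
original = "\u201cIt\u2019s fine\u201d"
body = original.encode("cp1252")

# ... but the server declares iso-8859-1, and a strict client believes it.
declared_charset = "iso-8859-1"
wrong = body.decode(declared_charset)

# The smart quotes are now C1 control characters with no glyph, so the
# browser shows a question mark, a box, or nothing at all.
print(wrong.isprintable())    # False

# Decoding with the encoding that was actually used recovers the text.
right = body.decode("cp1252")
print(right == original)      # True
```

Browsers that quietly treat iso-8859-1 as windows-1252 hide the problem, which would explain why the page looks fine in some browsers and broken in others.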
4. I know more now than I did before.
Sorry for the noise.
Scott
