Munzir Taha wrote:
> I opened notepad, wrote arabic, and saved the file as filename.htm with
> Encoding UTF-8.

note that this requires windows 2000. windows nt notepad can save only in the system 
codepage and in utf-16le. win9x notepad does not support unicode.

> Opening the page, I found that view -> Encoding shows
> Unicode (UTF-8) with auto-select enabled. My question is where this info
> lies - in my box?

the info is in the file itself. notepad always saves unicode-encoded files with the 
appropriate signature byte sequence, like most other microsoft apps and many other 
well-behaved applications.

the signature is the first 2 to 4 bytes of the text file. it is U+feff encoded in the 
particular encoding scheme, which gives the following byte sequences:

utf-8:      ef bb bf
utf-16be:   fe ff
utf-16le:   ff fe
utf-32be:   00 00 fe ff
utf-32le:   ff fe 00 00 (check before utf-16le!)
scsu:       0e fe ff (unfortunately rather rarely used)
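
as a rough illustration of how a reader of the file could use that table, here is a 
minimal python sketch. the function name sniff_signature and the returned labels are 
made up for this example; it is not how notepad or any browser actually does it. it 
only shows the ordering issue noted above: utf-32le has to be tested before utf-16le 
because its signature starts with the same two bytes.

```python
# signature table from the message, ordered so that longer signatures
# sharing a prefix are tested first (utf-32le before utf-16le).
SIGNATURES = [
    (bytes.fromhex("00 00 fe ff"), "utf-32be"),
    (bytes.fromhex("ff fe 00 00"), "utf-32le"),
    (bytes.fromhex("ef bb bf"), "utf-8"),
    (bytes.fromhex("fe ff"), "utf-16be"),
    (bytes.fromhex("ff fe"), "utf-16le"),
    (bytes.fromhex("0e fe ff"), "scsu"),
]


def sniff_signature(data: bytes):
    """Return (encoding label, signature length), or (None, 0) if no signature."""
    for signature, label in SIGNATURES:
        if data.startswith(signature):
            return label, len(signature)
    return None, 0


if __name__ == "__main__":
    # python's "utf-8-sig" codec writes the ef bb bf signature, like notepad does
    sample = "\u0645\u0631\u062d\u0628\u0627".encode("utf-8-sig")
    print(sniff_signature(sample))
```

one caveat of this simple approach: a utf-16le file whose first character after the 
signature is U+0000 would be reported as utf-32le.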

> Suppose I publish the page, how can people know that I
> told notepad to save as Unicode ;-)

the best way for html is really the way michael described in his reply, with a meta 
tag. note that it is good practice, recommended by html 4.0 and required by xhtml 1.0, 
to write the elements and attributes in michael's html line in lowercase - xhtml and 
xml are case-sensitive.

for xml and xhtml, specify the encoding in the xml declaration. if a document has 
neither an encoding declaration nor a signature, an xml parser assumes utf-8.
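
a small python sketch of that behaviour, using the standard xml.etree.ElementTree 
parser. the two sample documents are invented for illustration: one declares 
iso-8859-1 in its xml declaration, the other has no declaration and is stored as utf-8.

```python
import xml.etree.ElementTree as ET

# same text, two different byte encodings
declared = '<?xml version="1.0" encoding="iso-8859-1"?><p>caf\u00e9</p>'.encode("iso-8859-1")
undeclared = "<p>caf\u00e9</p>".encode("utf-8")

# the parser reads the declared encoding from the xml declaration,
# and falls back to utf-8 when there is no declaration
text_declared = ET.fromstring(declared).text
text_undeclared = ET.fromstring(undeclared).text

print(text_declared == text_undeclared)
```

both documents should parse to the same string, even though their bytes differ.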

markus
