> What is notepad? A text editor? Text editors should not insert a UTF-8
> BOM either. The problem is that Microsoft sometimes invents
> non-standard things and then pushes it so hard that Unicode adds it to
> parts of the standard (or an FAQ). "Microsoft conventions for .txt
> files" in the Unicode FAQ looks sarcastic to me.

Well, maybe you're right, but I don't see how a text editor is supposed to
know the encoding of a file without some kind of mark.  HTTP carries the
character set in the Content-Type response header.  In HTML, it's
specified with a <meta http-equiv="Content-Type" ...> tag.  In XML, the
default encoding is UTF-8 (or UTF-16, which requires a BOM), and a document
in any other encoding must declare it in the <?xml ?> declaration.  Plain
text files have no such means of identifying the character encoding, so a
single text file can be interpreted as UTF-7, UTF-8, UTF-16, UTF-32, etc. if
there's nothing to declare the exact character encoding used.
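
To make the "some kind of mark" idea concrete, here is a rough Python sketch
of how an editor might look for a BOM at the start of a file.  The function
name sniff_bom and the table _BOMS are made up for illustration; only the
codecs.BOM_* constants come from the standard library.  This is not how any
particular editor does it, just one plausible approach.

```python
import codecs

# Known byte order marks, longest first. Order matters: the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE), so it must be tried first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length) if data starts with a known BOM.

    Returns (None, 0) when no BOM is present. In that case the encoding is
    simply unknown and the caller has to guess or ask the user.
    Caveat: UTF-16 LE text whose first character is U+0000 would be
    misreported as UTF-32 LE; this sketch accepts that rare ambiguity.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode data using its BOM, or a caller-chosen fallback (an assumption)."""
    name, skip = sniff_bom(data)
    return data[skip:].decode(name or fallback)
```

Without a BOM, sniff_bom has nothing to go on, which is exactly the situation
a plain text file is in.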

The point here is that the protocols which do not allow a BOM are the ones
that provide other means of specifying the character encoding.  A given byte
stream can have multiple interpretations depending on which character
encoding you use to interpret it, and there must be some way to resolve this
ambiguity.
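
As a small illustration of that ambiguity, the Python sketch below decodes
one and the same byte string under several encodings.  The sample word and
the helper name interpretations are my own choices for the example, and the
list of encodings is arbitrary; the point is only that every decode succeeds
and each gives different text.

```python
def interpretations(raw: bytes, encodings=("utf-8", "utf-16-le", "cp1256", "latin-1")):
    """Decode the same bytes under each encoding and collect the results.

    Undecodable sequences are replaced with U+FFFD rather than raising,
    so every encoding yields some string.
    """
    return {enc: raw.decode(enc, errors="replace") for enc in encodings}


# Sample input (an assumption for the demo): a Persian word saved as UTF-8
# with no BOM and no other declaration of its encoding.
raw = "سلام".encode("utf-8")

for enc, text in interpretations(raw).items():
    # ascii() keeps the output readable on any terminal.
    print(f"{enc:10} {len(text)} chars  {ascii(text)}")
```

Only the UTF-8 reading gives back the original word, but nothing in the
bytes themselves says which reading is the intended one.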

YMMV,
-------------
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]



_______________________________________________
PersianComputing mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/persiancomputing
