> What is notepad? A text editor? Text editors should not insert a UTF-8
> BOM either. The problem is that Microsoft sometimes invents
> non-standard things and then pushes it so hard that Unicode adds it to
> parts of the standard (or an FAQ). "Microsoft conventions for .txt
> files" in the Unicode FAQ looks sarcastic to me.

Well, maybe you're right, but I don't see how a text editor is supposed to know the encoding of a file without some kind of mark.

See, HTTP transfers the character set in the Content-Type response header. In HTML, it's specified with a <meta http-equiv="Content-Type" ...> tag. In XML, the default encoding is UTF-8, and a document in another encoding must say so in its <?xml ?> declaration. Plain text files have no means of identifying the character encoding, so a single text file can be interpreted as UTF-7, UTF-8, UTF-16, UTF-32, etc. if nothing declares the exact encoding used.

The point here is that the protocols which do not allow a BOM are the ones that provide some other means of specifying the character encoding. A given byte stream can have multiple interpretations depending on which character encoding you use to read it, and there must be some way to resolve that ambiguity.

YMMV,
-------------
Ehsan Akhgari
Farda Technology (http://www.farda-tech.com/)
List Owner: [EMAIL PROTECTED]
[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]

_______________________________________________
PersianComputing mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/persiancomputing
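
P.S. To make the ambiguity concrete, here is a rough Python sketch, not how any particular editor actually works. It shows the same bytes decoding cleanly under two different encodings, and how a leading BOM removes the guesswork. The helper name sniff_bom and the sample word are my own, purely for illustration; the BOM constants come from Python's standard codecs module.

```python
import codecs

# BOM signatures. UTF-32 LE must be checked before UTF-16 LE, because
# the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Hypothetical helper: return (encoding, bom_length), or (None, 0)
    when the data does not start with a known BOM."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


# The Persian word "salaam", stored as UTF-8 with no BOM (8 bytes).
raw = "\u0633\u0644\u0627\u0645".encode("utf-8")

# Without a mark, both of these succeed, and nothing in the bytes
# themselves says which reading the author intended.
as_utf8 = raw.decode("utf-8")        # the intended 4 Persian letters
as_utf16 = raw.decode("utf-16-le")   # 4 unrelated characters, no error

# With a BOM in front, the encoding is declared by the file itself.
marked = codecs.BOM_UTF8 + raw
encoding, skip = sniff_bom(marked)
text = marked[skip:].decode(encoding)
```

When sniff_bom returns (None, 0), the reader is back to guessing or to an out-of-band declaration such as the HTTP header, which is exactly the situation a bare .txt file is in.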