RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

jon Tue, 09 Dec 2003 10:01:36 -0800

> > You might as well say that C code is not plain text because it too is
> > subject to special canons of interpretation.
> 
> C, C++ and Java source files are not plain text as well (they have their own


C, C++ and Java source files are plain text.

> "text/*" MIME type, which is NOT "text/plain" notably because of the rules

I've seen text/cpp and text/java, but really there are no such types. I've also 
seen text/x-source-code which is at least legal, if of little value to 
interoperability.

The correct MIME type for C and C++ source files is text/plain. I'd be prepared 
to give good odds that that is the case with Java source files as well.

> associated with end-of-lines, notably in presence of comments).

As source files (that is, at the stage in processing at which a human user can 
see the source and edit it) the only handling required for end-of-lines is 
conversion of new line function characters, the same as for any other use of 
plain text.
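
To illustrate, here is a minimal sketch of that kind of new line conversion in
Python. The function name and the set of characters treated as new line
functions (CRLF, CR, LF and NEL) are assumptions made for the example; a given
editor may recognise more or fewer.

```python
import re

# Characters treated as new line functions in this sketch: CRLF, CR, LF, NEL.
# The set is an assumption for illustration, not a statement of any standard.
_NLF = re.compile("\r\n|\r|\n|\u0085")


def normalize_newlines(text: str, target: str = "\n") -> str:
    """Replace every recognised new line function with `target`."""
    # A function is used as the replacement so `target` is inserted literally.
    return _NLF.sub(lambda _match: target, text)
```

Nothing in it depends on whether the text is C source, a shopping list or a
letter; the same conversion applies to any plain text.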

The treatment of end-of-lines as significant when processed (for example 
following one-line // comments) is a matter of what an application chooses to do
with a particular character. This is no different than an indexer deciding that 
a plain text file contains a particular word, or for that matter in my putting 
coffee filters into my basket if I see "coffee filters" written on my shopping 
list.
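
As a rough sketch of that point, the following Python function is one
hypothetical application that chooses to treat an end-of-line as the end of a
// comment. It is deliberately naive (it ignores string literals and /* */
comments) and the function name is invented for the example.

```python
def strip_line_comments(source: str) -> str:
    """Drop // comments, treating each end-of-line as the end of a comment.

    Naive on purpose: "//" inside a string literal is also treated as a
    comment start, and /* */ comments are left alone.
    """
    kept = []
    for line in source.split("\n"):
        # Everything before the first "//" is kept; the rest of the line goes.
        code, _sep, _comment = line.partition("//")
        kept.append(code.rstrip())
    return "\n".join(kept)
```

The input is the same plain text before and after; only this application's
reading of the line feed gives it extra significance.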

> > But both XML/HTML/SGML and the various programming languages are plain
> text.
> 
> See "text/xml", "text/html" and "text/sgml" MIME types. They also aren't
> "text/plain" so they have their own interpretation of Unicode characters
> which is not the one found in the Unicode standard.

They have their own interpretation of the Unicode characters which is *in 
addition to* the one found in the Unicode standard. So do all but the simplest 
applications that use Unicode (as interesting as many of them are, characters 
are of little use on their own).
