James Kass posted:

The advantage of using P14 tags (...equals lang IDs mark-up) is
that runs of text could be tagged *in a standard fashion* and
preserved in plain-text.
But this still would not necessarily handle orthographic variations.
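For readers unfamiliar with the mechanism, here is a minimal sketch of how a Plane 14 language tag is built: U+E0001 LANGUAGE TAG followed by tag characters in U+E0020..U+E007E, each mirroring an ASCII character. The function names (plane14_tag, tag_run, strip_tags) are made up for illustration and belong to no library.

```python
LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000    # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E


def plane14_tag(lang_id: str) -> str:
    """Encode a language ID such as 'de' as a Plane 14 tag sequence."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang_id):
        raise ValueError("language ID must be printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in lang_id)


def tag_run(lang_id: str, text: str) -> str:
    """Prefix a run of plain text with its language tag (hypothetical helper)."""
    return plane14_tag(lang_id) + text


def strip_tags(text: str) -> str:
    """Drop every character in the tag block U+E0000..U+E007F."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


tagged = tag_run("de", "Straße")
print([f"U+{ord(c):04X}" for c in tagged[:3]])
print(strip_tags(tagged))
```

Note that the tag carries only the language ID. Nothing in "de" says whether the run is meant to appear in Roman or Fraktur.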

See Peter Constable's discussion of language classification and orthographic classification at http://www.unicode.org/notes/tn8.

Currently, standard language tagging or orthographic tagging is logically no more than a kludge once it tries to go beyond obviously different languages that are unintelligible to users of other languages.

Which language tag protocol should Unicode adopt? Should it create its own? That last seems beyond the mandate of Unicode.

There are often conflicting orthographic usages within a language. Language tagging alone does not indicate whether German text is to be rendered in Roman or Fraktur, whether Gaelic text is to be rendered in Roman or Uncial, and if Uncial, a modern Uncial or more traditional Uncial, whether English text is in Roman or Morse Code or Braille.

Capital Eng is found in both pointed and rounded forms in Sami texts and printed names, so far as I have read.

The pointed Eng is more common.

Does that mean it is "preferred" or only that it happens to be the more common form in available fonts?

Perhaps the rounded Eng is actually "preferred" by most.

Perhaps most don't care at all, any more than they care whether the hook on a _J_ descends below the baseline, whether the descender on _g_ is open or closed, whether _a_ is rendered with an upper curl or not.

Certainly language tagging shouldn't be used to distinguish between such forms, unless specifically requested by organizations that can show that their request is supported by a very large proportion of the users of the language.

But even then, do not those who disagree have the right to dissent, to push their own desires in spelling or orthography?

Language tagging and orthography tagging are not all that is needed.

One sometimes *needs* to show emphasis. For example, in a database of books and articles one may need to catalogue titles like "Comments on the _Tao_Te_Ching_" (see http://www.friesian.com/taote.htm).

To be correct, the book title *must* be italicized, unless the article title appears in italicized text, in which case it should be non-italic to contrast.

Titles of articles in mathematics or chemistry may contain superscript and subscript characters beyond those hard-coded in Unicode.

These cannot be indexed in a database as plain text.

Plain text is not adequate for *so much* normal use. But who ever claimed it was? Plain text is only the underlying text, which is sometimes, alone, sufficient.

At the moment XML seems to be the mark-up protocol towards which most are moving, and there seems to be no point in duplicating its features in Unicode, unless Unicode can somehow do it better.
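As a rough illustration of that division of labour, the sketch below stores the article title from above as a small XML fragment and recovers both the underlying plain text and the italicized run using Python's standard xml.etree.ElementTree. The element names "title" and "i", and the helper names, are assumptions made up for this example. Only the xml:lang attribute comes from the XML specification itself.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical catalogue record: <title> and <i> are illustrative names only.
RECORD = (
    '<title xml:lang="en">Comments on the '
    '<i xml:lang="zh">Tao Te Ching</i></title>'
)


def plain_text(xml_string: str) -> str:
    """Return the underlying text with all mark-up removed."""
    return "".join(ET.fromstring(xml_string).itertext())


def italic_runs(xml_string: str) -> list:
    """Return (language, text) pairs for each italicized run."""
    root = ET.fromstring(xml_string)
    return [(el.get(XML_LANG), el.text) for el in root.iter("i")]


print(plain_text(RECORD))
print(italic_runs(RECORD))
```

The plain text remains available for indexing, while the emphasis and the language of each run travel in the mark-up instead of in the character stream.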

Jim Allan




