Re: Plane 14 Tag Deprecation Issue (was Re: VS vs. P14 (was Re: Indic Devanagari Query))
At 11:54 AM 2/6/03 -0800, Kenneth Whistler wrote:

> My personal opinion? The whole debate about deprecation of language tag characters is a frivolous distraction from other technical matters of greater import, and things would be just fine with the current state of the documentation. But, if formal deprecation by the UTC is what it would take to get people to stop advocating more use of the language tags after the UTC has long determined that their use is strongly discouraged, then so be it.

My personal opinion is that labelling them as "restricted for use with protocols requiring their use" is sufficient and proper.

In the context of such protocols, the use of tag characters is a fine mechanism. They certainly have some advantages over ASCII-style markup (e.g. lang=...) in many situations. Where they don't have a place is in regular 'plain' text streams.

Formal deprecation would imply to me that ANY use is discouraged, including the use with protocols that wish to make use of them. THAT seems to be going too far in this case.

Where we have deprecated format characters in the past it has been precisely in situations where we wanted to discourage the use of particular 'protocols', for example for shaping and national digit selection.

A./
Re: VS vs. P14 (was Re: Indic Devanagari Query)
John H. Jenkins wrote:

> Ah, but decorative motifs are not plain text.

Ah, but it could be.
Re: Plane 14 Tag Deprecation Issue (was Re: VS vs. P14 (was Re: Indic Devanagari Query))
As the matter was put forward for Public Review, I feel it is reasonable for someone reading of that review to respond on the basis of what the Public Review item itself states as the issue. Kenneth Whistler now states an opinion as to what the review is about and mentions a file, PropList.txt, of which I was previously unaware.

The discussions in this forum in the later part of 2002 about the possibilities of using language tags only started as a direct result of the Unicode Consortium instituting the Public Review.

The recent statement by Asmus Freytag seems fine to me. Certainly I might be inclined to add a little so as to produce "Plane 14 tags are reserved for use with particular protocols requiring, or providing facilities for, their use", so that the possibility of using them to add facilities, rather than simply using them when obligated to do so, is included. But that is not a great issue: what Asmus wrote is fine.

Public Review is, in my opinion, a valuable innovation. Two issues have so far been resolved using the Public Review process. Those results do seem to indicate the value of seeking opinions by Public Review.

As I have mentioned before, I have a particular interest in the use of Unicode in relation to the implementation of my telesoftware invention using the DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) system. I feel that language tags may potentially be very useful for broadcasts, by direct broadcast satellites across whole continents, of multimedia packages which include Unicode text files.

Someone on this list (I forget who, but I am grateful for the comment) mentioned that even if formal deprecation goes ahead, that does not stop the language tags being used, as once an item is in Unicode it is always there. So fine, though it would be nice if the Unicode Specification did allow for such possibilities within its wording.

The wording stated by Asmus Freytag pleases me, as it seems a good, well-rounded balance between not obliging the makers of many widely used packages to include software to process language tags and still formally recognizing the opportunity for language tags to be used to advantage in appropriate special circumstances. I feel that it is a magnificent compromise wording which will hopefully be widely applauded.

In using Unicode on the DVB-MHP platform I am thinking of using Unicode characters in a file, with the file being processed by a Java program which has been broadcast. The file PropList.txt just does not enter into it for this usage, so what is in that file is not a problem for me.

My thinking is that many, maybe most, multimedia packages being broadcast will not use language tags and will have no facilities for decoding them. However, I feel that it is important to keep open the possibility that some such packages can use language tags, provided that the programs which handle them are appropriately programmed.

There will need to be a protocol. Hopefully a protocol already available in general internationalization and globalization work can be used directly. If not, hopefully a special Panplanet protocol can be devised specifically for DVB-MHP broadcasting.

On the matter of using Unicode on the DVB-MHP platform, readers might like to have a look at the following about the U+FFFC character.

http://www.users.globalnet.co.uk/~ngo/ast03200.htm

Readers who are interested in uses of the Private Use Area might like to have a look at the following. They are particularly oriented towards the DVB-MHP platform but do have wider applications both on the web and in computing generally.

http://www.users.globalnet.co.uk/~ngo/ast03000.htm
http://www.users.globalnet.co.uk/~ngo/ast03100.htm
http://www.users.globalnet.co.uk/~ngo/ast03300.htm

The main index page of the webspace is as follows.

http://www.users.globalnet.co.uk/~ngo

William Overington
7 February 2003
Re: VS vs. P14 (was Re: Indic Devanagari Query)
At 01:52 AM 2/7/03 -0800, Andrew C. West wrote:

> > Ah, but decorative motifs are not plain text.
>
> Ah, but it could be.

Ah, but it wouldn't be Unicode.

A(h)./
Re: VS vs. P14 (was Re: Indic Devanagari Query)
Asmus Freytag <asmusf at ix dot netcom dot com> wrote:

> Unicode 4.0 will be quite specific: "P14 tags are reserved for use with particular protocols requiring their use" is what the text will say more or less.

I didn't know the question of what to do about Plane 14 language tags had already been resolved. If that is the case, it might make sense to add an explanatory note to the Public Review item on Plane 14 tags, or simply to remove the item.

-Doug Ewell
Fullerton, California
VS vs. P14 (was Re: Indic Devanagari Query)
James Kass wrote,

> (What happens if someone discovers a 257th variant? Do they get a prize? Or, would they be forever banished from polite society?)

I was thinking about that. 256 variants of a single character may seem a tad excessive, but there is a common Chinese decorative motif (frequently seen on trays and tea-pots and scarves and such like) comprising the ideograph shou4 (U+58FD, U+5900, U+5BFF) "longevity" written in 100 variant forms (called bai3 shou4 tu2 in Chinese). See http://www.tydao.com/sxsu/shenhuo/minju/images/mj17.htm for an example.

A quick google on qian1 shou4 tu2 (the ideograph shou4 written in a thousand different forms) came up with a piece of calligraphy by Wang Yunzhuang (b.1942) which comprises the ideograph shou4 written in no fewer than 1,256 unique variant forms!

Googling on wan4 shou4 tu2 (the ideograph shou4 written in 10,000 forms) also had a number of hits, but these refer to a compilation of calligraphy by forty artists that took 16 years to create (written on a scroll 160 metres in length), so these may not all be unique variants.

There are also a number of other auspicious characters, such as fu2 (U+798F) "good fortune", that may be found written in a hundred variant forms as a decorative motif.

All in all the new variant selectors may be kept quite busy if applied to the ideograph shou4 and its friends!

Andrew
Re: VS vs. P14 (was Re: Indic Devanagari Query)
On Thursday, February 6, 2003, at 08:47 AM, Andrew C. West wrote:

> There are also a number of other auspicious characters, such as fu2 (U+798F) good fortune that may be found written in a hundred variant forms as a decorative motif.

Ah, but decorative motifs are not plain text.

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://www.tejat.net/
VS vs. P14 (was Re: Indic Devanagari Query)
Andrew C. West wrote,

> Is this not what the variation selectors are available for? And now that we are soon to have 256 of them, perhaps Unicode ought not to be shy about using them for characters other than mathematical symbols.

Yes, there seem to be additional variation selectors coming in Unicode 4.0 as part of the 1207 (is that number right?) new characters. (What happens if someone discovers a 257th variant? Do they get a prize? Or, would they be forever banished from polite society?)

The variation selectors could be a practical and effective method of handling different glyph forms. But, consider the burden of incorporating a large number of variation selectors into a text file and contrast that with the use of Plane Fourteen language tags. With the P14 tags, it's only necessary to insert two special characters, one at the beginning of a text run, the other at the ending.

Jim Allan wrote,

> One could start with indications as to whether the text was traditional Chinese, simplified Chinese, Japanese, Korean, etc. :-( But I don't see that there is anything particularly wrong with citing or using a language in a different typographical tradition.
> ...

Neither do I. I kind of like seeing variant glyphs in runs of text and am perfectly happy to accept unusual combinations. Perhaps those of us who deal closely with multilingual material and are familiar with variant forms are simply more tolerant and accepting.

> ...
> A linguistic study of the distribution of the Eng sound might cite written forms with capital letters from Sami and some from African languages, but need not and probably should not be concerned about matching exactly the exact typographical norms in those tongues, for _eng_ or for any other letter.

On the one hand, there's a feeling that insistence upon variant glyphs for a particular language is provincial. On the other hand, everyone has the right to be provincial (or not). IMO, it's the ability to choose that is paramount.

If anyone wishes to distinguish different appearances of an acute accent between, say, French and Spanish... or the difference of the ogonek between Polish and Navajo... or the variant forms of capital eng, then there should be a mechanism in place enabling them to do so.

Variation selectors would be an exact method, with the V.S. characters manually inserted where desired. P14 tags would also work for this; entire runs of text could be tagged and those runs could be properly rendered once the technology catches up to the Standard.

Neither V.S. nor P14 tags should interfere with text processing or break any existing applications. There are pros and cons for either approach.

Best regards,

James Kass
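
[Editorial sketch, not part of the original message.] To make the "tag a whole run" idea concrete, here is a minimal Python illustration, assuming the Plane 14 scheme as commonly described: U+E0001 LANGUAGE TAG followed by tag characters that mirror the ASCII language code at an offset of 0xE0000, with the run closed by U+E0001 followed by U+E007F CANCEL TAG. On that reading, each end of the run is a short sequence rather than a single character. The function name `tag_run` is hypothetical.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000         # tag characters mirror ASCII at this offset


def tag_run(text, lang):
    """Wrap `text` in a Plane 14 language tag for `lang` (hypothetical helper)."""
    if not lang or not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language code must be printable ASCII")
    # Spell the language code with tag characters, e.g. "es" -> U+E0065 U+E0073.
    tag_chars = "".join(chr(TAG_OFFSET + ord(c)) for c in lang)
    # Open the scope, emit the text unchanged, then cancel the language tag.
    return LANGUAGE_TAG + tag_chars + text + LANGUAGE_TAG + CANCEL_TAG


tagged = tag_run("ni\u00F1o", "es")
# The overhead is fixed per run: len(lang) + 3 code points, however long the text.
overhead = len(tagged) - len("ni\u00F1o")
```

By contrast, a variation-selector approach would add one code point after every character whose glyph needs selecting, so its overhead grows with the length of the run.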
VS vs. P14 (was Re: Indic Devanagari Query)
Peter Constable wrote,

> The plain-text file would be legible without that -- I don't think this is an argument in favour of plane 14 tag characters. Preserving culturally-preferred appearance would certainly require markup of some form, whether lang IDs or for font-face and perhaps font-feature formatting.

Any Unicode formatting character can be considered as mark-up, even P14 tags or VSs. The advantages of using P14 tags (...equals lang IDs mark-up) is that runs of text could be tagged *in a standard fashion* and preserved in plain-text.

Best regards,

James Kass
Re: VS vs. P14 (was Re: Indic Devanagari Query)
At 06:24 PM 2/5/03 +, [EMAIL PROTECTED] wrote:

> The advantages of using P14 tags (...equals lang IDs mark-up) is that runs of text could be tagged *in a standard fashion* and preserved in plain-text.

The minute you have scoped tagging, you are no longer using plain text.

The P14 tags are no different than HTML markup in that regard; however, unlike HTML markup, they can be filtered out by a process that does not implement them. (In order to filter out HTML, you need to know the HTML syntax rules. In order to filter out P14 tags you only need to know their code point range.)

Variation selectors also can be ignored based on their code point values, but unlike P14 tags, they don't become invalid when text is cut and pasted from the middle of a string.

If 'unaware' applications treat them like unknown combining marks and keep them with the base character, as they would any other combining mark during editing, then variation selectors have a good chance of surviving in plain text. P14 tags do not.

Unicode 4.0 will be quite specific: "P14 tags are reserved for use with particular protocols requiring their use" is what the text will say more or less.

A./
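
[Editorial sketch, not part of the original message.] The two claims above, that both mechanisms can be filtered by code point range alone and that only tags lose their meaning when text is cut from the middle of a string, can be illustrated in a few lines of Python. The ranges are those of the Tags block (U+E0000 to U+E007F) and the two variation selector blocks (U+FE00 to U+FE0F and U+E0100 to U+E01EF); the helper names are hypothetical.

```python
TAG_BLOCK = ((0xE0000, 0xE007F),)                            # Plane 14 tags
VARIATION_SELECTORS = ((0xFE00, 0xFE0F), (0xE0100, 0xE01EF))  # VS1-16, VS17-256


def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def strip_ranges(text, ranges):
    """Drop every character whose code point falls in `ranges`; no syntax needed."""
    return "".join(ch for ch in text if not _in_ranges(ch, ranges))


def contains(text, ranges):
    return any(_in_ranges(ch, ranges) for ch in text)


# A run tagged "fr": LANGUAGE TAG + tag f + tag r ... LANGUAGE TAG + CANCEL TAG.
tagged = "\U000E0001\U000E0066\U000E0072" + "bonjour" + "\U000E0001\U000E007F"
plain = strip_ranges(tagged, TAG_BLOCK)

# Cutting from the middle of the run silently drops the language scope ...
tag_fragment = tagged[5:8]

# ... whereas a variation selector travels with its base character.
varied = "a\u2268\uFE00b"          # U+2268 followed by VARIATION SELECTOR-1
vs_fragment = varied[1:3]
```

Here `tag_fragment` carries no tag characters at all, so a receiving application has no way to know the text was French, while `vs_fragment` still contains its selector.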
Re: VS vs. P14 (was Re: Indic Devanagari Query)
On 02/05/2003 12:24:39 PM jameskass wrote:

> The advantages of using P14 tags (...equals lang IDs mark-up) is that runs of text could be tagged *in a standard fashion* and preserved in plain-text.

Sure, but why do we want to place so much demand on plain text when the vast majority of content we interchange is in some form of marked-up or rich text? Let's let plain text be that -- plain -- and look to the markup conventions that we've invested so much in and that are working for us to provide the kinds of thing that we designed markup for in the first place.

Besides, a plain-text file that begins and ends with p14 tags is a marked-up file, whether someone calls it plain text or not. We have little or no infrastructure for handling that form of markup, and a large and increasing amount of infrastructure for handling the more typical forms of markup.

I repeat, plain text remains legible without anything indicating which eng (or whatever) may be preferred by the author, and (since the requirement for plain text is legibility) therefore this is not really an argument for using p14 language tags. IMO.

- Peter

---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
Re: VS vs. P14 (was Re: Indic Devanagari Query)
At 16:47 -0500 2003-02-05, Jim Allan wrote:

> There are often conflicting orthographic usages within a language. Language tagging alone does not indicate whether German text is to be rendered in Roman or Fraktur, whether Gaelic text is to be rendered in Roman or Uncial, and if Uncial, a modern Uncial or more traditional Uncial, whether English text is in Roman or Morse Code or Braille.

We have script codes (very nearly a published standard) for that.

By the way, "modern uncial" and "more traditional uncial" isn't really sufficient, I think, for describing Gaelic letterforms. See http://www.evertype.com/celtscript/fonthist.html for a sketch of a more robust taxonomy.

--
Michael Everson * * Everson Typography * * http://www.evertype.com
Re: VS vs. P14 (was Re: Indic Devanagari Query)
Asmus Freytag wrote,

> Variation selectors also can be ignored based on their code point values, but unlike p14 tags, they don't become invalid when text is cut and pasted from the middle of a string.

Excellent point.

> Unicode 4.0 will be quite specific: "P14 tags are reserved for use with particular protocols requiring their use" is what the text will say more or less.

This seems to be an eminently practical solution to the P14 situation.

If I were using an application which invoked a protocol requiring P14 tags to read a file which included P14 tags, and wanted to cut and paste text into another application, in a perfect world the application would be savvy enough to recognize any applicable P14 tags for the selected text and insert the proper Variation Selectors into the text stream to be pasted.

The application which received the pasted text, if it was an application which used a protocol requiring P14 tags, would be savvy enough to strip the variation selectors and enclose the pasted string in the appropriate P14 tags. If the pasted material was being inserted into a run of text in which the same P14 tag applied, then the tags wouldn't be inserted. If the pasted material was being inserted into a run of text in which a different P14 tag applied, then the application would insert begin and end P14 tags as needed.

In a perfect world, in the best of both worlds, both P14 tags and variation selectors could be used for this purpose.

Is it likely to happen? Perhaps not. But, by not formally deprecating P14 tags and using (more or less) the language you mentioned, the possibilities remain open-ended.

Best regards,

James Kass
Re: VS vs. P14 (was Re: Indic Devanagari Query)
Peter Constable wrote,

> Sure, but why do we want to place so much demand on plain text when the vast majority of content we interchange is in some form of marked-up or rich text? Let's let plain text be that -- plain -- and look to the markup conventions that we've invested so much in and that are working for us to provide the kinds of thing that we designed markup for in the first place. Besides, a plain-text file that begins and ends with p14 tags is a marked-up file, whether someone calls it plain text or not. We have little or no infrastructure for handling that form of markup, and a large and increasing amount of infrastructure for handling the more typical forms of markup.

We place so much demand on plain text because we use plain text. We continue to advance from the days when “plain text” meant ASCII only rendered in bitmapped monospaced monochrome. We don’t rely on mark-up or higher protocols to distinguish between different European styles of quotation marks. We no longer need proprietary rich-text formats and font switching abilities to be able to display Greek and Latin text from the same file.

> I repeat, plain text remains legible without anything indicating which eng (or whatever) may be preferred by the author, and (since the requirement for plain text is legibility) therefore this is not really an argument for using p14 language tags. IMO.

Is legibility the only requirement of plain text? Might additional requirements include appropriate, correct encoding and correct display?

To illustrate a legible plain text run which displays as intended (all things being equal) yet is not appropriately encoded (this e-mail is being sent as plain text UTF-8):

𝑰𝒇 𝒚𝒐𝒖 𝒄𝒂𝒏 𝒓𝒆𝒂𝒅 𝒕𝒉𝒊𝒔 𝒎𝒆𝒔𝒔𝒂𝒈𝒆...
𝒚𝒐𝒖 𝒎𝒂𝒚 𝒘𝒊𝒔𝒉 𝒕𝒐 𝒋𝒐𝒊𝒏 𝑴𝑨𝑨𝑨* 𝒂𝒕 𝓫𝓵𝓪𝓱𝓫𝓵𝓪𝓱𝓫𝓵𝓪𝓱𝓭𝓸𝓽𝓬𝓸𝓶
(*𝗠𝖺𝗍𝗁 𝗔𝗅𝗉𝗁𝖺𝖻𝖾𝗍𝗌 𝗔𝖻𝗎𝗌𝖾𝗋𝗌 𝗔𝗇𝗈𝗇𝗒𝗆𝗈𝗎𝗌)

Clearly, correct and appropriate encoding (as well as legibility) should be a requirement of plain text. Is correct display also a valid requirement for plain text? It is for some...

Respectfully,

James Kass
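
[Editorial sketch, not part of the original message.] The styled run in the message above appears to be drawn from the Mathematical Alphanumeric Symbols block (U+1D400 onward). One hedged way to see why such text is "legible but not appropriately encoded" is that these characters carry compatibility decompositions to ordinary letters, so compatibility normalization (NFKC) folds them back to plain ASCII. The sketch below uses only Python's standard `unicodedata` module; the sample string is an assumption reconstructed from the first two letters of the run.

```python
import unicodedata

# "If" spelled with MATHEMATICAL BOLD ITALIC letters (U+1D470, U+1D487).
styled = "\U0001D470\U0001D487"

# The character names confirm these are mathematical symbols, not text letters.
names = [unicodedata.name(ch) for ch in styled]

# NFKC applies the compatibility mappings and recovers the plain letters.
folded = unicodedata.normalize("NFKC", styled)

# A plain-text search for "If" misses the styled run unless it normalizes first.
found_raw = "If" in styled
found_folded = "If" in folded
```

The display is the same to a human reader either way; what changes is whether searching, sorting and spell-checking treat the run as the words it appears to spell.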