Re: [Private Use Area] Audio Description, Subtitle, Signing
about the Private Use Area being rewritten for Unicode 4.0. Is there any chance of someone posting the Unicode 4.0 text into this discussion please?

It remains to be seen what will be decided as the built-in font for the European Union implementation of the DVB-MHP specification. It might be the minimum font of the DVB-MHP specification or it might be more comprehensive. For example, should Greek characters be included? Should weather symbols be included? These and many other issues remain to be decided.

The minimum font for any specification for Europe should be the MES-2. If you are talking to these people, tell them.

Now, I have never heard of the MES-2, whatever that is. However, I do not have deep knowledge of the various standards which exist. Could you possibly say some more about MES-2 please?

The minimum character set for DVB-MHP is in Annex E of the DVB-MHP specification, available from the http://www.mhp.org webspace. It is all in one huge pdf file. I am hoping that the European Union will specify a rather more comprehensive font. It may be that a lot will depend on how much unused space, if any, there is in the read-only memories which are used for the built-in font.

The Cenelec group to which I refer is the DigitalTV_WG and readers who might like to join could ask at [EMAIL PROTECTED], as new members have joined at various times. Names of members are listed in the internal email facility of the forum. Membership is free and there appear to be people from outside the European Union as well as within it, as what is decided for Europe may be adopted by countries in other parts of the world.

Readers might also be interested in the related TVforALL_WG forum, which discusses issues of access to broadcasts for people with disabilities. The Audio Description, Subtitle, Signing logos issue has been posted in both of these forums. Please enquire at [EMAIL PROTECTED] by email if you are interested. Membership is also free for that forum.
William Overington 17 July 2003
Re: Combining diacriticals and Cyrillic
Peter Constable wrote as follows.

William Overington wrote on 07/15/2003 07:22:22 AM: No, the Private Use Area codes would not be used for interchange, only locally for producing an elegant display in such applications as chose to use them. Other applications could ignore their existence.

Then why do you persist in public discussion of suggested codepoints for such purposes? If it is for local, proprietary use internal to some implementation, then the only one who needs to know, think or care about these codepoints is the person creating that implementation.

The original enquiry sought advice about how to proceed. I posted some ideas of a possible way to proceed. If the idea of using a eutocode typography file is taken up and software which uses it is produced, then it would be reasonable to have a published list of Private Use Area code points for the precomposed characters which are to be available, as in that way the output stream from the processing could be viewed with a number of fonts from a variety of font makers without needing to change the eutocode typography file if one changed font. I have not published many of my suggested code points in this forum precisely because a few people do not want them published here. For example, there is the ViOS-like system for a three-dimensional visual indexing system for use in interactive broadcasting.

Publishing a list of Private Use Area code points would have absolutely no purpose at all.

mean that such display could be produced using a choice of fonts from various font makers using the same software

Now you are talking interchange. Interchange means more than just person A sends a document to person B. It means that person A's document works with person B's software using person C's font. (An alternate term that is often used, interoperate, makes this clearer.)

Exactly. This is why publishing the list of Private Use Area code point assignments for the precomposed characters is a good idea.
Person B can display the document and then wonder if it might look better with that font made by person D and have a try with that font. If the list of Private Use Area code point assignments for the precomposed characters has been published and both C and D have used the list to add the extra Cyrillic characters into their fonts, then the published list has helped to achieve interoperability.

I feel that an important thing to remember is the dividing line between what is in Unicode and what is in particular advanced format font technology solutions

And best practice for advanced format font technologies eschews PUA codepoints for glyph processing.

Who decides upon what is best practice?

You've been told that several times by people who have expertise in advanced font technologies, an area in which you are not deeply knowledgeable or experienced, by your own admission.

Well, it is not a matter of an admission as if dragged out of me under examination by counsel in a courtroom. I openly stated the limits of my knowledge in that area, not as a retrospective defence but as an up-front expression of the limitation of my knowledge when putting forward ideas, specifically so as not to produce any incorrect impression as to expertise in that area.

yet they are not suitable for platforms such as Windows 95 and Windows 98, whereas a eutocode typography file approach would be suitable for those platforms and for various other platforms.

Wm, if someone wanted, they could create an advanced font technology to work on DOS, but why bother? Who's going to create all the new software that works with that technology, and make it work within the limitations of a DOS system?

Yet I am not suggesting a system to work on DOS.

Your idea is at best a mental exercise, and even if you or someone else built an implementation, what is not needed is some public agreement on PUA codepoints for use in glyph processing.
When you say agreement, I am not suggesting agreement in some formal manner. It is more like the authorship of a story, where people may read it or not as they choose. Yet if people do read the story, or watch a television or movie implementation of it, a common culture may come to exist amongst the readers which can be applied in other circumstances. For example, it is as if on a holodeck a character says 'arch', and that is something which people who have watched Star Trek The Next Generation may use as a cultural way of expressing something.

The original enquiry read as if a number of people are trying to solve the problem. If a list of the characters is published with Private Use Area code points from U+EF00 upwards, then they could all, if they so choose, use that set of code points and it might help in font interoperability, certainly if they choose to implement a eutocode typography file system and maybe in some other implementations. I suggested U
Re: [Private Use Area] Audio Description, Subtitle, Signing
Peter Constable wrote as follows.

William Overington wrote on 07/15/2003 05:33:22 AM: William, CENELEC is an international standards body. Such bodies either create their own standards or use other international standards. They do not use PUA codepoints.

Well, the fact of the matter is that Cenelec is trying to achieve a consensus for the implementation of interactive television within the European Union

And that does not require PUA codepoints; moreover, your response does not escape the fact I was pointing out that a standards body will not be publishing standards that make reference to PUA codepoints.

Please have a look at what Cenelec is doing in trying to achieve that consensus. Your comments seem to relate to standards bodies generally, or to how Cenelec proceeds generally. This, however, is a particular project for the European Commission, and the difference is that things need to move forward promptly. There are lots of aspects, such as how many buttons to have on a hand-held infra-red control device for end user interaction with a running Java program (that is, the _minimum_ twenty of the DVB-MHP specification, or some more) and such as whether mouse events should be accessible to end users (as the DVB-MHP specification has mouse event access as optional in interactive televisions) and so on. What you write in relation to most projects carried out by standards bodies may well be true, yet I was writing specifically about one particular project being run by Cenelec.
In view of the fact that the interactive television system (DVB-MHP, Digital Video Broadcasting - Multimedia Home Platform http://www.mhp.org ) uses Java and Java uses Unicode, it is then a matter of deciding how to be able to signal the symbols in a Unicode text stream.

And they won't be standardizing on symbols encoded using PUA codepoints.

The deciding is not about something to incorporate into the DVB-MHP standard. It is a matter of trying to gain a consensus as to how to signal those symbols at the present time and in the near future (that is, until (if and when) some regular Unicode code points are achieved) within Java programs which run upon the DVB-MHP platform and in fonts which are used upon the DVB-MHP platform. It is essentially a matter for end users of the system, just as the two Private Use Area characters being suggested in another thread of this forum in relation to Afghanistan are a matter for end users of the Unicode Standard and do not affect the content of the Unicode Standard itself.

In view of the fact that the process of getting regular Unicode code points for the symbols would take quite a time, and indeed that there is as yet no agreement on which symbols to use, and that the implementation of interactive television needs to proceed, it seems to me that putting forward three specific Private Use Area code points for the symbols at this time is helpful to the process.

Then you obviously don't understand the process.

Well, maybe I don't. However, the fact of the matter is that sooner or later some code points are needed to signal those symbols. I have put forward three suggested code points. I also mentioned them in this mailing list. My specific suggestions are in the Private Use Area and do not clash with various uses of the Private Use Area known to me.
So three specific code points have been mentioned, and I suggest that having those three code points published both in the Cenelec forum and here is beneficial: if they are used, then various potential problems are avoided which could have arisen if some other choices (such as three unused code points in regular Unicode, or several different sets of three code points in regular Unicode) were used.

Such things are *not* useful. They do not achieve consistency, not in the short term, and most certainly not in the long term. If consistency is needed, the standardization process is used to establish standardized representations.

Well, what is the alternative?

The alternative to agreeing on a standard? None, but why would you need an alternative?

Code points for the symbols are needed now or in the near future. The symbol designs are not yet agreed. Obtaining regular Unicode points, if achievable, would take quite a time. With my suggested code points published, decisions on which symbol designs to use and getting them into use with everyone using the same code points could happen within a few days. The code points are in the Private Use Area, so the suggestion avoids the possibility of a non-conformant use of a regular Unicode code point.

That is hardly the concern. Standards are designed to be international agreements
Re: [Private Use Area] Audio Description, Subtitle, Signing
which I have suggested for a chess font with pieces on both white and black squares. Although I am hoping that my eutocode graphics system will become widely used in interactive television systems, I accept that it is a specialist application which may be of great interest to some people and of no interest to many other people. Likewise the code points for a chess font, particularly those for chess variants such as Carrera's Chess. However, the symbols for Audio Description, Subtitle, Signing have very widespread use possibilities and so posting my suggested code point allocations for them here in a short note seemed, and still seems, reasonable to me. William Overington 15 July 2003
Re: Combining diacriticals and Cyrillic
Tex Texin wrote as follows.

William, You understand Unicode well enough by now to know that this is an abhorrent suggestion.

The word abhorrent seems rather strong! :-)

As the characters can be represented in Unicode by using Cyrillic plus combining diacriticals, to create a proprietary set of codes in the Private Use Area would introduce incompatibilities with other applications that support these characters in the recommended form.

No, the Private Use Area codes would not be used for interchange, only locally for producing an elegant display in such applications as chose to use them. Other applications could ignore their existence. Publishing a list of Private Use Area code points would mean that such display could be produced using a choice of fonts from various font makers using the same software to produce the purely local text stream, without locking together the provision of the software and the provision of the font to the same supplier using an unpublished Private Use Area encoding.

Following your recommendation would cause searching, sorting and interchange of Vladimir's data to fail in applications that properly support these characters.

No, the Private Use Area codes would not be used for interchange, only locally for producing an elegant display.

And it is likely difficult to get other applications to buy into supporting a proprietary solution.

Well, the set of Private Use Area codes and the software algorithm of the eutocode typography file could be used or not used or even ignored as each person chooses.

It is easier to address the rendering problem that Vladimir has than to unravel the mess your suggestion would create. It isn't even a good recommendation for short term use.

Well, as far as I can tell, the eutocode typography file, using the Private Use Area to hold the glyphs for the precomposed forms locally and not for interchange, does address the rendering problem which Vladimir asked about.
The benefit of a eutocode typography file is that if a software application is produced which uses the information in a eutocode typography file, then, as the eutocode typography file is a Unicode plain text file, the software can be customized using a plain text file. Thus the same software program could be used for languages of the Indian subcontinent, accented Cyrillic characters or indeed many other language characters which someone might want to use, simply by providing a eutocode typography file which includes the rules to translate from Unicode sequences to Private Use Area code points for that particular use.

Did I miss something? Why are you recommending the PUA for this use?

Well, did you read this bit?

quote Software would need to be developed (by you or by other interested people), yet essentially what is needed is software to take an input document and process it according to information in a eutocode typography file. In this way the Private Use Area codes would not be used for interchanging information, yet would be used locally so as to produce an elegant display. end quote

I feel that an important thing to remember is the dividing line between what is in Unicode and what is in particular advanced format font technology solutions which some other organizations supply. Those advanced font format technologies may be very good, I do not know as I have no experience of using them, yet they are not suitable for platforms such as Windows 95 and Windows 98, whereas a eutocode typography file approach would be suitable for those platforms and for various other platforms. I am hoping that the eutocode typography file approach with display glyphs added into the Private Use Area will be a useful technique in many areas, including, yet not limited to, interactive broadcasting.

William Overington 15 July 2003
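[Editorial sketch.] No format for a eutocode typography file was ever implemented in software, as the message above acknowledges, so the following is only a guess at the kind of plain-text rule file it describes, written in Python purely for illustration. The "hex hex -> hex" layout and the specific rules are invented assumptions for this example, not part of any published specification.

```python
# Invented sketch of a plain-text rule file driving the substitution,
# so that one generic program serves different scripts simply by
# swapping the customization file, as the message suggests.

RULES_TEXT = """\
# source code points (hex)  ->  display code point (hex)
0438 0301 -> EF00
0430 0301 -> EF01
"""

def parse_rules(text: str) -> dict:
    """Read mapping rules from the plain-text customization file."""
    rules = {}
    for line in text.splitlines():
        line = line.split("#")[0].strip()   # drop comments and blanks
        if not line:
            continue
        source, _, target = line.partition("->")
        sequence = "".join(chr(int(cp, 16)) for cp in source.split())
        rules[sequence] = chr(int(target.strip(), 16))
    return rules

def apply_rules(text: str, rules: dict) -> str:
    """Translate Unicode sequences to local PUA display code points."""
    for sequence, replacement in rules.items():
        text = text.replace(sequence, replacement)
    return text

rules = parse_rules(RULES_TEXT)
```

Under this reading, supporting a new language means editing only the rule file, never the program, which is the portability argument the message makes for Windows 95 and 98 era platforms.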
[Private Use Area] Audio Description, Subtitle, Signing
There is presently discussion about the symbols to be used to indicate the availability of Audio Description, Subtitle and Signing in television broadcasts. This is being discussed in the Digital_TV and TV_for_All discussion forums at the http://www.cenelec.org webspace.

I am suggesting that the following Private Use Area code points be used for the symbols at the present time. This could lead to a useful consistency of encoding for use with interactive television systems. Hopefully regular Unicode code points will be established at some time in the future; these Private Use Area code point suggestions are simply to help in achieving consistency in the meantime.

U+F2F0, decimal 62192, Audio Description
U+F2F1, decimal 62193, Subtitle
U+F2F2, decimal 62194, Signing

William Overington 14 July 2003
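[Editorial sketch.] As a small illustration of how the three suggested code points could be carried in a Unicode text stream, the sketch below (Python, chosen purely for illustration; the dictionary and variable names are this example's own, not from the message) confirms the hexadecimal-to-decimal correspondences stated above.

```python
# The three Private Use Area code points suggested in the message,
# with the decimal values quoted there.  Only the code point values
# come from the text; names and structure are this example's own.
SYMBOLS = {
    "Audio Description": 0xF2F0,  # decimal 62192
    "Subtitle":          0xF2F1,  # decimal 62193
    "Signing":           0xF2F2,  # decimal 62194
}

for name, code_point in SYMBOLS.items():
    # chr() yields the PUA character; a receiver renders it
    # meaningfully only if its font carries a matching glyph.
    print(f"U+{code_point:04X} (decimal {code_point}) {name}")

# A programme listing flagging all three services would simply
# include the characters in its text stream:
flags = "".join(chr(cp) for cp in SYMBOLS.values())
```

Because the values lie in the Private Use Area, conformant software that does not recognize them can pass them through or ignore them, which is the behaviour the message relies on.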
Re: Combining diacriticals and Cyrillic
A possibly useful thing to do would be to make a list of those characters which you wish to produce which are not already encoded as precomposed characters in Unicode, sort them into alphabetical order and publish a list of them with code point assignments in the Private Use Area starting at U+EF00. This would mean that fonts could be produced with each of those precomposed glyphs accessible from a Private Use Area code point. Please know that you can use any code points in the Private Use Area which you choose, yet I am suggesting U+EF00 upwards so that the code points would be consistent with my suggested use of the Private Use Area for interactive television broadcasts.

For producing graphics files for the web or for local hardcopy printing it would be possible to use those glyphs directly from the Private Use Area, thereby producing an elegant graphic. As Unicode code point information is not placed in a graphic when lettering is added to a graphic, the result would not show that the Private Use Area had been used.

I have devised a method called a eutocode typography file for use with languages of the Indian subcontinent. It would seem potentially useful for your application as well. http://www.users.globalnet.co.uk/~ngo/ast03300.htm As far as I know the eutocode typography file has not yet been implemented in any software applications; it is primarily a suggestion for the future in relation to interactive television yet may be useful elsewhere. http://www.users.globalnet.co.uk/~ngo/ast0.htm

Software would need to be developed (by you or by other interested people), yet essentially what is needed is software to take an input document and process it according to information in a eutocode typography file. In this way the Private Use Area codes would not be used for interchanging information, yet would be used locally so as to produce an elegant display.
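[Editorial sketch.] The local display substitution just described can be sketched as follows, in Python chosen purely for illustration. The individual base-plus-diacritic to PUA assignments shown are assumptions invented for this example, not a published list; only the idea of assigning from U+EF00 upwards comes from the message.

```python
# Sketch of the local display substitution described above: Cyrillic
# base letter + combining acute sequences are replaced by precomposed
# Private Use Area characters before handing text to a font that
# carries glyphs at those code points.  The interchanged text is
# never altered; only the local display stream uses the PUA.

DISPLAY_MAP = {
    "\u0438\u0301": "\uEF00",  # и + combining acute accent (assumed)
    "\u0430\u0301": "\uEF01",  # а + combining acute accent (assumed)
    "\u043E\u0301": "\uEF02",  # о + combining acute accent (assumed)
}

def to_display_form(text: str) -> str:
    """Return a display-only stream using PUA precomposed characters."""
    for sequence, pua in DISPLAY_MAP.items():
        text = text.replace(sequence, pua)
    return text

# Interchange text stays in base + combining form:
interchange = "\u0432\u043E\u0434\u0430\u0301"  # вода́
display = to_display_form(interchange)
```

Searching, sorting and interchange would still operate on the base-plus-combining form; only the rendering path sees the PUA characters, which is the separation the messages in this thread argue over.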
The best long term solution, in my opinion, would be to send in a proposal to the Unicode Consortium to add the precomposed glyphs into regular Unicode. However this takes time and may not be successful, and a Private Use Area solution does permit progress to be made now.

Please know that my suggestion of publishing a list of Private Use Area code points may be regarded as controversial by some readers of this list and it is possible that you may be advised not to do it by some other readers. However, in my opinion, publication of code points for some uses of the Private Use Area does have some benefits for some applications. In this case it would at least achieve some consistency amongst those font makers who might like to add the precomposed characters into existing fonts. In relation to advanced format fonts, the use of the Private Use Area code point in addition to the encoded access method does have the benefit of allowing access to the glyphs to people who are using a PC which does not have facilities for using the encoded access method of the advanced format font.

William Overington 14 July 2003

-Original Message- From: [EMAIL PROTECTED] [EMAIL PROTECTED] To: [EMAIL PROTECTED] [EMAIL PROTECTED] Date: Thursday, July 10, 2003 10:23 AM Subject: Combining diacriticals and Cyrillic

Dear Ladies and Gentlemen, Currently there is an ongoing effort in Bulgaria trying to resolve an issue concerning the way we write in Bulgarian. Our problem is: usually a Bulgarian regular user does not need to write accented characters. There is one middle-sized exception to this, but generally we do fine without accented characters. The problem is that in some special cases, or for more serious linguistic work, one definitely needs to be able to write accented characters (accented vowels). One of the ideas is to invent a new ASCII-based encoding, containing the accented characters we need.

This would introduce an additional disorder in the current mess of Cyrillic encodings, and would introduce problems with automated spellcheck. Generally I believe it would be best to invent a Unicode based solution. Such a solution is, for example, combining diacritical signs with the Cyrillic symbols. I composed a demo page: http://v.bulport.com/bugs/opera/426/balhaah_lonex_org/ and then made 10-20 shots of the results on Opera and IE on Linux, Windows 98 and Windows XP: http://v.bulport.com/bugs/opera/426/balhaah_lonex_org/shots.html You can see that this approach yields _quite_ inconsistent and useless results, depending on the font, application and operating system being used.

Finally, I wonder if you could give us some advice:
1. Is it possible somehow to improve this approach? I imagine, e.g., if the font can provide prepared combined symbols whenever the application asks for a combined Cyrillic+diacritical, instead of leaving the application to do the combination.
2. Do you see another Unicode based approach to the Bulgarian problem?
3. Do you believe the approach should be looked for outside Unicode?
Please excuse me for wasting your time
Re: Revised N2586R
Michael Everson wrote as follows.

At 08:44 -0700 2003-06-25, Doug Ewell wrote: If it's true that either the UTC or WG2 has formally approved the character, for a future version of Unicode or a future amendment to 10646, then I don't see any reason why font makers can't PRODUCE a font with a glyph for the proposed character at the proposed code point. They just can't DISTRIBUTE the font until the appropriate standard is released.

That's correct.

Well, certainly authority would be needed, yet I am suggesting that where a few characters added into an established block are accepted, which is what is claimed for these characters, there should be a faster route than having to wait for bulk release in Unicode 4.1. If these characters have been accepted, why not formally warrant their use now by having Unicode 4.001 and then Unicode 4.002 when a few more are accepted? These minor additions to the Standard could be produced as characters are accepted and publicised in the Unicode Consortium's webspace. If the characters have not been accepted then they cannot be considered ready to be used, yet if they have been accepted, what is the problem in releasing them so that people who want to get on with using them can do so? Some fontmakers can react to new releases more quickly than can some other fontmakers, so why should progress be slowed down for the benefit of those who cannot add new glyphs into fonts quickly? For example, symbols for audio description, subtitles and signing are needed for broadcasting. Will that need to have years of waiting and using the Private Use Area when it could be a fairly swift process and the characters could be implemented into read-only memories in interactive television sets that much sooner? Why is it that it is regarded by the Unicode Consortium as reasonable that it takes years to get a character through the committees and into use?
Surely where a few characters are needed the Unicode Consortium and ISO need to take a twenty-first century attitude to getting the job done for people's needs rather than having the sort of delays which might have been acceptable in days gone by. The idea of having to use the Private Use Area for a period after the characters have been accepted is just a nonsense. William Overington 26 June 2003
Re: Revised N2586R
Peter Constable wrote as follows.

the name is simply a unique identifier within the std.

Well, the Standard is the authority for what is the meaning of the symbol when found in a file of plain text. So if the symbol is in a plain text file before or after the name of a person then the Standard implies a meaning to the plain text file.

A name may be somewhat indicative of its function, but is not necessarily so.

Well, that could ultimately be an issue before the courts in a libel case if someone publishes a text with a symbol next to someone's name. A key issue might well be as to what is the defined meaning of the symbol in the Standard. Certainly, the issue of what a reasonable person seeing that symbol next to someone's name might conclude is being published about the person might well also be important, even if that meaning is not in the Standard.

You could call it WHEELCHAIR SYMBOL, but that engineering of the standard is not also social engineering, and people may still use it to label individuals in a way that may be violating human rights -- we cannot stop that. No matter what we call it, end users are not very likely going to be aware of the name in the standard; they're just going to look for the shape, and if they find it, they'll use it for whatever purpose they chose to.

Certainly. Yet a plain text interchangeable file would not have the meaning built into it by the Standard. I agree though that there may well still be great problems.

William Overington 26 June 2003
Re: Nightmares
Tom Gewecke wrote as follows.

My personal idea of an Orwellian nightmare would be to have a committee of vigilant freedom protectors evaluating the political and social implications of encoding symbols and passing judgement on whether particular characters should be encoded and what their names should not be.

Yes, I agree that would be terrible. The difference between your personal idea of an Orwellian nightmare and what I am suggesting should take place is great. I am suggesting that everybody, as part of their activity in character encoding, be vigilant that what is encoded does not provide an infrastructure for an Orwellian nightmare to take place with computing systems such as databases. The difference is like that between a country having a special riot police force and having regular police who wear riot gear when the need arises. This distinction was stressed when police in riot gear were first seen on the streets in England, as the television news began by using the term riot police.

So I am not suggesting such a committee, just ordinary regular people who encode characters being vigilant about the political and social implications of what they are doing, lest, by not concerning themselves with such an important aspect of their work, namely the potential for causing misery, the opportunity for such misery to occur is unthinkingly provided or is not prevented when it easily could be prevented. Hopefully this will clarify my thinking to you and hopefully be of interest to people involved in character encoding discussions. One of the great issues of the last century was whether scientists should consider the political and social implications of their work or just work as if somehow separate from society and leave the application of the things which they discovered and developed to politicians and business people.

This issue has arisen because of my concern that a particular symbol has been labelled as HANDICAPPED SIGN.
I hope that the name will be changed to WHEELCHAIR SYMBOL. Yet what if my concerns over the need for vigilance were now dismissed? What characters might be encoded in the future with what names? After all, if no one is willing to be vigilant because that very vigilance is regarded as an Orwellian nightmare, there would then be no constraints. I am very much someone who believes in the need for checks and balances. I feel that we need checks and balances in what is encoded and what names are applied to symbols. I also feel that we need checks and balances as to how those checks and balances are carried out. William Overington 26 June 2003
Re: Revised N2586R
I am rather concerned that the name HANDICAPPED SIGN is being used without any justification or discussion of the name of the character.

The Name Police approved. ;-)

I am rather concerned about the Orwellian nightmare possibilities of this and believe that vigilance is a necessary activity to protect freedom.

Oh, spare us.

Well, it is like the Millennium bug problem. People took it seriously and spent a lot of time and effort in preventing it causing chaos. When nothing happened, a news anchor on British TV in early January 2000 asked an expert in the studio if, as nothing had happened, all the concern had been just a lot of hype. The expert explained that it was only because of the concern and the care taken that nothing had gone wrong on 1 January 2000. In like manner, I feel that it is very important that care be taken now over issues such as the possibility of an Orwellian nightmare. Then, when it does not happen, we might not be sure whether our vigilance prevented it or whether it would not have happened at all, yet nevertheless it will not happen; whereas if we do not bother, who knows what practices might exist with databases in ten or twenty years time.

Likely WHEELCHAIR SYMBOL is a more accurate name.

That is a good suggestion. Perhaps WHEELCHAIR SYMBOL could be used instead of HANDICAPPED SIGN please. A guiding principle for encoding symbols could be that the description applies to the symbol, not to any person whom it might be used to describe in some applications.

There is a DISABILITY SYMBOL http://www.mdx.ac.uk/awards/disable.htm which is different; it's called the TWO TICKS SYMBOL as well.

Where I have seen the two ticks symbol in use is to indicate in brochures and advertisements that an organization claims to take care to treat people who have disabilities in a fair manner, doing what is necessary to help them use facilities or be employed. It is not applied, as far as I know, to individuals who have a disability.
An Orwellian nightmare scenario of just encoding the symbols and leaving it to people who use Unicode as to how they use the symbols is not attractive.

Rein in those hares, William, please.

Well, I realize that what I say may, at first glance, possibly appear extreme at times, yet please do consider what I write in an objective manner. If Unicode has a WHEELCHAIR SYMBOL then that is a symbol; if Unicode encodes a HANDICAPPED SIGN then that is a description of someone to whom it is applied, a Boolean sign for all, whatever the disability may be, whether it is relevant to the matter in hand or not. I do wonder whether the encoding of the symbol as HANDICAPPED SIGN would be consistent with human rights, as it would be assisting automated decision making with a Boolean flag and providing an infrastructure for such practices. However, hopefully those of you who have the power to vote on these matters will act to change the name from HANDICAPPED SIGN so as to take account of these concerns. For me, WHEELCHAIR SYMBOL seems fine as the name simply describes the symbol. However, it may be that other people might have other views on the name.

William Overington 25 June 2003
Re: Revised N2586R
Michael Everson wrote as follows.

I do the best I can. At the end of the day my document won its case and the five characters were accepted.

This raises an interesting matter. In that the document proposes U+2693 for FLEUR-DE-LIS it would seem not unreasonable for fontmakers now to be able to produce fonts having a FLEUR-DE-LIS glyph at U+2693. However, what is the correct approach? Is it that the characters must remain either unimplemented or else implemented as Private Use Area characters until Unicode 4.1 or whatever is published, notwithstanding that the hardcopy Unicode 4.0 book is not yet available? That will probably take quite some time.

It appears to me that there should be some system devised so that when a few extra symbols are accepted into an already established area, those characters can be implemented in a proper manner much more quickly than at present. However, such speeding up of the process might not always be a benefit. For example, the proposed U+267F, which has in the document the name HANDICAPPED SIGN, could, if there were a fast track process, be all the more quickly incorporated into databases as a way for officials to make automated decisions about people much more conveniently without considering the individual circumstances of each person so tagged.

I am rather concerned that the name HANDICAPPED SIGN is being used without any justification or discussion of the name of the character. The character has now been accepted, it appears. I am rather concerned about the Orwellian nightmare possibilities of this and believe that vigilance is a necessary activity to protect freedom. Just think, data about someone can be expressed with one character which can be sent around the world to be stored in a database which is not necessarily in a jurisdiction which has laws about data protection. Automated decision making is a matter covered by United Kingdom data protection law, yet does the law have any effect in practice? 
For example, some credit card application documents now have in the small print items about the applicant agreeing to accept automated decisions. And also, does every user of computer equipment obey the law? I gather that in the United States there is a concept of a Social Security number and that it has now become the widespread practice that people who have nothing to do with the administration of social security now routinely ask (and maybe even require) someone to state his or her social security number before they can do anything. I wonder what is the effect of saying that the number is for social security purposes and one is not willing to state what it is. Perhaps even questioning why that information is needed will go against one.

The issue of the name for what Michael has named as HANDICAPPED SIGN needs, in my opinion, some discussion. If that discussion widens into the purposes for which Unicode could or should be used, and whether the political and social implications of encoding symbols are something of which people should be aware, then fine.

For example, would DISABILITY LOGO be a better name? I have seen the logo used in signs in shops with the message Happy to help referring to help for people with any disability where help is wanted, not just for people in wheelchairs. So having the logo in fonts so that such signs could be printed might well be helpful. Yet I feel that some discussion about the implications of encoding this logo needs to take place, particularly as the N2586R document suggests as seemingly obvious the potential for use in databases. For example, could the sign be made so as not to be interchanged? Is it best not to encode it in Unicode at all as being too dangerous in some of its potential applications? If this symbol is implemented without some protection for rights, could there be a basis for compensation by someone disadvantaged by the use of such a symbol in a database? 
An Orwellian nightmare scenario of just encoding the symbols and leaving it to people who use Unicode as to how they use the symbols is not attractive. William Overington 24 June 2003
Re: Address of ISO 3166 mailing list
Tex Texin wrote as follows. Marion, It is very easy to start your own list at http://www.yahoogroups.com You can create lists for 3166, as well as for hiberno-english etc. Other Unicode folks have created specialized lists for their own purposes. A feature of Yahoo groups is the Yahoo rules about intellectual property rights regarding postings and also the indemnity rules. As regards intellectual property rights, if someone posts then if later he or she wishes to publish a book and the publisher asks if any person or company owns any intellectual property rights in relation to the material in the book, then the answer might properly be, yes, Yahoo. That then may mean that exclusive rights cannot be assigned to a publisher and then the publisher cannot make a claim against anyone for infringement of copyright because the publisher does not have exclusive rights. I am not a lawyer, yet I do urge caution as to what intellectual property rights problems may be caused if one posts in a Yahoo group, which do not occur if one posts in this forum. There is also the indemnity rule. It appears that if someone posts in a Yahoo group and someone somewhere claims against Yahoo, then the poster and maybe the person who started the group are liable to Yahoo for expenses, including lawyers fees. There appears to be a danger that if someone made even a wild, spurious claim in a court and Yahoo needed nevertheless to defend it lest it win by not being answered, then the person who starts the Yahoo group could be liable for the cost of Yahoo's lawyers. William Overington 5 June 2003
Re: Rare extinct latin letters
Peter Constable wrote as follows.

William Overington wrote on 06/02/2003 01:06:25 AM: I am wondering whether the range from U+F200 through to U+F2FF is being used by anyone for anything.

This is a nonsense question. It should never matter to person A whether others are using particular PUA codepoints *unless* person A needs to interchange with person B, in which case A and B need to agree on that range if A intends to use it in interchanging with B.

Suppose person Ai and person Bi are both people with an interest in the texts which contain these particular rare extinct ligatures and wish to exchange documents, which they have keyed themselves, over the internet and view them using a package such as, say, Microsoft WordPad. I use Ai and Bi to mean some particular pair of persons A and B. You wrote never, so one counter example will disprove the generality of your claim.

Neither Ai nor Bi has facilities to make fonts, so they need to rely on having a font made by a third party. They have a better chance of having a font to use if the characters are added into an existing font which already has many other characters in it, such as the basic latin alphabet and punctuation, so that only the rare extinct latin letters represent special drawing work, rather than the whole font. So, if they look at fonts such as, for example, Code2000, Gentium and Junicode and observe which Private Use Area code points are already in use within each font, then choose code points for the rare extinct latin letters which are not used in the fonts at which they look, then the chances of getting their chosen characters implemented in those fonts will be increased.

For example, consider my own Quest text font. If Ai and Bi choose to place their characters in the U+E7.. block or the U+EB.. block, then I would not implement them in Quest text. However, if they place them in the U+F2.. block, then I might well try to have a go at adding them in. 
I recognize that the lettering style of Quest text might not be appropriate to those characters and Quest text might not be liked as a display face by Ai and Bi, yet please allow me some latitude in this as I am trying to explain my thoughts without speculating about the thoughts of some other person who produces a font which might have a face design considered more appropriate to the particular application.

So, bearing in mind my knowledge of some uses of the Private Use Area, I thought that the U+F2.. block looked prima facie reasonable, in that it avoids code points used for Tengwar, for Phaistos Disc, for Ewellic, for golden ligatures and courtyard codes, while also avoiding the very top end of the Private Use Area. So, instead of simply sending a private email response I posted to the mailing list in the hope that the readers of this forum might like to help along the process of enabling the gentleman to use, in a practical manner, those rare extinct latin letters which interest him.

Your question seems to be assuming the community of Unicode users at large can share agreements on PUA assignments,

Well, surely they can if they choose to do so. Please note that I am not saying should, must, will or whatever: you used the word can and I answer about can.

and in response I'd say that effectively you must assume that every last PUA codepoint is being used by somebody somewhere.

I accept that that assumption needs to be made in generalized theoretical considerations, yet in a practical situation of trying to get a few special characters added into one or more existing fonts, it is highly relevant to know which code points are already in use and which are not in a selection of fonts, as that information can then be used to devise a Private Use Area encoding scheme for the desired characters which has a higher chance of being implemented.

(And I can assure you that somebody has their own use for F200..F2FF.) 
Well, unless it is a secret or confidential it would be helpful if you could please say what it is, as that information could be used to consider whether a font containing both collections of characters would be likely to be needed for one particular document produced by an end user. William Overington 2 June 2003
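The clash-avoidance approach discussed in this thread can be sketched in a few lines of code. The per-font ranges below are hypothetical placeholders standing in for allocations one might find documented for particular fonts; they are not the actual Private Use Area allocations of Code2000, Gentium, Junicode or Quest text.

```python
# Sketch: check whether a candidate Private Use Area block clashes with
# code points already used by a selection of fonts. The ranges here are
# hypothetical placeholders, not the real allocations of any font.

# PUA ranges (inclusive) hypothetically in use by various fonts.
used_ranges = {
    "FontA": [(0xE000, 0xE0FF), (0xE700, 0xE7FF)],
    "FontB": [(0xEB00, 0xEBFF)],
}

def used_code_points(ranges_by_font):
    """Flatten the per-font ranges into one set of used code points."""
    used = set()
    for ranges in ranges_by_font.values():
        for lo, hi in ranges:
            used.update(range(lo, hi + 1))
    return used

def block_is_free(lo, hi, used):
    """True if no code point in lo..hi is already in use."""
    return all(cp not in used for cp in range(lo, hi + 1))

used = used_code_points(used_ranges)
print(block_is_free(0xF200, 0xF2FF, used))  # the candidate block is free
print(block_is_free(0xE700, 0xE7FF, used))  # clashes with FontA
```

The same survey, done against the actual cmap tables of real fonts, is what the discussion above proposes informally.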
Re: Rare extinct latin letters
Patrick Andries wrote as follows.

[PA] I believe the need of an encoding may be pragmatically ascertained, I don't know about the « real linguistic value » of an alphabet. I have, by the way, no problem if someone says: « Sorry, too idiosyncratic and eccentric! Use the private use area if you need such characters. »

This may well be the case. I suggest that a good idea would be for you to produce a list of which characters you would like, encode them as a Private Use Area encoding and publish the list. That would bring the possibility of being able to use the characters in a Unicode compatible environment one step closer. If they are one day promoted to regular Unicode then fine; otherwise there would nevertheless be a consistent encoding available for anyone who chooses to use it, which would help in interoperability.

If you choose to encode them in the Private Use Area, it is entirely up to you which code points you specify within the range U+E000 through to U+F8FF. However, you might like to take into account the code ranges already being used by various fonts which use the Private Use Area, as avoiding a clash might increase the chances of the characters becoming added into established fonts such as Code2000, Gentium and Junicode, as well as being added into fonts designed specifically for older French texts. If a set of code point allocations is widely available, then the chances for implementation itself and implementation in an interoperable manner are increased.

I am wondering whether the range from U+F200 through to U+F2FF is being used by anyone for anything. So perhaps, if you choose to encode the rare extinct latin letters in the Private Use Area, anyone who reads this and knows whether U+F200 through to U+F2FF is being used by anyone for anything might draw attention to the fact in this forum please. William Overington 2 June 2003
Re: default ignorable posts (was Re: Is it true that Unicode is insufficient for Oriental languages?)
Peter Constable wrote as follows.

Moreover, a while back, I took a look at the forum in which DVB-MHP is being discussed to see how people there responded to your ideas, and discovered that nobody there was interested (as indicated by lack of any response to your posts). If it's not worth discussing in that place, where it is centrally on topic, it's not worth discussing here.

A lack of response to a post is not in any way any indication of lack of interest. It might perhaps be that nobody was interested, yet a lack [sic] of any response is no measure of interest or otherwise. If people simply agreed, or thought it interesting and something to possibly bear in mind for the future, then there would be no need to reply.

Part of the process of the publication option of getting an invention implemented is to place the information before people so that as many of one's ideas as possible are there when the idea gets taken up. Once it is taken up, various people may start adding items as they are needed: the more that the inventor has published and placed before people before taking-up takes place, the more of the inventor's ideas are likely to be in the implemented system. So publishing the details is important. For example, it might be that my list of Private Use Area code point allocations for multimedia programmed learning authorship within Unicode text files might be printed out and filed by industrial librarians.

Although Private Use Area code point allocations have no standing in relation to the Unicode Standard, there is no reason why they should not be used consistently and widely within a specialist domain, such as, for example, digital interactive broadcasting. Indeed, Private Use Area code points could be widely used for some activities such as multimedia authoring generally. 
I feel that it needs to be pointed out that many people are not allowed to post in public forums or to comment publicly on technical matters and ideas which relate to their employment, so lack [sic] of response to my ideas is no indication of any lack of interest. However, it might indeed be that there is no interest in my code point allocations, yet that is the chance which I, as an inventor, need to take when trying to follow the publication option to get an invention implemented. It worked for my telesoftware invention however, as that invention is now at the centre of digital interactive television systems and the word telesoftware is in the Oxford English Dictionary. William Overington 28 May 2003
Re: Ancient Greek
Chris Hopkins wrote as follows. quote I am a new list member interested in implementing archaic, classical and Hellenistic Greek glyphs in a Unicode font. My initial questions will be focused on handling multiple alternate glyphs for each character, and how to organize a font with several thousand Hellenistic monograms. Is this the appropriate discussion list? If not, I'd appreciate a pointer. end quote This looks an interesting discussion and I hope that you will ask your questions in this forum. The matter of multiple alternate glyphs for each character seems at first a font issue, and it is partly a font issue, yet it is also a Unicode issue once one starts trying to encode a document which is intended to apply those glyphs in some controlled selection manner. For example, are you going to have some texts such as Author A uses the symbol X for beta whereas author B uses the symbol Y for beta. where X and Y are just two of the multiple alternate glyphs which you mentioned? What please is a Hellenistic monogram? I am wondering whether this is going to be a good application of the Private Use Area, either on a permanent basis or on a temporary basis pending making a formal encoding application. In either case, reading about the Private Use Area in Chapter 13 of the Unicode specification available from the http://www.unicode.org webspace may prove interesting. William Overington 4 April 2003
Re: Exciting new software release!
Doug Ewell wrote as follows.

quote What happened to LTag? Well, as everybody knows, the Unicode Technical Committee strongly discourages the usage of these tags, to the point where they were almost deprecated earlier this year. They are permitted only in special protocols, and are certainly frowned upon for use in arbitrary plain text, which is what LTag was for. So, in an attempt to restore some of my lost Unicode street cred I removed LTag from my site. I still keep the program around, but only as a reference to ISO 639 and 3166 codes. end quote

Well, whether the tags were (italics) almost (end italics) deprecated earlier this year I do not know, yet the fact is that, after a lengthy and extended Public Review process as to whether to deprecate them, the tags were not deprecated: the situation was left broadly unaltered, with some additional notes to be included in the Unicode 4.0 document. It remains to observe what is to be put about tags in the Unicode 4.0 book. Whether tags will be used in interactive broadcasting as a feature used in (italics) some (end italics) content, such as with (italics) some (end italics) generic file handling packages for distance education, remains for the future, yet the option remains open. William Overington 4 April 2003
Re: Exciting new software release!
Doug Ewell wrote as follows. I'll mail it, or maybe repost it, after I finish applying a nice, THICK coating. I'm thinking about one of those expired-shareware message boxes where the OK button is disabled for the first five seconds. But I'd like to get this third-subtag question resolved first. Could you possibly consider making the checking facility a checkbox option please, which comes up already checked, so that explicit unchecking needs to be done in order not to have the checking. I am not thinking of going against recognized standards but always having checking might end up causing problems as time goes on. William Overington 4 April 2003
Re: Exciting new software release!
Stefan Persson wrote as follows.

quote Well, let's say that I make a plain text document and include a mathematical formula or function such as cos x, it would still be legal to use an italic x from the mathematical block, wouldn't it? This is what those characters are intended for, right? end quote

In the days of letterpress printing, something such as y = cos x would have been set with the cos in roman type, probably from an ordinary serifed font, as might be used for ordinary book printing, and the y and the x in the italic version of the same typeface. I remember that the typeface Modern Roman, a serifed face with an upward hook on the end of a capital R character and a very open lowercase e character, was often used, though not exclusively. How should that be set in Unicode plain text? Is it to use the letters for cos from the range U+0020 to U+007E and then use U+1D466 for the y and U+1D465 for the x?

I note that U+1D465 MATHEMATICAL ITALIC SMALL X in the code chart has the following text accompanying the definition, following a symbol which looks like a wavy equals sign, with the word font within angled brackets which I will not place in this email in case it upsets any email systems, so I will herein use parentheses.

(font) 0078 x latin small letter x

Yet there would seem to be missing the concept that the character is an italic of a serifed font.

When trying the MathText program I attempted, as I mentioned before, to get MathText to produce Greek characters. This was mainly out of curiosity, having been studying, as part of the process of studying MathText, the U1D400.pdf code chart document, rather than any immediate need, though with the thought that such a facility might be useful sometime and that, should such a situation arise, I could perhaps use MathText to generate the codes. Yet which Greek characters would I wish to use? Subsequent study of the U1D400.pdf document raises an interesting matter. 
I would probably want to use some of those in the range U+1D6FC MATHEMATICAL ITALIC SMALL ALPHA through to U+1D71B MATHEMATICAL ITALIC PI SYMBOL. However, whereas I might well want to use U+1D6FC, the U+1D71B is a symbol which I have not seen before and indeed wonder what it is, bearing in mind the existence of U+1D70B MATHEMATICAL ITALIC SMALL PI.

Yet the interesting point which has arisen is this. The most common use of such italic letters, judging from my own potential usage, would seem to be for the angles theta, phi and psi when expressing rotation angles.

U+1D713 MATHEMATICAL ITALIC SMALL PSI for psi.

U+1D703 MATHEMATICAL ITALIC SMALL THETA for theta, rather than using U+1D717 MATHEMATICAL ITALIC THETA SYMBOL.

U+1D719 MATHEMATICAL ITALIC PHI SYMBOL for phi, rather than using U+1D711 MATHEMATICAL ITALIC SMALL PHI.

I seem to remember a discussion in this group about the two versions of phi in relation to ordinary Greek characters some time ago. William Overington 4 April 2003
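As a sketch of the mapping being discussed: the Mathematical Italic letters sit at fixed offsets from U+1D434 (MATHEMATICAL ITALIC CAPITAL A) and U+1D44E (MATHEMATICAL ITALIC SMALL A), with the one irregularity that the italic small h is the pre-existing U+210E PLANCK CONSTANT, the slot U+1D455 being a reserved hole in the block. A minimal converter, leaving "cos" in ordinary letters as the letterpress convention above suggests:

```python
# Sketch: map ASCII letters to the Mathematical Italic letters of the
# Mathematical Alphanumeric Symbols block, as one might when setting
# "y = cos x" with roman "cos" and italic variables.

ITALIC_SMALL_A = 0x1D44E    # MATHEMATICAL ITALIC SMALL A
ITALIC_CAPITAL_A = 0x1D434  # MATHEMATICAL ITALIC CAPITAL A

def math_italic(ch):
    """Return the mathematical italic form of an ASCII letter."""
    if ch == "h":
        # U+1D455 is a reserved hole; the italic h is the pre-existing
        # U+210E PLANCK CONSTANT.
        return "\u210E"
    if "a" <= ch <= "z":
        return chr(ITALIC_SMALL_A + ord(ch) - ord("a"))
    if "A" <= ch <= "Z":
        return chr(ITALIC_CAPITAL_A + ord(ch) - ord("A"))
    return ch  # leave non-letters (spaces, "=", digits) unchanged

# "cos" stays roman; only the variables become italic.
formula = math_italic("y") + " = cos " + math_italic("x")
print(" ".join(f"U+{ord(c):04X}" for c in formula))
```

This is essentially what a program such as MathText does for its Bold, Italic and Fraktur styles, style by style.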
Re: Exciting new software release!
It certainly is exciting! I learn a lot from your fun Doug. I remember when we had The Respectfully Experiment and I asked you how you managed to get the U+E707 character into your message and you mentioned the SC UniPad program from the http://www.unipad.org webspace. That program is very useful for various purposes, I have used it in relation to preparing text with colour codes for research about broadcasting and indeed I have been using it to analyze the output from using your MathText program. Some information about the colour code experiments, and a link to a font with which one can experiment, are in the following web page. http://www.users.globalnet.co.uk/~ngo/font7001.htm I used a file, produced using Notepad, named mathin.txt with the following text. This is a test. I processed this file through MathText using the Fraktur style using mathout.txt as the output file. I then used File | Open in SC UniPad to open the file mathout.txt as a UTF-8 file. There was the display in Fraktur letters. Wow! So, I then did an Edit | Select All on the Fraktur text, followed by an Edit | Convert | Unicode to UCN. This gave a stream of ordinary text in \u and \U format, each \u sequence having four hexadecimal characters after the \u and each \U sequence having eight hexadecimal characters after the \U. Wow again! I did not realize that SC UniPad would do such a conversion! These tests were carried out on a PC running Windows 98. I am now wondering whether I can convert the text into surrogate pairs so that I can both read the \u sequences for the surrogate pairs in SC UniPad and so that I can copy the surrogate characters themselves onto the clipboard for pasting into the text box of a Java applet. Have you considered the possibility of a similar program to encode a string of ASCII characters as plane 14 tags please, with an option checkbox to include the U+E0001 character at the start and an option checkbox to include a U+E007F character? 
That would be a very useful program which could be used in conjunction with SC UniPad to marshal plain text which uses language tags. Such a program would be a very useful tool to have available for access level content production, for producing content for free-to-the-end-user distance education for broadcasting around the world upon the DVB-MHP platform for interactive television.

Recently I was thinking about the possibility of defining a few Private Use Area characters in one or both of planes 15 and 16, so as to try to gain experience of applying those Private Use Areas up in the mountains for use if and when such use becomes desirable. I am thinking of the long term possibility of a music font being defined there as one possible application. However, for the moment, something more general, such as a few symbols for vegetables, just to gain experience of what is involved. For example, how would one produce a display (not necessarily a web page display) of the text of the following song together with a few graphics of vegetables if the whole document were encoded as plain text with the illustrations of the vegetables encoded as Private Use Area characters from plane 15 or plane 16?

http://www.users.globalnet.co.uk/~ngo/song1015.htm

As a direct consequence of using SC UniPad with characters from beyond plane 0 as a result of your posting, I have found that the CTRL Q facility of SC UniPad may be used to enter five and six character hexadecimal sequences which are within the Unicode code space and that such characters may then be converted to the \U and eight hexadecimal characters format.

Looking at the U1D400.pdf document for which you provided a link in your document about the program, and considering the MathText dialogue box, I am wondering whether one can start with an ASCII file produced with Notepad and use MathText to reach the various mathematical Greek characters shown in the U1D400.pdf document. Is that possible at present? 
I tried with an Alt 130 and an Alt 225 in the .txt file following A and B and before C and D and requested Bold of MathText just to see what happened, but only the A and B came out. Thank you for posting details of an interesting program which is a catalyst for interest in applying the higher planes of Unicode. William Overington 3 April 2003
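A program of the kind requested above is straightforward to sketch, assuming the Plane 14 tag scheme as then specified: U+E0001 LANGUAGE TAG opens a tag, each printable ASCII character is shifted up by 0x0E0000 into the tag-character range, and U+E007F CANCEL TAG closes it. A second helper renders the result as \u escapes with UTF-16 surrogate pairs, in the style of the SC UniPad Unicode-to-UCN conversion mentioned above. (The tag characters were later deprecated for language tagging in subsequent versions of Unicode.)

```python
# Sketch: encode an ASCII string as Plane 14 language-tag characters,
# with optional U+E0001 LANGUAGE TAG prefix and U+E007F CANCEL TAG
# suffix, then render the result as \u escapes using surrogate pairs.

def tag_encode(text, begin=True, cancel=True):
    """Shift printable ASCII text into the Plane 14 tag-character range."""
    out = "\U000E0001" if begin else ""
    out += "".join(chr(0x0E0000 + ord(c)) for c in text)
    if cancel:
        out += "\U000E007F"
    return out

def to_ucn(text):
    """Write text as \\uXXXX escapes, using UTF-16 surrogate pairs for
    characters beyond the Basic Multilingual Plane."""
    pieces = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            pieces.append(f"\\u{0xD800 + (cp >> 10):04x}")  # high surrogate
            pieces.append(f"\\u{0xDC00 + (cp & 0x3FF):04x}")  # low surrogate
        else:
            pieces.append(f"\\u{cp:04x}")
    return "".join(pieces)

print(to_ucn(tag_encode("en")))
```

The option checkboxes for U+E0001 and U+E007F in the requested program correspond to the begin and cancel parameters of the sketch.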
Displaying languages of the Indian subcontinent upon the DVB-MHP platform.
John Clews wrote as follows.

quote In fairness, you ought to take account of the fact that languages of the Indian subcontinent have been displayed on TV systems in India for nearly ten years, based around ISCII. Is there a reason for doing anything different for DVB-MHP? Or are mappings and similar accounted for in your paper? end quote

Many thanks for your note. The whole text of my document is in the posting. I know very little about Indian languages. It is just that I know that DVB-MHP uses Java and Unicode and that the built-in font for the minimum DVB-MHP television set is oriented very much toward European languages. Please see Annex E of the DVB-MHP specification, available from the http://www.mhp.org webspace.

I have suggested in another document in the DigitalTV forum in the http://www.cenelec.org webspace some additional characters which I feel would be good additions to the built-in font for European Union interactive television. This is entirely in compliance with the DVB-MHP specification, as the specification provides many options for local implementation and specifies a minimum implementation. Within the European Union there will potentially be a local implementation, though covering the whole of the European Union, so suggesting some extra characters to be in the built-in font of all such televisions is just part of the process of deciding which options to include in the local implementation for the European Union.

I am simply trying to point out that using languages of the Indian subcontinent upon the DVB-MHP system with its PFR0 font system may cause problems, in the hope that experts will look at the problem soon, as there at present seems to be little (or maybe no) intersection between the set of people who know about DVB-MHP and the set of people who know about languages of the Indian subcontinent expressed in Unicode. 
I am concerned that if nothing is done there will be lots of interoperability problems in a few years' time, whereas looking at the problem now could save a lot of problems later. The possibilities for using the DVB-MHP system for education around the world are enormous. The Indian subcontinent is one major potential area of such use. I am concerned to try to ensure that there is a good infrastructure in place so that the languages of the Indian subcontinent may be used in a Unicode manner upon the DVB-MHP platform in a straightforward manner.

On the specific matters which you mention, I had no knowledge of the fact that languages of the Indian subcontinent have been displayed on TV systems in India for nearly ten years, based around ISCII. That is interesting to know. Is that on a teletext system or what? However, as far as I know ISCII is an 8-bit encoding system (and I do mean as far as I know because I am not certain of that) whereas DVB-MHP uses Unicode. So not including mention of ISCII in my document is no problem as far as I know. Mappings from ISCII to Unicode are not mentioned in my document.

I started from considering that someone had encoded some text written in a language of the Indian subcontinent into Unicode and that it was in a text file ready for broadcasting, and considered the process of getting the text displayed upon the screen of a DVB-MHP television. I then pointed out what, to me, seems a problem which presently exists, in that a PFR0 font, as far as I can tell, is not a smart font format. I am hoping that, by having published the paper in the DigitalTV forum in the http://www.cenelec.org webspace, the matter may be resolved within the context of the setting of content authoring guidelines for interactive television which are to be produced for the European Union. Certainly, I cannot resolve the matter myself as I do not have the linguistic knowledge necessary to do so. 
I suggest using glyphs mapped to the Private Use Area from U+EC00 for this specific application of Unicode upon the DVB-MHP platform, though not broadcasting using those code points in relation to languages of the Indian subcontinent, just using them locally for font access after they are generated using a eutocode typography file. I have since been wondering what is the position of displaying Arabic text using a PFR0 font upon the DVB-MHP platform. Does a similar problem exist? Would the set of Arabic presentation forms encoded into Unicode be sufficient for the task, so that a Private Use Area encoding would not be necessary? The right to left display of Arabic text is another factor which needs consideration in relation to the DVB-MHP system. Thank you for your interest. William Overington 3 April 2003
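The local conversion step proposed here, rewriting an incoming Unicode text stream into glyph code points at U+EC00 onward before the PFR0 font is consulted, can be sketched as a longest-match substitution pass. The rule table below is a toy with hypothetical glyph assignments (a real table would come from the standardized glyph list the document calls for); the example rule maps the Devanagari sequence KA + VIRAMA + SSA to a single conjunct glyph.

```python
# Sketch of the local (within-the-receiver) conversion described above:
# rewrite an incoming Unicode text stream into glyph code points at
# U+EC00 onward before the font is consulted. The rule table is a toy;
# the glyph assignments are hypothetical, not a standardized list.

# Longest-match rules: Unicode character sequence -> PUA glyph slot.
rules = {
    "\u0915\u094D\u0937": "\uEC00",  # KA + VIRAMA + SSA -> conjunct glyph
    "\u0915": "\uEC01",              # KA alone -> plain ka glyph
}

def to_glyph_stream(text, rules):
    """Rewrite text by longest-match substitution from the rule table."""
    max_len = max(len(k) for k in rules)
    out = []
    i = 0
    while i < len(text):
        for n in range(max_len, 0, -1):   # try the longest match first
            chunk = text[i:i + n]
            if chunk in rules:
                out.append(rules[chunk])
                i += n
                break
        else:
            out.append(text[i])           # no rule: pass through unchanged
            i += 1
    return "".join(out)

print(hex(ord(to_glyph_stream("\u0915\u094D\u0937", rules))))
```

Real Indic shaping needs reordering as well as substitution, so this pass is only the simplest part of what a eutocode typography file would have to express.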
Re: Exciting new software release!
In the interests of some fun research in the hope that the fun will lead to learning in some serendipitous manner I am starting off some Private Use Area codes for vegetables.

U+10F700 POTATO
U+10F701 CARROT
U+10F702 PARSNIP
U+10F703 PEA
U+10F740 PEAS IN A POD
U+10F780 LEAF OF MINT
U+10F781 LEAF OF SAGE

These should be enough to get started in experimenting with the way that Private Use Area characters from plane 16 can be applied and finding out what the problems are in relation to any particular platforms, file formats and font technologies. William Overington 3 April 2003
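One concrete thing such experiments would reveal is how plane 16 characters travel through the encoding forms: in UTF-8 they occupy four bytes and in UTF-16 a surrogate pair, which some software of the period handled badly. A quick check using the hypothetical U+10F700 POTATO allocation above:

```python
# Check how a plane 16 Private Use Area character, such as the
# hypothetical U+10F700 POTATO above, is carried by the Unicode
# encoding forms: four bytes in UTF-8, a surrogate pair in UTF-16.

potato = chr(0x10F700)

utf8 = potato.encode("utf-8")
utf16 = potato.encode("utf-16-be")

print(utf8.hex())   # four bytes
print(utf16.hex())  # two 16-bit units: high then low surrogate

# The surrogate pair can also be computed by hand.
cp = 0x10F700 - 0x10000
high = 0xD800 + (cp >> 10)
low = 0xDC00 + (cp & 0x3FF)
print(hex(high), hex(low))
```

Any platform, file format or font technology that mishandles either the four-byte UTF-8 sequence or the surrogate pair will fail on these plane 16 test characters.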
Displaying languages of the Indian subcontinent upon the DVB-MHP platform.
I have now completed and published my document on the topic of displaying languages of the Indian subcontinent upon the DVB-MHP platform. DVB-MHP stands for Digital Video Broadcasting - Multimedia Home Platform. Details of the DVB-MHP system are available from the http://www.mhp.org webspace. There is also the http://forum.mhp.org webspace which may be joined online. DVB-MHP is likely to become the common interactive television standard throughout much of the world. However, the standard provides many options, defining a minimum system and leaving open many options for implementation decisions.

The document which I have recently completed is published in the DigitalTV forum in the http://www.cenelec.org webspace. This forum is a specialist forum regarding the implementation of interactive television, using the DVB-MHP system, within the European Union, involving such issues as interoperability. Readers interested in joining this forum may like to know that an email address for making application is [EMAIL PROTECTED] and that there is also at present a notification about the purpose of the forum available using a link in the http://www.cenelec.org webspace. The fact of membership is visible to other participants. Although not mandatory, it is quite likely that what is decided for use within the European Union will be used in many countries which are not within the European Union.

The file has the following name, in accordance with the file naming conventions of the forum.

DigitalTV_WJGO0005_Languages_of_the_Indian_subcontinent.txt

A transcript of the text of the document is below. William Overington 2 April 2003

Displaying languages of the Indian subcontinent upon the DVB-MHP platform.

I wonder if I may please draw your attention to a potential problem with the displaying of the languages of the Indian subcontinent upon the screens of DVB-MHP interactive televisions. The DVB-MHP system uses Unicode. The DVB-MHP system also uses a Portable Font Resource PFR0 font. 
I am not a linguist so I am simply mentioning the following document. http://www.unicode.org/book/ch09.pdf It is Chapter 9 of The Online Edition of The Unicode Standard, Version 3.0, the chapter being entitled South and Southeast Asian Scripts. It appears, as far as I can tell, that a PFR0 font cannot display the languages of the Indian subcontinent directly from a sequence of Unicode characters. The Online Edition of The Unicode Standard can be downloaded from the following web page, chapter by chapter. http://www.unicode.org/book/u2.html The main index page of the Unicode web site is as follows. http://www.unicode.org I have thought out what I consider to be a way to solve the problem of displaying the languages of the Indian subcontinent using software within a Java program running upon the DVB-MHP platform. The method is described in the following document. http://www.users.globalnet.co.uk/~ngo/ast03300.htm The method uses what I have called a eutocode typography file. However, in order for the method to be highly effective for the DVB-MHP platform at an interoperability level, what is really needed is a standardized list of glyphs for displaying the languages of the Indian subcontinent so that those glyphs may be mapped to U+EC00 onwards of the Private Use Area of Unicode. This would not be essential, yet is, I feel, highly desirable, because if such a list can be produced and the same list used by all content authors who produce content for broadcasting upon the DVB-MHP platform using languages of the Indian subcontinent, then lots of repeated work can be avoided in the future and there will be advantages for interoperability of font generation. For the avoidance of doubt, please know that I am not suggesting that those Private Use Area code points be used for broadcasting text. Text would be broadcast using regular Unicode code points. 
The reason for assigning the glyphs to code points is so that the incoming text stream can be converted into a local, within the television set, text stream which can be used to access the PFR0 font so as to enable the correct glyphs to be displayed upon the screen. Study of the document above will show that that particular choice of Private Use Area code points could also protect against any broadcasting of the languages of the Indian subcontinent using those Private Use Area codes to access the glyphs directly as those code points when broadcast could be regarded as data for a vector graphics system which could be used for drawing illustrations within a document. The vector graphics data does not need to access the font, so the locations in the font can be used for this purpose on a local, within the television set basis. If such a list can be produced within the context of the setting of content authoring guidelines which are to be produced for the European Union, with appropriate liaison with the government of India, then the task can be carried out once within
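The local conversion step described above — incoming broadcast text in regular Unicode rewritten, within the television set, into a stream of glyph codes from U+EC00 onwards which index the PFR0 font — could be sketched as follows. This is a minimal sketch only: the table entries are hypothetical placeholders, not a real standardized glyph list, and a real implementation on the DVB-MHP platform would be written in Java.

```python
# Hypothetical mapping from broadcast Unicode sequences to local glyph
# codes starting at U+EC00.  The two entries here are illustrative only:
# a Devanagari KA + VIRAMA + SSA conjunct, and plain KA.
GLYPH_TABLE = {
    "\u0915\u094D\u0937": "\uEC00",  # KA + VIRAMA + SSA -> conjunct glyph
    "\u0915": "\uEC01",              # KA alone -> plain KA glyph
}

def to_local_glyph_stream(text: str) -> str:
    """Longest-match rewrite of broadcast text into local glyph codes.

    The result is used only locally, to index the PFR0 font; the
    broadcast text itself stays in regular Unicode code points.
    """
    out = []
    i = 0
    keys = sorted(GLYPH_TABLE, key=len, reverse=True)  # longest match first
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(GLYPH_TABLE[k])
                i += len(k)
                break
        else:
            out.append(text[i])  # pass through anything not in the table
            i += 1
    return "".join(out)
```

The longest-match rule matters: the conjunct sequence must be tried before its first consonant alone, otherwise the conjunct glyph would never be selected.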
Re: Characters for Cakchiquel
Actually, I was rather hoping that the start of a Private Use Area encoding might be produced by a few interested people fairly quickly, perhaps in this thread or in some email correspondence. Once that is done, then font support could gradually be produced. William Overington 29 March 2003
Re: Characters for Cakchiquel
Phil Blair wrote as follows. quote 2. The Jesuits and other missionaries of the Age of Exploration worked and published intensively in then-exotic languages on four continents. There are scholars and groups of scholars now attempting to look systematically at that body of work. I suspect that there is no strange character that could turn up in a Maya text from that period that wouldn't also turn up in texts about South American, Asian, or African languages, and when we do deal with these characters it would be best to do it in a systematic and comprehensive way. They will all reflect a common origin in the missionary training institutions of Europe. end quote That research sounds fascinating. Do you have any details of who is doing the research please? I am not a linguist yet do have a great interest in the typographical aspects of the way special characters were printed by the early printers. I also have an interest in history so such a project would be doubly interesting for me. I suggest that a good idea would be if those of us who are interested could research the typography and printing aspects and that a Private Use Area encoding could be made of the special characters. Then various craft fontmakers might all use the same encoding and start to produce fonts which contain the characters. For example, as a first suggestion, if U+E400 and upwards were used for that purpose, would that be a suitable choice for the various font makers who might like to consider adding such characters into their existing fonts? The long term goal would be to get the characters promoted into regular Unicode, yet using the Private Use Area would allow documents to be encoded rather sooner than if one needed to wait for encoding into regular Unicode, and any such documents encoded could be converted by an automated process at a later date. Indeed, using the Private Use Area in this manner and having font availability might help the research. 
My suggestion of U+E400 is as a basis for discussion: does anyone happen to know if the researchers have already started a Private Use Area encoding please as that possibility needs to be checked before starting a new encoding? Does anyone happen to know if any of the metal fonts, or matrices, of such characters survive from the sixteenth century please? From the general history of printing there does seem to be a great lack of surviving early printing type, which has always seemed strange to me, as well as unfortunate. William Overington 28 March 2003
Re: Custom fonts
Pim Blokland asked as follows. quote Now my suggestion was the browser program which displays this file should be able to look at the font information in the XML file, open the font file and retrieve the names of all characters in it, so it can show the &hwesta; character (and all other characters) without needing a long list of ENTITY entries in the XML. Anyone else think this would be a good idea? end quote Well, I think it would be a good idea. Could you explain it further please? For example, starting from a golden ligatures collection character ct ligature, which I have designated as U+E707 within the Private Use Area within the golden ligatures collection. Does this mean that for each Private Use Area item which I specify I would need to specify a single word name for use in such constructs? I am happy to do that, thinking that g_ct would be a suitable name for the golden ligatures ct item. I could fairly easily devise such names for most of the golden ligatures collection and with a little thought will hopefully be able to devise suitable names for the rest. Am I right in thinking that this system will only really work if the names are unique, so that if someone else devises a code for ct at some other code point then it is important that the name for that usage is other than g_ct, or is it not essential, though just desirable, for the names to be unique? Can you possibly post an example of what files would need to carry which information please, so that the g_ct name could be used in the manner which you suggest? William Overington 19 March 2003
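For what the "long list of ENTITY entries" alternative would look like, a name such as g_ct tied to the Private Use Area code point U+E707 corresponds to one DTD entity declaration per ligature. A minimal sketch, assuming a hypothetical table of names (only g_ct comes from the posts above; g_st is invented for illustration):

```python
# Hypothetical table tying single-word names to Private Use Area code
# points.  Only g_ct -> U+E707 is from the golden ligatures collection
# as discussed above; g_st is an invented second entry.
LIGATURE_NAMES = {
    "g_ct": 0xE707,  # golden ligatures collection ct ligature
    "g_st": 0xE708,  # hypothetical further entry
}

def entity_declaration(name: str) -> str:
    """Build the DTD entity declaration for one named ligature."""
    return '<!ENTITY %s "&#x%04X;">' % (name, LIGATURE_NAMES[name])
```

So a document's DTD would carry `<!ENTITY g_ct "&#xE707;">`, and `&g_ct;` in the text would resolve to the Private Use Area character; uniqueness of the names matters because two declarations of the same entity name cannot coexist usefully in one DTD.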
Re: List of ligatures for languages of the Indian subcontinent.
might like a copy. It is not a Unicode font, so that it can be easily used with the Paint program, though I am considering a Unicode version yet wondering quite how best to encode it. http://www.users.globalnet.co.uk/~ngo/OLD_NEW_.TTF William Overington 18 March 2003
List of ligatures for languages of the Indian subcontinent. (from Re: per-character stories in a database)
And nobody out there is volunteering to do it. I would do it gladly, but I do not have any skills at Indian languages. My opinion is that the list is important for the future of digital interactive broadcasting so I am trying to get the list done so that it is ready for use in displaying distance education texts in interactive broadcasting situations across the Indian subcontinent using my telesoftware invention. I was told that I could commission it. I described what I thought was a good design brief for the list and asked how much it would cost. I am still waiting to find out. A lot of the information needed to prepare the numbered list is apparently in files, it is just that it is not available to people. If the Unicode Consortium really does not wish to include this important project within its scope, then it will need to be achieved in some other manner. I would have thought that whether the Unicode Consortium will take this project on or not should go to a formal board meeting of the Unicode Consortium so that there can be no doubt whatsoever of the provenance of any decision. William Overington 17 March 2003
Re: per-character stories in a database (derives from Re: geometric shapes)
then be such that royalties go to the United Nations for ever to help with health care around the world. Just think, the operas of Gilbert and Sullivan go out of copyright in about two years time, so perhaps they could be the providers of such money. The idea is not perhaps as far fetched as it might at first sound. Please look at the provisions in British Law (in an Act of 1988 about intellectual property I think) where there is a specific provision in relation to the one work Peter Pan. Where the author of a work is a corporation the last seventy years of copyright starts ticking right from publication, so perhaps some of the great movies would soon produce such income. Just an idea at present, but maybe once this posting goes around the world lots of people might think about it and maybe someone can get it done. William Overington 15 March 2003
Re: Ligatures fj etc (from Re: Ligatures (qj) )
Yesterday, 13 March 2003, I wrote as follows. quote So I reasoned that the system might scan through a font when it is loaded and decide upon the lowest point for the whole font and then proceed on that basis. end quote An email correspondent has kindly written to me privately and I now know that it is not necessary for an application such as a wordprocessing package to make a complete survey of all the glyphs in a font as the font is being loaded, because the information on what are the high and low points for the font is readily available in predefined locations within the font. I expect that many readers of this list already know that, yet I feel that I should post this note in case some readers do not because I would not want to have set them off on a wrong way of looking at how a system works. William Overington 14 March 2003
Unicode 4.0 chapter headings and numbering.
I wonder if you could please say whether the Unicode 4.0 book will have the same chapter headings and numbering as the Unicode 3.0 book? My reason for asking is that I am writing a paper about the possible problems with using languages of the Indian subcontinent on the DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) interactive television platform where the PFR0 Portable Font Resource system is used for those fonts which are broadcast. DVB-MHP uses Java and Unicode. I want to refer to Chapter 9 South and Southeast Asian Scripts as the place to look for the details of what is necessary, yet the paper needs to be usable both before and after the publication of Unicode 4.0, so I would like to know if the chapter headings and numbering will be unchanged please as I would like simply to refer readers to Chapter 9 South and Southeast Asian Scripts of the Unicode Standard at the http://www.unicode.org webspace. Also, if unchanged, is that a matter of continuing stability for future issues as well, or is it just for Unicode 4.0 please? William Overington 14 March 2003
Re: per-character stories in a database (derives from Re: geometric shapes)
Markus Scherer wrote as follows. quote It has been suggested many times to build a database (list, document, XML, ...) where each designated/assigned code point and each character gets its story: Comments on the glyphs, from what codepage it was inherited, usage comments and examples, alternate names, etc. I am talking about both code points and characters on purpose, and I would go a step beyond documenting what's there. All the characters that can be represented by a sequence of assigned Unicode characters should be listed, with that sequence (or those sequences), and with further explanation if necessary. end quote Yes, that is a very good point. I have become interested in the languages of the Indian subcontinent from the standpoint of trying to ensure that they can be displayed properly using interactive television using portable font technology, however I am not a linguist and I find it strange that the Unicode Standard does not codify the ligatures which can be produced with the languages of the Indian subcontinent at display time using specific sequences of regular Unicode characters so that someone skilled in the art of font design may design a font from the code charts. Later he wrote. quote Now we just need to - find someone to sponsor this effort technically and with humanpower - squeeze the existing information out of the standard, the mailing lists, FAQs, and of course out of the Unicode veterans before they retire by Unicode 6... end quote Well, how about an approach like Project Gutenberg uses for proofreading transcripts of classic books. If there were a database where people could post items about particular characters and people could read them and either confirm what is said or put some other view or just add some other information, then maybe the database could just sort of gradually become generated over a period of years. How big would that be? 
About 100 thousand code points, at, say, 200 words each on average, at about 5 or 6 characters per word plus a following space, would be about 130 megabytes in total. I fully realize that the phrase sort of gradually might easily be quoted in a response to this posting, yet if the database facility were there, accessible directly from the web, there may well be many people who would stop by for a while and review what has been entered and add a little more to the database. quote PS: Sorry, I am not in a position to volunteer... end quote Well, it could be more of an informal thing. If the facility were set up, then people who are interested could simply visit the web site when they felt like participating. Certainly there might be a core of people who had the ability to throw out rubbish and to convert fragments of text into a good English narrative so that there was some overall structure to it all, yet it does not necessarily need to be as formal and rigid as if it were a commercial project with a time deadline, particularly if the alternative is that it does not get done at all. William Overington 14 March 2003
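The back-of-envelope estimate above works out as follows, taking the midpoint of 5 or 6 characters per word and counting the trailing space:

```python
# Size estimate for the proposed per-character story database.
code_points = 100_000      # roughly the number of designated code points
words_each = 200           # average story length, as assumed above
chars_per_word = 6.5       # midpoint of 5-6 characters, plus one space
total_bytes = code_points * words_each * chars_per_word
megabytes = total_bytes / 1_000_000  # 130.0
```

That is, 100,000 × 200 × 6.5 = 130 million characters, or about 130 megabytes of plain text before any markup or indexing overhead.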
Ligatures fj etc (from Re: Ligatures (qj) )
Thank you both for your responses. Yes, U+2502 or U+2503 would achieve the desired effect for which I devised U+E700 STAFF without resorting to the Private Use Area. The only reason for my not using one of those was that I was unaware of those codes as such. An interesting point is that they appear to be usable with fonts which have descenders yet still fill the entire height of the font. I suppose that when, some time ago, looking through what Unicode offers in a general context, not looking for the STAFF effect at that time, I saw the box drawing characters, I thought of them in the context of the character set of the old PET computer from the 1970s and of the way that some software on older non-graphics terminals on mainframe computers makes an attempt at message windows using such characters to construct boxes. Indeed, an interesting footnote to U+2502 states = Videotex Mosaic DG14. I cannot quite remember what Videotex was. I remember Videotext (with a t at the end) and seem to remember that Videotex (no t at the end) was a different system, possibly from the USA or maybe France. There was also a system called NAPLPS, which was an acronym for something like North American something, and the word Presentation was in it, though I forget the exact derivation. I was unaware of the VDMX table and so had a look at http://www.yahoo.com and found a couple of useful documents. However, VDMX appears to refer specifically to OpenType rather than ordinary TrueType. 
My reason for including the STAFF character, the intended effect of which I can now produce using U+2502 or U+2503, was that, being fairly new to producing fonts and just, thus far, using the Softy editor to produce ordinary TrueType fonts, I had noticed, when trying it out in 2002, that if I produce a font with a b c d e f then the font displays with lines packed together, yet that if I then add g the line spacing for all lines increases, even if there is no g in that line. So I reasoned that the system might scan through a font when it is loaded and decide upon the lowest point for the whole font and then proceed on that basis. Now, in defining Quest text I wanted to have the possibility of accents on capital letters, and descenders such as y and g, always looking clear, so I decided effectively to lock some leading into the font and set the maximum height right from the start. Features of Quest text are that it is designed so that characters are produced directly from drawings in the Softy editor, not from template graphics, and that Quest text is designed, as far as possible, by the application of a set of rules: verticals are all 256 font units wide, with both edges at a font unit value which is a multiple of 256; horizontals are all 168 font units in vertical height, with one edge at a font unit value which is a multiple of 256; and corners which are curved use a single Bézier curve which has an action length, as I call it, of 128 font units in both horizontal and vertical directions. Some characters, such as x and k, are exceptions to the general rules, yet Quest text is largely made up of horizontals and verticals, including for letters such as A O e and s. The idea is that hopefully Quest text will be very clear at both 12 point and 18 point and that, as point size increases, it will display its artistic look. 
At 300 point, Quest text looks smooth and rounded with an elegant combining of wider verticals with narrower horizontals, almost as if drawn with a pen with a nib 256 font units wide and 168 font units high. The rules do produce the effect though that capitals look lighter than lowercase letters as they are overall wider and yet use the same width verticals. I am wondering whether to consider that a fault or a feature! :-) An important part of the development process of Quest text is to display some text at 12 point in WordPad, make a Print Screen graphic and paste it into Paint and then study the graphic at 8x magnification. Hopefully Quest text combines great clarity with an artistic look. William Overington 13 March 2003
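Because the Quest text design is rule-based, the stroke rules stated above could in principle be checked mechanically. A minimal sketch, assuming stroke edges are given as font unit coordinates (the function names are invented for illustration):

```python
# Checks for the stated Quest text drawing rules: verticals are 256 font
# units wide with both edges on multiples of 256; horizontals are 168
# font units high with at least one edge on a multiple of 256.

def vertical_ok(left: int, right: int) -> bool:
    """A vertical stroke: 256 units wide, both edges on a 256 grid."""
    return right - left == 256 and left % 256 == 0 and right % 256 == 0

def horizontal_ok(bottom: int, top: int) -> bool:
    """A horizontal stroke: 168 units high, one edge on a 256 grid."""
    return top - bottom == 168 and (bottom % 256 == 0 or top % 256 == 0)
```

A check like this would only cover the regular strokes; exceptions such as x and k, and the Bézier corners with their 128-unit action length, would need separate treatment.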
Ligatures fj etc (from Re: Ligatures (qj) )
displayed, though it can be displayed for test purposes if desired. William Overington 12 March 2003
Re: Ligatures (was: FAQ entry)
Pim Blokland wrote as follows, responding to Doug Ewell. quote I suspect it would end when you start talking about combinations like qj and fþ that are unlikely to appear in natural language text. At least gj exists in Hungarian. fb, fh and fk are very common in Dutch (much more so than fj). fþ exists in Icelandic; at least I've found arfþegi. However I don't speak Icelandic, so I've no idea if this is a combination of two subwords. end quote During the spring and summer of 2002 I produced a number of web pages about encodings for ligatures, the encodings using the Private Use Area. Some of the characters mentioned are encoded within the golden ligatures collection. http://www.users.globalnet.co.uk/~ngo/golden.htm I will try to add qj, gj and f thorn in due course. Where I have an f ligature I have added an ff ligature into the encoding scheme, so I expect to add ff thorn as well, just in case it is needed, though I have no knowledge of whether it is ever used; indeed, I was unaware of the possibility of an f thorn ligature until reading this thread. While I am adding some more ligatures to the collection, if anyone wants any other characters added in, please email me privately. I found that encoding the golden ligatures collection led to my learning about a number of interesting aspects of typography of which I was previously unaware, so it was an educational experience for me as well as being fun and useful in practice within its limits. Naturally, my production of the golden ligatures collection does not of itself produce fonts which contain these ligatures, yet it does help a little in making the possibility topical, so maybe a few of the font designers who read this list might perhaps include more ligatures in their fonts. An interesting aspect of my codification of ligatures is that any documents produced using them will not be standard Unicode documents. 
However, the encodings might be very useful so that someone may make artistic typography fonts using a font production program such as the Softy shareware program and be able to produce pages of hardcopy print out locally using such a font, where a ligature character such as ct may be encoded as U+E707. Naturally, there is nothing to stop anyone encoding a ct ligature however he or she chooses within the Private Use Area, yet my collection of encodings is a published, consistent set which would help with interoperability of fonts from various artists. I am currently producing a typeface which I am calling Quest text so that I can have a typeface available which has whatever ligatures I choose. I have so far produced all of the lowercase letters, the digits, full stop and twelve capitals, and also lowercase long s, ash, eth and thorn. I am hoping that the font will be useful for English, Old English and Esperanto in particular, though I can add characters where I choose, using both regular Unicode code points and Private Use Area code points, both from the golden ligatures collection and from other published Private Use Area encodings. I am producing Quest text using the Softy program and am finding it a very effective program. More recently, a new development, designed primarily as a means to produce displays of languages of the Indian subcontinent upon the screens of interactive televisions, using the font format capability of those televisions together with the ligatures of those languages, may be a very useful way to use the ligature encodings of the golden ligatures collection as well. http://www.users.globalnet.co.uk/~ngo/ast03300.htm So, a document in which one wishes to have a ct ligature would have the ct ligature encoded as ct or maybe c ZWJ t depending upon the circumstances, and a .etf file would have one or both of the following lines, depending upon the application. 
ct U+EBEF U+E707 (that is, four characters)

c ZWJ t U+EBEF U+E707 (that is, five characters)

Thus the combination of the golden ligatures collection, an .etf file and various software tools to use them could be an effective way of allowing people to use ligatures on a wide variety of platforms while having the documents containing the original texts encoded using regular Unicode characters only. A text file containing codes from the golden ligatures collection would thus only be used locally on a temporary basis for a current task, though to useful effect. Some of my small fonts produced using Softy are available at the following web page. http://www.users.globalnet.co.uk/~ngo/font7001.htm William Overington 10 March 2003
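The local substitution that such .etf lines describe could be sketched as follows. This is a minimal sketch under the stated scheme: the broadcast document stays in regular Unicode (ct, or c ZWJ t), and only the temporary local display stream carries U+EBEF followed by the golden ligatures code U+E707.

```python
# .etf-style rules from the posts above: both the plain ct spelling and
# the c ZWJ t spelling map to U+EBEF U+E707 for local display.
ETF_RULES = [
    ("ct", "\uEBEF\uE707"),
    ("c\u200Dt", "\uEBEF\uE707"),  # c ZWJ t spelling of the same ligature
]

def apply_etf(text: str) -> str:
    """Rewrite source sequences into their temporary local display codes."""
    # Try longer source sequences first so c ZWJ t is consumed before
    # its inner ct could match.
    for src, dst in sorted(ETF_RULES, key=lambda r: -len(r[0])):
        text = text.replace(src, dst)
    return text
```

The inverse mapping is equally simple, which is what makes the local stream safe to use: the original regular-Unicode text is always recoverable.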
Unicode 4.0 beta characters.
I have now produced a small font which contains my implementation of the U+2614 Umbrella with rain drops character, which is one of the new characters in the Unicode 4.0 beta documents. http://www.users.globalnet.co.uk/~ngo/font7001.htm I have had a go at producing a glyph for U+26A0 Warning sign but am finding it a learning exercise to make it both crisply legible at 12 point yet artistic at larger sizes within the tight constraint of a triangular surround which must itself be clear. When producing the Unicode Standard, is there a point size at which glyphs should be recognizably displayable which is part of the criteria for characters? For example, is it regarded as fine if some characters in some languages cannot be displayed clearly below, say, 24 point? The map flags look interesting yet hopefully straightforward and I hope to have a go at them too, also the high voltage sign. The high voltage sign has the note "best glyph to be found" in the beta document U40-2600.pdf and I wonder what is the significance of that note please? In the same U40-2600.pdf document are six Yijing monogram and digram symbols. I wonder if someone could please say something about the meaning of these characters. Also, and this is I feel an important issue for the beta process which could be of importance for other characters, could someone please give some guidance as to how these characters should be implemented as a piece of electronic type? There is no indication in U40-2600.pdf as to how this set of six symbols should sit within a character cell and relate to one another: whether they should join to each other or must be clear of each other when side by side, and how they should line up with text characters in a font which contains many characters of various types. William Overington 24 February 2003
[Private Use Area application] A font for research in multimedia authorship.
Following discussion yesterday in another thread about changing text colour in multimedia text files, I have today produced a font as a tool for research in multimedia authorship. I have devised glyphs for 19 of the courtyard codes relating to text colour and encoded them in a font. The font is available for free download from our family webspace at the following address. http://www.users.globalnet.co.uk/~ngo/COURTCOL.TTF People interested in having a copy of this font may find the following documents useful in applying the font. http://www.users.globalnet.co.uk/~ngo/courtcol.htm http://www.users.globalnet.co.uk/~ngo/court000.htm I have been experimenting with using the font with WordPad and Word 97 on a PC, where the glyphs give a monochrome indication to an author of which colour is being used. For example, I mixed English text in the Arial font with codes from this font in one document. The whole text of English and colour codes can then be copied onto the clipboard and pasted into SC UniPad (downloadable from the http://www.unipad.org webspace) in order to produce a compact file without the text formatting of WordPad or Word 97. I am hoping to carry out some experiments whereby such text can then be pasted into a text box of a Java applet and produce appropriately coloured text. Readers who would like to comment about the design of the glyphs or about the research are welcome to email me. I have found it interesting to design glyphs to represent colours in monochrome. William Overington 22 February 2003
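The Java applet experiment described above amounts to scanning the pasted text for colour-change codes and splitting it into coloured runs. A minimal sketch, assuming two of the courtyard colour codes given elsewhere in these posts (U+F3E2 for red, U+F3E5 for green); the rest of the 19-entry table, and the default colour, are hypothetical details:

```python
# Two courtyard colour codes mentioned in these posts; the full table
# of 19 codes is not reproduced here.
COLOUR_CODES = {
    "\uF3E2": "red",
    "\uF3E5": "green",
}

def colour_runs(text: str, default: str = "black"):
    """Split text into (colour, substring) runs at each colour code.

    The colour-code characters themselves are consumed, not displayed.
    """
    runs = []
    colour, buf = default, []
    for ch in text:
        if ch in COLOUR_CODES:
            if buf:
                runs.append((colour, "".join(buf)))
                buf = []
            colour = COLOUR_CODES[ch]
        else:
            buf.append(ch)
    if buf:
        runs.append((colour, "".join(buf)))
    return runs
```

A rendering layer, such as the text box of the Java applet, would then draw each run in its associated colour.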
Re: XML and tags (LONG) (derives from Re: Plane 14 Tag Deprecation Issue)
specification relates to plane 14 tags and how the Unicode specification relates to element names in an XML file. I feel that that is the essential point which I am trying to convey. 1. The text MUST be transmitted in UTF-8 (because the CEO of Overington Inc. thinks that UTF-8 is cute). Well, I, as an individual, was thinking in terms of UTF-16. 2. The transmission protocol MUST implement some form of language tagging (the details of the protocol are up to me). Particularly, the system needs to distinguish English text from Italian text, because the two languages will be displayed in different colors (green and red, respectively). Green for English, red for Italian. Are you by any chance a fan of the liveries of motor racing cars of the 1950s? 3. The OveringtonHomeBox(tm) can only accept UTF-8 plain text interspersed with escape sequences to change color. The escape sequences have the form {{color=1}}, where 1 is the id of a color (blue, in this case). If I were writing a one-off program I would use U+F3E2 for red and U+F3E5 for green. http://www.users.globalnet.co.uk/~ngo/court000.htm http://www.users.globalnet.co.uk/~ngo/courtcol.htm However, the issue is not, in my opinion, about one-off programs and proprietary encodings. The issue is ensuring that plane 14 tags are not totally deprecated so that, as an option for use with particular protocols, they continue to be available so that encodings for general computing usage, for general and widespread information availability, on a rigorous non-proprietary encoding basis may be used. 
Certainly, within certain multimedia programs which might at some future time run upon the DVB-MHP platform, codes such as U+F3BC might be particularly useful, yet that is a matter which an individual programmer needs to consider when writing such a program: it is not a standard system, though it is not a proprietary system either in the usual sense of the word, as those codes are published with the hope of being a consistent set which people may use if they so choose. Please note that, notwithstanding your pretend scenario of a company, that is not the way I am proceeding with my research. I invented the telesoftware concept and am doing what I can to get it used effectively and to ensure that it can have scope for future development of content. I regard the continued availability of plane 14 tags as important, as it means that content authors can then use codes which do the job by finding them in an international standard, without having to use what I suggest. I could devise all manner of codes using plane 16 if I wished, copying the plane 14 tags across as a start, yet those codes, no matter how fine, no matter how well publicised in research papers or in a book or whatever, would never have the provenance of the codes in an international standard. That is why, although Private Use Area codes do certainly have a use for research and for concept proving, and also for limited use between two or more people studying some special topic, Private Use Area codes, and XML element names made up by a programmer or even by a committee which is not a standards committee, simply do not come into the same class of provenance quality as plane 14 tags which are in the Unicode standard. That is why I hope that the Unicode Technical Committee will not totally deprecate tags and will leave open the possibility of considering adding additional tag types at some time in the future. 4. The text files being transmitted MUST be small (bandwidth is limited!). 
Yes, keep the text file size down, bandwidth is limited. 5. The processing program must be small (on-board memory is limited!). No, for DVB-MHP the on-board memory is fairly large. The transmission link is the key issue. 6. A working prototype must be ready by tomorrow. Well, this is about the way that these things will be done well into the future. The idea of Unicode is that it will last, not be swept away within ten or twenty years because it is outdated for future needs. I have had a look through the example solutions, but I do need to spend some more time studying them and hopefully trying out the executable programs with some other data files. In the meantime I would be interested to know any further views of Marco and the views of others on this topic. Thank you for taking the time to write your post and prepare the programs. I feel that it is important that this matter be studied thoroughly. William Overington 21 February 2003
Leonardo da Vinci and printing.
I recently enjoyed watching a two-part television programme entitled Leonardo's Dream Machines on Channel 4, which is a television channel in England. Television programmes often get shown around the world and I can certainly recommend this one if you get the chance to watch it on a television channel where you live. As I am interested in typography I noticed the typeface which was used for the captions and the end credits. The font turns out to be Da Vinci Forward and can be viewed at the following website. It is based on the handwriting of Leonardo da Vinci. http://www.p22.com In addition, it can be tried out on-line at the typecaster facility which is at the following web address. http://www.p22.com/typecaster/caster.html I tried out various phrases and indeed made a few Print Screen copies of the texts which I produced. Leonardo da Vinci was born in 1452 in Vinci, a village near Florence, Italy. In 1456 the first printed book was published, in Mainz, a city in what is now Germany. I began to wonder how Leonardo da Vinci related to the invention which took place at about the time he was born, and how that compares and contrasts with how people today relate to the computer, the internet and the web. Leonardo da Vinci could read and write. Searching the web earlier today, the only reference to Leonardo da Vinci and printing that I could find was a short note that he had made some prints of plant leaves. Does anyone happen to know if Leonardo da Vinci read or owned printed books please? Was he involved in printing technology or letter design for fonts or for plaques or stone engraving? Are there any individual copies of books surviving today for which there is provenance that Leonardo da Vinci ever read them, even circumstantial evidence such as, perhaps, a reference to his reading some book while in the service of someone whose collection of books has survived with provenance to the present day?
I find it quite fascinating that Leonardo da Vinci lived in Europe at the same time as printing with movable type developed in Europe, and I wonder whether, when he first became aware of printed books, they were an amazing new thing to him or just came along as an everyday part of how things were, as but one of the things he found out about as he grew up. By the way, if you do have a look at the typecaster facility at the website mentioned above, various fonts may be tried. I tried various fonts and particularly like the Morris Troy font. I found that characters such as e acute and A umlaut are available in this font, using Alt 130 and Alt 142 respectively when keying text into the typecaster window. However, it is not clear to me how those characters are stored in the font itself, that is, whether they use a Unicode layout or an older layout. William Overington 21 February 2003
Hot Beverage font.
Thinking that the symbol U+2615 HOT BEVERAGE, new in Unicode 4.0, might be very useful in the preparation of meeting agendas and the like, and also wishing to try to design a glyph which would look good particularly at a 12 point size in documents, I have produced a font named Hot Beverage which I have now added into our family webspace. The font can be downloaded from the web from the following address. http://www.users.globalnet.co.uk/~ngo/HOTBEVER.TTF The font contains the Hot Beverage glyph which I have designed, accessible at U+2615, which is decimal 9749, and also at lowercase h, for convenience when used with the Microsoft Paint program. The font also includes a space character. I have tested the font with WordPad and Word 97. It looks quite good at 12 point in black, as in an agenda document for a meeting. It also looks good in the colour which WordPad calls green, which is a dark green colour, at 300 point, and also looks good in a fun logo at 36 point following the wording Peppermint Tea Shoppe in Old English Text in the same dark green colour. Various other sizes mostly look good, though 18 point does look a bit strange. An experiment with PowerPoint produced a nice slide with the following, centred in a text box, in black at 36 point. There will now be an intermission for refreshments. Below that, in colour red=51, green=204, blue=51 at 72 point, centred in a text box, the Hot Beverage glyph. Hopefully this font will be a useful item on computers around the world. William Overington 18 February 2003
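As a quick check of the code point arithmetic (hexadecimal 2615 is decimal 9749), a couple of lines of Python confirm it; the mapping of the glyph to lowercase h is a convenience of that particular font, not anything defined by Unicode:

```python
# U+2615 HOT BEVERAGE: hexadecimal 2615 equals decimal 9749.
cp = 0x2615
print(cp)        # 9749
print(chr(cp))   # the Hot Beverage character, if an installed font supports it
# The font described above also draws the same glyph at lowercase 'h';
# that is a design decision of that one font, not a Unicode property.
```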
Re: Plane 14 Tag Deprecation Issue
in plane 14. William Overington 15 February 2003
Re: Plane 14 Tag Deprecation Issue
the range U+EC00 through to U+EFFF for data and some codes from the range U+EB00 through to U+EBFF for control codes, some of these codes being particular to the eutovios system and some, such as the codes for the colours of the objects, being the same codes used for specifying colours in the eutocode graphics system generally. The objects thus all have symmetry about the vertical axis, which makes drawing out a scene simpler than if objects such as cubes were in use. The spheres display as discs, the cylinders display as filled rectangles and the cones display as filled triangles, each displaying the same shape regardless of the angle from which they are viewed: they do change size though depending upon how near they are to the present viewing point. An interesting activity is thinking about what objects have a shape which is symmetrical around a vertical axis and which would look good in such a program and which are expressible with a minimum number of supplied parameter values once one knows which type of object has been chosen. It is essentially just some of those objects which could be produced in brass using a lathe only and without using any of the screw cutting features of a lathe. I have tried to find out what has happened to ViOS. Does anyone know or remember having seen a news item in a magazine about what has happened please? I recognise that this question is somewhat off-topic but I have tried to find out in various places and have been unable to do so and this list does seem to have an ability of providing answers to many questions. Anyway, in relation to plane 14, I am hoping that in time it will be possible for such a graphics system, including various three-dimensional capabilities to become formally encoded in plane 14 as a ring-fenced option for use with particular protocols. It is at an early stage at present, so what becomes encoded may have far greater possibilities than what is being encoded now. 
Yet what is being encoded now does work and works well. It allows a stream of Unicode characters from a text file to produce a three-dimensional scene through which an end user can then move and select objects. This is all very futuristic and needs a lot more doing to it. At present I use a Java applet which is an extension of the original eutocode graphics test system which is on the web. http://www.users.globalnet.co.uk/~ngo/eutocodegraphics.htm The test system for the eutovios system has buttons to simulate the push buttons of an infra-red remote control device of a DVB-MHP television set. Testing is by preparing a string of Private Use Area characters in the SC UniPad program, obtainable from http://www.unipad.org and then using copy and paste so as to paste the string into the text box of the applet, the draw button of the applet then being pushed to produce the starting point display. However, I feel that I do need to mention this now as the Unicode Technical Committee is about to consider what to do about tags and this is a related issue because it relates to plane 14. Perhaps all of plane 14 needs to be declared an area considered as deprecated in general terms, yet where codes for use with particular protocols can be defined by the Unicode Technical Committee, so that the potential for using such futuristic developments and encoding them within the Unicode framework is preserved? William Overington 14 February 2003

For discoveries,
In Private Use Area
Phaistos Disc Script waits

Haiku written by William Overington.
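As an illustration of the kind of processing described, a program receiving such a stream can classify each character by Private Use Area range before interpreting it. The ranges below are the ones stated above (U+EB00 to U+EBFF for control codes, U+EC00 to U+EFFF for data); the example stream and any meanings attached to individual codes are invented here purely for illustration, as the actual eutovios assignments are the author's own:

```python
# Hypothetical sketch: classify each code point of a Private Use Area
# character stream, in the spirit of the eutovios description above.
def classify(ch: str) -> str:
    cp = ord(ch)
    if 0xEB00 <= cp <= 0xEBFF:
        return "control"   # e.g. a colour selection code (illustrative)
    if 0xEC00 <= cp <= 0xEFFF:
        return "data"      # e.g. a parameter value for an object (illustrative)
    return "other"

stream = "\uEB01\uEC10\uEC22"   # an invented example stream
print([classify(c) for c in stream])   # ['control', 'data', 'data']
```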
Re: Plane 14 Tag Deprecation Issue (was Re: VS vs. P14 (was Re: IndicDevanagari Query))
Friday 14 February 2003 is the stated closing date for responses to the Public Review of the issue of whether to deprecate plane 14 tags. I am writing to enquire whether the solution of marking the tags as deprecated within the file PropList.txt, yet including a wording in the Unicode Specification that, whereas the plane 14 tag characters are marked as deprecated within the PropList.txt file, they are not deprecated in the full sense of being deprecated but are in fact classed as reserved for use with particular protocols, would be acceptable to all. In particular, would such a solution be acceptable to Doug Ewell and others as satisfying all of the points made in his paper In defense of Plane 14 language tags, which was posted in this list on 2 November 2002? If so, I wonder if I might please suggest that people discussing the matter within the Unicode Technical Committee might like to consider Doug's paper in some detail and perhaps consider making reference within the Unicode Specification to some of the ideas which Doug pointed out, such as the potential for using tags for speech synthesis and so on. In addition, if the tags are described as reserved for use with particular protocols, then it would seem reasonable to keep open the possibility of allowing other types of tags to be specified in the future if a need arises, as Doug suggests, rather than using plane 14 tags only for languages as at present. It would seem entirely reasonable that the Unicode Technical Committee could possibly at some future meeting define one or more additional types of tag within the unused lower part of plane 14 within the ring-fenced reserved area. William Overington 13 February 2003
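For readers unfamiliar with the mechanism under review: a plane 14 language tag is the character U+E0001 LANGUAGE TAG followed by tag characters in the range U+E0020 to U+E007E, which shadow printable ASCII at an offset of 0xE0000. A short Python sketch of forming such a tag and recovering the language code from it:

```python
# Build a plane 14 language tag: U+E0001 LANGUAGE TAG followed by tag
# characters, each the ASCII character plus 0xE0000.
def language_tag(tag: str) -> str:
    return "\U000E0001" + "".join(chr(0xE0000 + ord(c)) for c in tag)

tagged = language_tag("en-GB") + "some English text"
# The tag characters have no visible rendering; a tag-aware protocol can
# recover the language code by subtracting 0xE0000 from each tag character.
recovered = "".join(chr(ord(c) - 0xE0000) for c in tagged[1:6])
print(recovered)   # en-GB
```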
Re: Plane 14 Tag Deprecation Issue (was Re: VS vs. P14 (was Re: Indic Devanagari Query))
I feel that as the matter was put forward for Public Review then it is reasonable for someone reading of that review to respond to the review on the basis of what is stated as the issue in the Public Review item itself. Kenneth Whistler now states an opinion as to what the review is about and mentions a file PropList.txt of which I was previously unaware. Recent discussions in the later part of 2002 in this forum about the possibilities of using language tags only started as a direct result of the Unicode Consortium instituting the Public Review. The recent statement by Asmus Freytag seems fine to me. Certainly I might be inclined to add in a little so as to produce Plane 14 tags are reserved for use with particular protocols requiring, or providing facilities for, their use so that the possibility of using them to add facilities rather than simply using them when obligated to do so is included, but that is not a great issue: what Asmus wrote is fine. Public Review is, in my opinion, a valuable innovation. Two issues have so far been resolved using the Public Review process. Those results do seem to indicate the value of seeking opinions by Public Review. As I have mentioned before I have a particular interest in the use of Unicode in relation to the implementation of my telesoftware invention using the DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) system. I feel that language tags may potentially be very useful for broadcasts of multimedia packages which include Unicode text files, by direct broadcast satellites across whole continents. Someone on this list, I forget who, but I am grateful for the comment, mentioned that even if formal deprecation goes ahead then that does not stop the language tags being used as once an item is in Unicode it is always there. So fine, though it would be nice if the Unicode Specification did allow for such possibilities within its wording. 
The wording stated by Asmus Freytag pleases me, as it seems a good, well-rounded balance between not obliging the people who make many widely used packages to include software to process language tags, whilst still formally recognizing the opportunity for language tags to be used to advantage in appropriate special circumstances. I feel that that is a magnificent compromise wording which will hopefully be widely applauded. In using Unicode on the DVB-MHP platform I am thinking of using Unicode characters in a file and the file being processed by a Java program which has been broadcast. The file PropList.txt just does not enter into it for this usage, so it is not a problem for me as to what is in that file. My thinking is that many, maybe most, multimedia packages being broadcast will not use language tags and will have no facilities for decoding them. However, I feel that it is important to keep open the possibility that some such packages can use language tags provided that the programs which handle them are appropriately programmed. There will need to be a protocol. Hopefully a protocol already available in general internationalization and globalization work can be used directly. If not, hopefully a special Panplanet protocol can be devised specifically for DVB-MHP broadcasting. On the matter of using Unicode on the DVB-MHP platform, readers might like to have a look at the following about the U+FFFC character. http://www.users.globalnet.co.uk/~ngo/ast03200.htm Readers who are interested in uses of the Private Use Area might like to have a look at the following. They are particularly oriented towards the DVB-MHP platform but do have wider applications both on the web and in computing generally. http://www.users.globalnet.co.uk/~ngo/ast03000.htm http://www.users.globalnet.co.uk/~ngo/ast03100.htm http://www.users.globalnet.co.uk/~ngo/ast03300.htm The main index page of the webspace is as follows.
http://www.users.globalnet.co.uk/~ngo William Overington 7 February 2003
The result of the plane 14 tag characters review.
As the Unicode Consortium invited public comments on the possible deprecation of plane 14 tag characters, will the Unicode Consortium be making a prompt public statement of the result of the review as soon as the present meeting of the Unicode Technical Committee is completed, or even earlier if the decision of the Unicode Technical Committee has already been finalized? William Overington 8 November 2002
Re: A .notdef glyph
designed a .notdef glyph in response to the exercise. Certainly, now that there has been a discussion about .notdef glyphs and references to various documents and examples, I might now think about another design. However, what I produced was a work of art produced without the knowledge which might have constrained my thoughts had I previously known about the various documents and examples beyond the plain black rectangle. A sort of primitive art, unconstrained by the chains of knowing about what is usually expected of such a design? William Overington 8 November 2002
Re: A .notdef glyph
Michael Everson wrote as follows.

At 18:29 + 2002-11-06, William Overington wrote: Thank you for the design brief.

Oh, my stars.

If anyone wants to make a graphic involving stars using Microsoft Paint, he or she might like to have a look at the following. http://www.users.globalnet.co.uk/~ngo/pai3.htm These graphics were produced using 1456 object code programs. http://www.users.globalnet.co.uk/~ngo/14563100.htm http://www.users.globalnet.co.uk/~ngo/1456.htm

Here is my design.

Better hurry and copyright it.

Actually, since you mention copyright, my understanding is that, under United Kingdom law, copyright existed in the design from the moment that I put it into permanent form. As there are various international treaties and conventions about copyright, I think that there is copyright in my design through most of the world. Copyright is a very interesting aspect of the law. Copyright does not depend upon any assessment of artistic or literary merit at all. Copyright is a very important and valuable intellectual property right and my understanding is that copyright licensing earns the United Kingdom economy a great amount of money every year. Providing evidence to support a claim of copyright is another matter. However, I think that the fact that my design is archived in the mailing list archive of the Unicode Consortium is high quality evidence in relation to copyright.

The design consists of a single contour in as large a square box as is possible for the particular font. In my prototype I used a box 2048 font units by 2048 font units. In this case, the value of n is 1024. The contour has seven points, the first point and the last point being at the same place. Point 1 is at (0,0) and is on the curve. Point 2 is at (0,2n) and is off the curve. Point 3 is at (2n,2n) and is on the curve. Point 4 is at (2n,n) and is on the curve. Point 5 is at (n,n) and is on the curve. Point 6 is at (n,0) and is on the curve. Point 7 is at (0,0) and is on the curve.
What curve? Your specification here produces a rectangular figure.

Thank you for trying it out. The shape is a one piece solid within the area of a square, though the square is not drawn. The design idea came from wanting to have an arc which goes against the normal arc of design of a graphical user interface of the input screen of a computer program. I started with a two arc design, from point 1 to point 3 as at present, with another arc going back from point 3 to point 1 influenced by an off curve point in the bottom right corner of the square. This was rather like the hysteresis curve of a magnet. Yet it was too symmetrical. The second example had a curve from the present point 4 to the present point 6 influenced by an off curve point at the present point 5. Yet it too looked too symmetrical. The third example is the present design. I wanted a design which would be awkward-looking in the display, so as to draw the eye to it. I used the Softy shareware program. There is one curve and four straight lines in the contour. The curve is a quadratic Bézier curve from point 1 to point 3; note please how point 2 is off the curve. That is, point 2 influences the direction of the curve from point 1 to point 3. The curve starts off from point 1 instantaneously heading for point 2, but quickly turns away from that direction so that it can make the smooth transition in direction which is necessary so that the curve appears to arrive at point 3 instantaneously as if it had come from the direction of point 2.

I hope that you like the design.

But it fails to express .notdef in any meaningful way.

I think I understand what you mean. Yet the meaning of symbols is often part of the culture in which they exist.
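The way the off-curve point shapes that curve can be seen by evaluating the quadratic Bézier numerically, with points 1, 2 and 3 as given and n = 1024; this is a plain Python illustration of the TrueType-style quadratic curve, not font-making code:

```python
# Quadratic Bézier from point 1 to point 3, controlled by the off-curve
# point 2, as in the .notdef contour described above (n = 1024).
n = 1024
p1, p2, p3 = (0, 0), (0, 2 * n), (2 * n, 2 * n)

def bezier(t: float) -> tuple:
    """B(t) = (1-t)^2 * p1 + 2(1-t)t * p2 + t^2 * p3."""
    u = 1.0 - t
    x = u * u * p1[0] + 2 * u * t * p2[0] + t * t * p3[0]
    y = u * u * p1[1] + 2 * u * t * p2[1] + t * t * p3[1]
    return (x, y)

print(bezier(0.0))   # (0.0, 0.0)       -- starts at point 1
print(bezier(0.5))   # (512.0, 1536.0)  -- pulled toward point 2
print(bezier(1.0))   # (2048.0, 2048.0) -- ends at point 3
```

At t = 0 the curve heads straight for point 2 (up the left edge), then bends away to arrive at point 3 travelling as if from point 2, exactly as the prose describes.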
So, as time goes by, perhaps this symbol will come to have the meaning of being a .notdef symbol (in the sense of one of the various possible .notdef symbols in widespread use), perhaps being known as the .notdef symbol which features in that famous thread in the archives of the Unicode Consortium's mailing list. Perhaps a whole thread on symbols and their meaning is on the point of starting in this mailing list. For example, U+2603 has two meanings, one the picture meaning, one the other meaning stated in the text of the U2600.pdf document. Do U+2622 and U+2623 convey their now well-known meanings in any different manner to the way in which my design conveys the .notdef concept? How does U+2658 express the meaning of which directions are permissible to move? How is it that U+2678 brings thoughts of models of locomotives and U+2677 does not? So, maybe it is not a matter of my design failing to express .notdef in any meaningful way; perhaps it is a matter that my design, an abstract shape, does express .notdef in a meaningful way because now lots of people know that that is the intended meaning. Expressing meaning is a very interesting matter. Some readers might perhaps be interested
Re: ct, fj and blackletter ligatures
Peter Constable wrote as follows.

You'll probably come back to say, But I was talking about 'ordinary TrueType fonts'.

No I won't. It's not my personality type to do so. Have a look at the Myers Briggs Type Indicator for personality type; the key message is that not everybody has the same personality type. I may argue a point if I consider it right to do so, but I do not argue something just for the sake of arguing or because of some notion of not being willing to lose face or something like that in accepting that I did not previously know something. I mean, that is pointless and is a waste of time. Anyway, it is not my nature to be like that. So, I did not know the correct situation and you have helped me by explaining more about it. Thank you.

If you insist on an invalid assumption, there's no way to argue against it. It's like saying, software with a character-mode UI is not capable of displaying bitmap graphics -- true, but irrelevant.

But I won't; it's not my personality to do so. I genuinely did not understand and I am grateful to you for explaining the matter to me.

If you really want a dialog box to popup providing notification to the user, I'm wondering how many times as the file is opened and a page is rendered you'd like this popup to appear?

Once. A notification in a dialogue box that the problem exists, with a button to click for further detailed information as to which character or characters, how many times for each, and on which pages and lines.

17 times if there are 17 instances of c, ZWJ, t that are not rendered as a ct ligature?

No, just the once.

Not on my system, thank you.

Certainly not! Thank you for explaining the matter about the TrueType fonts. William Overington 7 November 2002
Re: ct, fj and blackletter ligatures
John Hudson wrote as follows. At 02:18 11/5/2002, William Overington wrote: Not at 02:18, it was 09:18. Well, I suppose it depends upon what one means by a file format that supports Unicode. The TrueType format does not support the ZWJ method and thus does not provide means to access unencoded glyphs by transforming certain strings of Unicode characters into them. All three of the current 'smart font' formats are extensions of the TrueType file format. Structurally, the only difference between a TrueType font and an OpenType font is the presence of *optional* layout tables that support glyph substitution and positioning. Officially, the only difference is the presence of a digital signature. I am unsure as to whether, in formal terms, TrueType is a file format that supports Unicode as it does not allow the ZWJ sequences to be recognized. Of course TrueType allows ZWJ sequences to be recognised. ZWJ is a character that can appear in Unicode text and in the Unicode cmap of a TrueType font. If a font does not contain a ligature for the sequence, or does not contain layout information to render the sequence as a ligature, the text is still processed according to the Unicode Standard, i.e. nothing happens. I am thinking here of ordinary TrueType fonts on a Windows 95 platform and on a Windows 98 platform. I was under the impression that the reason that an ordinary TrueType font will not process a ZWJ sequence on those platforms was that both the operating system and the ordinary TrueType font do not have the capabilities to process ZWJ sequences. My understanding is that even an OpenType font with ZWJ sequence facilities will not work on a Windows 95 or Windows 98 platform. However, I thought that the ordinary TrueType format would not support ZWJ sequences in itself and that not only would a later operating system be needed but that also an OpenType font would be needed and that an ordinary TrueType format would not be able to do the job. Was I wrong in that thinking? 
My experience of fonts is very limited. I have tried making a few example TrueType fonts using the Softy shareware facility and I wonder whether I have got it wrong as to what an ordinary TrueType font will do when an ordinary TrueType font is made with an expensive professional font making program.

To say that a font only supports Unicode if it can process and render as a ligature every usage of the ZWJ character is foolish: every font would have to contain glyphs and substitution lookups to support every potential use of ZWJ in every possible c+ZWJ+i+ZWJ+r+ZWJ+c+ZWJ+u+ZWJ+m+ZWJ+s+ZWJ+t+ZWJ+a+ZWJ+n+ZWJ+c+ZWJ+e.

I have had a long think about this. Suppose that a sequence of Unicode characters in a plain text file is mostly in English and has the sequence c ZWJ t in it at various places. Suppose that the font is an advanced format font which does not have a special glyph for the sequence c ZWJ t yet will simply render it as ct just as if the ligature had not been requested. As far as I know, there is no requirement in Unicode that the rendering system should notify the end user, perhaps using an Alert dialogue box or similar, that the ZWJ request has been made yet not fulfilled. Can an advanced format font supply such a message to the rendering system for onward notification of the end user? It seems to me that having the ZWJ mechanism in the Unicode Standard yet having no reporting mechanism if a specific request is not fulfilled is unfortunate. As a font could have its own set of ZWJ sequences which it recognizes, anything from an empty set to a set consisting of a full complement of ligatures for Fraktur, it seems to me that whilst every font would certainly not have to contain glyphs and substitution lookups to support every potential use of ZWJ in every possible circumstance, it would not be unreasonable to hope that fonts could have a standardized reporting mechanism as to whether a request for a particular ZWJ sequence has been fulfilled.
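The reporting wished for here could, in principle, be prototyped outside the font machinery simply by scanning the text for ZWJ ligature requests and listing them for the user. The sketch below is my own illustration of that idea, not an existing Unicode or font-format facility:

```python
# Scan text for ZWJ (U+200D) ligature requests such as c ZWJ t, and
# report each one, so that a renderer (or a proofing tool) could tell
# the user which requests it did not honour.
ZWJ = "\u200D"

def zwj_requests(text: str) -> list:
    """Return (index, 'xy') for each x ZWJ y sequence found in text."""
    found = []
    i = text.find(ZWJ)
    while i != -1:
        if 0 < i < len(text) - 1:
            found.append((i - 1, text[i - 1] + text[i + 1]))
        i = text.find(ZWJ, i + 1)
    return found

sample = "connec\u200Dt distinc\u200Dt"
print(zwj_requests(sample))   # [(5, 'ct'), (15, 'ct')]
```

A tool like this could produce the single summary dialogue discussed above: one notification, with the character pairs and their positions behind a button for further detail.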
Also, perhaps there could be a method for asking a font to please display all its ZWJ sequences and their results. Now it might be that some advanced font formats can do such things; I do not know at present. While on this topic, perhaps a standardized method of a font reporting that it has no glyph for a character which it is asked to render might be a good idea. I am aware that a black outline box could be displayed, yet in a long document one of those might easily slip past a general viewing of the text in a printshop. Also, perhaps some method of asking a font to declare a list of the code points for which it has a specific glyph would be helpful. Again, perhaps some advanced font formats have these abilities; I do not know at present. There seems to be a gap between the Unicode Technical Committee encoding characters into a file and the process of making sure that the desired text is rendered correctly on an end user's platform with good provenance. I feel
A .notdef glyph (derives from Re: ct, fj and blackletter ligatures)
John Hudson wrote as follows.

Here's an exercise for your enthusiasm, William: devise the form of the perfect .notdef glyph. It needs to unambiguously indicate that a glyph is missing, i.e. it should not be something that can easily be mistaken for a dingbat, and it needs to be easy to spot in proofreading in both print and onscreen (some applications, e.g. Adobe InDesign, make the latter a bit easier by applying colour highlight to the .notdef glyph).

Thank you for the design brief. Here is my design. The design consists of a single contour in as large a square box as is possible for the particular font. In my prototype I used a box 2048 font units by 2048 font units. In this case, the value of n is 1024. The contour has seven points, the first point and the last point being at the same place. Point 1 is at (0,0) and is on the curve. Point 2 is at (0,2n) and is off the curve. Point 3 is at (2n,2n) and is on the curve. Point 4 is at (2n,n) and is on the curve. Point 5 is at (n,n) and is on the curve. Point 6 is at (n,0) and is on the curve. Point 7 is at (0,0) and is on the curve. This has the effect of making the glyph easy to draw, solid enough to be specifically noticeable, distinctively shaped with both a curved line and straight lines so that it stands out, and with an arc which goes against the normal arc of design of a graphical user interface of the input screen of a computer program, so as also hopefully to make it more noticeable. In addition, the design has white space set out in a manner such that where several copies of the glyph appear in sequence on a page of text, they are easily counted. I hope that you like the design. William Overington 6 November 2002
Re: ct, fj and blackletter ligatures
Thomas Lotze wrote as follows.

William Overington wrote: I don't know for certain but I suspect that it is that font designers do this so that people can use an application such as Microsoft Paint to produce an illustration using the font. In the absence of regular Unicode code points for the ligatures, a font designer has either to use the Private Use Area and be Unicode compatible or make a non-Unicode compatible font, if the font designer wishes people to be able to have direct access to the ligature characters.

Judging from what I've learned by now, this is not true: If a font designer wants to make a Unicode-compatible font, he has to use a font file format that supports Unicode, and those formats provide means to access unencoded glyphs by transforming certain strings of Unicode characters into them.

Well, I suppose it depends upon what one means by a file format that supports Unicode. The TrueType format does not support the ZWJ method and thus does not provide means to access unencoded glyphs by transforming certain strings of Unicode characters into them. I am unsure as to whether, in formal terms, TrueType is a file format that supports Unicode, as it does not allow the ZWJ sequences to be recognized. Please note that my sentence did have if the font designer wishes people to be able to have direct access to the ligature characters. However, certainly, a font designer using an advanced font format may well not wish people to be able to have direct access to the ligature characters. The paragraph was replying to your question as to why someone who wants to set and print out a page of Fraktur at present is in practice likely to have to use a font with the ligatures encoded with code points less than 255.
Please know that I am not seeking to be pedantic over the meaning of the phrase a file format that supports Unicode; it is just that I get the impression that you might possibly have not quite understood that some font formats widely used for Unicode encoded characters, such as the TrueType format, do not support the ZWJ glyph substitution process or, in fact, any glyph substitution process, such as noticing the two letter ct sequence and substituting a ct ligature glyph within the font.

And if I understand it correctly, Unicode compliance can only be achieved with all of compliant documents, fonts, and renderers. So there appears to be no need for direct accessibility of ligatures, alternates etc.

I said compatible, I did not say compliant and did not mean compliant. I was meaning compatible, in the sense that, if one wishes to produce a font using the TrueType format and that font is to include glyphs for ligatures such as ct and ppe, how does one do it so that the method used does not conflict with Unicode. Using Private Use Area code points avoids conflicting with the regular Unicode code points used for other characters.

There are some articles about using WordPad and Paint to produce graphic effects with large characters and gold textures and so on in our family webspace, together with the gold texture file and some other texture files too.

And what's the relevance to Unicode of that?

Well, in direct terms probably nothing. However, as this is a widely distributed mailing list it might be that some readers, having read about the matter of using ligature characters in Paint and the way that one needs a font with code points less than 255 in order to access the ligature characters from Paint, might like to have a go at producing such graphics, so, having available some articles on the matter, I mentioned them. If one considers the Gutenberg sample font, the ct ligature is available as well, at Alt 0201 using Paint.
One could use Wordpad to get the character as well. Yet, suppose that one has an advanced format font with a ct glyph within it yet where the font does not provide a direct code point access glyph, but only allows a ct ligature to be displayed using a combination of computer hardware and software which supports the advanced font format. How is one going to get that ct ligature to display if one does not have access to that hardware and software combination? Now certainly the attempt has been made to trivialise the matter by reference to very very old computer systems, yet here the problems arise with PCs manufactured in 1999. May I add that this posting is trying to be helpful to answer questions which you have posed, I am not seeking to reopen the discussion of whether the Unicode Technical Committee should encode any more precomposed ligatures. I raised that issue before the August 2002 meeting of the Unicode Technical Committee, the committee discussed the matter at the meeting, formed a consensus view and that consensus view was minuted and the minutes have been published. It is simply a matter that the Unicode Technical Committee is not going to encode any more ligatures, I have my golden ligatures collection on the web and if people choose
Re: ct, fj and blackletter ligatures
Thomas Lotze asked. Why below 255? I don't know for certain but I suspect that it is that font designers do this so that people can use an application such as Microsoft Paint to produce an illustration using the font. In the absence of regular Unicode code points for the ligatures, a font designer has either to use the Private Use Area and be Unicode compatible or make a non-Unicode compatible font, if the font designer wishes people to be able to have direct access to the ligature characters. There is an interesting experiment which one can try if one wishes. At the http://www.waldenfont.com website there are various Fraktur fonts for sale. There is a bundle of sample fonts available for download which have only some of the letters and ligatures in the fonts. The Gutenberg font has the ppe ligature within it and indeed a number of other ligatures and abbreviations and, in fact, a complete set of ten digit characters. There is the manual gbpmanual.pdf available for download as well. On page 14 of that document the ppe ligature is listed as being at 0171. If on a PC one installs the sample Gutenberg font, starts the Microsoft Paint program and draws some text, selecting the Gutenberg font, and then holds down the Alt key and keys 0171 using the digit keys at the far right of the keyboard, hopefully the ppe ligature in the Gutenberg font will appear on the screen. In fact Paint only allows text up to 72 point. However, if one uses WordPad, then one can make the text something like 200 point in size if one wishes and use the Print Screen facility to copy the display image onto the clipboard. One can then paste the image from the clipboard into Paint so that one then has a 200 point Gutenberg ppe ligature in the Paint program. There are some articles about using WordPad and Paint to produce graphic effects with large characters and gold textures and so on in our family webspace, together with the gold texture file and some other texture files too. 
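One point worth noting about the Alt 0171 method: what gets stored in the document is the code page character at that position, not a ppe ligature character; the font merely draws its ppe glyph at that slot, which is why such a font is not Unicode compatible. Assuming the common Windows-1252 ANSI code page, a quick Python check shows which character Alt+0171 actually supplies.

```python
# Keying Alt+0171 (with the leading zero) on Windows enters byte 171
# (0xAB) of the ANSI code page. Under Windows-1252 that byte is a
# guillemet, not a ligature; only the font makes it look like ppe.
ch = bytes([171]).decode("cp1252")
print("U+%04X %s" % (ord(ch), ch))  # U+00AB «
```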
http://www.users.globalnet.co.uk/~ngo William Overington 4 November 2002
Re: Names for UTF-8 with and without BOM
As you have UTF-8N, where the N stands for the word no, one could possibly have UTF-8Y, where the Y stands for the word yes. Thus one could have the name of the format answering, or not answering, the following question. Is there a BOM encoded? However, using the letter Y has three disadvantages for widespread use. The letter Y could be confused with the word why, the word yes is English, so the designation would be anglocentric, and the letter Y sorts alphabetically after the letter N. However, if one considers the use of the international language Esperanto, then the N would mean ne, that is, the Esperanto word for no, and thus one could use the letter J to stand for jes, the Esperanto word for yes, which, in fact, is pronounced exactly the same as the English word yes. Thus, I suggest that the three formats could be UTF-8, UTF-8J and UTF-8N, which would solve the problem in a manner which, being based upon a neutral language, will hopefully be acceptable to all. William Overington 2 November 2002
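The question the name answers ("Is there a BOM encoded?") is easy to test mechanically. Here is a minimal Python sketch of classifying a byte stream under the proposed designations; the function names are my own, and the UTF-8J/UTF-8N labels are, of course, only the suggestion made above, not standard names.

```python
# The BOM is U+FEFF, which UTF-8 encodes as the three bytes EF BB BF.
BOM = b"\xef\xbb\xbf"

def classify_utf8(data: bytes) -> str:
    # Return the proposed designation for a UTF-8 byte stream.
    return "UTF-8J" if data.startswith(BOM) else "UTF-8N"

def decode_utf8_any(data: bytes) -> str:
    # Decode either variant, silently stripping a leading BOM if present.
    if data.startswith(BOM):
        data = data[len(BOM):]
    return data.decode("utf-8")

print(classify_utf8(BOM + "jes".encode("utf-8")))  # UTF-8J
print(classify_utf8("ne".encode("utf-8")))         # UTF-8N
```

Python itself makes the same distinction with the codec names utf-8 and utf-8-sig rather than with letter suffixes.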
Re: ct, fj and blackletter ligatures
The matter of ligatures arises fairly often in this discussion forum, often in relation to German Fraktur, but also in relation to English printing of the 18th Century and the use of fj in Norwegian. In relation to regular Unicode the policy is that no more ligatures are to be encoded. My own view is that this should change. However, that is unlikely to happen. Earlier this year, following from a posting about Fraktur ligatures, I produced some encodings for ligatures using the Private Use Area. I have published them on the web at the following place. http://www.users.globalnet.co.uk/~ngo/golden.htm These are my own Private Use Area code point allocations for various ligatures. They are not in any way a standard, yet they are a consistent set which may be useful to those who wish to use them. The only use I know of any of them in a published font is in the Code2000 font, produced by James Kass. James uses the code points of this set for ct, fj and ffj in his Code2000 font. I feel that it might well be of interest to you, for your background knowledge, to have a look at the encodings which I have produced, yet I mention that these Private Use Area encodings are a matter of some controversy. Using them could lead to documents existing which could not be text sorted alphabetically, or spellchecked. However, if someone is just wishing to produce a printout of some text with some ligatures in the text, then the golden ligatures collection can be useful. There seem to be a lot of theoretical possibilities for doing ligatures with Unicode fonts using advanced font technology on the latest computers, yet if, say, someone wants to set and print out a page of Fraktur, that possibility does not seem, as far as I know, to be a practically achievable result at the present time using a piece of text encoded in regular Unicode using a font which uses only regular Unicode encoding. 
Indeed, it seems more likely that one would need to use a Fraktur font with ligatures encoded with a code number below 255, that is, a font which is not Unicode compatible. The golden ligatures collection is Unicode compatible, though, as I say, it is not a standard. It is just one person's self-published writing. I like to think of it as an artform, much as if I had produced a painting and placed a copy of the painting on the web. That is, it exists, it may be interesting to people, yet it does not in any way prevent anyone else from doing something different and it does not require anyone else to take any notice of it, yet it is a cultural item in the world of art. So, it depends what one is wanting to do. If your enquiry is solely in relation to formal encoding of ligatures in regular Unicode, then the golden ligatures collection will be of no use to you. However, if you are producing a black letter font as part of your studies and would like to encode ligatures, then the golden ligatures collection might perhaps be of interest to you. For example, if such a font were encoded using advanced font technology, then the golden ligatures collection code points would not be the way to approach the problem, though they could, if you so chose, be used to provide an additional way of accessing the glyphs for people who were trying to produce printouts using, say, a Windows 95 or a Windows 98 system. If, however, such a font were produced as an ordinary TrueType font, then in order to access the ligature glyphs you would need code points, one code point for each glyph. In order to be Unicode compatible, those code points would need to be in the Private Use Area range of U+E000 to U+F8FF. There is essentially complete freedom of choice as to which code points to use, though the lower part is perhaps best due to the suggestions about Private Use Area usage in the Unicode specification. 
However, the golden ligatures collection of code points is there for your consideration if you wish. Within my collection of code point allocations, ct is U+E707, fj is U+E70B, ch is U+E708, ck is U+E709, tz is U+E70F. These are all in the following document. http://www.users.globalnet.co.uk/~ngo/ligature.htm The ffj is encoded at U+E773 in the following document. http://www.users.globalnet.co.uk/~ngo/ligatur2.htm There are some black letter ligature encodings including pp at U+E76C and ppe at U+E77E in the following document. http://www.users.globalnet.co.uk/~ngo/ligatur5.htm The Private Use Area is described in Chapter 13, section 13.5 of the Unicode specification. There is a file named ch13.pdf available from one of the pages in the http://www.unicode.org website. The main index page of our family web site is as follows. http://www.users.globalnet.co.uk/~ngo William Overington 2 November 2002
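For anyone wishing to experiment with these allocations, a small Python sketch follows. The code points are the ones listed above from the golden ligatures collection; the longest-match substitution strategy is my own illustration and not part of the collection itself.

```python
# Private Use Area assignments from the golden ligatures collection,
# as listed in the posting above. These are not a standard.
GOLDEN = {
    "ffj": "\uE773",
    "ppe": "\uE77E",
    "ct": "\uE707",
    "fj": "\uE70B",
    "ch": "\uE708",
    "ck": "\uE709",
    "tz": "\uE70F",
    "pp": "\uE76C",
}

def apply_ligatures(text: str) -> str:
    # Try longer sequences first so that, for example, ffj wins over fj
    # and ppe wins over pp.
    for seq in sorted(GOLDEN, key=len, reverse=True):
        text = text.replace(seq, GOLDEN[seq])
    return text

print(apply_ligatures("perfect fjord"))
```

A font carrying glyphs at these Private Use Area code points, such as Code2000, would then display the substituted text with the ligature glyphs.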
Re: New Charakter Proposal
Kenneth Whistler wrote the following. I think Markus's suggestion is correct. If you want to do something like this internally to a process, use a noncharacter code point for it. If you want to have visible display of this kind of error handling for conversion, then simply declare a convention for the use of an already existing character. My suggestion would be: U+2620. ;-) Then get people to share your convention. I find this suggestion curious, particularly coming as it does from an officer of the Unicode Consortium. The U2600.pdf file has U+2620 under Warning signs and has = poison in its description. Suppose for example that the source document encoded in UTF-8 is a document about chemicals found around the house and that the U+2620 character is used to indicate those which are poisonous. If U+2620 is also used to include in visible form an indication of an error found during decoding, then finding a U+2620 character in the decoded document would lead to an ambiguous situation. One solution would be for the Unicode Consortium to encode an otherwise unused character especially for the purpose. If, however, the way forward is for an individual to declare a convention, then I suggest that a sequence of at least two characters, the first being a base character and the one or more others being combining items, be used so as to produce an otherwise highly unlikely sequence of characters. For example, the character U+0304 COMBINING MACRON could be a good choice, as it could be used to indicate a Boolean not condition with a character which is otherwise unlikely to carry an accent. As to which character to use for the base character, I am undecided, however it should, in my opinion, not be U+2620 as that is a warning sign meaning poison and could lead to confusion if looking at a document. The advantage of a two character sequence is that a special piece of software may be used to parse all incoming documents. 
Only occurrences of the otherwise highly unlikely sequence will be regarded as indicating a conversion problem with the encoding. If either of the two characters used for the sequence is encountered other than with the rest of the sequence, then it will not indicate the special effect. In my comet circumflex system I use a three character detection sequence. This means that in order to enter the markup universe then all three characters of the sequence need to be present in sequence. Thus, a piece of software can scan all incoming text messages, even those which are not designed to fit in with the comet circumflex system, and not indicate a comet circumflex message if, say, a U+2604 COMET character arrives as part of a message. Using a two or three character sequence which is otherwise highly unlikely to occur is, in my opinion, a good way to indicate the presence of a special feature as it allows one to monitor all text files for the special feature without causing undesired responses on text files which have been prepared without any regard to the special feature. I feel that the influence of posting a suggestion in this mailing list is often greatly underestimated. If you do post a suggested two or three character sequence for the purpose that you seek, perhaps, if you wish, after further discussion in this group, my feeling is that that sequence may well become well known and accepted for the purpose very quickly, simply because where there is a need for such a sequence then, in the absence of any good reason not to do so, people will often happily use the suggested format. William Overington 1 November 2002
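The scanning idea can be sketched briefly in Python. The particular sentinel used here, COMET followed by COMBINING CIRCUMFLEX ACCENT and COMBINING MACRON, is my own illustrative choice of an otherwise highly unlikely sequence, not one proposed in the postings above.

```python
import codecs

# An otherwise highly unlikely three character sequence used to mark,
# in visible form, a place where decoding failed.
SENTINEL = "\u2604\u0302\u0304"

def _sentinel_handler(err):
    # Codec error handler: substitute the sentinel, resume after the
    # undecodable bytes.
    return SENTINEL, err.end

codecs.register_error("sentinel", _sentinel_handler)

def mark_errors(raw: bytes) -> str:
    # Decode UTF-8, marking each undecodable run with the sentinel.
    return raw.decode("utf-8", errors="sentinel")

def has_conversion_error(text: str) -> bool:
    # Only the complete sequence triggers; a lone U+2604 COMET does not.
    return SENTINEL in text

print(has_conversion_error(mark_errors(b"ok \xff end")))  # True
print(has_conversion_error("\u2604 is just a comet"))     # False
```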
Re: The comet circumflex system.
fascinating to get immersed in the simulation. Hopefully it can go live on the web when I can get it finished, then I can respond in comet circumflex language to any emails in comet circumflex language which arrive from within the simulation. The use of the Songs about Landscape font is remarkably effective in producing a web site for the purpose of the simulation, as headings and paragraphs can be set out. William Overington 30 October 2002
Re: Character identities
Summary: Would it be possible to define the U+FE00 variant sequence for a with two dots above it to be a with an e above it, and similarly U+FE00 variant sequences for o with two dots above it and for u with two dots above it, and possibly for e with two dots above it as well? I may not have got the details right about this suggestion, but, if the general idea is thought good, I am sure that one of the experts on this list could codify it properly. It seems to me that there is middle ground between the two views being expressed. Suppose, for example, hypothetically, that there is a font available in Germany, named Volksmusik, which is a display font intended for setting headings in modern German, such as for the headings in advertisements for restaurants and so on, and that in that font the a umlaut, o umlaut and u umlaut are all expressed using a mark which is something like a small letter e. Now suppose that a theatre restaurant manager has set out the text required for a menu for some special gala evening to be held soon, using a plain text editor on a PC with a font such as Arial, with a umlaut characters appearing many times, sometimes in headings and sometimes in the main body of the text. The manager stores the text on a floppy disc, walks down the road to the print shop and explains to the print shop manager that here is the text content for the menus in Arial: could the print shop please supply 500 menus using that text content, yet jazzing it up a bit so that the heading on each of the four pages is in a fancy typeface in a different colour? It should then be quite straightforward for the print shop manager to copy the text onto the clipboard from the Arial file, paste it into some other file, change the font for each of the page headings to the Volksmusik font, and make the font for the rest of the menu some plainer font. 
Thus, some a umlaut characters originally keyed by the restaurant manager would display on the final menu as a with two dots above and some a umlaut characters keyed by the restaurant manager would display on the final menu as a with a small letter e above. The restaurant manager is, however, studying part-time for a research degree at the local university. This involves producing essays about various aspects of the printing of German literature, including quoting passages from earlier times, taking care to distinguish clearly between a with two dots above it and a with an e above it, all while using a plain text file, so that there is maximum portability in sending copies of the essay to various people, including the project supervisor at the university and the editors of various learned journals. How is the a with an e above it set, bearing in mind that there is no precomposed a with an e accent above character in regular Unicode and also that it would be nice if the text could be searched for keywords using just the usual search methods? Would it be possible to define the U+FE00 variant sequence for a with two dots above it to be a with an e above it, and similarly U+FE00 variant sequences for o with two dots above it and for u with two dots above it, and possibly for e with two dots above it as well? I may not have got the details right about this suggestion, but, if the general idea is thought good, I am sure that one of the experts on this list could codify it properly. William Overington 30 October 2002
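The suggested sequences can at least be written down and tested for the searchability property the posting asks about. A Python sketch follows, with the caveat that the variant meanings below are purely the posting's suggestion: Unicode only gives effect to variation sequences that are listed in the standard, and these are not among them.

```python
VS1 = "\uFE00"  # VARIATION SELECTOR-1

# Letters with two dots above for which an "e above" variant glyph is
# suggested in the posting: a, o, u and possibly e.
SUGGESTED = "\u00E4\u00F6\u00FC\u00EB"

def e_above_variant(ch: str) -> str:
    # Append VS1 to request the hypothetical "small letter e above" form.
    if ch not in SUGGESTED:
        raise ValueError("no variant suggested for this character")
    return ch + VS1

# The plain-text searchability goal: the base letter is still present,
# so an ordinary keyword search for a umlaut still matches.
word = "G" + e_above_variant("\u00E4") + "ste"
print("\u00E4" in word)  # True
```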
Re: Unicode plane 14 language tags.
John Cowan commented. William Overington scripsit: It seems to me that deprecating these language tags might be a bad thing as the language tags could well have potential use in plain text files on the DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) platform in order to signal to a Java program accessing a text file the language in which any particular text is written. Of course, deprecation does not mean that the characters cannot be used, still less (what it means in most standards bodies) that they may be removed in future. Once in Unicode, always in Unicode. Oh, that is interesting. So what exactly is the public consultation about deprecating the plane 14 language tags about? If the Unicode Technical Committee decided to deprecate the plane 14 language tags, what would be the effect of that decision? Nevertheless, on the facts described, I agree that this is an appropriate use of Plane 14. However, I am somewhat skeptical that the facts *are* as described: is it really the case that *plain* text files are being used here? The DVB-MHP platform is a broadcast platform. Java programs are broadcast in a unidirectional, cyclic manner so as to produce effectively a disc in the sky. This uses my telesoftware invention. The word telesoftware, and its etymology, are in the Oxford English Dictionary, second edition, volume 17. The telesoftware concept was also featured in the USA in the first issue of the magazine Personal Computing, published back in the 1970s. The Java programs are authored by content authors. The Java programs may be self-contained or may use support files. The format of those support files is up to the author of each Java program, though some formats such as png (Portable Network Graphics) have special standing. Plain text files are one of the choices which a content author may choose to use. A content author could also use a fancy text format if he or she so chooses. 
I am not suggesting that all the files used by the Java programs which are broadcast as telesoftware programs will be plain text, only that plain text files could be used. As the DVB-MHP system uses Java, and Java uses Unicode, then the DVB-MHP system uses Unicode, and what is contained in Unicode is thus of interest to content authors who would like to author content for the DVB-MHP platform. The DVB-MHP system is up and running on a regular basis in Finland and Germany. There is worldwide interest in the DVB-MHP system. Certainly, from my own perspective, I feel that plain text files may be very important for information content upon the DVB-MHP channel. I feel that language tags could be very useful as a feature in such use. William Overington 29 October 2002
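As a sketch of how the Plane 14 mechanism under discussion works: a tag begins with U+E0001 LANGUAGE TAG, followed by copies of the ASCII tag text shifted up into the tag-character range U+E0020..U+E007E, and U+E007F CANCEL TAG ends the tagged span. The helper functions below are my own illustration of how a program receiving broadcast text might read such a tag; they are not from any DVB-MHP specification.

```python
LANGUAGE_TAG = "\U000E0001"
CANCEL_TAG = "\U000E007F"

def tag_language(text: str, lang: str) -> str:
    # Prefix text with a Plane 14 language tag such as "de" or "fi".
    # Each ASCII tag character is shifted up by 0xE0000.
    shadow = "".join(chr(0xE0000 + ord(c)) for c in lang)
    return LANGUAGE_TAG + shadow + text + CANCEL_TAG

def read_language(tagged: str) -> str:
    # Recover the language code from a tagged string, or "" if untagged.
    if not tagged.startswith(LANGUAGE_TAG):
        return ""
    lang = []
    for ch in tagged[len(LANGUAGE_TAG):]:
        if 0xE0020 <= ord(ch) <= 0xE007E:
            lang.append(chr(ord(ch) - 0xE0000))
        else:
            break
    return "".join(lang)

print(read_language(tag_language("Guten Tag", "de")))  # de
```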
Re: Unicode plane 14 language tags.
Doug Ewell wrote as follows. [snip] Right off the bat, though, I thank the UTC for initiating this public review process which allows non-members like me to get their two cents in regarding Unicode policies. (Hmm, two American-specific figures of speech in one sentence -- perhaps it should have been tagged en-US.) Yes, I too am grateful for this public review process. I do note however that review 3 refers to a document which is only available to Unicode Consortium members, which seems a strange thing if views of interested individuals are being sought. Also, it is a pity that this new era of Unicode glasnost (displayed with a ligature? :-) ) comes so shortly after the last Unicode Technical Committee meeting the minutes of which state the consensus about no more ligatures being added to the U+FBxx block. Surely the matter of ligatures would be a good topic upon which to conduct such a public review. So, I wonder if at the meeting due to be held from 5 November 2002, perhaps it might please be considered as to whether consultation 5 - precomposed ligatures could be made a topic of a public review in this manner ready to be considered at the Unicode Technical Committee meeting after that meeting, so that there is the time and opportunity for widespread consideration to take place. William Overington 29 October 2002
The comet circumflex system.
Readers interested in internationalization using Unicode might like to know that I have recently added some documents about the comet circumflex system to the web. The introduction and index page are as follows. http://www.users.globalnet.co.uk/~ngo/c_c0.htm The main index page of the webspace is as follows. http://www.users.globalnet.co.uk/~ngo William Overington 29 October 2002
Re: Character identities
John Hudson commented. At 02:46 10/26/2002, William Overington wrote: I don't know whether you might be interested in the use of a small letter a with an e as an accent codified within the Private Use Area, but in case you might be interested, the web page is as follows. http://www.users.globalnet.co.uk/~ngo/ligatur5.htm I have encoded the a with an e as an accent as U+E7B4 so that both variants may coexist in a document encoded in a plain text format and displayed with an ordinary TrueType font. If anyone were interested, he could do this himself and use any codepoint in the Private Use Area. The meaning which I intended to convey was as follows. I don't know whether you might be interested in having a look at a particular example of the use of a small letter a with an e as an accent codified within the Private Use Area by an individual with an interest in applying Unicode, but in case you might be interested in having a look at that particular example, the web page is as follows. If, following from your response to the way that you read my sentence, someone were interested in defining a codepoint in the Private Use Area then certainly he or she could do that himself or herself and use any codepoint in the Private Use Area. However, exercising that freedom is something which could benefit from some thought. If someone wishes to encode an a with an e as an accent in the Private Use Area, he or she may wish to be able to apply that code point allocation in a document. If he or she looks at which Private Use Area codepoints are already in use within some existing fonts, then selecting a code point which is at present unused in those fonts might give a greater chance of his or her new character assignment being implemented than choosing a code point for which those fonts already have a glyph in use. Searching through such fonts takes time and requires some skill. 
If someone does wish to use a Private Use Area code point for an a with an e accent, then using U+E7B4 gives a possible slight advantage in that the code point is already part of a published set of code points available on the web, for, even though that set of code points is not a standard, it is a consistent set and other people might well use those codepoints as well. However, anyone may produce and publish such a set of code point allocations of his or her own if he or she so wishes, or indeed keep them to himself or herself. Yet I was not seeking to make any such point in my posting. I simply added to a thread on a specialised topic what I thought might be a short interesting note with a link to a web page at which some readers might like to look. The web page indeed provides two external links to interesting documents on the web. Maybe it is time to include a note in the Unicode Standard to suggest that 'Private' Use Area means that one should keep it to oneself. Well, at the moment the Unicode Standard does include the word publish in the text about the Private Use Area. I have published details of various uses of the Private Use Area on the web yet not mentioned them in this forum. For example, readers might perhaps like to have a look at the following. http://www.users.globalnet.co.uk/~ngo/ast07101.htm Anyone who chooses to do so might like to have a look at the following file as well, which introduces the application area. http://www.users.globalnet.co.uk/~ngo/ast02100.htm This is an application of the Unicode Private Use Area so as to produce a set of soft buttons for a Java calculator so that the twenty hard button minimum configuration of a hand held infra-red control device for a DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) television can be used in a consistent manner to signal information from the end user to the computer in the television set. I am very pleased with the result. 
The encoding achieves a useful effect while being consistent for information handling purposes with the Unicode specification, so that an input stream of characters may be processed by a Java program without any ambiguity over whether a particular code point is a printing character or a calculator button (or indeed mouse event or simulated mouse event as mouse events are also encoded using the Private Use Area in my research). William Overington 29 October 2002
Re: Character identities
I don't know whether you might be interested in the use of a small letter a with an e as an accent codified within the Private Use Area, but in case you might be interested, the web page is as follows. http://www.users.globalnet.co.uk/~ngo/ligatur5.htm I have encoded the a with an e as an accent as U+E7B4 so that both variants may coexist in a document encoded in a plain text format and displayed with an ordinary TrueType font. http://www.users.globalnet.co.uk/~ngo William Overington 25 October 2002
Unicode plane 14 language tags.
On the http://www.unicode.org/ website is a link entitled Public Issues for Review which link leads to the http://www.unicode.org/review/ web page. The first such issue upon which comments are invited is the following proposal. Deprecate the Plane 14 Language Tags It seems to me that deprecating these language tags might be a bad thing as the language tags could well have potential use in plain text files on the DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) platform in order to signal to a Java program accessing a text file the language in which any particular text is written. At the present time I have no plans to use the Unicode language tags myself, yet it does seem to me a pity that just as DVB-MHP, which uses Unicode, is starting to be run in more than one country that an existing method of encoding information about languages is possibly to be formally deprecated. Now, there may be good reasons for the deprecation, yet none are stated on that web page. I feel that I would like to mention the matter of the possibility of using language tags upon the DVB-MHP platform so that that can be taken into account by the Unicode Technical Committee when it discusses the matter. Certainly I am hoping to send in an informed comment upon the matter in the manner mentioned on the web page using the online contact form. However, before doing so, I am wondering if perhaps the reasons for suggesting the deprecation of plane 14 language tags could please be discussed in this mailing list. DVB-MHP broadcasts have recently begun in Germany, there is information on the http://www.mhp-forum.de website. The text information is in German, though there are lots of pictures and for many of them clicking upon them enlarges them. I found the language translation facility at http://www.google.com very useful for translating the text. Germany follows Finland in introducing regular DVB-MHP broadcasts. 
Information on the DVB-MHP system is available at the http://www.mhp.org website, in English. There is also the discussion forum at the http://forum.mhp.org website. William Overington 26 October 2002
Re: XML Primer (was Keys. (derives from Re: Sequences of combining characters.))
Shawn Steele wrote to the [EMAIL PROTECTED] list, not directly to me, yet began by writing. Mr. Overington, There is then a long document of very helpful information, for which I am grateful. Mr Steele then concludes with the following. I hope that this example improves your understanding of XML and how it may be applied to your inventions. As others have mentioned, this topic is digressing from the purpose of this message board and would be best discussed off line or in a different forum. Well, a letter addressed to me could have been sent by private email. - Shawn Shawn Steele Software Developer Engineer Microsoft Unfortunately, this is then followed by the following. My comments in no way endorse the original Well, that is fine, the letter has been posted to the Unicode list from a Microsoft address, so a clarification makes the situation clear just in case anyone had thought that in some way it might. and are not intended to confer legitimacy, Ah! That is not fine. The original is entirely legitimate and there is no need for legitimacy to be conferred at all, also the conferring of legitimacy is not something which is within the powers of Microsoft to confer, as Microsoft is a corporation and does not vote in public elections, let alone have jurisdiction in such matters. Mentioning legitimacy in that way in a document from Microsoft, a member of the Unicode Consortium, is very unfair. rather they are merely intended to be educational. Well, they are merely intended to be educational. No rather about it. This posting is provided AS IS with no warranties, Well, that is fine. and confers no rights. What rights are being referred to here? William Overington 27 September 2002
Re: Keys. (derives from Re: Sequences of combining characters.)
Peter Constable commented as follows. On 09/26/2002 06:05:45 AM William Overington wrote: Dallas is 6 hours behind England on the clock. I'm going to refrain from commenting on anything beyond the markup issues As you wish. Though did you stick to that even in the same sentence? -- and I'm continuing with that only because it's an easy follow-on to what I already wrote, As you wish. even though there is every indication that the sensibility of it will be ignored. This did not appear to have meaning. I checked on the meaning of the word sensibility just to make sure. Did you intend to convey the meaning "the good sense of what I write" rather than "the sensibility of it"? Yet what indication whatsoever do you have that I ignore what you write? I do not always agree with you, yet where specific references to documents on the web are made I always attempt to obtain them and study the points you make. Certainly, I may not agree with you. Sometimes I agree, sometimes I do not agree and sometimes I am undecided in a matter. That surely is the nature of critical scholarship and research. A document would contain a sequence such as follows. U+2604 U+0302 U+20E3 12001 U+2460 London U+2604 U+0302 U+20E2 You could just as easily have used S C=12001London/S or S C=12001 P1=London/ which are only slightly more verbose, but which follow a widely-implemented standard that can be parsed by lots of existing software, for which there are a large number of tools available, and which a vast number of individuals, businesses and other agencies have an interest in. Your markup convention is completely proprietary, Thank you. That is excellent. I designed the comet circumflex key with the specific intention that it was creatively original whilst being expressible using a standard all-Unicode font. it has no existing software support, and nobody but you has any interest in it. You have no basis whatsoever for claiming that nobody other than me has any interest in it. 
Maybe you are not interested, maybe some people you know are not interested, yet I feel that it is unfair for you to make such a statement without evidence when writing from an established organization as that remark may prejudice people against taking an interest in helping to develop the idea because of a political dimension of going against the tide. You have your position and I feel that you should allow someone who does not have such a position an even-handed chance to put forward an idea and have it considered on its merits. You tell me which one is more likely to result in productive work and adoption by others. Likelihood of success and what actually happens are not the same thing. I do not know which is more likely as I do not know what has happened already. Some people may have deleted the email, some may have read it and disregarded it, yet it is possible that some people might have tried to produce a comet circumflex button on the screen using an all-Unicode font and might be considering the possibilities of how the system could be applied or might even be writing an experimental software program which can take comet circumflex sequences and process them through a database. Look, for example, at The Respectfully Experiment in the Unicode mailing list archives. There a result was assumed and something different was observed in practice. quote that it is because I am an inventor, interested in pushing the envelope as to what is possible scientifically and technologically. end quote Marco asked me a specific question, so I answered what he had asked. Perhaps there is an [EMAIL PROTECTED] list somewhere where you might find greater interest in your ideas than here. That is unfair of you. You have chosen to respond to my posts and I have answered the questions which you asked. You even stated in the same post. 
quote I'm going to refrain from commenting on anything beyond the markup issues end quote The topic of keys generally which I have introduced is potentially a far-reaching development in the application of markup in Unicode based systems. My own comet circumflex system may be highly useful in business communications and distance education. I am happy to respond to questions and to consider documents which people suggest. None of us here mind invention, but I think most would believe that inventiveness is most productive when building off the advancement of others rather than reinventing wheels or widgets. XML exists, and it works. XML exists and it uses U+003C in a way that makes using U+003C with the meaning LESS-THAN SIGN in body text intermixed with markup sections awkward. That feature of XML may not matter for situations involving simply encoding literary works, yet for a comprehensive system which can include the U+003C character with the meaning LESS-THAN SIGN in body text and in markup parameters, it does not suit my need. Besides the fact that your proposed markup convention is not a good idea, it has nothing
Re: Keys. (derives from Re: Sequences of combining characters.)
Peter Constable wrote as follows. On 09/26/2002 03:42:16 AM William Overington wrote: Well, it might have been 03:42:16 AM where you are, indeed it probably was, as Dallas is six hours behind England on the clock, but I would not want people to think that I write my posts in the middle of the night! On the one hand, you say XML does not suit my specific need as far as I can tell. But you also said Documents with the code sequence are intended to be sent over the internet as email, used as web pages and broadcast in multimedia broadcasts over a direct broadcast satellite system, so the codes which you suggest would be unsuitable. In that quote, the codes which you suggest referred to your list of specific Unicode code points, as follows. quote Sorry to be blunt, but that's silly. If you need a special-purpose character (a code-sequence, to be more precise) for use within your specialised application, use one of FDD0..FDEF, FFFE, FFFF, 1FFFE, 1FFFF, 2FFFE ... 10FFFE, 10FFFF. They are non-characters available for exactly this use. end quote I maintain that they are unsuitable for use in documents which are to be sent from one end user to another. Yet the first part of my sentence which you have quoted could, by going to the final comma and converting it to a full stop, form a sentence on its own as follows. Documents with the code sequence are intended to be sent over the internet as email, used as web pages and broadcast in multimedia broadcasts over a direct broadcast satellite system. So, I will reason from that. You also quote me as stating the following sentence. XML does not suit my specific need as far as I can tell. I am happy with that. The two sentences are entirely consistent. Are you perhaps trying to make a deduction by the fallacy of the undistributed middle, along the following lines. William's need is a markup system. XML is a markup system. William's need is XML. It may well be that XML could be used to carry the comet circumflex code numbers which I am devising. 
I am not saying that it could not be so used. I am simply saying that XML, as I understand it, does not suit my specific need. For example, if I understand it correctly, XML uses U+003C in a document in such a manner that it cannot be used directly with the meaning LESS-THAN SIGN in the body of the text. For me, that is a major limitation of XML. Now, I am not trying to make some big issue out of this by criticising XML as I am not trying to criticise XML, yet to my mind that is a very big legacy issue which I do not want to have as a problem in my research in language translation and distance education. Maybe one day Unicode will encode special XML opening and closing angle brackets so that XML can operate without that problem. However, as XML uses the U+003C character in that manner at the moment, for me it is a problem and it has led me to use the key method using a comet circumflex key. Also, I do not need to have all those < and > characters and = characters and / characters within messages. One of the things that is especially useful about XML and related technologies is the facility with which data can be repurposed. You have one schema for marking up data, and stylesheets that transform it as needed for different publishing / usage contexts. Also, I don't see how it can be that a character sequence such as U+003C U+0061 U+003E can't be useful to you when some ridiculous character sequence like U+2604 U+0302 U+20E3 is. Well, U+2604 U+0302 U+20E3 is not ridiculous. It is entirely permissible within the Unicode specification. I have used combining characters productively, in accordance with the rules set out in the specification. Please see section 7.9. The button displays using an all-Unicode font. If you think it ridiculous then maybe that is good evidence of its originality as a piece of creativity. A comet circumflex key could be viewed as a piece of original art. 
I specifically designed it so as to be a design which involves an inventive leap so as to produce something new and unexpected, which someone skilled in the art would not produce as the application of skill in the existing art without invention, yet which would display properly using an all-Unicode font. The sequence U+003C U+0061 U+003E is unsuitable because it begins with a U+003C character and I do not want the use of U+003C to mean LESS-THAN SIGN to be unavailable in a simple direct manner. I want to be able to use the comet circumflex translation system in documents which contain mathematics and software listings as well as literary text. So, I have decided to use a straightforward system which allows me to do that without problems. An added bonus of using the comet circumflex key is that documents containing comet circumflex codes do not necessarily need to contain any characters from the Latin alphabet. William Overington 27 September 2002
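For comparison, it may be worth noting concretely how XML itself handles the LESS-THAN SIGN concern raised in this exchange: a literal U+003C in body text is carried through the predefined entity &amp;lt; and is recovered exactly by any conforming parser. The sketch below uses Python's standard library; the element name S and attribute C follow the example given earlier in this thread and are otherwise hypothetical.

```python
# Sketch: a literal "<" (U+003C) in body text survives an XML round
# trip via the predefined entity &lt;.  Element S and attribute C are
# hypothetical names taken from the example in this thread.
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

body = "for all x < y, f(x) < f(y)"
escaped = escape(body)                      # '<' becomes '&lt;'
xml_doc = '<S C="12001">' + escaped + "</S>"

# A standard parser recovers the original body text exactly.
elem = ET.fromstring(xml_doc)
assert elem.text == body
assert elem.get("C") == "12001"
```

This does not settle the question of which convention is preferable; it only shows that U+003C is reserved in its raw form, not lost to the author of the body text.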
Re: Keys. (derives from Re: Sequences of combining characters.)
Peter Constable commented as follows. On 09/25/2002 05:55:02 AM William Overington wrote: For example, I am looking at using the following sequence so as to produce a special purpose key within documents. U+2604 U+0302 U+20E3 Hopefully that sequence will be so unlikely to occur other than in my specialised application that the sequence can be used uniquely for that specialised application. Sorry to be blunt, but that's silly. If you need a special-purpose character (a code-sequence, to be more precise) for use within your specialised application, use one of FDD0..FDEF, FFFE, FFFF, 1FFFE, 1FFFF, 2FFFE ... 10FFFE, 10FFFF. They are non-characters available for exactly this use. Documents with the code sequence are intended to be sent over the internet as email, used as web pages and broadcast in multimedia broadcasts over a direct broadcast satellite system, so the codes which you suggest would be unsuitable. If you need real character sequences for markup, there's this thing called XML. Perhaps you've heard of it. It's worth taking a look at; I think it really might catch on some day. I have heard of XML, though I know little about it. I have read some introductory documents about XML. XML does not suit my specific need as far as I can tell. William Overington 26 September 2002
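The noncharacters mentioned in the quoted advice form a fixed, easily testable set: U+FDD0..U+FDEF plus the last two code points of each of the seventeen planes, sixty-six in all. A small sketch (the function name is my own, not from any standard API):

```python
# Sketch: test whether a code point is one of the 66 Unicode
# noncharacters: U+FDD0..U+FDEF, plus the plane-final pairs
# FFFE/FFFF, 1FFFE/1FFFF, ... up to 10FFFE/10FFFF.
def is_noncharacter(cp: int) -> bool:
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    return cp <= 0x10FFFF and (cp & 0xFFFF) in (0xFFFE, 0xFFFF)

assert is_noncharacter(0xFDD0)
assert is_noncharacter(0xFFFE)
assert is_noncharacter(0x10FFFF)
assert not is_noncharacter(0x2604)   # COMET is an ordinary character
```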
Re: Keys. (derives from Re: Sequences of combining characters.)
inert and not translated, or else translated in some parameterized form. Mr Cimarosti added the following. Mr. Overington, why do you have this irresistible compulsion to mix up apples and horses? (I feel that the usual apples and oranges is not enough to convey the idea fully.) I like the phrase apples and horses. I have not heard it before, is it original to you? It has inspired me to write a song. http://www.users.globalnet.co.uk/~ngo/song1018.htm I suppose that the answer to your question is that, if indeed it is a personality feature which can be described as you suggest, it is because I am an inventor, interested in pushing the envelope as to what is possible scientifically and technologically. Sometimes such an approach is fruitless, yet at other times it can be very successful. In relation to the keys technique which I have suggested generally, and to the Comet Circumflex system in particular, whether these ideas will be successful or fruitless is something which cannot presently be determined. William Overington 26 September 2002
Keys. (derives from Re: Sequences of combining characters.)
The recent discussion on sequences has led me to have a look through the various combining characters and I have found the following. U+20E3 COMBINING ENCLOSING KEYCAP It has occurred to me that the use of a sequence of a base character, then one or more combining characters so as to produce a sequence which would be otherwise unlikely, followed by U+20E3 might be a very effective way to include specialised markup systems within a plain text file without disrupting the normal textual information conveying capabilities of a file. An all-Unicode font would then produce a graphic representation of the key, without any prior arrangement being necessary, so that such marked-up sequences could be produced using just a regular all-Unicode plain text editor. A receiving program with a specialized plug-in could then decode the markup, or it could be decoded manually in some cases. For example, I am looking at using the following sequence so as to produce a special purpose key within documents. U+2604 U+0302 U+20E3 Hopefully that sequence will be so unlikely to occur other than in my specialised application that the sequence can be used uniquely for that specialised application. I am also thinking in terms of using the following sequence to indicate the end of the markup sequence. U+2604 U+0302 U+20E2 I have it in mind that characters in the range U+2460 through to U+2473 could be used before parameters within the markup system. Also, I have noticed that in the document U20D0.pdf that U+20E4 is shown, in the listing, in magenta whereas U+20DF is shown in black. Could someone say what significance the magenta colouring in the document has please? Is it perhaps to indicate additions since the previous issue of the document? William Overington 25 September 2002
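As a concrete illustration of how a receiving program with a specialized plug-in might decode such markup: the sketch below scans plain text for the proposed start and end key sequences and splits out the parameters. The start key, end key, and the circled-number parameter markers U+2460..U+2473 come from the post above; everything else (the function name and the exact splitting behaviour) is an assumed illustration, not an agreed format.

```python
# Sketch: locate the proposed "comet circumflex" key sequences in
# plain text.  START and END are the sequences from the post; the
# parameter-splitting rule is an assumption for illustration.
START = "\u2604\u0302\u20E3"   # COMET + COMBINING CIRCUMFLEX ACCENT + COMBINING ENCLOSING KEYCAP
END = "\u2604\u0302\u20E2"     # same base, but COMBINING ENCLOSING SCREEN as the end key

def extract_keys(text):
    """Return (code, [parameters]) for each START...END span found."""
    results = []
    pos = 0
    while True:
        s = text.find(START, pos)
        if s == -1:
            return results
        e = text.find(END, s + len(START))
        if e == -1:
            return results
        body = text[s + len(START):e]
        # Split the body at circled-number markers U+2460..U+2473,
        # which the post proposes as parameter introducers.
        parts, current = [], []
        for ch in body:
            if 0x2460 <= ord(ch) <= 0x2473:
                parts.append("".join(current))
                current = []
            else:
                current.append(ch)
        parts.append("".join(current))
        results.append((parts[0].strip(), [p.strip() for p in parts[1:]]))
        pos = e + len(END)

# The example document sequence from this thread:
doc = START + "12001" + "\u2460" + "London" + END
assert extract_keys(doc) == [("12001", ["London"])]
```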
Re: entities with breve
Peter Constable wrote as follows. The answer would be to encode characters comparable to U+0361. A combining double breve has already been approved for version 4.0. I intend to propose (unless someone gets around to it before me) a combining double inverted breve below. In the mean time, one can encode these as PUA characters (which is an interim solution we're going to be using, at least for some purposes). Could you please say some more about what is going to be encoded in regular Unicode and with which code points please? In relation to your encoding these characters as Private Use Area characters, I wonder if you could say some more about this, both in relation to which code points you are intending to use and also as to whether encoding a combining accent character or a combining double breve into the Private Use Area could lead potentially to any problems over a rendering system recognizing the character as being a combining character (please know that I have no specific reason to think that it would, it is just a possibility about which I wondered when considering various uses of the Private Use Area). William Overington 25 September 2002
Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)
Kenneth Whistler wrote, as part of a longer response to my original posting. William Overington asked: [snip] I wonder if consideration could please be given as to whether this matter should be left unregulated or whether some level of regulation should be used. I think this should depend first on a determination of whether there is a demonstrated need for an actual representation of these sequences -- which ought to be determined by the people responsible for the data stores which might contain them, namely the online bibliographic community. [further remarks here snipped] Actually, this matter to which I was intending to refer was as follows, being more general than just the romanization of Cyrillic characters. quote It seems to me that this matter of sequences of combining characters being used to give glyphs where different meanings are needed other than just locally and that glyphs for such meanings are only correctly displayed if a particular rendering system or a particular font are used touches at the roots of the Unicode system. It seems to me that the glyphs for such sequences are being left as if they were a Private Use Area unregulated system. I recognize that fonts have glyph variations in that, say, an Arial letter b looks different to a Bookman Old Style letter b, yet in that case the meaning is the same. I wonder if consideration could please be given as to whether this matter should be left unregulated or whether some level of regulation should be used. end quote In another post in the same thread, Ken states as follows. quote But that wasn't my point. There is no particular evidence that the ALA-LC conventions with the dot above the graphic ligature ties is in widespread use for romanizations of these particular languages, that I can see. 
So the *urgency* of solving this problem isn't there, unless the LC/library/bibliographic community comes to the UTC and indicates that they have a data interchange problem with USMARC records using ANSEL that requires a clear representation solution in Unicode. end quote The problem on which I am seeking discussion, please, is whether, in the present state of the rules, there would be any need for any bibliographic community to approach the Unicode Consortium over such a matter, and, if it is the case that they would not need to do so, would it be better to seek to change the rules now. It is convenient to consider the situation in relation to the romanization of Cyrillic characters, yet similar considerations may well potentially also apply to topics such as the Byzantine legal texts. There may well be other topics to which similar considerations may apply. For example, please suppose that there were a committee called the Romanization of Cyrillic Committee. Suppose that that committee were to have various meetings and decide that for a ts romanization ligature that t U+FE20 s U+FE21 suits them fine, and that for the ts with a dot above romanization ligature that t U+FE20 s U+FE21 U+0307 suits them fine and publishes a list of assignments and example glyphs. The glyph for the ts with a dot above ligature in that publication has the dot above the curved line, centred horizontally. It is only later that someone with expert knowledge of the Unicode standard sees the published list and notices that the glyph shown in the document is, in fact, not the way that the glyph should appear according to the Unicode standard. By this time, many copies of the document have been published and sent to libraries around the world! Databases may have started to be converted to what that publication may well be calling the new Unicode based system. This might sound impossible, yet what is the present alternative? 
There is no way to formally register such sequences with the Unicode Consortium! I suggest that it might be a good idea to have an infrastructure whereby the Unicode Consortium registers sequences of combining characters and example glyphs, categorized as to application. This would have potentially far reaching benefits. Suppose, for example, that such an infrastructure existed, and that there is a mathematician, M, and a font designer, F, who do not know each other. M is writing a research paper on a particular branch of mathematics, where one of the key reference papers was written by an author whose name is written in Cyrillic characters, yet which name also has a romanized version. M finds that that romanization needs a character to represent the ts romanization ligature. How can M, who is using a word processor to prepare the research paper, insert that character into the document? M is keen to insert the ts ligature in a form compatible with the standard bibliographic method for romanization of Cyrillic names. Fortunately, M finds that the word processor has available various special characters and finds a ts ligature and inserts it in the document. Behind the scenes the word processor software inserts
Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)
In the discussion about romanization of Cyrillic ligatures I asked how one expresses in Unicode the ts ligature with a dot above. Regarding Ken's response to the Byzantine legal codes matter, it would appear possible that the way that the ts ligature with a dot above for romanization of Cyrillic could be represented in Unicode would be by the following sequence. t U+FE20 s U+FE21 U+0307 The ordinary ts ligature for romanization of Cyrillic is expressed as follows. t U+FE20 s U+FE21 The second example is from the recent thread on Romanized Cyrillic bibliographic data. In the recent thread about Byzantine legal codes, the following sequences were suggested. U+0069 U+0313 U+0301 U+0055 U+0313 The second of the above requires a rendering different from what direct reading of the Unicode specification might suggest. Ken's reply seems to suggest that display of such sequences would be renderer dependent or font dependent. It appears to me that the ts ligature with a dot above, and a similar ng ligature with a dot above, are already needed for the Library of Congress romanization of Cyrillic system. The following directory contains a lot of pdf files. http://lcweb.loc.gov/catdir/cpso/romanization The ts ligature with a dot above can be found on page 2 of the nonslav.pdf file. The ng ligature with a dot above can be found on page 13 of the same file. Capital letter versions of the two ligatures are needed as well. The two sequences U+0069 U+0313 U+0301 and U+0055 U+0313 mentioned above, and possibly others, will be needed for the Byzantine legal codes. It seems to me that this matter of sequences of combining characters being used to give glyphs where different meanings are needed other than just locally and that glyphs for such meanings are only correctly displayed if a particular rendering system or a particular font are used touches at the roots of the Unicode system. 
It seems to me that the glyphs for such sequences are being left as if they were a Private Use Area unregulated system. I recognize that fonts have glyph variations in that, say, an Arial letter b looks different to a Bookman Old Style letter b, yet in that case the meaning is the same. I wonder if consideration could please be given as to whether this matter should be left unregulated or whether some level of regulation should be used. William Overington 18 September 2002
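For reference, the sequences under discussion can be written out code point by code point. The sketch below builds them and confirms, via Python's unicodedata module, that the marks involved are combining characters; it illustrates only the encoding, not how any particular renderer would display the result.

```python
# Sketch: the combining-character sequences from the thread above.
# U+FE20/U+FE21 are COMBINING LIGATURE LEFT HALF / RIGHT HALF;
# U+0307 is COMBINING DOT ABOVE.
import unicodedata

ts_ligature = "t\uFE20s\uFE21"            # ordinary ts romanization ligature
ts_ligature_dot = "t\uFE20s\uFE21\u0307"  # ts ligature with a dot above

assert [hex(ord(c)) for c in ts_ligature_dot] == \
    ["0x74", "0xfe20", "0x73", "0xfe21", "0x307"]

# All three marks have a nonzero canonical combining class,
# i.e. they are combining characters.
assert all(unicodedata.combining(c) for c in "\uFE20\uFE21\u0307")
```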
Re: ISRI SoEuro has just been created!!
One practical use of this code page which occurs to me is as follows. Suppose that on a Windows 95 PC (I am preparing this email on a Windows 95 PC) someone wishes to produce a graphic which includes the words of an Esperanto poem or song, the graphic being prepared using the Paint program. If a font with the layout which is being suggested is used, then the text can be set within the Paint program. It appears that, with a suitable font, sequences of holding down the Alt key then keying 0 from the numeric keypad at the right of the keyboard, followed by a number from 128 to 255 keyed from the numeric keypad at the right of the keyboard then releasing the Alt key would permit the keying of the characters of the new 8-bit set. On a Windows 98 platform the text can be set in WordPad using a Unicode font using Alt sequences (for example Alt 264 for C circumflex, Alt 265 for c circumflex) and then the graphic image copied onto the clipboard using Print Screen then pasted into Paint, yet for WordPad on at least this Windows 95 machine that will not work. Certainly, on a Windows 95 machine if someone has Word 97 installed, then Word 97 can be used to set the Esperanto text before using a Print Screen operation, though Word 97 is a premium package not available to people using minimum systems and possibly not available to people using an open access PC in a public library. So, as far as Esperanto goes, this code page offers the chance, if someone will make available a font using those codings, that people using a minimum Windows 95 system, perhaps in a public library setting, could produce elegant graphics using the Paint program. It would appear, on the face of it, that this new code page suggestion makes that facility available not only for Esperanto but also for a number of other languages. 
I hope that a suitable font is published on the web using this set of code points so that people who are using Windows 95 systems can have this additional facility available to them. I know, for example, that there is a font available for Tamil which uses the 8-bit code space. I feel that having such facilities available does not detract from Unicode; I feel that they tend to get people interested in producing end results and that in the long term that may well get them interested in Unicode. Or indeed, end users of such facilities may well have good knowledge of what is needed to use Unicode and might like to use Unicode but have to make the best of only having less than the very latest equipment available. I also know that there are various fonts available, such as some Fraktur fonts, which use the 8 bit codes from 128 to 255 for ligatures. Those fonts too are not using Unicode code point assignments, yet hopefully, as time goes on, those fonts will become updated so as to use Unicode code points, though that would appear to only be possible on operating system and software combinations which will recognize and act upon sequences using the U+200D ZERO WIDTH JOINER character so as to produce the ligatures, unless Private Use Area encodings for the ligatures are used. Unicode is very important, yet I feel that it is also very important that facilities are provided for people using the many older machines which are still in use around the world. This new code page may well help in the process of solving computing problems now. Those same problems can also be solved now using more modern equipment with later facilities, for those people that have access to those facilities. As time passes, maybe the Unicode solution will become universally the useful solution, yet for the present, the new code page may well have usefulness for some end users of computing equipment. 
While writing, can I please ask what characters A9 and B9 are meant to represent as they come out as black squares here? In using the Microsoft Paint program using the text tool I have found that some fonts such as Arial, Code2000 and Times New Roman offer various versions of the font with names such as Baltic within parentheses after the name of the font, which can be used using Alt ddd sequences and Alt 0ddd sequences, where ddd is a base 10 integer less than or equal to 255, to produce various sets of characters. How does this mechanism work, please? I have tried various values of ddd with various of the language groups and found a wide range of characters, yet so far I have not found any way of getting Esperanto accented characters into Paint on a Windows 95 machine using that technique. Is it possible to do so? Are there any charts of these code point allocations available please? So, I am wondering if the new code page could be added into some of those fonts in some way as that would then make Esperanto poems and songs settable using Microsoft WordPad and Microsoft Paint on Windows 95 machines? Would that produce additional facilities for end users of Windows 95 machines? William Overington 12
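In software terms, an 8-bit code page of the kind discussed above is simply a 256-entry mapping from byte values to Unicode code points: bytes below 128 pass through as ASCII, and each byte from 128 to 255 is assigned one character. The byte assignments in the sketch below are invented purely for illustration; they are not the actual ISRI SoEuro layout.

```python
# Sketch of decoding a custom 8-bit code page to Unicode.  The byte
# assignments here are HYPOTHETICAL illustrations, not the real
# ISRI SoEuro table.
HYPOTHETICAL_PAGE = {
    0xC6: "\u0108",  # Ĉ  LATIN CAPITAL LETTER C WITH CIRCUMFLEX
    0xE6: "\u0109",  # ĉ  LATIN SMALL LETTER C WITH CIRCUMFLEX
}

def decode_byte(b: int) -> str:
    if b < 0x80:
        return chr(b)  # the ASCII range passes through unchanged
    # Unassigned high bytes map to U+FFFD REPLACEMENT CHARACTER.
    return HYPOTHETICAL_PAGE.get(b, "\uFFFD")

assert decode_byte(0x41) == "A"
assert decode_byte(0xE6) == "\u0109"
```

The Alt-0nnn keying described in the post enters exactly such byte values; which character appears then depends entirely on the font's own table for that byte.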
Variation selector sequences for alternate glyphs. (derives from Re: various stroked characters)
Peter Constable kindly responded to my question in the original thread. Would the encoding that would be intended to be used in the long term use of Unicode be to use one of the characters from the range U+FE00 to U+FE0F following the main character code so as to indicate the glyph alternate? I have not to this point anticipated requesting variation selector sequences for these, but that is not beyond the realm of possibility. I wonder if the matter of variation selector sequences could be clarified for the general situation, please, that is, not necessarily in relation to the particular topic of stroked characters. Would a variation selector sequence be something specified and encoded by the Unicode Consortium or by some other standardization body or would it be a matter for end users on much the same basis as Private Use Area allocation please? William Overington 9 September 2002
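For concreteness, a variation selector sequence is simply the base character followed by one of the sixteen selectors U+FE00..U+FE0F (VS1..VS16). The sketch below builds such a sequence; the particular base character and selector chosen are arbitrary illustrations, not registered standardized sequences.

```python
# Sketch: forming a variation selector sequence.  VS1..VS16 occupy
# U+FE00..U+FE0F, immediately following the base character.
def vs_sequence(base: str, selector: int) -> str:
    """Append VS1..VS16 (selector = 1..16) to a base character."""
    assert 1 <= selector <= 16
    return base + chr(0xFE00 + selector - 1)

# Hypothetical example: EMPTY SET followed by VS1.
seq = vs_sequence("\u2205", 1)
assert len(seq) == 2
assert ord(seq[1]) == 0xFE00
```

Whether any given base-plus-selector pair means anything is exactly the standardization question raised in the post; the sequence itself is just two code points.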
Re: various stroked characters
products be easily adapted or would new products be needed? A good benchmark test might be to send c ZWJ t in a document and, using U+E707 to access a precomposed ct ligature glyph, display the ligature on the screen of a computer which cannot use an advanced format font by means of the receiving software automatically producing a temporary local document wherein the U+E707 code is used instead of the c ZWJ t sequence of the transmitted document. How difficult is that benchmark to achieve please? Is it a major software development or could it be written into a macro by a knowledgeable person within a few hours? William Overington 6 September 2002
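The benchmark described above amounts to a single substitution pass over the received text before display. A minimal sketch, assuming the post's example assignment of U+E707 (a Private Use Area code point, not a standard one) for a precomposed ct ligature glyph:

```python
# Sketch of the proposed benchmark: replace the c ZWJ t sequence with
# a Private Use Area code point before handing the text to a renderer
# that lacks smart-font ligature support.  U+E707 is the example
# assignment from the post, not a standard code point.
ZWJ = "\u200D"  # ZERO WIDTH JOINER

def substitute_ct(text: str) -> str:
    """Map c ZWJ t to the PUA ct-ligature code point U+E707."""
    return text.replace("c" + ZWJ + "t", "\uE707")

assert substitute_ct("c\u200Dt") == "\uE707"
assert substitute_ct("cat") == "cat"  # plain text passes through
```

As the post suggests, this is a small piece of software rather than a major development; the real work lies in agreeing the sequence-to-glyph table and having a font that covers it.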
Re: Double Macrons on gh...
Kenneth Whistler wrote as follows. In practice, fonts might simply choose to have ligatures for the entire sequence, to avoid complications of calculating the accent positions dynamically. For more examples, just look in dictionary pronunciation guides. --Ken An interesting problem which may arise is that the Unicode Consortium will not be specifying particular ligatures to include in fonts and that font designers may not have available from any public source a list of such ligatures for which to prepare the glyphs to include in a font. This could then result in a muddle in the future when end users are trying to use such ligatures in a document and find that for some key ligatures which they wish to use that the implementation in some fonts is by default action rather than special glyph, which default action may, for some requested ligatures, result in a typographically awful display. This issue first came to my attention in the matter of the ligatures for the romanization of Cyrillic names and unknown words, where special ligatures would be desirable due to the need to have U+FE20 and U+FE21 act in both ts and iu ligatures. I wonder if, for the guidance of font designers, there should be a list of desirable ligatures for which font designers might choose to prepare specific glyphs for inclusion in an advanced format font, the list prepared by consultation between the various dictionary publishers, libraries and so on. Such a list, while not obligatory for anyone to use, would nevertheless be a useful collected guide which font designers could use so that fonts could be designed so as to have individual glyphs for all of the ligatures on the list. The list could include the specific Unicode sequence to access each ligature. It may be that there would need to be more than one list, so as to provide for various specialised areas of activity without making a general list too large. Do you think that such a published list or lists would be useful? 
William Overington 31 August 2002
Re: Romanized Cyrillic bibliographic data--viable fonts?
Edward H Trager wrote as follows. ... I was also thinking about the issue of how do you get the highly qualified designers interested in such a project? In answer to the specific question. One might consider the possibility of offering them a fee-paid assessment of a portfolio of their work with the hope of receiving a qualification or some formal academic credit from an appropriate body. In relation to obtaining highly skilled and experienced designers to participate in the project. The project could be organized so as to include a training facility, as a distance education process, for those of us who are not expert type designers so that we learn on the project and thus the reward for the time and skill which we spend on the project is enhanced skills and professional experience in participating in a typographic project of such world class standing. If the second suggestion could be combined with the first suggestion, then the prospect of a high quality distance education and participation opportunity without the participants having to pay any fees might be the way to get the result which you desire. It could be organized as if each participant were carrying out a consecutive series of final year undergraduate projects. This approach may not be something in which everybody would wish to participate, yet it could be regarded as a magnificent opportunity by some of us. Including a training and learning opportunity in the project could be the factor that provides the necessary amplification of effectiveness to whatever funds are available so as to attract a good number of keen people who would put in a lot of effort. Participation in the project and being able to include that participation in a curriculum vitae would be a huge incentive to people who are not currently employed in the typography field and cannot realistically attend a full time course yet who would value the opportunity to gain professional quality skills and experience. 
This could be a wonderful opportunity! William Overington 31 August 2002
[Possibly off-topic] Fonts for experimental usage. (spins off from Re: Romanized Cyrillic bibliographic data--viable fonts?)
Peter Constable wrote as follows. On 08/27/2002 12:08:09 AM James Kass wrote: William Overington has mentioned the Softy editor. Please keep in mind that fonts are copyrighted material, and, mostly users are forbidden to modify them, even for internal use purposes. The best way to get characters added to a font is to ask the font's developer. I agree completely. Also, it's worth noting that font engineering involves rather more than just adding a few extra characters, especially when smart fonts are involved. Note, for instance, that some tools may trash the hints in a font. The overarching issue, though, as James pointed out, is that very often it is simply not legal to make such changes. - Peter James raises the important matter of intellectual property rights in fonts and suggests that the best way to get characters added to a font is to ask the font's developer. Peter agrees with James and adds some good computing reasons as to why, even if permission were available, simply adding a few extra characters without highly expert skills would not be an effective solution. The matter which concerns me is as to whether James' suggestion that the best way to get characters added to a font is to ask the font's developer, while probably quite true, is nevertheless, in effect, what the theory of procedural rules would, if that course of action were a formal motion for a meeting, term a pious motion. What I mean by this is that, for example, if someone does want a particular character added to a font, how effective, in practice, is such a request likely to be, qualitatively in terms of whether such a request would be accepted at all, and quantitatively, in terms of time scale and financial charge, as to how accessible such an addition would be to someone making such a request of a font's developer. 
Now, let me say at once that James has already shown in his posting that, in the particular case of the situation in the thread from which this discussion has spun, he has reacted proactively in setting about adding U+FE20 and U+FE21 to his own font, and hopefully the results of that addition will be available to all at the next release of that font. Yet is that a response which could be anticipated as typical of font designers? James produces mainly one huge font which covers many Unicode characters and is continually adding items to produce a better version for a later release. What is the situation with other font developers? Is it perhaps the case that some font designers, or a team, produce a particular font and then wrap up the project, so that adding a few extra characters at a later date would mean a substantial restarting of the project? I do not know the answer to this and I wonder if some font designers could perhaps comment upon the possibilities and the modalities of someone getting a font developer to add a few characters to an existing font please. The whole situation has led to me trying to think out the problem of how someone could get a few extra characters added to a font and, recognizing the issues and problems that James and Peter mention, I wonder if I may perhaps put forward a few thoughts on the matter, which might perhaps lead to a new infrastructural facility for end users of the Unicode system. Firstly, I mention that I know, at present, very little about font authorship. I have only used the Softy program and not all of the facilities in Softy yet. I am aware that there are various sophisticated font authoring packages available, which are expensive and not widely accessible by many end users of Unicode. 
When using Softy, one method of designing a glyph is to load a template, which is a .bmp file of a large monochrome image of the desired end result, say about 200 pixels by 200 pixels or thereabouts, and then to use Softy to automatically outline the template so as to produce the Bézier curves for the glyph. The template file can be produced using a widely available package such as Microsoft Paint. I have had a lot of learning fun producing experimental glyphs for a few characters using Paint in this way, including using the line, ellipse and curve tools of Paint to produce two tengwar-inspired fantasy characters, namely a double thorn and a double thorn with tilde, in a manuscript style, by drawing upon a background grid of a different colour. I wonder whether it would be possible for some interested people to devise some basic grids in .bmp format with green and cyan lines upon them so that the containing boxes for letters x, h, p, a circumflex, A circumflex and so on were indicated, so that any interested end user could draw a desired character, using Paint, upon a copy of such a grid using black, then erase the green and cyan lines. This could have the effect that if, say, twenty to fifty end users each produced designs for five or more characters, the artwork for an easily extendible font could become available. Clearly
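The grid-and-erase workflow described above can be sketched in code. This is a minimal illustration only, assuming, hypothetically, that a pixel is simply an (R, G, B) tuple held in a nested list; a real workflow would of course use a paint program and save the result as a .bmp file, and the guide-line positions below are invented.

```python
# Sketch of the template-grid idea: guide lines in green and cyan,
# glyph drawn in black, guides erased afterwards.  Positions and
# sizes are illustrative assumptions, not taken from any standard.
SIZE = 200
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
GREEN, CYAN = (0, 255, 0), (0, 255, 255)

def make_grid(baseline=150, x_height=100):
    """White template with green horizontal guide lines (baseline and
    x-height) and a cyan containing box for the glyph."""
    px = [[WHITE] * SIZE for _ in range(SIZE)]
    for x in range(SIZE):
        px[baseline][x] = GREEN   # baseline guide
        px[x_height][x] = GREEN   # x-height guide
    for y in range(SIZE):
        px[y][20] = CYAN          # left edge of containing box
        px[y][180] = CYAN         # right edge of containing box
    return px

def erase_guides(px):
    """After the glyph has been drawn in black, keep only the black
    pixels, so that just the artwork remains for outlining."""
    return [[p if p == BLACK else WHITE for p in row] for row in px]
```

The point of the erase step is that the outlining tool then sees only the monochrome glyph artwork, with no trace of the coloured guides.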
Re: Romanized Cyrillic bibliographic data--viable fonts?
James Kass wrote as follows. Unless a font is fixed width, Latin combiners can't currently consistently combine well without smart font technology support enabled on the system. So, don't blame the Arial Unicode MS font if these glyphs don't always merge well. While awaiting Latin OpenType support, it might be a good idea to take a look at a well populated fixed width pan-Unicode font like Everson Mono. James had also previously written. Best regards, James Kass, who is now adding U+FE20 .. U+FE23 to the font here. I have had a look at the problem and decided, as the saying might go, that even the best cook cannot make herb and carrot sauce starting with parsnips and some huge quantity of thyme! The characters need to cover ligatures for both TS and also iu so they need to be high and arranged so that the result looks reasonable. So, I began to think that the best display option would probably, in the long term, be for an advanced format font to carry all of the necessary glyphs and to produce a glyph in response to an appropriate four character sequence. However, the problem remains for people with other than the very latest equipment, so I have decided to add some of these ligatures into the golden ligatures collection. This is quite an interesting task, as, starting from the reference to the pdf file I worked back to the directory and there found various other pdf files, some for other languages which use Cyrillic characters. http://lcweb.loc.gov/catdir/cpso/romanization/russian.pdf http://lcweb.loc.gov/catdir/cpso/romanization Thus far, I have made the following allocations for the golden ligatures collection. Please know that my approach is mathematical rather than linguistic. The first four pairs are for romanizing Russian names and unknown terms. 
U+E7A0 for T U+FE20 S U+FE21
U+E7A1 for t U+FE20 s U+FE21
U+E7A2 for I U+FE20 E U+FE21
U+E7A3 for i U+FE20 e U+FE21
U+E7A4 for I U+FE20 U U+FE21
U+E7A5 for i U+FE20 u U+FE21
U+E7A6 for I U+FE20 A U+FE21
U+E7A7 for i U+FE20 a U+FE21
The next two pairs are additions for Belorussian and Ukrainian.
U+E7A8 for I U+FE20 O U+FE21
U+E7A9 for i U+FE20 o U+FE21
U+E7AA for Z U+FE20 H U+FE21
U+E7AB for z U+FE20 h U+FE21
I have started writing it all up for our web site, where it will hopefully be posted, making clear the use of these code point allocations for producing displays, not for storing text in databases that need to be searched and sorted. However, I would like to make the list of encodings more comprehensive and would welcome feedback on which ligatures to include. The files churchsl.pdf and nonslav.pdf from the above named directory are the source material that I have found so far which has not yet been covered in the above encodings. Suggestions of other source material are welcome. I have looked through them and found some very interesting characters, such as what looks like an o macron and t ligature and also a t s ligature with a dot above the whole ligature as well as other ligatures along similar lines. I would appreciate any information about expressing those in Unicode which anyone can provide please, either to the mailing list or, if a writer prefers, privately by email. The current documents about other ligatures already in the golden ligatures collection can be found from the following introduction and index page. http://www.users.globalnet.co.uk/~ngo/golden.htm William Overington 27 August 2002
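The allocations above can be expressed as a simple substitution table. The sketch below is illustrative only: the PUA code points are the author's own private-use assignments, not standard Unicode, and the function name to_display_form is hypothetical.

```python
# The golden ligatures allocations listed above, as a mapping from
# combining-half-mark sequences (U+FE20 / U+FE21) to single PUA
# code points intended for display, not for stored or searched text.
GOLDEN_LIGATURES = {
    "T\uFE20S\uFE21": "\uE7A0", "t\uFE20s\uFE21": "\uE7A1",
    "I\uFE20E\uFE21": "\uE7A2", "i\uFE20e\uFE21": "\uE7A3",
    "I\uFE20U\uFE21": "\uE7A4", "i\uFE20u\uFE21": "\uE7A5",
    "I\uFE20A\uFE21": "\uE7A6", "i\uFE20a\uFE21": "\uE7A7",
    # Additions for Belorussian and Ukrainian.
    "I\uFE20O\uFE21": "\uE7A8", "i\uFE20o\uFE21": "\uE7A9",
    "Z\uFE20H\uFE21": "\uE7AA", "z\uFE20h\uFE21": "\uE7AB",
}

def to_display_form(text: str) -> str:
    """Replace each half-mark sequence with its PUA ligature code
    point, for display with a font that carries those glyphs."""
    for seq, pua in GOLDEN_LIGATURES.items():
        text = text.replace(seq, pua)
    return text
```

A database copy of the text would keep the standard half-mark sequences; only the rendering path would apply the substitution.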
SC UniPad 0.99 released.
As an end user of Unicode I was interested to learn recently that the latest version of SC UniPad, a Unicode plain text editor for various PCs, has been released. This latest version is SC UniPad 0.99 and is available for free download from the following address on the web. http://www.unipad.org A particularly interesting new feature is that one may hold down the Control key and press the Q key and a small dialogue box appears within which one may enter the hexadecimal code for any Unicode character. Upon pressing the Enter key, that character is entered into the document. SC UniPad contains its own font. Please note in particular the buttons in a column down the left hand side of the display. These alter the way in which some code points are indicated in the display. For example, if one clicks on the button labelled FMT (which controls Character Rendering: Formatting Characters) and selects Picture Glyph, then entry of U+200D into the text document shows a box with the letters ZWJ in it. I first learned of the existence of the UniPad program in a response to a question which I asked in this forum, so I am posting this note so that any end users of the Unicode system who are at present unaware of the existence of the UniPad program might know of the opportunity to have a look at it if they so choose. The web site has a facility to request email notification of developments to SC UniPad. It was by such a requested email notification that I became aware of the availability of SC UniPad 0.99. William Overington 26 August 2002
Re: Romanized Cyrillic bibliographic data--viable fonts?
J M Craig wrote as follows. [snipped] Any suggestions welcomed! Is there a tool out there that will allow you to edit a font to add a couple of missing characters? You might like to have a look at Softy, which is a shareware font editor for TrueType fonts. Softy can be used to produce new TrueType fonts and to edit existing TrueType fonts. http://users.iclway.co.uk/l.emmett/ There is some more information about Softy, including the correct email address for registrations, at the following page. http://cgm.cs.mcgill.ca/~luc/editors.html Having a look for Softy and Softy font at http://www.yahoo.com might be helpful. I am trying to obtain a copy of the tutorial by Grumpy, so far without success. I have found the other tutorial and it is very useful. I have had lots of fun with the Softy program and although I have not tried to implement the U+FE20 and U+FE21 which you mention, I have tried various experiments using Softy and have found it a very satisfactory package to use. Softy is shareware, so perhaps you might think it worth a try to find out if it will help you do what you want to achieve. Also, you might like to have a look at the SC UniPad program which I mentioned earlier today in another thread. When I was studying your posting I used SC UniPad to have a look at the various Cyrillic characters which you mentioned. As far as I can tell at present SC UniPad does not position the U+FE20 and U+FE21 characters as you might want them to appear, yet SC UniPad would seem like a good way to key in the text, ready to copy and paste it into another program which would be used to display the thus keyed text using a font of your choice. William Overington 26 August 2002
The Unicode Technical Committee meeting in Redmond, Washington State, USA.
As many readers may know, the Unicode Technical Committee was due to start a four day meeting yesterday at the Redmond, Washington State, USA campus of Microsoft, that is, on 20 August 2002. Here in England I am interested to know of what is happening and to learn of news from the meeting. So, from here in England I am starting this thread in the mailing list in the hope that some of the people at the meeting might like to post news of what is happening at the meeting please. This is not in the same news gathering league as having CNN and other reporters providing live reports from outside the venue and catching quotes from prominent members of the Committee as they arrive and depart and there being live press briefings from an official spokesperson, yet in its way it is still news gathering and hopefully will be of interest to other participants in this mailing list as well. It is the early hours of the morning in Washington State at present. It is hoped that when delegates get up for breakfast that they might look in their emails and make early morning responses, or perhaps arrange for an official briefing to be posted later in the day. If I were conducting a live interview with the committee chairman or with an official spokesperson I would ask the following questions. * What was discussed yesterday (Tuesday) please, and what formal decisions, if any, were taken please? * How many people attended please? * Is it only companies which are full members of the Unicode Consortium who send delegates to the meeting, or are there also representatives of organizations who do not vote in decisions present as well? * What is the agenda for today please? * What is the agenda for the rest of the week please? * Will there be a press statement at the close of the meeting please, and if so, will it also be posted in the Unicode mailing list please? 
Depending upon the responses to the above, I would, if the topics had not been covered, ask specific questions related to the following. * Has there been, or is there on the agenda, any discussion of the wording in the Unicode specification about the use of the Private Use Area and, if so, are any changes to that wording being implemented? * Has there been, or is there on the agenda, any discussion concerning the status of the code points U+FFF9 through to U+FFFC please? There has been some discussion recently in the Unicode mailing list about these code points, as regards the use of U+FFF9 through to U+FFFB as one issue, the use of U+FFFC as a second issue, and the use of U+FFF9 through to U+FFFC all together as a third. Is the committee discussing these issues at all and, if so, are they discussing the matter of whether U+FFFC can be used in sending documents from a sender to a receiver please? Is there any discussion of a possible rewording, or changing of meaning, of the wording about the U+FFF9 through to U+FFFC code points in the Unicode specification please? * Are any matters concerning how the Unicode specification interacts with the way that fonts are implemented being discussed please? If so, is due care being taken that, as font format is not, at present, an international standards matter, the committee ensures that Unicode does not become dependent upon a usage, express or implied, of the intellectual property rights or format of any particular font format specification? * Is there any discussion of the possibility of adding further noncharacters please, considering either or both of adding some more noncharacters in plane 0 and adding a large block of noncharacters in one of the planes 1 through to 14? 
* Is the committee discussing the issue of interpretation, namely as to how, if various people read the published specification so as to have different meanings, people may receive a ruling as to the formally correct meaning of the wording of the specification. This recently arose in relation to the U+FFFC character and has previously arisen in relation to what is correct usage of the Private Use Area, so there are at least two areas where the issue of interpretation has arisen. I am hoping that regular postings of what is happening in the meeting will appear as the meeting progresses so that there is both information for people who may be affected by what is decided at the meeting and also so that participants in the meeting might be able to gather end user feedback upon any topics that arise at the meeting before they make any decision which may affect end users. Is there an official press spokesperson for the meeting please? William Overington 21 August 2002
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
there is no question that the publication of a .uof file specification by the Unicode Consortium would prejudice the rights of anyone to use the U+FFFC character in any other manner. Publication of such a .uof file specification would also prevent U+FFFC being made into a noncharacter and keep the facility of using the U+FFFC character in interchanged documents available for all, whether they choose to use the .uof file format or some other format for explaining the meaning of any U+FFFC codes in a given document. Could this be discussed at the Unicode Technical Committee meeting next week please? William Overington 16 August 2002
Re: The existing rules for U+FFF9 through to U+FFFC. (spins from Re: Furigana)
the name of the graphic file would seem perfectly suitable. Then there is this curious passage. Note that it is also *permissible* in Unicode to spell permissible as purrmisuhbal. That doesn't mean that it would be a good idea to do so, but the standard does not preclude you from doing so. You could even write a rendering algorithm which would display the sequence of Unicode characters p,u,r,r,m,i,s,u,h,b,a,l with the glyphs {permissible} if you so choose. --Ken Well, who is this you, certainly not me! :-) Thank you for your response, which has been very helpful. William Overington 16 August 2002
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
James Kass wrote as follows. William Overington wrote, No, it is a story about an artist who wanted to paint a picture of a horse and a picture of a dog and, since he knew that the horse and the dog were great friends and liked to be together and also that he only had one canvas upon which to paint, the artist painted a picture of a landscape with the horse and the dog in the foreground, thereby, as the saying goes, painting two birds on one canvas, http://www.users.globalnet.co.uk/~ngo/bird0001.htm in that he achieved two results by one activity. In addition the picture has various interesting details in the background, such as a windmill in a plain (or is that a windmill in a plain text file). :-) 1) It's gif file format rather than plain text.* 2) There isn't any windmill. The picture of the birds has been in our family webspace since 1998 as an illustration for the saying Painting two birds on one canvas. That saying, originated by me, is a peaceful saying meaning to achieve two results by one activity. I made the picture from clip art as a learning exercise. The picture of the birds is referenced as a way of illustrating the saying Painting two birds on one canvas. It is not the picture in the story about which Ken asked. I may well have a go at constructing such a picture, perhaps using clip art. The reference to a windmill is meant as a humorous allusion to Don Quixote tilting at windmills. I am interested in creative writing, so when Ken asked about the story, I just thought of something to put in my response. Part of the training in, and the fun of, creative writing is to be able to write something promptly to a topic. William Overington 16 August 2002
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
cannot quite do everything which a .uof file could do, as far as I am aware, though I am willing to learn if the situation is different. For example, suppose that a book is being made available as a Unicode plain text file and it is desired to add just a few illustrations without a major reformatting of the whole text, which uses CARRIAGE RETURNS to indicate paragraphs. A text editor could be used to insert a few U+FFFC characters at appropriate places in the file and a .uof file could be used to carry a list of the names of the illustration files in the order in which they are used. Conversion to HTML format would require a larger file and would limit the ways in which the file could be displayed to just the use of an HTML browser. Also, as there are more than a million characters in Unicode, most are unused so far, so changing the meaning of just FFFC in this one context doesn't seem like a big win, considering also every line of code that might work with FFFC now needs to consider the context to determine its semantics. I don't follow what you mean. However, the meaning of U+FFFC is not, I hope, going to be changed at all. I have simply suggested an optional way of indicating, outside of the plain text file which contains one or more U+FFFC characters, the extra information as to which object the U+FFFC character is anchoring. But every invention deserves to be implemented, we need not look at whether the invention satisfies some demand of its customers. I disagree with this. My view is that not every invention deserves to be implemented, and indeed that not every invention needs to be considered as consideration takes time and may cost money. 
However, I do feel strongly, and have for many years, that when an invention is considered it should be considered on its merits and without prejudice, such as, for example, the discrimination which occurs when an invention is turned down because it has been suggested by someone who is not representing a company or an organisation. As to customer needs, certainly an invention that satisfies an existing need meets that criterion, yet it is also the case that sometimes the need does not exist until potential customers become aware of what has become possible and then begin to have a need, or desire, for it. I like the 2 birds picture and I assume it was a metaphor for the idea- one bird was html the other unicode. I was a little disappointed that you used html instead of .uof format though. The picture of the birds has been in our family webspace since 1998 as an illustration for the saying Painting two birds on one canvas. That saying, originated by me, is a peaceful saying meaning to achieve two results by one activity. I made the picture from clip art as a learning exercise. The picture of the birds is referenced as a way of illustrating the saying Painting two birds on one canvas. It is not the picture in the story about which Ken asked. I am interested in creative writing, so when Ken asked about the story, I just thought of something to put in my response. Part of the training in, and the fun of, creative writing is to be able to write something promptly to a topic. The two birds are not a metaphor for HTML and Unicode at all. Ken put two illustrations in his posting so I put one in mine. It all adds to the interest for readers. William Overington 16 August 2002
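The workflow suggested earlier in this posting, a plain text file containing U+FFFC anchors plus a .uof file listing the illustration files in order, can be sketched as follows. Note that the .uof format has no published specification; the one-file-name-per-line layout assumed here is invented purely for illustration.

```python
OBJ = "\uFFFC"  # OBJECT REPLACEMENT CHARACTER

def pair_anchors(plain_text: str, uof_listing: str):
    """Pair the nth U+FFFC in plain_text with the nth file name in the
    listing.  The .uof layout (one name per line) is an assumption."""
    names = [ln.strip() for ln in uof_listing.splitlines() if ln.strip()]
    anchors = [i for i, ch in enumerate(plain_text) if ch == OBJ]
    if len(names) != len(anchors):
        raise ValueError("listing does not match the number of U+FFFC anchors")
    return list(zip(anchors, names))
```

A renderer could then substitute each illustration at its anchor position, while any receiver ignorant of the listing still sees valid plain text containing U+FFFC characters.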
Re: Tildes on vowels
James Kass wrote as follows. Indeed, a program designed to display actual superscripts based on the notational form would work pretty much the same regardless of whether standard or non-standard characters are used, and the editing or input screen would also look essentially identical. Yes, yet using Private Use Area codes would not clash with the meanings of the regular Unicode characters, so maybe that is preferable. Standard only amongst those end users who choose seems to be a way of saying non-standard. Almost but not quite. It is like describing someone's request for something as not unreasonable rather than as reasonable. A distinction of the use of the English language where Boolean alternatives are not quite the case. Thank you for your help. William Overington 14 August 2002
Re: Tildes on vowels
to problems for new end users. Especially given that no software whatsoever supports the codes, and if it did, one would have to work with custom application software (that knows that U+ means BOLD) and/or with special Courtyard-Code-compatible fonts that know about Golden Ligatures, of which there are none in existence today. The golden ligatures collection is of character glyphs and is a separate thing from courtyard codes which are mostly control codes for formatting and markup. As far as no fonts or software being available today, well that might well be true. However, that might change. Please know that I am not expecting a major reorientation of the software industry, it is just a matter that if someone does like to make a font which includes, in a Unicode compatible manner, whole precomposed ligature glyphs which are accessible directly from a code point, whether for ligatures such as ct or ch or for long s b or for ppe, then there is available a set of code point allocations, which, while not standard, are perhaps more likely to be consistent than any other set of code point allocations which might otherwise be used. As for writing software which recognizes courtyard codes, well maybe people might use courtyard codes in computing generally, and if so, good, yet a primary reason for introducing them is for using them in educational software packages to be broadcast on digital television channels throughout the world using the DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) system. DVB-MHP uses Java and Java uses Unicode, so DVB-MHP programs which are broadcast use Unicode. These telesoftware programs are a specialist niche in broadcasting, yet, though I say it myself, telesoftware is an extremely powerful computational technique and hopefully in the next few years will begin to fulfil its potential. There is much to be done, yet it is an exciting field and Unicode is a key feature in being able to use it effectively throughout the world. 
Thank you for your help. William Overington 14 August 2002
Re: Tildes on vowels
Marco Cimarosti wrote as follows. As you see, it is nowhere said that markup is necessarily something beginning with or any other character. The additional information (markup) can be in any format, in fact the definition says: It is expected that systems and applications will implement proprietary forms. Ah! The key point. So my courtyard codes are both fancy text and markup. The fact that they do not enter a markup bubble but instead use individual code points to convey the formatting information does not alter the fact that they are markup. [...] I am not knocking markup, [...] Of course you aren't! Your idea of defining format controls as PUA code point totally fits in the above definition. Yes. So, FARMYARD CODES ARE JUST ONE MORE FORM OF MARKUP. And text including the controls IS NOT PLAIN TEXT: it is William Overington's own proprietary form of rich text. I understand what you mean. However, as regards the second sentence in the above quote, so as not to seem to agree to something with which I am not agreeing, can I please say that in the dictionary before me at present, the word proprietary is stated as an adjective meaning belonging to owner; made by firm with exclusive rights of manufacture, so I would not wish courtyard codes to be regarded as a proprietary form of rich text. I fully accept that you were probably not using the word proprietary to convey that meaning but to convey a sense that I had made it up myself on my own initiative as between making it up myself on my own initiative and it being devised by a standardization body. You are out of Unicode rules not because you defined your Farmyard codes in the PUA (which is perfectly legal, as I explain below), but because you fail to accept (or understand) that these codes are a form of markup, and that text containing them is a proprietary form... of fancy text. Yes. I now understand. Thank you for the explanation. 
The only questionable usage of PUA that I can think of is duplicating existing characters. But this would be an absurd deed. Your other proposal of defining PUA ligatures goes near to this, but not quite. Well, I did not define codes for long s t ligature and st ligature in the golden ligatures collection because they are already in regular Unicode. Thank you for your help. William Overington 14 August 2002
Re: Double Macrons on gh (was Re: Tildes on Vowels)
U+0360 COMBINING DOUBLE TILDE
U+035D COMBINING DOUBLE BREVE
U+035E COMBINING DOUBLE MACRON
U+035F COMBINING DOUBLE LOW LINE
I also note U+0361 COMBINING DOUBLE INVERTED BREVE and U+0362 COMBINING DOUBLE RIGHTWARDS ARROW BELOW in the code chart. I wonder if someone could please clarify how an advanced format font would be expected to use such codes. I understand from an earlier posting in this thread that the format to use in a Unicode plain text file would be as follows.
first letter then combining double accent then second letter
As first letter and second letter could be theoretically almost any other Unicode characters, would the approach be to just place all three glyphs superimposed onto the screen and hope that the visual effect is reasonable or would a font have a special glyph within it for each of the permutations of three characters which the font designer thought might reasonably occur yet default to a superimposing of three glyphs for any unexpected permutation which arises? As a matter of interest, how many characters are there where such double accents are likely to be used please? Is it just a few or lots? While in this general area, could someone possibly say something about how and why U+034F COMBINING GRAPHEME JOINER is used please? William Overington 14 August 2002
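The encoding order described above, first letter then combining double accent then second letter, can be shown concretely in a short sketch; the function name spanning is invented for illustration.

```python
import unicodedata

# A combining double diacritic sits between its two base letters in
# the plain text: first letter, then the mark, then the second letter.
DOUBLE_TILDE = "\u0360"           # COMBINING DOUBLE TILDE
DOUBLE_INVERTED_BREVE = "\u0361"  # COMBINING DOUBLE INVERTED BREVE

def spanning(first: str, mark: str, second: str) -> str:
    """Encode a mark that spans two letters, e.g. n + U+0360 + g."""
    return first + mark + second

ng = spanning("n", DOUBLE_TILDE, "g")
# Three code points in memory; a capable renderer draws a single
# tilde stretched across both letters.
```

Whether a font superimposes three glyphs or substitutes one precomposed glyph is a rendering decision; the stored sequence is the same three code points either way.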
The existing rules for U+FFF9 through to U+FFFC. (spins from Re: Furigana)
John Cowan wrote as follows. In essence, though not formally, U+FFF9..U+FFFC are non-characters as well, and the Unicode semantics just tells what programs *may* find them useful for. Unicode 4.0 editors: it might be a good idea to emphasize the close relationship of this small repertoire with the non-characters. That is not what the specification says. Something can only be emphasised if it is true in the first place! If it is desired to make U+FFF9 through to U+FFFC noncharacters then that needs to be done explicitly with a fair opportunity for people to object and make representations before a decision is made. A saying of my own is as follows. When goalposts are moved, aromatic herbs should be scattered around. It seems to me, not having known about annotation characters previously yet now, due to this thread, having read the published rules in Chapter 13, that these are not noncharacters. It appears to me that the use of the annotation characters in document interchange is never forbidden and is strongly discouraged only where there is no prior agreement between the sender and the receiver, and that that strong discouragement is because the content may be misinterpreted otherwise. So, if there is a prior agreement, then there is no problem about using them in interchanged documents. There appears to be nothing that suggests that U+FFFC cannot be used in an interchanged document. I know little about Bliss symbols, though I have seen a few of them and have read a brief introduction to them, yet it seems to me that annotating Bliss symbols with English or Swedish is entirely within the specification and would be no more than strongly discouraged even if there is no prior agreement between the sender and the receiver. 
Further, it seems to me from the published rules that these annotation characters could possibly be used to provide a footnote annotation facility within a plain text file, so that, if a plain text file is being printed out in book format, then a footnote about a word or phrase could be encoded using this technique so that the rendering software could place the footnote on the same page as the word or phrase which is being annotated, regardless of whether that word or phrase occurs near the start, middle or end of that page. It seems to me that the statement of the meaning of U+FFFA means that the sequences in Figure 13-3 of the specification are just examples, though as the word exact is used, perhaps they are guiding examples and the use in footnotes is perhaps stretching the variation from the examples in the diagram. An interesting point for consideration is as to whether the following sequence is permitted in interchanged documents. U+FFF9 U+FFFC U+FFFA Temperature variation with time. U+FFFB That is, the annotated text is an object replacement character and the annotation is a caption for a graphic. It seems to me that if that is indeed permissible that it could potentially be a useful facility. On balance, it seems to me that if both sender and receiver are clear as to what is meant, then the use of annotation characters for Bliss symbols and for footnotes and for captions for illustrations harms no one, for a person skilled in the art seeking to use the file without knowledge of the interpretation agreement which should ideally exist between sender and receiver and who has only the Unicode specification to go on would probably be unlikely to get a wrong interpretation of the intended meaning, even if the actual graphical layout were imprecise, as the Unicode standard locks together the two parts of the annotation sequence and shows that one of the parts is the annotation for the other part. William Overington 15 August 2002
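The annotation sequence discussed above can be written out in code. This is a sketch only: annotate builds the three-part sequence described in Chapter 13, and strip_annotations shows one plausible fallback for a receiver that cannot lay annotations out, a behaviour chosen here for illustration rather than mandated by the standard.

```python
import re

FFF9, FFFA, FFFB = "\uFFF9", "\uFFFA", "\uFFFB"

def annotate(annotated: str, annotation: str) -> str:
    """Build an interlinear annotation sequence:
    anchor, annotated text, separator, annotation, terminator."""
    return FFF9 + annotated + FFFA + annotation + FFFB

def strip_annotations(text: str) -> str:
    """Fallback handling: keep the annotated text, drop the
    annotation, as a receiver without layout support might do."""
    return re.sub("\uFFF9([^\uFFF9-\uFFFB]*)\uFFFA[^\uFFFB]*\uFFFB",
                  r"\1", text)

# The captioned-graphic example from the posting above:
caption_for_graphic = annotate("\uFFFC", "Temperature variation with time.")
```

The fallback keeps the document readable even for a receiver with no knowledge of any interpretation agreement between sender and receiver.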
Re: Tildes on vowels
or defining markup for some of these solutions, instead of the PUA. By analogy, include some other tools in your repertoire, so that everything does not look like a code point ready to be hammered. [snipped] I hope that helps. I hope the message does not read as being harsh. Not at all. I intended to just be explanatory. Yes. My attempt to be concise and specific gives this a more pointed tone than I intend, I suspect, but please believe it is not intended. I am happy that that is your intention. I did not read it otherwise. I very much believe that people can have an academic debate without personalities being an issue. When people raise personality issues, they are just using power to get a win, without answering the underlying questions which still continue to exist, even if their asking has been made er, taboo! :-) William Overington 13 August 2002
Re: Tildes on vowels
Stefan Persson commented on my suggestion as follows.

Well, why not go ahead and decide on two code points within the Private Use Area as values for and XXXY, post them in this list and perhaps that action will lead to that facility becoming available as a facility to document transcribers all around the world.

There have been several messages sent to this list about why this would be inappropriate. Just read the answers to some of your recent discussions, and you'll understand what I mean. Stefan

I wonder if you could please state exactly what you mean, as I do not understand the point which you are trying to make. As far as I am aware, the particular set of circumstances relating to this particular topic have not been discussed previously. William Overington 10 August 2002
Re: Tildes on vowels
David Possin wrote as follows.

quote In German it was common to use a macron over m and n to show mm and nn, I saw it being written this way up to the 1970s. But I never saw it used for any other double letters. Dave end quote

There is a very interesting document entitled The Gutenberg Press, available as a file named gbpmanual.pdf from the Walden Font website. The website address is as follows. http://www.waldenfont.com The address for the file is as follows. http://www.waldenfont.com/public/gbpmanual.pdf

On page 14 are some special characters, ligatures and abbreviations, as used by Gutenberg. Searching through the table is great fun, so I will only mention here the first entry in the table, which shows a letter a with a horizontal line over the top, stated in the pdf file to stand for am, an. The Walden Font website also has some sample fonts showing some of the characters in each font. In the Gutenberg sample, some of the special characters with a horizontal line over the top are included. I managed to find them using the Insert Symbol facility of Word 97 on a Windows 98 platform. I have also experimented using WordPad on a Windows 98 platform and found that I could get one of the characters by using Alt+0200. I also managed to get that same character into WordPad on an older Windows 95 PC.

I have not referred to the line over the top as a macron, as I am not sure whether it is a macron. I say not sure because I am learning and am not sure in that context, not in any way because I am expressing a learned opinion on the matter. The document refers to Gutenberg having 290 characters in his type set. However, the Walden Font font seems not to have that many characters, so perhaps someone might like to say something about Gutenberg's character set please. An email correspondent recently informed me that Gutenberg used a qv ligature.
Does anyone know what ligatures and abbreviations, if any, were used by Gutenberg which are not in the Walden Font font, please? I recently saw a television programme in the United Kingdom about Gutenberg not having used a reusable matrix for typecasting, but having to make a new matrix for each casting, without the benefit of having a punch to make the matrix. This was discovered by very high magnification of characters in some of Gutenberg's printing. It appears that the type was reused on different pages, but that no two versions of the same letter on any given page were congruently identical. William Overington 9 August 2002
Re: Tildes on vowels
character and to understand an indication of the presence of any regular Unicode character superscripted in the original document, one would only need to have a Unicode font augmented with two arrow glyphs in the appropriate code points.

Well, why not go ahead and decide on two code points within the Private Use Area as values for and XXXY, post them in this list, and perhaps that action will lead to that facility becoming available to document transcribers all around the world. If the code points were published in this manner, maybe a font and maybe a UniPad soft keypad which use those code points would become available in time, and so researchers transcribing documents in libraries around the world would have a lasting enhancement of the facilities available to them. This method would not produce a visually correct display, yet in order to convey meaning in a research environment, this method could help in getting the transcribing done and thus would be a valuable addition to the facilities available. William Overington 9 August 2002
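The two-marker idea above can be sketched in code. Note the hedging here: the message never actually fixed the two Private Use Area values, so U+F3A0 and U+F3A1 below are my own hypothetical placeholder choices, not an agreed allocation, and the caret-brace fallback rendering is likewise only an illustrative convention.

```python
# Hypothetical placeholder code points: the post never fixed the two PUA
# values, so U+F3A0 (begin superscript) and U+F3A1 (end superscript) are
# illustrative choices only, not an agreed allocation.
SUP_BEGIN, SUP_END = "\uF3A0", "\uF3A1"

def mark_superscript(text: str) -> str:
    """Wrap transcribed text in the paired PUA superscript markers."""
    return SUP_BEGIN + text + SUP_END

def to_caret_notation(s: str) -> str:
    """Render the markers visibly, e.g. as a plain-text fallback display."""
    return s.replace(SUP_BEGIN, "^{").replace(SUP_END, "}")

# A transcriber might record the abbreviation "Wm" (with a raised m) as:
record = "W" + mark_superscript("m")
print(to_caret_notation(record))  # W^{m}
```

The point of the sketch is that the markers travel with the plain text, so a font with glyphs at those two code points, or a trivial conversion like the one above, is all a reader needs.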
Re: Digraphs as Distinct Logical Units
obligated to use the golden ligatures collection code points for the direct access route. Also, the golden ligatures collection does not provide code points for all of the ligatures that might be needed by a font designer; however, if anyone does want code points for some other ligatures, I will be interested to try to add them into the golden ligatures collection upon request. In the mail list archive at http://www.unicode.org there are various discussions about ligatures. Recently there was some discussion about the golden ligatures collection and about a rather fun occurrence which is archived as The Respectfully Experiment. William Overington 3 August 2002 http://www.users.globalnet.co.uk/~ngo
Re: Subscript Superscript
Some time ago in this list, Mr Bernard Miller posted a note about his Bytext system. If one goes to http://www.bytext.org and then goes through to the documentation page at http://www.bytext.org/documentation.htm one may download a copy of the latest edition of The Bytext Standard. I chose to download the pdf file, which is 606 kilobytes. On pages 34 and 35 of that document are details of arrow parentheses invented by Mr Miller. On page 72 is a statement concerning intellectual property rights.

I feel that it would be very useful if these eight arrow parenthesis characters were used in a Unicode compatible environment. As some readers may know, I have been researching my courtyard codes system. http://www.users.globalnet.co.uk/~ngo/court000.htm Courtyard codes are placed within the Private Use Area of Unicode. The above document is indexed from an index page about some of my other uses of the Private Use Area. http://www.users.globalnet.co.uk/~ngo/golden.htm

It occurs to me that if the eight arrow parenthesis characters were encoded into my courtyard codes system, that would potentially be of great usefulness. I am thinking in terms of U+F388 through to U+F38F being used for this purpose, with the codes being assigned to the arrow parentheses in the order in which Mr Miller lists them in The Bytext Standard. If this happens, then the way to express a subscript uppercase A character would be as follows.

U+F38A U+0041 U+F38B

The U+0041 is the code for A in regular Unicode, so immediately there is a general method for subscripting any Unicode character. Indeed, subscripts of subscripts could be used by nesting the arrow parentheses. For example, a subscript A subscript B could be expressed as follows.

U+0061 U+F38A U+0041 U+F38A U+0042 U+F38B U+F38B

The U+0061 is the code for a in regular Unicode and the U+0042 is the code for B in regular Unicode.
Arrow parentheses allow a mathematical expression involving superscripts, subscripts, integral limits, summation limits and various other items to be expressed in a linear manner, which makes those expressions able to be stored in a Unicode file in what is essentially a plain text storage format, though I mention that this will not be plain text as such, as it involves the use of code points for what might be considered markup. I know little about XML, so I do not know whether this suggestion will be a suitable solution for the requirement of the person who wrote to the Unicode Consortium. However, perhaps it will be a helpful suggestion.

Certainly, using the codings which I suggest would involve use of code points from the Private Use Area. However, as the need is now, even if the arrow parenthesis characters are one day promoted to regular Unicode, the use of Private Use Area characters now may be what is needed to achieve the desired result. By placing these code point ideas into this posting to the Unicode mail list, they will be archived in the archives of the Unicode mail list and also sent to many people interested in Unicode around the world. So, although they are only Private Use Area encodings, it is possible that these encodings will be noted in many places by many people. It is simply speculation as to whether few or many people will choose to recognize such code point allocations for their own uses.

The use of these code points would raise the question as to how a string containing them should be displayed. The idea is that in a plain text editor mode, the arrow parenthesis characters would be displayed with the glyphs shown by Mr Miller in The Bytext Standard. In a graphical display, the arrow parenthesis characters would not be displayed, yet would influence how characters included between matching pairs of arrow parenthesis characters are displayed.
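The linear encoding discussed above can be illustrated with a short sketch. It assumes, as the post suggests, that U+F38A opens a subscript and U+F38B closes it; rendering the pair as TeX-style underscore-and-brace markers is my own illustrative choice for a plain-text fallback, not part of the Bytext Standard.

```python
# Per the suggestion above: U+F38A opens a subscript, U+F38B closes it.
# Rendering as _{ ... } is an illustrative plain-text display convention.
SUB_OPEN, SUB_CLOSE = "\uF38A", "\uF38B"

def render_subscripts(s: str) -> str:
    """Replace the PUA arrow parentheses with visible _{ ... } markers.
    Nesting works naturally because the pairs are balanced."""
    return s.replace(SUB_OPEN, "_{").replace(SUB_CLOSE, "}")

# The nested example from the text: a, subscript A, subscript B.
# U+0061 U+F38A U+0041 U+F38A U+0042 U+F38B U+F38B
seq = "a" + SUB_OPEN + "A" + SUB_OPEN + "B" + SUB_CLOSE + SUB_CLOSE
print(render_subscripts(seq))  # a_{A_{B}}
```

Because the open and close characters always come in matching pairs, a renderer can recover the nesting depth with a simple counter, which is what makes the linear storage format workable.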
This is no more complicated in principle than viewing an HTML page in Internet Explorer, then viewing the source code of the HTML page in Notepad, then going back to the Internet Explorer display. Whether any font makers will add glyphs for the eight arrow parenthesis characters into the code positions U+F388 through to U+F38F remains to be seen, though I am cautiously optimistic in the matter. Also, the possibility exists for the person who originally wrote to the Unicode Consortium to have his or her own font produced, in addition to any font maker making such a font available. William Overington 31 July 2002

-Original Message- From: Magda Danish (Unicode) [EMAIL PROTECTED] To: unicode [EMAIL PROTECTED] Date: Tuesday, July 30, 2002 8:46 PM Subject: Subscript Superscript

-Original Message- Date/Time: Tue Jul 30 12:26:40 EDT 2002 Contact: [EMAIL PROTECTED] Report Type: FAQ Suggestion

We need to know how to express a Subscript letter in Unicode. On your site, we've found in 2070-208E how to express a Superscript letter or number or a Subscript number, but there is no information about how to write a Subscript letter. We're
Teletext
In the United Kingdom there is a widely used information system known as teletext. It is also used in many other countries. Teletext is a digital technology used in conjunction with analogue television systems. Digital information is inserted in several of the otherwise unused lines of the television signal, within what is known as the vertical blanking interval of the television picture.

In the United Kingdom the government is eventually to switch off all analogue television broadcasts, as part of the already started process of migration to digital television technology. Thus teletext in its present form will finish. There are digital television text and graphics displaying systems which may continue the teletext name, yet the original teletext display format is likely to go. Teletext started in the early 1970s and the currently implemented specification essentially dates from 1976 (with the exception of the later fastext linking system). The government is thinking in terms of turning off the analogue transmissions sometime between 2006 and 2010.

I am thinking that it would be a good idea to encode the archive copies of teletext pages that exist into a Unicode compatible format for the future. Teletext has been around for about a quarter of a century in more or less its present form, and within another quarter of a century that form might well be gone completely. I have looked in the Unicode mail list archive and found various items about encoding teletext pages using existing Unicode characters. I am here suggesting a different approach, a teletext archiving approach. I suggest that, in a discussion within this mailing list, a Private Use Area encoding for archiving teletext pages is agreed, with a view that eventually it will be put forward as a proposal for promotion to regular Unicode, probably into one of the higher planes.
The reason for this approach is that it will permit teletext pages to be encoded in a plain text file within a document which discusses the technology. The teletext characters need to be implemented with the same width as each other, whereas characters in a discussion document need to be displayable with possibly different widths one from another.

I suggest the following as a starting point for a discussion.

U+E200 through to U+E27F for the United Kingdom teletext character set 0x00 to 0x7F.

U+E280 through to U+E2FF to be used to define teletext characters defined in other countries where those characters are not the same as in the United Kingdom character set. This means all of the German accented characters and so on. The notes for each encoding would include details of the location within the 0x00 to 0x7F range where that character was originally encoded, and in which country or countries it was so encoded.

All teletext pages could then be encoded using the above characters. In addition, the following could be used.

Where a character is to be displayed in contiguous graphics mode, and is a graphic, not a capital letter push-through, the character may be represented using U+E320 to U+E33F and U+E360 to U+E37F.

Where a character is to be displayed in separated graphics mode, and is a graphic, not a capital letter push-through, the character may be represented using U+E3A0 to U+E3BF and U+E3E0 to U+E3FF.

This will enable a good idea of the look of a teletext page to be displayed using an ordinary TrueType font in a word processing document. Naturally there is also scope for special teletext displaying programs to be produced so that graphics with different combinations of foreground and background colours can be displayed properly. I feel that this encoding will be useful as a stepping stone to a permanent regular Unicode encoding of teletext characters for archiving purposes.
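The proposed mapping can be sketched as a small conversion function. This is my own reading of the ranges given above, not an agreed specification: text characters go to U+E200 plus the teletext code, contiguous mosaic graphics to U+E300 plus the code, and separated mosaic graphics to U+E380 plus the code, with the mosaic check reflecting the fact that teletext block graphics occupy only 0x20 to 0x3F and 0x60 to 0x7F.

```python
# A sketch of the PUA mapping proposed above (an assumption, not a standard):
#   text characters:      0x00-0x7F -> U+E200 + code
#   contiguous graphics:  mosaic codes -> U+E300 + code (U+E320-E33F, U+E360-E37F)
#   separated graphics:   mosaic codes -> U+E380 + code (U+E3A0-E3BF, U+E3E0-E3FF)

def encode_teletext_char(code: int, mode: str = "text") -> str:
    """Map one 7-bit teletext character code to a PUA code point."""
    if not 0x00 <= code <= 0x7F:
        raise ValueError("teletext character codes are 7-bit")
    if mode == "text":
        return chr(0xE200 + code)
    # Block graphics exist only in the mosaic ranges; capital letters
    # "push through" as text and are not remapped here.
    if not (0x20 <= code <= 0x3F or 0x60 <= code <= 0x7F):
        raise ValueError("not a mosaic graphics code")
    if mode == "contiguous":
        return chr(0xE300 + code)
    if mode == "separated":
        return chr(0xE380 + code)
    raise ValueError("mode must be text, contiguous or separated")

# Example: letter A as text, and mosaic 0x20 in both graphics modes.
print(hex(ord(encode_teletext_char(0x41))))                 # 0xe241
print(hex(ord(encode_teletext_char(0x20, "contiguous"))))   # 0xe320
print(hex(ord(encode_teletext_char(0x20, "separated"))))    # 0xe3a0
```

An archiving tool could apply this byte by byte to a saved teletext page, giving a plain Unicode string that a suitably populated TrueType font could display.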
Hopefully this initiative will encourage people to get out any old 5 1/4 inch floppy discs that they may have and transfer any teletext pages saved upon them into an archived form. Readers interested in teletext might like to have a look at the following. http://teletext.mb21.co.uk

I am hopeful that, by having a specific encoding within Unicode for teletext, the archives of teletext pages that exist will be conserved for posterity and that an important aspect of social history will be preserved for the future. Does anyone know if the early graphic art from Oracle (Oracle being the name of the then ITV teletext service as well as of the technology, being an acronym for Optional Reception of Announcements by Coded Line Electronics) in the mid 1970s has survived?

Also, does anyone archive Viewdata pages? Viewdata was not a broadcasting technology, but provided pages with a display format compatible with teletext, which could be accessed over a telephone line connection. William Overington 31 July 2002
Re: Chromatic text, ligatures and Fraktur ligatures.
Doug Ewell wrote as follows.

quote Nobody with the intelligence of a tree could possibly read the character-glyph document and come away with the impression that font styles, sizes, colors, etc. are central to the notion of what belongs in character encoding. Intelligence is clearly not the problem here. end quote

Actually, I did not write that. What I wrote was as follows.

quote of what I previously wrote Courtyard codes and codes for chromatic fonts, in my opinion, fall within the definition of character in Annex B of that document. This is not me finding some definition tucked away obscurely, it is central. The introduction section of the document states as follows. quote This Technical Report is written for a reader who is familiar with the work of SC 2 and SC 18. Readers without this background should first read Annex B, Characters and Annex C, Glyphs. end quote end quote of what I previously wrote

I have been referred to the ISO/IEC TR 15285 document about characters and glyphs, and yet no one seems willing to discuss the definition of character that is clearly stated in that document. People just keep saying that markup exists, as if the very existence of XML in some way precludes single code point colour codes and single code point formatting codes and so on.

The quote of what I previously wrote is saying that I have not found that definition tucked away obscurely; that definition is central to the ISO/IEC TR 15285 document. That is, I am not trying to push my ideas for colour codes through some obscure legal and technical loophole. I am saying that they are entirely consistent with the definition of character in the ISO/IEC TR 15285 document, where that definition is central to that document.

As you have already made your decision about my research, and indeed about me, I am not going to try to convince you otherwise, and this posting is not intended to do so. I am merely answering a specific accusation as to my ideas and my personality.
Unfortunately, various responses to my research have been on other than its scientific aspects, and unfortunately in human society that type of response outweighs intellectual discussion of the facts, such as the specific fact of the definition of character in the ISO/IEC TR 15285 document, which no one responding to my posts seems willing to discuss. I feel that if the definition of character in the ISO/IEC TR 15285 document is considered, with the meanings of the words in that definition considered, then scientific progress can be made. If people are simply going to question my motives and my personality and not discuss that definition, then that is just an example of the way that human society unfortunately works, in that scientific ideas can be dismissed without explanation by questioning the personality of the person suggesting them. William Overington 9 July 2002