Re: Origin of the digital encoding of accented characters for Esperanto
WJGO
It does not seem axiomatic that accented characters for Esperanto would necessarily be included in a digital encoding of the accented characters needed for the languages of Europe.

DS
Where does "languages of Europe" come from?

It seems to me that an alternative scenario could quite easily, and perhaps more probably, have been what happened: namely, that a list of the countries of Europe had been made and then, starting from that list, the main language of each country was added to a list of languages to be supported, with Esperanto not even having been thought about. Also, if Esperanto had been suggested, the idea could have been dismissed because Esperanto was not the language of a country, or because of some negative opinion about Esperanto, or for some other purported reason. It seems axiomatic that the accented characters for French and German would be included, yet not axiomatic that the accented characters for Esperanto would be included. So I wondered how they came to be encoded.

Back in the 1960s I saw a list of the accented characters needed to typeset various European languages. It was in the Riscatype catalogue of metal type, and Esperanto was included in that list. Is it possible that that list was used years later in deciding which accented characters to include in an electronic encoding?

I remember that in the early 1970s two researchers were trying to translate what they thought was a paper in Spanish and were having great problems. I glanced at the text and pointed out that it was Portuguese. Asked if I spoke Portuguese, I replied that I did not, but that, being interested in printing, I knew that the ã (a tilde) character was used in Portuguese and not in Spanish: so the Riscatype list was helpful to them.

DS
Latin Extended-A is not designed to exclusively cover Europe, and both ISO 8859-3 and Extended-A cover Turkish.

Well, part of Turkey is in Europe.
DS
The largest Esperanto libraries have about 25,000 books, and there's a large collection of people wanting to use Esperanto on the Internet; ...

Fine.

DS
... moreover, the encoding decision is trivial, being a simple and uncontested set of twelve codepoints.

Well, the decision was not necessarily either trivial or uncontested: that is now part of history, and maybe some documentation will be found describing the situation at the time.

DS
Of all the Latin script characters not encoded in Unicode 1.1, I doubt any of them have 1% of the use of the Esperanto characters. Not encoding them upfront would have been silly.

I have been interested in Esperanto since the 1960s, when I found an Esperanto dictionary in an antiquarian bookshop. I had not previously known of Esperanto. I asked the bookshop owner about this language; he explained, and I bought the dictionary and a copy of the English version of the book "The Life of Zamenhof" by Edmond Privat. Soon after, I bought a copy of "Teach Yourself Esperanto", and some years later, in the early 1970s, I bought the "Teach Yourself Esperanto Dictionary". In the late 1990s I gained two certificates in Esperanto, at Elementary and Intermediate levels.

More recently I have written a song in Esperanto, and I am hoping to record it and place it on the web so that it will be archived by the British Library. The song lyrics use g circumflex (ĝ) many times and s circumflex (ŝ) a few times, and I was thinking that it is good that the characters are available in Unicode.

DS
Kie ekzistas vivo, ekzistas espero.

Dankon.

For the benefit of readers who do not know any Esperanto, I translate into English what David wrote, "Where there exists life, there exists hope.", thus "Where there is life, there is hope."; and the English translation of my reply is "Thanks."
I am also trying to draft a petition to send to the Unicode Technical Committee about encoding some localizable sentences, with their symbols, in plane 13, and building localizable sentence technology as part of Unicode for the future. As part of the introduction I am seeking to compare and contrast Esperanto and localizable sentence technology. Both are intended to assist communication across the language barrier. Neither is intended to replace natural languages. Esperanto can be used to construct a sentence for any meaning, whereas localizable sentences cover only a finite set of sentences. Esperanto does need to be learned as a language before it can be used; that is quicker and simpler than learning French or German, yet still takes quite a lot of study. Localizable sentences could be used easily, just by learning how to use a cascading menu system with category headings and sentences localized into one's own native language. There is also the capability to include names, which are not localizable, within a stream of localizable sentences, and escape mechanisms for adding unlocalizable items in Esperanto or in a natural language. Before encoding as electronic characters,
Re: Origin of the digital encoding of accented characters for Esperanto
On 3/23/2015 10:44 AM, Ken Whistler wrote:

And the question, instead, then becomes tracking down through the ancient history of JTC1/SC2/WG3 (-- Note *3*, not *2*), why the participants who drafted 8859-3 felt it was important to include the Esperanto letters in the repertoire for the South European set back in 1986. That date, by the way, is earlier than anything I have firsthand records for.

ECMA was actively involved in developing these sets and published them as a parallel standard (ECMA-94, second edition, 1986).

http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-094.pdf

ECMA had a very close working relationship with ISO; some of that history can be tracked down in snippets on the web, but sadly, most of the active participants in developing the early editions of the 8859 series would have passed away by now.

A./

___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode
Origin of the digital encoding of accented characters for Esperanto
Twelve accented characters (uppercase versions and lowercase versions of six accented letters) used for Esperanto are encoded in Unicode. These may well be in Unicode as legacy encoded characters from one or more earlier standards.

Does anyone know, please, how the Esperanto characters first became encoded digitally? For example, was it that someone who was interested in Esperanto happened to be a member of a committee that was working on encoding accented characters? Or did one or more people, or a group of people, or an Esperanto society, lobby for the characters to be included? Or what?

It does not seem axiomatic that accented characters for Esperanto would necessarily be included in a digital encoding of the accented characters needed for the languages of Europe.

William Overington

23 March 2015
Re: Origin of the digital encoding of accented characters for Esperanto
Ken,

zgrep U011D /usr/share/i18n/charmaps/*
ANSI_X3.110-1983.gz:U011D /xc3/x67 LATIN SMALL LETTER G WITH CIRCUMFLEX
EUC-JISX0213.gz:U011D /xaa/xe0 LATIN SMALL LETTER G WITH CIRCUMFLEX
EUC-JP.gz:U011D /x8f/xab/xba LATIN SMALL LETTER G WITH CIRCUMFLEX
EUC-JP-MS.gz:U011D /x8f/xab/xba LATIN SMALL LETTER G WITH CIRCUMFLEX
EUC-TW.gz:U0002011D /x8e/xa7/xac/xbc CJK
GB18030.gz:U011D /x81/x30/x8e/x34 LATIN SMALL LETTER G WITH CIRCUMFLEX
IBM905.gz:U011D /x9b LATIN SMALL LETTER G WITH CIRCUMFLEX
ISO_6937-2-ADD.gz:U011D /xc3/x67 LATIN SMALL LETTER G WITH CIRCUMFLEX
ISO_6937.gz:U011D /xc3/x67 LATIN SMALL LETTER G WITH CIRCUMFLEX
ISO-8859-3.gz:U011D /xf8 LATIN SMALL LETTER G WITH CIRCUMFLEX
ISO_8859-SUPP.gz:U011D /xb8 LATIN SMALL LETTER G WITH CIRCUMFLEX
ISO-IR-90.gz:U011D /xc3/x67 LATIN SMALL LETTER G WITH CIRCUMFLEX
SHIFT_JISX0213.gz:U011D /x85/xde LATIN SMALL LETTER G WITH CIRCUMFLEX
T.101-G2.gz:U011D /xc3/x67 LATIN SMALL LETTER G WITH CIRCUMFLEX
T.61-8BIT.gz:U011D /xc3/x67 LATIN SMALL LETTER G WITH CIRCUMFLEX
UTF-8.gz:U011D /xc4/x9d LATIN SMALL LETTER G WITH CIRCUMFLEX
VIDEOTEX-SUPPL.gz:U011D /xc3/x67 LATIN SMALL LETTER G WITH CIRCUMFLEX

How come this character is in ISO-8859-3? IBM905?

Leo
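As a cross-check on two rows of Leo's listing (ISO-8859-3 /xf8 and UTF-8 /xc4/x9d for U+011D), here is a minimal Python sketch that prints, for each of the twelve Esperanto letters, its code point, its single byte in ISO 8859-3 and its UTF-8 bytes. It assumes Python's built-in "iso8859_3" and "utf-8" codecs as a stand-in for the glibc charmap files Leo grepped, and `describe_letter` is a hypothetical helper name, not part of any library.

```python
# Minimal sketch: encode the twelve Esperanto letters with Python's
# built-in codecs. "iso8859_3" and "utf-8" are standard Python codec
# names; glibc's charmap files are not consulted here.

# Uppercase/lowercase pairs: C, G, H, J, S with circumflex; U with breve.
ESPERANTO = "\u0108\u0109\u011c\u011d\u0124\u0125\u0134\u0135\u015c\u015d\u016c\u016d"


def describe_letter(ch: str) -> str:
    """Hypothetical helper: code point, ISO 8859-3 byte and UTF-8 bytes."""
    latin3 = ch.encode("iso8859_3").hex()   # one byte per letter in Latin-3
    utf8 = ch.encode("utf-8").hex(" ")      # two bytes per letter in UTF-8
    return f"U+{ord(ch):04X} {ch}  ISO-8859-3: {latin3}  UTF-8: {utf8}"


for letter in ESPERANTO:
    print(describe_letter(letter))
```

The line for lowercase ĝ should agree with the ISO-8859-3.gz and UTF-8.gz rows of the listing.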
Re: Origin of the digital encoding of accented characters for Esperanto
On 3/23/2015 8:35 AM, William_J_G Overington wrote:

Origin of the digital encoding of accented characters for Esperanto

Twelve accented characters (uppercase versions and lowercase versions of six accented letters) used for Esperanto are encoded in Unicode.

WJO is referring to U+0109, U+011D, U+0125, U+0135, U+015D, U+016D (and their uppercase pairs).

These may well be in Unicode as legacy encoded characters from one or more earlier standards.

No.

Does anyone know please how Esperanto characters first became encoded digitally?

In the Unicode Standard, the fact that these all occur in the Latin Extended-A block is a clue. The Latin Extended-A block dates back to Unicode 1.0. You can easily verify that by referring to the archival record. See:

http://www.unicode.org/versions/Unicode1.0.0/

And in fact, the exact set in the Latin Extended-A block can be traced even further back than the publication of Unicode 1.0 in 1991. That same repertoire was included in the charts distributed for public review in the Unicode 1.0 final review draft in December 1990. So we know that the inclusion of the 12 accented characters for Esperanto in that set dates back at least that far, which should eliminate a lot of fruitless speculation about their origins in Unicode.

For example, was it that someone who was interested in Esperanto happened to be a member of a committee that was working on encoding accented characters?

Well, sort of. See further explanation below.

Or did one or more people, or a group of people, or an Esperanto society, lobby for the characters to become included?

No.

Or what?

Well, the answer is sort of "or what".

The repertoire of accented characters included in the Latin Extended-A block for the final review draft of Unicode 1.0 in December 1990 was largely culled from the even earlier list of Latin letters proposed for encoding in the 2nd DP (Draft Proposal) for ISO/IEC 10646-1. Their inclusion in the Unicode Standard 1.0 repertoire was one of the early compatibility decisions, to ensure that the repertoire that national bodies had thought important enough to include in the early 10646 balloting was accounted for in some way in the first Unicode Standard draft.

The list of accented Latin letters in the Latin Extended-A block consisted of the union of all of the then-extant ISO 8859 8-bit standard repertoire for various Latin alphabets, *plus* the additional letters culled from the 2nd DP 10646-1.

For the record, the 2nd DP 10646 was JTC1/SC2 N2066 (=WG2 N551), dated December 1, 1989. In that era, documents were distributed only on paper, and I don't know of an extant online copy, so it is rather difficult to track down!

<speculation>
In any event, in that document from 1989, I consider it likely that the person who originally assembled the lists of various European language alphabets and included them in the drafts for balloting was Hugh McGregor Ross, the then British editor of 10646 and a person with a passion for details about lesser-used writing systems. Mr. Ross is, unfortunately, recently deceased, so we cannot ask him directly. But I suspect that examination of the early drafts of 10646 and related papers would confirm this speculation on my part.
</speculation>

--Ken

It does not seem axiomatic that accented characters for Esperanto would necessarily be included in a digital encoding of the accented characters needed for the languages of Europe.

William Overington
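Ken's statement that the six letters and their uppercase pairs all sit in Latin Extended-A can be checked mechanically. The sketch below takes his six lowercase code points, derives each uppercase partner with `str.upper()`, and tests both against the block range U+0100..U+017F. It uses only Python's standard `unicodedata` module; the names `LOWERCASE` and `check_pairs` are illustrative, not from any library.

```python
import unicodedata

# The six lowercase code points listed in Ken's message.
LOWERCASE = [0x0109, 0x011D, 0x0125, 0x0135, 0x015D, 0x016D]

# Latin Extended-A block: U+0100..U+017F inclusive.
LATIN_EXTENDED_A = range(0x0100, 0x0180)


def check_pairs(codepoints):
    """Return (upper, lower, name) rows; raise if a letter is outside the block."""
    rows = []
    for cp in codepoints:
        lower = chr(cp)
        upper = lower.upper()  # for these letters, the preceding code point
        for ch in (upper, lower):
            if ord(ch) not in LATIN_EXTENDED_A:
                raise ValueError(f"U+{ord(ch):04X} is outside Latin Extended-A")
        rows.append((f"U+{ord(upper):04X}", f"U+{cp:04X}", unicodedata.name(lower)))
    return rows


for row in check_pairs(LOWERCASE):
    print(*row)
```

Because the check raises rather than silently skipping, a typo in the code point list would surface immediately.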
Re: Origin of the digital encoding of accented characters for Esperanto
On Mon, 23 Mar 2015 10:44:10 -0700
Ken Whistler kenwhist...@att.net wrote:

And the question, instead, then becomes tracking down through the ancient history of JTC1/SC2/WG3 (-- Note *3*, not *2*), why the participants who drafted 8859-3 felt it was important to include the Esperanto letters in the repertoire for the South European set back in 1986. That date, by the way, is earlier than anything I have firsthand records for.

Perhaps it's more an odds-and-sods collection. Esperanto was once significant, so one should not be surprised that it is supported.

Richard.
Re: Origin of the digital encoding of accented characters for Esperanto
On 23 March 2015, at 08:35, William_J_G Overington wrote:

Origin of the digital encoding of accented characters for Esperanto

These may well be in Unicode as legacy encoded characters from one or more earlier standards.

ISO 6937 of 1983 seems to have been designed to support them.

http://en.wikipedia.org/wiki/ISO/IEC_6937
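ISO 6937 handles accents differently from the 8859 series: as the ISO_6937 row of Leo's charmap listing shows (/xc3/x67 for ĝ), an accented letter is a non-spacing diacritic byte followed by the base letter. The sketch below decodes only that mechanism, for the two diacritics Esperanto needs. It is hypothetical and deliberately partial: `decode_6937_subset` is an illustrative name, only ASCII base letters are handled, and the breve value 0xC6 is assumed from the ISO 6937 diacritic table rather than stated in this thread.

```python
import unicodedata

# Hypothetical, deliberately tiny subset of ISO 6937: only the two
# non-spacing diacritics Esperanto needs. In ISO 6937 the diacritic
# byte comes BEFORE the base letter.
DIACRITICS = {
    0xC3: "\u0302",  # combining circumflex (cf. /xc3/x67 for g circumflex)
    0xC6: "\u0306",  # combining breve (assumed value, used for u breve)
}


def decode_6937_subset(data: bytes) -> str:
    """Decode ASCII plus circumflex/breve sequences; reject anything else."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in DIACRITICS:
            if i + 1 >= len(data) or data[i + 1] >= 0x80:
                raise ValueError("diacritic must be followed by an ASCII base letter")
            # Unicode stores the combining mark AFTER the base letter.
            out.append(chr(data[i + 1]) + DIACRITICS[b])
            i += 2
        elif b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} is not handled by this sketch")
    # NFC composes e.g. "g" + U+0302 into the precomposed U+011D.
    return unicodedata.normalize("NFC", "".join(out))


print(decode_6937_subset(b"\xc3gis revido"))
```

The final NFC step is what maps the prefix-diacritic form onto the same precomposed Latin Extended-A code points that ISO 8859-3 and Unicode use.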
Re: Origin of the digital encoding of accented characters for Esperanto
For ISO 8859-3, the answer is in the wiki:

http://en.wikipedia.org/wiki/ISO/IEC_8859-3

It was designed to cover Turkish, Maltese and Esperanto, ...

The answer for IBM CP905 is simple: it is just the EBCDIC code page of June 1986 that corresponded to ISO 8859-3. That also covers the answer for ISO-IR 109, which is simply the registration of the right-hand part of Latin-3.

At any rate, I didn't check whether the Esperanto letters were in ISO 8859-3 before I wrote my initial response, but their presence there certainly removes all proximate speculation about the occurrence of the accented letters for Esperanto in the Unicode 1.0 repertoire in Latin Extended-A. They were included by the exercise of taking the union of all the 8859 Latin alphabets.

So the answer for Unicode is, instead, *yes*, they were in a pre-existing standard that was grandfathered in to the initial collection of accented Latin letters.

And the question, instead, then becomes tracking down through the ancient history of JTC1/SC2/WG3 (-- Note *3*, not *2*), why the participants who drafted 8859-3 felt it was important to include the Esperanto letters in the repertoire for the South European set back in 1986. That date, by the way, is earlier than anything I have firsthand records for.

--Ken

On 3/23/2015 10:10 AM, Leo Broukhis wrote:

How come this character is in ISO-8859-3? IBM905?

Leo
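Ken's "union of all the 8859 Latin alphabets" can be illustrated with Python's built-in codecs. The sketch collects the characters assigned to the right-hand half (0xA0..0xFF) of each Latin part, forms their union, and reports which parts carry all twelve Esperanto letters. Two assumptions: Latin-1 to Latin-5 (8859-1, -2, -3, -4 and -9) stand in for "then-extant" rather than recording exactly what was consulted in 1990, and Python's codec tables reflect the published standards, not the drafts of the time. `right_hand_part` is an illustrative name.

```python
# Sketch of the "union of the 8859 Latin alphabets" idea using Python's
# built-in codecs. Assumption: Latin-1..Latin-5 stand in for the parts
# that were "then-extant"; this is not a record of the 1990 procedure.
LATIN_PARTS = {
    "Latin-1": "iso8859_1",
    "Latin-2": "iso8859_2",
    "Latin-3": "iso8859_3",
    "Latin-4": "iso8859_4",
    "Latin-5": "iso8859_9",
}

# The twelve Esperanto letters (C, G, H, J, S circumflex; U breve; both cases).
ESPERANTO = set("\u0108\u0109\u011c\u011d\u0124\u0125\u0134\u0135\u015c\u015d\u016c\u016d")


def right_hand_part(codec: str) -> set:
    """Characters assigned to bytes 0xA0..0xFF in the given single-byte codec."""
    chars = set()
    for value in range(0xA0, 0x100):
        try:
            chars.add(bytes([value]).decode(codec))
        except UnicodeDecodeError:
            pass  # unassigned position (Latin-3 has several)
    return chars


repertoires = {name: right_hand_part(codec) for name, codec in LATIN_PARTS.items()}
union = set().union(*repertoires.values())
carriers = [name for name, chars in repertoires.items() if ESPERANTO <= chars]

print("All twelve letters in the union:", ESPERANTO <= union)
print("Parts carrying all twelve:", carriers)
```

Run as-is, it should list Latin-3 as the only one of these parts carrying the full set, consistent with the letters reaching Latin Extended-A through 8859-3.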
Re: Origin of the digital encoding of accented characters for Esperanto
Ken wrote: The list of accented Latin letters in the Latin Extended-A block consisted of the union of all of the then-extant ISO 8859 8-bit standard repertoire for various Latin alphabets, *plus* the additional letters culled from the 2nd DP 10646-1. The Esperanto letters can be found in ECMA-94, 2nd Edition (June 1986), pp. 17-21 (pp. 33-37 in the PDF), which is equivalent to ISO 8859-3. http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-094.pdf -- Doug Ewell | http://ewellic.org | Thornton, CO
Re: Origin of the digital encoding of accented characters for Esperanto
So the answer for Unicode is, instead, *yes*, they were in a pre-existing standard that was grandfathered in to the initial collection of accented Latin letters.

That's what I was hinting at. :)

Leo
Re: Origin of the digital encoding of accented characters for Esperanto
On Mon, Mar 23, 2015 at 8:35 AM, William_J_G Overington wjgo_10...@btinternet.com wrote:

It does not seem axiomatic that accented characters for Esperanto would necessarily be included in a digital encoding of the accented characters needed for the languages of Europe.

Where does "languages of Europe" come from? Latin Extended-A is not designed to exclusively cover Europe, and both ISO 8859-3 and Extended-A cover Turkish.

The largest Esperanto libraries have about 25,000 books, and there's a large collection of people wanting to use Esperanto on the Internet; moreover, the encoding decision is trivial, being a simple and uncontested set of twelve codepoints. Of all the Latin script characters not encoded in Unicode 1.1, I doubt any of them have 1% of the use of the Esperanto characters. Not encoding them upfront would have been silly.

--
Kie ekzistas vivo, ekzistas espero. (Where there is life, there is hope.)