Re: Origin of the digital encoding of accented characters for Esperanto

2015-03-24 Thread William_J_G Overington

WJGO  It does not seem axiomatic that accented characters for Esperanto would 
necessarily be included in a digital encoding of the accented characters needed 
for the languages of Europe.

DS  Where does "languages of Europe" come from? 

It seems to me that an alternative scenario could quite easily, and possibly 
more probably, have been what happened: namely, that a list of the countries 
of Europe was made and then, starting from that list, the main language of 
each country was added to a list of languages to be supported, with Esperanto 
not even being thought about. It could also have been that, if Esperanto had 
been suggested, the idea was dismissed because Esperanto is not the language 
of a country, or because of some negative opinion about Esperanto, or for some 
other purported reason.

It seems axiomatic that the accented characters for French and German would be 
included, yet not axiomatic that the accented characters for Esperanto would be 
included. So I wondered how they came to be encoded.

Back in the 1960s I saw a list of the accented characters needed to typeset in 
various European languages. It was in the Riscatype catalogue of metal type. 
Esperanto was included in that list. Is it possible that that list was used 
years later in deciding which accented characters to include in an electronic 
coding?

I remember that in the early 1970s two researchers were trying to translate 
what they thought was a paper in Spanish and were having great problems. I 
glanced at the text and pointed out that it was Portuguese. When asked whether 
I spoke Portuguese, I replied that I did not, but that, being interested in 
printing, I knew that the a tilde character (ã) was used in Portuguese and not 
in Spanish: so the Riscatype list was helpful to them.

DS  Latin Extended-A is not designed to exclusively cover Europe, and both ISO 
8859-3 and Extended-A cover Turkish.

Well, part of Turkey is in Europe.

DS  The largest Esperanto libraries have about 25,000 books, and there's a 
large collection of people wanting to use Esperanto on the Internet;  ... 

Fine.

DS  ... moreover, the encoding decision is trivial, being a simple and 
uncontested set of twelve codepoints.

Well, the decision was not necessarily either trivial or uncontested: that is 
now a part of history, and maybe some documentation will be found describing 
the situation at the time.

DS  Of all the Latin script characters not encoded in Unicode 1.1, I doubt any 
of them have 1% of the use of the Esperanto characters. Not encoding them 
upfront would have been silly.

I have been interested in Esperanto since the 1960s, when I found an Esperanto 
dictionary in an antiquarian bookshop. I had not previously known of Esperanto. 
I asked the bookshop owner about the language; he explained, and I bought the 
dictionary and a copy of the English version of The Life of Zamenhof, by 
Edmond Privat. Soon afterwards I bought a copy of Teach Yourself Esperanto, and 
some years later, in the early 1970s, I bought the Teach Yourself Esperanto 
Dictionary. In the late 1990s I gained two certificates in Esperanto, at 
Elementary and Intermediate levels.

More recently I have written a song in Esperanto and I am hoping to record it 
and place it on the web so that it will become archived by the British Library. 
The song lyrics use g circumflex many times and s circumflex a few times and I 
was thinking that it is good that the characters are available in Unicode.

DS  Kie ekzistas vivo, ekzistas espero.

Dankon.

For the benefit of readers who do not know any Esperanto, I translate into 
English what David wrote:

Where there exists life, there exists hope.

thus

Where there is life, there is hope.

and the translation into English of my reply is

Thanks.

I am also trying to draft a petition to send to the Unicode Technical Committee 
about encoding some localizable sentences with their symbols in plane 13 and 
building localizable sentence technology as a part of Unicode for the future.

As part of the introduction I am seeking to compare and contrast Esperanto and 
localizable sentence technology.

Both are intended to assist communication through the language barrier. Neither 
is intended to replace natural languages. 

Esperanto can be used to construct a sentence for any meaning, whereas 
localizable sentences cover only a finite set of sentences.

Esperanto does need to be learned as a language before it can be used; that is 
quicker and simpler than learning French or German, yet still takes quite a lot 
of study. Localizable sentences could be used easily, just by learning how to 
use a cascading menu system with category headings and sentences localized into 
one's own native language. There is the capability to include names, which are 
not localizable, within a stream of localizable sentences, and there are escape 
mechanisms for adding unlocalizable items in Esperanto or in a natural language. 

Before encoding as electronic characters, 

Re: Origin of the digital encoding of accented characters for Esperanto

2015-03-23 Thread Asmus Freytag

On 3/23/2015 10:44 AM, Ken Whistler wrote:

And the question, instead, then becomes tracking down through
the ancient history of JTC1/SC2/WG3 (-- Note *3*, not *2*),
why the participants who drafted 8859-3 felt it was important
to include the Esperanto letters in the repertoire for the South
European set back in 1986. That date, by the way, is earlier than
anything I have firsthand records for. 


ECMA was actively involved in developing these sets and published them 
as a parallel standard (ECMA-94, second edition, 1986).


http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-094.pdf

ECMA had a very close working relation with ISO; some of that history 
can be tracked down in snippets on the web. Sadly, though, most of the 
active participants in developing the early editions of the 8859 series 
would have passed away by now.


A./

___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode


Origin of the digital encoding of accented characters for Esperanto

2015-03-23 Thread William_J_G Overington
Twelve accented characters (uppercase versions and lowercase versions of six 
accented letters) used for Esperanto are encoded in Unicode.
These may well be in Unicode as legacy encoded characters from one or more 
earlier standards.
Does anyone know please how Esperanto characters first became encoded digitally?
For example, was it that someone who was interested in Esperanto happened to be 
a member of a committee that was working on encoding accented characters?
Or did one or more people, or a group of people, or an Esperanto society, lobby 
for the characters to become included?
Or what?
It does not seem axiomatic that accented characters for Esperanto would 
necessarily be included in a digital encoding of the accented characters needed 
for the languages of Europe.
William Overington
23 March 2015


Re: Origin of the digital encoding of accented characters for Esperanto

2015-03-23 Thread Leo Broukhis
Ken,

zgrep U011D /usr/share/i18n/charmaps/*
ANSI_X3.110-1983.gz:U011D /xc3/x67 LATIN SMALL LETTER G WITH CIRCUMFLEX
EUC-JISX0213.gz:U011D /xaa/xe0 LATIN SMALL LETTER G WITH CIRCUMFLEX
EUC-JP.gz:U011D /x8f/xab/xba LATIN SMALL LETTER G WITH CIRCUMFLEX
EUC-JP-MS.gz:U011D /x8f/xab/xba LATIN SMALL LETTER G WITH CIRCUMFLEX
EUC-TW.gz:U0002011D /x8e/xa7/xac/xbc CJK
GB18030.gz:U011D /x81/x30/x8e/x34 LATIN SMALL LETTER G WITH CIRCUMFLEX
IBM905.gz:U011D /x9b LATIN SMALL LETTER G WITH CIRCUMFLEX
ISO_6937-2-ADD.gz:U011D /xc3/x67 LATIN SMALL LETTER G WITH CIRCUMFLEX
ISO_6937.gz:U011D /xc3/x67 LATIN SMALL LETTER G WITH CIRCUMFLEX
ISO-8859-3.gz:U011D /xf8 LATIN SMALL LETTER G WITH CIRCUMFLEX
ISO_8859-SUPP.gz:U011D /xb8 LATIN SMALL LETTER G WITH CIRCUMFLEX
ISO-IR-90.gz:U011D /xc3/x67 LATIN SMALL LETTER G WITH CIRCUMFLEX
SHIFT_JISX0213.gz:U011D /x85/xde LATIN SMALL LETTER G WITH CIRCUMFLEX
T.101-G2.gz:U011D /xc3/x67 LATIN SMALL LETTER G WITH CIRCUMFLEX
T.61-8BIT.gz:U011D /xc3/x67 LATIN SMALL LETTER G WITH CIRCUMFLEX
UTF-8.gz:U011D /xc4/x9d LATIN SMALL LETTER G WITH CIRCUMFLEX
VIDEOTEX-SUPPL.gz:U011D /xc3/x67 LATIN SMALL LETTER G WITH CIRCUMFLEX
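Two rows of the listing above can be cross-checked with Python's built-in codecs, confirming that U+011D is the single byte 0xF8 in ISO 8859-3 and two bytes in UTF-8. This is a small sketch; the helper `charmap_style` is hypothetical and only mimics the /xNN notation used in the glibc charmap files.

```python
G_CIRCUMFLEX = "\u011d"  # LATIN SMALL LETTER G WITH CIRCUMFLEX


def charmap_style(data: bytes) -> str:
    """Hypothetical helper: render bytes in the /xNN style of glibc charmaps."""
    return "".join(f"/x{b:02x}" for b in data)


# Python's stdlib codec names for the two charmaps being cross-checked.
for codec in ("iso8859_3", "utf_8"):
    encoded = G_CIRCUMFLEX.encode(codec)
    print(f"{codec:<10} U+{ord(G_CIRCUMFLEX):04X} {charmap_style(encoded)}")
```

It should print /xf8 for iso8859_3 and /xc4/x9d for utf_8, agreeing with the ISO-8859-3 and UTF-8 rows of the zgrep output.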

How come this character is in ISO-8859-3? IBM905?

Leo



Re: Origin of the digital encoding of accented characters for Esperanto

2015-03-23 Thread Ken Whistler



On 3/23/2015 8:35 AM, William_J_G Overington wrote:

Origin of the digital encoding of accented characters for Esperanto

Twelve accented characters (uppercase versions and lowercase versions 
of six accented letters) used for Esperanto are encoded in Unicode.


WJO is referring to U+0109, U+011D, U+0125, U+0135, U+015D, U+016D (and
their uppercase pairs).
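For readers who want to see the twelve characters Ken lists, a short Python sketch using only the standard `unicodedata` module prints each lowercase code point with its uppercase partner and character name. In this part of Latin Extended-A each capital sits immediately before its small letter (U+0108/U+0109 and so on); the sketch relies on that and checks it rather than assuming it silently.

```python
import unicodedata

# Lowercase Esperanto letters in Latin Extended-A, as listed in Ken's reply.
ESPERANTO_LOWER = [0x0109, 0x011D, 0x0125, 0x0135, 0x015D, 0x016D]


def esperanto_pairs():
    """Yield (uppercase, lowercase) code point pairs for the six letters."""
    for cp in ESPERANTO_LOWER:
        upper = ord(chr(cp).upper())
        # Here the capital letter precedes the small letter by one code point.
        assert upper == cp - 1, f"unexpected case pairing for U+{cp:04X}"
        yield upper, cp


for upper, lower in esperanto_pairs():
    print(f"U+{upper:04X} / U+{lower:04X}  {unicodedata.name(chr(lower))}")
```

The first line printed is "U+0108 / U+0109  LATIN SMALL LETTER C WITH CIRCUMFLEX", and the last covers U WITH BREVE.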



These may well be in Unicode as legacy encoded characters from one or 
more earlier standards.


No.



Does anyone know please how Esperanto characters first became encoded 
digitally?


In the Unicode Standard, the fact that these all occur in the Latin
Extended-A block is a clue. The Latin Extended-A block dates back to
Unicode 1.0. You can easily verify that by referring to the archival
record. See:

http://www.unicode.org/versions/Unicode1.0.0/

And in fact, the exact set in the Latin Extended-A block can be traced
even further back than the publication of Unicode 1.0 in 1991. That
same repertoire was included in the charts distributed for public
review in the Unicode 1.0 final review draft in December, 1990. So we
know that the inclusion of the 12 accented characters for Esperanto in
that set dates back at least that far -- which should eliminate a lot
of fruitless alternative speculative theories about their origins in
Unicode.




For example, was it that someone who was interested in Esperanto 
happened to be a member of a committee that was working on encoding 
accented characters?


Well, sort of. See further explanation below.



Or did one or more people, or a group of people, or an Esperanto 
society, lobby for the characters to become included?


No.



Or what?


Well, the answer is sort of or what. The repertoire of accented
characters included in the Latin Extended-A block for the final review
draft of Unicode 1.0 in December, 1990 was largely culled from the even
earlier list of Latin letters proposed for encoding in the 2nd DP
(Draft Proposal) for ISO/IEC 10646-1. Their inclusion in the Unicode
Standard 1.0 repertoire was one of the early compatibility decisions,
to ensure that repertoire that national bodies had thought important
enough to be included in the early 10646 balloting was accounted for in
some way in the first Unicode Standard draft.

The list of accented Latin letters in the Latin Extended-A block
consisted of the union of all of the then-extant ISO 8859 8-bit
standard repertoire for various Latin alphabets, *plus* the additional
letters culled from the 2nd DP 10646-1.


For the record, the 2nd DP 10646 was JTC1/SC2 N2066 (=WG2 N551), dated
December 1, 1989. In that era, documents were distributed only on paper,
and I don't know of an extant online copy, so it is rather difficult to
track down!


<speculation>
In any event, in that document from 1989, I consider it likely that the
person who originally assembled the lists of various European language
alphabets and included them in the drafts for balloting was Hugh McGregor
Ross, the then British editor of 10646 and a person with a passion for
details about lesser-used writing systems. Mr. Ross is, unfortunately,
recently deceased, so we cannot ask him directly. But I suspect that
examination of the early drafts of 10646 and papers related to it would
confirm this speculation on my part.
</speculation>

--Ken



It does not seem axiomatic that accented characters for Esperanto 
would necessarily be included in a digital encoding of the accented 
characters needed for the languages of Europe.


William Overington




Re: Origin of the digital encoding of accented characters for Esperanto

2015-03-23 Thread Richard Wordingham
On Mon, 23 Mar 2015 10:44:10 -0700
Ken Whistler kenwhist...@att.net wrote:

 And the question, instead, then becomes tracking down through
 the ancient history of JTC1/SC2/WG3 (-- Note *3*, not *2*),
 why the participants who drafted 8859-3 felt it was important
 to include the Esperanto letters in the repertoire for the South
 European set back in 1986. That date, by the way, is earlier than
 anything I have firsthand records for.

Perhaps it's more of an odds and sods collection.

Esperanto was once significant, so one should not be surprised that it
should be supported.

Richard. 


Re: Origin of the digital encoding of accented characters for Esperanto

2015-03-23 Thread Tom Gewecke

On 23 March 2015, at 08:35, William_J_G Overington wrote:

 Origin of the digital encoding of accented characters for Esperanto
 
 These may well be in Unicode as legacy encoded characters from one or more 
 earlier standards.

ISO 6937 of 1983 seems to have been designed to support them.

http://en.wikipedia.org/wiki/ISO/IEC_6937
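ISO 6937 handles these letters differently from ISO 8859-3: instead of a precomposed code point, it writes a non-spacing diacritic byte followed by the base letter, which is why Leo's charmap listing shows U+011D as /xc3/x67 (circumflex, then g) for ISO_6937. Python has no ISO 6937 codec, so the following is a hypothetical sketch limited to ASCII plus the Esperanto letters. The circumflex byte 0xC3 comes from that listing; the breve byte 0xC6 is an assumption worth checking against the standard's code table.

```python
import unicodedata

# Hypothetical mapping, not a stdlib codec: combining mark -> ISO 6937 prefix.
DIACRITIC_PREFIX = {
    "\u0302": 0xC3,  # COMBINING CIRCUMFLEX ACCENT (0xC3 as in the charmap listing)
    "\u0306": 0xC6,  # COMBINING BREVE (assumed ISO 6937 breve position)
}


def encode_iso6937_esperanto(text: str) -> bytes:
    """Encode ASCII plus the six Esperanto accented letters, ISO 6937 style."""
    out = bytearray()
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ord(ch))  # plain ASCII passes through unchanged
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) != 2 or decomposed[1] not in DIACRITIC_PREFIX:
            raise ValueError(f"not handled by this sketch: {ch!r}")
        base, mark = decomposed
        out.append(DIACRITIC_PREFIX[mark])  # the diacritic byte comes first...
        out.append(ord(base))               # ...followed by the base letter
    return bytes(out)


print(encode_iso6937_esperanto("\u011dis"))  # b'\xc3gis'
```

The design point is that a 1983-era composing scheme covers Esperanto "for free" once the circumflex and breve exist as diacritics, without dedicating code points to each accented letter.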


Re: Origin of the digital encoding of accented characters for Esperanto

2015-03-23 Thread Ken Whistler

For ISO 8859-3, the answer is in the wiki:

http://en.wikipedia.org/wiki/ISO/IEC_8859-3

It was designed to cover Turkish, Maltese and Esperanto, ...

The answer for IBM CP905 is simple -- it is simply the EBCDIC
code page of June, 1986 that corresponded to ISO 8859-3.
That also covers the answer for ISO-IR 109, which is simply
the registration of the right-hand part of Latin-3.

At any rate, since I didn't check first whether the Esperanto
letters were in ISO 8859-3 before I wrote my initial response,
this would certainly remove all proximate speculation about
the occurrence of the accented letters for Esperanto in
the Unicode 1.0 repertoire in Latin Extended-A. They were
included by the exercise of doing the union of all the
8859 Latin alphabets.
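The union exercise described here can be approximated with Python's codecs for the Latin parts of ISO 8859 published by 1990 (parts 1, 2, 3, 4 and 9). This is a rough reconstruction of the idea, assuming Python's codec tables match the published standards; it is not the procedure actually used for Unicode 1.0. It checks whether the twelve Esperanto letters reach the union, and through which part.

```python
# Latin parts of ISO 8859 published by 1990, by their Python codec names.
LATIN_PARTS = ["latin_1", "iso8859_2", "iso8859_3", "iso8859_4", "iso8859_9"]

# The twelve Esperanto letters: each capital sits just before its small letter.
ESPERANTO = {
    chr(cp + offset)
    for cp in (0x0109, 0x011D, 0x0125, 0x0135, 0x015D, 0x016D)
    for offset in (-1, 0)
}


def repertoire(codec):
    """Characters a codec assigns to the upper half, bytes 0xA0-0xFF."""
    chars = set()
    for byte in range(0xA0, 0x100):
        try:
            chars.add(bytes([byte]).decode(codec))
        except UnicodeDecodeError:
            pass  # unassigned position (Latin-3 leaves several unused)
    return chars


union = set().union(*(repertoire(codec) for codec in LATIN_PARTS))
print("Esperanto letters in the union:", ESPERANTO <= union)
for codec in LATIN_PARTS:
    print(f"{codec:<10} covers all twelve: {ESPERANTO <= repertoire(codec)}")
```

Only the iso8859_3 line reports True, consistent with the letters having entered the union through Latin-3 alone.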

So the answer for Unicode is, instead, *yes*, they were in
a pre-existing standard that was grandfathered in to the
initial collection of accented Latin letters.

And the question, instead, then becomes tracking down through
the ancient history of JTC1/SC2/WG3 (-- Note *3*, not *2*),
why the participants who drafted 8859-3 felt it was important
to include the Esperanto letters in the repertoire for the South
European set back in 1986. That date, by the way, is earlier than
anything I have firsthand records for.

--Ken


On 3/23/2015 10:10 AM, Leo Broukhis wrote:


How come this character is in ISO-8859-3? IBM905?

Leo






Re: Origin of the digital encoding of accented characters for Esperanto

2015-03-23 Thread Doug Ewell
Ken wrote:

 The list of accented Latin letters in the Latin Extended-A block
 consisted of the union of all of the then-extant ISO 8859 8-bit
 standard repertoire for various Latin alphabets, *plus* the additional
 letters culled from the 2nd DP 10646-1.

The Esperanto letters can be found in ECMA-94, 2nd Edition (June 1986),
pp. 17-21 (pp. 33-37 in the PDF), which is equivalent to ISO 8859-3.

http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-094.pdf

--
Doug Ewell | http://ewellic.org | Thornton, CO 




Re: Origin of the digital encoding of accented characters for Esperanto

2015-03-23 Thread Leo Broukhis
 So the answer for Unicode is, instead, *yes*, they were in
 a pre-existing standard that was grandfathered in to the
 initial collection of accented Latin letters.

That's what I was hinting at. :)

Leo





Re: Origin of the digital encoding of accented characters for Esperanto

2015-03-23 Thread David Starner
On Mon, Mar 23, 2015 at 8:35 AM, William_J_G Overington
wjgo_10...@btinternet.com wrote:
 It does not seem axiomatic that accented characters for Esperanto would
 necessarily be included in a digital encoding of the accented characters
 needed for the languages of Europe.

Where does "languages of Europe" come from? Latin Extended-A is not
designed to exclusively cover Europe, and both ISO 8859-3 and
Extended-A cover Turkish. The largest Esperanto libraries have about
25,000 books, and there's a large collection of people wanting to use
Esperanto on the Internet; moreover, the encoding decision is trivial,
being a simple and uncontested set of twelve codepoints.

Of all the Latin script characters not encoded in Unicode 1.1, I doubt
any of them have 1% of the use of the Esperanto characters. Not encoding
them upfront would have been silly.

-- 
Kie ekzistas vivo, ekzistas espero. (Where there is life, there is hope.)