Re: [Private Use Area] Audio Description, Subtitle, Signing
about the Private Use Area being rewritten for Unicode 4.0. Is there any chance of someone posting the Unicode 4.0 text into this discussion please?

It remains to be seen what will be decided as the built-in font for the European Union implementation of the DVB-MHP specification. It might be the minimum font of the DVB-MHP specification or it might be more comprehensive. For example, should Greek characters be included? Should weather symbols be included? These and many other issues remain to be decided.

The minimum font for any specification for Europe should be the MES-2. If you are talking to these people, tell them.

Now, I have never heard of the MES-2, whatever that is. However, I do not have deep knowledge of the various standards which exist. Could you possibly say some more about MES-2 please?

The minimum character set for DVB-MHP is in Annex E of the DVB-MHP specification, available from the http://www.mhp.org webspace. It is all in one huge pdf file. I am hoping that the European Union will specify a rather more comprehensive font. It may be that a lot will depend on how much unused space, if any, there is in the read-only memories which are used for the built-in font.

The Cenelec group to which I refer is the DigitalTV_WG and readers who might like to join could ask at [EMAIL PROTECTED], as new members have joined at various times. Names of members are listed in the internal email facility of the forum. Membership is free and there appear to be people from outside the European Union as well as within it, as what is decided for Europe may be adopted by countries in other parts of the world.

Readers might also be interested in the related TVforALL_WG forum, which discusses issues of access to broadcasts for people with disabilities. The Audio Description, Subtitle, Signing logos issue has been posted in both of these forums. Please enquire at [EMAIL PROTECTED] by email if you are interested. Membership is also free for that forum.
William Overington 17 July 2003
Re: Combining diacriticals and Cyrillic
Peter Constable wrote as follows.

William Overington wrote on 07/15/2003 07:22:22 AM: No, the Private Use Area codes would not be used for interchange, only locally for producing an elegant display in such applications as chose to use them. Other applications could ignore their existence.

Then why do you persist in public discussion of suggested codepoints for such purposes? If it is for local, proprietary use internal to some implementation, then the only one who needs to know, think or care about these codepoints is the person creating that implementation.

The original enquiry sought advice about how to proceed. I posted some ideas of a possible way to proceed. If the idea of using a eutocode typography file is taken up and software which uses it is produced, then it would be reasonable to have a published list of Private Use Area code points for the precomposed characters which are to be available, as in that way the output stream from the processing could be viewed with a number of fonts from a variety of font makers without needing to change the eutocode typography file if one changed font. I have not published many of my suggested code points in this forum precisely because a few people do not want them published here. For example, there is the ViOS-like system for a three-dimensional visual indexing system for use in interactive broadcasting.

Publishing a list of Private Use Area code points would have absolutely no purpose at all.

mean that such display could be produced using a choice of fonts from various font makers using the same software

Now you are talking interchange. Interchange means more than just person A sends a document to person B. It means that person A's document works with person B's software using person C's font. (An alternate term that is often used, interoperate, makes this clearer.)

Exactly. This is why publishing the list of Private Use Area code point assignments for the precomposed characters is a good idea.
Person B can display the document and then wonder if it might look better with that font made by person D and have a try with that font. If the list of Private Use Area code point assignments for the precomposed characters has been published and both C and D have used the list to add the extra Cyrillic characters into their fonts, then the published list has helped to achieve interoperability.

I feel that an important thing to remember is the dividing line between what is in Unicode and what is in particular advanced format font technology solutions

And best practice for advanced format font technologies eschews PUA codepoints for glyph processing.

Who decides upon what is best practice?

You've been told that several times by people who have expertise in advanced font technologies, an area in which you are not deeply knowledgeable or experienced, by your own admission.

Well, it is not a matter of an admission as if dragged out of me under examination by counsel in a courtroom. I openly stated the limits of my knowledge in that area, not as a retrospective defence but as an up-front expression of the limitation of my knowledge when putting forward ideas, specifically so as not to produce any incorrect impression as to expertise in that area.

yet they are not suitable for platforms such as Windows 95 and Windows 98, whereas a eutocode typography file approach would be suitable for those platforms and for various other platforms.

Wm, if someone wanted, they could create an advanced font technology to work on DOS, but why bother? Who's going to create all the new software that works with that technology, and make it work within the limitations of a DOS system?

Yet I am not suggesting a system to work on DOS.

Your idea is at best a mental exercise, and even if you or someone else built an implementation, what is not needed is some public agreement on PUA codepoints for use in glyph processing.
When you say agreement, I am not suggesting agreement in some formal manner. It is more like the authorship of a story, where people may read it or not as they choose. Yet if people do read the story, or watch a television or movie implementation of it, a common culture may come to exist amongst the readers which can be applied in other circumstances. For example, it is as if on a holodeck a character says 'arch', and that is something which people who have watched Star Trek The Next Generation may use as a cultural way of expressing something.

The original enquiry read as if a number of people are trying to solve the problem. If a list of the characters is published with Private Use Area code points from U+EF00 upwards, then they could all, if they so choose, use that set of code points and it might help in font interoperability, certainly if they choose to implement a eutocode typography file system and maybe in some other implementations. I suggested U
Re: [Private Use Area] Audio Description, Subtitle, Signing
Peter Constable wrote as follows.

William Overington wrote on 07/15/2003 05:33:22 AM: William, CENELEC is an international standards body. Such bodies either create their own standards or use other international standards. They do not use PUA codepoints.

Well, the fact of the matter is that Cenelec is trying to achieve a consensus for the implementation of interactive television within the European Union

And that does not require PUA codepoints; moreover, your response does not escape the fact I was pointing out that a standards body will not be publishing standards that make reference to PUA codepoints.

Please have a look at what Cenelec is doing in trying to achieve that consensus. Your comments seem to relate to standards bodies generally, or to how Cenelec proceeds generally. This, however, is a particular project for the European Commission, and the difference is that things need to move forward promptly. There are lots of aspects, such as how many buttons to have on a hand-held infra-red control device for end user interaction with a running Java program (that is, the _minimum_ twenty of the DVB-MHP specification, or some more) and such as whether mouse events should be accessible to end users (as the DVB-MHP specification has mouse event access as optional in interactive televisions) and so on. What you write in relation to most projects carried out by standards bodies may well be true, yet I was writing specifically about one particular project being run by Cenelec.
In view of the fact that the interactive television system (DVB-MHP, Digital Video Broadcasting - Multimedia Home Platform http://www.mhp.org ) uses Java and Java uses Unicode, it is then a matter of deciding how to be able to signal the symbols in a Unicode text stream.

And they won't be standardizing on symbols encoded using PUA codepoints.

The deciding is not about something to incorporate into the DVB-MHP standard. It is a matter of trying to gain a consensus as to how to signal those symbols at the present time and in the near future (that is, until (if and when) some regular Unicode code points are achieved) within Java programs which run upon the DVB-MHP platform and in fonts which are used upon the DVB-MHP platform. It is essentially a matter for end users of the system, just as the two Private Use Area characters being suggested in another thread of this forum in relation to Afghanistan are a matter for end users of the Unicode Standard and do not affect the content of the Unicode Standard itself.

In view of the fact that the process of getting regular Unicode code points for the symbols would take quite a time, and indeed that there is as yet no agreement on which symbols to use, and that the implementation of interactive television needs to proceed, it seems to me that putting forward three specific Private Use Area code points for the symbols at this time is helpful to the process.

Then you obviously don't understand the process.

Well, maybe I don't. However, the fact of the matter is that sooner or later some code points are needed to signal those symbols. I have put forward three suggested code points. I also mentioned them in this mailing list. My specific suggestions are in the Private Use Area and do not clash with various uses of the Private Use Area known to me.
So three specific code points have been mentioned, and I suggest that having those three code points published both in the Cenelec forum and here is beneficial: if they are used, then various potential problems are avoided which could have arisen if some other choices (such as three unused code points in regular Unicode, or several different sets of three code points in regular Unicode) were used.

Such things are *not* useful. They do not achieve consistency, not in the short term, and most certainly not in the long term. If consistency is needed, the standardization process is used to establish standardized representations.

Well, what is the alternative?

The alternative to agreeing on a standard? None, but why would you need an alternative?

Code points for the symbols are needed now or in the near future. The symbol designs are not yet agreed. Obtaining regular Unicode points, if achievable, would take quite a time. With my suggested code points published, decisions on which symbol designs to use and getting them into use with everyone using the same code points could happen within a few days. The code points are in the Private Use Area, so the suggestion avoids the possibility of a non-conformant use of a regular Unicode code point.

That is hardly the concern. Standards are designed to be international agreements
Re: [Private Use Area] Audio Description, Subtitle, Signing
which I have suggested for a chess font with pieces on both white and black squares. Although I am hoping that my eutocode graphics system will become widely used in interactive television systems, I accept that it is a specialist application which may be of great interest to some people and of no interest to many other people. Likewise the code points for a chess font, particularly those for chess variants such as Carrera's Chess. However, the symbols for Audio Description, Subtitle, Signing have very widespread use possibilities and so posting my suggested code point allocations for them here in a short note seemed, and still seems, reasonable to me. William Overington 15 July 2003
Re: Combining diacriticals and Cyrillic
Tex Texin wrote as follows.

William, You understand Unicode well enough by now to know that this is an abhorrent suggestion.

The word abhorrent seems rather strong! :-)

As the characters can be represented in Unicode by using Cyrillic plus combining diacriticals, to create a proprietary set of codes in the Private Use Area would introduce incompatibilities with other applications that support these characters in the recommended form.

No, the Private Use Area codes would not be used for interchange, only locally for producing an elegant display in such applications as chose to use them. Other applications could ignore their existence. Publishing a list of Private Use Area code points would mean that such display could be produced using a choice of fonts from various font makers using the same software to produce the purely local text stream, without locking together the provision of the software and the provision of the font to the same supplier using an unpublished Private Use Area encoding.

Following your recommendation would cause searching, sorting and interchange of Vladimir's data to fail in applications that properly support these characters.

No, the Private Use Area codes would not be used for interchange, only locally for producing an elegant display.

And it is likely difficult to get other applications to buy into supporting a proprietary solution.

Well, the set of Private Use Area codes and the software algorithm of the eutocode typography file could be used or not used or even ignored as each person chooses.

It is easier to address the rendering problem that Vladimir has than to unravel the mess your suggestion would create. It isn't even a good recommendation for short term use.

Well, as far as I can tell, the eutocode typography file, using the Private Use Area to hold the glyphs for the precomposed forms locally and not for interchange, does address the rendering problem which Vladimir asked about.
The benefit of a eutocode typography file is that if a software application is produced which uses the information in a eutocode typography file, then, as the eutocode typography file is a Unicode plain text file, the software can be customized using a plain text file. Thus the same software program could be used for languages of the Indian subcontinent, accented Cyrillic characters or indeed many other language characters which someone might want to use, simply by providing a eutocode typography file which includes the rules to translate from Unicode sequences to Private Use Area code points for that particular use.

Did I miss something? Why are you recommending the PUA for this use?

Well, did you read this bit?

quote Software would need to be developed (by you or by other interested people), yet essentially what is needed is software to take an input document and process it according to information in a eutocode typography file. In this way the Private Use Area codes would not be used for interchanging information, yet would be used locally so as to produce an elegant display. end quote

I feel that an important thing to remember is the dividing line between what is in Unicode and what is in particular advanced format font technology solutions which some other organizations supply. Those advanced font format technologies may be very good, I do not know as I have no experience of using them, yet they are not suitable for platforms such as Windows 95 and Windows 98, whereas a eutocode typography file approach would be suitable for those platforms and for various other platforms. I am hoping that the eutocode typography file approach with display glyphs added into the Private Use Area will be a useful technique in many areas, including, yet not limited to, interactive broadcasting.

William Overington 15 July 2003
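[Editorial sketch.] No format for a eutocode typography file was ever implemented in software, as the message above acknowledges, so the following is only a guess at the kind of plain-text rule file it describes, written in Python purely for illustration. The "hex hex -> hex" layout and the specific rules are invented assumptions for this example, not part of any published specification.

```python
# Invented sketch of a plain-text rule file driving the substitution,
# so that one generic program serves different scripts simply by
# swapping the customization file, as the message suggests.

RULES_TEXT = """\
# source code points (hex)  ->  display code point (hex)
0438 0301 -> EF00
0430 0301 -> EF01
"""

def parse_rules(text: str) -> dict:
    """Read mapping rules from the plain-text customization file."""
    rules = {}
    for line in text.splitlines():
        line = line.split("#")[0].strip()   # drop comments and blanks
        if not line:
            continue
        source, _, target = line.partition("->")
        sequence = "".join(chr(int(cp, 16)) for cp in source.split())
        rules[sequence] = chr(int(target.strip(), 16))
    return rules

def apply_rules(text: str, rules: dict) -> str:
    """Translate Unicode sequences to local PUA display code points."""
    for sequence, replacement in rules.items():
        text = text.replace(sequence, replacement)
    return text

rules = parse_rules(RULES_TEXT)
```

Under this reading, supporting a new language means editing only the rule file, never the program, which is the portability argument the message makes for Windows 95 and 98 era platforms.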
[Private Use Area] Audio Description, Subtitle, Signing
There is presently discussion about the symbols to be used to indicate the availability of Audio Description, Subtitle and Signing in television broadcasts. This is being discussed in the Digital_TV and TV_for_All discussion forums at the http://www.cenelec.org webspace.

I am suggesting that the following Private Use Area code points be used for the symbols at the present time. This could lead to a useful consistency of encoding for use with interactive television systems. Hopefully regular Unicode code points will be established at some time in the future; these Private Use Area code point suggestions are simply to help in achieving consistency in the meantime.

U+F2F0, decimal 62192, Audio Description
U+F2F1, decimal 62193, Subtitle
U+F2F2, decimal 62194, Signing

William Overington 14 July 2003
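[Editorial sketch.] As a small illustration of how the three suggested code points could be carried in a Unicode text stream, the sketch below (Python, chosen purely for illustration; the dictionary and variable names are this example's own, not from the message) confirms the hexadecimal-to-decimal correspondences stated above.

```python
# The three Private Use Area code points suggested in the message,
# with the decimal values quoted there.  Only the code point values
# come from the text; names and structure are this example's own.
SYMBOLS = {
    "Audio Description": 0xF2F0,  # decimal 62192
    "Subtitle":          0xF2F1,  # decimal 62193
    "Signing":           0xF2F2,  # decimal 62194
}

for name, code_point in SYMBOLS.items():
    # chr() yields the PUA character; a receiver renders it
    # meaningfully only if its font carries a matching glyph.
    print(f"U+{code_point:04X} (decimal {code_point}) {name}")

# A programme listing flagging all three services would simply
# include the characters in its text stream:
flags = "".join(chr(cp) for cp in SYMBOLS.values())
```

Because the values lie in the Private Use Area, conformant software that does not recognize them can pass them through or ignore them, which is the behaviour the message relies on.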
Re: Combining diacriticals and Cyrillic
A possibly useful thing to do would be to make a list of those characters which you wish to produce which are not already encoded as precomposed characters in Unicode, sort them into alphabetical order and publish a list of them with code point assignments in the Private Use Area starting at U+EF00. This would mean that fonts could be produced with each of those precomposed glyphs accessible from a Private Use Area code point. Please know that you can use any code points in the Private Use Area which you choose, yet I am suggesting U+EF00 upwards so that the code points would be consistent with my suggested use of the Private Use Area for interactive television broadcasts.

For producing graphics files for the web or for local hardcopy printing it would be possible to use those glyphs directly from the Private Use Area, thereby producing an elegant graphic. As Unicode code point information is not placed in a graphic when lettering is added to a graphic, the result would not show that the Private Use Area had been used.

I have devised a method called a eutocode typography file for use with languages of the Indian subcontinent. It would seem potentially useful for your application as well. http://www.users.globalnet.co.uk/~ngo/ast03300.htm As far as I know the eutocode typography file has not yet been implemented in any software applications; it is primarily a suggestion for the future in relation to interactive television yet may be useful elsewhere. http://www.users.globalnet.co.uk/~ngo/ast0.htm

Software would need to be developed (by you or by other interested people), yet essentially what is needed is software to take an input document and process it according to information in a eutocode typography file. In this way the Private Use Area codes would not be used for interchanging information, yet would be used locally so as to produce an elegant display.
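[Editorial sketch.] The local display substitution just described can be sketched as follows, in Python chosen purely for illustration. The individual base-plus-diacritic to PUA assignments shown are assumptions invented for this example, not a published list; only the idea of assigning from U+EF00 upwards comes from the message.

```python
# Sketch of the local display substitution described above: Cyrillic
# base letter + combining acute sequences are replaced by precomposed
# Private Use Area characters before handing text to a font that
# carries glyphs at those code points.  The interchanged text is
# never altered; only the local display stream uses the PUA.

DISPLAY_MAP = {
    "\u0438\u0301": "\uEF00",  # и + combining acute accent (assumed)
    "\u0430\u0301": "\uEF01",  # а + combining acute accent (assumed)
    "\u043E\u0301": "\uEF02",  # о + combining acute accent (assumed)
}

def to_display_form(text: str) -> str:
    """Return a display-only stream using PUA precomposed characters."""
    for sequence, pua in DISPLAY_MAP.items():
        text = text.replace(sequence, pua)
    return text

# Interchange text stays in base + combining form:
interchange = "\u0432\u043E\u0434\u0430\u0301"  # вода́
display = to_display_form(interchange)
```

Searching, sorting and interchange would still operate on the base-plus-combining form; only the rendering path sees the PUA characters, which is the separation the messages in this thread argue over.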
The best long term solution, in my opinion, would be to send in a proposal to the Unicode Consortium to add the precomposed glyphs into regular Unicode. However this takes time and may not be successful, and a Private Use Area solution does permit progress to be made now.

Please know that my suggestion of publishing a list of Private Use Area code points may be regarded as controversial by some readers of this list and it is possible that you may be advised not to do it by some other readers. However, in my opinion, publication of code points for some uses of the Private Use Area does have some benefits for some applications. In this case it would at least achieve some consistency amongst those font makers who might like to add the precomposed characters into existing fonts. In relation to advanced format fonts, the use of the Private Use Area code point in addition to the encoded access method does have the benefit of allowing access to the glyphs to people who are using a PC which does not have facilities for using the encoded access method of the advanced format font.

William Overington 14 July 2003

-Original Message- From: [EMAIL PROTECTED] [EMAIL PROTECTED] To: [EMAIL PROTECTED] [EMAIL PROTECTED] Date: Thursday, July 10, 2003 10:23 AM Subject: Combining diacriticals and Cyrillic

Dear Ladies and Gentlemen, Currently there is an ongoing effort in Bulgaria trying to resolve an issue concerning the way we write in Bulgarian. Our problem is: usually a Bulgarian regular user does not need to write accented characters. There is one middle-sized exception to this, but generally we do fine without accented characters. The problem is that in some special cases, or for more serious linguistic work, one definitely needs to be able to write accented characters (accented vowels). One of the ideas is to invent a new ASCII-based encoding, containing the accented characters we need.

This would introduce an additional disorder in the current mess of Cyrillic encodings, and would introduce problems with automated spellcheck. Generally I believe it would be best to invent a Unicode based solution. Such a solution is, for example, combining diacritical signs with the Cyrillic symbols. I composed a demo page: http://v.bulport.com/bugs/opera/426/balhaah_lonex_org/ and then made 10-20 shots of the results on Opera and IE on Linux, Windows 98 and Windows XP: http://v.bulport.com/bugs/opera/426/balhaah_lonex_org/shots.html You can see that this approach yields _quite_ inconsistent and useless results, depending on the font, application and operating system being used.

Finally, I wonder if you could give us some advice:
1. Is it possible somehow to improve this approach? I imagine, e.g., if the font can provide prepared combined symbols whenever the application asks for a combined Cyrillic+diacritical, instead of leaving the application to do the combination.
2. Do you see another Unicode based approach to the Bulgarian problem?
3. Do you believe the approach should be looked for outside Unicode?
Please excuse me for wasting your time
Re: Revised N2586R
Michael Everson wrote as follows.

At 08:44 -0700 2003-06-25, Doug Ewell wrote: If it's true that either the UTC or WG2 has formally approved the character, for a future version of Unicode or a future amendment to 10646, then I don't see any reason why font makers can't PRODUCE a font with a glyph for the proposed character at the proposed code point. They just can't DISTRIBUTE the font until the appropriate standard is released.

That's correct.

Well, certainly authority would be needed, yet I am suggesting that where a few characters added into an established block are accepted, which is what is claimed for these characters, there should be a faster route than having to wait for bulk release in Unicode 4.1. If these characters have been accepted, why not formally warrant their use now by having Unicode 4.001 and then Unicode 4.002 when a few more are accepted? These minor additions to the Standard could be produced as characters are accepted and publicised in the Unicode Consortium's webspace. If the characters have not been accepted then they cannot be considered ready to be used, yet if they have been accepted, what is the problem in releasing them so that people who want to get on with using them can do so? Some fontmakers can react to new releases more quickly than can some other fontmakers, so why should progress be slowed down for the benefit of those who cannot add new glyphs into fonts quickly? For example, symbols for audio description, subtitles and signing are needed for broadcasting. Will that need to have years of waiting and using the Private Use Area when it could be a fairly swift process and the characters could be implemented into read-only memories in interactive television sets that much sooner? Why is it that it is regarded by the Unicode Consortium as reasonable that it takes years to get a character through the committees and into use?
Surely where a few characters are needed the Unicode Consortium and ISO need to take a twenty-first century attitude to getting the job done for people's needs rather than having the sort of delays which might have been acceptable in days gone by. The idea of having to use the Private Use Area for a period after the characters have been accepted is just a nonsense. William Overington 26 June 2003
Re: Revised N2586R
Peter Constable wrote as follows.

the name is simply a unique identifier within the std.

Well, the Standard is the authority for what is the meaning of the symbol when found in a file of plain text. So if the symbol is in a plain text file before or after the name of a person then the Standard implies a meaning to the plain text file.

A name may be somewhat indicative of its function, but is not necessarily so.

Well, that could ultimately be an issue before the courts in a libel case if someone publishes a text with a symbol next to someone's name. A key issue might well be as to what is the defined meaning of the symbol in the Standard. Certainly, the issue of what a reasonable person seeing that symbol next to someone's name might conclude is being published about the person might well also be important, even if that meaning is not in the Standard.

You could call it WHEELCHAIR SYMBOL, but that engineering of the standard is not also social engineering, and people may still use it to label individuals in a way that may be violating human rights -- we cannot stop that. No matter what we call it, end users are not very likely going to be aware of the name in the standard; they're just going to look for the shape, and if they find it, they'll use it for whatever purpose they chose to.

Certainly. Yet a plain text interchangeable file would not have the meaning built into it by the Standard. I agree though that there may well still be great problems.

William Overington 26 June 2003
Re: Nightmares
Tom Gewecke wrote as follows.

My personal idea of an Orwellian nightmare would be to have a committee of vigilant freedom protectors evaluating the political and social implications of encoding symbols and passing judgement on whether particular characters should be encoded and what their names should not be.

Yes, I agree that would be terrible. The difference between your personal idea of an Orwellian nightmare and what I am suggesting should take place is great. I am suggesting that everybody, as part of their activity in character encoding, be vigilant that what is encoded does not provide an infrastructure for an Orwellian nightmare to take place with computing systems such as databases. The difference is like that between a country having a special riot police force and having regular police who wear riot gear when the need arises. This distinction was stressed when police in riot gear were first seen on the streets in England, as the television news began by using the term riot police.

So I am not suggesting such a committee, just ordinary regular people who encode characters being vigilant about the political and social implications of what they are doing, lest, by not concerning themselves with such an important aspect of their work, namely the potential for causing misery, the opportunity for such misery to occur is unthinkingly provided or is not prevented when it easily could be prevented. Hopefully this will clarify my thinking to you and hopefully be of interest to people involved in character encoding discussions. One of the great issues of the last century was whether scientists should consider the political and social implications of their work or just work as if somehow separate from society and leave the application of the things which they discovered and developed to politicians and business people.

This issue has arisen because of my concern that a particular symbol has been labelled as HANDICAPPED SIGN.
I hope that the name will be changed to WHEELCHAIR SYMBOL. Yet what if my concerns over the need for vigilance were now dismissed? What characters might be encoded in the future with what names? After all, if no one is willing to be vigilant because that very vigilance is regarded as an Orwellian nightmare, there would then be no constraints. I am very much someone who believes in the need for checks and balances. I feel that we need checks and balances in what is encoded and what names are applied to symbols. I also feel that we need checks and balances as to how those checks and balances are carried out. William Overington 26 June 2003
Re: Revised N2586R
I am rather concerned that the name HANDICAPPED SIGN is being used without any justification or discussion of the name of the character.

The Name Police approved. ;-)

I am rather concerned about the Orwellian nightmare possibilities of this and believe that vigilance is a necessary activity to protect freedom.

Oh, spare us.

Well, it is like the Millennium bug problem. People took it seriously and spent a lot of time and effort in preventing it causing chaos. When nothing happened, a news anchor on British TV in early January 2000 asked an expert in the studio if, as nothing had happened, all the concern had been just a lot of hype. The expert explained that it was only because of the concern and the care taken that nothing had gone wrong on 1 January 2000. In like manner, I feel that it is very important that care be taken now over issues such as the possibility of an Orwellian nightmare. Then, when it does not happen, we might not be sure whether our vigilance prevented it or whether it would not have happened at all, yet nevertheless it will not happen; whereas if we do not bother, who knows what practices might exist with databases in ten or twenty years time.

Likely WHEELCHAIR SYMBOL is a more accurate name.

That is a good suggestion. Perhaps WHEELCHAIR SYMBOL could be used instead of HANDICAPPED SIGN please. A guiding principle for encoding symbols could be that the description applies to the symbol, not to any person whom it might be used to describe in some applications.

There is a DISABILITY SYMBOL http://www.mdx.ac.uk/awards/disable.htm which is different; it's called the TWO TICKS SYMBOL as well.

Where I have seen the two ticks symbol in use is to indicate in brochures and advertisements that an organization claims to take care to treat people who have disabilities in a fair manner, doing what is necessary to help them use facilities or be employed. It is not applied, as far as I know, to individuals who have a disability.
An Orwellian nightmare scenario of just encoding the symbols and leaving it to people who use Unicode as to how they use the symbols is not attractive.

Rein in those hares, William, please.

Well, I realize that what I say may, at first glance, possibly appear extreme at times, yet please do consider what I write in an objective manner. If Unicode has a WHEELCHAIR SYMBOL then that is a symbol; if Unicode encodes a HANDICAPPED SIGN then that is a description of someone to whom it is applied, a Boolean sign for all, whatever the disability may be, whether it is relevant to the matter in hand or not. I do wonder whether the encoding of the symbol as HANDICAPPED SIGN would be consistent with human rights, as it would be assisting automated decision making with a Boolean flag and providing an infrastructure for such practices. However, hopefully those of you who have the power to vote on these matters will act to change the name from HANDICAPPED SIGN so as to take account of these concerns. For me, WHEELCHAIR SYMBOL seems fine as the name simply describes the symbol. However, it may be that other people might have other views on the name.

William Overington 25 June 2003
Re: Revised N2586R
Michael Everson wrote as follows.

I do the best I can. At the end of the day my document won its case and the five characters were accepted.

This raises an interesting matter. In that the document proposes U+2693 for FLEUR-DE-LIS it would seem not unreasonable for fontmakers now to be able to produce fonts having a FLEUR-DE-LIS glyph at U+2693. However, what is the correct approach? Is it that the characters must remain either unimplemented or else implemented as Private Use Area characters until Unicode 4.1 or whatever is published, notwithstanding that the hardcopy Unicode 4.0 book is not yet available? That will probably take quite some time.

It appears to me that there should be some system devised so that when a few extra symbols are accepted into an already established area, those characters can be implemented in a proper manner much more quickly than at present. However, such speeding up of the process might not always be a benefit. For example, the proposed U+267F, which has in the document the name HANDICAPPED SIGN, could, if there were a fast track process, be all the more quickly incorporated into databases as a way for officials to make automated decisions about people much more conveniently without considering the individual circumstances of each person so tagged.

I am rather concerned that the name HANDICAPPED SIGN is being used without any justification or discussion of the name of the character. The character has now been accepted, it appears. I am rather concerned about the Orwellian nightmare possibilities of this and believe that vigilance is a necessary activity to protect freedom. Just think, data about someone can be expressed with one character which can be sent around the world to be stored in a database which is not necessarily in a jurisdiction which has laws about data protection. Automated decision making is a matter covered by United Kingdom data protection law, yet does the law have any effect in practice? 
For example, some credit card application documents now have in the small print items about the applicant agreeing to accept automated decisions. And also, does every user of computer equipment obey the law? I gather that in the United States there is a concept of a Social Security number and that it has now become the widespread practice that people who have nothing to do with the administration of social security now routinely ask (and maybe even require) someone to state his or her social security number before they can do anything. I wonder what is the effect of saying that the number is for social security purposes and one is not willing to state what it is. Perhaps even questioning why that information is needed will go against one.

The issue of the name for what Michael has named as HANDICAPPED SIGN needs, in my opinion, some discussion. If that discussion widens into the purposes for which Unicode could or should be used, and whether the political and social implications of encoding symbols are something of which people should be aware, then fine.

For example, would DISABILITY LOGO be a better name? I have seen the logo used in signs in shops with the message Happy to help referring to help for people with any disability where help is wanted, not just for people in wheelchairs. So having the logo in fonts so that such signs could be printed might well be helpful. Yet I feel that some discussion about the implications of encoding this logo needs to take place, particularly as the N2586R document suggests as seemingly obvious the potential for use in databases. For example, could the sign be made so as not to be interchanged? Is it best not to encode it in Unicode at all as being too dangerous in some of its potential applications? If this symbol is implemented without some protection for rights, could there be a basis for compensation by someone disadvantaged by the use of such a symbol in a database? 
An Orwellian nightmare scenario of just encoding the symbols and leaving it to people who use Unicode as to how they use the symbols is not attractive. William Overington 24 June 2003
Re: Address of ISO 3166 mailing list
Tex Texin wrote as follows. Marion, It is very easy to start your own list at http://www.yahoogroups.com You can create lists for 3166, as well as for hiberno-english etc. Other Unicode folks have created specialized lists for their own purposes. A feature of Yahoo groups is the Yahoo rules about intellectual property rights regarding postings and also the indemnity rules. As regards intellectual property rights, if someone posts then if later he or she wishes to publish a book and the publisher asks if any person or company owns any intellectual property rights in relation to the material in the book, then the answer might properly be, yes, Yahoo. That then may mean that exclusive rights cannot be assigned to a publisher and then the publisher cannot make a claim against anyone for infringement of copyright because the publisher does not have exclusive rights. I am not a lawyer, yet I do urge caution as to what intellectual property rights problems may be caused if one posts in a Yahoo group, which do not occur if one posts in this forum. There is also the indemnity rule. It appears that if someone posts in a Yahoo group and someone somewhere claims against Yahoo, then the poster and maybe the person who started the group are liable to Yahoo for expenses, including lawyers fees. There appears to be a danger that if someone made even a wild, spurious claim in a court and Yahoo needed nevertheless to defend it lest it win by not being answered, then the person who starts the Yahoo group could be liable for the cost of Yahoo's lawyers. William Overington 5 June 2003
Re: Rare extinct latin letters
Peter Constable wrote as follows.

William Overington wrote on 06/02/2003 01:06:25 AM: I am wondering whether the range from U+F200 through to U+F2FF is being used by anyone for anything.

This is a nonsense question. It should never matter to person A whether others are using particular PUA codepoints *unless* person A needs to interchange with person B, in which case A and B need to agree on that range if A intends to use it in interchanging with B.

Suppose person Ai and person Bi are both people with an interest in the texts which contain these particular rare extinct ligatures and wish to exchange documents, which they have keyed themselves, over the internet and view them using a package such as, say, Microsoft WordPad. I use Ai and Bi to mean some particular pair of persons A and B. You wrote never, so one counter example will disprove the generality of your claim.

Neither Ai nor Bi has facilities to make fonts, so they need to rely on having a font made by a third party. They have a better chance of having a font to use if the characters are added into an existing font which already has many other characters in it, such as the basic latin alphabet and punctuation, so that only the rare extinct latin letters represent special drawing work, rather than the whole font. So, if they look at fonts such as, for example, Code2000, Gentium and Junicode and observe which Private Use Area code points are already in use within each font, then choose code points for the rare extinct latin letters which are not used in the fonts at which they look, then the chances of getting their chosen characters implemented in those fonts will be increased.

For example, consider my own Quest text font. If Ai and Bi choose to place their characters in the U+E7.. block or the U+EB.. block, then I would not implement them in Quest text. However, if they place them in the U+F2.. block, then I might well try to have a go at adding them in. 
I recognize that the lettering style of Quest text might not be appropriate to those characters and Quest text might not be liked as a display face by Ai and Bi, yet please allow me some latitude in this as I am trying to explain my thoughts without speculating about the thoughts of some other person who produces a font which might have a face design considered more appropriate to the particular application.

So, bearing in mind my knowledge of some uses of the Private Use Area, I thought that the U+F2.. block looked prima facie reasonable, in that it avoids code points used for Tengwar, for Phaistos Disc, for Ewellic, for golden ligatures and courtyard codes, while also avoiding the very top end of the Private Use Area. So, instead of simply sending a private email response I posted to the mailing list in the hope that the readers of this forum might like to help along the process of enabling the gentleman to use, in a practical manner, those rare extinct latin letters which interest him.

Your question seems to be assuming the community of Unicode users at large can share agreements on PUA assignments,

Well, surely they can if they choose to do so. Please note that I am not saying should, must, will or whatever: you used the word can and I answer about can.

and in response I'd say that effectively you must assume that every last PUA codepoint is being used by somebody somewhere.

I accept that that assumption needs to be made in generalized theoretical considerations, yet in a practical situation of trying to get a few special characters added into one or more existing fonts, it is highly relevant to know which code points are already in use and which are not in a selection of fonts, as that information can then be used to devise a Private Use Area encoding scheme for the desired characters which has a higher chance of being implemented.

(And I can assure you that somebody has their own use for F200..F2FF.) 
Well, unless it is a secret or confidential it would be helpful if you could please say what it is, as that information could be used to consider whether a font containing both collections of characters would be likely to be needed for one particular document produced by an end user. William Overington 2 June 2003
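The clash-avoidance approach discussed in this thread can be sketched in a few lines of code. The per-font ranges below are hypothetical placeholders standing in for allocations one might find documented for particular fonts; they are not the actual Private Use Area allocations of Code2000, Gentium, Junicode or Quest text.

```python
# Sketch: check whether a candidate Private Use Area block clashes with
# code points already used by a selection of fonts. The ranges here are
# hypothetical placeholders, not the real allocations of any font.

# PUA ranges (inclusive) hypothetically in use by various fonts.
used_ranges = {
    "FontA": [(0xE000, 0xE0FF), (0xE700, 0xE7FF)],
    "FontB": [(0xEB00, 0xEBFF)],
}

def used_code_points(ranges_by_font):
    """Flatten the per-font ranges into one set of used code points."""
    used = set()
    for ranges in ranges_by_font.values():
        for lo, hi in ranges:
            used.update(range(lo, hi + 1))
    return used

def block_is_free(lo, hi, used):
    """True if no code point in lo..hi is already in use."""
    return all(cp not in used for cp in range(lo, hi + 1))

used = used_code_points(used_ranges)
print(block_is_free(0xF200, 0xF2FF, used))  # the candidate block is free
print(block_is_free(0xE700, 0xE7FF, used))  # clashes with FontA
```

The same survey, done against the actual cmap tables of real fonts, is what the discussion above proposes informally.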
Re: Rare extinct latin letters
Patrick Andries wrote as follows.

[PA] I believe the need of an encoding may be pragmatically ascertained, I don't know about the « real linguistic value » of an alphabet. I have, by the way, no problem if someone says: « Sorry, too idiosyncratic and eccentric! Use the private use area if you need such characters. »

This may well be the case. I suggest that a good idea would be for you to produce a list of which characters you would like, encode them as a Private Use Area encoding and publish the list. That would bring the possibility of being able to use the characters in a Unicode compatible environment one step closer. If they are one day promoted to regular Unicode then fine; otherwise there would nevertheless be a consistent encoding available for anyone who chooses to use it, which would help in interoperability.

If you choose to encode them in the Private Use Area, it is entirely up to you which code points you specify within the range U+E000 through to U+F8FF. However, you might like to take into account the code ranges already being used by various fonts which use the Private Use Area, as avoiding a clash might increase the chances of the characters becoming added into established fonts such as Code2000, Gentium and Junicode, as well as being added into fonts designed specifically for older French texts. If a set of code point allocations is widely available, then the chances for implementation itself and implementation in an interoperable manner are increased.

I am wondering whether the range from U+F200 through to U+F2FF is being used by anyone for anything. So perhaps, if you choose to encode the rare extinct latin letters in the Private Use Area, anyone who reads this and knows whether U+F200 through to U+F2FF is being used by anyone for anything might draw attention to the fact in this forum please. William Overington 2 June 2003
Re: default ignorable posts (was Re: Is it true that Unicode is insufficient for Oriental languages?)
Peter Constable wrote as follows.

Moreover, a while back, I took a look at the forum in which DVB-MHP is being discussed to see how people there responded to your ideas, and discovered that nobody there was interested (as indicated by lack of any response to your posts). If it's not worth discussing in that place, where it is centrally on topic, it's not worth discussing here.

A lack of response to a post is not in any way any indication of lack of interest. It might perhaps be that nobody was interested, yet a lack [sic] of any response is no measure of interest or otherwise. If people simply agreed, or thought it interesting and something to possibly bear in mind for the future, then there would be no need to reply.

Part of the process of the publication option of getting an invention implemented is to place the information before people so that as many of one's ideas as possible are there when the idea gets taken up. Once it is taken up, various people may start adding items as they are needed: the more that the inventor has published and placed before people before taking-up takes place, the more of the inventor's ideas are likely to be in the implemented system. So publishing the details is important. For example, it might be that my list of Private Use Area code point allocations for multimedia programmed learning authorship within Unicode text files might be printed out and filed by industrial librarians.

Although Private Use Area code point allocations have no standing in relation to the Unicode Standard, there is no reason why they should not be used consistently and widely within a specialist domain, such as, for example, digital interactive broadcasting. Indeed, Private Use Area code points could be widely used for some activities such as multimedia authoring generally. 
I feel that it needs to be pointed out that many people are not allowed to post in public forums or to comment publicly on technical matters and ideas which relate to their employment, so lack [sic] of response to my ideas is no indication of any lack of interest. However, it might indeed be that there is no interest in my code point allocations, yet that is the chance which I, as an inventor, need to take when trying to follow the publication option to get an invention implemented. It worked for my telesoftware invention however, as that invention is now at the centre of digital interactive television systems and the word telesoftware is in the Oxford English Dictionary. William Overington 28 May 2003
Re: Ancient Greek
Chris Hopkins wrote as follows. quote I am a new list member interested in implementing archaic, classical and Hellenistic Greek glyphs in a Unicode font. My initial questions will be focused on handling multiple alternate glyphs for each character, and how to organize a font with several thousand Hellenistic monograms. Is this the appropriate discussion list? If not, I'd appreciate a pointer. end quote This looks an interesting discussion and I hope that you will ask your questions in this forum. The matter of multiple alternate glyphs for each character seems at first a font issue, and it is partly a font issue, yet it is also a Unicode issue once one starts trying to encode a document which is intended to apply those glyphs in some controlled selection manner. For example, are you going to have some texts such as Author A uses the symbol X for beta whereas author B uses the symbol Y for beta. where X and Y are just two of the multiple alternate glyphs which you mentioned? What please is a Hellenistic monogram? I am wondering whether this is going to be a good application of the Private Use Area, either on a permanent basis or on a temporary basis pending making a formal encoding application. In either case, reading about the Private Use Area in Chapter 13 of the Unicode specification available from the http://www.unicode.org webspace may prove interesting. William Overington 4 April 2003
Re: Exciting new software release!
Doug Ewell wrote as follows.

quote What happened to LTag? Well, as everybody knows, the Unicode Technical Committee strongly discourages the usage of these tags, to the point where they were almost deprecated earlier this year. They are permitted only in special protocols, and are certainly frowned upon for use in arbitrary plain text, which is what LTag was for. So, in an attempt to restore some of my lost Unicode street cred I removed LTag from my site. I still keep the program around, but only as a reference to ISO 639 and 3166 codes. end quote

Well, whether the tags were (italics) almost (end italics) deprecated earlier this year I do not know, yet the fact is that, after a lengthy and extended Public Review process as to whether to deprecate them, the tags were not deprecated: the situation was left broadly unaltered, with some additional notes to be included in the Unicode 4.0 document. It remains to observe what is to be put about tags in the Unicode 4.0 book. Whether tags will be used in interactive broadcasting as a feature used in (italics) some (end italics) content, such as with (italics) some (end italics) generic file handling packages for distance education, remains for the future, yet the option remains open. William Overington 4 April 2003
Re: Exciting new software release!
Doug Ewell wrote as follows. I'll mail it, or maybe repost it, after I finish applying a nice, THICK coating. I'm thinking about one of those expired-shareware message boxes where the OK button is disabled for the first five seconds. But I'd like to get this third-subtag question resolved first. Could you possibly consider making the checking facility a checkbox option please, which comes up already checked, so that explicit unchecking needs to be done in order not to have the checking. I am not thinking of going against recognized standards but always having checking might end up causing problems as time goes on. William Overington 4 April 2003
Re: Exciting new software release!
Stefan Persson wrote as follows.

quote Well, let's say that I make a plain text document and include a mathematical formula or function such as cos x, it would still be legal to use an italic x from the mathematical block, wouldn't it? This is what those characters are intended for, right? end quote

In the days of letterpress printing, something such as y = cos x would have been set with the cos in roman type, probably from an ordinary serifed font, as might be used for ordinary book printing, and the y and the x in the italic version of the same typeface. I remember that the typeface Modern Roman, a serifed face with an upward hook on the end of a capital R character and a very open lowercase e character, was often used, though not exclusively. How should that be set in Unicode plain text? Is it to use the letters for cos from the range U+0020 to U+007E and then use U+1D466 for the y and U+1D465 for the x?

I note that U+1D465 MATHEMATICAL ITALIC SMALL X in the code chart has the following text accompanying the definition, following a symbol which looks like a wavy equals sign, with the word font within angled brackets which I will not place in this email in case it upsets any email systems, so I will herein use parentheses.

(font) 0078 x latin small letter x

Yet there would seem to be missing the concept that the character is an italic of a serifed font.

When trying the MathText program I attempted, as I mentioned before, to get MathText to produce Greek characters. This was mainly out of curiosity, having been studying, as part of the process of studying MathText, the U1D400.pdf code chart document, rather than any immediate need, though with the thought that such a facility might be useful sometime and that, should such a situation arise, I could perhaps use MathText to generate the codes. Yet which Greek characters would I wish to use? Subsequent study of the U1D400.pdf document raises an interesting matter. 
I would probably want to use some of those in the range U+1D6FC MATHEMATICAL ITALIC SMALL ALPHA through to U+1D71B MATHEMATICAL ITALIC PI SYMBOL. However, whereas I might well want to use U+1D6FC, the U+1D71B is a symbol which I have not seen before and indeed wonder what it is, bearing in mind the existence of U+1D70B MATHEMATICAL ITALIC SMALL PI.

Yet the interesting point which has arisen is this. The most common use of such italic letters, judging from my own potential usage, would seem to be for the angles theta, phi and psi when expressing rotation angles.

U+1D713 MATHEMATICAL ITALIC SMALL PSI for psi.

U+1D703 MATHEMATICAL ITALIC SMALL THETA for theta, rather than using U+1D717 MATHEMATICAL ITALIC THETA SYMBOL.

U+1D719 MATHEMATICAL ITALIC PHI SYMBOL for phi, rather than using U+1D711 MATHEMATICAL ITALIC SMALL PHI.

I seem to remember a discussion in this group about the two versions of phi in relation to ordinary Greek characters some time ago. William Overington 4 April 2003
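As a sketch of the mapping being discussed: the Mathematical Italic letters sit at fixed offsets from U+1D434 (MATHEMATICAL ITALIC CAPITAL A) and U+1D44E (MATHEMATICAL ITALIC SMALL A), with the one irregularity that the italic small h is the pre-existing U+210E PLANCK CONSTANT, the slot U+1D455 being a reserved hole in the block. A minimal converter, leaving "cos" in ordinary letters as the letterpress convention above suggests:

```python
# Sketch: map ASCII letters to the Mathematical Italic letters of the
# Mathematical Alphanumeric Symbols block, as one might when setting
# "y = cos x" with roman "cos" and italic variables.

ITALIC_SMALL_A = 0x1D44E    # MATHEMATICAL ITALIC SMALL A
ITALIC_CAPITAL_A = 0x1D434  # MATHEMATICAL ITALIC CAPITAL A

def math_italic(ch):
    """Return the mathematical italic form of an ASCII letter."""
    if ch == "h":
        # U+1D455 is a reserved hole; the italic h is the pre-existing
        # U+210E PLANCK CONSTANT.
        return "\u210E"
    if "a" <= ch <= "z":
        return chr(ITALIC_SMALL_A + ord(ch) - ord("a"))
    if "A" <= ch <= "Z":
        return chr(ITALIC_CAPITAL_A + ord(ch) - ord("A"))
    return ch  # leave non-letters (spaces, "=", digits) unchanged

# "cos" stays roman; only the variables become italic.
formula = math_italic("y") + " = cos " + math_italic("x")
print(" ".join(f"U+{ord(c):04X}" for c in formula))
```

This is essentially what a program such as MathText does for its Bold, Italic and Fraktur styles, style by style.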
Re: Exciting new software release!
It certainly is exciting! I learn a lot from your fun Doug. I remember when we had The Respectfully Experiment and I asked you how you managed to get the U+E707 character into your message and you mentioned the SC UniPad program from the http://www.unipad.org webspace. That program is very useful for various purposes, I have used it in relation to preparing text with colour codes for research about broadcasting and indeed I have been using it to analyze the output from using your MathText program. Some information about the colour code experiments, and a link to a font with which one can experiment, are in the following web page. http://www.users.globalnet.co.uk/~ngo/font7001.htm I used a file, produced using Notepad, named mathin.txt with the following text. This is a test. I processed this file through MathText using the Fraktur style using mathout.txt as the output file. I then used File | Open in SC UniPad to open the file mathout.txt as a UTF-8 file. There was the display in Fraktur letters. Wow! So, I then did an Edit | Select All on the Fraktur text, followed by an Edit | Convert | Unicode to UCN. This gave a stream of ordinary text in \u and \U format, each \u sequence having four hexadecimal characters after the \u and each \U sequence having eight hexadecimal characters after the \U. Wow again! I did not realize that SC UniPad would do such a conversion! These tests were carried out on a PC running Windows 98. I am now wondering whether I can convert the text into surrogate pairs so that I can both read the \u sequences for the surrogate pairs in SC UniPad and so that I can copy the surrogate characters themselves onto the clipboard for pasting into the text box of a Java applet. Have you considered the possibility of a similar program to encode a string of ASCII characters as plane 14 tags please, with an option checkbox to include the U+E0001 character at the start and an option checkbox to include a U+E007F character? 
That would be a very useful program which could be used in conjunction with SC UniPad to marshal plain text which uses language tags. Such a program would be a very useful tool to have available for access level content production, for producing content for free-to-the-end-user distance education for broadcasting around the world upon the DVB-MHP platform for interactive television.

Recently I was thinking about the possibility of defining a few Private Use Area characters in one or both of planes 15 and 16, so as to try to gain experience of applying those Private Use Areas up in the mountains for use if and when such use becomes desirable. I am thinking of the long term possibility of a music font being defined there as one possible application. However, for the moment, something more general, such as a few symbols for vegetables, just to gain experience of what is involved. For example, how would one produce a display (not necessarily a web page display) of the text of the following song together with a few graphics of vegetables if the whole document were encoded as plain text with the illustrations of the vegetables encoded as Private Use Area characters from plane 15 or plane 16?

http://www.users.globalnet.co.uk/~ngo/song1015.htm

As a direct consequence of using SC UniPad with characters from beyond plane 0 as a result of your posting, I have found that the CTRL Q facility of SC UniPad may be used to enter five and six character hexadecimal sequences which are within the Unicode code space and that such characters may then be converted to the \U and eight hexadecimal characters format.

Looking at the U1D400.pdf document for which you provided a link in your document about the program, and considering the MathText dialogue box, I am wondering whether one can start with an ASCII file produced with Notepad and use MathText to reach the various mathematical Greek characters shown in the U1D400.pdf document. Is that possible at present? 
I tried with an Alt 130 and an Alt 225 in the .txt file following A and B and before C and D and requested Bold of MathText just to see what happened, but only the A and B came out. Thank you for posting details of an interesting program which is a catalyst for interest in applying the higher planes of Unicode. William Overington 3 April 2003
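A program of the kind requested above is straightforward to sketch, assuming the Plane 14 tag scheme as then specified: U+E0001 LANGUAGE TAG opens a tag, each printable ASCII character is shifted up by 0x0E0000 into the tag-character range, and U+E007F CANCEL TAG closes it. A second helper renders the result as \u escapes with UTF-16 surrogate pairs, in the style of the SC UniPad Unicode-to-UCN conversion mentioned above. (The tag characters were later deprecated for language tagging in subsequent versions of Unicode.)

```python
# Sketch: encode an ASCII string as Plane 14 language-tag characters,
# with optional U+E0001 LANGUAGE TAG prefix and U+E007F CANCEL TAG
# suffix, then render the result as \u escapes using surrogate pairs.

def tag_encode(text, begin=True, cancel=True):
    """Shift printable ASCII text into the Plane 14 tag-character range."""
    out = "\U000E0001" if begin else ""
    out += "".join(chr(0x0E0000 + ord(c)) for c in text)
    if cancel:
        out += "\U000E007F"
    return out

def to_ucn(text):
    """Write text as \\uXXXX escapes, using UTF-16 surrogate pairs for
    characters beyond the Basic Multilingual Plane."""
    pieces = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            pieces.append(f"\\u{0xD800 + (cp >> 10):04x}")  # high surrogate
            pieces.append(f"\\u{0xDC00 + (cp & 0x3FF):04x}")  # low surrogate
        else:
            pieces.append(f"\\u{cp:04x}")
    return "".join(pieces)

print(to_ucn(tag_encode("en")))
```

The option checkboxes for U+E0001 and U+E007F in the requested program correspond to the begin and cancel parameters of the sketch.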
Displaying languages of the Indian subcontinent upon the DVB-MHP platform.
John Clews wrote as follows.

quote In fairness, you ought to take account of the fact that languages of the Indian subcontinent have been displayed on TV systems in India for nearly ten years, based around ISCII. Is there a reason for doing anything different for DVB-MHP? Or are mappings and similar accounted for in your paper? end quote

Many thanks for your note. The whole text of my document is in the posting. I know very little about Indian languages. It is just that I know that DVB-MHP uses Java and Unicode and that the built-in font for the minimum DVB-MHP television set is oriented very much toward European languages. Please see Annex E of the DVB-MHP specification, available from the http://www.mhp.org webspace.

I have suggested in another document in the DigitalTV forum in the http://www.cenelec.org webspace some additional characters which I feel would be good additions to the built-in font for European Union interactive television. This is entirely in compliance with the DVB-MHP specification, as the specification provides many options for local implementation and specifies a minimum implementation. Within the European Union there will potentially be a local implementation, though covering the whole of the European Union, so suggesting some extra characters to be in the built-in font of all such televisions is just part of the process of deciding which options to include in the local implementation for the European Union.

I am simply trying to point out that using languages of the Indian subcontinent upon the DVB-MHP system with its PFR0 font system may cause problems, in the hope that experts will look at the problem soon, as there at present seems to be little (or maybe no) intersection between the set of people who know about DVB-MHP and the set of people who know about languages of the Indian subcontinent expressed in Unicode. 
I am concerned that if nothing is done there will be lots of interoperability problems in a few years' time, whereas looking at the problem now could save a lot of problems later. The possibilities for using the DVB-MHP system for education around the world are enormous. The Indian subcontinent is one major potential area of such use. I am concerned to try to ensure that there is a good infrastructure in place so that the languages of the Indian subcontinent may be used in a Unicode manner upon the DVB-MHP platform in a straightforward manner.

On the specific matters which you mention, I had no knowledge of the fact that languages of the Indian subcontinent have been displayed on TV systems in India for nearly ten years, based around ISCII. That is interesting to know. Is that on a teletext system or what? However, as far as I know ISCII is an 8-bit encoding system (and I do mean as far as I know because I am not certain of that) whereas DVB-MHP uses Unicode. So not including mention of ISCII in my document is no problem as far as I know. Mappings from ISCII to Unicode are not mentioned in my document.

I started from considering that someone had encoded some text written in a language of the Indian subcontinent into Unicode and that it was in a text file ready for broadcasting, and considered the process of getting the text displayed upon the screen of a DVB-MHP television. I then pointed out what, to me, seems a problem which presently exists, in that a PFR0 font, as far as I can tell, is not a smart font format. I am hoping that, by having published the paper in the DigitalTV forum in the http://www.cenelec.org webspace, the matter may be resolved within the context of the setting of content authoring guidelines for interactive television which are to be produced for the European Union. Certainly, I cannot resolve the matter myself as I do not have the linguistic knowledge necessary to do so. 
I suggest using glyphs mapped to the Private Use Area from U+EC00 for this specific application of Unicode upon the DVB-MHP platform, though not broadcasting using those code points in relation to languages of the Indian subcontinent, just using them locally for font access after they are generated using a eutocode typography file. I have since been wondering what is the position of displaying Arabic text using a PFR0 font upon the DVB-MHP platform. Does a similar problem exist? Would the set of Arabic presentation forms encoded into Unicode be sufficient for the task, so that a Private Use Area encoding would not be necessary? The right to left display of Arabic text is another factor which needs consideration in relation to the DVB-MHP system. Thank you for your interest. William Overington 3 April 2003
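The local conversion step proposed here, rewriting an incoming Unicode text stream into glyph code points at U+EC00 onward before the PFR0 font is consulted, can be sketched as a longest-match substitution pass. The rule table below is a toy with hypothetical glyph assignments (a real table would come from the standardized glyph list the document calls for); the example rule maps the Devanagari sequence KA + VIRAMA + SSA to a single conjunct glyph.

```python
# Sketch of the local (within-the-receiver) conversion described above:
# rewrite an incoming Unicode text stream into glyph code points at
# U+EC00 onward before the font is consulted. The rule table is a toy;
# the glyph assignments are hypothetical, not a standardized list.

# Longest-match rules: Unicode character sequence -> PUA glyph slot.
rules = {
    "\u0915\u094D\u0937": "\uEC00",  # KA + VIRAMA + SSA -> conjunct glyph
    "\u0915": "\uEC01",              # KA alone -> plain ka glyph
}

def to_glyph_stream(text, rules):
    """Rewrite text by longest-match substitution from the rule table."""
    max_len = max(len(k) for k in rules)
    out = []
    i = 0
    while i < len(text):
        for n in range(max_len, 0, -1):   # try the longest match first
            chunk = text[i:i + n]
            if chunk in rules:
                out.append(rules[chunk])
                i += n
                break
        else:
            out.append(text[i])           # no rule: pass through unchanged
            i += 1
    return "".join(out)

print(hex(ord(to_glyph_stream("\u0915\u094D\u0937", rules))))
```

Real Indic shaping needs reordering as well as substitution, so this pass is only the simplest part of what a eutocode typography file would have to express.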
Re: Exciting new software release!
In the interests of some fun research in the hope that the fun will lead to learning in some serendipitous manner I am starting off some Private Use Area codes for vegetables.

U+10F700 POTATO
U+10F701 CARROT
U+10F702 PARSNIP
U+10F703 PEA
U+10F740 PEAS IN A POD
U+10F780 LEAF OF MINT
U+10F781 LEAF OF SAGE

These should be enough to get started in experimenting with the way that Private Use Area characters from plane 16 can be applied and finding out what the problems are in relation to any particular platforms, file formats and font technologies. William Overington 3 April 2003
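One concrete thing such experiments would reveal is how plane 16 characters travel through the encoding forms: in UTF-8 they occupy four bytes and in UTF-16 a surrogate pair, which some software of the period handled badly. A quick check using the hypothetical U+10F700 POTATO allocation above:

```python
# Check how a plane 16 Private Use Area character, such as the
# hypothetical U+10F700 POTATO above, is carried by the Unicode
# encoding forms: four bytes in UTF-8, a surrogate pair in UTF-16.

potato = chr(0x10F700)

utf8 = potato.encode("utf-8")
utf16 = potato.encode("utf-16-be")

print(utf8.hex())   # four bytes
print(utf16.hex())  # two 16-bit units: high then low surrogate

# The surrogate pair can also be computed by hand.
cp = 0x10F700 - 0x10000
high = 0xD800 + (cp >> 10)
low = 0xDC00 + (cp & 0x3FF)
print(hex(high), hex(low))
```

Any platform, file format or font technology that mishandles either the four-byte UTF-8 sequence or the surrogate pair will fail on these plane 16 test characters.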
Displaying languages of the Indian subcontinent upon the DVB-MHP platform.
I have now completed and published my document on the topic of displaying languages of the Indian subcontinent upon the DVB-MHP platform. DVB-MHP stands for Digital Video Broadcasting - Multimedia Home Platform. Details of the DVB-MHP system are available from the http://www.mhp.org webspace. There is also the http://forum.mhp.org webspace which may be joined online. DVB-MHP is likely to become the common interactive television standard throughout much of the world. However, the standard provides many options, defining a minimum system and leaving open many options for implementation decisions.

The document which I have recently completed is published in the DigitalTV forum in the http://www.cenelec.org webspace. This forum is a specialist forum regarding the implementation of interactive television, using the DVB-MHP system, within the European Union, involving such issues as interoperability. Readers interested in joining this forum may like to know that an email address for making application is [EMAIL PROTECTED] and that there is also at present a notification about the purpose of the forum available using a link in the http://www.cenelec.org webspace. The fact of membership is visible to other participants. Although not mandatory, it is quite likely that what is decided for use within the European Union will be used in many countries which are not within the European Union.

The file has the following name, in accordance with the file naming conventions of the forum.

DigitalTV_WJGO0005_Languages_of_the_Indian_subcontinent.txt

A transcript of the text of the document is below. William Overington 2 April 2003

Displaying languages of the Indian subcontinent upon the DVB-MHP platform.

I wonder if I may please draw your attention to a potential problem with the displaying of the languages of the Indian subcontinent upon the screens of DVB-MHP interactive televisions. The DVB-MHP system uses Unicode. The DVB-MHP system also uses a Portable Font Resource PFR0 font. 
I am not a linguist so I am simply mentioning the following document. http://www.unicode.org/book/ch09.pdf It is Chapter 9 of The Online Edition of The Unicode Standard, Version 3.0, the chapter being entitled South and Southeast Asian Scripts. It appears, as far as I can tell, that a PFR0 font cannot display the languages of the Indian subcontinent directly from a sequence of Unicode characters. The Online Edition of The Unicode Standard can be downloaded from the following web page, chapter by chapter. http://www.unicode.org/book/u2.html The main index page of the Unicode web site is as follows. http://www.unicode.org I have thought out what I consider to be a way to solve the problem of displaying the languages of the Indian subcontinent using software within a Java program running upon the DVB-MHP platform. The method is described in the following document. http://www.users.globalnet.co.uk/~ngo/ast03300.htm The method uses what I have called a eutocode typography file. However, in order for the method to be highly effective for the DVB-MHP platform at an interoperability level, what is really needed is a standardized list of glyphs for displaying the languages of the Indian subcontinent so that those glyphs may be mapped to U+EC00 onwards of the Private Use Area of Unicode. This would not be essential, yet is, I feel, highly desirable, because if such a list can be produced and the same list used by all content authors who produce content for broadcasting upon the DVB-MHP platform using languages of the Indian subcontinent, then lots of repeated work can be avoided in the future and there will be advantages for interoperability of font generation. For the avoidance of doubt, please know that I am not suggesting that those Private Use Area code points be used for broadcasting text. Text would be broadcast using regular Unicode code points. 
The reason for assigning the glyphs to code points is so that the incoming text stream can be converted into a local, within the television set, text stream which can be used to access the PFR0 font so as to enable the correct glyphs to be displayed upon the screen. Study of the document above will show that that particular choice of Private Use Area code points could also protect against any broadcasting of the languages of the Indian subcontinent using those Private Use Area codes to access the glyphs directly as those code points when broadcast could be regarded as data for a vector graphics system which could be used for drawing illustrations within a document. The vector graphics data does not need to access the font, so the locations in the font can be used for this purpose on a local, within the television set basis. If such a list can be produced within the context of the setting of content authoring guidelines which are to be produced for the European Union, with appropriate liaison with the government of India, then the task can be carried out once within
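The local conversion step described above — incoming broadcast text in regular Unicode rewritten, within the television set, into a stream of glyph codes from U+EC00 onwards which index the PFR0 font — could be sketched as follows. This is a minimal sketch only: the table entries are hypothetical placeholders, not a real standardized glyph list, and a real implementation on the DVB-MHP platform would be written in Java.

```python
# Hypothetical mapping from broadcast Unicode sequences to local glyph
# codes starting at U+EC00.  The two entries here are illustrative only:
# a Devanagari KA + VIRAMA + SSA conjunct, and plain KA.
GLYPH_TABLE = {
    "\u0915\u094D\u0937": "\uEC00",  # KA + VIRAMA + SSA -> conjunct glyph
    "\u0915": "\uEC01",              # KA alone -> plain KA glyph
}

def to_local_glyph_stream(text: str) -> str:
    """Longest-match rewrite of broadcast text into local glyph codes.

    The result is used only locally, to index the PFR0 font; the
    broadcast text itself stays in regular Unicode code points.
    """
    out = []
    i = 0
    keys = sorted(GLYPH_TABLE, key=len, reverse=True)  # longest match first
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(GLYPH_TABLE[k])
                i += len(k)
                break
        else:
            out.append(text[i])  # pass through anything not in the table
            i += 1
    return "".join(out)
```

The longest-match rule matters: the conjunct sequence must be tried before its first consonant alone, otherwise the conjunct glyph would never be selected.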
Re: Characters for Cakchiquel
Actually, I was rather hoping that the start of a Private Use Area encoding might be produced by a few interested people fairly quickly, perhaps in this thread or in some email correspondence. Once that is done, then font support could gradually be produced. William Overington 29 March 2003
Re: Characters for Cakchiquel
Phil Blair wrote as follows. quote 2. The Jesuits and other missionaries of the Age of Exploration worked and published intensively in then-exotic languages on four continents. There are scholars and groups of scholars now attempting to look systematically at that body of work. I suspect that there is no strange character that could turn up in a Maya text from that period that wouldn't also turn up in texts about South American, Asian, or African languages, and when we do deal with these characters it would be best to do it in a systematic and comprehensive way. They will all reflect a common origin in the missionary training institutions of Europe. end quote That research sounds fascinating. Do you have any details of who is doing the research please? I am not a linguist yet do have a great interest in the typographical aspects of the way special characters were printed by the early printers. I also have an interest in history so such a project would be doubly interesting for me. I suggest that a good idea would be if those of us who are interested could research the typography and printing aspects and that a Private Use Area encoding could be made of the special characters. Then various craft fontmakers might all use the same encoding and start to produce fonts which contain the characters. For example, as a first suggestion, if U+E400 and upwards were used for that purpose, would that be a suitable choice for the various font makers who might like to consider adding such characters into their existing fonts? The long term goal would be to get the characters promoted into regular Unicode, yet using the Private Use Area would allow documents to be encoded rather sooner than if one needed to wait for encoding into regular Unicode, and any such documents encoded could be converted by an automated process at a later date. Indeed, using the Private Use Area in this manner and having font availability might help the research. 
My suggestion of U+E400 is as a basis for discussion: does anyone happen to know if the researchers have already started a Private Use Area encoding please as that possibility needs to be checked before starting a new encoding? Does anyone happen to know if any of the metal fonts, or matrices, of such characters survive from the sixteenth century please? From the general history of printing there does seem to be a great lack of surviving early printing type, which has always seemed strange to me, as well as unfortunate. William Overington 28 March 2003
Re: Custom fonts
Pim Blokland asked as follows. quote Now my suggestion was the browser program which displays this file should be able to look at the font information in the XML file, open the font file and retrieve the names of all characters in it, so it can show the &hwesta; character (and all other characters) without needing a long list of ENTITY entries in the XML. Anyone else think this would be a good idea? end quote Well, I think it would be a good idea. Could you explain it further please? For example, starting from a golden ligatures collection character ct ligature, which I have designated as U+E707 within the Private Use Area within the golden ligatures collection. Does this mean that for each Private Use Area item which I specify I would need to specify a single word name for use in such constructs? I am happy to do that, thinking that g_ct would be a suitable name for the golden ligatures ct item. I could fairly easily devise such names for most of the golden ligatures collection and with a little thought will hopefully be able to devise suitable names for the rest. Am I right in thinking that this system will only really work if the names are unique, so that if someone else devises a code for ct at some other code point then it is important that the name for that usage is other than g_ct, or is it not essential, though just desirable, for the names to be unique? Can you possibly post an example of what files would need to carry which information please, so that the g_ct name could be used in the manner which you suggest? William Overington 19 March 2003
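For what the "long list of ENTITY entries" alternative would look like, a name such as g_ct tied to the Private Use Area code point U+E707 corresponds to one DTD entity declaration per ligature. A minimal sketch, assuming a hypothetical table of names (only g_ct comes from the posts above; g_st is invented for illustration):

```python
# Hypothetical table tying single-word names to Private Use Area code
# points.  Only g_ct -> U+E707 is from the golden ligatures collection
# as discussed above; g_st is an invented second entry.
LIGATURE_NAMES = {
    "g_ct": 0xE707,  # golden ligatures collection ct ligature
    "g_st": 0xE708,  # hypothetical further entry
}

def entity_declaration(name: str) -> str:
    """Build the DTD entity declaration for one named ligature."""
    return '<!ENTITY %s "&#x%04X;">' % (name, LIGATURE_NAMES[name])
```

So a document's DTD would carry `<!ENTITY g_ct "&#xE707;">`, and `&g_ct;` in the text would resolve to the Private Use Area character; uniqueness of the names matters because two declarations of the same entity name cannot coexist usefully in one DTD.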
Re: List of ligatures for languages of the Indian subcontinent.
might like a copy. It is not a Unicode font, so that it can be easily used with the Paint program, though I am considering a Unicode version yet wondering quite how best to encode it. http://www.users.globalnet.co.uk/~ngo/OLD_NEW_.TTF William Overington 18 March 2003
List of ligatures for languages of the Indian subcontinent. (from Re: per-character stories in a database)
And nobody out there is volunteering to do it. I would do it gladly, but I do not have any skills at Indian languages. My opinion is that the list is important for the future of digital interactive broadcasting so I am trying to get the list done so that it is ready for use in displaying distance education texts in interactive broadcasting situations across the Indian subcontinent using my telesoftware invention. I was told that I could commission it. I described what I thought was a good design brief for the list and asked how much it would cost. I am still waiting to find out. A lot of the information needed to prepare the numbered list is apparently in files, it is just that it is not available to people. If the Unicode Consortium really does not wish to include this important project within its scope, then it will need to be achieved in some other manner. I would have thought that whether the Unicode Consortium will take this project on or not should go to a formal board meeting of the Unicode Consortium so that there can be no doubt whatsoever of the provenance of any decision. William Overington 17 March 2003
Re: per-character stories in a database (derives from Re: geometric shapes)
then be such that royalties go to the United Nations for ever to help with health care around the world. Just think, the operas of Gilbert and Sullivan go out of copyright in about two years time, so perhaps they could be the providers of such money. The idea is not perhaps as far fetched as it might at first sound. Please look at the provisions in British Law (in an Act of 1988 about intellectual property I think) where there is a specific provision in relation to the one work Peter Pan. Where the author of a work is a corporation the last seventy years of copyright starts ticking right from publication, so perhaps some of the great movies would soon produce such income. Just an idea at present, but maybe once this posting goes around the world lots of people might think about it and maybe someone can get it done. William Overington 15 March 2003
Re: Ligatures fj etc (from Re: Ligatures (qj) )
Yesterday, 13 March 2003, I wrote as follows. quote So I reasoned that the system might scan through a font when it is loaded and decide upon the lowest point for the whole font and then proceed on that basis. end quote An email correspondent has kindly written to me privately and I now know that it is not necessary for an application such as a wordprocessing package to make a complete survey of all the glyphs in a font as the font is being loaded, because the information on what are the high and low points for the font is readily available in predefined locations within the font. I expect that many readers of this list already know that, yet I feel that I should post this note in case some readers do not because I would not want to have set them off on a wrong way of looking at how a system works. William Overington 14 March 2003
Unicode 4.0 chapter headings and numbering.
I wonder if you could please say whether the Unicode 4.0 book will have the same chapter headings and numbering as the Unicode 3.0 book? My reason for asking is that I am writing a paper about the possible problems with using languages of the Indian subcontinent on the DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) interactive television platform where the PFR0 Portable Font Resource system is used for those fonts which are broadcast. DVB-MHP uses Java and Unicode. I want to refer to Chapter 9 South and Southeast Asian Scripts as the place to look for the details of what is necessary, yet the paper needs to be usable both before and after the publication of Unicode 4.0, so I would like to know if the chapter headings and numbering will be unchanged please as I would like simply to refer readers to Chapter 9 South and Southeast Asian Scripts of the Unicode Standard at the http://www.unicode.org webspace. Also, if unchanged, is that a matter of continuing stability for future issues as well, or is it just for Unicode 4.0 please? William Overington 14 March 2003
Re: per-character stories in a database (derives from Re: geometric shapes)
Markus Scherer wrote as follows. quote It has been suggested many times to build a database (list, document, XML, ...) where each designated/assigned code point and each character gets its story: Comments on the glyphs, from what codepage it was inherited, usage comments and examples, alternate names, etc. I am talking about both code points and characters on purpose, and I would go a step beyond documenting what's there. All the characters that can be represented by a sequence of assigned Unicode characters should be listed, with that sequence (or those sequences), and with further explanation if necessary. end quote Yes, that is a very good point. I have become interested in the languages of the Indian subcontinent from the standpoint of trying to ensure that they can be displayed properly using interactive television using portable font technology, however I am not a linguist and I find it strange that the Unicode Standard does not codify the ligatures which can be produced with the languages of the Indian subcontinent at display time using specific sequences of regular Unicode characters so that someone skilled in the art of font design may design a font from the code charts. Later he wrote. quote Now we just need to - find someone to sponsor this effort technically and with humanpower - squeeze the existing information out of the standard, the mailing lists, FAQs, and of course out of the Unicode veterans before they retire by Unicode 6... end quote Well, how about an approach like Project Gutenberg uses for proofreading transcripts of classic books. If there were a database where people could post items about particular characters and people could read them and either confirm what is said or put some other view or just add some other information, then maybe the database could just sort of gradually become generated over a period of years. How big would that be? 
About 100 thousand code points, at, say, 200 words each on average, at about 5 or 6 characters per word plus a following space, would be about 130 megabytes in total. I fully realize that the phrase sort of gradually might easily be quoted in a response to this posting, yet if the database facility were there, accessible directly from the web, there may well be many people who would stop by for a while and review what has been entered and add a little more to the database. quote PS: Sorry, I am not in a position to volunteer... end quote Well, it could be more of an informal thing. If the facility were set up, then people who are interested could simply visit the web site when they felt like participating. Certainly there might be a core of people who had the ability to throw out rubbish and to convert fragments of text into a good English narrative so that there was some overall structure to it all, yet it does not necessarily need to be as formal and rigid as if it were a commercial project with a time deadline, particularly if the alternative is that it does not get done at all. William Overington 14 March 2003
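The back-of-envelope estimate above works out as follows, taking the midpoint of 5 or 6 characters per word and counting the trailing space:

```python
# Size estimate for the proposed per-character story database.
code_points = 100_000      # roughly the number of designated code points
words_each = 200           # average story length, as assumed above
chars_per_word = 6.5       # midpoint of 5-6 characters, plus one space
total_bytes = code_points * words_each * chars_per_word
megabytes = total_bytes / 1_000_000  # 130.0
```

That is, 100,000 × 200 × 6.5 = 130 million characters, or about 130 megabytes of plain text before any markup or indexing overhead.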
Ligatures fj etc (from Re: Ligatures (qj) )
Thank you both for your responses. Yes, U+2502 or U+2503 would achieve the desired effect for which I devised U+E700 STAFF without resorting to the Private Use Area. The only reason for my not using one of those was that I was unaware of those codes as such. An interesting point is that they appear to be usable with fonts which have descenders yet still fill the entire height of the font. I suppose that when, some time ago, looking through what Unicode offers in a general context, not looking for the STAFF effect at that time, I saw the box drawing characters, I thought of them in the context of the character set of the old PET computer from the 1970s and of the way that some software on older non-graphics terminals on mainframe computers makes an attempt at message windows using such characters to construct boxes. Indeed, an interesting footnote to U+2502 states = Videotex Mosaic DG14. I cannot quite remember what Videotex was. I remember Videotext (with a t at the end) and seem to remember that Videotex (no t at the end) was a different system, possibly from the USA or maybe France. There was also a system called NAPLPS, which was an acronym for something like North American something, and the word Presentation was in it, though I forget the exact derivation. I was unaware of the VDMX table and so had a look at http://www.yahoo.com and found a couple of useful documents. However, VDMX appears to refer specifically to OpenType rather than ordinary TrueType. 
My reason for including the STAFF character, the intended effect of which I can now produce using U+2502 or U+2503, was that, being fairly new to producing fonts and just, thus far, using the Softy editor to produce ordinary TrueType fonts, I had noticed, when trying it out in 2002, that if I produce a font with a b c d e f then the font displays with lines packed together, yet that if I then add g the line spacing for all lines increases, even if there is no g in that line. So I reasoned that the system might scan through a font when it is loaded and decide upon the lowest point for the whole font and then proceed on that basis. Now, in defining Quest text I wanted to have the possibility of accents on capital letters, and descenders such as y and g, always looking clear, so I decided effectively to lock some leading into the font and set the maximum height right from the start. Features of Quest text are that it is designed so that characters are produced directly from drawings in the Softy editor, not from template graphics, and that Quest text is designed, as far as possible, by the application of a set of rules: verticals are all 256 font units wide, with both edges at a font unit value which is a multiple of 256; horizontals are all 168 font units in vertical height, with one edge at a font unit value which is a multiple of 256; and corners which are curved use a single Bézier curve which has an action length, as I call it, of 128 font units in both horizontal and vertical directions. Some characters, such as x and k, are exceptions to the general rules, yet Quest text is largely made up of horizontals and verticals, including for letters such as A O e and s. The idea is that hopefully Quest text will be very clear at both 12 point and 18 point and that, as point size increases, it will display its artistic look. 
At 300 point, Quest text looks smooth and rounded with an elegant combining of wider verticals with narrower horizontals, almost as if drawn with a pen with a nib 256 font units wide and 168 font units high. The rules do produce the effect though that capitals look lighter than lowercase letters as they are overall wider and yet use the same width verticals. I am wondering whether to consider that a fault or a feature! :-) An important part of the development process of Quest text is to display some text at 12 point in WordPad, make a Print Screen graphic and paste it into Paint and then study the graphic at 8x magnification. Hopefully Quest text combines great clarity with an artistic look. William Overington 13 March 2003
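Because the Quest text design is rule-based, the stroke rules stated above could in principle be checked mechanically. A minimal sketch, assuming stroke edges are given as font unit coordinates (the function names are invented for illustration):

```python
# Checks for the stated Quest text drawing rules: verticals are 256 font
# units wide with both edges on multiples of 256; horizontals are 168
# font units high with at least one edge on a multiple of 256.

def vertical_ok(left: int, right: int) -> bool:
    """A vertical stroke: 256 units wide, both edges on a 256 grid."""
    return right - left == 256 and left % 256 == 0 and right % 256 == 0

def horizontal_ok(bottom: int, top: int) -> bool:
    """A horizontal stroke: 168 units high, one edge on a 256 grid."""
    return top - bottom == 168 and (bottom % 256 == 0 or top % 256 == 0)
```

A check like this would only cover the regular strokes; exceptions such as x and k, and the Bézier corners with their 128-unit action length, would need separate treatment.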
Ligatures fj etc (from Re: Ligatures (qj) )
displayed, though it can be displayed for test purposes if desired. William Overington 12 March 2003
Re: Ligatures (was: FAQ entry)
Pim Blokland wrote as follows, responding to Doug Ewell. quote I suspect it would end when you start talking about combinations like qj and fþ that are unlikely to appear in natural language text. At least gj exists in Hungarian. fb, fh and fk are very common in Dutch (much more so than fj). fþ exists in Icelandic; at least I've found arfþegi. However I don't speak Icelandic, so I've no idea if this is a combination of two subwords. end quote During the spring and summer of 2002 I produced a number of web pages about encodings for ligatures, the encodings using the Private Use Area. Some of the characters mentioned are encoded within the golden ligatures collection. http://www.users.globalnet.co.uk/~ngo/golden.htm I will try to add qj, gj and f thorn in due course. Where I have an f ligature I have added an ff ligature into the encoding scheme, so I expect to add ff thorn as well, just in case it is needed, though I have no knowledge of whether it is ever used; indeed, I was unaware of the possibility of an f thorn ligature until reading this thread. While I am adding some more ligatures to the collection, if anyone wants any other characters added in, please email me privately. I found that encoding the golden ligatures collection led to my learning about a number of interesting aspects of typography of which I was previously unaware, so it was an educational experience for me as well as being fun and useful in practice within its limits. Naturally, my production of the golden ligatures collection does not of itself produce fonts which contain these ligatures, yet it does help a little in making the possibility topical, so maybe a few of the font designers who read this list might perhaps include more ligatures in their fonts. An interesting aspect of my codification of ligatures is that any documents produced using them will not be standard Unicode documents. 
However, the encodings might be very useful so that someone may make artistic typography fonts using a font production program such as the Softy shareware program and be able to produce pages of hardcopy print out locally using such a font, where a ligature character such as ct may be encoded as U+E707. Naturally, there is nothing to stop anyone encoding a ct ligature however he or she chooses within the Private Use Area, yet my collection of encodings is a published, consistent set which would help with interoperability of fonts from various artists. I am currently producing a typeface which I am calling Quest text so that I can have a typeface available which has whatever ligatures I choose. I have so far produced all of the lowercase letters, the digits, full stop and twelve capitals, and also lowercase long s, ash, eth and thorn. I am hoping that the font will be useful for English, Old English and Esperanto in particular, though I can add characters where I choose, using both regular Unicode code points and Private Use Area code points, both from the golden ligatures collection and from other published Private Use Area encodings. I am producing Quest text using the Softy program and am finding it a very effective program. More recently, a new development, designed primarily as a means to produce displays of languages of the Indian subcontinent upon the screens of interactive televisions, using the font format capability of those televisions together with the ligatures of those languages, may be a very useful way to use the ligature encodings of the golden ligatures collection as well. http://www.users.globalnet.co.uk/~ngo/ast03300.htm So, a document in which one wishes to have a ct ligature would have the ct ligature encoded as ct or maybe c ZWJ t depending upon the circumstances, and a .etf file would have one or both of the following lines, depending upon the application. 
ct U+EBEF U+E707 (that is, four characters)

c ZWJ t U+EBEF U+E707 (that is, five characters)

Thus the combination of the golden ligatures collection, an .etf file and various software tools to use them could be an effective way of allowing people to use ligatures on a wide variety of platforms while having the documents containing the original texts encoded using regular Unicode characters only. A text file containing codes from the golden ligatures collection would thus only be used locally on a temporary basis for a current task, though to useful effect. Some of my small fonts produced using Softy are available at the following web page. http://www.users.globalnet.co.uk/~ngo/font7001.htm William Overington 10 March 2003
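The local substitution that such .etf lines describe could be sketched as follows. This is a minimal sketch under the stated scheme: the broadcast document stays in regular Unicode (ct, or c ZWJ t), and only the temporary local display stream carries U+EBEF followed by the golden ligatures code U+E707.

```python
# .etf-style rules from the posts above: both the plain ct spelling and
# the c ZWJ t spelling map to U+EBEF U+E707 for local display.
ETF_RULES = [
    ("ct", "\uEBEF\uE707"),
    ("c\u200Dt", "\uEBEF\uE707"),  # c ZWJ t spelling of the same ligature
]

def apply_etf(text: str) -> str:
    """Rewrite source sequences into their temporary local display codes."""
    # Try longer source sequences first so c ZWJ t is consumed before
    # its inner ct could match.
    for src, dst in sorted(ETF_RULES, key=lambda r: -len(r[0])):
        text = text.replace(src, dst)
    return text
```

The inverse mapping is equally simple, which is what makes the local stream safe to use: the original regular-Unicode text is always recoverable.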
Unicode 4.0 beta characters.
I have now produced a small font which contains my implementation of the U+2614 Umbrella with rain drops character, which is one of the new characters in the Unicode 4.0 beta documents. http://www.users.globalnet.co.uk/~ngo/font7001.htm I have had a go at producing a glyph for U+26A0 Warning sign but am finding it a learning exercise to make it both crisply legible at 12 point yet artistic at larger sizes within the tight constraint of a triangular surround which must itself be clear. When producing the Unicode Standard, is there a point size at which glyphs should be recognizably displayable which is part of the criteria for characters? For example, is it regarded as fine if some characters in some languages cannot be displayed clearly below, say, 24 point? The map flags look interesting yet hopefully straightforward and I hope to have a go at them too, also the high voltage sign. The high voltage sign has the note "best glyph to be found" in the beta document U40-2600.pdf and I wonder what is the significance of that note please? In the same U40-2600.pdf document are six Yijing monogram and digram symbols. I wonder if someone could please say something about the meaning of these characters. Also, and this is I feel an important issue for the beta process which could be of importance for other characters, could someone please give some guidance as to how these characters should be implemented as a piece of electronic type? There is no indication in U40-2600.pdf as to how this set of six symbols should sit within a character cell and relate to one another: whether they should join to each other or must be clear of each other when side by side, and how they should line up with text characters in a font which contains many characters of various types. William Overington 24 February 2003
[Private Use Area application] A font for research in multimedia authorship.
Following discussion yesterday in another thread about changing text colour in multimedia text files, I have today produced a font as a tool for research in multimedia authorship. I have devised glyphs for 19 of the courtyard codes relating to text colour and encoded them in a font. The font is available for free download from our family webspace at the following address. http://www.users.globalnet.co.uk/~ngo/COURTCOL.TTF People interested in having a copy of this font may find the following documents useful in applying the font. http://www.users.globalnet.co.uk/~ngo/courtcol.htm http://www.users.globalnet.co.uk/~ngo/court000.htm I have been experimenting with using the font with WordPad and Word 97 on a PC, where the glyphs give a monochrome indication to an author of which colour is being used. For example, I mixed English text in the Arial font with codes from this font in one document. The whole text of English and colour codes can then be copied onto the clipboard and pasted into SC UniPad (downloadable from the http://www.unipad.org webspace) in order to produce a compact file without the text formatting of WordPad or Word 97. I am hoping to carry out some experiments whereby such text can then be pasted into a text box of a Java applet and produce appropriately coloured text. Readers who would like to comment about the design of the glyphs or about the research are welcome to email me. I have found it interesting to design glyphs to represent colours in monochrome. William Overington 22 February 2003
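The Java applet experiment described above amounts to scanning the pasted text for colour-change codes and splitting it into coloured runs. A minimal sketch, assuming two of the courtyard colour codes given elsewhere in these posts (U+F3E2 for red, U+F3E5 for green); the rest of the 19-entry table, and the default colour, are hypothetical details:

```python
# Two courtyard colour codes mentioned in these posts; the full table
# of 19 codes is not reproduced here.
COLOUR_CODES = {
    "\uF3E2": "red",
    "\uF3E5": "green",
}

def colour_runs(text: str, default: str = "black"):
    """Split text into (colour, substring) runs at each colour code.

    The colour-code characters themselves are consumed, not displayed.
    """
    runs = []
    colour, buf = default, []
    for ch in text:
        if ch in COLOUR_CODES:
            if buf:
                runs.append((colour, "".join(buf)))
                buf = []
            colour = COLOUR_CODES[ch]
        else:
            buf.append(ch)
    if buf:
        runs.append((colour, "".join(buf)))
    return runs
```

A rendering layer, such as the text box of the Java applet, would then draw each run in its associated colour.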
Re: XML and tags (LONG) (derives from Re: Plane 14 Tag Deprecation Issue)
specification relates to plane 14 tags and how the Unicode specification relates to element names in an XML file. I feel that that is the essential point which I am trying to convey. 1. The text MUST be transmitted in UTF-8 (because the CEO of Overington Inc. thinks that UTF-8 is cute). Well, I, as an individual, was thinking in terms of UTF-16. 2. The transmission protocol MUST implement some form of language tagging (the details of the protocol are up to me). Particularly, the system needs to distinguish English text from Italian text, because the two languages will be displayed in different colors (green and red, respectively). Green for English, red for Italian. Are you by any chance a fan of the liveries of motor racing cars of the 1950s? 3. The OveringtonHomeBox(tm) can only accept UTF-8 plain text interspersed with escape sequences to change color. The escape sequences have the form {{color=1}}, where 1 is the id of a color (blue, in this case). If I were writing a one-off program I would use U+F3E2 for red and U+F3E5 for green. http://www.users.globalnet.co.uk/~ngo/court000.htm http://www.users.globalnet.co.uk/~ngo/courtcol.htm However, the issue is not, in my opinion, about one-off programs and proprietary encodings. The issue is ensuring that plane 14 tags are not totally deprecated so that, as an option for use with particular protocols, they continue to be available so that encodings for general computing usage, for general and widespread information availability, on a rigorous non-proprietary encoding basis may be used. 
Certainly, within certain multimedia programs which might at some future time run upon the DVB-MHP platform, codes such as U+F3BC might be particularly useful, yet that is a matter which an individual programmer needs to consider when writing such a program: it is not a standard system, though it is not a proprietary system either in the usual sense of the word, as those codes are published with the hope of being a consistent set which people may use if they so choose. Please note that, notwithstanding your pretend scenario of a company, that is not the way I am proceeding with my research. I invented the telesoftware concept and am doing what I can to get it used effectively and to ensure that it can have scope for future development of content. I regard the continued availability of plane 14 tags as important, as it means that content authors can then use codes which do the job by finding them in an international standard, without having to use what I suggest. I could devise all manner of codes using plane 16 if I wished, copying the plane 14 tags across as a start, yet those codes, no matter how fine, no matter how well publicised in research papers or in a book or whatever, would never have the provenance of the codes in an international standard. That is why, although Private Use Area codes do certainly have a use for research and for concept proving, and also for limited use between two or more people studying some special topic, Private Use Area codes, and XML element names made up by a programmer or even by a committee which is not a standards committee, simply do not come into the same class of provenance quality as plane 14 tags which are in the Unicode standard. That is why I hope that the Unicode Technical Committee will not totally deprecate tags and will leave open the possibility of considering adding additional tag types at some time in the future. 4. The text files being transmitted MUST be small (bandwidth is limited!). 
Yes, keep the text file size down, bandwidth is limited. 5. The processing program must be small (on-board memory is limited!). No, for DVB-MHP the on-board memory is fairly large. The transmission link is the key issue. 6. A working prototype must be ready by tomorrow. Well, this is about the way that these things will be done well into the future. The idea of Unicode is that it will last, not be swept away within ten or twenty years because it is outdated for future needs. I have had a look through the example solutions, but I do need to spend some more time studying them and hopefully trying out the executable programs with some other data files. In the meantime I would be interested to know any further views of Marco and the views of others on this topic. Thank you for taking the time to write your post and prepare the programs. I feel that it is important that this matter be studied thoroughly. William Overington 21 February 2003
Leonardo da Vinci and printing.
I recently enjoyed watching a two-part television programme entitled Leonardo's Dream Machines on Channel 4, which is a television channel in England. Television programmes often get shown around the world and I can certainly recommend this one if you get the chance to watch it on a television channel where you live. As I am interested in typography I noticed the typeface which was used for the captions and the end credits. The font turns out to be Da Vinci Forward and can be viewed at the following website. It is based on the handwriting of Leonardo da Vinci. http://www.p22.com In addition, it can be tried out on-line at the typecaster facility which is at the following web address. http://www.p22.com/typecaster/caster.html I tried out various phrases and indeed made a few Print Screen copies of the texts which I produced. Leonardo da Vinci was born in 1452 in Vinci, a village near Florence, Italy. In 1456 the first printed book was published, in Mainz, a city in what is now Germany. I began to wonder how Leonardo da Vinci related to the invention which took place at about the time he was born, and how that compares and contrasts with how people today relate to the computer, the internet and the web. Leonardo da Vinci could read and write. Searching the web earlier today, the only reference to Leonardo da Vinci and printing that I could find was a short note that he had made some prints of plant leaves. Does anyone happen to know if Leonardo da Vinci read or owned printed books please? Was he involved in printing technology or letter design for fonts or for plaques or stone engraving? Are there any individual copies of books surviving today for which there is provenance that Leonardo da Vinci ever read them, even circumstantial evidence such as, perhaps, a reference to his reading some book while in the service of someone whose collection of books has survived with provenance to the present day?
I find it quite fascinating that Leonardo da Vinci lived in Europe at the same time as printing with movable type developed in Europe, and I wonder whether, when he first became aware of printed books, they were an amazing new thing to him or just came along as an everyday part of how things were, as but one of the things he found out about as he grew up. By the way, if you do have a look at the typecaster facility at the website mentioned above, various fonts may be tried. I tried various fonts and particularly like the Morris Troy font. I found that characters such as e acute and A umlaut are available in this font, using Alt 130 and Alt 142 respectively when keying text into the typecaster window. However, it is not clear to me how those characters are stored in the font itself, that is, whether they use a Unicode layout or an older layout. William Overington 21 February 2003
Hot Beverage font.
Thinking that the symbol U+2615 HOT BEVERAGE, new in Unicode 4.0, might be very useful in the preparation of meeting agendas and the like, and also wishing to try to design a glyph which would look good particularly at a 12 point size in documents, I have produced a font named Hot Beverage which I have now added into our family webspace. The font can be downloaded from the web from the following address. http://www.users.globalnet.co.uk/~ngo/HOTBEVER.TTF The font contains the Hot Beverage glyph which I have designed, accessible at U+2615, which is decimal 9749, and also at lowercase h, for convenience when used with the Microsoft Paint program. The font also includes a space character. I have tested the font with WordPad and Word 97. It looks quite good at 12 point in black, as in an agenda document for a meeting. It also looks good in the colour which WordPad calls green, which is a dark green colour, at 300 point, and also looks good in a fun logo at 36 point following the wording Peppermint Tea Shoppe in Old English Text in the same dark green colour. Various other sizes mostly look good, though 18 point does look a bit strange. An experiment with PowerPoint produced a nice slide with the following, centred in a text box, in black at 36 point. There will now be an intermission for refreshments. Below that, in colour red=51, green=204, blue=51 at 72 point, centred in a text box, the Hot Beverage glyph. Hopefully this font will be a useful item on computers around the world. William Overington 18 February 2003
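As a quick check of the code point arithmetic (hexadecimal 2615 is decimal 9749), a couple of lines of Python confirm it; the mapping of the glyph to lowercase h is a convenience of that particular font, not anything defined by Unicode:

```python
# U+2615 HOT BEVERAGE: hexadecimal 2615 equals decimal 9749.
cp = 0x2615
print(cp)        # 9749
print(chr(cp))   # the Hot Beverage character, if an installed font supports it
# The font described above also draws the same glyph at lowercase 'h';
# that is a design decision of that one font, not a Unicode property.
```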
Re: Plane 14 Tag Deprecation Issue
in plane 14. William Overington 15 February 2003
Re: Plane 14 Tag Deprecation Issue
the range U+EC00 through to U+EFFF for data and some codes from the range U+EB00 through to U+EBFF for control codes, some of these codes being particular to the eutovios system and some, such as the codes for the colours of the objects, being the same codes used for specifying colours in the eutocode graphics system generally. The objects thus all have symmetry about the vertical axis, which makes drawing out a scene simpler than if objects such as cubes were in use. The spheres display as discs, the cylinders display as filled rectangles and the cones display as filled triangles, each displaying the same shape regardless of the angle from which they are viewed: they do change size though depending upon how near they are to the present viewing point. An interesting activity is thinking about what objects have a shape which is symmetrical around a vertical axis and which would look good in such a program and which are expressible with a minimum number of supplied parameter values once one knows which type of object has been chosen. It is essentially just some of those objects which could be produced in brass using a lathe only and without using any of the screw cutting features of a lathe. I have tried to find out what has happened to ViOS. Does anyone know or remember having seen a news item in a magazine about what has happened please? I recognise that this question is somewhat off-topic but I have tried to find out in various places and have been unable to do so and this list does seem to have an ability of providing answers to many questions. Anyway, in relation to plane 14, I am hoping that in time it will be possible for such a graphics system, including various three-dimensional capabilities to become formally encoded in plane 14 as a ring-fenced option for use with particular protocols. It is at an early stage at present, so what becomes encoded may have far greater possibilities than what is being encoded now. 
Yet what is being encoded now does work and works well. It allows a stream of Unicode characters from a text file to produce a three-dimensional scene through which an end user can then move and select objects. This is all very futuristic and needs a lot more doing to it. At present I use a Java applet which is an extension of the original eutocode graphics test system which is on the web. http://www.users.globalnet.co.uk/~ngo/eutocodegraphics.htm The test system for the eutovios system has buttons to simulate the push buttons of an infra-red remote control device of a DVB-MHP television set. Testing is by preparing a string of Private Use Area characters in the SC UniPad program, obtainable from http://www.unipad.org and then using copy and paste so as to paste the string into the text box of the applet, the draw button of the applet then being pushed to produce the starting point display. However, I feel that I do need to mention this now as the Unicode Technical Committee is about to consider what to do about tags and this is a related issue because it relates to plane 14. Perhaps all of plane 14 needs to be declared an area considered as deprecated in general terms, yet where codes for use with particular protocols can be defined by the Unicode Technical Committee, so that the potential for using such futuristic developments and encoding them within the Unicode framework is preserved? William Overington 14 February 2003

For discoveries,
In Private Use Area
Phaistos Disc Script waits

Haiku written by William Overington.
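As an illustration of the kind of processing described, a program receiving such a stream can classify each character by Private Use Area range before interpreting it. The ranges below are the ones stated above (U+EB00 to U+EBFF for control codes, U+EC00 to U+EFFF for data); the example stream and any meanings attached to individual codes are invented here purely for illustration, as the actual eutovios assignments are the author's own:

```python
# Hypothetical sketch: classify each code point of a Private Use Area
# character stream, in the spirit of the eutovios description above.
def classify(ch: str) -> str:
    cp = ord(ch)
    if 0xEB00 <= cp <= 0xEBFF:
        return "control"   # e.g. a colour selection code (illustrative)
    if 0xEC00 <= cp <= 0xEFFF:
        return "data"      # e.g. a parameter value for an object (illustrative)
    return "other"

stream = "\uEB01\uEC10\uEC22"   # an invented example stream
print([classify(c) for c in stream])   # ['control', 'data', 'data']
```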
Re: Plane 14 Tag Deprecation Issue (was Re: VS vs. P14 (was Re: IndicDevanagari Query))
Friday 14 February 2003 is the stated closing date for responses to the Public Review of the issue of whether to deprecate plane 14 tags. I am writing to enquire whether the solution of marking the tags as deprecated within the file PropList.txt, yet including a wording in the Unicode Specification that, whereas the plane 14 tag characters are marked as deprecated within the PropList.txt file, they are not deprecated in the full sense of being deprecated but are in fact classed as reserved for use with particular protocols, would be acceptable to all. In particular, would such a solution be acceptable to Doug Ewell and others as satisfying all of the points made in his paper In defense of Plane 14 language tags, which was posted in this list on 2 November 2002? If so, I wonder if I might please suggest that people discussing the matter within the Unicode Technical Committee might like to consider Doug's paper in some detail and perhaps consider making reference within the Unicode Specification to some of the ideas which Doug pointed out, such as the potential for using tags for speech synthesis and so on. In addition, if the tags are described as reserved for use with particular protocols, then it would seem reasonable to keep open the possibility of allowing other types of tags to be specified in the future if a need arises, as Doug suggests, rather than using plane 14 tags only for languages as at present. It would seem entirely reasonable that the Unicode Technical Committee could possibly at some future meeting define one or more additional types of tag within the unused lower part of plane 14 within the ring-fenced reserved area. William Overington 13 February 2003
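For readers unfamiliar with the mechanism under review: a plane 14 language tag is the character U+E0001 LANGUAGE TAG followed by tag characters in the range U+E0020 to U+E007E, which shadow printable ASCII at an offset of 0xE0000. A short Python sketch of forming such a tag and recovering the language code from it:

```python
# Build a plane 14 language tag: U+E0001 LANGUAGE TAG followed by tag
# characters, each the ASCII character plus 0xE0000.
def language_tag(tag: str) -> str:
    return "\U000E0001" + "".join(chr(0xE0000 + ord(c)) for c in tag)

tagged = language_tag("en-GB") + "some English text"
# The tag characters have no visible rendering; a tag-aware protocol can
# recover the language code by subtracting 0xE0000 from each tag character.
recovered = "".join(chr(ord(c) - 0xE0000) for c in tagged[1:6])
print(recovered)   # en-GB
```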
Re: Plane 14 Tag Deprecation Issue (was Re: VS vs. P14 (was Re: Indic Devanagari Query))
I feel that as the matter was put forward for Public Review then it is reasonable for someone reading of that review to respond to the review on the basis of what is stated as the issue in the Public Review item itself. Kenneth Whistler now states an opinion as to what the review is about and mentions a file PropList.txt of which I was previously unaware. Recent discussions in the later part of 2002 in this forum about the possibilities of using language tags only started as a direct result of the Unicode Consortium instituting the Public Review. The recent statement by Asmus Freytag seems fine to me. Certainly I might be inclined to add in a little so as to produce Plane 14 tags are reserved for use with particular protocols requiring, or providing facilities for, their use so that the possibility of using them to add facilities rather than simply using them when obligated to do so is included, but that is not a great issue: what Asmus wrote is fine. Public Review is, in my opinion, a valuable innovation. Two issues have so far been resolved using the Public Review process. Those results do seem to indicate the value of seeking opinions by Public Review. As I have mentioned before I have a particular interest in the use of Unicode in relation to the implementation of my telesoftware invention using the DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) system. I feel that language tags may potentially be very useful for broadcasts of multimedia packages which include Unicode text files, by direct broadcast satellites across whole continents. Someone on this list, I forget who, but I am grateful for the comment, mentioned that even if formal deprecation goes ahead then that does not stop the language tags being used as once an item is in Unicode it is always there. So fine, though it would be nice if the Unicode Specification did allow for such possibilities within its wording. 
The wording stated by Asmus Freytag pleases me, as it seems a good, well-rounded balance between not obliging the people who make many widely used packages to include software to process language tags, whilst still formally recognizing the opportunity for language tags to be used to advantage in appropriate special circumstances. I feel that that is a magnificent compromise wording which will hopefully be widely applauded. In using Unicode on the DVB-MHP platform I am thinking of using Unicode characters in a file and the file being processed by a Java program which has been broadcast. The file PropList.txt just does not enter into it for this usage, so it is not a problem for me as to what is in that file. My thinking is that many, maybe most, multimedia packages being broadcast will not use language tags and will have no facilities for decoding them. However, I feel that it is important to keep open the possibility that some such packages can use language tags provided that the programs which handle them are appropriately programmed. There will need to be a protocol. Hopefully a protocol already available in general internationalization and globalization work can be used directly. If not, hopefully a special Panplanet protocol can be devised specifically for DVB-MHP broadcasting. On the matter of using Unicode on the DVB-MHP platform, readers might like to have a look at the following about the U+FFFC character. http://www.users.globalnet.co.uk/~ngo/ast03200.htm Readers who are interested in uses of the Private Use Area might like to have a look at the following. They are particularly oriented towards the DVB-MHP platform but do have wider applications both on the web and in computing generally. http://www.users.globalnet.co.uk/~ngo/ast03000.htm http://www.users.globalnet.co.uk/~ngo/ast03100.htm http://www.users.globalnet.co.uk/~ngo/ast03300.htm The main index page of the webspace is as follows.
http://www.users.globalnet.co.uk/~ngo William Overington 7 February 2003
The result of the plane 14 tag characters review.
As the Unicode Consortium invited public comments on the possible deprecation of plane 14 tag characters, will the Unicode Consortium be making a prompt public statement of the result of the review as soon as the present meeting of the Unicode Technical Committee is completed, or even earlier if the decision of the Unicode Technical Committee has already been finalized? William Overington 8 November 2002
Re: A .notdef glyph
designed a .notdef glyph in response to the exercise. Certainly, now that there has been a discussion about .notdef glyphs and references to various documents and examples, I might now think about another design. However, what I produced was a work of art produced without the knowledge which might have constrained my thoughts had I previously known about the various documents and examples beyond the plain black rectangle. A sort of primitive art, unconstrained by the chains of knowing about what is usually expected of such a design? William Overington 8 November 2002
Re: A .notdef glyph
Michael Everson wrote as follows.

At 18:29 + 2002-11-06, William Overington wrote: Thank you for the design brief.

Oh, my stars.

If anyone wants to make a graphic involving stars using Microsoft Paint, he or she might like to have a look at the following. http://www.users.globalnet.co.uk/~ngo/pai3.htm These graphics were produced using 1456 object code programs. http://www.users.globalnet.co.uk/~ngo/14563100.htm http://www.users.globalnet.co.uk/~ngo/1456.htm

Here is my design.

Better hurry and copyright it.

Actually, since you mention copyright, my understanding is that, under United Kingdom law, copyright existed in the design from the moment that I put it into permanent form. As there are various international treaties and conventions about copyright, I think that there is copyright in my design through most of the world. Copyright is a very interesting aspect of the law. Copyright does not depend upon any assessment of artistic or literary merit at all. Copyright is a very important and valuable intellectual property right and my understanding is that copyright licensing earns the United Kingdom economy a great amount of money every year. Providing evidence to support a claim of copyright is another matter. However, I think that the fact that my design is archived in the mailing list archive of the Unicode Consortium is high quality evidence in relation to copyright.

The design consists of a single contour in as large a square box as is possible for the particular font. In my prototype I used a box 2048 font units by 2048 font units. In this case, the value of n is 1024. The contour has seven points, the first point and the last point being at the same place. Point 1 is at (0,0) and is on the curve. Point 2 is at (0,2n) and is off the curve. Point 3 is at (2n,2n) and is on the curve. Point 4 is at (2n,n) and is on the curve. Point 5 is at (n,n) and is on the curve. Point 6 is at (n,0) and is on the curve. Point 7 is at (0,0) and is on the curve.
What curve? Your specification here produces a rectangular figure.

Thank you for trying it out. The shape is a one piece solid within the area of a square, though the square is not drawn. The design idea came from wanting to have an arc which goes against the normal arc of design of a graphical user interface of the input screen of a computer program. I started with a two arc design, from point 1 to point 3 as at present, with another arc going back from point 3 to point 1 influenced by an off curve point in the bottom right corner of the square. This was rather like the hysteresis curve of a magnet. Yet it was too symmetrical. The second example had a curve from the present point 4 to the present point 6 influenced by an off curve point at the present point 5. Yet it too looked too symmetrical. The third example is the present design. I wanted a design which would be awkward-looking in the display, so as to draw the eye to it. I used the Softy shareware program. There is one curve and four straight lines in the contour. The curve is a quadratic Bézier curve from point 1 to point 3; note please how point 2 is off the curve. That is, point 2 influences the direction of the curve from point 1 to point 3. The curve starts off from point 1 instantaneously heading for point 2, but quickly turns away from that direction so that it can make the smooth transition in direction which is necessary so that the curve appears to arrive at point 3 instantaneously as if it had come from the direction of point 2.

I hope that you like the design.

But it fails to express .notdef in any meaningful way.

I think I understand what you mean. Yet the meaning of symbols is often part of the culture in which they exist.
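The way the off-curve point shapes that curve can be seen by evaluating the quadratic Bézier numerically, with points 1, 2 and 3 as given and n = 1024; this is a plain Python illustration of the TrueType-style quadratic curve, not font-making code:

```python
# Quadratic Bézier from point 1 to point 3, controlled by the off-curve
# point 2, as in the .notdef contour described above (n = 1024).
n = 1024
p1, p2, p3 = (0, 0), (0, 2 * n), (2 * n, 2 * n)

def bezier(t: float) -> tuple:
    """B(t) = (1-t)^2 * p1 + 2(1-t)t * p2 + t^2 * p3."""
    u = 1.0 - t
    x = u * u * p1[0] + 2 * u * t * p2[0] + t * t * p3[0]
    y = u * u * p1[1] + 2 * u * t * p2[1] + t * t * p3[1]
    return (x, y)

print(bezier(0.0))   # (0.0, 0.0)       -- starts at point 1
print(bezier(0.5))   # (512.0, 1536.0)  -- pulled toward point 2
print(bezier(1.0))   # (2048.0, 2048.0) -- ends at point 3
```

At t = 0 the curve heads straight for point 2 (up the left edge), then bends away to arrive at point 3 travelling as if from point 2, exactly as the prose describes.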
So, as time goes by, perhaps this symbol will come to have the meaning of being a .notdef symbol (in the sense of one of the various possible .notdef symbols in widespread use), perhaps being known as the .notdef symbol which features in that famous thread in the archives of the Unicode Consortium's mailing list. Perhaps a whole thread on symbols and their meaning is on the point of starting in this mailing list. For example, U+2603 has two meanings, one the picture meaning, one the other meaning stated in the text of the U2600.pdf document. Do U+2622 and U+2623 convey their now well-known meanings in any different manner to the way in which my design conveys the .notdef concept? How does U+2658 express the meaning of which directions are permissible to move? How is it that U+2678 brings thoughts of models of locomotives and U+2677 does not? So, maybe it is not a matter of my design failing to express .notdef in any meaningful way; perhaps it is a matter that my design, an abstract shape, does express .notdef in a meaningful way because now lots of people know that that is the intended meaning. Expressing meaning is a very interesting matter. Some readers might perhaps be interested
Re: ct, fj and blackletter ligatures
Peter Constable wrote as follows.

You'll probably come back to say, But I was talking about 'ordinary TrueType fonts'.

No I won't. It's not my personality type to do so. Have a look at the Myers Briggs Type Indicator for personality type; the key message is that not everybody has the same personality type. I may argue a point if I consider it right to do so, but I do not argue something just for the sake of arguing or because of some notion of not being willing to lose face or something like that in accepting that I did not previously know something. I mean, that is pointless and is a waste of time. Anyway, it is not my nature to be like that. So, I did not know the correct situation and you have helped me by explaining more about it. Thank you.

If you insist on an invalid assumption, there's no way to argue against it. It's like saying, software with a character-mode UI is not capable of displaying bitmap graphics -- true, but irrelevant.

But I won't; it's not my personality to do so. I genuinely did not understand and I am grateful to you for explaining the matter to me.

If you really want a dialog box to popup providing notification to the user, I'm wondering how many times as the file is opened and a page is rendered you'd like this popup to appear?

Once. A notification in a dialogue box that the problem exists, with a button to click for further detailed information as to which character or characters, how many times for each, and on which pages and lines.

17 times if there are 17 instances of c, ZWJ, t that are not rendered as a ct ligature?

No, just the once.

Not on my system, thank you.

Certainly not! Thank you for explaining the matter about the TrueType fonts. William Overington 7 November 2002
Re: ct, fj and blackletter ligatures
John Hudson wrote as follows. At 02:18 11/5/2002, William Overington wrote: Not at 02:18, it was 09:18. Well, I suppose it depends upon what one means by a file format that supports Unicode. The TrueType format does not support the ZWJ method and thus does not provide means to access unencoded glyphs by transforming certain strings of Unicode characters into them. All three of the current 'smart font' formats are extensions of the TrueType file format. Structurally, the only difference between a TrueType font and an OpenType font is the presence of *optional* layout tables that support glyph substitution and positioning. Officially, the only difference is the presence of a digital signature. I am unsure as to whether, in formal terms, TrueType is a file format that supports Unicode as it does not allow the ZWJ sequences to be recognized. Of course TrueType allows ZWJ sequences to be recognised. ZWJ is a character that can appear in Unicode text and in the Unicode cmap of a TrueType font. If a font does not contain a ligature for the sequence, or does not contain layout information to render the sequence as a ligature, the text is still processed according to the Unicode Standard, i.e. nothing happens. I am thinking here of ordinary TrueType fonts on a Windows 95 platform and on a Windows 98 platform. I was under the impression that the reason that an ordinary TrueType font will not process a ZWJ sequence on those platforms was that both the operating system and the ordinary TrueType font do not have the capabilities to process ZWJ sequences. My understanding is that even an OpenType font with ZWJ sequence facilities will not work on a Windows 95 or Windows 98 platform. However, I thought that the ordinary TrueType format would not support ZWJ sequences in itself and that not only would a later operating system be needed but that also an OpenType font would be needed and that an ordinary TrueType format would not be able to do the job. Was I wrong in that thinking? 
My experience of fonts is very limited. I have tried making a few example TrueType fonts using the Softy shareware facility and I wonder whether I have got it wrong as to what an ordinary TrueType font will do when an ordinary TrueType font is made with an expensive professional font making program.

To say that a font only supports Unicode if it can process and render as a ligature every usage of the ZWJ character is foolish: every font would have to contain glyphs and substitution lookups to support every potential use of ZWJ in every possible c+ZWJ+i+ZWJ+r+ZWJ+c+ZWJ+u+ZWJ+m+ZWJ+s+ZWJ+t+ZWJ+a+ZWJ+n+ZWJ+c+ZWJ+e.

I have had a long think about this. Suppose that a sequence of Unicode characters in a plain text file is mostly in English and has the sequence c ZWJ t in it at various places. Suppose that the font is an advanced format font which does not have a special glyph for the sequence c ZWJ t yet will simply render it as ct just as if the ligature had not been requested. As far as I know, there is no requirement in Unicode that the rendering system should notify the end user, perhaps using an Alert dialogue box or similar, that the ZWJ request has been made yet not fulfilled. Can an advanced format font supply such a message to the rendering system for onward notification of the end user? It seems to me that having the ZWJ mechanism in the Unicode Standard yet having no reporting mechanism if a specific request is not fulfilled is unfortunate. As a font could have its own set of ZWJ sequences which it recognizes, anything from an empty set to a set consisting of a full complement of ligatures for Fraktur, it seems to me that whilst every font would certainly not have to contain glyphs and substitution lookups to support every potential use of ZWJ in every possible circumstance, it would not be unreasonable to hope that fonts could have a standardized reporting mechanism as to whether a request for a particular ZWJ sequence has been fulfilled.
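The reporting wished for here could, in principle, be prototyped outside the font machinery simply by scanning the text for ZWJ ligature requests and listing them for the user. The sketch below is my own illustration of that idea, not an existing Unicode or font-format facility:

```python
# Scan text for ZWJ (U+200D) ligature requests such as c ZWJ t, and
# report each one, so that a renderer (or a proofing tool) could tell
# the user which requests it did not honour.
ZWJ = "\u200D"

def zwj_requests(text: str) -> list:
    """Return (index, 'xy') for each x ZWJ y sequence found in text."""
    found = []
    i = text.find(ZWJ)
    while i != -1:
        if 0 < i < len(text) - 1:
            found.append((i - 1, text[i - 1] + text[i + 1]))
        i = text.find(ZWJ, i + 1)
    return found

sample = "connec\u200Dt distinc\u200Dt"
print(zwj_requests(sample))   # [(5, 'ct'), (15, 'ct')]
```

A tool like this could produce the single summary dialogue discussed above: one notification, with the character pairs and their positions behind a button for further detail.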
Also, perhaps there could be a method for asking a font to please display all its ZWJ sequences and their results. Now it might be that some advanced font formats can do such things; I do not know at present. While on this topic, perhaps a standardized method of a font reporting that it has no glyph for a character which it is asked to render might be a good idea. I am aware that a black outline box could be displayed, yet in a long document one of those might easily slip past a general viewing of the text in a printshop. Also, perhaps some method of asking a font to declare a list of the code points for which it has a specific glyph would be helpful. Again, perhaps some advanced font formats have these abilities; I do not know at present. There seems to be a gap between the Unicode Technical Committee encoding characters into a file and the process of making sure that the desired text is rendered correctly on an end user's platform with good provenance. I feel
A .notdef glyph (derives from Re: ct, fj and blackletter ligatures)
John Hudson wrote as follows.

Here's an exercise for your enthusiasm, William: devise the form of the perfect .notdef glyph. It needs to unambiguously indicate that a glyph is missing, i.e. it should not be something that can easily be mistaken for a dingbat, and it needs to be easy to spot in proofreading in both print and onscreen (some applications, e.g. Adobe InDesign, make the latter a bit easier by applying colour highlight to the .notdef glyph).

Thank you for the design brief. Here is my design. The design consists of a single contour in as large a square box as is possible for the particular font. In my prototype I used a box 2048 font units by 2048 font units. In this case, the value of n is 1024. The contour has seven points, the first point and the last point being at the same place. Point 1 is at (0,0) and is on the curve. Point 2 is at (0,2n) and is off the curve. Point 3 is at (2n,2n) and is on the curve. Point 4 is at (2n,n) and is on the curve. Point 5 is at (n,n) and is on the curve. Point 6 is at (n,0) and is on the curve. Point 7 is at (0,0) and is on the curve. This has the effect of making the glyph easy to draw, solid enough to be specifically noticeable, distinctively shaped with both a curved line and straight lines so that it stands out, and with an arc which goes against the normal arc of design of a graphical user interface of the input screen of a computer program, so as also hopefully to make it more noticeable. In addition, the design has white space set out in a manner such that where several copies of the glyph appear in sequence on a page of text, they are easily counted. I hope that you like the design. William Overington 6 November 2002
Re: ct, fj and blackletter ligatures
Thomas Lotze wrote as follows.

William Overington wrote: I don't know for certain but I suspect that it is that font designers do this so that people can use an application such as Microsoft Paint to produce an illustration using the font. In the absence of regular Unicode code points for the ligatures, a font designer has either to use the Private Use Area and be Unicode compatible or make a non-Unicode compatible font, if the font designer wishes people to be able to have direct access to the ligature characters.

Judging from what I've learned by now, this is not true: If a font designer wants to make a Unicode-compatible font, he has to use a font file format that supports Unicode, and those formats provide means to access unencoded glyphs by transforming certain strings of Unicode characters into them.

Well, I suppose it depends upon what one means by a file format that supports Unicode. The TrueType format does not support the ZWJ method and thus does not provide means to access unencoded glyphs by transforming certain strings of Unicode characters into them. I am unsure as to whether, in formal terms, TrueType is a file format that supports Unicode, as it does not allow the ZWJ sequences to be recognized. Please note that my sentence did have if the font designer wishes people to be able to have direct access to the ligature characters. However, certainly, a font designer using an advanced font format may well not wish people to be able to have direct access to the ligature characters. The paragraph was replying to your question as to why someone who wants to set and print out a page of Fraktur at present is in practice likely to have to use a font with the ligatures encoded with code points less than 255.
Please know that I am not seeking to be pedantic over the meaning of the phrase a file format that supports Unicode; it is just that I get the impression that you might possibly have not quite understood that some font formats widely used for Unicode encoded characters, such as the TrueType format, do not support the ZWJ glyph substitution process or, in fact, any glyph substitution process, such as noticing the two letter ct sequence and substituting a ct ligature glyph within the font.

And if I understand it correctly, Unicode compliance can only be achieved with all of compliant documents, fonts, and renderers. So there appears to be no need for direct accessibility of ligatures, alternates etc.

I said compatible, I did not say compliant and did not mean compliant. I was meaning compatible, in the sense that, if one wishes to produce a font using the TrueType format and that font is to include glyphs for ligatures such as ct and ppe, how does one do it so that the method used does not conflict with Unicode. Using Private Use Area code points avoids conflicting with the regular Unicode code points used for other characters.

There are some articles about using WordPad and Paint to produce graphic effects with large characters and gold textures and so on in our family webspace, together with the gold texture file and some other texture files too.

And what's the relevance to Unicode of that?

Well, in direct terms probably nothing. However, as this is a widely distributed mailing list it might be that some readers, having read about the matter of using ligature characters in Paint and the way that one needs a font with code points less than 255 in order to access the ligature characters from Paint, might like to have a go at producing such graphics, so, having available some articles on the matter, I mentioned them. If one considers the Gutenberg sample font, the ct ligature is available as well, at Alt 0201 using Paint.
One could use Wordpad to get the character as well. Yet, suppose that one has an advanced format font with a ct glyph within it yet where the font does not provide a direct code point access glyph, but only allows a ct ligature to be displayed using a combination of computer hardware and software which supports the advanced font format. How is one going to get that ct ligature to display if one does not have access to that hardware and software combination? Now certainly the attempt has been made to trivialise the matter by reference to very very old computer systems, yet here the problems arise with PCs manufactured in 1999. May I add that this posting is trying to be helpful to answer questions which you have posed, I am not seeking to reopen the discussion of whether the Unicode Technical Committee should encode any more precomposed ligatures. I raised that issue before the August 2002 meeting of the Unicode Technical Committee, the committee discussed the matter at the meeting, formed a consensus view and that consensus view was minuted and the minutes have been published. It is simply a matter that the Unicode Technical Committee is not going to encode any more ligatures, I have my golden ligatures collection on the web and if people choose
Re: ct, fj and blackletter ligatures
Thomas Lotze asked. Why below 255? I don't know for certain but I suspect that it is that font designers do this so that people can use an application such as Microsoft Paint to produce an illustration using the font. In the absence of regular Unicode code points for the ligatures, a font designer has either to use the Private Use Area and be Unicode compatible or make a non-Unicode compatible font, if the font designer wishes people to be able to have direct access to the ligature characters. There is an interesting experiment which one can try if one wishes. At the http://www.waldenfont.com website there are various Fraktur fonts for sale. There is a bundle of sample fonts available for download which have only some of the letters and ligatures in the fonts. The Gutenberg font has the ppe ligature within it and indeed a number of other ligatures and abbreviations and, in fact, a complete set of ten digit characters. There is the manual gbpmanual.pdf available for download as well. On page 14 of that document the ppe ligature is listed as being at 0171. If on a PC one installs the sample Gutenberg font, starts the Microsoft Paint program and draws some text, selecting the Gutenberg font, and then holds down the Alt key and keys 0171 using the digit keys at the far right of the keyboard, hopefully the ppe ligature in the Gutenberg font will appear on the screen. In fact Paint only allows text up to 72 point. However, if one uses WordPad, then one can make the text something like 200 point in size if one wishes and use the Print Screen facility to copy the display image onto the clipboard. One can then paste the image from the clipboard into Paint so that one then has a 200 point Gutenberg ppe ligature in the Paint program. There are some articles about using WordPad and Paint to produce graphic effects with large characters and gold textures and so on in our family webspace, together with the gold texture file and some other texture files too. 
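One point worth noting about the Alt 0171 method: what gets stored in the document is the code page character at that position, not a ppe ligature character; the font merely draws its ppe glyph at that slot, which is why such a font is not Unicode compatible. Assuming the common Windows-1252 ANSI code page, a quick Python check shows which character Alt+0171 actually supplies.

```python
# Keying Alt+0171 (with the leading zero) on Windows enters byte 171
# (0xAB) of the ANSI code page. Under Windows-1252 that byte is a
# guillemet, not a ligature; only the font makes it look like ppe.
ch = bytes([171]).decode("cp1252")
print("U+%04X %s" % (ord(ch), ch))  # U+00AB «
```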
http://www.users.globalnet.co.uk/~ngo William Overington 4 November 2002
Re: Names for UTF-8 with and without BOM
As you have UTF-8N, where the N stands for the word no, one could possibly have UTF-8Y, where the Y stands for the word yes. Thus one could have the name of the format answering, or not answering, the following question. Is there a BOM encoded? However, using the letter Y has three disadvantages for widespread use. The letter Y could be confused with the word why, the word yes is English, so the designation would be anglocentric, and the letter Y sorts alphabetically after the letter N. However, if one considers the use of the international language Esperanto, then the N would mean ne, that is, the Esperanto word for no, and thus one could use the letter J to stand for jes, the Esperanto word for yes, which, in fact, is pronounced exactly the same as the English word yes. Thus, I suggest that the three formats could be UTF-8, UTF-8J and UTF-8N, which would solve the problem in a manner which, being based upon a neutral language, will hopefully be acceptable to all. William Overington 2 November 2002
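The question the name answers ("Is there a BOM encoded?") is easy to test mechanically. Here is a minimal Python sketch of classifying a byte stream under the proposed designations; the function names are my own, and the UTF-8J/UTF-8N labels are, of course, only the suggestion made above, not standard names.

```python
# The BOM is U+FEFF, which UTF-8 encodes as the three bytes EF BB BF.
BOM = b"\xef\xbb\xbf"

def classify_utf8(data: bytes) -> str:
    # Return the proposed designation for a UTF-8 byte stream.
    return "UTF-8J" if data.startswith(BOM) else "UTF-8N"

def decode_utf8_any(data: bytes) -> str:
    # Decode either variant, silently stripping a leading BOM if present.
    if data.startswith(BOM):
        data = data[len(BOM):]
    return data.decode("utf-8")

print(classify_utf8(BOM + "jes".encode("utf-8")))  # UTF-8J
print(classify_utf8("ne".encode("utf-8")))         # UTF-8N
```

Python itself makes the same distinction with the codec names utf-8 and utf-8-sig rather than with letter suffixes.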
Re: ct, fj and blackletter ligatures
The matter of ligatures arises fairly often in this discussion forum, often in relation to German Fraktur, but also in relation to English printing of the 18th Century and the use of fj in Norwegian. In relation to regular Unicode the policy is that no more ligatures are to be encoded. My own view is that this should change. However, that is unlikely to happen. Earlier this year, following from a posting about Fraktur ligatures, I produced some encodings for ligatures using the Private Use Area. I have published them on the web at the following place. http://www.users.globalnet.co.uk/~ngo/golden.htm These are my own Private Use Area code point allocations for various ligatures. They are not in any way a standard, yet they are a consistent set which may be useful to those who wish to use them. The only use I know of any of them in a published font is in the Code2000 font, produced by James Kass. James uses the code points of this set for ct, fj and ffj in his Code2000 font. I feel that it might well be of interest to you, for your background knowledge, to have a look at the encodings which I have produced, yet I mention that these Private Use Area encodings are a matter of some controversy. Using them could lead to documents existing which could not be text sorted alphabetically, or spellchecked. However, if someone is just wishing to produce a printout of some text with some ligatures in the text, then the golden ligatures collection can be useful. There seem to be a lot of theoretical possibilities for doing ligatures with Unicode fonts using advanced font technology on the latest computers, yet if, say, someone wants to set and print out a page of Fraktur, that possibility does not seem, as far as I know, to be a practically achievable result at the present time using a piece of text encoded in regular Unicode using a font which uses only regular Unicode encoding. 
Indeed, it seems more likely that one would need to use a Fraktur font with ligatures encoded with a code number below 255, that is, a font which is not Unicode compatible. The golden ligatures collection is Unicode compatible, though, as I say, it is not a standard. It is just one person's self-published writing. I like to think of it as an artform, much as if I had produced a painting and placed a copy of the painting on the web. That is, it exists, it may be interesting to people, yet it does not in any way prevent anyone else from doing something different and it does not require anyone else to take any notice of it, yet it is a cultural item in the world of art. So, it depends what one is wanting to do. If your enquiry is solely in relation to formal encoding of ligatures in regular Unicode, then the golden ligatures collection will be of no use to you. However, if you are producing a black letter font as part of your studies and would like to encode ligatures, then the golden ligatures collection might perhaps be of interest to you. For example, if such a font were encoded using advanced font technology, then the golden ligatures collection code points would not be the way to approach the problem, though they could, if you so chose, be used to provide an additional way of accessing the glyphs for people who were trying to produce printouts using, say, a Windows 95 or a Windows 98 system. If, however, such a font were produced as an ordinary TrueType font, then in order to access the ligature glyphs you would need code points, one code point for each glyph. In order to be Unicode compatible, those code points would need to be in the Private Use Area range of U+E000 to U+F8FF. There is essentially complete freedom of choice as to which code points to use, though the lower part is perhaps best due to the suggestions about Private Use Area usage in the Unicode specification. 
However, the golden ligatures collection of code points is there for your consideration if you wish. Within my collection of code point allocations, ct is U+E707, fj is U+E70B, ch is U+E708, ck is U+E709, tz is U+E70F. These are all in the following document. http://www.users.globalnet.co.uk/~ngo/ligature.htm The ffj is encoded at U+E773 in the following document. http://www.users.globalnet.co.uk/~ngo/ligatur2.htm There are some black letter ligature encodings including pp at U+E76C and ppe at U+E77E in the following document. http://www.users.globalnet.co.uk/~ngo/ligatur5.htm The Private Use Area is described in Chapter 13, section 13.5 of the Unicode specification. There is a file named ch13.pdf available from one of the pages in the http://www.unicode.org website. The main index page of our family web site is as follows. http://www.users.globalnet.co.uk/~ngo William Overington 2 November 2002
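For anyone wishing to experiment with these allocations, a small Python sketch follows. The code points are the ones listed above from the golden ligatures collection; the longest-match substitution strategy is my own illustration and not part of the collection itself.

```python
# Private Use Area assignments from the golden ligatures collection,
# as listed in the posting above. These are not a standard.
GOLDEN = {
    "ffj": "\uE773",
    "ppe": "\uE77E",
    "ct": "\uE707",
    "fj": "\uE70B",
    "ch": "\uE708",
    "ck": "\uE709",
    "tz": "\uE70F",
    "pp": "\uE76C",
}

def apply_ligatures(text: str) -> str:
    # Try longer sequences first so that, for example, ffj wins over fj
    # and ppe wins over pp.
    for seq in sorted(GOLDEN, key=len, reverse=True):
        text = text.replace(seq, GOLDEN[seq])
    return text

print(apply_ligatures("perfect fjord"))
```

A font carrying glyphs at these Private Use Area code points, such as Code2000, would then display the substituted text with the ligature glyphs.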
Re: New Charakter Proposal
Kenneth Whistler wrote the following. I think Markus's suggestion is correct. If you want to do something like this internally to a process, use a noncharacter code point for it. If you want to have visible display of this kind of error handling for conversion, then simply declare a convention for the use of an already existing character. My suggestion would be: U+2620. ;-) Then get people to share your convention. I find this suggestion curious, particularly coming as it does from an officer of the Unicode Consortium. The U2600.pdf file has U+2620 under Warning signs and has = poison in its description. Suppose for example that the source document encoded in UTF-8 is a document about chemicals found around the house and that the U+2620 character is used to indicate those which are poisonous. If U+2620 is also used to include in visible form an indication of an error found during decoding, then finding a U+2620 character in the decoded document would lead to an ambiguous situation. One solution would be for the Unicode Consortium to encode an otherwise unused character especially for the purpose. If, however, the way forward is for an individual to declare a convention, then I suggest that a sequence of at least two characters, the first being a base character and the one or more others being combining items, be used so as to produce an otherwise highly unlikely sequence of characters. For example, the character U+0304 COMBINING MACRON could be a good choice, as it could be used to indicate a Boolean not condition with a character which is otherwise unlikely to carry an accent. As to which character to use for the base character, I am undecided, however it should, in my opinion, not be U+2620 as that is a warning sign meaning poison and could lead to confusion if looking at a document. The advantage of a two character sequence is that a special piece of software may be used to parse all incoming documents. 
Only occurrences of the otherwise highly unlikely sequence will be regarded as indicating a conversion problem with the encoding. If either of the two characters used for the sequence is encountered other than with the rest of the sequence, then it will not indicate the special effect. In my comet circumflex system I use a three character detection sequence. This means that in order to enter the markup universe then all three characters of the sequence need to be present in sequence. Thus, a piece of software can scan all incoming text messages, even those which are not designed to fit in with the comet circumflex system, and not indicate a comet circumflex message if, say, a U+2604 COMET character arrives as part of a message. Using a two or three character sequence which is otherwise highly unlikely to occur is, in my opinion, a good way to indicate the presence of a special feature as it allows one to monitor all text files for the special feature without causing undesired responses on text files which have been prepared without any regard to the special feature. I feel that the influence of posting a suggestion in this mailing list is often greatly underestimated. If you do post a suggested two or three character sequence for the purpose that you seek, perhaps, if you wish, after further discussion in this group, my feeling is that that sequence may well become well known and accepted for the purpose very quickly, simply because where there is a need for such a sequence then, in the absence of any good reason not to do so, people will often happily use the suggested format. William Overington 1 November 2002
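The scanning idea can be sketched briefly in Python. The particular sentinel used here, COMET followed by COMBINING CIRCUMFLEX ACCENT and COMBINING MACRON, is my own illustrative choice of an otherwise highly unlikely sequence, not one proposed in the postings above.

```python
import codecs

# An otherwise highly unlikely three character sequence used to mark,
# in visible form, a place where decoding failed.
SENTINEL = "\u2604\u0302\u0304"

def _sentinel_handler(err):
    # Codec error handler: substitute the sentinel, resume after the
    # undecodable bytes.
    return SENTINEL, err.end

codecs.register_error("sentinel", _sentinel_handler)

def mark_errors(raw: bytes) -> str:
    # Decode UTF-8, marking each undecodable run with the sentinel.
    return raw.decode("utf-8", errors="sentinel")

def has_conversion_error(text: str) -> bool:
    # Only the complete sequence triggers; a lone U+2604 COMET does not.
    return SENTINEL in text

print(has_conversion_error(mark_errors(b"ok \xff end")))  # True
print(has_conversion_error("\u2604 is just a comet"))     # False
```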
Re: The comet circumflex system.
fascinating to get immersed in the simulation. Hopefully it can go live on the web when I can get it finished, then I can respond in comet circumflex language to any emails in comet circumflex language which arrive from within the simulation. The use of the Songs about Landscape font is remarkably effective in producing a web site for the purpose of the simulation, as headings and paragraphs can be set out. William Overington 30 October 2002
Re: Character identities
Summary: Would it be possible to define the U+FE00 variant sequence for a with two dots above it to be a with an e above it, and similarly U+FE00 variant sequences for o with two dots above it and for u with two dots above it, and possibly for e with two dots above it as well? I may not have got the details right about this suggestion, but, if the general idea is thought good, I am sure that one of the experts on this list could codify it properly. It seems to me that there is middle ground between the two views being expressed. Suppose, for example, hypothetically, that there is a font available in Germany, named Volksmusik, which is a display font intended for setting headings in modern German, such as for the headings in advertisements for restaurants and so on, and that in that font the a umlaut, o umlaut and u umlaut are all expressed using a mark which is something like a small letter e. Now suppose that a theatre restaurant manager has set out the text required for a menu for some special gala evening to be held soon, using a plain text editor on a PC with a font such as Arial, with a umlaut characters appearing many times, sometimes in headings and sometimes in the main body of the text. The manager stores the text on a floppy disc, walks down the road to the print shop and explains to the print shop manager that here is the text content for the menus in Arial: could the print shop please supply 500 menus using that text content, yet jazzing it up a bit so that the heading on each of the four pages is in a fancy typeface in a different colour? It should then be quite straightforward for the print shop manager to copy the text onto the clipboard from the Arial file, paste it into some other file, change the font for each of the page headings to the Volksmusik font, and make the font for the rest of the menu some plainer font. 
Thus, some a umlaut characters originally keyed by the restaurant manager would display on the final menu as a with two dots above and some a umlaut characters keyed by the restaurant manager would display on the final menu as a with a small letter e above. The restaurant manager is, however, studying part-time for a research degree at the local university. This involves producing essays about various aspects of the printing of German literature, including quoting passages from earlier times, taking care to distinguish clearly between a with two dots above it and a with an e above it, all while using a plain text file, so that there is maximum portability in sending copies of the essay to various people, including the project supervisor at the university and the editors of various learned journals. How is the a with an e above it set, bearing in mind that there is no precomposed a with an e accent above character in regular Unicode and also that it would be nice if the text could be searched for keywords using just the usual search methods? Would it be possible to define the U+FE00 variant sequence for a with two dots above it to be a with an e above it, and similarly U+FE00 variant sequences for o with two dots above it and for u with two dots above it, and possibly for e with two dots above it as well? I may not have got the details right about this suggestion, but, if the general idea is thought good, I am sure that one of the experts on this list could codify it properly. William Overington 30 October 2002
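The suggested sequences can at least be written down and tested for the searchability property the posting asks about. A Python sketch follows, with the caveat that the variant meanings below are purely the posting's suggestion: Unicode only gives effect to variation sequences that are listed in the standard, and these are not among them.

```python
VS1 = "\uFE00"  # VARIATION SELECTOR-1

# Letters with two dots above for which an "e above" variant glyph is
# suggested in the posting: a, o, u and possibly e.
SUGGESTED = "\u00E4\u00F6\u00FC\u00EB"

def e_above_variant(ch: str) -> str:
    # Append VS1 to request the hypothetical "small letter e above" form.
    if ch not in SUGGESTED:
        raise ValueError("no variant suggested for this character")
    return ch + VS1

# The plain-text searchability goal: the base letter is still present,
# so an ordinary keyword search for a umlaut still matches.
word = "G" + e_above_variant("\u00E4") + "ste"
print("\u00E4" in word)  # True
```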
Re: Unicode plane 14 language tags.
John Cowan commented. William Overington scripsit: It seems to me that deprecating these language tags might be a bad thing as the language tags could well have potential use in plain text files on the DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) platform in order to signal to a Java program accessing a text file the language in which any particular text is written. Of course, deprecation does not mean that the characters cannot be used, still less (what it means in most standards bodies) that they may be removed in future. Once in Unicode, always in Unicode. Oh, that is interesting. So what exactly is the public consultation about deprecating the plane 14 language tags about? If the Unicode Technical Committee decided to deprecate the plane 14 language tags, what would be the effect of that decision? Nevertheless, on the facts described, I agree that this is an appropriate use of Plane 14. However, I am somewhat skeptical that the facts *are* as described: is it really the case that *plain* text files are being used here? The DVB-MHP platform is a broadcast platform. Java programs are broadcast in a unidirectional, cyclic manner so as to produce effectively a disc in the sky. This uses my telesoftware invention. The word telesoftware, and its etymology, are in the Oxford English Dictionary, second edition, volume 17. The telesoftware concept was also featured in the USA in the first issue of the magazine Personal Computing, published back in the 1970s. The Java programs are authored by content authors. The Java programs may be self-contained or may use support files. The format of those support files is up to the author of each Java program, though some formats such as png (Portable Network Graphics) have special standing. Plain text files are one of the choices which a content author may choose to use. A content author could also use a fancy text format if he or she so chooses. 
I am not suggesting that all the files used by the Java programs which are broadcast as telesoftware programs will be plain text, only that plain text files could be used. As the DVB-MHP system uses Java, and Java uses Unicode, then the DVB-MHP system uses Unicode, and what is contained in Unicode is thus of interest to content authors who would like to author content for the DVB-MHP platform. The DVB-MHP system is up and running on a regular basis in Finland and Germany. There is worldwide interest in the DVB-MHP system. Certainly, from my own perspective, I feel that plain text files may be very important for information content upon the DVB-MHP channel. I feel that language tags could be very useful as a feature in such use. William Overington 29 October 2002
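As a sketch of how the Plane 14 mechanism under discussion works: a tag begins with U+E0001 LANGUAGE TAG, followed by copies of the ASCII tag text shifted up into the tag-character range U+E0020..U+E007E, and U+E007F CANCEL TAG ends the tagged span. The helper functions below are my own illustration of how a program receiving broadcast text might read such a tag; they are not from any DVB-MHP specification.

```python
LANGUAGE_TAG = "\U000E0001"
CANCEL_TAG = "\U000E007F"

def tag_language(text: str, lang: str) -> str:
    # Prefix text with a Plane 14 language tag such as "de" or "fi".
    # Each ASCII tag character is shifted up by 0xE0000.
    shadow = "".join(chr(0xE0000 + ord(c)) for c in lang)
    return LANGUAGE_TAG + shadow + text + CANCEL_TAG

def read_language(tagged: str) -> str:
    # Recover the language code from a tagged string, or "" if untagged.
    if not tagged.startswith(LANGUAGE_TAG):
        return ""
    lang = []
    for ch in tagged[len(LANGUAGE_TAG):]:
        if 0xE0020 <= ord(ch) <= 0xE007E:
            lang.append(chr(ord(ch) - 0xE0000))
        else:
            break
    return "".join(lang)

print(read_language(tag_language("Guten Tag", "de")))  # de
```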
Re: Unicode plane 14 language tags.
Doug Ewell wrote as follows. [snip] Right off the bat, though, I thank the UTC for initiating this public review process which allows non-members like me to get their two cents in regarding Unicode policies. (Hmm, two American-specific figures of speech in one sentence -- perhaps it should have been tagged en-US.) Yes, I too am grateful for this public review process. I do note however that review 3 refers to a document which is only available to Unicode Consortium members, which seems a strange thing if views of interested individuals are being sought. Also, it is a pity that this new era of Unicode glasnost (displayed with a ligature? :-) ) comes so shortly after the last Unicode Technical Committee meeting the minutes of which state the consensus about no more ligatures being added to the U+FBxx block. Surely the matter of ligatures would be a good topic upon which to conduct such a public review. So, I wonder if at the meeting due to be held from 5 November 2002, perhaps it might please be considered as to whether consultation 5 - precomposed ligatures could be made a topic of a public review in this manner ready to be considered at the Unicode Technical Committee meeting after that meeting, so that there is the time and opportunity for widespread consideration to take place. William Overington 29 October 2002
The comet circumflex system.
Readers interested in internationalization using Unicode might like to know that I have recently added some documents about the comet circumflex system to the web. The introduction and index page are as follows. http://www.users.globalnet.co.uk/~ngo/c_c0.htm The main index page of the webspace is as follows. http://www.users.globalnet.co.uk/~ngo William Overington 29 October 2002
Re: Character identities
John Hudson commented. At 02:46 10/26/2002, William Overington wrote: I don't know whether you might be interested in the use of a small letter a with an e as an accent codified within the Private Use Area, but in case you might be interested, the web page is as follows. http://www.users.globalnet.co.uk/~ngo/ligatur5.htm I have encoded the a with an e as an accent as U+E7B4 so that both variants may coexist in a document encoded in a plain text format and displayed with an ordinary TrueType font. If anyone were interested, he could do this himself and use any codepoint in the Private Use Area. The meaning which I intended to convey was as follows. I don't know whether you might be interested in having a look at a particular example of the use of a small letter a with an e as an accent codified within the Private Use Area by an individual with an interest in applying Unicode, but in case you might be interested in having a look at that particular example, the web page is as follows. If, following from your response to the way that you read my sentence, someone were interested in defining a codepoint in the Private Use Area then certainly he or she could do that himself or herself and use any codepoint in the Private Use Area. However, exercising that freedom is something which could benefit from some thought. If someone wishes to encode an a with an e as an accent in the Private Use Area, he or she may wish to be able to apply that code point allocation in a document. If he or she looks at which Private Use Area codepoints are already in use within some existing fonts, then selecting a code point which is at present unused in those fonts might give a greater chance of his or her new character assignment being implemented than choosing a code point for which those fonts already have a glyph in use. Searching through such fonts takes time and requires some skill. 
If someone does wish to use a Private Use Area code point for an a with an e accent, then using U+E7B4 gives a possible slight advantage in that the code point is already part of a published set of code points available on the web, for, even though that set of code points is not a standard, it is a consistent set and other people might well use those codepoints as well. However, anyone may produce and publish such a set of code point allocations of his or her own if he or she so wishes, or indeed keep them to himself or herself. Yet I was not seeking to make any such point in my posting. I simply added to a thread on a specialised topic what I thought might be a short interesting note with a link to a web page at which some readers might like to look. The web page indeed provides two external links to interesting documents on the web. Maybe it is time to include a note in the Unicode Standard to suggest that 'Private' Use Area means that one should keep it to oneself. Well, at the moment the Unicode Standard does include the word publish in the text about the Private Use Area. I have published details of various uses of the Private Use Area on the web yet not mentioned them in this forum. For example, readers might perhaps like to have a look at the following. http://www.users.globalnet.co.uk/~ngo/ast07101.htm Anyone who chooses to do so might like to have a look at the following file as well, which introduces the application area. http://www.users.globalnet.co.uk/~ngo/ast02100.htm This is an application of the Unicode Private Use Area so as to produce a set of soft buttons for a Java calculator so that the twenty hard button minimum configuration of a hand held infra-red control device for a DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) television can be used in a consistent manner to signal information from the end user to the computer in the television set. I am very pleased with the result. 
The encoding achieves a useful effect while being consistent for information handling purposes with the Unicode specification, so that an input stream of characters may be processed by a Java program without any ambiguity over whether a particular code point is a printing character or a calculator button (or indeed mouse event or simulated mouse event as mouse events are also encoded using the Private Use Area in my research). William Overington 29 October 2002
Re: Character identities
I don't know whether you might be interested in the use of a small letter a with an e as an accent codified within the Private Use Area, but in case you might be interested, the web page is as follows. http://www.users.globalnet.co.uk/~ngo/ligatur5.htm I have encoded the a with an e as an accent as U+E7B4 so that both variants may coexist in a document encoded in a plain text format and displayed with an ordinary TrueType font. http://www.users.globalnet.co.uk/~ngo William Overington 25 October 2002
Unicode plane 14 language tags.
On the http://www.unicode.org/ website is a link entitled Public Issues for Review which link leads to the http://www.unicode.org/review/ web page. The first such issue upon which comments are invited is the following proposal. Deprecate the Plane 14 Language Tags It seems to me that deprecating these language tags might be a bad thing as the language tags could well have potential use in plain text files on the DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) platform in order to signal to a Java program accessing a text file the language in which any particular text is written. At the present time I have no plans to use the Unicode language tags myself, yet it does seem to me a pity that just as DVB-MHP, which uses Unicode, is starting to be run in more than one country that an existing method of encoding information about languages is possibly to be formally deprecated. Now, there may be good reasons for the deprecation, yet none are stated on that web page. I feel that I would like to mention the matter of the possibility of using language tags upon the DVB-MHP platform so that that can be taken into account by the Unicode Technical Committee when it discusses the matter. Certainly I am hoping to send in an informed comment upon the matter in the manner mentioned on the web page using the online contact form. However, before doing so, I am wondering if perhaps the reasons for suggesting the deprecation of plane 14 language tags could please be discussed in this mailing list. DVB-MHP broadcasts have recently begun in Germany, there is information on the http://www.mhp-forum.de website. The text information is in German, though there are lots of pictures and for many of them clicking upon them enlarges them. I found the language translation facility at http://www.google.com very useful for translating the text. Germany follows Finland in introducing regular DVB-MHP broadcasts. 
Information on the DVB-MHP system is available at the http://www.mhp.org website, in English. There is also the discussion forum at the http://forum.mhp.org website. William Overington 26 October 2002
Re: XML Primer (was Keys. (derives from Re: Sequences of combining characters.))
Shawn Steele wrote to the [EMAIL PROTECTED] list, not directly to me, yet began by writing. Mr. Overington, There is then a long document of very helpful information, for which I am grateful. Mr Steele then concludes with the following. I hope that this example improves your understanding of XML and how it may be applied to your inventions. As others have mentioned, this topic is digressing from the purpose of this message board and would be best discussed off line or in a different forum. Well, a letter addressed to me could have been sent by private email. - Shawn Shawn Steele Software Developer Engineer Microsoft Unfortunately, this is then followed by the following. My comments in no way endorse the original Well, that is fine, the letter has been posted to the Unicode list from a Microsoft address, so a clarification makes the situation clear just in case anyone had thought that in some way it might. and are not intended to confer legitimacy, Ah! That is not fine. The original is entirely legitimate and there is no need for legitimacy to be conferred at all, also the conferring of legitimacy is not something which is within the powers of Microsoft to confer, as Microsoft is a corporation and does not vote in public elections, let alone have jurisdiction in such matters. Mentioning legitimacy in that way in a document from Microsoft, a member of the Unicode Consortium, is very unfair. rather they are merely intended to be educational. Well, they are merely intended to be educational. No rather about it. This posting is provided AS IS with no warranties, Well, that is fine. and confers no rights. What rights are being referred to here? William Overington 27 September 2002
Re: Keys. (derives from Re: Sequences of combining characters.)
Peter Constable commented as follows. On 09/26/2002 06:05:45 AM William Overington wrote: Dallas is 6 hours behind England on the clock. I'm going to refrain from commenting on anything beyond the markup issues As you wish. Though did you stick to that even in the same sentence? -- and I'm continuing with that only because it's an easy follow-on to what I already wrote, As you wish. even though there is every indication that the sensibility of it will be ignored. This did not appear to have meaning. I checked on the meaning of the word sensibility just to make sure. Did you intend to convey the meaning "the good sense of what I write" rather than "the sensibility of it"? Yet what indication whatsoever do you have that I ignore what you write? I do not always agree with you, yet where specific references to documents on the web are made I always attempt to obtain them and study the points you make. Certainly, I may not agree with you. Sometimes I agree, sometimes I do not agree and sometimes I am undecided in a matter. That surely is the nature of critical scholarship and research. A document would contain a sequence such as follows. U+2604 U+0302 U+20E3 12001 U+2460 London U+2604 U+0302 U+20E2 You could just as easily have used S C=12001London/S or S C=12001 P1=London/ which are only slightly more verbose, but which follow a widely-implemented standard that can be parsed by lots of existing software, for which there are a large number of tools available, and which a vast number of individuals, businesses and other agencies have an interest in. Your markup convention is completely proprietary, Thank you. That is excellent. I designed the comet circumflex key with the specific intention that it was creatively original whilst being expressible using a standard all-Unicode font. it has no existing software support, and nobody but you has any interest in it. You have no basis whatsoever for claiming that nobody other than me has any interest in it. 
Maybe you are not interested, maybe some people you know are not interested, yet I feel that it is unfair for you to make such a statement without evidence when writing from an established organization as that remark may prejudice people against taking an interest in helping to develop the idea because of a political dimension of going against the tide. You have your position and I feel that you should allow someone who does not have such a position an even-handed chance to put forward an idea and have it considered on its merits. You tell me which one is more likely to result in productive work and adoption by others. Likelihood of success and what actually happens are not the same thing. I do not know which is more likely as I do not know what has happened already. Some people may have deleted the email, some may have read it and disregarded it, yet it is possible that some people might have tried to produce a comet circumflex button on the screen using an all-Unicode font and might be considering the possibilities of how the system could be applied or might even be writing an experimental software program which can take comet circumflex sequences and process them through a database. Look, for example, at The Respectfully Experiment in the Unicode mailing list archives. There a result was assumed and something different was observed in practice. quote that it is because I am an inventor, interested in pushing the envelope as to what is possible scientifically and technologically. end quote Marco asked me a specific question, so I answered what he had asked. Perhaps there is an [EMAIL PROTECTED] list somewhere where you might find greater interest in your ideas than here. That is unfair of you. You have chosen to respond to my posts and I have answered the questions which you asked. You even stated in the same post. 
quote I'm going to refrain from commenting on anything beyond the markup issues end quote The topic of keys generally which I have introduced is potentially a far-reaching development in the application of markup in Unicode based systems. My own comet circumflex system may be highly useful in business communications and distance education. I am happy to respond to questions and to consider documents which people suggest. None of us here mind invention, but I think most would believe that inventiveness is most productive when building off the advancement of others rather than reinventing wheels or widgets. XML exists, and it works. XML exists and it uses U+003C in a way that makes using U+003C with the meaning LESS-THAN SIGN in body text intermixed with markup sections awkward. That feature of XML may not matter for situations involving simply encoding literary works, yet for a comprehensive system which can include the U+003C character with the meaning LESS-THAN SIGN in body text and in markup parameters, it does not suit my need. Besides the fact that your proposed markup convention is not a good idea, it has nothing
Re: Keys. (derives from Re: Sequences of combining characters.)
Peter Constable wrote as follows. On 09/26/2002 03:42:16 AM William Overington wrote: Well, it might have been 03:42:16 AM where you are, indeed it probably was, as Dallas is six hours behind England on the clock, but I would not want people to think that I write my posts in the middle of the night! On the one hand, you say XML does not suit my specific need as far as I can tell. But you also said Documents with the code sequence are intended to be sent over the internet as email, used as web pages and broadcast in multimedia broadcasts over a direct broadcast satellite system, so the codes which you suggest would be unsuitable. In that quote, the codes which you suggest referred to your list of specific Unicode code points, as follows. quote Sorry to be blunt, but that's silly. If you need a special-purpose character (a code-sequence, to be more precise) for use within your specialised application, use one of FDD0..FDEF, FFFE, FFFF, 1FFFE, 1FFFF, 2FFFE ... 10FFFE, 10FFFF. They are non-characters available for exactly this use. end quote I maintain that they are unsuitable for use in documents which are to be sent from one end user to another. Yet the first part of my sentence which you have quoted could, by going to the final comma and converting it to a full stop, form a sentence on its own as follows. Documents with the code sequence are intended to be sent over the internet as email, used as web pages and broadcast in multimedia broadcasts over a direct broadcast satellite system. So, I will reason from that. You also quote me as stating the following sentence. XML does not suit my specific need as far as I can tell. I am happy with that. The two sentences are entirely consistent. Are you perhaps trying to make a deduction by the fallacy of the undistributed middle, along the following lines. William's need is a markup system. XML is a markup system. William's need is XML. It may well be that XML could be used to carry the comet circumflex code numbers which I am devising. 
I am not saying that it could not be so used. I am simply saying that XML, as I understand it, does not suit my specific need. For example, if I understand it correctly, XML uses U+003C in a document in such a manner that it cannot be used directly with the meaning LESS-THAN SIGN in the body of the text. For me, that is a major limitation of XML. Now, I am not trying to make some big issue out of this by criticising XML as I am not trying to criticise XML, yet to my mind that is a very big legacy issue which I do not want to have as a problem in my research in language translation and distance education. Maybe one day Unicode will encode special XML opening and closing angle brackets so that XML can operate without that problem. However, as XML uses the U+003C character in that manner at the moment, for me it is a problem and it has led me to use the key method using a comet circumflex key. Also, I do not need to have all those < and > characters and = characters and / characters within messages. One of the things that is especially useful about XML and related technologies is the facility with which data can be repurposed. You have one schema for marking up data, and stylesheets that transform it as needed for different publishing / usage contexts. Also, I don't see how it can be that a character sequence such as U+003C U+0061 U+003E can't be useful to you when some ridiculous character sequence like U+2604 U+0302 U+20E3 is. Well, U+2604 U+0302 U+20E3 is not ridiculous. It is entirely permissible within the Unicode specification. I have used combining characters productively, in accordance with the rules set out in the specification. Please see section 7.9. The button displays using an all-Unicode font. If you think it ridiculous then maybe that is good evidence of its originality as a piece of creativity. A comet circumflex key could be viewed as a piece of original art. 
I specifically designed it so as to be a design which involves an inventive leap so as to produce something new and unexpected, which someone skilled in the art would not produce as the application of skill in the existing art without invention, yet which would display properly using an all-Unicode font. The sequence U+003C U+0061 U+003E is unsuitable because it begins with a U+003C character and I do not want the use of U+003C to mean LESS-THAN SIGN to be unavailable in a simple direct manner. I want to be able to use the comet circumflex translation system in documents which contain mathematics and software listings as well as literary text. So, I have decided to use a straightforward system which allows me to do that without problems. An added bonus of using the comet circumflex key is that documents containing comet circumflex codes do not necessarily need to contain any characters from the Latin alphabet. William Overington 27 September 2002
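For comparison, it may be worth noting concretely how XML itself handles the LESS-THAN SIGN concern raised in this exchange: a literal U+003C in body text is carried through the predefined entity &amp;lt; and is recovered exactly by any conforming parser. The sketch below uses Python's standard library; the element name S and attribute C follow the example given earlier in this thread and are otherwise hypothetical.

```python
# Sketch: a literal "<" (U+003C) in body text survives an XML round
# trip via the predefined entity &lt;.  Element S and attribute C are
# hypothetical names taken from the example in this thread.
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

body = "for all x < y, f(x) < f(y)"
escaped = escape(body)                      # '<' becomes '&lt;'
xml_doc = '<S C="12001">' + escaped + "</S>"

# A standard parser recovers the original body text exactly.
elem = ET.fromstring(xml_doc)
assert elem.text == body
assert elem.get("C") == "12001"
```

This does not settle the question of which convention is preferable; it only shows that U+003C is reserved in its raw form, not lost to the author of the body text.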
Re: Keys. (derives from Re: Sequences of combining characters.)
Peter Constable commented as follows. On 09/25/2002 05:55:02 AM William Overington wrote: For example, I am looking at using the following sequence so as to produce a special purpose key within documents. U+2604 U+0302 U+20E3 Hopefully that sequence will be so unlikely to occur other than in my specialised application that the sequence can be used uniquely for that specialised application. Sorry to be blunt, but that's silly. If you need a special-purpose character (a code-sequence, to be more precise) for use within your specialised application, use one of FDD0..FDEF, FFFE, FFFF, 1FFFE, 1FFFF, 2FFFE ... 10FFFE, 10FFFF. They are non-characters available for exactly this use. Documents with the code sequence are intended to be sent over the internet as email, used as web pages and broadcast in multimedia broadcasts over a direct broadcast satellite system, so the codes which you suggest would be unsuitable. If you need real character sequences for markup, there's this thing called XML. Perhaps you've heard of it. It's worth taking a look at; I think it really might catch on some day. I have heard of XML, though I know little about it. I have read some introductory documents about XML. XML does not suit my specific need as far as I can tell. William Overington 26 September 2002
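The noncharacters mentioned in the quoted advice form a fixed, easily testable set: U+FDD0..U+FDEF plus the last two code points of each of the seventeen planes, sixty-six in all. A small sketch (the function name is my own, not from any standard API):

```python
# Sketch: test whether a code point is one of the 66 Unicode
# noncharacters: U+FDD0..U+FDEF, plus the plane-final pairs
# FFFE/FFFF, 1FFFE/1FFFF, ... up to 10FFFE/10FFFF.
def is_noncharacter(cp: int) -> bool:
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    return cp <= 0x10FFFF and (cp & 0xFFFF) in (0xFFFE, 0xFFFF)

assert is_noncharacter(0xFDD0)
assert is_noncharacter(0xFFFE)
assert is_noncharacter(0x10FFFF)
assert not is_noncharacter(0x2604)   # COMET is an ordinary character
```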
Re: Keys. (derives from Re: Sequences of combining characters.)
inert and not translated, or else translated in some parameterized form. Mr Cimarosti added the following. Mr. Overington, why do you have this irresistible compulsion to mix up apples and horses? (I feel that the usual apples and oranges is not enough to convey the idea fully.) I like the phrase apples and horses. I have not heard it before, is it original to you? It has inspired me to write a song. http://www.users.globalnet.co.uk/~ngo/song1018.htm I suppose that the answer to your question is that, if indeed it is a personality feature which can be described as you suggest, it is because I am an inventor, interested in pushing the envelope as to what is possible scientifically and technologically. Sometimes such an approach is fruitless, yet at other times it can be very successful. In relation to the keys technique which I have suggested generally, and to the Comet Circumflex system in particular, whether these ideas will be successful or fruitless is something which cannot presently be determined. William Overington 26 September 2002
Keys. (derives from Re: Sequences of combining characters.)
The recent discussion on sequences has led me to have a look through the various combining characters and I have found the following. U+20E3 COMBINING ENCLOSING KEYCAP It has occurred to me that the use of a sequence of a base character, then one or more combining characters so as to produce a sequence which would be otherwise unlikely, followed by U+20E3 might be a very effective way to include specialised markup systems within a plain text file without disrupting the normal textual information conveying capabilities of a file. An all-Unicode font would then produce a graphic representation of the key, without any prior arrangement being necessary, so that such marked-up sequences could be produced using just a regular all-Unicode plain text editor. A receiving program with a specialized plug-in could then decode the markup, or it could be decoded manually in some cases. For example, I am looking at using the following sequence so as to produce a special purpose key within documents. U+2604 U+0302 U+20E3 Hopefully that sequence will be so unlikely to occur other than in my specialised application that the sequence can be used uniquely for that specialised application. I am also thinking in terms of using the following sequence to indicate the end of the markup sequence. U+2604 U+0302 U+20E2 I have it in mind that characters in the range U+2460 through to U+2473 could be used before parameters within the markup system. Also, I have noticed that in the document U20D0.pdf that U+20E4 is shown, in the listing, in magenta whereas U+20DF is shown in black. Could someone say what significance the magenta colouring in the document has please? Is it perhaps to indicate additions since the previous issue of the document? William Overington 25 September 2002
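As a concrete illustration of how a receiving program with a specialized plug-in might decode such markup: the sketch below scans plain text for the proposed start and end key sequences and splits out the parameters. The start key, end key, and the circled-number parameter markers U+2460..U+2473 come from the post above; everything else (the function name and the exact splitting behaviour) is an assumed illustration, not an agreed format.

```python
# Sketch: locate the proposed "comet circumflex" key sequences in
# plain text.  START and END are the sequences from the post; the
# parameter-splitting rule is an assumption for illustration.
START = "\u2604\u0302\u20E3"   # COMET + COMBINING CIRCUMFLEX ACCENT + COMBINING ENCLOSING KEYCAP
END = "\u2604\u0302\u20E2"     # same base, but COMBINING ENCLOSING SCREEN as the end key

def extract_keys(text):
    """Return (code, [parameters]) for each START...END span found."""
    results = []
    pos = 0
    while True:
        s = text.find(START, pos)
        if s == -1:
            return results
        e = text.find(END, s + len(START))
        if e == -1:
            return results
        body = text[s + len(START):e]
        # Split the body at circled-number markers U+2460..U+2473,
        # which the post proposes as parameter introducers.
        parts, current = [], []
        for ch in body:
            if 0x2460 <= ord(ch) <= 0x2473:
                parts.append("".join(current))
                current = []
            else:
                current.append(ch)
        parts.append("".join(current))
        results.append((parts[0].strip(), [p.strip() for p in parts[1:]]))
        pos = e + len(END)

# The example document sequence from this thread:
doc = START + "12001" + "\u2460" + "London" + END
assert extract_keys(doc) == [("12001", ["London"])]
```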
Re: entities with breve
Peter Constable wrote as follows. The answer would be to encode characters comparable to U+0361. A combining double breve has already been approved for version 4.0. I intend to propose (unless someone gets around to it before me) a combining double inverted breve below. In the mean time, one can encode these as PUA characters (which is an interim solution we're going to be using, at least for some purposes). Could you please say some more about what is going to be encoded in regular Unicode and with which code points please? In relation to your encoding these characters as Private Use Area characters, I wonder if you could say some more about this, both in relation to which code points you are intending to use and also as to whether encoding a combining accent character or a combining double breve into the Private Use Area could lead potentially to any problems over a rendering system recognizing the character as being a combining character (please know that I have no specific reason to think that it would, it is just a possibility about which I wondered when considering various uses of the Private Use Area). William Overington 25 September 2002
Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)
Kenneth Whistler wrote, as part of a longer response to my original posting. William Overington asked: [snip] I wonder if consideration could please be given as to whether this matter should be left unregulated or whether some level of regulation should be used. I think this should depend first on a determination of whether there is a demonstrated need for an actual representation of these sequences -- which ought to be determined by the people responsible for the data stores which might contain them, namely the online bibliographic community. [further remarks here snipped] Actually, this matter to which I was intending to refer was as follows, being more general than just the romanization of Cyrillic characters. quote It seems to me that this matter of sequences of combining characters being used to give glyphs where different meanings are needed other than just locally and that glyphs for such meanings are only correctly displayed if a particular rendering system or a particular font are used touches at the roots of the Unicode system. It seems to me that the glyphs for such sequences are being left as if they were a Private Use Area unregulated system. I recognize that fonts have glyph variations in that, say, an Arial letter b looks different to a Bookman Old Style letter b, yet in that case the meaning is the same. I wonder if consideration could please be given as to whether this matter should be left unregulated or whether some level of regulation should be used. end quote In another post in the same thread, Ken states as follows. quote But that wasn't my point. There is no particular evidence that the ALA-LC conventions with the dot above the graphic ligature ties is in widespread use for romanizations of these particular languages, that I can see. 
So the *urgency* of solving this problem isn't there, unless the LC/library/bibliographic community comes to the UTC and indicates that they have a data interchange problem with USMARC records using ANSEL that requires a clear representation solution in Unicode. end quote The problem on which I am seeking discussion, please, is whether, in the present state of the rules, there would be any need for any bibliographic community to approach the Unicode Consortium over such a matter, and, if it is the case that they would not need to do so, would it be better to seek to change the rules now. It is convenient to consider the situation in relation to the romanization of Cyrillic characters, yet similar considerations may well potentially also apply to topics such as the Byzantine legal texts. There may well be other topics to which similar considerations may apply. For example, please suppose that there were a committee called the Romanization of Cyrillic Committee. Suppose that that committee were to have various meetings and decide that for a ts romanization ligature that t U+FE20 s U+FE21 suits them fine, and that for the ts with a dot above romanization ligature that t U+FE20 s U+FE21 U+0307 suits them fine and publishes a list of assignments and example glyphs. The glyph for the ts with a dot above ligature in that publication has the dot above the curved line, centred horizontally. It is only later that someone with expert knowledge of the Unicode standard sees the published list and notices that the glyph shown in the document is, in fact, not the way that the glyph should appear according to the Unicode standard. By this time, many copies of the document have been published and sent to libraries around the world! Databases may have started to be converted to what that publication may well be calling the new Unicode based system. This might sound impossible, yet what is the present alternative? 
There is no way to formally register such sequences with the Unicode Consortium! I suggest that it might be a good idea to have an infrastructure whereby the Unicode Consortium registers sequences of combining characters and example glyphs, categorized as to application. This would have potentially far reaching benefits. Suppose, for example, that such an infrastructure existed, and that there is a mathematician, M, and a font designer, F, who do not know each other. M is writing a research paper on a particular branch of mathematics, where one of the key reference papers was written by an author whose name is written in Cyrillic characters, yet which name also has a romanized version. M finds that that romanization needs a character to represent the ts romanization ligature. How can M, who is using a word processor to prepare the research paper, insert that character into the document? M is keen to insert the ts ligature in a form compatible with the standard bibliographic method for romanization of Cyrillic names. Fortunately, M finds that the word processor has available various special characters and finds a ts ligature and inserts it in the document. Behind the scenes the word processor software inserts
Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)
In the discussion about romanization of Cyrillic ligatures I asked how one expresses in Unicode the ts ligature with a dot above. Regarding Ken's response to the Byzantine legal codes matter, it would appear possible that the way that the ts ligature with a dot above for romanization of Cyrillic could be represented in Unicode would be by the following sequence. t U+FE20 s U+FE21 U+0307 The ordinary ts ligature for romanization of Cyrillic is expressed as follows. t U+FE20 s U+FE21 The second example is from the recent thread on Romanized Cyrillic bibliographic data. In the recent thread about Byzantine legal codes, the following sequences were suggested. U+0069 U+0313 U+0301 U+0055 U+0313 The second of the above requires a rendering different from what direct reading of the Unicode specification might suggest. Ken's reply seems to suggest that display of such sequences would be renderer dependent or font dependent. It appears to me that the ts ligature with a dot above, and a similar ng ligature with a dot above, are already needed for the Library of Congress romanization of Cyrillic system. The following directory contains a lot of pdf files. http://lcweb.loc.gov/catdir/cpso/romanization The ts ligature with a dot above can be found on page 2 of the nonslav.pdf file. The ng ligature with a dot above can be found on page 13 of the same file. Capital letter versions of the two ligatures are needed as well. The two sequences U+0069 U+0313 U+0301 and U+0055 U+0313 mentioned above, and possibly others, will be needed for the Byzantine legal codes. It seems to me that this matter of sequences of combining characters being used to give glyphs where different meanings are needed other than just locally and that glyphs for such meanings are only correctly displayed if a particular rendering system or a particular font are used touches at the roots of the Unicode system. 
It seems to me that the glyphs for such sequences are being left as if they were a Private Use Area unregulated system. I recognize that fonts have glyph variations in that, say, an Arial letter b looks different to a Bookman Old Style letter b, yet in that case the meaning is the same. I wonder if consideration could please be given as to whether this matter should be left unregulated or whether some level of regulation should be used. William Overington 18 September 2002
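For reference, the sequences under discussion can be written out code point by code point. The sketch below builds them and confirms, via Python's unicodedata module, that the marks involved are combining characters; it illustrates only the encoding, not how any particular renderer would display the result.

```python
# Sketch: the combining-character sequences from the thread above.
# U+FE20/U+FE21 are COMBINING LIGATURE LEFT HALF / RIGHT HALF;
# U+0307 is COMBINING DOT ABOVE.
import unicodedata

ts_ligature = "t\uFE20s\uFE21"            # ordinary ts romanization ligature
ts_ligature_dot = "t\uFE20s\uFE21\u0307"  # ts ligature with a dot above

assert [hex(ord(c)) for c in ts_ligature_dot] == \
    ["0x74", "0xfe20", "0x73", "0xfe21", "0x307"]

# All three marks have a nonzero canonical combining class,
# i.e. they are combining characters.
assert all(unicodedata.combining(c) for c in "\uFE20\uFE21\u0307")
```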
Re: ISRI SoEuro has just been created!!
One practical use of this code page which occurs to me is as follows. Suppose that on a Windows 95 PC (I am preparing this email on a Windows 95 PC) someone wishes to produce a graphic which includes the words of an Esperanto poem or song, the graphic being prepared using the Paint program. If a font with the layout which is being suggested is used, then the text can be set within the Paint program. It appears that, with a suitable font, sequences of holding down the Alt key then keying 0 from the numeric keypad at the right of the keyboard, followed by a number from 128 to 255 keyed from the numeric keypad at the right of the keyboard then releasing the Alt key would permit the keying of the characters of the new 8-bit set. On a Windows 98 platform the text can be set in WordPad using a Unicode font using Alt sequences (for example Alt 264 for C circumflex, Alt 265 for c circumflex) and then the graphic image copied onto the clipboard using Print Screen then pasted into Paint, yet for WordPad on at least this Windows 95 machine that will not work. Certainly, on a Windows 95 machine if someone has Word 97 installed, then Word 97 can be used to set the Esperanto text before using a Print Screen operation, though Word 97 is a premium package not available to people using minimum systems and possibly not available to people using an open access PC in a public library. So, as far as Esperanto goes, this code page offers the chance, if someone will make available a font using those codings, that people using a minimum Windows 95 system, perhaps in a public library setting, could produce elegant graphics using the Paint program. It would appear, on the face of it, that this new code page suggestion makes that facility available not only for Esperanto but also for a number of other languages. 
I hope that a suitable font is published on the web using this set of code points so that people who are using Windows 95 systems can have this additional facility available to them. I know, for example, that there is a font available for Tamil which uses the 8-bit code space. I feel that having such facilities available does not detract from Unicode; I feel that they tend to get people interested in producing end results and that in the long term that may well get them interested in Unicode. Or indeed, end users of such facilities may well have good knowledge of what is needed to use Unicode and might like to use Unicode but have to make the best of only having less than the very latest equipment available. I also know that there are various fonts available, such as some Fraktur fonts, which use the 8 bit codes from 128 to 255 for ligatures. Those fonts too are not using Unicode code point assignments, yet hopefully, as time goes on, those fonts will become updated so as to use Unicode code points, though that would appear to only be possible on operating system and software combinations which will recognize and act upon sequences using the U+200D ZERO WIDTH JOINER character so as to produce the ligatures, unless Private Use Area encodings for the ligatures are used. Unicode is very important, yet I feel that it is also very important that facilities are provided for people using the many older machines which are still in use around the world. This new code page may well help in the process of solving computing problems now. Those same problems can also be solved now using more modern equipment with later facilities, for those people that have access to those facilities. As time passes, maybe the Unicode solution will become universally the useful solution, yet for the present, the new code page may well have usefulness for some end users of computing equipment. 
While writing, can I please ask what characters A9 and B9 are meant to represent as they come out as black squares here? In using the Microsoft Paint program using the text tool I have found that some fonts such as Arial, Code2000 and Times New Roman offer various versions of the font with names such as Baltic within parentheses after the name of the font, which can be used using Alt ddd sequences and Alt 0ddd sequences, where ddd is a base 10 integer less than or equal to 255, to produce various sets of characters. How does this mechanism work, please? I have tried various values of ddd with various of the language groups and found a wide range of characters, yet so far I have not found any way of getting Esperanto accented characters into Paint on a Windows 95 machine using that technique. Is it possible to do so? Are there any charts of these code point allocations available please? So, I am wondering if the new code page could be added into some of those fonts in some way as that would then make Esperanto poems and songs settable using Microsoft WordPad and Microsoft Paint on Windows 95 machines? Would that produce additional facilities for end users of Windows 95 machines? William Overington 12
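In software terms, an 8-bit code page of the kind discussed above is simply a 256-entry mapping from byte values to Unicode code points: bytes below 128 pass through as ASCII, and each byte from 128 to 255 is assigned one character. The byte assignments in the sketch below are invented purely for illustration; they are not the actual ISRI SoEuro layout.

```python
# Sketch of decoding a custom 8-bit code page to Unicode.  The byte
# assignments here are HYPOTHETICAL illustrations, not the real
# ISRI SoEuro table.
HYPOTHETICAL_PAGE = {
    0xC6: "\u0108",  # Ĉ  LATIN CAPITAL LETTER C WITH CIRCUMFLEX
    0xE6: "\u0109",  # ĉ  LATIN SMALL LETTER C WITH CIRCUMFLEX
}

def decode_byte(b: int) -> str:
    if b < 0x80:
        return chr(b)  # the ASCII range passes through unchanged
    # Unassigned high bytes map to U+FFFD REPLACEMENT CHARACTER.
    return HYPOTHETICAL_PAGE.get(b, "\uFFFD")

assert decode_byte(0x41) == "A"
assert decode_byte(0xE6) == "\u0109"
```

The Alt-0nnn keying described in the post enters exactly such byte values; which character appears then depends entirely on the font's own table for that byte.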
Variation selector sequences for alternate glyphs. (derives from Re: various stroked characters)
Peter Constable kindly responded to my question in the original thread. Would the encoding that would be intended to be used in the long term use of Unicode be to use one of the characters from the range U+FE00 to U+FE0F following the main character code so as to indicate the glyph alternate? I have not to this point anticipated requesting variation selector sequences for these, but that is not beyond the realm of possibility. I wonder if the matter of variation selector sequences could be clarified for the general situation, please, that is, not necessarily in relation to the particular topic of stroked characters. Would a variation selector sequence be something specified and encoded by the Unicode Consortium or by some other standardization body or would it be a matter for end users on much the same basis as Private Use Area allocation please? William Overington 9 September 2002
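For concreteness, a variation selector sequence is simply the base character followed by one of the sixteen selectors U+FE00..U+FE0F (VS1..VS16). The sketch below builds such a sequence; the particular base character and selector chosen are arbitrary illustrations, not registered standardized sequences.

```python
# Sketch: forming a variation selector sequence.  VS1..VS16 occupy
# U+FE00..U+FE0F, immediately following the base character.
def vs_sequence(base: str, selector: int) -> str:
    """Append VS1..VS16 (selector = 1..16) to a base character."""
    assert 1 <= selector <= 16
    return base + chr(0xFE00 + selector - 1)

# Hypothetical example: EMPTY SET followed by VS1.
seq = vs_sequence("\u2205", 1)
assert len(seq) == 2
assert ord(seq[1]) == 0xFE00
```

Whether any given base-plus-selector pair means anything is exactly the standardization question raised in the post; the sequence itself is just two code points.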
Re: various stroked characters
products be easily adapted or would new products be needed? A good benchmark test might be to send c ZWJ t in a document and, using U+E707 to access a precomposed ct ligature glyph, display the ligature on the screen of a computer which cannot use an advanced format font by means of the receiving software automatically producing a temporary local document wherein the U+E707 code is used instead of the c ZWJ t sequence of the transmitted document. How difficult is that benchmark to achieve please? Is it a major software development or could it be written into a macro by a knowledgeable person within a few hours? William Overington 6 September 2002
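The benchmark described above amounts to a single substitution pass over the received text before display. A minimal sketch, assuming the post's example assignment of U+E707 (a Private Use Area code point, not a standard one) for a precomposed ct ligature glyph:

```python
# Sketch of the proposed benchmark: replace the c ZWJ t sequence with
# a Private Use Area code point before handing the text to a renderer
# that lacks smart-font ligature support.  U+E707 is the example
# assignment from the post, not a standard code point.
ZWJ = "\u200D"  # ZERO WIDTH JOINER

def substitute_ct(text: str) -> str:
    """Map c ZWJ t to the PUA ct-ligature code point U+E707."""
    return text.replace("c" + ZWJ + "t", "\uE707")

assert substitute_ct("c\u200Dt") == "\uE707"
assert substitute_ct("cat") == "cat"  # plain text passes through
```

As the post suggests, this is a small piece of software rather than a major development; the real work lies in agreeing the sequence-to-glyph table and having a font that covers it.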
Re: Double Macrons on gh...
Kenneth Whistler wrote as follows. In practice, fonts might simply choose to have ligatures for the entire sequence, to avoid complications of calculating the accent positions dynamically. For more examples, just look in dictionary pronunciation guides. --Ken An interesting problem which may arise is that the Unicode Consortium will not be specifying particular ligatures to include in fonts and that font designers may not have available from any public source a list of such ligatures for which to prepare the glyphs to include in a font. This could then result in a muddle in the future when end users are trying to use such ligatures in a document and find that for some key ligatures which they wish to use that the implementation in some fonts is by default action rather than special glyph, which default action may, for some requested ligatures, result in a typographically awful display. This issue first came to my attention in the matter of the ligatures for the romanization of Cyrillic names and unknown words, where special ligatures would be desirable due to the need to have U+FE20 and U+FE21 act in both ts and iu ligatures. I wonder if, for the guidance of font designers, there should be a list of desirable ligatures for which font designers might choose to prepare specific glyphs for inclusion in an advanced format font, the list prepared by consultation between the various dictionary publishers, libraries and so on. Such a list, while not obligatory for anyone to use, would nevertheless be a useful collected guide which font designers could use so that fonts could be designed so as to have individual glyphs for all of the ligatures on the list. The list could include the specific Unicode sequence to access each ligature. It may be that there would need to be more than one list, so as to provide for various specialised areas of activity without making a general list too large. Do you think that such a published list or lists would be useful? 
William Overington 31 August 2002
Re: Romanized Cyrillic bibliographic data--viable fonts?
Edward H Trager wrote as follows. ... I was also thinking about the issue of how do you get the highly qualified designers interested in such a project? In answer to the specific question. One might consider the possibility of offering them a fee-paid assessment of a portfolio of their work with the hope of receiving a qualification or some formal academic credit from an appropriate body. In relation to obtaining highly skilled and experienced designers to participate in the project. The project could be organized so as to include a training facility, as a distance education process, for those of us who are not expert type designers so that we learn on the project and thus the reward for the time and skill which we spend on the project is enhanced skills and professional experience in participating in a typographic project of such world class standing. If the second suggestion could be combined with the first suggestion, then the prospect of a high quality distance education and participation opportunity without the participants having to pay any fees might be the way to get the result which you desire. It could be organized as if each participant were carrying out a consecutive series of final year undergraduate projects. This approach may not be something in which everybody would wish to participate, yet it could be regarded as a magnificent opportunity by some of us. Including a training and learning opportunity in the project could be the factor that provides the necessary amplification of effectiveness to whatever funds are available so as to attract a good number of keen people who would put in a lot of effort. Participation in the project and being able to include that participation in a curriculum vitae would be a huge incentive to people who are not currently employed in the typography field and cannot realistically attend a full time course yet who would value the opportunity to gain professional quality skills and experience. 
This could be a wonderful opportunity! William Overington 31 August 2002
[Possibly off-topic] Fonts for experimental usage. (spins off from Re: Romanized Cyrillic bibliographic data--viable fonts?)
Peter Constable wrote as follows. On 08/27/2002 12:08:09 AM James Kass wrote: William Overington has mentioned the Softy editor. Please keep in mind that fonts are copyrighted material, and, mostly users are forbidden to modify them, even for internal use purposes. The best way to get characters added to a font is to ask the font's developer. I agree completely. Also, it's worth noting that font engineering involves rather more than just adding a few extra characters, especially when smart fonts are involved. Note, for instance, that some tools may trash the hints in a font. The overarching issue, though, as James pointed out, is that very often it is simply not legal to make such changes. - Peter James raises the important matter of intellectual property rights in fonts and suggests that the best way to get characters added to a font is to ask the font's developer. Peter agrees with James and adds some good computing reasons as to why, even if permission were available, simply adding a few extra characters without highly expert skills would not be an effective solution. The matter which concerns me is as to whether James' suggestion that the best way to get characters added to a font is to ask the font's developer, while probably quite true, is nevertheless, in effect, what the theory of procedural rules would, if that course of action were a formal motion for a meeting, term a pious motion. What I mean by this is that, for example, if someone does want a particular character added to a font, how effective, in practice, is such a request likely to be, qualitatively in terms of whether such a request would be accepted at all, and quantitatively, in terms of time scale and financial charge, as to how accessible such an addition would be to someone making such a request of a font's developer. 
Now, let me say at once that James has already shown in his posting that, in the particular case of the situation in the thread from which this discussion has spun, he has reacted proactively in setting about adding U+FE20 and U+FE21 to his own font, and hopefully the results of that addition will be available to all at the next release of that font. Yet is that a response which could be anticipated as typical of font designers? James produces mainly one huge font which covers many Unicode characters and is continually adding items to produce a better version for a later release. What is the situation with other font developers? Is it perhaps the case that some font designers, or a team, produce a particular font and then wrap up the project, so that adding a few extra characters at a later date would mean a substantial restarting of the project? I do not know the answer to this and I wonder if some font designers could perhaps comment upon the possibilities and the modalities of someone getting a font developer to add a few characters to an existing font please. The whole situation has led to me trying to think out the problem of how someone could get a few extra characters added to a font and, recognizing the issues and problems that James and Peter mention, I wonder if I may perhaps put forward a few thoughts on the matter, which might perhaps lead to a new infrastructural facility for end users of the Unicode system. Firstly, I mention that I know, at present, very little about font authorship. I have only used the Softy program and not all of the facilities in Softy yet. I am aware that there are various sophisticated font authoring packages available, which are expensive and not widely accessible by many end users of Unicode. 
When using Softy, one method of designing a glyph is to load a template, which is a .bmp file of a large monochrome image of the desired end result, say about 200 pixels by 200 pixels or thereabouts, and then to use Softy to automatically outline the template so as to produce the Bézier curves for the glyph. The template file can be produced using a widely available package such as Microsoft Paint. I have had a lot of learning fun producing experimental glyphs for a few characters using Paint in this way, including using the line, ellipse and curve tools of Paint to produce two tengwar-inspired fantasy characters, namely a double thorn and a double thorn with tilde, in a manuscript style, by drawing upon a background grid of a different colour. I wonder whether it would be possible for some interested people to devise some basic grids in .bmp format with green and cyan lines upon them so that the containing boxes for letters x, h, p, a circumflex, A circumflex and so on were indicated, so that any interested end user could draw a desired character, using Paint, upon a copy of such a grid using black, then erase the green and cyan lines. This could have the effect that if, say, twenty to fifty end users each produced designs for five or more characters, the artwork for an easily extendible font could become available. Clearly
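The grid-and-erase workflow described above can be sketched in code. This is a minimal illustration only, assuming, hypothetically, that a pixel is simply an (R, G, B) tuple held in a nested list; a real workflow would of course use a paint program and save the result as a .bmp file, and the guide-line positions below are invented.

```python
# Sketch of the template-grid idea: guide lines in green and cyan,
# glyph drawn in black, guides erased afterwards.  Positions and
# sizes are illustrative assumptions, not taken from any standard.
SIZE = 200
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
GREEN, CYAN = (0, 255, 0), (0, 255, 255)

def make_grid(baseline=150, x_height=100):
    """White template with green horizontal guide lines (baseline and
    x-height) and a cyan containing box for the glyph."""
    px = [[WHITE] * SIZE for _ in range(SIZE)]
    for x in range(SIZE):
        px[baseline][x] = GREEN   # baseline guide
        px[x_height][x] = GREEN   # x-height guide
    for y in range(SIZE):
        px[y][20] = CYAN          # left edge of containing box
        px[y][180] = CYAN         # right edge of containing box
    return px

def erase_guides(px):
    """After the glyph has been drawn in black, keep only the black
    pixels, so that just the artwork remains for outlining."""
    return [[p if p == BLACK else WHITE for p in row] for row in px]
```

The point of the erase step is that the outlining tool then sees only the monochrome glyph artwork, with no trace of the coloured guides.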
Re: Romanized Cyrillic bibliographic data--viable fonts?
James Kass wrote as follows. Unless a font is fixed width, Latin combiners can't currently consistently combine well without smart font technology support enabled on the system. So, don't blame the Arial Unicode MS font if these glyphs don't always merge well. While awaiting Latin OpenType support, it might be a good idea to take a look at a well populated fixed width pan-Unicode font like Everson Mono. James had also previously written. Best regards, James Kass, who is now adding U+FE20 .. U+FE23 to the font here. I have had a look at the problem and decided, as the saying might go, that even the best cook cannot make herb and carrot sauce starting with parsnips and some huge quantity of thyme! The characters need to cover ligatures for both TS and also iu so they need to be high and arranged so that the result looks reasonable. So, I began to think that the best display option would probably, in the long term, be for an advanced format font to carry all of the necessary glyphs and to produce a glyph in response to an appropriate four character sequence. However, the problem remains for people with other than the very latest equipment, so I have decided to add some of these ligatures into the golden ligatures collection. This is quite an interesting task, as, starting from the reference to the pdf file I worked back to the directory and there found various other pdf files, some for other languages which use Cyrillic characters. http://lcweb.loc.gov/catdir/cpso/romanization/russian.pdf http://lcweb.loc.gov/catdir/cpso/romanization Thus far, I have made the following allocations for the golden ligatures collection. Please know that my approach is mathematical rather than linguistic. The first four pairs are for romanizing Russian names and unknown terms. 
U+E7A0 for T U+FE20 S U+FE21
U+E7A1 for t U+FE20 s U+FE21
U+E7A2 for I U+FE20 E U+FE21
U+E7A3 for i U+FE20 e U+FE21
U+E7A4 for I U+FE20 U U+FE21
U+E7A5 for i U+FE20 u U+FE21
U+E7A6 for I U+FE20 A U+FE21
U+E7A7 for i U+FE20 a U+FE21
The next two pairs are additions for Belorussian and Ukrainian.
U+E7A8 for I U+FE20 O U+FE21
U+E7A9 for i U+FE20 o U+FE21
U+E7AA for Z U+FE20 H U+FE21
U+E7AB for z U+FE20 h U+FE21
I have started writing it all up for our web site, where it will hopefully be posted, making clear the use of these code point allocations for producing displays, not for storing text in databases that need to be searched and sorted. However, I would like to make the list of encodings more comprehensive and would welcome feedback on which ligatures to include. The files churchsl.pdf and nonslav.pdf from the above named directory are the source material that I have found so far which has not yet been covered in the above encodings. Suggestions of other source material are welcome. I have looked through them and found some very interesting characters, such as what looks like an o macron and t ligature and also a t s ligature with a dot above the whole ligature as well as other ligatures along similar lines. I would appreciate any information about expressing those in Unicode which anyone can provide please, either to the mailing list or, if a writer prefers, privately by email. The current documents about other ligatures already in the golden ligatures collection can be found from the following introduction and index page. http://www.users.globalnet.co.uk/~ngo/golden.htm William Overington 27 August 2002
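The allocations above can be expressed as a simple substitution table. The sketch below is illustrative only: the PUA code points are the author's own private-use assignments, not standard Unicode, and the function name to_display_form is hypothetical.

```python
# The golden ligatures allocations listed above, as a mapping from
# combining-half-mark sequences (U+FE20 / U+FE21) to single PUA
# code points intended for display, not for stored or searched text.
GOLDEN_LIGATURES = {
    "T\uFE20S\uFE21": "\uE7A0", "t\uFE20s\uFE21": "\uE7A1",
    "I\uFE20E\uFE21": "\uE7A2", "i\uFE20e\uFE21": "\uE7A3",
    "I\uFE20U\uFE21": "\uE7A4", "i\uFE20u\uFE21": "\uE7A5",
    "I\uFE20A\uFE21": "\uE7A6", "i\uFE20a\uFE21": "\uE7A7",
    # Additions for Belorussian and Ukrainian.
    "I\uFE20O\uFE21": "\uE7A8", "i\uFE20o\uFE21": "\uE7A9",
    "Z\uFE20H\uFE21": "\uE7AA", "z\uFE20h\uFE21": "\uE7AB",
}

def to_display_form(text: str) -> str:
    """Replace each half-mark sequence with its PUA ligature code
    point, for display with a font that carries those glyphs."""
    for seq, pua in GOLDEN_LIGATURES.items():
        text = text.replace(seq, pua)
    return text
```

A database copy of the text would keep the standard half-mark sequences; only the rendering path would apply the substitution.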
SC UniPad 0.99 released.
As an end user of Unicode I was interested to learn recently that the latest version of SC UniPad, a Unicode plain text editor for various PCs, has been released. This latest version is SC UniPad 0.99 and is available for free download from the following address on the web. http://www.unipad.org A particularly interesting new feature is that one may hold down the Control key and press the Q key and a small dialogue box appears within which one may enter the hexadecimal code for any Unicode character. Upon pressing the Enter key, that character is entered into the document. SC UniPad contains its own font. Please note in particular the buttons in a column down the left hand side of the display. These alter the way in which some code points are indicated in the display. For example, if one clicks on the button labelled FMT (which controls Character Rendering: Formatting Characters) and selects Picture Glyph, then entry of U+200D into the text document shows a box with the letters ZWJ in it. I first learned of the existence of the UniPad program in a response to a question which I asked in this forum, so I am posting this note so that any end users of the Unicode system who are at present unaware of the existence of the UniPad program might know of the opportunity to have a look at it if they so choose. The web site has a facility to request email notification of developments to SC UniPad. It was by such a requested email notification that I became aware of the availability of SC UniPad 0.99. William Overington 26 August 2002
Re: Romanized Cyrillic bibliographic data--viable fonts?
J M Craig wrote as follows. [snipped] Any suggestions welcomed! Is there a tool out there that will allow you to edit a font to add a couple of missing characters? You might like to have a look at Softy, which is a shareware font editor for TrueType fonts. Softy can be used to produce new TrueType fonts and to edit existing TrueType fonts. http://users.iclway.co.uk/l.emmett/ There is some more information about Softy, including the correct email address for registrations, at the following page. http://cgm.cs.mcgill.ca/~luc/editors.html Having a look for Softy and Softy font at http://www.yahoo.com might be helpful. I am trying to obtain a copy of the tutorial by Grumpy, so far without success. I have found the other tutorial and it is very useful. I have had lots of fun with the Softy program and although I have not tried to implement the U+FE20 and U+FE21 which you mention, I have tried various experiments using Softy and have found it a very satisfactory package to use. Softy is shareware, so perhaps you might think it worth a try to find out if it will help you do what you want to achieve. Also, you might like to have a look at the SC UniPad program which I mentioned earlier today in another thread. When I was studying your posting I used SC UniPad to have a look at the various Cyrillic characters which you mentioned. As far as I can tell at present SC UniPad does not position the U+FE20 and U+FE21 characters as you might want them to appear, yet SC UniPad would seem like a good way to key in the text, ready to copy and paste it into another program which would be used to display the thus keyed text using a font of your choice. William Overington 26 August 2002
The Unicode Technical Committee meeting in Redmond, Washington State, USA.
As many readers may know, the Unicode Technical Committee was due to start a four day meeting yesterday at the Redmond, Washington State, USA campus of Microsoft, that is, on 20 August 2002. Here in England I am interested to know of what is happening and to learn of news from the meeting. So, from here in England I am starting this thread in the mailing list in the hope that some of the people at the meeting might like to post news of what is happening at the meeting please. This is not in the same news gathering league as having CNN and other reporters providing live reports from outside the venue and catching quotes from prominent members of the Committee as they arrive and depart and there being live press briefings from an official spokesperson, yet in its way it is still news gathering and hopefully will be of interest to other participants in this mailing list as well. It is the early hours of the morning in Washington State at present. It is hoped that when delegates get up for breakfast that they might look in their emails and make early morning responses, or perhaps arrange for an official briefing to be posted later in the day. If I were conducting a live interview with the committee chairman or with an official spokesperson I would ask the following questions. * What was discussed yesterday (Tuesday) please, and what formal decisions, if any, were taken please? * How many people attended please? * Is it only companies which are full members of the Unicode Consortium who send delegates to the meeting, or are there also representatives of organizations who do not vote in decisions present as well? * What is the agenda for today please? * What is the agenda for the rest of the week please? * Will there be a press statement at the close of the meeting please, and if so, will it also be posted in the Unicode mailing list please? 
Depending upon the responses to the above, I would, if the topics had not been covered, ask specific questions related to the following. * Has there been, or is there on the agenda, any discussion of the wording in the Unicode specification about the use of the Private Use Area and, if so, are any changes to that wording being implemented? * Has there been, or is there on the agenda, any discussion concerning the status of the code points U+FFF9 through to U+FFFC please? There has been some discussion recently in the Unicode mailing list about these code points, as regards the use of U+FFF9 through to U+FFFB as one issue, the use of U+FFFC as a second issue, and the use of U+FFF9 through to U+FFFC all together as a third. Is the committee discussing these issues at all and, if so, are they discussing the matter of whether U+FFFC can be used in sending documents from a sender to a receiver please? Is there any discussion of a possible rewording, or changing of meaning, of the wording about the U+FFF9 through to U+FFFC code points in the Unicode specification please? * Are any matters concerning how the Unicode specification interacts with the way that fonts are implemented being discussed please? If so, is due care being taken that, as font format is not, at present, an international standards matter, the committee ensures that Unicode does not become dependent upon a usage, express or implied, of the intellectual property rights or format of any particular font format specification? * Is there any discussion of the possibility of adding further noncharacters please, considering either or both of adding some more noncharacters in plane 0 and adding a large block of noncharacters in one of the planes 1 through to 14? 
* Is the committee discussing the issue of interpretation, namely as to how, if various people read the published specification so as to have different meanings, people may receive a ruling as to the formally correct meaning of the wording of the specification. This recently arose in relation to the U+FFFC character and has previously arisen in relation to what is correct usage of the Private Use Area, so there are at least two areas where the issue of interpretation has arisen. I am hoping that regular postings of what is happening in the meeting will appear as the meeting progresses so that there is both information for people who may be affected by what is decided at the meeting and also so that participants in the meeting might be able to gather end user feedback upon any topics that arise at the meeting before they make any decision which may affect end users. Is there an official press spokesperson for the meeting please? William Overington 21 August 2002
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
there is no question that the publication of a .uof file specification by the Unicode Consortium would prejudice the rights of anyone to use the U+FFFC character in any other manner. Publication of such a .uof file specification would also prevent U+FFFC being made into a noncharacter and keep the facility of using the U+FFFC character in interchanged documents available for all, whether they choose to use the .uof file format or some other format for explaining the meaning of any U+FFFC codes in a given document. Could this be discussed at the Unicode Technical Committee meeting next week please? William Overington 16 August 2002
Re: The existing rules for U+FFF9 through to U+FFFC. (spins from Re: Furigana)
the name of the graphic file would seem perfectly suitable. Then there is this curious passage. Note that it is also *permissible* in Unicode to spell permissible as purrmisuhbal. That doesn't mean that it would be a good idea to do so, but the standard does not preclude you from doing so. You could even write a rendering algorithm which would display the sequence of Unicode characters p,u,r,r,m,i,s,u,h,b,a,l with the glyphs {permissible} if you so choose. --Ken Well, who is this you, certainly not me! :-) Thank you for your response, which has been very helpful. William Overington 16 August 2002
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
James Kass wrote as follows. William Overington wrote, No, it is a story about an artist who wanted to paint a picture of a horse and a picture of a dog and, since he knew that the horse and the dog were great friends and liked to be together and also that he only had one canvas upon which to paint, the artist painted a picture of a landscape with the horse and the dog in the foreground, thereby, as the saying goes, painting two birds on one canvas, http://www.users.globalnet.co.uk/~ngo/bird0001.htm in that he achieved two results by one activity. In addition the picture has various interesting details in the background, such as a windmill in a plain (or is that a windmill in a plain text file). :-) 1) It's gif file format rather than plain text.* 2) There isn't any windmill. The picture of the birds has been in our family webspace since 1998 as an illustration for the saying Painting two birds on one canvas. That saying, originated by me, is a peaceful saying meaning to achieve two results by one activity. I made the picture from clip art as a learning exercise. The picture of the birds is referenced as a way of illustrating the saying Painting two birds on one canvas. It is not the picture in the story about which Ken asked. I may well have a go at constructing such a picture, perhaps using clip art. The reference to a windmill is meant as a humorous allusion to Don Quixote tilting at windmills. I am interested in creative writing, so when Ken asked about the story, I just thought of something to put in my response. Part of the training in, and the fun of, creative writing is to be able to write something promptly to a topic. William Overington 16 August 2002
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
cannot quite do everything which a .uof file could do, as far as I am aware, though I am willing to learn if the situation is different. For example, suppose that a book is being made available as a Unicode plain text file and it is desired to add just a few illustrations without a major reformatting of the whole text, which uses CARRIAGE RETURNS to indicate paragraphs. A text editor could be used to insert a few U+FFFC characters at appropriate places in the file and a .uof file could be used to carry a list of the names of the illustration files in the order in which they are used. Conversion to HTML format would require a larger file and would limit the ways in which the file could be displayed to just the use of an HTML browser. Also, as there are more than a million characters in Unicode, most are unused so far, so changing the meaning of just FFFC in this one context doesn't seem like a big win, considering also every line of code that might work with FFFC now needs to consider the context to determine its semantics. I don't follow what you mean. However, the meaning of U+FFFC is not, I hope, going to be changed at all. I have simply suggested an optional way of indicating, outside of the plain text file which contains one or more U+FFFC characters, the extra information as to which object the U+FFFC character is anchoring. But every invention deserves to be implemented, we need not look at whether the invention satisfies some demand of its customers. I disagree with this. My view is that not every invention deserves to be implemented, and indeed that not every invention needs to be considered as consideration takes time and may cost money. 
However, I do feel strongly, and have for many years, that when an invention is considered it should be considered on its merits and without prejudice, such as, for example, the discrimination which occurs when an invention is turned down because it has been suggested by someone who is not representing a company or an organisation. As to customer needs, certainly an invention that satisfies an existing need meets that criterion, yet it is also the case that sometimes the need does not exist until potential customers become aware of what has become possible and then begin to have a need, or desire, for it. I like the 2 birds picture and I assume it was a metaphor for the idea- one bird was html the other unicode. I was a little disappointed that you used html instead of .uof format though. The picture of the birds has been in our family webspace since 1998 as an illustration for the saying Painting two birds on one canvas. That saying, originated by me, is a peaceful saying meaning to achieve two results by one activity. I made the picture from clip art as a learning exercise. The picture of the birds is referenced as a way of illustrating the saying Painting two birds on one canvas. It is not the picture in the story about which Ken asked. I am interested in creative writing, so when Ken asked about the story, I just thought of something to put in my response. Part of the training in, and the fun of, creative writing is to be able to write something promptly to a topic. The two birds are not a metaphor for HTML and Unicode at all. Ken put two illustrations in his posting so I put one in mine. It all adds to the interest for readers. William Overington 16 August 2002
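The workflow suggested earlier in this posting, a plain text file containing U+FFFC anchors plus a .uof file listing the illustration files in order, can be sketched as follows. Note that the .uof format has no published specification; the one-file-name-per-line layout assumed here is invented purely for illustration.

```python
OBJ = "\uFFFC"  # OBJECT REPLACEMENT CHARACTER

def pair_anchors(plain_text: str, uof_listing: str):
    """Pair the nth U+FFFC in plain_text with the nth file name in the
    listing.  The .uof layout (one name per line) is an assumption."""
    names = [ln.strip() for ln in uof_listing.splitlines() if ln.strip()]
    anchors = [i for i, ch in enumerate(plain_text) if ch == OBJ]
    if len(names) != len(anchors):
        raise ValueError("listing does not match the number of U+FFFC anchors")
    return list(zip(anchors, names))
```

A renderer could then substitute each illustration at its anchor position, while any receiver ignorant of the listing still sees valid plain text containing U+FFFC characters.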
Re: Tildes on vowels
James Kass wrote as follows. Indeed, a program designed to display actual superscripts based on the notational form would work pretty much the same regardless of whether standard or non-standard characters are used, and the editing or input screen would also look essentially identical. Yes, yet using Private Use Area codes would not clash with the meanings of the regular Unicode characters, so maybe that is preferable. Standard only amongst those end users who choose seems to be a way of saying non-standard. Almost but not quite. It is like describing someone's request for something as not unreasonable rather than as reasonable. A distinction of the use of the English language where Boolean alternatives are not quite the case. Thank you for your help. William Overington 14 August 2002
Re: Tildes on vowels
to problems for new end users. Especially given that no software whatsoever supports the codes, and if it did, one would have to work with custom application software (that knows that U+ means BOLD) and/or with special Courtyard-Code-compatible fonts that know about Golden Ligatures, of which there are none in existence today. The golden ligatures collection is of character glyphs and is a separate thing from courtyard codes which are mostly control codes for formatting and markup. As far as no fonts or software being available today, well that might well be true. However, that might change. Please know that I am not expecting a major reorientation of the software industry, it is just a matter that if someone does like to make a font which includes, in a Unicode compatible manner, whole precomposed ligature glyphs which are accessible directly from a code point, whether for ligatures such as ct or ch or for long s b or for ppe, then there is available a set of code point allocations, which, while not standard, are perhaps more likely to be consistent than any other set of code point allocations which might otherwise be used. As for writing software which recognizes courtyard codes, well maybe people might use courtyard codes in computing generally, and if so, good, yet a primary reason for introducing them is for using them in educational software packages to be broadcast on digital television channels throughout the world using the DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) system. DVB-MHP uses Java and Java uses Unicode, so DVB-MHP programs which are broadcast use Unicode. These telesoftware programs are a specialist niche in broadcasting, yet, though I say it myself, telesoftware is an extremely powerful computational technique and hopefully in the next few years will begin to fulfil its potential. There is much to be done, yet it is an exciting field and Unicode is a key feature in being able to use it effectively throughout the world. 
Thank you for your help. William Overington 14 August 2002
Re: Tildes on vowels
Marco Cimarosti wrote as follows. As you see, it is nowhere said that markup is necessarily something beginning with or any other character. The additional information (markup) can be in any format, in fact the definition says: It is expected that systems and applications will implement proprietary forms. Ah! The key point. So my courtyard codes are both fancy text and markup. The fact that they do not enter a markup bubble but instead use individual code points to convey the formatting information does not alter the fact that they are markup. [...] I am not knocking markup, [...] Of course you aren't! Your idea of defining format controls as PUA code point totally fits in the above definition. Yes. So, FARMYARD CODES ARE JUST ONE MORE FORM OF MARKUP. And text including the controls IS NOT PLAIN TEXT: it is William Overington's own proprietary form of rich text. I understand what you mean. However, as regards the second sentence in the above quote, so as not to seem to agree to something with which I am not agreeing, can I please say that in the dictionary before me at present, the word proprietary is stated as an adjective meaning belonging to owner; made by firm with exclusive rights of manufacture, so I would not wish courtyard codes to be regarded as a proprietary form of rich text. I fully accept that you were probably not using the word proprietary to convey that meaning but to convey a sense that I had made it up myself on my own initiative as between making it up myself on my own initiative and it being devised by a standardization body. You are out of Unicode rules not because you defined your Farmyard codes in the PUA (which is perfectly legal, as I explain below), but because you fail to accept (or understand) that these codes are a form of markup, and that text containing them is a proprietary form... of fancy text. Yes. I now understand. Thank you for the explanation. 
The only questionable usage of PUA that I can think of is duplicating existing characters. But this would be an absurd deed. Your other proposal of defining PUA ligatures goes near to this, but not quite. Well, I did not define codes for long s t ligature and st ligature in the golden ligatures collection because they are already in regular Unicode. Thank you for your help. William Overington 14 August 2002
Re: Double Macrons on gh (was Re: Tildes on Vowels)
U+0360 COMBINING DOUBLE TILDE
U+035D COMBINING DOUBLE BREVE
U+035E COMBINING DOUBLE MACRON
U+035F COMBINING DOUBLE LOW LINE
I also note U+0361 COMBINING DOUBLE INVERTED BREVE and U+0362 COMBINING DOUBLE RIGHTWARDS ARROW BELOW in the code chart. I wonder if someone could please clarify how an advanced format font would be expected to use such codes. I understand from an earlier posting in this thread that the format to use in a Unicode plain text file would be as follows.
first letter then combining double accent then second letter
As first letter and second letter could be theoretically almost any other Unicode characters, would the approach be to just place all three glyphs superimposed onto the screen and hope that the visual effect is reasonable or would a font have a special glyph within it for each of the permutations of three characters which the font designer thought might reasonably occur yet default to a superimposing of three glyphs for any unexpected permutation which arises? As a matter of interest, how many characters are there where such double accents are likely to be used please? Is it just a few or lots? While in this general area, could someone possibly say something about how and why U+034F COMBINING GRAPHEME JOINER is used please? William Overington 14 August 2002
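The encoding order described above, first letter then combining double accent then second letter, can be shown concretely in a short sketch; the function name spanning is invented for illustration.

```python
import unicodedata

# A combining double diacritic sits between its two base letters in
# the plain text: first letter, then the mark, then the second letter.
DOUBLE_TILDE = "\u0360"           # COMBINING DOUBLE TILDE
DOUBLE_INVERTED_BREVE = "\u0361"  # COMBINING DOUBLE INVERTED BREVE

def spanning(first: str, mark: str, second: str) -> str:
    """Encode a mark that spans two letters, e.g. n + U+0360 + g."""
    return first + mark + second

ng = spanning("n", DOUBLE_TILDE, "g")
# Three code points in memory; a capable renderer draws a single
# tilde stretched across both letters.
```

Whether a font superimposes three glyphs or substitutes one precomposed glyph is a rendering decision; the stored sequence is the same three code points either way.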
The existing rules for U+FFF9 through to U+FFFC. (spins from Re: Furigana)
John Cowan wrote as follows. In essence, though not formally, U+FFF9..U+FFFC are non-characters as well, and the Unicode semantics just tells what programs *may* find them useful for. Unicode 4.0 editors: it might be a good idea to emphasize the close relationship of this small repertoire with the non-characters. That is not what the specification says. Something can only be emphasised if it is true in the first place! If it is desired to make U+FFF9 through to U+FFFC noncharacters then that needs to be done explicitly with a fair opportunity for people to object and make representations before a decision is made. A saying of my own is as follows. When goalposts are moved, aromatic herbs should be scattered around. It seems to me, not having known about annotation characters previously yet now, due to this thread, having read the published rules in Chapter 13, that these are not noncharacters. It appears to me that the use of the annotation characters in document interchange is never forbidden and is strongly discouraged only where there is no prior agreement between the sender and the receiver, and that that strong discouragement is because the content may be misinterpreted otherwise. So, if there is a prior agreement, then there is no problem about using them in interchanged documents. There appears to be nothing that suggests that U+FFFC cannot be used in an interchanged document. I know little about Bliss symbols, though I have seen a few of them and have read a brief introduction to them, yet it seems to me that annotating Bliss symbols with English or Swedish is entirely within the specification and would be no more than strongly discouraged even if there is no prior agreement between the sender and the receiver. 
Further, it seems to me from the published rules that these annotation characters could possibly be used to provide a footnote annotation facility within a plain text file, so that, if a plain text file is being printed out in book format, then a footnote about a word or phrase could be encoded using this technique so that the rendering software could place the footnote on the same page as the word or phrase which is being annotated, regardless of whether that word or phrase occurs near the start, middle or end of that page. It seems to me that the statement of the meaning of U+FFFA means that the sequences in Figure 13-3 of the specification are just examples, though as the word exact is used, perhaps they are guiding examples and the use in footnotes is perhaps stretching the variation from the examples in the diagram. An interesting point for consideration is as to whether the following sequence is permitted in interchanged documents. U+FFF9 U+FFFC U+FFFA Temperature variation with time. U+FFFB That is, the annotated text is an object replacement character and the annotation is a caption for a graphic. It seems to me that if that is indeed permissible that it could potentially be a useful facility. On balance, it seems to me that if both sender and receiver are clear as to what is meant, then the use of annotation characters for Bliss symbols and for footnotes and for captions for illustrations harms no one, for a person skilled in the art seeking to use the file without knowledge of the interpretation agreement which should ideally exist between sender and receiver and who has only the Unicode specification to go on would probably be unlikely to get a wrong interpretation of the intended meaning, even if the actual graphical layout were imprecise, as the Unicode standard locks together the two parts of the annotation sequence and shows that one of the parts is the annotation for the other part. William Overington 15 August 2002
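The annotation sequence discussed above can be written out in code. This is a sketch only: annotate builds the three-part sequence described in Chapter 13, and strip_annotations shows one plausible fallback for a receiver that cannot lay annotations out, a behaviour chosen here for illustration rather than mandated by the standard.

```python
import re

FFF9, FFFA, FFFB = "\uFFF9", "\uFFFA", "\uFFFB"

def annotate(annotated: str, annotation: str) -> str:
    """Build an interlinear annotation sequence:
    anchor, annotated text, separator, annotation, terminator."""
    return FFF9 + annotated + FFFA + annotation + FFFB

def strip_annotations(text: str) -> str:
    """Fallback handling: keep the annotated text, drop the
    annotation, as a receiver without layout support might do."""
    return re.sub("\uFFF9([^\uFFF9-\uFFFB]*)\uFFFA[^\uFFFB]*\uFFFB",
                  r"\1", text)

# The captioned-graphic example from the posting above:
caption_for_graphic = annotate("\uFFFC", "Temperature variation with time.")
```

The fallback keeps the document readable even for a receiver with no knowledge of any interpretation agreement between sender and receiver.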
Re: Tildes on vowels
or defining markup for some of these solutions, instead of the PUA. By analogy, include some other tools in your repertoire, so that everything does not look like a code point ready to be hammered. [snipped] I hope that helps. I hope the message does not read as being harsh. Not at all. I intended to just be explanatory. Yes. My attempt to be concise and specific gives this a more pointed tone than I intend, I suspect, but please believe it is not intended. I am happy that that is your intention. I did not read it otherwise. I very much believe that people can have an academic debate without personalities being an issue. When people raise personality issues, they are just using power to get a win, without answering the underlying questions which still continue to exist, even if their asking has been made er, taboo! :-) William Overington 13 August 2002
Re: Tildes on vowels
Stefan Persson commented on my suggestion as follows.

Well, why not go ahead and decide on two code points within the Private Use Area as values for and XXXY, post them in this list and perhaps that action will lead to that facility becoming available as a facility to document transcribers all around the world.

There have been several messages sent to this list about why this would be inappropriate. Just read the answers to some of your recent discussions, and you'll understand what I mean. Stefan

I wonder if you could please state exactly what you mean, as I do not understand the point which you are trying to make. As far as I am aware, the particular set of circumstances relating to this particular topic have not been discussed previously. William Overington 10 August 2002
Re: Tildes on vowels
David Possin wrote as follows.

quote In German it was common to use a macron over m and n to show mm and nn, I saw it being written this way up to the 1970s. But I never saw it used for any other double letters. Dave end quote

There is a very interesting document entitled The Gutenberg Press, available as a file named gbpmanual.pdf from the Walden Font website. The website address is as follows. http://www.waldenfont.com The address for the file is as follows. http://www.waldenfont.com/public/gbpmanual.pdf

On page 14 are some special characters, ligatures and abbreviations, as used by Gutenberg. Searching through the table is great fun, so I will only mention here the first entry in the table, which shows a letter a with a horizontal line over the top, stated in the pdf file to stand for am, an. The Walden Font website also has some sample fonts showing some of the characters in each font. In the Gutenberg sample, some of the special characters with a horizontal line over the top are included. I managed to find them using the Insert Symbol facility of Word 97 on a Windows 98 platform. I have also experimented using WordPad on a Windows 98 platform and found that I could get one of the characters by using Alt+0200. I also managed to get that same character into WordPad on an older Windows 95 PC.

I have not referred to the line over the top as a macron, as I am not sure whether it is a macron. I say not sure because I am learning and am not sure in that context, not in any way because I am expressing a learned opinion on the matter. The document refers to Gutenberg having 290 characters in his type set. However, the Walden Font font seems not to have that many characters, so perhaps someone might like to say something about Gutenberg's character set please. An email correspondent recently informed me that Gutenberg used a qv ligature.
Does anyone know what ligatures and abbreviations, if any, were used by Gutenberg which are not in the Walden Font font, please? I recently saw a television programme in the United Kingdom about Gutenberg not having used a reusable matrix for typecasting, but having to make a new matrix for each casting, without the benefit of having a punch to make the matrix. This was discovered by very high magnification of characters in some of Gutenberg's printing. It appears that the type was reused on different pages, but that no two versions of the same letter on any given page were congruently identical. William Overington 9 August 2002
Re: Tildes on vowels
character and to understand an indication of the presence of any regular Unicode character superscripted in the original document, one would only need to have a Unicode font augmented with two arrow glyphs in the appropriate code points.

Well, why not go ahead and decide on two code points within the Private Use Area as values for and XXXY, post them in this list, and perhaps that action will lead to that facility becoming available to document transcribers all around the world. If the code points were published in this manner, maybe a font and maybe a UniPad soft keypad which use those code points would become available in time, and so researchers transcribing documents in libraries around the world would have a lasting enhancement of the facilities available to them. This method would not produce a visually correct display, yet in order to convey meaning in a research environment, this method could help in getting the transcribing done and thus would be a valuable addition to the facilities available. William Overington 9 August 2002
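The two-marker idea above can be sketched in code. Note the hedging here: the message never actually fixed the two Private Use Area values, so U+F3A0 and U+F3A1 below are my own hypothetical placeholder choices, not an agreed allocation, and the caret-brace fallback rendering is likewise only an illustrative convention.

```python
# Hypothetical placeholder code points: the post never fixed the two PUA
# values, so U+F3A0 (begin superscript) and U+F3A1 (end superscript) are
# illustrative choices only, not an agreed allocation.
SUP_BEGIN, SUP_END = "\uF3A0", "\uF3A1"

def mark_superscript(text: str) -> str:
    """Wrap transcribed text in the paired PUA superscript markers."""
    return SUP_BEGIN + text + SUP_END

def to_caret_notation(s: str) -> str:
    """Render the markers visibly, e.g. as a plain-text fallback display."""
    return s.replace(SUP_BEGIN, "^{").replace(SUP_END, "}")

# A transcriber might record the abbreviation "Wm" (with a raised m) as:
record = "W" + mark_superscript("m")
print(to_caret_notation(record))  # W^{m}
```

The point of the sketch is that the markers travel with the plain text, so a font with glyphs at those two code points, or a trivial conversion like the one above, is all a reader needs.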
Re: Digraphs as Distinct Logical Units
obligated to use the golden ligatures collection code points for the direct access route. Also, the golden ligatures collection does not provide code points for all of the ligatures that might be needed by a font designer; however, if anyone does want code points for some other ligatures, I will be interested to try to add them into the golden ligatures collection upon request. In the mail list archive at http://www.unicode.org there are various discussions about ligatures. Recently there was some discussion about the golden ligatures collection and about a rather fun occurrence which is archived as The Respectfully Experiment. William Overington 3 August 2002 http://www.users.globalnet.co.uk/~ngo
Re: Subscript Superscript
Some time ago in this list, Mr Bernard Miller posted a note about his Bytext system. If one goes to http://www.bytext.org and then goes through to the documentation page at http://www.bytext.org/documentation.htm one may download a copy of the latest edition of The Bytext Standard. I chose to download the pdf file, which is 606 kilobytes. On pages 34 and 35 of that document are details of arrow parentheses invented by Mr Miller. On page 72 is a statement concerning intellectual property rights.

I feel that it would be very useful if these eight arrow parenthesis characters were used in a Unicode compatible environment. As some readers may know, I have been researching my courtyard codes system. http://www.users.globalnet.co.uk/~ngo/court000.htm Courtyard codes are placed within the Private Use Area of Unicode. The above document is indexed from an index page about some of my other uses of the Private Use Area. http://www.users.globalnet.co.uk/~ngo/golden.htm

It occurs to me that if the eight arrow parenthesis characters were encoded into my courtyard codes system, that would potentially be of great usefulness. I am thinking in terms of U+F388 through to U+F38F being used for this purpose, with the codes being assigned to the arrow parentheses in the order in which Mr Miller lists them in The Bytext Standard. If this happens, then the way to express a subscript uppercase A character would be as follows.

U+F38A U+0041 U+F38B

The U+0041 is the code for A in regular Unicode, so immediately there is a general method for subscripting any Unicode character. Indeed, subscripts of subscripts could be used by nesting the arrow parentheses. For example, a subscript A subscript B could be expressed as follows.

U+0061 U+F38A U+0041 U+F38A U+0042 U+F38B U+F38B

The U+0061 is the code for a in regular Unicode and the U+0042 is the code for B in regular Unicode.
Arrow parentheses allow a mathematical expression involving superscripts, subscripts, integral limits, summation limits and various other items to be expressed in a linear manner, which makes those expressions able to be stored in a Unicode file in what is essentially a plain text storage format, though I mention that this will not be plain text as such, as it involves the use of code points for what might be considered markup. I know little about XML, so I do not know whether this suggestion will be a suitable solution for the requirement of the person who wrote to the Unicode Consortium. However, perhaps it will be a helpful suggestion.

Certainly, using the codings which I suggest would involve use of code points from the Private Use Area. However, as the need is now, even if the arrow parenthesis characters are one day promoted to regular Unicode, the use of Private Use Area characters now may be what is needed to achieve the desired result. By placing these code point ideas into this posting to the Unicode mail list, they will be archived in the archives of the Unicode mail list and also sent to many people interested in Unicode around the world. So, although they are only Private Use Area encodings, it is possible that these encodings will be noted in many places by many people. It is simply speculation as to whether few or many people will choose to recognize such code point allocations for their own uses.

The use of these code points would raise the question as to how a string containing them should be displayed. The idea is that in a plain text editor mode, the arrow parenthesis characters would be displayed with the glyphs shown by Mr Miller in The Bytext Standard. In a graphical display, the arrow parenthesis characters would not be displayed, yet would influence how characters included between matching pairs of arrow parenthesis characters are displayed.
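The linear encoding discussed above can be illustrated with a short sketch. It assumes, as the post suggests, that U+F38A opens a subscript and U+F38B closes it; rendering the pair as TeX-style underscore-and-brace markers is my own illustrative choice for a plain-text fallback, not part of the Bytext Standard.

```python
# Per the suggestion above: U+F38A opens a subscript, U+F38B closes it.
# Rendering as _{ ... } is an illustrative plain-text display convention.
SUB_OPEN, SUB_CLOSE = "\uF38A", "\uF38B"

def render_subscripts(s: str) -> str:
    """Replace the PUA arrow parentheses with visible _{ ... } markers.
    Nesting works naturally because the pairs are balanced."""
    return s.replace(SUB_OPEN, "_{").replace(SUB_CLOSE, "}")

# The nested example from the text: a, subscript A, subscript B.
# U+0061 U+F38A U+0041 U+F38A U+0042 U+F38B U+F38B
seq = "a" + SUB_OPEN + "A" + SUB_OPEN + "B" + SUB_CLOSE + SUB_CLOSE
print(render_subscripts(seq))  # a_{A_{B}}
```

Because the open and close characters always come in matching pairs, a renderer can recover the nesting depth with a simple counter, which is what makes the linear storage format workable.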
This is no more complicated in principle than viewing an HTML page in Internet Explorer, then viewing the source code of the HTML page in Notepad, then going back to the Internet Explorer display. Whether any font makers will add glyphs for the eight arrow parenthesis characters into the code positions U+F388 through to U+F38F remains to be seen, though I am cautiously optimistic in the matter. Also, the possibility exists for the person who originally wrote to the Unicode Consortium to have his or her own font produced, in addition to any font maker making such a font available. William Overington 31 July 2002

-Original Message- From: Magda Danish (Unicode) [EMAIL PROTECTED] To: unicode [EMAIL PROTECTED] Date: Tuesday, July 30, 2002 8:46 PM Subject: Subscript Superscript

-Original Message- Date/Time: Tue Jul 30 12:26:40 EDT 2002 Contact: [EMAIL PROTECTED] Report Type: FAQ Suggestion

We need to know how to express a Subscript letter in Unicode. On your site, we've found in 2070-208E how to express a Superscript letter or number or a Subscript number, but there is no information about how to write a Subscript letter. We're
Teletext
In the United Kingdom there is a widely used information system known as teletext. It is also used in many other countries. Teletext is a digital technology used in conjunction with analogue television systems. Digital information is inserted in several of the otherwise unused lines of the television signal, within what is known as the vertical blanking interval of the television picture.

In the United Kingdom the government is eventually to switch off all analogue television broadcasts, as part of the already started process of migration to digital television technology. Thus teletext in its present form will finish. There are digital television text and graphics displaying systems which may continue the teletext name, yet the original teletext display format is likely to go. Teletext started in the early 1970s and the currently implemented specification essentially dates from 1976 (with the exception of the later fastext linking system). The government is thinking in terms of turning off the analogue transmissions sometime between 2006 and 2010.

I am thinking that it would be a good idea to encode the archive copies of teletext pages that exist into a Unicode compatible format for the future. Teletext has been around for about a quarter of a century in more or less its present form, and within another quarter of a century that form might well be gone completely. I have looked in the Unicode mail list archive and found various items about encoding teletext pages using existing Unicode characters. I am here suggesting a different approach, a teletext archiving approach. I suggest that, in a discussion within this mailing list, a Private Use Area encoding for archiving teletext pages is agreed, with a view that eventually it will be put forward as a proposal for promotion to regular Unicode, probably into one of the higher planes.
The reason for this approach is that it will permit teletext pages to be encoded in a plain text file within a document which discusses the technology. The teletext characters need to be implemented with the same width as each other, whereas characters in a discussion document need to be displayable with possibly different widths one from another.

I suggest the following as a starting point for a discussion.

U+E200 through to U+E27F for the United Kingdom teletext character set 0x00 to 0x7F.

U+E280 through to U+E2FF to be used to define teletext characters defined in other countries where those characters are not the same as in the United Kingdom character set. This means all of the German accented characters and so on. The notes for each encoding would include details of the location within the 0x00 to 0x7F range where that character was originally encoded, and in which country or countries it was so encoded.

All teletext pages could then be encoded using the above characters. In addition, the following could be used.

Where a character is to be displayed in contiguous graphics mode, and is a graphic, not a capital letter push-through, the character may be represented using U+E320 to U+E33F and U+E360 to U+E37F.

Where a character is to be displayed in separated graphics mode, and is a graphic, not a capital letter push-through, the character may be represented using U+E3A0 to U+E3BF and U+E3E0 to U+E3FF.

This will enable a good idea of the look of a teletext page to be displayed using an ordinary TrueType font in a word processing document. Naturally there is also scope for special teletext displaying programs to be produced so that graphics with different combinations of foreground and background colours can be displayed properly. I feel that this encoding will be useful as a stepping stone to a permanent regular Unicode encoding of teletext characters for archiving purposes.
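The proposed mapping can be sketched as a small conversion function. This is my own reading of the ranges given above, not an agreed specification: text characters go to U+E200 plus the teletext code, contiguous mosaic graphics to U+E300 plus the code, and separated mosaic graphics to U+E380 plus the code, with the mosaic check reflecting the fact that teletext block graphics occupy only 0x20 to 0x3F and 0x60 to 0x7F.

```python
# A sketch of the PUA mapping proposed above (an assumption, not a standard):
#   text characters:      0x00-0x7F -> U+E200 + code
#   contiguous graphics:  mosaic codes -> U+E300 + code (U+E320-E33F, U+E360-E37F)
#   separated graphics:   mosaic codes -> U+E380 + code (U+E3A0-E3BF, U+E3E0-E3FF)

def encode_teletext_char(code: int, mode: str = "text") -> str:
    """Map one 7-bit teletext character code to a PUA code point."""
    if not 0x00 <= code <= 0x7F:
        raise ValueError("teletext character codes are 7-bit")
    if mode == "text":
        return chr(0xE200 + code)
    # Block graphics exist only in the mosaic ranges; capital letters
    # "push through" as text and are not remapped here.
    if not (0x20 <= code <= 0x3F or 0x60 <= code <= 0x7F):
        raise ValueError("not a mosaic graphics code")
    if mode == "contiguous":
        return chr(0xE300 + code)
    if mode == "separated":
        return chr(0xE380 + code)
    raise ValueError("mode must be text, contiguous or separated")

# Example: letter A as text, and mosaic 0x20 in both graphics modes.
print(hex(ord(encode_teletext_char(0x41))))                 # 0xe241
print(hex(ord(encode_teletext_char(0x20, "contiguous"))))   # 0xe320
print(hex(ord(encode_teletext_char(0x20, "separated"))))    # 0xe3a0
```

An archiving tool could apply this byte by byte to a saved teletext page, giving a plain Unicode string that a suitably populated TrueType font could display.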
Hopefully this initiative will encourage people to get out any old 5 1/4 inch floppy discs that they may have and transfer any teletext pages saved upon them into an archived form. Readers interested in teletext might like to have a look at the following. http://teletext.mb21.co.uk

I am hopeful that, by having a specific encoding within Unicode for teletext, the archives of teletext pages that exist will be conserved for posterity and that an important aspect of social history will be preserved for the future. Does anyone know if the early graphic art from Oracle (Oracle being the name of the then ITV teletext service as well as of the technology, being an acronym for Optional Reception of Announcements by Coded Line Electronics) in the mid 1970s has survived?

Also, does anyone archive Viewdata pages? Viewdata was not a broadcasting technology, but provided pages with a display format compatible with teletext, which could be accessed over a telephone line connection. William Overington 31 July 2002
Re: Chromatic text, ligatures and Fraktur ligatures.
Doug Ewell wrote as follows.

quote Nobody with the intelligence of a tree could possibly read the character-glyph document and come away with the impression that font styles, sizes, colors, etc. are central to the notion of what belongs in character encoding. Intelligence is clearly not the problem here. end quote

Actually, I did not write that. What I wrote was as follows.

quote of what I previously wrote Courtyard codes and codes for chromatic fonts, in my opinion, fall within the definition of character in Annex B of that document. This is not me finding some definition tucked away obscurely, it is central. The introduction section of the document states as follows. quote This Technical Report is written for a reader who is familiar with the work of SC 2 and SC 18. Readers without this background should first read Annex B, Characters and Annex C, Glyphs. end quote end quote of what I previously wrote

I have been referred to the ISO/IEC TR 15285 document about characters and glyphs, and yet no one seems willing to discuss the definition of character that is clearly stated in that document. People just keep saying that markup exists, as if the very existence of XML in some way precludes single code point colour codes and single code point formatting codes and so on.

The quote of what I previously wrote is saying that I have not found that definition tucked away obscurely; that definition is central to the ISO/IEC TR 15285 document. That is, I am not trying to push my ideas for colour codes through some obscure legal and technical loophole. I am saying that they are entirely consistent with the definition of character in the ISO/IEC TR 15285 document, where that definition is central to that document.

As you have already made your decision about my research, and indeed about me, I am not going to try to convince you otherwise, and this posting is not intended to do so. I am merely answering a specific accusation as to my ideas and my personality.
Unfortunately, various responses to my research have been on other than its scientific aspects, and unfortunately in human society that type of response outweighs intellectual discussion of the facts, such as the specific fact of the definition of character in the ISO/IEC TR 15285 document, which no one responding to my posts seems willing to discuss. I feel that if the definition of character in the ISO/IEC TR 15285 document is considered, with the meanings of the words in that definition considered, then scientific progress can be made. If people are simply going to question my motives and my personality and not discuss that definition, then that is just an example of the way that human society unfortunately works, in that scientific ideas can be dismissed without explanation by questioning the personality of the person suggesting them. William Overington 9 July 2002