Re: What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-15 Thread wjgo_10...@btinternet.com via Unicode


Joel Kalvesmaki asks nine questions, six in the first block and three in 
the second block.
Numbering them 1 to 9 in the order in which they are asked, I do not at 
present understand many of the questions, and I can at present answer 
only question 7 definitively. Some questions may need an answer in two 
parts: one part about my specific project, and the other part about 
what happens if one or more other people also decide to have his or her 
own encoding space in a similar manner.
I realize that not even understanding the questions at this time may 
not sound very good to those people who do understand them, but I am 
not someone who knowingly purports to know what he is talking about 
when he does not. I am a researcher and, now that I am aware of these 
questions, I need to find out about them so that in the future I can 
answer such questions with a sound background knowledge of the topic.
It might be that I know of some of these matters but am not aware of 
the parlance used to describe them in the post to which I am replying.

So now to my thoughts on some of the questions.
1 to 4. I do not at present understand the question.
5. Perhaps, independent of each other, you bind !123 to a character 
semantically identical to one I've bound to !234. What rules are in 
place to allow interchangeability?
I am not sure this is the best possible answer, but with care the 
problem should not arise in the first place. I am thinking that people 
could perhaps avoid it by using an informal discussion method similar 
to that used when proposing a new alt. group in the Usenet system that 
was in widespread use before the web was invented.
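
As a rough illustration of the kind of check such an informal discussion 
could include, here is a minimal Python sketch. It assumes that each 
person publishes a table from code number to a short meaning label, and 
that two entries count as semantically identical when the labels are 
equal. Both assumptions are mine for the purpose of the sketch, and the 
codes and labels are invented placeholders, not entries from any real 
encoding space.

```python
def same_meaning_different_code(mine, yours):
    """Report meanings that the two tables bind to different codes.

    mine, yours: dict mapping code number (str) -> meaning label (str).
    Returns a sorted list of (meaning, my_code, your_code) tuples.
    """
    # Invert my table so that each meaning can be looked up directly.
    mine_by_meaning = {meaning: code for code, meaning in mine.items()}
    return sorted(
        (meaning, mine_by_meaning[meaning], code)
        for code, meaning in yours.items()
        if meaning in mine_by_meaning and mine_by_meaning[meaning] != code
    )


# Hypothetical tables: the labels are placeholders only.
mine = {"123": "greeting", "500": "farewell"}
yours = {"234": "greeting", "500": "farewell"}
print(same_meaning_different_code(mine, yours))
```

In practice, deciding whether two items really are semantically 
identical is a human judgement, which is why a discussion before 
assignment seems the safer route.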

6. I do not at present understand the question.
7. Or maybe you're not so much concerned about interoperability as you 
are with extending the PUA block beyond its current limits?
No, absolutely not. I have used the Private Use Areas on a number of 
occasions and found them extremely useful to have available. Yet an 
assignment is not unique and, except in very limited prearranged 
circumstances, interoperability is not possible. My research project is 
very much about interoperability with provenance. Interoperability with 
provenance is central to what I am trying to achieve.

8. Something like SGML/XML entities?
Until their mention in the post to which I am replying, I had never 
known of them.
9. Couldn't you simply capitalize on the rules that already exist for 
entities?
From what I have read about them today, well, I suppose that I could, 
but that is not my approach and I am not going to use them.
My items are not emoji, but emoji are expressed either by an atomic 
character or by a sequence of atomic characters, such sequences being 
decoded upon reception to produce a glyph. My proposed system uses 
sequences of atomic characters, such that the sequences can be decoded 
upon reception to produce localized output. A similar yet different 
process. I simply do not want, as a design choice, all that angled 
bracket stuff; it is just not what I am trying to do.
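
To make "decoded upon reception to produce localized output" concrete, 
here is a minimal Python sketch. It assumes the !-plus-digits notation 
with at least three digits, and a lookup table from code number to text 
in each language. The table contents, the language tags and the function 
name are invented placeholders for illustration; they are not entries 
from the research project.

```python
import re

# Hypothetical table: code number -> {language tag -> localized text}.
LOCALIZED = {
    "781": {"en": "Good morning.", "fr": "Bonjour."},
}

# "!" followed by three or more ASCII digits.
CODE = re.compile(r"!([0-9]{3,})")


def render(text, lang, table=LOCALIZED):
    """Replace each known code in text with its text in the given language.

    Codes that are not in the table, or that have no entry for the
    requested language, are left exactly as received.
    """
    def replace(match):
        entry = table.get(match.group(1), {})
        return entry.get(lang, match.group(0))

    return CODE.sub(replace, text)


print(render("!781", "en"))
print(render("!781", "fr"))
```

The same received sequence yields different output depending only on the 
receiver's language setting, and no markup syntax is involved.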


If anyone on this mailing list understands some or all of what I do 
not, your comments in this thread would be very welcome.
The first three links on my webspace are relevant to my research 
project.

http://www.users.globalnet.co.uk/~ngo/
The website is safe to use. It is hosted on a server run these days by 
Plusnet PLC, a United Kingdom internet service provider. It is not 
hosted on my computer.

William Overington
Saturday 15 February 2020



-- Original Message --
From: "via Unicode" 
To: wjgo_10...@btinternet.com
Cc: unicode@unicode.org
Sent: Saturday, 2020 Feb 15 At 10:11
Subject: Re: What should or should not be encoded in Unicode? (from Re: 
Egyptian Hieroglyph Man with a Laptop)

Hi William,

I don't fully understand your proposed encoding scheme (e.g., Is there a 
namespace each encoding scheme is bound to? How do namespaces get 
encoded? How are syntax strictures encoded?), but even then, presuming 
it's sound, you've said in the message before that this encoding space 
will enhance interoperability. What mechanism is in place to make my 
encoding space interoperable with yours? Perhaps, independent of each 
other, you bind !123 to a character semantically identical to one I've 
bound to !234. What rules are in place to allow interchangeability? What 
about one-to-many or many-to-many or vague or ambiguous mappings across 
encoding schemes, or mappings that we might reasonably contest?


Or maybe you're not so much concerned about interoperability as you 
are with extending the PUA block beyond its current limits? Something 
like SGML/XML entities? Couldn't you simply capitalize on the rules that 
already exist for entities?


Best wishes,

jk
--
Joel Kalvesmaki
Director, Text Alignment Network
http://textalign.net 

On 2020-02-14 15:52, wjgo_10...@btinternet.com via Unicode wrote:
The 

Re: What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-15 Thread via Unicode

Hi William,

I don't fully understand your proposed encoding scheme (e.g., Is there a 
namespace each encoding scheme is bound to? How do namespaces get 
encoded? How are syntax strictures encoded?), but even then, presuming 
it's sound, you've said in the message before that this encoding space 
will enhance interoperability. What mechanism is in place to make my 
encoding space interoperable with yours? Perhaps, independent of each 
other, you bind !123 to a character semantically identical to one I've 
bound to !234. What rules are in place to allow interchangeability? What 
about one-to-many or many-to-many or vague or ambiguous mappings across 
encoding schemes, or mappings that we might reasonably contest?


Or maybe you're not so much concerned about interoperability as you 
are with extending the PUA block beyond its current limits? Something 
like SGML/XML entities? Couldn't you simply capitalize on the rules that 
already exist for entities?


Best wishes,

jk
--
Joel Kalvesmaki
Director, Text Alignment Network
http://textalign.net

On 2020-02-14 15:52, wjgo_10...@btinternet.com via Unicode wrote:
The solution is to invent my own encoding space. This sits on top of 
Unicode, could be (perhaps?) called markup, but it works!


It may be perilous, because some software may enforce the strict 
official code point limits.


I have now realized that what I wrote before is ambiguous.

When I wrote "sits on top of Unicode" I was not meaning at some code
points above U+10FFFF in the Unicode map, though I accept that it
could quite reasonably be read as meaning that.

My encoding space sits on top of Unicode in the sense that it uses a
sequence of regular Unicode characters for each code point in my
encoding space.

For example

∫⑦⑧①

or

!781

or

a character sequence of a base character, followed by a tag
exclamation mark followed by three tag digits and a cancel tag.

All three examples above have the same meaning.

∫⑦⑧① is useful as it is less likely than !123 to occur in text for
some other reason, though !123 is easier to use and could be used in a
GS1-128 barcode.

The tag sequence has the potential to become incorporated into Unicode
for universal standardization of unambiguous interoperability
everywhere. That is a long term goal for me.

The example above uses a three-digit code number. My encoding space
allows for various numbers of digits, with a minimum of three digits
and a much larger theoretical maximum. The most digits in use at
present in my research project in any one code number is six.

William Overington

Friday 14 February 2020


Re: What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-14 Thread wjgo_10...@btinternet.com via Unicode
The solution is to invent my own encoding space. This sits on top of 
Unicode, could be (perhaps?) called markup, but it works!


It may be perilous, because some software may enforce the strict 
official code point limits.


I have now realized that what I wrote before is ambiguous.

When I wrote "sits on top of Unicode" I was not meaning at some code 
points above U+10FFFF in the Unicode map, though I accept that it could 
quite reasonably be read as meaning that.


My encoding space sits on top of Unicode in the sense that it uses a 
sequence of regular Unicode characters for each code point in my 
encoding space.
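
A small Python sketch may make the distinction clearer. The highest code 
point in the Unicode codespace is U+10FFFF, and software that enforces 
that limit rejects anything above it. A sequence of regular characters 
stays entirely inside the limit. The function names here are my own, 
chosen only for illustration.

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point in the Unicode codespace


def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def is_valid_code_point(n):
    """True if n lies inside the Unicode codespace."""
    return 0 <= n <= MAX_CODE_POINT


# The integral sign followed by circled digits seven, eight and one.
print(code_points("\u222B\u2466\u2467\u2460"))
# → ['U+222B', 'U+2466', 'U+2467', 'U+2460']

# A value just above the limit is not a code point at all.
print(is_valid_code_point(0x110000))
# → False
```

Every element of the sequence is an ordinary assigned character, so 
nothing in it depends on code points beyond the official range.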


For example

∫⑦⑧①

or

!781

or

a character sequence of a base character, followed by a tag exclamation 
mark followed by three tag digits and a cancel tag.


All three examples above have the same meaning.
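
As a sketch of how a receiving program might treat the three forms as 
equivalent, the following Python function reduces each of them to the 
same string of digits. The tag characters used are the standard ones: 
U+E0021 TAG EXCLAMATION MARK, U+E0030 to U+E0039 for the tag digits, and 
U+E007F CANCEL TAG. The function name is my own, and the choice of base 
character is left open because it is not specified above; the sketch 
accepts any single character in that position.

```python
from typing import Optional

TAG_EXCLAMATION_MARK = "\U000E0021"
CANCEL_TAG = "\U000E007F"

# Circled digits: U+2460..U+2468 are 1..9, U+24EA is 0.
CIRCLED = {chr(0x2460 + i): str(i + 1) for i in range(9)}
CIRCLED["\u24EA"] = "0"


def decode(form: str) -> Optional[str]:
    """Return the digits of the code number, or None if not recognised."""
    # Form 1: integral sign followed by circled digits.
    if form.startswith("\u222B"):
        body = form[1:]
        if body and all(ch in CIRCLED for ch in body):
            return "".join(CIRCLED[ch] for ch in body)
        return None
    # Form 2: exclamation mark followed by ASCII digits.
    if form.startswith("!") and form[1:].isascii() and form[1:].isdigit():
        return form[1:]
    # Form 3: base character, tag exclamation mark, tag digits, cancel tag.
    if (len(form) >= 4 and form[1] == TAG_EXCLAMATION_MARK
            and form.endswith(CANCEL_TAG)):
        body = form[2:-1]
        if body and all(0xE0030 <= ord(ch) <= 0xE0039 for ch in body):
            # A tag digit is its ASCII counterpart offset by 0xE0000.
            return "".join(chr(ord(ch) - 0xE0000) for ch in body)
    return None


print(decode("\u222B\u2466\u2467\u2460"))
print(decode("!781"))
```

Both calls print 781, and a tag sequence carrying the tag digits 7, 8 
and 1 decodes to the same value.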

∫⑦⑧① is useful as it is less likely than !123 to occur in text for 
some other reason, though !123 is easier to use and could be used in a 
GS1-128 barcode.


The tag sequence has the potential to become incorporated into Unicode 
for universal standardization of unambiguous interoperability 
everywhere. That is a long term goal for me.


The example above uses a three-digit code number. My encoding space 
allows for various numbers of digits, with a minimum of three digits and 
a much larger theoretical maximum. The most digits in use at present in 
my research project in any one code number is six.
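
The rule on the number of digits can be sketched as a pattern match. 
This Python fragment finds codes in the ! notation within running text, 
accepting three or more digits and ignoring shorter runs. The pattern 
and the function name are my own illustration and no upper limit is 
imposed, since the theoretical maximum is not stated here.

```python
import re

# "!" followed by three or more ASCII digits.
CODE = re.compile(r"!([0-9]{3,})")


def find_codes(text):
    """Return the code numbers found in text, in order of appearance."""
    return [match.group(1) for match in CODE.finditer(text)]


print(find_codes("Start !781 then !123456 but not !12 here"))
# → ['781', '123456']
```

Note that the ! form can match by accident, for example in "Only 5 
left!100 more", which is one reason a rarer introducer such as ∫ with 
circled digits can be preferable in ordinary text.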


William Overington

Friday 14 February 2020




Re: What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-14 Thread Hans Åberg via Unicode

> On 13 Feb 2020, at 16:41, wjgo_10...@btinternet.com via Unicode 
>  wrote:
> 
> Yet a Private Use Area encoding at a particular code point is not unique. 
> Thus, except with care amongst people who are aware of the particular 
> encoding, there is no interoperability, such as with regular Unicode encoded 
> characters.
> 
> However faced with a need for interoperability for my research project, I 
> have found a solution making use of the Glyph Substitution capability of an 
> OpenType font.
> 
> The solution is to invent my own encoding space. This sits on top of Unicode, 
> could be (perhaps?) called markup, but it works!

It may be perilous, because some software may enforce the strict official code 
point limits.



What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-13 Thread wjgo_10...@btinternet.com via Unicode
Hans Åberg >>> From the point of view of Unicode, it is simpler: If the 
character is in use or has had use, it should be included somehow.


Shawn Steele >> That bar, to me, seems too low. Many things are only 
used briefly or in a private context that doesn't really require 
encoding.


Hans Åberg > That is a private use area for more special use.

I have used the Private Use Area quite a lot over many years.

I have a licence for a fontmaking program, FontCreator. A good feature 
of the Windows operating system is that all installed fonts can be used 
in most installed programs. Private Use Area code points are official 
Unicode code points. These three factors together allow me to design and 
produce TrueType fonts for new symbols each encoded at a Private Use 
Area code point (a different code point for each such novel symbol), 
install the fonts, and use them in various programs, including a desktop 
publishing program and thereby make PDF (Portable Document Format) 
documents that include both ordinary text and the novel symbols. These 
PDF documents are then suitable for placing on the web and for Legal 
Deposit with The British Library.


Yet a Private Use Area encoding at a particular code point is not 
unique. Thus, except with care amongst people who are aware of the 
particular encoding, there is no interoperability of the kind that 
exists with regular Unicode encoded characters.
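
For reference, a program can tell that a code point is private use, but 
nothing more than that. In this short Python sketch the general category 
"Co" identifies the three private use ranges (U+E000 to U+F8FF, 
U+F0000 to U+FFFFD and U+100000 to U+10FFFD). The function name is my 
own.

```python
import unicodedata


def is_private_use(ch):
    """True if the character has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"


print(is_private_use("\uE000"))      # first code point of the BMP private use area
print(is_private_use("A"))           # an ordinary letter
print(unicodedata.name("\uE000", "no standard name"))
```

The standard assigns no name or meaning to such a code point, so the 
meaning has to come from an agreement between sender and receiver.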


However, faced with a need for interoperability for my research 
project, I have found a solution making use of the Glyph Substitution 
capability of an OpenType font.


The solution is to invent my own encoding space. This sits on top of 
Unicode, could be (perhaps?) called markup, but it works!


I am hoping that at some future time the results of my research will 
become encoded as an International Standard, and that my encoding space 
will after that become integrated into Unicode, thus achieving fully 
standardized unique interoperable encoding as part of Unicode. Quite a 
dream, but the way to achieve such a fully standardized unique 
interoperable encoding as part of Unicode is, from a technological 
point of view, quite straightforward. There are details of this in the 
Accumulated Feedback on Public Review Issue #408.


https://www.unicode.org/review/pri408/

Yet having my encoding space in this manner is just something that I 
have done on my own initiative. Anybody can have his or her own encoding 
space if he or she so chooses. With a little care and consideration for 
others these encodings need not clash one with another and all could 
even coexist in one document.


Having my own encoding space has enabled me to make progress with my 
research project.


William Overington

Thursday 13 February 2020