Re: Aw: Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process)

2018-08-23 Thread Mark E. Shoulson via Unicode

On 08/23/2018 06:48 AM, Asmus Freytag (c) via Unicode wrote:

On 8/23/2018 3:28 AM, "Jörg Knappen" wrote:

Asmus,
I know your style of humor, but to keep it straight:
All known human languages, even Piraha, have pronouns for "I" and "you".


And languages like Japanese tend to use them - mostly not.

Even if the concepts are known, and can be named, there are deep 
differences across languages concerning the need or conventions for 
demarcating them with words in any given context.


Replacing words by symbols is not going to fix this - the only way to 
get a 'universal' system of symbolic expression is to invent a new 
language, with its own conventions for use of these symbols in any 
given context.




It isn't like replacing words with symbols hasn't been tried... I think 
Francis Lodwick had a "universal symbology" like this in the works in 
the 1600s.


~mark



Re: Aw: Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process)

2018-08-23 Thread Mark E. Shoulson via Unicode
Still, pronouns may be universal, but their features aren't... Pronouns 
in Japanese are not a closed class, and it is not uncommon to use a 
person's name/title instead of "you".  Happens in English and other 
languages too, with extremely formal speech, even down to conjugating 
with 3rd-person verb forms.  (it's really cool to see the mid-sentence 
back-and-forth shifting in Biblical Hebrew, e.g. Genesis chapter 44.)  
All of which is to say, as Asmus did, that even "I" and "you" are not 
interchangeable pieces between languages, easily symbolized by a single 
"fits-all-languages" placeholder.


~mark

On 08/23/2018 06:28 AM, "Jörg Knappen" via Unicode wrote:

Asmus,
I know your style of humor, but to keep it straight:
All known human languages, even Piraha, have pronouns for "I" and "you".
--Jörg Knappen
*Sent:* Monday, 20 August 2018 at 16:20
*From:* "Asmus Freytag via Unicode" 
*To:* unicode@unicode.org
*Subject:* Re: Thoughts on working with the Emoji Subcommittee (was 
Re: Thoughts on Emoji Selection Process)


What about languages that don't have or don't use personal pronouns. 
Their speakers might find their use odd or awkward.


The same for many other grammatical concepts: they work reasonably 
well if used by someone from a related language, or for linguists 
trained in general concepts, but languages differ so much in what they 
express explicitly that if any native speaker transcribes the features 
that are exposed (and not implied) in their native language it may not 
be what a reader used to a different language is expecting to see.


A./





Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process)

2018-08-23 Thread Julian Wels via Unicode
I think Blissymbols could be a separate, well-defined script in Unicode
because they are already more or less well defined by their respective
groups. This community of interest can lobby for these implementations as a
whole instead of multiple individuals separately.

Emoji were born in quite a different way and are nowhere near as well
defined as Blissymbols, for example. There is no self-governing forum of
people to discuss the future of emoji and forthcoming additions. Obviously
this is because they gained international attention just as they were
added to the Unicode Standard, but maybe also because "working with the
Emoji Subcommittee" is rather hard.

The conversation about Blissymbols made me think of a way to solve the
current communication problem, although it might be a bit radical:
why not remove the authority to propose new emoji from the ESC and give it
to a dedicated, public emoji community? Such a community could formulate
additional guidelines for upcoming emoji, draft roadmaps and send a
quarterly proposal to the ESC for individual approval. Unicode members
could still express ideas and exercise power by participating in the
community and appointing people to the ESC.

[image: diagram.png]

This change would remove pressure and workload from the ESC while letting
it retain most of the control, especially the last word, and the emoji
standard would benefit from a dedicated community.

I'm just putting this out there. What are your thoughts on this? Do you
think this is unreasonable, or achievable?

Julian 

On Tue, Aug 21, 2018 at 3:25 PM James Kass via Unicode 
wrote:

> Rebecca Bettencourt wrote,
>
> > Why don't we just get Blissymbolics encoded as it is?
>
> The Pipeline still has the Everson proposal from 1998, but Blissymbols
> are still in the Pipeline.
>
> Scripts Encoding Initiative
> ( http://linguistics.berkeley.edu/sei/ )
>  page,
> http://linguistics.berkeley.edu/sei/scripts-not-encoded.html
> shows Blissymbols and links the same proposal.
>
> Blissymbolics Communication International,
> http://www.blissymbolics.org/
> will likely produce the next proposal.
>
> Both Scripts Encoding Initiative and Blissymbolics Communication
> International depend upon funding.
>


Emacs Verbose Character Entry (was Private Use Areas)

2018-08-23 Thread Richard Wordingham via Unicode
On Thu, 23 Aug 2018 21:47:03 +0200
"Janusz S. Bień via Unicode"  wrote:

> My needs are very simple, for example C-x 8 Return LATIN CAPITAL
> LETTER A WITH MACRON AND BREVE [MUFI] should yield the character with
> the code E010. I can provide the list of names and codes.

While it should obviously yield, if anything,  or
 for 'LATIN CAPITAL LETTER A WITH MACRON AND
BREVE', it would probably be more important to recognise formal
aliases, such as 'LAO LETTER LO' for the input of the Lao letter lo
ling (U+0EA5 LAO LETTER LO LOOT), not to be confused with the Lao
letter lo lot (a.k.a. ro rot), U+0EA3 LAO LETTER LO LING.
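
[Editorial sketch, not part of the original message.] The point about formal aliases can be tried out in Python, whose standard unicodedata.lookup() resolves formal name aliases as well as primary names. The helper name resolve() is an assumption made for illustration; nothing here describes how Emacs implements C-x 8 Return.

```python
import unicodedata


def resolve(name):
    """Return (code point, primary name) for a character name or formal alias.

    unicodedata.lookup() accepts both the primary name and the formal
    aliases listed in NameAliases.txt; it raises KeyError for unknown names.
    """
    ch = unicodedata.lookup(name)
    return ord(ch), unicodedata.name(ch)


if __name__ == "__main__":
    code_point, primary = resolve("LAO LETTER LO")
    print(f"U+{code_point:04X} {primary}")
```

The alias and the misleading primary name both lead to the same character, which is the behaviour being asked for here.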

For , I prefer to type "A\_M_X", but then I learnt
XSAMPA. 

Richard.



Re: Private Use areas

2018-08-23 Thread Richard Wordingham via Unicode
On Thu, 23 Aug 2018 20:34:20 +0200
"Janusz S. Bień via Unicode"  wrote:

> This is a typical but IMHO obsolete perspective. Fonts are for
> *rendering*, new characters and variants are more and more often
> needed for *input* of real life old texts with sufficient precision.

If we're talking about glyphs which don't actually correspond to new
characters, then that sounds like a good case for private use variation
selectors. To quote Tully, "Abusus non tollit usum".

Richard.



Re: Private Use areas

2018-08-23 Thread Janusz S. Bień via Unicode
On Thu, Aug 23 2018 at 22:17 +0300, e...@gnu.org writes:
>> Date: Thu, 23 Aug 2018 20:30:52 +0200
>> Cc: Richard Wordingham 
>> From: "Janusz S. Bień via Unicode" 
>> 
>> >> and in Emacs - to my disappointment it looks like the Unicode data are
>> >> set at the compile time, but perhaps this can be negotiated with the
>> >> developers.
>> >
>> > Can you be more specific?
>> 
>> I often search characters by name with C-x 8 Return. I would like to use
>> it also for MUFI characters, I have already the name list (the example
>> directory at https://bitbucket.org/jsbien/unihistext/). I haven't looked
>> very closely into the problem and don't remember now the details, but my
>> impression was that it's not simple.
>
> What is "it" in the last sentence?  IOW, what is not simple about that
> with Emacs?

I'm very glad you joined the discussion.

My needs are very simple, for example C-x 8 Return LATIN CAPITAL LETTER
A WITH MACRON AND BREVE [MUFI] should yield the character with the code
E010. I can provide the list of names and codes.
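
[Editorial sketch, not part of the original message.] The requested behaviour amounts to a name lookup that falls back to a private table when the standard database has no match. The Python below is only an illustration of that idea, not of Emacs internals: the table PRIVATE_NAMES and the function lookup_char() are hypothetical, and the single entry is the example given above.

```python
import unicodedata

# Hypothetical private name table. The only entry is the example from this
# message; a real table would be loaded from the MUFI name list.
PRIVATE_NAMES = {
    "LATIN CAPITAL LETTER A WITH MACRON AND BREVE [MUFI]": 0xE010,
}


def lookup_char(name, private_names=PRIVATE_NAMES):
    """Resolve a character name, trying standard Unicode names first.

    Names tagged like "[MUFI]" can never collide with standard names,
    because square brackets do not occur in Unicode character names.
    Raises KeyError when the name is in neither table.
    """
    key = " ".join(name.upper().split())  # normalise case and spacing
    try:
        return unicodedata.lookup(key)
    except KeyError:
        return chr(private_names[key])


if __name__ == "__main__":
    ch = lookup_char("LATIN CAPITAL LETTER A WITH MACRON AND BREVE [MUFI]")
    print(f"U+{ord(ch):04X}")
```

Keeping the tag as part of the key is a design choice of this sketch: it keeps the private names in a namespace of their own, so a later standard name cannot silently shadow them.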

>
> It is true that the Unicode related data is produced at build time,
> but only some of that is actually recorded in the Emacs binary, the
> rest is loaded upon demand.  But all the data is stored in data
> structures that are mutable, given some Lisp programming.

I never was fluent in Lisp programming and by now I forgot almost
everything I knew, so it's not a task for me. I was thinking about
submitting a feature request, but I forgot also the proper procedures to
do it. Moreover I had the impression that I'm the only person who needs
it...

>
> (It is not clear to me which part of the Unicode data you would like
> to change; are you talking about adding characters to the list of
> those defined by Unicode?  If you are using the PUA codepoints, it's
> possible that you will need to update Emacs's notion of PUA as well.)

Yes, I would like the PUA codepoints to be handled analogously to the
proper ones. What do you mean by Emacs's notion of PUA?
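
[Editorial sketch, not part of the original message.] Whatever Emacs does internally (the thread does not say), the standard itself fixes which code points are private use: U+E000..U+F8FF in the BMP plus planes 15 and 16 without their last two code points, all with General_Category Co. A minimal Python check, with the helper name is_private_use() chosen for illustration:

```python
import unicodedata

# The three Private Use Areas defined by the Unicode Standard.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
)


def is_private_use(ch):
    """True if the character lies in one of the Private Use Areas."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


if __name__ == "__main__":
    for ch in ("\ue010", "A"):
        # The range test and the General_Category test should agree.
        print(f"U+{ord(ch):04X}", is_private_use(ch), unicodedata.category(ch))
```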

Best regards

Janusz

-- 
Janusz S. Bien
emeryt (emeritus)
https://sites.google.com/view/jsbien



Re: Private Use areas

2018-08-23 Thread Rebecca Bettencourt via Unicode
On Thu, Aug 23, 2018 at 5:10 AM, Janusz S. Bień  wrote:

> > I already provide this myself for my uses of the PUA as well as the
> > CSUR and any vendor-specific agreements I can find:
> >
> > http://www.kreativekorp.com/charset/PUADATA/
>
> I would prefer to see the data in a repository, so others can
> comment and contribute.
>

That is actually my intent for the future. Though it's not quite ready yet:

https://github.com/kreativekorp/charset/tree/master/puadata

That's the data in a "pre-compiled" form; it's turned into a "proper"
PUADATA directory using this script:

https://github.com/kreativekorp/charset/blob/master/bin/build-public.py
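
[Editorial sketch, not part of the original message.] The file layout used in the PUADATA directory is not described in this thread. Assuming, purely for illustration, that a PUA agreement publishes its properties in the semicolon-separated layout of UnicodeData.txt, a consumer could read the leading fields as below. The sample line and its property values (Lu, 0, L) are hypothetical.

```python
from typing import NamedTuple


class CharProps(NamedTuple):
    code_point: int
    name: str
    general_category: str
    combining_class: int
    bidi_class: str


def parse_line(line):
    """Parse the first five fields of a UnicodeData.txt-style line."""
    fields = line.rstrip("\n").split(";")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}")
    return CharProps(
        code_point=int(fields[0], 16),
        name=fields[1],
        general_category=fields[2],
        combining_class=int(fields[3]),
        bidi_class=fields[4],
    )


# Hypothetical entry: the code point and name come from the MUFI example
# elsewhere in this thread; the property values are assumed.
SAMPLE = "E010;LATIN CAPITAL LETTER A WITH MACRON AND BREVE;Lu;0;L;;;;;N;;;;;"

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```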


> As for "any vendor-specific agreements", do MUFI and LINCUA qualify?
>

I certainly do want to see MUFI and LINCUA provided in this form, but I put
them in a different category along with CSUR. I basically have three
categories of PUA agreements:

Fonts - PUA assignments specific to a font family, e.g. Constructium,
Fairfax, Nishiki-teki, Quivira, Junicode, etc.

Public - PUA agreements meant to be widely used, e.g. CSUR, UCSUR, MUFI,
LINCUA, etc.

Vendors - PUA assignments meant to be used by a single vendor or platform,
e.g. Adobe, Apple, etc. but also Linux, MirOS, etc.

Thank you for those links by the way. I had tried to find charts for MUFI
in the past but had somehow been unsuccessful.


> Of course there is no way to get software to use this information.
>
> What kind of software do you have in mind?
>

Unicode-related utilities, text editors to start with. You pretty much hit
the nail on the head with uniname and emacs as examples. :)


Re: Private Use areas

2018-08-23 Thread Janusz S. Bień via Unicode
On Thu, Aug 23 2018 at 17:26 +0100, unicode@unicode.org writes:
> On Thu, 23 Aug 2018 17:39:15 +0200
> Philippe Verdy via Unicode  wrote:
>
>> You are confusing things: I do not propose "hacking" existing codes, but
>> instead adding new codes for private variations. It's then up to PUV
>> sequence authors to choose an appropriate base character that can
>> have the properties they want to be inherited by the private-use
>> variation sequence, or to choose a base character that will provide
>> some reasonable reading if rendered as is (by renderers or fonts
>> not implementing the private variation sequence, given that they will
>> also append a symbol for the PUV itself after the standard character).
>
> Variation sequences cannot be used to add new characters.  Most PUA
> characters are used to represent new characters.  A
> standard-conformant private variation sequence would generally achieve
> the same effect as could be achieved by a font feature (typically one
> of the cvxx, though possibly one of the ssxx),

This is a typical but IMHO obsolete perspective. Fonts are for
*rendering*, new characters and variants are more and more often needed
for *input* of real life old texts with sufficient precision.

Best regards

Janusz

-- 
Janusz S. Bien
emeryt (emeritus)
https://sites.google.com/view/jsbien


Re: Private Use areas

2018-08-23 Thread Janusz S. Bień via Unicode
On Thu, Aug 23 2018 at 17:11 +0100, unicode@unicode.org writes:
> On Thu, 23 Aug 2018 14:10:35 +0200
> "Janusz S. Bień via Unicode"  wrote:
>
>> What kind of software do you have in mind?
>> 
>> I'm primarily interested in the locally developed programs
>> 
>> https://bitbucket.org/jsbien/unihistext/
>> 
>> https://bitbucket.org/jsbien/fntsample-fork-with-ucd-comments/
>
> It looks as though the security certificates are awry - has someone
> forgotten to pay the protection money to the right people?  (Firefox
> objects with "The page you are trying to view cannot be shown because
> the authenticity of the received data could not be verified.")

I see no such problems with Firefox ESR 52.9.0 on Debian
testing. Moreover the program reports that the certificate is valid till
04/21/2020.

>
>> and in Emacs - to my disappointment it looks like the Unicode data are
>> set at the compile time, but perhaps this can be negotiated with the
>> developers.
>
> Can you be more specific?

I often search characters by name with C-x 8 Return. I would like to use
it also for MUFI characters, I have already the name list (the example
directory at https://bitbucket.org/jsbien/unihistext/). I haven't looked
very closely into the problem and don't remember now the details, but my
impression was that it's not simple.

Best regards

Janusz

-- 
Janusz S. Bien
emeryt (emeritus)
https://sites.google.com/view/jsbien



Re: Private Use areas

2018-08-23 Thread Philippe Verdy via Unicode
On Thu, 23 Aug 2018 at 18:31, Richard Wordingham via Unicode <
unicode@unicode.org> wrote:

> On Thu, 23 Aug 2018 17:39:15 +0200
> Philippe Verdy via Unicode  wrote:
>
> > You are confusing things: I do not propose "hacking" existing codes, but
> > instead adding new codes for private variations. It's then up to PUV
> > sequence authors to choose an appropriate base character that can
> > have the properties they want to be inherited by the private-use
> > variation sequence, or to choose a base character that will provide
> > some reasonable reading if rendered as is (by renderers or fonts
> > not implementing the private variation sequence, given that they will
> > also append a symbol for the PUV itself after the standard character).
>
> Variation sequences cannot be used to add new characters.


Do you remember that I did not speak about existing variation sequences?
Only about the new encoding of private-use variation sequences, which do
not have to obey the policy of existing VS, and whose purpose would be to
inherit most properties (notably direction, breaking, spacing, general
category) of another existing character.
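
[Editorial sketch, not part of the original message.] The inheritance idea can be stated as: every property of a private-use variation sequence is read from its base character, and the selector contributes none. No private-use variation selector exists in Unicode, so the constant PUV below is only a stand-in (a plane 15 private-use code point), and sequence_properties() is a hypothetical helper.

```python
import unicodedata

# Stand-in only: Unicode defines no private-use variation selector (PUV).
# A Supplementary Private Use Area-A code point plays that role here.
PUV = "\U000F0000"


def sequence_properties(sequence):
    """Return the properties a <base, PUV> sequence would inherit.

    In this sketch the properties are simply those of the base
    character; the selector itself is ignored.
    """
    if not sequence:
        raise ValueError("empty sequence")
    base = sequence[0]
    return {
        "general_category": unicodedata.category(base),
        "bidi_class": unicodedata.bidirectional(base),
        "combining_class": unicodedata.combining(base),
    }


if __name__ == "__main__":
    print(sequence_properties("A" + PUV))
```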



> Most PUA
> characters are used to represent new characters.


I did not speak about PUAs either.


Re: Private Use areas

2018-08-23 Thread Richard Wordingham via Unicode
On Thu, 23 Aug 2018 17:39:15 +0200
Philippe Verdy via Unicode  wrote:

> You are confusing things: I do not propose "hacking" existing codes, but
> instead adding new codes for private variations. It's then up to PUV
> sequence authors to choose an appropriate base character that can
> have the properties they want to be inherited by the private-use
> variation sequence, or to choose a base character that will provide
> some reasonable reading if rendered as is (by renderers or fonts
> not implementing the private variation sequence, given that they will
> also append a symbol for the PUV itself after the standard character).

Variation sequences cannot be used to add new characters.  Most PUA
characters are used to represent new characters.  A
standard-conformant private variation sequence would generally achieve
the same effect as could be achieved by a font feature (typically one
of the cvxx, though possibly one of the ssxx), though using font
features would be fiddlier and have more limited support, and variation
sequences would facilitate data processing.
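
[Editorial sketch, not part of the original message.] The data-processing advantage is that variation sequences are visible in the plain text itself. The Python below pairs each character with a following variation selector, using the two standard selector ranges (U+FE00..U+FE0F and U+E0100..U+E01EF). The function names are assumptions made for illustration.

```python
def is_variation_selector(ch):
    """True for VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def variation_sequences(text):
    """Split text into (base, selector) pairs; selector is None if absent.

    A selector with no preceding base (for example at the start of the
    text) is returned as its own item with selector None.
    """
    pairs = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else None
        if nxt is not None and not is_variation_selector(ch) and is_variation_selector(nxt):
            pairs.append((ch, nxt))
            i += 2
        else:
            pairs.append((ch, None))
            i += 1
    return pairs


if __name__ == "__main__":
    for base, selector in variation_sequences("a\ufe00b"):
        print(repr(base), None if selector is None else f"U+{ord(selector):04X}")
```

A font feature such as cvxx leaves no comparable trace in the character stream, which is why a search or collation tool cannot see it.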

Richard.


Re: Private Use areas

2018-08-23 Thread Richard Wordingham via Unicode
On Thu, 23 Aug 2018 14:10:35 +0200
"Janusz S. Bień via Unicode"  wrote:

> What kind of software do you have in mind?
> 
> I'm primarily interested in the locally developed programs
> 
> https://bitbucket.org/jsbien/unihistext/
> 
> https://bitbucket.org/jsbien/fntsample-fork-with-ucd-comments/

It looks as though the security certificates are awry - has someone
forgotten to pay the protection money to the right people?  (Firefox
objects with "The page you are trying to view cannot be shown because
the authenticity of the received data could not be verified.")

> and in Emacs - to my disappointment it looks like the Unicode data are
> set at the compile time, but perhaps this can be negotiated with the
> developers.

Can you be more specific?  For Indic rearrangement I had to define
syllables myself with definitions which I then added to
composition-function-table.  Unfortunately, I then hit the problem
that I had to define Indic rearrangement myself, and OpenType fonts
fall into several incompatible families, which is why I haven't
released a general solution.  My emacs kit for Tai Tham is given via
http://www.wrdingham.co.uk/lanna/toolkit.html (a probable kinsman got
the 'o'), but there are a lot of odds and ends that need sorting out.

I would expect that you would be able to override any relevant
'compiler' settings via your Emacs startup file - I expect Eli
Zaretskii will be along soon with more details.  Of course, you could
always revert to the old tradition and recompile Emacs yourself -
though it may need something like MinGW to compile for Windows.

Richard.



Re: Private Use areas

2018-08-23 Thread Philippe Verdy via Unicode
You are confusing things: I do not propose "hacking" existing codes, but
instead adding new codes for private variations. It's then up to PUV
sequence authors to choose an appropriate base character that can have
the properties they want to be inherited by the private-use variation
sequence, or to choose a base character that will provide some reasonable
reading if rendered as is (by renderers or fonts not implementing the
private variation sequence, given that they will also append a symbol for
the PUV itself after the standard character).

Also, I do not want to change anything about existing variation sequences
(using VS1 and so on) or their encoding policies, which require prior
registration and standardisation.

On Thu, 23 Aug 2018 at 11:42, Richard Wordingham via Unicode <
unicode@unicode.org> wrote:

> On Wed, 22 Aug 2018 11:58:58 +0200
> Philippe Verdy via Unicode  wrote:
>
> > For now there's still no way to have variant sequences unless they are
> > registered and standardized by Unicode but registration should not be
> > needed (forbidden) for sequences containing PUV.
>
> I believe this scheme is no worse than hack encodings that use Latin
> character codes for other characters.  These schemes often work.
> (Indeed, the currently best method of getting Tai Tham displayed as rich
> text that I can find is to use a transliteration-type encoding and a
> special font, though I can now get pretty close using the proper
> character codes in the order laid down in the proposals.)
>
> The major problems I can see with appropriating variation sequences
> are:
> (1) It might be restricted to base characters - I have no
> experimental evidence on whether this would happen.  Fonts can happily
> convert base characters to combining characters, though this works
> best if Latin line-breaking rules take effect.
>
> (2) The appropriated variation sequence might be assigned a meaning -
> but this is no worse than the general ambiguity of PUA characters.
>
> (3) Some base characters get special treatment.  For example, I had
> to change my transliteration scheme because hyphen-minus is treated
> specially by MS Edge - I was using it as a digraph disjunctor - and
> so clusters were not being formed.  In this case, I would have come
> unstuck as soon as line-wrapping started, so it was a bad choice anyway.
>
> Or are there significant renderers that deliberately ignore variation
> selectors in unregistered, unstandardised variation sequences?  I don't
> recall any problems from when we were discussing variation
> sequences for chess pieces.
>
> For supplementing a script, it might be best to start at
> VARIATION-SELECTOR-256, and work down if need be with specialist
> characters.
>
> Richard.
>


Re: Private Use areas

2018-08-23 Thread Janusz S. Bień via Unicode
On Tue, Aug 21 2018 at 11:23 -0700, unicode@unicode.org writes:
> On Tue, Aug 21, 2018 at 10:21 AM, Janusz S. Bień via Unicode 
>  wrote:
>
>  I think PUA users should provide the
>  properties of the characters used in a form analogous to that of
>  Unicode itself, and the software should be able to use this additional
>  information.
>
> I already provide this myself for my uses of the PUA as well as the
> CSUR and any vendor-specific agreements I can find:
>
> http://www.kreativekorp.com/charset/PUADATA/

I would prefer to see the data in a repository, so others can
comment and contribute.

As for "any vendor-specific agreements", do MUFI and LINCUA qualify?

https://folk.uib.no/hnooh/mufi/
http://andron-typeforum.xobor.de/t10f13-Towards-a-linguistic-corporate-use-area-LINCUA.html

>
> Of course there is no way to get software to use this information.

What kind of software do you have in mind?

I'm primarily interested in the locally developed programs

https://bitbucket.org/jsbien/unihistext/

https://bitbucket.org/jsbien/fntsample-fork-with-ucd-comments/

and in Emacs - to my disappointment it looks like the Unicode data are set
at the compile time, but perhaps this can be negotiated with the
developers.

Best regards

Janusz

-- 
Janusz S. Bien
emeryt (emeritus)
https://sites.google.com/view/jsbien



Re: Aw: Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process)

2018-08-23 Thread Asmus Freytag (c) via Unicode

On 8/23/2018 3:28 AM, "Jörg Knappen" wrote:

Asmus,
I know your style of humor, but to keep it straight:
All known human languages, even Piraha, have pronouns for "I" and "you".


And languages like Japanese tend to use them - mostly not.

Even if the concepts are known, and can be named, there are deep 
differences across languages concerning the need or conventions for 
demarcating them with words in any given context.


Replacing words by symbols is not going to fix this - the only way to 
get a 'universal' system of symbolic expression is to invent a new 
language, with its own conventions for use of these symbols in any given 
context.


A./


--Jörg Knappen
*Sent:* Monday, 20 August 2018 at 16:20
*From:* "Asmus Freytag via Unicode" 
*To:* unicode@unicode.org
*Subject:* Re: Thoughts on working with the Emoji Subcommittee (was 
Re: Thoughts on Emoji Selection Process)


What about languages that don't have or don't use personal pronouns. 
Their speakers might find their use odd or awkward.


The same for many other grammatical concepts: they work reasonably 
well if used by someone from a related language, or for linguists 
trained in general concepts, but languages differ so much in what they 
express explicitly that if any native speaker transcribes the features 
that are exposed (and not implied) in their native language it may not 
be what a reader used to a different language is expecting to see.


A./





Aw: Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process)

2018-08-23 Thread Jörg Knappen

Asmus,

I know your style of humor, but to keep it straight:

All known human languages, even Piraha, have pronouns for "I" and "you".

--Jörg Knappen

Sent: Monday, 20 August 2018 at 16:20
From: "Asmus Freytag via Unicode" 
To: unicode@unicode.org
Subject: Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process)

 

What about languages that don't have or don't use personal pronouns. Their speakers might find their use odd or awkward.

The same for many other grammatical concepts: they work reasonably well if used by someone from a related language, or for linguists trained in general concepts, but languages differ so much in what they express explicitly that if any native speaker transcribes the features that are exposed (and not implied) in their native language it may not be what a reader used to a different language is expecting to see.

A./

 

 






Re: Private Use areas

2018-08-23 Thread Richard Wordingham via Unicode
On Wed, 22 Aug 2018 11:58:58 +0200
Philippe Verdy via Unicode  wrote:

> For now there's still no way to have variant sequences unless they are
> registered and standardized by Unicode but registration should not be
> needed (forbidden) for sequences containing PUV.

I believe this scheme is no worse than hack encodings that use Latin
character codes for other characters.  These schemes often work.
(Indeed, the currently best method of getting Tai Tham displayed as rich
text that I can find is to use a transliteration-type encoding and a
special font, though I can now get pretty close using the proper
character codes in the order laid down in the proposals.)

The major problems I can see with appropriating variation sequences
are:
(1) It might be restricted to base characters - I have no
experimental evidence on whether this would happen.  Fonts can happily
convert base characters to combining characters, though this works
best if Latin line-breaking rules take effect.

(2) The appropriated variation sequence might be assigned a meaning -
but this is no worse than the general ambiguity of PUA characters.

(3) Some base characters get special treatment.  For example, I had
to change my transliteration scheme because hyphen-minus is treated
specially by MS Edge - I was using it as a digraph disjunctor - and
so clusters were not being formed.  In this case, I would have come
unstuck as soon as line-wrapping started, so it was a bad choice anyway.

Or are there significant renderers that deliberately ignore variation
selectors in unregistered, unstandardised variation sequences?  I don't
recall any problems from when we were discussing variation
sequences for chess pieces.

For supplementing a script, it might be best to start at
VARIATION-SELECTOR-256, and work down if need be with specialist
characters.
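
[Editorial sketch, not part of the original message.] "Start at VARIATION-SELECTOR-256 and work down" can be read as handing out selectors from U+E01EF towards U+E0100, away from the low-numbered selectors that registered sequences tend to use. The allocator below is a hypothetical illustration of that numbering only; it says nothing about whether renderers honour such sequences.

```python
import unicodedata

VS17 = 0xE0100   # first selector of the Variation Selectors Supplement
VS256 = 0xE01EF  # last selector of the Variation Selectors Supplement


def allocate_selectors(count):
    """Return `count` selectors, starting at VS256 and working downwards."""
    available = VS256 - VS17 + 1  # 240 supplementary selectors
    if not 0 <= count <= available:
        raise ValueError(f"count must be between 0 and {available}")
    return [chr(VS256 - i) for i in range(count)]


if __name__ == "__main__":
    for vs in allocate_selectors(3):
        print(f"U+{ord(vs):05X}", unicodedata.name(vs))
```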

Richard.