Re: Unicode block for programming related symbols and codepoints?

2015-02-09 Thread John D Burger
 - Indentation codepoint, with no fixed defined graphical representation. For 
 indentation based programming languages. 
 
 That wouldn’t be compliant with existing languages and future languages might 
 use any existing character. 
 
 Because: 
 -- specific clients may want to show it differently (for example as arrows,
 lines etc., using another color):
 
 Can’t good editors display tabs in a different color when required ? 

Lots of them already do, e.g. Emacs in various modes.

- John Burger
  MITRE

 
 --- browsers could let the web page creator decide the visual
 representation (character and size) via CSS
 --- the same with editors, independent from the actual font
 --- in case of visual impairment, the user could even change the acoustic
 representation if the editor allows it
 -- unlike a space symbol, it wouldn't need more than one character per 
 indentation 
 -- unlike tabs or space, it wouldn't be whitespace 
 -- unlike normal arrow characters, one could customize the length in an 
 editor and wouldn't have to insert extra spaces for a better visual imagery 
 
 - A codepoint for string literal quotes, that would spare one the escaping. 
 
 I rarely escape quotes. 
 In a text, I use ’ (U+2019) as an apostrophe and «»“”‘’ as quotes, so I don’t 
 need to escape them. 
 When I use PHP to generate some HTML code, I try to alternate simple and 
 double quotes as much as possible. That way I rarely need to escape them. 
 
 - A statement separator symbol. 
 
 To replace the semicolon in C and the languages based on its syntax? 
 
 - Other ideas? 
 
 Aren’t you trying to reinvent APL? 
 
 
 You may now think, this is highly specific and you are right. 
 However, so are EMOJI signs, in particular those like PINE DECORATION. 
 
 These days, there are a lot of tools to create small embedded scripting 
 languages and DSLs, which are used in-program in special editors. And there
 are a lot of people using them.
 Exactly these could really profit from such a codeblock instead of using 
 conventional ASCII subset characters. 
 Also, there is a lot of potential with really good text editors and IDEs 
 where semantics may matter a lot. 
 
 Excuse my English, I hope this was understandable.
 
 Best regards, 
 
 A. Z. 
 ___ 
 Unicode mailing list 
 Unicode@unicode.org 
 http://unicode.org/mailman/listinfo/unicode 
 


Re: About cultural/languages communities flags

2015-02-09 Thread Mark Davis ☕️
On Tue, Feb 10, 2015 at 12:11 AM, Ken Whistler kenwhist...@att.net wrote:

 for the full context, and for the current 26x26 letter matrix which is
 the basis for the flag glyph implementations of regional indicator
 code pairs on smartphones.

 SC, SO, ST are already taken, but might I suggest putting in for
 registering AB for Alba? That one is currently unassigned.

 Yeah, yeah, what is the likelihood of BSI pushing for a Scots two-letter
 code?! But seriously, if folks are planning ahead for Scots independence
 or even some kind of greater autonomy, this is an issue that needs to
 be worked, anyway.

 In the meantime, let me reiterate that there is *no* formal relationship
 between TLD's and the regional indicator codes in Unicode (or the
 implementations built upon them). Well, yes, a bunch of registered TLD's do
 match the country codes, but there is no two-letter constraint on TLD's.
 This should already be apparent, as Scotland has registered .scot. At this
 point there isn't even a limitation of TLD's to ASCII letters, so there is
 no way to map them to the limited set of regional indicator codes in the
 Unicode Standard.

 Not having a two letter country code for Scotland that matches the
 four letter TLD for Scotland might indeed be a problem for someone,
 but I don't see *this* as a problem that the Unicode Standard needs
 to solve.


I want to add that there are already a fair number of ISO 2-letter
codes for regions that are administered as part of another country, like
Hong Kong. There are also codes for crown possessions like Guernsey. So
having a code for Scotland (and Wales, and N. Ireland) does not really break
precedent. But as Ken says, the best mechanism is for the UK to push for a
code in ISO and the UN.

Mark https://google.com/+MarkDavis

*— Il meglio è l’inimico del bene —*


Re: Unicode block for programming related symbols and codepoints?

2015-02-09 Thread Ken Whistler
I think this discussion is confusing the need for separate syntactic functions
in formal language definitions with the need for *encoding* of characters.

The distinction between assignment and test for equality has been around for
decades in formal languages, and of course it is almost always carefully
distinguished in the formal syntax:

C, C++ and kindred

Use = for assignment.
Use == for equivalence operator.

Pascal and kindred

Use := for assignment.
Use = for equivalence operator.

Lisp

Assignment: (let ((a 6)) ...)
Equivalence evaluation: (= a 6)

And so on. The fact that these formal languages do not use a *single* distinct
character for each of these syntactic functions is not a formal defect -- there
are many, many concepts in formal languages which are defined using
sequences of characters, rather than a single character. As has already
been alluded to in this thread, trying to stack all functionality into
single-character definitions heads back in the direction of relatively illegible
APL program text. It might have its place, but isn't much of a choice for
widely used general programming languages.

There are two basic issues with using sequences of (typically ASCII) characters
for fundamental operators:

1. It marginally complicates parsing.
2. If chosen badly, they can confuse programmers using the syntax.

#1 is basically trivial, as long as the formal syntax passes the bar of not
introducing syntactic ambiguity.

#2 is the *real* problem, imo. The use in C of = and == was badly designed
from the start, and is the source of bezillions of inadvertent programming
errors in practice.

But if a left arrow, for example, might be a better choice for an assignment
operator in a programming language, and a two-character ASCII operator
like := or <- doesn't seem appropriate or causes other confusion, there
still isn't a character *encoding* issue here. Just use ←, which already
exists (U+2190), and is a fine left arrow!

What is *not* appropriate for Unicode consideration here is trying to
encode programming *functions* per se. That turns the problem on
its head really. There are lots and lots of symbols already defined in
the standard: it is the job of formal language designers to simply pick
from them and *define* their formal functions in their language design.
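
As a minimal sketch of that point, here is how a toy language might pick the existing U+2190 for assignment and keep = for equality, with the meaning defined entirely in the language's own tokenizer. The token names and the `tokenize` function are hypothetical, invented for this illustration; they do not come from any real language or from the thread.

```python
import re
import unicodedata

ASSIGN = "\u2190"  # LEFTWARDS ARROW, already encoded in Unicode

# Hypothetical token grammar: identifiers, integers, and two operators.
TOKEN = re.compile(r"\s*(?:(?P<name>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<op>[\u2190=]))")


def tokenize(src):
    """Split a one-line statement into (kind, text) pairs."""
    src = src.rstrip()
    pos, out = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind = m.lastgroup
        text = m.group(kind)
        if kind == "op":
            # The language design, not the character set, assigns the meaning.
            kind = "assign" if text == ASSIGN else "equals"
        out.append((kind, text))
        pos = m.end()
    return out


print(unicodedata.name(ASSIGN))
print(tokenize("a \u2190 6"))
print(tokenize("a = 6"))
```

Nothing here needs a new code point: the arrow is ordinary text, and the distinction between assignment and equality lives in the grammar.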

Just because the UTC occasionally invents new control functions and
encodes them in characters -- as for the bidirectional algorithm -- does
not mean that every new function conceived for a programming language
is automatically a character encoding problem. Coming to the UTC
looking to encode a new functional character on spec should be
a matter of *last* resort -- not a first resort. It requires a carefully
built case demonstrating a real use and showing that alternative
approaches using existing characters do not (and cannot) work.

--Ken

P.S. Arrow symbols like U+2190 have been in the Unicode Standard
since Unicode 1.0 in 1991. They are far, far more widely supported nowadays
than any new, language-specific functional symbol addition would be.
Even if the UTC agreed to such character additions at the next meeting
in May, its earliest opportunity for publication would be Unicode 10
in June, 2017. That amounts to a 26 year impedance mismatch for
implementations. Why would a designer of a new formal language
syntax want to buy into that kind of grief for character availability,
when there are hundreds of symbols in the standard to choose from
that have been encoded for decades now?



On 2/9/2015 8:41 AM, Andre Schappo wrote:



Let me take as an example the use of = in programming. The = is used 
for test of equality and assignment in various programming languages. 
The equality and assignment operations should have different 
characters. e.g.


U+XXX1 TEST FOR EQUALITY
U+XXX2 ASSIGNMENT OPERATOR

Initially the glyphs used for these characters could be = but then
this mechanism can be used to transition to a new and less ambiguous
visual representation. The new visual representation could be
something like


U+XXX1 TEST FOR EQUALITY =
U+XXX2 ASSIGNMENT OPERATOR ⬅

Such a visual and character distinction between the 2 functions must 
surely make it easier for those learning to program and for 
interpreter and compiler writers. I think it would also make for 
easier to read/understand program code.


André





Re: Unicode block for programming related symbols and codepoints?

2015-02-09 Thread Shervin Afshar
 But then it would be incompatible from IDE to IDE, like Python is
incompatible using 2 spaces, 4 spaces and tabs.
 It's the data that is important, not the software.

Specifically talking about Python, we should not try to solve in Unicode what
PEP 8[1] is intended for. Pythonistas and their IDEs are encouraged to use
linters to address syntactical discrepancies. This applies, more or less,
to other programming languages as well.

[1]: https://www.python.org/dev/peps/pep-0008/#tabs-or-spaces
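
To illustrate the kind of check a linter can do at this level, here is a minimal sketch that reports which indentation styles a source text uses. The function name `indentation_styles` and its labels are assumptions made for this example; real linters are considerably more thorough.

```python
def indentation_styles(source):
    """Return the set of indentation styles found in `source`.

    Possible labels: "spaces", "tabs", and "mixed" (both on one line).
    """
    styles = set()
    for line in source.splitlines():
        if not line.strip():
            continue  # ignore blank and whitespace-only lines
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if not indent:
            continue  # line is not indented
        if " " in indent and "\t" in indent:
            styles.add("mixed")
        elif "\t" in indent:
            styles.add("tabs")
        else:
            styles.add("spaces")
    return styles


consistent = "if x:\n    y = 1\n"
inconsistent = "if x:\n\ty = 1\n    z = 2\n"
print(indentation_styles(consistent))
print(indentation_styles(inconsistent))
```

A file whose result contains more than one label is a candidate for cleanup, with no change to the character set involved.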

 You know, the fact that this consortium ever took emoji into
consideration immediately justifies to include everything everyone ever
wanted. There is no such thing as important data including emoji. :)

If you read the background information (in TR51 or elsewhere) on Unicode
emoji, you will see how common and widespread use of PUA by Japanese
providers introduced interoperability issues with the rest of the world.

And no: addressing that major compatibility/interoperability issue (and
any future issue arising from addressing it) does not justify the inclusion
of everything everyone ever wanted.


↪ Shervin

On Mon, Feb 9, 2015 at 4:55 AM, Alfred Zett alfre...@web.de wrote:

 OK, I will now try to answer all of you in one mail, otherwise it gets
 hard to keep an overview...

 Shervin Afshar:

 All of the requirements mentioned here can be (and are) implemented in
 higher levels of software (like IDEs). IMO, there isn't any need for adding
 new characters to Unicode to address these issues.

 But then it would be incompatible from IDE to IDE, like Python is
 incompatible using 2 spaces, 4 spaces and tabs.
 It's the data that is important, not the software.


 Additionally, people tend to forget that simply because Unicode is doing
 emoji out of compatibility (or other) requirements, it does not mean that
 now anything goes. I refer folks to TR51[1] (specifically sections 1.3,
 8, and Annex C).

 [1]: http://www.unicode.org/reports/tr51

  You know, the fact that this consortium ever took emoji into
 consideration immediately justifies to include everything everyone ever
 wanted. There is no such thing as important data including emoji. :)

 Jean-Francois Colson:

 I need a few tens of characters for a conlang I’m developing. ☺

 Except two or three control characters don't make a con language.
 Also, if you don't like con languages in Unicode, what's this:
 http://unicode.org/charts/PDF/U1F700.pdf

  The problem is that Unicode only encodes characters which are effectively
 used today or which have been used in the past. It doesn’t encode
 characters which could perhaps be used in a hypothetical new programming
 language in the future.

 So you want the font encoding scheme to be a limiting factor for new
 things?

 Pierpaolo Bernardi:

 How would your proposed character be displayed as plain text?

 There is no such thing as plain text.
 Even line breaks and tabs are a matter of interpretation. It's just that
 they usually have typographic semantics, even in programming editors, with
 all the side effects.

 In very simple (and with that I mean shitty or not even remotely
 programming oriented) editors, it may show like a control character, like ␄.

 Browsers and any editor passing the "based on Scintilla" complexity mark
 of course should display something that makes more sense, like an arrow or
 ⍈ plus surrounding space.

  Unicode is a standard for plain text.  If you require a special IDE
 for your programming language then why use plain text at all?

 Because binary custom encoded databases or blob files are the death of
 interoperability.

 Konstantin Ritt:

 Easier than latin1, a layout one could find on [almost] every keyboard?
 Good luck.

 Also:

 Jean-Francois Colson:

 Hard to input? Not harder than the new symbols you’d like to propose.
 That’s only a matter of keyboard layout and input method.


 Indent by pressing tab and insert the literal thing by pressing . Nothing
 changes, the IDE/editor does the work on the fly.
 Just that you have clean semantics, interoperability and customizability.

 Beat that, APL. Where you would need 10 key bindings or an annoying software
 keyboard.

  I’ve never used APL so I don’t remember the meanings of its symbols, but
 couldn’t ⍘ U+2358 APL FUNCTIONAL SYMBOL QUOTE UNDERBAR or ⍞ U+235E APL
 FUNCTIONAL SYMBOL QUOTE QUAD work as “string literal quotes” in a new
 programming language?

 That's a good idea.

 That still leaves the indentation character, which is harder than that,
 because one would want a control character with certain semantics.
 E.g.: For programming editors it would make sense to only allow it after
 line breaks and convert other occurrences into tabs.

  If the IDE inputs your new character when you press tab, then your new
 character is a tab…

 Not if it detects the beginning of a line.

 Best regards


 A. Z.



Re: About cultural/languages communities flags

2015-02-09 Thread Markus Scherer
On Mon, Feb 9, 2015 at 9:54 AM, Andrea Giammarchi 
andrea.giammar...@gmail.com wrote:

  if a cultural/language TLD is typed with Unicode RIS, then show the flag
 for these culture/language:


This does not work. The Unicode RIS are defined to be used in pairs, with
semantics according to corresponding ISO 3166 alpha2 codes. In your
examples, each successive pair will encode a flag.
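
To make the pairing concrete, here is a small sketch. It assumes the usual mapping of A..Z onto the regional indicator symbols starting at U+1F1E6; the helper names are made up for this example. Spelling out a four-letter string such as "scot" does not produce one four-letter unit: read back in pairs, it is simply two two-letter codes in a row.

```python
RIS_BASE = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def to_regional_indicators(letters):
    """Map ASCII letters A-Z to the corresponding regional indicator symbols."""
    out = []
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not an ASCII letter: {ch!r}")
        out.append(chr(RIS_BASE + ord(ch) - ord("A")))
    return "".join(out)


def split_into_pairs(ris):
    """Read a run of regional indicators back as successive two-letter codes."""
    letters = [chr(ord(c) - RIS_BASE + ord("A")) for c in ris]
    return ["".join(letters[i:i + 2]) for i in range(0, len(letters), 2)]


print(split_into_pairs(to_regional_indicators("scot")))  # ['SC', 'OT']
```

The first pair is SC, which is already taken, so a display that follows the standard would treat it as that region's flag rather than as the start of a longer tag.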

If you want to represent every flag of every locality, you first have to
figure out how to catalog and label them. You are mentioning provinces, one
level down from nation states; I guess there are thousands of them. In much
of Europe, every little village http://de.wikipedia.org/wiki/Butterstadt
has its own flag and coat of arms. Where do you want the text encoding and
fonts to stop?

markus


Re: Unicode block for programming related symbols and codepoints?

2015-02-09 Thread Doug Ewell
Frédéric Grosshans frederic dot grosshans at gmail dot com wrote:

 The inclusion of emoji was the subject of considerable debate here, with people
 strongly against and strongly for. The trick is that they were already
 used as digital characters by Japanese Telcos and their millions of
 customers. They were de facto encoded as characters in Japanese text
 messages. At the time of encoding, the spread of smartphones made them
 appear in other places (emails, web forums, etc.)

Sorry, I can't let the compatibility argument go unchallenged again.

It can be argued — and was, repeatedly and persuasively — that the
initial collection of emoji in Unicode 6.1 [1] were added for
compatibility with Japanese telco extensions to JIS.

But the additional emoji added to Unicode 6.2 and 7.0, and planned for
8.0, do not have even this provenance; they were added on foot of novel
proposals sent directly to Unicode, or (more recently) by popular
request. There is no longer any requirement that the robot faces and
burritos appear first in any sort of industry character set extension,
with which Unicode is then obliged to maintain compatibility.

[1] No, I am not counting the ARIB symbols or any other long-encoded
symbols that have been retroactively defined as emoji, to help
legitimize the latter.

Alfred Zett (alfred underscore z at web dot de) replied:

 The trick is that one doesn't bargain with Telcos and similar
 criminals. Gotta drop them hard and the pest will go away by itself
 after five years or so.

This does not help to make a case for or against encoding of anything.

--
Doug Ewell | Thornton, CO, USA | http://ewellic.org




Re: About cultural/languages communities flags

2015-02-09 Thread Joan Montané
Thanks for your replies,


As far as I can see, my informal request to expand the current RIS design
has not been well received. I understand that. Flags are a cause of disputes,
and encoding them isn't a matter for Unicode.

IMHO staying tied to alpha-2 codes is a poor choice for users. Maybe
industry manufacturers could find a better approach.

Best regards,
Joan Montané


Re: Unicode block for programming related symbols and codepoints?

2015-02-09 Thread Alfred Zett

@ John D Burger:

And all of a sudden a war rages over what counts as a good editor. :D

@ Andre Schappo:

That's a good idea. We need it in the name of science and education. :D

William_J_G Overington:

Hi

You might like the following post.

http://www.unicode.org/mail-arch/unicode-ml/y2010-m06/0001.html

William

Hi, I'm really not sure what this is about, but it seems like an 
interface to deliver instructions to the rendering VM?


Martin v. Löwis:

So if you can't demonstrate usage, you should at least demonstrate
demand (rather than just claiming that there might be demand).

The problem is, you can't do that with the topic at hand.
Because most programmers don't even see the possibilities.

It's like asking a blind person what colors look like. Although that may sound
kind of arrogant.


Among language designers and people interested in stuff like this, there 
is only a small fraction that doesn't hold the ill-minded opinion that 
syntax doesn't matter at all.


Among those who care about syntax there is only a small fraction that
really knows enough about Unicode. And who can blame them, I still see
broken characters on a weekly basis.

Among those there is only a small fraction that cares enough.
Among those there is only a small fraction that has the nerves/balls to 
put up with a consortium.
This small subset is a handful of people, like André, me and maybe 3 
other persons.


I don't really feel comfortable sounding that elitist, but in this case
I dare say that the consortium shouldn't care about established
popularity, the same way they should have handled emoji characters.


Best regards

A. Z.




About cultural/languages communities flags

2015-02-09 Thread Andrea Giammarchi
Hello everyone,
  I've had an interesting request [1] that makes sense to me, but I'd like
to understand Unicode's position on it.

The TL;DR version of the request is the following:

There are communities, let's take Scottish people as an example, that even
have a domain but not an emoji flag.

Some flag-related projects have adopted more than what we have now in emoji,
including 239 flags: http://www.famfamfam.com/archive/flag-icons-released/

The proposal is quite simple, and I am quoting from the request:

 if a cultural/language TLD is typed with Unicode RIS, then show the flag
for these culture/language:

 -- it shows a Scottish flag
 -- it shows a Welsh flag
 -- it shows a Breton flag
 -- it shows a Catalan flag
 -- it shows a Basque flag
 -- it shows a Galician flag


Thanks in advance for any sort of outcome.

Best Regards


[1] https://github.com/twitter/twemoji/issues/40


Emoji (was: Re: Unicode block for programming related symbols and codepoints?)

2015-02-09 Thread Doug Ewell
Shervin Afshar shervinafshar at gmail dot com wrote:

 There is no longer any requirement that the robot faces and
 burritos appear first in any sort of industry character set
 extension, with which Unicode is then obliged to maintain
 compatibility.

 Only if you don't consider existing usage and popular requests as
 requirement and precedent; for example Gmail had Robot Face for a
 long time.

I said there was no longer a requirement *that the items appear first in
an industry character set extension*, right?

In what character encoding standard, or extension, does ROBOT FACE
appear? "Gmail has it" is not a character encoding standard. Neither is
"People want to see it."

"Most popularly requested," as a criterion for adding a character, is
absolutely new to Unicode. Earlier I wrote privately to a Unicode
officer about whether PERSON TAKING SELFIE and GIRL TWERKING and PERSON
DUMPING ICE BUCKET OVER HEAD would be ephemeral enough, and got no
reply. (What, you've forgotten the ice-bucket craze already? That's
exactly why "most popular at the moment" wasn't supposed to be a
criterion.)

--
Doug Ewell | Thornton, CO, USA | http://ewellic.org




Re: Unicode block for programming related symbols and codepoints?

2015-02-09 Thread Frédéric Grosshans
Le 9 févr. 2015 20:27, Doug Ewell d...@ewellic.org a écrit :


 Sorry, I can't let the compatibility argument go unchallenged again.


I stand corrected (and I should have known better!)


Re: Emoji (was: Re: Unicode block for programming related symbols and codepoints?)

2015-02-09 Thread Shervin Afshar

 I said there was no longer a requirement *that the items appear first in
 an industry character set extension*, right?


The issue is with your very rigid interpretation of the criteria for
encoding new symbols. Is "appearing in an industry character set extension"
an official phrasing that you keep referring to?

 In what character encoding standard, or extension, does ROBOT FACE
 appear? "Gmail has it" is not a character encoding standard. Neither is
 "People want to see it."


Robot Face is available on Gmail (GChat), Facebook, and Twitch among others
(calculating the size of user community is left as an assignment for the
reader). That's enough usage for consideration by the UTC even if the
symbol is not present in a character encoding standard. Also, since Unicode
is an industry standard maintained by industry members (among others), if
enough requests reach these corporations from communities of users, there
might be some reason for considering those symbols. I think that's the case
for the newer symbols.


 "Most popularly requested," as a criterion for adding a character, is
 absolutely new to Unicode. Earlier I wrote privately to a Unicode
 officer about whether PERSON TAKING SELFIE and GIRL TWERKING and PERSON
 DUMPING ICE BUCKET OVER HEAD would be ephemeral enough, and got no
 reply. (What, you've forgotten the ice-bucket craze already? That's
 exactly why "most popular at the moment" wasn't supposed to be a
 criterion.)


IMO, Unicode officers seem to have low patience for such sentiments. You
might want to reconsider your tone. There is a time and place for sarcasm.


↪ Shervin





Re: About cultural/languages communities flags

2015-02-09 Thread Andrea Giammarchi
Thanks, that was somehow indeed my very first concern. Everyone could claim
an emoji, at that point.

Enough info for me so far, so thanks again.

Best Regards




Re: Emoji (was: Re: Unicode block for programming related symbols and codepoints?)

2015-02-09 Thread Michael Everson
I like symbols a lot. But I know that I and a number of people have been 
thinking that too much emphasis is being put on emoji. 

Michael Everson * http://www.evertype.com/




Re: Emoji (was: Re: Unicode block for programming related symbols and codepoints?)

2015-02-09 Thread Alfred Zett


Doug Ewell:
"Most popularly requested," as a criterion for adding a character, is
absolutely new to Unicode. Earlier I wrote privately to a Unicode
officer about whether PERSON TAKING SELFIE and GIRL TWERKING and
PERSON DUMPING ICE BUCKET OVER HEAD would be ephemeral enough, and got
no reply. (What, you've forgotten the ice-bucket craze already? That's
exactly why "most popular at the moment" wasn't supposed to be a
criterion.)


There is much truth in this.

I'll now leave the discussion, because it doesn't lead anywhere.

Best regards,

A. Z.




Re: About cultural/languages communities flags

2015-02-09 Thread Markus Scherer
On Mon, Feb 9, 2015 at 1:11 PM, Joan Montané j...@montane.cat wrote:

 AFAIK, this is done on the font side. Emoji flags are just ligatures, so a
 font can provide a ligature for 4 RIS characters.


Technically true, but a font that violates the encoding standard would
cause large problems. Imagine a font that ligates letters 't' and 'h' and
displays an Egyptian hieroglyph for the combination.

What's the way for encoding them in Unicode standard?


In principle, the way for encoding anything in the Unicode Standard is to
write a well-formed proposal, and convince the Unicode Technical Committee
and ISO JTC1/SC2 that the proposal has merit.

However, I would much prefer if everyone spent their considerable energy on
upgrading protocols (e.g., IETF RFCs for email subject lines) and lobbying
relevant vendors (e.g., chat services and social network messages) to support
images embedded in the text stream, ideally with scaling and other behavior
that would make them behave somewhat text-like.

Best regards,
markus


Re: Unicode block for programming related symbols and codepoints?

2015-02-09 Thread Frédéric Grosshans

Le 09/02/2015 13:55, Alfred Zett a écrit :


Additionally, people tend to forget that simply because Unicode is 
doing emoji out of compatibility (or other) requirements, it does not 
mean that now anything goes. I refer folks to TR51[1] (specifically 
sections 1.3, 8, and Annex C).


[1]: http://www.unicode.org/reports/tr51

You know, the fact that this consortium ever took emoji into 
consideration immediately justifies to include everything everyone 
ever wanted. There is no such thing as important data including emoji. :)
The inclusion of emoji was the subject of considerable debate here, with people
strongly against and strongly for. The trick is that they were already 
used as digital characters by Japanese Telcos and their millions of 
customers. They were de facto encoded as characters in Japanese text 
messages. At the time of encoding, the spread of smartphones made them 
appear in other places (emails, web forums, etc.)





Jean-Francois Colson:
I need a few tens of characters for a conlang I’m developing. ☺

Except two or three control characters don't make a con language.
Also, if you don't like con languages in Unicode, what's this: 
http://unicode.org/charts/PDF/U1F700.pdf
I doubt that “not liking con languages” is a faithful description of 
Jean-François ;-)


On a more serious note, this block is actually a set of notations,
“scientific” for their time, used by Isaac Newton. They were
encoded in Unicode following an academic project to digitize his
manuscripts. So here you have characters used 3 centuries ago by no
less than Isaac Newton, most of them having a much longer history, and
useful for science historians. See
http://www.unicode.org/L2/L2009/09037r2-alchemy.pdf for details.
This does not compare with a few characters created for a conlang
invented by an amateur and used by no one but himself. I think that is
the point Jean-François wanted to make.


A closer counter-example to Jean-François's “wish” would be Shavian
(10450..1047F), but this alphabet has shown some use, and I guess that
its encoding would have been much harder without its association with
someone as famous as George Bernard Shaw or without the existing
publication of a full text in Shavian.




The problem is that Unicode only encodes characters which are
effectively used today or which have been used in the past. It
doesn’t encode characters which could perhaps be used in a
hypothetical new programming language in the future.
So you want the font encoding scheme to be a limiting factor for new
things?


It is more or less the rule, except that it is not a font encoding but a
standard encoding. Once something is encoded, it can never be
unencoded. And the Unicode standard is built to stay relevant as long as
possible (decades or centuries). So you are asking for your character to be
encoded in billions of devices for decades. It is more than a mere font
encoding. There are a few exceptions, but only when widespread use is
really expected, as with monetary symbols (it was the case for the Euro).


What you are asking, is a character for an untested idea. You are 
convinced it is useful, but cannot prove anyone beyond yourself will use 
it, hence Jean-François’s parallel with conlangs. In order to have a 
chance of success, design a language using existing characters (e.g. 
some APL + → for TAB) and/or private use codepoints. Once your language 
starts gathering steam, come back and argue that using an arrow or a tab 
is awkward, and that U+ SHINY TAB FOR PROGRAMMERS would be an 
improvement for a significant community. I know it is a lot of work, but 
that is probably what it takes.




Pierpaolo Bernardi:

How would your proposed character be displayed as plain text?

There is no such thing as plain text.
When you say that, you don’t accept the premise of Unicode encoding. 
Unicode’s goal is to encode all plain text characters, but only plain 
text characters.
Even line breaks and tabs are a matter of interpretation. It's just 
that they usually have typographic semantics, even in programming 
editors, with all the side effects.


In very simple (and with that I mean shitty or not even remotely 
programming oriented) editors, it may show like a control character, 
like ␄.


Browsers and any editor passing the based on scintilla complexity 
mark of course should display something that makes more sense, like an 
arrow or ⍈ plus surrounding space.


I think everyone here knows what you are saying, and that the notion of 
plain text is a bit fuzzy. But if you cannot argue that your character 
has a meaning in plain text, for some value of “plain text”, then you 
cannot hope for an encoding in Unicode.



___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode


Re: About cultural/languages communities flags

2015-02-09 Thread Joan Montané
Sorry, my reply was sent CC: to the Unicode ML,

My apologies,

Joan Montané

2015-02-09 22:11 GMT+01:00 Joan Montané j...@montane.cat:


 Hi all,

 I am the one who made the request on the twemoji GitHub.


 2015-02-09 20:16 GMT+01:00 Markus Scherer markus@gmail.com:

 On Mon, Feb 9, 2015 at 9:54 AM, Andrea Giammarchi 
 andrea.giammar...@gmail.com wrote:

  if a cultural/language TLD is typed with Unicode RIS, then show the
 flag for these culture/language:


 This does not work. The Unicode RIS are defined to be used in pairs,
 with semantics according to corresponding ISO 3166 alpha2 codes. In your
 examples, each successive pair will encode a flag.


 AFAIK, this is done in font side. Emoji flags are just ligatures, so a
 font can provide a ligature for 4 RIS characters. This is not an issue here.

 I agree some strange behaviour can appear if a 3 RIS string, take CAT, is
 shown in a system with only 2 RIS support (a Canadian will appear followed
 by a T).
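 The pairing behaviour described above can be sketched roughly as follows. This is only an illustration, assuming a renderer that groups a run of regional indicator symbols into successive pairs from left to right; the helper names (to_ris, pair_ris) are made up for this example and are not part of any library.

```python
from typing import List

RIS_BASE = 0x1F1E6  # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A


def to_ris(letters: str) -> str:
    """Map ASCII letters A-Z to the corresponding regional indicator symbols."""
    return "".join(chr(RIS_BASE + ord(c) - ord("A")) for c in letters.upper())


def is_ris(ch: str) -> bool:
    """True if ch is one of the 26 regional indicator symbols."""
    return RIS_BASE <= ord(ch) <= RIS_BASE + 25


def pair_ris(run: str) -> List[str]:
    """Group a run of regional indicators into successive pairs.

    Returns the ASCII letters of each group; an unpaired trailing
    symbol ends up in a group of its own.
    """
    letters = [chr(ord(ch) - RIS_BASE + ord("A")) for ch in run if is_ris(ch)]
    return ["".join(letters[i:i + 2]) for i in range(0, len(letters), 2)]


print(pair_ris(to_ris("CAT")))   # ['CA', 'T']
print(pair_ris(to_ris("SCOT")))  # ['SC', 'OT']
```

 Under that assumption, a three-symbol CAT sequence reads as CA plus a stray T, and a four-symbol SCOT sequence reads as SC followed by OT, unless the font supplies a longer ligature of its own.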


 If you want to represent every flag of every locality, you first have to
 figure out how to catalog and label them. You are mentioning provinces, one
 level down from nation states; I guess there are thousands of them. In much
 of Europe, every little village
 http://de.wikipedia.org/wiki/Butterstadt has its own flag and coat of
 arms. Where do you want the text encoding and fonts to stop?


 I don't request flag support for every flag in the world. I requested
 flags for culture/language communities *with* an approved TLD (Top Level
 Domain).

 I know flags are an issue, and I know flags represent territories, not
 languages, but I think some support should be provided for these active
 communities. As I pointed out, some country flag collections include a few
 non-independent countries.  See [1], [2] and [3] (search for Scottish or
 Welsh flag). You can check this [4] petition requesting Catalan flag on
 WhatsApp.

 So, there is a demand and they are used in real world. What's the way for
 encoding them in Unicode standard?

 Thanks,

 Joan Montané

 [1] http://www.famfamfam.com/lab/icons/flags/
 [2] https://www.gosquared.com/resources/flag-icons/
 [3] http://www.sherv.net/flag-emoticons.html
 [4]
 https://www.change.org/p/whatsapp-inc-incloure-la-senyera-de-catalunya-a-whatsapp



RE: Emoji (was: Re: Unicode block for programming related symbols and codepoints?)

2015-02-09 Thread Doug Ewell
Shervin Afshar shervinafshar at gmail dot com wrote:

 The issue is with your very rigid interpretation of the criteria for
 encoding new symbols. Is appearing in an industry character set
 extension an official phrasing that you keep referring to?

It was either from the WG2 Principles and Procedures document, or some
other bit of Unicode/10646 folklore that I've read over the past 22
years of keeping up with Unicode/10646. I should look up the exact
wording.

Of course, Unicode can encode anything they please. That's not in
question. But in order to claim compatibility as the basis for
encoding something, these specific, rigid definitions and criteria
have historically been required. Compatibility with any random JPEG or
meme that makes the rounds on the Internet was not enough.

 Robot Face is available on Gmail (GChat), Facebook, and Twitch among
 others (calculating the size of user community is left as an
 assignment for the reader). That's enough usage for consideration by
 the UTC even if the symbol is not present in a character encoding
 standard. Also, since Unicode is an industry standard maintained by
 industry members (among others), then if there is enough request to
 these corporations from communities of users, then there might be some
 reason for considering those symbols. I think that's the case for the
 newer symbols.

Great. Go ahead and encode them, UTC. But don't say it's because your
hands are tied and you have no choice.

 IMO, Unicode officers seem to have low patience for such sentiments.
 You might want to reconsider your tone. There is a time and place for
 sarcasm.

I'll take my chances. I've been called out before for discouraging list
members from requesting things that were out of scope according to the
old rules. All I'm saying now is, if the old rules no longer apply, say
so.

--
Doug Ewell | Thornton, CO, USA | http://ewellic.org




Re: About cultural/languages communities flags

2015-02-09 Thread Doug Ewell
Joan Montané joan at montane dot cat wrote:

 I don't request flag support for every flag in the world. I requested
 flags for culture/language communities *with* an approved TLD (Top
 Level Domain).

Incidentally, about a year and a half ago I discussed this with another
list member, on- and off-list. We agreed that some sort of text-based
encoding of flags would be an interesting project, but disagreed as to
whether this was a Unicode problem.

The present discussion seems to approach the issue from the other side:
treat it as *only* a Unicode problem, and assume that the encoding
problem has been solved by TLD registration.

See also http://www.unicode.org/faq/emoji_dingbats.html#12 . This is the
Unicode Consortium talking, not me.

--
Doug Ewell | Thornton, CO, USA | http://ewellic.org




Re: Unicode block for programming related symbols and codepoints?

2015-02-09 Thread Alfred Zett
OK, I will now try to answer all of you in one mail, otherwise it gets 
hard to keep track...


Shervin Afshar:
All of the requirements mentioned here can be (and are) implemented in 
higher levels of software (like IDEs). IMO, there isn't any need for 
adding new characters to Unicode to address these issues.
But then it would be incompatible from IDE to IDE, like Python is 
incompatible using 2 spaces, 4 spaces and tabs.

It's the data that is important, not the software.


Additionally, people tend to forget that simply because Unicode is 
doing emoji out of compatibility (or other) requirements, it does not 
mean that now anything goes. I refer folks to TR51[1] (specifically 
sections 1.3, 8, and Annex C).


[1]: http://www.unicode.org/reports/tr51

You know, the fact that this consortium ever took emoji into 
consideration immediately justifies to include everything everyone ever 
wanted. There is no such thing as important data including emoji. :)


Jean-Francois Colson:
I need a few tens of characters for a conlang I’m developing. ☺ 

Except two or three control characters don't make a con language.
Also, if you don't like con languages in Unicode, what's this: 
http://unicode.org/charts/PDF/U1F700.pdf


The problem is that Unicode only encodes characters which are 
effectively used today or which have been used in the past. It doesn’t 
encode characters which could perhaps be used in a hypothetical new 
programming language in the future. 
So you want the font encoding scheme to be a limiting factor for new 
things?


Pierpaolo Bernardi:

How would your proposed character be displayed as plain text?

There is no such thing as plain text.
Even line breaks and tabs are a matter of interpretation. It's just that 
they usually have typographic semantics, even in programming editors, 
with all the side effects.


In very simple (and with that I mean shitty or not even remotely 
programming oriented) editors, it may show like a control character, like ␄.


Browsers and any editor passing the based on scintilla complexity mark 
of course should display something that makes more sense, like an arrow 
or ⍈ plus surrounding space.



Unicode is a standard for plain text.  If you require a special IDE
for your programming language then why use plain text at all?
Because binary custom encoded databases or blob files are the death of 
interoperability.


Konstantin Ritt:
Easier than latin1, a layout one could find on [almost] every 
keyboard? Good luck.

Also:

Jean-Francois Colson:
Hard to input? Not harder than the new symbols you’d like to propose. 
That’s only a matter of keyboard layout and input method. 


Indent by pressing tab and insert the literal thing by pressing . 
Nothing changes, the IDE/editor does the work on the fly.

Just that you have clean semantics, interoperability and customizability.

Beat that, APL. Where you would need 10 key bindings or an annoying software 
keyboard.


I’ve never used APL so I don’t remember the meanings of its symbols, 
but couldn’t ⍘ U+2358 APL FUNCTIONAL SYMBOL QUOTE UNDERBAR or ⍞ U+235E 
APL FUNCTIONAL SYMBOL QUOTE QUAD work as “string literal quotes” in a 
new programming language? 

That's a good idea.

That still leaves the indentation character, which is harder than that, 
because one would want a control character with certain semantics.
E.g.: For programming editors it would make sense to only allow it after 
line breaks and convert other occurrences into tabs.
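
A minimal sketch of that rule, assuming a private-use code point (U+E000, picked arbitrarily here) stands in for the indentation character, since no such character exists. The function name normalize_indent is also just made up for illustration:

```python
INDENT = "\uE000"  # hypothetical stand-in: a private-use code point, not an encoded character


def normalize_indent(source: str) -> str:
    """Keep INDENT only in the leading run of each line.

    Any INDENT that appears after the first non-INDENT character of a
    line is converted into an ordinary tab.
    """
    out = []
    for line in source.split("\n"):
        rest = line.lstrip(INDENT)               # text after the leading run
        lead = line[: len(line) - len(rest)]     # the leading run itself
        out.append(lead + rest.replace(INDENT, "\t"))
    return "\n".join(out)


sample = INDENT * 2 + "x" + INDENT + "y\n" + INDENT + "z"
print(repr(normalize_indent(sample)))
```

An editor could run something like this on paste or on save, so that the character only ever carries its indentation meaning at the start of a line.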


If the IDE inputs your new character when you press tab, then your new 
character is a tab… 

Not if it detects the beginning of a line.

Best regards

A. Z.



RE: Unicode block for programming related symbols and codepoints?

2015-02-09 Thread Doug Ewell
I can't count:

 It can be argued — and was, repeatedly and persuasively — that
 the initial collection of emoji in Unicode 6.1

6.0

 But the additional emoji added to Unicode 6.2 and 7.0

6.1 and 7.0

--
Doug Ewell | Thornton, CO, USA | http://ewellic.org



Re: Unicode block for programming related symbols and codepoints?

2015-02-09 Thread Shervin Afshar

 There is no longer any requirement that the robot faces and
 burritos appear first in any sort of industry character set extension,
 with which Unicode is then obliged to maintain compatibility.


Only if you don't consider existing usage and popular requests as
requirements and precedent; for example Gmail had Robot Face for a long
time.

↪ Shervin

On Mon, Feb 9, 2015 at 11:25 AM, Doug Ewell d...@ewellic.org wrote:

 Frédéric Grosshans frederic dot grosshans at gmail dot com wrote:

  The including of emoji was a considerable debate here, with people
  strongly against and strongly for. The trick is that they were already
  used as digital characters by Japanese Telcos and their millions of
  customers. They were de facto encoded as characters in Japanese text
  messages. At the time of encoding, the spread of smartphones made them
  appear in other places (emails, web forums, etc.)

 Sorry, I can't let the compatibility argument go unchallenged again.

 It can be argued — and was, repeatedly and persuasively — that the
 initial collection of emoji in Unicode 6.1 [1] were added for
 compatibility with Japanese telco extensions to JIS.

 But the additional emoji added to Unicode 6.2 and 7.0, and planned for
 8.0, do not have even this provenance; they were added on foot of novel
 proposals sent directly to Unicode, or (more recently) by popular
 request. There is no longer any requirement that the robot faces and
 burritos appear first in any sort of industry character set extension,
 with which Unicode is then obliged to maintain compatibility.

 [1] No, I am not counting the ARIB symbols or any other long-encoded
 symbols that have been retroactively defined as emoji, to help
 legitimize the latter.

 Alfred Zett (alfred underscore z at web dot de) replied:

  The trick is that one doesn't bargain with Telcos and similar
  criminals. Gotta drop them hard and the pest will go away from itself
  after five years or so.

 This does not help to make a case for or against encoding of anything.

 --
 Doug Ewell | Thornton, CO, USA | http://ewellic.org





Re: Unicode block for programming related symbols and codepoints?

2015-02-09 Thread Hans Aberg

 On 9 Feb 2015, at 19:17, Ken Whistler kenwhist...@att.net wrote:
...
 The use in C of = and == was badly designed
 from the start, and is the source of bezillions of inadvertent programming
 errors in practice.

It is the ample oversupply of implicit conversions in combination with the lack 
of a proper boolean type that is causing those programming errors.

 But if a left arrow, for example, might be a better choice for an assignment
 operator in a programming language, and a two-character ASCII operator
 like := or <- doesn't seem appropriate or causes other confusion, there
 still isn't a character *encoding* issue here. Just use ←, which already
 exists (U+2190), and is a fine left arrow!

There are also
  ≔ COLON EQUALS U+2254
and others.

No problems using such characters in Flex:

The problem is the lack of input methods. 
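
For illustration only (this is not the Flex input referred to above, which is not reproduced here), a small regex-based tokenizer in Python can treat ←, ≔ and the ASCII := as the same assignment token. The token names and the tokenize function are assumptions made up for this sketch:

```python
import re

# One alternative per token kind; the assignment operator accepts
# U+2190 LEFTWARDS ARROW, U+2254 COLON EQUALS, or the ASCII digraph ":=".
TOKEN = re.compile(
    r"\s*(?:(?P<assign>←|≔|:=)|(?P<name>[A-Za-z_]\w*)|(?P<number>\d+))"
)


def tokenize(src):
    """Split src into (kind, text) pairs, raising SyntaxError on anything else."""
    src = src.rstrip()
    pos, tokens = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


print(tokenize("x ← 42"))
print(tokenize("y ≔ x"))
```

The lexer side is indeed unproblematic; typing the characters is the part that still needs an input method or an editor binding.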





RE: About cultural/languages communities flags

2015-02-09 Thread Doug Ewell
And just another follow-up, to try to explain *why* the mechanism for
Regional Indicator Codes might be so closely tied to ISO 3166-1 alpha-2
code elements:

ISO 3166-1 codes are derived from code elements published by the United
Nations Statistics Division. This is the group that ultimately decides
what is and isn't a country for the purposes of these codes. While
there is inevitably some political influence in the UN, many
organizations and projects that use ISO 3166-1 codes do so to avoid
getting embroiled in their own debate over what is a country. The IETF
language-tagging project (BCP 47, RFC 5646; see IETF language tag in
Wikipedia for more information) is one example.

Conversely, it is sometimes the case that groups which seek to extend
the set of ISO 3166-1 codes unilaterally, or to establish a competing or
supplemental coding system, might do so in order to gain acceptance or
establish credibility for a nation or territory that is not recognized
as such by UNSD.

It is entirely reasonable (IMHO) to suggest that if Unicode were to
attempt, by whatever means, to enable encoding of flags for entities
beyond those encoded in ISO 3166-1, that the door would be opened wide
for unrecognized nations and separatist groups to claim that the Unicode
Consortium supports their cause by supporting display of their flag.
It's very possible that Unicode has thought of this and does not want to
put itself in that position.

--
Doug Ewell | Thornton, CO, USA | http://ewellic.org




Re: Emoji (was: Re: Unicode block for programming related symbols and codepoints?)

2015-02-09 Thread Shervin Afshar
 It was either from the WG2 Principles and Procedures document, or some
 other bit of Unicode/10646 folklore that I've read over the past 22
 years of keeping up with Unicode/10646. I should look up the exact
 wording.

Yes, please. I would like to have that policy noted for my future use.

 Of course, Unicode can encode anything they please. That's not in
 question. But in order to claim compatibility as the basis for
 encoding something, these specific, rigid definitions and criteria
 have historically been required. Compatibility with any random JPEG or
 meme that makes the rounds on the Internet was not enough.

It's not about encoding what they please. Compatibility was the issue
with the first set of emoji symbols. The rest of symbols are being added
for various other reasons; e.g. diversity, parity, requests, etc. Also,
random JPEG and meme don't apply here and you're mistaken to assume that
GChat and Facebook fit in this category.

 Great. Go ahead and encode them, UTC. But don't say it's because your
 hands are tied and you have no choice.

Quoting an official UTC communication?

 I'll take my chances. I've been called out before for discouraging list
 members from requesting things that were out of scope according to the
 old rules. All I'm saying now is, if the old rules no longer apply, say
 so.

AFAIK, rules haven't changed. Unicode didn't have a policy regarding emoji
and symbols with similar usage. Now it does. For quite a while now, some
folks have tended to use emoji as a means to an end other than what is in
the scope of conversation regarding emoji. And that is not acceptable.


↪ Shervin

On Mon, Feb 9, 2015 at 2:17 PM, Doug Ewell d...@ewellic.org wrote:

 Shervin Afshar shervinafshar at gmail dot com wrote:

  The issue is with your very rigid interpretation of the criteria for
  encoding new symbols. Is appearing in an industry character set
  extension an official phrasing that you keep referring to?

 It was either from the WG2 Principles and Procedures document, or some
 other bit of Unicode/10646 folklore that I've read over the past 22
 years of keeping up with Unicode/10646. I should look up the exact
 wording.

 Of course, Unicode can encode anything they please. That's not in
 question. But in order to claim compatibility as the basis for
 encoding something, these specific, rigid definitions and criteria
 have historically been required. Compatibility with any random JPEG or
 meme that makes the rounds on the Internet was not enough.

  Robot Face is available on Gmail (GChat), Facebook, and Twitch among
  others (calculating the size of user community is left as an
  assignment for the reader). That's enough usage for consideration by
  the UTC even if the symbol is not present in a character encoding
  standard. Also, since Unicode is an industry standard maintained by
  industry members (among others), then if there is enough request to
  these corporations from communities of users, then there might be some
  reason for considering those symbols. I think that's the case for the
  newer symbols.

 Great. Go ahead and encode them, UTC. But don't say it's because your
 hands are tied and you have no choice.

  IMO, Unicode officers seem to have low patience for such sentiments.
  You might want to reconsider your tone. There is a time and place for
  sarcasm.

 I'll take my chances. I've been called out before for discouraging list
 members from requesting things that were out of scope according to the
 old rules. All I'm saying now is, if the old rules no longer apply, say
 so.

 --
 Doug Ewell | Thornton, CO, USA | http://ewellic.org




Re: About cultural/languages communities flags

2015-02-09 Thread Ken Whistler

To follow up on Doug Ewell's response, the mechanism currently
standardized in the Unicode Standard for regional indicator codes
has an interpretation tied to the two-letter codes of ISO 3166-1,
and *not* to TLD's. The two are not directly connected.

If anyone really wants to pursue getting a Scots flag into general
implementation via Unicode regional indicator codes, the correct
way to make that happen is for somebody to get off their duff
and convince the BSI (British Standards Institution) to put in for
an exceptional reservation of a two-letter code for Scotland in
ISO 3166-1 by petitioning the ISO 3166/MA. See:

http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2

for the full context, and for the current 26x26 letter matrix which is
the basis for the flag glyph implementations of regional indicator
code pairs on smartphones.

SC, SO, ST are already taken, but might I suggest putting in for registering
AB for Alba? That one is currently unassigned.

Yeah, yeah, what is the likelihood of BSI pushing for a Scots two-letter
code?! But seriously, if folks are planning ahead for Scots independence
or even some kind of greater autonomy, this is an issue that needs to
be worked, anyway.

In the meantime, let me reiterate that there is *no* formal relationship
between TLD's and the regional indicator codes in Unicode (or the
implementations built upon them). Well, yes, a bunch of registered TLD's
do match the country codes, but there is no two-letter constraint on TLD's.
This should already be apparent, as Scotland has registered .scot. At this
point there isn't even a limitation of TLD's to ASCII letters, so there is
no way to map them
to the limited set of regional indicator codes in the Unicode Standard.

Not having a two letter country code for Scotland that matches the
four letter TLD for Scotland might indeed be a problem for someone,
but I don't see *this* as a problem that the Unicode Standard needs
to solve.

--Ken


On 2/9/2015 2:38 PM, Doug Ewell wrote:

Joan Montané joan at montane dot cat wrote:


I don't request flag support for every flag in the world. I requested
flags for culture/language communities *with* an approved TLD (Top
Level Domain).

Incidentally, about a year and a half ago I discussed this with another
list member, on- and off-list. We agreed that some sort of text-based
encoding of flags would be an interesting project, but disagreed as to
whether this was a Unicode problem.

The present discussion seems to approach the issue from the other side:
treat it as *only* a Unicode problem, and assume that the encoding
problem has been solved by TLD registration.






Re: Unicode block for programming related symbols and codepoints?

2015-02-09 Thread Alfred Zett


Frédéric Grosshans:

Le 09/02/2015 13:55, Alfred Zett a écrit :


Additionally, people tend to forget that simply because Unicode is 
doing emoji out of compatibility (or other) requirements, it does 
not mean that now anything goes. I refer folks to TR51[1] 
(specifically sections 1.3, 8, and Annex C).


[1]: http://www.unicode.org/reports/tr51

You know, the fact that this consortium ever took emoji into 
consideration immediately justifies to include everything everyone 
ever wanted. There is no such thing as important data including 
emoji. :)
The including of emoji was a considerable debate here, with people 
strongly against and strongly for. The trick is that they were already 
used as digital characters by Japanese Telcos and their millions of 
customers. They were de facto encoded as characters in Japanese text 
messages. At the time of encoding, the spread of smartphones made them 
appear in other places (emails, web forums, etc.)



The trick is that one doesn't bargain with Telcos and similar criminals.
Gotta drop them hard and the pest will go away from itself after five 
years or so.

Jean-Francois Colson:
I need a few tens of characters for a conlang I’m developing. ☺ 

Except two or three control characters don't make a con language.
Also, if you don't like con languages in Unicode, what's this: 
http://unicode.org/charts/PDF/U1F700.pdf
I doubt that “not liking con languages” is a faithful description of 
Jean-François ;-)


On a more serious note, this block is actually a set of “scientific” 
(at the time) notations used by Isaac Newton. They were 
encoded in Unicode following an academic project to digitize his 
manuscripts. So here, you have characters used 3 centuries ago by no 
less than Isaac Newton, most of them having a much longer history, and 
useful for science historians. See 
http://www.unicode.org/L2/L2009/09037r2-alchemy.pdf for details.



That's actually interesting. Good to know, thanks.
I think everyone here knows what you are saying, and that the notion of 
plain text is a bit fuzzy. But if you cannot argue that your character 
has a meaning in plain text, for some value of “plain text”, then you 
cannot hope for an encoding in Unicode.



OK, in this case I agree it makes little sense to hope for such characters.

Best regards,

A. Z.


Re: About cultural/languages communities flags

2015-02-09 Thread Christopher Fynn
Using flags to indicate particular languages on websites has plenty of
problems - languages need a better indicator.

Scripts could be indicated by a representative glyph.