Re: Updating D beyond Unicode 2.0

2018-09-29 Thread Shachar Shemesh via Digitalmars-d

On Saturday, 29 September 2018 at 16:19:38 UTC, ag0aep6g wrote:

On 09/29/2018 04:19 PM, Shachar Shemesh wrote:

On 29/09/18 16:52, Dukc wrote:

[...]
I know you meant Sarn, but still... can you please be a bit 
less aggressive with your wording?


 From the article (the furthest point I read in it):
When I ask myself what I've found life is too short for, the 
word that pops into my head is "bullshit."


Dukc didn't post that link. sarn did.


You are 100% correct. My most sincere apologies.

I am going to stop responding to this thread now.

Shachar


Re: Updating D beyond Unicode 2.0

2018-09-29 Thread ag0aep6g via Digitalmars-d

On 09/29/2018 04:19 PM, Shachar Shemesh wrote:

On 29/09/18 16:52, Dukc wrote:

[...]
I know you meant Sarn, but still... can you please be a bit less 
aggressive with your wording?


 From the article (the furthest point I read in it):
When I ask myself what I've found life is too short for, the word that 
pops into my head is "bullshit."


Dukc didn't post that link. sarn did.


Re: Updating D beyond Unicode 2.0

2018-09-29 Thread Shachar Shemesh via Digitalmars-d

On 29/09/18 16:52, Dukc wrote:

On Saturday, 29 September 2018 at 02:22:55 UTC, Shachar Shemesh wrote:
I missed something he said in one of the other (as of this writing, 
98) posts of this thread, and thus caused Dukc to label me a 
bullshitter.


I know you meant Sarn, but still... can you please be a bit less 
aggressive with your wording?


From the article (the furthest point I read in it):

When I ask myself what I've found life is too short for, the word that pops into my head 
is "bullshit."



That is the word used by the article *you* linked to, in reference to 
me. If it offends you enough to be accused of *calling* someone that, 
just imagine how I felt being *called* that very same name.


Seriously, I don't make it a habit of being offended by random people on 
the Internet, but this is more a conscious decision than a naturally 
thick skin. Seeing that label hurt.


Don't worry. I've been on the Internet since 1991. That's longer than 
the median age here (i.e. - I've been on the Internet since before most 
of you were born). I've had my own fair share of flame wars, 
including some that, to my chagrin, I started.


In other words, I got over it. I did not reply, big though the 
temptation was.


But the right time to be sensitive about what words are being used was 
*before* you linked to the article. Taking offense from being called out 
for calling someone something you find offensive is hypocritical.


I never understood the focus on words. It's not the use of that word 
that offended me, it's the fact that you thought anything I did 
justified using it. I don't think using "cattle excrement" instead would 
have been any less hurtful.


And it's not that the rest of your post was thoughtful, considerate and 
took pains to give constructive criticism, with or without hurting 
anyone's feelings. It's just that that part doesn't seem to be what 
bothers you.


Shachar


Re: Updating D beyond Unicode 2.0

2018-09-29 Thread Dukc via Digitalmars-d
On Saturday, 29 September 2018 at 02:22:55 UTC, Shachar Shemesh 
wrote:
I missed something he said in one of the other (as of this 
writing, 98) posts of this thread, and thus caused Dukc to 
label me a bullshitter.


I know you meant Sarn, but still... can you please be a bit less 
aggressive with your wording?


Re: Updating D beyond Unicode 2.0

2018-09-28 Thread Shachar Shemesh via Digitalmars-d

On 28/09/18 14:37, Dukc wrote:

On Friday, 28 September 2018 at 02:23:32 UTC, sarn wrote:


Shachar seems to be aiming for an internet high score by shooting down 
threads without reading them.  You have better things to do.

http://www.paulgraham.com/vb.html


I believe you're being too harsh. It's easy to miss a part of a post 
sometimes.


A minor correction: Aliak is not accusing me of missing a part of the 
post. He's accusing me of not taking into account something he said in a 
different part of the *thread*. I.e. - I missed something he said in one 
of the other (as of this writing, 98) posts of this thread, and thus 
caused Dukc to label me a bullshitter.


Re: Updating D beyond Unicode 2.0

2018-09-28 Thread sarn via Digitalmars-d

On Friday, 28 September 2018 at 11:37:10 UTC, Dukc wrote:

It's easy to miss a part of a post sometimes.


That's very true, and it's always good to give people the benefit 
of the doubt.  But most people are able to post constructively 
here without


* Abrasively and condescendingly declaring others' posts to be 
completely pointless
* Doing that based on one single aspect of a post, without 
bothering to check the whole post or parent post
* Doubling down even after getting a hint that the poster might 
not have posted 100% cluelessly

* Doing all this more than once in a thread

If Shachar starts posting constructively, I'll happily engage.  I 
mean that.  Otherwise I won't waste my time, and I'll tell others 
not to waste theirs, too.


Re: Updating D beyond Unicode 2.0

2018-09-28 Thread Dukc via Digitalmars-d

On Friday, 28 September 2018 at 02:23:32 UTC, sarn wrote:


Shachar seems to be aiming for an internet high score by 
shooting down threads without reading them.  You have better 
things to do.

http://www.paulgraham.com/vb.html


I believe you're being too harsh. It's easy to miss a part of a 
post sometimes.


Re: Updating D beyond Unicode 2.0

2018-09-27 Thread sarn via Digitalmars-d

On Thursday, 27 September 2018 at 16:34:37 UTC, aliak wrote:
On Thursday, 27 September 2018 at 13:59:48 UTC, Shachar Shemesh 
wrote:

On 27/09/18 16:38, aliak wrote:
The point was that being able to use non-English in code is 
demonstrably both helpful and useful to people. Norwegian 
happens to be easily anglicize-able. I've already linked to 
non ascii code versions in a previous post if you want that 
too.


If you wish to make a point about something irrelevant to the 
discussion, that's fine. It is, however, irrelevant, mostly 
because it is uncontested.


This thread is about the use of non-English in *identifiers*. 
This thread is not about comments. It is not about literals 
(i.e. - strings). Only about identifiers (function names, 
variable names etc.).


If you have real world examples of those, that would be both 
interesting and relevant.


Shachar


English doesn't mean ascii. You can write non-English in ascii, 
which you would've noticed if you'd opened the link, which had 
identifiers in Norwegian (which is not English).


And again, I've already posted a link that shows non-ascii 
identifiers. I'll paste it again here in case you don't want to 
read the thread:


https://speakerdeck.com/codelynx/programming-swift-in-japanese


Shachar seems to be aiming for an internet high score by shooting 
down threads without reading them.  You have better things to do.

http://www.paulgraham.com/vb.html


Re: Updating D beyond Unicode 2.0

2018-09-27 Thread aliak via Digitalmars-d
On Thursday, 27 September 2018 at 13:59:48 UTC, Shachar Shemesh 
wrote:

On 27/09/18 16:38, aliak wrote:
The point was that being able to use non-English in code is 
demonstrably both helpful and useful to people. Norwegian 
happens to be easily anglicize-able. I've already linked to 
non ascii code versions in a previous post if you want that 
too.


If you wish to make a point about something irrelevant to the 
discussion, that's fine. It is, however, irrelevant, mostly 
because it is uncontested.


This thread is about the use of non-English in *identifiers*. 
This thread is not about comments. It is not about literals 
(i.e. - strings). Only about identifiers (function names, 
variable names etc.).


If you have real world examples of those, that would be both 
interesting and relevant.


Shachar


English doesn't mean ascii. You can write non-English in ascii, 
which you would've noticed if you'd opened the link, which had 
identifiers in Norwegian (which is not English).


And again, I've already posted a link that shows non-ascii 
identifiers. I'll paste it again here in case you don't want to 
read the thread:


https://speakerdeck.com/codelynx/programming-swift-in-japanese


Re: Updating D beyond Unicode 2.0

2018-09-27 Thread Shachar Shemesh via Digitalmars-d

On 27/09/18 16:38, aliak wrote:
The point was that being able to use non-English in code is demonstrably 
both helpful and useful to people. Norwegian happens to be easily 
anglicize-able. I've already linked to non ascii code versions in a 
previous post if you want that too.


If you wish to make a point about something irrelevant to the 
discussion, that's fine. It is, however, irrelevant, mostly because it 
is uncontested.


This thread is about the use of non-English in *identifiers*. This 
thread is not about comments. It is not about literals (i.e. - strings). 
Only about identifiers (function names, variable names etc.).


If you have real world examples of those, that would be both interesting 
and relevant.


Shachar


Re: Updating D beyond Unicode 2.0

2018-09-27 Thread aliak via Digitalmars-d
On Thursday, 27 September 2018 at 08:16:00 UTC, Shachar Shemesh 
wrote:

On 27/09/18 10:35, aliak wrote:
Here's an example from this year's spring semester at NTNU 
(a Norwegian university): 
http://folk.ntnu.no/frh/grprog/eksempel/eks_20.cpp


... That's the basic programming course. Whether the professor 
would use that I guess would depend on ratio of 
English/non-English speakers. But it's there nonetheless.


I'm sorry I keep bringing this up, but context is really 
important here.


The program you link to has non-ASCII in the comments and in 
the literals, but not in the identifiers. Nobody is opposed to 
having those.


Shachar


The point was that being able to use non-English in code is 
demonstrably both helpful and useful to people. Norwegian happens 
to be easily anglicize-able. I've already linked to non ascii 
code versions in a previous post if you want that too.


Re: Updating D beyond Unicode 2.0

2018-09-27 Thread Walter Bright via Digitalmars-d

On 9/27/2018 12:35 AM, aliak wrote:
Anyway, on a related note: D itself (not identifiers, but std) also supports 
unicode 6 or something. That's from 2010. That's a decade ago. We're at unicode 
11 now. And I've already had someone tell me (while trying to get them to use D) 
- "hold on it supports unicode from a decade ago? Nah I'm not touching it". Not 
that it's the same as supporting identifiers in code, but still the reaction is 
relevant.


Nobody is suggesting D not support Unicode in strings, comments, and the 
standard library. Please file any issues on Bugzilla, and PRs to fix them.


Re: Updating D beyond Unicode 2.0

2018-09-27 Thread Shachar Shemesh via Digitalmars-d

On 27/09/18 10:35, aliak wrote:
Here's an example from this year's spring semester at NTNU (a Norwegian 
university): http://folk.ntnu.no/frh/grprog/eksempel/eks_20.cpp


... That's the basic programming course. Whether the professor would use 
that I guess would depend on ratio of English/non-English speakers. But 
it's there nonetheless.


I'm sorry I keep bringing this up, but context is really important here.

The program you link to has non-ASCII in the comments and in the 
literals, but not in the identifiers. Nobody is opposed to having those.


Shachar


Re: Updating D beyond Unicode 2.0

2018-09-27 Thread aliak via Digitalmars-d
On Wednesday, 26 September 2018 at 20:43:47 UTC, Walter Bright 
wrote:

On 9/26/2018 5:46 AM, Steven Schveighoffer wrote:
This is a non-starter. We can't break people's code, 
especially for trivial reasons like 'you shouldn't code that 
way because others don't like it'. I'm pretty sure Walter 
would be against removing Unicode support for identifiers.


We're not going to remove it, because there's not much to gain 
from it.


But expanding it seems of vanishingly little value. Note that 
each thing that gets added to D adds weight to it, and it needs 
to pull its weight. Nothing is free.


I don't see a scenario where someone would be learning D and 
not know English. Non-English D instructional material is 
nearly non-existent. dlang.org is all in English. Don't most 
languages have a Romanji-like representation?


It's not that they don't know English. It's that non-English 
speakers can process words and sentences in their own language much 
more efficiently than in English. Knowing a language is not binary.


Here's an example from this year's spring semester at NTNU 
(a Norwegian university): 
http://folk.ntnu.no/frh/grprog/eksempel/eks_20.cpp


... That's the basic programming course. Whether the professor 
would use that I guess would depend on ratio of 
English/non-English speakers. But it's there nonetheless.


Of course Norway is a bad example because the English level here 
is, arguably, higher than in many English-speaking countries :p But 
it's a great example because even if you're great at English, people 
are still sometimes more comfortable/confident/efficient in their own 
native language.


Some tech meetups from different countries try and do things in 
English and mostly it works. But it's been seen consistently with 
non-English audiences that presentations given in English result 
in silence whereas if it's in their native language you have 
actual engagement.


I fail to understand how supporting a version of unicode from 
(not sure when it was released) 3 billion decades ago should just 
be left as is and also cannot be removed when there's someone 
who's willing to update it.




C/C++ have made efforts in the past to support non-ASCII coding 
- digraphs, trigraphs, and alternate keywords. They've all 
failed miserably. The only people who seem to know those 
features even exist are language lawyers.


This is not relevant. Trigraphs and digraphs did indeed fail 
miserably but they do not represent any non-ascii characters. The 
existential reasons for those abominations were different.


Anyway, on a related note: D itself (not identifiers, but std) 
also supports unicode 6 or something. That's from 2010. That's a 
decade ago. We're at unicode 11 now. And I've already had someone 
tell me (while trying to get them to use D) - "hold on it 
supports unicode from a decade ago? Nah I'm not touching it". Not 
that it's the same as supporting identifiers in code, but still 
the reaction is relevant.


Cheers,
- Ali



Re: Updating D beyond Unicode 2.0

2018-09-26 Thread Jonathan M Davis via Digitalmars-d
On Sunday, September 23, 2018 2:49:39 PM MDT Walter Bright via Digitalmars-d 
wrote:
> There's a reason why dmd doesn't have international error messages. My
> experience with it is that international users don't want it. They prefer
> the english messages.

It reminds me of one of the reasons that Bryan Cantrill thinks that many
folks use Linux - they want to be able to google their stack traces. Of
course, that same argument would be a reason to use C/C++ rather than
switching to D, but having an error be in a format that's more common and
therefore more likely to have been posted somewhere where you might be able
to find a discussion on it and therefore maybe be able to find the solution
for it can be valuable - and that's without even getting into all of the
translation issues discussed elsewhere in this thread. And it's not like
compiler error messages - or programming speak in general - are really
traditional English anyway.

- Jonathan M Davis





Re: Updating D beyond Unicode 2.0

2018-09-26 Thread Neia Neutuladh via Digitalmars-d

On 09/26/2018 01:43 PM, Walter Bright wrote:
Don't most languages have a Romanji-like 
representation?


Yes, a lot of languages that don't use the Latin alphabet have standard 
transcriptions into the Latin alphabet. Standard transcriptions into 
ASCII are much less common, and newer Unicode versions include more 
Latin characters to better support languages (and other use cases) using 
the Latin alphabet.


Re: Updating D beyond Unicode 2.0

2018-09-26 Thread Steven Schveighoffer via Digitalmars-d

On 9/26/18 4:43 PM, Walter Bright wrote:
But expanding it seems of vanishingly little value. Note that each thing 
that gets added to D adds weight to it, and it needs to pull its weight. 
Nothing is free.


It may be the weight is already there in the form of unicode symbol 
support, just the range of the characters supported isn't good enough 
for some languages. It might be like replacing your refrigerator -- you 
get an upgrade, but it's not going to take up any more space because you 
get rid of the old one. I would like to see the PR before passing 
judgment on the heft of the change.


The value is simply in the consistency -- when some of the words for 
your language can be valid symbols but others can't, then it becomes a 
weird guessing game as to what is supported. It would be like saying all 
identifiers can have any letters except `q`. Sure, you can get around 
that, but it's weirdly exclusive.


I claim complete ignorance as to what is required; it hasn't been 
laid out technically what is at stake, and I'm not bilingual anyway. It 
could be true that I'm completely misunderstanding the positions of others.


-Steve


Re: Updating D beyond Unicode 2.0

2018-09-26 Thread Adam D. Ruppe via Digitalmars-d
On Wednesday, 26 September 2018 at 20:43:47 UTC, Walter Bright 
wrote:
I don't see a scenario where someone would be learning D and 
not know English. Non-English D instructional material is 
nearly non-existent.


http://ddili.org/ders/d/



Re: Updating D beyond Unicode 2.0

2018-09-26 Thread Walter Bright via Digitalmars-d

On 9/26/2018 5:46 AM, Steven Schveighoffer wrote:

Does this need a DIP?


Feel free to write one, but its chances of getting incorporated are remote and 
would require a pretty strong rationale that I haven't seen yet.


Re: Updating D beyond Unicode 2.0

2018-09-26 Thread Walter Bright via Digitalmars-d

On 9/26/2018 5:46 AM, Steven Schveighoffer wrote:
This is a non-starter. We can't break people's code, especially for trivial 
reasons like 'you shouldn't code that way because others don't like it'. I'm 
pretty sure Walter would be against removing Unicode support for identifiers.


We're not going to remove it, because there's not much to gain from it.

But expanding it seems of vanishingly little value. Note that each thing that 
gets added to D adds weight to it, and it needs to pull its weight. Nothing is free.


I don't see a scenario where someone would be learning D and not know English. 
Non-English D instructional material is nearly non-existent. dlang.org is all in 
English. Don't most languages have a Romanji-like representation?


C/C++ have made efforts in the past to support non-ASCII coding - digraphs, 
trigraphs, and alternate keywords. They've all failed miserably. The only people 
who seem to know those features even exist are language lawyers.


Re: Updating D beyond Unicode 2.0

2018-09-26 Thread Walter Bright via Digitalmars-d

On 9/25/2018 11:50 PM, Shachar Shemesh wrote:
This sounded like a very compelling example, until I gave it a second thought. I 
now fail to see how this example translates to a real-life scenario.


Also, there are usually common ASCII versions of city names, such as Cologne for 
Köln.


Re: Updating D beyond Unicode 2.0

2018-09-26 Thread Andrea Fontana via Digitalmars-d

On Sunday, 23 September 2018 at 20:49:39 UTC, Walter Bright wrote:

On 9/23/2018 9:52 AM, aliak wrote:

There's a reason why dmd doesn't have international error 
messages. My experience with it is that international users 
don't want it. They prefer the english messages.


Yes please. Keep them in english.
But please, add an error code too in front of them.

I'm sure if you look hard enough you'll find someone using 
non-ASCII characters in identifiers.


It depends on what I'm developing.
If I'm writing a public library I'm planning to release on 
github, I use english identifiers.


But of course if is a piece of software for my company or for 
myself, I use italian identifiers.


Andrea


Re: Updating D beyond Unicode 2.0

2018-09-26 Thread Steven Schveighoffer via Digitalmars-d

On 9/26/18 5:54 AM, rjframe wrote:

On Fri, 21 Sep 2018 16:27:46 +, Neia Neutuladh wrote:


I've got this coded up and can submit a PR, but I thought I'd get
feedback here first.

Does anyone see any horrible potential problems here?

Or is there an interestingly better option?

Does this need a DIP?


I just want to point out since this thread is still living that there have
been very few answers to the actual question ("should I submit my PR?").

Walter did answer the question, with the reasons that Unicode identifier
support is not useful/helpful and could cause issues with tooling. Which
is likely correct; and if we really want to follow this logic, Unicode
identifier support should be removed from D entirely.


This is a non-starter. We can't break people's code, especially for 
trivial reasons like 'you shouldn't code that way because others don't 
like it'. I'm pretty sure Walter would be against removing Unicode 
support for identifiers.




I don't recall seeing anyone in favor providing technical reasons, save
the OP.


There doesn't necessarily need to be a technical reason. In fact, there 
really isn't one -- people can get by with using ASCII identifiers just 
fine (and many/most people do). Supporting Unicode would be purely for 
social or inclusive reasons (it may make D more approachable to 
non-English speaking schoolchildren for instance).


As an only-English speaking person, it doesn't bother me either way to 
have Unicode identifiers. But the fact that we *already* support Unicode 
identifiers leads me to expect that we support *all* Unicode 
identifiers. It doesn't make a whole lot of sense to only support some 
of them.




Especially since the work is done, it makes sense to me to ask for the PR
for review. Worst case scenario, it sits there until we need it.


I suggested this as well.

https://forum.dlang.org/post/poaq1q$its$1...@digitalmars.com

I think it stands a good chance of getting incorporated, just for the 
simple fact that it's enabling and not disruptive.


-Steve


Re: Updating D beyond Unicode 2.0

2018-09-26 Thread Steven Schveighoffer via Digitalmars-d

On 9/26/18 2:50 AM, Shachar Shemesh wrote:

On 25/09/18 15:35, Dukc wrote:
Another reason is that something may not have a good translation to 
English. If there is an enum type listing city names, it is IMO better 
to write them as normal, using Unicode. CityName.seinäjoki, not 
CityName.seinaejoki.


This sounded like a very compelling example, until I gave it a second 
thought. I now fail to see how this example translates to a real-life 
scenario.


City names (data, changes over time) as enums (compile time set) seem 
like a horrible idea.


That may sound like a very technical objection to an otherwise valid 
point, but I really think that's not the case. The properties that 
cause city names to be poor candidates for enum values are the same as 
those that make them Unicode candidates.


Hm... I could see actually some "clever" use of opDispatch being used to 
define cities or other such names.


In any case, I think the biggest pro for supporting Unicode symbol names 
is -- we already support Unicode symbol names. It doesn't make a whole 
lot of sense to only support some of them.


-Steve


Re: Updating D beyond Unicode 2.0

2018-09-26 Thread rjframe via Digitalmars-d
On Fri, 21 Sep 2018 16:27:46 +, Neia Neutuladh wrote:

> I've got this coded up and can submit a PR, but I thought I'd get
> feedback here first.
> 
> Does anyone see any horrible potential problems here?
> 
> Or is there an interestingly better option?
> 
> Does this need a DIP?

I just want to point out since this thread is still living that there have 
been very few answers to the actual question ("should I submit my PR?").

Walter did answer the question, with the reasons that Unicode identifier 
support is not useful/helpful and could cause issues with tooling. Which 
is likely correct; and if we really want to follow this logic, Unicode 
identifier support should be removed from D entirely.

I don't recall seeing anyone in favor providing technical reasons, save 
the OP.

Especially since the work is done, it makes sense to me to ask for the PR 
for review. Worst case scenario, it sits there until we need it.


Re: Updating D beyond Unicode 2.0

2018-09-26 Thread Dukc via Digitalmars-d
On Wednesday, 26 September 2018 at 07:37:28 UTC, Shachar Shemesh 
wrote:
The other type of answer is "it's being done in the real 
world". If it's in active use in the real world, it might make 
sense to support it, even if we can agree that the design is 
not optimal.


Shachar


Two years ago, I took part in implementing a commercial game. It 
was made in C# (Unity), but I don't think that matters, since D 
would have faced the same thing were it used.


Anyway, the game has three characters with completely different 
abilities. The abilities were unique enough that it made sense to 
name some functions after the characters. One of the characters 
really has a non-ASCII character in his name, and that meant 
naming him differently in the code.


Re: Updating D beyond Unicode 2.0

2018-09-26 Thread Shachar Shemesh via Digitalmars-d

On 26/09/18 10:26, Dukc wrote:

On Wednesday, 26 September 2018 at 06:50:47 UTC, Shachar Shemesh wrote:
The properties that cause city names to be poor candidates for enum 
values are the same as those that make them Unicode candidates.


How so?

City names (data, changes over time) as enums (compile time set) seem 
like a horrible idea.


In most cases yes. But not always. You might be doing some sort of game 
where certain cities are a central concept, not just data with 
properties. Another possibility is that you're using code as data, AKA 
scripting.


And anyway, who says you can't make a program that's designed 
specifically for certain cities?


Sure you can. It's just very poor design.

I think, when asking such questions, two types of answers are relevant. 
One is hypotheticals where you say "this design requires this". For such 
answers, the design needs to be a good one. It makes no sense to design 
a language to support a hypothetical design which is not a good one.


The other type of answer is "it's being done in the real world". If it's 
in active use in the real world, it might make sense to support it, even 
if we can agree that the design is not optimal.


Since your answer is hypothetical, I think arguing this is not a good 
way to code is a valid one.


Shachar


Re: Updating D beyond Unicode 2.0

2018-09-26 Thread Dukc via Digitalmars-d
On Wednesday, 26 September 2018 at 06:50:47 UTC, Shachar Shemesh 
wrote:
The properties that cause city names to be poor candidates for 
enum values are the same as those that make them Unicode 
candidates.


How so?

City names (data, changes over time) as enums (compile time 
set) seem like a horrible idea.


In most cases yes. But not always. You might be doing some sort 
of game where certain cities are a central concept, not just data 
with properties. Another possibility is that you're using code as 
data, AKA scripting.


And anyway, who says you can't make a program that's designed 
specifically for certain cities?


Re: Updating D beyond Unicode 2.0

2018-09-26 Thread Shachar Shemesh via Digitalmars-d

On 25/09/18 15:35, Dukc wrote:
Another reason is that something may not have a good translation to 
English. If there is an enum type listing city names, it is IMO better 
to write them as normal, using Unicode. CityName.seinäjoki, not 
CityName.seinaejoki.


This sounded like a very compelling example, until I gave it a second 
thought. I now fail to see how this example translates to a real-life 
scenario.


City names (data, changes over time) as enums (compile time set) seem 
like a horrible idea.


That may sound like a very technical objection to an otherwise valid 
point, but I really think that's not the case. The properties that 
cause city names to be poor candidates for enum values are the same as 
those that make them Unicode candidates.


Shachar


Re: Updating D beyond Unicode 2.0

2018-09-25 Thread Jacob Carlborg via Digitalmars-d

On 2018-09-21 18:27, Neia Neutuladh wrote:

D's currently accepted identifier characters are based on Unicode 2.0:

* ASCII range values are handled specially.
* Letters and combining marks from Unicode 2.0 are accepted.
* Numbers outside the ASCII range are accepted.
* Eight random punctuation marks are accepted.

This follows the C99 standard.

Many languages use the Unicode standard explicitly: C#, Go, Java, 
Python, ECMAScript, just to name a few. A small number of languages 
reject non-ASCII characters: Dart, Perl. Some languages are weirdly 
generous: Swift and C11 allow everything outside the Basic Multilingual 
Plane.


I'd like to update that so that D accepts something as a valid 
identifier character if it's a letter or combining mark or modifier 
symbol that's present in Unicode 11, or a non-ASCII number. This allows 
the 146 most popular writing systems and a lot more characters from 
those writing systems. This *would* reject those eight random 
punctuation marks, so I'll keep them in as legacy characters.
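A rough sketch of the proposed rule, in Python for illustration (the category choices here are my reading of the proposal, not the actual PR; Python's unicodedata reflects whatever Unicode version the interpreter ships, not necessarily Unicode 11):

```python
import unicodedata

def is_proposed_ident_char(c: str) -> bool:
    """Sketch of the proposed rule: letters, combining marks, and
    modifier symbols are identifier characters, as are non-ASCII
    decimal digits.  ASCII keeps its special handling (letters,
    digits, underscore)."""
    if ord(c) < 0x80:
        return c.isalnum() or c == '_'
    cat = unicodedata.category(c)
    return (cat.startswith('L')     # letters: Lu, Ll, Lt, Lm, Lo
            or cat.startswith('M')  # combining marks: Mn, Mc, Me
            or cat == 'Sk'          # modifier symbols
            or cat == 'Nd')         # decimal digits outside ASCII

print(is_proposed_ident_char('ä'))  # letter (Ll) -> True
print(is_proposed_ident_char('€'))  # currency symbol (Sc) -> False
```

This is only to make the category boundaries concrete; the real lexer works from precomputed character-range tables rather than per-character category lookups.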


It would mean we don't have to reference the C99 standard when 
enumerating the allowed characters; we just have to refer to the Unicode 
standard, which we already need to talk about in the lexical part of the 
spec.


It might also make the lexer a tiny bit faster; it reduces the number of 
valid-ident-char segments to search from 245 to 134. On the other hand, 
it will change the ident char ranges from wchar to dchar, which means 
the table takes up marginally more memory.


And, of course, it lets you write programs entirely in Linear B, and 
that's a marketing ploy not to be missed.


I've got this coded up and can submit a PR, but I thought I'd get 
feedback here first.


Does anyone see any horrible potential problems here?

Or is there an interestingly better option?

Does this need a DIP?


I'm not a native English speaker, but I write all my public and private 
code in English. I expect anyone I work with to write their code in 
English as well, and I make sure they do. English is not enough 
either; it has to be American English.


Despite this I think that D should support as much of the Unicode as 
possible (including using Unicode for identifiers). It should not be up 
to the programming language to decide which language the developer 
should write the code in.


--
/Jacob Carlborg


Re: Updating D beyond Unicode 2.0

2018-09-25 Thread Dukc via Digitalmars-d
When I make code that I expect to be only used around here, I 
generally write the code itself in english but comments in my own 
language. I agree that in general, it's better to stick with 
english in identifiers when the programming language and the 
standard library is English.


On Tuesday, 25 September 2018 at 09:28:33 UTC, FeepingCreature 
wrote:

On Friday, 21 September 2018 at 23:17:42 UTC, Seb wrote:
In all seriousness, I hate it when someone thought it was funny to 
use the lambda symbol as an identifier and I have to copy that 
symbol whenever I want to use it because there's no convenient 
way to type it.

(This is already supported in D.)


I just want to chime in that I've definitely used greek letters 
in "ordinary" code - it's handy when writing math and feeling 
lazy.


On the other hand, Unicode identifiers still have their value IMO. 
The quote above is one reason for that: if there is a very 
specialized codebase, it may just be impractical to transliterate 
everything.


Another reason is that something may not have a good translation 
to English. If there is an enum type listing city names, it is 
IMO better to write them as normal, using Unicode. 
CityName.seinäjoki, not CityName.seinaejoki.
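To make the point concrete, here is the example in a language that already accepts non-ASCII letters in identifiers (Python here; the enum and its members are hypothetical names from the example above):

```python
from enum import Enum, auto

class CityName(Enum):
    # Non-ASCII letters are valid identifier characters,
    # so the native spelling needs no transliteration.
    seinäjoki = auto()
    helsinki = auto()

print(CityName.seinäjoki.name)  # prints "seinäjoki"
```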






Re: Updating D beyond Unicode 2.0

2018-09-25 Thread FeepingCreature via Digitalmars-d

On Friday, 21 September 2018 at 23:17:42 UTC, Seb wrote:
In all seriousness, I hate it when someone thought it's funny to 
use the lambda symbol as an identifier and I have to copy that 
symbol whenever I want to use it because there's no convenient 
way to type it.

(This is already supported in D.)


I just want to chime in that I've definitely used greek letters 
in "ordinary" code - it's handy when writing math and feeling 
lazy.


Note that on Linux, with a simple configuration tweak (Windows 
key mapped to Compose, and 
https://gist.githubusercontent.com/zkat/6718053/raw/4535a2e2a988aa90937a69dbb8f10eb6a43b4010/.XCompose ), you can for instance type " l a m" to make the lambda symbol, or other greek letters very easily.
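For reference, a mapping of that kind looks roughly like this in ~/.XCompose (a sketch following the "l a m" sequence described above; the exact entries in the linked gist may differ):

```
# Compose + l a m produces the Greek small letter lambda
<Multi_key> <l> <a> <m> : "λ" U03BB
```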


Re: Updating D beyond Unicode 2.0

2018-09-25 Thread Walter Bright via Digitalmars-d

On 9/23/2018 12:06 PM, Abdulhaq wrote:
The early history of computer science is completely dominated by cultures who 
use latin script based characters,


Small character sets are much more implementable on primitive systems like 
telegraphs and electro-mechanical ttys.


It wasn't even practical to display a rich character set until the early 1980's 
or so. There wasn't enough memory. Glass ttys at the time could barely, and I 
mean barely, display ASCII. I know because I designed and built one.


Re: Updating D beyond Unicode 2.0

2018-09-24 Thread Steven Schveighoffer via Digitalmars-d

On 9/24/18 3:18 PM, Patrick Schluter wrote:

On Monday, 24 September 2018 at 13:26:14 UTC, Steven Schveighoffer wrote:
2. There are no rules about what *encoding* is acceptable, it's 
implementation defined. So various compilers have different rules as 
to what will be accepted in the actual source code. In fact, I read 
somewhere that not even ASCII is guaranteed to be supported.


Indeed. IBM mainframes have C compilers too, but not ASCII. They code in 
EBCDIC. That's why, for instance, it's not portable to do things like

  if(c >= 'A' && c <= 'Z') printf("CAPITAL LETTER\n");

because the letters are not contiguous in EBCDIC.


Right. But it's just a side-note -- I'd guess all modern compilers 
support ASCII, and definitely ones that we would want to interoperate with.


Besides, that example is more concerned about *input data* encoding, not 
*source code* encoding. If the above is written in ASCII, then I would 
assume that the bytes in the source file are the ASCII bytes, and 
probably the IBM compilers would not know what to do with such files (it 
would all be gibberish if you opened on an EBCDIC editor). You'd first 
have to translate it to EBCDIC, which is a red flag that likely this 
isn't going to work :)


-Steve


Re: Updating D beyond Unicode 2.0

2018-09-24 Thread Patrick Schluter via Digitalmars-d
On Monday, 24 September 2018 at 13:26:14 UTC, Steven 
Schveighoffer wrote:
2. There are no rules about what *encoding* is acceptable, it's 
implementation defined. So various compilers have different 
rules as to what will be accepted in the actual source code. In 
fact, I read somewhere that not even ASCII is guaranteed to be 
supported.


Indeed. IBM mainframes have C compilers too, but not ASCII. They 
code in EBCDIC. That's why, for instance, it's not portable to do 
things like

 if(c >= 'A' && c <= 'Z') printf("CAPITAL LETTER\n");

because the letters are not contiguous in EBCDIC.
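The non-contiguity is easy to see from the code-page tables themselves. A small illustration using Python's built-in cp037 (EBCDIC US) codec - a sketch, not the original C, but it shows exactly why the range test breaks:

```python
# In ASCII, 'A'..'Z' occupy one contiguous run (0x41..0x5A),
# so (c >= 'A' && c <= 'Z') works.
ascii_codes = [ord(c) for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"]
assert ascii_codes == list(range(0x41, 0x5B))

# In EBCDIC the letters come in three separate runs (A-I, J-R, S-Z)
# with non-letter gaps between them.
ebcdic_codes = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ".encode("cp037"))
# 'I' is 0xC9 but 'J' is 0xD1: eight non-letter positions in between,
# all of which would wrongly pass a naive 'A' <= c <= 'Z' check.
assert ebcdic_codes[8] == 0xC9 and ebcdic_codes[9] == 0xD1
```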


Re: Updating D beyond Unicode 2.0

2018-09-24 Thread Steven Schveighoffer via Digitalmars-d

On 9/24/18 2:20 PM, Martin Tschierschke wrote:

On Monday, 24 September 2018 at 14:34:21 UTC, Steven Schveighoffer wrote:

On 9/24/18 10:14 AM, Adam D. Ruppe wrote:
On Monday, 24 September 2018 at 13:26:14 UTC, Steven Schveighoffer 
wrote:
Part of the reason, which I haven't read here yet, is that all the 
keywords are in English.


Eh, those are kinda opaque sequences anyway, since the meanings 
aren't quite what the normal dictionary definition is anyway. Look up 
"int" in the dictionary... or "void", or even "string". They are just 
a handful of magic sequences we learn with the programming language. 
(And in languages like Rust, "fn", lol.)


Well, even on top of that, the standard library is full of English 
words that read very coherently when used together (if you understand 
English).


I can't imagine a long chain of English algorithms with some Chinese 
one pasted in the middle looks very good :) I suppose you could alias 
them all...



You might get really funny error messages.

 can't be casted to int.


Haha, it could be cynical as well

int can’t be casted to int樂

Oh, the games we could play.

-Steve


Re: Updating D beyond Unicode 2.0

2018-09-24 Thread Martin Tschierschke via Digitalmars-d
On Monday, 24 September 2018 at 14:34:21 UTC, Steven 
Schveighoffer wrote:

On 9/24/18 10:14 AM, Adam D. Ruppe wrote:
On Monday, 24 September 2018 at 13:26:14 UTC, Steven 
Schveighoffer wrote:
Part of the reason, which I haven't read here yet, is that 
all the keywords are in English.


Eh, those are kinda opaque sequences anyway, since the 
meanings aren't quite what the normal dictionary definition is 
anyway. Look up "int" in the dictionary... or "void", or even 
"string". They are just a handful of magic sequences we learn 
with the programming language. (And in languages like Rust, 
"fn", lol.)


Well, even on top of that, the standard library is full of 
English words that read very coherently when used together (if 
you understand English).


I can't imagine a long chain of English algorithms with some 
Chinese one pasted in the middle looks very good :) I suppose 
you could alias them all...


-Steve

You might get really funny error messages.

 can't be casted to int.

:-)

And if you have to increment the number of cars you can write: 
++; This might give really funny looking programs!


Re: Updating D beyond Unicode 2.0

2018-09-24 Thread 0xEAB via Digitalmars-d

On Monday, 24 September 2018 at 15:17:14 UTC, 0xEAB wrote:
Back then, when I was coding C# in VS 2010, I was happy with the 
German error messages.


addendum: I've been using the English version since VS2017


Re: Updating D beyond Unicode 2.0

2018-09-24 Thread 0xEAB via Digitalmars-d

On Sunday, 23 September 2018 at 20:49:39 UTC, Walter Bright wrote:
There's a reason why dmd doesn't have international error 
messages. My experience with it is that international users 
don't want it. They prefer the english messages.


I'm a native German speaker.
As for my part, I agree on this, indeed.


There are several reasons for this:
- Usually such translations are, simply put, terrible.
- Inconsistent translations [0]
- Non-idiomatic sentences that still sound like English somehow.
- Translations of tech terms [1]
- Non-idiomatic translations of tech terms [2]

However, well done translations might be quite nice at the 
beginning when learning programming. Back then, when I was coding 
C# in VS 2010, I was happy with the German error messages. I'm not 
sure whether it was just delusion, but I think it got worse with 
some later version, though.





[0] There's nothing worse than every single sentence being 
treated on its own during the translation process. At least 
that's what you'd often think when you face a longer error 
message. Usually you're confronted with non-linked and 
kindergarten-like sentences that don't seem to be meant to be put 
together. Often you'd think there were several translators. 
Favorite problem with this: 2 different terms for the same thing 
in two sentences.


[1] e.g. "integer type" -> "ganzzahliger Datentyp"
This just sounds weird. Anyone using "int" in their code knows 
what it means anyway...
Nevertheless, there are some common translations that are fine 
(primarily because they're common), e.g. "error" -> "Fehler"


[2] e.g. "assertion" -> "Assertionsfehler"
This particular one can be found in Windows 10 and is not even 
proper German.




Re: Updating D beyond Unicode 2.0

2018-09-24 Thread Steven Schveighoffer via Digitalmars-d

On 9/24/18 10:14 AM, Adam D. Ruppe wrote:

On Monday, 24 September 2018 at 13:26:14 UTC, Steven Schveighoffer wrote:
Part of the reason, which I haven't read here yet, is that all the 
keywords are in English.


Eh, those are kinda opaque sequences anyway, since the meanings aren't 
quite what the normal dictionary definition is anyway. Look up "int" in 
the dictionary... or "void", or even "string". They are just a handful 
of magic sequences we learn with the programming language. (And in 
languages like Rust, "fn", lol.)


Well, even on top of that, the standard library is full of English words 
that read very coherently when used together (if you understand English).


I can't imagine a long chain of English algorithms with some Chinese one 
pasted in the middle looks very good :) I suppose you could alias them 
all...


-Steve


Re: Updating D beyond Unicode 2.0

2018-09-24 Thread Adam D. Ruppe via Digitalmars-d
On Monday, 24 September 2018 at 10:36:50 UTC, Jonathan M Davis 
wrote:
Given that the typical keyboard has none of those characters, 
maintaining code that used any of them would be a royal pain.


It is pretty easy to type them with a little keyboard config 
change, and editors like vim can even pick those settings up from 
comments in the file, though you have to train your fingers to use 
it effectively too... but if you were maintaining something long 
term, you'd just do that.


Re: Updating D beyond Unicode 2.0

2018-09-24 Thread Adam D. Ruppe via Digitalmars-d
On Monday, 24 September 2018 at 13:26:14 UTC, Steven 
Schveighoffer wrote:
Part of the reason, which I haven't read here yet, is that all 
the keywords are in English.


Eh, those are kinda opaque sequences anyway, since the meanings 
aren't quite what the normal dictionary definition is anyway. 
Look up "int" in the dictionary... or "void", or even "string". 
They are just a handful of magic sequences we learn with the 
programming language. (And in languages like Rust, "fn", lol.)


One group which I believe hasn't spoken up yet is the group 
making the hunt framework, whom I believe are all Chinese? At 
least their web site is.


I know they used a lot of my code as a starting point, and I, of 
course, wrote it in English, so that could have biased it a bit 
too. Though that might be a general point: when you want to use 
existing libraries, you end up following whatever language they 
are written in.


Just even so, I still find it kinda hard to believe that 
everybody everywhere uses only English in all their code. Maybe 
our efforts should be going toward the Chinese market via natural 
language support instead of competing with Rust on computer 
language features :P


It would be good to hear from a group like that which has large 
experience writing mature D code (it appears all to be in 
English) and how they feel about the support.


definitely.


Re: Updating D beyond Unicode 2.0

2018-09-24 Thread Steven Schveighoffer via Digitalmars-d

On 9/22/18 12:56 PM, Neia Neutuladh wrote:

On Saturday, 22 September 2018 at 12:35:27 UTC, Steven Schveighoffer wrote:
But aren't we arguing about the wrong thing here? D already accepts 
non-ASCII identifiers.


Walter was doing that thing that people in the US who only speak English 
tend to do: forgetting that other people speak other languages, and that 
people who speak English can learn other languages to work with people 
who don't speak English.


I don't think he was doing that. I think what he was saying was, D tried 
to accommodate users who don't normally speak English, and they still 
use English (for the most part) for coding.


I'm actually surprised there isn't much code out there that is written 
with other identifiers besides ASCII, given that C99 supported them. I 
assumed it was because they weren't supported. Now I learn that they are 
supported, yet almost all C code I've ever seen is written in English. 
Perhaps that's just because I don't frequent foreign language sites 
though :) But many people here speak English as a second language, and 
vouch for their cultures still using English to write code.


He was saying it's inevitably a mistake to use 
non-ASCII characters in identifiers and that nobody does use them in 
practice.


I would expect people probably do try to use them in practice, it's just 
that the problems they run into aren't worth the effort 
(tool/environment support). But I have no first or even second hand 
experience with this. It does seem like Walter has a lot of experience 
with it though.


Walter talking like that sounds like he'd like to remove support for 
non-ASCII identifiers from the language. I've gotten by without 
maintaining a set of personal patches on top of DMD so far, and I'd like 
it if I didn't have to start.


I don't think he was saying that. I think he was against expanding 
support for further Unicode identifiers because the first effort did 
not produce any measurable benefit. I'd be shocked, given the recent 
positions of Walter and Andrei, if they decided to remove non-ASCII 
identifiers that are currently supported, thereby breaking any existing 
code.


What languages need an upgrade to unicode symbol names? In other 
words, what symbols aren't possible with the current support?


Chinese and Japanese have gained about eleven thousand symbols since 
Unicode 2.


Unicode 2 covers 25 writing systems, while Unicode 11 covers 146. Just 
updating to Unicode 3 would give us Cherokee, Ge'ez (multiple 
languages), Khmer (Cambodian), Mongolian, Burmese, Sinhala (Sri Lanka), 
Thaana (Maldivian), Canadian aboriginal syllabics, and Yi (Nuosu).


Very interesting! I would agree that we should at least add support for 
Unicode symbols that are used in spoken languages, especially since we 
already have support for symbols that aren't ASCII. I don't see 
the downside, especially if you can already use Unicode 2.0 symbols for 
identifiers (the ship has already sailed).
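The scripts listed above are all queryable through the Unicode character database. A quick check with Python's stdlib unicodedata module (whose tables track whatever Unicode version the interpreter shipped with) confirms they exist well past Unicode 2:

```python
import unicodedata

# Each of these characters belongs to a script added after Unicode 2.0;
# Cherokee, Sinhala, and Khmer all arrived in Unicode 3.0.
post_unicode2 = {
    "\u13A0": "CHEROKEE LETTER A",
    "\u0D85": "SINHALA LETTER AYANNA",
    "\u1780": "KHMER LETTER KA",
}
for ch, expected_name in post_unicode2.items():
    assert unicodedata.name(ch) == expected_name

# The Unicode database version this interpreter ships with,
# e.g. "13.0.0" or later on modern Pythons.
version = unicodedata.unidata_version
assert tuple(int(x) for x in version.split(".")) >= (3, 0, 0)
```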


It could be a good incentive to get kids in countries where English 
isn't commonly spoken to try D out as a first programming language ;) 
Using your native language to show example code could be a huge benefit 
for teaching coding.


My recommendation is to put the PR up for review (that you said you had 
ready) and see what happens. Having an actual patch to talk about could 
change minds. At the very least, it's worth not wasting your efforts 
that you have already spent. Even if it does need a DIP, the PR can show 
that one less piece of effort is needed to get it implemented.


-Steve


Re: Updating D beyond Unicode 2.0

2018-09-24 Thread Steven Schveighoffer via Digitalmars-d

On 9/24/18 12:23 AM, Neia Neutuladh wrote:

On Monday, 24 September 2018 at 01:39:43 UTC, Walter Bright wrote:

On 9/23/2018 3:23 PM, Neia Neutuladh wrote:
Okay, that's why you previously selected C99 as the standard for what 
characters to allow. Do you want to update to match C11? It's been 
out for the better part of a decade, after all.


I wasn't aware it changed in C11.


http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf page 522 (PDF 
numbering) or 504 (internal numbering).


Outside the BMP, almost everything is allowed, including many things 
that are not currently mapped to any Unicode value. Within the BMP, a 
heck of a lot of stuff is allowed, including a lot that D doesn't 
currently allow.


GCC hasn't even updated to the C99 standard here, as far as I can tell, 
but clang-5.0 is up to date.


I searched around for the current state of symbol names in C, and found 
some really crappy rules, though maybe this site isn't up to date?:


https://en.cppreference.com/w/c/language/identifier

What I understand from that is:

1. Yes, you can use any unicode character you want in C/C++ (seemingly 
since C99)
2. There are no rules about what *encoding* is acceptable, it's 
implementation defined. So various compilers have different rules as to 
what will be accepted in the actual source code. In fact, I read 
somewhere that not even ASCII is guaranteed to be supported.


The result being that you have to write the identifiers with an ASCII 
escape sequence in order for them to be actually portable. Which, to me, 
completely defeats the purpose of using such identifiers in the first place.


For example, on that page, they have a line that works in clang, not in 
GCC (tagged as implementation defined):


char * = "cat";

The portable version looks like this:

char *\U0001f431 = "cat";

Seriously, who wants to use that?

Now, D can potentially do better (especially when all front-ends are the 
same) and support such things in the spec, but I think the argument 
"because C supports it" is kind of bunk.


Or am I reading it wrong?

In any case, I would expect that symbol name support should be focused 
only on languages which people use, not emojis. If there are words in 
Chinese or Japanese that can't be expressed using D, while other words 
can, it would seem inconsistent to a Chinese or Japanese speaking user, 
and I think we should work to fix that. I just have no idea what the 
state of that is.
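Python's identifier rules (based on Unicode's XID_Start/XID_Continue properties) are one existing data point for how a "languages yes, emojis no" line can be drawn; a small sketch:

```python
# Letters from living scripts are valid identifier characters...
assert "π".isidentifier()           # Greek
assert "猫".isidentifier()           # Chinese
assert "seinäjoki".isidentifier()   # Finnish city name from earlier in the thread

# ...but emoji are not: U+1F431 CAT FACE has neither XID_Start
# nor XID_Continue, so a cat-emoji identifier is rejected.
assert not "\U0001F431".isidentifier()
```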


I also tend to agree that most code is going to be written in English, 
even when the primary language of the user is not. Part of the reason, 
which I haven't read here yet, is that all the keywords are in English. 
Someone has to kind of understand those to get the meaning of some 
constructs, and it's going to read strangely with the non-english words.


One group which I believe hasn't spoken up yet is the group making the 
hunt framework, whom I believe are all Chinese? At least their web site 
is. It would be good to hear from a group like that which has large 
experience writing mature D code (it appears all to be in English) and 
how they feel about the support.


-Steve


Re: Updating D beyond Unicode 2.0

2018-09-24 Thread Steven Schveighoffer via Digitalmars-d

On 9/22/18 8:58 AM, Jonathan M Davis wrote:

On Saturday, September 22, 2018 6:37:09 AM MDT Steven Schveighoffer via
Digitalmars-d wrote:

On 9/22/18 4:52 AM, Jonathan M Davis wrote:

I was laughing out loud when reading about composing "family"
emojis with zero-width joiners. If you told me that was a tech
parody, I'd have believed it.


Honestly, I was horrified to find out that emojis were even in Unicode.
It makes no sense whatsover. Emojis are supposed to be sequences of
characters that can be interepreted as images. Treating them like
Unicode symbols is like treating entire words like Unicode symbols.
It's just plain stupid and a clear sign that Unicode has gone
completely off the rails (if it was ever on them). Unfortunately, it's
the best tool that we have for the job.

But aren't some (many?) Chinese/Japanese characters representing whole
words?


It's true that they're not characters in the sense that Roman characters are
characters, but they're still part of the alphabets for those languages.
Emojis are specifically formed from sequences of characters - e.g. :) is two
characters which are already expressible on their own. They're meant to
represent a smiley face, but it's a sequence of characters already. There's
no need whatsoever to represent anything extra in Unicode. It's already enough
of a disaster that there are multiple ways to represent the same character
in Unicode without nonsense like emojis. It's stuff like this that really
makes me wish that we could come up with a new standard that would replace
Unicode, but that's likely a pipe dream at this point.


But there are tons of emojis that have nothing to do with sequences of 
characters. Like houses, or planes, or whatever. I don't even know what 
the sequences of characters are for them.


I think it started out like that, but turned into something else.

Either way, I can't imagine any benefit from using emojis in symbol names.
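Both complaints above - "family" emoji composed with zero-width joiners, and multiple encodings of the same character - are easy to demonstrate; a sketch using only the stdlib:

```python
import unicodedata

# A "family" emoji is man + woman + girl glued with ZERO WIDTH JOINERs:
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
assert len(family) == 5  # five code points render as a single glyph

# The same visible character, two different code-point sequences:
precomposed = "\u00E9"   # é as a single code point
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT
assert precomposed != decomposed
# Normalization (NFC) is needed to make them compare equal.
assert unicodedata.normalize("NFC", decomposed) == precomposed
```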

-Steve


Re: Updating D beyond Unicode 2.0

2018-09-24 Thread Dennis via Digitalmars-d
On Monday, 24 September 2018 at 10:36:50 UTC, Jonathan M Davis 
wrote:
Given that the typical keyboard has none of those characters, 
maintaining code that used any of them would be a royal pain.


Note that I'm not trying to argue either way, it's just that I 
used to think of Walter's stance on D and Unicode as:
"D would fully embrace Unicode if only editors/debuggers etc. 
would embrace it too"


But now I read:

D supports Unicode in identifiers because C and C++ do, and we 
want to be able to interoperate with them."


So I wonder what changed. I guess it's mostly answered in the 
first reply:


When I originally started with D, I thought non-ASCII 
identifiers with Unicode was a good idea. I've since slowly 
become less and less enthusiastic about it.




Re: Updating D beyond Unicode 2.0

2018-09-24 Thread Jonathan M Davis via Digitalmars-d
On Monday, September 24, 2018 4:19:31 AM MDT Dennis via Digitalmars-d wrote:
> On Monday, 24 September 2018 at 01:32:38 UTC, Walter Bright wrote:
> > D the language is well suited to the development of Unicode
> > apps. D source code is another matter.
>
> But in the article you specifically talk about the use of Unicode
> in the context of source code instead of apps:
>
> "With the D programming language, we continuously run up against
> the problem that ASCII has reached its expressivity limits."
>
> "There are the chevrons « and » which serve as another set of
> brackets to lighten the overburdened ambiguities of ( ). There
> are the dot-product and cross-product characters · and × which
> would make lovely infix operator tokens for math libraries. The
> greek letters would be great for math variable names."

Given that the typical keyboard has none of those characters, maintaining
code that used any of them would be a royal pain. It's one thing if they're
used in the occasional string as data, but it's quite another if they're
used as identifiers or operators. I don't see how that would be at all
maintainable. You'd be forced to constantly copy and paste rather than type.

- Jonathan M Davis






Re: Updating D beyond Unicode 2.0

2018-09-24 Thread Dennis via Digitalmars-d

On Monday, 24 September 2018 at 01:32:38 UTC, Walter Bright wrote:
D the language is well suited to the development of Unicode 
apps. D source code is another matter.


But in the article you specifically talk about the use of Unicode 
in the context of source code instead of apps:


"With the D programming language, we continuously run up against 
the problem that ASCII has reached its expressivity limits."


"There are the chevrons « and » which serve as another set of 
brackets to lighten the overburdened ambiguities of ( ). There 
are the dot-product and cross-product characters · and × which 
would make lovely infix operator tokens for math libraries. The 
greek letters would be great for math variable names."


Re: Updating D beyond Unicode 2.0

2018-09-23 Thread Neia Neutuladh via Digitalmars-d

On Monday, 24 September 2018 at 01:39:43 UTC, Walter Bright wrote:

On 9/23/2018 3:23 PM, Neia Neutuladh wrote:
Okay, that's why you previously selected C99 as the standard 
for what characters to allow. Do you want to update to match 
C11? It's been out for the better part of a decade, after all.


I wasn't aware it changed in C11.


http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf page 
522 (PDF numbering) or 504 (internal numbering).


Outside the BMP, almost everything is allowed, including many 
things that are not currently mapped to any Unicode value. Within 
the BMP, a heck of a lot of stuff is allowed, including a lot 
that D doesn't currently allow.


GCC hasn't even updated to the C99 standard here, as far as I can 
tell, but clang-5.0 is up to date.


Re: Updating D beyond Unicode 2.0

2018-09-23 Thread Shachar Shemesh via Digitalmars-d

On 23/09/18 15:38, sarn wrote:

On Sunday, 23 September 2018 at 06:53:21 UTC, Shachar Shemesh wrote:

On 23/09/18 04:29, sarn wrote:

You can find a lot more Japanese D code on this blogging platform:
https://qiita.com/tags/dlang

Here's the most recent post to save you a click:
https://qiita.com/ShigekiKarita/items/9b3aa8f716848278ef62


Comments in Japanese. Identifiers in English. Not advancing your 
point, I think.


Shachar


Well, I knew that when I posted, so I honestly have no idea what point 
you assumed I was making.


I don't know what point you were trying to make. That's precisely why I 
posted.


I don't think D currently or ever enforces what type of (legal UTF-8) 
text you could use in comments or strings. This thread is about what's 
legal to use in identifiers.


The example you brought does not use Unicode in identifiers and is, 
therefore, irrelevant to the discussion we're having.


That was the point *I* was trying to make.

Shachar


Re: Updating D beyond Unicode 2.0

2018-09-23 Thread Walter Bright via Digitalmars-d

On 9/23/2018 3:23 PM, Neia Neutuladh wrote:
Okay, that's why you previously selected C99 as the standard for what characters 
to allow. Do you want to update to match C11? It's been out for the better part 
of a decade, after all.


I wasn't aware it changed in C11.


Re: Updating D beyond Unicode 2.0

2018-09-23 Thread Walter Bright via Digitalmars-d

On 9/23/2018 6:06 PM, Dennis wrote:

Have you changed your mind since?


D the language is well suited to the development of Unicode apps. D source code 
is another matter.


Re: Updating D beyond Unicode 2.0

2018-09-23 Thread Dennis via Digitalmars-d

On Sunday, 23 September 2018 at 21:12:13 UTC, Walter Bright wrote:
D supports Unicode in identifiers because C and C++ do, and we 
want to be able to interoperate with them. Extending Unicode 
identifier support off into other directions, especially ones 
that break such interoperability, is just doing a disservice to 
users.


I always thought D supported Unicode with the goal of going 
forward with it while C was stuck with ASCII:

http://www.drdobbs.com/cpp/time-for-unicode/228700405

"The D programming language has already driven stakes in the 
ground, saying it will not support 16 bit processors, processors 
that don't have 8 bit bytes, and processors with crippled, 
non-IEEE floating point. Is it time to drive another stake in and 
say the time for Unicode has come? "


Have you changed your mind since?


Re: Updating D beyond Unicode 2.0

2018-09-23 Thread Neia Neutuladh via Digitalmars-d

On Sunday, 23 September 2018 at 21:12:13 UTC, Walter Bright wrote:
D supports Unicode in identifiers because C and C++ do, and we 
want to be able to interoperate with them. Extending Unicode 
identifier support off into other directions, especially ones 
that break such interoperability, is just doing a disservice to 
users.


Okay, that's why you previously selected C99 as the standard for 
what characters to allow. Do you want to update to match C11? 
It's been out for the better part of a decade, after all.


Re: Updating D beyond Unicode 2.0

2018-09-23 Thread Walter Bright via Digitalmars-d

On 9/22/2018 6:01 PM, Jonathan M Davis wrote:

For better or worse, English is the international language of science and
engineering, and that includes programming.
In the earlier days of D, I put on the web pages a google widget that would 
automatically translate the page into any language google supported. This was 
eventually removed (not by me) because nobody wanted it.


Nobody (besides me) even noticed it was removed. And the D community is a very 
international one.


Supporting Unicode in identifiers gives users a false sense that it's a good 
idea to use them. Lots of programming tools don't work well with Unicode. Even 
Windows doesn't by default - you've got to run "chcp 65001" each time you open a 
console window. Filesystems don't work reliably with Unicode. Heck, the reason 
module names should be lower case in D is because mixed case doesn't work 
reliably across filesystems.


D supports Unicode in identifiers because C and C++ do, and we want to be able 
to interoperate with them. Extending Unicode identifier support off into other 
directions, especially ones that break such interoperability, is just doing a 
disservice to users.


Re: Updating D beyond Unicode 2.0

2018-09-23 Thread Walter Bright via Digitalmars-d

On 9/23/2018 9:52 AM, aliak wrote:
Not seeing identifiers in languages you don't program in or can't read is 
expected.


On the other hand, I've been programming for 40 years. I've customized my C++ 
compiler to emit error messages in various languages:


https://github.com/DigitalMars/Compiler/blob/master/dm/src/dmc/msgsx.c

I've implemented SHIFT-JIS encodings, along with .950 (Chinese) and .949 
(Korean) code pages in the C++ compiler.


I've worked in Japan writing software for Japanese companies.

I've sold compilers internationally for 30 years (mostly to Germany and Japan). 
I did the tech support, meaning I'd see their code.


---

There's a reason why dmd doesn't have international error messages. My 
experience with it is that international users don't want it. They prefer the 
english messages.


I'm sure if you look hard enough you'll find someone using non-ASCII characters 
in identifiers.


---

When I visited Remedy Games in Finland a few years back, I was surprised that 
everyone in the company was talking in english. I asked if they were doing that 
out of courtesy to me. They laughed, and said no, they talked in English because 
they came from all over the world, and english was the only language they had in 
common.


Re: Updating D beyond Unicode 2.0

2018-09-23 Thread Abdulhaq via Digitalmars-d
On Saturday, 22 September 2018 at 08:52:32 UTC, Jonathan M Davis 
wrote:




Honestly, I was horrified to find out that emojis were even in 
Unicode. It makes no sense whatsover. Emojis are supposed to be 
sequences of characters that can be interepreted as images. 
Treating them like Unicode symbols is like treating entire 
words like Unicode symbols. It's just plain stupid and a clear 
sign that Unicode has gone completely off the rails (if it was 
ever on them). Unfortunately, it's the best tool that we have 
for the job.


According to the Unicode website, 
http://unicode.org/standard/WhatIsUnicode.html,


"""
Support of Unicode forms the foundation for the representation of 
languages and symbols in all major operating systems, search 
engines, browsers, laptops, and smart phones—plus the Internet 
and World Wide Web (URLs, HTML, XML, CSS, JSON, etc.)"""


Note, unicode supports symbols, not just characters.

The smiley face symbol predates its ':-)' usage in ascii text, 
https://www.smithsonianmag.com/arts-culture/who-really-invented-the-smiley-face-2058483/. It's fundamentally a symbol, not a sequence of characters. Therefore it is not unreasonable for it to be encoded with a unicode number. I do agree though, of course, that it would seem bizarre to use an emoji as a D identifier.


The early history of computer science is completely dominated by 
cultures who use Latin-script based characters, and hence, quite 
reasonably, text encoding and its automated visual representation 
by computer-based devices is dominated by the requirements of 
Latin-script languages. However, the world keeps turning and, 
despite DT's best efforts, China et al. look to become dominant. 
Even if not China, the chances are that eventually a non-Latin 
script based language will become very important. Parochial views 
like "all open source code should be in ASCII" will look silly.


However, until that time D developers have to spend their time 
where it can be most useful. Hence the decision of whether to 
apply Neia's patch / ideas mainly depends on how much the 
downstream effort will be (debuggers etc., as Walter pointed 
out), and how much the gain is. As Unicode 2.0 is already 
supported, I would guess that the vast majority of people with 
access to a computer can already enter identifiers in D that are 
rich enough for them. As Adam said though, it would be a good 
idea to at least ask!








Re: Updating D beyond Unicode 2.0

2018-09-23 Thread aliak via Digitalmars-d
On Saturday, 22 September 2018 at 19:59:42 UTC, Erik van Velzen 
wrote:
If there was a contingent of Japanese or Chinese users doing 
that then surely they would speak up here or in Bugzilla to 
advocate for this feature?


https://forum.dlang.org/post/piwvbtetcwyxlaloc...@forum.dlang.org






Re: Updating D beyond Unicode 2.0

2018-09-23 Thread aliak via Digitalmars-d

On Friday, 21 September 2018 at 20:25:54 UTC, Walter Bright wrote:
When I originally started with D, I thought non-ASCII 
identifiers with Unicode was a good idea. I've since slowly 
become less and less enthusiastic about it.


First off, D source text simply must (and does) fully support 
Unicode in comments, characters, and string literals. That's 
not an issue.


But identifiers? I haven't seen hardly any use of non-ascii 
identifiers in C, C++, or D. In fact, I've seen zero use of it 
outside of test cases. I don't see much point in expanding the 
support of it. If people use such identifiers, the result would 
most likely be annoyance rather than illumination when people 
who don't know that language have to work on the code.


Not seeing identifiers in languages you don't program in or can 
read in is expected.


If it's supported it will be used:

Japanese Swift: 
https://speakerdeck.com/codelynx/programming-swift-in-japanese




Extending it further will also cause problems for all the tools 
that work with D object code, like debuggers, disassemblers, 
linkers, filesystems, etc.


Absent a much more compelling rationale for it, I'd say no.


More compelling than: "there're 6 billion people in this world 
who don't speak english?"


Allowing people to program in their own language while reducing 
the cognitive friction for people who want to learn programming 
in the majority of the world seems like a no-brainer thing to do.




Re: Updating D beyond Unicode 2.0

2018-09-23 Thread sarn via Digitalmars-d
On Sunday, 23 September 2018 at 06:53:21 UTC, Shachar Shemesh 
wrote:

On 23/09/18 04:29, sarn wrote:
You can find a lot more Japanese D code on this blogging 
platform:

https://qiita.com/tags/dlang

Here's the most recent post to save you a click:
https://qiita.com/ShigekiKarita/items/9b3aa8f716848278ef62


Comments in Japanese. Identifiers in English. Not advancing 
your point, I think.


Shachar


Well, I knew that when I posted, so I honestly have no idea what 
point you assumed I was making.


Re: Updating D beyond Unicode 2.0

2018-09-23 Thread Kagamin via Digitalmars-d

On Sunday, 23 September 2018 at 11:18:42 UTC, Ali Çehreli wrote:

Hence, non-Unicode is unacceptable in Turkish code


You even contributed to 
http://code.google.com/p/trileri/source/browse/trunk/tr/yazi.d


Re: Updating D beyond Unicode 2.0

2018-09-23 Thread Kagamin via Digitalmars-d

On Friday, 21 September 2018 at 23:17:42 UTC, Seb wrote:

A: Wait. Using emojis as identifiers is not a good idea?
B: Yes.
A: But the cool kids are doing it:

https://codepen.io/andresgalante/pen/jbGqXj


It's not like we have a lot of good fonts (I know of only one), and 
even fewer of them are suitable for code, and they can't 
realistically be expected to cover everything; monospace fonts are 
often even ASCII-only.


Re: Updating D beyond Unicode 2.0

2018-09-23 Thread Ali Çehreli via Digitalmars-d

On 09/22/2018 09:27 AM, Neia Neutuladh wrote:

> Logographic writing systems. There is one logographic writing system
> still in common use, and it's the standard writing system for Chinese
> and Japanese.

I had the misconception of each Chinese character meaning a word until I 
read "The Chinese Language, Fact and Fantasy" by John DeFrancis. One 
thing I learned was that Chinese is not purely logographic.


Ali



Re: Updating D beyond Unicode 2.0

2018-09-23 Thread Ali Çehreli via Digitalmars-d

On 09/21/2018 04:18 PM, Adam D. Ruppe wrote:

> Well, for example, with a Chinese company, they may very well find
> forced English identifiers to be an annoyance.

Fully agreed, but as far as I know, Turkish companies use English in 
source code.


The Turkish alphabet is Latin-based, but dotted and undotted versions of 
some letters are distinct and produce different meanings. Quick examples:


sık: dense (n), squeeze (v), ...
sik: penis (n), f*ck (v) [1]
şık: one of multiple choices (1), swanky (2)
döndür: return
dondur: make frozen
sök: disassemble, dismantle, ...
sok: insert, install, ...
şok: shock

Hence, non-Unicode is unacceptable in Turkish code unless we reserve 
programming for English speakers only, which is unacceptable because it 
would be exclusionary and would produce English identifiers that are 
frequently amusing. I've seen the latter in the code of English learners. :)


Ali

[1] 
https://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail
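The dotted/undotted distinction above is easy to verify mechanically. A small sketch in Python, used here only because it already implements the UAX #31 identifier rules under discussion; the identifiers themselves are taken from the Turkish examples above:

```python
import unicodedata

# Dotless ı (U+0131) and dotted i (U+0069) are distinct code points,
# so "sık" and "sik" can never collide as identifiers.
assert "sık" != "sik"
assert unicodedata.name("ı") == "LATIN SMALL LETTER DOTLESS I"

# Python follows UAX #31, so the Turkish pair is legal and distinct:
def döndür(x):   # "return"
    return x

def dondur(x):   # "make frozen"
    return None

assert döndür(42) == 42
assert dondur(42) is None
```

Any language whose identifier rules track UAX #31 gets the same behavior for free.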




Re: Updating D beyond Unicode 2.0

2018-09-23 Thread Shachar Shemesh via Digitalmars-d

On 23/09/18 04:29, sarn wrote:

On Sunday, 23 September 2018 at 00:18:06 UTC, Adam D. Ruppe wrote:
I have seen Japanese D code before on twitter, but cannot find it now 
(surely because the search engines also share this bias).


You can find a lot more Japanese D code on this blogging platform:
https://qiita.com/tags/dlang

Here's the most recent post to save you a click:
https://qiita.com/ShigekiKarita/items/9b3aa8f716848278ef62


Comments in Japanese. Identifiers in English. Not advancing your point, 
I think.


Shachar


Re: Updating D beyond Unicode 2.0

2018-09-22 Thread sarn via Digitalmars-d

On Sunday, 23 September 2018 at 00:18:06 UTC, Adam D. Ruppe wrote:
I have seen Japanese D code before on twitter, but cannot find 
it now (surely because the search engines also share this bias).


You can find a lot more Japanese D code on this blogging platform:
https://qiita.com/tags/dlang

Here's the most recent post to save you a click:
https://qiita.com/ShigekiKarita/items/9b3aa8f716848278ef62


Re: Updating D beyond Unicode 2.0

2018-09-22 Thread sarn via Digitalmars-d
On Saturday, 22 September 2018 at 12:37:09 UTC, Steven 
Schveighoffer wrote:
But aren't some (many?) Chinese/Japanese characters 
representing whole words?


-Steve


Kind of hair-splitting, but it's more accurate to say that some 
Chinese/Japanese words can be written with one character.  Like 
how English speakers wouldn't normally say that "A" and "I" are 
characters representing whole words.


Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Jonathan M Davis via Digitalmars-d
On Saturday, September 22, 2018 10:07:38 AM MDT Neia Neutuladh via 
Digitalmars-d wrote:
> On Saturday, 22 September 2018 at 08:52:32 UTC, Jonathan M Davis
>
> wrote:
> > Unicode identifiers may make sense in a code base that is going
> > to be used solely by a group of developers who speak a
> > particular language that uses a number of non-ASCII
> > characters (especially languages like Chinese or Japanese), but
> > it has no business in any code that's intended for
> > international use. It just causes problems.
>
> You have a problem when you need to share a codebase between two
> organizations using different languages. "Just use ASCII" is not
> the solution. "Use a language that most developers in both
> organizations can use" is. That's *usually* going to be English,
> but not always. For instance, a Belorussian company doing
> outsourcing work for a Russian company might reasonably write
> code in Russian.
>
> If you're writing for a global audience, as most open source code
> is, you're usually going to use the most widely spoken language.

My point is that if your code base is definitely only going to be used
within a group of people who are using a keyboard that supports a Unicode
character that you want to use, then it's not necessarily a problem to use
it, but if you're writing code that may be seen or used by a general
audience (especially if it's going to be open source), then it needs to be
in ASCII, or it's a serious problem. Even if it's a character like lambda
that most everyone is going to understand, many, many programmers are not
going to be able type it on their keyboards, and that's going to cause
nothing but problems.

For better or worse, English is the international language of science and
engineering, and that includes programming. So, any programs that are
intended to be seen and used by the world at large need to be in ASCII. And
the biggest practical issue with that is whether a character is even on a
typical keyboard. Using a Unicode character in a program makes it so that
many programmers cannot type it. And even given the large breadth of Unicode
characters, you could even have a keyboard that supports a number of Unicode
characters and still not have the Unicode character in question. So, open
source programs need to be in ASCII.

Now, I don't know that it's a problem to support a wide range of Unicode
characters in identifiers when you consider the issues of folks whose native
language is not English (especially when it's a language like Chinese or
Japanese), but open source programs should only be using ASCII identifiers.
And unfortunately, sometimes, the fact that a language supports Unicode
identifiers has led English speakers to do stupid things like use the
lambda character in identifiers. So, I can understand Walter's reticence to
go further with supporting Unicode identifiers, but on the other hand, when
you consider how many people there are on the planet who use a language that
doesn't even use the latin alphabet, it's arguably a good idea to fully
support Unicode identifiers.

- Jonathan M Davis





Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Adam D. Ruppe via Digitalmars-d
On Saturday, 22 September 2018 at 19:59:42 UTC, Erik van Velzen 
wrote:
Nobody in this thread so far has said they are programming in 
non-ASCII.


This is the obvious observation bias I alluded to before: of 
course people who don't read and write English aren't in this 
thread, since they cannot read or write the English used in this 
thread! Ditto for bugzilla.


Absence of evidence CAN be evidence of absence... but not when 
the absence is so easily explained by our shared bias.


Neia Neutuladh posted one link. I have seen Japanese D code 
before on twitter, but cannot find it now (surely because the 
search engines also share this bias). Perhaps those are the only 
two examples in existence, but I stand by my belief that we must 
reach out to these other communities somehow and do a proper, 
proactive study before dismissing the possibility.


Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Neia Neutuladh via Digitalmars-d
On Saturday, 22 September 2018 at 19:59:42 UTC, Erik van Velzen 
wrote:
Nobody in this thread so far has said they are programming in 
non-ASCII.


I did. https://git.ikeran.org/dhasenan/muzikilo


Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Erik van Velzen via Digitalmars-d
On Saturday, 22 September 2018 at 16:56:10 UTC, Neia Neutuladh 
wrote:


Walter was doing that thing that people in the US who only 
speak English tend to do: forgetting that other people speak 
other languages, and that people who speak English can learn 
other languages to work with people who don't speak English. He 
was saying it's inevitably a mistake to use non-ASCII 
characters in identifiers and that nobody does use them in 
practice.




There's a more charitable view and that's that even furriners 
usually use English identifiers.


Nobody in this thread so far has said they are programming in 
non-ASCII.


If there was a contingent of Japanese or Chinese users doing that 
then surely they would speak up here or in Bugzilla to advocate 
for this feature?


Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Neia Neutuladh via Digitalmars-d
On Saturday, 22 September 2018 at 12:35:27 UTC, Steven 
Schveighoffer wrote:
But aren't we arguing about the wrong thing here? D already 
accepts non-ASCII identifiers.


Walter was doing that thing that people in the US who only speak 
English tend to do: forgetting that other people speak other 
languages, and that people who speak English can learn other 
languages to work with people who don't speak English. He was 
saying it's inevitably a mistake to use non-ASCII characters in 
identifiers and that nobody does use them in practice.


Walter talking like that sounds like he'd like to remove support 
for non-ASCII identifiers from the language. I've gotten by 
without maintaining a set of personal patches on top of DMD so 
far, and I'd like it if I didn't have to start.


What languages need an upgrade to unicode symbol names? In 
other words, what symbols aren't possible with the current 
support?


Chinese and Japanese have gained about eleven thousand symbols 
since Unicode 2.


Unicode 2 covers 25 writing systems, while Unicode 11 covers 146. 
Just updating to Unicode 3 would give us Cherokee, Ge'ez 
(multiple languages), Khmer (Cambodian), Mongolian, Burmese, 
Sinhala (Sri Lanka), Thaana (Maldivian), Canadian aboriginal 
syllabics, and Yi (Nuosu).
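The gap is easy to demonstrate with a script added in Unicode 3.0. A quick Python check (chosen only as a convenient, testable reference for UAX #31 identifier rules; the code point is from the standard):

```python
import unicodedata

# KHMER LETTER KA (U+1780) entered Unicode in 3.0, after the
# Unicode 2.0-era tables that D's identifier rules were built from.
ka = "\u1780"
assert unicodedata.name(ka) == "KHMER LETTER KA"
assert unicodedata.category(ka) == "Lo"   # Letter, other

# Under UAX #31 it is a valid identifier character:
assert ka.isidentifier()
```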


Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Neia Neutuladh via Digitalmars-d
On Saturday, 22 September 2018 at 12:24:49 UTC, Shachar Shemesh 
wrote:
If memory serves me right, hieroglyphs actually represent 
consonants (vowels are implicit), and as such, are most 
definitely "characters".


Egyptian hieroglyphics uses logographs (symbols representing 
whole words, which might be multiple syllables), letters, and 
determinatives (which don't represent any word but disambiguate the 
surrounding words).


Looking things up serves me better than memory, usually.

The only language I can think of, off the top of my head, where 
words have distinct signs is sign language.


Logographic writing systems. There is one logographic writing 
system still in common use, and it's the standard writing system 
for Chinese and Japanese. That's about 1.4 billion people. It was 
used in Korea until hangul became popularized.


Unicode also aims to support writing systems that aren't used 
anymore. That means Mayan, cuneiform (several variants), Egyptian 
hieroglyphics and demotic script, several extinct variants on the 
Chinese writing system, and Luwian.


Sign languages generally don't have writing systems. They're also 
not generally related to any ambient spoken languages (for 
instance, American Sign Language is derived from French Sign 
Language), so if you speak sign language and can write, you're 
bilingual. Anyway, without writing systems, sign languages are 
irrelevant to Unicode.


Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Neia Neutuladh via Digitalmars-d
On Saturday, 22 September 2018 at 08:52:32 UTC, Jonathan M Davis 
wrote:
Unicode identifiers may make sense in a code base that is going 
to be used solely by a group of developers who speak a 
particular language that uses a number of non-ASCII 
characters (especially languages like Chinese or Japanese), but 
it has no business in any code that's intended for 
international use. It just causes problems.


You have a problem when you need to share a codebase between two 
organizations using different languages. "Just use ASCII" is not 
the solution. "Use a language that most developers in both 
organizations can use" is. That's *usually* going to be English, 
but not always. For instance, a Belorussian company doing 
outsourcing work for a Russian company might reasonably write 
code in Russian.


If you're writing for a global audience, as most open source code 
is, you're usually going to use the most widely spoken language.


Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Jonathan M Davis via Digitalmars-d
On Saturday, September 22, 2018 6:37:09 AM MDT Steven Schveighoffer via 
Digitalmars-d wrote:
> On 9/22/18 4:52 AM, Jonathan M Davis wrote:
> >> I was laughing out loud when reading about composing "family"
> >> emojis with zero-width joiners. If you told me that was a tech
> >> parody, I'd have believed it.
> >
> > Honestly, I was horrified to find out that emojis were even in Unicode.
> > It makes no sense whatsoever. Emojis are supposed to be sequences of
> > characters that can be interpreted as images. Treating them like
> > Unicode symbols is like treating entire words like Unicode symbols.
> > It's just plain stupid and a clear sign that Unicode has gone
> > completely off the rails (if it was ever on them). Unfortunately, it's
> > the best tool that we have for the job.
> But aren't some (many?) Chinese/Japanese characters representing whole
> words?

It's true that they're not characters in the sense that Roman characters are
characters, but they're still part of the writing systems for those languages.
Emojis are specifically formed from sequences of characters - e.g. :) is two
characters which are already expressible on their own. They're meant to
represent a smiley face, but it's a sequence of characters already. There's
no need whatsoever to represent anything extra in Unicode. It's already enough
of a disaster that there are multiple ways to represent the same character
in Unicode without nonsense like emojis. It's stuff like this that really
makes me wish that we could come up with a new standard to replace
Unicode, but that's likely a pipe dream at this point.

- Jonathan M Davis
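The "multiple ways to represent the same character" point is concrete: precomposed and decomposed forms compare unequal until normalized. A minimal Python illustration (`unicodedata` is simply one readily testable implementation of the standard's normalization forms):

```python
import unicodedata

# "é" as one code point (U+00E9) vs. "e" plus COMBINING ACUTE ACCENT (U+0301).
precomposed = "\u00e9"
decomposed = "e\u0301"

# Visually identical, but the raw code point sequences differ:
assert precomposed != decomposed

# NFC/NFD normalization maps each form onto the other:
assert unicodedata.normalize("NFC", decomposed) == precomposed
assert unicodedata.normalize("NFD", precomposed) == decomposed
```

This is exactly why identifier-aware tools must normalize before comparing, or two visually identical names can refer to different symbols.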





Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Steven Schveighoffer via Digitalmars-d

On 9/22/18 4:52 AM, Jonathan M Davis wrote:

I was laughing out loud when reading about composing "family"
emojis with zero-width joiners. If you told me that was a tech
parody, I'd have believed it.


Honestly, I was horrified to find out that emojis were even in Unicode. It
makes no sense whatsoever. Emojis are supposed to be sequences of characters
that can be interpreted as images. Treating them like Unicode symbols is
like treating entire words like Unicode symbols. It's just plain stupid and
a clear sign that Unicode has gone completely off the rails (if it was ever
on them). Unfortunately, it's the best tool that we have for the job.


But aren't some (many?) Chinese/Japanese characters representing whole 
words?


-Steve


Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Steven Schveighoffer via Digitalmars-d

On 9/21/18 9:08 PM, Neia Neutuladh wrote:

On Friday, 21 September 2018 at 20:25:54 UTC, Walter Bright wrote:
But identifiers? I haven't seen hardly any use of non-ascii 
identifiers in C, C++, or D. In fact, I've seen zero use of it outside 
of test cases. I don't see much point in expanding the support of it. 
If people use such identifiers, the result would most likely be 
annoyance rather than illumination when people who don't know that 
language have to work on the code.


you *do* know that not every codebase has people working on it who 
only know English, right?


If I took a software development job in China, I'd need to learn 
Chinese. I'd expect the codebase to be in Chinese. Because a Chinese 
company generally operates in Chinese, and they're likely to have a lot 
of employees who only speak Chinese.


And no, you can't just transcribe Chinese into ASCII.

Same for Spanish, Norwegian, German, Polish, Russian -- heck, it's 
almost easier to list out the languages you *don't* need non-ASCII 
characters for.


Anyway, here's some more D code using non-ASCII identifiers, in case you 
need examples: https://git.ikeran.org/dhasenan/muzikilo


But aren't we arguing about the wrong thing here? D already accepts 
non-ASCII identifiers. What languages need an upgrade to unicode symbol 
names? In other words, what symbols aren't possible with the current 
support?


Or maybe I'm misunderstanding something.

-Steve


Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Shachar Shemesh via Digitalmars-d

On 22/09/18 15:13, Thomas Mader wrote:

Would you suggest removing such writing systems from Unicode?
What should a museum do that needs software to manage Egyptian 
hieroglyphs?


If memory serves me right, hieroglyphs actually represent consonants 
(vowels are implicit), and as such, are most definitely "characters".


The only language I can think of, off the top of my head, where words 
have distinct signs is sign language. It is a good question whether 
Unicode should include such a language (difficulty of representing 
motion in a font aside).


Shachar


Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Thomas Mader via Digitalmars-d
On Saturday, 22 September 2018 at 11:28:48 UTC, Jonathan M Davis 
wrote:
Unicode is supposed to be a universal way of representing every 
character in every language. Emojis are not characters. They 
are sequences of characters that people use to represent 
images. I do not understand how an argument can even be made 
that they belong in Unicode. As I said, it's exactly the same 
as arguing that words should be represented in Unicode. 
Unfortunately, however, at least some of them are in there. :|


At least since the incorporation of emojis, it's not supposed to 
be a universal way of representing characters anymore. :-)
Maybe there was a time when that was true, I don't know, but I 
think they see Unicode as a way to express all language symbols.
And emoji are nothing else than a language where each symbol 
stands for an emotion/word/sentence.
If Unicode only allowed languages with characters which are used 
to form words, it would exclude languages which use other ways of 
expressing something.


Would you suggest removing such writing systems from Unicode?
What should a museum do that needs software to manage Egyptian 
hieroglyphs?


Unicode was made to support all sorts of writing systems, and 
using multiple characters per word is just one way of forming a 
writing system.


Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Shachar Shemesh via Digitalmars-d

On 22/09/18 14:28, Jonathan M Davis wrote:

As I said, it's exactly the same
as arguing that words should be represented in Unicode. Unfortunately,
however, at least some of them are in there. :|

- Jonathan M Davis


To be fair to them, that word is part of the "Arabic Presentation 
Forms" block. The "Presentation Forms" sections are meant for backward 
compatibility with code points that existed in earlier encodings, and 
are not meant to be generated by Unicode-aware applications.


Shachar


Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Jonathan M Davis via Digitalmars-d
On Saturday, September 22, 2018 4:51:47 AM MDT Thomas Mader via Digitalmars-
d wrote:
> On Saturday, 22 September 2018 at 10:24:48 UTC, Shachar Shemesh
>
> wrote:
> > Thank Allah that someone said it before I had to. I could not
> > agree more. Encoding whole words as single Unicode code points
> > makes no sense.
>
> The goal of Unicode is to support diversity; if you argue against
> that, you don't need Unicode at all.
> What you are saying is basically that you would remove Chinese
> too.
>
> Emojis are not my world either but it is an expression system /
> language.

Unicode is supposed to be a universal way of representing every character in
every language. Emojis are not characters. They are sequences of characters
that people use to represent images. I do not understand how an argument can
even be made that they belong in Unicode. As I said, it's exactly the same
as arguing that words should be represented in Unicode. Unfortunately,
however, at least some of them are in there. :|

- Jonathan M Davis





Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Thomas Mader via Digitalmars-d
On Saturday, 22 September 2018 at 10:24:48 UTC, Shachar Shemesh 
wrote:
Thank Allah that someone said it before I had to. I could not 
agree more. Encoding whole words as single Unicode code points 
makes no sense.


The goal of Unicode is to support diversity; if you argue against 
that, you don't need Unicode at all.
What you are saying is basically that you would remove Chinese 
too.


Emojis are not my world either but it is an expression system / 
language.




Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Shachar Shemesh via Digitalmars-d

On 22/09/18 11:52, Jonathan M Davis wrote:


Honestly, I was horrified to find out that emojis were even in Unicode. It
makes no sense whatsoever. Emojis are supposed to be sequences of characters
that can be interpreted as images. Treating them like Unicode symbols is
like treating entire words like Unicode symbols. It's just plain stupid and
a clear sign that Unicode has gone completely off the rails (if it was ever
on them). Unfortunately, it's the best tool that we have for the job.

- Jonathan M Davis


Thank Allah that someone said it before I had to. I could not agree 
more. Encoding whole words as single Unicode code points makes no sense.


U+FDF2

Shachar


Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Thomas Mader via Digitalmars-d
On Saturday, 22 September 2018 at 01:08:26 UTC, Neia Neutuladh 
wrote:
...you *do* know that not every codebase has people working on 
it who only know English, right?


This topic boils down to diversity vs. productivity.

Whether supporting diversity is worthwhile in this case is questionable.

I work in a German-speaking company, and for now we have no developers 
who do not speak German. In fact, all are native speakers.

Still we write our code, comments and commit messages in English.
Even at university you learn that you should use English to code.

The reasoning is simple. You never know who will work on your 
code in the future.
If a company writes code in Chinese, they will have a hard time 
expanding the development of their codebase, even though Chinese 
is spoken by so many people.


So even though you could use all sorts of characters, in a 
productive environment you'd better choose not to.

You might end up shooting yourself in the foot in the long run.

Diversity is important in other areas but I don't see much 
advantage here.
At least for now because the spoken languages of today don't 
differ tremendously in what they are capable of expressing.


This is also true for todays programming languages. Most of them 
are just different syntax for the very same ideas and concepts. 
That's not very helpful to bring people together and advance.


My understanding is that even life, with its great diversity, has just 
one language (DNA) to define it.




Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Jonathan M Davis via Digitalmars-d
On Friday, September 21, 2018 10:54:59 PM MDT Joakim via Digitalmars-d 
wrote:
> I'm torn. I completely agree with Adam and others that people
> should be able to use any language they want. But the Unicode
> spec is such a tire fire that I'm leery of extending support for
> it.

Unicode identifiers may make sense in a code base that is going to be used
solely by a group of developers who speak a particular language that uses a
number of non-ASCII characters (especially languages like Chinese or
Japanese), but it has no business in any code that's intended for
international use. It just causes problems. At best, a particular, regional
keyboard may be able to handle a particular symbol, but most other keyboards
won't be able to. So, using that symbol causes problems for all of the
developers from other parts of the world even if those developers also have
Unicode symbols in their native languages.

> Someone linked this Swift chapter on Unicode handling in an
> earlier forum thread, read the section on emoji in particular:
>
> https://oleb.net/blog/2017/11/swift-4-strings/
>
> I was laughing out loud when reading about composing "family"
> emojis with zero-width joiners. If you told me that was a tech
> parody, I'd have believed it.

Honestly, I was horrified to find out that emojis were even in Unicode. It
makes no sense whatsoever. Emojis are supposed to be sequences of characters
that can be interpreted as images. Treating them like Unicode symbols is
like treating entire words like Unicode symbols. It's just plain stupid and
a clear sign that Unicode has gone completely off the rails (if it was ever
on them). Unfortunately, it's the best tool that we have for the job.

- Jonathan M Davis





Re: Updating D beyond Unicode 2.0

2018-09-22 Thread Neia Neutuladh via Digitalmars-d

On Saturday, 22 September 2018 at 04:54:59 UTC, Joakim wrote:

To wit, Windows linker error with Unicode symbol:

https://github.com/ldc-developers/ldc/pull/2850#issuecomment-422968161


That's a good argument for sticking to ASCII for name mangling.

I'm torn. I completely agree with Adam and others that people 
should be able to use any language they want. But the Unicode 
spec is such a tire fire that I'm leery of extending support 
for it.


The compiler doesn't have to do much with Unicode processing, 
fortunately.
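One way to keep Unicode out of the rest of the toolchain is to escape it at the mangling step, so linkers, debuggers, and object-file tools only ever see ASCII. A hypothetical sketch (the `_uXXXX_` scheme and the `mangle` name are invented for illustration; this is not D's actual mangling):

```python
def mangle(ident: str) -> str:
    """Escape non-ASCII identifier characters as _uXXXX_ so the
    symbol written to the object file is pure ASCII."""
    out = []
    for ch in ident:
        if ch.isascii() and (ch.isalnum() or ch == "_"):
            out.append(ch)                    # ASCII passes through
        else:
            out.append(f"_u{ord(ch):04x}_")   # everything else is escaped
    return "".join(out)

# ASCII identifiers are untouched; non-ASCII ones stay linkable:
assert mangle("plain_name") == "plain_name"
assert mangle("döndür") == "d_u00f6_nd_u00fc_r"
```

A real scheme would also need to be reversible for demanglers and to avoid collisions with identifiers that already contain `_u`, but the principle is the same.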


Re: Updating D beyond Unicode 2.0

2018-09-21 Thread Joakim via Digitalmars-d

On Friday, 21 September 2018 at 20:25:54 UTC, Walter Bright wrote:
When I originally started with D, I thought non-ASCII 
identifiers with Unicode was a good idea. I've since slowly 
become less and less enthusiastic about it.


First off, D source text simply must (and does) fully support 
Unicode in comments, characters, and string literals. That's 
not an issue.


But identifiers? I haven't seen hardly any use of non-ascii 
identifiers in C, C++, or D. In fact, I've seen zero use of it 
outside of test cases. I don't see much point in expanding the 
support of it. If people use such identifiers, the result would 
most likely be annoyance rather than illumination when people 
who don't know that language have to work on the code.


Extending it further will also cause problems for all the tools 
that work with D object code, like debuggers, disassemblers, 
linkers, filesystems, etc.


To wit, Windows linker error with Unicode symbol:

https://github.com/ldc-developers/ldc/pull/2850#issuecomment-422968161


Absent a much more compelling rationale for it, I'd say no.


I'm torn. I completely agree with Adam and others that people 
should be able to use any language they want. But the Unicode 
spec is such a tire fire that I'm leery of extending support for 
it.


Someone linked this Swift chapter on Unicode handling in an 
earlier forum thread, read the section on emoji in particular:


https://oleb.net/blog/2017/11/swift-4-strings/

I was laughing out loud when reading about composing "family" 
emojis with zero-width joiners. If you told me that was a tech 
parody, I'd have believed it.


I believe Swift just punts their Unicode support to ICU, like 
most any other project these days. That's a horrible sign, that 
you've created a spec so grotesquely complicated that most 
everybody relies on a single project to not have to deal with it.


Re: Updating D beyond Unicode 2.0

2018-09-21 Thread rikki cattermole via Digitalmars-d

On 22/09/2018 11:17 AM, Seb wrote:
In all seriousness I hate it when someone thought it was funny to use the 
lambda symbol as an identifier and I have to copy that symbol whenever I 
want to use it because there's no convenient way to type it.

(This is already supported in D.)


This can be strongly mitigated by using a compose key, but compose keys 
are unfortunately not terribly common.


Re: Updating D beyond Unicode 2.0

2018-09-21 Thread Neia Neutuladh via Digitalmars-d

On Friday, 21 September 2018 at 20:25:54 UTC, Walter Bright wrote:
But identifiers? I haven't seen hardly any use of non-ascii 
identifiers in C, C++, or D. In fact, I've seen zero use of it 
outside of test cases. I don't see much point in expanding the 
support of it. If people use such identifiers, the result would 
most likely be annoyance rather than illumination when people 
who don't know that language have to work on the code.


...you *do* know that not every codebase has people working on it 
who only know English, right?


If I took a software development job in China, I'd need to learn 
Chinese. I'd expect the codebase to be in Chinese. Because a 
Chinese company generally operates in Chinese, and they're likely 
to have a lot of employees who only speak Chinese.


And no, you can't just transcribe Chinese into ASCII.

Same for Spanish, Norwegian, German, Polish, Russian -- heck, 
it's almost easier to list out the languages you *don't* need 
non-ASCII characters for.


Anyway, here's some more D code using non-ASCII identifiers, in 
case you need examples: https://git.ikeran.org/dhasenan/muzikilo


Re: Updating D beyond Unicode 2.0

2018-09-21 Thread Neia Neutuladh via Digitalmars-d

On Friday, 21 September 2018 at 23:17:42 UTC, Seb wrote:

A: Wait. Using emojis as identifiers is not a good idea?
B: Yes.
A: But the cool kids are doing it:


The C11 spec says that emoji should be allowed in identifiers 
(ISO publication N1570 page 504/522), so it's not just the cool 
kids.


I'm not in favor of emoji in identifiers.

In all seriousness, I hate it when someone thought it was funny 
to use the lambda symbol as an identifier and I have to copy that 
symbol whenever I want to use it, because there's no convenient 
way to type it.


It's supported because λ is a letter in a language spoken by 
thirteen million people. I mean, would you want to have to name a 
variable "lumиnosиty" because someone got annoyed at people using 
"i" as a variable name?


Re: Updating D beyond Unicode 2.0

2018-09-21 Thread Adam D. Ruppe via Digitalmars-d

On Friday, 21 September 2018 at 20:25:54 UTC, Walter Bright wrote:
But identifiers? I have seen hardly any use of non-ASCII 
identifiers in C, C++, or D. In fact, I've seen zero use of them 
outside of test cases.


Do you look at Japanese D code much? Or Turkish? Or Chinese?

I know there are decently sized D communities in those languages, 
and I am pretty sure I have seen identifiers in their languages 
before, but I can't find it right now.


It's just that there's a pretty clear potential for observation 
bias here. Even our search engine queries are going to be biased 
toward English-language results, so there can be a whole D world 
that's largely invisible to you and me.


We should reach out and get solid stats before making a final 
decision.


most likely be annoyance rather than illumination when people 
who don't know that language have to work on the code.


Well, for example, with a Chinese company, they may very well 
find forced English identifiers to be an annoyance.


Re: Updating D beyond Unicode 2.0

2018-09-21 Thread Seb via Digitalmars-d
On Friday, 21 September 2018 at 23:00:45 UTC, Erik van Velzen 
wrote:

Agreed with Walter.

I'm all on board with i18n but I see no need for non-ASCII 
identifiers.


Even identifiers with a non-Latin origin are usually written in 
the Latin script.


As for real-world usage, I've seen Cyrillic identifiers a few 
times in PHP.


A: Wait. Using emojis as identifiers is not a good idea?
B: Yes.
A: But the cool kids are doing it:

https://codepen.io/andresgalante/pen/jbGqXj

In all seriousness, I hate it when someone thought it was funny 
to use the lambda symbol as an identifier and I have to copy that 
symbol whenever I want to use it, because there's no convenient 
way to type it.

(This is already supported in D.)


Re: Updating D beyond Unicode 2.0

2018-09-21 Thread Erik van Velzen via Digitalmars-d

Agreed with Walter.

I'm all on board with i18n but I see no need for non-ASCII 
identifiers.


Even identifiers with a non-Latin origin are usually written in 
the Latin script.


As for real-world usage, I've seen Cyrillic identifiers a few 
times in PHP.





Re: Updating D beyond Unicode 2.0

2018-09-21 Thread Walter Bright via Digitalmars-d
When I originally started with D, I thought non-ASCII Unicode identifiers 
were a good idea. I've since slowly become less and less enthusiastic about it.


First off, D source text simply must (and does) fully support Unicode in 
comments, characters, and string literals. That's not an issue.


But identifiers? I have seen hardly any use of non-ASCII identifiers in C, 
C++, or D. In fact, I've seen zero use of them outside of test cases. I don't see 
much point in expanding support for them. If people use such identifiers, the 
result would most likely be annoyance rather than illumination when people who 
don't know that language have to work on the code.


Extending it further will also cause problems for all the tools that work with D 
object code, like debuggers, disassemblers, linkers, filesystems, etc.


Absent a much more compelling rationale for it, I'd say no.